Documentation Index
Fetch the complete documentation index at: https://docs.uselumina.io/docs/llms.txt
Use this file to discover all available pages before exploring further.
Common questions about the Lumina observability platform.
General
What is Lumina?
Lumina is an open-source, OpenTelemetry-native observability platform for AI applications. It tracks costs, latency, and quality across LLM calls with automatic instrumentation and real-time alerting.
Is Lumina free?
Yes, the self-hosted version is free forever with:
- 50,000 traces per day
- 7-day retention
- All features included
For unlimited traces and managed hosting, contact us for cloud pricing.
What makes Lumina different?
- OpenTelemetry-native — Not retrofitted, works with existing observability stacks
- Self-hostable — Full data control and sovereignty
- Replay testing — Test changes against real production data
- Cost + quality correlation — Connect spending with quality metrics
Do I need API keys?
API keys are only required for the replay feature to re-execute LLM calls. Core functionality (tracing, cost tracking, alerts) works without API keys.
Installation
What are the system requirements?
Minimum:
- 4GB RAM
- 10GB disk space
- Docker 20.10+ or Kubernetes 1.20+
Recommended:
- 8GB RAM
- 50GB disk space
- PostgreSQL 14+
- Redis 6+
- NATS 2.9+
Can I run Lumina in production?
Yes, Lumina is production-ready with:
- PostgreSQL backend (battle-tested)
- NATS JetStream (reliable message queue)
- Horizontal scaling support
- Real-time alerting (<500ms)
Handles 10M+ traces/day per cluster.
How do I upgrade Lumina?
Docker Compose:
cd infra/docker
docker compose pull
docker compose up -d
Kubernetes:
helm repo update
helm upgrade lumina lumina/lumina
Database migrations run automatically on startup.
Tracing
What is a trace?
A trace represents a complete operation in your application, from request to response. It consists of one or more spans representing individual steps.
Example:
Trace: user_query_abc123 (1.2s)
└── chat_completion (1.2s)
    Model: claude-sonnet-4-5
    Cost: $0.003
What is a span?
A span represents a single operation within a trace (e.g., an LLM call, database query, or API request).
What is multi-span tracing?
Multi-span tracing tracks complex workflows with multiple steps, showing parent-child relationships.
Example:
Trace: rag_pipeline (2.5s, $0.05)
├── retrieval (1.8s, $0.01)
│   ├── embedding (0.2s, $0.001)
│   └── vector_search (1.6s, $0.009)
└── synthesis (0.7s, $0.04)
See Multi-Span Tracing for details.
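The parent-child structure above can be modeled as flat spans linked by a parent ID. The sketch below (field names are illustrative, not the SDK's actual schema) rolls leaf-span costs up to the trace total, summing only leaves because parent spans in the example already include their children's costs:

```typescript
interface Span {
  id: string;
  parentId?: string;
  name: string;
  costUsd: number;
}

// Sum costs of leaf spans only; parents (retrieval, the root) already
// aggregate their children's costs in the example above.
function traceCost(spans: Span[]): number {
  const parents = new Set(spans.map((s) => s.parentId).filter(Boolean));
  return spans
    .filter((s) => !parents.has(s.id))
    .reduce((sum, s) => sum + s.costUsd, 0);
}

const rag: Span[] = [
  { id: "1", name: "rag_pipeline", costUsd: 0.05 },
  { id: "2", parentId: "1", name: "retrieval", costUsd: 0.01 },
  { id: "3", parentId: "2", name: "embedding", costUsd: 0.001 },
  { id: "4", parentId: "2", name: "vector_search", costUsd: 0.009 },
  { id: "5", parentId: "1", name: "synthesis", costUsd: 0.04 },
];
console.log(traceCost(rag)); // ≈ 0.05, matching the trace total
```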
How do I trace custom operations?
Use lumina.trace() for any operation:
await lumina.trace('custom_operation', async (span) => {
  span.setAttribute('custom_attr', 'value');
  // Your code here
  return result;
});
Cost Tracking
How does cost calculation work?
Lumina automatically calculates costs based on:
- Model name (e.g., “claude-sonnet-4-5”)
- Token counts (prompt + completion)
- Provider pricing (updated monthly)
Example:
Input: 120 tokens × $0.003/1k = $0.00036
Output: 80 tokens × $0.015/1k = $0.00120
Total: $0.00156
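The arithmetic above can be sketched as a small helper. This is a hypothetical illustration, not Lumina's internal calculator, which applies its own pricing tables automatically:

```typescript
// Per-call cost from token counts and per-1k-token prices.
// The prices passed in are illustrative; real values come from provider pricing.
function llmCost(
  promptTokens: number,
  completionTokens: number,
  inputPricePer1k: number,
  outputPricePer1k: number
): number {
  const inputCost = (promptTokens / 1000) * inputPricePer1k;
  const outputCost = (completionTokens / 1000) * outputPricePer1k;
  return inputCost + outputCost;
}

// 120 prompt tokens at $0.003/1k plus 80 completion tokens at $0.015/1k
console.log(llmCost(120, 80, 0.003, 0.015).toFixed(5)); // "0.00156"
```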
Which providers are supported?
| Provider | Models | Auto-Calculation |
|---|---|---|
| OpenAI | GPT-4, GPT-3.5, GPT-4 Turbo | ✓ |
| Anthropic | Claude 3 Opus, Sonnet, Haiku | ✓ |
| Anthropic | Claude 3.5 Sonnet | ✓ |
| Anthropic | Claude Sonnet 4.5 | ✓ |
For other providers, specify costs manually via metadata.
How accurate is cost tracking?
Cost calculations are based on official provider pricing. Accuracy depends on:
- Correct model name
- Accurate token counts from provider API
- Up-to-date pricing data
Lumina updates pricing monthly. For critical cost tracking, verify against provider invoices.
Quality Monitoring
How is quality measured?
Lumina uses hybrid quality detection:
Exact match: Hash-based comparison (instant)
Semantic similarity: AI-powered scoring (~100ms)
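The exact-match stage can be sketched as hashing both responses and comparing digests. This is an assumed illustration of the idea, not Lumina's actual implementation:

```typescript
import { createHash } from "node:crypto";

// Hash-based exact match: identical strings produce identical digests,
// so comparison is O(1) after hashing.
function exactMatch(a: string, b: string): boolean {
  const digest = (s: string) => createHash("sha256").update(s).digest("hex");
  return digest(a) === digest(b);
}
```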
What is semantic similarity?
A score from 0.0 to 1.0 indicating how similar two responses are in meaning:
- 0.9-1.0: Nearly identical meaning
- 0.7-0.9: Similar meaning, different wording
- 0.5-0.7: Related but different
- 0.0-0.5: Completely different
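The bands above can be expressed as a simple mapping, useful when routing scores into dashboards or alerts (a sketch; the band labels follow the list above):

```typescript
// Map a 0.0–1.0 semantic similarity score to its descriptive band.
function similarityBand(score: number): string {
  if (score >= 0.9) return "nearly identical meaning";
  if (score >= 0.7) return "similar meaning, different wording";
  if (score >= 0.5) return "related but different";
  return "completely different";
}
```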
Can I use custom quality metrics?
Yes, add custom scoring via attributes:
await lumina.traceLLM(
  () => llm.generate(prompt),
  {
    metadata: {
      custom_quality_score: calculateMyScore(response),
      factual_accuracy: 0.95,
      relevance_score: 0.88,
    },
  }
);
Replay Testing
What is replay testing?
Replay testing re-executes real production traces with new code to detect regressions before deployment.
Workflow:
- Capture baseline from production
- Modify prompts/code
- Replay against captured traces
- Compare outputs with semantic diff
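The comparison step in the workflow above can be sketched as follows. This shows only the exact-match pass over captured outputs; the real comparison also runs semantic diffing on mismatches:

```typescript
// Return the indices of traces whose replayed output differs from the
// captured baseline, flagging candidates for regression review.
function regressedIndices(baseline: string[], replay: string[]): number[] {
  return baseline.flatMap((out, i) => (replay[i] === out ? [] : [i]));
}
```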
Does replay cost money?
Yes, replay makes real LLM API calls. Costs are the same as production calls.
Example:
- Capture 100 traces
- Replay costs = 100 × average cost per call
- If the average cost per call is $0.01, replay costs $1.00
Can I replay without API keys?
No, replay requires LLM provider API keys to re-execute calls. Use simulation mode for testing without costs:
await lumina.replayTraces({
  replaySetId: 'replay_001',
  mode: 'simulation', // No real API calls
});
Alerting
How does alerting work?
Lumina detects anomalies automatically by comparing current metrics to baselines:
Cost alerts: Spending exceeds baseline by threshold
Quality alerts: Similarity score drops below threshold
Alerts are sent via webhooks to Slack, PagerDuty, or custom endpoints.
How are baselines calculated?
Baselines are calculated from historical data:
Rolling average: Last 7 days
Update frequency: Every hour
Minimum data: 100 traces
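The baseline rules above can be sketched as a helper that averages the trailing window and withholds a baseline until the 100-trace minimum is met (a hypothetical illustration, not Lumina's internal code):

```typescript
// Average over the rolling window; return null until enough data exists,
// so no alerts fire against an unreliable baseline.
function rollingBaseline(values: number[], minTraces = 100): number | null {
  if (values.length < minTraces) return null; // below the 100-trace minimum
  return values.reduce((a, b) => a + b, 0) / values.length;
}
```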
Can I customize alert thresholds?
Yes, set thresholds per service:
# Dashboard: Settings → Alerts → Configure
Cost spike threshold: +200%
Quality drop threshold: -20%
Window: 1 hour
Or via API:
curl -X POST http://api:8081/api/alerts/configure \
  -d '{"service":"chat-api","costThreshold":2.0,"qualityThreshold":0.2}'
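The threshold semantics can be sketched as a check against the baseline. This is an assumed interpretation of the settings above (costThreshold 2.0 meaning +200% over baseline, qualityThreshold 0.2 meaning a 20% drop), not the platform's actual evaluator:

```typescript
interface Baseline {
  avgCost: number;
  avgSimilarity: number;
}

// costThreshold 2.0 => alert when spend exceeds baseline by +200% (3x);
// qualityThreshold 0.2 => alert when similarity drops >20% below baseline.
function checkAlerts(
  current: { cost: number; similarity: number },
  base: Baseline,
  costThreshold = 2.0,
  qualityThreshold = 0.2
): string[] {
  const alerts: string[] = [];
  if (current.cost > base.avgCost * (1 + costThreshold)) alerts.push("cost_spike");
  if (current.similarity < base.avgSimilarity * (1 - qualityThreshold)) alerts.push("quality_drop");
  return alerts;
}
```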
Performance
What is the ingestion throughput?
Per ingestion pod:
- 10,000 traces/second
- 500MB/second data throughput
Horizontally scalable:
- 10 pods = 100,000 traces/second
What is the query latency?
Query API performance:
- P50: <50ms
- P95: <100ms
- P99: <200ms
On 10M+ trace tables with proper indexes.
How do I tune performance?
For ingestion:
- Batch traces client-side
- Enable compression
- Scale ingestion pods horizontally
For queries:
- Enable Redis caching
- Add database indexes
- Configure retention policies
See Performance Guide for details.
Security
Is Lumina secure?
Yes, with proper configuration:
Self-hosted:
- Deploy behind VPN or private network
- Enable SSL/TLS for database
- Rotate secrets regularly
Network security:
- Use Kubernetes NetworkPolicies
- Enable mTLS between services
- Configure firewall rules
See Security Guide for full checklist.
Does Lumina store sensitive data?
Traces may contain sensitive data in prompts and responses. Best practices:
- Redact PII before sending:
  const redacted = redactPII(userPrompt);
  await lumina.traceLLM(
    () => llm.generate(redacted),
    { prompt: redacted }
  );
- Configure short retention so sensitive traces expire quickly
- Implement data export/deletion for GDPR compliance
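The redactPII helper used above is not part of the SDK. A minimal regex-based sketch follows; real deployments should use a purpose-built PII library:

```typescript
// Minimal illustration only: masks email addresses and US-style phone
// numbers. Not a substitute for a dedicated PII redaction library.
function redactPII(text: string): string {
  return text
    .replace(/[\w.+-]+@[\w-]+(\.[\w-]+)+/g, "[EMAIL]")
    .replace(/\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b/g, "[PHONE]");
}

console.log(redactPII("Email jane@example.com or call 555-123-4567"));
// → "Email [EMAIL] or call [PHONE]"
```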
Can I disable authentication?
Yes, self-hosted Lumina runs without authentication by default for ease of use in controlled environments.
Enable authentication for internet-exposed deployments:
AUTH_REQUIRED=true
JWT_SECRET=<generate-strong-secret>
Integrations
Which LLM providers are supported?
Supported:
- OpenAI (GPT-4, GPT-3.5, GPT-4 Turbo)
- Anthropic (Claude 3.x, Claude 3.5, Claude Sonnet 4.5)
Coming soon:
- Cohere
- Replicate
- Together AI
Can I integrate with LangChain?
Yes, wrap LangChain calls:
await lumina.traceLLM(
  () => chain.call({ input: userQuery }),
  { name: 'langchain_call' }
);
See LangChain Integration for details.
Can I integrate with Vercel AI SDK?
Yes:
import { streamText } from 'ai';

await lumina.traceLLM(
  () => streamText({ model, prompt }),
  { name: 'vercel_ai_stream', system: 'openai' }
);
See Vercel AI SDK Integration for details.
Deployment
Can I deploy to Kubernetes?
Yes, Lumina is designed for Kubernetes with:
- Helm charts
- StatefulSets for stateful services
- Horizontal Pod Autoscaling
- Service mesh compatibility
See Kubernetes Deployment for details.
Can I deploy to AWS/GCP/Azure?
Yes, deploy with:
- AWS: EKS + RDS PostgreSQL + ElastiCache Redis
- GCP: GKE + Cloud SQL + Memorystore
- Azure: AKS + Azure Database for PostgreSQL
Cloud-managed services recommended for databases.
Do I need Redis?
Redis is used for:
- Query result caching
- Rate limiting
- Session storage (if auth enabled)
Optional but recommended for production.
Troubleshooting
Traces not appearing?
Check 1: Verify endpoint
const lumina = initLumina({
  endpoint: 'http://localhost:9411/v1/traces', // Correct
  // NOT: http://localhost:8081 (query API)
});
Check 2: View ingestion logs
docker compose logs ingestion | grep ERROR
Check 3: Test with cURL
curl -X POST http://localhost:9411/v1/traces \
  -H "Content-Type: application/json" \
  -d '{"resourceSpans":[...]}'
Costs showing as $0?
Possible causes:
- Model name not recognized
- Missing token counts
- Provider not supported
Solution: Add manual cost via metadata:
{
  metadata: {
    cost_per_prompt_token: 0.000003,
    cost_per_completion_token: 0.000015,
  }
}
High memory usage?
Possible causes:
- Large trace volumes
- Insufficient PostgreSQL shared_buffers
- Redis cache too large
Solutions:
- Enable sampling for high volume
- Increase PostgreSQL memory
- Configure Redis maxmemory
See Performance Guide.
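Sampling, the first solution above, can be sketched as a head-based sampler that keeps a fixed fraction of traces (a hypothetical illustration; check the SDK's own sampling options before rolling your own):

```typescript
// Head-based sampling: decide up front whether to keep each trace.
// The random source is injectable so the decision is deterministic in tests.
function makeSampler(rate: number, rng: () => number = Math.random) {
  return (): boolean => rng() < rate;
}

// Keep roughly 10% of traces to reduce memory and storage pressure.
const sampleTenPercent = makeSampler(0.1);
```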
Comparison
How does Lumina compare to LangSmith?
| Feature | Lumina | LangSmith |
|---|---|---|
| Self-hosted | ✓ | ✗ |
| OpenTelemetry | Native | Adapter |
| Replay testing | ✓ | ✗ |
| Free tier | 50k/day | 5k/month |
| Real-time alerts | <500ms | Minutes |
Choose Lumina if: You need self-hosting, OpenTelemetry compatibility, or replay testing.
Choose LangSmith if: You use LangChain heavily and prefer cloud-only.
How does Lumina compare to Langfuse?
| Feature | Lumina | Langfuse |
|---|---|---|
| Self-hosted | ✓ (50k/day) | ✓ (unlimited) |
| OpenTelemetry | Native | No |
| Replay testing | ✓ | ✗ |
| Prompt management | Basic | Advanced |
| Real-time alerts | ✓ | ✓ |
Choose Lumina if: You need OpenTelemetry or replay testing.
Choose Langfuse if: You need advanced prompt management or unlimited free traces.
How does Lumina compare to Helicone?
| Feature | Lumina | Helicone |
|---|---|---|
| Self-hosted | ✓ | ✗ |
| Multi-span tracing | ✓ | ✗ |
| Replay testing | ✓ | ✗ |
| Gateway approach | ✗ | ✓ |
| Free tier | 50k/day | 100k requests/month |
Choose Lumina if: You need multi-span tracing or replay testing.
Choose Helicone if: You prefer gateway-based instrumentation.
Getting Help
Where can I get support?
Community:
Documentation:
Commercial:
How do I report a bug?
- Check existing issues
- Create new issue with:
- Lumina version
- Steps to reproduce
- Expected vs actual behavior
- Logs (if applicable)
How do I request a feature?
- Check discussions
- Create feature request with:
- Use case description
- Proposed solution
- Alternative solutions considered
Can I contribute?
Yes! See Contributing Guide.
Good first issues:
- Documentation improvements
- Bug fixes
- Integration guides
- Example applications