General
What is Lumina?
Lumina is an open-source, OpenTelemetry-native observability platform for AI applications. It tracks costs, latency, and quality across LLM calls with automatic instrumentation and real-time alerting.
Is Lumina free?
Yes, the self-hosted version is free forever with:
- 50,000 traces per day
- 7-day retention
- All features included
What makes Lumina different?
- OpenTelemetry-native — Not retrofitted, works with existing observability stacks
- Self-hostable — Full data control and sovereignty
- Replay testing — Test changes against real production data
- Cost + quality correlation — Connect spending with quality metrics
Do I need API keys?
API keys are only required for the replay feature to re-execute LLM calls. Core functionality (tracing, cost tracking, alerts) works without API keys.
Installation
What are the system requirements?
Minimum:
- 4GB RAM
- 10GB disk space
- Docker 20.10+ or Kubernetes 1.20+
Recommended:
- 8GB RAM
- 50GB disk space
- PostgreSQL 14+
- Redis 6+
- NATS 2.9+
Can I run Lumina in production?
Yes, Lumina is production-ready with:
- PostgreSQL backend (battle-tested)
- NATS JetStream (reliable message queue)
- Horizontal scaling support
- Real-time alerting (<500ms)
How do I upgrade Lumina?
For Docker Compose deployments, pull the latest images and restart the stack with `docker compose pull` followed by `docker compose up -d`.
Tracing
What is a trace?
A trace represents a complete operation in your application, from request to response. It consists of one or more spans representing individual steps.
What is a span?
A span represents a single operation within a trace (e.g., an LLM call, database query, or API request).
What is multi-span tracing?
Multi-span tracing tracks complex workflows with multiple steps, showing parent-child relationships between spans.
How do I trace custom operations?
Use `lumina.trace()` to instrument any operation.
Cost Tracking
How does cost calculation work?
Lumina automatically calculates costs based on:
- Model name (e.g., “claude-sonnet-4-5”)
- Token counts (prompt + completion)
- Provider pricing (updated monthly)
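The calculation above is a lookup-and-multiply; the sketch below illustrates it with placeholder per-token prices, not Lumina's actual pricing table:

```python
# Illustrative pricing table: USD per 1M tokens as (prompt, completion).
# These figures are placeholders, not Lumina's maintained pricing data.
PRICING = {
    "claude-sonnet-4-5": (3.00, 15.00),
    "gpt-4-turbo": (10.00, 30.00),
}

def calculate_cost(model, prompt_tokens, completion_tokens):
    """Cost = tokens x per-token price, split by prompt vs. completion."""
    prompt_price, completion_price = PRICING[model]
    return (prompt_tokens * prompt_price
            + completion_tokens * completion_price) / 1_000_000

cost = calculate_cost("claude-sonnet-4-5", 1000, 500)
print(f"${cost:.4f}")  # → $0.0105
```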
Which providers are supported?
| Provider | Models | Auto-Calculation |
|---|---|---|
| OpenAI | GPT-4, GPT-3.5, GPT-4 Turbo | ✓ |
| Anthropic | Claude 3 Opus/Sonnet/Haiku, Claude 3.5 Sonnet, Claude Sonnet 4.5 | ✓ |
How accurate is cost tracking?
Cost calculations are based on official provider pricing. Accuracy depends on:
- Correct model name
- Accurate token counts from provider API
- Up-to-date pricing data
Quality Monitoring
How is quality measured?
Lumina uses hybrid quality detection:
- Exact match: hash-based comparison (instant)
- Semantic similarity: AI-powered scoring (~100ms)
What is semantic similarity?
A score from 0.0 to 1.0 indicating how similar two responses are in meaning:
- 0.9-1.0: Nearly identical meaning
- 0.7-0.9: Similar meaning, different wording
- 0.5-0.7: Related but different
- 0.0-0.5: Completely different
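As a rough illustration of how an embedding-based score lands in this 0.0-1.0 range (Lumina's actual scorer is AI-powered; the vectors below are toys, not real embeddings), cosine similarity behaves like this:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two vectors; in [0, 1] for non-negative vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = (math.sqrt(sum(x * x for x in a))
            * math.sqrt(sum(y * y for y in b)))
    return dot / norm

# Toy "embeddings": similar responses point in similar directions,
# unrelated responses are near-orthogonal.
nearly_identical = cosine_similarity([1.0, 2.0, 3.0], [1.1, 2.0, 2.9])
unrelated = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(round(nearly_identical, 3), round(unrelated, 3))
```

The first pair scores near 1.0 (nearly identical), the second scores 0.0 (completely different), matching the bands above.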
Can I use custom quality metrics?
Yes, you can attach custom quality scores as span attributes (e.g., a score computed by your own evaluator).
Replay Testing
What is replay testing?
Replay testing re-executes real production traces with new code to detect regressions before deployment. Workflow:
- Capture baseline from production
- Modify prompts/code
- Replay against captured traces
- Compare outputs with semantic diff
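The comparison step can be sketched as follows; `difflib`'s character-level ratio stands in for Lumina's semantic diff (the real scorer is embedding-based), and the trace data is made up:

```python
import difflib

# Captured baseline outputs vs. outputs from replaying with new code.
baseline = {"trace-1": "Paris is the capital of France.",
            "trace-2": "The order total is $42.50."}
replayed = {"trace-1": "Paris is the capital of France.",  # unchanged
            "trace-2": "Error: upstream timeout"}          # regression

def find_regressions(baseline, replayed, threshold=0.6):
    """Return trace ids whose replayed output drifted below the threshold."""
    regressions = []
    for trace_id, expected in baseline.items():
        score = difflib.SequenceMatcher(None, expected,
                                        replayed[trace_id]).ratio()
        if score < threshold:
            regressions.append((trace_id, round(score, 2)))
    return regressions

print(find_regressions(baseline, replayed))  # only trace-2 is flagged
```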
Does replay cost money?
Yes, replay makes real LLM API calls. Costs are the same as production calls. Example:
- Capture 100 traces
- Replay cost = 100 × average cost per call
- At an average of $0.01 per call, replaying 100 traces costs about $1.00
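The arithmetic is straightforward (the per-call figure is illustrative; substitute your own production average):

```python
# Back-of-the-envelope replay cost: captured traces x average cost per call.
n_traces = 100
avg_cost_per_call = 0.01  # USD, illustrative

replay_cost = n_traces * avg_cost_per_call
print(f"${replay_cost:.2f}")  # → $1.00
```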
Can I replay without API keys?
No, replay requires LLM provider API keys to re-execute calls. For testing without incurring costs, use simulation mode.
Alerting
How does alerting work?
Lumina detects anomalies automatically by comparing current metrics to baselines:
- Cost alerts: spending exceeds baseline by threshold
- Quality alerts: similarity score drops below threshold
Alerts are sent via webhooks to Slack, PagerDuty, or custom endpoints.
How are baselines calculated?
Baselines are calculated from historical data:
- Rolling average: last 7 days
- Update frequency: every hour
- Minimum data: 100 traces
Can I configure alert thresholds?
Yes, thresholds can be configured per service.
Performance
What is the ingestion throughput?
Per ingestion pod:
- 10,000 traces/second
- 500MB/second data throughput
- 10 pods = 100,000 traces/second
What is the query latency?
Query API performance:
- P50: <50ms
- P95: <100ms
- P99: <200ms
How do I optimize performance?
For ingestion:
- Batch traces client-side
- Enable compression
- Scale ingestion pods horizontally
For queries:
- Enable Redis caching
- Add database indexes
- Configure retention policies
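The first tip, client-side batching, can be sketched as follows; the network transport is stubbed with an in-memory list, and the class name is illustrative rather than part of the Lumina SDK:

```python
class TraceBatcher:
    """Buffer traces client-side and send them in batches to cut request overhead."""

    def __init__(self, batch_size=100):
        self.batch_size = batch_size
        self.buffer = []
        self.sent_batches = []  # stand-in for the network transport

    def add(self, trace):
        self.buffer.append(trace)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered as one request."""
        if self.buffer:
            self.sent_batches.append(list(self.buffer))
            self.buffer.clear()

batcher = TraceBatcher(batch_size=3)
for i in range(7):
    batcher.add({"trace_id": i})
batcher.flush()  # send the remainder on shutdown
print([len(b) for b in batcher.sent_batches])  # → [3, 3, 1]
```

Seven traces go out as three requests instead of seven; remember to flush on shutdown so the tail of the buffer is not lost.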
Security
Is Lumina secure?
Yes, with proper configuration. Self-hosted:
- Deploy behind VPN or private network
- Enable SSL/TLS for database
- Rotate secrets regularly
Kubernetes:
- Use Kubernetes NetworkPolicies
- Enable mTLS between services
- Configure firewall rules
Does Lumina store sensitive data?
Traces may contain sensitive data in prompts and responses. Best practices:
- Redact PII before sending traces
- Configure short retention periods
- Implement data export/deletion for GDPR compliance
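A minimal sketch of the first practice, redacting obvious PII patterns before a prompt leaves your application; the regexes below are illustrative, and production redaction needs far broader coverage:

```python
import re

# Illustrative patterns only: real PII detection needs broader coverage
# (names, addresses, phone formats, locale-specific identifiers, ...).
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "<SSN>"),
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "<CARD>"),
]

def redact(text):
    """Replace recognizable PII with placeholder tokens."""
    for pattern, replacement in PII_PATTERNS:
        text = pattern.sub(replacement, text)
    return text

prompt = "Contact jane.doe@example.com, SSN 123-45-6789."
print(redact(prompt))  # → Contact <EMAIL>, SSN <SSN>.
```

Run this on prompts and responses before handing them to the tracing SDK, so sensitive values never reach the trace store.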
Can I disable authentication?
Yes, self-hosted Lumina runs without authentication by default for ease of use in controlled environments. Enable authentication for internet-exposed deployments.
Integrations
Which LLM providers are supported?
Supported:
- OpenAI (GPT-4, GPT-3.5, GPT-4 Turbo)
- Anthropic (Claude 3.x, Claude 3.5, Claude Sonnet 4.5)
- Cohere
- Replicate
- Together AI
Can I integrate with LangChain?
Yes, LangChain calls can be wrapped with Lumina's tracing instrumentation.
Can I integrate with Vercel AI SDK?
Yes, the Vercel AI SDK is supported.
Deployment
Can I deploy to Kubernetes?
Yes, Lumina is designed for Kubernetes with:
- Helm charts
- StatefulSets for stateful services
- Horizontal Pod Autoscaling
- Service mesh compatibility
Can I deploy to AWS/GCP/Azure?
Yes, deploy with:
- AWS: EKS + RDS PostgreSQL + ElastiCache Redis
- GCP: GKE + Cloud SQL + Memorystore
- Azure: AKS + Azure Database for PostgreSQL
Do I need Redis?
Redis is used for:
- Query result caching
- Rate limiting
- Session storage (if auth enabled)
Troubleshooting
Traces not appearing?
Check 1: verify that your SDK is exporting to the correct ingestion endpoint.
Costs showing as $0?
Possible causes:
- Model name not recognized
- Missing token counts
- Provider not supported
High memory usage?
Possible causes:
- Large trace volumes
- Insufficient PostgreSQL shared_buffers
- Redis cache too large
Solutions:
- Enable sampling for high volume
- Increase PostgreSQL memory
- Configure Redis maxmemory
Comparison
How does Lumina compare to LangSmith?
| Feature | Lumina | LangSmith |
|---|---|---|
| Self-hosted | ✓ | ✗ |
| OpenTelemetry | Native | Adapter |
| Replay testing | ✓ | ✗ |
| Free tier | 50k/day | 5k/month |
| Real-time alerts | <500ms | Minutes |
How does Lumina compare to Langfuse?
| Feature | Lumina | Langfuse |
|---|---|---|
| Self-hosted | ✓ (50k/day) | ✓ (unlimited) |
| OpenTelemetry | Native | No |
| Replay testing | ✓ | ✗ |
| Prompt management | Basic | Advanced |
| Real-time alerts | ✓ | ✓ |
How does Lumina compare to Helicone?
| Feature | Lumina | Helicone |
|---|---|---|
| Self-hosted | ✓ | ✗ |
| Multi-span tracing | ✓ | ✗ |
| Replay testing | ✓ | ✗ |
| Gateway approach | ✗ | ✓ |
| Free tier | 50k/day | 100k requests/month |
Getting Help
Where can I get support?
- Community
- Documentation
- Commercial:
  - Email: support@uselumina.io
  - Managed cloud customers: priority support
How do I report a bug?
1. Check existing issues
2. Create a new issue with:
   - Lumina version
   - Steps to reproduce
   - Expected vs actual behavior
   - Logs (if applicable)
How do I request a feature?
1. Check discussions
2. Create a feature request with:
   - Use case description
   - Proposed solution
   - Alternative solutions considered
Can I contribute?
Yes! See the Contributing Guide. Good first issues:
- Documentation improvements
- Bug fixes
- Integration guides
- Example applications