The Problem
Traditional observability tools fall short for AI applications. APM platforms like Datadog and New Relic treat LLM calls as opaque HTTP requests. They capture latency and status codes but miss critical AI-specific metrics:
- Cost per request — LLM calls have variable costs based on token usage
- Quality degradation — Responses can become less accurate without errors
- Complex workflows — RAG pipelines and agents involve multiple steps
- Prompt debugging — Debugging requires seeing full prompts and responses, not just status codes
Purpose-built LLM observability tools, meanwhile, have problems of their own:
- Vendor lock-in — Proprietary formats and APIs
- Cloud-only — Cannot self-host with data control
- Expensive — Usage-based pricing scales unpredictably
- Incomplete — Missing critical features like replay testing
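The first point — variable cost per request — comes down to token arithmetic. A minimal sketch; the per-1k-token rates below are made-up placeholders, not real provider pricing:

```python
# Sketch: why per-request cost tracking matters. Token counts vary per
# request, so cost varies too. Rates are illustrative, NOT real pricing.
PRICE_PER_1K = {
    "model-a": {"input": 0.0025, "output": 0.01},
    "model-b": {"input": 0.003, "output": 0.015},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one LLM call, in dollars."""
    rates = PRICE_PER_1K[model]
    return (input_tokens / 1000) * rates["input"] + (output_tokens / 1000) * rates["output"]

# Two requests to the same endpoint can differ widely in cost:
cheap = request_cost("model-a", 200, 50)
pricey = request_cost("model-a", 6000, 2000)
```

An APM tool sees both calls as identical 200-status HTTP requests; only token-aware tracing surfaces the 35x cost difference.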
The Lumina Approach
OpenTelemetry-Native
Built on OpenTelemetry from day one, not retrofitted. Benefits:
- Works with your existing observability stack
- Send traces to multiple backends simultaneously
- Industry-standard instrumentation
- No vendor lock-in
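The multi-backend point can be sketched with a standard OpenTelemetry Collector config that fans the same traces out to two OTLP endpoints. Endpoint addresses and the `otlp/lumina` exporter name are placeholders:

```yaml
# Sketch: one pipeline, two destinations. Same traces reach both
# backends simultaneously; no code changes in the application.
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  otlp/lumina:
    endpoint: lumina.internal:4317        # hypothetical self-hosted endpoint
  otlp/existing:
    endpoint: otel-gateway.example.com:4317   # your current backend

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/lumina, otlp/existing]
```

Because the wire format is plain OTLP, swapping or adding backends is a config change, not an instrumentation rewrite.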
Self-Hostable
Deploy on your infrastructure with full data control. Benefits:
- Complete data ownership
- Deploy in regulated environments
- No external data transfer
- Predictable costs
Deployment options:
- Docker Compose (development)
- Kubernetes (production)
- Bare metal (custom setups)
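A development deployment along these lines might look like the Compose sketch below. The `lumina/lumina` image name is hypothetical; the PostgreSQL, Redis, and NATS services mirror the architecture described elsewhere in this document:

```yaml
# Sketch of a development stack. Image names and ports are placeholders.
services:
  lumina:
    image: lumina/lumina:latest    # hypothetical image name
    ports: ["3000:3000"]
    depends_on: [postgres, redis, nats]
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: change-me
  redis:
    image: redis:7
  nats:
    image: nats:2
    command: ["-js"]    # enable JetStream for durable ingestion
```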
Cost + Quality Correlation
The only platform that connects cost and quality metrics for the same requests, queryable together.
Production-Grade Architecture
Reliable ingestion:
- NATS JetStream for high-throughput message queue
- Handles 10M+ traces/day per cluster
- Automatic retry and backpressure
Fast queries:
- PostgreSQL with optimized indexes
- Redis caching for frequent queries
- <100ms P95 latency on 10M+ trace tables
Real-time alerting:
- <500ms from trace to alert
- Webhook integration (Slack, PagerDuty)
- Automatic baseline detection
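Automatic baseline detection can be sketched as a rolling mean-plus-stddev check. This is an illustrative approach, not necessarily Lumina's actual algorithm:

```python
# Sketch: alert when a metric's current value breaches its historical
# baseline (mean + k standard deviations). Illustrative only.
from statistics import mean, stdev

def breaches_baseline(history: list[float], current: float, k: float = 3.0) -> bool:
    """True when `current` exceeds the baseline mean by k standard deviations."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    mu, sigma = mean(history), stdev(history)
    return current > mu + k * max(sigma, 1e-9)

latencies = [120.0, 130.0, 125.0, 128.0, 122.0]  # historical P95 samples, ms
assert not breaches_baseline(latencies, 131.0)   # normal variation
assert breaches_baseline(latencies, 400.0)       # clear regression → alert
```

A baseline learned from history means no hand-tuned static thresholds per endpoint.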
Built for Backend Engineers
Unlike AI observability platforms built for data scientists, Lumina sticks to patterns backend engineers already know:
- Traces and spans (not “runs” or “chains”)
- P95 latency (not “average duration”)
- PagerDuty webhooks (not email reports)
- SQL queries (not proprietary DSLs)
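The SQL-first point can be sketched end to end: the query below correlates cost and quality per model, run here against an in-memory SQLite table. The `traces` schema and the data are hypothetical:

```python
# Sketch: cost/quality correlation as plain SQL — no proprietary DSL.
# Schema and rows are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE traces (model TEXT, cost_usd REAL, quality_score REAL)
""")
conn.executemany(
    "INSERT INTO traces VALUES (?, ?, ?)",
    [("large-model", 0.031, 0.92), ("large-model", 0.029, 0.90),
     ("small-model", 0.002, 0.81), ("small-model", 0.003, 0.79)],
)

# Average cost vs. average quality per model, in one query:
rows = conn.execute("""
    SELECT model, AVG(cost_usd), AVG(quality_score)
    FROM traces
    GROUP BY model
    ORDER BY AVG(cost_usd) DESC
""").fetchall()
for model, avg_cost, avg_quality in rows:
    print(f"{model}: ${avg_cost:.4f}/req, quality {avg_quality:.2f}")
```

The same shape of query answers questions like "is the cheaper model's quality drop worth the savings?" directly against trace data.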
Feature Comparison
| Feature | Lumina | LangSmith | Langfuse | Helicone | Datadog |
|---|---|---|---|---|---|
| Self-Hosted | ✓ | ✗ | ✓ | ✗ | ✗ |
| OpenTelemetry | Native | Adapter | No | No | Yes |
| Multi-Span Tracing | ✓ | ✓ | ✓ | ✗ | ✓ |
| Cost Tracking | Automatic | Automatic | Manual | Automatic | Manual |
| Replay Testing | ✓ | ✗ | ✗ | ✗ | ✗ |
| Semantic Diff | ✓ | ✗ | ✗ | ✗ | ✗ |
| Real-Time Alerts | <500ms | Minutes | Minutes | N/A | Seconds |
| Free Tier | 50k traces/day | 5k traces/month | Unlimited | 100k requests/month | Limited |
Use Cases
Startup to Scale
Early Stage:
- Self-host for free (50k traces/day)
- Track costs from day one
- Iterate quickly with replay testing
Growth:
- Scale to millions of traces
- Configure advanced alerting
- Deploy multi-region
Enterprise:
- Upgrade to managed cloud
- SSO integration
- SLA guarantees
Regulated Industries
Healthcare, Finance, Government:
- Self-host for data sovereignty
- Audit trail for compliance
- On-premises deployment
Multi-Model Applications
Routing and Fallback:
- Track costs across providers
- Compare model quality
- Optimize routing logic
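The routing-and-fallback pattern can be sketched as an ordered provider list with per-model spend tracking. Model names, per-call costs, and the failure mode are illustrative:

```python
# Sketch: cheapest-first routing with fallback, recording spend per model.
ROUTES = [
    {"model": "small-model", "cost_per_call": 0.002},
    {"model": "large-model", "cost_per_call": 0.030},
]

def route(call, spent: dict) -> str:
    """Try providers in order; record spend for the one that answers."""
    last_error = None
    for r in ROUTES:
        try:
            response = call(r["model"])
        except RuntimeError as e:   # provider failure → fall back
            last_error = e
            continue
        spent[r["model"]] = spent.get(r["model"], 0.0) + r["cost_per_call"]
        return response
    raise RuntimeError("all providers failed") from last_error

# Simulated provider where the cheap model is down:
def flaky(model):
    if model == "small-model":
        raise RuntimeError("timeout")
    return f"ok from {model}"

spent = {}
result = route(flaky, spent)   # falls back to large-model
```

With spend recorded per model, the trace data directly shows how often fallback fires and what it costs, which is the input you need to optimize routing logic.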
What Makes Lumina Different
Replay Testing
Capture production traces and replay them with new prompts before deployment. Example workflow:
- Capture baseline from production
- Modify prompt in your codebase
- Replay against captured traces
- Compare responses with semantic diff
- Deploy with confidence
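The compare step can be sketched with a dependency-free textual stand-in for semantic diff; a real implementation would compare embeddings rather than character sequences:

```python
# Sketch: score how far a replayed response drifted from its baseline.
# difflib's textual similarity stands in for true semantic comparison.
from difflib import SequenceMatcher

def response_drift(baseline: str, replayed: str) -> float:
    """0.0 = identical, 1.0 = completely different."""
    return 1.0 - SequenceMatcher(None, baseline, replayed).ratio()

baseline = "The refund was processed on June 3rd."
replayed = "The refund was processed on June 3rd, 2024."
drift = response_drift(baseline, replayed)
assert drift < 0.2   # small wording change → low drift, safe to ship
assert response_drift(baseline, "I cannot help with that.") > 0.4  # regression
```

Running this score over every captured trace turns "deploy with confidence" into a concrete gate: block the deploy when too many replayed responses drift past a threshold.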