Common questions about the Lumina observability platform.

General

What is Lumina?

Lumina is an open-source, OpenTelemetry-native observability platform for AI applications. It tracks costs, latency, and quality across LLM calls with automatic instrumentation and real-time alerting.

Is Lumina free?

Yes, the self-hosted version is free forever with:
  • 50,000 traces per day
  • 7-day retention
  • All features included
For unlimited traces and managed hosting, contact us for cloud pricing.

What makes Lumina different?

  1. OpenTelemetry-native — Not retrofitted, works with existing observability stacks
  2. Self-hostable — Full data control and sovereignty
  3. Replay testing — Test changes against real production data
  4. Cost + quality correlation — Connect spending with quality metrics

Do I need API keys?

API keys are only required for the replay feature to re-execute LLM calls. Core functionality (tracing, cost tracking, alerts) works without API keys.

Installation

What are the system requirements?

Minimum:
  • 4GB RAM
  • 10GB disk space
  • Docker 20.10+ or Kubernetes 1.20+
Recommended:
  • 8GB RAM
  • 50GB disk space
  • PostgreSQL 14+
  • Redis 6+
  • NATS 2.9+

Can I run Lumina in production?

Yes, Lumina is production-ready with:
  • PostgreSQL backend (battle-tested)
  • NATS JetStream (reliable message queue)
  • Horizontal scaling support
  • Real-time alerting (<500ms)
Handles 10M+ traces/day per cluster.

How do I upgrade Lumina?

Docker Compose:
cd infra/docker
docker compose pull
docker compose up -d
Kubernetes:
helm repo update
helm upgrade lumina lumina/lumina
Database migrations run automatically on startup.

Tracing

What is a trace?

A trace represents a complete operation in your application, from request to response. It consists of one or more spans representing individual steps. Example:
Trace: user_query_abc123 (1.2s)
└── chat_completion (1.2s)
    Model: claude-sonnet-4-5
    Cost: $0.003

What is a span?

A span represents a single operation within a trace (e.g., an LLM call, database query, or API request).

What is multi-span tracing?

Multi-span tracing tracks complex workflows with multiple steps, showing parent-child relationships. Example:
Trace: rag_pipeline (2.5s, $0.05)
├── retrieval (1.8s, $0.01)
│   ├── embedding (0.2s, $0.001)
│   └── vector_search (1.6s, $0.009)
└── synthesis (0.7s, $0.04)
See Multi-Span Tracing for details.
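The tree above could be produced by nesting lumina.trace() calls. A runnable sketch, with the caveat that the minimal lumina object below stands in for the real client (created via initLumina()), and embed, vectorSearch, and synthesize are hypothetical pipeline steps:

```typescript
// Minimal stand-in for the Lumina client so this sketch runs on its own;
// in a real app you would use the client returned by initLumina().
const lumina = {
  async trace<T>(name: string, fn: () => T | Promise<T>): Promise<T> {
    // The real client would open a span named `name` and close it after fn().
    return fn();
  },
};

// Hypothetical pipeline steps, for illustration only.
const embed = async (query: string): Promise<number[]> => [0.1, 0.2];
const vectorSearch = async (v: number[]): Promise<string[]> => ['doc1', 'doc2'];
const synthesize = async (docs: string[]): Promise<string> =>
  `answer from ${docs.length} docs`;

// Nesting lumina.trace() calls yields the parent-child span tree shown above.
async function ragPipeline(query: string): Promise<string> {
  return lumina.trace('rag_pipeline', async () => {
    const docs = await lumina.trace('retrieval', async () => {
      const vector = await lumina.trace('embedding', () => embed(query));
      return lumina.trace('vector_search', () => vectorSearch(vector));
    });
    return lumina.trace('synthesis', () => synthesize(docs));
  });
}
```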

How do I trace custom operations?

Use lumina.trace() for any operation:
await lumina.trace('custom_operation', async (span) => {
  span.setAttribute('custom_attr', 'value');
  const result = await doWork(); // your operation here (placeholder)
  return result;
});

Cost Tracking

How does cost calculation work?

Lumina automatically calculates costs based on:
  1. Model name (e.g., “claude-sonnet-4-5”)
  2. Token counts (prompt + completion)
  3. Provider pricing (updated monthly)
Example:
Input:  120 tokens × $0.003/1k = $0.00036
Output:  80 tokens × $0.015/1k = $0.00120
Total: $0.00156
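That arithmetic, written as a small helper; the rates are per 1,000 tokens and mirror the worked example above rather than live provider pricing:

```typescript
// Compute LLM call cost from token counts and per-1k-token rates.
function llmCost(
  promptTokens: number,
  completionTokens: number,
  inputRatePer1k: number,
  outputRatePer1k: number,
): number {
  return (
    (promptTokens / 1000) * inputRatePer1k +
    (completionTokens / 1000) * outputRatePer1k
  );
}

// 120 input tokens at $0.003/1k plus 80 output tokens at $0.015/1k ≈ $0.00156
const total = llmCost(120, 80, 0.003, 0.015);
```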

Which providers are supported?

| Provider  | Models                       | Auto-Calculation |
| --------- | ---------------------------- | ---------------- |
| OpenAI    | GPT-4, GPT-3.5, GPT-4 Turbo  | ✓                |
| Anthropic | Claude 3 Opus, Sonnet, Haiku | ✓                |
| Anthropic | Claude 3.5 Sonnet            | ✓                |
| Anthropic | Claude Sonnet 4.5            | ✓                |
For other providers, specify costs manually via metadata.

How accurate is cost tracking?

Cost calculations are based on official provider pricing. Accuracy depends on:
  • Correct model name
  • Accurate token counts from provider API
  • Up-to-date pricing data
Lumina updates pricing monthly. For critical cost tracking, verify against provider invoices.

Quality Monitoring

How is quality measured?

Lumina uses hybrid quality detection:
  • Exact match: Hash-based comparison (instant)
  • Semantic similarity: AI-powered scoring (~100ms)

What is semantic similarity?

A score from 0.0 to 1.0 indicating how similar two responses are in meaning:
  • 0.9-1.0: Nearly identical meaning
  • 0.7-0.9: Similar meaning, different wording
  • 0.5-0.7: Related but different
  • 0.0-0.5: Completely different
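Those bands can be mapped to labels in code. Note the published ranges overlap at their endpoints, so treating each boundary score as belonging to the higher band is an assumption here:

```typescript
// Map a semantic-similarity score to the bands listed above.
// Boundary scores (0.9, 0.7, 0.5) are assigned to the higher band;
// the overlapping ranges in the table leave that choice open.
function similarityBand(score: number): string {
  if (score < 0 || score > 1) throw new RangeError('score must be in [0, 1]');
  if (score >= 0.9) return 'nearly identical meaning';
  if (score >= 0.7) return 'similar meaning, different wording';
  if (score >= 0.5) return 'related but different';
  return 'completely different';
}
```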

Can I use custom quality metrics?

Yes, add custom scoring via attributes:
await lumina.traceLLM(
  () => llm.generate(prompt),
  {
    metadata: {
      custom_quality_score: calculateMyScore(response),
      factual_accuracy: 0.95,
      relevance_score: 0.88,
    },
  }
);

Replay Testing

What is replay testing?

Replay testing re-executes real production traces with new code to detect regressions before deployment. Workflow:
  1. Capture baseline from production
  2. Modify prompts/code
  3. Replay against captured traces
  4. Compare outputs with semantic diff

Does replay cost money?

Yes, replay makes real LLM API calls. Costs are the same as production calls. Example:
  • Capture 100 traces
  • Replay costs = 100 × average cost per call
  • If average cost is $0.01, replay costs $1.00

Can I replay without API keys?

No, replay requires LLM provider API keys to re-execute calls. Use simulation mode for testing without costs:
await lumina.replayTraces({
  replaySetId: 'replay_001',
  mode: 'simulation', // No real API calls
});

Alerting

How does alerting work?

Lumina detects anomalies automatically by comparing current metrics to baselines:
  • Cost alerts: Spending exceeds baseline by threshold
  • Quality alerts: Similarity score drops below threshold
Alerts are sent via webhooks to Slack, PagerDuty, or custom endpoints.

How are baselines calculated?

Baselines are calculated from historical data:
  • Rolling average: Last 7 days
  • Update frequency: Every hour
  • Minimum data: 100 traces
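A simplified sketch of that baseline logic (the real computation runs server-side; the spike factor here is an illustrative parameter, not a fixed Lumina value):

```typescript
// Rolling-average baseline: null until the minimum sample count is met.
function baseline(history: number[], minSamples = 100): number | null {
  if (history.length < minSamples) return null;
  return history.reduce((sum, v) => sum + v, 0) / history.length;
}

// Flag a cost spike when the current value exceeds baseline × factor.
function isCostSpike(current: number, base: number | null, factor = 2.0): boolean {
  if (base === null) return false; // not enough data for a baseline yet
  return current > base * factor;
}
```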

Can I configure alert thresholds?

Yes, set thresholds per service:
# Dashboard: Settings → Alerts → Configure
Cost spike threshold: +200%
Quality drop threshold: -20%
Window: 1 hour
Or via API:
curl -X POST http://api:8081/api/alerts/configure \
  -d '{"service":"chat-api","costThreshold":2.0,"qualityThreshold":0.2}'

Performance

What is the ingestion throughput?

Per ingestion pod:
  • 10,000 traces/second
  • 500MB/second data throughput
Horizontally scalable:
  • 10 pods = 100,000 traces/second

What is the query latency?

Query API performance:
  • P50: <50ms
  • P95: <100ms
  • P99: <200ms
Measured on tables with 10M+ traces and proper indexes.

How do I optimize performance?

For ingestion:
  1. Batch traces client-side
  2. Enable compression
  3. Scale ingestion pods horizontally
For queries:
  1. Enable Redis caching
  2. Add database indexes
  3. Configure retention policies
See Performance Guide for details.
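Client-side batching (step 1 for ingestion) can be as simple as a buffer that flushes at a fixed size. The transport callback is left abstract here, since the actual payload shape depends on your exporter:

```typescript
// Buffer traces and send them in batches instead of one request per trace.
class TraceBatcher {
  private buffer: object[] = [];

  constructor(
    private readonly send: (batch: object[]) => void,
    private readonly flushSize = 100,
  ) {}

  add(trace: object): void {
    this.buffer.push(trace);
    if (this.buffer.length >= this.flushSize) this.flush();
  }

  // Call on shutdown or on a timer so partial batches are not lost.
  flush(): void {
    if (this.buffer.length === 0) return;
    this.send(this.buffer);
    this.buffer = [];
  }
}
```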

Security

Is Lumina secure?

Yes, with proper configuration.
Self-hosted:
  • Deploy behind VPN or private network
  • Enable SSL/TLS for database
  • Rotate secrets regularly
Network security:
  • Use Kubernetes NetworkPolicies
  • Enable mTLS between services
  • Configure firewall rules
See Security Guide for full checklist.

Does Lumina store sensitive data?

Traces may contain sensitive data in prompts and responses. Best practices:
  1. Redact PII before sending:
    const redacted = redactPII(userPrompt);
    await lumina.traceLLM(
      () => llm.generate(redacted),
      { prompt: redacted }
    );
    
  2. Configure short retention:
    TRACE_RETENTION_DAYS=7
    
  3. Implement data export/deletion for GDPR compliance
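The redactPII helper in step 1 is not a Lumina built-in. A minimal version covering emails and US-style phone numbers might look like this; real deployments should use a dedicated PII-detection library:

```typescript
// Replace obvious PII patterns before the prompt is traced.
// The patterns here are illustrative, not exhaustive.
function redactPII(text: string): string {
  return text
    .replace(/[\w.+-]+@[\w-]+(\.[\w-]+)+/g, '[EMAIL]')
    .replace(/\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b/g, '[PHONE]');
}
```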

Can I disable authentication?

Yes, self-hosted Lumina runs without authentication by default for ease of use in controlled environments. Enable authentication for internet-exposed deployments:
AUTH_REQUIRED=true
JWT_SECRET=<generate-strong-secret>

Integrations

Which LLM providers are supported?

Supported:
  • OpenAI (GPT-4, GPT-3.5, GPT-4 Turbo)
  • Anthropic (Claude 3.x, Claude 3.5, Claude Sonnet 4.5)
Coming soon:
  • Cohere
  • Replicate
  • Together AI

Can I integrate with LangChain?

Yes, wrap LangChain calls:
await lumina.traceLLM(
  () => chain.call({ input: userQuery }),
  { name: 'langchain_call' }
);
See LangChain Integration for details.

Can I integrate with Vercel AI SDK?

Yes:
import { streamText } from 'ai';

await lumina.traceLLM(
  () => streamText({ model, prompt }),
  { name: 'vercel_ai_stream', system: 'openai' }
);
See Vercel AI SDK Integration for details.

Deployment

Can I deploy to Kubernetes?

Yes, Lumina is designed for Kubernetes with:
  • Helm charts
  • StatefulSets for stateful services
  • Horizontal Pod Autoscaling
  • Service mesh compatibility
See Kubernetes Deployment for details.

Can I deploy to AWS/GCP/Azure?

Yes, deploy with:
  • AWS: EKS + RDS PostgreSQL + ElastiCache Redis
  • GCP: GKE + Cloud SQL + Memorystore
  • Azure: AKS + Azure Database for PostgreSQL
Cloud-managed services recommended for databases.

Do I need Redis?

Redis is used for:
  • Query result caching
  • Rate limiting
  • Session storage (if auth enabled)
Optional but recommended for production.

Troubleshooting

Traces not appearing?

Check 1: Verify endpoint
const lumina = initLumina({
  endpoint: 'http://localhost:9411/v1/traces', // Correct
  // NOT: http://localhost:8081 (query API)
});
Check 2: View ingestion logs
docker compose logs ingestion | grep ERROR
Check 3: Test with cURL
curl -X POST http://localhost:9411/v1/traces \
  -H "Content-Type: application/json" \
  -d '{"resourceSpans":[...]}'

Costs showing as $0?

Possible causes:
  1. Model name not recognized
  2. Missing token counts
  3. Provider not supported
Solution: Add manual cost via metadata:
{
  metadata: {
    cost_per_prompt_token: 0.000003,
    cost_per_completion_token: 0.000015,
  }
}

High memory usage?

Possible causes:
  1. Large trace volumes
  2. Insufficient PostgreSQL shared_buffers
  3. Redis cache too large
Solutions:
  1. Enable sampling for high volume
  2. Increase PostgreSQL memory
  3. Configure Redis maxmemory
See Performance Guide.

Comparison

How does Lumina compare to LangSmith?

| Feature          | Lumina  | LangSmith |
| ---------------- | ------- | --------- |
| Self-hosted      | ✓       | ✗         |
| OpenTelemetry    | Native  | Adapter   |
| Replay testing   | ✓       | ✗         |
| Free tier        | 50k/day | 5k/month  |
| Real-time alerts | <500ms  | Minutes   |
Choose Lumina if: You need self-hosting, OpenTelemetry compatibility, or replay testing.
Choose LangSmith if: You use LangChain heavily and prefer cloud-only.

How does Lumina compare to Langfuse?

| Feature           | Lumina      | Langfuse      |
| ----------------- | ----------- | ------------- |
| Self-hosted       | ✓ (50k/day) | ✓ (unlimited) |
| OpenTelemetry     | Native      | No            |
| Replay testing    | ✓           | ✗             |
| Prompt management | Basic       | Advanced      |
| Real-time alerts  | ✓           | ✗             |
Choose Lumina if: You need OpenTelemetry or replay testing.
Choose Langfuse if: You need advanced prompt management or unlimited free traces.

How does Lumina compare to Helicone?

| Feature            | Lumina  | Helicone            |
| ------------------ | ------- | ------------------- |
| Self-hosted        | ✓       | ✓                   |
| Multi-span tracing | ✓       | ✗                   |
| Replay testing     | ✓       | ✗                   |
| Gateway approach   | ✗       | ✓                   |
| Free tier          | 50k/day | 100k requests/month |
Choose Lumina if: You need multi-span tracing or replay testing.
Choose Helicone if: You prefer gateway-based instrumentation.

Getting Help

Where can I get support?

  • Community:
  • Documentation:
  • Commercial:

How do I report a bug?

  1. Check existing issues
  2. Create new issue with:
    • Lumina version
    • Steps to reproduce
    • Expected vs actual behavior
    • Logs (if applicable)

How do I request a feature?

  1. Check discussions
  2. Create feature request with:
    • Use case description
    • Proposed solution
    • Alternative solutions considered

Can I contribute?

Yes! See Contributing Guide. Good first issues:
  • Documentation improvements
  • Bug fixes
  • Integration guides
  • Example applications