Documentation Index
Fetch the complete documentation index at: https://docs.uselumina.io/docs/llms.txt
Use this file to discover all available pages before exploring further.
Common questions about the Lumina observability platform.
General
What is Lumina?
Lumina is an open-source, OpenTelemetry-native observability platform for AI applications. It tracks costs, latency, and quality across LLM calls with automatic instrumentation and real-time alerting.
Is Lumina free?
Yes, the self-hosted version is free forever with:
- 50,000 traces per day
- 7-day retention
- All features included
For unlimited traces and managed hosting, contact us for cloud pricing.
What makes Lumina different?
- OpenTelemetry-native — Not retrofitted, works with existing observability stacks
- Self-hostable — Full data control and sovereignty
- Replay testing — Test changes against real production data
- Cost + quality correlation — Connect spending with quality metrics
Do I need API keys?
API keys are only required for the replay feature to re-execute LLM calls. Core functionality (tracing, cost tracking, alerts) works without API keys.
Installation
What are the system requirements?
Minimum:
- 4GB RAM
- 10GB disk space
- Docker 20.10+ or Kubernetes 1.20+
Recommended:
- 8GB RAM
- 50GB disk space
- PostgreSQL 14+
- Redis 6+
- NATS 2.9+
Can I run Lumina in production?
Yes, Lumina is production-ready with:
- PostgreSQL backend (battle-tested)
- NATS JetStream (reliable message queue)
- Horizontal scaling support
- Real-time alerting (<500ms)
Handles 10M+ traces/day per cluster.
How do I upgrade Lumina?
Docker Compose:
cd infra/docker
docker compose pull
docker compose up -d
Kubernetes:
helm repo update
helm upgrade lumina lumina/lumina
Database migrations run automatically on startup.
Tracing
What is a trace?
A trace represents a complete operation in your application, from request to response. It consists of one or more spans representing individual steps.
Example:
Trace: user_query_abc123 (1.2s)
└── chat_completion (1.2s)
    Model: claude-sonnet-4-5
    Cost: $0.003
What is a span?
A span represents a single operation within a trace (e.g., an LLM call, database query, or API request).
What is multi-span tracing?
Multi-span tracing tracks complex workflows with multiple steps, showing parent-child relationships.
Example:
Trace: rag_pipeline (2.5s, $0.05)
├── retrieval (1.8s, $0.01)
│   ├── embedding (0.2s, $0.001)
│   └── vector_search (1.6s, $0.009)
└── synthesis (0.7s, $0.04)
See Multi-Span Tracing for details.
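The parent-child structure above can be modeled as flat spans linked by a parent ID. The sketch below (field names are illustrative, not the SDK's actual schema) rolls leaf-span costs up to the trace total, summing only leaves because parent spans in the example already include their children's costs:

```typescript
interface Span {
  id: string;
  parentId?: string;
  name: string;
  costUsd: number;
}

// Sum costs of leaf spans only; parents (retrieval, the root) already
// aggregate their children's costs in the example above.
function traceCost(spans: Span[]): number {
  const parents = new Set(spans.map((s) => s.parentId).filter(Boolean));
  return spans
    .filter((s) => !parents.has(s.id))
    .reduce((sum, s) => sum + s.costUsd, 0);
}

const rag: Span[] = [
  { id: "1", name: "rag_pipeline", costUsd: 0.05 },
  { id: "2", parentId: "1", name: "retrieval", costUsd: 0.01 },
  { id: "3", parentId: "2", name: "embedding", costUsd: 0.001 },
  { id: "4", parentId: "2", name: "vector_search", costUsd: 0.009 },
  { id: "5", parentId: "1", name: "synthesis", costUsd: 0.04 },
];
console.log(traceCost(rag)); // ≈ 0.05, matching the trace total
```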
How do I trace custom operations?
Use lumina.trace() for any operation:
await lumina.trace('custom_operation', async (span) => {
  span.setAttribute('custom_attr', 'value');
  // Your code here
  return result;
});
Cost Tracking
How does cost calculation work?
Lumina automatically calculates costs based on:
- Model name (e.g., “claude-sonnet-4-5”)
- Token counts (prompt + completion)
- Provider pricing (updated monthly)
Example:
Input: 120 tokens × $0.003/1k = $0.00036
Output: 80 tokens × $0.015/1k = $0.00120
Total: $0.00156
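The arithmetic above can be sketched as a small helper. This is a hypothetical illustration, not Lumina's internal calculator, which applies its own pricing tables automatically:

```typescript
// Per-call cost from token counts and per-1k-token prices.
// The prices passed in are illustrative; real values come from provider pricing.
function llmCost(
  promptTokens: number,
  completionTokens: number,
  inputPricePer1k: number,
  outputPricePer1k: number
): number {
  const inputCost = (promptTokens / 1000) * inputPricePer1k;
  const outputCost = (completionTokens / 1000) * outputPricePer1k;
  return inputCost + outputCost;
}

// 120 prompt tokens at $0.003/1k plus 80 completion tokens at $0.015/1k
console.log(llmCost(120, 80, 0.003, 0.015).toFixed(5)); // "0.00156"
```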
Which providers are supported?
| Provider | Models | Auto-Calculation |
|---|---|---|
| OpenAI | GPT-4, GPT-3.5, GPT-4 Turbo | ✓ |
| Anthropic | Claude 3 Opus, Sonnet, Haiku | ✓ |
| Anthropic | Claude 3.5 Sonnet | ✓ |
| Anthropic | Claude Sonnet 4.5 | ✓ |
For other providers, specify costs manually via metadata.
How accurate is cost tracking?
Cost calculations are based on official provider pricing. Accuracy depends on:
- Correct model name
- Accurate token counts from provider API
- Up-to-date pricing data
Lumina updates pricing monthly. For critical cost tracking, verify against provider invoices.
Quality Monitoring
How is quality measured?
Lumina uses hybrid quality detection:
Exact match: Hash-based comparison (instant)
Semantic similarity: AI-powered scoring (~100ms)
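The exact-match stage can be sketched as hashing both responses and comparing digests. This is an assumed illustration of the idea, not Lumina's actual implementation:

```typescript
import { createHash } from "node:crypto";

// Hash-based exact match: identical strings produce identical digests,
// so comparison is O(1) after hashing.
function exactMatch(a: string, b: string): boolean {
  const digest = (s: string) => createHash("sha256").update(s).digest("hex");
  return digest(a) === digest(b);
}
```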
What is semantic similarity?
A score from 0.0 to 1.0 indicating how similar two responses are in meaning:
- 0.9-1.0: Nearly identical meaning
- 0.7-0.9: Similar meaning, different wording
- 0.5-0.7: Related but different
- 0.0-0.5: Completely different
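The bands above can be expressed as a simple mapping, useful when routing scores into dashboards or alerts (a sketch; the band labels follow the list above):

```typescript
// Map a 0.0–1.0 semantic similarity score to its descriptive band.
function similarityBand(score: number): string {
  if (score >= 0.9) return "nearly identical meaning";
  if (score >= 0.7) return "similar meaning, different wording";
  if (score >= 0.5) return "related but different";
  return "completely different";
}
```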
Can I use custom quality metrics?
Yes, add custom scoring via attributes:
await lumina.traceLLM(
  () => llm.generate(prompt),
  {
    metadata: {
      custom_quality_score: calculateMyScore(response),
      factual_accuracy: 0.95,
      relevance_score: 0.88,
    },
  }
);
Replay Testing
What is replay testing?
Replay testing re-executes real production traces with new code to detect regressions before deployment.
Workflow:
- Capture baseline from production
- Modify prompts/code
- Replay against captured traces
- Compare outputs with semantic diff
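The comparison step in the workflow above can be sketched as follows. This shows only the exact-match pass over captured outputs; the real comparison also runs semantic diffing on mismatches:

```typescript
// Return the indices of traces whose replayed output differs from the
// captured baseline, flagging candidates for regression review.
function regressedIndices(baseline: string[], replay: string[]): number[] {
  return baseline.flatMap((out, i) => (replay[i] === out ? [] : [i]));
}
```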
Does replay cost money?
Yes, replay makes real LLM API calls. Costs are the same as production calls.
Example:
- Capture 100 traces
- Replay costs = 100 × average cost per call
- If the average cost per call is $0.01, replay costs $1.00
Can I replay without API keys?
No, replay requires LLM provider API keys to re-execute calls. Use simulation mode for testing without costs:
await lumina.replayTraces({
  replaySetId: 'replay_001',
  mode: 'simulation', // No real API calls
});
Alerting
How does alerting work?
Lumina detects anomalies automatically by comparing current metrics to baselines:
Cost alerts: Spending exceeds baseline by threshold
Quality alerts: Similarity score drops below threshold
Alerts are sent via webhooks to Slack, PagerDuty, or custom endpoints.
How are baselines calculated?
Baselines are calculated from historical data:
Rolling average: Last 7 days
Update frequency: Every hour
Minimum data: 100 traces
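The baseline rules above can be sketched as a helper that averages the trailing window and withholds a baseline until the 100-trace minimum is met (a hypothetical illustration, not Lumina's internal code):

```typescript
// Average over the rolling window; return null until enough data exists,
// so no alerts fire against an unreliable baseline.
function rollingBaseline(values: number[], minTraces = 100): number | null {
  if (values.length < minTraces) return null; // below the 100-trace minimum
  return values.reduce((a, b) => a + b, 0) / values.length;
}
```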
Can I customize alert thresholds?
Yes, set thresholds per service:
# Dashboard: Settings → Alerts → Configure
Cost spike threshold: +200%
Quality drop threshold: -20%
Window: 1 hour
Or via API:
curl -X POST http://api:8081/api/alerts/configure \
  -d '{"service":"chat-api","costThreshold":2.0,"qualityThreshold":0.2}'
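The threshold semantics can be sketched as a check against the baseline. This is an assumed interpretation of the settings above (costThreshold 2.0 meaning +200% over baseline, qualityThreshold 0.2 meaning a 20% drop), not the platform's actual evaluator:

```typescript
interface Baseline {
  avgCost: number;
  avgSimilarity: number;
}

// costThreshold 2.0 => alert when spend exceeds baseline by +200% (3x);
// qualityThreshold 0.2 => alert when similarity drops >20% below baseline.
function checkAlerts(
  current: { cost: number; similarity: number },
  base: Baseline,
  costThreshold = 2.0,
  qualityThreshold = 0.2
): string[] {
  const alerts: string[] = [];
  if (current.cost > base.avgCost * (1 + costThreshold)) alerts.push("cost_spike");
  if (current.similarity < base.avgSimilarity * (1 - qualityThreshold)) alerts.push("quality_drop");
  return alerts;
}
```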
Performance
What is the ingestion throughput?
Per ingestion pod:
- 10,000 traces/second
- 500MB/second data throughput
Horizontally scalable:
- 10 pods = 100,000 traces/second
What is the query latency?
Query API performance:
- P50: <50ms
- P95: <100ms
- P99: <200ms
On 10M+ trace tables with proper indexes.
How do I tune performance?
For ingestion:
- Batch traces client-side
- Enable compression
- Scale ingestion pods horizontally
For queries:
- Enable Redis caching
- Add database indexes
- Configure retention policies
See Performance Guide for details.
Security
Is Lumina secure?
Yes, with proper configuration:
Self-hosted:
- Deploy behind VPN or private network
- Enable SSL/TLS for database
- Rotate secrets regularly
Network security:
- Use Kubernetes NetworkPolicies
- Enable mTLS between services
- Configure firewall rules
See Security Guide for full checklist.
Does Lumina store sensitive data?
Traces may contain sensitive data in prompts and responses. Best practices:
- Redact PII before sending:
  const redacted = redactPII(userPrompt);
  await lumina.traceLLM(
    () => llm.generate(redacted),
    { prompt: redacted }
  );
- Configure short retention so sensitive traces expire quickly
- Implement data export/deletion for GDPR compliance
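The redactPII helper used above is not part of the SDK. A minimal regex-based sketch follows; real deployments should use a purpose-built PII library:

```typescript
// Minimal illustration only: masks email addresses and US-style phone
// numbers. Not a substitute for a dedicated PII redaction library.
function redactPII(text: string): string {
  return text
    .replace(/[\w.+-]+@[\w-]+(\.[\w-]+)+/g, "[EMAIL]")
    .replace(/\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b/g, "[PHONE]");
}

console.log(redactPII("Email jane@example.com or call 555-123-4567"));
// → "Email [EMAIL] or call [PHONE]"
```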
Can I disable authentication?
Yes, self-hosted Lumina runs without authentication by default for ease of use in controlled environments.
Enable authentication for internet-exposed deployments:
AUTH_REQUIRED=true
JWT_SECRET=<generate-strong-secret>
Integrations
Which LLM providers are supported?
Supported:
- OpenAI (GPT-4, GPT-3.5, GPT-4 Turbo)
- Anthropic (Claude 3.x, Claude 3.5, Claude Sonnet 4.5)
Coming soon:
- Cohere
- Replicate
- Together AI
Can I integrate with LangChain?
Yes, wrap LangChain calls:
await lumina.traceLLM(
  () => chain.call({ input: userQuery }),
  { name: 'langchain_call' }
);
See LangChain Integration for details.
Can I integrate with Vercel AI SDK?
Yes:
import { streamText } from 'ai';

await lumina.traceLLM(
  () => streamText({ model, prompt }),
  { name: 'vercel_ai_stream', system: 'openai' }
);
See Vercel AI SDK Integration for details.
Deployment
Can I deploy to Kubernetes?
Yes, Lumina is designed for Kubernetes with:
- Helm charts
- StatefulSets for stateful services
- Horizontal Pod Autoscaling
- Service mesh compatibility
See Kubernetes Deployment for details.
Can I deploy to AWS/GCP/Azure?
Yes, deploy with:
- AWS: EKS + RDS PostgreSQL + ElastiCache Redis
- GCP: GKE + Cloud SQL + Memorystore
- Azure: AKS + Azure Database for PostgreSQL
Cloud-managed services recommended for databases.
Do I need Redis?
Redis is used for:
- Query result caching
- Rate limiting
- Session storage (if auth enabled)
Optional but recommended for production.
Troubleshooting
Traces not appearing?
Check 1: Verify endpoint
const lumina = initLumina({
  endpoint: 'http://localhost:9411/v1/traces', // Correct
  // NOT: http://localhost:8081 (query API)
});
Check 2: View ingestion logs
docker compose logs ingestion | grep ERROR
Check 3: Test with cURL
curl -X POST http://localhost:9411/v1/traces \
  -H "Content-Type: application/json" \
  -d '{"resourceSpans":[...]}'
Costs showing as $0?
Possible causes:
- Model name not recognized
- Missing token counts
- Provider not supported
Solution: Add manual cost via metadata:
{
  metadata: {
    cost_per_prompt_token: 0.000003,
    cost_per_completion_token: 0.000015,
  }
}
High memory usage?
Possible causes:
- Large trace volumes
- Insufficient PostgreSQL shared_buffers
- Redis cache too large
Solutions:
- Enable sampling for high volume
- Increase PostgreSQL memory
- Configure Redis maxmemory
See Performance Guide.
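Sampling, the first solution above, can be sketched as a head-based sampler that keeps a fixed fraction of traces (a hypothetical illustration; check the SDK's own sampling options before rolling your own):

```typescript
// Head-based sampling: decide up front whether to keep each trace.
// The random source is injectable so the decision is deterministic in tests.
function makeSampler(rate: number, rng: () => number = Math.random) {
  return (): boolean => rng() < rate;
}

// Keep roughly 10% of traces to reduce memory and storage pressure.
const sampleTenPercent = makeSampler(0.1);
```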
Comparison
How does Lumina compare to LangSmith?
| Feature | Lumina | LangSmith |
|---|---|---|
| Self-hosted | ✓ | ✗ |
| OpenTelemetry | Native | Adapter |
| Replay testing | ✓ | ✗ |
| Free tier | 50k/day | 5k/month |
| Real-time alerts | <500ms | Minutes |
Choose Lumina if: You need self-hosting, OpenTelemetry compatibility, or replay testing.
Choose LangSmith if: You use LangChain heavily and prefer cloud-only.
How does Lumina compare to Langfuse?
| Feature | Lumina | Langfuse |
|---|---|---|
| Self-hosted | ✓ (50k/day) | ✓ (unlimited) |
| OpenTelemetry | Native | No |
| Replay testing | ✓ | ✗ |
| Prompt management | Basic | Advanced |
| Real-time alerts | ✓ | ✓ |
Choose Lumina if: You need OpenTelemetry or replay testing.
Choose Langfuse if: You need advanced prompt management or unlimited free traces.
How does Lumina compare to Helicone?
| Feature | Lumina | Helicone |
|---|---|---|
| Self-hosted | ✓ | ✗ |
| Multi-span tracing | ✓ | ✗ |
| Replay testing | ✓ | ✗ |
| Gateway approach | ✗ | ✓ |
| Free tier | 50k/day | 100k requests/month |
Choose Lumina if: You need multi-span tracing or replay testing.
Choose Helicone if: You prefer gateway-based instrumentation.
Getting Help
Where can I get support?
Community:
Documentation:
Commercial:
How do I report a bug?
- Check existing issues
- Create new issue with:
- Lumina version
- Steps to reproduce
- Expected vs actual behavior
- Logs (if applicable)
How do I request a feature?
- Check discussions
- Create feature request with:
- Use case description
- Proposed solution
- Alternative solutions considered
Can I contribute?
Yes! See Contributing Guide.
Good first issues:
- Documentation improvements
- Bug fixes
- Integration guides
- Example applications