
Understanding these core concepts will help you get the most out of Lumina.

Traces and Spans

Lumina uses OpenTelemetry’s trace model.

Trace

A trace represents a complete request or operation in your application. Properties:
  • Unique trace_id
  • One or more spans
  • Start and end timestamps
  • Service name
Example:
Trace: user_query_abc123
Duration: 1.2s
Service: chat-api
Spans: 3

Span

A span represents a single operation within a trace. Properties:
  • Unique span_id
  • Parent span_id (for child spans)
  • Operation name
  • Start and end timestamps
  • Attributes (key-value metadata)
Example:
{
  span_id: "span_001",
  parent_span_id: null,
  name: "chat_completion",
  attributes: {
    model: "claude-sonnet-4-5",
    prompt_tokens: 120,
    completion_tokens: 80,
    cost_usd: 0.00156
  }
}

Single-Span vs Multi-Span

Single-Span Trace: One operation per trace (simple LLM calls).
Trace
└── chat_completion (1.2s)
Multi-Span Trace: Multiple nested operations (complex workflows).
Trace: rag_pipeline (2.5s)
├── retrieval (1.8s)
│   ├── embedding (0.2s)
│   └── vector_search (1.6s)
└── synthesis (0.7s)
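
Because Lumina follows OpenTelemetry's trace model, a nested tree like the one above can be produced with the standard OTel API. A minimal sketch (the tracer name and span bodies are illustrative):

import { trace } from '@opentelemetry/api';

const tracer = trace.getTracer('rag-pipeline');

await tracer.startActiveSpan('rag_pipeline', async (root) => {
  await tracer.startActiveSpan('retrieval', async (retrieval) => {
    // Spans started inside an active span become its children automatically
    await tracer.startActiveSpan('embedding', async (s) => { /* embed the query */ s.end(); });
    await tracer.startActiveSpan('vector_search', async (s) => { /* query the index */ s.end(); });
    retrieval.end();
  });
  await tracer.startActiveSpan('synthesis', async (s) => { /* call the LLM */ s.end(); });
  root.end();
});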

Attributes

Attributes are key-value pairs attached to spans.

Standard Attributes

Automatically extracted by Lumina:
Attribute           Type     Description
model               string   LLM model name
prompt_tokens       int      Input token count
completion_tokens   int      Output token count
cost_usd            float    Calculated cost
latency_ms          int      Duration in milliseconds
status              string   ok, error, or degraded

Custom Attributes

Add your own metadata:
await lumina.traceLLM(
  () => llm.generate(prompt),
  {
    metadata: {
      userId: 'user-123',
      sessionId: 'session-456',
      intent: 'customer_support',
      priority: 'high',
    },
  }
);
Use cases:
  • User identification
  • Session tracking
  • Feature flags
  • A/B test variants
  • Custom dimensions for analytics

Cost Calculation

Lumina automatically calculates costs based on provider pricing.

Supported Providers

Provider    Models                          Pricing Source
OpenAI      GPT-4, GPT-3.5, GPT-4 Turbo     openai.com/pricing
Anthropic   Claude 3 Opus, Sonnet, Haiku    anthropic.com/pricing
Anthropic   Claude 3.5 Sonnet               anthropic.com/pricing
Anthropic   Claude Sonnet 4.5               anthropic.com/pricing

How It Works

1. Extract model name
{
  model: "claude-sonnet-4-5"
}
2. Get token counts
{
  prompt_tokens: 120,
  completion_tokens: 80
}
3. Calculate cost
Input cost:  120 tokens × $0.003/1k = $0.00036
Output cost:  80 tokens × $0.015/1k = $0.00120
Total cost: $0.00156
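
In TypeScript, the same arithmetic looks like this. This is a sketch of the formula, not Lumina's internal implementation, and the per-1k rates are the illustrative values used above:

// Per-1k-token rates keyed by model name (illustrative values from the example above)
const RATES: Record<string, { promptPer1k: number; completionPer1k: number }> = {
  'claude-sonnet-4-5': { promptPer1k: 0.003, completionPer1k: 0.015 },
};

function calculateCost(model: string, promptTokens: number, completionTokens: number): number {
  const rate = RATES[model];
  if (!rate) throw new Error(`No pricing for model: ${model}`);
  const inputCost = (promptTokens / 1000) * rate.promptPer1k;
  const outputCost = (completionTokens / 1000) * rate.completionPer1k;
  return inputCost + outputCost;
}

calculateCost('claude-sonnet-4-5', 120, 80); // 0.00156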

Custom Pricing

For unlisted models, specify costs manually:
await lumina.traceLLM(
  () => customLLM.generate(prompt),
  {
    metadata: {
      cost_per_prompt_token: 0.000003,
      cost_per_completion_token: 0.000015,
    },
  }
);

Quality Monitoring

Lumina tracks response quality with a hybrid approach: fast exact matching plus AI-powered semantic comparison.

Exact Match (Fast)

Hash-based comparison for identical responses. Use case: detecting unexpected changes.
// Original response
hash: "abc123..."

// New response (identical)
hash: "abc123..."
match: true
Speed: Instant (hash comparison)
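
A sketch of how hash-based matching can be implemented; SHA-256 is an assumption here, since the docs don't specify Lumina's hash function:

import { createHash } from 'node:crypto';

function responseHash(text: string): string {
  return createHash('sha256').update(text).digest('hex');
}

// Matches only when the responses are byte-identical
const original = 'The capital of France is Paris.';
const candidate = 'The capital of France is Paris.';
const match = responseHash(original) === responseHash(candidate); // true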

Semantic Similarity (Accurate)

AI-powered semantic comparison that checks whether meaning is preserved. Use case: detecting quality degradation.
// Original: "The capital of France is Paris."
// New: "Paris is the capital city of France."

semantic_similarity: 0.95 // High similarity
Speed: ~100ms (Claude API call)
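
Lumina performs this comparison internally; for intuition, here is a rough sketch of scoring similarity with a single Claude call via the Anthropic Messages API (the prompt wording is an assumption, not Lumina's actual prompt):

const original = 'The capital of France is Paris.';
const candidate = 'Paris is the capital city of France.';

const res = await fetch('https://api.anthropic.com/v1/messages', {
  method: 'POST',
  headers: {
    'x-api-key': process.env.ANTHROPIC_API_KEY!,
    'anthropic-version': '2023-06-01',
    'content-type': 'application/json',
  },
  body: JSON.stringify({
    model: 'claude-sonnet-4-5',
    max_tokens: 8,
    messages: [{
      role: 'user',
      content: `Rate the semantic similarity of these two responses from 0.0 to 1.0. Reply with only the number.\nA: ${original}\nB: ${candidate}`,
    }],
  }),
});
const similarity = parseFloat((await res.json()).content[0].text); // e.g. 0.95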

When to Use Each

Exact match:
  • Regression detection
  • Template responses
  • Structured output (JSON)
Semantic:
  • Natural language responses
  • Prompt engineering
  • A/B testing

Alerting

Lumina detects anomalies and sends alerts.

Cost Alerts

Triggered when costs spike above baseline. Example:
Alert: Cost Spike Detected
Service: chat-api
Endpoint: /api/chat
Baseline: $0.50/hour
Current: $2.30/hour (+360%)
Threshold: +200%
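
The spike check implied by this alert is simple percentage math; a minimal sketch (Lumina computes the baseline internally):

// threshold 2.0 means "alert on a 200%+ increase over baseline"
function isCostSpike(baselinePerHour: number, currentPerHour: number, threshold = 2.0): boolean {
  const increase = (currentPerHour - baselinePerHour) / baselinePerHour;
  return increase >= threshold;
}

isCostSpike(0.5, 2.3); // true: (2.30 - 0.50) / 0.50 = 3.6, i.e. +360%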

Quality Alerts

Triggered when response quality drops. Example:
Alert: Quality Drop Detected
Service: rag-pipeline
Endpoint: /api/search
Baseline: 0.92 similarity
Current: 0.65 similarity (-29%)
Threshold: -20%

Configuration

Set thresholds per service:
{
  service: "chat-api",
  alerts: {
    cost_spike: {
      threshold: 2.0,  // 200% increase
      window: "1h"
    },
    quality_drop: {
      threshold: 0.2,  // 20% decrease
      window: "1h"
    }
  }
}

Replay Testing

Test changes against real production data.

Workflow

1. Capture baseline
await lumina.createReplaySet({
  name: 'Pre-prompt-change',
  filters: {
    service: 'chat-api',
    timeRange: '24h',
    sampleSize: 100,
  },
});
2. Make changes
Update your prompt in code.
3. Replay
await lumina.replayTraces({
  replaySetId: 'replay_001',
  // Uses new prompt automatically
});
4. Compare
View the side-by-side diff with semantic scoring.

Use Cases

Prompt engineering:
  • Test prompt variations
  • Optimize system messages
  • Validate few-shot examples
Model changes:
  • Compare model outputs
  • Validate cost/quality tradeoffs
  • Test provider switching
Infrastructure:
  • Test rate limiting
  • Validate caching
  • Load testing

Sampling

For high-volume workloads, sample a subset of traces.

Head-Based Sampling

Decide at trace start whether to record. Example:
const lumina = initLumina({
  endpoint: 'http://lumina:9411/v1/traces',
  service_name: 'high-volume-service',
  samplingRate: 0.1, // Sample 10%
});
Pros:
  • Low overhead
  • Predictable costs
Cons:
  • May miss rare errors
  • No dynamic adjustment
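
Conceptually, the head-based decision is a single coin flip made before any spans are recorded, which is why overhead stays low. A sketch of the idea (the SDK applies samplingRate internally):

function shouldSampleTrace(samplingRate: number): boolean {
  // Decided once at trace start; an unsampled trace records nothing
  return Math.random() < samplingRate;
}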

Conditional Sampling

Always sample important traces. Example:
const lumina = initLumina({
  endpoint: 'http://lumina:9411/v1/traces',
  service_name: 'high-volume-service',
  samplingRate: 0.1,
  alwaysSampleConditions: {
    error: true,              // Always sample errors
    costUsd: { $gt: 0.5 },   // Always sample expensive calls
    userId: { $in: ['vip1', 'vip2'] }, // Always sample VIP users
  },
});
Best of both worlds:
  • Sample routine operations at 10%
  • Capture all important events at 100%

Data Retention

Lumina automatically manages trace lifecycle.

Self-Hosted Defaults

  • Daily limit: 50,000 traces
  • Retention: 7 days
  • Cleanup: Automatic at midnight UTC

Custom Configuration

# .env
TRACE_RETENTION_DAYS=30        # Keep for 30 days
DAILY_TRACE_LIMIT=100000       # Increase limit
CLEANUP_SCHEDULE="0 2 * * *"   # Run at 2 AM

Archival

Export traces before deletion:
# Export to JSON
curl http://api:8081/api/traces?before=$(date -d '7 days ago' +%s) \
  > traces_archive.json

# Export to PostgreSQL dump
pg_dump -t traces lumina > traces_backup.sql

Architecture Components

Understanding the system architecture helps with deployment and troubleshooting.

Ingestion Service

Purpose: Receive and validate traces
Responsibilities:
  • Accept OTLP/HTTP traces
  • Validate schema
  • Publish to NATS queue
Port: 9411
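
For intuition, this is roughly what the SDK posts to the ingestion endpoint: an OTLP/HTTP JSON payload. All field values here are illustrative:

await fetch('http://lumina:9411/v1/traces', {
  method: 'POST',
  headers: { 'content-type': 'application/json' },
  body: JSON.stringify({
    resourceSpans: [{
      resource: { attributes: [{ key: 'service.name', value: { stringValue: 'chat-api' } }] },
      scopeSpans: [{
        spans: [{
          traceId: '0af7651916cd43dd8448eb211c80319c',
          spanId: 'b7ad6b7169203331',
          name: 'chat_completion',
          startTimeUnixNano: '1700000000000000000',
          endTimeUnixNano: '1700000001200000000',
        }],
      }],
    }],
  }),
});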

Worker Pool

Purpose: Process traces asynchronously
Responsibilities:
  • Calculate costs
  • Extract metadata
  • Store in PostgreSQL
Scaling: Horizontal (add more workers)

Query API

Purpose: Retrieve traces and analytics
Responsibilities:
  • Serve dashboard queries
  • Provide REST API
  • Cache frequent queries
Port: 8081
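
The REST API can also be queried directly. A sketch: the `before` parameter (a Unix timestamp) appears in the Archival example above; any other query parameters are assumptions to verify against the API docs:

const cutoff = Math.floor(Date.now() / 1000);
const res = await fetch(`http://api:8081/api/traces?before=${cutoff}`);
const traces = await res.json();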

Replay Engine

Purpose: Re-execute traces with new parameters
Responsibilities:
  • Capture replay sets
  • Execute with LLM APIs
  • Compare results
Port: 8082

Dashboard

Purpose: Visualization and management
Responsibilities:
  • Display traces
  • Show analytics
  • Manage alerts
Port: 3000

Best Practices

Attribute Naming

Use consistent naming conventions.
Good:
{
  user_id: "user-123",
  session_id: "session-456",
  feature_flag: "new_prompt_v2"
}
Bad:
{
  uid: "user-123",        // Inconsistent
  SessionID: "session-456", // Wrong case
  flag: "new_prompt_v2"   // Too generic
}

Span Naming

Use descriptive, hierarchical names.
Good:
rag_pipeline
  retrieval
    embedding
    vector_search
  synthesis
    llm_call
Bad:
main
  step1
  step2
  step3

Error Handling

Always capture errors:
try {
  await lumina.traceLLM(
    () => llm.generate(prompt),
    { name: 'chat' }
  );
} catch (error) {
  // Error automatically recorded in trace
  // Original error re-thrown
  throw error;
}

Cost Management

Monitor costs proactively:
  1. Set up cost alerts
  2. Review expensive queries daily
  3. Optimize high-cost endpoints
  4. Consider model downgrading for simple tasks

Next Steps

Quickstart

Install Lumina and send your first trace

SDK Reference

Complete SDK documentation

Multi-Span Tracing

Learn hierarchical tracing

Production Deployment

Deploy to production