Understanding these core concepts will help you get the most out of Lumina.

Traces and Spans

Lumina uses OpenTelemetry’s trace model.

Trace

A trace represents a complete request or operation in your application. Properties:
  • Unique trace_id
  • One or more spans
  • Start and end timestamps
  • Service name
Example:
Trace: user_query_abc123
Duration: 1.2s
Service: chat-api
Spans: 3

Span

A span represents a single operation within a trace. Properties:
  • Unique span_id
  • Parent span_id (for child spans)
  • Operation name
  • Start and end timestamps
  • Attributes (key-value metadata)
Example:
{
  span_id: "span_001",
  parent_span_id: null,
  name: "chat_completion",
  attributes: {
    model: "claude-sonnet-4-5",
    prompt_tokens: 120,
    completion_tokens: 80,
    cost_usd: 0.003
  }
}

Single-Span vs Multi-Span

Single-Span Trace: One operation per trace (simple LLM calls).
Trace
└── chat_completion (1.2s)
Multi-Span Trace: Multiple nested operations (complex workflows).
Trace: rag_pipeline (2.5s)
├── retrieval (1.8s)
│   ├── embedding (0.2s)
│   └── vector_search (1.6s)
└── synthesis (0.7s)
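The tree above is encoded entirely through `parent_span_id` references. A minimal sketch in plain TypeScript (illustrative data structures, not the Lumina SDK) showing how the `rag_pipeline` trace's spans link together:

```typescript
// Each span records its parent via parent_span_id; the root span has null.
interface Span {
  span_id: string;
  parent_span_id: string | null;
  name: string;
  duration_ms: number;
}

const spans: Span[] = [
  { span_id: "s1", parent_span_id: null, name: "rag_pipeline",  duration_ms: 2500 },
  { span_id: "s2", parent_span_id: "s1", name: "retrieval",     duration_ms: 1800 },
  { span_id: "s3", parent_span_id: "s2", name: "embedding",     duration_ms: 200 },
  { span_id: "s4", parent_span_id: "s2", name: "vector_search", duration_ms: 1600 },
  { span_id: "s5", parent_span_id: "s1", name: "synthesis",     duration_ms: 700 },
];

// Reconstruct the tree by grouping spans under their parent.
function childrenOf(parentId: string | null): Span[] {
  return spans.filter((s) => s.parent_span_id === parentId);
}

const root = childrenOf(null)[0];
console.log(root.name, childrenOf(root.span_id).map((s) => s.name));
// rag_pipeline [ 'retrieval', 'synthesis' ]
```

A single-span trace is simply the degenerate case: one span with `parent_span_id: null`.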

Attributes

Attributes are key-value pairs attached to spans.

Standard Attributes

Automatically extracted by Lumina:
Attribute          Type     Description
model              string   LLM model name
prompt_tokens      int      Input token count
completion_tokens  int      Output token count
cost_usd           float    Calculated cost
latency_ms         int      Duration in milliseconds
status             string   ok, error, degraded

Custom Attributes

Add your own metadata:
await lumina.traceLLM(
  () => llm.generate(prompt),
  {
    metadata: {
      userId: 'user-123',
      sessionId: 'session-456',
      intent: 'customer_support',
      priority: 'high',
    },
  }
);
Use cases:
  • User identification
  • Session tracking
  • Feature flags
  • A/B test variants
  • Custom dimensions for analytics

Cost Calculation

Lumina automatically calculates costs based on provider pricing.

Supported Providers

Provider    Models                         Pricing Source
OpenAI      GPT-4, GPT-3.5, GPT-4 Turbo    openai.com/pricing
Anthropic   Claude 3 Opus, Sonnet, Haiku   anthropic.com/pricing
Anthropic   Claude 3.5 Sonnet              anthropic.com/pricing
Anthropic   Claude Sonnet 4.5              anthropic.com/pricing

How It Works

1. Extract model name
{
  model: "claude-sonnet-4-5"
}
2. Get token counts
{
  prompt_tokens: 120,
  completion_tokens: 80
}
3. Calculate cost
Input cost:  120 tokens × $0.003/1k = $0.00036
Output cost:  80 tokens × $0.015/1k = $0.00120
Total cost: $0.00156
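The arithmetic above can be sketched as a small helper. The rates passed in are the illustrative per-1k-token prices from the example, not official provider pricing:

```typescript
// Compute LLM call cost from token counts and per-1k-token rates.
function llmCost(
  promptTokens: number,
  completionTokens: number,
  inputPer1k: number,
  outputPer1k: number
): number {
  const inputCost = (promptTokens / 1000) * inputPer1k;
  const outputCost = (completionTokens / 1000) * outputPer1k;
  return inputCost + outputCost;
}

// 120 prompt tokens at $0.003/1k plus 80 completion tokens at $0.015/1k:
console.log(llmCost(120, 80, 0.003, 0.015)); // ≈ 0.00156
```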

Custom Pricing

For unlisted models, specify costs manually:
await lumina.traceLLM(
  () => customLLM.generate(prompt),
  {
    metadata: {
      cost_per_prompt_token: 0.000003,
      cost_per_completion_token: 0.000015,
    },
  }
);

Quality Monitoring

Lumina tracks response quality with hybrid detection.

Exact Match (Fast)

Hash-based comparison for byte-identical responses. Use case: detecting unexpected changes.
// Original response
hash: "abc123..."

// New response (identical)
hash: "abc123..."
match: true
Speed: Instant (hash comparison)
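Conceptually, exact matching is just a digest comparison. A minimal sketch using Node's built-in crypto module (an illustration of the idea, not Lumina's internal code):

```typescript
import { createHash } from "node:crypto";

// Hash a response body once so later responses can be compared in O(1).
function responseHash(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

const original = responseHash("The capital of France is Paris.");
const replayed = responseHash("The capital of France is Paris.");
console.log(original === replayed); // true — byte-identical responses match
```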

Semantic Similarity (Accurate)

AI-powered semantic comparison that checks whether meaning is preserved. Use case: detecting quality degradation.
// Original: "The capital of France is Paris."
// New: "Paris is the capital city of France."

semantic_similarity: 0.95 // High similarity
Speed: ~100ms (Claude API call)
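Lumina produces this score via a Claude API call, but the number behaves like a similarity in [0, 1]: paraphrases score near 1, unrelated text near 0. A toy illustration using cosine similarity over embedding vectors (hypothetical vectors, not Lumina's actual scoring mechanism):

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Two paraphrases would map to nearby vectors, so similarity is high.
console.log(cosine([0.9, 0.1, 0.4], [0.8, 0.2, 0.5])); // close to 1
```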

When to Use Each

Exact match:
  • Regression detection
  • Template responses
  • Structured output (JSON)
Semantic:
  • Natural language responses
  • Prompt engineering
  • A/B testing

Alerting

Lumina detects anomalies and sends alerts.

Cost Alerts

Triggered when costs spike above baseline. Example:
Alert: Cost Spike Detected
Service: chat-api
Endpoint: /api/chat
Baseline: $0.50/hour
Current: $2.30/hour (+360%)
Threshold: +200%

Quality Alerts

Triggered when response quality drops. Example:
Alert: Quality Drop Detected
Service: rag-pipeline
Endpoint: /api/search
Baseline: 0.92 similarity
Current: 0.65 similarity (-29%)
Threshold: -20%

Configuration

Set thresholds per service:
{
  service: "chat-api",
  alerts: {
    cost_spike: {
      threshold: 2.0,  // 200% increase
      window: "1h"
    },
    quality_drop: {
      threshold: 0.2,  // 20% decrease
      window: "1h"
    }
  }
}
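Applying these thresholds is a ratio check against the baseline. A sketch of how the two alert conditions from the examples above might evaluate (illustrative logic, not the exact detection engine):

```typescript
// Fires when current cost exceeds baseline by more than `threshold`
// (e.g. threshold 2.0 means a +200% increase).
function costSpike(baseline: number, current: number, threshold: number): boolean {
  return (current - baseline) / baseline > threshold;
}

// Fires when similarity drops below baseline by more than `threshold`
// (e.g. threshold 0.2 means a -20% decrease).
function qualityDrop(baseline: number, current: number, threshold: number): boolean {
  return (baseline - current) / baseline > threshold;
}

console.log(costSpike(0.5, 2.3, 2.0));     // true  (+360% exceeds +200%)
console.log(qualityDrop(0.92, 0.65, 0.2)); // true  (-29% exceeds -20%)
```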

Replay Testing

Test changes against real production data.

Workflow

1. Capture baseline
await lumina.createReplaySet({
  name: 'Pre-prompt-change',
  filters: {
    service: 'chat-api',
    timeRange: '24h',
    sampleSize: 100,
  },
});
2. Make changes
Update your prompt in code.
3. Replay
await lumina.replayTraces({
  replaySetId: 'replay_001',
  // Uses new prompt automatically
});
4. Compare
View side-by-side diff with semantic scoring.

Use Cases

Prompt engineering:
  • Test prompt variations
  • Optimize system messages
  • Validate few-shot examples
Model changes:
  • Compare model outputs
  • Validate cost/quality tradeoffs
  • Test provider switching
Infrastructure:
  • Test rate limiting
  • Validate caching
  • Load testing

Sampling

For high-volume workloads, sample a subset of traces.

Head-Based Sampling

Decide at trace start whether to record. Example:
const lumina = initLumina({
  endpoint: 'http://lumina:9411/v1/traces',
  service_name: 'high-volume-service',
  samplingRate: 0.1, // Sample 10%
});
Pros:
  • Low overhead
  • Predictable costs
Cons:
  • May miss rare errors
  • No dynamic adjustment

Conditional Sampling

Always sample important traces. Example:
const lumina = initLumina({
  endpoint: 'http://lumina:9411/v1/traces',
  service_name: 'high-volume-service',
  samplingRate: 0.1,
  alwaysSampleConditions: {
    error: true,              // Always sample errors
    costUsd: { $gt: 0.5 },   // Always sample expensive calls
    userId: { $in: ['vip1', 'vip2'] }, // Always sample VIP users
  },
});
Best of both worlds:
  • Sample routine operations at 10%
  • Capture all important events at 100%
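The combined policy boils down to: check the always-sample conditions first, then fall back to the random rate. A sketch of that decision (hypothetical trace shape and condition set, not the SDK's internals):

```typescript
interface TraceInfo {
  error: boolean;
  costUsd: number;
  userId: string;
}

// Sample every trace that matches an important condition;
// otherwise sample routine traffic at the configured rate.
function shouldSample(
  trace: TraceInfo,
  rate: number,
  vips: string[],
  rng: () => number = Math.random
): boolean {
  if (trace.error) return true;                 // always keep errors
  if (trace.costUsd > 0.5) return true;         // always keep expensive calls
  if (vips.includes(trace.userId)) return true; // always keep VIP users
  return rng() < rate;                          // routine traffic: e.g. 10%
}

const routine = { error: false, costUsd: 0.01, userId: "user-9" };
console.log(shouldSample(routine, 0.1, ["vip1"], () => 0.99));             // false
console.log(shouldSample({ ...routine, error: true }, 0.1, ["vip1"]));     // true
```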

Data Retention

Lumina automatically manages trace lifecycle.

Self-Hosted Defaults

  • Daily limit: 50,000 traces
  • Retention: 7 days
  • Cleanup: Automatic at midnight UTC

Custom Configuration

# .env
TRACE_RETENTION_DAYS=30        # Keep for 30 days
DAILY_TRACE_LIMIT=100000       # Increase limit
CLEANUP_SCHEDULE="0 2 * * *"   # Run at 2 AM

Archival

Export traces before deletion:
# Export to JSON
curl http://api:8081/api/traces?before=$(date -d '7 days ago' +%s) \
  > traces_archive.json

# Export to PostgreSQL dump
pg_dump -t traces lumina > traces_backup.sql

Architecture Components

Understanding the system architecture helps with deployment and troubleshooting.

Ingestion Service

Purpose: Receive and validate traces
Responsibilities:
  • Accept OTLP/HTTP traces
  • Validate schema
  • Publish to NATS queue
Port: 9411

Worker Pool

Purpose: Process traces asynchronously
Responsibilities:
  • Calculate costs
  • Extract metadata
  • Store in PostgreSQL
Scaling: Horizontal (add more workers)

Query API

Purpose: Retrieve traces and analytics
Responsibilities:
  • Serve dashboard queries
  • Provide REST API
  • Cache frequent queries
Port: 8081

Replay Engine

Purpose: Re-execute traces with new parameters
Responsibilities:
  • Capture replay sets
  • Execute with LLM APIs
  • Compare results
Port: 8082

Dashboard

Purpose: Visualization and management
Responsibilities:
  • Display traces
  • Show analytics
  • Manage alerts
Port: 3000

Best Practices

Attribute Naming

Use consistent naming conventions.
Good:
{
  user_id: "user-123",
  session_id: "session-456",
  feature_flag: "new_prompt_v2"
}
Bad:
{
  uid: "user-123",        // Inconsistent
  SessionID: "session-456", // Wrong case
  flag: "new_prompt_v2"   // Too generic
}

Span Naming

Use descriptive, hierarchical names.
Good:
rag_pipeline
  retrieval
    embedding
    vector_search
  synthesis
    llm_call
Bad:
main
  step1
  step2
  step3

Error Handling

Always capture errors:
try {
  await lumina.traceLLM(
    () => llm.generate(prompt),
    { name: 'chat' }
  );
} catch (error) {
  // Error automatically recorded in trace
  // Original error re-thrown
  throw error;
}

Cost Management

Monitor costs proactively:
  1. Set up cost alerts
  2. Review expensive queries daily
  3. Optimize high-cost endpoints
  4. Consider model downgrading for simple tasks

Next Steps