
Understanding these core concepts will help you get the most out of Lumina.

Traces and Spans

Lumina uses OpenTelemetry’s trace model.

Trace

A trace represents a complete request or operation in your application. Properties:
  • Unique trace_id
  • One or more spans
  • Start and end timestamps
  • Service name
Example:
Trace: user_query_abc123
Duration: 1.2s
Service: chat-api
Spans: 3

Span

A span represents a single operation within a trace. Properties:
  • Unique span_id
  • Parent span_id (for child spans)
  • Operation name
  • Start and end timestamps
  • Attributes (key-value metadata)
Example:
{
  span_id: "span_001",
  parent_span_id: null,
  name: "chat_completion",
  attributes: {
    model: "claude-sonnet-4-5",
    prompt_tokens: 120,
    completion_tokens: 80,
    cost_usd: 0.00156
  }
}

Single-Span vs Multi-Span

Single-Span Trace: One operation per trace (simple LLM calls).
Trace
└── chat_completion (1.2s)
Multi-Span Trace: Multiple nested operations (complex workflows).
Trace: rag_pipeline (2.5s)
├── retrieval (1.8s)
│   ├── embedding (0.2s)
│   └── vector_search (1.6s)
└── synthesis (0.7s)
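
Because Lumina follows OpenTelemetry's trace model, a nested tree like the one above can be produced with the standard OTel API. A minimal sketch (the tracer name and span bodies are illustrative):

import { trace } from '@opentelemetry/api';

const tracer = trace.getTracer('rag-pipeline');

await tracer.startActiveSpan('rag_pipeline', async (root) => {
  await tracer.startActiveSpan('retrieval', async (retrieval) => {
    // Spans started inside an active span become its children automatically
    await tracer.startActiveSpan('embedding', async (s) => { /* embed the query */ s.end(); });
    await tracer.startActiveSpan('vector_search', async (s) => { /* query the index */ s.end(); });
    retrieval.end();
  });
  await tracer.startActiveSpan('synthesis', async (s) => { /* call the LLM */ s.end(); });
  root.end();
});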

Attributes

Attributes are key-value pairs attached to spans.

Standard Attributes

Automatically extracted by Lumina:
Attribute           Type     Description
model               string   LLM model name
prompt_tokens       int      Input token count
completion_tokens   int      Output token count
cost_usd            float    Calculated cost
latency_ms          int      Duration in milliseconds
status              string   ok, error, or degraded

Custom Attributes

Add your own metadata:
await lumina.traceLLM(
  () => llm.generate(prompt),
  {
    metadata: {
      userId: 'user-123',
      sessionId: 'session-456',
      intent: 'customer_support',
      priority: 'high',
    },
  }
);
Use cases:
  • User identification
  • Session tracking
  • Feature flags
  • A/B test variants
  • Custom dimensions for analytics

Cost Calculation

Lumina automatically calculates costs based on provider pricing.

Supported Providers

Provider    Models                          Pricing Source
OpenAI      GPT-4, GPT-3.5, GPT-4 Turbo     openai.com/pricing
Anthropic   Claude 3 Opus, Sonnet, Haiku    anthropic.com/pricing
Anthropic   Claude 3.5 Sonnet               anthropic.com/pricing
Anthropic   Claude Sonnet 4.5               anthropic.com/pricing

How It Works

1. Extract model name
{
  model: "claude-sonnet-4-5"
}
2. Get token counts
{
  prompt_tokens: 120,
  completion_tokens: 80
}
3. Calculate cost
Input cost:  120 tokens × $0.003/1k = $0.00036
Output cost:  80 tokens × $0.015/1k = $0.00120
Total cost: $0.00156
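
In TypeScript, the same arithmetic looks like this. This is a sketch of the formula, not Lumina's internal implementation, and the per-1k rates are the illustrative values used above:

// Per-1k-token rates keyed by model name (illustrative values from the example above)
const RATES: Record<string, { promptPer1k: number; completionPer1k: number }> = {
  'claude-sonnet-4-5': { promptPer1k: 0.003, completionPer1k: 0.015 },
};

function calculateCost(model: string, promptTokens: number, completionTokens: number): number {
  const rate = RATES[model];
  if (!rate) throw new Error(`No pricing for model: ${model}`);
  const inputCost = (promptTokens / 1000) * rate.promptPer1k;
  const outputCost = (completionTokens / 1000) * rate.completionPer1k;
  return inputCost + outputCost;
}

calculateCost('claude-sonnet-4-5', 120, 80); // 0.00156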

Custom Pricing

For unlisted models, specify costs manually:
await lumina.traceLLM(
  () => customLLM.generate(prompt),
  {
    metadata: {
      cost_per_prompt_token: 0.000003,
      cost_per_completion_token: 0.000015,
    },
  }
);

Quality Monitoring

Lumina tracks response quality with a hybrid approach: fast exact matching plus AI-powered semantic comparison.

Exact Match (Fast)

Hash-based comparison for identical responses. Use case: detecting unexpected changes.
// Original response
hash: "abc123..."

// New response (identical)
hash: "abc123..."
match: true
Speed: Instant (hash comparison)
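
A sketch of how hash-based matching can be implemented; SHA-256 is an assumption here, since the docs don't specify Lumina's hash function:

import { createHash } from 'node:crypto';

function responseHash(text: string): string {
  return createHash('sha256').update(text).digest('hex');
}

// Matches only when the responses are byte-identical
const original = 'The capital of France is Paris.';
const candidate = 'The capital of France is Paris.';
const match = responseHash(original) === responseHash(candidate); // true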

Semantic Similarity (Accurate)

AI-powered semantic comparison that checks whether meaning is preserved. Use case: detecting quality degradation.
// Original: "The capital of France is Paris."
// New: "Paris is the capital city of France."

semantic_similarity: 0.95 // High similarity
Speed: ~100ms (Claude API call)
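
Lumina performs this comparison internally; for intuition, here is a rough sketch of scoring similarity with a single Claude call via the Anthropic Messages API (the prompt wording is an assumption, not Lumina's actual prompt):

const original = 'The capital of France is Paris.';
const candidate = 'Paris is the capital city of France.';

const res = await fetch('https://api.anthropic.com/v1/messages', {
  method: 'POST',
  headers: {
    'x-api-key': process.env.ANTHROPIC_API_KEY!,
    'anthropic-version': '2023-06-01',
    'content-type': 'application/json',
  },
  body: JSON.stringify({
    model: 'claude-sonnet-4-5',
    max_tokens: 8,
    messages: [{
      role: 'user',
      content: `Rate the semantic similarity of these two responses from 0.0 to 1.0. Reply with only the number.\nA: ${original}\nB: ${candidate}`,
    }],
  }),
});
const similarity = parseFloat((await res.json()).content[0].text); // e.g. 0.95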

When to Use Each

Exact match:
  • Regression detection
  • Template responses
  • Structured output (JSON)
Semantic:
  • Natural language responses
  • Prompt engineering
  • A/B testing

Alerting

Lumina detects anomalies and sends alerts.

Cost Alerts

Triggered when costs spike above baseline. Example:
Alert: Cost Spike Detected
Service: chat-api
Endpoint: /api/chat
Baseline: $0.50/hour
Current: $2.30/hour (+360%)
Threshold: +200%
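
The spike check implied by this alert is simple percentage math; a minimal sketch (Lumina computes the baseline internally):

// threshold 2.0 means "alert on a 200%+ increase over baseline"
function isCostSpike(baselinePerHour: number, currentPerHour: number, threshold = 2.0): boolean {
  const increase = (currentPerHour - baselinePerHour) / baselinePerHour;
  return increase >= threshold;
}

isCostSpike(0.5, 2.3); // true: (2.30 - 0.50) / 0.50 = 3.6, i.e. +360%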

Quality Alerts

Triggered when response quality drops. Example:
Alert: Quality Drop Detected
Service: rag-pipeline
Endpoint: /api/search
Baseline: 0.92 similarity
Current: 0.65 similarity (-29%)
Threshold: -20%

Configuration

Set thresholds per service:
{
  service: "chat-api",
  alerts: {
    cost_spike: {
      threshold: 2.0,  // 200% increase
      window: "1h"
    },
    quality_drop: {
      threshold: 0.2,  // 20% decrease
      window: "1h"
    }
  }
}

Replay Testing

Test changes against real production data.

Workflow

1. Capture baseline
await lumina.createReplaySet({
  name: 'Pre-prompt-change',
  filters: {
    service: 'chat-api',
    timeRange: '24h',
    sampleSize: 100,
  },
});
2. Make changes
Update your prompt in code.
3. Replay
await lumina.replayTraces({
  replaySetId: 'replay_001',
  // Uses new prompt automatically
});
4. Compare
View the side-by-side diff with semantic scoring.

Use Cases

Prompt engineering:
  • Test prompt variations
  • Optimize system messages
  • Validate few-shot examples
Model changes:
  • Compare model outputs
  • Validate cost/quality tradeoffs
  • Test provider switching
Infrastructure:
  • Test rate limiting
  • Validate caching
  • Load testing

Sampling

For high-volume workloads, sample a subset of traces.

Head-Based Sampling

Decide at trace start whether to record. Example:
const lumina = initLumina({
  endpoint: 'http://lumina:9411/v1/traces',
  service_name: 'high-volume-service',
  samplingRate: 0.1, // Sample 10%
});
Pros:
  • Low overhead
  • Predictable costs
Cons:
  • May miss rare errors
  • No dynamic adjustment
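
Conceptually, the head-based decision is a single coin flip made before any spans are recorded, which is why overhead stays low. A sketch of the idea (the SDK applies samplingRate internally):

function shouldSampleTrace(samplingRate: number): boolean {
  // Decided once at trace start; an unsampled trace records nothing
  return Math.random() < samplingRate;
}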

Conditional Sampling

Always sample important traces. Example:
const lumina = initLumina({
  endpoint: 'http://lumina:9411/v1/traces',
  service_name: 'high-volume-service',
  samplingRate: 0.1,
  alwaysSampleConditions: {
    error: true,              // Always sample errors
    costUsd: { $gt: 0.5 },   // Always sample expensive calls
    userId: { $in: ['vip1', 'vip2'] }, // Always sample VIP users
  },
});
Best of both worlds:
  • Sample routine operations at 10%
  • Capture all important events at 100%

Data Retention

Lumina automatically manages trace lifecycle.

Self-Hosted Defaults

  • Daily limit: 50,000 traces
  • Retention: 7 days
  • Cleanup: Automatic at midnight UTC

Custom Configuration

# .env
TRACE_RETENTION_DAYS=30        # Keep for 30 days
DAILY_TRACE_LIMIT=100000       # Increase limit
CLEANUP_SCHEDULE="0 2 * * *"   # Run at 2 AM

Archival

Export traces before deletion:
# Export to JSON
curl http://api:8081/api/traces?before=$(date -d '7 days ago' +%s) \
  > traces_archive.json

# Export to PostgreSQL dump
pg_dump -t traces lumina > traces_backup.sql

Architecture Components

Understanding the system architecture helps with deployment and troubleshooting.

Ingestion Service

Purpose: Receive and validate traces
Responsibilities:
  • Accept OTLP/HTTP traces
  • Validate schema
  • Publish to NATS queue
Port: 9411
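
For intuition, this is roughly what the SDK posts to the ingestion endpoint: an OTLP/HTTP JSON payload. All field values here are illustrative:

await fetch('http://lumina:9411/v1/traces', {
  method: 'POST',
  headers: { 'content-type': 'application/json' },
  body: JSON.stringify({
    resourceSpans: [{
      resource: { attributes: [{ key: 'service.name', value: { stringValue: 'chat-api' } }] },
      scopeSpans: [{
        spans: [{
          traceId: '0af7651916cd43dd8448eb211c80319c',
          spanId: 'b7ad6b7169203331',
          name: 'chat_completion',
          startTimeUnixNano: '1700000000000000000',
          endTimeUnixNano: '1700000001200000000',
        }],
      }],
    }],
  }),
});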

Worker Pool

Purpose: Process traces asynchronously
Responsibilities:
  • Calculate costs
  • Extract metadata
  • Store in PostgreSQL
Scaling: Horizontal (add more workers)

Query API

Purpose: Retrieve traces and analytics
Responsibilities:
  • Serve dashboard queries
  • Provide REST API
  • Cache frequent queries
Port: 8081
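
The REST API can also be queried directly. A sketch: the `before` parameter (a Unix timestamp) appears in the Archival example above; any other query parameters are assumptions to verify against the API docs:

const cutoff = Math.floor(Date.now() / 1000);
const res = await fetch(`http://api:8081/api/traces?before=${cutoff}`);
const traces = await res.json();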

Replay Engine

Purpose: Re-execute traces with new parameters
Responsibilities:
  • Capture replay sets
  • Execute with LLM APIs
  • Compare results
Port: 8082

Dashboard

Purpose: Visualization and management
Responsibilities:
  • Display traces
  • Show analytics
  • Manage alerts
Port: 3000

Best Practices

Attribute Naming

Use consistent naming conventions.
Good:
{
  user_id: "user-123",
  session_id: "session-456",
  feature_flag: "new_prompt_v2"
}
Bad:
{
  uid: "user-123",        // Inconsistent
  SessionID: "session-456", // Wrong case
  flag: "new_prompt_v2"   // Too generic
}

Span Naming

Use descriptive, hierarchical names.
Good:
rag_pipeline
  retrieval
    embedding
    vector_search
  synthesis
    llm_call
Bad:
main
  step1
  step2
  step3

Error Handling

Always capture errors:
try {
  await lumina.traceLLM(
    () => llm.generate(prompt),
    { name: 'chat' }
  );
} catch (error) {
  // Error automatically recorded in trace
  // Original error re-thrown
  throw error;
}

Cost Management

Monitor costs proactively:
  1. Set up cost alerts
  2. Review expensive queries daily
  3. Optimize high-cost endpoints
  4. Consider model downgrading for simple tasks

Next Steps

Quickstart

Install Lumina and send your first trace

SDK Reference

Complete SDK documentation

Multi-Span Tracing

Learn hierarchical tracing

Production Deployment

Deploy to production