Lumina is an OpenTelemetry-native platform for monitoring LLM applications in production. Track costs, latency, and quality across distributed AI systems with full trace visibility and regression testing. Self-hosted. Open source. Zero vendor lock-in.

What Lumina Does

Cost Tracking

Automatic cost calculation for OpenAI, Anthropic, and major providers. Track spending per service, model, user, or query.
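The core of cost tracking is simple arithmetic over token usage. A minimal sketch of the idea, where the price table values are illustrative placeholders and not Lumina's actual pricing data:

```typescript
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

// USD per million tokens — illustrative values only, not a real price table.
const PRICES: Record<string, { input: number; output: number }> = {
  'claude-sonnet-4-5': { input: 3.0, output: 15.0 },
  'gpt-4o': { input: 2.5, output: 10.0 },
};

// Estimate the cost of a single LLM call from its token usage.
function estimateCostUSD(model: string, usage: Usage): number {
  const price = PRICES[model];
  if (!price) throw new Error(`unknown model: ${model}`);
  return (usage.inputTokens * price.input + usage.outputTokens * price.output) / 1_000_000;
}
```

Summing these per-call estimates by service, model, or user attribute yields the spending breakdowns described above.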

Distributed Tracing

Visualize complex AI workflows including RAG pipelines, agent loops, and multi-model systems with hierarchical span support.

Quality Monitoring

Detect semantic degradation and quality regressions with built-in similarity scoring and custom evaluation metrics.

Replay Testing

Test prompt changes against real production queries before deployment. Compare responses with semantic diff visualization.

Why Lumina

Traditional APM tools treat LLM calls as opaque HTTP requests. Lumina provides native observability for AI systems.

The Problem: AI applications are fundamentally different from traditional software. Token costs accumulate rapidly. Response quality degrades silently. Latency compounds across multi-step workflows. Production incidents require full trace context to debug.

The Solution: Lumina provides automatic cost calculation, quality tracking, and hierarchical tracing for complex pipelines like RAG and agents. Built on OpenTelemetry standards, Lumina integrates into your existing infrastructure without vendor lock-in.

Trust Signals

OpenTelemetry-Native

Built on OpenTelemetry from day one. Works with your existing observability stack without vendor lock-in.

Production Ready

PostgreSQL backend with a production-tested schema. NATS-based reliable ingestion pipeline handles 10M+ traces/day.

Self-Hostable

Full data control with self-hosting. Deploy on your infrastructure with Docker, Kubernetes, or bare metal.

Open Source

Apache 2.0 licensed. Free forever with all features included. No paywalled functionality.

How It Works

import { initLumina } from '@uselumina/sdk';
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const lumina = initLumina({
  endpoint: 'http://localhost:9411/v1/traces',
  service_name: 'my-service',
});

// Wrap your LLM calls
const response = await lumina.traceLLM(
  async () =>
    anthropic.messages.create({
      model: 'claude-sonnet-4-5',
      max_tokens: 1024,
      messages: [{ role: 'user', content: 'Hello!' }],
    }),
  {
    name: 'chat-completion',
    system: 'anthropic',
    prompt: 'Hello!',
    metadata: { userId: 'user-123' },
  }
);

Traces appear in the dashboard immediately with:
  • Automatic cost calculation
  • Token tracking (prompt, completion)
  • Latency measurement
  • Error tracking
  • Custom metadata
See SDK Reference for complete documentation.

Features

Single-Span Tracing

Track individual LLM calls with automatic attribute extraction.

await lumina.traceLLM(
  () => openai.chat.completions.create({ /* ... */ }),
  { name: 'chat', system: 'openai' }
);

Multi-Span Tracing

Trace complex workflows like RAG pipelines with parent-child span relationships.

await lumina.trace('rag_pipeline', async (parentSpan) => {
  const docs = await lumina.trace('retrieval', async () => {
    return await vectorDB.search(query);
  });

  const response = await lumina.traceLLM(
    () => llm.generate(buildPrompt(query, docs)),
    { name: 'synthesis', system: 'anthropic' }
  );

  return response;
});

Cost Analytics

Track spending by service, model, user, or time period. Set budgets and get alerted when thresholds are exceeded.

Quality Monitoring

Detect semantic degradation with built-in similarity scoring. Compare responses against baselines with semantic diff.

Replay Testing

Capture production traces and replay with modified prompts. Compare outputs with semantic scoring before deployment.

See Features Documentation for details.
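The usual building block behind similarity scoring is cosine similarity between embedding vectors of a baseline response and a candidate response. A minimal sketch of that idea — Lumina's built-in scorer and threshold defaults may differ, and embedding generation is out of scope here:

```typescript
// Cosine similarity between two embedding vectors: 1.0 means identical
// direction, 0.0 means orthogonal (semantically unrelated).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Flag a regression when the candidate drifts too far from the baseline.
// The 0.9 threshold is an illustrative default, not Lumina's.
function isRegression(baseline: number[], candidate: number[], threshold = 0.9): boolean {
  return cosineSimilarity(baseline, candidate) < threshold;
}
```

Replay testing applies the same comparison across many captured production queries, so a prompt change is scored against real inputs rather than a handful of hand-written test cases.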

Self-Hosted Limits

The open-source version includes usage limits to prevent abuse:
  • 50,000 traces per day — Resets at midnight UTC
  • 7-day retention — Automatic cleanup of older traces
  • All features enabled — No paywalled functionality
For unlimited usage, deploy with custom configuration or consider the managed cloud offering.
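The daily cap described above amounts to a counter that resets at midnight UTC. A small sketch of that behavior — class and method names here are hypothetical, not taken from the Lumina codebase:

```typescript
// Daily trace quota with a reset at midnight UTC.
class DailyQuota {
  private count = 0;
  private day = '';

  constructor(private readonly limit: number) {}

  // YYYY-MM-DD in UTC; changes exactly at midnight UTC.
  private utcDay(now: Date): string {
    return now.toISOString().slice(0, 10);
  }

  // Returns true if the trace is accepted, false once today's limit is hit.
  tryConsume(now: Date = new Date()): boolean {
    const today = this.utcDay(now);
    if (today !== this.day) {
      this.day = today;
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count++;
    return true;
  }
}
```

With `limit` set to 50,000 this matches the documented behavior: traces past the cap are rejected until the UTC date rolls over.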

Architecture

┌─────────────────────┐
│  Application        │
│  + @uselumina/sdk   │
└──────────┬──────────┘
           │ OTLP/HTTP
           v
┌──────────────────────────────────────────────┐
│  Lumina Platform                             │
│                                              │
│  ┌──────────┐    ┌──────────┐   ┌────────┐ │
│  │Ingestion │───►│   NATS   │──►│Workers │ │
│  │  :9411   │    │  Queue   │   │(Async) │ │
│  └──────────┘    └──────────┘   └───┬────┘ │
│                                      │      │
│  ┌──────────┐    ┌──────────────────▼────┐ │
│  │ Query    │◄───│    PostgreSQL         │ │
│  │ API      │    │  (Traces + Analytics) │ │
│  │  :8081   │    └───────────────────────┘ │
│  └────┬─────┘                               │
│       │                                     │
│  ┌────▼──────┐   ┌──────────────┐          │
│  │ Dashboard │   │Replay Engine │          │
│  │  :3000    │   │    :8082     │          │
│  └───────────┘   └──────────────┘          │
└──────────────────────────────────────────────┘
Components:
  • Ingestion Service — Receives OTLP traces over HTTP. Publishes to NATS for async processing.
  • Worker Pool — Consumes traces from queue. Calculates costs and persists to PostgreSQL.
  • Query API — REST endpoints for trace retrieval and analytics. Powers dashboard.
  • Replay Engine — Captures production traces and re-executes with modified parameters.
  • Dashboard — Next.js application for visualization and trace inspection.
See Architecture Documentation for full details.
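Since ingestion speaks standard OTLP/HTTP, any client that can emit an OTLP JSON trace export can feed the pipeline above. A minimal sketch of such a payload — field names follow the OTLP JSON encoding, the trace/span IDs are dummy values, and the endpoint matches the diagram:

```typescript
// Build a minimal OTLP/HTTP JSON trace export for a single span.
function buildOtlpPayload(serviceName: string, spanName: string) {
  const startMs = Date.now();
  // OTLP timestamps are nanoseconds since the epoch, encoded as strings.
  const toNanos = (ms: number) => `${ms}000000`;
  return {
    resourceSpans: [
      {
        resource: {
          attributes: [
            { key: 'service.name', value: { stringValue: serviceName } },
          ],
        },
        scopeSpans: [
          {
            scope: { name: 'manual-example' },
            spans: [
              {
                traceId: '0123456789abcdef0123456789abcdef', // dummy 16-byte hex ID
                spanId: '0123456789abcdef', // dummy 8-byte hex ID
                name: spanName,
                kind: 1, // SPAN_KIND_INTERNAL
                startTimeUnixNano: toNanos(startMs),
                endTimeUnixNano: toNanos(startMs + 5),
              },
            ],
          },
        ],
      },
    ],
  };
}

// Sending it to the ingestion service (not executed here):
// await fetch('http://localhost:9411/v1/traces', {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json' },
//   body: JSON.stringify(buildOtlpPayload('my-service', 'chat-completion')),
// });
```

In practice the SDK constructs and batches these exports for you; the sketch only shows the wire format the ingestion service consumes.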

Use Cases

Production Monitoring

Track all LLM calls across microservices. Identify expensive queries, slow endpoints, and quality degradations.

Cost Optimization

Analyze spending by model, service, and user. Find opportunities to switch models or optimize prompts.

Regression Testing

Test prompt changes against real production queries before deployment. Catch quality regressions with semantic scoring.

Debugging

Reproduce production issues with full trace context. View prompt, response, model parameters, and execution timeline.

Compliance

Audit all AI interactions with complete logs. Filter by user, timestamp, or custom metadata.
