Lumina is an OpenTelemetry-native platform for monitoring LLM applications in production. Track costs, latency, and quality across distributed AI systems with full trace visibility and regression testing. Self-hosted. Open source. Zero vendor lock-in.

What Lumina Does

Cost Tracking

Automatic cost calculation for OpenAI, Anthropic, and major providers. Track spending per service, model, user, or query.
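The core of cost tracking is simple arithmetic over token usage. A minimal sketch of the idea, where the price table values are illustrative placeholders and not Lumina's actual pricing data:

```typescript
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

// USD per million tokens — illustrative values only, not a real price table.
const PRICES: Record<string, { input: number; output: number }> = {
  'claude-sonnet-4-5': { input: 3.0, output: 15.0 },
  'gpt-4o': { input: 2.5, output: 10.0 },
};

// Estimate the cost of a single LLM call from its token usage.
function estimateCostUSD(model: string, usage: Usage): number {
  const price = PRICES[model];
  if (!price) throw new Error(`unknown model: ${model}`);
  return (usage.inputTokens * price.input + usage.outputTokens * price.output) / 1_000_000;
}
```

Summing these per-call estimates by service, model, or user attribute yields the spending breakdowns described above.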

Distributed Tracing

Visualize complex AI workflows including RAG pipelines, agent loops, and multi-model systems with hierarchical span support.

Quality Monitoring

Detect semantic degradation and quality regressions with built-in similarity scoring and custom evaluation metrics.

Replay Testing

Test prompt changes against real production queries before deployment. Compare responses with semantic diff visualization.

Why Lumina

Traditional APM tools treat LLM calls as opaque HTTP requests. Lumina provides native observability for AI systems.

The Problem: AI applications are fundamentally different from traditional software. Token costs accumulate rapidly. Response quality degrades silently. Latency compounds across multi-step workflows. Production incidents require full trace context to debug.

The Solution: Lumina provides automatic cost calculation, quality tracking, and hierarchical tracing for complex pipelines like RAG and agents. Built on OpenTelemetry standards, Lumina integrates into your existing infrastructure without vendor lock-in.

Trust Signals

OpenTelemetry-Native

Built on OpenTelemetry from day one. Works with your existing observability stack without vendor lock-in.

Production Ready

PostgreSQL backend with a production-tested schema. NATS-based reliable ingestion pipeline handles 10M+ traces/day.

Self-Hostable

Full data control with self-hosting. Deploy on your infrastructure with Docker, Kubernetes, or bare metal.

Open Source

Apache 2.0 licensed. Free forever with all features included. No paywalled functionality.

How It Works

import { initLumina } from '@uselumina/sdk';
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const lumina = initLumina({
  endpoint: 'http://localhost:9411/v1/traces',
  service_name: 'my-service',
});

// Wrap your LLM calls
const response = await lumina.traceLLM(
  async () =>
    anthropic.messages.create({
      model: 'claude-sonnet-4-5',
      max_tokens: 1024,
      messages: [{ role: 'user', content: 'Hello!' }],
    }),
  {
    name: 'chat-completion',
    system: 'anthropic',
    prompt: 'Hello!',
    metadata: { userId: 'user-123' },
  }
);

Traces appear in the dashboard immediately with:
  • Automatic cost calculation
  • Token tracking (prompt, completion)
  • Latency measurement
  • Error tracking
  • Custom metadata
See SDK Reference for complete documentation.

Features

Single-Span Tracing

Track individual LLM calls with automatic attribute extraction.

await lumina.traceLLM(
  () => openai.chat.completions.create({ /* ... */ }),
  { name: 'chat', system: 'openai' }
);

Multi-Span Tracing

Trace complex workflows like RAG pipelines with parent-child span relationships.

await lumina.trace('rag_pipeline', async (parentSpan) => {
  const docs = await lumina.trace('retrieval', async () => {
    return await vectorDB.search(query);
  });

  const response = await lumina.traceLLM(
    () => llm.generate(buildPrompt(query, docs)),
    { name: 'synthesis', system: 'anthropic' }
  );

  return response;
});

Cost Analytics

Track spending by service, model, user, or time period. Set budgets and get alerted when thresholds are exceeded.

Quality Monitoring

Detect semantic degradation with built-in similarity scoring. Compare responses against baselines with semantic diff.

Replay Testing

Capture production traces and replay with modified prompts. Compare outputs with semantic scoring before deployment.

See Features Documentation for details.
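The usual building block behind similarity scoring is cosine similarity between embedding vectors of a baseline response and a candidate response. A minimal sketch of that idea — Lumina's built-in scorer and threshold defaults may differ, and embedding generation is out of scope here:

```typescript
// Cosine similarity between two embedding vectors: 1.0 means identical
// direction, 0.0 means orthogonal (semantically unrelated).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Flag a regression when the candidate drifts too far from the baseline.
// The 0.9 threshold is an illustrative default, not Lumina's.
function isRegression(baseline: number[], candidate: number[], threshold = 0.9): boolean {
  return cosineSimilarity(baseline, candidate) < threshold;
}
```

Replay testing applies the same comparison across many captured production queries, so a prompt change is scored against real inputs rather than a handful of hand-written test cases.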

Self-Hosted Limits

The open-source version includes usage limits to prevent abuse:
  • 50,000 traces per day — Resets at midnight UTC
  • 7-day retention — Automatic cleanup of older traces
  • All features enabled — No paywalled functionality
For unlimited usage, deploy with custom configuration or consider the managed cloud offering.
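The daily cap described above amounts to a counter that resets at midnight UTC. A small sketch of that behavior — class and method names here are hypothetical, not taken from the Lumina codebase:

```typescript
// Daily trace quota with a reset at midnight UTC.
class DailyQuota {
  private count = 0;
  private day = '';

  constructor(private readonly limit: number) {}

  // YYYY-MM-DD in UTC; changes exactly at midnight UTC.
  private utcDay(now: Date): string {
    return now.toISOString().slice(0, 10);
  }

  // Returns true if the trace is accepted, false once today's limit is hit.
  tryConsume(now: Date = new Date()): boolean {
    const today = this.utcDay(now);
    if (today !== this.day) {
      this.day = today;
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count++;
    return true;
  }
}
```

With `limit` set to 50,000 this matches the documented behavior: traces past the cap are rejected until the UTC date rolls over.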

Architecture

┌─────────────────────┐
│  Application        │
│  + @uselumina/sdk   │
└──────────┬──────────┘
           │ OTLP/HTTP
           v
┌──────────────────────────────────────────────┐
│  Lumina Platform                             │
│                                              │
│  ┌──────────┐    ┌──────────┐   ┌────────┐ │
│  │Ingestion │───►│   NATS   │──►│Workers │ │
│  │  :9411   │    │  Queue   │   │(Async) │ │
│  └──────────┘    └──────────┘   └───┬────┘ │
│                                      │      │
│  ┌──────────┐    ┌──────────────────▼────┐ │
│  │ Query    │◄───│    PostgreSQL         │ │
│  │ API      │    │  (Traces + Analytics) │ │
│  │  :8081   │    └───────────────────────┘ │
│  └────┬─────┘                               │
│       │                                     │
│  ┌────▼──────┐   ┌──────────────┐          │
│  │ Dashboard │   │Replay Engine │          │
│  │  :3000    │   │    :8082     │          │
│  └───────────┘   └──────────────┘          │
└──────────────────────────────────────────────┘
Components:
  • Ingestion Service — Receives OTLP traces over HTTP. Publishes to NATS for async processing.
  • Worker Pool — Consumes traces from queue. Calculates costs and persists to PostgreSQL.
  • Query API — REST endpoints for trace retrieval and analytics. Powers dashboard.
  • Replay Engine — Captures production traces and re-executes with modified parameters.
  • Dashboard — Next.js application for visualization and trace inspection.
See Architecture Documentation for full details.
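Since ingestion speaks standard OTLP/HTTP, any client that can emit an OTLP JSON trace export can feed the pipeline above. A minimal sketch of such a payload — field names follow the OTLP JSON encoding, the trace/span IDs are dummy values, and the endpoint matches the diagram:

```typescript
// Build a minimal OTLP/HTTP JSON trace export for a single span.
function buildOtlpPayload(serviceName: string, spanName: string) {
  const startMs = Date.now();
  // OTLP timestamps are nanoseconds since the epoch, encoded as strings.
  const toNanos = (ms: number) => `${ms}000000`;
  return {
    resourceSpans: [
      {
        resource: {
          attributes: [
            { key: 'service.name', value: { stringValue: serviceName } },
          ],
        },
        scopeSpans: [
          {
            scope: { name: 'manual-example' },
            spans: [
              {
                traceId: '0123456789abcdef0123456789abcdef', // dummy 16-byte hex ID
                spanId: '0123456789abcdef', // dummy 8-byte hex ID
                name: spanName,
                kind: 1, // SPAN_KIND_INTERNAL
                startTimeUnixNano: toNanos(startMs),
                endTimeUnixNano: toNanos(startMs + 5),
              },
            ],
          },
        ],
      },
    ],
  };
}

// Sending it to the ingestion service (not executed here):
// await fetch('http://localhost:9411/v1/traces', {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json' },
//   body: JSON.stringify(buildOtlpPayload('my-service', 'chat-completion')),
// });
```

In practice the SDK constructs and batches these exports for you; the sketch only shows the wire format the ingestion service consumes.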

Use Cases

Production Monitoring

Track all LLM calls across microservices. Identify expensive queries, slow endpoints, and quality degradations.

Cost Optimization

Analyze spending by model, service, and user. Find opportunities to switch models or optimize prompts.

Regression Testing

Test prompt changes against real production queries before deployment. Catch quality regressions with semantic scoring.

Debugging

Reproduce production issues with full trace context. View prompt, response, model parameters, and execution timeline.

Compliance

Audit all AI interactions with complete logs. Filter by user, timestamp, or custom metadata.
