Lumina Architecture
System Overview
Lumina is a microservices-based observability platform for LLM applications, consisting of three main services with async message processing via NATS JetStream and caching via Redis.
┌─────────────────────────────────────────────────┐
│ CLIENT APPLICATIONS │
│ Next.js │ Node.js │ Python │ Other Apps │
└──────────────────┬──────────────────────────────┘
│
▼
┌─────────────────┐
│ @lumina/sdk │
│ (Trace Wrapper)│
└────────┬────────┘
│ POST /v1/traces
▼
┌──────────────────────────────────────────────────────────────────┐
│ LUMINA PLATFORM │
│ │
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ INGESTION :9411 │ │ QUERY API :8081 │ │
│ ├─────────────────────┤ ├─────────────────────┤ │
│ │ • Receive Traces │ │ • Query Traces │ │
│ │ • Validate │ │ • Analytics Engine │ │
│ │ • Publish to NATS │ │ • Filter Engine │ │
│ │ • Queue Consumer │ │ • Aggregations │ │
│ └──────────┬──────────┘ └──────────┬──────────┘ │
│ │ │ │
│ ▼ │ │
│ ┌──────────────────────┐ │ │
│ │ NATS JetStream │ │ │
│ │ :4222 / :8222 │ │ │
│ │ • Async Queue │ │ │
│ │ • Persistence │ │ │
│ │ • Replay Buffer │ │ │
│ └──────────┬───────────┘ │ │
│ │ │ │
│ └───────────┐ ┌───────────┘ │
│ ▼ ▼ │
│ ┌──────────────────┐ ┌──────────────┐ │
│ │ PostgreSQL │ │ Redis │ │
│ │ :5432 │ │ :6379 │ │
│ │ • traces │ │ • Cache │ │
│ │ • replay_sets │ │ • Semantic │ │
│ │ • replay_results │ │ Scores │ │
│ └────────┬─────────┘ └──────────────┘ │
│ │ │
│ ┌─────────────┴────────────┐ │
│ ▼ │ │
│ ┌─────────────────────┐ │ │
│ │ REPLAY ENGINE :8082 │ │ │
│ ├─────────────────────┤ │ │
│ │ • Capture Sets │ │ │
│ │ • Execute Replays │ │ │
│ │ • Compare Results │ │ │
│ │ • Diff Engine │ │ │
│ └─────────────────────┘ │ │
│ │ │
│ ┌──────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────┐ │
│ │ @lumina/core (Shared) │ │
│ ├─────────────────────────────────────────┤ │
│ │ • Cost Calculator • Hash/Similarity │ │
│ │ • Diff Engine • Baseline Engine │ │
│ │ • Alert Engine • Semantic Scorer │ │
│ └─────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────┘
Component Details
1. Client Applications
Any application instrumented with the Lumina SDK:
- Next.js web applications
- Node.js backend services
- Python services (via OpenTelemetry)
- Any OpenTelemetry-compatible client
2. Lumina SDK (@lumina/sdk)
Location: /packages/sdk
Responsibility: Client-side instrumentation
Key Features:
- Wraps LLM API calls with traceLLM()
- Automatically extracts token usage, cost, latency
- Sends traces to Ingestion Service
- Minimal performance overhead
- Configurable endpoints and metadata
Example:
const lumina = initLumina({
  endpoint: 'http://localhost:9411/v1/traces',
  service_name: 'my-app',
});

const result = await lumina.traceLLM(
  async () => anthropic.messages.create(...),
  { name: 'chat', system: 'anthropic', prompt: '...' }
);
3. Ingestion Service (Port 9411)
Location: /services/ingestion
Responsibility: Receive and store traces
Data Flow:
- Receive - Accept OpenTelemetry JSON traces via POST
- Validate - Check required fields, data types
- Enrich - Calculate costs, extract tokens, add timestamps
- Publish - Hand the enriched trace to NATS JetStream (subject traces.ingest)
- Store - The queue consumer inserts it into the PostgreSQL traces table (see the handler sketch at the end of this section)
Key Endpoints:
- POST /v1/traces - Ingest traces
- GET /health - Health check
Storage Schema:
CREATE TABLE traces (
trace_id VARCHAR(255),
span_id VARCHAR(255),
service_name TEXT,
model TEXT,
prompt TEXT,
response TEXT,
cost_usd DECIMAL(10, 6),
latency_ms INTEGER,
prompt_tokens INTEGER,
completion_tokens INTEGER,
tags TEXT[],
metadata JSONB,
timestamp TIMESTAMPTZ,
PRIMARY KEY (trace_id, span_id)
);
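To make the flow concrete, here is a minimal sketch of the ingest handler, assuming Hono (the framework the services use) and a publishTrace helper standing in for the real NATS client; the field names checked here are assumptions, not the actual service code.

```typescript
// Minimal ingest-handler sketch: receive → validate → enrich → publish.
// `publishTrace` stands in for the client in src/queue/nats-client.ts;
// its name and signature are assumptions.
import { Hono } from "hono";
import { calculateCost } from "@lumina/core";
import { publishTrace } from "./queue/nats-client";

const app = new Hono();

app.post("/v1/traces", async (c) => {
  const span = await c.req.json();

  // Validate: required identifiers must be present.
  if (!span.trace_id || !span.span_id || !span.service_name) {
    return c.json({ error: "trace_id, span_id, and service_name are required" }, 400);
  }

  // Enrich: derive cost from token counts, stamp arrival time if missing.
  const enriched = {
    ...span,
    cost_usd: calculateCost(span.model, span.prompt_tokens, span.completion_tokens),
    timestamp: span.timestamp ?? new Date().toISOString(),
  };

  // Publish: hand off to NATS JetStream; the queue consumer performs the INSERT.
  await publishTrace("traces.ingest", enriched);

  return c.json({ status: "ok" }, 200);
});

export default app; // Bun serves the default export
```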
4. NATS JetStream (Ports 4222, 8222)
Location: Runs via Docker Compose (infra/docker/docker-compose.yml)
Responsibility: Async message queue for trace processing
Key Features:
- Persistent message storage (file-based)
- At-least-once delivery guarantee
- Message replay capability
- Automatic reconnection and retry
- Stream configuration with retention policies
Configuration:
// Stream: TRACES
// Subject: traces.ingest
// Retention: 24 hours, 1M messages, or 1GB (whichever comes first)
// Storage: File-based persistence
// Discard: Oldest messages when full
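A sketch of how that stream could be declared with the NATS.js client; the limit values mirror the configuration comments above, while the connection details are assumptions.

```typescript
// Declare the TRACES stream with the documented retention limits.
import { connect, RetentionPolicy, StorageType, DiscardPolicy } from "nats";

const nc = await connect({ servers: "localhost:4222" });
const jsm = await nc.jetstreamManager();

await jsm.streams.add({
  name: "TRACES",
  subjects: ["traces.ingest"],
  retention: RetentionPolicy.Limits, // age/count/size limits, whichever is hit first
  max_age: 24 * 60 * 60 * 1e9,       // 24 hours, expressed in nanoseconds
  max_msgs: 1_000_000,               // 1M messages
  max_bytes: 1024 ** 3,              // 1 GB
  storage: StorageType.File,         // file-based persistence
  discard: DiscardPolicy.Old,        // drop the oldest messages when full
});
```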
Use Cases:
- Async Trace Processing - Decouples ingestion from database writes
- Buffer Spike Traffic - Handles bursts of traces without overwhelming database
- Replay Failed Messages - Automatically retries failed trace insertions
- Observability - Monitor queue depth and processing lag
Client Implementation: /services/ingestion/src/queue/nats-client.ts
5. Redis (Port 6379)
Location: Runs via Docker Compose (infra/docker/docker-compose.yml)
Responsibility: Caching layer for semantic scores and frequent queries
Key Features:
- In-memory key-value store
- Fast read/write operations
- Persistence with AOF (Append-Only File)
- TTL support for cache expiration
Use Cases:
- Semantic Score Caching - Cache embedding-based similarity scores
- Query Result Caching - Cache frequent analytics queries
- Session Storage - Store temporary session data (future)
- Rate Limiting - Track API request counts per key (future)
Future Integration: Redis is provisioned in the MVP but not yet fully integrated; the sketch below shows the intended cache pattern.
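Since the integration is still pending, the following is only a sketch of the intended semantic-score cache, assuming the node-redis client; the key scheme, TTL, and scoring helper are all assumptions.

```typescript
// Cache embedding-based similarity scores so repeated comparisons are cheap.
import { createClient } from "redis";

const redis = createClient({ url: "redis://localhost:6379" });
await redis.connect();

// Placeholder for the expensive embedding-based comparison (assumed helper).
async function computeSemanticScore(traceId: string, resultId: string): Promise<number> {
  return 0.95; // stand-in value
}

// Return a cached score if present; otherwise compute and cache it for an hour.
async function getSemanticScore(traceId: string, resultId: string): Promise<number> {
  const key = `semscore:${traceId}:${resultId}`;
  const cached = await redis.get(key);
  if (cached !== null) return Number(cached);

  const score = await computeSemanticScore(traceId, resultId);
  await redis.set(key, String(score), { EX: 3600 }); // TTL-based expiration
  return score;
}
```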
6. Query Service (Port 8081)
Location: /services/query
Responsibility: Query traces and generate analytics
Key Features:
- Flexible filtering (service, model, tags, date range, cost, latency)
- Pagination support
- Cost analytics with aggregation
- Latency percentile calculations
- Tag-based search
Key Endpoints:
- GET /api/traces - Query traces with filters
- GET /api/traces/{traceId} - Get a specific trace
- GET /api/analytics/cost - Cost analytics
- GET /api/analytics/latency - Latency analytics
Analytics Capabilities:
// Cost Analytics
- Total cost across time ranges
- Average cost per trace
- Cost breakdown by service/model
- Cost trends over time
// Latency Analytics
- P50, P95, P99 percentiles
- Average latency
- Latency breakdown by service/model/endpoint
- Latency trends
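As a sketch, the percentile analytics map directly onto PostgreSQL's ordered-set aggregates via the postgres library; the connection string and exact query shape in /services/query may differ.

```typescript
// P50/P95/P99 and average latency per service over the last 24 hours.
import postgres from "postgres";

const sql = postgres("postgres://localhost:5432/lumina");

const rows = await sql`
  SELECT
    service_name,
    percentile_cont(0.50) WITHIN GROUP (ORDER BY latency_ms) AS p50,
    percentile_cont(0.95) WITHIN GROUP (ORDER BY latency_ms) AS p95,
    percentile_cont(0.99) WITHIN GROUP (ORDER BY latency_ms) AS p99,
    avg(latency_ms) AS avg_latency_ms
  FROM traces
  WHERE timestamp > now() - interval '24 hours'
  GROUP BY service_name
`;
```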
7. Replay Engine (Port 8082)
Location: /services/replay
Responsibility: Regression testing via trace replay
Data Flow:
- Capture - Select traces and create replay set
- Execute - Re-run prompts through LLM APIs
- Compare - Calculate diffs using Diff Engine
- Store - Save results to the replay_results table
Key Endpoints:
- POST /replay/capture - Create replay set
- POST /replay/run - Execute replay
- GET /replay/{id} - Get status and summary
- GET /replay/{id}/diff - Get detailed comparisons
- GET /replay - List all replay sets
Storage Schema:
CREATE TABLE replay_sets (
replay_id UUID PRIMARY KEY,
name TEXT,
description TEXT,
trace_ids TEXT[],
status TEXT, -- pending, running, completed, failed
total_traces INTEGER,
completed_traces INTEGER,
created_at TIMESTAMPTZ
);
CREATE TABLE replay_results (
result_id UUID PRIMARY KEY,
replay_id UUID REFERENCES replay_sets,
trace_id VARCHAR(255),
span_id VARCHAR(255),
original_response TEXT,
replay_response TEXT,
original_cost DECIMAL,
replay_cost DECIMAL,
original_latency INTEGER,
replay_latency INTEGER,
hash_similarity DECIMAL,
semantic_score DECIMAL,
diff_summary JSONB,
executed_at TIMESTAMPTZ,
FOREIGN KEY (trace_id, span_id) REFERENCES traces
);
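A condensed sketch of the execute → compare → store loop tying the two tables together; reExecutePrompt is an assumed helper standing in for the real LLM API call.

```typescript
// Replay every trace in a replay set, diff it against the original, store the result.
import postgres from "postgres";
import { compareTraces } from "@lumina/core";

const sql = postgres("postgres://localhost:5432/lumina");

// Assumed helper: re-runs the original prompt through the provider SDK.
async function reExecutePrompt(original: Record<string, any>) {
  return { response: "replayed output", cost_usd: 0, latency_ms: 0 }; // stand-in
}

async function runReplay(replayId: string) {
  // Load the captured traces referenced by the replay set.
  const traces = await sql`
    SELECT t.* FROM traces t
    WHERE t.trace_id = ANY(
      SELECT unnest(trace_ids) FROM replay_sets WHERE replay_id = ${replayId}
    )
  `;

  for (const original of traces) {
    const replay = await reExecutePrompt(original); // Execute
    const diff = compareTraces(original, replay);   // Compare → DiffResult

    await sql`
      INSERT INTO replay_results
        (result_id, replay_id, trace_id, span_id,
         original_response, replay_response, diff_summary, executed_at)
      VALUES
        (gen_random_uuid(), ${replayId}, ${original.trace_id}, ${original.span_id},
         ${original.response}, ${replay.response}, ${sql.json(diff)}, now())
    `; // Store
  }

  await sql`UPDATE replay_sets SET status = 'completed' WHERE replay_id = ${replayId}`;
}
```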
8. Core Library (@lumina/core)
Location: /packages/core
Responsibility: Shared business logic
Modules:
Cost Calculator
// Token-based cost calculation for various models
calculateCost(model, promptTokens, completionTokens);
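A sketch of the idea behind the calculator; the model names and per-million-token rates below are illustrative placeholders, not real pricing.

```typescript
// Cost = prompt tokens × prompt rate + completion tokens × completion rate.
type Pricing = { promptPerMTok: number; completionPerMTok: number };

// Hypothetical rate table; real per-model pricing would live here.
const PRICING: Record<string, Pricing> = {
  "example-model-small": { promptPerMTok: 0.25, completionPerMTok: 1.25 },
  "example-model-large": { promptPerMTok: 3.0, completionPerMTok: 15.0 },
};

function calculateCost(model: string, promptTokens: number, completionTokens: number): number {
  const rates = PRICING[model];
  if (!rates) return 0; // unknown models contribute no cost
  return (promptTokens * rates.promptPerMTok + completionTokens * rates.completionPerMTok) / 1_000_000;
}
```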
Hash/Similarity
// Text similarity using Levenshtein distance
textSimilarity(original, replay); // 0.0 - 1.0
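A sketch of the underlying idea: Levenshtein edit distance normalized by the longer string, so identical texts score 1.0 and fully divergent texts approach 0.0 (the library's actual implementation may differ).

```typescript
// Classic dynamic-programming Levenshtein distance.
function levenshtein(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i, ...Array(b.length).fill(0)]);
  for (let j = 0; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost);
    }
  }
  return dp[a.length][b.length];
}

function textSimilarity(original: string, replay: string): number {
  const maxLen = Math.max(original.length, replay.length);
  if (maxLen === 0) return 1.0; // two empty strings are identical
  return 1 - levenshtein(original, replay) / maxLen;
}
```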
Diff Engine
// Comprehensive comparison logic
compareTraces(original, replay); // Returns DiffResult
calculateSemanticScore(original, replay);
calculateCostDelta(originalCost, replayCost);
calculateLatencyDelta(originalLatency, replayLatency);
Baseline Engine
// Track performance baselines
calculateBaseline(traces);
detectDrift(current, baseline);
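A sketch of one way this pair can fit together; averaging cost and latency and the 20% drift tolerance are assumptions.

```typescript
// Baseline = average cost/latency over a window of traces;
// drift = deviation beyond a relative tolerance.
interface Baseline { avgCost: number; avgLatency: number }

function calculateBaseline(traces: { cost_usd: number; latency_ms: number }[]): Baseline {
  const n = traces.length || 1; // avoid division by zero on an empty window
  return {
    avgCost: traces.reduce((sum, t) => sum + t.cost_usd, 0) / n,
    avgLatency: traces.reduce((sum, t) => sum + t.latency_ms, 0) / n,
  };
}

function detectDrift(current: Baseline, baseline: Baseline, tolerance = 0.2): boolean {
  const costDrift = Math.abs(current.avgCost - baseline.avgCost) / (baseline.avgCost || 1);
  const latencyDrift = Math.abs(current.avgLatency - baseline.avgLatency) / (baseline.avgLatency || 1);
  return costDrift > tolerance || latencyDrift > tolerance;
}
```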
Alert Engine
// Threshold-based alerting
checkCostThreshold(cost, threshold);
checkLatencyThreshold(latency, threshold);
Data Flow Diagrams
Trace Ingestion Flow
(The diagram shows a direct database insert for readability; in the running system the Ingestion Service publishes to NATS JetStream and the queue consumer performs the INSERT.)
┌─────────────┐ ┌──────────────┐ ┌──────────────────┐ ┌────────────┐
│ Application │ │ @lumina/sdk │ │ Ingestion Service│ │ PostgreSQL │
└──────┬──────┘ └──────┬───────┘ └────────┬─────────┘ └─────┬──────┘
│ │ │ │
│ traceLLM(fn, metadata) │ │ │
│───────────────────────>│ │ │
│ │ │ │
│ Execute LLM call │ │ │
│<───────────────────────│ │ │
│ │ │ │
│ Return result + │ │ │
│ metadata │ │ │
│───────────────────────>│ │ │
│ │ Extract tokens, │ │
│ │ cost, latency │ │
│ │ │ │
│ │ POST /v1/traces │ │
│ │──────────────────────────>│ │
│ │ │ Validate trace data │
│ │ │ │
│ │ │ Enrich with calcs │
│ │ │ │
│ │ │ INSERT INTO traces │
│ │ │──────────────────────>│
│ │ │ │
│ │ │ Success │
│ │ │<──────────────────────│
│ │ │ │
│ │ 200 OK │ │
│ │<──────────────────────────│ │
│ │ │ │
│ Return original result│ │ │
│<───────────────────────│ │ │
│ │ │ │
Query Flow
┌────────┐ ┌───────────────┐ ┌──────────────┐ ┌────────────┐
│ Client │ │ Query Service │ │ @lumina/core │ │ PostgreSQL │
└───┬────┘ └───────┬───────┘ └──────┬───────┘ └─────┬──────┘
│ │ │ │
│ GET /api/traces? │ │ │
│ service=my-app │ │ │
│─────────────────────>│ │ │
│ │ Build SQL query │ │
│ │ with filters │ │
│ │ │ │
│ │ SELECT with WHERE clause │
│ │──────────────────────────────────────────>│
│ │ │ │
│ │ Return matching traces │
│ │<──────────────────────────────────────────│
│ │ │ │
│ │ Apply pagination │ │
│ │ │ │
│ JSON response │ │ │
│ (data + pagination) │ │ │
│<─────────────────────│ │ │
│ │ │ │
Replay Flow
┌────────┐ ┌────────────────┐ ┌────────────┐ ┌─────────┐ ┌────────────┐
│ Client │ │ Replay Service │ │ PostgreSQL │ │ LLM API │ │ Diff Engine│
└───┬────┘ └────────┬───────┘ └─────┬──────┘ └────┬────┘ └──────┬─────┘
│ │ │ │ │
│ POST /replay/capture │ │ │
│───────────────────>│ │ │ │
│ │ Validate trace │ │ │
│ │ IDs exist │ │ │
│ │───────────────────>│ │ │
│ │ │ │ │
│ │ Traces found │ │ │
│ │<───────────────────│ │ │
│ │ │ │ │
│ │ INSERT INTO │ │ │
│ │ replay_sets │ │ │
│ │───────────────────>│ │ │
│ │ │ │ │
│ replay_id │ │ │ │
│<───────────────────│ │ │ │
│ │ │ │ │
│ POST /replay/run │ │ │ │
│ {replay_id} │ │ │ │
│───────────────────>│ │ │ │
│ │ SELECT traces │ │ │
│ │ from replay_set │ │ │
│ │───────────────────>│ │ │
│ │ │ │ │
│ │ Return traces │ │ │
│ │<───────────────────│ │ │
│ │ │ │ │
│ │ ╔════════════════════════════════════════════════╗ │
│ │ ║ Loop: For each trace ║ │
│ │ ╠════════════════════════════════════════════════╣ │
│ │ ║ Re-execute prompt ║ │
│ │───────────────────────────────────────────────────────>│
│ │ │ │ │
│ │ ║ New response + metadata ║ │
│ │<───────────────────────────────────────────────────────│
│ │ │ │ │
│ │ ║ compareTraces(original, replay) ║ │
│ │───────────────────────────────────────────────────────>│
│ │ │ │ │
│ │ ║ DiffResult ║ │
│ │<───────────────────────────────────────────────────────│
│ │ │ │ │
│ │ ║ INSERT INTO replay_results ║ │
│ │───────────────────>│ │ │
│ │ ╚════════════════════════════════════════════════╝ │
│ │ │ │ │
│ │ UPDATE replay_sets│ │ │
│ │ status='completed'│ │ │
│ │───────────────────>│ │ │
│ │ │ │ │
│ Success + stats │ │ │ │
│<───────────────────│ │ │ │
│ │ │ │ │
Technology Stack
Runtime & Language
- Bun - Fast JavaScript runtime (alternative to Node.js)
- TypeScript - Type-safe development
Web Framework
- Hono - Lightweight, fast web framework
Database & Storage
- PostgreSQL 14+ - Relational database with JSONB support
- postgres library - SQL client for Bun
- NATS JetStream 2.10 - Message queue with persistence
- Redis 7 - In-memory cache with AOF persistence
Client SDKs
- Anthropic SDK - For Claude API integration
- OpenTelemetry - For trace format compatibility
- NATS.js - NATS client for Node.js/Bun
Scalability Considerations
Current MVP Architecture
- Single PostgreSQL instance
- Services run on localhost
- Caching layer (Redis) provisioned but not yet fully integrated
- Single-node NATS JetStream message queue
Production Recommendations
- Database Scaling
  - Read replicas for the Query Service
  - Connection pooling (PgBouncer)
  - Partitioning the traces table by date
  - Archiving old traces to object storage
- Service Scaling
  - Horizontal scaling behind a load balancer
  - Container orchestration (Kubernetes)
  - Service mesh for inter-service communication
- Performance
  - Fully integrating the Redis cache for frequent queries
  - Scaling NATS JetStream consumers for async ingestion
  - CDN for static assets
  - Database indexes on common query patterns
- Reliability
  - Health checks and auto-recovery
  - Circuit breakers between services
  - Distributed tracing (Jaeger)
  - Monitoring (Prometheus + Grafana)
Security Considerations
Current MVP
- No authentication
- No encryption in transit
- No rate limiting
Production Requirements
- Authentication & Authorization
  - API key management
  - JWT tokens for user sessions
  - Role-based access control (RBAC)
- Encryption
  - TLS/HTTPS for all endpoints
  - Encrypted database connections
  - Encrypted data at rest
- Rate Limiting (see the sketch after this list)
  - Per-API-key limits
  - DDoS protection
  - Request throttling
- Data Privacy
  - PII detection and masking
  - Prompt/response sanitization
  - GDPR compliance features
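For the rate-limiting requirement, a fixed-window counter in Redis (already provisioned in the stack) is a common starting point; the window size, limit, and key scheme below are assumptions.

```typescript
// Allow up to `limit` requests per API key per 60-second window.
import { createClient } from "redis";

const redis = createClient({ url: "redis://localhost:6379" });
await redis.connect();

async function allowRequest(apiKey: string, limit = 100): Promise<boolean> {
  const windowKey = `ratelimit:${apiKey}:${Math.floor(Date.now() / 60_000)}`;
  const count = await redis.incr(windowKey);          // atomic per-window counter
  if (count === 1) await redis.expire(windowKey, 60); // window cleans itself up
  return count <= limit;
}
```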
Port Allocation
| Service | Port | Purpose |
|---|---|---|
| Ingestion | 9411 | OpenTelemetry compatibility (Zipkin port) |
| Query API | 8081 | Query and analytics |
| Replay Engine | 8082 | Replay testing |
| PostgreSQL | 5432 | Database (default) |
| NATS JetStream | 4222 / 8222 | Message queue (client / monitoring) |
| Redis | 6379 | Cache (default) |
| Example App | 3000 | Next.js demo application |
File Structure
lumina/
├── packages/
│ ├── sdk/ # Client SDK
│ └── core/ # Shared business logic
├── services/
│ ├── ingestion/ # Trace ingestion service
│ ├── query/ # Query API service
│ └── replay/ # Replay engine service
├── examples/
│ └── nextjs-rag/ # Example application
└── docs/ # Documentation
Future Enhancements
- Dashboard UI - Web interface for visualizing traces
- Real-time Streaming - WebSocket support for live traces
- Advanced Analytics - Machine learning-based insights
- Multi-tenancy - Support for multiple organizations
- Alerting - Webhook notifications for thresholds
- Semantic Search - Embedding-based similarity (currently hash-based)
- Distributed Tracing - Full span tree visualization
- Cost Forecasting - Predict future costs based on trends