Lumina Architecture
System Overview
Lumina is a microservices-based observability platform for LLM applications, consisting of three main services with async message processing via NATS JetStream and caching via Redis.
┌─────────────────────────────────────────────────┐
│ CLIENT APPLICATIONS │
│ Next.js │ Node.js │ Python │ Other Apps │
└──────────────────┬──────────────────────────────┘
│
▼
┌─────────────────┐
│ @lumina/sdk │
│ (Trace Wrapper)│
└────────┬────────┘
│ POST /v1/traces
▼
┌──────────────────────────────────────────────────────────────────┐
│ LUMINA PLATFORM │
│ │
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ INGESTION :9411 │ │ QUERY API :8081 │ │
│ ├─────────────────────┤ ├─────────────────────┤ │
│ │ • Receive Traces │ │ • Query Traces │ │
│ │ • Validate │ │ • Analytics Engine │ │
│ │ • Publish to NATS │ │ • Filter Engine │ │
│ │ • Queue Consumer │ │ • Aggregations │ │
│ └──────────┬──────────┘ └──────────┬──────────┘ │
│ │ │ │
│ ▼ │ │
│ ┌──────────────────────┐ │ │
│ │ NATS JetStream │ │ │
│ │ :4222 / :8222 │ │ │
│ │ • Async Queue │ │ │
│ │ • Persistence │ │ │
│ │ • Replay Buffer │ │ │
│ └──────────┬───────────┘ │ │
│ │ │ │
│ └───────────┐ ┌───────────┘ │
│ ▼ ▼ │
│ ┌──────────────────┐ ┌──────────────┐ │
│ │ PostgreSQL │ │ Redis │ │
│ │ :5432 │ │ :6379 │ │
│ │ • traces │ │ • Cache │ │
│ │ • replay_sets │ │ • Semantic │ │
│ │ • replay_results │ │ Scores │ │
│ └────────┬─────────┘ └──────────────┘ │
│ │ │
│ ┌─────────────┴────────────┐ │
│ ▼ │ │
│ ┌─────────────────────┐ │ │
│ │ REPLAY ENGINE :8082 │ │ │
│ ├─────────────────────┤ │ │
│ │ • Capture Sets │ │ │
│ │ • Execute Replays │ │ │
│ │ • Compare Results │ │ │
│ │ • Diff Engine │ │ │
│ └─────────────────────┘ │ │
│ │ │
│ ┌──────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────┐ │
│ │ @lumina/core (Shared) │ │
│ ├─────────────────────────────────────────┤ │
│ │ • Cost Calculator • Hash/Similarity │ │
│ │ • Diff Engine • Baseline Engine │ │
│ │ • Alert Engine • Semantic Scorer │ │
│ └─────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────┘
Component Details
1. Client Applications
Any application instrumented with the Lumina SDK:
- Next.js web applications
- Node.js backend services
- Python services (via OpenTelemetry)
- Any OpenTelemetry-compatible client
2. Lumina SDK (@lumina/sdk)
Location: /packages/sdk
Responsibility: Client-side instrumentation
Key Features:
- Wraps LLM API calls with traceLLM()
- Automatically extracts token usage, cost, latency
- Sends traces to Ingestion Service
- Minimal performance overhead
- Configurable endpoints and metadata
Example:
const lumina = initLumina({
  endpoint: 'http://localhost:9411/v1/traces',
  service_name: 'my-app',
});

const result = await lumina.traceLLM(
  async () => anthropic.messages.create(...),
  { name: 'chat', system: 'anthropic', prompt: '...' }
);
3. Ingestion Service (Port 9411)
Location: /services/ingestion
Responsibility: Receive and store traces
Data Flow:
- Receive - Accept OpenTelemetry JSON traces via POST
- Validate - Check required fields, data types
- Enrich - Calculate costs, extract tokens, add timestamps
- Publish - Hand the enriched trace to NATS JetStream (subject traces.ingest)
- Store - The queue consumer inserts it into the PostgreSQL traces table (see the handler sketch at the end of this section)
Key Endpoints:
- POST /v1/traces - Ingest traces
- GET /health - Health check
Storage Schema:
CREATE TABLE traces (
trace_id VARCHAR(255),
span_id VARCHAR(255),
service_name TEXT,
model TEXT,
prompt TEXT,
response TEXT,
cost_usd DECIMAL(10, 6),
latency_ms INTEGER,
prompt_tokens INTEGER,
completion_tokens INTEGER,
tags TEXT[],
metadata JSONB,
timestamp TIMESTAMPTZ,
PRIMARY KEY (trace_id, span_id)
);
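To make the flow concrete, here is a minimal sketch of the ingest handler, assuming Hono (the framework the services use) and a publishTrace helper standing in for the real NATS client; the field names checked here are assumptions, not the actual service code.

```typescript
// Minimal ingest-handler sketch: receive → validate → enrich → publish.
// `publishTrace` stands in for the client in src/queue/nats-client.ts;
// its name and signature are assumptions.
import { Hono } from "hono";
import { calculateCost } from "@lumina/core";
import { publishTrace } from "./queue/nats-client";

const app = new Hono();

app.post("/v1/traces", async (c) => {
  const span = await c.req.json();

  // Validate: required identifiers must be present.
  if (!span.trace_id || !span.span_id || !span.service_name) {
    return c.json({ error: "trace_id, span_id, and service_name are required" }, 400);
  }

  // Enrich: derive cost from token counts, stamp arrival time if missing.
  const enriched = {
    ...span,
    cost_usd: calculateCost(span.model, span.prompt_tokens, span.completion_tokens),
    timestamp: span.timestamp ?? new Date().toISOString(),
  };

  // Publish: hand off to NATS JetStream; the queue consumer performs the INSERT.
  await publishTrace("traces.ingest", enriched);

  return c.json({ status: "ok" }, 200);
});

export default app; // Bun serves the default export
```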
4. NATS JetStream (Ports 4222, 8222)
Location: Runs via Docker Compose (infra/docker/docker-compose.yml)
Responsibility: Async message queue for trace processing
Key Features:
- Persistent message storage (file-based)
- At-least-once delivery guarantee
- Message replay capability
- Automatic reconnection and retry
- Stream configuration with retention policies
Configuration:
// Stream: TRACES
// Subject: traces.ingest
// Retention: 24 hours, 1M messages, or 1GB (whichever comes first)
// Storage: File-based persistence
// Discard: Oldest messages when full
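A sketch of how that stream could be declared with the NATS.js client; the limit values mirror the configuration comments above, while the connection details are assumptions.

```typescript
// Declare the TRACES stream with the documented retention limits.
import { connect, RetentionPolicy, StorageType, DiscardPolicy } from "nats";

const nc = await connect({ servers: "localhost:4222" });
const jsm = await nc.jetstreamManager();

await jsm.streams.add({
  name: "TRACES",
  subjects: ["traces.ingest"],
  retention: RetentionPolicy.Limits, // age/count/size limits, whichever is hit first
  max_age: 24 * 60 * 60 * 1e9,       // 24 hours, expressed in nanoseconds
  max_msgs: 1_000_000,               // 1M messages
  max_bytes: 1024 ** 3,              // 1 GB
  storage: StorageType.File,         // file-based persistence
  discard: DiscardPolicy.Old,        // drop the oldest messages when full
});
```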
Use Cases:
- Async Trace Processing - Decouples ingestion from database writes
- Buffer Spike Traffic - Handles bursts of traces without overwhelming database
- Replay Failed Messages - Automatically retries failed trace insertions
- Observability - Monitor queue depth and processing lag
Client Implementation: /services/ingestion/src/queue/nats-client.ts
5. Redis (Port 6379)
Location: Runs via Docker Compose (infra/docker/docker-compose.yml)
Responsibility: Caching layer for semantic scores and frequent queries
Key Features:
- In-memory key-value store
- Fast read/write operations
- Persistence with AOF (Append-Only File)
- TTL support for cache expiration
Use Cases:
- Semantic Score Caching - Cache embedding-based similarity scores
- Query Result Caching - Cache frequent analytics queries
- Session Storage - Store temporary session data (future)
- Rate Limiting - Track API request counts per key (future)
Future Integration: Redis is provisioned in the MVP but not yet fully integrated; the sketch below shows the intended cache pattern.
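Since the integration is still pending, the following is only a sketch of the intended semantic-score cache, assuming the node-redis client; the key scheme, TTL, and scoring helper are all assumptions.

```typescript
// Cache embedding-based similarity scores so repeated comparisons are cheap.
import { createClient } from "redis";

const redis = createClient({ url: "redis://localhost:6379" });
await redis.connect();

// Placeholder for the expensive embedding-based comparison (assumed helper).
async function computeSemanticScore(traceId: string, resultId: string): Promise<number> {
  return 0.95; // stand-in value
}

// Return a cached score if present; otherwise compute and cache it for an hour.
async function getSemanticScore(traceId: string, resultId: string): Promise<number> {
  const key = `semscore:${traceId}:${resultId}`;
  const cached = await redis.get(key);
  if (cached !== null) return Number(cached);

  const score = await computeSemanticScore(traceId, resultId);
  await redis.set(key, String(score), { EX: 3600 }); // TTL-based expiration
  return score;
}
```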
6. Query Service (Port 8081)
Location: /services/query
Responsibility: Query traces and generate analytics
Key Features:
- Flexible filtering (service, model, tags, date range, cost, latency)
- Pagination support
- Cost analytics with aggregation
- Latency percentile calculations
- Tag-based search
Key Endpoints:
- GET /api/traces - Query traces with filters
- GET /api/traces/{traceId} - Get a specific trace
- GET /api/analytics/cost - Cost analytics
- GET /api/analytics/latency - Latency analytics
Analytics Capabilities:
// Cost Analytics
- Total cost across time ranges
- Average cost per trace
- Cost breakdown by service/model
- Cost trends over time
// Latency Analytics
- P50, P95, P99 percentiles
- Average latency
- Latency breakdown by service/model/endpoint
- Latency trends
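As a sketch, the percentile analytics map directly onto PostgreSQL's ordered-set aggregates via the postgres library; the connection string and exact query shape in /services/query may differ.

```typescript
// P50/P95/P99 and average latency per service over the last 24 hours.
import postgres from "postgres";

const sql = postgres("postgres://localhost:5432/lumina");

const rows = await sql`
  SELECT
    service_name,
    percentile_cont(0.50) WITHIN GROUP (ORDER BY latency_ms) AS p50,
    percentile_cont(0.95) WITHIN GROUP (ORDER BY latency_ms) AS p95,
    percentile_cont(0.99) WITHIN GROUP (ORDER BY latency_ms) AS p99,
    avg(latency_ms) AS avg_latency_ms
  FROM traces
  WHERE timestamp > now() - interval '24 hours'
  GROUP BY service_name
`;
```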
7. Replay Engine (Port 8082)
Location: /services/replay
Responsibility: Regression testing via trace replay
Data Flow:
- Capture - Select traces and create replay set
- Execute - Re-run prompts through LLM APIs
- Compare - Calculate diffs using Diff Engine
- Store - Save results to the replay_results table
Key Endpoints:
- POST /replay/capture - Create replay set
- POST /replay/run - Execute replay
- GET /replay/{id} - Get status and summary
- GET /replay/{id}/diff - Get detailed comparisons
- GET /replay - List all replay sets
Storage Schema:
CREATE TABLE replay_sets (
replay_id UUID PRIMARY KEY,
name TEXT,
description TEXT,
trace_ids TEXT[],
status TEXT, -- pending, running, completed, failed
total_traces INTEGER,
completed_traces INTEGER,
created_at TIMESTAMPTZ
);
CREATE TABLE replay_results (
result_id UUID PRIMARY KEY,
replay_id UUID REFERENCES replay_sets,
trace_id VARCHAR(255),
span_id VARCHAR(255),
original_response TEXT,
replay_response TEXT,
original_cost DECIMAL,
replay_cost DECIMAL,
original_latency INTEGER,
replay_latency INTEGER,
hash_similarity DECIMAL,
semantic_score DECIMAL,
diff_summary JSONB,
executed_at TIMESTAMPTZ,
FOREIGN KEY (trace_id, span_id) REFERENCES traces
);
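A condensed sketch of the execute → compare → store loop tying the two tables together; reExecutePrompt is an assumed helper standing in for the real LLM API call.

```typescript
// Replay every trace in a replay set, diff it against the original, store the result.
import postgres from "postgres";
import { compareTraces } from "@lumina/core";

const sql = postgres("postgres://localhost:5432/lumina");

// Assumed helper: re-runs the original prompt through the provider SDK.
async function reExecutePrompt(original: Record<string, any>) {
  return { response: "replayed output", cost_usd: 0, latency_ms: 0 }; // stand-in
}

async function runReplay(replayId: string) {
  // Load the captured traces referenced by the replay set.
  const traces = await sql`
    SELECT t.* FROM traces t
    WHERE t.trace_id = ANY(
      SELECT unnest(trace_ids) FROM replay_sets WHERE replay_id = ${replayId}
    )
  `;

  for (const original of traces) {
    const replay = await reExecutePrompt(original); // Execute
    const diff = compareTraces(original, replay);   // Compare → DiffResult

    await sql`
      INSERT INTO replay_results
        (result_id, replay_id, trace_id, span_id,
         original_response, replay_response, diff_summary, executed_at)
      VALUES
        (gen_random_uuid(), ${replayId}, ${original.trace_id}, ${original.span_id},
         ${original.response}, ${replay.response}, ${sql.json(diff)}, now())
    `; // Store
  }

  await sql`UPDATE replay_sets SET status = 'completed' WHERE replay_id = ${replayId}`;
}
```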
8. Core Library (@lumina/core)
Location: /packages/core
Responsibility: Shared business logic
Modules:
Cost Calculator
// Token-based cost calculation for various models
calculateCost(model, promptTokens, completionTokens);
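A sketch of the idea behind the calculator; the model names and per-million-token rates below are illustrative placeholders, not real pricing.

```typescript
// Cost = prompt tokens × prompt rate + completion tokens × completion rate.
type Pricing = { promptPerMTok: number; completionPerMTok: number };

// Hypothetical rate table; real per-model pricing would live here.
const PRICING: Record<string, Pricing> = {
  "example-model-small": { promptPerMTok: 0.25, completionPerMTok: 1.25 },
  "example-model-large": { promptPerMTok: 3.0, completionPerMTok: 15.0 },
};

function calculateCost(model: string, promptTokens: number, completionTokens: number): number {
  const rates = PRICING[model];
  if (!rates) return 0; // unknown models contribute no cost
  return (promptTokens * rates.promptPerMTok + completionTokens * rates.completionPerMTok) / 1_000_000;
}
```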
Hash/Similarity
// Text similarity using Levenshtein distance
textSimilarity(original, replay); // 0.0 - 1.0
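A sketch of the underlying idea: Levenshtein edit distance normalized by the longer string, so identical texts score 1.0 and fully divergent texts approach 0.0 (the library's actual implementation may differ).

```typescript
// Classic dynamic-programming Levenshtein distance.
function levenshtein(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i, ...Array(b.length).fill(0)]);
  for (let j = 0; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost);
    }
  }
  return dp[a.length][b.length];
}

function textSimilarity(original: string, replay: string): number {
  const maxLen = Math.max(original.length, replay.length);
  if (maxLen === 0) return 1.0; // two empty strings are identical
  return 1 - levenshtein(original, replay) / maxLen;
}
```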
Diff Engine
// Comprehensive comparison logic
compareTraces(original, replay); // Returns DiffResult
calculateSemanticScore(original, replay);
calculateCostDelta(originalCost, replayCost);
calculateLatencyDelta(originalLatency, replayLatency);
Baseline Engine
// Track performance baselines
calculateBaseline(traces);
detectDrift(current, baseline);
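A sketch of one way this pair can fit together; averaging cost and latency and the 20% drift tolerance are assumptions.

```typescript
// Baseline = average cost/latency over a window of traces;
// drift = deviation beyond a relative tolerance.
interface Baseline { avgCost: number; avgLatency: number }

function calculateBaseline(traces: { cost_usd: number; latency_ms: number }[]): Baseline {
  const n = traces.length || 1; // avoid division by zero on an empty window
  return {
    avgCost: traces.reduce((sum, t) => sum + t.cost_usd, 0) / n,
    avgLatency: traces.reduce((sum, t) => sum + t.latency_ms, 0) / n,
  };
}

function detectDrift(current: Baseline, baseline: Baseline, tolerance = 0.2): boolean {
  const costDrift = Math.abs(current.avgCost - baseline.avgCost) / (baseline.avgCost || 1);
  const latencyDrift = Math.abs(current.avgLatency - baseline.avgLatency) / (baseline.avgLatency || 1);
  return costDrift > tolerance || latencyDrift > tolerance;
}
```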
Alert Engine
// Threshold-based alerting
checkCostThreshold(cost, threshold);
checkLatencyThreshold(latency, threshold);
Data Flow Diagrams
Trace Ingestion Flow
(The diagram shows a direct database insert for readability; in the running system the Ingestion Service publishes to NATS JetStream and the queue consumer performs the INSERT.)
┌─────────────┐ ┌──────────────┐ ┌──────────────────┐ ┌────────────┐
│ Application │ │ @lumina/sdk │ │ Ingestion Service│ │ PostgreSQL │
└──────┬──────┘ └──────┬───────┘ └────────┬─────────┘ └─────┬──────┘
│ │ │ │
│ traceLLM(fn, metadata) │ │ │
│───────────────────────>│ │ │
│ │ │ │
│ Execute LLM call │ │ │
│<───────────────────────│ │ │
│ │ │ │
│ Return result + │ │ │
│ metadata │ │ │
│───────────────────────>│ │ │
│ │ Extract tokens, │ │
│ │ cost, latency │ │
│ │ │ │
│ │ POST /v1/traces │ │
│ │──────────────────────────>│ │
│ │ │ Validate trace data │
│ │ │ │
│ │ │ Enrich with calcs │
│ │ │ │
│ │ │ INSERT INTO traces │
│ │ │──────────────────────>│
│ │ │ │
│ │ │ Success │
│ │ │<──────────────────────│
│ │ │ │
│ │ 200 OK │ │
│ │<──────────────────────────│ │
│ │ │ │
│ Return original result│ │ │
│<───────────────────────│ │ │
│ │ │ │
Query Flow
┌────────┐ ┌───────────────┐ ┌──────────────┐ ┌────────────┐
│ Client │ │ Query Service │ │ @lumina/core │ │ PostgreSQL │
└───┬────┘ └───────┬───────┘ └──────┬───────┘ └─────┬──────┘
│ │ │ │
│ GET /api/traces? │ │ │
│ service=my-app │ │ │
│─────────────────────>│ │ │
│ │ Build SQL query │ │
│ │ with filters │ │
│ │ │ │
│ │ SELECT with WHERE clause │
│ │──────────────────────────────────────────>│
│ │ │ │
│ │ Return matching traces │
│ │<──────────────────────────────────────────│
│ │ │ │
│ │ Apply pagination │ │
│ │ │ │
│ JSON response │ │ │
│ (data + pagination) │ │ │
│<─────────────────────│ │ │
│ │ │ │
Replay Flow
┌────────┐ ┌────────────────┐ ┌────────────┐ ┌─────────┐ ┌────────────┐
│ Client │ │ Replay Service │ │ PostgreSQL │ │ LLM API │ │ Diff Engine│
└───┬────┘ └────────┬───────┘ └─────┬──────┘ └────┬────┘ └──────┬─────┘
│ │ │ │ │
│ POST /replay/capture │ │ │
│───────────────────>│ │ │ │
│ │ Validate trace │ │ │
│ │ IDs exist │ │ │
│ │───────────────────>│ │ │
│ │ │ │ │
│ │ Traces found │ │ │
│ │<───────────────────│ │ │
│ │ │ │ │
│ │ INSERT INTO │ │ │
│ │ replay_sets │ │ │
│ │───────────────────>│ │ │
│ │ │ │ │
│ replay_id │ │ │ │
│<───────────────────│ │ │ │
│ │ │ │ │
│ POST /replay/run │ │ │ │
│ {replay_id} │ │ │ │
│───────────────────>│ │ │ │
│ │ SELECT traces │ │ │
│ │ from replay_set │ │ │
│ │───────────────────>│ │ │
│ │ │ │ │
│ │ Return traces │ │ │
│ │<───────────────────│ │ │
│ │ │ │ │
│ │ ╔════════════════════════════════════════════════╗ │
│ │ ║ Loop: For each trace ║ │
│ │ ╠════════════════════════════════════════════════╣ │
│ │ ║ Re-execute prompt ║ │
│ │───────────────────────────────────────────────────────>│
│ │ │ │ │
│ │ ║ New response + metadata ║ │
│ │<───────────────────────────────────────────────────────│
│ │ │ │ │
│ │ ║ compareTraces(original, replay) ║ │
│ │───────────────────────────────────────────────────────>│
│ │ │ │ │
│ │ ║ DiffResult ║ │
│ │<───────────────────────────────────────────────────────│
│ │ │ │ │
│ │ ║ INSERT INTO replay_results ║ │
│ │───────────────────>│ │ │
│ │ ╚════════════════════════════════════════════════╝ │
│ │ │ │ │
│ │ UPDATE replay_sets│ │ │
│ │ status='completed'│ │ │
│ │───────────────────>│ │ │
│ │ │ │ │
│ Success + stats │ │ │ │
│<───────────────────│ │ │ │
│ │ │ │ │
Technology Stack
Runtime & Language
- Bun - Fast JavaScript runtime (alternative to Node.js)
- TypeScript - Type-safe development
Web Framework
- Hono - Lightweight, fast web framework
Database & Storage
- PostgreSQL 14+ - Relational database with JSONB support
- postgres library - SQL client for Bun
- NATS JetStream 2.10 - Message queue with persistence
- Redis 7 - In-memory cache with AOF persistence
Client SDKs
- Anthropic SDK - For Claude API integration
- OpenTelemetry - For trace format compatibility
- NATS.js - NATS client for Node.js/Bun
Scalability Considerations
Current MVP Architecture
- Single PostgreSQL instance
- Services run on localhost
- Caching layer (Redis) provisioned but not yet fully integrated
- Single-node NATS JetStream message queue
Production Recommendations
- Database Scaling
  - Read replicas for the Query Service
  - Connection pooling (PgBouncer)
  - Partitioning the traces table by date
  - Archiving old traces to object storage
- Service Scaling
  - Horizontal scaling behind a load balancer
  - Container orchestration (Kubernetes)
  - Service mesh for inter-service communication
- Performance
  - Fully integrating the Redis cache for frequent queries
  - Scaling NATS JetStream consumers for async ingestion
  - CDN for static assets
  - Database indexes on common query patterns
- Reliability
  - Health checks and auto-recovery
  - Circuit breakers between services
  - Distributed tracing (Jaeger)
  - Monitoring (Prometheus + Grafana)
Security Considerations
Current MVP
- No authentication
- No encryption in transit
- No rate limiting
Production Requirements
- Authentication & Authorization
  - API key management
  - JWT tokens for user sessions
  - Role-based access control (RBAC)
- Encryption
  - TLS/HTTPS for all endpoints
  - Encrypted database connections
  - Encrypted data at rest
- Rate Limiting (see the sketch after this list)
  - Per-API-key limits
  - DDoS protection
  - Request throttling
- Data Privacy
  - PII detection and masking
  - Prompt/response sanitization
  - GDPR compliance features
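For the rate-limiting requirement, a fixed-window counter in Redis (already provisioned in the stack) is a common starting point; the window size, limit, and key scheme below are assumptions.

```typescript
// Allow up to `limit` requests per API key per 60-second window.
import { createClient } from "redis";

const redis = createClient({ url: "redis://localhost:6379" });
await redis.connect();

async function allowRequest(apiKey: string, limit = 100): Promise<boolean> {
  const windowKey = `ratelimit:${apiKey}:${Math.floor(Date.now() / 60_000)}`;
  const count = await redis.incr(windowKey);          // atomic per-window counter
  if (count === 1) await redis.expire(windowKey, 60); // window cleans itself up
  return count <= limit;
}
```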
Port Allocation
| Service | Port | Purpose |
|---|---|---|
| Ingestion | 9411 | OpenTelemetry compatibility (Zipkin port) |
| Query API | 8081 | Query and analytics |
| Replay Engine | 8082 | Replay testing |
| PostgreSQL | 5432 | Database (default) |
| NATS JetStream | 4222 / 8222 | Message queue (client / monitoring) |
| Redis | 6379 | Cache (default) |
| Example App | 3000 | Next.js demo application |
File Structure
lumina/
├── packages/
│ ├── sdk/ # Client SDK
│ └── core/ # Shared business logic
├── services/
│ ├── ingestion/ # Trace ingestion service
│ ├── query/ # Query API service
│ └── replay/ # Replay engine service
├── examples/
│ └── nextjs-rag/ # Example application
└── docs/ # Documentation
Future Enhancements
- Dashboard UI - Web interface for visualizing traces
- Real-time Streaming - WebSocket support for live traces
- Advanced Analytics - Machine learning-based insights
- Multi-tenancy - Support for multiple organizations
- Alerting - Webhook notifications for thresholds
- Semantic Search - Embedding-based similarity (currently hash-based)
- Distributed Tracing - Full span tree visualization
- Cost Forecasting - Predict future costs based on trends