Troubleshooting Guide

This guide covers common issues you might encounter when setting up and using Lumina.

Table of Contents

  • Installation Issues
  • Database Connection Issues
  • Service Startup Issues
  • Trace Ingestion Issues
  • Query API Issues
  • Replay Engine Issues
  • SDK Integration Issues
  • Performance Issues
  • Getting Help

Installation Issues

Problem: bun: command not found

Symptoms:

$ bun install
bash: bun: command not found

Solution: Install Bun by running:

curl -fsSL https://bun.sh/install | bash

Then restart your terminal or run:

source ~/.bashrc  # or ~/.zshrc

Verification:

bun --version

Problem: Package installation fails

Symptoms:

$ bun install
error: unable to resolve dependency

Solution:

  1. Clear Bun cache:
rm -rf ~/.bun/install/cache
  2. Delete node_modules and lockfile:
rm -rf node_modules bun.lockb
  3. Reinstall:
bun install

Database Connection Issues

Problem: database "lumina" does not exist

Symptoms:

Failed to connect to database: database "lumina" does not exist

Solution: Create the database:

createdb lumina

Or using psql:

psql postgres
CREATE DATABASE lumina;
\q

Verification:

psql -d lumina -c "SELECT 1"

Problem: PostgreSQL connection refused

Symptoms:

Failed to connect to database: connection refused
ECONNREFUSED localhost:5432

Solution:

  1. Check if PostgreSQL is running:
# macOS
brew services list | grep postgresql

# Linux
systemctl status postgresql
  2. Start PostgreSQL if not running:
# macOS
brew services start postgresql

# Linux
sudo systemctl start postgresql
  3. Verify port 5432 is listening:
lsof -i :5432

Problem: Authentication failed for user

Symptoms:

PostgresError: password authentication failed for user "username"

Solution:

  1. Check your connection string in environment:
echo $DATABASE_URL
  2. Update with correct credentials:
export DATABASE_URL="postgres://username:password@localhost:5432/lumina"
  3. For local development without password:
export DATABASE_URL="postgres://username@localhost:5432/lumina"

Problem: Table already exists error

Symptoms:

PostgresError: relation "traces" already exists

Solution: This is usually harmless. Tables are created with an IF NOT EXISTS clause. If you need to reset:

psql -d lumina -c "DROP TABLE IF EXISTS replay_results CASCADE; DROP TABLE IF EXISTS replay_sets CASCADE; DROP TABLE IF EXISTS traces CASCADE;"

Then restart the services to recreate tables.


Service Startup Issues

Problem: Port already in use

Symptoms:

Error: listen EADDRINUSE: address already in use :::9411

Solution:

  1. Find the process using the port:
# macOS/Linux
lsof -i :9411

# Or use netstat
netstat -tuln | grep 9411
  2. Kill the process:
kill -9 <PID>
  3. Or change the port:
# For ingestion service
PORT=9412 bun run dev

# For query service
cd services/query
PORT=8090 bun run dev

# For replay service
cd services/replay
PORT=8091 bun run dev

Problem: Service crashes immediately

Symptoms:

$ bun run dev
🚀 Starting service...
[ERROR] Uncaught Error: ...

Solution:

  1. Check logs for specific error message
  2. Verify DATABASE_URL is set:
echo $DATABASE_URL
  3. Ensure the database is accessible:
psql -d lumina -c "SELECT 1"
  4. Check for missing dependencies:
bun install

Problem: Database tables not created

Symptoms: Service starts but queries fail with “relation does not exist”

Solution:

  1. Check that initialize() is called on startup
  2. Manually create tables (a minimal sketch follows this list):
-- Connect to database
psql -d lumina

-- For ingestion service (traces table)
-- See /services/ingestion/src/database/postgres.ts for schema

-- For replay service (replay tables)
-- See /services/replay/src/database/postgres.ts for schema
  3. Restart the service after table creation
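
If you just need something to query against while debugging, the sketch below creates a minimal traces table with the same postgres client the services use. The column set is inferred from queries elsewhere in this guide and is an assumption; the authoritative schema lives in the postgres.ts files referenced above.

// schema-sketch.ts -- hypothetical helper, NOT the canonical schema.
// Column names are inferred from queries in this guide; verify against
// services/ingestion/src/database/postgres.ts before relying on it.
import postgres from 'postgres';

const sql = postgres(process.env.DATABASE_URL ?? 'postgres://localhost:5432/lumina');

await sql`
  CREATE TABLE IF NOT EXISTS traces (
    trace_id TEXT NOT NULL,
    span_id TEXT NOT NULL,
    service_name TEXT NOT NULL,
    name TEXT NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL,
    model TEXT,
    prompt TEXT,
    response TEXT,
    prompt_tokens INTEGER,
    completion_tokens INTEGER,
    cost_usd NUMERIC,
    latency_ms DOUBLE PRECISION,
    attributes JSONB,
    PRIMARY KEY (trace_id, span_id) -- composite key; replay_results references (trace_id, span_id)
  )
`;

await sql.end();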

Trace Ingestion Issues

Problem: Traces not appearing in database

Symptoms: SDK sends traces but database remains empty

Diagnostic Steps:

  1. Check ingestion service logs:
cd services/ingestion
bun run dev
# Look for incoming requests
  2. Test ingestion directly:
curl -X POST http://localhost:9411/v1/traces \
  -H "Content-Type: application/json" \
  -d '[{
    "traceId": "test123",
    "spanId": "span123",
    "serviceName": "test",
    "name": "test-trace",
    "timestamp": "'$(date -u +"%Y-%m-%dT%H:%M:%SZ")'"
  }]'
  3. Check the database:
psql -d lumina -c "SELECT COUNT(*) FROM traces;"

Common Causes:

  • Wrong ingestion endpoint in SDK config
  • Ingestion service not running
  • Database connection issue
  • Validation errors in trace data

Problem: Invalid trace format error

Symptoms:

{
  "error": "Invalid trace format",
  "message": "Missing required fields"
}

Solution: Ensure your traces include required fields:

{
  traceId: string,
  spanId: string,
  serviceName: string,
  name: string,
  timestamp: string (ISO 8601)
}

Check SDK configuration:

const lumina = initLumina({
  endpoint: 'http://localhost:9411/v1/traces',
  service_name: 'my-app', // Required
});
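
If you are constructing payloads by hand instead of using the SDK, a minimal request might look like the sketch below. The endpoint and field names come from the examples above; everything else (file name, values) is illustrative.

// send-test-trace.ts -- illustrative only; field names mirror the required-fields list above.
const trace = {
  traceId: crypto.randomUUID(),
  spanId: crypto.randomUUID(),
  serviceName: 'my-app',
  name: 'manual-test-trace',
  timestamp: new Date().toISOString(), // ISO 8601
};

const res = await fetch('http://localhost:9411/v1/traces', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  // the ingestion endpoint accepts an array of traces (see the curl example above)
  body: JSON.stringify([trace]),
});

console.log(res.status, await res.text());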

Problem: Cost not calculated

Symptoms: Traces ingested but cost_usd is 0 or null

Solution:

  1. Ensure token counts are provided in trace attributes:
{
  attributes: {
    'llm.usage.prompt_tokens': 100,
    'llm.usage.completion_tokens': 50,
    'llm.model': 'claude-sonnet-4-5'
  }
}
  2. Check whether the model is supported by the cost calculator (a sketch of the calculation follows this list):
// See packages/core/src/cost-calculator.ts
  3. Verify cost calculation is enabled in the ingestion service
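
As a sanity check on the numbers, cost is derived from the token counts and a per-model price table. The sketch below uses placeholder prices, not Lumina's actual rates; the real table lives in packages/core/src/cost-calculator.ts.

// cost-check.ts -- rough sanity check; prices are placeholder values,
// see packages/core/src/cost-calculator.ts for the real per-model rates.
const PRICES_PER_MILLION_TOKENS: Record<string, { prompt: number; completion: number }> = {
  'claude-sonnet-4-5': { prompt: 3, completion: 15 }, // placeholder USD values
};

function estimateCostUsd(model: string, promptTokens: number, completionTokens: number): number | null {
  const price = PRICES_PER_MILLION_TOKENS[model];
  if (!price) return null; // unsupported model -> cost_usd stays null, matching the symptom above
  return (promptTokens / 1_000_000) * price.prompt + (completionTokens / 1_000_000) * price.completion;
}

console.log(estimateCostUsd('claude-sonnet-4-5', 100, 50)); // expect a small, non-zero number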

Query API Issues

Problem: Query returns empty results

Symptoms:

curl "http://localhost:8081/api/traces"
# Returns: {"data": [], "pagination": {...}}

Diagnostic Steps:

  1. Check if traces exist:
psql -d lumina -c "SELECT COUNT(*) FROM traces;"
  2. Check query filters:
# Try without filters
curl "http://localhost:8081/api/traces?limit=10"

# Check specific service
curl "http://localhost:8081/api/traces?service=my-app"
  3. Check the date range (the default might exclude your traces):
curl "http://localhost:8081/api/traces?startDate=2024-01-01"

Problem: Query API returns 500 error

Symptoms:

{
  "error": "Internal server error",
  "message": "..."
}

Solution:

  1. Check Query service logs for detailed error
  2. Verify database connection:
psql -d lumina -c "SELECT 1"
  3. Check for SQL syntax errors in filters
  4. Ensure indexes exist for query performance (see Performance Issues below for example index definitions)

Problem: Analytics returns incorrect values

Symptoms: Cost or latency analytics show unexpected numbers

Solution:

  1. Verify data in database:
psql -d lumina

-- Check cost data
SELECT service_name, AVG(cost_usd), SUM(cost_usd)
FROM traces
GROUP BY service_name;

-- Check latency data
SELECT service_name, AVG(latency_ms)
FROM traces
GROUP BY service_name;
  2. Check for null values:
SELECT COUNT(*) FROM traces WHERE cost_usd IS NULL;
SELECT COUNT(*) FROM traces WHERE latency_ms IS NULL;
  3. Verify the aggregation logic in the Query service code (a cross-check query is sketched after this list)
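
To cross-check the service's numbers directly against the database, a query like the following can help. Column names follow the SQL checks above; the percentile math is standard PostgreSQL, not Lumina code.

// aggregation-check.ts -- cross-checks analytics results against the database directly.
import postgres from 'postgres';

const sql = postgres(process.env.DATABASE_URL ?? 'postgres://localhost:5432/lumina');

const rows = await sql`
  SELECT
    service_name,
    percentile_cont(0.5)  WITHIN GROUP (ORDER BY latency_ms) AS p50_latency_ms,
    percentile_cont(0.95) WITHIN GROUP (ORDER BY latency_ms) AS p95_latency_ms,
    SUM(cost_usd) AS total_cost_usd
  FROM traces
  WHERE latency_ms IS NOT NULL
  GROUP BY service_name
`;

console.table(rows);
await sql.end();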

Replay Engine Issues

Problem: Replay set creation fails

Symptoms:

{
  "error": "Invalid traces",
  "message": "Found 0 of 5 traces"
}

Solution:

  1. Verify trace IDs exist:
psql -d lumina -c "SELECT trace_id FROM traces LIMIT 10;"
  2. Use exact trace_id values from the database:
curl -X POST http://localhost:8082/replay/capture \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Test Replay",
    "traceIds": ["actual-trace-id-from-db"]
  }'

Problem: Replay execution fails

Symptoms: Replay status shows “failed” or execution hangs

Diagnostic Steps:

  1. Check replay service logs
  2. Verify traces have all required fields:
SELECT trace_id, span_id, prompt, response
FROM traces
WHERE trace_id = 'your-trace-id';
  3. Check for null prompts or responses:
SELECT COUNT(*) FROM traces
WHERE prompt IS NULL OR response IS NULL;

Solution:

  • Ensure original traces have complete data
  • Check for API rate limits if calling external LLM APIs
  • Verify LLM API credentials are configured

Problem: Diff results show 0 similarity

Symptoms: All replays show hash_similarity = 0

Solution:

  1. Check if responses are being captured:
SELECT original_response, replay_response
FROM replay_results
LIMIT 1;
  2. Verify the Diff Engine is working:
// Test similarity calculation
import { textSimilarity } from '@lumina/core';
const score = textSimilarity('hello', 'hello');
console.log(score); // Should be 1.0
  3. Check for encoding issues in text comparison

Problem: Foreign key constraint error

Symptoms:

PostgresError: there is no unique constraint matching given keys for referenced table

Solution: This was fixed in the codebase. If you still see this:

  1. Ensure replay_results table uses composite foreign key:
FOREIGN KEY (trace_id, span_id)
REFERENCES traces(trace_id, span_id)
  2. Verify trace_ids in replay_sets are TEXT[] not UUID[]:
ALTER TABLE replay_sets
ALTER COLUMN trace_ids TYPE TEXT[]
USING trace_ids::TEXT[];

SDK Integration Issues

Problem: Traces not sent from application

Symptoms: Application runs but no traces appear in Lumina

Diagnostic Steps:

  1. Enable debug logging:
const lumina = initLumina({
  debug: true,
  // ...other config
});
  2. Check network requests in application logs
  3. Verify the endpoint is reachable:
curl http://localhost:9411/health

Solution:

  • Check firewall rules
  • Ensure ingestion service is running
  • Verify endpoint URL is correct (no typos)
  • Check for CORS issues if calling from a browser (a header sketch follows this list)
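
Browsers block cross-origin requests unless the ingestion service answers the preflight and returns CORS headers. How Lumina's ingestion service wires its HTTP handler is not shown in this guide, so treat the following only as an illustration of the headers involved, not the actual server code.

// cors-sketch.ts -- illustration only; the real server lives in services/ingestion.
// Shows the headers a browser-based SDK needs to reach http://localhost:9411/v1/traces.
const CORS_HEADERS = {
  'Access-Control-Allow-Origin': '*', // or a specific origin such as http://localhost:3000
  'Access-Control-Allow-Methods': 'POST, OPTIONS',
  'Access-Control-Allow-Headers': 'Content-Type',
};

Bun.serve({
  port: 9411,
  fetch(req) {
    if (req.method === 'OPTIONS') {
      // Answer the preflight request the browser sends before the POST
      return new Response(null, { status: 204, headers: CORS_HEADERS });
    }
    // ...hand the request to the normal trace ingestion logic here...
    return new Response(JSON.stringify({ ok: true }), {
      headers: { 'Content-Type': 'application/json', ...CORS_HEADERS },
    });
  },
});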

Problem: TypeScript errors with SDK

Symptoms:

Type 'X' is not assignable to type 'Y'

Solution:

  1. Ensure @lumina/sdk is properly installed:
bun add @lumina/sdk
  2. Check TypeScript version compatibility:
bun add -D typescript@latest
  3. Import types correctly:
import { initLumina, type LuminaConfig } from '@lumina/sdk';

Problem: Token counts not captured

Symptoms: Traces ingested but prompt_tokens and completion_tokens are null

Solution:

  1. For Anthropic SDK, tokens are in response.usage:
const response = await anthropic.messages.create({...});
console.log(response.usage); // { input_tokens, output_tokens }
  2. The Lumina SDK extracts these automatically. If it does not:
// Manually pass token info
await lumina.traceLLM(async () => response, {
  name: 'test',
  system: 'anthropic',
  prompt: '...',
  metadata: {
    tokens: {
      prompt: response.usage.input_tokens,
      completion: response.usage.output_tokens,
    },
  },
});

Performance Issues

Problem: Slow trace ingestion

Symptoms: Traces take several seconds to ingest

Solutions:

  1. Check database performance:
-- Check for missing indexes
\d traces

-- Add indexes if missing
CREATE INDEX IF NOT EXISTS idx_traces_timestamp ON traces(timestamp);
CREATE INDEX IF NOT EXISTS idx_traces_service ON traces(service_name);
  2. Optimize the database connection pool:
// In database/postgres.ts
this.sql = postgres(this.connectionString, {
  max: 20, // Increase pool size
  idle_timeout: 20,
  connect_timeout: 10,
});
  3. Consider async ingestion, a planned future enhancement (a batching sketch follows this list)
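
For reference, one common shape for async ingestion is to buffer incoming traces in memory and flush them periodically as a single multi-row INSERT. This is only a sketch of the idea under that assumption, not code from the repository.

// batch-insert-sketch.ts -- illustrative batching pattern, not Lumina's implementation.
import postgres from 'postgres';

const sql = postgres(process.env.DATABASE_URL ?? 'postgres://localhost:5432/lumina');

type TraceRow = {
  trace_id: string;
  span_id: string;
  service_name: string;
  name: string;
  timestamp: string;
};

const buffer: TraceRow[] = [];

export function enqueue(trace: TraceRow) {
  buffer.push(trace);
}

// Flush the buffer once per second as one multi-row INSERT
setInterval(async () => {
  if (buffer.length === 0) return;
  const batch = buffer.splice(0, buffer.length);
  // postgres.js expands sql(batch, ...columns) into a multi-row VALUES list
  await sql`
    INSERT INTO traces ${sql(batch, 'trace_id', 'span_id', 'service_name', 'name', 'timestamp')}
  `;
}, 1_000);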

Problem: Slow query performance

Symptoms: API queries take > 2 seconds to return

Solutions:

  1. Add database indexes:
CREATE INDEX idx_traces_service ON traces(service_name);
CREATE INDEX idx_traces_model ON traces(model);
CREATE INDEX idx_traces_timestamp ON traces(timestamp DESC);
CREATE INDEX idx_traces_cost ON traces(cost_usd);
CREATE INDEX idx_traces_latency ON traces(latency_ms);
  2. Reduce the query limit:
curl "http://localhost:8081/api/traces?limit=20"
  3. Use more specific filters:
# Instead of querying all traces
curl "http://localhost:8081/api/traces"

# Query specific service and date range
curl "http://localhost:8081/api/traces?service=my-app&startDate=2024-01-15"
  4. Implement pagination properly:
# Page through results
curl "http://localhost:8081/api/traces?limit=50&offset=0"
curl "http://localhost:8081/api/traces?limit=50&offset=50"

Problem: High memory usage

Symptoms: Services consume excessive RAM

Solutions:

  1. Reduce connection pool size:
postgres(connectionString, {
  max: 5, // Reduce from 10
});
  2. Implement pagination in queries
  3. Reduce the memory footprint of the Bun process (Bun runs on JavaScriptCore, so V8's --max-old-space-size flag has no effect; use Bun's --smol flag instead):
bun --smol run dev
  4. Archive old traces:
-- Move old traces to archive table
CREATE TABLE traces_archive AS
SELECT * FROM traces
WHERE timestamp < NOW() - INTERVAL '30 days';

DELETE FROM traces
WHERE timestamp < NOW() - INTERVAL '30 days';

Getting Help

If you’ve tried these solutions and still have issues:

  1. Check logs - Enable debug logging in all services
  2. Search GitHub Issues - Someone may have encountered the same issue
  3. Create an Issue - Include:
    • Error messages (full stack trace)
    • Steps to reproduce
    • Environment details (OS, Bun version, PostgreSQL version)
    • Relevant configuration