Troubleshooting Guide
This guide covers common issues you might encounter when setting up and using Lumina.
Table of Contents
- Installation Issues
- Database Connection Issues
- Service Startup Issues
- Trace Ingestion Issues
- Query API Issues
- Replay Engine Issues
- SDK Integration Issues
- Performance Issues
Installation Issues
Problem: bun: command not found
Symptoms:
$ bun install
bash: bun: command not found
Solution: Install Bun by running:
curl -fsSL https://bun.sh/install | bash
Then restart your terminal or run:
source ~/.bashrc # or ~/.zshrc
Verification:
bun --version
Problem: Package installation fails
Symptoms:
$ bun install
error: unable to resolve dependency
Solution:
- Clear Bun cache:
rm -rf ~/.bun/install/cache
- Delete node_modules and lockfile:
rm -rf node_modules bun.lockb
- Reinstall:
bun install
Database Connection Issues
Problem: database "lumina" does not exist
Symptoms:
Failed to connect to database: database "lumina" does not exist
Solution: Create the database:
createdb lumina
Or using psql:
psql postgres
CREATE DATABASE lumina;
\q
Verification:
psql -d lumina -c "SELECT 1"
Problem: PostgreSQL connection refused
Symptoms:
Failed to connect to database: connection refused
ECONNREFUSED localhost:5432
Solution:
- Check if PostgreSQL is running:
# macOS
brew services list | grep postgresql
# Linux
systemctl status postgresql
- Start PostgreSQL if not running:
# macOS
brew services start postgresql
# Linux
sudo systemctl start postgresql
- Verify port 5432 is listening:
lsof -i :5432
Problem: Authentication failed for user
Symptoms:
PostgresError: password authentication failed for user "username"
Solution:
- Check your connection string in environment:
echo $DATABASE_URL
- Update with correct credentials:
export DATABASE_URL="postgres://username:password@localhost:5432/lumina"
- For local development without password:
export DATABASE_URL="postgres://username@localhost:5432/lumina"
Problem: Table already exists error
Symptoms:
PostgresError: relation "traces" already exists
Solution:
This is usually harmless. Tables are created with an IF NOT EXISTS clause. If you need to reset:
psql -d lumina -c "DROP TABLE IF EXISTS replay_results CASCADE; DROP TABLE IF EXISTS replay_sets CASCADE; DROP TABLE IF EXISTS traces CASCADE;"
Then restart the services to recreate tables.
Service Startup Issues
Problem: Port already in use
Symptoms:
Error: listen EADDRINUSE: address already in use :::9411
Solution:
- Find the process using the port:
# macOS/Linux
lsof -i :9411
# Or use netstat
netstat -tuln | grep 9411
- Kill the process:
kill <PID>  # or kill -9 <PID> if it ignores the signal
- Or change the port:
# For ingestion service
PORT=9412 bun run dev
# For query service
cd services/query
PORT=8090 bun run dev
# For replay service
cd services/replay
PORT=8091 bun run dev
Problem: Service crashes immediately
Symptoms:
$ bun run dev
🚀 Starting service...
[ERROR] Uncaught Error: ...
Solution:
- Check logs for specific error message
- Verify DATABASE_URL is set:
echo $DATABASE_URL
- Ensure database is accessible:
psql -d lumina -c "SELECT 1"
- Check for missing dependencies:
bun install
Problem: Database tables not created
Symptoms: Service starts but queries fail with “relation does not exist”
Solution:
- Check that initialize() is called on startup
- Manually create tables:
-- Connect to database
psql -d lumina
-- For ingestion service (traces table)
-- See /services/ingestion/src/database/postgres.ts for schema
-- For replay service (replay tables)
-- See /services/replay/src/database/postgres.ts for schema
- Restart the service after table creation
Trace Ingestion Issues
Problem: Traces not appearing in database
Symptoms: SDK sends traces but database remains empty
Diagnostic Steps:
- Check ingestion service logs:
cd services/ingestion
bun run dev
# Look for incoming requests
- Test ingestion directly:
curl -X POST http://localhost:9411/v1/traces \
-H "Content-Type: application/json" \
-d '[{
"traceId": "test123",
"spanId": "span123",
"serviceName": "test",
"name": "test-trace",
"timestamp": "'$(date -u +"%Y-%m-%dT%H:%M:%SZ")'"
}]'
- Check database:
psql -d lumina -c "SELECT COUNT(*) FROM traces;"
Common Causes:
- Wrong ingestion endpoint in SDK config
- Ingestion service not running
- Database connection issue
- Validation errors in trace data
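If curl is inconvenient, the same smoke test can be scripted. This sketch mirrors the curl example above (same endpoint URL and field names; adjust both if your deployment differs):

```typescript
// Build a minimal trace payload matching the curl smoke test above.
interface TestTrace {
  traceId: string;
  spanId: string;
  serviceName: string;
  name: string;
  timestamp: string; // ISO 8601
}

function buildTestTrace(): TestTrace {
  return {
    traceId: `test-${Date.now()}`,
    spanId: `span-${Date.now()}`,
    serviceName: "test",
    name: "test-trace",
    timestamp: new Date().toISOString(),
  };
}

async function sendTestTrace(endpoint = "http://localhost:9411/v1/traces"): Promise<boolean> {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // The ingestion endpoint expects an array of traces, as in the curl example.
    body: JSON.stringify([buildTestTrace()]),
  });
  return res.ok;
}
```

If `sendTestTrace()` resolves to false, check the ingestion service logs for the validation error.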
Problem: Invalid trace format error
Symptoms:
{
"error": "Invalid trace format",
"message": "Missing required fields"
}
Solution: Ensure your traces include required fields:
{
traceId: string,
spanId: string,
serviceName: string,
name: string,
timestamp: string (ISO 8601)
}
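A quick client-side check for these fields can catch malformed traces before they are sent. This is a sketch, not the ingestion service's actual validator:

```typescript
// Required fields per the ingestion API above; a missing or empty value
// is what triggers the "Invalid trace format" error.
const REQUIRED_FIELDS = ["traceId", "spanId", "serviceName", "name", "timestamp"] as const;

function missingFields(trace: Record<string, unknown>): string[] {
  return REQUIRED_FIELDS.filter(
    (f) => typeof trace[f] !== "string" || (trace[f] as string).length === 0
  );
}
```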
Check SDK configuration:
const lumina = initLumina({
endpoint: 'http://localhost:9411/v1/traces',
service_name: 'my-app', // Required
});
Problem: Cost not calculated
Symptoms:
Traces ingested but cost_usd is 0 or null
Solution:
- Ensure token counts are provided in trace attributes:
{
attributes: {
'llm.usage.prompt_tokens': 100,
'llm.usage.completion_tokens': 50,
'llm.model': 'claude-sonnet-4-5'
}
}
- Check if model is supported in cost calculator:
// See packages/core/src/cost-calculator.ts
- Verify cost calculation is enabled in ingestion service
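The calculation itself is token counts times per-token rates. The rates below are illustrative placeholders, not Lumina's actual pricing table (the real one lives in packages/core/src/cost-calculator.ts):

```typescript
// Illustrative per-million-token rates in USD (hypothetical values).
const RATES: Record<string, { prompt: number; completion: number }> = {
  "claude-sonnet-4-5": { prompt: 3, completion: 15 },
};

function costUsd(model: string, promptTokens: number, completionTokens: number): number | null {
  const rate = RATES[model];
  if (!rate) return null; // unsupported model -> cost_usd stays null
  return (promptTokens * rate.prompt + completionTokens * rate.completion) / 1_000_000;
}
```

An unsupported model returning null is exactly the symptom described above, which is why checking the cost calculator's model table is the first step.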
Query API Issues
Problem: Query returns empty results
Symptoms:
curl "http://localhost:8081/api/traces"
# Returns: {"data": [], "pagination": {...}}
Diagnostic Steps:
- Check if traces exist:
psql -d lumina -c "SELECT COUNT(*) FROM traces;"
- Check query filters:
# Try without filters
curl "http://localhost:8081/api/traces?limit=10"
# Check specific service
curl "http://localhost:8081/api/traces?service=my-app"
- Check date range (default might exclude your traces):
curl "http://localhost:8081/api/traces?startDate=2024-01-01"
Problem: Query API returns 500 error
Symptoms:
{
"error": "Internal server error",
"message": "..."
}
Solution:
- Check Query service logs for detailed error
- Verify database connection:
psql -d lumina -c "SELECT 1"
- Check for SQL syntax errors in filters
- Ensure indexes exist for query performance
Problem: Analytics returns incorrect values
Symptoms: Cost or latency analytics show unexpected numbers
Solution:
- Verify data in database:
psql -d lumina
-- Check cost data
SELECT service_name, AVG(cost_usd), SUM(cost_usd)
FROM traces
GROUP BY service_name;
-- Check latency data
SELECT service_name, AVG(latency_ms)
FROM traces
GROUP BY service_name;
- Check for null values:
SELECT COUNT(*) FROM traces WHERE cost_usd IS NULL;
SELECT COUNT(*) FROM traces WHERE latency_ms IS NULL;
- Verify aggregation logic in Query service code
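To sanity-check the API's numbers against the raw rows, the same aggregation can be reproduced in a few lines. A sketch of per-service average cost that skips null values the way SQL's AVG does:

```typescript
// Reproduce AVG(cost_usd) ... GROUP BY service_name in TypeScript.
// Like SQL's AVG, null costs are excluded from both sum and count.
interface TraceRow { service_name: string; cost_usd: number | null }

function avgCostByService(rows: TraceRow[]): Map<string, number> {
  const sums = new Map<string, { total: number; n: number }>();
  for (const r of rows) {
    if (r.cost_usd === null) continue;
    const s = sums.get(r.service_name) ?? { total: 0, n: 0 };
    s.total += r.cost_usd;
    s.n += 1;
    sums.set(r.service_name, s);
  }
  return new Map(Array.from(sums, ([k, v]) => [k, v.total / v.n]));
}
```

If the sketch and the API disagree on the same rows, the bug is in the Query service's aggregation; if they agree, the data itself is the problem.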
Replay Engine Issues
Problem: Replay set creation fails
Symptoms:
{
"error": "Invalid traces",
"message": "Found 0 of 5 traces"
}
Solution:
- Verify trace IDs exist:
psql -d lumina -c "SELECT trace_id FROM traces LIMIT 10;"
- Use exact trace_id values from database:
curl -X POST http://localhost:8082/replay/capture \
-H "Content-Type: application/json" \
-d '{
"name": "Test Replay",
"traceIds": ["actual-trace-id-from-db"]
}'
Problem: Replay execution fails
Symptoms: Replay status shows “failed” or execution hangs
Diagnostic Steps:
- Check replay service logs
- Verify traces have all required fields:
SELECT trace_id, span_id, prompt, response
FROM traces
WHERE trace_id = 'your-trace-id';
- Check for null prompts or responses:
SELECT COUNT(*) FROM traces
WHERE prompt IS NULL OR response IS NULL;
Solution:
- Ensure original traces have complete data
- Check for API rate limits if calling external LLM APIs
- Verify LLM API credentials are configured
Problem: Diff results show 0 similarity
Symptoms: All replays show hash_similarity = 0
Solution:
- Check if responses are being captured:
SELECT original_response, replay_response
FROM replay_results
LIMIT 1;
- Verify Diff Engine is working:
// Test similarity calculation
import { textSimilarity } from '@lumina/core';
const score = textSimilarity('hello', 'hello');
console.log(score); // Should be 1.0
- Check for encoding issues in text comparison
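If `textSimilarity` is not available in your environment, a stand-in makes the check easy to reproduce. This token-overlap (Jaccard) sketch is not Lumina's actual diff algorithm, but it has the same fixed points: 1.0 for identical text, 0.0 for disjoint text:

```typescript
// Jaccard similarity over lowercased whitespace tokens.
// A stand-in for sanity checks, not Lumina's diff engine.
function jaccardSimilarity(a: string, b: string): number {
  const ta = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
  const tb = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
  if (ta.size === 0 && tb.size === 0) return 1;
  let inter = 0;
  for (const t of ta) if (tb.has(t)) inter++;
  return inter / (ta.size + tb.size - inter);
}
```

If even identical original and replay responses score 0 with a function like this, the problem is almost certainly in how the responses are captured or encoded, not in the similarity math.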
Problem: Foreign key constraint error
Symptoms:
PostgresError: there is no unique constraint matching given keys for referenced table
Solution: This was fixed in the codebase. If you still see this:
- Ensure replay_results table uses composite foreign key:
FOREIGN KEY (trace_id, span_id)
REFERENCES traces(trace_id, span_id)
- Verify trace_ids in replay_sets are TEXT[] not UUID[]:
ALTER TABLE replay_sets
ALTER COLUMN trace_ids TYPE TEXT[]
USING trace_ids::TEXT[];
SDK Integration Issues
Problem: Traces not sent from application
Symptoms: Application runs but no traces appear in Lumina
Diagnostic Steps:
- Enable debug logging:
const lumina = initLumina({
debug: true,
// ...other config
});
- Check network requests in application logs
- Verify endpoint is reachable:
curl http://localhost:9411/health
Solution:
- Check firewall rules
- Ensure ingestion service is running
- Verify endpoint URL is correct (no typos)
- Check for CORS issues if calling from browser
Problem: TypeScript errors with SDK
Symptoms:
Type 'X' is not assignable to type 'Y'
Solution:
- Ensure @lumina/sdk is properly installed:
bun add @lumina/sdk
- Check TypeScript version compatibility:
bun add -D typescript@latest
- Import types correctly:
import { initLumina, type LuminaConfig } from '@lumina/sdk';
Problem: Token counts not captured
Symptoms: Traces ingested but prompt_tokens and completion_tokens are null
Solution:
- For Anthropic SDK, tokens are in response.usage:
const response = await anthropic.messages.create({...});
console.log(response.usage); // { input_tokens, output_tokens }
- The Lumina SDK extracts these automatically. If that is not working:
// Manually pass token info
await lumina.traceLLM(async () => response, {
name: 'test',
system: 'anthropic',
prompt: '...',
metadata: {
tokens: {
prompt: response.usage.input_tokens,
completion: response.usage.output_tokens,
},
},
});
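The manual mapping above can be factored into a small helper so the field names only appear once. A sketch assuming an Anthropic-style usage object ({ input_tokens, output_tokens }):

```typescript
// Map an Anthropic-style usage object to the metadata shape shown
// in the traceLLM example above.
interface AnthropicUsage { input_tokens: number; output_tokens: number }

function toLuminaTokens(usage: AnthropicUsage) {
  return {
    tokens: {
      prompt: usage.input_tokens,
      completion: usage.output_tokens,
    },
  };
}
```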
Performance Issues
Problem: Slow trace ingestion
Symptoms: Traces take several seconds to ingest
Solutions:
- Check database performance:
-- Check for missing indexes
\d traces
-- Add indexes if missing
CREATE INDEX IF NOT EXISTS idx_traces_timestamp ON traces(timestamp);
CREATE INDEX IF NOT EXISTS idx_traces_service ON traces(service_name);
- Optimize database connection pool:
// In database/postgres.ts
this.sql = postgres(this.connectionString, {
max: 20, // Increase pool size
idle_timeout: 20,
connect_timeout: 10,
});
- Consider async ingestion (future enhancement)
Problem: Slow query performance
Symptoms: API queries take > 2 seconds to return
Solutions:
- Add database indexes:
CREATE INDEX idx_traces_service ON traces(service_name);
CREATE INDEX idx_traces_model ON traces(model);
CREATE INDEX idx_traces_timestamp ON traces(timestamp DESC);
CREATE INDEX idx_traces_cost ON traces(cost_usd);
CREATE INDEX idx_traces_latency ON traces(latency_ms);
- Reduce query limit:
curl "http://localhost:8081/api/traces?limit=20"
- Use more specific filters:
# Instead of querying all traces
curl "http://localhost:8081/api/traces"
# Query specific service and date range
curl "http://localhost:8081/api/traces?service=my-app&startDate=2024-01-15"
- Implement pagination properly:
# Page through results
curl "http://localhost:8081/api/traces?limit=50&offset=0"
curl "http://localhost:8081/api/traces?limit=50&offset=50"
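The offset arithmetic for paging is easy to get wrong when scripted, so a small helper keeps it consistent. A sketch that only builds the query string, using the endpoint shown above:

```typescript
// Build paged query URLs for the traces API:
// page 0 -> offset 0, page 1 -> offset = limit, and so on.
function pageUrl(base: string, limit: number, page: number): string {
  const offset = page * limit;
  return `${base}?limit=${limit}&offset=${offset}`;
}
```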
Problem: High memory usage
Symptoms: Services consume excessive RAM
Solutions:
- Reduce connection pool size:
postgres(connectionString, {
max: 5, // Reduce from 10
});
- Implement pagination in queries
- Reduce Bun's memory footprint (Bun runs on JavaScriptCore, so V8 flags like --max-old-space-size do not apply; Bun's --smol flag trades some speed for lower memory use):
bun --smol run dev
- Archive old traces:
-- Move old traces to archive table
CREATE TABLE traces_archive AS
SELECT * FROM traces
WHERE timestamp < NOW() - INTERVAL '30 days';
DELETE FROM traces
WHERE timestamp < NOW() - INTERVAL '30 days';
Getting Help
If you’ve tried these solutions and still have issues:
- Check logs - Enable debug logging in all services
- Search GitHub Issues - Someone may have encountered the same issue
- Create an Issue - Include:
- Error messages (full stack trace)
- Steps to reproduce
- Environment details (OS, Bun version, PostgreSQL version)
- Relevant configuration