## Documentation Index

Fetch the complete documentation index at: https://docs.uselumina.io/docs/llms.txt. Use this file to discover all available pages before exploring further.

Strategies for managing and optimizing LLM costs.
## Monitor Costs
Set up alerts:

```json
{
  "service": "chat-api",
  "costThreshold": 2.0,
  "window": "1h"
}
```
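To make the rule above concrete, here is a minimal sketch of how such a threshold alert could be evaluated client-side. The `AlertRule` and `TraceCost` shapes and the `evaluateAlert` helper are illustrative assumptions, not part of the Lumina SDK:

```typescript
interface AlertRule {
  service: string;
  costThreshold: number; // USD spent within the window
  window: string;        // e.g. "1h"
}

interface TraceCost {
  service: string;
  costUsd: number;
  timestamp: number; // epoch ms
}

// Parses simple window strings like "90s", "15m", "1h" into milliseconds.
function parseWindow(w: string): number {
  const units: Record<string, number> = { s: 1_000, m: 60_000, h: 3_600_000 };
  const match = /^(\d+)([smh])$/.exec(w);
  if (!match) throw new Error(`unsupported window: ${w}`);
  return Number(match[1]) * units[match[2]];
}

// Returns true when the summed cost for the rule's service,
// within the trailing window, exceeds the threshold.
function evaluateAlert(rule: AlertRule, traces: TraceCost[], now: number): boolean {
  const windowMs = parseWindow(rule.window);
  const spent = traces
    .filter(t => t.service === rule.service && now - t.timestamp <= windowMs)
    .reduce((sum, t) => sum + t.costUsd, 0);
  return spent > rule.costThreshold;
}
```

In practice the alerting service performs this aggregation server-side; the sketch only shows the semantics of `costThreshold` and `window`.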
Track by dimension:
- Per service
- Per model
- Per user
- Over time
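Each of these breakdowns is a sum of per-trace cost grouped by one field. A sketch of that aggregation (the `TraceRecord` shape is an assumption, not Lumina's actual export format):

```typescript
interface TraceRecord {
  service: string;
  model: string;
  userId: string;
  costUsd: number;
}

// Sums costUsd over a single dimension, e.g. costBy(traces, 'model').
function costBy(
  traces: TraceRecord[],
  dim: 'service' | 'model' | 'userId'
): Map<string, number> {
  const totals = new Map<string, number>();
  for (const t of traces) {
    const key = t[dim];
    totals.set(key, (totals.get(key) ?? 0) + t.costUsd);
  }
  return totals;
}
```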
## Identify Expensive Queries
Filter high-cost traces (single quotes prevent the shell from expanding `$gt`):

```bash
curl 'http://api:8081/api/traces?costUsd[$gt]=0.5'
```
Review and optimize expensive queries.
## Model Optimization
Use cheaper models for simple tasks:

```typescript
const model = complexity === 'simple' ? 'gpt-3.5-turbo' : 'gpt-4';

await lumina.traceLLM(
  () => openai.chat.completions.create({ model, messages }),
  { name: 'chat', system: 'openai' }
);
```
## Prompt Optimization
Reduce token usage:
- Use concise prompts
- Remove unnecessary context
- Implement semantic caching
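Semantic caching skips the LLM call entirely when a prior prompt was similar enough. A minimal in-memory sketch using cosine similarity over embeddings; the `Embedder` type, `SemanticCache` class, and the 0.95 threshold are assumptions for illustration (production systems typically use a vector store, and embedding calls are asynchronous):

```typescript
// An embedder maps text to a vector. Real systems call an embeddings API;
// this sketch keeps it synchronous for clarity.
type Embedder = (text: string) => number[];

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns a cached response when a stored prompt is similar enough,
// avoiding a repeat LLM call for near-duplicate prompts.
class SemanticCache {
  private entries: { embedding: number[]; response: string }[] = [];

  constructor(private embed: Embedder, private threshold = 0.95) {}

  get(prompt: string): string | null {
    const emb = this.embed(prompt);
    for (const entry of this.entries) {
      if (cosineSimilarity(emb, entry.embedding) >= this.threshold) {
        return entry.response;
      }
    }
    return null;
  }

  set(prompt: string, response: string): void {
    this.entries.push({ embedding: this.embed(prompt), response });
  }
}
```

Check the cache before calling the model, and store the response on a miss; each hit saves the full cost of the call.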
Test with replay:

```typescript
// Test prompt changes before deploying
await lumina.replayTraces({
  replaySetId: 'cost-test',
});
```
## Sampling
For high volumes, sample routine operations while still capturing every expensive call:

```typescript
const lumina = initLumina({
  endpoint: 'http://lumina:9411/v1/traces',
  serviceName: 'service',
  samplingRate: 0.1, // Keep 10% of routine traces
  alwaysSampleConditions: {
    costUsd: { $gt: 0.5 }, // Always sample expensive calls
  },
});
```