Strategies for managing and optimizing LLM costs.

Monitor Costs

Set up a cost alert that fires when spend within a rolling window exceeds a threshold:
{
  "service": "chat-api",
  "costThreshold": 2.0,
  "window": "1h"
}
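
The semantics of that alert can be sketched as a windowed sum. This is an illustrative sketch of the check, not Lumina's implementation; the names are hypothetical:

```typescript
interface CostEvent {
  timestamp: number; // epoch milliseconds
  costUsd: number;
}

// Fires when total spend inside the trailing window exceeds the threshold,
// matching the costThreshold/window fields in the config above.
function alertFired(
  events: CostEvent[],
  thresholdUsd: number,
  windowMs: number,
  now: number,
): boolean {
  const spend = events
    .filter((e) => now - e.timestamp <= windowMs)
    .reduce((sum, e) => sum + e.costUsd, 0);
  return spend > thresholdUsd;
}
```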
Track by dimension:
  • Per service
  • Per model
  • Per user
  • Over time
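
A per-dimension rollup is just a group-by over trace costs. A minimal sketch, assuming a flat trace record shape (in practice the backend performs this aggregation):

```typescript
interface TraceCost {
  service: string;
  model: string;
  userId: string;
  costUsd: number;
}

// Sum cost along a single dimension (per service, per model, or per user).
function costBy(
  traces: TraceCost[],
  dim: 'service' | 'model' | 'userId',
): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const t of traces) {
    totals[t[dim]] = (totals[t[dim]] ?? 0) + t.costUsd;
  }
  return totals;
}
```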

Identify Expensive Queries

Filter high-cost traces:
curl "http://api:8081/api/traces?costUsd[$gt]=0.5"
Review the returned traces and optimize the prompts or models behind the most expensive ones.
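
The `costUsd[$gt]` filter above runs server-side; the equivalent triage on fetched results is a filter plus a sort, most expensive first:

```typescript
interface Trace {
  id: string;
  costUsd: number;
}

// Keep traces above the cost threshold, ordered most expensive first,
// so review starts with the biggest wins.
function expensiveTraces(traces: Trace[], minUsd: number): Trace[] {
  return traces
    .filter((t) => t.costUsd > minUsd)
    .sort((a, b) => b.costUsd - a.costUsd);
}
```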

Model Optimization

Use cheaper models for simple tasks:
const model = complexity === 'simple' ? 'gpt-3.5-turbo' : 'gpt-4';

await lumina.traceLLM(
  () => openai.chat.completions.create({ model, messages }),
  { name: 'chat', system: 'openai' }
);
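
The `complexity` flag has to come from somewhere. One cheap option is a heuristic classifier; the length-based rule below is purely illustrative, and real routing logic would also weigh task type, tool use, and context size:

```typescript
interface Message {
  role: string;
  content: string;
}

// Hypothetical heuristic for the `complexity` flag: short prompts are
// treated as simple and routed to the cheaper model.
function classifyComplexity(messages: Message[]): 'simple' | 'complex' {
  const chars = messages.reduce((n, m) => n + m.content.length, 0);
  return chars < 500 ? 'simple' : 'complex';
}
```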

Prompt Optimization

Reduce token usage:
  • Use concise prompts
  • Remove unnecessary context
  • Implement semantic caching
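
Semantic caching reuses a stored completion when a new prompt is close enough in embedding space to a cached one. A minimal sketch with cosine similarity; in practice the vectors would come from an embedding model and the store would be a vector database, not an in-memory array:

```typescript
type Embedding = number[];

// Cosine similarity between two equal-length vectors.
function cosine(a: Embedding, b: Embedding): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

class SemanticCache {
  private entries: { embedding: Embedding; completion: string }[] = [];

  constructor(private threshold = 0.95) {}

  // Return a cached completion if any stored prompt is similar enough.
  get(embedding: Embedding): string | undefined {
    for (const e of this.entries) {
      if (cosine(e.embedding, embedding) >= this.threshold) {
        return e.completion;
      }
    }
    return undefined;
  }

  set(embedding: Embedding, completion: string): void {
    this.entries.push({ embedding, completion });
  }
}
```

A cache hit skips the LLM call entirely, so every hit saves the full cost of a completion.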
Test with replay:
// Test prompt changes before deploying
await lumina.replayTraces({
  replaySetId: 'cost-test',
});

Sampling

At high volumes, trace only a fraction of routine operations while always keeping the expensive ones:
const lumina = initLumina({
  endpoint: 'http://lumina:9411/v1/traces',
  service_name: 'service',
  samplingRate: 0.1,
  alwaysSampleConditions: {
    costUsd: { $gt: 0.5 }, // Always sample expensive calls
  },
});
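
The per-span decision implied by that config can be sketched as follows; this is illustrative, not Lumina's sampler, and the random source is injected so the logic is deterministic under test:

```typescript
interface SpanAttrs {
  costUsd: number;
}

// Always keep spans matching the cost condition; otherwise keep a random
// fraction given by the sampling rate (0.1 = keep ~10%).
function shouldSample(
  attrs: SpanAttrs,
  rate: number,
  alwaysIfCostAbove: number,
  rng: () => number = Math.random,
): boolean {
  if (attrs.costUsd > alwaysIfCostAbove) return true;
  return rng() < rate;
}
```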