Strategies for managing and optimizing LLM costs.

Monitor Costs

Set up a cost alert that fires when spend within a rolling window exceeds a threshold:
{
  "service": "chat-api",
  "costThreshold": 2.0,
  "window": "1h"
}
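
The semantics of that alert can be sketched as a windowed sum. This is an illustrative sketch of the check, not Lumina's implementation; the names are hypothetical:

```typescript
interface CostEvent {
  timestamp: number; // epoch milliseconds
  costUsd: number;
}

// Fires when total spend inside the trailing window exceeds the threshold,
// matching the costThreshold/window fields in the config above.
function alertFired(
  events: CostEvent[],
  thresholdUsd: number,
  windowMs: number,
  now: number,
): boolean {
  const spend = events
    .filter((e) => now - e.timestamp <= windowMs)
    .reduce((sum, e) => sum + e.costUsd, 0);
  return spend > thresholdUsd;
}
```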
Track by dimension:
  • Per service
  • Per model
  • Per user
  • Over time
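
A per-dimension rollup is just a group-by over trace costs. A minimal sketch, assuming a flat trace record shape (in practice the backend performs this aggregation):

```typescript
interface TraceCost {
  service: string;
  model: string;
  userId: string;
  costUsd: number;
}

// Sum cost along a single dimension (per service, per model, or per user).
function costBy(
  traces: TraceCost[],
  dim: 'service' | 'model' | 'userId',
): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const t of traces) {
    totals[t[dim]] = (totals[t[dim]] ?? 0) + t.costUsd;
  }
  return totals;
}
```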

Identify Expensive Queries

Filter high-cost traces:
curl "http://api:8081/api/traces?costUsd[$gt]=0.5"
Review the returned traces and optimize the prompts or models behind the most expensive ones.
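
The `costUsd[$gt]` filter above runs server-side; the equivalent triage on fetched results is a filter plus a sort, most expensive first:

```typescript
interface Trace {
  id: string;
  costUsd: number;
}

// Keep traces above the cost threshold, ordered most expensive first,
// so review starts with the biggest wins.
function expensiveTraces(traces: Trace[], minUsd: number): Trace[] {
  return traces
    .filter((t) => t.costUsd > minUsd)
    .sort((a, b) => b.costUsd - a.costUsd);
}
```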

Model Optimization

Use cheaper models for simple tasks:
const model = complexity === 'simple' ? 'gpt-3.5-turbo' : 'gpt-4';

await lumina.traceLLM(
  () => openai.chat.completions.create({ model, messages }),
  { name: 'chat', system: 'openai' }
);
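
The `complexity` flag has to come from somewhere. One cheap option is a heuristic classifier; the length-based rule below is purely illustrative, and real routing logic would also weigh task type, tool use, and context size:

```typescript
interface Message {
  role: string;
  content: string;
}

// Hypothetical heuristic for the `complexity` flag: short prompts are
// treated as simple and routed to the cheaper model.
function classifyComplexity(messages: Message[]): 'simple' | 'complex' {
  const chars = messages.reduce((n, m) => n + m.content.length, 0);
  return chars < 500 ? 'simple' : 'complex';
}
```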

Prompt Optimization

Reduce token usage:
  • Use concise prompts
  • Remove unnecessary context
  • Implement semantic caching
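
Semantic caching reuses a stored completion when a new prompt is close enough in embedding space to a cached one. A minimal sketch with cosine similarity; in practice the vectors would come from an embedding model and the store would be a vector database, not an in-memory array:

```typescript
type Embedding = number[];

// Cosine similarity between two equal-length vectors.
function cosine(a: Embedding, b: Embedding): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

class SemanticCache {
  private entries: { embedding: Embedding; completion: string }[] = [];

  constructor(private threshold = 0.95) {}

  // Return a cached completion if any stored prompt is similar enough.
  get(embedding: Embedding): string | undefined {
    for (const e of this.entries) {
      if (cosine(e.embedding, embedding) >= this.threshold) {
        return e.completion;
      }
    }
    return undefined;
  }

  set(embedding: Embedding, completion: string): void {
    this.entries.push({ embedding, completion });
  }
}
```

A cache hit skips the LLM call entirely, so every hit saves the full cost of a completion.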
Test with replay:
// Test prompt changes before deploying
await lumina.replayTraces({
  replaySetId: 'cost-test',
});

Sampling

At high volumes, trace only a fraction of routine operations while always keeping the expensive ones:
const lumina = initLumina({
  endpoint: 'http://lumina:9411/v1/traces',
  service_name: 'service',
  samplingRate: 0.1,
  alwaysSampleConditions: {
    costUsd: { $gt: 0.5 }, // Always sample expensive calls
  },
});
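
The per-span decision implied by that config can be sketched as follows; this is illustrative, not Lumina's sampler, and the random source is injected so the logic is deterministic under test:

```typescript
interface SpanAttrs {
  costUsd: number;
}

// Always keep spans matching the cost condition; otherwise keep a random
// fraction given by the sampling rate (0.1 = keep ~10%).
function shouldSample(
  attrs: SpanAttrs,
  rate: number,
  alwaysIfCostAbove: number,
  rng: () => number = Math.random,
): boolean {
  if (attrs.costUsd > alwaysIfCostAbove) return true;
  return rng() < rate;
}
```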