Skip to main content
Detect quality degradation with hybrid scoring.

Scoring Methods

Exact Match:
  • Hash-based comparison
  • Instant results
  • Best for structured output
Semantic Similarity:
  • AI-powered scoring
  • ~100ms latency
  • Best for natural language

Automatic Baselines

Lumina automatically:
  1. Calculates baselines from history
  2. Compares new responses
  3. Alerts on degradation

Quality Alerts

Triggered when similarity drops:
Baseline: 0.92
Current: 0.65 (-29%)
Threshold: -20%
→ Alert sent

Custom Metrics

Add your own quality scores:
await lumina.traceLLM(
  () => llm.generate(prompt),
  {
    metadata: {
      factual_accuracy: calculateAccuracy(response),
      relevance: calculateRelevance(response),
    },
  }
);