AI Pipeline Best Practices
A comprehensive guide to instrumenting, monitoring, and optimizing AI applications with Lumina.
Table of Contents
- Instrumentation Best Practices
- Cost Optimization
- Quality Monitoring
- Performance Tuning
- Error Handling
- Security & Privacy
- Production Readiness
Instrumentation Best Practices
Start with the Critical Path
Instrument your most expensive or failure-prone operations first:
// ✅ Priority 1: LLM calls (expensive)
await lumina.trace(() => llm.generate(prompt), {
  name: 'llm-generation',
});
// ✅ Priority 2: Vector searches (can be slow)
await lumina.trace(() => vectorDB.search(query), {
  name: 'vector-search',
});
// ❌ Priority 3: Trivial operations (skip)
// Don't instrument: string formatting, simple math, etc.
Use Semantic Naming
Name spans after what they do, not how they do it:
// ✅ Good: Semantic names
'user-authentication';
'product-recommendation';
'order-validation';
'fraud-detection';
// ❌ Bad: Technical names
'api-call-1';
'function-xyz';
'step-3';
Add Business Context
Include attributes that help debug business logic:
await lumina.trace(
  async () => {
    return await processOrder(order);
  },
  {
    name: 'order-processing',
    attributes: {
      // Business context
      'order.id': order.id,
      'order.total': order.total,
      'customer.tier': customer.tier,
      'payment.method': order.paymentMethod,
      // Technical context
      region: process.env.AWS_REGION,
      version: '2.1.0',
    },
  }
);
Group Related Operations
Use parent spans to organize complex workflows:
// ✅ Good: Grouped workflow
await lumina.trace(
  async () => {
    const user = await lumina.trace(fetchUser, { name: 'fetch-user' });
    const perms = await lumina.trace(checkPerms, { name: 'check-permissions' });
    const data = await lumina.trace(fetchData, { name: 'fetch-data' });
    return format(data);
  },
  { name: 'user-dashboard-load' }
);
// ❌ Bad: Flat structure
await lumina.trace(fetchUser);
await lumina.trace(checkPerms);
await lumina.trace(fetchData);
Cost Optimization
Track Cost Attribution
Tag traces with cost centers:
await lumina.trace(
  async () => {
    // ... LLM call
  },
  {
    name: 'customer-support-response',
    attributes: {
      cost_center: 'customer-support',
      'customer.id': customerId,
      'customer.plan': 'enterprise', // For cost allocation
    },
  }
);
Set Cost Budgets
Monitor spending per service/feature:
// In Lumina dashboard, set alerts:
// - Daily cost for 'customer-support' > $100
// - Weekly cost for 'product-recommendations' > $500
// - Monthly total cost > $5,000
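Dashboard alerts catch overruns after the fact. If you also want a guard inside the application, a minimal in-process sketch might look like this (the cost centers, limits, and reset strategy are assumptions to adapt, not Lumina APIs):
// Minimal in-process budget guard (sketch; limits are placeholders).
// Reset spendToday on a daily timer in real code.
const dailyBudgetUsd: Record<string, number> = {
  'customer-support': 100,
  'product-recommendations': 500 / 7, // weekly budget, spread per day
};

const spendToday = new Map<string, number>();

function recordSpend(costCenter: string, usd: number): void {
  spendToday.set(costCenter, (spendToday.get(costCenter) ?? 0) + usd);
}

function isOverBudget(costCenter: string): boolean {
  const limit = dailyBudgetUsd[costCenter];
  return limit !== undefined && (spendToday.get(costCenter) ?? 0) >= limit;
}

// Before an expensive call:
// if (isOverBudget('customer-support')) return cannedResponse();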
Use Cheaper Models Strategically
async function smartModelSelection(task: Task) {
  return await lumina.trace(
    async () => {
      // Simple tasks: Use cheap models
      if (task.complexity === 'low') {
        return await lumina.trace(() => claudeHaiku.generate(task.prompt), {
          name: 'llm-haiku',
          attributes: { 'model.tier': 'budget' },
        });
      }
      // Complex tasks: Use powerful models
      return await lumina.trace(() => claudeSonnet.generate(task.prompt), {
        name: 'llm-sonnet',
        attributes: { 'model.tier': 'premium' },
      });
    },
    { name: 'smart-model-selection' }
  );
}
// Track in Lumina:
// - How often is each tier used?
// - What's the cost difference?
// - Is quality maintained with budget tier?
Cache Aggressively
async function cachedLLMCall(prompt: string) {
  return await lumina.trace(
    async () => {
      // Check cache
      const cached = await cache.get(prompt);
      if (cached) {
        // Cache hit: $0 cost!
        return cached;
      }
      // Cache miss: Pay for LLM
      const result = await lumina.trace(() => llm.generate(prompt), {
        name: 'llm-call',
        attributes: { 'cache.hit': false },
      });
      await cache.set(prompt, result);
      return result;
    },
    { name: 'cached-llm' }
  );
}
// Track in Lumina:
// - Cache hit rate
// - Cost savings from caching
// - Set alert if hit rate drops below 70%
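Exact-string cache keys miss prompts that differ only in whitespace or casing. One way to raise the hit rate is to normalize before hashing; a sketch (tune the normalization to your domain, since over-normalizing can return wrong cached answers):
import { createHash } from 'node:crypto';

// Collapse whitespace and casing so trivially different prompts share a key
function cacheKey(prompt: string): string {
  const normalized = prompt.trim().toLowerCase().replace(/\s+/g, ' ');
  return createHash('sha256').update(normalized).digest('hex');
}

// Inside cachedLLMCall: const cached = await cache.get(cacheKey(prompt));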
Monitor Token Usage
// Note: the attributes object is evaluated before the traced function runs,
// so only input-side values are known up front. Record output-side counts
// after the call. (~4 characters per token is a rough estimate; use your
// provider's tokenizer for exact counts.)
const response = await lumina.trace(() => llm.generate(prompt), {
  name: 'llm-generation',
  attributes: {
    'tokens.input': Math.ceil(prompt.length / 4), // rough estimate
    'cost.per_1k_tokens': 0.003,
  },
});
const outputTokens = Math.ceil(response.length / 4);
// Attach outputTokens to the trace after the fact, the same way user
// feedback is attached in "Monitor User Satisfaction" below.
// Set alerts:
// - Prompt tokens > 5,000 (context too large?)
// - Output tokens > 2,000 (response too verbose?)
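To turn token counts into a cost attribute, the arithmetic is tokens divided by 1,000, times the per-1k rate. A sketch (the rates below are placeholders; substitute your provider's pricing):
const INPUT_RATE_PER_1K = 0.003;  // placeholder rate, USD per 1k input tokens
const OUTPUT_RATE_PER_1K = 0.015; // placeholder rate, USD per 1k output tokens

function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1000) * INPUT_RATE_PER_1K + (outputTokens / 1000) * OUTPUT_RATE_PER_1K;
}

// e.g. 1,200 input + 400 output tokens:
// 1.2 * 0.003 + 0.4 * 0.015 = $0.0096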
Quality Monitoring
Establish Baselines
Track quality metrics over time:
// Quality indicators are derived from the response, so compute them after
// the call rather than in the upfront attributes object (which is evaluated
// before the traced function runs). analyzeSentiment and
// calculateReadingLevel are your own helpers.
const response = await lumina.trace(() => llm.generate(prompt), {
  name: 'content-generation',
});
const qualityAttributes = {
  'quality.response_length': response.length,
  'quality.contains_citation': response.includes('['),
  'quality.sentiment': analyzeSentiment(response),
  'quality.reading_level': calculateReadingLevel(response),
};
// Attach qualityAttributes to the trace after the fact, as in
// "Monitor User Satisfaction" below.
// Lumina will baseline these metrics
// Alert when they deviate significantly
Detect Hallucinations
async function factCheckResponse(query: string, response: string) {
  return await lumina.trace(
    async () => {
      // Use Lumina's semantic scorer
      // It automatically detects when responses don't match expected quality
      return response;
    },
    {
      name: 'fact-checked-response',
      attributes: {
        query: query,
        'response.length': response.length,
      },
    }
  );
}
// Lumina's semantic scorer will alert you when:
// - Response quality drops below baseline
// - Responses become inconsistent
// - Hallucinations detected
A/B Test Prompts
Use Lumina’s replay feature:
// Version A (current production)
const promptA = 'Summarize this article concisely:';
// Version B (test)
const promptB = 'Provide a brief, accurate summary of this article:';
// 1. Capture 100 production requests
// 2. Run replay with promptA vs promptB
// 3. Compare in Lumina dashboard:
// - Cost: Which is cheaper?
// - Quality: Semantic similarity scores
// - Speed: Which is faster?
// - Choose winner based on data!
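To make the comparison legible in the dashboard, tag each trace with the prompt version it used. A sketch using the same trace options shown throughout this guide:
async function summarize(article: string, variant: 'A' | 'B') {
  const prompt = variant === 'A' ? promptA : promptB;
  return await lumina.trace(() => llm.generate(`${prompt}\n\n${article}`), {
    name: 'summarize-article',
    attributes: { 'prompt.version': variant }, // split A vs B in the dashboard
  });
}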
Monitor User Satisfaction
await lumina.trace(
  async () => {
    const response = await generateResponse(query);
    // ... show to user ...
    return response;
  },
  {
    name: 'user-query',
    attributes: {
      'user.id': userId,
      'response.id': responseId, // Track for feedback later
    },
  }
);
// Later, when user gives feedback:
async function recordFeedback(responseId: string, rating: number) {
  // Associate feedback with the trace
  await lumina.addAttribute(responseId, {
    'user.rating': rating,
    'user.satisfied': rating >= 4,
  });
}
// Track in Lumina:
// - Average user rating per endpoint
// - Correlation between cost and satisfaction
// - Alert when satisfaction drops
Performance Tuning
Identify Bottlenecks
Multi-span traces show you where time is spent:
// Your trace shows:
📊 api-request (total: 3.5s)
├─ auth (50ms) ✅
├─ db-query (100ms) ✅
├─ vector-search (2.8s) ⚠️ SLOW!
└─ llm-call (550ms) ✅
// Action: Focus on vector search optimization!
Parallelize When Possible
// ❌ Bad: Sequential (3s total)
const user = await fetchUser(userId);
const orders = await fetchOrders(userId);
const recommendations = await fetchRecs(userId);
// ✅ Good: Parallel (1s total)
const [user, orders, recommendations] = await Promise.all([
  lumina.trace(() => fetchUser(userId), { name: 'fetch-user' }),
  lumina.trace(() => fetchOrders(userId), { name: 'fetch-orders' }),
  lumina.trace(() => fetchRecs(userId), { name: 'fetch-recs' }),
]);
Use Streaming for Long Responses
async function streamingResponse(prompt: string) {
  return await lumina.trace(
    async () => {
      const stream = await llm.generateStream(prompt);
      // Time to first token (TTFT)
      const startTime = Date.now();
      let ttft: number | null = null;
      const chunks: string[] = [];
      for await (const chunk of stream) {
        if (ttft === null) {
          ttft = Date.now() - startTime;
        }
        chunks.push(chunk);
      }
      // Note: a plain async function can't `yield`. To forward chunks to the
      // caller as they arrive, use an async generator outside the trace
      // instead of buffering here.
      return { text: chunks.join(''), ttft };
    },
    {
      name: 'streaming-llm',
      attributes: {
        streaming: true,
        // ttft is returned with the result once known
      },
    }
  );
}
// Track in Lumina:
// - Time to first token (TTFT)
// - Total generation time
// - Streaming vs non-streaming performance
Set Timeouts
async function safeAPICall(input: string) {
  const timeoutMs = 10000; // 10 seconds
  return await lumina.trace(
    async () => {
      // Fail fast if the LLM call takes too long
      const result = await Promise.race([
        llm.generate(input),
        new Promise((_, reject) =>
          setTimeout(() => reject(new Error('Timeout')), timeoutMs)
        ),
      ]);
      return result;
    },
    {
      name: 'llm-with-timeout',
      attributes: {
        'timeout.ms': timeoutMs,
      },
    }
  );
}
// Track in Lumina:
// - How often do timeouts occur?
// - Which endpoints time out most?
// - Set alerts for timeout rate > 5%
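Note that Promise.race stops waiting but does not cancel the in-flight request, which still runs (and costs money) to completion. If your LLM client accepts an AbortSignal (the signal option below is an assumption about that client), you can cancel for real:
async function cancellableCall(input: string, timeoutMs = 10_000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // Assumes the client forwards the signal to its underlying HTTP request
    return await llm.generate(input, { signal: controller.signal });
  } finally {
    clearTimeout(timer); // always clear so the timer can't fire later
  }
}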
Error Handling
Graceful Degradation
async function robustPipeline(input: string) {
  return await lumina.trace(
    async () => {
      try {
        // Try primary model
        return await lumina.trace(() => primaryModel.generate(input), {
          name: 'primary-model',
        });
      } catch (primaryError) {
        // Fallback to secondary model
        try {
          return await lumina.trace(() => secondaryModel.generate(input), {
            name: 'fallback-model',
            attributes: {
              'fallback.reason': primaryError.message,
            },
          });
        } catch (secondaryError) {
          // Last resort: cached/static response
          return await lumina.trace(() => getCachedResponse(input), {
            name: 'cached-fallback',
            attributes: {
              'fallback.level': 'emergency',
            },
          });
        }
      }
    },
    { name: 'robust-pipeline' }
  );
}
// Track in Lumina:
// - Primary success rate
// - Fallback usage rate
// - Alert when fallback rate > 10%
Retry with Exponential Backoff
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function retryableCall(input: string, maxRetries = 3) {
  return await lumina.trace(
    async () => {
      let lastError;
      for (let attempt = 1; attempt <= maxRetries; attempt++) {
        try {
          return await lumina.trace(() => llm.generate(input), {
            name: 'llm-call-attempt',
            attributes: {
              'retry.attempt': attempt,
              'retry.max': maxRetries,
            },
          });
        } catch (error) {
          lastError = error;
          if (attempt < maxRetries) {
            const backoffMs = Math.pow(2, attempt) * 1000; // 2s, 4s, 8s
            await sleep(backoffMs);
          }
        }
      }
      throw lastError;
    },
    { name: 'retryable-llm-call' }
  );
}
// Track in Lumina:
// - Success rate by attempt number
// - Most common error types
// - Alert when retry rate > 20%
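One refinement to consider: with fixed 2s/4s/8s delays, many clients that fail together retry together. Adding jitter spreads them out; a common "full jitter" variant (an addition, not something the snippet above requires):
// Wait a random duration up to the exponential cap (2s, 4s, 8s caps)
function backoffWithJitter(attempt: number, baseMs = 1000): number {
  const capMs = Math.pow(2, attempt) * baseMs;
  return Math.random() * capMs;
}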
Track Error Patterns
try {
  await lumina.trace(() => riskyOperation(), { name: 'risky-op' });
} catch (error) {
  // Record the error with context. Re-throwing inside the trace lets
  // Lumina capture it; the error then propagates to the caller.
  await lumina.trace(
    () => {
      throw error;
    },
    {
      name: 'error-handler',
      attributes: {
        'error.type': error.name,
        'error.message': error.message,
        'error.code': error.code,
        'error.retryable': isRetryable(error),
      },
    }
  );
}
// Track in Lumina:
// - Most common error types
// - Error rate by endpoint
// - Correlation between errors and cost/latency
Security & Privacy
Redact Sensitive Data
function redactSensitive(text: string): string {
  return text
    .replace(/\b\d{16}\b/g, '[CARD-REDACTED]')
    .replace(/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/gi, '[EMAIL-REDACTED]')
    .replace(/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN-REDACTED]');
}
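A quick check of what the helper catches. Note the patterns are deliberately simple: a card number written with spaces or hyphens would slip through, so extend them for your data:
redactSensitive('Card 4111111111111111, reach me at jane@example.com');
// => 'Card [CARD-REDACTED], reach me at [EMAIL-REDACTED]'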
await lumina.trace(
  async () => {
    const response = await llm.generate(prompt);
    return response;
  },
  {
    name: 'customer-support',
    attributes: {
      prompt: redactSensitive(prompt), // Redact before sending to Lumina
      'user.id': userId,
      // Don't log: credit cards, SSNs, passwords, etc.
    },
  }
);
Use API Key Scoping
// Production: Use read-only API key for apps
LUMINA_API_KEY=lumina_live_readonly_...
// Development: Use test environment
LUMINA_API_KEY=lumina_test_...
LUMINA_ENVIRONMENT=development
// CI/CD: Use separate key with limited permissions
LUMINA_API_KEY=lumina_ci_...
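However your client is constructed (the constructor below is an illustration; match your actual Lumina SDK setup), read the key and environment from the process environment rather than hardcoding them:
// Hypothetical client construction; adapt to your Lumina SDK
const lumina = new Lumina({
  apiKey: process.env.LUMINA_API_KEY!, // scoped per environment, as above
  environment: process.env.LUMINA_ENVIRONMENT ?? 'production',
});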
Implement Data Retention
// In Lumina dashboard, set retention policies:
// - Production traces: 30 days
// - Development traces: 7 days
// - PII data: Auto-redacted or 24 hours
// - Error traces: 90 days (for debugging)
Production Readiness
Health Checks
// Expose health check endpoint
app.get('/health', async (req, res) => {
  const health = await lumina.trace(
    async () => {
      return {
        status: 'healthy',
        timestamp: new Date(),
        checks: {
          database: await checkDB(),
          vectorDB: await checkVectorDB(),
          llm: await checkLLMAPI(),
        },
      };
    },
    { name: 'health-check' }
  );
  res.json(health);
});
Graceful Shutdown
process.on('SIGTERM', async () => {
  console.log('Shutting down gracefully...');
  // Flush any pending Lumina traces
  await lumina.flush();
  // Close connections
  await db.close();
  await server.close();
  process.exit(0);
});
Load Testing
// Use Lumina to monitor during load tests:
// 1. Run load test (e.g., with k6 or Locust)
// 2. Watch Lumina dashboard in real-time:
// - Latency percentiles (p50, p95, p99)
// - Error rate
// - Cost per request
// - Throughput
// 3. Identify bottlenecks
// 4. Optimize and re-test
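For example, a minimal k6 script that drives steady load while you watch the dashboard (the URL and payload are placeholders):
// load-test.js (run with: k6 run load-test.js)
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 50,        // concurrent virtual users
  duration: '2m',
  thresholds: {
    http_req_duration: ['p(95)<2000'], // fail the run if p95 latency > 2s
  },
};

export default function () {
  http.post(
    'https://your-service.example.com/query', // placeholder endpoint
    JSON.stringify({ query: 'hello' }),
    { headers: { 'Content-Type': 'application/json' } }
  );
  sleep(1);
}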
Gradual Rollouts
async function featureFlag(userId: string, prompt: string) {
  const useNewModel = rollout.isEnabled(userId, 'new-model-v2');
  return await lumina.trace(
    async () => {
      if (useNewModel) {
        return await lumina.trace(() => newModel.generate(prompt), {
          name: 'new-model-v2',
          attributes: { feature_flag: 'enabled' },
        });
      } else {
        return await lumina.trace(() => oldModel.generate(prompt), {
          name: 'old-model-v1',
          attributes: { feature_flag: 'disabled' },
        });
      }
    },
    { name: 'model-with-feature-flag' }
  );
}
// Compare in Lumina:
// - New model vs old model performance
// - Cost difference
// - Quality metrics
// - Gradually increase rollout % based on data
Quick Reference Checklist
Before Going to Production
- Instrument all LLM calls with Lumina
- Add cost attribution tags (team, feature, customer)
- Set up cost alerts (daily, weekly, monthly)
- Set up quality alerts (semantic scoring enabled)
- Implement error handling and fallbacks
- Add retry logic for transient failures
- Redact PII from traces
- Set data retention policies
- Test under load
- Set up health check monitoring
- Document runbooks for common issues
Weekly Reviews
- Review cost trends (which features are expensive?)
- Review quality metrics (any degradation?)
- Review error patterns (new errors?)
- Review latency (any regressions?)
- Review cache hit rates (can we improve?)
- Review alert false positive rate (tune thresholds?)
Monthly Optimization
- Run replay tests with new prompts/models
- Analyze cost per feature/customer
- Identify optimization opportunities
- Update cost baselines
- Review security/privacy compliance
Next Steps
- ✅ Follow these best practices in your AI pipelines
- 📊 Monitor metrics in Lumina dashboard
- 🔔 Set up proactive alerts
- 🔄 Use replay for continuous improvement
- 📈 Optimize based on data, not guesses
Need help? Check out: