How to Build a Cost Monitoring Dashboard for AI
Before You Start
You need three things: structured logging on your AI API calls (see the audit guide for instrumentation details), a time-series database or analytics platform for storing metrics (InfluxDB, Datadog, or even PostgreSQL with time-partitioned tables), and a visualization tool (Grafana, Retool, or a custom frontend). If you use an AI gateway like LiteLLM, Portkey, or Helicone, much of the instrumentation and basic dashboarding comes out of the box, and you can extend it with custom metrics.
Step-by-Step Build
Build your dashboard around these core metrics: total daily spend (the headline number), cost per request (average and p95 for detecting expensive outliers), cost per feature (which parts of your application spend the most), cost per business outcome (cost per resolved ticket, per generated document, per completed task), cache hit rate (percentage of requests served from prompt cache, response cache, or semantic cache), model distribution (percentage of requests routed to each model tier), and token efficiency (average input tokens per request, trending over time). Each metric should have a daily, weekly, and monthly view. The daily view catches anomalies. The weekly view shows trends. The monthly view informs budget conversations.
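As a minimal sketch of how the headline numbers fall out of per-request records, assuming each record is a dict with hypothetical cost_usd and cache_read_tokens keys (names chosen to match the log schema in the next step):

```python
from statistics import mean, quantiles

def daily_metrics(requests):
    """Headline metrics for one day of per-request records.

    Assumes each record is a dict with hypothetical keys 'cost_usd'
    and 'cache_read_tokens'; needs at least two records for the p95.
    """
    costs = [r["cost_usd"] for r in requests]
    return {
        "total_spend": sum(costs),
        "avg_cost_per_request": mean(costs),
        # quantiles(n=100) returns 99 cut points; index 94 is the p95
        "p95_cost_per_request": quantiles(costs, n=100)[94],
        # One simple definition of cache hit rate: the share of requests
        # that read anything from the prompt cache
        "cache_hit_rate": sum(1 for r in requests if r["cache_read_tokens"] > 0) / len(requests),
    }
```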
Every API call should emit a structured log entry with these fields: timestamp, request_id, feature (which application feature triggered the call), user_id or tenant_id, model, input_tokens, output_tokens, cache_read_tokens, cache_creation_tokens, calculated_cost_usd, latency_ms, and status (success or error type). Add the instrumentation as middleware in your API client wrapper so every call is captured automatically without requiring developers to add logging to each call site. Include both the raw token counts from the API response and the calculated cost based on current pricing.
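A minimal sketch of that middleware wrapper, assuming the Anthropic Python SDK; it leans on the calculate_cost helper defined in the pricing snippet that follows, and the logger name and argument shape are illustrative:

```python
import json
import logging
import time
import uuid

logger = logging.getLogger("ai_cost")

def logged_call(client, feature, tenant_id, **kwargs):
    """Wrap client.messages.create so every call emits one structured log line."""
    start = time.monotonic()
    entry = {
        "timestamp": time.time(),
        "request_id": str(uuid.uuid4()),
        "feature": feature,
        "tenant_id": tenant_id,
        "model": kwargs.get("model"),
        "status": "success",
    }
    try:
        response = client.messages.create(**kwargs)
        usage = response.usage
        entry.update({
            "input_tokens": usage.input_tokens,
            "output_tokens": usage.output_tokens,
            "cache_read_tokens": getattr(usage, "cache_read_input_tokens", 0) or 0,
            "cache_creation_tokens": getattr(usage, "cache_creation_input_tokens", 0) or 0,
            "calculated_cost_usd": calculate_cost(kwargs.get("model"), usage),
        })
        return response
    except Exception as exc:
        entry["status"] = type(exc).__name__  # record the error type instead of "success"
        raise
    finally:
        entry["latency_ms"] = int((time.monotonic() - start) * 1000)
        logger.info(json.dumps(entry))
```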
```python
# Pricing table for cost calculation (USD per million tokens)
PRICING = {
    "claude-opus-4-6-20260515": {"input": 15.0, "output": 75.0, "cache_read": 1.50},
    "claude-sonnet-4-6-20260414": {"input": 3.0, "output": 15.0, "cache_read": 0.30},
    "claude-haiku-4-5-20251001": {"input": 0.80, "output": 4.0, "cache_read": 0.08},
}

def calculate_cost(model, usage):
    # Fall back to Sonnet pricing for unrecognized model IDs
    prices = PRICING.get(model, PRICING["claude-sonnet-4-6-20260414"])
    input_cost = (usage.input_tokens / 1_000_000) * prices["input"]
    output_cost = (usage.output_tokens / 1_000_000) * prices["output"]
    cache_cost = (getattr(usage, "cache_read_input_tokens", 0) / 1_000_000) * prices["cache_read"]
    return input_cost + output_cost + cache_cost
```

Stream structured log entries from your application to your analytics store. For low-volume applications (under 100,000 requests per day), writing directly to PostgreSQL with a time-partitioned table works well and avoids the complexity of a dedicated time-series database. For higher volumes, use a streaming pipeline: application logs flow to a message queue like Redis Streams or Kafka, then a consumer writes to InfluxDB, TimescaleDB, or a cloud analytics service. Create aggregation queries that pre-compute hourly and daily rollups for dashboard performance. Keep raw per-request data for at least 30 days for drill-down analysis and anomaly investigation.
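For the PostgreSQL path, a sketch of what the time-partitioned table and a daily rollup might look like; the table and column names are illustrative, chosen to match the log fields above:

```python
# Hypothetical PostgreSQL DDL: one parent table partitioned by time range,
# matching the structured log fields emitted by the middleware
CREATE_LOG_TABLE = """
CREATE TABLE ai_request_log (
    ts                    timestamptz NOT NULL,
    request_id            uuid        NOT NULL,
    feature               text,
    tenant_id             text,
    model                 text,
    input_tokens          bigint,
    output_tokens         bigint,
    cache_read_tokens     bigint,
    cache_creation_tokens bigint,
    cost_usd              numeric(12, 6),
    latency_ms            integer,
    status                text
) PARTITION BY RANGE (ts);

-- One partition per month; create these ahead of time or via a cron job
CREATE TABLE ai_request_log_2026_06 PARTITION OF ai_request_log
    FOR VALUES FROM ('2026-06-01') TO ('2026-07-01');
"""

# Pre-computed daily rollup for fast dashboard panels; assumes a matching
# ai_request_log_daily summary table exists
DAILY_ROLLUP = """
INSERT INTO ai_request_log_daily
SELECT date_trunc('day', ts)                  AS day,
       feature,
       model,
       count(*)                               AS requests,
       sum(cost_usd)                          AS spend_usd,
       avg(cost_usd)                          AS avg_cost,
       percentile_cont(0.95) WITHIN GROUP (ORDER BY cost_usd::float8) AS p95_cost,
       avg((cache_read_tokens > 0)::int)      AS cache_hit_rate
FROM ai_request_log
WHERE ts >= date_trunc('day', now() - interval '1 day')
  AND ts <  date_trunc('day', now())
GROUP BY 1, 2, 3;
"""
```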
Build these panels in order of importance. The top row shows headline numbers: today's total spend, month-to-date spend, projected monthly spend (based on the trailing 7-day average), and the delta from last month. The second row shows time-series charts: daily spend over 30 days, cost per request over 30 days, and cache hit rate over 30 days. The third row shows breakdowns: cost by feature (bar chart), cost by model tier (pie chart), and the top 10 most expensive requests (table). The fourth row shows business metrics: cost per outcome by feature, requests per day trending, and token efficiency trending. Every panel should support drill-down: clicking a feature name shows the detailed breakdown for that feature. Clicking a spike in the time-series shows the requests that caused it.
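The projection panel is simple arithmetic; a sketch, assuming month-to-date spend and the trailing 7-day daily average are already computed upstream:

```python
import calendar
from datetime import date

def projected_monthly_spend(mtd_spend, trailing_7d_daily_avg, today=None):
    """Project month-end spend: actual month-to-date spend plus the
    trailing 7-day daily average applied to the remaining days."""
    today = today or date.today()
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    remaining_days = days_in_month - today.day
    return mtd_spend + trailing_7d_daily_avg * remaining_days
```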
Set up automated alerts for these conditions: daily spend exceeding 150 percent of the trailing 7-day average (catches unexpected spikes), a single request exceeding 50,000 input tokens (catches prompt injection or runaway context), cache hit rate dropping below 80 percent of its recent average (catches configuration changes that broke caching), model routing shifting more than 10 percentage points toward higher-cost models (catches routing bugs), and projected monthly spend exceeding budget by more than 20 percent. Route alerts to the on-call channel (Slack, PagerDuty, email) with enough context to diagnose the issue: the metric value, the threshold, the affected feature, and a link to the dashboard panel showing the anomaly.
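A sketch of those checks as a periodic job; the shape of the metrics dict and the dashboard URL are illustrative, and the pre-computed values are assumed to come from your rollup queries:

```python
def check_alerts(m, monthly_budget, dashboard_url="https://grafana.example.com/d/ai-costs"):
    """Evaluate the alert conditions against a hypothetical metrics dict.

    Expected keys (all assumed to be pre-computed upstream):
      daily_spend, trailing_7d_avg_spend, max_input_tokens,
      cache_hit_rate, cache_hit_rate_recent_avg,
      expensive_model_share, expensive_model_share_baseline,
      projected_monthly_spend
    """
    alerts = []

    def alert(metric, value, threshold):
        alerts.append({"metric": metric, "value": value,
                       "threshold": threshold, "dashboard": dashboard_url})

    if m["daily_spend"] > 1.5 * m["trailing_7d_avg_spend"]:
        alert("daily_spend", m["daily_spend"], 1.5 * m["trailing_7d_avg_spend"])
    if m["max_input_tokens"] > 50_000:
        alert("max_input_tokens", m["max_input_tokens"], 50_000)
    if m["cache_hit_rate"] < 0.8 * m["cache_hit_rate_recent_avg"]:
        alert("cache_hit_rate", m["cache_hit_rate"], 0.8 * m["cache_hit_rate_recent_avg"])
    if m["expensive_model_share"] - m["expensive_model_share_baseline"] > 0.10:
        alert("expensive_model_share", m["expensive_model_share"],
              m["expensive_model_share_baseline"] + 0.10)
    if m["projected_monthly_spend"] > 1.2 * monthly_budget:
        alert("projected_monthly_spend", m["projected_monthly_spend"], 1.2 * monthly_budget)

    # Route the returned alerts to Slack, PagerDuty, or email, attaching
    # the affected feature where the metric is feature-scoped
    return alerts
```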
Cost Attribution by Tenant
For multi-tenant applications, per-tenant cost tracking is critical for pricing decisions, fair usage enforcement, and identifying tenants whose usage patterns are disproportionately expensive. Tag every API call with the tenant ID, then aggregate costs by tenant daily. Build a tenant cost leaderboard that shows the top 20 tenants by spending, their average cost per request compared to the global average, and their month-over-month trend. Tenants whose cost per request is 3x or more above average often have usage patterns that can be optimized (excessively long conversations, repeated queries, or workflows that should be batched).
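A sketch of the leaderboard query against the hypothetical ai_request_log table from earlier:

```python
TENANT_LEADERBOARD = """
WITH global AS (
    SELECT avg(cost_usd) AS avg_cost
    FROM ai_request_log
    WHERE ts >= date_trunc('month', now())
)
SELECT tenant_id,
       sum(cost_usd)                    AS month_to_date_spend,
       avg(cost_usd)                    AS avg_cost_per_request,
       avg(cost_usd) / global.avg_cost  AS vs_global_avg  -- 3.0 means 3x the global average
FROM ai_request_log, global
WHERE ts >= date_trunc('month', now())
GROUP BY tenant_id, global.avg_cost
ORDER BY month_to_date_spend DESC
LIMIT 20;
"""
```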
Per-tenant tracking also informs pricing. If your average cost per conversation is $0.12 but some tenants average $0.45 due to their usage patterns, a flat per-conversation pricing model loses money on those tenants. Tiered pricing based on actual consumption ensures that heavy usage is priced appropriately without penalizing light users.
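To make the arithmetic concrete, a trivial sketch; the $0.20 flat price is illustrative, the per-conversation costs are the figures from the paragraph above:

```python
def margin_per_conversation(flat_price_usd, tenant_avg_cost_usd):
    """Gross margin per conversation under a flat pricing model."""
    return flat_price_usd - tenant_avg_cost_usd

# At an illustrative $0.20 flat price per conversation:
# margin_per_conversation(0.20, 0.12) ->  0.08  (profit on an average tenant)
# margin_per_conversation(0.20, 0.45) -> -0.25  (loss on a heavy tenant)
```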
Adaptive Recall's status tool reports memory system health and usage metrics that complement your cost dashboard. See how memory recall reduces per-request token usage and track the cost savings in your monitoring.
Get Started Free