# How Much Does Each Extra Token Cost?
## Current Token Pricing
| Model | Input ($/M tokens) | Output ($/M tokens) | Cached Input ($/M) |
|---|---|---|---|
| Claude Opus 4 | $15.00 | $75.00 | $1.50 |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 |
| Claude Haiku 4.5 | $0.80 | $4.00 | $0.08 |
| GPT-4o | $2.50 | $10.00 | $1.25 |
| GPT-4o mini | $0.15 | $0.60 | $0.075 |
| Gemini 1.5 Pro | $1.25 | $5.00 | N/A |
| Gemini 1.5 Flash | $0.075 | $0.30 | N/A |
## What Tokens Cost in Context
Individual token costs are tiny, but they compound across calls. Here is what adding 10,000 extra tokens to your context costs at different call volumes:
| Model | Cost per 10k extra tokens | 1,000 calls/day | 10,000 calls/day | Monthly (10k calls/day) |
|---|---|---|---|---|
| Claude Sonnet 4.6 | $0.030 | $30/day | $300/day | $9,000 |
| GPT-4o | $0.025 | $25/day | $250/day | $7,500 |
| Claude Haiku 4.5 | $0.008 | $8/day | $80/day | $2,400 |
| GPT-4o mini | $0.0015 | $1.50/day | $15/day | $450 |
Adding 10,000 unnecessary tokens to every API call on Claude Sonnet at 10,000 calls per day costs $9,000 per month. That is the cost of a verbose system prompt that could be compressed, retrieval results that are not filtered, or conversation history that is not managed. Every token you eliminate from the context directly reduces this cost.
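The arithmetic behind the table is simple enough to sketch directly. The snippet below is a minimal illustration (model names and the `monthly_cost` helper are ours, not any provider's API); prices are the per-million input rates from the table above.

```python
# Illustrative helper: what extra context tokens cost at scale.
# Prices are $ per million INPUT tokens, taken from the pricing table above.
PRICE_PER_M_INPUT = {
    "claude-sonnet-4.6": 3.00,
    "gpt-4o": 2.50,
    "claude-haiku-4.5": 0.80,
    "gpt-4o-mini": 0.15,
}

def monthly_cost(model: str, extra_tokens: int, calls_per_day: int, days: int = 30) -> float:
    """Monthly cost of carrying `extra_tokens` of unneeded context on every call."""
    per_call = extra_tokens / 1_000_000 * PRICE_PER_M_INPUT[model]
    return per_call * calls_per_day * days

print(monthly_cost("claude-sonnet-4.6", 10_000, 10_000))  # 9000.0
print(monthly_cost("gpt-4o-mini", 10_000, 10_000))        # 450.0
```

Swapping in your own model prices and call volume gives the same per-day and per-month figures as the table.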
## Input vs Output: The Hidden Asymmetry
Output tokens cost 4 to 5 times more than input tokens on every major provider listed above. A 2,000-token response on Claude Sonnet costs $0.030, the same as 10,000 input tokens. This asymmetry means that controlling response length is as cost-effective as reducing context size, yet it is often overlooked.
Practical techniques for controlling output cost include setting max_tokens to a reasonable limit for your use case, instructing the model to be concise for simple queries, and using structured output formats (JSON, bullet points) that are naturally shorter than prose. A query that can be answered in 200 tokens should not generate a 2,000-token response just because the model defaulted to verbose mode.
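The asymmetry is easy to see with a per-call cost function. This is a sketch assuming Claude Sonnet's rates from the table ($3/M input, $15/M output); `call_cost` is our own illustrative helper, not a provider API.

```python
# Per-call cost under the input/output price asymmetry
# (Claude Sonnet 4.6 rates from the pricing table above).
INPUT_PER_M, OUTPUT_PER_M = 3.00, 15.00

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Total $ cost of one call: input and output billed at different rates."""
    return (input_tokens / 1_000_000 * INPUT_PER_M
            + output_tokens / 1_000_000 * OUTPUT_PER_M)

verbose = call_cost(3_000, 2_000)  # 0.009 input + 0.030 output = 0.039
concise = call_cost(3_000, 200)    # 0.009 input + 0.003 output = 0.012
```

With the same 3,000-token prompt, trimming the response from 2,000 to 200 tokens cuts the call cost by roughly 69%, which is why a `max_tokens` cap and concise-output instructions pay off.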
## The Compounding Effect
Token costs compound across multiple factors. A chatbot that starts with 5,000 tokens of context and grows to 50,000 tokens over a long conversation has a 10x input cost increase per call by the end. If the response also grows longer as the conversation gets more complex (from 500 to 2,000 tokens), the total cost per call on Claude Sonnet rises from about $0.0225 to $0.18, an 8x increase from the first message to the last.
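The per-turn cost trajectory can be computed directly. This is an illustrative sketch using Claude Sonnet's rates from the pricing table; `turn_cost` is our own helper.

```python
def turn_cost(context_tokens: int, output_tokens: int,
              in_per_m: float = 3.00, out_per_m: float = 15.00) -> float:
    """Cost of one conversation turn: the full context is re-sent as input."""
    return (context_tokens / 1_000_000 * in_per_m
            + output_tokens / 1_000_000 * out_per_m)

first = turn_cost(5_000, 500)      # early in the conversation
last = turn_cost(50_000, 2_000)    # after context and responses have grown
print(first, last, last / first)   # 0.0225 0.18 8.0
```

Note that in a multi-turn conversation the entire accumulated context is re-billed as input on every turn, so these per-turn costs also sum across the conversation.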
This compounding is why context management is primarily a cost engineering discipline. The techniques described in other articles in this pillar (sliding windows, compression, prompt caching, external memory) are not just about fitting within the context limit. They are about keeping per-call costs constant regardless of conversation length, knowledge base size, or query complexity.
## When External Memory Is Cheaper Than Tokens
External memory systems have their own costs: storage for memories, computation for retrieval, and API costs for the memory service itself. The question is whether these costs are less than the token costs they replace.
For an application that includes 20,000 tokens of persistent knowledge in every prompt, switching to external memory that retrieves 2,000 relevant tokens per query saves 18,000 tokens per call. On Claude Sonnet at 10,000 calls per day, that is $0.054 saved per call, or $16,200 per month. If the memory system costs $500 per month to operate, the net savings are $15,700 per month.
The break-even point depends on your token volume and the cost of the memory system, but for most applications making more than 1,000 calls per day with persistent knowledge in the context, external memory pays for itself many times over.
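The break-even calculation is a one-liner. A minimal sketch, assuming the Claude Sonnet input rate from the table and a flat monthly operating cost for the memory system (`memory_savings` is our own illustrative helper):

```python
def memory_savings(tokens_saved_per_call: int, calls_per_day: int,
                   input_per_m: float, memory_monthly_cost: float,
                   days: int = 30) -> float:
    """Net monthly savings from replacing in-context knowledge with retrieval."""
    gross = tokens_saved_per_call / 1_000_000 * input_per_m * calls_per_day * days
    return gross - memory_monthly_cost

# The example above: 18,000 tokens saved per call, 10,000 calls/day,
# $3/M input (Claude Sonnet), $500/month to run the memory system.
print(memory_savings(18_000, 10_000, 3.00, 500))  # 15700.0
```

Setting the function to zero gives the break-even volume: at these rates the memory system pays for itself at roughly 310 calls per day.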
Every token you remove from context saves money. Adaptive Recall stores knowledge externally so you pay only for the tokens each query actually needs.
Get Started Free