
What Does an AI Conversation Actually Cost?

A typical AI conversation costs $0.02 to $0.15 for a 4-turn exchange using Claude Sonnet, and $0.08 to $0.50 for a 10-turn conversation. Total cost grows quadratically with conversation length because each turn resends all previous messages as history. A 10-turn conversation costs roughly 5x as much as a 4-turn conversation, not 2.5x, because accumulated history dominates input tokens in later turns. Understanding this cost curve is essential for pricing AI-powered products and identifying where optimization has the most impact.

Anatomy of Conversation Cost

Every turn in a conversation sends the following to the model: the system prompt (fixed, typically 1,500 to 2,500 tokens), conversation history (growing by roughly 300 to 600 tokens per turn), RAG retrieval chunks if applicable (1,000 to 2,500 tokens per turn), tool definitions if applicable (500 to 3,000 tokens, fixed), and the current user message (50 to 200 tokens). The model returns a response of 100 to 500 tokens.

On turn 1, a typical request with a 2,000-token system prompt, no history, 1,500 tokens of RAG retrieval, and 1,000 tokens of tool definitions processes about 4,600 input tokens. On turn 5, the same request carries roughly 2,000 tokens of accumulated history (history grows by roughly 400 to 500 tokens per completed turn), pushing the total to 6,600 input tokens. On turn 10, history has grown to 4,000 tokens, bringing the total to 8,600 input tokens. Each additional turn adds not just the new message but forces reprocessing of every previous message.
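The per-turn arithmetic above reduces to a simple token model. A minimal sketch in Python; the component sizes are taken from the example, and the history growth rate is an illustrative midpoint, so later-turn totals land near, not exactly on, the figures in the text:

```python
# Per-turn input token model. Component sizes are illustrative
# assumptions taken from the ranges in the text, not measurements.
SYSTEM_PROMPT = 2_000   # fixed system prompt, sent every turn
RAG_CHUNKS = 1_500      # retrieved context per turn
TOOL_DEFS = 1_000       # fixed tool definitions
USER_MESSAGE = 100      # current user message
HISTORY_PER_TURN = 450  # tokens of history added by each completed turn

def input_tokens(turn: int) -> int:
    """Input tokens sent to the model on a given turn (1-indexed)."""
    history = HISTORY_PER_TURN * (turn - 1)
    return SYSTEM_PROMPT + RAG_CHUNKS + TOOL_DEFS + USER_MESSAGE + history

for t in (1, 5, 10):
    print(f"turn {t:>2}: {input_tokens(t):,} input tokens")
```

The fixed components never shrink, which is why the history term comes to dominate only in later turns.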

Cost by Conversation Length

Using Claude Sonnet pricing ($3 per million input, $15 per million output) and typical token counts for a customer support conversation with RAG retrieval:

A 2-turn conversation (one question, one response, one follow-up, one response) processes roughly 10,000 total input tokens and generates 600 output tokens. Cost: approximately $0.04.

A 5-turn conversation processes roughly 30,000 total input tokens and generates 1,500 output tokens. Cost: approximately $0.11.

A 10-turn conversation processes roughly 75,000 total input tokens and generates 3,000 output tokens. Cost: approximately $0.27.

A 20-turn conversation processes roughly 200,000 total input tokens and generates 6,000 output tokens. Cost: approximately $0.69.

The 20-turn conversation costs 17x more than the 2-turn conversation, even though it is only 10x longer, because of the quadratic history accumulation. This means that a small percentage of long conversations can dominate the cost of an entire application. If 10 percent of conversations reach 20 turns, they may represent 30 to 40 percent of total API spending.
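The figures above can be reproduced with a two-line cost function. A sketch at Claude Sonnet list pricing; the token totals are the illustrative estimates from the text:

```python
# Conversation cost at Claude Sonnet list pricing.
INPUT_PRICE = 3.00 / 1_000_000    # dollars per input token
OUTPUT_PRICE = 15.00 / 1_000_000  # dollars per output token

def conversation_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# (turns, total input tokens, total output tokens) from the text
scenarios = [(2, 10_000, 600), (5, 30_000, 1_500),
             (10, 75_000, 3_000), (20, 200_000, 6_000)]
for turns, inp, out in scenarios:
    print(f"{turns:>2}-turn conversation: ${conversation_cost(inp, out):.2f}")
```

Note that output tokens contribute a minority of the total even at 5x the per-token price, because input volume grows so much faster than output volume.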

Why Long Conversations Dominate Spending

Most AI applications have a conversation length distribution that follows a power law: many short conversations and a few very long ones. In a typical customer support application, 50 percent of conversations resolve in 3 turns or fewer, 30 percent take 4 to 8 turns, 15 percent take 9 to 15 turns, and 5 percent take 16 turns or more. Because cost grows quadratically with length, that 5 percent of long conversations can represent 20 to 30 percent of total API spending. Identifying and optimizing these long-tail conversations delivers outsized cost reductions.
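A back-of-envelope check of that long-tail claim, using the distribution above. The per-segment average costs are assumed midpoints for illustration, not measured data:

```python
# Spend share by conversation-length segment.
# name: (share of conversations, assumed avg cost per conversation, $)
segments = {
    "1-3 turns":  (0.50, 0.05),
    "4-8 turns":  (0.30, 0.15),
    "9-15 turns": (0.15, 0.35),
    "16+ turns":  (0.05, 0.80),
}

total_spend = sum(share * cost for share, cost in segments.values())
for name, (share, cost) in segments.items():
    print(f"{name}: {share:.0%} of conversations, "
          f"{share * cost / total_spend:.0%} of spend")
```

Under these assumptions the 16+ turn segment lands at roughly a quarter of total spend from just 5 percent of conversations, consistent with the 20 to 30 percent range above.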

Long conversations are expensive not just because they have more turns, but because of what happens to context efficiency over time. In the first few turns, the user's question and the system prompt dominate input tokens, and every token is serving a purpose. By turn 15, the majority of input tokens are conversation history, much of which is no longer relevant to the current topic: the user has moved on from their initial question, earlier diagnostic steps have been resolved, and old context is just occupying space. The model processes all of it anyway, and you pay for all of it.

Reducing Cost Per Conversation

The highest-impact optimization for conversation cost is replacing raw history with memory summaries. Instead of resending 4,000 tokens of conversation history on turn 10, a memory summary provides the essential context in 300 to 500 tokens. This reduces the turn 10 input from 8,600 tokens to 5,100 tokens, a 40 percent reduction. The savings compound with each subsequent turn because the memory summary stays compact while raw history continues growing. By turn 20, the difference is even more dramatic: raw history might reach 8,000 tokens while a memory summary stays under 600 tokens.
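The turn-10 comparison works out as follows. A sketch using the figures above; the fixed per-turn tokens and the summary size are the article's estimates:

```python
# Raw history vs. memory summary at turn 10.
FIXED_PER_TURN = 4_600   # system prompt + RAG + tools + user message
RAW_HISTORY_T10 = 4_000  # accumulated raw history by turn 10
SUMMARY_TOKENS = 500     # compact memory summary, roughly constant

raw_input = FIXED_PER_TURN + RAW_HISTORY_T10      # 8,600 tokens
summary_input = FIXED_PER_TURN + SUMMARY_TOKENS   # 5,100 tokens
reduction = 1 - summary_input / raw_input
print(f"turn 10: {raw_input:,} -> {summary_input:,} tokens "
      f"({reduction:.0%} less)")
```

Because the summary term is constant while raw history keeps growing, the reduction percentage rises with every additional turn.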

Prompt caching cuts the price of cached input tokens by 90 percent on every turn after the first (within the 5-minute cache TTL). For a 10-turn conversation with a 2,000-token system prompt, roughly 18,000 system-prompt tokens (2,000 tokens times 9 cached turns) are billed at 10 percent of the normal rate instead of full price, reducing the system prompt component of conversation cost by about 80 percent.
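In token-equivalent terms, a sketch assuming a flat 90 percent discount on cache reads and ignoring any one-time cache-write surcharge:

```python
# System-prompt cost with and without prompt caching, expressed in
# full-price token equivalents.
SYSTEM_PROMPT = 2_000   # tokens, cached after the first turn
TURNS = 10
CACHE_READ_RATE = 0.10  # cached tokens billed at 10% of list price

uncached_equiv = SYSTEM_PROMPT * TURNS  # 20,000 full-price tokens
cached_equiv = SYSTEM_PROMPT + SYSTEM_PROMPT * (TURNS - 1) * CACHE_READ_RATE
saved_equiv = uncached_equiv - cached_equiv
print(f"system prompt: {uncached_equiv:,} -> {cached_equiv:,.0f} "
      f"token-equivalents ({1 - cached_equiv / uncached_equiv:.0%} less)")
```

The first turn still pays full price, which is why the conversation-level reduction is about 80 percent rather than the full 90.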

Dynamic tool selection reduces the fixed overhead on every turn. If your application defines 10 tools at 300 tokens each (3,000 tokens total), but any given turn is likely to use only 2 or 3 tools, including only the relevant tools saves 2,100 tokens per turn. Over a 10-turn conversation, that is 21,000 tokens saved, roughly $0.06 per conversation at Sonnet pricing. The savings are small per conversation but meaningful at scale.
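A minimal sketch of dynamic tool selection using keyword matching. The tool names, keyword sets, and the flat 300-token size are hypothetical; production systems often use an embedding or classifier-based router instead:

```python
# Include only tools whose keywords appear in the current message.
TOKENS_PER_TOOL = 300
TOOLS = {
    "search_orders":   {"order", "tracking", "shipped", "delivery"},
    "issue_refund":    {"refund", "return", "charge", "billing"},
    "update_address":  {"address", "moved", "relocate"},
    "reset_password":  {"password", "login", "locked"},
    "check_inventory": {"stock", "available", "inventory"},
}

def select_tools(message: str) -> list[str]:
    words = set(message.lower().replace("?", "").replace(",", "").split())
    return [name for name, keywords in TOOLS.items() if keywords & words]

msg = "My order never shipped, can I get a refund?"
chosen = select_tools(msg)
saved = (len(TOOLS) - len(chosen)) * TOKENS_PER_TOOL
print(f"tools sent: {chosen}, tokens saved this turn: {saved}")
```

A fallback that includes all tools when nothing matches keeps the router from breaking turns it cannot classify, at the price of no savings on those turns.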

RAG retrieval optimization targets the other major variable component. Instead of retrieving 3 to 5 chunks on every turn, a smarter retrieval strategy checks whether the current turn actually needs external knowledge. Follow-up questions ("Can you explain that in more detail?"), confirmations ("Yes, please proceed"), and procedural steps often do not need new retrieval. Skipping RAG on turns that do not need it saves 1,000 to 2,500 tokens per skipped turn, which can reduce total conversation retrieval tokens by 30 to 50 percent.
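A heuristic sketch of the retrieval check described above: skip retrieval on confirmations and on follow-ups about the previous answer. The patterns are illustrative; a small classifier is a common production alternative:

```python
import re

# Turns matching these patterns rarely need fresh retrieval.
SKIP_PATTERNS = [
    r"^(yes|no|ok|okay|sure|thanks|thank you)\b",                # confirmations
    r"\b(explain|elaborate|in more detail|what do you mean)\b",  # follow-ups
]

def needs_retrieval(message: str) -> bool:
    text = message.lower().strip()
    return not any(re.search(p, text) for p in SKIP_PATTERNS)

for m in ["Yes, please proceed",
          "Can you explain that in more detail?",
          "How do I reset my router?"]:
    print(f"{m!r}: retrieve={needs_retrieval(m)}")
```

Since a false skip degrades answer quality while a false retrieve only wastes tokens, it is usually safer to bias the heuristic toward retrieving.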

Combined, memory summaries, prompt caching, dynamic tool selection, and smart RAG skipping can reduce the cost of a 10-turn conversation from $0.27 to roughly $0.08, a 70 percent reduction that makes long conversations economically sustainable.

Cost Per Conversation as a Business Metric

Cost per conversation is the metric that connects AI spending to business value. Knowing that your average conversation costs $0.12 lets you calculate the margin on an AI-powered support interaction, price your AI features accurately, and compare AI cost to the human alternative. If a human support agent costs $12 per interaction and your AI handles 70 percent of conversations at $0.12 each, the AI saves $11.88 per handled conversation. Even after accounting for the 30 percent of conversations that escalate to humans, the blended cost per interaction drops dramatically.
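The blended arithmetic looks like this, under the assumption that escalated conversations incur both the AI attempt and the human handling cost:

```python
# Blended cost per interaction for a mixed AI/human support flow.
HUMAN_COST = 12.00  # $ per human-handled interaction
AI_COST = 0.12      # $ per AI conversation
AI_RESOLVED = 0.70  # share of conversations fully resolved by AI

# Escalated conversations pay for the AI attempt plus the human agent.
blended = AI_RESOLVED * AI_COST + (1 - AI_RESOLVED) * (AI_COST + HUMAN_COST)
savings = 1 - blended / HUMAN_COST
print(f"blended cost per interaction: ${blended:.2f} "
      f"({savings:.0%} below all-human)")
```

Under these assumptions the blended cost comes to a few dollars per interaction, roughly a 70 percent reduction versus all-human handling.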

Tracking cost per conversation over time reveals whether your optimizations are working and whether new features are introducing cost regressions. A rising cost-per-conversation trend after deploying a new feature (more tools, longer system prompt, additional RAG sources) quantifies the cost impact and helps the team decide whether the feature improvement justifies the cost increase. A falling trend after implementing memory-based history replacement confirms the optimization is delivering the expected savings.

Make every conversation affordable. Adaptive Recall replaces growing conversation history with compact memory recall, flattening the cost curve from quadratic to near-linear as conversations get longer.

Get Started Free