
AI API Pricing: OpenAI vs Anthropic vs Google Compared

AI API pricing varies by 60x between the cheapest and most expensive models, and per-token prices only tell part of the story. Prompt caching discounts, batch processing rates, output-to-input price ratios, and context window limits all affect the real cost of running a production workload. This comparison breaks down the pricing across major providers so you can make informed decisions about which models to use for which tasks.

Per-Token Pricing by Model Tier

AI providers organize their models into capability tiers that correspond roughly to quality and cost levels. Each provider's lineup includes a fast, cheap model for simple tasks, a balanced mid-tier model for most production workloads, and a premium model for the most complex reasoning tasks. The pricing within each tier is remarkably competitive across providers, with the real differentiation coming from features like caching, batching, and context window sizes.

At the economy tier, Anthropic's Claude Haiku processes input at $0.80 per million tokens and output at $4.00 per million. OpenAI's GPT-4o-mini comes in at $0.15 per million input and $0.60 per million output. Google's Gemini Flash offers $0.075 per million input and $0.30 per million output. Google's Flash model is the cheapest per-token option, but the quality differences between economy-tier models mean that the cheapest per-token model may not be the cheapest per-outcome model for your specific tasks.

At the mid-tier, Claude Sonnet costs $3.00 per million input tokens and $15.00 per million output tokens. GPT-4o costs $2.50 per million input and $10.00 per million output. Gemini Pro costs $1.25 per million input and $5.00 per million output. This tier handles the majority of production workloads, so even small per-token differences compound to significant absolute cost differences at scale. A million daily requests at mid-tier pricing generates meaningful cost differentials between providers.

At the premium tier, Claude Opus costs $15.00 per million input and $75.00 per million output. OpenAI's o1 costs $15.00 per million input and $60.00 per million output (reasoning tokens are billed at the output rate even though they are not returned, which inflates effective output costs). Google's Gemini Ultra costs $5.00 per million input and $15.00 per million output. Premium models should only be used for tasks that genuinely require their capabilities, because the cost per request at this tier is 10x to 60x higher than economy models.
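The per-token figures above can be folded into a small cost calculator. The prices below are the snapshot quoted in this article (check current pricing pages before relying on them), and the model names are shorthand labels for this sketch, not official API identifiers.

```python
# (input $/M tokens, output $/M tokens), as quoted in this article.
PRICES = {
    "gemini-flash":  (0.075, 0.30),
    "gpt-4o-mini":   (0.15,  0.60),
    "claude-haiku":  (0.80,  4.00),
    "gemini-pro":    (1.25,  5.00),
    "gpt-4o":        (2.50, 10.00),
    "claude-sonnet": (3.00, 15.00),
    "gemini-ultra":  (5.00, 15.00),
    "o1":            (15.00, 60.00),
    "claude-opus":   (15.00, 75.00),
}

def request_cost(model, input_tokens, output_tokens):
    """Cost in USD for a single request at list prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A typical mid-tier request: 2,000 input tokens, 500 output tokens.
for model in ("gemini-pro", "gpt-4o", "claude-sonnet"):
    print(f"{model}: ${request_cost(model, 2000, 500):.4f} per request")
```

At these illustrative token counts, the mid-tier request costs $0.0050 on Gemini Pro, $0.0100 on GPT-4o, and $0.0135 on Sonnet, which is how small per-token gaps become large absolute gaps at scale.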

Beyond Per-Token Pricing

Raw per-token pricing is a starting point, but several other pricing dimensions significantly affect total cost of ownership. These dimensions often matter more than per-token rates for production workloads.

Prompt Caching

Anthropic offers the most aggressive prompt caching discount: cached input tokens cost $0.30 per million (a 90 percent reduction from the standard $3.00 for Sonnet), while cache writes carry a roughly 25 percent premium over the base input rate. This makes Anthropic significantly cheaper for applications with large, stable system prompts. A 3,000-token system prompt at 1,000 daily requests costs $9.00 per day at standard pricing but $0.90 per day with caching. Over a month, caching saves $243 on the system prompt component alone. OpenAI offers a more modest 50 percent cache discount on certain models. Google's caching support varies by model and is generally less mature than Anthropic's offering.
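The caching arithmetic can be checked directly. This sketch assumes 1,000 daily requests and the Sonnet rates quoted above, and ignores the one-time cache-write premium, which is negligible at high hit rates.

```python
# Savings from prompt caching on a stable system prompt (Sonnet rates).
PROMPT_TOKENS = 3_000
DAILY_REQUESTS = 1_000
STANDARD_RATE = 3.00   # USD per million input tokens
CACHED_RATE = 0.30     # USD per million cached input tokens (90% discount)

def daily_prompt_cost(rate_per_million):
    """Daily input cost attributable to the system prompt alone."""
    return PROMPT_TOKENS * DAILY_REQUESTS * rate_per_million / 1_000_000

standard = daily_prompt_cost(STANDARD_RATE)
cached = daily_prompt_cost(CACHED_RATE)
monthly_savings = (standard - cached) * 30

print(f"standard: ${standard:.2f}/day, cached: ${cached:.2f}/day, "
      f"monthly savings: ${monthly_savings:.2f}")
```

The savings scale linearly: at 1 million daily requests the same prompt costs $9,000 per day uncached, so caching becomes essential rather than optional.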

Batch Processing

Both Anthropic and OpenAI offer batch processing at 50 percent of standard pricing. Anthropic's Message Batches API processes requests asynchronously within 24 hours (typically completing in 1 to 4 hours). OpenAI's Batch API offers similar terms. For workloads that tolerate asynchronous processing (document analysis, content generation, data classification), batching cuts costs in half regardless of the base per-token price. Google's batch pricing is less standardized and depends on the specific API endpoint.

Output-to-Input Price Ratio

The ratio between output and input pricing affects cost differently depending on your workload type. Anthropic's Sonnet charges 5x as much for output as for input ($15 vs $3). OpenAI's GPT-4o charges 4x as much ($10 vs $2.50). Applications that generate long outputs (content creation, code generation, detailed analysis) are more sensitive to output pricing than applications with short outputs (classification, extraction, simple Q&A). For a content generation workload where output tokens equal input tokens, the output pricing difference between providers can shift the total cost ranking compared to input-heavy workloads.
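To see how output pricing shifts the ranking, compare an input-heavy and an output-heavy workload on the two mid-tier models. The token counts are illustrative assumptions, not measurements.

```python
# (input $/M, output $/M) for the two mid-tier models discussed above.
GPT_4O = (2.50, 10.00)
SONNET = (3.00, 15.00)

def cost(prices, in_tokens, out_tokens):
    """USD cost of one request at the given (input, output) rates."""
    return (in_tokens * prices[0] + out_tokens * prices[1]) / 1_000_000

# Input-heavy: classification (1,500 in, 50 out).
# Output-heavy: content generation (500 in, 1,500 out).
for label, in_t, out_t in [("classification", 1500, 50),
                           ("generation", 500, 1500)]:
    g = cost(GPT_4O, in_t, out_t)
    s = cost(SONNET, in_t, out_t)
    print(f"{label}: GPT-4o ${g:.5f}, Sonnet ${s:.5f}, ratio {s / g:.2f}x")
```

With these numbers the Sonnet-to-GPT-4o cost ratio grows from about 1.24x on the input-heavy workload to about 1.48x on the output-heavy one: the higher output multiple matters more the more you generate.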

Context Window Limits

Larger context windows enable longer conversations and more retrieval content but also increase per-request costs because all context tokens are processed on every call. Anthropic and Google offer 1 million token context windows on their frontier models, while OpenAI's standard context is 128,000 tokens. A larger window does not mean you should fill it. The cost-optimal strategy is the smallest effective context for each request, regardless of the maximum available.
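Because context cost is linear in input tokens, trimming context pays off in direct proportion. This sketch uses the Sonnet input rate from above; the context sizes and request volume are hypothetical.

```python
SONNET_INPUT_RATE = 3.00  # USD per million input tokens

def daily_context_cost(context_tokens, daily_requests=10_000):
    """Daily input cost attributable to the context alone."""
    return context_tokens * daily_requests * SONNET_INPUT_RATE / 1_000_000

full = daily_context_cost(100_000)   # habitually filling a large window
trimmed = daily_context_cost(8_000)  # targeted retrieval context

print(f"100k context: ${full:,.0f}/day, 8k context: ${trimmed:,.0f}/day")
```

At 10,000 daily requests, carrying 100,000 context tokens costs $3,000 per day in input alone versus $240 per day for an 8,000-token context, which is why "smallest effective context" is the cost-optimal rule.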

Total Cost of Ownership

The cheapest API is not always the cheapest solution. Total cost of ownership includes API costs, infrastructure costs (embedding databases, caching layers, routing logic), development costs (integration work, optimization effort), and quality costs (handling failures, rework from inadequate model output). A provider with slightly higher per-token pricing but better prompt caching, native batching, and stronger tool use reliability can produce lower total costs than a cheaper provider that requires more engineering work to achieve the same result quality.

For most production workloads, the choice between providers comes down to three factors: which provider's mid-tier model produces the best results for your specific tasks (quality differences between providers are real and task-dependent), which provider's pricing features (caching, batching, committed use discounts) best match your traffic patterns, and which provider's ecosystem (SDKs, documentation, support, MCP compatibility) reduces your development and operational costs. Running benchmarks on your actual data with all three providers, then calculating total costs including optimization features, produces a much better decision than comparing per-token rates in isolation.
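One way to compare providers on your traffic patterns rather than list prices is to compute an effective blended rate. The cache-hit fraction and discount multipliers below are assumptions for illustration; replace them with your own measurements and each provider's current terms, and note that whether discounts stack varies by provider.

```python
def effective_input_rate(base_rate, cached_fraction=0.0,
                         cache_multiplier=1.0, batch_multiplier=1.0):
    """Blend cache hits and misses, then apply any batch discount.

    base_rate: list price in USD per million input tokens.
    cached_fraction: share of input tokens served from cache (0..1).
    cache_multiplier: cached-token price as a fraction of base (e.g. 0.1).
    batch_multiplier: batch price as a fraction of base (e.g. 0.5).
    """
    blended = base_rate * (cached_fraction * cache_multiplier
                           + (1 - cached_fraction))
    return blended * batch_multiplier

# Sonnet input at $3.00/M with 70% of input tokens hitting cache (0.1x rate)
# and all traffic routed through a half-price batch API.
rate = effective_input_rate(3.00, cached_fraction=0.7,
                            cache_multiplier=0.1, batch_multiplier=0.5)
print(f"effective input rate: ${rate:.3f}/M vs $3.000/M list")
```

Under these assumptions the effective rate drops to about $0.56 per million tokens, which is why a provider with a higher list price but stronger caching and batching can still be the cheaper choice.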

Prices change frequently: AI API pricing has trended sharply downward since 2023, with major price cuts every 3 to 6 months. The specific numbers in this article reflect early 2026 pricing. Check each provider's current pricing page before making decisions. The relative positioning (which tier is cheapest, which features each provider offers) changes more slowly than the absolute prices.

Reduce your costs regardless of provider. Adaptive Recall cuts per-request token usage by 50 to 80 percent by replacing redundant context with targeted memory recall. The savings apply equally whether you use Anthropic, OpenAI, or Google.
