Embedding Models: OpenAI vs Voyage vs Cohere
OpenAI: text-embedding-3 Family
OpenAI offers two embedding models: text-embedding-3-small (1,536 dimensions) and text-embedding-3-large (3,072 dimensions). Both support Matryoshka representation learning, meaning you can request any dimension count up to the maximum and the model returns a truncated but still meaningful vector. This flexibility lets you start with full dimensions for maximum quality and compress later if storage becomes a concern.
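The truncation can be reproduced locally: keep the first k components and re-normalize to unit length (the `dimensions` request parameter does the equivalent server-side). A minimal numpy sketch with a toy vector standing in for a real embedding:

```python
import numpy as np

def truncate_embedding(vec, dims):
    """Keep the first `dims` components of a Matryoshka-style embedding
    and re-normalize, so cosine similarity remains meaningful."""
    v = np.asarray(vec, dtype=np.float64)[:dims]
    return v / np.linalg.norm(v)

# Toy stand-in for a real 3,072-dimension text-embedding-3-large vector.
full = np.random.default_rng(0).standard_normal(3072)
full /= np.linalg.norm(full)

short = truncate_embedding(full, 256)
print(short.shape)  # (256,)
```

Re-normalizing after the slice matters: without it, truncated vectors have smaller norms and dot-product scores are no longer comparable across dimension counts.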
text-embedding-3-large ranks among the top models on the MTEB benchmark across retrieval, classification, and clustering tasks. text-embedding-3-small offers roughly 90% of the quality at lower cost and smaller vectors, making it the go-to choice for cost-sensitive applications where slight quality differences are acceptable.
Pricing is straightforward: $0.13 per million tokens for the large model and $0.02 per million tokens for the small model. For a corpus of 100,000 documents averaging 500 tokens each (50 million tokens total), embedding costs roughly $6.50 with the large model or $1.00 with the small model. These are one-time costs for the initial embedding, plus ongoing costs for new documents and query embeddings.
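The arithmetic is simple enough to script when budgeting. This sketch assumes the prices above and a flat token count:

```python
def embedding_cost(total_tokens: int, price_per_million: float) -> float:
    """One-time cost (USD) to embed a corpus at a per-million-token price."""
    return total_tokens / 1_000_000 * price_per_million

corpus_tokens = 100_000 * 500  # 100k docs x 500 tokens each = 50M tokens
print(embedding_cost(corpus_tokens, 0.13))  # text-embedding-3-large -> 6.5
print(embedding_cost(corpus_tokens, 0.02))  # text-embedding-3-small -> 1.0
```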
The primary consideration with OpenAI embeddings is vendor dependency. Once your corpus is embedded with an OpenAI model, switching to a different provider requires re-embedding everything because vector spaces from different providers are incompatible. OpenAI's API availability, rate limits, and pricing changes directly affect your search system.
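There is no way around re-embedding when you switch, but you can make a future migration less painful by keeping provider calls behind one interface and tagging every stored vector with the model that produced it. A sketch with hypothetical names (`Embedder`, `index_documents`, and `FakeEmbedder` are illustrative, not a real library):

```python
from typing import Protocol

class Embedder(Protocol):
    """Minimal provider-agnostic interface (hypothetical, for illustration)."""
    model_id: str  # stored alongside vectors so you know what produced them
    def embed(self, texts: list[str]) -> list[list[float]]: ...

class FakeEmbedder:
    """Stand-in for testing; a real implementation would call a provider API."""
    model_id = "fake-embedder-v1"
    def embed(self, texts):
        # Deterministic toy vectors keyed on text length, not real embeddings.
        return [[float(len(t)), 1.0] for t in texts]

def index_documents(embedder: Embedder, docs: list[str]) -> list[dict]:
    # Tag each stored vector with its model id: mixing vector spaces from
    # different models silently breaks similarity search.
    return [{"text": d, "model": embedder.model_id, "vector": v}
            for d, v in zip(docs, embedder.embed(docs))]

records = index_documents(FakeEmbedder(), ["hello", "world!"])
print(records[0]["model"], records[0]["vector"])
```

The model tag does not avoid the re-embed, but it makes a half-migrated index detectable instead of a source of quietly wrong results.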
Voyage AI: Domain-Specific Excellence
Voyage AI differentiates through domain-specific models. Their general-purpose voyage-3 model is competitive with OpenAI's text-embedding-3-large on MTEB benchmarks, while their specialized models (voyage-code-3 for code, voyage-law-2 for legal text) outperform general-purpose models on their respective domains by 5 to 10% on retrieval benchmarks.
voyage-code-3 is trained on a mixture of source code, documentation, and technical discussions. It understands that a Python function definition and a description of what that function does are semantically related, even though the code syntax and the natural language description look very different. For AI coding assistants and code search applications, this domain understanding translates directly into better retrieval.
voyage-law-2 is trained on legal corpora including case law, statutes, and legal analysis. It captures domain-specific concepts like the relationship between a statute and its implementing regulations, or between a legal precedent and the cases that cite it. General-purpose models treat legal language as formal English, which produces acceptable but not optimal embeddings for legal retrieval.
Voyage models output 1,024 dimensions, which provides a good balance between quality and efficiency. Pricing is competitive with OpenAI at $0.06 per million tokens for voyage-3 and $0.18 per million tokens for the specialized models. The API design follows similar conventions to OpenAI, making it straightforward to switch between providers at the code level (though vectors are incompatible).
Cohere: Multilingual Champion
Cohere's embed-v4 is a single model that handles 100+ languages in one embedding space. This means a query in English can match a document in Spanish, Japanese, or Arabic because the model learns cross-lingual semantic similarity during training. For applications serving international users or indexing content in multiple languages, Cohere eliminates the need for language-specific models or translation pipelines.
embed-v4 uses input type parameters (search_document for documents being indexed, search_query for queries being searched) that help the model produce slightly different embeddings optimized for each use case. A document embedding emphasizes the content's topics and details, while a query embedding emphasizes the information need. This asymmetric approach improves retrieval precision by 2 to 4% compared to using the same embedding type for both.
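In Cohere's Python SDK this is a single `input_type` argument. A hedged sketch, with the import kept inside the function so the snippet loads without the SDK installed; the model id and client construction should be verified against the SDK version you actually use:

```python
def embed_for_search(texts, *, as_queries=False, model="embed-v4"):
    """Embed with Cohere's asymmetric input types (sketch only)."""
    import cohere  # lazy import: sketch loads without the SDK installed
    co = cohere.Client()  # expects the API key in the environment
    resp = co.embed(
        texts=texts,
        model=model,
        # "search_document" when indexing, "search_query" when searching
        input_type="search_query" if as_queries else "search_document",
    )
    return resp.embeddings

# Usage (requires an API key):
# doc_vecs = embed_for_search(["Returns accepted within 30 days of delivery."])
# qry_vecs = embed_for_search(["can I return my order"], as_queries=True)
```

The important part is consistency: index with `search_document`, query with `search_query`, and never mix the two for the same corpus.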
Cohere outputs 1,024 dimensions. Pricing is $0.10 per million tokens. The Cohere API includes a built-in reranking model (Rerank v3) that pairs well with their embeddings for a two-stage retrieval pipeline. Using embeddings from the same provider as your reranker can provide slight quality benefits because the models share similar training data and semantic representations.
Open-Source Alternatives
Open-source models eliminate per-token API costs at the expense of GPU hosting costs and operational complexity. The leading open-source embedding models include BGE-large-en-v1.5 (1,024 dimensions, strong English performance), E5-large-v2 (1,024 dimensions, instruction-tuned), GTE-large (1,024 dimensions, good all-around), and NV-Embed-v2 (4,096 dimensions, top MTEB scores but heavy on GPU requirements).
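These models run through the sentence-transformers library. A sketch (lazy import so it loads without the dependency; the query-instruction prefix follows BGE's model card recommendation for short retrieval queries):

```python
def embed_local(texts, model_name="BAAI/bge-large-en-v1.5", queries=False):
    """Embed texts with a self-hosted model via sentence-transformers (sketch).

    Downloads model weights on first use; normalizing makes dot product
    equal cosine similarity.
    """
    from sentence_transformers import SentenceTransformer
    if queries:
        # BGE's model card suggests this prefix for retrieval queries.
        texts = ["Represent this sentence for searching relevant passages: " + t
                 for t in texts]
    model = SentenceTransformer(model_name)
    return model.encode(texts, normalize_embeddings=True)

# Usage (downloads roughly 1.3 GB of weights on first run):
# vecs = embed_local(["GPU hosting trades cost for complexity."])
```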
For applications embedding tens of millions of tokens per month, self-hosted open-source models can become cheaper than API-based models, though the break-even depends on which API model you are replacing. A single NVIDIA A10G instance costs roughly $1 to $2 per hour and, with batching, can embed on the order of 1,000 short documents per second, so 50 million tokens take minutes of GPU time and cost a dollar or two. The same volume through OpenAI's API costs $1.00 to $6.50 depending on the model, so the raw compute gap is modest; the real trade-off is your volume and how much you value operational simplicity versus cost savings.
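A back-of-the-envelope break-even comparison, with throughput as a labeled assumption (1,000 docs/s at 500 tokens each, i.e. 1.8B tokens per GPU-hour; real throughput varies with model and batch size):

```python
import math

def api_cost(tokens: int, price_per_million: float) -> float:
    """Pay-per-token API cost in USD."""
    return tokens / 1_000_000 * price_per_million

def self_hosted_cost(tokens: int, gpu_hourly: float, tokens_per_hour: float) -> float:
    """GPU cost for the embedding work alone; ignores idle capacity,
    ops effort, and low-latency serving of query embeddings."""
    return math.ceil(tokens / tokens_per_hour) * gpu_hourly

TOKENS = 50_000_000
TOKENS_PER_HOUR = 1_000 * 500 * 3600  # assumed: 1,000 docs/s x 500 tokens

print(api_cost(TOKENS, 0.13))   # text-embedding-3-large -> 6.5
print(api_cost(TOKENS, 0.02))   # text-embedding-3-small -> 1.0
print(self_hosted_cost(TOKENS, 1.50, TOKENS_PER_HOUR))  # -> 1.5
```

Note what the self-hosted number omits: the GPU bills whether or not it is busy, and someone has to run it. That overhead, not the compute itself, usually decides the break-even.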
Head-to-Head Comparison
Model | Dims | MTEB | Price/1M tokens | Best for
-------------------------|------|-------|-----------------|------------------
OpenAI text-embed-3-lg | 3072 | 64.6 | $0.130 | General purpose
OpenAI text-embed-3-sm | 1536 | 62.3 | $0.020 | Cost-sensitive
Voyage voyage-3 | 1024 | 63.8 | $0.060 | Technical content
Voyage voyage-code-3 | 1024 | 66.1* | $0.180 | Code retrieval
Voyage voyage-law-2 | 1024 | 65.4* | $0.180 | Legal retrieval
Cohere embed-v4 | 1024 | 63.5 | $0.100 | Multilingual
BGE-large-en-v1.5 | 1024 | 63.6 | Self-hosted | Budget, English
NV-Embed-v2 | 4096 | 69.3 | Self-hosted | Max quality
* Domain-specific benchmark scores, not overall MTEB
MTEB scores approximate, from public leaderboard data
How to Decide
Default choice: OpenAI text-embedding-3-small. Widely used, well-documented, lowest API cost, good quality. Switch to the large model if recall metrics show room for improvement.
Code or legal domain: Voyage's specialized models. The 5 to 10% retrieval improvement on domain-specific queries justifies the higher per-token cost for applications where retrieval quality directly affects user experience.
Multilingual content: Cohere embed-v4. No other model matches its cross-lingual retrieval quality in a single model.
High volume, cost-sensitive: Self-hosted BGE-large or E5-large. Eliminates API costs entirely once you have the GPU infrastructure.
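The rules above can be condensed into a small helper (illustrative only; model names as used in this comparison, thresholds approximate):

```python
def pick_embedding_model(domain="general", multilingual=False,
                         monthly_tokens=1_000_000):
    """Encode the decision guidance above; not a recommendation engine."""
    if domain == "code":
        return "voyage-code-3"
    if domain == "legal":
        return "voyage-law-2"
    if multilingual:
        return "embed-v4"
    if monthly_tokens > 50_000_000:
        return "BGE-large-en-v1.5 (self-hosted)"
    # Default; move up to text-embedding-3-large if recall metrics lag.
    return "text-embedding-3-small"

print(pick_embedding_model(domain="code"))      # voyage-code-3
print(pick_embedding_model(multilingual=True))  # embed-v4
print(pick_embedding_model())                   # text-embedding-3-small
```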
Adaptive Recall uses embeddings as one of four retrieval signals, so your system is not dependent on any single model's performance. Cognitive scoring, graph traversal, and confidence weighting compensate where embeddings fall short.