
Why Does My RAG Return Irrelevant Results?

RAG returns irrelevant results for seven common reasons: wrong embedding model for your domain, chunks that are too large or too small, no reranking step after initial retrieval, no metadata filtering to constrain the search, vocabulary mismatch between user queries and document language, stale content ranking alongside current content, and the absence of hybrid search for keyword-heavy queries. The highest-impact fix is usually adding a cross-encoder reranker, which improves precision by 15 to 25%. The second-highest fix is adding hybrid search with BM25 for keyword matching.

The Seven Causes and Their Fixes

1. Wrong Embedding Model

General-purpose embedding models (like OpenAI's text-embedding-ada-002) work well for general text but underperform on domain-specific content. Medical, legal, financial, and technical domains have specialized vocabulary that general models embed poorly. If your retrieval consistently misses relevant documents that contain domain-specific terms, try a domain-specific or newer embedding model. Voyage AI and Cohere embed-v3 outperform older models on retrieval benchmarks, and domain-specific models exist for medical and legal text.
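
If you want evidence before switching, a small labeled evaluation is enough. The sketch below is a minimal example assuming the sentence-transformers package; the chunk texts, the labeled query, and the second model name are placeholders for your own data and whichever candidate model you are evaluating.

```python
# Minimal sketch: compare recall@1 for two embedding models on a labeled set.
# Assumes the sentence-transformers package; chunks and queries are toy data.
from sentence_transformers import SentenceTransformer, util

chunks = {
    "doc-1": "Acute myocardial infarction management follows the ACS protocol...",
    "doc-2": "Quarterly revenue recognition under ASC 606 requires...",
}
labeled_queries = [("heart attack treatment steps", "doc-1")]

def recall_at_1(model_name: str) -> float:
    model = SentenceTransformer(model_name)
    ids = list(chunks)
    chunk_emb = model.encode([chunks[i] for i in ids], convert_to_tensor=True)
    hits = 0
    for query, expected in labeled_queries:
        scores = util.cos_sim(model.encode(query, convert_to_tensor=True), chunk_emb)[0]
        hits += int(ids[int(scores.argmax())] == expected)
    return hits / len(labeled_queries)

# The second name stands in for whichever domain-specific model you are evaluating.
for name in ("all-MiniLM-L6-v2", "your-candidate-domain-model"):
    print(name, recall_at_1(name))
```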

2. Wrong Chunk Size

Chunks that are too large (over 1,000 tokens) contain too much unrelated information, diluting the embedding signal. The embedding represents the average meaning of the chunk rather than any specific point. Chunks that are too small (under 100 tokens) lack enough context to embed meaningfully. The optimal range for most applications is 200 to 500 tokens with semantic boundaries (splitting at paragraph or section breaks rather than arbitrary token counts).
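
A chunker that respects paragraph boundaries while staying inside a token budget is straightforward to write. The sketch below assumes tiktoken for token counting; any tokenizer that matches your embedding model works, and the budget values are the ranges discussed above rather than hard rules.

```python
# Minimal sketch: paragraph-aware chunking with a token budget (assumes tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def chunk_by_paragraph(text: str, max_tokens: int = 500, min_tokens: int = 100) -> list[str]:
    chunks, current, current_tokens = [], [], 0
    for para in text.split("\n\n"):                  # split at paragraph breaks
        n = len(enc.encode(para))
        if current and current_tokens + n > max_tokens:
            chunks.append("\n\n".join(current))      # close the chunk at a semantic boundary
            current, current_tokens = [], 0
        current.append(para)
        current_tokens += n
    if current:
        last = "\n\n".join(current)
        # fold an undersized trailing chunk into the previous one so it keeps context
        if chunks and len(enc.encode(last)) < min_tokens:
            chunks[-1] += "\n\n" + last
        else:
            chunks.append(last)
    return chunks
```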

3. No Reranking

Initial retrieval (vector search, BM25, or hybrid) optimizes for recall: finding all potentially relevant results. Without reranking, these results are ordered by similarity score, which does not measure how well a result answers the specific question. A cross-encoder reranker scores each query-result pair and reorders by answer relevance rather than topic similarity. This is the single highest-impact improvement for most RAG systems.
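
With sentence-transformers, adding a reranker is a few lines. The sketch below uses the public ms-marco MiniLM cross-encoder checkpoint as one example; a common pattern is to over-retrieve (say, 30 to 50 candidates) and rerank down to the handful you actually pass to the model.

```python
# Minimal sketch: cross-encoder reranking with sentence-transformers.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[str], top_k: int = 5) -> list[str]:
    # Score every (query, candidate) pair for answer relevance, then reorder.
    scores = reranker.predict([(query, c) for c in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return [c for c, _ in ranked[:top_k]]
```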

4. No Metadata Filtering

Without metadata filtering, retrieval searches the entire index including outdated documents, irrelevant document types, and content from different contexts. Adding filters for document type, date range, source, and custom tags narrows the search space so similarity scoring operates on a relevant subset. This is particularly important for multi-tenant applications and for systems with mixed content types.
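
Most vector databases let you express this as a filter on the query itself; if yours does not, the same effect comes from pre-filtering candidates before similarity scoring. The field names in the sketch below (doc_type, tenant_id, updated_at) are illustrative assumptions about your metadata schema.

```python
# Minimal sketch: metadata pre-filtering before similarity search.
from datetime import datetime, timedelta

def passes_filters(meta: dict, doc_type: str, tenant_id: str, max_age_days: int) -> bool:
    cutoff = datetime.now() - timedelta(days=max_age_days)
    return (
        meta.get("doc_type") == doc_type
        and meta.get("tenant_id") == tenant_id
        and meta.get("updated_at", datetime.min) >= cutoff
    )

corpus = [
    {"text": "Rotate the API key every 90 days ...",
     "metadata": {"doc_type": "runbook", "tenant_id": "acme",
                  "updated_at": datetime.now() - timedelta(days=40)}},
    {"text": "Legacy SSO setup (deprecated) ...",
     "metadata": {"doc_type": "runbook", "tenant_id": "acme",
                  "updated_at": datetime.now() - timedelta(days=900)}},
]

candidates = [c for c in corpus
              if passes_filters(c["metadata"], "runbook", "acme", max_age_days=365)]
# Similarity search now runs over `candidates` only.
```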

5. Vocabulary Mismatch

Users describe problems in their own words. Documents use formal, technical language. The embedding model bridges some of this gap but not all of it. Adding BM25 keyword search alongside vector search (hybrid search) catches cases where exact terms matter. Query expansion (rewriting the query in multiple phrasings) increases the chance of matching the document's vocabulary.
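
The merging step of query expansion is simple once the phrasings exist: run retrieval once per phrasing and keep each document's best score. In the sketch below the phrasings are hand-written and the search callable is a stand-in for your own retriever; in practice the rephrasings usually come from an LLM prompt or a synonym list.

```python
# Minimal sketch: query expansion by retrieving per phrasing and merging results.
from typing import Callable

def expanded_search(
    phrasings: list[str],
    search: Callable[[str], list[tuple[str, float]]],  # returns (doc_id, score) pairs
    top_k: int = 10,
) -> list[tuple[str, float]]:
    best: dict[str, float] = {}
    for phrasing in phrasings:
        for doc_id, score in search(phrasing):
            best[doc_id] = max(best.get(doc_id, 0.0), score)  # keep each doc's best score
    return sorted(best.items(), key=lambda x: x[1], reverse=True)[:top_k]

# e.g. expanded_search(
#     ["app keeps logging me out",
#      "session expires too quickly",
#      "authentication token lifetime configuration"],
#     search=my_vector_search,  # hypothetical retrieval function
# )
```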

6. Stale Content

Old content with high similarity scores outranks current content with lower similarity. A detailed explanation of the previous authentication system scores higher than a brief announcement about the new system because the old document shares more vocabulary with authentication-related queries. Timestamp-based decay (reducing scores based on content age) and freshness metadata (flagging or removing deprecated content) prevent stale information from dominating results.
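
A simple way to apply decay is to scale the similarity score by the content's age, with a half-life tuned per corpus. The half-life, field names, and hard-zero handling of deprecated content in the sketch below are assumptions, not fixed recommendations.

```python
# Minimal sketch: timestamp-based score decay with a deprecated-content flag.
from datetime import datetime

def decayed_score(similarity: float, updated_at: datetime,
                  half_life_days: float = 180.0, deprecated: bool = False) -> float:
    if deprecated:
        return 0.0                                   # drop flagged content entirely
    age_days = (datetime.now() - updated_at).days
    decay = 0.5 ** (age_days / half_life_days)       # halve the weight every half_life_days
    return similarity * decay
```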

7. No Hybrid Search

Vector search alone fails on queries that contain specific terms: product names, error codes, API endpoints, configuration keys, and version numbers. These terms need exact matching, which BM25 provides. Without hybrid search, queries like "configure REDIS_MAX_CONNECTIONS" return documents about Redis configuration in general rather than the specific configuration key.
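
One library-agnostic way to combine the two retrievers is reciprocal rank fusion, which merges ranked lists without normalizing their score scales. The BM25 ranking can come from your search engine or a library such as rank_bm25; the constant k=60 below is the conventional RRF default.

```python
# Minimal sketch: hybrid search via reciprocal rank fusion (RRF).
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60, top_k: int = 10) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:                         # e.g. [vector_ids, bm25_ids]
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# e.g. reciprocal_rank_fusion([vector_results, bm25_results]) where each list
# holds document ids in ranked order from the respective retriever.
```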

Diagnosis First, Then Fix

Before applying fixes, diagnose which cause is actually responsible for your specific irrelevant results. Collect 50 queries where retrieval returned wrong results. For each, check whether the correct document was in the search results at all (retrieval failure) or was present but outranked (ranking failure). Retrieval failures point to embedding model, chunking, or vocabulary mismatch problems. Ranking failures point to reranking, metadata filtering, or staleness problems. Fix the category that accounts for the most failures first.
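
This classification is easy to automate once you have the labeled failures. The sketch below assumes a search function that returns ranked document ids and a list of (query, expected document) pairs from your own pipeline; the window sizes are illustrative.

```python
# Minimal sketch: classify each failing query as a retrieval or ranking failure.
from collections import Counter
from typing import Callable

def diagnose(
    failures: list[tuple[str, str]],                 # (query, expected_doc_id)
    search: Callable[[str, int], list[str]],         # returns ranked doc ids
    answer_window: int = 5,                          # ids the generator actually sees
    candidate_pool: int = 50,                        # ids retrieval could have surfaced
) -> Counter:
    counts = Counter()
    for query, expected in failures:
        results = search(query, candidate_pool)
        if expected not in results:
            counts["retrieval_failure"] += 1         # fix embeddings, chunking, vocabulary
        elif expected not in results[:answer_window]:
            counts["ranking_failure"] += 1           # fix reranking, filtering, staleness
        else:
            counts["not_a_retrieval_problem"] += 1   # look at prompting / generation instead
    return counts
```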

Adaptive Recall addresses all seven causes architecturally. Cognitive scoring combines similarity with recency, frequency, confidence, and entity connectivity, replacing both reranking and freshness management. The knowledge graph provides an alternative retrieval path that bypasses vocabulary mismatch by following entity connections. Memory consolidation keeps the knowledge base current by merging, updating, and removing stale content. The result is retrieval that stays relevant without requiring you to build and tune each fix independently.

Fix irrelevant results at the root. Adaptive Recall's multi-factor scoring addresses all seven causes of poor retrieval quality.
