How to Build Hybrid Search with BM25 and Vectors
Why Hybrid Outperforms Either Approach Alone
Vector search and keyword search fail on different types of queries. Vector search fails on exact identifiers (error codes, version numbers, product names) because embedding models treat them as opaque strings with weak semantic signal. Keyword search fails on conceptual queries ("how to make the API faster") because the answer documents may not contain the exact query words. Running both and fusing the results captures the strengths of each approach, with each system covering the other's blind spots.
The improvement is not theoretical. On the BEIR retrieval benchmark, hybrid search with reciprocal rank fusion improves NDCG@10 by 5 to 15% over vector-only search across 13 diverse datasets. On technical documentation specifically, where exact identifiers are common, the improvement is often at the higher end of that range.
Step-by-Step Implementation
You need a working vector search system before adding the keyword component. This means your documents are chunked, embedded with a model, and stored in a vector database or index. Verify that vector search returns reasonable results for semantic queries ("how does authentication work") before proceeding. If vector search is not working on its own, hybrid search will not fix the underlying embedding quality issues.
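Before layering BM25 on top, a quick sanity check of the vector layer can be as simple as a brute-force cosine scan. This is a sketch with toy NumPy arrays; in a real system `query_vec` and `doc_vecs` would come from your embedding model and vector store:

```python
import numpy as np

def top_k_cosine(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 5):
    """Return (index, cosine_similarity) pairs for the k nearest documents."""
    # Normalize so the dot product equals cosine similarity
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q
    top = np.argsort(sims)[-k:][::-1]
    return [(int(i), float(sims[i])) for i in top]
```

Spot-check a handful of semantic queries this way: if the top results are off-topic, fix chunking or the embedding model first.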
Build a BM25 index over the same corpus. You have several options depending on your stack. If you use PostgreSQL with pgvector, use PostgreSQL's built-in full-text search (tsvector and tsquery; its ts_rank scoring is not exactly BM25, but it fills the same lexical-matching role). If you use a dedicated vector database, add Elasticsearch or OpenSearch alongside it for keyword search. Weaviate and Qdrant both support sparse vector representations that approximate BM25 natively.
-- Option A: PostgreSQL full-text search alongside pgvector
-- Assumes a documents table with content and embedding columns
ALTER TABLE documents ADD COLUMN search_vector tsvector;
UPDATE documents SET search_vector = to_tsvector('english', content);
CREATE INDEX idx_search ON documents USING GIN(search_vector);
-- Vector search
SELECT id, content, 1 - (embedding <=> query_embedding) AS vector_score
FROM documents
ORDER BY embedding <=> query_embedding
LIMIT 20;
-- Keyword search
SELECT id, content, ts_rank(search_vector, plainto_tsquery('english', 'your query')) AS bm25_score
FROM documents
WHERE search_vector @@ plainto_tsquery('english', 'your query')
ORDER BY bm25_score DESC
LIMIT 20;

# Option B: Python with rank-bm25 library for standalone BM25
from rank_bm25 import BM25Okapi
import numpy as np
# Index documents
tokenized_corpus = [doc.lower().split() for doc in documents]
bm25 = BM25Okapi(tokenized_corpus)
# Search
def keyword_search(query: str, top_k: int = 20):
    tokens = query.lower().split()
    scores = bm25.get_scores(tokens)
    top_indices = np.argsort(scores)[-top_k:][::-1]
    return [(idx, scores[idx]) for idx in top_indices if scores[idx] > 0]

For each search query, run the vector search and keyword search simultaneously and collect both result sets. Each system returns a ranked list of document IDs with scores. Request more results from each system than you want in the final output (for example, top 20 from each if you want a final top 10) to ensure good coverage before fusion.
import asyncio
from typing import List, Tuple
async def hybrid_search(query: str, top_k: int = 10):
    # Run both searches in parallel
    vector_task = asyncio.create_task(vector_search(query, top_k=top_k * 2))
    keyword_task = asyncio.create_task(keyword_search_async(query, top_k=top_k * 2))
    vector_results = await vector_task    # [(doc_id, score), ...]
    keyword_results = await keyword_task  # [(doc_id, score), ...]
    # Fuse and return top k
    fused = reciprocal_rank_fusion(vector_results, keyword_results)
    return fused[:top_k]

RRF assigns each document a score based on its rank position in each result list:
score = 1 / (k + rank)

where k is a constant (typically 60). Documents that appear in both lists get their scores summed. This produces a single ranked list that favors documents ranked highly by both systems. RRF works well because it does not require normalizing raw scores from different systems; it only needs rank positions.
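To make the formula concrete, here is a toy computation with hypothetical document IDs (k = 60, ranks counted from 1):

```python
k = 60
vector_ranked = ["doc_a", "doc_b", "doc_c"]   # best first
keyword_ranked = ["doc_b", "doc_d", "doc_a"]

scores = {}
for ranked_list in (vector_ranked, keyword_ranked):
    for rank, doc_id in enumerate(ranked_list, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)

fused = sorted(scores, key=scores.get, reverse=True)
# fused == ["doc_b", "doc_a", "doc_d", "doc_c"]
```

doc_b (ranked 2nd and 1st) edges out doc_a (ranked 1st and 3rd) because 1/62 + 1/61 > 1/61 + 1/63: appearing near the top of both lists beats a single first place.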
def reciprocal_rank_fusion(
    *result_lists: List[Tuple[str, float]],
    k: int = 60
) -> List[Tuple[str, float]]:
    """Fuse multiple ranked result lists using RRF.

    Args:
        result_lists: Each list contains (doc_id, score) tuples
            ordered by relevance (best first).
        k: Smoothing constant (default 60, standard in the literature).

    Returns:
        Fused (doc_id, rrf_score) list sorted by combined score.
    """
    scores = {}
    for result_list in result_lists:
        for rank, (doc_id, _) in enumerate(result_list):
            if doc_id not in scores:
                scores[doc_id] = 0.0
            scores[doc_id] += 1.0 / (k + rank + 1)
    fused = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    return fused

The default RRF treats both systems equally. If your queries skew toward one type (mostly semantic or mostly exact-match), you can weight the systems differently by multiplying the RRF contribution from each system. Evaluate by measuring recall@k on a test set of queries with known relevant documents. Compare vector-only, keyword-only, and hybrid results to verify that hybrid actually improves retrieval for your data.
def weighted_rrf(
    vector_results: List[Tuple[str, float]],
    keyword_results: List[Tuple[str, float]],
    vector_weight: float = 0.6,
    keyword_weight: float = 0.4,
    k: int = 60
) -> List[Tuple[str, float]]:
    scores = {}
    for rank, (doc_id, _) in enumerate(vector_results):
        scores[doc_id] = scores.get(doc_id, 0.0)
        scores[doc_id] += vector_weight / (k + rank + 1)
    for rank, (doc_id, _) in enumerate(keyword_results):
        scores[doc_id] = scores.get(doc_id, 0.0)
        scores[doc_id] += keyword_weight / (k + rank + 1)
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)

Database-Native Hybrid Search
Some vector databases handle hybrid search internally, removing the need for a separate keyword index. Weaviate supports hybrid search with a single query parameter (alpha) that controls the balance between BM25 and vector search. Qdrant supports sparse vectors alongside dense vectors, enabling BM25-style matching with the same query interface. If you use these databases, hybrid search is a configuration change rather than an infrastructure addition.
For PostgreSQL with pgvector, you can combine both searches in a single SQL query using a CTE. One caveat: the example below blends raw scores with fixed weights, and ts_rank values are not on the same scale as cosine similarity, so either normalize them (ts_rank accepts a normalization flag; 32 maps scores into [0, 1)) or rewrite the CTE to fuse by rank as in RRF:
WITH vector_results AS (
    SELECT id, 1 - (embedding <=> $1) AS score
    FROM documents
    ORDER BY embedding <=> $1
    LIMIT 20
),
keyword_results AS (
    SELECT id, ts_rank(search_vector, plainto_tsquery('english', $2)) AS score
    FROM documents
    WHERE search_vector @@ plainto_tsquery('english', $2)
    ORDER BY score DESC
    LIMIT 20
),
combined AS (
    SELECT id,
           COALESCE(v.score, 0) * 0.7 + COALESCE(k.score, 0) * 0.3 AS hybrid_score
    FROM vector_results v
    FULL OUTER JOIN keyword_results k USING (id)
)
SELECT id, hybrid_score
FROM combined
ORDER BY hybrid_score DESC
LIMIT 10;

When Not to Bother with Hybrid
Hybrid search adds complexity. If your queries are almost entirely semantic ("explain how authentication works," "what is our deployment process"), keyword search adds minimal value. Test vector-only recall on your actual query distribution before committing to hybrid. If vector-only recall at top-10 is already above 90%, the marginal improvement from hybrid may not justify the additional infrastructure and query latency.
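That comparison can be scripted as a small recall@k harness. This is a sketch: `test_queries` (mapping query text to known-relevant doc IDs) is assumed, and each `search_fn` follows the `(query, top_k)` interface used above; wrap async functions such as the earlier hybrid_search with asyncio.run to fit it.

```python
def recall_at_k(search_fn, test_queries: dict, k: int = 10) -> float:
    """Average fraction of known-relevant documents retrieved in the top k.

    test_queries maps query text -> set of relevant doc_ids.
    search_fn(query, top_k) returns a ranked [(doc_id, score), ...] list.
    """
    recalls = []
    for query, relevant in test_queries.items():
        retrieved = {doc_id for doc_id, _ in search_fn(query, top_k=k)}
        recalls.append(len(retrieved & relevant) / len(relevant))
    return sum(recalls) / len(recalls)
```

Run it once per mode (vector-only, keyword-only, hybrid) on the same query set; if hybrid does not beat the better single mode by a meaningful margin, the extra infrastructure and latency are hard to justify.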
Adaptive Recall goes beyond hybrid search, combining vector similarity with cognitive scoring, knowledge graph traversal, and confidence weighting in a single retrieval call.