
How to Combine Graph Search with Vector Search

Combining graph search with vector search means running both retrieval methods on every query and merging their results into a single ranked list. Vector search finds content that is semantically similar to the query. Graph search finds content that is structurally connected to entities in the query. The combination catches results that either method alone would miss, improving recall from roughly 78% (vector only) to 91% (combined) on multi-hop queries while maintaining vector search's strong performance on simple queries.

Why Combine Rather Than Choose

Vector search and graph search have complementary strengths. Vector search excels at finding documents that discuss the same topic using similar language, even when the exact terms differ. It handles paraphrasing, synonyms, and conceptual similarity naturally because the embedding model encodes meaning rather than keywords. But vector search fails when the relevant information shares no vocabulary or semantic overlap with the query.

Graph search excels at finding information connected through entity relationships, regardless of vocabulary overlap. When a user asks about "deployment failures" and the root cause is documented under "certificate expiration in the load balancer," graph search follows the chain from "deployment" to "infrastructure" to "load balancer" to "certificate management." But graph search only works if the entities and relationships have been extracted and stored, and it misses topically relevant content that is not connected through explicit entities.

A combined system uses vector search for broad semantic recall and graph search for precise structural connectivity. Documents found by both methods receive the highest scores. Documents found by only one method still appear, but ranked lower. The result is a retrieval system that handles both simple and complex queries well.

Step-by-Step Implementation

Step 1: Set up parallel retrieval paths.
Configure your pipeline to run vector search and graph traversal concurrently. The user query enters the system and is processed in parallel: one path embeds the query and searches the vector database, the other path extracts entities and traverses the knowledge graph. Both paths return ranked result sets. Running them in parallel keeps latency close to whichever method is slower rather than adding their latencies together.
import asyncio

async def hybrid_retrieve(query, top_k=10):
    # run vector and graph retrieval concurrently
    vector_task = asyncio.create_task(
        vector_search(query, top_k=top_k * 2)
    )
    graph_task = asyncio.create_task(
        graph_search(query, max_depth=2, max_results=top_k * 2)
    )
    vector_results, graph_results = await asyncio.gather(
        vector_task, graph_task
    )
    return fuse(vector_results, graph_results, top_k=top_k)
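Note that this snippet assumes vector_search and graph_search are async coroutines. If your vector or graph clients are synchronous, wrap the calls with asyncio.to_thread (for example, asyncio.create_task(asyncio.to_thread(vector_search, query, top_k=top_k * 2))) so the two paths still run concurrently.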
Step 2: Normalize scores from each source.
Vector similarity scores and graph activation scores use different scales. Cosine similarity ranges from -1 to 1 (practically 0 to 1 for most content). Graph activation scores depend on your decay function and could range from 0 to any positive number. Normalize both to a 0 to 1 range using min-max normalization within each result set. This ensures that a high vector score and a high graph score have equal influence on the final ranking.
def normalize_scores(results, score_key="score"):
    if not results:
        return results
    scores = [r[score_key] for r in results]
    min_s = min(scores)
    max_s = max(scores)
    if max_s == min_s:
        # All scores are identical; avoid dividing by zero and treat
        # every result as equally strong.
        for r in results:
            r["normalized"] = 1.0
        return results
    for r in results:
        r["normalized"] = (r[score_key] - min_s) / (max_s - min_s)
    return results
Step 3: Choose a fusion strategy.
Three fusion strategies work well in practice, each with different tradeoffs.

Weighted sum is the simplest approach. Compute a final score as alpha * vector_score + (1 - alpha) * graph_score where alpha controls the balance. Start with alpha = 0.6 (vector-favored) and tune from there. This works well when both sources produce calibrated scores and you want smooth control over the balance.

def weighted_sum_fusion(vector_results, graph_results, alpha=0.6, top_k=10):
    # Normalize each source to 0-1, then index scores by document ID.
    v_norm = {r["id"]: r["normalized"] for r in normalize_scores(vector_results)}
    g_norm = {r["id"]: r["normalized"] for r in normalize_scores(graph_results)}
    all_ids = set(v_norm.keys()) | set(g_norm.keys())
    fused = []
    for doc_id in all_ids:
        # Documents missing from one source contribute 0 for that signal.
        v_score = v_norm.get(doc_id, 0.0)
        g_score = g_norm.get(doc_id, 0.0)
        final = alpha * v_score + (1 - alpha) * g_score
        fused.append({"id": doc_id, "score": final, "vector": v_score, "graph": g_score})
    return sorted(fused, key=lambda x: -x["score"])[:top_k]

Reciprocal Rank Fusion (RRF) combines rank positions rather than scores. Each result's contribution is 1 / (k + rank), where rank is the result's 1-based position in its list and k is a smoothing constant (typically 60). RRF is robust to score scale differences because it only uses rank order, not score magnitudes. It tends to favor results that rank well in both lists.

def rrf_fusion(vector_results, graph_results, k=60, top_k=10):
    rrf_scores = {}
    # enumerate is 0-based, so rank + 1 gives the 1-based rank position.
    for rank, r in enumerate(vector_results):
        rrf_scores[r["id"]] = rrf_scores.get(r["id"], 0.0) + 1.0 / (k + rank + 1)
    for rank, r in enumerate(graph_results):
        rrf_scores[r["id"]] = rrf_scores.get(r["id"], 0.0) + 1.0 / (k + rank + 1)
    fused = [{"id": doc_id, "score": score} for doc_id, score in rrf_scores.items()]
    return sorted(fused, key=lambda x: -x["score"])[:top_k]

Cascading retrieval uses graph search as a reranker rather than a parallel source. First, retrieve a broad set from vector search (top 50). Then, for each result, check whether it is connected to query entities in the graph and boost its score if so. This approach adds minimal latency because the graph is only consulted for the already-retrieved results rather than performing an independent traversal.
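A minimal sketch of this cascade, assuming hypothetical helpers extract_entities(query) and graph_neighbors(entity, max_depth) that return the document IDs connected to an entity; the boost value is an illustrative starting point, not a tuned constant.

def cascade_rerank(query, vector_results, boost=0.25, top_k=10):
    # Collect document IDs reachable from the query's entities.
    connected_ids = set()
    for entity in extract_entities(query):  # hypothetical helper
        connected_ids |= graph_neighbors(entity, max_depth=2)  # hypothetical helper
    # Boost vector hits that the graph also connects to the query.
    reranked = []
    for r in normalize_scores(vector_results):
        score = r["normalized"] + (boost if r["id"] in connected_ids else 0.0)
        reranked.append({"id": r["id"], "score": score})
    return sorted(reranked, key=lambda x: -x["score"])[:top_k]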

Step 4: Implement deduplication.
When the same document appears in both result sets, merge the entries rather than creating duplicates. The merged entry should carry both scores so the fusion formula can use them. Documents appearing in both sets are strong candidates for relevance (both semantic and structural signals agree), so they naturally rise to the top of the fused ranking.
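As a sketch, the merge can be a single pass that indexes by document ID, assuming each result carries an id field and a normalized score as produced by normalize_scores above.

def merge_result_sets(vector_results, graph_results):
    # Index by document ID so each document appears exactly once,
    # carrying both signals for the fusion formula.
    merged = {}
    for r in vector_results:
        merged[r["id"]] = {"id": r["id"], "vector": r["normalized"], "graph": 0.0}
    for r in graph_results:
        entry = merged.setdefault(r["id"], {"id": r["id"], "vector": 0.0, "graph": 0.0})
        entry["graph"] = r["normalized"]
    return list(merged.values())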
Step 5: Tune weights with evaluation data.
Build a test set of 100 queries categorized by type: simple lookups, multi-hop reasoning, entity-specific, and broad exploratory. For each query, annotate the expected relevant documents. Run the hybrid system with different alpha values (0.4, 0.5, 0.6, 0.7, 0.8) and measure recall at k=5 and k=10 for each query category. The optimal alpha usually differs by query type, so consider adaptive weighting that detects entity-heavy queries and shifts alpha toward graph search for those.
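A sketch of the sweep, assuming a labeled test set of (query, relevant_ids, category) tuples, the async retrieval functions from Step 1, and the weighted_sum_fusion function above; recall_at_k is an illustrative metric helper.

from collections import defaultdict

def recall_at_k(ranked_ids, relevant_ids, k):
    # Fraction of the annotated relevant documents found in the top k.
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids) if relevant_ids else 0.0

async def sweep_alpha(test_set, alphas=(0.4, 0.5, 0.6, 0.7, 0.8), k=10):
    by_config = defaultdict(list)  # (alpha, category) -> recall values
    for query, relevant_ids, category in test_set:
        vector_results = await vector_search(query, top_k=20)
        graph_results = await graph_search(query, max_depth=2, max_results=20)
        for alpha in alphas:
            fused = weighted_sum_fusion(vector_results, graph_results, alpha=alpha, top_k=k)
            by_config[(alpha, category)].append(
                recall_at_k([r["id"] for r in fused], relevant_ids, k)
            )
    for (alpha, category), values in sorted(by_config.items()):
        print(f"alpha={alpha} {category}: recall@{k} = {sum(values) / len(values):.3f}")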

Adaptive Recall's Approach

Adaptive Recall combines vector similarity and graph traversal in its recall tool using a spreading activation model derived from the ACT-R cognitive architecture. The cognitive scoring formula naturally fuses vector similarity (how semantically close the memory is to the query), base-level activation (recency and frequency), spreading activation (graph connectivity), and confidence weighting (how well-corroborated the memory is). This multi-factor scoring replaces manual alpha tuning with a single integrated formula that adapts to each query's characteristics.
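The exact formula is internal to Adaptive Recall, but a loose sketch of ACT-R-style multi-factor scoring might look like the following; every factor definition, weight, and helper here (including cosine) is an assumption for illustration, not the product's implementation.

import math

def cognitive_score(memory, query_embedding, now):
    # Illustrative only: these factor definitions are assumed, not
    # Adaptive Recall's actual formula.
    similarity = cosine(query_embedding, memory["embedding"])  # hypothetical helper
    # ACT-R base-level activation: ln of power-law decay summed over past
    # access times (decay rate d is commonly ~0.5; access times must be
    # strictly earlier than now).
    base_level = math.log(sum((now - t) ** -0.5 for t in memory["access_times"]))
    spreading = sum(memory["link_strengths"])  # activation from connected entities
    return memory["confidence"] * (similarity + base_level + spreading)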

Get hybrid retrieval without building the fusion logic. Adaptive Recall's cognitive scoring combines vector similarity and graph traversal in every recall.

Try It Free