How Large Can a Knowledge Graph Get Before It Slows Down
Performance by Storage Engine
| Storage | Sweet spot | Slow point | Two-hop latency at sweet spot |
|---|---|---|---|
| NetworkX (in-memory) | Under 100K nodes | Memory limit (~1M nodes) | Sub-millisecond |
| PostgreSQL triples | Under 500K triples | 1M+ triples with deep CTEs | 5-50ms |
| Neo4j Community | Under 10M nodes | 10M+ (single instance limit) | 1-10ms |
| Neo4j Enterprise | Under 100M nodes | Depends on cluster size | 1-10ms |
What Actually Causes Slowdowns
High-Degree Nodes (Supernodes)
The most common performance issue is not total graph size but supernodes: entities with thousands of connections. If "PostgreSQL" is connected to 5,000 other entities in your graph (every service that uses it, every developer who has touched it, every configuration document that mentions it), any traversal that passes through PostgreSQL must process all 5,000 connections. At depth 2, this fans out to potentially millions of nodes.
The solution is degree-limited traversal. Instead of following all connections from a supernode, sort connections by relevance (relationship strength, confidence, recency) and follow only the top 50 or 100. This caps the fan-out at each hop and keeps traversal time predictable regardless of node degree. Spreading activation does this naturally because the decay factor reduces activation at each hop, and you only follow connections above a minimum activation threshold.
Deep Traversals
Each additional hop multiplies the number of nodes visited by the average degree of the graph. In a graph where nodes have an average of 10 connections, a one-hop traversal visits 10 nodes, a two-hop traversal visits up to 100, and a three-hop traversal visits up to 1,000. At four or five hops, you are visiting tens of thousands of nodes per query, which is slow in any storage engine.
For AI retrieval, two hops is the practical maximum. Direct connections (one hop) are strongly related to the query entity. Two-hop connections are indirectly related through one intermediary. Three-hop connections are so distant that the relevance signal is weak and the noise is high. Limiting traversal depth to two keeps query performance consistent and retrieval quality high.
Missing Indexes
In relational triple stores, missing indexes are the most common cause of poor performance. Without indexes on the subject, predicate, and object columns, every traversal requires a full table scan. With proper indexes, the database can look up an entity's connections in microseconds. A composite index on (subject, predicate) makes the most common query pattern ("find everything this entity is related to by this relationship type") fast. An index on object enables reverse traversal ("find everything related to this entity").
Scaling Strategies
Partition by domain. If your knowledge graph covers multiple distinct domains (engineering, sales, support), partition the graph so that traversals stay within a domain. Cross-domain connections still exist but are traversed less frequently, reducing the effective size of most queries.
Precompute common traversals. If certain queries are frequent (the top 100 entities queried, the most common relationship patterns), precompute and cache the traversal results. Update the cache when the underlying graph changes. This eliminates traversal latency for hot queries entirely.
Use spreading activation with thresholds. Instead of visiting every reachable node, only visit nodes whose accumulated activation exceeds a minimum threshold. This naturally prunes low-relevance branches and keeps the number of visited nodes bounded regardless of graph density.
Adaptive Recall uses spreading activation with adaptive thresholds for graph traversal. The traversal radius adjusts based on the density of the local graph neighborhood: sparse areas spread further, dense areas spread narrower. This keeps traversal time consistent across graphs of any size and density.
Graph traversal that stays fast at any scale. Adaptive Recall's spreading activation adapts to graph density automatically.
Try It Free