
Can You Build a Knowledge Graph Without Neo4j?

Yes. For graphs under 100,000 entities, a simple triples table in PostgreSQL with recursive CTEs handles traversal well. For prototyping, Python's NetworkX library stores the graph in memory. For production without managing any database, managed services like Adaptive Recall include graph capabilities built in. Neo4j is the best choice when you need native graph performance on large graphs or when Cypher query expressiveness saves significant development time, but it is not required for most AI knowledge graph applications.

PostgreSQL Triple Table

The simplest knowledge graph implementation is a table with three columns: subject, predicate, and object. Add metadata columns for confidence, source, and created_at. This approach uses your existing database, requires no additional infrastructure, and handles graphs up to several hundred thousand triples with good query performance when properly indexed.

CREATE TABLE triples (
    id SERIAL PRIMARY KEY,
    subject TEXT NOT NULL,
    predicate TEXT NOT NULL,
    object TEXT NOT NULL,
    confidence REAL DEFAULT 0.8,
    source TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_subject ON triples(subject);
CREATE INDEX idx_object ON triples(object);
CREATE INDEX idx_predicate ON triples(predicate);
CREATE INDEX idx_sub_pred ON triples(subject, predicate);
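As a usage sketch against that schema (the sample entities and source values are illustrative), inserting a few triples and looking up one-hop neighbors:

```sql
INSERT INTO triples (subject, predicate, object, confidence, source) VALUES
    ('order_service', 'uses',          'PostgreSQL',    0.95, 'architecture_doc'),
    ('order_service', 'depends_on',    'Redis',         0.90, 'architecture_doc'),
    ('order_service', 'maintained_by', 'payments_team', 0.85, 'org_chart');

-- one-hop neighbors: a single lookup served by idx_subject
SELECT predicate, object, confidence
FROM triples
WHERE subject = 'order_service';
```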

Multi-hop traversal uses recursive CTEs. To find everything within two hops of a starting entity:

WITH RECURSIVE graph_walk AS (
    SELECT object AS entity, 1 AS depth
    FROM triples
    WHERE subject = 'order_service'

    UNION ALL

    SELECT t.object, gw.depth + 1
    FROM triples t
    JOIN graph_walk gw ON t.subject = gw.entity
    WHERE gw.depth < 2
)
SELECT entity, MIN(depth) AS distance
FROM graph_walk
GROUP BY entity
ORDER BY distance;

The limitation is performance. Recursive CTEs on a large triple table are slower than Neo4j's native graph traversal because PostgreSQL must join the table against itself at each hop. For graphs under 100,000 triples, the difference is unnoticeable (both return in milliseconds). For larger graphs or deep traversals (3+ hops), Neo4j's index-free adjacency model is significantly faster.

Python In-Memory (NetworkX)

For prototyping and small applications, Python's NetworkX library stores the entire graph in memory as a dictionary of adjacency lists. No database needed. Graph operations (neighbors, shortest path, connected components) are built-in methods. The graph loads instantly and traversals execute in microseconds.

import networkx as nx
from collections import deque

G = nx.DiGraph()

# add triples as edges; predicate and confidence live in edge attributes
G.add_edge("order_service", "PostgreSQL", predicate="uses", confidence=0.95)
G.add_edge("order_service", "Redis", predicate="depends_on", confidence=0.9)
G.add_edge("order_service", "payments_team", predicate="maintained_by", confidence=0.85)

# one-hop neighbors
neighbors = list(G.successors("order_service"))

# two-hop traversal (breadth-first, so each node keeps its shortest depth)
def traverse(graph, start, max_depth=2):
    visited = {}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth > max_depth or node in visited:
            continue
        visited[node] = depth
        for neighbor in graph.successors(node):
            queue.append((neighbor, depth + 1))
    return visited

The limitation is scale. NetworkX stores everything in memory, so a graph with 1 million nodes and 5 million edges uses several gigabytes of RAM. Persistence requires serializing to disk (pickle, JSON, or GraphML format) and loading on startup. For production applications with graphs larger than a few hundred thousand nodes, a persistent database is more appropriate.
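Persistence is a save/load step rather than a live database. A minimal sketch of round-tripping the graph through GraphML (the file path is illustrative):

```python
import os
import tempfile

import networkx as nx

G = nx.DiGraph()
G.add_edge("order_service", "PostgreSQL", predicate="uses", confidence=0.95)

# serialize on shutdown, reload on startup; GraphML keeps edge attributes
path = os.path.join(tempfile.gettempdir(), "kg.graphml")
nx.write_graphml(G, path)
G2 = nx.read_graphml(path)

print(G2.number_of_edges())                            # 1
print(G2["order_service"]["PostgreSQL"]["predicate"])  # uses
```

Pickle is faster to load for pure-Python use; GraphML is portable to other tools.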

SQLite for Embedded Applications

SQLite works well for applications that need graph storage without a server. The triple table approach is identical to PostgreSQL but runs as an embedded library. This is particularly useful for desktop applications, mobile apps, or edge deployments where running a database server is not practical. SQLite handles millions of rows efficiently and recursive CTEs work identically to PostgreSQL.
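A minimal sketch using Python's built-in sqlite3 module, with an in-memory database and the same recursive CTE shape as the PostgreSQL example (entities are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # embedded: no server process
conn.executescript("""
CREATE TABLE triples (
    subject TEXT NOT NULL,
    predicate TEXT NOT NULL,
    object TEXT NOT NULL,
    confidence REAL DEFAULT 0.8
);
CREATE INDEX idx_subject ON triples(subject);
""")
conn.executemany(
    "INSERT INTO triples (subject, predicate, object, confidence) VALUES (?, ?, ?, ?)",
    [
        ("order_service", "uses", "PostgreSQL", 0.95),
        ("PostgreSQL", "runs_on", "db_host_1", 0.9),
    ],
)

# two-hop walk, identical in shape to the PostgreSQL recursive CTE
rows = conn.execute("""
WITH RECURSIVE graph_walk(entity, depth) AS (
    SELECT object, 1 FROM triples WHERE subject = 'order_service'
    UNION ALL
    SELECT t.object, gw.depth + 1
    FROM triples t JOIN graph_walk gw ON t.subject = gw.entity
    WHERE gw.depth < 2
)
SELECT entity, MIN(depth) AS distance FROM graph_walk GROUP BY entity ORDER BY distance
""").fetchall()
print(rows)  # [('PostgreSQL', 1), ('db_host_1', 2)]
```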

Other Alternatives

Amazon Neptune is a managed graph database that supports both property graph (with Gremlin) and RDF (with SPARQL). It eliminates operational burden but costs more than self-hosted options and locks you into AWS.

Apache AGE is a PostgreSQL extension that adds native graph capabilities, including Cypher support, to PostgreSQL. This gives you Neo4j-like query syntax and traversal performance without a separate database. It is a strong option if you are already using PostgreSQL and want graph capabilities without managing Neo4j.
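A sketch of what AGE usage looks like via its cypher() function, once the extension is installed (the graph name and labels here are illustrative):

```sql
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

SELECT create_graph('kg');

-- Cypher embedded in a SQL query
SELECT * FROM cypher('kg', $$
    CREATE (:Service {name: 'order_service'})-[:USES]->(:Database {name: 'PostgreSQL'})
$$) AS (result agtype);

SELECT * FROM cypher('kg', $$
    MATCH (:Service {name: 'order_service'})-[:USES]->(d)
    RETURN d.name
$$) AS (name agtype);
```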

Memgraph is an in-memory graph database compatible with Cypher. It is faster than Neo4j for real-time queries but requires enough RAM to hold the entire graph in memory.

When Neo4j Is Actually Necessary

Neo4j earns its place when your graph exceeds 500,000 nodes and you need sub-millisecond traversal, when you need Cypher's query expressiveness for complex traversal patterns (variable-length paths, pattern matching, aggregation over graph structures), or when your team already knows the Neo4j ecosystem and would spend more time building equivalent functionality in PostgreSQL than deploying Neo4j.

For most AI retrieval applications, the graph is a supporting data structure rather than the primary database. The graph augments vector search results with entity connectivity, which requires simple neighbor lookups and two-hop traversals. These operations are fast in any storage engine, so the choice between PostgreSQL, NetworkX, and Neo4j is driven by operational preference rather than performance requirements.
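As a sketch of that augmentation pattern (the adjacency data and function are hypothetical, with the graph reduced to a plain adjacency dict for illustration):

```python
# hypothetical adjacency data standing in for any of the storage options above
adjacency = {
    "order_service": ["PostgreSQL", "Redis", "payments_team"],
    "PostgreSQL": ["db_host_1"],
}

def expand_hits(hits, adjacency, max_extra=5):
    """Augment vector-search hits with directly connected entities."""
    expanded = list(hits)
    for hit in hits:
        for neighbor in adjacency.get(hit, []):
            if neighbor not in expanded and len(expanded) < len(hits) + max_extra:
                expanded.append(neighbor)
    return expanded

print(expand_hits(["order_service"], adjacency))
# ['order_service', 'PostgreSQL', 'Redis', 'payments_team']
```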

Adaptive Recall includes a managed knowledge graph as part of its memory system, eliminating the storage choice entirely. Entities and relationships are stored, maintained, and traversed as part of the memory storage and retrieval workflow. You get the retrieval benefits of a knowledge graph without choosing, deploying, or maintaining any graph database.

Skip the graph database decision. Adaptive Recall includes a managed knowledge graph with every memory system.
