
How to Keep a Knowledge Graph Updated Over Time

A knowledge graph that is not maintained becomes a liability rather than an asset. Stale relationships lead to wrong answers, deprecated entities confuse retrieval, and missing connections create blind spots. Keeping a graph accurate requires change detection on source documents, incremental re-extraction, diffing new triples against existing data, confidence-based update logic, and periodic full validation. This guide covers each step.

Why Graphs Go Stale

Knowledge graphs go stale because the reality they model changes faster than the graph is updated. A team migrates from MySQL to PostgreSQL, but the graph still says "order service uses MySQL." A developer leaves the company, but the graph still shows them as the maintainer of three services. An API endpoint is deprecated, but the graph still connects applications to it. Each stale triple produces retrieval results that are technically connected to the query but factually wrong.

The rate of staleness depends on your domain. Infrastructure graphs change weekly as services are deployed, scaled, and reconfigured. Personnel graphs change monthly as people join, leave, and switch teams. Conceptual graphs (technology comparisons, architectural patterns) change slowly but can shift dramatically when a new version launches or a technology is deprecated. Understanding the change velocity of your domain tells you how often to update.
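One way to act on change velocity is an explicit per-domain refresh policy. This is a minimal sketch; the domain names and intervals below are illustrative assumptions, not recommendations:

```python
from datetime import timedelta

# illustrative mapping from change velocity to refresh interval
REFRESH_POLICY = {
    "infrastructure": timedelta(days=1),   # weekly churn: check daily
    "personnel":      timedelta(weeks=1),  # monthly churn: check weekly
    "conceptual":     timedelta(weeks=4),  # slow drift: monthly is enough
}

def is_due(domain, last_refreshed, now):
    # a domain is due for re-extraction once its interval has elapsed
    return now - last_refreshed >= REFRESH_POLICY[domain]
```

Keeping the policy in one place makes it easy to tighten an interval when a domain starts changing faster than expected.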

Step-by-Step Maintenance

Step 1: Set up change detection.
Monitor the sources that feed your knowledge graph for changes. If your graph was built from documentation, watch for file modifications using filesystem events, git webhooks, or polling. If your graph was built from API responses, schedule periodic re-fetches and compare against cached versions. If your graph was built from conversation logs or support tickets, monitor the stream for new entries that mention known entities.
```python
import hashlib
import json

class ChangeDetector:
    """Tracks content hashes per document to detect changes between runs."""

    def __init__(self, state_file="graph_state.json"):
        self.state_file = state_file
        self.state = self._load_state()

    def _load_state(self):
        try:
            with open(self.state_file) as f:
                return json.load(f)
        except FileNotFoundError:
            # first run: no saved state yet
            return {}

    def has_changed(self, doc_id, content):
        # compare the current content hash against the last one we saw
        content_hash = hashlib.sha256(content.encode()).hexdigest()
        previous = self.state.get(doc_id)
        if previous != content_hash:
            self.state[doc_id] = content_hash
            return True
        return False

    def save(self):
        # persist hashes so the next run only sees genuine changes
        with open(self.state_file, "w") as f:
            json.dump(self.state, f)
```
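To see the hash check in isolation, here is a standalone rehearsal of the same logic, with state kept in a plain dict instead of a file:

```python
import hashlib

state = {}

def has_changed(doc_id, content):
    # same idea as the class-based detector, without persistence
    h = hashlib.sha256(content.encode()).hexdigest()
    if state.get(doc_id) != h:
        state[doc_id] = h
        return True
    return False

print(has_changed("readme", "v1"))  # True: first sighting
print(has_changed("readme", "v1"))  # False: content unchanged
print(has_changed("readme", "v2"))  # True: content edited
```

Note that the hash is updated as a side effect of the check, so a change is reported exactly once per edit.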
Step 2: Implement incremental extraction.
When a source document changes, re-extract entities and relationships from that document only. Do not reprocess the entire corpus for every change. Tag the new extractions with the source document ID and a timestamp so you can track provenance. This keeps extraction costs proportional to the volume of changes rather than the total corpus size.
```python
def incremental_update(changed_docs, graph_db):
    for doc_id, content in changed_docs:
        # extract entities and relationships from the changed document only
        chunks = chunk_text(content)
        new_entities = []
        new_triples = []
        for chunk in chunks:
            extraction = extract_from_chunk(chunk)
            new_entities.extend(extraction["entities"])
            new_triples.extend(extraction["relationships"])

        # diff against existing graph data for this document
        existing = graph_db.get_triples_by_source(doc_id)
        changes = diff_triples(existing, new_triples)

        # apply changes
        apply_graph_changes(graph_db, changes, doc_id)
```
Step 3: Diff new extractions against the existing graph.
Compare newly extracted triples with the triples currently in the graph from the same source document. Classify each triple as: new (exists in extraction but not in graph), unchanged (exists in both with same subject, predicate, object), modified (same subject and object but different predicate, or same subject and predicate but different object), or deleted (exists in graph but not in new extraction). Each classification drives a different update action.
```python
def diff_triples(existing, extracted):
    existing_set = {(t["subject"], t["predicate"], t["object"]) for t in existing}
    extracted_set = {(t["subject"], t["predicate"], t["object"]) for t in extracted}
    # note: a modified triple surfaces here as one "deleted" plus one "new" entry
    return {
        "new": extracted_set - existing_set,
        "deleted": existing_set - extracted_set,
        "unchanged": existing_set & extracted_set,
    }
```
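A quick worked example makes the classification concrete. The function is repeated so the snippet runs standalone; a database migration shows up as one deleted plus one new triple:

```python
def diff_triples(existing, extracted):
    existing_set = {(t["subject"], t["predicate"], t["object"]) for t in existing}
    extracted_set = {(t["subject"], t["predicate"], t["object"]) for t in extracted}
    return {
        "new": extracted_set - existing_set,
        "deleted": existing_set - extracted_set,
        "unchanged": existing_set & extracted_set,
    }

existing = [
    {"subject": "order-service", "predicate": "uses", "object": "MySQL"},
    {"subject": "order-service", "predicate": "deployed_on", "object": "k8s"},
]
extracted = [
    {"subject": "order-service", "predicate": "uses", "object": "PostgreSQL"},
    {"subject": "order-service", "predicate": "deployed_on", "object": "k8s"},
]
changes = diff_triples(existing, extracted)
print(changes["new"])        # {('order-service', 'uses', 'PostgreSQL')}
print(changes["deleted"])    # {('order-service', 'uses', 'MySQL')}
print(changes["unchanged"])  # {('order-service', 'deployed_on', 'k8s')}
```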
Step 4: Apply confidence-based updates.
Do not blindly overwrite existing triples with new extractions. Use confidence scores to determine the appropriate action. New triples from a single extraction start at moderate confidence (0.6 to 0.7). Triples that appear in multiple extractions or from multiple sources accumulate confidence. Triples that are "deleted" (not found in re-extraction) do not get removed immediately. Instead, reduce their confidence by a fixed amount (0.1 to 0.2). Only remove triples when confidence drops below a threshold (0.3). This prevents extraction noise from destabilizing the graph.
```python
from datetime import datetime

def apply_graph_changes(graph_db, changes, source_doc):
    # new triples enter at moderate confidence
    for s, p, o in changes["new"]:
        graph_db.upsert_triple(s, p, o, confidence=0.7,
                               source=source_doc, updated=datetime.now())

    # missing triples decay; archive only once they fall below the threshold
    for s, p, o in changes["deleted"]:
        current = graph_db.get_triple(s, p, o)
        if current:
            new_conf = current["confidence"] - 0.15
            if new_conf < 0.3:
                graph_db.archive_triple(s, p, o)
            else:
                graph_db.update_confidence(s, p, o, new_conf)

    # re-confirmed triples gain confidence, capped at 0.95
    for s, p, o in changes["unchanged"]:
        current = graph_db.get_triple(s, p, o)
        if current and current["confidence"] < 0.95:
            graph_db.update_confidence(
                s, p, o, min(current["confidence"] + 0.05, 0.95))
```
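To see how those constants interact, trace a triple that stops appearing in re-extractions: starting at 0.7 and losing 0.15 per cycle, it survives three maintenance cycles before crossing the 0.3 archive threshold.

```python
# confidence trajectory for a triple that is never re-confirmed
conf = 0.7
history = []
while conf >= 0.3:
    history.append(round(conf, 2))
    conf -= 0.15
print(history)  # [0.7, 0.55, 0.4] -- the next cycle (0.25) triggers archival
```

Transient extraction noise therefore has to persist across several cycles before it can remove anything from the graph.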
Step 5: Handle contradictions.
When a new extraction says "checkout service uses Braintree" but the graph says "checkout service uses Stripe," you have a contradiction. Do not silently overwrite. Instead, keep both triples with a contradiction flag and reduced confidence on the older triple. Log the contradiction for review. In many cases, both may be true (the service migrated, or uses both), and only a human can determine the correct resolution.
```python
def handle_contradiction(graph_db, new_triple, existing_triple):
    # reduce confidence on the existing triple rather than overwriting it
    graph_db.update_confidence(
        existing_triple["subject"],
        existing_triple["predicate"],
        existing_triple["object"],
        existing_triple["confidence"] * 0.7,
    )
    # add the new triple at moderate confidence, flagged as contradicting
    graph_db.upsert_triple(
        new_triple["subject"],
        new_triple["predicate"],
        new_triple["object"],
        confidence=0.6,
        contradicts=existing_triple["id"],
    )
    # log for review
    log_contradiction(existing_triple, new_triple)
```
Step 6: Schedule maintenance cycles.
Incremental updates catch changes as they happen, but drift accumulates from sources that change without triggering detection. Schedule a full re-extraction and validation cycle weekly or monthly depending on your domain's change velocity. The full cycle reprocesses all source documents, compares the complete extracted graph against the current graph, and generates a report of discrepancies. This catches the changes that incremental updates miss.
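The full cycle can reuse the same set-diff idea at corpus scale. This sketch assumes two dicts keyed by document ID: `all_sources` (freshly re-extracted triples) and `graph_triples` (what the graph currently holds); both names are illustrative:

```python
def full_validation(all_sources, graph_triples):
    # compare a complete re-extraction against the current graph and
    # report per-document discrepancies for review
    report = {}
    for doc_id, extracted in all_sources.items():
        existing = graph_triples.get(doc_id, [])
        existing_set = {(t["subject"], t["predicate"], t["object"]) for t in existing}
        extracted_set = {(t["subject"], t["predicate"], t["object"]) for t in extracted}
        missing = extracted_set - existing_set  # in sources, absent from graph
        stale = existing_set - extracted_set    # in graph, gone from sources
        if missing or stale:
            report[doc_id] = {"missing": missing, "stale": stale}
    return report
```

The report can feed the same confidence-based update logic as the incremental path, or go to a human reviewer before anything is changed.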

Automated Maintenance with Adaptive Recall

Adaptive Recall handles graph maintenance as part of its memory consolidation process. When memories are consolidated (merged, updated, or archived), the entities and relationships associated with those memories are re-evaluated. New entities are added. Entities whose source memories have been archived have their confidence reduced. Contradictions between memories propagate to the graph as reduced confidence on conflicting triples. This keeps the graph aligned with the current state of the memory system without requiring a separate maintenance pipeline.
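Adaptive Recall's internals are not shown here, but the propagation it describes can be sketched roughly as follows. Every name in this snippet (the function, the memory fields, the `graph_db` methods) is a hypothetical stand-in, not the product's actual API:

```python
def propagate_consolidation(graph_db, consolidated_memories):
    # hypothetical sketch: push consolidation outcomes into the graph
    for memory in consolidated_memories:
        for t in memory["triples"]:
            s, p, o = t["subject"], t["predicate"], t["object"]
            if memory["status"] == "archived":
                # source memory archived: decay the triple's confidence
                current = graph_db.get_triple(s, p, o)
                if current:
                    graph_db.update_confidence(s, p, o, current["confidence"] * 0.8)
            else:
                # active memory: (re)assert the triple
                graph_db.upsert_triple(s, p, o, confidence=0.7, source=memory["id"])
```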

Let your knowledge graph maintain itself. Adaptive Recall updates entity connections automatically during memory consolidation.

Get Started Free