
How to Build a Fact-Checking Layer for AI

A fact-checking layer is a post-generation pipeline that automatically verifies claims in LLM output against trusted sources before the response reaches users. It extracts individual factual assertions from the generated text, retrieves evidence for or against each claim, classifies them as supported, refuted, or unverifiable, and modifies the response accordingly. Building one requires a truth source to verify against, a claim extraction mechanism, and a verdict classifier.
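The steps below build each of these stages in turn. As a roadmap, here is a minimal sketch of how the stages compose; the four stage functions are passed in as placeholders that Steps 2 through 5 flesh out:

from typing import Callable, Iterable

def fact_check(
    response: str,
    extract_claims: Callable[[str], Iterable[str]],  # Step 2
    retrieve_evidence: Callable[[str], list],        # Step 3
    classify_verdict: Callable[[str, list], str],    # Step 4
    apply_verdicts: Callable[[str, dict], str],      # Step 5
) -> str:
    """Run a generated response through all four pipeline stages."""
    verdicts = {
        claim: classify_verdict(claim, retrieve_evidence(claim))
        for claim in extract_claims(response)
    }
    return apply_verdicts(response, verdicts)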

Before You Start

A fact-checking layer is only as good as the truth sources it verifies against. Before building the pipeline, you need to identify what your system should treat as authoritative. For an enterprise application, this might be your product database, internal documentation, CRM records, and verified customer data. For a coding assistant, this might be the actual codebase, dependency manifests, and project documentation. For a general knowledge application, this might be a curated knowledge base, a knowledge graph, or a set of verified documents. The truth source does not need to be comprehensive, but it needs to be reliable within its coverage area.

You also need to decide on your error budget: what percentage of unverified claims is acceptable for your application. A medical information system might require 100% verification with human review for anything the automated system cannot verify. A casual assistant might accept 80% verification coverage and let the remaining claims pass with a disclaimer. This decision shapes every design choice in the pipeline.

Step-by-Step Implementation

Step 1: Define and index your truth sources.
Your truth sources need to be searchable, which means indexing them for both semantic search and, ideally, entity-based lookup. Persistent memory systems like Adaptive Recall are natural truth sources because they already store verified facts with confidence scores, entity links, and timestamps. If you are working with document collections, chunk and embed them for vector search. If you have structured data (databases, APIs, configuration files), build a query interface that the fact-checker can call programmatically. The goal is that given any factual claim, the system can quickly find the most relevant evidence from your truth sources.
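As a concrete starting point, here is a minimal sketch of chunking, embedding, and searching a document collection. It assumes the sentence-transformers package and an illustrative model name; a production system would typically swap the in-memory NumPy index for a vector database.

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

def chunk(text, size=500):
    # Naive fixed-size chunking; production systems usually split on structure
    return [text[i:i + size] for i in range(0, len(text), size)]

def build_index(documents):
    chunks = [c for doc in documents for c in chunk(doc)]
    embeddings = model.encode(chunks, normalize_embeddings=True)
    return chunks, embeddings

def search(query, chunks, embeddings, top_k=3):
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = embeddings @ q  # cosine similarity on normalized vectors
    return [chunks[i] for i in np.argsort(scores)[::-1][:top_k]]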
Step 2: Build the claim extractor.
The claim extractor takes the LLM's response and produces a list of atomic factual claims. Each claim should be a single, self-contained assertion that is either true or false. "The API rate limit is 100 requests per second" is a good atomic claim. "The API has generous rate limits and handles errors well" contains two claims (generous rate limits, handles errors well) and subjective language (generous) that makes it harder to verify. Use an LLM for extraction with a prompt that specifies what counts as a factual claim and what to exclude. Test the extractor on sample responses to tune the extraction prompt until it reliably separates verifiable facts from opinions, recommendations, and filler.
CLAIM_EXTRACT = """Extract every verifiable factual claim from the text below. Output one claim per line. Rules: - Each claim must be a single, self-contained assertion - Include specific facts: names, numbers, dates, versions - Exclude opinions, recommendations, and hedged statements - Exclude claims that are clearly the model's own reasoning - If a sentence contains multiple facts, split them Text: {response} Factual claims:"""
Step 3: Implement evidence retrieval for each claim.
For each extracted claim, query your truth sources for supporting or contradicting evidence. Use the claim text as a search query against your indexed truth sources. Retrieve the top 3 to 5 most relevant passages or records. If your truth sources include a knowledge graph, also query for any entities mentioned in the claim and check whether the asserted relationships match the graph. The output of this step is a set of evidence items for each claim, which the verdict classifier will use in the next step.
def retrieve_evidence(claim, truth_sources):
    """Gather evidence for one claim from every configured truth source."""
    # truth_sources bundles the backends built in Step 1
    evidence = []

    # Semantic search against document index
    docs = truth_sources.vector_index.search(claim, top_k=3)
    evidence.extend([{"type": "document", "content": d.text, "score": d.similarity}
                     for d in docs])

    # Entity lookup in knowledge graph
    entities = extract_entities(claim)  # NER helper over the claim text
    for entity in entities:
        facts = truth_sources.knowledge_graph.query(entity)
        evidence.extend([{"type": "graph", "content": f.text, "score": 1.0}
                         for f in facts])

    # Memory store lookup
    memories = truth_sources.memory_store.recall(query=claim, limit=3)
    evidence.extend([{"type": "memory", "content": m.content, "confidence": m.confidence}
                     for m in memories])

    return evidence
Step 4: Build the verdict classifier.
The verdict classifier takes each claim paired with its evidence and produces a verdict: supported (the evidence confirms the claim), refuted (the evidence contradicts the claim), or unverifiable (no evidence found, or the evidence is ambiguous). You can implement this with a natural language inference (NLI) model, which classifies whether the evidence entails the claim, contradicts it, or is neutral. Alternatively, use an LLM as a judge with a structured prompt that asks it to compare the claim against each evidence item and produce a verdict with an explanation. The LLM-as-judge approach is more flexible and produces human-readable explanations, while the NLI approach is faster and cheaper.
VERDICT_PROMPT = """Given the following claim and evidence, determine if the claim is SUPPORTED, REFUTED, or UNVERIFIABLE. Claim: {claim} Evidence: {evidence} Rules: - SUPPORTED: Evidence directly confirms the claim - REFUTED: Evidence directly contradicts the claim - UNVERIFIABLE: No relevant evidence, or evidence is ambiguous Respond with exactly one of: SUPPORTED, REFUTED, UNVERIFIABLE Then explain your reasoning in one sentence. Verdict:"""
Step 5: Apply verdicts to modify the response.
With verdicts for every claim, modify the original response based on your application's risk tolerance. For supported claims, no changes needed. For refuted claims, either remove the incorrect statement entirely, replace it with the correct information from the evidence, or flag it with a visible warning. For unverifiable claims, you can add a qualifier ("this has not been verified against available records"), leave them unchanged with a general disclaimer, or remove them in high-stakes applications. The goal is a response where every factual statement is either verified or clearly marked as unverified.
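A deliberately simple policy sketch follows. It assumes extracted claims appear verbatim in the response, which only holds if your extractor preserves wording; real systems should track character offsets during extraction rather than string-match.

def apply_verdicts(response, verdicts):
    for claim, verdict in verdicts.items():
        if verdict == "REFUTED":
            response = response.replace(
                claim, "[removed: contradicted by verified records]")
        elif verdict == "UNVERIFIABLE":
            response = response.replace(
                claim, claim + " (not verified against available records)")
    return response  # SUPPORTED claims pass through unchanged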
Step 6: Close the feedback loop by storing verification results.
Store the results of each verification run in your memory system. When a claim is verified as correct, store it as a confirmed fact with high confidence, making it available as grounding context for future responses. When a claim is refuted, store the correction so the system has explicit knowledge of the right answer. When the system consistently hallucinates about a specific topic, the accumulated corrections create a strong grounding context that prevents the same hallucination from recurring. This feedback loop means the fact-checking layer does not just catch errors but actively reduces the rate of errors over time.
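A sketch of the storage step. The store() call is a hypothetical interface mirroring the recall() call from Step 3; adapt it to whatever write API your memory system actually exposes.

def store_verification_results(results, memory_store):
    # results: iterable of (claim, verdict, evidence) tuples from Steps 2-4
    for claim, verdict, evidence in results:
        if verdict == "SUPPORTED":
            # Confirmed facts become grounding context for future responses
            memory_store.store(content=claim, confidence=0.95,
                               tags=["verified-claim"])  # hypothetical API
        elif verdict == "REFUTED" and evidence:
            # Store the correction, not the hallucination
            memory_store.store(content=evidence[0]["content"],
                               confidence=0.95, tags=["correction"])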

Handling Ambiguous Verdicts

Not every claim falls neatly into supported, refuted, or unverifiable. Partially supported claims, where the evidence supports part of the assertion but not all of it, are common. "The API supports 100 concurrent connections" might be partially supported if the evidence says "the API supports configurable connection limits, with a default of 50." The claim is not entirely wrong, but the specific number is. Your verdict classifier should handle partial support explicitly, either as a separate category or by splitting the claim into its component parts and verifying each independently.
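One way to implement the splitting approach is a second prompt that decomposes a partially supported claim, after which each part is re-run through Steps 3 and 4. A sketch:

SPLIT_PROMPT = """The claim below was only partially supported by evidence.
Split it into its smallest independently verifiable parts, one per line.

Claim: {claim}

Parts:"""

For the connection-limit example, this yields one claim about configurable connection limits (supported) and one about the specific number 100 (refuted), giving you a precise correction instead of a muddled verdict.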

Temporal ambiguity is another common issue. A claim might have been true historically but is no longer accurate, or might be true in one context but not another. Using confidence-scored memories with timestamps helps here, because the fact-checker can identify when a claim matches outdated information and flag it as "previously true but may be outdated" rather than simply "supported."
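With timestamped evidence, a small post-processing check can downgrade stale support. The 180-day window and the timestamp field name are assumptions to tune for your domain.

from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=180)  # assumption: tune per domain

def adjust_for_staleness(verdict, evidence):
    # Assumes evidence timestamps are timezone-aware datetime objects
    if verdict != "SUPPORTED":
        return verdict
    timestamps = [e["timestamp"] for e in evidence if "timestamp" in e]
    if timestamps and max(timestamps) < datetime.now(timezone.utc) - STALE_AFTER:
        return "SUPPORTED_BUT_POSSIBLY_OUTDATED"
    return verdict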

Build AI that verifies before it answers. Adaptive Recall provides the memory infrastructure, knowledge graph, and confidence scoring that powers reliable automated fact-checking.

Get Started Free