
How to Build a Feedback Loop for AI Retrieval

A feedback loop captures signals about retrieval quality from user behavior, converts those signals into rewards, and uses the rewards to adjust ranking parameters. Over time, the system learns which ranking strategies produce the best outcomes and shifts toward them automatically. This guide walks through building one from instrumentation through measurement.

Before You Start

You need an existing retrieval system that serves ranked results to users, whether that is a search engine, a RAG pipeline, a memory retrieval system, or any application where the quality of returned results matters. You also need the ability to observe user behavior after results are served: clicks, dwell time, query reformulations, task completions, or other engagement signals.

Step-by-Step Implementation

Step 1: Instrument retrieval events.
Before you can learn from feedback, you need to record what happened. Log every retrieval event with the query, the results served (in order), the scores that determined the ranking, and a unique event ID that links the retrieval to subsequent user behavior.
import uuid
import time
import json

def log_retrieval(query, results, user_id):
    event_id = str(uuid.uuid4())
    event = {
        "event_id": event_id,
        "timestamp": time.time(),
        "user_id": user_id,
        "query": query,
        "results": [
            {
                "memory_id": r["id"],
                "rank": i + 1,
                "score": r["score"],
                "content_preview": r["content"][:100]
            }
            for i, r in enumerate(results)
        ]
    }
    store_event(event)
    return event_id
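For example, the serve path can call log_retrieval immediately after ranking and carry the returned event_id forward so later feedback can be joined back to this event. The retrieve function below is a stand-in for your existing retriever, not part of the code above:

def serve_query(query, user_id):
    # retrieve() is assumed: your existing ranked retriever
    results = retrieve(query, top_k=5)
    event_id = log_retrieval(query, results, user_id)
    # Return the event_id alongside the results so the feedback
    # stage can reference this retrieval
    return results, event_id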
Step 2: Capture implicit feedback signals.
After results are served, track how the user interacts with them. The most useful signals are which results the model actually used in its response (for memory/RAG systems), whether the user asked a follow-up question on the same topic (indicating the answer was insufficient), and whether the user's task was completed (the ultimate quality signal).
def log_feedback(event_id, feedback_type, details):
    feedback = {
        "event_id": event_id,
        "timestamp": time.time(),
        "type": feedback_type,
        "details": details
    }
    store_feedback(feedback)

# After the LLM generates a response, check which memories
# it actually referenced
def detect_memory_usage(response_text, served_memories):
    used = []
    unused = []
    for mem in served_memories:
        # Check if key phrases from the memory appear in the response
        key_terms = extract_key_terms(mem["content"])
        if any(term.lower() in response_text.lower() for term in key_terms):
            used.append(mem["id"])
        else:
            unused.append(mem["id"])
    return used, unused
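The extract_key_terms helper is not defined above. A minimal sketch, assuming simple stopword filtering and a length cutoff are good enough for a first pass (a real system might use TF-IDF or an embedding-based matcher instead):

import re

# Hypothetical helper: naive key-term extraction
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
             "it", "that", "for", "on", "with", "as", "was", "at"}

def extract_key_terms(text, max_terms=10):
    # Keep words of four or more letters that are not stopwords
    tokens = re.findall(r"[A-Za-z]{4,}", text)
    terms = [t for t in tokens if t.lower() not in STOPWORDS]
    # Deduplicate while preserving order, then cap the count
    seen = set()
    unique = []
    for t in terms:
        key = t.lower()
        if key not in seen:
            seen.add(key)
            unique.append(t)
    return unique[:max_terms]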
Step 3: Design the reward function.
Convert the collected feedback signals into a numerical reward for each retrieval event. The reward should be higher when results were useful and lower when they were ignored or caused the user to reformulate. A composite reward combining multiple signals is more robust than any single signal.
def compute_reward(event_id):
    feedbacks = get_feedbacks_for_event(event_id)
    event = get_event(event_id)
    reward = 0.0
    for fb in feedbacks:
        if fb["type"] == "memory_used":
            # Memory was referenced in the model's response
            reward += 1.0
        elif fb["type"] == "memory_ignored":
            # Memory was served but not used
            reward -= 0.3
        elif fb["type"] == "query_reformulated":
            # User had to rephrase; results were insufficient
            reward -= 0.5
        elif fb["type"] == "task_completed":
            # User accomplished their goal
            reward += 2.0
        elif fb["type"] == "explicit_positive":
            # User gave thumbs up or similar
            reward += 1.5
        elif fb["type"] == "explicit_negative":
            # User gave thumbs down
            reward -= 1.5
    # Normalize by number of results served
    num_results = len(event["results"])
    if num_results > 0:
        reward = reward / num_results
    return reward
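As a worked example: if five memories were served, two were used (+1.0 each), three were ignored (-0.3 each), and the user completed the task (+2.0), the raw reward is 2.0 - 0.9 + 2.0 = 3.1, which normalizes to 3.1 / 5 = 0.62.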
Step 4: Build the update mechanism.
Use accumulated rewards to adjust ranking parameters. The simplest approach is to update per-memory quality scores based on whether each memory was useful when served. Memories that consistently produce positive rewards get boosted in future rankings. Memories that consistently produce negative rewards get demoted.
def update_rankings(batch_size=100):
    recent_events = get_recent_events(limit=batch_size)
    for event in recent_events:
        if event.get("processed"):
            continue
        reward = compute_reward(event["event_id"])
        feedbacks = get_feedbacks_for_event(event["event_id"])
        used_ids = [
            fb["details"]["memory_id"]
            for fb in feedbacks
            if fb["type"] == "memory_used"
        ]
        unused_ids = [
            fb["details"]["memory_id"]
            for fb in feedbacks
            if fb["type"] == "memory_ignored"
        ]
        # Boost useful memories
        for mid in used_ids:
            adjust_memory_score(mid, delta=+0.1 * reward)
        # Slightly demote unused memories
        for mid in unused_ids:
            adjust_memory_score(mid, delta=-0.02)
        mark_processed(event["event_id"])
Learning rate matters. The delta values in the update function control how fast the system adapts: too large and it overreacts to noise, too small and it learns too slowly. Start conservative (small deltas) and increase if the system is not adapting fast enough. You can always raise the learning rate later; recovering from a system that has overfit to noisy feedback is harder.
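One possible implementation of adjust_memory_score that makes the learning rate explicit and clamps scores so a run of noisy rewards cannot push any memory to an extreme. The constants and the get/set accessors are assumptions for this sketch:

LEARNING_RATE = 0.05             # assumed conservative starting point
SCORE_MIN, SCORE_MAX = 0.0, 1.0  # assumed bounds on quality scores

def adjust_memory_score(memory_id, delta):
    # get_memory_score / set_memory_score are assumed storage accessors
    score = get_memory_score(memory_id)
    # Scale the raw delta by the learning rate, then clamp
    new_score = max(SCORE_MIN, min(SCORE_MAX, score + LEARNING_RATE * delta))
    set_memory_score(memory_id, new_score)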
Step 5: Add safety guardrails.
The feedback loop must not degrade retrieval quality below a baseline. Add a minimum quality threshold that prevents ranking changes from being applied if they would drop performance. Also add a rollback mechanism that reverts the last batch of updates if quality metrics deteriorate.
def safe_update(batch_size=100):
    # Snapshot current quality metrics
    baseline_mrr = measure_mean_reciprocal_rank()
    # Apply updates
    update_rankings(batch_size)
    # Measure quality after updates
    new_mrr = measure_mean_reciprocal_rank()
    # Roll back if quality dropped significantly
    if new_mrr < baseline_mrr * 0.95:
        rollback_last_updates()
        log_warning("Ranking update rolled back due to quality degradation")
Step 6: Measure and iterate.
Track retrieval quality metrics over time to verify the feedback loop is working. Key metrics include mean reciprocal rank (MRR), recall at k (the percentage of relevant memories that appear in the top k results), and the percentage of served memories that are actually used. Plot these metrics; a working feedback loop shows gradual improvement across all of them.
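A minimal sketch of these three metrics, assuming each event record has been annotated with a used_ids list of the memories that were actually used (that field is an assumption of the sketch, not part of the event schema above):

def mean_reciprocal_rank(events):
    # For each event, take 1/rank of the first result that was used
    reciprocal_ranks = []
    for event in events:
        rr = 0.0
        for result in event["results"]:
            if result["memory_id"] in event.get("used_ids", []):
                rr = 1.0 / result["rank"]
                break
        reciprocal_ranks.append(rr)
    return sum(reciprocal_ranks) / len(reciprocal_ranks) if reciprocal_ranks else 0.0

def recall_at_k(events, k=5):
    # Fraction of used memories that appeared in the top k results
    hits, total = 0, 0
    for event in events:
        used = set(event.get("used_ids", []))
        total += len(used)
        top_k = {r["memory_id"] for r in event["results"] if r["rank"] <= k}
        hits += len(used & top_k)
    return hits / total if total > 0 else 0.0

def usage_rate(events):
    # Fraction of all served memories that were actually used
    used, served = 0, 0
    for event in events:
        served += len(event["results"])
        used += len(event.get("used_ids", []))
    return used / served if served > 0 else 0.0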

Run the feedback loop on a small percentage of traffic initially (5-10%) while measuring quality on both the feedback-enabled and control groups. When the feedback group shows consistent improvement, expand to all traffic. This A/B testing approach provides statistical confidence that the feedback loop is genuinely improving results rather than just changing them.
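A deterministic way to implement the split is to hash each user ID into a stable bucket, so the same user always lands in the same group. The sha256-based bucketing and the 10% rollout below are illustrative choices:

import hashlib

def in_feedback_group(user_id, rollout_pct=10):
    # Hash the user ID into a stable bucket in [0, 100)
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest, 16) % 100 < rollout_pct

# At serve time, only feedback-group traffic drives ranking updates;
# both groups are still logged so their quality metrics can be compared.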

Adaptive Recall's Built-in Feedback

Adaptive Recall implements a feedback loop natively through ACT-R activation dynamics. Every time a memory is retrieved, its activation score updates based on recency and frequency. Memories that are consistently retrieved across different queries gain higher base-level activation and resist decay. Memories that are never retrieved lose activation and eventually fade. The spreading activation component adds a contextual feedback signal: memories connected to frequently activated entities gain activation through the graph, even if they are not directly retrieved.
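For intuition, the base-level learning rule this describes is the standard ACT-R equation B = ln(Σ t_j^(-d)), where t_j is the time since the j-th retrieval and d is a decay parameter (conventionally 0.5). A sketch of that equation, independent of Adaptive Recall's actual internals:

import math
import time

def base_level_activation(retrieval_times, now=None, decay=0.5):
    # Standard ACT-R base-level learning: recent and frequent
    # retrievals raise activation; neglected memories decay toward
    # negative infinity
    now = time.time() if now is None else now
    total = sum((now - t) ** -decay for t in retrieval_times if now > t)
    return math.log(total) if total > 0 else float("-inf")

A memory retrieved 10 and 100 seconds ago scores ln(10^-0.5 + 100^-0.5) ≈ -0.88; retrieve it again and the score jumps, leave it alone and it keeps sinking.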

This means Adaptive Recall improves retrieval quality over time without requiring you to build custom feedback infrastructure. The cognitive scoring system provides the reward signal (activation gain from successful retrieval), the update mechanism (ACT-R activation equations), and the safety guardrails (activation bounds, confidence-based protection) as built-in features.

Get retrieval that improves from every interaction without building feedback infrastructure. Adaptive Recall's cognitive scoring learns from usage automatically.
