How to Implement Online Learning for Retrieval
Before You Start
You need a retrieval system with a parameterized ranking function (weights for similarity, recency, frequency, confidence, or other factors) and a feedback mechanism that provides reward signals quickly enough for near-real-time updates. If feedback is delayed by hours or days (like task completion metrics), batch learning is more appropriate. Online learning works best when feedback is available within seconds to minutes of the retrieval event.
Step-by-Step Implementation
Identify which components of your ranking function can be adjusted incrementally without destabilizing the system. Good candidates are the relative weights of ranking factors (how much weight to give similarity versus recency), per-memory quality scores (boosting or demoting individual memories based on usage), and threshold values (minimum similarity score to include a result).
class OnlineRanker:
    def __init__(self):
        # Ranking weights that can be updated online
        self.weights = {
            "similarity": 0.4,
            "recency": 0.2,
            "frequency": 0.2,
            "confidence": 0.2
        }
        # Per-memory quality adjustments
        self.memory_boosts = {}
        # Learning parameters
        self.lr = 0.01
        self.update_count = 0

    def score(self, memory, query_similarity):
        base = (
            self.weights["similarity"] * query_similarity +
            self.weights["recency"] * memory["recency_score"] +
            self.weights["frequency"] * memory["freq_score"] +
            self.weights["confidence"] * memory["confidence"]
        )
        boost = self.memory_boosts.get(memory["id"], 0.0)
        return base + boost

After each interaction, compute the reward and adjust the parameters that contributed to the outcome. If a retrieval event produced a positive reward, increase the weights of the factors that were dominant in the ranking. If it produced a negative reward, decrease them.
    def update(self, query_similarity, memory, reward):
        self.update_count += 1
        # Gradient direction: which factors contributed
        # most to the ranking score for this memory?
        factors = {
            "similarity": query_similarity,
            "recency": memory["recency_score"],
            "frequency": memory["freq_score"],
            "confidence": memory["confidence"]
        }
        # Update weights proportional to factor contribution
        for factor, value in factors.items():
            gradient = reward * value
            self.weights[factor] += self.lr * gradient
        # Normalize weights to sum to 1.0
        total = sum(self.weights.values())
        if total > 0:
            for k in self.weights:
                self.weights[k] /= total
        # Update per-memory quality boost
        mid = memory["id"]
        current_boost = self.memory_boosts.get(mid, 0.0)
        self.memory_boosts[mid] = current_boost + (
            self.lr * reward * 0.1
        )

The learning rate should decrease over time as the system accumulates more experience. Early on, with few observations, larger updates are appropriate because the system is still exploring the parameter space. As experience accumulates and the parameters approach optimal values, smaller updates prevent overshooting.
    def get_learning_rate(self):
        # Inverse decay schedule: fast early, slow later
        return self.lr / (1 + 0.001 * self.update_count)

    def update_with_schedule(self, query_similarity,
                             memory, reward):
        current_lr = self.get_learning_rate()
        factors = {
            "similarity": query_similarity,
            "recency": memory["recency_score"],
            "frequency": memory["freq_score"],
            "confidence": memory["confidence"]
        }
        for factor, value in factors.items():
            gradient = reward * value
            self.weights[factor] += current_lr * gradient
        total = sum(self.weights.values())
        if total > 0:
            for k in self.weights:
                self.weights[k] /= total
        self.update_count += 1

Constrain each parameter to a valid range to prevent extreme updates from dominating the ranking. No single weight should drop to zero (eliminating a factor entirely) or dominate at 1.0 (making all other factors irrelevant). Clip parameter updates to stay within bounds.
    MIN_WEIGHT = 0.05
    MAX_WEIGHT = 0.70
    MAX_BOOST = 0.5
    MIN_BOOST = -0.3

    def clip_and_normalize(self):
        for k in self.weights:
            self.weights[k] = max(
                self.MIN_WEIGHT,
                min(self.MAX_WEIGHT, self.weights[k])
            )
        total = sum(self.weights.values())
        for k in self.weights:
            self.weights[k] /= total

    def clip_boost(self, memory_id):
        if memory_id in self.memory_boosts:
            self.memory_boosts[memory_id] = max(
                self.MIN_BOOST,
                min(self.MAX_BOOST,
                    self.memory_boosts[memory_id])
            )

User behavior changes over time. A ranking strategy that worked well last month might not work this month because the user's project changed, new content was added, or their preferences evolved. Detect distribution shifts by monitoring the running average reward. If the average drops significantly, increase the learning rate temporarily to allow faster adaptation.
    def detect_shift(self, recent_rewards, window=50):
        if len(recent_rewards) < window * 2:
            return False
        old_avg = sum(recent_rewards[-window*2:-window]) / window
        new_avg = sum(recent_rewards[-window:]) / window
        # A relative-drop test only makes sense against a
        # positive baseline
        if old_avg <= 0:
            return False
        # Significant drop suggests distribution shift
        return new_avg < old_avg * 0.8

    def handle_shift(self):
        # Raise the learning rate to allow faster adaptation,
        # capped so repeated shifts cannot grow it without bound
        self.lr = min(self.lr * 5, 0.05)
        # The schedule will bring it back down over time

Track the magnitude of parameter updates over time. A converging system shows decreasing update magnitudes as parameters stabilize. An oscillating system shows persistent large updates, which indicates the learning rate is too high or the reward signal is too noisy.
Plot the rolling average of update magnitudes alongside the reward trend. Both should stabilize over time. If the reward is increasing but updates are still large, the system is learning but has not converged. If the reward is flat but updates are large, the system is oscillating. If both are stable, the system has converged to an effective policy.
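As a sketch, this monitoring might look like the following. The helper names, the window size, and the 0.01 convergence threshold are illustrative choices, not part of the ranker above:

```python
from collections import deque

# Illustrative logs of per-update weight changes and rewards
update_magnitudes = deque(maxlen=1000)
rewards = deque(maxlen=1000)

def rolling_avg(values, window=50):
    """Mean of the last `window` values (or all, if fewer)."""
    tail = list(values)[-window:]
    return sum(tail) / len(tail) if tail else 0.0

def log_update(old_weights, new_weights, reward):
    """Record the total weight movement and reward for one update."""
    magnitude = sum(
        abs(new_weights[k] - old_weights[k]) for k in old_weights
    )
    update_magnitudes.append(magnitude)
    rewards.append(reward)

def convergence_status(window=50):
    """Classify the learning trajectory from the two rolling averages."""
    mag = rolling_avg(update_magnitudes, window)
    rew = rolling_avg(rewards, window)
    if mag < 0.01:  # updates have become small
        return "converged" if rew > 0 else "stuck"
    return "still learning"
```

Call `log_update` with a snapshot of the weights before and after each `update`, then feed the two rolling averages into whatever plotting tool you already use.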
Online Learning in Adaptive Recall
Adaptive Recall implements online learning natively through ACT-R activation dynamics. Every retrieval event updates the activation history of each involved memory immediately, without waiting for batch processing. The activation equation computes current activation from the complete access history, weighted by recency, which provides natural learning rate decay (recent events have more influence) and non-stationarity handling (old events fade automatically). This behaves like online learning with built-in recency weighting, but it is expressed as a single equation rather than an incremental update procedure.
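For reference, here is a minimal sketch of the base-level activation equation from ACT-R theory, in its standard form with the conventional decay parameter d = 0.5. This illustrates the idea described above, not Adaptive Recall's internal implementation:

```python
import math

def base_level_activation(access_times, now, d=0.5):
    """ACT-R base-level activation: B = ln(sum_j t_j^(-d)),
    where t_j is the time elapsed since access j and d is the
    decay rate. Recent and frequent accesses raise activation;
    old accesses fade automatically under the power-law decay."""
    total = sum((now - t) ** -d for t in access_times if now > t)
    return math.log(total) if total > 0 else float("-inf")
```

A memory accessed recently and often scores higher than one last touched long ago, which is exactly the recency-weighted learning behavior the prose describes.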
Get real-time retrieval improvement without building online learning infrastructure. Adaptive Recall's activation dynamics update with every interaction.
Get Started Free