How to Build a Memory-Powered Recommendation Layer
Before You Start
Memory-powered recommendations work best when you already have a persistent memory layer storing user interactions. If your application stores memories through Adaptive Recall or a similar system, you already have the data foundation. If you are starting from scratch, you need at minimum a way to store and retrieve user preference memories, a content catalog or action set to recommend from, and a way to display recommendations in your application.
This approach differs from traditional recommendation engines in an important way: it does not require large user populations for collaborative filtering. Memory-based recommendations work on a per-user basis, drawing from that individual's stored history and preferences. This makes them effective even for applications with small user bases or highly individualized use cases where users have little behavioral overlap.
Step-by-Step Implementation
Start by identifying every point in your application where a recommendation could add value. Common surfaces include session start (suggest what to work on based on recent activity), mid-conversation (suggest related topics or next steps), content discovery (surface articles, documentation, or features the user has not explored), and proactive suggestions (alert the user to relevant new content based on their interests).
Each recommendation surface has different constraints. Session start recommendations need to load fast because they are in the critical path. Mid-conversation recommendations need to be contextually relevant to the current discussion. Content discovery can be less time-sensitive but needs to balance novelty with relevance. Define the latency budget, the number of recommendations, and the content type for each surface before building the pipeline.
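One way to make these per-surface constraints explicit is a small configuration object the pipeline can read. The surface names, budgets, and content types below are illustrative assumptions, not part of any API; tune them for your application.

```javascript
// Illustrative surface definitions; values are assumptions to tune.
const SURFACES = {
  sessionStart: {
    latencyBudgetMs: 200,   // in the critical path: pre-compute or cache
    maxRecommendations: 3,
    contentType: 'task'
  },
  midConversation: {
    latencyBudgetMs: 500,   // must stay contextually relevant
    maxRecommendations: 2,
    contentType: 'topic'
  },
  contentDiscovery: {
    latencyBudgetMs: 2000,  // less time-sensitive; balance novelty with relevance
    maxRecommendations: 5,
    contentType: 'article'
  }
};
```

Defining these up front lets the rest of the pipeline stay surface-agnostic: candidate generation and ranking take a surface definition as input rather than hard-coding limits.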
The user memory profile is the queryable representation of everything the system knows about a user. It combines explicit preferences (stated by the user), implicit preferences (inferred from behavior), interaction history (what the user has engaged with), and entity connections (topics, tools, and concepts the user is associated with in the knowledge graph).
Structure the profile as a set of weighted topics and preferences rather than a flat list. Each topic carries a weight based on how frequently and recently the user has engaged with it. This weighted structure allows the recommendation engine to prioritize topics the user is actively interested in over topics they explored once months ago.
```javascript
async function buildUserProfile(userId) {
  // Retrieve all user memories, sorted by activation
  const memories = await recall({
    userId: userId,
    limit: 100,
    sort: 'activation'
  });

  // Extract weighted topic profile
  const topicWeights = {};
  for (const memory of memories) {
    for (const entity of memory.entities || []) {
      const weight = memory.confidence * memory.activation;
      topicWeights[entity] = (topicWeights[entity] || 0) + weight;
    }
  }

  // Normalize weights to 0-1 range (guard against an empty profile)
  const maxWeight = Math.max(...Object.values(topicWeights), 0);
  if (maxWeight > 0) {
    for (const topic in topicWeights) {
      topicWeights[topic] /= maxWeight;
    }
  }

  return {
    userId,
    topics: topicWeights,
    preferences: memories.filter(m => m.type === 'preference'),
    recentActivity: memories.slice(0, 10),
    negativePreferences: memories.filter(m => m.category === 'negative')
  };
}
```

Candidate generation produces a broad set of potentially relevant recommendations. Use the user profile to query your content catalog from multiple angles: semantic similarity to the user's top topics, entity overlap with the user's knowledge graph connections, popularity among similar users (if you have collaborative data), and recency of content creation (to surface new material).
Generate more candidates than you will ultimately show. If you plan to display five recommendations, generate fifty candidates and let the ranking step select the best five. This over-generation ensures that the final recommendations are high quality even when individual candidate sources have mixed relevance.
```javascript
async function generateCandidates(userProfile, surface) {
  const candidates = [];

  // Semantic similarity to user's top topics
  const topTopics = Object.entries(userProfile.topics)
    .sort((a, b) => b[1] - a[1])
    .slice(0, 10);
  for (const [topic, weight] of topTopics) {
    const matches = await searchContent({
      query: topic,
      limit: 10,
      exclude: userProfile.recentActivity.map(m => m.id)
    });
    candidates.push(...matches.map(m => ({
      ...m,
      source: 'topic_match',
      sourceWeight: weight
    })));
  }

  // Entity graph neighbors: content connected to user's entities
  const graphNeighbors = await graphQuery({
    startEntities: Object.keys(userProfile.topics).slice(0, 5),
    depth: 2,
    contentType: surface.contentType,
    limit: 20
  });
  candidates.push(...graphNeighbors.map(n => ({
    ...n,
    source: 'graph_traversal',
    sourceWeight: 0.8
  })));

  // Deduplicate by content ID
  return deduplicateById(candidates);
}
```

Rank the candidates using the same cognitive scoring principles that power memory retrieval. Each candidate gets a composite score based on relevance (semantic similarity to the user's current context and stored interests), recency (how recently the user engaged with related topics), frequency (how often the user has engaged with similar content), confidence (how reliable the preference signals supporting this recommendation are), and entity connection strength (how closely the content connects to the user's knowledge graph).
```javascript
function scoreCandidates(candidates, userProfile, currentContext) {
  return candidates.map(candidate => {
    // Base relevance from semantic similarity
    let score = candidate.similarity || 0.5;

    // Boost from user topic weights
    for (const entity of candidate.entities || []) {
      if (userProfile.topics[entity]) {
        score += userProfile.topics[entity] * 0.3;
      }
    }

    // Recency boost for topics the user engaged with recently
    const recentEntities = new Set(
      userProfile.recentActivity.flatMap(m => m.entities || [])
    );
    const recentOverlap = (candidate.entities || [])
      .filter(e => recentEntities.has(e)).length;
    score += recentOverlap * 0.15;

    // Context relevance boost
    if (currentContext) {
      const contextSimilarity = cosineSimilarity(
        candidate.embedding,
        currentContext.embedding
      );
      score += contextSimilarity * 0.25;
    }

    // Negative preference penalty, applied at most once so multiple
    // matches do not compound into a de facto exclusion
    for (const neg of userProfile.negativePreferences) {
      if (candidate.content.toLowerCase().includes(neg.value.toLowerCase())) {
        score *= 0.1; // heavy penalty, not full exclusion
        break;
      }
    }

    return { ...candidate, score };
  }).sort((a, b) => b.score - a.score);
}
```

Track how users respond to recommendations. The three signals to capture are engagement (user clicked, expanded, or acted on the recommendation), dismissal (user explicitly removed or rejected the recommendation), and ignore (user saw the recommendation but did not interact with it). Each signal feeds back into the memory system to improve future recommendations.
Engagement signals should strengthen the entity connections and topic weights that led to the recommendation. Dismissal signals should weaken them and potentially create negative preferences. Ignore signals are ambiguous (the user might not have noticed, or might not have been interested) and should produce small, gradual weight adjustments rather than strong signals.
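A minimal sketch of this feedback loop is below. `reinforceEntities` and `storeMemory` are hypothetical memory-layer calls standing in for whatever write API your system exposes; the delta values are assumptions to tune.

```javascript
// Feedback deltas are assumptions: engagement strengthens, dismissal
// weakens, and ignores nudge weights only slightly.
const FEEDBACK_DELTAS = {
  engaged: 0.2,
  dismissed: -0.3,
  ignored: -0.02
};

async function recordFeedback(userId, recommendation, signal) {
  const delta = FEEDBACK_DELTAS[signal];
  if (delta === undefined) throw new Error(`unknown signal: ${signal}`);

  // Adjust the entity connections that produced this recommendation.
  // `reinforceEntities` is a hypothetical memory-layer call.
  await reinforceEntities({
    userId,
    entities: recommendation.entities || [],
    delta
  });

  // Repeated dismissals graduate into an explicit negative preference.
  if (signal === 'dismissed' && (recommendation.dismissCount || 0) >= 2) {
    await storeMemory({
      userId,
      type: 'preference',
      category: 'negative',
      value: recommendation.topic
    });
  }
}
```

The asymmetry is deliberate: dismissal carries more weight than engagement because it is a rarer, more intentional act, while ignores stay small because they are ambiguous.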
Pure relevance-based recommendations create filter bubbles: the system only shows content similar to what the user has already engaged with, narrowing the user's exposure over time. Add diversity by reserving a percentage of recommendation slots (typically 10-20%) for exploration candidates that are outside the user's established interests but adjacent enough to be potentially interesting.
Exploration candidates come from the outer edges of the user's knowledge graph: entities that are two or three hops away from their core interests, or content categories they have not explored but that are popular among users with similar profiles. The explore-exploit tradeoff from reinforcement learning applies directly here: allocate enough exploration to discover new interests without overwhelming the user with irrelevant suggestions.
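Slot reservation can be sketched as a simple mixing step over two already-ranked lists: the exploit list from the scoring function and an exploration list drawn from the graph's outer edges. The function and ratio below are illustrative, not a prescribed API.

```javascript
// Mix exploitation and exploration candidates into one slate.
// exploreRatio of 0.15 sits inside the 10-20% range suggested above.
function mixRecommendations(ranked, exploration, slots, exploreRatio = 0.15) {
  const exploreSlots = Math.max(1, Math.round(slots * exploreRatio));
  const exploitSlots = slots - exploreSlots;
  const shown = new Set();
  const slate = [];

  // Fill the exploit slots from the top of the ranked list
  for (const c of ranked) {
    if (slate.length >= exploitSlots) break;
    if (!shown.has(c.id)) { shown.add(c.id); slate.push(c); }
  }
  // Fill the reserved slots with exploration candidates, flagged for tracking
  for (const c of exploration) {
    if (slate.length >= slots) break;
    if (!shown.has(c.id)) { shown.add(c.id); slate.push({ ...c, explore: true }); }
  }
  // Backfill from the ranked list if exploration ran short
  for (const c of ranked) {
    if (slate.length >= slots) break;
    if (!shown.has(c.id)) { shown.add(c.id); slate.push(c); }
  }
  return slate;
}
```

Flagging exploration items (`explore: true`) matters for the feedback loop: engagement with an exploration candidate is a stronger discovery signal than engagement with something the system already knew the user liked.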
Performance Considerations
Recommendation latency matters most at session start and in mid-conversation surfaces. For session start, pre-compute recommendations when the user logs in or when their profile changes significantly, and cache the results. For mid-conversation, keep the candidate pool warm and only re-rank when the conversation context changes. Adaptive Recall's cognitive scoring is designed for low-latency retrieval, but candidate generation from a large content catalog may need its own caching layer.
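For the session-start case, even a minimal TTL cache keeps the pre-computed slate inside the latency budget. This sketch assumes the full pipeline above is wrapped in a `computeRecommendations` function; the cache shape and TTL are illustrative.

```javascript
// Minimal per-user TTL cache for pre-computed session-start recommendations.
const recCache = new Map();
const TTL_MS = 10 * 60 * 1000; // 10 minutes; tune to how fast profiles drift

async function getSessionStartRecs(userId, computeRecommendations) {
  const entry = recCache.get(userId);
  if (entry && Date.now() - entry.at < TTL_MS) {
    return entry.recs; // cache hit: no pipeline work in the critical path
  }
  const recs = await computeRecommendations(userId);
  recCache.set(userId, { recs, at: Date.now() });
  return recs;
}

// Invalidate when the profile changes significantly,
// e.g. after a strong new preference or a burst of dismissals.
function invalidateRecs(userId) {
  recCache.delete(userId);
}
```

Explicit invalidation on significant profile changes is what keeps a TTL this long safe: routine requests hit the cache, while meaningful preference shifts force a recompute on the next session start.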
Build recommendations powered by memory that learns. Adaptive Recall provides the storage, cognitive scoring, and knowledge graph that your recommendation layer needs.
Start Building Free