
How to Store and Retrieve Memories Across Sessions

Cross-session memory requires saving context at the end of each conversation and loading relevant context at the start of the next one. The key challenges are deciding what to save, how to scope retrieval so only relevant memories are loaded, and how to handle memories that become stale or outdated over time.

Before You Start

You need a memory storage backend (vector database, key-value store, or managed memory service) and an application that has clear session boundaries. If your application uses continuous streaming without clear session breaks, you can define artificial boundaries based on time gaps, topic changes, or explicit user actions. The storage backend needs to support both writing new entries and querying by similarity or metadata.
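To make the backend requirements concrete, here is a minimal in-memory stand-in for such a store. It is a sketch, not a production backend: the `MemoryStore` class, its `write`/`query` methods, and the word-overlap scoring are all hypothetical placeholders for a real vector database or managed memory service, which would use embeddings for similarity.

```python
import time

class MemoryStore:
    """Minimal in-memory stand-in for a real memory backend."""

    def __init__(self):
        self.entries = []

    def write(self, user_id, content, metadata=None):
        entry = {
            "user_id": user_id,
            "content": content,
            "metadata": metadata or {},
            "created": time.time(),
        }
        self.entries.append(entry)
        return entry

    def query(self, user_id, text=None, filters=None, limit=5):
        results = [e for e in self.entries if e["user_id"] == user_id]
        if filters:
            results = [
                e for e in results
                if all(e["metadata"].get(k) == v for k, v in filters.items())
            ]
        if text:
            # Naive word-overlap scoring; a real backend ranks by
            # embedding similarity instead.
            words = set(text.lower().split())
            results.sort(
                key=lambda e: -len(words & set(e["content"].lower().split()))
            )
        return results[:limit]
```

Any backend that supports these two operations, writing tagged entries and querying by similarity or metadata, is enough to follow the steps below.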

Step-by-Step Implementation

Step 1: Define session boundaries.
A session is a contiguous interaction that has a beginning and an end. In a chat application, a session might be a single conversation thread. In a coding assistant, a session might be one project editing session. In a customer support bot, a session is one support ticket interaction. Defining these boundaries determines when memory gets saved and when it gets loaded.

For web applications, session boundaries are typically tied to login/logout, page close, or inactivity timeouts. For API-based applications, the boundary is between separate API call sequences. For MCP-connected tools like Claude Code, each project session has natural start and end points. Store the session ID alongside each memory so you can trace where a memory came from.
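For streaming or API-based applications without a natural logout event, an inactivity gap is the simplest artificial boundary. A minimal sketch, assuming message timestamps in seconds and a hypothetical `session_for` helper (the 30-minute threshold is an example, not a recommendation):

```python
import uuid

INACTIVITY_GAP = 30 * 60  # treat 30 minutes of silence as a session break

def session_for(message_time, last_message_time, current_session_id):
    """Return the session ID for an incoming message, starting a
    new session when the inactivity gap is exceeded."""
    if (current_session_id is None
            or message_time - last_message_time > INACTIVITY_GAP):
        return str(uuid.uuid4())  # new session begins here
    return current_session_id     # same session continues
```

The same pattern extends to topic-change boundaries: replace the time comparison with a classifier call that decides whether the new message starts a different topic.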

Step 2: Extract memories at session end.
When a session ends, process the conversation to identify what should persist. Not everything in a conversation is worth storing. The extraction should capture facts, decisions, preferences, and outcomes while filtering out greetings, transient questions, and context that was only relevant within the session.
def on_session_end(session_id, conversation, user_id):
    extraction_prompt = """Review this conversation and extract lasting information.
Focus on what would be useful in a FUTURE conversation:
- Facts about the user, their project, or their situation
- Preferences stated or demonstrated through choices
- Decisions made and their rationale
- Outcomes of actions taken
- Problems encountered and how they were resolved
Exclude: greetings, transient questions, information only
relevant to this specific interaction.
Return each memory on its own line."""
    response = llm_call(extraction_prompt, conversation)
    memories = parse_lines(response)
    for memory_text in memories:
        store_memory(
            user_id=user_id,
            content=memory_text,
            metadata={
                "session_id": session_id,
                "source": "session_extraction",
                "timestamp": time.time()
            }
        )
Step 3: Tag memories with session context.
Each memory needs metadata that helps the retrieval system scope results appropriately. Beyond user_id and timestamp, tag memories with the topic or domain of the conversation, the type of information (fact, preference, decision, outcome), and the session in which they were created.
def store_tagged_memory(user_id, content, session_id, topic=None):
    # Classify the memory type automatically
    classification = llm_call(
        "Classify this memory as one of: fact, preference, "
        "decision, outcome, observation. Reply with one word.",
        content
    ).strip().lower()
    store_memory(
        user_id=user_id,
        content=content,
        metadata={
            "session_id": session_id,
            "memory_type": classification,
            "topic": topic or "general",
            "timestamp": time.time(),
            "access_count": 0
        }
    )
Step 4: Load context at session start.
When a new session begins, retrieve memories that are likely to be relevant. The challenge is that you do not yet know what the user will ask about. Use a combination of strategies: load the most recent memories (likely relevant by recency), load the most frequently accessed memories (likely important), and load any memories tagged with the current topic or project.
def on_session_start(user_id, project_id=None):
    # Strategy 1: Recent memories (last 7 days)
    recent = search_memories(
        user_id=user_id,
        filters={"created_after": days_ago(7)},
        limit=5,
        sort_by="timestamp_desc"
    )
    # Strategy 2: Most accessed memories (core knowledge)
    frequent = search_memories(
        user_id=user_id,
        filters={"min_access_count": 3},
        limit=5,
        sort_by="access_count_desc"
    )
    # Strategy 3: Project-specific memories
    project = []
    if project_id:
        project = search_memories(
            user_id=user_id,
            filters={"topic": project_id},
            limit=5
        )
    # Deduplicate and merge
    all_memories = deduplicate(recent + frequent + project)
    return all_memories[:10]
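The session-start loader leans on two helpers, `days_ago` and `deduplicate`, that any implementation will need. A minimal sketch of both: the dedup here is exact-match on normalized content, whereas production systems often deduplicate by embedding similarity so that paraphrases also collapse.

```python
import time

def days_ago(n):
    """Unix timestamp for n days before now."""
    return time.time() - n * 86400

def deduplicate(memories):
    """Drop repeated memories, keeping the first occurrence
    of each normalized content string."""
    seen, unique = set(), []
    for m in memories:
        key = m["content"].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(m)
    return unique
```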
Step 5: Handle memory freshness.
Memories can become outdated. A user's technology stack changes, decisions get reversed, projects end. Without freshness management, stale memories pollute the context and cause the model to reference outdated information. Update access timestamps when memories are retrieved, and flag memories that have not been accessed in a configurable time window.
def mark_accessed(memory_ids):
    for mid in memory_ids:
        update_memory(mid, {
            "last_accessed": time.time(),
            "access_count": increment(1)
        })

def flag_stale_memories(user_id, stale_days=90):
    stale = search_memories(
        user_id=user_id,
        filters={
            "last_accessed_before": days_ago(stale_days)
        },
        limit=100
    )
    for memory in stale:
        update_memory(memory["id"], {"status": "stale"})
    return len(stale)
Step 6: Build the continuity message.
Format the loaded memories into a context block that the model can use naturally. Group memories by type (facts, preferences, recent events) so the model can prioritize different kinds of context. Include timestamps so the model can weigh recent information more heavily.
def build_context_block(memories):
    if not memories:
        return ""
    groups = {"fact": [], "preference": [], "decision": [],
              "outcome": [], "observation": []}
    for m in memories:
        mtype = m["metadata"].get("memory_type", "observation")
        groups.get(mtype, groups["observation"]).append(m)
    lines = ["\n--- Context from previous sessions ---"]
    for group_name, items in groups.items():
        if items:
            lines.append(f"\n{group_name.title()}s:")
            for item in items:
                age = format_age(item["metadata"]["timestamp"])
                lines.append(f"  - ({age}) {item['content']}")
    return "\n".join(lines)

Adaptive Retrieval

Once the session gets underway and you know what the user is discussing, run additional retrieval queries scoped to the conversation topic. This dynamic retrieval supplements the initial context load with memories that are specifically relevant to the current discussion. Trigger additional retrieval when the user introduces a new topic, asks about something not covered by the initial context, or references a previous interaction explicitly.
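A minimal sketch of this supplementary retrieval, assuming a search function that accepts a free-text `query` (unlike the filter-only calls earlier; `maybe_retrieve_more` and its parameters are hypothetical). It skips memories already loaded into context so repeats do not crowd out new material:

```python
def maybe_retrieve_more(user_message, loaded_ids, search_fn, user_id, limit=3):
    """Run a retrieval scoped to the latest user message, excluding
    memories that are already in the context window."""
    # Over-fetch so that filtering out loaded memories still
    # leaves up to `limit` fresh results.
    hits = search_fn(
        user_id=user_id,
        query=user_message,
        limit=limit + len(loaded_ids)
    )
    fresh = [h for h in hits if h["id"] not in loaded_ids]
    return fresh[:limit]
```

A simple trigger policy is to call this on every user turn and append only non-empty results, keeping the context growth proportional to how often genuinely new topics appear.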

Adaptive Recall handles this automatically through its cognitive retrieval system. Each recall query uses ACT-R scoring to find memories that are not just semantically similar but also contextually relevant (via entity graph connections), recently active, and well corroborated. The system learns which memories are useful over time by tracking access patterns and adjusting activation scores accordingly.
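The exact scoring Adaptive Recall uses is internal, but the ACT-R base-level activation it builds on is well documented: a memory's activation is the log of summed, power-decayed access recencies, so frequently and recently accessed memories score higher. A minimal sketch (the blending weights and `score` helper are illustrative assumptions, not the product's formula):

```python
import math

def base_level_activation(access_times, now, decay=0.5):
    """ACT-R base-level activation: ln(sum of age^-d over past
    accesses), where d is the decay rate."""
    ages = [max(now - t, 1e-6) for t in access_times]
    return math.log(sum(age ** -decay for age in ages))

def score(memory, similarity, now, w_sim=1.0, w_act=1.0):
    # Blend semantic similarity with frequency/recency-driven activation.
    return w_sim * similarity + w_act * base_level_activation(
        memory["access_times"], now)
```

The useful property for memory retrieval is that activation rewards both frequency (more terms in the sum) and recency (smaller ages decay less), which is exactly the access-pattern signal described above.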

Build cross-session memory that improves with every conversation. Adaptive Recall manages storage, retrieval, freshness, and lifecycle automatically.

Get Started Free