How to Add Memory to an Existing Chatbot
Before You Start
You need access to modify your chatbot's context assembly code (the part that builds the prompt or message array sent to the LLM). If you are using a platform like Botpress or Dialogflow that does not expose context assembly directly, check whether the platform supports custom middleware or pre-processing hooks that can inject additional context. You also need a user identification mechanism already in place, because memory is only useful when associated with a specific user. If your chatbot serves anonymous users without any form of identification, you will need to add authentication or persistent cookies before memory will provide value.
Review your current architecture and identify three integration points: where the model's context is assembled (this is where recall goes), where conversation completion events fire (this is where extraction goes), and where user messages are processed (this is where correction detection goes). In most architectures, these are clearly defined functions or API handler stages.
Step-by-Step Implementation
Step 1: Audit Your Current Context Assembly
Before adding memory, document exactly how your chatbot currently assembles context. Draw the data flow: user message comes in, system prompt is loaded, conversation history is appended, RAG retrieval happens (if applicable), tool definitions are added, and the combined context is sent to the LLM. Note the token budget: how much of the context window is used by the system prompt, how much by history, how much by retrieved documents, and how much remains available for new content like recalled memories. If your context is already near capacity, you will need to reduce something else (shorten the system prompt, limit conversation history, reduce RAG chunks) to make room for memory context. A good target is 500 to 1,500 tokens reserved for recalled memories, which is enough for 5 to 15 discrete facts about the user.
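If you want to put numbers on that audit, a rough sketch like the one below can help. It assumes an OpenAI-style tokenizer via the tiktoken library; the audit_budget helper and the 8,192-token window are illustrative, so substitute your provider's tokenizer and your model's actual context size.

# Rough token-budget audit (tiktoken assumed; swap in your provider's tokenizer)
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

def audit_budget(system_prompt, history, rag_chunks, context_window=8192):
    used = {
        "system_prompt": count_tokens(system_prompt),
        "history": sum(count_tokens(m["content"]) for m in history),
        "rag": sum(count_tokens(chunk) for chunk in rag_chunks),
    }
    remaining = context_window - sum(used.values())
    # Aim to keep 500-1,500 tokens of this free for recalled memories
    print(used, "remaining:", remaining)
    return remaining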
Step 2: Connect a Memory Service
Connect your chatbot backend to a memory service. If you are using Adaptive Recall, this means adding the MCP server configuration or the REST API client to your backend. If you are using a self-hosted solution, set up the vector database connection and embedding generation pipeline. The connection should be initialized once at application startup and shared across request handlers. Test the connection independently before integrating it into the chatbot flow: store a test memory, recall it, update it, and delete it to verify all operations work. Wrap all memory operations in try/except blocks with graceful fallback, because a memory service outage should degrade the chatbot to its pre-memory behavior (no memory), not break it entirely.
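A quick standalone check, run before wiring anything into the chatbot, might look like the sketch below. It uses the same AdaptiveRecallClient as the wrapper that follows; only store and recall are shown with the signatures used in this guide, and the update and delete calls depend on your client's actual API.

# One-off smoke test (API_KEY and SERVER_URL are placeholders)
import asyncio

async def smoke_test():
    client = AdaptiveRecallClient(API_KEY, SERVER_URL)
    await client.store(content="smoke-test fact", metadata={"user_id": "smoke-test"})
    hits = await client.recall(query="smoke-test fact",
                               metadata={"user_id": "smoke-test"}, limit=1)
    assert hits, "recall returned nothing - check connectivity and credentials"
    # Also exercise update and delete here, using your client's actual method names.

asyncio.run(smoke_test())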
# Wrapper with graceful fallback
# schedule_health_check and queue_for_later are application-level helpers (not shown)
class MemoryService:
    def __init__(self, api_key, server_url):
        self.client = AdaptiveRecallClient(api_key, server_url)
        self.available = True

    async def recall(self, query, user_id, limit=10):
        if not self.available:
            return []
        try:
            return await self.client.recall(
                query=query,
                metadata={"user_id": user_id},
                limit=limit
            )
        except Exception:
            # Mark the service unavailable, retry later, and fall back to no memories
            self.available = False
            schedule_health_check(self, delay=60)
            return []

    async def store(self, content, user_id):
        if not self.available:
            queue_for_later(content, user_id)
            return
        try:
            await self.client.store(
                content=content,
                metadata={"user_id": user_id}
            )
        except Exception:
            # Never lose a memory to a transient outage; replay the queue once healthy
            queue_for_later(content, user_id)

Step 3: Add Recall to Context Assembly
Find the function or code block where your chatbot builds the LLM's input and add a memory recall step. Use the user's current message as the recall query, filtered to their user ID. Insert the recalled memories into the context between the system prompt and the conversation history, formatted as a structured section the model can reference. Do not mix recalled memories into the conversation history, because the model may confuse them with things the user said in the current conversation. Label the section clearly: "Known context about this user from previous interactions" or similar. Test with a few users first by feature-flagging the memory recall so you can compare responses with and without memory.
# Before (existing code):
async def build_context(system_prompt, user_message, history, user_id):
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history[-10:])
    messages.append({"role": "user", "content": user_message})
    return messages

# After (with memory recall added):
async def build_context(system_prompt, user_message, history, user_id):
    # memory_service is the shared wrapper initialized at startup (step 2)
    memories = await memory_service.recall(user_message, user_id, limit=10)
    memory_block = ""
    if memories:
        memory_block = "\n\n## Known context about this user\n"
        for mem in memories:
            memory_block += f"- {mem['content']}\n"
        memory_block += ("\nUse this context naturally in your response. "
                         "Do not list these facts back to the user.\n")
    messages = [{"role": "system", "content": system_prompt + memory_block}]
    messages.extend(history[-10:])
    messages.append({"role": "user", "content": user_message})
    return messages

Step 4: Extract Memories After Conversations
After a conversation ends (or at periodic intervals during long conversations), run an extraction pass that identifies facts worth storing in long-term memory. Hook this into your existing conversation completion event (session timeout, user closes chat, explicit "goodbye"). Pass the conversation transcript to an extraction function that uses an LLM call to identify discrete, valuable facts and stores each one individually. Filter out conversational filler, greetings, and information the system already knows. Check for duplicates and contradictions against existing memories before storing new ones. In the early days of the retrofit, log extracted memories for manual review before storing them automatically, so you can calibrate the extraction quality.
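As a starting point, a simplified extraction pass might look like the sketch below. It assumes an OpenAI-style async client (llm_client) and reuses the memory_service wrapper from step 2; the prompt, the crude duplicate check, and the log_for_review helper are placeholders you will want to tune or replace.

# Simplified extraction pass, run on conversation completion
EXTRACTION_PROMPT = (
    "Extract discrete, durable facts about the user from this conversation. "
    "Ignore greetings, filler, and anything already covered by the system prompt. "
    "Return one fact per line, or NONE if nothing is worth storing.\n\n{transcript}"
)

async def extract_and_store(transcript: str, user_id: str):
    response = await llm_client.chat.completions.create(      # your existing LLM client
        model="gpt-4o-mini",                                   # any capable extraction model
        messages=[{"role": "user",
                   "content": EXTRACTION_PROMPT.format(transcript=transcript)}],
    )
    text = response.choices[0].message.content.strip()
    if text.upper() == "NONE":
        return
    for line in text.splitlines():
        fact = line.strip("- ").strip()
        if not fact:
            continue
        existing = await memory_service.recall(fact, user_id, limit=3)
        if any(fact.lower() == m["content"].lower() for m in existing):
            continue                                           # crude exact-duplicate check
        log_for_review(fact, user_id)                          # manual review queue (placeholder)
        await memory_service.store(fact, user_id)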
Step 5: Backfill From Historical Conversations
If you have historical conversation logs, you have a goldmine of user knowledge that can be extracted and stored to give memory a head start. Run the extraction pipeline against historical conversations, processing them chronologically per user so that later conversations can contradict and update earlier ones (a user who said "I use React" six months ago and said "we migrated to Vue" last month should have the Vue memory, not React). Process in batches, starting with your most active users (the ones who will benefit most from memory immediately). Set realistic expectations: historical extraction from raw chat logs produces noisier results than real-time extraction from structured conversations, so review a sample of backfilled memories before processing the entire history.
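A minimal backfill loop might look like the sketch below. The conversation_store accessor and format_transcript helper are placeholders for however you read your logs; extract_and_store is the step 4 sketch above.

# Backfill: oldest-first per user, so later conversations can overwrite earlier facts
async def backfill(conversation_store, user_ids):
    for user_id in user_ids:                                   # start with your most active users
        for convo in conversation_store.get_conversations(user_id, order="oldest_first"):
            transcript = format_transcript(convo)              # flatten messages to plain text
            await extract_and_store(transcript, user_id)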
Step 6: Measure and Iterate
Run the memory-enhanced chatbot alongside the original (A/B test if possible) and measure: resolution rate (does memory help the chatbot resolve more queries without escalation), user satisfaction (do users rate memory-enhanced interactions higher), repetition rate (does the chatbot ask fewer redundant questions), and conversation length (shorter conversations often indicate that the chatbot found the right context faster). Common issues in the first iteration: the chatbot surfacing irrelevant memories (tune recall similarity thresholds), outdated memories confusing the model (add recency weighting), or the model over-relying on recalled context and ignoring the user's actual current question (adjust the memory section prompt to instruct contextual use rather than regurgitation). Plan for two to three iteration cycles before the memory integration is production-ready.
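One lightweight way to run the comparison is deterministic cohort assignment by user ID, so a given user consistently gets memory or not. The sketch below assumes hypothetical generate, build_context_without_memory, and log_metrics helpers alongside the build_context function from step 3.

# Deterministic A/B split: hash the user ID into a memory or control cohort
import hashlib

def in_memory_cohort(user_id: str, rollout_pct: int = 50) -> bool:
    return int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100 < rollout_pct

async def handle_message(user_id, user_message, history):
    if in_memory_cohort(user_id):
        messages = await build_context(SYSTEM_PROMPT, user_message, history, user_id)
        cohort = "memory"
    else:
        messages = await build_context_without_memory(SYSTEM_PROMPT, user_message, history)
        cohort = "control"
    reply = await generate(messages)                           # your existing LLM call (placeholder)
    log_metrics(user_id, cohort=cohort, turns=len(history))    # add resolution/satisfaction signals
    return reply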
Framework-Specific Integration
LangChain applications can add memory by inserting a custom retriever into the chain that queries the memory store alongside (or instead of) the document retriever. Replace LangChain's built-in ConversationBufferMemory or ConversationSummaryMemory with a custom memory class that wraps your persistent store. The built-in memory classes are in-process and do not persist across sessions, which is why they are insufficient for production applications. Your custom class should implement the load_memory_variables and save_context methods to integrate with LangChain's chain execution lifecycle.
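A skeleton of such a class might look like the following. It assumes the classic BaseMemory interface (import paths vary across LangChain versions) and hypothetical recall_sync and queue_turn_for_extraction helpers, since classic chains call memory synchronously; treat it as a starting point rather than a drop-in.

# Custom LangChain memory wrapping a persistent store
# (classic BaseMemory interface; import path differs by LangChain version)
from typing import Any, Dict, List
from langchain_core.memory import BaseMemory

class PersistentUserMemory(BaseMemory):
    user_id: str

    @property
    def memory_variables(self) -> List[str]:
        return ["user_memories"]

    def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        query = inputs.get("input", "")
        memories = memory_store.recall_sync(query, self.user_id, limit=10)  # sync wrapper (placeholder)
        return {"user_memories": "\n".join(m["content"] for m in memories)}

    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
        # Queue the turn for the extraction pass rather than storing it verbatim
        queue_turn_for_extraction(inputs.get("input", ""), outputs.get("output", ""), self.user_id)

    def clear(self) -> None:
        pass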
Direct API integrations (no framework) are the easiest to retrofit because you have full control over context assembly. Add the recall call before your API request and the extraction call after, using the patterns shown in steps 3 and 4. The entire integration is typically under 50 lines of code, with most of the complexity in tuning extraction prompts and recall thresholds rather than in the plumbing.
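Under those assumptions, the whole loop can be as small as the sketch below, which reuses build_context from step 3 and extract_and_store from step 4, plus placeholder conversation_ended and format_transcript helpers.

# End-to-end turn handler for a direct API integration (OpenAI-style async client assumed)
async def handle_turn(user_id, user_message, history):
    messages = await build_context(SYSTEM_PROMPT, user_message, history, user_id)  # recall (step 3)
    response = await llm_client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    reply = response.choices[0].message.content
    history.extend([{"role": "user", "content": user_message},
                    {"role": "assistant", "content": reply}])
    if conversation_ended(history):                            # your session/timeout logic (placeholder)
        await extract_and_store(format_transcript(history), user_id)               # extraction (step 4)
    return reply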
Platform-based chatbots (Botpress, Dialogflow, Voiceflow) vary in how extensible their context assembly is. Most support webhooks or middleware that can inject custom context before the model processes a message. Check your platform's documentation for "custom context," "pre-processing hooks," or "middleware" capabilities. If the platform does not expose context assembly at all, you may be able to add memory through a system prompt that instructs the model to call a memory recall tool before responding.
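For platforms that do support a pre-processing webhook, the endpoint can be a thin wrapper around recall. The sketch below uses FastAPI with invented request and response shapes; every platform defines its own payload format, so treat the fields as placeholders and map them to what your platform actually sends.

# Generic pre-processing webhook that returns recalled memories as injectable context
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class MemoryContextRequest(BaseModel):      # placeholder shape; match your platform's payload
    user_id: str
    message: str

@app.post("/memory-context")
async def memory_context(req: MemoryContextRequest):
    memories = await memory_service.recall(req.message, req.user_id, limit=10)
    lines = "\n".join(f"- {m['content']}" for m in memories)
    return {"context": f"Known context about this user:\n{lines}" if lines else ""}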
The fastest retrofit: connect Adaptive Recall as an MCP server and your chatbot gains persistent memory in minutes. No rewrite, no migration, just add the connection and start remembering.
Get Started Free