How to Add Long-Term Memory to an AI Agent
Why Agents Need More Than Conversation History
Conversation history captures what was said between the user and the agent in a single session. It is a transcript, not a knowledge base. When an agent runs for 30 minutes investigating a production issue, the conversation history contains hundreds of tool calls, intermediate reasoning steps, dead ends, and corrections. Retrieving useful information from that history requires reading the entire thing and extracting the few relevant facts from pages of noise.
Long-term memory is selective. The agent stores conclusions, not reasoning steps. It stores facts about entities, not the process of discovering those facts. It stores outcomes of actions, not the full action sequence. This selectivity means that when the agent retrieves memories in a future session, it gets high-signal information rather than a raw log that it has to re-process.
The practical difference is stark. An agent without long-term memory that encounters a recurring issue (say, a flaky test in the CI pipeline) investigates it fresh every time, spending 5 to 10 minutes rediscovering that the test is flaky and should be retried. An agent with long-term memory retrieves its previous observation ("test X in module Y is flaky, retry once before reporting failure") and handles it in seconds.
Step-by-Step Implementation
The memory backend determines how memories are stored and retrieved. The three main options are: a dedicated memory API (like Adaptive Recall) that handles storage, retrieval, scoring, and lifecycle management; a vector database (like Pinecone or pgvector) where you manage embeddings and retrieval logic yourself; or a hybrid approach using a relational database for structured state plus a vector store for semantic search. A dedicated memory API is the fastest path to production because it handles embedding, retrieval ranking, and memory lifecycle out of the box.
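The hybrid approach (option C) can be sketched as a relational table for exact, structured lookups plus a vector store for fuzzy semantic recall. This is a minimal illustration, not a fixed design: the table name, columns, and the `vector_store` interface (`store`/`recall`) are all assumptions for the sketch.

```python
import sqlite3

# Hybrid backend sketch: exact, structured state lives in SQLite;
# fuzzy "what do I know about X" queries are delegated to a vector
# store. Table name and columns here are illustrative.
class HybridMemory:
    def __init__(self, db_path, vector_store):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS facts ("
            "  entity TEXT, attribute TEXT, value TEXT,"
            "  observed_at TEXT, PRIMARY KEY (entity, attribute))"
        )
        self.vectors = vector_store  # any object exposing store()

    def remember_fact(self, entity, attribute, value, observed_at):
        # Upsert so the latest observation of a fact wins
        self.db.execute(
            "INSERT OR REPLACE INTO facts VALUES (?, ?, ?, ?)",
            (entity, attribute, value, observed_at),
        )
        self.db.commit()
        # Also index a prose rendering for semantic recall
        self.vectors.store(f"{entity} {attribute}: {value}",
                           {"entity": entity, "observed_at": observed_at})

    def lookup(self, entity, attribute):
        row = self.db.execute(
            "SELECT value FROM facts WHERE entity = ? AND attribute = ?",
            (entity, attribute),
        ).fetchone()
        return row[0] if row else None
```

The structured half answers precise questions ("what is Service X's pool limit?") without an embedding round-trip; the vector half handles open-ended recall.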
# Option A: Dedicated memory API via MCP
# Agent automatically gets store/recall tools

# Option B: Vector database (you manage embeddings)
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()
pc = Pinecone(api_key="your-key")
index = pc.Index("agent-memory")

def embed(text):
    # Shared helper: both storage and recall use the same model
    return openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    ).data[0].embedding

def store_memory(text, metadata):
    index.upsert([(metadata["id"], embed(text), metadata)])

def recall_memories(query, top_k=5):
    return index.query(vector=embed(query), top_k=top_k,
                       include_metadata=True)

Structure what agents store so that memories are useful for retrieval rather than raw text dumps. Each memory should include: the observation or fact (what the agent learned), the context (what task it was performing when it learned this), the confidence level (how certain the agent is), and temporal metadata (when this was observed, whether it has an expiration). A well-structured memory schema prevents the accumulation of vague, low-value memories that dilute retrieval quality.
# Memory schema for agent observations
memory = {
    "content": "Service X connection pool maxes out at 50 "
               "concurrent connections during peak load. "
               "Increasing to 100 resolved the 503 errors.",
    "category": "infrastructure",
    "confidence": 0.9,
    "source": "incident-investigation-2026-05-10",
    "tags": ["service-x", "connection-pool", "capacity"],
    "temporal_scope": "valid as of 2026-05-10"
}

Instrument the agent to store memories at three points in its execution: after significant tool calls (when it discovers a fact about the environment), at decision points (when it chooses between approaches and the choice succeeds), and at task completion (when it records the outcome and any lessons learned). Do not store every intermediate step. Be selective about what is worth remembering, just as a human investigator takes notes on key findings rather than writing down every thought.
from anthropic import Anthropic
client = Anthropic()
AGENT_PROMPT = """You are an agent with persistent memory.
After discovering important facts, use the store_memory tool.
Before starting any task, use the recall_memories tool to check
what you already know about the relevant topics.
Store memories when you:
- Discover a fact about the system or environment
- Complete a task successfully (record what worked)
- Encounter an error and find the root cause
- Learn a preference or requirement
Do NOT store:
- Intermediate reasoning steps
- Information you retrieved from memory (it is already stored)
- Speculative observations without evidence"""
def run_agent_loop(task, memory_client):
    # Recall relevant context before starting
    context = memory_client.recall(task, top_k=10)
    messages = [{"role": "user",
                 "content": f"Task: {task}\n\nRelevant memories:\n{context}"}]
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            system=AGENT_PROMPT,
            tools=get_tools(memory_client),
            messages=messages
        )
        if response.stop_reason == "end_turn":
            break
        # Process tool calls (including memory operations) and append
        # the results; get_tools and process_tool_calls are your own
        # tool plumbing, defined elsewhere
        messages = process_tool_calls(response, memory_client,
                                      messages)

The highest-value point to retrieve memories is before the agent makes a decision. When the agent is about to choose an approach, diagnose a problem, or take an action that has consequences, inject a recall query for relevant memories. This gives the agent the benefit of its past experience. The recall query should be specific to the decision at hand, not a generic "what do I know about this topic." A specific query like "past investigations of connection pool exhaustion in Service X" returns more useful results than "Service X."
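Composing that decision-specific query can be as simple as naming the entity, the symptom, and the action under consideration. A small sketch; the helper and the `memory_client.recall` interface are illustrative, not a fixed API:

```python
# Sketch: build a decision-specific recall query rather than a
# generic topic query. Field names and the memory_client interface
# are assumptions for illustration.
def recall_for_decision(memory_client, entity, symptom, action):
    # Naming the entity, the observed symptom, and the candidate
    # action retrieves past experience that bears directly on
    # this choice, not everything known about the entity.
    query = (f"past investigations of {symptom} in {entity}; "
             f"outcomes of {action}")
    return memory_client.recall(query, top_k=5)
```

For example, `recall_for_decision(mc, "Service X", "connection pool exhaustion", "raising the pool limit")` surfaces the earlier incident memory directly, where a bare `"Service X"` query would rank it against every other fact about the service.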
Without hygiene, the memory store grows monotonically and retrieval quality degrades as useful memories get buried under noise. Implement three hygiene practices: deduplication (check whether a similar memory already exists before storing a new one), consolidation (periodically merge related memories into concise summaries), and importance-based eviction (remove memories that are old, never accessed, and low confidence). Adaptive Recall handles all three automatically through its memory lifecycle, but if you are building on a raw vector store, you need to implement these yourself.
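Deduplication is shown next; importance-based eviction, for its part, can be sketched as a periodic sweep that scores every memory on recency, access frequency, and confidence, then drops the lowest scorers once the store exceeds a budget. The field names and weights below are illustrative, not a prescribed formula:

```python
import datetime

# Sketch of importance-based eviction. Each memory is a dict with
# stored_at (aware datetime), access_count, and confidence; the
# scoring weights are illustrative and should be tuned.
def eviction_candidates(memories, max_size, now=None):
    now = now or datetime.datetime.now(datetime.timezone.utc)
    if len(memories) <= max_size:
        return []  # under budget, nothing to evict

    def score(mem):
        age_days = (now - mem["stored_at"]).days
        recency = 1.0 / (1.0 + age_days)           # decays with age
        usage = min(mem["access_count"], 10) / 10  # capped access credit
        return 0.4 * recency + 0.3 * usage + 0.3 * mem["confidence"]

    # Lowest-scoring memories are evicted first
    ranked = sorted(memories, key=score)
    return ranked[: len(memories) - max_size]
```

Running this sweep on a schedule (daily, or after every N stores) keeps old, never-accessed, low-confidence memories from burying the useful ones at retrieval time.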
def store_with_dedup(memory_client, content, metadata):
    # Check for existing similar memories
    existing = memory_client.recall(content, top_k=3)
    for mem in existing:
        if mem.similarity > 0.92:
            # Very similar memory exists, update instead
            memory_client.update(mem.id,
                                 content=content,
                                 metadata=metadata)
            return mem.id
    # No duplicate found, store new memory
    return memory_client.store(content, metadata=metadata)

What Good Agent Memory Looks Like in Practice
After a week of operation with long-term memory, a well-instrumented agent has accumulated a knowledge base that captures: the structure and dependencies of the systems it manages, common failure modes and their resolutions, user preferences and priorities, the outcomes of past decisions, and temporal patterns (which issues recur, which services degrade during peak hours, which deployments tend to cause problems). Each memory is tagged with confidence, source, and timestamp.
When the agent encounters a new task, it retrieves relevant memories and starts with institutional knowledge rather than a blank slate. It knows that Service X's connection pool maxes out at 50 connections. It knows that the last three deploys of Service Y caused brief 404 errors that resolved after 60 seconds. It knows that the user prefers to be notified about incidents via Slack rather than email. This accumulated knowledge makes the agent meaningfully more effective over time, which is exactly the behavior that turns a useful tool into an indispensable one.
Adaptive Recall provides this entire memory infrastructure through its MCP tools. The store tool accepts observations with metadata. The recall tool retrieves with cognitive scoring that accounts for recency, access frequency, entity connections, and confidence. The consolidation process runs automatically. The knowledge graph connects entities across memories. No custom embedding pipeline, no manual deduplication, no retrieval tuning required.
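Wiring an MCP memory server into an agent host is typically a single config entry. The shape below follows the common MCP client convention (a `mcpServers` map of command plus environment); the server name, package, and environment variable are placeholders for illustration, not Adaptive Recall's documented values:

```json
{
  "mcpServers": {
    "adaptive-recall": {
      "command": "npx",
      "args": ["-y", "adaptive-recall-mcp"],
      "env": { "ADAPTIVE_RECALL_API_KEY": "your-key" }
    }
  }
}
```

Once registered, the store and recall tools appear in the agent's tool list automatically, with no changes to the agent loop itself.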
Give your agent memory that learns and improves. Adaptive Recall handles storage, retrieval, and lifecycle so you can focus on agent logic.
Get Started Free