Working Memory vs Long-Term Memory for Agents

Working memory is the active context an agent uses during a single task execution, typically the LLM context window plus any scratchpad variables. Long-term memory is the persistent store of knowledge that survives across sessions. The key design decision is what gets promoted from working memory to long-term storage when a task ends: conclusions and discoveries should persist, while intermediate reasoning and dead ends should not, because storing everything dilutes retrieval quality and wastes storage.

Working Memory in Agent Systems

Working memory for an LLM-based agent has a direct physical analog: the context window. Everything the LLM can "see" at any given moment (the system prompt, conversation history, tool results, and any injected context) is its working memory. This working memory has a hard capacity limit (the token limit of the model), and everything in it is immediately accessible without a retrieval step.

In practice, agent working memory extends slightly beyond the raw context window. Many agent frameworks maintain state variables that track the current plan, completed steps, and intermediate results outside the conversation. These variables are injected into the LLM prompt at each reasoning step, effectively expanding working memory beyond the conversation itself. LangGraph's graph state, CrewAI's task context, and AutoGen's message history all serve as extended working memory.
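This extended working memory can be sketched as a small state object that is rendered into the prompt at each reasoning step. The structure and field names below are illustrative, not the API of any specific framework:

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    """Extended working memory held outside the conversation itself.
    Illustrative structure; field names are assumptions, not a framework's API."""
    plan: list[str] = field(default_factory=list)
    completed_steps: list[str] = field(default_factory=list)
    findings: dict[str, str] = field(default_factory=dict)

    def to_prompt_block(self) -> str:
        """Render the state as text injected into each reasoning step."""
        plan = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(self.plan))
        done = "\n".join(f"- {s}" for s in self.completed_steps) or "- (none)"
        facts = "\n".join(f"- {k}: {v}" for k, v in self.findings.items()) or "- (none)"
        return f"## Plan\n{plan}\n\n## Completed\n{done}\n\n## Findings\n{facts}"

state = AgentState(plan=["reproduce bug", "isolate cause", "propose fix"])
state.completed_steps.append("reproduce bug")
state.findings["repro"] = "crash occurs only when cache is cold"
print(state.to_prompt_block())
```

Because the state lives outside the message history, completed findings survive even if older messages are summarized or truncated, which is exactly what makes it "extended" working memory.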

The limitation of working memory is capacity. A 200,000-token context window sounds large, but a complex agent task can consume it quickly. Each tool call and result adds hundreds or thousands of tokens. Intermediate reasoning, especially when the agent explores multiple hypotheses, adds more. After 20 to 30 tool calls, the working memory can be half full, and the LLM starts losing track of earlier findings because they have scrolled far up in the context. This is the "lost in the middle" problem applied to agent execution: the LLM attends more to recent messages and may forget important findings from early in the session.
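The rough arithmetic behind that claim, assuming a 200,000-token window and an assumed average cost per tool call plus its result:

```python
# Back-of-envelope estimate of context consumption by tool calls.
# The per-call figure is an assumption for illustration, not a measurement.
WINDOW_TOKENS = 200_000
TOKENS_PER_TOOL_CALL = 3_000   # call arguments + result + surrounding reasoning
calls = 30

used = calls * TOKENS_PER_TOOL_CALL
print(f"{used:,} tokens used, {used / WINDOW_TOKENS:.0%} of the window")
```

At 3,000 tokens per call, 30 calls consume 90,000 tokens, nearly half the window, before counting the system prompt and conversation history.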

Long-Term Memory for Agent Systems

Long-term memory is a separate store that the agent explicitly writes to and reads from. It is not part of the LLM context window; the agent has to take an action (call a tool) to store or retrieve information. This extra step is the trade-off for persistence: information in long-term memory survives indefinitely, is searchable across all sessions, and is not constrained by the context window size.

Long-term memory serves three purposes in agent systems. First, it carries knowledge across sessions. An agent that discovered the root cause of a bug yesterday can recall that knowledge today without re-investigating. Second, it offloads working memory. When the context window is filling up, the agent can store important findings in long-term memory and then summarize the conversation to free up working memory space, knowing the details are safely persisted. Third, it enables multi-agent collaboration. Multiple agents can write to and read from the same long-term memory store, sharing knowledge without direct communication.
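The "explicit tool call" relationship can be sketched as a minimal store the agent writes to and reads from by name. Keyword matching stands in here for the embedding-based retrieval a real system would use, and all names are illustrative:

```python
import time

class LongTermMemory:
    """Minimal persistent store accessed via explicit store/search calls.
    Keyword scoring is a stand-in for real embedding retrieval."""
    def __init__(self):
        self.records = []

    def store(self, text, tags=None):
        """Tool the agent calls to persist a finding; returns a record id."""
        record = {"id": len(self.records), "text": text,
                  "tags": tags or [], "ts": time.time()}
        self.records.append(record)
        return record["id"]

    def search(self, query, limit=3):
        """Tool the agent calls to retrieve relevant prior knowledge."""
        terms = query.lower().split()
        scored = [(sum(t in r["text"].lower() for t in terms), r)
                  for r in self.records]
        ranked = sorted(scored, key=lambda pair: -pair[0])
        return [r["text"] for score, r in ranked if score > 0][:limit]

memory = LongTermMemory()
memory.store("Root cause of issue #412: stale DNS cache in the sidecar", tags=["bug"])
print(memory.search("DNS cache root cause"))
```

The extra round trip (call a tool, wait for the result) is the cost; persistence across sessions and freedom from the context window limit are what it buys.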

The quality of long-term memory depends entirely on what gets stored. A memory store full of raw conversation snippets is barely better than no memory at all, because retrieval returns context-free fragments that require the original conversation to interpret. A memory store of clean, self-contained facts with metadata (timestamps, confidence, entity tags) is a powerful knowledge base that any agent can query and get immediately useful results.
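The contrast between a raw snippet and a self-contained fact can be made concrete with a record structure. The schema below is illustrative, not any specific product's format:

```python
from dataclasses import dataclass, field
import time

@dataclass
class MemoryRecord:
    """A self-contained fact with metadata. Field names are illustrative."""
    text: str            # a complete statement that needs no surrounding context
    confidence: float    # how certain the agent was at storage time
    entities: list       # entity tags enabling filtered retrieval
    created_at: float = field(default_factory=time.time)

# A raw conversation snippet: meaningless without the original conversation.
bad = "yeah it was the second one like we discussed"

# A self-contained fact: interpretable by any agent in any future session.
good = MemoryRecord(
    text="payments-api p99 latency regression was caused by the v2.3.1 "
         "connection-pool default (max=10)",
    confidence=0.9,
    entities=["payments-api", "v2.3.1"],
)
```

Retrieval that returns records like `good` is immediately actionable; retrieval that returns snippets like `bad` forces the agent to reconstruct context it no longer has.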

The Promotion Problem

The most critical design decision in agent memory is the promotion policy: what moves from working memory to long-term storage, and when. Four approaches exist, ordered from least to most effective.

Store everything. When the session ends, dump the entire conversation history into long-term memory. This is easy to implement but produces a memory store full of noise. Intermediate reasoning, failed hypotheses, tool call formatting, and clarification exchanges all get stored alongside the actual findings. Retrieval quality degrades rapidly because useful memories are outnumbered by noise.

Store summaries. At the end of the session, use the LLM to summarize what was accomplished and store the summary. This is better than storing everything because the summary is concise and focused on outcomes. The limitation is that summaries lose specific details (exact metric values, specific command sequences, precise error messages) that may be needed in future sessions.

Store at decision points. Instrument the agent to store a memory at specific moments: after discovering a fact, after completing a task, after making a key decision. Each memory is self-contained and specific. This produces the highest-quality memory store but requires careful instrumentation of the agent loop.

Continuous selective storage. The agent stores observations throughout execution but is instructed (through its system prompt) to be selective about what is worth storing. It stores facts, outcomes, and surprises. It does not store intermediate reasoning, expected results, or information it retrieved from memory (which is already stored). This combines the coverage of continuous storage with the quality of selective storage.
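The selective policy from the last two approaches can be rendered as a simple filter. The heuristics below mirror the "facts, outcomes, and surprises" rule; the event shape and field names are assumptions for illustration:

```python
def should_store(event):
    """Selective promotion policy: store facts, outcomes, and surprises;
    skip reasoning, expected results, and memory retrievals."""
    if event["kind"] in ("fact", "outcome"):
        return True
    if event["kind"] == "result" and event.get("surprising"):
        return True
    return False

events = [
    {"kind": "reasoning", "text": "maybe the cache is stale?"},
    {"kind": "fact", "text": "service restarts every 6h via a cron job"},
    {"kind": "result", "text": "test passed as expected", "surprising": False},
    {"kind": "result", "text": "latency doubled after disabling cache", "surprising": True},
    {"kind": "retrieval", "text": "recalled: cron job restarts service"},
]
stored = [e["text"] for e in events if should_store(e)]
print(stored)  # the fact and the surprising result survive; the rest do not
```

In a real agent the filtering is done by the LLM itself under prompt guidance rather than by hard-coded rules, but the effect is the same: only durable, self-contained observations reach long-term storage.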

Adaptive Recall works best with the third and fourth approaches. The store tool is designed for agents to call during execution when they discover something worth remembering. The metadata system (confidence, tags, entity extraction) ensures that each stored memory is well-structured and retrievable. The consolidation process handles cleanup by merging redundant memories and fading low-value ones, which means even imperfect promotion policies improve over time as the lifecycle system refines the memory store.

Practical Architecture

The typical production architecture has three tiers. The bottom tier is the LLM context window: small, fast, volatile. This holds the current message, the current plan, and the most recent tool results. The middle tier is an extended scratchpad (a JSON state object or a short conversation buffer): medium-sized, persisted for the duration of the task, discarded when the task completes. This holds the full plan, all step results, and running notes. The top tier is long-term memory: large, persistent, searchable. This holds all accumulated knowledge from all sessions.

Information flows upward through promotion. A finding starts in the context window (the agent notices it in a tool result), gets recorded in the scratchpad (as part of the step results), and gets promoted to long-term memory (when the agent recognizes it as a durable fact worth remembering). Information flows downward through retrieval. Before starting a new task, the agent queries long-term memory for relevant context, which is loaded into the scratchpad and injected into the context window.
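The two flows can be sketched as a pair of functions around a task boundary. `TinyStore` is a stand-in for the long-term tier, and all names are illustrative:

```python
class TinyStore:
    """Stand-in for the long-term memory tier (keyword search, no persistence)."""
    def __init__(self):
        self.items = []

    def store(self, text):
        self.items.append(text)

    def search(self, query, limit=3):
        terms = query.lower().split()
        return [t for t in self.items
                if any(w in t.lower() for w in terms)][:limit]

def end_task(memory, scratchpad):
    """Upward flow (promotion): persist durable findings from the scratchpad,
    then discard the scratchpad when the task completes."""
    for finding in scratchpad.get("durable_findings", []):
        memory.store(finding)
    scratchpad.clear()

def start_task(task, memory, scratchpad):
    """Downward flow (retrieval): pull relevant long-term memories into the
    scratchpad and render them for injection into the context window."""
    scratchpad["recalled"] = memory.search(task)
    return "Relevant prior knowledge:\n" + "\n".join(
        f"- {m}" for m in scratchpad["recalled"])

store = TinyStore()
pad = {"durable_findings": ["nightly cron restarts the payments service at 02:00"]}
end_task(store, pad)   # task 1 ends: finding promoted, scratchpad discarded
context = start_task("why does payments restart nightly", store, pad)
print(context)
```

The key property is that the scratchpad is disposable while the store is not: only what passes through `end_task` survives the task boundary.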

Build agents with three-tier memory. Adaptive Recall provides the long-term storage tier with cognitive scoring, entity awareness, and automatic lifecycle management.

Try It Free