How Much Context Can AI Coding Assistants Handle

Claude Code supports up to 1 million tokens of context, enough for approximately 750,000 words or several hundred source files simultaneously. Cursor uses models with 128K to 200K token windows depending on the selected model. GitHub Copilot's context is smaller and primarily focused on the current file and open tabs. However, effective context is always smaller than the raw window size because system prompts, conversation history, and tool outputs consume tokens. The practical recommendation is to keep active context focused rather than maxing out the window, because LLMs attend less strongly to content in the middle of very long contexts.

Raw Context Window Sizes

The context window is the total number of tokens the model can process in a single request, including the system prompt, conversation history, any files or tool results included as context, and the model's response. Different coding assistants use different underlying models with different window sizes.
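Token counts themselves are opaque, but a rough rule of thumb is that one token corresponds to about four characters of English text or code. The sketch below uses that heuristic to approximate token counts; real assistants count tokens with the model's own tokenizer, so the ratio is an assumption, not an exact figure.

```python
CHARS_PER_TOKEN = 4  # assumption: rough average for English prose and code

def estimate_tokens(text: str) -> int:
    """Approximate token count from character length (heuristic only)."""
    return max(1, len(text) // CHARS_PER_TOKEN)

# A 500-line source file at roughly 60 characters per line:
print(estimate_tokens("x" * 500 * 60))  # ~7,500 tokens
```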

Claude Code uses Anthropic's Claude models. Claude Opus and Sonnet support 200K tokens as the standard context window and up to 1 million tokens with extended context. A million tokens is roughly equivalent to 3,000 pages of text or several hundred source files, enough to hold a significant portion of a medium-sized codebase in a single context window.

Cursor supports multiple model providers (Anthropic, OpenAI, and Google), and the context window depends on which model is selected. With Claude models, the window matches Claude's native limits. With GPT-4o, the window is 128K tokens. Cursor's tab-based context and codebase indexing help manage which code is included within whatever window the selected model provides.

GitHub Copilot uses a combination of models optimized for code completion. The context available for inline completions is relatively small, focused on the current file and a few surrounding files. Copilot Chat has a larger context window, though it is still smaller than Claude Code's maximum.

Effective Context vs Raw Context

The raw context window is the theoretical maximum. The effective context, meaning the amount of information the model actually uses well, is always smaller. Several factors account for the gap.

System prompts and instructions consume tokens before any user content is loaded. A CLAUDE.md file, MCP server descriptions, tool definitions, and the assistant's built-in system prompt can consume 5,000 to 15,000 tokens before the conversation starts. Conversation history accumulates as the session progresses. Each message from the developer and each response from the assistant consumes tokens. Tool results (file contents, search results, memory retrievals) consume tokens when they are returned.
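To see how quickly this overhead adds up, here is a back-of-the-envelope budget for a hypothetical 200K-token session. Every figure is an illustrative assumption, not a measurement of any particular assistant.

```python
# Hypothetical token budget for a 200K-token window.
window = 200_000

overhead = {
    "system prompt, CLAUDE.md, tool definitions": 12_000,
    "conversation history (20 turns)":            30_000,
    "tool results (file reads, searches)":        45_000,
    "reserved for the model's response":           8_000,
}

effective = window - sum(overhead.values())
print(f"{effective:,} tokens left for task-relevant code")  # 105,000
```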

The "lost in the middle" phenomenon further reduces effective context. Research has shown that LLMs attend more strongly to information at the beginning and end of the context window and less strongly to information in the middle. For a 200K token context filled with code, the model may effectively ignore or underweight code files that fall in the middle portion. This means that simply loading as many files as possible into the context window does not guarantee the model will use all of them effectively.

Why External Memory Beats Bigger Windows

External memory systems (like MCP memory servers) are more effective than simply using larger context windows for three reasons. First, they provide selective retrieval: instead of loading everything and hoping the model attends to the right parts, external memory retrieves only the specific knowledge relevant to the current task. Second, they persist across sessions: a larger context window still gets emptied at the end of each session, while external memory persists indefinitely. Third, they rank what they return: memories are scored by relevance, recency, and confidence, so the most useful information is presented first rather than buried in the middle of a massive context.
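A scoring rule like that can be as simple as a weighted sum with exponential recency decay. The weights and half-life below are hypothetical, chosen only to illustrate the shape of the calculation:

```python
import math

def score_memory(relevance: float, recency_days: float,
                 confidence: float, half_life_days: float = 30.0) -> float:
    """Rank a memory by combining relevance, recency, and confidence.

    relevance and confidence are assumed to lie in [0, 1]; recency decays
    exponentially with age in days. The weights are illustrative.
    """
    recency = math.exp(-math.log(2) * recency_days / half_life_days)
    return 0.6 * relevance + 0.25 * recency + 0.15 * confidence

# Retrieve many candidates, then surface only the best-scoring few.
candidates = [
    ("data access goes through the repository layer", 0.9, 2.0, 0.8),
    ("CI pipeline runs on Node 18",                   0.3, 90.0, 0.9),
]
ranked = sorted(candidates, key=lambda m: score_memory(*m[1:]), reverse=True)
print(ranked[0][0])  # most useful memory first
```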

The optimal approach combines a reasonable amount of in-context code (the files directly relevant to the current task) with external memory retrieval (the project knowledge and conventions that inform how to work on those files). This gives the model focused, high-quality context rather than a large quantity of context with variable relevance.
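As a concrete sketch of that assembly step, the hypothetical build_context helper below puts retrieved conventions first, then the task-relevant files, then the task itself; the section labels are assumptions, not any tool's actual prompt format.

```python
def build_context(task: str, files: dict[str, str],
                  memories: list[str]) -> str:
    """Assemble a focused prompt from a few files plus retrieved memories."""
    parts = ["Project knowledge (retrieved):"]
    parts += [f"- {m}" for m in memories]       # small, pre-ranked set
    for path, source in files.items():          # only task-relevant files
        parts.append(f"\n--- {path} ---\n{source}")
    parts.append(f"\nTask: {task}")
    return "\n".join(parts)
```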

Use your context window for code, not for repeating project knowledge. Adaptive Recall retrieves only the memories relevant to each task, keeping your context window focused on the work.