Context Windows and Memory

How Context Works

A context window is the text the model can process when generating a response. It includes system prompts, conversation history, and any documents you’ve included: everything the model can “see” for this particular response.

Models are stateless. Each response is generated fresh from the current context. Previous conversations aren’t remembered unless explicitly included.

Bigger Windows, Different Tradeoffs

Context windows have grown substantially, from a few thousand tokens to 100K+ in some models. This sounds like pure improvement, but there are considerations.

Longer contexts are slower and more expensive to process. More importantly, research shows models don’t use long contexts uniformly. Information in the middle tends to get less attention than content at the beginning and end.

Practical Implications

Be explicit about what’s needed. Don’t assume the model remembers anything from outside the current context.

Position matters. Put the most important information at the beginning of your context.

More isn’t always better. Selective, relevant context often outperforms comprehensive context.

The Takeaway: Context is working memory, not long-term memory. Design systems that explicitly manage what the model sees for each response.