Context Windows & Tokens
AI models have a memory limit. It’s called the context window, and it’s measured in tokens. A token is roughly 4 characters, or about three-quarters of a word. So a model with a 100k context window can hold about 100,000 tokens (roughly 75,000 words) in its memory at once.
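That rule of thumb is enough for rough budgeting. Here’s a minimal sketch in Python using the 4-characters-per-token ratio above — a heuristic, not how real tokenizers work:

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

# 400 characters is in the ballpark of 100 tokens.
print(estimate_tokens("x" * 400))  # 100
```

Real tokenizers split text into subwords and vary by model and language, so treat this as a budgeting heuristic, not an exact count.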
The agent can only see what fits in that window. So if you’ve got a massive codebase with hundreds of files, it can’t just load all of them at once. It has to be smart about what it looks at: read the files it needs, do some work, maybe drop some of that from the window to make room for more.
Or if the tool has a big system prompt, and you’ve given it a massive agents.md file — that all takes up space.
The more you cram in there, the worse it gets at focusing on what matters. Some agents handle this with compaction — that’s when the agent summarises what’s happened so far and keeps only the important bits, freeing up space in the context window.
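Compaction can be sketched in a few lines. This is a hypothetical simplification — in a real agent the summary is written by the model, not a placeholder string — but it shows the shape of the idea: when the history exceeds the budget, replace the oldest messages with a summary and keep the recent turns verbatim:

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def compact(history: list[str], budget: int, keep_recent: int = 2) -> list[str]:
    """Summarise old messages once the history exceeds the token budget."""
    if sum(estimate_tokens(m) for m in history) <= budget:
        return history  # everything still fits; nothing to do
    old, recent = history[:-keep_recent], history[-keep_recent:]
    # In a real agent, the model itself would write this summary.
    summary = f"[Summary of {len(old)} earlier messages]"
    return [summary] + recent
```

The freed-up space is the whole point: the summary costs a handful of tokens where the original messages cost thousands.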
When working with agents, the goal is to give them the right context. It’s almost the entire job really. That’s why your agents.md should be focused. That’s why skills are separate files that only get loaded when needed. You’re managing the agent’s attention.
And if the agent seems to forget something you told it earlier in a long conversation, that’s why. It fell out of the context window. You might need to remind it or start a fresh session.