Context Budget

The context budget is the finite working memory available to a model during a task. It is consumed by the system prompt, user messages, loaded file contents, tool schemas, tool call results, and prior conversation turns. When the budget is exhausted, the model can no longer access earlier work. If the budget is poorly managed, quality degrades long before it runs out.

Why it’s both a cost and a quality concern

Token management is reasoning quality management. A context window filled with stale conversation history, redundant tool outputs, and verbose system instructions becomes noisy: the model must attend to an increasingly diluted signal. This is context rot, the decline in reasoning quality caused by accumulated irrelevant history.

Context rot is silent. The model continues producing responses, but their relevance and accuracy drift as the signal-to-noise ratio falls.

Consumed by

  • System prompt and persistent instructions
  • File contents loaded into context
  • Tool schemas (can be large if many tools are active)
  • Tool call results returned to the model
  • Prior conversation turns and intermediate outputs
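To make the categories above concrete, here is a minimal sketch of a per-category tally. It assumes a rough heuristic of about four characters per token, which is not a real tokenizer, and the ContextBudget class, its method names, and the 1000-token limit are hypothetical illustrations rather than part of any product or SDK.

```python
from dataclasses import dataclass, field

# Assumption: roughly 4 characters per token. Real tokenizers vary by model
# and language, so treat every number here as a rough estimate.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (heuristic, not a tokenizer)."""
    return max(1, len(text) // CHARS_PER_TOKEN) if text else 0


@dataclass
class ContextBudget:
    """Tracks estimated token use per category against a fixed limit."""
    limit: int  # hypothetical window size, in tokens
    used: dict = field(default_factory=dict)

    def add(self, category: str, text: str) -> None:
        """Charge the estimated tokens of `text` to `category`."""
        self.used[category] = self.used.get(category, 0) + estimate_tokens(text)

    def total(self) -> int:
        return sum(self.used.values())

    def remaining(self) -> int:
        return self.limit - self.total()

    def fraction_used(self) -> float:
        return self.total() / self.limit


# Placeholder strings stand in for real prompt, schema, and result text.
budget = ContextBudget(limit=1000)
budget.add("system_prompt", "x" * 400)
budget.add("tool_schemas", "x" * 1200)
budget.add("tool_results", "x" * 800)
print(budget.used, budget.remaining())
```

A breakdown like this shows which category dominates, which is usually where trimming pays off first.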

Management practices

Before the task:

  • Write lean system prompts; avoid encyclopedic instructions
  • Activate only tools needed for this workflow
  • Limit files to the smallest folder containing the relevant work

During the task:

  • Use plan mode for complex or risky work before running actions
  • Insert checkpoints to verify direction before continuing
  • Watch for signs of context rot (unexpected direction changes, repetition)
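One of the signs listed above, repetition, can be checked mechanically. The sketch below is a heuristic illustration only: the function names, the window size, and every threshold are assumptions chosen for the example, and it says nothing about unexpected direction changes, which still need a human or a separate check.

```python
from difflib import SequenceMatcher


def repetition_score(outputs: list[str], window: int = 4, threshold: float = 0.9) -> float:
    """Fraction of pairs among the last `window` outputs that are near-duplicates."""
    recent = outputs[-window:]
    pairs = [(a, b) for i, a in enumerate(recent) for b in recent[i + 1:]]
    if not pairs:
        return 0.0
    similar = sum(
        1 for a, b in pairs if SequenceMatcher(None, a, b).ratio() >= threshold
    )
    return similar / len(pairs)


def should_checkpoint(outputs: list[str], fraction_used: float,
                      max_fraction: float = 0.7, max_repetition: float = 0.5) -> bool:
    """Suggest a checkpoint when the budget is mostly used or outputs repeat.

    `fraction_used` is the share of the context budget already consumed (0 to 1).
    Both limits are hypothetical defaults, not recommended values.
    """
    if fraction_used >= max_fraction:
        return True
    return repetition_score(outputs) >= max_repetition
```

A loop could call should_checkpoint after each step and pause for review, or summarize and reset, when it returns True.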

After a task segment:

  • Use /clear or equivalent to reset context when starting different work
  • Convert repeatable instructions into reusable skills rather than repeating them each session
  • Store session IDs and results externally — do not rely on context as the only record
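Storing session IDs and results externally can be as simple as appending to a file before the context is cleared. The following is a minimal sketch that assumes a JSON Lines file; the function names, record fields, and file layout are hypothetical and not taken from any SDK.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_session_record(path, session_id, summary, results):
    """Append one session record as a single JSON line and return it."""
    record = {
        "session_id": session_id,
        "saved_at": datetime.now(timezone.utc).isoformat(),
        "summary": summary,
        "results": results,
    }
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def load_session_records(path):
    """Read all saved records; return an empty list if the file does not exist."""
    path = Path(path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

With a record like this on disk, a later session can start from a short summary instead of carrying the whole prior conversation in context.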

Prompt caching

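When stable context (system prompts, large documents) is reused across multiple calls, prompt caching can reduce cost by letting the provider reuse its processing of that prefix instead of reprocessing the same tokens on every call. It is an optimization for workflows with a large, unchanging context prefix. Cached tokens still occupy the context window, so caching lowers cost without freeing budget.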
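A rough way to see when caching pays off is to compare input cost with and without it. The sketch below is arithmetic under stated assumptions: the write and read multipliers are placeholder values to be replaced with a provider's published pricing, the cache is assumed to stay valid across all calls, and the function names are hypothetical.

```python
def input_cost_without_cache(prefix_tokens, suffix_tokens, calls, price_per_token):
    """Every call pays full price for the stable prefix and the changing suffix."""
    return calls * (prefix_tokens + suffix_tokens) * price_per_token


def input_cost_with_cache(prefix_tokens, suffix_tokens, calls, price_per_token,
                          write_multiplier=1.25, read_multiplier=0.1):
    """Estimate input cost when the stable prefix is cached.

    Assumptions (placeholders, not quoted pricing):
      - the first call writes the prefix to the cache at `write_multiplier` x base price
      - later calls read the prefix at `read_multiplier` x base price
      - the cache never expires between calls
      - the per-call suffix is never cached
    """
    if calls <= 0:
        return 0.0
    first_call = prefix_tokens * price_per_token * write_multiplier
    later_calls = (calls - 1) * prefix_tokens * price_per_token * read_multiplier
    uncached_suffix = calls * suffix_tokens * price_per_token
    return first_call + later_calls + uncached_suffix


# Example: a 10,000-token prefix, a 500-token changing suffix, 10 calls,
# with price expressed in arbitrary cost units per token.
plain = input_cost_without_cache(10_000, 500, 10, 1.0)
cached = input_cost_with_cache(10_000, 500, 10, 1.0)
print(f"without cache: {plain:.0f}, with cache: {cached:.0f}")
```

Under these assumed multipliers a single call costs more with caching than without, because the write surcharge is never recovered; the savings come entirely from reuse.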

Related

  • AgentLoop: each loop iteration consumes from the context budget
  • ClaudeCowork: the Cowork product where context budget management is a primary operating discipline
  • ClaudeAgentSDK: SDK sessions preserve context across loop iterations