Context Budget

The context window is the model’s active working memory during a task. It is consumed by prompts, file contents, tool schemas, tool results, and prior conversation history. Managing it is both a cost-control practice and a reasoning-quality practice.
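To make the budget concrete, the sketch below totals the tokens each consumer takes on a single turn and checks them against a window size. It assumes you already have a token count per component (how you count is left out). The `ContextSnapshot` class, its field names, and the 200,000-token window in the usage are illustrative assumptions, not properties of any specific model or API.

```python
from dataclasses import dataclass


@dataclass
class ContextSnapshot:
    """Tokens used by each consumer of the context window on one turn.

    Hypothetical fields; the counts could come from a tokenizer,
    API usage fields, or estimates.
    """
    prompts: int
    file_contents: int
    tool_schemas: int
    tool_results: int
    history: int

    def total(self) -> int:
        # Every component counts against the same window.
        return sum(vars(self).values())

    def remaining(self, window: int) -> int:
        # A negative result means the turn would not fit in the window.
        return window - self.total()

    def shares(self) -> dict[str, float]:
        # Fraction of the used budget taken by each component.
        total = self.total()
        if total == 0:
            return {name: 0.0 for name in vars(self)}
        return {name: count / total for name, count in vars(self).items()}


snap = ContextSnapshot(prompts=2_000, file_contents=40_000, tool_schemas=8_000,
                       tool_results=30_000, history=20_000)
print(snap.total())                    # 100000
print(snap.remaining(200_000))         # 100000
print(snap.shares()["file_contents"])  # 0.4
```

Breaking the total down by component shows which practice below will pay off most: a large `file_contents` share points at folder scope, a large `tool_schemas` share at tool selection.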

Context rot

Context rot is the decline in output quality caused by accumulated stale history and noisy outputs. Long sessions increase reasoning noise, too many connected tools add hidden per-turn overhead, and vague deliverables cause rework that compounds context waste.
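The tool overhead is easy to underestimate because it is paid on every turn, not once. A minimal sketch, assuming each connected tool's schema is re-sent in full on every turn; the tool names and schema sizes in `SCHEMA_TOKENS` are hypothetical numbers chosen only to show the arithmetic.

```python
# Hypothetical per-tool schema sizes in tokens; real sizes depend on the tool definitions.
SCHEMA_TOKENS = {
    "filesystem": 1_200,
    "browser": 2_500,
    "database": 1_800,
    "calendar": 900,
    "email": 1_100,
}


def schema_overhead(connected: set[str], turns: int) -> int:
    """Input tokens spent re-sending tool schemas, assuming every connected
    tool's schema is included on every turn."""
    per_turn = sum(SCHEMA_TOKENS[name] for name in connected)
    return per_turn * turns


everything = schema_overhead(set(SCHEMA_TOKENS), turns=40)
only_needed = schema_overhead({"filesystem"}, turns=40)
print(everything, only_needed)  # 300000 48000
```

Under these assumed sizes, connecting all five tools for a 40-turn task costs 252,000 more input tokens than connecting only the one the task needs, before any of those tools is called.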

Symptoms: the model starts contradicting earlier work, loses track of the task scope, or repeats actions it already completed.
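The last symptom can be checked mechanically by looking for tool calls repeated with identical arguments. A minimal sketch, assuming tool calls are logged as `(name, arguments)` pairs with JSON-serializable arguments; `repeated_actions` and the tool names in the usage are hypothetical.

```python
import json
from collections import Counter


def repeated_actions(tool_calls: list[tuple[str, dict]]) -> list[tuple[str, dict, int]]:
    """Return (tool name, arguments, count) for every call made more than once
    with identical arguments, in order of first occurrence."""
    # Serialize arguments with sorted keys so equal dicts produce equal keys.
    keys = [(name, json.dumps(args, sort_keys=True)) for name, args in tool_calls]
    counts = Counter(keys)
    return [(name, json.loads(args), n) for (name, args), n in counts.items() if n > 1]


calls = [
    ("read_file", {"path": "src/app.py"}),
    ("run_tests", {}),
    ("read_file", {"path": "src/app.py"}),
]
print(repeated_actions(calls))  # [('read_file', {'path': 'src/app.py'}, 2)]
```

A repeat is a signal rather than proof: re-reading a file after editing it, or re-running tests after a fix, is legitimate. Repeats with no intervening change are the ones worth investigating.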

Budget management practices

  • Scope the task before starting — name the deliverable, inputs, scope, review boundary, and definition of done
  • Use the smallest folder — connect only the files the task actually requires
  • Connect only relevant tools — each connected tool adds schema tokens on every turn
  • Plan before acting on complex tasks — plan mode is a checkpoint where Claude explores and proposes before consuming execution budget
  • Compact or clear between tasks — when work changes, compact or clear context; do not let stale history become hidden policy
  • Use prompt caching — for stable, repeatedly used context (system prompts, large reference files), prompt caching reduces the cost of sending that context again on each request
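Compacting between tasks can be sketched as replacing older history with a single summary while keeping the most recent turns verbatim. This is a minimal sketch under stated assumptions: `Message` is a hypothetical stand-in for whatever message type your client uses, `summarize` is a hypothetical callback (in practice it might itself be a model call), and putting the summary in a `"user"` message is an assumption, not a requirement of any API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    role: str
    content: str


def compact(history: list[Message], keep_last: int,
            summarize: Callable[[list[Message]], str]) -> list[Message]:
    """Replace everything except the last `keep_last` messages with one summary."""
    if keep_last < 0:
        raise ValueError("keep_last must be >= 0")
    if len(history) <= keep_last:
        return list(history)  # nothing old enough to compact
    split = len(history) - keep_last
    older, recent = history[:split], history[split:]
    summary = Message(role="user", content="Summary of earlier work: " + summarize(older))
    return [summary] + recent


history = [Message("user" if i % 2 == 0 else "assistant", f"turn {i}") for i in range(6)]
compacted = compact(history, keep_last=2, summarize=lambda msgs: f"{len(msgs)} earlier messages")
print([m.content for m in compacted])
# ['Summary of earlier work: 4 earlier messages', 'turn 4', 'turn 5']
```

Clearing is the simpler alternative: when the next task does not depend on the previous one, start from an empty history instead of carrying a summary forward.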

Cost implications

In long-running agent workflows (planning, tool calling, sub-agent delegation, validation), token counts grow quickly. A model that finishes tasks in fewer turns at lower cost can matter more than a model that wins isolated single-turn benchmarks. This is the “cost-to-completion” lens NVIDIA uses to position NVIDIANemotron for enterprise agents.
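Token counts grow quickly because an agent loop typically re-sends its accumulated history on every turn, so each turn's input is larger than the last. A minimal sketch, assuming a fixed base context (system prompt plus tool schemas) and a constant amount of new history per turn; it ignores prompt caching, compaction, and separate output-token pricing, and the numbers in the usage are illustrative.

```python
def cumulative_input_tokens(base: int, growth_per_turn: int, turns: int) -> int:
    """Total input tokens over an agent loop that re-sends its full context each turn."""
    total = 0
    history = 0
    for _ in range(turns):
        total += base + history     # this turn's input: fixed context plus all prior history
        history += growth_per_turn  # this turn's output and tool results join the history
    return total


for turns in (10, 20):
    print(turns, cumulative_input_tokens(base=10_000, growth_per_turn=3_000, turns=turns))
# 10 235000
# 20 770000
```

The loop computes $T B + g\,T(T-1)/2$ for $T$ turns, base $B$, and growth $g$, which is quadratic in $T$: doubling the turns from 10 to 20 more than triples the input tokens. This is why finishing in fewer turns can outweigh a better single-turn score.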

Related

  • AgentLoop — the loop that consumes context on every turn
  • BoundedAgent — bounding scope is the primary context control
  • ClaudeCowork — Cowork task design practices for context management
  • ClaudeSDKAndCowork — practical synthesis of context management in Cowork workflows