How I Stop My Claude Code Sessions From Rotting

TL;DR

You know the feeling. Claude flies for the first hour, then starts losing the thread - repeating things you settled 40 minutes ago.
You're not imagining it. Context rot is a measurable drop in answer quality as context grows. Chroma tested 18 models, every one degrades.
Hook #1 - a context monitor on the UserPromptSubmit event: counts real assistant turns and transcript size, fires a gentle nudge at two thresholds.
The hook never blocks. It injects a message via additionalContext, so the decision stays with me or the agent.
Hook #2 - /handoff: writes state to a file, restarts a fresh session in the same tmux window, and prompts that session to read the file and keep going. Clean context, no lost thread.

You know the feeling. You've been working with Claude for two hours. The first hour it flew. Now it's starting to lose the thread. It repeats things we settled 40 minutes ago. It suggests a fix we already rejected. It forgets which file the bug lives in. First thought: the model is having an off day. Second thought, the correct one: I left the session running too long and the context rotted.

This is the first post in a five-part series about my Claude Code hooks. They all circle one observation - an agent left to its own devices rots. I'm starting with the most tangible symptom: a long session that gets dumber with every turn.

The problem: you can actually measure context rot

For a long time I figured "the model gets tired" was just me projecting. I'm tired after three hours of debugging, so I assume Claude is too. Turns out I'm not making it up.

Chroma published a study called Context Rot. They tested 18 models, including Claude, GPT and Gemini, on tasks where they controlled one variable - the length of the input. The result is brutally consistent. Every model degrades as context grows. Not linearly, not predictably, but always downward. The same prompt on a short input does clearly better than on a long one, even though the task itself is identical.

There's a second effect called "lost in the middle". The model remembers the start and the end of the context best, and the middle blurs. Stanford described this in a separate paper. In practice it means the things you settled halfway through a long session are the most likely to get "forgotten".

Now map that onto Claude Code. An agent session runs for dozens of turns, stuffed with whole files, command output, stack traces, your corrections, and dead ends we never use again. After an hour of active work the session transcript can swell past a million bytes. The agent drags all of that ballast into every next answer. Most of it is junk - failed attempts, files read once, long logs. But the model doesn't know what's junk, so it treats everything equally.

The mindset shift: answer quality doesn't depend on how long you've been working. It depends on how much ballast the agent is dragging in context. Measure turns and transcript size, not the clock.

Hook #1 - a context monitor on UserPromptSubmit

If you can measure it, you can catch it automatically. I wrote a hook on the UserPromptSubmit event - the point where Claude Code runs your script right after you send a prompt, before it reaches the model. The hook receives, among other things, the path to the current session transcript.

The hook does two things. First it counts real assistant turns. And here's the catch - not every entry in the transcript is a turn. Plenty of them are tool-only (just a tool call) or thinking-only (thinking with no answer). Counting all of them would inflate the number. I filter those out, leaving only the turns where the agent actually said something. Second it measures the transcript size in bytes, the simplest proxy for "how much ballast has piled up so far".

Then two thresholds. Whichever one gets crossed first wins - size or turn count, whatever comes first:

# context monitor - two tiers, the first crossed threshold wins
SIZE_TIER1  = 500_000    # 500 KB  -> gentle nudge
SIZE_TIER2  = 1_000_000  # 1 MB    -> warning
TURNS_TIER1 = 20
TURNS_TIER2 = 35

The numbers aren't arbitrary. 20 turns or half a megabyte is roughly the point where you can feel the agent getting less precise. It still works fine, but it's a good moment to stop and think. 35 turns or a megabyte is the threshold past which quality genuinely drops, and there's no point pretending otherwise.

The whole hook logic fits in a dozen lines:

# hook on UserPromptSubmit
data = json.load(sys.stdin)
transcript = read_transcript(data["transcript_path"])

size  = transcript_size_bytes(transcript)
turns = count_real_turns(transcript)   # skips tool-only and thinking-only entries

msg = None
if size >= SIZE_TIER2 or turns >= TURNS_TIER2:
    msg = "Long sessions degrade answer quality. Use /handoff."
elif size >= SIZE_TIER1 or turns >= TURNS_TIER1:
    msg = "Good moment for /handoff if the task is shifting."

if msg:
    # don't block - just inject context for the agent
    print(json.dumps({
        "hookSpecificOutput": {
            "hookEventName": "UserPromptSubmit",
            "additionalContext": msg,
        }
    }))

That's it. No magic, no model, no API call. Read a file, count two numbers, compare to thresholds. The hook runs locally and costs a fraction of a second.

Why it nudges instead of blocking

The most important decision in this hook is what it doesn't do. It doesn't block. It doesn't kill the session by force. It doesn't force a restart.

The mechanism is deliberately soft. The hook injects a message through additionalContext, which appends one sentence to the turn's context for the agent to see. That's all. The decision whether to actually run a handoff stays with the agent, or with me.

Why? Because a hard block at 20 turns would be a nightmare in practice. Sometimes I'm halfway through a delicate refactor that has to finish in one go. Sometimes a long session is justified because the task genuinely needs all that context. A hook that threw me out right then would get disabled fast, and a disabled guardrail is worse than none, because it gives you a false sense that something is watching.

A soft signal works differently. It reminds, it doesn't manage. If the task is shifting, or I feel the agent starting to wander, a handoff is one command away. If not, I ignore the hint and keep going. Same approach as a "humble" mode in a prompt: a layer that suggests, not a layer that forbids.

A hook that blocks too often gets turned off. A hook that only reminds stays on forever. The best guardrail is one you don't want to bypass.

Hook #2 - /handoff, a reset that keeps the thread

The monitor says "time to reset". But a reset that loses what we were working on would be pointless. I'd be re-feeding the agent the context out of my own head on every restart. That's what the second hook is for: /handoff.

The idea is simple and runs in three steps:

Write state to a handoff file. The agent writes a tight summary: what we're doing, what's already done, the next step, which files are in play. It lands in a handoff file in the project directory - on disk, not in the session context.
Restart a fresh session in the same tmux window. The old, bloated session ends. A new one starts in the same window with zero context. From the terminal's point of view nothing happens. It's still "the same" work, in the same place.
The fresh session reads the handoff and continues. Right after the restart, the handoff tool sends the new session a prompt telling it to read the handoff file. So it knows exactly where we left off. The thread keeps going, the ballast stays behind.

# /handoff - the flow
# 1. collect state: goal, done so far, next step, files in play
# 2. write it to a handoff file in the project directory
# 3. restart a fresh session in the same tmux window
# 4. send that session a prompt: "read the handoff file, then continue"

The effect is that I get task continuity without context continuity. The new session knows what to do because it read a 2 KB summary, not a megabyte of history where 90% is irrelevant logs and dead ends. It's a bit like the dropbox pattern from my morning dashboard - state lives in a file on disk, and the processes that read it can restart all they like.

The result: a chain of fresh sessions

Before these hooks, my typical day with Claude Code looked like this. One session, fired up in the morning, dragged out for three hours, dumber by the end. A classic rotting session, where the last hour was clearly weaker than the first.

Now it looks different. The monitor nudges me at the threshold, I run /handoff at a natural task boundary, and I get a chain of fresh sessions, maybe 30 minutes each. Every one starts with a clean context and full precision. The thread doesn't get lost, because the handoff file carries it. And the quality of the last session in the chain is the same as the first, because none of them had time to rot.

None of this makes Claude smarter. It's plain work hygiene. The same way you don't leave one process running forever and then act surprised it ate all your RAM, you don't leave a single agent session running all day.

Series: Claude Code hooks (5 parts)

Context rot + handoff - how I stop sessions from rotting (you're here)
Guardrails - hooks that won't let the AI shoot me in the foot
Quality gates - forcing the agent to verify before it says "done"
The prompt layer - a local LLM that reads my prompt before it hits Claude
The ambient layer - the hooks you stop noticing

All five parts are now live.

Takeaway

Measure turns and transcript size instead of the clock. Three hours of light work rots slower than 40 minutes of dumping huge files into context, so the clock lies and the counter doesn't. Set a threshold, inject a soft signal, and when it lights up, reset before it rots. The handoff file carries the thread, the fresh session carries the precision.

This is part 1 of a series on Claude Code hooks. If you like this kind of home-grown automation, take a look at the dropbox pattern from my morning dashboard - another story about keeping state in files so one broken piece can't take down the whole thing.