Hooks That Won't Let the AI Shoot Me in the Foot

TL;DR

A bash tool is a pair of scissors. The agent will run rm -rf in the wrong place, or pipe a script off the internet into your shell, if a plausible task points that way. It has no skin in the game.
PreToolUse fires after the agent picks a tool and before the tool runs. My script sees the command while it's still just a string.
The guardrails are deny-by-pattern: block the handful of shapes I never want unattended, wave everything else through. Default is allow.
The veto is exit 2. Whatever I print to stderr goes back to the agent as the reason, so it reroutes instead of getting stuck.
Pattern-matching catches the known-bad shapes; real isolation needs a sandbox. Every rule here is a scar from something that already broke once.

Giving an agent a bash tool is handing it a pair of scissors and saying "go, run". Most of the time it runs fine. Then one day it doesn't, because the model has no idea that rm -rf in the wrong directory ends your afternoon, or that piping a script straight off the internet into your shell is how you get owned. It'll do either one without blinking if a plausible task leads there. It isn't reckless. It just has nothing at stake and no memory of the last time something like this bit you.

This is part 2 of my series on Claude Code hooks. Part 1 was about context rot, the slow kind of damage where a session quietly degrades over an hour. This one is the fast kind: a single command that wrecks something in a second. The layer that catches it is PreToolUse.

What PreToolUse actually is

PreToolUse runs in the gap between the agent deciding to call a tool and the tool actually running. For a bash command, that gap is where my script gets to read the exact command while it's still just text. Claude Code hands the hook a JSON payload on stdin; the two fields that matter here are the tool name and the tool input:

{
  "tool_name": "Bash",
  "tool_input": { "command": "rm -rf /tmp/build" }
}

One job: decide whether this runs. The bluntest answer is the exit code. exit 2 kills the call, and whatever the hook writes to stderr is handed back to the agent as the reason it was blocked. That last part matters more than it sounds, and I'll come back to it.

# hook on PreToolUse
import json
import re
import sys

# Placeholder rule for illustration: raw ssh at the start of the command.
RISKY_PATTERN = re.compile(r"^\s*ssh\s")

data = json.load(sys.stdin)
cmd  = data.get("tool_input", {}).get("command", "")

if RISKY_PATTERN.search(cmd):
    print("Blocked: use the safe wrapper instead", file=sys.stderr)
    sys.exit(2)   # exit 2 = veto; stderr is fed back to the agent

sys.exit(0)       # no objection; normal permission flow continues

One command through a deny-hook: a match is vetoed with exit 2 and the reason is fed back so the agent reroutes; everything else just runs.

There's a structured form too, for when you want more than a yes or no. Print a permissionDecision of deny, allow, or ask, with a reason attached:

import json

# Structured decision, written to stdout instead of using exit 2
print(json.dumps({
    "hookSpecificOutput": {
        "hookEventName": "PreToolUse",
        "permissionDecision": "deny",
        "permissionDecisionReason": "Use the password manager, not a plaintext key.",
    }
}))

For a plain block, exit 2 and one honest stderr line is the whole hook. I only reach for the JSON form when I want ask, so a borderline command bounces to me for a yes or no instead of getting a flat refusal.
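A minimal sketch of that ask path, assuming the same payload shape as the deny example. The force-push pattern and the function name are made up for illustration; they are not rules from my setup.

```python
import json
import re
from typing import Optional

# Hypothetical borderline shape: a force-push. Swap in whatever you want to confirm by hand.
BORDERLINE = re.compile(r"\bgit\s+push\b.*--force")


def ask_decision(cmd: str) -> Optional[dict]:
    """Return an 'ask' payload for a borderline command, else None."""
    if not BORDERLINE.search(cmd):
        return None  # nothing to say; normal permission flow continues
    return {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "ask",
            "permissionDecisionReason": "Force-push requested; confirm before it runs.",
        }
    }


decision = ask_decision("git push origin main --force")
if decision is not None:
    print(json.dumps(decision))  # the decision travels on the hook's stdout
```

Returning None for everything else keeps the default at allow: the hook only speaks up for the one shape it was written for.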

Why deny-by-pattern

Every guard I run is a denylist, never an allowlist. Trying to enumerate every safe command is a game you lose. The set is effectively infinite, it changes daily, and you spend your life unblocking yourself. So I do the inverse. I write down the few shapes I never want running unattended, match against those, and let everything else through untouched. The default is yes; the hook only ever opens its mouth to say no.

What makes this work is an asymmetry in the cost of being wrong. When the hook blocks something it shouldn't have, the price is one retry and a mildly confused agent. When it misses something it should have caught, the price is a deleted directory, a leaked key, or a server I have to go hunting for. Those two mistakes are not the same size. A short denylist that errs in the cheap direction beats a long allowlist that errs in the expensive one, every time.

An allowlist of safe commands is infinite and you fight it daily. A denylist of dangerous shapes is short, specific, and grows only when something bites you. Be wrong in the direction that costs a retry, not the one that costs an afternoon.
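As a rough sketch of what default-allow looks like in code: a short table of shapes, each paired with the advice the agent should get back, and a lookup that returns nothing when no rule matches. Both rules below are illustrative assumptions, not my actual list.

```python
import re
from typing import Optional

# Each entry: (pattern, what to do instead). Both rules are made-up examples.
DENYLIST = [
    (re.compile(r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*f[a-zA-Z]*\s+(/|~)(\s|$)"),
     "Refusing to recursively delete / or ~; name the exact directory."),
    (re.compile(r"\b(curl|wget)\b[^|]*\|\s*(sudo\s+)?(ba)?sh\b"),
     "Save the script to a file and read it before running it."),
]


def objection(cmd: str) -> Optional[str]:
    """Return the reroute message for the first matching rule, else None."""
    for pattern, advice in DENYLIST:
        if pattern.search(cmd):
            return advice
    return None  # default is allow: anything not listed passes through
```

Note that `rm -rf /tmp/build` passes untouched here. The rule only fires on the one shape it names, which is the cheap direction to be wrong in.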

Guard 1: secrets in plaintext

The first one watches for secrets heading into plaintext. The moment a command looks like it's about to bake an API key, a password, or a token straight into a file, the hook stops it and tells the agent to stash the value in a password manager and read it back at runtime instead.

This guard is boring, which is exactly the point. A key pasted into a tracked file is harmless right up until it gets committed, and then it lives in git history forever. You can rotate the key. You can't un-leak it. Blocking the write costs nothing. Scrubbing a secret out of history after the fact costs a bad afternoon and a quiet jolt of panic when you realise how long it was sitting there.
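One way such a check could look, as a sketch under loud assumptions: the two regexes below are my guess at "a key-ish name assigned a long literal" and "output going to a file". Real key formats vary, and this will miss plenty.

```python
import re
from typing import Optional

# Assumed shape: NAME containing key/secret/token/password, then = or :, then a long literal.
SECRET_ASSIGNMENT = re.compile(
    r"\b\w*(api[_-]?key|secret|token|passw(or)?d)\w*\s*[=:]\s*['\"]?[\w\-/+]{12,}",
    re.IGNORECASE,
)
# Assumed shape: a redirect or tee, i.e. the command writes somewhere.
WRITES_TO_FILE = re.compile(r"(>>?|\btee\b)\s*\S+")


def plaintext_secret_reason(cmd: str) -> Optional[str]:
    """Return a reroute message if the command seems to write a literal secret to a file."""
    if SECRET_ASSIGNMENT.search(cmd) and WRITES_TO_FILE.search(cmd):
        return "Store the value in a password manager and read it at runtime."
    return None
```

A value that is read at runtime, such as `API_KEY="$(some-tool get api-key)"` with a hypothetical `some-tool`, does not match, because the right-hand side is a command substitution rather than a literal.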

Guard 2: filesystem discipline

The next guard has nothing to do with danger, just mess. Agents scatter temp files everywhere: a throwaway script here, a screenshot there, a downloaded PDF dropped in whatever directory they happened to be standing in. Give that a week and you can't tell the junk from anything that matters.

So a hook quietly catches those temp writes and points them at one per-project scratch directory. The agent thinks it wrote to a temp path; it actually landed somewhere I can wipe in a single command. Small thing. It's the difference between tidying up and doing archaeology.
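The rewrite itself can be tiny. This is a sketch of the path substitution only, with a hypothetical `.scratch` directory name; how the rewritten command gets handed back to the tool depends on what your Claude Code version's hook output supports, and that part is not shown.

```python
import re

SCRATCH_DIR = ".scratch"  # hypothetical per-project directory name

# Matches /tmp/ only where it starts a path, so /var/tmp/ and src/tmp/ are left alone.
TMP_PATH = re.compile(r"(?<![\w./-])/tmp/(?=[\w.-])")


def redirect_tmp(cmd: str, scratch: str = SCRATCH_DIR) -> str:
    """Rewrite /tmp/... paths in a command so they land in the scratch directory."""
    return TMP_PATH.sub(scratch + "/", cmd)
```

With this, `python gen.py > /tmp/out.png` becomes `python gen.py > .scratch/out.png`, and cleanup is one `rm -r` on a directory that holds nothing else.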

Guard 3: the risky shapes (every one is a scar)

The third group is the interesting one, because I didn't design it. Every rule in it is a scar, a command shape that once did something I never asked for:

Raw SSH straight to a box, when there's a safe wrapper that should be used instead, so a fat-fingered hostname can't fire a command at the wrong server.
Processes that open a listening port and never get cleaned up, quietly stacking into a little graveyard of orphaned servers humming in the background.
Writes that look like they're about to clobber something that matters, where the path or the redirect just smells wrong.

Each one started as "huh, that shouldn't have happened" and ended as a line in a deny-hook so it can't happen the same way twice. The list isn't finished and never will be. It covers the mistakes I've made so far, which is a smaller and more honest claim than calling it done.
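To make the three shapes above concrete, here is a hedged sketch of what each might look like as a regex. These are stand-ins written for this post, not my production rules, and `safe-ssh` is a made-up wrapper name.

```python
import re
from typing import Optional, Tuple

# name -> (pattern, advice). All three are illustrative assumptions.
SCARS = {
    # ssh as the command word, at the start or after ; & |
    "raw-ssh": (re.compile(r"(^|[;&|]\s*)ssh\s"),
                "Use the safe-ssh wrapper so the hostname is checked first."),
    # a well-known listener started in the background with a trailing &
    "orphan-server": (re.compile(r"\b(python3?\s+-m\s+http\.server|nc\s+-l)\b.*&\s*$"),
                      "Run the server in the foreground or record its PID for cleanup."),
    # a truncating redirect (not >>) into a dotfile or /etc
    "clobber": (re.compile(r"(?<!>)>\s*(~/\.\w+|/etc/\S+)"),
                "That redirect overwrites a config file; write to a copy instead."),
}


def scar_hit(cmd: str) -> Optional[Tuple[str, str]]:
    """Return (rule name, advice) for the first scar the command matches, else None."""
    for name, (pattern, advice) in SCARS.items():
        if pattern.search(cmd):
            return name, advice
    return None
```

In practice each of these would live in its own hook file; they share a table here only to keep the example short.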

Where this stops working

Worth being straight about the limits. Matching patterns on a command string only goes so far. An agent that genuinely wanted to get around a regex could reword its way past it, and a hook can't catch what it never sees. If I needed real isolation I'd run the whole thing in a container or a throwaway VM and stop trusting the host at all.
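A quick illustration of how little it takes to slip past a string match, using a deliberately naive rule that I wrote only for this example:

```python
import re

# A deliberately simple example rule: the literal spelling "rm -rf".
NAIVE_RM = re.compile(r"\brm\s+-rf\b")


def caught(cmd: str) -> bool:
    """True if the naive rule would block this command."""
    return bool(NAIVE_RM.search(cmd))


print(caught("rm -rf build"))        # True
print(caught("rm -r -f build"))      # False: same effect, different spelling
print(caught("find build -delete"))  # False: no rm in sight
```

You can tighten the regex, but every tightening just moves the gap somewhere else. The string is not the behaviour.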

But that was never the threat I had. The thing I'm defending against is a helpful agent, moving fast, with no memory of my past mistakes. For that, a cheap layer that catches the known-bad shapes covers most of the risk for almost none of the effort. And it buys me something I didn't expect when I started: the rails are what let me hand over more rope. I'll leave a session grinding on a long task unattended precisely because I know the few things that would actually hurt are already blocked. The guardrails don't slow the agent down. They're what make it safe to let it go faster.

Copy the pattern, not my list

Don't copy my rules. Your disasters won't be mine. Copy the shape:

Read the JSON on stdin, pull out tool_input.command.
Match it against the one risky pattern this hook cares about.
On a hit, exit 2 with a stderr message that says what to do instead, not just "no".
Otherwise exit 0 and get out of the way.
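The four steps above can be sketched as one pure function, which also makes the hook testable without Claude Code in the loop. The ssh pattern, the `safe-ssh` name, and the choice to wave through an unreadable payload are all assumptions of this sketch.

```python
import json
import re
from typing import Tuple

# Hypothetical single concern for this file: raw ssh. `safe-ssh` is a made-up name.
PATTERN = re.compile(r"(^|[;&|]\s*)ssh\s")
ADVICE = "Blocked: use the safe-ssh wrapper instead of raw ssh."


def evaluate(payload_text: str) -> Tuple[int, str]:
    """Return (exit_code, stderr_message) for one PreToolUse payload."""
    try:
        data = json.loads(payload_text)
    except json.JSONDecodeError:
        return 0, ""  # assumption: an unreadable payload is waved through
    if not isinstance(data, dict):
        return 0, ""
    tool_input = data.get("tool_input")
    cmd = tool_input.get("command", "") if isinstance(tool_input, dict) else ""
    if isinstance(cmd, str) and PATTERN.search(cmd):
        return 2, ADVICE  # veto, with a message that says what to do instead
    return 0, ""          # no objection
```

A real hook would pass `sys.stdin.read()` to `evaluate`, print the message to stderr if there is one, and call `sys.exit` with the returned code. Keeping the decision separate from the I/O means a few assertions can tell you the hook still works after you touch the regex.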

Two things decide whether you keep a hook or rip it out in a week. The stderr message has to reroute the agent, not just stop it. "Blocked, use the safe wrapper instead" gets read and acted on; a bare "denied" just makes the agent try the identical thing again and burn a turn. And keep each hook small and single-purpose, one concern per file. A mega-hook stuffed with every pattern you've ever written is its own hazard, because the day it has a bug it fails closed for everything at once.

Series: Claude Code hooks (5 parts)

Context rot + handoff - how I stop sessions from rotting
Guardrails - hooks that won't let the AI shoot me in the foot (you're here)
Quality gates - forcing the agent to verify before it says "done"
The prompt layer - a local LLM that reads my prompt before it hits Claude
The ambient layer - the hooks you stop noticing

All five parts are now live.

Takeaway

None of this came from a threat model. It accreted, one scar at a time, each small disaster turned into a rule so it can't land twice. That's the real reason it lives in a hook instead of my head: the hook remembers across every session, so I only have to learn each lesson once. Hand the agent the scissors. Just put a guard on the blade first.

This is part 2 of a series on Claude Code hooks. Part 1 covered context rot and the handoff pattern. Part 3 is next: the gates that won't let the agent say "done" until it has actually verified the work.