Published 2026-05-20 · 8 min read · By Ian Mu
How to fix Claude Code's "all tests passing" lies — a Stop hook in 50 lines bash
If you use Claude Code for more than a week, you've seen this pattern. The model writes some code, says "all tests passing ✅" or "build succeeded" or just "done!", and you merge. Twenty minutes later production is on fire and the tests it claimed passed were never actually invoked. This article walks through the exact .claude/hooks/ script I run in production on 14 parallel projects to catch this failure mode at session-end time, plus the 4 detection patterns that compose into a full guardrail.
TL;DR
Add a Stop hook to .claude/settings.json that blocks the session from ending if files have changed but no VERIFIED log entry exists from the last 5 minutes. Claude has to either prove it ran verification or admit it didn't. Full open-source script at github.com/ianymu/claude-verify-before-stop (MIT).
The pattern (why this happens)
Claude Code's underlying model is optimistic. When asked to implement a feature with tests, it often:
- Writes the implementation
- Writes the test file
- Looks at the test code, mentally simulates it, and concludes "this should pass"
- Reports
"tests passing ✅"in the final message without ever invoking the test runner
The model isn't lying on purpose. It's reasoning forward from code structure instead of actual execution output. The fix isn't a better prompt — better prompts get optimistically interpreted too. The fix is a workflow guard that requires proof of execution before allowing the session to end.
How Claude Code Stop hooks work
Claude Code fires hooks at six lifecycle events. The most leveraged one for catching lies-of-completion is Stop, which fires when the model tries to end a turn.
A Stop hook receives JSON on stdin:
{
"session_id": "abc123",
"transcript_path": "/path/to/transcript.json",
"stop_hook_active": false,
"last_assistant_message": "All tests passing ✅"
}
The hook's exit code determines what Claude Code does next:
exit 0→ allow the stop, session ends normallyexit 1→ generic error, Claude Code may or may not surface itexit 2→ block the stop and show stderr to the model; this is what we want
The 50-line hook
Save this as .claude/hooks/verify-before-stop.sh and chmod +x:
#!/bin/bash
# verify-before-stop.sh
INPUT=$(cat)
# 1. Loop guard — don't re-block if already continuing
STOP_HOOK_ACTIVE=$(echo "$INPUT" | python3 -c \
"import sys,json; d=json.load(sys.stdin); print('true' if d.get('stop_hook_active') else 'false')" \
2>/dev/null)
if [ "$STOP_HOOK_ACTIVE" = "true" ]; then
exit 0
fi
VERIFY_LOG=".claude/state/stop-verify.log"
mkdir -p .claude/state
# 2. No file changes → pure conversation → allow stop
HAS_CHANGES=$(git diff --name-only 2>/dev/null | head -5)
HAS_UNTRACKED=$(git ls-files --others --exclude-standard 2>/dev/null \
| grep -v '.claude/state/' | head -5)
if [ -z "$HAS_CHANGES" ] && [ -z "$HAS_UNTRACKED" ]; then
exit 0
fi
# 3. Files changed → require VERIFIED log entry from last 5 min
if [ -f "$VERIFY_LOG" ]; then
FIVE_MIN_AGO=$(date -v-5M +%s 2>/dev/null \
|| date -d '5 minutes ago' +%s 2>/dev/null \
|| echo 0)
LAST_VERIFY=$(grep '|VERIFIED' "$VERIFY_LOG" 2>/dev/null \
| tail -1 | cut -d'|' -f1)
LAST_ACTION=$(grep '|VERIFY_ACTION' "$VERIFY_LOG" 2>/dev/null \
| tail -1 | cut -d'|' -f1)
if [ -n "$LAST_VERIFY" ] && [ "$LAST_VERIFY" -gt "$FIVE_MIN_AGO" ] 2>/dev/null; then
if [ -n "$LAST_ACTION" ] && [ "$LAST_ACTION" -gt "$FIVE_MIN_AGO" ] 2>/dev/null; then
echo "$(date +%s)|STOP_ALLOWED" >> "$VERIFY_LOG"
exit 0
fi
fi
fi
# 4. Block stop, tell model exactly what to log
echo "$(date +%s)|STOP_BLOCKED" >> "$VERIFY_LOG"
echo "⛔ BLOCKED: files changed but no verification logged in last 5 min." >&2
echo "Required: log a VERIFY_ACTION + VERIFIED entry, e.g.:" >&2
echo ' echo "$(date +%s)|VERIFY_ACTION|npm test passed" >> .claude/state/stop-verify.log' >&2
echo ' echo "$(date +%s)|VERIFIED" >> .claude/state/stop-verify.log' >&2
exit 2
Wire it in .claude/settings.json
{
"hooks": {
"Stop": [{
"matcher": "*",
"hooks": [
{ "type": "command", "command": "bash .claude/hooks/verify-before-stop.sh" }
]
}]
}
}
Restart your Claude Code session. From now on, when the model tries to end with file changes but no verification log, it gets blocked with explicit instructions.
How verification actually works in a session
When the model wants to end, it must first log proof. Two lines:
# After running the test command
echo "$(date +%s)|VERIFY_ACTION|npm test all green, 47/47" >> .claude/state/stop-verify.log
echo "$(date +%s)|VERIFIED" >> .claude/state/stop-verify.log
The hook checks for both entries within the last 5 minutes. If they exist, the stop proceeds. If they don't, the model sees the stderr message and either:
- Option A: actually runs the verification, logs proof, ends cleanly
- Option B: admits in the next turn that it couldn't verify (rare DB setup, missing dependencies, etc.) — at which point you, the human, know to be careful before merging
Gotchas I hit so you don't have to
The 8-block cap
Claude Code v2.1.143+ added a built-in safeguard: after roughly 8 consecutive Stop-hook blocks, the turn ends with a warning regardless. This is intentional — it prevents broken hooks from infinite-looping the model. Override via:
export CLAUDE_CODE_STOP_HOOK_BLOCK_CAP=20
But the better fix is to design your hook to give the model exactly what to do next on the first block. The whole point of exit 2 is the stderr message — make it actionable, not just blame.
The stop_hook_active loop guard
If your hook doesn't check stop_hook_active first, it fires on every continuation attempt, the cap kicks in, your guard becomes a no-op. Always:
STOP_HOOK_ACTIVE=$(echo "$INPUT" | python3 -c \
"import sys,json; d=json.load(sys.stdin); print(d.get('stop_hook_active', False))")
if [ "$STOP_HOOK_ACTIVE" = "True" ]; then exit 0; fi
Exit 2 vs exit 1
Claude Code treats exit 2 as a structured block with stderr passed to the model. exit 1 is treated as a generic error and may not surface the message. Always use exit 2 for blocks.
.claude/state/ self-trigger
If your hook writes to a state file that's tracked by git (or in untracked-but-not-gitignored), the next Stop sees the file change and self-blocks. Fix: git diff | grep -v '.claude/state/' in your change check, and add .claude/state/ to .gitignore.
Four detection patterns for lies of completion
The hook above is pattern 1 of 4 I use. The full set:
Pattern 1: Block Stop on missing VERIFIED log
What this article covers. Catches ~80% of cases.
Pattern 2: Tail test invocations via PreToolUse(Bash)
A separate hook on PreToolUse(Bash) logs every actual test command invocation:
CMD=$(echo "$INPUT" | python3 -c \
"import sys,json; d=json.load(sys.stdin); print(d.get('tool_input',{}).get('command',''))")
if echo "$CMD" | grep -qE '(npm test|pytest|go test|cargo test|jest|mocha)'; then
echo "$(date +%s)|TEST_INVOKED|$CMD" >> .claude/state/test-invocations.log
fi
exit 0
Then in verify-before-stop, also require a TEST_INVOKED entry, not just VERIFIED. Catches cases where the model fabricates a VERIFIED log without actually running tests.
Pattern 3: Diff assertion vs reality (manual)
Run tail -f .claude/state/test-invocations.log in a second terminal while Claude works. When it claims "X passed", you immediately see whether TEST_INVOKED ever fired for X. Eye-opening — most "passed" claims correlate with zero invocations.
Pattern 4: Externalize verification
Don't trust the model's claim or its log. Verify against an external artifact:
- CI build status via GitHub Actions API
- Production canary metrics via Vercel Analytics or Grafana
- DB row counts via
psql -c 'SELECT COUNT(*)'
The hook can curl these and exit 2 if the artifact doesn't match the claim.
The other 5 hooks I run
Verify-before-stop is the gold-tier one but it works best alongside:
- cost-tracker.sh — logs every session's spend to
costs.jsonlso you cantail -fyour Opus burn live - block-secrets.sh — PreToolUse scanner for
sk-ant-, JWT, AWS keys, GitHub PATs before commit - force-progress-update.sh — every 5 actions, checkpoints progress.json to survive compaction
- pre-compact-diary.sh — dumps WIP state to a diary file before Claude compacts the conversation
- enforce-autoplan.sh — blocks Write/Edit/Bash until a plan file exists
The free MIT version of verify-before-stop is at github.com/ianymu/claude-verify-before-stop.
The full 6-hook pack with pre-tuned settings.json and one-command installer is at landing-ianymu.vercel.app — $19 lightning, $49 regular, 30-day money-back.
Use the free one for a week. If it catches even one regression, the full pack is worth it. If it doesn't — the free one alone solves 80% of lies-of-completion.
Related resources
- Claude Code Hooks Cheat Sheet — Stop/PreToolUse/PostToolUse contract + gotchas
- 4 detection patterns discussion — community thread on the repo
- awesome-claude-code-hooks — curated list of related hooks
- Hook Generator tool — customize the script for your stack
- Settings.json Checker — audit your config for missing guards
Questions or feedback? Email me at ian.y.mu@gmail.com or open an issue on the repo.