Why does Claude Code say tests are passing when they aren't?

Claude Code can mark tasks complete based on an optimistic reading of partial output, without ever invoking the test runner. The fix is a Stop hook that, whenever files have changed, blocks the session from ending unless a VERIFIED log entry from the last 5 minutes exists. This forces the model to either run real verification or admit it didn't.

What is a Claude Code Stop hook?

A Stop hook is a shell command configured in .claude/settings.json under hooks.Stop that fires every time Claude Code tries to end a turn. If the hook exits with code 2 and writes to stderr, Claude Code shows that message to the model, and the model keeps working instead of stopping. That makes Stop hooks the strongest place in Claude Code to enforce a workflow guard: the turn can't end until the hook allows it (up to the block cap described below).

Where do I install verify-before-stop.sh?

Place verify-before-stop.sh in the .claude/hooks/ directory at your project root and chmod +x it. Then add an entry to .claude/settings.json under hooks.Stop with matcher '*' and command 'bash .claude/hooks/verify-before-stop.sh'. Restart your Claude Code session to activate it.

What is the CLAUDE_CODE_STOP_HOOK_BLOCK_CAP?

Claude Code v2.1.143+ ends the turn with a warning after approximately 8 consecutive Stop-hook blocks. Override via the CLAUDE_CODE_STOP_HOOK_BLOCK_CAP environment variable. This safeguard prevents runaway loops from broken hooks.

How do I avoid infinite loops in a Stop hook?

The Stop-hook JSON input includes a stop_hook_active boolean. Check it first and exit 0 immediately if true. Without this guard, your hook fires repeatedly on each continuation attempt and quickly hits the 8-block cap.

Published 2026-05-20 · 8 min read · By Ian Mu

How to fix Claude Code's "all tests passing" lies: a Stop hook in 50 lines of bash

If you've used Claude Code for more than a week, you've seen this pattern. The model writes some code, says "all tests passing ✅" or "build succeeded" or just "done!", and you merge. Twenty minutes later production is on fire, and the tests it claimed had passed were never actually run. This article walks through the exact .claude/hooks/ script I run in production on 14 parallel projects to catch this failure mode at session end, plus the 4 detection patterns that compose into a full guardrail.

TL;DR

Add a Stop hook to .claude/settings.json that blocks the session from ending if files have changed but no VERIFIED log entry exists from the last 5 minutes. Claude has to either prove it ran verification or admit it didn't. Full open-source script at github.com/ianymu/claude-verify-before-stop (MIT).

The pattern (why this happens)

Claude Code's underlying model is optimistic. When asked to implement a feature with tests, it often:

Writes the implementation
Writes the test file
Looks at the test code, mentally simulates it, and concludes "this should pass"
Reports "tests passing ✅" in the final message without ever invoking the test runner

The model isn't lying on purpose. It's reasoning forward from code structure instead of actual execution output. The fix isn't a better prompt — better prompts get optimistically interpreted too. The fix is a workflow guard that requires proof of execution before allowing the session to end.

How Claude Code Stop hooks work

Claude Code fires hooks at several lifecycle events (PreToolUse, PostToolUse, UserPromptSubmit, Stop, and others). The one with the most leverage for catching lies of completion is Stop, which fires when the model tries to end a turn.

A Stop hook receives JSON on stdin:

{
  "session_id": "abc123",
  "transcript_path": "/path/to/transcript.json",
  "stop_hook_active": false,
  "last_assistant_message": "All tests passing ✅"
}

The hook's exit code determines what Claude Code does next:

exit 0 → allow the stop, session ends normally
exit 1 (or any non-zero code other than 2) → non-blocking error; stderr goes to the user, not the model, and the stop goes through
exit 2 → block the stop and show stderr to the model; this is what we want

The 50-line hook

Save this as .claude/hooks/verify-before-stop.sh and chmod +x:

#!/bin/bash
# verify-before-stop.sh

INPUT=$(cat)

# 1. Loop guard — don't re-block if already continuing
STOP_HOOK_ACTIVE=$(echo "$INPUT" | python3 -c \
  "import sys,json; d=json.load(sys.stdin); print('true' if d.get('stop_hook_active') else 'false')" \
  2>/dev/null)
if [ "$STOP_HOOK_ACTIVE" = "true" ]; then
    exit 0
fi

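# Run from the project root so the relative .claude/ paths below resolve;
# Claude Code sets CLAUDE_PROJECT_DIR for hook commands
cd "${CLAUDE_PROJECT_DIR:-.}" || exit 0

VERIFY_LOG=".claude/state/stop-verify.log"
mkdir -p .claude/state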

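# 2. No file changes → pure conversation → allow stop
#    (diff against HEAD so staged edits count too; filter our own state dir
#    out of both lists so the hook's log never triggers a self-block)
HAS_CHANGES=$(git diff --name-only HEAD 2>/dev/null \
  | grep -v '\.claude/state/' | head -5)
HAS_UNTRACKED=$(git ls-files --others --exclude-standard 2>/dev/null \
  | grep -v '\.claude/state/' | head -5)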
if [ -z "$HAS_CHANGES" ] && [ -z "$HAS_UNTRACKED" ]; then
    exit 0
fi

# 3. Files changed → require VERIFY_ACTION + VERIFIED entries from last 5 min
if [ -f "$VERIFY_LOG" ]; then
    # Plain arithmetic: works with GNU and BSD date, no silent fallback to 0
    FIVE_MIN_AGO=$(( $(date +%s) - 300 ))
    LAST_VERIFY=$(grep '|VERIFIED' "$VERIFY_LOG" 2>/dev/null \
                  | tail -1 | cut -d'|' -f1)
    LAST_ACTION=$(grep '|VERIFY_ACTION' "$VERIFY_LOG" 2>/dev/null \
                  | tail -1 | cut -d'|' -f1)
    if [ -n "$LAST_VERIFY" ] && [ "$LAST_VERIFY" -gt "$FIVE_MIN_AGO" ] 2>/dev/null; then
        if [ -n "$LAST_ACTION" ] && [ "$LAST_ACTION" -gt "$FIVE_MIN_AGO" ] 2>/dev/null; then
            echo "$(date +%s)|STOP_ALLOWED" >> "$VERIFY_LOG"
            exit 0
        fi
    fi
fi

# 4. Block stop, tell model exactly what to log
echo "$(date +%s)|STOP_BLOCKED" >> "$VERIFY_LOG"
echo "⛔ BLOCKED: files changed but no verification logged in last 5 min." >&2
echo "Required: log a VERIFY_ACTION + VERIFIED entry, e.g.:" >&2
echo '   echo "$(date +%s)|VERIFY_ACTION|npm test passed" >> .claude/state/stop-verify.log' >&2
echo '   echo "$(date +%s)|VERIFIED" >> .claude/state/stop-verify.log' >&2
exit 2
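Before wiring the hook in, you can exercise it by hand: pipe a trimmed-down Stop payload on stdin and look at the exit code. A minimal sketch; `run_stop_hook` is a hypothetical helper name, and the payload carries only `session_id` and `stop_hook_active`, not everything Claude Code sends.

```bash
# Hypothetical helper: feed a fake Stop payload to a hook script and report
# its exit code, followed by whatever it wrote to stderr.
# Usage: run_stop_hook <hook-script> [true|false]
run_stop_hook() {
  local hook="$1" active="${2:-false}" err code
  # 2>&1 >/dev/null: capture the hook's stderr, discard its stdout
  err=$(printf '{"session_id":"smoke-test","stop_hook_active":%s}' "$active" \
        | bash "$hook" 2>&1 >/dev/null)
  code=$?   # exit status of the pipeline's last command, i.e. the hook
  echo "exit=$code"
  [ -n "$err" ] && printf '%s\n' "$err"
  return 0
}
```

From a project root with uncommitted edits and no recent log entries, `run_stop_hook .claude/hooks/verify-before-stop.sh false` should print `exit=2` followed by the BLOCKED message (and append a STOP_BLOCKED line to the log as a side effect). Passing `true` should print `exit=0`, because the loop guard returns early.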

Wire it in `.claude/settings.json`

{
  "hooks": {
    "Stop": [{
      "matcher": "*",
      "hooks": [
        { "type": "command", "command": "bash .claude/hooks/verify-before-stop.sh" }
      ]
    }]
  }
}

Restart your Claude Code session. From now on, when the model tries to end with file changes but no verification log, it gets blocked with explicit instructions.
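If .claude/settings.json already holds other hooks or settings, pasting the block above by hand can clobber them. Here is a sketch of a merge that appends the Stop entry only when it's missing. It assumes `python3` is on PATH (the hook already relies on it), and `add_stop_hook` is a hypothetical helper, not part of Claude Code.

```bash
# Hypothetical helper: add a Stop-hook command to a Claude Code settings file,
# creating the file if needed and leaving every other key untouched.
# Usage: add_stop_hook <settings.json> <command>
add_stop_hook() {
  python3 - "$1" "$2" <<'PY'
import json, os, sys

path, command = sys.argv[1], sys.argv[2]
settings = {}
if os.path.exists(path):
    with open(path) as f:
        settings = json.load(f)

# hooks.Stop is a list of {matcher, hooks: [...]} groups
stop_groups = settings.setdefault("hooks", {}).setdefault("Stop", [])
already = any(
    h.get("command") == command
    for group in stop_groups
    for h in group.get("hooks", [])
)
if not already:
    stop_groups.append({
        "matcher": "*",
        "hooks": [{"type": "command", "command": command}],
    })

os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
with open(path, "w") as f:
    json.dump(settings, f, indent=2)
    f.write("\n")
PY
}
```

Running `add_stop_hook .claude/settings.json "bash .claude/hooks/verify-before-stop.sh"` twice leaves a single Stop entry, so it's safe to call from an install script.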

How verification actually works in a session

When the model wants to end, it must first log proof. Two lines:

# After running the test command
echo "$(date +%s)|VERIFY_ACTION|npm test all green, 47/47" >> .claude/state/stop-verify.log
echo "$(date +%s)|VERIFIED" >> .claude/state/stop-verify.log

The hook checks for both entries within the last 5 minutes. If they exist, the stop proceeds. If they don't, the model sees the stderr message and either:

Option A: actually runs the verification, logs proof, ends cleanly
Option B: admits in the next turn that it couldn't verify (an unusual DB setup, missing dependencies, etc.), at which point you, the human, know to be careful before merging
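When the two echo lines are typed by hand, a VERIFIED entry can exist without a green run. One way to tighten this is sketched below as a hypothetical `verify_and_log` function (not part of the original script): it runs the test command itself and appends the two entries only if the command exits 0.

```bash
# Hypothetical helper: run a verification command and log proof only on success.
# Usage: verify_and_log npm test
verify_and_log() {
  local log=".claude/state/stop-verify.log" status
  mkdir -p "$(dirname "$log")"
  "$@"
  status=$?
  if [ "$status" -eq 0 ]; then
    echo "$(date +%s)|VERIFY_ACTION|$* exited 0" >> "$log"
    echo "$(date +%s)|VERIFIED" >> "$log"
  else
    # Not matched by the hook's '|VERIFIED' grep, so a failed run never unblocks
    echo "$(date +%s)|VERIFY_FAILED|$* exited $status" >> "$log"
  fi
  return "$status"
}
```

Sourced into the model's shell (or saved as a small script), `verify_and_log npm test` replaces the hand-typed lines. It still can't stop the model from echoing the entries directly; that gap is what pattern 2 below targets.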

Gotchas I hit so you don't have to

The 8-block cap

Claude Code v2.1.143+ added a built-in safeguard: after roughly 8 consecutive Stop-hook blocks, the turn ends with a warning regardless. This is intentional — it prevents broken hooks from infinite-looping the model. Override via:

export CLAUDE_CODE_STOP_HOOK_BLOCK_CAP=20

But the better fix is to design your hook so that the first block tells the model exactly what to do next. The whole point of exit 2 is the stderr message: make it actionable, not just a reprimand.

The `stop_hook_active` loop guard

If your hook doesn't check stop_hook_active first, it blocks again on every continuation attempt, the cap kicks in, and your guard becomes a no-op. Always:

STOP_HOOK_ACTIVE=$(echo "$INPUT" | python3 -c \
  "import sys,json; d=json.load(sys.stdin); print(d.get('stop_hook_active', False))")
if [ "$STOP_HOOK_ACTIVE" = "True" ]; then exit 0; fi
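The Python one-liner has one quiet failure mode. If `python3` is missing, the command substitution comes back empty, the guard never matches, and the hook blocks until it reaches the cap. Here is a dependency-free fallback, sketched with `grep -E`. It assumes the key and its literal `true` sit on the same line of the payload, which makes it less robust than a real JSON parser. `stop_hook_active` is a hypothetical function name.

```bash
# Dependency-free loop guard: returns 0 when the Stop payload contains
# "stop_hook_active": true (whitespace around the colon tolerated).
stop_hook_active() {
  printf '%s' "$1" | grep -Eq '"stop_hook_active"[[:space:]]*:[[:space:]]*true'
}

# In the hook:
#   INPUT=$(cat)
#   if stop_hook_active "$INPUT"; then exit 0; fi
```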

Exit 2 vs exit 1

Claude Code treats exit 2 as a blocking result and passes stderr to the model. exit 1 is a non-blocking error: the message is shown to the user rather than the model, and the stop proceeds anyway, so the guard does nothing. Always use exit 2 for blocks.

`.claude/state/` self-trigger

If your hook writes to a state file that's tracked by git (or untracked but not gitignored), the next Stop sees the file change and blocks itself. Fix: filter .claude/state/ out of the change check (pipe both the git diff and the git ls-files output through grep -v '.claude/state/'), and add .claude/state/ to .gitignore.
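The .gitignore half of that fix can be sketched as an idempotent helper, so re-running an install script doesn't stack duplicate lines. `ignore_state_dir` is a hypothetical name.

```bash
# Append .claude/state/ to .gitignore once; no-op if the exact line exists.
# Usage: ignore_state_dir [path/to/.gitignore]
ignore_state_dir() {
  local gi="${1:-.gitignore}"
  grep -qxF '.claude/state/' "$gi" 2>/dev/null && return 0
  # Start on a fresh line if the file doesn't end with a newline
  if [ -s "$gi" ] && [ -n "$(tail -c1 "$gi")" ]; then echo >> "$gi"; fi
  echo '.claude/state/' >> "$gi"
}
```

If the state files were already committed, .gitignore alone won't hide them from git diff; `git rm -r --cached .claude/state` untracks them while keeping the files on disk.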

Four detection patterns for lies of completion

The hook above is pattern 1 of 4 I use. The full set:

Pattern 1: Block Stop on missing VERIFIED log

What this article covers. Catches ~80% of cases.

Pattern 2: Tail test invocations via PreToolUse(Bash)

A separate hook on PreToolUse(Bash) logs every actual test command invocation:

#!/bin/bash
# PreToolUse hook (matcher: Bash): log every test command before it runs
INPUT=$(cat)
CMD=$(echo "$INPUT" | python3 -c \
  "import sys,json; d=json.load(sys.stdin); print(d.get('tool_input',{}).get('command',''))")
if echo "$CMD" | grep -qE '(npm test|pytest|go test|cargo test|jest|mocha)'; then
  mkdir -p .claude/state
  echo "$(date +%s)|TEST_INVOKED|$CMD" >> .claude/state/test-invocations.log
fi
exit 0

Then in verify-before-stop, also require a TEST_INVOKED entry, not just VERIFIED. Catches cases where the model fabricates a VERIFIED log without actually running tests.
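Here is a sketch of that extra check, written as a function you could call from step 3 of verify-before-stop.sh. It assumes the TEST_INVOKED log format from the PreToolUse snippet above and reuses the same 5-minute window. `recent_test_invoked` is a hypothetical name.

```bash
# Returns 0 if the test-invocation log has a TEST_INVOKED entry newer than
# $2 seconds ago (default 300, matching the hook's 5-minute window).
# Usage: recent_test_invoked [log-path] [window-seconds]
recent_test_invoked() {
  local log="${1:-.claude/state/test-invocations.log}" window="${2:-300}" last
  [ -f "$log" ] || return 1
  last=$(grep '|TEST_INVOKED|' "$log" | tail -1 | cut -d'|' -f1)
  [ -n "$last" ] && [ "$last" -gt $(( $(date +%s) - window )) ] 2>/dev/null
}

# In step 3, before allowing the stop:
#   if ! recent_test_invoked; then
#       echo "⛔ BLOCKED: VERIFIED logged but no test command ran in the last 5 min." >&2
#       exit 2
#   fi
```

Because the PreToolUse hook fires before the command executes, a TEST_INVOKED entry proves the tests were launched, not that they passed. It complements the VERIFIED entry rather than replacing it.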

Pattern 3: Diff assertion vs reality (manual)

Run tail -f .claude/state/test-invocations.log in a second terminal while Claude works. When it claims "X passed", you immediately see whether TEST_INVOKED ever fired for X. Eye-opening — most "passed" claims correlate with zero invocations.
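If you'd rather audit after the fact than watch a second terminal, the same comparison can be scripted. The `awk` sketch below assumes both logs use the `timestamp|TAG|...` layout shown above. For every VERIFIED entry with no TEST_INVOKED entry in the preceding window (default 300 seconds), it prints that entry's timestamp. `unbacked_verifications` is a hypothetical name.

```bash
# Print timestamps of VERIFIED entries that have no TEST_INVOKED entry
# within the preceding window (seconds, default 300).
# Usage: unbacked_verifications <verify-log> <test-invocations-log> [window]
unbacked_verifications() {
  local vlog="$1" tlog="$2" win="${3:-300}"
  [ -f "$tlog" ] || tlog=/dev/null   # no test log: every VERIFIED is unbacked
  awk -F'|' -v win="$win" '
    FILENAME == ARGV[1] {              # first file: collect test timestamps
      if ($2 == "TEST_INVOKED") tests[++n] = $1 + 0
      next
    }
    $2 == "VERIFIED" {
      t = $1 + 0; backed = 0
      for (i = 1; i <= n; i++)
        if (tests[i] <= t && tests[i] >= t - win) { backed = 1; break }
      if (!backed) print $1
    }' "$tlog" "$vlog"
}
```

`unbacked_verifications .claude/state/stop-verify.log .claude/state/test-invocations.log` prints nothing when every VERIFIED entry had a test run shortly before it.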

Pattern 4: Externalize verification

Don't trust the model's claim or its log. Verify against an external artifact:

CI build status via GitHub Actions API
Production canary metrics via Vercel Analytics or Grafana
DB row counts via psql -c 'SELECT COUNT(*)'

The hook can curl these and exit 2 if the artifact doesn't match the claim.
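Here is a sketch of the CI case, using the GitHub CLI instead of raw curl. It assumes `gh` is installed and authenticated, and that the claim worth checking is "the latest workflow run on the current branch concluded `success`". `ci_gate` and `latest_ci_conclusion` are hypothetical names. The decision is kept separate from the network call so it can be tested offline.

```bash
# Decide whether a CI conclusion backs up a "tests passing" claim.
# Returns 0 = allow, 2 = block (same contract as the Stop hook's exit code).
ci_gate() {
  local conclusion="$1"
  if [ "$conclusion" = "success" ]; then
    return 0
  fi
  echo "⛔ BLOCKED: latest CI run concluded '${conclusion:-unknown}', not 'success'." >&2
  return 2
}

# Conclusion of the most recent workflow run on the current branch.
# Empty while a run is still in progress, or if gh fails.
latest_ci_conclusion() {
  gh run list --branch "$(git rev-parse --abbrev-ref HEAD)" --limit 1 \
    --json conclusion --jq '.[0].conclusion // ""' 2>/dev/null
}

# In a Stop hook:
#   ci_gate "$(latest_ci_conclusion)" || exit 2
```

CI only sees what has been pushed. This gate therefore fits workflows where the model pushes before it stops; if the edits are still local and uncommitted, it checks a stale run.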

The other 5 hooks I run

Verify-before-stop is the gold-tier one, but it works best alongside:

cost-tracker.sh — logs every session's spend to costs.jsonl so you can tail -f your Opus burn live
block-secrets.sh — PreToolUse scanner for sk-ant-, JWT, AWS keys, GitHub PATs before commit
force-progress-update.sh — every 5 actions, checkpoints progress.json to survive compaction
pre-compact-diary.sh — dumps WIP state to a diary file before Claude compacts the conversation
enforce-autoplan.sh — blocks Write/Edit/Bash until a plan file exists

The free MIT version of verify-before-stop is at github.com/ianymu/claude-verify-before-stop.

The full 6-hook pack with pre-tuned settings.json and one-command installer is at landing-ianymu.vercel.app — $19 lightning, $49 regular, 30-day money-back.

Use the free one for a week. If it catches even one regression, the full pack is worth it. If it doesn't — the free one alone solves 80% of lies-of-completion.

Related resources

Claude Code Hooks Cheat Sheet — Stop/PreToolUse/PostToolUse contract + gotchas
4 detection patterns discussion — community thread on the repo
awesome-claude-code-hooks — curated list of related hooks
Hook Generator tool — customize the script for your stack
Settings.json Checker — audit your config for missing guards

Questions or feedback? Email me at ian.y.mu@gmail.com or open an issue on the repo.