Published 2026-05-20 · 7 min read · By Ian Mu
Why does Claude Code burn through Opus credits so fast? — and how to cut 40% with one hook
My Claude Code spend was averaging $30-100/day on Opus across 14 parallel projects. Most of it was invisible until the invoice arrived. Then I built a cost-tracker.sh Stop hook that surfaces token spend in real time to .claude/state/costs.jsonl. Combined with workflow patterns it caught, my Opus spend dropped roughly 40% in the first month. This article walks through why Claude Code is expensive, the three patterns that drive most overspend, and the exact hook that surfaces them.
TL;DR
Add a Stop hook that writes {session_id, model, input_tokens, output_tokens, est_cost_usd} to .claude/state/costs.jsonl on every turn. Then tail -f .claude/state/costs.jsonl | jq . in a side terminal while you work. You'll see exactly when a session crosses your budget threshold and you can intervene before it crosses again.
Why is Claude Code so expensive?
Three reasons compound:
1. Opus pricing. Claude Opus 4.7 is priced at $15/million input tokens and $75/million output tokens. A 30,000-token system prompt + project context + transcript read once costs $0.45 in input alone. Multiply by the dozens of turns in a session.
2. Agentic loops re-read everything. Unlike a single-shot chat completion, an agentic coding session reads files, calls tools, gets results, reads more files, decides what to do next. Each tool round-trip can grow context by 5-30% (the assistant message + tool result get appended to the next turn's input).
3. Compaction loses context, model re-reads. When the conversation grows beyond model context limits, Claude Code summarizes (compacts) and starts fresh. But the model now needs to re-read files it already read. Long sessions can double or triple effective token cost via compaction-driven re-reads.
The three patterns that drive most overspend
Pattern 1: Compaction-driven re-reads
Symptom: a 4-hour session uses 3x the tokens of a 1-hour session that did the same work split into 4 sittings. Cause: each compaction event forces re-reads.
Mitigation: explicit checkpoint hook (force-progress-update.sh) that writes a progress.json file every 5 actions. After compaction, the model can read the small progress file instead of re-reading source code.
Pattern 2: Deep exploration when grep would suffice
Symptom: model reads 20 files when it only needed to find 1 line in 1 file. Cause: it's exploring vs. searching.
Mitigation: a custom system prompt or PreToolUse(Read) hook that suggests grep-first. We didn't fully solve this with a hook — the cleanest fix is a CLAUDE.md instruction: "Before reading any file, grep for the relevant symbol/string first."
Pattern 3: Opus for routine edits
Symptom: model uses Opus for mechanical work (rename a variable, add a typehint, fix a typo). Cost: $0.20 per turn vs $0.04 on Sonnet.
Mitigation: switch model mid-session. Use /model sonnet when leaving planning phase. Or have the cost-tracker emit a warning when an Opus turn cost > some threshold for low-complexity work.
The cost-tracker.sh hook
This is a Stop hook (fires when Claude finishes a turn). It reads the transcript file Claude Code passes via stdin, extracts the model + token counts, computes estimated cost, and appends to a JSONL log.
#!/bin/bash
# cost-tracker.sh
# Logs every Claude Code session's spend to .claude/state/costs.jsonl
INPUT=$(cat)
COSTS_FILE=".claude/state/costs.jsonl"
mkdir -p .claude/state
python3 - "$INPUT" <<'PY'
import json, sys, os
from datetime import datetime, timezone, timedelta
CST = timezone(timedelta(hours=-4))
now = datetime.now(CST).isoformat()
raw = sys.argv[1].strip() if len(sys.argv) > 1 else ''
if not raw: sys.exit(0)
try:
data = json.loads(raw)
except: sys.exit(0)
session_id = data.get('session_id', 'unknown')
transcript = data.get('transcript_path', '')
# Estimate cost based on Anthropic published pricing
PRICING = {
'opus': {'in': 15.00, 'out': 75.00}, # per 1M tokens
'sonnet': {'in': 3.00, 'out': 15.00},
'haiku': {'in': 0.80, 'out': 4.00},
}
# Walk the transcript file (if accessible) to sum tokens
input_tokens = output_tokens = 0
model_guess = 'opus'
try:
with open(transcript) as f:
for line in f:
try:
msg = json.loads(line)
u = msg.get('usage', {})
input_tokens += u.get('input_tokens', 0)
output_tokens += u.get('output_tokens', 0)
m = msg.get('model', '')
if 'opus' in m.lower(): model_guess = 'opus'
elif 'sonnet' in m.lower(): model_guess = 'sonnet'
elif 'haiku' in m.lower(): model_guess = 'haiku'
except: continue
except: pass
price = PRICING.get(model_guess, PRICING['opus'])
est_cost = (input_tokens * price['in'] + output_tokens * price['out']) / 1_000_000
entry = {
'ts': now,
'session_id': session_id,
'model': model_guess,
'input_tokens': input_tokens,
'output_tokens': output_tokens,
'est_cost_usd': round(est_cost, 4),
}
with open('.claude/state/costs.jsonl', 'a') as f:
f.write(json.dumps(entry, ensure_ascii=False) + '\n')
PY
exit 0
Wire into .claude/settings.json:
{
"hooks": {
"Stop": [{
"matcher": "*",
"hooks": [
{ "type": "command", "command": "bash .claude/hooks/cost-tracker.sh" }
]
}]
}
}
How to watch your burn in real time
In a second terminal:
tail -f .claude/state/costs.jsonl | jq .
Output:
{
"ts": "2026-05-20T15:43:21-04:00",
"session_id": "abc123",
"model": "opus",
"input_tokens": 142000,
"output_tokens": 8200,
"est_cost_usd": 2.745
}
You'll see costs scroll past as Claude works. When a turn crosses your threshold (e.g., >$1 for a single turn), you intervene.
Weekly budget reconciliation
Quick one-liner to see today's spend by session:
jq -s 'group_by(.session_id)[] | {session: .[0].session_id, total: map(.est_cost_usd) | add}' \
.claude/state/costs.jsonl
Or this week's total:
jq -s 'map(.est_cost_usd) | add' .claude/state/costs.jsonl
Most weeks I'd see one or two sessions accounting for >40% of total spend. That's where the workflow intervention focused — adjusting the prompts that started those runaway sessions.
Other hooks that compound the savings
cost-tracker.sh tells you what's happening. These others prevent the patterns it surfaces:
- verify-before-stop.sh — eliminates retry sessions from lies-of-completion (these are pure waste tokens)
- force-progress-update.sh — checkpoint every 5 actions, survives compaction without re-reads
- pre-compact-diary.sh — preserves WIP state before compaction so the next session starts informed
- enforce-autoplan.sh — blocks code-write until a plan exists, eliminating "let me try this and see" loops
- block-secrets.sh — saves you a separate-incident cost (a leaked API key can cost more than a month of Opus)
The free verify-before-stop.sh is on GitHub (MIT). The full 6-hook pack with install script is at landing-ianymu.vercel.app ($19 lightning / $49 regular, 30-day money-back).
Related
- How to fix Claude Code's "all tests passing" lies — the verify-before-stop article
- Claude Code Hooks Cheat Sheet
- awesome-claude-code-hooks — curated list of hooks
Questions? Email ian.y.mu@gmail.com. Open issues / PRs welcome.