Claude Code Optimization

From $750/month to 12% of quota in 3 days

Same workflow. Same Claude. Just the right settings, cache habits, and a few small swaps that most people never make.

Root Cause 1

Cache Misses

If your prompt cache breaks mid-session, you pay full price every turn instead of 0.1×.

What breaks the cache

×Adding or removing a tool in the middle of a session
×Switching models in the middle of a session

What to do

✓Decide your tools and model BEFORE you start the session
✓Never change them mid-session
✓A healthy cache hit rate is around 90%+

Root Cause 2

Context Bloat

The longer your session runs, the more Claude has to remember — and that costs tokens. Opus 4.7 defaults to a 1M context window which is overkill.

Settings to add

CLAUDE_CODE_DISABLE_1M_CONTEXT = 1
CLAUDE_AUTOCOMPACT_PCT_OVERRIDE = 80

5 habits to keep context clean

/compactat 50% full or after every task — don't wait

/clearbetween unrelated tasks — fresh session, fresh cache

/rewindif a turn went wrong — cheaper than working around bad context

Subagentsuse for bulk work — keeps your main session lean

@filenametag files directly so Claude doesn't search for them

Subagent model guide

Haiku

boring mechanical work

Sonnet

research, code exploration

Opus

planning, complex decisions only

Root Cause 3

Wrong Model or Effort Level

Most people use Opus-level reasoning on tasks that only need Haiku. Default reasoning uses ~2× more tokens than medium.

Effort levels to set per prompt

/effort lowquick fixes, simple tasks

/effort mediummost everyday prompts (big savings)

/effort highcomplex reasoning

/effort xhighdefault for agentic coding

/effort maxalmost never worth it

Model routing strategy

→Start a Sonnet session when work is simple — cheaper
→Start an Opus session when planning is involved — delegate the rest to Sonnet/Haiku
→If you keep hitting limits, route through OpenRouter — same Claude Code interface, ~1/12th the cost

Root Cause 4

Wrong Input Format

Some file types cost way more tokens than necessary by default.

Don't use

Screenshots for web pages

→

Use instead

agent-browser

~90% fewer tokens

Don't use

Claude's built-in PDF reader

→

Use instead

pdftotext

Avoids loading PDFs as images

Don't use

Re-reading whole repo each task

→

Use instead

code-review-graph

6.8× fewer tokens on reviews, up to 49× on daily tasks

Root Cause 5

Not Watching Your Usage

You can't fix what you can't see. Three tools to track it.

phuryn/claude-usageHistorical view, breakdown by day/week/session (Pro/Max/Team)

Gronsten/claude-usage-monitorReal-time, shows your current 5-hour window live

platform.claude.com/usage/cacheAnthropic's cache dashboard — API users only

The One-Line Summary

Lock your tools and model before you start. Compact early and often. Use cheaper models for simple tasks. Use the right input format. Track your usage. That's it.