Stop Hitting Limits

10 Things to Stop Rate Limits

I stopped hitting Claude's usage limits. Here are 10 things I changed. Claude doesn't count messages - it counts tokens.

Most people blame Claude for strict limits. The real issue? Token management. All you need to do is use tokens wisely.

When Claude doesn't get your thoughts right, you might feel tempted to send:

"No, I meant [your message]"
"Ugh, that's not what I wanted [your message]"

Don't do that!

Every subsequent message is added to the conversation history. Claude re-reads ALL of it every turn - burning tokens on context that didn't even help.

Token cost per message = all previous messages + your new one.
Total = S × N(N+1) / 2 (S = avg tokens per exchange, N = message count)

At ~500 tokens per exchange:
• 5 messages: 7.5K tokens
• 10 messages: 27.5K tokens
• 20 messages: 105K tokens
• 30 messages: 232K tokens

Message 30 costs 31x more than message 1

Instead: click Edit on your original message → fix it → regenerate. The old exchange gets replaced, not stacked.

Fix the prompt, don't feed the history.
Token costs grow with every message.

Ideally, you should start a new chat every 15-20 messages.

Now imagine a chat with 100+ messages. At ~500 tokens per exchange, that's over 2.5 million tokens burned - most of it just re-reading old history.

One developer tracked his usage and found that 98.5% of tokens were spent on re-reading the history. Only 1.5% went toward actually outputting the result.

When a chat gets long → ask Claude to summarize everything → copy it → new chat → paste as first message.
Many people believe that splitting questions into separate messages leads to better results. Almost always, the opposite is true.

Three separate prompts = three context loads.
One prompt with three tasks = one context load.

You save tokens twice: fewer context reloads, and you stay further from hitting your limit.

Instead of:
"Summarize this article"
"Now list the main points"
"Now suggest a headline"

Write: "Summarize this article, list the main points, and suggest a headline."

Bonus: the answers often turn out better because Claude immediately sees the full picture.

Three questions. One Prompt. Always!
If you upload the same PDF to multiple chats, Claude re-tokenizes that document every single time.

Use the Projects feature instead.

Upload your file once → it gets cached. Every new conversation inside that project references it without burning tokens again.

Cached project content doesn't eat into your usage when you access it repeatedly.

If you work with contracts, briefs, style guides, or any long docs - this alone could cut your token spend dramatically.
Every new chat without saved context wastes 3-5 messages on setup: "I'm a marketer, I write in a casual style, I prefer short paragraphs…"

You've probably seen people start every prompt with "Act as a..." - that's tokens burned on repeat. Claude can remember this permanently.

Go to "Settings" → "Memory and User Settings." Save your role, communication style, and settings once. Claude will automatically apply them to every new chat.
Web search, connectors, and "Explore" mode - all of these add tokens to every response, even if you don't need them.

Writing your own content? Turn off the "Search and Tools" feature.

The "Advanced Thinking" feature also consumes tokens. Keep it turned off by default. Only turn it on if your first attempt was unsatisfactory.

Rule: if you didn't turn this feature on intentionally, turn it off.
Grammar checking, brainstorming, formatting, quick translations, short answers - Haiku handles all of this at a much lower cost than Sonnet or Opus.

Choosing the right model is the most important decision you make every day.

Haiku for drafts and simple tasks → frees up 50-70% of your budget for tasks that truly require powerful models.

Mental model:
• Haiku → quick tasks, low cost
• Sonnet → real work, medium cost
• Opus → deep thinking, high cost

You don't need powerful models for simple tasks!
The Claude system uses a rolling 5-hour window. It does not reset at midnight - your limit gradually decreases. Messages sent at 9 a.m. will no longer count by 2 p.m.

If you use up your entire limit during a single morning session, most of your daily limit will remain unused.

Divide your day into 2-3 sessions: morning, afternoon, and evening. By the time you return, your previous usage is no longer counted, and you have a new limit.
Starting March 26, 2026: Anthropic will now use up your 5-hour session limit more quickly during peak hours:

5:00 AM to 11:00 AM Pacific Time / 8:00 AM to 2:00 PM Eastern Time on weekdays.

Same query, same chat - but during peak hours, it impacts your limit more.

Your weekly limit remains the same. But how it's distributed has changed. Running resource-intensive tasks in the evening or on weekends will significantly stretch your plan.

If you're outside the U.S. (in Europe, Latin America, or Asia), peak hours may actually fall during your afternoon, so check the calculation based on time zones.
Subscribers to the Pro, Max 5x, and Max 20x plans can enable the "Overage" feature in the 'Settings' → "Usage" section.

When your session limit is reached, Claude won't block your access. It will switch to pay-as-you-go billing at API rates.

You set a monthly spending limit to avoid unexpected bills.

This isn't about saving tokens, it's about not losing your work at the worst possible moment.

Conclusion

At first, it will be very difficult to follow all the rules, but once you can apply them automatically, you'll almost never hit your limits.

You might even switch from the max plan to a regular one - you'll have plenty of tokens!

Claude doesn't count messages. It counts tokens.