Claude Sonnet 5: Near-Opus Performance, But a Tokenizer That Quietly Taxes the Savings
Anthropic's new Sonnet closes most of the gap to Opus 4.8 at a lower headline price, but a new tokenizer and the loss of sampling controls change the math for anyone running it in production agents.
By TRAGenX Desk
Anthropic shipped Claude Sonnet 5 on June 30, calling it the most agentic Sonnet yet — able to plan, reach for tools like browsers and terminals, and work autonomously at a level that, until recently, only the pricier Opus tier could manage. The headline claim: performance close to Opus 4.8, at lower prices. For anyone building agentic dev tooling or trading-adjacent automation on Claude, the interesting details are in the fine print, not the announcement post — which is exactly where developer Simon Willison went looking.
What actually changed
Sonnet 5 keeps a 1-million-token context window with a 128,000-token output ceiling, matching the rest of the current Claude line. It's now the default model on Free and Pro plans, and available to Max, Team, and Enterprise users. Base pricing lands at $3 per million input tokens and $15 per million output tokens, identical to Sonnet 4.6, with an introductory discount to $2/$10 through August 31, which Anthropic says is meant to make the transition roughly cost-neutral for existing workloads.
The catch: a hungrier tokenizer
That cost-neutral framing assumes token counts stay flat, and they don't. Sonnet 5 ships with a new tokenizer, and in Willison's own testing it produces noticeably more tokens for the same text: roughly 1.4x for English, 1.33x for Spanish, and 1.28x for Python code, with Simplified Mandarin essentially unchanged. For builders running high-volume agent loops (code review bots, research pipelines, anything chewing through English-language prose or Python), that's a real offset against the discounted per-token price. Because the base price matches Sonnet 4.6, the same English, Spanish, or Python text will cost more than it did before once the discount expires. It's worth re-benchmarking before assuming straightforward savings on a Sonnet 4.6-to-Sonnet 5 migration.
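To put rough numbers on that offset, here is a back-of-the-envelope sketch in Python. It uses only the input prices and tokenizer multipliers quoted in this article. The function name is ours, and treating Willison's multipliers as a uniform scaling factor on your own text is an assumption, not something Anthropic or Willison has confirmed for any particular workload.

```python
# Input prices in USD per million tokens, as quoted in this article.
SONNET_46_INPUT = 3.00
SONNET_5_INPUT_BASE = 3.00
SONNET_5_INPUT_INTRO = 2.00

# Approximate token-count multipliers from Willison's testing.
TOKEN_MULTIPLIER = {"english": 1.40, "spanish": 1.33, "python": 1.28}


def effective_price(price_per_mtok: float, multiplier: float) -> float:
    """Price of the text that used to fit in one million Sonnet 4.6 tokens."""
    return price_per_mtok * multiplier


for kind, mult in TOKEN_MULTIPLIER.items():
    intro = effective_price(SONNET_5_INPUT_INTRO, mult)
    base = effective_price(SONNET_5_INPUT_BASE, mult)
    print(f"{kind}: intro ${intro:.2f}, base ${base:.2f}, "
          f"Sonnet 4.6 ${SONNET_46_INPUT:.2f}")
```

On these figures, a million old-tokenizer tokens of English input works out to about $2.80 at the introductory price, slightly under the $3.00 it cost on Sonnet 4.6, and about $4.20 once the discount ends.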
No more sampling knobs
A quieter but more consequential change for engineering teams: Sonnet 5 no longer accepts temperature, top_p, or top_k, and ships with adaptive thinking on by default. If your pipeline tunes sampling for reproducibility — backtests, eval harnesses, anything where you want the model's variance bounded and predictable — that lever is gone. Adaptive thinking should help raw task performance, but it also means less direct control over how the model arrives at an answer, which matters more in regulated or audit-sensitive workflows than in a chat UI.
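For teams with existing request-building code, one cautious migration step is to strip those parameters before a request goes out and log what was dropped. The sketch below works on a plain dictionary and calls no SDK. The request shape and the model name are hypothetical placeholders, and this article does not say whether the API rejects or silently ignores the removed parameters, so treat the helper as a precaution rather than a requirement.

```python
# Sampling parameters this article says Sonnet 5 no longer accepts.
UNSUPPORTED_SAMPLING_KEYS = ("temperature", "top_p", "top_k")


def strip_sampling_params(request: dict) -> tuple[dict, list[str]]:
    """Return a copy of `request` without sampling keys, plus the keys removed."""
    removed = [key for key in UNSUPPORTED_SAMPLING_KEYS if key in request]
    cleaned = {
        key: value
        for key, value in request.items()
        if key not in UNSUPPORTED_SAMPLING_KEYS
    }
    return cleaned, removed


# Hypothetical request; the model name is a placeholder, not a confirmed ID.
request = {
    "model": "sonnet-5-placeholder",
    "max_tokens": 1024,
    "temperature": 0.0,
    "top_p": 0.9,
}
cleaned, removed = strip_sampling_params(request)
print("sending:", cleaned)
print("dropped:", removed)
```

Returning the list of dropped keys makes it easy to flag pipelines that relied on a pinned temperature, which are the ones most likely to need a different reproducibility strategy.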
A tiered safety story worth noting
Sonnet 5's system card is also where Anthropic explains how the model shipped without the deployment friction larger releases sometimes face: it states that "Sonnet 5 is significantly less capable at cyber tasks than Mythos 5," so its safeguards mirror those applied to Opus 4.7 and Opus 4.8 — models more capable than Sonnet 5, but well short of Mythos 5. It's a reminder that Anthropic is now running capability-tiered safety controls across its lineup, not a single bar for every model — relevant context for anyone evaluating which Claude tier is appropriate for an autonomous, tool-using deployment rather than a sandboxed chat assistant.
What it means for builders
If you're running agentic coding tools, research pipelines, or LLM-in-the-loop systems on Claude, the practical move is to benchmark your actual workload — token count and latency, not just the headline price — before assuming Sonnet 5 is a clean cost win over Sonnet 4.6. The capability jump is real; the savings depend on what you're feeding it.
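A workload-level comparison can be as simple as replaying the same job on both models, recording the token counts each one reports, and pricing them out. The sketch below assumes you already have those counts from your own logs. The `Usage` class, the function name, and the sample numbers are all made up for illustration, and latency is left out.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts measured for one run of the same workload."""
    input_tokens: int
    output_tokens: int


def run_cost(usage: Usage, input_price: float, output_price: float) -> float:
    """Cost in USD, given prices in USD per million tokens."""
    total = usage.input_tokens * input_price + usage.output_tokens * output_price
    return total / 1_000_000


# Made-up counts for illustration only; substitute numbers from your own logs.
old = Usage(input_tokens=2_000_000, output_tokens=200_000)  # Sonnet 4.6 run
new = Usage(input_tokens=2_700_000, output_tokens=260_000)  # Sonnet 5 run

old_cost = run_cost(old, 3.00, 15.00)        # Sonnet 4.6 price
new_cost_intro = run_cost(new, 2.00, 10.00)  # Sonnet 5 introductory price
new_cost_base = run_cost(new, 3.00, 15.00)   # Sonnet 5 base price

print(f"Sonnet 4.6:       ${old_cost:.2f}")
print(f"Sonnet 5 (intro): ${new_cost_intro:.2f}")
print(f"Sonnet 5 (base):  ${new_cost_base:.2f}")
```

Pricing the same measured run at both the introductory and base rates shows whether a workload that looks cheaper today stays cheaper after August 31.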
Frequently asked questions
- Is Claude Sonnet 5 cheaper than Opus 4.8?
  Yes on a per-token basis. Sonnet 5 is priced at $3/$15 per million input/output tokens (discounted to $2/$10 through August 31, 2026), well below Opus 4.8's rates, while Anthropic says its performance is close to Opus 4.8's.
- Will Sonnet 5 actually cost less to run than Sonnet 4.6?
  Not necessarily. The base price per token is unchanged from Sonnet 4.6, and a new tokenizer generates more tokens for the same input (about 1.4x for English and 1.28x for Python in independent testing), so effective cost depends heavily on your language and content mix.
- Can I still control Claude Sonnet 5's output with temperature or top_p?
  No. Sonnet 5 drops support for temperature, top_p, and top_k sampling parameters and runs with adaptive thinking enabled by default, removing a control teams previously used for reproducible or tightly-tuned generation.
Sources
- What's new in Claude Sonnet 5 — Simon Willison
- Introducing Claude Sonnet 5 — Anthropic
- Anthropic launches Claude Sonnet 5 as a cheaper way to run agents — TechCrunch