AI Models & Tooling | 4 min read

Claude Sonnet 5: Near-Opus Performance, But a Tokenizer That Quietly Taxes the Savings

Anthropic's new Sonnet closes most of the gap to Opus 4.8 at a lower headline price, but a new tokenizer and the loss of sampling controls change the math for anyone running it in production agents.

By TRAGenX Desk

Anthropic shipped Claude Sonnet 5 on June 30, calling it the most agentic Sonnet yet — able to plan, reach for tools like browsers and terminals, and work autonomously at a level that, until recently, only the pricier Opus tier could manage. The headline claim: performance close to Opus 4.8, at lower prices. For anyone building agentic dev tooling or trading-adjacent automation on Claude, the interesting details are in the fine print, not the announcement post — which is exactly where developer Simon Willison went looking.

What actually changed

Sonnet 5 keeps a 1 million token context window with a 128,000-token output ceiling, matching the rest of the current Claude line. It's now the default model on Free and Pro plans, and available to Max, Team, and Enterprise users. Base pricing lands at $3 per million input tokens and $15 per million output tokens — identical to Sonnet 4.6 — with an introductory discount to $2/$10 through August 31, which Anthropic says is meant to make the transition roughly cost-neutral for existing workloads.

The catch: a hungrier tokenizer

That cost-neutral framing assumes token counts stay flat, and they don't. Sonnet 5 ships with a new tokenizer, and in Willison's own testing it produces noticeably more tokens for the same text: roughly 1.4x for English, 1.33x for Spanish, and 1.28x for Python code, with Simplified Mandarin essentially unchanged. For builders running high-volume agent loops (code review bots, research pipelines, anything chewing through English-language prose or Python), that's a real offset against the introductory discount, and it becomes a net increase once the price returns to the $3/$15 base rate. It's worth re-benchmarking before assuming straightforward savings on a Sonnet 4.6 to Sonnet 5 migration.
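To make that offset concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the prices and multipliers quoted above; the function names are illustrative, and it assumes the multiplier applies equally to input and output tokens, which the testing cited here does not establish.

```python
# Figures quoted in this article. The multipliers come from one developer's
# testing, so treat them as estimates rather than guarantees.
BASE_PRICE = {"input": 3.00, "output": 15.00}    # USD per million tokens
INTRO_PRICE = {"input": 2.00, "output": 10.00}   # USD per million tokens
TOKEN_MULTIPLIER = {"english": 1.40, "spanish": 1.33, "python": 1.28}


def blended_multiplier(mix):
    """Weighted token multiplier for a content mix whose shares sum to 1."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("content shares must sum to 1")
    return sum(share * TOKEN_MULTIPLIER[kind] for kind, share in mix.items())


def relative_cost(mix, new_price, old_price):
    """Cost of the same text on the new model relative to the old.

    1.0 means unchanged. Assumes (hypothetically) that token counts scale
    by the same multiplier wherever the price applies.
    """
    return blended_multiplier(mix) * new_price / old_price


if __name__ == "__main__":
    mix = {"english": 0.6, "python": 0.4}  # hypothetical workload mix
    at_intro = relative_cost(mix, INTRO_PRICE["input"], BASE_PRICE["input"])
    at_base = relative_cost(mix, BASE_PRICE["input"], BASE_PRICE["input"])
    print(f"intro pricing: {at_intro:.2f}x, base pricing: {at_base:.2f}x")
```

For pure English text this works out to about 0.93x the Sonnet 4.6 cost during the discount window and 1.4x afterward, which is why the content mix matters more than the headline rate.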

No more sampling knobs

A quieter but more consequential change for engineering teams: Sonnet 5 no longer accepts temperature, top_p, or top_k, and ships with adaptive thinking on by default. If your pipeline tunes sampling for reproducibility — backtests, eval harnesses, anything where you want the model's variance bounded and predictable — that lever is gone. Adaptive thinking should help raw task performance, but it also means less direct control over how the model arrives at an answer, which matters more in regulated or audit-sensitive workflows than in a chat UI.
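For teams migrating an existing pipeline, one low-risk first step is to strip those parameters from request settings before they reach the model, and to log what was dropped so nobody assumes the old tuning still applies. The sketch below is a plain dictionary filter, not part of any SDK; the helper name and the example settings are hypothetical.

```python
# Sampling parameters this article reports Sonnet 5 no longer accepts.
UNSUPPORTED_SAMPLING_KEYS = ("temperature", "top_p", "top_k")


def strip_sampling_params(params):
    """Split request settings into (cleaned, removed) without mutating the input.

    `params` is whatever keyword dictionary your own code builds for a
    model call; this helper makes no assumptions about the client library.
    """
    cleaned = {k: v for k, v in params.items() if k not in UNSUPPORTED_SAMPLING_KEYS}
    removed = {k: params[k] for k in UNSUPPORTED_SAMPLING_KEYS if k in params}
    return cleaned, removed


if __name__ == "__main__":
    # Hypothetical settings carried over from an older pipeline.
    legacy = {"max_tokens": 1024, "temperature": 0.0, "top_p": 0.9}
    cleaned, removed = strip_sampling_params(legacy)
    print("sending:", cleaned)
    print("dropped:", removed)
```

Surfacing the dropped values matters for eval harnesses in particular: a run that used to pin temperature to zero is no longer comparable to one that cannot.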

A tiered safety story worth noting

Sonnet 5's system card is also where Anthropic explains how the model shipped without the deployment friction larger releases sometimes face: it states that "Sonnet 5 is significantly less capable at cyber tasks than Mythos 5," so its safeguards mirror those applied to Opus 4.7 and Opus 4.8 — models more capable than Sonnet 5, but well short of Mythos 5. It's a reminder that Anthropic is now running capability-tiered safety controls across its lineup, not a single bar for every model — relevant context for anyone evaluating which Claude tier is appropriate for an autonomous, tool-using deployment rather than a sandboxed chat assistant.

What it means for builders

If you're running agentic coding tools, research pipelines, or LLM-in-the-loop systems on Claude, the practical move is to benchmark your actual workload — token count and latency, not just the headline price — before assuming Sonnet 5 is a clean cost win over Sonnet 4.6. The capability jump is real; the savings depend on what you're feeding it.
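A minimal harness for that kind of check might look like the following. It is a sketch only: `count_old` and `count_new` are placeholders for whatever token-counting method you have for each model, and the whitespace-based counters in the example are stand-ins, not real tokenizers.

```python
import time
from statistics import mean


def compare_token_counts(samples, count_old, count_new):
    """Summarize new-to-old token ratios across representative samples."""
    ratios = []
    for text in samples:
        old = count_old(text)
        if old == 0:
            continue  # skip empty samples rather than divide by zero
        ratios.append(count_new(text) / old)
    if not ratios:
        raise ValueError("no non-empty samples to compare")
    return {"mean": mean(ratios), "min": min(ratios), "max": max(ratios)}


def best_latency(fn, *args, repeats=3):
    """Best wall-clock time in seconds over `repeats` calls to `fn`."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    # Stand-in counters for illustration only; swap in real token counts.
    def fake_old(text):
        return len(text.split())

    def fake_new(text):
        return 2 * len(text.split())

    samples = ["review this pull request", "summarize the quarterly filing"]
    print(compare_token_counts(samples, fake_old, fake_new))
    print(f"{best_latency(fake_new, samples[0]):.6f}s")
```

Run it over prompts and outputs sampled from production rather than synthetic text, since the reported multipliers vary by language and content type.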

Frequently asked questions

Is Claude Sonnet 5 cheaper than Opus 4.8?
Yes, on a per-token basis. Sonnet 5 is priced at $3/$15 per million input/output tokens (discounted to $2/$10 through August 31, 2026), well below Opus 4.8's rates, while Anthropic says its performance is close to Opus 4.8's.
Will Sonnet 5 actually cost less to run than Sonnet 4.6?
Not necessarily. The base price per token is unchanged from Sonnet 4.6, and a new tokenizer generates more tokens for the same input — about 1.4x for English and 1.28x for Python in independent testing — so effective cost depends heavily on your language and content mix.
Can I still control Claude Sonnet 5's output with temperature or top_p?
No. Sonnet 5 drops support for temperature, top_p, and top_k sampling parameters and runs with adaptive thinking enabled by default, removing a control teams previously used for reproducible or tightly-tuned generation.
