Claude Opus 4.7: What Changed, and Should You Care?

Anthropic's new Claude Opus 4.7 retook the top of the LLM benchmarks. Here's what actually changed for coders and what it means for a UK budget.

Abstract visualisation of a neural network with glowing nodes

Updated 14 June 2026 How we review →

By Rob14 June 2026 · 7 min read

Anthropic released Claude Opus 4.7 in April 2026, which on the benchmark charts puts it back at the top of the "most powerful generally available LLM" rankings, narrowly ahead of OpenAI's and Google's competing tiers. That's worth maybe one paragraph of attention. The more useful question for someone actually using these tools is: what does this release change about my workflow, and is it worth paying for? The answers are smaller and more specific than the benchmark race suggests.

What actually changed in 4.7?

Three categories of improvement, in descending order of how much you'll notice.

Coding got measurably better. Per Anthropic's release notes, CursorBench (a coding benchmark scored on real-world tasks inside the Cursor IDE) moved from 58% on 4.6 to 70% on 4.7. Rakuten's internal SWE-Bench saw 4.7 resolve three times as many production tasks as 4.6. CodeRabbit's automated code-review benchmark logged a 10%+ recall improvement. These are the only numbers in the release that genuinely matter for daily use; the model is noticeably better at the kinds of long, multi-file tasks that an AI coding assistant gets thrown.

Vision got bigger. 4.7 accepts images up to roughly 3.75 megapixels (the long edge can be 2,576 pixels), more than three times the previous limit. That's the difference between "I can paste a screenshot" and "I can paste a full-resolution Figma export". Useful if you regularly hand the model design comps; irrelevant if you don't.

Finance-agent and reasoning got the rest. 4.7 hits state-of-the-art on Anthropic's Finance Agent eval and improves on a handful of mixed reasoning benchmarks. The practical impact for non-specialists is small; it tells you the model is broadly stronger, but it won't change a workflow that wasn't doing financial analysis to begin with.

What does the pricing work out to in real terms?

The headline is that 4.7 is no more expensive than 4.6: $5 per million input tokens, $25 per million output tokens. In sterling and at typical coding-session sizes, that translates to numbers that surprise people in both directions.

A casual chat session (a few prompts, a few hundred words of replies) costs pennies. A typical Claude Code, Anthropic's terminal-based agent coding assistant for software engineering, session that loads your CLAUDE.md, reads ten files, writes two, runs a few tests, and produces a 200-line edit will spend maybe 50,000 input tokens and 5,000 output tokens. That's about 30p per session at current exchange rates. Run ten sessions a day and you're at £3 a day, or roughly £60 a month.

The Claude Pro subscription (£15-ish a month at UK pricing) bundles a generous quota that covers far more than that. For a hobbyist, the subscription wins on cost as long as you're not running batch jobs through the API. For an API user processing documents or running production agents, the per-token cost dominates and the maths gets more interesting.

Compared to running a similar-quality open-source model locally on a beefy Mac: locally it's free per call, but you're amortising several thousand pounds of hardware and you're not getting 4.7-class output. For most people who write code professionally, the cost-per-month of Claude Pro is cheaper than an hour of their own time.

What's new in Claude Code specifically?

Three workflow features ship alongside the model, all relevant if you live in Claude Code day-to-day.

/ultrareview slash command

A dedicated bug and design-review pass: hand it a diff or a file, get a structured critique that surfaces issues the normal flow misses. Pro and Max plans get three free ultrareviews per month included; further reviews are pay-as-you-go.

Auto mode for Max users on longer tasks

Auto mode (let the agent run trusted commands without confirming each one) is now extended to Max-tier users for longer-running tasks. Less interruption for the kinds of jobs that should run to completion without a babysitter.

New xhigh effort level

Sits between high and max. Useful when high isn't quite enough and max feels like overkill (and overspend). Worth setting deliberately on complex multi-file refactors.

Should I upgrade?

If you're already on a Claude paid plan: yes, automatically. The model is the default in Claude.ai, the API, and Claude Code. There's no opt-in step.

If you're on the free tier: probably not just for this release. 4.7's gains are concentrated on long, multi-step coding tasks where the free tier's rate limits already get in the way. The model is technically better; you'll likely not have enough headroom to feel the difference. If you start hitting limits often, the £15 Pro tier becomes the obvious next step, but "because 4.7 launched" isn't by itself a reason.

If you're paying for ChatGPT Plus or Gemini Advanced instead: the comparison is genuinely close in 2026. Claude leads on coding agents and long-context work. GPT-5 is stronger on raw reasoning benchmarks and arguably better at creative writing. Gemini 3 has the longest context window in mainstream use and tighter Google Workspace integration. Pick by which workflow you're actually trying to support, not by which model topped the latest benchmark.

When is Sonnet or Haiku still right?

Anthropic ships three tiers: Opus (most capable), Sonnet (balanced), Haiku (fastest and cheapest). Opus 4.7's launch doesn't change when each is the right choice.

Sonnet 4.6 is the default for almost everything that isn't a hard agentic task. Drafting prose, summarising documents, answering questions about a codebase you've already explained, light refactors: Sonnet handles them at a fraction of the cost and latency. The quality gap to Opus is small enough on bounded tasks that most users would not notice.

Haiku 4.5 is for batch jobs and high-volume API workloads where the per-token cost matters and the task is well-defined (classification, extraction, simple summarisation). Anything where you're running thousands or millions of calls and each one is small.

Opus 4.7 earns its keep on long-running coding tasks, multi-step agent flows, and analysis that needs to keep many things in mind at once. If your typical session is "write me a quick helper function", Sonnet does it for less. If it's "refactor this auth flow across twelve files and update all the tests", Opus is the right tool.

Frequently asked questions

Q01How do I know if I'm using 4.7 already?

On Claude.ai it's the default for paid users when you select the Opus tier; the model identifier in the dropdown will say claude-opus-4-7. In Claude Code it's similarly the default Opus once you upgrade. On the API the model string claude-opus-4-7 selects it explicitly. If you've not changed anything since April 2026, you're probably already on it.

Q02Does the higher CursorBench score actually translate to better day-to-day output?

Yes, mostly on long or multi-file tasks. Short edits ("add this property to this type") feel about the same as 4.6. Long agentic sessions (refactors, migrations, multi-step debugging) finish faster and need less back-and-forth, which is where the benchmark gains come from in practice.

Q03Is there a price increase coming?

Nothing announced. 4.7 launched at 4.6's pricing ($5 input / $25 output per million tokens). Anthropic has historically kept pricing flat across point releases within a major version, so the cost picture is likely stable until a 5.x release at the earliest.

Q04What's the difference between high, xhigh, and max effort?

Effort levels control how much compute the model spends per request. high is the default for serious tasks; max is for the toughest cases. xhigh is the new midpoint: more compute than high (better answers on hard tasks) but less than max (faster, cheaper). For most complex tasks where high isn't quite right, xhigh is now the smarter choice.

Q05Can I run Opus 4.7 locally on my Mac?

No. Opus is a hosted model only; the weights are not released. If you want a local equivalent the closest options are open-weights models like Llama 4 or Qwen 3, neither of which match Opus on coding benchmarks. For local use, expect a quality gap; the trade-off is privacy and zero per-call cost.