Why Your AI Coding Tool Feels Different Week to Week

Your AI coding assistant didn't get worse overnight. Six hidden levers can change how it behaves while the underlying model stays the same.

Abstract glowing circuit board representing AI model infrastructure

Updated 14 June 2026 How we review →

By Rob14 June 2026 · 8 min read

Every few weeks the same complaint goes round on Reddit and Twitter: my AI coding assistant has gotten dumber. The replies usually split into two camps. Camp one: the vendor secretly downgraded the model. Camp two: it's all in your head. Both are usually wrong. What actually changed is one of six invisible levers behind the scenes - and the vendors mostly don't tell you when they move them.

What changes when nothing changes?

The model itself (the weights inside Claude Opus, Anthropic's flagship Claude model designed for complex reasoning tasks, 4.7, GPT-5, Gemini, whatever you're using) is shipped on a release schedule that the vendor announces. New version, new name, new docs. So far, so transparent.

What doesn't get announced is the operational stack around the model. That's the layer that decides how much thinking the model is allowed to do per request, how long it can keep your previous turns in fast cache, when it summarises (and partly forgets) older context, and what happens when the platform hits an internal incident. Move any one of those dials and the same model feels measurably different to use, even though the underlying weights are bit-for-bit identical.

Anthropic itself acknowledged this dynamic in a 2025 quality postmortem after a wave of "Claude got dumb" complaints turned out to map to infrastructure changes, not the model.

What are the six hidden levers?

Order roughly by how often each one is the actual culprit when you notice a bad day.

Effort defaults

Most hosted assistants run at one of several "effort levels" - low, medium, high. Lower effort spends less compute per request and produces faster, cheaper, lighter answers. Vendors quietly shift the default level for free tiers, busy hours, or specific tasks. The model is the same; the budget it gets to spend is not.

Adaptive thinking

The model decides for itself how much reasoning to spend per request, based on its read of the task. If the task is classified as simple, it skips the deep thinking. Misclassify a complex task as simple (easy to do at the front of a conversation, before any code is loaded) and you get a shallow answer to a deep problem.

Prompt-cache duration

Big context (your whole repo, your CLAUDE.md, your design system) is expensive to send. Vendors cache it for a few minutes so follow-up turns are cheaper. Shorten that cache window from an hour to five minutes and the same coding session burns far more quota in invisible recomputation, with no visible difference in behaviour.

Context compaction

When a conversation runs long, the system summarises and trims older turns to keep within the model's context window. Compaction discards detail. The assistant "forgets" that you defined a helper function 800 turns ago, then writes a duplicate. You blame the model; the cause is the compaction policy.

Quota policy

When you're near a limit, you ration. You stop asking follow-up questions. You skip the long brief. You truncate the conversation. The model didn't get worse; you got more cautious about feeding it. The effect is real but the cause is your own self-throttling, not the underlying capability.

Status incidents

Brief platform-level outages, degraded inference servers, regional routing changes. Most are resolved within hours, but the user experience during the incident is markedly worse. Without a status-page habit, every infrastructure blip looks like a permanent quality drop.

Why don't vendors just tell us?

Two reasons, both of them sort of understandable.

The first is operational. Some of these levers (cache duration, regional routing) are tuned constantly based on traffic patterns. There's no clean release note for "we lowered the cache TTL on the eu-west-1 cluster from 60 minutes to 5 minutes between 14:00 and 17:00 GMT due to capacity pressure". So the change happens, and your session feels different, and there's no announcement because there isn't really a release.

The second is competitive. Effort defaults and quota policy are part of how vendors monetise. Telling users "free-tier requests now run at medium effort instead of high" is just begging for a churn event. Easier to leave that one ambient.

The result is that the public version number (Claude Opus 4.7, GPT-5, etc.) describes maybe 60% of what determines your day-to-day experience. The other 40% is undocumented and changes more often.

What can I actually do about it?

Five tactics, ranked by leverage.

Check the status page first. Most vendors run one (status.anthropic.com, status.openai.com). When the tool feels off, that's the cheapest five-second check. Roughly a third of "the AI got dumber today" anecdotes resolve to an open incident the user wasn't aware of.

Notice your own rationing. If you're near a quota cap, are you actually asking the same questions you would normally ask, or are you mentally pruning to save tokens? The quietest cause of perceived regression is the user being less ambitious without noticing.

Set the effort level explicitly when you can. Some tools (Claude Code, Copilot, Cursor) expose the effort dial. If a session is genuinely complex, set it to high and accept the cost. The default exists because most sessions don't need it; that doesn't mean yours doesn't.

Keep sessions short, briefs long. Compaction kicks in only when conversation history exceeds the model's working window. Cut sessions at natural boundaries (one task per session) instead of letting one long conversation accumulate. You move the cost from "forgotten detail" to "longer briefs", which is the better trade.

Cross-check against a fresh session. If a tool feels off, paste the same prompt into a brand-new conversation, ideally with someone else's account on a different network. If the result is materially better, the problem is your session state (cache miss, compaction loss, in-context drift), not the model itself.

Should I switch tools when this happens?

Usually no. Every hosted AI assistant has the same operational stack with the same dials. Switching from Claude to GPT-5 because Claude felt off today is just moving to a different set of hidden levers, set to different defaults. The grass-is-greener effect lasts about two weeks, until the new vendor's levers shift and you start the cycle again.

The honest exception: if a vendor has had three or more bad weeks in a quarter, and the bad weeks correlate with their published quality postmortems or status-page incidents, the platform itself is unstable for now. That's a real reason to switch. But "one bad day" almost never is.

Frequently asked questions

Q01Is the model actually unchanged when this happens?

In most cases yes. Major model updates ship with a version bump (Claude Opus 4.6 to 4.7, GPT-4 to GPT-5) and the vendor announces them. Between those announced updates, the weights stay frozen. What changes is the operational stack around the weights - and that changes far more often.

Q02Why does the same prompt give different answers across sessions even on the same model?

Two reasons. First, hosted models use sampling - they don't deterministically output the single most likely next token, they sample from a distribution. So two identical prompts at temperature > 0 will produce different responses by design. Second, the operational stack (cache state, effort budget, regional routing) varies. Combined, they make session-to-session reproducibility a deliberate non-feature.

Q03Are paid tiers immune to this?

No, but the dials are usually set more favourably. Paid users typically get longer cache windows, higher default effort, more generous quotas, and priority routing during incidents. The levers still exist; they're just set in your favour rather than tuned aggressively for cost.

Q04Should I screenshot bad outputs to send to the vendor?

Yes, if the output is genuinely surprising or repeatable. Vendors do read those reports, and patterns across many users feed back into the quality postmortems and operational tuning. One-off bad answers usually don't move the needle, but a screenshot showing a clear regression on a task that previously worked is useful signal.

Q05If I rely on AI for work, how do I make my workflow resilient to this?

Three habits. Keep important context (project conventions, design rules, decisions) in a file you can re-paste, so a cache miss or compaction event doesn't lose it. Cross-check critical outputs against a second model or a manual review; never single-source a production-affecting decision. And accept that hosted AI is infrastructure now, with the same kind of week-to-week variability that any other hosted service has.