Why Your AI Coding Tool Feels Different Week to Week
Your AI coding assistant didn't get worse overnight. Six hidden levers can change how it behaves while the underlying model stays the same.

Every few weeks the same complaint goes round on Reddit and Twitter: my AI coding assistant has gotten dumber. The replies usually split into two camps. Camp one: the vendor secretly downgraded the model. Camp two: it's all in your head. Both are usually wrong. What actually changed is one of six invisible levers behind the scenes - and the vendors mostly don't tell you when they move them.
What changes when nothing changes?
The model itself (the weights inside Claude Opus 4.7, GPT-5, Gemini, whatever you're using) is shipped on a release schedule that the vendor announces. New version, new name, new docs. So far, so transparent.
What doesn't get announced is the operational stack around the model. That's the layer that decides how much thinking the model is allowed to do per request, how long it can keep your previous turns in fast cache, when it summarises (and partly forgets) older context, and what happens when the platform hits an internal incident. Move any one of those dials and the same model feels measurably different to use, even though the underlying weights are bit-for-bit identical.
Anthropic itself acknowledged this dynamic in a 2025 quality postmortem after a wave of "Claude got dumb" complaints turned out to map to infrastructure changes, not the model.
Here are the six levers, ordered roughly by how often each one is the actual culprit when you notice a bad day.
Effort defaults
Most hosted assistants run at one of several "effort levels" - low, medium, high. Lower effort spends less compute per request and produces faster, cheaper, lighter answers. Vendors quietly shift the default level for free tiers, busy hours, or specific tasks. The model is the same; the budget it gets to spend is not.
Adaptive thinking
The model decides for itself how much reasoning to spend per request, based on its read of the task. If the task is classified as simple, it skips the deep thinking. Misclassify a complex task as simple (easy to do at the front of a conversation, before any code is loaded) and you get a shallow answer to a deep problem.
Prompt-cache duration
Big context (your whole repo, your CLAUDE.md, your design system) is expensive to send. Vendors cache it for a limited window, anywhere from a few minutes to an hour, so follow-up turns are cheaper. Shorten that window from an hour to five minutes and the same coding session burns far more quota on recomputation you never see, with no difference in the answers themselves.
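To put rough numbers on that, here is a back-of-envelope sketch. Everything in it is an assumption made for illustration: the function name, the 100,000-token context, the gaps between turns, and the idea that a cache hit is billed at 10% of the full price. It also ignores any premium for writing to the cache. Real pricing differs by vendor.

```python
def billed_input_tokens(gaps_minutes, context_tokens, ttl_minutes,
                        cached_read_percent=10):
    """Estimate input tokens billed for re-sending a large context each turn.

    gaps_minutes: idle minutes before each follow-up turn.
    cached_read_percent: hypothetical price of a cache hit, as a
    percentage of the full price (an assumption, not a vendor figure).
    """
    total = context_tokens  # first turn: nothing is cached yet
    for gap in gaps_minutes:
        if gap <= ttl_minutes:
            # Cache still warm: pay only the discounted read price.
            total += context_tokens * cached_read_percent // 100
        else:
            # Cache expired during the pause: full recomputation.
            total += context_tokens
    return total


# Hypothetical session: six follow-up turns with pauses to read and test.
gaps = [2, 8, 12, 3, 20, 7]
long_ttl = billed_input_tokens(gaps, 100_000, ttl_minutes=60)
short_ttl = billed_input_tokens(gaps, 100_000, ttl_minutes=5)
print(long_ttl, short_ttl)  # 160000 520000
```

Under these made-up numbers the identical session costs a bit more than three times as much with the five-minute window, and nothing about the answers would tell you.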
Context compaction
When a conversation runs long, the system summarises and trims older turns to keep within the model's context window. Compaction discards detail. The assistant "forgets" that you defined a helper function 800 turns ago, then writes a duplicate. You blame the model; the cause is the compaction policy.
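A toy version of a compaction policy shows how that forgetting happens. This is a minimal sketch, not any vendor's actual algorithm: the compact function, the word-count stand-in for tokens, and the budget are all hypothetical, and a real system would ask a model to write the summary rather than emit a stub.

```python
def compact(turns, budget_tokens, keep_recent=2,
            count_tokens=lambda text: len(text.split())):
    """Collapse the oldest turns into one summary line once history exceeds budget.

    Token counting is approximated by word count (an assumption for this sketch).
    """
    total = sum(count_tokens(t) for t in turns)
    if total <= budget_tokens or len(turns) <= keep_recent:
        return list(turns)  # still fits: nothing is dropped
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    # The stub keeps only a count, so every detail in `old` is lost.
    summary = f"[summary of {len(old)} earlier turns]"
    return [summary] + recent


history = [
    "define helper slugify(title) in utils.py",
    "add tests for the parser module",
    "rename the config loader",
    "now write a function that turns titles into URL slugs",
]
compacted = compact(history, budget_tokens=20)
print(compacted)
```

After compaction the history no longer mentions slugify at all, so an assistant working from it has no way to know the helper exists and will happily write a second one.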
Quota policy
When you're near a limit, you ration. You stop asking follow-up questions. You skip the long brief. You truncate the conversation. The model didn't get worse; you got more cautious about feeding it. The effect is real but the cause is your own self-throttling, not the underlying capability.
Status incidents
Brief platform-level outages, degraded inference servers, regional routing changes. Most are resolved within hours, but the user experience during the incident is markedly worse. Without a status-page habit, every infrastructure blip looks like a permanent quality drop.
Why don't vendors just tell us?
Two reasons, both of them sort of understandable.
The first is operational. Some of these levers (cache duration, regional routing) are tuned constantly based on traffic patterns. There's no clean release note for "we lowered the cache TTL on the eu-west-1 cluster from 60 minutes to 5 minutes between 14:00 and 17:00 GMT due to capacity pressure". So the change happens, and your session feels different, and there's no announcement because there isn't really a release.
The second is competitive. Effort defaults and quota policy are part of how vendors monetise. Telling users "free-tier requests now run at medium effort instead of high" is just begging for a churn event. Easier to leave that one ambient.
The result is that the public version number (Claude Opus 4.7, GPT-5, etc.) describes maybe 60% of what determines your day-to-day experience. The other 40% is undocumented and changes more often.
What can I actually do about it?
Five tactics, ranked by leverage.
Check the status page first. Most vendors run one (status.anthropic.com, status.openai.com). When the tool feels off, that's the cheapest five-second check. Roughly a third of "the AI got dumber today" anecdotes resolve to an open incident the user wasn't aware of.
Notice your own rationing. If you're near a quota cap, are you actually asking the same questions you would normally ask, or are you mentally pruning to save tokens? The quietest cause of perceived regression is the user being less ambitious without noticing.
Set the effort level explicitly when you can. Some tools (Claude Code, Copilot, Cursor) expose the effort dial. If a session is genuinely complex, set it to high and accept the cost. The default exists because most sessions don't need it; that doesn't mean yours doesn't.
Keep sessions short, briefs long. Compaction kicks in only when conversation history exceeds the model's working window. Cut sessions at natural boundaries (one task per session) instead of letting one long conversation accumulate. You move the cost from "forgotten detail" to "longer briefs", which is the better trade.
Cross-check against a fresh session. If a tool feels off, paste the same prompt into a brand-new conversation, ideally with someone else's account on a different network. If the result is materially better, the problem is your session state (cache miss, compaction loss, in-context drift), not the model itself.
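The first tactic, the status-page check, can be scripted. The sketch below assumes the vendor's status page is hosted on Atlassian Statuspage and exposes its usual /api/v2/status.json endpoint, where an indicator of "none" means no open incident. That is an assumption about the vendor's page, not something confirmed here, and both function names are made up for this example.

```python
import json
from urllib.request import urlopen


def has_open_incident(payload):
    """Return True when a Statuspage-style payload reports anything but 'none'."""
    indicator = payload.get("status", {}).get("indicator", "unknown")
    return indicator != "none"


def fetch_status(base_url, timeout=5):
    """Fetch <base_url>/api/v2/status.json (assumed endpoint) and decode the JSON."""
    url = f"{base_url.rstrip('/')}/api/v2/status.json"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Offline example using a hand-written payload in the assumed shape.
sample = {"status": {"indicator": "minor",
                     "description": "Partially Degraded Service"}}
print(has_open_incident(sample))  # True
```

To run it for real you would pass the result of fetch_status to has_open_incident, with the base URL of whichever status page you follow. A missing or unrecognised indicator is deliberately treated as a possible incident, so a surprise in the payload prompts you to look at the page instead of silently passing.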
Should I switch tools when this happens?
Usually no. Every hosted AI assistant has the same operational stack with the same dials. Switching from Claude to GPT-5 because Claude felt off today is just moving to a different set of hidden levers, set to different defaults. The grass-is-greener effect lasts about two weeks, until the new vendor's levers shift and you start the cycle again.
The honest exception: if a vendor has had three or more bad weeks in a quarter, and the bad weeks correlate with their published quality postmortems or status-page incidents, the platform itself is unstable for now. That's a real reason to switch. But "one bad day" almost never is.