Building AI Agents That Survive Production
Most AI agents shipping today are held together with sellotape. Four architectural bets separate the production-ready ones from the prototypes.

The interesting failure mode of AI agents in 2026 isn't that the models are wrong. It's that the surrounding infrastructure is wrong, and the model gets blamed. Sessions time out mid-task, context windows fragment, credentials leak across boundaries, and the agent comes back looking dumber than it is. Web developer Addy Osmani published a piece in April 2026 outlining the four architectural bets a serious agent stack has to make. Reading it is a useful exercise even if you're not running production agents yet, because the bets describe what's about to be table stakes.
Why is this an architecture problem, not a model problem?
The default assumption in 2025 was that agents needed better models. Smarter, longer-context, more capable. That bet was half right. Models did get dramatically better; agents that used them still broke in production, just for different reasons. The failures clustered in four places that had nothing to do with model quality and everything to do with the plumbing around it.
That's a familiar pattern. Cloud computing went through it: the first generation of cloud apps used virtual machines as if they were physical servers and got eaten by reliability problems. The second generation accepted that cloud was a different substrate and built for it. AI agents are at the same crossover. Per Osmani's analysis, the teams shipping reliable agents in 2026 are the ones who've stopped treating the model as the product and started treating the surrounding stack as the product.
What are the four bets?
Identity, not borrowed credentials
Today most agents act through a shared service account, using one identity that every other agent in the company also uses. When something goes wrong (an audit log shows a million-dollar transfer), nobody can tell which agent did it. The bet: give every agent its own unforgeable identity at the platform layer, recognised by your IAM and your audit tooling, so individual agent actions are traceable. That closes the 'ghost in the system' problem before regulators force you to.
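To make the traceability idea concrete, here is a minimal sketch in Python. It assumes a platform component that issues one secret key per agent and checks every audit record against it. The names IdentityRegistry and sign_action are hypothetical, and the HMAC keys are a stand-in for whatever your IAM actually provides; this is not how any particular platform implements agent identity.

```python
import hashlib
import hmac
import json
import secrets


def _tag(key: bytes, agent_id: str, action: dict) -> str:
    """HMAC over the agent ID and the action, so neither can be swapped later."""
    payload = json.dumps({"agent_id": agent_id, "action": action}, sort_keys=True)
    return hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()


def sign_action(agent_id: str, key: bytes, action: dict) -> dict:
    """Build an audit record that binds the action to one agent (hypothetical format)."""
    return {"agent_id": agent_id, "action": action, "signature": _tag(key, agent_id, action)}


class IdentityRegistry:
    """Stand-in for the platform layer: issues one key per agent and verifies records."""

    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}

    def issue(self, agent_id: str) -> bytes:
        # Each agent gets its own key; nothing is shared between agents.
        if agent_id in self._keys:
            raise ValueError(f"identity already issued: {agent_id}")
        key = secrets.token_bytes(32)
        self._keys[agent_id] = key
        return key

    def verify(self, record: dict) -> bool:
        # A record is attributable only if it was signed with the named agent's key.
        key = self._keys.get(record["agent_id"])
        if key is None:
            return False
        expected = _tag(key, record["agent_id"], record["action"])
        return hmac.compare_digest(expected, record["signature"])
```

With a shared service account, every record would verify against the same key and the audit log could not distinguish agents. Here a record signed by one agent's key fails verification if it claims to come from another.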
Universal context, not scraped windows
An agent that has to reason across your CRM, your support tickets, your finance data, and your codebase currently depends on your engineering team writing custom plumbing for each system. The result is endless boilerplate, brittle integrations, and an agent that only sees what the plumbing remembered to pass through. The bet: integrate context once at the platform layer (Model Context Protocol, enterprise data fabric) so the agent reasons across systems without you stitching JSON together by hand.
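The "integrate once" idea can be sketched as a single interface that every data source implements, with one hub the agent queries. This is an illustration only: ContextProvider, InMemoryProvider, and ContextHub are hypothetical names, and the sketch does not use or imitate the actual Model Context Protocol API.

```python
from typing import Protocol


class ContextProvider(Protocol):
    """The one interface every system has to implement (hypothetical)."""

    name: str

    def search(self, query: str) -> list[str]: ...


class InMemoryProvider:
    """Toy provider backed by a list of strings; a real one would wrap a CRM or ticket API."""

    def __init__(self, name: str, records: list[str]) -> None:
        self.name = name
        self._records = records

    def search(self, query: str) -> list[str]:
        needle = query.lower()
        return [r for r in self._records if needle in r.lower()]


class ContextHub:
    """Single entry point for the agent: register systems once, query them all."""

    def __init__(self) -> None:
        self._providers: dict[str, ContextProvider] = {}

    def register(self, provider: ContextProvider) -> None:
        self._providers[provider.name] = provider

    def gather(self, query: str) -> dict[str, list[str]]:
        # Ask every registered system and label each hit with its source.
        results: dict[str, list[str]] = {}
        for name, provider in self._providers.items():
            hits = provider.search(query)
            if hits:
                results[name] = hits
        return results
```

Adding a fifth system means writing one more provider, not another bespoke integration inside the agent. Calling hub.gather("acme") returns the matching records from every registered source, keyed by source name.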
Surviving the session
Real workflows take hours or weeks (a procurement process, a software migration, a multi-stage onboarding). Most agents today have a context window measured in thousands of tokens and a session lifetime measured in minutes. They hit a ceiling, lose state, and the human has to restart. The bet: durable execution with state checkpointing, long-horizon memory, and explicit human-in-the-loop gates so the agent picks up where it left off after a credential rotation, an outage, or simply a long weekend.
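A minimal sketch of state checkpointing with a human-in-the-loop gate, assuming each workflow step is a plain function from state to state and the checkpoint is a JSON file on local disk. The run_workflow function and its approve callback are hypothetical; durable-execution platforms handle this far more robustly, and this only illustrates the resume-where-you-left-off behaviour.

```python
import json
import os
from pathlib import Path
from typing import Callable

Step = Callable[[dict], dict]


def _atomic_write(path: Path, data: dict) -> None:
    # Write to a temp file, then rename, so a crash never leaves a half-written checkpoint.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(data))
    os.replace(tmp, path)


def run_workflow(
    steps: list[tuple[str, Step]],
    checkpoint: Path,
    approve: Callable[[str, dict], bool] = lambda name, state: True,
) -> dict:
    """Run named steps in order, persisting state after each one.

    On restart, steps already recorded in the checkpoint are skipped.
    """
    if checkpoint.exists():
        saved = json.loads(checkpoint.read_text())
    else:
        saved = {"done": [], "state": {}}

    for name, step in steps:
        if name in saved["done"]:
            continue  # finished in an earlier session
        if not approve(name, saved["state"]):
            break  # human gate declined: stop here, keep the checkpoint
        saved["state"] = step(saved["state"])
        saved["done"].append(name)
        _atomic_write(checkpoint, saved)
    return saved
```

If a step raises partway through (an outage, an expired credential), the checkpoint still holds everything completed so far. Calling run_workflow again with the same steps and checkpoint path skips the finished steps and retries from the one that failed.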
Platforms, not custom stacks
Every team building agents in 2025 wrote its own memory layer, its own observability, its own retry logic. That's the same waste-of-energy pattern that justified the move from bare metal to cloud. The bet: build on open primitives and managed platforms (Temporal, Restate, DBOS, LangGraph, the emerging MCP-based ecosystems) so your team can spend time on the part that's actually domain-specific. Solving 'agents need a memory layer' the eighteenth time is not a competitive advantage.
What does this mean for a small team?
If you're a solo builder or a small team, the four bets read like enterprise problems. Mostly they are. But two of them matter even at the smallest scale, and the other two are worth understanding before you commit to architecture decisions that will hurt later.
Identity matters from day one. If your agent calls an API on a user's behalf, the credentials need to look like that user's, not a generic service account. Get this wrong on a hobby project and the cost is a leaked key. Get it wrong on a paid product and the cost is a compliance event. Either way the fix is much cheaper at the start than at scale.
Session survival becomes urgent the moment your agent does anything that takes more than a few minutes. If your agent's longest task is one prompt-and-reply, the surrounding session story doesn't matter. The moment you ask it to process a queue, fill a form across multiple steps, or run an overnight job, you need state checkpointing or you'll be hand-restarting it daily.
The other two (universal context, platform stack) you can defer. Most hobbyists don't have an enterprise data fabric to integrate with, and the cost of running a custom stack is a rounding error at small scale. Revisit when scale forces you to.
When is the simpler approach still right?
There are three categories where the four-bet architecture is overkill.
Throwaway scripts. An agent that runs for ten minutes once a week to do a thing you'd otherwise do by hand doesn't need an identity layer or durable execution. A shell script with a one-line LLM call is the right answer. Don't over-engineer.
Interactive editor agents. Claude Code, Cursor, and similar in-editor agents already live inside your editor's session. They borrow your identity (your git config, your shell credentials) and lose state when you close the window, and that's correct for the interactive use case. The four bets matter for agents acting autonomously; interactive ones have a human in the loop already.
Prototypes you intend to throw away. If you're learning the space, building a quick demo, or testing whether an agent can do X at all, custom plumbing is fine. The four bets matter when you're committing to running the thing for years; before that point the architectural discipline is friction without payoff.