What biases AI shopping agents to pick your products?

Yale and Columbia researchers ran AI shopping agents through thousands of buying scenarios. The biases they found are stranger than you'd expect.

Updated 14 June 2026

By Rob · 14 June 2026 · 8 min read

Something quietly weird is happening to online shopping. You no longer have to be the one typing 'best floor lamp for living room under £100' into a search box. ChatGPT, Claude, Perplexity, and Google's own AI Mode can now do the typing, the filtering, the comparing, and the choosing, and hand you back a single answer. The shop is the same. The shopper is not.

That raises a question almost no one was asking until a few months ago: if an AI is doing the picking, what is it actually picking on?

A new working paper from researchers at MyCustomAI, Columbia, and Yale tried to find out. They built a test harness, threw thousands of shopping scenarios at three of the big agent models, and watched which products the agents kept gravitating to. The biases they found are sometimes obvious, sometimes funny, and in at least one case worth an 80-percentage-point swing in how often an agent picks your product.

How did they actually test this?

The team ran three agent models (Claude Sonnet 4, Anthropic's mid-tier model balancing speed and intelligence; GPT-4.1; and Gemini 2.5 Flash) through thousands of buying scenarios across eight product categories. The categories that have surfaced publicly include iPhone covers, toilet paper, mousepads, and washing machines: a deliberate mix of low-stakes and high-stakes purchases. The agents were given a buying brief (the equivalent of a human saying 'I need a floor lamp for an office, around £50, decent reviews') and asked to pick.

Then the researchers changed one variable at a time: the title, the sponsored tag, the review count, the price, and so on. Anything that changed the agent's pick was a bias worth knowing about.
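The one-variable-at-a-time design is easy to reproduce on your own listings. Below is a minimal sketch, assuming you can wrap an agent as a function that takes a buyer brief and a shelf of listings and returns the index of the one it picks. `Listing`, `selection_rate`, `ablate` and `toy_agent` are hypothetical names, the "BrightCo" rival is invented, and `toy_agent` is a made-up scoring rule standing in for a real model call, not a description of how Claude, GPT or Gemini decide. The harness rotates your listing's shelf position on each trial so position effects don't masquerade as title effects, and reports each change as a swing in percentage points of selection rate.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Sequence

@dataclass(frozen=True)
class Listing:
    title: str
    price: float
    review_count: int
    sponsored: bool = False

# An agent takes a buyer brief and a shelf of listings and returns the index it picks.
# In a real experiment this would wrap a call to a model API (assumption; not shown).
Agent = Callable[[str, Sequence[Listing]], int]

def selection_rate(agent: Agent, brief: str, target: Listing,
                   competitors: Sequence[Listing], trials: int = 50) -> float:
    """Fraction of trials in which the agent picks `target`.

    The target's shelf position rotates each trial so position bias averages out.
    """
    hits = 0
    for t in range(trials):
        pos = t % (len(competitors) + 1)
        shelf = [*competitors[:pos], target, *competitors[pos:]]
        if agent(brief, shelf) == pos:
            hits += 1
    return hits / trials

def ablate(agent: Agent, brief: str, base: Listing, competitors: Sequence[Listing],
           changes: Dict[str, dict], trials: int = 50) -> Dict[str, float]:
    """Apply one change at a time; report the swing in percentage points vs. the base listing."""
    base_rate = selection_rate(agent, brief, base, competitors, trials)
    return {
        name: 100 * (selection_rate(agent, brief, replace(base, **fields), competitors, trials) - base_rate)
        for name, fields in changes.items()
    }

def toy_agent(brief: str, shelf: Sequence[Listing]) -> int:
    """Hypothetical scoring rule for demonstration only: NOT how any real model decides."""
    use_case = brief.split()[0].lower()  # assume the brief opens with the use case, e.g. "office"

    def score(item: Listing) -> float:
        first_words = [w.lower() for w in item.title.split()[:3]]
        return ((10 if use_case in first_words else 0)
                + item.review_count / 1000
                - item.price / 100
                - (5 if item.sponsored else 0))

    return max(range(len(shelf)), key=lambda i: score(shelf[i]))

if __name__ == "__main__":
    base = Listing("SUNMORY Floor Lamps for Living Room", price=45.0, review_count=1200)
    rival = Listing("BrightCo Office Floor Lamp", price=49.0, review_count=300)  # invented rival
    print(ablate(toy_agent, "office lamp, around £50", base, [rival], {
        "title": {"title": "SUNMORY Office Floor Lamp"},
        "sponsored": {"sponsored": True},
    }, trials=10))  # prints {'title': 100.0, 'sponsored': 0.0}
```

With the toy agent, renaming the lamp moves its selection rate from 0% to 100%, while adding a sponsored tag changes nothing because the listing was already losing. To get meaningful numbers you would swap `toy_agent` for a wrapper around a real model and raise `trials`, since real agents don't answer identically every time.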

The headline finding: the first few tokens of your title decide everything

The single biggest lever in the whole experiment was the product title, and within that, the first few tokens.

The clearest example uses a real listing. The original title was 'SUNMORY Floor Lamps for Living Room'. When the team renamed it to 'SUNMORY Office Floor Lamp' and ran the agent through a buying scenario where the user wanted an office lamp, the selection rate swung by 80 percentage points. Same product. Same photo. Same reviews. Same price. The agent reads the first few words, decides whether it matches the use case the buyer asked about, and largely moves on.

The practical implication is uncomfortable for marketers used to writing titles for human eyeballs. A title like 'SUNMORY Floor Lamps for Living Room - 3 Brightness Modes, Remote Control, Energy Saving' reads fine to a person scanning a results page. To an agent shopping for an office lamp, a title that opens with the wrong use case can be fatal.
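One practical response is to audit whether each title's target use case lands in its opening words. The sketch below is a hypothetical heuristic for checking your own titles, not a model of how agents parse them: it splits on words rather than real model tokens, matches exact words only (so 'lamps' will not match 'lamp'), and the three-word window is an assumption loosely based on the "first few tokens" finding. `words`, `use_case_position` and `leads_with_use_case` are illustrative names.

```python
import re
from typing import List, Optional

def words(text: str) -> List[str]:
    """Lowercase alphanumeric words: a crude stand-in for model tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def use_case_position(title: str, use_case: str) -> Optional[int]:
    """0-based word index where the use-case phrase starts in the title, or None if absent."""
    t, phrase = words(title), words(use_case)
    for i in range(len(t) - len(phrase) + 1):
        if t[i:i + len(phrase)] == phrase:
            return i
    return None

def leads_with_use_case(title: str, use_case: str, window: int = 3) -> bool:
    """True if the whole use-case phrase sits inside the first `window` words (assumed cut-off)."""
    pos = use_case_position(title, use_case)
    return pos is not None and pos + len(words(use_case)) <= window
```

For example, 'SUNMORY Office Floor Lamp' passes for the use case 'office', while 'SUNMORY Floor Lamps for Living Room' fails for 'office' and also fails for 'living room', because that phrase only starts at the fifth word.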

The sponsored-label penalty

The next finding is the one that should worry every advertising team in retail. Across all three models tested, the AI shopping agents actively avoided products with sponsored tags. The 'this product is paid for' label is a strong negative signal to an agent in a way it almost never is to a human.

You can argue about whether that is the agents being clever or the agents being naive. Humans are sometimes wary of paid placement too, but they still click on plenty of sponsored listings on Amazon every day. Agents, by contrast, treat 'sponsored' as a flag for 'trust this less'. If you are paying for placement on the assumption that AI-mediated traffic will keep flowing through the same channels, this is the first finding to take seriously.

Reviews still matter, but each model values them differently

One of the more surprising results was how unevenly the models weighted review signals. The paper reports that Claude Sonnet 4 tolerated a 19.4% higher price tag on a product with stronger review signals, while GPT-4.1 tolerated a 37.4% higher price for the same lift. Gemini sat somewhere in between.

Put another way: the same product, with the same reviews, will be picked by Claude up to one price ceiling and by GPT-4.1 up to a noticeably higher one. If you are a brand pricing premium against budget competitors, which agent your customer is using genuinely changes whether you make the cut.
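To make the gap concrete, here is the arithmetic, treating each reported tolerance as a simple threshold over a cheaper rival's price. That linear reading is a simplification of the paper's result, the £50 and £65 prices are invented for illustration, and Gemini 2.5 Flash is left out because no figure for it is given here. `REVIEW_PREMIUM`, `price_ceiling` and `survives` are hypothetical names.

```python
from typing import Dict

# Review-driven price premiums each model tolerated, as reported in the working paper.
# Gemini 2.5 Flash is omitted: only "somewhere in between" is reported for it.
REVIEW_PREMIUM = {"Claude Sonnet 4": 0.194, "GPT-4.1": 0.374}

def price_ceiling(rival_price: float, premium: float) -> float:
    """Highest price a better-reviewed product could carry under a simple linear reading."""
    return rival_price * (1 + premium)

def survives(your_price: float, rival_price: float) -> Dict[str, bool]:
    """For each model, does your price premium stay inside its reported tolerance?"""
    return {model: your_price <= price_ceiling(rival_price, p)
            for model, p in REVIEW_PREMIUM.items()}

if __name__ == "__main__":
    for model, p in REVIEW_PREMIUM.items():
        print(f"{model}: ceiling £{price_ceiling(50, p):.2f}")
    print(survives(65, 50))
```

On that reading, a better-reviewed £65 product sits 30% above a £50 rival: outside Claude Sonnet 4's reported 19.4% tolerance (a ceiling of about £59.70) but inside GPT-4.1's 37.4% (about £68.70).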

That is the part of the research that should make every marketing team uncomfortable, because right now we are picking marketing tactics by guessing what 'AI in general' likes. There is no 'AI in general'. There is Claude, there is GPT, there is Gemini, and each has a slightly different shopping personality.

What this means if you sell things online

None of this is a finished playbook. The paper is a working draft and the agents are improving every quarter. But there are five practical changes worth making now if you list products anywhere AI agents are increasingly doing the shopping (Amazon, your own Shopify store, anywhere a ChatGPT search might surface you):

Rewrite your product titles around use cases, not features. 'Office Floor Lamp' before 'Floor Lamps for Living Room'. Test the obvious use cases your buyer would type.
Audit how often your listings carry a sponsored tag. If a meaningful share of your traffic shifts to AI-mediated shopping, that tag becomes a tax, not a boost.
Lean into review velocity and review quality. Reviews are the single signal agents consistently trust to justify a higher price.
Run your real product descriptions through each of the three big models with a realistic buyer brief and see who picks you. The model-to-model gap is real.
Stop trying to game it. The research is recent and the cat-and-mouse will only intensify. Authentic product quality and clearly described use cases will still be there when the next round of model retraining shifts the rules.
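For the fourth point, a small harness keeps the model comparison fair: same brief, same rival descriptions, several runs per model, and your listing's shelf position rotated each run. This is a sketch under assumptions: each entry in `pickers` is a hypothetical function you would write around a model's API that returns the index of the description it chose, and `pick_rates` and `model_gap` are illustrative names, not part of any SDK.

```python
from typing import Callable, Dict, Sequence

# Hypothetical: wraps one model's API and returns the index of the description it picks.
Picker = Callable[[str, Sequence[str]], int]

def pick_rates(pickers: Dict[str, Picker], brief: str, yours: str,
               rivals: Sequence[str], runs: int = 20) -> Dict[str, float]:
    """Share of runs in which each model picks your description."""
    rates = {}
    for name, pick in pickers.items():
        hits = 0
        for r in range(runs):
            pos = r % (len(rivals) + 1)  # rotate your slot so position bias averages out
            shelf = [*rivals[:pos], yours, *rivals[pos:]]
            if pick(brief, shelf) == pos:
                hits += 1
        rates[name] = hits / runs
    return rates

def model_gap(rates: Dict[str, float]) -> float:
    """Spread, in percentage points, between the most and least favourable model."""
    return 100 * (max(rates.values()) - min(rates.values()))
```

A large `model_gap` is the pattern the paper points to: the same description can fare very differently depending on which agent your customer uses. Real model outputs vary between runs, so rates from a handful of runs are rough at best.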

What this means if you are a normal shopper

This part rarely gets talked about. If AI shopping agents have systematic blind spots (over-weighting the first few title tokens, avoiding sponsored listings, and each model trading off reviews against price differently), then sometimes you are going to be handed the wrong recommendation.

The mitigation is simple: when an AI agent gives you a one-line shopping verdict, ask it to show you its top three with reasons. Then read the actual product page yourself. The agent's job is to narrow the field. Your job is still to make the call. We are not yet at the point where 'just trust the agent' is the right answer for anything you are spending real money on.

Where to read the actual research

The working paper is by a team of five researchers affiliated with MyCustomAI, Columbia Business School, and Yale. A summary from Columbia Business School's Digital Future Initiative is the most readable public version. Sciencesays.com has a paid breakdown of the headline 80-point finding. There has also been a podcast episode and a YouTube walkthrough from the retail-media community. If you want to go beyond this summary, those four are the places to start.

We can't lift any of the paper verbatim, because it is a working paper, not a press release. What is fair to take from it is what the researchers themselves are publicly saying: AI agents have shopping biases, the biases are measurable, and the highest-leverage thing you can do today is rewrite your product titles so the first few words match the use case your buyer asks for.

Q01 Is the 80 percentage point swing real, or is it a marketing headline?

It is a reported finding from the Columbia/Yale/MyCustomAI working paper, specifically about moving an intent-matched keyword to the start of a product title in scenarios where the buyer wanted that exact use case. It is not the typical lift you'd see across every product. Treat it as the upper bound: what's possible when there is a sharp use-case mismatch you can fix.

Q02 Do these biases apply to every AI shopping agent equally?

No. The research specifically compared Claude Sonnet 4, GPT-4.1, and Gemini 2.5 Flash and found meaningful differences in price tolerance, review weighting, and how aggressively each one penalised sponsored tags. If your customers are using one of those models more than the others, the optimisation looks different.

Q03 Will sponsored ads stop working in AI search entirely?

Almost certainly not. Retailers and ad networks are already working on new ad formats designed to be agent-friendly. The current 'sponsored' tag is a hangover from the human-shopping era and is structurally easier for agents to filter on than for humans. Expect the labelling to evolve.

Q04 I have hundreds of product listings. Where do I start?

Start with your top 20 by revenue. Rewrite each title to lead with the use case your buyer would type, then run the new titles through ChatGPT, Claude and Gemini with a realistic buyer brief. The ones that flip from 'not picked' to 'picked' are your highest-confidence wins.
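That triage fits in a few lines. This sketch assumes you have already run each title, before and after the rewrite, past each assistant and recorded whether it was picked; `ListingTest`, `flips` and `highest_confidence_wins` are hypothetical names, and how you collect those picks is up to you.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class ListingTest:
    sku: str
    revenue: float
    picked_before: Dict[str, bool]  # assistant name -> picked with the old title?
    picked_after: Dict[str, bool]   # assistant name -> picked with the rewritten title?

def top_by_revenue(listings: List[ListingTest], n: int = 20) -> List[ListingTest]:
    """The n highest-revenue listings, largest first."""
    return sorted(listings, key=lambda item: item.revenue, reverse=True)[:n]

def flips(item: ListingTest) -> List[str]:
    """Assistants that went from 'not picked' to 'picked' after the rewrite."""
    return [name for name, after in item.picked_after.items()
            if after and not item.picked_before.get(name, False)]

def highest_confidence_wins(listings: List[ListingTest], n: int = 20) -> List[Tuple[str, List[str]]]:
    """Top-n listings whose rewrite flipped at least one assistant, most flips first."""
    results = [(item.sku, flips(item)) for item in top_by_revenue(listings, n)]
    return sorted([r for r in results if r[1]], key=lambda r: len(r[1]), reverse=True)
```

Listings that flip for several assistants rise to the top; a listing that flips for none stays off the list, however much it earns.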

Q05 Does this apply to services and non-physical products too?

Probably yes for service listings on directories (Yelp, Google Business Profile, comparison sites) where the agent reads a title and short description. There is no published research on services yet, but the underlying mechanism (the agent reads the first few tokens, decides, moves on) should be the same.