What biases AI shopping agents to pick your products?
Yale and Columbia researchers ran AI shopping agents through thousands of buying scenarios. The biases they found are stranger than you'd expect.

Something quietly weird is happening to online shopping. You no longer have to be the one typing 'best floor lamp for living room under £100' into a search box. ChatGPT, Claude, Perplexity, and Google's own AI Mode can now do the typing, the filtering, the comparing, and the choosing, and hand you back a single answer. The shop is the same. The shopper is not.
That raises a question almost no one was asking until a few months ago: if an AI is doing the picking, what is it actually picking on?
A new working paper from researchers at MyCustomAI, Columbia, and Yale tried to find out. They built a test harness, threw thousands of shopping scenarios at three of the big agent models, and watched which products the agents kept gravitating to. The biases they found are sometimes obvious, sometimes funny, and in at least one case worth an 80-percentage-point swing in how often an agent picks your product.
How did they actually test this?
The team ran three agent models (Claude Sonnet 4, GPT-4.1, and Gemini 2.5 Flash) through thousands of buying scenarios across eight product categories. The categories that have surfaced publicly include iPhone covers, toilet paper, mousepads, and washing machines: a deliberate mix of low-stakes and high-stakes purchases. The agents were given a buying brief (the equivalent of a human saying 'I need a floor lamp for an office, around £50, decent reviews') and asked to pick.
Then the researchers changed one variable at a time: the title, the sponsored tag, the review count, the price, and so on. Anything that changed the agent's pick was a bias worth knowing about.
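To make the 'change one variable, watch the pick' idea concrete, here is a minimal Python sketch of how a swing in selection rate could be measured. It is not the researchers' harness: the function names, the product IDs, and the trial data are all invented for illustration.

```python
from collections import Counter

def selection_rate(picks, product_id):
    """Fraction of trials in which the agent picked product_id."""
    if not picks:
        raise ValueError("need at least one trial")
    return Counter(picks)[product_id] / len(picks)

def swing_in_points(baseline_picks, variant_picks, product_id):
    """Change in selection rate, in percentage points, between two conditions
    that differ in exactly one variable (title, sponsored tag, price...)."""
    before = selection_rate(baseline_picks, product_id)
    after = selection_rate(variant_picks, product_id)
    return (after - before) * 100

# Made-up data: each list records which product the agent chose in one trial.
baseline = ["lamp_a"] * 2 + ["lamp_b"] * 8   # original listing
variant = ["lamp_a"] * 7 + ["lamp_b"] * 3    # same listing, one variable changed

print(round(swing_in_points(baseline, variant, "lamp_a"), 1))
```

The point of holding everything else fixed is that any difference between the two rates can be attributed to the one thing you changed.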
The headline finding: the first few tokens of your title decide everything
The single biggest lever in the whole experiment was the product title, and within that, the first few tokens.
The clearest example uses a real listing. 'SUNMORY Floor Lamps for Living Room' was the original title. When the team renamed it to 'SUNMORY Office Floor Lamp' and ran the agent through a buying scenario where the user wanted an office lamp, the selection rate swung by 80 percentage points. Same product. Same photo. Same reviews. Same price. The agent reads the first few words, decides whether it matches the use case the buyer asked about, and largely moves on.
The practical implication is uncomfortable for marketers used to writing titles for human eyeballs. A title like 'SUNMORY Floor Lamps for Living Room - 3 Brightness Modes, Remote Control, Energy Saving' reads fine to a person scanning a results page. To an agent shopping for an office lamp, the use-case mismatch in the first three tokens is fatal.
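A rough way to audit your own titles is to check whether the use case shows up in the leading words. The Python sketch below is an assumption-laden proxy: it counts whitespace-separated words, whereas real models split text into tokens differently, and the helper names are hypothetical.

```python
import re

def leading_words(title, n=3):
    """First n alphanumeric words of a title, lowercased.
    Words are only a stand-in for model tokens."""
    return [w.lower() for w in re.findall(r"[A-Za-z0-9]+", title)[:n]]

def use_case_in_lead(title, use_case, n=3):
    """True if the use-case word appears among the first n words."""
    return use_case.lower() in leading_words(title, n)

original = "SUNMORY Floor Lamps for Living Room - 3 Brightness Modes, Remote Control, Energy Saving"
renamed = "SUNMORY Office Floor Lamp"

print(use_case_in_lead(original, "office"))
print(use_case_in_lead(renamed, "office"))
```

A check like this only tells you whether the word is there. Whether an agent actually picks the product is something you still have to test against the models themselves.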
The sponsored-label penalty
The next finding is the one that should worry every advertising team in retail. Across all three models tested, the AI shopping agents actively avoided products with sponsored tags. The 'this product is paid for' label is a strong negative signal to an agent in a way it almost never is to a human.
You can argue about whether that is the agents being clever or the agents being naive. Humans do the same thing at the supermarket sometimes, but they also click on plenty of sponsored listings every day on Amazon. Agents treat 'sponsored' as a flag for 'trust this less'. If you are paying for placement on the assumption that AI-mediated traffic will keep flowing through the same channels, this is the first finding to take seriously.
Reviews still matter, but each model values them differently
One of the more surprising results was how unevenly the models weighted review signals. The paper reports that Claude Sonnet 4 tolerated a 19.4% higher price tag on a product with stronger review signals, while GPT-4.1 tolerated a 37.4% higher price for the same lift. Gemini sat somewhere in between.
Put another way: the same product, with the same reviews, will be picked by Claude at one price ceiling and by GPT-4.1 at a noticeably higher one. If you are a brand pricing premium against budget competitors, which agent your customer is using genuinely changes whether you make the cut.
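To put numbers on that gap, here is a back-of-envelope Python sketch. It assumes the reported percentages can be applied directly as a multiplier on a weaker-reviewed competitor's price, which is a simplification of whatever the paper actually measured. The £50 baseline is made up, and Gemini is left out because no exact figure is given here.

```python
# Price premiums each model tolerated for stronger reviews, as quoted above.
PREMIUM_TOLERANCE = {
    "Claude Sonnet 4": 0.194,
    "GPT-4.1": 0.374,
}

def price_ceiling(competitor_price, tolerance):
    """Highest price at which the better-reviewed product might still be picked,
    under the simplifying assumption described in the lead-in."""
    return round(competitor_price * (1 + tolerance), 2)

competitor_price = 50.00  # hypothetical budget competitor, in pounds

for model, tolerance in PREMIUM_TOLERANCE.items():
    print(model, price_ceiling(competitor_price, tolerance))
```

On those assumptions, a product priced at £65 against a £50 competitor would sit above one model's ceiling and below the other's.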
That is the part of the research that should make every marketing team uncomfortable, because right now we are picking marketing tactics by guessing what 'AI in general' likes. There is no 'AI in general'. There are Claude, GPT, and Gemini, and they each have a slightly different shopping personality.
What this means if you sell things online
None of this is a finished playbook. The paper is a working draft and the agents are improving every quarter. But there are five practical changes worth making now if you list products in spaces (Amazon, your own Shopify store, anywhere a ChatGPT search might surface you) where AI agents are increasingly the shopper:
- Rewrite your product titles around use cases, not features. 'Office Floor Lamp' before 'Floor Lamps for Living Room'. Test the obvious use cases your buyer would type.
- Audit how often your listings carry a sponsored tag. If a meaningful share of your traffic shifts to AI-mediated shopping, that tag becomes a tax, not a boost.
- Lean into review velocity and review quality. Reviews are the single signal agents consistently trust to justify a higher price.
- Run your real product descriptions through each of the three big models with a realistic buyer brief and see who picks you. The model-to-model gap is real.
- Stop trying to game it. The research is recent and the cat-and-mouse will only intensify. Authentic product quality and clearly described use cases will still be there when the next round of model retraining shifts the rules.
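The 'run it through each model and see who picks you' check from the list above can be sketched as a small Python loop. Everything here is hypothetical: the agents are toy stand-in functions, not real model calls, and you would swap in your own wrapper around each vendor's API. Shuffling the listing order on each trial is our assumption, added so that position alone does not decide the result.

```python
import random

def pick_rates(agents, brief, titles, my_title, trials=20, seed=0):
    """For each agent, the share of trials in which it chose my_title.
    agents maps a label to a callable (brief, titles) -> chosen title."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    rates = {}
    for name, choose in agents.items():
        wins = 0
        for _ in range(trials):
            shuffled = titles[:]
            rng.shuffle(shuffled)  # vary listing order between trials
            if choose(brief, shuffled) == my_title:
                wins += 1
        rates[name] = wins / trials
    return rates

def keyword_agent(brief, titles):
    """Toy stand-in: picks the first title sharing a word with the brief."""
    wanted = set(brief.lower().split())
    for title in titles:
        if wanted & set(title.lower().split()):
            return title
    return titles[0]

def first_listed_agent(brief, titles):
    """Toy stand-in: always picks whatever is listed first."""
    return titles[0]

titles = ["Office Floor Lamp", "Floor Lamps for Living Room", "Bedside Reading Light"]
agents = {"keyword": keyword_agent, "first_listed": first_listed_agent}

print(pick_rates(agents, "office lamp", titles, "Office Floor Lamp"))
```

With real models behind the callables, a large gap between the per-agent rates is the model-to-model difference showing up in your own catalogue.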
What this means if you are a normal shopper
This part rarely gets talked about. If AI shopping agents have systematic blind spots (favouring the first few title tokens, avoiding sponsored listings, trading reviews against price differently depending on the model), then sometimes you are going to be handed the wrong recommendation.
The mitigation is simple: when an AI agent gives you a one-line shopping verdict, ask it to show you its top three with reasons. Then read the actual product page yourself. The agent's job is to narrow the field. Your job is still to make the call. We are not yet at the point where 'just trust the agent' is the right answer for anything you are spending real money on.
Where to read the actual research
The working paper is by a team of five researchers affiliated with MyCustomAI, Columbia Business School, and Yale. A summary from Columbia Business School's Digital Future Initiative is the most readable public version. Sciencesays.com has a paid breakdown of the headline 80-point finding. There has also been a podcast episode and a YouTube walkthrough from the retail-media community. If you want primary sources, those are the four to start with.
We can't lift any of it verbatim, because this is a working paper, not a press release. What is fair to take from it is what the researchers themselves are publicly saying: AI agents have shopping biases, the biases are measurable, and the highest-leverage thing you can do today is rewrite your product titles so that the first three words name the use case.