Why ChatGPT Cites Your Page (or Doesn't): 1.4M Prompts
An Ahrefs analysis of 1.4M ChatGPT prompts shows what actually wins citations - and why Reddit's huge presence rarely earns credit.

If you are running a personal site or a small business blog and wondering whether ChatGPT will ever surface you as a source, the most useful piece of public data on the topic so far landed in April 2026: an Ahrefs analysis of 1.4 million ChatGPT 5.2 prompts. The findings are surprising in places and obvious in others, and they carry specific practical takeaways for anyone outside a big enterprise SEO team.
This post walks through what the study actually measured, the most counter-intuitive finding (Reddit's invisible influence), what determines whether your page gets cited, and four specific things a small site owner can change today.
What did Ahrefs actually measure?
The study took 1.4 million ChatGPT 5.2 desktop prompts from February 2025 and isolated each retrieved URL by source - what Ahrefs calls the ref_type. ChatGPT does not retrieve everything from one place. It pulls from at least five distinct sources, each with its own retrieval pipeline and very different citation rates:
- Search (general web index): 25.5M data points, 88.46% citation rate
- News: 3.94M data points, 12.01% citation rate
- Reddit: 16.18M data points, 1.93% citation rate
- YouTube: 953K data points, 0.51% citation rate
- Academia: 185K data points, 0.40% citation rate
Across the whole dataset, only 49.98% of retrieved URLs ended up as visible citations. The other half were used for context and discarded. That gap is most of the story.
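As a rough sanity check, the overall figure can be approximated from the per-source rows above. This is an illustrative Python sketch, not part of the Ahrefs methodology. The names SOURCES and overall_citation_rate are made up for this example, and because the published counts are rounded, the result only approximates the reported 49.98%.

```python
# Per-source figures as quoted above: (retrieved URLs, citation rate).
# The counts are rounded in the source, so the result is approximate.
SOURCES = {
    "search": (25_500_000, 0.8846),
    "news": (3_940_000, 0.1201),
    "reddit": (16_180_000, 0.0193),
    "youtube": (953_000, 0.0051),
    "academia": (185_000, 0.0040),
}


def overall_citation_rate(sources):
    """Weighted citation rate: total cited URLs / total retrieved URLs."""
    retrieved = sum(count for count, _ in sources.values())
    cited = sum(count * rate for count, rate in sources.values())
    return cited / retrieved


rate = overall_citation_rate(SOURCES)
print(f"Overall citation rate: {rate:.1%}")
```

With the rounded inputs this comes out at about 49.9%. The overall rate is a weighted average, so it sits far below the search rate because Reddit contributes roughly a third of the retrieved pool at a citation rate under 2%.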
Why is Reddit cited so rarely if ChatGPT uses it so much?
This is the headline finding of the study. ChatGPT pulled in 16+ million Reddit URLs across the prompt set. It cited fewer than 2% of them. By Ahrefs's count, 67.8% of all non-cited URLs in the dataset came from Reddit - meaning ChatGPT is reading Reddit at industrial scale to understand topics, gauge consensus, and shape its answers, but almost never crediting the threads it learned from.
There are two reasonable readings, and they have very different practical implications:
Reading 1 - Reddit is reference material, not source material. The model uses Reddit threads the way a researcher uses a focus group: it forms a view of what real people think, then writes an answer in its own voice citing more authoritative places. Reddit gets the credit only when the question is explicitly community-shaped ("what does Reddit think of X?").
Reading 2 - Reddit has high noise, high duplication, and low per-page authority, so each individual thread loses the ranking tournament to a single Wikipedia paragraph or a single product-spec page. Reddit's value is aggregate, not per-URL.
Both readings are probably right at once. Either way, the practical lesson is the same: if your strategy involves "I'll just be active on Reddit and ChatGPT will surface me," the citation numbers say it probably won't. Reddit shapes ChatGPT's thinking; the search index gets the credit.
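The 67.8% figure quoted earlier in this section can be roughly reproduced from the per-source counts and citation rates in the study summary. The sketch below is illustrative only. The helper names are hypothetical, and the inputs are the rounded figures from this post, not Ahrefs's raw data.

```python
# (retrieved URLs, citation rate) per source, as quoted in this post.
SOURCES = {
    "search": (25_500_000, 0.8846),
    "news": (3_940_000, 0.1201),
    "reddit": (16_180_000, 0.0193),
    "youtube": (953_000, 0.0051),
    "academia": (185_000, 0.0040),
}


def non_cited(count, rate):
    """URLs that were retrieved but never shown as a citation."""
    return count * (1 - rate)


def share_of_non_cited(sources, name):
    """Fraction of all non-cited URLs that came from one source."""
    total = sum(non_cited(count, rate) for count, rate in sources.values())
    return non_cited(*sources[name]) / total


reddit_share = share_of_non_cited(SOURCES, "reddit")
print(f"Reddit share of non-cited URLs: {reddit_share:.1%}")
```

On these rounded inputs, Reddit accounts for about 67.8% of the non-cited pool, matching the figure reported above.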
What actually makes ChatGPT cite a page?
Three things, in roughly this order of weight, based on the Ahrefs correlations:
1. Did the page rank in search? This is the single biggest signal. Pages retrieved via the search ref_type are cited 88.46% of the time. If you are not ranking in normal search results for the query, ChatGPT is unlikely to pull you in, and it cannot cite a page it never retrieved. Classic SEO is the foundation of AI search visibility, not a separate game.
2. Does the title semantically match the question's sub-queries? The study measured cosine similarity (a 0-to-1 score of how close two pieces of text are in meaning) between prompt and title. Cited pages scored 0.602 on average; non-cited pages scored 0.484. More importantly, when Ahrefs measured similarity against ChatGPT's internal "fanout queries" - the sub-questions the model invents while answering you - cited pages scored 0.656. Titles that mirror how an AI assistant would phrase a sub-question outperform titles tuned only to the original prompt.
3. Does the URL slug read like English? Pages with natural-language slugs (e.g. /best-mesh-router-uk) were cited 89.78% of the time. Pages with opaque slugs (numeric IDs, hash strings, vendor-database paths) were cited 81.11%. Not a huge gap, but real - and free to act on.
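To make the cosine similarity in point 2 concrete, here is a toy sketch. It is an assumption-heavy stand-in: the study presumably compared embedding vectors that capture meaning, whereas this version compares raw word counts, so it only rewards shared words. The fanout query and both titles are hypothetical examples, and the scores are not comparable to the 0.602 or 0.656 figures above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Bag-of-words cosine similarity (0 to 1) between two strings.

    Toy stand-in for embedding-based similarity: it counts shared
    words rather than comparing meaning.
    """
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sub-question the model might generate, and two candidate titles.
fanout_query = "what is the best mesh router for a uk home in 2026"
keyword_title = "Best mesh router UK 2026"
question_title = "What's the best mesh router for a UK home in 2026?"

keyword_score = cosine_similarity(fanout_query, keyword_title)
question_score = cosine_similarity(fanout_query, question_title)
print(f"keyword title:  {keyword_score:.2f}")
print(f"question title: {question_score:.2f}")
```

Even in this crude form, the question-shaped title scores higher against the sub-question than the keyword phrase does. A real embedding model would weigh filler words less heavily, so treat the size of the gap as illustrative only.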
Does freshness matter?
Less than you might think. Across cited pages, the median age was around 500 days, and Ahrefs recorded cited pages as old as 2,700 days (more than seven years). News is the exception - freshness acts as a tiebreaker there when relevance scores are equal. Everywhere else, relevance trumps recency.
This matters because the standard SEO instinct of constantly republishing the same post with a new date doesn't show up in this data as a meaningful citation lever. Better to write something worth citing once than to re-date the same article every quarter.
What should a small UK site owner do differently?
Four things, in order of effort-to-impact ratio:
- Write titles that match the question, not the keyword. "Best mesh router UK 2026" is a keyword phrase. "What's the best mesh router for a UK home in 2026?" is the sub-question ChatGPT will internally generate while answering. They look similar; the latter scores higher on semantic similarity. Mix the noun-phrase form (for normal search ranking) with question-shape H2s inside the post (for AI sub-query matching).
- Use natural-language URL slugs from day one. Most modern CMSes default to this; double-check that anything you publish has a slug like /dog-friendly-pubs-cornwall/ rather than /p/2387/article-final-v3. If you have a legacy archive with opaque slugs, leave them alone unless you can 301-redirect cleanly - changing URLs for citation lift is rarely worth the SEO risk.
- Treat normal search ranking as your primary AI-visibility tactic. URLs retrieved via the search ref_type are cited about 88% of the time. Anything that ranks you in Google ranks you in ChatGPT's search retrieval. Internal links, schema markup, fast page load, useful original content - all the things you would do anyway. There is no separate "GEO" (Generative Engine Optimisation) playbook that ignores SEO and works.
- Stop expecting Reddit traffic to flow to your site via ChatGPT. Reddit posts can drive direct subreddit clicks if helpful, but ChatGPT is not going to surface your username or domain via Reddit citations at any meaningful rate. If you want ChatGPT visibility, the lever is on your own site, not r/yourniche.
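For the slug advice above, most CMSes already generate readable slugs, but if you build URLs yourself the idea can be sketched in a few lines. This is a minimal illustration with a hypothetical slugify helper, not the logic any particular CMS uses. It strips accents and punctuation and joins the remaining words with hyphens.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a post title into a lowercase, hyphen-separated URL slug."""
    # Decompose accented characters, then drop anything non-ASCII.
    text = unicodedata.normalize("NFKD", title)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Remove apostrophes so "What's" becomes "whats" rather than "what-s".
    text = text.replace("'", "")
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)


print("/" + slugify("Dog-Friendly Pubs, Cornwall") + "/")
```

The printed path is /dog-friendly-pubs-cornwall/. Apply this only to new posts. For existing URLs, the 301-redirect caution above still holds.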
How does this fit with the wider AI-search picture?
Two pieces of context worth knowing alongside the Ahrefs data:
5W Research found Wikipedia and Reddit together drive over 25% of ChatGPT citations in US results. That looks like a contradiction of Ahrefs - it isn't. The 5W study measured the cited-source mix across visible results; Ahrefs measured citation rate against the broader retrieved pool. Both can be true: Reddit accounts for a meaningful slice of the cited URLs in absolute terms because it gets retrieved so much, but each individual Reddit URL has a low chance of being cited.
The model and the index will both change. The Ahrefs data is from ChatGPT 5.2 with the Feb 2025 retrieval mix. OpenAI ships changes to the retrieval pipeline several times a year. The structural lessons - rank in search, write titles that match questions, use readable URLs - will probably keep holding because they map to underlying retrieval principles. The exact citation rates per source will move.
Frequently asked questions
Q01 Will ChatGPT eventually cite Reddit more?
Q02 Is this the same as 'GEO' (Generative Engine Optimisation)?
Q03 Does this apply to Perplexity, Google AIO, and Claude with web search too?
Q04 How long should a post be to get cited?
Q05 Should I add my site to ChatGPT directly somehow?