
Why your long-form guides don't get cited in ChatGPT

By Jason Roy

If you've spent the last few years building "ultimate guides" (long pages that cover every angle of a topic), the data has some uncomfortable news.

AirOps ran a comprehensive study of 353,799 pages and 16,851 queries, which revealed some very interesting insights into how ChatGPT retrieves and cites content.

Perhaps the most useful one is that covering 100% of a topic's subtopics adds just 4.6 percentage points to your ChatGPT citation rate compared to covering none of them. Four. Point. Six.

That's nearly no return on what is often hundreds of hours of content work.

The reason has to do with how ChatGPT retrieves content. When someone asks a question, the model doesn't grab one page and read it top to bottom. It breaks the query into sub-queries and retrieves pages for each one separately. A focused page that cleanly answers one specific sub-query will outperform a sprawling guide that technically answers all of them, because it matches each sub-query better in retrieval.

The analysis confirms this directly: pages covering 26–50% of a topic's subtopics outperform pages covering 100%. The exhaustive guide actively performs worse.

If you have long guides on your site, the useful exercise isn't trimming them. It's auditing them. List every distinct question the page answers. Each of those questions is a candidate for its own standalone page. That's the structural shift the data is pointing to.
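The audit itself is easy to start mechanically. As a minimal sketch using only the standard library, you can pull the subheadings out of a guide's HTML, since each subheading usually marks a distinct question and therefore a candidate standalone page. The class name and sample HTML below are illustrative, not from the study:

```python
from html.parser import HTMLParser

class HeadingAudit(HTMLParser):
    """Collect h2/h3 headings -- each one is a candidate standalone page."""
    def __init__(self):
        super().__init__()
        self._in_heading = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "h3"):
            self._in_heading = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag in ("h2", "h3"):
            self._in_heading = False

    def handle_data(self, data):
        if self._in_heading:
            self.headings[-1] += data.strip()

# Hypothetical "ultimate guide" markup for illustration.
guide_html = """
<h1>The Ultimate Guide to Meta Descriptions</h1>
<h2>How long should a meta description be?</h2>
<h2>Do meta descriptions affect rankings?</h2>
<h2>How do I write one for a product page?</h2>
"""

audit = HeadingAudit()
audit.feed(guide_html)
for question in audit.headings:
    print(question)
```

Three subheadings, three candidate pages. That list is your split map.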

Headings need to match the query, not label the content

When ChatGPT retrieves a page for a specific sub-query, it looks for structural signals that the page actually answers that question. Headings are one of the clearest signals available, and most content teams are getting this wrong.

The data is specific: pages where headings match the query at a score of 0.90 or above earn a 41% citation rate. Pages with weaker heading matches sit at 30%. That's an 11-point gap from one formatting decision.

Consider the difference between "Meta description best practices" and "How long should a meta description be?" The first labels a category of content. The second matches the way someone would actually ask the question, and it tells the retrieval system that this page answers a specific query rather than merely covering a topic area.

Most content teams write headings from the inside out. They know what the page covers and label it accordingly. ChatGPT retrieval works from the outside in. It's looking for pages whose structure signals a match to a specific question. Those two approaches produce very different heading styles.

The fix is straightforward. 

Pull your top queries from Google Search Console and compare them against your current headings. Where your heading is a topic label instead of a question, rewrite it. You don't need to touch the content underneath. Just the heading.
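You can triage that comparison programmatically. The study's 0.90 match score almost certainly comes from an embedding model, so the stdlib sketch below is only a cheap lexical stand-in for a first pass, not the study's metric; the function name and sample strings are illustrative:

```python
from difflib import SequenceMatcher

def heading_match(heading: str, query: str) -> float:
    """Rough lexical similarity between a heading and a search query.
    A crude proxy: the study's scores likely use embeddings, so treat
    this only as a way to rank which headings to rewrite first."""
    return SequenceMatcher(None, heading.lower(), query.lower()).ratio()

query = "how long should a meta description be"
print(heading_match("Meta description best practices", query))        # low
print(heading_match("How long should a meta description be?", query)) # near 1.0
```

Sort your headings by this score ascending and you have a prioritized rewrite queue: the topic labels surface first, the question-shaped headings last.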

There's a length ceiling, not just a floor

The study puts the optimal length range at 500–2,000 words, with 7–20 subheadings. Both ends of that range matter, and the upper end is the one most content teams ignore.

Below 500 words, a page typically doesn't have enough substance to satisfy a sub-query. That's the floor most people already know about. But above 2,000 words, a different problem kicks in. Pages that long are almost always covering multiple distinct questions. That breadth dilutes focus, and diluted focus is exactly what hurts citation rate.

A 900-word page built around one question will outperform a 3,000-word guide covering that same question plus eight related ones. Length is a symptom of scope, not a measure of quality: the longer a page runs, the more likely it is covering multiple questions and losing focus.

Filter your content by word count. Pull every page over 2,500 words and ask how many distinct questions it answers. If the number is more than one or two, those questions are candidates for separate pages. 

The goal isn't to trim; trimming doesn't fix the underlying structure. It's to split by question.
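If your content lives in files, that filter is a few lines. A minimal sketch, assuming Markdown source files in a content directory (adjust the glob for your stack); the function name and threshold argument are illustrative:

```python
import re
from pathlib import Path

def overlong_pages(content_dir, limit=2500):
    """Yield (path, word_count) for pages over the word limit --
    the split candidates sitting above the study's 500-2,000 range."""
    for path in Path(content_dir).rglob("*.md"):
        words = len(re.findall(r"\w+", path.read_text(encoding="utf-8")))
        if words > limit:
            yield path, words

# For each hit, ask the only question that matters:
# how many distinct questions does this page answer?
for path, words in overlong_pages("content/"):
    print(f"{path}: {words} words")
```

The word count is a trigger for the audit, not a verdict; a 3,000-word page answering exactly one question can stay as it is.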

Citations are binary (build for the 25%, not the 58%)

Here's the finding that should change how you think about optimization: 58% of pages retrieved by ChatGPT are never cited. A further 17% are cited inconsistently. Only 25% of retrieved pages are cited consistently: every time they appear in retrieval, they get cited.

That 25% group isn't made up of pages that scored marginally better across various metrics. The gap between the cited group and the uncited group is structural. Pages in the 25% share a consistent set of characteristics: focused coverage, strong heading-to-query match, 500–2,000 words, and a strong retrieval rank.

This matters for how you decide where to spend time. If a page is being retrieved but not cited, the instinct is to optimize it: adjust headings, add content, tweak structure. 

But the data suggests that incremental work on a structurally weak page isn't the path. There's no meaningful middle ground between the 25% and the 58%.

The useful diagnostic is to check which of your pages appear in ChatGPT responses but don't get cited. Those pages aren't broken in a way that's easy to patch. Stop working on them. The better use of that time is building new pages that are structurally sound from the start: focused, question-led, within the length range, and targeting specific sub-queries.

The Wikipedia exception, and why it's not a model to follow

Wikipedia sits at a median retrieval rank of 24, has the lowest query-match scores in the analysis at 0.576, and yet earns a 59% citation rate. On every structural measure the data identifies as important, Wikipedia is a poor performer. And it still gets cited more consistently than almost everything else.

The explanation isn't complicated. Wikipedia's citation rate comes from two decades of institutional authority built across millions of pages and a link graph that no other site comes close to replicating. That authority is baked into retrieval in a way that no content strategy decision will change.

This matters because Wikipedia's numbers can be read as evidence that exhaustive, encyclopedic content works. It doesn't. 

Treat it as a statistical outlier with its own explanation, not a template. If you find yourself using Wikipedia's citation rate to justify building longer, broader content, the data is pointing the other way.

What to do this week

  • Pull your ten longest pages and list every distinct question each one answers. Any page covering more than two questions is a split candidate. Map out one focused page per question before you do anything else.
  • Open Google Search Console and pull your top 50 queries for the pages you care about. Compare each query against the headings on the matching page. Rewrite any heading that labels a topic instead of asking a question.
  • Search ChatGPT for 5 to 10 queries your content targets and note which of your pages appear. For any page that appears but doesn't get cited, stop further work on it and move it to a rebuild list.
  • For pages already in the 500–2,000 word range with focused coverage, check heading-to-query alignment. This group is closest to the 25% citation threshold, and the work is usually a heading rewrite, not a full page overhaul.
  • Build one new focused page this week from a question you identified in step one. Keep it under 2,000 words, write the heading as the question itself, and cover that question only.

Want to see how your existing content is performing technically before you start restructuring? Run a free audit at seositecheckup.com. It checks the signals that matter in about 2 minutes.
