In “The Science Of How AI Pays Attention,” I analyzed 1.2 million ChatGPT responses to understand exactly how AI reads a page. In “The Science Of How AI Picks Its Sources,” I analyzed 98,000 citation rows to understand which pages make it into the reading pool at all.
This is Part 3.
Where Part 1 told you where on a page AI looks, and Part 2 told you which pages AI routinely considers, this one tells you what AI actually rewards inside the content it reads.
The data clarifies:
- Most AI SEO writing advice does not hold at scale. There is no universal “write like this to get cited” formula: the signals that lift one industry’s citation rates can actively hurt another.
- The entity types that predict citation are not the ones being targeted. DATE and NUMBER are universal positives. PRICE lowers citation in 5 of 6 verticals, and KG-verified entities are a negative signal.
- The one writing signal that holds across all seven verticals: declarative language in your intro, +14% aggregate lift.
- Heading structure is binary. Commit to the right number for your vertical or use none. Three to four headings are worse than zero in every vertical.
- Corporate content dominates. Reddit does not. AI citation behavior does not mirror what happened to organic search in 2023–2024.
1. Specific Writing Signals Influence Citation, While Others Harm It
While “The Science Of How AI Pays Attention” covers parts of the page and types of writing that influence ChatGPT visibility, I wanted to understand which writing-level signals (word count, structure, language style) predict higher AI citation rates across verticals.
Approach
- I compared high-cited pages (more than three unique prompt citations) vs. low-cited pages across seven writing metrics: word count, definitive language, hedging, list items, named entity density, and intro-specific signals.
- I analyzed the first 1,000 words for list item count, named entity density, intro definitive language token density, and intro number count.
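To make the comparison concrete, here is a minimal Python sketch of how a per-metric multiplier such as “1.59x” could be computed. The high-cited threshold (more than three unique prompt citations) comes from the approach above; everything else is an assumption for illustration, including the field names `unique_prompt_citations` and `word_count` and the choice to divide the high-cited mean by the low-cited mean. The study does not state exactly how its multipliers were calculated.

```python
from statistics import mean


def is_high_cited(unique_prompt_citations: int) -> bool:
    # Threshold taken from the article: more than three unique prompt citations.
    return unique_prompt_citations > 3


def metric_ratio(pages, metric):
    """Ratio of the high-cited mean to the low-cited mean for one metric.

    `pages` is a list of dicts; the keys used here are hypothetical.
    Returns None when a group is empty or the low-cited mean is zero.
    """
    high = [p[metric] for p in pages if is_high_cited(p["unique_prompt_citations"])]
    low = [p[metric] for p in pages if not is_high_cited(p["unique_prompt_citations"])]
    if not high or not low:
        return None
    low_mean = mean(low)
    if low_mean == 0:
        return None
    return mean(high) / low_mean


# Toy data, not from the study.
sample_pages = [
    {"unique_prompt_citations": 5, "word_count": 3000},
    {"unique_prompt_citations": 4, "word_count": 1000},
    {"unique_prompt_citations": 3, "word_count": 1000},
    {"unique_prompt_citations": 1, "word_count": 1000},
]
word_count_ratio = metric_ratio(sample_pages, "word_count")
```

A ratio above 1 means high-cited pages score higher on that metric than low-cited pages; a ratio near 1 is what the article calls a flat signal.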
Results: Across all verticals, definitive phrasing and including relevant entities matter. But most signals are flat.
What The Industry Patterns Showed
When splitting the data up by vertical, we immediately see preferences:
- The total word count lift was highest in CRM/SaaS (1.59x).
- Finance was an anomaly on word count: shorter pages win (0.86x word count).
- Definitive phrases in the first 1,000 characters were positive for most verticals.
- Education is a signal void. Writing style explains almost nothing about citation likelihood there.
Top Takeaways
1. There is no universal “write like this to get cited” formula. For example, the signals that lift CRM/SaaS citation rates actively hurt Finance. Instead, match content format to vertical norms.
2. The one universal rule: open with a direct declarative statement. Not a question, not context-setting, not preamble. The form is “[X] is [Y]” or “[X] does [Z].” This is the only writing instruction that holds regardless of vertical, content type, or length.
3. LLMs “penalize” hedging in your intro. “This may help teams understand” performs worse than “Teams that do X see Y.” Remove qualifiers from your opening paragraph before any other optimization.
2. The Entity Types That Predict Citation Are Not The Ones Being Targeted
Most AEO advice focuses on named entities as a category: pack in more well-known brands, tool names, numbers. The cross-vertical entity type analysis below tells a more specific (and more useful) story.
Approach
- Ran Google’s Natural Language API on the first 1,000 characters (about 200–250 words) of each unique URL.
- Calculated lift per entity type: % of high-cited pages with that type / % of low-cited pages.
- Analyzed 5,000 pages across seven verticals.
* A quick note on terminology: Google NLP classifies software, apps, and SaaS tools as CONSUMER_GOOD, a legacy label from when the API was built for physical retail. Throughout this analysis, CONSUMER_GOOD means software/product entities.
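The lift formula in the approach can be sketched in a few lines of Python. This is an illustrative sketch, not the study’s actual pipeline: it assumes entity types have already been extracted for each page (the API call itself is not shown), and the `high_cited` and `entity_types` field names are hypothetical.

```python
def entity_type_lift(pages, entity_type):
    """Lift = share of high-cited pages containing the entity type
    divided by the share of low-cited pages containing it.

    `pages` is a list of dicts with hypothetical keys:
      - "high_cited": bool
      - "entity_types": set of entity type labels found in the intro
    Returns None when a group is empty or the low-cited share is zero.
    """
    high = [p for p in pages if p["high_cited"]]
    low = [p for p in pages if not p["high_cited"]]
    if not high or not low:
        return None
    high_share = sum(entity_type in p["entity_types"] for p in high) / len(high)
    low_share = sum(entity_type in p["entity_types"] for p in low) / len(low)
    if low_share == 0:
        return None
    return high_share / low_share


# Toy data, not from the study.
sample_pages = [
    {"high_cited": True, "entity_types": {"DATE", "NUMBER"}},
    {"high_cited": True, "entity_types": {"DATE", "PRICE"}},
    {"high_cited": False, "entity_types": {"DATE", "PRICE"}},
    {"high_cited": False, "entity_types": {"PRICE"}},
]
date_lift = entity_type_lift(sample_pages, "DATE")
price_lift = entity_type_lift(sample_pages, "PRICE")
```

In the toy data, DATE appears in every high-cited page but only half of the low-cited ones, so its lift is above 1; PRICE shows the reverse pattern and lands below 1.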
Results: DATE and NUMBER are the most universal positive signals. Surprisingly, PRICE is the strongest universal negative.
What The Industry Patterns Showed
- DATE is the most universal positive signal, with the exception of Finance (0.65x).
- NUMBER is the second most universal. Specific counts, metrics, and statistics in the intro consistently predict higher citation rates. Finance (0.98x) and Product Analytics (1.10x) mark the floor and ceiling of that range.
- PRICE is the strongest universal negative. Pages that open with pricing signal commercial intent. Finance is the sole exception at 1.16x, likely because price here means fee percentages and rate comparisons, which are the actual reference data financial queries are looking for.
- CONSUMER_GOOD (software/product entities) is mixed. In Healthcare, product entities signal established brands and tools. In Crypto, naming specific protocols and products is core to answering technical questions.
- PHONE_NUMBER is a positive signal in Healthcare (1.41x) and Education (1.40x). In both cases, it is almost certainly a proxy for established brands/institutions/providers with real physical presence, not a literal signal to add phone numbers to your pages.
The Knowledge Graph inversion deserves its own note here:
- The data showed that high-cited pages average 1.42 KG-verified entities vs. 1.75 for low-cited pages (lift: 0.81x).
- Pages built around well-known, KG-verified entities (major brands, institutions, famous people) tend toward generic coverage, which isn’t preferred by ChatGPT.
- High-cited pages are dense with specific, niche entities: a particular methodology, a precise statistic, a named comparison. Many of those niche entities have no KG entries at all. That specificity is what AI reaches for.
Top Takeaways
1. Add the publish date to your pages and aim to use at least one specific number in your content. That combination is the closest thing to a universal AI citation signal this dataset produced. But Finance gets there through price data and location specificity instead.
2. Avoid opening with pricing in non-finance verticals. Price-dominant intros correlate with lower citation rates.
3. KG presence and brand authority do not translate to an AI citation advantage. Chasing Wikipedia entries, brand panels, or KG verification is the wrong lever. Specific, niche entities (even ones without KG entries) outperform famous ones.
3. Heading Structure: Commit To One Or Don’t Bother
We know headings matter for citations from the previous two analyses. Next, I wanted to understand whether heading count predicts citation rates and whether the optimal structure differs by vertical.
Approach
- Counted total headings per page (H1 + H2 + H3) across all cited URLs.
- Grouped pages into seven heading-count buckets: 0, 1–2, 3–4, 5–9, 10–19, 20–49, 50+.
- Calculated high-cited rate (% of URLs that are high-cited) per bucket per vertical.
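The bucketing and rate calculation above can be sketched as follows. The bucket boundaries are the ones listed in the approach; the page fields (`vertical`, `heading_count`, `high_cited`) are hypothetical names chosen for this illustration, and the sample rows are toy data.

```python
from collections import defaultdict

# (lower bound, upper bound, label) for the closed buckets; 50+ is the fallback.
BUCKETS = [
    (0, 0, "0"),
    (1, 2, "1-2"),
    (3, 4, "3-4"),
    (5, 9, "5-9"),
    (10, 19, "10-19"),
    (20, 49, "20-49"),
]


def heading_bucket(count: int) -> str:
    """Map a total heading count (H1 + H2 + H3) to its bucket label."""
    if count < 0:
        raise ValueError("heading count cannot be negative")
    for lower, upper, label in BUCKETS:
        if lower <= count <= upper:
            return label
    return "50+"


def high_cited_rate_by_bucket(pages):
    """Share of high-cited URLs per (vertical, bucket) pair."""
    totals = defaultdict(int)
    highs = defaultdict(int)
    for page in pages:
        key = (page["vertical"], heading_bucket(page["heading_count"]))
        totals[key] += 1
        if page["high_cited"]:
            highs[key] += 1
    return {key: highs[key] / totals[key] for key in totals}


# Toy data, not from the study.
sample_pages = [
    {"vertical": "Crypto", "heading_count": 6, "high_cited": True},
    {"vertical": "Crypto", "heading_count": 8, "high_cited": False},
    {"vertical": "Crypto", "heading_count": 3, "high_cited": False},
    {"vertical": "Healthcare", "heading_count": 0, "high_cited": True},
]
rates = high_cited_rate_by_bucket(sample_pages)
```

Comparing each bucket’s rate against the vertical’s overall high-cited rate is what surfaces patterns like the 3–4 heading dead zone.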
Results: Including more headings in your content is not universally better. The sweet spot depends on vertical and content type. One finding holds everywhere: oddly, three to four headings are worse than zero.
What The Industry Patterns Showed
- CRM/SaaS is the only vertical where the 20+ heading lift is confirmed: 12.7% high-cited rate at 20–49 headings vs. a 5.9% baseline. The 50+ bucket reaches 18.2%. Long structured reference pages and comparison guides with one section per tool outperform everything else here.
- Healthcare inverts most sharply. The high-cited rate falls from 15.1% at zero headings to 2.5% at 20–49 headings. A page with 30 H2s on telehealth topics signals optimization intent, not clinical authority.
- Finance peaks at 10–19 headings (29.4% high-cited rate). Structured but not exhaustive: think rate tables, regulatory breakdowns, and advisor comparison pages with moderate heading depth.
- Crypto peaks at five to nine headings (34.7% high-cited rate). Technical documentation in this vertical tends toward dense prose with moderate navigation structure. Over-structuring breaks up the technical depth.
- Education is flat across all heading counts, which is consistent with the writing signals finding. Heading structure explains almost nothing about citation likelihood in education content.
- The three-to-four heading dead zone holds across every vertical without exception. Partial structure confuses AI navigation without providing the full benefit of a committed hierarchy.
Top Takeaways
1. The 20+ heading finding from Part 1 is a CRM/SaaS finding, not a universal one. Applying it to healthcare, education, or finance could actively lower citation rates in those verticals.
2. The principle that holds everywhere: Commit to structure or don’t use it. The middle ground costs you in every vertical. A fully structured page with the right heading depth outperforms a half-structured page in every vertical.
3. Use the optimal heading range for your vertical. Crypto: 5–9. Finance and Education: 10–19. CRM/SaaS: 20+ (with H3s). Healthcare: 0, or 5–9 at most. Long CRM reference pages with 50+ sections are the one case where maximum heading depth pays off.
4. UGC Does Not Dominate
The “Reddit effect” reshaped organic search between 2024 and 2025. I wanted to understand whether ChatGPT cites user-generated content (Reddit, forums, reviews) at meaningful rates or whether corporate/editorial content dominates.
The common industry assumption, that AI also preferentially cites community voices, is not what we found in the data.
Approach
- Classified each cited URL by domain as either (1) UGC (Reddit, Quora, Stack Overflow, forum subdomains, Medium, Substack, Product Hunt, Tumblr, or community/forum prefixes) or (2) corporate/editorial.
- Calculated citation share per category per vertical.
- Dataset: 98,217 citations across seven verticals.
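A rough Python sketch of this classification step follows. The platform list mirrors the one in the approach, but the exact domain set, the subdomain prefixes (`forum.`, `forums.`, `community.`), and the `(vertical, url)` input shape are assumptions made for illustration; the study’s real rules may differ.

```python
from collections import Counter
from urllib.parse import urlsplit

# Platforms named in the article; the concrete domain strings are assumed.
UGC_DOMAINS = {
    "reddit.com", "quora.com", "stackoverflow.com", "medium.com",
    "substack.com", "producthunt.com", "tumblr.com",
}
# Hypothetical prefixes standing in for "forum subdomains" and
# "community/forum prefixes".
UGC_PREFIXES = ("forum.", "forums.", "community.")


def classify_url(url: str) -> str:
    """Return "ugc" or "corporate_editorial" based on the URL's host.

    URLs without a scheme have no parsable host and fall through to
    "corporate_editorial".
    """
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if any(host == d or host.endswith("." + d) for d in UGC_DOMAINS):
        return "ugc"
    if host.startswith(UGC_PREFIXES):
        return "ugc"
    return "corporate_editorial"


def ugc_share_by_vertical(citations):
    """`citations` is an iterable of (vertical, url) pairs."""
    totals = Counter()
    ugc = Counter()
    for vertical, url in citations:
        totals[vertical] += 1
        if classify_url(url) == "ugc":
            ugc[vertical] += 1
    return {vertical: ugc[vertical] / totals[vertical] for vertical in totals}


# Toy data, not from the study.
sample_citations = [
    ("Crypto", "https://www.reddit.com/r/ethereum/comments/abc"),
    ("Crypto", "https://docs.example.com/protocol"),
    ("Finance", "https://www.example.com/rates"),
]
shares = ugc_share_by_vertical(sample_citations)
```

Matching on the host rather than the full URL keeps the rule set small, at the cost of missing community sections hosted under a path such as `/community/` on a corporate domain.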
Results: Corporate content accounts for 94.7% of all citations. UGC is nearly invisible.
What The Industry Patterns Showed
- Finance is the most corporate-locked vertical at 0.5% UGC. YMYL (Your Money, Your Life) content appears to systematically suppress citations to community opinion.
- Healthcare sits at 1.8% UGC for the same structural reason. Clinical, telehealth, and HIPAA content draws almost exclusively from institutional sources.
- Crypto has the highest UGC penetration in the dataset at 9.2%. Community-generated content (Reddit technical threads, Medium tutorials, developer forum posts) answers a meaningful share of the analyzed queries. In a fast-moving technical niche where official documentation consistently lags, community posts fill the gap.
- Product Analytics and HR Tech sit at 6.9% and 5.8% UGC. Both are verticals where Reddit comparison threads and product review communities provide genuine signal alongside corporate content.
Top Takeaways
1. The “Reddit effect” in SEO has not translated proportionally to AI citations. In most verticals, reddit.com captures 2–5% of total citations. This finding is in line with other industry research, including this report from Profound.
2. For finance and healthcare: UGC has near-zero AI citation value. Invest in structured, authoritative corporate content with clear sourcing. Community engagement may matter for other reasons, but it does not contribute meaningfully to AI citation share in these verticals.
3. For crypto, product analytics, and HR tech: Community presence has measurable citation value. Detailed Reddit comparison threads, technical Medium posts, and structured developer forum answers can supplement corporate content reach.
What This Means For How You Strategize For LLM Visibility
Across all three parts of this research, the consistent finding is that AI citation is not primarily a writing quality problem.
Part 2 showed it is a content architecture problem: Thin single-intent pages are structurally locked out regardless of how well they’re written. This piece shows the same logic applies inside the content itself.
The aggregate writing signals table is the most important chart in this analysis. Not because it shows you what to do, but because it shows how much of what the AI SEO/GEO/AEO industry is telling you does not survive cross-vertical scrutiny. Word count, list density, named entity counts … all flat or negative at the aggregate. The signals that work are vertical-specific and smaller than our industry’s consensus implies.
The meta-lesson from this analysis is that findings are vertical (and probably topic) specific, which is no different in SEO.
This part concludes the Science of AI, for now, because the AI ecosystem is constantly changing.
Methodology
We analyzed ~98,000 ChatGPT citation rows pulled from roughly 1.2 million ChatGPT responses from Gauge.
Because AI behaves differently depending on the topic, we isolated the data across seven distinct, verified verticals to ensure the findings weren’t skewed by one particular industry.
Analyzed verticals:
- B2B SaaS
- Finance
- Healthcare
- Education
- Crypto
- HR Tech
- Product Analytics
Featured Image: CoreDESIGN/Shutterstock; Paulo Bobita/Search Engine Journal
Original coverage: www.searchenginejournal.com

