GEO · 11 min read
Passage-Level Optimization: The Inverted Pyramid for AI
Summary
AI engines extract 120-180 word chunks, not whole pages. Here's the inverted pyramid pattern that gets passages cited — with concrete examples.
By The Foundgrove team · Published June 15, 2026 · Updated June 29, 2026
Most agencies still write pages like magazine articles: a clever lede, three or four meandering paragraphs, then finally the substance. That pattern was fine in 2018 when humans read the whole page. It is fatal in 2026 when AI engines extract 120-180 word chunks and stitch them into answers. The inverted pyramid is the single highest-leverage formatting change you can make for GEO.
This post is the operator manual for passage-level optimization. For the broader GEO context, see the full pillar. For schema specifics, see the schema deep-dive.
Why do AI engines extract passages instead of whole pages?
Because the synthesis layer (the LLM that composes the answer) has a limited context window and a tight latency budget. Retrieving and embedding a whole 3,000-word page wastes tokens. Instead, the retrieval pipeline chunks each page into passages, indexes the chunks, retrieves the top-scoring chunks for a given query, and feeds only those chunks to the LLM. Typical chunk size is 120-180 words.
The implication: every passage on your page competes for citation independently. A strong passage buried deep in your blog post is irrelevant if the first passage on a competitor's page is cleaner. You're not optimizing a page; you're optimizing the 15-25 passages on it.
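The chunk-then-retrieve flow described above can be sketched in a few lines. This is a toy illustration, not how any particular engine works: the function names are hypothetical, the 150-word chunk size is simply the midpoint of the 120-180 range, and plain word overlap stands in for the embedding similarity a production pipeline would more likely use.

```python
import re

def chunk_words(text, size=150):
    """Split text into consecutive chunks of about `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def overlap_score(query, chunk):
    """Toy relevance score: share of query terms that appear in the chunk.

    Assumption: real pipelines typically score with embeddings, not overlap.
    """
    terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    chunk_terms = set(re.findall(r"[a-z0-9]+", chunk.lower()))
    return len(terms & chunk_terms) / len(terms) if terms else 0.0

def top_chunks(query, page_text, k=2, size=150):
    """Return the k best-scoring chunks; only these would reach the LLM."""
    chunks = chunk_words(page_text, size)
    return sorted(chunks, key=lambda c: overlap_score(query, c), reverse=True)[:k]
```

The point of the sketch is that ranking happens per chunk: the rest of the page never enters the comparison, which is why each passage has to stand on its own.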
What is the inverted pyramid pattern?
The inverted pyramid is a journalism structure where the most important information goes first, supporting facts come next, and context comes last. Applied to GEO, every H2 section opens with a 40-80 word answer capsule that fully answers the H2's question, then 1-3 supporting paragraphs add nuance, examples, and stats. It is the opposite of how marketing copy is usually written and the same as how news articles open.
- Sentence 1: complete answer in 15-25 words. Name the entity. Include a number if possible.
- Sentence 2: scope or qualifier. "For service businesses," "on transactional queries," "as of 2026."
- Sentence 3: supporting fact. A specific range, a comparison, a named tool.
- Optional sentence 4: an attributable source. Cite a real, linkable study (e.g. "per Ahrefs' AI search research") — never a benchmark you can't back.
- Total: 40-80 words. Below 40 = too thin. Above 80 = truncated.
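The capsule rules above are mechanical enough to check automatically. The following is a minimal sketch under stated assumptions: `check_capsule` is a hypothetical helper, sentences are split naively on terminal punctuation (so abbreviations like "Dr." will confuse it), and "includes a number" is approximated as "contains any digit".

```python
import re

def count_words(text):
    """Count whitespace-separated tokens that contain a letter or digit."""
    return sum(1 for tok in text.split() if any(ch.isalnum() for ch in tok))

def check_capsule(text):
    """Check a paragraph against the 40-80 word answer-capsule pattern."""
    # Naive sentence split: ., !, or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    total = count_words(text)
    first = count_words(sentences[0]) if sentences else 0
    return {
        "total_words": total,
        "length_ok": 40 <= total <= 80,          # capsule length rule
        "first_sentence_words": first,
        "first_sentence_ok": 15 <= first <= 25,  # sentence 1 rule
        "has_number": bool(re.search(r"\d", text)),
    }
```

A check like this only catches structural misses. Whether the first sentence actually answers the H2's question still needs a human read.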
What does a before/after rewrite look like for a dental practice?
Imagine the H2 "How much does a dental implant cost in Austin?". The bad version: "Great question — let's dive in! Dental implants are a serious investment, but the good news is that they can last a lifetime. The cost varies depending on several factors. Read on for our full guide." That's 35 words of nothing. AI engines will skip it.
The inverted-pyramid version: "A single dental implant in Austin costs $3,500-$6,000 including the post, abutment, and crown, with most practices clustered between $4,200 and $5,200. Full-mouth implant cases (All-on-4) run $24,000-$35,000 per arch. Pricing varies with implant brand (Straumann premium, Hahn mid-tier), bone-graft requirements, and whether sedation is included. Most Austin practices accept third-party financing via CareCredit or Cherry." That's 56 words with six dollar figures and six named entities (Austin, All-on-4, Straumann, Hahn, CareCredit, Cherry). It gets cited.
What does it look like for an HVAC company?
Take the H2 "How long does a new HVAC system take to install?". The bad version: "It depends on the size of your home and the system you choose, but most installations take a few days. Our team works efficiently to get your new system up and running fast." Useless to the AI.
The inverted-pyramid version: "A standard residential HVAC replacement takes 6-10 hours for a single-stage system in a 2,000-square-foot home, or 1-2 days for a multi-zone system or full ductwork replacement. Same-day installs are common for direct one-for-one replacements with no electrical or duct modifications. Complex jobs (geothermal, dual-zone variable-speed, or new-construction installs) can run 3-5 days. Most contractors quote a fixed timeline upfront after a 30-minute site assessment." 65 words, 5 numbers, names the system types. Citable.
What does it look like for a personal injury law firm?
Take the H2 "How long does a personal injury case take to settle in Texas?". The bad version: "Every case is different. Some settle quickly, others take longer. A dedicated attorney will fight hard for the best possible outcome." That answers nothing.
The inverted-pyramid version: "Most Texas personal injury cases settle in 6-18 months from the date of injury, with simple soft-tissue claims often resolving in 4-9 months and cases involving surgery, disputed liability, or commercial defendants typically running 12-24 months. Cases that go to trial add another 12-18 months. Texas's 2-year statute of limitations sets a hard deadline for filing suit, but settlement negotiations can continue past the filing date." 66 words, four time ranges plus the 2-year filing deadline, names the statute. Citable, and useful.
Where on the page should citation-bait passages go?
Toward the top. A large share of citations tends to come from content high on the page — not because the top is magic, but because most pages put their best definitions there and chunking pipelines surface early, well-formed passages readily. Treat "front-load the answer" as the operating rule and don't make the engine dig for your best material.
Practical implication: put your highest-priority H2 answer capsules in the first 3-5 H2 sections of the page. Save backstory, history, and tangential context for the bottom half. Don't bury the lede.
How long should the page itself be?
Counterintuitively, shorter, tightly structured pages tend to extract more cleanly per passage. A focused 800-word Q&A page gives a chunking pipeline an unambiguous answer to pull; a sprawling 3,000-word guide dilutes any single passage's prominence. This isn't about absolute citation count — long pillar pages still earn more total citations because they contain more passages. It's about per-passage extraction efficiency, and it's a directional rule of thumb, not a precise ratio.
- Short Q&A pages (800-1,200 words): highest extraction rate per passage. Best for high-intent commercial queries.
- Mid-length cluster posts (1,500-2,500 words): balanced extraction + depth. The sweet spot for most cluster content.
- Long pillar pages (3,000-5,000 words): lower per-passage extraction but higher total citations because of passage count. Necessary for topic-authority signal but not the citation workhorses.
- Mega-guides (5,000+ words): diminishing returns on extraction. Only worth it for true pillar pages with strong internal linking.
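The trade-off between per-passage efficiency and total passage count comes down to simple arithmetic. As a rough sketch, assuming a page is chunked end to end into 120-180 word passages with no overlap (real pipelines may overlap or split on headings), the number of competing passages per page can be estimated like this. `estimate_passages` is a hypothetical helper.

```python
import math

def estimate_passages(page_words, chunk_min=120, chunk_max=180):
    """Return (low, high) passage counts for a page of `page_words` words."""
    low = math.ceil(page_words / chunk_max)   # fewest chunks: largest chunk size
    high = math.ceil(page_words / chunk_min)  # most chunks: smallest chunk size
    return low, high

for label, words in [("Short Q&A", 800), ("Cluster post", 2000), ("Pillar", 3000)]:
    print(label, words, estimate_passages(words))
```

Under these assumptions an 800-word page yields about 5-7 passages and a 3,000-word pillar about 17-25, which is why the pillar can earn more total citations even if each individual passage is less likely to be pulled.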
What rhythm and sentence length works best?
Vary it. The most common AI-tell is a paragraph of 3-4 sentences of nearly identical length and rhythm — that's how LLMs generate by default, and Google's helpful-content classifier picks it up. Mix sentence lengths: a 22-word lead, an 8-word punch, a 14-word qualifier. Use occasional sentence fragments for emphasis. Write the way an experienced operator talks, not the way a junior writer thinks an expert sounds.
Avoid the em-dash as a comma replacement. AI overuses these. Avoid the bulleted list of 4 identical-length items — also an AI-tell. Use specific numbers, named tools, and direct comparisons. Write like a craftsperson who knows the work, not a content factory.
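One way to spot the uniform-rhythm pattern before an editor does is to measure how much sentence length varies within a paragraph. The sketch below is a rough heuristic, not a detector any search engine is known to use: `rhythm_report` is a hypothetical helper, the sentence split is naive, and the `min_spread` threshold of 4 words is an arbitrary starting point to tune against your own copy.

```python
import re
import statistics

def sentence_lengths(paragraph):
    """Word count of each sentence, using a naive split on ., !, or ?"""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    return [len(s.split()) for s in sentences]

def rhythm_report(paragraph, min_spread=4.0):
    """Flag paragraphs of 3+ sentences that are all close to the same length."""
    lengths = sentence_lengths(paragraph)
    # Population standard deviation of sentence lengths, in words.
    spread = statistics.pstdev(lengths) if len(lengths) > 1 else 0.0
    return {
        "lengths": lengths,
        "spread": spread,
        "uniform": len(lengths) >= 3 and spread < min_spread,
    }
```

A paragraph flagged as uniform is a prompt to reread it aloud, not an automatic rewrite. Short paragraphs of one or two sentences are never flagged.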
How do you audit existing pages for passage extractability?
Three-step audit. Step 1: pull your top 25 commercial pages. Step 2: for each page, check whether the first paragraph under each H2 is a 40-80 word answer capsule. Score yes/no. Step 3: rewrite the no's. Most pages written for human readers rather than AI extraction score poorly on the first pass — the marketing intro buries the answer. The goal after a rewrite sprint is 90%+ yes, on the premise that cleanly extractable passages are what the synthesis layer can actually cite.
- Audit metric 1: percentage of H2 sections whose first paragraph is 40-80 words. Baseline target: 90%+.
- Audit metric 2: percentage of first paragraphs that name the entity in the first 15 words. Baseline target: 85%+.
- Audit metric 3: average count of specific numbers per H2 section. Baseline target: 2 or more.
- Audit metric 4: percentage of H2s phrased as questions. Baseline target: 80%+.
- Audit metric 5: presence of Article + FAQPage compound schema. Baseline target: 100% on Q&A pages.
Score each page against all five metrics. Pages scoring under 60% across the rubric are rewrite candidates. Pages scoring 80%+ are healthy. The work is straightforward: most pages take 45-90 minutes to rewrite once a writer has internalized the pattern. For 25 pages that is roughly 19-38 hours of pure rewriting, so budget 30-60 hours for a top-25-page sprint once audit and review time are included.
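The rubric can be turned into a small scoring script. The post does not say how the five metrics combine into one percentage, so this sketch assumes the simplest reading: a page's score is the share of the five baseline targets it meets. The `PageAudit` fields, the `rubric_score` and `verdict` helpers, and the "borderline" label for pages between 60% and 80% are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PageAudit:
    capsule_share: float       # metric 1: share of H2s opening with a 40-80 word paragraph
    entity_early_share: float  # metric 2: share naming the entity in the first 15 words
    avg_numbers_per_h2: float  # metric 3: average specific numbers per H2 section
    question_h2_share: float   # metric 4: share of H2s phrased as questions
    has_compound_schema: bool  # metric 5: Article + FAQPage schema present

def rubric_score(audit):
    """Assumed scoring: share of the five baseline targets the page meets."""
    checks = [
        audit.capsule_share >= 0.90,
        audit.entity_early_share >= 0.85,
        audit.avg_numbers_per_h2 >= 2,
        audit.question_h2_share >= 0.80,
        audit.has_compound_schema,
    ]
    return sum(checks) / len(checks)

def verdict(score):
    """Map a rubric score to the thresholds used in the audit."""
    if score < 0.60:
        return "rewrite candidate"
    if score >= 0.80:
        return "healthy"
    return "borderline"  # assumption: the 60-79% band is not named in the rubric
```

With five pass/fail checks, a page needs at least four targets met to count as healthy and falls into the rewrite queue at two or fewer.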
If you want this run for you, book a strategy call and we'll quote the audit + rewrite work. For dental practices specifically, see our dental SEO program. For the full GEO context, the SEO pillar covers the whole program.
Where does this fit in your stack?
If you're running a US service business, the playbook in this post pairs with our full services lineup and applies cleanly across our supported industries and US locations. If you want help implementing it, book a free strategy call — we'll review your current setup and prioritize the next three moves.
For the deeper engagement details, see our SEO service. New to the terminology here? Our SEO & marketing glossary defines every acronym in this post.
What are the most common questions about this topic?
Common questions readers send us about this topic.
How many words should each AI-extractable passage be?
Aim for a 40-80 word answer capsule at the top of each section, inside the 120-180 word chunk an engine typically extracts. Below 40 words is too thin: AI engines skip it because there isn't enough context to extract. Above 80 words risks truncation and reduces the chance of a clean pull. Lead with a 15-25 word complete answer, then add 2-3 supporting sentences with specific numbers, ranges, or named entities.
Where on the page should I put my most important passages?
Toward the top. A large share of citations tends to come from content high on the page — partly because most pages put their best definitions there, and partly because chunking pipelines surface early, well-formed passages readily. Put your highest-priority H2 answer capsules in the first 3-5 H2 sections and save backstory for the bottom half.
Does page length affect extraction rates?
As a rule of thumb, yes — shorter, tightly structured pages tend to surface clean passages more efficiently per chunk, while sprawling pages dilute any single passage's prominence. Long pillar pages still earn more total citations because they contain more passages; they just extract less efficiently per chunk. Treat this as directional guidance, not a precise ratio.
What's an answer capsule?
An answer capsule is a 40-80 word definition-first paragraph that opens an H2 section and fully answers the H2's question. It includes the named entity in the first sentence, at least one specific number, and a scope qualifier. It is the unit of content AI engines extract — and the single highest-leverage formatting change for GEO.
Should I write FAQ-style pages or long-form pages for GEO?
Both, for different jobs. FAQ-style 800-1,200 word pages have the highest per-passage extraction rate and are ideal for high-intent commercial queries. Long-form 3,000-5,000 word pillar pages earn more total citations because of passage count and build topic-authority signal. A balanced cluster (1 pillar + 5-8 supporting Q&A pages) covers both jobs.
Can I just add an FAQ block at the bottom and call it done?
No. FAQ blocks help, but they aren't a substitute for passage-level optimization across the whole page. AI engines extract from anywhere on the page, and a large share of citations comes from content high on the page. Restructure the main body with the inverted-pyramid pattern and add FAQ at the bottom as reinforcement.
What's the AI-tell I should avoid in my writing?
Paragraphs of 3-4 sentences of nearly identical length, bulleted lists of 4 items with identical rhythms, em-dashes used as comma replacements, and generic openers like 'In today's digital landscape.' Google's helpful-content classifier and Perplexity's content-quality signals both demote these patterns. Vary sentence length and write like an operator, not a content factory.
About Foundgrove
The Foundgrove team
Foundgrove helps US service businesses win qualified leads from search and AI. We write about the practical, measurable side of acquisition — what works in production, not what looks good in a conference deck.