Foundgrove

GEO · 9 min read

llms.txt Explained: Does It Work, and How to Do It Right

Summary

A growing set of domains has adopted llms.txt — but does GPTBot, ClaudeBot, or PerplexityBot actually fetch it? Here's the honest assessment and the right template.

By The Foundgrove team · Published May 28, 2026 · Updated June 29, 2026

llms.txt is the GEO topic that gets the most breathless coverage and the least honest assessment. The hype: it's the robots.txt of the AI era. The reality: as of late 2025, there is no public evidence that any major AI crawler fetches it at scale. The right move: publish it anyway, because the cost is one afternoon and the upside is real if adoption catches up. Here's the honest version of the story.

This post covers what llms.txt is, what the available evidence shows about crawler behavior, and exactly how to publish it correctly if you decide to. For the broader GEO context, see the pillar. For where to put your schema work instead, see the schema deep-dive.

What is llms.txt, exactly?

llms.txt is a proposed standard, introduced by Jeremy Howard (Answer.AI / fast.ai) in late 2024, for a markdown file at the root of your domain that tells AI crawlers and LLM tools which pages on your site are most important and provides clean markdown versions of your content. It is structurally similar to sitemap.xml but written for LLMs rather than search engines. The companion file llms-full.txt contains the actual concatenated content.

The standard lives at llmstxt.org and has a specific format: a top-level H1 with the site name, a blockquote with a short description, then sections of links organized by topic. Each link can have a short markdown description. It's deliberately simple — designed to be both human-readable and trivially parseable by any LLM.
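To illustrate how little machinery the format needs, here is a minimal Python sketch that pulls the title, description, and sectioned links out of an llms.txt string. The function name `parse_llms_txt`, the regex, and the example.com sample content are illustrative assumptions; the sketch handles only the `- [Title](url): description` link shape described at llmstxt.org, not every edge case a real parser would meet.

```python
import re

# Matches "- [Title](https://example.com/page): optional description"
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


def parse_llms_txt(text):
    """Parse an llms.txt string into a dict (a sketch, not a full spec parser)."""
    result = {"title": None, "description": None, "sections": {}}
    current = None  # name of the H2 section we are currently inside
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and result["title"] is None:
            result["title"] = line[2:].strip()
        elif line.startswith("> ") and result["description"] is None:
            result["description"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            result["sections"][current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                result["sections"][current].append(
                    {
                        "title": match["title"],
                        "url": match["url"],
                        "desc": match["desc"] or "",
                    }
                )
    return result


# Made-up sample file for illustration only.
SAMPLE = """# Example Co

> Example Co fixes roofs for homeowners in Ohio.

## Services

- [Roof repair](https://example.com/roof-repair): Pricing and process
- [Inspections](https://example.com/inspections)
"""

parsed = parse_llms_txt(SAMPLE)
print(parsed["title"], "-", len(parsed["sections"]["Services"]), "links")
```

The same structure can be fed to an internal search index or chatbot, which is the most concrete use for the file today.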

How many sites have adopted llms.txt?

Adoption is growing but still niche. Community directories that track llms.txt files list a few thousand domains at most, skewed heavily toward developer-facing companies (Anthropic, Stripe, Vercel, and Cloudflare are commonly cited examples) and AI tooling companies. Service businesses are barely represented — if you publish one, you're an early adopter in that segment, not a laggard.

Growth has been steady rather than exponential. No major AI vendor has formally adopted the standard, which raises the next question.

Do AI crawlers actually fetch llms.txt?

There's no public evidence that they do, in measurable volume. As of late 2025, no major AI vendor has confirmed that its crawler, whether GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), Google's crawlers, or Bingbot (Microsoft), operationalizes llms.txt. One caveat on naming: Google-Extended is a robots.txt control token, not a separate crawler. Google fetches pages with its existing user-agent strings, so Google-Extended never shows up in access logs. The most reliable check is your own server logs (covered below): grep them for hits to the file from named AI user agents rather than trusting anyone's headline number.

This is the honest, somewhat-deflating truth. The standard exists. Sites have adopted it. The major AI crawlers don't appear to have operationalized it. That could change — but as of this writing, there's no demonstrated effect of llms.txt on actual AI citation rates.

Then why publish it at all?

Four reasons. First, future-proofing — if any major AI vendor adopts the standard, you want to be ready. Second, internal RAG — if you build your own AI tooling against your content (chatbots, internal search), llms.txt is a clean source of truth. Third, professional signal — it telegraphs that you take AI-era discoverability seriously, which matters for B2B sales and PR. Fourth, the cost is one afternoon and the file is trivially maintainable.

The wrong reason to publish: thinking it will move AI citation rates this quarter. It won't. Anyone selling you a "$2,000 llms.txt optimization" service is selling snake oil. The right reason: it's a small bet on a future standard with negligible downside.

What's the right llms.txt template?

The standard format is markdown with a specific structure. Start with an H1 of the site name. Add a blockquote with a 1-2 sentence description. Then organize sections with H2 headers, each containing a markdown list of links. Optional sections include an "Optional" H2 (lower-priority links) and a "Notes" section for clarifications.

  • Line 1: # Site Name (H1 with the site name)
  • Line 2-3: > One-sentence description of what the site is and who it's for.
  • Section: ## Docs — links to product documentation, ordered most-to-least important
  • Section: ## Guides — links to longer educational content (pillars, deep-dives)
  • Section: ## Services — links to commercial service pages
  • Section: ## Case Studies — links to social proof and outcome stories
  • Section: ## Blog — links to high-priority blog posts (not everything, just the canonical pieces)
  • Section: ## Optional — lower-priority but still relevant links
  • Each link format: - [Page Title](https://yourdomain.com/page): one-line description
  • Keep each section to 5-15 links. More than 20 dilutes signal.
  • Use absolute URLs everywhere — the file may be parsed by tools that don't know your domain context.
  • Update the file when major new pages are published or canonical pages change.
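As one way to keep the file regenerable rather than hand-edited, the checklist above can be sketched as a small Python renderer. The function name `render_llms_txt`, the section data, and the example.com URLs are hypothetical; the link lines use the `- [Title](url): description` shape from llmstxt.org, and the absolute-URL check mirrors the rule in the list.

```python
from urllib.parse import urlparse


def render_llms_txt(site_name, description, sections):
    """Render llms.txt text from {section name: [(title, url, description), ...]}."""
    lines = [f"# {site_name}", "", f"> {description}", ""]
    for section, links in sections.items():
        lines.append(f"## {section}")
        lines.append("")
        for title, url, desc in links:
            parsed = urlparse(url)
            # Enforce the absolute-URL rule from the checklist above.
            if parsed.scheme not in ("http", "https") or not parsed.netloc:
                raise ValueError(f"not an absolute URL: {url!r}")
            suffix = f": {desc}" if desc else ""
            lines.append(f"- [{title}]({url}){suffix}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site data; replace with your own pages.
SECTIONS = {
    "Services": [
        ("Roof repair", "https://example.com/roof-repair", "Pricing and process"),
    ],
    "Optional": [
        ("Company history", "https://example.com/about", ""),
    ],
}

llms_txt = render_llms_txt(
    "Example Co", "Example Co fixes roofs for homeowners in Ohio.", SECTIONS
)
print(llms_txt)
```

Raising on a relative URL at build time is a deliberate choice here: a failed deploy is easier to notice than a quietly broken link in a file nobody reads by hand.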

What about llms-full.txt?

llms-full.txt is the companion file that contains the actual concatenated markdown content of your most important pages. Where llms.txt is a curated link sitemap, llms-full.txt is the full-text version designed for LLMs to consume directly without needing to crawl each linked page. It is typically 50-500KB depending on site size.

Generate it programmatically by concatenating the markdown versions of every page listed in your llms.txt, separated by clear page markers (e.g., `---` and the page URL as a header). Most static-site generators (Next.js, Astro, Hugo) can build it in a few lines of script. Update it on every deploy that touches priority pages.
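A minimal sketch of that concatenation step in Python, assuming you already have each priority page as a markdown string. The names `build_llms_full` and `load_pages`, the marker layout, and the sample pages are assumptions for illustration, not part of any generator's API.

```python
from pathlib import Path


def build_llms_full(pages):
    """Concatenate (url, markdown) pairs into one llms-full.txt string."""
    chunks = []
    for url, markdown in pages:
        # Page marker: a horizontal rule followed by the page URL as a header.
        chunks.append(f"---\n\n# {url}\n\n{markdown.strip()}\n")
    return "\n".join(chunks)


def load_pages(url_to_path):
    """Read markdown files from disk; the URL-to-path mapping comes from the caller."""
    return [
        (url, Path(path).read_text(encoding="utf-8"))
        for url, path in url_to_path.items()
    ]


# Inline sample content so the sketch runs without any files on disk.
PAGES = [
    ("https://example.com/roof-repair", "## Roof repair\n\nWe fix leaks."),
    ("https://example.com/inspections", "## Inspections\n\nAnnual checks."),
]

full_text = build_llms_full(PAGES)
print(f"{len(full_text.encode('utf-8'))} bytes")
```

In a real pipeline, `load_pages` would be fed the same ordered list that drives llms.txt, so the two files cannot disagree about which pages are priority.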

How do you test whether crawlers see your llms.txt?

Two checks. First, manual: curl your llms.txt with each crawler's user agent and verify that the response is a 200 with the markdown body, not a block page or a redirect. Commands look like `curl -A 'GPTBot/1.0' https://yourdomain.com/llms.txt`. Repeat with ClaudeBot, PerplexityBot, and Bingbot. The short strings are simplified stand-ins for the crawlers' full user-agent headers, which is typically enough to exercise user-agent-based blocking rules. Google-Extended is a robots.txt token with no user-agent string of its own, so there is nothing to test for it with curl. Second, server-log analysis: pull access logs for the past 30 days and grep for hits to /llms.txt from named AI user agents.

  • curl -A 'GPTBot/1.0' https://yourdomain.com/llms.txt
  • curl -A 'ClaudeBot/1.0' https://yourdomain.com/llms.txt
  • curl -A 'PerplexityBot/1.0' https://yourdomain.com/llms.txt
  • curl -A 'Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)' https://yourdomain.com/llms.txt
  • grep -F '/llms.txt' /var/log/nginx/access.log | grep -iE 'gptbot|claudebot|perplexity|bingbot'
  • If you see zero hits from these UAs over 30 days, you're seeing the same pattern everyone else sees as of late 2025.
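If you would rather get a per-crawler count than eyeball grep output, the log check can be sketched in Python. This assumes the common "combined" access-log format; the `AI_AGENTS` list, the function name `count_llms_hits`, and the sample log lines are made up for illustration. Google-Extended is left out because it is a robots.txt token rather than a user-agent string.

```python
import re
from collections import Counter

# Substrings to look for in the User-Agent field (matched case-insensitively).
AI_AGENTS = ("gptbot", "claudebot", "perplexitybot", "bingbot")

# Combined log format: ... "METHOD path HTTP/x" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def count_llms_hits(log_lines):
    """Count requests for /llms.txt or /llms-full.txt per AI user agent."""
    hits = Counter()
    for line in log_lines:
        match = LOG_RE.search(line)
        if not match:
            continue
        path = match["path"].split("?", 1)[0]  # ignore any query string
        if path not in ("/llms.txt", "/llms-full.txt"):
            continue
        ua = match["ua"].lower()
        for agent in AI_AGENTS:
            if agent in ua:
                hits[agent] += 1
    return hits


# Made-up sample lines (documentation IP range), not real traffic.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jun/2026:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.6 - - [01/Jun/2026:10:01:00 +0000] "GET /about HTTP/1.1" 200 900 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.7 - - [01/Jun/2026:10:02:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "curl/8.5.0"',
]

print(dict(count_llms_hits(SAMPLE_LOG)))
```

To run it against real logs, pass an open file handle instead of `SAMPLE_LOG`. Keep in mind that a user-agent match only shows what the client claimed to be; your own curl tests will appear in the same logs.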

What's the practical recommendation?

Publish llms.txt and llms-full.txt. Spend an afternoon on it. Use the standard template. Don't pay anyone $2,000 to do it for you. Don't expect it to move citation rates this quarter. Re-audit in 6 months to see if any major crawler has started fetching it. If you want it deployed inside a broader GEO program, book a strategy call — we include it in every retainer because the cost is negligible and the optionality is real.

For the actual citation-moving work, focus your effort on passage-level optimization and schema deployment. Those are the levers that pay back this quarter. llms.txt is a small bet on next year.

Where does this fit in your stack?

If you're running a US service business, the playbook in this post pairs with our full services lineup and applies cleanly across our supported industries and US locations. If you want help implementing it, book a free strategy call — we'll review your current setup and prioritize the next three moves.

For the deeper engagement details, see our SEO service. New to the terminology here? Our SEO & marketing glossary defines every acronym in this post.

What are the most common questions about this topic?

Common questions readers send us about this topic.

What is llms.txt?

llms.txt is a proposed standard, introduced by Jeremy Howard in late 2024, for a markdown file at the root of your domain that tells AI crawlers which pages are most important and provides clean markdown content for LLM consumption. It is structurally similar to sitemap.xml but designed for LLMs. The full spec lives at llmstxt.org.

Do AI crawlers actually fetch llms.txt?

There's no public evidence that they do, in measurable volume, as of late 2025. No major AI vendor has confirmed that its crawler (GPTBot, ClaudeBot, PerplexityBot, Google's crawlers, or Bingbot) operationalizes llms.txt. The most reliable check is your own server logs: grep access logs for hits to /llms.txt from named AI user agents.

Should I publish llms.txt anyway?

Yes — the cost is one afternoon and the upside is real if adoption catches up. It also helps with internal RAG pipelines, future-proofs your site if any major AI vendor adopts the standard, and serves as a professional signal that you take AI-era discoverability seriously. Don't pay a vendor more than a few hundred dollars to set it up.

What's the difference between llms.txt and llms-full.txt?

llms.txt is a curated markdown file listing your most important pages with descriptions. llms-full.txt is the companion file containing the actual concatenated markdown content of those pages. llms.txt is for navigation and prioritization; llms-full.txt is for direct LLM consumption without crawling each linked page.

How big should llms-full.txt be?

Typically 50-500KB depending on site size. Include the markdown content of every page listed in llms.txt, separated by clear page markers (e.g., a horizontal rule and the page URL as a header). Most static-site generators can build it programmatically in a few lines of script during the deploy pipeline.

How often should I update llms.txt?

Whenever major new pages are published or canonical pages change significantly. For most service businesses, that's quarterly. Add the file to your deploy pipeline so it regenerates from the canonical source list rather than being maintained manually. Stale llms.txt is worse than no llms.txt because it misroutes any future crawler that does adopt the standard.
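One hedged way to catch staleness in a deploy pipeline is to compare the URLs in the published file against your canonical page list and fail the build on any difference. The function name `find_drift` and both inputs below are hypothetical.

```python
import re

# Pulls the URL out of each markdown link, e.g. "[Title](https://example.com/x)".
LINK_URL_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


def find_drift(llms_txt, canonical_urls):
    """Return (missing, stale): URLs absent from llms.txt, and URLs no longer canonical."""
    listed = set(LINK_URL_RE.findall(llms_txt))
    canonical = set(canonical_urls)
    return sorted(canonical - listed), sorted(listed - canonical)


# Hypothetical inputs for illustration.
CURRENT_FILE = "# Example Co\n\n## Services\n\n- [Old page](https://example.com/old)\n"
CANONICAL = ["https://example.com/roof-repair"]

missing, stale = find_drift(CURRENT_FILE, CANONICAL)
if missing or stale:
    print("llms.txt is out of date:", {"missing": missing, "stale": stale})
```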

Can llms.txt replace robots.txt or sitemap.xml?

No. llms.txt is additive — it doesn't replace robots.txt (which controls crawl access) or sitemap.xml (which is consumed by Googlebot and Bingbot). All three files serve different purposes. Publish all three. They are independent and non-conflicting.

About Foundgrove

The Foundgrove team

Foundgrove helps US service businesses win qualified leads from search and AI. We write about the practical, measurable side of acquisition — what works in production, not what looks good in a conference deck.

Want help applying this to your business?

Book a free 30-minute call. We'll review your current acquisition stack and show you the three highest-leverage moves for your industry and state. Or read how our SEO service works.
