AEO · 7 min read
How to Measure AI Search Visibility Without Spending $500/mo
Summary
Enterprise AEO tools start at $500/mo. Here's the DIY stack that costs under $20/mo and gives cleaner data than most paid platforms.
By The Foundgrove team · Published June 4, 2026 · Updated June 29, 2026
Why do most AEO measurement tools cost too much for what they deliver?
AEO tooling is in its early-2018-SEO-software phase. Platforms like AthenaHQ, Profound, and Goodie AI charge $499-$4,000/month for what is fundamentally a prompt scheduler, a response logger, and a brand-mention extractor. The math behind the price is API costs (small), engineering overhead (real), and willingness-to-pay from VC-funded mid-market customers (high). For a business under $200K MRR, the data-to-cost ratio is poor — you pay for a polished dashboard rather than for unique insight.
The actionable insight from any AEO tool is the same: which prompts mention me, which brands compete in those prompts, what sources are being cited. You can replicate that for the cost of API calls if you are willing to write 200 lines of Python and maintain a Google Sheet.
What does the manual baseline look like before any tooling?
The manual baseline is 30-50 prompts per week, run by hand across ChatGPT, Claude, Gemini, Perplexity, and Google AI Overviews, with results logged in a single sheet. It takes about 90 minutes per week for one person and produces cleaner data than most paid tools because you can read the full response and catch nuances (sentiment, framing, source attribution) that automated tools miss.
- Prompt list: 40-60 buyer-intent prompts across the 5 categories (BEST-X, comparison, gap/objection, problem, branded).
- Engines: ChatGPT (paid, for web search access), Claude, Gemini, Perplexity, plus a manual check on Google AI Overviews.
- Log per row: prompt, engine, date, your brand mentioned (Y/N), position if mentioned, competing brands named, sources cited, sentiment.
- Sheet: one tab per engine, one row per (prompt, engine, date). Pivot weekly for citation share trend.
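One way to keep the manual log consistent is to fix the row schema in code, even when the observations are typed in by hand. The sketch below is an assumption about layout rather than a prescribed format: the `LogRow` field names, the semicolon-separated lists, and the `append_row` helper are hypothetical, chosen to mirror the columns listed above.

```python
import csv
from dataclasses import dataclass, asdict, fields
from pathlib import Path
from typing import Optional


@dataclass
class LogRow:
    prompt: str
    engine: str
    run_date: str            # ISO date string, e.g. "2026-06-29"
    brand_mentioned: bool
    position: Optional[int]  # 1 = first brand named; None if not mentioned
    competitors: str         # semicolon-separated brand names
    sources: str             # semicolon-separated cited URLs
    sentiment: str           # e.g. "positive", "neutral", "negative"


def append_row(path: Path, row: LogRow) -> None:
    """Append one observation, writing the header first if the file is new."""
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=[fld.name for fld in fields(LogRow)])
        if is_new:
            writer.writeheader()
        writer.writerow(asdict(row))
```

A CSV in this shape imports directly into Sheets, so the same file can feed the weekly pivot.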
What does Search Console tell you about AI visibility?
Google Search Console added AI Overview click and impression data in mid-2025. You can now filter by 'Search Appearance: AI Overviews' to see which queries surface your pages inside an Overview and what the click-through looks like. This is the cleanest free signal on the GEO side of the house (Google AI Overviews specifically) and a useful proxy for how well your structured content lifts into AI summaries.
Limits: this is Google-only data, so it does not cover ChatGPT, Claude, or Perplexity. But it is free, accurate, and shows you exactly which pages and queries are working. Pair it with manual prompt testing on the chat engines to cover the full AEO surface area.
How do server logs reveal which AI crawlers are visiting?
Every major AI engine crawls your site with an identifiable user agent. Server log analysis tells you crawl frequency, which pages are being visited, and whether your robots.txt is accidentally blocking the bots you want. A simple weekly log pull, charted per bot, is enough to track the trend.
- GPTBot — OpenAI's training crawler.
- ChatGPT-User — OpenAI's live-RAG crawler (different from GPTBot, fires when a user triggers web search mid-conversation).
- ClaudeBot — Anthropic's training and retrieval crawler.
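- Claude-Web — an older Anthropic retrieval agent string. Anthropic's current documentation lists Claude-User for user-triggered fetches, so watch for both.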
- PerplexityBot — Perplexity's crawler.
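- Google-Extended — Google's robots.txt control token for AI training use (separate from Googlebot). It has no user agent string of its own, so it will not show up in your logs; check your robots.txt rules for it instead.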
- Bytespider — ByteDance/TikTok's AI crawler.
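- Applebot-Extended — Apple's robots.txt control token for Apple Intelligence training use. It does not crawl on its own; pages are fetched by Applebot, so look for Applebot in your logs.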
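To check the robots.txt side of this, Python's standard-library `urllib.robotparser` can report which of these tokens a given rule set blocks. This is a minimal sketch: the `SAMPLE` rules are made up for illustration, `blocked_tokens` is a hypothetical helper, and the standard-library parser may differ in edge cases from how each vendor's crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Tokens taken from the list above.
AI_TOKENS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot", "Claude-Web",
    "PerplexityBot", "Google-Extended", "Bytespider", "Applebot-Extended",
]


def blocked_tokens(robots_txt: str, path: str = "/") -> list:
    """Return the AI tokens that the given robots.txt text disallows for `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [token for token in AI_TOKENS if not parser.can_fetch(token, path)]


# Illustrative rules only: block GPTBot, allow everyone else.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_tokens(SAMPLE))  # ['GPTBot']
```

Paste in the contents of your live robots.txt in place of `SAMPLE` to see whether a bot you want is being turned away.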
A simple grep-and-count script over your access logs gives you a weekly bot-visit dashboard. Rising visit frequency for GPTBot and ClaudeBot suggests your content is being collected more often as candidate training data, though crawl volume alone does not confirm it will be used in any particular training cycle.
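A grep-and-count pass can be sketched in a few lines of Python. The example assumes the common "combined" access log format, where the user agent is the last quoted field; the sample log lines and their simplified user agent strings are invented for illustration, and `count_bot_hits` is a hypothetical helper. Google-Extended and Applebot-Extended are left out because they are robots.txt tokens and do not appear as request user agents.

```python
import re
from collections import Counter

AI_BOTS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "Claude-Web",
           "PerplexityBot", "Bytespider"]

# In combined log format the line ends with the quoted user agent field.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_bot_hits(lines):
    """Count requests per AI bot, matching on the user agent field only."""
    counts = Counter()
    for line in lines:
        match = UA_FIELD.search(line)
        # Fall back to the whole line if the format is not what we assumed.
        user_agent = (match.group(1) if match else line).lower()
        for bot in AI_BOTS:
            if bot.lower() in user_agent:
                counts[bot] += 1
                break
    return counts


# Invented sample lines; real user agent strings are longer.
SAMPLE_LOG = [
    '192.0.2.1 - - [29/Jun/2026:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '192.0.2.2 - - [29/Jun/2026:10:01:00 +0000] "GET /blog HTTP/1.1" 200 900 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '192.0.2.1 - - [29/Jun/2026:10:02:00 +0000] "GET /blog HTTP/1.1" 200 900 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '192.0.2.3 - - [29/Jun/2026:10:03:00 +0000] "GET /GPTBot-guide HTTP/1.1" 200 300 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

for bot, hits in count_bot_hits(SAMPLE_LOG).most_common():
    print(bot, hits)
```

Matching on the user agent field rather than the whole line keeps a URL path that happens to contain a bot name, as in the last sample line, from inflating the count.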
How do paid AEO tools compare on price and value in 2026?
- Otterly.ai — $29-$129/mo entry-level. Tracks ChatGPT, Perplexity, AI Overviews. Best for solo founders who want a dashboard, not a CSV.
- Peec AI — $99-$299/mo. Multi-engine + sentiment. Limited custom prompt taxonomy depth.
- AthenaHQ — $1,500-$4,000/mo. Enterprise-grade, full prompt clustering and competitive analysis. Worth it above $200K MRR with a dedicated owner.
- Profound — $499/mo entry, enterprise at $2,500+. Strong competitive analysis. Good for mid-market B2B.
- Goodie AI — bundled with managed-service offering only.
- DIY (Python + APIs) — $15-$20/mo. Best learning value. Best signal-per-dollar under $200K MRR.
What does the DIY Python script actually do?
The DIY script loops through your prompt list, fires each prompt against the OpenAI, Anthropic, and Gemini APIs (plus Perplexity if you have an API key), extracts brand mentions from each response using regex or a small classifier prompt, and writes the result to a Google Sheet or CSV. Total runtime is 15-20 minutes per weekly cycle for 100 prompts × 4 engines. API costs run roughly $15-$20/month at typical buyer-intent prompt lengths.
- Inputs: prompts.csv (your 40-60 prompts), competitors.csv (your brand + 5-10 competitor names), config (API keys).
- Loop: for each prompt × each engine, call the chat completion API with web-search enabled where supported.
- Extract: regex-match your brand name + competitor names in each response. Use a follow-up LLM call to classify position and sentiment.
- Log: append a row to a Google Sheet (via gspread) or CSV with prompt, engine, date, mentions, position, sentiment, cited URLs.
- Schedule: cron job on a $5/mo VPS or a GitHub Actions workflow on a weekly cadence.
- Dashboard: simple pivot in Sheets, or Looker Studio on top of the same dataset.
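The loop, extract, and log steps above can be sketched as follows. Everything named here is hypothetical: `Engine` is an assumed adapter signature (prompt in, response text out), `fake_engine` is a stub standing in for a real vendor SDK call, and the brand names are placeholders. The vendor API calls themselves are deliberately not shown, since their signatures differ by provider and change between SDK versions.

```python
import csv
import re
from datetime import date
from typing import Callable, Dict, List, Optional

Engine = Callable[[str], str]  # hypothetical adapter: prompt in, response text out


def find_mentions(text: str, names: List[str]) -> List[str]:
    """Return the names that appear in `text` as whole words, ignoring case."""
    return [n for n in names
            if re.search(rf"\b{re.escape(n)}\b", text, flags=re.IGNORECASE)]


def run_cycle(prompts: List[str], engines: Dict[str, Engine], brand: str,
              competitors: List[str], out_path: str,
              run_date: Optional[str] = None) -> int:
    """Run every prompt on every engine and append one CSV row per pair."""
    run_date = run_date or date.today().isoformat()
    written = 0
    with open(out_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for prompt in prompts:
            for engine_name, ask in engines.items():
                found = find_mentions(ask(prompt), [brand] + competitors)
                writer.writerow([
                    prompt, engine_name, run_date,
                    "Y" if brand in found else "N",
                    ";".join(n for n in found if n != brand),
                ])
                written += 1
    return written


def fake_engine(prompt: str) -> str:
    """Stub for a dry run; a real adapter would call the vendor SDK here."""
    return "Popular options include Acme Plumbing and Rival Co."
```

Keeping each engine behind a plain callable means the loop can be tested with stubs before any API key is involved, and swapping a model only touches its adapter.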
What is the cost breakdown of running the DIY stack monthly?
- OpenAI API (gpt-4o with web search) — ~$4-6/mo for 100 prompts/week.
- Anthropic API (Claude 3.7 Sonnet) — ~$3-5/mo for 100 prompts/week.
- Gemini API (1.5 Pro) — ~$2-4/mo (cheapest of the three).
- Perplexity API — ~$3-5/mo if included.
- Hosting (VPS or GitHub Actions) — $0-5/mo.
- Google Sheets / Looker Studio — free.
- Total: ~$12-25/mo for the full stack (the sum of the line items above).
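Summing the low and high end of each line item gives the range for the full stack. The figures below are copied from the list above; the dictionary layout is just one convenient way to keep the estimate editable as prices change.

```python
# (low, high) monthly cost in USD for each line item listed above.
COSTS = {
    "OpenAI API": (4, 6),
    "Anthropic API": (3, 5),
    "Gemini API": (2, 4),
    "Perplexity API": (3, 5),
    "Hosting": (0, 5),
    "Google Sheets / Looker Studio": (0, 0),
}

low = sum(lo for lo, _ in COSTS.values())
high = sum(hi for _, hi in COSTS.values())
print(f"${low}-{high}/mo")  # $12-25/mo
```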
What dashboards should you build on top of the raw data?
Three dashboards cover most of the operational value.
- Citation share over time: a weekly line chart of (prompts where you appear) / (total prompts), split by engine.
- Competitor citation share: the same metric for the same prompt set, so you can see whether you are gaining ground or simply riding category-wide growth.
- Source attribution: which third-party platforms are driving your citations, ranked by frequency.
Source attribution is the dashboard that informs next moves: if 60% of your citations come from Reddit, double down on Reddit; if 40% come from Clutch, prioritize the Clutch playbook.
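The source attribution ranking can be computed from the cited URLs already in the log. This is a sketch under assumptions: `source_share` is a hypothetical helper, it groups by hostname only (so `old.reddit.com` and `reddit.com` would count separately unless you normalize further), and the URLs are placeholders.

```python
from collections import Counter
from urllib.parse import urlparse


def source_share(cited_urls):
    """Rank cited domains by their share of all citations, highest first."""
    domains = Counter()
    for url in cited_urls:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host:  # skip rows where no parsable URL was logged
            domains[host] += 1
    total = sum(domains.values())
    return [(domain, count / total) for domain, count in domains.most_common()]


# Placeholder URLs for illustration.
urls = [
    "https://www.reddit.com/r/example/1",
    "https://www.reddit.com/r/example/2",
    "https://reddit.com/r/example/3",
    "https://clutch.co/profile/example",
    "https://www.clutch.co/profile/other",
]
for domain, share in source_share(urls):
    print(f"{domain}: {share:.0%}")
```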
How do you compare DIY data to paid-tool data for sanity-checking?
Run a 2-week side-by-side: take 20 of your prompts, run them through your DIY script and through a free trial of Otterly or Peec AI. Compare the brand-mention extraction and citation share. In practice the two approaches agree on the large majority of prompts — the disagreements are almost always edge cases (sentiment classification, ambiguous mentions) that a human eyeballing the data can resolve faster than either system.
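The side-by-side comparison reduces to an agreement rate plus a list of disagreements to review by hand. In this sketch the inputs are assumed to be dictionaries keyed by (prompt, engine) with a boolean for "brand mentioned"; how you export that from a paid tool's trial is up to the tool, and `compare_runs` is a hypothetical helper.

```python
def compare_runs(diy: dict, paid: dict):
    """Return (agreement rate, disagreeing keys) over the keys both runs share."""
    shared = sorted(diy.keys() & paid.keys())
    disagreements = [key for key in shared if diy[key] != paid[key]]
    rate = 1 - len(disagreements) / len(shared) if shared else 0.0
    return rate, disagreements


# Placeholder results keyed by (prompt, engine).
diy = {("p1", "ChatGPT"): True, ("p2", "ChatGPT"): False, ("p3", "ChatGPT"): True}
paid = {("p1", "ChatGPT"): True, ("p2", "ChatGPT"): True, ("p3", "ChatGPT"): True}

rate, to_review = compare_runs(diy, paid)
print(f"{rate:.0%} agreement; review by hand: {to_review}")
```

The disagreement list is the useful output: those are the rows worth opening and reading in full.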
Upgrade to a paid platform when one of three conditions is true: your prompt list exceeds 200, you have multiple stakeholders who need dashboards, or you cross $200K MRR and the engineering opportunity cost outweighs the subscription. We cover the upgrade decision inside the AEO complete guide, and you can book a strategy call to walk through your specific setup. Engagement pricing is listed on our pricing page.
Where does this fit in your stack?
If you're running a US service business, the playbook in this post pairs with our full services lineup and applies cleanly across our supported industries and US locations. If you want help implementing it, book a free strategy call — we'll review your current setup and prioritize the next three moves.
For the deeper engagement details, see our SEO service. New to the terminology here? Our SEO & marketing glossary defines every acronym in this post.
What are the most common questions about this topic?
Common questions readers send us about this topic.
Do I need to track all 4 engines or just ChatGPT?
All 4 (ChatGPT, Claude, Gemini, Perplexity). They differ enough in source selection that ChatGPT-only data understates your real coverage. Adding Claude and Gemini takes maybe 15 extra minutes per weekly cycle in the DIY setup.
How often should I re-baseline my prompt list?
Every 90 days. Buyer language shifts, new competitors emerge, and your service mix evolves. Stale prompt lists produce stale measurement. Refresh 10-15% of prompts each quarter.
Can I use a free tier of any of these LLM APIs?
Gemini's free tier is generous enough for 100 prompts/week of light usage. OpenAI and Anthropic do not offer meaningful free tiers for sustained automated use. Budget $15-$25/mo for the paid APIs.
Does Perplexity disclose source URLs in the API response?
Yes. Perplexity's API returns inline citations with source URLs, which makes it the easiest engine to track for source attribution. Use Perplexity's source data as a leading indicator of which third-party platforms are driving your citations.
How accurate is regex brand-mention extraction?
Roughly 90-95% accurate if your brand name is distinctive. For generic brand names ('Apex', 'Vertex'), use a follow-up LLM classification call to disambiguate. The classification call adds about $0.001 per prompt — negligible cost for a meaningful accuracy gain.
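One way to apply this is to let the regex decide for distinctive names and only route generic ones to the follow-up classification call. The sketch below shows the routing only; the `triage` helper, its three labels, and the brand names are hypothetical, and the LLM call itself is not shown.

```python
import re


def triage(text: str, brand: str, generic_names: set) -> str:
    """Label a response as 'absent', 'mentioned', or 'needs_llm_check'."""
    pattern = rf"\b{re.escape(brand)}\b"
    if not re.search(pattern, text, flags=re.IGNORECASE):
        return "absent"
    # A whole-word match on a generic name may be the ordinary word, not the brand.
    return "needs_llm_check" if brand in generic_names else "mentioned"


generic = {"Apex", "Vertex"}
print(triage("Acme Plumbing is often recommended.", "Acme Plumbing", generic))  # mentioned
print(triage("The apex of the market is crowded.", "Apex", generic))            # needs_llm_check
print(triage("No relevant providers found.", "Apex", generic))                  # absent
```

The second call is the case the regex alone gets wrong: the word matches, but only a reader or a classifier can tell it is not the brand.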
What is the single most important metric to track?
Citation share within your priority prompt set: (prompts where you are mentioned) / (total tracked prompts), measured weekly across all 4 engines. Everything else (position, sentiment, source attribution) is supporting detail.
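As a worked example, citation share per week and engine can be computed directly from the logged rows. The tuple layout (week, engine, mentioned) and the `citation_share` helper are assumptions for illustration; the sample rows are invented.

```python
from collections import defaultdict


def citation_share(rows):
    """Map (week, engine) to mentioned prompts / total tracked prompts."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for week, engine, mentioned in rows:
        totals[(week, engine)] += 1
        hits[(week, engine)] += 1 if mentioned else 0
    return {key: hits[key] / totals[key] for key in totals}


# Invented sample: four prompts on one engine, two on another, same week.
rows = [
    ("2026-W26", "ChatGPT", True),
    ("2026-W26", "ChatGPT", False),
    ("2026-W26", "ChatGPT", True),
    ("2026-W26", "ChatGPT", False),
    ("2026-W26", "Claude", False),
    ("2026-W26", "Claude", True),
]
for (week, engine), share in sorted(citation_share(rows).items()):
    print(week, engine, f"{share:.0%}")
```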
Can I export DIY data into a paid tool later if I scale up?
Yes. AthenaHQ, Profound, and Otterly all accept CSV imports for historical prompt data. Starting DIY does not lock you out of the paid tier later — you bring your baseline with you and get faster onboarding.
About Foundgrove
The Foundgrove team
Foundgrove helps US service businesses win qualified leads from search and AI. We write about the practical, measurable side of acquisition — what works in production, not what looks good in a conference deck.