Pricing

Pay per event

Google AI Overview Citation Scraper

Scrape which domains Google's AI Overview cites for your target queries — one row per (query × cited source) with position, snippet, and selector telemetry — export to JSON or CSV. AEO / generative-SEO data for 2026, with captcha-aware retry and Pydantic-validated output.

Developer

DevilScrapes

6 days ago

Last modified

Google AI Overview Tracker — Citation Scraper

▶️ Full tutorial on YouTube

▶️ 45-second demo on YouTube

We do the dirty work so your dataset stays clean. 😈

About $5.50 per 1,000 (query × citation) rows, start fee included. Track which domains Google's AI Overview cites for your target queries. Answer Engine Optimization (AEO) is a fast-growing SEO discipline in 2026: AI Overview now surfaces above the organic results for many informational searches, and there is no first-party API for citation attribution. Pass in a list of queries; we handle the browser fingerprinting, proxy rotation, consent dialogs, and the retry loop, and you get one clean Pydantic-validated row per (query × cited source). You pay a small start fee plus a per-row charge for every row pushed to the dataset, including the marker rows emitted when no AI Overview appears. No credit card required to try.

🎯 What this scrapes

For each query you pass in, this Actor:

1. Opens https://www.google.com/search?q=<query>&hl=<language>&gl=<country> in a fresh Camoufox page.
2. Dismisses any EU consent dialog automatically.
3. Waits 4–15 seconds (set by waitMsAfterLoad) for the AI Overview block to lazy-render.
4. Probes an 8-selector priority battery to find the AI Overview container and records which selector matched on every row, so you can detect Google rotating their markup with a single GROUP BY selector_used query.
5. Extracts every citation link inside the carousel: URL, registrable domain, anchor text, and 1-based position.
6. Emits one row per citation or, when AI Overview did not appear, a single marker row with ai_overview_appeared=false. Absence of AI Overview is itself a meaningful AEO data point.
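
Steps 1 and 4 above can be sketched in a few lines of standard-library Python. This is only an illustration, not the Actor's source: the function names are hypothetical, the second selector in the list is a made-up placeholder (only `div[aria-label="AI Overview"]` appears in this README), and `is_present` stands in for whatever DOM lookup the real browser page provides.

```python
from typing import Callable, Optional
from urllib.parse import urlencode

# Hypothetical priority list. Only the first selector appears in this README
# (in the sample output row); the second is a made-up placeholder.
SELECTOR_PRIORITY = [
    'div[aria-label="AI Overview"]',
    'div[data-placeholder="ai-overview"]',
]


def build_search_url(query: str, language: str = "en", country: str = "us") -> str:
    """Build the search URL for one query, mapping language to hl= and country to gl=."""
    return "https://www.google.com/search?" + urlencode(
        {"q": query, "hl": language, "gl": country}
    )


def first_matching_selector(
    selectors: list[str], is_present: Callable[[str], bool]
) -> Optional[str]:
    """Return the highest-priority selector that is present on the page, else None."""
    for selector in selectors:
        if is_present(selector):
            return selector
    return None


print(build_search_url("what causes inflation"))
```

Returning `None` when nothing matches corresponds to the marker-row case, where `selector_used` is null.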

Output fields:

| Field | Type | Description |
| --- | --- | --- |
| `query` | string | The query the row was produced from |
| `country` | string | ISO-3166 alpha-2 country code (`gl=`) |
| `language` | string | ISO-639-1 language code (`hl=`) |
| `ai_overview_appeared` | boolean | True when an AI Overview block was rendered |
| `ai_overview_text_excerpt` | string \| null | First 200 chars of the AI Overview body |
| `citation_position` | integer \| null | 1-based position in the citation carousel |
| `source_domain` | string \| null | Registrable domain (e.g. `imf.org`) |
| `source_url` | string \| null | Full https:// URL Google rendered |
| `source_title` | string \| null | Anchor text Google rendered |
| `selector_used` | string \| null | Which selector matched (drift telemetry) |
| `scraped_at` | string | ISO 8601 UTC timestamp |
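
If you want typed rows on the consuming side, the field table can be mirrored in a small record type. The sketch below uses a standard-library dataclass as a stand-in; it is not the Actor's Pydantic model, and the class name, the `marker_row` helper, and the two checks in `__post_init__` are assumptions derived only from the descriptions in the table.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class CitationRow:
    """Hypothetical mirror of one output row; field names follow the table above."""

    query: str
    country: str
    language: str
    ai_overview_appeared: bool
    scraped_at: str
    ai_overview_text_excerpt: Optional[str] = None
    citation_position: Optional[int] = None
    source_domain: Optional[str] = None
    source_url: Optional[str] = None
    source_title: Optional[str] = None
    selector_used: Optional[str] = None

    def __post_init__(self) -> None:
        # The table documents positions as 1-based, so anything below 1 is invalid.
        if self.citation_position is not None and self.citation_position < 1:
            raise ValueError("citation_position is 1-based")
        # The excerpt is documented as the first 200 characters of the body.
        if self.ai_overview_text_excerpt is not None and len(self.ai_overview_text_excerpt) > 200:
            raise ValueError("ai_overview_text_excerpt is limited to 200 characters")


def marker_row(query: str, country: str, language: str) -> CitationRow:
    """Build the row shape used when no AI Overview appeared: all nullable fields stay None."""
    now = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    return CitationRow(
        query=query,
        country=country,
        language=language,
        ai_overview_appeared=False,
        scraped_at=now.replace("+00:00", "Z"),
    )


print(asdict(marker_row("spinach recipe", "us", "en")))
```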

🔥 Features

Camoufox-rendered, not headless Chromium — we use a Firefox fork with anti-detection patches. Standard Playwright and Selenium emit fingerprints that Google's defences pick off instantly; Camoufox is the only browser we allow for scraping (ADR-0002).
We rotate residential proxies on every block — fresh session_id, fresh exit IP via Apify Proxy RESIDENTIAL. You never fight for bandwidth against someone else's CAPTCHA loop.
We handle CAPTCHA interstitials — when Google serves the sorry/index reCAPTCHA page, we rotate the proxy session and retry before emitting a marker row. The run never silently returns an empty dataset.
8-selector priority battery with drift telemetry — Google has rotated the AI Overview DOM label at least three times since launch. We maintain an ordered selector list, probe all eight on each page, and record the winner in selector_used so you can chart selector drift over time.
We retry with exponential backoff on network failures and rate-limit responses. Up to five attempts per query before we surface a partial-success status.
Per-query session isolation — fresh proxy session and fresh browser page per query. Cookies and rate-limit state from one query never bleed into the next.
Pydantic v2 input + output validation — invalid input fails fast before any browser starts; row schema is enforced at push time. You get typed columns, not a bag of string fields.
Pay-Per-Event pricing: $0.05 per run start + $0.005 per dataset row, marker rows included. A run that pushes no rows costs only the start fee.
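
The retry behaviour listed above (exponential backoff, up to five attempts per query) follows a common pattern that can be sketched as below. This is a generic illustration under assumptions, not the Actor's code: `RetryableError`, the one-second base delay, and the 10% jitter are all hypothetical choices.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for failures worth retrying (network error, rate limit)."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds or max_attempts is reached."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RetryableError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller decide what to report
            # The delay doubles on each attempt: base, 2x base, 4x base, ...
            delay = base_delay * 2 ** (attempt - 1)
            # A little random jitter keeps parallel retries from synchronising.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the function testable: a test can substitute a recorder for `time.sleep` and inspect the delays without waiting.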

💡 Use cases

AEO dashboard — schedule a weekly run for your 50 highest-priority queries; chart source_domain share-of-citation over time alongside ai_overview_appeared rate. Detect when AI Overview starts citing a new competitor in your space.
Pre-launch content gap analysis — feed in the queries you want to rank for, see which domains Google currently cites, and target outreach to publishers in the cite list rather than chasing pure backlink volume.
Brand citation monitoring — does AI Overview cite your domain for queries where your brand is the answer? Most brands have zero instrumentation here today; this is the direct way to find out.
Competitive intelligence — track exactly which 3–5 sources Google's generative system trusts for each of your category's head queries; compare to traditional SERP rank.
Selector drift monitoring — the selector_used column is a leading indicator of Google rotating AI Overview markup. Useful for SaaS observability of generative search behaviour even if you don't mine the citation data directly.
Localised AEO — pair country=us and country=gb runs over the same query list to detect locale-specific citation behaviour.
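
Two of the dashboard metrics mentioned above, share-of-citation per domain and the AI Overview appearance rate, can be computed directly from exported rows. The following is a minimal sketch with hypothetical function names; the sample rows are invented for illustration (`example.org` is a placeholder domain) and carry only the fields the calculation needs.

```python
from collections import Counter


def citation_share(rows: list[dict]) -> dict[str, float]:
    """Fraction of all citation rows attributed to each source_domain."""
    counts = Counter(row["source_domain"] for row in rows if row.get("source_domain"))
    total = sum(counts.values())
    return {domain: n / total for domain, n in counts.most_common()} if total else {}


def appearance_rate(rows: list[dict]) -> float:
    """Fraction of distinct queries for which an AI Overview appeared."""
    appeared: dict[str, bool] = {}
    for row in rows:
        seen = appeared.get(row["query"], False)
        appeared[row["query"]] = seen or bool(row["ai_overview_appeared"])
    return sum(appeared.values()) / len(appeared) if appeared else 0.0


# Invented sample data: two queries with citations, one marker row.
sample_rows = [
    {"query": "what causes inflation", "ai_overview_appeared": True, "source_domain": "imf.org"},
    {"query": "what causes inflation", "ai_overview_appeared": True, "source_domain": "example.org"},
    {"query": "how does inflation work", "ai_overview_appeared": True, "source_domain": "imf.org"},
    {"query": "spinach recipe", "ai_overview_appeared": False, "source_domain": None},
]

print(citation_share(sample_rows))
print(appearance_rate(sample_rows))
```

Because marker rows have a null `source_domain`, they drop out of the share calculation but still count toward the appearance rate.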

⚙️ How to use it

1. Open the Actor input form.
2. Paste your Search queries (1–50). Informational queries (what is, how to, best X 2026) have the highest AI Overview trigger rate.
3. (Optional) Set Country and Language (default us / en).
4. (Optional) Cap with Max queries per run (default 25).
5. (Optional) Raise Wait after DOM ready to 12000–15000 ms for slow proxy exits.
6. Pick a Proxy: leave the default RESIDENTIAL. We fall back to the BUYPROXIES94952 group automatically when residential is unavailable on your plan.
7. Click Start. Rows stream into the default dataset as each query completes.

Quick examples

Two informational queries, default settings (the QA fixture):

{
  "queries": ["best running shoes 2026", "what causes inflation"],
  "country": "us",
  "language": "en",
  "maxQueries": 2,
  "waitMsAfterLoad": 8000,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

UK English with a longer wait window:

{
  "queries": ["best mortgage rates", "how does pension lump sum tax work"],
  "country": "gb",
  "language": "en",
  "maxQueries": 10,
  "waitMsAfterLoad": 12000,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

📥 Input

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `queries` | array of strings | yes | (none) | 1–50 search queries to probe |
| `country` | string | no | `"us"` | ISO-3166 alpha-2 (lowercase); maps to `gl=` |
| `language` | string | no | `"en"` | ISO-639-1 (lowercase); maps to `hl=` |
| `maxQueries` | integer | no | 25 | Hard cap per run (1–50) |
| `waitMsAfterLoad` | integer | no | 8000 | ms after DOMContentLoaded (4000–15000) |
| `proxyConfiguration` | object | yes | RESIDENTIAL | Apify Proxy config (required) |
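
To catch bad input before starting a run, the constraints in the table can be checked client-side. The sketch below is a plain-Python approximation of those rules and not the Actor's own Pydantic validation; the function names and error messages are hypothetical, and the ranges and defaults are taken from the table above.

```python
def _int_in_range(value: object, low: int, high: int) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and low <= value <= high


def _is_two_letter_code(value: object) -> bool:
    return (
        isinstance(value, str)
        and len(value) == 2
        and value.isascii()
        and value.isalpha()
        and value.islower()
    )


def validate_input(raw: dict) -> dict:
    """Return the input with defaults applied, or raise ValueError on the first problem."""
    queries = raw.get("queries")
    if not isinstance(queries, list) or not 1 <= len(queries) <= 50:
        raise ValueError("queries must be a list of 1 to 50 strings")
    if not all(isinstance(q, str) and q.strip() for q in queries):
        raise ValueError("every query must be a non-empty string")

    country = raw.get("country", "us")
    language = raw.get("language", "en")
    if not _is_two_letter_code(country):
        raise ValueError("country must be a lowercase two-letter code")
    if not _is_two_letter_code(language):
        raise ValueError("language must be a lowercase two-letter code")

    max_queries = raw.get("maxQueries", 25)
    if not _int_in_range(max_queries, 1, 50):
        raise ValueError("maxQueries must be an integer from 1 to 50")

    wait_ms = raw.get("waitMsAfterLoad", 8000)
    if not _int_in_range(wait_ms, 4000, 15000):
        raise ValueError("waitMsAfterLoad must be an integer from 4000 to 15000")

    proxy = raw.get("proxyConfiguration")
    if not isinstance(proxy, dict):
        raise ValueError("proxyConfiguration is required")

    return {
        "queries": queries,
        "country": country,
        "language": language,
        "maxQueries": max_queries,
        "waitMsAfterLoad": wait_ms,
        "proxyConfiguration": proxy,
    }
```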

📤 Output

One JSON row per (query x citation), or a single marker row when AI Overview did not appear for that query. Example citation row:

{
  "query": "what causes inflation",
  "country": "us",
  "language": "en",
  "ai_overview_appeared": true,
  "ai_overview_text_excerpt": "Inflation is caused by a combination of demand-pull factors, cost-push factors...",
  "citation_position": 1,
  "source_domain": "imf.org",
  "source_url": "https://www.imf.org/en/Publications/fandd/issues/Series/Back-to-Basics/Inflation",
  "source_title": "Inflation: Prices on the Rise",
  "selector_used": "div[aria-label=\"AI Overview\"]",
  "scraped_at": "2026-05-16T20:50:00.000Z"
}

Example no-AI-Overview marker row:

{
  "query": "spinach recipe",
  "country": "us",
  "language": "en",
  "ai_overview_appeared": false,
  "ai_overview_text_excerpt": null,
  "citation_position": null,
  "source_domain": null,
  "source_url": null,
  "source_title": null,
  "selector_used": null,
  "scraped_at": "2026-05-16T20:50:00.000Z"
}

💰 Pricing

Pay-Per-Event:

| Event | Price (USD) | Trigger |
| --- | --- | --- |
| `actor-start` | $0.05 | Once per run when the Actor begins |
| `result-row` | $0.005 | Per dataset row pushed |

Worked example: a 50-query run with a ~30% AI Overview hit rate and ~4 citations per hit yields roughly 50 × 0.7 = 35 marker rows plus 50 × 0.3 × 4 = 60 citation rows, or 95 rows in total. Charge = $0.05 + 95 × $0.005 = $0.525, which works out to about $5.50 per 1,000 rows. The Actor is priced above commodity SERP scrapers because cross-domain citation attribution is not available from any first-party source.
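
The same arithmetic can be wrapped in a small estimator so you can plug in your own query count and hit rate. The prices come from the event table above; the function names are hypothetical, and the hit rate and citations-per-hit figures are the rough averages quoted in the worked example, not guarantees.

```python
ACTOR_START_USD = 0.05   # charged once per run
RESULT_ROW_USD = 0.005   # charged per dataset row, marker rows included


def estimate_rows(num_queries: int, hit_rate: float, citations_per_hit: float) -> float:
    """Expected row count: one marker row per miss, several citation rows per hit."""
    marker_rows = num_queries * (1 - hit_rate)
    citation_rows = num_queries * hit_rate * citations_per_hit
    return marker_rows + citation_rows


def estimate_cost(rows: float) -> float:
    """Expected charge in USD for a single run that pushes the given number of rows."""
    return ACTOR_START_USD + rows * RESULT_ROW_USD


rows = estimate_rows(num_queries=50, hit_rate=0.30, citations_per_hit=4)
cost = estimate_cost(rows)
print(f"rows: {rows:.0f}, cost: ${cost:.3f}, per 1,000 rows: ${cost / rows * 1000:.2f}")
```

With the worked-example inputs this gives about 95 rows and a charge of $0.525. The effective per-1,000 figure falls toward $5.00 as runs get larger, because the start fee is spread over more rows.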

🚧 Limitations

AI Overview triggers on ~30% of queries today. Queries that look transactional, navigational, or trademark-heavy will mostly produce ai_overview_appeared=false marker rows. That absence data is still valuable — and you're charged the same per row either way.
v0.1 is English-tuned. The text-based selector fallback looks for the literal string AI Overview. Non-English locales (e.g. gl=de) may emit false negatives on the fallback path. The CSS selector battery is locale-agnostic.
Apify Proxy is required. Google blocks datacenter IPs without proxy enrichment. The Actor fails fast at startup with a clear status message when no proxy group is reachable.
Mobile SERP is out of scope. Mobile AI Overview uses a different DOM structure; a separate Actor variant is planned.
No following citation links. This Actor records the cited URL but does not visit it. Pair with a downstream HTTP scraper when you need destination content.

❓ FAQ

Q: What is a Google AI Overview tracker and why do I need one? Google AI Overview is the AI-generated summary block that appears at the top of ~30% of Google searches and cites 3–8 external sources. There is no official API for citation attribution. An AI overview scraper like this one gives SEO teams and content strategists raw data on which domains Google's generative engine trusts — the same data the major SEO platforms are now billing $300–1,500/mo to approximate.

Q: How is this different from Ahrefs or Semrush's AEO features? Those platforms track your own domain's citation appearances. This Actor lets you track any domain across any query list — useful for competitor analysis, category audits, and building your own AEO share-of-voice metrics. You own the raw dataset; no subscription lock-in.

Q: Why Camoufox instead of plain Playwright? Google's defences fingerprint headless Chromium via the WebDriver/CDP signature, predictable navigator properties, and missing iframes-API behaviour. We use Camoufox — a Firefox fork with those signals patched — because it is the only browser automation layer that survives Google's current detection stack. Plain Playwright would be blocked before the AI Overview block ever loads.

Q: Why one row per citation instead of one row per query with an array? Tabular tools (Sheets, BigQuery, Excel pivots) handle nested arrays poorly. Long-form rows let you GROUP BY source_domain directly. If you need wide-form output you can pivot in five lines of SQL.
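
If you prefer to pivot outside SQL, the same long-to-wide reshaping takes only a few lines of Python. This is a minimal sketch with a hypothetical function name and hypothetical `domain_N` column names; it assumes each row carries the `query`, `ai_overview_appeared`, `citation_position`, and `source_domain` fields described in the output schema.

```python
def to_wide(rows: list[dict]) -> list[dict]:
    """Collapse long-form rows into one record per query with domain_1, domain_2, ... columns."""
    wide: dict[str, dict] = {}
    for row in rows:
        record = wide.setdefault(
            row["query"], {"query": row["query"], "ai_overview_appeared": False}
        )
        if row["ai_overview_appeared"]:
            record["ai_overview_appeared"] = True
        # Marker rows have a null position and contribute no domain column.
        if row["citation_position"] is not None:
            record[f"domain_{row['citation_position']}"] = row["source_domain"]
    return list(wide.values())


# Invented sample data; example.org is a placeholder domain.
long_rows = [
    {"query": "what causes inflation", "ai_overview_appeared": True,
     "citation_position": 1, "source_domain": "imf.org"},
    {"query": "what causes inflation", "ai_overview_appeared": True,
     "citation_position": 2, "source_domain": "example.org"},
    {"query": "spinach recipe", "ai_overview_appeared": False,
     "citation_position": None, "source_domain": None},
]

for record in to_wide(long_rows):
    print(record)
```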

Q: Why charge for marker rows when AI Overview didn't appear? ai_overview_appeared=false is a meaningful AEO signal — knowing which of your queries do not trigger AI Overview is half the dashboard. Fair per-row pricing keeps the Actor sustainable.

Q: Does this work on the Apify FREE tier? Partially. FREE tier has no RESIDENTIAL proxy group, so the Actor falls back to BUYPROXIES94952 (5 IPs). Google serves CAPTCHAs more often on those exit IPs; expect a higher marker-row rate. Paid Apify plans with RESIDENTIAL get cleaner runs.

Q: How do I detect Google rotating their AI Overview DOM? Filter the dataset by selector_used over time. When the highest-priority selector stops hitting and a lower-priority one starts winning, the markup has shifted — raise an issue on the Store listing and we'll add the new selector to the battery.
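
One way to run that check on an exported dataset is to count `selector_used` per day and flag the days on which the most frequent selector changes. The sketch below is an assumption about how you might do it, with hypothetical function names; it counts citation rows rather than queries, so queries with many citations weigh more, and the second selector in the sample data is a made-up placeholder.

```python
from collections import Counter, defaultdict


def selector_counts_by_day(rows: list[dict]) -> dict[str, Counter]:
    """Count selector_used values per calendar day, skipping marker rows."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        if row.get("selector_used"):
            day = row["scraped_at"][:10]  # ISO 8601 timestamps start with YYYY-MM-DD
            counts[day][row["selector_used"]] += 1
    return dict(counts)


def dominant_selector_changes(counts: dict[str, Counter]) -> list[tuple[str, str, str]]:
    """Return (day, previous_selector, new_selector) for each day the top selector changes."""
    changes = []
    previous = None
    for day in sorted(counts):
        winner = counts[day].most_common(1)[0][0]
        if previous is not None and winner != previous:
            changes.append((day, previous, winner))
        previous = winner
    return changes


# Invented sample data: the placeholder selector takes over on the second day.
drift_rows = [
    {"scraped_at": "2026-05-16T20:50:00.000Z", "selector_used": 'div[aria-label="AI Overview"]'},
    {"scraped_at": "2026-05-16T20:51:00.000Z", "selector_used": 'div[aria-label="AI Overview"]'},
    {"scraped_at": "2026-05-17T09:00:00.000Z", "selector_used": "div[data-placeholder]"},
    {"scraped_at": "2026-05-17T09:01:00.000Z", "selector_used": None},
]

print(dominant_selector_changes(selector_counts_by_day(drift_rows)))
```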

Q: What is the Google SGE tracker use case? Google SGE (Search Generative Experience) was the predecessor name for what is now called Google AI Overview. If you were tracking SGE citations, this is the equivalent tool for the current AI Overview product.

💬 Your feedback

Hit a selector miss, a parser edge case, or a feature gap? Open an issue on the Apify Store listing and we'll respond within a week. Pull requests are welcome — the source is structured for easy selector-battery extension (see src/parser.py).
