
Google AI Overview Citation Tracker

Pricing: Pay per event


Track which domains Google's AI Overview cites for your target queries. AEO / generative-SEO data for 2026 — one row per (query x cited source) with selector telemetry, captcha-aware retry, and Pydantic-validated output.


Rating: 0.0 (0 ratings)

Developer: DevilScrapes (Maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 4 days ago


We do the dirty work so your dataset stays clean. 😈

About $5.50 per 1,000 (query x citation) rows on a typical 50-query run ($0.005 per row plus a $0.05 start fee per run). Track which domains Google's AI Overview cites for your target queries. Answer Engine Optimization (AEO) is the hottest SEO category of 2026: AI Overview is now the first result for roughly 30% of informational searches, and there is no first-party API for citation attribution. This Actor renders Google SERPs with Camoufox (an anti-detection Firefox fork), parses the AI Overview block with an 8-selector fall-through battery, and emits one Pydantic-validated row per (query x cited source), so you can monitor share-of-citation the same way you monitor SERP rank.

🎯 What this scrapes

For each query you pass in, this Actor:

  1. Opens https://www.google.com/search?q=<query>&hl=<language>&gl=<country> in a fresh Camoufox page.
  2. Dismisses any EU consent dialog.
  3. Waits waitMsAfterLoad milliseconds after DOMContentLoaded (4-15 seconds, default 8) for the AI Overview block to lazy-render.
  4. Probes an 8-selector priority battery to find the AI Overview container, recording which selector hit so you can detect Google rotating their markup.
  5. Extracts every citation link inside the carousel (URL, registrable domain, anchor text, 1-based position).
  6. Emits one row per citation or, when AI Overview did not appear, a single marker row with ai_overview_appeared=false. Both row shapes are meaningful: proving the absence of an AI Overview is itself a valid AEO data point.
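
Steps 1 and 6 can be sketched in a few lines of Python. This is a minimal illustration, not the Actor's source: build_search_url and rows_for_query are hypothetical names, the citations are assumed to arrive already parsed as (url, domain, title) tuples, and computing a true registrable domain (which needs the Public Suffix List) is left out. The sketch also emits no rows when an overview appears without any citations; how the real Actor handles that edge case is not documented here.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def build_search_url(query: str, language: str = "en", country: str = "us") -> str:
    """Step 1: the SERP URL with q=, hl= and gl= parameters."""
    # urlencode percent-escapes the query (spaces become '+', '&' becomes '%26').
    params = {"q": query, "hl": language, "gl": country}
    return "https://www.google.com/search?" + urlencode(params)


def rows_for_query(query, country, language, excerpt, citations, selector_used):
    """Step 6 (hypothetical helper): fan one query out into output rows.

    excerpt is None when no AI Overview container matched; citations is a
    list of (url, domain, title) tuples in carousel order.
    """
    now = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    base = {"query": query, "country": country, "language": language,
            "scraped_at": now.replace("+00:00", "Z")}
    if excerpt is None:
        # Marker row: AI Overview absent, so every citation-level field is null.
        return [{**base, "ai_overview_appeared": False,
                 "ai_overview_text_excerpt": None, "citation_position": None,
                 "source_domain": None, "source_url": None,
                 "source_title": None, "selector_used": None}]
    return [
        {**base, "ai_overview_appeared": True,
         "ai_overview_text_excerpt": excerpt[:200],  # first 200 chars
         "citation_position": position,  # 1-based carousel position
         "source_domain": domain, "source_url": url, "source_title": title,
         "selector_used": selector_used}
        for position, (url, domain, title) in enumerate(citations, start=1)
    ]
```

Returning a list in both branches lets the caller push whatever comes back without special-casing the marker row.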

Output fields:

Field | Type | Description
query | string | The query the row was produced from
country | string | ISO-3166 alpha-2 country code (gl=)
language | string | ISO-639-1 language code (hl=)
ai_overview_appeared | boolean | True when an AI Overview block was rendered
ai_overview_text_excerpt | string or null | First 200 chars of the AI Overview body
citation_position | integer or null | 1-based position in the citation carousel
source_domain | string or null | Registrable domain (e.g. imf.org)
source_url | string or null | Full https:// URL Google rendered
source_title | string or null | Anchor text Google rendered
selector_used | string or null | Which selector matched (drift telemetry)
scraped_at | string | ISO 8601 UTC timestamp
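
To make the two row shapes concrete, here is a stdlib-only sketch of the invariants implied by the table above. The Actor itself validates with Pydantic v2. This dataclass (CitationRow is a hypothetical name) only approximates that, and the rules it enforces, such as requiring an https:// source_url on citation rows, are assumptions read off the field descriptions.

```python
from dataclasses import dataclass
from typing import Optional

# Fields that must all be null on a no-AI-Overview marker row.
CITATION_FIELDS = ("ai_overview_text_excerpt", "citation_position", "source_domain",
                   "source_url", "source_title", "selector_used")


@dataclass(frozen=True)
class CitationRow:
    query: str
    country: str
    language: str
    ai_overview_appeared: bool
    ai_overview_text_excerpt: Optional[str]
    citation_position: Optional[int]
    source_domain: Optional[str]
    source_url: Optional[str]
    source_title: Optional[str]
    selector_used: Optional[str]
    scraped_at: str

    def __post_init__(self) -> None:
        if len(self.country) != 2 or len(self.language) != 2:
            raise ValueError("country and language must be 2-letter codes")
        if not self.ai_overview_appeared:
            # Marker row: no citation-level data may be present.
            if any(getattr(self, name) is not None for name in CITATION_FIELDS):
                raise ValueError("marker row must have null citation fields")
            return
        if self.citation_position is None or self.citation_position < 1:
            raise ValueError("citation_position is 1-based")
        if self.source_url is None or not self.source_url.startswith("https://"):
            raise ValueError("source_url must be an https:// URL")
        excerpt = self.ai_overview_text_excerpt
        if excerpt is not None and len(excerpt) > 200:
            raise ValueError("excerpt is capped at 200 characters")
```

A dataset row can be checked with CitationRow(**row). An invalid row raises ValueError before it is pushed, which mirrors the "schema is enforced at push time" behaviour described under Features.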

🔥 Features

  • Camoufox-rendered, not headless Chromium — Google blocks plain Playwright/Selenium fingerprints; Camoufox is a Firefox fork with anti-detection patches per ADR-0002.
  • 8-selector priority battery — Google rotates AI Overview labels every few months. This Actor probes the eight selectors we've seen historically and records which one matched on every row, so you can chart selector drift with a single GROUP BY selector_used query.
  • CAPTCHA-aware: when Google serves the sorry/index reCAPTCHA interstitial, the Actor rotates the proxy session and retries once before emitting a not-appeared marker row. A run never finishes with a green status and an empty dataset (fail-loud per ADR-0002).
  • Per-query session isolation — fresh proxy session ID and fresh Playwright page per query, so cookies and rate-limit state from one query don't poison the next.
  • Apify Proxy mandatory — RESIDENTIAL preferred (cleanest exit IPs), automatic fallback to BUYPROXIES94952 when residential is unavailable on your plan.
  • Pydantic v2 input + output validation — invalid input fails fast before any browser starts; row schema is enforced at push time.
  • Pay-Per-Event pricing: $0.05 start + $0.005 per row; no row charges if zero rows are scraped.
  • PPE-safe Actor.charge — regression-tested against the SDK 3.x idempotency_key trap that broke an earlier Actor on first publish.
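
The selector battery and the retry-once CAPTCHA rule can be sketched independently of any browser. This is a hedged illustration: probe_battery, is_captcha_page and fetch_with_one_retry are hypothetical names, and the battery's contents are not listed in this README (only div[aria-label="AI Overview"] appears, in the sample output). With Playwright's sync API you could pass page.query_selector as the lookup callable, since it returns None when nothing matches. The fetch callable is an assumed stand-in for "open the SERP with this proxy session ID and report the final URL".

```python
from typing import Callable, Optional, Sequence, Tuple
from urllib.parse import urlparse


def probe_battery(query_selector: Callable[[str], Optional[object]],
                  battery: Sequence[str]) -> Tuple[Optional[object], Optional[str]]:
    """Try selectors in priority order; return (element, selector) for the first hit."""
    for selector in battery:
        element = query_selector(selector)
        if element is not None:
            return element, selector  # recorded as the row's selector_used
    return None, None


def is_captcha_page(url: str) -> bool:
    """Google's reCAPTCHA interstitial is served under /sorry/ (e.g. /sorry/index)."""
    return urlparse(url).path.startswith("/sorry/")


def fetch_with_one_retry(fetch: Callable[[str], str],
                         new_session_id: Callable[[], str]) -> Optional[str]:
    """fetch(session_id) returns the final page URL.

    On a CAPTCHA, rotate to a fresh session and retry exactly once. Returns None
    after two CAPTCHAs, at which point the caller emits a marker row.
    """
    for _ in range(2):  # first attempt plus one retry
        final_url = fetch(new_session_id())
        if not is_captcha_page(final_url):
            return final_url
    return None
```

Recording the matching selector, rather than only a boolean, is what makes the GROUP BY selector_used drift chart possible.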

💡 Use cases

  • AEO dashboard — schedule a weekly run for your 50 highest-priority queries; chart source_domain share-of-citation over time alongside ai_overview_appeared rate. Detect when AI Overview starts citing a new competitor in your space.
  • Pre-launch content gap analysis — feed in the queries you want to rank for, see which domains Google currently cites, and target outreach to publishers in the cite list rather than chasing pure backlink volume.
  • Brand monitoring — does AI Overview cite your domain for queries where your brand is the answer? Most brands have no instrumentation here today; this is the cheapest way to find out.
  • Competitive intelligence for AI Overview — track exactly which 3-5 sources Google's generative system trusts for each of your category's head queries; compare to traditional SERP rank.
  • Selector drift monitoring — even if you don't care about citations, the selector_used column is a leading indicator of Google rotating AI Overview markup. Useful for SaaS observability of generative search behaviour.
  • Localised AEO — pair country=us and country=gb runs over the same query list to detect locale-specific citation behaviour (English-only in v0.1; multi-locale on the roadmap).
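
For the AEO dashboard and localised use cases, share-of-citation and AI Overview rate fall straight out of the long-form rows. This is a minimal sketch over the dataset rows loaded as Python dicts. The function names are hypothetical, and "share" here means a domain's fraction of all citation rows per country, which is one reasonable definition among several.

```python
from collections import Counter, defaultdict


def share_of_citation(rows):
    """Per country: fraction of AI Overview citations that went to each domain."""
    counts = defaultdict(Counter)
    for row in rows:
        if row["ai_overview_appeared"]:  # marker rows carry no citation
            counts[row["country"]][row["source_domain"]] += 1
    return {
        country: {domain: n / sum(c.values()) for domain, n in c.most_common()}
        for country, c in counts.items()
    }


def ai_overview_rate(rows):
    """Fraction of distinct (query, country) pairs where AI Overview appeared."""
    appeared = {}
    for row in rows:
        key = (row["query"], row["country"])
        appeared[key] = appeared.get(key, False) or row["ai_overview_appeared"]
    return sum(appeared.values()) / len(appeared) if appeared else 0.0
```

Running both over a us run and a gb run of the same query list gives the side-by-side locale comparison described above.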

⚙️ How to use it

  1. Open the Actor input form.
  2. Paste your Search queries (1-50). Informational queries (what is, how to, best X 2026) have the best AI-Overview trigger rate.
  3. (Optional) Set Country and Language (default us / en).
  4. (Optional) Cap with Max queries per run (default 25).
  5. (Optional) Raise Wait after DOM ready to 12000-15000 ms for slow proxy exits.
  6. Pick a Proxy — leave default RESIDENTIAL. The Actor will fall back to BUYPROXIES94952 automatically if residential is unavailable on your plan.
  7. Click Start. Rows stream into the default dataset.

Quick examples

Two informational queries, default settings (the QA fixture):

{
  "queries": ["best running shoes 2026", "what causes inflation"],
  "country": "us",
  "language": "en",
  "maxQueries": 2,
  "waitMsAfterLoad": 8000,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

UK English with a longer wait window:

{
  "queries": ["best mortgage rates", "how does pension lump sum tax work"],
  "country": "gb",
  "language": "en",
  "maxQueries": 10,
  "waitMsAfterLoad": 12000,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

📥 Input

Field | Type | Required | Default | Description
queries | array of strings | yes | none | 1-50 search queries to probe
country | string | no | "us" | ISO-3166 alpha-2 (lowercase); maps to gl=
language | string | no | "en" | ISO-639-1 (lowercase); maps to hl=
maxQueries | integer | no | 25 | Hard cap per run (1-50)
waitMsAfterLoad | integer | no | 8000 | Milliseconds to wait after DOMContentLoaded (4000-15000)
proxyConfiguration | object | yes | RESIDENTIAL | Apify Proxy config (required)
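
The "invalid input fails fast" behaviour can be approximated with a stdlib dataclass that mirrors the table's constraints. This is a sketch, not the Actor's Pydantic model. RunInput and from_actor_input are hypothetical names. Two details are assumptions: that "proxy required" means useApifyProxy must be true, and that maxQueries truncates the query list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunInput:
    """Hypothetical stdlib mirror of the input table's constraints."""
    queries: List[str]
    country: str = "us"
    language: str = "en"
    max_queries: int = 25
    wait_ms_after_load: int = 8000

    def __post_init__(self) -> None:
        if not 1 <= len(self.queries) <= 50:
            raise ValueError("queries must contain 1-50 items")
        for name, code in (("country", self.country), ("language", self.language)):
            # Both codes are two lowercase letters (gl= and hl=).
            if not (len(code) == 2 and code.isalpha() and code.islower()):
                raise ValueError(f"{name} must be a lowercase 2-letter code")
        if not 1 <= self.max_queries <= 50:
            raise ValueError("maxQueries must be 1-50")
        if not 4000 <= self.wait_ms_after_load <= 15000:
            raise ValueError("waitMsAfterLoad must be 4000-15000 ms")

    def queries_to_run(self) -> List[str]:
        # Assumed semantics: maxQueries caps how many queries are probed.
        return self.queries[: self.max_queries]


def from_actor_input(raw: dict) -> RunInput:
    """Map the camelCase Actor input JSON onto RunInput, failing fast."""
    proxy = raw.get("proxyConfiguration") or {}
    if not proxy.get("useApifyProxy"):
        raise ValueError("proxyConfiguration with useApifyProxy=true is required")
    return RunInput(
        queries=list(raw["queries"]),
        country=raw.get("country", "us"),
        language=raw.get("language", "en"),
        max_queries=raw.get("maxQueries", 25),
        wait_ms_after_load=raw.get("waitMsAfterLoad", 8000),
    )
```

Validating before any browser starts means a bad waitMsAfterLoad or an uppercase country code costs nothing beyond the failed run.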

📤 Output

One JSON row per (query x citation), or a single marker row when AI Overview did not appear for that query. Example citation row:

{
  "query": "what causes inflation",
  "country": "us",
  "language": "en",
  "ai_overview_appeared": true,
  "ai_overview_text_excerpt": "Inflation is caused by a combination of demand-pull factors, cost-push factors...",
  "citation_position": 1,
  "source_domain": "imf.org",
  "source_url": "https://www.imf.org/en/Publications/fandd/issues/Series/Back-to-Basics/Inflation",
  "source_title": "Inflation: Prices on the Rise",
  "selector_used": "div[aria-label=\"AI Overview\"]",
  "scraped_at": "2026-05-16T20:50:00.000Z"
}

Example no-AI-Overview marker row:

{
  "query": "spinach recipe",
  "country": "us",
  "language": "en",
  "ai_overview_appeared": false,
  "ai_overview_text_excerpt": null,
  "citation_position": null,
  "source_domain": null,
  "source_url": null,
  "source_title": null,
  "selector_used": null,
  "scraped_at": "2026-05-16T20:50:00.000Z"
}

💰 Pricing

Pay-Per-Event:

Event | Price (USD) | Trigger
actor-start | $0.05 | Once per run when the Actor begins
result-row | $0.005 | Per dataset row pushed

Worked example: a 50-query run with a ~30% AI-Overview hit rate and ~4 citations per hit yields roughly 50 * 0.7 + 50 * 0.3 * 4 = 35 + 60 = 95 rows. Charge = $0.05 + 95 * $0.005 = $0.525, an effective ~$5.50 per 1,000 rows at this run size (the marginal rate is $5.00 per 1,000 rows; the fixed start fee weighs less on larger runs). It is priced above commodity SERP scrapers because the citation data is essentially unobtainable elsewhere.
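
The same arithmetic as a small helper, handy for budgeting other query counts or hit rates. estimate_run is a hypothetical name. The hit rate and citations-per-hit are your own assumptions, and the defaults are the two PPE prices from the table above.

```python
def estimate_run(num_queries: int, hit_rate: float, citations_per_hit: float,
                 start_fee: float = 0.05, per_row: float = 0.005):
    """Return (rows, charge_usd, effective_usd_per_1000_rows) for one PPE run."""
    marker_rows = num_queries * (1 - hit_rate)  # one marker row per miss
    citation_rows = num_queries * hit_rate * citations_per_hit
    rows = marker_rows + citation_rows
    charge = start_fee + rows * per_row  # actor-start + result-row events
    per_1000 = charge / rows * 1000 if rows else float("nan")
    return rows, charge, per_1000


rows, charge, per_1000 = estimate_run(50, 0.3, 4)
print(f"{rows:.0f} rows, ${charge:.3f}, ${per_1000:.2f} per 1,000 rows")
```

Re-running it with a larger num_queries shows the effective rate sliding toward the $5.00 per 1,000 marginal price.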

🚧 Limitations

  • AI Overview only triggers on ~30% of queries today. Queries that look transactional, navigational, or trademark-heavy will mostly produce ai_overview_appeared=false marker rows. That's still useful data — and you're charged the same per row either way.
  • v0.1 is English-tuned. The text-based selector fallback looks for the literal string AI Overview, so non-English interface languages (e.g. hl=de) may produce false negatives on the fallback path. The CSS selector battery is locale-agnostic.
  • Apify Proxy is required. Google hard-blocks requests that come straight from Apify's datacenter IPs. The Actor fails fast at startup with a clear status message when no proxy group is reachable.
  • Mobile SERP is out of scope. Mobile AI Overview has a different DOM structure; a separate Actor variant is planned.
  • Citation links are not followed. This Actor records the cited URL but does not visit it. Pair it with a downstream Actor (e.g. an HTTP scraper) when you need the destination content.

❓ FAQ

Q: Why Camoufox instead of plain Playwright? Google's anti-bot fingerprints headless Chromium via the WebDriver/CDP signature, predictable navigator properties, and missing iframes-API behaviour. Camoufox is a Firefox fork with those signals patched; it's the only browser our org allows for scraping (ADR-0002).

Q: Why one row per citation instead of one row per query with an array? Tabular tools (Sheets, BigQuery, Excel pivots) hate nested arrays. Long-form rows let you GROUP BY source_domain directly. If you need wide-form you can pivot client-side in <5 lines of SQL.
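
If you do want wide-form, the pivot is short in Python too. This sketch assumes the rows are loaded as dicts, and it caps the output at a chosen number of citation columns. pivot_wide and the source_domain_N column names are illustrative, not part of the Actor's output.

```python
def pivot_wide(rows, max_positions=5):
    """One record per (query, country, language) with source_domain_1..N columns."""
    wide = {}
    for row in rows:
        key = (row["query"], row["country"], row["language"])
        record = wide.setdefault(key, {
            "query": key[0], "country": key[1], "language": key[2],
            "ai_overview_appeared": row["ai_overview_appeared"],
            **{f"source_domain_{i}": None for i in range(1, max_positions + 1)},
        })
        position = row["citation_position"]
        # Marker rows have no position and leave every source column null.
        if position is not None and position <= max_positions:
            record[f"source_domain_{position}"] = row["source_domain"]
    return list(wide.values())
```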

Q: Why charge for marker rows when AI Overview didn't appear? Because ai_overview_appeared=false is a meaningful AEO signal — knowing which of your queries don't trigger AI Overview is half the dashboard. Costing the data fairly keeps the Actor sustainable.

Q: Will this work on the Apify FREE tier? Partially. FREE tier has no RESIDENTIAL group, so the Actor falls back to BUYPROXIES94952 (5 IPs). Google often serves CAPTCHA on those datacenter IPs; expect a high marker-row rate. Paid Apify plans with RESIDENTIAL get clean runs.

Q: How do I detect Google rotating their AI Overview DOM? Group the dataset by selector_used over time (e.g. per week). When the highest-priority selector stops hitting and a lower one starts winning, raise an issue and we'll add the new one to the battery.
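
A minimal sketch of that check over exported rows. It buckets selector hits by ISO week and flags weeks in which the top-priority selector never matched. The function names are hypothetical, and which selector counts as "top" is an input you supply, because the battery order is not published here.

```python
from collections import Counter, defaultdict
from datetime import datetime


def selector_hits_by_week(rows):
    """Map ISO week ('YYYY-Www') to a count of hits per selector_used."""
    weekly = defaultdict(Counter)
    for row in rows:
        if row["selector_used"] is None:  # marker rows matched nothing
            continue
        ts = datetime.fromisoformat(row["scraped_at"].replace("Z", "+00:00"))
        year, week, _ = ts.isocalendar()
        weekly[f"{year}-W{week:02d}"][row["selector_used"]] += 1
    return {wk: dict(hits) for wk, hits in sorted(weekly.items())}


def drifted_weeks(rows, top_selector):
    """Weeks with AI Overview hits in which the top selector matched no row."""
    return [wk for wk, hits in selector_hits_by_week(rows).items()
            if top_selector not in hits]
```

A non-empty result from drifted_weeks is the signal worth reporting as an issue.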

💬 Your feedback

Hit a selector miss, a parser edge case, or a feature gap? Open an issue on the Apify Store listing and we'll respond within a week. Pull requests welcome — the source is structured for easy selector-battery extension (see src/parser.py).