Google Ads Scraper

Pricing

$2.00 / 1,000 results

See what competitors advertise on Google. Per ad: title, body, CTA, landing URL, thumbnail, YouTube preview, regions, platforms, days-active. Sort by days-active to find their winners. Search by keyword, domain or advertiser ID with a strict word-boundary match, so `nike` never returns `nikey`.


Developer

VortexData

Maintained by Community

🎯 Google Ads Scraper: Transparency Center Intelligence

See what your competitors are advertising on Google: visible ad text, landing URL, thumbnail, video URL, lifecycle stats, regions and platforms for every active or historical ad of any advertiser in Google's Ads Transparency Center.

Search by keyword (nike), domain (shopify.com), advertiser ID (AR…) or paste a Transparency Center URL. Get back a clean per-ad dataset plus a per-advertiser summary report.

⚡ 120 ads/sec listing · 25 ads/sec full enrichment · ~$0.06 compute cost per 1,000 ads (Datacenter proxy)


✨ Why this scraper

🔍 Keyword search that doesn't lie: nike matches Nike, Inc. and Nike SRL, but not nikey / Nikena. Strict word-boundary rule, no fuzzy ranking
📝 Decoded ad text: title, body, CTA, landing URL extracted from Google's preview JS (Pikmin Bloom → ["Pikmin Bloom", "Enjoy your Pikmin's company wherever you go", "Install"] + play.google.com/...)
📺 All Google surfaces: YouTube, Search, Display, Maps, Shopping, Discover, Gmail, Play. Filter by surface
📊 Per-advertiser summary report: separate dataset with longestRunningCreativeId, topRegions, topPlatforms, format mix per advertiser
🚀 Speaks Google's internal RPC API directly: no browser, no DOM parsing, just JSON over HTTPS
🎯 Sparse output: fields with no data are omitted entirely. No null/[] clutter, table view stays clean
⚡ Skip-details mode runs ~4× cheaper when you only need IDs + dates
🛡️ Stealthy by default: Chrome TLS fingerprint via curl_cffi, fresh proxy session per request
🧠 Smart input dedup: nike + shop.nike.com + ?query=nike resolve to one advertiser, processed once

📦 What you get

The actor produces two datasets.

📋 Default dataset: one row per ad

{
  "advertiserName": "Niantic, Inc.",
  "advertiserId": "AR08888592736429539329",
  "creativeId": "CR01129991885394280449",
  "format": "VIDEO",
  "adTransparencyUrl": "https://adstransparency.google.com/advertiser/AR.../creative/CR...",
  "isActive": true,
  "firstShown": "2026-03-31",
  "lastShown": "2026-05-04",
  "daysActive": 33,
  "regionsCount": 3,
  "regions": ["Australia", "Canada", "United States"],
  "preview": "https://i1.ytimg.com/vi/1izIj43UMaQ/hqdefault.jpg",
  "media": [
    "https://i1.ytimg.com/vi/1izIj43UMaQ/hqdefault.jpg",
    "https://www.youtube.com/watch?v=1izIj43UMaQ"
  ]
}
| Section | Fields | Notes |
|---|---|---|
| 🆔 Identity | advertiserName, advertiserId, creativeId, format, adTransparencyUrl | Stable IDs + ad type (VIDEO / IMAGE / TEXT) + click-through to view on Google |
| ⏱️ Lifecycle | isActive, firstShown, lastShown, daysActive | Dates as YYYY-MM-DD (UTC). isActive = lastShown within last 7 days. Sort by daysActive desc to find longest-running winners |
| 🌍 Reach | regionsCount, regions[] | Country names sorted alphabetically. regionsCount is handy for sort/filter (e.g. "ads that ran in 5+ countries") |
| 🖼️ Media | preview, media[] | preview = single best image URL. The dataset_schema.json declares it as format: "image" so Apify Console renders it inline as a thumbnail in the table; open the 🖼️ Visual preview view for a fast visual scan. media[] is the unified bucket: every image URL plus the YouTube watch URL when the ad is YouTube-hosted. Signed googlevideo.com streams and 1×1 placeholder pixels (dot.gif) are deliberately filtered out. |

Sparse output. Fields with no data (null, [], {}) are omitted entirely. format, isActive, regionsCount are always kept so filters/sorts work uniformly.
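
Because optional fields are omitted rather than set to null, downstream code should read them defensively. The snippet below is a minimal sketch of that pattern; the two sample rows and the helper names (`longest_running`, `wide_reach`, `regions_of`) are illustrative assumptions, not part of the actor.

```python
from typing import Any

# Hypothetical rows shaped like the default dataset. The second row has no
# regions or daysActive, so those keys are simply absent (sparse output).
ads: list[dict[str, Any]] = [
    {"creativeId": "CR-example-1", "format": "VIDEO", "isActive": True,
     "regionsCount": 3, "daysActive": 33,
     "regions": ["Australia", "Canada", "United States"]},
    {"creativeId": "CR-example-2", "format": "TEXT", "isActive": False,
     "regionsCount": 0},
]

def longest_running(rows, limit=10):
    """Sort by daysActive descending; rows without the field count as 0."""
    return sorted(rows, key=lambda r: r.get("daysActive", 0), reverse=True)[:limit]

def wide_reach(rows, min_regions=5):
    """regionsCount is always present, so direct indexing is safe here."""
    return [r for r in rows if r["regionsCount"] >= min_regions]

def regions_of(row):
    """Optional list field: fall back to an empty list instead of raising KeyError."""
    return row.get("regions", [])

print([r["creativeId"] for r in longest_running(ads)])
```

Only `format`, `isActive` and `regionsCount` are documented as always present, so those are the only keys the sketch indexes directly.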


🚀 Quick start

Search by keyword or domain

{
  "searchTargets": ["nike", "shopify.com", "tesla"],
  "maxAdvertisersPerQuery": 5,
  "resultsLimit": 100
}

nike matches advertiser names containing the word nike (Nike, Inc., Nike SRL, Nike Lee), not nikey or Nikena. shopify.com strips the TLD and searches shopify. Per-keyword cap defaults to 5 advertisers.

Track a specific advertiser

{
  "searchTargets": [
    "https://adstransparency.google.com/advertiser/AR08888592736429539329"
  ],
  "resultsLimit": 100
}

Or just the bare ID: ["AR08888592736429539329"].

Filter by region + format + platform + date

{
  "searchTargets": ["AR08888592736429539329"],
  "filterRegion": "DE",
  "filterFormat": "VIDEO",
  "filterPlatform": "YOUTUBE",
  "timeRangePreset": "LAST_30_DAYS"
}

Speed mode: listing only, ~4× cheaper

{
  "searchTargets": ["AR08888592736429539329"],
  "resultsLimit": 1000,
  "skipDetails": true
}

Returns IDs + format + dates + advertiser name. Skips per-creative decoding (no textLines, landingUrl, preview.video).


βš™οΈ Input reference

🎯 What to scrape

FieldTypeDefaultDescription
searchTargets ⭐string[]requiredMix of keywords (nike), domains (nike.com), advertiser IDs (AR…) or Transparency URLs

πŸ“Š Limits

FieldTypeDefaultDescription
resultsLimitint50Max ads per advertiser. 0 = unlimited
maxAdvertisersPerQueryint5When the target is a keyword/domain, fetch ads from at most this many matched advertisers

πŸ”Ž Filters

FieldTypeDefaultDescription
filterRegionstringALLISO-2 code (US, DE, FR, JP, …) or ALL
filterFormatenumALLALL / TEXT / IMAGE / VIDEO
filterPlatformenumALLALL / YOUTUBE / SEARCH / DISPLAY / MAPS / SHOPPING / DISCOVER / GMAIL / PLAY
timeRangePresetenumALL_TIMELAST_7_DAYS / LAST_30_DAYS / LAST_90_DAYS / THIS_YEAR
customStartDatestringβ€”YYYY-MM-DD, overrides preset
customEndDatestringβ€”YYYY-MM-DD, overrides preset

URL-embedded filters (?region=US&format=VIDEO&platform=YouTube) override globals per target.
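
The per-target override rule can be pictured as a dictionary merge where URL query parameters win over the global input. This is a sketch under assumptions: the key mapping in `URL_TO_INPUT` and the upper-casing of values (so `YouTube` becomes `YOUTUBE`) are guesses about how the actor normalizes them, and `effective_filters` is a hypothetical helper.

```python
from urllib.parse import urlparse, parse_qs

# Assumed mapping from URL query keys to the actor's input field names.
URL_TO_INPUT = {
    "region": "filterRegion",
    "format": "filterFormat",
    "platform": "filterPlatform",
}

def effective_filters(global_filters: dict, target: str) -> dict:
    """Start from the global filters, then let URL-embedded values override them."""
    merged = dict(global_filters)
    query = parse_qs(urlparse(target).query)
    for url_key, input_key in URL_TO_INPUT.items():
        if url_key in query:
            # Assumption: values are normalized to upper case to match the enums.
            merged[input_key] = query[url_key][0].upper()
    return merged

globals_ = {"filterRegion": "DE", "filterFormat": "ALL", "filterPlatform": "ALL"}
url = ("https://adstransparency.google.com/advertiser/AR08888592736429539329"
       "?region=US&format=VIDEO&platform=YouTube")
print(effective_filters(globals_, url))
```

A target without query parameters keeps the global filters unchanged, which is why mixed target lists can share one input.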

⚡ Speed mode

| Field | Type | Default | Description |
|---|---|---|---|
| skipDetails ⚡ | bool | false | Listing only (IDs, format, dates, advertiser name). Skips per-creative decoding. ~4× faster, ~4× cheaper |

⚙️ Network

| Field | Type | Default | Description |
|---|---|---|---|
| maxConcurrency | int | 10 | Parallel RPC calls (5–20 sweet spot) |
| proxyConfiguration | object | Residential | Residential is the default because Google rate-limits Datacenter IP ranges aggressively, especially under heavy parallel load. Switch to Datacenter only for small batches (≤10 ads) where you want lower per-request cost |

How keyword search works

Google's typeahead returns name-prefix matches: a search for nike surfaces Nike Inc. along with nikey, Nikena, etc. We apply a strict word-boundary filter (\bnike\b regex, case-insensitive, Unicode-aware):

| Advertiser name | Match? |
|---|---|
| Nike, Inc. | ✓ nike is a complete word |
| Nike SRL | ✓ |
| Nike Lee | ✓ |
| Just Do It Nike | ✓ |
| nikey | ✗ nike runs into y, no boundary |
| Nikena | ✗ nike runs into na, no boundary |

No fuzzy ranking, no relevance scoring: a name either matches or it doesn't. If you actually want nikey, search for nikey exactly.
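
The rule in the table can be reproduced with Python's standard `re` module. This is a minimal sketch of a word-boundary filter, not the actor's actual code; `matches_keyword` is a hypothetical name, and the sample names are the ones from the table.

```python
import re

def matches_keyword(keyword: str, advertiser_name: str) -> bool:
    """True when keyword appears in the name as a complete word."""
    # re.escape keeps keywords with regex metacharacters literal;
    # str patterns are Unicode-aware by default in Python 3.
    pattern = rf"\b{re.escape(keyword)}\b"
    return re.search(pattern, advertiser_name, flags=re.IGNORECASE) is not None

names = ["Nike, Inc.", "Nike SRL", "Nike Lee", "Just Do It Nike", "nikey", "Nikena"]
kept = [n for n in names if matches_keyword("nike", n)]
print(kept)  # ['Nike, Inc.', 'Nike SRL', 'Nike Lee', 'Just Do It Nike']
```

Because the check is a plain boolean per name, the result is deterministic: the same suggestion list always yields the same advertisers.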

Domains (nike.com, shop.nike.com, amazon.co.uk) get the brand part extracted (nike, nike, amazon) and run through the same filter.

Same advertiser from multiple inputs? nike + shop.nike.com + ?query=nike all resolve to the same advertiser. We dedupe before fetching, so you pay once.
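
As a rough illustration of brand extraction and input dedup, the sketch below normalizes keywords, bare domains and `?query=` strings to a search keyword and drops repeats. It is an approximation: the actor is described as deduplicating resolved advertisers, while this works at the keyword level, and `SECOND_LEVEL` is a tiny assumed stand-in for a real public-suffix list. All function names are hypothetical.

```python
from urllib.parse import urlparse, parse_qs

# Assumption: a few common second-level suffixes used with two-letter TLDs.
SECOND_LEVEL = {"co", "com", "org", "net", "ac", "gov"}

def brand_from_domain(domain: str) -> str:
    """nike.com -> nike, shop.nike.com -> nike, amazon.co.uk -> amazon."""
    labels = domain.lower().strip(".").split(".")
    if len(labels) >= 3 and len(labels[-1]) == 2 and labels[-2] in SECOND_LEVEL:
        return labels[-3]
    return labels[-2] if len(labels) >= 2 else labels[0]

def to_keyword(target: str) -> str:
    """Handle three input shapes only: ?query=..., bare domain, plain keyword."""
    if "?" in target:
        values = parse_qs(urlparse(target).query).get("query", [])
        if values:
            return values[0].lower()
    if "." in target:
        return brand_from_domain(target)
    return target.lower()

def dedupe(targets):
    """Keep the first occurrence of each normalized keyword, preserving order."""
    return list(dict.fromkeys(to_keyword(t) for t in targets))

print(dedupe(["nike", "shop.nike.com", "?query=nike", "amazon.co.uk"]))
# ['nike', 'amazon']
```

Advertiser IDs and full Transparency Center URLs are deliberately out of scope here; they would need their own branch before the domain check.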

Heuristic limits

textLines[] is the visible text from the iframe in document order. For most ads it reads like [brand, body, CTA] (e.g. ["Pikmin Bloom", "Enjoy your...", "Install"]), but some HTML5-bundle formats serve text we can't parse; in that case textLines is omitted, but preview.thumbnail, preview.video, adTransparencyUrl, landingUrl and lifecycle fields still work.

We deliberately don't auto-split into title / body / cta columns: Google's templates put text in unpredictable order and any auto-split is wrong for some fraction of ads. If you need a single-column title for analysis, textLines[0] is usually the brand name.


📊 Performance & cost

Measured on the live API:

| Workload | Time | Throughput | Cost (Datacenter proxy) |
|---|---|---|---|
| 200 ads, listing only | 1.7 s | 120 ads/sec | $0.005 |
| 50 ads, full enrichment | 2.0 s | 25 ads/sec | $0.013 |
| 1,000 ads, full enrichment | 40 s | 25 ads/sec | $0.06 |
| 10,000 ads, listing only (skipDetails: true) | 80 s | 125 ads/sec | $0.05 |

That's ~16,000 fully-enriched ads per Apify compute hour, or ~430,000 ads in listing-only mode.
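
For planning a run, the measured throughput figures translate into a rough wall-clock estimate. The helper below is only back-of-the-envelope arithmetic using the rounded rates quoted above; `estimated_seconds` is a hypothetical name and real runs will vary with proxy and concurrency settings.

```python
# Rounded throughput figures taken from the measurements above.
LISTING_ADS_PER_SEC = 120
ENRICHED_ADS_PER_SEC = 25

def estimated_seconds(ad_count: int, skip_details: bool = False) -> float:
    """Approximate run time: number of ads divided by the measured ads/sec rate."""
    rate = LISTING_ADS_PER_SEC if skip_details else ENRICHED_ADS_PER_SEC
    return ad_count / rate

print(estimated_seconds(1_000))                      # 40.0
print(round(estimated_seconds(10_000, skip_details=True)))  # 83
```

The 1,000-ad estimate lines up with the 40 s row in the table; the listing-only estimate lands slightly above the measured 80 s because that run averaged 125 ads/sec.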


💡 Use cases

  • Competitive intelligence: track every ad your rivals run, daily
  • Creative benchmarking: sort daysActive desc, copy what's been running for 6+ months (= what works)
  • Market research: quantify ad activity per country, format, platform, time period
  • Brand monitoring: detect impersonators or unauthorized resellers using your name
  • Media planning: see which Google surfaces (YouTube vs Search vs Maps) competitors prioritize
  • AI / data pipelines: feed structured ad data into LLMs for clustering, summarization, trend analysis

📤 Output access

Two views available in Apify Console (Storage → Dataset):

| View | Best for |
|---|---|
| 🖼️ Visual preview | Thumbnails rendered inline: fast visual scan, advertiser, format, days active, regions count, click-through to Google |
| 📋 Full data | Every column. Use for export or when you need every field. Sort by daysActive desc to find longest-running winners |

Export to JSON / CSV / Excel / XML with one click. Or pull via API:

from apify_client import ApifyClient

client = ApifyClient("YOUR_TOKEN")

# Start the actor and block until the run finishes
run = client.actor("YOUR_USERNAME/google-ads-scraper").call(run_input={
    "searchTargets": ["AR08888592736429539329"],
    "resultsLimit": 200,
})

# Per-creative records (default dataset)
for ad in client.dataset(run["defaultDatasetId"]).iterate_items():
    # textLines is omitted when no text could be decoded, hence the fallback
    print(f"{ad['advertiserName']} {ad['daysActive']}d {ad.get('textLines', [None])[0]}")

# Per-advertiser summaries (named dataset)
summaries = client.datasets().get_or_create(name="advertisers")
for s in client.dataset(summaries["id"]).iterate_items():
    print(f"{s['advertiserName']}: {s['stats']['creativesFetched']} ads, "
          f"longest-running {s['stats']['longestRunningDays']} days")

πŸ—οΈ Under the hood

Speaks Google's internal anji RPC API β€” the same backend the Transparency Center web app uses:

OperationEndpoint
Keyword β†’ advertiserPOST /anji/_/rpc/SearchService/SearchSuggestions
Advertiser β†’ creativesPOST /anji/_/rpc/SearchService/SearchCreatives
Advertiser metadataPOST /anji/_/rpc/LookupService/GetAdvertiserById
Creative detailsPOST /anji/_/rpc/LookupService/GetCreativeById

Engineering choices:

  • curl_cffi.AsyncSession with impersonate="chrome" β€” real Chrome TLS / JA3 fingerprint
  • Fresh proxy session per HTTP call β€” uuid4() session IDs, every request from a different IP
  • Three independent retry budgets β€” cookies / search / lookup each retry 3Γ— with exponential backoff
  • 15-second hard timeout on every individual call
  • Per-run preview-URL dedup β€” Google often serves the same ad under 4-7 cache-bust URLs; we fetch once
  • Asyncio semaphore-bounded concurrency β€” controlled parallelism, no thundering herd
  • Word-boundary keyword filter in pure regex β€” predictable, deterministic, no ranking heuristics
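
To make the concurrency, timeout and retry choices above concrete, here is a standard-library sketch of how they can fit together. It is not the actor's code: `fetch` stands in for any async RPC call, the constants mirror the documented defaults, treating "3×" as three total attempts is an assumption, and the backoff base delay is invented for illustration.

```python
import asyncio

MAX_CONCURRENCY = 10   # mirrors the maxConcurrency default
MAX_ATTEMPTS = 3       # assumption: "retry 3x" read as three attempts in total
CALL_TIMEOUT_S = 15    # hard timeout per individual call

async def call_with_retry(fetch, arg, semaphore, base_delay=0.5):
    """Run fetch(arg) under the semaphore, retrying transient failures with backoff."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            # The semaphore caps how many calls are in flight at once.
            async with semaphore:
                return await asyncio.wait_for(fetch(arg), timeout=CALL_TIMEOUT_S)
        except (asyncio.TimeoutError, ConnectionError):
            if attempt == MAX_ATTEMPTS - 1:
                raise
            # Exponential backoff: base_delay, then 2x, then 4x, ...
            # Sleeping outside the semaphore frees the slot for other calls.
            await asyncio.sleep(base_delay * 2 ** attempt)

async def fetch_all(fetch, args):
    """Fetch every item concurrently, bounded by MAX_CONCURRENCY."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY)
    return await asyncio.gather(*(call_with_retry(fetch, a, semaphore) for a in args))
```

A caller would pass its own coroutine, for example `asyncio.run(fetch_all(get_creative, creative_ids))`, where `get_creative` is whatever function performs the single request.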

❓ FAQ

Q: Is this legal? Yes. Google's Ads Transparency Center exposes this data publicly under the EU Digital Services Act and similar transparency commitments. No login required, no private data, no ToS violation. Fair game for competitive analysis, research, journalism, brand monitoring.

Q: Why are some impressions {min: 0, max: 1000}? Google reports impressions in bucketed ranges. When a creative had < 1,000 impressions, that's the lowest bucket Google exposes. Matches what you see on adstransparency.google.com directly.

Q: Why is platforms empty for some creatives? App-install ads and certain dynamic creatives don't expose per-platform breakdown in the API. We omit the field rather than emit []. Both we and competitor scrapers face the same Google limitation.

Q: Can I get email / phone / address of advertisers? No, and no other scraper can either. Google deliberately excludes contact info from the Transparency API.

Q: Why does keyword shopify.com return Shopify instead of all .com advertisers? We extract the brand part (shopify) from the domain and search by word-boundary. So shopify.com ≡ shopify. Use a different keyword if you want a different advertiser.

Q: Why does nike not return nikey? Word-boundary rule (\bnike\b). nikey doesn't have nike as a complete word; it's only a substring. If you want nikey, search for nikey exactly. Logs show what was dropped.

Q: How fast can I scrape one advertiser's full ad library? ~25 ads/sec with full enrichment, ~120 ads/sec listing-only. A typical advertiser with 500 active ads completes in 20–40 seconds.

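Q: What about rate limits? Each RPC call goes out on a fresh proxy session ID, so each request hits a different IP. The default Residential proxy is the safer choice because Google rate-limits Datacenter ranges aggressively; if you switched to Datacenter for a small batch and see persistent 429s, switch back to Residential.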


⚠️ Limitations

  • Per-region × per-platform impression cross-tab: we report regions[] + platforms[] + overall impressions separately, not the cross-tab. If you need "Bulgaria × YouTube → 1000-2000 impressions", let us know.
  • Asset URLs (googlevideo.com/videoplayback?...) are signed and expire after a few hours, so preview.video may 404 if you fetch the URL hours later. Re-run if you need fresh links.
  • Some HTML5-bundle ad templates don't expose decodable text, so textLines is omitted for those. The preview block (thumbnail/video) and adTransparencyUrl still work; click through to see the ad on Google.
  • The internal f.req JSPB schema is reverse-engineered and may shift if Google changes their protocol; we follow upstream.

📜 Disclaimer

All data extracted by this actor is publicly available through Google's official Ads Transparency Center. This actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Google or Alphabet Inc. Use responsibly and respect rate limits.