Google Ads Scraper
Pricing
$2.00 / 1,000 results
See what competitors advertise on Google. Per ad: title, body, CTA, landing URL, thumbnail, YouTube preview, regions, platforms, days-active. Sort by days-active to find their winners. Search by keyword, domain or advertiser ID with a strict word-boundary match: no `nikey` for `nike`.
Rating
0.0
(0)
Developer
VortexData
Actor stats
0
Bookmarked
5
Total users
3
Monthly active users
a day ago
Last modified
Google Ads Scraper: Transparency Center Intelligence
See what your competitors are advertising on Google. Visible ad text, landing URL, thumbnail, video URL, lifecycle stats, regions and platforms, for every active or historical ad of any advertiser in Google's Ads Transparency Center.
Search by keyword (`nike`), domain (`shopify.com`), advertiser ID (`AR…`) or paste a Transparency-Center URL. Get back a clean per-ad dataset plus a per-advertiser summary report.
120 ads/sec listing · 25 ads/sec full enrichment · ~$0.06 per 1,000 ads
Why this scraper
- Keyword search that doesn't lie: `nike` matches Nike, Inc. and Nike SRL, but not nikey / Nikena. Strict word-boundary rule, no fuzzy ranking
- Decoded ad text: title, body, CTA, landing URL extracted from Google's preview JS (Pikmin Bloom → ["Pikmin Bloom", "Enjoy your Pikmin's company wherever you go", "Install"] + play.google.com/...)
- All Google surfaces: YouTube, Search, Display, Maps, Shopping, Discover, Gmail, Play. Filter by surface
- Per-advertiser summary report: separate dataset with `longestRunningCreativeId`, `topRegions`, `topPlatforms`, format mix per advertiser
- Speaks Google's internal RPC API directly: no browser, no DOM parsing, just JSON over HTTPS
- Sparse output: fields with no data are omitted entirely. No `null`/`[]` clutter, table view stays clean
- Skip-details mode runs ~4× cheaper when you only need IDs + dates
- Stealthy by default: Chrome TLS fingerprint via `curl_cffi`, fresh proxy session per request
- Smart input dedup: `nike` + `shop.nike.com` + `?query=nike` resolve to one advertiser, processed once
What you get
The actor produces two datasets.
Default dataset: one row per ad
```json
{
  "advertiserName": "Niantic, Inc.",
  "advertiserId": "AR08888592736429539329",
  "creativeId": "CR01129991885394280449",
  "format": "VIDEO",
  "adTransparencyUrl": "https://adstransparency.google.com/advertiser/AR.../creative/CR...",
  "isActive": true,
  "firstShown": "2026-03-31",
  "lastShown": "2026-05-04",
  "daysActive": 33,
  "regionsCount": 3,
  "regions": ["Australia", "Canada", "United States"],
  "preview": "https://i1.ytimg.com/vi/1izIj43UMaQ/hqdefault.jpg",
  "media": [
    "https://i1.ytimg.com/vi/1izIj43UMaQ/hqdefault.jpg",
    "https://www.youtube.com/watch?v=1izIj43UMaQ"
  ]
}
```
| Section | Fields | Notes |
|---|---|---|
| Identity | `advertiserName`, `advertiserId`, `creativeId`, `format`, `adTransparencyUrl` | Stable IDs + ad type (VIDEO / IMAGE / TEXT) + click-through to view on Google |
| Lifecycle | `isActive`, `firstShown`, `lastShown`, `daysActive` | Dates as YYYY-MM-DD (UTC). `isActive` = `lastShown` within last 7 days. Sort by `daysActive` desc to find longest-running winners |
| Reach | `regionsCount`, `regions[]` | Country names sorted alphabetically. `regionsCount` is handy for sort/filter (e.g. "ads that ran in 5+ countries") |
| Media | `preview`, `media[]` | `preview` = single best image URL. The `dataset_schema.json` declares it as `format: "image"` so Apify Console renders it inline as a thumbnail in the table; open the Visual preview view for a fast visual scan. `media[]` is the unified bucket: every image URL plus the YouTube watch URL when the ad is YouTube-hosted. Signed `googlevideo.com` streams and 1×1 placeholder pixels (`dot.gif`) are deliberately filtered out. |
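The lifecycle rule in the table can be sketched in a few lines of Python. This is a minimal sketch, assuming `isActive` is derived by comparing `lastShown` with the current UTC date and that the 7-day window is inclusive; the helper names `is_active` and `longest_running` are hypothetical and not part of the actor.

```python
from datetime import date, timedelta


def is_active(last_shown: str, today: date, window_days: int = 7) -> bool:
    """True when lastShown (YYYY-MM-DD) is at most `window_days` days before today."""
    return today - date.fromisoformat(last_shown) <= timedelta(days=window_days)


def longest_running(ads: list, top: int = 10) -> list:
    """Sort ads by daysActive descending; ads without the field sort last."""
    return sorted(ads, key=lambda ad: ad.get("daysActive", 0), reverse=True)[:top]


# Made-up records, shaped like rows of the default dataset.
ads = [
    {"creativeId": "CR-a", "lastShown": "2026-05-04", "daysActive": 33},
    {"creativeId": "CR-b", "lastShown": "2026-01-10", "daysActive": 190},
]
today = date(2026, 5, 6)
for ad in longest_running(ads):
    print(ad["creativeId"], ad["daysActive"], is_active(ad["lastShown"], today))
```

Because the dataset already carries `isActive` and `daysActive`, a helper like this is mainly useful when you re-evaluate older exports against a later date.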
Sparse output. Fields with no data (`null`, `[]`, `{}`) are omitted entirely. `format`, `isActive`, `regionsCount` are always kept so filters/sorts work uniformly.
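The sparse-output rule could look roughly like the following. This is an illustrative sketch, not the actor's code: the function name `sparsify` and the constant `ALWAYS_KEPT` are hypothetical, and only the three always-kept field names come from the description above.

```python
# Fields the README says are always present, even when empty or falsy.
ALWAYS_KEPT = {"format", "isActive", "regionsCount"}


def sparsify(record: dict) -> dict:
    """Drop keys whose value is None, [] or {}, except the always-kept ones."""
    return {
        key: value
        for key, value in record.items()
        if key in ALWAYS_KEPT or value not in (None, [], {})
    }


row = sparsify({
    "format": "TEXT",
    "isActive": False,
    "regionsCount": 0,
    "regions": [],
    "textLines": None,
    "landingUrl": "https://example.com",
})
print(row)
```

On the consumer side this means optional fields are best read with `dict.get()` rather than direct indexing, since a missing key is the normal way "no data" is expressed.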
Quick start
Search by keyword or domain
```json
{
  "searchTargets": ["nike", "shopify.com", "tesla"],
  "maxAdvertisersPerQuery": 5,
  "resultsLimit": 100
}
```
`nike` matches advertiser names containing the word nike (Nike, Inc., Nike SRL, Nike Lee) but not nikey or Nikena. `shopify.com` strips the TLD and searches `shopify`. Per-keyword cap defaults to 5 advertisers.
Track a specific advertiser
```json
{
  "searchTargets": ["https://adstransparency.google.com/advertiser/AR08888592736429539329"],
  "resultsLimit": 100
}
```
Or just the bare ID: `["AR08888592736429539329"]`.
Filter by region + format + platform + date
```json
{
  "searchTargets": ["AR08888592736429539329"],
  "filterRegion": "DE",
  "filterFormat": "VIDEO",
  "filterPlatform": "YOUTUBE",
  "timeRangePreset": "LAST_30_DAYS"
}
```
Speed mode: listing only, ~4× cheaper
```json
{
  "searchTargets": ["AR08888592736429539329"],
  "resultsLimit": 1000,
  "skipDetails": true
}
```
Returns IDs + format + dates + advertiser name. Skips per-creative decoding (no `textLines`, `landingUrl`, `preview.video`).
Input reference
What to scrape
| Field | Type | Default | Description |
|---|---|---|---|
| `searchTargets` | string[] | required | Mix of keywords (`nike`), domains (`nike.com`), advertiser IDs (`AR…`) or Transparency URLs |
Limits
| Field | Type | Default | Description |
|---|---|---|---|
| `resultsLimit` | int | 50 | Max ads per advertiser. 0 = unlimited |
| `maxAdvertisersPerQuery` | int | 5 | When the target is a keyword/domain, fetch ads from at most this many matched advertisers |
Filters
| Field | Type | Default | Description |
|---|---|---|---|
| `filterRegion` | string | ALL | ISO-2 code (US, DE, FR, JP, …) or ALL |
| `filterFormat` | enum | ALL | ALL / TEXT / IMAGE / VIDEO |
| `filterPlatform` | enum | ALL | ALL / YOUTUBE / SEARCH / DISPLAY / MAPS / SHOPPING / DISCOVER / GMAIL / PLAY |
| `timeRangePreset` | enum | ALL_TIME | ALL_TIME / LAST_7_DAYS / LAST_30_DAYS / LAST_90_DAYS / THIS_YEAR |
| `customStartDate` | string | none | YYYY-MM-DD, overrides preset |
| `customEndDate` | string | none | YYYY-MM-DD, overrides preset |
URL-embedded filters (?region=US&format=VIDEO&platform=YouTube) override globals per target.
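The override rule can be pictured as a simple merge. The sketch below is an assumption-laden illustration: the mapping from URL parameters (`region`, `format`, `platform`) to input fields and the upper-casing of values are guesses based on the examples above, and `effective_filters` is a hypothetical helper, not the actor's implementation.

```python
from urllib.parse import parse_qs, urlparse

# Assumed mapping from URL query parameters to the input field names.
PARAM_TO_FIELD = {
    "region": "filterRegion",
    "format": "filterFormat",
    "platform": "filterPlatform",
}


def effective_filters(target: str, global_filters: dict) -> dict:
    """Return the global filters, overridden by filters embedded in the target URL."""
    merged = dict(global_filters)
    query = parse_qs(urlparse(target).query)
    for param, field in PARAM_TO_FIELD.items():
        if param in query:
            # Upper-case so "YouTube" lines up with the YOUTUBE enum value (assumption).
            merged[field] = query[param][0].upper()
    return merged


global_filters = {"filterRegion": "DE", "filterFormat": "ALL", "filterPlatform": "ALL"}
target = (
    "https://adstransparency.google.com/advertiser/AR08888592736429539329"
    "?region=US&format=VIDEO&platform=YouTube"
)
filters = effective_filters(target, global_filters)
print(filters)
```

Targets without a query string, such as a bare keyword or advertiser ID, simply inherit the global filters unchanged.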
Speed mode
| Field | Type | Default | Description |
|---|---|---|---|
| `skipDetails` | bool | false | Listing only (IDs, format, dates, advertiser name). Skips per-creative decoding. ~4× faster, ~4× cheaper |
Network
| Field | Type | Default | Description |
|---|---|---|---|
| `maxConcurrency` | int | 10 | Parallel RPC calls (5-20 sweet spot) |
| `proxyConfiguration` | object | Residential | Residential is the default: Google rate-limits Datacenter IP ranges aggressively, especially under heavy parallel load. Switch to Datacenter only for small batches (≤10 ads) where you want lower per-request cost |
How keyword search works
Google's typeahead returns name-prefix matches: a search for `nike` surfaces Nike Inc. along with nikey, Nikena, etc. We apply a strict word-boundary filter (`\bnike\b` regex, case-insensitive Unicode):
| Advertiser name | Match? |
|---|---|
| Nike, Inc. | ✅ nike is a complete word |
| Nike SRL | ✅ |
| Nike Lee | ✅ |
| Just Do It Nike | ✅ |
| nikey | ❌ nike runs into y, no boundary |
| Nikena | ❌ nike runs into na, no boundary |
No fuzzy ranking, no relevance scoring: a name strictly matches or it doesn't. If you actually want nikey, search for `nikey` exactly.
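The word-boundary rule is easy to reproduce with Python's `re` module. This is a minimal sketch of the idea rather than the actor's code; `matches_keyword` is a hypothetical helper name.

```python
import re


def matches_keyword(keyword: str, advertiser_name: str) -> bool:
    """True when `keyword` appears in the name as a complete word (case-insensitive)."""
    # re.escape keeps keywords containing regex metacharacters literal.
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    return pattern.search(advertiser_name) is not None


names = ["Nike, Inc.", "Nike SRL", "Nike Lee", "Just Do It Nike", "nikey", "Nikena"]
kept = [name for name in names if matches_keyword("nike", name)]
print(kept)
```

Python `str` patterns are Unicode-aware by default, so `\b` treats accented letters as word characters without extra flags.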
Domains (`nike.com`, `shop.nike.com`, `amazon.co.uk`) get the brand part extracted (`nike`, `nike`, `amazon`) and run through the same filter.
Same advertiser from multiple inputs? `nike` + `shop.nike.com` + `?query=nike` all resolve to the same advertiser. We dedupe before fetching, so you pay once.
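To make the normalization concrete, here is a rough sketch of brand extraction and input dedup. It rests on several assumptions: the suffix set is a tiny hand-picked sample (a real implementation would more likely consult the Public Suffix List), dedup happens on the normalized search term rather than on the resolved advertiser, and the helper names are hypothetical. Advertiser IDs and full URLs are out of scope here.

```python
from urllib.parse import parse_qs

# Tiny illustrative subset of multi-label public suffixes (assumption).
TWO_LEVEL_SUFFIXES = {"co.uk", "com.au", "co.jp"}


def brand_from_domain(domain: str) -> str:
    """Return the label immediately left of the public suffix, e.g. shop.nike.com -> nike."""
    labels = domain.lower().strip(".").split(".")
    suffix_len = 2 if ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES else 1
    if len(labels) <= suffix_len:
        return labels[0]
    return labels[-suffix_len - 1]


def search_term(target: str) -> str:
    """Normalize a keyword, bare domain or ?query= fragment to one search term."""
    target = target.strip()
    if target.startswith("?"):
        return parse_qs(target[1:]).get("query", [""])[0].lower()
    if "." in target:
        return brand_from_domain(target)
    return target.lower()


def dedupe_terms(targets: list) -> list:
    """Keep each normalized term once, preserving first-seen order."""
    return list(dict.fromkeys(search_term(t) for t in targets))


terms = dedupe_terms(["nike", "shop.nike.com", "?query=nike", "amazon.co.uk"])
print(terms)
```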
Heuristic limits
`textLines[]` is the visible text from the iframe in document order. For most ads it reads like [brand, body, CTA] (e.g. `["Pikmin Bloom", "Enjoy your...", "Install"]`), but some HTML5-bundle formats serve text we can't parse. In that case `textLines` is omitted, but `preview.thumbnail`, `preview.video`, `adTransparencyUrl`, `landingUrl` and lifecycle fields still work.
We deliberately don't auto-split into title / body / cta columns: Google's templates put text in unpredictable order and any auto-split is wrong for some fraction of ads. If you need a single-column title for analysis, `textLines[0]` is usually the brand name.
Performance & cost
Measured on the live API:
| Workload | Time | Throughput | Cost (Datacenter proxy) |
|---|---|---|---|
| 200 ads, listing only | 1.7 s | 120 ads/sec | $0.005 |
| 50 ads, full enrichment | 2.0 s | 25 ads/sec | $0.013 |
| 1,000 ads, full enrichment | 40 s | 25 ads/sec | $0.06 |
| 10,000 ads, listing only (`skipDetails: true`) | 80 s | 125 ads/sec | $0.05 |
That's ~16,000 fully-enriched ads per Apify compute hour, or ~430,000 ads in listing-only mode.
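For rough planning, the throughput figures in the table translate into wall-clock estimates with simple division. The snippet below is only back-of-the-envelope arithmetic using the headline rates (120 and 25 ads/sec); `estimated_seconds` is a hypothetical helper, and real runs will vary with proxy type and concurrency.

```python
# Headline throughput from the measurements above, in ads per second.
RATES = {"listing": 120, "full": 25}


def estimated_seconds(ads: int, mode: str = "full") -> float:
    """Estimate wall-clock seconds for `ads` creatives in the given mode."""
    return ads / RATES[mode]


print(estimated_seconds(1000, "full"))     # 1,000 ads with full enrichment
print(estimated_seconds(1000, "listing"))  # the same ads in listing-only mode
```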
Use cases
- Competitive intelligence: track every ad your rivals run, daily
- Creative benchmarking: sort by `daysActive` desc, copy what's been running for 6+ months (= what works)
- Market research: quantify ad activity per country, format, platform, time period
- Brand monitoring: detect impersonators or unauthorized resellers using your name
- Media planning: see which Google surfaces (YouTube vs Search vs Maps) competitors prioritize
- AI / data pipelines: feed structured ad data into LLMs for clustering, summarization, trend analysis
Output access
Two views available in Apify Console (Storage → Dataset):
| View | Best for |
|---|---|
| Visual preview | Thumbnails rendered inline for a fast visual scan: advertiser, format, days active, regions count, click-through to Google |
| Full data | Every column. Use for export or when you need every field. Sort by `daysActive` desc to find longest-running winners |
Export to JSON / CSV / Excel / XML with one click. Or pull via API:
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_TOKEN")
run = client.actor("YOUR_USERNAME/google-ads-scraper").call(
    run_input={
        "searchTargets": ["AR08888592736429539329"],
        "resultsLimit": 200,
    }
)

# Per-creative records (default dataset)
for ad in client.dataset(run["defaultDatasetId"]).iterate_items():
    # textLines is omitted when no text could be decoded, hence the [None] fallback
    print(f"{ad['advertiserName']} {ad['daysActive']}d {ad.get('textLines', [None])[0]}")

# Per-advertiser summaries (named dataset)
summaries = client.datasets().get_or_create(name="advertisers")
for s in client.dataset(summaries["id"]).iterate_items():
    print(
        f"{s['advertiserName']}: {s['stats']['creativesFetched']} ads, "
        f"longest-running {s['stats']['longestRunningDays']} days"
    )
```
Under the hood
Speaks Google's internal anji RPC API, the same backend the Transparency Center web app uses:
| Operation | Endpoint |
|---|---|
| Keyword → advertiser | POST /anji/_/rpc/SearchService/SearchSuggestions |
| Advertiser → creatives | POST /anji/_/rpc/SearchService/SearchCreatives |
| Advertiser metadata | POST /anji/_/rpc/LookupService/GetAdvertiserById |
| Creative details | POST /anji/_/rpc/LookupService/GetCreativeById |
Engineering choices:
- `curl_cffi.AsyncSession` with `impersonate="chrome"`: real Chrome TLS / JA3 fingerprint
- Fresh proxy session per HTTP call: `uuid4()` session IDs, every request from a different IP
- Three independent retry budgets: `cookies` / `search` / `lookup` each retry 3× with exponential backoff
- 15-second hard timeout on every individual call
- Per-run preview-URL dedup: Google often serves the same ad under 4-7 cache-bust URLs; we fetch once
- Asyncio semaphore-bounded concurrency: controlled parallelism, no thundering herd
- Word-boundary keyword filter in pure regex: predictable, deterministic, no ranking heuristics
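Three of the choices above (bounded concurrency, retries with exponential backoff, and a hard per-call timeout) combine naturally in plain `asyncio`. The sketch below illustrates the pattern under stated assumptions: `with_retries`, `fetch_all` and the simulated `flaky_fetch` are hypothetical names, "3×" is read as three attempts in total, and no real network call or proxy is involved.

```python
import asyncio


async def with_retries(call, attempts: int = 3, base_delay: float = 0.5, timeout: float = 15.0):
    """Await `call()` under a hard timeout, retrying with exponential backoff."""
    for attempt in range(attempts):
        try:
            return await asyncio.wait_for(call(), timeout=timeout)
        except (asyncio.TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise  # retry budget exhausted
            await asyncio.sleep(base_delay * 2 ** attempt)


async def fetch_all(creative_ids, fetch_one, max_concurrency: int = 10):
    """Fetch every creative with at most `max_concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(creative_id):
        async with semaphore:
            return await with_retries(lambda: fetch_one(creative_id))

    # gather() returns results in the same order as the input IDs.
    return await asyncio.gather(*(bounded(cid) for cid in creative_ids))


async def demo():
    calls = {}

    async def flaky_fetch(creative_id):
        # Simulated lookup: fails once per ID, then succeeds.
        calls[creative_id] = calls.get(creative_id, 0) + 1
        if calls[creative_id] == 1:
            raise ConnectionError("simulated transient failure")
        return {"creativeId": creative_id}

    return await fetch_all(["CR1", "CR2", "CR3"], flaky_fetch)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Holding the semaphore across retries keeps the number of in-flight calls bounded even while some of them are backing off, which is one simple way to avoid a burst of simultaneous retries.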
FAQ
Q: Is this legal?
Yes. Google's Ads Transparency Center exposes this data publicly under the EU Digital Services Act and similar transparency commitments. No login required, no private data, no ToS violation. Fair game for competitive analysis, research, journalism, brand monitoring.
Q: Why are some impressions `{min: 0, max: 1000}`?
Google reports impressions in bucketed ranges. When a creative had < 1,000 impressions, that's the lowest bucket Google exposes. Matches what you see on adstransparency.google.com directly.
Q: Why is `platforms` empty for some creatives?
App-install ads and certain dynamic creatives don't expose a per-platform breakdown in the API. We omit the field rather than emit `[]`. Both we and competitor scrapers face the same Google limitation.
Q: Can I get email / phone / address of advertisers?
No, and no other scraper can either. Google deliberately excludes contact info from the Transparency API.
Q: Why does keyword `shopify.com` return Shopify instead of all .com advertisers?
We extract the brand part (`shopify`) from the domain and search by word-boundary. So `shopify.com` ≡ `shopify`. Use a different keyword if you want a different advertiser.
Q: Why does `nike` not return nikey?
Word-boundary rule (`\bnike\b`). nikey doesn't have nike as a complete word; it's only a substring. If you want nikey, search for `nikey` exactly. Logs show what was dropped.
Q: How fast can I scrape one advertiser's full ad library?
~25 ads/sec with full enrichment, ~120 ads/sec listing-only. A typical advertiser with 500 active ads completes in 20-40 seconds.
Q: What about rate limits?
Each RPC call goes out on a fresh proxy session ID, so each request hits a different IP. Datacenter proxy is fine for most use; switch to Residential if you see persistent 429s.
Limitations
- Per-region × per-platform impression cross-tab: we report `regions[]` + `platforms[]` + overall `impressions` separately, not the cross-tab. If you need "Bulgaria × YouTube → 1000-2000 impressions", let us know.
- Asset URLs (`googlevideo.com/videoplayback?...`) are signed and expire after a few hours, so `preview.video` may 404 if you fetch the URL hours later. Re-run if you need fresh links.
- Some HTML5-bundle ad templates don't expose decodable text, so `textLines` is omitted for those. The `preview` block (thumbnail/video) and `adTransparencyUrl` still work; click through to see the ad on Google.
- The internal `f.req` JSPB schema is reverse-engineered and may shift if Google changes their protocol; we follow upstream.
Disclaimer
All data extracted by this actor is publicly available through Google's official Ads Transparency Center. This actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Google or Alphabet Inc. Use responsibly and respect rate limits.