Google Ads scraper
Status: Under maintenance
Pricing: $1.00 / 1,000 result items
Google Ads Scraper extracts structured ad data from the Google Ads Transparency Center for advertiser and domain searches. It returns advertiser names, creative IDs, ad formats, first/last seen dates, detail URLs, and preview links for competitor research, ad monitoring, and market analysis.
Developer: kane liu (maintained by Community)
Google Ads Transparency Center Scraper
Scrape Google Ads Transparency Center creatives by advertiser or domain on Apify.
This Actor is intentionally narrow and honest:
- it supports only `advertiser` and `domain` targets;
- it is VN-first in its published schema and verified smoke coverage;
- it returns stable search-level fields first, then adds best-effort preview/detail enrichment;
- it does not claim production-hardening against every Google anti-bot, region, or ad-format variant.
If you need a Store-facing summary in one sentence: this Actor is useful today for Google Ads Transparency Center lookup workflows where stable creative IDs, advertiser names, preview URLs, and basic preview-derived fields are enough, and deeper enrichment is treated as opportunistic.
What this Actor is good for
Use it when you want to:
- collect creatives for a known advertiser in Google Ads Transparency Center;
- collect creatives for a known domain;
- monitor stable identifiers such as `advertiserId`, `advertiserName`, `creativeId`, `detailUrl`, and `formatCode`;
- keep the original creative/detail payload when available via `rawCreativeJson` / `rawAdvertiserJson`;
- extract lightweight preview content such as `headline`, `channelName`, `destinationUrl`, or `visibleUrl` when the preview parser can resolve them.
What it does not promise
This repository should not be read as a claim that:
- all countries are equally verified or equally stable;
- all ad formats yield the same structured preview fields;
- `GetCreativeById` always returns a useful payload;
- advertiser-name suggestion resolution is fully production-verified;
- proxies, retries, browser warmup, and anti-bot handling are already production-grade.
Current real capability, without hype
Supported search modes
- `advertiser`
- `domain`
Region scope
- Input validation accepts any two-letter country code.
- The live client currently contains region-to-collection mappings for: `AU`, `BR`, `CA`, `DE`, `FR`, `GB`, `IN`, `JP`, `PH`, `SG`, `US`, `VN`.
- Published positioning and verified smoke coverage remain VN-centered. The current Store-facing schema and validation artifacts are still tuned to the VN-verified path, so this Actor should still be presented as a VN-validated MVP, not as a fully verified global scraper.
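The three tiers described above (accepted by validation, mapped in the live client, verified by smoke coverage) can be sketched as a small lookup. This is an illustration only: `region_support` is a hypothetical helper, not part of the Actor, and the upper-casing and whitespace trimming are assumptions about how a caller might normalize input.

```python
# Region codes copied from the mapping list above.
MAPPED_REGIONS = frozenset(
    {"AU", "BR", "CA", "DE", "FR", "GB", "IN", "JP", "PH", "SG", "US", "VN"}
)
# Only VN is described as verified in this README.
VERIFIED_REGIONS = frozenset({"VN"})


def region_support(code: str) -> str:
    """Classify a region code as 'verified', 'mapped', 'unmapped', or 'invalid'."""
    # Assumption: callers may pass lower-case or padded codes.
    normalized = code.strip().upper()
    if len(normalized) != 2 or not (normalized.isascii() and normalized.isalpha()):
        return "invalid"
    if normalized in VERIFIED_REGIONS:
        return "verified"
    if normalized in MAPPED_REGIONS:
        return "mapped"
    # Passes the two-letter check but has no known collection mapping.
    return "unmapped"
```

A caller could use the result to warn before starting a live run in a region that is only `mapped` or `unmapped`.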
Important implementation limits
These are real current limits of the codebase:
- `targets` is an array in the input schema, but the current runtime processes only the first target per run.
- `includeCreativeDetails=true` is best-effort. The field exists, the lookup is wired, but the returned payload quality is not uniform.
- `includeVariantContent=true` is best-effort. Some previews yield good headline/url extraction; others only expose partial text or nothing structured.
- `includeRawJson=true` does not currently populate every raw field uniformly:
  - `rawCreativeJson` can be populated when `GetCreativeById` returns a payload;
  - `rawAdvertiserJson` can be populated when advertiser lookup runs and succeeds;
  - `rawSearchCreativesJson` is present in the schema but is currently written as `null` by the implementation.
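Because only the first target is processed per run, a caller with several targets may want to split them into one input per run before starting the Actor. A minimal sketch, assuming the input is a plain dictionary shaped like the input schema; `split_into_single_target_inputs` is a hypothetical helper name and is not part of this repository.

```python
import copy
from typing import Any


def split_into_single_target_inputs(run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one run input per target, each carrying exactly one target."""
    targets = run_input.get("targets") or []
    single_inputs = []
    for target in targets:
        # Deep copy so later edits to one run input never leak into another.
        single = copy.deepcopy(run_input)
        single["targets"] = [copy.deepcopy(target)]
        single_inputs.append(single)
    return single_inputs
```

Each returned dictionary can then be submitted as its own run, so no target is silently skipped.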
Verified status
Verified locally in deterministic fixture mode
The fixture path is the most reliable, closed-loop validation path in this repo.
It is locally validated to:
- parse fixture `SearchCreatives` responses;
- optionally merge fixture `GetCreativeById` payloads;
- optionally parse fixture variant content;
- write dataset items to Apify local storage;
- write `RUN_SUMMARY`.
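To inspect what a local fixture run wrote, something like the following could be used. It assumes the usual Apify/Crawlee local storage layout (`storage/datasets/default/` and `storage/key_value_stores/default/`) and that the summary is stored as `RUN_SUMMARY.json`; neither detail is confirmed by this README, and `load_local_run` is a hypothetical helper name.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_local_run(storage_dir: str = "storage") -> tuple[list[dict], dict | None]:
    """Load dataset items and the run summary from a local storage directory."""
    root = Path(storage_dir)
    dataset_dir = root / "datasets" / "default"
    items = [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(dataset_dir.glob("*.json"))
        # Skip storage bookkeeping files such as __metadata__.json.
        if not path.name.startswith("__")
    ]
    # Assumption: RUN_SUMMARY lives in the default key-value store as JSON.
    summary_path = root / "key_value_stores" / "default" / "RUN_SUMMARY.json"
    summary = None
    if summary_path.exists():
        summary = json.loads(summary_path.read_text(encoding="utf-8"))
    return items, summary
```

If the directory is missing or empty, the function returns an empty list and `None` rather than raising.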
Verified in real mode
There are two different truths to keep separate:
- This repository contains prior successful real advertiser artifacts for:
  - `region=VN`
  - `searchMode=advertiser`
  - `searchValue=AR16735076323512287233`
- The current local rerun in this environment is not guaranteed to pass every time. On this machine, the latest live smoke attempt hit Google rate limiting (`429 Too Many Requests`), so live mode must still be treated as fragile.
Beyond the default advertiser smoke, repository notes also record one manually verified domain path:
- `region=VN`
- `searchMode=domain`
- `searchValue=chatgpt.com`
GetCreativeById / rawCreativeJson status
The old README text saying live GetCreativeById basically returns {} is now too absolute.
The accurate boundary is:
- the Actor already stores `rawCreativeJson` when `GetCreativeById` returns something usable;
- recent repository-local advertiser artifacts still show cases where `rawCreativeJson` is `{}`;
- separately verified remote advertiser runs have already shown non-empty `rawCreativeJson`;
- therefore `GetCreativeById` is real but unstable, and `rawCreativeJson` should be treated as opportunistic enrichment, not a guaranteed output field.
Preview enrichment status
The current main enrichment path is:
- `SearchCreatives` search rows, plus
- preview `content.js` fetching/parsing, plus
- inline preview HTML parsing when the row does not expose a `previewUrl`.
What is genuinely verified here:
- search-level fields are the most stable output layer;
- `previewUrl` can be present and useful, but it is nullable;
- inline preview HTML can supply enrichment when `previewUrl` is absent;
- some live preview payloads yield useful fields like `headline`, `channelName`, `destinationUrl`, and `visibleUrl`.
What is not guaranteed:
- every ad format exposes a preview payload the current parser understands;
- every preview will yield `headline` or `description`;
- preview parsing is stable across all regions and format families.
Input contract
This section documents the current input contract as implemented by input_schema.json and the runtime.
Input
Top-level input fields supported by input_schema.json:
| Field | Type | Notes |
|---|---|---|
| `runMode` | `fixture` \| `real` | `fixture` is deterministic and fully testable; `real` hits live Google endpoints. |
| `region` | string | Defaults to `VN`. Any two-letter code passes input validation, but verified/store-safe positioning remains VN-first. |
| `targets` | array | Required. Current runtime only processes the first target item. |
| `maxCreativesPerTarget` | integer | 1-100. |
| `includeCreativeDetails` | boolean | Enables best-effort `GetCreativeById` lookup. |
| `includeVariantContent` | boolean | Enables best-effort preview/variant parsing. |
| `includeRawJson` | boolean | Keeps raw detail payloads when actually fetched/successful. |
| `fixtures` | object | Optional in fixture mode. If omitted, the Actor auto-loads its built-in deterministic smoke fixtures. |
| `proxyConfiguration` | object | Apify proxy editor-compatible config. |
| `session` | object | Optional cookies/headers/user agent for live mode. |
| `rpcIds` | object | Listed in schema for overrides, but the live runtime currently uses `rpcPaths` internally. Treat as advanced/internal. |
| `baseUrl` | string | Defaults to `https://adstransparency.google.com`. |
| `requestTimeoutSecs` | number | Defaults to 30. |
| `debug` | boolean | Extra logging. |
| `keepDebugArtifacts` | boolean | Keep intermediate artifacts. |
Target object
{
  "searchMode": "domain",
  "searchValue": "example.com",
  "targetLabel": "Example domain"
}
Notes:
- `searchMode` must be `domain` or `advertiser`.
- `searchValue` must be a non-empty string.
- For advertiser runs, the most reliable input is a known advertiser ID such as `AR16735076323512287233`.
- The client also contains advertiser-suggestion resolution for non-ID advertiser queries, but that path is not the primary verified smoke path.
Example inputs
1) Fixture mode
{
  "runMode": "fixture",
  "region": "VN",
  "targets": [
    {
      "searchMode": "domain",
      "searchValue": "example.com",
      "targetLabel": "Fixture smoke"
    }
  ],
  "maxCreativesPerTarget": 2,
  "includeCreativeDetails": true,
  "includeVariantContent": true,
  "fixtures": {
    "searchCreatives": "fixtures/rpc/search_creatives_domain_vn.json",
    "creativeById": {
      "crt-900": "fixtures/rpc/get_creative_by_id_crt-900.json",
      "crt-901": "fixtures/rpc/get_creative_by_id_crt-901.json"
    },
    "advertiserById": {
      "adv-100": "fixtures/rpc/get_advertiser_by_id_adv-100.json",
      "adv-101": "fixtures/rpc/get_advertiser_by_id_adv-101.json"
    }
  }
}
2) Real advertiser smoke
{
  "runMode": "real",
  "region": "VN",
  "maxCreativesPerTarget": 1,
  "targets": [
    {
      "searchMode": "advertiser",
      "searchValue": "AR16735076323512287233",
      "targetLabel": "Nike live advertiser smoke"
    }
  ],
  "includeCreativeDetails": true,
  "includeVariantContent": true
}
3) Real domain smoke
{
  "runMode": "real",
  "region": "VN",
  "maxCreativesPerTarget": 2,
  "targets": [
    {
      "searchMode": "domain",
      "searchValue": "chatgpt.com",
      "targetLabel": "ChatGPT live domain smoke"
    }
  ],
  "includeCreativeDetails": true,
  "includeVariantContent": true
}
Output contract
This section documents the current published dataset contract and the known schema/runtime nuances.
Output
The published dataset model is defined in .actor/dataset_schema.json.
Stable output fields
These are the most dependable fields in current practice:
- `runId`
- `region`
- `searchMode`
- `searchValue`
- `advertiserId`
- `advertiserName`
- `creativeId`
- `detailUrl`
- `previewUrl` (required key, nullable value)
- `formatCode`
- `scrapedAt`
Best-effort enrichment fields
These may be present, null, partial, or format-dependent:
- `firstShownRaw`
- `lastShownRaw`
- `resultCountLower`
- `resultCountUpper`
- `variantCount`
- `headline`
- `description`
- `destinationUrl`
- `visibleUrl`
- `redirectUrl`
- `videoId`
- `channelName`
- `channelIcon`
- `rawCreativeJson`
- `rawAdvertiserJson`
- `variantParseSource`
- `warnings`
Known schema/implementation nuance
- `rawSearchCreativesJson` exists in the dataset schema but is currently emitted as `null` by the code.
Example output
This example is based on a real advertiser artifact already present in artifacts/:
{
  "runId": "local-run",
  "region": "VN",
  "searchMode": "advertiser",
  "searchValue": "AR16735076323512287233",
  "targetLabel": "Nike live advertiser smoke",
  "advertiserId": "AR16735076323512287233",
  "advertiserName": "Nike, Inc.",
  "creativeId": "CR01003845904182018049",
  "detailUrl": "https://adstransparency.google.com/advertiser/AR16735076323512287233/creative/CR01003845904182018049",
  "previewUrl": "https://displayads-formats.googleusercontent.com/ads/preview/content.js?...",
  "formatCode": 2,
  "headline": "FC Barcelona Academy Pro Home Nike Men's Dri-FIT Soccer Pre-Match Short-Sleeve Top in Blue, Size: Large | HJ7142-456",
  "channelName": "Nike",
  "rawCreativeJson": {},
  "variantParseSource": "https://displayads-formats.googleusercontent.com/ads/preview/content.js?...",
  "warnings": []
}
That example shows the right interpretation of current output quality:
- search-level fields are solid;
- preview-derived `headline` may be available;
- `rawCreativeJson` may still be `{}` even in a successful live run.
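A consumer of the dataset can apply the same interpretation programmatically by checking which enrichment layers an item actually carries. This is a minimal sketch using only field names from the output contract above; `enrichment_summary` and the keys of the dictionary it returns are hypothetical.

```python
# Preview-derived fields named in the output contract.
PREVIEW_FIELDS = ("headline", "description", "destinationUrl", "visibleUrl", "channelName")


def enrichment_summary(item: dict) -> dict:
    """Report which best-effort layers are populated for one dataset item."""
    # A field counts as present only if it is non-null and non-empty.
    preview_fields = [name for name in PREVIEW_FIELDS if item.get(name)]
    return {
        "creativeId": item.get("creativeId"),
        "hasPreviewFields": bool(preview_fields),
        "previewFields": preview_fields,
        # {} and None both mean GetCreativeById added nothing usable.
        "hasCreativeDetails": bool(item.get("rawCreativeJson")),
        "warnings": list(item.get("warnings") or []),
    }
```

Applied to the example item, this reports `headline` and `channelName` as present and `hasCreativeDetails` as false, which matches the reading given above.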
Project layout
.actor/
  actor.json
  dataset_schema.json
fixtures/
  rpc/
  smoke/
src/
  parsers/
  google_ads_client.py
  http_client.py
  input_model.py
  main.py
  normalizers.py
  runtime.py
  storage.py
tests/
README.md
input_schema.json
Local validation
The repository ships with explicit validation scripts.
Recommended commands:
python3 -m pytest -q
python3 apify_actor_static_audit.py --repo-root . --config static_audit_config.json
python3 apify_actor_dynamic_test.py --repo-root . --config dynamic_test_config.json
python3 apify_actor_quality_gate.py --repo-root . --config quality_gate_config.json
python3 apify_actor_dynamic_test.py --repo-root . --config dynamic_test_real_config.json
What those checks mean:
- `pytest -q`: unit/integration coverage around the parser/runtime layer.
- `static_audit`: required files and metadata exist.
- `dynamic_test_config.json`: fixture-mode closed loop must write dataset + `RUN_SUMMARY`.
- `quality_gate`: validates the fixture dataset against expected quality requirements.
- `dynamic_test_real_config.json`: live smoke against the current real advertiser path. This is useful, but may fail due to rate limiting or anti-bot behavior.
Troubleshooting
Live run fails with HTTP 429
This is currently the most important real-world failure mode.
What it means:
- Google rate-limited the request path;
- the Actor is not yet robust enough to guarantee success without better session/proxy handling.
What to try:
- use Apify residential proxy settings in `proxyConfiguration`;
- provide better session cookies / headers / user agent in `session`;
- reduce request frequency and rerun later;
- keep `maxCreativesPerTarget` low for smoke runs.
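"Reduce request frequency and rerun later" can be automated on the caller's side with exponential backoff. The sketch below is not how the Actor retries internally. `RateLimited`, `run_with_backoff`, and the 30-second base delay are all assumptions chosen for illustration; `run_once` stands for whatever function starts a run and raises `RateLimited` when it sees HTTP 429.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error the caller raises when a run hits HTTP 429."""


def run_with_backoff(
    run_once: Callable[[], T],
    max_attempts: int = 4,
    base_delay_secs: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call run_once, waiting longer after each rate-limited attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_once()
        except RateLimited:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the rate-limit error.
            # Double the wait each time and add up to 10% jitter.
            delay = base_delay_secs * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the wrapper can be exercised in tests without actually waiting.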
previewUrl is null
This can be valid.
Some search rows expose inline preview HTML instead of a separate preview URL. Treat previewUrl as nullable and inspect:
- `headline`
- `destinationUrl`
- `visibleUrl`
- `variantParseSource`
- `warnings`
rawCreativeJson is {} or null
This does not automatically mean the run failed.
Interpretation:
- the search path still may have succeeded;
- preview enrichment still may have succeeded;
- `GetCreativeById` simply did not return a richer payload in that run.
advertiserName is missing or advertiser raw payload is null
Advertiser lookup is currently secondary to search-row output. If Google blocks or returns sparse advertiser-detail payloads, the Actor falls back to row-level fields when possible.
Store-positioning summary
If you publish this on Apify Store, the accurate positioning is:
- a Google Ads Transparency Center scraper;
- focused on advertiser/domain lookups;
- VN-validated MVP with fixture-backed tests and a real advertiser smoke path;
- useful stable metadata extraction plus best-effort preview/detail enrichment;
- not yet a claim of globally stable, production-hardened scraping across all regions and formats.