Google Ads scraper

Under maintenance

Pricing

$1.00 / 1,000 result items

Google Ads Scraper extracts structured ad data from the Google Ads Transparency Center for advertiser and domain searches. It returns advertiser names, creative IDs, ad formats, first/last seen dates, detail URLs, and preview links for competitor research, ad monitoring, and market analysis.

Rating

0.0

(0)

Developer

kane liu

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: a day ago

Google Ads Transparency Center Scraper

Scrape Google Ads Transparency Center creatives from Apify by advertiser or domain.

This Actor is intentionally narrow and honest:

  • it supports only advertiser and domain targets;
  • it is VN-first in its published schema and verified smoke coverage;
  • it returns stable search-level fields first, then adds best-effort preview/detail enrichment;
  • it does not claim production-hardening against every Google anti-bot, region, or ad-format variant.

If you need a Store-facing summary in one sentence: this Actor is useful today for Google Ads Transparency Center lookup workflows where stable creative IDs, advertiser names, preview URLs, and basic preview-derived fields are enough, and deeper enrichment is treated as opportunistic.

What this Actor is good for

Use it when you want to:

  • collect creatives for a known advertiser in Google Ads Transparency Center;
  • collect creatives for a known domain;
  • monitor stable identifiers such as advertiserId, advertiserName, creativeId, detailUrl, and formatCode;
  • keep the original creative/detail payload when available via rawCreativeJson / rawAdvertiserJson;
  • extract lightweight preview content such as headline, channelName, destinationUrl, or visibleUrl when the preview parser can resolve them.

What it does not promise

This repository should not be read as a claim that:

  • all countries are equally verified or equally stable;
  • all ad formats yield the same structured preview fields;
  • GetCreativeById always returns a useful payload;
  • advertiser-name suggestion resolution is fully production-verified;
  • proxies, retries, browser warmup, and anti-bot handling are already production-grade.

Current real capability, without hype

Supported search modes

  • advertiser
  • domain

Region scope

  • Input validation accepts any two-letter country code.
  • The live client currently contains region-to-collection mappings for: AU, BR, CA, DE, FR, GB, IN, JP, PH, SG, US, VN.
  • Published positioning and verified smoke coverage remain VN-centered: the Store-facing schema and validation artifacts are tuned to the VN-verified path. Present this Actor as a VN-validated MVP, not as a fully verified global scraper.
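
The three tiers above (accepted by validation, mapped in the live client, verified by smoke runs) can be checked on the caller's side before starting a run. The following is a minimal sketch, not code from this repository: the function name and the upper-casing of the input are assumptions, and the region sets are copied from the list above.

```python
import re

# Regions the README lists as having a region-to-collection mapping in the live client.
MAPPED_REGIONS = {"AU", "BR", "CA", "DE", "FR", "GB", "IN", "JP", "PH", "SG", "US", "VN"}
# Regions with verified smoke coverage according to the README.
VERIFIED_REGIONS = {"VN"}


def classify_region(region: str) -> str:
    """Return 'verified', 'mapped', 'unmapped', or 'invalid' for a region code.

    Assumption: codes are compared case-insensitively; the Actor's own
    validation may be stricter.
    """
    code = region.strip().upper()
    if not re.fullmatch(r"[A-Z]{2}", code):
        return "invalid"
    if code in VERIFIED_REGIONS:
        return "verified"
    if code in MAPPED_REGIONS:
        return "mapped"
    # Passes the two-letter check but has no known collection mapping.
    return "unmapped"
```

A caller might warn on "mapped" and refuse to run on "unmapped", since an unmapped region passes input validation but has no documented live path.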

Important implementation limits

These are real current limits of the codebase:

  • targets is an array in the input schema, but the current runtime processes only the first target per run.
  • includeCreativeDetails=true is best-effort. The field exists, the lookup is wired, but the returned payload quality is not uniform.
  • includeVariantContent=true is best-effort. Some previews yield good headline/url extraction; others only expose partial text or nothing structured.
  • includeRawJson=true does not currently populate every raw field uniformly:
    • rawCreativeJson can be populated when GetCreativeById returns a payload;
    • rawAdvertiserJson can be populated when advertiser lookup runs and succeeds;
    • rawSearchCreativesJson is present in the schema but is currently written as null by the implementation.
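
Because only the first target is processed per run, a caller with several targets can split one input into several single-target inputs and start one run per input. A minimal sketch of that workaround (the helper name is hypothetical and not part of the Actor):

```python
import copy


def split_into_single_target_inputs(run_input: dict) -> list[dict]:
    """Return one copy of run_input per target, each holding exactly one target.

    Workaround sketch for the documented limit that the runtime only
    processes the first item of `targets`.
    """
    targets = run_input.get("targets") or []
    single_inputs = []
    for target in targets:
        single = copy.deepcopy(run_input)      # keep region, flags, proxy settings
        single["targets"] = [copy.deepcopy(target)]
        single_inputs.append(single)
    return single_inputs
```

Each returned dictionary can then be submitted as its own run, ideally spaced out in time given the rate-limiting behavior described under Troubleshooting.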

Verified status

Verified locally in deterministic fixture mode

The fixture path is the most reliable, closed-loop validation path in this repo.

It is locally validated to:

  • parse fixture SearchCreatives responses;
  • optionally merge fixture GetCreativeById payloads;
  • optionally parse fixture variant content;
  • write dataset items to Apify local storage;
  • write RUN_SUMMARY.

Verified in real mode

There are two different truths to keep separate:

  1. This repository contains prior successful real advertiser artifacts for:
    • region=VN
    • searchMode=advertiser
    • searchValue=AR16735076323512287233
  2. A local rerun is not guaranteed to pass every time. On this machine, the latest live smoke attempt hit Google rate limiting (429 Too Many Requests), so live mode must still be treated as fragile.

Beyond the default advertiser smoke, repository notes also record one manually verified domain path:

  • region=VN
  • searchMode=domain
  • searchValue=chatgpt.com

GetCreativeById / rawCreativeJson status

An earlier version of this README said that live GetCreativeById basically returns {}. That statement is now too absolute.

The accurate boundary is:

  • the Actor already stores rawCreativeJson when GetCreativeById returns something usable;
  • recent repository-local advertiser artifacts still show cases where rawCreativeJson is {};
  • separately verified remote advertiser runs have already shown non-empty rawCreativeJson;
  • therefore GetCreativeById is real but unstable, and rawCreativeJson should be treated as opportunistic enrichment, not a guaranteed output field.

Preview enrichment status

The current main enrichment path is:

  • SearchCreatives search rows, plus
  • preview content.js fetching/parsing, plus
  • inline preview HTML parsing when the row does not expose a previewUrl.

What is genuinely verified here:

  • search-level fields are the most stable output layer;
  • previewUrl can be present and useful, but it is nullable;
  • inline preview HTML can supply enrichment when previewUrl is absent;
  • some live preview payloads yield useful fields like headline, channelName, destinationUrl, and visibleUrl.

What is not guaranteed:

  • every ad format exposes a preview payload the current parser understands;
  • every preview will yield headline or description;
  • preview parsing is stable across all regions and format families.

Input contract

This section documents the current input contract as implemented by input_schema.json and the runtime.

Input

Top-level input fields supported by input_schema.json:

| Field | Type | Notes |
| --- | --- | --- |
| runMode | fixture or real | fixture is deterministic and fully testable; real hits live Google endpoints. |
| region | string | Defaults to VN. Any two-letter code passes input validation, but verified/store-safe positioning remains VN-first. |
| targets | array | Required. Current runtime only processes the first target item. |
| maxCreativesPerTarget | integer | 1-100. |
| includeCreativeDetails | boolean | Enables best-effort GetCreativeById lookup. |
| includeVariantContent | boolean | Enables best-effort preview/variant parsing. |
| includeRawJson | boolean | Keeps raw detail payloads when actually fetched/successful. |
| fixtures | object | Optional in fixture mode. If omitted, the Actor auto-loads its built-in deterministic smoke fixtures. |
| proxyConfiguration | object | Apify proxy editor-compatible config. |
| session | object | Optional cookies/headers/user agent for live mode. |
| rpcIds | object | Listed in schema for overrides, but the live runtime currently uses rpcPaths internally. Treat as advanced/internal. |
| baseUrl | string | Defaults to https://adstransparency.google.com. |
| requestTimeoutSecs | number | Defaults to 30. |
| debug | boolean | Extra logging. |
| keepDebugArtifacts | boolean | Keep intermediate artifacts. |

Target object

{
  "searchMode": "domain",
  "searchValue": "example.com",
  "targetLabel": "Example domain"
}

Notes:

  • searchMode must be domain or advertiser.
  • searchValue must be a non-empty string.
  • For advertiser runs, the most reliable input is a known advertiser ID such as AR16735076323512287233.
  • The client also contains advertiser-suggestion resolution for non-ID advertiser queries, but that path is not the primary verified smoke path.
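
The target rules above can be mirrored in a small pre-flight check. This is a sketch under stated assumptions, not the Actor's own validator: the function name is hypothetical, whitespace-only values are treated as empty, and the "AR followed by digits" pattern is inferred only from the example ID in this README.

```python
import re

ALLOWED_SEARCH_MODES = {"domain", "advertiser"}
# Assumption: advertiser IDs look like "AR" + digits, inferred from the README example.
ADVERTISER_ID_PATTERN = re.compile(r"AR\d+")


def validate_target(target: dict) -> dict:
    """Return {'errors': [...], 'notes': [...]} for one target object."""
    errors, notes = [], []
    mode = target.get("searchMode")
    value = target.get("searchValue")

    if mode not in ALLOWED_SEARCH_MODES:
        errors.append("searchMode must be 'domain' or 'advertiser'")

    if not isinstance(value, str) or not value.strip():
        errors.append("searchValue must be a non-empty string")
    elif mode == "advertiser" and not ADVERTISER_ID_PATTERN.fullmatch(value.strip()):
        # Not an error: the name-suggestion path exists but is less verified.
        notes.append("searchValue is not an advertiser ID; suggestion resolution will be used")

    return {"errors": errors, "notes": notes}
```

Keeping notes separate from errors reflects the distinction made above: a non-ID advertiser query is accepted, it just follows a path that is not the primary verified one.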

Example inputs

1) Fixture mode

{
  "runMode": "fixture",
  "region": "VN",
  "targets": [
    {
      "searchMode": "domain",
      "searchValue": "example.com",
      "targetLabel": "Fixture smoke"
    }
  ],
  "maxCreativesPerTarget": 2,
  "includeCreativeDetails": true,
  "includeVariantContent": true,
  "fixtures": {
    "searchCreatives": "fixtures/rpc/search_creatives_domain_vn.json",
    "creativeById": {
      "crt-900": "fixtures/rpc/get_creative_by_id_crt-900.json",
      "crt-901": "fixtures/rpc/get_creative_by_id_crt-901.json"
    },
    "advertiserById": {
      "adv-100": "fixtures/rpc/get_advertiser_by_id_adv-100.json",
      "adv-101": "fixtures/rpc/get_advertiser_by_id_adv-101.json"
    }
  }
}

2) Real advertiser smoke

{
  "runMode": "real",
  "region": "VN",
  "maxCreativesPerTarget": 1,
  "targets": [
    {
      "searchMode": "advertiser",
      "searchValue": "AR16735076323512287233",
      "targetLabel": "Nike live advertiser smoke"
    }
  ],
  "includeCreativeDetails": true,
  "includeVariantContent": true
}

3) Real domain smoke

{
  "runMode": "real",
  "region": "VN",
  "maxCreativesPerTarget": 2,
  "targets": [
    {
      "searchMode": "domain",
      "searchValue": "chatgpt.com",
      "targetLabel": "ChatGPT live domain smoke"
    }
  ],
  "includeCreativeDetails": true,
  "includeVariantContent": true
}
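
For callers who prefer to start runs from code, the real-mode inputs above can be built and submitted with the apify-client Python package. This is a hedged sketch: the helper names are hypothetical, the Actor ID must be supplied by the caller (it is not stated in this README), and the code assumes a client version in which call() returns a run dictionary containing defaultDatasetId.

```python
def build_smoke_input(search_mode: str, search_value: str, label: str,
                      max_creatives: int = 1, region: str = "VN") -> dict:
    """Build a real-mode input shaped like the smoke examples in this README."""
    if not 1 <= max_creatives <= 100:
        raise ValueError("maxCreativesPerTarget must be between 1 and 100")
    return {
        "runMode": "real",
        "region": region,
        "maxCreativesPerTarget": max_creatives,
        "targets": [
            {"searchMode": search_mode, "searchValue": search_value, "targetLabel": label}
        ],
        "includeCreativeDetails": True,
        "includeVariantContent": True,
    }


def run_actor(token: str, actor_id: str, run_input: dict) -> list:
    """Start one run and return its dataset items.

    `actor_id` is a placeholder the caller must fill in.
    """
    # Imported lazily so the input builder works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not finish")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

For example, build_smoke_input("domain", "chatgpt.com", "ChatGPT live domain smoke", max_creatives=2) reproduces the third example input.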

Output contract

This section documents the current published dataset contract and the known schema/runtime nuances.

Output

The published dataset model is defined in .actor/dataset_schema.json.

Stable output fields

These are the most dependable fields in current practice:

  • runId
  • region
  • searchMode
  • searchValue
  • advertiserId
  • advertiserName
  • creativeId
  • detailUrl
  • previewUrl (required key, nullable value)
  • formatCode
  • scrapedAt

Best-effort enrichment fields

These may be present, null, partial, or format-dependent:

  • firstShownRaw
  • lastShownRaw
  • resultCountLower
  • resultCountUpper
  • variantCount
  • headline
  • description
  • destinationUrl
  • visibleUrl
  • redirectUrl
  • videoId
  • channelName
  • channelIcon
  • rawCreativeJson
  • rawAdvertiserJson
  • variantParseSource
  • warnings

Known schema/implementation nuance

  • rawSearchCreativesJson exists in the dataset schema but is currently emitted as null by the code.
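
Downstream consumers can encode the split between stable and best-effort fields directly. The sketch below checks only the stable layer; the function name is hypothetical, and treating a null value in any stable field other than previewUrl as a problem is an assumption, since the README only states nullability for previewUrl.

```python
# Stable fields as listed in this README.
STABLE_FIELDS = (
    "runId", "region", "searchMode", "searchValue", "advertiserId",
    "advertiserName", "creativeId", "detailUrl", "previewUrl",
    "formatCode", "scrapedAt",
)
# previewUrl is documented as a required key with a nullable value.
NULLABLE_STABLE_FIELDS = {"previewUrl"}


def missing_stable_fields(item: dict) -> list:
    """Return stable field names that are absent, or null where null is unexpected."""
    missing = []
    for name in STABLE_FIELDS:
        if name not in item:
            missing.append(name)
        elif item[name] is None and name not in NULLABLE_STABLE_FIELDS:
            missing.append(name)
    return missing
```

Enrichment fields are deliberately not checked here: by the contract above they may be null, partial, or absent without indicating a failed run.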

Example output

This example is based on a real advertiser artifact already present in artifacts/:

{
  "runId": "local-run",
  "region": "VN",
  "searchMode": "advertiser",
  "searchValue": "AR16735076323512287233",
  "targetLabel": "Nike live advertiser smoke",
  "advertiserId": "AR16735076323512287233",
  "advertiserName": "Nike, Inc.",
  "creativeId": "CR01003845904182018049",
  "detailUrl": "https://adstransparency.google.com/advertiser/AR16735076323512287233/creative/CR01003845904182018049",
  "previewUrl": "https://displayads-formats.googleusercontent.com/ads/preview/content.js?...",
  "formatCode": 2,
  "headline": "FC Barcelona Academy Pro Home Nike Men's Dri-FIT Soccer Pre-Match Short-Sleeve Top in Blue, Size: Large | HJ7142-456",
  "channelName": "Nike",
  "rawCreativeJson": {},
  "variantParseSource": "https://displayads-formats.googleusercontent.com/ads/preview/content.js?...",
  "warnings": []
}

That example shows the right interpretation of current output quality:

  • search-level fields are solid;
  • preview-derived headline may be available;
  • rawCreativeJson may still be {} even in a successful live run.

Project layout

.actor/
  actor.json
  dataset_schema.json
fixtures/
  rpc/
  smoke/
src/
  parsers/
  google_ads_client.py
  http_client.py
  input_model.py
  main.py
  normalizers.py
  runtime.py
  storage.py
tests/
README.md
input_schema.json

Local validation

The repository ships with explicit validation scripts.

Recommended commands:

python3 -m pytest -q
python3 apify_actor_static_audit.py --repo-root . --config static_audit_config.json
python3 apify_actor_dynamic_test.py --repo-root . --config dynamic_test_config.json
python3 apify_actor_quality_gate.py --repo-root . --config quality_gate_config.json
python3 apify_actor_dynamic_test.py --repo-root . --config dynamic_test_real_config.json

What those checks mean:

  • pytest -q: unit/integration coverage around the parser/runtime layer.
  • static_audit: required files and metadata exist.
  • dynamic_test_config.json: fixture-mode closed loop must write dataset + RUN_SUMMARY.
  • quality_gate: validates the fixture dataset against expected quality requirements.
  • dynamic_test_real_config.json: live smoke against the current real advertiser path. This is useful, but may fail due to rate limiting or anti-bot behavior.
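
Since the live smoke may fail for reasons outside the codebase, one way to run the checks is to stop on the first failing deterministic check but only record the result of the live one. The wrapper below is a sketch of that policy, not a script shipped with the repository; the blocking/non-blocking split is an assumption drawn from the notes above.

```python
import subprocess

# (command, blocking) pairs. Commands are the ones recommended in this README;
# marking the real-mode smoke as non-blocking is this sketch's own policy.
CHECKS = [
    (["python3", "-m", "pytest", "-q"], True),
    (["python3", "apify_actor_static_audit.py", "--repo-root", ".",
      "--config", "static_audit_config.json"], True),
    (["python3", "apify_actor_dynamic_test.py", "--repo-root", ".",
      "--config", "dynamic_test_config.json"], True),
    (["python3", "apify_actor_quality_gate.py", "--repo-root", ".",
      "--config", "quality_gate_config.json"], True),
    (["python3", "apify_actor_dynamic_test.py", "--repo-root", ".",
      "--config", "dynamic_test_real_config.json"], False),
]


def run_checks(runner=None):
    """Run checks in order; return (ok, [(command, exit_code), ...]).

    `runner` takes a command list and returns an exit code. It defaults to
    subprocess and can be replaced with a fake for dry runs.
    """
    if runner is None:
        runner = lambda cmd: subprocess.run(cmd).returncode
    results = []
    for cmd, blocking in CHECKS:
        code = runner(cmd)
        results.append((" ".join(cmd), code))
        if code != 0 and blocking:
            return False, results  # stop at the first failing deterministic check
    return True, results
```

A non-zero exit code from the last entry still appears in the results, so a rate-limited live smoke stays visible without failing the whole sequence.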

Troubleshooting

Live run fails with HTTP 429

This is currently the most important real-world failure mode.

What it means:

  • Google rate-limited the request path;
  • the Actor is not yet robust enough to guarantee success without better session/proxy handling.

What to try:

  • use Apify residential proxy settings in proxyConfiguration;
  • provide better session cookies / headers / user agent in session;
  • reduce request frequency and rerun later;
  • keep maxCreativesPerTarget low for smoke runs.
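
Two of the suggestions above, residential proxies and reduced request frequency, can be sketched on the caller's side. The proxy dictionary uses the standard Apify proxy input shape; RateLimited, start_run, and the delay values are hypothetical, and how a 429 actually surfaces to the caller depends on how the run is started.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical signal that a run failed with HTTP 429."""


def with_residential_proxy(run_input: dict) -> dict:
    """Return a copy of run_input that requests Apify residential proxies."""
    updated = dict(run_input)
    updated["proxyConfiguration"] = {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
    }
    return updated


def retry_on_rate_limit(start_run, max_attempts=4, base_delay_secs=30.0, sleep=time.sleep):
    """Call start_run(), backing off exponentially when it raises RateLimited.

    Delays are base, 2x base, 4x base, ... plus up to 10% random jitter.
    The last failure is re-raised once max_attempts is reached.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return start_run()
        except RateLimited:
            if attempt == max_attempts:
                raise
            delay = base_delay_secs * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
```

Backoff only spaces out attempts; it does not make the live path robust, so a final failure should still be expected and handled.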

previewUrl is null

This can be valid.

Some search rows expose inline preview HTML instead of a separate preview URL. Treat previewUrl as nullable and inspect:

  • headline
  • destinationUrl
  • visibleUrl
  • variantParseSource
  • warnings

rawCreativeJson is {} or null

This does not automatically mean the run failed.

Interpretation:

  • the search path still may have succeeded;
  • preview enrichment still may have succeeded;
  • GetCreativeById simply did not return a richer payload in that run.
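
The interpretation above, together with the nullable previewUrl case, can be summarized per item so that a sparse enrichment layer is not mistaken for a failed run. A minimal consumer-side sketch (the function name and the choice of preview fields are assumptions, not part of the Actor):

```python
# Preview-derived fields this sketch looks at; a subset of the best-effort list.
PREVIEW_FIELDS = ("headline", "description", "destinationUrl", "visibleUrl", "channelName")


def describe_enrichment(item: dict) -> dict:
    """Summarize which optional layers of a dataset item actually carry data."""
    raw = item.get("rawCreativeJson")
    return {
        # previewUrl may legitimately be null when the row used inline preview HTML.
        "hasPreviewUrl": item.get("previewUrl") is not None,
        # Fields the preview parser managed to resolve to a non-empty value.
        "resolvedPreviewFields": [name for name in PREVIEW_FIELDS if item.get(name)],
        # Both {} and null mean GetCreativeById added nothing in this run.
        "hasCreativeDetails": bool(raw),
        "warnings": list(item.get("warnings") or []),
    }
```

Applied to the example output in this README, this reports a preview URL, resolved headline and channelName, and no creative details.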

advertiserName is missing or advertiser raw payload is null

Advertiser lookup is currently secondary to search-row output. If Google blocks or returns sparse advertiser-detail payloads, the Actor falls back to row-level fields when possible.

Store-positioning summary

If you publish this on Apify Store, the accurate positioning is:

  • a Google Ads Transparency Center scraper;
  • focused on advertiser/domain lookups;
  • VN-validated MVP with fixture-backed tests and a real advertiser smoke path;
  • useful stable metadata extraction plus best-effort preview/detail enrichment;
  • not yet a claim of globally stable, production-hardened scraping across all regions and formats.