Google Ads Analyzer

Pricing

Pay per usage

Extract ad data from Google Ads Transparency Center by domain. Three modes: FULL (basic data), OCR (AI text extraction from images - headlines, descriptions, URLs), and LITE (summary counts). Filter by date range and region. Perfect for competitor analysis and ad research.

Developer

Traffic Architect

Maintained by Community

Last modified: 7 days ago

Google Ads Transparency Scraper (Python)

🔎 What is this Actor?

This Apify Actor is a Python-based tool designed to scrape data from the Google Ads Transparency Center. It allows you to extract information about advertisers and their ads based on a single domain name (e.g., clay.com, apify.com) and flexible date range options.

📊 What Google ads data can I extract?

The Actor supports three run modes:

  • FULL Mode: Extracts detailed information for each ad creative WITHOUT OCR, including:
    • Ad metadata (advertiser ID, creative ID, format, dates)
    • Preview image URLs
    • Domain information
  • OCR Mode: Extracts detailed information for each ad creative WITH OCR text extraction, including:
    • All data from FULL mode
    • OCR text extracted from ad images, returned as raw text that typically contains the headline, description, and URL
  • LITE Mode: Extracts a summary for each matched advertiser, providing counts of their ad creatives by format.

📖 How to use

⬇️ Input

The Actor expects a JSON input with the following fields:

  • keywords (string, required): A single domain name or Google Ads Transparency Center URL to search for ads. You can provide either:

    • Domain name: e.g., "clay.com" or "apify.com"
    • Full URL: e.g., "https://adstransparency.google.com/?region=anywhere&domain=clay.com"

    The Actor searches for all ads associated with this domain.

  • runMode (string, optional, default: "FULL"):

    • "FULL": Fetches detailed ad information for each creative WITHOUT OCR.
    • "OCR": Fetches detailed ad information for each creative WITH OCR text extraction.
    • "LITE": Fetches only counts of ad creatives per format for each matched advertiser.
  • dateRangePreset (string, optional, default: "ANYTIME"): Controls the date range for fetching ads. Options:

    • "ANYTIME": No date filtering.
    • "LAST_7_DAYS": Ads shown in the last 7 days.
    • "LAST_30_DAYS": Ads shown in the last 30 days.
    • "CUSTOM_RANGE": Specify a custom date range using customStartDate and customEndDate.
  • customStartDate (string, optional, format: YYYY-MM-DD): The start date for a custom range (e.g., "2023-01-01"). Used only if dateRangePreset is "CUSTOM_RANGE".

  • customEndDate (string, optional, format: YYYY-MM-DD): The end date for a custom range (e.g., "2023-01-31"). Used only if dateRangePreset is "CUSTOM_RANGE".

  • count (integer, optional, default: 10 in FULL and OCR modes, 2000 in LITE mode):

    • In FULL mode or OCR mode: The maximum number of ad creatives to retrieve for the domain within the specified date range.
    • In LITE mode: The maximum number of creative summaries to fetch for counting.
  • region (string, optional, default: "anywhere"): The region code to filter ads by (e.g., 'US', 'GB'). Use "anywhere" for no specific region.

  • proxyConfig (object, optional): Standard Apify proxy configuration.
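
The input rules above can be checked locally before starting a run. The following is a minimal sketch, not the Actor's own validation code: the helper names domain_from_keywords and check_date_fields are hypothetical, and it assumes that a "CUSTOM_RANGE" run needs both customStartDate and customEndDate, which the field descriptions imply but do not state outright.

```python
from datetime import date
from urllib.parse import parse_qs, urlparse

PRESETS = {"ANYTIME", "LAST_7_DAYS", "LAST_30_DAYS", "CUSTOM_RANGE"}


def domain_from_keywords(keywords: str) -> str:
    """Return the bare domain from a domain name or a Transparency Center URL."""
    value = keywords.strip()
    if "://" not in value:
        return value.lower()
    # URL form: the domain travels in the `domain` query parameter.
    query = parse_qs(urlparse(value).query)
    domains = query.get("domain")
    if not domains:
        raise ValueError(f"no 'domain' query parameter in {value!r}")
    return domains[0].lower()


def check_date_fields(run_input: dict) -> None:
    """Raise ValueError if the date-related input fields are inconsistent."""
    preset = run_input.get("dateRangePreset", "ANYTIME")
    if preset not in PRESETS:
        raise ValueError(f"unknown dateRangePreset: {preset!r}")
    if preset != "CUSTOM_RANGE":
        return
    # Assumption: a custom range needs both dates, in YYYY-MM-DD format.
    try:
        start = date.fromisoformat(run_input["customStartDate"])
        end = date.fromisoformat(run_input["customEndDate"])
    except KeyError as exc:
        raise ValueError(f"CUSTOM_RANGE requires {exc.args[0]}") from exc
    if start > end:
        raise ValueError("customStartDate must not be after customEndDate")
```

Both input forms of keywords resolve to the same domain here, so downstream code can key its results on the bare domain regardless of which form was supplied.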

Example Input (FULL Mode with URL):

{
  "keywords": "https://adstransparency.google.com/?region=anywhere&domain=clay.com",
  "runMode": "FULL",
  "dateRangePreset": "LAST_30_DAYS",
  "count": 5,
  "region": "anywhere"
}

Example Input (FULL Mode with Domain Name):

{
  "keywords": "apify.com",
  "runMode": "FULL",
  "dateRangePreset": "CUSTOM_RANGE",
  "customStartDate": "2024-01-01",
  "customEndDate": "2024-01-31",
  "count": 5,
  "region": "US"
}

Example Input (OCR Mode):

{
  "keywords": "clay.com",
  "runMode": "OCR",
  "dateRangePreset": "LAST_7_DAYS",
  "count": 10,
  "region": "anywhere"
}

Example Input (LITE Mode, Last 7 Days):

{
  "keywords": "clay.com",
  "runMode": "LITE",
  "dateRangePreset": "LAST_7_DAYS",
  "region": "anywhere"
}
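
Inputs like the ones above can also be submitted from Python. The sketch below uses the apify-client package (pip install apify-client); ACTOR_ID is a hypothetical placeholder, since this page does not state the Actor's ID, and the helper names are illustrative only.

```python
from typing import Any, Iterator

# Hypothetical placeholder: copy the real "<username>/<actor-name>" ID from the Actor's page.
ACTOR_ID = "<username>/<actor-name>"


def make_client(token: str) -> Any:
    """Create an ApifyClient from an API token."""
    # Imported lazily so run_and_fetch can be exercised without the package installed.
    from apify_client import ApifyClient
    return ApifyClient(token)


def run_and_fetch(client: Any, run_input: dict, actor_id: str = ACTOR_ID) -> Iterator[dict]:
    """Start a run, wait for it to finish, and yield items from its default dataset."""
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the Actor run did not return run details")
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()
```

A call such as run_and_fetch(make_client("<YOUR_API_TOKEN>"), {"keywords": "clay.com", "runMode": "LITE"}) would then yield the dataset items one by one.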

⬆️ Output

The extracted data is stored in the default Apify dataset. The structure of items in the dataset depends on the selected runMode.

Output Structure for runMode: "FULL" and runMode: "OCR"

Each item is a JSON object representing a detailed ad creative with the following fields:

  • originalKeyword (string): The input keyword that led to this ad being scraped.
  • advertiserId (string): The unique ID of the advertiser.
  • advertiserName (string): The name of the advertiser.
  • creativeId (string): The unique ID of the ad creative.
  • format (string): The format of the ad. Possible values: "TEXT", "IMAGE", "VIDEO", "UNKNOWN".
  • previewUrl (string | null): URL to the ad preview image, if available.
  • imgHtml (string | null): The raw HTML of the ad preview image, if available.
  • domain (string | null): The domain associated with the ad creative.
  • firstShown (string | null): Date the ad was first shown (YYYY-MM-DD format), if available.
  • lastShown (string | null): Date the ad was last shown (YYYY-MM-DD format), if available.
  • ocrData (object | null): Only included in OCR mode. Text extracted from the ad creative image using OCR (Tesseract). This object contains:
    • rawText (string | null): The complete raw text extracted by OCR from the ad image.
    • ocrError (string | null): Any error message if OCR extraction failed. Null if successful.
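
When consuming these items in Python, it can help to convert them into a typed record. This is a minimal sketch that covers only a subset of the fields listed above; the AdCreative class and its days_active property are hypothetical names, not part of the Actor.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AdCreative:
    creative_id: str
    advertiser_name: str
    format: str
    first_shown: Optional[date]
    last_shown: Optional[date]

    @property
    def days_active(self) -> Optional[int]:
        """Inclusive number of days from firstShown to lastShown, if both are known."""
        if self.first_shown is None or self.last_shown is None:
            return None
        return (self.last_shown - self.first_shown).days + 1


def _parse_date(value: Optional[str]) -> Optional[date]:
    # firstShown and lastShown are YYYY-MM-DD strings or null.
    return date.fromisoformat(value) if value else None


def parse_item(item: dict) -> AdCreative:
    """Build an AdCreative from one FULL or OCR mode dataset item."""
    return AdCreative(
        creative_id=item["creativeId"],
        advertiser_name=item["advertiserName"],
        format=item.get("format", "UNKNOWN"),
        first_shown=_parse_date(item.get("firstShown")),
        last_shown=_parse_date(item.get("lastShown")),
    )
```

Because firstShown and lastShown may be null, days_active returns None rather than guessing when either date is missing.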

Example Output (FULL Mode - No OCR):

{
  "originalKeyword": "clay.com",
  "advertiserId": "AR12345678901234567890",
  "advertiserName": "Clay",
  "creativeId": "CR98765432109876543210",
  "format": "IMAGE",
  "previewUrl": "https://tpc.googlesyndication.com/...",
  "imgHtml": "<img src=\"https://tpc.googlesyndication.com/...\">",
  "domain": "clay.com",
  "firstShown": "2024-01-15",
  "lastShown": "2024-01-31"
}

Example Output (OCR Mode - With OCR):

{
  "originalKeyword": "clay.com",
  "advertiserId": "AR12345678901234567890",
  "advertiserName": "Clay",
  "creativeId": "CR98765432109876543210",
  "format": "IMAGE",
  "previewUrl": "https://tpc.googlesyndication.com/...",
  "imgHtml": "<img src=\"https://tpc.googlesyndication.com/...\">",
  "domain": "clay.com",
  "firstShown": "2024-01-15",
  "lastShown": "2024-01-31",
  "ocrData": {
    "rawText": "Sponsored\nA clay.com\nwww.clay.com/\nAutomate Inbound\nScore and route leads automatically. Enrich, score, and route inbound in real time.",
    "ocrError": null
  }
}
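
Since ocrData carries the text as a single rawText string, callers usually need to split it themselves. The sketch below shows one possible approach; the function names are hypothetical, and the URL check is a simple heuristic based on the example above, not a rule the Actor guarantees.

```python
from typing import List, Optional


def ocr_lines(item: dict) -> List[str]:
    """Return the non-empty OCR lines, or an empty list if OCR is absent or failed."""
    ocr = item.get("ocrData") or {}
    if ocr.get("ocrError") or not ocr.get("rawText"):
        return []
    return [line.strip() for line in ocr["rawText"].splitlines() if line.strip()]


def first_url_like(lines: List[str]) -> Optional[str]:
    """Heuristic: return the first line that starts with 'www.' or 'http'."""
    for line in lines:
        if line.lower().startswith(("www.", "http")):
            return line
    return None
```

Items from FULL mode have no ocrData field, so ocr_lines returns an empty list for them instead of raising.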

Output Structure for runMode: "LITE"

Each item is a JSON object summarizing ad counts for all advertisers associated with the searched domain, filtered by the specified date range and region:

  • originalKeyword (string): The input keyword (domain or URL) that led to this summary.
  • keyword (string): The extracted domain name (e.g., "clay.com").
  • advertisersFound (object): A dictionary of advertiser IDs mapped to advertiser names found for this domain.
  • textCreativeCount (integer): Number of TEXT ad creatives found for this domain within the specified parameters.
  • imageCreativeCount (integer): Number of IMAGE ad creatives.
  • videoCreativeCount (integer): Number of VIDEO ad creatives.
  • unknownFormatCount (integer): Number of creatives with an undetermined format.
  • regionSearched (string): The region parameter used for this count (e.g., "US", "anywhere").
  • totalCreativesCountedFromSearch (integer): The actual number of creative summaries processed to derive the format counts for the specified date range and region.
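
The four count fields can be turned into a format mix for quick comparison between domains. This is a minimal sketch with a hypothetical helper name, format_shares; it treats a missing count field as zero, which is an assumption about how incomplete items should be handled.

```python
# Maps each ad format to the LITE-mode field that holds its count.
COUNT_FIELDS = {
    "TEXT": "textCreativeCount",
    "IMAGE": "imageCreativeCount",
    "VIDEO": "videoCreativeCount",
    "UNKNOWN": "unknownFormatCount",
}


def format_shares(summary: dict) -> dict:
    """Return each format's share (0.0 to 1.0) of the counted creatives."""
    counts = {fmt: int(summary.get(field, 0)) for fmt, field in COUNT_FIELDS.items()}
    total = sum(counts.values())
    if total == 0:
        # No creatives counted: report zero shares instead of dividing by zero.
        return {fmt: 0.0 for fmt in counts}
    return {fmt: count / total for fmt, count in counts.items()}
```

For a summary with made-up counts of 6 text, 3 image, 1 video, and 0 unknown creatives, the shares come out as 0.6, 0.3, 0.1, and 0.0.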

⚙️ Setup and Running

This Actor is designed to run as a Docker container on the Apify platform.

  1. Build the Actor from within the Actor's directory:

    • With the Apify CLI: apify build
    • Or manually with Docker: docker build -t google-ads-scraper-actor .

  2. Run the Actor: apify run (this will use the INPUT.json file in the Actor's .actor directory if present, or you can specify input via CLI or Apify Console)

  3. Push to Apify Platform: apify push

❓ Frequently Asked Questions (FAQs)

Is it legal to scrape Google Ads data?

Scraping publicly available data is generally permissible, but you should always be mindful of the website's terms of service, robots.txt, and relevant data privacy regulations (such as GDPR and CCPA). Ensure your scraping activities are ethical and do not overload the target servers. Seek legal advice if you are unsure.

💬 Your feedback

If you have any feedback or feature requests, please let us know!