Google Ads Transparency Scraper

Pricing

$20.00/month + usage

Developed by

Shashank Shankar

Maintained by Community

Scrapes Google's Ads Transparency Center to check if domains are running ads. Features: ad creative extraction (images/videos), OCR text extraction, YouTube video handling, detailed stats, configurable concurrency, and robust error handling.

5.0 (2)

Total users: 6
Monthly users: 6
Runs succeeded: >99%
Last modified: 5 days ago

This Apify actor scrapes Google's Ads Transparency Center to check whether domains are running ads, extracts their ad creatives, and runs OCR on image ads to extract their text.

Features

  • Checks if domains are running Google Ads
  • Extracts ad creatives (images and videos)
  • Performs OCR on image ads to extract text
  • Handles YouTube video thumbnails and links
  • Provides detailed statistics and progress tracking
  • Configurable concurrency for faster processing
  • Robust error handling and retries

Input

The actor accepts the following input parameters:

{
  "domains": [
    "example.com",
    "example.org"
  ],
  "maxConcurrency": 1
}
  • domains: Array of domains to check for ads (required)
  • maxConcurrency: Maximum number of domains to process concurrently (optional; default: 1, maximum: 10)
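The constraints on these parameters can be checked before a run is started. The following is a minimal sketch of such a check, not the actor's own validation code. The function name `normalize_input` is hypothetical, and rejecting out-of-range values (rather than clamping them) is an assumption, since the README does not say how the actor treats them.

```python
from typing import Any

MIN_CONCURRENCY = 1
MAX_CONCURRENCY = 10  # documented maximum for maxConcurrency


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Validate input and apply the documented default (hypothetical helper)."""
    domains = raw.get("domains")
    if not isinstance(domains, list) or not domains:
        raise ValueError("'domains' is required and must be a non-empty list")
    if not all(isinstance(d, str) and d.strip() for d in domains):
        raise ValueError("every entry in 'domains' must be a non-empty string")

    # maxConcurrency is optional and defaults to 1.
    concurrency = raw.get("maxConcurrency", 1)
    if isinstance(concurrency, bool) or not isinstance(concurrency, int):
        raise ValueError("'maxConcurrency' must be an integer")
    # Assumption: out-of-range values are rejected rather than clamped.
    if not MIN_CONCURRENCY <= concurrency <= MAX_CONCURRENCY:
        raise ValueError("'maxConcurrency' must be between 1 and 10")

    return {"domains": [d.strip() for d in domains], "maxConcurrency": concurrency}
```

Calling `normalize_input({"domains": ["example.com"]})` fills in `maxConcurrency` with its default, so the result can be passed on as a complete input object.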

Output

The actor saves results to its default dataset. Each item contains:

{
  "domain": "example.com",
  "ads_running": true,
  "creatives": [
    {
      "type": "image",
      "url": "https://..."
    },
    {
      "type": "video",
      "url": "https://..."
    }
  ],
  "ad_texts": [
    "Extracted text from image 1",
    "Extracted text from image 2"
  ],
  "error": null,
  "timestamp": "2024-03-21T12:34:56.789Z"
}

The error field is null on success and contains the error message if scraping failed.
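Once the items have been downloaded, they can be aggregated with a few lines of Python. This is a rough sketch that assumes each item has the fields shown in the example record; the function name `summarize` and the keys of its return value are made up for illustration.

```python
from collections import Counter
from typing import Any


def summarize(items: list[dict[str, Any]]) -> dict[str, Any]:
    """Aggregate dataset items shaped like the example record (illustrative helper)."""
    domains_with_ads = []
    failed_domains = []
    creative_types = Counter()

    for item in items:
        # A non-null "error" means scraping failed for this domain.
        if item.get("error"):
            failed_domains.append(item["domain"])
            continue
        if item.get("ads_running"):
            domains_with_ads.append(item["domain"])
        for creative in item.get("creatives") or []:
            creative_types[creative.get("type", "unknown")] += 1

    return {
        "domains_with_ads": domains_with_ads,
        "failed_domains": failed_domains,
        "creatives_by_type": dict(creative_types),
    }
```

Failed domains are kept separate from domains without ads, because an item with an error says nothing about whether ads are running.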

Usage

  1. Create a new task for the actor
  2. Provide input:
    {
      "domains": ["example.com"],
      "maxConcurrency": 1
    }
  3. Run the task
  4. Get results from the dataset
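The same steps can also be scripted with the apify-client package instead of the Apify Console. The sketch below rests on several assumptions: `ACTOR_ID` is a placeholder that must be replaced with the ID shown on the actor's store page, the API token is read from an `APIFY_TOKEN` environment variable (a naming choice made here), and the run is expected to be returned as a dictionary with a `defaultDatasetId` key, which may differ between client versions.

```python
import os

# Placeholder, not the real ID: copy it from the actor's store page.
ACTOR_ID = "<username>/<actor-name>"


def build_run_input(domains, max_concurrency=1):
    """Build the input object in the documented shape."""
    return {"domains": list(domains), "maxConcurrency": max_concurrency}


def run_actor(domains, max_concurrency=1):
    """Start a run, wait for it to finish, and return the dataset items."""
    # Imported lazily so build_run_input works without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    # call() starts the actor and blocks until the run finishes.
    run = client.actor(ACTOR_ID).call(
        run_input=build_run_input(domains, max_concurrency)
    )
    # Results are stored in the run's default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

With the token set and the placeholder replaced, `run_actor(["example.com"])` would return a list of records in the output format described earlier.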

Performance and Limits

  • Memory: 4096 MB
  • Timeout: 4 hours
  • Concurrency: 1-10 domains in parallel
  • Rate limiting: 2-second delay between requests
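One common way to combine a concurrency cap with a fixed delay is an asyncio semaphore. The sketch below illustrates that pattern only; it is not taken from the actor's source. `process_domains` and `check_domain` are hypothetical names, and where exactly the actor applies its 2-second delay is not documented, so the placement of the sleep is an assumption.

```python
import asyncio


async def process_domains(domains, check_domain, max_concurrency=1, delay_seconds=2.0):
    """Run check_domain over all domains, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(domain):
        async with semaphore:
            result = await check_domain(domain)
            # Assumption: pause before freeing the slot so that requests
            # made through the same slot are spaced by delay_seconds.
            await asyncio.sleep(delay_seconds)
            return result

    # gather() returns results in the same order as the input domains.
    return await asyncio.gather(*(worker(d) for d in domains))
```

Here `check_domain` stands for any coroutine function that takes a domain and returns its result. With `max_concurrency=1` the domains are processed strictly one after another; higher values trade a heavier request rate for a shorter total run time.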

Dependencies

  • Python 3.9
  • Chrome browser
  • Tesseract OCR
  • Key Python packages:
    • selenium
    • pytesseract
    • aiohttp
    • Pillow
    • apify-client

Error Handling

The actor implements robust error handling:

  • Automatic retries for transient errors
  • Graceful degradation for OCR failures
  • Detailed error reporting in output
  • Progress tracking and statistics
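The first three points could be implemented along the following lines. This is a minimal sketch under stated assumptions, not the actor's implementation: `TransientError`, `with_retries`, `scrape_domain`, `fetch_ads`, and `run_ocr` are all hypothetical names, the exponential backoff schedule is an assumption, and the `timestamp` field is left out for brevity.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (timeouts, dropped connections)."""


def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on TransientError retry with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: let the caller report the error
            sleep(base_delay * 2 ** (attempt - 1))


def scrape_domain(domain, fetch_ads, run_ocr):
    """Return an output-shaped record; failures land in 'error' instead of raising."""
    record = {"domain": domain, "ads_running": False,
              "creatives": [], "ad_texts": [], "error": None}
    try:
        record["creatives"] = with_retries(lambda: fetch_ads(domain))
        record["ads_running"] = bool(record["creatives"])
    except Exception as exc:
        # Detailed error reporting: the message goes into the output item.
        record["error"] = str(exc)
        return record

    for creative in record["creatives"]:
        if creative.get("type") != "image":
            continue
        try:
            record["ad_texts"].append(run_ocr(creative["url"]))
        except Exception:
            # Graceful degradation: a failed OCR pass drops the text, not the creative.
            continue
    return record
```

Passing `sleep` as a parameter keeps the retry helper easy to test without real delays.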

Development

  1. Install dependencies:

    $ pip install -r requirements.txt
  2. Install system dependencies:

    $ apt-get install tesseract-ocr
  3. Run locally:

    $ python main.py

License

Apache 2.0