
Google Ads Transparency Scraper
Pricing
$20.00/month + usage

Scrapes Google's Ad Transparency Center to check if domains are running ads. Features: ad creative extraction (images/videos), OCR text extraction, YouTube video handling, detailed stats, configurable concurrency, and robust error handling.
Total users: 6
Monthly users: 6
Runs succeeded: >99%
Last modified: 5 days ago
This Apify actor scrapes Google's Ad Transparency Center to check if domains are running ads and extracts ad creatives with OCR text extraction.
Features
- Checks if domains are running Google Ads
- Extracts ad creatives (images and videos)
- Performs OCR on image ads to extract text
- Handles YouTube video thumbnails and links
- Provides detailed statistics and progress tracking
- Configurable concurrency for faster processing
- Robust error handling and retries
Input
The actor accepts the following input parameters:
{
  "domains": ["example.com", "example.org"],
  "maxConcurrency": 1
}
- domains: Array of domains to check for ads (required)
- maxConcurrency: Maximum number of domains to process concurrently (optional, default: 1, max: 10)
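The input rules above can be sketched as a small validation helper. This is a minimal illustration, not the actor's own code: the function name `normalize_input` is hypothetical, and clamping out-of-range `maxConcurrency` values (rather than rejecting them) is an assumption.

```python
from typing import Any, Dict, List

MIN_CONCURRENCY = 1   # documented default
MAX_CONCURRENCY = 10  # documented maximum


def normalize_input(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Validate actor input and fill in the documented default."""
    domains = raw.get("domains")
    if not isinstance(domains, list) or not domains:
        raise ValueError("'domains' is required and must be a non-empty list")

    cleaned: List[str] = []
    for item in domains:
        if not isinstance(item, str) or not item.strip():
            raise ValueError(f"invalid domain entry: {item!r}")
        cleaned.append(item.strip().lower())

    concurrency = raw.get("maxConcurrency", MIN_CONCURRENCY)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(concurrency, bool) or not isinstance(concurrency, int):
        raise ValueError("'maxConcurrency' must be an integer")
    # Assumption: out-of-range values are clamped into 1..10.
    concurrency = max(MIN_CONCURRENCY, min(MAX_CONCURRENCY, concurrency))

    return {"domains": cleaned, "maxConcurrency": concurrency}
```

Calling `normalize_input({"domains": ["example.com"]})` returns the same domains with `maxConcurrency` set to 1.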
Output
The actor saves results to its default dataset. Each item contains:
{
  "domain": "example.com",
  "ads_running": true,
  "creatives": [
    {"type": "image", "url": "https://..."},
    {"type": "video", "url": "https://..."}
  ],
  "ad_texts": [
    "Extracted text from image 1",
    "Extracted text from image 2"
  ],
  "error": null,
  "timestamp": "2024-03-21T12:34:56.789Z"
}
The error field contains an error message if scraping failed; otherwise it is null.
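Once the dataset items are downloaded, they can be aggregated with a few lines of Python. The sketch below assumes only the item fields shown above; the `summarize` helper and its output keys are hypothetical names chosen for illustration.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def summarize(items: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Count checked domains, domains with ads, failures, and creatives by type."""
    summary: Dict[str, Any] = {
        "checked": 0,
        "with_ads": 0,
        "failed": 0,
        "creatives": Counter(),
    }
    for item in items:
        summary["checked"] += 1
        if item.get("error"):
            # A non-null error means scraping failed for this domain.
            summary["failed"] += 1
            continue
        if item.get("ads_running"):
            summary["with_ads"] += 1
        for creative in item.get("creatives") or []:
            summary["creatives"][creative.get("type", "unknown")] += 1
    return summary
```

Items with a non-null `error` are counted as failed and skipped, so a partial failure does not distort the ad counts.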
Usage
- Create a new task for the actor
- Provide input:
{"domains": ["example.com"],"maxConcurrency": 1}
- Run the task
- Get results from the dataset
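The same steps can also be scripted with the `apify-client` package instead of the Apify Console. The sketch below is an assumption-laden outline: `actor_id` and `token` are placeholders you must supply, the helper names are hypothetical, and the calls used (`ApifyClient`, `actor(...).call(run_input=...)`, `dataset(...).iterate_items()`) should be checked against the client version you have installed.

```python
from typing import Any, Dict, List


def build_run_input(domains: List[str], max_concurrency: int = 1) -> Dict[str, Any]:
    """Build the input object in the shape shown in the Input section."""
    return {"domains": list(domains), "maxConcurrency": max_concurrency}


def run_actor(actor_id: str, token: str, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start a run, wait for it to finish, and return the dataset items."""
    # Imported lazily so build_run_input works without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return any run details")
    # Results are stored in the run's default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A typical call would be `run_actor(actor_id, token, build_run_input(["example.com"]))`, with the actor ID and API token taken from your own Apify account.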
Performance and Limits
- Memory: 4096 MB
- Timeout: 4 hours
- Concurrency: 1-10 domains in parallel
- Rate limiting: 2-second delay between requests
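The concurrency and rate-limiting limits above can be illustrated with an `asyncio.Semaphore`. This is a sketch of one common way to bound parallel work, not the actor's actual implementation; `process_domains` and the `check` callback are hypothetical, and applying the delay per worker slot is an assumption.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

CheckFn = Callable[[str], Awaitable[Dict[str, Any]]]


async def process_domains(
    domains: List[str],
    check: CheckFn,
    max_concurrency: int = 1,
    delay_seconds: float = 2.0,
) -> List[Dict[str, Any]]:
    """Run `check` for each domain with at most `max_concurrency` in flight."""
    # Keep the limit inside the documented 1..10 range.
    semaphore = asyncio.Semaphore(max(1, min(10, max_concurrency)))

    async def worker(domain: str) -> Dict[str, Any]:
        async with semaphore:
            result = await check(domain)
            # Hold the slot during the delay so the next request that
            # takes this slot starts at least `delay_seconds` later.
            await asyncio.sleep(delay_seconds)
            return result

    # gather() returns results in the same order as the input domains.
    return await asyncio.gather(*(worker(d) for d in domains))
```

With `max_concurrency=1` the domains are processed strictly one after another; higher values trade politeness toward the target site for throughput.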
Dependencies
- Python 3.9
- Chrome browser
- Tesseract OCR
- Key Python packages:
  - selenium
  - pytesseract
  - aiohttp
  - Pillow
  - apify-client
Error Handling
The actor implements robust error handling:
- Automatic retries for transient errors
- Graceful degradation for OCR failures
- Detailed error reporting in output
- Progress tracking and statistics
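The first two points, retries for transient errors and graceful degradation when OCR fails, might look roughly like the sketch below. The helper names, the choice of exception types treated as transient, and the exponential backoff schedule are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    transient: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call `func`, retrying on transient errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except transient:
            if attempt == attempts:
                raise  # out of retries: let the caller record the error
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")


def safe_ocr(ocr: Callable[[object], str], image: object) -> Optional[str]:
    """Return OCR text, or None if OCR fails, so one bad image does not abort the run."""
    try:
        return ocr(image)
    except Exception:
        return None
```

In a real run, `ocr` could be a function such as `pytesseract.image_to_string`, and a `None` result would simply be left out of `ad_texts`.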
Development
- Install dependencies:
  $ pip install -r requirements.txt
- Install system dependencies:
  $ apt-get install tesseract-ocr
- Run locally:
  $ python main.py
License
Apache 2.0