Hybrid Ad Intelligence Scraper

Pricing

from $0.20 / 1,000 results

Scrape competitor ads from Google Ads Transparency & Meta Ads Library, optionally enriched with Poweradspy metrics. Features LLM self-healing fallback, landing page tech detection (Shopify, WooCommerce), global deduplication, and automated inspiration reports.

Rating: 0.0 (0)

Developer

Solutions Smart

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 10
  • Monthly active users: 7
  • Last modified: 7 days ago

A resilient hybrid ad scraper that combines free public ad libraries (Google Ads Transparency, Meta Ads Library) with premium Poweradspy intelligence. It features self-healing extraction through an LLM-powered Stagehand fallback, HTTP-first enrichment for cost control, and global deduplication plus reporting.

High-Level Features

  • Hybrid sources
    • google_ads_transparency (Google Ads Transparency Center, experimental selectors + optional LLM fallback).
    • meta_ads_library (official Meta Ads Library Graph API, requires metaAccessToken).
    • poweradspy (premium data + engagement metrics; can also generate mock data for testing).
  • Modes
    1. Free Sources Only (mode: "free"): Use public libraries only.
    2. Auto / Premium (mode: "auto"): If Poweradspy credentials are present, enrich / collect via Poweradspy as well.
    3. Poweradspy Only (mode: "poweradspy"): Skip free sources and use only Poweradspy.
  • Resilience first
    • Default extraction uses fast Playwright selectors.
    • When selectors fail or return 0 items, a Stagehand LLM fallback can take over (if OPENAI_API_KEY or ANTHROPIC_API_KEY is configured).
  • Enrichment & cost control
    • HTTP-first landing-page platform detection (Shopify, WooCommerce, etc.) with concurrency + RPM limits.
    • Global deduplication across all sources with customizable keys.
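Global deduplication with customizable keys can be pictured roughly as follows. This is a minimal sketch, not the Actor's actual code: the function name `dedupe_ads`, the `key_fields` parameter, and the normalization step (trim and lowercase) are assumptions made for illustration. Only the field names `platform`, `adId`, and `landingPageUrl` come from the output schema.

```python
from typing import Iterable


def dedupe_ads(ads: Iterable[dict], key_fields: tuple = ("platform", "adId")) -> list:
    """Keep the first ad seen for each key; later duplicates are dropped."""
    seen = set()
    unique = []
    for ad in ads:
        # Assumed normalization: missing values become "", text is trimmed and lowercased.
        key = tuple(str(ad.get(field) or "").strip().lower() for field in key_fields)
        if key in seen:
            continue
        seen.add(key)
        unique.append(ad)
    return unique


ads = [
    {"platform": "google_ads_transparency", "adId": "A1", "landingPageUrl": "https://shop.example/coffee"},
    {"platform": "poweradspy", "adId": "P9", "landingPageUrl": "https://shop.example/coffee"},
    {"platform": "google_ads_transparency", "adId": "a1", "landingPageUrl": "https://shop.example/tea"},
]

by_id = dedupe_ads(ads)  # default key: (platform, adId)
by_landing_page = dedupe_ads(ads, key_fields=("landingPageUrl",))
```

With the default key, the third ad collapses into the first because the IDs differ only by case. Keying on `landingPageUrl` instead collapses the two ads that share a landing page across sources.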

Common Input Examples

1. Free Mode (Google only, no credentials)

{
  "mode": "free",
  "freeSources": ["google_ads_transparency"],
  "searchKeywords": "coffee",
  "maxResultsPerSource": 50,
  "googleAdvertisersToScan": 2,
  "googleDetailPagesToVisit": 3,
  "stagehandEnabled": true,
  "extractionEngine": "stagehand_fallback",
  "enableLandingPlatformDetection": true
}

Notes:

  • Without an LLM API key (OPENAI_API_KEY or ANTHROPIC_API_KEY), the Google selectors are best-effort only; if they return 0 ads and Stagehand is disabled or has no key, no Google ads will be scraped.
  • See "4. Poweradspy Mock Data (fallback output)" below for how the Actor still produces sample output in that case.
  • For Google Ads Transparency, googleAdvertisersToScan: 2 and googleDetailPagesToVisit: 3 are strong default settings for balancing result quality, speed, and cost.

2. Free Mode with Meta Ads Library

{
  "mode": "free",
  "freeSources": ["google_ads_transparency", "meta_ads_library"],
  "keywords": ["coffee", "espresso"],
  "maxResultsPerSource": 25,
  "googleAdvertisersToScan": 2,
  "googleDetailPagesToVisit": 3,
  "metaAccessToken": "YOUR_META_DEVELOPER_TOKEN"
}

Meta notes:

  • metaAccessToken is a standard Facebook/Meta Ads Library token with ads_read permission.
  • If meta_ads_library is selected but metaAccessToken is missing, the Actor logs a warning and skips Meta (the run does not fail).

3. Auto / Premium Mode (Poweradspy integration)

{
  "mode": "auto",
  "freeSources": ["google_ads_transparency"],
  "keywords": ["coffee"],
  "poweradspyEmail": "your_email@example.com",
  "poweradspyPassword": "your_password",
  "enablePoweradspyEnrichment": true,
  "maxResultsPerSource": 50,
  "googleAdvertisersToScan": 2,
  "googleDetailPagesToVisit": 3,
  "maxTotalResults": 100
}

In auto mode, free sources run first; if credentials are present, Poweradspy is used to enrich/extend the results.
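The way the three modes and the credential checks interact can be sketched as a small planning function. This is an illustration under stated assumptions, not the Actor's implementation: `plan_sources` is a hypothetical name, and treating a missing `mode` as "free" is an assumption. The input keys and the rule that a missing metaAccessToken skips Meta with a warning are taken from the text above.

```python
import logging

log = logging.getLogger("source-planning-sketch")


def plan_sources(actor_input: dict) -> list:
    """Return the ordered list of sources a run would use for this input."""
    mode = actor_input.get("mode", "free")  # assumed default
    has_poweradspy = bool(
        actor_input.get("poweradspyEmail") and actor_input.get("poweradspyPassword")
    )

    if mode == "poweradspy":
        return ["poweradspy"]  # free sources are skipped entirely

    sources = []
    for source in actor_input.get("freeSources", []):
        if source == "meta_ads_library" and not actor_input.get("metaAccessToken"):
            # Documented behavior: warn and skip Meta, do not fail the run.
            log.warning("metaAccessToken is missing; skipping meta_ads_library")
            continue
        sources.append(source)

    # In auto mode, free sources run first and Poweradspy follows if credentials exist.
    if mode == "auto" and has_poweradspy:
        sources.append("poweradspy")
    return sources


auto_plan = plan_sources({
    "mode": "auto",
    "freeSources": ["google_ads_transparency"],
    "poweradspyEmail": "your_email@example.com",
    "poweradspyPassword": "your_password",
})
free_plan_without_token = plan_sources({
    "mode": "free",
    "freeSources": ["google_ads_transparency", "meta_ads_library"],
})
```

Here `auto_plan` lists Google first and Poweradspy second, while `free_plan_without_token` drops Meta because no token was supplied.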

4. Poweradspy Mock Data (fallback output)

You can generate realistic sample ads without a Poweradspy account via the mock engine:

{
  "mode": "free",
  "freeSources": ["google_ads_transparency"],
  "searchKeywords": "coffee",
  "fallbackToMockAdsWhenEmpty": true
}

Behavior:

  • If free sources (Google/Meta) return 0 ads and Poweradspy was not run, the Actor logs a warning and calls the Poweradspy scraper with useMockData: true to produce ~10 sample ads.
  • To disable this and get truly empty output, set:
{
  "fallbackToMockAdsWhenEmpty": false
}

You can also call Poweradspy directly with useMockData: true in mode: "poweradspy" or mode: "auto" when you want sample output.
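The fallback condition described under Behavior can be written as a single predicate. The sketch below is hypothetical: `should_fall_back_to_mock` and `make_mock_ads` are illustrative names, the `isMock` marker is invented for the example, and the default of `true` for fallbackToMockAdsWhenEmpty is an assumption inferred from the instruction to set it to `false` to disable the behavior.

```python
def should_fall_back_to_mock(free_ad_count: int, poweradspy_ran: bool, actor_input: dict) -> bool:
    """True only when free sources found nothing, Poweradspy did not run, and the flag allows it."""
    enabled = actor_input.get("fallbackToMockAdsWhenEmpty", True)  # assumed default
    return bool(enabled) and free_ad_count == 0 and not poweradspy_ran


def make_mock_ads(keyword: str, count: int = 10) -> list:
    """Hypothetical stand-in for the mock engine; the real sample ads will look different."""
    return [
        {
            "adId": f"mock-{i}",
            "platform": "poweradspy",
            "title": f"Sample {keyword} ad {i}",
            "isMock": True,  # illustrative marker, not a documented field
        }
        for i in range(1, count + 1)
    ]


actor_input = {"mode": "free", "fallbackToMockAdsWhenEmpty": True}
ads = []  # pretend the free sources returned nothing
if should_fall_back_to_mock(len(ads), poweradspy_ran=False, actor_input=actor_input):
    ads = make_mock_ads("coffee")
```

Keeping the decision in one predicate makes it easy to see that any real result, or a Poweradspy run, suppresses the mock output.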

Stagehand (LLM) Config

If stagehandEnabled is true and extractionEngine is set to "stagehand_fallback":

  • Set OPENAI_API_KEY in the Actor’s Settings → Environment variables to use the default gpt-4o model, or
  • Set ANTHROPIC_API_KEY and "stagehandModelProvider": { "provider": "anthropic" } to use Claude.

When selectors return 0 items or throw, Stagehand opens the Google Ads Transparency page and uses the LLM to:

  • Search for your query,
  • Scroll / find ad cards in the current DOM,
  • Extract fields into a structured AdItem list.

If no API key is present, the fallback logs a warning and yields an empty array (the run continues; see "4. Poweradspy Mock Data (fallback output)" above).
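The selector-first, LLM-second flow can be summarized in a few lines. This is a sketch under assumptions rather than the Actor's code: `extract_ads`, `has_llm_key`, and the two extractor callables are hypothetical stand-ins for the Playwright selector path and the Stagehand path, and neither Playwright nor Stagehand is actually invoked here.

```python
import logging
import os
from typing import Callable, Mapping

log = logging.getLogger("extraction-sketch")

Extractor = Callable[[str], list]


def has_llm_key(env: Mapping) -> bool:
    """Either provider key is enough to enable the fallback."""
    return bool(env.get("OPENAI_API_KEY") or env.get("ANTHROPIC_API_KEY"))


def extract_ads(query: str, selector_extractor: Extractor, llm_extractor: Extractor,
                stagehand_enabled: bool = True, env: Mapping = os.environ) -> list:
    """Try fast selectors first; fall back to the LLM path when they throw or return nothing."""
    try:
        ads = selector_extractor(query)
    except Exception as exc:
        log.warning("Selector extraction failed: %s", exc)
        ads = []
    if ads:
        return ads
    if not stagehand_enabled:
        return []
    if not has_llm_key(env):
        log.warning("No LLM API key configured; returning an empty result")
        return []
    return llm_extractor(query)


def broken_selectors(query: str) -> list:
    raise RuntimeError("selector not found")  # simulates a changed page layout


def fake_llm(query: str) -> list:
    return [{"title": f"LLM-extracted ad for {query}"}]


with_key = extract_ads("coffee", broken_selectors, fake_llm, env={"OPENAI_API_KEY": "sk-test"})
without_key = extract_ads("coffee", broken_selectors, fake_llm, env={})
```

Passing the environment as a parameter keeps the sketch testable; `with_key` contains the LLM result, while `without_key` is empty and the run would continue.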

Output Schema & Reports

  • Dataset rows follow the schema in .actor/dataset_schema.json (key fields):
    • adId, platform, title, adCopy, creativeUrl, landingPageUrl,
    • advertiserName, firstSeen, lastSeen, likes, shares, comments,
    • landingPagePlatform, cta, placement, country, scrapedAt.
  • HTML/Markdown report (inspiration-report) groups ads by landingPagePlatform and annotates each with the source (google_ads_transparency, meta_ads_library, poweradspy) and Poweradspy metrics when available.
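To make the report structure concrete, here is a rough sketch of grouping dataset rows by landingPagePlatform and rendering Markdown. It is illustrative only: `build_markdown_report` is a hypothetical name, the exact layout of the real inspiration-report is not documented here, and using the `platform` field as the source label and the presence of `likes` as the signal for Poweradspy metrics are assumptions.

```python
from collections import defaultdict


def build_markdown_report(ads: list) -> str:
    """Group ads by landing-page platform and list each ad with its source."""
    groups = defaultdict(list)
    for ad in ads:
        groups[ad.get("landingPagePlatform") or "unknown"].append(ad)

    lines = ["# Inspiration report"]
    for platform in sorted(groups):
        lines.append("")
        lines.append(f"## {platform} ({len(groups[platform])} ads)")
        for ad in groups[platform]:
            line = f"- {ad.get('title', '(untitled)')} [source: {ad.get('platform', 'unknown')}]"
            if ad.get("likes") is not None:
                # Assumed: engagement metrics are only present on Poweradspy-enriched rows.
                line += (f" likes={ad['likes']} shares={ad.get('shares', 0)}"
                         f" comments={ad.get('comments', 0)}")
            lines.append(line)
    return "\n".join(lines)


report = build_markdown_report([
    {"title": "Cold brew kit", "platform": "google_ads_transparency", "landingPagePlatform": "shopify"},
    {"title": "Espresso beans", "platform": "poweradspy", "landingPagePlatform": "shopify",
     "likes": 120, "shares": 4, "comments": 9},
    {"title": "Coffee club", "platform": "meta_ads_library", "landingPagePlatform": None},
])
```

Rows without a detected platform fall into an "unknown" group, so they still appear in the report.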

Cost Control & Enrichments

  • Landing platform detection is done HTTP-first (no full browser), with configurable:
    • enrichmentMaxConcurrency (parallel HTTP calls),
    • enrichmentMaxRequestsPerMinute (RPM),
    • enrichmentHttpTimeoutMs (per-request timeout).
  • For Google Ads Transparency, the main cost controls are:
    • maxResultsPerSource
    • googleAdvertisersToScan
    • googleDetailPagesToVisit
  • Google Ads Transparency runs without a proxy by default to avoid tunnel errors; Meta and Poweradspy honor the proxyConfiguration input (e.g. Apify Residential proxy).
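The HTTP-first detection with concurrency, RPM, and timeout limits might look roughly like the sketch below. Everything here is an assumption for illustration: the marker strings are simplified fingerprints, `detect_platform` and `detect_many` are hypothetical names, and the HTTP call is injected as a `fetch` coroutine so the example runs without network access. The three keyword arguments mirror enrichmentMaxConcurrency, enrichmentMaxRequestsPerMinute, and enrichmentHttpTimeoutMs.

```python
import asyncio
import time

# Simplified, illustrative fingerprints; a real detector would check more signals.
PLATFORM_MARKERS = {
    "shopify": ("cdn.shopify.com", "myshopify.com"),
    "woocommerce": ("wp-content/plugins/woocommerce", "woocommerce"),
}


def detect_platform(html: str) -> str:
    """Classify raw HTML by substring markers; no browser rendering involved."""
    lowered = html.lower()
    for platform, markers in PLATFORM_MARKERS.items():
        if any(marker in lowered for marker in markers):
            return platform
    return "unknown"


async def detect_many(urls, fetch, max_concurrency=5, max_requests_per_minute=60,
                      timeout_ms=10000) -> dict:
    """Fetch pages with a concurrency cap, evenly paced request starts, and a timeout."""
    semaphore = asyncio.Semaphore(max_concurrency)
    pace_lock = asyncio.Lock()
    min_interval = 60.0 / max_requests_per_minute
    next_start = 0.0

    async def worker(url):
        nonlocal next_start
        async with semaphore:
            async with pace_lock:
                # Space request starts at least min_interval apart.
                now = time.monotonic()
                if next_start > now:
                    await asyncio.sleep(next_start - now)
                next_start = max(now, next_start) + min_interval
            try:
                html = await asyncio.wait_for(fetch(url), timeout=timeout_ms / 1000)
            except Exception:
                return url, "unknown"  # timeouts and HTTP errors degrade gracefully
            return url, detect_platform(html)

    return dict(await asyncio.gather(*(worker(url) for url in urls)))


async def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP GET, used only to keep the example offline."""
    pages = {
        "https://a.example": '<script src="https://cdn.shopify.com/s/app.js"></script>',
        "https://b.example": '<body class="woocommerce-page">',
    }
    if url not in pages:
        raise ConnectionError("unreachable")
    return pages[url]


results = asyncio.run(detect_many(
    ["https://a.example", "https://b.example", "https://c.example"],
    fake_fetch, max_concurrency=2, max_requests_per_minute=6000,
))
```

In a real run, `fetch` would wrap an HTTP client call; a failed or slow request simply leaves that landing page labeled "unknown" instead of failing the enrichment step.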

Compliance

Use the Actor responsibly and in compliance with privacy laws, target-site policies, and third-party API terms, including GDPR and CCPA where applicable.

Support

If this Actor helps your workflow, please leave a 5-star rating on the Actor page.