Meta (Facebook & Instagram) Ad Library Scraper
Pricing
from $2.50 / 1,000 results
Scrapes Meta's public Ad Library and returns structured ad data. Four lookup modes (page_id, page_url, keyword, batch), opt-in pagination, schema-validated output.
Developer
Jaybird Technologies
Meta Ad Library Scraper
Apify Actor that scrapes Meta's public Ad Library (facebook.com/ads/library) and returns structured ad data. Four lookup modes (page_id, page_url, keyword, batch), opt-in pagination, schema-validated per-item output.
Why this actor
Meta's official Ad Library API requires app review and identity verification, and it is restricted to political, issue, and EU ads, which makes it unusable for general competitive intelligence on SMB and brand advertisers. The public Ad Library website exposes the same data for all ads, but only via a JS-rendered UI behind a browser challenge. This actor productizes the SSR scrape and adds the things existing Store actors tend to miss:
- Lifecycle dates (`start_date`, `end_date`) per creative.
- Honest `page_url` resolution: keyword-search the handle, then filter results by `page_profile_uri` match. No silent fallback to near-matches.
- Schema-validated output with a non-silent failure envelope. When Meta drifts, you see `_validation_errors` populated on the item, not a silent null.
- Predictable pagination with a hard cap and a `truncated_reason` discriminator (`max_ads_cap` vs `run_timeout`).
Input
See .actor/input_schema.json for the canonical schema. Summary:
| Field | Type | Default | Notes |
|---|---|---|---|
| `search_mode` | enum | `keyword` | One of `page_id`, `page_url`, `keyword`, `batch`. |
| `page_id` | string | (none) | Required when `search_mode=page_id`. Numeric ID. |
| `page_url` | string | (none) | Required when `search_mode=page_url`. e.g. https://www.facebook.com/PetsOnBroadway |
| `query` | string | (none) | Required when `search_mode=keyword`. |
| `queries` | string[] | (none) | Required when `search_mode=batch`. |
| `region` | string | `US` | ISO 3166-1 alpha-2 country code. Plumbs into Meta's URL `country=` filter only. |
| `active_only` | boolean | `true` | When false, includes stopped ads. |
| `media_type` | enum | `all` | `all` / `image` / `video`. |
| `max_ads_per_advertiser` | integer | `30` | 1..500. Hard cap. |
| `follow_pagination` | boolean | `false` | When true, walks `end_cursor` up to `max_ads_per_advertiser`. |
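The per-mode requirements in the table can be checked on the client before a run is started. The following is a minimal sketch, not the actor's own validation (which lives in `.actor/input_schema.json` and may differ); `check_input` and `REQUIRED_BY_MODE` are hypothetical names introduced here for illustration.

```python
# Hypothetical client-side pre-flight check derived from the input table.
# The actor's canonical validation is .actor/input_schema.json.
REQUIRED_BY_MODE = {
    "page_id": "page_id",
    "page_url": "page_url",
    "keyword": "query",
    "batch": "queries",
}


def check_input(payload: dict) -> list:
    """Return a list of problems; an empty list means the payload looks OK."""
    mode = payload.get("search_mode", "keyword")  # documented default
    required = REQUIRED_BY_MODE.get(mode)
    if required is None:
        return [f"unknown search_mode: {mode!r}"]

    problems = []
    if not payload.get(required):
        problems.append(f"{required!r} is required when search_mode={mode}")
    elif mode == "page_id" and not str(payload["page_id"]).isdigit():
        problems.append("page_id must be numeric")

    cap = payload.get("max_ads_per_advertiser", 30)  # documented default
    if not isinstance(cap, int) or not 1 <= cap <= 500:
        problems.append("max_ads_per_advertiser must be within 1..500")
    return problems
```

Running a check like this first avoids starting a run for a payload that the schema would reject anyway.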
Examples
Single advertiser by page ID:
{"search_mode": "page_id", "page_id": "142023025844028"}
By page URL (handle resolution + match filter):
{"search_mode": "page_url", "page_url": "https://www.facebook.com/PetsOnBroadway"}
Keyword search:
{"search_mode": "keyword", "query": "Logical Position"}
Batch (no within-run dedupe — see "Behavior notes" below):
{"search_mode": "batch", "queries": ["nike", "adidas", "puma"], "follow_pagination": true, "max_ads_per_advertiser": 100}
Output
One dataset item per ad surfaced. Duplicate ad_archive_id values are possible within a single run when overlapping batch queries surface the same ad; consumers must dedupe.
```jsonc
{
  // Identity
  "ad_archive_id": "1667643404381457",
  "page_id": "142023025844028",
  "page_name": "Pets On Broadway",
  "page_profile_uri": "https://www.facebook.com/PetsOnBroadway/",
  "page_like_count": 3404,
  "page_categories": ["Pet Supplies"],

  // Lifecycle (ISO-8601 UTC date; null if Meta omits)
  "start_date": "2026-05-04",
  "end_date": "2026-05-09",
  "is_active": true,

  // Creative
  "display_format": "image", // lowercased: image | video | carousel | dco | unknown
  "title": "petsonbroadway.com",
  "caption": "petsonbroadway.com",
  "body_text": "This May, every purchase of Nulo dog food helps...",
  "cta_text": "Shop now",
  "cta_type": "SHOP_NOW",
  "link_url": "https://petsonbroadway.com/collections/nulo",

  // Media (image_url = first image; video_url prefers HD over SD)
  "image_url": "https://...img1.jpg",
  "video_url": null,
  "all_image_urls": ["https://...img1.jpg"],
  "all_video_urls": [],

  // Provenance
  "matched_via": "page_id", // page_id | page_url | keyword
  "query": "142023025844028", // original input value the user submitted
  "region": "US",
  "fetched_at": "2026-05-15T21:34:12.345Z"

  // _validation_errors: present and non-empty IF this item failed validation
}
```
Per-run summary
Written to the run's key-value store as OUTPUT.json:
```jsonc
{
  "queries_executed": 3,
  "advertisers_resolved": 3,
  "ads_returned": 73,
  "block_rate": 0.0,
  "validation_error_rate": 0.0,
  "aborted": false,
  "aborted_reason": null,
  "warnings": [],
  "advertisers": [
    {
      "query": "nike",
      "matched_via": "keyword",
      "page_id": "...",
      "page_name": "Nike",
      "total_count": 4321, // Meta's library-wide tally
      "ads_in_dataset": 30,
      "has_more": true,
      "end_cursor": "AbCd...",
      "source_url": "https://www.facebook.com/ads/library/?...",
      "truncated_reason": "max_ads_cap" // max_ads_cap | run_timeout | null
    }
  ]
}
```
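One way to use the summary is to flag advertisers whose results were cut short. The sketch below assumes the summary has already been loaded into a dict with the shape shown above; `truncated_advertisers` is a hypothetical helper, not part of the actor.

```python
def truncated_advertisers(summary: dict) -> list:
    """Return (query, reason, ads_in_dataset, total_count) per truncated advertiser.

    Assumes the OUTPUT.json shape documented above; truncated_reason is
    "max_ads_cap", "run_timeout", or null (None once parsed).
    """
    flagged = []
    for adv in summary.get("advertisers", []):
        reason = adv.get("truncated_reason")
        if reason is not None:
            flagged.append(
                (adv.get("query"), reason, adv.get("ads_in_dataset"), adv.get("total_count"))
            )
    return flagged
```

A `max_ads_cap` entry suggests raising `max_ads_per_advertiser` on the next run, while `run_timeout` suggests splitting the batch into smaller runs.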
Behavior notes (read this before integrating)
1. Items are NOT deduped within a run
If you submit queries: ["nike", "Nike Inc", "Nike Sportswear"], the dataset will include the same ad_archive_id once per query that surfaced it (with different query field values). This is deliberate — each input query is its own search and bills its own events. Always group by ad_archive_id if you want unique creatives.
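A minimal sketch of the grouping step, assuming items are plain dicts as returned by the dataset API. The `queries` list attached to each unique ad is a hypothetical convenience field added by this helper; the actor does not emit it.

```python
def dedupe_ads(items: list) -> list:
    """Collapse items to one per ad_archive_id, keeping the first occurrence.

    Each surviving item gets a "queries" list (hypothetical field) recording
    every input query that surfaced the ad.
    """
    unique = {}  # dicts preserve insertion order, so output order is stable
    for item in items:
        ad_id = item["ad_archive_id"]
        if ad_id not in unique:
            first = dict(item)  # copy so the caller's item is not mutated
            first["queries"] = [item["query"]]
            unique[ad_id] = first
        elif item["query"] not in unique[ad_id]["queries"]:
            unique[ad_id]["queries"].append(item["query"])
    return list(unique.values())
```

Keeping the list of matching queries preserves the provenance that would otherwise be lost when duplicates are dropped.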
2. Validation failures are pushed with _validation_errors, not dropped
Every item is validated against the Pydantic schema before push. If validation fails (e.g. Meta dropped a field from SSR), the item is still pushed but with a non-empty _validation_errors array. Your code MUST check this field — see the worked example below.
3. page_url mode never silently returns the wrong advertiser
When you submit page_url: "https://www.facebook.com/somehandle", the actor keyword-searches the handle, then filters results to only ads whose page_profile_uri matches /somehandle/ (case-insensitive). If no ads match, you get an empty result with a warnings[] entry — NOT a near-match from some unrelated page.
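The match filter described above can be illustrated roughly as follows. This is a sketch of the idea, not the actor's implementation: it assumes the handle is the first path segment of the URL, and `handle_from_url` and `filter_by_handle` are hypothetical names.

```python
from urllib.parse import urlparse


def handle_from_url(url: str) -> str:
    """Lowercased first path segment of a Facebook URL ("" if there is none)."""
    return urlparse(url).path.strip("/").split("/")[0].lower()


def filter_by_handle(ads: list, page_url: str) -> list:
    """Keep only ads whose page_profile_uri handle equals the submitted handle."""
    wanted = handle_from_url(page_url)
    if not wanted:
        return []
    return [
        ad for ad in ads
        if handle_from_url(ad.get("page_profile_uri") or "") == wanted
    ]
```

Comparing whole path segments rather than substrings is what rules out near-matches such as a page whose handle merely starts with the submitted one.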
4. start_date / end_date are ISO-8601 dates in UTC
Meta inlines them as Unix timestamps; the actor normalizes to "YYYY-MM-DD" strings.
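The normalization amounts to interpreting the timestamp in UTC and keeping only the date part. A minimal sketch, assuming timestamps are in seconds; `unix_to_iso_date` is a hypothetical name, and the actor's internal code may differ.

```python
from datetime import datetime, timezone


def unix_to_iso_date(ts):
    """Convert a Unix timestamp (seconds) to a "YYYY-MM-DD" UTC date string.

    Returns None when the timestamp is missing, mirroring the documented
    "null if Meta omits" behavior.
    """
    if ts is None:
        return None
    return datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
```

Passing `tz=timezone.utc` explicitly matters: without it, `fromtimestamp` uses the machine's local time zone and dates near midnight can shift by a day.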
5. Derived fields are deliberately omitted
days_active, image_count, video_count, and ad permalink URLs are NOT emitted. Compute client-side:
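```python
from datetime import date

# `item` is one dataset item; start_date/end_date may be null (None).
start, end = item["start_date"], item["end_date"]
if start:
    last_day = date.fromisoformat(end) if end else date.today()
    days_active = (last_day - date.fromisoformat(start)).days
else:
    days_active = None  # Meta omitted the start date
image_count = len(item["all_image_urls"])
video_count = len(item["all_video_urls"])
ad_permalink = f"https://www.facebook.com/ads/library/?id={item['ad_archive_id']}"
```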
Calling the actor
Synchronous (small lookups, single page)
```bash
curl -X POST "https://api.apify.com/v2/acts/<your-username>~meta-ad-library-scraper/run-sync-get-dataset-items?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"search_mode": "page_id", "page_id": "142023025844028"}'
```
Returns the dataset as JSON. Good for one-off lookups under ~30 seconds.
Asynchronous (batches, pagination)
```bash
# Start the run
RUN_ID=$(curl -X POST "https://api.apify.com/v2/acts/<your-username>~meta-ad-library-scraper/runs?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"search_mode": "batch", "queries": ["nike", "adidas"], "follow_pagination": true}' \
  | jq -r '.data.id')

# Poll until SUCCEEDED, or use webhooks
curl "https://api.apify.com/v2/actor-runs/$RUN_ID?token=$APIFY_TOKEN"

# Fetch dataset items
curl "https://api.apify.com/v2/actor-runs/$RUN_ID/dataset/items?token=$APIFY_TOKEN"

# Fetch OUTPUT.json from the run's KV store
curl "https://api.apify.com/v2/actor-runs/$RUN_ID/key-value-store/records/OUTPUT?token=$APIFY_TOKEN"
```
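The "poll until SUCCEEDED" step can be wrapped in a small loop. The sketch below is deliberately transport-agnostic: `wait_for_run` is a hypothetical helper that takes any callable returning the run's current status string, so the HTTP call (for example, reading `data.status` from the run endpoint above) is left to the caller. The terminal statuses listed are Apify's documented run states.

```python
import time

# Run states after which polling should stop.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"}


def wait_for_run(get_status, interval_s=5.0, timeout_s=600.0, sleep=time.sleep):
    """Call get_status() until it returns a terminal status, then return it.

    Raises TimeoutError if the run is still in progress after timeout_s.
    The sleep parameter exists so tests can skip the real delay.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {status} after {timeout_s}s")
        sleep(interval_s)
```

Callers should treat anything other than `SUCCEEDED` as a failed run and skip fetching the dataset. For long batches, webhooks avoid polling altogether.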
Worked parse-with-_validation_errors example
```python
import os

import requests

APIFY_TOKEN = os.environ["APIFY_TOKEN"]
run_id = "<run-id>"  # ID of a finished run, e.g. $RUN_ID from the async example

resp = requests.get(
    f"https://api.apify.com/v2/actor-runs/{run_id}/dataset/items",
    params={"token": APIFY_TOKEN},
    timeout=30,
)
resp.raise_for_status()
items = resp.json()

good, drifted = [], []
for it in items:
    if it.get("_validation_errors"):
        drifted.append(it)
    else:
        good.append(it)

if drifted:
    # _validation_errors populated → Meta SSR may have drifted from the
    # actor's declared schema. The items still have best-effort extracted
    # data, but consuming them as-if-valid risks downstream type errors.
    print(f"WARNING: {len(drifted)} of {len(items)} items failed validation")
    for it in drifted[:3]:
        print("  errors:", it["_validation_errors"])

# Use only the validated items downstream.
process_ads(good)  # process_ads is your own consumer function
```
Versioning
The actor follows semantic versioning over its output contract:
- Major bump (1.x → 2.x): output field removal, or input-schema breaking change.
- Minor bump (1.0 → 1.1): output field addition, behavior change with release notes.
- Patch: bug fixes, internal changes.
Soft contract breaks from Meta's side (a field that was always populated suddenly becomes null) do NOT trigger a major bump on their own — they surface via populated _validation_errors on affected items. If you parse the dataset strictly, check _validation_errors.
Drift detection
The actor declares a dataset schema in .actor/dataset_schema.json. Apify validates every pushed item against it at runtime and exposes per-field statistics in the Console (null rates, value distributions). For ongoing surveillance:
- An Apify Schedule runs the actor nightly against a small batch of monitored advertisers.
- Apify Monitoring alerts on run failure, low item-count, and per-field null-rate spikes — surfacing Meta-side SSR drift before it becomes a customer complaint.
This is configured in the Apify Console (Schedules + Monitoring tabs); no external CI is required for drift detection. GitHub Actions in this repo only runs ruff + unit tests on push.
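Consumers who want a similar signal on their side can compute per-field null rates from the dataset items and compare them with a baseline from an earlier run. This is a rough client-side sketch, unrelated to how Apify Monitoring computes its statistics; `null_rates`, `drifted_fields`, and the `max_increase` threshold are hypothetical.

```python
def null_rates(items: list, fields: list) -> dict:
    """Fraction of items in which each field is missing or null."""
    total = len(items)
    rates = {}
    for field in fields:
        nulls = sum(1 for it in items if it.get(field) is None)
        rates[field] = nulls / total if total else 0.0
    return rates


def drifted_fields(current: dict, baseline: dict, max_increase: float = 0.2) -> list:
    """Fields whose null rate rose by more than max_increase over the baseline."""
    return sorted(
        field for field, rate in current.items()
        if rate - baseline.get(field, 0.0) > max_increase
    )
```

Comparing against a baseline rather than a fixed threshold matters because some fields, such as `video_url` on image ads, are legitimately null much of the time.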
Versioned specs and proposals
This actor is developed using OpenSpec. The current capability spec lives at openspec/specs/meta-ad-library-scraper/spec.md once changes are archived; the v1 proposal is in openspec/changes/add-meta-ad-library-actor/.
License
TBD — to be set before public release.