Meta (Facebook & Instagram) Ad Library Scraper

Pricing

from $2.50 / 1,000 results

Developer

Jaybird Technologies

Maintained by Community

Meta Ad Library Scraper

Apify Actor that scrapes Meta's public Ad Library (facebook.com/ads/library) and returns structured ad data. Four lookup modes (page_id, page_url, keyword, batch), opt-in pagination, schema-validated per-item output.

Why this actor

Meta's official Ad Library API requires app review and identity verification, and it is restricted to political, issue, and EU ads, which makes it unusable for general competitive intelligence on SMB and brand advertisers. The public Ad Library website exposes the same data for all ads, but only through a JS-rendered UI behind a browser challenge. This actor productizes the scrape of the server-side-rendered (SSR) page data and adds the things existing Store actors tend to miss:

  • Lifecycle dates (start_date, end_date) per creative.
  • Honest page_url resolution — keyword-search the handle, then filter results by page_profile_uri match. No silent fallback to near-matches.
  • Schema-validated output with a non-silent failure envelope. When Meta drifts, you see _validation_errors populated on the item, not a silent null.
  • Predictable pagination with a hard cap and a truncated_reason discriminator (max_ads_cap vs run_timeout).
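The pagination contract in the last bullet can be sketched roughly as follows. This is an illustration only, not the actor's code: `fetch_page` is a hypothetical callable standing in for one Ad Library request, the `(ads, end_cursor, has_more)` page shape is assumed, and the deadline handling is a guess at how `run_timeout` might arise.

```python
import time
from typing import Callable, Optional

# Hypothetical page shape: (ads, end_cursor, has_more). Not the actor's internals.
Page = tuple[list[dict], Optional[str], bool]


def paginate(
    fetch_page: Callable[[Optional[str]], Page],
    max_ads: int = 30,
    deadline: Optional[float] = None,
) -> tuple[list[dict], Optional[str]]:
    """Walk cursors until results are exhausted, capped, or out of time.

    Returns (ads, truncated_reason), where truncated_reason is
    "max_ads_cap", "run_timeout", or None when every page was read.
    """
    ads: list[dict] = []
    cursor: Optional[str] = None
    while True:
        # Check the clock before each request so a timeout never discards data.
        if deadline is not None and time.monotonic() >= deadline:
            return ads, "run_timeout"
        page_ads, cursor, has_more = fetch_page(cursor)
        ads.extend(page_ads)
        if len(ads) >= max_ads:
            # Truncated only if something was trimmed or more pages remain.
            truncated = len(ads) > max_ads or has_more
            return ads[:max_ads], ("max_ads_cap" if truncated else None)
        if not has_more:
            return ads, None
```

Returning the reason alongside the data lets a caller tell "this advertiser has exactly this many ads" apart from "the cap or the clock stopped the walk".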

Input

See .actor/input_schema.json for the canonical schema. Summary:

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| search_mode | enum | keyword | One of page_id, page_url, keyword, batch. |
| page_id | string | | Required when search_mode=page_id. Numeric ID. |
| page_url | string | | Required when search_mode=page_url. e.g. https://www.facebook.com/PetsOnBroadway |
| query | string | | Required when search_mode=keyword. |
| queries | string[] | | Required when search_mode=batch. |
| region | string | US | ISO 3166-1 alpha-2 country code. Plumbs into Meta's URL country= filter only. |
| active_only | boolean | true | When false, includes stopped ads. |
| media_type | enum | all | all / image / video. |
| max_ads_per_advertiser | integer | 30 | 1..500. Hard cap. |
| follow_pagination | boolean | false | When true, walks end_cursor up to max_ads_per_advertiser. |
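The per-mode requirements above can be checked client-side before a run is started. The sketch below is a hypothetical helper (`build_input` is not part of the actor or of any Apify SDK); it encodes only the constraints listed in the summary, and `.actor/input_schema.json` remains the authority.

```python
# Which field each search_mode requires, per the input summary.
REQUIRED_BY_MODE = {
    "page_id": "page_id",
    "page_url": "page_url",
    "keyword": "query",
    "batch": "queries",
}


def build_input(search_mode: str = "keyword", **fields) -> dict:
    """Assemble an input payload, failing fast on the documented constraints."""
    if search_mode not in REQUIRED_BY_MODE:
        raise ValueError(f"unknown search_mode: {search_mode!r}")
    required = REQUIRED_BY_MODE[search_mode]
    if not fields.get(required):
        raise ValueError(f"{required!r} is required when search_mode={search_mode}")
    cap = fields.get("max_ads_per_advertiser", 30)
    if not 1 <= cap <= 500:
        raise ValueError("max_ads_per_advertiser must be in 1..500")
    return {"search_mode": search_mode, **fields}
```

For example, `build_input("batch", queries=["nike", "adidas"], follow_pagination=True)` yields a payload that can be posted as the run input, while `build_input("batch")` raises before any request is made.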

Examples

Single advertiser by page ID:

{"search_mode": "page_id", "page_id": "142023025844028"}

By page URL (handle resolution + match filter):

{"search_mode": "page_url", "page_url": "https://www.facebook.com/PetsOnBroadway"}

Keyword search:

{"search_mode": "keyword", "query": "Logical Position"}

Batch (no within-run dedupe — see "Behavior notes" below):

{
  "search_mode": "batch",
  "queries": ["nike", "adidas", "puma"],
  "follow_pagination": true,
  "max_ads_per_advertiser": 100
}

Output

One dataset item per ad surfaced. Duplicate ad_archive_id values are possible within a single run when overlapping batch queries surface the same ad; consumers are responsible for deduplication.

{
  // Identity
  "ad_archive_id": "1667643404381457",
  "page_id": "142023025844028",
  "page_name": "Pets On Broadway",
  "page_profile_uri": "https://www.facebook.com/PetsOnBroadway/",
  "page_like_count": 3404,
  "page_categories": ["Pet Supplies"],
  // Lifecycle (ISO-8601 UTC date; null if Meta omits)
  "start_date": "2026-05-04",
  "end_date": "2026-05-09",
  "is_active": true,
  // Creative
  "display_format": "image", // lowercased: image | video | carousel | dco | unknown
  "title": "petsonbroadway.com",
  "caption": "petsonbroadway.com",
  "body_text": "This May, every purchase of Nulo dog food helps...",
  "cta_text": "Shop now",
  "cta_type": "SHOP_NOW",
  "link_url": "https://petsonbroadway.com/collections/nulo",
  // Media (image_url = first image; video_url prefers HD over SD)
  "image_url": "https://...img1.jpg",
  "video_url": null,
  "all_image_urls": ["https://...img1.jpg"],
  "all_video_urls": [],
  // Provenance
  "matched_via": "page_id", // page_id | page_url | keyword
  "query": "142023025844028", // original input value the user submitted
  "region": "US",
  "fetched_at": "2026-05-15T21:34:12.345Z"
  // _validation_errors: present and non-empty IF this item failed validation
}

Per-run summary

Written to the run's key-value store as OUTPUT.json:

{
  "queries_executed": 3,
  "advertisers_resolved": 3,
  "ads_returned": 73,
  "block_rate": 0.0,
  "validation_error_rate": 0.0,
  "aborted": false,
  "aborted_reason": null,
  "warnings": [],
  "advertisers": [
    {
      "query": "nike",
      "matched_via": "keyword",
      "page_id": "...",
      "page_name": "Nike",
      "total_count": 4321, // Meta's library-wide tally
      "ads_in_dataset": 30,
      "has_more": true,
      "end_cursor": "AbCd...",
      "source_url": "https://www.facebook.com/ads/library/?...",
      "truncated_reason": "max_ads_cap" // max_ads_cap | run_timeout | null
    }
  ]
}
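
A consumer will often want to know which advertisers were cut short. The sketch below reads the summary shape shown above; `truncated_advertisers` is a hypothetical helper, and treating `total_count - ads_in_dataset` as "ads not fetched" is an assumption, since `total_count` is Meta's own tally and may not line up exactly with what the actor can retrieve.

```python
def truncated_advertisers(summary: dict) -> list[tuple[str, str, int]]:
    """Return (query, truncated_reason, approx_ads_not_fetched) per truncated advertiser."""
    out = []
    for adv in summary.get("advertisers", []):
        reason = adv.get("truncated_reason")
        if not reason:
            continue  # null means every available page was read
        total = adv.get("total_count") or 0
        fetched = adv.get("ads_in_dataset") or 0
        out.append((adv["query"], reason, max(total - fetched, 0)))
    return out
```

An entry with reason `max_ads_cap` suggests rerunning that query with a higher `max_ads_per_advertiser`; `run_timeout` suggests splitting the batch into smaller runs.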

Behavior notes (read this before integrating)

1. Items are NOT deduped within a run

If you submit queries: ["nike", "Nike Inc", "Nike Sportswear"], the dataset will include the same ad_archive_id once per query that surfaced it (with different query field values). This is deliberate — each input query is its own search and bills its own events. Always group by ad_archive_id if you want unique creatives.
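
A minimal sketch of that grouping step, assuming items have the shape shown under Output. The added `queries` key is an illustrative convention of this example, not a field the actor emits.

```python
def dedupe_ads(items: list[dict]) -> list[dict]:
    """Keep the first item per ad_archive_id and record every query that surfaced it."""
    unique: dict[str, dict] = {}
    for item in items:
        ad_id = item["ad_archive_id"]
        query = item.get("query")
        if ad_id not in unique:
            # Copy the item so the caller's list is left untouched.
            unique[ad_id] = {**item, "queries": [query]}
        elif query not in unique[ad_id]["queries"]:
            unique[ad_id]["queries"].append(query)
    return list(unique.values())
```

Keeping the list of originating queries preserves the information that the duplicates carried, which is useful when attributing an ad back to the search terms that found it.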

2. Validation failures are pushed with _validation_errors, not dropped

Every item is validated against the Pydantic schema before push. If validation fails (e.g. Meta dropped a field from SSR), the item is still pushed but with a non-empty _validation_errors array. Your code MUST check this field — see the worked example below.

3. page_url mode never silently returns the wrong advertiser

When you submit page_url: "https://www.facebook.com/somehandle", the actor keyword-searches the handle, then filters results to only ads whose page_profile_uri matches /somehandle/ (case-insensitive). If no ads match, you get an empty result with a warnings[] entry — NOT a near-match from some unrelated page.
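
The match rule described above can be approximated as follows. This is a sketch of the idea rather than the actor's implementation: `handle_of` and `filter_by_handle` are hypothetical names, and comparing the first URL path segment case-insensitively is an assumed reading of "matches /somehandle/".

```python
from urllib.parse import urlparse


def handle_of(url: str) -> str:
    """Return the first path segment of a Facebook page URL, lowercased."""
    return urlparse(url).path.strip("/").split("/")[0].lower()


def filter_by_handle(ads: list[dict], page_url: str) -> list[dict]:
    """Keep only ads whose page_profile_uri resolves to the same handle."""
    wanted = handle_of(page_url)
    if not wanted:
        return []  # no handle in the input URL, so nothing can match
    return [ad for ad in ads if handle_of(ad.get("page_profile_uri") or "") == wanted]
```

Comparing whole path segments, rather than checking for a substring, is what keeps a page such as `/PetsOnBroadwayNYC/` from matching a request for `/PetsOnBroadway`.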

4. start_date / end_date are ISO-8601 dates in UTC

Meta inlines them as Unix timestamps; the actor normalizes to "YYYY-MM-DD" strings.
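
The normalization amounts to interpreting the timestamp in UTC and keeping only the date part. A minimal sketch, assuming the raw value is a Unix timestamp in seconds (the helper name is illustrative):

```python
from datetime import datetime, timezone
from typing import Optional


def unix_to_iso_date(ts: Optional[int]) -> Optional[str]:
    """Convert a Unix timestamp (seconds) to a "YYYY-MM-DD" UTC date string."""
    if ts is None:
        return None  # mirrors the "null if Meta omits" behavior
    return datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
```

Passing `tz=timezone.utc` explicitly matters: without it, `fromtimestamp` uses the machine's local time zone, and dates near midnight could shift by a day.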

5. Derived fields are deliberately omitted

days_active, image_count, video_count, and ad permalink URLs are NOT emitted. Compute client-side:

from datetime import date

# item is one dataset record; start_date is assumed to be non-null here.
start = date.fromisoformat(item["start_date"])
end = date.fromisoformat(item["end_date"]) if item["end_date"] else date.today()
days_active = (end - start).days
image_count = len(item["all_image_urls"])
video_count = len(item["all_video_urls"])
ad_permalink = f"https://www.facebook.com/ads/library/?id={item['ad_archive_id']}"

Calling the actor

Synchronous (small lookups, single page)

curl -X POST "https://api.apify.com/v2/acts/<your-username>~meta-ad-library-scraper/run-sync-get-dataset-items?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"search_mode": "page_id", "page_id": "142023025844028"}'

Returns the dataset as JSON. Good for one-off lookups under ~30 seconds.

Asynchronous (batches, pagination)

# Start the run
RUN_ID=$(curl -X POST "https://api.apify.com/v2/acts/<your-username>~meta-ad-library-scraper/runs?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"search_mode": "batch", "queries": ["nike", "adidas"], "follow_pagination": true}' \
  | jq -r '.data.id')

# Poll until SUCCEEDED, or use webhooks
curl "https://api.apify.com/v2/actor-runs/$RUN_ID?token=$APIFY_TOKEN"

# Fetch dataset items
curl "https://api.apify.com/v2/actor-runs/$RUN_ID/dataset/items?token=$APIFY_TOKEN"

# Fetch OUTPUT.json from the run's KV store
curl "https://api.apify.com/v2/actor-runs/$RUN_ID/key-value-store/records/OUTPUT?token=$APIFY_TOKEN"
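
The polling step can be wrapped in a small loop. The sketch below is one possible shape, not an official client: `wait_for_run` is a hypothetical helper, and `get_status` is a caller-supplied function that returns the run's current status string (for instance, by reading `data.status` from the run endpoint used above). The interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable

# Run statuses after which polling should stop.
TERMINAL = {"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"}


def wait_for_run(
    get_status: Callable[[], str],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until the run reaches a terminal status or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {status} after {timeout} seconds")
        sleep(interval)
```

Only fetch the dataset when the returned status is `SUCCEEDED`. For long batches, a webhook avoids holding a polling process open.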

Worked example: handling _validation_errors

import requests

# run_id and APIFY_TOKEN are assumed to be defined by the caller.
resp = requests.get(
    f"https://api.apify.com/v2/actor-runs/{run_id}/dataset/items",
    params={"token": APIFY_TOKEN},
    timeout=30,
)
resp.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
items = resp.json()

# Split items into validated and drifted.
good, drifted = [], []
for it in items:
    if it.get("_validation_errors"):
        drifted.append(it)
    else:
        good.append(it)

if drifted:
    # _validation_errors populated → Meta SSR may have drifted from the
    # actor's declared schema. The items still have best-effort extracted
    # data, but consuming them as-if-valid risks downstream type errors.
    print(f"WARNING: {len(drifted)} of {len(items)} items failed validation")
    for it in drifted[:3]:
        print("  errors:", it["_validation_errors"])

# Use only the validated items downstream (process_ads is your own handler).
process_ads(good)

Versioning

The actor follows semantic versioning over its output contract:

  • Major bump (1.x → 2.x): output field removal, or input-schema breaking change.
  • Minor bump (1.0 → 1.1): output field addition, behavior change with release notes.
  • Patch: bug fixes, internal changes.

Soft contract breaks from Meta's side (a field that was always populated suddenly becomes null) do NOT trigger a major bump on their own — they surface via populated _validation_errors on affected items. If you parse the dataset strictly, check _validation_errors.

Drift detection

The actor declares a dataset schema in .actor/dataset_schema.json. Apify validates every pushed item against it at runtime and exposes per-field statistics in the Console (null rates, value distributions). For ongoing surveillance:

  • An Apify Schedule runs the actor nightly against a small batch of monitored advertisers.
  • Apify Monitoring alerts on run failure, low item-count, and per-field null-rate spikes — surfacing Meta-side SSR drift before it becomes a customer complaint.

This is configured in the Apify Console (Schedules + Monitoring tabs); no external CI is required for drift detection. GitHub Actions in this repo only runs ruff + unit tests on push.
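
For consumers who also want a check on their own side, the null-rate idea is easy to reproduce over fetched items. The following is a rough sketch under stated assumptions, not how Apify Monitoring computes its statistics: both helper names are hypothetical, a missing key is counted as null, and the 0.2 threshold is arbitrary.

```python
def null_rates(items: list[dict], fields: list[str]) -> dict[str, float]:
    """Fraction of items in which each field is null or missing."""
    if not items:
        return {f: 0.0 for f in fields}
    return {
        f: sum(1 for it in items if it.get(f) is None) / len(items)
        for f in fields
    }


def drifted_fields(
    baseline: dict[str, float],
    current: dict[str, float],
    threshold: float = 0.2,
) -> list[str]:
    """Fields whose null rate rose by more than threshold over the baseline."""
    return sorted(
        f for f, rate in current.items()
        if rate - baseline.get(f, 0.0) > threshold
    )
```

Comparing against a stored baseline rather than an absolute limit avoids false alarms on fields that are legitimately null much of the time, such as `video_url` on image ads.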

Versioned specs and proposals

This actor is developed using OpenSpec. The current capability spec lives at openspec/specs/meta-ad-library-scraper/spec.md once changes are archived; the v1 proposal is in openspec/changes/add-meta-ad-library-actor/.

License

TBD — to be set before public release.