Pinterest Easy Scraper

Discover unlimited public data on Pinterest with our Easy Scraper. Dive into profiles and "pins" with more depth than ever. Your go-to tool for seamless Pinterest insights.

Pricing: $19.99/month + usage
Rating: 0.0 (0 reviews)
Developer: codemaster devops (Maintained by Community)
Actor stats: 5 bookmarked · 154 total users · 0 monthly active users · last modified 19 hours ago

Pinterest Scraper — Profile + Pin Data Extractor

Scrape Pinterest profiles and pins into a clean, structured Apify dataset via a 4-tier reliability cascade: public widget API → SSR HTML bootstrap → internal resource API → headless browser fallback.

Keywords: Pinterest scraper, Pinterest API scraper, Pinterest profile data, Pinterest pin metadata, Apify actor, web scraping Pinterest.

Version 2.0.x — what changed

  • 4-tier cascade replaces the single-endpoint scraper. Small runs resolve via the unauthenticated widget endpoint; large runs bootstrap session state from SSR HTML and paginate via the internal API; unreachable profiles optionally escalate to a real headless browser.
  • Residential Apify proxy is mandatory when useApifyProxy: true. Pinterest aggressively blocks datacenter IPs; the actor fails fast at input validation so you never waste a compute unit on a guaranteed-to-fail run.
  • legacyKvKeys removed (breaking change). Profiles are now stored under a single collision-safe key: profile-<sanitizedUsername>-<hash>. See Migration from 1.x below.
  • Pinterest-aware headers (X-Requested-With, X-APP-VERSION, X-Pinterest-AppState, X-Pinterest-PWS-Handler, X-CSRFToken, Referer) are now emitted on every internal-API request. got-scraping's auto header generator is disabled so Pinterest-specific headers survive.
  • Session hardening — useSessionPool: true, persistCookiesPerSession: true, soft-block detection via content-type + body regex, and deliberate session.retire() / session.markBad() based on failure class.
  • Automated tests + nightly canary on the committed fixtures plus a real widget-endpoint shape check for nasa / natgeo.

Features

  • Profile extraction — usernames, bios, follower counts, website, profile image, verified status.
  • Pin extraction — id, title, description, outbound link, domain, board, image, aggregated engagement stats.
  • Curated output by default — compact, CSV-friendly schema; opt into the full raw payload with includeRaw: true.
  • Deduplication — pins are deduped by id across pagination pages.
  • Bookmark-based pagination with safety guard — cannot get stuck when Pinterest returns the same cursor twice.
  • Rate limiting — configurable jitter (minDelayMs / maxDelayMs) + session pool rotation.
  • Per-tier retry budgets — widget: 2, html: 3, resource: 5, browser: 2.
  • Fail-fast validation — invalid input, non-residential proxy, or non-Pinterest URLs error loudly.
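
The deduplication behavior above can be sketched roughly as follows (a hypothetical helper for illustration, not the actor's actual code — the real logic lives inside the pagination handlers):

```javascript
// Hypothetical sketch of cross-page pin deduplication by id.
// Keeps the first occurrence of each pin id across successive
// pagination batches, mirroring the feature described above.
function createPinDeduper() {
  const seen = new Set();
  return function dedupePins(batch) {
    return batch.filter((pin) => {
      if (seen.has(pin.id)) return false; // already emitted on an earlier page
      seen.add(pin.id);
      return true;
    });
  };
}

const dedupe = createPinDeduper();
const page1 = dedupe([{ id: 'a' }, { id: 'b' }]);
const page2 = dedupe([{ id: 'b' }, { id: 'c' }]); // 'b' was already seen on page 1
```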

Quick start (Apify Console)

  1. Open the Pinterest Scraper actor in the Apify Console.
  2. Click Try for free / Start.
  3. The default input already contains two public profiles (nasa, natgeo) and Apify Proxy with residential group — just press Start.
  4. When the run finishes, open the Dataset tab and export as JSON, CSV, Excel, RSS, or HTML.

Sample input

{
  "startUrls": [
    "https://www.pinterest.com/nasa/",
    "natgeo"
  ],
  "maxPinsCnt": 50,
  "includeRaw": false,
  "widgetFirst": true,
  "fallbackToBrowser": false,
  "proxyConfig": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  },
  "minConcurrency": 1,
  "maxConcurrency": 5,
  "maxRequestRetries": 5,
  "requestHandlerTimeoutSecs": 30,
  "minDelayMs": 500,
  "maxDelayMs": 2000
}

Input schema

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| startUrls | string[] or {url: string}[] | yes | (none) | Profile URLs (https://www.pinterest.com/nasa/) or bare usernames (nasa). Locale segments (/en/, /de/, etc.) are stripped. |
| maxPinsCnt | integer | no | 50 | Max pins per profile (1 – 10000). <= 25 uses the widget tier exclusively; larger values paginate via the resource tier. |
| includeRaw | boolean | no | false | Attach the full raw Pinterest payload (minus known noise fields) to each pin/profile record. |
| widgetFirst | boolean | no | true | Start small runs (maxPinsCnt <= 25) on the public widget endpoint. Disable to always bootstrap from HTML. |
| fallbackToBrowser | boolean | no | false | Escalate to a real headless browser when the widget + HTML + resource tiers all fail for a profile. Adds significant memory + runtime. |
| proxyConfig | object | yes | {useApifyProxy: true, apifyProxyGroups: ["RESIDENTIAL"]} | Apify Proxy (residential required) or custom {useApifyProxy: false, proxyUrls: [...]}. |
| minConcurrency | integer | no | 1 | Lower bound of parallel requests. |
| maxConcurrency | integer | no | 5 | Upper bound. Keep low (2 – 5); Pinterest rate-limits aggressively. |
| maxRequestRetries | integer | no | 5 | Global retry ceiling. Per-tier caps apply (widget: 2, html: 3, resource: 5, browser: 2). |
| requestHandlerTimeoutSecs | integer | no | 30 | Per-request processing timeout. |
| minDelayMs / maxDelayMs | integer | no | 500 / 2000 | Jittered pre-request delay. Raise on 403/429. |

Residential proxy is mandatory

The actor throws at input validation if useApifyProxy: true is combined with any proxy group other than (or missing) RESIDENTIAL. If you prefer to bring your own proxy, set useApifyProxy: false and supply residential proxyUrls. Datacenter IPs will not work — Pinterest blocks them at the TLS layer.
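
For the bring-your-own-proxy case, the relevant input fragment looks like this (host, port, and credentials are placeholders):

```json
{
  "proxyConfig": {
    "useApifyProxy": false,
    "proxyUrls": ["http://user:pass@residential-host:port"]
  }
}
```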

Output

  • Dataset (Actor.pushData) — one record per profile (recordType: "profile") plus one per pin (recordType: "pin"). Iterate with Dataset.forEach or export to any supported format.
  • Key-value store (Actor.setValue) — each profile is additionally stored under profile-<sanitizedUsername>-<hash> for direct lookup via the Apify API. Keys are collision-safe across case variants and unsafe characters.
  • Debug snapshots — set env DEBUG_SNAPSHOT=1 to capture the first profile payload, first pin payload, and per-tier hydration samples into the KV store under debug-* keys for offline inspection.
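
Because every dataset record carries the recordType discriminator, a downloaded export can be split into profiles and pins with a one-liner. A minimal post-processing sketch (the field names are from the curated shapes documented below; the helper itself is illustrative):

```javascript
// Split a downloaded dataset export into profile and pin records
// using the recordType discriminator present on every record.
function splitRecords(items) {
  const profiles = items.filter((r) => r.recordType === 'profile');
  const pins = items.filter((r) => r.recordType === 'pin');
  return { profiles, pins };
}

const { profiles, pins } = splitRecords([
  { recordType: 'profile', username: 'nasa' },
  { recordType: 'pin', id: '1', profile: 'nasa' },
]);
```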

Curated profile shape

{
  "recordType": "profile",
  "username": "nasa",
  "fullName": "NASA",
  "about": "...",
  "websiteUrl": "https://www.nasa.gov",
  "profileUrl": "https://www.pinterest.com/nasa/",
  "imageLargeUrl": "...",
  "pinCount": 123,
  "boardCount": 45,
  "followerCount": 678901,
  "followingCount": 12,
  "country": "US",
  "locale": "en-US",
  "verified": true
}

Curated pin shape

{
  "recordType": "pin",
  "id": "123456789012345678",
  "profile": "nasa",
  "sourceUrl": "https://www.pinterest.com/nasa/",
  "pinUrl": "https://www.pinterest.com/pin/123456789012345678/",
  "title": "...",
  "description": "...",
  "link": "https://example.com/article",
  "domain": "example.com",
  "createdAt": "Tue, 01 Jan 2024 ...",
  "image": { "url": "...", "width": 736, "height": 1104 },
  "board": { "id": "...", "name": "...", "url": "..." },
  "commentCount": 0,
  "richMetadata": { /* opengraph-like */ },
  "aggregatedStats": { "saves": 123, "done": 0 }
}

Architecture — the 4-tier cascade

Every profile enters at the highest-confidence tier and downgrades only on terminal failure. failedRequestHandler in main.js computes the next tier via nextTierRequest(label, userData, { fallbackToBrowser }).

| # | Label | Endpoint | When selected | Downgrade target |
|---|---|---|---|---|
| 1 | profile-widget | https://widgets.pinterest.com/v3/pidgets/users/&lt;u&gt;/pins/ | widgetFirst && maxPinsCnt <= 25 | profile-html |
| 2 | profile-html | https://www.pinterest.com/&lt;u&gt;/ (parse __PWS_DATA__) | Default seed for maxPinsCnt > 25; also reached via widget downgrade. Hydrates session.userData.hydration with appVersion + csrfToken + cookies for tier 3. | profile-resource-start |
| 3 | profile-resource-start / profile-resource-pins | /resource/UserResource/get/ then /resource/UserPinsResource/get/ (bookmark-paginated) | Tier 2 downgrade, or direct entry when HTML fails. | profile-browser (only if fallbackToBrowser: true) |
| 4 | profile-browser | PlaywrightCrawler navigating the public profile page; intercepts the XHRs above via page.on('response') and reuses the same curated builders. | All HTTP tiers have failed for a profile AND fallbackToBrowser: true. | Terminal. |

The browser tier runs as a second crawler after the HTTP crawler finishes. Failed HTTP requests that qualify for escalation are batched into browserQueue; runBrowserTier is invoked once with the full batch so the Chromium instance is paid for only when strictly necessary.
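
The downgrade logic can be sketched as follows. This is an illustrative reconstruction from the table above, not the actual src/routes.js code (the helper name nextTier and the exact map shape are assumptions):

```javascript
// Hypothetical sketch of the tier cascade: each label maps to the
// next-lower-confidence tier; the browser tier is gated on the
// fallbackToBrowser input flag, and anything without a mapping is terminal.
const DOWNGRADE = {
  'profile-widget': 'profile-html',
  'profile-html': 'profile-resource-start',
  'profile-resource-start': 'profile-browser',
  'profile-resource-pins': 'profile-browser',
};

function nextTier(label, { fallbackToBrowser }) {
  const next = DOWNGRADE[label];
  if (next === 'profile-browser' && !fallbackToBrowser) return null; // terminal
  return next ?? null;
}
```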

Reliability layers

  • Tier-aware headers — src/headers.js emits Pinterest-expected headers: Accept, Accept-Language, User-Agent (one of four modern desktop UAs, sticky per session), Referer, X-Requested-With: XMLHttpRequest, X-APP-VERSION, X-Pinterest-AppState: active, X-Pinterest-PWS-Handler: www/[username].js, X-CSRFToken (from hydration), and Origin. The widget tier skips Pinterest-specific headers since the endpoint is public.
  • Session lifecycle — src/session.js classifies every response as ok | soft-block | rate-limit | transient | client-error | not-found using status code, content-type, and body-regex heuristics ("captcha", "unusual traffic", "rate limit", etc.). retire() on rate-limit / soft-block, markBad() on transient 5xx, no action on ok / not-found.
  • Cookie persistence — applyCookies(session, setCookieHeader) extracts csrftoken into session.userData.hydration.csrfToken and keeps a concatenated Cookie header; persistCookiesPerSession: true ties cookies to session lifetime.
  • Retries budgeted per tier — TIER_MAX_RETRIES in src/headers.js caps retries per label so a stuck tier cannot exhaust the global budget.
  • Bookmark loop guard — lastBookmark is compared to the newly returned cursor; a repeat terminates pagination instead of looping.
  • Schema drift detection — createSchemaState carries profileWarned / pinWarned flags. The first empty curated record emits a one-shot warning with a sample of top-level keys. A nightly GitHub Action (.github/workflows/canary.yml) hits the widget endpoint for nasa + natgeo and asserts required keys so drift surfaces before users notice.
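
The response classification described above might look roughly like this. A simplified sketch — the real truth table lives in src/session.js, and the exact thresholds and regexes here are assumptions:

```javascript
// Hypothetical sketch of response classification by status code,
// content-type, and body heuristics, as described above.
const BLOCK_RE = /captcha|unusual traffic|rate limit/i;

function classifyResponse({ statusCode, contentType = '', body = '' }) {
  if (statusCode === 404) return 'not-found';
  if (statusCode === 429) return 'rate-limit';
  if (statusCode >= 500) return 'transient';
  if (statusCode >= 400) return 'client-error';
  // A 200 that is not JSON and contains block-page phrasing is a soft block.
  if (!contentType.includes('json') && BLOCK_RE.test(body)) return 'soft-block';
  return 'ok';
}
```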

Migration from 1.x

Breaking change: legacyKvKeys is removed.

1.x wrote each profile to the key-value store twice — under the raw username (nasa) and under the safe key (profile-nasa-<hash>). 2.0 writes only the safe key.

If a downstream consumer reads profiles by raw username:

  1. Update it to read the new key format: profile-<sanitizedUsername>-<hash>. Obtain the sanitized key via safeKvKey(username) exported from src/input.js, or resolve it from the profile record (which carries username) and compute the SHA-1 prefix yourself.
  2. If a migration window is required, consume from the dataset (recordType: "profile" records) instead of the KV store — the dataset is unchanged.

There is no compat shim in 2.0. If you cannot migrate, stay on 1.x.

Running locally

npm ci
# optional (one-time): install Chromium for the browser tier
npx playwright install --with-deps chromium
echo '{"startUrls":["https://www.pinterest.com/nasa/"],"maxPinsCnt":25,"proxyConfig":{"useApifyProxy":false,"proxyUrls":["http://your-residential-proxy:port"]}}' > INPUT.json
APIFY_LOCAL_STORAGE_DIR=./apify_storage node main.js

For Apify Console runs the Dockerfile uses the apify/actor-node-playwright-chrome:20 base image, so Chromium ships with the actor — no separate install step.

Testing

  • npm test runs the full node:test suite under test/. No external dependencies, no browser launched.
  • npm run canary runs the live widget-endpoint schema canary locally (requires outbound network). The canary retries once on 403/503/network and skips-with-warning when the runner IP is block-flagged by Pinterest — real shape drift still hard-fails.
  • npm run smoke is the manual release gate before apify push. It spawns the full actor against a user-supplied residential proxy and asserts the dataset contains ≥1 profile + ≥1 pin record.
    • Required env: SMOKE_PROXY_URL=http://user:pass@residential-host:port
    • Optional env: SMOKE_USER (default nasa), SMOKE_MAX_PINS (default 10)
    • Storage is retained in a tmpdir for inspection on failure. Proxy credentials are redacted from all log output.
  • CI (.github/workflows/ci.yml) runs tests on every push to main, master, or any claude/** branch with PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1 so CI stays fast.
  • Canary workflow (.github/workflows/canary.yml) runs the live schema check daily at 08:00 UTC and on-demand via workflow_dispatch.

Dependency discipline

  • The Dockerfile pins the Apify Playwright base image by tag suffix (apify/actor-node-playwright-chrome:20-1.59.1) so a future base-image refresh cannot silently swap the Chromium binary underneath us. Bump this tag deliberately — keep the devDep playwright range compatible.
  • playwright lives in devDependencies. Production images use the base image's pre-installed copy (guaranteeing a matched Chromium ↔ Playwright pair). Local dev + CI get it via npm install.
  • Direct lodash and omit-deep-lodash dependencies were removed in 2.0. The ~20-line vanilla replacements live in src/curate.js (omit, omitDeep, lastValue).
  • package.json carries an npm override forcing file-type >= 21.3.4 to patch the Crawlee transitive decompression-bomb advisory.
  • npm audit (full tree) reports 0 vulnerabilities as of 2.0. Re-run before every release.
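
The override mentioned above corresponds to a package.json fragment like this:

```json
{
  "overrides": {
    "file-type": ">=21.3.4"
  }
}
```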

Known limitations

  • Cookie jar is process-local. Session restarts lose hydration; HTML tier re-bootstraps from scratch. This is a deliberate trade-off — the code is stateless across runs.
  • Widget endpoint region-blocks. Some regions return 404 for the widget endpoint. The actor treats 404 as terminal for that tier and falls through to HTML automatically.
  • Residential proxy is ~8× the cost of datacenter. Non-negotiable for Pinterest — see Apify proxy docs for pricing.
  • Browser tier is slow and memory-hungry. Reserve fallbackToBrowser: true for profiles where HTTP tiers consistently fail; do not enable it as a prophylactic.

Project layout

.
├── main.js                    # actor entry: CheerioCrawler + optional PlaywrightCrawler stage
├── src/
│   ├── input.js               # validateInput, parseStartUrl, safeKvKey, jitter helpers
│   ├── headers.js             # buildHeaders(request, session), TIER_MAX_RETRIES, UA pool
│   ├── session.js             # classifyResponse, handleBadResponse, applyCookies
│   ├── curate.js              # buildCuratedProfile / buildCuratedPin / buildRawPin / KV payload
│   ├── routes.js              # createRouter, LABELS, DOWNGRADE, nextTierRequest
│   └── tiers/
│       ├── widget.js          # public widget endpoint handler
│       ├── html.js            # __PWS_DATA__ parser + session hydration
│       ├── resource.js        # internal resource API (start + paginated pins)
│       └── browser.js         # PlaywrightCrawler + XHR capture (tier 4)
├── test/
│   ├── fixtures/              # widget/html/resource-userpins JSON + HTML fixtures
│   ├── helpers/actor-stub.js  # Module._load shim for apify stub
│   ├── input.test.js          # parseStartUrl, validateInput, safeKvKey
│   ├── headers.test.js        # per-tier header assertions
│   ├── session.test.js        # classification truth table + cookie parsing
│   ├── router.test.js         # LABELS, DOWNGRADE, nextTierRequest
│   ├── tiers-widget.test.js
│   ├── tiers-html.test.js
│   ├── tiers-resource.test.js
│   ├── tiers-browser.test.js  # pure-logic tests (no browser launch)
│   └── canary.js              # nightly widget-shape check
├── .actor/actor.json
├── INPUT_SCHEMA.json
├── Dockerfile                 # apify/actor-node-playwright-chrome:20
├── .github/workflows/
│   ├── ci.yml
│   └── canary.yml
└── package.json               # "version": "2.0.0"

License

ISC.