Fast no-login Instagram scraper. Extract profiles, posts, reels, comments, hashtags, locations, tagged feeds and audio reels. Paste URLs or search by keyword and get clean, structured JSON. Works on posts of any age via a 5-tier fallback. Includes a date filter, dedup, parallel race, and residential proxy support.
search + searchType (hashtag / profile / place) + searchLimit —
free-text discovery, resolved through Instagram's public topsearch.
onlyPostsNewerThan — date filter accepting ISO (2026-01-15) and
relative (7 days, 2 months, 1 year).
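A minimal sketch of how a value such as onlyPostsNewerThan could be turned into a cutoff timestamp. The function names, the 30-day month and 365-day year approximations, and the epoch-seconds post timestamp are assumptions for illustration, not the Actor's actual code.

```python
import re
from datetime import datetime, timedelta, timezone

# Approximate unit lengths in days. Treating a month as 30 days and a
# year as 365 days is an assumption, not calendar-exact arithmetic.
_UNIT_DAYS = {"day": 1, "month": 30, "year": 365}
_RELATIVE = re.compile(r"^\s*(\d+)\s*(day|month|year)s?\s*$", re.IGNORECASE)


def parse_newer_than(value, now=None):
    """Return a timezone-aware UTC cutoff for an ISO date or a relative span."""
    now = now or datetime.now(timezone.utc)
    match = _RELATIVE.match(value)
    if match:
        amount, unit = int(match.group(1)), match.group(2).lower()
        return now - timedelta(days=amount * _UNIT_DAYS[unit])
    # Anything else is treated as ISO 8601, e.g. "2026-01-15".
    parsed = datetime.fromisoformat(value.strip())
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def is_newer(post_timestamp, cutoff):
    """post_timestamp is a Unix epoch in seconds (hypothetical field)."""
    return datetime.fromtimestamp(post_timestamp, tz=timezone.utc) >= cutoff
```

Passing a fixed `now` keeps the relative form deterministic in tests; invalid input raises ValueError from `datetime.fromisoformat`.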
addParentData — annotates every record with its dataSource
parent (username / tag / location id) when scraping multiple sources.
maxTotalResults — overall cap on emitted records, independent of
per-URL limits.
dedupResults — automatic dedup by shortCode across sources.
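How maxTotalResults and dedupResults might interact can be sketched as a single generator. This is an illustration under assumptions: the function name is hypothetical, and letting records without a shortCode (for example profiles) pass through undeduplicated is a guess at sensible behaviour.

```python
def dedup_and_cap(records, max_total=None):
    """Yield records with unseen shortCode values, stopping at max_total."""
    seen = set()
    emitted = 0
    for record in records:
        if max_total is not None and emitted >= max_total:
            return
        code = record.get("shortCode")
        if code is not None:
            if code in seen:
                continue  # same post already emitted from another source
            seen.add(code)
        emitted += 1
        yield record
```

Because duplicates are skipped before the counter is incremented, the cap counts emitted records only, which matches the "independent of per-URL limits" wording above.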
HTTP client — curl_cffi.requests.AsyncSession(impersonate="chrome")
with sticky residential proxy sessions per target URL (single coherent
IP per URL, fresh IP on retry).
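One way to get "one IP per URL, fresh IP on retry" is to derive the proxy session name from the target URL plus the attempt number. The sketch below assumes Apify Proxy's username convention (groups-RESIDENTIAL,session-<id>) and its default host and port; the helper names and the "ig_" prefix are hypothetical.

```python
import hashlib


def sticky_session_id(target_url, attempt=0):
    """Same URL and attempt give the same id; a new attempt gives a new id."""
    digest = hashlib.sha256(f"{target_url}#{attempt}".encode("utf-8")).hexdigest()
    return f"ig_{digest[:16]}"


def proxy_url(target_url, password, attempt=0, host="proxy.apify.com", port=8000):
    """Build a proxy URL whose session (and so exit IP) is pinned per target URL."""
    session = sticky_session_id(target_url, attempt)
    username = f"groups-RESIDENTIAL,session-{session}"
    return f"http://{username}:{password}@{host}:{port}"
```

The resulting string would then be handed to the HTTP client as its proxy setting; bumping `attempt` on retry rotates the session name and therefore the IP.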
Three independent retry budgets — per-request, per-URL, run-wide
rate-limit budget; budgets are tracked and surfaced to the user.
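The three budgets can be pictured as counters that must all have capacity before a retry is allowed. This is a sketch under assumptions: the class names, default sizes, and the rule that only rate-limited retries draw on the run-wide budget are illustrative, not taken from the Actor's source.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """A named counter of remaining retries."""
    name: str
    remaining: int


class RetryBudgets:
    def __init__(self, per_request=3, per_url=6, run_wide=50):
        self._per_request = per_request
        self._per_url = per_url
        self.run_wide = Budget("run-wide rate-limit", run_wide)
        self._url_budgets = {}

    def new_request(self):
        """Each HTTP request starts with its own fresh budget."""
        return Budget("per-request", self._per_request)

    def for_url(self, url):
        """All requests for the same target URL share one budget."""
        return self._url_budgets.setdefault(url, Budget("per-URL", self._per_url))

    def may_retry(self, request_budget, url, rate_limited=False):
        """Spend one unit from every applicable budget, or none at all."""
        budgets = [request_budget, self.for_url(url)]
        if rate_limited:
            budgets.append(self.run_wide)
        if any(b.remaining <= 0 for b in budgets):
            return False
        for b in budgets:
            b.remaining -= 1
        return True

    def summary(self):
        """Values that could be surfaced to the user in a status message."""
        return {
            "runWideRemaining": self.run_wide.remaining,
            "perUrlRemaining": {u: b.remaining for u, b in self._url_budgets.items()},
        }
```

Checking all budgets before spending from any of them keeps the counters consistent when one of them is already exhausted.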
5-tier fallback for single-post URLs —
oEmbed → web_profile_info top-12 → feed/user pagination (parallel
race) → /p/<sc>/ HTML page (Polaris JSON / JSON-LD / regex sweep /
OG meta tags) → oEmbed basic.
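The tier chain above can be sketched as an ordered list of async fetchers, where the first tier that returns data wins. The function name, the resolvedBy field, and the shape of the failure record are hypothetical; the real tiers would be the oEmbed, web_profile_info, feed pagination, and HTML fetchers.

```python
import asyncio


async def first_success(short_code, tiers):
    """Try each (name, fetcher) tier in order; return the first non-empty record."""
    attempts = []
    for name, fetch in tiers:
        try:
            record = await fetch(short_code)
        except Exception as exc:  # a failing tier must not abort the chain
            attempts.append((name, repr(exc)))
            continue
        if record:
            # resolvedBy is a hypothetical field naming the tier that succeeded.
            return {**record, "resolvedBy": name}
        attempts.append((name, "empty"))
    return {"shortCode": short_code, "error": "all tiers failed", "attempts": attempts}
```

A caller would run it with `asyncio.run(first_success(code, tiers))`. Keeping the attempt log makes it possible to report why each tier was skipped.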
Comments fetching without auth — Polaris HTML parse for preview
comments + graphql/query POST pagination with rotating doc_id
candidates and guest-session cookies seeded via qe/sync/.
Logged-out profile HTML fallback - profile URLs now try the public
/{username}/ browser document after web_profile_info returns no data,
extracting embedded Polaris/GraphQL profile and timeline JSON when
Instagram ships it without login cookies.
Optional user cookies - users can paste an Instagram Cookie header,
Cookie-Editor JSON export, JSON object, or Netscape cookie file into the
secret sessionCookies input when Instagram hides data from logged-out
clients. Values are redacted from logs; only installed cookie names are
shown.
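Accepting four cookie formats in one input comes down to sniffing the text before parsing it. The sketch below is an assumption about how that could look, not the Actor's parser; the function names are hypothetical, and the Netscape branch relies on the standard seven tab-separated fields (name and value are the last two).

```python
import json


def parse_cookies(raw):
    """Return {name: value} from a Cookie header, JSON export/object, or Netscape file."""
    text = raw.strip()
    if text.startswith(("[", "{")):
        data = json.loads(text)
        if isinstance(data, dict):  # plain JSON object of name -> value
            return {str(k): str(v) for k, v in data.items()}
        # Cookie-Editor style export: a list of objects with name/value keys.
        return {c["name"]: c["value"] for c in data if "name" in c and "value" in c}
    if "\t" in text:  # Netscape cookie file
        cookies = {}
        for line in text.splitlines():
            if line.startswith("#HttpOnly_"):
                line = line[len("#HttpOnly_"):]  # HttpOnly cookies use this prefix
            elif line.startswith("#") or not line.strip():
                continue
            fields = line.split("\t")
            if len(fields) >= 7:
                cookies[fields[5]] = fields[6]
        return cookies
    if text.lower().startswith("cookie:"):
        text = text[len("cookie:"):]
    cookies = {}
    for part in text.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


def loggable(cookies):
    """Only cookie names are safe to log; values never leave this function."""
    return sorted(cookies)
```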
Cookie setup guide - the input UI and README now explain exactly how
to copy a Cookie header from Chrome / Edge DevTools, with a warning that
cookies are sensitive and should only come from the user's own account.
Clearer auth wording - public descriptions now say that public data
works without login and optional user cookies are supported for profiles
Instagram hides from logged-out users.
Batched streaming output — BatchedDatasetWriter flushes ~50
records every 5 s, draining cleanly on shutdown.
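The batching behaviour can be sketched as a small buffer in front of an async sink. This is not the actual BatchedDatasetWriter: the class name, the `sink` callable, and the injectable clock are hypothetical, and the time-based flush is only checked when a record is added, which is simpler than a background timer.

```python
import asyncio
import time


class BatchedWriter:
    def __init__(self, sink, batch_size=50, interval=5.0, clock=time.monotonic):
        self._sink = sink            # async callable taking a list of records
        self._batch_size = batch_size
        self._interval = interval
        self._clock = clock
        self._buffer = []
        self._last_flush = clock()

    async def add(self, record):
        """Buffer a record; flush when the batch is full or the interval elapsed."""
        self._buffer.append(record)
        due = self._clock() - self._last_flush >= self._interval
        if len(self._buffer) >= self._batch_size or due:
            await self.flush()

    async def flush(self):
        batch, self._buffer = self._buffer, []
        self._last_flush = self._clock()
        if batch:
            await self._sink(batch)

    async def close(self):
        """Drain whatever is still buffered at shutdown."""
        await self.flush()
```

Swapping the buffer out before awaiting the sink means records added during a slow write land in the next batch instead of being lost.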
Output sanitisation — NULL byte / control char stripping, oversized
caption / list truncation, id-field coercion, 9 MB-cap guard.
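A rough sketch of the sanitisation steps listed above. The limits, the ID_FIELDS set, and the choice to coerce integer ids to strings are all assumptions made for illustration; only the 9 MB cap comes from the text.

```python
import json
import re

# C0 control characters except tab, newline and carriage return, plus DEL.
_CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")
MAX_TEXT = 10_000               # hypothetical caption length limit
MAX_LIST = 500                  # hypothetical list length limit
MAX_BYTES = 9 * 1024 * 1024     # the 9 MB guard mentioned above
ID_FIELDS = {"id", "ownerId"}   # hypothetical id field names


def sanitise(value, key=None):
    """Recursively clean a record so it is safe to push to a dataset."""
    if isinstance(value, str):
        return _CONTROL.sub("", value)[:MAX_TEXT]
    if isinstance(value, list):
        return [sanitise(v) for v in value[:MAX_LIST]]
    if isinstance(value, dict):
        return {k: sanitise(v, k) for k, v in value.items()}
    if key in ID_FIELDS and isinstance(value, int) and not isinstance(value, bool):
        return str(value)  # large ids can lose precision as JSON numbers
    return value


def fits_size_cap(record):
    """True when the serialised record stays under the size guard."""
    return len(json.dumps(record, ensure_ascii=False).encode("utf-8")) <= MAX_BYTES
```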
Pay-per-event ready — Apify's default dataset-item event charges
emitted records automatically when PPE pricing is configured.
User-friendly logs + status messages — emoji-prefixed structured
output with explicit error guidance ("switch to RESIDENTIAL proxy",
"double-check the username spelling", …).
Support diagnostics — runs now write a redacted
SUPPORT_DIAGNOSTICS JSON record to the default key-value store with
target result counts, failure reasons, proxy/cookie flags, rate-limit
budget, and a troubleshooting checklist. This helps debug user reports when
the Actor owner cannot access the user's run.
124 offline unit tests covering parsers, sanitisation, retry
budgets, URL classification, HTML / OG / JSON-LD extraction, profile
cache, dataset writer, date filter, shortcode round-trip.
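As an illustration of what a shortcode round-trip test might check, the sketch below uses the widely documented base-64 style mapping between Instagram shortcodes and numeric media ids. Whether the Actor's tests use exactly this mapping is an assumption, and the function names are hypothetical.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"


def shortcode_to_media_id(code):
    """Interpret the shortcode as a base-64 number using ALPHABET as digits."""
    media_id = 0
    for ch in code:
        media_id = media_id * 64 + ALPHABET.index(ch)
    return media_id


def media_id_to_shortcode(media_id):
    """Inverse of shortcode_to_media_id for non-negative integers."""
    if media_id == 0:
        return ALPHABET[0]
    chars = []
    while media_id > 0:
        media_id, remainder = divmod(media_id, 64)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))
```

The round trip holds for any shortcode that does not start with "A", since a leading "A" encodes a leading zero and is dropped on the way back.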
dataset_schema.json — full JSON Schema description of every
output field + 6 named views (Overview, Posts & Reels, Profiles,
Comments, Hashtags, Locations). Each view ships its own table layout
(labels + format hints), so the Apify Console picks the right columns
automatically for the chosen resultsType.
Performance highlights
Profile (50 posts): 5–10 s
Single recent post: 3–5 s
Single old post (4+ years): 10–15 s via parallel HTML race
Comments on one post: 15–30 s
Known limitations
Comments require a residential proxy (mobile API is auth-gated; we
parse the www.instagram.com HTML page instead).
Old posts beyond roughly the 165th position in the author's feed return
~18–22 fields instead of the full 36+. The GraphQL media node isn't
shipped to logged-out clients, so we reconstruct the record from a scoped
regex sweep of the HTML page.
Stories require an authenticated session — not supported in this
release.