Pricing

$2.50 / 1,000 results

Instagram Scraper

One Instagram scraper for everything — profiles, contacts (emails & phones), posts, reels, comments, likers, followers, following, tagged, stories, highlights, hashtags, locations & search. No code: paste usernames or URLs, export to Excel, CSV, JSON or API. Public data needs no login.

Rating

5.0

(1)

Developer

VortexData

Last modified

7 days ago

[2.0.14] - 2026-07-27

Everything below ships as one release. Verified across 12 dev builds (2.0.35-2.0.46) on identical inputs. Four hypotheses were tested; three of them measured as worthless and were reverted, and those are recorded too so that they are not re-attempted.

Added — you can now say which country you use Instagram from

Instagram treats a live session appearing from an unexpected country as stolen and challenges it. A challenge surfaces as an auth failure, so the run reported "your cookie is invalid or expired" — sending the user off to re-export a cookie that was never the problem. Meanwhile the proxy country was never pinned at all: proxyCountry existed in the code but was absent from the input schema, so no interface user could ever set it, and every cookie run browsed from a random country.
The "Country you use Instagram from" field now sits directly under the cookie field, defaulting to Automatic. Deciding it automatically is not possible: the only geo-bearing value in an Instagram cookie set is rur, which carries a Meta datacenter code, and outside the US Meta runs only a handful of datacenters, so it resolves to a continent at best, never a country. It is the user's call, so the input asks plainly and explains why it matters.
A checkpoint and an expired session now say different things, because they have different fixes. The checkpoint message names the country setting; the expired-session message still asks for a fresh cookie.
Supplying a cookie without choosing a country logs a one-line nudge up front, where it is still cheap to act on. SUPPORT_DIAGNOSTICS.proxy gained country and countryChosenByUser.

Headline, on identical input (8 ordinary public profiles, details)

profile-lookup requests         31 -> 16         (-48%)
session blocks                   9 -> 3          (-67%)
wire bytes               1 044 569 -> 428 230    (-59%)
platform cost            $0.009478 -> $0.007387  (-22%)
records                          8 -> 8          (unchanged)

Halves the profile-lookup requests and cuts session blocks by two thirds, with no change in what comes back. Driven by new telemetry rather than intuition, including one hypothesis that measured as worthless and was reverted.

Added — the profile lookup now reports which attempt actually wins

SUPPORT_DIAGNOSTICS.http.profileLookup records successByAttempt, refusalsByAttempt and refusalKinds. Totals could never answer "is the ladder too long?" — only the index of the winning attempt can, and it is cheap to collect.

Changed — the lookup ladder is two rounds, not three

Measured across two runs and 9 successful lookups: the winning attempt was always 1, 2 or 3 — never 4, 5 or 6. Attempts 5-6 produced zero successes and only latency, so the third round is gone. Two rounds still leave a full attempt of margin past the last index ever observed to win.

Changed — a double HTTP 400 ends the ladder immediately

The dominant refusal is not blocking, it is HTTP 400: 28 of 36 refusals across the two runs. That is Instagram failing to serialize the account (the recurring "Asset ig_business_category_subvertical has been deleted"), not refusing us access — it repeated on every attempt and every exit IP. When both hosts answer 400 in round one, the remaining rounds cannot fix a server-side schema error, so the lookup goes straight to the public document. Blocks and empty payloads are deliberately not covered: those can differ per IP, so they keep the full ladder.
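The two-round ladder with the double-400 exit can be sketched roughly as follows. This is a minimal illustration, not the actor's code: `lookup_profile`, the `fetch` callable and the outcome labels are hypothetical names, and the host order is taken from the 2.0.7 note that www.instagram.com is primary with i.instagram.com as fallback.

```python
from typing import Callable, Optional, Tuple

HOSTS = ("www.instagram.com", "i.instagram.com")  # primary host first, then fallback
ROUNDS = 2  # the third round was removed in this release


def lookup_profile(
    fetch: Callable[[str, int], Tuple[int, Optional[dict]]],
) -> Tuple[Optional[dict], str]:
    """Walk the ladder and return (user, outcome).

    `fetch(host, round_no)` is a hypothetical callable that returns
    (http_status, user_or_None) for one attempt.
    """
    for round_no in range(1, ROUNDS + 1):
        statuses = []
        for host in HOSTS:
            status, user = fetch(host, round_no)
            if user is not None:
                return user, "resolved"
            statuses.append(status)
        # Both hosts answered 400 in round one: a server-side schema error that
        # further attempts cannot fix, so hand over to the public document.
        if round_no == 1 and all(status == 400 for status in statuses):
            return None, "double_400"
    # Blocks and empty payloads can differ per IP, so they get the full ladder.
    return None, "exhausted"
```

A caller would treat both `double_400` and `exhausted` as "fall back to the public document"; the difference is only how many requests were spent getting there (2 versus 4 in this sketch).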

Measured, on identical input (8 ordinary public profiles)

lookup requests     31 -> 16   (-48%)
session blocks       9 -> 2    (-78%)
successful lookups   4 -> 4    (unchanged)
records              8 -> 8    (unchanged)
wire bytes                     (-6%)

Cost barely moves, and that is expected: the requests removed were tiny 400 replies, while the expensive part — the ~704 KB public document those four profiles still need — is untouched. Wall-clock is not claimed either way; the same scenario ran 19.5s / 73.2s / 38.2s across three builds, so at one run per build the duration signal is pure proxy variance.

Changed — details and contacts stop downloading a third of the profile document

Nobody had measured where the payload sits inside the ~707 KB profile document, so the actor read all of it. Measured without a single extra request by binary-searching the smallest prefix that still satisfies the caller's own has_payload predicate (profileLookup.payloadOffsets):

document 703 662 B -> needed 458 752 B (65%)
document 805 714 B -> needed 327 680 B (41%)
document 703 195 B -> needed 458 752 B (65%)
document 705 437 B -> needed 458 752 B (65%)
document 706 063 B -> needed 458 752 B (65%)

A third of every document was transferred for nothing. New _PROFILE_DOC_CAP_BYTES (470 KB) applies only where the payload is the profile object — details and contacts. A cap pays when the share of documents that fit exceeds cap / full (470/706 = 0.67); details measured 0 refetches in 4 reads, and wire bytes fell 58% against the session's starting build.
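The prefix measurement and the break-even rule can be sketched as below. This assumes the predicate is monotonic (once a prefix contains the payload, every longer prefix does too), which is what makes a binary search valid. The 64 KiB step is an assumption: the measured sizes (458 752 and 327 680) are exact multiples of 65 536, which is consistent with such a step, but the changelog does not state it. Function names are hypothetical.

```python
from typing import Callable, Optional


def smallest_sufficient_prefix(
    doc: bytes,
    has_payload: Callable[[bytes], bool],
    granularity: int = 65536,  # assumed step size, not confirmed by the source
) -> Optional[int]:
    """Return the smallest prefix length (in `granularity` steps) that still
    satisfies `has_payload`, or None if even the full document does not."""
    if not has_payload(doc):
        return None
    blocks = -(-len(doc) // granularity)  # ceiling division
    lo, hi = 1, blocks  # invariant: the prefix of `hi` blocks satisfies the predicate
    while lo < hi:
        mid = (lo + hi) // 2
        if has_payload(doc[: mid * granularity]):
            hi = mid
        else:
            lo = mid + 1
    return min(lo * granularity, len(doc))


def cap_pays(fit_share: float, cap_bytes: float, full_bytes: float) -> bool:
    """A capped read costs `cap` when the payload fits and `cap + full` when it
    does not, so the cap beats always reading `full` only if
    fit_share > cap / full."""
    return fit_share > cap_bytes / full_bytes
```

With the figures above, details (4 of 4 reads fit) clears the 470/706 threshold, while posts and reels (3 of 5 reads fit, i.e. 0.6) would not, which matches the decision to confine the cap to details and contacts.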

Posts and reels deliberately keep the full cap. They extract media nodes from the same document and those sit later: capping them measured 2 refetches in 5 reads, which is slice plus full page — the same trap already recorded on the post page at 250 KB. Confining the cap to the mode that was measured moved that scenario from 2/5 refetches back to 0/1.
documentReads / documentRefetches are now in the diagnostics, so the ratio that justifies this cap keeps being checked instead of assumed.

Measured and rejected — the HTTP 400 is account-side, not header-side

Proposal: the 400 is Instagram failing to serialise the account, and which fields get serialised depends on which app we present as — so a different X-IG-App-ID might route around it. Probed on the four profiles that bail on the double-400, result discarded:
```
appid_web_legacy (1217981644879628)   http_400 x4
appid_lite       (567067343352427)    http_400 x4
```
Zero resolutions. The response serialiser is not what varies, so no header lever exists and the public document really is the only remaining path for these accounts. The probe cost 8 tiny requests (+10 KB on the run).

Measured and rejected — the logged-out search lookup is a trap

Proposal: replace the ~704 KB public document with a cheap search lookup for the profiles whose web_profile_info returns 400. The idea looked plausible because search_targets labels www's topsearch as the public variant, and a cookie'd run resolved a profile there in 7 101 B. It was probed on 5 real logged-out profiles with the result discarded, so the runs behaved normally:

resolved                       0 of 5
www /web/search/topsearch/     5 attempts, all 4xx,     210 B
i.instagram /users/search/     5 attempts, all 4xx, 104 580 B
www /api/v1/users/search/     10 attempts, no 4xx, 4 013 431 B

The last row is the trap: logged-out that endpoint answers 200 with a ~400 KB login wall instead of an error, so the "cheap" lookup would have added ~800 KB per failing profile on top of the document it was meant to replace. The 7 KB figure only holds with a cookie. The ig.is_logged_in gate stays, now with the measurement recorded next to it.

Reverted — identity rotation is not the lever (measured negative result)

The hypothesis: the low logged-out success rate came from identity reuse — rotation happened only between rounds and on explicit blocks, so plain 4xx refusals retried the second host from the exit IP that had just refused us, and seed_session warmed only the first persona. Giving every attempt its own freshly-rotated and seeded identity changed nothing: success rate 13% -> 12%, document fallbacks 4 of 8 both times, blocks 9 -> 13, and the run got 20% slower from the extra seed round-trips. Reverted, and written down in tests/test_lookup_identity_rotation.py so it is not re-attempted.

Replaces every count-based guard against burning money with one that is actually denominated in money.

Changed — the run now keeps a ledger instead of counting requests

_run_is_futile, _FUTILITY_MIN_ATTEMPTS (120 requests) and _FUTILITY_MIN_TARGETS (8 targets) are gone, along with the breaker's logged-out binding. Both floors were proxies for cost that could not be calibrated: measured on real runs, one request costs between 15 B (a gated web_profile_info reply) and 707 135 B (a profile document), so the same "120 requests" trigger meant anywhere from $0.000003 to $0.14 — a 47 000x spread. New src/ledger.py tracks the two numbers that matter and takes both from measurement, not estimation:
- earned — ChargingManager.calculate_total_charged_amount(), the platform's own accounting, which includes the implicit apify-default-dataset-item event the SDK charges on every push (verified in apify/storage_clients/_ppe_dataset_mixin.py), i.e. today's live revenue;
- spent — real wire bytes from libcurl's SIZE_DOWNLOAD_T (the compressed size the residential proxy actually bills) plus memory-seconds.
The rule is budget = earned + stake x targets_not_yet_finished - spent, and the properties that used to be hand-built fall out of it: a one-target run is bounded at a single stake (the case the 8-target floor structurally could not reach), a large run is not abandoned because its first targets were duds (50 pending targets carry 50 stakes — the old floor's real job, at a false-abort risk that reached 12% on a dud-heavy list), and a productive run can never be cut because earned grows with every charged row. Enforcement sits at IgClient._consume_http_attempt, the single chokepoint every network call passes through, and winds down gracefully exactly like the throttle breaker — nothing already scraped is discarded.
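The budget rule itself is small enough to show in a few lines. The sketch below is illustrative only: `RunLedger`, its field names and the stake value used in the usage note are hypothetical, and the real src/ledger.py takes `earned` and `spent` from the platform and from libcurl rather than from plain attributes.

```python
from dataclasses import dataclass


@dataclass
class RunLedger:
    stake_usd: float          # allowance granted per target that has not finished yet
    earned_usd: float = 0.0   # revenue already charged for pushed rows
    spent_usd: float = 0.0    # proxy bytes plus memory-seconds, priced in USD

    def budget(self, targets_not_yet_finished: int) -> float:
        # budget = earned + stake x targets_not_yet_finished - spent
        return (
            self.earned_usd
            + self.stake_usd * targets_not_yet_finished
            - self.spent_usd
        )

    def may_spend(self, targets_not_yet_finished: int) -> bool:
        """True while the run still has money left to risk on another request."""
        return self.budget(targets_not_yet_finished) > 0
```

Assuming a stake of one cent, a one-target run that has spent $0.011 without earning anything is stopped, a 50-target run that has spent $0.20 on early duds still has $0.30 of allowance, and a run whose earnings outpace its spending never reaches zero.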

Added — real wire bytes, so cost stops being a guess

HttpStats.wire_bytes records what each response cost on the wire, next to the decompressed bytes used for attribution. Measured compression on the dev runs came out at 4.39x / 4.83x / 5.81x, retiring the standing "divide by about 5 or 6" assumption. A byte-capped read that libcurl aborts has no response to read a wire size from, so its decompressed size is converted with the run's own observed ratio rather than a constant.
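Converting an aborted read with the run's own ratio might look like the sketch below. `WireByteEstimator` is a hypothetical name, and the fallback ratio used before any response has been observed is an assumption of this example; the changelog does not say what happens in that case.

```python
class WireByteEstimator:
    """Tracks observed compression and estimates wire size for reads that
    have no measured wire size (e.g. a byte-capped read aborted mid-transfer)."""

    def __init__(self, fallback_ratio: float = 5.0) -> None:
        # Assumption: used only until the run has observed a real response.
        self._fallback_ratio = fallback_ratio
        self._wire = 0
        self._decompressed = 0

    def record(self, wire_bytes: int, decompressed_bytes: int) -> None:
        """Record a completed response with both sizes known."""
        self._wire += wire_bytes
        self._decompressed += decompressed_bytes

    @property
    def observed_ratio(self) -> float:
        if self._wire == 0:
            return self._fallback_ratio
        return self._decompressed / self._wire

    def estimate_wire(self, decompressed_bytes: int) -> int:
        """Estimate the wire size of a read whose real wire size is unknown."""
        return round(decompressed_bytes / self.observed_ratio)
```

Weighting the ratio by bytes rather than averaging per-response ratios keeps one tiny, poorly compressed 400 reply from skewing the estimate for a 700 KB document.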
SUPPORT_DIAGNOSTICS.http.ledger carries the full balance — earned, spent, proxy vs compute split, allowance, wire bytes, observed compression. This is the data that tightens the stake later.
The wind-down now records why: run_budget_spent versus throttle_circuit_open, so a money stop is distinguishable from a throttling stop in the diagnostics.

Verified

Ledger accounting vs the platform's settled usageTotalUsd on three dev runs: 0.75-0.90x. It is a deliberate lower bound — it does not model dataset/KVS operations or the container start that precedes it. See the accuracy table in src/ledger.py.
The guard is disabled unless the run has pay-per-event pricing, so local and dev runs are never cut; all three dev runs confirmed ledger.enabled=false and ran untouched. That also means live enforcement is covered by unit tests (tests/test_run_ledger.py, 17 cases) rather than by a platform run — the dev actor is not monetised, so there is nothing there to weigh spending against.

Fixed — two accounting errors in the ledger itself

Revenue was counted at the list price. calculate_total_charged_amount() reports what the user is billed; Apify keeps apifyMarginPercentage (0.2), so the budget was 25% more generous than the money that actually arrives. Scaled by APIFY_REVENUE_SHARE.
A free-plan user's run is no longer guarded at all. Platform unit costs depend on the end user's discount tier, not ours, and "platform usage by FREE tier users is covered by Apify and does not contribute to your costs". Such a run costs nothing while still earning the free-tier price, so constraining it would be strictly wrong. The tier is recognised from the resolved per_event_prices (our free-tier price is the highest in each event's ladder, so a match is unambiguous). Paid tiers differ by at most 15% (BRONZE $0.20/CU + $8/GB, SILVER $0.16 + $7.5, GOLD $0.13 + $7), which is immaterial against a stake in whole cents, so everything paid is priced at BRONZE, the most expensive tier and therefore the conservative direction.
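Both corrections come down to a few lines of arithmetic. In the sketch below the function names, the event name and the price ladder are hypothetical; the margin and the BRONZE unit prices are the figures quoted above.

```python
from typing import Dict, List

APIFY_MARGIN = 0.2                      # apifyMarginPercentage, as quoted above
APIFY_REVENUE_SHARE = 1 - APIFY_MARGIN  # fraction of the list price that arrives

BRONZE_USD_PER_CU = 0.20  # compute unit price on the BRONZE tier
BRONZE_USD_PER_GB = 8.0   # residential proxy price on the BRONZE tier


def net_revenue(charged_usd: float) -> float:
    """Scale the billed amount down to what is actually received."""
    return charged_usd * APIFY_REVENUE_SHARE


def is_free_tier(resolved: Dict[str, float], ladders: Dict[str, List[float]]) -> bool:
    """Free tier is recognised when every resolved event price equals the
    highest price in that event's ladder (hypothetical data shapes)."""
    return bool(resolved) and all(
        price == max(ladders[event]) for event, price in resolved.items()
    )


def paid_cost_usd(compute_units: float, proxy_gb: float) -> float:
    """Price every paid tier at BRONZE, the conservative upper bound."""
    return compute_units * BRONZE_USD_PER_CU + proxy_gb * BRONZE_USD_PER_GB
```

Counting revenue at list price overstates it by 1 / 0.8 = 1.25, which is where the "25% more generous" figure comes from.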

Removed

tests/test_futility_breaker.py — the thresholds it pinned no longer exist. tests/test_run_ledger.py pins the properties instead, including the two the old breaker could not express: a bounded single-target run, and a decision that does not depend on login state.

Stops paying twice for the most expensive transfer in the actor. A logged-out profile that Instagram gates went from 19 requests / 8.49 MB to 10 requests / 2.12 MB — a −75% cut on the run shape that produces no results at all, and so earns nothing to offset it.

Fixed — the byte cap sat below the documents it was capping

Per-endpoint counters from every real run in load_reports put the logged-out profile document at 706 012 / 706 483 / 707 135 bytes and the post comment page at 690 230 / 709 391 / 739 338 bytes. _HTML_FETCH_CAP_BYTES was 700 000 — i.e. ~1% below both. Every capped read was therefore truncated, and a truncated slice carrying no payload (the login-wall case, where no payload exists anywhere in the document) paid a full refetch of the very same URL. The cap is now 900 KB, above the documents' real size, so truncated is False and the refetch is skipped. Cost when the payload is in the head: +7 KB per document — two orders of magnitude less than what it saves. This is the same trap already measured on _POST_HTML_CAP_BYTES, now closed on the other side.
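The cost of a cap that sits just below the document can be worked through with a small model. This is a simplified sketch (the function name is hypothetical, and it counts decompressed bytes only): a read pays for the capped slice, and a truncated slice with no payload pays for a full refetch on top.

```python
def bytes_paid(doc_bytes: int, cap_bytes: int, payload_in_slice: bool) -> int:
    """Bytes transferred for one document under a byte cap."""
    first_read = min(doc_bytes, cap_bytes)
    truncated = doc_bytes > cap_bytes
    if truncated and not payload_in_slice:
        # The slice was cut short and held nothing usable: refetch in full.
        return first_read + doc_bytes
    return first_read


# Login-wall case (no payload anywhere) for a 706 012 B document:
old = bytes_paid(706_012, 700_000, payload_in_slice=False)  # slice + full refetch
new = bytes_paid(706_012, 900_000, payload_in_slice=False)  # one whole read
```

Under the old 700 000 cap the login-wall document costs 1 406 012 B; under the 900 KB cap it costs 706 012 B. When the payload is in the head, the new cap costs about 6 KB more for this document, in line with the "+7 KB" figure quoted for the slightly larger ones.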

Fixed — the same three documents are no longer bought twice per target

The user-object lookup (_fetch_profile_user_from_html) and the record-emitting fallback (_scrape_profile_from_public_html) each walked the same three profile-document URLs. Both extract with _extract_profile_user_from_blocks — the lookup's predicate is strictly broader (it also accepts a bare id via regex) — so a second download can never find something the first missed; only a different exit IP could. The lookup now hands the documents it downloaded to the fallback (on the raised error, beside the existing .retryable flag), which parses them for free and requests only the URLs it hasn't already got. The other entry point — API resolved the user but the feed returned no posts — never ran the lookup, so it still fetches its own documents exactly as before.
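The handover can be pictured as an exception that carries the already-downloaded bodies. The sketch below is illustrative: `ProfileLookupError`, `lookup_user` and `fallback_record` are hypothetical stand-ins for the actor's own functions, and `download` / `extract` are injected so the example stays self-contained.

```python
from typing import Callable, Dict, List, Optional


class ProfileLookupError(Exception):
    def __init__(self, message: str, *, retryable: bool = False,
                 documents: Optional[Dict[str, str]] = None) -> None:
        super().__init__(message)
        self.retryable = retryable
        # Bodies the lookup already paid for, keyed by URL.
        self.documents = documents or {}


def lookup_user(urls: List[str], download: Callable[[str], str],
                extract: Callable[[str], Optional[dict]]) -> dict:
    documents: Dict[str, str] = {}
    for url in urls:
        documents[url] = download(url)
        user = extract(documents[url])
        if user is not None:
            return user
    raise ProfileLookupError("no user in any document", documents=documents)


def fallback_record(urls: List[str], download: Callable[[str], str],
                    extract: Callable[[str], Optional[dict]],
                    handed_over: Dict[str, str]) -> Optional[dict]:
    for url in urls:
        # Parse a handed-over body for free; download only what is missing.
        body = handed_over[url] if url in handed_over else download(url)
        user = extract(body)
        if user is not None:
            return user
    return None
```

When the fallback is reached through the other entry point (the API resolved the user but the feed was empty), `handed_over` is simply empty and every URL is downloaded as before.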

Fixed — the "second chance" walk ran on the target's most-burnt IP

The fallback started from ctx.session_id, the target's original sticky session — the very first exit IP Instagram had already gated — and only rotated on SessionBlockedError, which a 200 login wall never raises. So the walk that existed to retry on a fresh IP retried on the worst one, and never moved. Meanwhile the lookup's walk ran all three variants on a single persona. Now each document variant is tried on its own fresh persona: same three requests, three distinct exit IPs instead of one, so the recovery chance goes up while the byte count goes down.

Verified on the dev actor (build 2.0.35, 3 runs, $0.011)

verify_handover — the deterministic repro: a reserved handle (/en) makes web_profile_info return HTTP 400 regardless of exit IP, so the lookup always walks into the HTML ladder and always fails it, which is the only path that hands documents over. Result: skipReasons.profile_document_reused = 1 and 3 requests / 601 721 B to www.instagram.com/:profile/ for a target that crossed both ladders — one download per URL variant, where the old code paid 6 at best and 12 when the document exceeded the cap. requeue.attempts stayed 0, so the terminal-failure classification from 2.0.10 still holds.
verify_details_control — 8/8 detail records, 0 failed targets: no data lost to the raised cap or the per-variant persona rotation. Documents averaged 704 306 B/request, i.e. above the old 700 000 cap, so they now arrive whole instead of truncated.
verify_gated_docs — 48 records over 6 profiles, 1 failed. Note for the record: neither failing target reached the handover path (mr_beast resolved via the API and failed later in the feed-empty fallback), so this scenario proves no regression but says nothing about the reuse fix. Live proof of the cap half is still indirect — it is covered by tests/test_profile_document_reuse.py and tools/cost_repro.py, since demonstrating it on Instagram would need the same noisy A/B the block variance makes worthless.

Added — reproduction harness

tools/cost_repro.py replays candidate run shapes through the real _run_target / _run_targets_bounded path against a scripted Instagram at the IgClient._session seam, counting requests and bytes with the actor's own HttpStats and pricing them with Apify's model (compute at 512 MB + $8/GB residential vs the live per-event price). --breakdown also dumps the run's exit-IP schedule, which is how the two bugs above were found.

[2.0.10] - 2026-07-25

Stops wasting proxy on futile whole-target retries, and starts measuring whether those retries ever pay off.

Changed — a dead/gone profile is no longer retried on fresh IPs

When a profile yields no data logged-out, the scraper already rotates through several exit IPs within that target (fresh bootstrap sessions + HTML docs). A whole-target requeue on top of that just repeats the same spent discovery on one more IP for the same zero. Such a failure is now terminal when it's deterministic (empty web_profile_info, HTTP 400, "no data", or a public page with no payload) — the URL fails once instead of three times, cutting proxy and time on misspelled/deleted/reserved handles. A failure caused by a real block or transient stays retryable (a fresh IP genuinely may clear it), so no recoverable result is lost. Within-target IP rotation is unchanged.

Added — requeue recovery is now measured

The support diagnostics now include a requeue block — {attempts, recovered, recoveryRate} — recording how many fresh-IP whole-target requeues ran and how many actually produced results. This turns "do retries help?" into a number, so the remaining (block-caused) requeues can be tuned from evidence rather than assumption.
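A minimal sketch of the two pieces in this release: deciding whether a failure is terminal, and counting whether requeues pay off. The reason labels and class name are hypothetical, and reporting a recovery rate of 0.0 when no requeue ran is an assumption of this example.

```python
from dataclasses import dataclass

# Hypothetical labels for the deterministic failures listed above.
TERMINAL_REASONS = frozenset(
    {"empty_web_profile_info", "http_400", "no_data", "public_page_no_payload"}
)


def is_terminal(reason: str) -> bool:
    """Deterministic failures fail once; blocks and transients stay retryable."""
    return reason in TERMINAL_REASONS


@dataclass
class RequeueStats:
    attempts: int = 0
    recovered: int = 0

    def record(self, produced_results: bool) -> None:
        """Count one fresh-IP whole-target requeue and whether it paid off."""
        self.attempts += 1
        if produced_results:
            self.recovered += 1

    def as_dict(self) -> dict:
        rate = self.recovered / self.attempts if self.attempts else 0.0
        return {
            "attempts": self.attempts,
            "recovered": self.recovered,
            "recoveryRate": rate,
        }
```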

[2.0.9] - 2026-07-25

Closes the "spend proxy, deliver nothing" gap with several layered, provably-safe guards — none of which can cut short a run that is producing data — plus a crash fix for Contacts runs that fell back to public HTML.

Fixed — Contacts no longer crashes on the public-HTML fallback (critical)

A Contacts run whose profile didn't resolve via the API (e.g. a short or reserved handle like /en, or a business account that returns HTTP 400) fell back to the public profile document, where a stale guard rejected the contacts result type and aborted that URL with the error below, even though the fallback already knows how to build a contacts record:
```
unknown resultsType: contacts
```
The guard now accepts contacts, so such URLs resolve to their real outcome (a contact record when the profile is public, or an accurate "no data" when the page genuinely doesn't exist) instead of a misleading crash.

Added — a dead/gone target no longer burns fresh-IP retries

Permanently-gone profiles fail fast. Instagram's authoritative HTTP 404 ("user not found", e.g. a deleted or misspelled username) now raises a terminal error: the target fails once, with no fresh-IP requeue and no public HTML fallback, since neither can resurrect an account that doesn't exist. Stochastic logged-out gates stay retryable (a fresh IP still recovers those), so no gettable result is lost.

The startup cookie check proved the session was authenticated but not that it could read the login-only data the run needs. A live session can still answer require_login on relationship/story endpoints. Login-only runs now run a second capability probe against the endpoint matching the run's need (followers/following → own following list; stories/highlights → own reels tray). It condemns a cookie only on the same require_login/checkpoint signal the live scrape uses — never on a throttle or action-block — so a good cookie is never false-flagged, and a blind one is caught up front instead of per URL.

Added — a logged-out run that produces nothing stops early

A logged-out run that has spent real proxy across several completed targets yet saved zero rows is asking for login-only/private data it can't get logged-out. It now winds down gracefully with an actionable message. The zero-rows gate means a run that produces anything is never affected; a dual floor (requests and completed targets) prevents misfiring on a slow start.

Choosing a login-only result type (stories, highlights, followers, following, likers) without pasting a cookie now prints a clear notice at the start — every target would otherwise be silently skipped — telling you exactly how to fix it (paste a Cookie header, or pick a public result type).

[2.0.7] - 2026-07-24

A large reliability + polish release: fewer blocks on anonymous scraping, a critical cookie-validation fix, a cleaner Output tab, and much shorter logs.

Cookie validation used the wrong endpoint. The startup probe hit the mobile-app endpoint accounts/current_user, which rejects this scraper's web browser fingerprint (web X-IG-App-ID + browser User-Agent) with HTTP 400 useragent mismatch for every cookie — valid or not. Perfectly good credentials were flagged dead and the run fell back to logged-out, so every login-only mode (followers, following, likers, tagged, stories, highlights, hashtag, location, search) returned zero results. The probe now uses the web-native direct_v2/get_badge_count endpoint. Verified live: all nine cookie modes return data.
The probe no longer condemns a cookie on an ambiguous reply. Only a genuine require_login / login_required / checkpoint_required auth marker (or a logged-out payload) marks the cookie dead; a stray 4xx, UA quirk, 5xx, or network blip stays inconclusive and the cookie is assumed usable. A truly dead cookie is still caught on the first real login-gated request.
Gated profile fetches requeue on a fresh IP. A logged-out/age-gated profile response now raises a retryable error so the worker retries on a new exit IP, instead of silently emitting zero for that URL.

Added — quieter, harder-to-detect anonymous scraping

Per-session device identity (persona pool). Each proxy session now carries a stable persona — a browser fingerprint paired with a salted proxy session id — instead of one shared identity for the whole run. Personas are pooled (LRU), reused coherently, and proactively rotated after ~20 requests so no single identity accumulates suspicion. This replaces the old blunt device rotation.
Rate-based circuit breaker. The run watches its recent block rate over a rolling 40-outcome window and trips gracefully when blocks dominate (≥75 % over ≥25 samples), ending the run cleanly instead of hammering a hopeless IP. It resets on success, so a run that recovers keeps going.
Adaptive concurrency governor (AIMD). Concurrency starts at the configured max, halves the moment a block is seen, then climbs back one step per clean window — automatically finding the fastest rate that Instagram tolerates for each run (speed and fewer blocks).
Barren-page give-up. Pagination stops after too many empty/blocked pages instead of grinding the budget to zero.
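The breaker and the governor described above can be sketched as two small classes. The window, sample and threshold figures come from the notes above; everything else is an assumption of this example, including the class names, the length of a "clean window" (10 successes here) and the floor of 1. How "resets on success" interacts with the rolling window is not specified, so the breaker below simply recomputes the rate on every check.

```python
from collections import deque


class BlockRateBreaker:
    def __init__(self, window: int = 40, min_samples: int = 25,
                 threshold: float = 0.75) -> None:
        self._outcomes = deque(maxlen=window)  # True = blocked, False = ok
        self._min_samples = min_samples
        self._threshold = threshold

    def record(self, blocked: bool) -> None:
        self._outcomes.append(blocked)

    def tripped(self) -> bool:
        n = len(self._outcomes)
        if n < self._min_samples:
            return False  # too few samples to judge
        return sum(self._outcomes) / n >= self._threshold


class AimdGovernor:
    """Additive increase, multiplicative decrease of the concurrency limit."""

    def __init__(self, max_concurrency: int, clean_window: int = 10) -> None:
        self._max = max_concurrency
        self._clean_window = clean_window  # assumed size of a "clean window"
        self._clean = 0
        self.limit = max_concurrency  # starts at the configured max

    def on_block(self) -> None:
        self.limit = max(1, self.limit // 2)  # halve, never below 1
        self._clean = 0

    def on_success(self) -> None:
        self._clean += 1
        if self._clean >= self._clean_window:
            self.limit = min(self._max, self.limit + 1)  # climb one step
            self._clean = 0
```

Halving on a block and climbing one step at a time is what lets the limit settle near the highest rate that does not trigger blocks, rather than oscillating between the extremes.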

Changed

Coherent request host. www.instagram.com/api/v1 is now the primary host (coherent with the browser fingerprint the client presents); i.instagram.com is only a fallback. Fewer logged-out blocks.
Removed the block-cap / global-budget "crutches". The hard per-run block cap is gone in favour of the circuit breaker above; the global rate-limit budget is now a soft backstop with its default raised 50 → 300.
One unified "📋 Results" Output view. The per-mode dataset views are replaced by a single curated Results view. This fixes cross-filled columns (e.g. comment fields appearing on a Posts run because both shared an id field) and adapts to each mode automatically — Apify hides all-empty columns. Niche/technical fields still live in the auto-generated "All fields" tab.
images lists distinct images, not size variants of one image (carousel children, or the single display image). Size variants moved to displayResourceUrls.
Much shorter, readable logs and run status. Status messages and log lines are trimmed to concise one-liners; verbose diagnostics were demoted to debug.
Gentler feed pacing between paginated requests.

Removed

Audio-track scraping (/reels/audio/<id>/). This target type was unreachable through the mode-driven input flow and has been removed along with its dataset/input-schema mentions. Reel audio metadata (musicInfo, hasAudio, audioId, …) is unaffected and still emitted on reel records.

[2.0.6] - 2026-07-23

Invalid/expired cookies are detected up front, not per target. A single cheap accounts/current_user probe runs once when cookies are supplied. If the cookie is dead/expired/checkpointed the run flips to logged-out: login-only targets are skipped before scraping and the private-profile short-circuit re-enables. Previously an unusable cookie stayed undetected and every login-gated URL rediscovered it the hard way.
Auth failures no longer trigger a proxy-rotation storm. A require_login / checkpoint_required response while a cookie is installed is a dead credential, not a blocked IP — rotating the exit IP can't fix it. It now raises a distinct CookieInvalidError that bails the target immediately (no rotation, no requeue) and marks the run's login dead so other login-only targets short-circuit without a single request. Genuine logged-out IP blocks still rotate exactly as before. Measured on a dead-cookie Followers run: proxy-session rotations 20 → 0, bytes downloaded 7.3 MB → 0.4 MB, wall time 83 s → 10 s, all for the same (zero) billable results — pure cost removed.

[2.0.1] - 2026-07-23

Fixed — speed & clarity

Private profiles are ~10x faster. A private account never serves posts to a logged-out session, yet the scraper used to hammer its feed (retries + block budget) and then fetch its full HTML before giving up. It now short-circuits to "no posts" the moment web_profile_info reports the profile is private (unless cookies are present). On a 4-profile test with 2 private accounts, wall time dropped from ~110s to ~11s and Instagram blocks from 6 to 1.
Faster recovery from rate-limits. A retry already rotates to a fresh proxy IP, so the "polite" 3s backoff on the burnt IP just stalled the run — a fresh residential IP is rarely still throttled. The fallback backoff is now ~1s (an explicit Retry-After from Instagram is still honoured), so runs that hit Instagram throttling on bad exit IPs finish noticeably quicker.
Honest result count. A profile that resolves but returns nothing (private / empty / typo'd username) is now reported: "saved N records from X of Y URLs — Z returned no data …" instead of a misleading "from Y URLs".
Friendlier throttle note. The end-of-run block-budget line only appears when Instagram actually throttled the run, and now nudges toward adding a cookie for faster, block-free runs.
No "unknown event" warning during a pricing change. Charging is skipped for events not yet in the run's active pricing (e.g. new events inside Apify's 14-day notice window), so the SDK no longer logs a scary warning per result.

[2.0.0] - 2026-07-22

Changed — buyer-facing docs rewritten for the mode-centric model

README fully rewritten for non-technical buyers (Apify Store style): capability tables per mode, 3-step how-to, output samples (post + Contacts record), cookie guide, use-case snippets, pay-per-result pricing, speed table, and FAQ (legality, private accounts, scheduling, API). All examples use the new input (resultsType mode + directUrls + resultsLimit); the old publicUrls/lockedUrls model and hidden knobs (repliesLimit, maxHttpRequests, overall cap) are gone.
Store description (actor.json) updated to the all-in-one mode list.
Dataset schema: added Contacts fields (emails, phones, links, bioMentions), user-list fields (isPrivate, isVerified) and story fields (expiresAt, highlightId), plus new Contacts and Followers/Following/Likers dataset views. Output-schema description now lists every record type.

The input is now mode-authoritative. A single "What do you want to scrape?" dropdown (resultsType) is the mode; it decides both what the run produces and how every line in the one "Instagram URLs or usernames" field is read. A Followers run reads lines as usernames, a Hashtag run as tags, a Comments run as post URLs, and so on. Lines that don't fit the chosen mode are skipped with a note (surfaced in the run log + input warnings). Search mode ignores the URL field and uses the Search field instead. New src/modes.py holds the mode registry + per-mode line interpreters; the scraper engine and URL classifier are unchanged (the mode simply builds correctly-typed targets).
Modes exposed: profile details, contacts, profile posts, profile reels, post/reel detail, post comments 🔑, post likers 🔑, followers 🔑, following 🔑, tagged 🔑, stories 🔑, highlights 🔑, hashtag 🔑, location 🔑, search 🔑 (🔑 = needs your Instagram login cookie). Bare usernames, @handles, #tags and plain location ids all work now (the field's strict URL pattern was relaxed — the mode validates each line instead).

Added — lean Contacts fetch (logged-in only)

With cookies, Contacts mode fetches the compact users/{username}/usernameinfo/ endpoint (~10x less data than web_profile_info, which embeds the 12 latest posts + related profiles). Logged-out it's skipped — that endpoint serves a login wall, so trying it would add bandwidth; verified logged-out Contacts stays ~58 KB/record. This corrects an earlier misdiagnosis: the profile-fetch bandwidth is web_profile_info's rich payload, not a session seed (the guest qe/sync seed isn't even called for profile-only runs). New _fetch_profile_user_compact + _profile_shape_from_mobile.

Fixed — short-timeout runs no longer abort with 0 records

The soft time-budget reserved a fixed 90s before the hard timeout, so a run with a short timeout (e.g. 90s) got 90 - 90 = 0s of work and aborted before the first request, silently returning 0 records. The flush margin is now capped at 15% of the remaining budget (_TIMEOUT_MARGIN_FRACTION), so every run keeps ≥85% of its time; the full 90s reserve still applies once the budget exceeds ~600s. Verified on the platform: a 90s-timeout run now returns records.
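The margin arithmetic can be checked with a short sketch. The 15% fraction and the 90s reserve are the figures above; the function names and `_FLUSH_RESERVE_S` are hypothetical, and applying the fraction to the whole timeout (rather than to whatever remains at a given moment) is a simplification.

```python
_TIMEOUT_MARGIN_FRACTION = 0.15  # share of the budget reserved for flushing
_FLUSH_RESERVE_S = 90.0          # hypothetical name for the original fixed reserve


def flush_margin(budget_s: float) -> float:
    """Reserve the fixed 90s, but never more than 15% of the budget."""
    return min(_FLUSH_RESERVE_S, budget_s * _TIMEOUT_MARGIN_FRACTION)


def work_budget(budget_s: float) -> float:
    """Seconds left for actual scraping after the flush margin."""
    return budget_s - flush_margin(budget_s)
```

A 90s timeout now leaves about 76.5s of work instead of 0s, and the full 90s reserve is reached at 90 / 0.15 = 600s, which is where the "~600s" crossover comes from.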

Added — Contacts (lead-gen) mode

contacts mode emits a lean lead-gen record per profile: emails and phones mined from both the structured business fields and the free-text bio (where most creators actually list them), plus every outbound link (external URL + bio links). Billed as detail-scraped. New parsers.contacts_from_profile, extract_emails, extract_phones.
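Mining contacts from both structured fields and free-text bios might look roughly like this. The regular expressions and the `find_emails` / `find_phones` helpers are assumptions made for illustration, not the actor's extract_emails / extract_phones, and real phone matching needs far more care than this pattern provides.

```python
import re
from typing import List, Optional

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Deliberately loose: an optional "+", then at least nine digits with
# spaces, dots, dashes or parentheses allowed in between.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def _unique(items: List[str]) -> List[str]:
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


def find_emails(*texts: Optional[str]) -> List[str]:
    found: List[str] = []
    for text in texts:
        found += [match.lower() for match in EMAIL_RE.findall(text or "")]
    return _unique(found)


def find_phones(*texts: Optional[str]) -> List[str]:
    found: List[str] = []
    for text in texts:
        for raw in PHONE_RE.findall(text or ""):
            digits = re.sub(r"\D", "", raw)  # keep digits only
            found.append(("+" if raw.startswith("+") else "") + digits)
    return _unique(found)
```

Passing the structured business field and the bio together, for example `find_emails(business_email, bio)`, deduplicates an address that appears in both places.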

Fixed — business accounts no longer fail

Profile resolution survives Instagram's web_profile_info HTTP 400. Instagram returns 400 ("Asset ig_business_category_subvertical has been deleted") for many business/professional accounts, which used to sink every profile-based result (details, followers, following, stories, highlights, and the profile feed) and waste bandwidth via retries. The 400 is now soft-failed and the user id is resolved through layered fallbacks: users/search / web topsearch (cookie'd), then a regex pull of the id straight from the public profile HTML. Verified on the platform: business accounts (e.g. natgeo) that failed outright now resolve and return data.

Added — cost telemetry

Downloaded-bytes tracking per endpoint + run total, surfaced in SUPPORT_DIAGNOSTICS (http.bytesDownloaded, summary.bytesPerRecord). Lets us calibrate the real per-result-type proxy cost for cost-plus pricing. (These are decompressed body bytes; the proxy bills the ~5-8× smaller compressed wire size.)

Changed — pricing (pay-per-event, per result type)

Different result types now cost different amounts. Instead of one flat charge per saved row (which billed a cheap comment or bare follower like a full post), the actor charges an explicit pay-per-event event chosen by the run's resultsType: post-scraped (posts/reels), detail-scraped (details), story-scraped (stories/highlights), comment-scraped (comments), user-scraped (followers/following/likers). See src/billing.py for the suggested prices.
Owner action required to activate: in Apify Console → Monetization → Pay per event, add those five events with your prices and remove apify-default-dataset-item (otherwise the implicit per-row charge stacks on top). Until then charging is a safe no-op — nothing changes for local/dev runs or unmonetised deployments.
The run also stops early if maxTotalChargeUsd is reached, and SUPPORT_DIAGNOSTICS records which pricingEvent the run charged.
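
For orientation, the event selection described above amounts to a lookup from resultsType to event name. The table below is a sketch built from this entry, not a copy of src/billing.py; the "contacts" key is an assumption (the Contacts entry only says the mode is billed as detail-scraped), and the price used in the usage note is illustrative.

```python
from decimal import Decimal

# Mapping taken from the changelog text; src/billing.py may differ in detail.
PRICING_EVENT_BY_RESULTS_TYPE = {
    "posts": "post-scraped",
    "reels": "post-scraped",
    "details": "detail-scraped",
    "contacts": "detail-scraped",   # assumed key name
    "stories": "story-scraped",
    "highlights": "story-scraped",
    "comments": "comment-scraped",
    "followers": "user-scraped",
    "following": "user-scraped",
    "likes": "user-scraped",
}


def pricing_event(results_type: str) -> str:
    """Return the pay-per-event name charged for a given resultsType."""
    try:
        return PRICING_EVENT_BY_RESULTS_TYPE[results_type.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown resultsType: {results_type!r}") from None


def affordable_events(max_total_charge_usd: str, price_usd: str) -> int:
    """How many events fit under maxTotalChargeUsd at a given price."""
    return int(Decimal(max_total_charge_usd) // Decimal(price_usd))
```

At an illustrative price of $0.0025 per event, a maxTotalChargeUsd of 1.00 covers 400 events before the run stops. Decimal is used so the division is not thrown off by binary floating point.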

Added — relationship & list data

Followers / Following — pick resultsType: followers or following on a profile URL (or bare username) to list a user's followers / who they follow (id, username, full name, verified, private, avatar). Paginated, login-gated.
Post likers — resultsType: likes on a post URL lists who liked it.
Highlights (all) — resultsType: highlights on a profile fetches the whole highlights tray and each highlight's items, tagged with highlightId, highlightTitle, and highlightCoverUrl.
All four are login-only (need your Instagram cookies) and are marked cookie-required automatically. Note that follower / following / likers endpoints are aggressively rate-limited by Instagram — keep limits modest.

Changed — simpler, unified input

One input field for everything. publicUrls + lockedUrls are merged into a single Instagram URLs or usernames field — the scraper classifies each entry (profile / post / reel / hashtag / location / audio / tagged / reels-tab / stories / highlight) and auto-detects which need cookies, telling you which URLs were skipped when cookies are missing. (Old publicUrls / lockedUrls / directUrls inputs still work for saved tasks and the API.)
Bare usernames accepted. Paste nasa or @nasa instead of the full profile URL — expanded automatically.
resultsType: stories — pick Stories on a profile URL to grab that user's active stories (in addition to the existing /stories/<user>/ URL). Always needs cookies.
Fewer knobs, better defaults out of the box. The technical tuning fields (concurrency, requestRetryBudget, urlRetryBudget, globalRateLimitBudget, maxHttpRequests, dedupResults, skipPinnedPosts, repliesLimit, proxyCountry) are removed from the visible input and tuned automatically — the UI now shows only what you actually choose (what to scrape, what to extract, how many, cookies, search, date filters). All of them remain overridable via the API for power users.
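
The classification step can be sketched as a username expansion followed by a path match. Everything below is hypothetical (function names, patterns, and the choice to mark only the stories and highlight kinds as cookie-required, which is what the Stories & Highlights entry states); the actor's own classifier handles more cases.

```python
import re
from urllib.parse import urlsplit

_USERNAME_RE = re.compile(r"^@?([A-Za-z0-9._]{1,30})$")

# Order matters: specific paths are tested before the catch-all profile pattern.
_KIND_PATTERNS = [
    ("post", re.compile(r"^/p/[^/]+/?$")),
    ("reel", re.compile(r"^/reel/[^/]+/?$")),
    ("hashtag", re.compile(r"^/explore/tags/[^/]+/?$")),
    ("location", re.compile(r"^/explore/locations/[^/]+(?:/[^/]+)?/?$")),
    ("audio", re.compile(r"^/reels/audio/[^/]+/?$")),
    ("highlight", re.compile(r"^/stories/highlights/[^/]+/?$")),
    ("stories", re.compile(r"^/stories/[^/]+/?$")),
    ("tagged", re.compile(r"^/[^/]+/tagged/?$")),
    ("reels-tab", re.compile(r"^/[^/]+/reels/?$")),
    ("profile", re.compile(r"^/[^/]+/?$")),
]
COOKIE_REQUIRED_KINDS = {"stories", "highlight"}


def expand_entry(entry: str) -> str:
    """Turn 'nasa' or '@nasa' into a profile URL; add a scheme if missing."""
    entry = entry.strip()
    match = _USERNAME_RE.match(entry)
    if match:
        return f"https://www.instagram.com/{match.group(1)}/"
    return entry if "://" in entry else "https://" + entry


def classify(entry: str) -> str:
    """Return the URL kind for one input entry, or 'unknown'."""
    path = urlsplit(expand_entry(entry)).path
    for kind, pattern in _KIND_PATTERNS:
        if pattern.match(path):
            return kind
    return "unknown"


def needs_cookies(entry: str) -> bool:
    return classify(entry) in COOKIE_REQUIRED_KINDS
```

One known gap in this sketch: a bare hostname such as instagram.com also looks like a username, so a real classifier would need to rule that out first.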

Improved — speed (no loss of results)

Comment post page is fetched byte-capped. The comment-bootstrap page (the media id + relay tokens + preview comments) is now read with the same decompressed-byte cap + full-read fallback the single-post path already uses, so the tail of a 1 MB+ page isn't transferred when the top slice already carries the media id. Falls back to a full read only when the slice was truncated and yielded no media id — no data lost. (Response compression was already on everywhere via the Chrome impersonation.)
Comments query is probed once per run, not once per post. The working GraphQL comment query is cached on the client; every post after the first skips the multi-candidate probe (up to ~6 fewer requests per post), and the comment page size was raised 24 → 50 (fewer round-trips). Per-page fallback still switches candidates if the cached one ever stops working.
Single-post HTML fetch overlaps oEmbed + profile resolution. The public post page needs neither, so it now starts immediately and races the profile path instead of waiting for the two sequential lookups — posts resolve 1–2 round-trips faster. Owner-detail enrichment from the profile is backfilled onto an HTML win, so nothing is lost; the only cost is an occasional wasted fetch when the profile fast-path would have sufficed.
Hashtag / location recent + top tabs paginate in parallel. The two section feeds now run concurrently, each on its own proxy exit IP, sharing one atomic results-limit counter and dedup set — roughly halving wall-clock for cookie'd hashtag/location runs without exceeding the limit or emitting duplicates.
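
The parallel pagination in the last item relies on both feeds sharing one limit counter and one dedup set. A minimal asyncio sketch of that idea follows; the class, the fake feeds, and all names are invented for illustration, and the real feeds are HTTP pagination loops rather than in-memory lists.

```python
import asyncio
from typing import AsyncIterator, Iterable


class SharedLimit:
    """One results limit and one dedup set shared by several feed walkers."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.seen: set[str] = set()
        self.results: list[str] = []

    @property
    def full(self) -> bool:
        return len(self.results) >= self.limit

    def try_accept(self, shortcode: str) -> bool:
        # No await inside, so on a single event loop this check-and-add
        # cannot interleave with another walker's.
        if self.full or shortcode in self.seen:
            return False
        self.seen.add(shortcode)
        self.results.append(shortcode)
        return True


async def fake_feed(pages: Iterable[list[str]]) -> AsyncIterator[list[str]]:
    """Stand-in for a paginated section feed."""
    for page in pages:
        await asyncio.sleep(0)  # yield control, as a network call would
        yield page


async def walk(feed: AsyncIterator[list[str]], shared: SharedLimit) -> None:
    async for page in feed:
        for shortcode in page:
            shared.try_accept(shortcode)
        if shared.full:
            break


async def collect(limit: int) -> list[str]:
    shared = SharedLimit(limit)
    recent = fake_feed([["a", "b"], ["c", "d"]])
    top = fake_feed([["b", "x"], ["y", "z"]])
    await asyncio.gather(walk(recent, shared), walk(top, shared))
    return shared.results
```

Running collect(5) returns exactly five distinct shortcodes even though the two feeds overlap and together hold seven.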

Added — Stories & Highlights

Scrape a user's active Stories via instagram.com/stories/<username>/ and a Highlight reel via instagram.com/stories/highlights/<id>/. Both are new cookie-required URL kinds (login-gated by Instagram) and support the Posts / Reels result types. Items come from feed/reels_media in one request (no pagination); each record carries the usual media fields plus productType: "story", expiresAt, and (for highlights) highlightId, with dataSource/parentUsername/parentHighlightId under Add-source-info.

Fixed

Terminal errors during the single-post race are no longer swallowed. _race_first_success caught every exception, so a run-wide rate-limit / HTTP-budget exhaustion raised while scraping a post fell through to a basic oEmbed record and the run kept burning the remaining targets. Those two terminal errors now propagate (a good result already in the same batch still wins), so the entrypoint aborts as intended.

Added — filters, proxy, observability

onlyPostsOlderThan date filter. Upper date bound to pair with onlyPostsNewerThan for a date window. Feeds are walked newest-first, so too-new posts are skipped while only the lower bound ends the walk. An empty window (newer > older) is rejected up front.
proxyCountry input. Optional 2-letter ISO code to pin the managed Residential proxy exit country.
Per-outcome skipReasons in diagnostics. A low-cardinality tally of why a target yielded nothing (cookie-gated comments, changed comments API, unreachable post page, …), distinct from the per-endpoint HTTP counters, surfaced in SUPPORT_DIAGNOSTICS.
Human-readable RUN_REPORT HTML artifact. A self-contained (no external requests), XSS-safe page written to the KV store alongside the JSON bundle — per-URL results, skip reasons, HTTP stats, and input warnings at a glance.
CI workflow. GitHub Actions runs the unittest suite and a narrow pyflakes/syntax lint (ruff --select F,E9) on every push/PR.
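
The date-window walk from the first item above can be sketched like this. It is an illustration under assumptions: the function name and the record keys (timestamp, isPinned) are hypothetical, both bounds are treated as inclusive, and pinned posts are skipped rather than allowed to end the walk, in line with the pinned-post fix recorded elsewhere in this changelog.

```python
from datetime import datetime, timezone
from typing import Optional


def filter_window(
    posts: list[dict],
    newer_than: Optional[datetime] = None,
    older_than: Optional[datetime] = None,
) -> list[dict]:
    """Keep posts inside [newer_than, older_than] from a newest-first feed."""
    if newer_than and older_than and newer_than > older_than:
        raise ValueError("empty date window: lower bound is after upper bound")
    kept = []
    for post in posts:
        ts = post["timestamp"]
        if older_than and ts > older_than:
            continue      # too new: skip it and keep walking
        if newer_than and ts < newer_than:
            if post.get("isPinned"):
                continue  # pinned posts can be old yet come first
            break         # newest-first feed: nothing newer follows
        kept.append(post)
    return kept
```

Only the lower bound ends the loop; the upper bound merely skips, because posts inside the window still follow the too-new ones.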

Changed — internal

Input coercion/validation helpers moved out of main.py into src/input_coercion.py (re-exported for callers/tests); the entrypoint stays focused on orchestration.

Fixed — silent data loss

Unicode thousands-separators no longer null out counts. The output sanitiser stripped only ASCII spaces/commas from numeric strings, so a non-break / narrow-no-break / thin space (which Instagram embeds, e.g. 1 234) survived and made int() fail — silently dropping likesCount, followersCount, etc. to None. All Unicode spaces + BOM are now stripped.
Dropped records are no longer permanently deduped out. A shortcode was added to the dedup set before the push; if the record was malformed or its batch failed to flush, the post was marked "seen" forever and could never be re-emitted from another target. The reservation is now released on both paths (and a failed-flush batch no longer counts toward maxTotalResults).
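
The separator fix in the first item can be illustrated with a small coercion helper. The helper name is hypothetical; the relevant facts are that Python's \s matches Unicode whitespace in str patterns (including no-break, thin, and narrow no-break spaces), while the BOM (U+FEFF) is not whitespace and has to be listed explicitly.

```python
import re
from typing import Optional

# \s covers Unicode spaces in str patterns; the BOM is added by hand.
_SEPARATORS_RE = re.compile(r"[\s,\ufeff]")


def coerce_count(value: object) -> Optional[int]:
    """Turn '1 234' / '12,345' style strings into ints, or None if not numeric."""
    if value is None or isinstance(value, bool):
        return None
    if isinstance(value, int):
        return value
    cleaned = _SEPARATORS_RE.sub("", str(value))
    try:
        return int(cleaned)
    except ValueError:
        return None
```

Stripping only ASCII spaces and commas would leave "1\u00a0234" untouched, and int() rejects whitespace inside a number, which is how counts ended up as None.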

Added — networking resilience & recovery

Resumable run state. Dedup set, emitted count, and completed-target set are checkpointed to the KV store on a timer and on Apify PERSIST_STATE, so a mid-run worker migration resumes instead of restarting — no duplicate rows (which would double-charge the dataset-item event) and no re-spent proxy budget.
Soft time-budget. The run now stops gracefully a margin before the Apify hard timeout, so the final dataset flush + status message + SUPPORT_DIAGNOSTICS are always written instead of being lost to a mid-loop SIGKILL.
Bounded input lists. publicUrls / lockedUrls gained a maxItems (UI) plus a runtime cap of 1000 URLs (API / saved-task callers), so an oversized paste can't become a runaway, costly run.
Input adjustments surface to the user. Clamped / retyped / ignored input values now appear as inputWarnings in SUPPORT_DIAGNOSTICS and a note in the final status message, not only the run log.
Byte-capped HTML reads. New IgClient.get_limited aborts a page transfer after a decompressed-body cap (compression left on, so the compressed download stops early — unlike a naive identity slice that would out-transfer a gzipped page). The post HTML path fetches capped first and falls back to a full read when the media node lies beyond the cap, so it saves the tail transfer on the common case without ever losing data.
Requeue failed targets for a fresh-IP retry. A target that fails without a terminal error is re-queued to the back of the worker pool up to twice, with a short recovery delay and a salted proxy session, so a target caught by a transient proxy-pool brownout is rescued once the pool recovers instead of being recorded as a hard failure.
Proactive per-exit-IP request spacing. IgClient._throttle spaces consecutive requests that share one proxy session id by a small minimum interval (keyed per IP, so concurrent targets still run in parallel). It pre-empts throttling on the tight non-paginated bursts a target makes (oembed → profile → HTML) and complements the adaptive pagination pacing, which already spaces the paginated requests.
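
The per-session spacing in the last item can be sketched as a small throttle keyed by session id. This is not IgClient._throttle itself: the class name, the 0.25s interval (the entry only says "small"), and the injectable clock and sleep are assumptions added so the behaviour can be checked without real waiting.

```python
import asyncio
import time


class SessionThrottle:
    """Enforce a minimum gap between requests that share one proxy session id."""

    def __init__(self, min_interval=0.25, clock=time.monotonic, sleep=asyncio.sleep):
        self._min_interval = min_interval  # assumed value
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = {}            # session id -> earliest next start

    async def wait(self, session_id: str) -> None:
        now = self._clock()
        start = max(now, self._next_allowed.get(session_id, now))
        # Reserve the slot before sleeping so concurrent callers on the same
        # session queue up behind each other instead of waking together.
        self._next_allowed[session_id] = start + self._min_interval
        if start > now:
            await self._sleep(start - now)
```

Keying the table by session id is what keeps targets on different exit IPs fully parallel: a request on session "b" never waits for traffic on session "a".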

Added — data completeness

GraphQL posts now carry the fields that were previously hardcoded empty — carousel childPosts (from edge_sidecar_to_children), preview firstComment/latestComments, reel musicInfo, isPinned, captionIsEdited, and fbid are now extracted from web_profile_info top-12 nodes and xdt_shortcode_media/Polaris HTML nodes. Carousels no longer collapse to a single image on the GraphQL path.
locationLat / locationLng on post/reel records when Instagram tags a place with coordinates (v1 + GraphQL paths).
Richer musicInfo — added audioClusterId, coverArtUrl, and durationInMs alongside the existing artist/song/original-audio fields.
Hashtag & location feeds now paginate past the embedded first page via the /tags/<tag>/sections/ and /locations/<id>/sections/ cursor feeds. Cookie'd runs collect the full feed instead of just the first ~12–18 posts; results are deduped by shortcode and preserve feed order.
repliesLimit input — deep-paginate reply comments under each top-level comment via the v1 child-comments endpoint. Default 0 keeps only the replies Instagram ships inline (no extra requests, no behaviour change); a higher value fetches up to N replies per comment. The endpoint is login-gated, so it needs your own Instagram cookies in practice and falls back to the inline replies on any block — never a regression for logged-out runs.

Improved — single-post HTML fallback quality

Fallback tiers now cross-merge instead of returning at the first hit. When Instagram withholds a coherent GraphQL node from logged-out clients, the JSON-LD (full caption + exact timestamp + counts), raw-HTML regex sweep (location, dimensions, flags, play count), and Open Graph (thumbnail) tiers are combined into one maximally-complete record, preferring the richest source per field. Structural fields (location/dimensions/play count) are no longer dropped just because a text-rich tier won, and type is never left as Image when any tier saw a Video/Sidecar.
videoPlayCount is now populated on the HTML-regex fallback path and coerced by the output sanitiser (it was declared integer but never coerced).

Improved — networking speed / anti-throttling

Collision-safe proxy session ids. Session ids (the handle on one residential exit IP) are now normalised through a shared helper that keeps a readable head plus a digest of the full id when it exceeds Apify's 50-char cap, instead of a blind truncation. Two targets whose keys shared a long prefix could previously collapse onto the same truncated id — and thus one exit IP — quietly defeating per-target IP isolation; they now stay distinct.
Fast rotation on proxy-anchored errors. BoringSSL handshake failures and CONNECT tunnel failed errors are anchored to a dead exit IP that a same-IP retry can't heal. These now skip the exponential-backoff pause and rotate to a fresh proxy session immediately, instead of stalling seconds on a burnt IP.
Rotating browser fingerprint. The curl_cffi impersonation profile is picked from a small pool of Chrome builds per run instead of a single pinned chrome, trimming the logged-out bot signature.
Adaptive (AIMD) pagination pacing. The pause between feed/comment/section pages is no longer a fixed random 0.4–1.6 s on every page. It now starts at a low floor (~0.3 s) so a healthy sticky IP paginates fast, drifts back toward the floor on every clean page, and grows multiplicatively (×2, up to ~3 s) the moment Instagram pushes back (429, proxy-session block, or transient error). Result: deep single-target feeds finish substantially faster in the common healthy case, while throttling automatically slows the walk instead of burning the session-block cap and losing results.
Default parallelism raised from 4 to 8 URLs. Each URL scrapes through its own sticky proxy IP, so concurrent targets are independent — this mainly speeds up multi-URL runs with negligible added block risk. Still tunable via the concurrency input (1–32).
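
The adaptive pacing described above follows the usual additive-decrease / multiplicative-increase shape. The sketch below uses the floor (~0.3 s), factor (×2), and ceiling (~3 s) quoted in the entry; the class name and the 0.2 s recovery step are assumptions, since the changelog does not say how fast the delay drifts back.

```python
class AimdPacer:
    """Delay between pages: shrink gently on success, double on pushback."""

    def __init__(self, floor=0.3, ceiling=3.0, factor=2.0, step=0.2):
        self.floor = floor
        self.ceiling = ceiling
        self.factor = factor
        self.step = step      # assumed recovery step per clean page
        self.delay = floor    # start fast on a healthy sticky IP

    def on_success(self) -> float:
        """A clean page: drift back toward the floor."""
        self.delay = max(self.floor, self.delay - self.step)
        return self.delay

    def on_pushback(self) -> float:
        """429, session block, or transient error: back off multiplicatively."""
        self.delay = min(self.ceiling, self.delay * self.factor)
        return self.delay
```

A caller would sleep for pacer.delay between pages and report each outcome back. Four consecutive pushbacks take the delay from 0.3 s to the 3 s ceiling; a run of clean pages then walks it back down.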

Improved — networking robustness / anti-throttling

Run-wide throttle budget is consumed at most once per request() instead of once per in-loop retry, so a single hot request can no longer burn several budget units and terminate the whole run (killing healthy targets).
Retry-After is honoured on 429 responses (capped at 30 s); backoff now has a small non-zero floor instead of a 0 s minimum.
X-ASBD-ID is sent on every request and the x-ig-set-www-claim value Instagram issues is captured and echoed back as X-IG-WWW-Claim — both reduce the logged-out bot signature and delay throttling under load.
HTTP 403 login-walls are treated as session blocks (free proxy-session rotation) instead of burning the per-URL retry budget.
Tagged feed no longer discards progress on a mid-feed blip. A transient failure on a later tagged (usertags) page now retries the same cursor on a fresh proxy session instead of throwing away the posts already collected and dropping to the un-paginated HTML fallback. Only a first-page failure falls back to HTML.
Guest-cookie seeding is retryable — a single failed qe/sync no longer permanently marks the client seeded and degrades every downstream cookie-gated request; it retries (up to 3×) until a csrftoken lands.
Output sanitiser now cleans nested carousel-child fields (control chars in a child's tagged-user names) that previously escaped the depth-4 recursion cap.
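
The Retry-After handling in the second item might look roughly like this. It is a sketch under assumptions: the function name, the 0.25 s backoff floor, and the jittered exponential formula are illustrative, and only the numeric form of Retry-After is parsed (the HTTP-date form falls through to normal backoff).

```python
import random
from typing import Callable, Optional

RETRY_AFTER_CAP = 30.0   # seconds, as stated in the changelog
BACKOFF_FLOOR = 0.25     # assumed; the changelog only says "small non-zero"


def retry_delay(
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Seconds to wait before the next attempt."""
    if retry_after is not None:
        try:
            seconds = float(retry_after)
        except ValueError:
            seconds = -1.0  # not a number (e.g. HTTP-date): ignore the header
        if seconds >= 0:
            return min(seconds, RETRY_AFTER_CAP)
    # Jittered exponential backoff, clamped to [floor, cap].
    backoff = base * (2 ** attempt) * rng()
    return max(BACKOFF_FLOOR, min(backoff, RETRY_AFTER_CAP))
```

The cap protects the run from a server-supplied wait that would eat the whole time budget, while the floor stops a lucky jitter draw from producing an immediate retry.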

Added

New post/reel fields — inputUrl (the input URL that produced each record), videoPlayCount (separate from videoViewCount), isPinned, isSponsored + isPaidPartnership (alongside the existing isAdvertisement), coauthorProducers (collab authors), musicInfo (reel audio: artist/song/original-audio flags), and images (alias of displayResourceUrls). All additive — existing field names are unchanged.
New comment fields — postUrl, commentUrl, postShortcode, and replies (reply comments embedded inline with the parent, no deep pagination).
skipPinnedPosts input — skip posts pinned to the top of a profile / reels feed (they can be old and are returned first).
Run-wide HTTP safety budget - new maxHttpRequests advanced input caps outbound Instagram HTTP attempts, including retries and session seeding.
HTTP diagnostics - SUPPORT_DIAGNOSTICS now includes request attempts, successes, 4xx/5xx, 429, JSON decode, and budget-exhaustion counters.

Fixed

Single-post data degradation — when the author's profile fetch was blocked (or oEmbed itself was blocked), the scraper skipped the public HTML-page path and fell back to a thumbnail-only oEmbed record. Posts now always try the HTML page, which carries likes/comments/caption independently of oEmbed and the profile endpoint.
Pinned posts truncating date-filtered runs — a pinned old post at the top of a profile/reels feed no longer stops the onlyPostsNewerThan walk and drops all the newer posts behind it.
One blocked URL taking the whole run down — proxy-session-block rotations are now bounded per target, so a persistently-blocked URL gives up gracefully instead of rotating until it drains the run-wide rate-limit budget (which was terminal for every other target).
Concurrent cookie seeding race — guest-cookie seeding is now serialised, so parallel targets wait for the real cookie fetch instead of firing cookie-gated requests (comments, tagged, search) before the jar is populated.
maxTotalResults accuracy — the cap now counts only records accepted for writing (not malformed/dropped ones) and no longer transiently under-reports mid-flush.
Reels fallback no longer silently drops timeline reels whose productType is null/feed rather than clips.
Fixed a possible KeyError when an image candidate lacked a url, and a malformed sub-second timestamp in the JSON-LD path.

Improved

Large URL batches now run through a fixed-size worker pool instead of creating one asyncio task per URL.
Active pagination loops now stop when the run-wide abort flag is set, so maxTotalResults, rate-limit stops, and HTTP-budget stops waste less network traffic.
Comments extraction now finds nested Polaris preview comments inside the current logged-out post HTML. This fixes high-load Comments runs that previously saved 0 rows even when public preview comments were present.
Comment records now keep a numeric postId when Instagram wraps the real media object in a non-media POLARIS container.
Comments diagnostics now fail loudly when Instagram reports comments but exposes no public comment rows, and the support checklist points to comment API gating / cookie retry instead of a generic empty-feed hint.
Paginated profile posts/reels, comments, user reels, audio, and tagged feeds now retry the same cursor after transient page failures and rotate the proxy session for the next attempt. This prevents large runs from stopping early after a temporary 401/timeout poisons the cursor guard.
Added an internal profile HTML probe that records redaction-safe structural fingerprints from Apify's residential network path when Instagram changes logged-out Polaris/Relay profile payloads.
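
The fixed-size worker pool from the first item can be sketched with a queue drained by a bounded number of tasks. The function name and signature are hypothetical, and the real pool also handles requeues and the run-wide abort flag, which are left out here.

```python
import asyncio
from typing import Awaitable, Callable


async def run_pool(
    targets: list[str],
    handler: Callable[[str], Awaitable[object]],
    concurrency: int = 8,
) -> dict[str, object]:
    """Process targets with at most `concurrency` handlers in flight."""
    queue: asyncio.Queue = asyncio.Queue()
    for target in targets:
        queue.put_nowait(target)
    results: dict[str, object] = {}

    async def worker() -> None:
        while True:
            try:
                target = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # nothing left: this worker is done
            results[target] = await handler(target)

    worker_count = max(1, min(concurrency, len(targets)))
    await asyncio.gather(*(worker() for _ in range(worker_count)))
    return results
```

Memory and scheduling overhead now scale with the pool size rather than with the number of URLs, which is the difference from creating one task per URL.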

[1.0.0] — 2026-05-14

Initial public release.

Added

8 supported URL kinds — profile, post (/p/), reel (/reel/), user reels (/{user}/reels/), tagged feed (/{user}/tagged/), hashtag (/explore/tags/), location (/explore/locations/), audio (/reels/audio/).
resultsType values: posts, reels, details, comments.
search + searchType (hashtag / profile / place) + searchLimit — free-text discovery, resolved through Instagram's public topsearch.
onlyPostsNewerThan — date filter accepting ISO (2026-01-15) and relative (7 days, 2 months, 1 year).
addParentData - annotates every record with its dataSource + parent (username / tag / location id) when scraping multiple sources.
maxTotalResults — overall cap on emitted records, independent of per-URL limits.
dedupResults — automatic dedup by shortCode across sources.
HTTP client — curl_cffi.requests.AsyncSession(impersonate="chrome") with sticky residential proxy sessions per target URL (single coherent IP per URL, fresh IP on retry).
Three independent retry budgets — per-request, per-URL, run-wide rate-limit budget; budgets are tracked and surfaced to the user.
5-tier fallback for single-post URLs — oEmbed → web_profile_info top-12 → feed/user pagination (parallel race) → /p/<sc>/ HTML page (Polaris JSON / JSON-LD / regex sweep / OG meta tags) → oEmbed basic.
Comments fetching without auth — Polaris HTML parse for preview comments + graphql/query POST pagination with rotating doc_id candidates and guest-session cookies seeded via qe/sync/.
Logged-out profile HTML fallback - profile URLs now try the public /{username}/ browser document after web_profile_info returns no data, extracting embedded Polaris/GraphQL profile and timeline JSON when Instagram ships it without login cookies.
Optional user cookies - users can paste an Instagram Cookie header, Cookie-Editor JSON export, JSON object, or Netscape cookie file into the secret sessionCookies input when Instagram hides data from logged-out clients. Values are redacted from logs; only installed cookie names are shown.
Cookie setup guide - the input UI and README now explain exactly how to copy a Cookie header from Chrome / Edge DevTools, with a warning that cookies are sensitive and should only come from the user's own account.
Clearer auth wording - public descriptions now say that public data works without login and optional user cookies are supported for profiles Instagram hides from logged-out users.
Batched streaming output — BatchedDatasetWriter flushes ~50 records every 5 s, draining cleanly on shutdown.
Output sanitisation — NULL byte / control char stripping, oversized caption / list truncation, id-field coercion, 9 MB-cap guard.
URL normalisation + dedup — accepts m.instagram.com, mixed case, query strings, missing protocol, etc.
Pay-per-event ready — Apify's default dataset-item event charges emitted records automatically when PPE pricing is configured.
User-friendly logs + status messages — emoji-prefixed structured output with explicit error guidance ("switch to RESIDENTIAL proxy", "double-check the username spelling", …).
Support diagnostics — runs now write a redacted SUPPORT_DIAGNOSTICS JSON record to the default key-value store with target result counts, failure reasons, proxy/cookie flags, rate-limit budget, and troubleshooting checklist. This helps debug user reports when the Actor owner cannot access the user's run.
124 offline unit tests covering parsers, sanitisation, retry budgets, URL classification, HTML / OG / JSON-LD extraction, profile cache, dataset writer, date filter, shortcode round-trip.
dataset_schema.json — full JSON Schema description of every output field + 6 named views (Overview, Posts & Reels, Profiles, Comments, Hashtags, Locations). Each view ships its own table layout (labels + format hints), so the Apify Console picks the right columns automatically for the chosen resultsType.
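
As an illustration of the two onlyPostsNewerThan formats listed above, a parser could look like the sketch below. The function name is hypothetical, and treating a month as 30 days and a year as 365 days is an assumption; the actor may resolve relative values differently.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

_RELATIVE_RE = re.compile(r"^\s*(\d+)\s*(day|month|year)s?\s*$", re.IGNORECASE)
_DAYS_PER_UNIT = {"day": 1, "month": 30, "year": 365}  # approximations


def parse_date_filter(value: str, now: Optional[datetime] = None) -> datetime:
    """Accept '2026-01-15' or '7 days' / '2 months' / '1 year'; return UTC."""
    now = now or datetime.now(timezone.utc)
    match = _RELATIVE_RE.match(value)
    if match:
        days = int(match.group(1)) * _DAYS_PER_UNIT[match.group(2).lower()]
        return now - timedelta(days=days)
    parsed = datetime.fromisoformat(value.strip())  # raises ValueError if invalid
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)
```

Passing `now` explicitly keeps relative filters reproducible in tests; in a live run it defaults to the current UTC time.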

Performance highlights

Profile (50 posts): 5–10 s
Single recent post: 3–5 s
Single old post (4+ years): 10–15 s via parallel HTML race
Comments on one post: 15–30 s

Known limitations

Comments require a residential proxy (mobile API is auth-gated; we parse the www.instagram.com HTML page instead).
Old posts beyond roughly the 165 most recent in the author's feed return ~18–22 fields instead of the full 36+, because the GraphQL media node isn't shipped to logged-out clients and the record is reconstructed from a scoped regex sweep of the HTML page.
Stories require an authenticated session — not supported in this release.
