Threads Reply Scraper — Conversation Graph

Pricing: Pay per event

Export the full reply tree of any public Threads post — no Meta login — as a conversation graph plus engagement counts, to JSON or CSV. A Threads post scraper built on the threads.net SSR HTML payload. We retry and rotate so the thread lands.

Rating: 0.0 (0 ratings)
Developer: DevilScrapes (maintained by Community)
Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 2 days ago

Threads Reply Scraper — Full Conversation Graph Export

We do the dirty work so your dataset stays clean. 😈

$5.05 / 1,000 rows — Export the full reply tree of any public Threads (threads.net) post. No Meta account. No API key. Every visible reply chain, depth-linked, with engagement counts on every node — ready for conversation-graph analysis, brand triage, or NLP research.

Existing Threads scrapers on the Apify Store cap at ~20 posts per profile with no reply-tree expansion. This Actor goes the opposite direction: pick one post, get the whole conversation underneath it, parent-pointed and depth-indexed.

🎯 What this scrapes

You pass one or more Threads post URLs. For each post, this Actor:

  1. Fetches https://www.threads.net/@{username}/post/{code} with a Chrome-impersonating TLS and HTTP/2 fingerprint.
  2. Extracts the server-rendered conversation payload Threads embeds inside <script type="application/json" data-sjs> blocks.
  3. Walks the payload's edges -> thread_items tree and emits one flat row per node — the root post plus every reply Threads inlined into the initial HTML, including nested chains (depth 2+).
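Steps 2 and 3 can be sketched with the Python standard library alone. This is a minimal illustration and not the Actor's actual parser. `SjsScriptCollector`, `extract_sjs_payloads`, and `find_key` are hypothetical names, and the sketch assumes each data-sjs block holds one complete JSON document.

```python
import json
from html.parser import HTMLParser


class SjsScriptCollector(HTMLParser):
    """Collects parsed JSON from <script type="application/json" data-sjs> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._in_sjs = False
        self._buf: list[str] = []
        self.payloads: list[object] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # data-sjs is a boolean attribute: present as a key, with value None
        if tag == "script" and attr_map.get("type") == "application/json" and "data-sjs" in attr_map:
            self._in_sjs = True
            self._buf = []

    def handle_data(self, data):
        if self._in_sjs:
            self._buf.append(data)  # script text may arrive in several chunks

    def handle_endtag(self, tag):
        if tag == "script" and self._in_sjs:
            self._in_sjs = False
            try:
                self.payloads.append(json.loads("".join(self._buf)))
            except json.JSONDecodeError:
                pass  # skip blocks that are not valid JSON


def extract_sjs_payloads(html: str) -> list[object]:
    collector = SjsScriptCollector()
    collector.feed(html)
    collector.close()
    return collector.payloads


def find_key(obj, key):
    """Yield every value stored under `key` anywhere in a nested JSON structure."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_key(item, key)
```

A recursive key search like `find_key(payloads, "edges")` avoids hard-coding the full path to the conversation data. That path is undocumented and may move between page builds.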

Every row carries a row_type discriminator ("post" for the root, "reply" for everything else), a depth integer (0 for root, 1 for direct replies, 2+ for nested chain replies), and a parent_reply_id pointer so consumers can reconstruct the conversation graph with a single LEFT JOIN.
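A minimal sketch of how a chain-shaped payload could be flattened into rows that carry depth and parent_reply_id. The payload shape is an assumption made for illustration. Each edge is taken to look like {"node": {"thread_items": [{"post": {"pk": ..., "text": ...}}, ...]}}, where item i replies to item i-1. `flatten_conversation` is hypothetical, and its max_depth and max_replies arguments only mirror the maxDepth and maxRepliesPerNode inputs.

```python
def flatten_conversation(root: dict, edges: list[dict], max_depth: int = 3,
                         max_replies: int = 50) -> list[dict]:
    """Emit one row per node: the root at depth 0, then each reply chain."""
    root_id = root["pk"]
    rows = [{"row_type": "post", "root_post_id": root_id, "reply_id": root_id,
             "parent_reply_id": None, "depth": 0, "reply_text": root.get("text", "")}]
    for edge in edges[:max_replies]:  # cap on top-level reply threads per post
        parent_id = root_id  # assumed: the first item in a chain replies to the root
        for depth, item in enumerate(edge["node"]["thread_items"], start=1):
            if depth > max_depth:
                break  # drop the deeper part of this chain
            post = item["post"]
            rows.append({"row_type": "reply", "root_post_id": root_id,
                         "reply_id": post["pk"], "parent_reply_id": parent_id,
                         "depth": depth, "reply_text": post.get("text", "")})
            parent_id = post["pk"]  # the next item in the chain replies to this one
    return rows
```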

This is the threads.net scraper that goes where the official Threads API won't: third-party conversation trees, no developer-account review required.

Field | Type | Description
--- | --- | ---
row_type | string (enum) | "post" for the root, "reply" for everything else
root_post_id | string | Threads pk of the input post this row belongs to
root_post_url | string | Input URL (normalized); same value on every row from one input
parent_reply_id | string or null | null for the root; root pk for direct replies; predecessor pk for nested replies
reply_id | string | This node's pk
reply_url | string | Public Threads URL of this node
reply_text | string | Caption body (empty string if media-only)
author_username | string | Author handle
author_display_name | string or null | Author full name
author_user_id | string or null | Internal Threads user pk
author_followers | integer or null | Follower count when Threads surfaces it (usually only for the root author)
posted_at | string | ISO 8601 UTC timestamp
like_count | integer | Likes at scrape time
reply_count | integer | Direct reply count at scrape time
repost_count | integer | Repost count at scrape time
quote_count | integer | Quote-post count at scrape time
depth | integer | 0 = root, 1 = direct reply, 2+ = nested chain
scraped_at | string | ISO 8601 UTC timestamp when the row was written

🔥 Features

  • Full reply tree per post — root + every reply chain Threads server-renders into the initial HTML, including nested depth-2/3+ chains.
  • Depth-linked output: depth + parent_reply_id let you rebuild the conversation graph trivially in SQL, pandas, or networkx.
  • Engagement counts on every node — likes, direct replies, reposts, and quote-posts captured per post and per reply.
  • We rotate browser fingerprints — curl-cffi Chrome 131 TLS + HTTP/2 impersonation so the target sees a real browser, not Python. Fingerprint profiles cycle per session.
  • We rotate residential proxies — BUYPROXIES94952 residential pool is on by default; fresh session ID on every block. Meta bans datacenter IPs within minutes; we route around it.
  • We retry with exponential backoff — up to 5 attempts per URL on 408 / 429 / 5xx with Retry-After honoured. You get results, not empty datasets.
  • Per-post cost control: maxDepth (1–10) and maxRepliesPerNode (1–500) caps, so you pay only for the rows you need.
  • Pydantic v2 input validation — bad URLs, empty lists, and out-of-range caps fail fast before any network call, not after you've paid for a run.
  • Clean typed rows — ISO 8601 timestamps, stable pk-based IDs, nullable fields declared — no surprise nulls or mixed types in your dataset.
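The retry policy described above can be sketched as follows: up to 5 attempts, retrying on 408, 429, and 5xx, and honouring Retry-After. This is an illustration under stated assumptions and not the Actor's code. `fetch` is a hypothetical callable that returns (status, headers, body), only the numeric-seconds form of Retry-After is parsed, and proxy and fingerprint rotation are left out.

```python
import random
import time
from typing import Callable, Optional

MAX_ATTEMPTS = 5
RETRYABLE_STATUSES = {408, 429}  # any 5xx is also retried, see is_retryable


def is_retryable(status: int) -> bool:
    return status in RETRYABLE_STATUSES or 500 <= status <= 599


def backoff_delay(attempt: int, retry_after: Optional[str], base: float = 1.0,
                  cap: float = 60.0) -> float:
    """Seconds to wait after a failed attempt (attempt is 1-based)."""
    if retry_after is not None and retry_after.strip().isdigit():
        return min(float(retry_after), cap)  # honour a numeric Retry-After
    # Exponential backoff with full jitter: uniform in [0, base * 2**(attempt - 1)]
    return random.uniform(0, min(cap, base * 2 ** (attempt - 1)))


def fetch_with_retries(fetch: Callable[[], tuple[int, dict, str]],
                       sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Call fetch up to MAX_ATTEMPTS times; return the body, or None if every attempt fails."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        status, headers, body = fetch()
        if status == 200:
            return body
        if not is_retryable(status) or attempt == MAX_ATTEMPTS:
            return None  # give up: non-retryable status or attempts exhausted
        # headers is assumed to be a plain dict with the server's header casing
        sleep(backoff_delay(attempt, headers.get("Retry-After")))
    return None
```

Full jitter (a random delay between zero and the exponential ceiling) is one common choice. It spreads retries from parallel sessions so they do not hit the server in lockstep.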

💡 Use cases

  • Brand reputation monitoring — pull the entire reply pile-on under a viral brand mention and triage by like count or reach in 2 minutes, not 90.
  • Crisis communications — export every visible reply to a controversial post for PR review without manual scrolling.
  • Social-listening dashboards — feed conversation-graph rows into Slack, Looker, Tableau, or Hex for real-time sentiment tracking on Threads.
  • Competitive intelligence — track reply sentiment under competitor product launches on Threads.
  • Creator analytics — see which of your own replies sparked sub-conversations vs which died after one comment.
  • Academic research — bootstrap conversation-tree datasets for NLP and argument-mining models from public Threads discussions.
  • Meta-policy research — measure conversation topology on policy-adjacent posts (fanout, nested-debate sub-threads, engagement decay).
  • OSINT investigation — track public discussion threads around named events or accounts.

⚙️ How to use it

  1. Open the Actor input form.
  2. Paste one or more Threads post URLs into Threads post URLs — e.g. https://www.threads.net/@mosseri/post/DYX3oNcAO4r. Up to 50 URLs per run.
  3. Set Maximum reply depth to control how deep into nested chains the Actor goes (default 3).
  4. Set Max top-level reply threads per post to cap how many direct replies are exported per post (default 50).
  5. Leave Use Apify Proxy ON — Meta blocks datacenter IPs within minutes. We handle the rotation.
  6. Click Start. Results stream into the default dataset in real time and are downloadable as JSON, CSV, Excel, or XML.

Finding a Threads post URL

Open any public post on threads.net or in the Threads app and copy the share link. The URL format is https://www.threads.net/@{username}/post/{code}. Trailing query strings (?xmt=...) and fragments are stripped automatically — paste the raw share URL straight in.
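A sketch of the URL normalization described above, using only urllib.parse. The accepted hosts and the character classes for username and code are assumptions, not the Actor's published rules. `normalize_post_url` is a hypothetical helper.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed shape of a post path: /@{username}/post/{code}, optional trailing slash
POST_PATH = re.compile(r"^/@([A-Za-z0-9._]+)/post/([A-Za-z0-9_-]+)/?$")
ALLOWED_HOSTS = {"threads.net", "www.threads.net"}


def normalize_post_url(raw: str) -> str:
    parts = urlsplit(raw.strip())
    if parts.scheme not in ("http", "https") or parts.hostname not in ALLOWED_HOSTS:
        raise ValueError(f"not a threads.net URL: {raw!r}")
    match = POST_PATH.match(parts.path)
    if match is None:
        raise ValueError(f"not a Threads post path: {parts.path!r}")
    username, code = match.groups()
    # Rebuild without the query string (?xmt=...) or fragment, on the canonical host
    return urlunsplit(("https", "www.threads.net", f"/@{username}/post/{code}", "", ""))
```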

📥 Input

Field | Type | Required | Default | Description
--- | --- | --- | --- | ---
postUrls | array of strings | yes | (none) | 1–50 Threads post URLs
maxDepth | integer | no | 3 | Reply-depth cap (1–10)
maxRepliesPerNode | integer | no | 50 | Top-level reply cap per post (1–500)
useProxy | boolean | no | true | Route via BUYPROXIES94952 residential proxies

Single-URL example

{
  "postUrls": [
    "https://www.threads.net/@mosseri/post/DYX3oNcAO4r"
  ],
  "maxDepth": 3,
  "maxRepliesPerNode": 50,
  "useProxy": true
}

Batch example

{
  "postUrls": [
    "https://www.threads.net/@mosseri/post/DYX3oNcAO4r",
    "https://www.threads.net/@threads/post/AAA",
    "https://www.threads.net/@zuck/post/BBB"
  ],
  "maxDepth": 2,
  "maxRepliesPerNode": 25,
  "useProxy": true
}
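The Actor validates input with Pydantic v2. The stdlib dataclass below is only a stand-in that mirrors the bounds in the Input table, so out-of-range values fail before any network call. `RunInput` is a hypothetical name, and URL checks are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunInput:
    """Stdlib stand-in for the input model; bounds copied from the Input table."""
    postUrls: list[str]
    maxDepth: int = 3
    maxRepliesPerNode: int = 50
    useProxy: bool = True

    def __post_init__(self) -> None:
        # Fail fast on bad input, before any request is made
        if not 1 <= len(self.postUrls) <= 50:
            raise ValueError("postUrls must contain 1-50 URLs")
        if not 1 <= self.maxDepth <= 10:
            raise ValueError("maxDepth must be between 1 and 10")
        if not 1 <= self.maxRepliesPerNode <= 500:
            raise ValueError("maxRepliesPerNode must be between 1 and 500")
```

For example, RunInput(**json.loads(text)) would accept the batch example above and reject the same input with "maxDepth": 11.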

📤 Output

One flat dataset row per post or reply. Use row_type to filter to roots only, or depth to filter to direct replies or specific nested layers.

{
  "row_type": "post",
  "root_post_id": "3897828658278100523",
  "root_post_url": "https://www.threads.net/@mosseri/post/DYX3oNcAO4r",
  "parent_reply_id": null,
  "reply_id": "3897828658278100523",
  "reply_url": "https://www.threads.net/@mosseri/post/DYX3oNcAO4r",
  "reply_text": "Does DMing people back help with reach?",
  "author_username": "mosseri",
  "author_display_name": "Adam Mosseri",
  "author_user_id": "63482099442",
  "author_followers": null,
  "posted_at": "2026-05-15T13:36:48+00:00",
  "like_count": 427,
  "reply_count": 98,
  "repost_count": 12,
  "quote_count": 2,
  "depth": 0,
  "scraped_at": "2026-05-16T12:00:00+00:00"
}
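Given rows shaped like the example above (for instance, loaded with json.load from a JSON export), a few lines of Python are enough for like-count triage. `top_direct_replies` and `roots_only` are hypothetical helpers, not part of the Actor.

```python
def roots_only(rows: list[dict]) -> list[dict]:
    """Keep only the root post of each input URL."""
    return [r for r in rows if r["row_type"] == "post"]


def top_direct_replies(rows: list[dict], n: int = 10) -> list[dict]:
    """Direct replies (depth 1), sorted by like_count, highest first."""
    direct = [r for r in rows if r["row_type"] == "reply" and r["depth"] == 1]
    return sorted(direct, key=lambda r: r["like_count"], reverse=True)[:n]
```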

Export formats

After a run completes, click Export in the Apify Console for JSON (full fidelity), CSV (flat, ideal for spreadsheets), Excel (.xlsx), or XML. The same formats are available from the Apify REST API via GET https://api.apify.com/v2/datasets/{datasetId}/items?format=csv&clean=true; change format to json, xlsx, or xml as needed.
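A minimal download sketch against the dataset-items endpoint, using urllib from the standard library. You supply the dataset ID and API token. `dataset_items_url` and `download_items` are hypothetical helper names.

```python
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id: str, fmt: str = "csv", clean: bool = True) -> str:
    """Build the items URL; fmt is e.g. json, csv, xlsx, or xml."""
    query = urlencode({"format": fmt, "clean": "true" if clean else "false"})
    return f"{API_BASE}/datasets/{dataset_id}/items?{query}"


def download_items(dataset_id: str, token: str, fmt: str = "csv") -> bytes:
    """Fetch the exported dataset as raw bytes, authenticating with a bearer token."""
    req = urllib.request.Request(dataset_items_url(dataset_id, fmt),
                                 headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read()
```

For example, download_items("<DATASET_ID>", "<APIFY_TOKEN>", fmt="xlsx") would return the bytes of the Excel export.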

Reconstructing the conversation tree

In SQL:

SELECT parent.reply_text AS parent_text,
       child.reply_text  AS reply_text,
       child.depth,
       child.like_count
FROM rows child
LEFT JOIN rows parent
       ON child.parent_reply_id = parent.reply_id
WHERE child.root_post_id = '3897828658278100523'
ORDER BY child.depth, child.like_count DESC;

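In pandas: df.merge(df, how="left", left_on="parent_reply_id", right_on="reply_id", suffixes=("", "_parent")). Passing how="left" keeps the root row, whose parent_reply_id is null, matching the SQL LEFT JOIN; the default inner merge would drop it.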
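Once every row's parent is known, the tree can also be walked without pandas. This stdlib sketch uses hypothetical `build_children` and `render_tree` helpers. It maps each node to its children, which also gives per-node fanout, then prints an indented outline depth-first. It assumes each row carries the reply_id, parent_reply_id, row_type, and reply_text fields from the output table.

```python
from collections import defaultdict


def build_children(rows: list[dict]) -> dict[str, list[str]]:
    """Map each reply_id to the list of its direct children, in input order."""
    children: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        if row["parent_reply_id"] is not None:
            children[row["parent_reply_id"]].append(row["reply_id"])
    return children


def render_tree(rows: list[dict]) -> list[str]:
    """Indented outline of every root post, walked depth-first."""
    by_id = {row["reply_id"]: row for row in rows}
    children = build_children(rows)
    # Seed the stack with roots in reverse so the first root is popped first
    stack = [(row["reply_id"], 0) for row in reversed(rows) if row["row_type"] == "post"]
    lines = []
    while stack:
        node_id, level = stack.pop()
        lines.append("  " * level + by_id[node_id]["reply_text"])
        # Push children in reverse so they are visited in their original order
        stack.extend((child, level + 1) for child in reversed(children.get(node_id, [])))
    return lines
```

The same parent/child pairs can be loaded into networkx.DiGraph as an edge list if you need graph metrics beyond fanout.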

💰 Pricing

Pay-Per-Event (PPE) — you pay only for what you scrape. No result → no charge beyond the small start fee.

Event | Price (USD) | When
--- | --- | ---
actor-start | $0.05 | Once per run, at boot
result-row | $0.005 | Per post or reply row written

Example costs

Rows scraped | Actor starts | Total cost
--- | --- | ---
100 | 1 | $0.55
500 | 1 | $2.55
1,000 | 1 | $5.05
5,000 | 1 | $25.05

A typical run on a single mid-sized post (root + ~50 direct replies + a handful of nested chains) emits 60–120 rows, costing $0.35–$0.65.
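The pricing is linear: total = $0.05 × runs + $0.005 × rows. Here is a small sketch that uses Decimal to avoid floating-point rounding. The `estimate_cost` helper is hypothetical.

```python
from decimal import Decimal

ACTOR_START = Decimal("0.05")  # charged once per run
RESULT_ROW = Decimal("0.005")  # charged per post or reply row written


def estimate_cost(rows: int, runs: int = 1) -> Decimal:
    """Total USD for a given number of rows spread over `runs` runs."""
    return runs * ACTOR_START + rows * RESULT_ROW
```

For example, estimate_cost(120) gives 0.650, the upper end of the typical single-post range above.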

🚧 Limitations

  • No "Show replies" pagination. Threads renders some deeply nested replies behind a "Show replies" click that triggers a client-side XHR. This Actor emits exactly what threads.net serves in the initial HTML — typically the root, all direct replies, and any inline-expanded depth-2/3 chains Threads already included. Deeper hidden replies require their own XHR and are not fetched in this version.
  • No reposter user list. Threads renders the /reposts/ sub-page entirely client-side from a private endpoint using rotating tokens. Repost counts are captured on every row (repost_count), but the list of accounts that reposted is out of scope.
  • No quote-post bodies. quote_count is captured per row; the bodies of quote-posts referencing the input are not included.
  • No media (images, videos). Only reply_text is captured. Image ALT text, video transcripts, and external link cards are not extracted.
  • Private profiles / login-walled posts return zero rows. If the page returns a login wall instead of the conversation payload, the Actor logs a WARNING and skips that URL. Keep useProxy enabled (it is on by default) to maximise the success rate.
  • Very large batches may encounter rate-limit windows. With residential proxy and per-URL session rotation the Actor handles single-post scrapes reliably. Batches larger than 20 posts in one run may trigger short pauses — we retry with exponential backoff up to 5 attempts per URL.
  • Not real-time. The Actor reads what threads.net serves in its current SSR HTML. Replies posted seconds before the scrape may not yet be inlined in that snapshot.
  • Apify FREE plan retains run-scoped storage for 7 days only. Export your dataset immediately after the run or use a named dataset to retain longer.
  • ToS responsibility. Meta's Terms of Service prohibit scraping. The threads.net post URL is publicly accessible without login, but you remain responsible for verifying your jurisdiction's data-protection rules and Meta's current Terms before using scraped data commercially.

❓ FAQ

Do I need a Threads or Instagram account?

No. The Actor fetches threads.net directly, impersonating Chrome's TLS and HTTP/2 fingerprint. No Meta login, no API key, no OAuth flow.

Is this a Meta Threads API alternative?

It is complementary to the official API. Meta's Threads API is gated behind a developer-account review and exposes only the post owner's own data — it does not support third-party conversation-tree reads. This Actor reads the same public SSR HTML any browser renders when visiting a threads.net post URL. Use both where each fits.

Does this work as a threads.net scraper for any public post?

Yes, as long as the post is reachable at its public URL without a login wall. Private accounts, deleted posts, and posts behind an age-gate return zero rows and a clear status message.

How deep into the reply tree does this go?

By default, depth 3 — root post (depth 0), direct replies (depth 1), and the first two layers of inline-expanded nested replies Threads embeds in the initial HTML (depth 2 and 3). Increase maxDepth up to 10 if you need every embedded chain.

Why isn't the reposter list included?

Threads' /reposts/ sub-page loads its user list via an internal client-side request with rotating tokens. Implementing that would create constant breakage as the tokens rotate. Repost counts are still captured on every row.

Why is residential proxy on by default?

Meta blocks repeated requests from the same datacenter IP within minutes. The residential pool rotates IPs per URL — that is the difference between consistent results and an empty dataset. We manage the rotation so you don't have to.

What happens if Meta blocks a request?

We retry with exponential backoff — up to 5 attempts per URL. If all retries fail or the page returns a login wall, that URL is skipped with a WARNING log and the run continues. If every URL fails, the Actor exits non-zero with a clear status message so you always know what happened.

Can I rebuild the conversation graph from the output?

Yes — every row carries parent_reply_id, so a single LEFT JOIN on reply_id reconstructs the tree. See the Output → Reconstructing the conversation tree section above for SQL and pandas examples.

Can I download threads replies as a spreadsheet?

Yes. After the run completes, click Export and choose CSV or Excel. The flat row-per-reply structure maps directly to a spreadsheet without any transformation.

💬 Your feedback

Found a bug, hit a rate-limit pattern, or need a new field on the output row? Open an issue on the Actor's Apify Store page or contact the Devil Scrapes team at apify.com/DevilScrapes. We ship updates within days of validated reports.