Threads Reply Tree Scraper
Pricing
Pay per event
Export the full reply tree of any public Threads post. No Meta login. Conversation graph plus engagement counts via the threads.net SSR HTML payload.
Developer
DevilScrapes
We do the dirty work so your dataset stays clean. 😈
$5.05 / 1,000 rows — Export the full reply tree of any public Threads (threads.net) post. No Meta account. No API key. Get every visible reply chain, depth-linked, with engagement counts on every node, ready for conversation-graph analysis.
Existing Threads scrapers on the Apify Store cap at ~20 posts per profile with no reply-tree expansion. This Actor goes the opposite direction: pick one post, get the whole conversation underneath it.
🎯 What this scrapes
You pass one or more Threads post URLs. For each post, this Actor:
- Fetches `https://www.threads.net/@{username}/post/{code}` with a real browser fingerprint (`curl-cffi` Chrome 131 TLS + H2 impersonation).
- Extracts the server-rendered conversation payload Threads embeds inside `<script type="application/json" data-sjs>` blocks.
- Walks the payload's `edges -> thread_items` tree and emits one flat row per node: the root post plus every reply Threads inlined into the initial HTML, including nested chains (depth 2+).
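The extraction step above can be sketched with the standard library alone. This is a minimal illustration, not the Actor's implementation: `SjsScriptParser` and `extract_sjs_payloads` are hypothetical names, and the sketch only collects and decodes the `data-sjs` JSON blocks. Where the conversation sits inside the decoded payload is not specified here and would still need to be located.

```python
import json
from html.parser import HTMLParser


class SjsScriptParser(HTMLParser):
    """Collects the text of every <script type="application/json" data-sjs> block."""

    def __init__(self) -> None:
        super().__init__()
        self._capturing = False
        self._chunks: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # data-sjs is a boolean attribute, so only its presence matters.
        if (
            tag == "script"
            and attr_map.get("type") == "application/json"
            and "data-sjs" in attr_map
        ):
            self._capturing = True
            self._chunks = []

    def handle_data(self, data):
        if self._capturing:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._capturing:
            self.blocks.append("".join(self._chunks))
            self._capturing = False


def extract_sjs_payloads(html: str) -> list:
    """Return the decoded JSON of each data-sjs block, skipping invalid ones."""
    parser = SjsScriptParser()
    parser.feed(html)
    parser.close()
    payloads = []
    for raw in parser.blocks:
        try:
            payloads.append(json.loads(raw))
        except json.JSONDecodeError:
            continue  # not valid JSON; ignore this block
    return payloads
```

Calling `extract_sjs_payloads(page_html)` on a fetched page returns a list of Python objects, one per block that decoded cleanly.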
Every row carries a row_type discriminator ("post" for the root, "reply" for everything else), a depth integer (0 for root, 1 for direct replies, 2+ for nested chain replies), and a parent_reply_id pointer so consumers can reconstruct the conversation graph with a single LEFT JOIN.
| Field | Type | Description |
|---|---|---|
| `row_type` | string enum | `"post"` for the root, `"reply"` for everything else |
| `root_post_id` | string | Threads pk of the input post this row belongs to |
| `root_post_url` | string | Input URL (normalized); same value on every row from one input |
| `parent_reply_id` | string \| null | `null` for root; root pk for direct replies; predecessor pk for nested |
| `reply_id` | string | This node's pk |
| `reply_url` | string | Public Threads URL of this node |
| `reply_text` | string | Caption body (empty string if media-only) |
| `author_username` | string | Author handle |
| `author_display_name` | string \| null | Author full name |
| `author_user_id` | string \| null | Internal Threads user pk |
| `author_followers` | integer \| null | Follower count when Threads surfaces it (usually only root author) |
| `posted_at` | string | ISO 8601 UTC timestamp |
| `like_count` | integer | Likes at scrape time |
| `reply_count` | integer | Direct reply count at scrape time |
| `repost_count` | integer | Repost count at scrape time |
| `quote_count` | integer | Quote-post count at scrape time |
| `depth` | integer | 0=root, 1=direct, 2+=nested chain |
| `scraped_at` | string | ISO 8601 UTC when the row was written |
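The `depth` and `parent_reply_id` rules in the table can be illustrated with a small sketch. It assumes a heavily simplified input, where each chain is a list of pks starting with a direct reply and continuing with its nested replies; the real payload shape is richer and is not documented here. `flatten_chains` is a hypothetical helper, and the way it applies the two caps is an assumption based on the input descriptions.

```python
def flatten_chains(root_pk, chains, max_depth=3, max_replies_per_node=50):
    """Flatten reply chains into rows carrying depth and parent pointers.

    Assumed input: `chains` is a list of lists of pks. The first pk of each
    chain is a direct reply to the root; each later pk replies to the one
    before it.
    """
    rows = [
        {"row_type": "post", "reply_id": root_pk, "parent_reply_id": None, "depth": 0}
    ]
    # Cap the number of top-level reply chains kept per post.
    for chain in chains[:max_replies_per_node]:
        parent = root_pk
        for depth, pk in enumerate(chain, start=1):
            if depth > max_depth:
                break  # deeper nodes in this chain are dropped
            rows.append(
                {
                    "row_type": "reply",
                    "reply_id": pk,
                    "parent_reply_id": parent,
                    "depth": depth,
                }
            )
            parent = pk  # the next node in the chain points at this one
    return rows
```

With `max_depth=2`, a chain `["a", "b", "c"]` under root `"r"` yields `a` (depth 1, parent `r`) and `b` (depth 2, parent `a`), and `c` is skipped.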
🔥 Features
- No Meta account needed: uses public `threads.net` SSR HTML; no login, no API key, no OAuth.
- Full reply tree per post: root + every reply chain Threads server-renders into the initial HTML.
- Depth-linked output: `depth` + `parent_reply_id` let you rebuild the conversation graph trivially in SQL, pandas, or networkx.
- Engagement counts on every node: likes, direct replies, reposts, quote-posts captured per post and per reply.
- Per-post cost control via `maxDepth` (1–10) and `maxRepliesPerNode` (1–500) caps.
- Real Chrome 131 TLS + H2 fingerprint via `curl-cffi` impersonation; survives Meta's basic fingerprint checks.
- Residential proxy (BUYPROXIES94952) ON by default, because Meta blocks datacenter IPs aggressively.
- Pydantic v2 input validation: bad URLs, empty lists, and out-of-range caps fail fast before any network call.
- Apache-2.0 licensed; source code style locked by Devil Scrapes ADR-0003 (SRP, ≤40-line functions, no magic values).
💡 Use cases
- Brand reputation monitoring — pull the entire reply pile-on under a viral mention of your brand and triage by like count or top-level reach.
- Social-listening dashboards — feed conversation-graph rows into Slack, Looker, Tableau, or Hex for real-time sentiment analysis on Threads.
- Meta-policy research — measure conversation topology on policy-adjacent posts (e.g. how often does a public official's post attract nested debate sub-threads?).
- Creator analytics — see which of your own replies sparked sub-conversations vs which died after one comment.
- Crisis comms — quickly download every visible reply to a controversial post for PR review, without manual scrolling.
- Academic research — bootstrap conversation-tree datasets for NLP / argument-mining models from public Threads discussions.
- Competitive intelligence — track reply sentiment under competitor product launches on Threads.
⚙️ How to use it
- Open the Actor input form.
- Paste one or more Threads post URLs into the "Threads post URLs" field, e.g. `https://www.threads.net/@mosseri/post/DYX3oNcAO4r`. Up to 50 URLs per run.
- Set "Maximum reply depth" to control how deep into nested chains the Actor goes (default `3`).
- Set "Max top-level reply threads per post" to cap how many direct replies are exported per post (default `50`).
- Leave "Use Apify Proxy" ON unless you have a specific reason to disable it; Meta blocks datacenter IPs within minutes.
- Click Start. Results stream into the default dataset in real time and are downloadable as JSON, CSV, Excel, or XML.
Finding a Threads post URL
Open any public post on threads.net or in the Threads app and copy the link. The URL format is https://www.threads.net/@{username}/post/{code}. Trailing query strings (?xmt=...) and fragments (#...) are stripped automatically — paste the raw share URL straight in.
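The stripping described above can be approximated with `urllib.parse`. This is a sketch under assumptions: `normalize_post_url` is a hypothetical helper, and both the accepted hosts and the choice of `www.threads.net` as the canonical host are guesses rather than the Actor's documented behaviour.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Matches /@{username}/post/{code}, with an optional trailing slash.
POST_PATH = re.compile(r"^/@(?P<username>[^/]+)/post/(?P<code>[^/]+)/?$")
ACCEPTED_HOSTS = {"www.threads.net", "threads.net"}  # assumption


def normalize_post_url(raw_url: str) -> str:
    """Drop the query string and fragment and return a canonical post URL."""
    parts = urlsplit(raw_url.strip())
    match = POST_PATH.match(parts.path)
    if parts.netloc not in ACCEPTED_HOSTS or match is None:
        raise ValueError(f"not a Threads post URL: {raw_url!r}")
    path = f"/@{match['username']}/post/{match['code']}"
    # Empty query and fragment components are omitted by urlunsplit.
    return urlunsplit(("https", "www.threads.net", path, "", ""))
```

A raw share link such as `https://www.threads.net/@mosseri/post/DYX3oNcAO4r?xmt=abc#x` comes back without the `?xmt=...` and `#...` parts.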
📥 Input
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `postUrls` | array of strings | yes | (none) | 1–50 Threads post URLs |
| `maxDepth` | integer | no | 3 | Reply-depth cap (1–10) |
| `maxRepliesPerNode` | integer | no | 50 | Top-level reply cap per post (1–500) |
| `useProxy` | boolean | no | true | Route via BUYPROXIES94952 residential |
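The ranges and defaults in the table can be checked before a run with a few lines of code. The Actor itself validates input with Pydantic v2; the sketch below swaps in a standard-library dataclass so it runs without dependencies, and `ActorInput` and `parse_input` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActorInput:
    post_urls: list[str]
    max_depth: int = 3
    max_replies_per_node: int = 50
    use_proxy: bool = True

    def __post_init__(self) -> None:
        # Ranges taken from the input table above.
        if not 1 <= len(self.post_urls) <= 50:
            raise ValueError("postUrls must contain 1-50 URLs")
        if not 1 <= self.max_depth <= 10:
            raise ValueError("maxDepth must be between 1 and 10")
        if not 1 <= self.max_replies_per_node <= 500:
            raise ValueError("maxRepliesPerNode must be between 1 and 500")


def parse_input(raw: dict) -> ActorInput:
    """Map the JSON input keys onto the dataclass, applying the defaults."""
    return ActorInput(
        post_urls=list(raw.get("postUrls", [])),
        max_depth=raw.get("maxDepth", 3),
        max_replies_per_node=raw.get("maxRepliesPerNode", 50),
        use_proxy=raw.get("useProxy", True),
    )
```

An empty `postUrls` list or an out-of-range cap raises `ValueError` before any request is made, which mirrors the fail-fast behaviour described in the feature list.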
Single-URL example
```json
{
  "postUrls": ["https://www.threads.net/@mosseri/post/DYX3oNcAO4r"],
  "maxDepth": 3,
  "maxRepliesPerNode": 50,
  "useProxy": true
}
```
Batch example
```json
{
  "postUrls": [
    "https://www.threads.net/@mosseri/post/DYX3oNcAO4r",
    "https://www.threads.net/@threads/post/AAA",
    "https://www.threads.net/@zuck/post/BBB"
  ],
  "maxDepth": 2,
  "maxRepliesPerNode": 25,
  "useProxy": true
}
```
📤 Output
One flat dataset row per post or reply. Filter on `row_type` to keep only root posts, or on `depth` to select direct replies (depth 1) or a specific nested layer (depth 2+).
```json
{
  "row_type": "post",
  "root_post_id": "3897828658278100523",
  "root_post_url": "https://www.threads.net/@mosseri/post/DYX3oNcAO4r",
  "parent_reply_id": null,
  "reply_id": "3897828658278100523",
  "reply_url": "https://www.threads.net/@mosseri/post/DYX3oNcAO4r",
  "reply_text": "Does DMing people back help with reach?",
  "author_username": "mosseri",
  "author_display_name": "Adam Mosseri",
  "author_user_id": "63482099442",
  "author_followers": null,
  "posted_at": "2026-05-15T13:36:48+00:00",
  "like_count": 427,
  "reply_count": 98,
  "repost_count": 12,
  "quote_count": 2,
  "depth": 0,
  "scraped_at": "2026-05-16T12:00:00+00:00"
}
```
Export formats
After a run completes, click Export in the Apify Console for JSON (full fidelity), CSV (flat — ideal for spreadsheets), Excel (.xlsx), or XML. All formats are available via GET /datasets/{id}/items?format=csv&clean=true on the Apify REST API.
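The REST export mentioned above can be scripted. The sketch below builds the items URL against the public Apify API base (`https://api.apify.com/v2`) and downloads it with `urllib`; `dataset_items_url` and `download_items` are hypothetical helper names, and the dataset ID and token are values you supply from your own run.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id: str, fmt: str = "csv", clean: bool = True) -> str:
    """Build the dataset items URL for the requested export format."""
    query = urlencode({"format": fmt, "clean": "true" if clean else "false"})
    return f"{API_BASE}/datasets/{dataset_id}/items?{query}"


def download_items(dataset_id: str, token: str, fmt: str = "csv") -> bytes:
    """Fetch the exported dataset, authenticating with a bearer token."""
    request = Request(
        dataset_items_url(dataset_id, fmt),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urlopen(request, timeout=30) as response:
        return response.read()
```

Swapping `fmt` for `"json"`, `"xlsx"`, or `"xml"` requests the other formats listed above.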
Reconstructing the conversation tree
In SQL:
```sql
SELECT
  parent.reply_text AS parent_text,
  child.reply_text AS reply_text,
  child.depth,
  child.like_count
FROM rows child
LEFT JOIN rows parent
  ON child.parent_reply_id = parent.reply_id
WHERE child.root_post_id = '3897828658278100523'
ORDER BY child.depth, child.like_count DESC;
```
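In pandas: `df.merge(df, how="left", left_on="parent_reply_id", right_on="reply_id", suffixes=("", "_parent"))`. Passing `how="left"` mirrors the LEFT JOIN and keeps the root row, whose `parent_reply_id` is null; the default inner merge would drop it.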
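For code that has no SQL engine or pandas at hand, the same reconstruction can be done in plain Python. This is a sketch over the documented `reply_id` and `parent_reply_id` fields; `build_children_index` and `walk_tree` are hypothetical helper names.

```python
from collections import defaultdict


def build_children_index(rows):
    """Group rows by parent_reply_id so each node's replies are one lookup away."""
    children = defaultdict(list)
    for row in rows:
        children[row["parent_reply_id"]].append(row)
    return children


def walk_tree(rows):
    """Return rows in depth-first order, each parent followed by its replies."""
    children = build_children_index(rows)
    ordered = []
    # Root posts are the rows whose parent_reply_id is None.
    stack = list(reversed(children[None]))
    while stack:
        row = stack.pop()
        ordered.append(row)
        # Push replies in reverse so they pop in their original order.
        stack.extend(reversed(children[row["reply_id"]]))
    return ordered
```

Printing `"  " * row["depth"] + row["reply_text"]` for each row returned by `walk_tree` gives an indented view of the conversation.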
💰 Pricing
Pay-Per-Event (PPE) — you pay only for what you scrape.
| Event | Price (USD) | When |
|---|---|---|
| `actor-start` | $0.05 | Once per run, at boot |
| `result-row` | $0.005 | Per post or reply row written |
Example costs
| Rows scraped | Actor starts | Total cost |
|---|---|---|
| 100 | 1 | $0.55 |
| 500 | 1 | $2.55 |
| 1,000 | 1 | $5.05 |
| 5,000 | 1 | $25.05 |
A typical run on a single mid-sized post (root + ~50 direct replies + a handful of nested chains) emits 60–120 rows, costing $0.35–$0.65.
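The arithmetic behind these figures is one start fee plus a per-row fee. A small sketch using `Decimal` to avoid floating-point rounding (`run_cost` is a hypothetical helper; the two prices come from the event table above):

```python
from decimal import Decimal

ACTOR_START_USD = Decimal("0.05")   # charged once per run
RESULT_ROW_USD = Decimal("0.005")   # charged per post or reply row


def run_cost(rows: int, starts: int = 1) -> Decimal:
    """Total cost in USD for a given number of rows and Actor starts."""
    return starts * ACTOR_START_USD + rows * RESULT_ROW_USD
```

For example, `run_cost(1000)` equals 5.05 USD, and the 60-row and 120-row ends of the typical range come to 0.35 and 0.65 USD.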
🚧 Limitations
- No "Show replies" pagination. Threads renders some deeply nested replies behind a "Show replies" click that triggers a client-side XHR. This Actor emits exactly what `threads.net` serves in the initial HTML: typically the root, all direct replies, and any inline-expanded depth-2/3 chains Threads already included. Deeper hidden replies require their own click and are not fetched.
- No reposter user list. Threads renders the `/reposts/` sub-page entirely client-side from a private GraphQL endpoint that uses rotating `doc_id` + `lsd` tokens. Repost counts are captured on every row (`repost_count`), but the list of accounts that reposted is out of scope.
- No quote-post bodies. `quote_count` is captured per row; the bodies of quote-posts referencing the input would require additional fetches and are not included.
- No media (images, videos). Only `reply_text` is captured. Image ALT text, video transcripts, and external link cards are not extracted.
- Private profiles / login-walled posts return zero rows. If Threads serves a login wall (depending on IP reputation), the Actor logs a WARNING and skips that URL. Enable `useProxy` to maximise success rate.
- Meta may rate-limit. With residential proxy + per-URL session rotation the Actor handles single-post scrapes reliably. Very large batches (>20 posts in one run) may trigger short rate-limit windows; the Actor retries with exponential backoff up to 5 attempts per URL.
- Apify FREE plan retains run-scoped storage for 7 days only. Export your dataset immediately after the run or use a named dataset to retain longer.
- ToS responsibility. Meta's Terms of Service prohibit scraping. The `threads.net` post URL is publicly accessible without login, but you remain responsible for verifying your jurisdiction's data-protection rules and Meta's current Terms before using scraped data commercially.
❓ FAQ
Do I need a Threads or Instagram account?
No. The Actor calls threads.net directly with a real Chrome 131 fingerprint. No Meta login, no API key, no OAuth.
Does this work for any public Threads post?
Yes, as long as the post is reachable at its public URL without a login wall. Private accounts, deleted posts, and posts behind an age-gate return zero rows and a clear status message.
How deep into the reply tree does this go?
By default, depth 3 — root post (depth 0), direct replies (depth 1), and the first two layers of inline-expanded nested replies Threads embeds in the initial HTML (depth 2 and 3). Increase maxDepth up to 10 if you need every embedded chain.
Why isn't the reposter list included?
Threads' /reposts/ sub-page loads its user list via an internal client-side XHR that uses rotating GraphQL doc_id + lsd tokens. Implementing that would create constant breakage as Meta rotates the tokens. Repost counts are still captured on every row.
Why is useProxy ON by default?
Meta blocks repeated requests from the same datacenter IP within minutes. The Devil Scrapes BUYPROXIES94952 residential pool rotates IPs per URL, which is the difference between consistent success and an empty dataset.
Can I rebuild the conversation graph?
Yes — every row carries parent_reply_id, so a single LEFT JOIN on reply_id reconstructs the tree. See the Output → Reconstructing the conversation tree section above for SQL and pandas examples.
What happens if Meta blocks the request?
The Actor retries with exponential backoff (up to 5 attempts per URL). If all retries fail or the page returns a login wall instead of the conversation payload, that URL is skipped with a WARNING log and the run continues with the next URL. If every URL fails, the Actor exits non-zero with a clear status message.
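The retry behaviour described in this answer follows a common pattern, sketched below. Only the five-attempt cap comes from this README; the one-second base delay, the doubling schedule, and the names `FetchError` and `fetch_with_backoff` are assumptions for illustration.

```python
import time


class FetchError(Exception):
    """Hypothetical error raised when a fetch is blocked or returns a login wall."""


def fetch_with_backoff(fetch, url, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on FetchError with exponentially growing waits."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except FetchError:
            if attempt == max_attempts:
                raise  # out of attempts; the caller skips this URL
            # Waits of base_delay * 1, 2, 4, 8, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
```

A caller would wrap each URL in a `try` block, log a warning on the final `FetchError`, and move on to the next URL. The `sleep` parameter is injectable so the schedule can be tested without waiting.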
💬 Your feedback
Found a bug, hit a rate-limit pattern, or need a new field on the output row? Open an issue on the Actor's Apify Store page or contact the Devil Scrapes team at apify.com/DevilScrapes. We ship updates within days of validated reports.