Threads Reply Scraper — Conversation Graph
Pricing
Pay per event
Export the full reply tree of any public Threads post — no Meta login — as a conversation graph plus engagement counts, to JSON or CSV. A Threads post scraper built on the threads.net SSR HTML payload. We retry and rotate so the thread lands.
Developer: DevilScrapes
Maintained by Community
Threads Reply Scraper — Full Conversation Graph Export
We do the dirty work so your dataset stays clean. 😈
$5.05 / 1,000 rows — Export the full reply tree of any public Threads (threads.net) post. No Meta account. No API key. Every visible reply chain, depth-linked, with engagement counts on every node — ready for conversation-graph analysis, brand triage, or NLP research.
Existing Threads scrapers on the Apify Store cap at ~20 posts per profile with no reply-tree expansion. This Actor goes the opposite direction: pick one post, get the whole conversation underneath it, parent-pointed and depth-indexed.
🎯 What this scrapes
You pass one or more Threads post URLs. For each post, this Actor:
- Fetches `https://www.threads.net/@{username}/post/{code}` with a real browser fingerprint.
- Extracts the server-rendered conversation payload Threads embeds inside `<script type="application/json" data-sjs>` blocks.
- Walks the payload's `edges -> thread_items` tree and emits one flat row per node: the root post plus every reply Threads inlined into the initial HTML, including nested chains (depth 2+).
Every row carries a row_type discriminator ("post" for the root, "reply" for everything else), a depth integer (0 for root, 1 for direct replies, 2+ for nested chain replies), and a parent_reply_id pointer so consumers can reconstruct the conversation graph with a single LEFT JOIN.
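The row model described above can be illustrated with a short sketch. It assumes a hypothetical nested input with `id`, `text`, and `replies` keys, which is not the real Threads payload shape, and a hypothetical `flatten` helper. Only the output field names are taken from this README.

```python
from typing import Any, Dict, Iterator, Optional


def flatten(node: Dict[str, Any], root_id: str, parent_id: Optional[str] = None,
            depth: int = 0, max_depth: int = 3) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per node of a nested reply tree, depth-first."""
    yield {
        "row_type": "post" if depth == 0 else "reply",
        "root_post_id": root_id,
        "parent_reply_id": parent_id,  # None only for the root
        "reply_id": node["id"],
        "reply_text": node.get("text", ""),
        "depth": depth,
    }
    if depth >= max_depth:
        return  # children would exceed the depth cap, so stop descending
    for child in node.get("replies", []):
        yield from flatten(child, root_id, node["id"], depth + 1, max_depth)


# Hypothetical nested input, NOT the real Threads payload.
tree = {"id": "1", "text": "root", "replies": [
    {"id": "2", "text": "a", "replies": [{"id": "4", "text": "a1", "replies": []}]},
    {"id": "3", "text": "b", "replies": []},
]}
rows = list(flatten(tree, root_id=tree["id"]))
```

Each reply points at the node directly above it, so the flat rows keep enough information to rebuild the tree later.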
This is the threads.net scraper that goes where the official Threads API won't: third-party conversation trees, no developer-account review required.
| Field | Type | Description |
|---|---|---|
| `row_type` | string enum | `"post"` for the root, `"reply"` for everything else |
| `root_post_id` | string | Threads `pk` of the input post this row belongs to |
| `root_post_url` | string | Input URL (normalized); same value on every row from one input |
| `parent_reply_id` | string or null | `null` for root; root `pk` for direct replies; predecessor `pk` for nested |
| `reply_id` | string | This node's `pk` |
| `reply_url` | string | Public Threads URL of this node |
| `reply_text` | string | Caption body (empty string if media-only) |
| `author_username` | string | Author handle |
| `author_display_name` | string or null | Author full name |
| `author_user_id` | string or null | Internal Threads user `pk` |
| `author_followers` | integer or null | Follower count when Threads surfaces it (usually only root author) |
| `posted_at` | string | ISO 8601 UTC timestamp |
| `like_count` | integer | Likes at scrape time |
| `reply_count` | integer | Direct reply count at scrape time |
| `repost_count` | integer | Repost count at scrape time |
| `quote_count` | integer | Quote-post count at scrape time |
| `depth` | integer | 0 = root, 1 = direct, 2+ = nested chain |
| `scraped_at` | string | ISO 8601 UTC when the row was written |
🔥 Features
- Full reply tree per post: root plus every reply chain Threads server-renders into the initial HTML, including nested depth-2/3+ chains.
- Depth-linked output: `depth` + `parent_reply_id` let you rebuild the conversation graph trivially in SQL, pandas, or networkx.
- Engagement counts on every node: likes, direct replies, reposts, and quote-posts captured per post and per reply.
- We rotate browser fingerprints: curl-cffi Chrome 131 TLS + HTTP/2 impersonation so the target sees a real browser, not Python. Fingerprint profiles cycle per session.
- We rotate residential proxies: the BUYPROXIES94952 residential pool is on by default, with a fresh session ID on every block. Meta bans datacenter IPs within minutes; we route around it.
- We retry with exponential backoff: up to 5 attempts per URL on `408 / 429 / 5xx`, with `Retry-After` honoured. You get results, not empty datasets.
- Per-post cost control: `maxDepth` (1–10) and `maxRepliesPerNode` (1–500) caps so you pay exactly for what you need.
- Pydantic v2 input validation: bad URLs, empty lists, and out-of-range caps fail fast before any network call, not after you've paid for a run.
- Clean typed rows: ISO 8601 timestamps, stable `pk`-based IDs, and nullable fields declared, so there are no surprise nulls or mixed types in your dataset.
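The retry policy in the list above (5 attempts, `408 / 429 / 5xx`, `Retry-After` honoured) could be sketched roughly as follows. This is not the Actor's actual code: the function names, the 1-second base, the 60-second cap, and the 10% jitter are illustrative assumptions, and only the delta-seconds form of `Retry-After` is handled.

```python
import random
from typing import Optional

MAX_ATTEMPTS = 5          # from the feature list above
RETRYABLE = {408, 429}    # plus any 5xx, checked below


def is_retryable(status: int) -> bool:
    """True for the status codes the feature list names as retryable."""
    return status in RETRYABLE or 500 <= status <= 599


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (1-based).

    `base`, `cap`, and the jitter range are assumptions for illustration.
    """
    # Honour a numeric Retry-After header verbatim (HTTP-date form is ignored here).
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after.strip())
    delay = min(base * 2 ** (attempt - 1), cap)  # 1, 2, 4, 8, ... seconds
    return delay + random.uniform(0, delay * 0.1)  # small jitter to spread retries
```

A caller would loop up to `MAX_ATTEMPTS` times, sleeping for `backoff_delay(attempt, headers.get("Retry-After"))` whenever `is_retryable(status)` is true.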
💡 Use cases
- Brand reputation monitoring — pull the entire reply pile-on under a viral brand mention and triage by like count or reach in 2 minutes, not 90.
- Crisis communications — export every visible reply to a controversial post for PR review without manual scrolling.
- Social-listening dashboards — feed conversation-graph rows into Slack, Looker, Tableau, or Hex for real-time sentiment tracking on Threads.
- Competitive intelligence — track reply sentiment under competitor product launches on Threads.
- Creator analytics — see which of your own replies sparked sub-conversations vs which died after one comment.
- Academic research — bootstrap conversation-tree datasets for NLP and argument-mining models from public Threads discussions.
- Meta-policy research — measure conversation topology on policy-adjacent posts (fanout, nested-debate sub-threads, engagement decay).
- OSINT investigation — track public discussion threads around named events or accounts.
⚙️ How to use it
- Open the Actor input form.
- Paste one or more Threads post URLs into **Threads post URLs**, e.g. `https://www.threads.net/@mosseri/post/DYX3oNcAO4r`. Up to 50 URLs per run.
- Set **Maximum reply depth** to control how deep into nested chains the Actor goes (default `3`).
- Set **Max top-level reply threads per post** to cap how many direct replies are exported per post (default `50`).
- Leave **Use Apify Proxy** ON: Meta blocks datacenter IPs within minutes. We handle the rotation.
- Click **Start**. Results stream into the default dataset in real time and are downloadable as JSON, CSV, Excel, or XML.
Finding a Threads post URL
Open any public post on threads.net or in the Threads app and copy the share link. The URL format is https://www.threads.net/@{username}/post/{code}. Trailing query strings (?xmt=...) and fragments are stripped automatically — paste the raw share URL straight in.
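The normalization described above can be approximated with the standard library. This is a sketch rather than the Actor's own parser: `normalize_post_url` is a hypothetical helper, and the character sets accepted for the username and post code are assumptions.

```python
import re
from urllib.parse import urlsplit

# Assumed character sets for the handle and the post code.
_POST_PATH = re.compile(
    r"^/@(?P<username>[A-Za-z0-9._]+)/post/(?P<code>[A-Za-z0-9_-]+)/?$"
)


def normalize_post_url(raw: str) -> str:
    """Drop the query string and fragment and return the canonical post URL."""
    parts = urlsplit(raw.strip())
    match = _POST_PATH.match(parts.path)
    if (parts.scheme != "https"
            or parts.netloc not in {"www.threads.net", "threads.net"}
            or match is None):
        raise ValueError(f"not a Threads post URL: {raw!r}")
    return f"https://www.threads.net/@{match['username']}/post/{match['code']}"
```

Passing the raw share link `https://www.threads.net/@mosseri/post/DYX3oNcAO4r?xmt=abc` through this helper yields the URL without the `?xmt=...` suffix.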
📥 Input
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `postUrls` | array of strings | yes | n/a | 1–50 Threads post URLs |
| `maxDepth` | integer | no | 3 | Reply-depth cap (1–10) |
| `maxRepliesPerNode` | integer | no | 50 | Top-level reply cap per post (1–500) |
| `useProxy` | boolean | no | true | Route via BUYPROXIES94952 residential |
Single-URL example
    {
      "postUrls": ["https://www.threads.net/@mosseri/post/DYX3oNcAO4r"],
      "maxDepth": 3,
      "maxRepliesPerNode": 50,
      "useProxy": true
    }
Batch example
    {
      "postUrls": [
        "https://www.threads.net/@mosseri/post/DYX3oNcAO4r",
        "https://www.threads.net/@threads/post/AAA",
        "https://www.threads.net/@zuck/post/BBB"
      ],
      "maxDepth": 2,
      "maxRepliesPerNode": 25,
      "useProxy": true
    }
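The input limits from the table above can be checked before a run is submitted. The Actor itself validates with Pydantic v2; the sketch below uses a plain dataclass instead so it runs without dependencies. `ActorInput` and `is_valid` are hypothetical names, and only the field names, defaults, and ranges come from this README.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ActorInput:
    postUrls: List[str]
    maxDepth: int = 3
    maxRepliesPerNode: int = 50
    useProxy: bool = True

    def __post_init__(self) -> None:
        # Ranges mirror the input table: 1-50 URLs, depth 1-10, replies 1-500.
        if not 1 <= len(self.postUrls) <= 50:
            raise ValueError("postUrls must contain 1-50 URLs")
        if not 1 <= self.maxDepth <= 10:
            raise ValueError("maxDepth must be between 1 and 10")
        if not 1 <= self.maxRepliesPerNode <= 500:
            raise ValueError("maxRepliesPerNode must be between 1 and 500")


def is_valid(payload: dict) -> bool:
    """Return True if the payload satisfies the documented input limits."""
    try:
        ActorInput(**payload)
    except (TypeError, ValueError):  # missing/unknown keys or out-of-range values
        return False
    return True
```

Running `is_valid` on a payload locally catches an empty URL list or an out-of-range cap before the `actor-start` event is charged.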
📤 Output
One flat dataset row per post or reply. Use row_type to filter to roots only, or depth to filter to direct replies or specific nested layers.
    {
      "row_type": "post",
      "root_post_id": "3897828658278100523",
      "root_post_url": "https://www.threads.net/@mosseri/post/DYX3oNcAO4r",
      "parent_reply_id": null,
      "reply_id": "3897828658278100523",
      "reply_url": "https://www.threads.net/@mosseri/post/DYX3oNcAO4r",
      "reply_text": "Does DMing people back help with reach?",
      "author_username": "mosseri",
      "author_display_name": "Adam Mosseri",
      "author_user_id": "63482099442",
      "author_followers": null,
      "posted_at": "2026-05-15T13:36:48+00:00",
      "like_count": 427,
      "reply_count": 98,
      "repost_count": 12,
      "quote_count": 2,
      "depth": 0,
      "scraped_at": "2026-05-16T12:00:00+00:00"
    }
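Filtering on `row_type` and `depth` needs nothing beyond the standard library once the rows are loaded. The sketch below uses made-up sample rows with a reduced field set; the helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def direct_replies(rows: List[dict]) -> List[dict]:
    """Depth-1 replies only, most-liked first."""
    picked = [r for r in rows if r["row_type"] == "reply" and r["depth"] == 1]
    return sorted(picked, key=lambda r: r["like_count"], reverse=True)


def rows_per_depth(rows: List[dict]) -> Dict[int, int]:
    """Count how many rows sit at each depth level."""
    return dict(Counter(r["depth"] for r in rows))


# Made-up rows for illustration; real rows carry every field in the output table.
sample_rows = [
    {"row_type": "post", "reply_id": "1", "depth": 0, "like_count": 427},
    {"row_type": "reply", "reply_id": "2", "depth": 1, "like_count": 3},
    {"row_type": "reply", "reply_id": "3", "depth": 1, "like_count": 9},
    {"row_type": "reply", "reply_id": "4", "depth": 2, "like_count": 1},
]
```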
Export formats
After a run completes, click Export in the Apify Console for JSON (full fidelity), CSV (flat, ideal for spreadsheets), Excel (.xlsx), or XML. The same formats are available from the Apify REST API at `GET /v2/datasets/{id}/items?format=csv&clean=true`; change the `format` value to `json`, `xlsx`, or `xml` for the other formats.
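For scripted exports, the download URL can be assembled as in the sketch below. `dataset_items_url` is a hypothetical helper; the `https://api.apify.com/v2` base is Apify's public API host and is an assumption here, since this README only gives the path. A non-public dataset will also likely need an API token, which this sketch leaves out.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"  # assumed base URL, not stated in this README
FORMATS = {"json", "csv", "xlsx", "xml"}


def dataset_items_url(dataset_id: str, fmt: str = "csv", clean: bool = True) -> str:
    """Build the dataset-items download URL for one export format."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    query = urlencode({"format": fmt, "clean": "true" if clean else "false"})
    return f"{API_BASE}/datasets/{dataset_id}/items?{query}"
```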
Reconstructing the conversation tree
In SQL:
    SELECT parent.reply_text AS parent_text,
           child.reply_text AS reply_text,
           child.depth,
           child.like_count
    FROM rows child
    LEFT JOIN rows parent
      ON child.parent_reply_id = parent.reply_id
    WHERE child.root_post_id = '3897828658278100523'
    ORDER BY child.depth, child.like_count DESC;
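In pandas: `df.merge(df, how="left", left_on="parent_reply_id", right_on="reply_id", suffixes=("", "_parent"))`. Passing `how="left"` matches the SQL LEFT JOIN and keeps the root row, whose `parent_reply_id` is null; the default inner join would drop it.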
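The same reconstruction works without a database or pandas. The sketch below indexes rows by `parent_reply_id` and walks the tree from the root, rendering each node with indentation. `render_thread` is a hypothetical helper, and ordering siblings by `like_count` is an illustrative choice.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def render_thread(rows: List[dict]) -> List[str]:
    """Return indented 'author: text' lines, one per node, parents before children."""
    children: Dict[Optional[str], List[dict]] = defaultdict(list)
    for row in rows:
        children[row["parent_reply_id"]].append(row)  # root rows land under None

    lines: List[str] = []

    def visit(parent_id: Optional[str], indent: int) -> None:
        # Most-liked siblings first; .get avoids creating empty index entries.
        siblings = sorted(children.get(parent_id, []),
                          key=lambda r: r["like_count"], reverse=True)
        for row in siblings:
            lines.append("  " * indent + f'{row["author_username"]}: {row["reply_text"]}')
            visit(row["reply_id"], indent + 1)

    visit(None, 0)
    return lines


# Made-up rows with only the fields this sketch reads.
demo_rows = [
    {"reply_id": "1", "parent_reply_id": None, "author_username": "root", "reply_text": "Q", "like_count": 10},
    {"reply_id": "2", "parent_reply_id": "1", "author_username": "a", "reply_text": "A1", "like_count": 1},
    {"reply_id": "3", "parent_reply_id": "1", "author_username": "b", "reply_text": "A2", "like_count": 5},
    {"reply_id": "4", "parent_reply_id": "2", "author_username": "c", "reply_text": "A1.1", "like_count": 0},
]
```

Recursion depth is bounded by the reply depth, which `maxDepth` caps at 10.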
💰 Pricing
Pay-Per-Event (PPE) — you pay only for what you scrape. No result → no charge beyond the small start fee.
| Event | Price (USD) | When |
|---|---|---|
| `actor-start` | $0.05 | Once per run, at boot |
| `result-row` | $0.005 | Per post or reply row written |
Example costs
| Rows scraped | Actor starts | Total cost |
|---|---|---|
| 100 | 1 | $0.55 |
| 500 | 1 | $2.55 |
| 1,000 | 1 | $5.05 |
| 5,000 | 1 | $25.05 |
A typical run on a single mid-sized post (root + ~50 direct replies + a handful of nested chains) emits 60–120 rows, costing $0.35–$0.65.
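The figures above follow from one formula: total = $0.05 × starts + $0.005 × rows. A small sketch, using `Decimal` to avoid floating-point rounding; `estimate_cost` is a hypothetical helper and the prices are copied from the pricing table.

```python
from decimal import Decimal

ACTOR_START = Decimal("0.05")   # per run, from the pricing table
RESULT_ROW = Decimal("0.005")   # per post or reply row written


def estimate_cost(rows: int, starts: int = 1) -> Decimal:
    """Estimated USD cost for a number of rows across a number of runs."""
    return ACTOR_START * starts + RESULT_ROW * rows
```

For example, `estimate_cost(60)` and `estimate_cost(120)` give the $0.35 and $0.65 bounds quoted for a typical single-post run.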
🚧 Limitations
- No "Show replies" pagination. Threads renders some deeply nested replies behind a "Show replies" click that triggers a client-side XHR. This Actor emits exactly what `threads.net` serves in the initial HTML: typically the root, all direct replies, and any inline-expanded depth-2/3 chains Threads already included. Deeper hidden replies require their own XHR and are not fetched in this version.
- No reposter user list. Threads renders the `/reposts/` sub-page entirely client-side from a private endpoint using rotating tokens. Repost counts are captured on every row (`repost_count`), but the list of accounts that reposted is out of scope.
- No quote-post bodies. `quote_count` is captured per row; the bodies of quote-posts referencing the input are not included.
- No media (images, videos). Only `reply_text` is captured. Image ALT text, video transcripts, and external link cards are not extracted.
- Private profiles / login-walled posts return zero rows. If the page returns a login wall instead of the conversation payload, the Actor logs a WARNING and skips that URL. Enable `useProxy` to maximise success rate.
- Very large batches may encounter rate-limit windows. With residential proxy and per-URL session rotation the Actor handles single-post scrapes reliably. Batches larger than 20 posts in one run may trigger short pauses; we retry with exponential backoff up to 5 attempts per URL.
- Not real-time. The Actor reads what `threads.net` serves in its current SSR HTML. Replies posted seconds before the scrape may not yet be inlined in that snapshot.
- Apify FREE plan retains run-scoped storage for 7 days only. Export your dataset immediately after the run or use a named dataset to retain longer.
- ToS responsibility. Meta's Terms of Service prohibit scraping. The `threads.net` post URL is publicly accessible without login, but you remain responsible for verifying your jurisdiction's data-protection rules and Meta's current Terms before using scraped data commercially.
❓ FAQ
Do I need a Threads or Instagram account?
No. The Actor fetches threads.net directly with a real Chrome browser fingerprint. No Meta login, no API key, no OAuth flow.
Is this a Meta Threads API alternative?
It is complementary to the official API. Meta's Threads API is gated behind a developer-account review and exposes only the post owner's own data — it does not support third-party conversation-tree reads. This Actor reads the same public SSR HTML any browser renders when visiting a threads.net post URL. Use both where each fits.
Does this work as a threads.net scraper for any public post?
Yes, as long as the post is reachable at its public URL without a login wall. Private accounts, deleted posts, and posts behind an age-gate return zero rows and a clear status message.
How deep into the reply tree does this go?
By default, depth 3 — root post (depth 0), direct replies (depth 1), and the first two layers of inline-expanded nested replies Threads embeds in the initial HTML (depth 2 and 3). Increase maxDepth up to 10 if you need every embedded chain.
Why isn't the reposter list included?
Threads' /reposts/ sub-page loads its user list via an internal client-side request with rotating tokens. Implementing that would create constant breakage as the tokens rotate. Repost counts are still captured on every row.
Why is residential proxy on by default?
Meta blocks repeated requests from the same datacenter IP within minutes. The residential pool rotates IPs per URL — that is the difference between consistent results and an empty dataset. We manage the rotation so you don't have to.
What happens if Meta blocks a request?
We retry with exponential backoff — up to 5 attempts per URL. If all retries fail or the page returns a login wall, that URL is skipped with a WARNING log and the run continues. If every URL fails, the Actor exits non-zero with a clear status message so you always know what happened.
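The skip-and-continue behaviour described in this answer could look like the sketch below. It is an illustration under assumptions, not the Actor's source: `run_batch` and the injected `scrape_one` callable are hypothetical, and `scrape_one` is assumed to return `None` once its retries are exhausted or a login wall is detected.

```python
import logging
from typing import Callable, Iterable, List, Optional

log = logging.getLogger("threads-reply-scraper")  # hypothetical logger name


def run_batch(urls: Iterable[str],
              scrape_one: Callable[[str], Optional[List[dict]]]) -> List[dict]:
    """Scrape each URL, skipping failures; abort only if every URL fails."""
    rows: List[dict] = []
    attempted = failed = 0
    for url in urls:
        attempted += 1
        result = scrape_one(url)  # assumed to return None after exhausting retries
        if result is None:
            log.warning("Skipping %s: blocked or login-walled", url)
            failed += 1
            continue
        rows.extend(result)
    if attempted and failed == attempted:
        # A string argument makes the process exit with a non-zero status.
        raise SystemExit("All URLs failed; no rows written")
    return rows
```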
Can I rebuild the conversation graph from the output?
Yes — every row carries parent_reply_id, so a single LEFT JOIN on reply_id reconstructs the tree. See the Output → Reconstructing the conversation tree section above for SQL and pandas examples.
Can I download Threads replies as a spreadsheet?
Yes. After the run completes, click Export and choose CSV or Excel. The flat row-per-reply structure maps directly to a spreadsheet without any transformation needed.
💬 Your feedback
Found a bug, hit a rate-limit pattern, or need a new field on the output row? Open an issue on the Actor's Apify Store page or contact the Devil Scrapes team at apify.com/DevilScrapes. We ship updates within days of validated reports.