Threads Reply Tree Scraper

Pricing

Pay per event

Export the full reply tree of any public Threads post. No Meta login. Conversation graph plus engagement counts via the threads.net SSR HTML payload.

Developer

DevilScrapes

Maintained by Community

Last modified: 3 days ago

We do the dirty work so your dataset stays clean. 😈

$5.05 / 1,000 rows — Export the full reply tree of any public Threads (threads.net) post. No Meta account. No API key. Get every visible reply chain, depth-linked, with engagement counts on every node, ready for conversation-graph analysis.

Existing Threads scrapers on the Apify Store cap at ~20 posts per profile with no reply-tree expansion. This Actor goes the opposite direction: pick one post, get the whole conversation underneath it.

🎯 What this scrapes

You pass one or more Threads post URLs. For each post, this Actor:

  1. Fetches https://www.threads.net/@{username}/post/{code} with a real browser fingerprint (curl-cffi Chrome 131 TLS + H2 impersonation).
  2. Extracts the server-rendered conversation payload Threads embeds inside <script type="application/json" data-sjs> blocks.
  3. Walks the payload's edges -> thread_items tree and emits one flat row per node — the root post plus every reply Threads inlined into the initial HTML, including nested chains (depth 2+).
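
As a rough illustration of steps 2 and 3, the sketch below pulls the JSON out of data-sjs script blocks and collects every thread_items list it finds. It uses only the Python standard library. The class and function names are hypothetical, and the payload nesting is an assumption, so the search is deliberately key-based rather than path-based.

```python
import json
from html.parser import HTMLParser


class SjsScriptParser(HTMLParser):
    """Collects the text of <script type="application/json" data-sjs> blocks."""

    def __init__(self):
        super().__init__()
        self._capturing = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if (tag == "script"
                and attr_map.get("type") == "application/json"
                and "data-sjs" in attr_map):
            self._capturing = True
            self._chunks = []

    def handle_data(self, data):
        if self._capturing:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._capturing:
            self.blocks.append("".join(self._chunks))
            self._capturing = False


def find_thread_items(node):
    """Yield every list stored under a "thread_items" key, at any nesting level."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "thread_items" and isinstance(value, list):
                yield value
            else:
                yield from find_thread_items(value)
    elif isinstance(node, list):
        for child in node:
            yield from find_thread_items(child)


def extract_thread_items(html):
    """Return all thread_items lists embedded in the page's data-sjs blocks."""
    parser = SjsScriptParser()
    parser.feed(html)
    parser.close()
    found = []
    for raw in parser.blocks:
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
        found.extend(find_thread_items(payload))
    return found
```

Searching by key keeps the sketch tolerant of changes in the surrounding payload structure, at the cost of possibly matching unrelated thread_items lists.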

Every row carries a row_type discriminator ("post" for the root, "reply" for everything else), a depth integer (0 for root, 1 for direct replies, 2+ for nested chain replies), and a parent_reply_id pointer so consumers can reconstruct the conversation graph with a single LEFT JOIN.

| Field | Type | Description |
| --- | --- | --- |
| row_type | string enum | "post" for the root, "reply" for everything else |
| root_post_id | string | Threads pk of the input post this row belongs to |
| root_post_url | string | Input URL (normalized), same value on every row from one input |
| parent_reply_id | string or null | null for root; root pk for direct replies; predecessor pk for nested |
| reply_id | string | This node's pk |
| reply_url | string | Public Threads URL of this node |
| reply_text | string | Caption body (empty string if media-only) |
| author_username | string | Author handle |
| author_display_name | string or null | Author full name |
| author_user_id | string or null | Internal Threads user pk |
| author_followers | integer or null | Follower count when Threads surfaces it (usually only root author) |
| posted_at | string | ISO 8601 UTC timestamp |
| like_count | integer | Likes at scrape time |
| reply_count | integer | Direct reply count at scrape time |
| repost_count | integer | Repost count at scrape time |
| quote_count | integer | Quote-post count at scrape time |
| depth | integer | 0=root, 1=direct, 2+=nested chain |
| scraped_at | string | ISO 8601 UTC when the row was written |
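
To make the depth and parent_reply_id semantics concrete, here is a minimal sketch of how reply chains could be flattened into rows. It assumes each chain is a list of post dicts ordered from the direct reply down to the deepest nested reply, and that each post carries a pk and an optional text. That input shape and the flatten_chains name are illustrative assumptions, not the Actor's actual internals. The two caps mirror the documented maxDepth and maxRepliesPerNode inputs.

```python
def flatten_chains(root, chains, max_depth=3, max_replies=50):
    """Emit one flat row per node: the root first, then each reply chain in order."""

    def make_row(post, row_type, parent_id, depth):
        return {
            "row_type": row_type,
            "root_post_id": root["pk"],
            "parent_reply_id": parent_id,
            "reply_id": post["pk"],
            "reply_text": post.get("text", ""),
            "depth": depth,
        }

    rows = [make_row(root, "post", None, 0)]
    # Cap the number of top-level reply threads exported for this post.
    for chain in chains[:max_replies]:
        parent_id = root["pk"]  # a direct reply points at the root
        # Cap how deep each chain is followed; depth 1 is the direct reply.
        for depth, post in enumerate(chain[:max_depth], start=1):
            rows.append(make_row(post, "reply", parent_id, depth))
            parent_id = post["pk"]  # the next reply points at its predecessor
    return rows
```

Only a subset of the output fields is shown; the engagement counts and author fields would be copied onto each row in the same way.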

🔥 Features

  • No Meta account needed — uses public threads.net SSR HTML; no login, no API key, no OAuth.
  • Full reply tree per post — root + every reply chain Threads server-renders into the initial HTML.
  • Depth-linked output — depth + parent_reply_id let you rebuild the conversation graph trivially in SQL, pandas, or networkx.
  • Engagement counts on every node — likes, direct replies, reposts, quote-posts captured per post and per reply.
  • Per-post cost control via maxDepth (1–10) and maxRepliesPerNode (1–500) caps.
  • Real Chrome 131 TLS + H2 fingerprint via curl-cffi impersonation — survives Meta's basic fingerprint checks.
  • Residential proxy (BUYPROXIES94952) ON by default — Meta blocks datacenter IPs aggressively.
  • Pydantic v2 input validation — bad URLs, empty lists, and out-of-range caps fail fast before any network call.
  • Apache-2.0 licensed; source code style locked by Devil Scrapes ADR-0003 (SRP, ≤40-line functions, no magic values).

💡 Use cases

  • Brand reputation monitoring — pull the entire reply pile-on under a viral mention of your brand and triage by like count or top-level reach.
  • Social-listening dashboards — feed conversation-graph rows into Slack, Looker, Tableau, or Hex for real-time sentiment analysis on Threads.
  • Meta-policy research — measure conversation topology on policy-adjacent posts (e.g. how often does a public official's post attract nested debate sub-threads?).
  • Creator analytics — see which of your own replies sparked sub-conversations vs which died after one comment.
  • Crisis comms — quickly download every visible reply to a controversial post for PR review, without manual scrolling.
  • Academic research — bootstrap conversation-tree datasets for NLP / argument-mining models from public Threads discussions.
  • Competitive intelligence — track reply sentiment under competitor product launches on Threads.

⚙️ How to use it

  1. Open the Actor input form.
  2. Paste one or more post URLs into the "Threads post URLs" field, e.g. https://www.threads.net/@mosseri/post/DYX3oNcAO4r. Up to 50 URLs per run.
  3. Set "Maximum reply depth" to control how deep into nested chains the Actor goes (default 3).
  4. Set "Max top-level reply threads per post" to cap how many direct replies are exported per post (default 50).
  5. Leave "Use Apify Proxy" ON unless you have a specific reason to disable it; Meta blocks datacenter IPs within minutes.
  6. Click Start. Results stream into the default dataset in real time and are downloadable as JSON, CSV, Excel, or XML.

Finding a Threads post URL

Open any public post on threads.net or in the Threads app and copy the link. The URL format is https://www.threads.net/@{username}/post/{code}. Trailing query strings (?xmt=...) and fragments (#...) are stripped automatically — paste the raw share URL straight in.
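
The normalization described above could look roughly like the following sketch. The accepted hostnames and the character sets allowed in the username and post code are assumptions made for illustration, and normalize_post_url is a hypothetical name.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed shape of a post path: /@{username}/post/{code}, optional trailing slash.
POST_PATH = re.compile(
    r"^/@(?P<username>[A-Za-z0-9._]+)/post/(?P<code>[A-Za-z0-9_-]+)/?$"
)
ALLOWED_HOSTS = {"www.threads.net", "threads.net"}  # assumption


def normalize_post_url(raw_url):
    """Strip the query string and fragment and return the canonical post URL."""
    parts = urlsplit(raw_url.strip())
    match = POST_PATH.match(parts.path)
    if parts.scheme != "https" or parts.netloc not in ALLOWED_HOSTS or not match:
        raise ValueError(f"not a Threads post URL: {raw_url!r}")
    path = f"/@{match['username']}/post/{match['code']}"
    return urlunsplit(("https", "www.threads.net", path, "", ""))
```

Raising on anything that does not match keeps bad input from reaching the network layer.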

📥 Input

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| postUrls | array of strings | yes | (none) | 1–50 Threads post URLs |
| maxDepth | integer | no | 3 | Reply-depth cap (1–10) |
| maxRepliesPerNode | integer | no | 50 | Top-level reply cap per post (1–500) |
| useProxy | boolean | no | true | Route via BUYPROXIES94952 residential |
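
The bounds in this table can be checked before any request is made. The Actor is described as using Pydantic v2 for this; the sketch below is a dependency-free stand-in built on a standard-library dataclass, so the ActorInput class is hypothetical and only mirrors the documented defaults and ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActorInput:
    """Plain-Python stand-in for the Actor's input model (field names as documented)."""

    postUrls: list
    maxDepth: int = 3
    maxRepliesPerNode: int = 50
    useProxy: bool = True

    def __post_init__(self):
        # Fail fast on out-of-range values, before any network call.
        if not 1 <= len(self.postUrls) <= 50:
            raise ValueError("postUrls must contain 1 to 50 URLs")
        if not 1 <= self.maxDepth <= 10:
            raise ValueError("maxDepth must be between 1 and 10")
        if not 1 <= self.maxRepliesPerNode <= 500:
            raise ValueError("maxRepliesPerNode must be between 1 and 500")
```

Because the attribute names match the input keys, a parsed input JSON object can be passed directly as ActorInput(**payload).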

Single-URL example

    {
      "postUrls": [
        "https://www.threads.net/@mosseri/post/DYX3oNcAO4r"
      ],
      "maxDepth": 3,
      "maxRepliesPerNode": 50,
      "useProxy": true
    }

Batch example

    {
      "postUrls": [
        "https://www.threads.net/@mosseri/post/DYX3oNcAO4r",
        "https://www.threads.net/@threads/post/AAA",
        "https://www.threads.net/@zuck/post/BBB"
      ],
      "maxDepth": 2,
      "maxRepliesPerNode": 25,
      "useProxy": true
    }

📤 Output

One flat dataset row per post or reply. Filter on row_type to keep roots only, or on depth to select direct replies (depth 1) or a specific nested layer (depth 2 and up).

    {
      "row_type": "post",
      "root_post_id": "3897828658278100523",
      "root_post_url": "https://www.threads.net/@mosseri/post/DYX3oNcAO4r",
      "parent_reply_id": null,
      "reply_id": "3897828658278100523",
      "reply_url": "https://www.threads.net/@mosseri/post/DYX3oNcAO4r",
      "reply_text": "Does DMing people back help with reach?",
      "author_username": "mosseri",
      "author_display_name": "Adam Mosseri",
      "author_user_id": "63482099442",
      "author_followers": null,
      "posted_at": "2026-05-15T13:36:48+00:00",
      "like_count": 427,
      "reply_count": 98,
      "repost_count": 12,
      "quote_count": 2,
      "depth": 0,
      "scraped_at": "2026-05-16T12:00:00+00:00"
    }

Export formats

After a run completes, click Export in the Apify Console for JSON (full fidelity), CSV (flat, ideal for spreadsheets), Excel (.xlsx), or XML. The same formats are available through the Apify REST API, e.g. GET https://api.apify.com/v2/datasets/{id}/items?format=csv&clean=true.
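
For scripted exports, the request URL can be assembled as in the sketch below. It assumes the public Apify API base https://api.apify.com/v2 and leaves authentication out; dataset_items_url is a hypothetical helper, not part of any Apify client library.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"  # assumed public API base


def dataset_items_url(dataset_id, fmt="csv", clean=True):
    """Build the dataset-items export URL for a given format."""
    query = urlencode({"format": fmt, "clean": "true" if clean else "false"})
    return f"{API_BASE}/datasets/{dataset_id}/items?{query}"
```

Passing fmt="json" or fmt="xlsx" selects the other export formats named above.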

Reconstructing the conversation tree

In SQL:

    SELECT parent.reply_text AS parent_text,
           child.reply_text AS reply_text,
           child.depth,
           child.like_count
    FROM rows child
    LEFT JOIN rows parent
      ON child.parent_reply_id = parent.reply_id
    WHERE child.root_post_id = '3897828658278100523'
    ORDER BY child.depth, child.like_count DESC;

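In pandas: df.merge(df, how="left", left_on="parent_reply_id", right_on="reply_id", suffixes=("", "_parent")). The how="left" argument keeps the root row, whose parent_reply_id is null, so the result matches the LEFT JOIN above; the default inner merge would drop it.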
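
A join gives parent and child pairs; to walk the whole tree, the rows can be indexed by parent first. The sketch below does this with the standard library only and renders an indented outline, ordering siblings by like_count. The function names are illustrative, and it assumes the rows of a single root post loaded as dicts.

```python
from collections import defaultdict


def build_children_index(rows):
    """Map each parent_reply_id to the list of rows that reply to it."""
    children = defaultdict(list)
    for row in rows:
        children[row["parent_reply_id"]].append(row)
    return children


def render_tree(rows):
    """Return the conversation as indented lines, most-liked siblings first."""
    children = build_children_index(rows)
    lines = []

    def visit(row):
        indent = "  " * row["depth"]
        lines.append(f'{indent}{row["author_username"]}: {row["reply_text"]}')
        replies = children.get(row["reply_id"], [])
        for child in sorted(replies, key=lambda r: r["like_count"], reverse=True):
            visit(child)

    # Root rows are the ones whose parent_reply_id is null (None).
    for root in children.get(None, []):
        visit(root)
    return lines
```

The same index can feed a graph library by adding one edge per (parent_reply_id, reply_id) pair.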

💰 Pricing

Pay-Per-Event (PPE) — you pay only for what you scrape.

| Event | Price (USD) | When |
| --- | --- | --- |
| actor-start | $0.05 | Once per run, at boot |
| result-row | $0.005 | Per post or reply row written |

Example costs

| Rows scraped | Actor starts | Total cost |
| --- | --- | --- |
| 100 | 1 | $0.55 |
| 500 | 1 | $2.55 |
| 1,000 | 1 | $5.05 |
| 5,000 | 1 | $25.05 |

A typical run on a single mid-sized post (root + ~50 direct replies + a handful of nested chains) emits 60–120 rows, costing $0.35–$0.65.
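
The totals above follow from one actor-start charge plus a per-row charge. A small estimator, using the two event prices from the pricing table and Decimal to avoid floating-point rounding, might look like this (estimate_cost is a hypothetical helper):

```python
from decimal import Decimal

ACTOR_START_USD = Decimal("0.05")   # charged once per run
RESULT_ROW_USD = Decimal("0.005")   # charged per post or reply row


def estimate_cost(rows, actor_starts=1):
    """Estimated charge in USD for a given number of rows and run starts."""
    return ACTOR_START_USD * actor_starts + RESULT_ROW_USD * rows
```

For example, estimate_cost(60) and estimate_cost(120) bracket the typical single-post run described above.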

🚧 Limitations

  • No "Show replies" pagination. Threads renders some deeply nested replies behind a "Show replies" click that triggers a client-side XHR. This Actor emits exactly what threads.net serves in the initial HTML — typically the root, all direct replies, and any inline-expanded depth-2/3 chains Threads already included. Deeper hidden replies require their own click and are not fetched.
  • No reposter user list. Threads renders the /reposts/ sub-page entirely client-side from a private GraphQL endpoint that uses rotating doc_id + lsd tokens. Repost counts are captured on every row (repost_count), but the list of accounts that reposted is out of scope.
  • No quote-post bodies. quote_count is captured per row; the bodies of quote-posts referencing the input would require additional fetches and are not included.
  • No media (images, videos). Only reply_text is captured. Image ALT text, video transcripts, and external link cards are not extracted.
  • Private profiles / login-walled posts return zero rows. If Threads serves a login wall (depending on IP reputation), the Actor logs a WARNING and skips that URL. Enable useProxy to maximise success rate.
  • Meta may rate-limit. With residential proxy + per-URL session rotation the Actor handles single-post scrapes reliably. Very large batches (>20 posts in one run) may trigger short rate-limit windows — the Actor retries with exponential backoff up to 5 attempts per URL.
  • Apify FREE plan retains run-scoped storage for 7 days only. Export your dataset immediately after the run or use a named dataset to retain longer.
  • ToS responsibility. Meta's Terms of Service prohibit scraping. The threads.net post URL is publicly accessible without login, but you remain responsible for verifying your jurisdiction's data-protection rules and Meta's current Terms before using scraped data commercially.

❓ FAQ

Do I need a Threads or Instagram account?

No. The Actor calls threads.net directly with a real Chrome 131 fingerprint. No Meta login, no API key, no OAuth.

Does this work for any public Threads post?

Yes, as long as the post is reachable at its public URL without a login wall. Private accounts, deleted posts, and posts behind an age-gate return zero rows and a clear status message.

How deep into the reply tree does this go?

By default, depth 3 — root post (depth 0), direct replies (depth 1), and the first two layers of inline-expanded nested replies Threads embeds in the initial HTML (depth 2 and 3). Increase maxDepth up to 10 if you need every embedded chain.

Why isn't the reposter list included?

Threads' /reposts/ sub-page loads its user list via an internal client-side XHR that uses rotating GraphQL doc_id + lsd tokens. Implementing that would create constant breakage as Meta rotates the tokens. Repost counts are still captured on every row.

Why is useProxy ON by default?

Meta blocks repeated requests from the same datacenter IP within minutes. The Devil Scrapes BUYPROXIES94952 residential pool rotates IPs per URL, which is the difference between consistent success and an empty dataset.

Can I rebuild the conversation graph?

Yes — every row carries parent_reply_id, so a single LEFT JOIN on reply_id reconstructs the tree. See the Output → Reconstructing the conversation tree section above for SQL and pandas examples.

What happens if Meta blocks the request?

The Actor retries with exponential backoff (up to 5 attempts per URL). If all retries fail or the page returns a login wall instead of the conversation payload, that URL is skipped with a WARNING log and the run continues with the next URL. If every URL fails, the Actor exits non-zero with a clear status message.
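
The retry behaviour can be pictured with the following sketch. The BlockedError exception, the fetch callable, and the one-second base delay are all assumptions for illustration; only the five-attempt limit and the exponential growth come from the description above.

```python
import time


class BlockedError(Exception):
    """Hypothetical signal that the request was blocked or hit a login wall."""


def fetch_with_backoff(fetch, url, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), doubling the wait after each blocked attempt.

    Returns the fetched page, or None once every attempt has failed, so the
    caller can log a warning and continue with the next URL.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except BlockedError:
            if attempt == max_attempts:
                return None
            # Waits of 1x, 2x, 4x, 8x the base delay between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    return None
```

The sleep parameter is injectable so the waits can be replaced in tests or tuned without touching the retry loop.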

💬 Your feedback

Found a bug, hit a rate-limit pattern, or need a new field on the output row? Open an issue on the Actor's Apify Store page or contact the Devil Scrapes team at apify.com/DevilScrapes. We ship updates within days of validated reports.