Threads Post Scraper

Pricing: from $2.00 / 1,000 posts

Scrape any public Threads post, including its media, text, mentions, links, and nested comments. Supports multiple URLs, a raw JSON mode, and HTML parsing that needs no login. Outputs structured data to the Dataset and to SUMMARY.json in the Key-Value Store.

Rating: 0.0 (0 ratings)

Developer: Tran Tu (maintained by Community)

Actor stats: 0 bookmarked, 8 total users, 4 monthly active users, last modified 16 days ago


Extract public Threads posts by post URL and export thread-level results (the root post plus its nested comments) to your Apify Dataset, with a run summary saved as SUMMARY.json in the Key-Value Store.


What this actor does

  • Accepts one or more Threads post URLs and fetches each post's HTML page.

  • Parses the JSON embedded in the page into a root post and nested comments (each top-level comment with its replies).

  • Emits one dataset item per parsed post page (type: "post_with_comments_nested") containing:

    • root_post: the root post in normalized form; when raw is true, the original raw JSON is included as well
    • comments (array of { comment, replies: [] })
  • Saves a run summary to SUMMARY.json.
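The nesting step can be pictured with a short Python sketch. It is not the actor's source code: it assumes, hypothetically, that each flat comment node carries either a reply_to_post field (a top-level comment) or a reply_to_comment field (a reply), as in the output example further down, and nest_comments is an illustrative name.

```python
from typing import Any


def nest_comments(root_id: str, nodes: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Group flat comment nodes into [{"comment": ..., "replies": [...]}] entries.

    Hypothetical field names: "reply_to_post" marks a top-level comment on
    root_id, "reply_to_comment" marks a reply to a top-level comment.
    """
    top_level: list[dict[str, Any]] = []
    by_id: dict[str, dict[str, Any]] = {}

    # Pass 1: collect comments that reply directly to the root post.
    for node in nodes:
        if node.get("reply_to_post") == root_id:
            entry = {"comment": node, "replies": []}
            top_level.append(entry)
            by_id[node["id"]] = entry

    # Pass 2: attach each reply to the top-level comment it answers;
    # replies to anything else (e.g. deeper replies) are dropped here.
    for node in nodes:
        parent_id = node.get("reply_to_comment")
        if parent_id in by_id:
            by_id[parent_id]["replies"].append(node)

    return top_level
```

Only one level of replies is kept, matching the documented { comment, replies } shape; in this sketch a reply whose parent is not a top-level comment is simply dropped.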

Works only on public content (no login). Use a post URL like https://www.threads.net/@username/post/CODE.
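To catch malformed input before a run, a URL can be checked against the documented shape. This Python sketch uses a hypothetical parse_post_url helper; the allowed username and code characters, and the tolerance for a trailing slash or query string, are assumptions rather than rules stated by the actor.

```python
import re
from typing import Optional

# Documented shape: https://www.threads.net/@username/post/CODE
# Assumed: username uses letters, digits, "." and "_"; code uses letters,
# digits, "_" and "-"; a trailing slash or query string is tolerated.
POST_URL = re.compile(
    r"^https://www\.threads\.net/@(?P<username>[A-Za-z0-9._]+)"
    r"/post/(?P<code>[A-Za-z0-9_-]+)/?(?:[?#].*)?$"
)


def parse_post_url(url: str) -> Optional[tuple[str, str]]:
    """Return (username, code) for a post URL, or None if it does not match."""
    match = POST_URL.match(url.strip())
    if match is None:
        return None
    return match.group("username"), match.group("code")
```

For example, parse_post_url("https://www.threads.net/@quachthihang/post/DR3ioaSEjx8") returns ("quachthihang", "DR3ioaSEjx8"), while a profile URL such as https://www.threads.net/@quachthihang returns None.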


Input

Example (single URL, normalized)

{
  "url": "https://www.threads.net/@quachthihang/post/DR3ioaSEjx8"
}

Example (multiple URLs, raw)

{
  "urls": [
    "https://www.threads.net/@user/post/ABC",
    "https://www.threads.net/@user/post/DEF"
  ],
  "raw": true
}

Input fields

| Field | Type | Required | Notes |
|---|---|---|---|
| url | string | No* | Single Threads post URL. If provided, urls may be omitted. |
| urls | array of strings | No* | Multiple Threads post URLs (preferred for batch runs). |
| raw | boolean | No | false = normalized output (default); true = also include the original raw JSON in each post. |
| timeoutMs | integer | No | HTTP request timeout in milliseconds (default 15000). |

* At least one of url or urls must be provided.
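The rules in this table can be checked client-side before starting a run. The Python sketch below assembles an input object (for example, to save as input.json); build_input and write_input_file are hypothetical helper names, and the checks only mirror the documented rules, not the actor's own validation.

```python
import json
from typing import Any, Optional

DEFAULT_TIMEOUT_MS = 15000  # documented default for timeoutMs


def build_input(url: Optional[str] = None,
                urls: Optional[list[str]] = None,
                raw: bool = False,
                timeout_ms: int = DEFAULT_TIMEOUT_MS) -> dict[str, Any]:
    """Assemble an input object that follows the field table."""
    if not url and not urls:
        raise ValueError("At least one of url or urls must be provided.")
    if timeout_ms <= 0:
        raise ValueError("timeoutMs must be a positive number of milliseconds.")
    payload: dict[str, Any] = {"raw": raw, "timeoutMs": timeout_ms}
    if url:
        payload["url"] = url
    if urls:
        payload["urls"] = list(urls)
    return payload


def write_input_file(payload: dict[str, Any], path: str = "input.json") -> None:
    """Save the input as pretty-printed JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2, ensure_ascii=False)
```

Filling in raw and timeoutMs explicitly is optional, since both have documented defaults; writing them out simply makes the saved input self-describing.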

Output

  • Dataset items: one item per processed post page with type: "post_with_comments_nested".
  • Key-Value Store: SUMMARY.json.

Normalized item (example)

{
  "type": "post_with_comments_nested",
  "url": "https://www.threads.net/@quachthihang/post/DR3ioaSEjx8",
  "ok": true,
  "root_post": {
    "id": "3780642722781740156_67808873631",
    "pk": "3780642722781740156",
    "user": {
      "id": "67808873631",
      "username": "quachthihang",
      "full_name": "Thanh Hằng",
      "profile_pic_url": "https://...jpg",
      "is_verified": false
    },
    "text": "chưa gì mà đã cuống hết cả lên thế này rồi",
    "images": [
      { "url": "https://...jpg", "width": 640, "height": 1136 }
    ],
    "taken_at": 1764907754,
    "like_count": 9156,
    "code": "DR3ioaSEjx8"
  },
  "comments": [
    {
      "comment": { "id": "...", "text": "link...", "reply_to_post": "378064272..." },
      "replies": [
        { "id": "...", "text": "reply to comment", "reply_to_comment": "..." }
      ]
    }
  ],
  "total_nodes_found": 10,
  "total_top_comments": 3,
  "collectedAt": "2025-12-06T..."
}
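For spreadsheets or databases it is often easier to work with one row per post. This Python sketch flattens one dataset item shaped like the example above into root, comment, and reply rows; flatten_item is a hypothetical name, and skipping items whose ok is not true assumes that failed pages could appear in the dataset with ok: false.

```python
from typing import Any, Iterator


def flatten_item(item: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield one row for the root post, each comment, and each reply."""
    if item.get("ok") is not True:
        return  # assumed: failed pages carry ok: false and have no usable posts
    root = item.get("root_post") or {}
    yield {"kind": "root", "id": root.get("id"), "parent": None,
           "text": root.get("text")}
    for entry in item.get("comments", []):
        comment = entry.get("comment") or {}
        yield {"kind": "comment", "id": comment.get("id"),
               "parent": comment.get("reply_to_post"), "text": comment.get("text")}
        for reply in entry.get("replies", []):
            yield {"kind": "reply", "id": reply.get("id"),
                   "parent": reply.get("reply_to_comment"), "text": reply.get("text")}
```

Each row keeps its parent identifier, so the thread structure can be rebuilt later from the flat table.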

Raw item (example)

When raw: true, root_post and comment objects also include the original raw JSON from Threads; the normalized fields are still present.


SUMMARY.json

Example (normalized):

{
  "input": {
    "urls": ["https://www.threads.net/@user/post/ABC"],
    "raw": false
  },
  "processed": [
    { "url": "https://www.threads.net/@user/post/ABC", "ok": true, "top_comments": 4 }
  ],
  "collectedAt": "2025-12-06T..."
}

Example (raw):

{
  "input": { "urls": ["..."], "raw": true },
  "processed": [
    { "url": "...", "ok": true, "top_comments": 4 }
  ],
  "collectedAt": "2025-12-06T..."
}
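SUMMARY.json can be used to find pages worth re-running. A minimal Python sketch, assuming a failed page is recorded in processed with ok set to something other than true (only successful entries are shown above), and treating any input URL with no processed entry as a failure too; urls_to_retry is a hypothetical helper.

```python
from typing import Any


def urls_to_retry(summary: dict[str, Any]) -> list[str]:
    """List URLs from SUMMARY.json that did not finish with ok: true."""
    processed = summary.get("processed", [])
    seen = {entry.get("url") for entry in processed}
    # Entries that were processed but not marked ok: true (assumed failure marker).
    retry = [entry["url"] for entry in processed if entry.get("ok") is not True]
    # Input URLs that never got a processed entry at all.
    for url in summary.get("input", {}).get("urls", []):
        if url not in seen and url not in retry:
            retry.append(url)
    return retry
```

Load the record with json.load and pass the resulting dict in; the returned list can go straight into the urls field of a new run.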

Quick start (no code)

  1. Open this actor in the Apify Console.

  2. Paste your input JSON into the actor's input (see the examples above).

  3. Start the run.

  4. Check the results:

    • Dataset → items (post_with_comments_nested)
    • Key-Value Store → SUMMARY.json

CLI / API examples

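Run with apify CLI

apify run --input-file=input.json

apify run executes an actor locally from its source directory; to start this published actor on the Apify platform, use the API request below.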

Start a run via Apify API (curl)

curl -X POST "https://api.apify.com/v2/acts/trantus~threads-post-scraper/runs?token=YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"urls":["https://www.threads.net/@user/post/ABC"], "raw": false}'
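The same API call in Python, using only the standard library. start_run_request and start_run are hypothetical helper names; the endpoint, the token query parameter, the JSON body, and the actor ID are taken from the curl example.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

API_BASE = "https://api.apify.com/v2"


def start_run_request(actor_id: str, token: str,
                      run_input: dict[str, Any]) -> urllib.request.Request:
    """Build POST /v2/acts/{actorId}/runs with the input as the JSON body."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{actor_id}/runs?{query}"
    body = json.dumps(run_input).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


def start_run(actor_id: str, token: str, run_input: dict[str, Any],
              timeout_s: float = 30.0) -> dict[str, Any]:
    """Send the request and return the parsed JSON response (run details)."""
    request = start_run_request(actor_id, token, run_input)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)
```

For example, start_run("trantus~threads-post-scraper", "YOUR_TOKEN", {"urls": ["https://www.threads.net/@user/post/ABC"], "raw": False}) sends the request and returns the API's JSON response. Building the Request separately keeps the URL and body easy to inspect without sending anything.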

Fetch dataset items (last run)

curl "https://api.apify.com/v2/acts/trantus~threads-post-scraper/runs/last/dataset/items?token=YOUR_TOKEN"
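Large runs can be read page by page with the endpoint's offset and limit query parameters. In this Python sketch the page fetcher is passed in as a function, so the paging logic can be tested offline; last_run_items_url, http_fetch, and iter_items are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Iterator


def last_run_items_url(actor_id: str, token: str, offset: int, limit: int) -> str:
    """URL of one page of the last run's dataset items."""
    query = urllib.parse.urlencode(
        {"token": token, "format": "json", "offset": offset, "limit": limit})
    return f"https://api.apify.com/v2/acts/{actor_id}/runs/last/dataset/items?{query}"


def http_fetch(url: str, timeout_s: float = 30.0) -> list[dict[str, Any]]:
    """GET a URL and parse the JSON array it returns."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return json.load(response)


def iter_items(fetch_page: Callable[[int, int], list[dict[str, Any]]],
               page_size: int = 100) -> Iterator[dict[str, Any]]:
    """Yield items page by page until a short (or empty) page is returned."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            return
        offset += len(page)
```

Against the live API, pass lambda o, l: http_fetch(last_run_items_url("trantus~threads-post-scraper", "YOUR_TOKEN", o, l)) as fetch_page.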

Notes & tips

  • Provide full post URLs (https://www.threads.net/@user/post/CODE) — the actor extracts JSON embedded in the page.
  • If you get network errors (DNS failures such as ENOTFOUND), check DNS resolution and network connectivity from the environment running the actor. Parsing pasted page content offline (relayJson) is not supported in this actor build.
  • Increase timeoutMs for slow connections or heavy pages.
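When re-running pages that timed out, one option is to raise timeoutMs step by step. This Python sketch doubles the previous timeout, starting from the documented 15000 ms default; the doubling factor, the 120000 ms cap, and the retry_input helper are assumptions made for illustration, not limits enforced by the actor.

```python
from typing import Any

DEFAULT_TIMEOUT_MS = 15000   # documented default for timeoutMs
MAX_TIMEOUT_MS = 120_000     # hypothetical cap chosen for this sketch


def next_timeout_ms(previous_ms: int = DEFAULT_TIMEOUT_MS,
                    factor: float = 2.0,
                    cap_ms: int = MAX_TIMEOUT_MS) -> int:
    """Scale the previous timeout by factor, never exceeding cap_ms."""
    return min(int(previous_ms * factor), cap_ms)


def retry_input(urls: list[str], previous_ms: int = DEFAULT_TIMEOUT_MS,
                raw: bool = False) -> dict[str, Any]:
    """Input for a follow-up run over the given URLs with a longer timeout."""
    if not urls:
        raise ValueError("No URLs to retry.")
    return {"urls": list(urls), "raw": raw, "timeoutMs": next_timeout_ms(previous_ms)}
```

Growing the timeout geometrically reaches a workable value in a few runs, while the cap keeps a genuinely unreachable page from stalling a run indefinitely.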

Changelog

  • 1.0.0

    • Initial release