Meta Threads Scraper — CSV, No Login, No Rate Limits
Pricing: Pay per usage
Meta Threads (threads.net) data as JSON/CSV — POSTs (author, text, source) + PROFILEs (followers, biography, avatar) by username/search. 23+ runs. For audience research + brand mentions + competitor content. No API waitlist. Custom fork: spinov001@gmail.com · blog.spinov.online · t.me/scraping_ai
Developer: Alex
Last modified: 6 days ago
Threads Scraper — Profiles + Posts from Public Threads Pages (No Login, No API Waitlist)
Pull Threads profile metadata and post text from public Threads pages — without a Meta developer account, without the Threads API waitlist, and without browser automation. The actor reads what a logged-out visitor sees: the server-rendered HTML and embedded JSON blob.
Who this is for: PR teams tracking brand mentions. Marketing teams sizing up influencer partners. Market researchers benchmarking competitor content cadence. Founders watching a niche community grow in real time.
What you actually get
The actor pushes two record types to the dataset (verified against src/main.js):
_type: "PROFILE" — one per username
Two parsing paths, depending on whether Meta ships the embedded JSON blob in this run:
| Path | Fields pushed | Triggered when |
|---|---|---|
| A — JSON regex | _type, username, followers, following, biography, url, scrapedAt | The page contains the React/Next.js SSR JSON with follower_count / following_count |
| B — Open Graph fallback | _type, username, displayName, description, avatar, url, scrapedAt | JSON regex fails — OG meta tags only |
The two paths are mutually exclusive per record. If you need follower counts specifically, expect occasional records that fall through to Path B (no follower count). If your analysis depends on follower counts, drop Path B records or re-run.
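Since the two paths never mix fields in one record, a quick post-processing pass can separate them. This is a sketch, not part of the actor — the field names are taken from the table above:

```python
def split_profile_paths(records):
    """Partition PROFILE records: Path A carries follower counts,
    Path B (Open Graph fallback) does not."""
    path_a, path_b = [], []
    for rec in records:
        if rec.get("_type") != "PROFILE":
            continue  # skip POST records
        # Path A records have a numeric "followers" field;
        # Path B records carry displayName/description/avatar instead.
        (path_a if "followers" in rec else path_b).append(rec)
    return path_a, path_b
```

Drop `path_b` when your analysis needs follower counts, or queue those usernames for a re-run.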
_type: "POST" — one per extracted post
{"_type": "POST","author": "zuck","text": "Threads passes 200M monthly users.","source": "profile:zuck","scrapedAt": "2026-04-29T12:00:00.000Z"}
source is profile:<username> for posts harvested from a profile page, or search:<query> for posts harvested from the search endpoint.
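Splitting that tag back apart downstream is trivial — a tiny helper for your own post-processing, not part of the actor:

```python
def parse_source(source):
    """Split a source tag into (kind, value),
    e.g. 'profile:zuck' or 'search:AI agents'."""
    kind, _, value = source.partition(":")  # split on the first colon only
    return kind, value
```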
Honest disclosure on what's NOT extracted: likes, replies, reshares, post timestamps, post URLs, image attachments, and quoted-thread context are not parsed. Only text and author. Meta exposes these counts inconsistently on logged-out pages — if you need engagement metrics at scale, see Custom scraping below.
Input
{"usernames": ["zuck", "mosseri"],"searchQueries": ["AI agents"],"maxPostsPerSource": 50}
| Parameter | Type | Default | Description |
|---|---|---|---|
| `usernames` | Array | `[]` | Threads handles without `@`. The `@` prefix and `threads.net/` are stripped. |
| `searchQueries` | Array | `[]` | Keywords searched against `threads.net/search`. |
| `maxPostsPerSource` | Number | `50` | Cap per profile or query. |
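The handle stripping described above can be approximated like this — a sketch only; the actor's exact normalization lives in `src/main.js`:

```python
import re

def normalize_username(raw: str) -> str:
    """Strip an optional threads.net URL prefix and a leading '@'."""
    u = raw.strip()
    u = re.sub(r"^https?://(www\.)?threads\.net/", "", u)
    return u.lstrip("@").strip("/")
```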
Hardcoded run settings: `maxConcurrency=3`, `maxRequestsPerCrawl=200`, `requestHandlerTimeoutSecs=30`. No proxy is used — direct fetches via CheerioCrawler's default agent.
Common questions
Q: Will this trigger Apify or Meta abuse flags?
A: The actor only hits publicly accessible Threads URLs (the same pages a logged-out visitor sees) and parses the server-rendered JSON already embedded in the HTML. No login, no auth tokens, no private endpoints. That said: any large-volume crawler hitting Meta surfaces eventually gets rate-limited. We've seen clean runs at the default `maxConcurrency=3`; bursting harder is on you.
Q: What if Meta changes the page layout?
A: Two parsers run in sequence — (1) regex against the embedded React/Next.js JSON, (2) Open Graph meta tags as fallback. When one layer breaks, the other usually still returns at least the username + display-name pair. Email if both layers stop returning data and we'll patch within a session.
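The two-layer fallback can be sketched like so. These regexes are illustrative, not the actor's real patterns (see Honest Limitations for the actual ones):

```python
import re

def parse_profile(html: str, username: str) -> dict:
    # Layer 1: embedded SSR JSON — yields follower/following counts.
    m = re.search(r'"follower_count":(\d+).*?"following_count":(\d+)', html)
    if m:
        return {"username": username,
                "followers": int(m.group(1)),
                "following": int(m.group(2))}
    # Layer 2: Open Graph meta tags — name/description only, no counts.
    og = dict(re.findall(
        r'<meta property="og:(title|description)" content="([^"]*)"', html))
    return {"username": username,
            "displayName": og.get("title"),
            "description": og.get("description")}
```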
Q: Can I get likes / replies / reshares?
A: Not from this actor — engagement counts are inconsistently rendered on logged-out Threads pages, and the regex extraction here only captures text. For guaranteed engagement metrics at scale, request a custom build (see below).
Q: Bulk run cost?
A: Apify charges by compute units, not per-profile. Each profile run is a single CheerioCrawler request (~1–3 seconds). Free tier covers small batches. For large batches (1000+ profiles), email and we can quote a fixed-price custom build instead.
Honest Limitations (regex extraction edge cases)
Both parsing paths use regex against the embedded JSON blob, not a JSON parser. Known consequences:

- Posts shorter than 2 chars or longer than 500 chars are silently dropped. The regex range `[^"]{2,500}` is a deliberate noise filter, but it does drop one-emoji posts and long-form posts.
- Biographies / post text with escaped double-quotes (`\"`) get truncated at the first `\"`. Only `\n` is decoded back to newline; other JSON escape sequences (`\"`, `\u0000`, `\\`) are left as-is or break the match.
- The search-branch regex `"text":"([^"]{10,500})".*?"username":"([^"]+)"` uses lazy `.*?` to bridge text → username. On dense JSON pages this can occasionally cross a record boundary and pair text with the WRONG username. If you see suspicious pairings, drop them by deduping on `text` + `author` and re-running.
- The PROFILE record's POST extraction matches `"text_post_app_info":{...}.*?"text":"([^"]{2,500})"` — the `.*?` similarly can drift across fields. On well-structured pages this is fine; on malformed/partial responses it may capture unrelated `text` values.
- Search-branch records do NOT include profile metadata — only `_type`, `text`, `author`, `source`, `scrapedAt` (5 fields). If you need follower counts for search-result authors, run a second pass through the profile branch with the deduped author list.
- No proxy. If Threads escalates anti-bot on a particular IP range, the actor returns 0 records silently — no flag, no error. Re-run from a fresh Apify run (which uses a different egress IP) or commission a custom build with residential-proxy routing.
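The first two edge cases are easy to reproduce, and the mispairing guard is a one-liner. The pattern below is a simplified stand-in modeled on the quoted regexes, not the actor's full pattern:

```python
import re

# Simplified post-text pattern modeled on the ranges quoted above.
TEXT_RE = re.compile(r'"text":"([^"]{2,500})"')

# Edge case 1: a one-emoji post fails the {2,500} floor and is dropped.
assert TEXT_RE.search('"text":"🔥"') is None

# Edge case 2: [^"] stops at an escaped double-quote, truncating the capture.
m = TEXT_RE.search(r'"text":"He said \"hi\" today"')
assert m and m.group(1) == "He said \\"

def dedupe_posts(posts):
    """Drop repeated (text, author) pairs — a cheap guard against
    the lazy-regex mispairing described above."""
    seen, out = set(), []
    for p in posts:
        key = (p["text"], p["author"])
        if key not in seen:
            seen.add(key)
            out.append(p)
    return out
```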
Step-by-step
- Open Threads Scraper → click "Try for free".
- Paste usernames: `["zuck", "mosseri"]`.
- Click Start → download JSON / CSV when the run finishes.
For programmatic use:
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
run = client.actor("knotless_cadence/threads-scraper").call(
    run_input={"usernames": ["zuck", "mosseri"], "maxPostsPerSource": 30}
)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if item["_type"] == "PROFILE":
        print(item["username"], "→", item.get("followers"), "followers,",
              item.get("biography") or item.get("description"))
    else:
        print(f"  POST by {item['author']}: {item['text'][:80]}")
```
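If you want the same flat CSV the UI exports, the POST records can be written out locally with the stdlib alone — column names follow the POST schema above:

```python
import csv

def posts_to_csv(items, path):
    """Write POST records to a flat CSV using the actor's field names."""
    fields = ["_type", "author", "text", "source", "scrapedAt"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
        w.writeheader()
        # Keep only POST records; PROFILE records have a different schema.
        w.writerows(r for r in items if r.get("_type") == "POST")
```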
Social-listening toolkit (related actors)
| Platform | Tool |
|---|---|
| Threads (this tool) | Meta's text-post network |
| Reddit Discussion | Community discussions, public JSON API |
| Bluesky Scraper | AT Protocol, open API |
| YouTube Comments | Video audience reactions |
| Hacker News | Developer sentiment |
All 31 published actors free to inspect on Apify Store.
Proof of delivery
24 lifetime runs on this actor — but the broader portfolio is what backs every pilot:
- 31 published / 78 total Apify scrapers across socials, B2B, dev tools.
- Flagship: Trustpilot Review Scraper — 951 lifetime runs, 0 bot-detection failures across 30 days.
- Recent paid series: $150 / 3-article postmortem for a client in the proxy industry (March 2026, delivered).
- Code-honest READMEs: every claim in this readme is verified against `src/`. No "supports X" without proof.
Pilot pricing locked through May 2026:
- 1 case-study article (1100w+, code blocks): $50
- 3-article series: $150
- Custom build (this actor → your variant: follower-list pulls, comment trees, multi-source enrichment with Instagram + Bluesky): from $50 depending on schema delta.
Reply sample to spinov001@gmail.com — get 2 published case-study articles within 24h. No commitment.
Custom scraping — pilot tiers
Need engagement metrics, multi-platform fan-out, or a different schema? Three tiers:
- Pilot — $97 · 1 actor, basic config, 7-day support. Good entry point — useful for a single Threads + Instagram fan-out or a one-off competitor cadence report.
- Standard — $297 · custom actor + Slack/email alerts on results, 30-day support. Most social-listening projects fit here.
- Premium — $797 · custom actor + dashboard + 90-day support + 1 modification round. For ongoing pipelines (daily competitor sentiment, multi-source enrichment with Instagram + Threads + Bluesky).
Email: spinov001@gmail.com — drop specs, schema, or target handles and get a quote within 48h.
Proof of work: 31 published Apify scrapers (78 total in portfolio) — Trustpilot 951 / Reddit 82 / Google News 45 / Glassdoor 39 / Email Extractor 107 / Hacker News 27 / Bluesky 25. Recently delivered a paid 3-article series for a client in the proxy industry ($150).
More tips: t.me/scraping_ai · blog.spinov.online
Disclaimer
Designed for market-research, brand-monitoring, and academic use. Respect Threads' / Meta's Terms of Service, applicable data-protection law (GDPR, CCPA), and scrape publicly visible content only. Not affiliated with Meta Platforms, Inc.
Honest disclosure: extracts only text + author for posts and username + (followers/following/biography) | (displayName/description/avatar) for profiles — engagement counts, post URLs, post timestamps, and image attachments are not in the schema. Two parsing paths run with mutually exclusive output fields per profile record.