Threads Crawler

One Actor for Threads profiles, tags, searches, posts with replies, and custom feeds. Supports bulk accounts, relative dates, engagement metrics, ISO timestamps, and media URLs. No login or API token required.
An Apify Actor for scraping publicly visible Threads pages with Python, Crawlee, Playwright, and Camoufox.
The Actor is designed for local development and Apify deployment through GitHub. It loads dynamic Threads pages in a Camoufox browser, extracts visible profile and post data, and stores structured results in an Apify dataset.
Features
- Crawl publicly visible Threads profile pages.
- Batch crawl up to 100 accounts in one run.
- Support five crawl modes:
  - Profile pages by username.
  - Tag / topic pages.
  - Keyword search pages.
  - Single thread / post URLs, including publicly visible replies.
  - Custom Threads feed URLs.
- Support username input with or without `@`.
- Support tags with or without `#`.
- Support bulk paste fields for accounts and keywords.
- Support relative date windows such as `7 days`, `1 month`, `24 hours`, `7 天`, or `1 個月`.
- Limit returned posts per account or target with `maxPostsPerAccount`.
- Support absolute `startDate` and `endDate` filters.
- Extract profile metadata when visible:
  - username
  - display name
  - bio
  - external URL
  - follower count
- Extract visible post data:
  - author
  - relative timestamp
  - best-effort ISO timestamp
  - post text
  - visible metrics
  - media URLs found on the page
- Optional raw visible text output for parser debugging.
Important Limitations
This Actor does not log in to Threads and does not use a private API token. It only extracts data that Threads renders publicly in the browser.
Because of that:
- Full historical posts may not be available. Threads can show `Log in to see more`.
- Replies are only extracted when they are publicly visible on the loaded page.
- Engagement fields depend on what the public page exposes. `likes`, `replies`, `reposts`, `shares`, `views`, and `quotes` are best-effort mappings from visible metric numbers.
- Some metrics may be `null` if Threads does not expose them publicly.
- ISO timestamps are estimated from relative timestamps such as `12h`, `1d`, or `3w` using the scrape time.
- Exact post URLs, permanent IDs, and deeper media metadata may require additional parser work.
- Threads page structure can change, which may require selector/parser updates.
- Camoufox reduces common automation fingerprints, but it does not guarantee access or bypass platform limits.
Use this Actor only for public data and make sure your usage complies with applicable laws, Threads terms, and Apify platform rules.
Input
The Actor input is configured in `.actor/input_schema.json`.

`mode`

Choose what to crawl (default: `"profile"`).

Supported values: `profile`, `tag`, `search`, `thread`, `feed`.
Profile Mode
Use `accounts` for structured input:

```json
{
  "mode": "profile",
  "accounts": ["largitdata"],
  "maxPostsPerAccount": 10
}
```
Or use `bulkAccounts`:

```json
{
  "mode": "profile",
  "bulkAccounts": "largitdata\nopenai\nmeta",
  "maxPostsPerAccount": 10
}
```
Tag / Topic Mode
{"mode": "tag","keywordsOrTags": ["AI", "MachineLearning"],"maxPostsPerAccount": 10}
Keyword Search Mode
{"mode": "search","keywordsOrTags": ["AI agent", "Crawlee"],"searchSort": "top","maxPostsPerAccount": 10}
Single Thread Mode
{"mode": "thread","threadUrls": [{"url": "https://www.threads.com/@largitdata/post/POST_ID"}],"maxPostsPerAccount": 10}
Custom Feed Mode
{"mode": "feed","feedUrls": [{"url": "https://www.threads.com/"}],"maxPostsPerAccount": 10}
Date Filters
{"mode": "profile","accounts": ["largitdata"],"relativeDate": "7 days"}
You can also use absolute dates:
{"mode": "profile","accounts": ["largitdata"],"startDate": "2026-05-01","endDate": "2026-05-12"}
Date filtering is best-effort because public Threads pages often expose only relative timestamps.
Output
Each dataset item represents one crawled target.
Example shape:
{"url": "https://www.threads.com/@largitdata","mode": "profile","target": "largitdata","scraped_at": "2026-05-12T06:44:40.405672+00:00","title": "@largitdata • Threads, Say more","profile": {"username": "largitdata","display_name": "largitdata","bio": "...","external_url": "largitdata.com","followers": "4,847 followers"},"posts": [{"author": "largitdata","posted_at": "15h","posted_at_iso": "2026-05-11T15:44:40.405672+00:00","text": "...","metrics": {"likes": "10","replies": "1","reposts": "1","shares": "2","views": null,"quotes": null,"raw": ["10", "1", "1", "2"]}}],"media_urls": ["https://..."]}
Local Development
Install the Apify CLI:
```shell
npm install -g apify-cli
```
Create and activate a Python virtual environment:
```shell
python -m venv .venv
.venv\Scripts\activate        # Windows
# source .venv/bin/activate   # macOS / Linux
```
Install dependencies:
```shell
python -m pip install -r requirements.txt
```
Run locally:
```shell
apify run --purge
```
Local test input is stored in:
`storage/key_value_stores/default/INPUT.json`
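The input file takes any of the JSON shapes from the Input section, for example:

```json
{
  "mode": "profile",
  "accounts": ["largitdata"],
  "maxPostsPerAccount": 10
}
```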
Local dataset output is stored in:
`storage/datasets/default/`
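The Apify local storage client writes each dataset item as its own JSON file in that directory. A quick inspection script (a sketch, assuming the default local storage layout) could look like:

```python
import json
from pathlib import Path

# Collect all dataset items written by a local `apify run`.
items = []
for path in sorted(Path("storage/datasets/default").glob("*.json")):
    with path.open(encoding="utf-8") as f:
        items.append(json.load(f))

# Print one summary line per crawled target.
for item in items:
    print(item.get("target"), len(item.get("posts", [])))
```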
The storage/ and .venv/ directories are ignored by Git.
Deploy to Apify
This project is intended to be deployed from GitHub.
- Push changes to GitHub.
- Open Apify Console.
- Create or open the Actor.
- Link the GitHub repository.
- Build the Actor from the `main` branch.
You can also deploy directly with:
```shell
apify login
apify push
```
GitHub deployment is recommended because it keeps version history and makes future parser updates easier to review.
Tech Stack
- Apify Python SDK
- Crawlee for Python
- Playwright
- Camoufox
- Python 3.12+
Notes for Future Improvements
- Add deeper scrolling for profiles and feed pages.
- Improve single-thread reply parsing.
- Extract stable post URLs and IDs.
- Map metric numbers to labels more reliably by inspecting DOM structure.
- Add optional authenticated session support for private internal use cases.
- Add tests around parser behavior using saved page text fixtures.