4Chan Thread & Board Scraper
Pricing
from $1.25 / 1,000 results
Scrape threads from one or more 4chan boards using the official 4chan JSON API. Collect structured thread data, original posts, optional replies, attachments, extracted links, participant summaries, and thread-level metadata for research, monitoring, archiving, and downstream analysis.
Developer
Inus Grobler
4chan JSON API Scraper
Scrape threads from one or more 4chan boards through the official 4chan JSON API.
This actor is useful when you want structured thread data for research, monitoring, archiving, enrichment, or downstream analysis. You choose which boards to scrape, how many threads to collect from each board, and whether to include replies or just the original post.
What The Actor Does
- Fetches the current catalog for each selected board
- Collects up to your chosen number of threads per board
- Stores thread metadata, original post data, attachment details, and optional replies
- Enriches output with catalog context, participation summaries, and extracted links
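The steps above can be sketched roughly as follows. This is a minimal illustration and not the actor's actual source: it assumes the public endpoints `https://a.4cdn.org/{board}/catalog.json` and `https://a.4cdn.org/{board}/thread/{no}.json`, and the names `scrape_board`, `iter_catalog_threads`, `fetch_json`, and the `delay` parameter are hypothetical. The one-second pause reflects the request-rate guidance in the 4chan API documentation.

```python
import json
import time
from typing import Callable, Iterator
from urllib.error import HTTPError
from urllib.request import urlopen

API_ROOT = "https://a.4cdn.org"  # public read-only 4chan JSON API host


def fetch_json(url: str) -> object:
    """Download and decode one JSON document (no retries, for brevity)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_catalog_threads(catalog: list[dict], limit: int) -> Iterator[dict]:
    """Yield up to `limit` catalog entries, tagging each with its catalog page."""
    seen = 0
    for page in catalog:
        for thread in page.get("threads", []):
            if seen >= limit:
                return
            yield {**thread, "catalogPage": page.get("page")}
            seen += 1


def scrape_board(
    board: str,
    max_threads: int,
    scrape_replies: bool,
    fetch: Callable[[str], object] = fetch_json,
    delay: float = 1.0,
) -> list[dict]:
    """Collect up to `max_threads` threads from one board (hypothetical helper)."""
    catalog = fetch(f"{API_ROOT}/{board}/catalog.json")
    results = []
    for entry in iter_catalog_threads(catalog, max_threads):
        time.sleep(delay)  # stay at or below roughly one request per second
        try:
            thread = fetch(f"{API_ROOT}/{board}/thread/{entry['no']}.json")
        except HTTPError:
            continue  # thread was pruned or deleted after the catalog was read
        posts = thread["posts"]
        results.append({
            "board": board,
            "threadId": entry["no"],
            "catalogPage": entry["catalogPage"],
            "replyCount": entry.get("replies", 0),
            # The first post is the original post; replies follow it.
            "posts": posts if scrape_replies else posts[:1],
        })
    return results
```

Passing `fetch` as a parameter keeps the sketch testable without network access; a real run would simply call `scrape_board("g", 10, True)`.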
Input
boards
List of boards to scrape, without slashes.
Example:
["g", "biz", "tv"]
Default: ["g"]
maxThreadsPerBoard
Maximum number of threads to collect from each selected board.
Example:
20
Default: 10
scrapeReplies
- true: store the original post and all available replies in each thread
- false: store only the original post, while still keeping thread-level counts and summary metadata
Default: true
proxyConfiguration
Optional proxy settings for the requests. By default, the actor uses direct requests because the public 4chan JSON API is usually faster without a proxy. Enable Apify Proxy or provide custom proxy URLs only when your environment needs it.
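As an illustration of how these input fields fit together, the sketch below builds a run input from the documented defaults. The helper name `build_run_input` and its validation rules are assumptions made for this example; only the field names and default values come from the descriptions above. `proxyConfiguration` is left out because direct requests are the default.

```python
from typing import Optional

# Defaults as documented in the Input section.
DEFAULT_INPUT = {"boards": ["g"], "maxThreadsPerBoard": 10, "scrapeReplies": True}


def build_run_input(
    boards: Optional[list[str]] = None,
    max_threads_per_board: Optional[int] = None,
    scrape_replies: Optional[bool] = None,
) -> dict:
    """Return an input object, falling back to the documented defaults."""
    run_input = {**DEFAULT_INPUT, "boards": list(DEFAULT_INPUT["boards"])}
    if boards is not None:
        # Board names are given without slashes, so "/biz/" becomes "biz".
        cleaned = [b.strip().strip("/") for b in boards]
        run_input["boards"] = [b for b in cleaned if b]
    if max_threads_per_board is not None:
        if max_threads_per_board < 1:
            raise ValueError("maxThreadsPerBoard must be at least 1")
        run_input["maxThreadsPerBoard"] = max_threads_per_board
    if scrape_replies is not None:
        run_input["scrapeReplies"] = scrape_replies
    return run_input
```

For example, `build_run_input(["g", "biz", "tv"], 20, False)` describes a lighter discovery run over three boards.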
Output
Each dataset item represents one scraped post row with repeated thread-level metadata.
Top-level fields include:
- board
- threadId
- threadUrl
- apiUrl
- scrapedAt
- subject
- semanticUrl
- replyCount
- imageCount
- isSticky
- isClosed
- isArchived
- archivedOn
- catalog
- stats
- participants
- links
- post
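Because each item is one post row that repeats its thread's metadata, downstream code will often want to regroup rows into threads. The following is a minimal sketch under that assumption; `group_rows_by_thread` is a hypothetical helper, and it relies only on the top-level `board`, `threadId`, `subject`, and `post` fields listed above. The contents of `post` in the usage example are invented for illustration.

```python
def group_rows_by_thread(rows: list[dict]) -> dict[tuple[str, int], dict]:
    """Collapse per-post rows into one record per (board, threadId) pair."""
    threads: dict[tuple[str, int], dict] = {}
    for row in rows:
        key = (row["board"], row["threadId"])
        # Thread-level metadata repeats on every row, so the first row seen
        # for a thread is enough to seed its record.
        thread = threads.setdefault(key, {
            "board": row["board"],
            "threadId": row["threadId"],
            "subject": row.get("subject"),
            "posts": [],
        })
        thread["posts"].append(row["post"])
    return threads
```

A short usage example with made-up rows:

```python
rows = [
    {"board": "g", "threadId": 100, "subject": "Example", "post": {"id": 100}},
    {"board": "g", "threadId": 100, "subject": "Example", "post": {"id": 101}},
    {"board": "tv", "threadId": 7, "subject": None, "post": {"id": 7}},
]
threads = group_rows_by_thread(rows)
print(len(threads))  # 2
```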
catalog
Board catalog context for the thread, including:
- catalog page number
- last modified timestamp
- omitted reply count
- omitted image count
- recent reply post IDs when available
stats
Thread-level summary fields, including:
- total posts and replies in the thread
- how many posts and replies are stored in the dataset item
- attachment totals
- quote counts
- external link counts
- board reference counts
- simple content flags such as code and greentext counts
participants
Participant summaries, including:
- unique poster IDs when present
- countries represented in the thread when present
links
Extracted link-related fields, including:
- external links
- external domains
- quoted post IDs
- board references such as >>>/g/123456789
post
Each post record can include:
- author and subject
- timestamp and formatted posting date
- comment HTML and cleaned comment text
- quote targets
- board references
- external links
- attachment metadata
- content flags such as containsCode and containsGreentext
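To make the quote targets, board references, external links, and greentext flag more concrete, here is one way such values could be derived from a cleaned comment text. This is a sketch and not the actor's actual extraction logic: the regular expressions, the function name `analyze_comment`, and every output key except `containsGreentext` are assumptions. It expects plain text with HTML entities already decoded.

```python
import re
from urllib.parse import urlparse

BOARD_REF = re.compile(r">>>/[a-z0-9]+/\d*")   # e.g. >>>/g/123456789
QUOTE = re.compile(r"(?<!>)>>(\d+)")           # e.g. >>123456, but not >>>/g/...
URL = re.compile(r"https?://[^\s<>\"]+")


def analyze_comment(text: str) -> dict:
    """Extract link-like references from one cleaned comment (hypothetical)."""
    # Drop punctuation that commonly trails a URL at the end of a sentence.
    external = [u.rstrip(".,;:!?)") for u in URL.findall(text)]
    return {
        "quoteTargets": [int(n) for n in QUOTE.findall(text)],
        "boardReferences": BOARD_REF.findall(text),
        "externalLinks": external,
        "externalDomains": sorted({urlparse(u).netloc.lower() for u in external}),
        # Greentext lines start with a single ">" rather than a ">>" quote.
        "containsGreentext": any(
            line.startswith(">") and not line.startswith(">>")
            for line in text.splitlines()
        ),
    }
```

Working on cleaned text keeps the patterns simple; applying the same idea to the raw comment HTML would need entity decoding and tag handling first.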
Best Practices
- Use scrapeReplies: true when you need full thread content.
- Use scrapeReplies: false when you want faster, lighter discovery runs across many boards.
- Start with a smaller maxThreadsPerBoard if you are exploring new board mixes. The default value of 10 is chosen to keep quick validation and test runs lightweight.
- Split very wide crawls across multiple runs if you are scraping many boards at once.
Large Scraping Guidance
This actor has been tested on larger multi-board runs and works well for long-running scrapes. For the best production experience:
- Use separate runs for very broad board coverage instead of putting every board into one run.
- Keep reply scraping enabled only when you need full thread bodies.
- Use leaner discovery runs first, then follow up with deeper runs on boards or threads that matter most.
In practice, splitting large board lists across scheduled runs is the safest approach for high-volume scraping.
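One simple way to plan such a split is to chunk the board list and produce one input object per run. The sketch below assumes a hypothetical `plan_runs` helper and a caller-chosen batch size; the input field names are the ones documented in the Input section, and how the runs are scheduled is left to the caller.

```python
def split_boards(boards: list[str], batch_size: int) -> list[list[str]]:
    """Split a board list into consecutive batches of at most `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [boards[i:i + batch_size] for i in range(0, len(boards), batch_size)]


def plan_runs(
    boards: list[str],
    batch_size: int,
    max_threads_per_board: int = 10,
    scrape_replies: bool = False,
) -> list[dict]:
    """Build one run input per batch; lean discovery settings by default."""
    return [
        {
            "boards": batch,
            "maxThreadsPerBoard": max_threads_per_board,
            "scrapeReplies": scrape_replies,
        }
        for batch in split_boards(boards, batch_size)
    ]
```

With `plan_runs(["g", "biz", "tv", "sci", "diy"], batch_size=2)`, the five boards become three run inputs, the last of which covers a single board. A deeper follow-up run can then reuse the same helper with `scrape_replies=True` on the boards that matter most.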
Notes
- Invalid or unavailable boards are skipped.
- Threads that disappear before they are fetched are skipped.
- Very large threads may be split into multiple dataset items to stay within dataset size limits.
- The actor only returns data available through the public 4chan JSON API at the time of scraping.