Pricing

from $0.35 / 1,000 results

4Chan Thread & Board Scraper

Scrape threads from one or more 4chan boards using the official 4chan JSON API. Collect structured thread data, original posts, optional replies, attachments, extracted links, participant summaries, and thread-level metadata for research, monitoring, archiving, and downstream analysis.

Developer

Inus Grobler

Last modified

a month ago

4chan JSON API Scraper

At a glance: the actor scrapes public 4chan board and thread data through the official JSON API. Inputs are board names or direct thread URLs. Outputs are dataset rows covering threads, posts, attachments, and replies. Typical use cases are research, monitoring, and archiving. Limitations, troubleshooting, and cost notes are covered below.

Scrape threads from one or more 4chan boards through the official 4chan JSON API.

This actor is useful when you want structured thread data for research, monitoring, archiving, enrichment, or downstream analysis. You choose which boards to scrape, how many threads to collect from each board, and whether to include replies or just the original post.

What the Actor Does

Fetches the current catalog for each selected board, or loads the threads listed in threadUrls directly
Collects up to your chosen number of threads per board
Stores thread metadata, original post data, attachment details, and optional replies
Enriches output with catalog context, participation summaries, and extracted links
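A minimal sketch of the catalog step, assuming the public 4chan catalog endpoint `https://a.4cdn.org/{board}/catalog.json`, which returns a list of pages, each with a `threads` array. The helpers `fetch_catalog` and `collect_threads` are hypothetical illustrations, not the actor's own code.

```python
import json
import urllib.request


def fetch_catalog(board: str) -> list[dict]:
    """Download one board's catalog: a list of pages, each with a 'threads' list.

    Hypothetical helper; assumes the public a.4cdn.org catalog endpoint.
    """
    url = f"https://a.4cdn.org/{board}/catalog.json"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def collect_threads(catalog: list[dict], max_threads: int) -> list[tuple[int, dict]]:
    """Return up to max_threads (page_number, thread) pairs in catalog order."""
    collected: list[tuple[int, dict]] = []
    for page in catalog:
        for thread in page.get("threads", []):
            if len(collected) >= max_threads:
                return collected
            collected.append((page["page"], thread))
    return collected
```

Keeping the page number next to each thread makes it easy to attach catalog context later without a second lookup.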

Input

`boards`

List of boards to scrape, without slashes.

Example:

["g", "biz", "tv"]

Default: ["g"]
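If board names come from user input such as `/g/`, a small helper can strip the slashes before they are passed in. `normalize_boards` is a hypothetical helper, not part of the actor; it also drops empty entries and duplicates while keeping the original order.

```python
def normalize_boards(raw: list[str]) -> list[str]:
    """Turn entries like '/g/' or ' tv/ ' into bare board names ('g', 'tv').

    Hypothetical pre-processing helper; the actor expects names without slashes.
    """
    seen: set[str] = set()
    boards: list[str] = []
    for entry in raw:
        name = entry.strip().strip("/").strip()
        if name and name not in seen:  # skip blanks and repeats
            seen.add(name)
            boards.append(name)
    return boards
```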

`maxThreadsPerBoard`

Maximum number of threads to collect from each selected board.

Default: 10

`threadUrls`

Optional direct thread URLs to scrape. Use this when you already know the threads you need and want to skip board catalog discovery.

Example:

["https://boards.4chan.org/g/thread/105076684"]
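Each thread URL encodes the board and the thread number, and the matching JSON endpoint on the public 4chan API is `https://a.4cdn.org/{board}/thread/{id}.json`. The sketch below uses a hypothetical `parse_thread_url` helper to show that mapping; it tolerates an optional slug or `#p…` anchor after the thread number.

```python
import re

# Hypothetical pattern: board segment, then /thread/<number>; anything after is ignored.
THREAD_URL = re.compile(r"^https?://boards\.4chan(?:nel)?\.org/([a-z0-9]+)/thread/(\d+)")


def parse_thread_url(url: str) -> tuple[str, int, str]:
    """Return (board, thread_id, api_url) for a 4chan thread URL."""
    match = THREAD_URL.match(url.strip())
    if match is None:
        raise ValueError(f"not a 4chan thread URL: {url}")
    board, thread_id = match.group(1), int(match.group(2))
    return board, thread_id, f"https://a.4cdn.org/{board}/thread/{thread_id}.json"
```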

`maxRepliesPerThread`

Optional cap on stored replies for each thread when scrapeReplies is enabled. The newest replies are kept. Leave empty to store all available replies.

`scrapeReplies`

false: store only the original post, while still keeping thread-level counts and summary metadata
true: store the original post and all available replies in each thread, subject to maxRepliesPerThread if it is set

Default: false
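The interaction between scrapeReplies and maxRepliesPerThread can be written as a small selection rule. This hypothetical `select_posts` helper assumes the thread's `posts` list has the original post first and replies in chronological order, as the 4chan thread endpoint returns them, so the newest replies are at the end.

```python
from typing import Optional


def select_posts(posts: list[dict], scrape_replies: bool,
                 max_replies: Optional[int] = None) -> list[dict]:
    """Pick which posts of one thread to store (hypothetical helper)."""
    if not posts:
        return []
    op, replies = posts[0], posts[1:]
    if not scrape_replies:
        return [op]  # original post only
    if max_replies is not None:
        # Keep only the newest replies; a cap of 0 keeps none.
        replies = replies[-max_replies:] if max_replies > 0 else []
    return [op, *replies]
```

The explicit `max_replies > 0` check matters because `replies[-0:]` in Python returns the whole list rather than an empty one.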

`proxyConfiguration`

Optional proxy settings for the requests. By default, the actor uses direct requests because the public 4chan JSON API is usually faster without a proxy. Enable Apify Proxy or provide custom proxy URLs only when your environment needs it.
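Putting the input fields together, a run input might look like the dict below. The values are illustrative, and the `proxyConfiguration` object assumes the usual Apify proxy input shape (`useApifyProxy`); check the actor's input schema before relying on it. threadUrls is omitted here, so boards are discovered through their catalogs.

```python
import json

# Illustrative run input: field names come from this README, values are examples.
run_input = {
    "boards": ["g", "biz", "tv"],
    "maxThreadsPerBoard": 10,
    "scrapeReplies": True,
    "maxRepliesPerThread": 200,
    # Direct requests (the default); assumed Apify shape, set True to use Apify Proxy.
    "proxyConfiguration": {"useApifyProxy": False},
}

print(json.dumps(run_input, indent=2))
```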

Output

Each dataset item represents one scraped post row with repeated thread-level metadata.

Top-level fields include:

board
threadId
threadUrl
apiUrl
scrapedAt
subject
semanticUrl
replyCount
imageCount
isSticky
isClosed
isArchived
archivedOn
catalog
stats
participants
links
post
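To make the one-row-per-post layout concrete, here is a sketch that flattens a thread JSON response into rows that repeat the thread-level metadata. It maps standard 4chan API fields (`no`, `sub`, `semantic_url`, `replies`, `images`, `sticky`, `closed`, `archived`, `archived_on`) onto a subset of the output names above; the mapping and the `thread_to_rows` name are assumptions, not the actor's documented schema.

```python
from datetime import datetime, timezone


def thread_to_rows(board: str, thread: dict) -> list[dict]:
    """Flatten a 4chan thread object ({'posts': [...]}) into one row per post.

    Hypothetical mapping onto this README's top-level field names.
    """
    posts = thread.get("posts", [])
    if not posts:
        return []
    op = posts[0]
    thread_id = op["no"]
    shared = {
        "board": board,
        "threadId": thread_id,
        "threadUrl": f"https://boards.4chan.org/{board}/thread/{thread_id}",
        "apiUrl": f"https://a.4cdn.org/{board}/thread/{thread_id}.json",
        "scrapedAt": datetime.now(timezone.utc).isoformat(),
        "subject": op.get("sub"),
        "semanticUrl": op.get("semantic_url"),
        "replyCount": op.get("replies", 0),
        "imageCount": op.get("images", 0),
        # 4chan sends these flags as 1 when set and omits them otherwise.
        "isSticky": op.get("sticky") == 1,
        "isClosed": op.get("closed") == 1,
        "isArchived": op.get("archived") == 1,
        "archivedOn": op.get("archived_on"),
    }
    # Each row repeats the thread-level fields and carries one post.
    return [{**shared, "post": post} for post in posts]
```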

`catalog`

Board catalog context for the thread, including:

catalog page number
last modified timestamp
omitted reply count
omitted image count
recent reply post IDs when available
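Each thread entry in the 4chan catalog JSON already carries these values (`last_modified`, `omitted_posts`, `omitted_images`, and `last_replies`, which may be absent), so the catalog context can be read straight off the entry. A sketch with hypothetical output key names:

```python
def catalog_context(page_number: int, entry: dict) -> dict:
    """Summarize one catalog.json thread entry found on the given page.

    Hypothetical key names; the source fields are standard catalog fields.
    """
    return {
        "page": page_number,
        "lastModified": entry.get("last_modified"),  # UNIX timestamp
        "omittedReplies": entry.get("omitted_posts", 0),
        "omittedImages": entry.get("omitted_images", 0),
        # last_replies may be missing, for example on threads without replies.
        "recentReplyIds": [r["no"] for r in entry.get("last_replies", [])],
    }
```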

`stats`

Thread-level summary fields, including:

total posts and replies in the thread
how many posts and replies are stored in the dataset item
attachment totals
quote counts
external link counts
board reference counts
simple content flags such as code and greentext counts
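A rough sketch of how a subset of these counts could be derived from each post's raw comment HTML (`com`). It assumes 4chan's usual markup: quotes appear as `&gt;&gt;123`, board references as `&gt;&gt;&gt;/g/`, greentext lines are wrapped in `<span class="quote">`, code blocks (on boards that support them) use `<pre class="prettyprint">`, and posts with attachments have a `tim` field. The `thread_stats` name and its keys are hypothetical; external link counts are left out here.

```python
import re

# Hypothetical patterns over HTML-escaped comment text.
QUOTE = re.compile(r"(?<!&gt;)&gt;&gt;\d+")        # >>123 but not part of >>>
BOARD_REF = re.compile(r"&gt;&gt;&gt;/[a-z0-9]+/")  # >>>/g/ or >>>/g/123


def thread_stats(posts: list[dict]) -> dict:
    """Summarize a thread's posts (original post first) from their 'com' HTML."""
    comments = [p.get("com", "") for p in posts]
    return {
        "totalPosts": len(posts),
        "totalReplies": max(len(posts) - 1, 0),
        "attachments": sum(1 for p in posts if "tim" in p),
        "quotes": sum(len(QUOTE.findall(c)) for c in comments),
        "boardReferences": sum(len(BOARD_REF.findall(c)) for c in comments),
        "greentextLines": sum(c.count('<span class="quote">') for c in comments),
        "codeBlocks": sum(c.count('<pre class="prettyprint">') for c in comments),
    }
```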

`participants`

Participant summaries, including:

unique poster IDs when present
countries represented in the thread when present
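The 4chan API exposes poster IDs in an `id` field on boards that use them, and country flags in `country` / `country_name` on boards that show flags. A minimal sketch, using a hypothetical `participants` helper that returns empty lists when those fields are absent:

```python
def participants(posts: list[dict]) -> dict:
    """Collect unique poster IDs and countries in first-seen order (hypothetical helper)."""
    # dict.fromkeys de-duplicates while preserving insertion order.
    ids = list(dict.fromkeys(p["id"] for p in posts if p.get("id")))
    countries = list(dict.fromkeys(
        c for c in (p.get("country_name") or p.get("country") for p in posts) if c
    ))
    return {"uniquePosterIds": ids, "countries": countries}
```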

`links`

Extracted link-related fields, including:

external links
external domains
quoted post IDs
board references such as >>>/g/123456789
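Link extraction is easiest on the cleaned comment text, where quotes appear as `>>123` and board references as `>>>/g/123456789`. The regular expressions below are a hypothetical approximation, not the actor's own rules; for instance, they keep trailing punctuation that touches a URL.

```python
import re
from urllib.parse import urlparse

# Hypothetical patterns over cleaned (plain-text) comments.
QUOTE_RE = re.compile(r"(?<!>)>>(\d+)")            # >>123 but not part of >>>
BOARD_REF_RE = re.compile(r">>>/([a-z0-9]+)/(\d*)")
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_links(text: str) -> dict:
    """Pull quotes, board references, and external links out of comment text."""
    urls = URL_RE.findall(text)
    return {
        "externalLinks": urls,
        "externalDomains": sorted({urlparse(u).netloc.lower() for u in urls}),
        "quotedPostIds": [int(n) for n in QUOTE_RE.findall(text)],
        "boardReferences": [f">>>/{b}/{n}" for b, n in BOARD_REF_RE.findall(text)],
    }
```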

`post`

Each post record can include:

author and subject
timestamp and formatted posting date
comment HTML and cleaned comment text
quote targets
board references
external links
attachment metadata
content flags such as containsCode and containsGreentext
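Cleaned comment text can be approximated from the comment HTML with the standard library alone: turn `<br>` into newlines, drop the remaining tags (including the `<wbr>` break hints 4chan inserts into long words), then unescape entities. `clean_comment` and `posted_at` are hypothetical helpers; `posted_at` assumes the post's UNIX `time` field and shows one possible date format, not necessarily the actor's.

```python
import html
import re
from datetime import datetime, timezone

BR = re.compile(r"<br\s*/?>", re.IGNORECASE)
TAG = re.compile(r"<[^>]+>")


def clean_comment(com: str) -> str:
    """Convert 4chan comment HTML into plain text (hypothetical helper)."""
    text = BR.sub("\n", com)   # line breaks first, so they survive tag stripping
    text = TAG.sub("", text)   # remove <a>, <span>, <wbr>, and other tags
    return html.unescape(text).strip()


def posted_at(post: dict) -> str:
    """ISO 8601 UTC timestamp from the post's UNIX 'time' field."""
    return datetime.fromtimestamp(post["time"], tz=timezone.utc).isoformat()
```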

Best Practices

Use scrapeReplies: false when you want faster, lighter discovery runs across many boards.
Use scrapeReplies: true when you need full thread content.
Use threadUrls when you already know specific threads; this skips catalog discovery and finishes faster.
Use maxRepliesPerThread when you only need the latest replies and want to reduce dataset volume.
Start with a smaller maxThreadsPerBoard if you are exploring new board mixes. The default value of 10 is chosen to keep quick validation and test runs lightweight.
Split very wide crawls across multiple runs if you are scraping many boards at once.
Keep normal runs at 128 MB. Launch larger reply-heavy runs at 256 MB, especially when scraping 50 or more thread detail pages or storing uncapped/high reply counts.
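The memory guidance above can be encoded as a simple rule of thumb. This hypothetical helper uses the 50-thread threshold from the text and treats an unset maxRepliesPerThread as "uncapped"; it does not capture the vaguer "high reply counts" case, so adjust it to your own runs and to the value the actor logs at startup.

```python
from typing import Optional


def recommended_memory_mb(thread_pages: int, scrape_replies: bool,
                          max_replies: Optional[int]) -> int:
    """Suggest 128 or 256 MB following this README's guidance (hypothetical helper)."""
    reply_heavy = scrape_replies and (thread_pages >= 50 or max_replies is None)
    return 256 if reply_heavy else 128
```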

Large Scraping Guidance

This actor has been tested on larger multi-board runs and works well for long-running scrapes. For the best production experience:

Use separate runs for very broad board coverage instead of putting every board into one run.
Keep reply scraping enabled only when you need full thread bodies.
Use leaner discovery runs first, then follow up with deeper runs on boards or threads that matter most.
For reply-heavy board runs, allow roughly one second per fetched thread plus startup and catalog overhead. The actor logs a recommended timeout at startup and warns if the current run timeout is likely too low.
The actor logs a recommended memory setting at startup. Use 256 MB for large reply-heavy runs and keep 128 MB for small discovery runs to minimize compute cost.
The default cloud timeout is set high enough for large normal runs, but API users can still override it per run if they intentionally want a stricter cap.
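The timeout rule of thumb (roughly one second per fetched thread, plus startup and catalog overhead) works out as below. The per-catalog and startup figures are placeholder assumptions, not measured values; the timeout the actor logs at startup should take precedence.

```python
import math


def estimate_timeout_secs(boards: int, threads_per_board: int,
                          secs_per_thread: float = 1.0,
                          secs_per_catalog: float = 1.0,
                          startup_secs: float = 30.0) -> int:
    """Rough run-timeout estimate for a reply-heavy board run (placeholder defaults)."""
    threads = boards * threads_per_board
    total = startup_secs + boards * secs_per_catalog + threads * secs_per_thread
    return math.ceil(total)  # round up to whole seconds
```

For example, 3 boards with 100 threads each gives 30 + 3 + 300 = 333 seconds with these placeholder values.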

In practice, splitting large board lists across scheduled runs is the safest approach for high-volume scraping.
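Splitting a board list for scheduled runs is a simple chunking step. A sketch with a hypothetical `split_boards` helper, where each chunk becomes the boards value of one run input:

```python
def split_boards(boards: list[str], per_run: int) -> list[list[str]]:
    """Chunk a board list into groups of at most per_run boards (hypothetical helper)."""
    if per_run < 1:
        raise ValueError("per_run must be at least 1")
    return [boards[i:i + per_run] for i in range(0, len(boards), per_run)]


# One run input per chunk, e.g. for separate scheduled runs.
runs = [{"boards": chunk, "maxThreadsPerBoard": 10}
        for chunk in split_boards(["g", "biz", "tv", "a", "v"], 2)]
```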

Notes

Invalid or unavailable boards are skipped.
Threads that disappear before they are fetched are skipped.
Results are pushed to the Apify dataset as each catalog batch or thread is processed; the actor does not wait until the end of the run to write all rows.
Very large threads may be split into multiple dataset items to stay within dataset size limits.
The actor only returns data available through the public 4chan JSON API at the time of scraping.

You might also like

4chan Scraper

goat255/fourchan-scraper

Scrape 4chan boards and threads without a login. Pull the catalog of a board (every live thread's opening post) or a full thread with all its replies. Comment HTML is stripped to clean plain text, with media URLs, counts, and metadata.

Goutam Soni

Reddit Thread Details Scraper

ecomscrape/reddit-thread-details-scraper

Reddit Thread Details Scraper automates extraction of comprehensive thread metadata including post content, engagement metrics, author information, and moderation data. Efficiently collect detailed Reddit data for social listening, market research, sentiment analysis, and community insights.

ecomscrape

Threads Search Post Scraper

trantus/threads-search-post-scraper

Scrape public Threads search results and extract full thread data. Accepts text queries, date filters, and author filters. Returns structured thread items with all posts, root post info, and summary stats. Ideal for research, monitoring, and analytics.

Tran Tu

Hackernews Thread Fetcher

simplifysme/hackernews-thread-fetcher

📰 Fetch Hacker News thread data using the official Firebase API - no authentication required! Perfect for tech news monitoring and community insights.

SimplifySME Toolbox

Threads Profile & Post Scraper

headlessagent/thread-profile-post-scraper

Scrape Thread profiles and posts. Get clean JSON with profile stats, media URLs, and more.

Headless Agent

X.com (Twitter) Thread Unroller & Scraper

lurkapi/x-twitter-thread-unroller

Paste any tweet URL. Get the entire thread unrolled: root, all self-replies, and clean Markdown. Pay per thread ($5/1k base). Optional add-ons: media downloads, full quoted tweets.

LurkAPI

Threads Scraper

gio21/threads-scraper

Scrape Meta Threads posts from public profiles or thread URLs. Caption, author, timestamp, likes, replies, media. Pay per post.

Gio

Hacker News Thread Summarizer

didactic_liszt/hn-summarizer

Fetches top comments from a Hacker News thread and summarizes them using Claude API.

HJ JOO

X(Twitter) Thread Scraper

powerai/twitter-thread-scraper

Expand a tweet into its thread and export each item with engagement fields and metadata, with automatic cursor paging up to your cap.

PowerAI

Thread media downlaoder(video,images)

fingolfin/thread-media-downlaoder-video-images

This crawler downloads media from the Threads social platform and returns a link to the downloaded file, in PNG or MP4 format.

Mate Papava
