4Chan Thread & Board Scraper

Pricing

from $1.25 / 1,000 results

Scrape threads from one or more 4chan boards using the official 4chan JSON API. Collect structured thread data, original posts, optional replies, attachments, extracted links, participant summaries, and thread-level metadata for research, monitoring, archiving, and downstream analysis.

Developer

Inus Grobler

Maintained by Community

Actor stats

  • Bookmarked: 1
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 2 days ago

4chan JSON API Scraper

Scrape threads from one or more 4chan boards through the official 4chan JSON API.

This actor is useful when you want structured thread data for research, monitoring, archiving, enrichment, or downstream analysis. You choose which boards to scrape, how many threads to collect from each board, and whether to include replies or just the original post.

What The Actor Does

  • Fetches the current catalog for each selected board
  • Collects up to your chosen number of threads per board
  • Stores thread metadata, original post data, attachment details, and optional replies
  • Enriches output with catalog context, participation summaries, and extracted links
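
The steps above can be sketched directly against the public 4chan JSON API. This is a minimal illustration, not the actor's actual implementation: it assumes the read-only endpoints `https://a.4cdn.org/{board}/catalog.json` and `https://a.4cdn.org/{board}/thread/{id}.json`, and the helper names (`catalog_url`, `thread_url`, `pick_threads`, `fetch_json`) are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://a.4cdn.org"  # public read-only 4chan JSON API host


def catalog_url(board: str) -> str:
    """URL of the current catalog for a board (name without slashes)."""
    return f"{API_BASE}/{board}/catalog.json"


def thread_url(board: str, thread_id: int) -> str:
    """URL of the full JSON for one thread."""
    return f"{API_BASE}/{board}/thread/{thread_id}.json"


def pick_threads(catalog: list, max_threads: int) -> list:
    """Flatten catalog pages and keep at most max_threads entries.

    The catalog is a list of pages, each shaped like
    {"page": 1, "threads": [{"no": ...}, ...]}.
    The page number is copied onto each thread as catalog context.
    """
    picked = []
    for page in catalog:
        for thread in page.get("threads", []):
            if len(picked) >= max_threads:
                return picked
            picked.append({**thread, "page": page.get("page")})
    return picked


def fetch_json(url: str):
    """Fetch and decode one JSON document (performs a network request)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

A caller would typically run `pick_threads(fetch_json(catalog_url("g")), 10)` and then fetch `thread_url("g", t["no"])` for each selected entry, skipping any thread that has disappeared in the meantime.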

Input

boards

List of boards to scrape, without slashes.

Example:

["g", "biz", "tv"]

Default: ["g"]

maxThreadsPerBoard

Maximum number of threads to collect from each selected board.

Example:

20

Default: 10

scrapeReplies

  • true: store the original post and all available replies in each thread
  • false: store only the original post, while still keeping thread-level counts and summary metadata

Default: true

proxyConfiguration

Optional proxy settings for the requests. By default, the actor uses direct requests because the public 4chan JSON API is usually faster without a proxy. Enable Apify Proxy or provide custom proxy URLs only when your environment needs it.
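
As an illustration of how these inputs fit together, the sketch below builds a run input with the documented defaults and passes it to the actor through the `apify-client` Python package. The helper names are hypothetical, the `maxThreadsPerBoard >= 1` check is an assumption rather than a documented rule, and `actor_id` is a placeholder you would replace with the ID shown on the actor's store page.

```python
# Defaults as documented in the Input section above.
DEFAULTS = {"boards": ["g"], "maxThreadsPerBoard": 10, "scrapeReplies": True}


def build_run_input(boards=None, max_threads_per_board=None,
                    scrape_replies=None, proxy_configuration=None) -> dict:
    """Merge caller overrides into the documented defaults."""
    run_input = {
        "boards": list(DEFAULTS["boards"]),
        "maxThreadsPerBoard": DEFAULTS["maxThreadsPerBoard"],
        "scrapeReplies": DEFAULTS["scrapeReplies"],
    }
    if boards is not None:
        # Boards are given without slashes, so "/g/" becomes "g".
        cleaned = [b.strip().strip("/") for b in boards]
        run_input["boards"] = [b for b in cleaned if b]
    if max_threads_per_board is not None:
        if max_threads_per_board < 1:  # assumption: at least one thread
            raise ValueError("maxThreadsPerBoard must be at least 1")
        run_input["maxThreadsPerBoard"] = max_threads_per_board
    if scrape_replies is not None:
        run_input["scrapeReplies"] = bool(scrape_replies)
    if proxy_configuration is not None:
        run_input["proxyConfiguration"] = proxy_configuration
    return run_input


def run_actor(token: str, actor_id: str, run_input: dict) -> list:
    """Start a run and return its dataset items (needs `pip install apify-client`)."""
    from apify_client import ApifyClient  # imported lazily; third-party package

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

For example, `build_run_input(boards=["g", "biz", "tv"], max_threads_per_board=20)` reproduces the example values above. If a proxy is needed, Apify actors commonly accept `{"useApifyProxy": True}` as `proxyConfiguration`; whether this actor accepts exactly that shape is an assumption here.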

Output

Each dataset item represents one scraped post, with the thread-level metadata repeated on every row.

Top-level fields include:

  • board
  • threadId
  • threadUrl
  • apiUrl
  • scrapedAt
  • subject
  • semanticUrl
  • replyCount
  • imageCount
  • isSticky
  • isClosed
  • isArchived
  • archivedOn
  • catalog
  • stats
  • participants
  • links
  • post
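
Because thread-level fields are repeated on every row, downstream code often regroups rows by thread. The sketch below shows one way to do that using only the `board`, `threadId`, `threadUrl`, `subject`, and `post` fields listed above; the function name and the shape of the grouped result are illustrative assumptions.

```python
def group_by_thread(items):
    """Group dataset rows into one entry per (board, threadId) pair."""
    threads = {}
    for item in items:
        key = (item["board"], item["threadId"])
        entry = threads.setdefault(key, {
            "threadUrl": item.get("threadUrl"),
            "subject": item.get("subject"),
            "posts": [],
        })
        # Each row carries one post record; keep them in arrival order.
        entry["posts"].append(item.get("post"))
    return threads
```

The result maps each `(board, threadId)` pair to its thread metadata and the list of post records collected for it.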

catalog

Board catalog context for the thread, including:

  • catalog page number
  • last modified timestamp
  • omitted reply count
  • omitted image count
  • recent reply post IDs when available

stats

Thread-level summary fields, including:

  • total posts and replies in the thread
  • how many posts and replies are stored in the dataset item
  • attachment totals
  • quote counts
  • external link counts
  • board reference counts
  • simple content flags such as code and greentext counts

participants

Participant summaries, including:

  • unique poster IDs when present
  • countries represented in the thread when present

links

Extracted link-related fields, including:

  • external links
  • external domains
  • quoted post IDs
  • board references such as >>>/g/123456789
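
To make these categories concrete, here is a minimal sketch of how they could be pulled out of cleaned comment text with regular expressions. The patterns and the result keys (`externalLinks`, `externalDomains`, `quotedPostIds`, `boardReferences`) are assumptions for illustration and may differ from the actor's actual field names and matching rules.

```python
import re
from urllib.parse import urlparse

# ">>>/g/123456789" style cross-board references.
BOARD_REF = re.compile(r">>>/(\w+)/(\d+)")
# ">>123456" quotes; the lookbehind avoids matching inside ">>>".
QUOTE = re.compile(r"(?<!>)>>(\d+)")
# Simple URL matcher; trailing punctuation is not trimmed in this sketch.
URL = re.compile(r"https?://[^\s<>\"']+")


def extract_links(text: str) -> dict:
    """Extract URLs, their domains, quoted post IDs, and board references."""
    urls = URL.findall(text)
    domains = sorted({urlparse(u).netloc.lower() for u in urls})
    return {
        "externalLinks": urls,
        "externalDomains": domains,
        "quotedPostIds": [int(n) for n in QUOTE.findall(text)],
        "boardReferences": [f">>>/{b}/{n}" for b, n in BOARD_REF.findall(text)],
    }
```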

post

Each post record can include:

  • author and subject
  • timestamp and formatted posting date
  • comment HTML and cleaned comment text
  • quote targets
  • board references
  • external links
  • attachment metadata
  • content flags such as containsCode and containsGreentext
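
The relationship between comment HTML, cleaned text, and the content flags can be sketched as follows. This assumes the markup commonly seen in 4chan API comments (`<br>` line breaks, HTML-escaped `>` characters, and `<pre class="prettyprint">` for code blocks); the function names are hypothetical and the actor's real cleaning rules may differ.

```python
import html
import re


def clean_comment(comment_html: str) -> str:
    """Convert comment HTML to plain text."""
    text = re.sub(r"<br\s*/?>", "\n", comment_html)  # line breaks
    text = re.sub(r"<wbr\s*/?>", "", text)           # word-break hints
    text = re.sub(r"<[^>]+>", "", text)              # remaining tags
    # Unescape last so "&gt;" is never mistaken for a tag delimiter.
    return html.unescape(text).strip()


def content_flags(comment_html: str) -> dict:
    """Derive simple flags from the raw HTML and the cleaned text."""
    text = clean_comment(comment_html)
    # Greentext: a line starting with a single ">" (">>" is a post quote).
    greentext = any(re.match(r">(?!>)", line) for line in text.splitlines())
    # Assumption: code blocks appear as <pre class="prettyprint">.
    code = '<pre class="prettyprint">' in comment_html
    return {"containsGreentext": greentext, "containsCode": code}
```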

Best Practices

  • Use scrapeReplies: true when you need full thread content.
  • Use scrapeReplies: false when you want faster, lighter discovery runs across many boards.
  • Start with a smaller maxThreadsPerBoard if you are exploring new board mixes. The default value of 10 is chosen to keep quick validation and test runs lightweight.
  • Split very wide crawls across multiple runs if you are scraping many boards at once.

Large Scraping Guidance

This actor has been tested on larger multi-board runs and works well for long-running scrapes. For the best production experience:

  • Use separate runs for very broad board coverage instead of putting every board into one run.
  • Keep reply scraping enabled only when you need full thread bodies.
  • Use leaner discovery runs first, then follow up with deeper runs on boards or threads that matter most.

In practice, splitting large board lists across scheduled runs is the safest approach for high-volume scraping.
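
One way to plan such a split is to chunk the board list and produce one run input per chunk. The sketch below is illustrative only: the helper names and the choice of chunk size are assumptions, and each resulting input would be passed to a separate (for example scheduled) run.

```python
def chunk_boards(boards, size):
    """Split a board list into consecutive chunks of at most `size` boards."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [boards[i:i + size] for i in range(0, len(boards), size)]


def plan_runs(boards, boards_per_run, max_threads_per_board=10,
              scrape_replies=False):
    """Build one run input per chunk; replies are off for lean discovery runs."""
    return [
        {
            "boards": chunk,
            "maxThreadsPerBoard": max_threads_per_board,
            "scrapeReplies": scrape_replies,
        }
        for chunk in chunk_boards(boards, boards_per_run)
    ]
```

For instance, `plan_runs(["g", "biz", "tv", "sci", "pol"], 2)` yields three run inputs covering two, two, and one board respectively. A follow-up deep run can then reuse `plan_runs` with `scrape_replies=True` on only the boards that matter.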

Notes

  • Invalid or unavailable boards are skipped.
  • Threads that disappear before they are fetched are skipped.
  • Very large threads may be split into multiple dataset items to stay within dataset size limits.
  • The actor only returns data available through the public 4chan JSON API at the time of scraping.