Reddit All-in-One Scraper
Scrape massive historical datasets across Reddit by extracting subreddits, complex search results, post content, and deep comment trees.
Pricing: Pay per event
Developer: 太郎 山田
Last modified: 13 days ago
📡 Reddit All-in-One Scraper
The research/backfill companion in the Reddit Intelligence Pack.
Use this actor when you need broad Reddit collection (subreddit feeds, searches, post URLs, user/profile pulls, optional comments) to build historical context, analysis datasets, or backfill records.
For recurring net-new alerting, use the pack hero: reddit-keyword-monitor-alerts.
Store Quickstart
- Start with Brand Mention Research (Backfill) for a compact initial dataset.
- Use Search + Comments Research when you need deeper discussion context.
- Move recurring monitoring and webhook alerting to reddit-keyword-monitor-alerts.
Key Features
- 📡 All source types — Subreddits, post URLs, user profiles, and search queries
- 💬 Comments with depth control — Nested comment trees with configurable depth
- 🔍 Search support — Reddit-wide search via search:your query
- 🏷️ Keyword filtering — Filter posts by title/body keywords
- 📊 Normalized output — Clean, flat objects for research pipelines
- 🤝 Pack handoff — Built for backfill/research before recurring monitoring handoff
Use Cases
| Who | Why |
|---|---|
| Market researchers | Backfill competitor/category subreddit history |
| Analysts | Pull search + comments datasets for thematic analysis |
| Data teams | Collect profile/subreddit sources for downstream scoring |
| PM/GTM teams | Build context sets, then move to recurring monitor alerts |
Input
| Field | Type | Default | Description |
|---|---|---|---|
| sources | array | required | List of sources: subreddit name/URL, post URL, user (e.g. u/spez), user URL, or search:query. |
| maxPostsPerSource | integer | 25 | Maximum posts to collect from each subreddit, user, or search source. |
| includeComments | boolean | false | Fetch comments for each post. Increases run time. |
| maxCommentsPerPost | integer | 50 | Maximum top-level + nested comments to extract per post (when includeComments is on). |
| commentDepth | integer | 3 | How many reply levels to extract (1 = top-level only). |
| sort | string | "hot" | Sort order for subreddit and search listings. |
| time | string | "all" | Time range filter (applies when sort is 'top' or 'controversial'). |
| keywords | array | [] | Only include posts whose title or selftext contains at least one keyword (case-insensitive). Leave empty to include all. |
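The keywords behavior described above (case-insensitive match against title or selftext, with an empty list including everything) can be sketched as follows; matches_keywords is a hypothetical helper for illustration, not the actor's internal code:

```python
def matches_keywords(post: dict, keywords: list[str]) -> bool:
    """Return True if a post passes the keyword filter.

    Mirrors the documented behavior: an empty keyword list includes all
    posts; otherwise the title or selftext must contain at least one
    keyword, case-insensitively.
    """
    if not keywords:
        return True
    haystack = f"{post.get('title', '')} {post.get('selftext', '') or ''}".lower()
    return any(kw.lower() in haystack for kw in keywords)
```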
Input Example
```json
{
  "sources": ["javascript", "u/spez", "search:web scraping"],
  "maxPostsPerSource": 10,
  "includeComments": false,
  "sort": "hot",
  "keywords": [],
  "delivery": "dataset"
}
```
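For illustration, the different shapes a sources entry can take (subreddit name, u/ user, post URL, search: query) might be distinguished like this; classify_source is a hypothetical helper and the exact rules are assumptions, not the actor's internals:

```python
def classify_source(source: str) -> str:
    """Roughly bucket a sources entry by the documented shapes."""
    s = source.strip()
    if s.startswith("search:"):
        return "search"
    if s.startswith("u/") or "/user/" in s:
        return "user"
    if "/comments/" in s:
        return "post"
    # Anything else is treated as a bare subreddit name or subreddit URL.
    return "subreddit"
```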
Output
| Field | Type | Description |
|---|---|---|
| meta | object | Run metadata. |
| posts | array | Collected posts. |
| posts[].id | string | Reddit post ID. |
| posts[].subreddit | string | Subreddit name. |
| posts[].title | string | Post title. |
| posts[].author | string | Author username. |
| posts[].score | number | Net upvote score. |
| posts[].upvoteRatio | number | Ratio of upvotes to total votes. |
| posts[].numComments | number | Comment count. |
| posts[].createdAt | timestamp | Post creation time (ISO 8601). |
| posts[].url | string (url) | Linked or media URL. |
| posts[].permalink | string (url) | Reddit permalink. |
| posts[].selftext | string | Self-post body text (null for link posts). |
| posts[].isSelf | boolean | Whether the post is a self (text) post. |
| posts[].isNsfw | boolean | Whether the post is marked NSFW. |
| posts[].isStickied | boolean | Whether the post is stickied. |
| posts[].flair | string | Post flair text. |
| posts[].domain | string | Domain of the linked URL. |
| posts[].thumbnail | string (url) or null | Thumbnail URL, if any. |
| posts[].awards | number | Award count. |
| posts[].sourceType | string | Kind of source the post came from (e.g. subreddit, user, search). |
| posts[].sourceValue | string | The source entry that produced the post. |
Output Example
```json
{
  "id": "abc123",
  "subreddit": "javascript",
  "title": "New ESM features in Node 22",
  "author": "devuser",
  "score": 842,
  "upvoteRatio": 0.96,
  "numComments": 127,
  "createdAt": "2026-01-15T12:30:00.000Z",
  "url": "https://example.com/article",
  "permalink": "https://www.reddit.com/r/javascript/comments/abc123/…",
  "selftext": null,
  "isSelf": false,
  "isNsfw": false,
  "flair": "News",
  "sourceType": "subreddit",
  "sourceValue": "javascript"
}
```
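Because output items are flat objects, exporting them for analysis is straightforward. A minimal sketch using only the standard library (column names taken from the output table; items_to_csv is a hypothetical helper):

```python
import csv
import io


def items_to_csv(items: list[dict], fields: list[str]) -> str:
    """Write dataset items to CSV text, keeping only the requested columns."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(items)
    return buf.getvalue()


example = [{"id": "abc123", "subreddit": "javascript", "score": 842, "extra": 1}]
print(items_to_csv(example, ["id", "subreddit", "score"]))
```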
API Usage
Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.
cURL
```bash
curl -X POST "https://api.apify.com/v2/acts/taroyamada~reddit-all-in-one-scraper/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "sources": ["javascript", "u/spez", "search:web scraping"],
    "maxPostsPerSource": 10,
    "includeComments": false,
    "sort": "hot",
    "keywords": [],
    "delivery": "dataset"
  }'
```
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("taroyamada/reddit-all-in-one-scraper").call(run_input={
    "sources": ["javascript", "u/spez", "search:web scraping"],
    "maxPostsPerSource": 10,
    "includeComments": False,
    "sort": "hot",
    "keywords": [],
    "delivery": "dataset",
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
```
JavaScript / Node.js
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('taroyamada/reddit-all-in-one-scraper').call({
    sources: ['javascript', 'u/spez', 'search:web scraping'],
    maxPostsPerSource: 10,
    includeComments: false,
    sort: 'hot',
    keywords: [],
    delivery: 'dataset',
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```
Validation & Cloud Setup
This actor follows shared store-ops conventions:
- npm test — local unit tests
- npm run canary:check — live canary validation against latest Apify run/task
- npm run contract:test:live — live dataset contract check
- npm run apify:cloud:setup — bootstrap/update Apify task + schedule from local config
Tips & Limitations
- This actor is best for research/backfill, not recurring diff alerting.
- For net-new recurring alerts + baseline snapshots, use reddit-keyword-monitor-alerts.
- 429s are common on aggressive pulls; increase delayMs and trim maxPostsPerSource.
- For links discovered in posts, use article-content-extractor for full-page content cleanup.
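When calling the Apify API directly, 429 responses can also be handled on the client side with exponential backoff. A minimal sketch, assuming the wrapped function signals rate limiting by raising an error whose message contains "429" (an illustrative convention, not part of the actor's API):

```python
import time


def with_backoff(fn, max_retries: int = 4, base_delay: float = 1.0):
    """Call fn(), retrying with exponential backoff on 429-style errors."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RuntimeError as err:
            if "429" not in str(err) or attempt == max_retries:
                raise  # not a rate limit, or out of retries
            # Wait 1x, 2x, 4x, ... the base delay before retrying.
            time.sleep(base_delay * (2 ** attempt))
```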
FAQ
Does this need a Reddit API key?
No. It uses public Reddit .json endpoints without authentication.
Can this replace recurring monitoring?
Not directly. This actor does not maintain monitoring snapshots across runs. Use reddit-keyword-monitor-alerts for net-new recurring alert workflows.
Can I scrape private subreddits?
No. Only public subreddits are accessible via public endpoints.
What is the best pack workflow?
Use this actor to gather research/backfill context, then move recurring alert operations to reddit-keyword-monitor-alerts.
Related Actors
Reddit Intelligence Pack workflow:
- 🚨 Reddit Keyword Monitor Alerts — Hero recurring monitor for net-new alerts + webhook handoff.
- 📰 Article Extractor — Extract linked article text from Reddit URLs.
- 💬 Reddit Scraper (Legacy) — Legacy/proxy-sensitive fallback, not primary entry point.
Cost
Pay Per Event:
- actor-start: $0.01 (flat fee per run)
- dataset-item: $0.001 per output item
Example: 1,000 items = $0.01 + (1,000 × $0.001) = $1.01
No subscription required — you only pay for what you use.
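The pay-per-event arithmetic above generalizes to a one-line formula. A sketch using the listed prices ($0.01 start fee, $0.001 per item); estimate_cost is a hypothetical helper:

```python
def estimate_cost(num_items: int, start_fee: float = 0.01,
                  per_item: float = 0.001) -> float:
    """Estimate a run's pay-per-event cost in USD."""
    return round(start_fee + num_items * per_item, 4)


print(estimate_cost(1000))  # 1.01, matching the worked example above
```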
⭐ Was this helpful?
If this actor saved you time, please leave a ★ rating on Apify Store. It takes 10 seconds, helps other developers discover it, and keeps updates free.
Bug report or feature request? Open an issue on the Issues tab of this actor.
