Reddit All-in-One Scraper

Scrape massive historical datasets across Reddit by extracting subreddits, complex search results, post content, and deep comment trees.

Pricing: Pay per event
Rating: 0.0 (0)
Developer: 太郎 山田 (Maintained by Community)
Actor stats: 1 bookmarked · 2 total users · 1 monthly active user · last modified 13 days ago

📡 Reddit All-in-One Scraper

The research/backfill companion in the Reddit Intelligence Pack.

Use this actor when you need broad Reddit collection (subreddit feeds, searches, post URLs, user/profile pulls, optional comments) to build historical context, analysis datasets, or backfill records.

For recurring net-new alerting, use the pack hero: reddit-keyword-monitor-alerts.

Store Quickstart

  • Start with Brand Mention Research (Backfill) for a compact initial dataset.
  • Use Search + Comments Research when you need deeper discussion context.
  • Move recurring monitoring and webhook alerting to reddit-keyword-monitor-alerts.

Key Features

  • 📡 All source types — Subreddits, post URLs, user profiles, and search queries
  • 💬 Comments with depth control — Nested comment trees with configurable depth
  • 🔍 Search support — Reddit-wide search via search:your query
  • 🏷️ Keyword filtering — Filter posts by title/body keywords
  • 📊 Normalized output — Clean, flat objects for research pipelines
  • 🤝 Pack handoff — Built for backfill/research before recurring monitoring handoff

Use Cases

  • Market researchers: Backfill competitor/category subreddit history
  • Analysts: Pull search + comments datasets for thematic analysis
  • Data teams: Collect profile/subreddit sources for downstream scoring
  • PM/GTM teams: Build context sets, then move to recurring monitor alerts

Input

  • sources (array, required): List of sources: subreddit name/URL, post URL, user (e.g. u/spez), user URL, or search:query.
  • maxPostsPerSource (integer, default 25): Maximum posts to collect from each subreddit, user, or search source.
  • includeComments (boolean, default false): Fetch comments for each post. Increases run time.
  • maxCommentsPerPost (integer, default 50): Maximum top-level + nested comments to extract per post (when includeComments is on).
  • commentDepth (integer, default 3): How many reply levels to extract (1 = top-level only).
  • sort (string, default "hot"): Sort order for subreddit and search listings.
  • time (string, default "all"): Time range filter (applies when sort is "top" or "controversial").
  • keywords (array, default []): Only include posts whose title or selftext contains at least one keyword (case-insensitive). Leave empty to include all.
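The keywords option amounts to a case-insensitive substring match over a post's title and selftext. A minimal sketch of that behavior (the function name and exact matching rule are illustrative, not the actor's internals):

```python
def matches_keywords(post: dict, keywords: list[str]) -> bool:
    """Approximation of the keyword filter: keep a post when its title or
    selftext contains at least one keyword, case-insensitively.
    An empty keyword list keeps everything."""
    if not keywords:
        return True
    haystack = f"{post.get('title', '')} {post.get('selftext') or ''}".lower()
    return any(kw.lower() in haystack for kw in keywords)

posts = [
    {"title": "Web Scraping with Node", "selftext": ""},
    {"title": "Async patterns", "selftext": None},
]
print([matches_keywords(p, ["scraping"]) for p in posts])  # → [True, False]
```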

Input Example

{
  "sources": ["javascript", "u/spez", "search:web scraping"],
  "maxPostsPerSource": 10,
  "includeComments": false,
  "sort": "hot",
  "keywords": [],
  "delivery": "dataset"
}

Output

  • meta (object)
  • posts (array)
  • posts[].id (string)
  • posts[].subreddit (string)
  • posts[].title (string)
  • posts[].author (string)
  • posts[].score (number)
  • posts[].upvoteRatio (number)
  • posts[].numComments (number)
  • posts[].createdAt (timestamp)
  • posts[].url (string, url)
  • posts[].permalink (string, url)
  • posts[].selftext (string)
  • posts[].isSelf (boolean)
  • posts[].isNsfw (boolean)
  • posts[].isStickied (boolean)
  • posts[].flair (string)
  • posts[].domain (string)
  • posts[].thumbnail (null)
  • posts[].awards (number)
  • posts[].sourceType (string)
  • posts[].sourceValue (string)

Output Example

{
  "id": "abc123",
  "subreddit": "javascript",
  "title": "New ESM features in Node 22",
  "author": "devuser",
  "score": 842,
  "upvoteRatio": 0.96,
  "numComments": 127,
  "createdAt": "2026-01-15T12:30:00.000Z",
  "url": "https://example.com/article",
  "permalink": "https://www.reddit.com/r/javascript/comments/abc123/…",
  "selftext": null,
  "isSelf": false,
  "isNsfw": false,
  "flair": "News",
  "sourceType": "subreddit",
  "sourceValue": "javascript"
}
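Because each item is a flat object, downstream filtering needs no special tooling. A short sketch using the fields documented above (the sample items are made up):

```python
# Rank safe-for-work posts by score, using the flat output fields.
items = [
    {"id": "abc123", "score": 842, "isNsfw": False, "subreddit": "javascript"},
    {"id": "def456", "score": 15, "isNsfw": True, "subreddit": "javascript"},
    {"id": "ghi789", "score": 301, "isNsfw": False, "subreddit": "node"},
]
sfw = [p for p in items if not p["isNsfw"]]
top = sorted(sfw, key=lambda p: p["score"], reverse=True)
print([p["id"] for p in top])  # → ['abc123', 'ghi789']
```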

API Usage

Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.

cURL

curl -X POST "https://api.apify.com/v2/acts/taroyamada~reddit-all-in-one-scraper/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "sources": ["javascript", "u/spez", "search:web scraping"], "maxPostsPerSource": 10, "includeComments": false, "sort": "hot", "keywords": [], "delivery": "dataset" }'

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")
run = client.actor("taroyamada/reddit-all-in-one-scraper").call(run_input={
    "sources": ["javascript", "u/spez", "search:web scraping"],
    "maxPostsPerSource": 10,
    "includeComments": False,
    "sort": "hot",
    "keywords": [],
    "delivery": "dataset",
})
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

JavaScript / Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
const run = await client.actor('taroyamada/reddit-all-in-one-scraper').call({
    sources: ['javascript', 'u/spez', 'search:web scraping'],
    maxPostsPerSource: 10,
    includeComments: false,
    sort: 'hot',
    keywords: [],
    delivery: 'dataset',
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Validation & Cloud Setup

This actor follows shared store-ops conventions:

  • npm test — local unit tests
  • npm run canary:check — live canary validation against latest Apify run/task
  • npm run contract:test:live — live dataset contract check
  • npm run apify:cloud:setup — bootstrap/update Apify task + schedule from local config

Tips & Limitations

  • This actor is best for research/backfill, not recurring diff alerting.
  • For net-new recurring alerts + baseline snapshots, use reddit-keyword-monitor-alerts.
  • 429s are common on aggressive pulls; increase delayMs and trim maxPostsPerSource.
  • For links discovered in posts, use article-content-extractor for full-page content cleanup.

FAQ

Does this need a Reddit API key?

No. It uses public Reddit .json endpoints without authentication.
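Those listing endpoints follow a fixed URL shape, e.g. https://www.reddit.com/r/<subreddit>/<sort>.json?limit=<n>, fetchable with any HTTP client and a descriptive User-Agent header. A sketch of parsing such a response (the payload below is abbreviated sample data, not a live fetch):

```python
import json

# Reddit listing responses nest posts under data.children[].data.
sample = json.loads("""
{"data": {"children": [
  {"data": {"title": "New ESM features in Node 22", "score": 842}}
]}}
""")
titles = [c["data"]["title"] for c in sample["data"]["children"]]
print(titles)  # → ['New ESM features in Node 22']
```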

Can this replace recurring monitoring?

Not directly. This actor does not maintain monitoring snapshots across runs. Use reddit-keyword-monitor-alerts for net-new recurring alert workflows.

Can I scrape private subreddits?

No. Only public subreddits are accessible via public endpoints.

What is the best pack workflow?

Use this actor to gather research/backfill context, then move recurring alert operations to reddit-keyword-monitor-alerts.

Reddit Intelligence Pack workflow: backfill and research with this actor, then hand recurring alerting off to reddit-keyword-monitor-alerts.

Cost

Pay Per Event:

  • actor-start: $0.01 (flat fee per run)
  • dataset-item: $0.001 per output item

Example: 1,000 items = $0.01 + (1,000 × $0.001) = $1.01
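That arithmetic generalizes to a one-line cost formula; a quick sanity-check helper (rates copied from the event list above):

```python
def run_cost(items: int, actor_start: float = 0.01, per_item: float = 0.001) -> float:
    """Pay-per-event cost: flat actor-start fee plus a per-dataset-item charge."""
    return round(actor_start + items * per_item, 2)

print(run_cost(1000))  # → 1.01
```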

No subscription required — you only pay for what you use.

⭐ Was this helpful?

If this actor saved you time, please leave a ★ rating on Apify Store. It takes 10 seconds, helps other developers discover it, and keeps updates free.

Bug report or feature request? Open an issue on the Issues tab of this actor.