Reddit All-in-One Scraper
Scrape massive historical datasets across Reddit by extracting subreddits, complex search results, post content, and deep comment trees.
Pricing: Pay per event
Developer: 太郎 山田
Last modified: 13 days ago
📡 Reddit All-in-One Scraper
The research/backfill companion in the Reddit Intelligence Pack.
Use this actor when you need broad Reddit collection (subreddit feeds, searches, post URLs, user/profile pulls, optional comments) to build historical context, analysis datasets, or backfill records.
For recurring net-new alerting, use the pack hero: reddit-keyword-monitor-alerts.
Store Quickstart
- Start with Brand Mention Research (Backfill) for a compact initial dataset.
- Use Search + Comments Research when you need deeper discussion context.
- Move recurring monitoring and webhook alerting to reddit-keyword-monitor-alerts.
Key Features
- 📡 All source types — Subreddits, post URLs, user profiles, and search queries
- 💬 Comments with depth control — Nested comment trees with configurable depth
- 🔍 Search support — Reddit-wide search via search:your query
- 🏷️ Keyword filtering — Filter posts by title/body keywords
- 📊 Normalized output — Clean, flat objects for research pipelines
- 🤝 Pack handoff — Built for backfill/research before recurring monitoring handoff
Use Cases
| Who | Why |
|---|---|
| Market researchers | Backfill competitor/category subreddit history |
| Analysts | Pull search + comments datasets for thematic analysis |
| Data teams | Collect profile/subreddit sources for downstream scoring |
| PM/GTM teams | Build context sets, then move to recurring monitor alerts |
Input
| Field | Type | Default | Description |
|---|---|---|---|
| sources | array | required | List of sources: subreddit name/URL, post URL, user (e.g. u/spez), user URL, or search:query. |
| maxPostsPerSource | integer | 25 | Maximum posts to collect from each subreddit, user, or search source. |
| includeComments | boolean | false | Fetch comments for each post. Increases run time. |
| maxCommentsPerPost | integer | 50 | Maximum top-level + nested comments to extract per post (when includeComments is on). |
| commentDepth | integer | 3 | How many reply levels to extract (1 = top-level only). |
| sort | string | "hot" | Sort order for subreddit and search listings. |
| time | string | "all" | Time range filter (applies when sort is 'top' or 'controversial'). |
| keywords | array | [] | Only include posts whose title or selftext contains at least one keyword (case-insensitive). Leave empty to include all. |
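The keywords behavior described above (case-insensitive match against title or selftext, with an empty list including everything) can be sketched as follows; matches_keywords is a hypothetical helper for illustration, not the actor's internal code:

```python
def matches_keywords(post: dict, keywords: list[str]) -> bool:
    """Return True if a post passes the keyword filter.

    Mirrors the documented behavior: an empty keyword list includes all
    posts; otherwise the title or selftext must contain at least one
    keyword, case-insensitively.
    """
    if not keywords:
        return True
    haystack = f"{post.get('title', '')} {post.get('selftext', '') or ''}".lower()
    return any(kw.lower() in haystack for kw in keywords)
```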
Input Example
```json
{
  "sources": ["javascript", "u/spez", "search:web scraping"],
  "maxPostsPerSource": 10,
  "includeComments": false,
  "sort": "hot",
  "keywords": [],
  "delivery": "dataset"
}
```
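For illustration, the different shapes a sources entry can take (subreddit name, u/ user, post URL, search: query) might be distinguished like this; classify_source is a hypothetical helper and the exact rules are assumptions, not the actor's internals:

```python
def classify_source(source: str) -> str:
    """Roughly bucket a sources entry by the documented shapes."""
    s = source.strip()
    if s.startswith("search:"):
        return "search"
    if s.startswith("u/") or "/user/" in s:
        return "user"
    if "/comments/" in s:
        return "post"
    # Anything else is treated as a bare subreddit name or subreddit URL.
    return "subreddit"
```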
Output
| Field | Type | Description |
|---|---|---|
| meta | object | Run metadata. |
| posts | array | Collected posts. |
| posts[].id | string | Reddit post ID. |
| posts[].subreddit | string | Subreddit name. |
| posts[].title | string | Post title. |
| posts[].author | string | Author username. |
| posts[].score | number | Net upvote score. |
| posts[].upvoteRatio | number | Ratio of upvotes to total votes. |
| posts[].numComments | number | Comment count. |
| posts[].createdAt | timestamp | Post creation time (ISO 8601). |
| posts[].url | string (url) | Linked or media URL. |
| posts[].permalink | string (url) | Reddit permalink. |
| posts[].selftext | string | Self-post body text (null for link posts). |
| posts[].isSelf | boolean | Whether the post is a self (text) post. |
| posts[].isNsfw | boolean | Whether the post is marked NSFW. |
| posts[].isStickied | boolean | Whether the post is stickied. |
| posts[].flair | string | Post flair text. |
| posts[].domain | string | Domain of the linked URL. |
| posts[].thumbnail | string (url) or null | Thumbnail URL, if any. |
| posts[].awards | number | Award count. |
| posts[].sourceType | string | Kind of source the post came from (e.g. subreddit, user, search). |
| posts[].sourceValue | string | The source entry that produced the post. |
Output Example
```json
{
  "id": "abc123",
  "subreddit": "javascript",
  "title": "New ESM features in Node 22",
  "author": "devuser",
  "score": 842,
  "upvoteRatio": 0.96,
  "numComments": 127,
  "createdAt": "2026-01-15T12:30:00.000Z",
  "url": "https://example.com/article",
  "permalink": "https://www.reddit.com/r/javascript/comments/abc123/…",
  "selftext": null,
  "isSelf": false,
  "isNsfw": false,
  "flair": "News",
  "sourceType": "subreddit",
  "sourceValue": "javascript"
}
```
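Because output items are flat objects, exporting them for analysis is straightforward. A minimal sketch using only the standard library (column names taken from the output table; items_to_csv is a hypothetical helper):

```python
import csv
import io


def items_to_csv(items: list[dict], fields: list[str]) -> str:
    """Write dataset items to CSV text, keeping only the requested columns."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(items)
    return buf.getvalue()


example = [{"id": "abc123", "subreddit": "javascript", "score": 842, "extra": 1}]
print(items_to_csv(example, ["id", "subreddit", "score"]))
```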
API Usage
Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.
cURL
```bash
curl -X POST "https://api.apify.com/v2/acts/taroyamada~reddit-all-in-one-scraper/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "sources": ["javascript", "u/spez", "search:web scraping"],
    "maxPostsPerSource": 10,
    "includeComments": false,
    "sort": "hot",
    "keywords": [],
    "delivery": "dataset"
  }'
```
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("taroyamada/reddit-all-in-one-scraper").call(run_input={
    "sources": ["javascript", "u/spez", "search:web scraping"],
    "maxPostsPerSource": 10,
    "includeComments": False,
    "sort": "hot",
    "keywords": [],
    "delivery": "dataset",
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
```
JavaScript / Node.js
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('taroyamada/reddit-all-in-one-scraper').call({
    sources: ['javascript', 'u/spez', 'search:web scraping'],
    maxPostsPerSource: 10,
    includeComments: false,
    sort: 'hot',
    keywords: [],
    delivery: 'dataset',
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```
Validation & Cloud Setup
This actor follows shared store-ops conventions:
- npm test — local unit tests
- npm run canary:check — live canary validation against latest Apify run/task
- npm run contract:test:live — live dataset contract check
- npm run apify:cloud:setup — bootstrap/update Apify task + schedule from local config
Tips & Limitations
- This actor is best for research/backfill, not recurring diff alerting.
- For net-new recurring alerts + baseline snapshots, use reddit-keyword-monitor-alerts.
- 429s are common on aggressive pulls; increase delayMs and trim maxPostsPerSource.
- For links discovered in posts, use article-content-extractor for full-page content cleanup.
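When calling the Apify API directly, 429 responses can also be handled on the client side with exponential backoff. A minimal sketch, assuming the wrapped function signals rate limiting by raising an error whose message contains "429" (an illustrative convention, not part of the actor's API):

```python
import time


def with_backoff(fn, max_retries: int = 4, base_delay: float = 1.0):
    """Call fn(), retrying with exponential backoff on 429-style errors."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RuntimeError as err:
            if "429" not in str(err) or attempt == max_retries:
                raise  # not a rate limit, or out of retries
            # Wait 1x, 2x, 4x, ... the base delay before retrying.
            time.sleep(base_delay * (2 ** attempt))
```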
FAQ
Does this need a Reddit API key?
No. It uses public Reddit .json endpoints without authentication.
Can this replace recurring monitoring?
Not directly. This actor does not maintain monitoring snapshots across runs. Use reddit-keyword-monitor-alerts for net-new recurring alert workflows.
Can I scrape private subreddits?
No. Only public subreddits are accessible via public endpoints.
What is the best pack workflow?
Use this actor to gather research/backfill context, then move recurring alert operations to reddit-keyword-monitor-alerts.
Related Actors
Reddit Intelligence Pack workflow:
- 🚨 Reddit Keyword Monitor Alerts — Hero recurring monitor for net-new alerts + webhook handoff.
- 📰 Article Extractor — Extract linked article text from Reddit URLs.
- 💬 Reddit Scraper (Legacy) — Legacy/proxy-sensitive fallback, not primary entry point.
Cost
Pay Per Event:
- actor-start: $0.01 (flat fee per run)
- dataset-item: $0.001 per output item
Example: 1,000 items = $0.01 + (1,000 × $0.001) = $1.01
No subscription required — you only pay for what you use.
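The pay-per-event arithmetic above generalizes to a one-line formula. A sketch using the listed prices ($0.01 start fee, $0.001 per item); estimate_cost is a hypothetical helper:

```python
def estimate_cost(num_items: int, start_fee: float = 0.01,
                  per_item: float = 0.001) -> float:
    """Estimate a run's pay-per-event cost in USD."""
    return round(start_fee + num_items * per_item, 4)


print(estimate_cost(1000))  # 1.01, matching the worked example above
```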
⭐ Was this helpful?
If this actor saved you time, please leave a ★ rating on Apify Store. It takes 10 seconds, helps other developers discover it, and keeps updates free.
Bug report or feature request? Open an issue on the Issues tab of this actor.
