Reddit Posts & Subreddit Comment Scraper
Pricing
Pay per event
Scrape Reddit posts and nested comment trees from specific subreddits. Proxy-aware fallback for the legacy public surface. Sort by hot, top, new, rising with optional comment depth control.
Developer
太郎 山田
Maintained by Community
💬 Reddit Scraper (Legacy Fallback)
Dive deep into niche communities with the Subreddit & Comment Scraper, a powerful extraction utility built to navigate and capture targeted discussions. Specifically maintained for proxy-sensitive environments, this tool serves as a reliable fallback for workflows that require robust IP management and older routing methods to bypass aggressive scraping countermeasures. It excels at scraping high-volume subreddits, pulling down both parent posts and the complex, nested comment trees that contain valuable user opinions and sentiment.
Research teams, community managers, and OSINT analysts use this scraper to audit the history of specific subreddits, track viral topics, and analyze user feedback. By specifying target subreddits and a sort order (such as Top of All Time or Newest), you control which posts enter your pipeline. The tool works around the limitations of standard APIs, giving you direct access to community conversations.
Your resulting datasets will include rich, structured details: accurate timestamps, total upvotes, author handles, full comment bodies, and post URLs. This allows for seamless downstream analysis of social trends. Note that this is a legacy-focused actor prioritizing proxy-aware subreddit flows. If your primary goal involves setting up recurring keyword alerts across the entire site, or broad user-profile scraping, our newer Reddit All-in-One Scraper is recommended. However, for specialized subreddit extraction where you control the residential proxies and demand a straightforward, list-based collection method, this scraper remains a highly effective and fully maintained choice.
Store Quickstart
- Start with `store-input.example.json` or Legacy Quickstart (Proxy-aware). If running on Apify infrastructure, configure Residential proxy first.
- Then use the legacy ladder from `store-input.templates.json`:
  - Legacy Quickstart (Proxy-aware)
  - Legacy Recurring Refresh (Proxy-aware)
  - Legacy Webhook Handoff (Proxy-aware)
- Buyer-facing proof assets live in `sample-output.example.json` and `live-proof.example.json`.
- New recurring or pack-first users should still move to `reddit-all-in-one-scraper` / `reddit-keyword-monitor-alerts` once the legacy need is proven.
Legacy Scope
- Subreddit-based post scraping
- Optional comment extraction
- Basic sort/time controls
- No recurring snapshot diff monitoring
Input
| Field | Type | Default | Description |
|---|---|---|---|
| subreddits | string[] | (required) | Subreddit names (max 20) |
| sort | string | hot | hot, new, top, rising |
| maxItems | integer | 25 | Max posts per subreddit (1-500) |
| includeComments | boolean | false | Include nested comments |
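The constraints in the table above can be checked locally before a run is started. The following is a minimal sketch, not part of the actor: `validate_input` and `ALLOWED_SORTS` are hypothetical names, and rejecting an empty `subreddits` list is an assumption based on the field being required. Fields that are not listed in the table are passed through unchanged.

```python
ALLOWED_SORTS = {"hot", "new", "top", "rising"}


def validate_input(run_input: dict) -> dict:
    """Return a copy of run_input with table defaults applied, or raise ValueError."""
    subreddits = run_input.get("subreddits")
    # Assumption: "required" means a non-empty list of non-empty strings.
    if not isinstance(subreddits, list) or not subreddits:
        raise ValueError("subreddits is required and must be a non-empty list")
    if len(subreddits) > 20:
        raise ValueError("at most 20 subreddits are allowed")
    if not all(isinstance(name, str) and name for name in subreddits):
        raise ValueError("every subreddit must be a non-empty string")

    sort = run_input.get("sort", "hot")
    if sort not in ALLOWED_SORTS:
        raise ValueError(f"sort must be one of {sorted(ALLOWED_SORTS)}")

    max_items = run_input.get("maxItems", 25)
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int):
        raise ValueError("maxItems must be an integer")
    if not 1 <= max_items <= 500:
        raise ValueError("maxItems must be between 1 and 500")

    include_comments = run_input.get("includeComments", False)
    if not isinstance(include_comments, bool):
        raise ValueError("includeComments must be a boolean")

    # Unknown keys are kept as-is; only the documented fields are normalized.
    return {
        **run_input,
        "sort": sort,
        "maxItems": max_items,
        "includeComments": include_comments,
    }
```

Calling `validate_input({"subreddits": ["programming"]})` fills in the documented defaults, so the same dictionary can then be sent as the run input.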
Input Example
{"subreddits": ["programming", "technology"],"sort": "hot","maxItems": 50,"includeComments": true}
Input Examples
Example: Top of all time in a subreddit
{"subreddits": ["DataIsBeautiful"],"sort": "top","time": "all","maxPosts": 25,"includeComments": true,"commentDepth": 2}
Example: Newest posts (multi-subreddit)
{"subreddits": ["MachineLearning","datascience"],"sort": "new","maxPosts": 50,"includeComments": false}
Example: Specific post + comment tree
{"posts": ["https://old.reddit.com/r/programming/comments/abc123/"],"includeComments": true,"commentDepth": 5}
Output
| Field | Type | Description |
|---|---|---|
| id | string | Reddit post ID |
| title | string | Post title |
| author | string | Username of poster |
| subreddit | string | Subreddit name |
| url | string | Permalink to post |
| score | integer | Upvote score |
| numComments | integer | Comment count |
| createdAt | string | ISO timestamp |
| selftext | string | Post body (for text posts) |
| comments | object[] | Top comments (if includeComments enabled) |
Output Example
{"title": "New JavaScript framework released","author": "dev_user","score": 1250,"url": "https://example.com/framework","selftext": "Detailed writeup inside...","subreddit": "programming","createdUtc": 1712345678,"numComments": 342,"comments": [{"author": "...", "body": "..."}]}
API Usage
Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.
cURL

    curl -X POST "https://api.apify.com/v2/acts/taroyamada~reddit-data-scraper/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{ "subreddits": ["programming", "technology"], "sort": "hot", "maxItems": 50, "includeComments": true }'
Python

    from apify_client import ApifyClient

    client = ApifyClient("YOUR_API_TOKEN")

    # Start the actor and wait for the run to finish.
    run = client.actor("taroyamada/reddit-data-scraper").call(
        run_input={
            "subreddits": ["programming", "technology"],
            "sort": "hot",
            "maxItems": 50,
            "includeComments": True,  # Python boolean, not JSON `true`
        }
    )

    # Print every item from the run's default dataset.
    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        print(item)
JavaScript / Node.js

    import { ApifyClient } from 'apify-client';

    const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

    const run = await client.actor('taroyamada/reddit-data-scraper').call({
        "subreddits": ["programming", "technology"],
        "sort": "hot",
        "maxItems": 50,
        "includeComments": true
    });

    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    console.log(items);
Tips & Limitations
⚠️ Proxy Required on Apify Datacenter
Reddit blocks many shared datacenter IPs. Without proxy setup on Apify infra, runs can fail with runStatus: all_blocked and 0 posts.
To fix: enable Apify Residential proxy (APIFY_USE_APIFY_PROXY=true, APIFY_PROXY_GROUPS=RESIDENTIAL) or provide your own residential PROXY_URL.
Legacy Positioning
- This actor is not the recommended first choice for new pack users.
- Prefer `reddit-all-in-one-scraper` for research/backfill and `reddit-keyword-monitor-alerts` for recurring alerting.
FAQ
Is this the main Reddit Intelligence Pack actor?
No. This is the legacy fallback actor. New recurring monitor workflows should use reddit-keyword-monitor-alerts.
Does Reddit block this?
Yes, frequently on datacenter IPs. Residential proxy is typically required on Apify cloud.
What is runStatus in output?
| Value | Meaning |
|---|---|
| ok | All subreddits fetched successfully |
| partial | Some subreddits succeeded; others were blocked/errored |
| all_blocked | Every subreddit was blocked; no posts collected (exit code 1) |
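A calling script can branch on these three values. The sketch below maps each documented `runStatus` to a follow-up step. The function name `next_action` and the action labels are hypothetical, and how `runStatus` is read from the run's output is left to the caller, since the page does not say where the field appears.

```python
def next_action(run_status: str) -> str:
    """Map a documented runStatus value to a suggested follow-up step."""
    actions = {
        # All subreddits fetched: the dataset can be used as-is.
        "ok": "continue",
        # Some subreddits failed: keep the results and retry the rest.
        "partial": "retry_failed_subreddits",
        # Nothing collected: proxy settings likely need attention first.
        "all_blocked": "check_proxy_configuration",
    }
    try:
        return actions[run_status]
    except KeyError:
        raise ValueError(f"unknown runStatus: {run_status!r}") from None
```

Raising on unknown values, rather than defaulting to "continue", keeps a new or misspelled status from being mistaken for success.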
Related Actors
Reddit Intelligence Pack (recommended path):
- 🚨 Reddit Keyword Monitor Alerts — Hero recurring monitor for net-new alerts.
- 📡 Reddit All-in-One Scraper — Research/backfill companion.
- 📰 Article Extractor — Linked URL cleanup add-on.
- 🐘 Mastodon Hashtag & Account Scraper — Federated social listening (Twitter/X-free), same query/result shape on the Fediverse.
Cost
Pay Per Event:
- `actor-start`: $0.01 (flat fee per run)
- `dataset-item`: $0.003 per output item
Example: 1,000 items = $0.01 + (1,000 × $0.003) = $3.01
No subscription required — you only pay for what you use.
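The pricing arithmetic above can be reproduced for other volumes with a short helper. This is a sketch using the two event prices listed in this section; `estimate_cost` and the constant names are hypothetical, and `Decimal` is used to avoid floating-point rounding in currency amounts.

```python
from decimal import Decimal

ACTOR_START_FEE = Decimal("0.01")    # flat fee per run
DATASET_ITEM_FEE = Decimal("0.003")  # per output item


def estimate_cost(items: int, runs: int = 1) -> Decimal:
    """Estimate the pay-per-event cost in USD for `items` output items over `runs` runs."""
    if items < 0 or runs < 1:
        raise ValueError("items must be >= 0 and runs must be >= 1")
    return ACTOR_START_FEE * runs + DATASET_ITEM_FEE * items
```

For example, `estimate_cost(1000)` matches the $3.01 figure above, and splitting the same 1,000 items across ten runs adds nine more start fees.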
⭐ Was this helpful?
If this actor saved you time, please leave a ★ rating on Apify Store. It takes 10 seconds, helps other developers discover it, and keeps updates free.
Bug report or feature request? Open an issue on the Issues tab of this actor.