💬 Subreddit & Comment Scraper
Pricing
Pay per event
Scrape specific subreddits for top posts, historical discussions, and nested comments. Built with proxy-aware routing to bypass aggressive platform blocking.
Developer
太郎 山田
5 days ago
Last modified
💬 Reddit Scraper (Legacy Fallback)
This actor is maintained as a legacy, proxy-sensitive fallback in the Reddit Intelligence Pack.
For new workflows:
- Use reddit-keyword-monitor-alerts for recurring monitoring + net-new alerts
- Use reddit-all-in-one-scraper for research/backfill collection
Use this actor only when you specifically need the older subreddit-focused flow.
Store Quickstart
Start with the legacy quickstart template (single subreddit). If running on Apify infrastructure, configure Residential proxy first.
Legacy Scope
- Subreddit-based post scraping
- Optional comment extraction
- Basic sort/time controls
- No recurring snapshot diff monitoring
Input
| Field | Type | Default | Description |
|---|---|---|---|
| subreddits | string[] | (required) | Subreddit names (max 20) |
| sort | string | hot | hot, new, top, rising |
| maxItems | integer | 25 | Max posts per subreddit (1-500) |
| includeComments | boolean | false | Include nested comments |
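The constraints in the table above can be checked on the client side before a run is started. The following is a minimal sketch, assuming the limits are exactly as documented; `validate_input` and `ALLOWED_SORTS` are hypothetical names and not part of the actor or the Apify client.

```python
# Hypothetical client-side check of the documented input constraints.
ALLOWED_SORTS = {"hot", "new", "top", "rising"}


def validate_input(run_input: dict) -> dict:
    """Return a copy of run_input with defaults applied, or raise ValueError."""
    subreddits = run_input.get("subreddits")
    if not isinstance(subreddits, list) or not subreddits:
        raise ValueError("subreddits is required and must be a non-empty list")
    if len(subreddits) > 20:
        raise ValueError("at most 20 subreddits are allowed")
    if not all(isinstance(name, str) and name.strip() for name in subreddits):
        raise ValueError("every subreddit must be a non-empty string")

    sort = run_input.get("sort", "hot")
    if sort not in ALLOWED_SORTS:
        raise ValueError(f"sort must be one of {sorted(ALLOWED_SORTS)}")

    max_items = run_input.get("maxItems", 25)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int):
        raise ValueError("maxItems must be an integer")
    if not 1 <= max_items <= 500:
        raise ValueError("maxItems must be between 1 and 500")

    include_comments = run_input.get("includeComments", False)
    if not isinstance(include_comments, bool):
        raise ValueError("includeComments must be a boolean")

    return {
        "subreddits": list(subreddits),
        "sort": sort,
        "maxItems": max_items,
        "includeComments": include_comments,
    }
```

Calling `validate_input({"subreddits": ["programming"]})` fills in the documented defaults (`hot`, `25`, `false`), so the returned dictionary can be passed on as the run input.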
Input Example
{
  "subreddits": ["programming", "technology"],
  "sort": "hot",
  "maxItems": 50,
  "includeComments": true
}
Output
| Field | Type | Description |
|---|---|---|
| id | string | Reddit post ID |
| title | string | Post title |
| author | string | Username of poster |
| subreddit | string | Subreddit name |
| url | string | Permalink to post |
| score | integer | Upvote score |
| numComments | integer | Comment count |
| createdAt | string | ISO timestamp |
| selftext | string | Post body (for text posts) |
| comments | object[] | Top comments (if includeComments enabled) |
Output Example
{
  "title": "New JavaScript framework released",
  "author": "dev_user",
  "score": 1250,
  "url": "https://example.com/framework",
  "selftext": "Detailed writeup inside...",
  "subreddit": "programming",
  "createdUtc": 1712345678,
  "numComments": 342,
  "comments": [{"author": "...", "body": "..."}]
}
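The output table lists `createdAt` as an ISO timestamp, while the example item carries `createdUtc` as epoch seconds. Which of the two a given run emits is not confirmed here, so downstream code may want to accept either. The following is a minimal sketch under that assumption; `created_at` is a hypothetical helper, and treating timestamps without an offset as UTC is also an assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def created_at(item: dict) -> Optional[datetime]:
    """Return the post creation time as an aware UTC datetime, or None."""
    iso = item.get("createdAt")
    if isinstance(iso, str) and iso:
        # Older Python versions do not parse a trailing "Z", so map it to +00:00.
        parsed = datetime.fromisoformat(iso.replace("Z", "+00:00"))
        if parsed.tzinfo is None:
            # Assumption: timestamps without an offset are already in UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)

    epoch = item.get("createdUtc")
    if isinstance(epoch, (int, float)) and not isinstance(epoch, bool):
        return datetime.fromtimestamp(epoch, tz=timezone.utc)

    return None
```

For the example item above, `created_at({"createdUtc": 1712345678})` yields 2024-04-05 19:34:38 UTC.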
API Usage
Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.
cURL
curl -X POST "https://api.apify.com/v2/acts/taroyamada~reddit-data-scraper/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "subreddits": ["programming", "technology"], "sort": "hot", "maxItems": 50, "includeComments": true }'
Python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("taroyamada/reddit-data-scraper").call(run_input={
    "subreddits": ["programming", "technology"],
    "sort": "hot",
    "maxItems": 50,
    "includeComments": True,
})

# Print every item from the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
JavaScript / Node.js
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('taroyamada/reddit-data-scraper').call({
    "subreddits": ["programming", "technology"],
    "sort": "hot",
    "maxItems": 50,
    "includeComments": true
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
Tips & Limitations
⚠️ Proxy Required on Apify Datacenter
Reddit blocks many shared datacenter IPs. Without proxy setup on Apify infra, runs can fail with runStatus: all_blocked and 0 posts.
To fix: enable Apify Residential proxy (APIFY_USE_APIFY_PROXY=true, APIFY_PROXY_GROUPS=RESIDENTIAL) or provide your own residential PROXY_URL.
Legacy Positioning
- This actor is not the recommended first choice for new pack users.
- Prefer reddit-all-in-one-scraper for research/backfill and reddit-keyword-monitor-alerts for recurring alerting.
FAQ
Is this the main Reddit Intelligence Pack actor?
No. This is the legacy fallback actor. New recurring monitor workflows should use reddit-keyword-monitor-alerts.
Does Reddit block this?
Yes, frequently on datacenter IPs. Residential proxy is typically required on Apify cloud.
What is runStatus in output?
| Value | Meaning |
|---|---|
| ok | All subreddits fetched successfully |
| partial | Some subreddits succeeded; others were blocked/errored |
| all_blocked | Every subreddit was blocked; no posts collected (exit code 1) |
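A caller can branch on these three values once a run finishes. The following is a minimal sketch that assumes the `runStatus` string has already been read from the run's output; where exactly it appears is not specified here. `next_action` and the action labels it returns are hypothetical.

```python
def next_action(run_status: str) -> str:
    """Map a documented runStatus value to a suggested follow-up (hypothetical labels)."""
    if run_status == "ok":
        # All subreddits were fetched; the dataset can be used as-is.
        return "use_results"
    if run_status == "partial":
        # Some subreddits were blocked or errored; keep what arrived and retry the rest.
        return "use_results_and_retry_missing"
    if run_status == "all_blocked":
        # Nothing was collected; the proxy setup most likely needs attention before retrying.
        return "configure_residential_proxy_and_retry"
    raise ValueError(f"unknown runStatus: {run_status!r}")
```

Raising on unknown values keeps a new or misspelled status from being silently treated as success.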
Related Actors
Reddit Intelligence Pack (recommended path):
- 🚨 Reddit Keyword Monitor Alerts — Hero recurring monitor for net-new alerts.
- 📡 Reddit All-in-One Scraper — Research/backfill companion.
- 📰 Article Extractor — Linked URL cleanup add-on.
Cost
Pay Per Event:
- actor-start: $0.01 (flat fee per run)
- dataset-item: $0.003 per output item
Example: 1,000 items = $0.01 + (1,000 × $0.003) = $3.01
No subscription required — you only pay for what you use.
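The pricing arithmetic above can be reproduced for other volumes. The following is a minimal sketch, assuming the two event prices are exactly as listed; `estimate_cost` and the constant names are hypothetical, and `Decimal` is used to avoid floating-point rounding in currency sums.

```python
from decimal import Decimal

# Prices as listed in the Cost section (USD).
ACTOR_START_FEE = Decimal("0.01")
DATASET_ITEM_FEE = Decimal("0.003")


def estimate_cost(items: int, runs: int = 1) -> Decimal:
    """Estimate the total charge in USD for `items` output items across `runs` runs."""
    if items < 0:
        raise ValueError("items must be non-negative")
    if runs < 1:
        raise ValueError("runs must be at least 1")
    return ACTOR_START_FEE * runs + DATASET_ITEM_FEE * items
```

With one run and 1,000 items, `estimate_cost(1000)` equals $3.01, matching the worked example. Splitting the same 1,000 items across four runs adds three more start fees, for $3.04.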
⭐ Was this helpful?
If this actor saved you time, please leave a ★ rating on Apify Store. It takes 10 seconds, helps other developers discover it, and keeps updates free.
Bug report or feature request? Open an issue on the Issues tab of this actor.