Reddit Posts & Subreddit Comment Scraper

Pricing

Pay per event

Scrape Reddit posts and nested comment trees from specific subreddits. Proxy-aware fallback for the legacy public surface. Sort by hot, top, new, rising with optional comment depth control.

Rating

0.0 (0)

Developer

太郎 山田

Maintained by Community

Actor stats

  • Bookmarked: 1
  • Total users: 8
  • Monthly active users: 4
  • Last modified: 10 days ago

💬 Reddit Scraper (Legacy Fallback)

The Reddit Posts & Subreddit Comment Scraper extracts targeted discussions from specific subreddits. It is maintained for proxy-sensitive environments and serves as a fallback for workflows that rely on careful IP management and older routing methods to get past aggressive scraping countermeasures. It handles high-volume subreddits, collecting both parent posts and the nested comment trees that carry user opinions and sentiment.

Research teams, community managers, and OSINT analysts use this scraper to run historical audits of specific subreddits, track viral topics, and analyze user feedback. By specifying target subreddits and a sort order (for example, top of all time or newest), you control what data enters your pipeline. The tool is not constrained by the limits of the standard APIs, so community conversations reach you unfiltered.

Your resulting datasets will include rich, structured details: accurate timestamps, total upvotes, author handles, full comment bodies, and post URLs. This allows for seamless downstream analysis of social trends. Note that this is a legacy-focused actor prioritizing proxy-aware subreddit flows. If your primary goal involves setting up recurring keyword alerts across the entire site, or broad user-profile scraping, our newer Reddit All-in-One Scraper is recommended. However, for specialized subreddit extraction where you control the residential proxies and demand a straightforward, list-based collection method, this scraper remains a highly effective and fully maintained choice.

Store Quickstart

  • Start with store-input.example.json or Legacy Quickstart (Proxy-aware). If running on Apify infrastructure, configure Residential proxy first.
  • Then use the legacy ladder from store-input.templates.json:
    1. Legacy Quickstart (Proxy-aware)
    2. Legacy Recurring Refresh (Proxy-aware)
    3. Legacy Webhook Handoff (Proxy-aware)
  • Buyer-facing proof assets live in sample-output.example.json and live-proof.example.json.
  • New recurring or pack-first users should still move to reddit-all-in-one-scraper / reddit-keyword-monitor-alerts once the legacy need is proven.

Legacy Scope

  • Subreddit-based post scraping
  • Optional comment extraction
  • Basic sort/time controls
  • No recurring snapshot diff monitoring

Input

Field            Type      Default     Description
subreddits       string[]  (required)  Subreddit names (max 20)
sort             string    hot         hot, new, top, rising
maxItems         integer   25          Max posts per subreddit (1-500)
includeComments  boolean   false       Include nested comments

Input Example

{
  "subreddits": ["programming", "technology"],
  "sort": "hot",
  "maxItems": 50,
  "includeComments": true
}
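
The limits listed in the Input table can be checked locally before a run is started. The following is a minimal sketch, not the actor's own validation: validate_input and ALLOWED_SORTS are hypothetical names, and only the four fields documented in the table are covered.

```python
from typing import List

# Sort values listed in the Input table.
ALLOWED_SORTS = {"hot", "new", "top", "rising"}


def validate_input(run_input: dict) -> List[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []

    subreddits = run_input.get("subreddits")
    if not isinstance(subreddits, list) or not subreddits:
        problems.append("subreddits is required and must be a non-empty list")
    elif len(subreddits) > 20:
        problems.append("subreddits accepts at most 20 names")
    elif not all(isinstance(name, str) and name for name in subreddits):
        problems.append("every subreddit name must be a non-empty string")

    # Defaults below mirror the Default column of the table.
    if run_input.get("sort", "hot") not in ALLOWED_SORTS:
        problems.append("sort must be one of hot, new, top, rising")

    max_items = run_input.get("maxItems", 25)
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int) \
            or not 1 <= max_items <= 500:
        problems.append("maxItems must be an integer from 1 to 500")

    if not isinstance(run_input.get("includeComments", False), bool):
        problems.append("includeComments must be a boolean")

    return problems


print(validate_input({"subreddits": ["programming"], "maxItems": 50}))
```

Returning a list of messages instead of raising on the first failure lets a caller report every problem at once.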

Input Examples

Example: Top of all time in a subreddit

{
  "subreddits": [
    "DataIsBeautiful"
  ],
  "sort": "top",
  "time": "all",
  "maxPosts": 25,
  "includeComments": true,
  "commentDepth": 2
}

Example: Newest posts (multi-subreddit)

{
  "subreddits": [
    "MachineLearning",
    "datascience"
  ],
  "sort": "new",
  "maxPosts": 50,
  "includeComments": false
}

Example: Specific post + comment tree

{
  "posts": [
    "https://old.reddit.com/r/programming/comments/abc123/"
  ],
  "includeComments": true,
  "commentDepth": 5
}
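
When comments are requested with a depth greater than 1, the result is a tree, not a flat list. The sketch below shows one way to walk such a tree depth-first. It assumes each comment object holds its children in a list under a "replies" key; that key name is hypothetical and is not documented on this page, so adjust it to match the actual output. walk_comments and the sample data are illustrative only.

```python
from typing import Iterator, List, Optional, Tuple


def walk_comments(
    comments: List[dict],
    max_depth: Optional[int] = None,
    depth: int = 1,
) -> Iterator[Tuple[int, dict]]:
    """Yield (depth, comment) pairs depth-first, top-level comments at depth 1."""
    if max_depth is not None and depth > max_depth:
        return
    for comment in comments:
        yield depth, comment
        # ASSUMPTION: child comments live under "replies" (hypothetical key).
        yield from walk_comments(comment.get("replies") or [], max_depth, depth + 1)


# Made-up sample data in the assumed shape.
sample = [
    {"author": "a", "body": "top-level", "replies": [
        {"author": "b", "body": "child", "replies": [
            {"author": "c", "body": "grandchild"},
        ]},
    ]},
    {"author": "d", "body": "second top-level"},
]

for level, comment in walk_comments(sample, max_depth=2):
    print("  " * (level - 1) + f'{comment["author"]}: {comment["body"]}')
```

A generator keeps memory use flat on large threads, and max_depth lets the consumer trim the tree further than the depth requested at scrape time.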

Output

Field        Type      Description
id           string    Reddit post ID
title        string    Post title
author       string    Username of poster
subreddit    string    Subreddit name
url          string    Permalink to post
score        integer   Upvote score
numComments  integer   Comment count
createdAt    string    ISO timestamp
selftext     string    Post body (for text posts)
comments     object[]  Top comments (if includeComments enabled)

Output Example

{
  "title": "New JavaScript framework released",
  "author": "dev_user",
  "score": 1250,
  "url": "https://example.com/framework",
  "selftext": "Detailed writeup inside...",
  "subreddit": "programming",
  "createdUtc": 1712345678,
  "numComments": 342,
  "comments": [{"author": "...", "body": "..."}]
}
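
The Output table lists createdAt as an ISO timestamp, while the example item carries createdUtc as epoch seconds. Downstream code may want to tolerate both. The sketch below is one way to do that; created_datetime is a hypothetical helper, and treating a timestamp without an offset as UTC is an assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def created_datetime(item: dict) -> Optional[datetime]:
    """Return the post's creation time as a timezone-aware UTC datetime, if present."""
    iso_text = item.get("createdAt")
    if iso_text:
        # datetime.fromisoformat rejects a trailing "Z" before Python 3.11.
        parsed = datetime.fromisoformat(iso_text.replace("Z", "+00:00"))
        if parsed.tzinfo is None:
            # ASSUMPTION: timestamps without an offset are in UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)

    epoch_seconds = item.get("createdUtc")
    if epoch_seconds is not None:
        return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

    return None


print(created_datetime({"createdUtc": 1712345678}).isoformat())
# 2024-04-05T19:34:38+00:00
```

Normalizing to one aware datetime early keeps sorting and date filtering consistent regardless of which field an item carries.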

API Usage

Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.

cURL

curl -X POST "https://api.apify.com/v2/acts/taroyamada~reddit-data-scraper/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "subreddits": ["programming", "technology"], "sort": "hot", "maxItems": 50, "includeComments": true }'

Python

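from apify_client import ApifyClient

# Authenticate with your Apify API token
client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish
run = client.actor("taroyamada/reddit-data-scraper").call(run_input={
    "subreddits": ["programming", "technology"],
    "sort": "hot",
    "maxItems": 50,
    "includeComments": True,  # Python boolean, not JSON true
})

# Iterate over the items in the run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)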

JavaScript / Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the actor and wait for the run to finish
const run = await client.actor('taroyamada/reddit-data-scraper').call({
    "subreddits": ["programming", "technology"],
    "sort": "hot",
    "maxItems": 50,
    "includeComments": true
});

// Fetch the items from the run's default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Tips & Limitations

⚠️ Proxy Required on Apify Datacenter

Reddit blocks many shared datacenter IPs. Without proxy setup on Apify infra, runs can fail with runStatus: all_blocked and 0 posts.

To fix: enable Apify Residential proxy (APIFY_USE_APIFY_PROXY=true, APIFY_PROXY_GROUPS=RESIDENTIAL) or provide your own residential PROXY_URL.
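
A small preflight check can catch a missing proxy setup before a run is wasted. The sketch below only inspects the three variables named above. It is not the actor's own logic: proxy_mode is a hypothetical helper, and both the precedence of PROXY_URL over the Apify settings and the comma-separated form of APIFY_PROXY_GROUPS are assumptions.

```python
import os
from typing import Mapping


def proxy_mode(env: Mapping[str, str] = os.environ) -> str:
    """Classify the proxy setup as 'custom', 'apify-residential', or 'none'."""
    # ASSUMPTION: an explicit PROXY_URL takes precedence over Apify Proxy settings.
    if env.get("PROXY_URL"):
        return "custom"

    use_apify = env.get("APIFY_USE_APIFY_PROXY", "").strip().lower() == "true"
    # ASSUMPTION: multiple proxy groups, if any, are comma-separated.
    groups = [g.strip().upper() for g in env.get("APIFY_PROXY_GROUPS", "").split(",")]
    if use_apify and "RESIDENTIAL" in groups:
        return "apify-residential"

    return "none"


if proxy_mode() == "none":
    print("No residential proxy configured; runs on datacenter IPs may be blocked.")
```

Passing the environment as a mapping keeps the check testable without touching the real process environment.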

Legacy Positioning

  • This actor is not the recommended first choice for new pack users.
  • Prefer reddit-all-in-one-scraper for research/backfill and reddit-keyword-monitor-alerts for recurring alerting.

FAQ

Is this the main Reddit Intelligence Pack actor?

No. This is the legacy fallback actor. New recurring monitor workflows should use reddit-keyword-monitor-alerts.

Does Reddit block this?

Yes, frequently on datacenter IPs. Residential proxy is typically required on Apify cloud.

What is runStatus in output?

Value        Meaning
ok           All subreddits fetched successfully
partial      Some subreddits succeeded; others were blocked/errored
all_blocked  Every subreddit was blocked; no posts collected (exit code 1)
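
A consumer can branch on these three values. The sketch below is one possible policy, with hypothetical names (check_run_status, AllBlockedError). Where runStatus appears in the output is not specified here, so the function takes the value as a plain string.

```python
class AllBlockedError(RuntimeError):
    """Raised when every subreddit was blocked and no posts were collected."""


def check_run_status(status: str) -> bool:
    """Return True for a complete run, False for a partial one.

    Raises AllBlockedError for all_blocked and ValueError for anything else.
    """
    if status == "ok":
        return True
    if status == "partial":
        # Some subreddits failed; the caller may want to retry only those.
        return False
    if status == "all_blocked":
        raise AllBlockedError("all subreddits were blocked; check the proxy setup")
    raise ValueError(f"unknown runStatus: {status!r}")


for value in ("ok", "partial"):
    print(value, "complete" if check_run_status(value) else "incomplete")
```

Raising on all_blocked mirrors the non-zero exit code in the table, so a blocked run cannot be mistaken for an empty but successful one.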

Reddit Intelligence Pack (recommended path):

Cost

Pay Per Event:

  • actor-start: $0.01 (flat fee per run)
  • dataset-item: $0.003 per output item

Example: 1,000 items = $0.01 + (1,000 × $0.003) = $3.01
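
The same arithmetic as a small estimator, using the two prices listed above. This is a sketch only: estimate_cost and the constant names are hypothetical, and Decimal is used so that cent-level sums stay exact.

```python
from decimal import Decimal

ACTOR_START_FEE = Decimal("0.01")   # flat fee per run
PRICE_PER_ITEM = Decimal("0.003")   # per output item


def estimate_cost(items: int) -> Decimal:
    """Estimate the cost in USD of one run that produces `items` output items."""
    if items < 0:
        raise ValueError("items must not be negative")
    return ACTOR_START_FEE + PRICE_PER_ITEM * items


print(estimate_cost(1000))
```

For 1,000 items the result equals the $3.01 worked out above.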

No subscription required — you only pay for what you use.

⭐ Was this helpful?

If this actor saved you time, please leave a ★ rating on Apify Store. It takes 10 seconds, helps other developers discover it, and keeps updates free.

Bug report or feature request? Open an issue on the Issues tab of this actor.