Reddit Scraper — Posts, Comments & Subreddit Data
Extract Reddit posts, comments, and subreddit data via public API. Scrape titles, scores, authors, comment threads, dates, and flairs. Sort by hot, new, top, or rising. Perfect for market research, sentiment analysis, and content monitoring. No login required.

Pricing: Pay per usage
Developer: Ricardo Akiyoshi (Maintained by Community)
Actor stats: 20 total users, 12 monthly active users, last modified 6 days ago

Scrape Reddit posts, comments, and subreddit data at scale using Reddit's public JSON API. Extract post titles, scores, upvote ratios, comment counts, authors, full text, flairs, and nested comment threads. No API keys, authentication, or Reddit developer account required.

Important: Proxy Recommended — Reddit rate-limits and blocks datacenter IPs aggressively. For reliable large-scale scraping, configure proxyConfiguration with residential or premium proxies ("apifyProxyGroups": ["RESIDENTIAL"]). Without proxies, requests may be throttled or return 429 errors.

What It Does

  • Subreddit scraping — Enter any subreddit name and get structured data for posts sorted by hot, new, top, or rising
  • Search within subreddits — Combine subreddit + keyword search to find specific discussions and topics
  • Comment extraction — Optionally scrape full nested comment threads with author, score, depth, and timestamps
  • Time filtering — Filter top/controversial posts by hour, day, week, month, year, or all time
  • No authentication — Uses Reddit's public JSON endpoints, so no API keys or developer accounts needed
  • Built-in rate limiting — Automatic request throttling and exponential backoff to stay within Reddit's rate limits
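The public JSON endpoints and the backoff behaviour described above can be sketched as follows. The `/r/{subreddit}/{sort}.json` URL pattern is Reddit's public listing interface; the retry count, delays, and `User-Agent` string are illustrative assumptions, not the actor's actual internals:

```python
import json
import time
import urllib.error
import urllib.request

def backoff_delays(max_retries=4, base=2.0):
    """Exponential backoff schedule (seconds) applied after HTTP 429 responses."""
    return [base ** attempt for attempt in range(max_retries)]

def fetch_posts(subreddit, sort="hot", limit=10):
    """Fetch posts from Reddit's public JSON endpoint, retrying on rate limits."""
    url = f"https://www.reddit.com/r/{subreddit}/{sort}.json?limit={limit}"
    # Reddit rejects clients without a descriptive User-Agent.
    req = urllib.request.Request(url, headers={"User-Agent": "example-scraper/0.1"})
    for delay in backoff_delays():
        try:
            with urllib.request.urlopen(req, timeout=30) as resp:
                payload = json.load(resp)
            # Each listing child wraps the post fields in its "data" key.
            return [child["data"] for child in payload["data"]["children"]]
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise
            time.sleep(delay)  # back off: 1s, 2s, 4s, 8s
    raise RuntimeError("still rate-limited after retries")
```

The actor handles all of this for you; the sketch only shows why no API key or OAuth flow is involved.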

Input Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| subreddit | string | "programming" | Subreddit name to scrape (without r/) |
| searchQuery | string | (none) | Optional keyword to search within the subreddit |
| sort | string | "hot" | Sort posts by: hot, new, top, or rising |
| timeFilter | string | "week" | Time filter for top/controversial: hour, day, week, month, year, or all |
| maxPosts | integer | 10 | Maximum number of posts to extract (max 1,000) |
| includeComments | boolean | false | Also scrape comments for each post |
| maxCommentsPerPost | integer | 10 | Maximum comments to extract per post (max 500) |
| proxyConfiguration | object | {"useApifyProxy": true} | Proxy settings for avoiding blocks |
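Putting the parameters together, a complete run input might look like this (all values are illustrative, and the RESIDENTIAL proxy group follows the recommendation above):

```json
{
  "subreddit": "programming",
  "searchQuery": "rust",
  "sort": "top",
  "timeFilter": "month",
  "maxPosts": 100,
  "includeComments": true,
  "maxCommentsPerPost": 25,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
```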

Output Example

Each Reddit post includes the following fields:

{
  "title": "I mass-applied to 200 jobs using AI. Here's what happened.",
  "score": 8472,
  "upvoteRatio": 0.94,
  "numComments": 1523,
  "author": "tech_job_seeker",
  "subreddit": "programming",
  "url": "https://www.reddit.com/r/programming/comments/1abcdef/i_mass_applied_to_200_jobs_using_ai_heres_what/",
  "selfText": "After three months of unemployment, I decided to try something different. I used AI tools to customize my resume and cover letter for each application...",
  "permalink": "/r/programming/comments/1abcdef/i_mass_applied_to_200_jobs_using_ai_heres_what/",
  "fullUrl": "https://www.reddit.com/r/programming/comments/1abcdef/i_mass_applied_to_200_jobs_using_ai_heres_what/",
  "createdAt": "2026-03-01T14:30:00.000Z",
  "createdUtc": 1772375400,
  "flair": "Discussion",
  "isNSFW": false,
  "isSelf": true,
  "domain": "self.programming",
  "postId": "1abcdef",
  "scrapedAt": "2026-03-03T15:10:22.456Z"
}
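Note that createdUtc is the post's creation time as Unix epoch seconds, while createdAt renders the same instant as ISO 8601. A conversion sketch (the helper name is ours, not part of the output):

```python
from datetime import datetime, timezone

def epoch_to_iso(created_utc):
    """Convert a createdUtc value (seconds since epoch) to an ISO 8601 string."""
    dt = datetime.fromtimestamp(created_utc, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.000Z")

print(epoch_to_iso(1772375400))  # 2026-03-01T14:30:00.000Z
```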

When includeComments is enabled, each post also includes a comments array:

{
  "title": "I mass-applied to 200 jobs using AI...",
  "score": 8472,
  "comments": [
    {
      "author": "hiring_manager_42",
      "body": "As someone who reviews applications, we can tell when cover letters are AI-generated. The ones that stand out actually reference specific things about the company.",
      "score": 2341,
      "createdAt": "2026-03-01T15:12:00.000Z",
      "createdUtc": 1772377920,
      "depth": 0,
      "commentId": "kl2mn3o"
    },
    {
      "author": "tech_job_seeker",
      "body": "That's fair. I actually had better response rates when I spent 5 minutes personalizing each one vs. pure AI generation. The AI just helped with the base template.",
      "score": 1856,
      "createdAt": "2026-03-01T15:45:00.000Z",
      "createdUtc": 1772379900,
      "depth": 1,
      "commentId": "kl4pq5r"
    }
  ]
}
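The comments array is a flat list in traversal order, with depth indicating nesting (a depth-1 comment replies to the nearest preceding shallower comment). Under that assumption, reply trees can be rebuilt like this (a sketch; the replies key is ours, not part of the output):

```python
def build_thread(comments):
    """Nest a flat, depth-annotated comment list into reply trees."""
    roots, stack = [], []  # stack[d] holds the last comment seen at depth d
    for c in comments:
        node = dict(c, replies=[])
        del stack[c["depth"]:]  # drop entries at this depth or deeper
        if c["depth"] == 0:
            roots.append(node)
        else:
            stack[-1]["replies"].append(node)  # reply to nearest shallower comment
        stack.append(node)
    return roots
```

With the two comments above, this yields one root comment whose replies list contains the depth-1 reply.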

Use Cases

  • Market research — Monitor subreddits for product feedback, brand mentions, and consumer sentiment
  • Content research — Find trending topics and popular discussions in any niche for content ideas
  • Competitor intelligence — Track what people say about competitor products in relevant subreddits
  • Academic research — Collect large datasets of Reddit discussions for NLP, sentiment analysis, or social science studies
  • Product development — Discover pain points and feature requests by scraping product-related subreddits
  • SEO and keyword research — Find common questions and terminology people use when discussing topics in your field

API Usage

JavaScript

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('sovereigntaylor/reddit-scraper').call({
    subreddit: 'machinelearning',
    sort: 'top',
    timeFilter: 'month',
    maxPosts: 100,
    includeComments: true,
    maxCommentsPerPost: 25,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Python

from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

run = client.actor('sovereigntaylor/reddit-scraper').call(run_input={
    'subreddit': 'machinelearning',
    'sort': 'top',
    'timeFilter': 'month',
    'maxPosts': 100,
    'includeComments': True,
    'maxCommentsPerPost': 25,
})

items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)

cURL

curl "https://api.apify.com/v2/acts/sovereigntaylor~reddit-scraper/runs" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -d '{
    "subreddit": "machinelearning",
    "sort": "top",
    "timeFilter": "month",
    "maxPosts": 100,
    "includeComments": true,
    "maxCommentsPerPost": 25
  }'

Pricing

This actor uses pay-per-event pricing — you only pay for data successfully scraped.

| Event | Price |
| --- | --- |
| Post scraped | $0.002 |
| Comment scraped | $0.001 |

Example: Scraping 100 posts + 25 comments each = (100 x $0.002) + (2,500 x $0.001) = $2.70
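The arithmetic above generalizes to any run size (prices taken from the table; a sketch, not the platform's billing logic):

```python
POST_PRICE = 0.002     # $ per post scraped, from the pricing table
COMMENT_PRICE = 0.001  # $ per comment scraped, from the pricing table

def estimate_cost(posts, comments_per_post=0):
    """Estimate run cost in USD for a given post and comment volume."""
    return posts * POST_PRICE + posts * comments_per_post * COMMENT_PRICE

# 100 posts with 25 comments each, as in the example above
print(round(estimate_cost(100, 25), 2))  # 2.7
```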

Limitations

  • Reddit rate-limits at approximately 60 requests per minute — the actor automatically throttles to stay under this limit
  • Private and quarantined subreddits cannot be scraped
  • Very deeply nested comment threads (depth > 10) are truncated — Reddit's API limits comment depth
  • "More comments" links are not followed — only the top comments loaded in the initial response are extracted
  • Post selfText for link posts may be empty (link posts don't have body text)
  • Reddit may return different results based on geographic location
  • Maximum 1,000 posts per run
  • Deleted and removed posts/comments show [deleted] or [removed] as author/body