
Reddit Scraper Plus
Pricing
$30.00/month + usage
Reddit Advanced Scraper (Apify Actor)
Pull posts, comments, users, and subreddit metadata from Reddit at scale — fast, resilient, and configurable.
This actor uses Reddit’s public JSON endpoints, smart session handling, and targeted rate limits to minimize blocks while giving you clean, normalized output ready for analytics or downstream processing.
Why this actor
- Auto-detects mode from any Reddit URL (subreddit, post, user, or search)
- Deep comment expansion with max depth and "more" replies handling
- Global item limit to control dataset size and cost
- Time range and sorting for listings and search
- Optional user and subreddit metadata enrichment (about, moderators, rules)
- Pluggable extendOutputFunction to enrich items without changing the code
- Built-in proxy support (defaults to Apify Proxy)
- Clean, consistent output across item types (CSV/JSON)
- Verbose logging with an optional debug mode
What you can extract
- Posts: titles, scores, ratios, permalinks, flairs, media, crosspost info, etc.
- Comments: full tree traversal, depth, scores, replies (including expanded "more" nodes)
- Users (optional): public profile metadata
- Subreddit metadata (optional): about, moderators, rules
Inputs
- startUrls: array of Reddit URLs. Mode is auto-detected unless overridden by `mode`.
- mode: one of `auto`, `subreddit`, `post`, `user`, `search`.
- searchQueries: array of strings for Reddit-wide search (used if `mode` is `search`, or `auto` and `startUrls` is empty).
- timeRange: one of `hour`, `day`, `week`, `month`, `year`, `all`.
- sortBy: for listings and search (e.g., `hot`, `new`, `top`, `rising`; for search: `relevance`, `new`, `top`, `comments`).
- maxItems: global cap on how many items to output (posts, comments, users, rules, etc. combined).
- includePosts: boolean, default true.
- includeComments: boolean, default true.
- maxCommentsPerPost: cap comments per post, default 50.
- expandCommentReplies: boolean, default true (expand "more" nodes where possible).
- maxCommentDepth: maximum depth for comments (default 5).
- includeUsers: boolean, default false (queues user profiles and outputs public user metadata).
- includeSubredditMeta: boolean, default true (about, moderators, rules when a subreddit appears).
- proxyConfiguration: standard Apify proxy configuration; defaults to `{ useApifyProxy: true }`.
- extendOutputFunction: stringified async function to enrich each item.
- debugLog: boolean; set true for verbose logs.
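The mode auto-detection described above can be pictured as a simple URL classifier. The patterns below are an illustrative sketch, not the actor's actual internals:

```javascript
// Illustrative sketch of mode auto-detection from a Reddit URL.
// The actor's real rules may differ; these patterns are assumptions.
function detectMode(url) {
  const { pathname, searchParams } = new URL(url);
  if (pathname.includes('/comments/')) return 'post'; // a single post thread
  if (pathname.startsWith('/user/') || pathname.startsWith('/u/')) return 'user';
  if (pathname.startsWith('/search') || searchParams.has('q')) return 'search';
  if (pathname.startsWith('/r/')) return 'subreddit'; // subreddit listing
  return 'auto'; // let the actor decide from other inputs
}

console.log(detectMode('https://www.reddit.com/r/apify/'));                   // subreddit
console.log(detectMode('https://www.reddit.com/r/apify/comments/abc123/x/')); // post
console.log(detectMode('https://www.reddit.com/user/spez'));                  // user
```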
Example inputs
Minimal (subreddit listing):
```json
{
  "startUrls": [{ "url": "https://www.reddit.com/r/apify/" }],
  "timeRange": "week",
  "sortBy": "top",
  "maxItems": 100,
  "maxCommentsPerPost": 25,
  "includeComments": true,
  "includePosts": true,
  "includeSubredditMeta": true,
  "proxyConfiguration": { "useApifyProxy": true }
}
```
Full options (subreddit with residential proxy):
```json
{
  "debugLog": false,
  "expandCommentReplies": true,
  "includeComments": true,
  "includePosts": true,
  "includeSubredditMeta": true,
  "includeUsers": false,
  "maxCommentsPerPost": 15,
  "maxItems": 50,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"],
    "apifyProxyCountry": "GB"
  },
  "startUrls": [{ "url": "https://www.reddit.com/r/worldnews/", "method": "GET" }]
}
```
Output
Items are pushed to the default dataset. Types you can expect:
- type: `post` — fields include: id, name, title, author, subreddit, url, permalink, created_utc, upvote_ratio, score, num_comments, over_18, flair, awards, media, preview, crosspost_parent, etc.
- type: `comment` — fields include: id, name, parent_id, link_id, permalink, author, body, score, created_utc, depth, stickied, gilded, etc.
- type: `user` — fields include: id, name, created_utc, link_karma, comment_karma, total_karma, is_gold, is_mod, verified, icon_img, subreddit (public profile sub).
- type: `subreddit` — fields include: id, name, title, subscribers, active_user_count, public_description, description, over18, url, created_utc, lang, quarantine.
- type: `subreddit_moderator` — subreddit + moderator info.
- type: `subreddit_rule` or `subreddit_rules_raw` — subreddit rules in normalized or raw form.
Export as JSON, JSONL, or CSV from the dataset tab in Apify.
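Because all item types share a single dataset, downstream code typically splits the export by the `type` field first. A minimal sketch (the sample items are made up, but the field names match the output documented above):

```javascript
// Split a mixed dataset export (array of items) into buckets keyed by `type`.
function splitByType(items) {
  const buckets = {};
  for (const item of items) {
    (buckets[item.type] ??= []).push(item);
  }
  return buckets;
}

const sample = [
  { type: 'post', id: 't3_1', title: 'Hello' },
  { type: 'comment', id: 't1_1', body: 'Hi' },
  { type: 'post', id: 't3_2', title: 'World' },
];
const byType = splitByType(sample);
console.log(byType.post.length);    // 2
console.log(byType.comment.length); // 1
```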
Extend the output
Enrich every item without forking the code by using `extendOutputFunction`. Provide an async function (as a string) that receives `{ data, request, helpers }` and returns extra fields to merge:

```javascript
async ({ data, request, helpers }) => {
  // Add your own logic, e.g., language detection or custom tagging
  const isNSFW = data.over_18 === true;
  return { custom_tag: isNSFW ? 'nsfw' : 'safe' };
}
```
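Another hypothetical example, bucketing posts and comments by the `score` field documented in the Output section (shown as a plain function here; pass it to the actor as a string):

```javascript
// Hypothetical extendOutputFunction: tag items by engagement level.
// The score thresholds are illustrative, not part of the actor.
const extendOutputFunction = async ({ data, request, helpers }) => {
  const score = typeof data.score === 'number' ? data.score : 0;
  return {
    engagement: score >= 100 ? 'high' : score >= 10 ? 'medium' : 'low',
  };
};

extendOutputFunction({ data: { score: 250 } }).then((extra) =>
  console.log(extra.engagement) // high
);
```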
How it works under the hood
- Uses Reddit’s public JSON endpoints and normalizes responses.
- Auto-detects mode from URL, or constructs listing/search/user endpoints directly.
- Employs a session pool with randomized headers and device IDs to reduce blocks.
- Warms up each session and respects moderate RPM and concurrency.
- Expands comment trees including "more" nodes (configurable depth and limits).
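The comment-tree expansion above can be pictured as a bounded depth-first walk over Reddit's nested Listing structure. A simplified sketch (the real actor also resolves `more` placeholders with extra requests, which is omitted here):

```javascript
// Simplified sketch of bounded comment-tree traversal, loosely mirroring
// Reddit's nested { kind, data, replies } Listing shape.
function collectComments(children, maxDepth, depth = 0, out = []) {
  if (depth >= maxDepth) return out;
  for (const node of children) {
    if (node.kind !== 't1') continue; // skip "more" placeholders in this sketch
    out.push({ id: node.data.id, body: node.data.body, depth });
    const replies = node.data.replies?.data?.children ?? [];
    collectComments(replies, maxDepth, depth + 1, out);
  }
  return out;
}

const tree = [
  { kind: 't1', data: { id: 'a', body: 'top', replies: { data: { children: [
    { kind: 't1', data: { id: 'b', body: 'child', replies: '' } },
    { kind: 'more', data: { children: ['c', 'd'] } }, // would need a follow-up request
  ] } } } },
];
console.log(collectComments(tree, 5).map((c) => c.id)); // [ 'a', 'b' ]
```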
Safety, compliance, and care when using this actor
- Respect Reddit’s Terms of Service and robots directives. Ensure your use case is allowed in your jurisdiction and by Reddit’s policies.
- Rate limiting and access: Although this actor uses conservative defaults for requests per minute and concurrency, Reddit can still block with 403/429. If that happens, reduce `maxRequestsPerMinute`/`maxConcurrency` in the code or run fewer concurrent tasks.
- Proxies: Use reliable proxies for higher volumes. The actor defaults to Apify Proxy; configure residential/geolocation groups as needed for your use case.
- Sensitive content: The actor sets an `over18=1` cookie to avoid age gates on some endpoints. Be mindful of NSFW content and handle it responsibly.
- Personal data: Public user metadata can still be sensitive. Avoid building profiles or making decisions that might infringe on privacy or local regulations.
- Legal and ethical use: Do not circumvent technical protection measures. Do not scrape private data or content requiring authentication.
- Load management: Large comment trees and subreddit meta expansion can generate very big datasets. Use `maxItems`, `maxCommentsPerPost`, and `maxCommentDepth` to keep runs predictable.
- Stability: Endpoints and response formats can change without notice. Pin versions and monitor runs with `debugLog` for troubleshooting.
Troubleshooting
- Many 403/429 responses: Lower request rate, switch to residential proxies, or retry later. Ensure headers and sessions are not reused too aggressively.
- Empty or partial results: Check that your URLs are valid and the target subreddit/post exists and is public. Try a different `timeRange`/`sortBy` for listings.
- Duplicates or limits hit early: Remember that `maxItems` is global across all item types produced during the run.
- Need raw data: Some endpoints (e.g., rules) may be pushed as `subreddit_rules_raw` if normalization isn't possible.
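When 403/429 responses persist, client-side exponential backoff between retries also helps. A generic sketch, not the actor's internal logic; `doRequest` is a stand-in for your own HTTP call:

```javascript
// Generic exponential-backoff retry sketch for 429/403 responses.
// `doRequest` is a hypothetical stand-in, not part of the actor's API.
async function withBackoff(doRequest, { retries = 4, baseMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    const res = await doRequest();
    if (res.status !== 429 && res.status !== 403) return res;
    if (attempt >= retries) throw new Error(`Blocked after ${attempt + 1} attempts`);
    const delay = baseMs * 2 ** attempt; // 500ms, 1s, 2s, ... by default
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}

// Fake request that fails twice with 429, then succeeds.
let calls = 0;
const fakeRequest = async () => (++calls < 3 ? { status: 429 } : { status: 200 });
withBackoff(fakeRequest, { baseMs: 1 }).then((res) => console.log(res.status)); // 200
```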
Build powerful datasets without the hassle of brittle HTML scraping — and do it responsibly.