Reddit Scraper Plus

Pricing

$30.00/month + usage

Developed by

CtrlAltWin

Maintained by Community

Rating: 0.0 (0)


Last modified

8 hours ago

Reddit Advanced Scraper (Apify Actor)

Pull posts, comments, users, and subreddit metadata from Reddit at scale — fast, resilient, and configurable.

This actor uses Reddit’s public JSON endpoints, smart session handling, and targeted rate limits to minimize blocks while giving you clean, normalized output ready for analytics or downstream processing.

Why this actor

  • Auto-detects mode from any Reddit URL (subreddit, post, user, or search)
  • Deep comment expansion with max depth and "more" replies handling
  • Global item limit to control dataset size and cost
  • Time range and sorting for listings and search
  • Optional user and subreddit metadata enrichment (about, moderators, rules)
  • Pluggable extendOutputFunction to enrich items without changing the code
  • Built-in proxy support (defaults to Apify Proxy)
  • Clean, consistent output across item types (CSV/JSON)
  • Verbose logging with an optional debug mode

What you can extract

  • Posts: titles, scores, ratios, permalinks, flairs, media, crosspost info, etc.
  • Comments: full tree traversal, depth, scores, replies (including expanded "more" nodes)
  • Users (optional): public profile metadata
  • Subreddit metadata (optional): about, moderators, rules

Inputs

  • startUrls: array of Reddit URLs. The mode is auto-detected from each URL unless overridden by the mode field.
  • mode: one of auto, subreddit, post, user, search.
  • searchQueries: array of strings to run Reddit-wide search (used if mode is search or auto and startUrls is empty).
  • timeRange: one of hour, day, week, month, year, all.
  • sortBy: for listings and search (e.g., hot, new, top, rising, or for search: relevance, new, top, comments).
  • maxItems: global cap on how many items to output (posts, comments, users, rules, etc. combined).
  • includePosts: boolean, default true.
  • includeComments: boolean, default true.
  • maxCommentsPerPost: cap comments per post, default 50.
  • expandCommentReplies: boolean, default true (expand "more" nodes where possible).
  • maxCommentDepth: maximum depth for comments (default 5).
  • includeUsers: boolean, default false (queues user profiles and outputs public user metadata).
  • includeSubredditMeta: boolean, default true (about, moderators, rules when a subreddit appears).
  • proxyConfiguration: standard Apify proxy configuration; defaults to { useApifyProxy: true }.
  • extendOutputFunction: stringified async function to enrich each item.
  • debugLog: boolean; set true for verbose logs.

Example inputs

Minimal (subreddit listing):

{
  "startUrls": [{ "url": "https://www.reddit.com/r/apify/" }],
  "timeRange": "week",
  "sortBy": "top",
  "maxItems": 100,
  "maxCommentsPerPost": 25,
  "includeComments": true,
  "includePosts": true,
  "includeSubredditMeta": true,
  "proxyConfiguration": { "useApifyProxy": true }
}

Full configuration (subreddit listing through a residential proxy):

{
  "debugLog": false,
  "expandCommentReplies": true,
  "includeComments": true,
  "includePosts": true,
  "includeSubredditMeta": true,
  "includeUsers": false,
  "maxCommentsPerPost": 15,
  "maxItems": 50,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"],
    "apifyProxyCountry": "GB"
  },
  "startUrls": [
    { "url": "https://www.reddit.com/r/worldnews/", "method": "GET" }
  ]
}
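
For a Reddit-wide search, supply searchQueries instead of startUrls. This sketch uses only the input fields documented above; the query strings are placeholders:

```json
{
  "mode": "search",
  "searchQueries": ["web scraping", "apify actor"],
  "timeRange": "month",
  "sortBy": "relevance",
  "maxItems": 50,
  "includePosts": true,
  "includeComments": false,
  "proxyConfiguration": { "useApifyProxy": true }
}
```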

Output

Items are pushed to the default dataset. Types you can expect:

  • type: post — fields include: id, name, title, author, subreddit, url, permalink, created_utc, upvote_ratio, score, num_comments, over_18, flair, awards, media, preview, crosspost_parent, etc.
  • type: comment — fields include: id, name, parent_id, link_id, permalink, author, body, score, created_utc, depth, stickied, gilded, etc.
  • type: user — fields include: id, name, created_utc, link_karma, comment_karma, total_karma, is_gold, is_mod, verified, icon_img, subreddit (public profile sub).
  • type: subreddit — fields include: id, name, title, subscribers, active_user_count, public_description, description, over18, url, created_utc, lang, quarantine.
  • type: subreddit_moderator — subreddit+moderator info.
  • type: subreddit_rule or subreddit_rules_raw — subreddit rules in normalized or raw form.

Export as JSON, JSONL, or CSV from the dataset tab in Apify.
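
Because a single run mixes several item types in one dataset, you will usually want to split items by their type field before analysis. A minimal Node.js sketch (the sample records mimic the fields listed above; the grouping helper is ours, not part of the actor):

```javascript
// Group exported dataset items by their `type` field
// (post, comment, user, subreddit, ...).
function groupByType(items) {
  const groups = {};
  for (const item of items) {
    const key = item.type || 'unknown';
    (groups[key] = groups[key] || []).push(item);
  }
  return groups;
}

// Stand-ins for records exported from the default dataset.
const items = [
  { type: 'post', id: 't3_abc', title: 'Hello', score: 42 },
  { type: 'comment', id: 't1_def', body: 'Nice post', depth: 0 },
  { type: 'post', id: 't3_ghi', title: 'Second', score: 7 },
];

const groups = groupByType(items);
console.log(Object.keys(groups)); // [ 'post', 'comment' ]
console.log(groups.post.length);  // 2
```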

To get started, run the actor in the Apify Console.

Extend the output

Enrich every item without forking the code using extendOutputFunction. Provide an async function (as a string) that receives { data, request, helpers } and returns extra fields to merge:

async ({ data, request, helpers }) => {
  // Add your own logic, e.g., language detection or custom tagging
  const isNSFW = data.over_18 === true;
  return { custom_tag: isNSFW ? 'nsfw' : 'safe' };
}
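
Since actor input is JSON, the function has to travel as a string. One way to keep it readable when calling the actor programmatically is to hold it in a template literal (a sketch; the input fields are the documented ones):

```javascript
// extendOutputFunction must be sent as a string inside the actor input.
const extendOutputFunction = `async ({ data, request, helpers }) => {
  const isNSFW = data.over_18 === true;
  return { custom_tag: isNSFW ? 'nsfw' : 'safe' };
}`;

const input = {
  startUrls: [{ url: 'https://www.reddit.com/r/apify/' }],
  maxItems: 100,
  extendOutputFunction,
};

// The input serializes cleanly for the API or the Console JSON editor.
console.log(typeof JSON.parse(JSON.stringify(input)).extendOutputFunction); // 'string'
```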

How it works under the hood

  • Uses Reddit’s public JSON endpoints and normalizes responses.
  • Auto-detects mode from URL, or constructs listing/search/user endpoints directly.
  • Employs a session pool with randomized headers and device IDs to reduce blocks.
  • Warms up each session and respects moderate RPM and concurrency.
  • Expands comment trees including "more" nodes (configurable depth and limits).
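
The actor's own URL construction is internal, but the shapes follow Reddit's public conventions: append .json to a listing path and pass sort and time window as query parameters. A sketch with hypothetical helper names:

```javascript
// Sketch: how listing and search endpoints might be derived from the
// input options. These mirror Reddit's public JSON conventions; the
// helper names are illustrative, not the actor's internals.
function listingUrl(subreddit, sortBy = 'hot', timeRange = 'all', limit = 100) {
  const params = new URLSearchParams({ t: timeRange, limit: String(limit) });
  return `https://www.reddit.com/r/${subreddit}/${sortBy}.json?${params}`;
}

function searchUrl(query, sortBy = 'relevance', timeRange = 'all') {
  const params = new URLSearchParams({ q: query, sort: sortBy, t: timeRange });
  return `https://www.reddit.com/search.json?${params}`;
}

console.log(listingUrl('apify', 'top', 'week'));
// https://www.reddit.com/r/apify/top.json?t=week&limit=100
```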

Safety, compliance, and care when using this actor

  • Respect Reddit’s Terms of Service and robots directives. Ensure your use case is allowed in your jurisdiction and by Reddit’s policies.
  • Rate limiting and access: Although this actor uses conservative defaults (e.g., requests per minute and concurrency), Reddit can still block with 403/429. If that happens, reduce maxRequestsPerMinute/maxConcurrency in the code or run fewer concurrent tasks.
  • Proxies: Use reliable proxies for higher volumes. The actor defaults to Apify Proxy; configure residential/geolocation as needed for your use case.
  • Sensitive content: The actor sets a cookie over18=1 to avoid age gates in some endpoints. Be mindful of NSFW content and handle it responsibly.
  • Personal data: Public user metadata can still be sensitive. Avoid building profiles or making decisions that might infringe on privacy or local regulations.
  • Legal and ethical use: Do not circumvent technical protection measures. Do not scrape private data or content requiring authentication.
  • Load management: Large comment trees and subreddit meta expansion can generate very big datasets. Use maxItems, maxCommentsPerPost, and maxCommentDepth to keep runs predictable.
  • Stability: Endpoints and response formats can change without notice. Pin versions and monitor runs with debugLog for troubleshooting.

Troubleshooting

  • Many 403/429 responses: Lower request rate, switch to residential proxies, or retry later. Ensure headers and sessions are not reused too aggressively.
  • Empty or partial results: Check that your URLs are valid and the target subreddit/post exists and is public. Try different timeRange/sortBy for listings.
  • Duplicates or limits hit early: Remember that maxItems is global across all item types produced during the run.
  • Need raw data: Some endpoints (e.g., rules) may be pushed as subreddit_rules_raw if normalization isn't possible.
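
When retrying after 403/429, spacing out attempts matters as much as lowering the rate. A common client-side pattern (our sketch, not the actor's built-in behavior) is exponential backoff with jitter:

```javascript
// Sketch: exponential backoff with jitter for retrying after 403/429.
// The delay doubles each attempt, is capped, and carries random jitter
// so concurrent tasks do not retry in lockstep.
function backoffDelayMs(attempt, baseMs = 1000, capMs = 60000) {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return exp / 2 + Math.random() * (exp / 2); // jitter in [exp/2, exp)
}

for (let attempt = 0; attempt < 5; attempt++) {
  console.log(`attempt ${attempt}: wait ~${Math.round(backoffDelayMs(attempt))} ms`);
}
```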

Build powerful datasets without the hassle of brittle HTML scraping — and do it responsibly.