Reddit Scraper

Pricing

Pay per usage

Developer

Alex Claw

Last modified: 3 months ago

Features

No API key needed -- uses Reddit's public JSON API
Multiple sort options -- hot, new, top (with time filters), rising
Post metadata -- title, author, score, upvote ratio, flair, awards, NSFW/spoiler flags
Comments -- optionally fetch comment trees with depth, scores, and reply counts
Pagination -- automatically pages through results up to your specified limit
Multi-subreddit -- scrape multiple subreddits in a single run
Rate-limit handling -- built-in delays and exponential backoff on 429 responses
Proxy support -- optional proxy configuration for large-scale scraping
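
The rate-limit handling listed above could look roughly like the sketch below. It is not the actor's actual code: `fetch_with_backoff`, `backoff_delays`, the injected `fetch` callable, and the `base_delay`, `max_retries`, and `cap` values are all hypothetical and chosen only for illustration.

```python
import time
from typing import Callable, Iterator, Tuple


def backoff_delays(base_delay: float = 1.0, max_retries: int = 5,
                   cap: float = 60.0) -> Iterator[float]:
    """Yield delays that double on each retry (base, 2*base, 4*base, ...), capped at `cap`."""
    for attempt in range(max_retries):
        yield min(cap, base_delay * (2 ** attempt))


def fetch_with_backoff(fetch: Callable[[], Tuple[int, str]],
                       sleep: Callable[[float], None] = time.sleep,
                       base_delay: float = 1.0,
                       max_retries: int = 5) -> Tuple[int, str]:
    """Retry `fetch` while it returns HTTP 429, sleeping longer before each retry.

    `fetch` is a hypothetical callable returning (status_code, body).
    """
    status, body = fetch()
    for delay in backoff_delays(base_delay, max_retries):
        if status != 429:
            break
        sleep(delay)
        status, body = fetch()
    return status, body
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting.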

Input

Parameter	Type	Default	Description
`subreddits`	array	required	Subreddit names (e.g., `["python", "programming"]`). No `r/` prefix needed.
`maxPostsPerSubreddit`	integer	100	Max posts to scrape per subreddit (1-5000)
`sort`	string	`"hot"`	Sort order: `hot`, `new`, `top`, `rising`
`topTimeFilter`	string	`"day"`	Time filter for `top` sort: `hour`, `day`, `week`, `month`, `year`, `all`
`includeComments`	boolean	false	Fetch comments for each post (increases run time)
`maxCommentsPerPost`	integer	50	Max comments per post (1-500, only used when `includeComments` is true)
`proxyConfiguration`	object	none	Proxy settings for requests
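
As an illustration of the defaults and ranges in the table, the sketch below applies them to a raw input object. `normalize_input` is a hypothetical helper, not part of the actor, and how the actor itself validates input is not documented here.

```python
# Defaults copied from the parameter table above.
DEFAULTS = {
    "maxPostsPerSubreddit": 100,
    "sort": "hot",
    "topTimeFilter": "day",
    "includeComments": False,
    "maxCommentsPerPost": 50,
}
SORTS = {"hot", "new", "top", "rising"}
TIME_FILTERS = {"hour", "day", "week", "month", "year", "all"}


def normalize_input(raw: dict) -> dict:
    """Fill in documented defaults and reject values outside the documented ranges."""
    if not raw.get("subreddits"):
        raise ValueError("subreddits is required")
    cfg = {**DEFAULTS, **raw}
    if cfg["sort"] not in SORTS:
        raise ValueError(f"unknown sort: {cfg['sort']!r}")
    if cfg["topTimeFilter"] not in TIME_FILTERS:
        raise ValueError(f"unknown topTimeFilter: {cfg['topTimeFilter']!r}")
    if not 1 <= cfg["maxPostsPerSubreddit"] <= 5000:
        raise ValueError("maxPostsPerSubreddit must be between 1 and 5000")
    if not 1 <= cfg["maxCommentsPerPost"] <= 500:
        raise ValueError("maxCommentsPerPost must be between 1 and 500")
    return cfg
```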

Example Input

{
    "subreddits": ["python", "programming", "learnpython"],
    "maxPostsPerSubreddit": 50,
    "sort": "top",
    "topTimeFilter": "week",
    "includeComments": true,
    "maxCommentsPerPost": 20
}

Output

Each post is saved as a dataset item:

{
    "subreddit": "python",
    "postId": "abc123",
    "title": "What's the best Python web framework in 2026?",
    "author": "pythonista42",
    "score": 1234,
    "upvoteRatio": 0.95,
    "numComments": 45,
    "createdUtc": "2026-02-24T10:00:00+00:00",
    "selfText": "I've been comparing Django, FastAPI, and...",
    "url": "https://www.reddit.com/r/python/comments/abc123/...",
    "permalink": "https://www.reddit.com/r/python/comments/abc123/...",
    "isVideo": false,
    "thumbnail": "self",
    "flair": "Discussion",
    "awards": 3,
    "postUrl": "https://www.reddit.com/r/python/comments/abc123/...",
    "domain": "self.python",
    "isNsfw": false,
    "isSpoiler": false,
    "isStickied": false,
    "comments": [
        {
            "commentId": "xyz789",
            "author": "webdev99",
            "body": "FastAPI for APIs, Django for full-stack...",
            "score": 567,
            "createdUtc": "2026-02-24T10:30:00+00:00",
            "depth": 0,
            "repliesCount": 12,
            "isStickied": false,
            "awards": 1
        }
    ]
}

When `includeComments` is false, the `comments` field is omitted.
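
A downstream consumer might read an item of this shape as sketched below. `summarize_post` is a hypothetical helper. It assumes only the field names shown in the sample output and treats a missing `comments` field as an empty list.

```python
from datetime import datetime


def summarize_post(item: dict) -> dict:
    """Extract a few derived values from one dataset item."""
    # createdUtc is an ISO 8601 string with a UTC offset, as in the sample above.
    created = datetime.fromisoformat(item["createdUtc"])
    # The comments field is absent when includeComments is false.
    comments = item.get("comments", [])
    top_level = [c for c in comments if c["depth"] == 0]
    return {
        "postId": item["postId"],
        "created": created,
        "topLevelComments": len(top_level),
        "bestComment": max(comments, key=lambda c: c["score"], default=None),
    }
```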

How It Works

This actor uses Reddit's public JSON API, which is available by appending .json to any Reddit URL:

Subreddit listings: https://www.reddit.com/r/{subreddit}/{sort}.json
Post comments: https://www.reddit.com/r/{subreddit}/comments/{post_id}.json

No authentication is required. The actor uses a descriptive User-Agent header as recommended by Reddit's API guidelines.
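
The listing endpoint and its pagination can be sketched as follows. This is an illustration under assumptions, not the actor's implementation. `listing_url`, `parse_listing`, and `paginate` are hypothetical names, and `fetch_json` is a hypothetical callable that performs the HTTP GET (with a descriptive User-Agent) and returns the decoded JSON. The `limit`, `t`, and `after` query parameters and the `data.children` / `data.after` response fields are part of Reddit's listing format, but whether the actor uses them exactly this way is an assumption.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urlencode


def listing_url(subreddit: str, sort: str = "hot",
                time_filter: Optional[str] = None,
                after: Optional[str] = None, limit: int = 100) -> str:
    """Build a subreddit listing URL; `t` only applies to the `top` sort."""
    params = {"limit": limit}
    if sort == "top" and time_filter:
        params["t"] = time_filter
    if after:
        params["after"] = after  # cursor returned by the previous page
    return f"https://www.reddit.com/r/{subreddit}/{sort}.json?{urlencode(params)}"


def parse_listing(payload: dict) -> Tuple[List[dict], Optional[str]]:
    """Return (posts, next cursor) from one listing response."""
    data = payload["data"]
    posts = [c["data"] for c in data["children"] if c.get("kind") == "t3"]
    return posts, data.get("after")


def paginate(fetch_json: Callable[[str], dict], subreddit: str, sort: str,
             max_posts: int, time_filter: Optional[str] = None) -> List[dict]:
    """Follow the `after` cursor until `max_posts` is reached or pages run out."""
    posts: List[dict] = []
    after = None
    while len(posts) < max_posts:
        url = listing_url(subreddit, sort, time_filter, after)
        page, after = parse_listing(fetch_json(url))
        posts.extend(page)
        if not after or not page:
            break
    return posts[:max_posts]
```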

Use Cases

Market research -- monitor discussions about your product, competitors, or industry
Content analysis -- find trending topics, popular content formats, engagement patterns
Sentiment analysis -- collect posts and comments for NLP/sentiment pipelines
Lead generation -- find users asking questions your product solves
Academic research -- collect public discourse data for analysis
SEO research -- discover what topics generate high engagement in your niche

Pricing

Pay per result: $2.00 per 1,000 posts scraped (comments included at no extra cost).
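
As a quick worked example of the per-result rate, the sketch below computes an upper bound on the cost of a run. `estimate_max_cost` is a hypothetical helper. The actual charge depends on how many posts are returned, which can be lower than the requested maximum.

```python
PRICE_PER_1000_POSTS = 2.00  # USD, from the pricing line above


def estimate_max_cost(num_subreddits: int, max_posts_per_subreddit: int) -> float:
    """Upper bound in USD, assuming every subreddit returns the full post limit."""
    total_posts = num_subreddits * max_posts_per_subreddit
    return total_posts * PRICE_PER_1000_POSTS / 1000


# The example input above (3 subreddits, 50 posts each) is at most 150 posts.
print(f"${estimate_max_cost(3, 50):.2f}")
```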

Important: Proxy Recommended

Reddit aggressively blocks datacenter IPs, so a residential proxy is recommended for reliable scraping. Configure the proxy in the input:

{
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": ["RESIDENTIAL"]
    }
}

Limitations

Proxy recommended -- Reddit blocks most datacenter IPs with 403 errors
Only works with public subreddits (private/quarantined subreddits are not accessible)
Reddit's pagination caps at approximately 1,000 posts per listing, so a `maxPostsPerSubreddit` value above that may return fewer posts than requested
Rate limiting -- the actor respects Reddit's rate limits with built-in delays
Posts and comments from deleted or suspended users may show `[deleted]` in place of the original content
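
If `[deleted]` entries are not useful for your analysis, they can be filtered out after the run. The sketch below is one way to do that. `is_deleted` and `drop_deleted_comments` are hypothetical helpers, and it is an assumption that the placeholder appears in the `author` or `body` field.

```python
DELETED = "[deleted]"


def is_deleted(entry: dict) -> bool:
    """True if the author or body has been replaced by the [deleted] placeholder."""
    return entry.get("author") == DELETED or entry.get("body") == DELETED


def drop_deleted_comments(item: dict) -> dict:
    """Return a copy of a post item without deleted comments."""
    cleaned = dict(item)
    if "comments" in cleaned:
        cleaned["comments"] = [c for c in cleaned["comments"] if not is_deleted(c)]
    return cleaned
```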