Reddit Scraper - Posts, Comments, Communities & Users

Pricing: from $5.00 / 1,000 results

Scrape Reddit posts, comments, subreddits, and user profiles by URL or keyword search. No login required. Full comment trees, NSFW + date filters, pay only for what you scrape ($0.005 per result).

Developer: Anas Nadeem
Reddit Scraper — Posts, Comments, Communities & Users
Scrape Reddit at scale — posts, comments, communities (subreddits), and user profiles. Works by direct URL or keyword search, supports nested comment trees, NSFW + date filters, and global item caps. No Reddit account or API key needed.
What does Reddit Scraper do?
This actor pulls structured data from Reddit's public JSON API. Drop in any Reddit URL — a subreddit, post, user profile, or search results — and it returns clean rows ready for analytics, monitoring, or LLM ingestion. You can also run a keyword search across posts, comments, communities, and users.
It runs on a lightweight HTTP path (no browser), so it's fast and cheap. Comment trees are walked depth-first, and collapsed "more" stubs are expanded automatically against `/api/morechildren`.
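This works because Reddit mirrors most public pages as JSON when `.json` is appended to the URL path. A minimal sketch of that normalization (the helper name and exact behavior are illustrative, not the actor's actual code):

```python
from urllib.parse import urlsplit, urlunsplit

def to_json_endpoint(url: str) -> str:
    """Map a public Reddit URL to its JSON mirror by appending `.json`
    to the path while preserving any query string."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    if not path.endswith(".json"):
        path += ".json"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))

print(to_json_endpoint("https://www.reddit.com/r/AskReddit/"))
# → https://www.reddit.com/r/AskReddit.json
print(to_json_endpoint("https://www.reddit.com/search?q=llm"))
# → https://www.reddit.com/search.json?q=llm
```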
Key Features
- Multiple input modes — Start URLs, keyword search, or leaderboard fallback (popular subreddits)
- Mixed inputs in one run — combine subreddit URLs, post URLs, and user profiles freely
- Full comment trees — walks nested replies and expands collapsed branches via `/api/morechildren`
- 4 result categories — posts (`t3`), comments (`t1`), communities (`t5`), and users (`t2`)
- Granular limits — per-category caps (`maxPostCount`, `maxComments`, `maxCommunitiesCount`, `maxUserCount`) plus a global `maxItems` ceiling
- Date and NSFW filters — `postDateLimit`, `commentDateLimit`, `includeNSFW`
- Skip toggles — `skipComments`, `skipUserPosts`, `skipCommunity` for narrower runs
- Apify residential proxy — recommended for production; defaults are pre-wired
Input Modes
The actor picks one of three modes based on what you provide:
- Start URLs (preferred) — when `startUrls` is non-empty, every other input mode is ignored.
- Search — when `startUrls` is empty but `searches` has at least one query.
- Leaderboard — when neither is set, the actor falls back to scraping `r/popular`'s top communities.
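The precedence rule means one input object can safely carry both fields. An illustrative input (the exact `startUrls` entry shape depends on this actor's input schema; Apify actors commonly accept `{ "url": ... }` objects):

```json
{
  "startUrls": [{ "url": "https://www.reddit.com/r/AskReddit/" }],
  "searches": ["large language models"],
  "maxItems": 50
}
```

Here the run operates in Start URLs mode and the `searches` entry is ignored; clear `startUrls` (or set `ignoreStartUrls`) to fall through to search mode.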
Supported URL Shapes
| URL pattern | What gets scraped |
|---|---|
| `reddit.com/r/<sub>/` | Subreddit posts (sort/time honored), optional community-about, optional comments per post |
| `reddit.com/r/<sub>/comments/<id>/` | Single post + its comment tree |
| `reddit.com/user/<name>/` | User profile + their submitted posts + their comment history |
| `reddit.com/search?q=...` | Keyword search (post / comment / sr / user, depending on flags) |
| `reddit.com/r/<sub>/search?q=...` | Search restricted to one subreddit |
old.reddit.com and www.reddit.com are both accepted; URLs are normalized internally.
Output Data
Every dataset row carries a dataType discriminator so you can split them downstream.
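Because every row carries `dataType`, routing a mixed dataset export into per-category buckets takes only a few lines downstream. A sketch over a hypothetical in-memory export:

```python
from collections import defaultdict

# Stand-in for rows loaded from the actor's dataset export
rows = [
    {"dataType": "post", "id": "t3_abc"},
    {"dataType": "comment", "id": "t1_def"},
    {"dataType": "post", "id": "t3_ghi"},
]

by_type = defaultdict(list)
for row in rows:
    by_type[row["dataType"]].append(row)

print(sorted((kind, len(items)) for kind, items in by_type.items()))
# → [('comment', 1), ('post', 2)]
```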
Post (dataType: "post")
| Field | Type | Description |
|---|---|---|
id | string | Reddit fullname (t3_xxx) |
parsedId | string | Base-36 id without prefix |
url | string | Permalink to the post (or external URL for link posts) |
username | string | Author |
title | string | Post title |
communityName | string | r/<subreddit> |
parsedCommunityName | string | Subreddit name without r/ prefix |
body | string | Self-text (or external URL for link posts) |
html | string | Rendered HTML for self-text |
numberOfComments | number | num_comments from Reddit |
upVotes | number | Score |
authorFlair | string \| null | Author flair text
isVideo | boolean | True for video posts |
isAd | boolean | True for promoted/ad posts |
over18 | boolean | NSFW flag |
createdAt | string | ISO 8601 |
scrapedAt | string | ISO 8601 |
Comment (dataType: "comment")
| Field | Type | Description |
|---|---|---|
id | string | t1_xxx |
parsedId | string | Base-36 id |
url | string | Permalink to the comment |
parentId | string | Parent fullname (t3_* for top-level, t1_* for replies) |
username | string | Author |
authorFlair | string \| null | Flair text
category | string | Subreddit name |
communityName | string | r/<subreddit> |
body | string | Comment text (markdown) |
html | string | Rendered HTML |
upVotes | number | Score |
numberOfReplies | number | Recursive count of t1 replies underneath |
createdAt | string | ISO 8601 |
scrapedAt | string | ISO 8601 |
Community (dataType: "community")
| Field | Type | Description |
|---|---|---|
id | string | t5_xxx |
name | string | Display name (no r/ prefix) |
title | string | Long-form community title |
headerImage | string | Banner / header image URL |
description | string | Public description |
over18 | boolean | NSFW community flag |
numberOfMembers | number | Subscribers |
url | string | Absolute permalink |
createdAt | string | ISO 8601 |
scrapedAt | string | ISO 8601 |
User (dataType: "user")
| Field | Type | Description |
|---|---|---|
id | string | t2_xxx |
url | string | Profile permalink |
username | string | Reddit handle |
userIcon | string | Avatar URL |
postKarma | number | Link karma |
commentKarma | number | Comment karma |
description | string | Profile description |
over18 | boolean | NSFW profile flag |
createdAt | string | ISO 8601 |
scrapedAt | string | ISO 8601 |
Sample Output
```json
{
  "dataType": "post",
  "id": "t3_1t16uqd",
  "parsedId": "1t16uqd",
  "url": "https://www.reddit.com/r/AskReddit/comments/1t16uqd/...",
  "username": "IIlustriousTea",
  "title": "US birth rates just hit another record low...",
  "communityName": "r/AskReddit",
  "parsedCommunityName": "AskReddit",
  "body": "",
  "html": "",
  "numberOfComments": 8892,
  "upVotes": 7657,
  "authorFlair": null,
  "isVideo": false,
  "isAd": false,
  "over18": false,
  "createdAt": "2026-05-01T21:40:45.000Z",
  "scrapedAt": "2026-05-02T05:53:19.442Z"
}
```
Input Parameters
Direct URLs
| Parameter | Type | Default | Description |
|---|---|---|---|
startUrls | array | [] | Reddit URLs to scrape. Mix any of: subreddit, post, user, or search URLs. |
ignoreStartUrls | boolean | false | Force-bypass the URLs field (helpful for tools like Zapier). |
Search
| Parameter | Type | Default | Description |
|---|---|---|---|
searches | string[] | [] | Keywords to search. Used only when startUrls is empty. |
searchCommunityName | string | "" | Restrict every search to one subreddit. |
searchPosts | boolean | true | Include posts in search results. |
searchComments | boolean | false | Include comments (best-effort — Reddit's comment search returns parent posts). |
searchCommunities | boolean | false | Include matching communities. |
searchUsers | boolean | false | Include matching user profiles. |
sort | enum | new | relevance / hot / top / new / rising / comments. |
time | enum | "" | all / hour / day / week / month / year. Most useful with sort=top. |
Filters
| Parameter | Type | Default | Description |
|---|---|---|---|
includeNSFW | boolean | true | Include adult-rated posts and subreddits. |
skipComments | boolean | false | Don't scrape comments when going through posts. |
skipUserPosts | boolean | false | Don't scrape a user's submitted posts when going through their profile. |
skipCommunity | boolean | false | Don't push community metadata when going through a subreddit. |
postDateLimit | ISO date | — | Only keep posts created after this date. |
commentDateLimit | ISO date | — | Only keep comments created after this date. |
Limits
| Parameter | Type | Default | Description |
|---|---|---|---|
maxItems | integer | 10 | Hard global cap on dataset rows across all categories. |
maxPostCount | integer | 10 | Per-listing cap on posts. |
maxComments | integer | 10 | Per-post cap on comments (or global cap on comment-search/user-comments). |
maxCommunitiesCount | integer | 2 | Cap on communities returned from search or leaderboard. |
maxUserCount | integer | 2 | Cap on user profiles returned from search. |
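Putting filters and caps together, a focused run might look like this (values are illustrative):

```json
{
  "startUrls": [{ "url": "https://www.reddit.com/r/MachineLearning/" }],
  "includeNSFW": false,
  "postDateLimit": "2026-01-01T00:00:00.000Z",
  "skipCommunity": true,
  "maxPostCount": 25,
  "maxComments": 50,
  "maxItems": 500
}
```

Note that `maxItems` is the hard ceiling: even where the per-category caps would allow more rows, the run stops at 500.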
Advanced
| Parameter | Type | Default | Description |
|---|---|---|---|
proxy | object | Apify Residential | Apify proxy or your own proxy URLs. Residential is strongly recommended. |
debugMode | boolean | false | Verbose Crawlee logging. |
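Assuming the `proxy` field follows the standard Apify proxy-configuration shape (an assumption — check the actor's input schema), a residential setup would look like:

```json
{
  "proxy": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
```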
How It Works
The actor sends plain unauthenticated HTTP requests to `reddit.com/*.json` using a descriptive non-browser User-Agent. Reddit's anonymous JSON endpoints reject Chrome-like UAs that arrive without browser cookies, so the actor explicitly disables Crawlee's automatic browser-fingerprint header injection. This keeps the run on the generous anonymous rate limit (~100 requests/min) instead of the strict ~10/min anti-bot tier.
Comment trees are walked depth-first up to `maxComments`. Collapsed "more" stubs are expanded by POSTing to `/api/morechildren.json` in batches of up to 100 children, so expansion does not cost an extra request per comment.
The crawler aborts as soon as maxItems is hit, so over-runs are not a concern even with deep trees.
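The traversal described above can be sketched as an explicit-stack depth-first walk (illustrative only, not the actor's actual code; `more` expansion is reduced to a comment):

```python
def walk_comments(nodes, max_comments):
    """Depth-first walk over Reddit-style comment nodes, stopping at max_comments.
    Nodes mimic Reddit's listing shape: {"kind": "t1"|"more", "data": {...}}."""
    collected = []
    stack = list(reversed(nodes))        # pop() from the end => depth-first, in order
    while stack and len(collected) < max_comments:
        node = stack.pop()
        if node.get("kind") == "more":
            # Real traversal: POST the stub's child ids to /api/morechildren.json
            # in batches of up to 100; skipped in this sketch.
            continue
        collected.append(node["data"]["id"])
        replies = node["data"].get("replies") or []
        stack.extend(reversed(replies))  # children are visited before later siblings
    return collected

tree = [
    {"kind": "t1", "data": {"id": "a", "replies": [
        {"kind": "t1", "data": {"id": "b", "replies": []}},
    ]}},
    {"kind": "more", "data": {}},
    {"kind": "t1", "data": {"id": "c", "replies": []}},
]
print(walk_comments(tree, 10))  # → ['a', 'b', 'c']
print(walk_comments(tree, 2))   # → ['a', 'b']
```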
Pricing
This actor uses pay-per-event pricing:
| Event | Price |
|---|---|
| Actor start | $0.00005 |
| Result extracted (per dataset row) | $0.005 |
You only pay for what you scrape. Apify platform compute and proxy usage are billed separately based on your plan.
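With these two events, the event cost of a run is linear in row count. A quick sanity check of the headline price:

```python
START_FEE = 0.00005   # "Actor start" event, from the table above
PER_RESULT = 0.005    # "Result extracted" event, per dataset row

def run_cost(results: int, starts: int = 1) -> float:
    """Event cost of a run (excludes separately billed compute and proxy)."""
    return starts * START_FEE + results * PER_RESULT

print(round(run_cost(1_000), 5))  # → 5.00005, i.e. the "from $5.00 / 1,000 results" tier
```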
Limitations
- Comment search returns parent posts only (Reddit's API behavior); the actor enqueues those posts so their comment trees are still scraped. Treat it as best-effort.
- Removed/deleted posts return a 404 envelope; they're logged and skipped without retry.
- Login-walled content (private subreddits, NSFW content locked behind login for unauthenticated sessions) is not accessible via the JSON API and is silently skipped.