Reddit Scraper - Posts, Comments & Search(No API Key, JSON/CSV)
Pricing
from $5.00 / 1,000 results
Scrape any subreddit, post, comment thread, or keyword search — no Reddit API key, no PRAW 1K cap, no Pushshift mod-gate. Bulk-export posts + nested comments to JSON/CSV for lead gen, RAG, monitoring.
Rating: 0.0 (0)
Developer: Anas Nadeem
Maintained by Community
Actor stats
Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 13 days ago
Reddit Scraper — Posts, Comments & Search (No API Key, JSON/CSV)
Built for lead-gen scrapes, RAG pipelines, and brand-monitoring jobs that the locked-down Reddit API can't handle. No PRAW 1K cap. No Pushshift mod-gate. Pay only for what you extract.
Scrape Reddit at scale — posts, comments, communities (subreddits), and user profiles. Works by direct URL or keyword search, supports nested comment trees, NSFW + date filters, and global item caps. No Reddit account or API key needed.
Why this scraper, not PRAW? PRAW caps you at ~1K posts per query and dies on Reddit's anonymous rate limits. Pushshift has been mod-only since 2023. This actor walks every comment tree depth-first, expands collapsed "more" stubs in batched calls of 100 children each, and uses non-browser HTTP requests that keep you on the generous ~100 req/min tier.
What does Reddit Scraper do?
This actor pulls structured data from Reddit's public JSON API. Drop in any Reddit URL — a subreddit, post, user profile, or search results — and it returns clean rows ready for analytics, monitoring, or LLM ingestion. You can also run a keyword search across posts, comments, communities, and users.
It runs on a lightweight HTTP path (no browser), so it's fast and cheap. Comment trees are walked depth-first, and collapsed "more" stubs are expanded against /api/morechildren automatically.
Key Features
- Multiple input modes: Start URLs, keyword search, or leaderboard fallback (popular subreddits)
- Mixed inputs in one run: Combine subreddit URLs, post URLs, and user profiles freely
- Full comment trees: Walks nested replies and expands collapsed branches via /api/morechildren
- 4 result categories: Posts (t3), comments (t1), communities (t5), and users (t2)
- Granular limits: Per-category caps (maxPostCount, maxComments, maxCommunitiesCount, maxUserCount) plus a global maxItems ceiling
- Date and NSFW filters: postDateLimit, commentDateLimit, includeNSFW
- Skip toggles: skipComments, skipUserPosts, skipCommunity for narrower runs
- Apify residential proxy: Recommended for production; defaults are pre-wired
Who is this Reddit scraper for?
- SaaS founders monitoring brand mentions: Track every mention of your product, competitors, and target keywords across 20+ subreddits. Replace $50/mo Reddit lead-gen SaaS tools with a one-off scrape costing cents (r/SaaS).
- GTM engineers building lead-gen pipelines: Use as the Reddit data primitive in Clay, n8n, or Make workflows. Bypasses the PRAW rate limits and 1K-post caps that break automated pipelines (r/gtmengineering).
- Researchers blocked by ghosted API applications: Reddit API applications go unanswered for weeks. This actor uses Reddit's public JSON endpoints, so there is no developer app approval, no mod-gate, and no Pushshift dependency (r/redditdev).
- RAG / LLM teams needing nested comment trees: Flattened comment lists lose reply context. Every comment output row includes a parentId field (t3_* for top-level, t1_* for replies), preserving thread shape for vector-store ingestion (r/apify).
- Growth marketers and trend miners: Scrape entire subreddits for competitor tracking, AI-tool trend analysis, or dropshipping niche discovery. A 25K-comment scrape costs ~$125; the insight compounds (r/Entrepreneur).
Input Modes
The actor picks one of three modes based on what you provide:
- Start URLs (preferred): When startUrls is non-empty, every other input mode is ignored.
- Search: When startUrls is empty but searches has at least one query.
- Leaderboard: When neither is set, the actor falls back to scraping r/popular's top communities.
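The precedence above can be sketched as a small function. This is an illustration only: pick_mode is a hypothetical helper, and the handling of ignoreStartUrls (treating startUrls as empty) is an assumption based on that parameter's description, not the actor's confirmed implementation.

```python
from typing import Any, Dict


def pick_mode(actor_input: Dict[str, Any]) -> str:
    """Return the input mode implied by the documented precedence (hypothetical helper)."""
    start_urls = actor_input.get("startUrls") or []
    searches = actor_input.get("searches") or []

    # Assumption: ignoreStartUrls makes the run behave as if startUrls were empty.
    if actor_input.get("ignoreStartUrls"):
        start_urls = []

    if start_urls:
        return "startUrls"      # URLs win over everything else
    if searches:
        return "search"         # used only when there are no URLs
    return "leaderboard"        # fallback when neither is provided


print(pick_mode({"startUrls": [{"url": "https://www.reddit.com/r/SaaS/"}], "searches": ["crm"]}))
```

Under these assumptions, a run that sets both startUrls and searches resolves to the startUrls mode, so the search terms are not used.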
Supported URL Shapes
| URL pattern | What gets scraped |
|---|---|
| reddit.com/r/<sub>/ | Subreddit posts (sort/time honored), optional community-about, optional comments per post |
| reddit.com/r/<sub>/comments/<id>/ | Single post + its comment tree |
| reddit.com/user/<name>/ | User profile + their submitted posts + their comment history |
| reddit.com/search?q=... | Keyword search (post / comment / sr / user, depending on flags) |
| reddit.com/r/<sub>/search?q=... | Search restricted to one subreddit |
old.reddit.com and www.reddit.com are both accepted; URLs are normalized internally.
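To pre-validate a list of URLs before starting a run, the shapes in the table can be matched roughly as below. This is a sketch under assumptions: classify_reddit_url and its return keys are hypothetical, and the actor's own normalization may accept more variants (for example sort suffixes) than this does.

```python
import re
from urllib.parse import parse_qs, urlparse

ACCEPTED_HOSTS = {"reddit.com", "www.reddit.com", "old.reddit.com"}


def classify_reddit_url(url: str) -> dict:
    """Map a Reddit URL onto one of the supported shapes (hypothetical helper)."""
    parsed = urlparse(url if "://" in url else "https://" + url)
    if parsed.netloc.lower() not in ACCEPTED_HOSTS:
        raise ValueError(f"not a Reddit URL: {url}")

    path = parsed.path.rstrip("/") + "/"          # tolerate a missing trailing slash
    query = parse_qs(parsed.query).get("q", [""])[0]

    m = re.match(r"^/r/([^/]+)/comments/([^/]+)/", path)
    if m:
        return {"kind": "post", "subreddit": m.group(1), "post_id": m.group(2)}
    m = re.match(r"^/r/([^/]+)/search/$", path)
    if m:
        return {"kind": "search", "subreddit": m.group(1), "query": query}
    m = re.match(r"^/r/([^/]+)/$", path)
    if m:
        return {"kind": "subreddit", "subreddit": m.group(1)}
    m = re.match(r"^/user/([^/]+)/", path)
    if m:
        return {"kind": "user", "username": m.group(1)}
    if path == "/search/":
        return {"kind": "search", "subreddit": None, "query": query}
    raise ValueError(f"unsupported Reddit URL shape: {url}")


print(classify_reddit_url("https://old.reddit.com/r/SaaS/"))
```

Checking URLs locally like this avoids spending a run on inputs that do not match any supported pattern.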
Output Data
Every dataset row carries a dataType discriminator so you can split them downstream.
Post (dataType: "post")
| Field | Type | Description |
|---|---|---|
| id | string | Reddit fullname (t3_xxx) |
| parsedId | string | Base-36 id without prefix |
| url | string | Permalink to the post (or external URL for link posts) |
| username | string | Author |
| title | string | Post title |
| communityName | string | r/<subreddit> |
| parsedCommunityName | string | Subreddit name without r/ prefix |
| body | string | Self-text (or external URL for link posts) |
| html | string | Rendered HTML for self-text |
| numberOfComments | number | num_comments from Reddit |
| upVotes | number | Score |
| authorFlair | string or null | Author flair text |
| isVideo | boolean | True for video posts |
| isAd | boolean | True for promoted/ad posts |
| over18 | boolean | NSFW flag |
| createdAt | string | ISO 8601 |
| scrapedAt | string | ISO 8601 |
Comment (dataType: "comment")
| Field | Type | Description |
|---|---|---|
| id | string | t1_xxx |
| parsedId | string | Base-36 id |
| url | string | Permalink to the comment |
| parentId | string | Parent fullname (t3_* for top-level, t1_* for replies) |
| username | string | Author |
| authorFlair | string or null | Flair text |
| category | string | Subreddit name |
| communityName | string | r/<subreddit> |
| body | string | Comment text (markdown) |
| html | string | Rendered HTML |
| upVotes | number | Score |
| numberOfReplies | number | Recursive count of t1 replies underneath |
| createdAt | string | ISO 8601 |
| scrapedAt | string | ISO 8601 |
Community (dataType: "community")
| Field | Type | Description |
|---|---|---|
| id | string | t5_xxx |
| name | string | Display name (no r/ prefix) |
| title | string | Long-form community title |
| headerImage | string | Banner / header image URL |
| description | string | Public description |
| over18 | boolean | NSFW community flag |
| numberOfMembers | number | Subscribers |
| url | string | Absolute permalink |
| createdAt | string | ISO 8601 |
| scrapedAt | string | ISO 8601 |
User (dataType: "user")
| Field | Type | Description |
|---|---|---|
| id | string | t2_xxx |
| url | string | Profile permalink |
| username | string | Reddit handle |
| userIcon | string | Avatar URL |
| postKarma | number | Link karma |
| commentKarma | number | Comment karma |
| description | string | Profile description |
| over18 | boolean | NSFW profile flag |
| createdAt | string | ISO 8601 |
| scrapedAt | string | ISO 8601 |
Sample Output
{"dataType": "post","id": "t3_1t16uqd","parsedId": "1t16uqd","url": "https://www.reddit.com/r/AskReddit/comments/1t16uqd/...","username": "IIlustriousTea","title": "US birth rates just hit another record low...","communityName": "r/AskReddit","parsedCommunityName": "AskReddit","body": "","html": "","numberOfComments": 8892,"upVotes": 7657,"authorFlair": null,"isVideo": false,"isAd": false,"over18": false,"createdAt": "2026-05-01T21:40:45.000Z","scrapedAt": "2026-05-02T05:53:19.442Z"}
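Since every row carries the dataType discriminator, a mixed dataset can be bucketed in a few lines once it is downloaded. A minimal sketch; split_by_type is a hypothetical helper name and the sample rows are trimmed placeholders, not real output.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def split_by_type(rows: Iterable[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group dataset rows by their dataType field (hypothetical helper)."""
    buckets: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        # Rows without the discriminator are kept under "unknown" rather than dropped.
        buckets[row.get("dataType", "unknown")].append(row)
    return dict(buckets)


rows = [
    {"dataType": "post", "id": "t3_abc"},
    {"dataType": "comment", "id": "t1_def", "parentId": "t3_abc"},
    {"dataType": "comment", "id": "t1_ghi", "parentId": "t1_def"},
]
grouped = split_by_type(rows)
print({kind: len(items) for kind, items in grouped.items()})
```

Each bucket can then be written to its own CSV or table, which keeps the differing post, comment, community, and user columns from being mixed into one sparse sheet.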
Input Parameters
Direct URLs
| Parameter | Type | Default | Description |
|---|---|---|---|
| startUrls | array | [] | Reddit URLs to scrape. Mix any of: subreddit, post, user, or search URLs. |
| ignoreStartUrls | boolean | false | Force-bypass the URLs field (helpful for tools like Zapier). |
Search
| Parameter | Type | Default | Description |
|---|---|---|---|
| searches | string[] | [] | Keywords to search. Used only when startUrls is empty. |
| searchCommunityName | string | "" | Restrict every search to one subreddit. |
| searchPosts | boolean | true | Include posts in search results. |
| searchComments | boolean | false | Include comments (best-effort: Reddit's comment search returns parent posts). |
| searchCommunities | boolean | false | Include matching communities. |
| searchUsers | boolean | false | Include matching user profiles. |
| sort | enum | new | relevance / hot / top / new / rising / comments. |
| time | enum | "" | all / hour / day / week / month / year. Most useful with sort=top. |
Filters
| Parameter | Type | Default | Description |
|---|---|---|---|
| includeNSFW | boolean | true | Include adult-rated posts and subreddits. |
| skipComments | boolean | false | Don't scrape comments when going through posts. |
| skipUserPosts | boolean | false | Don't scrape a user's submitted posts when going through their profile. |
| skipCommunity | boolean | false | Don't push community metadata when going through a subreddit. |
| postDateLimit | ISO date | (none) | Only keep posts created after this date. |
| commentDateLimit | ISO date | (none) | Only keep comments created after this date. |
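The same date cut-off can be re-applied locally to an already exported dataset, for example to tighten the window without re-running the actor. A sketch under assumptions: created_after is a hypothetical helper, dates without a timezone are treated as UTC, and the comparison is strict (whether the actor itself includes rows created exactly at the limit is not documented here).

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List


def parse_iso(value: str) -> datetime:
    """Parse an ISO 8601 date or timestamp into an aware UTC-comparable datetime."""
    # Older Python versions do not accept a trailing "Z", so rewrite it as an offset.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive dates are UTC
    return parsed


def created_after(rows: Iterable[Dict[str, Any]], limit: str) -> List[Dict[str, Any]]:
    """Keep rows whose createdAt is strictly later than the limit (hypothetical helper)."""
    cutoff = parse_iso(limit)
    return [row for row in rows if parse_iso(row["createdAt"]) > cutoff]


rows = [
    {"id": "t3_new", "createdAt": "2026-05-01T21:40:45.000Z"},
    {"id": "t3_old", "createdAt": "2026-04-01T00:00:00.000Z"},
]
print([row["id"] for row in created_after(rows, "2026-04-15")])
```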
Limits
| Parameter | Type | Default | Description |
|---|---|---|---|
| maxItems | integer | 10 | Hard global cap on dataset rows across all categories. |
| maxPostCount | integer | 10 | Per-listing cap on posts. |
| maxComments | integer | 10 | Per-post cap on comments (or global cap on comment-search/user-comments). |
| maxCommunitiesCount | integer | 2 | Cap on communities returned from search or leaderboard. |
| maxUserCount | integer | 2 | Cap on user profiles returned from search. |
Advanced
| Parameter | Type | Default | Description |
|---|---|---|---|
| proxy | object | Apify Residential | Apify proxy or your own proxy URLs. Residential is strongly recommended. |
| debugMode | boolean | false | Verbose Crawlee logging. |
How It Works
The actor sends unauthenticated, API-client-style HTTP requests to reddit.com/*.json using a descriptive non-browser User-Agent. Reddit's anonymous JSON endpoints reject Chrome-like UAs without browser cookies, so Crawlee's automatic browser-fingerprint header injection is explicitly disabled. This keeps unauthenticated rate limits at their generous default (~100 requests/min) instead of falling back to the strict ~10/min anti-bot tier.
Comment trees are walked depth-first up to maxComments. Collapsed "more" stubs are expanded by POSTing to /api/morechildren.json in batches of 100 children, so there is no extra request per comment.
The crawler aborts as soon as maxItems is hit, so overruns are not a concern even with deep trees.
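The depth-first walk and the 100-child batching can be illustrated on Reddit's comment listing shape (a Listing whose children are t1 comments or "more" stubs). This is a sketch, not the actor's code: walk_listing and batches are hypothetical names, no network calls are made, and the sample listing is hand-built.

```python
from typing import Any, Dict, List, Tuple


def walk_listing(listing: Dict[str, Any], max_comments: int) -> Tuple[List[Dict[str, str]], List[str]]:
    """Walk a comment listing depth-first; return (comments, ids found in "more" stubs)."""
    comments: List[Dict[str, str]] = []
    more_ids: List[str] = []

    def visit(node: Dict[str, Any]) -> None:
        for child in node.get("data", {}).get("children", []):
            if len(comments) >= max_comments:
                return                                   # cap reached, stop descending
            kind, data = child.get("kind"), child.get("data", {})
            if kind == "t1":
                comments.append({"id": "t1_" + data["id"], "parentId": data.get("parent_id", "")})
                replies = data.get("replies")
                if isinstance(replies, dict):            # Reddit sends "" when there are no replies
                    visit(replies)                       # go deeper before moving to siblings
            elif kind == "more":
                more_ids.extend(data.get("children", []))

    visit(listing)
    return comments, more_ids


def batches(ids: List[str], size: int = 100) -> List[List[str]]:
    """Split stub ids into groups of at most `size`, one group per expansion request."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


reply = {"kind": "t1", "data": {"id": "c2", "parent_id": "t1_c1", "replies": ""}}
listing = {"kind": "Listing", "data": {"children": [
    {"kind": "t1", "data": {"id": "c1", "parent_id": "t3_p", "replies": {"kind": "Listing", "data": {"children": [reply]}}}},
    {"kind": "t1", "data": {"id": "c3", "parent_id": "t3_p", "replies": ""}},
    {"kind": "more", "data": {"children": [f"m{i}" for i in range(250)]}},
]}}
found, stubs = walk_listing(listing, max_comments=10)
print([c["id"] for c in found], [len(b) for b in batches(stubs)])
```

In this example the reply c2 is visited before its sibling-level comment c3, and 250 collapsed ids turn into three expansion requests rather than 250.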
Pricing
This actor uses pay-per-event pricing:
| Event | Price |
|---|---|
| Actor start | $0.00005 |
| Result extracted (per dataset row) | $0.005 |
You only pay for what you scrape. Apify platform compute and proxy usage are billed separately based on your plan.
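For budgeting, the two event prices in the table combine into a simple estimate. A sketch only: estimate_cost is a hypothetical helper, and it deliberately excludes Apify platform compute and proxy usage, which are billed separately.

```python
from decimal import Decimal

ACTOR_START_PRICE = Decimal("0.00005")   # per actor start, from the pricing table
RESULT_PRICE = Decimal("0.005")          # per dataset row, from the pricing table


def estimate_cost(rows: int, runs: int = 1) -> Decimal:
    """Estimate pay-per-event charges in USD, excluding compute and proxy usage."""
    return ACTOR_START_PRICE * runs + RESULT_PRICE * rows


print(estimate_cost(25_000))    # one run producing 25,000 rows
print(estimate_cost(100_000))   # one run producing 100,000 rows
```

Decimal is used instead of float so that very small per-event prices do not pick up binary rounding noise when summed.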
Integrations & Code Examples
bash (curl)

    curl -X POST \
      "https://api.apify.com/v2/acts/whoareyouanas~reddit-scraper/run-sync-get-dataset-items" \
      -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"startUrls":[{"url":"https://www.reddit.com/r/SaaS/"}],"maxItems":100}'

Python SDK

    from apify_client import ApifyClient

    client = ApifyClient("YOUR_APIFY_TOKEN")
    run = client.actor("whoareyouanas/reddit-scraper").call(run_input={
        "startUrls": [{"url": "https://www.reddit.com/r/SaaS/"}],
        "maxItems": 100,
    })
    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        print(item)

Node.js SDK

    import { ApifyClient } from 'apify-client';

    const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });
    const run = await client.actor('whoareyouanas/reddit-scraper').call({
        startUrls: [{ url: 'https://www.reddit.com/r/SaaS/' }],
        maxItems: 100,
    });
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    console.log(items);

n8n / Make.com / Zapier
Use the Apify trigger/module in n8n or Make.com and point it to the actor whoareyouanas/reddit-scraper. The structured JSON output (posts, comments, and users, with parentId reply chains) lands directly in your workflow; pipe it into Airtable, Google Sheets, Slack, or an AI summarisation step. See "Late to the Reddit API party? Here's the backdoor" for an n8n workflow example. Full integration docs: Apify integrations.
Frequently Asked Questions (FAQ)
Do I need a Reddit API key or developer account?
No. This actor uses Reddit's public reddit.com/*.json endpoints — the same data your browser reads. No API key, no app approval, no mod-gate.
Will I hit the PRAW 1K post cap?
No. The JSON endpoint paginates beyond 1K via after cursors. The actor follows all pages up to your maxItems or maxPostCount limit.
How is this different from Pushshift / PMAW / PullPush?
Pushshift is now mod-only and its shards die mid-run. This actor uses Reddit's own live data: what you see on reddit.com is what you get.
How does this avoid 403 blocks?
The actor sends non-browser HTTP requests with a non-Chrome User-Agent. Reddit's anonymous JSON endpoints serve this tier at ~100 req/min. Apify residential proxy (default) handles IP rotation for high-volume runs.
Will Reddit ban my account?
No account is used. The actor runs completely unauthenticated against public endpoints, so there's nothing to ban.
How do I scrape comment threads with reply context?
Every comment row includes a parentId field: t3_* means it's a top-level reply to the post; t1_* means it replies to another comment. The depth-first walk preserves the full tree shape.
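Given flat comment rows, the reply tree can be rebuilt from id and parentId alone. A minimal sketch; build_tree is a hypothetical helper, and comments whose parent is a post (t3_*) or is missing from the dataset are treated as roots.

```python
from typing import Any, Dict, List


def build_tree(comments: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Nest flat comment rows under their parents using parentId (hypothetical helper)."""
    # Copy each row and give it an empty replies list, indexed by fullname.
    by_id = {c["id"]: {**c, "replies": []} for c in comments}
    roots: List[Dict[str, Any]] = []
    for comment in comments:
        node = by_id[comment["id"]]
        parent = by_id.get(comment["parentId"])
        if parent is not None:
            parent["replies"].append(node)   # t1_* parent present in the dataset
        else:
            roots.append(node)               # t3_* parent, or parent not scraped
    return roots


rows = [
    {"id": "t1_a", "parentId": "t3_post", "body": "top-level"},
    {"id": "t1_b", "parentId": "t1_a", "body": "reply to a"},
    {"id": "t1_c", "parentId": "t3_post", "body": "another top-level"},
]
tree = build_tree(rows)
print([(node["id"], [r["id"] for r in node["replies"]]) for node in tree])
```

Because the index is built before any linking happens, the rows do not need to arrive in depth-first order for the nesting to come out right.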
How much does it cost to scrape 100K results?
~$500 at $0.005/result + Apify compute charges. Compare to $50/mo Reddit lead-gen SaaS tools or Bright Data's $500+/mo enterprise contracts.
Can I integrate with n8n / Zapier / Make?
Yes, with full webhook and native integrations support. See the Code Examples section above.
Limitations
- Comment search returns parent posts only (Reddit's API behavior); the actor enqueues those posts so their comment trees are still scraped. Treat it as best-effort.
- Removed/deleted posts return a 404 envelope; they're logged and skipped without retry.
- Login-walled content (private subreddits, NSFW-locked content for unauth) is not accessible via the JSON API and is silently skipped.
Was this scraper useful?
If this scraper saved you time on Reddit research, lead gen, or data collection, please leave a review on the Apify Store. Reviews are the single biggest visibility lever — and they help other buyers find this actor over less capable alternatives.