Reddit Scraper - Posts, Comments & Search (No API Key, JSON/CSV)


Pricing

from $5.00 / 1,000 results


Scrape any subreddit, post, comment thread, or keyword search — no Reddit API key, no PRAW 1K cap, no Pushshift mod-gate. Bulk-export posts + nested comments to JSON/CSV for lead gen, RAG, monitoring.


Rating: 0.0 (0 reviews)

Developer: Anas Nadeem

Maintained by Community

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 13 days ago


Reddit Scraper — Posts, Comments & Search (No API Key, JSON/CSV)

Built for lead-gen scrapes, RAG pipelines, and brand-monitoring jobs that the locked-down Reddit API can't handle. No PRAW 1K cap. No Pushshift mod-gate. Pay only for what you extract.

Scrape Reddit at scale — posts, comments, communities (subreddits), and user profiles. Works by direct URL or keyword search, supports nested comment trees, NSFW + date filters, and global item caps. No Reddit account or API key needed.

Why this scraper, not PRAW? PRAW caps you at ~1K posts per query and dies on Reddit's anonymous rate limits. Pushshift has been mod-only since 2023. This actor walks every comment tree depth-first, expands collapsed "more" stubs in batched calls of up to 100 children each, and uses non-browser HTTP requests that keep you on the generous ~100 req/min tier.

What does Reddit Scraper do?

This actor pulls structured data from Reddit's public JSON API. Drop in any Reddit URL — a subreddit, post, user profile, or search results — and it returns clean rows ready for analytics, monitoring, or LLM ingestion. You can also run a keyword search across posts, comments, communities, and users.

It runs on a lightweight HTTP path (no browser), so it's fast and cheap. Comment trees are walked depth-first, and collapsed "more" stubs are expanded against /api/morechildren automatically.
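A minimal sketch of what a depth-first walk over Reddit's comment JSON can look like, assuming the standard listing shape (t1 comment nodes with a nested replies listing, and kind "more" stubs carrying a children list of ids). The function walk_comments and the sample data are hypothetical illustrations, not the actor's source.

```python
from typing import Any, Iterator

# Hypothetical sample in the shape of a post's comment listing (second element
# of the array that reddit.com/r/<sub>/comments/<id>/.json returns).
SAMPLE_CHILDREN: list[dict[str, Any]] = [
    {"kind": "t1", "data": {"name": "t1_a", "parent_id": "t3_p", "replies": {
        "kind": "Listing", "data": {"children": [
            {"kind": "t1", "data": {"name": "t1_b", "parent_id": "t1_a", "replies": ""}},
            {"kind": "more", "data": {"parent_id": "t1_a", "children": ["c", "d"]}},
        ]}}}},
    {"kind": "t1", "data": {"name": "t1_e", "parent_id": "t3_p", "replies": ""}},
]


def walk_comments(children: list[dict[str, Any]], depth: int = 0) -> Iterator[tuple[str, Any, int]]:
    """Yield ("comment", data, depth) for t1 nodes and ("more", ids, depth) for stubs."""
    for child in children:
        kind = child.get("kind")
        data = child.get("data", {})
        if kind == "t1":
            yield "comment", data, depth
            replies = data.get("replies")
            # Reddit sends an empty string instead of a listing when there are no replies.
            if isinstance(replies, dict):
                yield from walk_comments(replies["data"]["children"], depth + 1)
        elif kind == "more":
            # Collapsed stub: these ids still need an /api/morechildren call.
            yield "more", data.get("children", []), depth


for kind, payload, depth in walk_comments(SAMPLE_CHILDREN):
    label = payload["name"] if kind == "comment" else ",".join(payload)
    print("  " * depth + f"{kind}: {label}")
```

Each reply is emitted before its siblings' subtrees, which is what keeps parent/child order intact in a flat dataset.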

Key Features

  • Multiple input modes — Start URLs, keyword search, or leaderboard fallback (popular subreddits)
  • Mixed inputs in one run — Combine subreddit URLs, post URLs, and user profiles freely
  • Full comment trees — Walks nested replies and expands collapsed branches via /api/morechildren
  • 4 result categories — Posts (t3), comments (t1), communities (t5), and users (t2)
  • Granular limits — Per-category caps (maxPostCount, maxComments, maxCommunitiesCount, maxUserCount) plus a global maxItems ceiling
  • Date and NSFW filters: postDateLimit, commentDateLimit, includeNSFW
  • Skip toggles: skipComments, skipUserPosts, and skipCommunity for narrower runs
  • Apify residential proxy — Recommended for production; defaults are pre-wired

Who is this Reddit scraper for?

  • SaaS founders monitoring brand mentions — Track every mention of your product, competitors, and target keywords across 20+ subreddits. Replace $50/mo Reddit lead-gen SaaS tools with a one-off scrape costing cents (r/SaaS).
  • GTM engineers building lead-gen pipelines — Use as the Reddit data primitive in Clay, n8n, or Make workflows. Bypasses the PRAW rate limits and 1K-post caps that break automated pipelines (r/gtmengineering).
  • Researchers blocked by ghosted API applications — Reddit API applications go unanswered for weeks. This actor uses Reddit's public JSON endpoints — no developer app approval, no mod-gate, no Pushshift dependency (r/redditdev).
  • RAG / LLM teams needing nested comment trees — Flattened comment lists lose reply context. Every comment output row includes a parentId field (t3_* for top-level, t1_* for replies), preserving thread shape for vector-store ingestion (r/apify).
  • Growth marketers and trend miners — Scrape entire subreddits for competitor tracking, AI-tool trend analysis, or dropshipping niche discovery. A 25K-comment scrape costs ~$125; the insight compounds (r/Entrepreneur).

Input Modes

The actor picks one of three modes based on what you provide:

  1. Start URLs (preferred) — When startUrls is non-empty, every other input mode is ignored.
  2. Search — When startUrls is empty but searches has at least one query.
  3. Leaderboard — When neither is set, the actor falls back to scraping r/popular's top communities.
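The precedence can be expressed as a small decision function. This is a hypothetical sketch (pick_mode is not part of the actor), and it additionally assumes that ignoreStartUrls, described under Input Parameters, makes the actor behave as if startUrls were empty.

```python
from typing import Any


def pick_mode(run_input: dict[str, Any]) -> str:
    """Hypothetical: mirror the documented startUrls > searches > leaderboard order."""
    start_urls = run_input.get("startUrls") or []
    if run_input.get("ignoreStartUrls"):
        start_urls = []  # assumption: the flag simply hides the URL list
    if start_urls:
        return "startUrls"  # 1. any URL wins; searches are ignored
    if run_input.get("searches"):
        return "search"  # 2. at least one keyword query
    return "leaderboard"  # 3. fall back to r/popular's top communities


print(pick_mode({"startUrls": [], "searches": ["crm software"]}))
```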

Supported URL Shapes

| URL pattern | What gets scraped |
|---|---|
| reddit.com/r/<sub>/ | Subreddit posts (sort/time honored), plus optional community metadata and optional comments per post |
| reddit.com/r/<sub>/comments/<id>/ | Single post + its comment tree |
| reddit.com/user/<name>/ | User profile + their submitted posts + their comment history |
| reddit.com/search?q=... | Keyword search (post / comment / sr / user, depending on flags) |
| reddit.com/r/<sub>/search?q=... | Search restricted to one subreddit |

old.reddit.com and www.reddit.com are both accepted; URLs are normalized internally.
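To check ahead of time which shape a URL falls into, a sketch like the following classifies it and rewrites old.reddit.com or bare reddit.com hosts to www.reddit.com. The patterns mirror the table above; the shape names and the exact normalization rule are assumptions, not the actor's internal logic.

```python
import re
from urllib.parse import urlsplit

REDDIT_HOSTS = {"reddit.com", "www.reddit.com", "old.reddit.com"}

# Order matters: more specific paths must be tried before /r/<sub>/.
URL_SHAPES = [
    ("post", re.compile(r"^/r/[^/]+/comments/[a-z0-9]+", re.IGNORECASE)),
    ("subreddit_search", re.compile(r"^/r/[^/]+/search/?$")),
    ("subreddit", re.compile(r"^/r/[^/]+/?")),
    ("user", re.compile(r"^/user/[^/]+/?")),
    ("search", re.compile(r"^/search/?$")),
]


def classify(url: str) -> tuple[str, str]:
    """Return (shape, normalized_url) for a supported Reddit URL."""
    parts = urlsplit(url if "://" in url else "https://" + url)
    if parts.netloc.lower() not in REDDIT_HOSTS:
        raise ValueError(f"not a reddit.com URL: {url}")
    normalized = "https://www.reddit.com" + parts.path
    if parts.query:
        normalized += "?" + parts.query
    for shape, pattern in URL_SHAPES:
        if pattern.match(parts.path):
            return shape, normalized
    raise ValueError(f"unsupported Reddit URL shape: {url}")


print(classify("old.reddit.com/r/SaaS/search?q=crm"))
```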

Output Data

Every dataset row carries a dataType discriminator so you can split them downstream.
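For example, once a run's rows are downloaded (for instance with iterate_items() from the Python client shown under Integrations & Code Examples), they can be grouped on dataType and each group written to its own CSV. A standard-library sketch; the helper names split_by_type and to_csv are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Any, Iterable


def split_by_type(rows: Iterable[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Group dataset rows on the dataType discriminator ("post", "comment", ...)."""
    groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[row.get("dataType", "unknown")].append(row)
    return dict(groups)


def to_csv(rows: list[dict[str, Any]]) -> str:
    """Serialize one group to CSV, using the union of keys as columns."""
    fieldnames: list[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)  # keep first-seen column order
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)  # missing keys become ""
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = [
    {"dataType": "post", "id": "t3_a", "title": "Hello"},
    {"dataType": "comment", "id": "t1_b", "parentId": "t3_a", "body": "Hi"},
]
for data_type, group in split_by_type(rows).items():
    print(data_type, to_csv(group), sep="\n")
```

Building the column list per group keeps post-only fields (such as title) out of the comment CSV and vice versa.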

Post (dataType: "post")

| Field | Type | Description |
|---|---|---|
| id | string | Reddit fullname (t3_xxx) |
| parsedId | string | Base-36 id without prefix |
| url | string | Permalink to the post (or external URL for link posts) |
| username | string | Author |
| title | string | Post title |
| communityName | string | r/<subreddit> |
| parsedCommunityName | string | Subreddit name without r/ prefix |
| body | string | Self-text (or external URL for link posts) |
| html | string | Rendered HTML for self-text |
| numberOfComments | number | num_comments from Reddit |
| upVotes | number | Score |
| authorFlair | string or null | Author flair text |
| isVideo | boolean | True for video posts |
| isAd | boolean | True for promoted/ad posts |
| over18 | boolean | NSFW flag |
| createdAt | string | ISO 8601 |
| scrapedAt | string | ISO 8601 |
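The id and parsedId columns are related by Reddit's fullname convention: a type prefix (t1 comment, t2 account, t3 post/link, t5 subreddit), an underscore, and a base-36 id. A hypothetical helper to split and decode one, handy if you want an integer key or need to tell row types apart from a parentId alone:

```python
# Prefix-to-dataType mapping follows Reddit's fullname type prefixes.
KIND_BY_PREFIX = {"t1": "comment", "t2": "user", "t3": "post", "t5": "community"}


def parse_fullname(fullname: str) -> tuple[str, str, int]:
    """Split e.g. "t3_1t16uqd" into (dataType, parsedId, integer id)."""
    prefix, sep, base36_id = fullname.partition("_")
    if not sep or not base36_id or prefix not in KIND_BY_PREFIX:
        raise ValueError(f"unrecognised Reddit fullname: {fullname!r}")
    # int() accepts base 36 directly (digits 0-9, then letters a-z).
    return KIND_BY_PREFIX[prefix], base36_id, int(base36_id, 36)


print(parse_fullname("t3_1t16uqd"))
```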

Comment (dataType: "comment")

| Field | Type | Description |
|---|---|---|
| id | string | t1_xxx |
| parsedId | string | Base-36 id |
| url | string | Permalink to the comment |
| parentId | string | Parent fullname (t3_* for top-level, t1_* for replies) |
| username | string | Author |
| authorFlair | string or null | Flair text |
| category | string | Subreddit name |
| communityName | string | r/<subreddit> |
| body | string | Comment text (markdown) |
| html | string | Rendered HTML |
| upVotes | number | Score |
| numberOfReplies | number | Recursive count of t1 replies underneath |
| createdAt | string | ISO 8601 |
| scrapedAt | string | ISO 8601 |
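Because every comment row carries its parentId, the thread can be rebuilt after export. A sketch assuming rows use the id and parentId fields exactly as documented in the table above; the helper names are hypothetical. Reply counts computed this way only cover comments that were actually scraped (for example within maxComments), so they can differ from numberOfReplies.

```python
from collections import defaultdict
from typing import Any


def build_children(comments: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Map each parent fullname (t3_* post or t1_* comment) to its direct replies."""
    children: dict[str, list[str]] = defaultdict(list)
    for comment in comments:
        children[comment["parentId"]].append(comment["id"])
    return dict(children)


def count_replies(children: dict[str, list[str]], node_id: str) -> int:
    """Count all replies under node_id, recursively, with an explicit stack."""
    total, stack = 0, list(children.get(node_id, []))
    while stack:
        current = stack.pop()
        total += 1
        stack.extend(children.get(current, []))
    return total


comments = [
    {"id": "t1_a", "parentId": "t3_p"},
    {"id": "t1_b", "parentId": "t1_a"},
    {"id": "t1_c", "parentId": "t1_b"},
    {"id": "t1_d", "parentId": "t3_p"},
]
tree = build_children(comments)
print(tree["t3_p"], count_replies(tree, "t1_a"))
```

The explicit stack avoids Python's recursion limit on unusually deep threads.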

Community (dataType: "community")

| Field | Type | Description |
|---|---|---|
| id | string | t5_xxx |
| name | string | Display name (no r/ prefix) |
| title | string | Long-form community title |
| headerImage | string | Banner / header image URL |
| description | string | Public description |
| over18 | boolean | NSFW community flag |
| numberOfMembers | number | Subscribers |
| url | string | Absolute permalink |
| createdAt | string | ISO 8601 |
| scrapedAt | string | ISO 8601 |

User (dataType: "user")

| Field | Type | Description |
|---|---|---|
| id | string | t2_xxx |
| url | string | Profile permalink |
| username | string | Reddit handle |
| userIcon | string | Avatar URL |
| postKarma | number | Link karma |
| commentKarma | number | Comment karma |
| description | string | Profile description |
| over18 | boolean | NSFW profile flag |
| createdAt | string | ISO 8601 |
| scrapedAt | string | ISO 8601 |

Sample Output

```json
{
  "dataType": "post",
  "id": "t3_1t16uqd",
  "parsedId": "1t16uqd",
  "url": "https://www.reddit.com/r/AskReddit/comments/1t16uqd/...",
  "username": "IIlustriousTea",
  "title": "US birth rates just hit another record low...",
  "communityName": "r/AskReddit",
  "parsedCommunityName": "AskReddit",
  "body": "",
  "html": "",
  "numberOfComments": 8892,
  "upVotes": 7657,
  "authorFlair": null,
  "isVideo": false,
  "isAd": false,
  "over18": false,
  "createdAt": "2026-05-01T21:40:45.000Z",
  "scrapedAt": "2026-05-02T05:53:19.442Z"
}
```

Input Parameters

Direct URLs

| Parameter | Type | Default | Description |
|---|---|---|---|
| startUrls | array | [] | Reddit URLs to scrape. Mix any of: subreddit, post, user, or search URLs. |
| ignoreStartUrls | boolean | false | Force-bypass the URLs field (helpful for tools like Zapier). |

Search

| Parameter | Type | Default | Description |
|---|---|---|---|
| searches | string[] | [] | Keywords to search. Used only when startUrls is empty. |
| searchCommunityName | string | "" | Restrict every search to one subreddit. |
| searchPosts | boolean | true | Include posts in search results. |
| searchComments | boolean | false | Include comments (best-effort; Reddit's comment search returns parent posts). |
| searchCommunities | boolean | false | Include matching communities. |
| searchUsers | boolean | false | Include matching user profiles. |
| sort | enum | new | relevance / hot / top / new / rising / comments. |
| time | enum | "" | all / hour / day / week / month / year. Most useful with sort=top. |
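Reddit's own search pages take these options as query parameters: q, sort, t (the time window), and restrict_sr to limit a search to one subreddit. The sketch below builds start URLs of the two search shapes listed under Supported URL Shapes. The helper is hypothetical, and how the actor maps searches, sort, time, and searchCommunityName onto these parameters internally is an assumption here.

```python
from urllib.parse import quote, urlencode


def search_start_url(query: str, community: str = "", sort: str = "new", time: str = "") -> str:
    """Build a reddit.com search URL, optionally restricted to one subreddit."""
    params = {"q": query, "sort": sort}
    if time:
        params["t"] = time  # all / hour / day / week / month / year
    if community:
        params["restrict_sr"] = "1"  # search only inside r/<community>
        base = f"https://www.reddit.com/r/{quote(community, safe='')}/search"
    else:
        base = "https://www.reddit.com/search"
    return f"{base}?{urlencode(params)}"


print(search_start_url("crm tools", community="SaaS", sort="top", time="week"))
```

The resulting string can be passed as {"url": ...} inside startUrls.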

Filters

| Parameter | Type | Default | Description |
|---|---|---|---|
| includeNSFW | boolean | true | Include adult-rated posts and subreddits. |
| skipComments | boolean | false | Don't scrape comments when going through posts. |
| skipUserPosts | boolean | false | Don't scrape a user's submitted posts when going through their profile. |
| skipCommunity | boolean | false | Don't push community metadata when going through a subreddit. |
| postDateLimit | ISO date | (none) | Only keep posts created after this date. |
| commentDateLimit | ISO date | (none) | Only keep comments created after this date. |
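The date limits filter on creation time, which every row exposes as createdAt. If you want to re-apply or tighten a cutoff on rows you have already exported, a sketch like this works. Treating a date-only limit as midnight UTC and using a strict "after" comparison are assumptions about the actor's semantics; the helpers are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any


def parse_iso(value: str) -> datetime:
    """Parse ISO 8601 strings such as "2026-05-01T21:40:45.000Z" or "2026-05-01"."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Assumption: a date-only or naive limit means UTC.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def created_after(rows: list[dict[str, Any]], limit: str) -> list[dict[str, Any]]:
    """Keep rows whose createdAt is strictly later than the limit."""
    cutoff = parse_iso(limit)
    return [row for row in rows if parse_iso(row["createdAt"]) > cutoff]


rows = [
    {"id": "t3_new", "createdAt": "2026-05-01T21:40:45.000Z"},
    {"id": "t3_old", "createdAt": "2026-04-30T10:00:00.000Z"},
]
print([row["id"] for row in created_after(rows, "2026-05-01")])
```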

Limits

| Parameter | Type | Default | Description |
|---|---|---|---|
| maxItems | integer | 10 | Hard global cap on dataset rows across all categories. |
| maxPostCount | integer | 10 | Per-listing cap on posts. |
| maxComments | integer | 10 | Per-post cap on comments (or global cap on comment-search/user-comments). |
| maxCommunitiesCount | integer | 2 | Cap on communities returned from search or leaderboard. |
| maxUserCount | integer | 2 | Cap on user profiles returned from search. |
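For a subreddit URL the per-category caps multiply (posts times comments per post), and maxItems clips the total. A back-of-the-envelope upper bound, assuming one community row per subreddit, maxPostCount posts, and up to maxComments comments per post. The helper is hypothetical, and real runs usually return fewer rows.

```python
def max_rows_for_subreddit(
    max_post_count: int = 10,
    max_comments: int = 10,
    max_items: int = 10,
    skip_comments: bool = False,
    skip_community: bool = False,
) -> int:
    """Upper bound on dataset rows for one subreddit start URL (defaults from the table)."""
    community_rows = 0 if skip_community else 1  # assumption: one community row
    comment_rows = 0 if skip_comments else max_post_count * max_comments
    return min(max_items, community_rows + max_post_count + comment_rows)


print(max_rows_for_subreddit(), max_rows_for_subreddit(max_items=1000))
```

With the defaults this is min(10, 1 + 10 + 10 × 10) = 10, so the global maxItems of 10 is the binding limit until you raise it.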

Advanced

| Parameter | Type | Default | Description |
|---|---|---|---|
| proxy | object | Apify Residential | Apify proxy or your own proxy URLs. Residential is strongly recommended. |
| debugMode | boolean | false | Verbose Crawlee logging. |

How It Works

The actor sends plain, unauthenticated HTTP requests to reddit.com/*.json using a descriptive, non-browser User-Agent. Reddit's anonymous JSON endpoints reject Chrome-like User-Agents that arrive without browser cookies, so we explicitly disable Crawlee's automatic browser-fingerprint header injection. This keeps unauthenticated requests at the generous default rate limit (~100 requests/min) instead of falling back to the strict ~10/min anti-bot tier.
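A sketch of the same idea with Python's standard library: a request that carries only a descriptive User-Agent, plus a loop that follows a listing's after cursor. The User-Agent follows Reddit's recommended "platform:app ID:version (by /u/username)" convention, but the string itself is a placeholder, not the actor's. fetch is injected so the logic can be tested offline; retries and rate limiting are deliberately left out, and all names here are hypothetical.

```python
from typing import Any, Callable, Iterator, Optional
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder UA in Reddit's "<platform>:<app ID>:<version> (by /u/<user>)" style.
USER_AGENT = "python:example.reddit.reader:v0.1 (by /u/your_username)"


def listing_request(subreddit: str, after: Optional[str] = None, limit: int = 100) -> Request:
    """Build a GET request for one page of r/<subreddit>/new.json."""
    params: dict[str, Any] = {"limit": limit}
    if after:
        params["after"] = after  # fullname of the last item on the previous page
    url = f"https://www.reddit.com/r/{subreddit}/new.json?{urlencode(params)}"
    # Only a plain, non-browser User-Agent: no cookies, no browser fingerprint headers.
    return Request(url, headers={"User-Agent": USER_AGENT})


def paginate(fetch: Callable[[Request], dict[str, Any]], subreddit: str, max_posts: int) -> Iterator[dict[str, Any]]:
    """Yield post data dicts page by page until max_posts or the cursor runs out."""
    after: Optional[str] = None
    seen = 0
    while seen < max_posts:
        listing = fetch(listing_request(subreddit, after))
        for child in listing["data"]["children"]:
            if seen >= max_posts:
                return
            yield child["data"]
            seen += 1
        after = listing["data"].get("after")
        if not after:
            return  # Reddit sets "after" to null on the last page


# Real use (network access required):
# import json, urllib.request
# fetch = lambda req: json.load(urllib.request.urlopen(req))
# for post in paginate(fetch, "SaaS", 250): print(post["title"])
```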

Comment trees are walked depth-first up to maxComments. Collapsed "more" stubs are expanded by POSTing to /api/morechildren.json in batches of up to 100 children, so there is no extra request per comment.
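Batching the collapsed ids is plain list slicing. The sketch below only builds the form-encoded bodies; api_type, link_id, and children are the parameter names Reddit documents for /api/morechildren, while the helper itself is hypothetical. A morechildren response can itself contain further "more" stubs, so a real crawler keeps looping until none remain or maxComments is reached.

```python
from urllib.parse import urlencode

BATCH_SIZE = 100  # children expanded per call, as described above


def morechildren_bodies(link_fullname: str, child_ids: list[str], batch_size: int = BATCH_SIZE) -> list[str]:
    """Split collapsed comment ids into form bodies of at most batch_size ids each."""
    bodies = []
    for start in range(0, len(child_ids), batch_size):
        batch = child_ids[start:start + batch_size]
        bodies.append(urlencode({
            "api_type": "json",
            "link_id": link_fullname,  # fullname of the post, e.g. t3_...
            "children": ",".join(batch),  # comma-separated base-36 comment ids
        }))
    return bodies


bodies = morechildren_bodies("t3_1t16uqd", [f"c{i}" for i in range(250)])
print(len(bodies))  # 3
```

250 collapsed ids therefore cost three calls rather than 250.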

The crawler aborts as soon as maxItems is hit, so over-runs are not a concern even with deep trees.
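One way to get that behavior is to let the sink decide when to stop and keep the producer lazy, so nothing past the cap is even requested. A hypothetical sketch with a generator standing in for the crawler; it is not the actor's Crawlee code.

```python
from typing import Any, Iterable, Iterator


class CappedDataset:
    """Stand-in for the run's dataset that refuses rows past max_items."""

    def __init__(self, max_items: int) -> None:
        self.max_items = max_items
        self.rows: list[dict[str, Any]] = []

    @property
    def full(self) -> bool:
        return len(self.rows) >= self.max_items

    def push(self, row: dict[str, Any]) -> None:
        if not self.full:
            self.rows.append(row)


def drain(source: Iterable[dict[str, Any]], sink: CappedDataset) -> None:
    """Pull rows lazily and stop as soon as the cap is reached."""
    if sink.full:
        return
    for row in source:
        sink.push(row)
        if sink.full:
            break  # the source is never resumed, so nothing else is fetched


def fake_crawl(n: int) -> Iterator[dict[str, Any]]:
    """Hypothetical producer; imagine each yield costing an HTTP request."""
    for i in range(n):
        yield {"id": f"t1_{i}"}


sink = CappedDataset(max_items=10)
drain(fake_crawl(10_000), sink)
print(len(sink.rows))
```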

Pricing

This actor uses pay-per-event pricing:

| Event | Price |
|---|---|
| Actor start | $0.00005 |
| Result extracted (per dataset row) | $0.005 |
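The per-event charges are linear in the number of dataset rows. A small hypothetical helper using Decimal to avoid float rounding; it covers only the two events above, not Apify compute or proxy usage.

```python
from decimal import Decimal

START_PRICE = Decimal("0.00005")  # per actor start
RESULT_PRICE = Decimal("0.005")  # per dataset row


def event_cost(results: int, starts: int = 1) -> Decimal:
    """Pay-per-event charges only; Apify compute and proxy usage are billed on top."""
    return START_PRICE * starts + RESULT_PRICE * results


print(event_cost(1_000), event_cost(25_000), event_cost(100_000))
```

1,000 results come to $5 plus the start fee, and 25,000 comments to $125.00005, matching the ~$125 figure quoted earlier.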

You only pay for what you scrape. Apify platform compute and proxy usage are billed separately based on your plan.

Integrations & Code Examples

bash (curl)

```bash
curl -X POST \
  "https://api.apify.com/v2/acts/whoareyouanas~reddit-scraper/run-sync-get-dataset-items" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"startUrls":[{"url":"https://www.reddit.com/r/SaaS/"}],"maxItems":100}'
```

Python SDK

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("whoareyouanas/reddit-scraper").call(run_input={
    "startUrls": [{"url": "https://www.reddit.com/r/SaaS/"}],
    "maxItems": 100,
})

# Stream every row from the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
```

Node.js SDK

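```javascript
// ES module syntax: run as a .mjs file or with "type": "module" in package.json.
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('whoareyouanas/reddit-scraper').call({
    startUrls: [{ url: 'https://www.reddit.com/r/SaaS/' }],
    maxItems: 100,
});

// Fetch the rows from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```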

n8n / Make.com / Zapier

Use the Apify trigger/module in n8n or Make.com and point it at the actor whoareyouanas/reddit-scraper. The structured JSON output (posts, communities, users, and comments with parentId reply chains) lands directly in your workflow; pipe it into Airtable, Google Sheets, Slack, or an AI summarisation step. See "Late to the Reddit API party? Here's the backdoor" for an n8n workflow example. Full integration docs: Apify integrations.

Frequently Asked Questions (FAQ)

Do I need a Reddit API key or developer account? No. This actor uses Reddit's public reddit.com/*.json endpoints — the same data your browser reads. No API key, no app approval, no mod-gate.

Will I hit the PRAW 1K post cap? No. The JSON endpoint paginates beyond 1K via after cursors. The actor follows all pages up to your maxItems or maxPostCount limit.

How is this different from Pushshift / PMAW / PullPush? Pushshift is now mod-only and its shards die mid-run. This actor uses Reddit's own live data — what you see on reddit.com is what you get.

How does this avoid 403 blocks? The actor sends non-browser HTTP requests with a non-Chrome User-Agent. Reddit's anonymous JSON endpoints serve this tier at ~100 req/min. Apify residential proxy (default) handles IP rotation for high-volume runs.

Will Reddit ban my account? No account is used. The actor runs completely unauthenticated against public endpoints — there's nothing to ban.

How do I scrape comment threads with reply context? Every comment row includes a parentId field: t3_* means it's a top-level reply to the post; t1_* means it replies to another comment. The depth-first walk preserves the full tree shape.

How much does it cost to scrape 100K results? ~$500 at $0.005/result + Apify compute charges. Compare to $50/mo Reddit lead-gen SaaS tools or Bright Data's $500+/mo enterprise contracts.

Can I integrate with n8n / Zapier / Make? Yes: full webhook and native integration support. See the Integrations & Code Examples section above.

Limitations

  • Comment search returns parent posts only (Reddit's API behavior); the actor enqueues those posts so their comment trees are still scraped. Treat it as best-effort.
  • Removed/deleted posts return a 404 envelope; they're logged and skipped without retry.
  • Login-walled content (private subreddits, NSFW-locked content for unauthenticated requests) is not accessible via the JSON API and is silently skipped.

Was this scraper useful?

If this scraper saved you time on Reddit research, lead gen, or data collection, please leave a review on the Apify Store. Reviews are the single biggest visibility lever — and they help other buyers find this actor over less capable alternatives.