Reddit Scraper - Posts, Comments, Communities & Users

Scrape Reddit posts, comments, subreddits, and user profiles by URL or keyword search. No login required. Full comment trees, NSFW + date filters, pay only for what you scrape ($0.005 per result).

Pricing: from $5.00 / 1,000 results
Developer: Anas Nadeem (Maintained by Community)
Last modified: 4 days ago

Reddit Scraper — Posts, Comments, Communities & Users

Scrape Reddit at scale — posts, comments, communities (subreddits), and user profiles. Works by direct URL or keyword search, supports nested comment trees, NSFW + date filters, and global item caps. No Reddit account or API key needed.

What does Reddit Scraper do?

This actor pulls structured data from Reddit's public JSON API. Drop in any Reddit URL — a subreddit, post, user profile, or search results — and it returns clean rows ready for analytics, monitoring, or LLM ingestion. You can also run a keyword search across posts, comments, communities, and users.

It runs on a lightweight HTTP path (no browser), so it's fast and cheap. Comment trees are walked depth-first, and collapsed "more" stubs are expanded against /api/morechildren automatically.
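To illustrate the no-browser path: every public Reddit page is mirrored as JSON by appending `.json` to its path, which is what a lightweight HTTP scraper fetches. A minimal sketch (the `to_json_url` helper is hypothetical, not part of the actor):

```python
from urllib.parse import urlsplit, urlunsplit

def to_json_url(url: str) -> str:
    """Turn a public Reddit page URL into its JSON-endpoint equivalent."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    # Reddit serves the same listing as machine-readable JSON at <path>.json
    return urlunsplit((parts.scheme, parts.netloc, path + ".json", parts.query, ""))

print(to_json_url("https://www.reddit.com/r/AskReddit/"))
# https://www.reddit.com/r/AskReddit.json
```

The query string is preserved, so search URLs convert the same way.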

Key Features

  • Multiple input modes — Start URLs, keyword search, or leaderboard fallback (popular subreddits)
  • Mixed inputs in one run — Combine subreddit URLs, post URLs, and user profiles freely
  • Full comment trees — Walks nested replies and expands collapsed branches via /api/morechildren
  • 4 result categories — Posts (t3), comments (t1), communities (t5), and users (t2)
  • Granular limits — Per-category caps (maxPostCount, maxComments, maxCommunitiesCount, maxUserCount) plus a global maxItems ceiling
  • Date and NSFW filters — postDateLimit, commentDateLimit, includeNSFW
  • Skip toggles — skipComments, skipUserPosts, skipCommunity for narrower runs
  • Apify residential proxy — Recommended for production; defaults are pre-wired

Input Modes

The actor picks one of three modes based on what you provide:

  1. Start URLs (preferred) — When startUrls is non-empty, every other input mode is ignored.
  2. Search — When startUrls is empty but searches has at least one query.
  3. Leaderboard — When neither is set, the actor falls back to scraping r/popular's top communities.
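The precedence above can be sketched as a tiny selection function (an illustration of the documented behavior, not the actor's source):

```python
def pick_mode(start_urls: list, searches: list) -> str:
    """Mirror the actor's documented input-mode precedence."""
    if start_urls:
        return "start-urls"   # all other input modes are ignored
    if searches:
        return "search"
    return "leaderboard"      # fall back to r/popular's top communities

print(pick_mode([], ["mechanical keyboards"]))
# search
```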

Supported URL Shapes

| URL pattern | What gets scraped |
| --- | --- |
| `reddit.com/r/<sub>/` | Subreddit posts (sort/time honored), optional community-about, optional comments per post |
| `reddit.com/r/<sub>/comments/<id>/` | Single post + its comment tree |
| `reddit.com/user/<name>/` | User profile + their submitted posts + their comment history |
| `reddit.com/search?q=...` | Keyword search (post / comment / sr / user, depending on flags) |
| `reddit.com/r/<sub>/search?q=...` | Search restricted to one subreddit |

old.reddit.com and www.reddit.com are both accepted; URLs are normalized internally.
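A rough classifier for the URL shapes in the table might look like the following (a sketch with hypothetical labels and regexes; the actor's internal routing is not published):

```python
import re

# Order matters: more specific shapes are checked before broader ones.
PATTERNS = [
    ("post",       re.compile(r"/r/[^/]+/comments/[^/]+")),
    ("sub-search", re.compile(r"/r/[^/]+/search")),
    ("subreddit",  re.compile(r"/r/[^/]+/?$")),
    ("user",       re.compile(r"/(?:user|u)/[^/]+")),
    ("search",     re.compile(r"/search")),
]

def classify(url: str) -> str:
    """Map a Reddit URL onto one of the supported shapes."""
    # Normalize old.reddit.com / www.reddit.com to a bare path, drop the query
    path = re.sub(r"^https?://(www\.|old\.)?reddit\.com", "", url).split("?")[0]
    for label, pattern in PATTERNS:
        if pattern.search(path):
            return label
    return "unknown"

print(classify("https://old.reddit.com/r/python/"))
# subreddit
```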

Output Data

Every dataset row carries a dataType discriminator so you can split them downstream.
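Splitting rows on that discriminator downstream is a one-liner per bucket; a minimal sketch:

```python
from collections import defaultdict

def split_by_type(rows):
    """Bucket dataset rows by their dataType discriminator."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row.get("dataType", "unknown")].append(row)
    return buckets

rows = [
    {"dataType": "post", "id": "t3_a"},
    {"dataType": "comment", "id": "t1_b"},
    {"dataType": "post", "id": "t3_c"},
]
buckets = split_by_type(rows)
# buckets["post"] and buckets["comment"] can now feed separate pipelines
```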

Post (dataType: "post")

| Field | Type | Description |
| --- | --- | --- |
| `id` | string | Reddit fullname (`t3_xxx`) |
| `parsedId` | string | Base-36 id without prefix |
| `url` | string | Permalink to the post (or external URL for link posts) |
| `username` | string | Author |
| `title` | string | Post title |
| `communityName` | string | `r/<subreddit>` |
| `parsedCommunityName` | string | Subreddit name without `r/` prefix |
| `body` | string | Self-text (or external URL for link posts) |
| `html` | string | Rendered HTML for self-text |
| `numberOfComments` | number | `num_comments` from Reddit |
| `upVotes` | number | Score |
| `authorFlair` | string \| null | Author flair text |
| `isVideo` | boolean | True for video posts |
| `isAd` | boolean | True for promoted/ad posts |
| `over18` | boolean | NSFW flag |
| `createdAt` | string | ISO 8601 |
| `scrapedAt` | string | ISO 8601 |

Comment (dataType: "comment")

| Field | Type | Description |
| --- | --- | --- |
| `id` | string | `t1_xxx` |
| `parsedId` | string | Base-36 id |
| `url` | string | Permalink to the comment |
| `parentId` | string | Parent fullname (`t3_*` for top-level, `t1_*` for replies) |
| `username` | string | Author |
| `authorFlair` | string \| null | Flair text |
| `category` | string | Subreddit name |
| `communityName` | string | `r/<subreddit>` |
| `body` | string | Comment text (markdown) |
| `html` | string | Rendered HTML |
| `upVotes` | number | Score |
| `numberOfReplies` | number | Recursive count of `t1` replies underneath |
| `createdAt` | string | ISO 8601 |
| `scrapedAt` | string | ISO 8601 |
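Note that `numberOfReplies` counts every `t1` descendant, not just direct children. A sketch of that recursion over a simplified nested-replies shape (the flat `replies` list here is an assumption for illustration):

```python
def count_replies(comment: dict) -> int:
    """Recursively count all t1 descendants under a comment node."""
    total = 0
    for child in comment.get("replies", []):
        total += 1 + count_replies(child)
    return total

tree = {"id": "t1_a", "replies": [
    {"id": "t1_b", "replies": [
        {"id": "t1_c", "replies": []},
    ]},
]}
print(count_replies(tree))
# 2
```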

Community (dataType: "community")

| Field | Type | Description |
| --- | --- | --- |
| `id` | string | `t5_xxx` |
| `name` | string | Display name (no `r/` prefix) |
| `title` | string | Long-form community title |
| `headerImage` | string | Banner / header image URL |
| `description` | string | Public description |
| `over18` | boolean | NSFW community flag |
| `numberOfMembers` | number | Subscribers |
| `url` | string | Absolute permalink |
| `createdAt` | string | ISO 8601 |
| `scrapedAt` | string | ISO 8601 |

User (dataType: "user")

| Field | Type | Description |
| --- | --- | --- |
| `id` | string | `t2_xxx` |
| `url` | string | Profile permalink |
| `username` | string | Reddit handle |
| `userIcon` | string | Avatar URL |
| `postKarma` | number | Link karma |
| `commentKarma` | number | Comment karma |
| `description` | string | Profile description |
| `over18` | boolean | NSFW profile flag |
| `createdAt` | string | ISO 8601 |
| `scrapedAt` | string | ISO 8601 |

Sample Output

```json
{
  "dataType": "post",
  "id": "t3_1t16uqd",
  "parsedId": "1t16uqd",
  "url": "https://www.reddit.com/r/AskReddit/comments/1t16uqd/...",
  "username": "IIlustriousTea",
  "title": "US birth rates just hit another record low...",
  "communityName": "r/AskReddit",
  "parsedCommunityName": "AskReddit",
  "body": "",
  "html": "",
  "numberOfComments": 8892,
  "upVotes": 7657,
  "authorFlair": null,
  "isVideo": false,
  "isAd": false,
  "over18": false,
  "createdAt": "2026-05-01T21:40:45.000Z",
  "scrapedAt": "2026-05-02T05:53:19.442Z"
}
```

Input Parameters

Direct URLs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `startUrls` | array | `[]` | Reddit URLs to scrape. Mix any of: subreddit, post, user, or search URLs. |
| `ignoreStartUrls` | boolean | `false` | Force-bypass the URLs field (helpful for tools like Zapier). |

Search

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `searches` | string[] | `[]` | Keywords to search. Used only when `startUrls` is empty. |
| `searchCommunityName` | string | `""` | Restrict every search to one subreddit. |
| `searchPosts` | boolean | `true` | Include posts in search results. |
| `searchComments` | boolean | `false` | Include comments (best-effort — Reddit's comment search returns parent posts). |
| `searchCommunities` | boolean | `false` | Include matching communities. |
| `searchUsers` | boolean | `false` | Include matching user profiles. |
| `sort` | enum | `new` | `relevance` / `hot` / `top` / `new` / `rising` / `comments`. |
| `time` | enum | `""` | `all` / `hour` / `day` / `week` / `month` / `year`. Most useful with `sort=top`. |
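Putting the search parameters together, a run input could be assembled like this (a sketch; the token and actor-ID placeholders are hypothetical — use the values from your Apify account and this actor's store page):

```python
import json

# Search the top posts of the last month for a keyword, capped at 50 rows
run_input = {
    "searches": ["mechanical keyboards"],
    "searchCommunityName": "",   # empty = search all of Reddit
    "searchPosts": True,
    "searchComments": False,
    "sort": "top",
    "time": "month",
    "maxItems": 50,
}

# With the apify-client package this input would be passed to the actor, e.g.:
#   from apify_client import ApifyClient
#   client = ApifyClient("<APIFY_TOKEN>")
#   run = client.actor("<ACTOR_ID>").call(run_input=run_input)
print(json.dumps(run_input, indent=2))
```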

Filters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `includeNSFW` | boolean | `true` | Include adult-rated posts and subreddits. |
| `skipComments` | boolean | `false` | Don't scrape comments when going through posts. |
| `skipUserPosts` | boolean | `false` | Don't scrape a user's submitted posts when going through their profile. |
| `skipCommunity` | boolean | `false` | Don't push community metadata when going through a subreddit. |
| `postDateLimit` | ISO date | (none) | Only keep posts created after this date. |
| `commentDateLimit` | ISO date | (none) | Only keep comments created after this date. |
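The date limits compare an item's `createdAt` against the cutoff; a sketch of that check, assuming a bare date limit like `2026-01-01` is treated as UTC (an assumption for illustration):

```python
from datetime import datetime, timezone

def _parse(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp, assuming UTC when no zone is given."""
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)

def after_limit(created_at: str, date_limit: str) -> bool:
    """Keep an item only if it was created strictly after the date limit."""
    return _parse(created_at) > _parse(date_limit)

print(after_limit("2026-05-01T21:40:45.000Z", "2026-01-01"))
# True
```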

Limits

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `maxItems` | integer | `10` | Hard global cap on dataset rows across all categories. |
| `maxPostCount` | integer | `10` | Per-listing cap on posts. |
| `maxComments` | integer | `10` | Per-post cap on comments (or global cap on comment-search/user-comments). |
| `maxCommunitiesCount` | integer | `2` | Cap on communities returned from search or leaderboard. |
| `maxUserCount` | integer | `2` | Cap on user profiles returned from search. |
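How the per-category caps interact with the global `maxItems` ceiling can be sketched as follows (an illustration of the documented semantics, not the actor's source):

```python
def enforce_caps(items, max_items=10, per_category=None):
    """Apply per-category caps first, then the global maxItems ceiling."""
    per_category = per_category or {}
    counts, kept = {}, []
    for item in items:
        cat = item["dataType"]
        cap = per_category.get(cat)
        if cap is not None and counts.get(cat, 0) >= cap:
            continue  # category is full, skip this row
        counts[cat] = counts.get(cat, 0) + 1
        kept.append(item)
        if len(kept) >= max_items:
            break     # global ceiling hit: stop immediately
    return kept

items = [{"dataType": "post"}] * 5 + [{"dataType": "comment"}] * 5
print(len(enforce_caps(items, max_items=3, per_category={"post": 2})))
# 3
```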

Advanced

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `proxy` | object | Apify Residential | Apify proxy or your own proxy URLs. Residential is strongly recommended. |
| `debugMode` | boolean | `false` | Verbose Crawlee logging. |

How It Works

The actor sends plain unauthenticated HTTP requests to reddit.com/*.json using a descriptive non-browser User-Agent. Reddit's anonymous JSON endpoints reject Chrome-like UAs that lack browser cookies, so the actor explicitly disables Crawlee's automatic browser-fingerprint header injection. This keeps requests in the generous unauthenticated rate tier (~100 requests/min) instead of falling back to the strict ~10/min anti-bot tier.

Comment trees are walked depth-first up to maxComments. Collapsed "more" stubs are expanded by POSTing to /api/morechildren.json in batches of up to 100 children — no extra request per comment.
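The batching step amounts to chunking the stub's child IDs into groups of at most 100 per request; a minimal sketch:

```python
def batch_children(ids, size=100):
    """Split 'more' stub child IDs into /api/morechildren batches of up to 100."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]

batches = batch_children([f"c{i}" for i in range(250)])
print([len(b) for b in batches])
# [100, 100, 50]
```

So a stub hiding 250 replies costs 3 requests rather than 250.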

The crawler aborts as soon as maxItems is hit, so over-runs are not a concern even with deep trees.

Pricing

This actor uses pay-per-event pricing:

| Event | Price |
| --- | --- |
| Actor start | $0.00005 |
| Result extracted (per dataset row) | $0.005 |

You only pay for what you scrape. Apify platform compute and proxy usage are billed separately based on your plan.
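As a quick back-of-envelope helper, event cost for a run is just the start fee plus the per-row fee (a sketch using the prices above; platform compute and proxy are excluded):

```python
def estimate_cost(rows: int, runs: int = 1) -> float:
    """Estimated event cost: $0.00005 per actor start + $0.005 per dataset row."""
    return round(runs * 0.00005 + rows * 0.005, 5)

print(estimate_cost(1000))
# 1,000 rows in a single run ≈ $5.00005
```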

Limitations

  • Comment search returns parent posts only (Reddit's API behavior); the actor enqueues those posts so their comment trees are still scraped. Treat it as best-effort.
  • Removed/deleted posts return a 404 envelope; they're logged and skipped without retry.
  • Login-walled content (private subreddits, NSFW-locked content for unauth) is not accessible via the JSON API and is silently skipped.