Reddit Scraper

Pricing: from $8.00 / 1,000 posts scraped
Rating: 0.0 (0 reviews)
Developer: Yuliia Kulakova (Maintained by Community)
Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified 2 days ago

Reddit Scraper — Posts, Comments & Profiles


Extract posts, comments, and user profiles from any subreddit, search query, or Reddit URL — no API key required. Built for market research, brand monitoring, sentiment analysis, competitor intelligence, and AI/LLM training datasets.


💰 Pricing

Pay only for what you extract — three separate billing events:

| What | Cost |
|---|---|
| 📄 Posts | $8 per 1,000 |
| 💬 Comments | $6 per 1,000 |
| 👤 User profiles | $8 per 1,000 |

A small one-time actor-start fee applies per run. Posts filtered out by score, date, or comment count are not charged.
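As a back-of-the-envelope check, the per-event prices above translate directly into a run-cost estimate. The helper below is an illustrative sketch; it excludes the one-time actor-start fee.

```python
# Per-event prices from the pricing table above.
POST_PRICE = 8.00 / 1000      # $ per post
COMMENT_PRICE = 6.00 / 1000   # $ per comment
PROFILE_PRICE = 8.00 / 1000   # $ per user profile

def estimate_cost(posts: int, comments: int = 0, profiles: int = 0) -> float:
    """Estimated event cost in dollars (actor-start fee not included)."""
    return posts * POST_PRICE + comments * COMMENT_PRICE + profiles * PROFILE_PRICE

# Example: 500 posts, ~100 comments each, one profile per post.
print(round(estimate_cost(500, 500 * 100, 500), 2))  # 308.0
```

Because filtered-out posts are not billed, the real cost is often lower than the estimate for a given maxPosts setting.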


✨ Key Features

📄 Full Post Data

Every post record includes title, full body text, author, subreddit, score, upvote ratio, comment count, publish date, flair, NSFW flag, award count, thumbnail URL, and external link (for link posts). Rich structured JSON ready for analysis or AI pipelines.

💬 Comments with Full Thread Structure

Captures top-level comments and all nested replies in a single flat dataset. Each comment includes the full body text, author, score, depth level, and parentId for reconstructing threads. Deleted and removed comments are automatically skipped.

👤 User Profiles

Fetches the Reddit profile of each post author: total karma, link karma, comment karma, account age, gold status. Each unique author is fetched only once per run — no duplicate charges.

🔄 Four Input Types

  • Subreddit URLs — scrape posts from any public subreddit
  • Post URLs — scrape a specific post and optionally its comments
  • User profile URLs — scrape a specific Reddit user's profile
  • Search queries — find posts matching keywords across all of Reddit

🔃 Sort & Time Filters

Choose how posts are sorted: Hot, New, Top, or Rising. For Top posts, filter by time range: past hour, day, week, month, year, or all time.

🔍 Powerful Filters

  • Minimum score — skip low-engagement posts
  • Minimum comments — only posts with real discussion
  • Date from — only posts published after a specific date
  • Exclude NSFW — filter out adult content

🚀 Quick Start

Option 1 — Subreddit

Paste a subreddit URL to scrape its posts.

https://www.reddit.com/r/technology/
https://www.reddit.com/r/MachineLearning/
https://www.reddit.com/r/startups/

Option 2 — Specific Post

Paste a post URL to scrape that post and optionally all its comments.

https://www.reddit.com/r/technology/comments/abc123/post_title/

Option 3 — User Profile

Paste a user profile URL to scrape their Reddit profile data.

https://www.reddit.com/u/spez
https://www.reddit.com/user/spez

Option 4 — Search Queries

Provide one or more search terms as a list. The scraper returns the most relevant posts for each query.

["ChatGPT alternatives", "best productivity tools 2025", "startup advice"]

⚙️ Input Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| startUrls | array | | Reddit URLs: subreddit, post, or user profile |
| searchQueries | array | | Keywords to search across Reddit |
| maxPosts | integer | 50 | Maximum posts to scrape (hard cap: 1,000) |
| sort | string | hot | Post sort order: hot, new, top, rising |
| timeFilter | string | week | Time range for Top sort: hour, day, week, month, year, all |
| scrapeComments | boolean | false | Extract comments for each post |
| maxCommentsPerPost | integer | 100 | Maximum comments per post (including replies) |
| scrapeUserProfiles | boolean | false | Fetch author profile for each post |
| filterByMinScore | integer | 0 | Skip posts with fewer upvotes than this |
| filterByMinComments | integer | 0 | Skip posts with fewer comments than this |
| filterByDateFrom | string | | Only posts published on or after YYYY-MM-DD |
| excludeNsfw | boolean | false | Exclude NSFW (18+) posts |

📦 Output Format

All results are saved to the Apify dataset. Three record types are mixed in a single dataset and can be filtered by the type field.
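Splitting the mixed dataset by the type field takes only a few lines once the items are exported. A minimal sketch in plain Python; the sample records are illustrative:

```python
from collections import defaultdict

def split_by_type(records):
    """Group mixed dataset records by their 'type' field ('post', 'comment', 'profile')."""
    groups = defaultdict(list)
    for record in records:
        groups[record.get("type", "unknown")].append(record)
    return dict(groups)

# Illustrative sample of exported dataset items.
items = [
    {"type": "post", "postId": "1abc23", "title": "Example"},
    {"type": "comment", "commentId": "k1x2y3z", "postId": "1abc23"},
    {"type": "profile", "username": "tech_user_42"},
]
grouped = split_by_type(items)
print(sorted(grouped))  # ['comment', 'post', 'profile']
```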


Post Record (type: "post")

One record per scraped post.

{
  "type": "post",
  "postId": "1abc23",
  "url": "https://www.reddit.com/r/technology/comments/1abc23/title/",
  "title": "OpenAI releases new model with 10x lower cost",
  "body": "The full post body text goes here...",
  "author": "tech_user_42",
  "subreddit": "technology",
  "subredditSubscribers": 14500000,
  "score": 8420,
  "upvoteRatio": 0.94,
  "numComments": 312,
  "createdAt": "2026-04-01",
  "flair": "AI",
  "isNsfw": false,
  "isSelf": true,
  "isStickied": false,
  "isLocked": false,
  "awards": 3,
  "thumbnailUrl": null,
  "externalUrl": null,
  "scrapedAt": "2026-04-09T10:00:00.000Z"
}

Field reference:

| Field | Type | Description |
|---|---|---|
| postId | string | Reddit post ID |
| url | string | Full URL to the post |
| title | string | Post title |
| body | string | Post body text (empty for link posts) |
| author | string | Reddit username of the author |
| subreddit | string | Subreddit name |
| subredditSubscribers | integer | Number of subreddit members |
| score | integer | Net upvotes (upvotes minus downvotes) |
| upvoteRatio | number | Ratio of upvotes to total votes (0–1) |
| numComments | integer | Number of comments on the post |
| createdAt | string | Date the post was published (YYYY-MM-DD) |
| flair | string | Post flair tag set by the author or moderators |
| isNsfw | boolean | True if the post is marked as 18+ |
| isSelf | boolean | True if this is a text post; false for link posts |
| isStickied | boolean | True if the post is pinned by moderators |
| isLocked | boolean | True if comments are disabled |
| awards | integer | Total number of awards received |
| thumbnailUrl | string | Thumbnail image URL (link posts only) |
| externalUrl | string | External link URL (link posts only) |
| scrapedAt | string | ISO timestamp of when the record was created |

Comment Record (type: "comment")

One record per comment or reply. Use depth and parentId to reconstruct the thread structure.

{
  "type": "comment",
  "commentId": "k1x2y3z",
  "postId": "1abc23",
  "postTitle": "OpenAI releases new model with 10x lower cost",
  "parentId": "t3_1abc23",
  "body": "This is a really interesting development. The cost reduction alone changes everything for startups.",
  "author": "ml_engineer_99",
  "score": 342,
  "depth": 0,
  "createdAt": "2026-04-01",
  "isStickied": false,
  "distinguished": null,
  "awards": 1,
  "scrapedAt": "2026-04-09T10:00:00.000Z"
}

Field reference:

| Field | Type | Description |
|---|---|---|
| commentId | string | Unique Reddit comment ID |
| postId | string | ID of the parent post |
| postTitle | string | Title of the parent post |
| parentId | string | ID of the parent comment or post (t1_... for comment parent, t3_... for top-level) |
| body | string | Full comment text |
| author | string | Reddit username of the commenter |
| score | integer | Net upvotes on the comment |
| depth | integer | Nesting level (0 = top-level, 1 = reply, 2 = reply to reply, etc.) |
| createdAt | string | Date the comment was posted (YYYY-MM-DD) |
| isStickied | boolean | True if the comment is pinned by moderators |
| distinguished | string | "moderator" or "admin" if applicable, otherwise null |
| awards | integer | Number of awards on the comment |
| scrapedAt | string | ISO timestamp of when the record was created |
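Because comments arrive as a flat list, the thread tree has to be rebuilt client-side from commentId and parentId. A minimal sketch, assuming the field names shown above:

```python
def build_thread(comments):
    """Rebuild a nested comment tree from flat comment records.

    Top-level comments have a parentId starting with 't3_' (the post);
    replies have a parentId of 't1_<commentId>' pointing at another comment.
    """
    by_id = {c["commentId"]: dict(c, replies=[]) for c in comments}
    roots = []
    for c in by_id.values():
        parent = c["parentId"]
        if parent.startswith("t1_") and parent[3:] in by_id:
            by_id[parent[3:]]["replies"].append(c)
        else:
            roots.append(c)  # top-level (t3_ parent) or orphaned reply
    return roots

# Illustrative flat records, as they appear in the dataset.
flat = [
    {"commentId": "aaa", "parentId": "t3_1abc23", "body": "top-level"},
    {"commentId": "bbb", "parentId": "t1_aaa", "body": "reply"},
]
tree = build_thread(flat)
print(tree[0]["replies"][0]["body"])  # reply
```

Sorting each replies list by score afterwards reproduces something close to Reddit's default view.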

Profile Record (type: "profile")

One record per unique user. Only created when scrapeUserProfiles is enabled.

{
  "type": "profile",
  "username": "tech_user_42",
  "profileUrl": "https://www.reddit.com/user/tech_user_42",
  "totalKarma": 48200,
  "linkKarma": 12000,
  "commentKarma": 36200,
  "isGold": false,
  "isEmployee": false,
  "createdAt": "2019-03-15",
  "iconUrl": "https://styles.redditmedia.com/...",
  "scrapedAt": "2026-04-09T10:00:00.000Z"
}

Field reference:

| Field | Type | Description |
|---|---|---|
| username | string | Reddit username |
| profileUrl | string | Full URL to the user's profile |
| totalKarma | integer | Total karma (link + comment) |
| linkKarma | integer | Karma from posts |
| commentKarma | integer | Karma from comments |
| isGold | boolean | True if the user has Reddit Gold |
| isEmployee | boolean | True if the user is a Reddit employee |
| createdAt | string | Account creation date (YYYY-MM-DD) |
| iconUrl | string | Profile avatar image URL |
| scrapedAt | string | ISO timestamp of when the record was created |

🔍 Use Case Examples

Brand monitoring — find mentions of your product

{
  "searchQueries": ["notion app", "notion review", "notion alternative"],
  "maxPosts": 200,
  "scrapeComments": true,
  "maxCommentsPerPost": 200,
  "filterByMinScore": 10
}

Competitor sentiment analysis

{
  "searchQueries": ["linear vs jira", "figma vs sketch 2026", "shopify vs woocommerce"],
  "maxPosts": 100,
  "scrapeComments": true,
  "maxCommentsPerPost": 300,
  "filterByMinComments": 20
}

Subreddit deep dive (top posts of the week)

{
  "startUrls": [{ "url": "https://www.reddit.com/r/MachineLearning/" }],
  "maxPosts": 100,
  "sort": "top",
  "timeFilter": "week",
  "filterByMinScore": 100,
  "scrapeComments": true,
  "maxCommentsPerPost": 100
}

AI/LLM training dataset from a subreddit

{
  "startUrls": [{ "url": "https://www.reddit.com/r/personalfinance/" }],
  "maxPosts": 1000,
  "sort": "top",
  "timeFilter": "year",
  "scrapeComments": true,
  "maxCommentsPerPost": 200,
  "filterByMinScore": 50
}

Lead generation — find people asking for recommendations

{
  "searchQueries": ["looking for CRM recommendation", "best project management tool", "need accounting software"],
  "maxPosts": 100,
  "scrapeComments": true,
  "maxCommentsPerPost": 100,
  "filterByMinComments": 5,
  "scrapeUserProfiles": true
}

Monitor a subreddit for recent posts

{
  "startUrls": [{ "url": "https://www.reddit.com/r/entrepreneur/" }],
  "maxPosts": 50,
  "sort": "new",
  "filterByDateFrom": "2026-04-01",
  "scrapeComments": false
}

Scrape a specific viral post with all comments

{
  "startUrls": [{ "url": "https://www.reddit.com/r/AskReddit/comments/xyz123/post_title/" }],
  "scrapeComments": true,
  "maxCommentsPerPost": 500
}

Research influencers in a subreddit

{
  "startUrls": [{ "url": "https://www.reddit.com/r/webdev/" }],
  "maxPosts": 100,
  "sort": "top",
  "timeFilter": "month",
  "scrapeUserProfiles": true,
  "filterByMinScore": 200
}

📊 Who Uses This

| Use Case | Who | What They Get |
|---|---|---|
| Brand monitoring | Marketing teams | All Reddit mentions of a brand or product in structured JSON |
| Competitor research | Product managers | What users say about competitor products across relevant subreddits |
| Sentiment analysis | Analysts | Comment corpora with scores, dates, and thread context |
| Lead generation | Sales teams | Posts where people ask for product/service recommendations |
| LLM training data | AI & ML teams | High-quality discussion threads from expert communities |
| Trend discovery | Marketers & creators | What's going viral in a niche before it hits mainstream |
| Academic research | Researchers | Public discussion datasets for NLP and social science |
| Influencer identification | Agencies | Top contributors in niche subreddits with karma and activity |
| Market research | Consultants | Consumer opinions, pain points, and demand signals |
| Financial research | Investors | Retail investor sentiment from finance subreddits |

💡 Pro Tips

1. Use Top + time filter for the best content. Set sort: "top" with timeFilter: "month" or "year" to get the highest-quality, most-upvoted posts in a subreddit. These tend to have the most valuable comments and discussion.

2. Combine subreddits and search in one run. You can mix startUrls (subreddits) and searchQueries in a single run. Results from all sources are deduplicated — each post is processed only once.

3. Filter by minimum score to skip noise. Set filterByMinScore: 50 or higher to skip low-engagement posts that have few votes and are likely low quality. This reduces cost and improves dataset quality.

4. Author profiles are deduplicated automatically. When scrapeUserProfiles is enabled, each unique author is fetched only once, even if they authored multiple posts in the run. You are only charged once per author.

5. Use search for cross-subreddit coverage. A search query like "best CRM tool" finds posts from r/sales, r/startups, r/smallbusiness, and more, all in one run. More comprehensive than scraping individual subreddits.

6. Nested comments via parentId. Comment records include a parentId field. If parentId starts with t3_, the comment is a top-level reply to the post. If it starts with t1_, it is a reply to another comment. Use depth to quickly filter by nesting level.

7. Schedule weekly incremental runs. Use Apify Scheduler with filterByDateFrom set to the previous Monday. This way each run only picks up new posts and you never scrape the same content twice.

8. NSFW filtering. Enable excludeNsfw when scraping general-topic subreddits (like r/AskReddit or r/funny) to keep datasets clean for professional or academic use.
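For tip 7, the "previous Monday" value for filterByDateFrom can be computed like this (a sketch; the function name is ours, not part of the actor):

```python
from datetime import date, timedelta

def previous_monday(today: date) -> str:
    """Most recent Monday (today itself if it is a Monday) as YYYY-MM-DD,
    suitable for the filterByDateFrom input."""
    return (today - timedelta(days=today.weekday())).isoformat()

print(previous_monday(date(2026, 4, 9)))  # 2026-04-06
```

Pass today's date (date.today()) in a scheduled job so each weekly run starts from the current week.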


❓ FAQ

Q: Do I need a Reddit API key or account? No. The scraper uses Reddit's public JSON API — accessible by appending .json to any Reddit URL. No API key, OAuth token, or Reddit account is required.
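The .json trick described above is easy to verify yourself. A minimal sketch; the User-Agent string is an arbitrary example, and unauthenticated requests are rate-limited aggressively:

```python
import json
import urllib.request

def to_json_url(reddit_url: str) -> str:
    """Append .json to any Reddit URL to hit the public JSON endpoint."""
    return reddit_url.rstrip("/") + ".json"

def fetch_listing(reddit_url: str) -> dict:
    # Reddit expects a descriptive User-Agent; anonymous defaults get throttled fast.
    req = urllib.request.Request(
        to_json_url(reddit_url),
        headers={"User-Agent": "my-research-script/0.1"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

print(to_json_url("https://www.reddit.com/r/technology/"))
# https://www.reddit.com/r/technology.json
```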

Q: Why is there a 1,000 post limit? This is a hard limit enforced by Reddit's API: regardless of pagination, Reddit's listing endpoints return a maximum of 1,000 posts per sort category, and this cannot be bypassed. For most use cases, 1,000 posts is more than enough data.

Q: Can I scrape private subreddits? No. The scraper only accesses publicly available content — the same content visible to any logged-out user. Private, quarantined, and banned subreddits return an error and are skipped.

Q: Can I scrape NSFW subreddits? NSFW subreddit content requires Reddit account authentication, which this scraper does not use. NSFW content from public feeds (mixed in with regular posts) is accessible, but dedicated NSFW subreddits are not.

Q: Why might some posts show score: 0? Reddit applies vote fuzzing to all posts — the displayed score is slightly randomized to deter vote manipulation. Posts with very few votes may show 0 even if they have some upvotes.

Q: How are comments structured? Comments are returned as a flat list. Use depth (0 = top-level, 1 = reply, 2 = reply to reply) and parentId to reconstruct the full thread tree in your own code.

Q: Are deleted comments included? No. Comments where the body is [deleted] or [removed] are automatically skipped. Only comments with actual text content are saved.

Q: How does billing work? You are charged per event: $8 per 1,000 posts, $6 per 1,000 comments, and $8 per 1,000 user profiles. Posts that are filtered out by score, date, or comment count are not billed. A small one-time actor-start fee applies per run.

Q: Can I run this on a schedule? Yes. Use Apify Scheduler to run the actor daily or weekly. Set filterByDateFrom to avoid re-scraping old content. Each run only processes newly published posts from the specified date onward.

Q: What happens if Reddit rate-limits the scraper? The scraper automatically reads Reddit's rate-limit headers (X-Ratelimit-Remaining, X-Ratelimit-Reset) and pauses when the quota is nearly exhausted. On HTTP 429 responses, it backs off with increasing delays before retrying, so rate limiting slows a run down rather than losing data.
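The pause behavior described above can be approximated from those two headers. A simplified sketch; the threshold is illustrative, not the actor's actual value:

```python
def seconds_to_pause(headers: dict, min_remaining: float = 5.0) -> float:
    """How long to sleep based on Reddit's rate-limit response headers.

    X-Ratelimit-Remaining: requests left in the current window.
    X-Ratelimit-Reset: seconds until the window resets.
    """
    remaining = float(headers.get("X-Ratelimit-Remaining", min_remaining + 1))
    reset = float(headers.get("X-Ratelimit-Reset", 0))
    # Wait out the rest of the window once the quota is nearly exhausted.
    return reset if remaining <= min_remaining else 0.0

print(seconds_to_pause({"X-Ratelimit-Remaining": "3", "X-Ratelimit-Reset": "42"}))   # 42.0
print(seconds_to_pause({"X-Ratelimit-Remaining": "80", "X-Ratelimit-Reset": "42"}))  # 0.0
```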


⚠️ Limits & Notes

  • 1,000 post cap — Reddit's API hard limit per listing endpoint; it cannot be bypassed.
  • Public content only — Private, quarantined, and banned subreddits are not accessible without authentication.
  • Vote fuzzing — Reddit randomizes displayed vote counts slightly, so score values may differ from what you see in the browser.
  • Comment depth — Reddit limits comment thread depth to 10 levels. Deeply nested replies beyond level 10 are not returned by the API.
  • [deleted] content — Posts or comments where the author deleted their account show author: "[deleted]". The content may still be present or also deleted.
  • Relative dates — All dates are converted to YYYY-MM-DD format from Unix timestamps for consistency.
  • NSFW subreddits — Dedicated adult-content subreddits require OAuth authentication and are not accessible with this scraper.
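The date normalization noted above is a plain Unix-timestamp conversion (Reddit's raw API exposes created_utc); for example:

```python
from datetime import datetime, timezone

def to_date_string(created_utc: float) -> str:
    """Convert Reddit's created_utc Unix timestamp to YYYY-MM-DD (UTC)."""
    return datetime.fromtimestamp(created_utc, tz=timezone.utc).strftime("%Y-%m-%d")

print(to_date_string(1767225600))  # 2026-01-01
```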

This scraper accesses publicly available data on Reddit — the same data visible to any user without logging in. Use it for legitimate research, content analysis, and data science purposes.

Always comply with applicable terms of service and data-protection laws. Do not use scraped data to harass individual users, build spam systems, or engage in vote manipulation.


🛠️ Technical Notes

  • Built on the Apify SDK with pay-per-event billing (Actor.charge())
  • Uses Reddit's public JSON API via www.reddit.com — pure HTTP requests, no browser automation, for speed and low resource usage
  • Automatically reads X-Ratelimit-* response headers and pauses before quota exhaustion
  • Exponential backoff on HTTP 429 (rate limit) and transient HTTP errors
  • Residential proxy routing on all requests for reliable access
  • Comment threads are flattened recursively — every nested reply returned by the API is captured
  • Author profiles are deduplicated per run — each unique username is fetched at most once