Reddit Scraper
Developer: Yuliia Kulakova
Reddit Scraper — Posts, Comments & Profiles

Extract posts, comments, and user profiles from any subreddit, search query, or Reddit URL — no API key required. Built for market research, brand monitoring, sentiment analysis, competitor intelligence, and AI/LLM training datasets.
💰 Pricing
Pay only for what you extract — three separate billing events:
| What | Cost |
|---|---|
| 📄 Posts | $8 per 1,000 |
| 💬 Comments | $6 per 1,000 |
| 👤 User profiles | $8 per 1,000 |
A small one-time actor-start fee applies per run. Posts filtered out by score, date, or comment count are not charged.
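The per-event rates above make cost easy to estimate before a run. A minimal sketch (the `estimate_cost` helper is illustrative, not part of the actor, and excludes the actor-start fee):

```python
def estimate_cost(posts: int, comments: int = 0, profiles: int = 0) -> float:
    """Estimate run cost in USD from the per-1,000 event rates above.

    Excludes the small one-time actor-start fee.
    """
    POST_RATE, COMMENT_RATE, PROFILE_RATE = 8.00, 6.00, 8.00  # USD per 1,000 events
    return (posts * POST_RATE + comments * COMMENT_RATE + profiles * PROFILE_RATE) / 1000

# 100 posts, ~50 comments each, one profile per post:
print(f"${estimate_cost(100, comments=5000, profiles=100):.2f}")  # $31.60
```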
✨ Key Features
📄 Full Post Data
Every post record includes title, full body text, author, subreddit, score, upvote ratio, comment count, publish date, flair, NSFW flag, award count, thumbnail URL, and external link (for link posts). Rich structured JSON ready for analysis or AI pipelines.
💬 Comments with Full Thread Structure
Captures top-level comments and all nested replies in a single flat dataset. Each comment includes the full body text, author, score, depth level, and parentId for reconstructing threads. Deleted and removed comments are automatically skipped.
👤 User Profiles
Fetches the Reddit profile of each post author: total karma, link karma, comment karma, account age, gold status. Each unique author is fetched only once per run — no duplicate charges.
🔄 Four Input Types
- Subreddit URLs — scrape posts from any public subreddit
- Post URLs — scrape a specific post and optionally its comments
- User profile URLs — scrape a specific Reddit user's profile
- Search queries — find posts matching keywords across all of Reddit
🔃 Sort & Time Filters
Choose how posts are sorted: Hot, New, Top, or Rising. For Top posts, filter by time range: past hour, day, week, month, year, or all time.
🔍 Powerful Filters
- Minimum score — skip low-engagement posts
- Minimum comments — only posts with real discussion
- Date from — only posts published after a specific date
- Exclude NSFW — filter out adult content
🚀 Quick Start
Option 1 — Subreddit
Paste a subreddit URL to scrape its posts.
https://www.reddit.com/r/technology/
https://www.reddit.com/r/MachineLearning/
https://www.reddit.com/r/startups/
Option 2 — Specific Post
Paste a post URL to scrape that post and optionally all its comments.
https://www.reddit.com/r/technology/comments/abc123/post_title/
Option 3 — User Profile
Paste a user profile URL to scrape their Reddit profile data.
https://www.reddit.com/u/spez
https://www.reddit.com/user/spez
Option 4 — Search Queries
Provide one or more search terms as a list. The scraper returns the most relevant posts for each query.
```json
["ChatGPT alternatives", "best productivity tools 2025", "startup advice"]
```
⚙️ Input Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| startUrls | array | — | Reddit URLs: subreddit, post, or user profile |
| searchQueries | array | — | Keywords to search across Reddit |
| maxPosts | integer | 50 | Maximum posts to scrape (hard cap: 1,000) |
| sort | string | hot | Post sort order: hot, new, top, rising |
| timeFilter | string | week | Time range for Top sort: hour, day, week, month, year, all |
| scrapeComments | boolean | false | Extract comments for each post |
| maxCommentsPerPost | integer | 100 | Maximum comments per post (including replies) |
| scrapeUserProfiles | boolean | false | Fetch author profile for each post |
| filterByMinScore | integer | 0 | Skip posts with fewer upvotes than this |
| filterByMinComments | integer | 0 | Skip posts with fewer comments than this |
| filterByDateFrom | string | — | Only posts published on or after YYYY-MM-DD |
| excludeNsfw | boolean | false | Exclude NSFW (18+) posts |
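To illustrate how these parameters combine, here is a hypothetical helper (not part of the actor) that merges overrides into the documented defaults and applies the 1,000-post cap:

```python
# Defaults taken from the parameter table above.
DEFAULTS = {
    "maxPosts": 50,
    "sort": "hot",
    "timeFilter": "week",
    "scrapeComments": False,
    "maxCommentsPerPost": 100,
    "scrapeUserProfiles": False,
    "filterByMinScore": 0,
    "filterByMinComments": 0,
    "excludeNsfw": False,
}

def build_input(**overrides) -> dict:
    """Merge user overrides into the documented defaults, capping maxPosts at 1,000."""
    run_input = {**DEFAULTS, **overrides}
    run_input["maxPosts"] = min(run_input["maxPosts"], 1000)
    return run_input

run_input = build_input(
    startUrls=[{"url": "https://www.reddit.com/r/technology/"}],
    sort="top",
    maxPosts=5000,  # silently capped to the documented 1,000 hard limit
)
```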
📦 Output Format
All results are saved to the Apify dataset. Three record types are mixed in a single dataset and can be filtered by the type field.
Post Record (type: "post")
One record per scraped post.
```json
{
  "type": "post",
  "postId": "1abc23",
  "url": "https://www.reddit.com/r/technology/comments/1abc23/title/",
  "title": "OpenAI releases new model with 10x lower cost",
  "body": "The full post body text goes here...",
  "author": "tech_user_42",
  "subreddit": "technology",
  "subredditSubscribers": 14500000,
  "score": 8420,
  "upvoteRatio": 0.94,
  "numComments": 312,
  "createdAt": "2026-04-01",
  "flair": "AI",
  "isNsfw": false,
  "isSelf": true,
  "isStickied": false,
  "isLocked": false,
  "awards": 3,
  "thumbnailUrl": null,
  "externalUrl": null,
  "scrapedAt": "2026-04-09T10:00:00.000Z"
}
```
Field reference:
| Field | Type | Description |
|---|---|---|
| postId | string | Reddit post ID |
| url | string | Full URL to the post |
| title | string | Post title |
| body | string | Post body text (empty for link posts) |
| author | string | Reddit username of the author |
| subreddit | string | Subreddit name |
| subredditSubscribers | integer | Number of subreddit members |
| score | integer | Net upvotes (upvotes minus downvotes) |
| upvoteRatio | number | Ratio of upvotes to total votes (0–1) |
| numComments | integer | Number of comments on the post |
| createdAt | string | Date the post was published (YYYY-MM-DD) |
| flair | string | Post flair tag set by the author or moderators |
| isNsfw | boolean | True if the post is marked as 18+ |
| isSelf | boolean | True if this is a text post; false for link posts |
| isStickied | boolean | True if the post is pinned by moderators |
| isLocked | boolean | True if comments are disabled |
| awards | integer | Total number of awards received |
| thumbnailUrl | string | Thumbnail image URL (link posts only) |
| externalUrl | string | External link URL (link posts only) |
| scrapedAt | string | ISO timestamp of when the record was created |
Comment Record (type: "comment")
One record per comment or reply. Use depth and parentId to reconstruct the thread structure.
```json
{
  "type": "comment",
  "commentId": "k1x2y3z",
  "postId": "1abc23",
  "postTitle": "OpenAI releases new model with 10x lower cost",
  "parentId": "t3_1abc23",
  "body": "This is a really interesting development. The cost reduction alone changes everything for startups.",
  "author": "ml_engineer_99",
  "score": 342,
  "depth": 0,
  "createdAt": "2026-04-01",
  "isStickied": false,
  "distinguished": null,
  "awards": 1,
  "scrapedAt": "2026-04-09T10:00:00.000Z"
}
```
Field reference:
| Field | Type | Description |
|---|---|---|
| commentId | string | Unique Reddit comment ID |
| postId | string | ID of the parent post |
| postTitle | string | Title of the parent post |
| parentId | string | ID of the parent comment or post (t1_... for comment parent, t3_... for top-level) |
| body | string | Full comment text |
| author | string | Reddit username of the commenter |
| score | integer | Net upvotes on the comment |
| depth | integer | Nesting level (0 = top-level, 1 = reply, 2 = reply to reply, etc.) |
| createdAt | string | Date the comment was posted (YYYY-MM-DD) |
| isStickied | boolean | True if the comment is pinned by moderators |
| distinguished | string | "moderator" or "admin" if applicable, otherwise null |
| awards | integer | Number of awards on the comment |
Profile Record (type: "profile")
One record per unique user. Only created when scrapeUserProfiles is enabled.
```json
{
  "type": "profile",
  "username": "tech_user_42",
  "profileUrl": "https://www.reddit.com/user/tech_user_42",
  "totalKarma": 48200,
  "linkKarma": 12000,
  "commentKarma": 36200,
  "isGold": false,
  "isEmployee": false,
  "createdAt": "2019-03-15",
  "iconUrl": "https://styles.redditmedia.com/...",
  "scrapedAt": "2026-04-09T10:00:00.000Z"
}
```
Field reference:
| Field | Type | Description |
|---|---|---|
| username | string | Reddit username |
| profileUrl | string | Full URL to the user's profile |
| totalKarma | integer | Total karma (link + comment) |
| linkKarma | integer | Karma from posts |
| commentKarma | integer | Karma from comments |
| isGold | boolean | True if the user has Reddit Gold |
| isEmployee | boolean | True if the user is a Reddit employee |
| createdAt | string | Account creation date (YYYY-MM-DD) |
| iconUrl | string | Profile avatar image URL |
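Because all three record types land in one dataset, downstream code typically partitions records by the type field before analysis. A minimal sketch, assuming records shaped like the examples above:

```python
from collections import defaultdict

def partition_records(records: list[dict]) -> dict[str, list[dict]]:
    """Group mixed dataset records into posts, comments, and profiles by their "type" field."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        groups[record.get("type", "unknown")].append(record)
    return dict(groups)
```

Calling `partition_records(items)` on a downloaded dataset then gives `groups["post"]`, `groups["comment"]`, and `groups["profile"]` ready for separate processing.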
🔍 Use Case Examples
Brand monitoring — find mentions of your product
```json
{
  "searchQueries": ["notion app", "notion review", "notion alternative"],
  "maxPosts": 200,
  "scrapeComments": true,
  "maxCommentsPerPost": 200,
  "filterByMinScore": 10
}
```
Competitor sentiment analysis
```json
{
  "searchQueries": ["linear vs jira", "figma vs sketch 2026", "shopify vs woocommerce"],
  "maxPosts": 100,
  "scrapeComments": true,
  "maxCommentsPerPost": 300,
  "filterByMinComments": 20
}
```
Trending topics in a niche subreddit
```json
{
  "startUrls": [{ "url": "https://www.reddit.com/r/MachineLearning/" }],
  "maxPosts": 100,
  "sort": "top",
  "timeFilter": "week",
  "filterByMinScore": 100,
  "scrapeComments": true,
  "maxCommentsPerPost": 100
}
```
AI/LLM training dataset from a subreddit
```json
{
  "startUrls": [{ "url": "https://www.reddit.com/r/personalfinance/" }],
  "maxPosts": 1000,
  "sort": "top",
  "timeFilter": "year",
  "scrapeComments": true,
  "maxCommentsPerPost": 200,
  "filterByMinScore": 50
}
```
Lead generation — find people asking for recommendations
```json
{
  "searchQueries": ["looking for CRM recommendation", "best project management tool", "need accounting software"],
  "maxPosts": 100,
  "scrapeComments": true,
  "maxCommentsPerPost": 100,
  "filterByMinComments": 5,
  "scrapeUserProfiles": true
}
```
Monitor a subreddit for recent posts
```json
{
  "startUrls": [{ "url": "https://www.reddit.com/r/entrepreneur/" }],
  "maxPosts": 50,
  "sort": "new",
  "filterByDateFrom": "2026-04-01",
  "scrapeComments": false
}
```
Scrape a specific viral post with all comments
```json
{
  "startUrls": [{ "url": "https://www.reddit.com/r/AskReddit/comments/xyz123/post_title/" }],
  "scrapeComments": true,
  "maxCommentsPerPost": 500
}
```
Research influencers in a subreddit
```json
{
  "startUrls": [{ "url": "https://www.reddit.com/r/webdev/" }],
  "maxPosts": 100,
  "sort": "top",
  "timeFilter": "month",
  "scrapeUserProfiles": true,
  "filterByMinScore": 200
}
```
📊 Who Uses This
| Use Case | Who | What They Get |
|---|---|---|
| Brand monitoring | Marketing teams | All Reddit mentions of a brand or product in structured JSON |
| Competitor research | Product managers | What users say about competitor products across relevant subreddits |
| Sentiment analysis | Analysts | Comment corpora with scores, dates, and thread context |
| Lead generation | Sales teams | Posts where people ask for product/service recommendations |
| LLM training data | AI & ML teams | High-quality discussion threads from expert communities |
| Trend discovery | Marketers & creators | What's going viral in a niche before it hits mainstream |
| Academic research | Researchers | Public discussion datasets for NLP and social science |
| Influencer identification | Agencies | Top contributors in niche subreddits with karma and activity |
| Market research | Consultants | Consumer opinions, pain points, and demand signals |
| Financial research | Investors | Retail investor sentiment from finance subreddits |
💡 Pro Tips
1. Use Top + time filter for the best content
Set sort: "top" with timeFilter: "month" or "year" to get the highest-quality, most-upvoted posts in a subreddit. These tend to have the most valuable comments and discussion.
2. Combine subreddits and search in one run
You can mix startUrls (subreddits) and searchQueries in a single run. Results from all sources are deduplicated — each post is processed only once.
3. Filter by minimum score to skip noise
Set filterByMinScore: 50 or higher to skip low-engagement posts that have few votes and are likely low quality. This reduces cost and improves dataset quality.
4. Author profiles are deduplicated automatically
When scrapeUserProfiles is enabled, each unique author is fetched only once — even if they authored multiple posts in the run. You are only charged once per author.
5. Use search for cross-subreddit coverage
A search query like "best CRM tool" finds posts from r/sales, r/startups, r/smallbusiness, and more — all in one run. More comprehensive than scraping individual subreddits.
6. Nested comments via parentId
Comment records include a parentId field. If parentId starts with t3_, the comment is a top-level reply to the post. If it starts with t1_, it is a reply to another comment. Use depth to quickly filter by nesting level.
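The prefix check above can be expressed directly; a small illustrative sketch (the helper name is hypothetical):

```python
def split_by_level(comments: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate top-level comments (parentId "t3_...", i.e. direct replies to the
    post) from nested replies (parentId "t1_...", i.e. replies to other comments)."""
    top = [c for c in comments if c["parentId"].startswith("t3_")]
    replies = [c for c in comments if c["parentId"].startswith("t1_")]
    return top, replies
```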
7. Schedule weekly incremental runs
Use Apify Scheduler with filterByDateFrom set to the previous Monday. This way each run only picks up new posts and you never scrape the same content twice.
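Computing "the previous Monday" for filterByDateFrom is a one-liner; a minimal sketch (the helper name is illustrative):

```python
from datetime import date, timedelta

def previous_monday(today=None):
    """Return the most recent Monday (or today, if today is Monday) as a
    YYYY-MM-DD string suitable for filterByDateFrom."""
    today = today or date.today()
    # weekday() is 0 for Monday, so subtracting it lands on the week's Monday.
    return (today - timedelta(days=today.weekday())).isoformat()
```

Pass the result as `"filterByDateFrom": previous_monday()` in each scheduled run's input.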
8. NSFW filtering
Enable excludeNsfw when scraping general-topic subreddits (like r/AskReddit or r/funny) to keep datasets clean for professional or academic use.
❓ FAQ
Q: Do I need a Reddit API key or account?
No. The scraper uses Reddit's public JSON API — accessible by appending .json to any Reddit URL. No API key, OAuth token, or Reddit account is required.
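Appending .json takes a little care when the URL has a trailing slash or query string; a sketch of one way to do it (the helper is illustrative, not part of the actor):

```python
from urllib.parse import urlsplit, urlunsplit

def to_json_url(reddit_url: str) -> str:
    """Rewrite a Reddit URL to its public JSON endpoint by appending .json
    to the path, preserving any query string."""
    parts = urlsplit(reddit_url)
    path = parts.path.rstrip("/") + ".json"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))

print(to_json_url("https://www.reddit.com/r/technology/"))
# https://www.reddit.com/r/technology.json
```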
Q: Why is there a 1,000 post limit?
This is a hard limit enforced by Reddit's API. Regardless of pagination, Reddit's listing endpoints return a maximum of 1,000 posts per sort category, and this limit cannot be bypassed. For most use cases 1,000 posts provides more than enough data.
Q: Can I scrape private subreddits?
No. The scraper only accesses publicly available content — the same content visible to any logged-out user. Private, quarantined, and banned subreddits return an error and are skipped.
Q: Can I scrape NSFW subreddits?
NSFW subreddit content requires Reddit account authentication, which this scraper does not use. NSFW content from public feeds (mixed in with regular posts) is accessible, but dedicated NSFW subreddits are not.
Q: Why might some posts show score: 0?
Reddit applies vote fuzzing to all posts — the displayed score is slightly randomized to deter vote manipulation. Posts with very few votes may show 0 even if they have some upvotes.
Q: How are comments structured?
Comments are returned as a flat list. Use depth (0 = top-level, 1 = reply, 2 = reply to reply) and parentId to reconstruct the full thread tree in your own code.
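As a sketch of that reconstruction (assuming records shaped like the comment example above), the flat list can be turned into a nested tree by linking each comment to its parent via parentId:

```python
def build_thread_tree(comments: list[dict]) -> list[dict]:
    """Rebuild nested threads from the flat comment list.

    Top-level comments carry a parentId starting with "t3_" (the post itself);
    replies point at "t1_<commentId>" of their parent comment. Each node gains
    a "replies" list holding its children.
    """
    by_id = {c["commentId"]: {**c, "replies": []} for c in comments}
    roots: list[dict] = []
    for node in by_id.values():
        parent_key = node["parentId"].removeprefix("t1_")
        if node["parentId"].startswith("t1_") and parent_key in by_id:
            by_id[parent_key]["replies"].append(node)
        else:
            roots.append(node)  # top-level, or parent was skipped/deleted
    return roots
```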
Q: Are deleted comments included?
No. Comments where the body is [deleted] or [removed] are automatically skipped. Only comments with actual text content are saved.
Q: How does billing work?
You are charged per event: $8 per 1,000 posts, $6 per 1,000 comments, and $8 per 1,000 user profiles. Posts that are filtered out by score, date, or comment count are not billed. A small one-time actor-start fee applies per run.
Q: Can I run this on a schedule?
Yes. Use Apify Scheduler to run the actor daily or weekly. Set filterByDateFrom to avoid re-scraping old content. Each run only processes newly published posts from the specified date onward.
Q: What happens if Reddit rate-limits the scraper?
The scraper automatically reads Reddit's rate-limit headers (X-Ratelimit-Remaining, X-Ratelimit-Reset) and pauses when the quota is nearly exhausted. On HTTP 429 responses, it backs off with increasing delays before retrying. You will never lose data due to rate limiting.
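The policy described above can be sketched as a pure function. The header names are Reddit's real rate-limit headers; the exact thresholds and backoff schedule here are illustrative, not the actor's actual values:

```python
def wait_seconds(headers: dict, attempt: int, threshold: float = 2) -> float:
    """Decide how long to pause before the next request.

    If the remaining quota (X-Ratelimit-Remaining) is nearly exhausted, wait
    out the window (X-Ratelimit-Reset, in seconds). On a retry after HTTP 429
    (attempt > 0), back off exponentially with a cap.
    """
    remaining = float(headers.get("X-Ratelimit-Remaining", "inf"))
    if remaining <= threshold:
        return float(headers.get("X-Ratelimit-Reset", 60))
    if attempt > 0:
        return min(2 ** attempt, 64)  # 2s, 4s, 8s, ... capped at 64s
    return 0.0
```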
⚠️ Limits & Notes
- 1,000 post cap — Reddit's API hard limit per listing endpoint; it cannot be bypassed.
- Public content only — Private, quarantined, and banned subreddits are not accessible without authentication.
- Vote fuzzing — Reddit randomizes vote counts slightly; `score` values may differ slightly from what you see in the browser.
- Comment depth — Reddit limits comment thread depth to 10 levels. Deeply nested replies beyond level 10 are not returned by the API.
- `[deleted]` content — Posts or comments where the author deleted their account show `author: "[deleted]"`. The content may still be present or also deleted.
- Relative dates — All dates are converted from Unix timestamps to `YYYY-MM-DD` format for consistency.
- NSFW subreddits — Dedicated adult-content subreddits require OAuth authentication and are not accessible with this scraper.
⚖️ Legal & Ethical Use
This scraper accesses publicly available data on Reddit — the same data visible to any user without logging in. Use it for legitimate research, content analysis, and data science purposes.
Always comply with:
- Reddit Terms of Service
- Reddit Privacy Policy
- Applicable data protection regulations (GDPR, CCPA, etc.)
Do not use scraped data to harass individual users, build spam systems, or engage in vote manipulation.
🛠️ Technical Notes
- Built on the Apify SDK with pay-per-event billing (`Actor.charge()`)
- Uses Reddit's public JSON API via `www.reddit.com` — no browser automation, pure HTTP requests for speed and low resource usage
- Automatically reads `X-Ratelimit-*` response headers and pauses before quota exhaustion
- Exponential backoff on HTTP 429 (rate limit) and transient HTTP errors
- Residential proxy routing on all requests for reliable access
- Comment threads are fully flattened recursively — all nested replies are captured regardless of depth
- Author profiles are deduplicated per run — each unique username is fetched at most once