Reddit Post & Comment Scraper
Pricing
$10.00/month + usage
Scrape unlimited comments from any post with 99% accuracy (the highest on the Apify Store). Input any Reddit post URL and get complete, rich JSON data, including deeply nested comment threads, scores, author details, and awards. The comment tree is already built for you.
Developer

Ion Belei
Reddit Post & Comment Scraper — Scrape Reddit with 99% Accuracy
Scrape unlimited comments from any post with the highest comment accuracy on the Apify Store. Input any Reddit post URL and get complete, rich JSON data, including deeply nested comment threads, scores, author details, and awards.
Unlike other Reddit scrapers, this Actor does not use a browser, which means lower compute costs and faster extraction. You get the full data Reddit holds on a post, not just what is visible on the web page, with comments arranged in a tree just as they appear on Reddit.
Why This Reddit Scraper?
| Feature | This Actor | Most Actors |
|---|---|---|
| Comment scraping accuracy | 99% (100% for posts under 500 comments) | 40-80% (miss nested replies) |
| Data depth | Full Reddit JSON (80+ fields per post) | Surface-level web scrape |
| Browser required | No (lightweight HTTP) | Yes (Playwright/Puppeteer) |
| Compute cost | Low | 3-5x higher |
| Comment tree | Built for you | Not built; you have to assemble it yourself |
What Data Can You Extract from Reddit?
Each scraped post includes 80+ data fields, far more than what you see on the Reddit website.
Post Data
- Content: title, selftext, URL, domain, permalink
- Metrics: score, upvotes, downvotes, upvote_ratio, num_comments, num_crossposts
- Author: username, author_fullname, author_premium, author_flair
- Subreddit: subreddit name, subscriber count, subreddit type
- Metadata: created_utc, edited, archived, locked, spoiler, over_18 (NSFW)
- Awards: all_awardings, total_awards_received, gilded
- Media: media, media_embed, is_video, thumbnail
Comment Data (nested with full reply threads)
- Content: body, body_html
- Metrics: score, ups, downs, controversiality
- Author: author, author_fullname, author_premium, author_flair
- Structure: parent_id, depth, link_id, is_submitter
- Metadata: created_utc, edited, stickied, distinguished, collapsed
- Nested replies: Full recursive reply threads preserved in tree structure
Use Cases of the Reddit Scraper
- Sentiment analysis — Extract thousands of comments with scores and metadata to analyze public opinion on any topic, product, or brand.
- AI/ML training data — Build training datasets from Reddit's rich comment threads for NLP models, chatbots, and language research.
- Brand monitoring — Track what people say about your brand or product by scraping comment threads from relevant subreddits.
- Market research — Analyze discussions in niche subreddits to understand customer pain points, feature requests, and competitor perception.
- Content research — Find high-engagement discussions and trending topics to fuel your content strategy.
- Academic research — Collect structured Reddit data for social science, linguistics, and behavioral research.
How to Scrape Reddit Posts and Comments
Step 1: Prepare Your Input
Add Reddit post URLs to the input. You can scrape one post or many at once.
{
  "urls": [
    {"url": "https://www.reddit.com/r/AskReddit/comments/1rcxhjq/people_40_what_actually_mattered_in_the_long_run/"},
    {"url": "https://www.reddit.com/r/technology/comments/example_post/"}
  ],
  "sort_type": "top",
  "max_comments": null,
  "lite_mode": false
}
Step 2: Configure Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| urls | Array | Required | List of Reddit post URLs to scrape, each given as an object with a url key (see the Step 1 input). |
| sort_type | String | "top" | How to sort comments: top, best, new, controversial, old, qa. |
| max_comments | Number | null | Maximum number of comments to extract per post. Set to null for all comments. |
| lite_mode | Boolean | false | If true, returns lightweight comment objects with only core visible fields. |
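The parameters above can be assembled and checked before a run is started. The following is a minimal sketch: `build_run_input` and `ALLOWED_SORTS` are hypothetical names introduced here, and the rule that `max_comments` must be positive is an assumption, not documented behavior of the Actor.

```python
from typing import Any, Optional

# Sort options listed in the parameter table above.
ALLOWED_SORTS = {"top", "best", "new", "controversial", "old", "qa"}


def build_run_input(
    post_urls: list[str],
    sort_type: str = "top",
    max_comments: Optional[int] = None,
    lite_mode: bool = False,
) -> dict[str, Any]:
    """Build an input dict in the shape shown in Step 1 (hypothetical helper)."""
    if not post_urls:
        raise ValueError("at least one Reddit post URL is required")
    if sort_type not in ALLOWED_SORTS:
        raise ValueError(f"unsupported sort_type: {sort_type!r}")
    # Assumption: a limit below 1 is treated as a mistake by this sketch.
    if max_comments is not None and max_comments < 1:
        raise ValueError("max_comments must be a positive integer or None")
    return {
        # Each URL is wrapped in an object with a "url" key, as in Step 1.
        "urls": [{"url": u} for u in post_urls],
        "sort_type": sort_type,
        "max_comments": max_comments,  # None is serialized as JSON null
        "lite_mode": lite_mode,
    }
```

The returned dict can be pasted into the input editor as JSON or passed as `run_input` when calling the Actor through the Apify client.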
Step 3: Run and Export
Click Start to begin scraping. Once finished, go to the Dataset tab to view results. Export options:
- JSON — Full structured data, ideal for programmatic use
- CSV — Flattened data for spreadsheets (note: nested comments are best consumed as JSON)
- Excel — Direct download for quick analysis
- XML — For XML-based workflows
- API — Access results programmatically via the Apify REST API
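For the API route, results can be downloaded from the dataset items endpoint of the Apify REST API. The sketch below only builds the download URL. It assumes the `https://api.apify.com/v2/datasets/{id}/items` endpoint with its `format` and `limit` query parameters, so check the Apify API reference before relying on it. `dataset_items_url` and `EXPORT_FORMATS` are hypothetical names.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Formats matching the export options above ("Excel" corresponds to "xlsx").
EXPORT_FORMATS = {"json", "csv", "xlsx", "xml"}


def dataset_items_url(dataset_id: str, fmt: str = "json", limit: Optional[int] = None) -> str:
    """Return a download URL for a dataset's items (hypothetical helper)."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported export format: {fmt!r}")
    params = {"format": fmt}
    if limit is not None:
        params["limit"] = limit
    # The dataset ID is percent-encoded so it is safe inside the URL path.
    return f"https://api.apify.com/v2/datasets/{quote(dataset_id, safe='')}/items?{urlencode(params)}"
```

The dataset ID is the `defaultDatasetId` of a finished run. Datasets that are not public also need your API token, which is deliberately left out of the URL here.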
Output Schema
Each dataset item represents one scraped Reddit post with its complete comment tree.
- Default mode (lite_mode: false): returns full Reddit JSON fields for post + comments.
- Lightweight mode (lite_mode: true): keeps full post fields but returns reduced comment objects.
Additional fields may appear depending on the post type.
Post Fields
| Field | Type | Description |
|---|---|---|
| title | String | The title of the Reddit post |
| author | String | Username of the post author |
| author_fullname | String | Reddit's internal fullname (e.g., t2_abc123) |
| author_premium | Boolean | Whether the author has Reddit Premium |
| subreddit | String | Subreddit name (without r/ prefix) |
| subreddit_id | String | Reddit's internal ID for the subreddit |
| subreddit_subscribers | Integer | Number of subscribers in the subreddit |
| id | String | The post's unique ID |
| name | String | The post's fullname (e.g., t3_abc123) |
| selftext | String | Post text content (empty string for link posts) |
| selftext_html | String/Null | HTML-rendered version of the post text |
| score | Integer | Net score (upvotes minus downvotes) |
| ups | Integer | Number of upvotes |
| upvote_ratio | Number | Ratio of upvotes to total votes (0.0–1.0) |
| num_comments | Integer | Total comment count (as reported by Reddit) |
| created_utc | Number | Unix timestamp of creation |
| edited | Boolean/Number | false if not edited, or Unix timestamp of last edit |
| permalink | String | Relative URL path to the post |
| url | String | The original post URL or the submitted link URL |
| domain | String | Domain of the URL (e.g., self.AskReddit, i.imgur.com) |
| over_18 | Boolean | Whether the post is NSFW |
| spoiler | Boolean | Whether the post is a spoiler |
| locked | Boolean | Whether the post is locked |
| archived | Boolean | Whether the post is archived |
| stickied | Boolean | Whether the post is pinned |
| is_video | Boolean | Whether the post contains a Reddit-hosted video |
| thumbnail | String | Thumbnail URL or keyword (self, default, nsfw) |
| total_awards_received | Integer | Total number of awards received |
| all_awardings | Array | Detailed list of all awards |
| gilded | Integer | Number of times gilded |
| link_flair_text | String/Null | Post flair tag text |
| num_crossposts | Integer | Number of crossposts |
| comments | Array | Full nested comment tree (see below) |
| success | Boolean | Whether the post was successfully scraped |
| error | String/Null | Error message if scraping failed, null otherwise |
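Three of the fields above need some care when consumed: `success` and `error` report per-post failures, and `edited` is either `false` or a Unix timestamp. The sketch below shows one way to handle them. `summarize_post` and `to_utc` are hypothetical helpers, and the shape of the returned dict is an illustration, not part of the Actor's output.

```python
from datetime import datetime, timezone
from typing import Any


def to_utc(timestamp: float) -> datetime:
    """Convert a Unix timestamp such as created_utc to an aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


def summarize_post(post: dict[str, Any]) -> dict[str, Any]:
    """Return a small summary of one dataset item (hypothetical helper)."""
    # Failed items carry success=False and an error message.
    if not post.get("success", False):
        return {"ok": False, "error": post.get("error")}
    edited = post.get("edited")
    # `edited` is False when never edited, otherwise a Unix timestamp.
    # bool is a subclass of int in Python, so rule out booleans first.
    if edited is None or isinstance(edited, bool):
        edited_at = None
    else:
        edited_at = to_utc(edited)
    return {
        "ok": True,
        "title": post.get("title"),
        "score": post.get("score"),
        "created_at": to_utc(post["created_utc"]),
        "edited_at": edited_at,
    }
```

Checking `success` first keeps failed items from raising a `KeyError` on fields that a failed scrape may not include.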
Conditional Post Fields
These fields appear depending on the post type:
| Field | Type | When present |
|---|---|---|
| media | Object/Null | Video/embed posts: structure varies by media type (Reddit video, YouTube embed, etc.) |
| media_embed | Object | Posts with embedded media |
| preview | Object | Image/link posts: contains images array with source and resolutions |
| post_hint | String | Certain post types: image, link, hosted:video, rich:video, self |
| url_overridden_by_dest | String | Link posts: the destination URL |
| crosspost_parent_list | Array | Crossposted content: includes the original post data |
Comment Fields (Default Mode)
Each comment in the comments array contains these fields, plus a replies array with the same structure (recursively for the full thread):
| Field | Type | Description |
|---|---|---|
| id | String | The comment's unique ID |
| author | String | Username of the comment author |
| author_fullname | String | Reddit's internal fullname for the author |
| body | String | Comment text in Markdown |
| body_html | String | HTML-rendered comment body |
| score | Integer | Net score |
| ups | Integer | Number of upvotes |
| downs | Integer | Number of downvotes (usually 0 due to vote fuzzing) |
| created_utc | Number | Unix timestamp of creation |
| edited | Boolean/Number | false if not edited, or Unix timestamp of last edit |
| parent_id | String | Parent fullname (t3_postid for top-level, t1_commentid for replies) |
| link_id | String | Parent post fullname (t3_postid) |
| depth | Integer | Nesting depth (0 = top-level) |
| is_submitter | Boolean | Whether the commenter is the OP |
| permalink | String | Relative URL path to the comment |
| controversiality | Integer | Whether the comment is controversial (0 or 1) |
| distinguished | String/Null | moderator or admin if distinguished, null otherwise |
| stickied | Boolean | Whether the comment is pinned |
| collapsed | Boolean | Whether the comment is collapsed by default |
| author_premium | Boolean | Whether the author has Reddit Premium |
| subreddit | String | Subreddit name |
| total_awards_received | Integer | Total awards on this comment |
| all_awardings | Array | Detailed list of all awards |
| replies | Array | Nested reply comments (same structure, recursively) |
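Because every comment carries its own `replies` array, the whole thread can be visited with a short recursive walk. The sketch below flattens the tree into rows in depth-first order, which is convenient for spreadsheets or data frames. `walk_comments` is a hypothetical helper, and the row shape is only an illustration.

```python
from typing import Any, Iterator


def walk_comments(comments: list[dict[str, Any]], depth: int = 0) -> Iterator[dict[str, Any]]:
    """Yield one flat row per comment, parents before their replies."""
    for comment in comments:
        parent_id = comment.get("parent_id") or ""
        if parent_id.startswith("t3_"):
            parent_kind = "post"  # top-level comment, replying to the post
        elif parent_id.startswith("t1_"):
            parent_kind = "comment"  # reply to another comment
        else:
            parent_kind = None
        yield {
            "id": comment.get("id"),
            "author": comment.get("author"),
            # Prefer the depth field from the data; fall back to the walk depth.
            "depth": comment.get("depth", depth),
            "parent_kind": parent_kind,
            "body": comment.get("body", ""),
        }
        # Recurse into the nested replies, which share the same structure.
        yield from walk_comments(comment.get("replies") or [], depth + 1)
```

For example, `rows = list(walk_comments(post["comments"]))` gives one row per comment in the thread. Very deep threads could approach Python's recursion limit, in which case an explicit stack is the safer choice.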
Comment Fields (Lightweight Mode)
When lite_mode is enabled, each comment (including nested replies) contains only:
| Field | Type | Description |
|---|---|---|
| id | String | The comment's unique ID |
| parent_id | String | Parent fullname (t3_postid for top-level, t1_commentid for replies) |
| author | String | Username of the comment author |
| body | String | Comment text in Markdown |
| created_utc | Number | Unix timestamp of creation |
| ups | Integer | Number of upvotes |
| replies | Array | Nested reply comments in the same lightweight structure |
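To make the lightweight shape concrete, the sketch below reduces a default-mode comment to the seven fields listed above. It illustrates the output shape only and is not the Actor's own implementation. `to_lite` and `LITE_FIELDS` are hypothetical names.

```python
from typing import Any

# Scalar fields kept in lightweight mode, per the table above.
LITE_FIELDS = ("id", "parent_id", "author", "body", "created_utc", "ups")


def to_lite(comment: dict[str, Any]) -> dict[str, Any]:
    """Project a full comment onto the lightweight fields, recursively."""
    lite = {field: comment.get(field) for field in LITE_FIELDS}
    # Replies are reduced the same way so the tree structure is preserved.
    lite["replies"] = [to_lite(reply) for reply in comment.get("replies") or []]
    return lite
```

A projection like this can also shrink data that was already scraped in default mode, without re-running the Actor.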
Output Example
{
  "title": "People 40+, what actually mattered in the long run and what didn't?",
  "author": "Psychological_Sky_58",
  "subreddit": "AskReddit",
  "score": 9965,
  "upvote_ratio": 0.95,
  "num_comments": 4987,
  "created_utc": 1771888793,
  "url": "https://www.reddit.com/r/AskReddit/comments/1rcxhjq/...",
  "selftext": "",
  "permalink": "/r/AskReddit/comments/1rcxhjq/people_40_what_actually_mattered_in_the_long_run/",
  "over_18": false,
  "is_video": false,
  "comments": [
    {
      "author": "Right-Breakfast444",
      "body": "Jesus Christ...that looks a little bit like Jason Bourne!",
      "score": 63,
      "depth": 6,
      "created_utc": 1771938473,
      "parent_id": "t1_o74gb87",
      "permalink": "/r/AskReddit/comments/1rcxhjq/.../o74ohw9/",
      "replies": [
        {
          "author": "Alarming-Research-42",
          "body": "lol.\nWe went from the Bourne Supremacy to Bourne Approximation.",
          "score": 7,
          "depth": 7,
          "replies": []
        }
      ]
    }
  ],
  "success": true,
  "error": null
}
Note: The example above shows key fields for readability. The actual output contains 80+ fields per post and per comment, as documented in the tables above.
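To get a rough sense of how complete an extraction was, you can compare the number of comments in the returned tree with the `num_comments` value reported by Reddit. This is a minimal sketch with hypothetical helper names (`count_comments`, `coverage`). It is not how the accuracy figures on this page were measured, and the reported count may include deleted or removed comments, so treat the ratio as an approximation.

```python
from typing import Any, Optional


def count_comments(comments: list[dict[str, Any]]) -> int:
    """Count every comment in the tree using an explicit stack (no recursion)."""
    total = 0
    stack = list(comments)
    while stack:
        comment = stack.pop()
        total += 1
        stack.extend(comment.get("replies") or [])
    return total


def coverage(post: dict[str, Any]) -> Optional[float]:
    """Return extracted / reported comments, or None if Reddit reports zero."""
    reported = post.get("num_comments") or 0
    if reported == 0:
        return None
    return count_comments(post.get("comments") or []) / reported
```

A run with `max_comments` set will naturally show a lower ratio, since extraction stops at the limit.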
Roadmap
Planned updates for this Actor:
- 100% comment accuracy: never miss a single comment, even on the largest posts
- Subreddit scraping: input a subreddit URL to get all of its posts
- User profile scraping: extract post and comment history from Reddit user profiles
- Speed improvements: batch processing for faster runs
Integrations
JavaScript / TypeScript
import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token.
const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the Actor and wait for the run to finish.
const run = await client.actor('YOUR_ACTOR_ID').call({
    urls: [{ url: 'https://www.reddit.com/r/AskReddit/comments/1rcxhjq/people_40_what_actually_mattered_in_the_long_run/' }],
    sort_type: 'top',
    max_comments: null,
});

// Each dataset item is one scraped post with its comment tree.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((post) => {
    console.log(`${post.title}: ${post.num_comments} comments, score: ${post.score}`);
    // comments holds top-level comments only; replies are nested inside each one.
    console.log(`Top-level comments extracted: ${(post.comments ?? []).length}`);
});
Python
from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient("YOUR_API_TOKEN")

# Start the Actor and wait for the run to finish.
run = client.actor("YOUR_ACTOR_ID").call(run_input={
    "urls": [{"url": "https://www.reddit.com/r/AskReddit/comments/1rcxhjq/people_40_what_actually_mattered_in_the_long_run/"}],
    "sort_type": "top",
    "max_comments": None,
})

# Each dataset item is one scraped post; this loop prints top-level comments only.
for post in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{post['title']}: {post['num_comments']} comments")
    for comment in post.get("comments", []):
        print(f"  [{comment['score']}] {comment['author']}: {comment['body'][:80]}...")
FAQs for Reddit Scraper
How is this different from other Reddit scrapers?
Three key differences:
- Comment accuracy — Most scrapers miss deeply nested replies. This Actor follows the full comment tree.
- Rich data — You get 80+ fields per post/comment (the full data Reddit stores), not just what's visible on the webpage.
- No browser — Runs without Playwright or Puppeteer, which means lower costs and faster runs.
Does it scrape entire subreddits?
Currently, this Actor scrapes individual Reddit posts with their full comment threads. Subreddit-level scraping (listing all posts from a subreddit) is on the roadmap and coming soon.
Can it scrape user profiles?
User profile scraping is planned for a future update. Currently, the Actor extracts author information (username, flair, premium status) from within the posts and comments it scrapes.
Do I need Reddit API keys?
No. This Actor does not require any Reddit API keys, OAuth tokens, or authentication. It works out of the box.
Can I export Reddit data to CSV or Excel?
Yes. Export directly from the Apify console as JSON, CSV, Excel, or XML.
What does the sort_type parameter do?
It controls how comments are sorted before extraction, matching Reddit's own sort options:
- top: Highest-scored comments first
- best: Reddit's "best" ranking (favors comments with a high share of upvotes, weighted by how many votes they have received)
- new: Most recent comments first
- controversial: Most debated comments first
- old: Oldest comments first
- qa: Q&A format (OP replies prioritized)
Is Reddit scraping legal?
This Actor accesses only publicly available Reddit data. Users are responsible for complying with applicable laws, Reddit's Terms of Service, and their own jurisdiction's data regulations.
Legal and Compliance
This Actor extracts only publicly available data from Reddit. Users are responsible for:
- Complying with Reddit's Terms of Service and API Terms
- Following applicable data protection laws (GDPR, CCPA, etc.)
- Using extracted data ethically and in accordance with their jurisdiction's regulations
- Not using the data to harass, dox, or target individual Reddit users
This Actor does not bypass authentication, access private subreddits, or extract data that requires a logged-in session.