Reddit Post & Comment Scraper

Pricing

$10.00/month + usage

Scrape unlimited comments from any post with 99% accuracy (the highest on the Apify Store). Input any Reddit post URL and get complete, rich JSON data, including deeply nested comment threads, scores, author details, and awards. The comment tree is already built for you.

Rating: 0.0 (0)

Developer: Ion Belei

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 18 hours ago

Reddit Post & Comment Scraper — Scrape Reddit with 99% Accuracy

Scrape unlimited comments from any post with the highest comment accuracy on the Apify Store. Input any Reddit post URL and get complete, rich JSON data, including deeply nested comment threads, scores, author details, and awards.

Unlike other Reddit scrapers, this Actor does not use a browser, which means lower compute costs and faster extraction. You get the full data Reddit has on a post, not just what's visible on the web page, with comments already arranged in a tree, just like on Reddit.

Why This Reddit Scraper?

| Feature | This Actor | Most Actors |
| --- | --- | --- |
| Comment scraping accuracy | 99% (100% for posts under 500 comments) | 40-80% (miss nested replies) |
| Data depth | Full Reddit JSON (80+ fields per post) | Surface-level web scrape |
| Browser required | No (lightweight HTTP) | Yes (Playwright/Puppeteer) |
| Compute cost | Low | 3-5x higher |
| Comment tree | Already created | Missing, need to do it yourself |

What Data Can You Extract from Reddit?

Each scraped post includes 80+ data fields, far more than what you see on the Reddit website.

Post Data

  • Content: title, selftext, URL, domain, permalink
  • Metrics: score, upvotes, downvotes, upvote_ratio, num_comments, num_crossposts
  • Author: username, author_fullname, author_premium, author_flair
  • Subreddit: subreddit name, subscriber count, subreddit type
  • Metadata: created_utc, edited, archived, locked, spoiler, over_18 (NSFW)
  • Awards: all_awardings, total_awards_received, gilded
  • Media: media, media_embed, is_video, thumbnail

Comment Data (nested with full reply threads)

  • Content: body, body_html
  • Metrics: score, ups, downs, controversiality
  • Author: author, author_fullname, author_premium, author_flair
  • Structure: parent_id, depth, link_id, is_submitter
  • Metadata: created_utc, edited, stickied, distinguished, collapsed
  • Nested replies: Full recursive reply threads preserved in tree structure
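
Because replies are nested recursively, most analyses start by walking the tree. The following is a minimal sketch, assuming each comment is a dict with a `replies` list as described above. `iter_comments` is a hypothetical helper name, not something the Actor provides, and the sample tree is hand-made.

```python
from typing import Any, Dict, Iterator, List

Comment = Dict[str, Any]


def iter_comments(comments: List[Comment]) -> Iterator[Comment]:
    """Yield every comment depth-first, each parent before its replies."""
    # An explicit stack avoids Python's recursion limit on very deep threads.
    stack = list(reversed(comments))
    while stack:
        comment = stack.pop()
        yield comment
        # Assumption: "replies" is a list; a missing or empty value means no replies.
        stack.extend(reversed(comment.get("replies") or []))


# Tiny hand-made tree for illustration (not real Reddit data).
sample = [
    {"id": "a", "replies": [{"id": "b", "replies": [{"id": "c", "replies": []}]}]},
    {"id": "d", "replies": []},
]
ids = [c["id"] for c in iter_comments(sample)]
print(ids)
```

The same generator can feed a flat export (for example, one row per comment) when a tree is inconvenient.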

Use Cases of the Reddit Scraper

  • Sentiment analysis — Extract thousands of comments with scores and metadata to analyze public opinion on any topic, product, or brand.
  • AI/ML training data — Build training datasets from Reddit's rich comment threads for NLP models, chatbots, and language research.
  • Brand monitoring — Track what people say about your brand or product by scraping comment threads from relevant subreddits.
  • Market research — Analyze discussions in niche subreddits to understand customer pain points, feature requests, and competitor perception.
  • Content research — Find high-engagement discussions and trending topics to fuel your content strategy.
  • Academic research — Collect structured Reddit data for social science, linguistics, and behavioral research.

How to Scrape Reddit Posts and Comments

Step 1: Prepare Your Input

Add Reddit post URLs to the input. You can scrape one post or many at once.

{
    "urls": [
        {
            "url": "https://www.reddit.com/r/AskReddit/comments/1rcxhjq/people_40_what_actually_mattered_in_the_long_run/"
        },
        {
            "url": "https://www.reddit.com/r/technology/comments/example_post/"
        }
    ],
    "sort_type": "top",
    "max_comments": null,
    "lite_mode": false
}

Step 2: Configure Options

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| urls | Array | Required | List of Reddit post URLs to scrape. |
| sort_type | String | "top" | How to sort comments: top, best, new, controversial, old, qa. |
| max_comments | Number | null | Maximum number of comments to extract per post. Set to null for all comments. |
| lite_mode | Boolean | false | If true, returns lightweight comment objects with only core visible fields. |
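
If you assemble the input in code, it can help to check it before starting a run. This is a rough sketch, assuming the parameters behave as listed in the table above. `build_input` is a hypothetical helper, and the rule that `max_comments` must be positive is an assumption rather than documented Actor behavior.

```python
from typing import Any, Dict, List, Optional

# Sort values listed in the parameter table above.
ALLOWED_SORTS = {"top", "best", "new", "controversial", "old", "qa"}


def build_input(
    post_urls: List[str],
    sort_type: str = "top",
    max_comments: Optional[int] = None,
    lite_mode: bool = False,
) -> Dict[str, Any]:
    """Build an input dict in the shape shown in Step 1 (hypothetical helper)."""
    if not post_urls:
        raise ValueError("urls is required and must not be empty")
    if sort_type not in ALLOWED_SORTS:
        raise ValueError(f"unsupported sort_type: {sort_type!r}")
    # Assumption: a limit below 1 is never useful, so reject it early.
    if max_comments is not None and max_comments < 1:
        raise ValueError("max_comments must be a positive integer or None")
    return {
        "urls": [{"url": u} for u in post_urls],
        "sort_type": sort_type,
        "max_comments": max_comments,  # None serializes to JSON null (all comments)
        "lite_mode": lite_mode,
    }


run_input = build_input(
    ["https://www.reddit.com/r/AskReddit/comments/1rcxhjq/people_40_what_actually_mattered_in_the_long_run/"],
    sort_type="new",
    max_comments=200,
)
print(run_input)
```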

Step 3: Run and Export

Click Start to begin scraping. Once finished, go to the Dataset tab to view results. Export options:

  • JSON — Full structured data, ideal for programmatic use
  • CSV — Flattened data for spreadsheets (note: nested comments are best consumed as JSON)
  • Excel — Direct download for quick analysis
  • XML — For XML-based workflows
  • API — Access results programmatically via the Apify REST API
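
For the API route, results can be downloaded from the dataset items endpoint of the Apify REST API. The sketch below only builds the request; `dataset_items_url` and `fetch_items` are hypothetical helper names, the dataset ID and token are placeholders, and the format values are assumed to correspond to the export options listed above (`xlsx` for Excel).

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_BASE = "https://api.apify.com/v2"
# Subset of formats matching the export options listed above.
FORMATS = {"json", "csv", "xlsx", "xml"}


def dataset_items_url(dataset_id: str, fmt: str = "json") -> str:
    """Return the items URL for a dataset in the requested format."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode({'format': fmt})}"


def fetch_items(dataset_id: str, token: str) -> list:
    """Download all items as JSON (performs a network call; not run here)."""
    request = Request(
        dataset_items_url(dataset_id, "json"),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urlopen(request) as response:
        return json.load(response)


# "abc123" is a placeholder dataset ID.
print(dataset_items_url("abc123", "csv"))
```

Sending the token in the `Authorization` header keeps it out of URLs that may end up in logs.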

Output Schema

Each dataset item represents one scraped Reddit post with its complete comment tree.

  • Default mode (lite_mode: false): returns full Reddit JSON fields for post + comments.
  • Lightweight mode (lite_mode: true): keeps full post fields but returns reduced comment objects.

Additional fields may appear depending on the post type.

Post Fields

| Field | Type | Description |
| --- | --- | --- |
| title | String | The title of the Reddit post |
| author | String | Username of the post author |
| author_fullname | String | Reddit's internal fullname (e.g., t2_abc123) |
| author_premium | Boolean | Whether the author has Reddit Premium |
| subreddit | String | Subreddit name (without r/ prefix) |
| subreddit_id | String | Reddit's internal ID for the subreddit |
| subreddit_subscribers | Integer | Number of subscribers in the subreddit |
| id | String | The post's unique ID |
| name | String | The post's fullname (e.g., t3_abc123) |
| selftext | String | Post text content (empty string for link posts) |
| selftext_html | String/Null | HTML-rendered version of the post text |
| score | Integer | Net score (upvotes minus downvotes) |
| ups | Integer | Number of upvotes |
| upvote_ratio | Number | Ratio of upvotes to total votes (0.0–1.0) |
| num_comments | Integer | Total comment count (as reported by Reddit) |
| created_utc | Number | Unix timestamp of creation |
| edited | Boolean/Number | false if not edited, or Unix timestamp of last edit |
| permalink | String | Relative URL path to the post |
| url | String | The original post URL or the submitted link URL |
| domain | String | Domain of the URL (e.g., self.AskReddit, i.imgur.com) |
| over_18 | Boolean | Whether the post is NSFW |
| spoiler | Boolean | Whether the post is a spoiler |
| locked | Boolean | Whether the post is locked |
| archived | Boolean | Whether the post is archived |
| stickied | Boolean | Whether the post is pinned |
| is_video | Boolean | Whether the post contains a Reddit-hosted video |
| thumbnail | String | Thumbnail URL or keyword (self, default, nsfw) |
| total_awards_received | Integer | Total number of awards received |
| all_awardings | Array | Detailed list of all awards |
| gilded | Integer | Number of times gilded |
| link_flair_text | String/Null | Post flair tag text |
| num_crossposts | Integer | Number of crossposts |
| comments | Array | Full nested comment tree (see below) |
| success | Boolean | Whether the post was successfully scraped |
| error | String/Null | Error message if scraping failed, null otherwise |
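
Since every item carries `success` and `error`, a batch run can be split into usable posts and failures before any analysis. A minimal sketch, assuming those two fields behave as documented above; `split_results` is a hypothetical helper and the sample items (including the error text) are invented for illustration.

```python
from typing import Any, Dict, List, Tuple

Post = Dict[str, Any]


def split_results(items: List[Post]) -> Tuple[List[Post], List[Post]]:
    """Separate successfully scraped posts from failed ones."""
    ok = [item for item in items if item.get("success")]
    # Items without a truthy "success" flag are treated as failures.
    failed = [item for item in items if not item.get("success")]
    return ok, failed


# Invented sample items, not real Actor output.
sample_items = [
    {"title": "Example post", "comments": [], "success": True, "error": None},
    {"success": False, "error": "hypothetical error message"},
]
ok, failed = split_results(sample_items)
for item in failed:
    print(f"failed: {item.get('error')}")
```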

Conditional Post Fields

These fields appear depending on the post type:

| Field | Type | When present |
| --- | --- | --- |
| media | Object/Null | Video/embed posts; structure varies by media type (Reddit video, YouTube embed, etc.) |
| media_embed | Object | Posts with embedded media |
| preview | Object | Image/link posts; contains images array with source and resolutions |
| post_hint | String | Certain post types: image, link, hosted:video, rich:video, self |
| url_overridden_by_dest | String | Link posts; the destination URL |
| crosspost_parent_list | Array | Crossposted content; includes the original post data |
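
Because these fields are not always present, code that reads them should tolerate missing keys. The sketch below uses `dict.get` for that; `link_target` and `crosspost_source` are hypothetical helpers, the sample posts are invented, and the assumption that the first entry of `crosspost_parent_list` is the original post is not confirmed by this page.

```python
from typing import Any, Dict, Optional

Post = Dict[str, Any]


def link_target(post: Post) -> Optional[str]:
    """Prefer the link destination when present, otherwise fall back to url."""
    return post.get("url_overridden_by_dest") or post.get("url")


def crosspost_source(post: Post) -> Optional[Post]:
    """Return the original post data for a crosspost, or None."""
    # Assumption: the first list entry holds the original post.
    parents = post.get("crosspost_parent_list") or []
    return parents[0] if parents else None


# Invented sample posts for illustration.
link_post = {
    "url": "https://www.reddit.com/r/example/comments/abc123/example/",
    "url_overridden_by_dest": "https://example.com/article",
    "post_hint": "link",
}
self_post = {"url": "https://www.reddit.com/r/example/comments/def456/example/"}

print(link_target(link_post))
print(crosspost_source(self_post))
```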

Comment Fields (Default Mode)

Each comment in the comments array contains these fields, plus a replies array with the same structure (recursively for the full thread):

| Field | Type | Description |
| --- | --- | --- |
| id | String | The comment's unique ID |
| author | String | Username of the comment author |
| author_fullname | String | Reddit's internal fullname for the author |
| body | String | Comment text in Markdown |
| body_html | String | HTML-rendered comment body |
| score | Integer | Net score |
| ups | Integer | Number of upvotes |
| downs | Integer | Number of downvotes (usually 0 due to vote fuzzing) |
| created_utc | Number | Unix timestamp of creation |
| edited | Boolean/Number | false if not edited, or Unix timestamp of last edit |
| parent_id | String | Parent fullname (t3_postid for top-level, t1_commentid for replies) |
| link_id | String | Parent post fullname (t3_postid) |
| depth | Integer | Nesting depth (0 = top-level) |
| is_submitter | Boolean | Whether the commenter is the OP |
| permalink | String | Relative URL path to the comment |
| controversiality | Integer | Whether the comment is controversial (0 or 1) |
| distinguished | String/Null | moderator or admin if distinguished, null otherwise |
| stickied | Boolean | Whether the comment is pinned |
| collapsed | Boolean | Whether the comment is collapsed by default |
| author_premium | Boolean | Whether the author has Reddit Premium |
| subreddit | String | Subreddit name |
| total_awards_received | Integer | Total awards on this comment |
| all_awardings | Array | Detailed list of all awards |
| replies | Array | Nested reply comments (same structure, recursively) |
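
Two of these fields need a little care when parsing: `edited` is either `false` or a timestamp, and `parent_id` encodes the parent type in its prefix. A small sketch under those documented conventions follows; the helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Union


def to_datetime(timestamp: float) -> datetime:
    """Convert a Unix timestamp such as created_utc to an aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


def edited_at(edited: Union[bool, int, float]) -> Optional[datetime]:
    """Return the edit time, or None when no edit time is available."""
    # bool is a subclass of int in Python, so rule it out before treating
    # the value as a timestamp.
    if isinstance(edited, bool) or not edited:
        return None
    return to_datetime(edited)


def is_top_level(comment: Dict[str, Any]) -> bool:
    """Top-level comments have a post fullname (t3_ prefix) as parent_id."""
    return str(comment.get("parent_id", "")).startswith("t3_")


print(to_datetime(1771938473).isoformat())
print(edited_at(False))
print(is_top_level({"parent_id": "t1_o74gb87"}))
```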

Comment Fields (Lightweight Mode)

When lite_mode is enabled, each comment (including nested replies) contains only:

| Field | Type | Description |
| --- | --- | --- |
| id | String | The comment's unique ID |
| parent_id | String | Parent fullname (t3_postid for top-level, t1_commentid for replies) |
| author | String | Username of the comment author |
| body | String | Comment text in Markdown |
| created_utc | Number | Unix timestamp of creation |
| ups | Integer | Number of upvotes |
| replies | Array | Nested reply comments in the same lightweight structure |
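
If you already have default-mode output and want the same reduced shape locally, the projection can be approximated client-side. This is only a sketch of the idea, using the field list from the table above; `to_lite` is a hypothetical helper and does not describe how the Actor implements `lite_mode`.

```python
from typing import Any, Dict

Comment = Dict[str, Any]

# Fields kept in lightweight mode, per the table above ("replies" is handled separately).
LITE_FIELDS = ("id", "parent_id", "author", "body", "created_utc", "ups")


def to_lite(comment: Comment) -> Comment:
    """Return a reduced copy of a comment, applied recursively to its replies."""
    lite = {key: comment.get(key) for key in LITE_FIELDS}
    lite["replies"] = [to_lite(reply) for reply in comment.get("replies") or []]
    return lite


# Invented sample comment for illustration.
full_comment = {
    "id": "c1", "parent_id": "t3_p1", "author": "example_user", "body": "hello",
    "created_utc": 1771938473, "ups": 5, "score": 5, "depth": 0,
    "replies": [{"id": "c2", "parent_id": "t1_c1", "author": "other_user",
                 "body": "hi", "created_utc": 1771938500, "ups": 1,
                 "controversiality": 0, "replies": []}],
}
lite_comment = to_lite(full_comment)
print(sorted(lite_comment))
```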

Output Example

{
    "title": "People 40+, what actually mattered in the long run and what didn't?",
    "author": "Psychological_Sky_58",
    "subreddit": "AskReddit",
    "score": 9965,
    "upvote_ratio": 0.95,
    "num_comments": 4987,
    "created_utc": 1771888793,
    "url": "https://www.reddit.com/r/AskReddit/comments/1rcxhjq/...",
    "selftext": "",
    "permalink": "/r/AskReddit/comments/1rcxhjq/people_40_what_actually_mattered_in_the_long_run/",
    "over_18": false,
    "is_video": false,
    "comments": [
        {
            "author": "Right-Breakfast444",
            "body": "Jesus Christ...that looks a little bit like Jason Bourne!",
            "score": 63,
            "depth": 6,
            "created_utc": 1771938473,
            "parent_id": "t1_o74gb87",
            "permalink": "/r/AskReddit/comments/1rcxhjq/.../o74ohw9/",
            "replies": [
                {
                    "author": "Alarming-Research-42",
                    "body": "lol.\nWe went from the Bourne Supremacy to Bourne Approximation.",
                    "score": 7,
                    "depth": 7,
                    "replies": []
                }
            ]
        }
    ],
    "success": true,
    "error": null
}

Note: The example above shows key fields for readability. The actual output contains 80+ fields per post and per comment, as documented in the tables above.
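
To see how much of a thread was captured, you can compare the number of comments in the tree with the `num_comments` value reported by Reddit. A rough sketch, assuming the output shape shown above; `count_comments` and `coverage` are hypothetical helpers, and the ratio is only indicative because the reported count may include comments that are no longer retrievable or that were cut off by `max_comments`.

```python
from typing import Any, Dict, List, Optional

Post = Dict[str, Any]


def count_comments(comments: List[Dict[str, Any]]) -> int:
    """Count every comment in the tree, including nested replies."""
    total = 0
    stack = list(comments)
    while stack:
        comment = stack.pop()
        total += 1
        stack.extend(comment.get("replies") or [])
    return total


def coverage(post: Post) -> Optional[float]:
    """Extracted comments divided by num_comments, or None if nothing was reported."""
    reported = post.get("num_comments") or 0
    if reported == 0:
        return None
    return count_comments(post.get("comments") or []) / reported


# Invented sample post: Reddit reports 4 comments, the tree holds 3.
sample_post = {
    "num_comments": 4,
    "comments": [
        {"id": "a", "replies": [{"id": "b", "replies": []}]},
        {"id": "c", "replies": []},
    ],
}
print(coverage(sample_post))
```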

Roadmap

Planned updates for this Actor:

  • 100% comment accuracy: extract every comment on a post without missing any.
  • Subreddit scraping: input a subreddit URL to get all of its posts.
  • User profile scraping: extract post and comment history from Reddit user profiles.
  • Speed improvements: batch processing for faster runs.

Integrations

JavaScript / TypeScript

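// Requires the apify-client package and an ES module context (top-level await).
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the Actor and wait for the run to finish.
const run = await client.actor('YOUR_ACTOR_ID').call({
    urls: [
        { url: 'https://www.reddit.com/r/AskReddit/comments/1rcxhjq/people_40_what_actually_mattered_in_the_long_run/' },
    ],
    sort_type: 'top',
    max_comments: null, // null extracts all comments
});

// Each dataset item is one post with its nested comment tree.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((post) => {
    console.log(`${post.title}: ${post.num_comments} comments, score: ${post.score}`);
    // comments.length counts top-level comments only; replies are nested inside each one.
    console.log(`Top-level comments extracted: ${(post.comments ?? []).length}`);
});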

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Start the Actor and wait for the run to finish.
run = client.actor("YOUR_ACTOR_ID").call(run_input={
    "urls": [
        {"url": "https://www.reddit.com/r/AskReddit/comments/1rcxhjq/people_40_what_actually_mattered_in_the_long_run/"}
    ],
    "sort_type": "top",
    "max_comments": None,  # None extracts all comments
})

# Each dataset item is one post with its nested comment tree.
for post in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{post['title']}: {post['num_comments']} comments")
    # Only top-level comments are printed here; replies are nested under "replies".
    for comment in post.get("comments", []):
        print(f"  [{comment['score']}] {comment['author']}: {comment['body'][:80]}...")

FAQs for Reddit Scraper

How is this different from other Reddit scrapers?

Three key differences:

  1. Comment accuracy — Most scrapers miss deeply nested replies. This Actor follows the full comment tree.
  2. Rich data — You get 80+ fields per post/comment (the full data Reddit stores), not just what's visible on the webpage.
  3. No browser — Runs without Playwright or Puppeteer, which means lower costs and faster runs.

Does it scrape entire subreddits?

Currently, this Actor scrapes individual Reddit posts with their full comment threads. Subreddit-level scraping (listing all posts from a subreddit) is on the roadmap and coming soon.

Can it scrape user profiles?

User profile scraping is planned for a future update. Currently, the Actor extracts author information (username, flair, premium status) from within the posts and comments it scrapes.

Do I need Reddit API keys?

No. This Actor does not require any Reddit API keys, OAuth tokens, or authentication. It works out of the box.

Can I export Reddit data to CSV or Excel?

Yes. Export directly from the Apify console as JSON, CSV, Excel, or XML.

What does the sort_type parameter do?

It controls how comments are sorted before extraction, matching Reddit's own sort options:

  • top: Highest-scored comments first
  • best: Reddit's "best" ranking (a confidence-based sort that weighs a comment's upvote ratio against how many votes it has received)
  • new: Most recent comments first
  • controversial: Most debated comments first
  • old: Oldest comments first
  • qa: Q&A format (OP replies prioritized)

This Actor accesses only publicly available Reddit data. Users are responsible for complying with applicable laws, Reddit's Terms of Service, and their own jurisdiction's data regulations.

This Actor extracts only publicly available data from Reddit. Users are responsible for:

  • Complying with Reddit's Terms of Service and API Terms
  • Following applicable data protection laws (GDPR, CCPA, etc.)
  • Using extracted data ethically and in accordance with their jurisdiction's regulations
  • Not using the data to harass, dox, or target individual Reddit users

This Actor does not bypass authentication, access private subreddits, or extract data that requires a logged-in session.