Reddit Post Comments Scraper | Bulk Thread & Reply Export

Pricing: from $2.99 / 1,000 posts

Scrape Reddit posts with full comment trees. 6 sort orders, Q&A filtering, and deep sub-thread expansion. Bulk URLs, CSV upload, any format.


Developer: ClearPath (Maintained by Community)


Reddit Post & Comments Scraper | Bulk Thread Export (2026)

10,000 comments in under 2 minutes — full comment tree expansion, 6 sort orders, bulk processing.

Paste one post URL or upload thousands. Get back every comment, including deeply nested reply chains, sorted exactly how you need them.

ClearPath Reddit Suite — search, analyze, and monitor Reddit at scale:

  • Post & Comments — this actor
  • Profile Scraper — bulk profile & karma lookup
  • User Content — posts & comments history
  • Answers API — Reddit answers for LLMs

Copy to your AI assistant

Copy this block into ChatGPT, Claude, Cursor, or any LLM to start building with this data.

Reddit Post & Comments Scraper (clearpath/reddit-post-comments-bulk-scraper) on Apify scrapes Reddit posts with full comment trees in bulk. Processes multiple posts concurrently. Returns structured data: post metadata (title, subreddit, comment count) plus every comment with author, score, timestamp, depth, parent ID, body text, flair, stickied/locked status, and child count. Supports 6 sort orders: best, top, new, old, controversial, Q&A. Q&A/AMA filtering: answered-by-OP only, or unanswered only. Optional deep expansion recursively opens all collapsed reply chains to capture every comment in a thread. Input: single post URL, array of URLs, or uploaded CSV/TXT file. Accepts any format: full URL, short link (redd.it/abc123), or post ID. Max comments per post configurable (default 200, 0 for unlimited). Output: JSON array, one object per post/comment. Pricing: $2.99 per 1,000 posts + $1.00 per 1,000 comments (PPE). Free tier: 5 lifetime runs, 50 comments per run. Apify token required.

Key Features

  • Fast bulk processing — processes multiple posts concurrently, so bulk runs finish in minutes instead of hours
  • Full comment tree expansion — optionally scrape every comment in a thread, including deeply nested reply chains that Reddit collapses behind "load more" links
  • 6 sort orders — best, top, new, old, controversial, Q&A. Each sort order returns a different ranking of the same comments
  • Q&A/AMA filtering — isolate comments answered by the original poster, or find unanswered questions only. Built for extracting structured knowledge from AMA threads
  • Bulk input, any format — paste URLs one by one, use bulk edit to paste a list, or upload a CSV/TXT file with thousands of post links. Accepts full URLs, short links, and bare post IDs

How to Scrape Reddit Post Comments

Single post, default settings

Paste any Reddit post URL. The actor fetches the post metadata and up to 200 comments sorted by best.

```json
{
  "postUrl": "https://www.reddit.com/r/AskReddit/comments/1k2vnz0/"
}
```

Short links and bare IDs also work:

```json
{
  "postUrl": "redd.it/1k2vnz0"
}
```

```json
{
  "postUrl": "1k2vnz0"
}
```

You can use any of these formats interchangeably. The actor normalizes all inputs before processing.
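To illustrate, here is a minimal sketch of that normalization in Python. It is hypothetical — the actor's real code is not published, and the function name and regexes are mine — but it covers exactly the formats listed above.

```python
import re

def normalize_post_id(raw: str) -> str:
    """Reduce any accepted input format to a bare Reddit post ID (illustrative sketch)."""
    raw = raw.strip()
    # Prefixed ID: t3_abc123 -> abc123
    if raw.startswith("t3_"):
        return raw[3:]
    # Full URL: .../comments/<id>/... -> <id>
    m = re.search(r"/comments/([a-z0-9]+)", raw)
    if m:
        return m.group(1)
    # Short link: redd.it/<id> -> <id>
    m = re.search(r"redd\.it/([a-z0-9]+)", raw)
    if m:
        return m.group(1)
    # Otherwise assume it is already a bare post ID
    return raw
```

All four input styles collapse to the same ID, which is why they can be mixed freely in one run.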

All comments, sorted by top

Set maxCommentsPerPost to 0 and enable expandAllComments to get the complete comment tree. Use sort to control the ordering.

```json
{
  "postUrl": "https://www.reddit.com/r/SeveranceAppleTVPlus/comments/1k2vnz0/",
  "sort": "top",
  "maxCommentsPerPost": 0,
  "expandAllComments": true
}
```

This configuration captures every single comment in the thread, including replies nested 10+ levels deep that Reddit hides behind "continue this thread" links. Expect longer run times for threads with thousands of comments.

Multiple posts in one run

Use the postUrls array for small batches. Mix any URL format freely.

```json
{
  "postUrls": [
    "https://www.reddit.com/r/AskReddit/comments/1k2vnz0/",
    "redd.it/abc123",
    "t3_xyz789"
  ],
  "sort": "new",
  "maxCommentsPerPost": 500
}
```

The actor processes posts in parallel, so adding more posts doesn't proportionally increase run time.

Bulk from file

Upload a .txt or .csv file through the Apify Console, or point to a hosted file URL. TXT files expect one URL per line. CSV files auto-detect a url or permalink column.

```json
{
  "postUrlsFile": "https://example.com/my-post-urls.csv",
  "sort": "best",
  "maxCommentsPerPost": 100
}
```

You can also drag and drop a file directly into the "Post URLs file" field in the Apify Console. This is the fastest way to run large batches without writing any code.
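As a rough illustration of the column auto-detection described above, a url or permalink header can be found like this. The function name and fallback behavior are assumptions for the sketch, not the actor's actual logic.

```python
import csv
import io

def extract_urls_from_csv(csv_text: str) -> list[str]:
    """Pull post URLs from CSV text, preferring a 'url' or 'permalink' column (illustrative)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if not reader.fieldnames:
        return []
    fields = [f.lower() for f in reader.fieldnames]
    # Prefer a recognized column name, case-insensitively
    for candidate in ("url", "permalink"):
        if candidate in fields:
            col = reader.fieldnames[fields.index(candidate)]
            return [row[col] for row in reader if row.get(col)]
    # No recognized column: fall back to the first column
    first = reader.fieldnames[0]
    return [row[first] for row in reader if row.get(first)]
```

A TXT file is simpler still: one URL per line, no header detection needed.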

Q&A/AMA thread: only OP answers

Use sort: "qa" with filter: "answered" to get only comments that the original poster replied to. This turns a sprawling AMA with thousands of comments into a clean, structured Q&A dataset.

```json
{
  "postUrl": "https://www.reddit.com/r/IAmA/comments/abc123/",
  "sort": "qa",
  "filter": "answered",
  "maxCommentsPerPost": 0,
  "expandAllComments": true
}
```

Find unanswered questions

The opposite of the above. Set filter: "unanswered" to find questions in a Q&A thread that the OP never responded to.

```json
{
  "postUrl": "https://www.reddit.com/r/IAmA/comments/abc123/",
  "sort": "qa",
  "filter": "unanswered"
}
```

Controversial comments only

Get comments sorted by controversy. Reddit's controversial algorithm surfaces comments with a roughly equal number of upvotes and downvotes.

```json
{
  "postUrl": "https://www.reddit.com/r/politics/comments/abc123/",
  "sort": "controversial",
  "maxCommentsPerPost": 50
}
```

Chronological order

Sort by old to get comments in the order they were posted. Useful for analyzing how a discussion evolved over time.

```json
{
  "postUrl": "https://www.reddit.com/r/worldnews/comments/abc123/",
  "sort": "old",
  "maxCommentsPerPost": 0,
  "expandAllComments": true
}
```

Input Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| postUrl | string | — | A single Reddit post URL, short link (redd.it/abc), or post ID |
| postUrls | string[] | [] | Multiple post URLs or IDs. Use "Bulk edit" to paste a list |
| postUrlsFile | string | — | Upload a .txt/.csv file, or paste a URL to a hosted file |
| sort | enum | best | Comment sort order: best, top, new, old, controversial, qa |
| filter | enum | all | Comment filter: all comments, answered (OP replies only), unanswered |
| maxCommentsPerPost | integer | 200 | Max comments to scrape per post. Set to 0 for unlimited |
| expandAllComments | boolean | false | Recursively expand every collapsed reply chain. Slower but captures the full tree |

At least one of postUrl, postUrls, or postUrlsFile is required.

Sort order reference

| Value | Behavior |
|---|---|
| best | Reddit's default. Confidence-weighted ranking that balances score and vote count |
| top | Highest score first (upvotes minus downvotes) |
| new | Most recent comments first |
| old | Oldest comments first (chronological) |
| controversial | Comments with roughly equal upvotes and downvotes |
| qa | Prioritizes comments from the original poster. Combine with filter for Q&A extraction |

What Data Can You Extract from Reddit Posts?

Output Example

The output contains two row types: post rows and comment rows. Each post URL you provide produces one post row followed by its comment rows. The _post_id field links comments back to their parent post, so you can easily group and filter results when scraping multiple posts.

Post row

One post row per URL, containing the thread metadata:

```json
{
  "_type": "post",
  "_post_id": "t3_1k2vnz0",
  "_status": "found",
  "title": "Mark and Helly's relationship is kinda strange",
  "subreddit": "SeveranceAppleTVPlus",
  "commentCount": 227
}
```

Comment row (top-level)

Top-level comments have depth: 0 and parentId: null. These are direct replies to the post.

```json
{
  "_type": "comment",
  "_post_id": "t3_1k2vnz0",
  "_status": "found",
  "id": "t1_mnx5qs6",
  "author": "leninzen",
  "score": 1202,
  "createdAt": "2025-04-19T13:03:54.022000+0000",
  "editedAt": null,
  "depth": 0,
  "parentId": null,
  "permalink": "/r/SeveranceAppleTVPlus/comments/1k2vnz0/.../mnx5qs6/",
  "body": "You do see it slowly build tbh. Especially after Mark's response to Helly's hanging attempt...",
  "isStickied": false,
  "isLocked": false,
  "isScoreHidden": false,
  "distinguishedAs": null,
  "authorFlair": null,
  "isDeleted": false,
  "childCount": 11
}
```

Comment row (nested reply)

Replies have depth > 0 and a parentId pointing to the comment they're replying to. You can reconstruct the full thread tree from these two fields.

```json
{
  "_type": "comment",
  "_post_id": "t3_1k2vnz0",
  "_status": "found",
  "id": "t1_mnxbiet",
  "author": "Lanky_Perception_136",
  "score": 408,
  "createdAt": "2025-04-19T13:42:11.000000+0000",
  "editedAt": null,
  "depth": 1,
  "parentId": "t1_mnx5qs6",
  "permalink": "/r/SeveranceAppleTVPlus/comments/1k2vnz0/.../mnxbiet/",
  "body": "Like her slight smirk when he tells her he's happy she's here...",
  "isStickied": false,
  "isLocked": false,
  "isScoreHidden": false,
  "distinguishedAs": null,
  "authorFlair": "Mr. Milkshake",
  "isDeleted": false,
  "childCount": 2
}
```

Full field reference

Post fields:

| Field | Type | Description |
|---|---|---|
| _type | string | Always "post" |
| _post_id | string | Reddit post ID (t3_ prefixed) |
| _status | string | found, not_found, or unavailable |
| title | string | Post title text |
| subreddit | string | Subreddit name without the r/ prefix |
| commentCount | integer | Total comment count reported by Reddit |

Comment fields:

| Field | Type | Description |
|---|---|---|
| _type | string | Always "comment" |
| _post_id | string | Parent post ID (t3_ prefixed), links this comment to its post |
| _status | string | found, not_found, or unavailable |
| id | string | Comment ID (t1_ prefixed) |
| author | string | Username of the commenter, or null for deleted comments |
| score | integer | Upvotes minus downvotes |
| createdAt | string | ISO 8601 creation timestamp |
| editedAt | string/null | ISO 8601 edit timestamp, or null if never edited |
| depth | integer | Nesting depth. 0 = top-level reply to the post, 1 = reply to a top-level comment, etc. |
| parentId | string/null | Parent comment ID (t1_ prefixed), or null for top-level comments |
| permalink | string | Relative URL path to the comment on Reddit |
| body | string | Comment text in markdown format |
| isStickied | boolean | true if pinned by a moderator |
| isLocked | boolean | true if replies are disabled |
| isScoreHidden | boolean | true if the score is hidden by subreddit rules (usually for recent comments) |
| distinguishedAs | string/null | "moderator", "admin", or null for regular users |
| authorFlair | string/null | User's flair text in the subreddit, or null |
| isDeleted | boolean | true if the comment was deleted by the author or removed by moderators |
| childCount | integer | Number of direct replies to this comment |

Pricing — Pay Per Event (PPE)

$2.99 per 1,000 posts  •  $1.00 per 1,000 comments

Posts and comments are charged separately. A run scraping 5 posts with 200 comments each costs about $1.01 (5 × $2.99/1,000 posts + 1,000 × $1.00/1,000 comments ≈ $1.015).
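The PPE arithmetic can be sketched as a one-line estimator (the function is illustrative, not an official client):

```python
def estimate_cost(posts: int, comments: int) -> float:
    """Estimate run cost in USD under the PPE pricing above:
    $2.99 per 1,000 posts + $1.00 per 1,000 comments."""
    return posts * 2.99 / 1000 + comments * 1.00 / 1000
```

For example, estimate_cost(5, 1000) returns 1.01495 — 5 posts with 200 comments each.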

Use Cases

Sentiment analysis. Scrape comments from product launch posts, company announcements, or brand mentions to analyze public opinion. The score field provides a built-in signal for community agreement, and depth/parentId let you analyze how discussions branch.

Training data for LLMs. Extract large volumes of structured discussion data. Comments include the full reply chain hierarchy, so you can build conversation trees for fine-tuning dialogue models. The Q&A filter is especially useful for extracting clean question-answer pairs from AMA threads.

Market research. Monitor how Reddit communities discuss products, services, or trends. Scrape comments from relevant subreddit posts to understand what real users think, what complaints come up repeatedly, and what features people request.

Academic research. Study online discourse patterns, community dynamics, or information propagation. The chronological sort (old) combined with depth and parentId lets you reconstruct exactly how conversations developed over time.

Content curation. Extract top-rated comments from popular threads to curate highlights, summaries, or "best of" collections. Sort by top and set a low maxCommentsPerPost to get only the highest-rated responses.

Competitive intelligence. Track discussions about competitors, industry news, or market events across multiple subreddits. Upload a CSV of relevant post URLs and scrape them all in one run.

FAQ

How many comments can I scrape per post? Set maxCommentsPerPost to 0 and enable expandAllComments to get every comment in a thread. Reddit posts can have tens of thousands of comments. The actor handles all of them, including deeply nested reply chains.

What does "Expand all sub-threads" actually do? Reddit collapses deeply nested reply chains and shows "load more comments" links. When expandAllComments is enabled, the actor recursively opens every one of these collapsed chains so you get the complete comment tree. When disabled, you get the comments visible on the default page load, which is faster but incomplete for threads with many replies.

How fast is it? Posts are processed in parallel. A single post with 200 comments finishes in about 3 seconds. Bulk runs with 100 posts at default settings complete in about a minute. Full expansion takes longer proportional to the number of collapsed reply chains.

What URL formats are accepted? Full URLs (reddit.com/r/sub/comments/abc123/title), short links (redd.it/abc123), bare post IDs (abc123), and prefixed IDs (t3_abc123). You can mix formats freely in the same run. The actor normalizes everything before processing.

What happens if a post is deleted or private? Deleted, removed, and private posts are included in the output with "_status": "not_found" or "_status": "unavailable". You're only charged for posts and comments that were successfully scraped.

How does the Q&A filter work? Set sort to qa and filter to answered to get only comment threads where the original poster replied. Set filter to unanswered to get threads with no OP response. This works best on AMA threads and support posts where the OP actively responds to questions.

Can I reconstruct the comment tree from the output? Yes. Every comment includes depth (0 for top-level, 1 for first reply, etc.) and parentId (the ID of the comment it replies to, or null for top-level). Walk these two fields to rebuild the full tree structure in any programming language.
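For example, here is a minimal Python sketch of that walk, using the depth/parentId fields from the output reference above (the helper name and the "replies" key are mine):

```python
from collections import defaultdict

def build_tree(comments: list[dict]) -> list[dict]:
    """Rebuild the nested reply tree from flat comment rows via parentId."""
    children = defaultdict(list)
    for c in comments:
        children[c["parentId"]].append(c)

    def attach(node: dict) -> dict:
        # Recursively hang each comment's direct replies under a "replies" key
        node["replies"] = [attach(child) for child in children.get(node["id"], [])]
        return node

    # Top-level comments have parentId == null in the output
    return [attach(c) for c in children[None]]
```

Rows arrive in thread order, so sibling order inside each "replies" list matches the sort order you requested.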

What's the difference between "best" and "top" sort? "Top" ranks comments purely by score (upvotes minus downvotes). "Best" uses a confidence-weighted algorithm that accounts for both score and vote count. This means newer comments with fewer but mostly positive votes can rank above older comments with more total votes. "Best" is Reddit's default for good reason: it surfaces quality content that hasn't had time to accumulate raw vote counts.
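Reddit's "best" ranking is widely understood to be the lower bound of the Wilson score confidence interval (this was the formula in Reddit's historical open-source ranking code; treat the exact formula used today as an assumption). A sketch of that formulation:

```python
import math

def wilson_lower_bound(ups: int, downs: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval at ~95% confidence.

    Assumed 'best'-style ranking: a few consistently positive votes
    can outrank many mixed votes.
    """
    n = ups + downs
    if n == 0:
        return 0.0
    phat = ups / n
    return (phat + z * z / (2 * n)
            - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)
```

Under this scoring, a comment at 100 up / 1 down outranks one at 1 up / 0 down, which in turn outranks one at 0 up / 1 down — even though plain "top" would rank purely by net score.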

How are deleted comments handled? Deleted comments appear in the output with isDeleted: true, author: null, and body containing "[deleted]" or "[removed]". They're still part of the comment tree and preserve the thread structure. They count toward your comment total and billing.

Is there a limit on how many posts I can scrape in one run? No hard limit. You can process thousands of posts in a single run. The actor scales linearly — doubling the number of posts roughly doubles the run time, not more.

Can I use this with the Apify API or integrations? Yes. Call the actor via the Apify API, schedule recurring runs, or connect it to integrations (webhooks, Zapier, Make, Google Sheets). The output is standard JSON that works with any downstream pipeline.

What's the output format? A flat JSON array. Post rows and comment rows are interleaved: first a post row, then all its comments, then the next post row, and so on. Each row has a _type field ("post" or "comment") and a _post_id field that links comments to their parent post.
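A small sketch of splitting that flat array back into per-post buckets using the _type and _post_id fields (the helper name and result shape are mine, not part of the actor):

```python
from collections import defaultdict

def group_by_post(rows: list[dict]) -> dict[str, dict]:
    """Group a flat result array into {post_id: {"post": row, "comments": [rows]}}."""
    grouped = defaultdict(lambda: {"post": None, "comments": []})
    for row in rows:
        bucket = grouped[row["_post_id"]]
        if row["_type"] == "post":
            bucket["post"] = row
        else:
            bucket["comments"].append(row)
    return dict(grouped)
```

This works regardless of row order, so it is safe even if you later sort or filter the dataset before grouping.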

Support

Extracts publicly available data. Users must comply with Reddit terms and data protection regulations (GDPR, CCPA).


Bulk Reddit post and comment extraction. Full trees, sorted and filtered, from one URL or thousands.