Reddit Thread & Comments Scraper
Pricing
from $3.00 / 1,000 results
Scrape any Reddit post and its complete comment thread — including deeply nested replies — in seconds. Supports bulk URLs, cursor-based pagination for large threads, flat or nested output, score filtering, and depth capping. Perfect for sentiment analysis, AI training data, and community research.
Developer
Datara
Extract any Reddit post and its full comment tree — including nested replies at any depth — in seconds. Supports cursor-based pagination for threads with hundreds or thousands of comments.
What This Actor Does
Given one or more Reddit post URLs, this actor:
- Fetches the post metadata (title, author, score, upvote ratio, subreddit subscriber count, etc.)
- Fetches the comment tree, including nested replies, paginating through comment pages until maxPages or maxCommentsPerPost is reached or no pages remain
- Pushes each post and comment as a clean, structured dataset record
Output records are typed (recordType: "post" or "comment") and immediately usable in spreadsheets, databases, AI pipelines, or downstream automations.
Use Cases
- Sentiment analysis — analyse how communities respond to products, brands, or announcements
- AI training data — collect high-quality human conversation threads for LLM fine-tuning or RLHF
- Community research — surface recurring themes, pain points, and opinions across subreddits
- Qualitative market research — understand what real users say about your category
- Content strategy — identify high-scoring discussions to inform editorial direction
Input Fields
| Field | Type | Default | Description |
|---|---|---|---|
postUrl | string | — | Single Reddit post URL to scrape |
postUrls | array | [] | List of Reddit post URLs for bulk scraping (overrides postUrl) |
maxPages | integer | 3 | Max comment pages to fetch per post (cursor pagination, ~25 top-level comments per page) |
flattenComments | boolean | true | Output comments as individual flat records (true) or with nested replies arrays (false) |
includePostRecord | boolean | true | Include the post as a separate dataset record |
minCommentScore | integer | 0 | Skip comments below this score threshold |
maxCommentDepth | integer | 10 | Maximum reply nesting depth to include (0 = top-level only) |
maxCommentsPerPost | integer | 200 | Cap on total comment records per post (1–5000) |
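The section above does not spell out how the three filter fields interact. The sketch below shows one plausible reading; it is not the actor's actual code.

- Each comment record is a dict with the `score` and `depth` keys from the comment schema.
- A record is kept only if `score >= minCommentScore` and `depth <= maxCommentDepth`.
- Collection stops once `maxCommentsPerPost` records have survived filtering.

The function `apply_comment_filters` is hypothetical.

```python
from typing import Any, Iterable


def apply_comment_filters(
    comments: Iterable[dict[str, Any]],
    min_comment_score: int = 0,
    max_comment_depth: int = 10,
    max_comments_per_post: int = 200,
) -> list[dict[str, Any]]:
    """Hypothetical sketch of minCommentScore / maxCommentDepth / maxCommentsPerPost."""
    if not 1 <= max_comments_per_post <= 5000:
        raise ValueError("maxCommentsPerPost must be between 1 and 5000")
    kept: list[dict[str, Any]] = []
    for comment in comments:
        # Skip comments scored below the threshold.
        if comment["score"] < min_comment_score:
            continue
        # depth 0 is top-level; keep only depths up to the configured maximum.
        if comment["depth"] > max_comment_depth:
            continue
        kept.append(comment)
        # Stop once the per-post cap is reached.
        if len(kept) >= max_comments_per_post:
            break
    return kept


sample = [
    {"id": "a", "score": 75, "depth": 0},
    {"id": "b", "score": 1, "depth": 1},
    {"id": "c", "score": 9, "depth": 6},
    {"id": "d", "score": 12, "depth": 2},
]
print([c["id"] for c in apply_comment_filters(sample, min_comment_score=5, max_comment_depth=5)])
# → ['a', 'd']
```

This sketch filters each record independently. The source does not say whether the actor also drops the replies of a filtered-out comment.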
Bulk mode: If `postUrls` is non-empty, the single `postUrl` field is ignored. Duplicate URLs are removed automatically.
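The bulk-mode rule can be sketched as a small resolution step. This illustration rests on two assumptions:

- `resolve_post_urls` is a hypothetical name.
- Deduplication here is exact string matching. The actor might also normalise URLs (trailing slashes, query strings) before comparing them.

```python
from typing import Optional


def resolve_post_urls(post_url: Optional[str], post_urls: Optional[list[str]]) -> list[str]:
    """Hypothetical sketch: decide which post URLs a run would process."""
    # A non-empty postUrls list takes precedence; postUrl is then ignored.
    if post_urls:
        candidates = post_urls
    elif post_url:
        candidates = [post_url]
    else:
        candidates = []
    # dict.fromkeys drops exact duplicates while keeping first-seen order.
    return list(dict.fromkeys(candidates))


print(resolve_post_urls("https://example.invalid/single", ["u1", "u2", "u1"]))
# → ['u1', 'u2']
```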
Single URL Example Input
{
  "postUrl": "https://www.reddit.com/r/startups/comments/1abc23/we_just_hit_10k_mrr_heres_what_worked/",
  "maxPages": 5,
  "flattenComments": true,
  "includePostRecord": true,
  "minCommentScore": 5,
  "maxCommentDepth": 5,
  "maxCommentsPerPost": 500
}
Bulk URL Example Input
{
  "postUrls": [
    "https://www.reddit.com/r/SaaS/comments/1abc11/thoughts_on_pricing_models/",
    "https://www.reddit.com/r/entrepreneur/comments/1abc22/bootstrapped_to_1m_ama/",
    "https://www.reddit.com/r/startups/comments/1abc33/why_we_shut_down/"
  ],
  "maxPages": 3,
  "flattenComments": true,
  "includePostRecord": true,
  "minCommentScore": 2,
  "maxCommentDepth": 10,
  "maxCommentsPerPost": 300
}
Output Schema
Post Record (recordType: "post")
| Field | Type | Description |
|---|---|---|
recordType | string | Always "post" |
id | string | Reddit short ID (e.g. 1lfbo7u) |
name | string | Reddit fullname, prefixed t3_ (e.g. t3_1lfbo7u) |
title | string | Post title |
author | string | Username of the poster |
authorFullname | string | Reddit internal author ID (e.g. t2_16syu27ar1) |
subreddit | string | Subreddit name (without r/) |
url | string | Full post URL |
score | integer | Net vote score |
ups | integer | Upvote count (fuzzy-rounded by Reddit) |
downs | integer | Downvote count (almost always 0) |
upvoteRatio | number | Ratio of upvotes to total votes (0–1) |
numComments | integer | Total comment count as reported by Reddit |
subredditSubscribers | integer | Subscriber count of the subreddit |
isVideo | boolean | True if the post contains a Reddit-hosted video |
totalAwardsReceived | integer | Number of Reddit awards |
createdUtc | number | Unix timestamp (UTC seconds) of post creation |
createdAt | string | ISO 8601 datetime of post creation |
scrapedAt | string | ISO 8601 datetime when the record was scraped |
Example post record:
{
  "recordType": "post",
  "id": "1lfbo7u",
  "name": "t3_1lfbo7u",
  "title": "What is a thing you love that lots of people hate?",
  "author": "Vetro_Nodulare2",
  "authorFullname": "t2_16syu27ar1",
  "subreddit": "AskReddit",
  "url": "https://www.reddit.com/r/AskReddit/comments/1lfbo7u/what_is_a_thing_you_love_that_lots_of_people_hate/",
  "score": 47,
  "ups": 47,
  "downs": 0,
  "upvoteRatio": 0.91,
  "numComments": 353,
  "subredditSubscribers": 56146601,
  "isVideo": false,
  "totalAwardsReceived": 0,
  "createdUtc": 1750341959,
  "createdAt": "2025-06-19T14:05:59.000Z",
  "scrapedAt": "2025-06-20T09:45:12.000Z"
}
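`createdAt` is `createdUtc` rendered as ISO 8601. This minimal sketch assumes you want to rebuild one field from the other in your own pipeline. `to_iso_utc` is a hypothetical helper that reproduces the format in the example record: UTC, millisecond precision, `Z` suffix.

```python
from datetime import datetime, timezone


def to_iso_utc(created_utc: float) -> str:
    """Hypothetical helper: format Unix seconds like the createdAt field."""
    dt = datetime.fromtimestamp(created_utc, tz=timezone.utc)
    # isoformat(timespec="milliseconds") yields "...T14:05:59.000+00:00"; swap the offset for "Z".
    return dt.isoformat(timespec="milliseconds").replace("+00:00", "Z")


print(to_iso_utc(1750341959))  # → 2025-06-19T14:05:59.000Z
```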
Comment Record (recordType: "comment")
| Field | Type | Description |
|---|---|---|
recordType | string | Always "comment" |
id | string | Reddit short ID (e.g. mymupxb) |
name | string | Reddit fullname, prefixed t1_ (e.g. t1_mymupxb) |
postId | string | Short ID of the parent post |
postUrl | string | Full URL of the parent post |
author | string | Username of the commenter |
authorFullname | string | Reddit internal author ID |
body | string | Plain-text comment body |
score | integer | Net vote score |
ups | integer | Upvote count |
downs | integer | Downvote count |
depth | integer | Nesting depth (0 = top-level, 1 = reply to top-level, etc.) |
parentId | string | Fullname of the parent (t3_... if replying to post, t1_... if replying to comment) |
linkId | string | Fullname of the parent post (always t3_...) |
subreddit | string | Subreddit name |
url | string | Full URL of this comment |
permalink | string | Relative permalink path |
gilded | integer | Number of times gilded |
stickied | boolean | True if pinned by a moderator |
locked | boolean | True if the comment thread is locked |
archived | boolean | True if too old to receive votes |
controversiality | integer | 0 or 1; 1 = high vote split |
totalAwardsReceived | integer | Number of Reddit awards |
createdUtc | number | Unix timestamp (UTC seconds) of comment creation |
createdAt | string | ISO 8601 datetime of comment creation |
scrapedAt | string | ISO 8601 datetime when scraped |
replies | array | [Nested mode only] Child comment records |
Example comment record (flat mode):
{
  "recordType": "comment",
  "id": "mymupxb",
  "name": "t1_mymupxb",
  "postId": "1lfbo7u",
  "postUrl": "https://www.reddit.com/r/AskReddit/comments/1lfbo7u/what_is_a_thing_you_love_that_lots_of_people_hate/",
  "author": "Background-Emu-2890",
  "authorFullname": "t2_efdlposp6",
  "body": "Black cat — I have one and I love her so much!",
  "score": 75,
  "ups": 75,
  "downs": 0,
  "depth": 0,
  "parentId": "t3_1lfbo7u",
  "linkId": "t3_1lfbo7u",
  "subreddit": "AskReddit",
  "url": "https://www.reddit.com/r/AskReddit/comments/1lfbo7u/what_is_a_thing_you_love_that_lots_of_people_hate/mymupxb/",
  "permalink": "/r/AskReddit/comments/1lfbo7u/what_is_a_thing_you_love_that_lots_of_people_hate/mymupxb/",
  "gilded": 0,
  "stickied": false,
  "locked": false,
  "archived": false,
  "controversiality": 0,
  "totalAwardsReceived": 0,
  "createdUtc": 1750342221,
  "createdAt": "2025-06-19T14:10:21.000Z",
  "scrapedAt": "2025-06-20T09:45:12.000Z"
}
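`name`, `parentId`, `linkId` and `authorFullname` all use Reddit's fullname format: a type prefix (`t1_` comment, `t2_` account, `t3_` post) followed by the short ID. The sketch below splits a fullname and uses it to tell direct replies to the post apart from replies to other comments. The helpers `parse_fullname` and `is_direct_reply_to_post` are hypothetical. They handle only the three prefixes that appear in this actor's records.

```python
from typing import Any

# Reddit fullname prefixes that appear in this actor's records.
KIND_BY_PREFIX = {"t1": "comment", "t2": "account", "t3": "post"}


def parse_fullname(fullname: str) -> tuple[str, str]:
    """Hypothetical helper: split a fullname such as 't1_mymupxb' into (kind, short_id)."""
    prefix, sep, short_id = fullname.partition("_")
    if not sep or not short_id or prefix not in KIND_BY_PREFIX:
        raise ValueError(f"unrecognised fullname: {fullname!r}")
    return KIND_BY_PREFIX[prefix], short_id


def is_direct_reply_to_post(comment: dict[str, Any]) -> bool:
    """True when parentId points at the post (t3_) rather than another comment (t1_)."""
    kind, _ = parse_fullname(comment["parentId"])
    return kind == "post"


# Fields copied from the example comment record above.
example = {"id": "mymupxb", "name": "t1_mymupxb", "parentId": "t3_1lfbo7u", "depth": 0}
print(parse_fullname(example["name"]))   # → ('comment', 'mymupxb')
print(is_direct_reply_to_post(example))  # → True
```

For well-formed data, a direct reply to the post should also have `depth` 0, as in the example record.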
Flat vs Nested Comment Output
Flat mode (flattenComments: true, default):
- Every comment and reply is pushed as its own dataset record
- Use `depth` to understand the nesting level (0 = top-level)
- Use `parentId` to reconstruct the tree (a `t1_XXX` value is the parent comment; a `t3_XXX` value means the comment replies directly to the post)
- Best for: spreadsheet analysis, databases, ML pipelines, CSV export
Nested mode (flattenComments: false):
- Top-level comment records contain a `replies` array with child records embedded
- Each child also contains its own `replies` array, forming a full tree
- Best for: JSON tree processing, displaying thread structure
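Flat output loses no structure, because the tree can be rebuilt from `name` and `parentId`. This minimal sketch assumes each flat comment record carries those two fields, as in the comment schema.

- `build_comment_tree` is hypothetical.
- Its output is shaped like nested mode but is not guaranteed to match the actor's nested records field for field.
- A comment whose parent is missing from the input (for example, dropped by minCommentScore) becomes a root. That is a design choice of this sketch.

```python
from typing import Any


def build_comment_tree(flat_comments: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical sketch: rebuild nested `replies` arrays from flat records via parentId."""
    # Copy each record, give it an empty replies list, and index it by fullname.
    by_name = {c["name"]: {**c, "replies": []} for c in flat_comments}
    roots: list[dict[str, Any]] = []
    for node in by_name.values():
        parent = by_name.get(node["parentId"])
        if parent is not None:
            parent["replies"].append(node)
        else:
            # t3_ parents (direct replies to the post) become roots,
            # as do comments whose parent comment is not in the input.
            roots.append(node)
    return roots


flat = [
    {"name": "t1_a", "parentId": "t3_post", "depth": 0},
    {"name": "t1_b", "parentId": "t1_a", "depth": 1},
    {"name": "t1_c", "parentId": "t1_b", "depth": 2},
    {"name": "t1_d", "parentId": "t3_post", "depth": 0},
]
tree = build_comment_tree(flat)
print([r["name"] for r in tree])                    # → ['t1_a', 't1_d']
print(tree[0]["replies"][0]["replies"][0]["name"])  # → t1_c
```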
Pagination
The ScrapeCreators API uses cursor-based pagination. Each page returns approximately 25 top-level comments. Set maxPages to control how many pages to fetch:
- `maxPages: 1` fetches ~25 top-level comments (fastest)
- `maxPages: 5` fetches ~125 top-level comments plus their replies, up to the maxCommentsPerPost cap
- `maxPages: 20` gives comprehensive extraction for large threads
The actor stops paginating early once maxCommentsPerPost is reached or the API signals that no more pages are available.
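These stopping rules amount to a bounded cursor loop. This is a sketch, not the actor's implementation, and it rests on these assumptions:

- `fetch_page` stands in for a call to the ScrapeCreators API.
- Its signature is assumed: it takes a cursor and returns a page of comments plus the next cursor, with `None` meaning no further page.
- Each page is treated as a flat list of comment records.
- `fake_fetch_page` is a hypothetical stub that serves three 25-comment pages.

```python
from typing import Any, Callable, Optional

# Assumed shape of a page fetcher: cursor in (None for the first page),
# (comments_on_page, next_cursor) out. Not the real ScrapeCreators signature.
PageFetcher = Callable[[Optional[str]], tuple[list[dict[str, Any]], Optional[str]]]


def fetch_comments(fetch_page: PageFetcher, max_pages: int = 3,
                   max_comments_per_post: int = 200) -> list[dict[str, Any]]:
    collected: list[dict[str, Any]] = []
    cursor: Optional[str] = None
    for _ in range(max_pages):
        page, cursor = fetch_page(cursor)
        # Take only as many comments as the per-post cap still allows.
        room = max_comments_per_post - len(collected)
        collected.extend(page[:room])
        # Stop early when the cap is hit or there is no next page.
        if len(collected) >= max_comments_per_post or cursor is None:
            break
    return collected


def fake_fetch_page(cursor: Optional[str]) -> tuple[list[dict[str, Any]], Optional[str]]:
    """Hypothetical stub: pages 0, 1 and 2 of 25 comments each, then no cursor."""
    page_no = 0 if cursor is None else int(cursor)
    comments = [{"id": f"p{page_no}c{i}"} for i in range(25)]
    next_cursor = str(page_no + 1) if page_no < 2 else None
    return comments, next_cursor


print(len(fetch_comments(fake_fetch_page, max_pages=5)))                            # → 75
print(len(fetch_comments(fake_fetch_page, max_pages=5, max_comments_per_post=60)))  # → 60
```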
Error Handling
- Failed URLs push an error record (`error: true`), and processing continues with the remaining URLs
- Posts with no comments push a warning record
- The dataset always contains at least one record per run
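Error and warning records share the dataset with posts and comments, so downstream code usually separates them first. This minimal sketch assumes dataset items are plain dicts.

- Error records are recognised by `error: true`, as described above.
- Posts and comments are recognised by `recordType`.
- Everything else lands in an `other` bucket. That includes the no-comments warning, whose exact fields are not documented here.

`partition_records` is hypothetical.

```python
from collections import defaultdict
from typing import Any


def partition_records(items: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Hypothetical sketch: group dataset items into posts, comments, errors and other."""
    groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for item in items:
        if item.get("error") is True:
            groups["errors"].append(item)
        elif item.get("recordType") == "post":
            groups["posts"].append(item)
        elif item.get("recordType") == "comment":
            groups["comments"].append(item)
        else:
            # e.g. the no-comments warning record, whose fields are not documented here
            groups["other"].append(item)
    return dict(groups)


items = [
    {"recordType": "post", "id": "1lfbo7u"},
    {"recordType": "comment", "id": "mymupxb"},
    {"error": True},
]
print({k: len(v) for k, v in partition_records(items).items()})
# → {'posts': 1, 'comments': 1, 'errors': 1}
```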
Pricing
This actor uses Pay Per Event (PPE) pricing:
- $0.30 per 100 records (each post and each comment counts as one record)
- A thread with 1 post + 199 comments = 200 records = $0.60
- Bulk run: 10 threads × (1 post + 200 comments) = 2,010 records = $6.03
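The pricing examples follow from a single rate of $0.30 per 100 records, which is $0.003 per record and matches the $3.00 / 1,000 results headline. The sketch below reproduces them, using `Decimal` to avoid floating-point rounding. `estimate_cost` is hypothetical and makes two simplifications:

- It assumes every thread reaches the same comment count.
- It ignores error and warning records, whose billing the source does not specify.

```python
from decimal import ROUND_HALF_UP, Decimal

PRICE_PER_100_RECORDS = Decimal("0.30")  # rate stated in the pricing section


def estimate_cost(num_threads: int, comments_per_thread: int,
                  include_post_record: bool = True) -> Decimal:
    """Hypothetical estimate: each post record and each comment record is billed once."""
    records_per_thread = comments_per_thread + (1 if include_post_record else 0)
    total_records = num_threads * records_per_thread
    cost = PRICE_PER_100_RECORDS * total_records / 100
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_cost(1, 199))   # → 0.60
print(estimate_cost(10, 200))  # → 6.03
```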
Support
For questions or feature requests, contact the actor publisher via the Apify Store messaging system.