Pricing

from $3.00 / 1,000 results

Reddit Thread & Comments Scraper

Scrape any Reddit post and its complete comment thread — including deeply nested replies — in seconds. Supports bulk URLs, cursor-based pagination for large threads, flat or nested output, score filtering, and depth capping. Perfect for sentiment analysis, AI training data, and community research.

Developer

Datara

Last modified: 3 months ago

What This Actor Does

Given one or more Reddit post URLs, this actor:

Fetches the post metadata (title, author, score, upvote ratio, subscriber count, etc.)
Fetches the full comment tree including nested replies, paginating through all available comment pages
Pushes each post and comment as a clean, structured dataset record

Output records are typed (recordType: "post" or "comment") and immediately usable in spreadsheets, databases, AI pipelines, or downstream automations.
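
Because every record carries a recordType, a downstream script can separate posts from comments in one pass. The sketch below is illustrative only: the sample items are trimmed to a few fields, and the "unknown" bucket is an assumption for records (such as error records) that may not carry a recordType.

```python
from collections import defaultdict


def split_by_record_type(items):
    """Group dataset items by their recordType field."""
    groups = defaultdict(list)
    for item in items:
        # "unknown" is an assumed fallback for items without a recordType.
        groups[item.get("recordType", "unknown")].append(item)
    return dict(groups)


# Sample items, trimmed to a few fields for illustration.
items = [
    {"recordType": "post", "id": "1lfbo7u"},
    {"recordType": "comment", "id": "mymupxb", "postId": "1lfbo7u"},
    {"recordType": "comment", "id": "abc1234", "postId": "1lfbo7u"},
]
groups = split_by_record_type(items)
print(len(groups["post"]), len(groups["comment"]))
```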

Use Cases

Sentiment analysis — analyse how communities respond to products, brands, or announcements
AI training data — collect high-quality human conversation threads for LLM fine-tuning or RLHF
Community research — surface recurring themes, pain points, and opinions across subreddits
Qualitative market research — understand what real users say about your category
Content strategy — identify high-scoring discussions to inform editorial direction

Input Fields

Field	Type	Default	Description
`postUrl`	string	—	Single Reddit post URL to scrape
`postUrls`	array	`[]`	List of Reddit post URLs for bulk scraping (overrides `postUrl`)
`maxPages`	integer	`3`	Max comment pages to fetch per post (cursor pagination, ~25 comments/page)
`flattenComments`	boolean	`true`	Output comments as individual flat records (true) or with nested replies arrays (false)
`includePostRecord`	boolean	`true`	Include the post as a separate dataset record
`minCommentScore`	integer	`0`	Skip comments below this score threshold
`maxCommentDepth`	integer	`10`	Maximum reply nesting depth to include (0 = top-level only)
`maxCommentsPerPost`	integer	`200`	Cap on total comment records per post (1–5000)

Bulk mode: If postUrls is non-empty, the single postUrl field is ignored. Duplicate URLs are removed automatically.
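
The precedence rule above can be sketched in a few lines. This is a minimal illustration, not the actor's code: resolve_post_urls is a hypothetical helper, and it assumes duplicates are detected by exact string match, whereas the actor may normalise URLs before comparing them.

```python
def resolve_post_urls(actor_input):
    """Return the URLs to scrape: postUrls wins over postUrl, duplicates dropped."""
    urls = actor_input.get("postUrls") or []
    if not urls and actor_input.get("postUrl"):
        urls = [actor_input["postUrl"]]
    # dict.fromkeys keeps the first occurrence of each URL, in order.
    return list(dict.fromkeys(urls))


bulk = resolve_post_urls({
    "postUrl": "https://www.reddit.com/r/startups/comments/1abc23/x/",
    "postUrls": [
        "https://www.reddit.com/r/SaaS/comments/1abc11/a/",
        "https://www.reddit.com/r/SaaS/comments/1abc11/a/",
        "https://www.reddit.com/r/startups/comments/1abc33/b/",
    ],
})
single = resolve_post_urls({"postUrl": "https://www.reddit.com/r/startups/comments/1abc23/x/"})
```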

Single URL Example Input

{
  "postUrl": "https://www.reddit.com/r/startups/comments/1abc23/we_just_hit_10k_mrr_heres_what_worked/",
  "maxPages": 5,
  "flattenComments": true,
  "includePostRecord": true,
  "minCommentScore": 5,
  "maxCommentDepth": 5,
  "maxCommentsPerPost": 500
}

Bulk URL Example Input

{
  "postUrls": [
    "https://www.reddit.com/r/SaaS/comments/1abc11/thoughts_on_pricing_models/",
    "https://www.reddit.com/r/entrepreneur/comments/1abc22/bootstrapped_to_1m_ama/",
    "https://www.reddit.com/r/startups/comments/1abc33/why_we_shut_down/"
  ],
  "maxPages": 3,
  "flattenComments": true,
  "includePostRecord": true,
  "minCommentScore": 2,
  "maxCommentDepth": 10,
  "maxCommentsPerPost": 300
}
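
When building inputs programmatically, it can help to merge a partial input with the defaults from the Input Fields table and check the documented 1–5000 range before starting a run. The helper below is a hypothetical sketch; the actor presumably applies its own defaults and validation server-side.

```python
import copy

# Defaults copied from the Input Fields table.
DEFAULTS = {
    "postUrls": [],
    "maxPages": 3,
    "flattenComments": True,
    "includePostRecord": True,
    "minCommentScore": 0,
    "maxCommentDepth": 10,
    "maxCommentsPerPost": 200,
}


def with_defaults(user_input):
    """Merge user input over the documented defaults and check the comment cap."""
    merged = {**copy.deepcopy(DEFAULTS), **user_input}
    if not 1 <= merged["maxCommentsPerPost"] <= 5000:
        raise ValueError("maxCommentsPerPost must be between 1 and 5000")
    return merged


run_input = with_defaults({
    "postUrl": "https://www.reddit.com/r/startups/comments/1abc23/x/",
    "minCommentScore": 5,
})
```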

Output Schema

Post Record (`recordType: "post"`)

Field	Type	Description
`recordType`	string	Always `"post"`
`id`	string	Reddit short ID (e.g. `1lfbo7u`)
`name`	string	Reddit fullname, prefixed `t3_` (e.g. `t3_1lfbo7u`)
`title`	string	Post title
`author`	string	Username of the poster
`authorFullname`	string	Reddit internal author ID (e.g. `t2_16syu27ar1`)
`subreddit`	string	Subreddit name (without `r/`)
`url`	string	Full post URL
`score`	integer	Net vote score
`ups`	integer	Upvote count (fuzzy-rounded by Reddit)
`downs`	integer	Downvote count (almost always 0)
`upvoteRatio`	number	Ratio of upvotes to total votes (0–1)
`numComments`	integer	Total comment count as reported by Reddit
`subredditSubscribers`	integer	Subscriber count of the subreddit
`isVideo`	boolean	True if the post contains a Reddit-hosted video
`totalAwardsReceived`	integer	Number of Reddit awards
`createdUtc`	number	Unix timestamp (UTC seconds) of post creation
`createdAt`	string	ISO 8601 datetime of post creation
`scrapedAt`	string	ISO 8601 datetime when the record was scraped

Example post record:

{
  "recordType": "post",
  "id": "1lfbo7u",
  "name": "t3_1lfbo7u",
  "title": "What is a thing you love that lots of people hate?",
  "author": "Vetro_Nodulare2",
  "authorFullname": "t2_16syu27ar1",
  "subreddit": "AskReddit",
  "url": "https://www.reddit.com/r/AskReddit/comments/1lfbo7u/what_is_a_thing_you_love_that_lots_of_people_hate/",
  "score": 47,
  "ups": 47,
  "downs": 0,
  "upvoteRatio": 0.91,
  "numComments": 353,
  "subredditSubscribers": 56146601,
  "isVideo": false,
  "totalAwardsReceived": 0,
  "createdUtc": 1750341959,
  "createdAt": "2025-06-19T14:05:59.000Z",
  "scrapedAt": "2025-06-20T09:45:12.000Z"
}
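
The createdUtc and createdAt fields describe the same instant. The sketch below shows one way to derive the ISO 8601 string from the Unix timestamp; the millisecond formatting is chosen to match the example record, and whether the actor produces it this way is an assumption.

```python
from datetime import datetime, timezone


def created_at_from_utc(created_utc):
    """Convert Unix seconds (UTC) to an ISO 8601 string with milliseconds."""
    dt = datetime.fromtimestamp(created_utc, tz=timezone.utc)
    millis = dt.microsecond // 1000
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


print(created_at_from_utc(1750341959))  # 2025-06-19T14:05:59.000Z
```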

Comment Record (`recordType: "comment"`)

Field	Type	Description
`recordType`	string	Always `"comment"`
`id`	string	Reddit short ID (e.g. `mymupxb`)
`name`	string	Reddit fullname, prefixed `t1_` (e.g. `t1_mymupxb`)
`postId`	string	Short ID of the parent post
`postUrl`	string	Full URL of the parent post
`author`	string	Username of the commenter
`authorFullname`	string	Reddit internal author ID
`body`	string	Plain-text comment body
`score`	integer	Net vote score
`ups`	integer	Upvote count
`downs`	integer	Downvote count
`depth`	integer	Nesting depth (0 = top-level, 1 = reply to top-level, etc.)
`parentId`	string	Fullname of the parent (`t3_...` if replying to post, `t1_...` if replying to comment)
`linkId`	string	Fullname of the parent post (always `t3_...`)
`subreddit`	string	Subreddit name
`url`	string	Full URL of this comment
`permalink`	string	Relative permalink path
`gilded`	integer	Number of times gilded
`stickied`	boolean	True if pinned by a moderator
`locked`	boolean	True if the comment thread is locked
`archived`	boolean	True if too old to receive votes
`controversiality`	integer	0 or 1; 1 = high vote split
`totalAwardsReceived`	integer	Number of Reddit awards
`createdUtc`	number	Unix timestamp (UTC seconds) of comment creation
`createdAt`	string	ISO 8601 datetime of comment creation
`scrapedAt`	string	ISO 8601 datetime when scraped
`replies`	array	[Nested mode only] Child comment records

Example comment record (flat mode):

{
  "recordType": "comment",
  "id": "mymupxb",
  "name": "t1_mymupxb",
  "postId": "1lfbo7u",
  "postUrl": "https://www.reddit.com/r/AskReddit/comments/1lfbo7u/what_is_a_thing_you_love_that_lots_of_people_hate/",
  "author": "Background-Emu-2890",
  "authorFullname": "t2_efdlposp6",
  "body": "Black cat — I have one and I love her so much!",
  "score": 75,
  "ups": 75,
  "downs": 0,
  "depth": 0,
  "parentId": "t3_1lfbo7u",
  "linkId": "t3_1lfbo7u",
  "subreddit": "AskReddit",
  "url": "https://www.reddit.com/r/AskReddit/comments/1lfbo7u/what_is_a_thing_you_love_that_lots_of_people_hate/mymupxb/",
  "permalink": "/r/AskReddit/comments/1lfbo7u/what_is_a_thing_you_love_that_lots_of_people_hate/mymupxb/",
  "gilded": 0,
  "stickied": false,
  "locked": false,
  "archived": false,
  "controversiality": 0,
  "totalAwardsReceived": 0,
  "createdUtc": 1750342221,
  "createdAt": "2025-06-19T14:10:21.000Z",
  "scrapedAt": "2025-06-20T09:45:12.000Z"
}
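
The score and depth fields make it straightforward to re-filter an exported dataset with stricter thresholds than the original run used. The sketch below mirrors the documented meaning of minCommentScore and maxCommentDepth; keep_comment is a hypothetical helper and the sample records are trimmed for illustration.

```python
def keep_comment(comment, min_score=0, max_depth=10):
    """True if the comment meets the score threshold and the depth cap."""
    return comment["score"] >= min_score and comment["depth"] <= max_depth


comments = [
    {"id": "a", "score": 75, "depth": 0},
    {"id": "b", "score": 3, "depth": 1},
    {"id": "c", "score": 12, "depth": 4},
]
# Keep comments scoring at least 5 that are at most two levels deep.
kept = [c["id"] for c in comments if keep_comment(c, min_score=5, max_depth=2)]
print(kept)  # ['a']
```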

Flat vs Nested Comment Output

Flat mode (flattenComments: true, default):

Every comment and reply is pushed as its own dataset record
Use depth to understand nesting level (0 = top-level)
Use parentId to reconstruct the tree (t1_XXX = parent comment, t3_XXX = direct reply to post)
Best for: spreadsheet analysis, databases, ML pipelines, CSV export
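
As an illustration of the parentId approach, the sketch below rebuilds a reply tree from flat records. It is one possible implementation, not the actor's own: build_tree is a hypothetical helper, and it treats any comment whose parent is missing from the dataset (for example, filtered out by minCommentScore) as a root.

```python
def build_tree(flat_comments):
    """Rebuild nested replies from flat records using name and parentId."""
    nodes = {c["name"]: {**c, "replies": []} for c in flat_comments}
    roots = []
    for c in flat_comments:
        node = nodes[c["name"]]
        parent = nodes.get(c["parentId"])
        if parent is not None:
            parent["replies"].append(node)
        else:
            # parentId is the post (t3_...) or a comment absent from the data.
            roots.append(node)
    return roots


flat = [
    {"name": "t1_aaa", "parentId": "t3_1lfbo7u", "depth": 0},
    {"name": "t1_bbb", "parentId": "t1_aaa", "depth": 1},
    {"name": "t1_ccc", "parentId": "t1_bbb", "depth": 2},
    {"name": "t1_ddd", "parentId": "t3_1lfbo7u", "depth": 0},
]
tree = build_tree(flat)
```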

Nested mode (flattenComments: false):

Top-level comment records contain a replies array with child records embedded
Each child also contains its own replies array, forming a full tree
Best for: JSON tree processing, displaying thread structure
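
Nested output can be converted back to flat records with a short recursive walk. The sketch below assumes each nested record has the shape described above (a replies array of child records); flatten is a hypothetical helper that yields comments in depth-first order and drops the replies key from each emitted record.

```python
def flatten(comments):
    """Yield every comment in a nested tree, depth-first, without its replies."""
    for comment in comments:
        yield {k: v for k, v in comment.items() if k != "replies"}
        yield from flatten(comment.get("replies", []))


nested = [
    {"id": "a", "depth": 0, "replies": [
        {"id": "b", "depth": 1, "replies": [
            {"id": "c", "depth": 2, "replies": []},
        ]},
    ]},
    {"id": "d", "depth": 0, "replies": []},
]
flat_ids = [c["id"] for c in flatten(nested)]
print(flat_ids)  # ['a', 'b', 'c', 'd']
```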

Pagination

The ScrapeCreators API uses cursor-based pagination. Each page returns approximately 25 top-level comments. Set maxPages to control how many pages to fetch:

maxPages: 1 — fast, ~25 top-level comments
maxPages: 5 — ~125 top-level comments plus their replies, subject to maxCommentsPerPost and maxCommentDepth
maxPages: 20 — comprehensive extraction for large threads

The actor stops paginating early if maxCommentsPerPost is reached or the API signals that no more pages are available.
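
The stopping conditions can be sketched as a simple cursor loop. Everything about the page shape here is an assumption: fetch_page is a hypothetical callable returning a dict with "comments" and "cursor" keys, which is not taken from the ScrapeCreators API documentation.

```python
def collect_comments(fetch_page, max_pages=3, max_comments=200):
    """Follow cursors until a page limit, comment cap, or missing cursor stops us."""
    comments, cursor = [], None
    for _ in range(max_pages):
        page = fetch_page(cursor)  # hypothetical: {"comments": [...], "cursor": ...}
        comments.extend(page["comments"])
        if len(comments) >= max_comments:
            return comments[:max_comments]  # cap reached, stop early
        cursor = page.get("cursor")
        if not cursor:
            break  # no further pages available
    return comments


# Fake three-page thread standing in for the real API.
PAGES = {
    None: {"comments": list(range(25)), "cursor": "p2"},
    "p2": {"comments": list(range(25, 50)), "cursor": "p3"},
    "p3": {"comments": list(range(50, 60)), "cursor": None},
}


def fake_fetch(cursor):
    return PAGES[cursor]


print(len(collect_comments(fake_fetch, max_pages=5)))  # 60
```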

Error Handling

A failed URL produces an error record (error: true), and processing continues with the remaining URLs
A post with no comments produces a warning record
The dataset always contains at least one record per run
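
The behaviour described above resembles a per-URL try/except loop. In the sketch below only the error: true flag comes from this page; scrape_post, the warning flag, and the postUrl and message keys on error and warning records are hypothetical names chosen for illustration.

```python
def process_urls(urls, scrape_post):
    """Scrape each URL; a failure yields an error record instead of aborting."""
    records = []
    for url in urls:
        try:
            post_records = scrape_post(url)  # hypothetical scraper callable
        except Exception as exc:
            # Only `error: true` is documented; the other keys are illustrative.
            records.append({"error": True, "postUrl": url, "message": str(exc)})
            continue
        records.extend(post_records)
        if not any(r.get("recordType") == "comment" for r in post_records):
            # Assumed shape for the warning record.
            records.append({"warning": True, "postUrl": url, "message": "no comments"})
    return records


def fake_scrape(url):
    if "bad" in url:
        raise ValueError("post not found")
    return [{"recordType": "post", "url": url}]


results = process_urls(["https://example.com/bad", "https://example.com/ok"], fake_scrape)
```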

Pricing

This actor uses Pay Per Event (PPE) pricing:

$0.30 per 100 records (each post and each comment count as one record)
A thread with 1 post + 199 comments = 200 records ≈ $0.60
Bulk run: 10 threads × (1 post + 200 comments) = 2,010 records ≈ $6.03
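
The same arithmetic can be scripted for budgeting. This is a rough estimator under the assumption that the $0.30 per 100 records rate is the only charge and that every thread contributes one post record; estimate_cost is a hypothetical helper, and Decimal is used to avoid floating-point rounding in currency values.

```python
from decimal import Decimal

PRICE_PER_100_RECORDS = Decimal("0.30")  # rate quoted above


def estimate_cost(threads, comments_per_thread):
    """Return (record count, estimated USD cost) for a run."""
    records = threads * (1 + comments_per_thread)  # one post record per thread
    return records, PRICE_PER_100_RECORDS * records / 100


print(estimate_cost(1, 199))   # 200 records, $0.60
print(estimate_cost(10, 200))  # 2010 records, $6.03
```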

Support

For questions or feature requests, contact the actor publisher via the Apify Store messaging system.
