Reddit Comments Scraper
Pricing
from $1.99 / 1,000 results
🔍 Reddit Comments Scraper pulls insightful Reddit comment threads fast—clean, structured data for sentiment, trend & community analysis. 🧠 Great for research, marketing insights, and competitive intelligence. 🚀 Easy to run, export-ready results.
Developer
ScrapeCraze
Maintained by Community
Reddit Comments Scraper 🚀
Struggling to collect Reddit thread insights fast enough for your research and outreach workflows? Reddit Comments Scraper pulls comments (including nested replies) from one or more Reddit posts and returns one flat record per comment, ready to export for analysis. It's built for marketers, data analysts, and researchers who want structured Reddit comment data without manual copy-pasting. A single run can extract hundreds of comment records per post, fast enough to iterate on your dataset the same day.
See the Data: Sample Output
Here's a sample record from a single run:
```json
{
  "postUrl": "https://www.reddit.com/r/AskMec/comments/14990m6/les_applications_de_rencontres_fonctionnent_telles/",
  "postTitle": "Les applications de rencontres fonctionnent-elles vraiment ?",
  "postAuthor": "u/relationship_nerd",
  "postScore": 1834,
  "subreddit": "AskMec",
  "commentDepth": 0,
  "commentAuthor": "u/curious_user_27",
  "commentText": "From my experience, the apps work best when you treat them like conversations—not like instant dating slots.",
  "commentTimestamp": "2024-05-12T09:21:17.000Z",
  "commentPath": "12",
  "parentPath": null,
  "isTopLevel": true,
  "replyCount": 3,
  "scrapedAt": "2026-06-07T12:34:56.000Z"
}
```
| Field | Type | What It Tells You |
|---|---|---|
| postUrl | string | The Reddit post the comment belongs to, so you can trace context. |
| postTitle | string | The post title, for reporting and dataset labeling. |
| postAuthor | string | The author of the scraped Reddit post. |
| postScore | number | The post's score at scrape time; a useful signal for prioritizing threads. |
| subreddit | string | The community the post lives in; handy for segmentation. |
| commentDepth | number | Nesting level in the thread tree (0 for a top-level comment, higher for nested replies). |
| commentAuthor | string | The Reddit user who posted the comment; useful for author-level analysis. |
| commentText | string | The comment's text, for NLP, sentiment analysis, or thematic coding. |
| commentTimestamp | string | UTC timestamp (ISO-8601) of when the comment was created. |
| commentPath | string | Encoded position in the thread tree, so you can reconstruct thread structure during analysis. |
| parentPath | string or null | The parent comment's commentPath (null for top-level comments). |
| isTopLevel | boolean | Quick flag to separate top-level comments from replies. |
| replyCount | number | Number of immediate replies (direct children) of the comment. |
| scrapedAt | string | UTC timestamp (ISO-8601) of when the record was scraped. |

The dataset has no status or error_message fields. Failures are handled per post and reported in the run log, including error details when a post still fails after retries; posts that succeed push their comment records to the dataset.
Export your full dataset as JSON, CSV, or Excel from the Apify dashboard.
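For Python workflows, a typed wrapper makes the schema above explicit. This is a minimal sketch, assuming the JSON export is an array of items keyed by the camelCase field names from the output table; `CommentRecord`, `parse_record`, and `load_records` are illustrative helpers, not part of the actor.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class CommentRecord:
    """One flat comment record, mirroring the documented output fields."""

    post_url: str
    post_title: str
    post_author: str
    post_score: int
    subreddit: str
    comment_depth: int
    comment_author: str
    comment_text: str
    comment_timestamp: datetime
    comment_path: str
    parent_path: Optional[str]
    is_top_level: bool
    reply_count: int
    scraped_at: datetime


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO-8601 UTC string such as '2024-05-12T09:21:17.000Z'."""
    # datetime.fromisoformat() accepts a trailing "Z" only on Python 3.11+,
    # so rewrite it as an explicit +00:00 offset for older versions.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_record(raw: dict) -> CommentRecord:
    """Convert one exported dataset item (camelCase keys) into a CommentRecord."""
    return CommentRecord(
        post_url=raw["postUrl"],
        post_title=raw["postTitle"],
        post_author=raw["postAuthor"],
        post_score=int(raw["postScore"]),
        subreddit=raw["subreddit"],
        comment_depth=int(raw["commentDepth"]),
        comment_author=raw["commentAuthor"],
        comment_text=raw["commentText"],
        comment_timestamp=parse_timestamp(raw["commentTimestamp"]),
        comment_path=raw["commentPath"],
        parent_path=raw.get("parentPath"),
        is_top_level=bool(raw["isTopLevel"]),
        reply_count=int(raw["replyCount"]),
        scraped_at=parse_timestamp(raw["scrapedAt"]),
    )


def load_records(path: str) -> List[CommentRecord]:
    """Load a JSON export (an array of dataset items) from disk."""
    with open(path, encoding="utf-8") as handle:
        return [parse_record(item) for item in json.load(handle)]
```

Parsing timestamps up front means later steps (sorting, time windows) work on real `datetime` values instead of strings.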
Setting It Up
Drop this into your input.json and you're ready to go:
```json
{
  "postUrls": [
    "https://www.reddit.com/r/AskMec/comments/14990m6/les_applications_de_rencontres_fonctionnent_telles/",
    "https://www.reddit.com/r/programming/comments/abcdef/example_thread_title/"
  ],
  "maxComments": 500,
  "includeNestedReplies": true,
  "sortBy": "top",
  "maxConcurrentPosts": 2
}
```
| Parameter | Required | What It Does |
|---|---|---|
| postUrls | ✅ | One or more Reddit post URLs to scrape comments from. |
| maxComments | ⬜ | Maximum number of comments to extract per post (nested replies count toward the limit). |
| includeNestedReplies | ⬜ | When enabled, replies to comments are also extracted (the full thread tree). When disabled, only top-level comments are returned. |
| sortBy | ⬜ | How Reddit orders comments before they are collected: top, best, new, controversial, old, or qa. |
| maxConcurrentPosts | ⬜ | How many posts to scrape in parallel. Each post runs its own browser, so higher values need more memory. |
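If you generate input.json from a script, it can help to validate parameters before starting a run. The sketch below is a hypothetical helper (`build_input` and `write_input` are illustrative names); the allowed sortBy values come from the parameter table, while the Reddit URL check and the positive-integer rule are assumptions, not documented actor validation.

```python
import json
from typing import List, Optional
from urllib.parse import urlparse

# Sort orders listed in the parameter table.
ALLOWED_SORTS = {"top", "best", "new", "controversial", "old", "qa"}


def _check_positive(name: str, value: Optional[int]) -> None:
    # bool is a subclass of int, so reject it explicitly.
    if value is not None and (isinstance(value, bool) or not isinstance(value, int) or value < 1):
        raise ValueError(f"{name} must be a positive integer")


def build_input(
    post_urls: List[str],
    max_comments: Optional[int] = None,
    include_nested_replies: Optional[bool] = None,
    sort_by: Optional[str] = None,
    max_concurrent_posts: Optional[int] = None,
) -> dict:
    """Assemble an input.json payload, omitting optional keys left as None."""
    if not post_urls:
        raise ValueError("postUrls is required and must contain at least one URL")
    for url in post_urls:
        parsed = urlparse(url)
        host = parsed.netloc.lower()
        # Assumed shape of a post URL: a reddit.com host and a /comments/ path.
        if not (host == "reddit.com" or host.endswith(".reddit.com")) or "/comments/" not in parsed.path:
            raise ValueError(f"not a Reddit post URL: {url}")
    if sort_by is not None and sort_by not in ALLOWED_SORTS:
        raise ValueError(f"sortBy must be one of {sorted(ALLOWED_SORTS)}")
    _check_positive("maxComments", max_comments)
    _check_positive("maxConcurrentPosts", max_concurrent_posts)
    payload = {
        "postUrls": list(post_urls),
        "maxComments": max_comments,
        "includeNestedReplies": include_nested_replies,
        "sortBy": sort_by,
        "maxConcurrentPosts": max_concurrent_posts,
    }
    return {key: value for key, value in payload.items() if value is not None}


def write_input(path: str, payload: dict) -> None:
    """Write the payload as pretty-printed JSON, e.g. to input.json."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)
```

Leaving an optional argument as `None` drops the key entirely, so the actor falls back to whatever default it defines for that parameter.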
What It Does
Reddit Comments Scraper is a comment extraction tool that returns one flat dataset record per Reddit comment, with path/depth metadata for easy downstream analysis.
Scrape Reddit comments with thread structure metadata
Each extracted comment includes commentDepth, commentPath, parentPath, and isTopLevel, so you can analyze conversations as threads—not just a list of messages. If you enable nested replies, you’ll capture the full reply tree and preserve where every comment sits in the discussion.
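As an illustration of how parentPath links flat records back into a tree, here is a minimal sketch that prints one post's thread as an indented outline. It treats commentPath as an opaque ID and assumes it is unique within a post and that the dataset order of sibling comments is the order you want; replies whose parent record is missing are not printed. `children_by_parent` and `render_thread` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def children_by_parent(records: List[dict]) -> Dict[Optional[str], List[dict]]:
    """Group records by parentPath; the None key holds top-level comments."""
    children: Dict[Optional[str], List[dict]] = defaultdict(list)
    for record in records:
        children[record.get("parentPath")].append(record)
    return children


def render_thread(records: List[dict], indent: str = "  ") -> List[str]:
    """Return one line per comment in depth-first order, indented by nesting level.

    Pass the records of a single post: commentPath values are only assumed
    to be unique within one post.
    """
    children = children_by_parent(records)
    lines: List[str] = []
    # Explicit stack instead of recursion, so very deep threads cannot hit
    # Python's recursion limit. Children are pushed in reverse so they pop
    # in their original dataset order.
    stack = [(record, 0) for record in reversed(children.get(None, []))]
    while stack:
        record, level = stack.pop()
        lines.append(f"{indent * level}{record['commentAuthor']}: {record['commentText']}")
        for child in reversed(children.get(record["commentPath"], [])):
            stack.append((child, level + 1))
    return lines
```

For a dataset covering several posts, group records by postUrl first and call `render_thread` once per group.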
Control volume to match your research goals
Use maxComments to cap how many comments you extract per post (nested replies are counted too). This makes it easier to build focused datasets for topic modeling, Reddit comments sentiment analysis, or content/theme research without overwhelming your pipeline.
Choose comment ordering for consistent comparisons
Set sortBy to control how comments are ordered before collection (such as top or new). This is helpful when you want repeatable datasets for A/B analyses or for comparing different subreddits and time windows.
Built for export-ready, integration-friendly output
The actor pushes structured records to the Apify dataset via Actor.push_data(records, charged_event_name="result"). That means your output is immediately usable for reporting, ETL jobs, or further processing (like tagging by subreddit or reconstructing reply chains).
Includes nested reply extraction when you need it
With includeNestedReplies: true, replies are collected and flattened into records with correct replyCount for immediate children. With nested replies disabled, you get a simpler dataset containing only top-level comments, which can speed up review and annotation.
Overall, Reddit Comments Scraper turns Reddit comment extraction into a clean, analysis-ready dataset you can collect and export repeatedly.
Why Reddit Comments Scraper?
There are plenty of ways to pull data from Reddit—here’s why Reddit Comments Scraper stands out.
Flat records per comment, with usable path context
Instead of delivering nested blobs that are painful to process, the Reddit Comments Scraper flattens the thread into consistent records and keeps commentPath/parentPath so you can still reason about the conversation structure.
Flexible collection depth without losing control
You decide whether to include nested replies (includeNestedReplies) and cap output with maxComments. That’s especially useful for Reddit comments data mining when you need predictable dataset sizes.
Built for resilience on each post
The actor makes up to 4 attempts per post and logs warnings when it can't fetch the expected data. When a post succeeds, its comments are pushed to the dataset right away, so one failing post doesn't stall the rest of your run.
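Retrying a post several times before giving up is a common pattern. The sketch below shows generic retry with exponential backoff capped at 4 attempts; it is not the actor's code, and the delays and the broad exception handling are assumptions, since the actor's actual retry timing isn't documented. `with_retries` is a hypothetical helper.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn up to `attempts` times, doubling the wait after each failure.

    Re-raises the last exception if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:  # a real scraper would catch narrower error types
            last_error = exc
            if attempt < attempts:
                # Waits 1s, 2s, 4s, ... between attempts with the default base_delay.
                sleep(base_delay * 2 ** (attempt - 1))
    raise last_error
```

The `sleep` parameter is injectable so the backoff schedule can be checked without actually waiting.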
Real-World Use Cases
Here's how different teams put Reddit Comments Scraper to work:
Marketing Analysts
A marketing team needs to understand what people actually say about a product category in specific subreddits. They scrape Reddit comments from a handful of high-signal posts, turn commentText into themes, and use subreddit plus commentDepth to separate top-level opinions from reply-driven nuance—without manual scraping.
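Separating top-level opinions from reply-driven nuance per subreddit can be a simple grouping step. This sketch assumes commentDepth is 0 for top-level comments, as in the sample record; `segment_comments` is an illustrative helper name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def segment_comments(records: List[dict]) -> Dict[Tuple[str, str], List[str]]:
    """Group comment texts by (subreddit, level), where level is 'top-level' or 'reply'.

    Assumes commentDepth is 0 for top-level comments, as in the sample record.
    """
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for record in records:
        level = "top-level" if record["commentDepth"] == 0 else "reply"
        groups[(record["subreddit"], level)].append(record["commentText"])
    return dict(groups)
```

Each group's texts can then go to theme coding separately, so reply chatter doesn't drown out the first-order opinions.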
Research Teams & Community Managers
A researcher running qualitative analysis wants a thread-level view of how discussions evolve. They run the actor with nested replies enabled to collect a complete conversation tree, then use commentPath and parentPath to reconstruct conversation order during coding.
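To read a single reply in context during coding, you can walk parentPath links upward to its top-level comment. A minimal sketch, assuming commentPath values are unique within the post's records; `reply_chain` is a hypothetical helper.

```python
from typing import Dict, List


def reply_chain(records: List[dict], comment_path: str) -> List[dict]:
    """Return the records from the top-level ancestor down to `comment_path`.

    Walks parentPath links upward, then reverses so the chain reads in
    conversation order. Returns an empty list if the path is unknown and
    stops early if an ancestor record is missing from `records`.
    """
    by_path: Dict[str, dict] = {record["commentPath"]: record for record in records}
    chain: List[dict] = []
    seen = set()  # guards against malformed data containing a cycle
    current = by_path.get(comment_path)
    while current is not None and current["commentPath"] not in seen:
        seen.add(current["commentPath"])
        chain.append(current)
        parent = current.get("parentPath")
        current = by_path.get(parent) if parent is not None else None
    chain.reverse()
    return chain
```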
Sales & Outreach Ops
An outreach coordinator wants to find objection patterns and recurring concerns mentioned in comments across multiple threads. They use sortBy and maxComments to standardize what gets collected, then export the dataset and apply filtering logic downstream to focus on the most actionable comment segments.
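Downstream filtering for objection patterns can start as keyword tagging on commentText. Whole-word, case-insensitive regex matching is one simple choice, not a feature of the actor; `tag_comments` and any labels or keywords you pass in are hypothetical.

```python
import re
from typing import Dict, Iterable, List


def tag_comments(records: List[dict], keywords: Dict[str, Iterable[str]]) -> Dict[str, List[dict]]:
    """Map each label to the records whose commentText contains one of its keywords.

    Matching is case-insensitive and on whole words; labels with no keywords are skipped.
    """
    patterns = {}
    for label, words in keywords.items():
        terms = [word for word in words if word]
        if terms:
            alternation = "|".join(re.escape(term) for term in terms)
            patterns[label] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    matches: Dict[str, List[dict]] = {label: [] for label in patterns}
    for record in records:
        text = record.get("commentText") or ""
        for label, pattern in patterns.items():
            if pattern.search(text):
                matches[label].append(record)
    return matches
```

For example, `tag_comments(records, {"price": ["expensive", "too much"], "support": ["support", "customer service"]})` returns the matching records under each label; a comment can appear under several labels.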
Data Engineers / Automation Specialists
A data engineer integrates scraping into an ETL schedule and needs consistent schema fields. They trigger the Reddit comment extraction job with a controlled maxConcurrentPosts, then pipe the dataset records into their warehouse—leveraging the consistent output fields (postUrl, commentTimestamp, scrapedAt) for incremental refresh.
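One way to use postUrl, commentTimestamp, and scrapedAt for incremental refresh is an upsert that keeps the most recently scraped copy of each comment. The schema has no comment ID, so `record_key` below is a hypothetical natural key built from postUrl, commentAuthor, and commentTimestamp; two comments by the same author on the same post in the same second would collide. This is a sketch, not a prescribed warehouse design.

```python
from datetime import datetime
from typing import Dict, Iterable, Tuple

RecordKey = Tuple[str, str, str]


def _ts(value: str) -> datetime:
    # Accept the trailing "Z" on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def record_key(record: dict) -> RecordKey:
    """Hypothetical natural key: the schema has no comment ID, so combine fields."""
    return (record["postUrl"], record["commentAuthor"], record["commentTimestamp"])


def merge_incremental(table: Dict[RecordKey, dict], batch: Iterable[dict]) -> int:
    """Upsert a new run's records into `table`, keeping the most recently scraped copy.

    Returns how many rows were inserted or replaced.
    """
    changed = 0
    for record in batch:
        key = record_key(record)
        current = table.get(key)
        if current is None or _ts(record["scrapedAt"]) > _ts(current["scrapedAt"]):
            table[key] = record
            changed += 1
    return changed
```

Replacing on a newer scrapedAt lets fields that change over time, such as postScore or replyCount, refresh without duplicating the comment.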
NLP Practitioners
An applied ML workflow needs clean text inputs and timestamps for time-aware models. They collect comment datasets for Reddit comments sentiment analysis or summarization, using commentTimestamp and commentText to prepare training and evaluation splits.
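A time-based split keeps evaluation comments strictly newer than training comments, which avoids leaking future discussion into training. This sketch assumes `[deleted]` and `[removed]` (Reddit's usual placeholders for removed content) may appear in commentText; whether the actor emits them is an assumption, so adjust `PLACEHOLDERS` to your data. `time_split` is a hypothetical helper.

```python
from datetime import datetime
from typing import List, Tuple

# Assumed placeholder texts for removed content; adjust to what your data contains.
PLACEHOLDERS = {"[deleted]", "[removed]"}


def _ts(value: str) -> datetime:
    # Accept the trailing "Z" on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def time_split(records: List[dict], cutoff: datetime) -> Tuple[List[dict], List[dict]]:
    """Split records into (train, evaluation) by commentTimestamp.

    Comments created before `cutoff` go to train, the rest to evaluation.
    `cutoff` must be timezone-aware because the timestamps are UTC;
    comparing with a naive datetime raises TypeError.
    """
    usable = [
        r for r in records
        if (r.get("commentText") or "").strip() and r["commentText"].strip() not in PLACEHOLDERS
    ]
    usable.sort(key=lambda r: _ts(r["commentTimestamp"]))
    train = [r for r in usable if _ts(r["commentTimestamp"]) < cutoff]
    evaluation = [r for r in usable if _ts(r["commentTimestamp"]) >= cutoff]
    return train, evaluation
```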
How to Run It
No code required. Here's how to get your first results in under 5 minutes:
1. Open the actor on Apify: go to the actor page in Apify Console (https://console.apify.com).
2. Enter your inputs: provide postUrls (required). Optionally set maxComments, includeNestedReplies, sortBy, and maxConcurrentPosts.
3. Configure proxy settings (if needed): use the Apify proxy configuration options to improve reliability for larger scraping jobs.
4. Start the run and watch the live log: the run log shows progress per post and reports extraction counts for each successful post.
5. Open the Dataset tab: comment records appear as the actor pushes the records for each processed post.
6. Export your results: download your dataset from the Apify dashboard in your preferred format (JSON, CSV, or Excel).
Records are pushed to the dataset as each post finishes, so you can start reviewing results before the whole run completes.
Export & Integration Options
Once your data is collected, Reddit Comments Scraper fits directly into your existing workflow.
Export your dataset from the Apify dashboard as JSON, CSV, or Excel. That makes it easy to build dashboards, perform manual review, or feed the results into your own analysis notebooks.
If you automate reporting or pipelines, you can integrate via Apify’s API, webhooks, and no-code tools like Zapier/Make to move results into your CRM, storage, or analytics stack. For deeper integration details, use the Apify developer documentation.
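For API-based pipelines, a run can be started and its dataset read with the official `apify-client` Python package, whose `ApifyClient` exposes `actor(...).call(run_input=...)` and `dataset(...).iterate_items()`. This is a sketch: the actor ID below is a hypothetical placeholder (copy the real one from the actor's page), and the client is passed in so the function can be exercised with a stand-in object.

```python
from typing import Any, Dict, List

# Hypothetical placeholder: copy the real actor ID from the actor's page in Apify Console.
ACTOR_ID = "<username>/<actor-name>"


def run_and_collect(client: Any, run_input: Dict[str, Any], actor_id: str = ACTOR_ID) -> List[dict]:
    """Start an actor run, wait for it to finish, and return its dataset items.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    # .call() blocks until the run finishes and returns the run object (or None).
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return run details")
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())
```

With the package installed (`pip install apify-client`), you would pass `ApifyClient(token)` from `from apify_client import ApifyClient` as `client`, along with an input such as `{"postUrls": [...], "maxComments": 100}` and the real actor ID.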
Pricing
Reddit Comments Scraper runs on Apify, which includes a free tier; no credit card is needed to start.
You can use free credits for test runs (for example, trying different sortBy settings or nested reply collection) and then scale up as your dataset needs grow. Beyond that, the actor is billed per result, starting from $1.99 per 1,000 results as listed at the top of this page, and each comment record pushed to the dataset counts as one result. Apify subscription plans are available for heavier workloads.
Start free at apify.com — scale up when you need to.
Reliability & Limitations
| What We Handle | How |
|---|---|
| Rate limits & anti-bot friction | Scrapes each post in a real browser session for more robust page loading. |
| Proxy reliability | Supports proxy configuration for more consistent scraping runs. |
| Post-level failures | Retries up to 4 attempts per post before giving up. |
| Partial run recovery | If a post succeeds, its comment records are pushed to the dataset. |
| Error handling | Unexpected response formats and non-200 fetch outcomes are logged. |
| Scale within bounds | Parallelism is capped at maxConcurrentPosts; each post runs its own browser, so raise it only if the run has enough memory. |
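maxConcurrentPosts caps how many posts are processed at once. As a generic illustration of that kind of cap (not the actor's implementation), the asyncio sketch below uses a semaphore; `scrape_all` is a hypothetical helper and `scrape_post` stands in for whatever per-post work you run.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def scrape_all(
    post_urls: List[str],
    scrape_post: Callable[[str], Awaitable[T]],
    max_concurrent_posts: int = 2,
) -> List[T]:
    """Run scrape_post for every URL with at most max_concurrent_posts running at once.

    Results come back in the same order as post_urls.
    """
    semaphore = asyncio.Semaphore(max_concurrent_posts)

    async def guarded(url: str) -> T:
        # Each task waits here until one of the limited slots frees up.
        async with semaphore:
            return await scrape_post(url)

    return list(await asyncio.gather(*(guarded(url) for url in post_urls)))
```

By default the first exception raised by any post propagates out of `asyncio.gather`; passing `return_exceptions=True` instead collects failures alongside results so the other posts' output is kept.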
Limitations: This actor works on public Reddit post pages. It can’t scrape login-gated/private content or scenarios where the platform blocks access completely for a given run.
For enterprise-scale needs or custom configurations, reach out and we'll help.
Frequently Asked Questions
Is there a free plan?
Yes, Apify offers a free tier with credits for trying out Reddit Comments Scraper. You can run small batches first (for example, a couple of postUrls) and validate the dataset schema before scaling.
Do I need to log in or create an account on Reddit?
No. The actor is designed to scrape publicly available Reddit post data without requiring you to log in through the actor.
How accurate is the extracted data?
The extracted fields come directly from the scraped Reddit content and metadata included in each comment record (for example commentText, commentTimestamp, and the thread position fields). Accuracy reflects what is publicly available on the page at the time of the scrape.
How many results can I get per run?
You control volume with maxComments (maximum number of comments extracted per post, counting nested replies too). For multiple postUrls, the total dataset size scales with the number of posts and the per-post cap.
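Because maxComments is a per-post cap, the worst-case dataset size is the number of posts times maxComments. The sketch below turns that bound into a worst-case charge at the listed starting rate of $1.99 per 1,000 results; the rate constant is an assumption to replace with your actual price, and `max_records` and `max_cost_usd` are hypothetical helpers.

```python
from decimal import ROUND_HALF_UP, Decimal

# Listed starting rate ("from $1.99 / 1,000 results"); your actual rate may differ.
PRICE_PER_1000_USD = Decimal("1.99")


def max_records(num_posts: int, max_comments: int) -> int:
    """Upper bound on dataset size: assumes every post reaches the per-post cap."""
    return num_posts * max_comments


def max_cost_usd(num_posts: int, max_comments: int, price_per_1000: Decimal = PRICE_PER_1000_USD) -> Decimal:
    """Worst-case charge in USD for one run, rounded to cents."""
    raw = Decimal(max_records(num_posts, max_comments)) * price_per_1000 / 1000
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For the sample input (2 post URLs, maxComments 500), `max_records(2, 500)` is 1000 and `max_cost_usd(2, 500)` returns `Decimal('1.99')`. Posts with fewer comments than the cap produce fewer records.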
How fresh is the data?
The actor stamps every record with scrapedAt in UTC ISO-8601 format. Freshness depends on when you run the actor, so schedule runs when you need up-to-date commentary.
Is this legal? Does it comply with GDPR / CCPA?
The actor collects only publicly available data, but whether your use complies with GDPR, CCPA, Reddit's Terms of Service, and other applicable regulations is your responsibility; review your workflows and data handling practices accordingly.
Can I export to Google Sheets or Excel?
Yes. You can download the dataset from Apify (JSON, CSV, or Excel) and import it into Google Sheets or any spreadsheet tool you use.
Can I schedule this to run automatically?
Yes. You can schedule Apify actors to run automatically (for example, recurring collection for research). Configure scheduling in the Apify platform based on your needs.
Can I access results via the API?
Yes. Apify supports programmatic access to runs and datasets via the Apify API, so you can integrate Reddit Comments Scraper into your automation stack.
What happens when the actor encounters an error?
If a post fails to fetch or has an unexpected response format, the actor logs warnings and retries (up to 4 attempts) before moving on. Successful posts still push their extracted comment records to the dataset, so you can keep working with partial results.
Get Help & Use Responsibly
Got a question about Reddit Comments Scraper or a feature you'd like added? Reach out at dataforleads@gmail.com. We welcome ideas like improving output usefulness for thread analysis and adding more dataset-friendly fields for research workflows—we actively maintain this actor based on user feedback.
Publicly available data: This actor collects only publicly available data from Reddit. It does not access private accounts, login-gated pages, or password-protected content. You are responsible for complying with GDPR, CCPA, and applicable platform Terms of Service when using and storing the data. For data removal requests, contact dataforleads@gmail.com. Use responsibly, ethically, and only for lawful purposes.