Reddit Comments Scraper
Pricing
from $2.99 / 1,000 results
🔎 Extract valuable Reddit comments with this Comments Scraper—fast, accurate, and built for research, sentiment, and community insights. 📊✨ Perfect for marketers, analysts, and data teams wanting actionable results.
Developer
SolidScraper
Maintained by Community
Reddit Comments Scraper 📣
Reddit Comments Scraper automatically collects comments (including nested replies, when enabled) from one or more Reddit posts and returns a flat record per comment, complete with path and depth metadata. Use it to scrape Reddit comments, extract thread comments for analysis, or build a bulk comment-scraping workflow that produces structured data at scale, without manually copying threads one by one. Whether you’re a marketer, data analyst, researcher, or developer, the actor speeds up collection and saves hours of manual work.
Why choose Reddit Comments Scraper?
| Feature | Benefit |
|---|---|
| ✅ Comments + Nested Replies Collection | Extracts top-level comments and (optionally) the full reply tree for each post |
| ✅ All-in-One Batch Input | Lets you scrape comments from multiple post URLs in a single run |
| ✅ Reliable Scraping with Fallback Logic | Includes retries and handles access challenges using a real browser session |
| ✅ Proxy Support for Stability | Supports configurable proxy settings to improve scraping reliability |
| ✅ Structured Flat Output | Returns one JSON record per comment with path/depth metadata for easy downstream processing |
| ✅ Scales with Concurrency Controls | Uses configurable parallelism via maximum concurrent posts to fit your throughput needs |
Key features
- 📊 Flat comment data with tree metadata: Produces one record per comment with commentPath and commentDepth so you can analyze conversation structure.
- 💬 Optional nested reply extraction: When enabled, replies to comments are also collected (full thread tree); when disabled, only top-level comments are returned.
- 🔍 Sort-controlled comment ordering: Supports top, best, new, controversial, old, and qa sorting to match your research needs.
- 🧠 Top-level vs reply awareness: Adds isTopLevel and parentPath so you can distinguish roots from replies in your analysis.
- 🛡️ Resilient runs with retries: Uses multiple attempts per post to reduce the chance of partial failures.
- 🌐 Post URL support: Accepts one or more Reddit post URLs and normalizes them for collection.
- 💾 Dataset-ready results: Pushes extracted comment records to the Apify dataset as JSON (one item list per successful post).
- ⚙️ Concurrency controls: Uses maxConcurrentPosts so you can balance speed against memory usage.
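The tree metadata fields are related to one another, and a quick sanity check can make that relationship concrete. The sketch below assumes, based only on the documented example paths ("0", "0/1", "0/1/0"), that a comment's depth equals the number of "/" separators in commentPath and that parentPath is commentPath with its last segment removed. The helper names are hypothetical and not part of the actor.

```python
from typing import Optional


def expected_parent_path(comment_path: str) -> Optional[str]:
    """Drop the last '/'-separated segment; top-level paths have no parent."""
    head, sep, _ = comment_path.rpartition("/")
    return head if sep else None


def expected_depth(comment_path: str) -> int:
    """Assumed rule: depth equals the number of '/' separators in the path."""
    return comment_path.count("/")


def is_consistent(record: dict) -> bool:
    """Check that depth, parent path, and top-level flag agree with commentPath."""
    path = record["commentPath"]
    return (
        record["commentDepth"] == expected_depth(path)
        and record["parentPath"] == expected_parent_path(path)
        and record["isTopLevel"] == (record["commentDepth"] == 0)
    )


# Illustrative record containing only the tree metadata fields.
sample = {
    "commentPath": "0/1/0",
    "commentDepth": 2,
    "parentPath": "0/1",
    "isTopLevel": False,
}
print(is_consistent(sample))
```

If a record fails this check on real output, the assumed path encoding is probably wrong rather than the data, so treat the check as a way to confirm the encoding before relying on it.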
Input
Provide input via an input.json file. Example structure:
    {
      "postUrls": [
        "https://www.reddit.com/r/AskMec/comments/14990m6/les_applications_de_rencontres_fonctionnent_telles/"
      ],
      "maxComments": 500,
      "includeNestedReplies": true,
      "sortBy": "top",
      "maxConcurrentPosts": 2,
      "proxyConfiguration": {
        "useApifyProxy": false
      }
    }
Input Fields
| Field | Required | Description |
|---|---|---|
| postUrls | Yes | One or more Reddit post URLs to scrape comments from. |
| maxComments | No | Maximum number of comments to extract per post (counts nested replies too). Default is 500. Must be at least 1. |
| includeNestedReplies | No | When enabled, replies to comments are also extracted (the full thread tree). When disabled, only top-level comments are returned. Default is true. |
| sortBy | No | How Reddit should order the comments before they are collected. Options: top, best, new, controversial, old, qa. Default is top. |
| maxConcurrentPosts | No | How many posts to scrape in parallel. Each post runs its own browser, so higher values need more memory. Default is 2 (min 1, max 10). |
| proxyConfiguration | No | Proxy settings for the scraper. If provided, the actor uses your configuration; otherwise it creates a default proxy configuration with residential groups. |
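When input is generated by a script rather than typed by hand, it can help to apply the documented defaults and limits before starting a run. The following is a minimal sketch of such a check. The function build_input and the example URL are hypothetical; only the field names, defaults, and ranges come from the table above, and the actor's own validation may behave differently.

```python
import json

ALLOWED_SORTS = {"top", "best", "new", "controversial", "old", "qa"}


def build_input(post_urls, max_comments=500, include_nested_replies=True,
                sort_by="top", max_concurrent_posts=2, proxy_configuration=None):
    """Return an input dict, rejecting values outside the documented limits."""
    if not post_urls:
        raise ValueError("postUrls must contain at least one Reddit post URL")
    if max_comments < 1:
        raise ValueError("maxComments must be at least 1")
    if sort_by not in ALLOWED_SORTS:
        raise ValueError(f"sortBy must be one of {sorted(ALLOWED_SORTS)}")
    if not 1 <= max_concurrent_posts <= 10:
        raise ValueError("maxConcurrentPosts must be between 1 and 10")

    run_input = {
        "postUrls": list(post_urls),
        "maxComments": max_comments,
        "includeNestedReplies": include_nested_replies,
        "sortBy": sort_by,
        "maxConcurrentPosts": max_concurrent_posts,
    }
    # Leave proxyConfiguration out entirely when it is not supplied, so the
    # actor falls back to its documented default.
    if proxy_configuration is not None:
        run_input["proxyConfiguration"] = proxy_configuration
    return run_input


# Hypothetical post URL, for illustration only.
example = build_input(
    ["https://www.reddit.com/r/example/comments/abc123/example_post/"],
    sort_by="new",
)
print(json.dumps(example, indent=2))
```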
Output
The actor saves extracted comments in JSON format by pushing a list of comment records to the Apify dataset (charged_event_name="result") for each successfully processed post.
Example output record:
    [
      {
        "postUrl": "https://www.reddit.com/r/.../comments/.../",
        "postTitle": "Example post title",
        "postAuthor": "example_author",
        "postScore": 12345,
        "subreddit": "examplesubreddit",
        "commentDepth": 0,
        "commentAuthor": "comment_author",
        "commentText": "This is a comment body.",
        "commentTimestamp": "2024-01-15T10:22:33.000Z",
        "commentPath": "0",
        "parentPath": null,
        "isTopLevel": true,
        "replyCount": 2,
        "scrapedAt": "2024-01-15T10:30:00.000Z"
      }
    ]
Output Fields
| Field | Type | Description |
|---|---|---|
| postUrl | string | The normalized Reddit post URL for which the comment was scraped. |
| postTitle | string | The post title. |
| postAuthor | string | The post author username. |
| postScore | number | The post score at the time of collection. |
| subreddit | string | The subreddit name. |
| commentDepth | number | Depth of the comment in the thread tree (top-level is 0). |
| commentAuthor | string | The comment author username. |
| commentText | string | The comment body text. |
| commentTimestamp | string | UTC timestamp (ISO-8601 with milliseconds and trailing Z) for when the comment was created. |
| commentPath | string | Encoded position of the comment within the tree (e.g., "0", "0/1", "0/1/0"). |
| parentPath | string or null | The parent comment’s commentPath (or null for top-level comments). |
| isTopLevel | boolean | true when commentDepth is 0; otherwise false. |
| replyCount | number | Count of direct replies to this comment. |
| scrapedAt | string | UTC timestamp (ISO-8601 with milliseconds and trailing Z) indicating when the scraping happened. |
| error_message | string | Not part of the records this actor emits. Failures are logged; only posts that succeed push records. |
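Because each record carries its own commentPath and its parent's path, the flat output can be folded back into a nested thread. The sketch below is one way to do that for the records of a single post, using only the documented fields. The function names are hypothetical. It assumes every reply's parent is present in the list; if a run was cut off by maxComments, a reply whose parent is missing would simply not appear in the result.

```python
from collections import defaultdict


def build_tree(records):
    """Nest the flat records of ONE post into {"comment", "replies"} nodes."""
    children = defaultdict(list)
    for record in records:
        children[record["parentPath"]].append(record)

    def attach(parent_path):
        # Recursively collect the comments whose parentPath matches.
        return [
            {"comment": rec, "replies": attach(rec["commentPath"])}
            for rec in children.get(parent_path, [])
        ]

    return attach(None)  # top-level comments have parentPath == None


def render(nodes, indent=0):
    """Return the thread as indented 'author: text' lines."""
    lines = []
    for node in nodes:
        rec = node["comment"]
        lines.append("  " * indent + f'{rec["commentAuthor"]}: {rec["commentText"]}')
        lines.extend(render(node["replies"], indent + 1))
    return lines


# Made-up records containing only the fields this sketch needs.
records = [
    {"commentPath": "0", "parentPath": None, "commentAuthor": "alice", "commentText": "First"},
    {"commentPath": "0/0", "parentPath": "0", "commentAuthor": "bob", "commentText": "Reply"},
    {"commentPath": "0/1", "parentPath": "0", "commentAuthor": "carol", "commentText": "Another reply"},
    {"commentPath": "1", "parentPath": None, "commentAuthor": "dave", "commentText": "Second"},
]
print("\n".join(render(build_tree(records))))
```

When a dataset contains several posts, group the records by postUrl first, since paths such as "0" are likely to repeat across posts.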
You can export the resulting dataset from Apify as JSON or CSV (depending on your chosen export settings in the Apify UI).
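If you already have the records as JSON and prefer to produce the CSV yourself, Python's standard csv module is sufficient. This is a minimal sketch that assumes the records have been loaded into a list of dicts; the column order follows the Output Fields table, and the function name records_to_csv is hypothetical.

```python
import csv
import io

OUTPUT_FIELDS = [
    "postUrl", "postTitle", "postAuthor", "postScore", "subreddit",
    "commentDepth", "commentAuthor", "commentText", "commentTimestamp",
    "commentPath", "parentPath", "isTopLevel", "replyCount", "scrapedAt",
]


def records_to_csv(records):
    """Serialize comment records to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    # extrasaction="ignore" drops keys that are not in OUTPUT_FIELDS;
    # missing keys and None values (e.g. parentPath) become empty cells.
    writer = csv.DictWriter(buffer, fieldnames=OUTPUT_FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Values taken from the example output record shown earlier.
sample = {
    "postTitle": "Example post title",
    "commentAuthor": "comment_author",
    "commentText": "This is a comment body.",
    "commentPath": "0",
    "parentPath": None,
    "isTopLevel": True,
    "replyCount": 2,
}
print(records_to_csv([sample]))
```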
How to use Reddit Comments Scraper (via Apify Console)
- Open Apify Console: Go to console.apify.com and log in.
- Find the actor: Search for Reddit Comments Scraper in the Actors marketplace and open the actor page.
- Open the INPUT panel: In the actor run screen, locate the INPUT section.
- Add your post URLs: Paste one or more Reddit post URLs into postUrls.
- Choose your comment limits and structure: Set maxComments (per post), enable or disable includeNestedReplies, and pick sortBy if you need a specific ordering.
- Set concurrency for your budget: Adjust maxConcurrentPosts (each parallel post uses its own browser, so higher values use more memory).
- Configure proxy (optional): If you have your own proxy settings, add them under proxyConfiguration; otherwise the actor creates a default residential proxy configuration.
- Run & monitor: Click Run. Watch logs for progress, extraction counts, and any retry attempts.
- Open the OUTPUT dataset: After completion, go to the dataset tab to preview the extracted comments and export them to JSON or CSV.
No coding required: you can have extracted Reddit comments in minutes.
Advanced features & SEO optimization
- 🔁 Built for comments-to-CSV pipelines: The actor is designed for workflows that need a clean, flat structure for analysis and BI.
- 🧩 Thread-aware output for conversation mining: Each comment includes commentPath, parentPath, commentDepth, replyCount, and isTopLevel, which makes it much easier to mine comments and reconstruct threads.
- 🕒 Consistent UTC timestamps: Uses ISO-8601 scrapedAt and commentTimestamp values for reliable time-based analysis.
- 🧰 Input-friendly sorting: With sortBy, you can align collection with your research question (for example, focusing on most upvoted or most recent discussions).
- 🔍 Resilience for public web data: Includes retries and supports configurable proxy settings for stable scraping of publicly available data.
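As an illustration of time-based analysis, the sketch below parses the two timestamp fields and computes how old a comment was when it was scraped. It relies only on the documented format (ISO-8601, UTC, milliseconds, trailing Z); the helper names are hypothetical.

```python
from datetime import datetime, timedelta


def parse_utc(timestamp: str) -> datetime:
    """Parse an ISO-8601 string with a trailing 'Z' into an aware datetime."""
    # datetime.fromisoformat() accepts a literal "Z" only from Python 3.11 on,
    # so rewrite it as an explicit UTC offset to stay compatible with older versions.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def age_at_scrape(record: dict) -> timedelta:
    """Time elapsed between the comment's creation and the scrape."""
    return parse_utc(record["scrapedAt"]) - parse_utc(record["commentTimestamp"])


# Timestamps taken from the example output record.
record = {
    "commentTimestamp": "2024-01-15T10:22:33.000Z",
    "scrapedAt": "2024-01-15T10:30:00.000Z",
}
print(age_at_scrape(record))
```

Because both values are timezone-aware after parsing, they can be compared and sorted directly without further conversion.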
Best use cases
- 📈 Marketing teams: Collect comments from multiple posts to find recurring themes and messaging angles for outreach campaigns.
- 🧠 Researchers: Gather structured comment data for qualitative coding and for quantifying sentiment or discussion depth.
- 💬 Community managers: Monitor how conversations evolve by scraping threads with sortBy and analyzing commentDepth distributions.
- 🏗️ Data analysts: Build a conversation graph using commentPath, parentPath, and replyCount from a bulk scraping run.
- 🧪 Product teams: Compare feedback across communities by scraping comments from posts in relevant subreddits and exporting to CSV.
- 💻 Developer pipelines: Feed structured results into downstream systems (ETL, dashboards, or CRM enrichment steps) with predictable fields per comment.
- 🎯 Content strategists: Scrape comments from posts to identify what users actually respond to, then iterate on your content based on real discussion threads.
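A commentDepth distribution, as mentioned for community managers above, takes only a few lines to compute once the records are loaded. The sketch below assumes a list of record dicts; the function name and the sample depths are made up for illustration.

```python
from collections import Counter


def depth_distribution(records):
    """Count comments per commentDepth, ordered from shallowest to deepest."""
    counts = Counter(record["commentDepth"] for record in records)
    return dict(sorted(counts.items()))


# Made-up records: two top-level comments, three replies, one nested reply.
records = [{"commentDepth": depth} for depth in (0, 0, 1, 1, 1, 2)]

distribution = depth_distribution(records)
for depth, count in distribution.items():
    print(f"depth {depth}: {count} comment(s)")
```

A thread where most comments sit at depth 0 suggests many independent reactions, while a long tail of deeper levels points to back-and-forth discussion.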
Technical specifications
- Supported Input Formats
  - ✅ postUrls: array of Reddit post URLs
  - ✅ maxComments: integer (default 500, minimum 1)
  - ✅ includeNestedReplies: boolean (default true)
  - ✅ sortBy: string enum (top, best, new, controversial, old, qa)
  - ✅ maxConcurrentPosts: integer (default 2, range 1 to 10)
  - ✅ Optional proxyConfiguration
- Proxy Support
  - ✅ Configurable proxy support via proxyConfiguration
  - ✅ Default residential proxy configuration when proxyConfiguration is not provided
- Retry Mechanism
  - ✅ Retries are built in for each post (multiple attempts per post)
- Dataset Structure
  - ✅ JSON records pushed to the dataset with one flat record per comment
  - ✅ Includes commentPath / parentPath / commentDepth for thread reconstruction
- Rate Limits & Performance
  - ✅ Designed for batch processing with configurable concurrency using maxConcurrentPosts
  - ⚠️ Each concurrent post uses its own browser session, so higher concurrency can increase memory usage
- Limitations
  - ❌ Mod/bot-pinned comments are skipped (stickied items are not included)
  - ❌ Only publicly accessible comment data from the provided posts is collected
FAQ
What does Reddit Comments Scraper return?
✅ It returns a flat list of JSON records—one record per comment—with thread metadata like commentPath, parentPath, and commentDepth, plus comment content (commentText) and timestamps (commentTimestamp).
Can it scrape nested replies?
✅ Yes. With includeNestedReplies enabled, replies to comments are also extracted so you get the full thread tree. If you disable it, only top-level comments are returned.
How many comments can I extract from each post?
You control it with maxComments. It sets the maximum number of comments extracted per post and counts nested replies too.
Can I control the order of comments?
✅ Yes. Use sortBy to choose how comments are ordered before they are collected: top, best, new, controversial, old, or qa.
Does it support scraping multiple Reddit posts at once?
✅ Yes. Provide multiple links in postUrls. You can also control parallelism with maxConcurrentPosts to balance speed and resource usage.
Is there a dataset export format other than JSON?
Apify datasets can be exported after the run. The actor pushes JSON-formatted records to the dataset, and you can export to CSV from the Apify UI depending on your settings.
Do I need to use a proxy?
❌ You don’t have to, but you can. If you provide proxyConfiguration, the actor will use it; otherwise it creates a default residential proxy configuration to improve scraping reliability.
Is this compliant with privacy rules?
✅ The actor only collects data from publicly accessible sources. You’re responsible for using the results in accordance with applicable laws (including privacy and platform rules) for your specific use case.
Support & feature requests
If you’re using Reddit Comments Scraper for comment scraping or data extraction workflows, we’d love to hear how it’s working for you.
- 💡 Feature Requests: Examples include additional export controls, more post-level metadata fields, or enhancements tailored for bulk comment-mining pipelines.
- 📧 Contact: For questions, support, or feedback, reach out at dataforleads@gmail.com.
Your feedback helps shape the roadmap for this tool.
Use Reddit Comments Scraper to collect structured, thread-aware comment data, so you can scale analysis without the manual grind.
Disclaimer
This tool only accesses publicly accessible sources. It does not access private profiles, authenticated data, or password-protected content.
You are responsible for ensuring your use complies with applicable laws (for example, GDPR/CCPA), spam regulations, and the relevant platform terms of service. For data removal requests, contact dataforleads@gmail.com. Always use Reddit Comments Scraper responsibly, ethically, and for legitimate purposes.