Reddit Comment Scraper
🔎 Reddit Comment Scraper (reddit-comment-scraper) scrapes comments from threads and subreddits — with timestamps, authors, scores, and permalinks. 📈 Export to CSV/JSON for sentiment, keyword, and trend analysis. ⚡ Ideal for market research, community insights, and competitive intelligence.
Pricing: $19.99/month + usage
Developer: ScrapAPI
Reddit Comment Scraper
The Reddit Comment Scraper is a fast, reliable tool that extracts structured comment data from Reddit threads for analysis and reporting. It eliminates the manual, time-consuming work of collecting Reddit comments at scale, acting as a Reddit comment extractor and Reddit thread comments scraper that outputs clean JSON you can export to CSV. Built for marketers, developers, data analysts, and researchers, it enables automated “scrape Reddit comments” workflows for sentiment, keyword analysis, and trend tracking.
What data / output can you get?
Below are the exact fields saved to the Apify dataset during a run. You can export the dataset to CSV, JSON, or Excel. The actor also saves a grouped JSON to the key-value store under the OUTPUT key.
| Field | Description | Example value |
|---|---|---|
| url | Source Reddit post URL for the comment record | https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/ |
| comment_id | Unique comment identifier | lhk1f7n |
| post_id | Post identifier (link thing ID) | t3_1epeshq |
| author | Comment author username or [deleted] | AutoModerator |
| permalink | Direct link to the specific comment | https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/lhk1f7n/ |
| upvotes | Number of upvotes (score) | 42 |
| content_type | Content type marker | text |
| parent_id | Parent thing ID, normalized without the t1_/t3_ prefix; null for none | 1epeshq |
| author_avatar | Author avatar URL if available (empty string by default) | |
| userUrl | Link to the author’s Reddit profile (empty if [deleted]) | https://www.reddit.com/user/AutoModerator/ |
| contentText | Comment text (newlines flattened) | Thanks for sharing this… |
| created_time | Timestamp (empty string by default) | |
| replies | Array of nested reply objects kept under each comment; flattened comments still appear as individual rows | [] |
Note:
- The dataset contains one row per comment with a url field for easy filtering and export. You can export to CSV/JSON in one click.
- A grouped JSON is also stored in the key-value store (key: OUTPUT) as a map of post URL → array of comment objects (without the url field).
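As a sketch of how the two outputs relate, the snippet below (using a hypothetical grouped OUTPUT payload) turns the post URL → comments map back into flat dataset-style rows and writes them as CSV with the standard library:

```python
import csv
import io
import json

# Hypothetical grouped OUTPUT (key-value store entry): post URL -> list of comments.
grouped = {
    "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/": [
        {"comment_id": "lhk1f7n", "author": "AutoModerator", "upvotes": 1,
         "contentText": "Comment text here...", "replies": []},
    ],
}

def grouped_to_rows(grouped_output):
    """Flatten the post-URL -> comments map into dataset-style rows with a url field."""
    rows = []
    for url, comments in grouped_output.items():
        for comment in comments:
            row = dict(comment)
            row["url"] = url
            # Serialize the nested replies array so it survives a flat CSV export.
            row["replies"] = json.dumps(row.get("replies", []))
            rows.append(row)
    return rows

rows = grouped_to_rows(grouped)
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=sorted(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

This is only a convenience for post-processing; the Apify dataset export already produces the same one-row-per-comment CSV directly.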
Key features
- ⚡ Automatic proxy fallback: robust reliability out of the box. The scraper first tries a direct connection, then automatically falls back to Apify datacenter proxy, and finally to residential proxy with retries if needed. This keeps collection stable even under rate limits.
- 💬 Flattened + nested replies: every discovered reply is traversed and emitted as a flattened comment record, while the replies array on each parent is kept up to a configurable depth via replyLimit. This gives you both row-by-row analytics and hierarchical context.
- 🐍 Async Python engine: built with aiohttp and the Apify SDK for performance and resilience. Ideal for “Reddit comments scraper Python” pipelines and programmatic automation.
- 📦 Structured, export-ready data: clean fields for author, permalink, upvotes, parent-child relations, and content text. Perfect for exporting Reddit comments to CSV, JSON, or Excel from the Apify dataset.
- 🔌 Developer-friendly outputs: in addition to the dataset, a grouped JSON (key-value store entry OUTPUT) is produced for API-centric workflows like downstream enrichment or storage.
- 🚫 No login or API keys required: works as a “Reddit comment scraper without API” by leveraging Reddit’s public JSON endpoints; no OAuth required.
- 🧰 Production-ready infrastructure: designed for repeatable runs, consistent fields, and scalable extraction. Ideal for teams that need a reliable Reddit comments scraping tool rather than a brittle script.
How to use Reddit Comment Scraper - step by step
- Sign up or log in to Apify.
- Open the actor named reddit-comment-scraper.
- Paste your Reddit post URLs into startUrls (string list). Both standard thread URLs and …/comments/ permalink formats are accepted.
- Set maxComments to control how many comments to collect per URL (range 1–10,000; default 100).
- Set replyLimit to cap how many nested replies are kept under each parent’s replies array (0 = unlimited; default 2). All replies still appear in the flattened dataset regardless of this cap.
- (Optional) Configure proxyConfiguration. The scraper will try direct first, then auto-fallback to datacenter and residential proxies as needed.
- Click Run. Watch progress logs as comments are saved to the dataset in real time.
- Download your results from the Dataset tab (CSV, JSON, Excel) and find the grouped JSON under the Key-value store (key: OUTPUT).
Pro Tip: Use the Apify API to trigger runs programmatically and pipe results into your BI warehouse or NLP stack. It’s a robust alternative to a one-off Reddit comment downloader or a DIY Reddit comment scraper GitHub script.
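As a sketch of that programmatic workflow with the official `apify-client` Python package (the actor ID "ScrapAPI/reddit-comment-scraper" used here is an assumption; copy the exact ID from the listing):

```python
def build_input(start_urls, max_comments=100, reply_limit=2):
    """Assemble the actor input, clamping maxComments to the documented 1-10,000 range."""
    return {
        "startUrls": list(start_urls),
        "maxComments": max(1, min(int(max_comments), 10_000)),
        "replyLimit": int(reply_limit),
        "proxyConfiguration": {"useApifyProxy": False},
    }

def run_scraper(token, start_urls, **kwargs):
    # Requires: pip install apify-client
    from apify_client import ApifyClient

    client = ApifyClient(token)
    # Actor ID is an assumption; use the real one from your Apify console.
    run = client.actor("ScrapAPI/reddit-comment-scraper").call(
        run_input=build_input(start_urls, **kwargs)
    )
    # One dataset item per flattened comment.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Pipe the returned items into your BI warehouse or NLP stack, or schedule the call from a cron job for recurring collection.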
Use cases
| Use case name | Description |
|---|---|
| Market research + voice-of-customer | Aggregate and analyze comment sentiment around brands and products to inform messaging and positioning. |
| Community insights for marketers | Track recurring topics and pain points across discussion threads to guide content strategy and engagement. |
| Competitive intelligence | Monitor comment-level feedback on competitor launches to identify differentiators and opportunities. |
| Academic & NLP research | Collect clean, structured corpora of Reddit comments for sentiment models, topic modeling, and trend analysis. |
| Product feedback mining | Extract granular comments on features and UX to prioritize backlog based on real user language. |
| Data pipeline ingestion (API) | Automate runs via the Apify API and export comments to CSV/JSON for downstream ETL and analytics. |
| Trend & keyword tracking | Download Reddit comments over time to detect emerging topics and keywords in your niche. |
Why choose Reddit Comment Scraper?
This production-ready Reddit comments scraping tool balances stability, data quality, and developer usability.
- 🎯 Accurate, structured fields for analytics and NLP
- 🌱 Flattened output plus nested replies for context-rich analysis
- 📈 Scales across multiple post URLs in a single run
- 💻 Developer-friendly outputs: Apify Dataset (CSV/JSON) + OUTPUT JSON for APIs
- 🛡️ Reliable with automatic proxy fallback (direct → datacenter → residential)
- 💸 Cost-effective with an included trial (120 free trial minutes available on this listing)
- 🔄 A hosted alternative to brittle browser extensions and ad-hoc scripts
Bottom line: a stable alternative to a Reddit API comment scraper, without OAuth complexity and ready to integrate into real workflows.
Is it legal / ethical to use Reddit Comment Scraper?
Yes—when used responsibly. This actor extracts publicly available data from Reddit threads and does not access private or authenticated content.
Guidelines for compliant use:
- Only collect data from publicly accessible Reddit posts and comments.
- Review and respect Reddit’s Terms of Service.
- Ensure your use complies with applicable data protection regulations (e.g., GDPR, CCPA).
- Do not collect private profiles or password-protected content.
- Consult your legal team for edge cases and jurisdiction-specific requirements.
Input parameters & output format
Example JSON input
```json
{
  "startUrls": ["https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/"],
  "maxComments": 100,
  "replyLimit": 2,
  "proxyConfiguration": { "useApifyProxy": false }
}
```
Input fields
- startUrls (array, required): one or more Reddit post URLs to scrape. Example: https://www.reddit.com/r/subreddit/comments/post_id/title/
  - Default: none
- maxComments (integer, optional): maximum number of comments to extract per URL. Range: 1–10,000.
  - Default: 100
- replyLimit (integer, optional): depth of nested replies stored in the replies field. Set to 0 for unlimited depth. Note: all replies are still collected in the flattened output regardless of this setting.
  - Default: 2
- proxyConfiguration (object, optional): proxy settings for reliable scraping. By default, no proxy is used. If requests are blocked, the scraper automatically falls back to datacenter, then residential proxies with smart retries.
  - Default: { "useApifyProxy": false }
Example dataset record (one comment per row)
```json
{
  "url": "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/",
  "comment_id": "lhk1f7n",
  "post_id": "t3_1epeshq",
  "author": "AutoModerator",
  "permalink": "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/lhk1f7n/",
  "upvotes": 1,
  "content_type": "text",
  "parent_id": "1epeshq",
  "author_avatar": "",
  "userUrl": "https://www.reddit.com/user/AutoModerator/",
  "contentText": "Comment text here...",
  "created_time": "",
  "replies": []
}
```
Example grouped OUTPUT (key-value store, key = "OUTPUT")
```json
{
  "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/": [
    {
      "comment_id": "lhk1f7n",
      "post_id": "t3_1epeshq",
      "author": "AutoModerator",
      "permalink": "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/lhk1f7n/",
      "upvotes": 1,
      "content_type": "text",
      "parent_id": "1epeshq",
      "author_avatar": "",
      "userUrl": "https://www.reddit.com/user/AutoModerator/",
      "contentText": "Comment text here...",
      "created_time": "",
      "replies": []
    }
  ]
}
```
Notes:
- author may be “[deleted]”, in which case userUrl is an empty string.
- author_avatar and created_time are set to empty strings by default.
- replies are included in dataset rows (subject to replyLimit for the nested array), while every reply is still emitted as its own flattened row for analysis.
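To make the flattened-vs-nested distinction concrete, here is a small sketch (the sample record is hypothetical) that walks a nested replies tree and reproduces the flattened rows the dataset gives you:

```python
def flatten(comment, depth=0):
    """Yield a comment and all of its nested replies as flat (depth, record) pairs,
    mirroring how each reply also appears as its own dataset row."""
    yield depth, comment
    for reply in comment.get("replies", []):
        yield from flatten(reply, depth + 1)

# Hypothetical nested record: one top-level comment with a reply and a nested reply.
thread = {
    "comment_id": "c1",
    "contentText": "Top-level comment",
    "replies": [
        {"comment_id": "c2", "contentText": "First reply",
         "replies": [{"comment_id": "c3", "contentText": "Nested reply", "replies": []}]},
    ],
}

rows = list(flatten(thread))
print([(d, c["comment_id"]) for d, c in rows])  # [(0, 'c1'), (1, 'c2'), (2, 'c3')]
```

With replyLimit=2, the nested replies array on each dataset row would be truncated at that depth, but all three comments above would still appear as individual rows.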
FAQ
Is there a free trial?
✅ Yes. This actor listing includes 120 free trial minutes so you can test end-to-end before subscribing.
Do I need Reddit login or API keys?
✅ No. This is a Reddit comment scraper without API OAuth. It uses Reddit’s public JSON endpoints and works without logging in.
Can it scrape entire subreddits?
❌ No. This tool is a Reddit thread comments scraper. Provide specific Reddit post (thread) URLs in startUrls to extract their comments.
How many comments can I collect per URL?
✅ You can set maxComments from 1 to 10,000 per Reddit post URL. The actor saves comments in real time until the limit is reached.
Can I control nested replies?
✅ Yes. Use replyLimit to control how many nested replies are kept under each parent’s replies array. Set 0 for unlimited depth. All replies still appear as flattened rows in the dataset.
What data fields are included in each record?
✅ Each dataset row includes url, comment_id, post_id, author, permalink, upvotes, content_type, parent_id, author_avatar, userUrl, contentText, created_time, and replies. See the Output section for examples.
Can I export Reddit comments to CSV or JSON?
✅ Yes. Results are saved to the Apify dataset, which you can export to CSV, JSON, or Excel. A grouped JSON is also saved in the key-value store (key: OUTPUT).
What happens if requests are blocked or rate-limited?
✅ The actor automatically falls back from direct connection to datacenter proxy, and then to residential proxy with retries. This improves reliability for large collections.
Is this suitable for Python or API-based workflows?
✅ Yes. It’s built with Python (aiohttp + Apify SDK) and produces dataset/KVS outputs ideal for API ingestion, making it a strong alternative to a DIY Reddit comment scraper GitHub script.
Final thoughts
The Reddit Comment Scraper is built for clean, scalable extraction of Reddit thread comments. It delivers structured fields for analysis, real-time saving to datasets, and robust proxy fallback—ideal for marketers, analysts, developers, and researchers.
Use it to extract Reddit comments, export to CSV/JSON, and automate pipelines via the Apify API. Whether you’re running sentiment models, keyword analysis, or building dashboards, this Reddit comments scraping tool helps you move from raw threads to actionable insight fast.
Start extracting smarter Reddit insights today.
