Reddit Comment Scraper
Developer: Scrapium · Pricing: $19.99/month + usage

Scrape Reddit comments with ease 💬👽 Extract comment text, usernames, scores, timestamps, replies, and thread details from Reddit posts. Perfect for sentiment analysis, audience research, trend tracking, and community insights. Turn Reddit conversations into actionable data fast 🚀
Reddit Comment Scraper
Reddit Comment Scraper is a Python-based tool that extracts structured comment data from Reddit post threads — a fast, reliable Reddit comment extractor for marketers, developers, analysts, and researchers. It solves the tedious task of trying to scrape Reddit comments by turning sprawling discussions into a clean Reddit comments dataset you can analyze, export, and integrate. Built as a Reddit API comment scraper using public JSON endpoints, it helps you download Reddit comments at scale for sentiment analysis, trend tracking, and audience insights.
What data / output can you get?
Below are the exact fields this Reddit thread comment scraper collects and stores. Each record represents a single comment associated with a post URL.
| Data field | Description | Example value |
|---|---|---|
| url | The Reddit post URL the comment belongs to (dataset only) | https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/ |
| comment_id | Unique comment identifier | lhk1f7n |
| post_id | Post identifier (Reddit thing ID, prefixed with t3_) | t3_1epeshq |
| author | Comment author username; “[deleted]” if removed | AutoModerator |
| permalink | Direct link to the specific comment | https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/lhk1f7n/ |
| upvotes | Number of upvotes (score) on the comment | 1 |
| content_type | Type of content; set to “text” | text |
| parent_id | Parent ID (the post's t3_ ID for top-level comments, the parent comment's ID for replies); may be normalized without its prefix, or null | t3_1epeshq |
| author_avatar | Author avatar URL if available (empty string otherwise) | "" |
| userUrl | Link to user’s Reddit profile; empty if “[deleted]” | https://www.reddit.com/user/AutoModerator/ |
| contentText | The plain-text comment (newlines normalized) | Comment text here... |
| created_time | Created time if available (empty string otherwise) | "" |
| replies | Nested replies captured under each comment (array; see notes) | [] |
Notes:
- Results are available as a structured dataset (one row per comment) and as a grouped JSON (comments array per URL) saved to the Key-Value Store.
- You can export to JSON, CSV, or Excel directly from the Apify dataset.
- The “replies” array is stored per comment to reflect thread structure; all comments are also emitted as flattened records.
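If you work from the flattened dataset but want the grouped shape described above, the regrouping is straightforward. The sketch below is an illustrative helper (the `group_by_url` name is not part of the actor) that rebuilds the per-URL structure from flattened records:

```python
from collections import defaultdict

def group_by_url(records):
    """Group flattened comment records into the per-URL shape
    the actor stores in the Key-Value Store (one list per post URL)."""
    grouped = defaultdict(list)
    for rec in records:
        rec = dict(rec)            # copy so we don't mutate the input
        url = rec.pop("url", "")   # "url" is only present in dataset rows
        grouped[url].append(rec)
    return dict(grouped)
```

This mirrors the grouped JSON output, so either representation can be derived from the other.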
Key features
- 🔁 Automatic proxy fallback — Robust access strategy that tries direct connection first, then falls back to datacenter and finally residential proxies with retries for reliability.
- 📚 Bulk URL processing — Add multiple Reddit post URLs to scrape comments across many threads in a single run.
- 🧵 Nested replies support — Replies are traversed and emitted as individual records; the per-comment “replies” array is controlled by a configurable replyLimit.
- 🧱 Structured JSON output — Clean, ready-to-analyze fields including author, text, scores, permalinks, parent IDs, and nested replies.
- 📦 Dual output formats — Get individual comment records in the Dataset and grouped comments-per-URL in the Key-Value Store (under the OUTPUT key).
- 🐍 Python-based reliability — Built with aiohttp and the Apify SDK for stability, clear logging, and scalable runs.
- 🧹 Smart deduplication — Comment IDs are deduplicated defensively to ensure tidy datasets.
- 📤 Easy exporting — Download Reddit comments as JSON/CSV/Excel from Apify, or fetch programmatically via API to power your Reddit comment scraper tool workflows.
How to use Reddit Comment Scraper - step by step
- Create or log in to your Apify account.
- Open the Apify Console and navigate to Actors, then find “reddit-comment-scraper”.
- Paste one or more Reddit post URLs into the startUrls field (e.g., https://www.reddit.com/r/subreddit/comments/post_id/title/).
- Set maxComments to control how many comments you collect per URL (1–10,000).
- Set replyLimit to control how many replies are kept in each comment’s nested replies array (0 = unlimited).
- (Optional) Configure proxyConfiguration if you want to force proxy use. The actor will automatically attempt fallback if requests are blocked.
- Click Run and monitor real-time logs (you’ll see progress updates like “Collected X comments so far”).
- Access your results in the OUTPUT tab: download the Dataset as CSV/JSON/Excel or fetch the grouped JSON from the Key-Value Store.
Pro tip: Use the Apify API to trigger runs and stream results into your analysis stack or data pipeline — ideal for building a repeatable Reddit comment exporter.
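As a starting point for that kind of automation, here is a minimal sketch using only the Python standard library and Apify's public REST API (`run-sync-get-dataset-items` runs an actor and returns its dataset in one call). The actor ID `scrapium~reddit-comment-scraper` and the helper names are assumptions for illustration:

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"

def build_run_input(start_urls, max_comments=1000, reply_limit=0):
    """Assemble the actor input described in this README."""
    return {
        "startUrls": list(start_urls),
        "maxComments": max_comments,
        "replyLimit": reply_limit,
        "proxyConfiguration": {"useApifyProxy": False},
    }

def run_url(actor_id, token):
    """Endpoint that runs the actor and returns its dataset items.
    Actor IDs use a tilde in API paths, e.g. 'scrapium~reddit-comment-scraper'."""
    return f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items?token={token}"

def fetch_comments(actor_id, token, start_urls, **kwargs):
    """Trigger a synchronous run and return the flattened comment records."""
    req = urllib.request.Request(
        run_url(actor_id, token),
        data=json.dumps(build_run_input(start_urls, **kwargs)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For long runs, prefer the asynchronous run endpoints or the official `apify-client` package, which handles polling and pagination for you.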
Use cases
| Use case | Description |
|---|---|
| Market research – discussion mining | Analyze community opinions and themes across threads to inform positioning, messaging, and product strategy. |
| Sentiment analysis – NLP-ready data | Collect contentText at scale to train or evaluate models and dashboards that gauge public sentiment. |
| Trend tracking – topic monitoring | Track engagement and comment patterns on specific posts to surface emerging topics and narratives. |
| Community monitoring – moderation intel | Export comment-level data for oversight, reporting, and community health insights. |
| Academic research – social datasets | Build reproducible Reddit comments dataset samples for qualitative/quantitative studies. |
| Data engineering – API pipeline | Orchestrate automated runs, export JSON/CSV, and feed downstream data warehouses and BI tools. |
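To make the sentiment-analysis use case concrete, here is a toy sketch that scores `contentText` fields with a keyword lexicon. The word lists and helper names are illustrative placeholders; in practice you would plug the records into a real NLP library:

```python
# Tiny illustrative lexicons -- replace with a proper sentiment model.
POSITIVE = {"great", "love", "amazing", "helpful"}
NEGATIVE = {"bad", "hate", "broken", "useless"}

def score_comment(text):
    """Naive keyword score: positive hits minus negative hits."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return len(words & POSITIVE) - len(words & NEGATIVE)

def summarize(records):
    """Aggregate scores over a list of flattened comment records."""
    scores = [score_comment(r.get("contentText", "")) for r in records]
    return {"n": len(scores), "mean": sum(scores) / len(scores) if scores else 0.0}
```

Because every record carries `contentText`, `author`, and `upvotes`, the same loop can weight sentiment by score or group it by author.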
Why choose Reddit Comment Scraper?
- 🎯 Precision-first extraction focused on clean comment fields, parent/child relationships, and identifiers.
- ⚡ Built for scale: process multiple post URLs and collect up to 10,000 comments per URL.
- 🧪 Developer-friendly Python actor with structured outputs for easy ETL into analytics stacks.
- 🛡️ Public data only: designed to collect from publicly available Reddit content.
- 🌐 Reliable infrastructure with direct → datacenter → residential proxy fallback and retry logic.
- 💾 Dual outputs (Dataset + grouped JSON) make it a flexible Reddit comment exporter for varied workflows.
- 💸 Try before you buy with available trial minutes on Apify, then scale to production with a simple monthly plan.
Is it legal / ethical to use Reddit Comment Scraper?
Yes — when used responsibly. This actor is designed to collect publicly available Reddit content only and does not access private or authenticated data.
Guidelines for compliant use:
- Only scrape publicly accessible Reddit posts and comments.
- Do not attempt to access private communities or protected content.
- Respect Reddit’s terms and applicable laws (e.g., GDPR, CCPA) for your use case.
- Use the data for analysis, research, and insights — avoid spam or misuse.
- Consult your legal team for edge cases and jurisdiction-specific requirements.
Input parameters & output format
Example JSON input
{
  "startUrls": ["https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/"],
  "maxComments": 1000,
  "replyLimit": 0,
  "proxyConfiguration": { "useApifyProxy": false }
}
Input fields
- startUrls
- Type: array
- Required: Yes
- Default: —
- Description: List one or more Reddit post URLs (e.g., https://www.reddit.com/r/subreddit/comments/post_id/title/). Accepts an array of strings; the actor also handles objects with a url property.
- maxComments
- Type: integer
- Required: No
- Default: 1000
- Description: Maximum number of comments to fetch per URL. Minimum 1, maximum 10000.
- replyLimit
- Type: integer
- Required: No
- Default: 0
- Description: Maximum number of replies to store per comment in the nested replies field. Set to 0 for unlimited. All replies are still traversed and emitted as flattened output records.
- proxyConfiguration
- Type: object
- Required: No
- Default: { "useApifyProxy": false }
- Description: Choose which proxies to use. By default, no proxy is used. If Reddit rejects or blocks a request, the actor automatically falls back to a datacenter proxy and then a residential proxy, with retries.
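The constraints above are easy to check before launching a run. This is a hypothetical client-side helper (not part of the actor) that validates an input object against the documented limits:

```python
def validate_input(actor_input):
    """Check an input object against the constraints in this README:
    non-empty startUrls of Reddit post URLs, maxComments in 1..10000,
    replyLimit >= 0 (0 = unlimited)."""
    urls = actor_input.get("startUrls")
    if not urls:
        raise ValueError("startUrls is required and must be non-empty")
    for u in urls:
        # Accept plain strings or {"url": ...} objects, as the actor does
        url = u["url"] if isinstance(u, dict) else u
        if "/comments/" not in url:
            raise ValueError(f"not a Reddit post URL: {url}")
    max_comments = actor_input.get("maxComments", 1000)
    if not 1 <= max_comments <= 10000:
        raise ValueError("maxComments must be between 1 and 10000")
    if actor_input.get("replyLimit", 0) < 0:
        raise ValueError("replyLimit must be >= 0 (0 = unlimited)")
    return True
```

Validating locally gives you an immediate error message instead of a failed run.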
Example dataset item (one comment per row)
{
  "url": "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/",
  "comment_id": "lhk1f7n",
  "post_id": "t3_1epeshq",
  "author": "AutoModerator",
  "permalink": "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/lhk1f7n/",
  "upvotes": 1,
  "content_type": "text",
  "parent_id": "t3_1epeshq",
  "author_avatar": "",
  "userUrl": "https://www.reddit.com/user/AutoModerator/",
  "contentText": "Comment text here...",
  "created_time": "",
  "replies": []
}
Example grouped output (Key-Value Store, key: OUTPUT)
{
  "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/": [
    {
      "comment_id": "lhk1f7n",
      "post_id": "t3_1epeshq",
      "author": "AutoModerator",
      "permalink": "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/lhk1f7n/",
      "upvotes": 1,
      "content_type": "text",
      "parent_id": "t3_1epeshq",
      "author_avatar": "",
      "userUrl": "https://www.reddit.com/user/AutoModerator/",
      "contentText": "Comment text here...",
      "created_time": "",
      "replies": []
    }
  ]
}
Notes on fields that may be empty:
- author may be “[deleted]” for removed accounts.
- userUrl is empty for “[deleted]” authors.
- created_time and author_avatar are empty strings when not available.
- parent_id is null for top-level comments.
- The replies array may be large; replyLimit controls how many are stored per comment, but all replies are still emitted as individual records in the dataset.
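Since every reply is also emitted as a flattened record with a `parent_id`, you can rebuild full thread trees yourself even with a small `replyLimit`. The sketch below is an illustrative helper (the `nest_comments` name is not part of the actor) that handles both prefixed and prefix-stripped parent IDs:

```python
def nest_comments(records):
    """Rebuild thread structure from flattened comment records.

    A record is treated as top-level when its parent_id is null/missing
    or points at the post (t3_ prefix); otherwise it is attached to the
    parent comment's "children" list.
    """
    by_id = {r["comment_id"]: dict(r, children=[]) for r in records}
    roots = []
    for rec in by_id.values():
        parent = rec.get("parent_id") or ""
        pid = parent.split("_", 1)[-1] if parent else ""
        if pid in by_id and not parent.startswith("t3_"):
            by_id[pid]["children"].append(rec)
        else:
            roots.append(rec)
    return roots
```

This lets you keep `replyLimit` small (cheaper, smaller records) while still recovering the full conversation shape from the dataset.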
FAQ
Is there a free tier or trial?
Yes. The actor offers trial minutes on Apify (e.g., 120 trial minutes) so you can test before subscribing. For ongoing use, a simple monthly plan is available.
Do I need to log in to Reddit or provide cookies?
No. The actor uses publicly available JSON endpoints and does not require Reddit login or cookies to scrape Reddit comments.
How many comments can I collect per URL?
You can set maxComments from 1 up to 10,000 per URL. The actor trims the output to your specified limit.
Does it capture nested replies?
Yes. All replies are traversed and emitted in the flattened output. The replyLimit parameter controls how many replies are stored per comment in the nested replies array.
What formats can I export?
You can export the Dataset to JSON, CSV, or Excel from the Apify Console, or access results programmatically via the Apify API. A grouped JSON is also saved in the Key-Value Store under the OUTPUT key.
Does this use the official Reddit API or PRAW?
No. It fetches public JSON endpoints on reddit.com directly (a Reddit API comment scraper approach without PRAW). It does not require OAuth.
What happens if Reddit blocks my requests?
The actor automatically falls back from direct connection to datacenter proxies and then to residential proxies with retries, maximizing the chance of success.
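In simplified pseudocode terms, that fallback chain looks like the sketch below. This is not the actor's actual implementation, just an illustration of the direct → datacenter → residential strategy with an injected `fetch` function:

```python
def fetch_with_fallback(fetch, url,
                        proxy_tiers=(None, "datacenter", "residential"),
                        retries=2):
    """Try each access tier in order, retrying within a tier before
    moving on. `fetch(url, proxy)` is caller-supplied; `None` means a
    direct connection with no proxy."""
    last_error = None
    for proxy in proxy_tiers:
        for _ in range(retries):
            try:
                return fetch(url, proxy)
            except Exception as exc:  # blocked or failed request
                last_error = exc
    raise RuntimeError(f"all access tiers failed for {url}") from last_error
```

Escalating to residential proxies only after cheaper tiers fail keeps typical runs inexpensive while still handling blocks.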
Can I scrape subreddit comments across multiple threads?
Provide the specific Reddit post URLs you want to process. You can list multiple post URLs to crawl many threads in one run.
Closing CTA / Final thoughts
Reddit Comment Scraper is built to turn Reddit conversations into clean, structured data for analysis. With automatic proxy fallback, bulk post URL processing, and dual-format outputs, it’s a dependable Reddit comment scraper Python tool for marketers, developers, analysts, and researchers. Export to CSV/JSON for dashboards or connect via the Apify API to automate your pipeline. Start extracting smarter Reddit insights and build your next Reddit comments dataset with confidence.
