Reddit Comment Scraper
Pricing: $19.99/month + usage
💬 Reddit Comment Scraper (reddit-comment-scraper) captures comments from posts & subreddits—text, authors, scores, timestamps, permalinks & nesting. 🔎 Export CSV/JSON for research, social listening, sentiment & trend analysis. ⚡ Fast, reliable, API-ready.
Rating: 0.0 (0 reviews)
Developer: Scraply
Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified 16 days ago
Reddit Comment Scraper
Reddit Comment Scraper is a production-ready Apify actor that collects structured comment data from public Reddit post URLs — a fast, reliable reddit comment extractor to scrape reddit comments at scale for research, social listening, and analytics. Built in Python, it works as a focused reddit thread comment scraper to capture text, authors, scores, permalinks, and nesting, and it’s API-ready for teams that need to export reddit comments to CSV or JSON. Perfect for marketers, developers, data analysts, and researchers, it enables large-scale monitoring and insight generation across subreddits.
What data / output can you get?
Below are the exact fields pushed to the Apify dataset for each comment record. You can export results to JSON, CSV, or Excel directly from the Apify dataset UI or via API.
| Data field | Description | Example value |
|---|---|---|
| url | The source Reddit post URL this comment belongs to | https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/ |
| comment_id | Unique Reddit comment identifier | lhk1f7n |
| post_id | Reddit post identifier (thing id format) | t3_1epeshq |
| author | Comment author username (or “[deleted]”) | AutoModerator |
| userUrl | Direct link to the author’s Reddit profile (empty for “[deleted]”) | https://www.reddit.com/user/AutoModerator/ |
| permalink | Direct link to the specific comment | https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/lhk1f7n/ |
| upvotes | Comment score (upvotes) | 1 |
| content_type | Content type label | text |
| parent_id | Parent thing ID (comment or post); null if none | t3_1epeshq |
| contentText | Cleaned text content of the comment | Comment text here... |
| created_time | Timestamp (if present in source; may be empty) | |
| author_avatar | Author avatar URL if available (empty by default) | |
| replies | Array of nested replies kept per replyLimit (each reply object has the same fields) | [] |
Notes:
- Dataset items are flattened at the comment level for easy analysis, while each item also includes a “replies” array to preserve conversation structure up to your configured reply limit.
- In addition to the dataset, the actor saves a grouped JSON to the key-value store under the key “OUTPUT”, keyed by post URL, in the shape { "<post_url>": [ <comment objects> ] }. A full example appears in the output section below.
Key features
- ⚡ Automatic proxy fallback for reliability: built-in smart fallback from a direct connection to datacenter and then residential proxies, with retries, so your reddit comment scraping stays resilient under blocking.
- 📦 Scalable bulk URL processing: feed multiple Reddit post URLs into one run and handle large threads — ideal for reddit post comments downloader and bulk scraping workflows.
- 🧵 Nested replies with depth control: capture comment threads with a configurable replyLimit that controls how many replies are stored per comment in the nested “replies” field.
- 🚀 Async, high-throughput architecture: implemented with aiohttp and async/await to collect more comments faster and reduce latency across large jobs.
- 🔌 API-ready, easy exporting: access results via the Apify API and export reddit comments to CSV, JSON, or Excel — great for pipelines and dashboards.
- 🔒 No API keys or login required: works on publicly available Reddit JSON responses; a practical reddit comment scraper without API credentials.
- 🧪 Flexible sort orders: supports Reddit’s standard sort orders (hot, new, top, controversial, old) for more control over comment retrieval.
- 🛠️ Production-grade logging and progress tracking: clear progress updates (e.g., “Collected N comments so far”) and a final scraping summary for auditability.
How to use Reddit Comment Scraper - step by step
- Create or log in to your Apify account.
- Open the Apify Console and navigate to Actors, then find “reddit-comment-scraper”.
- Add input data:
- Paste one or more Reddit post URLs into startUrls.
- Optionally set maxComments (per URL) and replyLimit.
- Optionally configure proxyConfiguration.
- Click Run to start the job. The actor will fetch the post’s JSON, follow “more” comment placeholders, and expand nested threads.
- Monitor logs and progress in real-time to see how many comments have been collected.
- When finished, open the Dataset tab to review individual comment records.
- Export results to CSV, JSON, or Excel, or pull data via the Apify API for downstream workflows.
Pro Tip: Use the Apify API to integrate the dataset into analytics stacks or automations (e.g., schedule recurring runs for social listening and sentiment tracking).
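The steps above can also be automated from Python via the Apify API. Below is a minimal sketch using the apify-client package; the actor slug `scraply/reddit-comment-scraper` and the helper names are illustrative assumptions, not part of the actor itself:

```python
def build_run_input(start_urls, max_comments=1000, reply_limit=0, use_apify_proxy=False):
    """Assemble the actor's run input from the parameters described above."""
    return {
        "startUrls": list(start_urls),
        "maxComments": max_comments,
        "replyLimit": reply_limit,
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }


def run_scraper(apify_token, start_urls, **kwargs):
    """Start a run and return all dataset items (requires `pip install apify-client`)."""
    from apify_client import ApifyClient  # imported lazily so build_run_input works offline

    client = ApifyClient(apify_token)
    run = client.actor("scraply/reddit-comment-scraper").call(
        run_input=build_run_input(start_urls, **kwargs)
    )
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Wrapped in an Apify Schedule or a cron job, this gives you the recurring social-listening runs mentioned in the Pro Tip.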
Use cases
| Use case | Description |
|---|---|
| Market research + topic mining | Aggregate large volumes of thread comments to quantify opinions and extract themes around products, competitors, or trends. |
| Sentiment analysis for social listening | Feed comment text and metadata into NLP models to track sentiment shifts and emerging narratives. |
| Community & subreddit monitoring | Monitor discussions across specific subreddits by scraping Reddit comments from key threads regularly. |
| Academic & policy research | Collect structured comment-level datasets for behavioral studies and qualitative analysis. |
| Developer API pipeline | Use the Apify API to automate a reddit comment scraper Python workflow and stream datasets into your systems. |
| Content aggregation & curation | Capture insightful comments and reply threads to curate quotes, FAQs, or knowledge bases. |
| Competitive/brand analysis | Track brand mentions, upvotes, and discussion depth around campaigns or launches. |
Why choose Reddit Comment Scraper?
Built for precision, automation, and reliability, this actor outperforms manual tools and unstable extensions for scraping Reddit post comments.
- ✅ Accurate, structured outputs: Clean fields for authors, scores, permalinks, parent-child relationships, and content.
- 🌍 Scales to long threads: Expands “more” placeholders and handles large discussions efficiently.
- 💻 Developer-friendly & API-ready: Fetch datasets via REST API and integrate into Python pipelines.
- 🛡️ Safe & public-only: Scrapes publicly available content; no login or API keys required.
- 💪 Resilient infrastructure: Automatic proxy fallback keeps collection running when direct access is blocked.
- 💰 Cost-effective & predictable: Designed for reliable, repeatable workloads without brittle browser automation.
In short: a production-grade Reddit comment scraping tool, not a brittle extension-based alternative.
Is it legal / ethical to use Reddit Comment Scraper?
Yes — when done responsibly. This actor collects publicly available data from Reddit post pages and does not access private or authenticated content.
Guidelines for compliant use:
- Scrape only public pages and respect platform terms.
- Do not target private subreddits or password-protected content.
- Ensure your use complies with applicable laws (e.g., GDPR, CCPA).
- Use the data ethically — for analysis and research, not spam.
For edge cases, confirm requirements with your legal team.
Input parameters & output format
Example JSON input:
    {
      "startUrls": ["https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/"],
      "maxComments": 250,
      "replyLimit": 0,
      "proxyConfiguration": { "useApifyProxy": false }
    }
Input fields (from the actor’s input schema):
- startUrls (array, required): one or more Reddit post URLs (e.g., https://www.reddit.com/r/subreddit/comments/post_id/title/). Default: none.
- maxComments (integer, optional): maximum number of comments to fetch per URL. Range: 1–10,000. Default: 1000.
- replyLimit (integer, optional): maximum number of replies to store per comment in the nested “replies” field; set to 0 for unlimited. (All replies are still collected in the flattened output.) Range: 0–100. Default: 0.
- proxyConfiguration (object, optional): which proxies to use. By default, no proxy is used; if Reddit rejects or blocks the request, the actor falls back to a datacenter proxy and then a residential proxy, with retries. Prefill: { "useApifyProxy": false }.
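If you launch runs programmatically, the schema constraints above can be enforced client-side before submitting the input. A small sketch (the function name and defaults-applying behavior are ours, not part of the actor):

```python
def validate_input(run_input):
    """Check a run input against the schema constraints described above.

    Returns the input with defaults applied; raises ValueError on violations.
    """
    urls = run_input.get("startUrls")
    if not urls or not isinstance(urls, list):
        raise ValueError("startUrls is required and must be a non-empty list")
    max_comments = run_input.get("maxComments", 1000)
    if not 1 <= max_comments <= 10_000:
        raise ValueError("maxComments must be between 1 and 10000")
    reply_limit = run_input.get("replyLimit", 0)
    if not 0 <= reply_limit <= 100:
        raise ValueError("replyLimit must be between 0 and 100")
    proxy = run_input.get("proxyConfiguration", {"useApifyProxy": False})
    return {
        "startUrls": urls,
        "maxComments": max_comments,
        "replyLimit": reply_limit,
        "proxyConfiguration": proxy,
    }
```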
Output: dataset items (one per comment), with the following fields:
- url, comment_id, post_id, author, permalink, upvotes, content_type, parent_id, author_avatar, userUrl, contentText, created_time, replies
Example dataset item:

    {
      "url": "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/",
      "comment_id": "lhk1f7n",
      "post_id": "t3_1epeshq",
      "author": "AutoModerator",
      "permalink": "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/lhk1f7n/",
      "upvotes": 1,
      "content_type": "text",
      "parent_id": "t3_1epeshq",
      "author_avatar": "",
      "userUrl": "https://www.reddit.com/user/AutoModerator/",
      "contentText": "Comment text here...",
      "created_time": "",
      "replies": []
    }
Also saved to the key-value store as grouped output under key “OUTPUT”:

    {
      "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/": [
        {
          "comment_id": "lhk1f7n",
          "post_id": "t3_1epeshq",
          "author": "AutoModerator",
          "permalink": "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/lhk1f7n/",
          "upvotes": 1,
          "content_type": "text",
          "parent_id": "t3_1epeshq",
          "author_avatar": "",
          "userUrl": "https://www.reddit.com/user/AutoModerator/",
          "contentText": "Comment text here...",
          "created_time": "",
          "replies": []
        }
      ]
    }
Notes:
- Fields author_avatar and created_time may be empty when not provided by Reddit’s response.
- The “replies” array stores nested replies per the replyLimit, while each reply is also included as its own record in the dataset.
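If you work from items that carry nested replies (for example, the grouped “OUTPUT” object), you can recover a purely flat view by walking the replies arrays recursively. A minimal sketch; the function name is illustrative:

```python
def flatten_comments(comments):
    """Yield every comment in a nested list as a flat record,
    mirroring how the dataset stores one item per comment.
    The nested 'replies' key is dropped from each yielded record."""
    for comment in comments:
        yield {k: v for k, v in comment.items() if k != "replies"}
        yield from flatten_comments(comment.get("replies", []))
```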
FAQ
Is there a free tier or trial to test it?
Yes. The listing includes 120 trial minutes so you can evaluate the actor. For ongoing use, there’s a flat monthly price of $19.99 shown on the Apify Store. Actual billing depends on your Apify plan and usage.
Do I need Reddit API keys or login?
No. It works without login or OAuth — a reddit comment scraper without API keys. The actor uses publicly available JSON responses from Reddit post pages.
What types of content can it scrape?
It scrapes comments from public Reddit post URLs, including nested replies. It does not access private subreddits or require authentication.
How many comments can I scrape per URL?
You can set maxComments between 1 and 10,000 per URL. The actor will traverse “more” placeholders to retrieve additional batches until your limit is reached.
Does it capture nested replies?
Yes. Use replyLimit to control how many replies are stored per comment in the “replies” field. Set 0 for unlimited storage of nested replies (flattened comments are still collected either way).
What if Reddit blocks my requests?
The actor includes smart proxy fallback: it first tries a direct connection, then falls back to datacenter proxy, and finally residential proxy with retries to maximize success.
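The escalation described above (direct connection, then datacenter proxy, then residential proxy, each with retries) is a common resilience pattern. This is an illustrative outline of that pattern, not the actor’s actual code:

```python
def fetch_with_fallback(fetch, url, tiers=("direct", "datacenter", "residential"), retries=2):
    """Try each connection tier in order, retrying on failure, and return the
    first successful response. `fetch(url, tier)` is a caller-supplied function
    that raises on a blocked or failed request."""
    last_error = None
    for tier in tiers:
        for _ in range(retries):
            try:
                return fetch(url, tier)
            except Exception as exc:  # blocked or transient failure; escalate
                last_error = exc
    raise RuntimeError(f"All proxy tiers failed for {url}") from last_error
```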
Can I export results to CSV?
Yes. Open the Dataset tab after the run and use the built-in export options to download CSV, JSON, or Excel. You can also fetch data via the Apify API.
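If you prefer to build the CSV yourself from JSON items fetched via the API, Python’s standard csv module is enough. A sketch; the field list mirrors the output table above, and nested replies are omitted from the flat export:

```python
import csv
import io

FIELDS = ["url", "comment_id", "post_id", "author", "permalink", "upvotes",
          "content_type", "parent_id", "contentText", "created_time"]

def items_to_csv(items, fields=FIELDS):
    """Serialize dataset items (a list of dicts) to a CSV string,
    one row per comment; keys outside `fields` (e.g. 'replies') are ignored."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()
```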
Can I scrape subreddit comments in bulk?
Yes, by providing multiple Reddit post URLs from your target subreddits. The tool functions as a scalable reddit thread comment scraper for bulk processing.
Closing CTA / Final thoughts
Reddit Comment Scraper is built for accurate, scalable extraction of Reddit post comments. It delivers structured records with authors, scores, permalinks, and nested replies, ready for research, social listening, and analytics.
Whether you’re a marketer, developer, data analyst, or researcher, you can run bulk jobs, export to CSV/JSON, and integrate via the Apify API for automation. Start collecting smarter Reddit insights today — and turn conversations into measurable, repeatable intelligence.
