Reddit Comment Scraper
Pricing: $19.99/month + usage
🧰 Reddit Comment Scraper (reddit-comment-scraper) collects Reddit comments & threads across subreddits — with author, score, timestamps, permalinks & nesting. 📊 Export CSV/JSON for research, sentiment, brand monitoring & SEO. ⚡ Ideal for analysts, marketers & community teams.
Developer: ScrapeMesh
Actor stats: 2 total users · 1 monthly active user · 0 bookmarks · last modified 17 days ago
Reddit Comment Scraper
The Reddit Comment Scraper is a production-ready Apify actor that collects structured comments from Reddit post URLs — fast, reliable, and built for scale. It solves the hassle of manually navigating threads by turning any Reddit discussion into clean, analyzable records with authors, scores, permalinks, parent/child relationships, and nested replies. Whether you’re a marketer, developer, data analyst, or researcher, this reddit comments scraper tool helps you scrape reddit comments and export them to a usable reddit comment dataset for insights, NLP, and reporting at scale. Think of it as a Reddit thread comment scraper and reddit comment extractor optimized for workflow automation and data accuracy.
What data / output can you get?
Below are the exact fields pushed to the Apify dataset for each comment. You can export results as JSON or CSV from the Apify dataset UI.
| Data field | Description | Example value |
|---|---|---|
| url | The original Reddit post URL the comment belongs to | https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/ |
| comment_id | Unique comment identifier | lhk1f7n |
| post_id | Reddit post thing ID (t3_…) | t3_1epeshq |
| author | Comment author username (or “[deleted]”) | AutoModerator |
| permalink | Direct link to the specific comment | https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/lhk1f7n/ |
| upvotes | Number of upvotes (score) | 42 |
| content_type | Content type label | text |
| parent_id | Parent comment ID without prefix (null for top-level) | lhk1f7n |
| author_avatar | Author avatar URL (if available; empty string otherwise) | |
| userUrl | Link to the user’s Reddit profile (empty if deleted) | https://www.reddit.com/user/AutoModerator/ |
| contentText | The comment text content, line breaks normalized | This is a comment… |
| created_time | Created timestamp placeholder (empty string if unavailable) | |
| replies | Array of nested reply objects (same schema), trimmed by replyLimit | [ … ] |
Notes:
- Each dataset item represents one comment and includes a nested replies array (which can be limited by replyLimit). All discovered comments are also emitted as individual flat records.
- You can export reddit comments to CSV or JSON directly from the Apify dataset.
- The actor also stores a grouped “by URL” structure in the key-value store under the OUTPUT key for convenience.
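Because each dataset item nests its replies, a small post-processing step can flatten a thread into one row per comment (useful for CSV or dataframe work). A minimal sketch in Python — the `flatten` helper and the sample record are illustrative, not part of the actor; field names follow the table above:

```python
def flatten(comment):
    """Yield a comment and all of its nested replies as flat dicts (replies stripped)."""
    yield {k: v for k, v in comment.items() if k != "replies"}
    for reply in comment.get("replies", []):
        yield from flatten(reply)

# Sample record shaped like the dataset schema above (values are made up).
thread = {
    "comment_id": "lhk1f7n",
    "author": "AutoModerator",
    "upvotes": 1,
    "replies": [
        {"comment_id": "aaa111", "author": "someuser", "upvotes": 3, "replies": []},
    ],
}

rows = list(flatten(thread))
print([r["comment_id"] for r in rows])  # ['lhk1f7n', 'aaa111']
```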
Key features
- ⚡️ Robust proxy fallback: automatically tries a direct connection, then a datacenter proxy, then a residential proxy with retries to keep your reddit comments crawler running even under blocks.
- 🧵 Nested conversation structure: captures parent/child relationships with a nested replies array per comment. Control how many replies are stored via replyLimit while still collecting all comments in the flat output.
- 📦 Bulk URL processing: process multiple Reddit post URLs in one run to build a larger reddit comment dataset efficiently.
- 💾 Clean, structured output: pushes consistent JSON records to the Apify dataset with author, score, permalinks, parent IDs, and more, perfect for analysis, NLP, and reporting.
- 🚫 No login or cookies required: works against public Reddit JSON endpoints; no authentication needed for scraping public threads.
- 🔁 Production-ready reliability: async HTTP requests, progress logging (e.g., “Collected N comments so far”), and defensive deduplication ensure dependable runs at scale.
- 🧰 Developer-friendly: built as a Python reddit comment scraper on Apify. Access results via the Apify API and integrate into pipelines for automated reddit comment downloader workflows.
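The proxy fallback described above can be pictured as a tiered retry loop: direct first, then datacenter, then residential. A simplified sketch, not the actor’s actual code — the tier names, `fetch_fn` hook, and retry count are assumptions for illustration:

```python
PROXY_TIERS = [None, "datacenter", "residential"]  # direct connection first

def fetch_with_fallback(url, fetch_fn, retries_per_tier=2):
    """Try each proxy tier in order, retrying a few times per tier before escalating."""
    last_error = None
    for tier in PROXY_TIERS:
        for _ in range(retries_per_tier):
            try:
                return fetch_fn(url, proxy=tier)
            except Exception as exc:  # e.g. a block or timeout
                last_error = exc
    raise last_error

# Demo with a fake fetcher that only succeeds on the residential tier.
def fake_fetch(url, proxy=None):
    if proxy != "residential":
        raise ConnectionError(f"blocked on tier {proxy}")
    return {"status": 200, "proxy": proxy}

result = fetch_with_fallback("https://www.reddit.com/r/ChatGPT/.json", fake_fetch)
print(result)  # {'status': 200, 'proxy': 'residential'}
```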
How to use Reddit Comment Scraper - step by step
- Create or log in to your Apify account.
- Open the “reddit-comment-scraper” actor in the Apify Console.
- Add your Reddit post URLs in startUrls (e.g., https://www.reddit.com/r/subreddit/comments/post_id/title/). The input accepts a string list; each item can be a full post URL.
- Configure limits:
- Set maxComments (1–10,000) to control how many comments to collect per URL.
- Set replyLimit (0–100) to control how many nested replies are stored per comment (0 means unlimited).
- (Optional) Configure proxyConfiguration. By default, no proxy is used. If Reddit rejects requests, the actor will automatically fall back to datacenter and then residential proxies with retries.
- Click Run. Watch progress logs as comments are collected and expanded via Reddit’s morechildren endpoint.
- Download results. Go to the Dataset tab to export JSON or CSV. A grouped-by-URL JSON is also saved under the Key-Value Store as OUTPUT.
Pro tip: Use the Apify API to pull dataset items programmatically and feed them into your analytics or enrichment workflow.
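For example, with the official apify-client Python package a run-and-fetch helper might look like this. This is a sketch, not guaranteed code: the actor ID string and input fields are assumptions — copy the exact identifiers from the actor’s API tab in the Apify Console.

```python
def fetch_comments(api_token, run_input):
    """Run the actor and return its dataset items (requires the apify-client package)."""
    from apify_client import ApifyClient  # imported here so the sketch stays self-contained

    client = ApifyClient(api_token)
    # Actor ID is illustrative; use the real one from the actor's API tab.
    run = client.actor("scrapemesh/reddit-comment-scraper").call(run_input=run_input)
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())

# Example usage (needs a valid Apify API token and network access):
# items = fetch_comments("YOUR_APIFY_TOKEN", {
#     "startUrls": ["https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/"],
#     "maxComments": 100,
# })
```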
Use cases
| Use case | Description |
|---|---|
| Market research + topic analysis | Aggregate and analyze discussion threads to quantify sentiment and themes across public posts. |
| Brand monitoring + community insights | Track brand mentions and extract replies to understand user feedback within specific threads. |
| Content research + editorial | Compile user perspectives from targeted discussions to inform articles and summaries. |
| Data science + NLP training | Build a structured reddit comment dataset with parent/child context for modeling and classification. |
| Academic research + social analysis | Study public discourse patterns using thread-level structures and upvote signals. |
| Developer pipelines (API) | Use the Apify API to automate scrape reddit comments workflows and feed data into ETL/ELT pipelines. |
Why choose Reddit Comment Scraper?
This actor prioritizes precision, automation, and reliability over brittle browser extensions or ad-hoc scripts.
- ✅ Accurate, structured fields for every comment, including parent/child links
- 🌐 No login required — collects publicly available thread data
- 📈 Scales across many URLs with async requests and intelligent deduplication
- 🧪 Developer-first design — Python-based, API-friendly, automation-ready
- 🛡️ Resilient proxy fallback (direct → datacenter → residential) to reduce blocks
- 💾 Easy exports (CSV/JSON) and grouped output for downstream processing
- 🧭 Better than unstable alternatives — production-ready infrastructure on Apify
In short, it’s a reliable reddit API comments scraper that turns threads into analytics-ready data, fast.
Is it legal / ethical to use Reddit Comment Scraper?
Yes — when done responsibly. This actor scrapes only publicly available Reddit content and does not access private or password-protected data.
Guidelines for compliant use:
- Collect only public data and respect Reddit’s Terms of Service.
- Avoid scraping private communities or content behind authentication.
- Ensure your use complies with data protection laws (e.g., GDPR, CCPA).
- Use the data responsibly (e.g., analysis, research) and avoid spam or misuse.
- Consult your legal team if you have edge cases or questions.
Input parameters & output format
Example JSON input
```json
{
  "startUrls": ["https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/"],
  "maxComments": 1000,
  "replyLimit": 0,
  "proxyConfiguration": { "useApifyProxy": false }
}
```
Input fields
- startUrls (array, required): List one or more Reddit post URLs (e.g., https://www.reddit.com/r/subreddit/comments/post_id/title/).
- Default: none
- maxComments (integer, optional): Maximum number of comments to fetch per URL.
- Range: 1–10,000
- Default: 1000
- replyLimit (integer, optional): Maximum number of replies to store per comment in the nested replies field. Set to 0 for unlimited. (All replies are still collected in the flattened output.)
- Range: 0–100
- Default: 0
- proxyConfiguration (object, optional): Choose which proxies to use. By default, no proxy is used. If Reddit rejects or blocks the request, it will fall back to datacenter proxy, then residential proxy with retries.
- Default: { "useApifyProxy": false }
Example dataset record (single comment)
```json
{
  "url": "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/",
  "comment_id": "lhk1f7n",
  "post_id": "t3_1epeshq",
  "author": "AutoModerator",
  "permalink": "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/lhk1f7n/",
  "upvotes": 1,
  "content_type": "text",
  "parent_id": null,
  "author_avatar": "",
  "userUrl": "https://www.reddit.com/user/AutoModerator/",
  "contentText": "Comment text here...",
  "created_time": "",
  "replies": []
}
```
Grouped output saved to Key-Value Store (key: OUTPUT)
```json
{
  "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/": [
    {
      "comment_id": "lhk1f7n",
      "post_id": "t3_1epeshq",
      "author": "AutoModerator",
      "permalink": "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/lhk1f7n/",
      "upvotes": 1,
      "content_type": "text",
      "parent_id": "1epeshq",
      "author_avatar": "",
      "userUrl": "https://www.reddit.com/user/AutoModerator/",
      "contentText": "Comment text here...",
      "created_time": "",
      "replies": []
    }
  ]
}
```
Notes:
- created_time and author_avatar may be empty strings when not present in Reddit’s JSON.
- parent_id is null for top-level comments and contains the normalized parent ID for replies (prefixes like t1_/t3_ are stripped).
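If you mix the actor’s output with raw Reddit “thing” IDs downstream, the prefix stripping described above is easy to reproduce. A small sketch — `normalize_parent_id` is a hypothetical helper mirroring the described behavior, not the actor’s code:

```python
def normalize_parent_id(raw_parent):
    """Strip Reddit 'thing' prefixes like t1_ (comment) / t3_ (post) from a parent ID."""
    if not raw_parent:
        return None
    for prefix in ("t1_", "t3_"):
        if raw_parent.startswith(prefix):
            return raw_parent[len(prefix):]
    return raw_parent

print(normalize_parent_id("t1_lhk1f7n"))  # lhk1f7n
print(normalize_parent_id("t3_1epeshq"))  # 1epeshq
print(normalize_parent_id(None))          # None
```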
FAQ
Is there a free trial?
Yes. The actor offers trial minutes on Apify so you can test before subscribing. You’ll see current trial availability and pricing on the actor’s Apify Store page.
Do I need to log in or provide cookies?
No. The scraper works with public Reddit JSON endpoints and does not require authentication. It fetches publicly available comments only.
How many comments can I scrape per URL?
You can set maxComments from 1 to 10,000 per URL. The actor will collect up to this limit, expanding “more” placeholders via the Reddit API.
Can it scrape nested replies?
Yes. Nested replies are traversed and included. The replyLimit parameter controls how many replies are stored in the replies array per comment (0 means unlimited). All discovered replies are still emitted as individual flat records.
What happens if Reddit blocks the requests?
The actor automatically falls back from a direct connection to a datacenter proxy and then to a residential proxy with retries. This increases resilience during scraping.
Can I export results to CSV?
Yes. All comments are stored in the Apify dataset, which supports exports to JSON, CSV, and more. You can also access records via the Apify API to build pipelines.
Does it work for private subreddits or deleted comments?
No. It collects only publicly accessible content. Deleted or removed comments will appear as “[deleted]” where applicable.
Can I integrate this with Python or APIs?
Yes. This is a Python-based actor on Apify. You can pull dataset items via the Apify API or integrate into your own python reddit comment scraper workflows and automation stacks.
Closing CTA / Final thoughts
The Reddit Comment Scraper is built for teams that need accurate, scalable extraction of Reddit thread comments. It delivers structured records with authors, scores, permalinks, and nested replies — ideal for market research, analytics, and NLP.
Marketers, developers, data analysts, and researchers can export reddit comments to CSV/JSON, automate runs via the Apify API, and build a reliable reddit comments crawler into their pipelines. Start turning public Reddit discussions into actionable datasets — quickly, safely, and at scale.
