Reddit Comment Scraper

Pricing

$19.99/month + usage

💬 Reddit Comment Scraper (reddit-comment-scraper) extracts comments from posts and subreddits — with author, score, timestamp, nesting & permalinks. 📊 Export CSV/JSON. 🔍 Ideal for market research, social listening, brand monitoring & academic analysis. 🚀 Fast, scalable.

Rating: 0.0 (0)

Developer: ScrapeFlow (maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 24 days ago

Reddit Comment Scraper

Reddit Comment Scraper is a fast, scalable Reddit comment extractor that scrapes comments from public post URLs and exports structured data for analysis. Built for marketers, developers, data analysts, and researchers, it collects threaded discussions at scale, including authors, upvotes, nesting, and permalinks, and supports automated workflows that export the results to CSV or JSON for further processing.

What data / output can you get?

Below are the exact fields this Reddit thread comments scraper produces in the Apify dataset (one record per comment):

| Data type | Description | Example value |
| --- | --- | --- |
| url | The original Reddit post URL the comment belongs to | https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/ |
| comment_id | Unique comment identifier | lhk1f7n |
| post_id | Post identifier (prefixed with t3_) | t3_1epeshq |
| author | Comment author username (or [deleted]) | AutoModerator |
| permalink | Direct link to the specific comment | https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/lhk1f7n/ |
| upvotes | Number of upvotes (score) | 1 |
| content_type | Content type marker | text |
| parent_id | Parent comment/post ID (normalized; null for top-level) | t3_1epeshq |
| author_avatar | Author avatar URL (empty if not available) | |
| userUrl | Link to the author’s Reddit profile (blank if [deleted]) | https://www.reddit.com/user/AutoModerator/ |
| contentText | The comment text with newlines normalized | Comment text here... |
| created_time | Timestamp placeholder (empty if not available) | |
| replies | Array of nested reply objects attached to this comment (size limited by replyLimit) | [ { ...nested reply object... } ] |

Notes:

  • You can export results in multiple formats (CSV, JSON, Excel) directly from the Apify dataset.
  • In addition to the dataset, the actor also stores a grouped object in the key‑value store under OUTPUT, mapping each post URL to an array of its comments — useful for bulk analysis and to download Reddit comments per-thread.
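The relationship between the flat dataset and the grouped OUTPUT object can be sketched in a few lines of Python. This is only an illustration of the mapping described above, not the actor's own code; the helper name `group_by_post` and the second sample comment are hypothetical.

```python
from collections import defaultdict

POST_URL = "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/"

# Flat records shaped like dataset items. The second one is made up for the demo.
items = [
    {"url": POST_URL, "comment_id": "lhk1f7n", "author": "AutoModerator", "upvotes": 1},
    {"url": POST_URL, "comment_id": "example2", "author": "some_user", "upvotes": 5},
]


def group_by_post(records):
    """Map each post URL to the list of comments that belong to it."""
    grouped = defaultdict(list)
    for record in records:
        # The URL becomes the key, so it is left out of the grouped comment.
        comment = {key: value for key, value in record.items() if key != "url"}
        grouped[record["url"]].append(comment)
    return dict(grouped)


grouped = group_by_post(items)
```

Here `grouped` has one key per post URL, which matches the shape of the grouped example shown later in this page.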

Key features

  • 🚀 Fast, scalable extraction: Built with async I/O to scrape comments efficiently across multiple post URLs in parallel. Useful when you want to cover a subreddit by feeding in many thread URLs.

  • 🧵 Threaded replies with nesting: Captures nested replies and keeps a configurable number in each comment’s replies array, preserving the conversation structure.

  • 🔁 Automatic proxy fallback: Falls back from a direct connection to datacenter proxies and then residential proxies, with retries, which improves resilience when scraping at scale.

  • 📦 Structured outputs for analysis: A clean JSON schema includes author, score, parent-child relationships, permalinks, and more, so you can export comments to CSV or JSON for dashboards and NLP.

  • 💻 Developer-friendly (API-ready, Python-based): Implemented in Python and deployable via the Apify API, a practical choice if you are building a Python workflow or integrating with data pipelines.

  • ⚙️ Robust comment expansion: Uses Reddit’s JSON endpoints and the /api/morechildren flow to retrieve additional comments without login.

  • 📊 Progress logging and summaries: Real-time logs show collected counts, plus a final per-URL summary to keep large runs transparent and manageable.
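To make the comment expansion feature more concrete, here is a minimal sketch of walking a Reddit comment listing. It assumes the usual shape of Reddit's public post JSON (comments as kind "t1", unexpanded placeholders as kind "more"). The function name `walk_listing`, the sample payload, and the mapping to this actor's field names are illustrative assumptions, not the actor's actual implementation.

```python
def walk_listing(listing, comments, more_ids):
    """Collect comments ("t1") and unexpanded comment IDs ("more") recursively."""
    for child in listing.get("data", {}).get("children", []):
        kind = child.get("kind")
        data = child.get("data", {})
        if kind == "t1":
            comments.append({
                "comment_id": data.get("id"),
                "author": data.get("author"),
                "upvotes": data.get("score"),
                "parent_id": data.get("parent_id"),
                "contentText": data.get("body", ""),
            })
            replies = data.get("replies")
            # Reddit sends an empty string when a comment has no replies.
            if isinstance(replies, dict):
                walk_listing(replies, comments, more_ids)
        elif kind == "more":
            # These IDs would be requested from /api/morechildren in a real run.
            more_ids.extend(data.get("children", []))


# Tiny hand-written payload for the demo (not real Reddit data).
sample_listing = {"kind": "Listing", "data": {"children": [
    {"kind": "t1", "data": {
        "id": "aaa", "author": "user_a", "score": 3,
        "parent_id": "t3_1epeshq", "body": "Top-level",
        "replies": {"kind": "Listing", "data": {"children": [
            {"kind": "t1", "data": {
                "id": "bbb", "author": "user_b", "score": 1,
                "parent_id": "t1_aaa", "body": "Reply", "replies": "",
            }},
        ]}},
    }},
    {"kind": "more", "data": {"children": ["ccc", "ddd"]}},
]}}

comments, more_ids = [], []
walk_listing(sample_listing, comments, more_ids)
```

After the walk, `comments` holds the flattened records and `more_ids` holds the IDs that still need to be expanded.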

How to use Reddit Comment Scraper - step by step

  1. Sign in to Apify
    Create or log in to your Apify account at https://console.apify.com.

  2. Open the actor
    Search for “reddit-comment-scraper” and open the actor page.

  3. Add input data
    Paste one or more Reddit post URLs into startUrls.

  4. Configure limits and replies
    Set maxComments to control how many comments to collect per URL (1–10,000; default 1,000). Set replyLimit to control how many nested replies are stored in each comment’s replies array (0 = unlimited).

  5. Configure proxy (optional)
    By default, no proxy is used. If Reddit blocks requests, the actor automatically falls back to datacenter and then residential proxies with retries.

  6. Run the actor
    Click Run. The job will fetch the post JSON, expand missing comments via /api/morechildren, and push structured items to the dataset.

  7. Monitor progress
    Follow logs to see the number of comments collected per thread and a final scraping summary.

  8. Download and integrate
    Go to the Output tab to download results as JSON, CSV, or Excel. Use the Apify API to automate pipelines, or connect the dataset to BI tools and data warehouses.

Pro Tip: For large-scale Reddit comment mining, queue many post URLs from target subreddits and automate exports with the Apify API to keep your Reddit comment data scraper in sync with your analytics stack.
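For readers who prefer the Apify API over the console, the same steps might look like the sketch below. It assumes the `apify-client` Python package is installed and that you have an API token. The actor ID is not stated on this page, so `actor_id` is a placeholder you must fill in yourself, and the helper names `build_run_input` and `run_scraper` are hypothetical.

```python
def build_run_input(post_urls, max_comments=1000, reply_limit=0):
    """Build the actor input using the parameter names documented on this page."""
    return {
        "startUrls": list(post_urls),
        "maxComments": max_comments,
        "replyLimit": reply_limit,
        "proxyConfiguration": {"useApifyProxy": False},
    }


def run_scraper(token, actor_id, run_input):
    """Start the actor, wait for it to finish, and return the dataset items."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


run_input = build_run_input(
    ["https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/"],
    max_comments=500,
)
# items = run_scraper("<YOUR_APIFY_TOKEN>", "<ACTOR_ID>", run_input)
```

The final call is commented out because it needs a real token and actor ID and would start a billable run.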

Use cases

| Use case name | Description |
| --- | --- |
| Market research + voice of customer | Aggregate discussions from target threads to quantify themes, objections, and sentiment for product teams and PMMs. |
| Social listening for brands | Monitor comment sentiment and engagement on brand- or topic-related threads to inform community and support strategies. |
| Content research + curation | Mine high-signal comments to curate insights, FAQs, and examples for blogs, newsletters, or knowledge bases. |
| Academic research + NLP datasets | Collect structured Reddit comment datasets for linguistics, topic modeling, and sentiment analysis at scale. |
| Competitive analysis in subreddits | Track competitor mentions and user feedback by downloading Reddit comments from relevant threads over time. |
| Data engineering pipeline (API) | Feed structured JSON/CSV into warehouses and ML pipelines via the Apify API for downstream analytics and dashboards. |

Why choose Reddit Comment Scraper?

The Reddit Comment Scraper is built for precision, automation, and reliability — a production-ready Reddit comment extractor that outperforms fragile browser extensions.

  • ✅ Accurate, structured outputs with IDs, permalinks, parent/child links, and scores
  • 🌍 No login required — works on publicly available Reddit JSON endpoints
  • 📈 Scales to thousands of comments per URL with batching and retries
  • 🧰 Developer access via Apify API — ideal for Python-based data pipelines
  • 🔒 Ethical-by-design: focuses on public data and avoids private/authenticated content
  • 💾 Flexible exports: easily export Reddit comments to CSV, JSON, or Excel
  • 🛠️ Robust infrastructure: automatic proxy fallback (direct → datacenter → residential) with retry logic

In short, it’s a Reddit thread comments scraper engineered for consistency and scale — not a one-off browser hack.

Is it legal to scrape Reddit comments?

Yes, when used responsibly. This tool accesses publicly available Reddit content only and does not log in or access private data.

Guidelines for compliant use:

  • Scrape only public URLs and respect Reddit’s Terms of Service.
  • Adhere to applicable regulations (e.g., GDPR, CCPA) and process personal data lawfully.
  • Avoid collecting or using data in ways that could be considered abusive or spammy.
  • Consult your legal team for edge cases and jurisdiction-specific requirements.

The actor is designed to collect public comment data for legitimate research and analytics purposes.

Input parameters & output format

Example input (JSON)

{
  "startUrls": [
    "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/"
  ],
  "maxComments": 1000,
  "replyLimit": 0,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| startUrls | array | Yes | | List one or more Reddit post URLs (e.g., https://www.reddit.com/r/subreddit/comments/post_id/title/). |
| maxComments | integer | No | 1000 | Maximum number of comments to fetch per URL. Min 1, max 10,000. |
| replyLimit | integer | No | 0 | Maximum number of replies to store per comment in the nested replies field. Set to 0 for unlimited. (All replies are still collected in the flattened output.) |
| proxyConfiguration | object | No | { "useApifyProxy": false } | Choose which proxies to use. By default, no proxy is used. If Reddit rejects or blocks the request, it falls back to datacenter, then residential proxies with retries. |
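Before starting a run, it can help to check an input object against the limits in the table above. The sketch below is one possible client-side check, not part of the actor. The URL pattern and the helper names `validate_input` and `post_id_from_url` are assumptions made for illustration.

```python
import re

# Assumed pattern for post URLs, based on the example format in the table.
POST_URL_RE = re.compile(
    r"^https://www\.reddit\.com/r/[^/]+/comments/([a-z0-9]+)(/[^/]*)?/?$"
)


def post_id_from_url(url):
    """Return the t3_-prefixed post ID, or None if the URL does not match."""
    match = POST_URL_RE.match(url)
    return f"t3_{match.group(1)}" if match else None


def validate_input(run_input):
    """Return a list of human-readable problems (empty if the input looks fine)."""
    errors = []
    urls = run_input.get("startUrls") or []
    if not urls:
        errors.append("startUrls is required and must not be empty")
    for url in urls:
        if post_id_from_url(url) is None:
            errors.append(f"not a Reddit post URL: {url}")
    max_comments = run_input.get("maxComments", 1000)
    if not isinstance(max_comments, int) or not 1 <= max_comments <= 10000:
        errors.append("maxComments must be an integer between 1 and 10000")
    reply_limit = run_input.get("replyLimit", 0)
    if not isinstance(reply_limit, int) or reply_limit < 0:
        errors.append("replyLimit must be a non-negative integer")
    return errors


good = {"startUrls": ["https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/"]}
bad = {"startUrls": ["https://example.com/"], "maxComments": 50000}
good_errors = validate_input(good)
bad_errors = validate_input(bad)
```

With these sample inputs, `good_errors` is empty and `bad_errors` lists one problem for the URL and one for maxComments.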

Example dataset item (one comment per record)

{
  "url": "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/",
  "comment_id": "lhk1f7n",
  "post_id": "t3_1epeshq",
  "author": "AutoModerator",
  "permalink": "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/lhk1f7n/",
  "upvotes": 1,
  "content_type": "text",
  "parent_id": "t3_1epeshq",
  "author_avatar": "",
  "userUrl": "https://www.reddit.com/user/AutoModerator/",
  "contentText": "Comment text here...",
  "created_time": "",
  "replies": []
}

Example grouped output (key‑value store: key = "OUTPUT")

{
  "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/": [
    {
      "comment_id": "lhk1f7n",
      "post_id": "t3_1epeshq",
      "author": "AutoModerator",
      "permalink": "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/lhk1f7n/",
      "upvotes": 1,
      "content_type": "text",
      "parent_id": "t3_1epeshq",
      "author_avatar": "",
      "userUrl": "https://www.reddit.com/user/AutoModerator/",
      "contentText": "Comment text here...",
      "created_time": "",
      "replies": []
    }
  ]
}

Notes:

  • Some fields (e.g., author_avatar, created_time) may be empty if Reddit does not provide them.
  • The replies array is stored per comment and limited by replyLimit, while the flattened dataset includes all discovered comments.
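The difference between the flattened dataset and the capped replies array can be illustrated by rebuilding a tree from flat records. This is a sketch under assumptions: the function name `nest_comments` is hypothetical, the sample comments are made up, and the code accepts parent IDs either bare or with a t1_ prefix because the page does not say which form the normalized value takes.

```python
def nest_comments(flat_items, reply_limit=0):
    """Rebuild reply trees from parent_id, keeping at most reply_limit replies each."""
    by_id = {c["comment_id"]: {**c, "replies": []} for c in flat_items}
    roots = []
    for comment in by_id.values():
        parent = comment.get("parent_id") or ""
        # Accept both "t1_abc" and "abc" as a reference to comment "abc".
        parent_key = parent[3:] if parent.startswith("t1_") else parent
        if parent_key in by_id:
            siblings = by_id[parent_key]["replies"]
            # 0 means unlimited, mirroring the replyLimit parameter.
            if reply_limit == 0 or len(siblings) < reply_limit:
                siblings.append(comment)
        else:
            # Parent is the post itself (t3_...) or missing: treat as top-level.
            roots.append(comment)
    return roots


# Made-up flat records for the demo.
flat = [
    {"comment_id": "a", "parent_id": "t3_1epeshq"},
    {"comment_id": "b", "parent_id": "t1_a"},
    {"comment_id": "c", "parent_id": "t1_a"},
    {"comment_id": "d", "parent_id": "t1_b"},
]

unlimited = nest_comments(flat)
capped = nest_comments(flat, reply_limit=1)
```

In `capped`, comment "c" is dropped from the nested view of "a", but it is still present in the flat list, which is the behaviour the note above describes.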

FAQ

Do I need a Reddit account or API key to use this?

✅ No. The actor uses publicly available Reddit JSON endpoints and does not require login, cookies, or API keys, so you can scrape Reddit comments without API authentication.

Can it scrape nested replies from threads?

✅ Yes. Nested replies are captured, and you can control how many are stored in each comment’s replies array via replyLimit. All replies are still collected in the flattened output even when the nested array is limited.

How many comments can I collect per URL?

✅ You can set maxComments from 1 to 10,000 per URL. The default is 1,000. The actor trims results to this limit after expanding additional comments via /api/morechildren.

What happens if Reddit blocks or rate-limits requests?

✅ The actor automatically falls back: it tries a direct connection first, then datacenter proxy, and finally residential proxy with retries. This improves reliability for large runs.
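The fallback order described here can be sketched as a small retry loop. This is not the actor's code: the function `fetch_with_fallback`, the retry count, and the stub fetchers are all hypothetical, and the real retry policy is not documented on this page.

```python
import time


def fetch_with_fallback(url, tiers, retries=2, delay=0.0):
    """Try each (label, fetch) tier in order, retrying a tier before moving on."""
    last_error = None
    for label, fetch in tiers:
        for _attempt in range(retries):
            try:
                return label, fetch(url)
            except Exception as exc:  # a real client would catch narrower errors
                last_error = exc
                time.sleep(delay)
    raise RuntimeError(f"all connection tiers failed for {url}") from last_error


# Stub fetchers standing in for direct, datacenter, and residential connections.
def blocked(url):
    raise ConnectionError("blocked")


def succeeds(url):
    return {"url": url}


tiers = [("direct", blocked), ("datacenter", blocked), ("residential", succeeds)]
label, payload = fetch_with_fallback("https://example.com/thread.json", tiers)
```

In this demo the first two tiers always fail, so the call returns from the residential tier. Passing the fetchers in as arguments keeps the ordering logic easy to test without any network access.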

Can I export results to CSV and JSON?

✅ Yes. All results are stored in the Apify dataset, so you can export Reddit comments to CSV, JSON, or Excel. A grouped JSON object is also saved under the OUTPUT key in the key‑value store.
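If you download JSON and want to produce your own CSV, a minimal sketch with Python's standard library follows. The column list mirrors the documented output fields. The helper name `to_csv` and the choice to serialize the nested replies array as JSON text are assumptions, not a description of how Apify's built-in CSV export works.

```python
import csv
import io
import json

FIELDS = [
    "url", "comment_id", "post_id", "author", "permalink", "upvotes",
    "content_type", "parent_id", "author_avatar", "userUrl",
    "contentText", "created_time", "replies",
]


def to_csv(items):
    """Serialize comment records to CSV text, one row per comment."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        row = dict(item)
        # Nested replies do not fit a flat cell, so store them as JSON text.
        row["replies"] = json.dumps(item.get("replies", []))
        writer.writerow(row)
    return buffer.getvalue()


sample_items = [{
    "url": "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/",
    "comment_id": "lhk1f7n",
    "post_id": "t3_1epeshq",
    "author": "AutoModerator",
    "upvotes": 1,
    "contentText": "Comment text here...",
    "replies": [],
}]

csv_text = to_csv(sample_items)
```

Fields missing from a record are written as empty cells, which matches how empty values such as author_avatar appear in the examples above.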

Does this scrape entire subreddits?

ℹ️ It targets Reddit post URLs. To scrape subreddit comments broadly, supply multiple post URLs from the subreddit. This approach scales well for a Reddit comment mining workflow.

Is Pushshift used by this tool?

❌ No. This actor uses Reddit’s public JSON endpoints (including /api/morechildren) and does not rely on Pushshift.

Is there a free trial?

✅ Yes. This actor includes 120 trial minutes on Apify so you can test it before subscribing.

Can developers integrate this with Python or the API?

✅ Yes. It’s built in Python and accessible via the Apify API, making it a good fit for Python pipelines, ETL jobs, and automated workflows.

Final thoughts

Reddit Comment Scraper is built to extract structured, high-quality comment data from Reddit threads at scale. With nested replies, author metadata, permalinks, and robust proxy fallback, it powers market research, social listening, and data science workflows.

Marketers, developers, analysts, and researchers can quickly download Reddit comments, export to CSV/JSON, and automate pipelines via the Apify API. Start collecting richer Reddit discussion data today and turn threads into actionable insight.