
Reddit Comment Scraper

Pricing

$19.99/month + usage


💬 Reddit Comment Scraper (reddit-comment-scraper) captures comments from posts & subreddits—text, authors, scores, timestamps, permalinks & nesting. 🔎 Export CSV/JSON for research, social listening, sentiment & trend analysis. ⚡ Fast, reliable, API-ready.


Rating

0.0 (0 reviews)

Developer

Scraply

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 16 days ago


Reddit Comment Scraper

Reddit Comment Scraper is a production-ready Apify actor that collects structured comment data from public Reddit post URLs: a fast, reliable way to scrape Reddit comments at scale for research, social listening, and analytics. Built in Python, it captures comment text, authors, scores, permalinks, and nesting, and it is API-ready for teams that need to export Reddit comments to CSV or JSON. Marketers, developers, data analysts, and researchers can use it for large-scale monitoring and insight generation across subreddits.

What data / output can you get?

Below are the exact fields pushed to the Apify dataset for each comment record. You can export results to JSON, CSV, or Excel directly from the Apify dataset UI or via API.

| Data field | Description | Example value |
|---|---|---|
| url | The source Reddit post URL this comment belongs to | https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/ |
| comment_id | Unique Reddit comment identifier | lhk1f7n |
| post_id | Reddit post identifier (thing id format) | t3_1epeshq |
| author | Comment author username (or “[deleted]”) | AutoModerator |
| userUrl | Direct link to the author’s Reddit profile (empty for “[deleted]”) | https://www.reddit.com/user/AutoModerator/ |
| permalink | Direct link to the specific comment | https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/lhk1f7n/ |
| upvotes | Comment score (upvotes) | 1 |
| content_type | Content type label | text |
| parent_id | Parent thing ID (comment or post); null if none | t3_1epeshq |
| contentText | Cleaned text content of the comment | Comment text here... |
| created_time | Timestamp (if present in source; may be empty) | |
| author_avatar | Author avatar URL if available (empty by default) | |
| replies | Array of nested replies kept per replyLimit (each reply object has the same fields) | [] |
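Reddit “thing” IDs carry a type prefix (t3_ for posts, t1_ for comments), so a record’s parent_id tells you whether it is a top-level comment or a nested reply. A minimal sketch of reading that prefix (the helper name is illustrative, not part of the actor):

```python
def is_top_level(record: dict) -> bool:
    """A comment is top-level when its parent is the post itself (t3_ prefix)."""
    parent = record.get("parent_id") or ""
    return parent.startswith("t3_")

item = {"comment_id": "lhk1f7n", "post_id": "t3_1epeshq", "parent_id": "t3_1epeshq"}
print(is_top_level(item))  # True: the parent is the post, so this is a top-level comment
```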

Notes:

  • Dataset items are flattened at the comment level for easy analysis, while each item also includes a “replies” array to preserve conversation structure up to your configured reply limit.
  • In addition to the dataset, the actor saves a grouped JSON to the key-value store under key “OUTPUT” in the shape: { "<post_url>": [ ...comment records... ] } (a full example appears in the output section below).
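If you work from the flattened dataset but want the grouped shape yourself, regrouping by url is straightforward with a defaultdict (a local sketch, not the actor’s code):

```python
from collections import defaultdict

def group_by_post(items):
    """Group flat dataset records into {post_url: [records]}, mirroring the OUTPUT key shape."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item["url"]].append(item)
    return dict(grouped)

records = [
    {"url": "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/", "comment_id": "lhk1f7n"},
    {"url": "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/", "comment_id": "lhk2abc"},
]
grouped = group_by_post(records)
print(len(grouped))  # 1 post URL, with both comments grouped under it
```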

Key features

  • ⚡ Automatic proxy fallback for reliability
    Built-in smart fallback from direct connection to datacenter and then residential proxies with retries, so your reddit comment scraping bot stays resilient under blocking.

  • 📦 Scalable bulk URL processing
    Feed multiple Reddit post URLs in one run and handle large threads — ideal for a reddit post comments downloader or reddit comment scraping tool workflows.

  • 🧵 Nested replies with depth control
    Capture comment threads with a configurable replyLimit that controls how many replies are stored per comment in the nested “replies” field.

  • 🚀 Async, high-throughput architecture
    Implemented with aiohttp and async/await to collect more comments faster and reduce latency across large jobs.

  • 🔌 API-ready, easy exporting
    Access results via the Apify API and export reddit comments to CSV, JSON, or Excel — great for pipelines and dashboards.

  • 🔒 No API keys or login required
    Works on publicly available Reddit JSON responses; a practical reddit comment scraper without API credentials.

  • 🧪 Flexible sort orders
    Supports Reddit’s standard sort orders (hot, new, top, controversial, old) for more control over comment retrieval.

  • 🛠️ Production-grade logging and progress tracking
    Clear progress updates (e.g., “Collected N comments so far”) and a final scraping summary for auditability.
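The Apify platform exposes actors over a REST API: a run is started by POSTing the input JSON to the actor’s runs endpoint. A minimal sketch of building that request (the actor ID scraply~reddit-comment-scraper is assumed from this listing; verify the exact ID in your Apify Console):

```python
import json

def build_run_request(post_urls, max_comments=1000, reply_limit=0, token="YOUR_APIFY_TOKEN"):
    """Build the endpoint URL and JSON payload for starting an actor run via the Apify API v2."""
    actor_id = "scraply~reddit-comment-scraper"  # assumed from the listing; check your Console
    endpoint = f"https://api.apify.com/v2/acts/{actor_id}/runs?token={token}"
    payload = {
        "startUrls": post_urls,
        "maxComments": max_comments,
        "replyLimit": reply_limit,
        "proxyConfiguration": {"useApifyProxy": False},
    }
    return endpoint, json.dumps(payload)

endpoint, body = build_run_request(
    ["https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/"]
)
# POST `body` to `endpoint` with Content-Type: application/json to start the run.
```

Apify also ships official client libraries (apify-client for Python and JavaScript) that wrap this endpoint and the dataset download for you.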

How to use Reddit Comment Scraper - step by step

  1. Create or log in to your Apify account.
  2. Open the Apify Console and navigate to Actors, then find “reddit-comment-scraper”.
  3. Add input data:
    • Paste one or more Reddit post URLs into startUrls.
    • Optionally set maxComments (per URL) and replyLimit.
    • Optionally configure proxyConfiguration.
  4. Click Run to start the job. The actor will fetch the post’s JSON, follow “more” comment placeholders, and expand nested threads.
  5. Monitor logs and progress in real-time to see how many comments have been collected.
  6. When finished, open the Dataset tab to review individual comment records.
  7. Export results to CSV, JSON, or Excel, or pull data via the Apify API for downstream workflows.

Pro Tip: Use the Apify API to integrate the dataset into analytics stacks or automations (e.g., schedule recurring runs for social listening and sentiment tracking).

Use cases

| Use case | Description |
|---|---|
| Market research + topic mining | Aggregate large volumes of thread comments to quantify opinions and extract themes around products, competitors, or trends. |
| Sentiment analysis for social listening | Feed comment text and metadata into NLP models to track sentiment shifts and emerging narratives. |
| Community & subreddit monitoring | Monitor discussions across specific subreddits by scraping Reddit comments from key threads regularly. |
| Academic & policy research | Collect structured comment-level datasets for behavioral studies and qualitative analysis. |
| Developer API pipeline | Use the Apify API to automate a Reddit comment scraper Python workflow and stream datasets into your systems. |
| Content aggregation & curation | Capture insightful comments and reply threads to curate quotes, FAQs, or knowledge bases. |
| Competitive/brand analysis | Track brand mentions, upvotes, and discussion depth around campaigns or launches. |

Why choose Reddit Comment Scraper?

Built for precision, automation, and reliability, this actor outperforms manual tools and unstable extensions for scraping Reddit post comments.

  • ✅ Accurate, structured outputs: Clean fields for authors, scores, permalinks, parent-child relationships, and content.
  • 🌍 Scales to long threads: Expands “more” placeholders and handles large discussions efficiently.
  • 💻 Developer-friendly & API-ready: Fetch datasets via REST API and integrate into Python pipelines.
  • 🛡️ Safe & public-only: Scrapes publicly available content; no login or API keys required.
  • 💪 Resilient infrastructure: Automatic proxy fallback keeps collection running when direct access is blocked.
  • 💰 Cost-effective & predictable: Designed for reliable, repeatable workloads without brittle browser automation.

In short: a production-grade Reddit comment scraping tool, not another brittle browser extension.

Is it legal to scrape Reddit comments?

Yes — when done responsibly. This actor collects publicly available data from Reddit post pages and does not access private or authenticated content.

Guidelines for compliant use:

  • Scrape only public pages and respect platform terms.
  • Do not target private subreddits or password-protected content.
  • Ensure your use complies with applicable laws (e.g., GDPR, CCPA).
  • Use the data ethically — for analysis and research, not spam.

For edge cases, confirm requirements with your legal team.

Input parameters & output format

Example JSON input:

{
  "startUrls": [
    "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/"
  ],
  "maxComments": 250,
  "replyLimit": 0,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}

Input fields (from the actor’s input schema):

  • startUrls (array, required): List one or more Reddit post URLs (e.g., https://www.reddit.com/r/subreddit/comments/post_id/title/).
    Default: none.
  • maxComments (integer, optional): Maximum number of comments to fetch per URL. Range: 1–10,000.
    Default: 1000.
  • replyLimit (integer, optional): Maximum number of replies to store per comment in the nested “replies” field. Set to 0 for unlimited. (All replies are still collected in the flattened output.) Range: 0–100.
    Default: 0.
  • proxyConfiguration (object, optional): Choose which proxies to use. By default, no proxy is used. If Reddit rejects or blocks the request, the actor falls back to a datacenter proxy, then a residential proxy, with retries.
    Prefill: { "useApifyProxy": false }.
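As an illustration of the documented replyLimit semantics (0 keeps every nested reply, N keeps at most N per comment), a trimming helper might look like this (a local sketch, not the actor’s implementation):

```python
def trim_replies(comment: dict, reply_limit: int) -> dict:
    """Return a copy of a comment with nested replies capped at reply_limit (0 = unlimited)."""
    replies = comment.get("replies", [])
    if reply_limit > 0:
        replies = replies[:reply_limit]
    # Apply the same cap recursively to each reply that is kept.
    return {**comment, "replies": [trim_replies(r, reply_limit) for r in replies]}

thread = {"comment_id": "a", "replies": [{"comment_id": "b", "replies": []},
                                         {"comment_id": "c", "replies": []}]}
print(len(trim_replies(thread, 1)["replies"]))  # 1 (capped)
print(len(trim_replies(thread, 0)["replies"]))  # 2 (unlimited)
```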

Output: dataset items (one per comment), with the following fields:

  • url, comment_id, post_id, author, permalink, upvotes, content_type, parent_id, author_avatar, userUrl, contentText, created_time, replies

Example dataset item:

{
  "url": "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/",
  "comment_id": "lhk1f7n",
  "post_id": "t3_1epeshq",
  "author": "AutoModerator",
  "permalink": "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/lhk1f7n/",
  "upvotes": 1,
  "content_type": "text",
  "parent_id": "t3_1epeshq",
  "author_avatar": "",
  "userUrl": "https://www.reddit.com/user/AutoModerator/",
  "contentText": "Comment text here...",
  "created_time": "",
  "replies": []
}

Also saved to the key-value store as grouped output under key “OUTPUT”:

{
  "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/": [
    {
      "comment_id": "lhk1f7n",
      "post_id": "t3_1epeshq",
      "author": "AutoModerator",
      "permalink": "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/lhk1f7n/",
      "upvotes": 1,
      "content_type": "text",
      "parent_id": "t3_1epeshq",
      "author_avatar": "",
      "userUrl": "https://www.reddit.com/user/AutoModerator/",
      "contentText": "Comment text here...",
      "created_time": "",
      "replies": []
    }
  ]
}

Notes:

  • Fields author_avatar and created_time may be empty when not provided by Reddit’s response.
  • The “replies” array stores nested replies per the replyLimit, while each reply is also included as its own record in the dataset.
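The relationship between the nested “replies” arrays and the flat dataset can be illustrated with a small depth-first walk (a sketch of the described behavior, not the actor’s code):

```python
def flatten_comments(comments):
    """Yield every comment and each of its nested replies as its own flat record, depth-first."""
    for comment in comments:
        yield comment
        yield from flatten_comments(comment.get("replies", []))

nested = [{"comment_id": "a", "replies": [{"comment_id": "b", "replies": []}]}]
flat = list(flatten_comments(nested))
print([c["comment_id"] for c in flat])  # ['a', 'b']
```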

FAQ

Is there a free tier or trial to test it?

Yes. The listing includes 120 trial minutes so you can evaluate the actor. For ongoing use, the Apify Store shows a monthly price of $19.99 plus usage. Actual billing depends on your Apify plan and usage.

Do I need Reddit API keys or login?

No. It works without login or OAuth — a reddit comment scraper without API keys. The actor uses publicly available JSON responses from Reddit post pages.

What types of content can it scrape?

It scrapes comments from public Reddit post URLs, including nested replies. It does not access private subreddits or require authentication.

How many comments can I scrape per URL?

You can set maxComments between 1 and 10,000 per URL. The actor will traverse “more” placeholders to retrieve additional batches until your limit is reached.

Does it capture nested replies?

Yes. Use replyLimit to control how many replies are stored per comment in the “replies” field. Set 0 for unlimited storage of nested replies (flattened comments are still collected either way).

What if Reddit blocks my requests?

The actor includes smart proxy fallback: it first tries a direct connection, then falls back to datacenter proxy, and finally residential proxy with retries to maximize success.

Can I export results to CSV?

Yes. Open the Dataset tab after the run and use the built-in export options to download CSV, JSON, or Excel. You can also fetch data via the Apify API.
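Beyond the built-in export buttons, dataset items fetched via the API can be written to CSV with the Python standard library (a local sketch; field names follow the output table above):

```python
import csv
import io

def items_to_csv(items, fields=("comment_id", "author", "upvotes", "contentText")):
    """Serialize dataset items to CSV, keeping only the selected comment fields."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fields), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(items)
    return buf.getvalue()

items = [{"comment_id": "lhk1f7n", "author": "AutoModerator", "upvotes": 1,
          "contentText": "Comment text here...", "replies": []}]
print(items_to_csv(items).splitlines()[0])  # comment_id,author,upvotes,contentText
```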

Can I scrape subreddit comments in bulk?

Yes, by providing multiple Reddit post URLs from your target subreddits. The tool functions as a scalable reddit thread comment scraper for bulk processing.

Closing CTA / Final thoughts

Reddit Comment Scraper is built for accurate, scalable extraction of Reddit post comments. It delivers structured records with authors, scores, permalinks, and nested replies, ready for research, social listening, and analytics.

Whether you’re a marketer, developer, data analyst, or researcher, you can run bulk jobs, export to CSV/JSON, and integrate via the Apify API for automation. Start collecting smarter Reddit insights today — and turn conversations into measurable, repeatable intelligence.