
Reddit Comment Scraper

Pricing

$19.99/month + usage

🧰 Reddit Comment Scraper (reddit-comment-scraper) collects Reddit comments & threads across subreddits — with author, score, timestamps, permalinks & nesting. 📊 Export CSV/JSON for research, sentiment, brand monitoring & SEO. ⚡ Ideal for analysts, marketers & community teams.


Rating: 0.0 (0 reviews)

Developer: ScrapeMesh (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 17 days ago

Reddit Comment Scraper

The Reddit Comment Scraper is a production-ready Apify actor that collects structured comments from Reddit post URLs: fast, reliable, and built for scale. It removes the hassle of manually navigating threads by turning any Reddit discussion into clean, analyzable records with authors, scores, permalinks, parent/child relationships, and nested replies. Whether you are a marketer, developer, data analyst, or researcher, this tool helps you scrape Reddit comments and export them as a usable dataset for insights, NLP, and reporting at scale.

What data / output can you get?

Below are the exact fields pushed to the Apify dataset for each comment. You can export results as JSON or CSV from the Apify dataset UI.

| Data field | Description | Example value |
| --- | --- | --- |
| url | The original Reddit post URL the comment belongs to | https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/ |
| comment_id | Unique comment identifier | lhk1f7n |
| post_id | Reddit post thing ID (t3_…) | t3_1epeshq |
| author | Comment author username (or “[deleted]”) | AutoModerator |
| permalink | Direct link to the specific comment | https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/lhk1f7n/ |
| upvotes | Number of upvotes (score) | 42 |
| content_type | Content type label | text |
| parent_id | Parent comment ID without prefix (null for top-level) | lhk1f7n |
| author_avatar | Author avatar URL (if available; empty string otherwise) | |
| userUrl | Link to the user’s Reddit profile (empty if deleted) | https://www.reddit.com/user/AutoModerator/ |
| contentText | The comment text content, line breaks normalized | This is a comment… |
| created_time | Created timestamp placeholder (empty string if unavailable) | |
| replies | Array of nested reply objects (same schema), trimmed by replyLimit | [ … ] |

Notes:

  • Each dataset item represents one comment and includes a nested replies array (which can be limited by replyLimit). All discovered comments are also emitted as individual flat records.
  • You can export Reddit comments to CSV or JSON directly from the Apify dataset.
  • The actor also stores a grouped “by URL” structure in the key-value store under the OUTPUT key for convenience.
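
Since each dataset item carries a nested replies array alongside the flat records, you can also walk the tree yourself with a small recursive flattener. This is an illustrative sketch, not the actor's code; the field names follow the schema above and the sample thread is made up:

```python
def flatten_comments(comments):
    """Depth-first flatten of nested comment records into a single list."""
    flat = []
    for comment in comments:
        flat.append(comment)
        flat.extend(flatten_comments(comment.get("replies", [])))
    return flat

# Hypothetical thread: c1 has two children, c3 has one grandchild.
thread = [
    {"comment_id": "c1", "replies": [
        {"comment_id": "c2", "replies": []},
        {"comment_id": "c3", "replies": [{"comment_id": "c4", "replies": []}]},
    ]},
]
print([c["comment_id"] for c in flatten_comments(thread)])  # ['c1', 'c2', 'c3', 'c4']
```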

Key features

  • ⚡️ Robust proxy fallback: Automatically tries a direct connection, then a datacenter proxy, then a residential proxy with retries to keep your crawler running even under blocks.

  • 🧵 Nested conversation structure: Captures parent/child relationships with a nested replies array per comment. Control how many replies are stored via replyLimit while still collecting all comments in the flat output.

  • 📦 Bulk URL processing: Process multiple Reddit post URLs in one run to build a larger comment dataset efficiently.

  • 💾 Clean, structured output: Pushes consistent JSON records to the Apify dataset with author, score, permalinks, parent IDs, and more; perfect for analysis, NLP, and reporting.

  • 🚫 No login or cookies required: Works against public Reddit JSON endpoints; no authentication needed for scraping public threads.

  • 🔁 Production-ready reliability: Async HTTP requests, progress logging (e.g., “Collected N comments so far”), and defensive deduplication ensure dependable runs at scale.

  • 🧰 Developer-friendly: Built as a Python actor on Apify. Access results via the Apify API and integrate it into pipelines for automated comment-collection workflows.
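
The proxy fallback described above can be pictured as a simple escalation loop. The sketch below is not the actor's actual implementation; the `fetch` callable and tier labels are stand-ins to illustrate the retry order (direct → datacenter → residential):

```python
PROXY_TIERS = [None, "datacenter", "residential"]  # escalation order

def fetch_with_fallback(url, fetch, retries_per_tier=2):
    """Try each proxy tier in order; `fetch(url, proxy)` should raise when blocked."""
    last_error = None
    for proxy in PROXY_TIERS:
        for _ in range(retries_per_tier):
            try:
                return fetch(url, proxy)
            except Exception as exc:  # blocked or failed: retry, then escalate
                last_error = exc
    raise RuntimeError(f"All proxy tiers failed for {url}") from last_error

# Simulated fetcher: direct and datacenter are "blocked", residential succeeds.
def fake_fetch(url, proxy):
    if proxy != "residential":
        raise ConnectionError("blocked")
    return {"proxy_used": proxy}

print(fetch_with_fallback("https://www.reddit.com/r/example.json", fake_fetch))
# {'proxy_used': 'residential'}
```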

How to use Reddit Comment Scraper - step by step

  1. Create or log in to your Apify account.
  2. Open the “reddit-comment-scraper” actor in the Apify Console.
  3. Add your Reddit post URLs in startUrls (e.g., https://www.reddit.com/r/subreddit/comments/post_id/title/). The input accepts a string list; each item can be a full post URL.
  4. Configure limits:
    • Set maxComments (1–10,000) to control how many comments to collect per URL.
    • Set replyLimit (0–100) to control how many nested replies are stored per comment (0 means unlimited).
  5. (Optional) Configure proxyConfiguration. By default, no proxy is used. If Reddit rejects requests, the actor will automatically fall back to datacenter and then residential proxies with retries.
  6. Click Run. Watch progress logs as comments are collected and expanded via Reddit’s morechildren endpoint.
  7. Download results. Go to the Dataset tab to export JSON or CSV. A grouped-by-URL JSON is also saved under the Key-Value Store as OUTPUT.

Pro tip: Use the Apify API to pull dataset items programmatically and feed them into your analytics or enrichment workflow.
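
The pro tip above can be sketched with plain HTTP against Apify's dataset-items endpoint. The dataset ID and token below are placeholders; the URL builder is a helper of our own, not part of the actor:

```python
import json
import urllib.request

def dataset_items_url(dataset_id, fmt="json", limit=1000):
    """Build the Apify API v2 URL for downloading dataset items."""
    return (f"https://api.apify.com/v2/datasets/{dataset_id}/items"
            f"?format={fmt}&limit={limit}")

def fetch_items(dataset_id, token):
    """Download dataset items; requires a valid Apify API token."""
    req = urllib.request.Request(
        dataset_items_url(dataset_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage with placeholder IDs: fetch_items("YOUR_DATASET_ID", "YOUR_APIFY_TOKEN")
print(dataset_items_url("abc123"))
# https://api.apify.com/v2/datasets/abc123/items?format=json&limit=1000
```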

Use cases

| Use case | Description |
| --- | --- |
| Market research + topic analysis | Aggregate and analyze discussion threads to quantify sentiment and themes across public posts. |
| Brand monitoring + community insights | Track brand mentions and extract replies to understand user feedback within specific threads. |
| Content research + editorial | Compile user perspectives from targeted discussions to inform articles and summaries. |
| Data science + NLP training | Build a structured Reddit comment dataset with parent/child context for modeling and classification. |
| Academic research + social analysis | Study public discourse patterns using thread-level structures and upvote signals. |
| Developer pipelines (API) | Use the Apify API to automate comment-scraping workflows and feed data into ETL/ELT pipelines. |

Why choose Reddit Comment Scraper?

This actor prioritizes precision, automation, and reliability over brittle browser extensions or ad-hoc scripts.

  • ✅ Accurate, structured fields for every comment, including parent/child links
  • 🌐 No login required — collects publicly available thread data
  • 📈 Scales across many URLs with async requests and intelligent deduplication
  • 🧪 Developer-first design — Python-based, API-friendly, automation-ready
  • 🛡️ Resilient proxy fallback (direct → datacenter → residential) to reduce blocks
  • 💾 Easy exports (CSV/JSON) and grouped output for downstream processing
  • 🧭 Better than unstable alternatives — production-ready infrastructure on Apify

In short, it’s a reliable Reddit comments scraper that turns threads into analytics-ready data, fast.

Is it legal to scrape Reddit comments?

Yes, when done responsibly. This actor scrapes only publicly available Reddit content and does not access private or password-protected data.

Guidelines for compliant use:

  • Collect only public data and respect Reddit’s Terms of Service.
  • Avoid scraping private communities or content behind authentication.
  • Ensure your use complies with data protection laws (e.g., GDPR, CCPA).
  • Use the data responsibly (e.g., analysis, research) and avoid spam or misuse.
  • Consult your legal team if you have edge cases or questions.

Input parameters & output format

Example JSON input

{
  "startUrls": [
    "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/"
  ],
  "maxComments": 1000,
  "replyLimit": 0,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}

Input fields

  • startUrls (array, required): List one or more Reddit post URLs (e.g., https://www.reddit.com/r/subreddit/comments/post_id/title/).
    • Default: none
  • maxComments (integer, optional): Maximum number of comments to fetch per URL.
    • Range: 1–10,000
    • Default: 1000
  • replyLimit (integer, optional): Maximum number of replies to store per comment in the nested replies field. Set to 0 for unlimited. (All replies are still collected in the flattened output.)
    • Range: 0–100
    • Default: 0
  • proxyConfiguration (object, optional): Choose which proxies to use. By default, no proxy is used. If Reddit rejects or blocks the request, it will fall back to datacenter proxy, then residential proxy with retries.
    • Default: { "useApifyProxy": false }
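
If you assemble the input programmatically, a light validation step keeps runs within the documented ranges. The clamping logic below follows the parameter limits listed above; the helper name is ours, not part of the actor:

```python
def build_run_input(start_urls, max_comments=1000, reply_limit=0):
    """Assemble an actor input dict, clamping values to documented ranges."""
    if not start_urls:
        raise ValueError("startUrls is required and must be non-empty")
    return {
        "startUrls": list(start_urls),
        "maxComments": max(1, min(int(max_comments), 10_000)),  # range 1-10,000
        "replyLimit": max(0, min(int(reply_limit), 100)),       # range 0-100, 0 = unlimited
        "proxyConfiguration": {"useApifyProxy": False},         # default: no proxy
    }

print(build_run_input(["https://www.reddit.com/r/example/comments/abc/post/"],
                      max_comments=50_000, reply_limit=-5))
# maxComments is clamped to 10000 and replyLimit to 0
```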

Example dataset record (single comment)

{
  "url": "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/",
  "comment_id": "lhk1f7n",
  "post_id": "t3_1epeshq",
  "author": "AutoModerator",
  "permalink": "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/lhk1f7n/",
  "upvotes": 1,
  "content_type": "text",
  "parent_id": null,
  "author_avatar": "",
  "userUrl": "https://www.reddit.com/user/AutoModerator/",
  "contentText": "Comment text here...",
  "created_time": "",
  "replies": []
}

Grouped output saved to Key-Value Store (key: OUTPUT)

{
  "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/": [
    {
      "comment_id": "lhk1f7n",
      "post_id": "t3_1epeshq",
      "author": "AutoModerator",
      "permalink": "https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/lhk1f7n/",
      "upvotes": 1,
      "content_type": "text",
      "parent_id": "1epeshq",
      "author_avatar": "",
      "userUrl": "https://www.reddit.com/user/AutoModerator/",
      "contentText": "Comment text here...",
      "created_time": "",
      "replies": []
    }
  ]
}

Notes:

  • created_time and author_avatar may be empty strings when not present in Reddit’s JSON.
  • parent_id is null for top-level comments and contains the normalized parent ID for replies (prefixes like t1_/t3_ are stripped).
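
The normalization in the second note amounts to stripping Reddit's "thing" prefixes (t1_ for comments, t3_ for posts). A small illustrative helper, not the actor's code:

```python
def normalize_parent_id(parent_id):
    """Strip Reddit 'thing' prefixes (t1_, t3_, ...) from a parent ID."""
    if parent_id is None:
        return None
    prefix, sep, rest = parent_id.partition("_")
    # IDs look like "t1_lhk1f7n"; keep only the bare ID after the prefix.
    return rest if sep and prefix.startswith("t") and prefix[1:].isdigit() else parent_id

print(normalize_parent_id("t1_lhk1f7n"))  # lhk1f7n
print(normalize_parent_id("t3_1epeshq"))  # 1epeshq
print(normalize_parent_id(None))          # None
```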

FAQ

Is there a free trial?

Yes. The actor offers trial minutes on Apify so you can test before subscribing. You’ll see current trial availability and pricing on the actor’s Apify Store page.

Do I need to log in or provide cookies?

No. The scraper works with public Reddit JSON endpoints and does not require authentication. It fetches publicly available comments only.

How many comments can I scrape per URL?

You can set maxComments from 1 to 10,000 per URL. The actor will collect up to this limit, expanding “more” placeholders via the Reddit API.

Can it scrape nested replies?

Yes. Nested replies are traversed and included. The replyLimit parameter controls how many replies are stored in the replies array per comment (0 means unlimited). All discovered replies are still emitted as individual flat records.

What happens if Reddit blocks the requests?

The actor automatically falls back from a direct connection to a datacenter proxy and then to a residential proxy with retries. This increases resilience during scraping.

Can I export results to CSV?

Yes. All comments are stored in the Apify dataset, which supports exports to JSON, CSV, and more. You can also access records via the Apify API to build pipelines.
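
If you prefer converting records yourself rather than using the dataset UI, the flat JSON records map directly onto CSV rows. A minimal sketch with Python's csv module; the field list is a subset of the schema above, and nested replies are summarized as a count:

```python
import csv
import io

FIELDS = ["comment_id", "post_id", "author", "upvotes", "permalink", "reply_count"]

def records_to_csv(records):
    """Serialize flat comment records to a CSV string."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for rec in records:
        row = {k: rec.get(k, "") for k in FIELDS[:-1]}
        row["reply_count"] = len(rec.get("replies", []))  # summarize nesting
        writer.writerow(row)
    return buf.getvalue()

sample = [{"comment_id": "lhk1f7n", "post_id": "t3_1epeshq", "author": "AutoModerator",
           "upvotes": 1, "permalink": "https://www.reddit.com/...", "replies": []}]
print(records_to_csv(sample))
```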

Does it work for private subreddits or deleted comments?

No. It collects only publicly accessible content. Deleted or removed comments will appear as “[deleted]” where applicable.

Can I integrate this with Python or APIs?

Yes. This is a Python-based actor on Apify. You can pull dataset items via the Apify API or integrate into your own python reddit comment scraper workflows and automation stacks.

Closing CTA / Final thoughts

The Reddit Comment Scraper is built for teams that need accurate, scalable extraction of Reddit thread comments. It delivers structured records with authors, scores, permalinks, and nested replies — ideal for market research, analytics, and NLP.

Marketers, developers, data analysts, and researchers can export Reddit comments to CSV/JSON, automate runs via the Apify API, and build a reliable comments crawler into their pipelines. Start turning public Reddit discussions into actionable datasets quickly, safely, and at scale.