Reddit Comments Scraper

Pricing

Pay per usage

Extract detailed comments and discussion threads from public Reddit posts. Suited to sentiment analysis, market research, and community monitoring. Get structured data from a post URL. Residential proxies are recommended for stability when scraping at high volume.

Rating: 5.0 (2)

Developer: Shahid Irfan (maintained by Community)

Actor stats

  • Bookmarked: 1
  • Total users: 30
  • Monthly active users: 4
  • Last modified: a day ago

Extract Reddit post comments into a clean dataset with author, scoring, threading, flair, moderation, and post context fields. It is designed for research, monitoring, moderation analysis, and discussion intelligence on public Reddit threads.

Features

  • Thread-wide comment capture — Collect top-level comments and nested replies from public Reddit posts
  • Rich comment context — Save author, score, timestamps, thread depth, flair, moderation flags, and post metadata
  • Duplicate-safe output — Merge repeated comment records so each comment ID appears once in the final dataset
  • Cleaned missing values — Handle deleted authors, removed bodies, and sparse Reddit fields gracefully
  • Configurable collection size — Stop after the number of comments defined by results_wanted
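
The duplicate-safe output described in this list can be pictured as a merge keyed by comment ID. The sketch below is an illustration only, not the actor's actual implementation: `merge_by_id` is a hypothetical helper, and the rule that a later value fills in a field that was missing or empty in an earlier copy is an assumption.

```python
from typing import Any, Iterable


def merge_by_id(records: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Collapse records that share an `id` into one item, in first-seen order."""
    merged: dict[str, dict[str, Any]] = {}
    for record in records:
        comment_id = record.get("id")
        if comment_id is None:
            continue  # assumption: a record without an ID cannot be keyed, so skip it
        target = merged.setdefault(comment_id, {})
        for key, value in record.items():
            # Assumed merge rule: keep the first non-empty value seen for a field;
            # only fill fields that are absent, None, or an empty string so far.
            if key not in target or target[key] in (None, ""):
                target[key] = value
    return list(merged.values())
```

Calling `merge_by_id` on a list in which the same `id` appears twice returns a single item for that ID, with fields taken from whichever copy supplied them first.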

Use Cases

Community Research

Study how people respond to prompts, news, or product discussions. Build datasets for qualitative analysis or trend tracking.

Moderation Analysis

Review stickied comments, locked discussions, controversial replies, and other moderation-related signals in one place.
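
As a rough sketch of how these signals might be pulled out of exported dataset items, the helper below lists which moderation-related flags are set on one item. The function name `moderation_signals` is hypothetical, and treating any `controversiality` value above zero as "controversial" is an assumption; the field names come from the output schema.

```python
from typing import Any


def moderation_signals(item: dict[str, Any]) -> list[str]:
    """Return the names of moderation-related flags set on one dataset item."""
    signals: list[str] = []
    if item.get("stickied"):
        signals.append("stickied")
    if item.get("locked"):
        signals.append("locked")
    if item.get("distinguished"):  # non-empty when a distinction is present
        signals.append("distinguished")
    # Assumption: any positive controversiality value counts as controversial.
    if (item.get("controversiality") or 0) > 0:
        signals.append("controversial")
    return signals
```

Filtering a dataset with `[i for i in items if moderation_signals(i)]` would then keep only the comments that carry at least one of these flags.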

NLP and Sentiment Work

Collect structured discussion text with reply depth, timestamps, and score data for downstream language analysis.

Competitive Monitoring

Track how Reddit communities talk about brands, topics, launches, or public events over time.


Input Parameters

Parameter | Type | Required | Default | Description
startUrl | String | Yes | - | Reddit thread URL to collect comments from
results_wanted | Integer | No | 20 | Maximum number of unique comments to save
proxyConfiguration | Object | No | Apify Residential Proxy | Proxy settings for more reliable collection
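
If the input is assembled in code rather than typed by hand, a small helper along these lines may be useful. It is a sketch only: `build_run_input` is a hypothetical name, the check that `results_wanted` must be at least 1 is an assumption, and the proxy object simply mirrors the shape shown in the proxy usage example in this document.

```python
from typing import Any


def build_run_input(
    start_url: str,
    results_wanted: int = 20,
    use_residential_proxy: bool = False,
) -> dict[str, Any]:
    """Assemble an input object using the parameter names from the table above."""
    if not start_url:
        raise ValueError("startUrl is required")
    if results_wanted < 1:
        # Assumed constraint: the documentation does not state a minimum.
        raise ValueError("results_wanted must be a positive integer")
    run_input: dict[str, Any] = {
        "startUrl": start_url,
        "results_wanted": results_wanted,
    }
    if use_residential_proxy:
        run_input["proxyConfiguration"] = {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        }
    return run_input
```

Passing the result through `json.dumps` yields an object in the same form as the usage examples.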

Output Data

Each dataset item contains comment content plus thread and author context.

Field | Type | Description
id | String | Reddit comment ID
comment_fullname | String | Full Reddit thing name for the comment
author | String | Comment author or [deleted] when unavailable
author_fullname | String | Reddit fullname for the author when available
body | String | Comment text or [removed] when unavailable
body_html | String | Comment body in Reddit HTML format when available
score | Number | Comment score
ups | Number | Upvote count reported by Reddit
downs | Number | Downvote count reported by Reddit
created_utc | Number | Unix timestamp from Reddit
created_at | String | ISO timestamp derived from created_utc
edited | Boolean or Number | Edit flag or edit timestamp
depth | Number | Reply depth within the thread
parent_id | String | Parent thing ID
parent_comment_id | String | Parent comment ID when the parent is another comment
parent_type | String | Parent Reddit thing type such as t1 or t3
link_id | String | Fullname of the parent post
permalink | String | Absolute Reddit URL for the comment
subreddit | String | Subreddit name
subreddit_id | String | Subreddit ID
subreddit_name_prefixed | String | Prefixed subreddit name such as r/AskReddit
subreddit_type | String | Subreddit visibility type
post_id | String | Parent post ID
post_title | String | Parent post title
post_author | String | Parent post author
post_permalink | String | Parent post URL
is_submitter | Boolean | Whether the comment author created the post
distinguished | String | Moderator or admin distinction when present
stickied | Boolean | Whether the comment is stickied
locked | Boolean | Whether the comment is locked
archived | Boolean | Whether the comment is archived
collapsed | Boolean | Whether the comment is collapsed
controversiality | Number | Reddit controversiality score
score_hidden | Boolean | Whether Reddit hides the score
gilded | Number | Legacy gild count
total_awards_received | Number | Total awards on the comment
all_awardings_count | Number | Number of award entries present
author_premium | Boolean | Reddit premium flag for the author
author_is_blocked | Boolean | Whether the author is blocked
author_flair_text | String | Author flair text
author_flair_type | String | Flair type
author_flair_text_color | String | Flair text color
author_flair_background_color | String | Flair background color
comment_type | String | Comment classification when provided
treatment_tags | Array | Reddit treatment tags when present
retrieval_source | String | Whether the record came from the main listing or deferred expansion
source_url | String | Final verified thread URL used for collection

Usage Examples

Basic Thread Extraction

{
    "startUrl": "https://www.reddit.com/r/AskReddit/comments/1pqgcx9/whats_the_most_unexpected_way_someone_you_know/",
    "results_wanted": 20
}

Larger Collection

{
    "startUrl": "https://www.reddit.com/r/webscraping/comments/1qs66k0/couldnt_find_proxy_directory_with_filters_so/",
    "results_wanted": 100
}

With Proxy Configuration

{
    "startUrl": "https://www.reddit.com/r/AskReddit/comments/1pqgcx9/whats_the_most_unexpected_way_someone_you_know/",
    "results_wanted": 50,
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": ["RESIDENTIAL"]
    }
}

Sample Output

{
    "id": "nuuw9it",
    "comment_fullname": "t1_nuuw9it",
    "author": "Agua_Frecuentemente",
    "author_fullname": "t2_mobpn32j",
    "body": "The guy who invented Smartfood (popcorn) lives in my town and we have some mutual friends.",
    "score": 2851,
    "ups": 2851,
    "downs": 0,
    "created_utc": 1766150326,
    "created_at": "2025-12-19T18:25:26.000Z",
    "depth": 0,
    "parent_id": "t3_1pqgcx9",
    "parent_type": "t3",
    "link_id": "t3_1pqgcx9",
    "permalink": "https://www.reddit.com/r/AskReddit/comments/1pqgcx9/whats_the_most_unexpected_way_someone_you_know/nuuw9it/",
    "subreddit": "AskReddit",
    "subreddit_id": "t5_2qh1i",
    "subreddit_name_prefixed": "r/AskReddit",
    "post_id": "1pqgcx9",
    "post_title": "What’s the most unexpected way someone you know became wealthy?",
    "post_author": "xFaith",
    "post_permalink": "https://www.reddit.com/r/AskReddit/comments/1pqgcx9/whats_the_most_unexpected_way_someone_you_know/",
    "is_submitter": false,
    "stickied": false,
    "locked": false,
    "archived": false,
    "collapsed": false,
    "controversiality": 0,
    "score_hidden": false,
    "gilded": 0,
    "total_awards_received": 0,
    "author_premium": false,
    "author_is_blocked": false,
    "author_flair_type": "text",
    "retrieval_source": "listing",
    "source_url": "https://www.reddit.com/r/AskReddit/comments/1pqgcx9/whats_the_most_unexpected_way_someone_you_know/?solution=..."
}

Tips for Best Results

Use Working Reddit Thread URLs

  • Use direct thread URLs rather than subreddit feeds or search pages
  • Prefer public threads with active discussions
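
One way to catch subreddit feeds or search pages before starting a run is a quick local check of the URL shape. The sketch below assumes that a thread URL has a path of the form `/r/<subreddit>/comments/<post_id>/...`, as in the examples in this document; `extract_post_id` is a hypothetical helper and is not part of the actor.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumed thread path shape: /r/<subreddit>/comments/<post_id>/...
_THREAD_PATH = re.compile(r"^/r/[^/]+/comments/([a-z0-9]+)(?:/|$)", re.IGNORECASE)


def extract_post_id(url: str) -> Optional[str]:
    """Return the post ID if `url` looks like a direct Reddit thread URL, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Accept reddit.com and its subdomains (www, old, ...); reject other hosts.
    if host != "reddit.com" and not host.endswith(".reddit.com"):
        return None
    match = _THREAD_PATH.match(parsed.path)
    return match.group(1) if match else None
```

A `None` result suggests the URL points at a feed, a search page, or another site. The check only inspects the URL text, so it cannot tell whether the thread is public or still exists.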

Start Small

  • Begin with results_wanted: 20 for quick validation
  • Increase the limit after confirming the thread loads correctly

Use Proxies for Reliability

  • Residential proxies can help maintain stable runs
  • If a thread is region-sensitive or intermittently blocked, rerun with proxy support enabled

Integrations

Connect your dataset with:

  • Google Sheets — Review discussion metrics in spreadsheets
  • Airtable — Build searchable discussion databases
  • Slack — Send comment updates into team workflows
  • Make — Automate processing pipelines
  • Zapier — Trigger downstream business actions

Export Formats

  • JSON — For application workflows and data pipelines
  • CSV — For spreadsheets and quick reviews
  • Excel — For business reporting
  • XML — For system integrations

Frequently Asked Questions

How many comments can I collect?

You can collect up to the number defined by results_wanted, as long as the thread contains that many unique comments.

Are nested replies included?

Yes. Replies are collected along with top-level comments, and their position in the thread is preserved through depth and parent fields.
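
To illustrate how the parent fields allow a thread to be rebuilt from the flat dataset, here is a minimal sketch. It assumes, based on the sample output, that `parent_id` holds a fullname such as `t3_<post_id>` or `t1_<comment_id>` and that a comment's own fullname is `t1_` followed by its `id`. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Any


def build_thread_index(items: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Map each parent fullname (t3_... or t1_...) to its child comment IDs."""
    children: dict[str, list[str]] = defaultdict(list)
    for item in items:
        children[item["parent_id"]].append(item["id"])
    return dict(children)


def render_thread(items: list[dict[str, Any]], post_fullname: str) -> list[str]:
    """Return one indented line per comment, walking replies depth-first."""
    by_id = {item["id"]: item for item in items}
    children = build_thread_index(items)
    lines: list[str] = []

    def walk(parent_fullname: str, level: int) -> None:
        for comment_id in children.get(parent_fullname, []):
            comment = by_id[comment_id]
            lines.append(f"{'  ' * level}{comment['author']}: {comment['body']}")
            # Assumed fullname convention: a comment's fullname is "t1_" + its id.
            walk(f"t1_{comment_id}", level + 1)

    walk(post_fullname, 0)
    return lines
```

Calling `render_thread(items, items[0]["link_id"])` walks the thread from the post downward. Replies whose parent comment was not collected, for example because `results_wanted` cut the run short, are not reachable from the post and are left out by this walk.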

Are duplicate comments removed?

Yes. Records are keyed by comment ID, and duplicate appearances are merged into one final dataset item.

What happens when a field is missing?

Sparse Reddit fields are handled gracefully. Deleted authors and removed bodies are normalized so the dataset stays usable.
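
For text analysis it is often useful to set the placeholder values aside. The sketch below relies on the `[deleted]` and `[removed]` markers named in the output field descriptions; the helper names are hypothetical, and treating `None` or an empty string as missing is an assumption about how sparse fields may appear.

```python
from typing import Any, Iterable

# Placeholder strings taken from the output field descriptions.
DELETED_AUTHOR = "[deleted]"
REMOVED_BODY = "[removed]"


def usable_bodies(items: Iterable[dict[str, Any]]) -> list[str]:
    """Return comment bodies that contain real text, skipping placeholders."""
    return [
        item["body"]
        for item in items
        if item.get("body") not in (None, "", REMOVED_BODY)
    ]


def known_authors(items: Iterable[dict[str, Any]]) -> set[str]:
    """Return the distinct authors that are not the deleted-author placeholder."""
    return {
        item["author"]
        for item in items
        if item.get("author") not in (None, "", DELETED_AUTHOR)
    }
```

Both helpers leave the dataset itself untouched, so the placeholder rows stay available for moderation analysis.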

Does this work on private communities?

No. The actor is intended for public Reddit threads only.


Use this actor only for legitimate data collection and analysis. You are responsible for complying with Reddit's terms, its rate limits, and the laws that apply in your jurisdiction.