Reddit Comment Scraper

Scrape comments from Reddit posts. Provides comment text, parent relationships within the thread, scores, and timestamps.

Pricing: $2.50 / 1,000 results
Rating: 5.0 (5)
Developer: Crawler Bros (Maintained by Community)

An Apify Actor for scraping comments from Reddit posts using browser automation with Playwright.

Features

  • 💬 Scrape comments from multiple Reddit posts
  • 📊 Extract comprehensive comment data (text, author, score, timestamps, etc.)
  • 🔄 Automatically expand collapsed threads and "load more" sections
  • 🌳 Capture nested comment structure with depth levels
  • 📦 No authentication required for public posts
  • 💾 Data saved in structured JSON format
  • 🌐 Browser automation bypasses API restrictions

Input Parameters

The actor accepts the following input parameters:

Parameter       Type      Required   Default   Description
postUrls        array     Yes        -         List of Reddit post URLs to scrape comments from
maxComments     integer   No         100       Maximum number of comments to scrape from each post (1-10000)
expandThreads   boolean   No         true      Automatically expand collapsed threads and "load more" sections

Example Input

{
  "postUrls": [
    "https://www.reddit.com/r/programming/comments/1abc123/interesting_discussion/",
    "https://old.reddit.com/r/python/comments/1def456/another_post/"
  ],
  "maxComments": 200,
  "expandThreads": true
}

Output Fields

The actor extracts the following data for each comment:

Comment Information

  • comment_id - Unique comment ID (e.g., "abc123xyz")
  • comment_name - Full comment name in Reddit format (e.g., "t1_abc123xyz")
  • author - Username of the comment author (or "[deleted]")
  • text - Full comment text/content

Engagement Metrics

  • score - Comment score/karma (upvotes minus downvotes)
  • awards_count - Number of awards/gildings the comment received
  • permalink - Direct link to the comment
  • post_url - URL of the parent post

Metadata

  • depth - Nesting level/depth in the comment thread (0 = top-level)
  • parent_comment_id - ID of the parent comment (null for top-level comments)
  • is_op - Boolean indicating if the author is the Original Poster
  • is_edited - Boolean indicating if the comment was edited
  • is_stickied - Boolean indicating if the comment is stickied/pinned

Timestamps

  • created_utc - Unix timestamp when the comment was created
  • created_at - ISO 8601 formatted datetime (e.g., "2025-10-14T12:30:45")
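
created_at is simply created_utc rendered as an ISO 8601 string in UTC. A minimal conversion sketch (the actor's own formatting and timezone handling may differ):

from datetime import datetime, timezone

def to_iso(created_utc: int) -> str:
    # Render the Unix timestamp as an ISO 8601 string in UTC (no offset suffix).
    return datetime.fromtimestamp(created_utc, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")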

Example Output

{
  "comment_id": "abc123xyz",
  "comment_name": "t1_abc123xyz",
  "author": "example_user",
  "text": "This is a great discussion! I totally agree with your points about...",
  "score": 42,
  "awards_count": 2,
  "permalink": "https://old.reddit.com/r/programming/comments/1abc123/_/abc123xyz/",
  "post_url": "https://old.reddit.com/r/programming/comments/1abc123/interesting_discussion/",
  "depth": 0,
  "parent_comment_id": null,
  "is_op": false,
  "is_edited": true,
  "is_stickied": false,
  "created_utc": 1760445045,
  "created_at": "2025-10-14T12:30:45"
}

Usage

Local Development

  1. Install dependencies:

    pip install -r requirements.txt
    playwright install chromium
  2. Set up input in storage/key_value_stores/default/INPUT.json:

    {
      "postUrls": ["https://www.reddit.com/r/programming/comments/1example/"],
      "maxComments": 100,
      "expandThreads": true
    }
  3. Run the actor:

    python -m src
  4. Check results in storage/datasets/default/
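
After a local run, each dataset item is written as its own JSON file under storage/datasets/default/ (this assumes the Apify local storage layout; adjust the path if your setup differs). A quick way to load the results for inspection:

import json
from pathlib import Path

# Collect every scraped comment from the local dataset directory.
dataset_dir = Path("storage/datasets/default")
comments = [json.loads(p.read_text(encoding="utf-8")) for p in sorted(dataset_dir.glob("*.json"))]
print(f"Loaded {len(comments)} comments")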

On Apify Platform

  1. Push to Apify:

    • Log in to the Apify CLI: apify login
    • Initialize: apify init (if not already done)
    • Push to Apify: apify push
  2. Or manually upload:

    • Create a new actor on Apify platform
    • Upload all files including Dockerfile, requirements.txt, and .actor/ directory
  3. Configure and run:

    • Set input parameters in the Apify console
    • Paste Reddit post URLs
    • Click "Start" to run the actor
    • Download results from the dataset tab
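
Once deployed, the actor can also be started programmatically with the Apify API client for Python. A hedged sketch; the actor ID below is a placeholder, so substitute the real ID shown in your Apify console along with your API token:

from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

run_input = {
    "postUrls": ["https://www.reddit.com/r/programming/comments/1abc123/interesting_discussion/"],
    "maxComments": 200,
    "expandThreads": True,
}

# "username/reddit-comment-scraper" is a placeholder actor ID.
run = client.actor("username/reddit-comment-scraper").call(run_input=run_input)

# Iterate over the scraped comments in the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["author"], item["score"], item["text"][:80])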

Technical Details

Browser Automation

  • Uses Playwright with Chromium browser
  • Scrapes old.reddit.com for better compatibility and simpler HTML structure
  • Implements anti-detection measures:
    • Custom User-Agent headers
    • Disabled automation flags
    • Browser fingerprint masking
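
A minimal sketch of the kind of launch configuration described above, using Playwright's async API; the exact flags, user agent string, and timeouts used by the actor may differ:

from playwright.async_api import async_playwright

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

async def fetch_post_html(url: str) -> str:
    async with async_playwright() as p:
        # Headless Chromium with the automation-controlled flag disabled.
        browser = await p.chromium.launch(
            headless=True,
            args=["--disable-blink-features=AutomationControlled"],
        )
        context = await browser.new_context(user_agent=USER_AGENT)
        page = await context.new_page()
        # Mask navigator.webdriver, a common automation fingerprint.
        await page.add_init_script(
            "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
        )
        await page.goto(url, wait_until="domcontentloaded", timeout=60_000)
        return await page.content()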

Features

  • Automatic thread expansion: Clicks "load more" and "continue this thread" buttons
  • Smart extraction: Handles nested comments and preserves thread structure
  • Depth tracking: Captures comment nesting levels
  • Parent-child relationships: Links comments to their parents
  • Error handling: Gracefully handles deleted comments and missing data
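
The depth and parent-child tracking listed above can be reconstructed while walking the nested comment markup with BeautifulSoup. In the sketch below, the selectors (div.thing.comment, data-fullname, div.child, div.usertext-body) approximate old.reddit's markup and should be treated as assumptions rather than a stable contract:

from bs4 import BeautifulSoup

def walk_comments(html: str):
    """Yield partial comment records (id, author, text, depth, parent), preserving nesting."""
    soup = BeautifulSoup(html, "html.parser")

    def walk(container, depth, parent_id):
        # Direct child comments of this container only (":scope >" avoids re-matching nested replies).
        for thing in container.select(":scope > div.thing.comment"):
            fullname = thing.get("data-fullname") or ""       # e.g. "t1_abc123xyz"
            comment_id = fullname.removeprefix("t1_")
            author = thing.get("data-author") or "[deleted]"
            body = thing.select_one(":scope > div.entry div.md")
            text = body.get_text(" ", strip=True) if body else ""
            yield {
                "comment_id": comment_id,
                "comment_name": fullname,
                "author": author,
                "text": text,
                "depth": depth,
                "parent_comment_id": parent_id,
            }
            # Replies live under div.child > div.sitetable; recurse one level deeper.
            for child_table in thing.select(":scope > div.child > div.sitetable"):
                yield from walk(child_table, depth + 1, comment_id)

    top = soup.select_one("div.commentarea > div.sitetable")
    if top:
        yield from walk(top, 0, None)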

Comment Expansion

The scraper automatically:

  1. Clicks "load more comments" buttons (up to 10 per attempt)
  2. Clicks "continue this thread" links (up to 5 per attempt)
  3. Makes up to 3 expansion attempts to maximize comment coverage
  4. Waits for new comments to load after each expansion
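
A sketch of what this expansion loop can look like with Playwright. The span.morecomments a selector is an assumption about old.reddit's "load more comments" links, and "continue this thread" handling is omitted here because those links navigate to a separate subthread page:

async def expand_comments(page, attempts: int = 3, clicks_per_attempt: int = 10) -> None:
    # Repeatedly click "load more comments" links so hidden replies get rendered in place.
    for _ in range(attempts):
        links = await page.query_selector_all("span.morecomments a")
        if not links:
            break
        for link in links[:clicks_per_attempt]:
            try:
                await link.click()
                # Give Reddit a moment to fetch and insert the new comments.
                await page.wait_for_timeout(1500)
            except Exception:
                # The link may have gone stale after the DOM changed; skip it.
                continue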

Performance

  • Headless browser mode for efficiency
  • Optimized page load strategy (domcontentloaded)
  • Configurable wait times and timeouts
  • Multiple posts processed sequentially, with delays between them

Limitations

  • Only works with public Reddit posts
  • Cannot scrape private or restricted posts
  • Browser automation is slower than direct API calls but more reliable
  • Hidden scores show as 0 (when "[score hidden]" is displayed)
  • Maximum 10,000 comments per post (configurable)

Dependencies

  • apify>=2.1.0 - Apify SDK for Python
  • playwright~=1.40.0 - Browser automation framework
  • beautifulsoup4~=4.12.0 - HTML parsing library
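
In an Apify Python actor, these dependencies typically meet in the entry point roughly as follows. This is a hedged sketch rather than the actual source; scrape_post_comments is a hypothetical stand-in for the Playwright and BeautifulSoup logic sketched earlier:

from apify import Actor

async def scrape_post_comments(url: str, max_comments: int, expand: bool) -> list[dict]:
    # Hypothetical placeholder for the browser-automation and parsing steps.
    return []

async def main() -> None:
    async with Actor:
        actor_input = await Actor.get_input() or {}
        for url in actor_input.get("postUrls", []):
            comments = await scrape_post_comments(
                url,
                actor_input.get("maxComments", 100),
                actor_input.get("expandThreads", True),
            )
            # Each pushed item becomes one row in the default dataset.
            await Actor.push_data(comments)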

Troubleshooting

Timeout Issues

If you encounter timeout errors:

  • Check if the post URL is valid and accessible
  • Increase timeout values in the code if needed
  • Verify the post has comments

Missing Comments

If some comments are missing:

  • Enable expandThreads to load collapsed comments
  • Increase maxComments limit
  • Some comments may be deleted or removed by moderators

"[deleted]" Authors

  • Comments from deleted accounts show "[deleted]" as author
  • This is normal Reddit behavior
  • The comment text may still be available or show as "[removed]"

Use Cases

  • Sentiment Analysis: Analyze community opinions on topics
  • Market Research: Gather user feedback and discussions
  • Content Moderation: Monitor discussions for moderation
  • Academic Research: Study online community interactions
  • Data Analysis: Build datasets for machine learning

License

This actor is provided as-is for scraping public Reddit data in accordance with Reddit's terms of service.

Notes

  • This scraper uses browser automation to access Reddit's public web interface
  • Always respect Reddit's robots.txt and terms of service
  • Use responsibly and avoid overwhelming Reddit's servers
  • Consider implementing additional rate limiting for large-scale scraping
  • The actor works best with the Apify platform's infrastructure
  • Posts with thousands of comments may take longer to scrape