
Reddit Comment Scraper
Pricing
$2.50 / 1,000 results
Scrape Reddit Comments.
Last modified
10 hours ago
An Apify Actor for scraping comments from Reddit posts using browser automation with Playwright.
Features
- Scrape comments from multiple Reddit posts
- Extract comprehensive comment data (text, author, score, timestamps, etc.)
- Automatically expand collapsed threads and "load more" sections
- Capture nested comment structure with depth levels
- No authentication required for public posts
- Data saved in structured JSON format
- Browser automation bypasses API restrictions
Input Parameters
The actor accepts the following input parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| postUrls | array | Yes | - | List of Reddit post URLs to scrape comments from |
| maxComments | integer | No | 100 | Maximum number of comments to scrape from each post (1-10000) |
| expandThreads | boolean | No | true | Automatically expand collapsed threads and "load more" sections |
Example Input
{
  "postUrls": [
    "https://www.reddit.com/r/programming/comments/1abc123/interesting_discussion/",
    "https://old.reddit.com/r/python/comments/1def456/another_post/"
  ],
  "maxComments": 200,
  "expandThreads": true
}
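The constraints in the parameter table can be checked before a run starts. The following is a minimal sketch of such a check, not the actor's own validation code; the function name `normalize_input` and the URL pattern are assumptions made for illustration.

```python
import re

# Assumed pattern for a Reddit post URL (www., old., or bare reddit.com).
POST_URL_RE = re.compile(
    r"^https?://(?:www\.|old\.)?reddit\.com/r/[^/]+/comments/[A-Za-z0-9]+(?:/.*)?$"
)


def normalize_input(raw: dict) -> dict:
    """Validate actor input and fill in the documented defaults."""
    post_urls = raw.get("postUrls")
    if not isinstance(post_urls, list) or not post_urls:
        raise ValueError("postUrls is required and must be a non-empty array")
    for url in post_urls:
        if not isinstance(url, str) or not POST_URL_RE.match(url):
            raise ValueError(f"not a Reddit post URL: {url!r}")

    # Documented default: 100, documented range: 1-10000
    max_comments = raw.get("maxComments", 100)
    if isinstance(max_comments, bool) or not isinstance(max_comments, int):
        raise ValueError("maxComments must be an integer")
    if not 1 <= max_comments <= 10000:
        raise ValueError("maxComments must be between 1 and 10000")

    # Documented default: true
    expand_threads = raw.get("expandThreads", True)
    if not isinstance(expand_threads, bool):
        raise ValueError("expandThreads must be a boolean")

    return {
        "postUrls": post_urls,
        "maxComments": max_comments,
        "expandThreads": expand_threads,
    }
```

Rejecting bad input up front avoids spending browser time on a URL that can never resolve to a post.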
Output Fields
The actor extracts the following data for each comment:
Comment Information
- comment_id: Unique comment ID (e.g., "abc123xyz")
- comment_name: Full comment name in Reddit format (e.g., "t1_abc123xyz")
- author: Username of the comment author (or "[deleted]")
- text: Full comment text/content
Engagement Metrics
- score: Comment score/karma (upvotes minus downvotes)
- awards_count: Number of awards/gildings the comment received
Links
- permalink: Direct link to the comment
- post_url: URL of the parent post
Metadata
- depth: Nesting level/depth in the comment thread (0 = top-level)
- parent_comment_id: ID of the parent comment (null for top-level comments)
- is_op: Boolean indicating if the author is the Original Poster
- is_edited: Boolean indicating if the comment was edited
- is_stickied: Boolean indicating if the comment is stickied/pinned
Timestamps
- created_utc: Unix timestamp when the comment was created
- created_at: ISO 8601 formatted datetime (e.g., "2025-10-14T12:30:45")
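The two timestamp fields can be derived from one another. A minimal sketch of the conversion follows, assuming that created_at is the UTC rendering of created_utc with the offset suffix dropped; the README does not state the timezone, and `utc_to_iso` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def utc_to_iso(created_utc: int) -> str:
    """Format a Unix timestamp as an ISO 8601 string without an offset suffix."""
    # Interpret the timestamp as UTC (assumption), then drop tzinfo so the
    # output has the same shape as the created_at example.
    dt = datetime.fromtimestamp(created_utc, tz=timezone.utc)
    return dt.replace(tzinfo=None).isoformat()


print(utc_to_iso(1760445045))  # 2025-10-14T12:30:45
```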
Example Output
{
  "comment_id": "abc123xyz",
  "comment_name": "t1_abc123xyz",
  "author": "example_user",
  "text": "This is a great discussion! I totally agree with your points about...",
  "score": 42,
  "awards_count": 2,
  "permalink": "https://old.reddit.com/r/programming/comments/1abc123/_/abc123xyz/",
  "post_url": "https://old.reddit.com/r/programming/comments/1abc123/interesting_discussion/",
  "depth": 0,
  "parent_comment_id": null,
  "is_op": false,
  "is_edited": true,
  "is_stickied": false,
  "created_utc": 1760445045,
  "created_at": "2025-10-14T12:30:45"
}
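Because each record carries comment_id and parent_comment_id, the flat output can be turned back into a nested thread. This is a sketch of one way to do that on the consumer side; `build_thread` and the `replies` key are hypothetical names and are not part of the actor's output.

```python
from collections import defaultdict


def build_thread(comments: list[dict]) -> list[dict]:
    """Nest flat comment records under their parents via parent_comment_id."""
    children = defaultdict(list)
    for c in comments:
        children[c.get("parent_comment_id")].append(c)

    known_ids = {c["comment_id"] for c in comments}

    def attach(comment: dict) -> dict:
        node = dict(comment)  # copy so the input records are left untouched
        node["replies"] = [attach(ch) for ch in children[comment["comment_id"]]]
        return node

    # Roots: top-level comments, plus replies whose parent was not scraped
    # (for example, when maxComments cut the thread short).
    roots = [
        c for c in comments
        if c.get("parent_comment_id") is None
        or c["parent_comment_id"] not in known_ids
    ]
    return [attach(c) for c in roots]
```

Treating orphaned replies as roots keeps every scraped comment in the result even when part of a thread is missing.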
Usage
Local Development
- Install dependencies:
  pip install -r requirements.txt
  playwright install chromium
- Set up input in storage/key_value_stores/default/INPUT.json:
  {"postUrls": ["https://www.reddit.com/r/programming/comments/1example/"], "maxComments": 100, "expandThreads": true}
- Run the actor:
  python -m src
- Check results in storage/datasets/default/
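After a local run, the results can be loaded back into Python for inspection. The sketch below assumes the local dataset directory holds one JSON file per scraped comment, possibly alongside bookkeeping files whose names start with a double underscore; check the directory contents before relying on this layout. `load_local_dataset` is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_local_dataset(dataset_dir: str = "storage/datasets/default") -> list[dict]:
    """Read every JSON item file in the local dataset directory."""
    items = []
    for path in sorted(Path(dataset_dir).glob("*.json")):
        # Skip bookkeeping files (assumed to start with "__"), if present
        if path.name.startswith("__"):
            continue
        with path.open(encoding="utf-8") as f:
            items.append(json.load(f))
    return items


if __name__ == "__main__":
    comments = load_local_dataset()
    top_level = [c for c in comments if c.get("depth") == 0]
    print(f"{len(comments)} comments, {len(top_level)} top-level")
```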
On Apify Platform
- Push to Apify:
  - Log in to the Apify CLI: apify login
  - Initialize (if not already done): apify init
  - Push to Apify: apify push
- Or upload manually:
  - Create a new actor on the Apify platform
  - Upload all files, including Dockerfile, requirements.txt, and the .actor/ directory
- Configure and run:
  - Set input parameters in the Apify console
  - Paste Reddit post URLs
  - Click "Start" to run the actor
  - Download results from the dataset tab
Technical Details
Browser Automation
- Uses Playwright with the Chromium browser
- Scrapes old.reddit.com for better compatibility and simpler HTML structure
- Implements anti-detection measures:
  - Custom User-Agent headers
  - Disabled automation flags
  - Browser fingerprint masking
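The points above can be sketched with Playwright's async API. This is an illustration under stated assumptions, not the actor's source: the User-Agent string, the Chromium flag, the 60-second timeout, and the names `to_old_reddit` and `fetch_post_html` are all chosen for the example, and fingerprint masking is not shown.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed User-Agent string, for illustration only
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def to_old_reddit(url: str) -> str:
    """Rewrite a reddit.com post URL to its old.reddit.com equivalent."""
    parts = urlsplit(url)
    if parts.netloc.lower() in ("reddit.com", "www.reddit.com", "new.reddit.com"):
        parts = parts._replace(netloc="old.reddit.com")
    return urlunsplit(parts)


async def fetch_post_html(url: str) -> str:
    """Load a post in headless Chromium and return the page HTML."""
    # Imported lazily so the URL helper works without Playwright installed
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            # One common way to disable an automation flag (assumption)
            args=["--disable-blink-features=AutomationControlled"],
        )
        context = await browser.new_context(user_agent=USER_AGENT)
        page = await context.new_page()
        await page.goto(
            to_old_reddit(url), wait_until="domcontentloaded", timeout=60_000
        )
        html = await page.content()
        await browser.close()
        return html
```

Calling `asyncio.run(fetch_post_html(url))` would return the HTML of the old.reddit.com version of the post, which can then be parsed with an HTML library.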
Features
- Automatic thread expansion: Clicks "load more" and "continue this thread" buttons
- Smart extraction: Handles nested comments and preserves thread structure
- Depth tracking: Captures comment nesting levels
- Parent-child relationships: Links comments to their parents
- Error handling: Gracefully handles deleted comments and missing data
Comment Expansion
The scraper automatically:
- Clicks "load more comments" buttons (up to 10 per attempt)
- Clicks "continue this thread" links (up to 5 per attempt)
- Makes up to 3 expansion attempts to maximize comment coverage
- Waits for new comments to load after each expansion
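The expansion budget described above can be expressed as a small loop. In this sketch the two click callbacks stand in for the browser interaction and are hypothetical, as are the one-second settle time and the early exit when a pass clicks nothing; only the limits of 10, 5, and 3 come from the list above.

```python
import asyncio
from typing import Awaitable, Callable

# Budgets taken from the list above
MAX_ATTEMPTS = 3
LOAD_MORE_PER_ATTEMPT = 10
CONTINUE_PER_ATTEMPT = 5


async def expand_comments(
    click_load_more: Callable[[int], Awaitable[int]],
    click_continue: Callable[[int], Awaitable[int]],
    settle_seconds: float = 1.0,
) -> int:
    """Run up to MAX_ATTEMPTS expansion passes; return the total click count.

    Each callback receives a click budget and returns how many elements
    it actually clicked.
    """
    total = 0
    for _ in range(MAX_ATTEMPTS):
        clicked = await click_load_more(LOAD_MORE_PER_ATTEMPT)
        clicked += await click_continue(CONTINUE_PER_ATTEMPT)
        if clicked == 0:
            break  # nothing left to expand
        total += clicked
        await asyncio.sleep(settle_seconds)  # let new comments render
    return total
```

With these limits a single post receives at most 45 expansion clicks, which bounds run time on very large threads at the cost of possibly leaving some replies unexpanded.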
Performance
- Headless browser mode for efficiency
- Optimized page load strategy (domcontentloaded)
- Configurable wait times and timeouts
- Multiple posts are processed sequentially, with delays between them
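Processing posts one at a time with a pause between them can be sketched as follows. The delay range, the choice to record a failure and continue, and the names `scrape_sequentially` and `scrape_one` are assumptions for illustration.

```python
import asyncio
import random
from typing import Awaitable, Callable


async def scrape_sequentially(
    urls: list[str],
    scrape_one: Callable[[str], Awaitable[list]],
    min_delay: float = 2.0,
    max_delay: float = 5.0,
) -> dict:
    """Scrape posts one by one, sleeping a random interval between them."""
    results = {}
    for i, url in enumerate(urls):
        if i > 0:
            # Randomized pause so requests are not evenly spaced
            await asyncio.sleep(random.uniform(min_delay, max_delay))
        try:
            results[url] = await scrape_one(url)
        except Exception as exc:
            # Record the failure and continue with the remaining posts
            results[url] = exc
    return results
```

Widening the delay range is also a simple way to add rate limiting for larger runs.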
Limitations
- Only works with public Reddit posts; private or restricted posts cannot be scraped
- Browser automation is slower than direct API calls but more reliable
- Hidden scores show as 0 (when "[score hidden]" is displayed)
- Maximum 10,000 comments per post (configurable)
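The hidden-score rule can be illustrated with a small parser. The label formats handled here (for example "42 points") are assumptions about what the page displays, and `parse_score` is a hypothetical helper name.

```python
import re
from typing import Optional


def parse_score(score_text: Optional[str]) -> int:
    """Parse a score label such as '42 points'; hidden or missing scores become 0."""
    if not score_text or "score hidden" in score_text.lower():
        return 0
    # Drop thousands separators, then take the first (possibly negative) integer
    match = re.search(r"-?\d+", score_text.replace(",", ""))
    return int(match.group()) if match else 0
```

Because a hidden score and a true score of 0 both come out as 0, the output alone cannot tell the two cases apart.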
Dependencies
- apify>=2.1.0: Apify SDK for Python
- playwright~=1.40.0: Browser automation framework
- beautifulsoup4~=4.12.0: HTML parsing library
Troubleshooting
Timeout Issues
If you encounter timeout errors:
- Check if the post URL is valid and accessible
- Increase timeout values in the code if needed
- Verify the post has comments
Missing Comments
If some comments are missing:
- Enable expandThreads to load collapsed comments
- Increase the maxComments limit
- Some comments may be deleted or removed by moderators
"[deleted]" Authors
- Comments from deleted accounts show "[deleted]" as author
- This is normal Reddit behavior
- The comment text may still be available or show as "[removed]"
Use Cases
- Sentiment Analysis: Analyze community opinions on topics
- Market Research: Gather user feedback and discussions
- Content Moderation: Monitor discussions for moderation
- Academic Research: Study online community interactions
- Data Analysis: Build datasets for machine learning
License
This actor is provided as-is for scraping public Reddit data in accordance with Reddit's terms of service.
Notes
- This scraper uses browser automation to access Reddit's public web interface
- Always respect Reddit's robots.txt and terms of service
- Use responsibly and avoid overwhelming Reddit's servers
- Consider implementing additional rate limiting for large-scale scraping
- The actor works best with the Apify platform's infrastructure
- Posts with thousands of comments may take longer to scrape