Reddit Scraper
Pricing
$2.50 / 1,000 results
Scrape entire subreddits with this crawler. It returns the posts in a subreddit along with their titles, text, scores, timestamps, and related metadata.
Rating: 5.0 (7)
Developer: Crawler Bros
Actor stats
Bookmarked: 2
Total users: 58
Monthly active users: 37
Last modified: 21 days ago
Reddit Subreddit Scraper
An Apify Actor for scraping posts from Reddit subreddits using browser automation with Playwright.
Features
- Scrape multiple subreddits in a single run
- Extract comprehensive post data (title, author, score, comments, etc.)
- Support for different sorting methods (hot, new, top, rising, controversial)
- Time filters for "top" and "controversial" posts
- No authentication required for public subreddits
- Data saved in structured JSON format
- Browser automation bypasses API restrictions
- Automatic pagination support
Input Parameters
The actor accepts the following input parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| subreddits | array | Yes | ["python"] | List of subreddit names to scrape (without 'r/' prefix) |
| maxPosts | integer | No | 25 | Maximum number of posts to scrape from each subreddit (1-1000) |
| sort | string | No | "hot" | How to sort posts: hot, new, top, rising, or controversial |
| timeFilter | string | No | "day" | Time filter for 'top'/'controversial': hour, day, week, month, year, all |
Example Input
    {
      "subreddits": ["islamabad", "pakistan", "programming"],
      "maxPosts": 50,
      "sort": "hot",
      "timeFilter": "day"
    }
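The defaults and ranges in the parameter table can be checked before a run is started. The following is a minimal sketch, not the actor's own validation code: `normalize_input` is a hypothetical helper, and stripping an accidental `r/` prefix is an assumption added for convenience.

```python
ALLOWED_SORTS = {"hot", "new", "top", "rising", "controversial"}
ALLOWED_TIME_FILTERS = {"hour", "day", "week", "month", "year", "all"}


def normalize_input(raw: dict) -> dict:
    """Apply the documented defaults and reject out-of-range values."""
    subreddits = raw.get("subreddits", ["python"])
    if not isinstance(subreddits, list) or not subreddits:
        raise ValueError("subreddits must be a non-empty list of names")

    cleaned = []
    for name in subreddits:
        name = str(name).strip()
        # Assumption: tolerate an accidental "r/" prefix instead of failing.
        if name.lower().startswith("r/"):
            name = name[2:]
        if not name:
            raise ValueError("subreddit names must not be empty")
        cleaned.append(name)

    max_posts = raw.get("maxPosts", 25)
    if not isinstance(max_posts, int) or not 1 <= max_posts <= 1000:
        raise ValueError("maxPosts must be an integer between 1 and 1000")

    sort = raw.get("sort", "hot")
    if sort not in ALLOWED_SORTS:
        raise ValueError(f"sort must be one of {sorted(ALLOWED_SORTS)}")

    time_filter = raw.get("timeFilter", "day")
    if time_filter not in ALLOWED_TIME_FILTERS:
        raise ValueError(f"timeFilter must be one of {sorted(ALLOWED_TIME_FILTERS)}")

    return {
        "subreddits": cleaned,
        "maxPosts": max_posts,
        "sort": sort,
        "timeFilter": time_filter,
    }
```

Running the check locally catches a typo in `sort` or an oversized `maxPosts` before any platform usage is billed.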
Output Fields
The actor extracts the following data for each post:
Subreddit Information
- subreddit: Subreddit name (e.g., "islamabad")
- subreddit_prefixed: Subreddit name with r/ prefix (e.g., "r/islamabad")
Post Content
- post_id: Unique post ID (e.g., "1kql1t5")
- post_name: Full post name in Reddit format (e.g., "t3_1kql1t5")
- title: Post title
- author: Username of the post author
- selftext: Text content preview (first 1000 characters, for self posts only)
Engagement Metrics
- score: Post score/karma (upvotes minus downvotes)
- num_comments: Number of comments on the post
Links
- url: URL of the linked content (external URL, or the permalink for self posts)
- permalink: Direct link to the Reddit post
Metadata
- domain: Domain of the linked content (e.g., "self.islamabad" for text posts)
- is_self_post: Boolean indicating whether it is a text post (true) or a link post (false)
- link_flair: Post flair/tag text (if any)
- thumbnail_url: URL of the post thumbnail image (if any)
Timestamps
- created_utc: Unix timestamp when the post was created
- created_at: ISO 8601 formatted datetime (e.g., "2025-05-19T19:40:28")
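The two timestamp fields carry the same instant in different forms. The example values in this README are consistent with created_at being the UTC rendering of created_utc without an offset suffix. The sketch below relies on that assumption, and `to_created_at` is a hypothetical helper name, not part of the actor.

```python
from datetime import datetime, timezone


def to_created_at(created_utc: int) -> str:
    """Render a Unix timestamp as a naive ISO 8601 string in UTC."""
    moment = datetime.fromtimestamp(created_utc, tz=timezone.utc)
    # Drop the tzinfo so the result has no "+00:00" suffix, matching the example.
    return moment.replace(tzinfo=None).isoformat()


print(to_created_at(1747683628))  # 2025-05-19T19:40:28
```

Since created_at carries no offset, comparisons across systems are safer on created_utc.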
Flags
- is_stickied: Boolean indicating whether the post is stickied/pinned
- is_locked: Boolean indicating whether the post is locked (no new comments)
- is_nsfw: Boolean indicating whether the post is marked as NSFW (over 18)
Example Output
    {
      "subreddit": "islamabad",
      "subreddit_prefixed": "r/islamabad",
      "post_id": "1kql1t5",
      "post_name": "t3_1kql1t5",
      "title": "Everyone's always asking what to do in Islamabad - I made a list",
      "author": "hafmaestro",
      "selftext": "Note: I have not mentioned normal restaurants and cafes...",
      "score": 595,
      "num_comments": 101,
      "url": "https://old.reddit.com/r/islamabad/comments/1kql1t5/...",
      "permalink": "https://old.reddit.com/r/islamabad/comments/1kql1t5/...",
      "domain": "self.islamabad",
      "is_self_post": true,
      "link_flair": "Islamabad",
      "thumbnail_url": null,
      "created_utc": 1747683628,
      "created_at": "2025-05-19T19:40:28",
      "is_stickied": false,
      "is_locked": false,
      "is_nsfw": false
    }
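Records of this shape are easy to post-process. As an illustration only, the sketch below drops stickied and NSFW posts and ranks the rest by score. `rank_posts` and its parameters are hypothetical names; only the field names come from the output description above.

```python
def rank_posts(posts, min_score=0, include_nsfw=False):
    """Return non-stickied posts at or above min_score, highest score first."""
    kept = []
    for post in posts:
        if post.get("is_stickied"):
            continue  # pinned posts usually are not representative content
        if post.get("is_nsfw") and not include_nsfw:
            continue
        if post.get("score", 0) < min_score:
            continue
        kept.append(post)
    return sorted(kept, key=lambda post: post.get("score", 0), reverse=True)


sample = [
    {"post_id": "a", "score": 12, "is_stickied": False, "is_nsfw": False},
    {"post_id": "b", "score": 595, "is_stickied": False, "is_nsfw": False},
    {"post_id": "c", "score": 900, "is_stickied": True, "is_nsfw": False},
]
print([post["post_id"] for post in rank_posts(sample, min_score=10)])  # ['b', 'a']
```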
Usage
Local Development
- Install dependencies:
    pip install -r requirements.txt
    playwright install chromium
- Set up input in storage/key_value_stores/default/INPUT.json:
    {"subreddits": ["python"], "maxPosts": 25, "sort": "hot"}
- Run the actor:
    python -m src
- Check results in storage/datasets/default/
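After a local run, the results can be read back from the dataset directory. This sketch assumes the Apify SDK's usual local storage layout, where each item is written as its own JSON file and bookkeeping files start with a double underscore. `load_local_dataset` is a hypothetical helper, not part of the actor.

```python
import json
from pathlib import Path


def load_local_dataset(directory="storage/datasets/default"):
    """Load every item file from a local dataset directory, in file-name order."""
    items = []
    for path in sorted(Path(directory).glob("*.json")):
        # Assumption: files such as __metadata__.json are bookkeeping, not items.
        if path.name.startswith("__"):
            continue
        with path.open(encoding="utf-8") as handle:
            items.append(json.load(handle))
    return items
```

Calling `load_local_dataset()` from the project root returns a list of post dictionaries, or an empty list if the run produced nothing.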
On Apify Platform
- Push to Apify:
  - Log in to the Apify CLI: apify login
  - Initialize (if not already done): apify init
  - Push to Apify: apify push
- Or upload manually:
  - Create a new actor on the Apify platform
  - Upload all files, including Dockerfile, requirements.txt, and the .actor/ directory
- Configure and run:
  - Set input parameters in the Apify console
  - Click "Start" to run the actor
  - Download results from the dataset tab
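As an alternative to the console, a deployed actor can also be started from Python with the separate `apify-client` package. This is a hedged sketch: the actor ID is not stated in this README and must be supplied by the caller, the token is assumed to live in an `APIFY_TOKEN` environment variable, and `build_run_input` and `run_actor` are hypothetical helper names.

```python
import os


def build_run_input(subreddits, max_posts=25, sort="hot", time_filter="day"):
    """Build an input object using the parameter names from this README."""
    return {
        "subreddits": list(subreddits),
        "maxPosts": max_posts,
        "sort": sort,
        "timeFilter": time_filter,
    }


def run_actor(actor_id, run_input, token=None):
    """Start the actor, wait for it to finish, and return its dataset items."""
    # Imported lazily so the helpers above work without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token or os.environ["APIFY_TOKEN"])
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the actor run did not complete")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A call would look like `run_actor("<your-actor-id>", build_run_input(["python"], max_posts=50))`, with the placeholder replaced by the ID shown in the Apify console.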
Technical Details
Browser Automation
- Uses Playwright with the Chromium browser
- Scrapes old.reddit.com for better compatibility and a simpler HTML structure
- Implements anti-detection measures:
  - Custom User-Agent headers
  - Disabled automation flags
  - Browser fingerprint masking
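To make the old.reddit.com target concrete, the sketch below builds a listing URL from the sort and timeFilter inputs. It assumes Reddit's familiar `/r/<name>/<sort>/` layout with a `t` query parameter for time-filtered listings. The actor's real URL construction is not documented here, and `listing_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "https://old.reddit.com"


def listing_url(subreddit, sort="hot", time_filter="day"):
    """Build the listing page URL for one subreddit and sort order."""
    url = f"{BASE_URL}/r/{subreddit}/{sort}/"
    # The time filter only applies to the "top" and "controversial" listings.
    if sort in ("top", "controversial"):
        url += "?" + urlencode({"t": time_filter})
    return url


print(listing_url("python"))                    # https://old.reddit.com/r/python/hot/
print(listing_url("islamabad", "top", "week"))  # https://old.reddit.com/r/islamabad/top/?t=week
```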
Features
- Automatic pagination: Clicks "next" button to load more posts
- Smart selectors: Multiple fallback CSS selectors for reliability
- Error handling: Screenshots saved on errors for debugging
- Rate limiting: Built-in delays between requests
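The pagination and rate-limiting behavior can be pictured as a loop that keeps requesting the next page until maxPosts is reached, pausing between requests. This is a simplified sketch, not the actor's code: `collect_posts` is a hypothetical function, `fetch_page` stands in for whatever loads and parses one listing page, and the two-second delay is an arbitrary assumption.

```python
import time


def collect_posts(fetch_page, max_posts, delay_seconds=2.0, sleep=time.sleep):
    """Gather up to max_posts items by following next-page cursors.

    fetch_page(cursor) must return (posts, next_cursor), where next_cursor
    is None on the last page. The first call receives cursor=None.
    """
    posts = []
    cursor = None
    while len(posts) < max_posts:
        page_posts, cursor = fetch_page(cursor)
        posts.extend(page_posts)
        if cursor is None or not page_posts:
            break  # no "next" link, or an empty page: nothing more to load
        if len(posts) < max_posts:
            sleep(delay_seconds)  # pause before requesting the next page
    return posts[:max_posts]


# Fake three-page listing used to exercise the loop without a browser.
pages = {None: ([1, 2, 3], "a"), "a": ([4, 5, 6], "b"), "b": ([7], None)}
print(collect_posts(lambda cursor: pages[cursor], 5, sleep=lambda _: None))  # [1, 2, 3, 4, 5]
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.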
Performance
- Headless browser mode for efficiency
- Optimized page load strategy (domcontentloaded)
- Configurable wait times and timeouts
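A headless Chromium session with the `domcontentloaded` load strategy might look like the sketch below, which uses Playwright's synchronous API. The launch argument, the User-Agent string, the 30-second timeout, and the name `fetch_listing_html` are all illustrative assumptions. The actor's actual settings are not listed in this README.

```python
# Assumption: one commonly used flag for hiding the automation marker.
LAUNCH_ARGS = ["--disable-blink-features=AutomationControlled"]

# Assumption: an ordinary desktop Chrome User-Agent string.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def fetch_listing_html(url, timeout_ms=30_000):
    """Load one page in headless Chromium and return its HTML."""
    # Imported lazily so this module can be imported without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as playwright:
        browser = playwright.chromium.launch(headless=True, args=LAUNCH_ARGS)
        try:
            context = browser.new_context(user_agent=USER_AGENT)
            page = context.new_page()
            # Return once the DOM is parsed instead of waiting for every
            # image and script, which is what keeps page loads short.
            page.goto(url, wait_until="domcontentloaded", timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

Raising `timeout_ms` is the simplest knob to try when slow pages cause timeout errors.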
Limitations
- Only works with public subreddits
- Cannot scrape private or restricted communities
- Browser automation is slower than direct API calls, but it avoids API restrictions
- Selftext preview limited to first 1000 characters
Dependencies
- apify>=2.1.0: Apify SDK for Python
- playwright~=1.40.0: Browser automation framework
- beautifulsoup4~=4.12.0: HTML parsing library
Troubleshooting
Timeout Issues
If you encounter timeout errors:
- Check the debug screenshots in the key-value store
- Increase timeout values in the code
- Verify the subreddit exists and is public
No Posts Found
- Verify the subreddit name is correct (without 'r/' prefix)
- Check if the subreddit has posts for the selected sort method
- Review logs for detailed error messages
License
This actor is provided as-is for scraping public Reddit data in accordance with Reddit's terms of service.
Notes
- This scraper uses browser automation to access Reddit's public web interface
- Always respect Reddit's robots.txt and terms of service
- Use responsibly and avoid overwhelming Reddit's servers
- Consider implementing additional rate limiting for large-scale scraping
- The actor works best with the Apify platform's infrastructure