
Reddit Subreddit Scraper
An Apify Actor for scraping posts from Reddit subreddits using browser automation with Playwright.
Features
- Scrape multiple subreddits in a single run
- Extract comprehensive post data (title, author, score, comments, etc.)
- Support for different sorting methods (hot, new, top, rising, controversial)
- Time filters for "top" and "controversial" posts
- No authentication required for public subreddits
- Data saved in structured JSON format
- Browser automation bypasses API restrictions
- Automatic pagination support
Input Parameters
The actor accepts the following input parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| subreddits | array | Yes | ["python"] | List of subreddit names to scrape (without the r/ prefix) |
| maxPosts | integer | No | 25 | Maximum number of posts to scrape from each subreddit (1-1000) |
| sort | string | No | "hot" | How to sort posts: hot, new, top, rising, or controversial |
| timeFilter | string | No | "day" | Time filter for top/controversial: hour, day, week, month, year, all |
Example Input
{"subreddits": ["islamabad", "pakistan", "programming"],"maxPosts": 50,"sort": "hot","timeFilter": "day"}
Output Fields
The actor extracts the following data for each post:
Subreddit Information
- subreddit - Subreddit name (e.g., "islamabad")
- subreddit_prefixed - Subreddit name with the r/ prefix (e.g., "r/islamabad")
Post Content
- post_id - Unique post ID (e.g., "1kql1t5")
- post_name - Full post name in Reddit format (e.g., "t3_1kql1t5")
- title - Post title
- author - Username of the post author
- selftext - Text content preview (first 1000 characters, self posts only)
Engagement Metrics
- score - Post score/karma (upvotes minus downvotes)
- num_comments - Number of comments on the post
Links
- url - URL of the linked content (external URL, or the permalink for self posts)
- permalink - Direct link to the Reddit post
Metadata
- domain - Domain of the linked content (e.g., "self.islamabad" for text posts)
- is_self_post - Boolean indicating whether it is a text post (true) or a link post (false)
- link_flair - Post flair/tag text (if any)
- thumbnail_url - URL of the post thumbnail image (if any)
Timestamps
- created_utc - Unix timestamp of when the post was created
- created_at - ISO 8601 formatted datetime (e.g., "2025-05-19T19:40:28")
Flags
- is_stickied - Boolean indicating whether the post is stickied/pinned
- is_locked - Boolean indicating whether the post is locked (no new comments)
- is_nsfw - Boolean indicating whether the post is marked as NSFW (over 18)
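Taken together, each dataset item has roughly the following shape. The TypedDict below is only a typed restatement of the field list above; the class name and the Optional annotations are assumptions, not code shipped with the actor.

```python
from typing import Optional, TypedDict

class RedditPost(TypedDict):
    # Subreddit information
    subreddit: str
    subreddit_prefixed: str
    # Post content
    post_id: str
    post_name: str
    title: str
    author: str
    selftext: Optional[str]   # first 1000 characters, self posts only
    # Engagement metrics
    score: int
    num_comments: int
    # Links
    url: str
    permalink: str
    # Metadata
    domain: str
    is_self_post: bool
    link_flair: Optional[str]
    thumbnail_url: Optional[str]
    # Timestamps
    created_utc: int
    created_at: str           # ISO 8601, e.g. "2025-05-19T19:40:28"
    # Flags
    is_stickied: bool
    is_locked: bool
    is_nsfw: bool
```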
Example Output
{"subreddit": "islamabad","subreddit_prefixed": "r/islamabad","post_id": "1kql1t5","post_name": "t3_1kql1t5","title": "Everyone's always asking what to do in Islamabad - I made a list","author": "hafmaestro","selftext": "Note: I have not mentioned normal restaurants and cafes...","score": 595,"num_comments": 101,"url": "https://old.reddit.com/r/islamabad/comments/1kql1t5/...","permalink": "https://old.reddit.com/r/islamabad/comments/1kql1t5/...","domain": "self.islamabad","is_self_post": true,"link_flair": "Islamabad","thumbnail_url": null,"created_utc": 1747683628,"created_at": "2025-05-19T19:40:28","is_stickied": false,"is_locked": false,"is_nsfw": false}
Usage
Local Development
1. Install dependencies:
   pip install -r requirements.txt
   playwright install chromium
2. Set up the input in storage/key_value_stores/default/INPUT.json:
   {"subreddits": ["python"], "maxPosts": 25, "sort": "hot"}
3. Run the actor:
   python -m src
4. Check the results in storage/datasets/default/
On Apify Platform
1. Push to Apify:
   - Log in to the Apify CLI: apify login
   - Initialize the project: apify init (if not already done)
   - Push to Apify: apify push
2. Or manually upload:
   - Create a new actor on the Apify platform
   - Upload all files, including the Dockerfile, requirements.txt, and the .actor/ directory
3. Configure and run:
   - Set the input parameters in the Apify console
   - Click "Start" to run the actor
   - Download the results from the dataset tab
Technical Details
Browser Automation
- Uses Playwright with the Chromium browser
- Scrapes old.reddit.com for better compatibility and a simpler HTML structure
- Implements anti-detection measures:
  - Custom User-Agent headers
  - Disabled automation flags
  - Browser fingerprint masking
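The launch sequence likely looks something like the sketch below. The specific browser flag, User-Agent string, and init script are illustrative assumptions about how such anti-detection measures are commonly wired up with Playwright, not a copy of the actor's code.

```python
import asyncio
from playwright.async_api import async_playwright

# Example User-Agent; the actor may use a different one.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

async def open_listing(url: str) -> None:
    async with async_playwright() as p:
        # Headless Chromium with the automation flag disabled.
        browser = await p.chromium.launch(
            headless=True,
            args=["--disable-blink-features=AutomationControlled"],
        )
        context = await browser.new_context(user_agent=USER_AGENT)
        page = await context.new_page()
        # Mask the most common fingerprinting check before any page script runs.
        await page.add_init_script(
            "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
        )
        await page.goto(url, wait_until="domcontentloaded")
        print(await page.title())
        await browser.close()

asyncio.run(open_listing("https://old.reddit.com/r/python/hot/"))
```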
Features
- Automatic pagination: clicks the "next" button to load more posts (see the sketch after this list)
- Smart selectors: Multiple fallback CSS selectors for reliability
- Error handling: Screenshots saved on errors for debugging
- Rate limiting: Built-in delays between requests
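A pagination loop over old.reddit.com could be sketched roughly as follows. The CSS selectors, the fixed two-second delay, and the function name are assumptions about old Reddit's markup and the actor's rate limiting, not its actual implementation.

```python
import asyncio
from playwright.async_api import Page

async def collect_post_titles(page: Page, max_posts: int) -> list[str]:
    """Walk listing pages by clicking "next" until max_posts titles are collected."""
    titles: list[str] = []
    while len(titles) < max_posts:
        # Grab the post title links on the current listing page.
        for node in await page.query_selector_all("div.thing a.title"):
            titles.append((await node.inner_text()).strip())
            if len(titles) >= max_posts:
                break
        # old.reddit.com exposes the next page behind a "next" button.
        next_link = await page.query_selector("span.next-button a")
        if next_link is None:
            break  # no more pages
        await next_link.click()
        await page.wait_for_load_state("domcontentloaded")
        await asyncio.sleep(2)  # polite delay between page loads
    return titles
```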
Performance
- Headless browser mode for efficiency
- Optimized page load strategy (domcontentloaded)
- Configurable wait times and timeouts
Limitations
- Only works with public subreddits
- Cannot scrape private or restricted communities
- Browser automation is slower than direct API calls but more reliable
- Selftext preview limited to first 1000 characters
Dependencies
- apify>=2.1.0 - Apify SDK for Python
- playwright~=1.40.0 - Browser automation framework
- beautifulsoup4~=4.12.0 - HTML parsing library
Troubleshooting
Timeout Issues
If you encounter timeout errors:
- Check the debug screenshots in the key-value store
- Increase the timeout values in the code (see the sketch after this list)
- Verify the subreddit exists and is public
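If you raise the timeouts, the relevant Playwright calls are page.set_default_timeout and the timeout argument to page.goto; the 60-second value and the helper name below are only examples.

```python
from playwright.async_api import Page

async def goto_with_long_timeout(page: Page, url: str) -> None:
    # Raise Playwright's default timeout from 30s to 60s (values are in milliseconds)
    # and pass an explicit timeout to the navigation as well.
    page.set_default_timeout(60_000)
    await page.goto(url, timeout=60_000, wait_until="domcontentloaded")
```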
No Posts Found
- Verify the subreddit name is correct (without 'r/' prefix)
- Check if the subreddit has posts for the selected sort method
- Review logs for detailed error messages
License
This actor is provided as-is for scraping public Reddit data in accordance with Reddit's terms of service.
Notes
- This scraper uses browser automation to access Reddit's public web interface
- Always respect Reddit's robots.txt and terms of service
- Use responsibly and avoid overwhelming Reddit's servers
- Consider implementing additional rate limiting for large-scale scraping (a minimal example follows below)
- The actor works best with the Apify platform's infrastructure
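One simple way to add extra rate limiting, if you fork the code, is a randomized pause between page loads. This is a generic pattern under stated assumptions, not something the actor necessarily ships with.

```python
import asyncio
import random

async def polite_pause(min_seconds: float = 2.0, max_seconds: float = 5.0) -> None:
    """Sleep for a random interval so requests are not sent at a fixed cadence."""
    await asyncio.sleep(random.uniform(min_seconds, max_seconds))
```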