Changelog
All notable changes to the Reddit Scraper Actor will be documented in this file.
The format is based on Keep a Changelog,
and this project adheres to Semantic Versioning.
[1.0.0] - 2026-01-13
Added
- Initial release of Reddit Scraper Actor
- Multi-subreddit scraping support
- Post fetching from High Performance API
- Comment fetching with configurable depth
- Rate limit tracking and logging
- Flexible sorting (newest/oldest)
- Search functionality within subreddits
- Rich metadata extraction (score, author, timestamps, etc.)
- Apify platform integration
- Dataset storage with structured format
- Comprehensive error handling
- Input validation
- User-friendly logging with emojis
- Docker containerization
- Python 3.11 support
Features
- RedditScraper Class:
  - fetch_posts(): Fetch posts from subreddit
  - fetch_comments(): Fetch comments for specific post
  - format_post(): Format post data for storage
  - format_comment(): Format comment data for storage
- Rate limit monitoring via response headers
- Input Parameters:
  - subreddits: Array of subreddit names
  - maxPosts: Maximum posts per subreddit (1-100)
  - maxCommentsPerPost: Maximum comments per post (0-50)
  - sortBy: Sort order (desc/asc)
  - fetchComments: Toggle comment fetching
  - searchQuery: Optional search text
  - proxy: Proxy configuration
- Output Format:
  - Post objects with full metadata
  - Comment objects linked to posts
  - Timestamp conversion to readable dates
  - Type identification (post/comment)
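
The input ranges and the timestamp conversion listed above can be illustrated with a minimal sketch. This is not the Actor's actual code: the helper names `validate_input` and `to_readable`, the default values, and the handling of an `r/` prefix are assumptions made for illustration. Only the parameter names and ranges come from the list above.

```python
from datetime import datetime, timezone

# Default values are assumptions; the changelog documents only names and ranges.
DEFAULTS = {
    "maxPosts": 25,
    "maxCommentsPerPost": 10,
    "sortBy": "desc",
    "fetchComments": True,
    "searchQuery": None,
}


def validate_input(raw: dict) -> dict:
    """Return a normalized copy of the Actor input, or raise ValueError."""
    cfg = {**DEFAULTS, **raw}

    subreddits = cfg.get("subreddits")
    if not isinstance(subreddits, list) or not subreddits:
        raise ValueError("subreddits must be a non-empty array of names")
    # Accepting "r/python" as well as "python" is a convenience assumed here.
    cfg["subreddits"] = [str(s).strip().removeprefix("r/") for s in subreddits]

    if not 1 <= int(cfg["maxPosts"]) <= 100:
        raise ValueError("maxPosts must be between 1 and 100")
    if not 0 <= int(cfg["maxCommentsPerPost"]) <= 50:
        raise ValueError("maxCommentsPerPost must be between 0 and 50")
    if cfg["sortBy"] not in ("desc", "asc"):
        raise ValueError("sortBy must be 'desc' or 'asc'")
    return cfg


def to_readable(epoch_seconds: float) -> str:
    """Convert a Unix timestamp to an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()
```

Validating before the first request means an out-of-range `maxPosts` fails immediately with a clear message instead of surfacing as an API error partway through a run.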
Technical Details
- Base URL: Proprietary Endpoint
- API Endpoints:
  - /api/posts/search
  - /api/comments/search
- Rate Limit Headers:
  - X-RateLimit-Remaining
  - X-RateLimit-Reset
- Request Timeout: 30 seconds
- User-Agent: Chrome 121.0.0.0
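
As a rough illustration of how the two rate limit headers above might be read, the sketch below parses them from a response's header mapping. The names `RateLimit`, `parse_rate_limit`, and `seconds_until_reset` are hypothetical, and treating `X-RateLimit-Reset` as a Unix timestamp is an assumption; the API could equally report seconds until reset.

```python
import math
import time
from dataclasses import dataclass
from typing import Mapping, Optional

REQUEST_TIMEOUT_SECONDS = 30  # matches the documented request timeout


@dataclass
class RateLimit:
    remaining: Optional[int]
    reset: Optional[float]  # assumed to be a Unix timestamp


def _header_number(headers: Mapping[str, str], name: str) -> Optional[float]:
    """Return a finite float for the header, or None if missing or malformed."""
    try:
        value = float(headers[name])
    except (KeyError, TypeError, ValueError):
        return None
    return value if math.isfinite(value) else None


def parse_rate_limit(headers: Mapping[str, str]) -> RateLimit:
    """Read both rate limit headers; header names are matched case-insensitively."""
    lowered = {key.lower(): value for key, value in headers.items()}
    remaining = _header_number(lowered, "x-ratelimit-remaining")
    return RateLimit(
        remaining=int(remaining) if remaining is not None else None,
        reset=_header_number(lowered, "x-ratelimit-reset"),
    )


def seconds_until_reset(limit: RateLimit, now: Optional[float] = None) -> float:
    """Seconds to wait when no requests remain; 0.0 in every other case."""
    if limit.remaining is None or limit.remaining > 0 or limit.reset is None:
        return 0.0
    current = time.time() if now is None else now
    return max(0.0, limit.reset - current)
```

Returning `None` for missing or malformed headers keeps logging best-effort: a response without rate limit information does not interrupt the scrape.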
Documentation
- README.md with user guide
- PROJECT.md with developer documentation
- Input schema with parameter descriptions
- Test script for local development
- Example input configuration
Testing
- test_scraper.py: Local testing script
  - Basic post fetching test
  - Comment fetching test
  - Search functionality test
  - Data formatting test
[Unreleased]
Planned Features
- User profile scraping
- Retry logic with exponential backoff
- Proxy rotation support
- Enhanced error recovery
- Subreddit metadata fetching
- Batch processing optimization
- Advanced rate limit handling
- Multiple sort options (hot, top, controversial)
- Time-based filtering
- Media download support
- Flair filtering
- Award tracking
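
Retry logic with exponential backoff is planned and not part of 1.0.0. One possible shape for it is sketched below, purely as an illustration: the function names, the default retry count, the 30-second cap, and the use of full jitter are all assumptions, not a committed design.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Upper bounds base * 2**attempt for each retry, capped at `cap`."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]


def retry(
    call: Callable[[], T],
    retries: int = 4,
    base: float = 1.0,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on any exception with full-jitter backoff."""
    for delay in backoff_delays(retries, base, cap):
        try:
            return call()
        except Exception:
            # Full jitter: wait a random time between 0 and the current bound.
            sleep(random.uniform(0, delay))
    return call()  # final attempt; any exception propagates to the caller
```

Passing `sleep` as a parameter lets a local test script exercise the retry path without actually waiting.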
Known Limitations
- Historical data depth varies by subreddit
- Rate limits are shared across all API users
- No real-time streaming support
- Limited to archived data availability
Version History
| Version | Date | Description |
|---|---|---|
| 1.0.0 | 2026-01-13 | Initial release |
Support
For issues or questions:
- Review PROJECT.md for technical details
- Check README.md for usage instructions