⚡ Reddit Lightning Scraper
Pricing: from $1.40 / 1,000 results
Fast (no browser) and independent of Reddit's official API.
Developer: AbotAPI
Reddit Scraper
Scrape Reddit posts, comments, and media from any public subreddit or user profile. No login required, no API keys needed.
What does Reddit Scraper do?
Reddit Scraper is a lightweight Apify Actor that extracts posts and comments from Reddit without launching a browser. It requests Reddit's public JSON endpoints directly, which keeps it fast and resource-efficient (it runs in 128-256 MB of memory).
Unlike browser-based scrapers, this Actor:
- Starts instantly (no browser launch time)
- Uses minimal compute resources
- Handles rate limiting automatically
- Can resume after a failure
What data can you extract?
| Data Type | Fields Extracted |
|---|---|
| Posts | ID, title, author, subreddit, score, upvote ratio, comment count, post type, URL, text content, flair, awards, NSFW/spoiler flags |
| Comments | ID, author, body text, score, depth, parent ID, timestamps |
| Media | Image URLs, video URLs, gallery items, Reddit-hosted media |
Use cases
- Market Research: monitor product discussions, brand mentions, and competitors
- Sentiment Analysis: gather opinions on topics, products, or events for NLP processing
- Content Curation: collect trending posts and media for content aggregation
- Academic Research: build datasets for social media studies and behavioral analysis
- Lead Generation: find potential customers discussing relevant topics
- Trend Monitoring: track emerging topics and viral content in specific communities
How to use Reddit Scraper
- Enter a target: a subreddit name (e.g., `python`, `askreddit`) or a username
- Set the limit: how many posts to scrape (1-1000)
- Choose a sort order: `new`, `hot`, `top`, or `rising`
- Enable options: comments, media URL extraction, or media download
- Run the Actor: results are saved to the dataset
Example: Scrape r/python
```json
{
  "target": "python",
  "limit": 100,
  "sort": "hot",
  "scrapeComments": true,
  "extractMediaUrls": true
}
```
Example: Scrape a user's posts
```json
{
  "target": "spez",
  "isUser": true,
  "limit": 50,
  "sort": "top",
  "timeframe": "year"
}
```
Input parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| `target` | string | Subreddit name or username (required) | - |
| `isUser` | boolean | Set to `true` if `target` is a username | `false` |
| `limit` | integer | Maximum posts to scrape (1-1000) | `10` |
| `sort` | string | Sort order: `new`, `hot`, `top`, `rising` | `new` |
| `timeframe` | string | Time filter for the `top` sort: `hour`, `day`, `week`, `month`, `year`, `all` | `all` |
| `scrapeComments` | boolean | Extract comments for each post | `false` |
| `commentsLimit` | integer | Maximum comments per post (1-500) | `100` |
| `extractMediaUrls` | boolean | Extract image/video URLs from posts | `true` |
| `downloadMedia` | boolean | Download media files to the Key-Value Store | `false` |
| `proxy` | object | Proxy configuration (RESIDENTIAL recommended) | - |
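The parameter table can be read as a small validation step. The sketch below is a hypothetical Python helper (`normalize_input` is not part of the Actor) that fills in the documented defaults and rejects values outside the documented ranges; how the Actor itself validates input is not described on this page.

```python
from typing import Any

# Allowed values and defaults copied from the parameter table.
SORTS = {"new", "hot", "top", "rising"}
TIMEFRAMES = {"hour", "day", "week", "month", "year", "all"}
DEFAULTS: dict[str, Any] = {
    "isUser": False,
    "limit": 10,
    "sort": "new",
    "timeframe": "all",
    "scrapeComments": False,
    "commentsLimit": 100,
    "extractMediaUrls": True,
    "downloadMedia": False,
}


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: merge user input over the defaults and check ranges."""
    target = str(raw.get("target", "")).strip()
    if not target:
        raise ValueError("target is required")
    cfg = {**DEFAULTS, **raw, "target": target}
    if not 1 <= cfg["limit"] <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    if not 1 <= cfg["commentsLimit"] <= 500:
        raise ValueError("commentsLimit must be between 1 and 500")
    if cfg["sort"] not in SORTS:
        raise ValueError(f"sort must be one of {sorted(SORTS)}")
    if cfg["timeframe"] not in TIMEFRAMES:
        raise ValueError(f"timeframe must be one of {sorted(TIMEFRAMES)}")
    return cfg
```

For instance, `normalize_input({"target": "python", "limit": 100, "sort": "hot"})` keeps the given values and fills in `timeframe="all"` and `commentsLimit=100`.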
Proxy configuration
Reddit blocks most datacenter IPs. For reliable scraping, enable Apify Proxy with the RESIDENTIAL group:
```json
{
  "proxy": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"],
    "apifyProxyCountry": "US"
  }
}
```
Output examples
Post output
```json
{
  "type": "post",
  "id": "1hj2abc",
  "title": "What's your favorite Python library?",
  "author": "pythonista123",
  "subreddit": "python",
  "created_utc": "2024-12-20T15:30:00",
  "permalink": "https://reddit.com/r/python/comments/1hj2abc/...",
  "url": "https://reddit.com/r/python/comments/1hj2abc/...",
  "score": 1542,
  "upvote_ratio": 0.96,
  "num_comments": 234,
  "num_crossposts": 5,
  "selftext": "I've been exploring different libraries and wanted to hear what everyone's using...",
  "post_type": "text",
  "is_nsfw": false,
  "is_spoiler": false,
  "flair": "Discussion",
  "total_awards": 3,
  "has_media": false,
  "media_urls": {
    "images": [],
    "videos": [],
    "galleries": []
  }
}
```
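Most of these fields correspond to attributes of the `data` object Reddit returns for each post (kind `t3`) in a JSON listing. The sketch below assumes that mapping (for example `over_18` to `is_nsfw`, `link_flair_text` to `flair`, `total_awards_received` to `total_awards`); the `post_type` rules are a hypothetical guess, and `has_media`/`media_urls` are left to a separate step.

```python
from datetime import datetime, timezone
from typing import Any


def to_iso(ts: float) -> str:
    """Convert Reddit's epoch-seconds `created_utc` to the UTC format shown above."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")


def post_type(d: dict[str, Any]) -> str:
    # Hypothetical classification; the Actor's exact rules are not documented.
    if d.get("is_gallery"):
        return "gallery"
    if d.get("is_video"):
        return "video"
    if d.get("post_hint") == "image":
        return "image"
    return "text" if d.get("is_self") else "link"


def normalize_post(d: dict[str, Any]) -> dict[str, Any]:
    """Map the `data` object of a listing child (kind "t3") to the output fields."""
    permalink = "https://reddit.com" + d.get("permalink", "")
    return {
        "type": "post",
        "id": d["id"],
        "title": d.get("title", ""),
        "author": d.get("author"),
        "subreddit": d.get("subreddit"),
        "created_utc": to_iso(d.get("created_utc", 0)),
        "permalink": permalink,
        "url": d.get("url") or permalink,  # self posts link back to themselves
        "score": d.get("score", 0),
        "upvote_ratio": d.get("upvote_ratio"),
        "num_comments": d.get("num_comments", 0),
        "num_crossposts": d.get("num_crossposts", 0),
        "selftext": d.get("selftext", ""),
        "post_type": post_type(d),
        "is_nsfw": bool(d.get("over_18")),
        "is_spoiler": bool(d.get("spoiler")),
        "flair": d.get("link_flair_text"),
        "total_awards": d.get("total_awards_received", 0),
    }
```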
Comment output
```json
{
  "type": "comment",
  "comment_id": "kx7y9z",
  "post_permalink": "https://reddit.com/r/python/comments/1hj2abc/...",
  "post_title": "What's your favorite Python library?",
  "parent_id": "t3_1hj2abc",
  "author": "dev_guru",
  "body": "Pandas is absolutely essential for data work. Can't imagine doing analysis without it.",
  "score": 89,
  "created_utc": "2024-12-20T16:45:00",
  "depth": 0,
  "is_submitter": false
}
```
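Reddit returns a post's comments as a nested tree: requesting the post's `.json` URL yields a two-element array (the post listing, then the comment listing), and each comment (kind `t1`) carries its replies as another listing, or an empty string when there are none. A depth-first walk like the hypothetical `flatten_comments` below is one way to produce flat records with `depth` and `parent_id`. It skips the `more` stubs Reddit uses for collapsed threads, which the Actor may or may not expand, and leaves out `post_permalink`/`post_title`, which would come from the first element of the array.

```python
from datetime import datetime, timezone
from itertools import islice
from typing import Any, Iterator


def _iso(ts: float) -> str:
    # Reddit timestamps are epoch seconds in UTC.
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")


def iter_comments(listing: dict[str, Any], depth: int = 0) -> Iterator[dict[str, Any]]:
    """Depth-first walk over a comment Listing, yielding flat records."""
    for child in listing.get("data", {}).get("children", []):
        if child.get("kind") != "t1":
            continue  # skip "more" stubs; expanding them needs extra requests
        c = child["data"]
        yield {
            "type": "comment",
            "comment_id": c["id"],
            "parent_id": c.get("parent_id"),
            "author": c.get("author"),
            "body": c.get("body", ""),
            "score": c.get("score", 0),
            "created_utc": _iso(c.get("created_utc", 0)),
            "depth": depth,
            "is_submitter": bool(c.get("is_submitter")),
        }
        replies = c.get("replies")
        if isinstance(replies, dict):  # Reddit uses "" when a comment has no replies
            yield from iter_comments(replies, depth + 1)


def flatten_comments(thread: list[Any], limit: int = 100) -> list[dict[str, Any]]:
    """`thread` is the [post_listing, comment_listing] array from a post's .json URL."""
    return list(islice(iter_comments(thread[1]), limit))
```

Because the walk is depth-first, a `limit` such as `commentsLimit` keeps whole reply chains near the top of the thread rather than only top-level comments.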
Image post with media
```json
{
  "type": "post",
  "id": "1hk3def",
  "title": "My home office setup",
  "post_type": "image",
  "has_media": true,
  "media_urls": {
    "images": [
      "https://i.redd.it/abc123.jpg",
      "https://preview.redd.it/xyz789.jpg"
    ],
    "videos": [],
    "galleries": []
  },
  "media_downloaded": true
}
```
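Reddit exposes media through several fields of a post's `data` object: the link `url` (for example an `i.redd.it` image), HTML-escaped preview images under `preview.images[].source.url`, hosted video under `media.reddit_video.fallback_url`, and gallery items under `gallery_data` plus `media_metadata`. The sketch below is an assumption about how such URLs could be gathered into the `images`/`videos`/`galleries` shape shown above, not the Actor's actual code.

```python
import html
from typing import Any

IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def extract_media_urls(d: dict[str, Any]) -> dict[str, list[str]]:
    """Collect image, video, and gallery URLs from a post's `data` object."""
    media: dict[str, list[str]] = {"images": [], "videos": [], "galleries": []}
    url = d.get("url_overridden_by_dest") or d.get("url") or ""
    if url.lower().split("?")[0].endswith(IMAGE_EXTS):
        media["images"].append(url)
    # Preview URLs are HTML-escaped in the JSON (e.g. "&amp;"), so unescape them.
    for img in (d.get("preview") or {}).get("images", []):
        src = html.unescape(img.get("source", {}).get("url", ""))
        if src and src not in media["images"]:
            media["images"].append(src)
    # `media` is null for most posts, hence the `or {}` guards.
    video = (d.get("media") or {}).get("reddit_video") or {}
    if d.get("is_video") and video.get("fallback_url"):
        media["videos"].append(video["fallback_url"])
    # Galleries: item order comes from gallery_data, URLs from media_metadata.
    metadata = d.get("media_metadata") or {}
    for item in (d.get("gallery_data") or {}).get("items", []):
        source = metadata.get(item.get("media_id"), {}).get("s", {})
        if "u" in source:
            media["galleries"].append(html.unescape(source["u"]))
    return media
```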
Cost estimation
This Actor uses minimal resources:
| Memory | Speed | Approx. cost per 1,000 posts |
|---|---|---|
| 128 MB | ~2 posts/sec | ~$0.05 |
| 256 MB | ~3 posts/sec | ~$0.08 |
Note: Enabling `scrapeComments` or `downloadMedia` increases runtime and cost.
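The table translates into simple arithmetic: at ~2 posts/sec, 1,000 posts take roughly 500 seconds (about 8 minutes). The hypothetical `estimate` helper below uses only the two table rows and ignores comments and media downloads, so its figures are a lower bound.

```python
# Memory (MB) -> (approx. posts per second, approx. USD per 1,000 posts), from the table.
TABLE: dict[int, tuple[float, float]] = {128: (2.0, 0.05), 256: (3.0, 0.08)}


def estimate(posts: int, memory_mb: int = 128) -> tuple[float, float]:
    """Return (approximate minutes, approximate USD) for scraping `posts` posts."""
    rate, cost_per_1000 = TABLE[memory_mb]
    minutes = posts / rate / 60
    cost = posts / 1000 * cost_per_1000
    return minutes, cost
```

For example, `estimate(1000, 256)` gives about 5.6 minutes and $0.08.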
Features
| Feature | Supported |
|---|---|
| Subreddit scraping | Yes |
| User profile scraping | Yes |
| Comments extraction | Yes |
| Media URL extraction | Yes |
| Media file download | Yes |
| Gallery support | Yes |
| Resume on failure | Yes |
| Proxy support | Yes |
| No login required | Yes |
| No API key needed | Yes |
How it works
- Fetches posts via Reddit's public JSON endpoints (`old.reddit.com/r/{sub}.json`)
- Falls back to mirror instances (Redlib) if Reddit blocks the request
- Extracts structured data from the JSON responses
- Optionally fetches comments for each post
- Optionally downloads media to the Apify Key-Value Store
- Saves progress so long scrapes can resume after a failure
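The fetch step relies on Reddit's public listing URLs, which accept `limit` (up to 100 items per page), `t` for the `top` time filter, and an `after` cursor for the next page. The sketch below assumes subreddit listings at `/r/{sub}/{sort}.json` and user posts at `/user/{name}/submitted.json?sort=...`; `listing_url`, `paginate`, and the `state` dict used for resuming are hypothetical names, and `fetch_json` stands in for whatever HTTP client (and proxy) you use.

```python
from typing import Any, Callable, Iterator, Optional
from urllib.parse import urlencode

BASE = "https://old.reddit.com"


def listing_url(target: str, is_user: bool, sort: str, timeframe: str,
                after: Optional[str] = None, page_size: int = 100) -> str:
    """Build a public JSON listing URL (assumed layout; see the lead-in)."""
    params: dict[str, Any] = {"limit": page_size}
    if is_user:
        path = f"/user/{target}/submitted.json"
        params["sort"] = sort
    else:
        path = f"/r/{target}/{sort}.json"
    if sort == "top":
        params["t"] = timeframe  # the timeframe input applies to the top sort
    if after:
        params["after"] = after  # fullname of the last item on the previous page
    return f"{BASE}{path}?{urlencode(params)}"


def paginate(fetch_json: Callable[[str], dict[str, Any]], state: dict[str, Any],
             target: str, is_user: bool, sort: str, timeframe: str,
             limit: int) -> Iterator[dict[str, Any]]:
    """Yield raw post `data` objects; `state` keeps the cursor and count for resuming."""
    while state.get("count", 0) < limit:
        url = listing_url(target, is_user, sort, timeframe, state.get("after"))
        page = fetch_json(url)["data"]
        remaining = limit - state.get("count", 0)
        posts = [child["data"] for child in page["children"]][:remaining]
        yield from posts
        # Commit progress only after the whole page is consumed; if the run dies
        # mid-page, a resumed run fetches that page again (possible duplicates).
        state["count"] = state.get("count", 0) + len(posts)
        state["after"] = page.get("after")
        if not posts or not state["after"]:
            break  # empty page or end of listing
```

In an Apify Actor, `state` could be persisted between runs (for example in the key-value store) and passed back in, so a restarted run continues from the last committed cursor.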
Limitations
- Reddit rate limits apply (about 100 requests per minute)
- Some subreddits may be private or quarantined
- Reddit listings stop at roughly 1,000 posts, so older posts beyond that point may not be reachable through pagination
- Media download is limited to 5 images and 2 videos per post to manage storage
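To stay under a limit of roughly 100 requests per minute, a client can space requests at least 0.6 seconds apart (60 / 100). `RequestPacer` below is a hypothetical fixed-interval pacer, not the Actor's implementation; the clock and sleep functions are injectable so the pacing can be tested without real waiting.

```python
import time
from typing import Callable


class RequestPacer:
    """Hypothetical pacer: starts requests at least 60 / per_minute seconds apart."""

    def __init__(self, per_minute: int = 100,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 60.0 / per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_at = 0.0  # earliest time the next request may start

    def wait(self) -> None:
        """Block until the next request slot, then reserve the slot after it."""
        now = self.clock()
        if now < self.next_at:
            self.sleep(self.next_at - now)
            now = self.next_at
        self.next_at = now + self.interval
```

Call `pacer.wait()` before each HTTP request. Fixed spacing is simpler than a token bucket but never allows bursts; a real client would also back off when Reddit answers with HTTP 429.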
FAQ
Is it legal to scrape Reddit?
Web scraping publicly available data is generally legal. However, always review Reddit's Terms of Service and robots.txt. This Actor only accesses public data and respects rate limits.
Why do I need a proxy?
Reddit blocks most datacenter IP addresses. Using Apify's RESIDENTIAL proxy group ensures reliable access.
How is this different from Reddit's official API?
Reddit's official API requires registration, has strict rate limits (100 requests/minute for OAuth), and introduced paid access tiers in 2023. This Actor uses public JSON endpoints with no authentication needed.
Can I scrape private subreddits?
No, this Actor only accesses publicly available content.
Where are downloaded media files stored?
Media files are saved to the Apify Key-Value Store with keys like `media/{post_id}/image_0.jpg`. You can access them in the Storage tab of Apify Console.
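The key pattern from this answer, combined with the per-post caps listed under Limitations, suggests a simple download plan. The sketch below is hypothetical: the `video_{n}` key names, the fallback extensions, and counting gallery images against the 5-image cap are assumptions; only the `media/{post_id}/image_0.jpg` pattern comes from this page.

```python
import os
from urllib.parse import urlparse

MAX_IMAGES, MAX_VIDEOS = 5, 2  # per-post caps listed under Limitations


def plan_media_keys(post_id: str, media_urls: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return hypothetical (key, url) pairs for the files to download."""
    plan = []
    # Assumption: gallery images share the image cap with regular images.
    images = (media_urls.get("images", []) + media_urls.get("galleries", []))[:MAX_IMAGES]
    for i, url in enumerate(images):
        ext = os.path.splitext(urlparse(url).path)[1] or ".jpg"
        plan.append((f"media/{post_id}/image_{i}{ext}", url))
    for i, url in enumerate(media_urls.get("videos", [])[:MAX_VIDEOS]):
        ext = os.path.splitext(urlparse(url).path)[1] or ".mp4"
        plan.append((f"media/{post_id}/video_{i}{ext}", url))  # hypothetical naming
    return plan
```

Taking the extension from the URL path (ignoring any query string) keeps preview URLs such as `https://preview.redd.it/xyz789.jpg?width=640` mapped to `.jpg`.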
