⚡ Reddit Lightning Scraper

Pricing: from $1.40 / 1,000 results

⚡ Fast: runs without a browser. 🔑 No API: does not depend on the official Reddit API, unlike API-based scrapers.

Rating: 0.0 (0)

Developer: AbotAPI (Maintained by Community)

Actor stats: 0 bookmarked, 3 total users, 2 monthly active users, last modified 16 days ago

🤖 Reddit Scraper

Scrape Reddit posts, comments, and media from any subreddit or user profile. 🔓 No login required, no API keys needed.

🚀 What does Reddit Scraper do?

Reddit Scraper is a lightweight Apify Actor that extracts posts and comments from Reddit without using a browser. It uses Reddit's JSON API endpoints directly, making it fast and resource-efficient (runs on just 128-256MB memory).

Unlike browser-based scrapers, this Actor:

  • ⚡ Starts instantly (no browser launch time)
  • 💾 Uses minimal compute resources
  • 🔄 Handles rate limiting automatically
  • 💪 Supports resume on failure

📊 What data can you extract?

| Data Type | Fields Extracted |
| --- | --- |
| 📝 Posts | ID, title, author, subreddit, score, upvote ratio, comments count, post type, URL, text content, flair, awards, NSFW/spoiler flags |
| 💬 Comments | ID, author, body text, score, depth, parent ID, timestamps |
| 🖼️ Media | Image URLs, video URLs, gallery items, Reddit-hosted media |

💡 Use cases

  • 📈 Market Research - Monitor product discussions, brand mentions, and competitor analysis
  • 🎭 Sentiment Analysis - Gather opinions on topics, products, or events for NLP processing
  • 📰 Content Curation - Collect trending posts and media for content aggregation
  • 🎓 Academic Research - Build datasets for social media studies and behavioral analysis
  • 🎯 Lead Generation - Find potential customers discussing relevant topics
  • 📊 Trend Monitoring - Track emerging topics and viral content in specific communities

🛠️ How to use Reddit Scraper

  1. 🎯 Enter a target - Subreddit name (e.g., python, askreddit) or username
  2. 🔢 Set the limit - How many posts to scrape (1-1000)
  3. 📑 Choose sort order - new, hot, top, or rising
  4. ⚙️ Enable options - Comments, media extraction, or media download
  5. ▶️ Run the Actor - Results are saved to the dataset

📌 Example: Scrape r/python

{
    "target": "python",
    "limit": 100,
    "sort": "hot",
    "scrapeComments": true,
    "extractMediaUrls": true
}

👤 Example: Scrape a user's posts

{
    "target": "spez",
    "isUser": true,
    "limit": 50,
    "sort": "top",
    "timeframe": "year"
}

⚙️ Input parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| target | string | Subreddit name or username (required) | - |
| isUser | boolean | Set true if target is a username | false |
| limit | integer | Maximum posts to scrape (1-1000) | 10 |
| sort | string | Sort order: new, hot, top, rising | new |
| timeframe | string | Time filter for top sort: hour, day, week, month, year, all | all |
| scrapeComments | boolean | Extract comments for each post | false |
| commentsLimit | integer | Max comments per post (1-500) | 100 |
| extractMediaUrls | boolean | Extract image/video URLs from posts | true |
| downloadMedia | boolean | Download media files to Key-Value Store | false |
| proxy | object | Proxy configuration (RESIDENTIAL recommended) | - |
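
If you build the input programmatically, it can help to apply the documented defaults and ranges before starting a run. The following is a minimal sketch, not the Actor's own validation code: `normalize_input` is a hypothetical helper, and the defaults and limits are simply copied from the parameter table above.

```python
from typing import Any

# Defaults and allowed values copied from the parameter table above.
DEFAULTS: dict[str, Any] = {
    "isUser": False,
    "limit": 10,
    "sort": "new",
    "timeframe": "all",
    "scrapeComments": False,
    "commentsLimit": 100,
    "extractMediaUrls": True,
    "downloadMedia": False,
}
SORTS = {"new", "hot", "top", "rising"}
TIMEFRAMES = {"hour", "day", "week", "month", "year", "all"}


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: fill in defaults and reject out-of-range values."""
    target = str(raw.get("target") or "").strip()
    if not target:
        raise ValueError("target is required")
    # User-supplied values override the defaults.
    merged = {**DEFAULTS, **raw, "target": target}
    if not 1 <= merged["limit"] <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    if not 1 <= merged["commentsLimit"] <= 500:
        raise ValueError("commentsLimit must be between 1 and 500")
    if merged["sort"] not in SORTS:
        raise ValueError(f"sort must be one of {sorted(SORTS)}")
    if merged["timeframe"] not in TIMEFRAMES:
        raise ValueError(f"timeframe must be one of {sorted(TIMEFRAMES)}")
    return merged
```

For example, `normalize_input({"target": "python", "limit": 100, "sort": "hot"})` returns the same input with the remaining parameters set to their defaults.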

🌐 Proxy configuration

Reddit blocks most datacenter IPs. For reliable scraping, enable Apify Proxy with RESIDENTIAL group:

{
    "proxy": {
        "useApifyProxy": true,
        "apifyProxyGroups": ["RESIDENTIAL"],
        "apifyProxyCountry": "US"
    }
}
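
Inside an Actor run the Apify SDK resolves this object for you. If you want to reuse the same settings from a standalone script, the sketch below turns the `proxy` object into a proxy URL. It assumes the Apify Proxy connection-string format (`groups-...,country-...` as the username, your proxy password, host `proxy.apify.com:8000`); `apify_proxy_url` is a hypothetical helper, so check the format against the Apify Proxy documentation before relying on it.

```python
from urllib.parse import quote


def apify_proxy_url(proxy: dict, password: str) -> str:
    """Hypothetical helper: build a proxy URL from the `proxy` input object."""
    if not proxy.get("useApifyProxy"):
        raise ValueError("this sketch only handles useApifyProxy=true")
    parts = []
    groups = proxy.get("apifyProxyGroups") or []
    if groups:
        # Assumption: multiple groups are joined with "+".
        parts.append("groups-" + "+".join(groups))
    country = proxy.get("apifyProxyCountry")
    if country:
        parts.append("country-" + country)
    # Assumption: "auto" is the username when no parameters are set.
    username = ",".join(parts) or "auto"
    return f"http://{username}:{quote(password, safe='')}@proxy.apify.com:8000"
```

The resulting string can be passed to any HTTP client that accepts a proxy URL.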

📤 Output examples

📝 Post output

{
    "type": "post",
    "id": "1hj2abc",
    "title": "What's your favorite Python library?",
    "author": "pythonista123",
    "subreddit": "python",
    "created_utc": "2024-12-20T15:30:00",
    "permalink": "https://reddit.com/r/python/comments/1hj2abc/...",
    "url": "https://reddit.com/r/python/comments/1hj2abc/...",
    "score": 1542,
    "upvote_ratio": 0.96,
    "num_comments": 234,
    "num_crossposts": 5,
    "selftext": "I've been exploring different libraries and wanted to hear what everyone's using...",
    "post_type": "text",
    "is_nsfw": false,
    "is_spoiler": false,
    "flair": "Discussion",
    "total_awards": 3,
    "has_media": false,
    "media_urls": {
        "images": [],
        "videos": [],
        "galleries": []
    }
}

💬 Comment output

{
    "type": "comment",
    "comment_id": "kx7y9z",
    "post_permalink": "https://reddit.com/r/python/comments/1hj2abc/...",
    "post_title": "What's your favorite Python library?",
    "parent_id": "t3_1hj2abc",
    "author": "dev_guru",
    "body": "Pandas is absolutely essential for data work. Can't imagine doing analysis without it.",
    "score": 89,
    "created_utc": "2024-12-20T16:45:00",
    "depth": 0,
    "is_submitter": false
}
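
Posts and comments arrive as separate dataset items, so downstream code usually needs to join them. The sketch below groups comments under their post. It assumes that a comment's `post_permalink` is identical to the `permalink` of its post (the examples above are truncated, so this is not confirmed); `attach_comments` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Iterable


def attach_comments(items: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical helper: return posts with a `comments` list attached."""
    posts: dict[str, dict[str, Any]] = {}
    comments: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for item in items:
        if item.get("type") == "post":
            posts[item["permalink"]] = {**item, "comments": []}
        elif item.get("type") == "comment":
            # Assumption: post_permalink matches the post's permalink exactly.
            comments[item["post_permalink"]].append(item)
    for permalink, post in posts.items():
        # Top-level comments (depth 0) first; order within a depth is kept.
        post["comments"] = sorted(
            comments.get(permalink, []), key=lambda c: c.get("depth", 0)
        )
    return list(posts.values())
```

Comments whose post is missing from the input are dropped by this sketch.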

🖼️ Image post with media

{
    "type": "post",
    "id": "1hk3def",
    "title": "My home office setup",
    "post_type": "image",
    "has_media": true,
    "media_urls": {
        "images": [
            "https://i.redd.it/abc123.jpg",
            "https://preview.redd.it/xyz789.jpg"
        ],
        "videos": [],
        "galleries": []
    },
    "media_downloaded": true
}
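
To work with media in bulk, you may want a flat list of URLs per post. This is a small sketch under one assumption: entries in `images`, `videos`, and `galleries` are plain URL strings, as in the example above. The shape of gallery entries is not shown in this README, so non-string entries are skipped. `collect_media_urls` is a hypothetical helper.

```python
from typing import Any


def collect_media_urls(post: dict[str, Any]) -> list[str]:
    """Hypothetical helper: flatten media_urls into a de-duplicated list."""
    media = post.get("media_urls") or {}
    urls: list[str] = []
    for kind in ("images", "videos", "galleries"):
        for entry in media.get(kind) or []:
            # Skip anything that is not a plain URL string.
            if isinstance(entry, str) and entry not in urls:
                urls.append(entry)
    return urls
```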

💰 Cost estimation

This Actor uses minimal resources:

| Memory | Speed | Cost per 1,000 posts |
| --- | --- | --- |
| 128 MB | ~2 posts/sec | ~$0.05 |
| 256 MB | ~3 posts/sec | ~$0.08 |

💡 Note: Enabling scrapeComments or downloadMedia increases runtime and cost.
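
For a rough budget, the table can be scaled linearly. The sketch below does only that: it assumes a posts-only run and treats the approximate figures above as exact, so read the result as an order-of-magnitude estimate. `estimate` is a hypothetical helper.

```python
# Figures copied from the table above: memory (MB) -> (posts/sec, USD per 1,000 posts).
PROFILES = {128: (2.0, 0.05), 256: (3.0, 0.08)}


def estimate(posts: int, memory_mb: int = 128) -> tuple[float, float]:
    """Hypothetical helper: return (seconds, usd) by scaling the table linearly."""
    posts_per_sec, usd_per_1000 = PROFILES[memory_mb]
    seconds = posts / posts_per_sec
    usd = posts / 1000 * usd_per_1000
    return seconds, usd
```

At 128 MB, 1,000 posts work out to about 500 seconds of runtime at the listed speed.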

✅ Features

| Feature | Supported |
| --- | --- |
| 🏠 Subreddit scraping | ✅ Yes |
| 👤 User profile scraping | ✅ Yes |
| 💬 Comments extraction | ✅ Yes |
| 🔗 Media URL extraction | ✅ Yes |
| 📥 Media file download | ✅ Yes |
| 🖼️ Gallery support | ✅ Yes |
| 🔄 Resume on failure | ✅ Yes |
| 🌐 Proxy support | ✅ Yes |
| 🔓 No login required | ✅ Yes |
| 🔑 No API key needed | ✅ Yes |

⚙️ How it works

  1. 📡 Fetches posts via Reddit's JSON API (old.reddit.com/r/{sub}.json)
  2. 🔀 Falls back to mirrors if Reddit blocks the request (Redlib instances)
  3. 🔍 Extracts structured data from JSON responses
  4. 💬 Optionally fetches comments for each post
  5. 📥 Optionally downloads media to Apify Key-Value Store
  6. 💾 Saves progress for resume capability on long scrapes
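
Steps 1 and 3 can be sketched with the standard library alone. This is an illustration of the general technique, not the Actor's source code: it pages through a subreddit listing with Reddit's `after` cursor and yields the raw post objects. The function names, the User-Agent string, and the `/r/{sub}/{sort}.json` URL form are assumptions; mirror fallback, proxies, and user profiles are left out.

```python
import json
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def listing_url(subreddit: str, sort: str = "new", timeframe: str = "all",
                after: Optional[str] = None) -> str:
    """Build the URL for one page of a subreddit listing."""
    params: dict = {"limit": 100}
    if sort == "top":
        params["t"] = timeframe  # the time filter only applies to "top"
    if after:
        params["after"] = after  # cursor returned by the previous page
    return f"https://old.reddit.com/r/{subreddit}/{sort}.json?{urlencode(params)}"


def fetch_json(url: str) -> dict:
    """Default fetcher; the User-Agent value here is a placeholder."""
    request = Request(url, headers={"User-Agent": "example-script/0.1"})
    with urlopen(request, timeout=30) as response:
        return json.load(response)


def iter_posts(subreddit: str, limit: int, sort: str = "new", timeframe: str = "all",
               fetch: Callable[[str], dict] = fetch_json) -> Iterator[dict]:
    """Yield up to `limit` raw post objects, following the `after` cursor."""
    after = None
    seen = 0
    while seen < limit:
        page = fetch(listing_url(subreddit, sort, timeframe, after))["data"]
        for child in page["children"]:
            yield child["data"]
            seen += 1
            if seen >= limit:
                return
        after = page.get("after")
        if not after:  # no further pages
            return
```

Passing `fetch` as a parameter keeps the pagination logic testable without network access and leaves room to plug in a proxied or rate-limited fetcher.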

⚠️ Limitations

  • 🚦 Reddit rate limits apply (~100 requests per minute)
  • 🔒 Some subreddits may be private or quarantined
  • 📜 Reddit listings return roughly the first 1,000 items, so older posts beyond that point may not be accessible via pagination
  • 📦 Media download is limited to 5 images + 2 videos per post to manage storage
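
If you call Reddit from your own code, staying under a budget like ~100 requests per minute comes down to spacing requests about 0.6 seconds apart. The class below is one simple way to do that. It is a sketch and says nothing about how the Actor itself throttles requests; `RateLimiter` and its parameters are hypothetical.

```python
import time
from typing import Callable, Optional


class RateLimiter:
    """Space calls at least 60 / per_minute seconds apart."""

    def __init__(self, per_minute: float = 100,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_ok: Optional[float] = None  # earliest time the next call may run

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0 if self._next_ok is None else max(0.0, self._next_ok - now)
        if delay:
            self._sleep(delay)
        self._next_ok = now + delay + self.interval
        return delay
```

Call `limiter.wait()` before each request. The clock and sleep functions are injectable so the behaviour can be checked without real waiting.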

❓ FAQ

Is it legal to scrape Reddit?

Web scraping publicly available data is generally legal. However, always review Reddit's Terms of Service and robots.txt. This Actor only accesses public data and respects rate limits.

🌐 Why do I need a proxy?

Reddit blocks most datacenter IP addresses. Using Apify's RESIDENTIAL proxy group ensures reliable access.

🔄 How is this different from Reddit's official API?

Reddit's official API requires registration, has strict rate limits (100 requests/minute for OAuth), and recently introduced paid tiers. This Actor uses public JSON endpoints with no authentication needed.

🔒 Can I scrape private subreddits?

No, this Actor only accesses publicly available content.

📁 Where are downloaded media files stored?

Media files are saved to Apify's Key-Value Store with keys like media/{post_id}/image_0.jpg. You can access them via the Storage tab in Apify Console.