Reddit Scraper (Cheerio)
Fast, Robust, and Cost-Effective. Scrape Reddit posts from any subreddit with advanced sorting, smart pagination, and proxy flexibility.
Why this scraper?
Many Reddit scrapers are slow, heavy, or get blocked easily. This actor is designed for performance and stability.
- Lightweight: Uses Cheerio (raw HTTP) instead of heavy browsers, making it 10x cheaper to run.
- Smart Pagination: Automatically reverse-engineers Reddit's internal pagination to fetch "more posts" efficiently.
- Proxy Freedom: Works with Datacenter proxies (cheap) and Residential proxies (reliable).
- Rich Data: Extracts detailed post metrics (upvotes, comments), media links, and text content.
Perfect for:
- Trend Analysis: Monitor trending topics.
- Sentiment Analysis: Analyze user discussions and opinions.
- AI Training: Gather diverse text datasets for LLMs.
- Brand Monitoring: Track mentions of your brand across communities.
How to Use
Option 1: Apify Console (No Coding)
- Go to the Input tab.
- Enter the Subreddit Name (e.g., `technology`, `funny`, `dataisbeautiful`).
- (Optional) Select Sort By (`Hot`, `New`, `Top`, `Rising`).
- Proxy Selection:
  - Use Datacenter (default) for speed/cost.
  - Switch to Residential if you see "403 Forbidden" errors.
- Click Start.
- Download your data in JSON, CSV, or Excel.
Option 2: API (Developers)
You can trigger this actor programmatically via REST API, Python, or Node.js.
Input Payload (JSON)
{"subreddit": "technology","sort": "hot",// Options: "hot", "new", "top", "rising""maxRequestsPerCrawl": 1,// 1 Request β 1 Page (or batch of ~25 posts).// Set to 2 for ~28 posts. Set to 10 for ~200+ posts."proxyConfiguration": {"useApifyProxy": true,"apifyProxyGroups": ["RESIDENTIAL"] // Optional}}
Python Example (Simple & Clean)
This script runs the scraper and saves the results to a local file.
```python
import json
from apify_client import ApifyClient

# 1. Configuration
APIFY_TOKEN = 'YOUR_APIFY_TOKEN'
ACTOR_ID = 'daddyapi/reddit-cheerio-scraper'
client = ApifyClient(APIFY_TOKEN)

# 2. Define input
run_input = {
    "subreddit": "artificial",
    "sort": "top",
    "maxRequestsPerCrawl": 1,  # Fetches approx 50-75 posts
    "proxyConfiguration": {
        "useApifyProxy": True,
        # Uncomment below to use Residential proxies if Datacenter gets blocked
        # "apifyProxyGroups": ["RESIDENTIAL"]
    }
}

print(f"Starting scraper for r/{run_input['subreddit']}...")

# 3. Run the actor
run = client.actor(ACTOR_ID).call(run_input=run_input)
if not run:
    print("Failed to start run.")
    exit(1)

print(f"Run finished! Status: {run['status']}")

# 4. Fetch & save results
dataset_client = client.dataset(run["defaultDatasetId"])
items = dataset_client.list_items().items

filename = "reddit_data.json"
with open(filename, "w", encoding="utf-8") as f:
    json.dump(items, f, indent=2, ensure_ascii=False)

print(f"Saved {len(items)} posts to {filename}")
```
Proxy Configuration
This actor is fully compatible with Apify Proxy (Datacenter & Residential) and Custom Proxies.
1. Datacenter Proxies (Cost-Effective)
Great for high-volume users who want to control costs. Note that Reddit sometimes blocks these.
{"proxyConfiguration": {"useApifyProxy": true}}
2. Residential Proxies (Best Reliability)
Recommended. Residential proxies are harder to block and provide the highest success rate. Use this if you are getting empty results or 403 errors.
{"proxyConfiguration": {"useApifyProxy": true,"apifyProxyGroups": ["RESIDENTIAL"]}}
3. Bring Your Own Proxies (Custom URLs)
If you have proxies from an external provider (Webshare, BrightData, Smartproxy, etc.), you can pass the connection strings directly.
{"proxyConfiguration": {"useApifyProxy": false,"proxyUrls": ["http://username:password@my-proxy.example.com:8000","http://username:password@my-proxy-2.example.com:8000"]}}
Data Output
The scraper returns structured data for every post:
{"post_kind": "t3","author": "TechEnthusiast","author_id": "t2_8a7b3c","time_posted": "2023-10-27T10:00:00.000Z","title": "The Future of AI in 2026","body_text": "Here is a deep dive into what we can expect...","permalink": "/r/technology/comments/18x9z/the_future_of_ai/","comment_count": "452","score": "1500","content_href": "https://i.redd.it/example_image.jpg","external_links": ["https://openai.com/blog", "https://wired.com/ai-news"]}
Troubleshooting
- Why am I only getting 3 posts?
  - Reddit's initial page load is small. The scraper simulates scrolling by fetching subsequent batches.
- "Request blocked (403)" error?
  - Reddit aggressively blocks Datacenter IPs.
  - Fix: Switch to Residential Proxies in the input configuration (see the sketch after this list).
- Scraper stops early?
  - Ensure you have enough memory allocated (256 MB is usually enough, but 512 MB is safer for very long crawls).
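One way to automate the 403 fix is to retry with Residential proxies whenever a Datacenter run comes back empty. A minimal sketch using the Python client; the token and subreddit are placeholders.

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")  # placeholder token
ACTOR_ID = "daddyapi/reddit-cheerio-scraper"

base_input = {"subreddit": "technology", "sort": "hot", "maxRequestsPerCrawl": 1}

# First attempt: cheap Datacenter proxies.
run = client.actor(ACTOR_ID).call(
    run_input={**base_input, "proxyConfiguration": {"useApifyProxy": True}}
)
items = client.dataset(run["defaultDatasetId"]).list_items().items

# Empty results usually mean Reddit blocked the Datacenter IP (403),
# so retry once with Residential proxies.
if not items:
    run = client.actor(ACTOR_ID).call(
        run_input={
            **base_input,
            "proxyConfiguration": {
                "useApifyProxy": True,
                "apifyProxyGroups": ["RESIDENTIAL"],
            },
        }
    )
    items = client.dataset(run["defaultDatasetId"]).list_items().items

print(f"Got {len(items)} posts")
```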
Legal & Ethics
This scraper is for educational and analytical purposes. Please respect Reddit's Terms of Service and robots.txt. Do not use this tool to spam communities or overload their servers. Use responsible rate limits.