Reddit Subreddit Scraper
Pricing
Pay per usage
Reddit Subreddit Scraper is your plug-and-play radar for Reddit communities: it harvests fresh stats from 100+ subreddits via Apify Residential proxies, returns clean JSON, and drops straight into AI pipelines or dashboards within minutes.
Rating
5.0
(2)
Actor stats
- Bookmarked: 2
- Total users: 3
- Monthly active users: 1
- Last modified: 8 days ago
Reddit Subreddit Scraper & Proxy Pipeline
Fetch live subreddit metadata with built-in ban resistance, smart fallbacks, and automatic proxy rotation.
This Actor is designed to be the most reliable way to check subreddit details (subscribers, title, description, active users, etc.) without getting blocked. It intelligently switches between Reddit's API and web endpoints to ensure data delivery.
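The API-first, web-fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual code: the `fetch_subreddit` function, the injectable `get` callable, and the API URL are assumptions made for the example (only the `about.json` web URL appears in this page's error output), and the sketch omits the retries and proxy rotation the Actor performs.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical API endpoint, assumed for illustration only.
API_URL = "https://oauth.reddit.com/r/{name}/about"
# Web endpoint, matching the URL shape shown in the error example.
WEB_URL = "https://www.reddit.com/r/{name}/about.json"

# Status codes that the README says trigger the fallback.
BLOCKED_STATUSES = {403, 429}

# Any async callable that takes a URL and returns (status_code, parsed_body).
Fetch = Callable[[str], Awaitable[tuple[int, Any]]]


async def fetch_subreddit(name: str, get: Fetch) -> dict[str, Any]:
    """Try the API first; on 403/429, try the web endpoint once."""
    attempts = 1
    used_fallback = False
    status, body = await get(API_URL.format(name=name))
    if status in BLOCKED_STATUSES:
        used_fallback = True
        attempts += 1
        status, body = await get(WEB_URL.format(name=name))
    return {
        "status": "success" if status == 200 else "error",
        "http_status": status,
        "response": body if status == 200 else None,
        "used_web_fallback": used_fallback,
        "attempts_used": attempts,
    }


async def fake_get(url: str) -> tuple[int, Any]:
    """Stand-in transport: the API is rate limited, the web endpoint works."""
    if url.startswith("https://oauth."):
        return 429, None
    return 200, {"kind": "t5", "data": {"display_name": "AskReddit"}}


if __name__ == "__main__":
    print(asyncio.run(fetch_subreddit("AskReddit", fake_get)))
```

Passing the transport in as a callable keeps the fallback logic testable without network access; in practice `get` could wrap an `httpx.AsyncClient` request.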
🚀 Key Features
- 🛡️ Smart Ban Resistance: Automatically rotates Apify Residential Proxies for every single request.
- 🔄 Dual-Mode Fetching: Tries the official Reddit API first. If blocked (403/429), it seamlessly falls back to web scraping (`/r/name/about`) to get the data.
- ⚡ High Performance: Built on `asyncio` and `httpx` (HTTP/2) for maximum concurrency and speed.
- 🔗 Flexible Inputs: Accepts any format:
  - Subreddit names: `r/AskReddit`, `AskReddit`
  - Full URLs: `https://www.reddit.com/r/Python`
  - Reddit Fullnames: `t5_2qh1i`
- 📊 Rich Diagnostics: Returns detailed stats for every item, including HTTP status codes, attempt counts, and whether the fallback was used.
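To make the three accepted input formats concrete, here is one way such inputs could be normalized. It is a sketch under assumptions: `classify_input` is a hypothetical helper, and the `"url"` and `"fullname"` type labels are guesses (only `"name"` appears in the sample output on this page).

```python
import re
from urllib.parse import urlparse

FULLNAME_RE = re.compile(r"^t5_[0-9a-z]+$")   # Reddit fullname for a subreddit
NAME_RE = re.compile(r"^[A-Za-z0-9_]+$")      # plain subreddit name


def classify_input(raw: str) -> dict[str, str]:
    """Map a raw input string to an input type and identifier."""
    text = raw.strip()

    # 1. Fullnames such as t5_2qh1i are passed through unchanged.
    if FULLNAME_RE.match(text):
        return {"input_raw": raw, "input_type": "fullname", "identifier_value": text}

    # 2. Full URLs: take the path segment that follows /r/.
    if text.startswith(("http://", "https://")):
        parts = [p for p in urlparse(text).path.split("/") if p]
        if len(parts) >= 2 and parts[0].lower() == "r":
            return {"input_raw": raw, "input_type": "url", "identifier_value": parts[1]}
        raise ValueError(f"URL does not point at a subreddit: {raw!r}")

    # 3. Names, with or without the r/ prefix.
    name = text.strip("/")
    if name.lower().startswith("r/"):
        name = name[2:]
    if NAME_RE.match(name):
        return {"input_raw": raw, "input_type": "name", "identifier_value": name}
    raise ValueError(f"Unrecognized subreddit input: {raw!r}")
```

Checking the fullname pattern first means a string like `t5_2qh1i` is never mistaken for a subreddit name.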
📥 Input Configuration
The Actor accepts a simple JSON input. You only need to provide the list of subreddits.
Example Input
{"subreddits": ["r/AskReddit","https://www.reddit.com/r/machinelearning","t5_2qh1i"]}
| Field | Type | Description |
|---|---|---|
| `subreddits` | Array | Required. List of subreddits to fetch. Supports `r/name`, URLs, or `t5_` IDs. |
| `proxyConfiguration` | Object | Optional. If omitted, the Actor enables Apify Residential proxies automatically (recommended for Reddit). |
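As an illustration of the two fields in the table, the following sketch assembles a run input in Python. `build_run_input` is a hypothetical helper, and the proxy object in the usage example follows Apify's usual proxy input shape (`useApifyProxy`, `apifyProxyGroups`), which is assumed rather than confirmed for this Actor.

```python
import json
from typing import Any, Optional


def build_run_input(
    subreddits: list[str],
    proxy_configuration: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build the Actor input, dropping blank entries."""
    cleaned = [s.strip() for s in subreddits if s and s.strip()]
    if not cleaned:
        raise ValueError("'subreddits' is required and must not be empty")
    run_input: dict[str, Any] = {"subreddits": cleaned}
    # Leave the key out entirely so the Actor applies its own default proxies.
    if proxy_configuration is not None:
        run_input["proxyConfiguration"] = proxy_configuration
    return run_input


if __name__ == "__main__":
    example = build_run_input(
        ["r/AskReddit", "https://www.reddit.com/r/machinelearning", "t5_2qh1i"],
        # Assumed shape of an explicit proxy configuration.
        proxy_configuration={"useApifyProxy": True, "apifyProxyGroups": ["RESIDENTIAL"]},
    )
    print(json.dumps(example, indent=2))
```

The printed JSON can be pasted into the Actor's input editor or sent as the run input through the Apify API.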
📤 Output Data
The results are stored in the default Apify Dataset. Each item represents one subreddit and contains the full response from Reddit.
Success Example
{"input_raw": "r/AskReddit","input_type": "name","identifier_value": "AskReddit","status": "success","http_status": 200,"response": {"kind": "t5","data": {"display_name": "AskReddit","title": "Ask Reddit...","subscribers": 57140321,"active_user_count": 84210,"public_description": "r/AskReddit is the place to ask and answer thought-provoking questions.","created_utc": 1201233135.0}},"used_web_fallback": false,"attempts_used": 1}
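A short sketch of reading such an item downstream, for instance before loading it into a dashboard. The `summarize_item` helper is hypothetical; the field names are taken from the success example above, trimmed to the fields used here.

```python
from datetime import datetime, timezone
from typing import Any

# Trimmed copy of the success example above.
SAMPLE_ITEM = {
    "input_raw": "r/AskReddit",
    "status": "success",
    "http_status": 200,
    "response": {
        "kind": "t5",
        "data": {
            "display_name": "AskReddit",
            "subscribers": 57140321,
            "active_user_count": 84210,
            "created_utc": 1201233135.0,
        },
    },
    "used_web_fallback": False,
    "attempts_used": 1,
}


def summarize_item(item: dict[str, Any]) -> dict[str, Any]:
    """Flatten the nested Reddit payload into a few top-level fields."""
    data = item["response"]["data"]
    created = datetime.fromtimestamp(data["created_utc"], tz=timezone.utc)
    return {
        "name": data["display_name"],
        "subscribers": data["subscribers"],
        # active_user_count may be missing, so use .get() here.
        "active_users": data.get("active_user_count"),
        "created": created.isoformat(),
    }


if __name__ == "__main__":
    print(summarize_item(SAMPLE_ITEM))
```

`created_utc` is a Unix timestamp in seconds, so converting it with an explicit UTC timezone avoids results that depend on the local timezone of the machine.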
Error Example
If a subreddit cannot be reached after multiple retries, it is marked as an error:
{"input_raw": "r/NonExistentSub123","status": "error","http_status": 404,"error": "404 Client Error: Not Found for url: https://www.reddit.com/r/NonExistentSub123/about.json","attempts_used": 6}
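Because success and error items land in the same dataset, a consumer will usually separate them first. A minimal sketch, assuming items shaped like the two examples above (`split_by_status` and `not_found_inputs` are hypothetical helpers):

```python
from typing import Any

Item = dict[str, Any]


def split_by_status(items: list[Item]) -> tuple[list[Item], list[Item]]:
    """Return (successes, failures) based on the 'status' field."""
    ok = [i for i in items if i.get("status") == "success"]
    failed = [i for i in items if i.get("status") != "success"]
    return ok, failed


def not_found_inputs(failed: list[Item]) -> list[str]:
    """Raw inputs that failed with HTTP 404, likely missing subreddits."""
    return [i["input_raw"] for i in failed if i.get("http_status") == 404]


if __name__ == "__main__":
    items = [
        {"input_raw": "r/AskReddit", "status": "success", "http_status": 200},
        {"input_raw": "r/NonExistentSub123", "status": "error", "http_status": 404},
    ]
    ok, failed = split_by_status(items)
    print(len(ok), "ok;", "not found:", not_found_inputs(failed))
```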
💡 Tips & Tricks
- Use Residential Proxies: Reddit is very strict with datacenter IPs. This Actor is pre-configured to use Residential proxies for the best success rate.
- Batch Processing: You can pass thousands of subreddits in a single run. The Actor handles concurrency automatically.
- Monitoring: The output includes `attempts_used` and `used_web_fallback`. If you see `used_web_fallback: true` often, it means the API is blocking requests, but the Actor is successfully bypassing it via the web interface.
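The monitoring tip can be turned into a quick check over a run's dataset items. This is a sketch only: `run_diagnostics` is a hypothetical helper, and it assumes each item carries the `used_web_fallback` and `attempts_used` fields shown in the output examples.

```python
from typing import Any


def run_diagnostics(items: list[dict[str, Any]]) -> dict[str, float]:
    """Compute the fallback rate and the average attempts per item."""
    total = len(items)
    if total == 0:
        return {"items": 0, "fallback_rate": 0.0, "avg_attempts": 0.0}
    # Items without the field (for example some errors) count as no fallback.
    fallbacks = sum(1 for i in items if i.get("used_web_fallback"))
    attempts = sum(i.get("attempts_used", 0) for i in items)
    return {
        "items": total,
        "fallback_rate": fallbacks / total,
        "avg_attempts": attempts / total,
    }


if __name__ == "__main__":
    sample = [
        {"used_web_fallback": False, "attempts_used": 1},
        {"used_web_fallback": True, "attempts_used": 3},
    ]
    print(run_diagnostics(sample))
```

A fallback rate that climbs across runs suggests the API path is being blocked more often, even if the items still succeed.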
License
Apache 2.0