Reddit Subreddit Scraper

Pricing: Pay per usage


Reddit Subreddit Scraper is your plug-and-play radar for Reddit communities: it harvests fresh stats from 100+ subreddits via Apify Residential proxies, returns clean JSON, and drops straight into AI pipelines or dashboards within minutes.

Rating: 5.0 (2 ratings)

Developer: D, maintained by Community

Actor stats: 2 bookmarked, 3 total users, 1 monthly active user, last modified 8 days ago

Reddit Subreddit Scraper & Proxy Pipeline

Fetch live subreddit metadata with built-in ban resistance, smart fallbacks, and automatic proxy rotation.

This Actor fetches subreddit details (subscribers, title, description, active users, etc.) while minimizing the chance of being blocked. It requests each subreddit from Reddit's API first. If the API refuses the request, it automatically switches to Reddit's web endpoints, so the data is still delivered in most cases.
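
The API-first, web-fallback behavior described above can be sketched as a small retry loop. This is a minimal sketch, not the Actor's actual code. It assumes each fetcher is a plain callable that returns an HTTP status and a parsed body; network I/O and proxy rotation are left out. The names fetch_with_fallback, fetch_api, fetch_web and max_attempts are all hypothetical.

```python
from typing import Callable, Optional, Tuple

# Hypothetical fetcher signature: takes a subreddit name and returns
# (http_status, parsed_json_or_None). A real fetcher would send the request
# through a freshly rotated proxy IP.
Fetcher = Callable[[str], Tuple[int, Optional[dict]]]

BLOCKED = {403, 429}  # statuses treated as "the API is blocking us"


def fetch_with_fallback(name: str, fetch_api: Fetcher, fetch_web: Fetcher,
                        max_attempts: int = 3) -> dict:
    """Try the API first; after a 403/429, switch to the web endpoint.

    Every fetcher call counts as one attempt. The returned dict loosely
    mirrors the Actor's dataset items (a simplifying assumption).
    """
    use_web = False
    status = 0
    for attempt in range(1, max_attempts + 1):
        fetcher = fetch_web if use_web else fetch_api
        status, body = fetcher(name)
        if status == 200 and body is not None:
            return {"identifier_value": name, "status": "success",
                    "http_status": status, "response": body,
                    "used_web_fallback": use_web, "attempts_used": attempt}
        if status in BLOCKED:
            use_web = True  # stay on the web endpoint for the remaining attempts
    return {"identifier_value": name, "status": "error",
            "http_status": status, "used_web_fallback": use_web,
            "attempts_used": max_attempts}


# Simulated fetchers: the API is always blocked, the web endpoint works.
blocked_api = lambda name: (429, None)
working_web = lambda name: (200, {"kind": "t5", "data": {"display_name": name}})
result = fetch_with_fallback("AskReddit", blocked_api, working_web)
print(result["used_web_fallback"], result["attempts_used"])  # True 2
```

Switching permanently to the web endpoint after the first block avoids spending further attempts on an API that has already refused the request. Other statuses, such as 404, are simply retried until the attempts run out.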


🚀 Key Features

  • 🛡️ Smart Ban Resistance: Automatically rotates Apify Residential Proxies on every request.
  • 🔄 Dual-Mode Fetching: Tries the official Reddit API first. If the API blocks the request (HTTP 403 or 429), the Actor falls back to Reddit's web endpoint (/r/name/about) to get the data.
  • ⚡ High Performance: Built on asyncio and httpx (HTTP/2) so that many subreddits are fetched concurrently.
  • 🔗 Flexible Inputs: Accepts three formats:
    • Subreddit names: r/AskReddit, AskReddit
    • Full URLs: https://www.reddit.com/r/Python
    • Reddit Fullnames: t5_2qh1i
  • 📊 Rich Diagnostics: Returns detailed stats for every item, including HTTP status codes, attempt counts, and whether the fallback was used.
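
One way to normalize the three accepted input formats before fetching is sketched below. It assumes that a name may carry an optional r/ prefix, that a URL holds the subreddit after /r/, and that a fullname starts with t5_. The function classify_input and its regexes are hypothetical. The "name" label matches the input_type shown in the output example, but the "url" and "fullname" labels are assumptions.

```python
import re
from typing import Tuple

# Hypothetical patterns for the three accepted input formats.
FULLNAME_RE = re.compile(r"^t5_[0-9a-z]+$", re.IGNORECASE)
URL_RE = re.compile(r"^https?://(?:[\w-]+\.)?reddit\.com/r/([A-Za-z0-9_]+)", re.IGNORECASE)
NAME_RE = re.compile(r"^(?:/?r/)?([A-Za-z0-9_]+)/?$", re.IGNORECASE)


def classify_input(raw: str) -> Tuple[str, str]:
    """Return (input_type, identifier_value) for one user-supplied string."""
    text = raw.strip()
    if FULLNAME_RE.match(text):       # checked first: "t5_..." would also match NAME_RE
        return "fullname", text
    m = URL_RE.match(text)
    if m:
        return "url", m.group(1)
    m = NAME_RE.match(text)           # "AskReddit", "r/AskReddit", "/r/AskReddit/"
    if m:
        return "name", m.group(1)
    raise ValueError(f"Unrecognized subreddit input: {raw!r}")


for raw in ["r/AskReddit", "https://www.reddit.com/r/machinelearning", "t5_2qh1i"]:
    print(raw, "->", classify_input(raw))
```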

📥 Input Configuration

The Actor accepts a simple JSON input. You only need to provide the list of subreddits.

Example Input

{
  "subreddits": [
    "r/AskReddit",
    "https://www.reddit.com/r/machinelearning",
    "t5_2qh1i"
  ]
}

Field                Type     Description
subreddits           Array    Required. List of subreddits to fetch. Supports r/name, full URLs, or t5_ fullnames.
proxyConfiguration   Object   Optional. If omitted, the Actor enables Apify Residential proxies automatically (recommended for Reddit).
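
If you prefer to set the proxy explicitly rather than rely on the default, you can assemble the input programmatically. The sketch below assumes that proxyConfiguration uses the common Apify proxy input shape (useApifyProxy, apifyProxyGroups). The helper name build_run_input is hypothetical. You can paste the resulting JSON into the Actor's input, or pass the dict as run_input when calling the Actor through the Apify API client.

```python
import json
from typing import Iterable, Optional

# Common Apify proxy input shape requesting the RESIDENTIAL group.
# Assumption: the Actor reads proxyConfiguration in this format.
RESIDENTIAL_PROXY = {"useApifyProxy": True, "apifyProxyGroups": ["RESIDENTIAL"]}


def build_run_input(subreddits: Iterable[str],
                    proxy_configuration: Optional[dict] = None) -> dict:
    """Assemble the Actor input; omit proxyConfiguration to use the default."""
    items = [s.strip() for s in subreddits if s.strip()]  # drop blank entries
    if not items:
        raise ValueError("subreddits is required and must not be empty")
    run_input = {"subreddits": items}
    if proxy_configuration is not None:
        run_input["proxyConfiguration"] = proxy_configuration
    return run_input


print(json.dumps(build_run_input(["r/AskReddit", "t5_2qh1i"], RESIDENTIAL_PROXY), indent=2))
```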

📤 Output Data

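The results are stored in the default Apify Dataset. Each item represents one subreddit. Successful items contain the full response from Reddit, and failed items contain an error message instead. Both include diagnostic fields such as http_status and attempts_used.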

Success Example

{
  "input_raw": "r/AskReddit",
  "input_type": "name",
  "identifier_value": "AskReddit",
  "status": "success",
  "http_status": 200,
  "response": {
    "kind": "t5",
    "data": {
      "display_name": "AskReddit",
      "title": "Ask Reddit...",
      "subscribers": 57140321,
      "active_user_count": 84210,
      "public_description": "r/AskReddit is the place to ask and answer thought-provoking questions.",
      "created_utc": 1201233135.0
    }
  },
  "used_web_fallback": false,
  "attempts_used": 1
}
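
Each successful item nests Reddit's payload under response.data, so downstream dashboards or pipelines often need a flat row. The sketch below assumes the field names shown in the example above. The helper to_row is hypothetical, and fields missing from a given item come back as None.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def to_row(item: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Flatten one successful dataset item; return None for error items."""
    if item.get("status") != "success":
        return None
    data = (item.get("response") or {}).get("data") or {}
    created = data.get("created_utc")
    return {
        "name": data.get("display_name"),
        "title": data.get("title"),
        "subscribers": data.get("subscribers"),
        "active_users": data.get("active_user_count"),
        # created_utc is a Unix timestamp in seconds; render it as a UTC date
        "created": (datetime.fromtimestamp(created, tz=timezone.utc).date().isoformat()
                    if created is not None else None),
        "via_fallback": item.get("used_web_fallback"),
    }
```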

Error Example

If a subreddit cannot be reached after multiple retries, it is marked as an error:

{
  "input_raw": "r/NonExistentSub123",
  "status": "error",
  "http_status": 404,
  "error": "404 Client Error: Not Found for url: https://www.reddit.com/r/NonExistentSub123/about.json",
  "attempts_used": 6
}
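
Failed subreddits are stored in the same dataset as successful ones, so it helps to split them before further processing. The sketch below assumes the status and http_status fields shown in the examples. The helper split_results is hypothetical. Treating a 404 as "this subreddit probably does not exist" is an interpretation and is not documented behavior.

```python
from typing import Any, Dict, List, Tuple

Item = Dict[str, Any]


def split_results(items: List[Item]) -> Tuple[List[Item], List[Item], List[Item]]:
    """Return (successes, not_found, other_errors) from dataset items."""
    ok, missing, failed = [], [], []
    for item in items:
        if item.get("status") == "success":
            ok.append(item)
        elif item.get("http_status") == 404:
            missing.append(item)  # likely a nonexistent or mistyped subreddit
        else:
            failed.append(item)   # e.g. repeated 403/429; may be worth re-running
    return ok, missing, failed
```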

💡 Tips & Tricks

  • Use Residential Proxies: Reddit is very strict with datacenter IPs. This Actor is pre-configured to use Residential proxies for the best success rate.
  • Batch Processing: You can pass thousands of subreddits in a single run. The Actor handles concurrency automatically.
  • Monitoring: The output includes attempts_used and used_web_fallback. If used_web_fallback is true for many items, the API is blocking requests and the Actor is getting the data through the web interface instead.
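
To act on the monitoring tip, you can aggregate the two diagnostic fields across a run. The sketch below computes the success rate, the share of successful items that needed the web fallback, and the mean attempts_used. The helper name run_health is hypothetical. What counts as a worrying fallback rate is left to you.

```python
from statistics import mean
from typing import Any, Dict, List


def run_health(items: List[Dict[str, Any]]) -> Dict[str, float]:
    """Summarize fallback usage and retry effort across dataset items."""
    successes = [i for i in items if i.get("status") == "success"]
    fallback = sum(1 for i in successes if i.get("used_web_fallback"))
    attempts = [i.get("attempts_used", 0) for i in items]
    return {
        "success_rate": len(successes) / len(items) if items else 0.0,
        "fallback_rate": fallback / len(successes) if successes else 0.0,
        "mean_attempts": mean(attempts) if attempts else 0.0,
    }
```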

License

Apache 2.0