Reddit Scraper

Cheap, fast, and reliable. Bring your own proxies.

Pricing: from $1.00 / 1,000 results
Rating: 5.0 (1 review)
Developer: DaddyAPI (Maintained by Community)
Actor stats: 1 bookmark · 2 total users · 1 monthly active user · last modified 7 days ago


Reddit Scraper (Cheerio)

Fast, Robust, and Cost-Effective. Scrape Reddit posts from any subreddit with advanced sorting, smart pagination, and proxy flexibility.

Apify Actor · Node.js

🚀 Why this scraper?

Many Reddit scrapers are slow, heavy, or get blocked easily. This actor is designed for performance and stability.

  1. Lightweight: Uses Cheerio (raw HTTP) instead of heavy browsers, making it 10x cheaper to run.
  2. Smart Pagination: Uses Reddit's reverse-engineered internal pagination to fetch "more posts" efficiently.
  3. Proxy Freedom: Works with Datacenter proxies (cheap) and Residential proxies (reliable).
  4. Rich Data: Extracts detailed post metrics (upvotes, comments), media links, and text content.

Perfect for:

  • 📈 Trend Analysis: Monitor trending topics.
  • 📒 Sentiment Analysis: Analyze user discussions and opinions.
  • 🤖 AI Training: Gather diverse text datasets for LLMs.
  • 📒 Brand Monitoring: Track mentions of your brand across communities.

📖 How to Use

Option 1: Apify Console (No Coding)

  1. Go to the Input tab.
  2. Enter the Subreddit Name (e.g., technology, funny, dataisbeautiful).
  3. (Optional) Select Sort By (Hot, New, Top, Rising).
  4. Proxy Selection:
    • Use Datacenter (Default) for speed/cost.
    • Switch to Residential if you see "403 Forbidden" errors.
  5. Click Start.
  6. Download your data in JSON, CSV, or Excel.

Option 2: API (Developers)

You can trigger this actor programmatically via REST API, Python, or Node.js.

Input Payload (JSON)

{
  "subreddit": "technology",
  "sort": "hot",            // Options: "hot", "new", "top", "rising"
  "maxRequestsPerCrawl": 1, // 1 request ≈ 1 page. The initial page is small (~3 posts);
                            // each further batch adds ~25, so 2 ≈ 28 posts, 10 ≈ 200+.
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"] // Optional
  }
}
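
The payload above can also be POSTed straight to Apify's REST API without any client library. A minimal sketch using Python's requests and the run-sync-get-dataset-items endpoint (part of Apify's public v2 API; it waits for the run to finish and returns the dataset items directly). The token is a placeholder:

# Sketch: trigger the actor over the REST API and get the items back in one call.
import requests

APIFY_TOKEN = "YOUR_APIFY_TOKEN"              # placeholder, use your own token
ACTOR_ID = "daddyapi~reddit-cheerio-scraper"  # "~" replaces "/" in URL paths

url = f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items"
payload = {
    "subreddit": "technology",
    "sort": "hot",
    "maxRequestsPerCrawl": 1,
    "proxyConfiguration": {"useApifyProxy": True},
}

resp = requests.post(url, params={"token": APIFY_TOKEN}, json=payload, timeout=300)
resp.raise_for_status()
posts = resp.json()  # list of post objects
print(f"Fetched {len(posts)} posts")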

🐍 Python Example (Simple & Clean)

This script runs the scraper and saves the results to a local file.

import json

from apify_client import ApifyClient

# 1. Configuration
APIFY_TOKEN = 'YOUR_APIFY_TOKEN'
ACTOR_ID = 'daddyapi/reddit-cheerio-scraper'
client = ApifyClient(APIFY_TOKEN)

# 2. Define input
run_input = {
    "subreddit": "artificial",
    "sort": "top",
    "maxRequestsPerCrawl": 1,  # 1 request ≈ the small initial page; raise it for more posts
    "proxyConfiguration": {
        "useApifyProxy": True,
        # Uncomment below to use Residential proxies if Datacenter gets blocked
        # "apifyProxyGroups": ["RESIDENTIAL"],
    },
}

print(f"🚀 Starting scraper for r/{run_input['subreddit']}...")

# 3. Run the actor and wait for it to finish
run = client.actor(ACTOR_ID).call(run_input=run_input)
if not run:
    print("❌ Failed to start run.")
    exit(1)
print(f"✅ Run finished! Status: {run['status']}")

# 4. Fetch & save results
dataset_client = client.dataset(run["defaultDatasetId"])
items = dataset_client.list_items().items

filename = "reddit_data.json"
with open(filename, "w", encoding="utf-8") as f:
    json.dump(items, f, indent=2, ensure_ascii=False)
print(f"💾 Saved {len(items)} posts to {filename}")

🔒 Proxy Configuration (Bring Your Own Proxies)

This actor is fully compatible with Apify Proxy (Datacenter & Residential) and Custom Proxies.

1. Datacenter Proxies (Cost-Effective)

Great for high-volume users who want to control costs. Note that Reddit sometimes blocks these.

{
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

2. Residential Proxies (Best Reliability)

Recommended. Residential proxies are harder to block and provide the highest success rate. Use this if you are getting empty results or 403 errors.

{
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

3. Bring Your Own Proxies (Custom URLs)

If you have proxies from an external provider (Webshare, BrightData, Smartproxy, etc.), you can pass the connection strings directly.

{
  "proxyConfiguration": {
    "useApifyProxy": false,
    "proxyUrls": [
      "http://username:password@my-proxy.example.com:8000",
      "http://username:password@my-proxy-2.example.com:8000"
    ]
  }
}
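
These proxy objects plug straight into the run input used by the Python example above. A brief sketch reusing client and ACTOR_ID from that script (the proxy URL is a placeholder):

# Sketch: pass your own proxy URLs through the run input (placeholder URL shown).
run_input = {
    "subreddit": "technology",
    "sort": "new",
    "maxRequestsPerCrawl": 2,
    "proxyConfiguration": {
        "useApifyProxy": False,
        "proxyUrls": [
            "http://username:password@my-proxy.example.com:8000",
        ],
    },
}
run = client.actor(ACTOR_ID).call(run_input=run_input)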

📊 Data Output

The scraper returns structured data for every post:

{
  "post_kind": "t3",
  "author": "TechEnthusiast",
  "author_id": "t2_8a7b3c",
  "time_posted": "2023-10-27T10:00:00.000Z",
  "title": "The Future of AI in 2026",
  "body_text": "Here is a deep dive into what we can expect...",
  "permalink": "/r/technology/comments/18x9z/the_future_of_ai/",
  "comment_count": "452",
  "score": "1500",
  "content_href": "https://i.redd.it/example_image.jpg",
  "external_links": ["https://openai.com/blog", "https://wired.com/ai-news"]
}
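
Note that the numeric fields (score, comment_count) arrive as strings in this sample, so cast them before doing any math. A small post-processing sketch, continuing from the items list in the Python example above:

# Sketch: normalize numeric fields (returned as strings in the sample above)
# and rank the scraped posts by score.
def to_int(value, default=0):
    try:
        return int(value)
    except (TypeError, ValueError):
        return default

for item in items:
    item["score"] = to_int(item.get("score"))
    item["comment_count"] = to_int(item.get("comment_count"))

top = sorted(items, key=lambda p: p["score"], reverse=True)[:10]
for post in top:
    print(f'{post["score"]:>6}  {post["title"]}')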

πŸ›‘οΈ Troubleshooting

  • Why am I only getting 3 posts?
    • Reddit's initial page load is small. The scraper simulates scrolling by fetching subsequent batches; raise maxRequestsPerCrawl to pull more of them.
  • "Request blocked (403)" error?
    • Reddit aggressively blocks Datacenter IPs.
    • Fix: Switch to Residential proxies in the input configuration (see the fallback sketch after this list).
  • Scraper stops early?
    • Ensure you have enough memory allocated (256 MB is usually enough, but 512 MB is safer for very long crawls).
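
The Datacenter-to-Residential fix from the 403 entry above can be automated. A hedged sketch reusing the ApifyClient setup from the Python example: try cheap Datacenter proxies first, then retry with Residential when a run fails or returns nothing:

# Sketch: try Datacenter proxies first, fall back to Residential if the run
# fails or returns no items (the usual symptom of a 403 block).
def run_with_fallback(client, actor_id, base_input):
    for groups in (None, ["RESIDENTIAL"]):
        run_input = dict(base_input)
        proxy = {"useApifyProxy": True}
        if groups:
            proxy["apifyProxyGroups"] = groups
        run_input["proxyConfiguration"] = proxy

        run = client.actor(actor_id).call(run_input=run_input)
        if run and run["status"] == "SUCCEEDED":
            items = client.dataset(run["defaultDatasetId"]).list_items().items
            if items:
                return items
    return []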

This scraper is intended for educational and analytical purposes. Please respect Reddit's Terms of Service and robots.txt. Do not use this tool to spam communities or overload Reddit's servers, and apply responsible rate limits.