Reddit Subreddit Scraper

Pricing

$1.50 / 1,000 results

Extract Reddit posts from any subreddit without an API key. Get titles, scores, authors, comment counts, flairs, and URLs from old.reddit.com.

Developer

Casey Marsh

Maintained by Community

Extract posts from any public subreddit on Reddit with zero API authentication required. Get post titles, scores, authors, comment counts, flairs, domains, and full permalinks using old.reddit.com for reliable, fast, and lightweight scraping. Ideal for social media monitoring, content research, and sentiment analysis.

Summary

The Reddit Subreddit Scraper is a production-grade Apify actor that extracts posts from any public subreddit without requiring a Reddit API key, app registration, or OAuth token. It uses old.reddit.com, Reddit's lightweight, server-rendered interface, which is far easier to scrape than the modern React-based Reddit frontend.

Built on Crawlee's CheerioCrawler with Apify residential proxy rotation, this actor handles Reddit's rate limiting gracefully, retries failed requests automatically, and extracts rich metadata including upvote ratios, NSFW flags, sticky post detection, award counts, and post flairs. Whether you need 10 posts or 500, the actor paginates automatically to collect your requested volume.

How It Works

  1. Input: You provide a subreddit name, sort order, and maximum post count.
  2. URL Construction: The actor builds the correct old.reddit.com/r/{subreddit}/{sort}/ URL. Old Reddit renders complete HTML without JavaScript, which makes it well suited to Cheerio-based scraping.
  3. Rate Limit Detection: On each page load, the actor checks for Reddit's rate-limit or "blocked" pages. If it finds one, it throws an error, which triggers a retry with a fresh residential proxy IP.
  4. Post Extraction: Each post (.thing.link) is parsed for title, URL, author, score, comment count, flair, domain, timestamp, and more using multiple CSS selector fallbacks.
  5. Session Management: A Crawlee session pool rotates user agents and cookies to reduce fingerprinting.
  6. Pagination: The actor keeps requesting subsequent listing pages until it has collected maxPosts posts.
  7. Output: Clean, structured JSON saved to your Apify dataset with ISO 8601 timestamps.
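
The URL construction and pagination steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the actor's source (which runs on Node.js with Crawlee). The names build_listing_url, collect_posts, and fake_fetch are hypothetical, and the use of an `after` query parameter as the pagination cursor is an assumption about how the next page is requested.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urlencode

BASE_URL = "https://old.reddit.com"
VALID_SORTS = {"hot", "new", "top", "rising"}

Page = Tuple[List[dict], Optional[str]]  # (posts on the page, cursor for the next page)


def build_listing_url(subreddit: str, sort: str = "hot", after: Optional[str] = None) -> str:
    """Build the listing URL from step 2; `after` is an assumed pagination cursor."""
    if sort not in VALID_SORTS:
        raise ValueError(f"unsupported sort: {sort!r}")
    url = f"{BASE_URL}/r/{subreddit}/{sort}/"
    if after:
        url += "?" + urlencode({"after": after})
    return url


def collect_posts(fetch_page: Callable[[str], Page], subreddit: str,
                  sort: str = "hot", max_posts: int = 50) -> List[dict]:
    """Request listing pages until max_posts are collected or no pages remain."""
    posts: List[dict] = []
    after: Optional[str] = None
    while len(posts) < max_posts:
        page_posts, after = fetch_page(build_listing_url(subreddit, sort, after))
        posts.extend(page_posts)
        if not page_posts or after is None:
            break  # the listing ran out before max_posts was reached
    return posts[:max_posts]


def fake_fetch(url: str) -> Page:
    """Stand-in for a real HTTP request: serves three pages of 25 dummy posts."""
    start = int(url.split("after=")[1]) if "after=" in url else 0
    page = [{"title": f"post {i}"} for i in range(start, start + 25)]
    next_cursor = str(start + 25) if start + 25 < 75 else None
    return page, next_cursor


if __name__ == "__main__":
    print(build_listing_url("programming", "new"))
    print(len(collect_posts(fake_fetch, "programming", "new", max_posts=60)))
```

Passing the fetch function in as a parameter keeps the loop testable without network access. A real fetcher would download the page and return the parsed posts together with the cursor for the next page.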

Input Parameters

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| subreddit | string | No | popular | Subreddit name without the r/ prefix (e.g. australia, programming, worldnews) |
| sort | string | No | hot | Sort order: hot, new, top, or rising |
| maxPosts | integer | No | 50 | Maximum number of posts to extract (1–500) |
| includeComments | boolean | No | false | Whether to also scrape comments from each post (adds significant runtime) |
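
Before starting a run, it can help to check an input object against the documented fields. The following is a minimal sketch of such a check, using only the defaults and ranges listed above. The helper normalize_input is hypothetical and is not part of the actor. Stripping a leading r/ is a convenience assumed here, since the actor expects the name without that prefix.

```python
from typing import Any, Dict

VALID_SORTS = ("hot", "new", "top", "rising")
DEFAULTS: Dict[str, Any] = {
    "subreddit": "popular",
    "sort": "hot",
    "maxPosts": 50,
    "includeComments": False,
}


def normalize_input(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the documented defaults and reject values outside the documented ranges."""
    unknown = set(raw) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    merged = {**DEFAULTS, **raw}

    # The input expects the name without the r/ prefix, so strip it if present.
    name = str(merged["subreddit"]).strip()
    if name.lower().startswith("r/"):
        name = name[2:]
    if not name:
        raise ValueError("subreddit must not be empty")
    merged["subreddit"] = name

    if merged["sort"] not in VALID_SORTS:
        raise ValueError(f"sort must be one of {VALID_SORTS}")

    max_posts = merged["maxPosts"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(max_posts, bool) or not isinstance(max_posts, int) or not 1 <= max_posts <= 500:
        raise ValueError("maxPosts must be an integer from 1 to 500")

    if not isinstance(merged["includeComments"], bool):
        raise ValueError("includeComments must be true or false")
    return merged


if __name__ == "__main__":
    print(normalize_input({"subreddit": "r/programming", "maxPosts": 10}))
```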

Output Example

{
  "title": "What's the most interesting fact you learned this week?",
  "url": "https://example.com/article",
  "author": "curious_user42",
  "subreddit": "popular",
  "score": 15420,
  "commentCount": 2301,
  "flair": "Discussion",
  "postedAt": "2026-07-04T08:00:00.000Z",
  "domain": "self.popular",
  "redditUrl": "https://old.reddit.com/r/popular/comments/abc123/",
  "isNSFW": false,
  "isSticky": false,
  "awards": 3,
  "upvoteRatio": "0.92",
  "sort": "hot",
  "scrapedAt": "2026-07-04T10:30:00.000Z"
}
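
Two details of this record matter when consuming it downstream: the timestamps are ISO 8601 strings with a trailing Z, and upvoteRatio is shown as a string rather than a number. The sketch below parses a subset of the example record and derives a post age and a comment rate. The helper names parse_timestamp and summarize are hypothetical, and the derived fields ageHours and commentsPerHour are not part of the actor's output.

```python
import json
from datetime import datetime

# A subset of the example record above.
RECORD_JSON = """
{
  "title": "What's the most interesting fact you learned this week?",
  "score": 15420,
  "commentCount": 2301,
  "upvoteRatio": "0.92",
  "postedAt": "2026-07-04T08:00:00.000Z",
  "scrapedAt": "2026-07-04T10:30:00.000Z"
}
"""


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp into a timezone-aware datetime."""
    # Older Python versions reject a trailing "Z", so spell out the UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize(record: dict) -> dict:
    """Derive the post's age at scrape time and its comment rate."""
    posted = parse_timestamp(record["postedAt"])
    scraped = parse_timestamp(record["scrapedAt"])
    age_hours = (scraped - posted).total_seconds() / 3600
    return {
        "title": record["title"],
        "ageHours": age_hours,
        # The example shows upvoteRatio as a string, so convert it explicitly.
        "upvoteRatio": float(record["upvoteRatio"]),
        "commentsPerHour": record["commentCount"] / age_hours if age_hours > 0 else None,
    }


if __name__ == "__main__":
    print(summarize(json.loads(RECORD_JSON)))
```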

Pricing

This actor uses Apify's pay-per-result model. You only pay for the posts you successfully extract. No monthly subscriptions, no Reddit API costs, no minimums. At the listed rate of $1.50 per 1,000 results, a typical run of 50 posts from a single subreddit costs about $0.075.
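
As a rough worked example, the per-result charge can be estimated from the listed price of $1.50 per 1,000 results. The function estimate_cost_usd is a hypothetical helper. It models only the per-result price and assumes no other platform or proxy charges apply.

```python
PRICE_PER_1000_RESULTS_USD = 1.50  # listed price for this actor


def estimate_cost_usd(num_results: int,
                      price_per_1000: float = PRICE_PER_1000_RESULTS_USD) -> float:
    """Estimate the pay-per-result charge for a run that returns num_results posts."""
    if num_results < 0:
        raise ValueError("num_results must not be negative")
    return num_results * price_per_1000 / 1000


if __name__ == "__main__":
    for count in (50, 500):
        print(f"{count} posts: ${estimate_cost_usd(count):.3f}")
```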

Because old.reddit.com serves static HTML that needs no JavaScript rendering, the actor runs without headless browser overhead and is very efficient. Residential proxies provide reliability against rate limiting, but you can switch to datacenter proxies for lower costs if your use case allows occasional blocks.

Use Cases

  • Social Media Monitoring: Track trending topics, brand mentions, and community discussions across multiple subreddits. Monitor sentiment around products, companies, or public figures in real time.
  • Content Research and Curation: Discover viral content, identify trending formats, and understand what resonates with specific communities. Source content ideas for blogs, newsletters, and social media channels.
  • Community Sentiment Analysis: Analyze post titles, flairs, and scores to gauge community sentiment on specific topics. Feed scraped data into NLP pipelines for large-scale sentiment tracking.
  • Data Collection for Machine Learning: Build training datasets for text classification, toxicity detection, or recommendation systems using Reddit's diverse, community-labeled content.
  • Competitor Research: Monitor competitor subreddits, track product announcement threads, and analyze community engagement patterns.
  • Market Research: Understand consumer pain points, feature requests, and product discussions within niche communities relevant to your industry.

FAQ

Q: Do I need a Reddit API key or OAuth token? A: No. This actor scrapes publicly available pages on old.reddit.com. No Reddit account, API key, or authentication is required. This is one of its key advantages over the official Reddit API, which requires app registration and has stricter rate limits on the free tier.

Q: Why use old.reddit.com instead of the new Reddit? A: Old Reddit renders complete HTML server-side with predictable CSS classes (.thing.link, .score.unvoted, .linkflairlabel). The new Reddit is a React SPA that requires JavaScript execution, making it much slower and more expensive to scrape. Old Reddit is also less aggressively rate-limited.
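
To illustrate why predictable class names make extraction simple, here is a small sketch that pulls fields out of server-rendered markup using only Python's standard library. The SAMPLE_HTML fragment is hand-written for this example and is not real Reddit markup. The data-author attribute, the title attribute on the score element, and the a.title link are assumptions about the page structure. The actor itself uses Cheerio selectors, not this parser.

```python
from html.parser import HTMLParser

# Hand-written fragment for illustration only; real pages contain far more markup.
SAMPLE_HTML = """
<div class="thing link" data-author="curious_user42">
  <div class="score unvoted" title="15420">15.4k</div>
  <a class="title" href="https://example.com/article">An example title</a>
  <span class="linkflairlabel">Discussion</span>
</div>
"""


class PostParser(HTMLParser):
    """Collect one dict per element carrying both the 'thing' and 'link' classes."""

    def __init__(self) -> None:
        super().__init__()
        self.posts = []
        self._field = None  # name of the field whose text is captured next

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        if tag == "div" and {"thing", "link"} <= classes:
            self.posts.append({"author": attrs.get("data-author")})
        elif not self.posts:
            return  # ignore anything that appears before the first post
        elif tag == "a" and "title" in classes:
            self.posts[-1]["url"] = attrs.get("href")
            self._field = "title"
        elif tag == "div" and {"score", "unvoted"} <= classes:
            raw = attrs.get("title") or ""
            self.posts[-1]["score"] = int(raw) if raw.isdigit() else None
        elif "linkflairlabel" in classes:
            self._field = "flair"

    def handle_data(self, data):
        # Store the first text node that follows a title or flair start tag.
        if self._field and self.posts:
            self.posts[-1][self._field] = data.strip()
            self._field = None


def parse_posts(html: str) -> list:
    parser = PostParser()
    parser.feed(html)
    return parser.posts


if __name__ == "__main__":
    print(parse_posts(SAMPLE_HTML))
```

Because the whole page arrives as HTML, a single pass over the tags is enough. No script execution or browser is involved.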

Q: What if a subreddit is private or banned? A: The actor can only scrape public subreddits. If a subreddit is private, banned, or quarantined, the request will fail, and an error record will be saved to the dataset.

Q: Can I scrape comments too? A: Basic comment count extraction is included. For full comment scraping (comment bodies, nested threads), set includeComments to true. Note this significantly increases runtime and data volume, as each post spawns additional requests.

Q: How does the actor handle Reddit's rate limiting? A: When the actor detects a rate limit (HTTP 429) or a block page, which it identifies by checking the page title, it throws an error into Crawlee's retry mechanism. Crawlee then rotates to a fresh residential proxy IP and retries the request automatically, up to 4 times.
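
The detect-and-retry behaviour described in this answer can be sketched as follows. This is not Crawlee's implementation. The title fragments in BLOCK_MARKERS are assumptions, the names is_blocked and fetch_with_retries are hypothetical, and the attempt number passed to the fetch function stands in for selecting a fresh proxy.

```python
from typing import Callable, Tuple

Response = Tuple[int, str, str]  # (HTTP status, page title, body)

BLOCK_MARKERS = ("too many requests", "blocked")  # assumed title fragments


class BlockedError(Exception):
    """Raised when every attempt hit a rate limit or block page."""


def is_blocked(status_code: int, page_title: str) -> bool:
    """Treat HTTP 429 or a block-style page title as a blocked response."""
    title = page_title.lower()
    return status_code == 429 or any(marker in title for marker in BLOCK_MARKERS)


def fetch_with_retries(fetch: Callable[[str, int], Response], url: str,
                       max_retries: int = 4) -> str:
    """Make one initial attempt plus up to max_retries retries."""
    for attempt in range(max_retries + 1):
        status, title, body = fetch(url, attempt)
        if not is_blocked(status, title):
            return body
    raise BlockedError(f"still blocked after {max_retries} retries: {url}")


def flaky_fetch(url: str, attempt: int) -> Response:
    """Stand-in for a real request: blocked on the first two attempts."""
    if attempt < 2:
        return 429, "Too Many Requests", ""
    return 200, "programming", "<html>ok</html>"


if __name__ == "__main__":
    print(fetch_with_retries(flaky_fetch, "https://old.reddit.com/r/programming/hot/"))
```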

Q: Is it legal to scrape Reddit? A: This actor scrapes publicly accessible pages. You are responsible for complying with Reddit's terms of service, robots.txt, and applicable laws. For production use at scale, review Reddit's API terms and consider using the official API for sensitive data.


Actor ID: reddit-subreddit-scraper · Runtime: Node.js 20 · Type: CheerioCrawler