Reddit Crawler
Pricing
from $1.00 / 1,000 records scraped
Works after the Reddit 11/06/2026 update! Crawl and scrape Reddit subreddits, user profiles, and posts.
Developer
r. mann
Maintained by Community
Last modified
7 days ago
REDDIT API UPDATE NOTICE
This crawler was built from scratch after the 11/06/2026 Reddit API update, which made Reddit much more difficult to scrape. This crawler does NOT care.
Reddit Scraper
Crawl and scrape Reddit: subreddits, user profiles, and posts (with their comments), and get structured JSON output.
Scope
This module does one thing well: pull posts and comments off Reddit and push them to the dataset. Point it at any mix of subreddits, users, and post URLs and it crawls each target, following listing pagination up to a configurable depth. It does not log in, post, vote, or modify anything - it is read-only and stealthy.
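As a rough illustration of "following listing pagination up to a configurable depth", the sketch below walks a cursor-based listing and stops after a fixed number of pages. It is not the actor's actual code: `fetch_page`, the `Page` type, and the cursor format are hypothetical stand-ins, and the fake fetcher serves in-memory data so the loop can be run without touching Reddit. Only the page size of 25 is taken from this README.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional


@dataclass
class Page:
    """One listing page: its items plus a cursor to the next page (hypothetical shape)."""
    items: list[str]
    next_cursor: Optional[str]


def crawl_listing(
    fetch_page: Callable[[Optional[str]], Page],
    max_pages: int,
) -> Iterator[str]:
    """Yield items from at most `max_pages` pages, stopping early if the listing ends."""
    cursor: Optional[str] = None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        yield from page.items
        if page.next_cursor is None:
            break  # no more pages to follow
        cursor = page.next_cursor


def make_fake_fetcher(total_items: int, page_size: int = 25) -> Callable[[Optional[str]], Page]:
    """Build an in-memory fetcher serving `total_items` fake post ids (test helper only)."""
    ids = [f"post_{i}" for i in range(total_items)]

    def fetch(cursor: Optional[str]) -> Page:
        start = int(cursor) if cursor is not None else 0
        end = min(start + page_size, len(ids))
        next_cursor = str(end) if end < len(ids) else None
        return Page(items=ids[start:end], next_cursor=next_cursor)

    return fetch
```

With a 60-item fake listing, a depth of 2 stops after 50 items, while a depth of 5 ends early once the listing runs out.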
Capabilities
- Crawl subreddit listings (front-page style feeds) with pagination
- Crawl user profile listings (a user's posts and comments) with pagination
- Crawl individual posts together with their comments
- Configurable request cooldown, plus automatic backoff that reads Reddit's X-Ratelimit headers and slows down before you get blocked
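The backoff idea in the last bullet can be sketched as a small pure function. This is one possible policy, not the actor's real one: it assumes the response carries `X-Ratelimit-Remaining` and `X-Ratelimit-Reset` (seconds until the quota resets), and the `low_water` threshold and the "spread the remaining quota over the reset window" rule are illustrative choices.

```python
def next_delay(headers: dict[str, str], cooldown: float, low_water: float = 10.0) -> float:
    """Return how many seconds to wait before the next request.

    `cooldown` is the baseline delay; `low_water` (an assumed threshold) is the
    remaining-quota level below which the delay is raised.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    try:
        remaining = float(lowered["x-ratelimit-remaining"])
        reset = float(lowered["x-ratelimit-reset"])
    except (KeyError, ValueError):
        # Headers missing or unparsable: fall back to the baseline cooldown.
        return cooldown

    if remaining <= 0:
        # Quota exhausted: wait out the whole reset window.
        return max(cooldown, reset)
    if remaining < low_water:
        # Quota running low: spread the remaining requests over the window.
        return max(cooldown, reset / remaining)
    return cooldown
```

For example, with 5 requests left and 60 seconds until reset, this policy waits 12 seconds instead of the 1-second baseline.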
Input schema
Provide at least one of subreddits, users, or postUrls. Everything else is
optional and has sensible defaults.
{
  "subreddits": [],   // Subreddit names to crawl, without the "r/" prefix (e.g. "technology").
                      // Each entry is one subreddit.
  "users": [],        // Reddit usernames to crawl, without the "u/" prefix (e.g. "spez").
                      // Crawls that user's profile listing.
  "postUrls": [],     // Posts to crawl. Accepts Reddit post URLs or permalinks
                      // (e.g. "/r/technology/comments/abc123/title/").
                      // Each post is crawled together with its comments.
  "maxPages": 1,      // Max number of listing pages to follow per subreddit/user.
                      // Each page is 25 items. 1 = 25, 2 = 50, and so on.
  "cooldown": 1,      // Baseline delay, in seconds, between requests. Raised automatically
                      // when Reddit reports the rate-limit quota is running low.
  "useProxy": true    // Route requests through Apify proxy (recommended).
}
Output
Every crawled item is pushed to the dataset.
- Subreddits and users push one record per post:

  {
    "id": "t3_1u41vjv",                          // Reddit fullname
    "url": "https://example.com/article",        // The post's outbound/link URL
    "permalink": "/r/technology/comments/...",   // Reddit permalink to the post
    "title": "Post title",
    "author": { "username": "username", "uri": null },
    "published": "2026-06-12T17:28:14+00:00",    // ISO 8601 timestamp
    "updated": null,
    "content": "self-post text, if any",         // post/comment text; null for link posts
    "thumbnail": null,
    "source": { "type": "subreddit", "name": "technology" }  // where this came from
  }

- Post URLs push one record per post, with its comments attached:

  {
    "post": { /* same shape as above */ },
    "comments": [ /* same shape, one per comment */ ],
    "source": { "type": "post", "name": "<the post url>" }
  }

  Comments share the post shape, but title is always null (comments have no title) and their text is in content.
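Because the dataset mixes two record shapes, downstream code usually wants to flatten them into uniform rows. The sketch below is one way to do that. It relies on Reddit's fullname prefixes (`t3_` for posts, `t1_` for comments) to label each item; that the actor emits `t1_` ids for comments is an assumption, since only a `t3_` id is shown above. The `flatten` and `kind_of` helpers and the row layout are illustrative, not part of the actor.

```python
from typing import Iterator

# Reddit fullname type prefixes: t1 = comment, t3 = post (link or self-post).
KIND_BY_PREFIX = {"t1": "comment", "t3": "post"}


def kind_of(item: dict) -> str:
    """Classify an item by the type prefix of its Reddit fullname."""
    prefix = str(item.get("id", "")).split("_", 1)[0]
    return KIND_BY_PREFIX.get(prefix, "unknown")


def flatten(record: dict) -> Iterator[dict]:
    """Yield one flat row per post/comment, for either record shape."""
    source = record.get("source") or {}
    if "post" in record:
        # Post-URL record: the post followed by its attached comments.
        items = [record["post"], *record.get("comments", [])]
    else:
        # Subreddit/user record: the record itself is the item.
        items = [record]
    for item in items:
        yield {
            "kind": kind_of(item),
            "id": item.get("id"),
            "title": item.get("title"),
            "author": (item.get("author") or {}).get("username"),
            "text": item.get("content"),
            "source": f'{source.get("type")}:{source.get("name")}',
        }
```

Each row keeps the `source` of the record it came from, so rows from a post URL and rows from a subreddit listing can be told apart after flattening.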
FAQ
Does it work after the new Reddit API changes?
Yes. This is why it's here.
What does the proxy toggle do?
When useProxy is on (the default), every request is routed through
Apify's residential proxies, which look like ordinary home connections and are
the only reliable way to scrape Reddit at any volume. This is the recommended
setting.
When it is off, requests go out directly from Apify's datacenter IPs. This is cheaper - you pay no residential proxy usage - but riskier: Reddit blocks datacenter IPs quickly, so you will likely get throttled or blocked after a small number of requests. Turn it off only for quick tests or very light runs.
It stopped returning results / I see rate-limit warnings.
You are being throttled. Increase cooldown, lower maxPages, and make sure
useProxy is on. The actor already backs off automatically when
Reddit signals a low quota, but a heavier run needs gentler settings.
