Reddit Crawler

Pricing

from $1.00 / 1,000 records scraped

Works after the 11/06/2026 Reddit update! Crawl and scrape Reddit subreddits, user profiles, and posts.

Rating: 0.0 (0)

Developer: r. mann (maintained by Community)

Actor stats

  • Bookmarked: 1
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 7 days ago

REDDIT API UPDATE NOTICE

This crawler was built from scratch after the 11/06/2026 Reddit API update, which made Reddit much harder to scrape. This crawler does NOT care.

Reddit Scraper

Crawl and scrape Reddit: subreddits, user profiles, and posts (with their comments), and get structured JSON output.

Scope

This actor does one thing well: it pulls posts and comments off Reddit and pushes them to the dataset. Point it at any mix of subreddits, users, and post URLs and it crawls each target, following listing pagination up to a configurable depth. It does not log in, post, vote, or modify anything - it is read-only and stealthy.

Capabilities

  • Crawl subreddit listings (front-page style feeds) with pagination
  • Crawl user profile listings (a user's posts and comments) with pagination
  • Crawl individual posts together with their comments
  • Configurable request cooldown, plus automatic backoff that reads Reddit's X-Ratelimit headers and slows down before you get blocked
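
The automatic backoff in the last bullet can be pictured roughly as follows. This is a minimal sketch, not the actor's actual code: the helper name `next_delay`, the `low_water` threshold, and the rule for spreading requests over the window are all assumptions made for illustration. The only parts taken from Reddit are the header names `X-Ratelimit-Remaining` (requests left in the current window) and `X-Ratelimit-Reset` (seconds until the window resets).

```python
from typing import Mapping


def next_delay(headers: Mapping[str, str], cooldown: float = 1.0,
               low_water: float = 10.0) -> float:
    """Return how many seconds to wait before the next request.

    Hypothetical helper: `low_water` and the spreading rule are assumptions.
    """
    # Look headers up case-insensitively, since casing varies between clients.
    lowered = {key.lower(): value for key, value in headers.items()}
    try:
        remaining = float(lowered["x-ratelimit-remaining"])
        reset = float(lowered["x-ratelimit-reset"])
    except (KeyError, ValueError):
        # Headers missing or malformed: fall back to the baseline cooldown.
        return cooldown

    if remaining <= 0:
        # Quota exhausted: wait out the rest of the window.
        return max(cooldown, reset)
    if remaining < low_water:
        # Quota running low: spread the remaining requests over the window.
        return max(cooldown, reset / remaining)
    return cooldown
```

With 5 requests left and 100 seconds until reset, this sketch would wait 20 seconds between requests instead of the 1-second baseline; with a healthy quota it simply returns the configured cooldown.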

Input schema

Provide at least one of subreddits, users, or postUrls. Everything else is optional and has sensible defaults.

{
  "subreddits": [],  // Subreddit names to crawl, without the "r/" prefix (e.g. "technology").
                     // Each entry is one subreddit.
  "users": [],       // Reddit usernames to crawl, without the "u/" prefix (e.g. "spez").
                     // Crawls that user's profile listing.
  "postUrls": [],    // Reddit post URLs or permalinks (e.g. "/r/technology/comments/abc123/title/").
                     // Each post is crawled together with its comments.
  "maxPages": 1,     // Max number of listing pages to follow per subreddit/user.
                     // Each page is 25 items. 1 = 25, 2 = 50, and so on.
  "cooldown": 1,     // Baseline delay, in seconds, between requests. Raised automatically
                     // when Reddit reports the rate-limit quota is running low.
  "useProxy": true   // Route requests through Apify proxy (recommended).
}
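
As a rough illustration of the rules above, the sketch below builds an input object, enforces the "at least one target" requirement, and works out the upper bound on listing records implied by `maxPages`. The helpers `build_input` and `max_listing_items` are hypothetical and not part of the actor; only the field names, the defaults, and the 25-items-per-page figure come from the schema.

```python
from typing import Any, Dict, Iterable, List

PAGE_SIZE = 25  # items per listing page, per the schema above


def _bare(name: str, prefix: str) -> str:
    """Drop an optional leading "/" and "r/" or "u/" style prefix."""
    name = name.strip().lstrip("/")
    return name[len(prefix):] if name.lower().startswith(prefix) else name


def build_input(subreddits: Iterable[str] = (), users: Iterable[str] = (),
                post_urls: Iterable[str] = (), max_pages: int = 1,
                cooldown: float = 1, use_proxy: bool = True) -> Dict[str, Any]:
    """Build a run input dict (hypothetical helper, not part of the actor)."""
    subs: List[str] = [_bare(s, "r/") for s in subreddits]
    names: List[str] = [_bare(u, "u/") for u in users]
    urls: List[str] = [p.strip() for p in post_urls]
    if not (subs or names or urls):
        raise ValueError("provide at least one of subreddits, users, or postUrls")
    if max_pages < 1:
        raise ValueError("maxPages must be at least 1")
    return {
        "subreddits": subs,
        "users": names,
        "postUrls": urls,
        "maxPages": max_pages,
        "cooldown": cooldown,
        "useProxy": use_proxy,
    }


def max_listing_items(run_input: Dict[str, Any]) -> int:
    """Upper bound on records from subreddit and user listings."""
    targets = len(run_input["subreddits"]) + len(run_input["users"])
    return targets * run_input["maxPages"] * PAGE_SIZE


run_input = build_input(subreddits=["r/technology"], users=["u/spez"], max_pages=2)
print(max_listing_items(run_input))  # 2 targets * 2 pages * 25 items
```

The resulting dict is plain JSON-serialisable data, so it can be pasted into the input editor or, assuming you use the official `apify-client` package, passed along the lines of `ApifyClient(token).actor(actor_id).call(run_input=run_input)`, where `token` and `actor_id` are yours to supply.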

Output

Every crawled item is pushed to the dataset.

  • Subreddits and users push one record per post:

    {
      "id": "t3_1u41vjv",                        // Reddit fullname
      "url": "https://example.com/article",      // The post's outbound/link URL
      "permalink": "/r/technology/comments/...", // Reddit permalink to the post
      "title": "Post title",
      "author": { "username": "username", "uri": null },
      "published": "2026-06-12T17:28:14+00:00",  // ISO 8601 timestamp
      "updated": null,
      "content": "self-post text, if any",       // post/comment text; null for link posts
      "thumbnail": null,
      "source": { "type": "subreddit", "name": "technology" } // where this came from
    }

  • Post URLs push one record per post, with its comments attached:

    {
      "post": { /* same shape as above */ },
      "comments": [ /* same shape, one per comment */ ],
      "source": { "type": "post", "name": "<the post url>" }
    }

    Comments share the post shape, but title is always null (comments have no title) and their text is in content.
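
Because the dataset mixes two record shapes, downstream code usually wants to flatten them first. The sketch below is one way to do that, under stated assumptions: the `flatten` helper and its output field names are hypothetical, the sample items are made up to match the shapes above, and it assumes comment ids are Reddit fullnames with the `t1_` prefix (posts use `t3_`).

```python
from datetime import datetime
from typing import Any, Dict, Iterator

# Reddit fullname prefixes: t1_ is a comment, t3_ is a post (link).
KINDS = {"t1": "comment", "t3": "post"}


def flatten(item: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per post or comment from either record shape."""
    if "post" in item:
        # Post-URL shape: the post plus its attached comments.
        records = [item["post"], *item.get("comments", [])]
    else:
        # Subreddit/user shape: the item is the post itself.
        records = [item]
    for rec in records:
        prefix = rec["id"].split("_", 1)[0]
        published = rec.get("published")
        yield {
            "id": rec["id"],
            "kind": KINDS.get(prefix, "unknown"),
            "author": (rec.get("author") or {}).get("username"),
            "title": rec.get("title"),  # None for comments
            "text": rec.get("content"),
            "published": datetime.fromisoformat(published) if published else None,
        }


# Made-up sample items that follow the documented shapes.
listing_item = {
    "id": "t3_1u41vjv",
    "title": "Post title",
    "author": {"username": "username", "uri": None},
    "published": "2026-06-12T17:28:14+00:00",
    "content": None,
    "source": {"type": "subreddit", "name": "technology"},
}
comment = {
    "id": "t1_abc123",  # hypothetical comment id
    "title": None,
    "author": {"username": "someone", "uri": None},
    "published": "2026-06-12T18:00:00+00:00",
    "content": "A comment",
}
post_item = {
    "post": listing_item,
    "comments": [comment],
    "source": {"type": "post", "name": "<the post url>"},
}

rows = list(flatten(post_item))
```

Here `rows` holds the post followed by its comment, each with a parsed, timezone-aware `published` value, which makes it straightforward to load both shapes into a single table.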

FAQ

Does it work after the new Reddit API changes?

Yes. That is why it exists.

What does the proxy toggle do?

When useProxy is on (the default), every request is routed through Apify's residential proxies, which look like ordinary home connections and are the only reliable way to scrape Reddit at any volume. This is the recommended setting.

When it is off, requests go out directly from Apify's datacenter IPs. This is cheaper - you pay no residential proxy usage - but riskier: Reddit blocks datacenter IPs quickly, so you will likely get throttled or blocked after a small number of requests. Turn it off only for quick tests or very light runs.

It stopped returning results / I see rate-limit warnings.

You are being throttled. Increase cooldown, lower maxPages, and make sure useProxy is on. The actor already backs off automatically when Reddit signals a low quota, but a heavier run needs gentler settings.