Reddit Scraper
Under maintenancePricing
from $0.60 / 1,000 posts
Go to Apify Store
Reddit Scraper
Under maintenanceExtract posts, comments, user profiles, and search results from Reddit. Pure HTTP, no API key required.
Reddit Scraper
Under maintenancePricing
from $0.60 / 1,000 posts
Extract posts, comments, user profiles, and search results from Reddit. Pure HTTP, no API key required.
All notable changes to this project are documented here. Format follows Keep a Changelog .
HttpCrawler → CheerioCrawler with a generated Chrome browser fingerprint.redditClientId / redditClientSecret and the REDDIT_CLIENT_* env fallback are gone). No Reddit account or API key is needed.Retry-After. The fail-loud guard from 0.3 is kept — a run blocked on every request fails loudly instead of reporting an empty success; a run that reaches Reddit (even if all posts are filtered) succeeds.upvoteRatio, totalAwards, and subredditSubscribers are no longer available from HTML (set to 0 / 0 / null); gallery imageUrls is best-effort.?limit=500). "Load more comments" / deeply collapsed threads use an AJAX endpoint Reddit blocks, so the deepest tails are not retrieved.selfText). Self-text + comments are populated when scraping a post URL directly, with includeComments=true, or for the AI output formats (all fetch the post page). Keyword filtering on listings matches the title..json API — every www.reddit.com/*.json request (and old.reddit.com/*.json) now returns an HTTP 403 "blocked by network security" HTML page, regardless of User-Agent, browser fingerprint, cookies, or proxy IP (the block is endpoint-level, so residential proxies don't help). The actor treated those 403s as benign skips (ignoreHttpErrorStatusCodes: [403] + a "403 = private subreddit" assumption + non-JSON bodies producing only a warning + no zero-result guard), so every run drained its queue and exited successfully with an empty dataset.oauth.reddit.com), which returns the identical Listing/Thing JSON the parsers already consume — so post/comment mapping is unchanged. Requests carry a bearer token obtained via the application-only (client_credentials) grant.redditClientId / redditClientSecret) or, as a fallback, from REDDIT_CLIENT_ID / REDDIT_CLIENT_SECRET environment variables (so a maintainer can set one shared app via Apify secrets and keep end users key-free). Credentials are validated up front; a missing or invalid pair fails the run immediately with setup instructions (https://www.reddit.com/prefs/apps ).Retry-After. If a run produces zero items because requests were blocked or rejected, it now calls Actor.fail() with a diagnostic instead of reporting an empty success. A legitimately empty source (valid but no matching posts) still succeeds.Actor.openDataset('posts') / 'comments'), which made the Apify Console "Storage" tab, run.defaultDatasetId API access, and standard SDK smoke tests all see an empty dataset even though items were silently accumulating on the user's account-wide named datasets across runs. Both record types now share the per-run default dataset and are discriminated by the existing type: 'post' | 'comment' field on each record.dataset_schema.json now declares both post and comment field shapes and adds a "Comments" view alongside the existing "Posts" view. The type enum now includes comment.Actor.charge('actor_start') call in src/main.ts — Apify Console uses the synthetic apify-actor-start event which fires automatically. The explicit call was logging an unknown-event warning per run without doing anything useful.default (standard JSON), jsonl-finetune (OpenAI chat-format SFT records), rag-markdown (vector-DB-ready markdown documents with stable chunkId).HttpCrawler (no headless browser).automation-lab/reddit-scraper rates as of 2026-04-18.maxCommentsPerPost ≤ 1000 to bound per-run proxy/compute cost.Retry-After headers; session retirement on persistent blocks.actor_start after input validation (so failed-validation runs don't bill); post and comment charges only after successful dataset writes; comment charges suppressed for AI formats (comments are bundled into the post record).