Reddit Subreddit Scraper
Pricing
Pay per event
Scrape and download posts from any subreddit — hot, new, top, rising, or controversial — export to JSON or CSV. We rotate Firefox/Safari fingerprints, route through residential proxies, and retry on Reddit's 429s. Rows: title, author, score, upvote ratio, comments, NSFW, URL, selftext, posted-at.
Developer
DevilScrapes
Maintained by Community
🎯 What this scrapes
Reddit is one of the more aggressively guarded sources on the web: 429 storms, fingerprint checks, IP throttling, and a rate-limited OAuth tier that small teams can't absorb. This Reddit subreddit scraper handles all of that for you. We rotate Firefox and Safari TLS fingerprints (Chrome impersonation gets 403'd from datacenter IPs), rotate residential proxy sessions on every block, and walk the subreddits you pick, writing one typed row per post. Self-text is preserved as Markdown, so downstream pipelines don't need to re-parse anything. You bring the subreddit list; we handle the rest.
🔥 What we handle for you
- 🛡️ Browser fingerprint rotation: `curl-cffi` impersonates real Firefox / Safari TLS handshakes so the target sees a genuine browser, not a Python script.
- 🌐 Residential proxy rotation via Apify Proxy: fresh session and exit IP on every block or rate-limit response.
- 🔁 Retries with exponential backoff on `408 / 429 / 5xx`: up to 5 attempts per page, `Retry-After` header honoured.
- 🧱 Rate-limit-aware pacing: when Reddit pushes back, we slow down and resume instead of letting the run die.
- 🧊 Clean, typed dataset rows: Pydantic-validated, ISO-8601 timestamps, stable IDs, JSON / CSV / Excel export straight from the Apify Console.
- 💰 Pay-Per-Event pricing: you only pay for results that land in your dataset. No data, no charge.
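The retry behaviour in the list above can be sketched in a few lines. This is an illustration only, not the Actor's actual code: the 1-second base, the 60-second cap, the full-jitter strategy, and the names `is_retryable` and `backoff_delay` are all assumptions. Only the retryable statuses, the 5-attempt limit, and the `Retry-After` handling come from the description above.

```python
import random
from typing import Optional

# Statuses named as retryable above: 408, 429, and any 5xx.
RETRYABLE_STATUSES = {408, 429} | set(range(500, 600))
MAX_ATTEMPTS = 5  # "up to 5 attempts per page"


def is_retryable(status: int) -> bool:
    """True when a response status should trigger another attempt."""
    return status in RETRYABLE_STATUSES


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0,
                  jitter: bool = True) -> float:
    """Seconds to wait before retry number `attempt` (1-based)."""
    if retry_after is not None:
        try:
            # A numeric Retry-After (seconds) overrides the computed delay.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form: fall through to exponential backoff.
    delay = min(cap, base * 2 ** (attempt - 1))
    # Full jitter (an assumption) spreads out retries from parallel workers.
    return random.uniform(0.0, delay) if jitter else delay
```

A caller would loop at most `MAX_ATTEMPTS` times, sleep for `backoff_delay(attempt, response_headers.get("Retry-After"))` after each retryable status, and switch proxy session before the next attempt.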
💡 Use cases
- Community monitoring: diff `r/<your-product>` daily and surface hot threads to your Slack or dashboard before they blow up.
- Content discovery: pull the top 100 posts from `r/MachineLearning` weekly to feed a research digest or newsletter.
- NLP corpus building: sweep dozens of hobby subreddits for a labelled training dataset; combine with the sibling comment-scraper for full thread context.
- Brand mention scanning: schedule hourly runs on brand-adjacent subreddits and pipe results into your alerting stack.
- Journalism and OSINT: export public-subreddit posts for investigative research or trend analysis, no Reddit OAuth required.
- Sentiment baselines: store post score and comment counts over time to build a sentiment time-series for any topic.
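As an illustration of the community-monitoring case, the sketch below diffs two dataset snapshots from consecutive runs. It assumes each snapshot is a list of row dicts carrying the `id` and `score` fields; the function name `new_and_rising` and the 100-point threshold are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def new_and_rising(previous: List[Row], current: List[Row],
                   min_score_gain: int = 100) -> Tuple[List[Row], List[Row]]:
    """Split the current snapshot into unseen posts and fast-growing ones."""
    prev_by_id = {row["id"]: row for row in previous}
    new_posts: List[Row] = []
    rising: List[Row] = []
    for row in current:
        old = prev_by_id.get(row["id"])
        if old is None:
            new_posts.append(row)  # not present in the earlier run
        elif int(row["score"]) - int(old["score"]) >= min_score_gain:
            rising.append(row)     # score grew by at least the threshold
    return new_posts, rising


yesterday = [{"id": "t3_a", "score": 10}, {"id": "t3_b", "score": 500}]
today = [{"id": "t3_a", "score": 250}, {"id": "t3_b", "score": 510},
         {"id": "t3_c", "score": 3}]
fresh, hot = new_and_rising(yesterday, today)
```

Here `fresh` holds the `t3_c` row and `hot` holds the `t3_a` row; either list could then be forwarded to Slack or a dashboard.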
⚙️ How to use it
- Click Try for free at the top of the page.
- Enter one or more subreddit names (without the `r/` prefix), pick a listing mode, and set a result cap.
- Click Start. Output streams into the run's dataset in real time.
- Export from Storage → Dataset as JSON, CSV, or Excel, or pull via the Apify API.
📥 Input
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `subreddits` | array | yes | `["programming"]` | List of subreddit names (without the `r/` prefix). One per line. Multiple subreddits are scraped in sequence. |
| `mode` | string | no | `"hot"` | Which Reddit listing to read: `hot`, `new`, `top`, `rising`, or `controversial`. |
| `timeFilter` | string | no | `"day"` | Applies only to `top` and `controversial` modes: `hour`, `day`, `week`, `month`, `year`, or `all`. |
| `maxResults` | integer | no | `100` | Maximum posts to return per subreddit. Reddit caps any listing at 1 000 items. Set `0` to scrape until exhausted (still capped at 1 000). |
| `includeSelftext` | boolean | no | `true` | When true, the post body (Markdown) is included for text posts. |
| `proxyConfiguration` | object | no | `{"useApifyProxy": true}` | Apify Proxy config. Reddit rate-limits hot IPs aggressively, so residential proxy rotation is the first mitigation we apply on blocks. |
Example input
    {
      "subreddits": ["programming"],
      "mode": "hot",
      "maxResults": 3,
      "includeSelftext": false,
      "proxyConfiguration": {"useApifyProxy": false}
    }
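If you generate inputs programmatically, a small pre-flight check can catch typos before a run starts. The sketch below mirrors the input table. The helper name `build_run_input`, the stripping of a stray `r/` prefix, and the clamping of `maxResults` to 1 000 are assumptions made for illustration; the Actor's own validation may differ.

```python
from typing import Dict, List

VALID_MODES = {"hot", "new", "top", "rising", "controversial"}
VALID_TIME_FILTERS = {"hour", "day", "week", "month", "year", "all"}
REDDIT_LISTING_CAP = 1000  # Reddit caps any listing at 1 000 items


def build_run_input(subreddits: List[str], mode: str = "hot",
                    time_filter: str = "day", max_results: int = 100,
                    include_selftext: bool = True) -> Dict[str, object]:
    """Return a run-input dict using the field names from the input table."""
    names = []
    for raw in subreddits:
        name = raw.strip()
        if name.lower().startswith("r/"):
            name = name[2:]  # the Actor expects names without the prefix
        if name:
            names.append(name)
    if not names:
        raise ValueError("at least one subreddit is required")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if time_filter not in VALID_TIME_FILTERS:
        raise ValueError(f"unknown timeFilter: {time_filter!r}")
    if max_results < 0:
        raise ValueError("maxResults must be 0 or a positive integer")
    return {
        "subreddits": names,
        "mode": mode,
        "timeFilter": time_filter,
        # 0 means "until exhausted"; anything above the cap is clamped.
        "maxResults": min(max_results, REDDIT_LISTING_CAP),
        "includeSelftext": include_selftext,
        "proxyConfiguration": {"useApifyProxy": True},
    }
```

For example, `build_run_input(["r/programming"], mode="top", time_filter="week")` yields a dict that can be serialised with `json.dumps` and pasted into the Console or sent through the API.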
📤 Output
Each run produces a subreddit posts dataset — one item per post, schema-validated and ready to query.
| Field | Type | Notes |
|---|---|---|
| `id` | string | Reddit post fullname (e.g. `t3_abc123`). |
| `post_id` | string | Short id (the `abc123` part). |
| `subreddit` | string | Subreddit name (without `r/`). |
| `title` | string | Post title. |
| `author` | string | Author username, or `[deleted]`. |
| `url` | string | Outbound URL (or the Reddit permalink for self-posts). |
| `permalink` | string | Canonical Reddit permalink (`https://reddit.com/r/.../comments/...`). |
| `selftext` | string \| null | Self-post body (Markdown). Null when `includeSelftext` is false or the post is a link post. |
| `score` | integer | Upvotes minus downvotes. |
| `upvote_ratio` | number | Approximate ratio (0.0 – 1.0). |
| `num_comments` | integer | Comment count. |
| `over_18` | boolean | NSFW flag. |
| `spoiler` | boolean | Marked as spoiler. |
| `stickied` | boolean | Pinned by mods. |
| `locked` | boolean | Comments locked. |
| `post_hint` | string \| null | Reddit's content classification (`image`, `link`, `video`, `self`, etc.). |
| `flair` | string \| null | Post flair text. |
| `created_utc` | integer | Unix timestamp (seconds) of when the post was created. |
| `posted_at` | string | ISO-8601 UTC timestamp derived from `created_utc`. |
| `scraped_at` | string | When this row was recorded by the Actor. |
Example output
    {
      "id": "t3_1ab2c3d",
      "post_id": "1ab2c3d",
      "subreddit": "programming",
      "title": "An honest critique of the new Rust runtime",
      "author": "u_rustacean",
      "url": "https://example.com/blog/rust-runtime",
      "permalink": "https://reddit.com/r/programming/comments/1ab2c3d/an_honest_critique",
      "score": 1283,
      "upvote_ratio": 0.94,
      "num_comments": 312,
      "over_18": false,
      "created_utc": 1778875200,
      "posted_at": "2026-05-15T20:00:00+00:00",
      "scraped_at": "2026-05-15T20:05:12+00:00"
    }
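Two output fields are derived from others: `post_id` is the part of `id` after the `t3_` prefix, and `posted_at` is `created_utc` rendered as an ISO-8601 UTC string. The sketch below shows one way to reproduce both and sanity-check a row downstream. The helper names are hypothetical and this is not the Actor's own validation code.

```python
from datetime import datetime, timezone
from typing import Dict, List


def short_id(fullname: str) -> str:
    """Strip the `t3_` type prefix from a Reddit post fullname."""
    prefix, _, rest = fullname.partition("_")
    if prefix != "t3" or not rest:
        raise ValueError(f"not a post fullname: {fullname!r}")
    return rest


def posted_at(created_utc: int) -> str:
    """Render a Unix timestamp (seconds) as an ISO-8601 UTC string."""
    return datetime.fromtimestamp(created_utc, tz=timezone.utc).isoformat()


def check_row(row: Dict[str, object]) -> List[str]:
    """Return a list of inconsistencies found in one dataset row."""
    problems = []
    if row["post_id"] != short_id(str(row["id"])):
        problems.append("post_id does not match id")
    if row["posted_at"] != posted_at(int(row["created_utc"])):
        problems.append("posted_at does not match created_utc")
    if not 0.0 <= float(row["upvote_ratio"]) <= 1.0:
        problems.append("upvote_ratio outside 0.0-1.0")
    return problems
```

Running `check_row` over an exported dataset is a cheap way to confirm that timestamps survived a CSV or Excel round trip unchanged.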
💰 Pricing
Pay-Per-Event — you pay only when these events fire:
| Event | USD | What it is |
|---|---|---|
| `actor-start` | $0.005 | One-off warm-up charge per run |
| `result` | $0.001 | Per dataset item written |
Example: a single run that writes 1 000 results costs 1 000 × $0.001 + $0.005 = $1.005, or roughly $1.00. No subscription, no minimum, no card required to start. Apify gives every new account $5 of free credit.
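The arithmetic behind the pricing table can be sketched as follows. The rates are taken from the table above; the function name `estimate_cost` is hypothetical, and `Decimal` is used only to avoid floating-point rounding in the totals.

```python
from decimal import Decimal

ACTOR_START_USD = Decimal("0.005")  # charged once per run
RESULT_USD = Decimal("0.001")       # charged per dataset item written


def estimate_cost(results: int, runs: int = 1) -> Decimal:
    """Estimated USD cost for `results` items spread over `runs` runs."""
    if results < 0 or runs < 0:
        raise ValueError("results and runs must be non-negative")
    return runs * ACTOR_START_USD + results * RESULT_USD


single_run = estimate_cost(1000)          # one run, 1 000 rows
hourly_day = estimate_cost(2400, runs=24)  # 24 hourly runs, 100 rows each
```

Splitting the same volume across more runs only adds the per-run start charge, so frequent small runs stay cheap.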
🚧 Limitations
- Reddit's JSON endpoint caps any listing at 1 000 items, including pagination. For deeper post history, the Pushshift archive has historically been the standard alternative (separate Actor; request one if needed).
- This Actor scrapes public subreddits only. Private or quarantined subreddits are not accessible without credentials, which we do not support.
- Comments are a separate Actor, because Reddit threads fan out arbitrarily deep. See the `reddit-post-comments-scraper` sibling for full comment trees.
- Real-time firehose coverage is not possible with public listing endpoints. For near-real-time monitoring, schedule the Actor at 5-15 minute intervals.
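To make the 1 000-item cap concrete, the sketch below plans how many posts each page request would ask for given a `maxResults` value. It assumes pages of at most 100 items, which is the usual maximum for Reddit listing requests. The function name `plan_page_sizes` is hypothetical and the Actor's real pagination may differ.

```python
from typing import List


def plan_page_sizes(max_results: int, page_size: int = 100,
                    listing_cap: int = 1000) -> List[int]:
    """Return the per-request `limit` values needed to reach max_results."""
    if max_results < 0:
        raise ValueError("max_results must be 0 or a positive integer")
    # 0 means "scrape until exhausted", which still stops at the cap.
    target = listing_cap if max_results == 0 else min(max_results, listing_cap)
    sizes = []
    remaining = target
    while remaining > 0:
        step = min(page_size, remaining)
        sizes.append(step)
        remaining -= step
    return sizes
```

`plan_page_sizes(250)` gives `[100, 100, 50]`, while any request above the cap, including `0`, plans ten pages of 100.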
❓ FAQ
Is this against Reddit's TOS?
Public read-only access via the .json listing endpoints has been the community standard for over a decade. We do not impersonate users, vote, comment, or access any private data. For commercial-scale crawling, Reddit prefers their Data API — sign up at reddit.com/dev/api. We recommend reviewing Reddit's User Agreement for your specific use case.
Is this a Reddit API alternative for 2026?
Yes. This Actor is frequently used as a Reddit API alternative when teams need structured subreddit data without managing Reddit OAuth, handling rate-limit responses, or dealing with the OAuth tier's review-board overhead. You get the same public post data with none of the credential plumbing.
How do I scrape Reddit without an API key?
Use this Actor. You need no Reddit developer account or OAuth credentials, just an Apify account (free tier included). The Actor accesses the same public listing data any logged-out browser can read, with fingerprint rotation and residential proxy rotation applied so you don't hit blocks on the first run.
Why am I seeing incomplete results?
Reddit actively rate-limits high-frequency requests. We apply exponential backoff and rotate proxy sessions on every block, but very aggressive runs (a large `maxResults` across many subreddits in quick succession) can still return partial results. Lower the result cap per run and schedule several smaller runs for more reliable coverage.
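One way to follow that advice is to split a long subreddit list into several smaller run inputs and schedule them separately. The sketch below is illustrative only: the batch size of 3 and the helper name `split_into_runs` are assumptions, not recommendations from the Actor's authors.

```python
from typing import Dict, List


def split_into_runs(subreddits: List[str], per_run: int = 3,
                    max_results: int = 100) -> List[Dict[str, object]]:
    """Chunk a subreddit list into one run input per batch."""
    if per_run < 1:
        raise ValueError("per_run must be at least 1")
    return [
        {"subreddits": subreddits[i:i + per_run], "maxResults": max_results}
        for i in range(0, len(subreddits), per_run)
    ]


batches = split_into_runs(
    ["python", "rust", "golang", "java", "cpp", "haskell", "ocaml"])
```

Each dict in `batches` can be merged with your other input fields and started as its own run, spaced a few minutes apart.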
How do I get the top Reddit posts for a specific time range?
Set `mode` to `top` and use the `timeFilter` input to choose `hour`, `day`, `week`, `month`, `year`, or `all`. This maps directly to Reddit's own top-posts ranking for the chosen window.
Do you scrape comments?
Not in this Actor — comment trees fan out arbitrarily and require different pagination logic. See the reddit-post-comments-scraper sibling Actor for full thread extraction.
Why does the URL match the permalink for some posts?
For self-posts (text posts), Reddit's own url field points back to the post itself rather than an outbound link. This is Reddit's data, not a scraping artefact.
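If you need to separate text posts from link posts downstream, comparing the two fields is usually enough. The sketch below compares URL paths rather than full strings, on the assumption that the host may appear as either `reddit.com` or `www.reddit.com` and that a trailing slash may or may not be present. The helper name `is_self_post` is hypothetical.

```python
from typing import Dict
from urllib.parse import urlsplit


def _is_reddit_host(host: str) -> bool:
    host = host.lower()
    return host == "reddit.com" or host.endswith(".reddit.com")


def is_self_post(row: Dict[str, str]) -> bool:
    """True when `url` points back at the post's own permalink."""
    url = urlsplit(row["url"])
    permalink = urlsplit(row["permalink"])
    same_path = url.path.rstrip("/") == permalink.path.rstrip("/")
    return same_path and _is_reddit_host(url.netloc)


text_post = {
    "url": "https://www.reddit.com/r/programming/comments/1ab2c3d/ask/",
    "permalink": "https://reddit.com/r/programming/comments/1ab2c3d/ask",
}
link_post = {
    "url": "https://example.com/blog/rust-runtime",
    "permalink": "https://reddit.com/r/programming/comments/1ab2c3d/an_honest_critique",
}
```

With these rows, `is_self_post(text_post)` is true and `is_self_post(link_post)` is false.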
Can I export as CSV or Excel?
Yes. Once the run completes, go to Storage → Dataset in the Apify Console and use the Export button to download JSON, CSV, JSONL, XML, or Excel. You can also fetch the dataset programmatically via the Apify API.
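A minimal sketch of the programmatic route, using only the standard library and the Apify API's dataset items endpoint. The dataset ID and token are placeholders you would copy from your own run, and the helper names are hypothetical.

```python
import urllib.request
from typing import Optional
from urllib.parse import urlencode

EXPORT_FORMATS = {"json", "jsonl", "csv", "xlsx", "xml"}


def dataset_items_url(dataset_id: str, fmt: str = "json",
                      limit: Optional[int] = None) -> str:
    """Build the export URL for one dataset in the chosen format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    params = {"format": fmt, "clean": "true"}  # clean=true skips empty items
    if limit is not None:
        params["limit"] = limit
    return (f"https://api.apify.com/v2/datasets/{dataset_id}/items"
            f"?{urlencode(params)}")


def download_items(dataset_id: str, token: str, fmt: str = "csv") -> bytes:
    """Fetch the export; the token is sent as a Bearer header."""
    request = urllib.request.Request(
        dataset_items_url(dataset_id, fmt),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

Calling `download_items("<your-dataset-id>", "<your-api-token>")` returns the CSV bytes, which you can write straight to a file.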
💬 Your feedback
Spotted a bug, hit an unusual edge case, or need an extra field? Open an issue on the Actor's Issues tab in the Apify Console — we ship fixes weekly and read every report.