Reddit Subreddit Scraper
Pricing
Pay per event
Scrape posts from any subreddit — hot, new, top, rising, or controversial. We rotate Firefox / Safari TLS fingerprints, route through residential proxies, retry with backoff on Reddit's 429s, and return typed dataset rows ready to ship.
Developer
DevilScrapes
Maintained by Community
🎯 What this scrapes
Reddit blocks scrapers aggressively — 429s, fingerprint checks, IP throttling. This Actor handles all of that for you. We rotate Firefox + Safari TLS fingerprints (Chrome impersonation gets 403'd from datacenter IPs), rotate residential proxy sessions on every block, retry with exponential backoff, and walk the subreddits you pick to write one typed row per post. Self-text is preserved as Markdown so downstream pipelines don't have to re-parse anything.
🔥 What we handle for you
- 🛡️ Browser fingerprint rotation: `curl-cffi` impersonates real Chrome / Firefox / Safari TLS handshakes so the target sees a browser, not Python.
- 🌐 Residential proxy rotation via Apify Proxy: fresh session and exit IP on every block.
- 🔁 Retries with exponential backoff on `408 / 429 / 5xx`: up to 5 attempts per page, `Retry-After` honoured.
- 🧱 Rate-limit-aware pacing: when the target pushes back, we slow down instead of getting banned.
- 🧊 Clean, typed dataset rows: Pydantic-validated, ISO-8601 timestamps, stable IDs, JSON / CSV / Excel export straight from the Apify Console.
- 💰 Pay-Per-Event pricing: you only pay for results that hit your dataset. No data, no charge.
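The retry behaviour in the list above can be sketched as follows. This is a minimal illustration and not the Actor's actual code: the `fetch` callable, the 1-second base delay, and the 60-second cap are assumptions chosen for the example. Only the retryable statuses, the 5-attempt limit, and the `Retry-After` handling come from the description.

```python
import time
from typing import Callable, Optional, Tuple


def is_retryable(status: int) -> bool:
    """408, 429 and any 5xx are worth retrying; everything else is final."""
    return status in (408, 429) or 500 <= status <= 599


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait after failed attempt number `attempt` (1-based)."""
    if retry_after is not None:
        # A server-supplied Retry-After takes priority over our own schedule.
        return min(max(retry_after, 0.0), cap)
    # Exponential growth: base, 2*base, 4*base, ... never above `cap`.
    return min(base * 2 ** (attempt - 1), cap)


def fetch_with_retries(
    fetch: Callable[[], Tuple[int, Optional[float], str]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `fetch` until it returns a final status or attempts run out.

    `fetch` is a hypothetical callable returning
    (status, retry_after_seconds_or_None, body).
    """
    status = 0
    for attempt in range(1, max_attempts + 1):
        status, retry_after, body = fetch()
        if not is_retryable(status):
            return status, body
        if attempt < max_attempts:
            sleep(backoff_delay(attempt, retry_after))
    raise RuntimeError(f"gave up after {max_attempts} attempts (last status {status})")
```

Passing `sleep` as a parameter keeps the function testable without real waiting. A production version would typically also add random jitter to the delay.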
💡 Use cases
- Community monitoring: diff `r/<your-product>` daily and alert on hot threads.
- Content discovery: pull the top 100 posts from `r/MachineLearning` weekly to feed a research digest.
- Brand mention scanning: scrape `r/all` filtered by a flair or keyword.
- Sentiment baselines: store post and comment counts to build a sentiment time-series for a topic.
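As an illustration of the community-monitoring use case, the sketch below diffs two dataset snapshots and keeps the posts that are new and above a score threshold. The function name and the default threshold of 100 are hypothetical. The `id` and `score` fields are the ones documented in the output table.

```python
from typing import Iterable, List


def new_hot_posts(previous: Iterable[dict], current: Iterable[dict],
                  min_score: int = 100) -> List[dict]:
    """Return rows from `current` that were absent from `previous`
    and whose score is at least `min_score` (an arbitrary example threshold)."""
    seen = {row["id"] for row in previous}
    return [
        row for row in current
        if row["id"] not in seen and row["score"] >= min_score
    ]


yesterday = [{"id": "t3_a", "score": 500}]
today = [
    {"id": "t3_a", "score": 650},   # already seen yesterday
    {"id": "t3_b", "score": 240},   # new and above the threshold
    {"id": "t3_c", "score": 12},    # new but too quiet to alert on
]
alerts = new_hot_posts(yesterday, today)
```

Here `alerts` contains only the `t3_b` row, which could then be forwarded to whatever alerting channel you use.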
⚙️ How to use it
- Click Try for free at the top of the page.
- Fill in the input form — most fields have sensible defaults.
- Click Start. Output streams into the run's dataset.
- Export from Storage → Dataset as JSON, CSV, or Excel — or fetch via the API.
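For the API route mentioned in the last step, a minimal sketch using only the standard library might look like this. It assumes the public Apify API v2 dataset-items endpoint and bearer-token authentication. The dataset ID and token are placeholders you would take from your own run and account.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id: str, fmt: str = "json",
                      limit: Optional[int] = None) -> str:
    """Build the URL for reading items out of a run's dataset."""
    params = {"format": fmt}
    if limit is not None:
        params["limit"] = str(limit)
    quoted_id = urllib.parse.quote(dataset_id, safe="")
    return f"{API_BASE}/datasets/{quoted_id}/items?{urllib.parse.urlencode(params)}"


def fetch_items(dataset_id: str, token: str, limit: Optional[int] = None) -> list:
    """Download dataset items as a list of dicts (performs a network call)."""
    request = urllib.request.Request(
        dataset_items_url(dataset_id, "json", limit),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `fetch_items("<your-dataset-id>", "<your-api-token>", limit=10)` would return the first ten rows of a finished run. The official `apify-client` package wraps the same endpoint if you prefer a higher-level interface.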
📥 Input
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `subreddits` | array | yes | `["programming"]` | List of subreddit names (without the `r/` prefix). One per line. Multiple subreddits are scraped in sequence. |
| `mode` | string | no | `"hot"` | Which Reddit listing to read: hot, new, top, rising, or controversial. |
| `timeFilter` | string | no | `"day"` | Applies only to `top` and `controversial` modes. |
| `maxResults` | integer | no | `100` | Reddit caps a listing at 1000 items. `0` means scrape until exhausted (still capped at 1000). |
| `includeSelftext` | boolean | no | `true` | When true, the post body (Markdown) is included for text posts. |
| `proxyConfiguration` | object | no | `{"useApifyProxy": true}` | Reddit can soft-rate-limit hot IPs. Apify Proxy with sticky sessions smooths this out. |
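To make the `subreddits` and `maxResults` rules concrete, here is a hedged sketch of how the input could be normalised before a run. The helper names are hypothetical and the Actor's own validation may differ. The 1000-item cap and the meaning of `0` come from the table above.

```python
from typing import List

LISTING_CAP = 1000  # Reddit's per-listing ceiling, per the maxResults note


def normalize_subreddits(names: List[str]) -> List[str]:
    """Strip whitespace and any leading r/ or /r/, dropping blanks and duplicates."""
    cleaned: List[str] = []
    for name in names:
        name = name.strip()
        for prefix in ("/r/", "r/"):
            if name.lower().startswith(prefix):
                name = name[len(prefix):]
                break
        if name and name not in cleaned:
            cleaned.append(name)
    return cleaned


def effective_limit(max_results: int) -> int:
    """Translate maxResults into the number of posts that can actually be returned."""
    if max_results < 0:
        raise ValueError("maxResults must be 0 or a positive integer")
    if max_results == 0:
        # 0 means "until exhausted", which Reddit still cuts off at the cap.
        return LISTING_CAP
    return min(max_results, LISTING_CAP)
```

With these helpers, `effective_limit(0)` and `effective_limit(5000)` both resolve to 1000, which is why requesting more than that per subreddit has no effect.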
Example input
{
  "subreddits": ["programming"],
  "mode": "hot",
  "maxResults": 3,
  "includeSelftext": false,
  "proxyConfiguration": {"useApifyProxy": false}
}
📤 Output
Every row is one dataset item.
| Field | Type | Notes |
|---|---|---|
| `id` | string | Reddit post fullname (e.g. `t3_abc123`). |
| `post_id` | string | Short id (the `abc123` part). |
| `subreddit` | string | Subreddit name (without `r/`). |
| `title` | string | Post title. |
| `author` | string | Author username, or `[deleted]`. |
| `url` | string | Outbound URL (or the Reddit permalink for self-posts). |
| `permalink` | string | Canonical Reddit permalink (`https://reddit.com/r/.../comments/...`). |
| `selftext` | string or null | Self-post body (Markdown). Null when `includeSelftext` is false or the post is a link post. |
| `score` | integer | Upvotes minus downvotes. |
| `upvote_ratio` | number | Approximate ratio (0.0 to 1.0). |
| `num_comments` | integer | Comment count. |
| `over_18` | boolean | NSFW flag. |
| `spoiler` | boolean | Marked as spoiler. |
| `stickied` | boolean | Pinned by mods. |
| `locked` | boolean | Comments locked. |
| `post_hint` | string or null | Reddit's content classification (image, link, video, self, etc.). |
| `flair` | string or null | Post flair text. |
| `created_utc` | integer | Unix timestamp (seconds) of when the post was created. |
| `posted_at` | string | ISO-8601 UTC timestamp derived from `created_utc`. |
| `scraped_at` | string | When this row was recorded. |
Example output
{
  "id": "t3_1ab2c3d",
  "post_id": "1ab2c3d",
  "subreddit": "programming",
  "title": "An honest critique of the new Rust runtime",
  "author": "u_rustacean",
  "url": "https://example.com/blog/rust-runtime",
  "permalink": "https://reddit.com/r/programming/comments/1ab2c3d/an_honest_critique",
  "score": 1283,
  "upvote_ratio": 0.94,
  "num_comments": 312,
  "over_18": false,
  "created_utc": 1747353600,
  "posted_at": "2025-05-16T00:00:00+00:00"
}
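The derived fields in the output are simple to reproduce downstream. The sketch below shows one way `posted_at` can be computed from `created_utc` and `post_id` from `id`. The function names are illustrative and the Actor's internals may differ.

```python
from datetime import datetime, timezone


def to_iso_utc(created_utc: int) -> str:
    """Convert a Unix timestamp in seconds to an ISO-8601 UTC string."""
    return datetime.fromtimestamp(created_utc, tz=timezone.utc).isoformat()


def short_id(fullname: str) -> str:
    """Drop Reddit's `t3_` post-type prefix from a fullname, if present."""
    return fullname[3:] if fullname.startswith("t3_") else fullname


print(to_iso_utc(1735689600))  # 2025-01-01T00:00:00+00:00
print(short_id("t3_abc123"))   # abc123
```

Using a timezone-aware `datetime` keeps the `+00:00` offset in the string, so consumers never have to guess which zone a timestamp is in.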
💰 Pricing
Pay-Per-Event — you pay only when these events fire:
| Event | USD | What it is |
|---|---|---|
| `actor-start` | $0.005 | One-off warm-up charge per run |
| `result` | $0.001 | Per dataset item |
Example: a single run that returns 1,000 results costs 1,000 × $0.001 plus $0.005 for the start event, which is $1.005 (about $1.00). No subscription, no minimum, no card to start: Apify gives every new account $5 of free credit.
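A small estimator makes it easier to budget for scheduled runs. It uses the two event prices from the table above. The function itself is just an illustration and ignores any platform costs not listed in this README.

```python
from decimal import Decimal

ACTOR_START_USD = Decimal("0.005")  # charged once per run
RESULT_USD = Decimal("0.001")       # charged per dataset item


def estimate_cost(results: int, runs: int = 1) -> Decimal:
    """Estimated USD cost for `results` rows spread across `runs` runs."""
    return ACTOR_START_USD * runs + RESULT_USD * results


print(estimate_cost(1000))         # 1.005
print(estimate_cost(300, runs=3))  # 0.315
```

`Decimal` is used instead of floats so that fractions of a cent add up exactly.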
🚧 Limitations
The Reddit JSON endpoint caps any listing at 1000 items, including pagination. For deeper history, use the Pushshift API (separate Actor — request one).
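The cap applies across pages because a listing is walked with an `after` cursor, at most 100 posts per request, until the cursor runs out. The sketch below shows that cursor walk under stated assumptions: `fetch_page` is a hypothetical callable that returns the decoded listing JSON for a given cursor, and the URL shape is the public `.json` listing format, not necessarily what this Actor requests internally.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

PAGE_SIZE = 100     # largest `limit` a Reddit listing request accepts
LISTING_CAP = 1000  # total items reachable through one listing


def listing_url(subreddit: str, mode: str = "hot",
                after: Optional[str] = None,
                time_filter: Optional[str] = None) -> str:
    """Build a public listing URL; `t` is only sent for top/controversial."""
    params = {"limit": PAGE_SIZE}
    if after:
        params["after"] = after
    if time_filter and mode in ("top", "controversial"):
        params["t"] = time_filter
    return f"https://www.reddit.com/r/{subreddit}/{mode}.json?{urlencode(params)}"


def iter_posts(fetch_page: Callable[[Optional[str]], dict],
               max_results: int = LISTING_CAP) -> Iterator[dict]:
    """Yield post dicts page by page until the cursor or the limit runs out."""
    wanted = min(max_results or LISTING_CAP, LISTING_CAP)
    after: Optional[str] = None
    yielded = 0
    while yielded < wanted:
        page = fetch_page(after)
        children = page["data"]["children"]
        for child in children:
            yield child["data"]
            yielded += 1
            if yielded >= wanted:
                return
        after = page["data"].get("after")
        if not after or not children:
            return  # Reddit has no further pages for this listing
```

In practice `fetch_page` would request `listing_url(...)` with the current cursor and decode the response. Keeping it injectable lets the loop be exercised with canned pages.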
❓ FAQ
Is this against Reddit's TOS?
Public read-only access via .json URLs has been the standard for over a decade. We do not impersonate users, vote, or comment. For commercial-scale crawling Reddit prefers their Data API — sign up at reddit.com/dev/api.
Why am I getting 429s?
Reddit aggressively rate-limits scrapers. Lower maxResults per run, or schedule multiple smaller runs.
Do you scrape comments?
Not in this Actor — comments fan out arbitrarily. See the reddit-post-comments-scraper sibling.
Why is the URL the same as the permalink?
For self-posts, Reddit's url field points to the post itself, not an outbound link.
💬 Your feedback
Spotted a bug, hit a weird edge case, or need a new field? Open an issue on the Actor's Issues tab in Apify Console. We ship fixes weekly and we read every report.