Reddit Subreddit Scraper

Pricing

Pay per event

Scrape posts from any subreddit — hot, new, top, rising, or controversial. We rotate Firefox / Safari TLS fingerprints, route through residential proxies, retry with backoff on Reddit's 429s, and return typed dataset rows ready to ship.

Developer

DevilScrapes

Maintained by Community

🎯 What this scrapes

Reddit blocks scrapers aggressively — 429s, fingerprint checks, IP throttling. This Actor handles all of that for you. We rotate Firefox + Safari TLS fingerprints (Chrome impersonation gets 403'd from datacenter IPs), rotate residential proxy sessions on every block, retry with exponential backoff, and walk the subreddits you pick to write one typed row per post. Self-text is preserved as Markdown so downstream pipelines don't have to re-parse anything.

🔥 What we handle for you

  • 🛡️ Browser fingerprint rotation: curl-cffi impersonates real Firefox / Safari TLS handshakes so the target sees a browser, not Python.
  • 🌐 Residential proxy rotation via Apify Proxy — fresh session and exit IP on every block.
  • 🔁 Retries with exponential backoff on 408 / 429 / 5xx — up to 5 attempts per page, Retry-After honoured.
  • 🧱 Rate-limit-aware pacing — when the target pushes back, we slow down instead of getting banned.
  • 🧊 Clean, typed dataset rows — Pydantic-validated, ISO-8601 timestamps, stable IDs, JSON / CSV / Excel export straight from the Apify Console.
  • 💰 Pay-Per-Event pricing — you only pay for results that hit your dataset. No data, no charge.
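
The retry behaviour described in the list can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual code: the `fetch` callable, the 1-second base delay, and the 60-second cap are assumptions made for the example. Only the retryable status codes, the 5-attempt limit, and honouring Retry-After come from the list.

```python
import time
from typing import Callable, Optional, Tuple

# Status codes the list names as retryable: 408, 429, and any 5xx.
RETRYABLE = {408, 429} | set(range(500, 600))


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait after failed attempt number `attempt` (1-based)."""
    if retry_after is not None:
        # The server said how long to wait; honour it, bounded by the cap.
        return min(max(retry_after, 0.0), cap)
    # Exponential growth: base, 2*base, 4*base, ... bounded by the cap.
    return min(base * 2 ** (attempt - 1), cap)


def fetch_with_retries(fetch: Callable[[], Tuple[int, Optional[float], bytes]],
                       max_attempts: int = 5,
                       sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Call `fetch` until it returns a non-retryable status or attempts run out.

    `fetch` is a hypothetical callable returning
    (status_code, retry_after_seconds_or_None, body).
    """
    status = None
    for attempt in range(1, max_attempts + 1):
        status, retry_after, body = fetch()
        if status not in RETRYABLE:
            return body  # success, or an error that retrying will not fix
        if attempt < max_attempts:
            sleep(backoff_delay(attempt, retry_after))
    raise RuntimeError(f"gave up after {max_attempts} attempts (last status {status})")
```

Passing `sleep` as a parameter keeps the sketch easy to test without real waiting. A production version would usually add random jitter to the delay.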

💡 Use cases

  • Community monitoring — diff r/<your-product> daily and alert on hot threads.
  • Content discovery — pull the top 100 posts from r/MachineLearning weekly to feed a research digest.
  • Brand mention scanning — scrape r/all filtered by a flair or keyword.
  • Sentiment baselines: store post and comment counts to build a sentiment time-series for a topic.
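
For the brand-mention use case, the input form has no flair or keyword field, so one option is to filter the dataset rows after the run. The sketch below assumes rows shaped like the output fields `title`, `selftext`, and `flair`. The function name `filter_rows` is made up for this example.

```python
from typing import Iterable, Iterator, Optional


def filter_rows(rows: Iterable[dict], keyword: Optional[str] = None,
                flair: Optional[str] = None) -> Iterator[dict]:
    """Yield rows that match the flair exactly and contain the keyword.

    Matching is case-insensitive. The keyword is searched in the title
    and, when present, in the self-text.
    """
    needle = keyword.casefold() if keyword else None
    wanted_flair = flair.casefold() if flair is not None else None
    for row in rows:
        if wanted_flair is not None:
            if (row.get("flair") or "").casefold() != wanted_flair:
                continue
        if needle is not None:
            text = f"{row.get('title') or ''}\n{row.get('selftext') or ''}".casefold()
            if needle not in text:
                continue
        yield row
```

For example, `list(filter_rows(rows, keyword="acme", flair="Discussion"))` keeps only Discussion-flaired posts that mention "acme".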

⚙️ How to use it

  1. Click Try for free at the top of the page.
  2. Fill in the input form — most fields have sensible defaults.
  3. Click Start. Output streams into the run's dataset.
  4. Export from Storage → Dataset as JSON, CSV, or Excel — or fetch via the API.
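
Step 4 mentions fetching via the API. A minimal sketch using only the Python standard library and the Apify dataset items endpoint might look like this. The dataset ID and token are placeholders you supply from your own run and account, and the helper names are made up for this example.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def dataset_items_url(dataset_id: str, fmt: str = "json",
                      limit: Optional[int] = None) -> str:
    """Build the URL for a dataset's items on the Apify API."""
    params = {"format": fmt}
    if limit is not None:
        params["limit"] = str(limit)
    safe_id = urllib.parse.quote(dataset_id, safe="")
    return f"https://api.apify.com/v2/datasets/{safe_id}/items?{urllib.parse.urlencode(params)}"


def fetch_items(dataset_id: str, token: str) -> list:
    """Download all items of a dataset as a list of dicts (needs network access)."""
    request = urllib.request.Request(
        dataset_items_url(dataset_id),
        headers={"Authorization": f"Bearer {token}"},  # your Apify API token
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

The dataset ID is shown on the run's Storage tab. Passing `fmt="csv"` to `dataset_items_url` gives a CSV download link instead.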

📥 Input

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| subreddits | array | yes | ["programming"] | List of subreddit names (without the r/ prefix). One per line. Multiple subreddits are scraped in sequence. |
| mode | string | no | "hot" | Which Reddit listing to read: hot, new, top, rising, or controversial. |
| timeFilter | string | no | "day" | Applies only to top and controversial modes. |
| maxResults | integer | no | 100 | Reddit caps a listing at 1000 items. 0 means scrape until exhausted (still capped at 1000). |
| includeSelftext | boolean | no | true | When true, the post body (Markdown) is included for text posts. |
| proxyConfiguration | object | no | {"useApifyProxy": true} | Reddit can soft-rate-limit hot IPs. Apify Proxy with sticky sessions smooths this out. |

Example input

{
  "subreddits": [
    "programming"
  ],
  "mode": "hot",
  "maxResults": 3,
  "includeSelftext": false,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
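
To make the defaults and the 1000-item cap concrete, here is a rough sketch of how an input like the one above could be normalised. It is an illustration built from the input field descriptions, not the Actor's validation code. The function name, the lenient stripping of an `r/` prefix, and the error messages are assumptions.

```python
from typing import Any, Dict

MODES = {"hot", "new", "top", "rising", "controversial"}
LISTING_CAP = 1000  # Reddit's per-listing ceiling


def normalize_input(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the documented defaults and clamp maxResults to the listing cap."""
    subreddits = []
    for name in raw.get("subreddits", ["programming"]):
        name = name.strip()
        if name.lower().startswith("r/"):  # assumption: tolerate a stray prefix
            name = name[2:]
        if name:
            subreddits.append(name)
    if not subreddits:
        raise ValueError("subreddits must contain at least one name")

    mode = raw.get("mode", "hot")
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}")

    max_results = int(raw.get("maxResults", 100))
    if max_results < 0:
        raise ValueError("maxResults must be 0 or a positive integer")
    # 0 means "until exhausted", which in practice is still the cap.
    max_results = LISTING_CAP if max_results == 0 else min(max_results, LISTING_CAP)

    return {
        "subreddits": subreddits,
        "mode": mode,
        "timeFilter": raw.get("timeFilter", "day"),
        "maxResults": max_results,
        "includeSelftext": bool(raw.get("includeSelftext", True)),
        "proxyConfiguration": raw.get("proxyConfiguration", {"useApifyProxy": True}),
    }
```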

📤 Output

Every row is one dataset item.

| Field | Type | Notes |
| --- | --- | --- |
| id | string | Reddit post fullname (e.g. t3_abc123). |
| post_id | string | Short id (the abc123 part). |
| subreddit | string | Subreddit name (without r/). |
| title | string | Post title. |
| author | string | Author username, or [deleted]. |
| url | string | Outbound URL (or the Reddit permalink for self-posts). |
| permalink | string | Canonical Reddit permalink (https://reddit.com/r/.../comments/...). |
| selftext | string or null | Self-post body (Markdown). Null when includeSelftext=false or the post is a link post. |
| score | integer | Upvotes minus downvotes. |
| upvote_ratio | number | Approximate ratio (0.0 – 1.0). |
| num_comments | integer | Comment count. |
| over_18 | boolean | NSFW flag. |
| spoiler | boolean | Marked as spoiler. |
| stickied | boolean | Pinned by mods. |
| locked | boolean | Comments locked. |
| post_hint | string or null | Reddit's content classification (image, link, video, self, etc.). |
| flair | string or null | Post flair text. |
| created_utc | integer | Unix timestamp (seconds) of when the post was created. |
| posted_at | string | ISO-8601 UTC timestamp derived from created_utc. |
| scraped_at | string | When this row was recorded. |

Example output

{
  "id": "t3_1ab2c3d",
  "post_id": "1ab2c3d",
  "subreddit": "programming",
  "title": "An honest critique of the new Rust runtime",
  "author": "u_rustacean",
  "url": "https://example.com/blog/rust-runtime",
  "permalink": "https://reddit.com/r/programming/comments/1ab2c3d/an_honest_critique",
  "score": 1283,
  "upvote_ratio": 0.94,
  "num_comments": 312,
  "over_18": false,
  "created_utc": 1747353600,
  "posted_at": "2025-05-16T00:00:00+00:00"
}
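
The two derived fields, `post_id` and `posted_at`, can be reproduced from `id` and `created_utc` with the standard library. This is a sketch of the relationship the field descriptions give, with a made-up helper name. It is handy for checking rows in a downstream pipeline.

```python
from datetime import datetime, timezone


def derive_fields(fullname: str, created_utc: int) -> dict:
    """Split a post fullname like 't3_abc123' and format the creation time."""
    kind, _, short_id = fullname.partition("_")
    if kind != "t3" or not short_id:
        raise ValueError(f"not a post fullname: {fullname!r}")
    # created_utc is in seconds since the Unix epoch, always in UTC.
    posted_at = datetime.fromtimestamp(created_utc, tz=timezone.utc).isoformat()
    return {"post_id": short_id, "posted_at": posted_at}


print(derive_fields("t3_abc123", 1700000000))
# {'post_id': 'abc123', 'posted_at': '2023-11-14T22:13:20+00:00'}
```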

💰 Pricing

Pay-Per-Event — you pay only when these events fire:

| Event | USD | What it is |
| --- | --- | --- |
| actor-start | $0.005 | One-off warm-up charge per run |
| result | $0.001 | Per dataset item |

Example: one run that returns 1,000 results costs $0.005 + 1,000 × $0.001 = $1.005, or about $1.00. No subscription, no minimum, and no card to start: Apify gives every new account $5 of free credit.
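
The same arithmetic as a small helper, in case you want to budget several runs. The rates are the two listed for the actor-start and result events. The function name is made up for this example, and `Decimal` is used so the cents add up exactly.

```python
from decimal import Decimal

ACTOR_START_USD = Decimal("0.005")  # charged once per run
RESULT_USD = Decimal("0.001")       # charged per dataset item


def estimate_cost(results: int, runs: int = 1) -> Decimal:
    """Total cost in USD for `results` dataset items spread over `runs` runs."""
    return ACTOR_START_USD * runs + RESULT_USD * results


print(estimate_cost(1000))           # 1.005
print(estimate_cost(1000, runs=10))  # 1.050
```

Splitting the same 1,000 results across ten smaller runs adds nine extra start charges, about $0.045 in total.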

🚧 Limitations

The Reddit JSON endpoint caps any listing at 1000 items, even when you paginate. For deeper history, use the Pushshift API (separate Actor; request one).
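
To see where the cap bites, here is a rough sketch of cursor pagination over a Reddit listing. It assumes the usual listing shape, `data.children` plus a `data.after` cursor, with at most 100 items per request. `fetch_page` is a hypothetical callable you would back with a real HTTP request. None of this is the Actor's own code.

```python
from typing import Callable, Iterator, Optional

PAGE_SIZE = 100     # assumed largest page Reddit returns per request
LISTING_CAP = 1000  # listing ceiling described in this section


def iter_listing(fetch_page: Callable[[Optional[str], int], dict],
                 max_results: int = 100) -> Iterator[dict]:
    """Yield post dicts page by page until max_results, the cap, or exhaustion.

    `fetch_page(after, limit)` must return one parsed listing page.
    max_results == 0 means "until exhausted", still bounded by the cap.
    """
    target = LISTING_CAP if max_results == 0 else min(max_results, LISTING_CAP)
    after: Optional[str] = None
    seen = 0
    while seen < target:
        page = fetch_page(after, min(PAGE_SIZE, target - seen))
        children = page["data"]["children"]
        for child in children:
            yield child["data"]
            seen += 1
            if seen >= target:
                return
        after = page["data"].get("after")
        if not children or after is None:
            return  # Reddit has no further pages for this listing
```

Because the cursor stops advancing once the listing is exhausted, asking for more than 1000 items only costs extra requests without returning extra rows.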

❓ FAQ

Is this against Reddit's TOS?

Public read-only access via .json URLs has been the standard for over a decade. We do not impersonate users, vote, or comment. For commercial-scale crawling Reddit prefers their Data API — sign up at reddit.com/dev/api.

Why am I getting 429s?

Reddit aggressively rate-limits scrapers. Lower maxResults per run, or schedule multiple smaller runs.

Do you scrape comments?

Not in this Actor — comments fan out arbitrarily. See the reddit-post-comments-scraper sibling.

Why is the URL the same as the permalink?

For self-posts, Reddit's url field points to the post itself, not an outbound link.

💬 Your feedback

Spotted a bug, hit a weird edge case, or need a new field? Open an issue on the Actor's Issues tab on Apify Console — we ship fixes weekly and we read every report.