🚀 Reddit Scraper — every post, comment & thread, as clean JSON

Apify · Python · Output: JSON · CSV · Excel

Pull structured Reddit data at speed — posts, comments, scores, flairs, awards, media, timestamps. No login. No code. No babysitting.

🏠 Subreddits · 🔍 Keyword search · 👤 User submissions/comments · 🔗 Custom URLs — all four sources, one input form.


⚡️ Why this scraper

  • 🎯 50+ fields per post — full title and body, score breakdown, upvote ratio, flair, awards, removal status, media URLs, edit timestamps. Nothing dropped on the floor.
  • 💬 Comment threads on demand — flip one switch and get the full comment tree per post, threaded via parent_id and depth (see the rebuild sketch after this list).
  • 🚄 Fast — ~3 posts/second steady-state on default settings; ~250ms median per detail fetch.
  • 🧠 Smart pagination — stops the moment your Max items budget is reached. Never over-fetches, never wastes Apify Compute Units.
  • 🔁 Incremental mode — pass a since timestamp and only get posts newer than your last run. Perfect for daily monitoring jobs.
  • 🛡️ Built-in failure budget — if Reddit starts pushing back (challenges, hard 4xx), the actor aborts cleanly instead of burning through your CU on a broken extractor.
  • 📊 Three export formats out of the box — JSON, CSV, Excel. Direct download links from the run page.
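
Comments arrive as a flat list linked by parent_id (see the sample output below). If you want the nested tree back, here is a minimal sketch; build_tree is a hypothetical helper, not part of the actor, and it relies only on the fullname prefixes visible in the sample output (t3_ marks the post itself, t1_ a parent comment):

def build_tree(post):
    """Re-nest a post's flat `comments` list via parent_id (hypothetical helper)."""
    nodes = {c["id"]: {**c, "replies": []} for c in post.get("comments", [])}
    roots = []
    for node in nodes.values():
        kind, _, parent_id = node["parent_id"].partition("_")
        if kind == "t1" and parent_id in nodes:
            # Parent is another comment in this post's tree.
            nodes[parent_id]["replies"].append(node)
        else:
            # A "t3_..." parent is the post itself: a top-level comment.
            roots.append(node)
    return roots

Sorting each replies list by score or created_at afterwards reproduces Reddit's usual orderings.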

🚀 Quick start

  1. Click Try for free (top-right). No code, no API key.
  2. Pick a search type — Subreddit, Search, User, or paste your own URLs.
  3. Hit Start and let it run.
  4. Download as JSON / CSV / Excel from the run page.
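
Prefer to run it from code? A minimal sketch using the official Apify Python client (pip install apify-client); the actor ID always-prime/reddit-scraper is an assumption based on the developer name, so check the actor's API tab for the real one:

from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Start the actor and wait for the run to finish (actor ID is assumed).
run = client.actor("always-prime/reddit-scraper").call(run_input={
    "searchType": "subreddit",
    "subreddits": ["python"],
    "sortBy": "new",
    "maxItems": 50,
})

# Stream the scraped posts from the run's default dataset.
for post in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(post["created_at"], post["score"], post["title"])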

📥 Input

  • What to scrape (searchType), enum: subreddit · search · user · urls
  • Subreddits (subreddits), string list: e.g. python, programming (no r/ prefix)
  • Search query (query), string: keywords; Reddit operators work: author:, subreddit:, self:yes, flair:.
  • Users (users), string list: usernames to scrape (no u/ prefix)
  • User content type (userContent), enum: submitted (posts) or comments
  • Sort by (sortBy), enum: hot · new · top · rising · controversial · relevance · comments
  • Time window (time), enum: hour · day · week · month · year · all (only matters for top/controversial)
  • Max items (maxItems), int: stop after N posts. 0 = unlimited. Default 50.
  • Scrape comments (scrapeComments), bool: fetch the comment tree for every post. Default off (cheaper for indexing).
  • Max comments per post (commentDepth), int: cap on comments per post (BFS). Default 200.
  • Only posts newer than (since), datetime: ISO 8601 cutoff for incremental runs.
  • Concurrency (concurrency), int: parallel fetches. Default 5, max 25.
  • Start URLs (startUrls), string list: advanced override — paste any reddit URLs and ignore the search-type builder.
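
Put together, a typical input, as you'd paste it into the JSON editor on the actor's input tab, might look like this (the values are illustrative only):

{
  "searchType": "search",
  "query": "pycon",
  "subreddits": ["Python"],
  "sortBy": "new",
  "time": "month",
  "maxItems": 100,
  "scrapeComments": true,
  "commentDepth": 200
}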

📦 Sample output

One record per post — flat, JSON-friendly, ready to load into BigQuery / Postgres / pandas.

{
  "id": "1t3x7ba",
  "fullname": "t3_1t3x7ba",
  "url": "https://www.reddit.com/r/Python/comments/1t3x7ba/whos_going_to_pycon_us_next_week/",
  "subreddit": "Python",
  "subreddit_prefixed": "r/Python",
  "subreddit_id": "t5_2qh0y",
  "title": "Who's going to PyCon US next week?",
  "selftext": "Me ✋ I hope to see a good number of you all in Long Beach, too! ...",
  "is_self": true,
  "domain": "self.Python",
  "post_hint": "self",
  "link_url": null,
  "author": "Loren-PSF",
  "author_fullname": "t2_so0s40st",
  "author_flair_text": ":pythonLogo: Python Software Foundation Staff",
  "distinguished": null,
  "score": 46,
  "ups": 46,
  "upvote_ratio": 0.91,
  "num_comments": 35,
  "num_crossposts": 0,
  "total_awards_received": 0,
  "gilded": 0,
  "over_18": false,
  "spoiler": false,
  "locked": false,
  "stickied": true,
  "archived": false,
  "is_video": false,
  "is_original_content": false,
  "link_flair_text": "Discussion",
  "link_flair_css_class": "discussion",
  "link_flair_background_color": "#f50057",
  "thumbnail": null,
  "preview_image_url": "https://external-preview.redd.it/FBtD3iI-OdRHdmfJbVushiwzLeMcmgTx-Ff3FnwUUg0.jpeg",
  "video_url": null,
  "removed_by_category": null,
  "removal_reason": null,
  "created_at": "2026-05-04T22:40:29+00:00",
  "edited_at": null,
  "scraped_at": "2026-05-09T13:43:47+00:00",
  "comments": [
    {
      "id": "myz2pn1",
      "parent_id": "t3_1t3x7ba",
      "depth": 0,
      "author": "vintagegeek",
      "body": "I'll be there with bells on. Looking forward to meeting people!",
      "score": 19,
      "is_submitter": false,
      "stickied": false,
      "permalink": "https://www.reddit.com/r/Python/comments/1t3x7ba/.../myz2pn1/",
      "created_at": "2026-05-04T23:01:14+00:00",
      "edited_at": null
    }
  ],
  "comments_count_scraped": 35
}
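
To make the pandas claim concrete, a quick sketch for flattening the nested comments into their own table; it assumes the JSON export was downloaded from the run page as dataset.json with scrapeComments enabled:

import pandas as pd

# One row per post; the nested `comments` list stays as an object column.
posts = pd.read_json("dataset.json")

# One row per comment, carrying the parent post's id and subreddit.
comments = pd.json_normalize(
    posts.to_dict("records"),
    record_path="comments",
    meta=["id", "subreddit"],
    record_prefix="comment_",
    meta_prefix="post_",
)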

💡 Use cases

  • 📈 Market researchers: track sentiment, competitor mentions and product feedback across niche subreddits.
  • 🤖 AI / ML teams: build training corpora from focused subreddits — clean text, threading preserved.
  • 📰 Journalists & analysts: monitor breaking-story subreddits and surface trending discussions for coverage.
  • 💼 Brand / community managers: find unanswered support questions about your product across Reddit, on a daily cron.
  • 🏷️ Recruiters & talent intel: pull discussions in tech-job subreddits to track skill demand and salary chatter.
  • 🧑‍🔬 Academic researchers: public-discourse datasets for sociolinguistics, network analysis, opinion mining.

🧰 Tips & tricks

  • 🪶 Index-first, hydrate later. Run with scrapeComments: false and maxItems: 0 to cheaply enumerate everything, then do a second run with startUrls and scrapeComments: true on only the posts you care about.
  • ⏱️ Daily diffs. Save the timestamp of your last successful run, then pass it as since next time; the actor short-circuits old posts before fetching them (see the sketch after this list).
  • 🎛️ Subreddit-scoped search. Set searchType: search, fill query, and add subreddits to subreddits — the actor automatically scopes search to those subreddits.
  • 🔗 Mix custom URLs. Drop any reddit.com/... URL into startUrls (a thread, a multireddit, a sort variant) — the actor strips/appends .json itself.
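
A sketch of the daily-diff pattern with the Apify Python client, persisting the cursor in a named key-value store; the store name, token placeholder, and actor ID are all assumptions, not fixed names:

from datetime import datetime, timezone
from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

# A named key-value store holds the cursor between runs (name is arbitrary).
store_info = client.key_value_stores().get_or_create(name="reddit-monitor-state")
state = client.key_value_store(store_info["id"])

record = state.get_record("last_run")  # None on the very first run
since = record["value"] if record else "1970-01-01T00:00:00Z"

run = client.actor("always-prime/reddit-scraper").call(run_input={
    "searchType": "subreddit",
    "subreddits": ["python"],
    "sortBy": "new",
    "since": since,   # only posts newer than the previous run
    "maxItems": 0,
})

# Advance the cursor only after a successful run.
state.set_record("last_run", datetime.now(timezone.utc).isoformat())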

❓ FAQ

Does it need a Reddit account? No.

What about the new Reddit API limits? This actor doesn't use Reddit's Data API, so the post-2023 commercial pricing tiers don't apply.

Can I scrape NSFW subreddits? Yes. NSFW posts are returned with over_18: true so you can filter downstream.

Will it get all comments on a huge thread? Up to your commentDepth cap (default 200, max 5000), breadth-first across the tree. For Reddit's truly massive megathreads (>10K comments), Reddit itself paginates and not every comment is reachable in one fetch — that's a Reddit limitation, not the scraper's.

What if a post is deleted while scraping? Deleted posts come through with author: "[deleted]", selftext: "[deleted]", and removed_by_category: "deleted". They're not skipped — you get the metadata Reddit still surfaces.

How fresh is the data? It's fetched live at run time, not served from a cache. Each record carries a scraped_at UTC timestamp.


📅 Changelog

0.1 (initial release)

  • Subreddit, search, user, and start-URL modes
  • Configurable comment-tree scraping with depth cap
  • Incremental since filter, maxItems cap, dedup, failure budget
  • JSON / CSV / Excel exports

This scraper accesses Reddit through public, non-authenticated requests. Reddit's robots.txt disallows automated crawling, and Reddit's User Agreement and Public Content Policy restrict automated/commercial use of Reddit content. By using this scraper you take on responsibility for the legality of your specific use case in your jurisdiction (including GDPR / CCPA where applicable). The scraper does not bypass authentication, paywalls, or technical access controls. Use it for research, journalism, internal analytics, ML/AI training datasets, or other lawful purposes — and confirm that those purposes are compatible with Reddit's policies and any applicable law before running large-scale jobs. Personal data scraped from Reddit (usernames, comment bodies, flair) may constitute PII under GDPR even though usernames are pseudonymous; treat the output dataset accordingly.