Reddit Historical Archive Scraper

Access 10+ years of archived Reddit posts and comments via PullPush. Full-text comment search (Reddit can't do this). No login, no proxy. $0.001/item.

- Pricing: from $1.50 / 1,000 results
- Rating: 0.0 (0 reviews)
- Developer: Logiover
- Actor stats: 0 bookmarked · 2 total users · 1 monthly active user
- Last modified: a day ago
🎯 Reddit All-in-One Scraper — Posts, Comments, Users, Subreddits, Search
The most complete Reddit scraper on Apify. One actor, six modes, zero authentication. Scrape subreddit listings, full post comment trees (including "load more" expansion), user profiles with their full submission and comment history, global search results, and subreddit discovery — all in a single run, all via Reddit's public .json endpoints. $0.001 per item — the cheapest Reddit scraper available, half the price of the next-cheapest competitor. Free 50 items per run.
This is the only Reddit scraper you need. No PRAW. No OAuth tokens. No API keys. No login. Just paste subreddits, URLs, usernames, or search queries — get clean structured JSON back. Pure HTTP, no browser overhead, no cookies, no risk of rate-limit bans on your account (because there's no account involved).
🚀 Why this Reddit scraper beats every alternative
| Feature | This actor | Most Apify Reddit scrapers |
|---|---|---|
| Price per 1,000 items | $1.00 | $2.00 – $5.00 |
| Login required | ❌ No | ⚠️ Some need OAuth |
| Subreddit listings (hot/new/top/rising/controversial/best) | ✅ Yes | ✅ Most |
| Full post + complete comment tree | ✅ Yes (with recursive "load more" expansion) | ⚠️ Often shallow |
| "More comments" expansion (Reddit's hidden comments) | ✅ Yes (automatic) | ❌ Usually missing |
| User profile + posts + comments | ✅ Yes (all in one mode) | ⚠️ Often separate actors |
| Global search (posts) | ✅ Yes | ⚠️ Some |
| Subreddit discovery search | ✅ Yes | ❌ Rare |
| Auto-detect any Reddit URL type | ✅ Yes | ❌ No |
| Mix multiple input modes in ONE run | ✅ Yes (subreddits + URLs + users + searches together) | ❌ One mode per run |
| Image / video / gallery extraction | ✅ Yes (preview URLs + dimensions + gallery items) | ⚠️ Partial |
| Comment depth tracking | ✅ Yes | ❌ Rare |
| NSFW / spoiler / locked / stickied flags | ✅ Yes | ⚠️ Partial |
| Custom request delay | ✅ Yes (default 1500ms) | ⚠️ Often hardcoded |
| Auto-retry with exponential backoff on 429 | ✅ Yes | ⚠️ Partial |
| Pure HTTP (256 MB memory) | ✅ Yes | ⚠️ Some use Playwright |
| Free items per run | 50 | 0 – 25 |
Used by: Brand monitoring teams, market researchers, AI training data engineers, journalists, academic researchers, social listening platforms, sentiment analysis tools, B2B lead generation, content marketing agencies, crypto/finance signal extraction, recruitment scouts, customer support intelligence, product feedback aggregators, niche community analysts.
💎 What makes this scraper different — 6 unique angles
1. All six modes in ONE actor — not six separate purchases
Most Reddit "scrapers" on Apify only do one thing. Subreddit-only. User-only. Search-only. Comments-only. To get a complete Reddit dataset you'd run 4-6 different actors, pay 4-6 separate fees, and stitch the data together yourself. This one does everything. Subreddits, posts (with comments), users, search, subreddit discovery, and any-URL auto-detection — all in a single run, single price, unified output schema. The dataset comes back tagged with type: "post", type: "comment", type: "user", or type: "subreddit" so you can filter or split it however you want.
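Because every record is tagged, splitting a mixed run back into per-type datasets downstream takes only a few lines. A minimal Python sketch (illustrative, not part of the actor):

```python
from collections import defaultdict

def split_by_type(records):
    """Group a mixed scraper dataset into one list per record type."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec.get("type", "unknown")].append(rec)
    return dict(buckets)

# buckets = split_by_type(dataset_items)
# buckets["post"], buckets["comment"], buckets["user"], buckets["subreddit"]
```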
2. Full comment tree expansion — including hidden "load more" branches
Reddit's API returns comment trees in chunks. After the first few replies, you get a kind: "more" placeholder — a list of comment IDs that weren't included in the response. Most scrapers ignore these. This one automatically calls Reddit's /api/morechildren.json endpoint to fetch them, recursively, until the entire tree is captured. On a popular post with 500+ comments, this is often the difference between seeing 100 comments and seeing 800. Configurable via expandComments and maxCommentDepth.
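If you ever post-process raw Reddit comment listings yourself, this is roughly what the expansion has to deal with. A hedged sketch of the tree walk that separates real comments from the placeholders (the nested `children`/`replies` shape follows Reddit's public comment-listing format; this is an illustration, not the actor's internal code):

```python
def collect_more_ids(children):
    """Walk a Reddit comment-listing 'children' array, collecting the
    comment IDs hidden behind kind == "more" placeholders, recursively."""
    more_ids, comments = [], []
    for child in children:
        if child.get("kind") == "more":
            # Placeholder: a list of comment IDs Reddit did not inline.
            more_ids.extend(child["data"].get("children", []))
        elif child.get("kind") == "t1":
            data = child["data"]
            comments.append(data)
            replies = data.get("replies")
            if isinstance(replies, dict):  # empty replies come back as ""
                sub_more, sub_comments = collect_more_ids(
                    replies["data"]["children"])
                more_ids.extend(sub_more)
                comments.extend(sub_comments)
    return more_ids, comments
```

The IDs collected this way are what gets fed back into `/api/morechildren.json` until nothing is left.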
3. Mix-and-match input — one run, many sources
You can pass all of these in the same input:
- 10 subreddits
- 5 specific post URLs
- 3 usernames you want to profile
- 4 search queries
- And let the scraper auto-detect 7 random Reddit URLs you pasted from your browser
Everything runs in sequence with a unified budget. Want all comments on a specific viral post + a profile of its author + the top 100 posts in that subreddit + 50 results from a related search query? One run. Other scrapers force you to set up four separate jobs.
4. The cheapest Reddit scraper on Apify — by 50%+
- This actor: $0.001 per item ($1 per 1,000)
- Most competitors: $0.002–$0.005 per item ($2–$5 per 1,000)
- Subscription-model scrapers: $20–$45/month flat (only break even if you scrape 20K+ items)
If you scrape 100,000 Reddit items per month, you save $100–$400/month vs. alternatives. If you scrape less than 5,000/month, you avoid subscription lock-in entirely.
5. Built for AI training and RAG pipelines
Output is clean structured JSON with consistent field names across all record types. selftext, body, title are flat strings (no HTML escaping headaches). createdUtc is ISO 8601, not Unix epoch. Comments include depth for tree reconstruction. Posts include media URLs, gallery items, and preview images for multimodal training. Drop the dataset directly into LangChain, LlamaIndex, Pinecone, Weaviate, or any vector DB. Used by teams training instruction-following models on Reddit Q&A data and sentiment-analysis models on long-form discussion.
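The `fullname`/`parentId` fields make rebuilding the nested comment tree from the flat dataset straightforward. A sketch, assuming the field names documented in the output schema below:

```python
def build_tree(records):
    """Reconstruct a nested comment tree from flat scraper output
    using the fullname / parentId fields."""
    by_fullname = {r["fullname"]: dict(r, replies=[]) for r in records
                   if r.get("type") == "comment"}
    roots = []
    for node in by_fullname.values():
        parent = by_fullname.get(node.get("parentId"))
        if parent:
            parent["replies"].append(node)
        else:
            roots.append(node)  # parent is the post itself (t3_...)
    return roots
```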
6. Zero authentication = zero ban risk
This scraper accesses Reddit's PUBLIC .json endpoints — the same data anyone can see by appending .json to any Reddit URL in their browser. No OAuth tokens, no Reddit account, no PRAW credentials. You can't get your Reddit account rate-limited or banned because no account is involved. Compare this to OAuth-based scrapers where a heavy run can flag your developer app for review or your account for temporary suspension.
💡 What you can do with this data
1. Brand & product mention monitoring
Set up search queries for your brand name, your competitors, and your product category. Schedule daily runs and pipe results into Slack, Notion, or your CRM. Catch a viral negative thread within hours of it going up. Used by DTC brands, SaaS companies, and PR teams.
2. Customer support and bug intelligence
Scrape your product's subreddit (or related discussion subreddits) on the new sort weekly. Filter for posts with low score + high numComments — these are usually unresolved complaints or bugs. Pipe into your support ticketing system as proactive leads.
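That filter is a few lines of Python; a sketch (the thresholds here are illustrative, tune them to your community's size):

```python
def likely_complaints(posts, max_score=5, min_comments=20):
    """Heuristic: low-score, high-discussion posts often signal
    unresolved complaints or bug reports."""
    return [p for p in posts
            if p.get("type") == "post"
            and p.get("score", 0) <= max_score
            and p.get("numComments", 0) >= min_comments]
```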
3. AI training corpus assembly
Build a high-quality conversational corpus by scraping subreddits like r/AskHistorians, r/explainlikeimfive, r/AskScience — each containing curated Q&A pairs. Use includeComments: true to capture full discussion trees. Hundreds of thousands of high-quality training examples for instruction-following models.
4. Influencer and thought leader discovery
Use searchType: "user" with topic keywords to find the most-active Reddit voices in your niche. Filter by commentKarma > 50000 and recent activity. These are your potential AMA guests, podcast guests, or community partnerships.
5. Subreddit discovery for niche targeting
Use subredditSearch mode with industry keywords to find every relevant community for your product. A SaaS company in the fitness space might discover 50+ relevant subreddits — most far smaller and more engaged than r/fitness — perfect for grassroots community marketing.
6. Sentiment analysis on real consumer language
Scrape Reddit discussions about products, brands, or topics. Pipe into a sentiment analysis model. Get richer, more candid signal than Twitter (Reddit comments are longer and less performative) and broader than survey responses. Used by hedge funds, CPG product teams, and market research firms.
7. Reddit trend monitoring for content marketing
Schedule daily scrapes of top posts in your category subreddits with timeframe: "day". Tomorrow's content trends are today's top Reddit posts. Used by content marketing teams and editorial newsrooms.
8. Crypto and finance signal extraction
Subreddits like r/wallstreetbets, r/CryptoCurrency, r/stocks are leading indicators for retail sentiment. Scrape on a schedule, run NLP for ticker mention frequency, build a sentiment-tracking dashboard. Used by quant funds and finance content sites.
9. Competitive intelligence for SaaS
Scrape your competitors' subreddits or search for their product names. See what features users wish existed, where they're frustrated, what alternatives they mention. Free product research — far more honest than user interviews.
10. Academic and journalistic research
Scrape historical posts from any subreddit. Build longitudinal datasets of community discourse. Track how community language, sentiment, or topic emphasis has shifted over time. Used by linguistics researchers, political scientists, and investigative journalists.
11. Lead generation from active redditors
Find users posting in your category's subreddits with submitted or comments user mode. Their post history reveals their professional role, location, and interests — qualifying signal for B2B outreach (when paired with a careful, consent-respecting outreach approach).
12. Content idea engine
Stuck on what to post? Scrape top posts from your category subreddit, study the titles (every title is a tested headline) and discussions, and reverse-engineer winning content angles. Feed scraped titles into an AI summarizer for emerging themes.
📦 Output fields
Every record has a type field telling you what it is: post, comment, user, or subreddit. Fields are populated as relevant to each type.
Posts (type: "post")
| Field | Description |
|---|---|
| `id`, `fullname` | Reddit post ID (`1abc234`) and fullname (`t3_1abc234`) |
| `subreddit`, `subredditId`, `subredditNamePrefixed` | Where the post lives |
| `author`, `authorFullname` | Poster's username and fullname (`t2_...`) |
| `title` | Post title |
| `selftext`, `selftextHtml` | Post body (text posts only) |
| `url`, `permalink` | External link / direct Reddit permalink |
| `domain` | Domain of the linked URL |
| `isSelf`, `isVideo`, `isGallery` | Type flags |
| `over18`, `spoiler`, `locked`, `stickied`, `archived` | Status flags |
| `score`, `upvoteRatio`, `ups`, `downs` | Engagement metrics |
| `numComments`, `numCrossposts` | Discussion volume |
| `gilded`, `totalAwardsReceived` | Award counts |
| `flairText`, `flairCss`, `authorFlairText` | Subreddit flair |
| `thumbnail`, `thumbnailWidth`, `thumbnailHeight` | Thumbnail (when available) |
| `preview` | Array of preview images with URLs and dimensions |
| `media` | Embedded media (YouTube, video, gif) with oembed and direct URLs |
| `galleryImages` | For multi-image gallery posts — array of all images with captions |
| `crosspostParent` | Set if this post is a crosspost |
| `createdUtc`, `edited` | ISO timestamps |
| `distinguished`, `suggestedSort` | Mod/admin flags |
Comments (type: "comment")
| Field | Description |
|---|---|
| `id`, `fullname` | Comment ID (`c3v7f8u`) and fullname (`t1_c3v7f8u`) |
| `parentId`, `linkId` | Parent comment/post and parent post fullnames |
| `subreddit`, `author`, `authorFullname` | Where and who |
| `body`, `bodyHtml` | Raw text and HTML-formatted version |
| `score`, `scoreHidden`, `ups`, `downs` | Engagement (downvotes are usually 0 due to Reddit's fuzzing) |
| `gilded`, `controversial` | Award and controversy flags |
| `depth` | How deep in the reply tree (0 = top-level, 1 = direct reply, etc.) |
| `permalink`, `createdUtc`, `edited` | URLs and timestamps |
| `distinguished`, `isSubmitter`, `stickied` | Mod/OP/sticky flags |
| `flairText` | Author flair text |
Users (type: "user")
| Field | Description |
|---|---|
| `id`, `fullname`, `username` | Reddit user ID and username |
| `linkKarma`, `commentKarma`, `totalKarma` | Karma breakdown |
| `awardeeKarma`, `awarderKarma` | Award-related karma |
| `isMod`, `isGold`, `isEmployee` | Account flags |
| `verified`, `hasVerifiedEmail` | Verification status |
| `createdUtc` | Account age |
| `iconImg` | Avatar URL |
| `subredditTitle`, `subredditDescription`, `subredditSubscribers` | The user's own profile-page subreddit |
| `subredditNsfw` | Whether the profile is marked NSFW |
| `hiddenFromBots` | Whether the user opted out of search indexing |
Subreddits (type: "subreddit")
| Field | Description |
|---|---|
| `id`, `fullname` | Subreddit ID (`t5_...`) |
| `displayName`, `displayNamePrefixed` | `programming`, `r/programming` |
| `title`, `description`, `publicDescription` | Subreddit metadata |
| `subscribers`, `activeUserCount` | Community size and currently online |
| `subredditType` | `public`, `private`, `restricted`, `gold_restricted` |
| `over18`, `quarantine` | Status flags |
| `url`, `iconImg`, `bannerImg`, `communityIcon`, `headerImg` | Visual assets |
| `primaryColor`, `keyColor` | Branding |
| `lang` | Primary language |
| `createdUtc` | Subreddit age |
| `submissionType`, `submitText` | What kind of posts are allowed |
| `wikiEnabled` | Whether the wiki is available |
⚙️ Input configuration
Input sources (use any combination — they all run together)
| Field | Description |
|---|---|
| `subreddits` | List of subreddit names (no `/r/` prefix) |
| `postIds` | List of post IDs from `/comments/XXXX/` URLs |
| `usernames` | List of usernames (no `/u/` prefix) |
| `searchQueries` | Free-text search queries (Reddit-wide unless `restrictToSubreddit` is set) |
| `subredditSearch` | Find subreddits matching a keyword (community discovery) |
| `startUrls` | Any Reddit URLs — type auto-detected from URL structure |
Filters
| Field | Default | Options |
|---|---|---|
| `sort` | `hot` | `hot`, `new`, `top`, `rising`, `controversial`, `best` |
| `timeframe` | `all` | `hour`, `day`, `week`, `month`, `year`, `all` |
| `userContent` | `overview` | `overview`, `submitted`, `comments`, `about` |
| `searchType` | `link` | `link` (posts), `sr` (subreddits), `user` |
| `searchSort` | `relevance` | `relevance`, `hot`, `top`, `new`, `comments` |
| `restrictToSubreddit` | `null` | Subreddit name to scope search to one community |
Volume & depth
| Field | Default | Description |
|---|---|---|
| `maxItems` | 1000 | Hard ceiling across ALL targets. 0 = unlimited. |
| `maxItemsPerTarget` | 200 | Cap per subreddit/user/post/search. Prevents one big target from eating the budget. |
| `expandComments` | `true` | Recursively call Reddit's `/api/morechildren` to fetch hidden comments. |
| `maxCommentDepth` | 10 | Max nesting depth for comment trees. |
| `includeComments` | `false` | When scraping subreddit listings, also fetch ALL comments for each post. Dramatically increases data volume. |
Politeness
| Field | Default | Description |
|---|---|---|
| `requestDelayMs` | 1500 | Milliseconds between requests. Lower = faster but risks 429. |
| `maxRetries` | 4 | Retry attempts on 429/5xx with exponential backoff. |
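For reference, the retry delay in an exponential-backoff scheme is typically computed like this (an illustrative sketch mirroring the `requestDelayMs`/`maxRetries` defaults, not the actor's exact implementation):

```python
import random

def backoff_delay_ms(attempt, base_ms=1500, cap_ms=60_000):
    """Delay before retry N (0-indexed): base * 2^attempt plus random
    jitter, capped so a long retry chain never waits unboundedly."""
    delay = min(base_ms * (2 ** attempt), cap_ms)
    return delay + random.randint(0, base_ms // 2)
```

With the defaults, retries wait roughly 1.5 s, 3 s, 6 s, 12 s (plus jitter) before giving up.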
💡 Example inputs
Quick subreddit scrape — top posts of the week
```json
{
  "subreddits": ["programming"],
  "sort": "top",
  "timeframe": "week",
  "maxItemsPerTarget": 100
}
```
~100 posts, $0.10, runs in ~3 minutes. The simplest possible run.
Multi-subreddit brand monitoring
```json
{
  "subreddits": ["programming", "webdev", "javascript", "node", "reactjs"],
  "sort": "new",
  "searchQueries": ["my-product-name"],
  "maxItems": 500,
  "maxItemsPerTarget": 100
}
```
500 items, mixed across 5 subreddit listings AND a global search for your brand name. $0.50.
Full deep-dive on one viral post
```json
{
  "postIds": ["1abc234"],
  "expandComments": true,
  "maxCommentDepth": 10,
  "maxItemsPerTarget": 5000
}
```
Captures the post plus every comment (including all "load more" expansions). For a 2000-comment viral thread: ~$2.00. Perfect for post-mortem analysis of viral content.
Full user profile and history
```json
{
  "usernames": ["spez", "kn0thing"],
  "userContent": "overview",
  "maxItemsPerTarget": 200
}
```
Returns each user's profile + their most recent 200 posts and comments. $0.40 total.
Search Reddit-wide for emerging trends
```json
{
  "searchQueries": ["AI coding agents", "vibe coding", "AI editor"],
  "searchSort": "new",
  "timeframe": "month",
  "maxItemsPerTarget": 100
}
```
Captures fresh posts mentioning each query across all of Reddit. Daily-scheduled = real-time emerging trend tracker. $0.30.
Subreddit discovery for niche targeting
```json
{
  "subredditSearch": ["3d printing", "home brewing", "vintage cameras"],
  "maxItemsPerTarget": 50
}
```
Finds 50 most-relevant subreddits per topic. Returns subscriber counts and descriptions so you can identify the most valuable niches. $0.15.
Mix-and-match — full Reddit intelligence run
```json
{
  "subreddits": ["startups", "smallbusiness"],
  "searchQueries": ["my-product", "competitor-product"],
  "usernames": ["my-power-user", "competitor-founder"],
  "startUrls": [
    "https://www.reddit.com/r/saas/comments/abc123/...",
    "https://www.reddit.com/r/entrepreneurship/top/"
  ],
  "sort": "new",
  "maxItems": 2000
}
```
2 subreddits + 2 search queries + 2 users + 2 auto-detected URLs in ONE run. $2.00. The flagship use case.
URL-only mode (paste any Reddit URLs)
```json
{
  "startUrls": [
    "https://www.reddit.com/r/programming/",
    "https://www.reddit.com/r/webdev/top/?t=week",
    "https://www.reddit.com/r/javascript/comments/abc123/some_post_title/",
    "https://www.reddit.com/user/spez/",
    "https://www.reddit.com/search?q=startup"
  ]
}
```
The lazy mode. Paste any 5 Reddit URLs from your browser. Each one is auto-detected and scraped accordingly. $0–$2 depending on what's behind the URLs.
📊 Output sample (post)
```json
{
  "type": "post",
  "id": "1abc234",
  "fullname": "t3_1abc234",
  "subreddit": "programming",
  "subredditId": "t5_2fwo",
  "subredditNamePrefixed": "r/programming",
  "author": "example_user",
  "authorFullname": "t2_abcdef",
  "title": "Why X always beats Y for production workloads",
  "selftext": "After 5 years running this in production, I've learned...",
  "selftextHtml": "<div class=\"md\">...</div>",
  "url": "https://example.com/article",
  "permalink": "https://www.reddit.com/r/programming/comments/1abc234/why_x_always_beats_y/",
  "domain": "example.com",
  "isSelf": false,
  "isVideo": false,
  "isGallery": false,
  "over18": false,
  "score": 4823,
  "upvoteRatio": 0.94,
  "numComments": 412,
  "gilded": 3,
  "flairText": "Discussion",
  "thumbnail": "https://b.thumbs.redditmedia.com/...",
  "preview": [{ "url": "https://preview.redd.it/...", "width": 1200, "height": 630, "variants": ["gif", "mp4"] }],
  "createdUtc": "2026-05-05T14:32:11.000Z"
}
```
📊 Output sample (comment)
```json
{
  "type": "comment",
  "id": "c3v7f8u",
  "fullname": "t1_c3v7f8u",
  "parentId": "t3_1abc234",
  "linkId": "t3_1abc234",
  "subreddit": "programming",
  "author": "thoughtful_reply_user",
  "body": "Great points, but I'd push back on the third one because...",
  "score": 234,
  "depth": 0,
  "isSubmitter": false,
  "permalink": "https://www.reddit.com/r/programming/comments/1abc234/.../c3v7f8u/",
  "createdUtc": "2026-05-05T15:01:44.000Z"
}
```
💰 Pricing
Pay-per-event model. You pay only for items actually saved.
| Volume | Estimated cost |
|---|---|
| 50 items | FREE (every run) |
| 100 items | $0.05 |
| 500 items | $0.45 |
| 1,000 items | $0.95 |
| 5,000 items | $4.95 |
| 10,000 items | $9.95 |
| 50,000 items | $49.95 |
| 100,000 items | $99.95 |
| Subscription tier | Effective price per 1,000 items |
|---|---|
| Free / Starter | $1.00 |
| Bronze | $0.90 |
| Silver | $0.80 |
| Gold | $0.65 |
Cost comparison vs other Apify Reddit scrapers
| Scraper | Price / 1,000 items | Monthly minimum |
|---|---|---|
| This actor | $1.00 | $0 (pay-as-you-go) |
| Fast Reddit Scraper (practicaltools) | $2.00 | $0 |
| Reddit API Scraper (comchat) | $5.00 | $0 |
| Reddit Scraper Pro (harshmaur) | "Unlimited" | $20/month flat |
| Reddit Scraper (trudax) | + Apify usage | $45/month + usage |
| Reddit Scraper Plus (ctrlaltwin) | + usage | $30/month + usage |
At 100K items/month: this actor = $100, competitors = $200-$500+. At 1M items/month: this actor = $1,000, competitors = $2,000-$5,000+.
⚡ Performance
- Pure HTTP, no browser — `.json` endpoints return clean JSON, 10× faster than Playwright-based scrapers. Runs in 256 MB memory.
- No login or OAuth — public `.json` endpoints, no Reddit account ever touched.
- No proxy required for most workloads — Apify Datacenter proxy is sufficient. Add Residential for very large daily volumes.
- 100 items per page for listings (Reddit's max).
- Throughput: ~40 items per minute at the default 1500 ms delay. Lower the delay (e.g., 800 ms) for ~75 items/minute if you're willing to risk occasional 429s.
- Auto-retry: 4 retries with exponential backoff on 429/503 — recovers cleanly from temporary throttling.
- Auto-deduplication within a single comment tree by `fullname`.
- Stable endpoints — Reddit's `.json` endpoints have been stable for 10+ years. No DOM fragility.
🔗 Integrations
Export as JSON, CSV, Excel, or XML. Connect via:
- Zapier / Make / n8n — auto-add new Reddit mentions to Slack/CRM
- Google Sheets — live brand-monitoring dashboard, refreshed hourly via Apify Schedules
- Slack / Discord — daily/hourly digest of new mentions
- REST API — programmatic access from Python, Node.js, any language
- Airtable / Notion — visual content swipe file or community CRM
- LangChain / LlamaIndex — RAG pipelines on Reddit Q&A and discussion data
- HubSpot / Salesforce — enrich lead records with their Reddit activity
- Apollo / Outreach / SalesLoft — feed active redditors into B2B sequencer
- BigQuery / Snowflake / PostgreSQL — data warehouse for sentiment analytics
- Pinecone / Weaviate / Chroma — vector DBs for semantic search
- Webhooks — push every new item to your backend in real time
- MCP (Model Context Protocol) — usable by Claude, ChatGPT, and other AI assistants for natural-language Reddit research
🆚 Reddit All-in-One Scraper vs alternatives
vs Reddit's official API (OAuth)
Reddit's official API requires OAuth app registration, scopes, refresh tokens, and 100 requests/minute per app. It locks you to a specific Reddit account that can be rate-limited or banned. This scraper uses the public .json endpoints — same data, no account, no OAuth, no app review. Faster to set up, safer to operate at scale.
vs PRAW (Python Reddit API Wrapper)
PRAW is a great library if you're building a Reddit bot or moderator tool — but it requires OAuth credentials and shares your account's rate-limit budget. For data extraction, this scraper is the right tool: no Python install, no OAuth, no account, full structured JSON output.
vs Pushshift / Reddit data dumps
Pushshift used to be the standard for Reddit historical research, but its API was effectively shut down after Reddit's 2023 policy changes. Public Reddit data dumps exist but require terabytes of storage and complex querying. This scraper gives you targeted, real-time access to exactly the slice of Reddit you need.
vs Brand24 / Mention / Brandwatch
Social listening platforms cost $99–$5,000/month and aggregate Reddit alongside Twitter, news, etc. Their Reddit coverage is often shallow (titles only, not full comment trees). This scraper at $1/1000 items delivers deeper Reddit-specific intelligence at 1–10% of the cost.
vs other Apify Reddit scrapers
There are 10+ Reddit scrapers on Apify, but most have at least one major limitation: single-mode (only subreddits OR only comments), shallow comment extraction (skipping "load more"), or 2–5× the price. This is the only Reddit scraper that combines all six modes, deep comment expansion, mix-and-match input, and lowest-on-store pricing.
❓ Frequently asked questions
Does this require a Reddit account or login?
No. This uses Reddit's public .json endpoints — the same data anyone can see by appending .json to any Reddit URL in their browser (try it: reddit.com/r/programming.json). No account, no OAuth, no API keys, no credentials.
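As an example, a small helper that rewrites any Reddit URL into its `.json` twin (illustrative, assumes absolute URLs):

```python
from urllib.parse import urlsplit, urlunsplit

def to_json_endpoint(url):
    """Append .json to a Reddit URL's path, preserving the query string."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    if not path.endswith(".json"):
        path += ".json"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))
```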
Is this legal?
This scraper accesses only publicly available data. Public Reddit posts and comments are explicitly accessible without authentication and have been programmatically accessible via .json endpoints for over a decade. You are responsible for complying with Reddit's User Agreement and applicable privacy laws (GDPR, CCPA) when processing the scraped data. Do not scrape private subreddits, user direct messages, or any content requiring authentication.
How does the comment tree expansion work?
When you scrape a post, Reddit returns the comment tree in chunks. Deep replies and very long threads include kind: "more" placeholders — lists of comment IDs that weren't included in the response (Reddit truncates large threads). With expandComments: true (default), this scraper automatically calls Reddit's /api/morechildren.json to fetch those hidden comments, recursively, until the tree is complete. Without this, you'd often only see 20-30% of comments on heavily-discussed posts.
What's the difference between subreddits and subredditSearch modes?
- `subreddits` = scrape POSTS from specific subreddits you already know (r/programming)
- `subredditSearch` = DISCOVER subreddits matching a keyword (find every subreddit about "3D printing")
The first is about content; the second is about community discovery.
Can I scrape private or quarantined subreddits?
No. Private subreddits require authentication and are out of scope. Quarantined subreddits may sometimes work depending on Reddit's current policy, but are not officially supported. NSFW subreddits work but may require setting over18 consent in some regions — flagged in output as over18: true.
How do I know what post IDs to use?
A post URL like reddit.com/r/programming/comments/1abc234/some_post_title/ has the post ID 1abc234. Pass that to postIds. Or just paste the full URL into startUrls — the scraper auto-extracts it.
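If you're preprocessing URLs yourself, a one-line regex does the same extraction (post IDs are lowercase base-36):

```python
import re

_POST_ID = re.compile(r"/comments/([a-z0-9]+)")

def extract_post_id(url):
    """Pull the base-36 post ID out of any /comments/ permalink."""
    m = _POST_ID.search(url)
    return m.group(1) if m else None
```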
How long does it take?
Reddit's anonymous rate limit is roughly 60 requests/minute. At default 1500ms delay, you get ~40 items/min for listings. A 1000-item run takes ~25 minutes. Comment-heavy runs are slower due to per-post detail calls; user runs are fast. You can lower requestDelayMs to 800-1000ms to roughly double throughput at higher 429-risk.
What if I hit rate limits?
The scraper auto-retries 429/503 responses with exponential backoff up to maxRetries (default 4). If retries are exhausted, the run aborts gracefully and logs the failure. To avoid: use higher requestDelayMs, smaller maxItemsPerTarget, or enable Apify Residential proxy for IP rotation on very large runs.
Why is downs always 0?
Reddit deliberately fuzzes upvote and downvote totals to disrupt bots. The score field (ups - downs) is accurate; the individual counts are obfuscated. Use score and upvoteRatio (the % of upvotes among voters) for real engagement signal.
Can I get a user's email or phone number?
No. Reddit never exposes these publicly. The scraper returns only publicly displayed user data: username, karma, account age, avatar, profile description, and submitted/comment history. For enrichment beyond that, pipe username into Apollo, Clearbit, or another B2B enrichment service in a separate step.
Can I monitor Reddit continuously?
Yes. Use Apify Schedules to run this actor every N minutes/hours/days. Combine with webhooks to push every new item to Slack, Discord, or your backend in real time. The classic setup: searchQueries: ["my-brand"], searchSort: "new", run every 15 minutes, webhook to Slack on completion.
Are deleted or removed posts/comments included?
- Deleted by user: `author` becomes `"[deleted]"`, `body`/`selftext` becomes `"[deleted]"`. The record is still returned.
- Removed by mods: `body`/`selftext` becomes `"[removed]"` but other fields stay. Returned.
- Removed by Reddit (admin/AEO): usually invisible to the API entirely. Not returned.
Can I scrape Reddit search results sorted by date?
Yes. Set searchSort: "new" and timeframe: "day"/"week"/etc. This is the recommended setup for real-time brand monitoring.
What's the difference between score and upvoteRatio?
- `score` = net votes (ups - downs). E.g., 200 = net 200 upvotes.
- `upvoteRatio` = % of voters who upvoted (0.0–1.0). E.g., 0.85 = 85% upvoted.
A post can have score=200 with upvoteRatio=0.95 (broad consensus, 200 net upvotes, almost all up) or score=200 with upvoteRatio=0.55 (controversial — 1000 upvotes and 800 downvotes). Use both fields for nuance.
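If you need the approximate raw counts back, the two public fields pin them down algebraically: from `u - d = score` and `u / (u + d) = ratio` it follows that `u = score * ratio / (2 * ratio - 1)`. A sketch (exact only up to Reddit's vote fuzzing, and undefined at a ratio of exactly 0.5):

```python
def estimate_votes(score, upvote_ratio):
    """Approximate raw (ups, downs) from the two public fields.
    Returns None when the ratio is exactly 0.5 (division by zero)."""
    if upvote_ratio == 0.5:
        return None
    ups = score * upvote_ratio / (2 * upvote_ratio - 1)
    return round(ups), round(ups - score)
```

For the controversial example above, `estimate_votes(200, 0.55)` recovers roughly 1100 upvotes against 900 downvotes.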
Can I integrate with Make / Zapier / n8n?
Yes. All three platforms have native Apify integrations. Common automations: new brand mentions → Slack, viral posts in your category → Airtable, new posts by your target users → email digest.
Is the output AI-ready / RAG-friendly?
Yes. Clean structured JSON, consistent field types, ISO 8601 timestamps. Each post/comment is a self-contained document. Common embedding strategy: use title + selftext for posts, body for comments. Filter by subreddit, type, createdUtc, score for retrieval. Used in production by AI training pipelines and RAG-based research assistants.
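A typical record-to-document mapper for an embedding pipeline might look like this (field names as documented above; the metadata selection is just one sensible choice):

```python
def to_document(rec):
    """Flatten a scraped record into (text, metadata) for embedding:
    title + selftext for posts, body for comments."""
    if rec["type"] == "post":
        text = rec.get("title", "")
        if rec.get("selftext"):
            text += "\n\n" + rec["selftext"]
    else:
        text = rec.get("body", "")
    meta = {k: rec.get(k) for k in
            ("type", "subreddit", "createdUtc", "score", "permalink")}
    return text, meta
```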
How complete is each field?
- `id`, `fullname`, `type`, `subreddit`, `createdUtc`: 100%
- `author`, `body`/`title`, `score`, `permalink`: ~99% (occasional deleted records)
- `selftext` for self-posts: 100% if not deleted
- `preview`/`thumbnail`/`media`: depends on post type — ~30-50% of posts have one
- `galleryImages`: only for gallery posts (~5% of posts)
- `flairText`, `authorFlairText`: varies by subreddit (~30-60%)
- `gilded`, awards: depends on post popularity
What payment methods does Apify support?
Credit card, invoicing for enterprise, and platform credits. New users get $5 free credits monthly — enough to scrape ~5,000 Reddit items for free. No credit card required to start.
⚖️ Legal & Compliance
This scraper accesses only publicly available data from Reddit's public .json endpoints — the same data Reddit exposes to anonymous browser users. No private subreddits, no authentication, no internal API endpoints.
You are responsible for ensuring your specific use of the scraped data complies with:
- Reddit's User Agreement and Content Policy
- GDPR (EU/UK) — lawful basis for processing personal data of EU/UK individuals
- CCPA (California) — consumer rights for CA residents
- Local data protection laws in any jurisdiction
- Anti-spam laws — CAN-SPAM (US), CASL (Canada), GDPR consent (EU) for any outreach use
Best-practice guidelines for Reddit data:
- Treat usernames as PII; anonymize before redistribution
- Do not republish full post/comment text without attribution
- Do not use scraped contact info (from user bios) for unsolicited outreach
- Respect users who set `hiddenFromBots: true` — exclude them from your downstream processing where feasible
This scraper is a general-purpose tool. The actor author and Apify provide no warranty regarding the legality of any specific use case. When in doubt, consult legal counsel.
Not affiliated with Reddit, Inc. Reddit® is a registered trademark of Reddit, Inc. All trademarks belong to their respective owners.
🛠️ Technical details
- Endpoints used:
  - `GET /r/{subreddit}/{sort}.json` — listings
  - `GET /r/{subreddit}/comments/{id}.json` — post + comment tree
  - `GET /user/{name}/(about|submitted|comments|overview).json` — user data
  - `GET /search.json` — global post/subreddit/user search
  - `GET /subreddits/search.json` — subreddit discovery
  - `GET /api/morechildren.json` — comment expansion
- Method: GET, JSON responses, anonymous (no auth)
- Headers: Required `User-Agent` header (Reddit returns 429 without it). This scraper sends a distinctive UA.
- Pagination: Cursor-based via `after` parameter (max 100 items/page)
- Comment tree expansion: Recursive via `/api/morechildren.json` with depth cap
- Rate limiting: Configurable inter-request delay (default 1500 ms) + exponential backoff retry
- Concurrency: Sequential by design — Reddit's per-IP limit makes parallel requests risky
- Memory: Runs comfortably in 256 MB
- Tech stack: Apify SDK v3, Crawlee v3, Node.js 20+, native fetch
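Putting the listing endpoint and the `after` cursor together, a minimal URL builder might look like this (an illustrative helper, not the actor's code; note the `t` timeframe parameter only applies to the `top`/`controversial` sorts):

```python
from urllib.parse import urlencode

def listing_url(subreddit, sort="hot", timeframe="all", after=None, limit=100):
    """Compose a subreddit listing endpoint with cursor pagination."""
    params = {"limit": limit}
    if sort in ("top", "controversial"):
        params["t"] = timeframe  # timeframe only affects these sorts
    if after:
        params["after"] = after  # fullname of the last item seen
    return (f"https://www.reddit.com/r/{subreddit}/{sort}.json?"
            + urlencode(params))
```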
🚦 Getting started in 30 seconds
1. Click "Try for free" on this actor's page
2. Paste a subreddit name into `subreddits` (e.g., `programming`) or paste any Reddit URL into `startUrls`
3. Click "Start"
4. Wait ~60 seconds for the first 50-100 items
5. Download as JSON / CSV / Excel from the Storage tab
No credit card required. First 50 items per run are always free. Paid usage starts after that, billed monthly via Apify.
💬 Support
- Issues / feature requests: Open a ticket in the Issues tab on this actor's page
- Custom scraping needs: Contact the actor author for tailored solutions
- General Apify support: help.apify.com
🔍 Search keywords
Reddit scraper, Reddit API scraper, Reddit JSON scraper, scrape Reddit, Reddit data extractor, Reddit post scraper, Reddit comment scraper, Reddit user scraper, Reddit subreddit scraper, Reddit search scraper, Reddit community scraper, Reddit no login scraper, Reddit anonymous scraper, Reddit without API, Reddit without OAuth, Reddit without PRAW, Reddit pure HTTP scraper, cheapest Reddit scraper, Reddit data dump, Reddit bulk download, Reddit JSON export, Reddit CSV export, Reddit data for AI, Reddit AI training data, Reddit RAG dataset, Reddit LLM corpus, Reddit sentiment analysis, Reddit brand monitoring, Reddit mention tracker, Reddit search monitoring, Reddit trend tracking, Reddit competitor monitoring, Reddit market research, Reddit social listening, Reddit lead generation, Reddit influencer discovery, Reddit power user finder, Reddit niche subreddit finder, Reddit community discovery, Reddit subreddit search, Reddit post search, Reddit user profile scraper, Reddit karma scraper, Reddit comment tree scraper, Reddit nested comment scraper, Reddit full thread scraper, Reddit hot posts scraper, Reddit new posts scraper, Reddit top posts scraper, Reddit rising posts scraper, Reddit controversial posts scraper, Reddit gallery scraper, Reddit image scraper, Reddit video scraper, Reddit NSFW scraper, Pushshift alternative, PRAW alternative, Reddit Wayback alternative, r/wallstreetbets scraper, r/CryptoCurrency scraper, r/AskReddit scraper, Brand24 alternative, Mention alternative, Brandwatch alternative, Reddit Scraper Pro alternative, Reddit Scraper Plus alternative, trudax Reddit scraper alternative, harshmaur Reddit scraper alternative, practicaltools Reddit scraper alternative.
Ready to scrape Reddit at half the price of any competitor? Hit "Try for free" above. First 50 items on us. No credit card. No login. No risk.