Reddit Scraper
Pricing
from $0.60 / 1,000 posts
Extract posts, comments, user profiles, and search results from Reddit. Pure HTTP, no API key required.
Rating: 0.0 (0) · Developer: Arnas · 0 bookmarks · 4 total users · 2 monthly active users · Last modified: 2 days ago
A no-API-key Reddit scraper covering posts, comments, users, and search — input/output/pricing snapshot-compatible with automation-lab/reddit-scraper as of the published version date. Pure HTTP, no headless browser, three output formats (default JSON, OpenAI fine-tune JSONL, RAG-ready markdown).
What does Reddit Scraper do?
Reddit Scraper extracts structured data from Reddit at $1.15 per 1,000 posts (FREE tier, scaling down to $0.28 per 1,000 at DIAMOND). Reddit has 1.7 billion monthly visits and 100,000+ active communities, making it the largest public discussion platform on the web. This actor scrapes posts and comments from any public subreddit, plus user profiles and search results. Just paste any Reddit URL or enter a search query and get clean JSON, CSV, or Excel output. No Reddit account or API key needed.
It supports subreddit listings (hot, new, top, rising), individual posts with nested comments, user submission history, and full-text search across all of Reddit or within a specific subreddit.
Built on pure HTTP requests (no browser), it runs fast and keeps costs low.
Who is Reddit Scraper for?
- Researchers — collect public opinion data, survey sentiment on topics, build datasets for academic studies
- Market analysts — track brand mentions, product feedback, and competitor discussions across subreddits
- SEO & content marketers — discover trending topics, find content ideas, monitor keyword discussions
- AI/ML engineers — gather training data, build sentiment analysis datasets, feed LLM pipelines with real conversations
- Journalists — monitor communities for breaking stories, track public reactions to events
- Product managers — collect user feedback from product subreddits, track feature requests and bug reports
- Lead generation teams — find potential customers asking for the kind of solution your product provides
- Social listening agencies — monitor Reddit alongside other platforms for brand and reputation tracking
Why use Reddit Scraper?
- Posts + comments in one actor — no need to run separate scrapers
- All input types — subreddits, posts, users, search queries, or just paste any Reddit URL
- Pure HTTP — no browser, low memory, fast execution
- Clean, AI-ready output — three formats including OpenAI fine-tune JSONL and RAG markdown
- Pagination built in — scrape hundreds or thousands of posts automatically
- Pay only for results — pay-per-event pricing, no monthly subscription
- No API key required — works without Reddit developer credentials
- Keyword filtering — filter results to only keep posts matching specific terms (filtered posts are not charged)
What data can you extract from Reddit?
Post fields
| Field | Description |
|---|---|
| `title` | Post title |
| `author` | Reddit username |
| `subreddit` | Subreddit name |
| `score` | Net upvotes |
| `upvoteRatio` | Upvote percentage (0-1) |
| `numComments` | Comment count |
| `createdAt` | ISO 8601 timestamp |
| `url` | Full Reddit URL |
| `selfText` | Post body text |
| `link` | External link (for link posts) |
| `domain` | Link domain |
| `isVideo`, `isSelf`, `isNSFW`, `isSpoiler`, `isStickied` | Content flags |
| `linkFlairText` | Post flair |
| `totalAwards` | Award count |
| `subredditSubscribers` | Subreddit size |
| `imageUrls` | Extracted image URLs (gallery- and preview-aware, HTML-decoded) |
| `thumbnail` | Thumbnail URL |
| `scrapedAt` | When this actor scraped it |
Comment fields
| Field | Description |
|---|---|
| `author` | Commenter username |
| `body` | Comment text |
| `score` | Net upvotes |
| `createdAt` | ISO 8601 timestamp |
| `depth` | Nesting level (0 = top-level) |
| `isSubmitter` | Whether commenter is the post author |
| `parentId` | Parent comment/post ID |
| `replies` | Number of direct replies |
| `postId` | Parent post ID |
| `postTitle` | Parent post title |
| `scrapedAt` | When this actor scraped it |
How much does it cost to scrape Reddit?
Pay-per-event pricing — you pay only for what you scrape. No monthly subscription. The tier price applied to your run depends on your Apify subscription plan.
| Event | FREE | BRONZE | SILVER | GOLD | PLATINUM | DIAMOND |
|---|---|---|---|---|---|---|
| Actor start | $0.003/run | $0.003/run | $0.003/run | $0.003/run | $0.003/run | $0.003/run |
| Per post | $0.00115 | $0.001 | $0.00078 | $0.0006 | $0.0004 | $0.00028 |
| Per comment | $0.000575 | $0.0005 | $0.00039 | $0.0003 | $0.0002 | $0.00014 |
At the FREE tier, that's roughly $1.15 per 1,000 posts or $0.58 per 1,000 comments.
AI format pricing note
When outputFormat is jsonl-finetune or rag-markdown, you are charged only for posts (not comments). Comments are bundled into the post record at no extra charge — cost-effective for large-scale training-set collection.
Real-world cost examples (FREE tier)
| Input | Results | Approx. duration | Approx. cost |
|---|---|---|---|
| 1 subreddit, 100 posts | 100 posts | ~30s | ~$0.12 |
| 5 subreddits, 50 posts each | 250 posts | ~75s | ~$0.30 |
| 1 post + 200 comments | 201 items | ~10s | ~$0.12 |
| Search "AI", 100 results | 100 posts | ~30s | ~$0.12 |
| 1 subreddit, 5 posts + 3 comments each | 20 items | ~15s | ~$0.02 |
Times reflect typical observed runs on RESIDENTIAL proxy in 2026. Actual times depend on Reddit response latency and proxy session warm-up. The deployment runbook captures observed numbers per release; check there for current measurements.
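The cost examples above can be sanity-checked with a small estimator. This is our own sketch: the per-event prices are copied from the pricing table, and the billing mechanics (flat start fee plus per-item charges, comments free in AI formats) are simplified to what this page documents.

```python
# Rough cost estimator using the published per-event prices. Tier prices
# come from the pricing table above; verify them against your current plan.
TIER_PRICES = {  # tier: (per post, per comment), USD
    "FREE": (0.00115, 0.000575),
    "BRONZE": (0.001, 0.0005),
    "SILVER": (0.00078, 0.00039),
    "GOLD": (0.0006, 0.0003),
    "PLATINUM": (0.0004, 0.0002),
    "DIAMOND": (0.00028, 0.00014),
}
ACTOR_START = 0.003  # flat start fee per run, identical across tiers


def estimate_cost(posts, comments=0, tier="FREE", ai_format=False):
    """Estimated run cost in USD. With ai_format=True (jsonl-finetune or
    rag-markdown), comments are bundled into posts and not charged."""
    per_post, per_comment = TIER_PRICES[tier]
    if ai_format:
        comments = 0
    return ACTOR_START + posts * per_post + comments * per_comment


# 100 posts on FREE: 0.003 + 100 * 0.00115 = 0.118, matching the ~$0.12 row
print(round(estimate_cost(100), 3))  # -> 0.118
```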
How to scrape Reddit
- Open the actor input page.
- Add Reddit URLs to the Reddit URLs field — any of these formats work:
  - https://www.reddit.com/r/technology/
  - https://www.reddit.com/r/AskReddit/comments/abc123/post-title/
  - https://www.reddit.com/user/spez/
  - r/technology
  - or just technology
- Or enter a Search query to search across Reddit.
- Set Max posts per source to control how many posts to scrape.
- Enable Include comments if you also want comment data.
- Click Start and wait for results.
- Download your data as JSON, CSV, or Excel from the Dataset tab.
Example input
{"urls": ["https://www.reddit.com/r/technology/"],"maxPostsPerSource": 100,"sort": "hot","includeComments": false}
Scraping a specific post with comments
{"urls": ["https://www.reddit.com/r/technology/comments/abc123/some-post-title/"],"includeComments": true,"maxCommentsPerPost": 50,"commentDepth": 3}
Searching Reddit with keyword filtering
{"searchQuery": "best project management tools","searchSubreddit": "productivity","sort": "relevance","timeFilter": "month","maxPostsPerSource": 50,"filterKeywords": ["Notion", "Asana", "Monday"]}
Input parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `urls` | string[] | — | Reddit URLs to scrape (subreddits, posts, users, search URLs). Shortcut forms `r/x` and bare `x` are accepted as subreddit names. |
| `searchQuery` | string | — | Search Reddit for this query. Either `urls` or `searchQuery` must be provided. |
| `searchSubreddit` | string | — | Limit search to a specific subreddit. |
| `sort` | enum | `hot` | Sort order: `hot`, `new`, `top`, `rising`, `relevance`. |
| `timeFilter` | enum | `week` | Time filter for `top` and `relevance`: `hour`, `day`, `week`, `month`, `year`, `all`. |
| `maxPostsPerSource` | integer | 100 | Max posts per subreddit/search/user. 0 = unlimited (capped by Reddit's ~1000-item-per-listing ceiling). |
| `includeComments` | boolean | false | Also scrape comments for each post. |
| `maxCommentsPerPost` | integer | 100 | Max comments per post. Hard-capped at 1000 to bound proxy/compute cost. |
| `commentDepth` | integer | 3 | Max reply nesting depth (1-10). |
| `filterKeywords` | string[] | [] | Only keep posts containing at least one keyword (case-insensitive title/body match). Filtered posts are not charged. |
| `outputFormat` | enum | `default` | `default` (standard JSON), `jsonl-finetune` (OpenAI chat-format SFT records), `rag-markdown` (vector-DB-ready markdown documents). |
| `maxRequestRetries` | integer | 5 | Retry attempts for failed requests (1-10). |
| `proxyConfiguration` | object | RESIDENTIAL | Apify proxy. RESIDENTIAL is the default — Reddit aggressively blocks datacenter IP ranges in 2026. Override only if you understand the trade-off. |
Output examples
Default format (post)
{"type": "post","id": "1qw5kwf","title": "Reddit AMA highlights from this week","author": "Sandstorm400","subreddit": "technology","score": 18009,"upvoteRatio": 0.92,"numComments": 1363,"createdAt": "2026-02-05T00:04:58.000Z","url": "https://www.reddit.com/r/technology/comments/1qw5kwf/...","permalink": "/r/technology/comments/1qw5kwf/...","selfText": "","link": "https://example.com/article","domain": "example.com","isVideo": false,"isSelf": false,"isNSFW": false,"isSpoiler": false,"isStickied": false,"thumbnail": "https://external-preview.redd.it/...","linkFlairText": "Society","totalAwards": 0,"subredditSubscribers": 17101887,"imageUrls": [],"scrapedAt": "2026-04-18T12:33:50.000Z"}
Default format (comment)
{"type": "comment","id": "m3abc12","postId": "1qw5kwf","postTitle": "Reddit AMA highlights from this week","author": "commenter123","body": "Phone addiction in teens is a serious issue.","score": 542,"createdAt": "2026-02-05T01:15:00.000Z","permalink": "https://www.reddit.com/r/technology/comments/1qw5kwf/.../m3abc12","depth": 0,"isSubmitter": false,"parentId": "t3_1qw5kwf","replies": 12,"scrapedAt": "2026-04-18T12:33:52.000Z"}
jsonl-finetune format (one record per post, comments bundled in assistant)
{"type": "finetune","messages": [{ "role": "system", "content": "You are analyzing a Reddit discussion. Summarize the key viewpoints from the community." },{ "role": "user", "content": "r/MachineLearning — What's the best approach for few-shot learning in 2026?\n\nI'm building a text classifier with only 50 labeled examples per class..." },{ "role": "assistant", "content": "1. [user_a, score: 482] Try SetFit — fine-tunes sentence transformers on a handful of examples.\n2. [user_b, score: 217] LLM + chain-of-thought prompting with 5-10 examples..." }],"metadata": {"postId": "abc123","subreddit": "MachineLearning","score": 1240,"upvoteRatio": 0.97,"numComments": 87,"createdAt": "2026-03-15T10:22:00.000Z","url": "https://www.reddit.com/r/MachineLearning/...","domain": "reddit.com","isNSFW": false,"linkFlairText": null}}
rag-markdown format (one self-contained markdown doc per post)
{"type": "rag-chunk","chunkId": "reddit-MachineLearning-abc123","markdown": "# What's the best approach for few-shot learning in 2026?\n\n**Subreddit:** r/MachineLearning \n**Author:** u/researcher42 \n**Score:** 1240 (97% upvoted) \n\n## Top Comments\n\n### u/user_a (score: 482)\n\nTry SetFit...\n","metadata": {"source": "reddit","postId": "abc123","subreddit": "MachineLearning","title": "What's the best approach for few-shot learning in 2026?","author": "researcher42","score": 1240,"upvoteRatio": 0.97,"numComments": 87,"createdAt": "2026-03-15T10:22:00.000Z","url": "https://www.reddit.com/r/MachineLearning/comments/abc123/...","domain": "reddit.com","isNSFW": false,"linkFlairText": null}}
Note about Console preview: the dataset preview in Apify Console renders the `default` post schema. When `outputFormat` is `jsonl-finetune` or `rag-markdown`, the records still land in the `posts` dataset but with their respective shapes — most preview columns will appear empty. Use the JSON or JSONL exports to inspect the actual records.
How to monitor Reddit for brand mentions
- Search for brand-name mentions:
{ "searchQuery": "YourBrandName", "sort": "new", "timeFilter": "day", "maxPostsPerSource": 100 }
- Monitor specific product subreddits with comments enabled:
{"urls": ["https://www.reddit.com/r/YourProductSubreddit/new/", "https://www.reddit.com/r/CompetitorSubreddit/new/"],"maxPostsPerSource": 50,"includeComments": true,"maxCommentsPerPost": 20}
- Schedule the actor to run daily via Apify's built-in scheduler.
- Pipe the output to Slack/email/your CRM via Apify integrations.
- Filter for sentiment signals with `filterKeywords`.
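Once a scheduled run finishes, the exported items can be post-processed for sentiment signals before they reach Slack or your CRM. A minimal sketch: the field names (`title`, `selfText`) come from the post schema documented above, while `NEGATIVE_TERMS` and the `flag_negative` helper are illustrative assumptions, not part of the actor.

```python
# Sketch: flag scraped posts that pair your brand with negative-sentiment
# terms. Field names (title, selfText) follow the post schema documented
# above; NEGATIVE_TERMS and flag_negative are illustrative placeholders.
NEGATIVE_TERMS = {"broken", "refund", "cancel", "disappointed"}


def flag_negative(items):
    """Return posts whose title or body contains a negative term."""
    flagged = []
    for item in items:
        text = "{} {}".format(item.get("title", ""), item.get("selfText", "")).lower()
        if any(term in text for term in NEGATIVE_TERMS):
            flagged.append(item)
    return flagged


sample = [
    {"title": "Asking for a refund", "selfText": "support never replied"},
    {"title": "Loving the new release", "selfText": ""},
]
print([p["title"] for p in flag_negative(sample)])  # -> ['Asking for a refund']
```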
How to use Reddit data for AI and LLM workflows
jsonl-finetune for supervised fine-tuning
Each record is a single training example in OpenAI chat format (system / user / assistant). The assistant message is the top K comments by score (K = min(maxCommentsPerPost, available)), formatted as a ranked list with score + author.
Use cases: fine-tuning domain-specific chatbots, building Q&A models on niche topics, creating instruction-tuning datasets from community knowledge.
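Before uploading a JSONL export as a fine-tuning file, it is worth sanity-checking each line. The record shape (`messages` with a system/user/assistant triple) matches the example above; the validation rules themselves are our own assumptions, not something the actor ships.

```python
import json

# Sketch: sanity-check jsonl-finetune records before uploading them for
# fine-tuning. The record shape follows the example above; the checks
# (role order, non-empty content) are our own assumptions.


def valid_finetune_record(line):
    """True if the line parses and carries a complete chat triple."""
    try:
        rec = json.loads(line)
    except json.JSONDecodeError:
        return False
    msgs = rec.get("messages", [])
    roles = [m.get("role") for m in msgs]
    return roles == ["system", "user", "assistant"] and all(
        m.get("content") for m in msgs
    )


record = json.dumps({
    "type": "finetune",
    "messages": [
        {"role": "system", "content": "You are analyzing a Reddit discussion."},
        {"role": "user", "content": "r/MachineLearning — few-shot learning?"},
        {"role": "assistant", "content": "1. [user_a, score: 482] Try SetFit."},
    ],
    "metadata": {"postId": "abc123"},
})
print(valid_finetune_record(record))  # -> True
```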
rag-markdown for vector-DB ingestion
Each record is a self-contained markdown document with metadata header + top K comments as ### u/<author> (score: N) H3 sections. The chunkId follows the stable format reddit-{subreddit}-{postId} for deduplication and upsert into Pinecone, Weaviate, Chroma, or Qdrant.
Use cases: building domain-specific RAG systems, enriching knowledge bases with community knowledge, powering chatbots that answer from Reddit discussions.
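Because `chunkId` is stable across runs, re-scraping the same post overwrites rather than duplicates. A dict stands in for the vector DB in this sketch; a real pipeline would call your store's upsert API with the same key.

```python
# Sketch: use the stable chunkId (reddit-{subreddit}-{postId}) as the
# upsert key so re-scrapes overwrite rather than duplicate. The dict is
# a stand-in for Pinecone/Weaviate/Chroma/Qdrant upsert calls.


def upsert_chunks(store, chunks):
    """Idempotent upsert keyed on chunkId."""
    for chunk in chunks:
        store[chunk["chunkId"]] = chunk
    return store


store = {}
upsert_chunks(store, [{"chunkId": "reddit-MachineLearning-abc123", "markdown": "# v1"}])
upsert_chunks(store, [{"chunkId": "reddit-MachineLearning-abc123", "markdown": "# v2"}])
print(len(store))  # -> 1
print(store["reddit-MachineLearning-abc123"]["markdown"])  # -> # v2
```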
Quality filtering tips
When building training data or RAG corpora from Reddit, filter for quality:
- Score threshold — keep only `metadata.score >= 50`
- Upvote ratio — `metadata.upvoteRatio >= 0.85` excludes controversial posts
- NSFW filter — `metadata.isNSFW === false` for general-purpose datasets
- Time filter — `sort: "top"` + `timeFilter: "year"` gets the highest-quality content from the past year
Reddit data export: CSV, Excel, JSON
Apify datasets support JSON, CSV, Excel, XML, and HTML export formats. Use the dataset export buttons in Console or the dataset URL programmatically:
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?format=csv" \-H "Authorization: Bearer YOUR_API_TOKEN" > reddit_posts.csv
How to scrape Reddit without getting blocked
The actor handles Reddit's rate limits and anti-bot protections automatically:
- Reactive 429 backoff — when Reddit returns 429, the actor honors the `Retry-After` header (or falls back to exponential backoff) before retrying. Persistent blocks trigger session retirement (= new proxy IP).
- Real-Chrome User-Agent — Reddit filters generic UAs; the actor sends a real-Chrome-shaped UA plus the actor's identifier suffix.
- RESIDENTIAL proxy default — Reddit blocks datacenter IPs aggressively in 2026; residential is the practical baseline.
- Pure HTTP — no browser fingerprinting to mismatch.
For very high-volume scraping (tens of thousands of posts per hour), split the workload across multiple smaller runs.
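Splitting a large job can be as simple as chunking the source list into several run inputs. A sketch: the input field names (`urls`, `maxPostsPerSource`) match the documented schema, while the helper and the chunk size of 5 are our own illustrative choices.

```python
# Sketch: split a large subreddit list into several smaller actor runs.
# Input field names (urls, maxPostsPerSource) match the documented input
# schema; the chunking helper and chunk size are illustrative.


def split_into_run_inputs(subreddit_urls, chunk_size=5, max_posts=500):
    """One input dict per run, each covering chunk_size sources."""
    return [
        {"urls": subreddit_urls[i:i + chunk_size], "maxPostsPerSource": max_posts}
        for i in range(0, len(subreddit_urls), chunk_size)
    ]


urls = ["https://www.reddit.com/r/sub{}/".format(n) for n in range(12)]
runs = split_into_run_inputs(urls)
print(len(runs), [len(r["urls"]) for r in runs])  # -> 3 [5, 5, 2]
```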
How to use Reddit Scraper with the API
Node.js
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('automation-lab/reddit-scraper').call({
    urls: ['https://www.reddit.com/r/technology/'],
    maxPostsPerSource: 100,
    sort: 'hot',
    includeComments: false,
});

const { items } = await client.dataset(run.namedDatasetIds.posts).listItems();
console.log(items);
```
Python
```python
from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

run = client.actor('automation-lab/reddit-scraper').call(run_input={
    'urls': ['https://www.reddit.com/r/technology/'],
    'maxPostsPerSource': 100,
    'sort': 'hot',
    'includeComments': False,
})

items = client.dataset(run['namedDatasetIds']['posts']).list_items().items
print(items)
```
cURL
curl "https://api.apify.com/v2/acts/automation-lab~reddit-scraper/runs" \-X POST \-H "Content-Type: application/json" \-H "Authorization: Bearer YOUR_API_TOKEN" \-d '{"urls": ["https://www.reddit.com/r/technology/"],"maxPostsPerSource": 100,"sort": "hot"}'
Use with AI agents via MCP
Reddit Scraper is available as a tool for AI assistants that support the Model Context Protocol (MCP).
Setup for Claude Code
```shell
claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/reddit-scraper"
```
Setup for Claude Desktop, Cursor, or VS Code
{"mcpServers": {"apify": { "url": "https://mcp.apify.com?tools=automation-lab/reddit-scraper" }}}
Your AI assistant will use OAuth to authenticate with your Apify account on first use. Then ask:
- "Get the top 100 posts from r/technology this month"
- "Scrape comments from this Reddit thread"
- "Search Reddit for discussions about 'AI coding'"
Integrations
- Google Sheets — auto-export Reddit posts and comments to a spreadsheet
- Slack / Discord — get notifications when scraping finishes or when posts match keywords
- Zapier / Make — trigger workflows on new Reddit data
- Webhooks — send results to your own API endpoint
- Scheduled runs — daily/weekly subreddit monitoring via Apify scheduler
- Data warehouses — pipe data to BigQuery, Snowflake, or PostgreSQL
Is it legal to scrape Reddit?
Scraping publicly available data from Reddit is generally considered legal:
- Public data only. This actor only accesses publicly available posts and comments. It does not log in, bypass authentication, or access private content.
- Legal precedent. The US Ninth Circuit ruling in hiQ Labs v. LinkedIn (2022) established that scraping publicly available data does not violate the Computer Fraud and Abuse Act (CFAA).
- No personal-data extraction. Usernames are public pseudonyms on Reddit; the actor does not attempt to deanonymize users or collect private information.
- Terms of Service. Reddit's ToS restricts automated access, but ToS violations are a contractual matter, not a criminal one.
- GDPR. If you scrape data that includes EU users, ensure your use case complies with GDPR. Aggregated, anonymized analysis is generally safe; storing individual user data for profiling may require additional compliance steps.
This information is for educational purposes and does not constitute legal advice.
FAQ
Can I scrape any subreddit? Yes, as long as it's public. Private subreddits return 403 and are skipped.
Does it scrape NSFW content? Yes, NSFW posts are included by default with the isNSFW: true flag. Filter them client-side if you want them excluded.
How many posts can I scrape? Set maxPostsPerSource: 0 for unlimited, capped by Reddit's ~1000-post pagination ceiling per listing. For more, use search with multiple time filters (e.g., timeFilter: month then year) to access older content.
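The fan-out over time filters mentioned above can be generated programmatically. A sketch — the input field names match the documented schema, but the helper itself is our own; merge the resulting datasets and deduplicate by post `id` afterwards.

```python
# Sketch: one search input per time window to reach past the ~1000-item
# listing ceiling. Field names match the documented input schema; the
# helper and the idea of deduplicating by post id afterwards are ours.
TIME_FILTERS = ["day", "week", "month", "year", "all"]


def inputs_for_deep_history(query, max_posts=0):
    """One top-sorted search input per time filter; 0 = unlimited."""
    return [
        {"searchQuery": query, "sort": "top", "timeFilter": tf,
         "maxPostsPerSource": max_posts}
        for tf in TIME_FILTERS
    ]


inputs = inputs_for_deep_history("AI coding")
print(len(inputs), inputs[0]["timeFilter"])  # -> 5 day
```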
What happens if Reddit rate-limits me? The actor reads Retry-After headers when present, falls back to exponential backoff otherwise, and retires sessions on persistent blocks. No configuration needed.
Can I export to CSV or Excel? Yes. Apify datasets support JSON, CSV, Excel, XML, and HTML formats.
The scraper returns fewer posts than I expected. Reddit's pagination caps at ~1000 posts per listing. For deeper history, use search with different time filters.
I'm getting 403 errors for a subreddit. The subreddit is private, quarantined, or banned. Check in an incognito browser — if you can't see it there, the scraper can't either.
Can I use filterKeywords to narrow search results? Yes. Set filterKeywords to an array of terms; only posts whose title or body contains at least one keyword are kept. Filtered posts are not charged.
How do I scrape a user's post history? Paste the user's profile URL (e.g., https://www.reddit.com/user/spez/) into the URLs field.
Does it handle deleted or removed posts? Deleted authors appear as [deleted]; removed bodies appear as [removed]. Records are emitted, not dropped.
What proxy should I use? RESIDENTIAL is the default and recommended for any sustained run. DATACENTER is offered as a low-cost option but Reddit blocks datacenter IPs aggressively in 2026 — expect 429s and dropped sources for runs above ~100 posts on DATACENTER.
Do AI formats (jsonl-finetune, rag-markdown) require comments? They work without comments, but the assistant message / comments section will be a placeholder. For meaningful AI training data, set includeComments: true.