Reddit Scraper

- Pricing: Pay per event
- Rating: 4.3 (2 ratings)
- Developer: Stas Persiianenko
- Actor stats: 4 bookmarks · 300 total users · 113 monthly active users
- Last modified: a day ago
# Reddit Scraper — Extract Posts, Comments & User Data from Any Subreddit

## What does Reddit Scraper do?
Reddit Scraper extracts structured data from Reddit at $1 per 1,000 posts — 4x cheaper than the leading alternative. Reddit has 1.7 billion monthly visits and 100,000+ active communities, making it the largest public discussion platform on the web. This actor scrapes posts, comments, search results, and user profiles from any public subreddit. Just paste any Reddit URL or enter a search query and get clean JSON, CSV, or Excel output. No Reddit account or API key needed.
It supports subreddit listings (hot, new, top, rising), individual posts with nested comments, user submission history, and full-text search across all of Reddit or within a specific subreddit.
Built on pure HTTP requests (no browser), it runs fast and keeps costs low — scrape thousands of posts for just a few dollars.
## Who is Reddit Scraper for?
Reddit Scraper is built for anyone who needs Reddit data at scale without dealing with Reddit's restrictive official API:
🔬 Researchers — collect public opinion data, survey sentiment on topics, build datasets for academic studies
📊 Market analysts — track brand mentions, product feedback, and competitor discussions across subreddits
📈 SEO & content marketers — discover trending topics, find content ideas, and monitor keyword discussions
🤖 AI/ML engineers — gather training data, build sentiment analysis datasets, or feed LLM pipelines with real conversations
📰 Journalists — monitor communities for breaking stories, track public reactions to events
🏢 Product managers — collect user feedback from product subreddits, track feature requests and bug reports
💼 Lead generation teams — find potential customers asking for the problems your product solves
📉 Social listening agencies — monitor Reddit alongside other platforms for brand and reputation tracking
## Why use Reddit Scraper?

🏷️ 4x cheaper than the leading Reddit scraper on Apify ($1/1K posts vs $4/1K)
📦 Posts + comments in one actor — no need to run separate scrapers
🔗 All input types — subreddits, posts, users, search queries, or just paste any Reddit URL
⚡ Pure HTTP — no browser, low memory, fast execution
✅ Clean, AI-ready output — structured fields with consistent naming, ready for LLM training, RAG pipelines, and AI agent workflows
📄 Pagination built in — scrape hundreds or thousands of posts automatically
💰 Pay only for results — pay-per-event pricing, no monthly subscription
🔑 No API key required — works without Reddit developer credentials
🔍 Keyword filtering — filter results to only keep posts matching specific terms
## What data can you extract from Reddit?
Post fields:
| Field | Description |
|---|---|
| title | Post title |
| author | Reddit username |
| subreddit | Subreddit name |
| score | Net upvotes |
| upvoteRatio | Upvote percentage (0-1) |
| numComments | Comment count |
| createdAt | ISO 8601 timestamp |
| url | Full Reddit URL |
| selfText | Post body text |
| link | External link (for link posts) |
| domain | Link domain |
| isVideo, isSelf, isNSFW, isSpoiler | Content flags |
| linkFlairText | Post flair |
| totalAwards | Award count |
| subredditSubscribers | Subreddit size |
| imageUrls | Extracted image URLs |
| thumbnail | Thumbnail URL |
Comment fields:
| Field | Description |
|---|---|
| author | Commenter username |
| body | Comment text |
| score | Net upvotes |
| createdAt | ISO 8601 timestamp |
| depth | Nesting level (0 = top-level) |
| isSubmitter | Whether commenter is the post author |
| parentId | Parent comment/post ID |
| replies | Number of direct replies |
| postId | Parent post ID |
| postTitle | Parent post title |
## How much does it cost to scrape Reddit?
This Actor uses pay-per-event pricing — you pay only for what you scrape. No monthly subscription. All platform costs (compute, proxy, storage) are included.
| Event | Cost |
|---|---|
| Actor start | $0.003 per run |
| Per post | $0.001 |
| Per comment | $0.0005 |
That's $1.00 per 1,000 posts or $0.50 per 1,000 comments.
Real-world cost examples:
| Input | Results | Duration | Cost |
|---|---|---|---|
| 1 subreddit, 100 posts | 100 posts | ~15s | ~$0.10 |
| 5 subreddits, 50 posts each | 250 posts | ~30s | ~$0.25 |
| 1 post + 200 comments | 201 items | ~5s | ~$0.10 |
| Search "AI", 100 results | 100 posts | ~15s | ~$0.10 |
| 1 subreddit, 5 posts + 3 comments each | 20 items | ~12s | ~$0.02 |
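As a rough sanity check, the event prices above can be turned into a quick cost estimator. This is a sketch for planning, not an official calculator; the actual charge comes from the pay-per-event billing on your run:

```python
def estimate_cost(posts: int, comments: int = 0, runs: int = 1) -> float:
    """Estimate a run's cost in USD from the pay-per-event prices above."""
    ACTOR_START = 0.003   # charged once per run
    PER_POST = 0.001      # $1.00 per 1,000 posts
    PER_COMMENT = 0.0005  # $0.50 per 1,000 comments
    return runs * ACTOR_START + posts * PER_POST + comments * PER_COMMENT

# 1 subreddit with 100 posts, and 1 post with 200 comments:
print(round(estimate_cost(posts=100), 3))
print(round(estimate_cost(posts=1, comments=200), 3))
```

Both examples come out around $0.10, matching the table above.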
## How do I search Reddit posts and comments?

1. Go to the Reddit Scraper input page.
2. Add Reddit URLs to the Reddit URLs field — any of these formats work:
   - https://www.reddit.com/r/technology/
   - https://www.reddit.com/r/AskReddit/comments/abc123/post-title/
   - https://www.reddit.com/user/spez/
   - r/technology or just technology
3. Or enter a Search Query to search across Reddit.
4. Set Max Posts per Source to control how many posts to scrape.
5. Enable Include Comments if you also want comment data.
6. Click Start and wait for results.
7. Download your data as JSON, CSV, or Excel from the Dataset tab.
Example input:

```json
{
  "urls": ["https://www.reddit.com/r/technology/"],
  "maxPostsPerSource": 100,
  "sort": "hot",
  "includeComments": false
}
```

Scraping a specific post with comments:

```json
{
  "urls": ["https://www.reddit.com/r/technology/comments/abc123/some-post-title/"],
  "includeComments": true,
  "maxCommentsPerPost": 50,
  "commentDepth": 3
}
```

Searching Reddit with keyword filtering:

```json
{
  "searchQuery": "best project management tools",
  "searchSubreddit": "productivity",
  "sort": "relevance",
  "timeFilter": "month",
  "maxPostsPerSource": 50,
  "filterKeywords": ["Notion", "Asana", "Monday"]
}
```
## Input parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| urls | string[] | — | Reddit URLs to scrape (subreddits, posts, users, search URLs) |
| searchQuery | string | — | Search Reddit for this query |
| searchSubreddit | string | — | Limit search to a specific subreddit |
| sort | enum | hot | Sort order: hot, new, top, rising, relevance |
| timeFilter | enum | week | Time filter for top/relevance: hour, day, week, month, year, all |
| maxPostsPerSource | integer | 100 | Max posts per subreddit/search/user. 0 = unlimited |
| includeComments | boolean | false | Also scrape comments for each post |
| maxCommentsPerPost | integer | 100 | Max comments per post |
| commentDepth | integer | 3 | Max reply nesting depth (1-10) |
| filterKeywords | string[] | [] | Only keep posts containing at least one keyword (case-insensitive). Leave empty to keep all |
| maxRequestRetries | integer | 5 | Retry attempts for failed requests (1-10) |
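If you want to reproduce the filterKeywords rule locally, for example to post-filter a dataset you already downloaded, a sketch of the matching logic described above (case-insensitive, title or body, any keyword) could look like this:

```python
def matches_keywords(post: dict, keywords: list[str]) -> bool:
    """Keep a post if its title or selfText contains at least one
    keyword, case-insensitively; an empty list keeps everything."""
    if not keywords:
        return True
    text = (post.get("title", "") + " " + post.get("selfText", "")).lower()
    return any(kw.lower() in text for kw in keywords)

posts = [
    {"title": "Notion vs Asana for small teams", "selfText": ""},
    {"title": "Best to-do apps?", "selfText": "I switched to ClickUp last month."},
    {"title": "Weekly free talk thread", "selfText": ""},
]
kept = [p for p in posts if matches_keywords(p, ["Notion", "Asana", "ClickUp"])]
print(len(kept))  # 2
```

The field names title and selfText follow the output schema above; the actor's own matching may differ in detail.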
## Output example

Post:

```json
{
  "type": "post",
  "id": "1qw5kwf",
  "title": "3 Teen Sisters Jump to Their Deaths from 9th Floor Apartment After Parents Remove Access to Phone",
  "author": "Sandstorm400",
  "subreddit": "technology",
  "score": 18009,
  "upvoteRatio": 0.92,
  "numComments": 1363,
  "createdAt": "2026-02-05T00:04:58.000Z",
  "url": "https://www.reddit.com/r/technology/comments/1qw5kwf/3_teen_sisters_jump_to_their_deaths_from_9th/",
  "permalink": "/r/technology/comments/1qw5kwf/3_teen_sisters_jump_to_their_deaths_from_9th/",
  "selfText": "",
  "link": "https://people.com/3-sisters-jumping-deaths-online-gaming-addiction-11899069",
  "domain": "people.com",
  "isVideo": false,
  "isSelf": false,
  "isNSFW": false,
  "isSpoiler": false,
  "isStickied": false,
  "thumbnail": "https://external-preview.redd.it/...",
  "linkFlairText": "Society",
  "totalAwards": 0,
  "subredditSubscribers": 17101887,
  "imageUrls": [],
  "scrapedAt": "2026-02-05T12:33:50.000Z"
}
```

Comment:

```json
{
  "type": "comment",
  "id": "m3abc12",
  "postId": "1qw5kwf",
  "postTitle": "3 Teen Sisters Jump to Their Deaths...",
  "author": "commenter123",
  "body": "This is heartbreaking. Phone addiction in teens is a serious issue.",
  "score": 542,
  "createdAt": "2026-02-05T01:15:00.000Z",
  "permalink": "/r/technology/comments/1qw5kwf/.../m3abc12",
  "depth": 0,
  "isSubmitter": false,
  "parentId": "t3_1qw5kwf",
  "replies": 12,
  "scrapedAt": "2026-02-05T12:33:52.000Z"
}
```
## How do I get the best results from Reddit Scraper?
🎯 Start small — test with 5-10 posts before running large scrapes
📊 Use sort + time filter — sort: "top" with timeFilter: "month" gets the most popular content
💬 Comments cost extra — only enable includeComments when you need them
📋 Multiple subreddits — add multiple URLs to scrape several subreddits in one run
🔍 Search within subreddit — use searchSubreddit to limit search to a specific community
🔗 Direct post URLs — paste a specific post URL to get that post + its comments
⏱️ Rate limits — Reddit allows ~1,000 requests/hour; large scrapes may take a few minutes
🏷️ Keyword filtering — use filterKeywords to keep only relevant posts when Reddit search returns loose matches
📅 Time-slice large scrapes — for more than 1,000 posts, run multiple searches with different timeFilter values (month, year, all)
🔄 Scheduled runs — set up recurring runs to monitor subreddits daily or weekly
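The time-slicing tip above leaves you with several result sets that can overlap near the slice boundaries. A small sketch for merging them, deduplicating on the post id field from the output schema:

```python
def merge_time_slices(slices: list[list[dict]]) -> list[dict]:
    """Merge post lists from several timeFilter runs, keeping the
    first occurrence of each post id and dropping duplicates."""
    seen: set[str] = set()
    merged: list[dict] = []
    for posts in slices:
        for post in posts:
            if post["id"] not in seen:
                seen.add(post["id"])
                merged.append(post)
    return merged

# Posts from hypothetical runs with timeFilter "month", "year", "all":
month = [{"id": "aaa"}, {"id": "bbb"}]
year = [{"id": "bbb"}, {"id": "ccc"}]
alls = [{"id": "ccc"}, {"id": "ddd"}]
print([p["id"] for p in merge_time_slices([month, year, alls])])
```

This prints each id once, in first-seen order.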
## How do I scrape Reddit without getting blocked?
Reddit Scraper is built to handle Reddit's rate limits and anti-bot protections automatically — no extra configuration needed for most use cases:
🛡️ Built-in rate limit detection — the scraper reads Reddit's X-Ratelimit-* response headers and automatically waits before retrying when limits are hit. You will not get IP-banned from rate limit violations.
🔁 Automatic retries — failed requests are retried up to 5 times (configurable via maxRequestRetries). Transient errors and network blips are handled silently.
🌐 Datacenter proxies included — the actor uses Apify's proxy pool by default. Requests are spread across multiple IPs, reducing the chance of any single IP being flagged.
🕐 Respect pagination limits — Reddit's listing API caps at ~1,000 posts per sort/subreddit combination. Instead of hammering the same endpoint, use multiple timeFilter values (e.g., month, year, all) to access older content without hitting limits.
⚙️ Pure HTTP, no browser fingerprinting — because the actor uses direct API requests rather than a headless browser, it avoids browser-based bot detection entirely.
For extremely high-volume scraping (tens of thousands of posts per hour), consider splitting your workload across multiple smaller runs rather than one large run.
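As an illustration of the rate-limit handling described above, here is a minimal sketch of the waiting rule, assuming lowercase x-ratelimit-remaining and x-ratelimit-reset headers where the reset value is in seconds (the exact header names and units are assumptions; the actor handles this for you automatically):

```python
def wait_seconds(headers: dict) -> float:
    """Return how long to pause before the next request, based on
    rate-limit response headers. 0 means it is safe to continue."""
    remaining = float(headers.get("x-ratelimit-remaining", 1))
    reset = float(headers.get("x-ratelimit-reset", 0))
    return reset if remaining <= 0 else 0.0

# Quota exhausted: wait until the window resets.
print(wait_seconds({"x-ratelimit-remaining": "0", "x-ratelimit-reset": "42"}))
# Plenty of quota left: no wait needed.
print(wait_seconds({"x-ratelimit-remaining": "57", "x-ratelimit-reset": "42"}))
```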
## Reddit data export: how to download Reddit posts to CSV, Excel, or JSON
Reddit Scraper outputs data to Apify's cloud dataset, which you can download in multiple formats immediately after each run:
📄 JSON — full structured output, best for developers and data pipelines. Each post and comment is a separate JSON object with all available fields.
📊 CSV — flat table format, opens directly in Excel, Google Sheets, or any data tool. Nested fields (like imageUrls) are serialized as strings in CSV mode.
📗 Excel (.xlsx) — download a ready-to-open spreadsheet. Same structure as CSV but formatted for Microsoft Excel.
🔗 Direct dataset URL — after a run, your dataset gets a permanent URL. You can download data programmatically at any time:

```shell
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?format=csv" \
  -H "Authorization: Bearer YOUR_API_TOKEN" > reddit_posts.csv
```
📅 Scheduled exports — combine with Apify's scheduler to run the scraper daily and auto-export new posts to Google Sheets or a webhook endpoint.
🗄️ Data warehouse integration — pipe exports directly to BigQuery, Snowflake, or PostgreSQL using Apify integrations for long-term storage and trend analysis.
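The same download works from Python's standard library. The helper below just rebuilds the dataset export URL from the curl example above; DATASET_ID and the token are placeholders you supply from your own run:

```python
import urllib.request

def dataset_export_url(dataset_id: str, fmt: str = "csv") -> str:
    """Build the Apify dataset export URL for a format (json, csv, xlsx, ...)."""
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?format={fmt}"

def download_dataset(dataset_id: str, token: str, fmt: str = "csv") -> bytes:
    """Fetch the dataset export with the API token in the Authorization header."""
    req = urllib.request.Request(
        dataset_export_url(dataset_id, fmt),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

print(dataset_export_url("abc123", "json"))
```

Usage would be `download_dataset("DATASET_ID", "YOUR_API_TOKEN")`, writing the returned bytes to a file.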
## How to monitor Reddit for brand mentions
Reddit is one of the most valuable sources for unsolicited brand feedback — people discuss products candidly in topic-specific communities. Here is how to set up ongoing brand monitoring with Reddit Scraper:
1. Search for brand name mentions:

```json
{
  "searchQuery": "YourBrandName",
  "sort": "new",
  "timeFilter": "day",
  "maxPostsPerSource": 100,
  "includeComments": false
}
```

2. Monitor specific product subreddits:

```json
{
  "urls": [
    "https://www.reddit.com/r/YourProductSubreddit/new/",
    "https://www.reddit.com/r/CompetitorSubreddit/new/"
  ],
  "maxPostsPerSource": 50,
  "includeComments": true,
  "maxCommentsPerPost": 20
}
```

3. Schedule it to run daily — use Apify's built-in scheduler to run the actor every 24 hours. New posts and mentions land in a fresh dataset each time.

4. Connect to Slack or email — use Apify's Slack integration or a webhook to get notified immediately when new brand mentions are found.

5. Filter for sentiment signals — use filterKeywords to focus only on posts containing words like your product name, competitor names, or problem keywords:

```json
{
  "searchQuery": "project management software",
  "filterKeywords": ["Notion", "Asana", "ClickUp", "Monday"]
}
```
This workflow gives you a near-real-time stream of Reddit brand mentions without maintaining a developer account or paying for the Reddit API.
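For a simple daily digest, the scraped mentions can be grouped by subreddit before they are sent to Slack or email. A minimal sketch using the subreddit and title output fields (the post data here is made up for illustration):

```python
from collections import defaultdict

def mentions_digest(posts: list[dict]) -> dict[str, list[str]]:
    """Group scraped brand mentions by subreddit for a daily digest."""
    digest: dict[str, list[str]] = defaultdict(list)
    for post in posts:
        digest[post["subreddit"]].append(post["title"])
    return dict(digest)

posts = [
    {"subreddit": "SaaS", "title": "Anyone tried YourBrandName?"},
    {"subreddit": "productivity", "title": "YourBrandName vs Notion"},
    {"subreddit": "SaaS", "title": "YourBrandName pricing feedback"},
]
print(mentions_digest(posts))
```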
## Integrations

Connect Reddit Scraper to other apps and services using Apify integrations:

📗 Google Sheets — automatically export Reddit posts and comments to a spreadsheet for tracking trends or building content calendars
💬 Slack / Discord — get notifications when scraping finishes, or set up alerts for posts matching specific keywords
⚡ Zapier / Make — trigger workflows based on new Reddit data, e.g., save high-engagement posts to a CRM or send weekly reports
🔔 Webhooks — send results to your own API endpoint for custom processing pipelines
🗓️ Scheduled runs — run the scraper daily or weekly to monitor subreddits for new discussions
🗄️ Data warehouses — pipe data to BigQuery, Snowflake, or PostgreSQL for large-scale analysis
🤖 AI/LLM pipelines — feed Reddit discussions into sentiment analysis, topic modeling, or lead qualification workflows
## How do I use Reddit Scraper with the API?

Use the Apify API to run Reddit Scraper programmatically from your own code. Client libraries are available for Python and Node.js, and any language that can make HTTP requests works too.

Node.js:

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('automation-lab/reddit-scraper').call({
    urls: ['https://www.reddit.com/r/technology/'],
    maxPostsPerSource: 100,
    sort: 'hot',
    includeComments: false,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```

Python:

```python
from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

run = client.actor('automation-lab/reddit-scraper').call(run_input={
    'urls': ['https://www.reddit.com/r/technology/'],
    'maxPostsPerSource': 100,
    'sort': 'hot',
    'includeComments': False,
})

items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)
```

cURL:

```shell
curl "https://api.apify.com/v2/acts/automation-lab~reddit-scraper/runs" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -d '{"urls": ["https://www.reddit.com/r/technology/"], "maxPostsPerSource": 100, "sort": "hot"}'
```

To retrieve results after the run completes:

```shell
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?format=json" \
  -H "Authorization: Bearer YOUR_API_TOKEN"
```
## Use with AI agents via MCP
Reddit Scraper is available as a tool for AI assistants that support the Model Context Protocol (MCP). This lets you use natural language to scrape data — just ask your AI assistant and it will configure and run the scraper for you.
### Setup for Claude Code

```shell
claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/reddit-scraper"
```
### Setup for Claude Desktop, Cursor, or VS Code

Add this to your MCP config file:

```json
{
  "mcpServers": {
    "apify": {
      "url": "https://mcp.apify.com?tools=automation-lab/reddit-scraper"
    }
  }
}
```
Your AI assistant will use OAuth to authenticate with your Apify account on first use.
### Example prompts
Once connected, try asking your AI assistant:
- "Get the top 100 posts from r/technology this month"
- "Scrape comments from this Reddit thread"
- "Search Reddit for discussions about 'AI coding'"
- "Find posts mentioning our product in r/SaaS"
Learn more in the Apify MCP documentation.
## Is it legal to scrape Reddit?
Scraping publicly available data from Reddit is generally considered legal. Here are the key points:
⚖️ Public data — Reddit Scraper only accesses publicly available posts and comments. It does not log in, bypass authentication, or access private content.
📜 Legal precedent — The US Ninth Circuit ruling in hiQ Labs v. LinkedIn (2022) established that scraping publicly available data does not violate the Computer Fraud and Abuse Act (CFAA).
🔒 No personal data extraction — The scraper collects usernames (which are public pseudonyms on Reddit) but does not attempt to deanonymize users or collect private information.
📋 Terms of Service — Reddit's ToS restricts automated access, but ToS violations are a contractual matter, not a criminal one. Many courts have ruled that ToS alone cannot make scraping illegal.
🇪🇺 GDPR considerations — If you scrape data that includes EU users, ensure your use case complies with GDPR. Aggregated, anonymized analysis is generally safe. Storing individual user data for profiling may require additional compliance steps.
This information is for educational purposes and does not constitute legal advice. Consult a qualified attorney for guidance specific to your use case and jurisdiction.
## FAQ
Can I scrape any subreddit?
Yes, as long as the subreddit is public. Private subreddits will return a 403 error and be skipped.
Does it scrape NSFW content?
Yes, NSFW posts are included by default. You can filter them out using the isNSFW field in the output.
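For example, to drop NSFW posts from a downloaded dataset using that flag:

```python
posts = [
    {"title": "Safe post", "isNSFW": False},
    {"title": "Adult post", "isNSFW": True},
]

# Keep only posts whose isNSFW flag is not set.
sfw_only = [p for p in posts if not p.get("isNSFW", False)]
print([p["title"] for p in sfw_only])  # ['Safe post']
```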
How many posts can I scrape?
There is no hard limit. Set maxPostsPerSource: 0 for unlimited. Reddit's pagination allows up to ~1,000 posts per listing. For more, use search with different time filters.
Can I scrape comments from multiple posts at once?
Yes. Enable includeComments and the scraper will fetch comments for every post it finds. Use maxCommentsPerPost to control how many comments per post.
What happens if Reddit rate-limits me?
The scraper automatically detects rate limits via response headers and waits before retrying. You don't need to configure anything.
Can I export to CSV or Excel?
Yes. Apify datasets support JSON, CSV, Excel, XML, and HTML export formats. Use the dataset export buttons or API.
The scraper returns fewer posts than I expected — what's going on?
Reddit's pagination API has a limit of approximately 1,000 posts per listing. If you need more, use search with different time filters (e.g., timeFilter: "month" then timeFilter: "year") to access older content. Also note that some subreddits simply have fewer posts than your limit.
I'm getting 403 errors for a subreddit — how do I fix this?
This means the subreddit is private, quarantined, or banned. The scraper can only access public subreddits. Check if you can view the subreddit in an incognito browser window — if not, the scraper won't be able to access it either.
Can I use filterKeywords to narrow down search results?
Yes. Set filterKeywords to an array of terms and only posts whose title or body contains at least one keyword will be kept. This is useful when Reddit's built-in search returns loosely related results.
How do I scrape a user's post history?
Paste the user's profile URL (e.g., https://www.reddit.com/user/spez/) into the URLs field. The scraper will extract their public submissions.
Does it handle deleted or removed posts?
Deleted posts may appear with [deleted] as the author and empty body text. Removed posts (mod-removed) may still show the title but have [removed] as the body.
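A small sketch for classifying those markers in downloaded posts (field names follow the output schema above; treat this as a heuristic, since Reddit's markers can vary):

```python
def post_status(post: dict) -> str:
    """Classify a scraped post as live, user-deleted, or mod-removed,
    based on the [deleted]/[removed] markers described above."""
    if post.get("author") == "[deleted]":
        return "deleted"
    if post.get("selfText") == "[removed]":
        return "removed"
    return "live"

print(post_status({"author": "[deleted]", "selfText": ""}))
print(post_status({"author": "someone", "selfText": "[removed]"}))
print(post_status({"author": "someone", "selfText": "hello"}))
```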
## Related scrapers
- Instagram Scraper — Scrape Instagram posts, profiles, comments, and hashtags
- Threads Scraper — Extract posts and profiles from Meta's Threads
- Twitter/X Scraper — Extract tweets, user profiles, and search results from X
- TikTok Scraper — Scrape TikTok videos, profiles, and trending hashtag feeds
- Bluesky Scraper — Scrape Bluesky posts, profiles, and search results
- Social Media Profile Finder — Find social media profiles across platforms from a list of websites