Reddit Scraper

Pricing

from $0.50 / 1,000 results

Scrape Reddit posts, comments, search results, and user profiles. No API keys or browser needed. Supports 4 modes: subreddit posts (hot/new/top/rising), Reddit search, user profiles, and full comment trees. Fast, lightweight HTTP-based scraping with built-in rate limiting and retry logic.

Rating

0.0

(0)

Developer

mick_

Maintained by Community

Actor stats

0

Bookmarked

30

Total users

13

Monthly active users

16 hours ago

Last modified

Scrape Reddit posts, comments, search results, and user profiles at scale. No API keys, no login, no browser required. Batch search across multiple queries in one run. MCP-ready for AI agent pipelines.

What does it do?

Reddit Scraper pulls structured data from Reddit using old.reddit.com JSON endpoints — no OAuth, no Reddit API credentials, no headless browser. You get clean, consistent JSON output ready for analysis, NLP pipelines, or downstream AI tools.

v1.1.0: Added batch search (searchQueriesList) — run multiple queries in a single job with automatic deduplication by post ID.

👥 Who Uses This

🏢 Brand and Market Researchers

You need to know what real people say about your product, competitors, or industry — not curated press releases, but unfiltered community discussion. Reddit is where honest opinions live. This actor lets you monitor multiple brand terms or competitor names in one run, deduplicated and ready for sentiment analysis.

Typical input:

{
  "mode": "search",
  "searchQueriesList": ["YourBrand review", "CompetitorA vs CompetitorB", "best CRM 2025"],
  "searchSort": "top",
  "timeFilter": "year",
  "maxResults": 500,
  "includeComments": true
}

Run this on a schedule (daily or weekly via Apify schedules) to track brand sentiment shifts over time without touching the Reddit website.


💻 NLP and ML Engineers

You need topic-specific text at scale — Reddit comments and posts for training classifiers, fine-tuning embeddings, building sentiment models, or labeling datasets. The structured output (author, score, depth, timestamp) gives you signal for quality filtering without post-processing.

Collect training data from multiple subreddits:

{
  "mode": "subreddit_posts",
  "subreddits": ["MachineLearning", "LocalLLaMA", "datascience", "learnmachinelearning"],
  "sort": "top",
  "timeFilter": "year",
  "maxResults": 2000,
  "includeComments": true
}

Filter by score (high-upvote posts = community-validated content) and depth (top-level comments = more coherent standalone text). In user_profile mode, setting userContentType to comments returns comment-only output for dialogue dataset construction.
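The score and depth filtering described above could look like the following minimal sketch. It assumes records shaped like the post and comment fields documented under Output (type, score, depth); the function name filter_records and the thresholds are hypothetical, chosen only for illustration.

```python
def filter_records(records, min_score=50, max_depth=0):
    """Keep records with score >= min_score; for comments, also require depth <= max_depth."""
    kept = []
    for rec in records:
        if rec.get("score", 0) < min_score:
            continue  # low-score posts and comments are dropped
        if rec.get("type") == "comment" and rec.get("depth", 0) > max_depth:
            continue  # nested replies are dropped; depth 0 means top-level
        kept.append(rec)
    return kept


# Hypothetical sample records using the documented field names.
sample = [
    {"type": "post", "id": "p1", "score": 1842},
    {"type": "post", "id": "p2", "score": 3},
    {"type": "comment", "id": "c1", "score": 120, "depth": 0},
    {"type": "comment", "id": "c2", "score": 400, "depth": 2},
]

print([r["id"] for r in filter_records(sample)])  # ['p1', 'c1']
```

Sensible thresholds depend on the subreddit: a score of 50 is high in a small community and low in a large one.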


🛠️ Product Teams and Startups

You want to understand what problems your target market is describing in their own words — not survey responses, but organic complaints, feature requests, and workaround threads. Reddit search across the right subreddits is a fast way to do Jobs-to-Be-Done research before writing a single line of code.

Discovery research across communities:

{
  "mode": "search",
  "searchQueriesList": ["wish there was a tool for", "looking for software that", "does anyone know how to automate"],
  "searchSubreddit": "entrepreneur",
  "searchSort": "relevance",
  "maxResults": 200
}

Use batch search to sweep multiple pain-point queries across a single subreddit or across all of Reddit. Export to CSV for tagging and clustering in a spreadsheet.


📰 Social Media Analysts and Journalists

You're tracking narratives, investigating communities, or mapping how opinions shift around a topic over time. Reddit's threaded comment structure and upvote system give you signal on consensus and dissent that flat social feeds don't provide.

Pull full comment trees from key posts:

{
  "mode": "post_comments",
  "postUrls": [
    "https://www.reddit.com/r/politics/comments/abc123/some_breaking_story/",
    "https://www.reddit.com/r/technology/comments/def456/another_post/"
  ],
  "maxCommentsPerPost": 1000
}

Use user_profile mode to audit a specific account's post and comment history across subreddits — useful for investigating astroturfing, coordinated behavior, or tracking how a public figure's community engagement evolves.


🤖 AI/LLM Engineers and Agent Builders

You're building AI pipelines that need real-time access to community knowledge — RAG systems grounded in current Reddit discussions, agents that can search subreddits on demand, or workflows that pull fresh posts into an LLM context window.

MCP tool config for Claude Desktop / Cursor:

{
  "mcpServers": {
    "reddit-scraper": {
      "url": "https://mcp.apify.com?tools=labrat011/reddit-scraper",
      "headers": {
        "Authorization": "Bearer <APIFY_TOKEN>"
      }
    }
  }
}

Once configured, your AI agent can call reddit-scraper as a tool to search any subreddit, pull comment threads, or monitor user activity — no infrastructure to manage. Combine with other actors in the healthcare or finance cluster for multi-source research pipelines.


Features

  • 4 scraping modes: subreddit posts, Reddit search, user profiles, post comments
  • Batch search: run multiple search queries in a single job — results merged and deduplicated by post ID
  • Multi-target: subreddits, usernames, and post URLs all accept lists — scrape many at once
  • Sort and filter: hot, new, top (with configurable time range), rising
  • Full comment trees: recursive extraction with depth tracking
  • Search scope: across all of Reddit or restricted to a single subreddit
  • User profiles: posts only, comments only, or both
  • Pagination: automatic via Reddit's after cursor
  • Rate limiting: 7s between requests to stay under Reddit's unauthenticated limits
  • Retry logic: exponential backoff on 429, proxy rotation on 403
  • State persistence: survives Apify actor migrations mid-run

Scraping modes

Mode 1: Subreddit Posts

Scrape posts from one or more subreddits.

{
  "mode": "subreddit_posts",
  "subreddits": ["python", "machinelearning", "webdev"],
  "sort": "top",
  "timeFilter": "month",
  "maxResults": 200
}

Sort options: hot, new, top, rising. timeFilter applies only when sort is top: hour, day, week, month, year, all.
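To illustrate how sort and timeFilter interact, here is a hedged sketch that builds an old.reddit.com listing URL. The path shape and the t, limit, and after query parameters follow Reddit's public JSON listings; whether the actor builds its requests exactly this way is an assumption, and listing_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

VALID_SORTS = {"hot", "new", "top", "rising"}
VALID_TIME_FILTERS = {"hour", "day", "week", "month", "year", "all"}


def listing_url(subreddit, sort="hot", time_filter="week", limit=100, after=None):
    """Build a JSON listing URL; the time filter is sent only when sort is 'top'."""
    if sort not in VALID_SORTS:
        raise ValueError(f"unknown sort: {sort}")
    params = {"limit": limit}
    if sort == "top":
        if time_filter not in VALID_TIME_FILTERS:
            raise ValueError(f"unknown time filter: {time_filter}")
        params["t"] = time_filter
    if after:
        params["after"] = after  # pagination cursor returned by the previous page
    return f"https://old.reddit.com/r/{subreddit}/{sort}.json?{urlencode(params)}"


print(listing_url("python", sort="top", time_filter="month"))
# https://old.reddit.com/r/python/top.json?limit=100&t=month
print(listing_url("python", sort="hot", time_filter="month"))
# https://old.reddit.com/r/python/hot.json?limit=100
```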


Mode 2: Search Reddit

Search across all of Reddit or within a specific subreddit. Use searchQueriesList to run multiple queries in one job.

Single query:

{
  "mode": "search",
  "searchQuery": "best python web framework 2025",
  "searchSort": "relevance",
  "maxResults": 100
}

Batch search (v1.1.0):

{
  "mode": "search",
  "searchQueriesList": ["ChatGPT vs Claude", "best LLM 2025", "AI coding assistant"],
  "searchSort": "top",
  "timeFilter": "year",
  "maxResults": 300
}

Results across all queries are merged and deduplicated by post ID. searchQueriesList overrides searchQuery when provided.
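The merge-and-deduplicate step can be pictured with the short sketch below. It is an illustration under assumptions, not the actor's code: merge_deduplicated is a hypothetical name, and keeping the first occurrence of each post ID is one plausible policy among several.

```python
def merge_deduplicated(result_sets):
    """Merge per-query result lists, keeping the first occurrence of each post ID."""
    seen = set()
    merged = []
    for results in result_sets:
        for post in results:
            if post["id"] in seen:
                continue  # the same post was already returned by an earlier query
            seen.add(post["id"])
            merged.append(post)
    return merged


# Hypothetical results for two queries that both match post "b2".
query_a = [{"id": "a1", "title": "ChatGPT vs Claude"}, {"id": "b2", "title": "Best LLM?"}]
query_b = [{"id": "b2", "title": "Best LLM?"}, {"id": "c3", "title": "AI coding assistant"}]

print([p["id"] for p in merge_deduplicated([query_a, query_b])])  # ['a1', 'b2', 'c3']
```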

Restricted to a subreddit:

{
  "mode": "search",
  "searchQuery": "fastapi vs django",
  "searchSubreddit": "python",
  "searchSort": "top",
  "maxResults": 50
}

Search sort options: relevance, hot, top, new, comments.


Mode 3: User Profile

Scrape posts and/or comments from Reddit user profiles.

{
  "mode": "user_profile",
  "usernames": ["user1", "user2"],
  "userContentType": "overview",
  "maxResults": 200
}

Content type options: overview (posts + comments), submitted (posts only), comments (comments only).


Mode 4: Post Comments

Extract the full comment tree from specific Reddit posts.

{
  "mode": "post_comments",
  "postUrls": [
    "https://www.reddit.com/r/Python/comments/1r19hu1/after_25_years_using_orms_i_switched_to_raw/",
    "https://www.reddit.com/r/machinelearning/comments/abc123/some_post/"
  ],
  "maxCommentsPerPost": 500
}
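For readers curious how a comment tree becomes flat rows with a depth field, the sketch below walks a Reddit-style comment listing recursively. The kind values ("t1" for comments, "more" for collapsed placeholders) and the replies field follow Reddit's public JSON shape; the function name flatten_comments and the output keys are illustrative assumptions, not the actor's actual implementation.

```python
def flatten_comments(listing, depth=0):
    """Flatten a nested comment Listing into rows, recording nesting depth."""
    rows = []
    for child in listing.get("data", {}).get("children", []):
        if child.get("kind") != "t1":
            continue  # skip "more" placeholders; they are not expanded here
        data = child["data"]
        rows.append({
            "id": data["id"],
            "parentId": data.get("parent_id"),
            "body": data.get("body", ""),
            "depth": depth,
        })
        replies = data.get("replies")
        if isinstance(replies, dict):  # Reddit sends "" when there are no replies
            rows.extend(flatten_comments(replies, depth + 1))
    return rows


# Hypothetical miniature tree: one top-level comment with a reply, plus a second top-level comment.
sample_tree = {"kind": "Listing", "data": {"children": [
    {"kind": "t1", "data": {"id": "a", "parent_id": "t3_post", "body": "top", "replies": {
        "kind": "Listing", "data": {"children": [
            {"kind": "t1", "data": {"id": "b", "parent_id": "t1_a", "body": "reply", "replies": ""}},
            {"kind": "more", "data": {"count": 12, "children": ["x", "y"]}},
        ]}}}},
    {"kind": "t1", "data": {"id": "c", "parent_id": "t3_post", "body": "second", "replies": ""}},
]}}

print([(r["id"], r["depth"]) for r in flatten_comments(sample_tree)])
# [('a', 0), ('b', 1), ('c', 0)]
```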

Input parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| mode | string | subreddit_posts | Scraping mode: subreddit_posts, search, user_profile, post_comments |
| subreddits | string[] | | Subreddit names (without r/ prefix). Mode: subreddit_posts |
| sort | string | hot | Sort order: hot, new, top, rising |
| timeFilter | string | week | Time range for Top sort: hour, day, week, month, year, all |
| searchQuery | string | | Single search term. Mode: search |
| searchQueriesList | string[] | [] | Multiple search queries, merged and deduplicated. Overrides searchQuery. Mode: search |
| searchSubreddit | string | | Restrict search to one subreddit. Leave empty for all of Reddit |
| searchSort | string | relevance | Search sort: relevance, hot, top, new, comments |
| usernames | string[] | | Reddit usernames (without u/ prefix). Mode: user_profile |
| userContentType | string | overview | overview (posts+comments), submitted, comments |
| postUrls | string[] | | Full Reddit post URLs. Mode: post_comments |
| maxCommentsPerPost | integer | 100 | Max comments per post. 0 = no limit |
| maxResults | integer | 100 | Max total results (1–10,000). Free tier: 25 per run |
| includeComments | boolean | false | Also fetch comments for each post in subreddit/search mode. Slower, higher proxy cost |
| proxyConfiguration | object | Residential | Proxy settings. Residential proxies required |

Output

Results are saved to the default dataset. Download as JSON, CSV, Excel, or XML from the Output tab.

Post fields

| Field | Type | Description |
|---|---|---|
| type | string | Always "post" |
| id | string | Reddit post ID |
| subreddit | string | Subreddit name |
| title | string | Post title |
| author | string | Author username |
| selftext | string | Post body text (empty for link posts) |
| url | string | Reddit permalink |
| externalUrl | string | Linked URL (for link posts) |
| score | integer | Net upvotes |
| upvoteRatio | float | Upvote percentage (0.0–1.0) |
| numComments | integer | Total comment count |
| created | string | ISO 8601 UTC timestamp |
| isNSFW | boolean | NSFW flag |
| isSpoiler | boolean | Spoiler flag |
| isPinned | boolean | Stickied/pinned flag |
| flair | string | Post flair text |
| awards | integer | Total awards received |
| domain | string | Link domain |
| isVideo | boolean | Video post flag |
| thumbnail | string | Thumbnail URL |

Comment fields

| Field | Type | Description |
|---|---|---|
| type | string | Always "comment" |
| id | string | Comment ID |
| postId | string | Parent post ID |
| subreddit | string | Subreddit name |
| author | string | Author username |
| body | string | Comment text |
| score | integer | Net upvotes |
| created | string | ISO 8601 UTC timestamp |
| parentId | string | Parent comment or post ID |
| depth | integer | Nesting depth (0 = top-level) |
| isSubmitter | boolean | Whether author is the post's OP |
| awards | integer | Total awards received |
| url | string | Reddit permalink |

Example output

{
  "type": "post",
  "id": "1r19hu1",
  "subreddit": "Python",
  "title": "After 25 years using ORMs, I switched to raw SQL",
  "author": "example_user",
  "selftext": "Here's what I learned after making the switch...",
  "url": "https://www.reddit.com/r/Python/comments/1r19hu1/...",
  "externalUrl": "",
  "score": 1842,
  "upvoteRatio": 0.97,
  "numComments": 312,
  "created": "2025-03-01T09:14:22+00:00",
  "isNSFW": false,
  "isSpoiler": false,
  "isPinned": false,
  "flair": "Discussion",
  "awards": 5,
  "domain": "self.Python",
  "isVideo": false,
  "thumbnail": "self"
}

Cost

This actor uses pay-per-event (PPE) pricing — you pay only for results you get.

  • Proxy traffic is billed separately (residential proxies run ~$12.50/GB on Apify)
  • Typical cost: $0.50–$1.00 per 1,000 results depending on proxy usage and whether comments are included
  • Free tier: 25 results per run — no subscription required
  • Paid tier: up to 10,000 results per run

Reddit's rate limits allow roughly 8–10 requests per minute. A 100-post subreddit run takes 1–2 minutes. Enabling includeComments adds a separate comment request for each post, so run time then grows with the number of posts fetched.
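As a rough back-of-the-envelope aid, the sketch below estimates time spent waiting between requests. It rests on several assumptions that the page does not confirm: 100 posts per listing page (Reddit's maximum for the limit parameter), one comment request per post when includeComments is on, and the 7-second interval listed under Features. Actor startup, network latency, and retries are excluded, so real runs take longer. The name estimate_wait_seconds is hypothetical.

```python
import math


def estimate_wait_seconds(posts, page_size=100, interval_s=7.0, include_comments=False):
    """Estimate rate-limit wait time as total request count times the request interval."""
    listing_requests = math.ceil(posts / page_size)    # pages needed to list the posts
    comment_requests = posts if include_comments else 0  # assumed: one request per post
    return (listing_requests + comment_requests) * interval_s


print(estimate_wait_seconds(100))                         # 7.0
print(estimate_wait_seconds(100, include_comments=True))  # 707.0
print(estimate_wait_seconds(1000, page_size=25))          # 280.0
```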


MCP Integration

This actor works as an MCP tool via Apify's hosted MCP server. No custom server needed — AI agents can call it directly.

  • Endpoint: https://mcp.apify.com?tools=labrat011/reddit-scraper
  • Auth: Authorization: Bearer <APIFY_TOKEN>
  • Transport: Streamable HTTP
  • Works with: Claude Desktop, Cursor, VS Code, Windsurf, Warp, Gemini CLI

Claude Desktop / Cursor config:

{
  "mcpServers": {
    "reddit-scraper": {
      "url": "https://mcp.apify.com?tools=labrat011/reddit-scraper",
      "headers": {
        "Authorization": "Bearer <APIFY_TOKEN>"
      }
    }
  }
}

AI agents can search Reddit for discussions, scrape subreddit posts, pull comment threads, and monitor user activity — all as a callable tool without managing any infrastructure.


Technical details

  • Uses old.reddit.com JSON endpoints — no API credentials, no OAuth, no browser rendering
  • Rate limited to roughly 8–9 requests/minute (7-second interval between requests)
  • Exponential backoff on 429 rate limit responses (30s base, doubles per retry)
  • Proxy rotation on 403 IP blocks
  • Pagination via Reddit's after cursor (up to ~1,000 items per listing)
  • Results pushed in batches of 25 for memory efficiency
  • Actor state persisted across Apify platform migrations
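The 429 handling listed above (30s base, doubling per retry) might be sketched as follows. This is an illustration under assumptions rather than the actor's code: fetch_with_backoff and backoff_delay are hypothetical names, the fetch callable is assumed to return a (status, body) tuple, and the retry cap of 4 is arbitrary.

```python
import time


def backoff_delay(attempt, base_s=30.0):
    """Delay before retry number `attempt` (0-based): 30s, 60s, 120s, ..."""
    return base_s * (2 ** attempt)


def fetch_with_backoff(fetch, max_retries=4, sleep=time.sleep):
    """Call fetch() until it returns a non-429 status or retries run out."""
    for attempt in range(max_retries + 1):
        status, body = fetch()
        if status != 429:
            return status, body
        if attempt < max_retries:
            sleep(backoff_delay(attempt))  # wait longer after each 429
    return status, body


# Simulated endpoint: two 429 responses, then success. Waits are recorded, not slept.
responses = iter([(429, None), (429, None), (200, "ok")])
waits = []
result = fetch_with_backoff(lambda: next(responses), sleep=waits.append)
print(result, waits)  # (200, 'ok') [30.0, 60.0]
```

Passing sleep as a parameter keeps the retry logic testable without real delays.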

Limitations

  • Reddit caps unauthenticated listing pagination at roughly 1,000 items per subreddit/user endpoint
  • "Load more comments" nodes in deep comment trees are not expanded — only the initially loaded tree is extracted
  • Datacenter proxies will not work — Reddit has blocked nearly all datacenter IP ranges since mid-2025. Residential proxies are required.
  • High-volume runs (1,000+ results) take time due to Reddit's rate limits
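The pagination cap above follows from how cursor-based listing works: each page returns an after cursor, and the listing ends when Reddit stops returning one. The sketch below shows that loop against a fake in-memory listing. The names paginate and fake_fetch_page are hypothetical, and the (items, next_cursor) return shape is an assumption made for the example.

```python
def paginate(fetch_page, max_results, page_size=100):
    """Follow the `after` cursor until max_results is reached or the listing ends."""
    items = []
    after = None
    while len(items) < max_results:
        page, after = fetch_page(after, page_size)
        items.extend(page)
        if not page or not after:
            break  # no cursor means the listing has no further pages
    return items[:max_results]


# Fake listing of 250 post IDs standing in for a real endpoint.
FAKE_LISTING = [f"t3_{n}" for n in range(250)]


def fake_fetch_page(after, limit):
    start = 0 if after is None else FAKE_LISTING.index(after) + 1
    page = FAKE_LISTING[start:start + limit]
    next_after = page[-1] if start + limit < len(FAKE_LISTING) else None
    return page, next_after


print(len(paginate(fake_fetch_page, max_results=1000)))  # 250
print(len(paginate(fake_fetch_page, max_results=120)))   # 120
```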

FAQ

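Is it legal to scrape Reddit?

Scraping publicly available data is generally considered legal in the US: in hiQ Labs v. LinkedIn, the Ninth Circuit held that accessing publicly available web pages likely does not violate the Computer Fraud and Abuse Act. This actor only accesses public Reddit content visible to any anonymous browser visitor. It does not bypass login walls or CAPTCHAs, and it does not access private content.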

Why are residential proxies required?

Reddit blocks nearly all datacenter IP ranges. Residential proxies route requests through real ISP connections that Reddit does not filter. Without them, most requests will return 403s.

How does batch search work?

Set searchQueriesList to an array of query strings. The actor runs each query sequentially and merges results into a single dataset, removing duplicate posts (matched by Reddit post ID). This is useful for brand monitoring (track multiple product names in one run), competitive research, or collecting data across related topics.

Can I use this with the Apify API?

Yes. Call the actor via the Apify REST API and poll for results, or use the Apify Python or JavaScript client libraries. Results are available in JSON, CSV, Excel, and XML formats.
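A minimal sketch of starting a run over the REST API with only the Python standard library is shown below. It builds the request without sending it. The actor ID labrat011/reddit-scraper is taken from the MCP endpoint on this page and is assumed to be the same ID for REST calls; build_run_request is a hypothetical helper name.

```python
import json
import urllib.request


def build_run_request(actor_id, token, actor_input):
    """Build (but do not send) a POST request that starts an actor run."""
    # The API path uses "username~actor-name" in place of "username/actor-name".
    url = f"https://api.apify.com/v2/acts/{actor_id.replace('/', '~')}/runs"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_run_request(
    "labrat011/reddit-scraper",  # assumed actor ID, copied from the MCP endpoint
    "<APIFY_TOKEN>",
    {"mode": "search", "searchQuery": "fastapi vs django", "maxResults": 50},
)
print(request.get_method(), request.full_url)
# POST https://api.apify.com/v2/acts/labrat011~reddit-scraper/runs
```

To send it, pass the request to urllib.request.urlopen. The response describes the run, including its ID and default dataset ID, which you can poll and then read items from once the run has finished.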

What happens if a subreddit, user, or post URL doesn't exist?

The scraper logs a warning and skips the invalid target. All remaining valid targets in the same run continue as normal.


| Actor | What it does | Pairs well with Reddit Scraper when... |
|---|---|---|
| Academic Paper Scraper | Google Scholar, Semantic Scholar, arXiv | You find a paper discussed on Reddit and want the full metadata and abstract |
| PubMed Scraper | 35M+ biomedical abstracts from NCBI | r/science or health subreddit posts reference medical studies you want to retrieve |
| Clinical Trials Scraper | ClinicalTrials.gov study data | Reddit health communities discuss ongoing trials you want to track |
| LinkedIn Jobs Scraper | Job postings and company data | You monitor r/cscareerquestions or industry subreddits and want matching job listings |
| NPI Provider Contact Finder | Healthcare provider directory | Health subreddit discussions lead to provider lookup needs |

Feedback

Found a bug or have a feature request? Open an issue on the Issues tab in Apify Console.