Reddit Scraper

Extract posts, comments, user profiles, and search results from Reddit. Pure HTTP, no API key required.

Pricing: from $0.60 / 1,000 posts
Rating: 0.0 (0 reviews)
Developer: Arnas (Maintained by Community)
Actor stats: 0 bookmarked · 4 total users · 2 monthly active users · last modified 2 days ago
A no-API-key Reddit scraper covering posts, comments, users, and search. Input, output, and pricing are snapshot-compatible with automation-lab/reddit-scraper as of the published version date. Pure HTTP, no headless browser, and three output formats: default JSON, OpenAI fine-tune JSONL, and RAG-ready markdown.

What does Reddit Scraper do?

Reddit Scraper extracts structured data from Reddit at $1.15 per 1,000 posts (FREE tier, scaling down to $0.28 per 1,000 at DIAMOND). Reddit has 1.7 billion monthly visits and 100,000+ active communities, making it the largest public discussion platform on the web. This actor scrapes posts, comments, search results, and user profiles from any public subreddit. Just paste any Reddit URL or enter a search query and get clean JSON, CSV, or Excel output. No Reddit account or API key needed.

It supports subreddit listings (hot, new, top, rising), individual posts with nested comments, user submission history, and full-text search across all of Reddit or within a specific subreddit.

Built on pure HTTP requests (no browser), it runs fast and keeps costs low.

Who is Reddit Scraper for?

  • Researchers — collect public opinion data, survey sentiment on topics, build datasets for academic studies
  • Market analysts — track brand mentions, product feedback, and competitor discussions across subreddits
  • SEO & content marketers — discover trending topics, find content ideas, monitor keyword discussions
  • AI/ML engineers — gather training data, build sentiment analysis datasets, feed LLM pipelines with real conversations
  • Journalists — monitor communities for breaking stories, track public reactions to events
  • Product managers — collect user feedback from product subreddits, track feature requests and bug reports
  • Lead generation teams — find potential customers asking for solutions your product solves
  • Social listening agencies — monitor Reddit alongside other platforms for brand and reputation tracking

Why use Reddit Scraper?

  • Posts + comments in one actor — no need to run separate scrapers
  • All input types — subreddits, posts, users, search queries, or just paste any Reddit URL
  • Pure HTTP — no browser, low memory, fast execution
  • Clean, AI-ready output — three formats including OpenAI fine-tune JSONL and RAG markdown
  • Pagination built in — scrape hundreds or thousands of posts automatically
  • Pay only for results — pay-per-event pricing, no monthly subscription
  • No API key required — works without Reddit developer credentials
  • Keyword filtering — filter results to only keep posts matching specific terms (filtered posts are not charged)

What data can you extract from Reddit?

Post fields

Field | Description
title | Post title
author | Reddit username
subreddit | Subreddit name
score | Net upvotes
upvoteRatio | Upvote percentage (0-1)
numComments | Comment count
createdAt | ISO 8601 timestamp
url | Full Reddit URL
selfText | Post body text
link | External link (for link posts)
domain | Link domain
isVideo, isSelf, isNSFW, isSpoiler, isStickied | Content flags
linkFlairText | Post flair
totalAwards | Award count
subredditSubscribers | Subreddit size
imageUrls | Extracted image URLs (gallery- and preview-aware, HTML-decoded)
thumbnail | Thumbnail URL
scrapedAt | When this actor scraped it

Comment fields

Field | Description
author | Commenter username
body | Comment text
score | Net upvotes
createdAt | ISO 8601 timestamp
depth | Nesting level (0 = top-level)
isSubmitter | Whether commenter is the post author
parentId | Parent comment/post ID
replies | Number of direct replies
postId | Parent post ID
postTitle | Parent post title
scrapedAt | When this actor scraped it

How much does it cost to scrape Reddit?

Pay-per-event pricing — you pay only for what you scrape. No monthly subscription. The tier price applied to your run depends on your Apify subscription plan.

Event | FREE | BRONZE | SILVER | GOLD | PLATINUM | DIAMOND
Actor start | $0.003/run | $0.003/run | $0.003/run | $0.003/run | $0.003/run | $0.003/run
Per post | $0.00115 | $0.001 | $0.00078 | $0.0006 | $0.0004 | $0.00028
Per comment | $0.000575 | $0.0005 | $0.00039 | $0.0003 | $0.0002 | $0.00014

At the FREE tier, that's roughly $1.15 per 1,000 posts or $0.58 per 1,000 comments.

AI format pricing note

When outputFormat is jsonl-finetune or rag-markdown, you are charged only for posts (not comments). Comments are bundled into the post record at no extra charge — cost-effective for large-scale training-set collection.

Real-world cost examples (FREE tier)

Input | Results | Approx. duration | Approx. cost
1 subreddit, 100 posts | 100 posts | ~30s | ~$0.12
5 subreddits, 50 posts each | 250 posts | ~75s | ~$0.30
1 post + 200 comments | 201 items | ~10s | ~$0.12
Search "AI", 100 results | 100 posts | ~30s | ~$0.12
1 subreddit, 5 posts + 3 comments each | 20 items | ~15s | ~$0.02

Times reflect typical observed runs on RESIDENTIAL proxy in 2026. Actual times depend on Reddit response latency and proxy session warm-up. The deployment runbook captures observed numbers per release; check there for current measurements.
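The pricing tables above can be sanity-checked with a small estimator. This is a sketch, not part of the actor: the FREE_TIER dictionary hard-codes the FREE-tier prices from the pricing table, and the hypothetical ai_format flag applies the AI-format rule (comments bundled into the post at no extra charge).

```python
FREE_TIER = {"actor_start": 0.003, "per_post": 0.00115, "per_comment": 0.000575}

def estimate_cost(posts, comments=0, ai_format=False, tier=FREE_TIER):
    """Estimate the cost of one run in USD.

    With outputFormat jsonl-finetune or rag-markdown, comments are
    bundled into the post record and not charged separately.
    """
    charged_comments = 0 if ai_format else comments
    return (tier["actor_start"]
            + posts * tier["per_post"]
            + charged_comments * tier["per_comment"])

# Matches the "1 subreddit, 100 posts" example: ~$0.12
print(round(estimate_cost(100), 3))
```

Swap in the per-post and per-comment prices of your own subscription tier from the table to estimate costs on other plans.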

How to scrape Reddit

  1. Open the actor input page.
  2. Add Reddit URLs to the Reddit URLs field — any of these formats work:
    • https://www.reddit.com/r/technology/
    • https://www.reddit.com/r/AskReddit/comments/abc123/post-title/
    • https://www.reddit.com/user/spez/
    • r/technology or just technology
  3. Or enter a Search query to search across Reddit.
  4. Set Max posts per source to control how many posts to scrape.
  5. Enable Include comments if you also want comment data.
  6. Click Start and wait for results.
  7. Download your data as JSON, CSV, or Excel from the Dataset tab.

Example input

{
  "urls": ["https://www.reddit.com/r/technology/"],
  "maxPostsPerSource": 100,
  "sort": "hot",
  "includeComments": false
}

Scraping a specific post with comments

{
  "urls": ["https://www.reddit.com/r/technology/comments/abc123/some-post-title/"],
  "includeComments": true,
  "maxCommentsPerPost": 50,
  "commentDepth": 3
}

Searching Reddit with keyword filtering

{
  "searchQuery": "best project management tools",
  "searchSubreddit": "productivity",
  "sort": "relevance",
  "timeFilter": "month",
  "maxPostsPerSource": 50,
  "filterKeywords": ["Notion", "Asana", "Monday"]
}

Input parameters

Parameter | Type | Default | Description
urls | string[] | (none) | Reddit URLs to scrape (subreddits, posts, users, search URLs). Shortcut forms r/x and bare x are accepted as subreddit names.
searchQuery | string | (none) | Search Reddit for this query. Either urls or searchQuery must be provided.
searchSubreddit | string | (none) | Limit search to a specific subreddit.
sort | enum | hot | Sort order: hot, new, top, rising, relevance.
timeFilter | enum | week | Time filter for top and relevance: hour, day, week, month, year, all.
maxPostsPerSource | integer | 100 | Max posts per subreddit/search/user. 0 = unlimited (capped by Reddit's ~1000-item-per-listing ceiling).
includeComments | boolean | false | Also scrape comments for each post.
maxCommentsPerPost | integer | 100 | Max comments per post. Hard-capped at 1000 to bound proxy/compute cost.
commentDepth | integer | 3 | Max reply nesting depth (1-10).
filterKeywords | string[] | [] | Only keep posts containing at least one keyword (case-insensitive title/body match). Filtered posts are not charged.
outputFormat | enum | default | default (standard JSON), jsonl-finetune (OpenAI chat-format SFT records), rag-markdown (vector-DB-ready markdown documents).
maxRequestRetries | integer | 5 | Retry attempts for failed requests (1-10).
proxyConfiguration | object | RESIDENTIAL | Apify proxy. RESIDENTIAL is the default — Reddit aggressively blocks datacenter IP ranges in 2026. Override only if you understand the trade-off.

Output examples

Default format (post)

{
  "type": "post",
  "id": "1qw5kwf",
  "title": "Reddit AMA highlights from this week",
  "author": "Sandstorm400",
  "subreddit": "technology",
  "score": 18009,
  "upvoteRatio": 0.92,
  "numComments": 1363,
  "createdAt": "2026-02-05T00:04:58.000Z",
  "url": "https://www.reddit.com/r/technology/comments/1qw5kwf/...",
  "permalink": "/r/technology/comments/1qw5kwf/...",
  "selfText": "",
  "link": "https://example.com/article",
  "domain": "example.com",
  "isVideo": false,
  "isSelf": false,
  "isNSFW": false,
  "isSpoiler": false,
  "isStickied": false,
  "thumbnail": "https://external-preview.redd.it/...",
  "linkFlairText": "Society",
  "totalAwards": 0,
  "subredditSubscribers": 17101887,
  "imageUrls": [],
  "scrapedAt": "2026-04-18T12:33:50.000Z"
}

Default format (comment)

{
  "type": "comment",
  "id": "m3abc12",
  "postId": "1qw5kwf",
  "postTitle": "Reddit AMA highlights from this week",
  "author": "commenter123",
  "body": "Phone addiction in teens is a serious issue.",
  "score": 542,
  "createdAt": "2026-02-05T01:15:00.000Z",
  "permalink": "https://www.reddit.com/r/technology/comments/1qw5kwf/.../m3abc12",
  "depth": 0,
  "isSubmitter": false,
  "parentId": "t3_1qw5kwf",
  "replies": 12,
  "scrapedAt": "2026-04-18T12:33:52.000Z"
}

jsonl-finetune format (one record per post, comments bundled in assistant)

{
  "type": "finetune",
  "messages": [
    { "role": "system", "content": "You are analyzing a Reddit discussion. Summarize the key viewpoints from the community." },
    { "role": "user", "content": "r/MachineLearning — What's the best approach for few-shot learning in 2026?\n\nI'm building a text classifier with only 50 labeled examples per class..." },
    { "role": "assistant", "content": "1. [user_a, score: 482] Try SetFit — fine-tunes sentence transformers on a handful of examples.\n2. [user_b, score: 217] LLM + chain-of-thought prompting with 5-10 examples..." }
  ],
  "metadata": {
    "postId": "abc123",
    "subreddit": "MachineLearning",
    "score": 1240,
    "upvoteRatio": 0.97,
    "numComments": 87,
    "createdAt": "2026-03-15T10:22:00.000Z",
    "url": "https://www.reddit.com/r/MachineLearning/...",
    "domain": "reddit.com",
    "isNSFW": false,
    "linkFlairText": null
  }
}

rag-markdown format (one self-contained markdown doc per post)

{
  "type": "rag-chunk",
  "chunkId": "reddit-MachineLearning-abc123",
  "markdown": "# What's the best approach for few-shot learning in 2026?\n\n**Subreddit:** r/MachineLearning \n**Author:** u/researcher42 \n**Score:** 1240 (97% upvoted) \n\n## Top Comments\n\n### u/user_a (score: 482)\n\nTry SetFit...\n",
  "metadata": {
    "source": "reddit",
    "postId": "abc123",
    "subreddit": "MachineLearning",
    "title": "What's the best approach for few-shot learning in 2026?",
    "author": "researcher42",
    "score": 1240,
    "upvoteRatio": 0.97,
    "numComments": 87,
    "createdAt": "2026-03-15T10:22:00.000Z",
    "url": "https://www.reddit.com/r/MachineLearning/comments/abc123/...",
    "domain": "reddit.com",
    "isNSFW": false,
    "linkFlairText": null
  }
}

Note about Console preview: the dataset preview in Apify Console renders the default post schema. When outputFormat is jsonl-finetune or rag-markdown, the records still land in the posts dataset but with their respective shapes — most preview columns will appear empty. Use the JSON or JSONL exports to inspect the actual records.

How to monitor Reddit for brand mentions

  1. Search for brand-name mentions:
    { "searchQuery": "YourBrandName", "sort": "new", "timeFilter": "day", "maxPostsPerSource": 100 }
  2. Monitor specific product subreddits with comments enabled:
    {
      "urls": ["https://www.reddit.com/r/YourProductSubreddit/new/", "https://www.reddit.com/r/CompetitorSubreddit/new/"],
      "maxPostsPerSource": 50,
      "includeComments": true,
      "maxCommentsPerPost": 20
    }
  3. Schedule the actor to run daily via Apify's built-in scheduler.
  4. Pipe the output to Slack/email/your CRM via Apify integrations.
  5. Filter for sentiment signals with filterKeywords.
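The daily loop above needs light post-processing so repeated runs only surface new matches. A sketch in Python (the new_mentions helper and the seen_ids set are illustrative, not part of the actor; the title and selfText fields come from the post schema):

```python
def new_mentions(items, keywords, seen_ids):
    """Return unseen posts whose title or body mentions any keyword.

    `seen_ids` is mutated in place so daily runs only surface new
    matches; persist it between runs (file, key-value store, etc.).
    """
    hits = []
    for item in items:
        text = (item.get("title", "") + " " + item.get("selfText", "")).lower()
        if item["id"] not in seen_ids and any(k.lower() in text for k in keywords):
            seen_ids.add(item["id"])
            hits.append(item)
    return hits
```

Run this over each scheduled run's dataset and forward only the returned hits to Slack or your CRM.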

How to use Reddit data for AI and LLM workflows

jsonl-finetune for supervised fine-tuning

Each record is a single training example in OpenAI chat format (system / user / assistant). The assistant message is the top K comments by score (K = min(maxCommentsPerPost, available)), formatted as a ranked list with score + author.

Use cases: fine-tuning domain-specific chatbots, building Q&A models on niche topics, creating instruction-tuning datasets from community knowledge.
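Before uploading a jsonl-finetune file for training, it is worth validating each line against the expected chat shape. A minimal sketch (the function name is ours; the system/user/assistant role sequence comes from the record format shown below):

```python
import json

def valid_finetune_record(line):
    """True if a jsonl-finetune line has the system/user/assistant chat shape."""
    try:
        rec = json.loads(line)
    except json.JSONDecodeError:
        return False
    msgs = rec.get("messages", [])
    return ([m.get("role") for m in msgs] == ["system", "user", "assistant"]
            and all(m.get("content") for m in msgs))
```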

rag-markdown for vector-DB ingestion

Each record is a self-contained markdown document with metadata header + top K comments as ### u/<author> (score: N) H3 sections. The chunkId follows the stable format reddit-{subreddit}-{postId} for deduplication and upsert into Pinecone, Weaviate, Chroma, or Qdrant.

Use cases: building domain-specific RAG systems, enriching knowledge bases with community knowledge, powering chatbots that answer from Reddit discussions.
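Because chunkId is stable across runs, re-scraping and upserting is idempotent. A dedupe sketch, using a plain dict as a stand-in for a real vector-store upsert (the upsert_chunks helper is ours):

```python
def upsert_chunks(store, chunks):
    """Upsert rag-markdown records into `store`, keyed by stable chunkId.

    Later records for the same post overwrite earlier ones, so repeated
    runs refresh scores and comments instead of duplicating documents.
    """
    for chunk in chunks:
        store[chunk["chunkId"]] = chunk
    return store
```

With a real vector database, pass chunkId as the vector ID so its native upsert performs the same overwrite.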

Quality filtering tips

When building training data or RAG corpora from Reddit, filter for quality:

  • Score threshold — keep only metadata.score >= 50
  • Upvote ratio — metadata.upvoteRatio >= 0.85 excludes controversial posts
  • NSFW filter — metadata.isNSFW === false for general-purpose datasets
  • Time filter — sort: "top" + timeFilter: "year" gets the highest-quality content from the past year
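The first three tips translate directly into a record-level filter. A sketch over the metadata object of the AI formats (threshold values mirror the tips above; tune them per corpus):

```python
def is_high_quality(meta, min_score=50, min_ratio=0.85, allow_nsfw=False):
    """Apply the score / upvote-ratio / NSFW quality tips to one metadata dict."""
    return (meta["score"] >= min_score
            and meta["upvoteRatio"] >= min_ratio
            and (allow_nsfw or not meta["isNSFW"]))

def filter_corpus(records):
    """Keep only records whose metadata passes the quality bar."""
    return [r for r in records if is_high_quality(r["metadata"])]
```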

Reddit data export: CSV, Excel, JSON

Apify datasets support JSON, CSV, Excel, XML, and HTML export formats. Use the dataset export buttons in Console or the dataset URL programmatically:

curl "https://api.apify.com/v2/datasets/DATASET_ID/items?format=csv" \
  -H "Authorization: Bearer YOUR_API_TOKEN" > reddit_posts.csv
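The same export works programmatically. A Python sketch that only builds the request for the dataset items endpoint, leaving the actual HTTP call to whatever client you use (the export_request helper is ours):

```python
def export_request(dataset_id, token, fmt="csv"):
    """Build (url, headers) for the Apify dataset items export endpoint."""
    url = f"https://api.apify.com/v2/datasets/{dataset_id}/items?format={fmt}"
    return url, {"Authorization": f"Bearer {token}"}
```

Pass fmt="json", "xlsx", "xml", or "html" to request the other supported export formats.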

How to scrape Reddit without getting blocked

The actor handles Reddit's rate limits and anti-bot protections automatically:

  • Reactive 429 backoff — when Reddit returns 429, the actor honors the Retry-After header (or falls back to exponential backoff) before retrying. Persistent blocks trigger session retirement (= new proxy IP).
  • Real-Chrome User-Agent — Reddit filters generic UAs; the actor sends a real-Chrome-shaped UA + the actor's identifier suffix.
  • RESIDENTIAL proxy default — Reddit blocks datacenter IPs aggressively in 2026; residential is the practical baseline.
  • Pure HTTP — no browser fingerprinting to mismatch.

For very high-volume scraping (tens of thousands of posts per hour), split the workload across multiple smaller runs.
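Splitting the workload can be as simple as chunking the source list into separate run inputs. A sketch (the chunk size of 5 is arbitrary; the helper name is ours):

```python
def split_run_inputs(urls, sources_per_run=5, max_posts=100):
    """Turn one large URL list into several smaller actor inputs."""
    return [
        {"urls": urls[i:i + sources_per_run], "maxPostsPerSource": max_posts}
        for i in range(0, len(urls), sources_per_run)
    ]
```

Each returned dict can then be submitted as its own run, sequentially or staggered on a schedule.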

How to use Reddit Scraper with the API

Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('automation-lab/reddit-scraper').call({
  urls: ['https://www.reddit.com/r/technology/'],
  maxPostsPerSource: 100,
  sort: 'hot',
  includeComments: false,
});

const { items } = await client.dataset(run.namedDatasetIds.posts).listItems();
console.log(items);

Python

from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

run = client.actor('automation-lab/reddit-scraper').call(run_input={
    'urls': ['https://www.reddit.com/r/technology/'],
    'maxPostsPerSource': 100,
    'sort': 'hot',
    'includeComments': False,
})

items = client.dataset(run['namedDatasetIds']['posts']).list_items().items
print(items)

cURL

curl "https://api.apify.com/v2/acts/automation-lab~reddit-scraper/runs" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -d '{
    "urls": ["https://www.reddit.com/r/technology/"],
    "maxPostsPerSource": 100,
    "sort": "hot"
  }'

Use with AI agents via MCP

Reddit Scraper is available as a tool for AI assistants that support the Model Context Protocol (MCP).

Setup for Claude Code

claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/reddit-scraper"

Setup for Claude Desktop, Cursor, or VS Code

{
  "mcpServers": {
    "apify": { "url": "https://mcp.apify.com?tools=automation-lab/reddit-scraper" }
  }
}

Your AI assistant will use OAuth to authenticate with your Apify account on first use. Then ask:

  • "Get the top 100 posts from r/technology this month"
  • "Scrape comments from this Reddit thread"
  • "Search Reddit for discussions about 'AI coding'"

Integrations

  • Google Sheets — auto-export Reddit posts and comments to a spreadsheet
  • Slack / Discord — get notifications when scraping finishes or when posts match keywords
  • Zapier / Make — trigger workflows on new Reddit data
  • Webhooks — send results to your own API endpoint
  • Scheduled runs — daily/weekly subreddit monitoring via Apify scheduler
  • Data warehouses — pipe data to BigQuery, Snowflake, or PostgreSQL

Is it legal to scrape Reddit?

Scraping publicly available data from Reddit is generally considered legal:

  • Public data only. This actor only accesses publicly available posts and comments. It does not log in, bypass authentication, or access private content.
  • Legal precedent. The US Ninth Circuit ruling in hiQ Labs v. LinkedIn (2022) established that scraping publicly available data does not violate the Computer Fraud and Abuse Act (CFAA).
  • No personal-data extraction. Usernames are public pseudonyms on Reddit; the actor does not attempt to deanonymize users or collect private information.
  • Terms of Service. Reddit's ToS restricts automated access, but ToS violations are a contractual matter, not a criminal one.
  • GDPR. If you scrape data that includes EU users, ensure your use case complies with GDPR. Aggregated, anonymized analysis is generally safe; storing individual user data for profiling may require additional compliance steps.

This information is for educational purposes and does not constitute legal advice.

FAQ

Can I scrape any subreddit? Yes, as long as it's public. Private subreddits return 403 and are skipped.

Does it scrape NSFW content? Yes, NSFW posts are included by default with the isNSFW: true flag. Filter them client-side if you want them excluded.

How many posts can I scrape? Set maxPostsPerSource: 0 for unlimited, capped by Reddit's ~1000-post pagination ceiling per listing. For more, use search with multiple time filters (e.g., timeFilter: month then year) to access older content.
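Combining several time-filtered runs and then deduping by post id recovers more history than any single listing. A sketch (the merge_runs helper is ours):

```python
def merge_runs(*runs):
    """Merge post lists from multiple runs, keeping the first copy of each id."""
    seen, merged = set(), []
    for items in runs:
        for item in items:
            if item["id"] not in seen:
                seen.add(item["id"])
                merged.append(item)
    return merged
```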

What happens if Reddit rate-limits me? The actor reads Retry-After headers when present, falls back to exponential backoff otherwise, and retires sessions on persistent blocks. No configuration needed.

Can I export to CSV or Excel? Yes. Apify datasets support JSON, CSV, Excel, XML, and HTML formats.

The scraper returns fewer posts than I expected. Reddit's pagination caps at ~1000 posts per listing. For deeper history, use search with different time filters.

I'm getting 403 errors for a subreddit. The subreddit is private, quarantined, or banned. Check in an incognito browser — if you can't see it there, the scraper can't either.

Can I use filterKeywords to narrow search results? Yes. Set filterKeywords to an array of terms; only posts whose title or body contains at least one keyword are kept. Filtered posts are not charged.

How do I scrape a user's post history? Paste the user's profile URL (e.g., https://www.reddit.com/user/spez/) into the URLs field.

Does it handle deleted or removed posts? Deleted authors appear as [deleted]; removed bodies appear as [removed]. Records are emitted, not dropped.
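If you want those placeholder records out of your dataset, drop them client-side. A sketch covering both posts and comments (the helper name is ours; the [deleted] and [removed] markers are Reddit's):

```python
def drop_deleted(items):
    """Remove records whose author was deleted or whose text was removed."""
    return [
        i for i in items
        if i.get("author") != "[deleted]"
        and i.get("body") != "[removed]"
        and i.get("selfText") != "[removed]"
    ]
```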

What proxy should I use? RESIDENTIAL is the default and recommended for any sustained run. DATACENTER is offered as a low-cost option but Reddit blocks datacenter IPs aggressively in 2026 — expect 429s and dropped sources for runs above ~100 posts on DATACENTER.

Do AI formats (jsonl-finetune, rag-markdown) require comments? They work without comments, but the assistant message / comments section will be a placeholder. For meaningful AI training data, set includeComments: true.