Reddit Scraper - Posts, Comments, Subreddits & Users

Fast, reliable Reddit scraper. Extract posts, comments, subreddits & users from any subreddit without Reddit API keys or login. AI-ready JSON for LLM training, sentiment analysis, lead generation. Export JSON/CSV/Excel.

Pricing: from $0.50 / 1,000 results · Developer: deusex machine · 56 total users · 33 monthly active users · Last modified 2 days ago
Reddit Scraper — Posts, Comments, Subreddits & Users API
Reddit Scraper is a fast, reliable Reddit data extraction tool that lets you scrape posts, comments, subreddits, and users from any subreddit or Reddit search query — without a Reddit API key, without login, and without rate limits. Extract structured JSON data ready for LLM pipelines, AI training, sentiment analysis, lead generation, market research, and brand monitoring.
Unlike the official Reddit API, this scraper has no OAuth setup, no app registration, no 60-requests-per-minute cap, and no 10-post limit per subreddit listing. Just give it a list of subreddits or a search query and it returns clean, normalized Reddit data in JSON, CSV, or Excel.
29 post fields · 10 comment fields · Nested comment threads · Image galleries · Flairs · Subreddit stats · Search any query · Export to JSON/CSV/Excel
Why use this Reddit scraper?
The Reddit API is powerful, but it's also:
- Rate-limited: 60 requests per minute per OAuth client, 10 posts per listing page.
- Authenticated: Requires app registration, OAuth flow, and token refresh logic.
- Incomplete: Doesn't return all fields the public web UI shows (upvote ratios, galleries, media metadata).
- Unpredictable: Terms-of-service changes have repeatedly broken third-party Reddit API consumers.
This Reddit scraper gives you:
- ✅ No Reddit API key needed — extracts Reddit data from public JSON endpoints.
- ✅ No login, no OAuth — scrape Reddit anonymously, no credentials to manage.
- ✅ High throughput — 1,000+ Reddit posts per run, concurrent subreddits, session persistence.
- ✅ Rich Reddit data — 29 fields per post, 10 fields per comment, nested replies up to depth 3.
- ✅ AI-ready JSON output — plug directly into LLM pipelines, RAG systems, or sentiment analysis models.
- ✅ Multiple export formats — JSON, CSV, Excel (XLSX), or direct API access via the Apify Dataset API.
- ✅ 99%+ success rate — automatic proxy rotation, retries, and session reuse keep your scraping jobs stable.
What Reddit data does this scraper extract?
Every Reddit post is returned with 29 normalized fields covering text content, media, scoring, flairs, subreddit metadata, and timestamps. When comments are enabled, the scraper also returns the nested comment tree as structured JSON.
Post fields (29)
| Field | Description |
|---|---|
| `id` | Reddit post ID (e.g. `t3_1s6e3dp`) |
| `subreddit` | Subreddit name (e.g. `technology`) |
| `title` | Post title |
| `author` | Reddit username of the post author |
| `score` | Net upvotes minus downvotes |
| `upvoteRatio` | Upvote ratio (e.g. 0.95 = 95% upvotes) |
| `numComments` | Total comment count |
| `url` | Reddit permalink to the post |
| `selftext` | Post body for text posts (up to 5,000 chars) |
| `thumbnail` | Thumbnail preview URL |
| `imageUrls` | All image URLs from galleries and image posts |
| `media` | Video URL + duration, or image URL |
| `created` | Post creation time (ISO 8601) |
| `edited` | Last edit timestamp, or `false` |
| `isVideo` | Video post flag |
| `isSelf` | Text post (`true`) vs link post (`false`) |
| `isGallery` | Multi-image gallery post |
| `domain` | Source domain (e.g. `youtube.com`, `self.technology`) |
| `linkUrl` | External URL for link posts |
| `flair` | Post flair (e.g. Discussion, News, Privacy) |
| `awards` | Total Reddit awards |
| `isNSFW` | NSFW flag |
| `isSpoiler` | Spoiler flag |
| `isPinned` | Pinned/stickied by mods |
| `numCrossposts` | Times crossposted to other subreddits |
| `subredditSubscribers` | Subreddit subscriber count |
| `postType` | Classification: `text`, `link`, `video`, `image`, `gallery` |
| `scrapedAt` | Scraping timestamp (ISO 8601) |
| `comments` | Array of comments (when enabled) |
Comment fields (10)
| Field | Description |
|---|---|
| `id` | Comment ID |
| `author` | Commenter Reddit username |
| `body` | Comment text (up to 2,000 chars) |
| `score` | Net upvotes |
| `created` | Timestamp (ISO 8601) |
| `depth` | Nesting level (0 = top-level, 1 = reply, 2 = reply-to-reply) |
| `isSubmitter` | Whether the commenter is the post author |
| `parentId` | Parent comment or post ID |
| `controversiality` | Controversy flag (0 or 1) |
| `replies` | Number of direct replies |
Using this data in AI agents & LLM pipelines
This scraper outputs clean JSON that drops straight into LLM context windows. Pipe the dataset into GPT-5, Claude Opus, or any embedding model for summarization, sentiment, classification, or RAG indexing. The 5,000-char selftext cap plus nested comments keep you well within token budgets for most Reddit posts.
Looking for a native Model Context Protocol (MCP) server for Claude Desktop, Cursor, ChatGPT Desktop, Codex, or the OpenAI Agents SDK? Use our dedicated MCP actor: makework36/reddit-mcp-server — seven Reddit tools exposed over MCP Streamable HTTP, no glue code required.
Use cases
Reddit data powers some of the most valuable public datasets for AI, research, and market intelligence. Here's how teams use this Reddit scraper:
1. AI & LLM training data
Reddit posts and comments are a gold mine for training conversational AI, instruction-tuning LLMs, and building RAG systems. The scraper outputs clean JSON that drops straight into your embedding pipeline. Use the searchQuery input to filter for domain-specific Reddit data (e.g. medical, legal, finance).
2. Sentiment analysis & brand monitoring
Extract Reddit posts and comments mentioning your brand, product, or competitors. Feed them into VADER, RoBERTa, or an LLM and track sentiment trends over time. The scraper's comment threading means you capture full discussion context, not isolated quotes.
3. Lead generation
Search Reddit for people asking for solutions your product solves. The scraper supports filters like searchQuery: "best CRM for small business" or sort: top, timeFilter: month to find high-intent prospects. Combine with the author field to build contact lists.
4. Market research
Monitor entire subreddits (e.g. r/smallbusiness, r/saas, r/entrepreneur) for trending topics, pain points, and recurring questions. The scraper returns subreddit subscriber counts, post engagement, and time-based filters so you can segment by reach and recency.
5. Academic research
Researchers use Reddit data for computational social science, public health monitoring, and linguistic analysis. This scraper gives you reproducible, timestamped Reddit data without the friction of the Reddit API's OAuth flow.
6. Content discovery & trend spotting
Journalists, newsletter writers, and content marketers scrape Reddit to surface emerging topics before they hit mainstream media. Sort by rising or top/day to catch conversations at the right moment.
7. Competitor intelligence
Scrape Reddit discussions about competitor products to extract feature requests, complaints, and comparison threads. The flair, numComments, and upvoteRatio fields help you prioritize signal over noise.
How to use the Reddit scraper
Example 1 — Scrape hot posts from multiple subreddits
```json
{
  "subreddits": ["technology", "programming", "webdev"],
  "maxPosts": 50,
  "sort": "hot"
}
```
maxPosts applies per subreddit — this returns up to 150 Reddit posts total across the 3 subreddits.
Example 2 — Search across all of Reddit
```json
{
  "searchQuery": "best CRM for small business",
  "maxPosts": 100,
  "sort": "top",
  "timeFilter": "month"
}
```
This searches Reddit globally for posts matching your query, sorted by top scores from the past month.
Example 3 — Get Reddit posts with nested comments
```json
{
  "subreddits": ["AskReddit"],
  "maxPosts": 25,
  "sort": "top",
  "timeFilter": "week",
  "includeComments": true,
  "maxCommentsPerPost": 20
}
```
This returns 25 top posts from r/AskReddit this week, each with up to 20 nested comments (including replies up to depth 3).
Example 4 — Scrape Reddit from Python
```python
from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

run_input = {
    "subreddits": ["MachineLearning", "LocalLLaMA"],
    "maxPosts": 200,
    "sort": "top",
    "timeFilter": "week",
    "includeComments": True,
    "maxCommentsPerPost": 25,
}

run = client.actor("makework36/reddit-scraper").call(run_input=run_input)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], "·", item["score"], "upvotes")
```
Example 5 — Scrape Reddit from Node.js
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: '<YOUR_APIFY_TOKEN>' });

const run = await client.actor('makework36/reddit-scraper').call({
    searchQuery: 'apify reddit scraper',
    maxPosts: 50,
    sort: 'relevance',
    timeFilter: 'all',
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Scraped ${items.length} Reddit posts`);
```
Example 6 — Trigger a Reddit scraping run from cURL
```bash
curl -X POST "https://api.apify.com/v2/acts/makework36~reddit-scraper/run-sync-get-dataset-items?token=<YOUR_APIFY_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"subreddits": ["news"], "maxPosts": 10, "sort": "new"}'
```
Input parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `subreddits` | array | `[]` | List of subreddit names to scrape (no `r/` prefix) |
| `searchQuery` | string | — | Search Reddit globally for this term |
| `maxPosts` | integer | `50` | Max posts per subreddit (1-500) |
| `sort` | string | `hot` | Sort order: `hot`, `new`, `top`, `rising`, `relevance` |
| `timeFilter` | string | `day` | Time range for `top` / `relevance`: `hour`, `day`, `week`, `month`, `year`, `all` |
| `includeComments` | boolean | `false` | Fetch comments for each Reddit post |
| `maxCommentsPerPost` | integer | `10` | Comments per post (1-100), includes nested replies up to depth 3 |
Output example
Each item in the dataset is a single Reddit post as a JSON object. When includeComments: true, each post includes a comments array.
```json
{
  "id": "t3_1s6gkmj",
  "subreddit": "pics",
  "title": "About 100,000 attended the No Kings protest in St. Paul, Minnesota",
  "author": "katotooo",
  "score": 24915,
  "upvoteRatio": 0.97,
  "numComments": 235,
  "url": "https://www.reddit.com/r/pics/comments/1s6gkmj/about_100000_attended_the_no_kings_protest/",
  "selftext": null,
  "thumbnail": "https://preview.redd.it/oy94fh2hnvrg1.jpg?width=140&height=93",
  "imageUrls": [
    "https://preview.redd.it/oy94fh2hnvrg1.jpg?width=3024&format=pjpg",
    "https://preview.redd.it/9pmbe5aqnvrg1.jpg?width=4032&format=pjpg"
  ],
  "media": null,
  "created": "2026-03-29T00:21:20.000Z",
  "edited": false,
  "isVideo": false,
  "isSelf": false,
  "isGallery": true,
  "domain": "old.reddit.com",
  "linkUrl": "https://www.reddit.com/gallery/1s6gkmj",
  "flair": "Politics",
  "awards": 0,
  "isNSFW": false,
  "isSpoiler": false,
  "isPinned": false,
  "numCrossposts": 2,
  "subredditSubscribers": 33336092,
  "postType": "gallery",
  "scrapedAt": "2026-03-29T08:05:31.904Z",
  "comments": [
    {
      "id": "od1rtqc",
      "author": "YJSubs",
      "body": "When I see Bernie, I thought, did you just identify him as bald eagle?",
      "score": 1,
      "created": "2026-03-29T00:21:22.000Z",
      "depth": 0,
      "isSubmitter": false,
      "parentId": "t3_1s6gkmj",
      "controversiality": 0,
      "replies": 1
    },
    {
      "id": "od1s2fp",
      "author": "rclonecopymove",
      "body": "Same, he's bald but not very aquiline.",
      "score": 1,
      "created": "2026-03-29T00:22:45.000Z",
      "depth": 1,
      "isSubmitter": false,
      "parentId": "t1_od1rtqc",
      "controversiality": 0,
      "replies": 0
    }
  ]
}
```
Export formats
The Reddit scraper writes every post to the run's dataset. You can download the data in several formats:
- JSON — structured Reddit data, one object per post, ideal for AI / LLM pipelines.
- CSV — flat Reddit data, great for Excel, Google Sheets, and BI tools.
- Excel (XLSX) — ready-to-open spreadsheet with one row per Reddit post.
- JSONL — newline-delimited JSON for streaming pipelines.
- RSS / XML — alternative formats supported by the Apify Dataset API.
- Direct API access — `GET https://api.apify.com/v2/datasets/{datasetId}/items?format=json`
Nested comment trees are preserved in JSON export. In CSV / Excel export, comments are flattened to a JSON-encoded string in the comments column.
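If you consume the CSV export programmatically, that JSON-encoded `comments` cell decodes back into structured data. A minimal sketch (the sample row below is invented for illustration; real exports carry all 29 columns):

```python
import csv
import io
import json

# Sample CSV in the export's shape: nested comments are serialized as a
# JSON string inside the "comments" column (sample data is illustrative).
raw_csv = '''id,title,comments
t3_abc123,Example post,"[{""id"": ""c1"", ""body"": ""First!"", ""depth"": 0}]"
'''

def load_posts_with_comments(csv_text):
    """Read the CSV export and decode the JSON-encoded comments column."""
    posts = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["comments"] = json.loads(row["comments"]) if row.get("comments") else []
        posts.append(row)
    return posts

posts = load_posts_with_comments(raw_csv)
print(posts[0]["comments"][0]["body"])  # → First!
```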
Performance & pricing
| Scenario | Cost | Typical runtime |
|---|---|---|
| 1,000 Reddit posts, no comments | ~$0.016 | ~5-10 min |
| 1,000 Reddit posts + 10 comments each | ~$0.05 | ~12-18 min |
| 3 subreddits × 50 posts | ~$0.003 | ~40 sec |
| Full subreddit scrape (500 posts + 50 comments) | ~$0.08 | ~15-25 min |
Pricing follows the Apify Compute Unit model — you pay only for the compute used. No Reddit API subscription, no proxy add-ons, no hidden fees.
Tips to reduce cost:
- Set `includeComments: false` if you only need post metadata.
- Use a tight `timeFilter` (`day` or `week`) instead of `all` to avoid paginating deep archives.
- Scrape specific subreddits instead of broad search queries when possible.
How it works (technical details)
- Request routing — each subreddit or search query becomes a seeded request with its own session cookie and proxy IP.
- Public JSON endpoints — the scraper hits Reddit's public `.json` endpoints (e.g. `/r/{sub}/hot.json`), which return the same data the Reddit web UI consumes. No Reddit API key required.
- Proxy rotation — datacenter IPs are blocked by Reddit; the scraper rotates residential proxies and retries up to 12 times per request.
- Session persistence — working proxy + session cookie pairs are reused across subreddits to reduce block rates.
- Comment tree reconstruction — comment replies are fetched recursively and linked via `parentId` so you can rebuild the full thread.
- Schema normalization — Reddit's raw JSON is mapped to 29 clean fields with consistent types and ISO 8601 timestamps.
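Because replies are linked via `parentId`, the flat comments array can be renested client-side. A sketch assuming the ID prefixes shown in the output example (`t3_` for posts, `t1_` for comments); the sample data is illustrative:

```python
def build_thread(post_id, comments):
    """Nest a flat list of comments into a tree using parentId.

    Top-level comments point at the post (t3_...); replies point at
    their parent comment (t1_...).
    """
    by_id = {c["id"]: {**c, "children": []} for c in comments}
    roots = []
    for c in by_id.values():
        parent = c["parentId"]
        if parent == post_id:
            roots.append(c)
        elif parent.removeprefix("t1_") in by_id:
            by_id[parent.removeprefix("t1_")]["children"].append(c)
    return roots

# Illustrative data matching the output example's field shapes.
comments = [
    {"id": "od1rtqc", "parentId": "t3_1s6gkmj", "body": "top-level"},
    {"id": "od1s2fp", "parentId": "t1_od1rtqc", "body": "reply"},
]
thread = build_thread("t3_1s6gkmj", comments)
print(thread[0]["children"][0]["body"])  # → reply
```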
Best practices
- Respect Reddit's terms of service. This scraper accesses public data only. Do not use it to harvest private messages or bypass subreddit bans.
- Throttle your concurrency on sensitive subreddits — the default is tuned for reliability, not maximum speed.
- Cache results instead of re-scraping the same Reddit posts. Apify datasets are persistent.
- Deduplicate by `id` when merging multiple runs — Reddit post IDs are globally unique.
- Rebuild comment threads using `parentId` if you need the full conversation flow.
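Deduplication by `id` when merging runs can be a few lines; an illustrative sketch, not part of the actor itself:

```python
def dedupe_posts(*runs):
    """Merge multiple run datasets, keeping the first occurrence of each post ID."""
    seen = set()
    merged = []
    for run in runs:
        for post in run:
            if post["id"] not in seen:
                seen.add(post["id"])
                merged.append(post)
    return merged

# Two overlapping runs (sample data is illustrative).
run_a = [{"id": "t3_a", "score": 10}, {"id": "t3_b", "score": 5}]
run_b = [{"id": "t3_b", "score": 7}, {"id": "t3_c", "score": 1}]
print([p["id"] for p in dedupe_posts(run_a, run_b)])  # → ['t3_a', 't3_b', 't3_c']
```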
Troubleshooting
Q: Some Reddit requests return 403.
Reddit aggressively blocks datacenter IPs. The scraper rotates proxies and retries automatically — 3-12 attempts per request is normal. If every request fails for a single subreddit, the subreddit may be private, banned, or quarantined.
Q: Comments are missing.
Ensure includeComments: true and that maxCommentsPerPost is greater than 0. Some posts have 0 comments or locked comment sections.
Q: maxPosts returns fewer results than requested.
maxPosts is a cap, not a guarantee. Low-traffic subreddits or narrow timeFilter windows may not have enough posts.
Q: Can I scrape user profiles?
Yes — pass a subreddit name formatted as u_username (e.g. u_spez) to scrape posts from a user's profile page.
Q: Can I scrape Reddit data for AI training?
Yes, that's one of the primary use cases. The JSON output is designed for direct ingestion into LLM pipelines.
Q: How do I export Reddit data to CSV?
After the run finishes, open the dataset tab and click "Export → CSV". Or hit the Dataset API: GET /v2/datasets/{id}/items?format=csv.
FAQ
Does this Reddit scraper need a Reddit API key? No. It extracts Reddit data from Reddit's public JSON endpoints, so no Reddit API key or OAuth is required.
How is this different from the official Reddit API? The Reddit API has strict rate limits (60 req/min), requires OAuth, and caps listings at 10 posts per page. This scraper has no such limits, supports search and comment extraction out of the box, and returns richer data.
What does maxPosts mean? It's per subreddit, not global. 3 subreddits × 50 maxPosts = up to 150 Reddit posts.
How deep do comments go?
Up to 3 levels: top-level (depth 0), replies (depth 1), replies-to-replies (depth 2). Each comment has a parentId so you can rebuild the thread.
Can I scrape NSFW subreddits?
Yes, but results may include adult content. Use the isNSFW field to filter.
Is this legal? Scraping public Reddit data for research, journalism, and business intelligence is generally allowed under fair-use principles. Consult your legal team for your specific use case and review Reddit's User Agreement.
Can I schedule recurring Reddit scrapes? Yes. Use Apify's Scheduler to run this scraper hourly, daily, or weekly. Pair it with the Apify Webhooks to push new Reddit data to your own database.
Comparison — this Reddit scraper vs alternatives
Choosing the right Reddit scraper depends on what Reddit data you need, how often you need it, and whether you're willing to deal with the Reddit API's OAuth flow and rate limits. Here's an honest comparison:
| Feature | This Reddit Scraper | Official Reddit API | Reddit PRAW library | Generic web scrapers |
|---|---|---|---|---|
| Reddit API key required | ❌ No | ✅ Yes (OAuth) | ✅ Yes (OAuth) | ❌ No |
| Rate limit | None | 60 req/min | 60 req/min | Varies |
| Comment threads | ✅ Nested up to depth 3 | ✅ Full tree | ✅ Full tree | ❌ Usually not |
| Search across Reddit | ✅ Yes | ✅ Yes | ✅ Yes | ❌ Manual |
| Subreddit scraping | ✅ Multi-subreddit, parallel | ✅ One at a time | ✅ One at a time | ❌ Manual |
| Export to JSON / CSV / Excel | ✅ All three | ❌ JSON only | ❌ Python objects | Varies |
| Maintenance burden | Apify handles it | You handle OAuth + retries | You handle OAuth + retries | You handle everything |
| Cost for 10K Reddit posts | ~$0.50 | Free but slow | Free but slow | Free + your dev time |
| Setup time | <1 minute | 30-60 minutes | 15 minutes | Hours or days |
When to use this Reddit scraper:
- You need Reddit data fast, without setting up OAuth.
- You want structured, normalized JSON without writing parsers.
- You're feeding Reddit data into an AI / LLM pipeline.
- You want to scrape multiple subreddits in parallel.
- You need CSV / Excel export for non-engineers.
When to use the official Reddit API instead:
- You're building a Reddit bot that posts, votes, or messages.
- You need real-time WebSocket events.
- You're fine with 60-requests-per-minute and OAuth setup.
Integrations
The Reddit scraper integrates with everything the Apify platform supports. Popular integrations:
n8n — Reddit to anywhere automation
Trigger the Reddit scraper on a schedule and push Reddit data to Google Sheets, Airtable, Notion, Slack, or any database. Use the HTTP Request node with:
POST https://api.apify.com/v2/acts/makework36~reddit-scraper/run-sync-get-dataset-items?token=<YOUR_TOKEN>
Make (Integromat) — Reddit to CRM / email / Slack
The Apify module in Make lets you chain Reddit scraping with HubSpot, Mailchimp, Slack, or Discord. Great for social listening workflows that turn Reddit discussions into CRM leads.
Zapier — Reddit data into 5,000+ apps
Connect the Reddit scraper to Zapier's huge catalog of integrations: Airtable, Google Sheets, Notion, Trello, ClickUp, Salesforce, and more.
LangChain — Reddit data for RAG
Feed Reddit posts and comments into LangChain's document loaders:
```python
from langchain_community.document_loaders import ApifyDatasetLoader
from langchain_core.documents import Document

loader = ApifyDatasetLoader(
    dataset_id="<RUN_DATASET_ID>",
    dataset_mapping_function=lambda d: Document(
        page_content=f"{d['title']}\n\n{d.get('selftext') or ''}",
        metadata={"subreddit": d["subreddit"], "score": d["score"], "url": d["url"]},
    ),
)
documents = loader.load()
```
LlamaIndex — Reddit knowledge base
Use LlamaIndex's Apify reader to index Reddit data into a vector store (Pinecone, Weaviate, Qdrant, Chroma) for semantic search over Reddit conversations.
OpenAI / Anthropic — Reddit-powered assistants
Pipe Reddit scrape results into GPT-4, GPT-5, Claude Opus, or Claude Sonnet for summarization, classification, or generative responses.
Apify Webhooks — push Reddit data on completion
Configure a webhook so that every time the Reddit scraper finishes a run, Apify POSTs the dataset URL to your endpoint. Ideal for event-driven pipelines.
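A receiving endpoint only needs to accept a JSON POST. Below is a minimal stdlib sketch that simulates a delivery against itself; the `resource.defaultDatasetId` field is an assumption based on Apify's default run payload, so verify it against your actual webhook configuration:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # dataset IDs seen so far

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Apify POSTs a JSON payload when the run finishes. The exact layout is
        # configurable; "resource.defaultDatasetId" is an assumption here.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        received.append(payload["resource"]["defaultDatasetId"])
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging

server = HTTPServer(("127.0.0.1", 0), WebhookHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Simulate Apify delivering a webhook to our endpoint (illustrative payload).
body = json.dumps({"resource": {"defaultDatasetId": "abc123"}}).encode()
req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}/", data=body, method="POST"
)
urllib.request.urlopen(req)
server.shutdown()
print(received)  # → ['abc123']
```

From here, the handler would typically fetch the dataset items via the Dataset API URL shown earlier.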
Tutorial 1 — Build a Reddit sentiment dashboard
A practical end-to-end example showing how to turn scraped Reddit data into a live sentiment dashboard.
Step 1 — scrape Reddit posts for your brand.
```json
{
  "searchQuery": "your-brand-name",
  "maxPosts": 500,
  "sort": "new",
  "timeFilter": "week",
  "includeComments": true,
  "maxCommentsPerPost": 25
}
```
Step 2 — score sentiment with an LLM.
```python
from apify_client import ApifyClient
from openai import OpenAI

client = ApifyClient("<YOUR_APIFY_TOKEN>")
oai = OpenAI()

dataset = client.dataset("<DATASET_ID>")
for post in dataset.iterate_items():
    text = f"{post['title']}\n\n{post.get('selftext') or ''}"
    r = oai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": 'Rate sentiment from -1 (negative) to 1 (positive). Return JSON: {"score": number}'},
            {"role": "user", "content": text},
        ],
    )
    print(post["subreddit"], post["score"], r.choices[0].message.content)
```
Step 3 — push to your dashboard.
Write results to PostgreSQL, Snowflake, or BigQuery. Visualize with Grafana, Metabase, or Superset. The scraper's scrapedAt field gives you a clean time axis.
Tutorial 2 — Reddit lead generation pipeline
Turn Reddit into a qualified leads engine by scraping subreddits where your buyers hang out.
Step 1 — identify target subreddits.
For a SaaS CRM: r/sales, r/smallbusiness, r/saas, r/Entrepreneur, r/crm.
Step 2 — scrape Reddit posts with intent signals.
```json
{
  "subreddits": ["sales", "smallbusiness", "saas", "Entrepreneur", "crm"],
  "searchQuery": "recommendation OR looking for OR alternative to",
  "maxPosts": 200,
  "sort": "new",
  "timeFilter": "week"
}
```
Step 3 — classify with an LLM.
Use GPT-5 or Claude Opus 4.6 to extract: pain point, product category, buyer role, urgency. Push qualified Reddit leads to HubSpot, Salesforce, or Attio.
Step 4 — personalized outreach.
The Reddit username (author field) gives you a starting point for genuine, non-spammy engagement. Always comment or DM with real value — Reddit users detect and downvote cold outreach instantly.
Tutorial 3 — Train an LLM on Reddit data
High-quality Reddit conversations are excellent training data for instruction-tuned and conversational models.
Step 1 — scrape high-signal subreddits.
```json
{
  "subreddits": ["explainlikeimfive", "AskHistorians", "AskScience", "AskReddit"],
  "maxPosts": 5000,
  "sort": "top",
  "timeFilter": "year",
  "includeComments": true,
  "maxCommentsPerPost": 50
}
```
Step 2 — filter for quality.
Keep only posts with score > 100 and top comments with score > 50. This removes low-signal Reddit data.
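Applied to the exported items, that threshold filter might look like this (field names follow the output schema above; sample data is illustrative):

```python
def filter_quality(posts, min_post_score=100, min_comment_score=50):
    """Keep posts scoring above the post threshold, and within each kept
    post, only comments scoring above the comment threshold."""
    kept = []
    for post in posts:
        if post["score"] <= min_post_score:
            continue
        kept.append({**post, "comments": [
            c for c in post.get("comments", []) if c["score"] > min_comment_score
        ]})
    return kept

posts = [
    {"id": "t3_hi", "score": 500, "comments": [{"score": 200}, {"score": 3}]},
    {"id": "t3_lo", "score": 12, "comments": []},
]
good = filter_quality(posts)
print(len(good), len(good[0]["comments"]))  # → 1 1
```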
Step 3 — format as instruction pairs.
```json
[
  {"instruction": "<post title>", "input": "<selftext>", "output": "<top comment body>"},
  ...
]
```
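Mapping a scraped post and its best comment into that record shape takes only a few lines. A sketch that simply picks the highest-scoring comment (the sample post is invented):

```python
import json

def to_instruction_pair(post):
    """Map a post plus its highest-scoring comment to an instruction-tuning record."""
    top = max(post["comments"], key=lambda c: c["score"])
    return {
        "instruction": post["title"],
        "input": post.get("selftext") or "",
        "output": top["body"],
    }

# Illustrative post following the documented output schema.
post = {
    "title": "ELI5: why is the sky blue?",
    "selftext": "",
    "comments": [
        {"body": "Rayleigh scattering...", "score": 900},
        {"body": "magnets", "score": 4},
    ],
}
line = json.dumps(to_instruction_pair(post))  # one JSONL line per post
print(line)
```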
Step 4 — fine-tune.
Use LoRA / QLoRA on an open-weight base model, or ship the JSONL to OpenAI / Anthropic for supervised fine-tuning. Reddit data makes models sound more natural and conversational.
Sample Reddit data output
Below are three real examples of Reddit data returned by this scraper (anonymized).
Sample 1 — tech discussion post
```json
{
  "id": "t3_abc123",
  "subreddit": "programming",
  "title": "Why I switched from Postgres to SQLite for my side project",
  "author": "indie_dev_42",
  "score": 3284,
  "upvoteRatio": 0.94,
  "numComments": 412,
  "flair": "Discussion",
  "postType": "text",
  "subredditSubscribers": 5100000
}
```
Sample 2 — image gallery post
```json
{
  "id": "t3_def456",
  "subreddit": "EarthPorn",
  "title": "Sunrise over Torres del Paine, Patagonia",
  "author": "nature_photog",
  "score": 18920,
  "upvoteRatio": 0.99,
  "numComments": 78,
  "postType": "gallery",
  "imageUrls": ["https://preview.redd.it/....jpg"]
}
```
Sample 3 — video post with comments
```json
{
  "id": "t3_ghi789",
  "subreddit": "nextfuckinglevel",
  "title": "Robot parkour demo from Boston Dynamics",
  "score": 42018,
  "upvoteRatio": 0.96,
  "media": {"videoUrl": "https://v.redd.it/....mp4", "duration": 47},
  "postType": "video",
  "comments": [{"author": "user1", "body": "...", "score": 2100, "depth": 0}]
}
```
Benchmarks
Measured on Apify's default Compute Unit, November 2025.
| Workload | Posts | Comments | Total runtime | Cost | Throughput |
|---|---|---|---|---|---|
| Hot feed, r/news | 50 | 0 | 18s | $0.002 | ~170 posts/min |
| Top of the week, r/technology | 100 | 0 | 32s | $0.004 | ~190 posts/min |
| 3 subreddits + 10 comments each | 150 | 1,500 | 1m 12s | $0.008 | ~125 posts/min |
| Search query, r/all | 200 | 0 | 55s | $0.007 | ~220 posts/min |
| Full archive scrape | 500 | 25,000 | 14m 40s | $0.068 | ~34 posts/min |
Cost optimization — advanced
The Reddit scraper is already among the cheapest Reddit data sources per post, but you can push it even lower:
- Turn off comments when you don't need them. `includeComments: false` cuts cost by ~3x.
- Scope `timeFilter` tightly. Scraping `top`/`all` pulls deep history; `top`/`week` is often enough.
- Prefer `subreddits` over `searchQuery` when possible. Search pages are heavier than subreddit listings.
- Schedule hourly instead of real-time. Reddit's `new` feed doesn't change fast enough to justify <15-min polling for most use cases.
- Dedupe across runs. Store Reddit post IDs in Redis / DynamoDB and skip posts you've already scraped.
- Cap `maxCommentsPerPost` smartly. 10-20 comments per post captures most of the signal; 100 is usually overkill.
Reddit scraping glossary
- Subreddit — a community on Reddit, prefixed with `r/` (e.g. `r/programming`).
- Upvote / downvote — Reddit's voting system. The net of the two is the `score`.
- Upvote ratio — percentage of votes that are upvotes. 0.95 means 95% upvotes.
- Flair — tag attached to a Reddit post or user (e.g. `Discussion`, `News`, `OC`).
- Karma — cumulative Reddit user reputation based on post and comment scores.
- Crosspost — a Reddit post shared to another subreddit. Tracked via `numCrossposts`.
- AMA — Ask Me Anything, a common Reddit format on `r/IAmA` and other subreddits.
- ELI5 — Explain Like I'm 5, simplified explanations subreddit.
- Selftext — the text body of a self-post on Reddit.
- Stickied — a Reddit post pinned to the top of a subreddit by moderators.
- NSFW — Not Safe For Work flag on Reddit posts.
- Thread — a Reddit post and its comment tree as a whole.
- OP — Original Poster, the author of the Reddit post. Tracked via `isSubmitter` on comments.
Security & privacy
- This Reddit scraper extracts public Reddit data only. It does not access private messages, modmail, or private subreddits.
- Reddit usernames returned in the `author` field are the same public handles visible on reddit.com.
- No Reddit API credentials, OAuth tokens, or personal data are stored by Apify during scraping runs.
- If you process Reddit data containing personal information (e.g. GDPR-subject data), ensure your pipeline complies with your local privacy regulations.
- Respect Reddit's User Agreement and Content Policy.
Related resources
- Reddit public JSON API documentation — official Reddit API reference.
- Apify Dataset API — programmatic access to scraped Reddit data.
- Crawlee — the open-source scraping framework that powers this actor.
- Apify Client for Python — call the Reddit scraper from Python.
- Apify Client for JavaScript — call the Reddit scraper from Node.js.
- Actor source code — MIT-licensed, PRs welcome.
Advanced Reddit search patterns
The searchQuery parameter uses Reddit's native search syntax. Some Reddit search operators that work:
Exact phrase search on Reddit
```json
{ "searchQuery": "\"Reddit scraper\"", "sort": "relevance" }
```
Double quotes make the Reddit search match the exact phrase instead of individual terms.
Subreddit-scoped search
```json
{ "searchQuery": "subreddit:saas pricing", "sort": "relevance" }
```
Limits the Reddit search to a single subreddit. Great for narrow market research.
Author-scoped search
```json
{ "searchQuery": "author:spez announcement" }
```
Returns Reddit posts by a specific user. Combine with timeFilter: all to see the full history.
Time-bounded Reddit search
```json
{ "searchQuery": "apify", "sort": "top", "timeFilter": "year" }
```
Top Reddit posts mentioning your keyword from the last year — the sweet spot for most brand-monitoring jobs.
Multi-term Boolean Reddit search
```json
{ "searchQuery": "(apify OR \"web scraping\") AND NOT pricing" }
```
Advanced Boolean search is supported — useful for filtering noise when scraping competitor Reddit discussions.
Reddit users & moderator data
Beyond posts and comments, the Reddit scraper surfaces useful metadata about Reddit users and moderators:
User data from post authors
Every scraped Reddit post includes the author field — the public Reddit username. Pair with the Apify Reddit user scraper to enrich with karma, join date, and trophies if you need deeper user profiles.
Identifying active users in a subreddit
Group your scraped Reddit data by author and count posts + comments. The most active users tend to be moderators, power users, or subreddit veterans.
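A first pass at that grouping, using only the documented `author` fields (sample data is illustrative):

```python
from collections import Counter

def top_contributors(posts, n=3):
    """Count posts and comments per author across a scraped dataset."""
    activity = Counter()
    for post in posts:
        activity[post["author"]] += 1
        for comment in post.get("comments", []):
            activity[comment["author"]] += 1
    return activity.most_common(n)

posts = [
    {"author": "alice", "comments": [{"author": "bob"}, {"author": "alice"}]},
    {"author": "bob", "comments": [{"author": "alice"}]},
]
print(top_contributors(posts))  # → [('alice', 3), ('bob', 2)]
```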
Comment authors vs post authors
The isSubmitter field on comments tells you whether the commenter is the same user who posted the parent Reddit thread — handy for separating OP responses from community replies.
CSV export — handling Reddit data in Excel and BI tools
Although the Reddit scraper is JSON-first, CSV export is a first-class feature.
CSV column layout
Every Reddit post becomes a CSV row. The 29 post fields map 1:1 to CSV columns. Nested fields like comments and imageUrls are serialized as JSON strings inside a single CSV cell.
CSV import to Excel
Download the CSV from the Apify dataset tab, open in Excel, and use Data → Text to Columns if needed to handle locale-specific comma/semicolon separators. The scrapedAt timestamp is ISO 8601 — Excel will parse it automatically.
CSV import to Google Sheets
Use IMPORTDATA() with an Apify dataset CSV URL (you can make a dataset public) or paste the CSV. Google Sheets also handles the nested JSON columns cleanly.
CSV for BI tools (Metabase, Superset, Tableau)
Load the Reddit scraper CSV as a flat table. Most BI tools expose the created timestamp, subreddit name, score, and numComments as dimensions and measures out of the box.
Changelog
- 1.0 — Initial public release. 29 post fields, 10 comment fields, search and subreddit modes, nested comment threads, AI-ready JSON output, JSON/CSV/Excel export.
Feedback
Found a bug, missing Reddit field, or edge case? Open an issue or leave a review. 5-star reviews help other Reddit scraper users discover this actor.
Who uses this Reddit scraper?
Teams that extract Reddit data at scale use this scraper for:
- AI startups — extract Reddit conversations to train and search LLMs.
- Market research agencies — extract Reddit posts mentioning client brands; search for emerging trends.
- Growth marketers — extract qualified Reddit users as leads; search Reddit for buyer-intent discussions.
- Data scientists — extract Reddit data for social science, linguistics, and public health research.
- Newsrooms — extract Reddit stories early by searching `rising` and `new` feeds.
- SaaS product teams — extract feedback from Reddit users to prioritize features and spot churn signals.
If you extract Reddit data for one of these use cases, try the `searchQuery` examples in this README as a starting point — and let us know what you built.
