👾 Reddit Data Extractor

Scrape Reddit data to train AI models or build NLP datasets. Extract posts, comments, and user details via public API endpoints with no browser required.

Pricing: Pay per event
Rating: 0.0 (0 reviews)
Developer: 太郎 山田 (Maintained by Community)

Actor stats: 0 bookmarked · 4 total users · 1 monthly active user · last modified an hour ago

💬 Reddit Scraper

Scrape Reddit at scale to build high-quality datasets for AI training, machine learning, and NLP applications. This developer-focused Reddit Data Extractor skips the overhead of a headless browser, turning unstructured community conversations into clean, structured data via public JSON endpoints. Whether you need to gather millions of words for text analysis or train large language models, this tool lets you extract posts, nested comments, and thread details quickly.

Data scientists and AI engineers run this scraper to compile extensive linguistic datasets, analyze user sentiment across specific pages, and track digital subcultures. Instead of struggling with rate limits or complex authentication tools, you can seamlessly integrate this scraper into your existing data pipelines. Schedule it to run nightly to capture the newest discussions, or use search filters to scrape historical top posts for comprehensive analysis.

The extracted results include rich metadata essential for advanced processing. Every run yields precise details such as created_utc, score, author, selftext, and full URL links. Once your scraped data is ready, you can export the results via API to seamlessly feed your vector databases or analytical models.

Store Quickstart

Start with the Quickstart template (1 subreddit, hot, 25 posts). For sentiment analysis, use Deep Scrape with comments enabled.

Key Features

  • 💬 Public Reddit JSON endpoints — Uses old.reddit.com/r/{sub}/{sort}.json
  • 🔀 Multiple sort modes — hot, new, top, rising with time filters
  • 💭 Comments included — Optional nested comment extraction
  • 📊 Post metadata — Score, author, subreddit, created_utc, num_comments
  • 🧩 Self + link posts — Both text posts and URL submissions
  • 🔑 No API key needed — Uses public JSON endpoints

Use Cases

| Who | Why |
| --- | --- |
| Market researchers | Analyze consumer sentiment on brand/product subreddits |
| Crisis monitoring | Track negative mentions in real time |
| Content marketers | Discover trending topics and user pain points |
| Gaming/media analysts | Monitor fan community reactions |
| Academic researchers | Collect Reddit datasets for NLP research |

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| subreddits | string[] | (required) | Subreddit names (max 20) |
| sort | string | hot | hot, new, top, rising |
| maxItems | integer | 25 | Max posts per subreddit (1-500) |
| includeComments | boolean | false | Include nested comments |

Input Example

{
  "subreddits": ["programming", "technology"],
  "sort": "hot",
  "maxItems": 50,
  "includeComments": true
}
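A hypothetical client-side check that mirrors the constraints from the input table (max 20 subreddits, maxItems 1-500, four sort modes); the actor performs its own validation server-side:

```python
# Hypothetical pre-flight validation mirroring the documented input schema.
VALID_SORTS = {"hot", "new", "top", "rising"}

def validate_input(run_input: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    subs = run_input.get("subreddits") or []
    if not subs:
        problems.append("subreddits is required")
    if len(subs) > 20:
        problems.append("at most 20 subreddits allowed")
    if run_input.get("sort", "hot") not in VALID_SORTS:
        problems.append("sort must be one of hot, new, top, rising")
    if not 1 <= run_input.get("maxItems", 25) <= 500:
        problems.append("maxItems must be between 1 and 500")
    return problems

print(validate_input({"subreddits": ["programming"], "maxItems": 50}))  # []
```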

Output

| Field | Type | Description |
| --- | --- | --- |
| id | string | Reddit post ID |
| title | string | Post title |
| author | string | Username of poster |
| subreddit | string | Subreddit name |
| url | string | Permalink to post |
| score | integer | Upvote score |
| numComments | integer | Comment count |
| createdUtc | integer | Unix epoch timestamp (UTC) |
| selftext | string | Post body (for text posts) |
| comments | object[] | Top comments (if includeComments enabled) |

Output Example

{
  "title": "New JavaScript framework released",
  "author": "dev_user",
  "score": 1250,
  "url": "https://example.com/framework",
  "selftext": "Detailed writeup inside...",
  "subreddit": "programming",
  "createdUtc": 1712345678,
  "numComments": 342,
  "comments": [{"author": "...", "body": "..."}]
}
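For NLP pipelines, each output item can be flattened into a single text record. This is a sketch assuming the fields shown in the example above; `to_training_record` is a hypothetical helper, not part of the actor's output:

```python
# Sketch: flatten one scraped item (fields as in the output example above)
# into a text record for NLP pipelines. Hypothetical helper.
from datetime import datetime, timezone

def to_training_record(item: dict) -> dict:
    """Combine title, body, and comments into one text field with metadata."""
    body = item.get("selftext", "")
    comments = " ".join(c.get("body", "") for c in item.get("comments", []))
    return {
        "text": f"{item['title']}\n{body}\n{comments}".strip(),
        "subreddit": item["subreddit"],
        # createdUtc is a Unix epoch; convert to ISO for downstream tools
        "created_iso": datetime.fromtimestamp(
            item["createdUtc"], tz=timezone.utc).isoformat(),
        "score": item["score"],
    }
```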

API Usage

Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.

cURL

curl -X POST "https://api.apify.com/v2/acts/taroyamada~reddit-data-scraper/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{ "subreddits": ["programming", "technology"], "sort": "hot", "maxItems": 50, "includeComments": true }'

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")
run = client.actor("taroyamada/reddit-data-scraper").call(run_input={
    "subreddits": ["programming", "technology"],
    "sort": "hot",
    "maxItems": 50,
    "includeComments": True,  # Python booleans are capitalized
})
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

JavaScript / Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
const run = await client.actor('taroyamada/reddit-data-scraper').call({
    subreddits: ['programming', 'technology'],
    sort: 'hot',
    maxItems: 50,
    includeComments: true,
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Tips & Limitations

⚠️ Proxy Required on Apify Datacenter

Reddit blocks the majority of Apify's shared datacenter IPs. Without a proxy:

  • Runs on Apify infrastructure will fail with runStatus: all_blocked and exit code 1.
  • The output meta.subredditResults shows which subreddits were blocked vs. successful.

To fix: In the actor's .env, set APIFY_USE_APIFY_PROXY=true and APIFY_PROXY_GROUPS=RESIDENTIAL before running npm run apify:cloud:setup, or set PROXY_URL to your own residential proxy.

Local / home ISP runs work without a proxy; the block only affects datacenter IPs.

If you bootstrap recurring cloud tasks with npm run apify:cloud:setup, set APIFY_USE_APIFY_PROXY=true, APIFY_PROXY_GROUPS=RESIDENTIAL, and APIFY_RESTART_ON_ERROR=false in .env so the cloud run uses Apify's residential proxy and does not auto-retry identical blocked runs.
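Putting the variables above together, a .env for a scheduled cloud run might look like this (the PROXY_URL value is illustrative; substitute your own proxy credentials):

```shell
# Example .env for a scheduled cloud run (variables from the tips above)
APIFY_USE_APIFY_PROXY=true
APIFY_PROXY_GROUPS=RESIDENTIAL
APIFY_RESTART_ON_ERROR=false
# Alternatively, use your own residential proxy instead of Apify's:
# PROXY_URL=http://user:pass@proxy.example.com:8000
```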

Other Tips

  • Use sort: "top" with a time filter for high-quality content discovery.
  • Set includeComments: true for sentiment analysis workflows.
  • Track subreddits in your industry to spot trends and customer pain points.

FAQ

Does Reddit block this?

Yes — Reddit blocks most Apify datacenter IPs. On Apify infrastructure, runs without a proxy will fail with runStatus: all_blocked (exit code 1) and 0 posts. Configure a residential proxy in the actor's Proxy tab or via PROXY_URL env var. Runs from a home/ISP IP work fine. For scheduled cloud runs, set APIFY_RESTART_ON_ERROR=false to avoid repeated retries after a known block.

What is runStatus in the output?

| Value | Meaning |
| --- | --- |
| ok | All subreddits fetched successfully |
| partial | Some subreddits succeeded; others were blocked or errored |
| all_blocked | Every subreddit was blocked; no posts collected (exit code 1) |
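A post-run check might branch on these values like the sketch below. It assumes meta.subredditResults maps each subreddit name to a status string (the exact shape is not documented here, so treat it as an illustration):

```python
# Hypothetical post-run check based on the runStatus values above.
# Assumes subredditResults maps subreddit name -> status string.
def summarize_run(meta: dict) -> str:
    """Return a one-line summary of a run's outcome."""
    status = meta.get("runStatus")
    if status == "ok":
        return "all subreddits fetched"
    if status == "partial":
        blocked = [sub for sub, result in meta.get("subredditResults", {}).items()
                   if result != "ok"]
        return f"partial: blocked/errored -> {', '.join(blocked)}"
    return "all blocked: configure a residential proxy and re-run"
```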

Can I scrape private subreddits?

No. Only public subreddits accessible to unauthenticated users.

How many comments per post?

Top-level comments only, limited to ~200 per post (Reddit API default).

What's the difference from Apify's reddit-scraper-lite?

No DOM dependency, cleaner output schema, proxy fallback built-in, and honest degraded-path reporting when requests are blocked.


Cost

Pay Per Event:

  • actor-start: $0.01 (flat fee per run)
  • dataset-item: $0.003 per output item

Example: 1,000 items = $0.01 + (1,000 × $0.003) = $3.01
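The arithmetic above can be wrapped in a tiny estimator (a convenience sketch using the two event prices listed, not an official billing tool):

```python
# The pay-per-event arithmetic above, as a tiny estimator.
ACTOR_START_USD = 0.01   # flat fee per run
PER_ITEM_USD = 0.003     # per output dataset item

def estimate_cost(items: int, runs: int = 1) -> float:
    """Estimated charge in USD for `runs` runs yielding `items` total items."""
    return round(runs * ACTOR_START_USD + items * PER_ITEM_USD, 2)

print(estimate_cost(1000))  # 3.01
```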

No subscription required — you only pay for what you use.

⭐ Was this helpful?

If this actor saved you time, please leave a ★ rating on Apify Store. It takes 10 seconds, helps other developers discover it, and supports continued updates.

Bug report or feature request? Open an issue on the Issues tab of this actor.