Reddit Scraper

Batch-process hundreds of Reddit searches in one run. Each result includes your profileUrl for easy routing. Fast Mode saves 45% on costs. Perfect for social listening, brand monitoring, and research teams tracking multiple topics simultaneously.

Pricing: $30.00/month + usage
Rating: 0.0 (0)
Developer: Yevhenii Molodtsov (Maintained by Community)
Actor stats: 1 bookmarked · 3 total users · 2 monthly active users · last modified 5 days ago

Reddit Combined Scraper

Batch-process hundreds of Reddit searches in a single run. Built for teams that need to monitor multiple topics, brands, or people across Reddit without running separate scrapers.

Why This Scraper?

Most Reddit scrapers handle one search at a time. This one processes multiple queries in parallel and includes your routing identifier (profileUrl) in every result—so you can easily match posts back to the correct profile, topic, or campaign in your downstream systems.

Perfect for:

  • Social listening platforms tracking multiple brands/celebrities
  • Research teams monitoring various topics simultaneously
  • Marketing agencies handling multiple client campaigns
  • Anyone who needs to batch Reddit searches efficiently

Key Features

Feature | Benefit
Bulk Query Processing | Run 100+ searches in one actor run instead of 100 separate runs
profileUrl Mirroring | Your identifier echoed in every result for easy routing
Fast Mode (45% Cheaper) | DOM extraction vs HTTP fetching - same data, lower cost
Crash Recovery | Posts saved incrementally, state checkpointed between queries
Migration Resilient | Resumes from checkpoint if Apify migrates the actor

How profileUrl Mirroring Works

When you search for "Kim Kardashian" with profileUrl: "https://myapp.com/profiles/kim", every returned post includes that URL:

{
  "profileUrl": "https://myapp.com/profiles/kim",
  "title": "Kim Kardashian's new business venture...",
  "url": "https://reddit.com/r/entertainment/...",
  ...
}

This eliminates the need for post-processing joins—your pipeline immediately knows which profile each post belongs to.
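
Downstream, routing then becomes a simple group-by on profileUrl. A minimal TypeScript sketch (the item shape is trimmed to three fields from the output example further down; adapt it to your pipeline):

// Illustrative only: group scraped posts by the profileUrl supplied in the input.
interface PostItem {
  profileUrl: string;
  title: string;
  url: string;
}

function groupByProfile(items: PostItem[]): Map<string, PostItem[]> {
  const byProfile = new Map<string, PostItem[]>();
  for (const item of items) {
    const bucket = byProfile.get(item.profileUrl) ?? [];
    bucket.push(item);
    byProfile.set(item.profileUrl, bucket);
  }
  return byProfile;
}

// Example: all posts matched by the "Kim Kardashian" query land under that profileUrl.
// groupByProfile(items).get("https://myapp.com/profiles/kim")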

Quick Start

{
  "queries": [
    { "profileUrl": "https://myapp.com/kim", "searchQuery": "Kim Kardashian" },
    { "profileUrl": "https://myapp.com/taylor", "searchQuery": "Taylor Swift" }
  ],
  "searchTime": "week",
  "maxItemsPerQuery": 50,
  "fastMode": true
}
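
If you start runs from your own code instead of the Apify Console, the same input object works with the official apify-client package. A minimal sketch; 'ACTOR_ID' is a placeholder for this actor's ID from the Console:

// Run the actor and read its dataset via apify-client (ACTOR_ID is a placeholder).
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

const run = await client.actor('ACTOR_ID').call({
  queries: [
    { profileUrl: 'https://myapp.com/kim', searchQuery: 'Kim Kardashian' },
    { profileUrl: 'https://myapp.com/taylor', searchQuery: 'Taylor Swift' },
  ],
  searchTime: 'week',
  maxItemsPerQuery: 50,
  fastMode: true,
});

// Each scraped post is an item in the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Collected ${items.length} posts`);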

Input Parameters

Required

Parameter | Type | Description
queries | Array | List of {profileUrl, searchQuery} objects. Batch as many as you need.

Search Options

Parameter | Type | Default | Description
searchTime | String | "all" | Time filter: hour, day, week, month, year, all
sort | String | "new" | Sort: new, hot, relevance, top, comments
includeNSFW | Boolean | true | Include adult content
searchMode | String | "raw" | Query mode: raw, exact, and, or (see below)
searchType | String | "posts" | Content type: posts (recommended) or all

Search Mode Explained

Mode | Input | Transformed Query | Use Case
raw | Emily Miller | Emily Miller | Reddit default (implicit AND)
exact | Emily Miller | "Emily Miller" | Exact phrase, prevents typo fixes
and | Emily Miller | Emily AND Miller | Explicit AND (same as raw)
or | Taylor Swift | Taylor OR Swift | Match any word

Recommendation: Use exact mode for person names and usernames. It prevents Reddit's spell correction (e.g., a search for nitpicknate stays "nitpicknate" instead of being auto-corrected to "nitpick rate") and enables "no results" detection.
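
The rewriting itself is straightforward. The helper below is not the actor's internal code, only a TypeScript illustration of the transformations in the table above:

// Illustrative only: how a searchQuery is rewritten for each searchMode.
type SearchMode = 'raw' | 'exact' | 'and' | 'or';

function transformQuery(query: string, mode: SearchMode): string {
  const words = query.trim().split(/\s+/);
  switch (mode) {
    case 'exact': return `"${query.trim()}"`; // exact phrase, no spell correction
    case 'and':   return words.join(' AND '); // explicit AND (same result as raw)
    case 'or':    return words.join(' OR ');  // match any word
    default:      return query.trim();        // raw: Reddit's implicit AND
  }
}

// transformQuery('Emily Miller', 'exact') -> '"Emily Miller"'
// transformQuery('Taylor Swift', 'or')    -> 'Taylor OR Swift'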

Results Control

Parameter | Type | Default | Description
maxItemsPerQuery | Number | 25 | Posts per query (1-500)
maxItems | Number | - | Global cap across all queries

Performance

Parameter | Type | Default | Description
fastMode | Boolean | true | 45% cheaper DOM extraction (recommended)
maxConcurrency | Number | 3 | Parallel queries (2 for 1GB RAM, 3 for 2GB)

Output Format

Each post is pushed individually to the dataset:

{
  "profileUrl": "https://myapp.com/kim",
  "id": "t3_abc123",
  "dataType": "post",
  "title": "Kim Kardashian announces new skincare line",
  "body": "Full post content here...",
  "url": "https://www.reddit.com/r/entertainment/comments/abc123/...",
  "communityName": "r/entertainment",
  "username": "redditor123",
  "createdAt": "2025-01-27T14:30:00.000Z",
  "upVotes": 1542,
  "numberOfreplies": 234,
  "isNSFW": false
}
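
For typed pipelines, the sample maps onto a small interface. Field names are copied from the example above; types are inferred from the sample values, so verify them against real results:

// Output item shape, inferred from the sample above.
interface RedditScraperItem {
  profileUrl: string;      // the identifier you supplied in the input
  id: string;              // Reddit thing ID, e.g. "t3_abc123"
  dataType: string;        // "post" in the sample
  title: string;
  body: string;
  url: string;
  communityName: string;   // e.g. "r/entertainment"
  username: string;
  createdAt: string;       // ISO 8601 timestamp
  upVotes: number;
  numberOfreplies: number; // field name exactly as emitted
  isNSFW: boolean;
}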

Fast Mode vs Standard Mode

Aspect | Fast Mode | Standard Mode
Cost | 45% cheaper | Baseline
Speed | ~60 posts/min | ~30 posts/min
Post Body | May be truncated for very long posts | Always complete
Data Completeness | 95%+ identical | 100%
Recommended For | Most use cases | When you need full text of long posts

How Fast Mode Saves Money

Standard Mode: Browser scrolls search → Collects URLs → HTTP fetches each post individually → Full data extraction

Fast Mode: Browser scrolls search → Extracts data directly from DOM → Done

By skipping the HTTP fetch phase, Fast Mode reduces both compute time and network overhead by approximately 45%.

Real-World Cost Comparison

Based on actual test runs (January 2025):

Scenario | Mode | Time | Cost
3 queries, 25 posts each | Fast | 75s | $0.0021
3 queries, 25 posts each | Standard | 137s | $0.0038
10 queries, 50 posts each | Fast | ~4 min | ~$0.035
50 queries, 25 posts each | Fast | ~15 min | ~$0.11

Bottom line: Fast Mode delivers the same results at nearly half the cost.

Memory Recommendations

Workload | Memory | Concurrency | Notes
1-5 queries | 1024 MB | 2 | Minimum viable
5-15 queries | 2048 MB | 3 | Recommended
15-50 queries | 2048 MB | 2-3 | Lower concurrency for stability
50+ queries | 4096 MB | 3 | Large batch jobs
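
When you start runs through the API, memory is set per call. A sketch following the table above; 'ACTOR_ID' is again a placeholder, and memory is apify-client's per-call memory setting in megabytes:

// Illustrative: a mid-size batch run with the recommended 2048 MB and concurrency of 3.
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

const queries = Array.from({ length: 10 }, (_, i) => ({
  profileUrl: `https://myapp.com/profiles/${i}`, // placeholder URL scheme
  searchQuery: `example topic ${i}`,
}));

const run = await client.actor('ACTOR_ID').call(
  { queries, maxItemsPerQuery: 25, maxConcurrency: 3, fastMode: true },
  { memory: 2048 }, // matches the 5-15 query row above
);
console.log(run.status);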

Proxy Requirements

Residential proxies are required. Reddit aggressively blocks datacenter IPs.

The default configuration uses Apify's residential proxy group, which works out of the box. If you use custom proxies, ensure they're residential.
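
In practice this means the standard Apify proxy configuration with the residential group. The input field name below (proxyConfiguration) is the usual Apify convention; confirm it against this actor's input schema:

// Illustrative: residential proxy settings in the usual Apify proxyConfiguration shape.
const input = {
  queries: [{ profileUrl: 'https://myapp.com/kim', searchQuery: 'Kim Kardashian' }],
  proxyConfiguration: {
    useApifyProxy: true,
    apifyProxyGroups: ['RESIDENTIAL'],
  },
};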

Error Handling & Recovery

This scraper is built for reliability:

  • Incremental Saving: Each post is pushed to the dataset immediately after extraction. If the actor crashes mid-query, you keep everything collected so far.
  • State Checkpoints: After each query completes, state is saved. If Apify migrates the actor, it resumes from the last checkpoint (see the sketch after this list).
  • Rate Limit Handling: Built-in delays and exponential backoff prevent Reddit blocks.
  • 80% Failure Threshold: If 80%+ of requests fail, the actor stops early to save costs.
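
Together these follow the standard Apify SDK checkpoint pattern: push every item the moment it is extracted, then persist a small progress marker after each query and reload it on start. The sketch below is illustrative only, not the actor's actual source:

// Illustrative checkpoint pattern (not the actor's source) using the Apify SDK.
import { Actor } from 'apify';

// Hypothetical stand-in for the real extraction logic.
async function scrapeQuery(query: { profileUrl: string; searchQuery: string }): Promise<object[]> {
  return [];
}

await Actor.init();

const input = (await Actor.getInput<{ queries: { profileUrl: string; searchQuery: string }[] }>()) ?? { queries: [] };
const checkpoint = (await Actor.getValue<{ nextQueryIndex: number }>('CHECKPOINT')) ?? { nextQueryIndex: 0 };

for (let i = checkpoint.nextQueryIndex; i < input.queries.length; i++) {
  const posts = await scrapeQuery(input.queries[i]);
  await Actor.pushData(posts);                                   // incremental save: items survive a crash
  await Actor.setValue('CHECKPOINT', { nextQueryIndex: i + 1 }); // resume point after a migration
}

await Actor.exit();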

Limitations

  • HTTP Fetch Limit: Standard mode fetches up to ~100 posts per query in the HTTP phase. For more posts, results come from browser extraction.
  • Fast Mode Text: Very long post bodies (1000+ characters) may be truncated in Fast Mode. Switch to Standard Mode if you need complete text for long-form content.
  • Rate Limits: Reddit may throttle aggressive scraping. The scraper handles this automatically, but very large jobs may take longer.

Example Use Cases

Social Listening Platform

Monitor mentions of 50 celebrities across Reddit, routing each mention to the correct profile:

{
  "queries": [
    { "profileUrl": "https://platform.com/celeb/1", "searchQuery": "Taylor Swift" },
    { "profileUrl": "https://platform.com/celeb/2", "searchQuery": "Bad Bunny" }
    // ... 48 more
  ],
  "searchTime": "day",
  "maxItemsPerQuery": 25,
  "fastMode": true
}
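
At this scale you will usually generate the queries array rather than type it out. A hedged TypeScript sketch, assuming a profile list and URL scheme from your own system:

// Illustrative: build the 50-query input from your own profile records.
interface Profile { id: string; name: string; }

const profiles: Profile[] = [
  { id: '1', name: 'Taylor Swift' },
  { id: '2', name: 'Bad Bunny' },
  // ... 48 more
];

const input = {
  queries: profiles.map((p) => ({
    profileUrl: `https://platform.com/celeb/${p.id}`,
    searchQuery: p.name,
  })),
  searchTime: 'day',
  maxItemsPerQuery: 25,
  fastMode: true,
};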

Brand Monitoring

Track your brand and competitors:

{
  "queries": [
    { "profileUrl": "brand:ours", "searchQuery": "\"Acme Corp\" OR \"Acme Inc\"" },
    { "profileUrl": "brand:competitor1", "searchQuery": "\"Big Corp\"" },
    { "profileUrl": "brand:competitor2", "searchQuery": "\"Other Inc\"" }
  ],
  "searchTime": "week",
  "sort": "relevance",
  "maxItemsPerQuery": 100
}

Research Data Collection

Gather posts about specific topics for analysis:

{
  "queries": [
    { "profileUrl": "topic:ai", "searchQuery": "artificial intelligence" },
    { "profileUrl": "topic:ml", "searchQuery": "machine learning" },
    { "profileUrl": "topic:llm", "searchQuery": "large language models" }
  ],
  "searchTime": "month",
  "sort": "top",
  "maxItemsPerQuery": 200,
  "fastMode": false
}

Development

npm install # Install dependencies
apify run # Run locally with Apify proxies
npm test # Run tests
npm run format # Format code

Changelog

  • v1.0 - Initial release with Fast Mode, bulk queries, profileUrl mirroring, and migration persistence

License

MIT


Questions? Open an issue on the GitHub repository.