Reddit Scraper - Fast & AI-Ready Data Extraction

Extract Reddit posts, comments, and user data in markdown format perfect for AI training, market research, and sentiment analysis. No API keys needed!

What can Reddit Scraper extract?

This Reddit Scraper can extract comprehensive data from Reddit including:

  • Posts: Titles, content (text/markdown/HTML), scores, comments count, awards, timestamps
  • Comments: Nested comment threads with full hierarchy, scores, and timestamps
  • User Data: Post history, karma scores, account information
  • Subreddit Info: Community statistics, descriptions, member counts
  • Search Results: Find posts across Reddit or within specific communities
  • Images & Media: Extract image URLs, thumbnails, and media metadata
  • Engagement Metrics: Upvote ratios, comment counts, award counts
  • AI-Ready Output: Token counts and markdown formatting for LLM training

Why choose Reddit Scraper?

✅ 25% Cheaper - Only $1.50 per 1,000 results vs $2.00+ from competitors
✅ Faster - Uses Reddit's JSON API (no heavy browser needed)
✅ Bulk Comment Loading - Efficient scraping with up to 500 comments per request
✅ AI-Optimized - Markdown output with token counts for ML training
✅ No API Keys - Works without Reddit API authentication
✅ Progress Tracking - Real-time updates on scraping progress
✅ Easy to Use - Simple input configuration, no coding required

How do I use Reddit Scraper?

1. Create a free Apify account

Sign up at apify.com - you get $5 free credit (enough for 3,300+ posts!)

2. Start the Actor

Visit the Reddit Scraper page and click "Try for free"

3. Configure your scrape

Choose what to scrape:

Subreddit Posts:

```json
{
  "mode": "subreddit",
  "subreddit": "ArtificialInteligence",
  "sort": "hot",
  "maxPosts": 100
}
```

Single Post + Comments:

```json
{
  "mode": "post",
  "postUrl": "https://www.reddit.com/r/python/comments/abc123/example/",
  "maxComments": 500
}
```

User Posts:

```json
{
  "mode": "user",
  "username": "example_user",
  "maxPosts": 100
}
```

Search Reddit:

```json
{
  "mode": "search",
  "searchQuery": "machine learning",
  "searchSubreddit": "python",
  "maxPosts": 200
}
```

4. Download your data

Export in JSON, CSV, Excel, XML, or HTML format
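
Prefer code over the console? The same run-and-export flow can be scripted with Apify's Python client (pip install apify-client). A minimal sketch, where the Actor ID ("ben/reddit-scraper") and the token are placeholders you'd replace with the real values from your console:

```python
from apify_client import ApifyClient

# Authenticate with your Apify API token (Console -> Settings -> Integrations)
client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Start the Actor with the same input you'd fill in on the page
# ("ben/reddit-scraper" is a placeholder - copy the real Actor ID from the store page)
run = client.actor("ben/reddit-scraper").call(run_input={
    "mode": "subreddit",
    "subreddit": "python",
    "sort": "hot",
    "maxPosts": 100,
})

# Results are stored in the run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], item["score"])
```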

Input Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| mode | string | Scraping mode: subreddit, post, user, or search |
| subreddit | string | Subreddit name (e.g., "python") |
| postUrl | string | Full URL of post to scrape |
| username | string | Reddit username to scrape |
| searchQuery | string | Search query |
| sort | string | Sort order: hot, new, top, rising, controversial |
| timeFilter | string | Time filter: hour, day, week, month, year, all |
| maxPosts | integer | Maximum posts to scrape (0 = unlimited) |
| maxComments | integer | Maximum comments per post (0 = unlimited; applies to post mode and to subreddit mode with includeComments enabled) |
| includeComments | boolean | Include comments in subreddit mode (enables bulk comment scraping with progress tracking) |
| sinceDate | string | Only posts after this date (YYYY-MM-DD) |
| outputFormat | string | Content format: markdown, html, or text |
| includeImages | boolean | Extract image URLs |
| delaySeconds | number | Delay between requests (default: 1.0) |
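
Each mode typically needs only a couple of these fields; the rest are optional refinements. As a purely illustrative example, one input combining several optional parameters might look like this:

```json
{
  "mode": "subreddit",
  "subreddit": "python",
  "sort": "top",
  "timeFilter": "month",
  "maxPosts": 200,
  "sinceDate": "2025-01-01",
  "outputFormat": "markdown",
  "includeImages": true,
  "delaySeconds": 1.0
}
```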

Output Example

```json
{
  "id": "abc123",
  "title": "How I built an AI agent that scrapes Reddit",
  "url": "https://reddit.com/r/artificial/comments/abc123/",
  "selftext_markdown": "Here's my complete guide...",
  "author": "ai_developer",
  "subreddit": "artificial",
  "score": 1250,
  "upvote_ratio": 0.97,
  "num_comments": 89,
  "created_utc": "2025-01-15T10:30:00Z",
  "word_count": 850,
  "token_count": 1200,
  "images": [
    {
      "url": "https://i.redd.it/example.jpg",
      "width": 1200,
      "height": 800
    }
  ]
}
```
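
Because every item carries score and token_count, downstream filtering is straightforward. A small sketch, assuming you exported the dataset as JSON to a file named reddit_posts.json (the file name is just an example):

```python
import json

# Load the exported dataset (file name is illustrative)
with open("reddit_posts.json", encoding="utf-8") as f:
    posts = json.load(f)

# Keep well-received posts and estimate the total token budget for LLM work
popular = [p for p in posts if p.get("score", 0) >= 100]
total_tokens = sum(p.get("token_count", 0) for p in popular)

print(f"{len(popular)} popular posts, ~{total_tokens} tokens of markdown")
```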

Use Cases

1. AI Training Data 🤖

Reddit is a goldmine for LLM training:

  • Real human conversations and discussions
  • Expert Q&A across 100K+ communities
  • Diverse topics and writing styles
  • Already in markdown format for easy processing

Example: Train a customer service chatbot on 50K support-related Reddit posts
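
A typical preparation step is converting the scraped markdown into JSONL records for fine-tuning. A minimal sketch, assuming a JSON export named reddit_posts.json and a simple prompt/completion framing (adjust the record schema to whatever your training framework expects):

```python
import json

with open("reddit_posts.json", encoding="utf-8") as f:
    posts = json.load(f)

# One JSON object per line; the prompt/completion pairing here is illustrative
with open("training_data.jsonl", "w", encoding="utf-8") as out:
    for post in posts:
        record = {
            "prompt": post["title"],
            "completion": post.get("selftext_markdown", ""),
        }
        out.write(json.dumps(record, ensure_ascii=False) + "\n")
```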

2. Market Research 📊

Understand what people really think:

  • Track brand mentions and sentiment
  • Monitor competitor discussions
  • Identify trending topics and pain points
  • Analyze customer feedback in real-time

Example: Scrape r/SaaS to understand startup challenges and opportunities

3. Content Research ✍️

Find ideas and inspiration:

  • Discover viral content patterns
  • Identify popular discussion topics
  • Research audience questions and pain points
  • Find engaging headlines and angles

Example: Scrape top posts from r/Entrepreneur for blog content ideas

4. Sentiment Analysis 😊😡

Analyze public opinion at scale:

  • Track sentiment on products/brands
  • Monitor crisis situations
  • Understand community mood shifts
  • Identify influencers and thought leaders

Example: Analyze 10K comments about a new product launch
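
The scraper only collects the text; scoring happens downstream with whatever library you prefer. A sketch using the third-party vaderSentiment package (an assumption about tooling, not something bundled with the Actor), and assuming the exported comments expose their text under a "body" field - check your own export's field names:

```python
import json

# pip install vaderSentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

with open("reddit_comments.json", encoding="utf-8") as f:
    comments = json.load(f)

# Compound score ranges from -1 (most negative) to +1 (most positive);
# the "body" field name is assumed here
scores = [analyzer.polarity_scores(c["body"])["compound"] for c in comments if c.get("body")]
print(f"Average sentiment across {len(scores)} comments: {sum(scores) / len(scores):.2f}")
```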

5. Academic Research 🎓

Study online communities:

  • Social network analysis
  • Language and communication patterns
  • Community dynamics and moderation
  • Misinformation spread patterns

Example: Research how scientific information spreads on Reddit

6. Competitive Intelligence 🔍

Stay ahead of competitors:

  • Monitor competitor mentions
  • Track industry discussions
  • Identify emerging trends early
  • Understand customer pain points

Example: Track all mentions of competitors in your industry subreddits

How much will it cost to scrape Reddit data?

Reddit Scraper uses pay-per-result pricing - you only pay for the data you extract.

Pricing: $1.50 per 1,000 results

Cost Examples:

| Posts Scraped | Cost | What You Get |
| --- | --- | --- |
| 100 posts | $0.15 | Small subreddit sample |
| 1,000 posts | $1.50 | Medium dataset |
| 10,000 posts | $15.00 | Large research dataset |
| 100,000 posts | $150.00 | Enterprise AI training data |

Free Tier:

With Apify's free plan ($5 credit), you get:

  • ~3,300 posts FREE to try the Actor
  • Perfect for testing and small projects

ROI Calculation:

Manual Scraping:

  • Time: ~2 minutes per post manually
  • 1,000 posts = 33 hours of work
  • At $25/hour = $825 cost

Reddit Scraper:

  • Time: a couple of minutes to configure, then the run is fully automated (~100-200 posts/minute)
  • 1,000 posts = $1.50
  • Savings: $823.50 (99.8% cost reduction!)

Pro Tips

Optimize for Speed

  • Use hot or new sort - they're faster than top
  • Set reasonable maxPosts limits
  • Use includeComments: false unless you need them

Get Quality Data

  • Use markdown output format for AI training
  • Filter by timeFilter to get recent content
  • Use sinceDate for incremental scraping
  • Sort by top + week for high-quality posts
  • Enable includeComments for complete conversation data

Efficient Comment Scraping

  • Set maxComments to limit comments per post (default: 100)
  • Uses bulk loading (up to 500 comments per request)
  • Includes progress tracking showing scraped/failed posts
  • Nested comments are preserved with full hierarchy
  • Failed posts are logged but don't stop the scraping
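
For example, a subreddit run with bulk comment loading enabled might look like this (values are illustrative):

```json
{
  "mode": "subreddit",
  "subreddit": "python",
  "sort": "top",
  "timeFilter": "week",
  "maxPosts": 50,
  "includeComments": true,
  "maxComments": 200
}
```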

Avoid Rate Limits

  • Keep delaySeconds at 1.0 or higher
  • Scrape during off-peak hours (US nighttime)
  • Don't scrape the same subreddit repeatedly

Save Money

  • Set maxPosts to avoid over-scraping
  • Use search mode for targeted data
  • Scrape only what you need

Technical Details

How It Works

Reddit Scraper uses Reddit's official JSON API (not web scraping):

  1. Converts Reddit URLs to JSON API endpoints
  2. Fetches data using HTTP requests (no browser)
  3. Parses and structures data into clean models
  4. Converts HTML to markdown for AI compatibility
  5. Counts tokens for LLM training estimation
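
The Actor's own source isn't reproduced here, but the general pattern is easy to see: Reddit returns JSON for most listing URLs when ".json" is appended. A rough standalone sketch of that approach (not the Actor's actual code), including the 100-posts-per-page pagination handled via the "after" cursor:

```python
import requests

# Reddit listing endpoints return JSON when ".json" is appended to the URL
URL = "https://www.reddit.com/r/python/hot.json"
HEADERS = {"User-Agent": "reddit-scraper-demo/0.1"}  # a descriptive User-Agent avoids throttling

def fetch_posts(max_posts=200):
    posts, after = [], None
    while len(posts) < max_posts:
        params = {"limit": 100, "after": after}  # Reddit caps each page at 100 items
        data = requests.get(URL, headers=HEADERS, params=params, timeout=30).json()
        children = data["data"]["children"]
        if not children:
            break
        for child in children:
            p = child["data"]
            posts.append({"title": p["title"], "score": p["score"], "selftext": p.get("selftext", "")})
        after = data["data"]["after"]  # pagination cursor, e.g. "t3_abc123"
        if after is None:
            break
    return posts[:max_posts]

print(len(fetch_posts(150)), "posts fetched")
```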

Data Quality

  • ✅ Real-time data (not cached)
  • ✅ Complete post and comment threads
  • ✅ Nested comment structure preserved
  • ✅ All metadata included (scores, timestamps, awards)
  • ✅ Markdown formatting cleaned and optimized

Performance

  • Speed: ~100-200 posts per minute
  • Reliability: 99%+ success rate
  • Scale: Tested with 100K+ posts

Limitations

  • Cannot access deleted/removed posts
  • Cannot scrape private subreddits
  • Reddit's API has 100 posts/page limit (we handle pagination)
  • Comments are limited by Reddit's API (usually ~500 top-level comments per post)

Comparison: Reddit Scraper vs Alternatives

| Feature | Reddit Scraper | Manual Scraping | Reddit API | Other Scrapers |
| --- | --- | --- | --- | --- |
| Price | $1.50/1K | $825/1K | $12K+/50M | $2-5/1K |
| No API Key | ✅ | N/A | ❌ | Varies |
| Markdown Output | ✅ | ❌ | ❌ | ❌ |
| Token Counts | ✅ | ❌ | ❌ | ❌ |
| Speed | Fast | Slow | Fast | Varies |
| Easy Setup | ✅ | ❌ | ❌ | Varies |
| Scale | Unlimited | Limited | Limited | Unlimited |

Frequently Asked Questions

Is it legal to scrape Reddit data?

Yes! Reddit Scraper only accesses publicly available data that Reddit makes available through their JSON API. We respect robots.txt and rate limits.

Do I need a Reddit API key?

No! Reddit Scraper uses Reddit's public JSON API which doesn't require authentication for public content.

Can I scrape private subreddits?

No, only public content is accessible without authentication.

How fast is it?

Approximately 100-200 posts per minute, depending on content size and settings.

Can I scrape comments?

Yes! Use mode: "post" to scrape a specific post with all its comments, or enable includeComments in subreddit mode.

What's the maximum I can scrape?

There's no hard limit, but we recommend batching large scrapes (10K+ posts) to avoid timeouts.

Why markdown format?

Markdown is perfect for AI training because it:

  • Preserves text structure (bold, links, lists)
  • Is lightweight and clean
  • Works great with LLMs like GPT, Claude, etc.
  • Easy to convert to other formats
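
The Actor does this conversion for you, but if you ever need to reproduce it on raw HTML yourself, a library such as html2text can handle it (an illustration of the general technique, not necessarily what the Actor uses internally):

```python
import html2text  # pip install html2text

converter = html2text.HTML2Text()
converter.ignore_links = False  # keep hyperlinks as markdown [text](url)

html = "<p>Check the <a href='https://docs.apify.com'>docs</a> for <strong>details</strong>.</p>"
print(converter.handle(html))  # prints markdown with the link and bold formatting preserved
```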

Can I schedule regular scrapes?

Yes! Use Apify's Schedules feature to run the Actor automatically.

How do I integrate with my application?

Use Apify's API or webhooks to trigger scrapes and receive data programmatically.
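
One lightweight integration pattern is Apify's synchronous run endpoint, which starts a run and returns the dataset items in a single HTTP call (suited to shorter runs). A sketch, with the Actor ID and token as placeholders:

```python
import requests

ACTOR_ID = "ben~reddit-scraper"  # placeholder - copy the real ID from the Actor's API tab
TOKEN = "<YOUR_APIFY_TOKEN>"

resp = requests.post(
    f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items",
    params={"token": TOKEN},
    json={"mode": "search", "searchQuery": "machine learning", "maxPosts": 50},
    timeout=300,
)
resp.raise_for_status()

for item in resp.json():
    print(item["title"])
```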

What if I hit Reddit's rate limits?

Increase the delaySeconds parameter. The default of 1 second works for most cases.

Can I get historical data?

Reddit's API only provides recent posts (usually last 1000 per subreddit). For historical data, you'll need specialized datasets.

Support

Need help? Have a feature request?

  • 📧 Email: contact via Apify
  • 🐛 Issues: Report in the Run console
  • 💬 Questions: Ask in the Apify community

We typically respond within 24 hours!

Check out my other data extraction tools:

  • Newsletter Scraper - Scrape Substack, Beehiiv & Ghost newsletters with full content extraction

More scrapers coming soon! Follow @benthepythondev for updates.


Ready to extract Reddit data? Start scraping now →

🤖 Built with the Apify SDK | Made by benthepythondev