Reddit Intelligence Scraper (Pay per Event)
Pricing
from $3.00 / 1,000 results
Scrape Reddit posts, full comment trees, user profiles, and search results. Features subreddit monitoring with webhook alerts, batch comparison across multiple subreddits, and AI-native markdown output ready for LLM pipelines and vector databases.
Developer
Eimantas V
Maintained by Community
Reddit Intelligence Scraper
Extract posts, full comment trees, user profiles, search results, and trending topics from Reddit — with AI-native structured output designed to drop directly into LLM pipelines, vector databases, and RAG systems without preprocessing.
🚀 What Can This Reddit Scraper Extract?
| Data Type | Fields Extracted |
|---|---|
| Posts | Title, body (markdown + plain text), score, upvote ratio, awards, flair, author, timestamps, crosspost data |
| Comments | Full nested tree (all depths), per-comment score, author, edited flag, reply count |
| Users | Karma breakdown, account age, post/comment history, profile bio |
| Search Results | Full-text Reddit search with subreddit filtering, sorting, and time windows |
| Subreddit Metadata | Subscriber count, active users, description, creation date, icons |
| Batch Comparison | Side-by-side stats for 10+ subreddits in a single run |
✨ Key Features
- 🔄 Subreddit monitoring mode: Poll any subreddit for new posts matching keyword filters and deliver alerts via webhook in real time
- 🌲 Full comment tree traversal: Not just top-level comments. Fetches deeply nested replies via Reddit's `morechildren` API, up to a configurable depth
- 🤖 AI-native output: Every result includes a `_markdown_document` field, a clean, structured markdown document ready for LLM context windows or vector embedding
- 📊 Batch subreddit comparison: Pull top posts from up to 20 subreddits in one run with aggregated stats, ideal for market research and competitive analysis
- ⚡ Reliable session rotation: Rotates User-Agents, respects `X-Ratelimit-*` headers, and uses exponential backoff to handle rate limiting, the #1 failure mode for Reddit scrapers
- 🔍 Advanced filtering: Filter by flair, keyword, score threshold, date range, NSFW flag, and sort order (hot/new/top/rising)
- 📋 Schema-versioned output: Every item carries `_schema: "reddit-intelligence/v1"` so your pipeline always knows what it's getting
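The rate-limit handling mentioned in the feature list can be sketched roughly as follows. This is an illustration under stated assumptions, not the actor's actual implementation: the function name `next_delay`, the base delay, and the cap are hypothetical, and the sketch assumes the `X-Ratelimit-Remaining` and `X-Ratelimit-Reset` response headers carry the remaining request quota and the seconds until the quota resets.

```python
import random


def next_delay(attempt, headers, base=1.0, cap=60.0, rng=random.random):
    """Return the number of seconds to wait before the next request.

    attempt: count of consecutive failed attempts (0 for the first retry).
    headers: response headers as a dict with lower-cased keys.
    base, cap, rng: hypothetical tuning knobs chosen for this sketch.
    """
    remaining = headers.get("x-ratelimit-remaining")
    reset = headers.get("x-ratelimit-reset")
    # If the quota is exhausted, wait until the rate-limit window resets.
    if remaining is not None and reset is not None and float(remaining) <= 0:
        return float(reset)
    # Otherwise use exponential backoff, capped, with jitter between 50% and 100%.
    backoff = min(cap, base * 2 ** attempt)
    return backoff * (0.5 + 0.5 * rng())


print(next_delay(3, {}, rng=lambda: 1.0))
```

The jitter spreads retries from parallel workers so they do not all hit Reddit at the same moment after a failure.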
📖 How to Use the Reddit Intelligence Scraper
Step 1 — Choose a mode
| Mode | What it does |
|---|---|
| `subreddit` | Scrape posts from one or more subreddits |
| `post` | Scrape a specific post URL with all comments |
| `user` | Scrape a user's profile, posts, and comment history |
| `search` | Full-text Reddit search with filters |
| `batch` | Compare top posts across multiple subreddits |
| `monitor` | Watch subreddits for new posts and deliver webhook alerts |
Step 2 — Configure your run
Scrape the top posts from r/MachineLearning this week:
```json
{
  "mode": "subreddit",
  "subreddits": ["MachineLearning"],
  "sortBy": "top",
  "timeFilter": "week",
  "maxPostsPerSubreddit": 50,
  "includeComments": true,
  "maxCommentsPerPost": 100,
  "outputFormat": "both"
}
```
Compare 5 subreddits for market research:
```json
{
  "mode": "batch",
  "subreddits": ["entrepreneur", "startups", "SaaS", "indiehackers", "smallbusiness"],
  "sortBy": "top",
  "timeFilter": "month",
  "maxPostsPerSubreddit": 10
}
```
Monitor r/ArtificialIntelligence for mentions of "GPT" and other LLM-related keywords, and alert via webhook:
```json
{
  "mode": "monitor",
  "subreddits": ["ArtificialIntelligence"],
  "keywordFilter": ["GPT", "Claude", "Gemini", "LLM"],
  "monitoringInterval": 5,
  "webhookUrl": "https://your-server.com/webhooks/reddit"
}
```
Scrape a specific post with full comment tree:
```json
{
  "mode": "post",
  "postUrls": ["https://www.reddit.com/r/MachineLearning/comments/abc123/example_post/"],
  "maxCommentsPerPost": 500,
  "commentDepth": 10
}
```
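If you prefer to start runs from code rather than the Apify Console, a run input like the ones above can be built and submitted with the `apify-client` Python package. The following is a sketch under assumptions: `ACTOR_ID` is a hypothetical placeholder (this page does not state the actor's ID), and the helper names `build_subreddit_input` and `run_actor` are invented for illustration. Only the input field names come from the examples above.

```python
import json

# Hypothetical placeholder: replace with the actor's real ID from its Apify Store page.
ACTOR_ID = "<username>/<actor-name>"


def build_subreddit_input(subreddits, sort_by="top", time_filter="week", max_posts=50):
    """Build a run input for `subreddit` mode using the field names shown above."""
    return {
        "mode": "subreddit",
        "subreddits": list(subreddits),
        "sortBy": sort_by,
        "timeFilter": time_filter,
        "maxPostsPerSubreddit": max_posts,
    }


def run_actor(token, run_input):
    """Start the actor, wait for it to finish, and return its dataset items.

    Requires `pip install apify-client`; imported lazily so the helper
    above can be used without the package installed.
    """
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Preview the input that would be sent; no network access is needed for this line.
print(json.dumps(build_subreddit_input(["MachineLearning"]), indent=2))
```

Calling `run_actor("<your Apify API token>", build_subreddit_input(["MachineLearning"]))` would then return the scraped items as a list of dictionaries.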
Step 3 — Use the output
Every post result includes a ready-to-use markdown document in the _markdown_document field:
```markdown
# Why GPT-4 is changing enterprise software

**r/MachineLearning** | Score: **4,231** (96% upvoted) | Comments: **312**
Author: u/ml_researcher | Posted: 2024-03-15T14:22:00Z

## Post Content

The shift from rule-based to generative AI...

## Top Comments

### u/ai_engineer (Score: 847)

This is exactly what we're seeing in production...

> #### u/skeptic99 (Score: 234)
> Worth noting the cost implications here...
```
Paste this directly into your LLM prompt or chunk it for RAG.
💰 How Much Does It Cost to Scrape Reddit?
Reddit Intelligence Scraper is priced per result (pay-per-event):
| Task | Approximate Cost |
|---|---|
| 1,000 posts (metadata only) | ~$3.00 |
| 1,000 posts with 100 comments each | ~$3.00 |
| User profile (1 user, 25 posts) | ~$0.12 |
| Batch comparison (10 subs × 10 posts) | ~$0.30 |
| Monitor run (24h, low-traffic sub) | ~$1.50–6.00 |
Pricing: $3.00 per 1,000 results. Each post, comment thread, user profile, or search result page counts as one result.
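As a rough budgeting aid, the per-result rate can be turned into a small calculator. This sketch only multiplies a result count by the stated rate; how many results a given run actually bills depends on how the actor counts posts, comment threads, profiles, and search pages, so treat the figures as estimates. The function name `estimate_cost` is hypothetical.

```python
PRICE_PER_1000_RESULTS = 3.00  # USD, taken from the pricing line above


def estimate_cost(num_results):
    """Estimate the run cost in USD for a given number of billed results."""
    return round(num_results * PRICE_PER_1000_RESULTS / 1000, 2)


print(estimate_cost(1000))  # 1,000 billed results
print(estimate_cost(100))   # e.g. a batch of 10 subreddits x 10 posts
```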
Tip: Set `includeComments` to `false` and `outputFormat` to `"json"` for faster runs when you only need post metadata.
📤 Output Format
Post object (JSON)
```json
{
  "_schema": "reddit-intelligence/v1",
  "_scraped_at": "2024-03-15T14:30:00.000Z",
  "type": "post",
  "id": "abc123",
  "url": "https://www.reddit.com/r/MachineLearning/comments/abc123/...",
  "subreddit": "MachineLearning",
  "title": "Why GPT-4 is changing enterprise software",
  "body_markdown": "The shift from rule-based to generative AI...",
  "body_text": "The shift from rule-based to generative AI...",
  "score": 4231,
  "upvote_ratio": 0.96,
  "num_comments": 312,
  "total_awards_received": 7,
  "flair_text": "Discussion",
  "author": "ml_researcher",
  "created_utc": "2024-03-15T14:22:00.000Z",
  "comments": [...],
  "subreddit_meta": {
    "subscribers": 2800000,
    "active_user_count": 4200,
    ...
  },
  "_markdown_document": "# Why GPT-4 is changing..."
}
```
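Because every item carries a `_schema` field, a downstream pipeline can check the version before reading other fields. The sketch below assumes dataset items shaped like the post object above; the helper name `summarize_posts` is hypothetical and not part of the actor.

```python
SUPPORTED_SCHEMA = "reddit-intelligence/v1"


def summarize_posts(items):
    """Return (title, score, upvote_ratio) for each post item with a supported schema."""
    rows = []
    for item in items:
        # Skip unknown schema versions and non-post items instead of guessing their shape.
        if item.get("_schema") != SUPPORTED_SCHEMA or item.get("type") != "post":
            continue
        rows.append((item["title"], item["score"], item["upvote_ratio"]))
    return rows


sample = [{
    "_schema": "reddit-intelligence/v1",
    "type": "post",
    "title": "Why GPT-4 is changing enterprise software",
    "score": 4231,
    "upvote_ratio": 0.96,
}]
print(summarize_posts(sample))
```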
Webhook payload (monitor mode)
```json
{
  "event": "keyword_match",
  "timestamp": "2024-03-15T14:35:00.000Z",
  "subreddit": "ArtificialIntelligence",
  "matched_keywords": ["GPT", "LLM"],
  "post": {
    "id": "xyz789",
    "title": "New GPT-4 benchmark results are wild",
    "url": "https://www.reddit.com/r/...",
    "score": 142,
    "body_preview": "Just ran the full MMLU suite..."
  }
}
```
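On the receiving side, a webhook endpoint might turn this payload into a short alert message. The following sketch assumes the payload fields shown above and nothing more; the function name `format_alert` is hypothetical, and wiring it into a web framework is left out.

```python
import json


def format_alert(raw_body):
    """Parse a webhook body and return a one-line alert, or None if it is not a keyword match."""
    payload = json.loads(raw_body)
    if payload.get("event") != "keyword_match":
        return None
    post = payload["post"]
    keywords = ", ".join(payload.get("matched_keywords", []))
    return f"[r/{payload['subreddit']}] {post['title']} (score {post['score']}) matched: {keywords}"


body = json.dumps({
    "event": "keyword_match",
    "subreddit": "ArtificialIntelligence",
    "matched_keywords": ["GPT", "LLM"],
    "post": {"id": "xyz789", "title": "New GPT-4 benchmark results are wild", "score": 142},
})
print(format_alert(body))
```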
🤔 Frequently Asked Questions
Is scraping Reddit legal?
Reddit's public data is accessible without authentication. This actor only scrapes publicly available content — the same data accessible in a browser without logging in. Always review Reddit's Terms of Service and ensure your use complies with applicable laws. This tool is intended for research, analytics, and AI training use cases.
Why is the actor not returning all 1000 posts I requested?
Reddit occasionally returns fewer results than requested. This is a Reddit limitation, not an actor error. For more complete coverage, try increasing `maxPostsPerSubreddit` and setting `sortBy: "new"`. If you mainly want high-quality historical posts, use `sortBy: "top"` with a longer time window (year, all).
What's the difference between outputFormat: "json", "markdown", and "both"?
- `json`: returns the full structured JSON object, ideal for data pipelines and databases
- `markdown`: returns only the `_markdown_document` field (the AI-ready version), minimal storage
- `both`: returns full JSON and the markdown document in every result
Can I use this to feed Reddit data into a vector database?
Yes, this is a primary use case. Use `outputFormat: "markdown"`, split on `## Top Comments` to get post and comment chunks, and embed each chunk separately.
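The splitting step might look like the sketch below. It assumes the document layout shown in the Step 3 sample, where each top-level comment starts with a `### ` heading and nested replies are quoted with `>`. The function name `chunk_markdown_document` is hypothetical, and the embedding call itself is omitted because it depends on your vector database.

```python
def chunk_markdown_document(doc):
    """Split a `_markdown_document` string into one post chunk plus one chunk per top-level comment."""
    post_part, _, comments_part = doc.partition("## Top Comments")
    chunks = [post_part.strip()]
    current = []

    def flush():
        # Store the lines collected so far as one chunk, ignoring blank-only runs.
        text = "\n".join(current).strip()
        if text:
            chunks.append(text)
        current.clear()

    for line in comments_part.splitlines():
        # A "### " heading opens a new top-level comment; quoted replies
        # ("> #### ...") stay attached to their parent comment.
        if line.startswith("### "):
            flush()
        current.append(line)
    flush()
    return chunks


doc = (
    "# Title\n\n## Post Content\nBody\n\n## Top Comments\n\n"
    "### u/a (Score: 2)\nFirst\n\n> #### u/b (Score: 1)\n> Reply\n\n"
    "### u/c (Score: 1)\nSecond\n"
)
for chunk in chunk_markdown_document(doc):
    print(repr(chunk))
```

Keeping replies in the same chunk as their parent comment preserves conversational context, at the cost of larger chunks for long threads.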
Does it handle private or restricted subreddits?
No. This actor only accesses public Reddit content. Private subreddits require OAuth authentication with approved account credentials.
How does the monitoring mode work exactly?
On first run, the actor seeds its state with the current latest posts (no webhook fires). On subsequent polling cycles (default: every 5 minutes), any new post that matches your filters triggers a dataset push and optionally a webhook. State is persisted in Apify Key-Value Store so it survives between runs.
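The seed-then-alert behaviour described above can be sketched as a single polling step. This is an illustration, not the actor's code: an in-memory dict stands in for the Apify Key-Value Store, the function name `poll_once` is hypothetical, and matching keywords case-insensitively against the post title only is an assumption.

```python
def poll_once(state, latest_posts, keywords):
    """Process one polling cycle and return the posts that should trigger alerts.

    state: dict persisted between cycles, e.g. {"seeded": False, "seen_ids": set()}.
    latest_posts: list of dicts with "id" and "title" keys (newest posts first).
    """
    new_posts = [p for p in latest_posts if p["id"] not in state["seen_ids"]]
    state["seen_ids"].update(p["id"] for p in latest_posts)
    if not state["seeded"]:
        # First cycle: remember what already exists and fire nothing.
        state["seeded"] = True
        return []
    lowered = [k.lower() for k in keywords]
    return [p for p in new_posts if any(k in p["title"].lower() for k in lowered)]


state = {"seeded": False, "seen_ids": set()}
first = poll_once(state, [{"id": "a", "title": "Old GPT post"}], ["GPT"])
second = poll_once(
    state,
    [{"id": "b", "title": "New gpt benchmark"}, {"id": "a", "title": "Old GPT post"}],
    ["GPT"],
)
print(first, second)
```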
Why use a proxy?
Without a proxy, repeated scraping from a single IP can trigger Reddit's rate limiting (HTTP 429). Apify's residential proxy pool rotates IPs automatically, making your scraper much more reliable at scale.
🔗 Related Actors
- Web Scraper — General-purpose web scraping
- Twitter Scraper — Social media monitoring on X/Twitter
- YouTube Scraper — Video and comment data from YouTube
📬 Support & Feedback
Found a bug or have a feature request? Open an issue or contact us through the Apify platform. We monitor this actor actively and publish updates regularly.