Reddit Posts Scraper
Pricing: $19.99/month + usage
🔎 Reddit Posts Scraper pulls posts & comments from subreddits or users—titles, bodies, upvotes, score, flair, author, timestamps, links & media. 📊 Great for research, social listening, SEO, sentiment & trend analysis. ⚙️ Filters & keywords. 💾 Export CSV/JSON. 🚀
Developer: Scraply
Last modified: 22 days ago
Reddit Posts Scraper is a fast, reliable Reddit scraper that collects public posts (and optional comments) from subreddits, full Reddit URLs, or search keywords. It solves the pain of manual copying and API limits by returning clean, structured JSON ready for analysis. Built for marketers, developers, data analysts, and researchers, this subreddit scraper helps you scrape Reddit posts at scale for trend tracking, SEO research, social listening, and NLP pipelines. With parallel processing, smart retries, and proxy fallback, it enables high-volume Reddit thread scraping with production reliability. 🚀
What data / output can you get?
Below are the exact fields this Reddit web scraper pushes to the Apify dataset (one row per post):
| Data type | Description | Example value |
|---|---|---|
| subreddit | Community name the post belongs to | "technology" |
| title | Post title text | "Open-source LLM hits new benchmark" |
| author | Reddit username of the poster | "u_datawizard" |
| score | Post score/upvotes | 842 |
| num_comments | Number of comments on the post | 126 |
| created_utc | Unix timestamp (UTC) when the post was created | 1703123456 |
| permalink | Full permalink to the Reddit thread | "https://www.reddit.com/r/technology/comments/abc123/example_post/" |
| body | Selftext/body content for text posts | "Here’s a quick summary of the paper..." |
| thumbnail_url | Thumbnail image URL (if any) | "https://preview.redd.it/..." |
| image_url | Main media URL (if provided) | "https://i.redd.it/xyz.png" |
| comments | Array of nested comments (author, body, score, created_utc, replies) | [ { "author": "u_commenter", ... } ] |
| post_id | Unique Reddit post ID | "abc123" |
| success | Whether this post was processed successfully | true |
| error_message | Error details if processing failed | null |
Notes:
- Nested comments include replies with the same structure (author, body, score, created_utc, replies).
- You can export results as JSON, CSV, or Excel from the Apify dataset UI or via the API.
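Since `created_utc` is a raw Unix timestamp, it usually needs converting before analysis; a minimal Python sketch using only the standard library:

```python
from datetime import datetime, timezone

def to_iso_utc(created_utc: int) -> str:
    """Convert a Reddit created_utc Unix timestamp to an ISO-8601 UTC string."""
    return datetime.fromtimestamp(created_utc, tz=timezone.utc).isoformat()

# Example using the sample value from the table above:
print(to_iso_utc(1703123456))  # 2023-12-21T01:50:56+00:00
```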
Key features
- ⚡ Large-scale extraction: parallel comment fetching and batched processing to scrape subreddit posts efficiently across multiple sources.
- 🧩 Flexible targeting: scrape Reddit posts by subreddit names, full Reddit URLs, or keywords in a single run, perfect as a Reddit thread scraper or Reddit post extractor.
- 🔄 Sort & time filtering: choose hot, new, top, or rising, with a time range for top/rising, ideal for scraping subreddit posts as trend snapshots.
- 🛡️ Resilient proxy fallback: automatic fallback from no proxy → datacenter → residential if blocked, plus smart retries on 403/429/5xx and timeouts.
- 💬 Optional comments: configure how many comments to fetch per post, or set zero to skip, covering both Reddit posts and Reddit comments scraper use cases.
- 💾 Structured outputs: export-ready JSON for dashboards, or export Reddit posts to CSV; consistent fields for easy joins and analytics.
- 🧑‍💻 Developer-friendly: built in Python (a Python Reddit scraper) with the Apify SDK; trigger via API and integrate with Make, Zapier, or n8n.
- 🏗️ Production-ready reliability: request pacing, parallelism controls, and detailed logging ensure a stable Reddit scraping tool for large runs.
How to use Reddit Posts Scraper - step by step
1. Create or log in to your Apify account.
2. Open the Reddit Posts Scraper in Apify Console.
3. Add your sources in “Reddit URLs / Subreddits / Keywords”: enter subreddit names (e.g., “news” or “r/technology”), full Reddit URLs, or search keywords (e.g., “artificial intelligence”), one per line.
4. Set optional controls:
   - Sort order (hot, new, top, rising) and time filter (hour, day, week, month, year, all) for top/rising.
   - Limits for maximum posts and maximum comments per post.
   - Proxy configuration (recommended for larger volumes).
5. Start the run and monitor logs as posts are collected and comments are processed in parallel.
6. Download results: go to the Dataset in the Output tab to preview results and export to JSON, CSV, or Excel.

Pro Tip: Trigger runs via the Apify API and pipe dataset URLs to your data stack or automation (n8n, Make, Zapier) for scheduled Reddit scraping without Reddit API credentials.
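The API-trigger workflow can be sketched against Apify's public REST API (start a run via `POST /v2/acts/{actorId}/runs`, then download items via `GET /v2/datasets/{datasetId}/items`). The actor ID and token below are placeholders; check the actor's API tab in Apify Console for the exact values:

```python
import json
import urllib.request

APIFY_BASE = "https://api.apify.com/v2"

def run_url(actor_id: str, token: str) -> str:
    # POST to this endpoint starts an actor run.
    return f"{APIFY_BASE}/acts/{actor_id}/runs?token={token}"

def dataset_export_url(dataset_id: str, fmt: str = "csv") -> str:
    # GET this endpoint to download dataset items in the chosen format.
    return f"{APIFY_BASE}/datasets/{dataset_id}/items?format={fmt}"

def start_run(actor_id: str, token: str, run_input: dict) -> dict:
    """Start an actor run and return the API's JSON response."""
    req = urllib.request.Request(
        run_url(actor_id, token),
        data=json.dumps(run_input).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Live usage (requires a real token; "scraply~reddit-posts-scraper" is a
# placeholder actor ID):
# run = start_run("scraply~reddit-posts-scraper", "YOUR_APIFY_TOKEN",
#                 {"startUrls": ["r/technology"], "maxPosts": 10})
# print(dataset_export_url(run["data"]["defaultDatasetId"]))
```

The export URL can be handed directly to n8n, Make, or Zapier HTTP nodes for scheduled pulls.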
Use cases
| Use case | Description |
|---|---|
| Market & trend research | Track trending topics by keyword or subreddit to quantify engagement and sentiment over time. |
| Content & SEO research | Discover high-performing topics and questions to inform content calendars and SERP targeting. |
| Brand & competitor monitoring | Monitor mentions across relevant communities and compare share of voice across subreddits. |
| NLP / ML datasets | Collect titles, bodies, and structured comment trees for training or evaluation datasets. |
| Academic & journalism research | Compile public quotes and discussions from Reddit threads for analysis and reporting. |
| Data pipelines & automation | Schedule a Reddit scraping script via API, then export Reddit posts to CSV for ETL or BI dashboards. |
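For the ETL row above, here is a minimal sketch (assuming items shaped like this actor's output) that writes the flat per-post fields to CSV with the standard library; nested comments are simply dropped:

```python
import csv

# Flat per-post fields to keep in the CSV export.
FIELDS = ["post_id", "subreddit", "title", "author", "score",
          "num_comments", "created_utc", "permalink"]

def posts_to_csv(items, path):
    """Write one CSV row per post; extra keys (e.g. comments) are ignored."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(items)

items = [{"post_id": "abc123", "subreddit": "news", "title": "Example",
          "author": "u_example", "score": 156, "num_comments": 42,
          "created_utc": 1703123456,
          "permalink": "https://www.reddit.com/r/news/comments/abc123/",
          "comments": []}]  # nested comments dropped by extrasaction="ignore"
posts_to_csv(items, "reddit_posts.csv")
```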
Why choose Reddit Posts Scraper?
This Reddit scraping tool combines precision, automation, and reliability for large-scale, repeatable data collection.
- ✅ Accurate, structured fields ready for analysis and modeling
- 🌍 Keyword and subreddit targeting for broad or niche coverage
- ⚙️ Scales from small tests to bulk runs with parallel processing
- 🧑‍💻 API- and Python-friendly for developer workflows
- 🛡️ Safer than brittle extensions—handles blocks with proxy fallback and retries
- 💰 Cost-effective automation via Apify infrastructure and dataset exports
- 🔌 Integrations-ready (n8n, Make, Zapier) for end-to-end pipelines
In short, a production-ready Reddit web scraper versus unstable alternatives—built for consistent data extraction at scale.
Is it legal / ethical to use Reddit Posts Scraper?
Yes—when done responsibly. This actor targets publicly available Reddit content and does not access private subreddits or authenticated data.
Guidelines for compliant use:
- Scrape only public data and respect Reddit’s platform policies.
- Do not misuse personal information found in public posts or comments.
- Observe applicable data protection laws (e.g., GDPR, CCPA) in your jurisdiction.
- Use proxy and rate controls to minimize load and reduce the likelihood of blocks.
- Consult your legal team for edge cases or regulated workflows.
Input parameters & output format
Example input (JSON)
```json
{
  "startUrls": ["https://www.reddit.com/r/news/", "news", "artificial intelligence"],
  "maxPosts": 50,
  "maxComments": 100,
  "sortOrder": "top",
  "timeFilter": "week",
  "proxyConfiguration": { "useApifyProxy": false }
}
```
Input fields
- startUrls (array of strings, required)
  - Description: One per line — mix full Reddit URLs, subreddit names (e.g., news or r/news), or search keywords.
  - Default: None
- maxPosts (integer)
  - Description: Max posts to scrape per subreddit or keyword (1–1000).
  - Default: 50
- maxComments (integer)
  - Description: Max comments to fetch per post (0–1000). Set 0 to skip comments.
  - Default: 100
- sortOrder (string; one of: hot, new, top, rising)
  - Description: How posts are ordered.
  - Default: top
- timeFilter (string; one of: hour, day, week, month, year, all)
  - Description: Only applies when sortOrder is top or rising.
  - Default: week
- proxyConfiguration (object)
  - Description: Choose proxies. If blocked, the actor falls back: no proxy → datacenter → residential.
  - Default: { "useApifyProxy": false }
Note:
- The actor can also accept “startUrls” as a newline-separated string or as an array with URL objects via API; however, the Console form uses a string list.
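To illustrate how mixed startUrls entries can be told apart, here is a rough heuristic (this is an illustrative guess, not the actor's actual parsing logic):

```python
import re

def classify_source(entry: str) -> str:
    """Roughly classify a startUrls entry as 'url', 'subreddit', or 'keyword'."""
    entry = entry.strip()
    if entry.startswith(("http://", "https://")):
        return "url"
    # "r/news" or a bare single token like "news" looks like a subreddit name
    if re.fullmatch(r"(r/)?[A-Za-z0-9_]+", entry):
        return "subreddit"
    return "keyword"  # anything else (e.g. multi-word phrases) is a search

print(classify_source("https://www.reddit.com/r/news/"))  # url
print(classify_source("r/technology"))                    # subreddit
print(classify_source("artificial intelligence"))         # keyword
```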
Example output item (JSON)
```json
{
  "post_id": "abc123",
  "title": "Example post title",
  "author": "u_example",
  "created_utc": 1703123456,
  "num_comments": 42,
  "score": 156,
  "permalink": "https://www.reddit.com/r/news/comments/abc123/example_post/",
  "image_url": "https://i.redd.it/xyz.png",
  "thumbnail_url": "https://preview.redd.it/xyz-thumb.jpg",
  "body": "Post content...",
  "comments": [
    {
      "author": "u_commenter1",
      "body": "Top-level comment",
      "score": 23,
      "created_utc": 1703123499,
      "replies": [
        {
          "author": "u_replier",
          "body": "Nested reply",
          "score": 7,
          "created_utc": 1703123600,
          "replies": []
        }
      ]
    }
  ],
  "subreddit": "news",
  "success": true,
  "error_message": null
}
```
Fields like image_url, thumbnail_url, and body may be empty when not present on the original post.
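Because replies nest recursively, a small helper is handy for flattening a comment tree into rows for analysis; a minimal sketch following the item shape above:

```python
def flatten_comments(comments, depth=0):
    """Yield (depth, author, body, score) for every comment and nested reply."""
    for c in comments:
        yield (depth, c["author"], c["body"], c["score"])
        yield from flatten_comments(c.get("replies", []), depth + 1)

item = {"comments": [{"author": "u_commenter1", "body": "Top-level comment",
                      "score": 23,
                      "replies": [{"author": "u_replier", "body": "Nested reply",
                                   "score": 7, "replies": []}]}]}
flat = list(flatten_comments(item["comments"]))
print(flat)
# [(0, 'u_commenter1', 'Top-level comment', 23), (1, 'u_replier', 'Nested reply', 7)]
```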
FAQ
Do I need a Reddit account or login to use this?
No. The actor collects publicly available Reddit data without requiring login or cookies. It fetches structured JSON from Reddit endpoints and processes it automatically.
Can it scrape comments as well as posts?
Yes. Set “Maximum comments per post” to a value greater than 0 to fetch comment threads; set it to 0 to skip comments for faster runs.
How many posts can I scrape per source?
You can set “Maximum posts per source” up to 1000. The total output depends on how many sources (subreddits, URLs, or keywords) you provide.
Does it work with proxies and handle blocks?
Yes. It automatically falls back through no proxy → datacenter → residential if Reddit blocks requests, and retries on 403/429, 5xx, timeouts, and connection/SSL issues.
Can I export results to CSV?
Yes. After the run, open the dataset in Apify and export to JSON, CSV, or Excel. You can also access results programmatically via the Apify API.
Is this a Python Reddit scraper I can integrate with my pipeline?
Yes. The actor is implemented in Python and is API-accessible, making it easy to integrate into ETL, analytics, or automation workflows (e.g., n8n, Make, Zapier).
Does it support sorting and time filters like “top this week”?
Yes. Choose hot, new, top, or rising. For top and rising, you can apply a time filter (hour, day, week, month, year, all).
What happens if some posts fail to process?
Each post reports a success flag and error_message if processing fails. The actor saves successful items as they’re scraped so you can still export partial results.
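Downstream of a partial run, the success flag makes it easy to split usable rows from failures; a minimal sketch:

```python
def partition_by_success(items):
    """Split dataset items into (successful, failed) lists using the success flag."""
    ok = [i for i in items if i.get("success")]
    failed = [i for i in items if not i.get("success")]
    return ok, failed

items = [{"post_id": "abc123", "success": True, "error_message": None},
         {"post_id": "def456", "success": False, "error_message": "HTTP 429"}]
ok, failed = partition_by_success(items)
print(len(ok), len(failed))  # 1 1
```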
Final thoughts
Reddit Posts Scraper is built to scrape Reddit posts (and optional comments) from subreddits, URLs, or keywords with structured, export-ready output. With sort/time controls, scalable limits, and resilient proxy fallback, it’s ideal for marketers, researchers, analysts, and developers. Trigger it via API to power a Reddit API scraping workflow, connect to automation tools, or export Reddit posts to CSV for downstream analytics. Start extracting smarter Reddit insights—at scale and with confidence.