Reddit Posts Scraper
Pricing
$19.99/month + usage
🧰 Reddit Posts Scraper extracts Reddit post data by subreddit, keyword, or URL—titles, authors, flairs, scores, upvotes, comments, timestamps, links & media. 📊 Export CSV/JSON. 🔎 Perfect for trend tracking, sentiment analysis, content research & social listening. 🚀
Developer
ScrapeMesh
Reddit Posts Scraper
Reddit Posts Scraper is a fast, reliable Reddit scraper that lets you scrape Reddit posts and comments by subreddit, full thread URL, or keyword, returning clean, structured JSON for analysis. It solves the challenge of extracting Reddit data at scale by handling sort orders, time filters, blocks, and concurrency automatically. Built for marketers, developers, data analysts, and researchers, this Reddit thread and comments scraper uses proxy fallback and parallel comment fetching to help you track trends, run sentiment analysis, and power automation workflows at scale. 🚀
What data / output can you get?
The actor saves results to a dataset where each item represents one Reddit post with structured fields. You can export Reddit posts to CSV or JSON.
| Data type | Description | Example value |
|---|---|---|
| subreddit | Community name | "news" |
| title | Post title | "Breaking: Major update in AI policy announced" |
| author | Reddit username of the poster | "u/example_user" |
| score | Post score/upvotes | 156 |
| num_comments | Number of comments | 42 |
| created_utc | Post created time as Unix timestamp (UTC) | 1703123456 |
| permalink | Link to the Reddit thread | "https://www.reddit.com/r/news/comments/abc123/example_post/" |
| body | Selftext/body of the post (if any) | "Here’s what changed and why it matters…" |
| thumbnail_url | Thumbnail image URL | "https://preview.redd.it/..." |
| image_url | Main image/destination URL (if any) | "https://i.redd.it/..." |
| comments | Nested array of comments with replies | [{"author":"commenter1","body":"Great news!","score":23,"created_utc":1703123499,"replies":[]}] |
| post_id | Unique Reddit post ID | "abc123" |
| success | Whether the post was processed successfully | true |
| error_message | Error details if processing failed | null |
Notes:
- Comments are returned as nested arrays with fields: author, body, score, created_utc, replies.
- Exports available as JSON and CSV from the dataset.
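To sanity-check exported items, you can inspect a record in a few lines of Python. The sample item below reuses example values from the table above; note that created_utc is a Unix timestamp in UTC:

```python
import datetime

# One dataset item with sample values matching the field list above.
item = {
    "subreddit": "news",
    "title": "Breaking: Major update in AI policy announced",
    "score": 156,
    "num_comments": 42,
    "created_utc": 1703123456,
}

# created_utc is seconds since the Unix epoch (UTC); convert it for readability.
posted = datetime.datetime.fromtimestamp(item["created_utc"], tz=datetime.timezone.utc)
print(f'{item["subreddit"]}: "{item["title"]}" '
      f'({item["score"]} points, posted {posted:%Y-%m-%d %H:%M} UTC)')
```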
Key features
- ⚡ Parallel comment fetching: fetch comments for multiple posts concurrently, with controlled concurrency for speed and stability.
- 🛡️ Automatic proxy fallback: a robust "no proxy → datacenter → residential" fallback handles blocks (403/429) and keeps runs stable.
- 🧭 Sort & time filtering: choose sortOrder (hot, new, top, rising) and optionally apply a timeFilter (hour, day, week, month, year, all) when using top or rising.
- 📏 Configurable limits: set maxPosts per source (1–1000) and maxComments per post (0–1000). Use maxComments: 0 to skip comments for faster runs.
- 📤 Structured JSON output: clean post-level records including subreddit, title, author, score, timestamps, links, media, and nested comments, ideal for analytics and dashboards.
- 🔁 Resilient retries: automatic retries for rate limits, timeouts, upstream 5xx errors, and connection/SSL issues reduce false failures.
- 💾 Real-time dataset saving: results are pushed to the dataset as they're scraped, so partial data is preserved even if a run stops.
- 🐍 Python-based on Apify: built with Python and aiohttp for efficient, scalable extraction, a great fit for "Reddit scraper Python" workflows.
How to use Reddit Posts Scraper - step by step
1. Sign in to Apify and open the Reddit Posts Scraper.
2. Add your sources in "Reddit URLs / Subreddits / Keywords" (one per line). You can mix:
   - Full URLs like https://www.reddit.com/r/news/
   - Subreddits like news or r/technology
   - Search keywords like artificial intelligence
3. Configure sorting & timing:
   - sortOrder: hot, new, top, rising
   - timeFilter: hour, day, week, month, year, all (applies only to top or rising)
4. Set limits:
   - maxPosts per source (1–1000)
   - maxComments per post (0–1000; set 0 to skip comments)
5. Configure proxyConfiguration if needed. The actor can fall back automatically to datacenter/residential proxies if blocked.
6. Start the run and monitor logs for progress. The scraper collects post metadata first, then processes comments (if enabled) in parallel.
7. Download results from the dataset. Export Reddit posts to CSV or JSON for use in your BI tools or automations.
Pro Tip: Combine multiple subreddits and keywords in one run to quickly compare topics, track trends, or build richer Reddit datasets at scale.
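The run input assembled in these steps is plain JSON, so it is easy to build programmatically, for example before submitting a run through the Apify console or API. A quick sketch:

```python
import json

# Assemble the run input described in the steps above.
run_input = {
    "startUrls": [
        "https://www.reddit.com/r/news/",   # full URL
        "news",                             # bare subreddit name
        "artificial intelligence",          # search keyword
    ],
    "sortOrder": "top",
    "timeFilter": "week",      # only applies to "top" / "rising"
    "maxPosts": 50,            # 1-1000 per source
    "maxComments": 100,        # 0 skips the comment-fetching phase
    "proxyConfiguration": {"useApifyProxy": False},
}

print(json.dumps(run_input, indent=2))
```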
Use cases
| Use case | Description |
|---|---|
| Market & trend research | Aggregate top posts and discussions by subreddit or keyword to quantify topics over time. |
| Sentiment & NLP datasets | Build labeled text corpora from titles, bodies, and nested comments for modeling and analysis. |
| Content & editorial planning | Discover high-performing threads to guide content strategy and ideation. |
| Social listening | Monitor public conversations and extract signals from Reddit threads for brand/research insights. |
| Competitive & product analysis | Track discussions around features, pain points, and releases across relevant communities. |
| Academic & journalism research | Collect transparent, attributable public discourse for qualitative/quantitative studies. |
| Data pipelines & dashboards | Feed structured JSON into analytics stacks for automated reporting on Reddit search results and subreddits. |
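For pipeline and dashboard use cases like these, aggregating dataset items per subreddit takes only the standard library. The items below are hand-written samples; in practice you would load the exported JSON from a run:

```python
from collections import defaultdict

# A few dataset items with sample values; real items come from the
# exported JSON of a run.
items = [
    {"subreddit": "news", "score": 156, "num_comments": 42},
    {"subreddit": "news", "score": 80, "num_comments": 10},
    {"subreddit": "technology", "score": 230, "num_comments": 95},
]

# Count posts and sum scores per subreddit.
totals = defaultdict(lambda: {"posts": 0, "score": 0})
for it in items:
    totals[it["subreddit"]]["posts"] += 1
    totals[it["subreddit"]]["score"] += it["score"]

for sub, t in totals.items():
    print(f"r/{sub}: {t['posts']} posts, avg score {t['score'] / t['posts']:.1f}")
```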
Why choose Reddit Posts Scraper?
Reddit Posts Scraper is engineered for precision, scale, and reliability — a production-ready Reddit data extractor without browser overhead.
- ✅ Accuracy with structured fields: Consistent post- and comment-level JSON, not brittle HTML.
- ⚡ Scale-ready: Parallel comment processing with smart concurrency for large sources.
- 🔐 Reliable under pressure: Automatic proxy fallback and robust retries for blocks, 5xx, timeouts, and connection issues.
- 🧭 Flexible targeting: Scrape by subreddit, full Reddit URL, or keyword in a single list.
- 📤 Analytics-friendly output: Export JSON/CSV and plug into downstream tools with minimal wrangling.
- 🧩 Developer-friendly: Python-based architecture fits automation and integration workflows.
- 🛡️ Safer than extensions: No unstable browser extensions — a server-side Reddit scraping tool built for repeatability.
In short, it’s a robust Reddit scraper that balances speed, stability, and clean output for serious use cases.
Is it legal / ethical to use Reddit Posts Scraper?
Yes — when used responsibly. This actor is designed to extract publicly available Reddit content. Do not scrape private communities without permission and avoid misuse of personal data. Always respect platform terms and applicable data protection laws (e.g., GDPR/CCPA). For edge cases, consult your legal team and ensure your usage complies with your policies.
Input parameters & output format
Example input (JSON):
```json
{
  "startUrls": [
    "https://www.reddit.com/r/news/",
    "news",
    "artificial intelligence"
  ],
  "sortOrder": "top",
  "timeFilter": "week",
  "maxPosts": 50,
  "maxComments": 100,
  "proxyConfiguration": { "useApifyProxy": false }
}
```
Parameter reference:
- startUrls (array of strings, required)
  Description: One item per line. Mix full URLs (e.g., https://www.reddit.com/r/news/), subreddit names (e.g., news or r/news), or search keywords (e.g., artificial intelligence). Duplicate subreddits are merged.
  Default: none
- maxPosts (integer)
  Description: Max number of posts to scrape per subreddit or keyword (1–1000).
  Default: 50
- maxComments (integer)
  Description: Max comments to fetch for each post (0–1000). Set to 0 to skip comments and only get post metadata.
  Default: 100
- sortOrder (string enum: hot, new, top, rising)
  Description: How Reddit should sort the posts.
  Default: top
- timeFilter (string enum: hour, day, week, month, year, all)
  Description: Time range for results. Only applies when sortOrder is top or rising; ignored for hot and new.
  Default: week
- proxyConfiguration (object)
  Description: Choose which proxies to use. If Reddit blocks a request, the actor can fall back automatically: no proxy → datacenter → residential.
  Default: { "useApifyProxy": false }
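Since startUrls accepts URLs, subreddit names, and keywords in one list, here is one way such mixed entries could be classified. This is purely illustrative; the actor's internal parsing may differ:

```python
import re

def classify_source(entry: str) -> str:
    """Classify a startUrls entry as a URL, subreddit, or keyword.

    Illustrative sketch only -- the actor's internal logic may differ.
    """
    entry = entry.strip()
    if entry.startswith("http://") or entry.startswith("https://"):
        return "url"
    # Bare subreddit names look like "news" or "r/news": no spaces,
    # only letters, digits, and underscores.
    if re.fullmatch(r"(r/)?[A-Za-z0-9_]+", entry):
        return "subreddit"
    return "keyword"

for src in ["https://www.reddit.com/r/news/", "r/technology", "artificial intelligence"]:
    print(src, "->", classify_source(src))
```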
Example output item (JSON):
```json
{
  "subreddit": "news",
  "title": "Example post title",
  "author": "username",
  "score": 156,
  "num_comments": 42,
  "created_utc": 1703123456,
  "permalink": "https://www.reddit.com/r/news/comments/abc123/example_post/",
  "body": "Post content...",
  "thumbnail_url": "https://...",
  "image_url": "https://...",
  "comments": [
    {
      "author": "commenter1",
      "body": "Comment text...",
      "score": 23,
      "created_utc": 1703123499,
      "replies": []
    }
  ],
  "post_id": "abc123",
  "success": true,
  "error_message": null
}
```
Notes:
- Some fields can be empty or “Unknown” if Reddit does not provide the value (e.g., author on deleted posts, body for link posts).
- When maxComments is 0, comments is an empty array and the actor skips the comment-fetching phase.
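Because comments is a nested tree (each comment carries its own replies array), flattening it is a common first step before CSV export or NLP work. A small helper sketch using the field names from the output schema above:

```python
def flatten_comments(comments, depth=0):
    """Yield (depth, author, body, score) rows from nested comment trees.

    Helper sketch for turning the nested `comments` arrays into flat
    rows for CSV export or NLP corpora; field names follow the actor's
    output schema.
    """
    for c in comments:
        yield (depth, c.get("author"), c.get("body"), c.get("score"))
        # Recurse into replies, one level deeper.
        yield from flatten_comments(c.get("replies", []), depth + 1)

# Sample nested comments, shaped like the actor's output.
sample = [{"author": "commenter1", "body": "Great news!", "score": 23,
           "created_utc": 1703123499,
           "replies": [{"author": "commenter2", "body": "Agreed.", "score": 5,
                        "created_utc": 1703123600, "replies": []}]}]

rows = list(flatten_comments(sample))
# rows -> [(0, 'commenter1', 'Great news!', 23), (1, 'commenter2', 'Agreed.', 5)]
```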
FAQ
Does it scrape comments as well as posts?
Yes. Set maxComments > 0 to fetch nested comments for each post. Set maxComments to 0 if you only need post metadata and want faster runs.
Can I mix subreddits, keywords, and full URLs in one run?
Yes. Add any combination of subreddit names (e.g., news or r/news), full Reddit URLs, and search keywords in startUrls — one per line.
What sort options and time ranges are supported?
sortOrder supports hot, new, top, and rising. timeFilter supports hour, day, week, month, year, and all, and it only applies when sortOrder is top or rising.
How many posts can I scrape per source?
You can scrape between 1 and 1000 posts per subreddit or keyword using maxPosts. Combine multiple sources in startUrls to scale further.
What output formats are supported?
You can export the dataset to JSON or CSV. Each item includes subreddit, title, author, score, timestamps, links, media URLs, and nested comments.
How does the scraper handle blocks and rate limits?
It includes robust retry logic and automatic proxy fallback (no proxy → datacenter → residential) for 403/429 blocks, upstream 5xx, timeouts, and connection/SSL issues.
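That retry behavior follows the standard exponential-backoff-with-jitter pattern. A minimal sketch of the general idea (illustrative only; the actor's actual retry parameters are internal):

```python
import random

def backoff_delays(max_retries=5, base=1.0, cap=30.0):
    """Yield retry delays: exponential growth, capped, with jitter.

    Illustrative sketch of the general pattern -- the actor's real
    retry parameters are internal and may differ.
    """
    for attempt in range(max_retries):
        # Double the delay each attempt, cap it, then jitter downward
        # so concurrent clients don't retry in lockstep.
        yield min(cap, base * 2 ** attempt) * random.uniform(0.5, 1.0)

print([round(d, 2) for d in backoff_delays()])
```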
Is this built with Python? Can I integrate it into my data pipeline?
Yes. The actor is Python-based and produces structured JSON that fits well into analytics pipelines and automation workflows.
Closing CTA / Final thoughts
Reddit Posts Scraper is built to extract structured Reddit post and comment data reliably at scale. With flexible inputs (subreddits, URLs, keywords), configurable limits, sort/time filters, and automatic proxy fallback, it delivers clean JSON that's ready for research, analytics, and automation. Ideal for marketers, developers, data analysts, and researchers, it lets you export Reddit posts to CSV/JSON and plug them straight into your dashboards or workflows. Start extracting smarter Reddit insights today with a production-ready Reddit scraper.