Reddit Posts Scraper
Pricing
$19.99/month + usage
Scrape Reddit posts with ease. Extract titles, post text, subreddits, usernames, upvotes, comments, timestamps, and links from Reddit threads. Perfect for trend tracking, sentiment analysis, audience research, and content discovery. Turn Reddit data into actionable insights fast.
Developer: Scrapium
Last modified: 14 days ago
Reddit Posts Scraper
Reddit Posts Scraper is a production-ready Apify actor that lets you scrape Reddit posts and comment threads by subreddit, full URL, or search keyword. It replaces manual copy-paste and unreliable tools with clean, structured JSON ready for analysis. Marketers, developers, data analysts, and researchers use it to scrape Reddit posts at scale for trend tracking, sentiment analysis, and content discovery. With proxy fallback, batching, and structured exports, it supports large-scale Reddit data pipelines and automation.
What data / output can you get?
Below are the exact fields the actor saves to the dataset (one row per post). You can export results to JSON or CSV, or fetch them via the Apify API.
| Data field | Description | Example value |
|---|---|---|
| subreddit | Community name the post belongs to | "news" |
| title | Post title | "Example post title" |
| author | Reddit username of the poster | "username" |
| score | Upvotes/score of the post | 156 |
| num_comments | Total number of comments | 42 |
| created_utc | Unix timestamp (UTC) | 1703123456.789 |
| permalink | Direct link to the Reddit thread | "https://www.reddit.com/r/news/comments/abc123/..." |
| body | Selftext/body of the post | "Post content..." |
| thumbnail_url | Thumbnail image URL | "https://..." |
| image_url | Main image/media URL (if any) | "https://..." |
| comments | Nested array of comments with replies | [{"author":"commenter1","body":"Comment text...","score":23,"created_utc":1703123456.789,"replies":[]}] |
| post_id | Reddit post ID | "abc123" |
| success | Whether the post was processed successfully | true |
| error_message | Error detail if processing failed | null |
Note: The actor returns structured data for both posts and comments (including nested replies). Fields like author or title may occasionally be "Unknown" or "No Title" if Reddit does not provide them for the post.
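Because the comments field is recursive, downstream analysis usually needs it flattened first. Here is a minimal Python sketch using the field names from the table above (the helper name is illustrative, not part of the actor):

```python
def flatten_comments(comments, depth=0):
    """Recursively flatten a nested comment tree into a flat list.

    Each comment dict carries 'author', 'body', 'score', and a
    'replies' list with the same shape, per the output table.
    """
    flat = []
    for comment in comments:
        flat.append({
            "author": comment.get("author", "Unknown"),
            "body": comment.get("body", ""),
            "score": comment.get("score", 0),
            "depth": depth,  # 0 = top-level comment, 1 = reply, and so on
        })
        flat.extend(flatten_comments(comment.get("replies", []), depth + 1))
    return flat
```

This makes nested threads easy to load into a DataFrame or CSV, one row per comment.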
Key features
- **Parallel comment processing**: Fetches comments in parallel for high-throughput scraping, ideal for capturing full discussion threads.
- **Flexible targeting**: Accepts subreddit names (e.g., news or r/technology), full Reddit URLs, or search keywords, so you can scrape subreddit posts or Reddit search results.
- **Sort and time filter**: Supports sortOrder (hot, new, top, rising) and timeFilter (hour, day, week, month, year, all) for precise post selection.
- **Scalable limits**: Control maxPosts per source and maxComments per post to tune depth, from quick samples to bulk downloads.
- **Proxy fallback and retries**: Automatic fallback from no proxy → datacenter → residential, with robust retries for blocks (403/429), 5xx errors, timeouts, and connection/SSL issues.
- **Live dataset saving**: Pushes each item as it is processed to avoid data loss, supporting incremental pipelines and monitoring.
- **Developer-friendly outputs**: Structured JSON ready for analytics, dashboards, and integrations (use the Apify API from Python or apps like Make, n8n, Zapier).
How to use Reddit Posts Scraper - step by step
1. Sign in to your Apify account at console.apify.com.
2. Open the actor: search for "Reddit Posts Scraper" in the Apify Store.
3. Enter your sources in startUrls:
   - Subreddit names (e.g., news or r/technology)
   - Full URLs (e.g., https://www.reddit.com/r/news/)
   - Search keywords (e.g., artificial intelligence)
4. Configure sorting and time range:
   - sortOrder: hot, new, top, rising
   - timeFilter: hour, day, week, month, year, all (applies to top/rising)
5. Set limits and comment depth:
   - maxPosts: number of posts per source (1–1000)
   - maxComments: number of comments per post (0–1000; 0 skips comments)
6. Set proxyConfiguration as needed (e.g., enable Apify Proxy). The actor automatically falls back to residential proxies if blocked.
7. Click Start to run. Watch the logs for progress: the actor crawls sources, then processes comments in parallel.
8. Open the Output tab to view the "Reddit Posts Data" dataset. Export to JSON or CSV, or connect via the Apify API.
Pro Tip: Trigger runs programmatically with the Apify API and pipe results into your analytics stack or automation workflows, a robust alternative to ad-hoc Reddit scraping scripts or a PRAW-based setup.
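As a sketch of that programmatic workflow, the run below uses the official apify-client Python package (`pip install apify-client`). The token and actor ID are placeholders you would substitute, and the input dict mirrors this actor's documented parameters:

```python
def build_run_input(sources, max_posts=50, max_comments=100):
    """Assemble a run input matching this actor's documented parameters."""
    return {
        "startUrls": sources,
        "sortOrder": "top",
        "timeFilter": "week",
        "maxPosts": max_posts,
        "maxComments": max_comments,
        "proxyConfiguration": {"useApifyProxy": True},
    }

def run_scraper(token, actor_id, run_input):
    """Start the actor via the Apify API and collect the dataset items."""
    # Requires: pip install apify-client
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())

if __name__ == "__main__":
    # Placeholders: substitute your own Apify token and this actor's ID.
    items = run_scraper("<YOUR_APIFY_TOKEN>", "<ACTOR_ID>",
                        build_run_input(["r/technology", "artificial intelligence"]))
    print(f"Fetched {len(items)} items")
```

The same run input works when scheduling runs from the Apify Console or triggering them from tools like Make, n8n, or Zapier.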
Use cases
| Use case | Description |
|---|---|
| Market & trend research | Aggregate top posts by keyword or subreddit to quantify discussion volume and surface emerging topics. |
| NLP / ML datasets | Collect titles, bodies, and nested comment threads to build labeled corpora for sentiment analysis and topic modeling. |
| Content & SEO | Identify what resonates in your niche, extract quotes, and plan content around high-engagement threads. |
| Brand monitoring | Track mentions across communities, measure sentiment shifts, and flag high-velocity threads in real time. |
| Journalism & research | Compile public Reddit discussions and quotes with timestamps and permalinks for verifiable sourcing. |
| Automation & pipelines | Schedule runs via the Apify API, export JSON/CSV, and sync to BI tools or data lakes as a Reddit API scraper alternative. |
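For the market and trend research row above, a stdlib-only sketch can quantify discussion volume per subreddit from exported dataset items (keys taken from the output table; the function name is illustrative):

```python
from collections import Counter

def discussion_volume(items):
    """Aggregate post count and total comment count per subreddit from
    a list of dataset items (dicts with 'subreddit' and 'num_comments')."""
    posts, comments = Counter(), Counter()
    for item in items:
        sub = item.get("subreddit", "Unknown")
        posts[sub] += 1
        comments[sub] += item.get("num_comments", 0)
    # Sort by post count, descending, to surface the busiest communities.
    return sorted(
        ((sub, posts[sub], comments[sub]) for sub in posts),
        key=lambda row: row[1],
        reverse=True,
    )
```

Run this over a JSON export of several runs to compare discussion volume across keywords or communities over time.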
Why choose Reddit Posts Scraper?
Built for reliability and scale, this Reddit data scraper balances speed with resilience, without requiring a browser.
- Structured, consistent outputs that are analytics-ready (JSON/CSV).
- High-throughput comment fetching with parallel processing for full-thread scraping.
- Automatic proxy fallback and robust retries for blocks, 5xx errors, and timeouts: production-ready reliability.
- Developer access via the Apify API for integration with Python, ETL tools, and workflow automation.
- No browser overhead; efficient HTTP-based collection of public endpoints.
- Cost-effective and scalable, suitable for small experiments and larger pipelines alike.
- More stable than flaky extensions or manual scripts: managed infrastructure, monitoring, and dataset storage.
In short, itβs a dependable Reddit scraper tool for teams that need consistent, structured Reddit post extraction at scale.
Is it legal / ethical to use Reddit Posts Scraper?
Yes, when done responsibly. This actor is designed for public Reddit content only and does not access private subreddits or authenticated data.
Guidelines for compliant use:
- Scrape only publicly available Reddit content and respect community norms.
- Review Redditβs terms and apply reasonable rate limits using proxyConfiguration as needed.
- Avoid misuse of personal data in line with applicable regulations (e.g., GDPR, CCPA).
- Do not attempt to bypass authentication to access private resources.
- Consult your legal team for edge cases or jurisdiction-specific requirements.
Input parameters & output format
Example JSON input
```json
{
  "startUrls": [
    "https://www.reddit.com/r/news/",
    "news",
    "artificial intelligence"
  ],
  "sortOrder": "top",
  "timeFilter": "week",
  "maxPosts": 50,
  "maxComments": 100,
  "proxyConfiguration": { "useApifyProxy": false }
}
```
Parameter details:

- **startUrls** (array, required)
  Enter one item per line. Mix full URLs (e.g., https://www.reddit.com/r/news/), subreddit names (e.g., news or r/news), or search keywords (e.g., artificial intelligence). Duplicate subreddits are merged.
  Default: none (required)
- **maxPosts** (integer)
  Maximum number of posts to scrape per subreddit or keyword (1–1000).
  Default: 50
- **maxComments** (integer)
  Maximum comments to fetch for each post (0–1000). Set to 0 to skip comments and return only post metadata.
  Default: 100
- **sortOrder** (string)
  How Reddit should sort posts: hot (trending), new (latest), top (most upvoted), rising (gaining traction).
  Allowed values: "hot", "new", "top", "rising"
  Default: "top"
- **timeFilter** (string)
  Time range for results. Only applies when sortOrder is top or rising; ignored for hot and new.
  Allowed values: "hour", "day", "week", "month", "year", "all"
  Default: "week"
- **proxyConfiguration** (object)
  Choose which proxies to use. If Reddit blocks a request, the actor automatically falls back: no proxy → datacenter → residential. Recommended for large runs or when you hit blocks.
  Default: { "useApifyProxy": false }
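For large runs you would typically enable Apify Proxy explicitly rather than wait for fallback. One possible configuration, assuming the standard Apify apifyProxyGroups option (group availability depends on your Apify plan):

```json
{
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
```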
Example JSON output item
```json
{
  "subreddit": "news",
  "title": "Example post title",
  "author": "username",
  "score": 156,
  "num_comments": 42,
  "created_utc": 1703123456.789,
  "permalink": "https://www.reddit.com/r/news/comments/abc123/...",
  "body": "Post content...",
  "thumbnail_url": "https://...",
  "image_url": "https://...",
  "comments": [
    {
      "author": "commenter1",
      "body": "Comment text...",
      "score": 23,
      "created_utc": 1703123456.789,
      "replies": []
    }
  ],
  "post_id": "abc123",
  "success": true,
  "error_message": null
}
```
Notes:
- The comments field contains nested replies (recursive structure).
- Some fields may be "Unknown" or null if not provided by Reddit for a given post.
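Since created_utc arrives as a Unix timestamp, a small stdlib helper (field name from the output above; the function itself is illustrative) can convert it to ISO-8601 while tolerating missing values:

```python
from datetime import datetime, timezone

def created_at_iso(item):
    """Convert a dataset item's created_utc (Unix seconds, UTC) to an
    ISO-8601 string, returning None when Reddit did not provide it."""
    ts = item.get("created_utc")
    if ts is None:
        return None
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
```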
FAQ
Is there a free tier to try it?
Yes. You can run small jobs on Apifyβs free plan to evaluate the actor before scaling up. Larger workloads may require enabling proxies for reliability.
Does it include comments and replies?
Yes. Set maxComments > 0 to fetch nested comment threads. If you set maxComments to 0, the actor returns only post metadata without comments.
Can I target multiple subreddits or keywords in one run?
Yes. Add as many as you need to startUrls; you can mix subreddit names, full Reddit URLs, and search keywords in the same list.
How does it handle blocks or rate limits?
The actor automatically falls back from no proxy → datacenter → residential and retries on common errors (403/429, 5xx, timeouts, connection/SSL issues).
Which formats can I export to?
You can export the dataset to JSON or CSV from Apify, or access results programmatically via the Apify API.
Can developers integrate this with Python or other workflows?
Yes. Fetch the dataset via the Apify API and plug it into your Python Reddit scraper, data pipelines, or automation tools like Make, n8n, and Zapier.
What types of sources can I input?
You can input subreddits (e.g., news or r/technology), full Reddit URLs, or search keywords to scrape Reddit search results directly.
Does it require a browser or login?
No. This actor collects public Reddit data efficiently without a browser or login, focusing on structured, reliable output.
Closing CTA / Final thoughts
Reddit Posts Scraper is built for structured extraction of public Reddit posts and comment threads at scale. With flexible inputs, sorting and time filters, scalable limits, proxy fallback, and robust retries, it delivers reliable datasets for analysis and automation. It is ideal for marketers, developers, data analysts, and researchers who need a dependable Reddit post extractor with JSON/CSV exports and API access. Use the Apify API to wire it into your pipelines or trigger runs from your Python workflows, and start extracting smarter Reddit insights today.