Reddit Scraper
Pricing
$19.99/month + usage
🔎 Reddit Scraper (reddit-scraper) extracts posts, comments & metadata from subreddits, users and threads — keywords, timestamps, scores & links. 📤 Export JSON/CSV. 🚀 Ideal for market research, social listening, academic studies & content discovery.
Rating
0.0 (0 reviews)
Developer
Scrapium
Actor stats
- Bookmarked: 0
- Total users: 2
- Monthly active users: 1
- Last modified: 6 days ago
Reddit Scraper
Reddit Scraper is an Apify actor that extracts Reddit posts, comment threads, subreddit listings/metadata, and user profiles using Reddit’s public JSON API — no login required. It solves the pain of manually collecting Reddit data by automating discovery and extraction at scale. As a Reddit API scraper and Reddit web scraper built in Python, it’s ideal for marketers, developers, data analysts, and researchers who need to scrape Reddit posts, scrape subreddit posts, or scrape Reddit comments for social listening, market research, and academic studies. Run it once or schedule it to power always-on Reddit data extraction pipelines. 🚀
What data / output can you get?
Here are the key fields the actor returns in the dataset. Values below reflect real field names and structures used by the actor.
| Field | Description | Example value |
|---|---|---|
| metadata.timestamp | ISO UTC timestamp of the run summary item | 2026-04-12T13:07:24Z |
| metadata.config.maxItemsToSave | Global limit applied to certain queries | 10 |
| data.search. | Count of posts matching a search term | 25 |
| data.search. | Post title from search results | “Best Python tips for data viz” |
| data.subreddit. | Whether a subreddit listing item is a self post | true |
| data.posts. | Post upvote ratio | 0.92 |
| data.posts. | Nested replies array for each comment | [ { … }, … ] |
| data.users. | Sum of link and comment karma | 15342 |
| data.users. | Link to a user’s submitted post | https://reddit.com/r/python/comments/abc123/... |
| data.users. | Truncated comment body (200 chars) | “I’d recommend using aiohttp with backoff…” |
| data.users. | Count of items in overview (if fetched) | 40 |
| data.subreddit_info.popular[].subscribers | Subscriber count for popular subreddits | 845321 |
Notes:
- The actor pushes streaming items for each section (e.g., per search term, per subreddit, per post, per user, subreddit info batches), and finally pushes a single aggregated “summary” object with metadata and a nested data object. You can export results to JSON or CSV directly from the Apify dataset.
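Because each streamed item carries a `type` field while the final summary does not, downstream code can split a dataset into per-section buckets with a few lines. A minimal sketch in Python, using item shapes taken from this README (the summary is identified by the absence of `type` — an assumption based on the examples below, not a documented contract):

```python
from collections import defaultdict

def group_items(items):
    """Split dataset items into per-type buckets plus the final summary.

    Items with a "type" key (e.g. "search", "subreddit", "post", "user",
    "subreddit_info") are bucketed; the single item without "type" is
    treated as the aggregated summary object.
    """
    buckets = defaultdict(list)
    summary = None
    for item in items:
        if "type" in item:
            buckets[item["type"]].append(item)
        else:
            summary = item  # final aggregated object with metadata + data
    return dict(buckets), summary

# Example with shapes mirroring this README's output section:
items = [
    {"type": "search", "query": "data visualization", "total": 3},
    {"type": "post", "postId": "abc123", "total_comments_parsed": 1},
    {"metadata": {"timestamp": "2026-04-12T13:07:24Z"}, "data": {}},
]
buckets, summary = group_items(items)
print(sorted(buckets))  # ['post', 'search']
```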
Key features
- 🌐 No-login Reddit data extraction
- Uses Reddit’s public JSON API to collect public data; perfect for a Reddit scraper Python workflow without cookies or auth.
- 🔗 Bulk Start URLs (posts, subreddits, users)
- Paste multiple Reddit URLs to scrape Reddit posts, scrape subreddit posts by sort, and collect user profiles with submitted posts and comments.
- 🔎 Flexible search with sort & time filters
- Run keyword searches across Reddit or within a specific community, with sort (relevance/new/hot/top/comments) and time filter (hour/day/week/month/year/all).
- 🧵 Post + full comment tree (depth control)
- Capture a post and its nested comments, with configurable max depth to tailor “Reddit comment extractor” needs.
- 👤 User profiles, submissions, and comments
- Fetch profile karma and history with per-user item limits; optionally add a concise overview summary.
- 🏘️ Subreddit listings & metadata
- Collect hot/new/top/rising/controversial listings and enrich with subreddit_info: popular, new, and specific communities’ details.
- 🧰 Limits & performance controls
- Cap items per run, per page, and comments per page; set delays between requests; constrain comment tree depth for predictable output sizes.
- 🔄 Smart proxy fallback for reliability
- Starts direct; on blocks automatically falls back to datacenter, then residential proxies (with retries) to keep your Reddit crawler running smoothly.
- 💾 Developer friendly outputs
- Clean JSON structures ready for pipelines, dashboards, or ML — ideal for Reddit data extraction and Reddit sentiment analysis scraper use cases.
- 📤 Easy export
- Export datasets to JSON/CSV via the Apify platform for downstream analytics and automation.
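The proxy fallback described above follows a simple escalation pattern: try direct, then datacenter, then residential, with retries at each tier. A hedged sketch of that pattern (the `fetch_via` callable and tier names are illustrative, not the actor's actual internals):

```python
def fetch_with_fallback(url, fetch_via,
                        tiers=("direct", "datacenter", "residential"),
                        retries=2):
    """Try each connection tier in order; escalate after `retries` failures.

    `fetch_via(url, tier)` stands in for whatever HTTP call is used; it
    should raise on a block or rate limit and return a response otherwise.
    """
    last_error = None
    for tier in tiers:
        for _ in range(retries):
            try:
                return tier, fetch_via(url, tier)
            except Exception as exc:  # blocked or rate-limited at this tier
                last_error = exc
    raise RuntimeError(f"all tiers failed for {url}") from last_error
```

A real implementation would also remember the first working tier and reuse it for the rest of the run, which is the behavior this actor documents.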
How to use Reddit Scraper - step by step
1. Sign in to Apify Console.
2. Open the Reddit Scraper actor (reddit-scraper) from your Actors.
3. Add input:
   - Start URLs: paste Reddit post URLs (/comments/…), subreddit URLs (/r/…), or user URLs (/user/…).
   - Or provide Search Term(s) and enable “Ignore start URLs” to run search-only jobs.
4. Configure key settings:
   - Sort and time filters for search.
   - Limits: maximum items to save, posts/comments per page, max comment depth, and request delay.
   - Toggles: enable/disable subreddit posts, post + comments, user scraper, and subreddit info.
   - Proxy settings: leave off to start direct; the actor automatically falls back to datacenter → residential proxies if blocked.
5. Start the run and watch progress in the Log. The actor streams items to the dataset as it completes each section.
6. Review results in the Dataset. You’ll see per-section items (e.g., type: “search”, “subreddit”, “post”, “user”, “subreddit_info”) and a final aggregated object with metadata and nested “data”.
7. Export your data. Download JSON/CSV from the Dataset tab, or use the Apify API in your Reddit API scraper pipeline.
Pro tip: Schedule recurring runs for ongoing monitoring, or connect the Apify Dataset API to your Reddit scraper Python workflows and data warehouses.
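The same steps can be driven from Python via the Apify API. A sketch, assuming the `apify-client` package and an illustrative actor ID (`scrapium/reddit-scraper` — check your Apify Console for the real ID and your API token); the `build_search_input` helper is hypothetical, but its field names come from this README:

```python
def build_search_input(terms, community=None, max_items=10):
    """Assemble a search-only run input using field names from this README."""
    run_input = {
        "searchTerms": list(terms),
        "ignoreStartUrls": True,   # search-only mode
        "enableSearch": True,
        "sortSearch": "new",
        "timeFilter": "all",
        "maxItemsToSave": max_items,
    }
    if community:
        run_input["searchCommunity"] = community  # restrict to one subreddit
    return run_input

# Uncomment to actually start a run (requires an Apify token):
# from apify_client import ApifyClient
# client = ApifyClient("<APIFY_TOKEN>")
# run = client.actor("scrapium/reddit-scraper").call(
#     run_input=build_search_input(["asyncio"], community="python"))
# items = client.dataset(run["defaultDatasetId"]).list_items().items
```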
Use cases
| Use case name | Description |
|---|---|
| Market research & trend tracking | Analyze topics, upvotes, and comment volume to quantify interest and sentiment in your niche. |
| Social listening for brands | Monitor subreddit posts and comment threads to surface complaints, ideas, and feature requests. |
| Academic & policy research | Collect reproducible datasets of posts and comments for studies, leveraging a structured Reddit data extractor. |
| Competitive intelligence | Track competitor communities and product feedback across relevant subreddits. |
| Content strategy & SEO | Discover high-performing threads and questions to inform content calendars and keyword targeting. |
| Customer support insights | Harvest user pain points from comments for faster triage and product improvements. |
| Data engineering pipelines | Build automated Reddit crawler pipelines using Apify’s API and export JSON/CSV into your analytics stack. |
Why choose Reddit Scraper?
This production-ready Reddit scraping tool emphasizes precision, automation, and reliability.
- ✅ Accurate JSON from public endpoints: Clean, structured fields straight from Reddit’s public JSON API.
- 🌍 Scalable and flexible: Bulk Start URLs and keyword searches with configurable limits and depths.
- 🧑💻 Developer-ready: Structured outputs fit data pipelines; perfect for Reddit scraper Python integration and API-driven workflows.
- 🔐 Safe by design: No login required; collects public data only.
- 🔄 Resilient infrastructure: Automatic proxy fallback (direct → datacenter → residential) keeps runs stable under rate limiting.
- 💸 Cost-effective alternative: Avoid brittle browser extensions and manual copy-paste with a robust, server-side Reddit web scraper.
Bottom line: It’s a reliable Reddit post scraper and Reddit data extraction tool that balances control, scale, and clean outputs.
Is it legal / ethical to use Reddit Scraper?
Yes — when used responsibly. This actor accesses publicly available Reddit data without authentication and does not target private content. You should:
- Scrape only public data.
- Respect Reddit’s terms and rate limits.
- Observe applicable data protection laws (e.g., GDPR, CCPA) and internal policies.
- Avoid misuse (e.g., unsolicited outreach/spam). Always verify compliance with your legal team for your specific use case.
Input parameters & output format
Example JSON input
```json
{
  "startUrls": [
    "https://www.reddit.com/r/python/",
    "https://www.reddit.com/r/dataisbeautiful/",
    "https://www.reddit.com/r/Python/comments/abc123/example_post_title/",
    "https://www.reddit.com/user/spez/"
  ],
  "searchTerms": ["data visualization", "asyncio"],
  "searchCommunity": "python",
  "ignoreStartUrls": false,
  "sortSearch": "new",
  "timeFilter": "all",
  "enableSearch": true,
  "enableSubreddit": true,
  "sortSubreddit": "hot",
  "enablePost": true,
  "enableUser": true,
  "enableSubredditInfo": true,
  "skipComments": false,
  "skipUserPosts": false,
  "skipCommunity": false,
  "maxItemsToSave": 10,
  "limitPostsPerPage": 10,
  "limitCommentsPerPage": 10,
  "limitCommunityPages": 2,
  "limitUserPages": 2,
  "pageScrollTimeout": 40,
  "maxCommentDepth": 5,
  "maxItemsPerUser": 20,
  "fetchUserProfile": true,
  "fetchUserSubmitted": true,
  "fetchUserComments": true,
  "fetchUserOverview": false,
  "fetchPopularSubreddits": false,
  "fetchNewSubreddits": false,
  "maxSubredditsInfo": 25,
  "requestDelaySeconds": 1,
  "proxyConfiguration": { "useApifyProxy": false }
}
```
All input fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| startUrls | array | Yes | — | One or more Reddit URLs to scrape. Supports bulk. Accepts post (/comments/…), subreddit (/r/…), and user (/user/…) URLs. |
| skipComments | boolean | No | false | Do not fetch comments for any post URLs; only post-level data is collected. |
| skipUserPosts | boolean | No | false | Ignore user profile URLs; no user data is scraped. |
| skipCommunity | boolean | No | false | Do not fetch subreddit/community metadata. |
| searchTerms | array | No | — | One or more search phrases. Used in search mode (when no Start URLs are provided, or “Ignore start URLs” is enabled). |
| searchCommunity | string | No | — | Optional subreddit name (without r/). Restricts search to a specific community. |
| ignoreStartUrls | boolean | No | false | Skip all Start URLs and run search-only mode (requires at least one Search Term). |
| searchForPosts | boolean | No | true | Include posts in search results. |
| searchForComments | boolean | No | false | Reserved for future use: include comments in the search scope. |
| searchForCommunities | boolean | No | false | Reserved for future use: include communities in the search scope. |
| searchForUsers | boolean | No | false | Reserved for future use: include users in the search scope. |
| sortSearch | string | No | new | Order search results: relevance, new, hot, top, comments. |
| timeFilter | string | No | all | Search time window: hour, day, week, month, year, all. |
| filterByDate | string | No | — | Optional absolute or relative filter for posts (e.g., 2024-01-15 or “2 weeks”). |
| enableSearch | boolean | No | true | Run Reddit search when in search mode and terms are provided. |
| enableSubreddit | boolean | No | true | Process subreddit URLs to fetch listings by sort. |
| sortSubreddit | string | No | hot | Subreddit listing sort: hot, new, top, rising, controversial. |
| enablePost | boolean | No | true | Process post URLs to fetch full post and comment tree (unless Skip comments is on). |
| enableUser | boolean | No | true | Process user URLs to fetch profile, submitted posts, and comments. |
| enableSubredditInfo | boolean | No | true | Fetch subreddit metadata for popular/new/specific communities. |
| maxItemsToSave | integer | No | 10 | Global maximum number of items to collect across sources. |
| limitPostsPerPage | integer | No | 10 | Maximum posts to fetch from a single subreddit listing page. |
| postDateLimit | string | No | — | Only include posts created on or after this date (YYYY-MM-DD). |
| limitCommentsPerPage | integer | No | 10 | Maximum number of comments retrieved per post. |
| limitCommunityPages | integer | No | 2 | Maximum listing pages to paginate for each subreddit URL. |
| limitUserPages | integer | No | 2 | Maximum pages to fetch for user submissions and comments. |
| pageScrollTimeout | integer | No | 40 | Timeout in seconds for page/scroll operations. |
| maxCommentDepth | integer | No | 5 | Maximum nested comment depth to parse (1–20). |
| maxItemsPerUser | integer | No | 20 | Max submitted posts and comments per user profile. |
| fetchUserProfile | boolean | No | true | Fetch user profile details (name, karma, created). |
| fetchUserSubmitted | boolean | No | true | Fetch posts submitted by each user. |
| fetchUserComments | boolean | No | true | Fetch comments made by each user. |
| fetchUserOverview | boolean | No | false | Fetch combined user overview and add summary counts. |
| fetchPopularSubreddits | boolean | No | false | Fetch Reddit’s popular subreddit list. |
| fetchNewSubreddits | boolean | No | false | Fetch recently created subreddits list. |
| maxSubredditsInfo | integer | No | 25 | Maximum number of subreddits for popular/new lists. |
| requestDelaySeconds | integer | No | 1 | Delay between HTTP requests to Reddit in seconds. |
| proxyConfiguration | object | No | { "useApifyProxy": false } | Configure Apify Proxy. Actor auto-falls back to datacenter → residential on blocks if not set. |
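The `filterByDate` field accepts either an absolute date (`2024-01-15`) or a relative span (`"2 weeks"`). The actor's exact parsing is internal; as an illustration only, here is how such a value might be interpreted as a UTC cutoff (unit day-counts are assumptions):

```python
from datetime import datetime, timedelta, timezone

def parse_filter_by_date(value, now=None):
    """Interpret a filterByDate-style value as a UTC cutoff datetime.

    Accepts an absolute date ("2024-01-15") or a relative span such as
    "2 weeks" / "3 days". Mirrors the documented examples; the actor's
    internal parsing may differ.
    """
    now = now or datetime.now(timezone.utc)
    units = {"day": 1, "week": 7, "month": 30, "year": 365}  # assumed
    parts = value.split()
    if len(parts) == 2 and parts[0].isdigit():
        unit = parts[1].rstrip("s")  # accept "week" or "weeks"
        if unit in units:
            return now - timedelta(days=int(parts[0]) * units[unit])
    return datetime.strptime(value, "%Y-%m-%d").replace(tzinfo=timezone.utc)
```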
Output format
During the run, the actor streams items to the dataset by section and then pushes a final aggregated object. Examples:
- Search item
```json
{
  "type": "search",
  "query": "data visualization",
  "community": "python",
  "total": 3,
  "posts": [
    { "title": "Matplotlib vs Plotly for dashboards", "subreddit": "Python", "author": "chart_ninja", "score": 128, "num_comments": 42, "url": "https://example.com/post-url", "permalink": "https://reddit.com/r/Python/comments/abc123/...", "created_utc": 1712875200 }
  ]
}
```
- Subreddit listing item
```json
{
  "type": "subreddit",
  "source": "python",
  "total": 2,
  "posts": [
    { "title": "Asyncio tips for web scraping", "author": "py_async", "score": 210, "num_comments": 33, "url": "https://example.com/post-url", "permalink": "https://reddit.com/r/Python/comments/def456/...", "created_utc": 1712878800, "is_self": true, "selftext": "Here are some tips..." }
  ]
}
```
- Post with comments item
```json
{
  "type": "post",
  "postId": "abc123",
  "post": { "id": "abc123", "title": "Show HN: My Python scraper", "author": "dev_user", "subreddit": "Python", "score": 512, "upvote_ratio": 0.95, "num_comments": 87, "created_utc": 1712871000, "url": "https://example.com/post-url", "permalink": "https://reddit.com/r/Python/comments/abc123/...", "is_self": true, "selftext": "I built a scraper...", "link_flair_text": "Project", "over_18": false },
  "comments": [
    { "id": "c1", "author": "commenter1", "body": "Nice work!", "score": 15, "created_utc": 1712874600, "depth": 0, "replies": [] }
  ],
  "total_comments_parsed": 1
}
```
- User item
```json
{
  "type": "user",
  "username": "spez",
  "profile": { "name": "spez", "link_karma": 1000, "comment_karma": 500, "total_karma": 1500, "created_utc": 1133212800 },
  "submitted": [
    { "title": "Announcement", "subreddit": "announcements", "score": 999, "created_utc": 1712870000, "permalink": "https://reddit.com/r/announcements/comments/ghi789/..." }
  ],
  "comments": [
    { "body": "Thanks for the feedback.", "subreddit": "redditdev", "score": 42, "created_utc": 1712873600, "permalink": "https://reddit.com/r/redditdev/comments/jkl012/..." }
  ],
  "overview": null
}
```
- Subreddit info items
```json
{
  "type": "subreddit_info",
  "kind": "popular",
  "count": 2,
  "subreddits": [
    { "name": "python", "title": "Python", "description": "News about the programming language Python", "subscribers": 1000000, "active_users": 8500, "created_utc": 1200000000, "over_18": false, "subreddit_type": "public", "url": "https://reddit.com/r/python/", "icon_img": "", "banner_img": "", "community_icon": "" }
  ]
}
```
- Final aggregated object (summary)
```json
{
  "metadata": {
    "timestamp": "2026-04-12T13:07:24Z",
    "config": { "maxItemsToSave": 10, "limitPostsPerPage": 10, "maxCommentDepth": 5 }
  },
  "data": {
    "search": {
      "data visualization": {
        "total": 3,
        "posts": [
          { "title": "Matplotlib vs Plotly for dashboards", "subreddit": "Python", "author": "chart_ninja", "score": 128, "num_comments": 42, "url": "https://example.com/post-url", "permalink": "https://reddit.com/r/Python/comments/abc123/...", "created_utc": 1712875200 }
        ]
      }
    },
    "subreddit": {
      "python": {
        "total": 2,
        "posts": [
          { "title": "Asyncio tips for web scraping", "author": "py_async", "score": 210, "num_comments": 33, "url": "https://example.com/post-url", "permalink": "https://reddit.com/r/Python/comments/def456/...", "created_utc": 1712878800, "is_self": true, "selftext": "Here are some tips..." }
        ]
      }
    },
    "posts": {
      "abc123": {
        "post": { "id": "abc123", "title": "Show HN: My Python scraper", "author": "dev_user", "subreddit": "Python", "score": 512, "upvote_ratio": 0.95, "num_comments": 87, "created_utc": 1712871000, "url": "https://example.com/post-url", "permalink": "https://reddit.com/r/Python/comments/abc123/...", "is_self": true, "selftext": "I built a scraper...", "link_flair_text": "Project", "over_18": false },
        "comments": [
          { "id": "c1", "author": "commenter1", "body": "Nice work!", "score": 15, "created_utc": 1712874600, "depth": 0, "replies": [] }
        ],
        "total_comments_parsed": 1
      }
    },
    "users": {
      "spez": {
        "profile": { "name": "spez", "link_karma": 1000, "comment_karma": 500, "total_karma": 1500, "created_utc": 1133212800 },
        "submitted": [
          { "title": "Announcement", "subreddit": "announcements", "score": 999, "created_utc": 1712870000, "permalink": "https://reddit.com/r/announcements/comments/ghi789/..." }
        ],
        "comments": [
          { "body": "Thanks for the feedback.", "subreddit": "redditdev", "score": 42, "created_utc": 1712873600, "permalink": "https://reddit.com/r/redditdev/comments/jkl012/..." }
        ],
        "overview": { "total_items": 0, "posts_count": 0, "comments_count": 0 }
      }
    },
    "subreddit_info": {
      "popular": [
        { "name": "python", "title": "Python", "description": "News about the programming language Python", "subscribers": 1000000, "active_users": 8500, "created_utc": 1200000000, "over_18": false, "subreddit_type": "public", "url": "https://reddit.com/r/python/", "icon_img": "", "banner_img": "", "community_icon": "" }
      ],
      "new": [],
      "specific": {
        "python": { "name": "python", "title": "Python", "description": "News about the programming language Python", "subscribers": 1000000, "active_users": 8500, "created_utc": 1200000000, "over_18": false, "subreddit_type": "public", "url": "https://reddit.com/r/python/", "icon_img": "", "banner_img": "", "community_icon": "" }
      }
    }
  }
}
```
Notes:
- Depending on your toggles, some optional objects may be missing or null (e.g., user.profile when disabled; overview when off).
- Export JSON or CSV from the Dataset UI or via the Apify API.
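The nested `replies` arrays in post items form a tree; for analytics you often want a flat list. A minimal sketch using the comment shape shown in the examples above (the helper itself is illustrative, not part of the actor):

```python
def flatten_comments(comments, max_depth=None):
    """Depth-first flatten of a nested comment tree into a flat list.

    Each flat copy keeps the comment's own fields; the "replies" key is
    recursed into (and dropped from the copies), stopping at `max_depth`
    levels when one is given.
    """
    flat = []

    def walk(nodes, depth):
        for node in nodes:
            if max_depth is not None and depth >= max_depth:
                return
            flat.append({k: v for k, v in node.items() if k != "replies"})
            walk(node.get("replies", []), depth + 1)

    walk(comments, 0)
    return flat

tree = [{"id": "c1", "body": "Nice work!", "replies": [
    {"id": "c2", "body": "Agreed", "replies": []},
]}]
print([c["id"] for c in flatten_comments(tree)])  # ['c1', 'c2']
```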
FAQ
Do I need to log in or use API keys to run this Reddit Scraper?
No. The actor uses Reddit’s public JSON API and does not require login or API keys. It’s a Reddit scraper Python solution that works on publicly available data only.
Can it scrape Reddit comments as well as posts?
Yes. When “Enable post + comments” is on and “Skip comments” is off, it fetches full posts and parses comment trees up to the “Max comment tree depth” you set.
How do I restrict search to a specific subreddit?
Provide your Search Term(s) and set “Community (optional)” to the subreddit name (without r/). Enable “Ignore start URLs” if you want search-only mode.
What limits control the volume of data?
Use “Maximum number of items to be saved,” “Limit of posts scraped inside a single page,” “Limit of comments scraped inside a single page,” “Max items per user,” and “Max comment tree depth.” You can also adjust “Delay between requests (seconds)” to manage pacing and reduce rate limiting.
How reliable is it when Reddit rate-limits or blocks requests?
The actor starts with direct connections and automatically falls back to Apify datacenter proxy, then residential proxy (with retries) if blocked. Once a working proxy is found, it’s used for all remaining requests.
What output formats are supported?
Data is stored in the Apify Dataset. You can export to JSON or CSV and consume via the Apify API for integration into your Reddit API scraper pipelines.
Can I integrate this with my own Python scripts or data pipelines?
Yes. Access the Dataset via the Apify API and process outputs in your scripts. The structured JSON is designed for easy ingestion in automation, analytics, or ML workflows.
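As one example of downstream processing, a standard-library sketch that renders a list of post dicts (as emitted in the output examples above) to CSV text; the column selection is an assumption, pick whichever fields you need:

```python
import csv
import io

def posts_to_csv(posts, fields=("title", "subreddit", "author",
                                "score", "num_comments", "permalink")):
    """Render post dicts as CSV text, ignoring any extra keys."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fields), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(posts)
    return buf.getvalue()

posts = [{"title": "Matplotlib vs Plotly for dashboards", "subreddit": "Python",
          "author": "chart_ninja", "score": 128, "num_comments": 42,
          "permalink": "https://reddit.com/r/Python/comments/abc123/...",
          "created_utc": 1712875200}]
print(posts_to_csv(posts).splitlines()[0])
# title,subreddit,author,score,num_comments,permalink
```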
Is it safe and compliant to use?
Yes — when used responsibly on public content. The actor does not access private or authenticated data. Always respect Reddit’s terms and applicable data protection laws.
Final thoughts
Reddit Scraper is built for scalable, structured Reddit data extraction without login. With bulk URL support, flexible search, full comment parsing, user activity capture, and resilient proxy fallback, it’s ideal for marketers, developers, data analysts, and researchers. Export clean JSON/CSV, connect via the Apify API, and automate your Reddit web scraper workflows with confidence. Start extracting smarter insights from Reddit today.