Reddit Scraper
Pricing
$19.99/month + usage
Extract posts, comments, and user data from Reddit with the Reddit Scraper. Collect post titles, descriptions, upvotes, comment counts, subreddit names, and author usernames automatically. Ideal for market research, trend discovery, and community analysis.
Rating: 0.0 (0 reviews)
Developer: ScrapAPI
Actor stats: 0 bookmarked, 2 total users, 1 monthly active user
Last modified: 8 days ago
Reddit Scraper
The Reddit Scraper is a production-ready Reddit web scraper that collects public posts, comments, subreddit listings, and user data at scale — without login or OAuth. It solves the pain of manual copy-paste and API credential management by leveraging Reddit’s public JSON endpoints, making it ideal for marketers, developers, data analysts, and researchers who need a reliable Reddit data scraper. With bulk Start URLs and keyword search, you can scrape subreddit posts, extract comment threads, or build user activity datasets — enabling research, monitoring, and enrichment pipelines at scale.
What data / output can you get?
This actor pushes incremental dataset items for each scraped section and finishes with a single summary object. Below are key output fields you can expect across items:
| Field | Description | Example value |
|---|---|---|
| title | Post title from search or subreddit listings | "Show HN: I built a Python library" |
| subreddit | Subreddit name | "python" |
| author | Post or comment author | "someuser" |
| score | Score/upvotes on posts or comments | 245 |
| num_comments | Number of comments on a post | 56 |
| url | Canonical post URL | "https://www.reddit.com/r/python/comments/abc123/my_post/" |
| permalink | Reddit permalink for the post or item | "https://reddit.com/r/python/comments/abc123/..." |
| created_utc | UNIX timestamp (UTC) | 1712072000 |
| is_self | Self/text post flag (subreddit listings) | false |
| selftext | Post selftext (truncated in listings) | "Here’s how it works..." |
| upvote_ratio | Upvote ratio for a post (post+comments) | 0.95 |
| total_comments_parsed | Top-level comments parsed for the post | 10 |
| profile.name | Scraped user profile name | "reddit_user" |
| profile.total_karma | Total user karma | 12450 |
| submitted[].title | Title of a user’s submitted post | "My first project" |
| comments[].body | Body of a user’s comment (truncated) | "I agree with this..." |
| overview.total_items | Count of items in user overview | 40 |
| subreddit_info[].subscribers | Subscriber count for a subreddit | 1543200 |
| subreddit_info[].subreddit_type | Type of community | "public" |
Notes:
- The dataset can be exported to JSON or CSV, or accessed via the Apify API.
- Run metadata is included in the final summary item under metadata.timestamp and metadata.config.
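As a sketch of downstream processing, assuming you have exported the dataset to JSON, you can split items into per-type buckets and separate out the final summary item (which has no `type` key and carries `metadata`). The item shapes below are minimal stand-ins modeled on the schema above:

```python
from collections import defaultdict

def group_items(items):
    """Bucket dataset items by their "type" field; return the final
    summary item (identified by "metadata" and no "type") separately."""
    buckets = defaultdict(list)
    summary = None
    for item in items:
        if "metadata" in item and "type" not in item:
            summary = item
        else:
            buckets[item.get("type", "unknown")].append(item)
    return dict(buckets), summary

# Minimal stand-in items shaped like the actor's output:
items = [
    {"type": "search", "query": "python scraping", "total": 1, "posts": []},
    {"type": "post", "postId": "abc123", "total_comments_parsed": 1},
    {"metadata": {"timestamp": "2026-04-02T13:30:00Z"}, "data": {}},
]
buckets, summary = group_items(items)
```

From here, `buckets["post"]` or `buckets["search"]` can feed whatever analysis you run next, and `summary` gives you the run's configuration and timestamp.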
Key features
- 🚀 Bulk Start URLs & smart detection: Paste post, subreddit, or user profile links into startUrls; the actor detects each type and runs the appropriate scraper (post + comments, subreddit listing, or user profile).
- 🔎 Search with sort and time window: Use searchTerms with sortSearch and timeFilter to run keyword-driven collection across Reddit or within a specific community (searchCommunity), ideal for trend discovery.
- 💬 Post + comments with depth control: Scrape full posts and parse comment trees up to maxCommentDepth. Toggle skipComments to collect only post-level data when needed.
- 📰 Subreddit listings by sort: Collect posts from subreddits with sortSubreddit (hot, new, top, rising, controversial) and limit the volume with limitPostsPerPage.
- 👤 User profiles, submissions, and comments: Fetch user profiles (fetchUserProfile), submitted posts (fetchUserSubmitted), comments (fetchUserComments), and optional overview summaries (fetchUserOverview).
- 🧠 Robust proxy fallback: Starts without a proxy; on block or rate-limit, automatically falls back to Apify datacenter and then residential proxies (up to 3 retries). You can also preconfigure proxyConfiguration.
- ⏱️ Rate controls and limits: Fine-tune requestDelaySeconds and global caps like maxItemsToSave, limitCommentsPerPage, and maxItemsPerUser for predictable, efficient runs.
- 📦 Structured, developer-friendly output: Output is pushed incrementally per section, plus a single final aggregate with metadata and data (search, subreddit, posts, users, subreddit_info) for easy downstream processing via the Apify API.
- 🔐 No login required: Collect public Reddit data without authentication, making it a practical Reddit scraping tool for "scraping without API" (no OAuth) and an alternative to PRAW/Pushshift-based workflows.
How to use Reddit Scraper - step by step
1. Sign in to Apify Console and open the Reddit Scraper actor.
2. Add input under Start URLs (post, subreddit, or user profile links). For search-only runs, you can leave this empty and enable ignoreStartUrls.
3. Configure search (optional): add searchTerms, set searchCommunity (optional), choose sortSearch and timeFilter.
4. Choose what to scrape: toggle enableSearch, enableSubreddit, enablePost, enableUser, and enableSubredditInfo as needed. Use skipComments, skipUserPosts, or skipCommunity to omit specific data.
5. Set limits and controls: maxItemsToSave, limitPostsPerPage, limitCommentsPerPage, maxCommentDepth, maxItemsPerUser, and requestDelaySeconds.
6. Subreddit info (optional): fetchPopularSubreddits, fetchNewSubreddits, and cap with maxSubredditsInfo.
7. Proxy setup: leave the default (automatic fallback) or configure proxyConfiguration to use Apify Proxy from the start.
8. Run the actor. Monitor progress in the Log tab; results will stream into the Dataset.
9. Export results to JSON or CSV from the Dataset, or fetch them programmatically via the Apify API.
Pro tip: To run pure keyword searches across Reddit without scraping any specific URLs, provide searchTerms and enable ignoreStartUrls.
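A search-only run like the pro tip describes can be sketched as the following input object (field names come from the parameters reference below; the values are illustrative, not defaults):

```python
import json

# Minimal search-only input: no Start URLs, ignoreStartUrls enabled,
# keyword terms restricted to one community. Values are illustrative.
run_input = {
    "startUrls": [],
    "ignoreStartUrls": True,
    "searchTerms": ["python scraping"],
    "searchCommunity": "python",
    "sortSearch": "new",
    "timeFilter": "month",
    "enableSearch": True,
    "maxItemsToSave": 50,
    "requestDelaySeconds": 1,
}
print(json.dumps(run_input, indent=2))
```

Paste the printed JSON into the actor's input editor in Apify Console, or pass the dict as the run input when starting the actor via an Apify API client.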
Use cases
| Use case name | Description |
|---|---|
| Market research & trend discovery | Track topics and keywords across subreddits using searchTerms with sortSearch/timeFilter to quantify interest and identify emerging trends. |
| Community analysis | Collect subreddit listings and subreddit_info to evaluate engagement (subscribers, active users), post types, and content themes. |
| Voice-of-customer (VoC) mining | Scrape Reddit comments and user submissions around products or features to extract qualitative insights for product and UX teams. |
| Competitor tracking | Monitor posts and comments mentioning competitors, aggregating signals from multiple subreddits and searches. |
| Academic & social research | Build datasets of posts, comments, and community metadata for reproducible studies using consistent, structured fields. |
| Brand monitoring | Focus on brand keywords within a specific subreddit (searchCommunity) to analyze sentiment and feedback patterns. |
| Data engineering pipelines | Use the Apify API to integrate incremental dataset items (search, subreddit, post, user, subreddit_info) into ETL pipelines and analytics stacks. |
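As one concrete example of the trend-discovery use case, a hypothetical helper can tally matched posts per subreddit across exported search items (item shapes mirror the output schema documented below):

```python
from collections import Counter

def posts_per_subreddit(search_items):
    """Count matched posts per subreddit across all search items."""
    counts = Counter()
    for item in search_items:
        for post in item.get("posts", []):
            counts[post.get("subreddit", "unknown")] += 1
    return counts

# Stand-in search items shaped like the actor's output:
search_items = [
    {"type": "search", "query": "python scraping",
     "posts": [{"subreddit": "python"}, {"subreddit": "learnpython"}]},
    {"type": "search", "query": "data analysis",
     "posts": [{"subreddit": "python"}]},
]
counts = posts_per_subreddit(search_items)  # counts["python"] == 2
```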
Why choose Reddit Scraper?
The Reddit Scraper focuses on reliable, structured extraction with automatic resilience to blocks.
- 🎯 Accurate, structured fields: Unified post/comment/user/subreddit schemas with consistent keys across outputs.
- 📈 Built to scale: Bulk Start URLs and searchTerms with global and per-section limits keep runs predictable.
- 🔧 Developer-friendly: Stream results into the Apify Dataset and fetch via API in any language (e.g., Reddit scraper Python or Node.js workflows).
- 🛡️ Automatic proxy fallback: Seamlessly switches from direct to datacenter to residential proxies on block or rate-limit.
- 🔒 No credentials needed: Works without login or OAuth; leverages public JSON endpoints (reddit.com/*.json).
- 💾 Easy exports: Download JSON/CSV or integrate via Apify API for automation and enrichment.
- 🧩 Better than brittle alternatives: More stable than browser extensions and less operational overhead than maintaining a custom PRAW/Pushshift Reddit API scraper.
In short, this Reddit data extractor combines reliability, flexibility, and structured outputs suited for research and production use.
Is it legal / ethical to use Reddit Scraper?
Yes — when used responsibly. This actor scrapes only publicly available Reddit data and does not log in or access private content.
Guidelines to follow:
- Scrape only public endpoints and respect Reddit’s terms and rate limits.
- Avoid collecting personal/private data and use results for lawful purposes.
- Ensure compliance with applicable regulations (e.g., GDPR, CCPA) in your jurisdiction.
- Consult your legal team if you have edge cases or regulatory constraints.
Input parameters & output format
Example JSON input
```json
{
  "startUrls": [
    "https://www.reddit.com/r/python/",
    "https://www.reddit.com/r/datascience/comments/abc123/example_post_title/",
    "https://www.reddit.com/user/someuser/"
  ],
  "skipComments": false,
  "skipUserPosts": false,
  "skipCommunity": false,
  "searchTerms": ["python scraping", "data analysis"],
  "searchCommunity": "python",
  "ignoreStartUrls": false,
  "searchForPosts": true,
  "searchForComments": false,
  "searchForCommunities": false,
  "searchForUsers": false,
  "sortSearch": "new",
  "timeFilter": "all",
  "filterByDate": "",
  "enableSearch": true,
  "enableSubreddit": true,
  "sortSubreddit": "hot",
  "enablePost": true,
  "enableUser": true,
  "enableSubredditInfo": true,
  "maxItemsToSave": 10,
  "limitPostsPerPage": 10,
  "postDateLimit": "",
  "limitCommentsPerPage": 10,
  "limitCommunityPages": 2,
  "limitUserPages": 2,
  "pageScrollTimeout": 40,
  "maxCommentDepth": 5,
  "maxItemsPerUser": 20,
  "fetchUserProfile": true,
  "fetchUserSubmitted": true,
  "fetchUserComments": true,
  "fetchUserOverview": false,
  "fetchPopularSubreddits": false,
  "fetchNewSubreddits": false,
  "maxSubredditsInfo": 25,
  "requestDelaySeconds": 1,
  "proxyConfiguration": {"useApifyProxy": false}
}
```
Parameters reference
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| startUrls | array | No | — | One or more Reddit URLs to scrape. Accepts post (/comments/…), subreddit (/r/…), or user (/user/…) links. Bulk input supported. May be left empty for search-only runs (provide searchTerms and enable ignoreStartUrls). |
| skipComments | boolean | No | false | Do not fetch comments for post URLs; collect only post-level data. |
| skipUserPosts | boolean | No | false | Ignore user profile URLs; no user data scraped. |
| skipCommunity | boolean | No | false | Do not fetch subreddit/community metadata. Post listings still run if enabled. |
| searchTerms | array | No | — | One or more keyword phrases for Reddit search. Runs when non-empty and you have no Start URLs or ignoreStartUrls is true. |
| searchCommunity | string | No | — | Restrict search to a specific subreddit (without the r/ prefix). |
| ignoreStartUrls | boolean | No | false | Skip all Start URLs and run search-only (requires at least one search term). |
| searchForPosts | boolean | No | true | Include posts in search results. |
| searchForComments | boolean | No | false | Reserved for future use: include comments in search. |
| searchForCommunities | boolean | No | false | Reserved for future use: include communities in search. |
| searchForUsers | boolean | No | false | Reserved for future use: include users in search. |
| sortSearch | string | No | "new" | Order search results by: relevance, new, hot, top, comments. |
| timeFilter | string | No | "all" | Limit search results by time: hour, day, week, month, year, all. |
| filterByDate | string | No | — | Absolute or relative date filter for posts (e.g., 2024-01-15 or "3 days"). |
| enableSearch | boolean | No | true | When on and in search mode, runs Reddit search for your terms. |
| enableSubreddit | boolean | No | true | Process subreddit URLs in Start URLs for listings (hot/new/etc.). |
| sortSubreddit | string | No | "hot" | Subreddit listing order: hot, new, top, rising, controversial. |
| enablePost | boolean | No | true | Process post URLs for post + comments (unless skipComments is on). |
| enableUser | boolean | No | true | Process user profile URLs for profile, submissions, and comments. |
| enableSubredditInfo | boolean | No | true | Fetch subreddit metadata and optional popular/new lists. |
| maxItemsToSave | integer | No | 10 | Global maximum number of items to collect across sources. |
| limitPostsPerPage | integer | No | 10 | Max posts per subreddit listing request. |
| postDateLimit | string | No | — | Include only posts created on/after this date (YYYY-MM-DD). |
| limitCommentsPerPage | integer | No | 10 | Maximum comments to retrieve per post. |
| limitCommunityPages | integer | No | 2 | Max listing pages per subreddit URL. |
| limitUserPages | integer | No | 2 | Max pages to fetch for user submissions and comments. |
| pageScrollTimeout | integer | No | 40 | Timeout in seconds for page/scroll-related operations. |
| maxCommentDepth | integer | No | 5 | Maximum depth of nested comment replies to parse (1–20). |
| maxItemsPerUser | integer | No | 20 | Max submitted posts and comments per user profile. |
| fetchUserProfile | boolean | No | true | Fetch each user’s profile (name, karma, created date). |
| fetchUserSubmitted | boolean | No | true | Fetch posts submitted by each user. |
| fetchUserComments | boolean | No | true | Fetch comments made by each user. |
| fetchUserOverview | boolean | No | false | Fetch user overview and add a summary (total items, counts). |
| fetchPopularSubreddits | boolean | No | false | Fetch Reddit’s list of popular subreddits. |
| fetchNewSubreddits | boolean | No | false | Fetch recently created subreddits. |
| maxSubredditsInfo | integer | No | 25 | Maximum subreddits to fetch for popular/new lists. |
| requestDelaySeconds | integer | No | 1 | Delay between HTTP requests (seconds). |
| proxyConfiguration | object | No | — | Configure Apify Proxy (by default, the actor starts without proxy and auto-falls back on block). |
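A small sketch of building a run input from the defaults in the table above, with a couple of the documented range checks (the helper name and validation choices are this example's, not the actor's internals):

```python
# Defaults mirrored from the parameters reference above.
DEFAULTS = {
    "skipComments": False,
    "sortSearch": "new",
    "timeFilter": "all",
    "sortSubreddit": "hot",
    "maxItemsToSave": 10,
    "limitPostsPerPage": 10,
    "limitCommentsPerPage": 10,
    "maxCommentDepth": 5,
    "maxItemsPerUser": 20,
    "requestDelaySeconds": 1,
}

def build_input(**overrides):
    """Merge user overrides onto defaults and sanity-check a few
    documented ranges before submitting the run."""
    cfg = {**DEFAULTS, **overrides}
    if not 1 <= cfg["maxCommentDepth"] <= 20:
        raise ValueError("maxCommentDepth must be between 1 and 20")
    if cfg["sortSubreddit"] not in {"hot", "new", "top", "rising", "controversial"}:
        raise ValueError(f"unknown sortSubreddit: {cfg['sortSubreddit']}")
    return cfg

cfg = build_input(maxCommentDepth=3, sortSubreddit="top")
```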
Example JSON output
The actor pushes incremental items during the run and a final summary object at the end. Example dataset items:
```json
[
  {
    "type": "search",
    "query": "python scraping",
    "community": "python",
    "total": 2,
    "posts": [
      {"title": "Show HN: I built a Python scraper", "subreddit": "python", "author": "dev_user", "score": 120, "num_comments": 30, "url": "https://www.reddit.com/r/python/comments/abc123/show_hn_i_built_a_python_scraper/", "permalink": "https://reddit.com/r/python/comments/abc123/show_hn_i_built_a_python_scraper/", "created_utc": 1712072000}
    ]
  },
  {
    "type": "subreddit",
    "source": "python",
    "total": 1,
    "posts": [
      {"title": "Weekly Python Discussions", "author": "mod_team", "score": 350, "num_comments": 90, "url": "https://www.reddit.com/r/python/comments/xyz789/weekly_python_discussions/", "permalink": "https://reddit.com/r/python/comments/xyz789/weekly_python_discussions/", "created_utc": 1712070000, "is_self": true, "selftext": "Share what you're working on..."}
    ]
  },
  {
    "type": "post",
    "postId": "abc123",
    "post": {"id": "abc123", "title": "Show HN: I built a Python scraper", "author": "dev_user", "subreddit": "python", "score": 120, "upvote_ratio": 0.97, "num_comments": 30, "created_utc": 1712072000, "url": "https://www.reddit.com/r/python/comments/abc123/show_hn_i_built_a_python_scraper/", "permalink": "https://reddit.com/r/python/comments/abc123/show_hn_i_built_a_python_scraper/", "is_self": false, "selftext": "", "link_flair_text": "Show", "over_18": false},
    "comments": [
      {"id": "c1", "author": "commenter1", "body": "Nice work!", "score": 10, "created_utc": 1712072100, "depth": 0, "replies": []}
    ],
    "total_comments_parsed": 1
  },
  {
    "type": "user",
    "username": "someuser",
    "profile": {"name": "someuser", "link_karma": 3000, "comment_karma": 2500, "total_karma": 5500, "created_utc": 1609459200},
    "submitted": [
      {"title": "My first project", "subreddit": "learnpython", "score": 45, "created_utc": 1711000000, "permalink": "https://reddit.com/r/learnpython/comments/def456/my_first_project/"}
    ],
    "comments": [
      {"body": "I recommend requests + BeautifulSoup...", "subreddit": "learnpython", "score": 12, "created_utc": 1711100000, "permalink": "https://reddit.com/r/learnpython/comments/ghi789/some_thread/"}
    ],
    "overview": null
  },
  {
    "type": "subreddit_info",
    "kind": "popular",
    "count": 1,
    "subreddits": [
      {"name": "python", "title": "Python", "description": "News about the programming language Python.", "subscribers": 1543200, "active_users": 8500, "created_utc": 1200000000, "over_18": false, "subreddit_type": "public", "url": "https://reddit.com/r/python/", "icon_img": "https://styles.redditmedia.com/...", "banner_img": "https://styles.redditmedia.com/...", "community_icon": "https://styles.redditmedia.com/..."}
    ]
  },
  {
    "metadata": {
      "timestamp": "2026-04-02T13:30:00Z",
      "config": {"maxItemsToSave": 10, "limitPostsPerPage": 10, "maxCommentDepth": 5}
    },
    "data": {
      "search": {
        "python scraping": {"total": 2, "posts": [{"title": "Show HN: I built a Python scraper", "subreddit": "python", "author": "dev_user", "score": 120, "num_comments": 30, "url": "https://www.reddit.com/r/python/comments/abc123/show_hn_i_built_a_python_scraper/", "permalink": "https://reddit.com/r/python/comments/abc123/show_hn_i_built_a_python_scraper/", "created_utc": 1712072000}]}
      },
      "subreddit": {
        "python": {"total": 1, "posts": [{"title": "Weekly Python Discussions", "author": "mod_team", "score": 350, "num_comments": 90, "url": "https://www.reddit.com/r/python/comments/xyz789/weekly_python_discussions/", "permalink": "https://reddit.com/r/python/comments/xyz789/weekly_python_discussions/", "created_utc": 1712070000, "is_self": true, "selftext": "Share what you're working on..."}]}
      },
      "posts": {
        "abc123": {"post": {"id": "abc123", "title": "Show HN: I built a Python scraper", "author": "dev_user", "subreddit": "python", "score": 120, "upvote_ratio": 0.97, "num_comments": 30, "created_utc": 1712072000, "url": "https://www.reddit.com/r/python/comments/abc123/show_hn_i_built_a_python_scraper/", "permalink": "https://reddit.com/r/python/comments/abc123/show_hn_i_built_a_python_scraper/", "is_self": false, "selftext": "", "link_flair_text": "Show", "over_18": false}, "comments": [{"id": "c1", "author": "commenter1", "body": "Nice work!", "score": 10, "created_utc": 1712072100, "depth": 0, "replies": []}], "total_comments_parsed": 1}
      },
      "users": {
        "someuser": {"profile": {"name": "someuser", "link_karma": 3000, "comment_karma": 2500, "total_karma": 5500, "created_utc": 1609459200}, "submitted": [{"title": "My first project", "subreddit": "learnpython", "score": 45, "created_utc": 1711000000, "permalink": "https://reddit.com/r/learnpython/comments/def456/my_first_project/"}], "comments": [{"body": "I recommend requests + BeautifulSoup...", "subreddit": "learnpython", "score": 12, "created_utc": 1711100000, "permalink": "https://reddit.com/r/learnpython/comments/ghi789/some_thread/"}]}
      },
      "subreddit_info": {
        "popular": [{"name": "python", "title": "Python", "description": "News about the programming language Python.", "subscribers": 1543200, "active_users": 8500, "created_utc": 1200000000, "over_18": false, "subreddit_type": "public", "url": "https://reddit.com/r/python/", "icon_img": "https://styles.redditmedia.com/...", "banner_img": "https://styles.redditmedia.com/...", "community_icon": "https://styles.redditmedia.com/..."}],
        "new": [],
        "specific": {}
      }
    }
  }
]
```
Fields that may be null or empty:
- user.profile may be omitted if fetchUserProfile is off or unavailable.
- user.overview is present only when fetchUserOverview is true.
- subreddit_info sections depend on fetchPopularSubreddits, fetchNewSubreddits, and available subreddits from Start URLs.
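Because of these nullable fields, downstream code should read user items defensively. A minimal sketch (the helper name is this example's, not part of the actor):

```python
def summarize_user(item):
    """Defensive read of a user item: profile may be omitted when
    fetchUserProfile is off, and overview is None unless
    fetchUserOverview was enabled."""
    profile = item.get("profile") or {}
    overview = item.get("overview")
    return {
        "username": item.get("username"),
        "total_karma": profile.get("total_karma"),
        "n_submitted": len(item.get("submitted") or []),
        "n_comments": len(item.get("comments") or []),
        "has_overview": overview is not None,
    }

user_item = {"type": "user", "username": "someuser",
             "profile": {"name": "someuser", "total_karma": 5500},
             "submitted": [], "comments": [{"body": "..."}], "overview": None}
summary = summarize_user(user_item)
```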
FAQ
Do I need to log in or use OAuth to run this Reddit Scraper?
No. This actor works without login or OAuth. It uses Reddit’s public JSON endpoints (reddit.com/*.json), making it a practical Reddit scraping tool for workflows that avoid the official API.
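To illustrate the reddit.com/*.json pattern, here is a sketch of mapping a Reddit page URL to its public JSON counterpart (this shows the general endpoint convention, not the actor's internals):

```python
from urllib.parse import urlsplit, urlunsplit

def to_json_endpoint(url):
    """Map a public Reddit page URL to its .json counterpart,
    e.g. /r/python/ -> /r/python.json (query string preserved)."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, path + ".json",
                       parts.query, ""))

endpoint = to_json_endpoint("https://www.reddit.com/r/python/")
# endpoint == "https://www.reddit.com/r/python.json"
```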
Can it scrape Reddit comments, and can I limit depth?
Yes. When enablePost is on and skipComments is false, the actor fetches posts and comment trees. Use maxCommentDepth to control how deep nested replies are parsed.
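The effect of a depth cap can be sketched as follows, assuming comment nodes nested via a "replies" list as in the example output above (the flattening helper is illustrative, not the actor's code):

```python
def flatten_comments(comments, max_depth=5):
    """Walk a nested comment tree, keeping only comments at
    depth < max_depth, mirroring the maxCommentDepth idea."""
    flat = []
    def walk(nodes, depth):
        if depth >= max_depth:
            return  # stop descending once the cap is reached
        for node in nodes:
            flat.append({"author": node.get("author"),
                         "body": node.get("body"), "depth": depth})
            walk(node.get("replies", []), depth + 1)
    walk(comments, 0)
    return flat

tree = [{"author": "a", "body": "top",
         "replies": [{"author": "b", "body": "reply", "replies": []}]}]
flat = flatten_comments(tree, max_depth=2)  # two comments, depths 0 and 1
```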
How do I scrape a specific subreddit’s posts?
Add the subreddit URL (e.g., https://www.reddit.com/r/python/) to startUrls and keep enableSubreddit on. Choose the order with sortSubreddit and cap volume with limitPostsPerPage.
Can I run keyword searches instead of URLs?
Yes. Provide searchTerms and set ignoreStartUrls to true (or leave Start URLs empty). You can restrict to a community with searchCommunity and tune sortSearch and timeFilter.
What formats can I export the data to?
You can export the Apify Dataset to JSON or CSV, or consume it directly via the Apify API for downstream processing with your Reddit scraper Python or Node.js pipelines.
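If you prefer to build your own CSV from exported JSON items, a sketch using the standard library (column choice is this example's; missing fields become empty cells, extras are dropped):

```python
import csv
import io

def posts_to_csv(items):
    """Flatten post entries from search/subreddit items into CSV text."""
    fieldnames = ["title", "subreddit", "author", "score",
                  "num_comments", "url"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames,
                            extrasaction="ignore")
    writer.writeheader()
    for item in items:
        for post in item.get("posts", []):
            writer.writerow(post)
    return buf.getvalue()

items = [{"type": "subreddit", "source": "python",
          "posts": [{"title": "Weekly Python Discussions",
                     "subreddit": "python", "author": "mod_team",
                     "score": 350, "num_comments": 90,
                     "url": "https://www.reddit.com/r/python/xyz789/"}]}]
csv_text = posts_to_csv(items)
```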
How does proxy fallback work if Reddit blocks me?
By default, the actor starts without a proxy. If Reddit returns a block or rate-limit, it automatically switches to Apify datacenter proxy; if needed, it retries with residential proxy (up to 3 attempts). You can also preconfigure proxyConfiguration.
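The escalation described above can be sketched like this, with a hypothetical `fetch` callable standing in for the actor's HTTP layer (a block or rate-limit is signaled here by the stub returning None):

```python
PROXY_TIERS = [None, "datacenter", "residential"]  # escalation order

def fetch_with_fallback(url, fetch, max_retries=3):
    """Try each proxy tier in order, escalating when blocked."""
    attempts = []
    for tier in PROXY_TIERS[:max_retries]:
        result = fetch(url, proxy=tier)
        attempts.append(tier)
        if result is not None:
            return result, attempts
    raise RuntimeError(f"blocked on all tiers: {attempts}")

# Stub: pretend direct and datacenter requests get blocked,
# and the residential tier succeeds.
def fake_fetch(url, proxy=None):
    return {"ok": True} if proxy == "residential" else None

result, attempts = fetch_with_fallback(
    "https://www.reddit.com/r/python.json", fake_fetch)
# attempts == [None, "datacenter", "residential"]
```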
How many items can I scrape in one run?
Use maxItemsToSave to set a global cap, and fine-tune with limitPostsPerPage, limitCommentsPerPage, maxItemsPerUser, and maxCommentDepth. These controls help you balance speed, size, and coverage.
Is it legal to scrape Reddit with this tool?
Yes, when done responsibly. The actor accesses only public data and does not use authentication. You are responsible for respecting Reddit’s terms, handling rate limits, and complying with relevant regulations (e.g., GDPR, CCPA).
Closing thoughts
The Reddit Scraper is built to reliably extract public Reddit posts, comments, subreddit listings/metadata, and user activity — at scale and without login. With granular controls (search, sorts, limits, depth) and a structured output, it’s ideal for marketers, analysts, developers, and researchers. Start in Apify Console, export to JSON/CSV, or integrate via the Apify API to automate your Reddit crawler workflows. Start extracting smarter Reddit insights today.