Reddit Scraper | All-In-One | $1.5 / 1K
Reddit all-in-one scraper. Scrape posts and full comment threads from any search, subreddit, user, or direct post URL. This enterprise-grade scraper is among the fastest on the market and delivers clean, detailed JSON.
Pricing: $1.49 / 1,000 results
Rating: 5.0 (5)
Developer: Fatih Tahta
Actor stats: 40 bookmarked · 676 total users · 238 monthly active users · 1.4 hours issues response time · last modified 3 days ago
Reddit Scraper
Slug: fatihtahta/reddit-scraper
Overview
This actor retrieves publicly available Reddit data, including posts and optional comment threads.
Data can be collected using keyword-based search, subreddit-scoped queries, or direct Reddit URLs. Results are returned as structured JSON records with stable field names suitable for analytics, monitoring, and downstream processing.
The actor supports configurable limits for posts and comments and exposes separate record types for posts and comments to simplify ingestion into databases, data warehouses, or application pipelines.
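Since a single run's dataset mixes both record kinds, downstream ingestion typically starts by routing them apart. A minimal sketch, assuming only the documented `kind` discriminator (the `split_records` helper is illustrative, not part of the actor):

```python
# Route mixed actor output into separate post and comment collections
# using the "kind" field present on every record.
def split_records(items: list[dict]) -> tuple[list[dict], list[dict]]:
    posts, comments = [], []
    for item in items:
        if item.get("kind") == "post":
            posts.append(item)
        elif item.get("kind") == "comment":
            comments.append(item)
    return posts, comments
```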
Key Capabilities
- Multiple collection modes: Retrieve data via keyword-based search, subreddit-scoped scraping, or direct Reddit URLs (posts, listings, user profiles, and searches).
- High-throughput collection: Capable of retrieving 1,000+ posts or 10,000+ comments in under one minute, depending on content structure and limits.
- Unrestricted scale: Designed to collect millions of posts or comments across runs without artificial rate limits.
- Complete data capture: Achieves a 100% capture rate for posts and comments reachable within the configured limits.
- Operational reliability: Proven 100% successful run rate across production workloads under normal operating conditions.
- Rich context extraction: Returns a broader set of metadata than the official Reddit API, including moderation status, awards, media previews, crosspost relationships, and author context.
- Deterministic output schema: Emits stable, typed `post` and `comment` records with predictable fields suitable for analytics, warehousing, and automated pipelines.
- Comment thread traversal: Optionally retrieves full comment trees up to the full depth and maximum count.
Configuration
| Field | Type | Default | Notes |
|---|---|---|---|
| `queries` | string[] | – | Search terms to look up on Reddit. Ignored when `urls` are provided. |
| `urls` | string[] | – | Specific Reddit URLs to scrape. Takes priority over `queries`. |
| `scrapeComments` | boolean | false | Set to true to extract comments for each post. |
| `maxComments` | number | 50000 | Maximum comments saved per post when `scrapeComments` is true. |
| `maxPosts` | number | 50000 | Limit on posts stored for each query or URL. |
| `includeNsfw` | boolean | false | Include results tagged as NSFW. |
| `sort` | `"relevance" \| "hot" \| "top" \| "new" \| "comments"` | relevance | Ordering for search results. |
| `timeframe` | `"hour" \| "day" \| "week" \| "month" \| "year" \| "all"` | all | Time range filter for search results when `sort` is relevance, top, or comments. |
| `subredditName` | string | – | Name of a single subreddit to target (omit the `r/` prefix). |
| `subredditKeywords` | string[] | – | Optional keywords combined with `subredditName` to focus the subreddit search. |
| `subredditSort` | `"relevance" \| "hot" \| "top" \| "new" \| "comments"` | relevance | Ordering applied when searching within the chosen subreddit. |
| `subredditTimeframe` | `"hour" \| "day" \| "week" \| "month" \| "year" \| "all"` | all | Time filter that pairs with `subredditSort` values of relevance, top, or comments. |
| `strictSearch` | boolean | false | When true, wraps query tokens in quotes and joins them with AND to force Reddit into phrase/AND semantics (cuts noisy matches). |
| `strictTokenFilter` | boolean | false | When true, discards posts whose title/body/url do not contain every query token as a whole word/phrase. Useful to remove "looming" vs. "loom" false positives. |
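To illustrate the `strictTokenFilter` semantics described above, here is a hypothetical re-implementation of a whole-word token filter. The `matches_all_tokens` helper is purely illustrative; the actor's actual matching logic may differ.

```python
import re

# Illustrative sketch of strictTokenFilter: keep a post only if every
# query token appears as a whole word in its title, body, or url.
def matches_all_tokens(post: dict, query: str) -> bool:
    haystack = " ".join(str(post.get(f, "")) for f in ("title", "body", "url"))
    return all(
        re.search(r"\b" + re.escape(token) + r"\b", haystack, re.IGNORECASE)
        for token in query.split()
    )
```

With the query `"loom"`, a post titled "My loom setup" passes, while "Looming deadline" is rejected because "loom" never appears as a standalone word.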
Usage Examples
Sample Input
```json
{
  "includeNsfw": false,
  "queries": ["Cheesecake", "Swimming Pool"],
  "scrapeComments": true,
  "sort": "hot",
  "timeframe": "year",
  "urls": ["https://www.reddit.com/r/socialmedia/"]
}
```
Sample Output
Post record (kind: "post")
```json
{
  "kind": "post",
  "query": "Cheesecake",
  "id": "1oiwt3p",
  "title": "My first cheesecake :)",
  "body": "Turned out a bit short but thats ok cause it tasted amazing. ",
  "author": "No_Opportunity_1502",
  "score": 27,
  "upvote_ratio": 0.97,
  "num_comments": 1,
  "subreddit": "Baking",
  "created_utc": "2025-10-29T05:59:38.000Z",
  "url": "https://www.reddit.com/r/Baking/comments/1oiwt3p/my_first_cheesecake/",
  "flair": "No-Recipe Provided",
  "over_18": false,
  "is_self": false,
  "spoiler": false,
  "locked": false,
  "is_video": false,
  "domain": "old.reddit.com",
  "thumbnail": "https://b.thumbs.redditmedia.com/oIOAf9jpp5jUSRjEljGBBvN4EOtH6dJo7sujoeG3Wug.jpg",
  "url_overridden_by_dest": "https://www.reddit.com/gallery/1oiwt3p",
  "media": null,
  "media_metadata": null,
  "gallery_data": {
    "items": [
      { "media_id": "iniej0usqzxf1", "id": 782212827 },
      { "media_id": "qi29mztsqzxf1", "id": 782212828 },
      { "media_id": "fehlpdvsqzxf1", "id": 782212829 }
    ]
  },
  "stickied": false,
  "distinguished": null,
  "total_awards_received": 3,
  "all_awardings": [
    { "count": 1, "name": "Helpful" },
    { "count": 2, "name": "Wholesome" }
  ],
  "gilded": 0,
  "num_crossposts": 1,
  "is_original_content": true,
  "author_fullname": "t2_abcd1234",
  "author_flair_text": "Pro Baker",
  "author_premium": false,
  "selftext_html": "<p>Turned out a bit short...</p>",
  "preview": { "images": [{ "id": "preview-id" }] },
  "secure_media": null,
  "secure_media_embed": null,
  "crosspost_parent_list": null
}
```
Comment record (kind: "comment")
```json
{
  "kind": "comment",
  "query": "https://www.reddit.com/r/technology/...",
  "id": "k5z1x2y",
  "postId": "t3_1d95j4g",
  "parentId": "t3_1d95j4g",
  "body": "Great analysis, but I think you're underestimating the impact of quantum computing on these timelines.",
  "author": "future_thinker",
  "score": 142,
  "created_utc": "2025-08-05T19:15:22.000Z",
  "url": "https://www.reddit.com/r/technology/comments/1d95j4g/the_state_of_ai_in_2025_a_comprehensive_report/k5z1x2y/",
  "stickied": false,
  "distinguished": null,
  "is_submitter": true,
  "score_hidden": false,
  "controversiality": 0,
  "depth": 0
}
```
Output Guide
Post fields (kind: "post")
- `kind`: Record type identifier; always "post" for post items.
- `query`: The search term or URL that produced this record.
- `id`: Reddit short ID of the post.
- `title`: Post title text.
- `body`: Post body text; empty for link-only posts.
- `author`: Username of the post creator.
- `score`: Net upvotes the post has received.
- `upvote_ratio`: Fraction of positive votes (0–1).
- `num_comments`: Number of comments on the post when fetched.
- `subreddit`: Name of the subreddit containing the post.
- `created_utc`: ISO timestamp when the post was created.
- `url`: Direct link to the Reddit post.
- `flair`: Subreddit-assigned flair text; null if none.
- `over_18`: Whether the post is marked NSFW; null when unavailable.
- `is_self`: True for text posts; false for link/gallery posts; null if unknown.
- `spoiler`: Indicates spoiler-tagged posts; null if not provided.
- `locked`: Whether new comments are disabled; null if unknown.
- `is_video`: True when the post is a native video; null otherwise.
- `domain`: Destination domain for link posts; null for self posts.
- `thumbnail`: Thumbnail URL when present; often "self" or null for text posts.
- `url_overridden_by_dest`: Original outbound URL for link posts; null otherwise.
- `media`: Media payload for videos or embeds; null when absent.
- `media_metadata`: Per-item metadata for galleries; null otherwise.
- `gallery_data`: Gallery structure for multi-image posts; null otherwise.
- `stickied`: True for posts pinned to the subreddit; null if not set.
- `distinguished`: Moderator/admin marker (e.g., "moderator"); null otherwise.
- `total_awards_received`: Count of awards on the post; null if not present.
- `all_awardings`: List of award objects applied to the post; null if none.
- `gilded`: Number of times the post was gilded; null if missing.
- `num_crossposts`: Count of times this post was crossposted; null if unknown.
- `is_original_content`: True when marked as original content; null if absent.
- `author_fullname`: Internal Reddit user ID for the author; null when hidden.
- `author_flair_text`: Author's flair text; null if none.
- `author_premium`: Whether the author has Reddit Premium; null if not provided.
- `selftext_html`: HTML-rendered body for text posts; null for links or when unavailable.
- `preview`: Image preview metadata for media posts; null otherwise.
- `secure_media`: Media details for secure embeds; null when absent.
- `secure_media_embed`: Embed metadata for secure media; null when absent.
- `crosspost_parent_list`: Source post data for crossposts; null if not a crosspost.
Comment fields (kind: "comment")
- `kind`: Record type identifier; always "comment" for comment items.
- `query`: The search term or URL that produced this record.
- `id`: Reddit short ID of the comment.
- `postId`: ID of the parent post.
- `postUrl`: Direct link to the parent post.
- `parentId`: ID of the parent comment or post.
- `body`: Comment text content.
- `author`: Username of the comment creator.
- `score`: Net upvotes the comment has received.
- `created_utc`: ISO timestamp when the comment was created.
- `url`: Direct link to the comment.
- `stickied`: True for comments pinned by moderators; null otherwise.
- `distinguished`: Moderator/admin marker (e.g., "moderator"); null otherwise.
- `is_submitter`: True when the commenter is also the post author; null if unavailable.
- `score_hidden`: True when scores are hidden temporarily; null if not provided.
- `controversiality`: Reddit controversy score; null when absent.
- `depth`: Nesting level in the thread; null if missing.
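Because `parentId` uses Reddit fullnames ("t1_" prefixes a comment ID, "t3_" a post ID, as in the sample comment record above), a flat dump of comment records can be rebuilt into a thread tree. A minimal sketch, assuming only the documented `id` and `parentId` fields (the `build_tree` helper is illustrative, not part of the actor):

```python
# Rebuild a nested comment tree from flat comment records. A comment
# whose parentId starts with "t3_" replies directly to the post and
# becomes a root; "t1_" parents are other comments in the same dump.
def build_tree(comments: list[dict]) -> list[dict]:
    by_fullname = {"t1_" + c["id"]: dict(c, replies=[]) for c in comments}
    roots = []
    for node in by_fullname.values():
        parent = by_fullname.get(node["parentId"])
        if parent is not None:
            parent["replies"].append(node)   # nested reply
        else:
            roots.append(node)               # top-level (parent is the post)
    return roots
```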
Pricing
The actor costs $1.50 per 1,000 saved items (posts or comments). Infrastructure and residential proxy expenses are included, and you only pay for successful results.
Example: scraping 10,000 posts and 25,000 comments equals 35,000 saved items, which costs (35,000 / 1,000) * $1.50 = $52.50.
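The arithmetic above can be captured in a small helper (illustrative only; actual billing is handled by the Apify platform):

```python
# Cost sketch: $1.50 per 1,000 saved items, where posts and comments
# both count as items.
def run_cost_usd(posts: int, comments: int, rate_per_1k: float = 1.50) -> float:
    return (posts + comments) / 1000 * rate_per_1k
```

For the example above, `run_cost_usd(10_000, 25_000)` gives 52.5.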
Operational Tips
- Control run time and cost: Enable `scrapeComments` only when comment-level analysis is required.
- Scale in batches for large datasets: For large historical backfills or multi-million-item collections, split runs by subreddit, time range, or URL lists to simplify retries and monitoring.
- Use limits defensively: Set `maxPosts` and `maxComments` to explicit values to prevent unbounded runs, especially when scraping high-activity subreddits.
- Leverage subreddit mode for monitoring: Subreddit-scoped scraping is well-suited for recurring monitoring jobs where consistent coverage and ordering matter.
- Account for sorting behavior: Time filters apply only to compatible sort modes (e.g., `top`, `relevance`, `comments`). Ensure `sort` and `timeframe` combinations are aligned with your data goals.
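The sort/timeframe compatibility rule can be checked before launching a run. A hypothetical pre-flight check, assuming the documented behavior (the `warn_ignored_timeframe` helper is illustrative, not part of the actor):

```python
# Per the configuration table, the timeframe filter only takes effect
# for these sort modes; other sorts silently ignore it.
TIME_FILTERED_SORTS = {"relevance", "top", "comments"}

def warn_ignored_timeframe(run_input: dict) -> bool:
    """Return True if the input sets a timeframe that the chosen sort ignores."""
    return (
        run_input.get("timeframe", "all") != "all"
        and run_input.get("sort", "relevance") not in TIME_FILTERED_SORTS
    )
```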
Ethics & Compliance
The scraper collects only publicly available Reddit data and avoids private information. Ensure you have a legitimate reason to process any personal data returned in the results.
Support
Need help or have a custom request? Open an issue via the Apify Console Issues tab; support is available around the clock.
Happy scraping!
-Fatih