Reddit Scraper | All-In-One | $1.5 / 1K
Pricing
$1.49 / 1,000 results
An all-in-one Reddit scraper: scrape posts and full comment threads from any search, subreddit, user, or direct post URL. This enterprise-grade scraper is built for speed and delivers clean, detailed JSON.
Rating: 3.3 (9)

Developer: Fatih Tahta

Actor stats:
- 66 bookmarked
- 1.6K total users
- 347 monthly active users
- 1.3 days issues response time
- Last modified 16 days ago
Reddit Scraper
Slug: fatihtahta/reddit-scraper
Overview
Reddit Scraper collects publicly available Reddit posts and (optionally) comments, then saves normalized JSON records to your Apify dataset. It supports three input patterns: direct URL scraping (`urls`), global keyword search (`queries`), and subreddit mode (`subredditName` with optional `subredditKeywords`). The output schema is stable across runs, with explicit `kind: "post"` and `kind: "comment"` record types for easier downstream ingestion.
Why Use This Actor
- Monitor one or more subreddits on a recurring schedule.
- Track keywords across Reddit for market, brand, or competitor intelligence.
- Build analytics datasets with normalized post/comment metadata.
- Run sentiment and discussion analysis on comment threads.
- Feed Reddit events into BI tools, CRMs, alerting systems, or warehouses.
Input Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| `queries` | string[] | Global search queries. Used when `urls` is not provided. | `[]` |
| `urls` | string[] | Reddit URLs to scrape directly (posts, listings, subreddit search pages, user pages, and `redd.it/{id}` links). Takes priority over all other targeting fields. | `[]` |
| `subredditName` | string | Subreddit to target (with or without the `r/` prefix). With no `subredditKeywords`, the actor runs in subreddit listing mode. | `null` |
| `subredditKeywords` | string[] | Keywords searched within `subredditName`. When present with `subredditName`, the actor runs in subreddit search mode. | `[]` |
| `sort` | enum | One of `relevance`, `hot`, `top`, `new`, `comments`. Sort for global queries and some normalized URL searches. | `relevance` |
| `timeframe` | enum | One of `hour`, `day`, `week`, `month`, `year`, `all`. Time filter for compatible search sorts (`relevance`, `top`, `comments`). | `all` |
| `subredditSort` | enum | Same values as `sort`. Sort for subreddit mode. Falls back to `sort` when omitted. | `sort` |
| `subredditTimeframe` | enum | Same values as `timeframe`. Time filter for subreddit mode on compatible sorts. Falls back to `timeframe` when omitted. | `timeframe` |
| `scrapeComments` | boolean | Enable comment extraction for each discovered post. | `false` |
| `maxPosts` | number | Max posts saved per target (query, subreddit keyword, subreddit listing target, or URL target). Values below 1 are coerced to 1. | `50000` |
| `maxComments` | number | Max comments saved per post when `scrapeComments=true`. Values below 0 are coerced to 0; the effective cap is 50,000 per post. | `50000` |
| `includeNsfw` | boolean | Include NSFW content on compatible Reddit endpoints. | `false` |
| `strictSearch` | boolean | Builds stricter Reddit search queries by quoting tokens and joining them with AND. | `false` |
| `strictTokenFilter` | boolean | Post-save filter: requires all query tokens to appear in the title/body/URL to reduce false positives. | `false` |
Mode notes:
- Provide at least one targeting source: `urls`, `queries`, or `subredditName` (optionally with `subredditKeywords`).
- If `urls` is non-empty, URL mode is used and other targeting fields are ignored.
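The two strict options can be illustrated with a short sketch. The actor's internal implementation is not published, so the function names below (`build_strict_query`, `passes_token_filter`) are illustrative; they simply mirror the documented behavior: quote-and-AND query building, and a post-save filter that requires every query token in the title/body/URL.

```python
import re

def build_strict_query(tokens):
    """strictSearch (as documented): quote each token and join with AND."""
    return " AND ".join(f'"{t}"' for t in tokens)

def passes_token_filter(record, query):
    """strictTokenFilter (as documented): every query token must appear
    in the post's title, body, or URL (case-insensitive)."""
    haystack = " ".join(
        str(record.get(field) or "") for field in ("title", "body", "url")
    ).lower()
    return all(tok.lower() in haystack for tok in re.findall(r"\w+", query))

print(build_strict_query(["llm observability", "tracing"]))
# "llm observability" AND "tracing"
```

In practice you would enable the boolean flags in the input instead; this sketch is only meant to show what the flags do to your results.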
Example Input
1) Subreddit monitoring (new posts in a subreddit)

```json
{
  "subredditName": "technology",
  "subredditSort": "new",
  "subredditTimeframe": "day",
  "scrapeComments": true,
  "maxPosts": 200,
  "maxComments": 250,
  "includeNsfw": false
}
```

2) Keyword query monitoring (global Reddit search)

```json
{
  "queries": ["llm observability", "vector database"],
  "sort": "top",
  "timeframe": "week",
  "strictSearch": true,
  "strictTokenFilter": true,
  "scrapeComments": false,
  "maxPosts": 300,
  "includeNsfw": false
}
```

3) URL list backfill

```json
{
  "urls": [
    "https://www.reddit.com/r/MachineLearning/",
    "https://www.reddit.com/r/dataengineering/comments/1abcxyz/example_post/"
  ],
  "scrapeComments": true,
  "maxPosts": 100,
  "maxComments": 500
}
```
Output
Output destination
All results are stored in the run's Apify dataset as JSON records.
Record types

- `kind: "post"` for Reddit submissions.
- `kind: "comment"` for comments, emitted only when `scrapeComments=true` and the comment limit permits.
Recommended idempotency key
Use `kind + ":" + id`. This key is stable across reruns for the same Reddit object.
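A minimal deduplication sketch using this key (the helper names are illustrative, not part of the actor):

```python
def dedupe_key(record):
    """Stable idempotency key: kind + ":" + id (e.g. "post:1abc123")."""
    return f'{record["kind"]}:{record["id"]}'

def deduplicate(records):
    """Keep only the first occurrence of each Reddit object across reruns."""
    seen, unique = set(), []
    for rec in records:
        key = dedupe_key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```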
Examples
Post example (kind: "post")

```json
{
  "kind": "post",
  "query": "r/technology",
  "id": "1abc123",
  "title": "Major framework release notes",
  "body": "Key updates and migration guidance...",
  "author": "example_user",
  "score": 842,
  "upvote_ratio": 0.94,
  "num_comments": 167,
  "subreddit": "technology",
  "created_utc": "2026-01-15T10:20:33.000Z",
  "url": "https://www.reddit.com/r/technology/comments/1abc123/major_framework_release_notes/",
  "permalink": "/r/technology/comments/1abc123/major_framework_release_notes/",
  "canonical_url": "https://www.reddit.com/r/technology/comments/1abc123/major_framework_release_notes/",
  "old_reddit_url": "https://old.reddit.com/r/technology/comments/1abc123/major_framework_release_notes/",
  "json_url": "https://www.reddit.com/r/technology/comments/1abc123/major_framework_release_notes/.json",
  "flair": "News",
  "over_18": false,
  "is_self": true,
  "spoiler": false,
  "locked": false,
  "is_video": false,
  "domain": "self.technology",
  "thumbnail": "self",
  "url_overridden_by_dest": null,
  "media": null,
  "media_metadata": null,
  "gallery_data": null,
  "gallery_images": [],
  "media_assets": [],
  "age_hours": 2.75,
  "engagement_total": 1009,
  "comment_to_score_ratio": 0.1983,
  "is_high_engagement": true,
  "content_flags": [],
  "stickied": false,
  "distinguished": null,
  "total_awards_received": 2,
  "all_awardings": [],
  "gilded": 0,
  "num_crossposts": 0,
  "is_original_content": false,
  "author_fullname": "t2_example",
  "author_flair_text": null,
  "author_premium": false,
  "selftext_html": "<div class=\"md\"><p>Key updates...</p></div>",
  "preview": null,
  "secure_media": null,
  "secure_media_embed": null,
  "crosspost_parent_list": null
}
```
Comment example (kind: "comment")

```json
{
  "kind": "comment",
  "query": "r/technology",
  "id": "kxyz789",
  "postId": "1abc123",
  "postUrl": "https://old.reddit.com/r/technology/comments/1abc123/major_framework_release_notes/.json?raw_json=1&limit=500",
  "parentId": "t3_1abc123",
  "body": "The migration section saved us hours.",
  "author": "data_ops_team",
  "score": 51,
  "created_utc": "2026-01-15T11:02:44.000Z",
  "url": "https://www.reddit.com/r/technology/comments/1abc123/major_framework_release_notes/kxyz789/",
  "permalink": "/r/technology/comments/1abc123/major_framework_release_notes/kxyz789/",
  "canonical_url": "https://www.reddit.com/r/technology/comments/1abc123/major_framework_release_notes/kxyz789/",
  "old_reddit_url": "https://old.reddit.com/r/technology/comments/1abc123/major_framework_release_notes/kxyz789/",
  "json_url": "https://www.reddit.com/r/technology/comments/1abc123/major_framework_release_notes/kxyz789/.json",
  "root_comment_id": "kxyz789",
  "parent_kind": "post",
  "comment_permalink": "/r/technology/comments/1abc123/major_framework_release_notes/kxyz789/",
  "author_deleted": false,
  "body_deleted": false,
  "stickied": false,
  "distinguished": null,
  "is_submitter": false,
  "score_hidden": false,
  "controversiality": 0,
  "depth": 0
}
```
Field Reference
Post fields (kind: "post")
- Identity/context: `kind`, `query`, `id`, `subreddit`, `created_utc`.
- Core content: `title`, `body`, `author`, `url`, `permalink`, `canonical_url`, `old_reddit_url`, `json_url`.
- Engagement: `score`, `upvote_ratio`, `num_comments`, `engagement_total`, `comment_to_score_ratio`, `is_high_engagement`.
- Classification/state: `flair`, `over_18`, `spoiler`, `locked`, `is_self`, `is_video`, `content_flags`.
- Media/detail: `domain`, `thumbnail`, `url_overridden_by_dest`, `media`, `media_metadata`, `gallery_data`, `gallery_images`, `media_assets`, `preview`, `secure_media`, `secure_media_embed`.
- Additional metadata: `stickied`, `distinguished`, `total_awards_received`, `all_awardings`, `gilded`, `num_crossposts`, `is_original_content`, `crosspost_parent_list`.
- Author metadata: `author_fullname`, `author_flair_text`, `author_premium`.
- Derived/runtime fields: `age_hours`.
Comment fields (kind: "comment")
- Identity/context: `kind`, `query`, `id`, `postId`, `postUrl`, `parentId`, `created_utc`.
- Core content: `body`, `author`, `score`, `url`, `permalink`, `canonical_url`, `old_reddit_url`, `json_url`, `comment_permalink`.
- Threading/deletion: `root_comment_id`, `parent_kind`, `author_deleted`, `body_deleted`, `depth`.
- Moderation/visibility metadata: `stickied`, `distinguished`, `is_submitter`, `score_hidden`, `controversiality`.
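For downstream ingestion, the two record types can be partitioned and then joined back together through the `postId` link field on comments. A minimal sketch (the helper names are illustrative):

```python
def split_records(records):
    """Partition dataset records into posts and comments via the `kind` field."""
    posts = [r for r in records if r.get("kind") == "post"]
    comments = [r for r in records if r.get("kind") == "comment"]
    return posts, comments

def comments_by_post(comments):
    """Group comment records under their parent post using `postId`."""
    grouped = {}
    for c in comments:
        grouped.setdefault(c.get("postId"), []).append(c)
    return grouped
```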
Data guarantees & handling
- Extraction is best-effort and depends on Reddit endpoint availability and response consistency.
- Optional fields can be `null` when Reddit does not return them.
- Deleted/removed content can appear with deletion indicators (`author_deleted`, `body_deleted`) and/or placeholder text.
- `timeframe` is effective only on compatible sorts/endpoints (`relevance`, `top`, `comments` for search-style routes).
- `num_comments` on posts may differ from the comments actually saved (limit settings, pagination boundaries, unavailable branches, deleted content).
- Large runs can end with partial coverage when upstream failures/retries exceed limits; use scheduling plus deduplication for resilient pipelines.
Pricing
This actor costs $1.49 per 1,000 saved items (post or comment records).

Example:

- 10,000 posts + 25,000 comments = 35,000 saved items
- (35,000 / 1,000) × $1.49 = $52.15
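The arithmetic above can be wrapped in a small estimator. This is an illustrative helper, not an official billing API, and it assumes the $1.49 / 1,000 rate shown in the listing's pricing card:

```python
def estimate_cost(num_posts, num_comments, price_per_1000=1.49):
    """Estimate run cost in USD: every saved post or comment record is billed."""
    saved_items = num_posts + num_comments
    return saved_items / 1000 * price_per_1000

# 10,000 posts + 25,000 comments -> 35,000 saved items
print(f"${estimate_cost(10_000, 25_000):.2f}")  # $52.15
```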
Scheduling & Automation
Recommended production patterns:
- Recurring subreddit snapshots: run every 15–60 minutes with `subredditName` + `subredditSort: "new"`.
- Recurring keyword monitoring: run hourly/daily with `queries`, a chosen `sort`, and a compatible `timeframe`.
- Webhook fan-out: trigger dataset webhooks to sync into warehouses, CRMs, alerting, or Slack.
- Delta strategy: deduplicate by `kind:id`, then compare current vs. previous snapshots to detect new posts/comments and metric changes.
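The delta strategy can be sketched as two small helpers (illustrative names, assuming records carry the `kind`, `id`, and metric fields described above):

```python
def detect_new_records(previous, current):
    """Records in the current snapshot whose kind:id key
    was not present in the previous snapshot."""
    seen = {f'{r["kind"]}:{r["id"]}' for r in previous}
    return [r for r in current if f'{r["kind"]}:{r["id"]}' not in seen]

def detect_metric_changes(previous, current, fields=("score", "num_comments")):
    """Compare tracked metrics for records present in both snapshots;
    returns {key: {field: (old_value, new_value)}} for changed fields."""
    prev_by_key = {f'{r["kind"]}:{r["id"]}': r for r in previous}
    changes = {}
    for rec in current:
        key = f'{rec["kind"]}:{rec["id"]}'
        old = prev_by_key.get(key)
        if old is None:
            continue  # new record, handled by detect_new_records
        diff = {f: (old.get(f), rec.get(f)) for f in fields if old.get(f) != rec.get(f)}
        if diff:
            changes[key] = diff
    return changes
```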
How to Run on Apify
1. Open the actor in Apify Console.
2. Click Start and choose Input.
3. Pick one targeting mode (`urls`, `queries`, or subreddit settings).
4. Set limits (`maxPosts`, optional `maxComments`) and toggle `scrapeComments` as needed.
5. Configure ranking filters (`sort`, `timeframe`, subreddit variants) and NSFW behavior.
6. Run the actor and monitor logs.
7. Open the output dataset and export JSON/CSV (or process via API/webhooks).
Ethics & Compliance
This actor is intended for publicly available Reddit data only. Do not use it to bypass access controls, scrape private/non-public areas, or violate Reddit terms. Use collected data responsibly and avoid workflows that enable spam, harassment, or other abusive behavior.
Support
If you need help, open an issue from the Apify Console Issues tab and include:
- your input JSON (redact sensitive values),
- run ID,
- expected behavior vs. actual behavior.
Happy scraping!