Reddit Scraper | All-In-One | $1.5 / 1K

Pricing

$1.49 / 1,000 results

Reddit all-in-one scraper. Scrape posts and full comment threads from any search, subreddit, user, or direct post URL. This enterprise-grade scraper is the fastest on the market and delivers clean, detailed JSON.

Rating

3.3 (9)

Developer

Fatih Tahta

Maintained by Community

Actor stats

  • Bookmarked: 66
  • Total users: 1.6K
  • Monthly active users: 347
  • Issues response: 1.3 days
  • Last modified: 16 days ago


Reddit Scraper

Slug: fatihtahta/reddit-scraper

Overview

Reddit Scraper collects publicly available Reddit posts and (optionally) comments, then saves normalized JSON records to your Apify dataset. It supports three input patterns: direct URL scraping (urls), global keyword search (queries), and subreddit mode (subredditName with optional subredditKeywords). The output schema is stable across runs, with explicit kind: "post" and kind: "comment" record types for easier downstream ingestion.

Why Use This Actor

  • Monitor one or more subreddits on a recurring schedule.
  • Track keywords across Reddit for market, brand, or competitor intelligence.
  • Build analytics datasets with normalized post/comment metadata.
  • Run sentiment and discussion analysis on comment threads.
  • Feed Reddit events into BI tools, CRMs, alerting systems, or warehouses.

Input Parameters

  • queries (string[], default: []): Global search queries. Used when urls are not provided.
  • urls (string[], default: []): Reddit URLs to scrape directly (posts, listings, subreddit search pages, user pages, and redd.it/{id} links). Takes priority over all other targeting fields.
  • subredditName (string, default: null): Subreddit to target (with or without r/ prefix). With no subredditKeywords, the actor runs subreddit listing mode.
  • subredditKeywords (string[], default: []): Keywords searched within subredditName. When present with subredditName, the actor runs subreddit search mode.
  • sort ("relevance" | "hot" | "top" | "new" | "comments", default: "relevance"): Sort for global queries and some normalized URL searches.
  • timeframe ("hour" | "day" | "week" | "month" | "year" | "all", default: "all"): Time filter for compatible search sorts (relevance, top, comments).
  • subredditSort (same options as sort, default: sort): Sort for subreddit mode. Falls back to sort when omitted.
  • subredditTimeframe (same options as timeframe, default: timeframe): Time filter for subreddit mode on compatible sorts. Falls back to timeframe when omitted.
  • scrapeComments (boolean, default: false): Enable comment extraction for each discovered post.
  • maxPosts (number, default: 50000): Max posts saved per target (query, subreddit keyword, subreddit listing target, or URL target). Values below 1 are coerced to 1.
  • maxComments (number, default: 50000): Max comments saved per post when scrapeComments=true. Values below 0 are coerced to 0; effective cap is 50,000 per post.
  • includeNsfw (boolean, default: false): Include NSFW content in compatible Reddit endpoints.
  • strictSearch (boolean, default: false): Builds stricter Reddit search queries by quoting tokens and joining them with AND.
  • strictTokenFilter (boolean, default: false): Post-save filter: requires all query tokens to appear in title/body/URL to reduce false positives.

Mode note:

  • Provide at least one targeting source: urls, queries, or subredditName (optionally with subredditKeywords).
  • If urls is non-empty, URL mode is used and other targeting fields are ignored.
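
The precedence rules above can be sketched as a small helper. This is an illustrative sketch, not the actor's actual code; the field names match the input schema, but the mode labels are made up for the example:

```python
def resolve_mode(run_input: dict) -> str:
    """Pick the targeting mode per the input rules: urls wins over
    everything, then queries, then the subreddit settings."""
    if run_input.get("urls"):
        return "url"
    if run_input.get("queries"):
        return "search"
    if run_input.get("subredditName"):
        # With subredditKeywords -> subreddit search, without -> listing.
        if run_input.get("subredditKeywords"):
            return "subreddit-search"
        return "subreddit-listing"
    raise ValueError("Provide at least one of: urls, queries, subredditName")

# urls takes priority even when queries is also set:
print(resolve_mode({"urls": ["https://www.reddit.com/r/MachineLearning/"],
                    "queries": ["llm observability"]}))  # url
```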

Example Input

1) Subreddit monitoring (new posts in a subreddit)

{
  "subredditName": "technology",
  "subredditSort": "new",
  "subredditTimeframe": "day",
  "scrapeComments": true,
  "maxPosts": 200,
  "maxComments": 250,
  "includeNsfw": false
}

2) Global keyword search

{
  "queries": ["llm observability", "vector database"],
  "sort": "top",
  "timeframe": "week",
  "strictSearch": true,
  "strictTokenFilter": true,
  "scrapeComments": false,
  "maxPosts": 300,
  "includeNsfw": false
}

3) URL list backfill

{
  "urls": [
    "https://www.reddit.com/r/MachineLearning/",
    "https://www.reddit.com/r/dataengineering/comments/1abcxyz/example_post/"
  ],
  "scrapeComments": true,
  "maxPosts": 100,
  "maxComments": 500
}
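
For illustration, the two strict* options used in example 2 can be approximated like this. The actor's exact query construction is not documented, so the quoted, AND-joined format below is an assumption based on the parameter descriptions:

```python
def build_strict_query(query: str) -> str:
    # strictSearch: quote each token and join with AND (assumed format).
    return " AND ".join(f'"{tok}"' for tok in query.split())

def passes_token_filter(query: str, record: dict) -> bool:
    # strictTokenFilter: every query token must appear in title/body/URL.
    haystack = " ".join(
        str(record.get(field) or "") for field in ("title", "body", "url")
    ).lower()
    return all(tok.lower() in haystack for tok in query.split())

print(build_strict_query("llm observability"))  # "llm" AND "observability"
```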

Output

Output destination

All results are stored in the run's Apify dataset as JSON records.

Record types

  • kind: "post" for Reddit submissions.
  • kind: "comment" for comments, emitted only when scrapeComments=true and comment limit permits.

Deduplication key

Use kind + ":" + id as the deduplication key; it is stable across reruns for the same Reddit object.
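
A minimal dedupe pass over downloaded records using that key might look like this (sketch; assumes records are plain dicts with kind and id fields as in the examples below):

```python
def dedupe(records):
    """Keep the first record seen for each kind:id key."""
    seen = set()
    unique = []
    for rec in records:
        key = f'{rec["kind"]}:{rec["id"]}'
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

rows = [
    {"kind": "post", "id": "1abc123"},
    {"kind": "comment", "id": "kxyz789"},
    {"kind": "post", "id": "1abc123"},  # duplicate from a rerun
]
print(len(dedupe(rows)))  # 2
```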

Examples

Post example (kind: "post")

{
  "kind": "post",
  "query": "r/technology",
  "id": "1abc123",
  "title": "Major framework release notes",
  "body": "Key updates and migration guidance...",
  "author": "example_user",
  "score": 842,
  "upvote_ratio": 0.94,
  "num_comments": 167,
  "subreddit": "technology",
  "created_utc": "2026-01-15T10:20:33.000Z",
  "url": "https://www.reddit.com/r/technology/comments/1abc123/major_framework_release_notes/",
  "permalink": "/r/technology/comments/1abc123/major_framework_release_notes/",
  "canonical_url": "https://www.reddit.com/r/technology/comments/1abc123/major_framework_release_notes/",
  "old_reddit_url": "https://old.reddit.com/r/technology/comments/1abc123/major_framework_release_notes/",
  "json_url": "https://www.reddit.com/r/technology/comments/1abc123/major_framework_release_notes/.json",
  "flair": "News",
  "over_18": false,
  "is_self": true,
  "spoiler": false,
  "locked": false,
  "is_video": false,
  "domain": "self.technology",
  "thumbnail": "self",
  "url_overridden_by_dest": null,
  "media": null,
  "media_metadata": null,
  "gallery_data": null,
  "gallery_images": [],
  "media_assets": [],
  "age_hours": 2.75,
  "engagement_total": 1009,
  "comment_to_score_ratio": 0.1983,
  "is_high_engagement": true,
  "content_flags": [],
  "stickied": false,
  "distinguished": null,
  "total_awards_received": 2,
  "all_awardings": [],
  "gilded": 0,
  "num_crossposts": 0,
  "is_original_content": false,
  "author_fullname": "t2_example",
  "author_flair_text": null,
  "author_premium": false,
  "selftext_html": "<div class=\"md\"><p>Key updates...</p></div>",
  "preview": null,
  "secure_media": null,
  "secure_media_embed": null,
  "crosspost_parent_list": null
}

Comment example (kind: "comment")

{
  "kind": "comment",
  "query": "r/technology",
  "id": "kxyz789",
  "postId": "1abc123",
  "postUrl": "https://old.reddit.com/r/technology/comments/1abc123/major_framework_release_notes/.json?raw_json=1&limit=500",
  "parentId": "t3_1abc123",
  "body": "The migration section saved us hours.",
  "author": "data_ops_team",
  "score": 51,
  "created_utc": "2026-01-15T11:02:44.000Z",
  "url": "https://www.reddit.com/r/technology/comments/1abc123/major_framework_release_notes/kxyz789/",
  "permalink": "/r/technology/comments/1abc123/major_framework_release_notes/kxyz789/",
  "canonical_url": "https://www.reddit.com/r/technology/comments/1abc123/major_framework_release_notes/kxyz789/",
  "old_reddit_url": "https://old.reddit.com/r/technology/comments/1abc123/major_framework_release_notes/kxyz789/",
  "json_url": "https://www.reddit.com/r/technology/comments/1abc123/major_framework_release_notes/kxyz789/.json",
  "root_comment_id": "kxyz789",
  "parent_kind": "post",
  "comment_permalink": "/r/technology/comments/1abc123/major_framework_release_notes/kxyz789/",
  "author_deleted": false,
  "body_deleted": false,
  "stickied": false,
  "distinguished": null,
  "is_submitter": false,
  "score_hidden": false,
  "controversiality": 0,
  "depth": 0
}

Field Reference

Post fields (kind: "post")

  • Identity/context: kind, query, id, subreddit, created_utc.
  • Core content: title, body, author, url, permalink, canonical_url, old_reddit_url, json_url.
  • Engagement: score, upvote_ratio, num_comments, engagement_total, comment_to_score_ratio, is_high_engagement.
  • Classification/state: flair, over_18, spoiler, locked, is_self, is_video, content_flags.
  • Media/detail: domain, thumbnail, url_overridden_by_dest, media, media_metadata, gallery_data, gallery_images, media_assets, preview, secure_media, secure_media_embed.
  • Additional metadata: stickied, distinguished, total_awards_received, all_awardings, gilded, num_crossposts, is_original_content, crosspost_parent_list.
  • Author metadata: author_fullname, author_flair_text, author_premium.
  • Derived/runtime fields: age_hours.

Comment fields (kind: "comment")

  • Identity/context: kind, query, id, postId, postUrl, parentId, created_utc.
  • Core content: body, author, score, url, permalink, canonical_url, old_reddit_url, json_url, comment_permalink.
  • Threading/deletion: root_comment_id, parent_kind, author_deleted, body_deleted, depth.
  • Moderation/visibility metadata: stickied, distinguished, is_submitter, score_hidden, controversiality.

Data guarantees & handling

  • Extraction is best-effort and depends on Reddit endpoint availability and response consistency.
  • Optional fields can be null when Reddit does not return them.
  • Deleted/removed content can appear with deletion indicators (author_deleted, body_deleted) and/or placeholder text.
  • timeframe is effective only on compatible sorts/endpoints (relevance, top, comments for search-style routes).
  • num_comments on posts may differ from comments actually saved (limit settings, pagination boundaries, unavailable branches, deleted content).
  • Large runs can end with partial coverage when upstream failures/retries exceed limits; use scheduling + dedupe for resilient pipelines.
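
When post-processing, it is worth guarding against the null and deletion cases listed above. A hedged sketch (the "[deleted]"/"[removed]" placeholders are Reddit's usual markers; treat the exact strings as an assumption):

```python
def usable_comment_text(rec):
    """Return comment body text only when it is actually usable downstream."""
    if rec.get("body_deleted") or rec.get("author_deleted"):
        return None
    body = rec.get("body")  # optional fields can be null
    if not body or body in ("[deleted]", "[removed]"):
        return None
    return body

print(usable_comment_text({"body": "Great write-up"}))      # Great write-up
print(usable_comment_text({"body": "[deleted]", "body_deleted": True}))  # None
```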

Pricing

This actor costs $1.50 per 1,000 saved items (post or comment records).

Example:

  • 10,000 posts + 25,000 comments = 35,000 saved items
  • (35,000 / 1,000) * $1.50 = $52.50
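
The arithmetic generalizes to a one-line estimator (sketch; uses the $1.50 per 1,000 saved-items rate quoted above):

```python
def estimated_cost_usd(posts, comments, rate_per_1k=1.50):
    saved_items = posts + comments  # both record kinds count as saved items
    return round(saved_items / 1000 * rate_per_1k, 2)

print(estimated_cost_usd(10_000, 25_000))  # 52.5
```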

Scheduling & Automation

Recommended production patterns:

  • Recurring subreddit snapshots: Run every 15–60 minutes with subredditName + subredditSort: "new".
  • Recurring keyword monitoring: Run hourly/daily with queries, chosen sort, and compatible timeframe.
  • Webhook fan-out: Trigger dataset webhooks to sync into warehouses, CRMs, alerting, or Slack.
  • Delta strategy: Deduplicate by kind:id, then compare current vs previous snapshots to detect new posts/comments and metric changes.
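
The delta strategy in the last bullet can be sketched as a comparison of two snapshots keyed by kind:id (sketch; score is used here as an example of a changing metric):

```python
def detect_deltas(previous, current):
    """Return (new_items, changed) between two snapshots of records."""
    def key(rec):
        return f'{rec["kind"]}:{rec["id"]}'
    prev = {key(r): r for r in previous}
    new_items, changed = [], []
    for rec in current:
        old = prev.get(key(rec))
        if old is None:
            new_items.append(rec)          # seen for the first time
        elif rec.get("score") != old.get("score"):
            changed.append(rec)            # metric moved since last run
    return new_items, changed

prev = [{"kind": "post", "id": "a", "score": 10}]
curr = [{"kind": "post", "id": "a", "score": 12},
        {"kind": "comment", "id": "b", "score": 1}]
new_items, changed = detect_deltas(prev, curr)
print(len(new_items), len(changed))  # 1 1
```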

How to Run on Apify

  1. Open the actor in Apify Console.
  2. Click Start and choose Input.
  3. Pick one targeting mode (urls, queries, or subreddit settings).
  4. Set limits (maxPosts, optional maxComments) and toggle scrapeComments as needed.
  5. Configure ranking filters (sort, timeframe, subreddit variants) and NSFW behavior.
  6. Run the actor and monitor logs.
  7. Open the output dataset and export JSON/CSV (or process via API/webhooks).

Ethics & Compliance

This actor is intended for publicly available Reddit data only. Do not use it to bypass access controls, scrape private/non-public areas, or violate Reddit terms. Use collected data responsibly and avoid workflows that enable spam, harassment, or other abusive behavior.

Support

If you need help, open an issue from the Apify Console Issues tab and include:

  • your input JSON (redact sensitive values),
  • run ID,
  • expected behavior vs. actual behavior.

Happy scraping!