Reddit Niche Subreddit Scraper | Auto-Tagged | Free

Pricing

Pay per usage

Scrape posts from any list of niche subreddits with automatic keyword tagging. Filter by date, score, comments. Output: clean JSON ready for LLM training, social listening, or brand monitoring. FREE during launch preview.

Rating: 0.0 (0)

Developer: Polara Data (maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 4 days ago

Reddit Niche Subreddit Scraper (Auto-Tagged)

Scrape posts from a curated list of niche subreddits, with optional keyword search and automatic content tagging. Built for ML/LLM training pipelines, social listening, brand monitoring, and trend detection on niche communities that generic scrapers miss.

What it does

  • Pulls posts from any list of subreddits (no auth, no API key)
  • Filters by sort order (hot/new/top/rising), time window, min upvotes, min comments
  • Optional within-subreddit keyword search
  • Auto-tags every post against your custom keyword list: the title and body are searched for each term, and the matches are returned in a tags array
  • Returns clean structured JSON, ready to drop into ML pipelines or Slack/Notion automations
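
The auto-tagging step can be pictured with a minimal sketch. It assumes case-insensitive substring matching over the title and body; the actor's actual matching rule is not documented here, and tag_post is a hypothetical helper name.

```python
def tag_post(title: str, body: str, tag_keywords: list[str]) -> list[str]:
    """Return the keywords found in the title or body, in keyword-list order."""
    # Assumption: a keyword matches if it appears anywhere in the text,
    # ignoring case. The real actor may use a stricter rule.
    haystack = f"{title}\n{body}".lower()
    return [kw for kw in tag_keywords if kw.lower() in haystack]


tags = tag_post(
    "[D] Best practices for evaluating RAG systems in production",
    "We compared several evaluation setups.",
    ["RAG", "fine-tuning", "Llama", "evaluation", "agent", "embedding"],
)
print(tags)  # ['RAG', 'evaluation']
```

Substring matching is simple but over-matches short terms (for example, "RAG" also matches "fragment"); a word-boundary regular expression would be a stricter alternative.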

Use cases

LLM training data: Curate subreddit-specific corpora for fine-tuning domain models (e.g. r/MachineLearning + r/LocalLLaMA + r/datascience for AI dev models).

Social listening (niche): Track brand mentions or competitor names across vertical subreddits without paying for enterprise tools.

Trend detection: Auto-tag posts in r/startups, r/SaaS, r/Entrepreneur for emerging product categories or pain points.

Content discovery: Find high-engagement posts (>100 score, >50 comments) in your niche for content marketing inspiration.

Input

{
  "subreddits": ["MachineLearning", "datascience", "LocalLLaMA"],
  "sort": "hot",
  "searchQuery": "RAG",
  "tagKeywords": ["RAG", "fine-tuning", "Llama", "evaluation", "agent", "embedding"],
  "maxPostsPerSubreddit": 25,
  "minScore": 5,
  "minComments": 0,
  "includeBody": true
}

| Field | Type | Default | Description |
|---|---|---|---|
| subreddits | array | required | Subreddit names (without /r/) |
| sort | enum | hot | hot / new / top / rising |
| timeFilter | enum | week | hour / day / week / month / year / all (only for sort=top) |
| searchQuery | string | (none) | Optional keyword search inside each subreddit |
| tagKeywords | array | [] | Auto-tag keywords applied to title+body |
| maxPostsPerSubreddit | int (1-500) | 25 | Cap per subreddit |
| minScore | int | 5 | Skip posts below this upvote count |
| minComments | int | 0 | Skip posts below this comment count |
| includeBody | bool | true | Include selftext body in output |
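
To check an input object before starting a run, the documented defaults and ranges can be mirrored locally. This is a sketch only: normalize_input and the constant names are hypothetical, and the actor's own validation may differ.

```python
import copy

# Defaults and allowed values as listed in the field reference.
DEFAULTS = {
    "sort": "hot",
    "timeFilter": "week",
    "searchQuery": None,
    "tagKeywords": [],
    "maxPostsPerSubreddit": 25,
    "minScore": 5,
    "minComments": 0,
    "includeBody": True,
}
SORTS = {"hot", "new", "top", "rising"}
TIME_FILTERS = {"hour", "day", "week", "month", "year", "all"}


def normalize_input(raw: dict) -> dict:
    """Fill in defaults and reject values outside the documented ranges."""
    cfg = {**copy.deepcopy(DEFAULTS), **raw}
    if not cfg.get("subreddits"):
        raise ValueError("subreddits is required")
    if cfg["sort"] not in SORTS:
        raise ValueError(f"sort must be one of {sorted(SORTS)}")
    if cfg["timeFilter"] not in TIME_FILTERS:
        raise ValueError(f"timeFilter must be one of {sorted(TIME_FILTERS)}")
    if not 1 <= cfg["maxPostsPerSubreddit"] <= 500:
        raise ValueError("maxPostsPerSubreddit must be between 1 and 500")
    return cfg


cfg = normalize_input({"subreddits": ["MachineLearning"], "sort": "top"})
print(cfg["timeFilter"], cfg["maxPostsPerSubreddit"])  # week 25
```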

Output

One dataset item per post:

{
  "id": "1abcxyz",
  "subreddit": "MachineLearning",
  "title": "[D] Best practices for evaluating RAG systems in production",
  "body": "...",
  "author": "user123",
  "url": "https://www.reddit.com/r/MachineLearning/comments/1abcxyz/...",
  "linkUrl": "https://arxiv.org/abs/...",
  "score": 234,
  "upvoteRatio": 0.97,
  "numComments": 56,
  "createdUtc": 1730000000,
  "createdAt": "2024-10-27T03:33:20Z",
  "isSelf": true,
  "flair": "Discussion",
  "domain": "self.MachineLearning",
  "tags": ["RAG", "evaluation"]
}
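
For trend detection, the tags array can be aggregated across dataset items. The sketch below assumes the items have already been loaded as Python dicts with the fields shown above; tag_counts is a hypothetical helper and the three sample items are made up for illustration.

```python
from collections import Counter


def tag_counts(items: list[dict], min_score: int = 0) -> Counter:
    """Count how often each tag appears among posts at or above min_score."""
    counts: Counter = Counter()
    for item in items:
        if item.get("score", 0) >= min_score:
            counts.update(item.get("tags", []))
    return counts


# Illustrative items only, trimmed to the fields this sketch reads.
items = [
    {"subreddit": "MachineLearning", "score": 234, "tags": ["RAG", "evaluation"]},
    {"subreddit": "LocalLLaMA", "score": 12, "tags": ["RAG", "Llama"]},
    {"subreddit": "datascience", "score": 3, "tags": ["embedding"]},
]
print(tag_counts(items, min_score=5).most_common(1))  # [('RAG', 2)]
```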

Pricing

Currently FREE during the launch preview — no per-result charges, no monthly cap.

When paid pricing rolls out (notice will be posted at least 14 days in advance):

| Event | Price |
|---|---|
| Actor start | $0.01 (one-time per run) |
| Result item | $0.001 (per post) |

Cost examples (post-launch):

  • 100 posts: ~$0.11
  • 1,000 posts: ~$1.01
  • 10,000 posts: ~$10.01
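
The example figures follow from one start fee plus a per-post fee. A small sketch of that arithmetic, assuming all posts come from a single run (estimated_cost is a hypothetical helper; the prices are the announced post-launch rates):

```python
from decimal import Decimal

ACTOR_START = Decimal("0.01")   # charged once per run
PER_RESULT = Decimal("0.001")   # charged per post returned


def estimated_cost(posts: int) -> Decimal:
    """Estimated cost in USD for a single run returning `posts` items."""
    return ACTOR_START + PER_RESULT * posts


for n in (100, 1_000, 10_000):
    print(f"{n} posts: ${estimated_cost(n):.2f}")
# 100 posts: $0.11
# 1000 posts: $1.01
# 10000 posts: $10.01
```

Splitting the same volume across several runs adds one $0.01 start fee per extra run.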

Limits

  • Source: Reddit public JSON API (no auth required, no API key)
  • Rate limit: ~1 req/sec (politely paced internally with 0.6s sleep)
  • Max posts per subreddit: 500 per run (accumulated across paginated requests)
  • No private subreddits, no NSFW filtering bypass
  • No comment scraping in v1 (planned for v2)
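
The pacing and pagination limits can be sketched as a loop that follows the listing's after cursor and pauses between requests. This is an illustration, not the actor's code: collect_posts and fetch_page are hypothetical names, and the page shape (data.children[].data plus data.after) follows Reddit's listing JSON. The fetcher is injected so the loop can be tried without network access.

```python
import time
from typing import Callable, Optional


def collect_posts(
    fetch_page: Callable[[Optional[str]], dict],
    max_posts: int = 25,
    delay_s: float = 0.6,
    sleep: Callable[[float], None] = time.sleep,
) -> list[dict]:
    """Accumulate posts page by page until max_posts or the listing ends."""
    posts: list[dict] = []
    after: Optional[str] = None
    while len(posts) < max_posts:
        listing = fetch_page(after)["data"]
        posts.extend(child["data"] for child in listing["children"])
        after = listing.get("after")
        if not after or len(posts) >= max_posts:
            break
        sleep(delay_s)  # pause before requesting the next page
    return posts[:max_posts]


def fake_fetch(after: Optional[str]) -> dict:
    """Stand-in for an HTTP call: two canned pages keyed by cursor."""
    pages = {
        None: {"data": {"children": [{"data": {"id": "a"}}, {"data": {"id": "b"}}], "after": "t3_b"}},
        "t3_b": {"data": {"children": [{"data": {"id": "c"}}], "after": None}},
    }
    return pages[after]


result = collect_posts(fake_fetch, max_posts=500, sleep=lambda s: None)
print([p["id"] for p in result])  # ['a', 'b', 'c']
```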

Source attribution

Data comes from Reddit's public JSON endpoint (/r/{sub}/.json), which does not require authentication. Use of the data is subject to Reddit's Public Content Policy.
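
As an illustration of the endpoint shape, the sketch below builds a listing URL for one subreddit. It assumes the per-sort form /r/{sub}/{sort}.json with Reddit's limit, t, and after query parameters; whether the actor builds its requests exactly this way is an assumption, and listing_url is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlencode

BASE = "https://www.reddit.com"


def listing_url(
    subreddit: str,
    sort: str = "hot",
    time_filter: str = "week",
    limit: int = 25,
    after: Optional[str] = None,
) -> str:
    """Build a public JSON listing URL for a subreddit."""
    # Reddit listings return at most 100 items per request.
    params: dict = {"limit": min(limit, 100)}
    if sort == "top":
        params["t"] = time_filter  # the time window only applies to "top"
    if after:
        params["after"] = after    # cursor returned by the previous page
    return f"{BASE}/r/{subreddit}/{sort}.json?{urlencode(params)}"


print(listing_url("MachineLearning", sort="top", time_filter="week"))
# https://www.reddit.com/r/MachineLearning/top.json?limit=25&t=week
```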

Author

Polara Data — niche scrapers for Italy, EU & global markets.