Reddit Niche Subreddit Scraper | Auto-Tagged | Free

Pricing: from $0.001 per Reddit post

Scrape posts from any list of niche subreddits with automatic keyword tagging. Filter by date, score, comments. Output: clean JSON ready for LLM training, social listening, or brand monitoring. FREE during launch preview.

Rating: 0.0 (0)

Developer: Polara Data (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 25 days ago

Reddit Niche Subreddit Scraper (Auto-Tagged)

Scrape posts from a curated list of niche subreddits, with optional keyword search and automatic content tagging. Built for ML/LLM training pipelines, social listening, brand monitoring, and trend detection on niche communities that generic scrapers miss.

What it does

  • Pulls posts from any list of subreddits (no auth, no API key)
  • Filters by sort order (hot/new/top/rising), time window, min upvotes, min comments
  • Optional within-subreddit keyword search
  • Auto-tags every post with your custom keyword list: the title and body are searched for the terms you care about, and the matches are returned as a tags array
  • Returns clean structured JSON, ready to drop into ML pipelines or Slack/Notion automations
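
The auto-tagging step described above can be sketched as follows. This is a minimal illustration, not the actor's actual code: the listing does not document the matching rule, so the case-insensitive, whole-term matching used here and the function name `tag_post` are assumptions.

```python
import re


def tag_post(title, body, keywords):
    """Return the keywords found in the title or body of a post.

    Assumption: matching is case-insensitive and on whole terms, so "RAG"
    does not match inside "storage". The real actor may match differently.
    """
    text = f"{title}\n{body or ''}"
    tags = []
    for keyword in keywords:
        # Lookarounds instead of \b so that hyphenated terms such as
        # "fine-tuning" are handled the same way as plain words.
        pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            tags.append(keyword)  # keep the spelling from the keyword list
    return tags


keywords = ["RAG", "fine-tuning", "Llama", "evaluation", "agent", "embedding"]
tags = tag_post(
    "[D] Best practices for evaluating RAG systems in production",
    "We compare evaluation metrics for retrieval quality.",
    keywords,
)
```

Tags come back in the order of the keyword list, which keeps the output stable across posts.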

Use cases

LLM training data: Curate subreddit-specific corpora for fine-tuning domain models (e.g. r/MachineLearning + r/LocalLLaMA + r/datascience for AI dev models).

Social listening (niche): Track brand mentions or competitor names across vertical subreddits without paying for enterprise tools.

Trend detection: Auto-tag posts in r/startups, r/SaaS, r/Entrepreneur for emerging product categories or pain points.

Content discovery: Find high-engagement posts (score above 100, more than 50 comments) in your niche for content marketing inspiration.

Input

{
"subreddits": ["MachineLearning", "datascience", "LocalLLaMA"],
"sort": "hot",
"searchQuery": "RAG",
"tagKeywords": ["RAG", "fine-tuning", "Llama", "evaluation", "agent", "embedding"],
"maxPostsPerSubreddit": 25,
"minScore": 5,
"minComments": 0,
"includeBody": true
}

| Field | Type | Default | Description |
|---|---|---|---|
| subreddits | array | required | Subreddit names (without /r/) |
| sort | enum | hot | hot / new / top / rising |
| timeFilter | enum | week | hour / day / week / month / year / all (only for sort=top) |
| searchQuery | string | (none) | Optional keyword search inside each subreddit |
| tagKeywords | array | [] | Auto-tag keywords applied to title+body |
| maxPostsPerSubreddit | int (1-500) | 25 | Cap per subreddit |
| minScore | int | 5 | Skip posts below this upvote count |
| minComments | int | 0 | Skip posts below this comment count |
| includeBody | bool | true | Include selftext body in output |
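
How minScore, minComments and maxPostsPerSubreddit might interact can be sketched as below. This is a hedged illustration only: the function name `filter_posts` is hypothetical, and whether the actor applies the per-subreddit cap before or after filtering is not documented, so applying it after filtering is an assumption.

```python
def filter_posts(posts, min_score=5, min_comments=0, max_posts=25):
    """Keep posts that meet both thresholds, up to max_posts.

    Defaults mirror the documented input defaults. Applying the cap to the
    filtered posts (not the raw ones) is an assumption.
    """
    kept = []
    for post in posts:
        if post["score"] < min_score:
            continue  # below the upvote threshold
        if post["numComments"] < min_comments:
            continue  # below the comment threshold
        kept.append(post)
        if len(kept) >= max_posts:
            break  # per-subreddit cap reached
    return kept


sample = [
    {"id": "a", "score": 3, "numComments": 10},
    {"id": "b", "score": 234, "numComments": 56},
    {"id": "c", "score": 5, "numComments": 0},
    {"id": "d", "score": 50, "numComments": 2},
]
kept_ids = [p["id"] for p in filter_posts(sample, max_posts=2)]
```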

Output

One dataset item per post:

{
"id": "1abcxyz",
"subreddit": "MachineLearning",
"title": "[D] Best practices for evaluating RAG systems in production",
"body": "...",
"author": "user123",
"url": "https://www.reddit.com/r/MachineLearning/comments/1abcxyz/...",
"linkUrl": "https://arxiv.org/abs/...",
"score": 234,
"upvoteRatio": 0.97,
"numComments": 56,
"createdUtc": 1730000000,
"createdAt": "2024-10-27T03:33:20Z",
"isSelf": true,
"flair": "Discussion",
"domain": "self.MachineLearning",
"tags": ["RAG", "evaluation"]
}
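
As a rough sketch of downstream use, dataset items in the shape shown above can be tallied by tag or flattened into JSON Lines for a training pipeline. The helper names `tag_counts` and `to_jsonl` and the chosen field subset are illustrative assumptions, not part of the actor.

```python
import json
from collections import Counter


def tag_counts(items):
    """Count how often each tag appears across dataset items."""
    counts = Counter()
    for item in items:
        counts.update(item.get("tags", []))
    return counts


def to_jsonl(items, fields=("subreddit", "title", "body", "tags")):
    """Serialize a subset of fields as one JSON object per line."""
    lines = []
    for item in items:
        record = {name: item.get(name) for name in fields}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


items = [
    {"subreddit": "MachineLearning", "title": "Evaluating RAG",
     "body": "...", "tags": ["RAG", "evaluation"]},
    {"subreddit": "LocalLLaMA", "title": "RAG on a laptop",
     "body": "...", "tags": ["RAG"]},
]
counts = tag_counts(items)
jsonl = to_jsonl(items)
```

Missing fields come out as null here (for example, body when includeBody is false), which keeps every line the same shape.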

Pricing

Currently FREE during the launch preview — no per-result charges, no monthly cap.

When paid pricing rolls out (notice will be posted at least 14 days in advance):

| Event | Price |
|---|---|
| Actor start | $0.01 (one-time per run) |
| Result item | $0.001 (per post) |

Cost examples (post-launch):

  • 100 posts: ~$0.11
  • 1,000 posts: ~$1.01
  • 10,000 posts: ~$10.01
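
The figures above follow from one start fee per run plus a per-post fee. A small sketch of that arithmetic, assuming the announced post-launch prices stay as listed (the function name `estimate_cost` is hypothetical):

```python
from decimal import Decimal

ACTOR_START = Decimal("0.01")   # charged once per run
PER_POST = Decimal("0.001")     # charged per result item


def estimate_cost(posts, runs=1):
    """Estimated post-launch cost in USD for a number of posts and runs."""
    return ACTOR_START * runs + PER_POST * posts


cost_100 = estimate_cost(100)
cost_1k = estimate_cost(1_000)
cost_10k = estimate_cost(10_000)
```

Decimal is used so the results compare exactly against the listed prices instead of picking up binary floating-point error.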

Limits

  • Source: Reddit public JSON API (no auth required, no API key)
  • Rate limit: ~1 req/sec (politely paced internally with 0.6s sleep)
  • Max posts per subreddit: 500 per run (cumulative pagination)
  • No private subreddits, no NSFW filtering bypass
  • No comment scraping in v1 (planned for v2)
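
The pacing and pagination limits above can be illustrated with a sketch like the one below. It is not the actor's implementation: it assumes Reddit's listing convention of a `limit` parameter (up to 100 per page) and an `after` cursor, and the names `listing_url`, `fetch_posts` and the injected `fetch_json` callable are hypothetical.

```python
import time
from urllib.parse import urlencode


def listing_url(subreddit, sort="hot", time_filter=None, after=None, limit=100):
    """Build a public listing URL for one page of a subreddit."""
    params = {"limit": limit}
    if sort == "top" and time_filter:
        params["t"] = time_filter  # time window only applies to sort=top
    if after:
        params["after"] = after    # cursor returned by the previous page
    return f"https://www.reddit.com/r/{subreddit}/{sort}.json?{urlencode(params)}"


def fetch_posts(subreddit, fetch_json, max_posts=25, sort="hot",
                time_filter=None, delay=0.6, sleep=time.sleep):
    """Collect up to max_posts raw posts, sleeping between page requests.

    fetch_json is any callable that takes a URL and returns parsed JSON;
    it is injected so the pagination logic can run without network access.
    """
    posts, after = [], None
    while len(posts) < max_posts:
        page = fetch_json(listing_url(subreddit, sort, time_filter, after))
        children = page["data"]["children"]
        posts.extend(child["data"] for child in children)
        after = page["data"]["after"]
        if not after or not children:
            break  # no further pages
        sleep(delay)  # polite pacing between requests
    return posts[:max_posts]
```

In real use, `fetch_json` would wrap an HTTP client that sends a descriptive User-Agent header, since Reddit tends to throttle anonymous default clients.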

Source attribution

Data comes from Reddit's public JSON endpoint (/r/{sub}/.json), which does not require authentication. Use of the data is subject to Reddit's Public Content Policy.
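
For orientation, here is one plausible mapping from a raw post object in Reddit's listing JSON to the output item shape documented above. The Reddit field names (`selftext`, `permalink`, `upvote_ratio`, `num_comments`, `created_utc`, `is_self`, `link_flair_text`) are those of the public listing response; the mapping itself and the function name `normalize_post` are assumptions, not the actor's confirmed logic.

```python
from datetime import datetime, timezone


def normalize_post(raw, include_body=True):
    """Map a raw Reddit post dict to the documented output fields."""
    created = raw["created_utc"]  # Unix seconds, usually a float
    item = {
        "id": raw["id"],
        "subreddit": raw["subreddit"],
        "title": raw["title"],
        "author": raw["author"],
        "url": "https://www.reddit.com" + raw["permalink"],
        "linkUrl": raw["url"],
        "score": raw["score"],
        "upvoteRatio": raw["upvote_ratio"],
        "numComments": raw["num_comments"],
        "createdUtc": int(created),
        "createdAt": datetime.fromtimestamp(created, tz=timezone.utc)
                             .strftime("%Y-%m-%dT%H:%M:%SZ"),
        "isSelf": raw["is_self"],
        "flair": raw.get("link_flair_text"),
        "domain": raw["domain"],
    }
    if include_body:
        item["body"] = raw.get("selftext", "")
    return item


raw_example = {
    "id": "abc", "subreddit": "SaaS", "title": "Hello", "author": "user123",
    "permalink": "/r/SaaS/comments/abc/hello/", "url": "https://example.com/",
    "score": 10, "upvote_ratio": 0.9, "num_comments": 2, "created_utc": 0.0,
    "is_self": False, "link_flair_text": None, "domain": "example.com",
    "selftext": "",
}
item = normalize_post(raw_example)
```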

Author

Polara Data — niche scrapers for Italy, EU & global markets.