®️ Reddit Posts Intelligence Scraper

Pricing

from $5.00 / 1,000 results

Posts-only Reddit scraper using public .json endpoints. Extracts posts and adds lead intent, sentiment, virality, quality, keyword matches, and RAG markdown.

Rating

0.0 (0)

Developer

Ian Dikhtiar

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 13 days ago

Reddit Lead Intel Scraper

Reddit is where buyers complain before they book demos.

People ask for alternatives, rant about broken tools, compare competitors, describe painful workflows, and reveal exactly what they want next. The problem is that Reddit is noisy as hell.

This actor finds the useful posts and ranks them.

It scrapes public Reddit posts through Reddit's .json endpoints, then enriches every post with lead intent, urgency, sentiment, virality, quality, keyword matches, and RAG-ready markdown.

No Reddit OAuth. No official Reddit API key. No browser crawling.
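
For a sense of what a public .json request looks like, here is a minimal sketch that builds a Reddit search URL. The helper name build_search_url and its defaults are assumptions for illustration; the actor's exact requests are not documented here. The query parameters (q, restrict_sr, sort, t, limit, after) follow Reddit's public listing conventions.

```python
from urllib.parse import quote, urlencode

REDDIT_BASE = "https://www.reddit.com"


def build_search_url(query, subreddit=None, sort="relevance", time="year",
                     limit=100, after=None):
    """Build a public Reddit search URL that returns JSON (hypothetical helper)."""
    # Reddit listings return at most 100 items per page.
    params = {"q": query, "sort": sort, "t": time, "limit": min(limit, 100)}
    if subreddit:
        # restrict_sr=1 keeps the search inside the given subreddit.
        path = f"/r/{quote(subreddit)}/search.json"
        params["restrict_sr"] = 1
    else:
        path = "/search.json"
    if after:
        # "after" is the pagination cursor returned with the previous page.
        params["after"] = after
    return f"{REDDIT_BASE}{path}?{urlencode(params)}"


url = build_search_url("Apollo alternative", subreddit="SaaS")
print(url)
```

No request is sent here; the sketch only shows the URL shape, so it runs without network access.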

What this is really for

Use it when you want to find posts like:

  • “What’s the best alternative to Apollo?”
  • “I need a tool that can enrich leads without getting blocked.”
  • “Has anyone found a cheaper way to monitor brand mentions?”
  • “Our CRM is a mess — what should we switch to?”
  • “Looking for software that handles outbound and compliance.”

Those are not just posts. They are demand signals.

Who uses it

  • Founders looking for early customers, competitor gaps, and raw market pain
  • Growth teams monitoring Reddit for high-intent conversations
  • Sales teams finding people actively asking for recommendations
  • Market researchers collecting voice-of-customer data without manually scrolling Reddit
  • Content teams discovering topics people actually care about
  • AI teams building clean Reddit datasets for RAG, classification, and analysis

What you get in each row

| Category | Fields | Why it matters |
| --- | --- | --- |
| Reddit post | title, text, author, subreddit, permalink, timestamp | The actual post and source context |
| Engagement | score, upvote ratio, comments count | Shows whether the post has traction |
| Lead signals | lead intent score, urgency, matched keywords, signal explanations | Tells you which posts deserve attention first |
| Text signals | sentiment, quality score, spam/noise penalties | Helps separate useful pain from garbage |
| AI-ready text | RAG markdown with metadata | Ready for LLMs, embeddings, alerts, or CRM enrichment |

The intelligence layer

Lead intent score

Every post gets a lead_intent_score from 0 to 100.

The score rises when a post looks commercially useful: recommendation requests, alternative searches, buying language, pain/problem language, strong keyword matches, and meaningful engagement.

High scores usually mean: “someone should look at this.”
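
The actor's real scoring rules and weights are not published in this listing. The sketch below only illustrates the general idea of an additive heuristic capped at 100; every pattern, weight, and function name in it is a made-up assumption.

```python
import re

# Hypothetical phrase patterns; the actor's real lists are not documented.
INTENT_PATTERNS = [r"\balternative", r"\brecommend", r"\blooking for\b", r"\bbest\b"]
PAIN_PATTERNS = [r"\bstruggling\b", r"\bbroken\b", r"\bmess\b", r"\bnot working\b"]


def lead_intent_score(title, text, keywords, num_comments):
    """Return (score 0-100, signals, matched keywords) for one post."""
    body = f"{title} {text}".lower()
    score = 0
    signals = []
    if any(re.search(pattern, body) for pattern in INTENT_PATTERNS):
        score += 40  # assumed weight for buying/recommendation language
        signals.append("buying/recommendation intent")
    if any(re.search(pattern, body) for pattern in PAIN_PATTERNS):
        score += 25  # assumed weight for pain/problem language
        signals.append("pain/problem language")
    matched = [k for k in keywords if k.lower() in body]
    score += min(len(matched) * 5, 20)  # keyword matches, capped at 20
    score += min(num_comments, 15)      # engagement, capped at 15
    return min(score, 100), signals, matched


score, signals, matched = lead_intent_score(
    "What's the best alternative to Apollo?",
    "Our CRM is a mess",
    ["alternative", "tool"],
    num_comments=21,
)
print(score, signals, matched)
```

Capping each component keeps one strong signal, such as a very long comment thread, from dominating the score on its own.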

Lead urgency

Each post is labeled low, medium, or high urgency.

This makes it easy to route the best posts into Slack, a spreadsheet, a CRM, a lead review queue, or an LLM workflow.
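
As a rough sketch of that routing step, the snippet below maps each row's lead_urgency label to a destination. The ROUTES mapping and destination names are hypothetical; only the intelligence.lead_urgency field comes from the example output in this listing.

```python
# Hypothetical destinations; swap in whatever your workflow uses.
ROUTES = {"high": "slack", "medium": "review_queue", "low": "spreadsheet"}


def route(row):
    """Pick a destination name from the row's urgency label."""
    urgency = row.get("intelligence", {}).get("lead_urgency", "low")
    # Unknown or missing labels fall back to the lowest-priority destination.
    return ROUTES.get(urgency, ROUTES["low"])


rows = [
    {"title": "Post A", "intelligence": {"lead_urgency": "high"}},
    {"title": "Post B", "intelligence": {"lead_urgency": "medium"}},
    {"title": "Post C", "intelligence": {}},
]
for row in rows:
    print(row["title"], "->", route(row))
```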

Signal explanations

The actor does not just score posts silently. It tells you why a post was interesting.

Example signals:

  • buying/recommendation intent
  • pain/problem language
  • fast engagement velocity
  • negative sentiment risk
  • possible spam/low-quality content

Sentiment and quality

Reddit contains gold and garbage in the same thread.

The actor scores sentiment and quality so you can find useful complaints, product feedback, and competitor frustration without drowning in memes, spam, and low-effort posts.

Virality velocity

A post with five comments in ten minutes can matter more than a post with fifty comments from last year.

Virality velocity helps surface discussions that are moving now.
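
One plausible way to compute such a velocity is comments per hour since the post was created. This is an assumption, not the actor's documented formula; the function name and the use of Reddit's created_utc epoch timestamp are illustrative.

```python
from datetime import datetime, timezone


def comments_per_hour(num_comments, created_utc, now=None):
    """Comments divided by post age in hours (assumed formula)."""
    now = now or datetime.now(timezone.utc)
    created = datetime.fromtimestamp(created_utc, tz=timezone.utc)
    # Floor the age at one minute so brand-new posts do not divide by zero.
    age_hours = max((now - created).total_seconds() / 3600, 1 / 60)
    return round(num_comments / age_hours, 3)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
ten_minutes_ago = now.timestamp() - 10 * 60
a_year_ago = now.timestamp() - 365 * 24 * 3600

fresh = comments_per_hour(5, ten_minutes_ago, now=now)
stale = comments_per_hour(50, a_year_ago, now=now)
print(fresh, stale)
```

Under this assumed formula, the five-comment post from ten minutes ago scores far higher than the fifty-comment post from last year, which matches the intuition above.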

RAG-ready markdown

Every row includes rag_markdown, a clean markdown document containing the post title, body, subreddit, author, and source URL.

Use it for:

  • vector databases
  • LLM summarization
  • lead qualification
  • category research
  • alerts and dashboards
  • downstream enrichment
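
To make the shape of such a document concrete, here is a minimal sketch that assembles markdown from the fields listed above. The exact layout the actor emits in rag_markdown is not shown in this listing, so the layout and the to_rag_markdown helper are assumptions.

```python
def to_rag_markdown(post):
    """Assemble a small markdown document from post fields (assumed layout)."""
    lines = [
        f"# {post['title']}",
        "",
        f"- Subreddit: r/{post['subreddit']}",
        f"- Author: u/{post['author']}",
        f"- Source: {post['permalink']}",
        "",
        post.get("text", "").strip(),
    ]
    # Trim trailing blank lines when the post has no body text.
    return "\n".join(lines).strip() + "\n"


doc = to_rag_markdown({
    "title": "Our CRM is a mess",
    "subreddit": "sales",
    "author": "example_user",  # made-up sample values
    "permalink": "https://www.reddit.com/r/sales/comments/...",
    "text": "What should we switch to?",
})
print(doc)
```

Keeping the metadata as a short list above the body means the source URL travels with every chunk that ends up in a vector database.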

Example: find competitor alternatives

{
  "queries": ["Apollo alternative", "best lead generation tool", "need sales intelligence software"],
  "subreddits": ["SaaS", "sales", "Entrepreneur"],
  "sort": "relevance",
  "time": "year",
  "maxResults": 100,
  "keywords": ["recommend", "alternative", "looking for", "need", "tool", "software"],
  "negativeKeywords": ["crypto", "casino", "airdrop", "giveaway"],
  "dropNegativeKeywordMatches": true,
  "excludeOver18": true
}
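
The interplay of keywords, negativeKeywords, and dropNegativeKeywordMatches can be sketched roughly as follows. The matching rule used here (case-insensitive substring) and the helper name are assumptions; the actor may match differently.

```python
def apply_keywords(post_text, keywords, negative_keywords, drop_negative_matches):
    """Return (keep, matched keywords, matched negative keywords)."""
    text = post_text.lower()
    # Plain substring matching: simple, but "need" also matches "needle".
    matched = [k for k in keywords if k.lower() in text]
    negatives = [k for k in negative_keywords if k.lower() in text]
    # With the drop flag off, negative matches would only penalize the post.
    keep = not (drop_negative_matches and negatives)
    return keep, matched, negatives


keywords = ["recommend", "alternative", "looking for", "need", "tool", "software"]
negative_keywords = ["crypto", "casino", "airdrop", "giveaway"]

result = apply_keywords(
    "Looking for a tool, not a crypto airdrop",
    keywords, negative_keywords, drop_negative_matches=True,
)
print(result)
```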

Example output

{
  "type": "post",
  "title": "Evaluating B2B lead generation tool - compliance friendly, enterprise ready",
  "author": "Additional-Pop8840",
  "subreddit": "Entrepreneur",
  "score": 2,
  "num_comments": 21,
  "permalink": "https://www.reddit.com/r/Entrepreneur/comments/...",
  "intelligence": {
    "lead_intent_score": 100,
    "lead_urgency": "high",
    "sentiment_label": "neutral",
    "virality_velocity_per_hour": 0.049,
    "quality_score": 100,
    "matched_keywords": ["tool"],
    "signals": ["buying/recommendation intent", "pain/problem language"]
  }
}
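
Once rows like this are exported, a few lines of post-processing can rank them. The sketch below assumes rows are already loaded as Python dicts with the nested intelligence object shown above; the top_leads helper and the second and third sample rows are made up for illustration.

```python
def top_leads(rows, min_score=60):
    """Keep rows at or above min_score, strongest first."""
    strong = [
        row for row in rows
        if row["intelligence"]["lead_intent_score"] >= min_score
    ]
    return sorted(
        strong,
        key=lambda row: row["intelligence"]["lead_intent_score"],
        reverse=True,
    )


sample_rows = [
    {"title": "Weekly meme thread", "intelligence": {"lead_intent_score": 5}},
    {"title": "Evaluating B2B lead generation tool",
     "intelligence": {"lead_intent_score": 100}},
    {"title": "Struggling with CRM imports", "intelligence": {"lead_intent_score": 72}},
]
for row in top_leads(sample_rows):
    print(row["intelligence"]["lead_intent_score"], row["title"])
```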

Good search ideas

Try phrases that sound like real Reddit posts:

| Goal | Search examples |
| --- | --- |
| Find alternatives | competitor alternative, switching from competitor, best alternative to competitor |
| Find pain | struggling with CRM, outbound is not working, lead data problem |
| Find buyers | need a tool for, looking for software, what should I use for |
| Find feedback | is product worth it, has anyone tried product, product review |
| Find trends | AI tool for sales, automated prospecting, brand monitoring reddit |

Input guide

| Input | Best use |
| --- | --- |
| queries | Search Reddit by buyer phrases, competitor names, pain points, or product categories |
| subreddits | Focus on communities where your buyers hang out |
| startUrls | Scrape specific Reddit URLs directly |
| keywords | Boost lead scoring for your preferred intent phrases |
| negativeKeywords | Penalize or remove noisy topics |
| minLeadIntentScore | Save only stronger leads; try 60+ or 80+ |
| maxResults | Control output size |

Use at least one of queries, subreddits, or startUrls.
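
If you build the input programmatically, a small pre-flight check can catch an empty configuration before a run starts. This is a sketch of that rule only; the validate_input helper is hypothetical and not part of the actor.

```python
def validate_input(actor_input):
    """Raise if none of queries, subreddits, or startUrls is provided."""
    sources = ("queries", "subreddits", "startUrls")
    # Empty lists count as missing, since they give the actor nothing to scrape.
    if not any(actor_input.get(name) for name in sources):
        raise ValueError("Provide at least one of: queries, subreddits, startUrls")
    return actor_input


checked = validate_input({"queries": ["Apollo alternative"], "maxResults": 100})
print(checked)
```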

Data source

The actor uses public Reddit .json endpoints. It does not require Reddit OAuth or an official Reddit API key.

Reddit may throttle or block some cloud traffic, so Apify Proxy is enabled by default. The actor also uses delays, retries, pagination controls, and graceful partial results.
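
The retry-with-delay idea can be sketched as a generic wrapper with exponential backoff. The delays, attempt count, and function names below are assumptions, not the actor's actual settings; the fetch callable is injected so the sketch runs without network access.

```python
import time


def fetch_with_retries(fetch, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff on failure."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except Exception as error:  # a real client would catch narrower errors
            last_error = error
            if attempt < max_attempts - 1:
                # Wait 1s, 2s, 4s, ... between attempts.
                sleep(base_delay * 2 ** attempt)
    raise last_error


# Simulated flaky endpoint: fails twice, then succeeds.
calls = {"count": 0}
delays = []


def flaky_fetch(url):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("throttled")
    return {"url": url, "ok": True}


result = fetch_with_retries(flaky_fetch, "https://www.reddit.com/r/SaaS/new.json",
                            sleep=delays.append)
print(result, delays)
```

Passing sleep as a parameter keeps the example fast: the recorded delays show the backoff schedule without actually waiting.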

What this actor does not do

  • It does not crawl full comment trees.
  • It does not access private Reddit content.
  • It does not recover deleted, gated, quarantined, or unavailable posts.
  • It does not pretend that heuristic scoring is perfect lead qualification.

This is intentionally posts-only. A dedicated comments actor is a cleaner, separate product.