Dark Funnel Scraper

Pricing

from $100.00 / 1,000 results

Find B2B buyers before they fill a form. Monitors Reddit, G2, GitHub, HackerNews and LinkedIn for competitor switching signals and buying intent. Outputs CRM-ready leads with intent scores and outreach angles. Pay only for high-intent leads delivered.

Rating

1.0 (1 rating)

Developer

Rohith S

Maintained by Community

Actor stats

Bookmarked: 2
Total users: 10
Monthly active users: 1
Last modified: 2 days ago

🌑 Dark Funnel Intelligence Engine

Stop cold calling. Start listening. Uncover B2B buying intent before prospects enter your CRM.

67% of the B2B buyer journey happens in the "Dark Funnel": private communities, public forums, and peer reviews. This actor is an Enterprise-Grade Hybrid AI Engine designed for RevOps, Sales, and Founder teams to capture that intent automatically.

It monitors high-value B2B discussions across Reddit, G2, Hacker News, GitHub, and public LinkedIn posts, filters out noise, and outputs heavily qualified, CRM-ready leads directly into your database.


🎯 Use Cases

1. Sales Development: Find High-Intent Prospects Early

  • Discover companies evaluating solutions in your category.
  • Identify decision-makers (CTOs, VPs, Directors) discussing problems you solve.
  • Prioritize outreach based on buying stage (awareness → consideration → evaluation → decision).

2. Competitive Intelligence: Automated Displacement

  • Monitor competitor mentions alongside your brand on G2 and Reddit.
  • Detect switching signals ("migrating from X to Y").
  • Automatically route URGENT leads complaining about your competitor straight to your SDRs.

3. Customer Success: Prevent Churn

  • Detect early at-risk signals from existing customers in public forums.
  • Identify replacement-buying motions before RFPs are issued.
  • Proactively engage when negative sentiment appears on G2.

4. Market Intelligence: Executive Summaries

  • Generate weekly digests of overall market sentiment.
  • Track competitor risk metrics and feature dissatisfaction.

🚀 How It Works: The Hybrid Intelligence Engine

Our engine avoids the fatal flaw of most scrapers: noise. We use a 4-stage hybrid pipeline that keeps compute costs low while maintaining high precision (100% on the internal benchmark described under Performance & Limitations).

1. Multi-Source Signal Collection

We optimize strictly for trustworthy commercial signals, not generic volume.

  • LinkedIn Discovery Support: Captures public buying signals, professional intent, and evaluation requests. Note: This is optimized for discovery of public posts via search engines, not deep authenticated profile extraction.
  • G2 Reviews: Uncovers deep dissatisfaction, pricing complaints, and vendor evaluations (via Yahoo Dorking).
  • Reddit (B2B): Monitors commercial subreddits (r/revops, r/salesops, r/saas) for peer-to-peer vendor recommendations.
  • Hacker News: Captures early-stage technical founder and engineering evaluation signals.
  • GitHub: Monitors GitHub Issues to detect technical implementation pains.

2. Fast Heuristics (Zero-Cost Filtering)

Rapidly scans for pain keywords, personas, and commercial relevance. Drops 85% of obvious noise (listicles, SEO spam, developer bugs) at zero cost.
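
As an illustration of this stage, here is a minimal Python sketch of a keyword prefilter. The keyword and noise lists, and the function name passes_heuristics, are hypothetical; the actor's real lists and persona checks are not published.

```python
import re

# Hypothetical keyword and noise lists, for illustration only.
PAIN_KEYWORDS = ("pricing", "too expensive", "switch", "migrating", "alternative")
NOISE_PATTERNS = (
    r"\btop \d+\b",      # listicles such as "Top 10 CRM tools"
    r"\bstack trace\b",  # developer bug reports
    r"\btypeerror\b",    # raw error messages
)


def passes_heuristics(text: str) -> bool:
    """Return True if the text has a pain keyword and no obvious noise marker."""
    if any(re.search(pattern, text, flags=re.IGNORECASE) for pattern in NOISE_PATTERNS):
        return False
    lowered = text.lower()
    return any(keyword in lowered for keyword in PAIN_KEYWORDS)


if __name__ == "__main__":
    print(passes_heuristics("HubSpot pricing is getting ridiculous for our team"))
    print(passes_heuristics("Top 10 CRM alternatives for 2026"))
```

Because the check uses only string and regex matching, it can run on every scraped item before any more expensive scoring step.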

3. Deep Source Weighting

Applies precise multipliers. A mention in r/revops or G2 is boosted (1.5x), while technical chatter in r/reactjs is penalized (0.7x). Also supports GitHub repo star multipliers (1.2x for 1k+ stars, 1.4x for 5k+ stars).
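
The multipliers quoted above can be sketched in Python as follows. The function names, the neutral default of 1.0 for unlisted sources, and the cap at 100 are assumptions for illustration; only the numeric multipliers come from the description.

```python
from typing import Optional

# Multipliers quoted in the description; unlisted sources default to 1.0 (assumption).
SOURCE_MULTIPLIERS = {"r/revops": 1.5, "g2": 1.5, "r/reactjs": 0.7}


def github_star_multiplier(stars: int) -> float:
    """1.4x for 5k+ stars, 1.2x for 1k+ stars, otherwise neutral."""
    if stars >= 5000:
        return 1.4
    if stars >= 1000:
        return 1.2
    return 1.0


def weighted_score(base_score: float, source: str, github_stars: Optional[int] = None) -> float:
    """Apply source and star multipliers, capped at 100 (the cap is an assumption)."""
    multiplier = SOURCE_MULTIPLIERS.get(source.lower(), 1.0)
    if github_stars is not None:
        multiplier *= github_star_multiplier(github_stars)
    return min(100.0, base_score * multiplier)


if __name__ == "__main__":
    print(weighted_score(50, "r/revops"))
    print(weighted_score(50, "github", github_stars=6000))
```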

4. Compound Pain & Recency Decay

  • Compound Pain Multipliers: Detects high-value pain combinations (e.g., "pricing + vendor lock" → 1.4x boost).
  • Recency Decay: Fresh signals get higher priority (1x for 0 days, ~0.4x for 30 days, ~0.07x for 90 days).
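
The quoted decay figures are consistent with an exponential curve. The sketch below assumes a rate of 0.03 per day, which gives about 0.41 at 30 days and about 0.07 at 90 days; the actual formula, the constant, and the function names are assumptions, not the actor's published implementation.

```python
import math
from typing import Iterable

# Assumed decay rate; chosen because it reproduces the quoted ~0.4x and ~0.07x figures.
DECAY_RATE_PER_DAY = 0.03

# The one combination named in the description; others would be added the same way.
COMPOUND_COMBOS = {frozenset({"pricing", "vendor_lock"}): 1.4}


def recency_multiplier(age_days: float) -> float:
    """Exponential decay: 1.0 for a signal posted today, shrinking with age."""
    return math.exp(-DECAY_RATE_PER_DAY * max(age_days, 0.0))


def compound_pain_multiplier(pain_types: Iterable[str]) -> float:
    """Return the largest boost whose pain combination is fully present."""
    found = set(pain_types)
    boosts = [boost for combo, boost in COMPOUND_COMBOS.items() if combo <= found]
    return max(boosts, default=1.0)


def adjusted_score(base_score: float, pain_types: Iterable[str], age_days: float) -> float:
    """Combine both multipliers with the heuristic score."""
    return base_score * compound_pain_multiplier(pain_types) * recency_multiplier(age_days)


if __name__ == "__main__":
    for days in (0, 30, 90):
        print(days, round(recency_multiplier(days), 3))
```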

📊 Example Output (CRM Ready)

This is what a fully enriched, high-intent lead looks like when generated by the engine.

{
  "company": "HubSpot",
  "source": "reddit",
  "subreddit": "r/revops",
  "title": "HubSpot vs Salesforce? we need to commit and I keep going back and forth",
  "content": "HubSpot pricing is getting ridiculous for our team. We are actively looking to switch. Any recommendations?",
  "intentLevel": "HIGH",
  "leadPriority": "URGENT",
  "painComboBoost": true,
  "painSignals": {
    "hasPainSignal": true,
    "painTypes": ["pricing", "vendor_lock"],
    "compoundComboMatched": "pricing+vendor_lock"
  },
  "switchSignals": {
    "switchingDetected": true,
    "switchingFrom": "HubSpot"
  },
  "recommendedOutreachAngle": "Lead with cost reduction and easy migration",
  "createdAt": "2026-06-19T00:00:00.000Z"
}
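
A record shaped like the one above could be flattened for a CRM import along these lines. This is a sketch only: the target column names (account, priority, and so on) and both function names are hypothetical, and only the input field names come from the example.

```python
from typing import Iterable, List


def to_crm_row(lead: dict) -> dict:
    """Flatten one lead into a single-level row (column names are hypothetical)."""
    pain = lead.get("painSignals", {})
    switch = lead.get("switchSignals", {})
    return {
        "account": lead["company"],
        "priority": lead.get("leadPriority", "UNKNOWN"),
        "intent": lead.get("intentLevel", "UNKNOWN"),
        "pain_types": ", ".join(pain.get("painTypes", [])),
        # Only report a vendor when a switching signal was actually detected.
        "switching_from": switch.get("switchingFrom") if switch.get("switchingDetected") else None,
        "outreach_angle": lead.get("recommendedOutreachAngle", ""),
        "source": lead.get("source", ""),
    }


def urgent_rows(leads: Iterable[dict]) -> List[dict]:
    """Keep only leads marked URGENT and flatten them."""
    return [to_crm_row(lead) for lead in leads if lead.get("leadPriority") == "URGENT"]
```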

⚙️ Configuration (Inputs)

Required Inputs

  • companies: Array of company names to monitor (e.g., ["Notion", "Stripe", "Airbnb"]). Max 50.

Source Toggles

  • enableLinkedIn: Enable LinkedIn Discovery Support to surface public professional B2B discussions (Recommended).
  • enableG2: Scrape highly commercial G2 Reviews (Recommended).
  • enableReddit: Scrape Reddit posts (Recommended).
  • enableHackernews: Search Hacker News stories and comments.
  • enableGithub: Search GitHub Issues.

Webhook Integration (New!)

  • webhookUrl: URL to send POST requests with high-intent signals (JSON payloads).
  • webhookBatchSize: Number of high-intent signals to send per webhook request (1-100, default: 25).
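
The batching described by webhookBatchSize can be pictured with the sketch below. The function name batch_signals is hypothetical, and the actor's own batching code may differ; the 1-100 range and default of 25 come from the option description.

```python
from typing import Iterator, List, Sequence


def batch_signals(signals: Sequence[dict], batch_size: int = 25) -> Iterator[List[dict]]:
    """Yield consecutive batches of at most batch_size signals."""
    if not 1 <= batch_size <= 100:
        raise ValueError("batch_size must be between 1 and 100")
    for start in range(0, len(signals), batch_size):
        yield list(signals[start:start + batch_size])


if __name__ == "__main__":
    demo = [{"id": i} for i in range(60)]
    print([len(batch) for batch in batch_signals(demo)])
```

With the default of 25, a run that produces 60 high-intent signals would result in three webhook requests under this scheme.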

Advanced Features

  • monitoringMode: Set to DAILY or WEEKLY to track deltas across runs, prevent duplicate leads, and generate smart alerts.
  • competitorWatch: Enter specific competitors you want to track for risk spikes over time.
  • templatePreset: Instantly load configurations for common use cases (e.g., crm_switching, devops_hosting).
  • skipLanguageFilter: If true, skip filtering out non-English content (for non-English markets).
  • forceEnableAll: If true, bypass circuit breakers and enable all scrapers even if they've had consecutive failures (for debugging).
  • maxRequestsPerCrawl: Max results per company per source (1-100, default: 5).
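
To check an input before starting a run, a small builder like the following may help. The input keys and limits are taken from the lists above; the function name build_run_input and the toggle defaults are assumptions, since the actor's actual defaults for the source toggles are not stated.

```python
from typing import List, Optional

VALID_MONITORING_MODES = {"DAILY", "WEEKLY"}


def build_run_input(
    companies: List[str],
    max_requests_per_crawl: int = 5,
    monitoring_mode: Optional[str] = None,
    enable_hackernews: bool = False,
    enable_github: bool = False,
) -> dict:
    """Validate the documented limits and return an input dict for the actor."""
    if not 1 <= len(companies) <= 50:
        raise ValueError("companies must contain between 1 and 50 names")
    if not 1 <= max_requests_per_crawl <= 100:
        raise ValueError("maxRequestsPerCrawl must be between 1 and 100")
    if monitoring_mode is not None and monitoring_mode not in VALID_MONITORING_MODES:
        raise ValueError("monitoringMode must be DAILY or WEEKLY")

    run_input = {
        "companies": companies,
        # The three sources marked "Recommended" are switched on here (assumption).
        "enableLinkedIn": True,
        "enableG2": True,
        "enableReddit": True,
        "enableHackernews": enable_hackernews,
        "enableGithub": enable_github,
        "maxRequestsPerCrawl": max_requests_per_crawl,
    }
    if monitoring_mode is not None:
        run_input["monitoringMode"] = monitoring_mode
    return run_input


if __name__ == "__main__":
    print(build_run_input(["Notion", "Stripe", "Airbnb"], monitoring_mode="WEEKLY"))
```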

🔗 Webhook Payload Format

When webhookUrl is configured, high-intent signals are sent in batches:

{
  "event": "high_intent_signal",
  "signals": [
    { /* CRM-ready signal object */ }
  ],
  "actorRunId": "your-actor-run-id",
  "timestamp": "2026-06-19T00:00:00.000Z"
}
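
On the receiving side, a handler might unpack a payload of this shape as sketched below. The function name extract_signals is hypothetical, and the choice to ignore unknown event types is an assumption; the field names come from the payload format above.

```python
import json
from typing import List, Union


def extract_signals(body: Union[str, bytes]) -> List[dict]:
    """Parse a webhook body and return its signals, or [] for other event types."""
    payload = json.loads(body)
    if payload.get("event") != "high_intent_signal":
        return []
    signals = payload.get("signals", [])
    if not isinstance(signals, list):
        raise ValueError("'signals' must be a list")
    return signals


if __name__ == "__main__":
    raw = json.dumps({
        "event": "high_intent_signal",
        "signals": [{"company": "HubSpot", "leadPriority": "URGENT"}],
        "actorRunId": "your-actor-run-id",
        "timestamp": "2026-06-19T00:00:00.000Z",
    })
    for signal in extract_signals(raw):
        print(signal["company"], signal["leadPriority"])
```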

📈 Cost of Usage & Economics

This Actor operates on a Pay-Per-Event (PPE) pricing model. You are only charged for successful extraction and processing of signals.

Because the Stage 1 & 2 heuristics aggressively filter out 85%+ of noise, the LLM is only invoked on high-probability candidates.

  • Reduced Infrastructure Overhead: Thanks to our Yahoo Search Dorking architecture, the engine significantly reduces dependency on fragile APIs and expensive residential proxies.
  • Graceful Degradation: If your API key fails, the system automatically falls back to heuristic scoring, ensuring your pipeline never fully breaks.
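
The graceful-degradation behavior can be pictured as a simple try/fallback wrapper. This is a sketch under assumptions: score_with_fallback and both scorer callables are hypothetical stand-ins, not the actor's internal functions.

```python
from typing import Callable, Tuple

Scorer = Callable[[str], float]


def score_with_fallback(text: str, llm_scorer: Scorer, heuristic_scorer: Scorer) -> Tuple[float, str]:
    """Try the LLM scorer first; on any failure, fall back to heuristic scoring."""
    try:
        return llm_scorer(text), "llm"
    except Exception:
        # For example an invalid API key, a timeout, or a rate limit.
        return heuristic_scorer(text), "heuristic"


if __name__ == "__main__":
    def broken_llm(_: str) -> float:
        raise PermissionError("invalid API key")

    def keyword_score(text: str) -> float:
        return 80.0 if "switch" in text.lower() else 10.0

    print(score_with_fallback("We are looking to switch CRMs", broken_llm, keyword_score))
```

Returning the scoring method alongside the score lets downstream consumers tell which leads were scored without the LLM.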

🔒 Privacy & Compliance

  • ✅ Public data only: All scraped content is publicly accessible.
  • ✅ No authentication required: Doesn't access private accounts or login-protected content.
  • ✅ Data Minimization: Stores only usernames (public identifiers), not emails or private info. Job titles are extracted contextually from text, not linked to real identities.
  • ⚠️ Legal Disclaimer: This actor is intended for legitimate B2B marketing research. Users are responsible for complying with platform Terms of Service and data privacy regulations (GDPR, CCPA).

🧠 Technical Architecture

┌──────────────────────────────────────────────────────────┐
│                  MULTI-SOURCE INGESTION                  │
│        [Reddit] [LinkedIn] [G2 Reviews] [GitHub]         │
└────────────────────────────┬─────────────────────────────┘
                             │ Raw Unstructured Text
                             ▼
┌──────────────────────────────────────────────────────────┐
│               STAGE 1 & 2: FAST HEURISTICS               │
│  • Deduplication & Spam Filtering                        │
│  • NLP Keyword & Sentiment Analysis                      │
│  • Persona & Entity Extraction                           │
└────────────────────────────┬─────────────────────────────┘
                             │ Heuristic Intent Score (0-100)
                             ▼
                    [EARLY DATE FILTER]
                      /             \
        >90 days (actual)         ≤90 days
                │                     │
                ▼                     ▼
            [DISCARD]   ┌───────────────────────────┐
                        │  COMPOUND PAIN + RECENCY  │
                        │  Multiplier Application   │
                        └─────────────┬─────────────┘
                                      │
                                      ▼
                         [✅ HIGH-INTENT CRM LEAD ✅]

Key Technologies

  • Crawlee: Scalable web scraping framework.
  • Hybrid NLP Engine: Custom AFINN-based sentiment analysis + keyword-based intent detection.
  • Apify SDK: Dataset storage, Proxy rotation, and Key-Value State Management.

📉 Performance & Limitations

  • Gold Dataset Validated: The engine is continuously tested against an internal benchmark dataset, on which it scores 100% precision and 100% recall for B2B edge cases.
  • The Public Internet is Noisy: Some days, nobody is discussing your niche. Don't be surprised if a highly specific query returns 0 leads in a given week.
  • G2 Indexing: G2 is heavily protected. The engine uses search-engine dorking to extract reviews, so volume may fluctuate based on search engine indexing.

📞 Support & Contribution

Built for revenue teams who refuse to miss a deal.

  • Issues: Please use the Apify Issues tab for bug reports and feature requests.
  • License: MIT