B2B Lead Scraper & Email Finder - Decision Makers

Pricing

from $0.00005 / actor start

Upload a company list and get back verified decision maker emails, phones, LinkedIn, and social profiles. 24-stage pipeline: website discovery, contact extraction, email finder, verification, social enrichment, lead scoring, and Excel export. For email marketing, cold outreach, and B2B prospecting.

Rating

5.0

(2)

Developer

Leadslogix LLC

Maintained by Community

B2B Lead Generation Tool & Sales Intelligence Platform -- Extract Verified Decision Maker Emails at Scale

The most powerful B2B lead generation and contact enrichment tool on Apify. Extract verified decision maker emails, phone numbers, LinkedIn profiles, and company intelligence from any company list -- no API keys required. A cost-effective Apollo alternative and ZoomInfo alternative that scrapes company websites, discovers emails through 5 search layers, verifies every address, and scores contacts by seniority -- all in a single 24-stage automated pipeline.

Upload a CSV. Get sales-ready leads back. $2 per 1,000 results.

Built for sales teams, growth marketers, SDRs, recruiters, and agencies who need a reliable business leads database, contact discovery engine, and CRM data enrichment tool for cold email outreach, sales prospecting, account-based marketing, and lead list building at scale.


Why Teams Switch from Apollo, ZoomInfo, and Lusha to This

| Pain Point | How This Solves It |
|---|---|
| Apollo/ZoomInfo costs $100-500/mo for stale data | Pay $2 per 1,000 leads -- fresh data scraped in real time, no subscription |
| Purchased lead lists have 30-50% bounce rates | Built-in 6-check email verification with B2B tier classification (TIER_1 = <5% bounce) |
| Contact databases miss small/mid-size companies | Scrapes any company website directly -- not limited to a pre-built database |
| LinkedIn Sales Navigator requires manual prospecting | Automated LinkedIn employee discovery finds decision makers via search engines |
| Generic web scrapers miss contacts in JavaScript | Headless Chromium + 4 extraction methods catch contacts hidden in JSON-LD, JS bundles, and hydration payloads |
| No way to tell who's a decision maker | AI-powered lead scoring with seniority mapping, persona classification, and authority scoring |
| Exporting data requires manual cleanup | 14-rule junk removal, dedup, and CRM-ready export in CSV, Excel, and JSON Lines |
| Running the same list twice wastes time | Incremental delta mode skips recently-enriched companies, saving ~70% on repeat runs |

Key Features

Multi-Source B2B Data Extraction

  • Website email extractor with 4-method contact extraction (JSON-LD, team cards, heuristic proximity, LinkedIn URLs)
  • 5-layer email discovery engine: DNS/OSINT, direct crawl, search engines, PDF mining, social platforms
  • LinkedIn employee discovery via multi-query search with role-based variations (CEO, CTO, VP, Director, Manager)
  • 8-platform social media enrichment: LinkedIn, Twitter/X, Facebook, Instagram, YouTube, GitHub, Crunchbase, Glassdoor
  • SERP intelligence: extract revenue estimates, funding signals, employee counts, and acquisition news from search results
  • File intelligence: download and parse PDFs for contacts, org charts, and emails invisible to HTML scrapers
  • Hidden contact extraction: parse __NEXT_DATA__, __NUXT__, __INITIAL_STATE__, and JS hydration payloads

AI-Powered Lead Scoring & Sales Intelligence

  • Decision maker identification with 5-level seniority mapping (C-Suite, VP/Director, Manager, Staff, Unknown)
  • Persona classification: Economic Buyer, Champion, Technical Evaluator, Influencer
  • Combined priority score (0-100): 60% authority + 40% email confidence
  • Company intelligence profile: tech stack fingerprinting (18+ frameworks), SaaS detection, company maturity scoring
  • Quality gate engine: configurable thresholds filter low-quality contacts before export

Email Verification & Deliverability

  • 6-check verification pipeline: syntax, MX records, catch-all detection, disposable filtering, role detection, DKIM/SPF/DMARC
  • B2B send tiers: TIER_1_SEND (safe), TIER_2_LIKELY_GOOD, TIER_3_REVIEW, SKIP
  • 8-pattern email prediction for contacts missing emails: first.last@, flast@, firstlast@, first_last@, and more
  • Confidence scoring (0-100) with weighted components: SMTP +40, MX +20, auth records +15, pattern +10

Enterprise-Grade Infrastructure

  • Adaptive concurrency: auto-scales 4-32 workers based on success rate and response times
  • Cross-run shared cache: eliminates redundant DNS lookups and re-crawls across pipeline runs
  • Incremental delta mode: skip companies enriched within configurable freshness window (1-90 days)
  • Executive correlation engine: cross-source contact dedup with fuzzy Levenshtein name matching
  • Webhook dispatcher: HTTP POST results to your CRM/Zapier/webhook endpoint with 3x retry
  • Checkpoint/resume: large runs survive restarts and actor migrations

Flexible Export & Integration

  • Multi-format export: Apify Dataset + CSV + 5-sheet Excel + JSON Lines (.jsonl)
  • 6 dataset views: All Contacts, High Priority, Decision Makers, Companies, Company Intelligence, Funding Intel
  • Webhook integration: real-time HTTP POST on pipeline completion with summary or full results
  • CRM-ready: import directly into HubSpot, Salesforce, Pipedrive, Apollo, Lemlist, Instantly, Smartlead
  • API access: full REST API for programmatic integration, scheduling, and automation

Use Cases

Cold Email Outreach & Email Marketing

Upload your target company list and get back verified decision maker email addresses with B2B send tier classification. Filter by TIER_1_SEND for the safest emails (typically <5% bounce rate), or include TIER_2 for broader reach. Import directly into Lemlist, Instantly, Smartlead, Apollo, Woodpecker, Mailchimp, or any email marketing platform.

Sales Prospecting & Lead List Building

Build targeted B2B lead lists from scratch. Start with just company names -- the pipeline discovers websites, extracts leadership teams, finds and verifies emails, and scores every contact. Export the High_Priority sheet for your SDR team's daily call list. Use the funding intelligence to prioritize recently-funded companies.

Account-Based Marketing (ABM)

Enrich your target account list with verified contacts, social profiles, tech stack data, and company intelligence. The decision maker mapping identifies Economic Buyers and Champions at each company. Social enrichment gives your team conversation starters across LinkedIn, Twitter, and more.

CRM Data Enrichment & Cleansing

Have a CRM full of companies but missing contact details? Upload your company list and the pipeline fills in emails, phones, LinkedIn URLs, social profiles, tech stack, and decision maker details. The incremental mode ensures you only pay for new enrichment -- previously-processed companies are skipped.

Competitive Intelligence & Market Research

Scrape company websites at scale to collect organizational data, leadership teams, tech stacks, funding signals, and social presence. The Company Intelligence view shows tech stack fingerprinting, SaaS detection, employee count estimates, and company maturity scores. The Funding Intel view tracks revenue estimates and acquisition signals.

Recruitment & Talent Sourcing

Find hiring managers and leadership contacts at target companies. The pipeline extracts LinkedIn profiles alongside email addresses, making it easy to combine email outreach with LinkedIn messaging. Use the persona classification to identify Technical Evaluators and Champions.

Apollo/ZoomInfo Data Supplement

Supplement your existing Apollo or ZoomInfo data with fresh website scraping. This tool scrapes company websites in real-time rather than relying on a static database, finding contacts that Apollo and ZoomInfo miss -- especially at small/mid-size companies, international firms, and recently-hired executives.


How It Works -- 24-Stage Intelligence Pipeline

Architecture Overview

INPUT: Company list (CSV / Excel / URL / JSON)
|
v
+--[ Stage 1: INGEST ]--[ Stage 2: DISCOVER ]--[ Stage 3: GOOGLE BOOST ]
|
+--[ Stage 4: ENRICH (adaptive concurrency, browser pool) ]
|       |
|       +---> Website crawling (35+ page paths)
|       +---> 4-method contact extraction
|       +---> Smart retry escalation (HTTP -> Browser -> Stealth -> Residential)
|
+--[ Stage 5: GEO ]--[ Stage 6: SOCIAL ]--[ Stage 7: LINKEDIN ]
|
+--[ Stage 8: SEMANTIC PAGES ]--[ Stage 9: SEARCH + SERP INTEL ]
|
+--[ Stage 10: PDF MINING ]--[ Stage 11: DEEP EXTRACT ]--[ Stage 12: HIDDEN EXTRACT ]
|
+--[ Stage 13: CONTACT INTEL ]--[ Stage 14: COMPANY INTEL ]--[ Stage 15: EXEC CORRELATION ]
|
+--[ Stage 16: EMAIL DISCOVER ]--[ Stage 17: EMAIL PREDICT ]--[ Stage 18: VERIFY ]
|
+--[ Stage 19: SCORE ]--[ Stage 20: CLEANUP ]--[ Stage 21: QUALITY GATE ]
|
+--[ Stage 22: METRICS ]--[ Stage 23: EXPORT ]--[ Stage 24: WEBHOOK ]
|
v
OUTPUT: Verified leads -> Apify Dataset + CSV + Excel + JSON Lines + Webhook

Stage-by-Stage Breakdown

Stage 1: INGEST -- Smart Input Parsing

Loads CSV, Excel, or JSON input. Auto-detects company name and website columns from 30+ aliases (company_name, organisation, business, exhibitor, firm, url, domain, web_address, and more). Preserves all additional columns in output. Supports UTF-8, UTF-8 with BOM, and Latin-1 encodings.

Stage 2: DISCOVER -- Website Discovery

For companies without a website, runs multi-engine domain discovery via DuckDuckGo and Bing. Filters out 45+ aggregator/social domains (LinkedIn, Wikipedia, Alibaba, ZoomInfo, Bloomberg, etc.). Scores results using Levenshtein distance and keyword overlap. Cross-run cache avoids re-discovering known domains.

Stage 3: GOOGLE BOOST -- 8-Step Discovery Enhancement

Eight targeted search passes per company, including:

  • Domain Recovery -- 6 query angles to find missing websites
  • Email Discovery -- Search for published email addresses
  • Social Discovery -- Find company social profiles
  • Phone & Address -- Discover published contact info
  • DNS Validation -- Full MX, SPF, DKIM, DMARC checks with domain trust scoring

Stage 4: ENRICH -- Adaptive Hybrid Extraction

Launches parallel workers with adaptive concurrency (4-32, auto-scaling). Each company website is crawled across 35+ page paths (/about, /team, /leadership, /contact, /people, /management, /staff, /executives, /board, /partners, /founders, /imprint, and more).

Smart Retry Escalation:

  1. HTTP request (fastest, lowest resource)
  2. Browser with DOM wait (for JavaScript-rendered pages)
  3. Stealth browser (for bot-protected sites)
  4. Residential proxy rotation (for aggressive blockers)

Four extraction methods run on every page:

  1. JSON-LD Parsing -- Reads <script type="application/ld+json"> for schema.org Person/Organization data
  2. Team Card Selectors -- Matches 18 CSS patterns (.team-member, .staff-card, .leadership-card, etc.)
  3. Heuristic Matching -- Finds domain emails and matches nearby names/titles using proximity analysis
  4. LinkedIn URL Extraction -- Extracts contact info from LinkedIn /in/ URLs on the page
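
The first extraction method can be illustrated with a minimal sketch using only the Python standard library. This is not the actor's code: the class and function names are hypothetical, and the mapping from schema.org properties (name, jobTitle, email) to the output fields is an assumption.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_people(html):
    """Return one dict per schema.org Person found anywhere in the JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    people = []
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole page
        stack = [data]
        while stack:  # walk nested objects and arrays iteratively
            node = stack.pop()
            if isinstance(node, list):
                stack.extend(node)
            elif isinstance(node, dict):
                if node.get("@type") == "Person":
                    people.append({
                        "contact_name": node.get("name"),
                        "contact_title": node.get("jobTitle"),
                        "contact_email": node.get("email"),
                    })
                stack.extend(node.values())
    return people
```

Real pages sometimes give `@type` as a list or prefix emails with `mailto:`; a production extractor would need to normalize both, which this sketch does not do.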

Stage 5: GEO ENRICH -- Location Intelligence

Extracts addresses, cities, states, countries, and postal codes from website content. Enriches with geographic metadata for regional targeting campaigns.

Stage 6: SOCIAL ENRICH -- 8-Platform Social Discovery

Discovers company profiles on LinkedIn, Twitter/X, Facebook, Instagram, YouTube, GitHub, Crunchbase, and Glassdoor. Uses site-specific DuckDuckGo queries with name matching and URL slug validation. Calculates social presence score (0-100) weighted by B2B relevance.

Stage 7: LINKEDIN DISCOVER -- Employee Discovery

Multi-query LinkedIn employee discovery using search engines (no LinkedIn login required):

  • 4-tier role queries: C-Suite, VP/Director, Management, Specialist
  • Up to 3 concurrent searches via DuckDuckGo, with Bing as fallback
  • Extracts names from LinkedIn URL slugs and search snippets
  • Deduplicates against contacts already found during website crawling

Stage 8: SEMANTIC PAGE DETECT -- Intelligent Page Classification

Analyzes crawled HTML to classify pages by type: leadership, speaker, board, investor relations, careers, and partner pages. Detected pages feed into subsequent extraction stages for targeted re-processing.

Stage 9: SEARCH EXPANSION + SERP Intelligence

Auto-generates 8-category search queries per company (employee, executive, email, PDF, hiring, press, conference, investor). Extracts structured intelligence from search snippets:

  • Revenue estimates ($M/$B from financial mentions)
  • Funding signals (Series A-F, seed rounds, investment amounts)
  • Employee counts (from "X employees" mentions)
  • Founded year and acquisition signals

Stage 10: FILE INTELLIGENCE -- PDF Mining

Downloads and parses PDFs/documents found during search expansion and crawling:

  • Extracts emails matching company domain
  • Mines contacts using name-title proximity matching (200-char context window)
  • Processes up to 5 PDFs per company (max 20 pages, 10MB per file)
  • Finds contacts in annual reports, brochures, org charts, and catalogs invisible to HTML scrapers

Stage 11: DEEP CONTACT EXTRACT -- Second-Pass Extraction

Runs a 4-method deep re-extraction on all crawled HTML. For companies with 0 contacts after initial crawl, triggers a targeted re-crawl of leadership and team pages. Extracts company-level info (general email, phone) separately from personal contacts.

Stage 12: HIDDEN CONTACT EXTRACT -- JavaScript Payload Mining

Parses contacts hidden in JavaScript bundles and framework payloads:

  • __NEXT_DATA__ (Next.js)
  • __NUXT__ (Nuxt.js)
  • window.__INITIAL_STATE__ (Vue/Redux)
  • Inline JSON-LD arrays and embedded API response objects
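
As a rough illustration of payload mining, the sketch below pulls the `__NEXT_DATA__` JSON out of a page and scans every string value for email addresses. The function name and the email regex are assumptions made for this example, not the actor's implementation.

```python
import json
import re

# Next.js embeds its hydration payload in a script tag with this id.
NEXT_DATA_RE = re.compile(
    r'<script[^>]*id="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL
)
# Deliberately simple pattern; it is not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def emails_from_next_data(html):
    """Return sorted, lowercased, de-duplicated emails found in the payload."""
    match = NEXT_DATA_RE.search(html)
    if not match:
        return []
    try:
        payload = json.loads(match.group(1))
    except json.JSONDecodeError:
        return []
    found = set()
    stack = [payload]
    while stack:  # walk the JSON tree without recursion
        node = stack.pop()
        if isinstance(node, dict):
            stack.extend(node.values())
        elif isinstance(node, list):
            stack.extend(node)
        elif isinstance(node, str):
            found.update(email.lower() for email in EMAIL_RE.findall(node))
    return sorted(found)
```

The same tree walk would apply to `__NUXT__` or `window.__INITIAL_STATE__` once the payload has been isolated, although those are usually assigned as JavaScript expressions rather than pure JSON and may need extra parsing.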

Stage 13: CONTACT INTELLIGENCE -- Decision Maker Mapping

  • Seniority inference (0-5 scale) from 40+ title keywords
  • Persona classification: Economic Buyer, Champion, Technical Evaluator, Influencer
  • Target title matching against 30+ B2B decision maker titles
  • Authority scoring (0-100) with weighted components
  • Circuit breaker learning for per-domain intelligence

Stage 14: COMPANY INTEL -- Company Intelligence Profile

Builds a comprehensive company profile:

  • Tech stack fingerprinting (18+ frameworks from HTTP headers and HTML)
  • Analytics tools detection (Google Analytics, Segment, Mixpanel, etc.)
  • Employee count estimation from multiple signals
  • SaaS detection and hiring velocity signals
  • Company maturity score (0-100)

Stage 15: EXECUTIVE CORRELATION -- Fuzzy Dedup Engine

Cross-references contacts from all extraction methods (crawl, deep extract, hidden, LinkedIn, search, file intel). Merges duplicates using:

  • Email matching (exact)
  • LinkedIn slug matching (exact)
  • Name matching (exact + fuzzy Levenshtein with edit distance threshold 1-2)
  • Cross-source preference: crawled > deep extract > hidden > file intel > LinkedIn > search > predicted
  • Builds unified profiles with composite confidence scores
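
The fuzzy name step can be sketched with a textbook Levenshtein distance. The helper names and the default threshold of 2 (the upper end of the 1-2 range stated above) are assumptions; how the actor actually normalizes names is not documented here.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def same_person(name_a, name_b, max_distance=2):
    """Treat two names as one contact if they differ by at most max_distance edits."""
    def normalize(name):
        # Lowercase and collapse repeated whitespace before comparing.
        return " ".join(name.lower().split())

    return levenshtein(normalize(name_a), normalize(name_b)) <= max_distance
```

With this sketch, `same_person("Jon Smith", "John Smith")` is true (one insertion), while two different people with short names could also fall within the threshold, which is presumably why exact email and LinkedIn slug matches are checked as well.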

Stage 16: EMAIL DISCOVER -- 5-Layer Email Discovery

| Layer | Method | What It Finds |
|---|---|---|
| Layer 0 | DNS/OSINT (MX, DMARC, SPF record parsing) | Admin/reporting emails from DNS |
| Layer 1 | Direct HTTP crawl (/contact, /about, /impressum, /kontakt) | Page-embedded emails |
| Layer 2 | DuckDuckGo multi-query search | Publicly indexed emails |
| Layer 3 | PDF document search (filetype:pdf) | Emails in documents |
| Layer 4 | GitHub + LinkedIn site-specific search | Developer/professional emails |

Stage 17: EMAIL PREDICT -- Pattern Learning

Analyzes known emails at each domain to detect the dominant pattern. Generates predictions using 8 templates: first.last@, flast@, firstlast@, first_last@, first@, last@, last.first@, f.last@.
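
A minimal sketch of the eight templates listed above, plus a helper that reports which template a known address follows. Both function names are hypothetical, and how the actor picks the dominant pattern across several known emails is not specified, so that part is left out.

```python
def predict_emails(first, last, domain):
    """Return the eight candidate addresses in the order the templates are listed."""
    first, last = first.strip().lower(), last.strip().lower()
    if not first or not last:
        raise ValueError("both first and last name are required")
    initial = first[0]
    local_parts = [
        f"{first}.{last}",    # first.last@
        f"{initial}{last}",   # flast@
        f"{first}{last}",     # firstlast@
        f"{first}_{last}",    # first_last@
        first,                # first@
        last,                 # last@
        f"{last}.{first}",    # last.first@
        f"{initial}.{last}",  # f.last@
    ]
    return [f"{local}@{domain}" for local in local_parts]


def matching_template(email, first, last):
    """Return the index (0-7) of the template a known email follows, or None."""
    email = email.strip().lower()
    domain = email.partition("@")[2]
    candidates = predict_emails(first, last, domain)
    return candidates.index(email) if email in candidates else None
```

For example, if `jdoe@acme.com` is known to belong to Jane Doe, `matching_template` returns 1 (the flast@ template), and the same index can then be used to pick a prediction for another contact at that domain.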

Stage 18: VERIFY -- 6-Check Email Verification

| Check | What It Validates |
|---|---|
| Syntax | RFC 5322 email format |
| MX Records | Domain accepts mail |
| Catch-All | Domain accepts all addresses (reduces confidence) |
| Disposable | Filters Mailinator, Guerrilla Mail, TempMail, 20+ providers |
| Role Address | Flags info@, admin@, noreply@, sales@, 25+ generic prefixes |
| Authentication | DKIM, SPF, DMARC record presence and configuration |

B2B Send Tiers:

| Tier | Score | Recommendation |
|---|---|---|
| TIER_1_SEND | 80-100 | Safe to send. Valid MX, structured address, strong auth. |
| TIER_2_LIKELY_GOOD | 50-79 | Likely valid. Minor concerns (role address, generic provider). |
| TIER_3_REVIEW | 30-49 | Manual review before sending. |
| SKIP | 0-29 | Do not send. Invalid, disposable, or failed checks. |
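
The score ranges in the tier table translate directly into a lookup like the one below. The function name is hypothetical, and it assumes the score in question is the 0-100 confidence_score field.

```python
def b2b_tier(confidence_score):
    """Map a 0-100 confidence score to the send tier from the table above."""
    if not 0 <= confidence_score <= 100:
        raise ValueError("confidence_score must be between 0 and 100")
    if confidence_score >= 80:
        return "TIER_1_SEND"
    if confidence_score >= 50:
        return "TIER_2_LIKELY_GOOD"
    if confidence_score >= 30:
        return "TIER_3_REVIEW"
    return "SKIP"
```

The same function can be used client-side to re-bucket exported rows, for example after running TIER_2 addresses through a second verification service.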

Stage 19: SCORE -- Lead Scoring Engine

Combined priority score = 60% authority score + 40% verification confidence.

| Seniority | Title Examples | Authority Score |
|---|---|---|
| C-Level (5) | CEO, Founder, CTO, CFO, COO, President, Owner | 80-100 |
| VP/Director (4) | Vice President, Director, Head of, Partner, GM | 70-79 |
| Manager (3) | Manager, Team Lead, Senior Manager | 60-69 |
| Staff (2) | Engineer, Developer, Analyst, Specialist | 40-59 |
| Unknown | Email-only contact (no title found) | 40 |
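
As a worked example of the 60/40 blend: a CEO with an authority score of 90 and a verification confidence of 85 gets 0.6 × 90 + 0.4 × 85 = 88. A minimal sketch follows; the function name is hypothetical, and rounding to the nearest integer is an assumption, since the rounding rule is not stated.

```python
def combined_priority(authority_score, confidence_score):
    """Blend authority (60%) and verification confidence (40%) into a 0-100 score."""
    # Integer weights avoid floating-point drift before the final division.
    return round((60 * authority_score + 40 * confidence_score) / 100)
```

An email-only contact (authority 40) with an unverified address (confidence 0) would score 24 under this formula, which shows how strongly the title drives the ranking.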

Stage 20: CLEANUP -- 14-Rule Data Quality

Removes junk contacts: duplicates, short names (<3 chars), UI artifacts ("View Bio", "Read More", "Subscribe", "Menu"), navigation strings, placeholder text. Sorts by authority score. Caps at maxContactsPerCompany with decision makers retained first.

Stage 21: QUALITY GATE -- Configurable Thresholds

Filters contacts below your configured minLeadScore and minConfidenceScore. Tags every contact with data freshness: verified, crawled, linkedin_only, predicted_only, search_derived. Separates filtered contacts for optional review.
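
The gate can be pictured as a simple partition of the contact list. This sketch assumes, as the parameter descriptions state, that minLeadScore is compared against combined_priority and minConfidenceScore against confidence_score; the function name and the treatment of missing fields as 0 are assumptions.

```python
def apply_quality_gate(contacts, min_lead_score=0, min_confidence_score=0):
    """Split contacts into (kept, filtered) using the two thresholds."""
    kept, filtered = [], []
    for contact in contacts:
        passes = (
            contact.get("combined_priority", 0) >= min_lead_score
            and contact.get("confidence_score", 0) >= min_confidence_score
        )
        (kept if passes else filtered).append(contact)
    return kept, filtered
```

Because the filtered contacts are returned rather than discarded, they remain available for the optional review described above.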

Stage 22: METRICS -- Pipeline Analytics

Computes extraction rates, verification tier distributions, tech stack distribution, proxy health, stage-level timing, per-domain quality metrics, and cache hit rates.

Stage 23: EXPORT -- Multi-Format Output

  • Apify Dataset -- Browsable, downloadable as JSON/CSV/Excel, accessible via API
  • CSV (output.csv) -- UTF-8 with BOM for Excel compatibility
  • Excel (output.xlsx) -- 5-sheet workbook: Contacts, Companies, Locations, High_Priority, Audit
  • JSON Lines (output.jsonl) -- One JSON object per line for streaming ingestion (BigQuery, data pipelines)

Stage 24: WEBHOOK -- Real-Time Delivery

HTTP POST to your endpoint on pipeline completion:

  • Summary stats (companies, contacts, decision makers, verified emails)
  • Optional full results payload
  • 3x exponential backoff retry (5s, 25s, 125s)
  • Dead letter queue to KeyValueStore on failure
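
The 5s, 25s, 125s schedule corresponds to a base delay of 5 seconds multiplied by 5 on each retry. The sketch below assumes "3x retry" means one initial attempt followed by up to three retries; the function names are hypothetical, and `send` stands in for whatever HTTP call performs the POST.

```python
import time


def backoff_delays(retries=3, base_seconds=5, factor=5):
    """Return the wait before each retry: 5, 25, 125 seconds by default."""
    return [base_seconds * factor ** attempt for attempt in range(retries)]


def deliver_with_retry(send, payload, sleep=time.sleep):
    """Try send(payload) once, then retry after each backoff delay.

    `send` must return True on success. Returns False when every attempt
    fails, at which point a caller would store the payload for later
    inspection (the dead letter step).
    """
    if send(payload):
        return True
    for delay in backoff_delays():
        sleep(delay)
        if send(payload):
            return True
    return False
```

Passing `sleep` as a parameter keeps the function testable without real waiting. On the receiving side, an endpoint that can be slow should acknowledge quickly and process asynchronously, since the total retry window is under three minutes.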

Anti-Blocking Technology

This pipeline uses enterprise-grade anti-detection to maximize extraction rates:

| Feature | How It Works |
|---|---|
| Smart Retry Escalation | HTTP -> Browser -> Stealth -> Residential proxy (never same method twice) |
| Adaptive Concurrency | Auto-scales 4-32 workers based on success rate and response times |
| Playwright Stealth | navigator.webdriver spoofing, timezone/locale/device randomization |
| Browser State Isolation | Reset cookies, cache, localStorage every 25 requests |
| Resource Blocking | Block third-party trackers only, preserve first-party JS/XHR for data extraction |
| Domain Rate Limiting | Per-domain circuit breaker with failure threshold and recovery timeout |
| Proxy Rotation | Residential for corporate sites, datacenter for static, mobile fallback for CAPTCHA |
| Sitemap-First Crawl | Parse sitemap.xml first, prioritize contact/team/leadership pages (Tier 1-4 system) |
| Memory Scaling | Reduce browser tabs at 75% RAM, preserve HTTP workers for throughput |

Input Parameters

Data Input (choose one)

| Parameter | Type | Description |
|---|---|---|
| inputFile | File upload | Upload a CSV or Excel file with company names and/or websites |
| inputUrl | String | Public URL to a CSV or Excel file |
| companies | JSON array | Inline company list as JSON objects |

Settings & Pricing

| Parameter | Type | Default | Description |
|---|---|---|---|
| maxResults | Integer | 20 | Max companies to process. Free: 20/run. Beyond: $2/1,000 |
| workers | Integer | 16 | Initial parallel workers (adaptive: auto-scales 4-32) |
| maxContactsPerCompany | Integer | 20 | Contact cap per company. Decision makers prioritized |

Incremental & Quality

| Parameter | Type | Default | Description |
|---|---|---|---|
| incrementalMode | Boolean | false | Skip recently-enriched companies (~70% time savings on repeats) |
| incrementalFreshnessDays | Integer | 7 | Days before cached data is considered stale (1-90) |
| minLeadScore | Integer | 0 | Quality gate: minimum combined_priority to include in export |
| minConfidenceScore | Integer | 0 | Quality gate: minimum confidence score to include in export |

Webhook & Export

| Parameter | Type | Default | Description |
|---|---|---|---|
| webhookUrl | String | -- | HTTP endpoint to receive results on completion |
| webhookSendFullResults | Boolean | false | Include full data in webhook (vs summary only) |
| exportJsonLines | Boolean | false | Also export as .jsonl in KeyValueStore |

Pipeline Stage Controls

| Parameter | Type | Default | Description |
|---|---|---|---|
| skipGoogleBoost | Boolean | false | Skip 8-step Google Discovery (~30% faster) |
| skipSocialEnrichment | Boolean | false | Skip 8-platform social discovery (~15% faster) |
| skipLinkedInDiscovery | Boolean | false | Skip LinkedIn employee discovery |
| skipSemanticPageDetect | Boolean | false | Skip semantic page classification |
| skipSearchExpansion | Boolean | false | Skip search expansion + SERP intelligence |
| skipFileIntelligence | Boolean | false | Skip PDF mining |
| skipDeepContactExtract | Boolean | false | Skip deep 4-method re-extraction |
| skipHiddenContactExtract | Boolean | false | Skip JS/JSON payload extraction |
| skipContactIntelligence | Boolean | false | Skip decision maker mapping |
| skipCompanyIntel | Boolean | false | Skip company intelligence profile |
| skipExecutiveCorrelation | Boolean | false | Skip cross-source contact dedup |
| skipEmailDiscovery | Boolean | false | Skip 5-layer email discovery |
| skipEmailPrediction | Boolean | false | Skip 8-pattern email prediction |
| skipVerification | Boolean | false | Skip 6-check email verification |
| skipQualityGate | Boolean | false | Skip quality gate filtering |

Proxy

| Parameter | Type | Description |
|---|---|---|
| proxyConfiguration | Proxy | Apify Proxy config. Residential strongly recommended |

Supported Column Names

The actor auto-detects columns using these aliases (case-insensitive):

Company name: company_name, company, name, organisation, organization, business_name, business, firm, exhibitor, exhibitor_name

Website: website, company_website, url, web, domain, site, official_domain, homepage, web_address

Any additional columns in your input file are preserved in the output.
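
To check a file before uploading, the alias matching can be approximated locally. The alias lists below are copied from this section; the function name is hypothetical, and the assumption that earlier aliases take priority over later ones is a guess, not documented behavior.

```python
COMPANY_ALIASES = [
    "company_name", "company", "name", "organisation", "organization",
    "business_name", "business", "firm", "exhibitor", "exhibitor_name",
]
WEBSITE_ALIASES = [
    "website", "company_website", "url", "web", "domain", "site",
    "official_domain", "homepage", "web_address",
]


def detect_columns(header):
    """Return (company_column, website_column) as they appear in the header.

    Matching is case-insensitive; a column that is not found is None.
    """
    normalized = {column.strip().lower(): column for column in header}

    def first_match(aliases):
        for alias in aliases:
            if alias in normalized:
                return normalized[alias]
        return None

    return first_match(COMPANY_ALIASES), first_match(WEBSITE_ALIASES)
```

For a header row read with `csv.reader`, `detect_columns(["Organisation", "URL", "Booth"])` returns `("Organisation", "URL")`, and the unrecognized `Booth` column would simply pass through to the output.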


Output Schema

Each row represents one contact (or one company if no contacts were found).

Contact Fields

| Field | Type | Description |
|---|---|---|
| contact_name | String | Full name |
| contact_title | String | Job title |
| contact_email | String | Email address |
| contact_phone | String | Direct phone number |
| contact_linkedin | String | LinkedIn profile URL |
| extraction_method | String | How found: jsonld, team_card, heuristic, linkedin, deep_extract, hidden_extract, file_intel, search |
| is_decision_maker | Boolean | Holds a leadership position |
| persona_type | String | Economic Buyer, Champion, Technical Evaluator, Influencer |
| seniority | Integer (0-5) | Title seniority level |
| lead_score | Integer (0-100) | Authority score |
| combined_priority | Integer (0-100) | Blended: 60% authority + 40% verification |
| priority_band | String | HIGH, MEDIUM, LOW, SKIP |
| verification_status | String | valid, risky, invalid, unknown |
| b2b_tier | String | TIER_1_SEND, TIER_2_LIKELY_GOOD, TIER_3_REVIEW, SKIP |
| confidence_score | Integer (0-100) | Email deliverability confidence |
| correlation_confidence | Integer (0-100) | Cross-source correlation score |
| data_freshness | String | verified, crawled, linkedin_only, predicted_only, search_derived |
| auth_score | Integer (0-100) | Domain authentication score |
| email_type | String | crawled, discovered, predicted |

Company Fields

| Field | Type | Description |
|---|---|---|
| company_name | String | Company name |
| company_website | String | Full URL |
| domain | String | Normalized domain (e.g., acme.com) |
| company_emails | String | Semicolon-separated company emails |
| company_phones | String | Semicolon-separated phone numbers |
| linkedin_company | String | LinkedIn company page |
| twitter_url | String | Twitter/X profile |
| facebook_url, instagram_url, youtube_url | String | Social profiles |
| github_url, crunchbase_url, glassdoor_url | String | Business profiles |
| company_city, company_country | String | Location |
| tech_stack | String | Detected technologies |
| analytics_tools | String | Detected analytics platforms |
| company_maturity_score | Integer (0-100) | Business maturity index |
| is_saas | Boolean | SaaS company detection |
| employee_count_estimate | String | Estimated employee count |
| estimated_revenue_m | Number | Revenue estimate (millions USD) from SERP intel |
| funding_amount_m | Number | Funding amount (millions USD) from SERP intel |
| funding_stage | String | Funding stage (Seed, Series A-F) |
| has_mx, has_spf, has_dkim, has_dmarc | Boolean | DNS validation |
| domain_score | Integer (0-100) | Domain trust score |
| website_quality_score | Integer (0-100) | Website quality index |
| pages_crawled | Integer | Pages successfully scraped |
| enrichment_status | String | done, cached, failed, error |

Pricing

| Tier | Actor Fee | Results Per Run | Best For |
|---|---|---|---|
| Free | $0 | Up to 20 | Testing the pipeline |
| Pay-Per-Event | $2 per 1,000 results | Unlimited | Production lead generation |

Apify platform compute charges (CPU, memory, proxy) are billed separately per your Apify subscription.

Cost Comparison vs Alternatives

| Solution | 1,000 Leads | 10,000 Leads | 100,000 Leads |
|---|---|---|---|
| This Actor | ~$3 | ~$30 | ~$250 |
| Apollo.io | $49/mo (limited) | $99-399/mo | Custom pricing |
| ZoomInfo | $250+/mo | $500+/mo | $1,000+/mo |
| Lusha | $49/mo (limited) | $199/mo | Custom pricing |
| Hunter.io | $49/mo (500 lookups) | $199/mo | Custom pricing |

Actor costs include both per-event fees and estimated Apify platform charges. All stages enabled, residential proxy.

Cost Estimation by Batch Size

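| Scenario | Companies | Actor Fee | Est. Platform | Total |
|---|---|---|---|---|
| Quick test | 20 | $0 (free) | ~$0.05 | ~$0.05 |
| Small batch | 100 | $0.16 | ~$0.15 | ~$0.31 |
| Medium batch | 500 | $0.96 | ~$0.50 | ~$1.46 |
| Large batch | 1,000 | $1.96 | ~$1.00 | ~$2.96 |
| Enterprise | 10,000 | $19.96 | ~$10 | ~$30 |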
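
The Actor Fee column appears to follow a simple rule: the first 20 results per run are free and the rest are billed at $2 per 1,000. The sketch below reproduces that column under the assumption that each company counts as one billable result; the constant and function names are hypothetical, and platform compute charges are not modeled.

```python
FREE_RESULTS_PER_RUN = 20
PRICE_PER_1000_USD = 2


def actor_fee(companies):
    """Estimated actor fee in USD for one run, excluding Apify platform charges."""
    billable = max(0, companies - FREE_RESULTS_PER_RUN)
    return billable * PRICE_PER_1000_USD / 1000
```

For 500 companies this gives (500 - 20) × $2 / 1,000 = $0.96, matching the Medium batch row.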

Usage Examples

Quick Start (Apify Console)

  1. Open the actor page and click Start
  2. Upload a CSV with a company_name column (and optionally website)
  3. Set maxResults to the number of companies to process
  4. Click Start -- watch progress: "Stage 4/24: Enriching 45/100 companies..."
  5. Download from Dataset tab (JSON/CSV/Excel) or KeyValueStore (multi-sheet Excel, JSON Lines)

Inline JSON Input

{
  "companies": [
    {"company_name": "Stripe", "website": "https://stripe.com"},
    {"company_name": "Notion", "website": "https://notion.so"},
    {"company_name": "Linear"},
    {"company_name": "Vercel"},
    {"company_name": "Figma"}
  ],
  "maxResults": 20,
  "workers": 16,
  "maxContactsPerCompany": 15,
  "exportJsonLines": true
}

With Quality Gate & Webhook

{
  "inputUrl": "https://example.com/target-companies.csv",
  "maxResults": 500,
  "workers": 16,
  "minLeadScore": 50,
  "minConfidenceScore": 40,
  "webhookUrl": "https://hooks.zapier.com/hooks/catch/123456/abcdef/",
  "webhookSendFullResults": true,
  "exportJsonLines": true,
  "proxyConfiguration": {"useApifyProxy": true}
}

Incremental Mode (Repeat Runs)

{
  "inputUrl": "https://example.com/same-companies.csv",
  "maxResults": 1000,
  "incrementalMode": true,
  "incrementalFreshnessDays": 14,
  "proxyConfiguration": {"useApifyProxy": true}
}

Python API

from apify_client import ApifyClient
client = ApifyClient("YOUR_API_TOKEN")
run_input = {
    "inputUrl": "https://example.com/target-companies.csv",
    "maxResults": 500,
    "workers": 16,
    "maxContactsPerCompany": 15,
    "minLeadScore": 50,
    "webhookUrl": "https://your-crm.com/webhook",
    "exportJsonLines": True,  # needed for the output.jsonl download below
    "proxyConfiguration": {"useApifyProxy": True},
}
# Start the actor and wait for the run to finish
run = client.actor("leadslogix/leadslogix-pipeline").call(run_input=run_input)
# Print TIER_1 decision makers
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if item.get("is_decision_maker") and item.get("b2b_tier") == "TIER_1_SEND":
        print(f"{item['company_name']} | {item['contact_name']} | "
              f"{item['contact_email']} | {item['combined_priority']}")
# Download the multi-sheet Excel workbook
kv = client.key_value_store(run["defaultKeyValueStoreId"])
xlsx = kv.get_record("output.xlsx")
if xlsx is not None:  # get_record returns None when the key is missing
    with open("leads.xlsx", "wb") as f:
        f.write(xlsx["value"])
# Download JSON Lines
jsonl = kv.get_record("output.jsonl")
if jsonl is not None:
    data = jsonl["value"]
    # The client may decode text records to str; normalize to bytes before writing
    if isinstance(data, str):
        data = data.encode("utf-8")
    with open("leads.jsonl", "wb") as f:
        f.write(data)

JavaScript API

import { ApifyClient } from "apify-client";
const client = new ApifyClient({ token: "YOUR_API_TOKEN" });
const run = await client.actor("leadslogix/leadslogix-pipeline").call({
  companies: [
    { company_name: "Datadog", website: "https://datadoghq.com" },
    { company_name: "Cloudflare", website: "https://cloudflare.com" },
    { company_name: "Twilio", website: "https://twilio.com" },
  ],
  maxResults: 50,
  workers: 16,
  minLeadScore: 50,
  exportJsonLines: true,
  proxyConfiguration: { useApifyProxy: true },
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
const tier1DecisionMakers = items.filter(
  (i) => i.is_decision_maker && i.b2b_tier === "TIER_1_SEND"
);
console.log(`Found ${tier1DecisionMakers.length} verified decision makers`);
for (const lead of tier1DecisionMakers) {
  console.log(`${lead.company_name} | ${lead.contact_name} | ${lead.contact_email} | Score: ${lead.combined_priority}`);
}

cURL

# Start a run
curl -X POST "https://api.apify.com/v2/acts/leadslogix~leadslogix-pipeline/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "companies": [
      {"company_name": "Figma", "website": "https://figma.com"},
      {"company_name": "Canva", "website": "https://canva.com"}
    ],
    "maxResults": 20,
    "workers": 16,
    "webhookUrl": "https://your-endpoint.com/webhook"
  }'

Webhook Payload Example

When the pipeline completes, your webhook receives:

{
  "event": "pipeline_complete",
  "pipeline_version": "v7.0",
  "timestamp": "2026-05-19T12:30:00.000Z",
  "summary": {
    "total_companies": 100,
    "total_contacts": 450,
    "high_priority": 85,
    "decision_makers": 120,
    "emails_found": 380,
    "verified_emails": 310
  },
  "audit": {
    "total_companies": 100,
    "elapsed_seconds": 1200,
    "pipeline_version": "v7.0 (24-stage intelligence engine)"
  }
}

Scheduled Lead Generation

Automate recurring prospecting:

  1. Go to the actor page and click Schedules
  2. Create a schedule (e.g., 0 8 * * 1 for every Monday at 8 AM)
  3. Point the input to a URL that updates with new target companies
  4. Enable incrementalMode to skip previously-enriched companies
  5. Set a webhookUrl to receive results in your CRM automatically

Integrations

| Platform | Integration Method |
|---|---|
| Google Sheets | Auto-sync via Apify Google Sheets integration |
| HubSpot | Import CRM-ready CSV, or use webhook for real-time sync |
| Salesforce | Import enriched contacts via CSV, or connect via Zapier |
| Pipedrive | CSV import or webhook integration |
| Lemlist / Instantly / Smartlead | Export TIER_1 emails as CSV for cold email campaigns |
| Apollo / Outreach / SalesLoft | Import as prospect sequence |
| Zapier / Make | Connect to 5,000+ apps via Apify Zapier integration |
| Webhooks | Direct HTTP POST to any endpoint on pipeline completion |
| BigQuery / Snowflake | Ingest JSON Lines (.jsonl) output for data warehouse |
| Custom API | Full Apify REST API for programmatic integration and scheduling |

Performance Benchmarks

| Metric | Typical Result |
|---|---|
| Companies per hour | 100-200 (all stages, residential proxy) |
| Contacts per company | 3-15 (varies by company size and web presence) |
| Email discovery rate | 60-80% of companies yield at least one email |
| Decision maker rate | 30-50% of contacts are flagged as decision makers |
| TIER_1 email rate | 40-60% of verified emails are TIER_1_SEND |
| Cache hit rate | 30-70% on repeat runs (incremental mode) |
| Adaptive scaling | 4-32 workers, responds within 10s to rate changes |

Estimated Run Times

| Companies | All Stages | Skip Google+Social | Discovery Only |
|---|---|---|---|
| 20 | 3-5 min | 2-3 min | 1-2 min |
| 100 | 15-25 min | 10-15 min | 5-8 min |
| 500 | 1-2 hours | 40-70 min | 20-30 min |
| 1,000 | 3-5 hours | 2-3 hours | 45-60 min |
| 10,000 | 24-48 hours | 16-30 hours | 6-10 hours |

Troubleshooting

Common Issues

Low contact extraction rate

  • Enable all extraction stages (don't skip deep extract, hidden extract, or file intelligence)
  • Use residential proxies -- datacenter proxies get blocked by many corporate sites
  • Companies with simple brochure websites may genuinely have few published contacts

0 contacts for a company with a known team page

  • The website may use heavy JavaScript rendering. The pipeline retries with browser mode, but some SPAs require stealth mode.
  • Check if the website blocks headless browsers (Cloudflare, Akamai). Residential proxy usually bypasses this.
  • Non-English websites may use different page structure patterns.

Low email verification scores

  • Catch-all domains (accept any address) reduce confidence scores. This is expected behavior.
  • Role addresses (info@, sales@) score lower than personal addresses. Filter by email_type if needed.
  • Some email providers have aggressive rate limiting. The circuit breaker prevents excessive retries.

Run times are slow

  • Reduce workers if you see high failure rates (adaptive concurrency will also do this automatically)
  • Skip Google Boost and Social Enrichment for ~40% faster runs
  • Use incremental mode for repeat runs to skip cached companies
  • Consider splitting very large lists (10K+) into batches
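
Splitting a large list into batches can be done before upload. A minimal sketch, assuming the company list is already loaded as a Python list; the batch size of 5,000 is only an example, not a documented limit:

```python
from itertools import islice


def batches(companies, size):
    """Yield consecutive chunks of at most `size` items."""
    iterator = iter(companies)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Placeholder domains standing in for a real company list.
companies = [f"company-{i}.example" for i in range(12_500)]
sizes = [len(batch) for batch in batches(companies, 5_000)]
print(sizes)  # [5000, 5000, 2500]
```

Each batch can then be submitted as its own run, so a failure or timeout only affects one slice of the list.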

Webhook not receiving data

  • Verify your endpoint accepts POST with JSON content-type
  • Check the webhook_dead_letter key in KeyValueStore for failed delivery details
  • The webhook retries 3x with exponential backoff (5s, 25s, 125s) before giving up
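
The retry schedule above follows a base-5 exponential pattern. The sketch below reproduces those delays and shows one way a sender could apply them; `deliver`, `send`, and the parameter names are hypothetical and not the actor's actual code.

```python
import time


def backoff_delays(retries=3, base=5.0, factor=5.0):
    """Delays in seconds before each retry: base * factor**attempt."""
    return [base * factor ** attempt for attempt in range(retries)]


def deliver(send, payload, delays, sleep=time.sleep):
    """Try once, then retry after each delay. Returns True on success.

    `send` is any callable that returns True when the endpoint accepted
    the payload (hypothetical interface for this sketch).
    """
    if send(payload):
        return True
    for delay in delays:
        sleep(delay)
        if send(payload):
            return True
    return False  # caller would record the failure, e.g. in a dead letter store


delays = backoff_delays()
print(delays)       # [5.0, 25.0, 125.0]
print(sum(delays))  # 155.0
```

Under these assumptions an endpoint has roughly two and a half minutes to recover before delivery is abandoned, so brief outages are tolerated but longer ones are not.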

Memory errors on large runs

  • The pipeline auto-scales down browser tabs at 75% RAM usage
  • For 5,000+ companies, use 8-16 workers (not 32) to manage memory
  • Skip file intelligence (PDF parsing) to reduce memory pressure

Limitations

  • Email verification is DNS-based, not SMTP-based. It confirms the domain accepts mail but does not verify individual mailbox existence. For maximum accuracy on cold campaigns, run TIER_2 emails through an additional SMTP verification service.
  • Websites behind login walls or with aggressive anti-bot measures may return limited contacts.
  • Non-English websites (Korean, Chinese, Japanese, Arabic) may have lower extraction rates due to different page structures and email conventions.
  • LinkedIn discovery uses search engines, not direct LinkedIn scraping. Results depend on LinkedIn profile visibility in search engine indexes.
  • SERP intelligence (revenue, funding) is extracted from search snippets with regex and may not be available or accurate for all companies.
  • PDF mining requires the PyMuPDF library. Complex or scanned PDFs may not parse correctly.
  • Social enrichment depends on DuckDuckGo availability. Rate limiting during heavy usage may reduce discovery rates.
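
To illustrate why regex-based snippet extraction is best treated as a hint, here is a small sketch of pulling a dollar amount out of search snippet text. The pattern and the `extract_amount` helper are illustrative assumptions, not the actor's actual expressions.

```python
import re

# Matches amounts such as "$12.5M", "$2 billion", "$40 million".
AMOUNT = re.compile(
    r"\$\s?(\d+(?:\.\d+)?)\s?(million|billion|[MB])\b",
    re.IGNORECASE,
)
MULTIPLIER = {"m": 1e6, "million": 1e6, "b": 1e9, "billion": 1e9}


def extract_amount(snippet):
    """Return the first dollar amount in the snippet as a float, or None."""
    match = AMOUNT.search(snippet)
    if match is None:
        return None
    value, unit = match.groups()
    return float(value) * MULTIPLIER[unit.lower()]


funding = extract_amount("Acme raised $12.5M in Series A funding")
revenue = extract_amount("Acme reports annual revenue of $2 billion")
missing = extract_amount("Acme is a privately held company")
print(funding, revenue, missing)
```

A pattern like this cannot tell funding from revenue, or from an unrelated figure such as a lawsuit settlement, without looking at the surrounding words. That is why these fields may be missing or wrong for some companies.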

Roadmap

  • SMTP-level mailbox verification (in addition to DNS-based)
  • LinkedIn profile page parsing for richer contact data
  • HubSpot and Salesforce native API integration
  • Google Sheets direct push (no Zapier required)
  • AI-powered contact relevance scoring using LLMs
  • Company news monitoring and trigger events
  • Multi-language extraction optimization (CJK, Arabic, Cyrillic)
  • Real-time progress webhooks (per-stage, not just completion)

Frequently Asked Questions

How is this different from Apollo, ZoomInfo, or Lusha? Those tools maintain a pre-built database of contacts. This tool scrapes company websites and search engines in real time, finding contacts that static databases miss -- especially at small and mid-size companies, international firms, and for recently hired executives. It's also 10-50x cheaper per lead.

Do I need API keys? No. This tool uses public web data, DNS records, and search engines. No paid API subscriptions required.

What input formats are supported? CSV, Excel (.xlsx, .xls), and inline JSON. Upload directly, provide a URL, or pass data via the API.

How does incremental mode work? When enabled, the pipeline checks its cross-run cache for each company. If a company was successfully enriched within the freshness window (default 7 days), it's skipped. On repeat runs with the same company list, this typically skips 30-70% of companies, depending on the cache hit rate.
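
The skip decision can be sketched as a timestamp comparison against the freshness window. In this sketch the cache is a plain dict mapping a company domain to its last successful enrichment time; that shape, and the function names, are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(enriched_at, now, window_days=7):
    """True if the last enrichment falls inside the freshness window."""
    return now - enriched_at <= timedelta(days=window_days)


def split_by_cache(companies, cache, now, window_days=7):
    """Partition companies into (skipped, to_process) using the cache."""
    skipped, to_process = [], []
    for domain in companies:
        enriched_at = cache.get(domain)
        if enriched_at is not None and is_fresh(enriched_at, now, window_days):
            skipped.append(domain)
        else:
            to_process.append(domain)  # never seen, or stale
    return skipped, to_process


now = datetime(2026, 5, 20, tzinfo=timezone.utc)
cache = {
    "acme.example": now - timedelta(days=2),     # inside the 7-day window
    "globex.example": now - timedelta(days=10),  # stale
}
skipped, to_process = split_by_cache(
    ["acme.example", "globex.example", "initech.example"], cache, now
)
print(skipped)     # ['acme.example']
print(to_process)  # ['globex.example', 'initech.example']
```

A shorter window trades cache savings for fresher data; a longer one does the opposite.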

How does the quality gate work? Set minLeadScore and/or minConfidenceScore to filter contacts below your threshold. Contacts that don't pass are excluded from the export but tracked in pipeline metrics. Set both to 0 to export everything.
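
Conceptually the gate is a pair of threshold checks. A minimal sketch, assuming each contact carries numeric `lead_score` and `confidence_score` fields on a 0-100 scale (both the field names and the scale are assumptions):

```python
def passes_quality_gate(contact, min_lead_score=0, min_confidence_score=0):
    """True if the contact meets both thresholds; 0 disables a check."""
    return (
        contact.get("lead_score", 0) >= min_lead_score
        and contact.get("confidence_score", 0) >= min_confidence_score
    )


# Made-up contacts for illustration.
contacts = [
    {"name": "A", "lead_score": 82, "confidence_score": 90},
    {"name": "B", "lead_score": 40, "confidence_score": 95},
    {"name": "C", "lead_score": 75, "confidence_score": 55},
]

exported = [c["name"] for c in contacts if passes_quality_gate(c, 60, 70)]
rejected = len(contacts) - len(exported)  # counted, but not exported
everything = [c["name"] for c in contacts if passes_quality_gate(c)]

print(exported)    # ['A']
print(rejected)    # 2
print(everything)  # ['A', 'B', 'C']
```

Because both conditions must hold, raising either threshold can only shrink the export, never grow it.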

What does the webhook send? By default, a summary payload with totals (companies, contacts, decision makers, verified emails). Enable webhookSendFullResults to include the full dataset in the POST body.
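
For local testing, a throwaway receiver built on the Python standard library is enough to confirm that an endpoint accepts a JSON POST. The payload keys below are placeholders, not the documented schema, and the handler is a sketch rather than production code.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # payloads accepted by the handler


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            payload = json.loads(body)
        except ValueError:  # malformed JSON or bad encoding
            self.send_response(400)
            self.end_headers()
            return
        received.append(payload)
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), WebhookHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Simulate a summary delivery; these keys are hypothetical.
summary = {"companies": 20, "contacts": 130}
request = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}/",
    data=json.dumps(summary).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# An empty ProxyHandler bypasses any system proxy for the local call.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(request) as response:
    status = response.status

server.shutdown()
server.server_close()
print(status, received)
```

If full results are enabled, the POST body can be much larger than a summary, so a real endpoint should allow for bigger request sizes.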

How accurate is the email verification? TIER_1_SEND emails typically have <5% bounce rate in production cold email campaigns. The pipeline checks MX, SPF, DKIM, and DMARC records, and flags catch-all domains, disposable domains, and role addresses. It does not perform SMTP-level mailbox verification.

Can I use this for a single company? Yes. Use inline JSON with one company and maxResults: 1. The API supports synchronous runs.

Does this work for non-English companies? Yes, but extraction rates are typically 30-50% lower for CJK (Chinese/Japanese/Korean) and Arabic websites due to different page structures and email conventions.

What proxy should I use? Residential proxies give the best results. Datacenter proxies work for most sites, but corporate sites may block them. Running without a proxy is not recommended for batches over 20 companies.

Can I skip stages to save time? Yes. Toggle any of the 14 skip parameters. Skipping Google Boost + Social Enrichment saves ~40% runtime. The pipeline automatically adjusts downstream stages.

What's the maximum batch size? maxResults accepts values up to 100,000. For runs over 5,000 companies, we recommend 8-16 workers with a residential proxy and incremental mode enabled.


Changelog

v7.0 (2026-05-19)

  • Quality Gate Engine (Stage 21): configurable lead score and confidence thresholds, data freshness tracking
  • Webhook Dispatcher (Stage 24): HTTP POST with 3x exponential backoff, dead letter queue
  • Incremental Delta Mode: skip recently-enriched companies with configurable freshness window
  • Fuzzy Dedup: Levenshtein edit distance name matching in executive correlation
  • Cross-Source Preference: crawled contacts preferred over predicted in merge conflicts
  • JSON Lines Export: streaming .jsonl output for data pipeline ingestion
  • New fields: data_freshness, passed_quality_gate
  • 8 new input parameters for quality, webhook, incremental, and export control

v6.0 (2026-05-19)

  • SERP Intelligence: revenue, funding, employee count, acquisition signals from search snippets
  • File Intelligence (Stage 10): PDF mining for contacts, org charts, emails
  • Executive Correlation Engine (Stage 15): cross-source contact dedup with unified profiles
  • Adaptive Concurrency: dynamic 4-32 worker scaling
  • Shared Cache Layer: cross-run KV store cache for DNS, domains, enrichment

v5.0 (2026-05-18)

  • Semantic Page Detection, Search Expansion Matrix, Hidden Contact Extraction
  • Company Intelligence Profile (tech stack, maturity scoring, SaaS detection)
  • Structured Pipeline Metrics, Improved Proxy Routing

v4.0 (2026-05-17)

  • Deep Contact Extraction (4-method second pass)
  • Contact Intelligence Engine (decision maker mapping, persona classification)

v3.5 (2026-05-16)

  • URL Intelligence, Sitemap-First Crawl, Smart Retry Escalation
  • Stealth Fingerprinting, Adaptive Memory Scaling, Queue Segmentation

v2.0 (2026-05-15)

  • Merged 9 actors into unified 12-stage pipeline
  • 4-method extraction, 5-layer email discovery, social enrichment

v1.0 (2026-05-08)

  • Initial release: 8-stage pipeline with Playwright enrichment