# B2B Lead Scraper & Email Finder - Decision Makers (`painless_tweet/leadslogix-pipeline`) Actor

Upload a company list, get verified decision maker emails, phones, LinkedIn, and social profiles. 24-stage pipeline: website discovery, contact extraction, email finder, verification, social enrichment, lead scoring, and Excel export. For email marketing, cold outreach, and B2B prospecting.

- **URL**: https://apify.com/painless_tweet/leadslogix-pipeline.md
- **Developed by:** [Leadslogix LLC](https://apify.com/painless_tweet) (community)
- **Categories:** Lead generation, Marketing, Business
- **Stats:** 6 total users, 4 monthly users, 61.1% of runs succeeded, 3 bookmarks
- **User rating**: 5.00 out of 5 stars

## Pricing

from $0.00005 / actor start

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# MacOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) or as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

## B2B Lead Generation Tool & Sales Intelligence Platform -- Extract Verified Decision Maker Emails at Scale

The most powerful B2B lead generation and contact enrichment tool on Apify. Extract verified decision maker emails, phone numbers, LinkedIn profiles, and company intelligence from any company list -- no API keys required. A cost-effective Apollo alternative and ZoomInfo alternative that scrapes company websites, discovers emails through 5 search layers, verifies every address, and scores contacts by seniority -- all in a single 24-stage automated pipeline.

**Upload a CSV. Get sales-ready leads back. $2 per 1,000 results.**

Built for sales teams, growth marketers, SDRs, recruiters, and agencies who need a reliable business leads database, contact discovery engine, and CRM data enrichment tool for cold email outreach, sales prospecting, account-based marketing, and lead list building at scale.

---

### Why Teams Switch from Apollo, ZoomInfo, and Lusha to This

| Pain Point | How This Solves It |
|------------|-------------------|
| Apollo/ZoomInfo costs $100-500/mo for stale data | **Pay $2 per 1,000 leads** -- fresh data scraped in real time, no subscription |
| Purchased lead lists have 30-50% bounce rates | **Built-in 6-check email verification** with B2B tier classification (TIER_1 = <5% bounce) |
| Contact databases miss small/mid-size companies | **Scrapes any company website directly** -- not limited to a pre-built database |
| LinkedIn Sales Navigator requires manual prospecting | **Automated LinkedIn employee discovery** finds decision makers via search engines |
| Generic web scrapers miss contacts in JavaScript | **Headless Chromium + 4 extraction methods** catch contacts hidden in JSON-LD, JS bundles, and hydration payloads |
| No way to tell who's a decision maker | **AI-powered lead scoring** with seniority mapping, persona classification, and authority scoring |
| Exporting data requires manual cleanup | **14-rule junk removal**, dedup, and CRM-ready export in CSV, Excel, and JSON Lines |
| Running the same list twice wastes time | **Incremental delta mode** skips recently-enriched companies, saving ~70% on repeat runs |

---

### Key Features

#### Multi-Source B2B Data Extraction
- **Website email extractor** with 4-method contact extraction (JSON-LD, team cards, heuristic proximity, LinkedIn URLs)
- **5-layer email discovery** engine: DNS/OSINT, direct crawl, search engines, PDF mining, social platforms
- **LinkedIn employee discovery** via multi-query search with role-based variations (CEO, CTO, VP, Director, Manager)
- **8-platform social media enrichment**: LinkedIn, Twitter/X, Facebook, Instagram, YouTube, GitHub, Crunchbase, Glassdoor
- **SERP intelligence**: extract revenue estimates, funding signals, employee counts, and acquisition news from search results
- **File intelligence**: download and parse PDFs for contacts, org charts, and emails invisible to HTML scrapers
- **Hidden contact extraction**: parse `__NEXT_DATA__`, `__NUXT__`, `__INITIAL_STATE__`, and JS hydration payloads

#### AI-Powered Lead Scoring & Sales Intelligence
- **Decision maker identification** with 5-level seniority mapping (C-Suite, VP/Director, Manager, Staff, Unknown)
- **Persona classification**: Economic Buyer, Champion, Technical Evaluator, Influencer
- **Combined priority score** (0-100): 60% authority + 40% email confidence
- **Company intelligence profile**: tech stack fingerprinting (18+ frameworks), SaaS detection, company maturity scoring
- **Quality gate engine**: configurable thresholds filter low-quality contacts before export

#### Email Verification & Deliverability
- **6-check verification pipeline**: syntax, MX records, catch-all detection, disposable filtering, role detection, DKIM/SPF/DMARC
- **B2B send tiers**: TIER_1_SEND (safe), TIER_2_LIKELY_GOOD, TIER_3_REVIEW, SKIP
- **8-pattern email prediction** for contacts missing emails: `first.last@`, `flast@`, `firstlast@`, `first_last@`, and more
- **Confidence scoring** (0-100) with weighted components: SMTP +40, MX +20, auth records +15, pattern +10

#### Enterprise-Grade Infrastructure
- **Adaptive concurrency**: auto-scales 4-32 workers based on success rate and response times
- **Cross-run shared cache**: eliminates redundant DNS lookups and re-crawls across pipeline runs
- **Incremental delta mode**: skip companies enriched within configurable freshness window (1-90 days)
- **Executive correlation engine**: cross-source contact dedup with fuzzy Levenshtein name matching
- **Webhook dispatcher**: HTTP POST results to your CRM/Zapier/webhook endpoint with 3x retry
- **Checkpoint/resume**: large runs survive restarts and actor migrations

#### Flexible Export & Integration
- **Multi-format export**: Apify Dataset + CSV + 5-sheet Excel + JSON Lines (.jsonl)
- **6 dataset views**, including All Contacts, High Priority Decision Makers, Companies, Company Intelligence, and Funding Intel
- **Webhook integration**: real-time HTTP POST on pipeline completion with summary or full results
- **CRM-ready**: import directly into HubSpot, Salesforce, Pipedrive, Apollo, Lemlist, Instantly, Smartlead
- **API access**: full REST API for programmatic integration, scheduling, and automation

---

### Use Cases

#### Cold Email Outreach & Email Marketing
Upload your target company list and get back verified decision maker email addresses with B2B send tier classification. Filter by TIER_1_SEND for the safest emails (typically <5% bounce rate), or include TIER_2 for broader reach. Import directly into Lemlist, Instantly, Smartlead, Apollo, Woodpecker, Mailchimp, or any email marketing platform.

#### Sales Prospecting & Lead List Building
Build targeted B2B lead lists from scratch. Start with just company names -- the pipeline discovers websites, extracts leadership teams, finds and verifies emails, and scores every contact. Export the High_Priority sheet for your SDR team's daily call list. Use the funding intelligence to prioritize recently-funded companies.

#### Account-Based Marketing (ABM)
Enrich your target account list with verified contacts, social profiles, tech stack data, and company intelligence. The decision maker mapping identifies Economic Buyers and Champions at each company. Social enrichment gives your team conversation starters across LinkedIn, Twitter, and more.

#### CRM Data Enrichment & Cleansing
Have a CRM full of companies but missing contact details? Upload your company list and the pipeline fills in emails, phones, LinkedIn URLs, social profiles, tech stack, and decision maker details. Incremental mode means you only pay for new enrichment: previously processed companies are skipped.

#### Competitive Intelligence & Market Research
Scrape company websites at scale to collect organizational data, leadership teams, tech stacks, funding signals, and social presence. The Company Intelligence view shows tech stack fingerprinting, SaaS detection, employee count estimates, and company maturity scores. The Funding Intel view tracks revenue estimates and acquisition signals.

#### Recruitment & Talent Sourcing
Find hiring managers and leadership contacts at target companies. The pipeline extracts LinkedIn profiles alongside email addresses, making it easy to combine email outreach with LinkedIn messaging. Use the persona classification to identify Technical Evaluators and Champions.

#### Apollo/ZoomInfo Data Supplement
Supplement your existing Apollo or ZoomInfo data with fresh website scraping. This tool scrapes company websites in real time rather than relying on a static database, so it can find contacts that Apollo and ZoomInfo miss, especially at small/mid-size companies, international firms, and among recently hired executives.

---

### How It Works -- 24-Stage Intelligence Pipeline

#### Architecture Overview

````

INPUT: Company list (CSV / Excel / URL / JSON)
|
v
+--[ Stage 1: INGEST ]--[ Stage 2: DISCOVER ]--[ Stage 3: GOOGLE BOOST ]
|
+--[ Stage 4: ENRICH (adaptive concurrency, browser pool) ]
|       |
|       +---> Website crawling (35+ page paths)
|       +---> 4-method contact extraction
|       +---> Smart retry escalation (HTTP -> Browser -> Stealth -> Residential)
|
+--[ Stage 5: GEO ]--[ Stage 6: SOCIAL ]--[ Stage 7: LINKEDIN ]
|
+--[ Stage 8: SEMANTIC PAGES ]--[ Stage 9: SEARCH + SERP INTEL ]
|
+--[ Stage 10: PDF MINING ]--[ Stage 11: DEEP EXTRACT ]--[ Stage 12: HIDDEN EXTRACT ]
|
+--[ Stage 13: CONTACT INTEL ]--[ Stage 14: COMPANY INTEL ]--[ Stage 15: EXEC CORRELATION ]
|
+--[ Stage 16: EMAIL DISCOVER ]--[ Stage 17: EMAIL PREDICT ]--[ Stage 18: VERIFY ]
|
+--[ Stage 19: SCORE ]--[ Stage 20: CLEANUP ]--[ Stage 21: QUALITY GATE ]
|
+--[ Stage 22: METRICS ]--[ Stage 23: EXPORT ]--[ Stage 24: WEBHOOK ]
|
v
OUTPUT: Verified leads -> Apify Dataset + CSV + Excel + JSON Lines + Webhook

````

#### Stage-by-Stage Breakdown

##### Stage 1: INGEST -- Smart Input Parsing
Loads CSV, Excel, or JSON input. Auto-detects company name and website columns from **30+ aliases** (company_name, organisation, business, exhibitor, firm, url, domain, web_address, and more). Preserves all additional columns in output. Supports UTF-8, UTF-8 with BOM, and Latin-1 encodings.
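
The column auto-detection and encoding fallback can be sketched roughly as follows. This is an illustrative sketch, not the Actor's code: it uses only the aliases listed under "Supported Column Names" (the full 30+ alias list is not published), handles CSV only, and `decode_bytes`, `detect_columns`, and `load_companies` are hypothetical names.

```python
import csv
import io
from typing import Optional

# Only the aliases listed under "Supported Column Names"; the Actor's full
# 30+ alias list is not published, so treat these sets as a partial sample.
NAME_ALIASES = {"company_name", "company", "name", "organisation", "organization",
                "business_name", "business", "firm", "exhibitor", "exhibitor_name"}
WEBSITE_ALIASES = {"website", "company_website", "url", "web", "domain", "site",
                   "official_domain", "homepage", "web_address"}


def decode_bytes(raw: bytes) -> str:
    """Decode as UTF-8 (stripping a BOM if present), falling back to Latin-1."""
    try:
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        return raw.decode("latin-1")  # never fails: every byte maps to a character


def detect_columns(header: list[str]) -> tuple[Optional[str], Optional[str]]:
    """Return the first header matching a name alias and a website alias (case-insensitive)."""
    name_col = next((h for h in header if h.strip().lower() in NAME_ALIASES), None)
    site_col = next((h for h in header if h.strip().lower() in WEBSITE_ALIASES), None)
    return name_col, site_col


def load_companies(raw: bytes) -> list[dict]:
    """Parse CSV bytes, keeping every original column and adding normalized keys."""
    reader = csv.DictReader(io.StringIO(decode_bytes(raw)))
    name_col, site_col = detect_columns(reader.fieldnames or [])
    rows = []
    for row in reader:
        row["company_name"] = (row.get(name_col) or "") if name_col else ""
        row["website"] = (row.get(site_col) or "") if site_col else ""
        rows.append(row)
    return rows
```

Trying `utf-8-sig` first covers both plain UTF-8 and UTF-8 with BOM; Latin-1 is a safe last resort because it accepts any byte sequence, at the cost of possibly mis-rendering other legacy encodings.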

##### Stage 2: DISCOVER -- Website Discovery
For companies without a website, runs multi-engine domain discovery via DuckDuckGo and Bing. Filters out 45+ aggregator/social domains (LinkedIn, Wikipedia, Alibaba, ZoomInfo, Bloomberg, etc.). Scores results using Levenshtein distance and keyword overlap. Cross-run cache avoids re-discovering known domains.
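
A minimal sketch of how a candidate URL from a search result might be filtered and scored against the company name, combining Levenshtein similarity with keyword overlap as described above. The aggregator set is a small sample, and the 60/40 weighting and `score_candidate` name are assumptions for illustration only.

```python
import re
from urllib.parse import urlparse

# A few of the aggregator/social domains named above; the Actor's 45+ list is not published.
AGGREGATORS = {"linkedin.com", "wikipedia.org", "alibaba.com", "zoominfo.com", "bloomberg.com"}


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def score_candidate(company_name: str, url: str) -> float:
    """Score a search-result URL as the likely official site (0.0 to 1.0)."""
    host = urlparse(url).netloc.lower().removeprefix("www.")
    if any(host == d or host.endswith("." + d) for d in AGGREGATORS):
        return 0.0  # aggregator or social profile, never the official site
    label = host.split(".")[0]  # e.g. "acmerobotics" from "acmerobotics.com"
    squashed = "".join(ch for ch in company_name.lower() if ch.isalnum())
    similarity = 1 - levenshtein(squashed, label) / max(len(squashed), len(label), 1)
    tokens = [t for t in re.split(r"[^a-z0-9]+", company_name.lower()) if t]
    overlap = sum(t in label for t in tokens) / len(tokens) if tokens else 0.0
    # Hypothetical 60/40 weighting of edit-distance similarity and keyword overlap.
    return round(0.6 * similarity + 0.4 * overlap, 3)
```

Picking the best result is then `max(urls, key=lambda u: score_candidate(name, u))`. Only the first hostname label is compared, so multi-part TLDs such as `.co.uk` work, but brand names that differ from the domain will score low.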

##### Stage 3: GOOGLE BOOST -- 8-Step Discovery Enhancement
Eight targeted search passes per company, including:
- **Domain Recovery** -- 6 query angles to find missing websites
- **Email Discovery** -- Search for published email addresses
- **Social Discovery** -- Find company social profiles
- **Phone & Address** -- Discover published contact info
- **DNS Validation** -- Full MX, SPF, DKIM, DMARC checks with domain trust scoring

##### Stage 4: ENRICH -- Adaptive Hybrid Extraction
Launches parallel workers with adaptive concurrency (4-32, auto-scaling). Each company website is crawled across **35+ page paths** (/about, /team, /leadership, /contact, /people, /management, /staff, /executives, /board, /partners, /founders, /imprint, and more).

**Smart Retry Escalation:**
1. HTTP request (fastest, lowest resource)
2. Browser with DOM wait (for JavaScript-rendered pages)
3. Stealth browser (for bot-protected sites)
4. Residential proxy rotation (for aggressive blockers)
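
The escalation ladder can be sketched as a loop that tries each method once, cheapest first, and only moves up a rung after a failure. The four fetchers are hypothetical callables you would supply (for example wrappers around an HTTP client, a headless browser, a stealth browser, and a residential-proxy session); they are not shown, and `fetch_with_escalation` is an illustrative name.

```python
from typing import Callable, Optional

# Hypothetical fetcher type: takes a URL, returns HTML or None when blocked/empty.
Fetcher = Callable[[str], Optional[str]]


def fetch_with_escalation(url: str, ladder: list[tuple[str, Fetcher]]) -> Optional[tuple[str, str]]:
    """Try each method once, cheapest first, escalating only after a failure.

    Returns (method_name, html) for the first method that yields content,
    or None if every rung of the ladder failed.
    """
    for method_name, fetch in ladder:
        try:
            html = fetch(url)
        except Exception:  # timeouts, TLS errors, bot-wall exceptions...
            html = None    # treat any error like a block and move up a rung
        if html:
            return method_name, html
    return None
```

Because each method appears once in the ladder, the same method is never retried twice, and expensive rungs (browser, residential proxy) are only paid for when cheaper ones fail.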

**Four extraction methods run on every page:**
1. **JSON-LD Parsing** -- Reads `<script type="application/ld+json">` for schema.org Person/Organization data
2. **Team Card Selectors** -- Matches 18 CSS patterns (.team-member, .staff-card, .leadership-card, etc.)
3. **Heuristic Matching** -- Finds domain emails and matches nearby names/titles using proximity analysis
4. **LinkedIn URL Extraction** -- Extracts contact info from LinkedIn /in/ URLs on the page
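
Method 1 (JSON-LD parsing) can be sketched with the standard library alone: collect every `application/ld+json` script, parse it, and walk nested objects (including `@graph` arrays and `employee` lists) for schema.org `Person` nodes. This is an illustrative sketch; `JsonLdCollector` and `extract_people` are hypothetical names, and real sites often mark up people less cleanly.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_ld = False
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and (dict(attrs).get("type") or "").lower() == "application/ld+json":
            self._in_ld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_ld = False

    def handle_data(self, data):
        if self._in_ld:
            self.blocks[-1] += data  # script text may arrive in several chunks


def iter_nodes(obj):
    """Yield every dict in a nested JSON structure."""
    if isinstance(obj, dict):
        yield obj
        for value in obj.values():
            yield from iter_nodes(value)
    elif isinstance(obj, list):
        for value in obj:
            yield from iter_nodes(value)


def extract_people(html: str) -> list[dict]:
    parser = JsonLdCollector()
    parser.feed(html)
    people = []
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is common; skip it
        for node in iter_nodes(data):
            types = node.get("@type")
            types = types if isinstance(types, list) else [types]
            if "Person" in types:
                people.append({"name": node.get("name"), "title": node.get("jobTitle"),
                               "email": node.get("email")})
    return people
```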

##### Stage 5: GEO ENRICH -- Location Intelligence
Extracts addresses, cities, states, countries, and postal codes from website content. Enriches with geographic metadata for regional targeting campaigns.

##### Stage 6: SOCIAL ENRICH -- 8-Platform Social Discovery
Discovers company profiles on **LinkedIn, Twitter/X, Facebook, Instagram, YouTube, GitHub, Crunchbase, and Glassdoor**. Uses site-specific DuckDuckGo queries with name matching and URL slug validation. Calculates social presence score (0-100) weighted by B2B relevance.

##### Stage 7: LINKEDIN DISCOVER -- Employee Discovery
Multi-query LinkedIn employee discovery using search engines (no LinkedIn login required):
- **4-tier role queries**: C-Suite, VP/Director, Management, Specialist
- **Parallel search** (3 concurrent queries) with DuckDuckGo and a Bing fallback
- Extracts names from LinkedIn URL slugs and search snippets
- Deduplicates against contacts already found during website crawling
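
Extracting a name from a LinkedIn URL slug might look like the following sketch. It assumes the common `/in/first-last-<id>/` shape, drops a trailing ID token that contains digits, and rejects single-word slugs as too ambiguous; `name_from_linkedin_url` is a hypothetical helper, not the Actor's function.

```python
import re
from typing import Optional
from urllib.parse import unquote, urlparse


def name_from_linkedin_url(url: str) -> Optional[str]:
    """Guess a display name from a /in/ slug such as /in/jane-doe-4a1b2c3/."""
    path = unquote(urlparse(url).path)
    match = re.match(r"^/in/([^/]+)", path)
    if not match:
        return None  # company pages, posts, etc.
    parts = match.group(1).split("-")
    # Drop a trailing LinkedIn ID token (contains digits), e.g. "4a1b2c3".
    if len(parts) > 1 and any(ch.isdigit() for ch in parts[-1]):
        parts = parts[:-1]
    words = [p for p in parts if p.isalpha()]
    if len(words) < 2:
        return None  # "jdoe"-style slugs are too ambiguous to split
    return " ".join(w.capitalize() for w in words)
```

Slug-derived names are lossy (apostrophes, accents removed by the user, nicknames), which is one reason such contacts would rank below crawled ones during correlation.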

##### Stage 8: SEMANTIC PAGE DETECT -- Intelligent Page Classification
Analyzes crawled HTML to classify pages by type: leadership, speaker, board, investor relations, careers, and partner pages. Detected pages feed into subsequent extraction stages for targeted re-processing.

##### Stage 9: SEARCH EXPANSION + SERP Intelligence
Auto-generates **8-category search queries** per company (employee, executive, email, PDF, hiring, press, conference, investor). Extracts structured intelligence from search snippets:
- **Revenue estimates** ($M/$B from financial mentions)
- **Funding signals** (Series A-F, seed rounds, investment amounts)
- **Employee counts** (from "X employees" mentions)
- **Founded year** and **acquisition signals**
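
Snippet parsing of this kind is typically regex-based. The sketch below is an assumption about how such signals could be pulled from one search snippet, not the Actor's actual patterns. The `estimated_revenue_m`, `funding_amount_m`, and `funding_stage` keys mirror the output schema; `employees`, `founded_year`, and `acquisition_signal` are hypothetical keys.

```python
import re

UNIT = r"(million|billion|M|B)\b"
REVENUE_RE = re.compile(r"revenue[^$]{0,40}\$\s?(\d+(?:\.\d+)?)\s?" + UNIT, re.I)
FUNDING_RE = re.compile(r"raised\s+\$\s?(\d+(?:\.\d+)?)\s?" + UNIT, re.I)
STAGE_RE = re.compile(r"\b(seed|series [a-f])\b", re.I)
EMPLOYEES_RE = re.compile(r"\b(\d{1,3}(?:,\d{3})+|\d+)\+?\s+employees\b", re.I)
FOUNDED_RE = re.compile(r"\b(?:founded|established)\s+in\s+(\d{4})\b", re.I)
ACQUIRED_RE = re.compile(r"\b(acquired|acquisition)\b", re.I)


def _millions(number: str, unit: str) -> float:
    """Normalize "$1.5 billion" / "$25M" style amounts to millions."""
    return float(number) * (1000 if unit.lower().startswith("b") else 1)


def parse_serp_signals(snippet: str) -> dict:
    out = {}
    if m := REVENUE_RE.search(snippet):
        out["estimated_revenue_m"] = _millions(m.group(1), m.group(2))
    if m := FUNDING_RE.search(snippet):
        out["funding_amount_m"] = _millions(m.group(1), m.group(2))
    if m := STAGE_RE.search(snippet):
        stage = m.group(1)
        out["funding_stage"] = "Seed" if stage.lower() == "seed" else "Series " + stage[-1].upper()
    if m := EMPLOYEES_RE.search(snippet):
        out["employees"] = int(m.group(1).replace(",", ""))
    if m := FOUNDED_RE.search(snippet):
        out["founded_year"] = int(m.group(1))
    out["acquisition_signal"] = bool(ACQUIRED_RE.search(snippet))
    return out
```

Patterns like these only catch common phrasings; snippets from news headlines often omit units or currencies, so any values extracted this way are estimates.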

##### Stage 10: FILE INTELLIGENCE -- PDF Mining
Downloads and parses PDFs/documents found during search expansion and crawling:
- Extracts emails matching company domain
- Mines contacts using name-title proximity matching (200-char context window)
- Processes up to 5 PDFs per company (max 20 pages, 10MB per file)
- Finds contacts in annual reports, brochures, org charts, and catalogs invisible to HTML scrapers
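
Once a PDF's text has been extracted (with any PDF library; not shown), domain emails and name-title proximity matching could work roughly as below. This sketch assumes the common "Name, Title" layout and uses deliberately naive name and title regexes; `mine_contacts` and both patterns are hypothetical.

```python
import re

TITLE_RE = re.compile(r"\b(CEO|CFO|CTO|COO|Founder|President|(?:Managing )?Director|Head of \w+)\b")
NAME_RE = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")  # naive "Firstname Lastname"


def mine_contacts(text: str, domain: str, window: int = 200) -> dict:
    """Find emails at `domain` and pair each job title with the closest preceding name."""
    emails = sorted(set(re.findall(r"[\w.+-]+@" + re.escape(domain) + r"\b", text, flags=re.I)))
    names = list(NAME_RE.finditer(text))
    contacts = []
    for title in TITLE_RE.finditer(text):
        # Only names ending before the title, within `window` characters, are candidates.
        before = [n for n in names
                  if n.end() <= title.start() and title.start() - n.end() <= window]
        if before:
            closest = max(before, key=lambda n: n.end())
            contacts.append({"name": closest.group(), "title": title.group()})
    return {"emails": emails, "contacts": contacts}
```

The "closest preceding name" rule avoids pairing a title with the next person's name in stacked lists, but it will still produce false positives (e.g. capitalized phrases like "Annual Report") that later cleanup stages need to remove.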

##### Stage 11: DEEP CONTACT EXTRACT -- Second-Pass Extraction
Runs a 4-method deep re-extraction on all crawled HTML. For companies with 0 contacts after initial crawl, triggers a targeted re-crawl of leadership and team pages. Extracts company-level info (general email, phone) separately from personal contacts.

##### Stage 12: HIDDEN CONTACT EXTRACT -- JavaScript Payload Mining
Parses contacts hidden in JavaScript bundles and framework payloads:
- `__NEXT_DATA__` (Next.js)
- `__NUXT__` (Nuxt.js)
- `window.__INITIAL_STATE__` (Vue/Redux)
- Inline JSON-LD arrays and embedded API response objects
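
For Next.js sites, the payload sits in a `<script id="__NEXT_DATA__" type="application/json">` tag. A minimal sketch that pulls it out and walks it for objects with a valid-looking `email` field is shown below. The field names (`name`, `title`, `jobTitle`, `email`) are assumptions, since every site shapes its own props, and `contacts_from_next_data` is a hypothetical helper.

```python
import json
import re

NEXT_DATA_RE = re.compile(r'<script[^>]*id="__NEXT_DATA__"[^>]*>(.*?)</script>', re.S)
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+(?:\.[\w-]+)+$")


def walk(obj):
    """Yield every dict in a nested JSON structure."""
    if isinstance(obj, dict):
        yield obj
        for value in obj.values():
            yield from walk(value)
    elif isinstance(obj, list):
        for value in obj:
            yield from walk(value)


def contacts_from_next_data(html: str) -> list[dict]:
    match = NEXT_DATA_RE.search(html)
    if not match:
        return []
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return []
    found = []
    for node in walk(data):
        email = node.get("email")
        if isinstance(email, str) and EMAIL_RE.match(email):
            found.append({"name": node.get("name"),
                          "title": node.get("title") or node.get("jobTitle"),
                          "email": email})
    return found
```

The same walk applies to `__NUXT__` or `window.__INITIAL_STATE__` once their JSON is isolated, although those are usually JavaScript expressions rather than pure JSON and need extra unwrapping.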

##### Stage 13: CONTACT INTELLIGENCE -- Decision Maker Mapping
- **Seniority inference** (0-5 scale) from 40+ title keywords
- **Persona classification**: Economic Buyer, Champion, Technical Evaluator, Influencer
- **Target title matching** against 30+ B2B decision maker titles
- **Authority scoring** (0-100) with weighted components
- **Circuit breaker learning** for per-domain intelligence
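
Keyword-based seniority inference might be sketched as an ordered rule list where multi-word phrases are checked first, so "Vice President" is not mistaken for "President". The keywords come from the Stage 19 scoring table; the full 40+ keyword list is not published, and returning `0` for an unknown title is an assumption.

```python
import re

# Ordered (keyword, level) rules: first match wins, so specific phrases come first.
RULES = [
    ("vice president", 4), ("general manager", 4), ("head of", 4),
    ("chief", 5), ("ceo", 5), ("founder", 5), ("cto", 5), ("cfo", 5), ("coo", 5),
    ("president", 5), ("owner", 5),
    ("vp", 4), ("director", 4), ("partner", 4), ("gm", 4),
    ("manager", 3), ("team lead", 3),
    ("engineer", 2), ("developer", 2), ("analyst", 2), ("specialist", 2),
]


def infer_seniority(title) -> int:
    """Map a job title to the 0-5 seniority scale (0 = unknown, an assumed convention)."""
    text = (title or "").lower()
    for keyword, level in RULES:
        if re.search(rf"\b{re.escape(keyword)}\b", text):
            return level
    return 0
```

Word boundaries keep "Engineering Manager" from matching "engineer", but pure keyword rules still misfire on titles such as "Product Owner".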

##### Stage 14: COMPANY INTEL -- Company Intelligence Profile
Builds a comprehensive company profile:
- **Tech stack fingerprinting** (18+ frameworks from HTTP headers and HTML)
- **Analytics tools detection** (Google Analytics, Segment, Mixpanel, etc.)
- **Employee count estimation** from multiple signals
- **SaaS detection** and hiring velocity signals
- **Company maturity score** (0-100)
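
Tech stack and analytics fingerprinting usually comes down to matching known substrings in HTML and response headers. The signatures below are a few well-known examples chosen for illustration, not the Actor's 18+ rules; `fingerprint` is a hypothetical name.

```python
# Illustrative signatures only; the Actor's 18+ framework rules are not published.
HTML_SIGNATURES = {"WordPress": "wp-content/", "Next.js": "__NEXT_DATA__",
                   "Nuxt": "__NUXT__", "Shopify": "cdn.shopify.com"}
HEADER_SIGNATURES = {"PHP": ("x-powered-by", "php"), "Express": ("x-powered-by", "express"),
                     "Cloudflare": ("server", "cloudflare")}
ANALYTICS_SIGNATURES = {"Google Analytics": "googletagmanager.com/gtag/js",
                        "Segment": "cdn.segment.com", "Mixpanel": "mixpanel"}


def fingerprint(html: str, headers: dict[str, str]) -> dict[str, list[str]]:
    """Match known substrings in the page HTML and response headers (case-insensitive)."""
    page = html.lower()
    hdrs = {k.lower(): v.lower() for k, v in headers.items()}
    tech = sorted(name for name, sig in HTML_SIGNATURES.items() if sig.lower() in page)
    tech += sorted(name for name, (key, needle) in HEADER_SIGNATURES.items()
                   if needle in hdrs.get(key, ""))
    analytics = sorted(name for name, sig in ANALYTICS_SIGNATURES.items() if sig in page)
    return {"tech_stack": tech, "analytics_tools": analytics}
```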

##### Stage 15: EXECUTIVE CORRELATION -- Fuzzy Dedup Engine
Cross-references contacts from **all extraction methods** (crawl, deep extract, hidden, LinkedIn, search, file intel). Merges duplicates using:
- **Email matching** (exact)
- **LinkedIn slug matching** (exact)
- **Name matching** (exact + fuzzy Levenshtein with edit distance threshold 1-2)
- **Cross-source preference**: crawled > deep extract > hidden > file intel > LinkedIn > search > predicted
- Builds unified profiles with composite confidence scores
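
A minimal sketch of the correlation step: sort contacts by source preference, treat two records as the same person on an exact email or LinkedIn slug match or a small name edit distance, and fill gaps in the preferred record from lower-ranked ones. The `source`/`linkedin_slug` keys and the rule choosing between 1 and 2 edits are assumptions for illustration.

```python
# Source preference from the list above (lower = preferred).
SOURCE_RANK = {"crawled": 0, "deep_extract": 1, "hidden": 2, "file_intel": 3,
               "linkedin": 4, "search": 5, "predicted": 6}


def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def _norm(name) -> str:
    return " ".join((name or "").lower().split())


def same_person(a: dict, b: dict) -> bool:
    ea, eb = (a.get("email") or "").lower(), (b.get("email") or "").lower()
    if ea and ea == eb:
        return True
    sa, sb = a.get("linkedin_slug"), b.get("linkedin_slug")
    if sa and sa == sb:
        return True
    na, nb = _norm(a.get("name")), _norm(b.get("name"))
    if not na or not nb:
        return False
    # Hypothetical rule for the 1-2 edit threshold: 1 for short names, 2 otherwise.
    max_edits = 1 if min(len(na), len(nb)) < 10 else 2
    return levenshtein(na, nb) <= max_edits


def merge_contacts(contacts: list[dict]) -> list[dict]:
    merged: list[dict] = []
    # Process the most-preferred sources first so their values win.
    for c in sorted(contacts, key=lambda c: SOURCE_RANK.get(c.get("source"), 99)):
        match = next((m for m in merged if same_person(m, c)), None)
        if match is None:
            merged.append(dict(c))
        else:
            for key, value in c.items():
                if value and not match.get(key):
                    match[key] = value  # fill gaps from lower-ranked sources
    return merged
```

Comparing every new record with every merged one is quadratic, which is acceptable per company (tens of contacts) but would need blocking by domain at larger scale.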

##### Stage 16: EMAIL DISCOVER -- 5-Layer Email Discovery

| Layer | Method | What It Finds |
|-------|--------|---------------|
| **Layer 0** | DNS/OSINT (MX, DMARC, SPF record parsing) | Admin/reporting emails from DNS |
| **Layer 1** | Direct HTTP crawl (/contact, /about, /impressum, /kontakt) | Page-embedded emails |
| **Layer 2** | DuckDuckGo multi-query search | Publicly indexed emails |
| **Layer 3** | PDF document search (`filetype:pdf`) | Emails in documents |
| **Layer 4** | GitHub + LinkedIn site-specific search | Developer/professional emails |
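
For Layer 0, a DMARC policy is published as a TXT record at `_dmarc.<domain>` (RFC 7489), and its `rua`/`ruf` tags list `mailto:` URIs for aggregate and forensic reports. The sketch below only parses a record string that has already been fetched; the DNS lookup needs a resolver library and is not shown, and `emails_from_dmarc` is a hypothetical name.

```python
def emails_from_dmarc(record: str) -> list[str]:
    """Extract mailto: addresses from the rua/ruf tags of a DMARC TXT record."""
    emails = []
    for tag in record.split(";"):
        key, _, value = tag.strip().partition("=")
        if key.strip().lower() in ("rua", "ruf"):
            for uri in value.split(","):
                uri = uri.strip()
                if uri.lower().startswith("mailto:"):
                    # Drop an optional size limit suffix such as "!10m".
                    emails.append(uri[len("mailto:"):].split("!")[0].lower())
    return emails
```

Report addresses often point at third-party DMARC vendors rather than the company itself, so filtering the results by the company's own domain is usually worthwhile.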

##### Stage 17: EMAIL PREDICT -- Pattern Learning
Analyzes known emails at each domain to detect the dominant pattern. Generates predictions using **8 templates**: `first.last@`, `flast@`, `firstlast@`, `first_last@`, `first@`, `last@`, `last.first@`, `f.last@`.
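
Pattern learning can be sketched by rendering each of the eight templates for every known (first, last, email) triple at a domain, counting which templates reproduce the real local part, and applying the winner to contacts without an email. Function names are hypothetical, and names with accents, hyphens, or particles would need normalization that is not shown.

```python
from collections import Counter
from typing import Optional

# The eight templates listed above; {f} is the first initial.
TEMPLATES = {
    "first.last": "{first}.{last}", "flast": "{f}{last}", "firstlast": "{first}{last}",
    "first_last": "{first}_{last}", "first": "{first}", "last": "{last}",
    "last.first": "{last}.{first}", "f.last": "{f}.{last}",
}


def render(template: str, first: str, last: str) -> str:
    first, last = first.lower(), last.lower()
    return template.format(first=first, last=last, f=first[:1])


def detect_pattern(known: list[tuple[str, str, str]]) -> Optional[str]:
    """known: (first, last, email) triples at one domain; return the most common template."""
    counts = Counter()
    for first, last, email in known:
        local = email.split("@")[0].lower()
        for name, template in TEMPLATES.items():
            if first and last and render(template, first, last) == local:
                counts[name] += 1
    return counts.most_common(1)[0][0] if counts else None


def predict_email(first: str, last: str, domain: str, pattern: str) -> str:
    return f"{render(TEMPLATES[pattern], first, last)}@{domain}"
```

Predicted addresses are unverified guesses, which is why they rank last in the correlation preference order and still go through the verification stage.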

##### Stage 18: VERIFY -- 6-Check Email Verification

| Check | What It Validates |
|-------|------------------|
| **Syntax** | RFC 5322 email format |
| **MX Records** | Domain accepts mail |
| **Catch-All** | Domain accepts all addresses (reduces confidence) |
| **Disposable** | Filters Mailinator, Guerrilla Mail, TempMail, 20+ providers |
| **Role Address** | Flags info@, admin@, noreply@, sales@, 25+ generic prefixes |
| **Authentication** | DKIM, SPF, DMARC record presence and configuration |
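
Three of the six checks need no network access and can be sketched as below. The syntax pattern is a deliberate simplification of RFC 5322, and the disposable and role lists are small samples rather than the Actor's lists. The MX, catch-all, and DKIM/SPF/DMARC checks require DNS (and, for catch-all, SMTP) queries and are not shown.

```python
import re

# Deliberately simplified: real RFC 5322 syntax (quoted local parts, comments) is much broader.
SYNTAX_RE = re.compile(r"^[a-z0-9._%+-]+@[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,}$")
# Small samples of the lists described above, not the Actor's actual lists.
DISPOSABLE_DOMAINS = {"mailinator.com", "guerrillamail.com", "10minutemail.com"}
ROLE_PREFIXES = {"info", "admin", "noreply", "no-reply", "sales", "support", "contact"}


def offline_checks(email: str) -> dict[str, bool]:
    """Checks that need no network access (input is lowercased first)."""
    email = email.strip().lower()
    local, _, domain = email.partition("@")
    return {
        "syntax_ok": bool(SYNTAX_RE.match(email)),
        "disposable": domain in DISPOSABLE_DOMAINS,
        "role_address": local in ROLE_PREFIXES,
    }
```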

**B2B Send Tiers:**

| Tier | Score | Recommendation |
|------|-------|---------------|
| **TIER_1_SEND** | 80-100 | Safe to send. Valid MX, structured address, strong auth. |
| **TIER_2_LIKELY_GOOD** | 50-79 | Likely valid. Minor concerns (role address, generic provider). |
| **TIER_3_REVIEW** | 30-49 | Manual review before sending. |
| **SKIP** | 0-29 | Do not send. Invalid, disposable, or failed checks. |

##### Stage 19: SCORE -- Lead Scoring Engine
**Combined priority score** = 60% authority score + 40% verification confidence.

| Seniority | Title Examples | Authority Score |
|-----------|---------------|-----------------|
| C-Level (5) | CEO, Founder, CTO, CFO, COO, President, Owner | 80-100 |
| VP/Director (4) | Vice President, Director, Head of, Partner, GM | 70-79 |
| Manager (3) | Manager, Team Lead, Senior Manager | 60-69 |
| Staff (2) | Engineer, Developer, Analyst, Specialist | 40-59 |
| Unknown | Email-only contact (no title found) | 40 |
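
The combined score is a weighted sum. For example, a VP-level contact with authority 75 and verification confidence 85 gets 0.6 * 75 + 0.4 * 85 = 45 + 34 = 79. The sketch below uses integer arithmetic with round-half-up; the rounding rule is an assumption, since the source does not specify one.

```python
def combined_priority(authority: int, confidence: int) -> int:
    """60% authority + 40% verification confidence, both 0-100, rounded half up."""
    # Integer math avoids float artifacts: (6a + 4c) / 10, +5 before // rounds half up.
    return (6 * authority + 4 * confidence + 5) // 10
```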

##### Stage 20: CLEANUP -- 14-Rule Data Quality
Removes junk contacts: duplicates, short names (<3 chars), UI artifacts ("View Bio", "Read More", "Subscribe", "Menu"), navigation strings, placeholder text. Sorts by authority score. Caps at `maxContactsPerCompany` with decision makers retained first.
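
A few of the cleanup rules (short names, UI artifacts, duplicates) plus the sort-and-cap step might look like this sketch. The artifact set holds only the examples named above, not all 14 rules, and the `name`/`email`/`lead_score`/`is_decision_maker` keys and `cleanup` name are assumptions.

```python
# Examples named above; the full 14-rule set is not published.
UI_ARTIFACTS = {"view bio", "read more", "subscribe", "menu"}


def cleanup(contacts: list[dict], max_per_company: int) -> list[dict]:
    seen = set()
    kept = []
    for c in contacts:
        name = (c.get("name") or "").strip()
        if len(name) < 3 or name.lower() in UI_ARTIFACTS:
            continue  # junk: too short or scraped UI text
        key = (name.lower(), (c.get("email") or "").lower())
        if key in seen:
            continue  # duplicate name + email pair
        seen.add(key)
        kept.append(c)
    # Decision makers first, then highest authority score, then cap.
    kept.sort(key=lambda c: (not c.get("is_decision_maker"), -c.get("lead_score", 0)))
    return kept[:max_per_company]
```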

##### Stage 21: QUALITY GATE -- Configurable Thresholds
Filters contacts below your configured `minLeadScore` and `minConfidenceScore`. Tags every contact with data freshness: `verified`, `crawled`, `linkedin_only`, `predicted_only`, `search_derived`. Separates filtered contacts for optional review.

##### Stage 22: METRICS -- Pipeline Analytics
Computes extraction rates, verification tier distributions, tech stack distribution, proxy health, stage-level timing, per-domain quality metrics, and cache hit rates.

##### Stage 23: EXPORT -- Multi-Format Output
- **Apify Dataset** -- Browsable, downloadable as JSON/CSV/Excel, accessible via API
- **CSV** (`output.csv`) -- UTF-8 with BOM for Excel compatibility
- **Excel** (`output.xlsx`) -- 5-sheet workbook: Contacts, Companies, Locations, High_Priority, Audit
- **JSON Lines** (`output.jsonl`) -- One JSON object per line for streaming ingestion (BigQuery, data pipelines)

##### Stage 24: WEBHOOK -- Real-Time Delivery
HTTP POST to your endpoint on pipeline completion:
- Summary stats (companies, contacts, decision makers, verified emails)
- Optional full results payload
- 3x exponential backoff retry (5s, 25s, 125s)
- Dead letter queue to KeyValueStore on failure
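
The retry schedule (5 s, 25 s, 125 s, i.e. each delay five times the previous) can be sketched as one initial attempt plus one retry after each delay, with a dead-letter callback on final failure. In the Actor the dead letter goes to the KeyValueStore; here it is a caller-supplied function, and `post_json` and `deliver_with_retry` are hypothetical names built on the standard library.

```python
import json
import time
import urllib.request
from typing import Any, Callable


def post_json(url: str, payload: Any, timeout: float = 30) -> int:
    """POST `payload` as JSON; urllib raises HTTPError for non-2xx responses."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def deliver_with_retry(payload: Any,
                       send: Callable[[Any], int],
                       dead_letter: Callable[[Any], None],
                       delays: tuple[float, ...] = (5, 25, 125),
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """One initial attempt plus one retry after each delay; dead-letter on final failure."""
    for delay in (0, *delays):
        if delay:
            sleep(delay)
        try:
            if 200 <= send(payload) < 300:
                return True
        except Exception:
            pass  # network error or HTTPError: fall through to the next retry
    dead_letter(payload)
    return False
```

In use, `send` would be something like `lambda p: post_json(webhook_url, p)`; injecting `send` and `sleep` keeps the retry logic testable without a network or real waiting.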

---

### Anti-Blocking Technology

This pipeline uses enterprise-grade anti-detection to maximize extraction rates:

| Feature | How It Works |
|---------|-------------|
| **Smart Retry Escalation** | HTTP -> Browser -> Stealth -> Residential proxy (never the same method twice) |
| **Adaptive Concurrency** | Auto-scales 4-32 workers based on success rate and response times |
| **Playwright Stealth** | `navigator.webdriver` spoofing, timezone/locale/device randomization |
| **Browser State Isolation** | Reset cookies, cache, localStorage every 25 requests |
| **Resource Blocking** | Block third-party trackers only, preserve first-party JS/XHR for data extraction |
| **Domain Rate Limiting** | Per-domain circuit breaker with failure threshold and recovery timeout |
| **Proxy Rotation** | Residential for corporate sites, datacenter for static, mobile fallback for CAPTCHA |
| **Sitemap-First Crawl** | Parse sitemap.xml first, prioritize contact/team/leadership pages (Tier 1-4 system) |
| **Memory Scaling** | Reduce browser tabs at 75% RAM, preserve HTTP workers for throughput |

---

### Input Parameters

#### Data Input (choose one)

| Parameter | Type | Description |
|-----------|------|-------------|
| `inputFile` | File upload | Upload a CSV or Excel file with company names and/or websites |
| `inputUrl` | String | Public URL to a CSV or Excel file |
| `companies` | JSON array | Inline company list as JSON objects |

#### Settings & Pricing

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `maxResults` | Integer | `20` | Max companies to process. Free: 20/run. Beyond: $2/1,000 |
| `workers` | Integer | `16` | Initial parallel workers (adaptive: auto-scales 4-32) |
| `maxContactsPerCompany` | Integer | `20` | Contact cap per company. Decision makers prioritized |

#### Incremental & Quality

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `incrementalMode` | Boolean | `false` | Skip recently-enriched companies (~70% time savings on repeats) |
| `incrementalFreshnessDays` | Integer | `7` | Days before cached data is considered stale (1-90) |
| `minLeadScore` | Integer | `0` | Quality gate: minimum combined_priority to include in export |
| `minConfidenceScore` | Integer | `0` | Quality gate: minimum confidence score to include in export |

#### Webhook & Export

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `webhookUrl` | String | -- | HTTP endpoint to receive results on completion |
| `webhookSendFullResults` | Boolean | `false` | Include full data in webhook (vs summary only) |
| `exportJsonLines` | Boolean | `false` | Also export as .jsonl in KeyValueStore |

#### Pipeline Stage Controls

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `skipGoogleBoost` | Boolean | `false` | Skip 8-step Google Discovery (~30% faster) |
| `skipSocialEnrichment` | Boolean | `false` | Skip 8-platform social discovery (~15% faster) |
| `skipLinkedInDiscovery` | Boolean | `false` | Skip LinkedIn employee discovery |
| `skipSemanticPageDetect` | Boolean | `false` | Skip semantic page classification |
| `skipSearchExpansion` | Boolean | `false` | Skip search expansion + SERP intelligence |
| `skipFileIntelligence` | Boolean | `false` | Skip PDF mining |
| `skipDeepContactExtract` | Boolean | `false` | Skip deep 4-method re-extraction |
| `skipHiddenContactExtract` | Boolean | `false` | Skip JS/JSON payload extraction |
| `skipContactIntelligence` | Boolean | `false` | Skip decision maker mapping |
| `skipCompanyIntel` | Boolean | `false` | Skip company intelligence profile |
| `skipExecutiveCorrelation` | Boolean | `false` | Skip cross-source contact dedup |
| `skipEmailDiscovery` | Boolean | `false` | Skip 5-layer email discovery |
| `skipEmailPrediction` | Boolean | `false` | Skip 8-pattern email prediction |
| `skipVerification` | Boolean | `false` | Skip 6-check email verification |
| `skipQualityGate` | Boolean | `false` | Skip quality gate filtering |

#### Proxy

| Parameter | Type | Description |
|-----------|------|-------------|
| `proxyConfiguration` | Proxy | Apify Proxy config. Residential strongly recommended |

#### Supported Column Names

The Actor auto-detects columns using these aliases (case-insensitive):

**Company name:** `company_name`, `company`, `name`, `organisation`, `organization`, `business_name`, `business`, `firm`, `exhibitor`, `exhibitor_name`

**Website:** `website`, `company_website`, `url`, `web`, `domain`, `site`, `official_domain`, `homepage`, `web_address`

Any additional columns in your input file are preserved in the output.

---

### Output Schema

Each row represents one contact (or one company if no contacts were found).

#### Contact Fields

| Field | Type | Description |
|-------|------|-------------|
| `contact_name` | String | Full name |
| `contact_title` | String | Job title |
| `contact_email` | String | Email address |
| `contact_phone` | String | Direct phone number |
| `contact_linkedin` | String | LinkedIn profile URL |
| `extraction_method` | String | How found: `jsonld`, `team_card`, `heuristic`, `linkedin`, `deep_extract`, `hidden_extract`, `file_intel`, `search` |
| `is_decision_maker` | Boolean | Holds a leadership position |
| `persona_type` | String | `Economic Buyer`, `Champion`, `Technical Evaluator`, `Influencer` |
| `seniority` | Integer (0-5) | Title seniority level |
| `lead_score` | Integer (0-100) | Authority score |
| `combined_priority` | Integer (0-100) | Blended: 60% authority + 40% verification |
| `priority_band` | String | `HIGH`, `MEDIUM`, `LOW`, `SKIP` |
| `verification_status` | String | `valid`, `risky`, `invalid`, `unknown` |
| `b2b_tier` | String | `TIER_1_SEND`, `TIER_2_LIKELY_GOOD`, `TIER_3_REVIEW`, `SKIP` |
| `confidence_score` | Integer (0-100) | Email deliverability confidence |
| `correlation_confidence` | Integer (0-100) | Cross-source correlation score |
| `data_freshness` | String | `verified`, `crawled`, `linkedin_only`, `predicted_only`, `search_derived` |
| `auth_score` | Integer (0-100) | Domain authentication score |
| `email_type` | String | `crawled`, `discovered`, `predicted` |

#### Company Fields

| Field | Type | Description |
|-------|------|-------------|
| `company_name` | String | Company name |
| `company_website` | String | Full URL |
| `domain` | String | Normalized domain (e.g., `acme.com`) |
| `company_emails` | String | Semicolon-separated company emails |
| `company_phones` | String | Semicolon-separated phone numbers |
| `linkedin_company` | String | LinkedIn company page |
| `twitter_url` | String | Twitter/X profile |
| `facebook_url`, `instagram_url`, `youtube_url` | String | Social profiles |
| `github_url`, `crunchbase_url`, `glassdoor_url` | String | Business profiles |
| `company_city`, `company_country` | String | Location |
| `tech_stack` | String | Detected technologies |
| `analytics_tools` | String | Detected analytics platforms |
| `company_maturity_score` | Integer (0-100) | Business maturity index |
| `is_saas` | Boolean | SaaS company detection |
| `employee_count_estimate` | String | Estimated employee count |
| `estimated_revenue_m` | Number | Revenue estimate (millions USD) from SERP intel |
| `funding_amount_m` | Number | Funding amount (millions USD) from SERP intel |
| `funding_stage` | String | Funding stage (Seed, Series A-F) |
| `has_mx`, `has_spf`, `has_dkim`, `has_dmarc` | Boolean | DNS validation |
| `domain_score` | Integer (0-100) | Domain trust score |
| `website_quality_score` | Integer (0-100) | Website quality index |
| `pages_crawled` | Integer | Pages successfully scraped |
| `enrichment_status` | String | `done`, `cached`, `failed`, `error` |

---

### Pricing

| Tier | Actor Fee | Results Per Run | Best For |
|------|-----------|----------------|----------|
| **Free** | $0 | Up to 20 | Testing the pipeline |
| **Pay-Per-Event** | $2 per 1,000 results | Unlimited | Production lead generation |

Apify platform compute charges (CPU, memory, proxy) are billed separately per your [Apify subscription](https://apify.com/pricing).

#### Cost Comparison vs Alternatives

| Solution | 1,000 Leads | 10,000 Leads | 100,000 Leads |
|----------|-------------|--------------|---------------|
| **This Actor** | ~$3 | ~$30 | ~$250 |
| Apollo.io | $49/mo (limited) | $99-399/mo | Custom pricing |
| ZoomInfo | $250+/mo | $500+/mo | $1,000+/mo |
| Lusha | $49/mo (limited) | $199/mo | Custom pricing |
| Hunter.io | $49/mo (500 lookups) | $199/mo | Custom pricing |

*Actor costs include both per-event fees and estimated Apify platform charges. All stages enabled, residential proxy.*

#### Cost Estimation by Batch Size

| Scenario | Companies | Actor Fee | Est. Platform | Total |
|----------|-----------|-----------|--------------|-------|
| Quick test | 20 | $0 (free) | ~$0.05 | ~$0.05 |
| Small batch | 100 | $0.16 | ~$0.15 | ~$0.31 |
| Medium batch | 500 | $0.96 | ~$0.50 | ~$1.46 |
| Large batch | 1,000 | $1.96 | ~$1.00 | ~$2.96 |
| Enterprise | 10,000 | $19.96 | ~$10 | ~$30 |
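
The Actor Fee column appears consistent with a simple rule: the first 20 companies per run are free (see `maxResults`), and each company beyond that costs $2 per 1,000. For example, (1,000 - 20) * $2 / 1,000 = $1.96. The helper below reproduces that inferred arithmetic; it estimates the per-event fee only, not platform charges, and `actor_fee_usd` is a hypothetical name.

```python
def actor_fee_usd(companies: int, free_per_run: int = 20, price_per_1000: float = 2.0) -> float:
    """Estimated per-event fee: companies beyond the free allowance at $2 per 1,000."""
    billable = max(0, companies - free_per_run)
    return round(billable * price_per_1000 / 1000, 2)
```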

---

### Usage Examples

#### Quick Start (Apify Console)

1. Open the Actor page and click **Start**
2. Upload a CSV with a `company_name` column (and optionally `website`)
3. Set `maxResults` to the number of companies to process
4. Click **Start** -- watch progress: "Stage 4/24: Enriching 45/100 companies..."
5. Download from **Dataset** tab (JSON/CSV/Excel) or **KeyValueStore** (multi-sheet Excel, JSON Lines)

#### Inline JSON Input

```json
{
    "companies": [
        {"company_name": "Stripe", "website": "https://stripe.com"},
        {"company_name": "Notion", "website": "https://notion.so"},
        {"company_name": "Linear"},
        {"company_name": "Vercel"},
        {"company_name": "Figma"}
    ],
    "maxResults": 20,
    "workers": 16,
    "maxContactsPerCompany": 15,
    "exportJsonLines": true
}
````

#### With Quality Gate & Webhook

```json
{
    "inputUrl": "https://example.com/target-companies.csv",
    "maxResults": 500,
    "workers": 16,
    "minLeadScore": 50,
    "minConfidenceScore": 40,
    "webhookUrl": "https://hooks.zapier.com/hooks/catch/123456/abcdef/",
    "webhookSendFullResults": true,
    "exportJsonLines": true,
    "proxyConfiguration": {"useApifyProxy": true}
}
```

#### Incremental Mode (Repeat Runs)

```json
{
    "inputUrl": "https://example.com/same-companies.csv",
    "maxResults": 1000,
    "incrementalMode": true,
    "incrementalFreshnessDays": 14,
    "proxyConfiguration": {"useApifyProxy": true}
}
```

#### Python API

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run_input = {
    "inputUrl": "https://example.com/target-companies.csv",
    "maxResults": 500,
    "workers": 16,
    "maxContactsPerCompany": 15,
    "minLeadScore": 50,
    "webhookUrl": "https://your-crm.com/webhook",
    "proxyConfiguration": {"useApifyProxy": True},
    "exportJsonLines": True,  # needed for the output.jsonl record below
}

run = client.actor("painless_tweet/leadslogix-pipeline").call(run_input=run_input)

# Get TIER_1 decision makers
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if item.get("is_decision_maker") and item.get("b2b_tier") == "TIER_1_SEND":
        print(f"{item['company_name']} | {item['contact_name']} | "
              f"{item['contact_email']} | {item['combined_priority']}")

# Download Excel (get_record returns None if the key is missing)
kv = client.key_value_store(run["defaultKeyValueStoreId"])
xlsx = kv.get_record("output.xlsx")
if xlsx is not None:
    with open("leads.xlsx", "wb") as f:
        f.write(xlsx["value"])

# Download JSON Lines (the value may arrive as str or bytes depending on content type)
jsonl = kv.get_record("output.jsonl")
if jsonl is not None:
    data = jsonl["value"]
    with open("leads.jsonl", "wb") as f:
        f.write(data.encode("utf-8") if isinstance(data, str) else data)
```

#### JavaScript API

```javascript
import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const run = await client.actor("painless_tweet/leadslogix-pipeline").call({
    companies: [
        { company_name: "Datadog", website: "https://datadoghq.com" },
        { company_name: "Cloudflare", website: "https://cloudflare.com" },
        { company_name: "Twilio", website: "https://twilio.com" },
    ],
    maxResults: 50,
    workers: 16,
    minLeadScore: 50,
    exportJsonLines: true,
    proxyConfiguration: { useApifyProxy: true },
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();

const tier1DecisionMakers = items.filter(
    (i) => i.is_decision_maker && i.b2b_tier === "TIER_1_SEND"
);
console.log(`Found ${tier1DecisionMakers.length} verified decision makers`);

for (const lead of tier1DecisionMakers) {
    console.log(`${lead.company_name} | ${lead.contact_name} | ${lead.contact_email} | Score: ${lead.combined_priority}`);
}
```

#### cURL

```bash
# Start a run
curl -X POST "https://api.apify.com/v2/acts/painless_tweet~leadslogix-pipeline/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "companies": [
      {"company_name": "Figma", "website": "https://figma.com"},
      {"company_name": "Canva", "website": "https://canva.com"}
    ],
    "maxResults": 20,
    "workers": 16,
    "webhookUrl": "https://your-endpoint.com/webhook"
  }'
```

#### Webhook Payload Example

When the pipeline completes, your webhook receives:

```json
{
    "event": "pipeline_complete",
    "pipeline_version": "v7.0",
    "timestamp": "2026-05-19T12:30:00.000Z",
    "summary": {
        "total_companies": 100,
        "total_contacts": 450,
        "high_priority": 85,
        "decision_makers": 120,
        "emails_found": 380,
        "verified_emails": 310
    },
    "audit": {
        "total_companies": 100,
        "elapsed_seconds": 1200,
        "pipeline_version": "v7.0 (24-stage intelligence engine)"
    }
}
```
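
A minimal receiver sketch using only the Python standard library, assuming the summary payload shape shown above. The `summarize` helper, the `serve` function, and port 8080 are illustrative; a production endpoint should also authenticate requests and respond quickly.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def summarize(payload) -> str:
    """Turn a pipeline_complete payload into a one-line log message."""
    if not isinstance(payload, dict) or payload.get("event") != "pipeline_complete":
        return "ignored payload"
    s = payload.get("summary", {})
    return (f"{s.get('total_companies', 0)} companies, "
            f"{s.get('total_contacts', 0)} contacts, "
            f"{s.get('verified_emails', 0)} verified emails")

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # covers JSONDecodeError and bad encodings
            self.send_response(400)
            self.end_headers()
            return
        print(summarize(payload))
        self.send_response(200)  # acknowledge receipt
        self.end_headers()

def serve(port: int = 8080) -> None:
    """Block forever, handling webhook POSTs one at a time."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Call `serve()` on a publicly reachable host and set `webhookUrl` to its address.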

#### Scheduled Lead Generation

Automate recurring prospecting:

1. Go to the actor page and click **Schedules**
2. Create a schedule (e.g., `0 8 * * 1` for every Monday at 8 AM)
3. Point the input to a URL that updates with new target companies
4. Enable `incrementalMode` to skip previously-enriched companies
5. Set a `webhookUrl` to receive results in your CRM automatically

***

### Integrations

| Platform | Integration Method |
|----------|-------------------|
| **Google Sheets** | Auto-sync via [Apify Google Sheets integration](https://apify.com/integrations/google-sheets) |
| **HubSpot** | Import CRM-ready CSV, or use webhook for real-time sync |
| **Salesforce** | Import enriched contacts via CSV, or connect via Zapier |
| **Pipedrive** | CSV import or webhook integration |
| **Lemlist / Instantly / Smartlead** | Export TIER\_1 emails as CSV for cold email campaigns |
| **Apollo / Outreach / SalesLoft** | Import as prospect sequence |
| **Zapier / Make** | Connect to 5,000+ apps via [Apify Zapier integration](https://apify.com/integrations/zapier) |
| **Webhooks** | Direct HTTP POST to any endpoint on pipeline completion |
| **BigQuery / Snowflake** | Ingest JSON Lines (.jsonl) output for data warehouse |
| **Custom API** | Full Apify REST API for programmatic integration and scheduling |

***

### Performance Benchmarks

| Metric | Typical Result |
|--------|---------------|
| **Companies per hour** | 100-200 (all stages, residential proxy) |
| **Contacts per company** | 3-15 (varies by company size and web presence) |
| **Email discovery rate** | 60-80% of companies yield at least one email |
| **Decision maker rate** | 30-50% of contacts are flagged as decision makers |
| **TIER\_1 email rate** | 40-60% of verified emails are TIER\_1\_SEND |
| **Cache hit rate** | 30-70% on repeat runs (incremental mode) |
| **Adaptive scaling** | 4-32 workers, responds within 10s to rate changes |

#### Estimated Run Times

| Companies | All Stages | Skip Google+Social | Discovery Only |
|-----------|-----------|-------------------|---------------|
| 20 | 3-5 min | 2-3 min | 1-2 min |
| 100 | 15-25 min | 10-15 min | 5-8 min |
| 500 | 1-2 hours | 40-70 min | 20-30 min |
| 1,000 | 3-5 hours | 2-3 hours | 45-60 min |
| 10,000 | 24-48 hours | 16-30 hours | 6-10 hours |

***

### Troubleshooting

#### Common Issues

**Low contact extraction rate**

- Enable all extraction stages (leave `skipDeepContactExtract`, `skipHiddenContactExtract`, and `skipFileIntelligence` set to `false`)
- Use residential proxies -- datacenter proxies get blocked by many corporate sites
- Companies with simple brochure websites may genuinely have few published contacts

**0 contacts for a company with a known team page**

- The website may use heavy JavaScript rendering. The pipeline retries with browser mode, but some SPAs require stealth mode.
- Check if the website blocks headless browsers (Cloudflare, Akamai). Residential proxy usually bypasses this.
- Non-English websites may use different page structure patterns.

**Low email verification scores**

- Catch-all domains (accept any address) reduce confidence scores. This is expected behavior.
- Role addresses (info@, sales@) score lower than personal addresses. Filter by `email_type` if needed.
- Some email providers have aggressive rate limiting. The circuit breaker prevents excessive retries.

**Run times are slow**

- Reduce workers if you see high failure rates (adaptive concurrency will also do this automatically)
- Skip Google Boost and Social Enrichment (`skipGoogleBoost`, `skipSocialEnrichment`) for ~40% faster runs
- Use incremental mode for repeat runs to skip cached companies
- Consider splitting very large lists (10K+) into batches

**Webhook not receiving data**

- Verify your endpoint accepts POST with JSON content-type
- Check the `webhook_dead_letter` key in KeyValueStore for failed delivery details
- The webhook retries 3x with exponential backoff (5s, 25s, 125s) before giving up
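
The listed delays (5s, 25s, 125s) fit exponential backoff with a 5-second base and a multiplier of 5. A sketch of that policy, assuming one initial attempt plus three retries and a hypothetical `send` callable that raises on failure; it illustrates the schedule, not the Actor's dispatcher:

```python
import time
from typing import Callable, Optional

def backoff_delays(retries: int = 3, base: float = 5.0, factor: float = 5.0) -> list[float]:
    """Delays before each retry: base, base*factor, base*factor**2, ..."""
    return [base * factor ** i for i in range(retries)]

def deliver_with_retry(
    send: Callable[[], None],
    retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Try send() once, then retry after each backoff delay.

    Returns None on success, or the last error message (what you would
    store in a dead-letter record) once all retries are exhausted.
    """
    last_error = ""
    for delay in [0.0] + backoff_delays(retries):
        if delay:
            sleep(delay)
        try:
            send()
            return None
        except Exception as exc:  # any delivery failure triggers the next retry
            last_error = str(exc)
    return last_error
```

Injecting `sleep` keeps the policy testable without real waits; the returned error string corresponds to the kind of detail the `webhook_dead_letter` key holds.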

**Memory errors on large runs**

- The pipeline auto-scales down browser tabs at 75% RAM usage
- For 5,000+ companies, use 8-16 workers (not 32) to manage memory
- Skip file intelligence (PDF parsing) to reduce memory pressure

***

### Limitations

- **Email verification is DNS-based, not SMTP-based.** It confirms the domain accepts mail but does not verify individual mailbox existence. For maximum accuracy on cold campaigns, run TIER\_2 emails through an additional SMTP verification service.
- **Websites behind login walls** or with aggressive anti-bot measures may return limited contacts.
- **Non-English websites** (Korean, Chinese, Japanese, Arabic) may have lower extraction rates due to different page structures and email conventions.
- **LinkedIn discovery** uses search engines, not direct LinkedIn scraping. Results depend on LinkedIn profile visibility in search engine indexes.
- **SERP intelligence** (revenue, funding) is extracted from search snippets with regex and may not be available or accurate for all companies.
- **PDF mining** requires the PyMuPDF library. Complex or scanned PDFs may not parse correctly.
- **Social enrichment** depends on DuckDuckGo availability. Rate limiting during heavy usage may reduce discovery rates.

***

### Roadmap

- \[ ] SMTP-level mailbox verification (in addition to DNS-based)
- \[ ] LinkedIn profile page parsing for richer contact data
- \[ ] HubSpot and Salesforce native API integration
- \[ ] Google Sheets direct push (no Zapier required)
- \[ ] AI-powered contact relevance scoring using LLMs
- \[ ] Company news monitoring and trigger events
- \[ ] Multi-language extraction optimization (CJK, Arabic, Cyrillic)
- \[ ] Real-time progress webhooks (per-stage, not just completion)

***

### Frequently Asked Questions

**How is this different from Apollo, ZoomInfo, or Lusha?**
Those tools maintain a pre-built database of contacts. This tool scrapes company websites and search engines in real time, finding contacts that static databases miss -- especially at small/mid-size companies, international firms, and recently-hired executives. It's also 10-50x cheaper per lead.

**Do I need API keys?**
No. This tool uses public web data, DNS records, and search engines. No paid API subscriptions required.

**What input formats are supported?**
CSV, Excel (.xlsx, .xls), and inline JSON. Upload directly, provide a URL, or pass data via the API.

**How does incremental mode work?**
When enabled, the pipeline checks its cross-run cache for each company. If a company was successfully enriched within the freshness window (`incrementalFreshnessDays`, default 7), it is skipped. This cuts runtime by ~70% on repeat runs with the same company list.
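
The freshness test reduces to a date comparison. A sketch, assuming the cache stores a timezone-aware timestamp of each company's last successful enrichment (the real cache format is not documented):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def should_skip(last_enriched: Optional[datetime], freshness_days: int = 7,
                now: Optional[datetime] = None) -> bool:
    """True if the company was enriched within the freshness window."""
    if last_enriched is None:  # cache miss: the company must be processed
        return False
    now = now or datetime.now(timezone.utc)
    return now - last_enriched < timedelta(days=freshness_days)
```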

**How does the quality gate work?**
Set `minLeadScore` and/or `minConfidenceScore` to filter contacts below your threshold. Contacts that don't pass are excluded from the export but tracked in pipeline metrics. Set both to 0 to export everything.
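
A contact is exported only if it clears both thresholds. A sketch of that filter, using the documented `combined_priority` field and a hypothetical `confidence_score` field name for the confidence value:

```python
def apply_quality_gate(contacts: list[dict], min_lead_score: int = 0,
                       min_confidence: int = 0) -> tuple[list[dict], list[dict]]:
    """Split contacts into (passed, rejected); a threshold of 0 passes everything."""
    passed, rejected = [], []
    for c in contacts:
        ok = (c.get("combined_priority", 0) >= min_lead_score
              and c.get("confidence_score", 0) >= min_confidence)  # hypothetical field
        # Tag each record, then route it; rejected ones feed metrics only
        (passed if ok else rejected).append({**c, "passed_quality_gate": ok})
    return passed, rejected
```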

**What does the webhook send?**
By default, a summary payload with totals (companies, contacts, decision makers, verified emails). Enable `webhookSendFullResults` to include the full dataset in the POST body.

**How accurate is the email verification?**
TIER\_1\_SEND emails typically have <5% bounce rate in production cold email campaigns. The pipeline checks MX, SPF, DKIM, DMARC, catch-all, disposable, and role addresses. It does not perform SMTP-level mailbox verification.

**Can I use this for a single company?**
Yes. Use inline JSON with one company and `maxResults: 1`. The API supports synchronous runs.
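
For a synchronous single-company run, the `run-sync-get-dataset-items` endpoint from the OpenAPI specification below returns the dataset items in the HTTP response. A standard-library sketch; `build_sync_request` and `run_single_company` are illustrative names, and sync endpoints only wait a limited time, so keep such runs small:

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "painless_tweet~leadslogix-pipeline"  # "/" in the Actor ID becomes "~" in URLs

def build_sync_request(token: str, company_name: str, website: str = "") -> urllib.request.Request:
    """Build a POST to run-sync-get-dataset-items for a single company."""
    company = {"company_name": company_name}
    if website:
        company["website"] = website
    body = json.dumps({"companies": [company], "maxResults": 1}).encode("utf-8")
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )

def run_single_company(token: str, company_name: str, website: str = "") -> list:
    """Block until the run finishes and return its dataset items."""
    req = build_sync_request(token, company_name, website)
    with urllib.request.urlopen(req, timeout=360) as resp:
        return json.load(resp)
```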

**Does this work for non-English companies?**
Yes, but extraction rates are typically 30-50% lower for CJK (Chinese/Japanese/Korean) and Arabic websites due to different page structures and email conventions.

**What proxy should I use?**
Residential proxies give the best results. Datacenter proxies work for most sites but corporate sites may block them. Running without proxy is not recommended for batches over 20 companies.

**Can I skip stages to save time?**
Yes. Toggle any of the 15 skip parameters. Skipping Google Boost + Social Enrichment saves ~40% runtime. The pipeline automatically adjusts downstream stages.
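
For example, here is an input that applies the two skips named above and routes traffic through residential proxies. `fast_run_input` is a hypothetical helper; `apifyProxyGroups: ["RESIDENTIAL"]` is the standard Apify Proxy setting for the residential group and requires residential proxy access on your account:

```python
def fast_run_input(companies: list[dict], max_results: int) -> dict:
    """Build an input that trades social/search coverage for speed."""
    return {
        "companies": companies,
        "maxResults": max_results,
        # Per this README, skipping both stages saves ~40% runtime
        "skipGoogleBoost": True,
        "skipSocialEnrichment": True,
        # Residential proxies are recommended for corporate sites
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        },
    }
```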

**What's the maximum batch size?**
`maxResults` accepts up to 100,000 companies per run. For runs over 5,000 companies, we recommend 8-16 workers with residential proxy and incremental mode enabled.

***

### Changelog

#### v7.0 (2026-05-19)

- **Quality Gate Engine** (Stage 21): configurable lead score and confidence thresholds, data freshness tracking
- **Webhook Dispatcher** (Stage 24): HTTP POST with 3x exponential backoff, dead letter queue
- **Incremental Delta Mode**: skip recently-enriched companies with configurable freshness window
- **Fuzzy Dedup**: Levenshtein edit distance name matching in executive correlation
- **Cross-Source Preference**: crawled contacts preferred over predicted in merge conflicts
- **JSON Lines Export**: streaming .jsonl output for data pipeline ingestion
- New fields: `data_freshness`, `passed_quality_gate`
- 8 new input parameters for quality, webhook, incremental, and export control

#### v6.0 (2026-05-19)

- **SERP Intelligence**: revenue, funding, employee count, acquisition signals from search snippets
- **File Intelligence** (Stage 10): PDF mining for contacts, org charts, emails
- **Executive Correlation Engine** (Stage 15): cross-source contact dedup with unified profiles
- **Adaptive Concurrency**: dynamic 4-32 worker scaling
- **Shared Cache Layer**: cross-run KV store cache for DNS, domains, enrichment

#### v5.0 (2026-05-18)

- Semantic Page Detection, Search Expansion Matrix, Hidden Contact Extraction
- Company Intelligence Profile (tech stack, maturity scoring, SaaS detection)
- Structured Pipeline Metrics, Improved Proxy Routing

#### v4.0 (2026-05-17)

- Deep Contact Extraction (4-method second pass)
- Contact Intelligence Engine (decision maker mapping, persona classification)

#### v3.5 (2026-05-16)

- URL Intelligence, Sitemap-First Crawl, Smart Retry Escalation
- Stealth Fingerprinting, Adaptive Memory Scaling, Queue Segmentation

#### v2.0 (2026-05-15)

- Merged 9 actors into unified 12-stage pipeline
- 4-method extraction, 5-layer email discovery, social enrichment

#### v1.0 (2026-05-08)

- Initial release: 8-stage pipeline with Playwright enrichment

# Actor input Schema

## `inputFile` (type: `string`):

Upload a CSV or Excel file with company names and/or websites. Auto-detects column names (30+ aliases supported).
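
Column auto-detection can be pictured as header normalization plus an alias lookup. A CSV-only sketch; the alias lists below are hypothetical stand-ins for the Actor's 30+ aliases, and Excel input would need a library such as openpyxl:

```python
import csv
import io

# Hypothetical alias lists, not the Actor's actual ones
ALIASES = {
    "company_name": {"company_name", "company", "companyname", "name", "organization", "business_name"},
    "website": {"website", "url", "domain", "web", "homepage", "site"},
}

def normalize(header: str) -> str:
    """'Company Name' -> 'company_name'."""
    return header.strip().lower().replace(" ", "_").replace("-", "_")

def detect_columns(headers: list[str]) -> dict[str, str]:
    """Map canonical field -> original header; the first match wins."""
    found: dict[str, str] = {}
    for h in headers:
        key = normalize(h)
        for field, aliases in ALIASES.items():
            if field not in found and key in aliases:
                found[field] = h
    return found

def read_companies(csv_text: str) -> list[dict]:
    reader = csv.DictReader(io.StringIO(csv_text))
    cols = detect_columns(reader.fieldnames or [])
    if not cols:
        raise ValueError("no company name or website column found")
    rows = []
    for row in reader:
        rec = {field: (row.get(h) or "").strip() for field, h in cols.items()}
        if any(rec.values()):  # skip fully empty rows
            rows.append(rec)
    return rows
```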

## `inputUrl` (type: `string`):

URL to a publicly accessible CSV or Excel file.

## `companies` (type: `array`):

Provide companies as JSON array.

## `maxResults` (type: `integer`):

Maximum companies to process. Free tier: 20 results per run. Beyond that: $2 per 1,000 results (pay-per-event). Platform compute charges billed separately by Apify.

## `workers` (type: `integer`):

Initial number of concurrent async workers. Adaptive concurrency auto-scales between 4-32 based on success rate and response times. Uses HTTP-first with shared browser pool (2 browsers, 6 tabs).
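
The scaling rule itself is not documented. One common policy for this kind of controller is additive-increase/multiplicative-decrease (AIMD); a sketch clamped to the documented 4-32 range, with assumed thresholds for unhealthy success rates and response times:

```python
from dataclasses import dataclass

@dataclass
class AdaptiveConcurrency:
    """AIMD-style worker controller clamped to 4-32 (thresholds are assumptions)."""
    workers: int = 16
    min_workers: int = 4
    max_workers: int = 32
    target_success: float = 0.9   # assumed: below this success rate, back off
    slow_ms: float = 5000.0       # assumed: above this average latency, back off

    def __post_init__(self) -> None:
        # Clamp the user-supplied starting value into the allowed range
        self.workers = max(self.min_workers, min(self.max_workers, self.workers))

    def update(self, success_rate: float, avg_response_ms: float) -> int:
        """Feed one measurement window; returns the new worker count."""
        if success_rate < self.target_success or avg_response_ms > self.slow_ms:
            # Multiplicative decrease: halve quickly when sites push back
            self.workers = max(self.min_workers, self.workers // 2)
        else:
            # Additive increase: probe for more throughput one worker at a time
            self.workers = min(self.max_workers, self.workers + 1)
        return self.workers
```

Halving on trouble and growing by one on success reacts fast to blocking while recovering throughput cautiously.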

## `maxContactsPerCompany` (type: `integer`):

Cap on contacts per company. Decision makers and high-scored contacts prioritized.

## `incrementalMode` (type: `boolean`):

Skip companies enriched within the freshness window (default 7 days). Uses cross-run cache to avoid re-processing. Saves ~70% time on repeat runs with the same company list.

## `incrementalFreshnessDays` (type: `integer`):

How many days before a cached company is considered stale and re-enriched. Only applies when Incremental Mode is enabled.

## `minLeadScore` (type: `integer`):

Quality gate: contacts below this combined\_priority score are filtered out before export. Set to 0 to export all contacts.

## `minConfidenceScore` (type: `integer`):

Quality gate: contacts below this confidence score are filtered out before export. Set to 0 to export all contacts.

## `webhookUrl` (type: `string`):

HTTP endpoint to receive pipeline results on completion. Sends POST with JSON payload containing summary stats and optional full results. Retries 3x with exponential backoff on failure.

## `webhookSendFullResults` (type: `boolean`):

Include complete contact/company data in webhook payload. When disabled, only summary statistics are sent (smaller payload, faster delivery).

## `exportJsonLines` (type: `boolean`):

Also export results as JSON Lines (.jsonl) in KeyValueStore. One JSON object per line — ideal for streaming ingestion, BigQuery, or processing large datasets.
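
JSON Lines files can be processed one record at a time without loading the whole export into memory. A sketch, assuming each line carries the same fields as the dataset items used in the API examples (`is_decision_maker`, `b2b_tier`):

```python
import json
from typing import Iterator, TextIO

def iter_tier1(lines: TextIO) -> Iterator[dict]:
    """Yield TIER_1_SEND decision makers from a JSON Lines stream, one at a time."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        if record.get("is_decision_maker") and record.get("b2b_tier") == "TIER_1_SEND":
            yield record
```

Pass any text stream, such as a file opened with `open("leads.jsonl", encoding="utf-8")`.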

## `skipGoogleBoost` (type: `boolean`):

Skip the 8-step Google Discovery Boost (domain recovery, email/social discovery via search engines, DNS validation).

## `skipSocialEnrichment` (type: `boolean`):

Skip the 8-platform social profile discovery stage (LinkedIn, Twitter, Facebook, Instagram, YouTube, GitHub, Crunchbase, Glassdoor).

## `skipLinkedInDiscovery` (type: `boolean`):

Skip the LinkedIn employee discovery stage. This stage uses multi-query search (DuckDuckGo + Bing fallback) with role-based variations (CEO, CTO, VP, Director, Manager, etc.) to discover decision makers on LinkedIn. Significantly increases contact yield for larger companies.

## `skipSemanticPageDetect` (type: `boolean`):

Skip the semantic page detection stage. This stage analyzes crawled HTML to classify pages as leadership, speaker, board, investor relations, careers, or partner pages.

## `skipSearchExpansion` (type: `boolean`):

Skip the search expansion matrix and SERP intelligence stage. Generates 8-category search queries (employee, executive, email, PDF, hiring, press, conference, investor) and extracts revenue estimates, funding signals, company size, and acquisition news from search snippets.

## `skipFileIntelligence` (type: `boolean`):

Skip the file intelligence stage. Downloads and parses PDFs and documents found during search expansion and crawling. Extracts contacts, org chart data, and emails from document content invisible to HTML-based extraction.

## `skipDeepContactExtract` (type: `boolean`):

Skip the deep 4-method contact re-extraction stage. Runs JSON-LD, team cards, heuristic, and LinkedIn link extraction on all crawled HTML, re-crawls for companies with 0 contacts.
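
JSON-LD is the most structured of the four methods. A standard-library sketch that pulls schema.org `Person` entries (`name`, `jobTitle`, `email`) out of `<script type="application/ld+json">` blocks, including people nested under an `Organization`; it illustrates the technique, not the Actor's extractor:

```python
import json
from html.parser import HTMLParser

class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""
    def __init__(self):
        super().__init__()
        self.blocks: list[str] = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and (dict(attrs).get("type") or "").lower() == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data  # data may arrive in several chunks

def _walk(node):
    """Yield every dict in a JSON-LD tree (covers @graph and nested arrays)."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from _walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from _walk(item)

def extract_people(html: str) -> list[dict]:
    parser = JsonLdCollector()
    parser.feed(html)
    people = []
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is common in the wild; skip it
        for node in _walk(data):
            types = node.get("@type")
            types = types if isinstance(types, list) else [types]
            if "Person" in types and node.get("name"):
                people.append({"name": node["name"], "title": node.get("jobTitle", ""),
                               "email": node.get("email", "")})
    return people
```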

## `skipHiddenContactExtract` (type: `boolean`):

Skip the hidden contact extraction stage. Parses JS state stores (`__NEXT_DATA__`, `__NUXT__`, `__INITIAL_STATE__`), inline JSON-LD arrays, and hydration payloads for contact data not visible in rendered HTML.
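
Next.js pages serialize their initial state into a `<script id="__NEXT_DATA__" type="application/json">` tag. A sketch that parses that tag and walks the JSON for person-like objects; the "name plus email, title, or role" heuristic is an assumption, and `__NUXT__` state, which is often JavaScript rather than pure JSON, is not handled:

```python
import json
import re

NEXT_DATA_RE = re.compile(
    r'<script[^>]*id="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL | re.IGNORECASE
)

def find_contact_like(node, out: list) -> list:
    """Collect dicts that look like people: a name plus an email, title, or role."""
    if isinstance(node, dict):
        if node.get("name") and (node.get("email") or node.get("title") or node.get("role")):
            out.append({k: node.get(k, "") for k in ("name", "title", "role", "email")})
        for value in node.values():
            find_contact_like(value, out)
    elif isinstance(node, list):
        for item in node:
            find_contact_like(item, out)
    return out

def contacts_from_next_data(html: str) -> list[dict]:
    match = NEXT_DATA_RE.search(html)
    if not match:
        return []  # not a Next.js page, or state not embedded
    try:
        state = json.loads(match.group(1))
    except json.JSONDecodeError:
        return []
    return find_contact_like(state, [])
```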

## `skipContactIntelligence` (type: `boolean`):

Skip the contact intelligence engine. Applies decision maker mapping (seniority 0-5, persona classification), target title matching, authority scoring, and circuit breaker intelligence learning.
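
Here is a sketch of a keyword-based seniority map on the documented 0-5 scale. The tiers, patterns, and the decision-maker threshold of 3 are assumptions for illustration:

```python
import re

# Hypothetical tiers on the 0-5 scale; the highest tier is checked first
SENIORITY_RULES = [
    (5, r"\b(ceo|founder|(?<!vice )president|chief executive)\b"),
    (4, r"\b(c[a-z]o|chief)\b"),
    (3, r"\b(vp|svp|evp|vice president|head of)\b"),
    (2, r"\bdirector\b"),
    (1, r"\b(manager|lead)\b"),
]

def seniority(title: str) -> int:
    """Return 0-5 for a job title; the first (highest) matching tier wins."""
    t = title.lower()
    for level, pattern in SENIORITY_RULES:
        if re.search(pattern, t):
            return level
    return 0

def is_decision_maker(title: str, threshold: int = 3) -> bool:
    """Assumed cut-off: VP level and above count as decision makers."""
    return seniority(title) >= threshold
```

The negative lookbehind keeps "Vice President" from matching the top-tier "president" rule.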

## `skipCompanyIntel` (type: `boolean`):

Skip the company intelligence profile stage. Builds tech stack fingerprinting (18+ frameworks), analytics tools detection, employee count estimation, SaaS signals, hiring velocity, and company maturity scoring (0-100).

## `skipExecutiveCorrelation` (type: `boolean`):

Skip the executive correlation engine. Cross-references contacts from all extraction methods (crawl, deep extract, hidden, LinkedIn, search, file intel), merges duplicates by email/name/LinkedIn, and builds unified profiles with composite confidence scores.
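
Name matching is where fuzzy dedup (the Levenshtein edit distance matching listed in the v7.0 changelog) comes in. A sketch, assuming exact email or LinkedIn URL matches take priority, a hypothetical `linkedin_url` field, and a fixed distance threshold of 2; unlike the engine described above, it drops duplicates rather than merging their fields:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def normalize_name(name: str) -> str:
    """Lowercase, treat dots as spaces, collapse whitespace."""
    return " ".join(name.lower().replace(".", " ").split())

def is_same_person(a: dict, b: dict, max_distance: int = 2) -> bool:
    """Exact email/LinkedIn match wins; otherwise fall back to fuzzy name match."""
    for key in ("email", "linkedin_url"):  # linkedin_url is a hypothetical field
        if a.get(key) and a[key].lower() == (b.get(key) or "").lower():
            return True
    na = normalize_name(a.get("name") or "")
    nb = normalize_name(b.get("name") or "")
    if not na or not nb:
        return False  # never merge on a missing name
    return levenshtein(na, nb) <= max_distance

def dedup(contacts: list[dict]) -> list[dict]:
    """Keep the first contact of each duplicate group; input order encodes source preference."""
    kept: list[dict] = []
    for c in contacts:
        if not any(is_same_person(c, k) for k in kept):
            kept.append(c)
    return kept
```

Ordering the input so crawled contacts come before predicted ones mirrors the cross-source preference in the changelog. A fixed threshold can over-merge short names ("Jon Li" and "Jan Lu" are distance 2), so scaling it with name length is a reasonable refinement.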

## `skipEmailDiscovery` (type: `boolean`):

Skip the 5-layer email discovery stage (DNS/OSINT, website crawl, search engine, PDF, social).

## `skipEmailPrediction` (type: `boolean`):

Skip the email prediction stage (8-pattern generation for contacts missing emails).
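
Here is a sketch of pattern-based prediction. The eight patterns below are common corporate address formats chosen for illustration, not the Actor's documented list; predicted addresses still need the verification stage:

```python
import re
import unicodedata

def _clean(part: str) -> str:
    """Lowercase and drop accents and non-letters, e.g. "José" -> "jose"."""
    ascii_part = unicodedata.normalize("NFKD", part).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z]", "", ascii_part.lower())

def predict_emails(first: str, last: str, domain: str) -> list[str]:
    """Return eight candidate addresses, most common formats first (assumed order)."""
    fn, ln = _clean(first), _clean(last)
    if not fn or not ln:
        return []  # cannot build patterns without both name parts
    local_parts = [
        f"{fn}.{ln}",     # jane.doe
        f"{fn}{ln}",      # janedoe
        f"{fn[0]}{ln}",   # jdoe
        f"{fn[0]}.{ln}",  # j.doe
        fn,               # jane
        f"{fn}_{ln}",     # jane_doe
        f"{ln}.{fn}",     # doe.jane
        f"{fn}{ln[0]}",   # janed
    ]
    return [f"{lp}@{domain.strip().lower()}" for lp in local_parts]
```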

## `skipVerification` (type: `boolean`):

Skip the 6-check email verification stage (MX, SPF, DKIM, DMARC, role detection, disposable check).
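
Because verification is DNS-based, most checks are lookups on records you can fetch once per domain. A sketch that evaluates already-fetched answers (MX hosts, TXT records for the domain, and TXT records for `_dmarc.<domain>`) so it runs offline; fetching them needs a DNS library such as dnspython. The role and disposable lists are tiny hypothetical samples, and DKIM is omitted because its lookup needs a selector (`<selector>._domainkey.<domain>`):

```python
# Tiny hypothetical samples; real lists are much longer
ROLE_LOCAL_PARTS = {"info", "sales", "support", "contact", "admin", "hello", "office"}
DISPOSABLE_DOMAINS = {"mailinator.com", "guerrillamail.com", "10minutemail.com"}

def has_spf(txt_records: list[str]) -> bool:
    """SPF policies are TXT records on the domain starting with "v=spf1"."""
    return any(r.strip().lower().startswith("v=spf1") for r in txt_records)

def has_dmarc(dmarc_txt_records: list[str]) -> bool:
    """DMARC policies are TXT records on _dmarc.<domain> starting with "v=DMARC1"."""
    return any(r.strip().lower().startswith("v=dmarc1") for r in dmarc_txt_records)

def offline_checks(email: str, mx_hosts: list[str], txt_records: list[str],
                   dmarc_records: list[str]) -> dict[str, bool]:
    local, _, domain = email.strip().lower().partition("@")
    return {
        "mx": bool(mx_hosts),               # domain can receive mail at all
        "spf": has_spf(txt_records),
        "dmarc": has_dmarc(dmarc_records),
        "role": local in ROLE_LOCAL_PARTS,  # shared inbox rather than a person
        "disposable": domain in DISPOSABLE_DOMAINS,
    }
```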

## `skipQualityGate` (type: `boolean`):

Skip the quality gate stage. When skipped, all contacts are exported regardless of lead score or confidence. Quality gate filtering uses minLeadScore and minConfidenceScore thresholds.

## `proxyConfiguration` (type: `object`):

Apify Proxy for all scraping operations. Residential proxies strongly recommended for best results and avoiding blocks.

## Actor input object example

```json
{
  "companies": [
    {
      "company_name": "IANA",
      "website": "https://iana.org"
    },
    {
      "company_name": "ICANN",
      "website": "https://icann.org"
    }
  ],
  "maxResults": 20,
  "workers": 16,
  "maxContactsPerCompany": 20,
  "incrementalMode": false,
  "incrementalFreshnessDays": 7,
  "minLeadScore": 0,
  "minConfidenceScore": 0,
  "webhookSendFullResults": false,
  "exportJsonLines": false,
  "skipGoogleBoost": false,
  "skipSocialEnrichment": false,
  "skipLinkedInDiscovery": false,
  "skipSemanticPageDetect": false,
  "skipSearchExpansion": false,
  "skipFileIntelligence": false,
  "skipDeepContactExtract": false,
  "skipHiddenContactExtract": false,
  "skipContactIntelligence": false,
  "skipCompanyIntel": false,
  "skipExecutiveCorrelation": false,
  "skipEmailDiscovery": false,
  "skipEmailPrediction": false,
  "skipVerification": false,
  "skipQualityGate": false
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "companies": [
        {
            "company_name": "IANA",
            "website": "https://iana.org"
        },
        {
            "company_name": "ICANN",
            "website": "https://icann.org"
        }
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("painless_tweet/leadslogix-pipeline").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "companies": [
        {
            "company_name": "IANA",
            "website": "https://iana.org",
        },
        {
            "company_name": "ICANN",
            "website": "https://icann.org",
        },
    ] }

# Run the Actor and wait for it to finish
run = client.actor("painless_tweet/leadslogix-pipeline").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "companies": [
    {
      "company_name": "IANA",
      "website": "https://iana.org"
    },
    {
      "company_name": "ICANN",
      "website": "https://icann.org"
    }
  ]
}' |
apify call painless_tweet/leadslogix-pipeline --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=painless_tweet/leadslogix-pipeline",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "B2B Lead Scraper & Email Finder - Decision Makers",
        "description": "Upload a company list, get verified decision maker emails, phones, LinkedIn, and social profiles. 12-stage pipeline: website discovery, contact extraction, email finder, verification, social enrichment, lead scoring, and Excel export. For email marketing, cold outreach, and B2B prospecting.",
        "version": "7.0",
        "x-build-id": "ccMCQVE4bG5ASIfhE"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/painless_tweet~leadslogix-pipeline/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-painless_tweet-leadslogix-pipeline",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/painless_tweet~leadslogix-pipeline/runs": {
            "post": {
                "operationId": "runs-sync-painless_tweet-leadslogix-pipeline",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/painless_tweet~leadslogix-pipeline/run-sync": {
            "post": {
                "operationId": "run-sync-painless_tweet-leadslogix-pipeline",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "inputFile": {
                        "title": "Upload CSV/Excel File",
                        "type": "string",
                        "description": "Upload a CSV or Excel file with company names and/or websites. Auto-detects column names (30+ aliases supported)."
                    },
                    "inputUrl": {
                        "title": "CSV/Excel URL",
                        "type": "string",
                        "description": "URL to a publicly accessible CSV or Excel file."
                    },
                    "companies": {
                        "title": "Company List (inline)",
                        "type": "array",
                        "description": "Provide companies as JSON array.",
                        "items": {
                            "type": "object",
                            "properties": {
                                "company_name": {
                                    "title": "Company Name",
                                    "type": "string",
                                    "description": "Company name"
                                },
                                "website": {
                                    "title": "Website",
                                    "type": "string",
                                    "description": "Company website URL (optional — will be discovered if missing)"
                                }
                            }
                        }
                    },
                    "maxResults": {
                        "title": "Max Results",
                        "minimum": 1,
                        "maximum": 100000,
                        "type": "integer",
                        "description": "Maximum companies to process. Free tier: 20 results per run. Beyond that: $2 per 1,000 results (pay-per-event). Platform compute charges billed separately by Apify.",
                        "default": 20
                    },
                    "workers": {
                        "title": "Parallel Workers (Initial)",
                        "minimum": 1,
                        "maximum": 32,
                        "type": "integer",
                        "description": "Initial number of concurrent async workers. Adaptive concurrency auto-scales between 4-32 based on success rate and response times. Uses HTTP-first with shared browser pool (2 browsers, 6 tabs).",
                        "default": 16
                    },
                    "maxContactsPerCompany": {
                        "title": "Max Contacts Per Company",
                        "minimum": 1,
                        "maximum": 50,
                        "type": "integer",
                        "description": "Cap on contacts per company. Decision makers and high-scored contacts prioritized.",
                        "default": 20
                    },
                    "incrementalMode": {
                        "title": "Incremental Mode (Delta Processing)",
                        "type": "boolean",
                        "description": "Skip companies enriched within the freshness window (default 7 days). Uses cross-run cache to avoid re-processing. Saves ~70% time on repeat runs with the same company list.",
                        "default": false
                    },
                    "incrementalFreshnessDays": {
                        "title": "Incremental Freshness (Days)",
                        "minimum": 1,
                        "maximum": 90,
                        "type": "integer",
                        "description": "How many days before a cached company is considered stale and re-enriched. Only applies when Incremental Mode is enabled.",
                        "default": 7
                    },
                    "minLeadScore": {
                        "title": "Minimum Lead Score",
                        "minimum": 0,
                        "maximum": 100,
                        "type": "integer",
                        "description": "Quality gate: contacts below this combined_priority score are filtered out before export. Set to 0 to export all contacts.",
                        "default": 0
                    },
                    "minConfidenceScore": {
                        "title": "Minimum Confidence Score",
                        "minimum": 0,
                        "maximum": 100,
                        "type": "integer",
                        "description": "Quality gate: contacts below this confidence score are filtered out before export. Set to 0 to export all contacts.",
                        "default": 0
                    },
                    "webhookUrl": {
                        "title": "Webhook URL",
                        "type": "string",
                        "description": "HTTP endpoint that receives pipeline results on completion. The Actor sends a POST request with a JSON payload containing summary statistics and, optionally, the full results. Failed deliveries are retried up to 3 times with exponential backoff."
                    },
                    "webhookSendFullResults": {
                        "title": "Include Full Results in Webhook",
                        "type": "boolean",
                        "description": "Include complete contact/company data in webhook payload. When disabled, only summary statistics are sent (smaller payload, faster delivery).",
                        "default": false
                    },
                    "exportJsonLines": {
                        "title": "Export JSON Lines",
                        "type": "boolean",
                        "description": "Also export results as JSON Lines (.jsonl) to the key-value store, with one JSON object per line. Ideal for streaming ingestion, BigQuery, or processing large datasets.",
                        "default": false
                    },
                    "skipGoogleBoost": {
                        "title": "Skip Google Discovery Boost",
                        "type": "boolean",
                        "description": "Skip the 8-step Google Discovery Boost (domain recovery, email/social discovery via search engines, DNS validation).",
                        "default": false
                    },
                    "skipSocialEnrichment": {
                        "title": "Skip Social Enrichment",
                        "type": "boolean",
                        "description": "Skip the 8-platform social profile discovery stage (LinkedIn, Twitter, Facebook, Instagram, YouTube, GitHub, Crunchbase, Glassdoor).",
                        "default": false
                    },
                    "skipLinkedInDiscovery": {
                        "title": "Skip LinkedIn Employee Discovery",
                        "type": "boolean",
                        "description": "Skip the LinkedIn employee discovery stage. This stage runs multiple search queries (DuckDuckGo, with Bing as fallback) using role-based variations (CEO, CTO, VP, Director, Manager, etc.) to find decision makers on LinkedIn. Running it significantly increases contact yield for larger companies.",
                        "default": false
                    },
                    "skipSemanticPageDetect": {
                        "title": "Skip Semantic Page Detection",
                        "type": "boolean",
                        "description": "Skip the semantic page detection stage. This stage analyzes crawled HTML to classify pages as leadership, speaker, board, investor relations, careers, or partner pages.",
                        "default": false
                    },
                    "skipSearchExpansion": {
                        "title": "Skip Search Expansion + SERP Intelligence",
                        "type": "boolean",
                        "description": "Skip the search expansion matrix and SERP intelligence stage. This stage generates search queries across 8 categories (employee, executive, email, PDF, hiring, press, conference, investor) and extracts revenue estimates, funding signals, company size, and acquisition news from search snippets.",
                        "default": false
                    },
                    "skipFileIntelligence": {
                        "title": "Skip File Intelligence (PDF Mining)",
                        "type": "boolean",
                        "description": "Skip the file intelligence stage. Downloads and parses PDFs and documents found during search expansion and crawling. Extracts contacts, org chart data, and emails from document content invisible to HTML-based extraction.",
                        "default": false
                    },
                    "skipDeepContactExtract": {
                        "title": "Skip Deep Contact Extraction",
                        "type": "boolean",
                        "description": "Skip the deep 4-method contact re-extraction stage. This stage runs JSON-LD, team-card, heuristic, and LinkedIn-link extraction on all crawled HTML, and re-crawls companies that yielded 0 contacts.",
                        "default": false
                    },
                    "skipHiddenContactExtract": {
                        "title": "Skip Hidden Contact Extraction",
                        "type": "boolean",
                        "description": "Skip the hidden contact extraction stage. Parses JS state stores (__NEXT_DATA__, __NUXT__, __INITIAL_STATE__), inline JSON-LD arrays, and hydration payloads for contact data not visible in rendered HTML.",
                        "default": false
                    },
                    "skipContactIntelligence": {
                        "title": "Skip Contact Intelligence Engine",
                        "type": "boolean",
                        "description": "Skip the contact intelligence engine. Applies decision maker mapping (seniority 0-5, persona classification), target title matching, authority scoring, and circuit breaker intelligence learning.",
                        "default": false
                    },
                    "skipCompanyIntel": {
                        "title": "Skip Company Intelligence Profile",
                        "type": "boolean",
                        "description": "Skip the company intelligence profile stage. This stage builds a profile covering tech stack fingerprinting (18+ frameworks), analytics tool detection, employee count estimation, SaaS signals, hiring velocity, and company maturity scoring (0-100).",
                        "default": false
                    },
                    "skipExecutiveCorrelation": {
                        "title": "Skip Executive Correlation Engine",
                        "type": "boolean",
                        "description": "Skip the executive correlation engine. Cross-references contacts from all extraction methods (crawl, deep extract, hidden, LinkedIn, search, file intel), merges duplicates by email/name/LinkedIn, and builds unified profiles with composite confidence scores.",
                        "default": false
                    },
                    "skipEmailDiscovery": {
                        "title": "Skip Email Discovery",
                        "type": "boolean",
                        "description": "Skip the 5-layer email discovery stage (DNS/OSINT, website crawl, search engine, PDF, social).",
                        "default": false
                    },
                    "skipEmailPrediction": {
                        "title": "Skip Email Prediction",
                        "type": "boolean",
                        "description": "Skip the email prediction stage (8-pattern generation for contacts missing emails).",
                        "default": false
                    },
                    "skipVerification": {
                        "title": "Skip Email Verification",
                        "type": "boolean",
                        "description": "Skip the 6-check email verification stage (MX, SPF, DKIM, DMARC, role detection, disposable check).",
                        "default": false
                    },
                    "skipQualityGate": {
                        "title": "Skip Quality Gate",
                        "type": "boolean",
                        "description": "Skip the quality gate stage. When skipped, all contacts are exported regardless of lead score or confidence. Quality gate filtering uses minLeadScore and minConfidenceScore thresholds.",
                        "default": false
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Apify Proxy settings used for all scraping operations. Residential proxies are strongly recommended for the best results and to avoid blocks."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
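The integer bounds and defaults in the input schema above can be checked on the client before a run is started. The sketch below is a hypothetical helper, not part of the Actor or the Apify client: `validate_input` and `INTEGER_BOUNDS` are illustrative names, and the bounds are copied by hand from the schema, so they would need updating if the schema changes. The field for the maximum number of companies is left out because its property name is not shown in this excerpt.

```python
from typing import Any

# (minimum, maximum, default) copied by hand from the input schema above.
INTEGER_BOUNDS: dict[str, tuple[int, int, int]] = {
    "workers": (1, 32, 16),
    "maxContactsPerCompany": (1, 50, 20),
    "incrementalFreshnessDays": (1, 90, 7),
    "minLeadScore": (0, 100, 0),
    "minConfidenceScore": (0, 100, 0),
}


def validate_input(run_input: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical client-side check (not part of the Actor or apify-client).

    Returns a copy of run_input with schema defaults filled in for the integer
    fields above, and raises ValueError on out-of-range or mistyped values.
    """
    checked = dict(run_input)  # never mutate the caller's dict
    for name, (low, high, default) in INTEGER_BOUNDS.items():
        value = checked.setdefault(name, default)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer, got {value!r}")
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    # Every skip* flag in the schema is a boolean.
    for name, value in checked.items():
        if name.startswith("skip") and not isinstance(value, bool):
            raise ValueError(f"{name} must be true or false, got {value!r}")
    return checked


run_input = validate_input(
    {"workers": 8, "minLeadScore": 60, "skipFileIntelligence": True}
)
print(run_input["maxContactsPerCompany"])  # 20
```

The Actor still validates its own input; a client-side check like this only surfaces mistakes before a run is started.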

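The quality gate described by `minLeadScore` and `minConfidenceScore` amounts to a two-threshold filter. A minimal sketch, assuming each contact is a dict: `combined_priority` is the score named in the schema, while `confidence_score` is a hypothetical field name, and treating a missing score as 0 is an assumption of this sketch (so with both thresholds at their default of 0, every contact passes). A score equal to the threshold is kept, because the descriptions only drop contacts *below* it. When `skipQualityGate` is enabled, the Actor bypasses this filtering entirely.

```python
from typing import Any, Iterable


def apply_quality_gate(
    contacts: Iterable[dict[str, Any]],
    min_lead_score: int = 0,
    min_confidence_score: int = 0,
) -> list[dict[str, Any]]:
    """Keep contacts whose scores are at or above both thresholds.

    'combined_priority' is named in the input schema; 'confidence_score' is a
    hypothetical field name used here for illustration only.
    """
    kept = []
    for contact in contacts:
        # Assumption: a missing score counts as 0.
        lead = contact.get("combined_priority", 0)
        confidence = contact.get("confidence_score", 0)
        # Only scores strictly below a threshold are filtered out.
        if lead >= min_lead_score and confidence >= min_confidence_score:
            kept.append(contact)
    return kept


contacts = [
    {"name": "A", "combined_priority": 82, "confidence_score": 90},
    {"name": "B", "combined_priority": 45, "confidence_score": 95},
    {"name": "C", "combined_priority": 70, "confidence_score": 30},
]
print([c["name"] for c in apply_quality_gate(contacts, 60, 50)])  # ['A']
```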
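When calling the Apify REST API directly, the run object arrives wrapped in `data`, as `runsResponseSchema` shows. A minimal sketch of pulling out the fields most callers need; `summarize_run` and `parse_iso` are hypothetical names, and the sample body uses placeholder IDs and timestamps rather than output from a real run. `finishedAt` is treated as optional on the assumption that it is empty while a run is still in progress.

```python
import json
from datetime import datetime
from typing import Any


def parse_iso(ts: str) -> datetime:
    # The API uses UTC timestamps with a trailing "Z"; datetime.fromisoformat()
    # only accepts "Z" from Python 3.11, so normalize it to "+00:00" first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def summarize_run(body: str) -> dict[str, Any]:
    """Extract key fields from a run response body (hypothetical helper)."""
    run = json.loads(body)["data"]
    started = parse_iso(run["startedAt"])
    finished = run.get("finishedAt")
    duration = (parse_iso(finished) - started).total_seconds() if finished else None
    return {
        "id": run["id"],
        "status": run["status"],
        "duration_secs": duration,
        "dataset_id": run.get("defaultDatasetId"),
        "usage_total_usd": run.get("usageTotalUsd", 0.0),
    }


# Placeholder values for illustration, not a real run.
sample = json.dumps({"data": {
    "id": "RUN_ID",
    "status": "SUCCEEDED",
    "startedAt": "2025-01-08T00:00:00.000Z",
    "finishedAt": "2025-01-08T00:05:30.000Z",
    "defaultDatasetId": "DATASET_ID",
    "usageTotalUsd": 0.00005,
}})
print(summarize_run(sample)["duration_secs"])  # 330.0
```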