B2B Lead Generation Suite - Find Emails, Score & Qualify Leads

Pricing

from $250.00 / 1,000 enriched leads

All-in-one B2B lead pipeline. Enter company URLs, get enriched leads with emails, phone numbers, contacts, email patterns, quality scores (0-100), grades, and business signals from a 3-step automated pipeline.

Rating

0.0

(0)

Developer

Ryan Clinton

Maintained by Community

Actor stats

0

Bookmarked

47

Total users

5

Monthly active users

5 days ago

Last modified

B2B Lead Generation Suite

An all-in-one B2B lead pipeline that turns a list of company websites into a send-ready outreach list. By default the suite runs a 3-step pipeline: Website Lead Intelligence (formerly Website Contact Scraper), Email Pattern Finder, and B2B Lead Qualifier. An opt-in 4th step (Bulk Email Verifier) adds cold-outreach-grade deliverability decisions. From a single run you get send actions, verified emails, named decision-makers, buying-committee classification, and a 0–100 quality score per domain.

Provide one or more company URLs (e.g., stripe.com, https://buffer.com) and receive a unified dataset where every lead includes:

  • A send action: SEND_NOW / VERIFY_FIRST / SKIP / ENRICH_MORE per domain
  • A buying committee — decision-makers, influencers, champions, and blockers grouped by role
  • A first-touch opening line stem — generated deterministically from job title + company type
  • A pipeline-value rank — relative priority within the batch (1 = best)
  • A plain-English summary — one-sentence takeaway you can paste into Slack or an email
  • Verified emails, phone numbers, named contacts, social media links, addresses
  • Email pattern + generated team emails (Step 2)
  • 0–100 quality score + letter grade (Step 3)
  • Per-email deliverability decisions (Step 4 — optional)

The entire pipeline runs automatically with no manual intervention between steps.

Which actor should I use?

This suite isn't always the right starting point. If you only need one step, run the dedicated sub-actor and skip orchestration overhead:

| You have… | You want… | Run this | Cost |
|---|---|---|---|
| Company URLs | Send-ready leads + decisions | Website Lead Intelligence (Step 1 standalone) | $0.20/domain |
| Domains + names | Pattern-detected emails for given names | Email Pattern Finder (Step 2 standalone) | $0.10/domain |
| Domains | 0–100 lead quality score + grade only | B2B Lead Qualifier (Step 3 standalone) | $0.15/lead |
| Business names or marketing phrases (no URLs) | Resolve to deduped website URLs | SERP Name Resolver | $0.002/query |
| Company URLs | All of the above merged into one record per domain: contacts, send action, pattern emails, lead score | B2B Lead Generation Suite (this actor) | $0.30–$0.45/lead |

Use this suite when you want a single dataset with the full picture per lead. Use a standalone sub-actor when you only need one layer.

Why Use B2B Lead Generation Suite?

Running four actors manually means configuring inputs four times, waiting for each run, downloading intermediate datasets, and writing code to merge results. This actor eliminates that overhead. Configure once, click Start, and get a merged dataset ready for your CRM, outreach tool, or spreadsheet.

The orchestrator also handles data flow between steps intelligently:

  • Step 1 (Website Lead Intelligence) is a send-decision engine. Every domain ships with sendDecision, sendPlan, pipelineValue, firstTouch, buyingCommittee, and plainEnglishSummary — surfaced verbatim in the suite's output, no transformation.
  • Emails discovered in Step 1 (Website Lead Intelligence) are automatically fed into the Pattern Finder as known samples, improving pattern detection accuracy without re-scraping.
  • Contact names from Step 1's website scraping are passed to the Pattern Finder for email generation, so team members without public emails still get predicted addresses.
  • The Qualifier receives all upstream data (emails, phones, contacts, social links, detected patterns) via a pipelineData parameter, eliminating redundant extraction work.
  • Website scraping is disabled in Step 2 because Step 1 already crawled the sites; only GitHub commit search runs as an additional email source.
  • Error handling is built in — if the Pattern Finder, Qualifier, or Verifier fails on a particular domain, the pipeline continues with the data it has rather than aborting.
  • Step 1 v2.0 inputs pass through. Set goal, preset, confidenceMode, enableProFallback, compareToPrevRun, crmWebhookUrl, autoFilter, or exportFormats at suite level and they forward to the scraper unchanged. Default behaviour matches the scraper's auto preset.

Key Features

  • Four-step pipeline in one click — Contact scraping + send-decision, email pattern detection, lead qualification, and email verification run sequentially without manual handoffs.
  • Send-decision per lead: SEND_NOW / VERIFY_FIRST / SKIP / ENRICH_MORE action with risk level and plain-English reasons. Branch automation on the action enum, never on the prose.
  • Buying-committee classification — Contacts grouped into decisionMakers (CEO/founder/C-suite), influencers (VPs/Directors), champions (Sales/BD/Partnerships — most reachable), and blockers (Legal/Finance/Procurement — email last).
  • Pipeline-value rank — Relative priority within the batch (rankInBatch: 1 = top lead). Use to order outreach.
  • First-touch opening lines — Deterministic opening-sentence stems per lead (angle, hook, line) generated from job title + company type. Not LLM-generated copy.
  • Merged and deduplicated output — Emails, phones, and social links from multiple steps combined into a single record per domain with no duplicates.
  • Lead scoring and grading — Every lead gets a 0–100 quality score and letter grade across five categories: contact reachability, business legitimacy, online presence, website quality, and team transparency.
  • Email pattern detection — Identifies naming conventions like first.last@, flast@, or first@ and generates predicted emails for team members.
  • Email verification (optional) — Step 4 verifies every discovered + pattern-generated email and emits per-email decisions (send / send-monitor / hold / replace / suppress).
  • Scheduled monitoring + change detection — Set compareToPrevRun: true and every domain gets changeFlags[] (NEW_TEAM_HIRE / TIER_UPGRADED / etc.) plus a per-domain delta block.
  • CRM auto-push — Set crmWebhookUrl and Step 1 POSTs each enriched lead directly to HubSpot, Salesforce, Zapier, Make.com, or n8n. No glue code.
  • Outreach-tool CSV exports — Set exportFormats: ["instantly", "smartlead", "apollo"] to drop ready-to-import CSVs into the run's key-value store.
  • Auto-filter — Set autoFilter: "send-now-only" and the dataset only contains green-light leads.
  • Configurable pipeline — Skip the Pattern Finder or Lead Qualifier to save time and cost when you only need contact data.
  • Minimum score + personal-email filtering — Set thresholds to exclude low-quality leads. Filtered domains are excluded from PPE billing.
  • Proxy support — Pass proxy settings through to all pipeline steps for reliable scraping across many domains.
  • Graceful error handling — If an optional step fails, the pipeline continues and outputs whatever data it successfully gathered.
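The "merged and deduplicated output" feature can be pictured as an order-preserving union across steps. The sketch below is illustrative only: the function names are hypothetical, and the comparison rules (case-insensitive emails, digits-only phone matching) are assumptions, not the actor's documented behaviour.

```python
import re

def merge_emails(*sources):
    """Order-preserving union of email lists, compared case-insensitively (assumption)."""
    seen, merged = set(), []
    for source in sources:
        for email in source:
            key = email.strip().lower()
            if key and key not in seen:
                seen.add(key)
                merged.append(key)
    return merged

def merge_phones(*sources):
    """Order-preserving union of phone lists, compared by digits only (assumption)."""
    seen, merged = set(), []
    for source in sources:
        for phone in source:
            key = re.sub(r"\D", "", phone)
            if key and key not in seen:
                seen.add(key)
                merged.append(phone.strip())  # keep the first formatting seen
    return merged

step1_emails = ["info@apify.com", "Jan@apify.com"]
step3_emails = ["jan@apify.com", "support@apify.com"]
print(merge_emails(step1_emails, step3_emails))
```

Keeping the first-seen formatting for phones means a Step 1 value such as "+420 255 000 222" is not replaced by a differently formatted duplicate from a later step.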

If you only have a list of business names (CRM export, prospect spreadsheet) — or a niche identified by distinctive footer/marketing phrases — leave urls empty and use the knownNames and footerPhrases inputs instead. The suite resolves names + phrases to websites via Google before kicking off the contact-scrape → email-pattern → lead-qualify pipeline, all in one run.

  • knownNames — list of business names. Each is searched on Google as {name} {nameSuffix} and the top organic result is treated as that company's website.
  • footerPhrases — distinctive marketing phrases that identify a niche (e.g. "we buy land in any state"). Each runs as an exact-match Google query and every organic result is collected.
  • nameSuffix — appended to every name query to disambiguate (e.g. "we buy land", "real estate", "plumber"). Strongly recommended — generic names without a suffix can resolve to wrong entities.

The discovery layer calls our Business Name & Phrase to URL Resolver sub-actor ($0.002 per Google query). After resolution, every found domain flows through the standard suite pipeline.
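To make the discovery inputs concrete, here is a rough sketch of how the Google queries described above could be assembled and what they would cost at $0.002 per query. `build_queries` is a hypothetical helper, wrapping a phrase in double quotes is an assumption about how "exact-match" is expressed, and "Acme Land Co" is a made-up business name.

```python
QUERY_COST_USD = 0.002  # per Google query, as quoted for the resolver sub-actor

def build_queries(known_names, footer_phrases, name_suffix=""):
    """Return the list of search queries implied by the discovery inputs."""
    queries = []
    for name in known_names:
        # Each name is searched as "{name} {nameSuffix}".
        queries.append(f"{name} {name_suffix}".strip())
    for phrase in footer_phrases:
        # Assumption: an exact-match query is the phrase in double quotes.
        queries.append(f'"{phrase}"')
    return queries

queries = build_queries(
    known_names=["Acme Land Co"],
    footer_phrases=["we buy land in any state"],
    name_suffix="we buy land",
)
estimated_cost = round(len(queries) * QUERY_COST_USD, 3)
print(queries, estimated_cost)
```

This only estimates the resolution cost; every domain found afterwards is billed through the standard pipeline as usual.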

How to Use

  1. Add company URLs -- Enter one or more website URLs or bare domains in the "Website URLs or Domains" field. For example: stripe.com, https://buffer.com, hubspot.com. Each domain produces one enriched lead in the output. Or leave URLs empty and provide knownNames / footerPhrases instead (see section above).

  2. Configure crawl depth -- Set "Max pages per domain" for both the contact scraping step and the lead qualification step. Higher values discover more contacts and signals but increase run time and cost. The default of 5 pages per step works well for most company websites.

  3. Choose pipeline steps -- By default, all three steps run. Check "Skip email pattern detection" or "Skip lead qualification" if you only need basic contact data. Skipping both optional steps makes the run roughly 3x faster.

  4. Set a minimum score -- If lead qualification is enabled, set a minimum score (0-100) to filter out low-quality leads. Set to 0 to include all leads regardless of score.

  5. Configure proxy -- For scraping more than a handful of domains, enable Apify Proxy to avoid rate limiting. The proxy settings are forwarded to all sub-actors automatically.

  6. Run and export -- Click "Start" and wait for the pipeline to complete. Download the dataset as JSON, CSV, or Excel, or access it via the Apify API for integration into your workflow.
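For API access, a run can be started from Python with the `apify-client` package. This is a hedged sketch: the actor ID is a placeholder you must replace with the one shown on the actor's store page, `build_run_input` and `run_suite` are hypothetical helper names, and the return shape of `call` may differ between client versions.

```python
def build_run_input(urls, min_score=0, max_pages=5):
    """Assemble a minimal run input using field names from this page."""
    return {
        "urls": list(urls),
        "maxPagesPerDomain": max_pages,
        "minScore": min_score,
    }

def run_suite(token, actor_id, run_input):
    """Start the actor, wait for it to finish, and return all dataset items."""
    # Imported lazily so the helper above works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())

run_input = build_run_input(["stripe.com", "hubspot.com"], min_score=50)
# leads = run_suite("YOUR_APIFY_TOKEN", "<actor-id-placeholder>", run_input)
```

The final call is commented out because it needs a real token and actor ID and will incur usage charges.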

Input Parameters

Core pipeline

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| urls | String[] | Yes* | | List of company website URLs or bare domains to process |
| knownNames | String[] | No | [] | Business names; resolved to domains via Google when urls is empty |
| footerPhrases | String[] | No | [] | Distinctive phrases that identify a niche; each runs as an exact-match Google query |
| maxPagesPerDomain | Integer | No | 5 | Max pages to crawl per site during Step 1 contact scraping (1–20) |
| maxQualifierPagesPerDomain | Integer | No | 5 | Max pages to crawl per site during Step 3 lead qualification (1–15) |
| minScore | Integer | No | 0 | Minimum Step 3 lead-score threshold; leads below this are excluded |
| skipEmailPatternFinder | Boolean | No | false | Skip Step 2 (email pattern detection) |
| skipLeadQualifier | Boolean | No | false | Skip Step 3 (lead qualification + scoring) |
| verifyEmails | Boolean | No | false | Run Step 4 (Bulk Email Verifier); adds per-email send/hold/replace decisions |
| proxyConfiguration | Object | No | Apify Proxy | Proxy settings forwarded to all pipeline steps |

*Either urls or knownNames/footerPhrases is required.
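A quick pre-flight check can catch invalid inputs before a paid run starts. The sketch below encodes only the constraints stated on this page (the either/or requirement, the 1–20 and 1–15 page ranges, and the 0–100 score range); `validate_input` is a hypothetical helper, not part of the actor.

```python
RANGES = {
    "maxPagesPerDomain": (1, 20),
    "maxQualifierPagesPerDomain": (1, 15),
    "minScore": (0, 100),
}

def validate_input(run_input):
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    has_urls = bool(run_input.get("urls"))
    has_discovery = bool(run_input.get("knownNames") or run_input.get("footerPhrases"))
    if not (has_urls or has_discovery):
        errors.append("Provide urls, or knownNames / footerPhrases")
    for key, (low, high) in RANGES.items():
        value = run_input.get(key)
        if value is not None and not (low <= value <= high):
            errors.append(f"{key} must be between {low} and {high}")
    return errors

print(validate_input({"urls": ["stripe.com"], "minScore": 50}))       # no problems
print(validate_input({"maxPagesPerDomain": 25}))                      # two problems
```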

Step 1 + Step 2 shared passthrough

These four inputs forward to BOTH Step 1 (Website Lead Intelligence) and Step 2 (Email Pattern Finder). Same enum values, matching intent at each step.

| Parameter | Type | Default | Description |
|---|---|---|---|
| goal | String | sub-actor default | quick-outreach / high-deliverability / max-coverage. Sets sensible defaults at every step |
| autoFilter | String | none | send-now-only / safe-only / max-leads. Drops records that don't pass the filter before they hit the dataset OR billing. max-leads is normalized to max-coverage for Step 2 |
| compareToPrevRun | Boolean | false | Monitoring mode. Step 1 emits contact-side changeFlags[] + delta block. Step 2 emits pattern-side changeSinceLastRun + driftState + patternStabilityScore. |
| monitorStateKey | String | auto | Key-value store name; auto-derived from input domains when blank |
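To illustrate what a per-domain delta in monitoring mode might contain, the sketch below diffs two records for the same domain. The delta field names (addedEmails, removedContacts, leadScoreDelta) follow the changeSinceLastRun description in the output field reference; the comparison logic itself is an assumption, and `diff_records` is a hypothetical helper.

```python
def diff_records(previous, current):
    """Compute a small delta block between two runs' records for one domain."""
    prev_emails = set(previous.get("emails", []))
    curr_emails = set(current.get("emails", []))
    prev_names = {c["name"] for c in previous.get("contacts", [])}
    curr_names = {c["name"] for c in current.get("contacts", [])}
    return {
        "addedEmails": sorted(curr_emails - prev_emails),
        "removedContacts": sorted(prev_names - curr_names),
        # Missing or null scores are treated as 0 (assumption).
        "leadScoreDelta": (current.get("leadScore") or 0) - (previous.get("leadScore") or 0),
    }

last_week = {
    "emails": ["info@acmecorp.com"],
    "contacts": [{"name": "Alex Doe"}, {"name": "Sam Roe"}],
    "leadScore": 70,
}
this_week = {
    "emails": ["info@acmecorp.com", "jane@acmecorp.com"],
    "contacts": [{"name": "Alex Doe"}],
    "leadScore": 82,
}
print(diff_records(last_week, this_week))
```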

Step 1 — Website Lead Intelligence passthrough

| Parameter | Type | Default | Description |
|---|---|---|---|
| preset | String | scraper default | auto / fast / balanced / maximum. Controls execution depth |
| confidenceMode | String | scraper default | safe / balanced / aggressive. Risk appetite for which emails ship |
| deepScan | Boolean | scraper default | Probe hidden pages (/imprint, /privacy-policy, etc.) |
| enableProFallback | Boolean | false | Auto-retry JS-heavy / Cloudflare-protected sites in real browser ($0.35/site) |
| requirePersonalEmail | Boolean | false | Drop domains without a personal email; filtered domains not billed |
| companyTypes | String[] | [] | Filter by classified company type (saas, agency, legal, etc.) |
| crmWebhookUrl | String (secret) | | HTTPS endpoint that receives one POST per enriched lead |
| crmFormat | String | generic-json | generic-json / hubspot / salesforce |
| crmOnlyTierA | Boolean | false | Only push tier-A leads (verified personal email + senior contact) to the CRM |
| exportFormats | String[] | [] | instantly / smartlead / apollo. Generates ready-to-import CSVs in the run's key-value store |

Step 2 — Email Pattern Finder passthrough

Step 2 already gets discovered emails + contact names from Step 1 automatically. These inputs add extra data sources or change behaviour.

| Parameter | Type | Default | Description |
|---|---|---|---|
| searchWhois | Boolean | false | Look up domain registration data for registrant emails. Best for smaller companies where the owner's email is in the WHOIS record |
| hunterApiKey | String (secret) | | Hunter.io API key for additional Step 2 email discovery. Free tier gives 25 searches/month |

Step 3 — B2B Lead Qualifier passthrough

Step 3 already gets all upstream data (emails, phones, contacts, social links, detected patterns) from Steps 1+2 via pipelineData automatically. These inputs change how Step 3 scores or surfaces results.

| Parameter | Type | Default | Description |
|---|---|---|---|
| scoringProfile | String | default | default / sales / marketing / recruiting. Adjusts category weights: Sales emphasizes contact reachability + decision makers; Marketing emphasizes online presence + website quality; Recruiting emphasizes team transparency |
| watchlistName | String | | Name this run as a separate watchlist. Score history is stored per-watchlist, so you can run the suite as N independent watchlists (e.g. tier-1-prospects, churn-risk-accounts) |
| qualifierWebhookUrl | String (secret) | | Slack or Discord incoming webhook URL. On run completion, Step 3 posts a rich embed with the top scored leads + a link to the Apify run. Distinct from Step 1's crmWebhookUrl (per-record CRM push) |
| qualifierCircuitBreakerThreshold | Integer | 0 | Abort the Step 3 sub-actor if this many consecutive domains fail to fetch (e.g. proxy outage). 0 disables the breaker. Recommended: 5–10 for large batches |
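The circuit-breaker semantics (abort after N consecutive fetch failures, with 0 disabling the breaker) can be sketched as a simple counter that resets on every success. This is an illustration of the described behaviour under that reading, not the sub-actor's code; `should_abort` is a hypothetical name.

```python
def should_abort(fetch_results, threshold):
    """fetch_results: iterable of booleans in processing order, True = fetched OK.

    Returns True as soon as `threshold` consecutive failures are seen.
    A threshold of 0 (or less) disables the breaker.
    """
    if threshold <= 0:
        return False
    consecutive_failures = 0
    for ok in fetch_results:
        # Any success resets the streak; only back-to-back failures count.
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures >= threshold:
            return True
    return False

print(should_abort([True, False, False, False], threshold=3))
print(should_abort([False, True, False, True], threshold=2))
```

Because the counter resets on success, scattered failures across a large batch do not trip the breaker; only a sustained outage does.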

Input Examples

Full pipeline with quality filter (most common use case):

{
  "urls": ["stripe.com", "hubspot.com", "notion.so", "linear.app", "cal.com"],
  "maxPagesPerDomain": 8,
  "maxQualifierPagesPerDomain": 5,
  "minScore": 50
}

Cold-outreach mode — only ship send-ready leads:

{
  "urls": ["stripe.com", "hubspot.com", "linear.app"],
  "goal": "high-deliverability",
  "autoFilter": "send-now-only",
  "verifyEmails": true
}

Maximum coverage — every possible lead including pattern-generated:

{
  "urls": ["buffer.com", "zapier.com", "cal.com"],
  "goal": "max-coverage",
  "confidenceMode": "aggressive"
}

JS-heavy / Cloudflare-protected sites — auto-retry with browser rendering:

{
  "urls": ["fancy-spa-site.com", "cloudflare-protected.com"],
  "preset": "auto",
  "enableProFallback": true
}

Scheduled monitoring — diff against last week's run:

{
  "urls": ["acmecorp.com", "globex.io", "initech.com"],
  "compareToPrevRun": true,
  "monitorStateKey": "us-saas-watchlist-2026"
}

Sales-tuned scoring profile + per-list watchlist history:

{
  "urls": ["acmecorp.com", "globex.io", "initech.com"],
  "scoringProfile": "sales",
  "watchlistName": "tier-1-prospects-q2",
  "qualifierWebhookUrl": "https://hooks.slack.com/services/..."
}

Auto-push to HubSpot — hands-off lead routing:

{
  "urls": ["stripe.com", "hubspot.com"],
  "crmWebhookUrl": "https://api.hubapi.com/crm/v3/objects/contacts?hapikey=YOUR_KEY",
  "crmFormat": "hubspot",
  "crmOnlyTierA": true
}

Outreach-tool CSV exports — drop straight into Instantly + Smartlead:

{
  "urls": ["stripe.com", "hubspot.com", "notion.so"],
  "exportFormats": ["instantly", "smartlead"]
}

Contact scraping only (fastest, cheapest):

{
  "urls": ["https://example.com", "https://acme.co"],
  "maxPagesPerDomain": 10,
  "skipEmailPatternFinder": true,
  "skipLeadQualifier": true
}

Contacts + patterns, no scoring:

{
  "urls": ["buffer.com", "zapier.com"],
  "skipLeadQualifier": true
}

Input Tips

  • Bare domains like stripe.com and full URLs like https://stripe.com both work -- the actor normalizes them automatically.
  • For SaaS prospecting, use 5-8 pages per domain to catch team/about pages where contacts are listed.
  • For large enterprise sites, increase maxPagesPerDomain to 15-20 to reach contacts buried in deep navigation.
  • Set minScore to 60+ when feeding results to outreach tools -- this removes placeholder sites and parked domains.
  • Enable proxy when processing more than 10 domains in a single run to avoid rate limiting.
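If you want to deduplicate your own URL list before a run, the normalization mentioned in the first tip can be approximated locally. This is a sketch under assumptions: the actor's exact rules are not documented here, stripping a leading "www." is a guess, and `normalize_domain` is a hypothetical helper.

```python
from urllib.parse import urlparse

def normalize_domain(value):
    """Reduce a bare domain or full URL to a lowercase hostname."""
    value = value.strip()
    if "://" not in value:
        # urlparse only fills in the hostname when a scheme is present.
        value = "https://" + value
    host = urlparse(value).hostname or ""
    # Assumption: "www." is treated as equivalent to the bare domain.
    return host[4:] if host.startswith("www.") else host

raw = ["stripe.com", "https://stripe.com", "https://www.Buffer.com/about"]
unique = sorted({normalize_domain(item) for item in raw})
print(unique)
```

Since each domain produces one billed lead, collapsing duplicates like the first two entries before the run avoids submitting the same company twice.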

Output Example

Each domain produces one enriched lead record. Here is a representative example with all pipeline steps enabled:

{
"domain": "apify.com",
"url": "https://apify.com",
"emails": ["info@apify.com", "support@apify.com", "jan@apify.com"],
"personalEmails": ["jan@apify.com"],
"genericEmails": ["info@apify.com", "support@apify.com"],
"phones": ["+420 255 000 222"],
"contacts": [
{ "name": "Jan Curn", "title": "CEO & Co-founder", "email": "jan@apify.com" },
{ "name": "Ondra Urban", "title": "CTO & Co-founder" }
],
"socialLinks": {
"twitter": "https://twitter.com/apify",
"linkedin": "https://www.linkedin.com/company/apifytech",
"github": "https://github.com/apify",
"youtube": "https://www.youtube.com/c/Apify"
},
"addresses": ["Vodickova 704/36, 110 00 Prague, Czech Republic"],
"companyMeta": { "name": "Apify", "industry": "Software", "language": "en" },
"companyType": "saas",
"sendDecision": {
"action": "SEND_NOW",
"riskLevel": "low",
"reasons": ["Verified personal email present", "Senior contact identified", "No catch-all flag"]
},
"sendPlan": {
"status": "ready",
"channel": "email-first",
"safeToAutomate": true,
"openingAngle": "product/platform — pitch the developer-tooling angle",
"followUpStrategy": "2 follow-ups, 3 days apart, then mark not interested"
},
"pipelineValue": { "relativeScore": 1.0, "rankInBatch": 1 },
"firstTouch": {
"angle": "product-side",
"hook": "Apify operates a developer platform — partnerships likely exposed to external pipeline",
"line": "Saw Jan's CEO role at Apify — quick idea on the developer-platform / partnerships side"
},
"buyingCommittee": {
"decisionMakers": [{ "name": "Jan Curn", "title": "CEO & Co-founder", "email": "jan@apify.com", "seniority": 100, "reachable": true }],
"influencers": [{ "name": "Ondra Urban", "title": "CTO & Co-founder", "seniority": 100, "reachable": false }],
"champions": [],
"blockers": [],
"size": 2
},
"topContacts": [
{ "name": "Jan Curn", "title": "CEO & Co-founder", "email": "jan@apify.com", "score": 95, "reasons": ["CEO seniority", "Personal email verified"] }
],
"bestContact": { "name": "Jan Curn", "title": "CEO & Co-founder", "email": "jan@apify.com" },
"decision": { "tier": "A", "reason": "Verified personal email + senior contact" },
"leadScore": 92,
"dataQuality": "high",
"isContactable": true,
"contactFormDetected": true,
"catchAllDetected": false,
"domainPurity": 100,
"plainEnglishSummary": "Best person to email at Apify is Jan Curn (CEO). Email is verified and safe — you can reach out now.",
"whyThisLead": ["Founder accessible via personal email", "Active social media presence indicates outbound posture"],
"confidence": { "emailConfidence": 95, "contactConfidence": 90, "overallConfidence": 92, "riskFlags": [] },
"coverage": { "emails": "complete", "contacts": "complete", "phones": "found", "socials": "complete", "addresses": "found", "contactForm": true },
"emailPattern": "first@apify.com",
"emailPatternConfidence": 0.85,
"alternateEmailPatterns": [
{ "pattern": "first.last@apify.com", "confidence": 0.42 },
{ "pattern": "flast@apify.com", "confidence": 0.28 }
],
"generatedEmails": [
{ "name": "Ondra Urban", "email": "ondra@apify.com", "pattern": "first@apify.com", "confidence": 0.85 }
],
"patternAnalysis": {
"confidenceLevel": "high",
"isSendable": true,
"isContactable": true,
"bounceRiskBucket": "low",
"isCatchAll": false,
"mxValid": true,
"mxRecord": ["10 aspmx.l.google.com", "20 alt1.aspmx.l.google.com"],
"sendDecision": {
"action": "SEND_NOW",
"riskLevel": "low",
"reasons": ["6 emails analyzed", "single dominant pattern", "MX valid", "not catch-all"]
},
"recommendedSequence": ["first@apify.com", "first.last@apify.com", "flast@apify.com"],
"emailCulture": "strict-format",
"patternStabilityScore": 1.0,
"decisionSignals": ["high-confidence", "sample-rich", "multi-source", "strict-format", "stable-pattern", "mx-valid"],
"negativeSignals": [],
"plainEnglishSummary": "Pattern is `first@apify.com` (85% confidence, 6 samples). Safe to send to generated emails."
},
"score": 88,
"grade": "A",
"scoreBreakdown": {
"contactReachability": 22,
"businessLegitimacy": 20,
"onlinePresence": 18,
"websiteQuality": 16,
"teamTransparency": 12
},
"signals": [
{ "signal": "Multiple email addresses found", "category": "contactReachability", "points": 10, "detail": "3 emails discovered" },
{ "signal": "Phone number present", "category": "contactReachability", "points": 8, "detail": "+420 255 000 222" },
{ "signal": "Social media profiles found", "category": "onlinePresence", "points": 8, "detail": "4 platforms linked" },
{ "signal": "Team members listed", "category": "teamTransparency", "points": 10, "detail": "2 named contacts with titles" }
],
"address": "Vodickova 704/36, 110 00 Prague, Czech Republic",
"cmsDetected": "Next.js",
"techSignals": ["React", "Next.js", "Google Analytics", "Intercom"],
"industry": "Software / Developer Tools",
"jobCount": 12,
"qualifierAnalysis": {
"summary": "Strong B2B SaaS lead — verified personal contact, full team transparency, hiring actively. Outreach immediately.",
"scoreExplanation": "Tier-A signals: 3 personal emails, 2 named contacts with senior titles, full social presence, modern tech stack, and 12 open roles indicating active growth.",
"confidence": { "score": 92, "level": "high", "components": [{ "name": "signal-breadth", "weight": 0.4, "value": 95 }, { "name": "crawl-depth", "weight": 0.3, "value": 90 }, { "name": "source-integrity", "weight": 0.3, "value": 90 }] },
"recommendedAction": "outreach-immediately",
"previousScore": null,
"scoreChange": null,
"changeFlag": "NEW",
"jsWarning": null,
"botProtection": { "detected": false, "vendor": null },
"dataGaps": [],
"agentContract": { "decision": "qualified-A", "confidence": 92, "nextAction": "outreach-immediately", "costToAct": 0 },
"recordType": "result",
"schemaVersion": "2.0.0",
"eventId": "sha256-abc123...",
"failureType": null,
"shouldOutreach": true,
"decisionSignals": ["grade-a", "high-score", "outreach-immediately", "new", "high-confidence", "has-contact-signals", "has-legitimacy-signals", "has-presence-signals", "has-quality-signals", "has-team-signals", "multi-source", "has-emails", "has-phones", "has-contacts", "rich-socials", "hiring-active", "industry-classified"],
"negativeSignals": [],
"confidenceConflict": null,
"failureContext": null,
"methodology": "Lead score and recommendedAction are heuristic-derived from observable website signals — not produced by a trained model."
},
"verifiedEmails": [
{ "email": "jan@apify.com", "status": "valid", "confidence": 0.92, "decision": "send", "actionId": "send", "failureCategory": null }
],
"topVerifiedEmail": "jan@apify.com",
"topEmailDecision": "send",
"sendableEmailCount": 1,
"pipelineSteps": ["contact-scraper", "email-pattern-finder", "lead-qualifier", "bulk-email-verifier"],
"processedAt": "2026-05-06T14:30:00.000Z"
}
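Downstream automation is meant to branch on the sendDecision.action enum and order outreach by pipelineValue.rankInBatch. The sketch below shows one way to do that with records shaped like the example above; `route_leads` is a hypothetical helper, and silently skipping records with a null sendDecision is a design assumption.

```python
ACTIONS = ("SEND_NOW", "VERIFY_FIRST", "ENRICH_MORE", "SKIP")

def route_leads(records):
    """Group records by send action, each group ordered by rankInBatch (1 = best)."""
    queues = {action: [] for action in ACTIONS}
    for record in records:
        action = (record.get("sendDecision") or {}).get("action")
        if action in queues:  # records without a decision are left out
            queues[action].append(record)

    def rank(record):
        # Unranked records sort last.
        return (record.get("pipelineValue") or {}).get("rankInBatch", float("inf"))

    for queue in queues.values():
        queue.sort(key=rank)
    return queues

sample = [
    {"domain": "b.example", "sendDecision": {"action": "SEND_NOW"}, "pipelineValue": {"rankInBatch": 2}},
    {"domain": "a.example", "sendDecision": {"action": "SEND_NOW"}, "pipelineValue": {"rankInBatch": 1}},
    {"domain": "c.example", "sendDecision": {"action": "SKIP"}, "pipelineValue": None},
]
queues = route_leads(sample)
print([r["domain"] for r in queues["SEND_NOW"]])
```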

Output Fields

Step 1 — Website Lead Intelligence (send-decision engine)

| Field | Type | Description |
|---|---|---|
| domain | String | Normalized company domain |
| url | String | Full URL of the website |
| emails | String[] | All discovered email addresses (Step 1 + Step 3 union) |
| personalEmails | String[] | Personal addresses (not info@/hello@) |
| genericEmails | String[] | Role-based addresses (info@, hello@, contact@) |
| phones | String[] | All discovered phone numbers |
| contacts | Object[] | Named contacts with name, title, optional email |
| socialLinks | Object | Social profile URLs keyed by platform |
| addresses | Array | Physical addresses (schema.org PostalAddress, JSON-LD, <address> elements) |
| companyMeta | Object\|null | Company name, description, industry, logo, employee count, founding date |
| companyType | String\|null | Classified type: saas, agency, consulting, legal, accounting, ecommerce, healthcare, real_estate, etc. |
| sendDecision | Object\|null | { action: 'SEND_NOW' / 'VERIFY_FIRST' / 'SKIP' / 'ENRICH_MORE', riskLevel, reasons }. Branch automation on action. |
| sendPlan | Object\|null | Sequence-ready execution plan: { status, channel, safeToAutomate, openingAngle, followUpStrategy, ... } |
| pipelineValue | Object\|null | Relative priority within this batch: { relativeScore (0–1), rankInBatch (1 = best), ... } |
| firstTouch | Object\|null | Opening-line stem: { angle, hook, line }. Deterministic from job-title + company-type, not LLM copy |
| decision | Object\|null | Outreach readiness tier: { tier: 'A'/'B'/'C', reason } |
| leadScore | Number\|null | Step 1's own 0–100 score (distinct from Step 3's score) |
| dataQuality | String\|null | high / medium / low / no-data |
| bestContact | Object\|null | Highest-ranked contact (name, title, email, score) |
| topContacts | Object[] | Top-3 ranked contacts with reasons; backup options beyond bestContact |
| buyingCommittee | Object\|null | { decisionMakers, influencers, champions, blockers, size } |
| plainEnglishSummary | String\|null | One-sentence takeaway, paste-ready for Slack/email |
| whyThisLead | String[] | Plain-English intent signals |
| confidence | Object\|null | { emailConfidence, contactConfidence, overallConfidence, riskFlags, components }. components is an explainable breakdown ({ emailEvidence, contactEvidence, verificationLift, catchAllPenalty, riskPenalty, multipleSamplesBonus, finalScore }) that parallels Step 2's confidenceBreakdown for consistent debugging across the suite |
| coverage | Object\|null | Per-signal completeness: emails, contacts, phones, socials, addresses, contactForm |
| summary | Object\|null | Flat scanning block: primaryEmail, primaryContact, title, decision, confidence, leadScore |
| isContactable | Boolean\|null | True when domain has at least one personal email or bestContact with email |
| contactFormDetected | Boolean\|null | True when an inquiry form was found |
| catchAllDetected | Boolean\|null | True when domain accepts mail to any address |
| catchAllImplication | String\|null | Plain-English consequence of catch-all flag |
| domainPurity | Number\|null | % of emails that match the website's root domain (0–100) |
| botProtection | Object\|null | Detected anti-bot service (cloudflare/datadome/akamai/etc.) with recommendation |
| failureType | String\|null | no-data / blocked / timeout / js-required / parse-error. Null on success |
| scrapeError | String\|null | Error message if all retries failed |
| jsWarning | String\|null | Warning when a JavaScript-heavy site was detected |
| recommendation | String\|null | Actionable next step (e.g., 'Try deepScan=true', 'Use Pro fallback for JS sites') |
| recoveryPlan | Object\|null | { nextBestTool, nextBestActorSlug, method, confidence } for failed/thin records |
| bounceRiskBucket | String\|null | Explicit low / medium / high band. Filter on it directly instead of composing from confidence + riskFlags + catchAllDetected. Matches Step 2's same-named field for cohesive multi-step filtering |
| decisionSignals | String[] | Stable, additive-only enum tokens for SQL/agent filters (high-confidence / multi-source / personal-email-found / tier-a / send-now / catch-all / etc.). Distinct from signals[] which is scoring evidence with points |
| negativeSignals | String[] | Concrete reasons this lead might bounce or burn sender reputation. Empty array = no concerns. Distinct from confidence.riskFlags which mixes positive + negative concepts |
| confidenceConflict | Object\|null | { exists, reason } when signals disagree (high confidence + catch-all, single-sample inflated confidence, senior contact with no email, etc.) |
| failureContext | Object\|null | { confidenceLossReason, retryLikelihood } when extraction failed or confidence is low; indicates whether re-running would help |
| methodology | String\|null | Disclosure: scoring is heuristic-derived, not produced by a trained model. Surfaced for AI/agent buyers auditing for hallucination risk |
| isSendable | Boolean\|null | Convenience boolean: true when sendDecision.action === 'SEND_NOW'. Filter on this in spreadsheets without parsing the sendDecision object |
| changeFlags | String[] | Stable change codes when monitoring is on (NEW_TEAM_HIRE, TIER_UPGRADED, etc.) |
| changeSinceLastRun | Object\|null | Per-domain delta block: addedEmails, removedContacts, leadScoreDelta, decisionTierBefore/After, daysSinceLastSeen |
| firstSeenAt | String\|null | ISO timestamp: first time domain observed across monitor runs |
| lastSeenAt | String\|null | ISO timestamp: most recent observation |
| crmPushResult | Object\|null | Per-record outcome of CRM auto-push when crmWebhookUrl is set |
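As a worked example of one derived field, domainPurity is described as the percentage of emails that match the website's root domain. A minimal sketch of that calculation follows; counting subdomain addresses as matches and returning None for an empty list are assumptions, and `domain_purity` is a hypothetical name.

```python
def domain_purity(emails, root_domain):
    """Percentage (0-100) of emails hosted on root_domain, or None with no emails."""
    if not emails:
        return None
    root = root_domain.lower()

    def on_root(email):
        host = email.rsplit("@", 1)[-1].lower()
        # Assumption: subdomains such as mail.example.com count as a match.
        return host == root or host.endswith("." + root)

    matching = sum(1 for email in emails if on_root(email))
    return round(100 * matching / len(emails))

emails = ["info@apify.com", "jan@apify.com", "founder@gmail.com", "ops@mail.apify.com"]
print(domain_purity(emails, "apify.com"))
```

Here three of the four addresses sit on the root domain or a subdomain of it, so the result is 75.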

Step 2 — Email Pattern Finder

| Field | Type | Description |
|---|---|---|
| emailPattern | String\|null | Detected email naming convention (e.g., first.last@domain.com). Null if Step 2 skipped |
| emailPatternConfidence | Number\|null | Pattern confidence from 0 to 1 |
| alternateEmailPatterns | Object[] | Other plausible patterns with lower confidence, as { pattern, confidence }. Useful for cold-email tools that retry on bounce |
| generatedEmails | Object[] | Predicted emails for contacts whose addresses were not publicly found |
| patternAnalysis | Object\|null | Step 2's full v2 decision-engine output, namespaced to avoid collision with Step 1's identically-named fields. Null when Step 2 skipped or no record found for this domain. |
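To show how a detected pattern turns into a predicted address, the sketch below expands the three conventions named on this page (first@, first.last@, flast@) for a given contact name. It is an illustration only: `apply_pattern` is a hypothetical helper, real pattern sets are likely larger, and accents, middle names, and hyphenated surnames are not handled.

```python
def apply_pattern(pattern, full_name):
    """Expand a pattern like 'first.last@example.com' for a contact name."""
    template, domain = pattern.split("@", 1)
    parts = full_name.lower().split()
    first, last = parts[0], parts[-1]
    local_parts = {
        "first": first,
        "first.last": f"{first}.{last}",
        "flast": f"{first[0]}{last}",
    }
    if template not in local_parts:
        raise ValueError(f"Unsupported pattern template: {template}")
    return f"{local_parts[template]}@{domain}"

# Try the primary pattern first, then the alternates in ranked order.
for pattern in ["first@apify.com", "first.last@apify.com", "flast@apify.com"]:
    print(apply_pattern(pattern, "Ondra Urban"))
```

The first line printed corresponds to the generatedEmails entry in the output example, where the first@ pattern yields ondra@apify.com.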

The patternAnalysis block contains:

| Field (under patternAnalysis) | Type | Description |
|---|---|---|
| confidenceLevel | String\|null | Banded label: high (≥ 0.75), medium (≥ 0.5), low (< 0.5) |
| isSendable | Boolean\|null | True when Step 2's sendDecision.action is SEND_NOW |
| isContactable | Boolean\|null | True when domain has valid MX AND at least one real or generated email |
| bounceRiskBucket | String\|null | low / medium / high, derived from confidence + catch-all + MX |
| isCatchAll | Boolean\|null | True if the domain accepts mail to any address (SMTP verification unreliable) |
| mxValid | Boolean\|null | True if the domain has valid MX records |
| mxRecord | String[] | DNS MX records sorted by priority |
| sendDecision | Object\|null | { action, riskLevel, reasons }. Step 2's own action enum (different scope from Step 1's: "trust the pattern?" vs Step 1's "email this domain?") |
| recoveryPlan | Object\|null | When pattern detection fails: next-best Apify actor to chain into |
| confidenceBreakdown | Object\|null | Explainable components: samplesContribution, sourceDiversity, patternConsistency, catchAllPenalty, temporalStability, finalScore |
| recommendedSequence | Array | Ranked list of pattern templates to try in order: primary first, alternates by domain match strength |
| recommendedSequenceWithScores | Array | Same as recommendedSequence with per-pattern scores attached |
| emailCulture | String\|null | strict-format (single dominant pattern, ≥85%), loose (multiple competing, <60%), or mixed |
| patternStabilityScore | Number\|null | 0..1 weighted-recency score across this domain's run history. Computed only when compareToPrevRun is on |
| catchAllStrategy | Object\|null | Non-null only on catch-all domains. Provides rankedPatterns + recommendedSendOrder + rationale + coverage hint |
| decisionSignals | String[] | Stable, additive-only enum tokens summarising why the decision landed where it did (high-confidence / sample-rich / multi-source / stable-pattern / volatile-pattern / strict-format / catch-all / no-mx / single-source / etc.) |
| negativeSignals | Array | Concrete reasons this record might bounce or burn sender reputation |
| confidenceConflict | Object\|null | Surfaces when signals disagree (high pattern confidence + low stability, single-sample high confidence on a catch-all, etc.) |
| failureContext | Object\|null | When confidence is low or pattern detection fails: confidenceLossReason + retryLikelihood |
| sequenceStrategy | Object\|null | How to use recommendedSequence: single-shot / fallback / progressive |
| driftState | Object\|null | Cross-run drift summary: status (stable/emerging/unstable/unknown), volatilityScore, lastChangeType |
| plainEnglishSummary | String\|null | Step 2's one-line Slack-ready summary (distinct from Step 1's plainEnglishSummary) |
| methodology | String\|null | Disclosure: pattern is heuristic-derived, not produced by a trained model |
| failureType | String\|null | Categorised failure reason from Step 2 (distinct from Step 1's failureType) |
| dataQuality | String\|null | Step 2's reliability indicator: high (5+ emails), medium (2–4), low (1), no-data |
| jsWarning | String\|null | Non-null when company website appears to be JS-rendered SPA AND contributed 0 emails to Step 2 |
| blockedDetected | Boolean\|null | True when company website returned anti-bot block markers AND contributed 0 emails to Step 2 |
| changeSinceLastRun | Object\|null | Step 2's per-domain delta block when monitoring is on (PATTERN_CHANGED, NEW_EMAILS_FOUND, CATCH_ALL_FLIPPED_ON, etc.) |

Step 3 — B2B Lead Qualifier

| Field | Type | Description |
| --- | --- | --- |
| score | Number\|null | Lead quality score 0–100. Null if Step 3 skipped |
| grade | String\|null | Letter grade: A (90–100), B (75–89), C (60–74), D (40–59), F (0–39) |
| scoreBreakdown | Object\|null | Points per category: contactReachability (30), businessLegitimacy (25), onlinePresence (20), websiteQuality (15), teamTransparency (10) |
| signals | Object[] | Individual scoring signals with signal, category, points, detail |
| address | String\|null | Single physical business address (Step 3's extraction). See addresses[] for Step 1's array |
| cmsDetected | String\|null | Detected CMS or framework (WordPress, Shopify, Next.js, etc.) |
| techSignals | String[] | Technologies and tools detected on the website |
| industry | String\|null | Detected industry classification |
| jobCount | Number\|null | Number of open roles found on /careers, /jobs pages (hiring-velocity signal) |
| qualifierAnalysis | Object\|null | Step 3's full v2 decision-engine output, namespaced to avoid collision with Step 1's identically-named fields. Null when Step 3 skipped or no record found for this domain. |

The qualifierAnalysis block contains:

| Field (under qualifierAnalysis) | Type | Description |
| --- | --- | --- |
| summary | String\|null | Plain-English one-line summary (≤280 chars), LLM/CRM-friendly |
| scoreExplanation | String\|null | Plain-English explanation of why this lead got its score |
| confidence | Object\|null | { score, level: 'high' / 'medium' / 'low' / 'very-low', components[] }. Captures signal breadth, crawl depth, source integrity. Distinct shape from Step 1's confidence (different scope) |
| recommendedAction | String\|null | outreach-immediately / add-to-nurture / enrich-then-revisit / manual-review / archive. Step 3's qualifier-axis decision. Different scope from Step 1's sendDecision.action (which is deliverability-axis). Both are useful: read sendDecision for "can I email?", read qualifierAnalysis.recommendedAction for "is this lead worth pursuing?" |
| previousScore | Number\|null | Score from the previous run (null if first run) |
| scoreChange | Number\|null | Change from previous score (positive = improved) |
| changeFlag | String\|null | NEW / IMPROVED / DECLINED / UNCHANGED. Based on previousScore vs current score with ±5 tolerance |
| jsWarning | String\|null | Step 3's JS-warning message. Distinct from Step 1's jsWarning (different scope: Step 3's is signal-extraction completeness, Step 1's is contact-extraction completeness) |
| botProtection | Object\|null | Step 3's bot-protection detection: { detected, vendor } |
| dataGaps | Object[] | Step 3's parallel to Step 1/2's recoveryPlan: [{ field, reason, suggestedFix }], listing missing fields with reasons + suggested upstream actor to fill the gap. Use as automation routing signal |
| agentContract | Object\|null | Flat MCP-ready surface for AI consumers: { decision: 'qualified-A' / 'qualified-B' / 'review' / 'low-priority' / 'reject', confidence, nextAction, costToAct }. AI agents read this directly without traversing the full record |
| recordType | String\|null | Discriminator: result for scored leads, error for failure records (error rows are filtered before reaching the merged output) |
| schemaVersion | String\|null | Output schema version (semver). Bumps on shape changes |
| eventId | String\|null | Idempotent canonical id (sha256 of watchlist+domain). Same id across re-runs of the same domain |
| failureType | String\|null | Step 3's failure enum on error records: transient / auth / rate_limit / not_found / schema_mismatch / bot_blocked / unknown |
| shouldOutreach | Boolean\|null | Convenience boolean: true when recommendedAction === 'outreach-immediately'. Filter on this in spreadsheets to grab the qualified-leads row without parsing the action enum |
| decisionSignals | String[] | Step 3's stable enum tokens (parallel to Step 1's decisionSignals): grade-a/b/c/d/f, high-score/medium-score/low-score, outreach-immediately, new/improved/declined/unchanged, high-confidence/medium-confidence, multi-source, bot-protected, js-partial, has-emails/has-phones/has-contacts, hiring-active, gap-emails, etc. |
| negativeSignals | String[] | Step 3's negatives-only array (parallel to Step 1's negativeSignals): concrete reasons this lead might not convert. Empty array = no concerns |
| confidenceConflict | Object\|null | Step 3's signal-disagreement surface (parallel to Step 1's): { exists, reason } when score and confidence disagree |
| failureContext | Object\|null | Step 3's structured failure context (parallel to Step 1's): { confidenceLossReason, retryLikelihood } |
| methodology | String\|null | Step 3's heuristic-derivation disclosure (parallel to Step 1's methodology) |

Step 4 — Bulk Email Verifier (when verifyEmails: true)

| Field | Type | Description |
| --- | --- | --- |
| verifiedEmails | Object[] | Per-email verification: email, status, confidence, decision, actionId, failureCategory |
| topVerifiedEmail | String\|null | Highest-confidence email graded send or send-with-monitoring |
| topEmailDecision | String\|null | Verifier decision for topVerifiedEmail |
| sendableEmailCount | Number | Count of emails graded send/send-with-monitoring |

When Step 4 is OFF and Step 1 ran with preset: auto (or any preset that includes verification), verifiedEmails is populated from Step 1's basic verification (status + confidence only; decision/actionId/failureCategory will be null).
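The relationship between verifiedEmails[] and the three flat fields can be illustrated with a short sketch. This is not the actor's code: the helper name pick_top_verified is hypothetical, and it assumes the sendable decisions are spelled send and send-with-monitoring as in the table above and that confidence is numeric.

```python
from typing import Optional

# Decisions treated as sendable; spelling follows the field table (assumption).
SENDABLE = {"send", "send-with-monitoring"}


def pick_top_verified(verified_emails: list) -> dict:
    """Derive topVerifiedEmail-style fields from a verifiedEmails[] list."""
    sendable = [e for e in verified_emails if e.get("decision") in SENDABLE]
    # Highest confidence wins; a missing confidence is treated as 0.
    top: Optional[dict] = max(
        sendable, key=lambda e: e.get("confidence") or 0, default=None
    )
    return {
        "topVerifiedEmail": top["email"] if top else None,
        "topEmailDecision": top["decision"] if top else None,
        "sendableEmailCount": len(sendable),
    }
```

When verifiedEmails comes from Step 1's basic verification, decision is null on every entry, so this sketch would return no top email and a count of 0.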

Metadata

| Field | Type | Description |
| --- | --- | --- |
| pipelineSteps | String[] | Which steps completed: contact-scraper, email-pattern-finder, lead-qualifier, bulk-email-verifier |
| processedAt | String | ISO 8601 timestamp when the lead was processed |

Programmatic Access (API)

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Start the suite and wait for the run to finish
run = client.actor("ryanclinton/b2b-lead-gen-suite").call(run_input={
    "urls": ["stripe.com", "hubspot.com", "notion.so"],
    "maxPagesPerDomain": 8,
    "minScore": 50,
})

# grade and score are null when Step 3 is skipped, so fall back explicitly
for lead in client.dataset(run["defaultDatasetId"]).iterate_items():
    grade = lead.get("grade") or "N/A"
    score = lead["score"] if lead.get("score") is not None else "N/A"
    emails = ", ".join(lead.get("emails") or [])
    print(f'{lead["domain"]} [{grade} {score}] - {emails}')

JavaScript

import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

// Start the suite and wait for the run to finish
const run = await client.actor("ryanclinton/b2b-lead-gen-suite").call({
    urls: ["stripe.com", "hubspot.com", "notion.so"],
    maxPagesPerDomain: 8,
    minScore: 50,
});

// grade and score are null when Step 3 is skipped, so fall back explicitly
const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const lead of items) {
    const emails = (lead.emails ?? []).join(", ");
    console.log(`${lead.domain} [${lead.grade ?? "N/A"} ${lead.score ?? "N/A"}] - ${emails}`);
}

cURL

# Start a run
curl -X POST "https://api.apify.com/v2/acts/ryanclinton~b2b-lead-gen-suite/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["stripe.com", "hubspot.com"],
    "maxPagesPerDomain": 8,
    "minScore": 50
  }'

# Fetch results (use defaultDatasetId from the run response)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"

How It Works — Pipeline Architecture

The B2B Lead Generation Suite runs a 3-step sequential pipeline by default, plus an opt-in 4th verification step. Each step calls a dedicated Apify actor via Actor.call() and waits for it to complete before starting the next step. Steps 2 and 3 are skippable; Step 4 is off by default.

B2B Lead Generation Suite (Orchestrator)
    │
    ▼
Step 1 (Required): Website Lead Intelligence
    $0.20 per domain with contact data; $0 for filtered or empty domains
    • Extracts emails, phones, contacts
    • Classifies buying committee (decisionMakers / influencers / champions / blockers)
    • Emits sendDecision per domain (SEND_NOW / VERIFY_FIRST / SKIP / ENRICH_MORE)
    • Generates first-touch line stem
    • Ranks pipelineValue within batch
    • Optional: Pro fallback for JS sites ($0.35/site, only when needed)
    • Optional: monitoring + change detection (changeFlags[])
    • Optional: CRM auto-push (HubSpot / Salesforce / Make / Zapier / n8n)
    │  emails + contact names
    ▼
Step 2 (Optional): Email Pattern Finder
    $0.10 per domain analyzed
    • Receives Step 1 emails as samples
    • Website scraping DISABLED (already scraped in Step 1)
    • GitHub commit search ENABLED
    • Optional: WHOIS / RDAP search (searchWhois)
    • Optional: Hunter.io API (hunterApiKey)
    • Detects naming convention
    • Emits sendDecision per domain (own decision engine, distinct from Step 1's)
    • bounceRiskBucket + emailCulture + recommendedSequence
    • catchAllStrategy on catch-all domains (turns dead-end into send sequence)
    • Generates predicted emails for contacts without addresses
    │  all upstream data
    ▼
Step 3 (Optional): B2B Lead Qualifier
    $0.15 per lead qualified
    • Receives emails, phones, contacts, social links, and pattern data via pipelineData parameter
    • Crawls website for quality signals
    • Scores 0-100 across 5 categories
    • Assigns letter grade A-F
    • Emits recommendedAction enum (outreach-immediately / add-to-nurture / enrich-then-revisit / manual-review / archive)
    • Cross-run change detection (NEW / IMPROVED / DECLINED / UNCHANGED, score history)
    • dataGaps[]: missing fields + next-best actor to fill the gap
    • agentContract: flat MCP-ready surface for AI consumers
    • Optional: scoringProfile (sales/marketing/recruiting)
    • Optional: watchlist (per-list score history)
    • Optional: Slack/Discord webhook (run-completion summary)
    │  all unique emails (discovered + generated)
    ▼
Step 4 (Optional): Bulk Email Verifier (Outbound Control System)
    • Verifies every discovered + pattern-generated email via SMTP/MX/DNS
    • Returns decision per email (send / send-monitor / hold / replace / suppress)
    • Adds failureAnalysis + recommendedAction per address
    • Mode: enrichment-validation
    │
    ▼
Merge + Deduplicate + Sort + Filter
    • Union emails/phones across steps
    • Merge social links (Step 1 priority)
    • Attach per-email verification + decision to every lead
    • topVerifiedEmail + topEmailDecision + sendableEmailCount surfaced flat
    • Sort by score (highest first)
    • Remove leads below minScore

Data Flow Between Steps

Step 1 → Step 2 (Contact Scraper → Pattern Finder):

  • Scraped emails are passed as knownEmails with matched contact names for attribution.
  • Contact names are passed as names for email address generation.
  • Website scraping is disabled (searchWebsite: false) to avoid redundant crawling.
  • GitHub commit search stays enabled as an additional email discovery source.
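The hand-off described in these bullets can be sketched as a small input builder. The keys knownEmails, names, and searchWebsite come from the text above; everything else is an assumption for illustration: the shape of each knownEmails entry, the contacts[] record layout with name and email keys, and the helper name build_step2_input. The GitHub search flag is left out because its input name is not given here.

```python
def build_step2_input(step1_records: list) -> dict:
    """Assemble a Step 2 style input from Step 1 records (illustrative shapes only)."""
    known_emails = []
    names = []
    for record in step1_records:
        contacts = record.get("contacts") or []
        # Map lowercase email -> contact name so scraped emails can be attributed.
        name_by_email = {
            c["email"].lower(): c.get("name") for c in contacts if c.get("email")
        }
        for email in record.get("emails") or []:
            known_emails.append(
                {"email": email, "name": name_by_email.get(email.lower())}
            )
        # Every named contact becomes a candidate for email generation.
        for contact in contacts:
            name = contact.get("name")
            if name and name not in names:
                names.append(name)
    return {
        "knownEmails": known_emails,
        "names": names,
        "searchWebsite": False,  # Step 1 already crawled the site
    }
```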

Steps 1+2 → Step 3 (Both → Lead Qualifier):

  • All upstream data is packaged into a pipelineData array indexed by domain.
  • Each entry includes: emails, phones, contacts, social links, detected pattern, and pattern confidence.
  • The qualifier uses this to enrich its scoring without re-extracting data that was already found.

Steps 1+2+3 → Step 4 (All → Bulk Email Verifier):

  • Every unique email across discovered (Step 1), pattern-generated (Step 2), and qualifier-extracted (Step 3) sources is collected and deduplicated by lowercase form.
  • Sent to the verifier in one batch with mode: "enrichment-validation" (deep SMTP, accept catch-all with monitoring, deliverability simulation on).
  • The verifier returns one record per email with status, confidence, decision, recommendedAction, and failureAnalysis.
  • The orchestrator maps each verified email back to its lead by matching on the canonicalised address.
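A minimal sketch of the collect, deduplicate, and map-back steps above, assuming canonicalisation means trimming and lowercasing. The field name generatedEmails and both helper names are hypothetical; the real orchestrator may structure this differently.

```python
def _addresses(lead: dict) -> list:
    # "generatedEmails" is a hypothetical field for Step 2's predicted addresses.
    return (lead.get("emails") or []) + (lead.get("generatedEmails") or [])


def collect_unique_emails(leads: list) -> list:
    """Collect every email across leads, deduplicated by lowercase form."""
    seen = set()
    unique = []
    for lead in leads:
        for email in _addresses(lead):
            key = email.strip().lower()
            if key and key not in seen:
                seen.add(key)
                unique.append(key)
    return unique


def attach_verification(leads: list, verifier_records: list) -> list:
    """Map each verifier record back to its lead by canonicalised address."""
    by_email = {r["email"].strip().lower(): r for r in verifier_records}
    for lead in leads:
        # dict.fromkeys removes duplicates while keeping first-seen order.
        keys = dict.fromkeys(a.strip().lower() for a in _addresses(lead))
        lead["verifiedEmails"] = [by_email[k] for k in keys if k in by_email]
    return leads
```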

Merge Phase:

  • Emails from Steps 1 and 3 are unioned via a Set for deduplication.
  • Phones from Steps 1 and 3 are unioned via a Set for deduplication.
  • Social links from Step 1 take priority; Step 3 links fill in missing platforms only.
  • Pattern data and scoring data are attached from their respective steps (null if skipped).
  • Verification data attaches as verifiedEmails[] per lead, plus flat topVerifiedEmail / topEmailDecision / sendableEmailCount for cadence-tool branching. The top verified email is the highest-confidence address graded send or send-with-monitoring.
  • Results are sorted by score descending, then filtered by minScore.
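The merge rules above can be sketched for a single domain as follows. This is an illustration under assumptions, not the orchestrator's implementation: socialLinks as a platform-to-URL mapping is a hypothetical shape, and a null score is treated as 0 when sorting and filtering.

```python
from typing import Optional


def merge_lead(step1: dict, step3: Optional[dict]) -> dict:
    """Merge one domain's Step 1 and Step 3 records as the bullets describe."""
    step3 = step3 or {}
    merged = dict(step1)
    # Union emails and phones; sorted() only makes the output deterministic.
    for field in ("emails", "phones"):
        merged[field] = sorted(
            set(step1.get(field) or []) | set(step3.get(field) or [])
        )
    # Step 1 links win; Step 3 only fills platforms that Step 1 lacks.
    merged["socialLinks"] = {
        **(step3.get("socialLinks") or {}),
        **(step1.get("socialLinks") or {}),
    }
    merged["score"] = step3.get("score")  # None when Step 3 was skipped
    merged["grade"] = step3.get("grade")
    return merged


def sort_and_filter(leads: list, min_score: int = 0) -> list:
    """Sort by score descending, then drop leads below min_score."""
    ranked = sorted(leads, key=lambda lead: lead.get("score") or 0, reverse=True)
    return [lead for lead in ranked if (lead.get("score") or 0) >= min_score]
```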

Scoring Reference

When Step 3 (Lead Qualifier) runs, each lead is scored across five categories with point caps:

| Category | Max Points | What It Measures |
| --- | --- | --- |
| Contact Reachability | 30 | Email addresses, phone numbers, contact form availability |
| Business Legitimacy | 25 | Physical address, about page, privacy policy, CMS/tech presence |
| Online Presence | 20 | Social media profiles across platforms |
| Website Quality | 15 | SSL, modern CMS, analytics, live chat tools |
| Team Transparency | 10 | Named team members with titles, team/about pages |

Grade scale: A (90-100), B (75-89), C (60-74), D (40-59), F (0-39)
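As a quick check on the numbers above, the five caps sum to 100 and the grade bands map onto the total like this. The function names are hypothetical, and clamping each category to its cap is an assumption about how the caps apply.

```python
# Category caps from the scoring table, keyed by the scoreBreakdown field names.
CATEGORY_CAPS = {
    "contactReachability": 30,
    "businessLegitimacy": 25,
    "onlinePresence": 20,
    "websiteQuality": 15,
    "teamTransparency": 10,
}


def total_score(breakdown: dict) -> int:
    """Sum category points, clamping each category to the range 0..cap."""
    return sum(
        min(max(breakdown.get(name, 0), 0), cap)
        for name, cap in CATEGORY_CAPS.items()
    )


def grade_for(score: float) -> str:
    """Map a 0-100 score onto the published grade scale."""
    for floor, letter in ((90, "A"), (75, "B"), (60, "C"), (40, "D")):
        if score >= floor:
            return letter
    return "F"
```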

How Much Does It Cost?

The B2B Lead Generation Suite is an orchestrator that calls up to four sub-actors. Each sub-actor is billed pay-per-event by Apify directly to your account — the orchestrator itself does not charge a per-lead fee.

Step 1 — Website Lead Intelligence (always runs): $0.20 per domain with contact data. Domains where no contact data is found are not charged. Filtered domains (requirePersonalEmail, minLeadScore, autoFilter) are also not charged. Optional Pro fallback for JS-heavy sites: $0.35/site, only when triggered.

Step 2 — Email Pattern Finder (optional): $0.10 per domain analyzed. Filtered records (when autoFilter excludes them) are not charged.

Step 3 — B2B Lead Qualifier (optional): $0.15 per lead qualified. Filtered records (when minScore excludes them) are not charged.

Step 4 — Bulk Email Verifier (optional, off by default): sub-actor compute, billed against this run.

| Configuration | Per kept lead |
| --- | --- |
| Full pipeline with verification (all 4 steps) | $0.45 (Steps 1+2+3) + small compute from Step 4 |
| Steps 1+2+3 (skip email verification) | $0.45 |
| Steps 1+3 (skip Pattern Finder + Verifier) | $0.35 |
| Steps 1+2 (skip Qualifier + Verifier) | $0.30 |
| Step 1 only (skip Pattern + Qualifier + Verifier) | $0.20 |

Domains with no contacts found in Step 1, or records filtered out by requirePersonalEmail / minLeadScore / companyTypes / autoFilter, are not billed — you only pay for leads you actually keep. Step 2 only runs on domains that survived Step 1 filtering.

Worked example — 100 URLs through the default 3-step pipeline

Assume 100 domains in, ~70% return contact data after filtering (typical B2B):

| Step | Charged at | Domains charged | Cost |
| --- | --- | --- | --- |
| Step 1 (Website Lead Intelligence) | $0.20 / domain with contact data | 70 | $14.00 |
| Step 2 (Email Pattern Finder) | $0.10 / domain analyzed | 70 (Step 1 survivors) | $7.00 |
| Step 3 (B2B Lead Qualifier) | $0.15 / lead qualified | 70 | $10.50 |
| Total | | | $31.50 |

That's $0.45 per kept lead ($31.50 / 70), or about $0.32 per input URL across the 100-domain batch. Set verifyEmails: true and Step 4 adds verifier sub-actor compute (no pay-per-event charge). Set enableProFallback: true and JS-blocked domains add $0.35 each, typically 0–10% of a B2B batch.
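The arithmetic in the worked example can be reproduced with a small estimator. It uses the per-step prices listed above, works in integer cents to avoid floating-point drift, and ignores Step 4 compute and Pro fallback charges; the function name and the keep_rate parameter are illustrative.

```python
# Per-event prices in US cents, taken from the step list above.
STEP_PRICE_CENTS = {1: 20, 2: 10, 3: 15}


def estimate_cost(input_domains: int, keep_rate: float, steps=(1, 2, 3)) -> dict:
    """Estimate spend when only kept domains are charged at every step."""
    charged = round(input_domains * keep_rate)
    total_cents = sum(STEP_PRICE_CENTS[s] for s in steps) * charged
    return {
        "charged": charged,
        "total": total_cents / 100,
        "per_kept_lead": total_cents / charged / 100 if charged else 0.0,
        "per_input_domain": total_cents / input_domains / 100 if input_domains else 0.0,
    }


# 100 domains in, 70% kept, default 3-step pipeline
print(estimate_cost(100, 0.7))
```

For the 100-domain example this gives a total of 31.5, with 0.45 per kept lead and 0.315 per input domain. Passing steps=(1,) or steps=(1, 3) reproduces the other rows of the configuration table.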

The orchestrator itself uses 256 MB of memory and minimal compute. Apify's free tier includes $5 of monthly platform credits.

Tips

  1. Start with a small batch when testing a new set of domains. Run 3–5 URLs through the full pipeline first to verify the output quality before processing hundreds.

  2. Use goal instead of preset + confidenceMode separately. Set goal: "quick-outreach" / "high-deliverability" / "max-coverage" and the suite picks sensible defaults for both Step 1 dials. Manual settings still override.

  3. Branch automation on sendDecision.action, not the prose. The action enum (SEND_NOW / VERIFY_FIRST / SKIP / ENRICH_MORE) is a stable contract. The plain-English reasons[] and plainEnglishSummary are for humans — never parse them.

  4. Set autoFilter: "send-now-only" and the dataset only contains green-light leads. Drops the entire SKIP / ENRICH_MORE pile so your downstream tools don't have to filter again.

  5. Use the minimum score filter to focus on high-quality leads. Setting minScore to 50 or higher eliminates domains with thin contact information or low business legitimacy signals.

  6. Skip the qualifier for speed when you already know the companies are legitimate (e.g., a curated list from LinkedIn Sales Navigator). Step 1's decision.tier (A/B/C) is already an outreach-readiness signal — Step 3's score adds website-quality on top.

  7. Increase pages per domain for large enterprise websites where contacts may be buried deep. Setting maxPagesPerDomain to 10–15 finds more email addresses on sites with complex navigation structures.

  8. Schedule weekly runs with compareToPrevRun: true to turn the suite into a self-maintaining lead database. New hires, departures, and tier upgrades surface as changeFlags[] automatically.

  9. Push directly to your CRM with crmWebhookUrl to skip the manual export step. Step 1 POSTs each enriched lead in HubSpot/Salesforce field shape (or generic JSON for Make/Zapier/n8n). Add crmOnlyTierA: true to keep only ready-to-email leads in your CRM.

  10. Generate outreach-tool CSVs in one step. Set exportFormats: ["instantly", "smartlead", "apollo"] and Step 1 writes ready-to-import CSVs to the run's key-value store. Download from the Storage tab and drop straight into your sequence.

Limitations

  • HTML-only by default. Steps 1 and 3 use CheerioCrawler (HTML parsing without a browser). For JavaScript-heavy or Cloudflare/DataDome/Akamai-protected sites, set enableProFallback: true — Step 1 auto-retries those domains via Website Contact Scraper Pro (real-browser rendering, $0.35/site, only triggered when JS is detected AND no contacts were found on the first pass).
  • Email verification is opt-in. Set verifyEmails: true to enable Step 4 (Bulk Email Verifier — Outbound Control System) which validates every discovered + pattern-generated email and attaches a decision (send / send-monitor / hold / replace / suppress), recommendedAction, and failureCategory per address. Default is off to keep cost / runtime predictable for users who only want raw lead data. Note: Step 1's preset: auto already runs basic verification (status + confidence per email) — Step 4 adds the richer routing decisions on top.
  • Sequential pipeline. Steps run one after another, not in parallel within the suite. Step 1 itself processes domains in parallel internally. The default 3-step pipeline takes 30–90 seconds per domain; adding Step 4 (verification) extends this to 30–120 seconds. Large batches (500+ domains) may approach timeout limits.
  • Sub-actor dependency. This actor calls four Apify actors by name. If any sub-actor is temporarily unavailable or returns unexpected output, that step may fail. Optional steps (2, 3, and 4) fail gracefully; Step 1 (Website Lead Intelligence, formerly Website Contact Scraper) is required.
  • Pattern detection needs samples. Email pattern detection accuracy depends on how many sample emails are found in Step 1. Domains with zero or one discovered email produce low-confidence or no pattern results.
  • Scoring is deterministic. Lead scores are based on observable website signals (presence of emails, social links, team pages, etc.), not AI analysis. The score reflects data availability, not company quality.
  • Monitoring requires opt-in baseline. Set compareToPrevRun: true and the first run establishes a baseline — every domain is flagged NEW_DOMAIN. From the second run on, deltas surface as changeFlags[] and changeSinceLastRun. Pair with Apify Schedules for daily/weekly monitoring.

Responsible Use

This actor collects publicly visible information from company websites. Follow these guidelines:

  • Comply with applicable laws. Check GDPR, CAN-SPAM, CCPA, and local regulations before using collected data for outreach. Presence of contact information on a website does not constitute consent to receive marketing communications.
  • Respect robots.txt and rate limits. The actor uses CheerioCrawler with configurable concurrency and rate limiting. Default settings are conservative, but consider lowering crawl depth for websites that explicitly restrict scraping.
  • Generated emails are predictions. Emails produced by the Pattern Finder are algorithmic guesses based on detected naming conventions. They should be verified before use and should never be used for bulk unsolicited messaging.
  • Do not scrape sensitive sites. Avoid using this tool on government agencies, healthcare providers, educational institutions, or other organizations where automated data collection may violate terms of service or regulations.

FAQ

How long does the full pipeline take per domain? With default settings (5 pages per step), the default 3-step pipeline typically completes in 30–90 seconds per domain. Enabling Step 4 (verification) extends this to 30–120 seconds. Processing runs sequentially through the steps, so skipping optional steps reduces run time proportionally.

Can I process hundreds of domains in one run? Yes. The actor passes the full list of URLs to each sub-actor, which processes them in parallel internally. Very large batches (500+ domains) may approach the default 2-hour timeout. For extremely large lists, consider splitting into batches of 100–200 domains.
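Splitting a large list into batches, as suggested above, can be done before calling the actor. A minimal sketch; the helper name and the default of 150 are arbitrary choices within the suggested 100–200 range.

```python
def chunk_domains(urls: list, batch_size: int = 150) -> list:
    """Split a URL list into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]
```

Each batch can then be passed as the urls input of a separate run.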

What happens if one sub-actor fails? Step 1 (Website Lead Intelligence, formerly Website Contact Scraper) is required; if it fails entirely, the run aborts. The Email Pattern Finder, Lead Qualifier, and Bulk Email Verifier are optional: if any of them fails, the pipeline continues and outputs the data from the steps that succeeded. Individual domain failures within a step do not block other domains.

What does Step 1 actually produce? Step 1 (Website Lead Intelligence) is a send-decision engine. Per domain it returns: a sendDecision action enum (SEND_NOW / VERIFY_FIRST / SKIP / ENRICH_MORE), a sendPlan (channel, safeToAutomate, follow-up strategy), pipelineValue (relative rank within the batch), firstTouch (opening-line stem), buyingCommittee (decisionMakers/influencers/champions/blockers), plainEnglishSummary, plus the underlying contacts/emails/phones/socials. Branch automation on sendDecision.action, never on the prose.

How is Step 1's decision tier different from Step 3's score and grade? Step 1's decision.tier (A/B/C) measures outreach readiness — verified personal email + senior contact = A. Step 3's score (0–100) and grade (A–F) measure website quality + business legitimacy signals. They're different axes; both are useful. Filter on Step 1 to gate cold outreach, filter on Step 3 to remove parked domains and shells.

Is the email pattern detection accurate? The Email Pattern Finder analyses discovered emails to reverse-engineer the naming convention. Confidence scores above 0.7 are generally reliable. The patternAnalysis.confidenceLevel band (high ≥ 0.75, medium ≥ 0.5, low < 0.5) and bounceRiskBucket give you stable filters. The accuracy depends on how many sample emails were found — more samples mean higher confidence. Generated emails are predictions, not verified addresses.
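The banding quoted above is simple enough to restate as code, which can help when filtering on the raw confidence number instead of the band. The function name is hypothetical and the thresholds are the ones given in the answer.

```python
def confidence_level(confidence: float) -> str:
    """Band a 0..1 pattern confidence: high >= 0.75, medium >= 0.5, else low."""
    if confidence >= 0.75:
        return "high"
    if confidence >= 0.5:
        return "medium"
    return "low"
```

Note that a confidence of 0.7 sits in the medium band under these thresholds.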

Why does Step 2 emit its own sendDecision? Isn't that Step 1's job? Both sub-actors emit sendDecision, but they answer different questions. Step 1 asks "should I email this domain right now?" — based on whether a verified personal email + senior contact exist. Step 2 asks "should I trust the detected pattern enough to send to generated emails?" — based on sample count, source diversity, catch-all status, MX validity, and pattern stability. Both useful, different scopes. The suite namespaces Step 2's decision-engine output under patternAnalysis to avoid collision: read sendDecision for Step 1, patternAnalysis.sendDecision for Step 2.

What's recommendedSequence for? When Step 2 detects a pattern but has multiple plausible candidates (e.g., first.last@ 0.6 confidence, flast@ 0.4, first@ 0.3), patternAnalysis.recommendedSequence returns them ranked. Cold-email tools like Instantly and Smartlead can use this for bounce-retry strategies: try the primary first; on bounce, try the next pattern. patternAnalysis.sequenceStrategy tells you how to use it (single-shot / fallback / progressive).
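A bounce-retry loop over recommendedSequence might look like the sketch below. The pattern names (first.last, flast, first) are taken from the example in the answer, but treating the sequence as a list of such strings is an assumption, as are both helper names; the actual template syntax may differ.

```python
from typing import Optional


def render(pattern: str, first: str, last: str, domain: str) -> Optional[str]:
    """Render one pattern name into an address; None for unknown patterns."""
    first, last = first.strip().lower(), last.strip().lower()
    local_parts = {
        "first.last": f"{first}.{last}",
        "flast": f"{first[:1]}{last}",
        "first": first,
    }
    local = local_parts.get(pattern)
    return f"{local}@{domain}" if local else None


def next_candidate(sequence: list, bounced: set, first: str, last: str,
                   domain: str) -> Optional[str]:
    """Return the highest-ranked candidate that has not bounced yet."""
    for pattern in sequence:
        candidate = render(pattern, first, last, domain)
        if candidate and candidate not in bounced:
            return candidate
    return None
```

Calling next_candidate again after adding a bounced address to the set moves on to the next pattern, which corresponds to the fallback strategy.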

What's catchAllStrategy and when does it appear? On catch-all domains (where SMTP verification accepts every address), patternAnalysis.catchAllStrategy is non-null and provides a ranked send order with rationale instead of just flagging the domain dead. Pair with patternAnalysis.recommendedSequenceWithScores for probabilistic fallback. This turns catch-all domains from "skip" into "actionable send sequence with real risk-adjusted ordering."

Step 3 emits its own recommendedAction. How does it differ from Step 1's sendDecision.action? Different decision axes. Step 1's sendDecision.action (SEND_NOW / VERIFY_FIRST / SKIP / ENRICH_MORE) is the deliverability answer: "can I email this domain right now without burning sender reputation?". Step 3's recommendedAction (outreach-immediately / add-to-nurture / enrich-then-revisit / manual-review / archive) is the prioritization answer: "is this lead worth pursuing at all?" — based on website-quality + business-legitimacy signals. Both useful, different scopes. For full automation: filter on sendDecision.action === 'SEND_NOW' AND qualifierAnalysis.recommendedAction === 'outreach-immediately'.
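The combined filter from the last sentence, written for dataset items in Python. It reads only the two enum fields named above and tolerates either block being null (for example when Step 3 was skipped); the function name is illustrative.

```python
def is_auto_sendable(lead: dict) -> bool:
    """True only when both the deliverability and qualifier axes say go."""
    send_action = (lead.get("sendDecision") or {}).get("action")
    qualifier_action = (lead.get("qualifierAnalysis") or {}).get("recommendedAction")
    return send_action == "SEND_NOW" and qualifier_action == "outreach-immediately"
```

Used as `[lead for lead in items if is_auto_sendable(lead)]`, this keeps only leads that pass both checks.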

What's scoringProfile and when should I switch from default? The qualifier scores leads across 5 categories with default weights. scoringProfile: 'sales' re-weights toward contact reachability + decision makers (use for outbound sales lists). scoringProfile: 'marketing' re-weights toward online presence + website quality (use for ABM / content syndication targets). scoringProfile: 'recruiting' re-weights toward team transparency + contact info (use when prospecting candidates' employers).

What's dataGaps[] and how is it different from Step 1's recoveryPlan? Both fill the same role: "this record is incomplete, here's how to recover." Step 1's recoveryPlan is a single object pointing at a specific next-best actor. Step 3's qualifierAnalysis.dataGaps[] is an ARRAY of { field, reason, suggestedFix } entries — multiple gaps surfaced individually so automation can branch per gap (e.g., missing email → run Email Pattern Finder; missing phone → run Phone Number Finder; missing social links → run a different actor).

What's qualifierAnalysis.agentContract and how do I use it? A flat MCP-ready surface for AI agents: { decision: 'qualified-A' / 'qualified-B' / 'review' / 'low-priority' / 'reject', confidence: 0-100, nextAction: <RecommendedAction enum>, costToAct: <USD> }. Agents read this directly without traversing score, grade, recommendedAction, scoreBreakdown separately. Pair with Step 1's sendDecision for a complete agent decision surface in two field reads.

How does the per-watchlist scoreChange + changeFlag work? Set watchlistName: "tier-1-prospects" on a scheduled run. Step 3 stores per-domain score history under that watchlist key. On the next run, every record gets previousScore + scoreChange (delta) + changeFlag (NEW / IMPROVED / DECLINED / UNCHANGED, with ±5 tolerance). Use changeFlag === 'IMPROVED' to surface accounts that just got hotter; use changeFlag === 'DECLINED' to flag churn risk. Run separate watchlists (tier-1-prospects vs churn-risk-accounts) to maintain independent histories.
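The flag logic described above can be sketched as follows. One detail is an assumption: a change of exactly 5 points in either direction is treated as UNCHANGED, since the text only says "±5 tolerance". The function name is hypothetical.

```python
from typing import Optional


def change_flag(previous: Optional[float], current: float, tolerance: float = 5) -> dict:
    """Compute previousScore, scoreChange, and changeFlag for one domain."""
    if previous is None:
        # First time this domain appears in the watchlist.
        return {"previousScore": None, "scoreChange": None, "changeFlag": "NEW"}
    delta = current - previous
    if delta > tolerance:
        flag = "IMPROVED"
    elif delta < -tolerance:
        flag = "DECLINED"
    else:
        flag = "UNCHANGED"
    return {"previousScore": previous, "scoreChange": delta, "changeFlag": flag}
```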

What data is passed between pipeline steps? The orchestrator feeds data forward intelligently. Step 1 emails become "known samples" for Step 2 pattern detection. Contact names from Step 1 become candidates for Step 2 email generation. All upstream data (emails, phones, contacts, social links, patterns) feeds into Step 3 via a pipelineData parameter so the qualifier doesn't re-extract data already found. All unique emails (discovered + generated) feed into Step 4.

What's enableProFallback and when should I use it? By default Step 1 uses CheerioCrawler (HTML parsing, fast, cheap). When a target site is JavaScript-heavy (React/Next.js/Vue) or sits behind Cloudflare/DataDome/Akamai, Cheerio can return empty results. Set enableProFallback: true and Step 1 auto-retries those specific domains via Website Contact Scraper Pro (real-browser rendering, $0.35/site). It only triggers when JS is detected AND no contacts were found — you don't pay $0.35 on every domain.

How does monitoring mode work? Set compareToPrevRun: true and Step 1 stores a per-domain snapshot in a key-value store. The first run establishes the baseline (every domain gets NEW_DOMAIN flag). From the second run on, Step 1 diffs against the prior baseline and emits changeFlags[] (NEW_TEAM_HIRE / TIER_UPGRADED / TEAM_DEPARTURE / etc.) plus a changeSinceLastRun delta block per domain. Pair with Apify Schedules for daily/weekly automation. Re-runs on the same input list use the same baseline automatically; override monitorStateKey if your input list shifts but you want to maintain history.

Can I run just the contact scraper without the other steps? Yes. Set skipEmailPatternFinder: true and skipLeadQualifier: true (Step 4 is already off by default). This gives you Step 1's full send-decision output (sendDecision, buyingCommittee, firstTouch, etc.) at $0.20 per domain with contact data. Alternatively, use Website Lead Intelligence (the Step 1 sub-actor) directly.
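As a sketch, the Step 1 only configuration is just a run input with both skip flags set. The input names are the ones used in this README; the helper name and the default of 8 pages (borrowed from the API examples) are illustrative.

```python
def step1_only_input(urls, max_pages: int = 8) -> dict:
    """Build a run input that skips Steps 2 and 3 (Step 4 is off by default)."""
    return {
        "urls": list(urls),
        "maxPagesPerDomain": max_pages,
        "skipEmailPatternFinder": True,
        "skipLeadQualifier": True,
    }
```

The resulting dict can be passed as run_input to the same client.actor(...).call(...) shown in the Programmatic Access section.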

Integrations

The B2B Lead Generation Suite works with the full Apify platform ecosystem:

  • Apify API -- Trigger pipeline runs programmatically and retrieve enriched leads as JSON via https://api.apify.com/v2/acts/ryanclinton~b2b-lead-gen-suite/runs. Build automated prospecting workflows that enrich new domains nightly.

  • Zapier -- Connect to 5,000+ apps. Trigger a lead enrichment run when a new company is added to your CRM, then push the scored results back into Salesforce, HubSpot, or Pipedrive automatically.

  • Make (Integromat) -- Build multi-step workflows that take prospect lists from Google Sheets, run them through the pipeline, filter by score, and route qualified leads into outreach sequences.

  • Google Sheets -- Export the enriched dataset directly to Google Sheets. Key fields like domain, emails, score, grade, and contacts map cleanly into spreadsheet columns for team review and collaboration.

  • Webhooks -- Configure a webhook URL to receive enriched leads as soon as the pipeline completes, enabling real-time lead routing into custom applications.

  • Scheduled Runs -- Set up daily or weekly schedules to process new batches of prospect domains automatically. Combine with the Apify dataset API to stream results into your data warehouse.

Build a complete B2B sales intelligence stack by combining this actor with other tools from ryanclinton on the Apify Store:

| Actor | What It Does | How It Complements This Suite |
| --- | --- | --- |
| Website Lead Intelligence (formerly Website Contact Scraper) | Send-decision engine: extracts emails + verifies + ranks decision-makers + classifies buying committee + generates first-touch line. SEND_NOW / VERIFY_FIRST / SKIP / ENRICH_MORE per domain | Step 1 of this pipeline. Use standalone when you only need send-ready leads without scoring or pattern detection ($0.20/domain with contact data) |
| Email Pattern Finder | Detects email naming convention + emits its own send-decision engine (sendDecision, bounceRiskBucket, catchAllStrategy, recommendedSequence, emailCulture, driftState). $0.10/domain analyzed | Step 2 of this pipeline. Use standalone when you already have emails or want pattern intelligence without Step 1's full contact crawl |
| B2B Lead Qualifier | Scores 0-100 + A-F grade across 5 weighted categories. Emits recommendedAction enum, cross-run change detection, dataGaps[] routing, agentContract for MCP consumers, scoringProfiles (sales/marketing/recruiting), per-watchlist score history. $0.15/lead qualified | Step 3 of this pipeline. Use standalone to score pre-existing lead lists or to set up watchlist monitoring with score-change alerts |
| Phone Number Finder | Finds mobile + direct dial numbers, decides who to call (P1–P4 SLA tier), predicts call outcome | Run downstream of this suite to enrich the discovered contacts with personal phone numbers + dialler-ready routing decisions ($0.10/found) |
| Lead Scoring Engine | Scores existing leads on ICP fit + intent + economics; outputs decision (qualify / nurture / disqualify) | Run downstream to apply ICP-based scoring (different axis from this suite's website-quality grade) |
| Person Enrichment Lookup | Multi-source person enrichment (PDL + heuristics) that fills name/title/email/phone gaps per individual contact | Run downstream when this suite returns named contacts but missing email/phone |
| Bulk Email Verifier | Outbound Control System: verifies email deliverability AND emits routing decisions (send / send-monitor / hold / replace / suppress) plus SLA tier, automation triggers, and deliverability simulation | Run downstream to verify discovered + pattern-generated emails AND get cadence-tool-ready routing primitives in one call |
| HubSpot Lead Pusher | Pushes leads into HubSpot CRM | Auto-create contacts and companies from enriched pipeline output |
| Company Deep Research Agent | Generates comprehensive company intelligence reports | Deep-dive research on your highest-scoring leads |
| Google Maps Lead Enricher | Enriches Google Maps listings with contact data | Combine local business data with this pipeline for local lead gen |
| Website Tech Stack Detector | Identifies frameworks and tools used by a website | Tailor sales pitches based on prospect technology stack |
| WHOIS Domain Lookup | Domain registration, registrar, and expiration data | Verify domain age and ownership for lead qualification |
Waterfall Contact EnrichmentMulti-source contact enrichment with fallback cascadeSupplement pipeline output with additional contact discovery
Lead Enrichment Pipeline6-step enrichment: email, phone, verify, company, score, CRM pushDeeper enrichment on leads this suite produces — the full Clay alternative
AI Outreach PersonalizerAI-generated personalized cold emails via BYOK OpenAI/AnthropicGenerate outreach copy for scored leads using their company context
Intent Signal TrackerTracks hiring, tech, funding, and content signals per companyPrioritize which companies to run through this pipeline by buying intent
Lead Data Quality AuditorScores email, phone, domain, and completeness quality per leadAudit pipeline output before outreach to filter bad data
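Several of the downstream actors above are meant for only the best leads this suite produces. One way to pick those leads from the output dataset is sketched below. The record keys `domain` and `score` are assumed from the key fields named in this README, and the exact output schema may differ. The `top_leads` helper and the threshold of 70 are purely illustrative.

```python
from typing import Iterable, List


def top_leads(leads: Iterable[dict], min_score: float = 70, limit: int = 10) -> List[str]:
    """Return the domains of the highest-scoring leads, best first.

    Records without a numeric "score" or without a "domain" are skipped.
    """
    qualified = [
        lead
        for lead in leads
        if isinstance(lead.get("score"), (int, float))
        and lead["score"] >= min_score
        and lead.get("domain")
    ]
    # Highest score first, then cut to the requested batch size.
    qualified.sort(key=lambda lead: lead["score"], reverse=True)
    return [lead["domain"] for lead in qualified[:limit]]


if __name__ == "__main__":
    # Made-up sample records, only to show the expected input shape.
    sample = [
        {"domain": "a.example", "score": 90},
        {"domain": "b.example", "score": 50},
        {"domain": "c.example"},
    ]
    print(top_leads(sample))
```

The returned list of domains can then be used as input to whichever downstream actor you run next, in whatever input format that actor expects.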