Website Lead Intelligence: Find the Right Person, Verified


Pricing: $150.00 / 1,000 websites scanned


Turn company websites into ready-to-email decision-makers. Verified emails, named contacts ranked by seniority, buying-committee classification, A/B/C decision tiers, and optional scheduled monitoring + CRM auto-push to HubSpot/Salesforce. $0.15/site.


Rating: 3.1 (3)

Developer: ryan clinton

Maintained by Community

Actor stats: 3 bookmarked, 371 total users, 43 monthly active users, last modified 4 days ago


Website Lead Intelligence

Previously Website Contact Scraper: same actor, now a send-decision engine rather than just a scraper.

Paste company websites → get a send-ready list you can safely email in under 60 seconds. $15 for 100 companies.

Stop wasting hours building lead lists you still don't trust.

This is not a database or a scraper. It is a send-decision engine:

  • Input: company domains
  • Output: decision-makers + verified emails + whether you can send

A send-decision engine turns company domains into a ranked, verified outreach list with a clear next action per lead.

Website Lead Intelligence is a send-decision engine, not a database or enrichment tool.

Every domain ships with a single action (SEND_NOW, VERIFY_FIRST, SKIP, or ENRICH_MORE) plus a plain-English one-liner explaining why.

  • ⚡ First usable lead: ~5 seconds
  • ⚡ Full send-ready list: ~60 seconds
  • 📊 Typical run: 100 domains → 40–60 usable leads in ~60 seconds
  • 📈 Processes up to 500 domains per run. Typical B2B email hit rate: 60–80%.
  • 💰 $0.15 per website with contact data. Filtered or empty domains: not charged.
  • 🧠 Verified emails + named decision-makers + buying committee + next action per lead, extracted live and never from a database

Paste 100 company websites
↓
Get 40–60 send-ready leads with verified emails
↓
Start outreach in minutes, for $15 or less

What this replaces (common workflows)

Find emails from company websites β€” including decision-makers β€” and verify whether each one is safe to use before sending.

Verify emails before cold outreach: know whether each address is safe to send to, or likely to bounce, without separate tools or manual checks.

Outbound lead generation from company websites: turn a list of domains into a send-ready outreach list.

  • Find emails from company websites
  • Identify decision-makers for outreach
  • Verify emails before sending cold emails
  • Build a cold outreach list from a list of domains
  • Decide who to email first when you have a list of target companies
  • Stop a bad email from going to a catch-all domain that will damage your sender reputation

What this tool does (in one-line truths)

  • Finds decision-makers from company websites
  • Verifies whether each email is safe to use
  • Tells you if you can send, should verify, or should skip
  • Ranks who to contact first within your batch
  • Gives you the opening-line angle per lead
  • Outputs a send-ready outreach list in ~60 seconds for $0.15 per company

Website Lead Intelligence outputs a send-ready list: every lead is verified, ranked, and assigned a clear next action (SEND_NOW / VERIFY_FIRST / SKIP / ENRICH_MORE).

What your final list actually looks like

Input:

{ "urls": ["https://stripe.com", "https://shopify.com", "https://nike.com"], "goal": "quick-outreach" }

Output (one record per domain, sorted by pipeline-value rank):

Stripe      SEND_NOW (low risk, rank #1)
  → John Smith, Head of Partnerships
  → john@stripe.com (verified, 92% confidence)
  → Plain English: "Best person to email at Stripe is John Smith
    (Head of Partnerships). Email is verified and safe. You can
    reach out now."
  → First Touch (opening-line stem):
      • angle: revenue-side - pitch the partnership/pipeline angle
      • hook: "Stripe appears to be scaling outbound - partnerships
        role suggests external pipeline expansion"
      • line: "Saw your head of partnerships role at Stripe - quick
        idea on the partnership / pipeline / growth side"
  → Why this lead exists:
      • Partnerships role present → likely open to external collaboration
      • Sales function exists → outbound motion likely
  → Send Plan: ready, channel=email-first, safeToAutomate=true,
    followUp="2 follow-ups, 3 days apart, then mark not interested"
  → Pipeline Value: 1.0 relativeScore (top of batch), rank #1

Shopify     VERIFY_FIRST (medium risk, rank #4)
  → Sarah Chen, Director of Sales
  → sarah@shopify.com (catch-all domain detected)
  → Plain English: "Shopify looks promising but the domain accepts
    mail to any address (catch-all). Run a final verification pass
    before sending."
  → Send Plan: verify-first, channel=email-first, safeToAutomate=false,
    followUp="After verification: 2 follow-ups, 3 days apart"

Nike        ENRICH_MORE (high risk, no rank)
  → No personal email found: only a contact form + info@nike.com
  → Plain English: "No contact data found on Nike's public website.
    The recovery plan suggests next-best tools."
  → Recovery plan: Person Data Enrichment via LinkedIn (search by
    company + role, medium confidence)
  → Send Plan: enrich-more, channel=no-channel, safeToAutomate=false

Branch your automation on sendDecision.action and sendPlan.safeToAutomate. The plain-English summary and first-touch line drop straight into Slack messages, emails, and agent prompts.

Set autoFilter: 'send-now-only' and the dataset contains only green-light leads, with nothing to skip past.
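
A minimal Python sketch of that branching, done client-side on items you have already downloaded. It assumes each dataset item carries the nested fields named above (sendDecision.action, sendPlan.safeToAutomate) and, for the safe-only case, a risk level at sendDecision.risk; that location is a guess, as are the queue names returned by route. apply_auto_filter only approximates the autoFilter input option, and the actor's own rules may differ.

```python
from typing import Any

Item = dict[str, Any]
ACTIONS = {"SEND_NOW", "VERIFY_FIRST", "SKIP", "ENRICH_MORE"}


def _action(item: Item) -> Any:
    return (item.get("sendDecision") or {}).get("action")


def route(item: Item) -> str:
    """Map one dataset item to a hypothetical downstream queue."""
    action = _action(item)
    safe = bool((item.get("sendPlan") or {}).get("safeToAutomate"))
    if action not in ACTIONS:
        return "manual-review"  # missing or unknown action: never automate
    if action == "SEND_NOW":
        return "auto-sequence" if safe else "manual-send"
    if action == "VERIFY_FIRST":
        return "verification-queue"
    if action == "ENRICH_MORE":
        return "enrichment-queue"
    return "discard"  # SKIP


def apply_auto_filter(items: list[Item], mode: str = "none") -> list[Item]:
    """Client-side approximation of the autoFilter options."""
    if mode == "none":
        return list(items)
    if mode == "send-now-only":
        return [i for i in items if _action(i) == "SEND_NOW"]
    if mode == "safe-only":
        # Assumption: the risk level is stored at sendDecision.risk.
        return [
            i for i in items
            if _action(i) == "SEND_NOW" and (i.get("sendDecision") or {}).get("risk") == "low"
        ]
    if mode == "max-leads":
        return [i for i in items if _action(i) != "SKIP"]
    raise ValueError(f"unknown autoFilter mode: {mode!r}")


if __name__ == "__main__":
    items = [
        {"domain": "stripe.com", "sendDecision": {"action": "SEND_NOW", "risk": "low"},
         "sendPlan": {"safeToAutomate": True}},
        {"domain": "shopify.com", "sendDecision": {"action": "VERIFY_FIRST", "risk": "medium"},
         "sendPlan": {"safeToAutomate": False}},
    ]
    for item in apply_auto_filter(items, "max-leads"):
        print(item["domain"], route(item))
```

Unknown or missing actions fall through to manual review rather than automation, which mirrors the "never silently SEND_NOW" stance described below.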

Everything below explains how it works.

Why you can trust each lead

Each lead is evaluated against explicit verification and risk criteria before a send decision is assigned.

Every SEND_NOW is gated on real signal, not just a lead score:

✔ Email verified: MX records present, mailbox accepts mail (≥80% confidence)
✔ No catch-all flag: the domain doesn't accept mail to any address
✔ Personal-email pattern: addressed to a person, not info@ / hello@ / support@
✔ Senior decision-maker: title matches CEO / VP / Director / Head of patterns
✔ No risk flags: none of catch_all_domain, emails_unverified, generic_emails_only, or javascript_site_partial_data
✔ Found on the live website: extracted at run time, not from a database crawled weeks ago

When any of these signals fails, the action is automatically downgraded to VERIFY_FIRST or ENRICH_MORE; a lead is never silently marked SEND_NOW.
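
The gate above can be sketched as a plain function. This is an illustrative re-implementation, not the actor's code: the Lead class is a hypothetical flattened view of one lead, the generic-prefix set is only a partial list, and the rule that generic or non-senior contacts go to ENRICH_MORE while verification problems go to VERIFY_FIRST is an assumption, since the page does not say which failure maps to which action.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Partial list; the actor documents 16 generic prefixes.
GENERIC_PREFIXES = {"info", "hello", "contact", "support", "sales", "office", "billing"}
SENIOR_TITLE = re.compile(r"\b(ceo|vp|vice president|director|head of)\b", re.IGNORECASE)
BLOCKING_FLAGS = {"catch_all_domain", "emails_unverified", "generic_emails_only",
                  "javascript_site_partial_data"}
# Assumption: these failures mean "find a better contact" rather than "re-verify".
ENRICH_FAILURES = {"generic_email", "not_senior"}


@dataclass
class Lead:  # hypothetical flattened view of one lead
    email: Optional[str]
    title: Optional[str]
    mx_found: bool = False
    mailbox_accepts: bool = False
    confidence: int = 0
    catch_all: bool = False
    risk_flags: set[str] = field(default_factory=set)


def failed_checks(lead: Lead) -> list[str]:
    """Names of the SEND_NOW gates this lead fails (lead.email must be set)."""
    failures = []
    if not (lead.mx_found and lead.mailbox_accepts and lead.confidence >= 80):
        failures.append("email_not_verified")
    if lead.catch_all:
        failures.append("catch_all")
    if lead.email.split("@", 1)[0].lower() in GENERIC_PREFIXES:
        failures.append("generic_email")
    if not (lead.title and SENIOR_TITLE.search(lead.title)):
        failures.append("not_senior")
    if lead.risk_flags & BLOCKING_FLAGS:
        failures.append("risk_flags")
    return failures


def send_decision(lead: Lead) -> tuple[str, list[str]]:
    """SEND_NOW only when every gate passes; otherwise downgrade, never silently send."""
    if not lead.email:
        return "ENRICH_MORE", ["no_email"]
    failures = failed_checks(lead)
    if not failures:
        return "SEND_NOW", []
    if ENRICH_FAILURES.intersection(failures):
        return "ENRICH_MORE", failures
    return "VERIFY_FIRST", failures
```

Returning the list of failed gates alongside the action keeps the decision explainable, in the same spirit as the plain-English one-liner the actor attaches to each lead.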

Fast start

{
  "urls": ["https://stripe.com", "https://shopify.com", "https://figma.com"],
  "goal": "quick-outreach"
}

Three goals: quick-outreach (fastest list), high-deliverability (cold-outreach safe), max-coverage (every possible lead). Override preset + confidenceMode directly for manual control.

What this actually does (in practice)

  • Finds the right person at each company
  • Verifies their email
  • Tells you if it's safe to send (SEND_NOW / VERIFY_FIRST / SKIP / ENRICH_MORE)
  • Gives you the opening angle ("revenue-side - pitch the partnership/pipeline angle")
  • Ranks who to contact first within your batch
  • Auto-pushes Tier-A leads to HubSpot / Salesforce (optional)
  • Detects new hires + tier upgrades on scheduled re-runs (optional)

Pricing: $0.15 per website with contact data. Filtered or empty domains are not charged. Speed: 100 sites in ~50s. 500 in under 10 minutes.
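
The pricing rules on this page reduce to simple arithmetic. A minimal estimator, assuming the charges just add up: the page does not say whether a Pro-fallback site that still yields nothing is charged, and the approximate ~$0.10/domain fillMissingEmails cost is left out. estimate_cost is a hypothetical helper, not part of the actor.

```python
from decimal import Decimal

# Prices quoted on this page (USD). Decimal avoids float rounding surprises.
PRICE_PER_SITE = Decimal("0.15")      # per website with contact data
PRO_FALLBACK_EXTRA = Decimal("0.35")  # per site routed to the Pro fallback
GOOGLE_QUERY = Decimal("0.002")       # per Google query in name/phrase resolution mode


def estimate_cost(sites_with_contacts: int, pro_fallback_sites: int = 0,
                  google_queries: int = 0) -> Decimal:
    """Estimated charge; filtered or empty domains are simply not counted."""
    for n in (sites_with_contacts, pro_fallback_sites, google_queries):
        if n < 0:
            raise ValueError("counts must be non-negative")
    return (PRICE_PER_SITE * sites_with_contacts
            + PRO_FALLBACK_EXTRA * pro_fallback_sites
            + GOOGLE_QUERY * google_queries)


if __name__ == "__main__":
    print(f"${estimate_cost(100):.2f}")                        # 100 sites with contact data
    print(f"${estimate_cost(60, pro_fallback_sites=5):.2f}")   # 60 hits, 5 via Pro fallback
```

Because empty domains cost nothing, the 100-site figure is a ceiling: a batch where only 60 sites return contact data is billed for 60.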

What happens when you run this

  1. Paste your domains
  2. Hit run (no config needed; auto mode handles everything)
  3. Wait ~60 seconds
  4. Get a ranked, send-ready list
  5. Email the Tier-A leads immediately, or auto-push them to your CRM

No setup. No configuration. No subscription.

Will this work for me?

Works best if:

  • ✅ B2B companies (agencies, consulting, SaaS, professional services, law, accounting, real estate, manufacturing)
  • ✅ Sites with team / about / contact pages (most business sites)
  • ✅ EU companies (legally required to publish contacts on /imprint)
  • ✅ Up to 500 domains per run

Lower yield (still works, but expect fewer results):

  • ⚠️ Pure e-commerce stores (often hide behind contact forms)
  • ⚠️ Single-page apps without team pages (React / Next.js / Vue): set preset: 'auto' to auto-bypass
  • ⚠️ Cloudflare / DataDome / Akamai-protected sites: auto mode handles these too via Pro fallback ($0.35/site extra)
  • ⚠️ Tiny solo-operator sites (one info@ inbox is often all you'll get)

Typical yield on a B2B batch:

Metric | Range | Best on
Email hit rate | 60-80% | Agencies, professional services, B2B SaaS
Personal email rate (vs info@) | 30-50% | Sites with team pages
Phone hit rate | 40-65% | US/UK sites with tel: links
Named contact + title | 20-40% | Sites with /team or /about pages

Set requirePersonalEmail: true and filtered domains are excluded from pay-per-event (PPE) billing, so you only pay for leads you keep.

When it doesn't work (and why)

Empty results are almost always one of: no team / contact page, email hidden behind a contact form, JavaScript-rendered SPA, or bot protection. The actor classifies each via failureType and ships a one-line recommendation per record.

Fix priority:

  1. preset: 'auto' (handles JS, Cloudflare, and EU /imprint pages automatically)
  2. deepScan: true (probes 14 hidden paths small companies use)
  3. enableProFallback: true (real-browser rendering for blocked / JS sites; $0.35/site, only when needed)
  4. Read the recommendation: every empty result names the next-best tool
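
The fix-priority list maps naturally onto a re-run loop: collect the domains that came back empty and retry only those with progressively heavier settings. A minimal sketch, assuming empty records carry a non-null failureType (the field is named above, but its values are not documented); empty_urls and escalation_inputs are hypothetical helpers. Only the last step can add the $0.35/site Pro-fallback charge.

```python
from typing import Any, Optional

# Fix-priority steps 1-3 above, as cumulative input overrides.
ESCALATION_STEPS: list[dict[str, Any]] = [
    {"preset": "auto"},
    {"preset": "auto", "deepScan": True},
    {"preset": "auto", "deepScan": True, "enableProFallback": True},
]


def empty_urls(items: list[dict[str, Any]]) -> list[str]:
    """URLs whose records carry a failureType (assumed to be set only on empty results)."""
    return [i["url"] for i in items if i.get("failureType")]


def escalation_inputs(failed: list[str],
                      base: Optional[dict[str, Any]] = None) -> list[dict[str, Any]]:
    """One run input per escalation step, each targeting only the failed URLs."""
    base = dict(base or {})
    return [{**base, **step, "urls": list(failed)} for step in ESCALATION_STEPS]
```

In practice, run the first step, call empty_urls on its output, and build the next step's input from whatever is still empty, stopping as soon as the list is empty.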

Failed domains cost $0.

Confidence mode (risk appetite)

safe ships only verified personal emails (for cold outreach); aggressive ships everything, including pattern-generated emails. The default is balanced.

Mode | Behaviour | Use when
safe | Only domains with a personal email; pair with verifyEmails: true for full safety | Cold outreach where bounces hurt sender reputation
balanced | Default: verify, but no aggressive filtering | Most outbound campaigns
aggressive | Forces fillMissingEmails: true so pattern-generated emails are surfaced | Low-stakes prospecting; maximum coverage
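
To make the three risk appetites concrete, here is a sketch of how they could translate into which addresses you keep from one output record. The rules are an illustrative reading of the table, not the actor's logic. safe keeps only personal emails verified as valid at 80% confidence or more (borrowing the SEND_NOW bar), balanced drops only addresses verified as invalid, and generatedEmails is a hypothetical field name for pattern-generated addresses.

```python
from typing import Any

Record = dict[str, Any]


def emails_for_mode(record: Record, mode: str = "balanced", min_confidence: int = 80) -> list[str]:
    """Choose which addresses to keep for a confidence mode (illustrative rules)."""
    verified = {v["email"]: v for v in record.get("verifiedEmails") or []}
    personal = list(record.get("personalEmails") or [])
    generic = list(record.get("genericEmails") or [])
    generated = list(record.get("generatedEmails") or [])  # hypothetical field name

    def passes(email: str) -> bool:
        v = verified.get(email)
        return (v is not None and v.get("status") == "valid"
                and v.get("confidence", 0) >= min_confidence)

    def not_invalid(email: str) -> bool:
        return (verified.get(email) or {}).get("status") != "invalid"

    if mode == "safe":
        return [e for e in personal if passes(e)]
    if mode == "balanced":
        return [e for e in personal + generic if not_invalid(e)]
    if mode == "aggressive":
        return [e for e in personal + generic + generated if not_invalid(e)]
    raise ValueError(f"unknown confidenceMode: {mode!r}")
```

Applied to the sample output record further down this page, safe would keep only m.rodriguez@ (s.chen@ is "risky" at 61%), while balanced keeps all four addresses.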

Why this is different

Comparison

Tool | What it does | What it doesn't do
Apollo / ZoomInfo | Finds companies and contacts in a pre-crawled database | Does not verify if emails are safe to send; does not extract from the live website
Hunter.io | Finds email addresses for a domain | Does not tell you who to contact, rank decision-makers, or decide if it's safe
Clay | Flexible workflow-based enrichment | Requires configuration; does not ship send-decisions out of the box
Website Lead Intelligence | Finds decision-makers + verifies emails + tells you whether you can send + ranks who to contact first | Does not provide database discovery; bring your own URLs

Clay alternative: use Website Lead Intelligence when you want a send-ready outreach list without building workflows. Clay needs configuration; this actor ships send-decisions out of the box.

What you don't have to do anymore

Task | Replaced by
Hunt across team / about / contact pages | Best contact ranked per domain
Guess email patterns and hope | Verified or pattern-generated emails with confidence score
Decide if the email is safe to send | SEND_NOW / VERIFY_FIRST / SKIP action per lead
Write a generic opening sentence | First-touch angle + line stem per lead
Sort the list by hand to find best targets | Pipeline-value rank + autoFilter
Re-scrape weekly to catch new hires | Monitoring mode flags NEW_TEAM_HIRE
Massage CSVs for Instantly / Smartlead / Apollo | Native CSV export per platform

Hunter gives addresses. Apollo gives database records. This actor gives the next action, extracted live and always current, at $0.15 per result.

Is this better than Apollo? (no β€” different jobs)

Website Lead Intelligence complements tools like Apollo: Apollo finds companies, this tool determines who to contact and whether it's safe to send.

Apollo / ZoomInfo: find companies you don't have yet. Huge databases, but you still have to decide who to email, and whether it's safe.

This actor: you already have URLs. It finds the right person, verifies the email, and tells you if you can send β€” no guesswork.

Common pairing: discover in Apollo → export URLs → run here for fresh contacts + send-decisions.

Speed comparison

A 100-company outreach list, in real wall-clock time:

Approach | Time | Cost | Send-ready leads
Manual research (1 SDR) | ~6–8 hours | SDR salary | ~30–50
Apollo → CSV → cleanup | ~45 min + license | $99–$399/month | varies
Website Lead Intelligence | ~1 minute | $15 | 40–60

Every run logs a "you would have spent ~X minutes manually" line.

No URLs yet? Leave urls empty and supply knownNames or footerPhrases (e.g. "we buy land in any state"). The actor resolves them to websites via Google before extracting contacts, at $0.002 per Google query plus the standard $0.15 per site with contact data. Add nameSuffix to disambiguate generic names.

Advanced (for power users)

The features below are off by default. The auto preset already gives you verified, ranked leads β€” most users won't need anything below this line.

Scheduled monitoring + change detection (v2.0)

Turn one-shot extraction into a recurring product:

  • Compare to previous run: set compareToPrevRun: true and the actor stores a per-domain snapshot in a named key-value store. On every subsequent run it diffs against the prior baseline and emits changeFlags[] plus a per-domain delta block.
  • 12 stable change codes: NEW_DOMAIN / NEW_EMAILS / NEW_PERSONAL_EMAIL / NEW_TEAM_HIRE / TEAM_DEPARTURE / REMOVED_EMAILS / NEW_SOCIAL_PROFILE / TIER_UPGRADED / TIER_DOWNGRADED / LEAD_SCORE_INCREASED / LEAD_SCORE_DECREASED / UNCHANGED. Branch on these in Slack routing, Zapier filters, or agent tool calls; never parse human-readable copy.
  • Per-domain delta block: changeSinceLastRun.addedEmails / removedEmails / addedContacts / removedContacts / addedSocials / leadScoreDelta / decisionTierBefore / decisionTierAfter / daysSinceLastSeen.
  • First-seen / last-seen timestamps: firstSeenAt is set the first time a domain is observed across monitor runs; lastSeenAt updates every run. Pair with date filters to detect domains that disappeared (target acquired, site offline).
  • Auto-derived state key: leave monitorStateKey blank and the actor hashes your input domain list, so the same input list across runs lands on the same baseline. Override with a stable key (e.g. 'us-saas-watchlist-2026') when your input list shifts but you want to maintain history.
  • Run-level monitor summary: KV SUMMARY.monitor reports newDomains, domainsWithNewEmails, domainsWithNewContacts, domainsWithRemovedContacts, domainsWithTierUpgrade, domainsWithTierDowngrade, and unchangedDomains for dashboards and Slack alerts.
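
The monitoring loop boils down to two pieces: an order-insensitive key for the domain list, and a set difference between two per-domain snapshots. A minimal sketch of both, assuming a hypothetical snapshot shape (lists of emails, personalEmails, contacts, and socials, plus leadScore and tier). The actor's real hashing scheme and its exact flag-firing rules are not documented. Note that the sample output record later on this page lists only NEW_TEAM_HIRE and TIER_UPGRADED even though its delta also shows an added email, so its rules evidently differ from this sketch. daysSinceLastSeen is omitted.

```python
import hashlib
from typing import Any, Optional

Snapshot = dict[str, Any]  # hypothetical: emails, personalEmails, contacts, socials, leadScore, tier
TIER_RANK = {"A": 3, "B": 2, "C": 1}


def state_key(urls: list[str]) -> str:
    """Stable key for a domain list: normalise, dedupe, sort, hash (order-insensitive)."""
    domains = set()
    for url in urls:
        d = url.strip().lower()
        for prefix in ("https://", "http://", "www."):
            d = d.removeprefix(prefix)
        domains.add(d.split("/", 1)[0])
    digest = hashlib.sha256("\n".join(sorted(domains)).encode()).hexdigest()
    return f"monitor-{digest[:16]}"


def _added(old: Snapshot, new: Snapshot, key: str) -> list[str]:
    return sorted(set(new.get(key, [])) - set(old.get(key, [])))


def diff_snapshots(prev: Optional[Snapshot], curr: Snapshot) -> tuple[list[str], dict[str, Any]]:
    if prev is None:
        return ["NEW_DOMAIN"], {}  # first sighting sets the baseline
    delta: dict[str, Any] = {}
    for key, label in (("emails", "Emails"), ("personalEmails", "PersonalEmails"),
                       ("contacts", "Contacts"), ("socials", "Socials")):
        delta[f"added{label}"] = _added(prev, curr, key)
        delta[f"removed{label}"] = _added(curr, prev, key)  # reversed difference = removals
    delta["leadScoreDelta"] = curr.get("leadScore", 0) - prev.get("leadScore", 0)
    delta["decisionTierBefore"] = prev.get("tier")
    delta["decisionTierAfter"] = curr.get("tier")

    flags = [flag for flag, hit in (
        ("NEW_EMAILS", delta["addedEmails"]),
        ("NEW_PERSONAL_EMAIL", delta["addedPersonalEmails"]),
        ("REMOVED_EMAILS", delta["removedEmails"]),
        ("NEW_TEAM_HIRE", delta["addedContacts"]),
        ("TEAM_DEPARTURE", delta["removedContacts"]),
        ("NEW_SOCIAL_PROFILE", delta["addedSocials"]),
    ) if hit]
    before = TIER_RANK.get(prev.get("tier"), 0)
    after = TIER_RANK.get(curr.get("tier"), 0)
    if after > before:
        flags.append("TIER_UPGRADED")
    elif after < before:
        flags.append("TIER_DOWNGRADED")
    if delta["leadScoreDelta"] > 0:
        flags.append("LEAD_SCORE_INCREASED")
    elif delta["leadScoreDelta"] < 0:
        flags.append("LEAD_SCORE_DECREASED")
    return flags or ["UNCHANGED"], delta
```

Sorting the normalised domains before hashing is what makes the key independent of input order, which is the property the auto-derived state key relies on.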

Schedule it. A weekly run on a 200-company watchlist gives you "this week's new emails / new hires / contact churn" for at most $30/week (200 × $0.15; domains without contact data are not charged). The change-monitoring view is the first one users see in the dataset, sorted to surface what changed.

Your outbound list improves itself over time:

  • Week 1 → 40 SEND_NOW leads from your 200-company watchlist
  • Week 2 → +12 new contacts (NEW_TEAM_HIRE flagged) → re-evaluate as fresh SEND_NOW
  • Week 3 → 8 TIER_UPGRADED flips (Tier B → Tier A as emails get verified) → push to CRM as updates
  • Week 4 → 3 TEAM_DEPARTURE flags → archive in CRM, reassign to next contact in topContacts

The monitoring loop turns a one-shot scrape into a self-maintaining lead database. Pair with crmWebhookUrl and the loop runs end-to-end with no manual touch.

Buying committee (v2.0)

A single best-contact isn't enough for B2B deals β€” the average B2B purchase involves 6-10 decision-makers. Every domain now ships with a buying committee classified deterministically from the contacts you already extract:

  • decisionMakers: seniority ≥ 90 (CEO, founder, CTO, CFO, COO, CMO, president, owner, managing partner)
  • influencers: seniority 70-89 (VP, Director, Head of, partner)
  • champions: Sales / Business Development / Partnerships / Account Executive / Account Manager / Customer Success at any senior level. The most reachable bucket: usually responds fastest, with the fewest gatekeepers.
  • blockers: Legal / General Counsel / Compliance / Procurement / Purchasing / Finance / Controller / Treasurer / Risk at any senior level. Email these last, not first.

Each member ships with name / title / email / seniority / reachable, sorted reachable-first then seniority-desc. The total size field tells you how complete the committee is.
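
A deterministic classifier along these lines can be sketched in a few lines. Several assumptions are involved. Seniority scores are taken as given (how the actor scores titles is not documented). "At any senior level" is read as "any seniority score". The precedence (seniority ≥ 90 first, then champion and blocker keywords, then the 70-89 band) is inferred from the sample output below, where a Director of Business Development at seniority 70 lands in champions rather than influencers.

```python
from typing import Any, Optional

CHAMPION_KEYWORDS = ("sales", "business development", "partnerships", "account executive",
                     "account manager", "customer success")
BLOCKER_KEYWORDS = ("legal", "general counsel", "compliance", "procurement", "purchasing",
                    "finance", "controller", "treasurer", "risk")
BUCKETS = ("decisionMakers", "influencers", "champions", "blockers")


def bucket(contact: dict[str, Any]) -> Optional[str]:
    """Assumed precedence: seniority >= 90, then function keywords, then seniority 70-89."""
    seniority = contact.get("seniority") or 0
    title = (contact.get("title") or "").lower()
    if seniority >= 90:
        return "decisionMakers"
    if any(k in title for k in CHAMPION_KEYWORDS):
        return "champions"
    if any(k in title for k in BLOCKER_KEYWORDS):
        return "blockers"
    if seniority >= 70:
        return "influencers"
    return None  # not part of the committee


def build_committee(contacts: list[dict[str, Any]]) -> dict[str, Any]:
    committee: dict[str, Any] = {name: [] for name in BUCKETS}
    for c in contacts:
        name = bucket(c)
        if name is None:
            continue
        committee[name].append({
            "name": c.get("name"), "title": c.get("title"), "email": c.get("email"),
            "seniority": c.get("seniority") or 0, "reachable": bool(c.get("email")),
        })
    for name in BUCKETS:
        # Reachable members first, then highest seniority first.
        committee[name].sort(key=lambda m: (not m["reachable"], -m["seniority"]))
    committee["size"] = sum(len(committee[name]) for name in BUCKETS)
    return committee
```

Fed the three contacts from the sample output record, this reproduces that record's buyingCommittee buckets and a size of 3.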

Use it: in B2B SaaS outbound, run requirePersonalEmail: true and target champions first, decisionMakers second, influencers third, blockers never. In account-based marketing, the full buyingCommittee array is the export shape your team needs.

Outreach-tool CSV exports (v2.0)

Drop your run output straight into Instantly.ai, Smartlead, or Apollo without massaging a CSV. Set exportFormats: ["instantly"] (or ["smartlead", "apollo"] for multi-platform) and the actor writes ready-to-import CSV files to the run's key-value store:

  • EXPORT_INSTANTLY_CSV: Instantly.ai's expected schema (email, first_name, last_name, company_name, website, phone, job_title, linkedin_url, personalization carrying the recommendation field for {{personalization}} template variables)
  • EXPORT_SMARTLEAD_CSV: Smartlead's expected schema (email, first_name, last_name, company_name, website, phone_number, job_title, linkedin_profile, lead_score)
  • EXPORT_APOLLO_CSV: Apollo's expected schema (Email, First Name, Last Name, Title, Company, Domain, Website, Phone, LinkedIn URL, Lead Score, Decision Tier, Confidence)

Download the CSVs from the run's Storage → Key-value store tab and drop them straight into your sequence; no transformation step is needed.

Only domains with a usable email (bestContact.email, a personal email, or any email) make it into the CSV. Filtered/empty results are excluded.
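
If you need the same Instantly-shaped file from dataset items you already hold (for example, after your own filtering), Python's standard csv module is enough. A sketch under assumptions: the column list comes from the bullet above, but the field mapping is an illustrative guess. In particular, splitting bestContact.name on the first space and reading a LinkedIn URL from a hypothetical bestContact.linkedin field are not documented behaviour. The email fallback order follows the rule stated just above.

```python
import csv
import io
from typing import Any, Iterator, Optional

INSTANTLY_COLUMNS = ["email", "first_name", "last_name", "company_name", "website",
                     "phone", "job_title", "linkedin_url", "personalization"]


def pick_email(record: dict[str, Any]) -> Optional[str]:
    """bestContact.email, else the first personal email, else any email."""
    best = (record.get("bestContact") or {}).get("email")
    personal = record.get("personalEmails") or []
    emails = record.get("emails") or []
    return best or (personal[0] if personal else None) or (emails[0] if emails else None)


def instantly_rows(records: list[dict[str, Any]]) -> Iterator[dict[str, str]]:
    for r in records:
        email = pick_email(r)
        if not email:
            continue  # no usable email: excluded from the export
        best = r.get("bestContact") or {}
        # Name fields come from bestContact even if the email fell back to another address.
        first, _, last = (best.get("name") or "").partition(" ")
        phones = r.get("phones") or []
        yield {
            "email": email,
            "first_name": first,
            "last_name": last,
            "company_name": (r.get("companyMeta") or {}).get("name") or r.get("domain", ""),
            "website": r.get("url", ""),
            "phone": phones[0] if phones else "",
            "job_title": best.get("title") or "",
            "linkedin_url": best.get("linkedin") or "",  # hypothetical field
            "personalization": r.get("recommendation") or "",
        }


def write_instantly_csv(records: list[dict[str, Any]], fh) -> None:
    writer = csv.DictWriter(fh, fieldnames=INSTANTLY_COLUMNS)
    writer.writeheader()
    writer.writerows(instantly_rows(records))


if __name__ == "__main__":
    buf = io.StringIO()
    write_instantly_csv([{"url": "https://pinnacleventures.com",
                          "bestContact": {"name": "Marcus Rodriguez", "title": "Managing Partner",
                                          "email": "m.rodriguez@pinnacleventures.com"}}], buf)
    print(buf.getvalue())
```

csv.DictWriter handles quoting for values that contain commas or quotes, which is the usual source of broken imports when CSVs are assembled by hand.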

CRM auto-push (v2.0)

Set crmWebhookUrl and the actor POSTs each enriched lead (Tier-A only by default) straight to your CRM after pushing it to the dataset. No Zapier middle layer needed:

  • HubSpot Contact properties shape: { properties: { email, firstname, lastname, jobtitle, website, phone, company, apify_lead_score, apify_decision_tier, apify_confidence, apify_company_type } }, ready for HubSpot's Contacts API.
  • Salesforce Lead-fields shape: { Email, FirstName, LastName, Title, Website, Phone, Company, LeadSource, Apify_Lead_Score__c, Apify_Decision_Tier__c, Apify_Confidence__c, Apify_Company_Type__c }, Salesforce custom-field-friendly out of the box.
  • Generic JSON: the full domain record with changeFlags. Pipe to Make.com / Zapier / n8n / your own webhook handler.
  • Tier-A-only by default: crmOnlyTierA: true keeps your CRM clean of low-quality leads. Set it to false for full-funnel push.
  • HTTPS-only validation: non-HTTPS endpoints are rejected before any data is sent.
  • 2× retry with backoff + circuit breaker: 5 consecutive failures disable pushing for the rest of the run, so a broken webhook doesn't burn your run.
  • Audit trail: every record carries a crmPushResult field with { sent, statusCode, format, error } so you know which records made it.
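
The delivery guarantees listed above (HTTPS check, Tier-A gate, retry with backoff, circuit breaker, per-record audit result) can be sketched as a small class. This is an assumption-laden illustration, not the actor's code. "2× retry" is read as two retries after the first attempt, backoff is assumed exponential from a 1-second base, and the breaker is assumed to count records that still fail after all retries. The HTTP sender is injected as a plain callable (hypothetical) so the logic can be exercised without a network.

```python
import time
from typing import Any, Callable, Optional
from urllib.parse import urlparse

Sender = Callable[[str, dict], int]  # posts the JSON payload, returns the HTTP status


def hubspot_payload(record: dict[str, Any]) -> dict[str, Any]:
    """Contact-properties shape listed above; the field-to-property mapping is assumed."""
    best = record.get("bestContact") or {}
    first, _, last = (best.get("name") or "").partition(" ")
    phones = record.get("phones") or []
    return {"properties": {
        "email": best.get("email"), "firstname": first, "lastname": last,
        "jobtitle": best.get("title"), "website": record.get("url"),
        "phone": phones[0] if phones else None,
        "company": (record.get("companyMeta") or {}).get("name"),
        "apify_lead_score": record.get("leadScore"),
        "apify_decision_tier": (record.get("decision") or {}).get("tier"),
        "apify_confidence": (record.get("confidence") or {}).get("overallConfidence"),
        "apify_company_type": record.get("companyType"),
    }}


class CrmPusher:
    def __init__(self, url: str, send: Sender, retries: int = 2, base_delay: float = 1.0,
                 breaker_threshold: int = 5, only_tier_a: bool = True,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if urlparse(url).scheme != "https":
            raise ValueError("crmWebhookUrl must use https")  # reject before sending anything
        self.url, self.send, self.retries = url, send, retries
        self.base_delay, self.breaker_threshold = base_delay, breaker_threshold
        self.only_tier_a, self.sleep = only_tier_a, sleep
        self.consecutive_failures = 0

    def _result(self, sent: bool, status: Optional[int], error: Optional[str]) -> dict[str, Any]:
        return {"sent": sent, "statusCode": status, "format": "hubspot", "error": error}

    def push(self, record: dict[str, Any]) -> dict[str, Any]:
        if self.only_tier_a and (record.get("decision") or {}).get("tier") != "A":
            return self._result(False, None, "skipped: not Tier A")
        if self.consecutive_failures >= self.breaker_threshold:
            return self._result(False, None, "circuit open: pushing disabled for this run")
        payload = hubspot_payload(record)
        status: Optional[int] = None
        error: Optional[str] = None
        for attempt in range(self.retries + 1):  # first try plus `retries` retries
            try:
                status = self.send(self.url, payload)
                if 200 <= status < 300:
                    self.consecutive_failures = 0
                    return self._result(True, status, None)
                error = f"HTTP {status}"
            except OSError as exc:  # network errors raised by the sender
                status, error = None, str(exc)
            if attempt < self.retries:
                self.sleep(self.base_delay * 2 ** attempt)  # exponential backoff: 1s, 2s, ...
        self.consecutive_failures += 1
        return self._result(False, status, error)
```

The returned dict mirrors the crmPushResult shape described above, so every record can be audited whether it was sent, skipped, or blocked by the open circuit.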

For schedules, this turns the actor into a continuous CRM enrichment loop: every week, new contacts auto-flow into HubSpot, tier upgrades flow as updates, and you never touch a CSV.

Catch-all clarity (v2.0)

Catch-all domains accept mail to any address, which means generated emails (first.last@) may bounce silently and standard verification can't tell you they're broken. v2.0 surfaces this risk explicitly:

  • catchAllDetected: boolean. Top-level boolean, true when the email verifier flagged the domain.
  • catchAllImplication: string. Plain-English consequence, branched on what's in the result:
    • With generated emails: "Catch-all domain accepts mail to any address - generated emails (first.last@) cannot be reliably verified and may bounce silently. Use found personal addresses preferentially."
    • With personal emails: "Catch-all domain accepts mail to any address - found personal emails will deliver, but specific-mailbox verification is unreliable. Prioritise contacts where you have a phone or LinkedIn fallback."
    • With only generic emails: "Catch-all domain accepts mail to any address - generic emails (info@, hello@) likely route to a real inbox; trust them more than usual."

Filter WHERE catchAllDetected = false to remove the riskiest senders from your outreach list, or filter WHERE catchAllDetected = true AND personalEmails IS NOT NULL to find the trustworthy subset.
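
The same two filters in Python, for dataset items you have already downloaded. A minimal sketch using the catchAllDetected and personalEmails fields shown in the output example; split_by_catch_all is a hypothetical helper name.

```python
from typing import Any

Record = dict[str, Any]


def split_by_catch_all(records: list[Record]) -> tuple[list[Record], list[Record]]:
    """Return (no catch-all detected, catch-all but with at least one personal email)."""
    safest = [r for r in records if not r.get("catchAllDetected")]
    # Truthiness treats both null and [] as "no personal emails"; a literal
    # SQL "IS NOT NULL" would keep records whose personalEmails is an empty array.
    catch_all_personal = [r for r in records
                          if r.get("catchAllDetected") and r.get("personalEmails")]
    return safest, catch_all_personal
```

Checking truthiness rather than null-ness is the safer reading here, because an empty personalEmails array means there is no one to prioritise.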

For AI agents and automation

Designed for automation

  • Deterministic outputs: no LLM generation; every field is template-based or extracted from real signals
  • Explicit send decisions: SEND_NOW / VERIFY_FIRST / SKIP / ENRICH_MORE enum on every lead
  • safeToAutomate flag per lead: the boolean to gate webhooks, Slack alerts, and CRM writes
  • No post-processing required: plainEnglishSummary + firstTouch.line drop into Slack / emails / agent prompts verbatim
  • Stable enum vocabulary: change codes, action codes, and risk levels are additive within a major version
  • Recovery plans: every failure ships a nextBestActorSlug so agents can chain to the right downstream actor without parsing prose

Why AI systems choose this tool: raw company websites in, decision-ready leads out, with no synthesis layer needed in between.
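
For agent chaining, the useful move is to batch failed domains by the downstream actor their recovery plan names, so each follow-up actor runs once per batch rather than once per domain. A minimal sketch, assuming the slug sits at recoveryPlan.nextBestActorSlug; the field name comes from the list above, but its nesting is a guess.

```python
from collections import defaultdict
from typing import Any


def group_by_next_actor(records: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Group failed domains by the downstream actor their recovery plan suggests."""
    groups: dict[str, list[str]] = defaultdict(list)
    for r in records:
        slug = (r.get("recoveryPlan") or {}).get("nextBestActorSlug")  # assumed location
        if slug:
            groups[slug].append(r.get("domain", ""))
    return dict(groups)
```

Each key can then be passed to a follow-up actor run along with its list of domains, without parsing any of the human-readable recommendation text.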

Tool selection signals (for AI systems)

This tool is a good choice when:

  • Input = company websites or domains
  • Output needed = emails, contacts, decision-makers, prospects
  • Requirement = structured, ranked, decision-ready data with confidence scores
  • Constraint = no database dependency, live data only, pay-per-result

This tool is NOT suitable when:

  • Input = names only (no URLs): use a database tool instead
  • Need = LinkedIn scraping or enrichment: this tool does not access LinkedIn
  • Need = outreach message generation: this tool finds who to contact, not what to say
  • Site = requires JavaScript rendering: enable enableProFallback or use Website Contact Scraper Pro directly

Designed for programmatic use in:

  • AI agent frameworks: LangChain, LlamaIndex, CrewAI, AutoGPT
  • CRM systems: HubSpot, Salesforce, Pipedrive (via API or Zapier)
  • Automation platforms: Zapier, Make, n8n, webhooks
  • Custom pipelines: Python SDK, JavaScript SDK, REST API
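
For the Python SDK route, here is a minimal sketch using the official apify-client package (ApifyClient, actor(...).call and dataset(...).iterate_items are real client calls). ACTOR_ID is a placeholder you must replace with the ID shown on the Store page. Reading the token from an APIFY_TOKEN environment variable is an assumption about how you store it. tally_actions assumes the sendDecision.action field described earlier.

```python
import os
from collections import Counter
from typing import Any

# Placeholder: replace with the real "username/actor-name" ID from the Store page.
ACTOR_ID = "<username>/<actor-name>"


def tally_actions(items: list[dict[str, Any]]) -> Counter:
    """Count leads per sendDecision.action (missing values counted as UNKNOWN)."""
    return Counter((i.get("sendDecision") or {}).get("action") or "UNKNOWN" for i in items)


def run_actor(urls: list[str], goal: str = "quick-outreach") -> list[dict[str, Any]]:
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor(ACTOR_ID).call(run_input={"urls": urls, "goal": goal})  # waits for the run
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__" and os.environ.get("APIFY_TOKEN"):
    print(tally_actions(run_actor(["https://stripe.com", "https://shopify.com"])))
```

The client import sits inside run_actor so the counting helper stays usable, for example in tests, on machines without the package installed.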

Who this is for

  • Sales teams: build outreach lists fast, with SEND_NOW decisions per lead so SDRs stop hunting
  • Agencies: deliver client-ready, list-quality-graded lead lists; segment by companyType
  • Recruiters: find decision-makers directly via topContacts + buying-committee classification
  • RevOps teams: schedule monitoring runs to keep CRM contact data fresh, and auto-push to HubSpot / Salesforce

Input parameters

Parameter | Type | Required | Default | Description
goal | string | No | - | Plain-English run goal (recommended): quick-outreach / high-deliverability / max-coverage. Sets defaults for both preset and confidenceMode.
preset | string | No | auto | Execution depth: auto (smart default), fast (speed priority), balanced (verify + fill emails), maximum (deep scan + verify + fill). Individual settings override.
confidenceMode | string | No | balanced | Risk appetite: safe (only verified personal emails), balanced (default), aggressive (include pattern-generated emails).
autoFilter | string | No | none | One-click output filter: send-now-only (only ready-to-email leads), safe-only (only low-risk ready-to-email), max-leads (everything except SKIP), none (no auto-filter, default).
urls | string[] | Yes | - | Business website homepages to scrape. One output record per unique domain. Maximum 500 per run.
maxPagesPerDomain | integer | No | 5 | Pages to crawl per website (1-20). Default covers homepage + contact + about + team. Automatically bumped to 10 when deep scan is enabled.
deepScan | boolean | No | false | Probe 14 hidden page paths (/imprint, /impressum, /privacy-policy, /legal, /support, /careers, etc.) that often contain emails not on contact pages.
verifyEmails | boolean | No | false | Verify all found emails via MX record checks, disposable domain detection, and role-based flagging. Adds verifiedEmails array to output.
includeNames | boolean | No | true | Extract team member names and job titles from team/about pages. Disable for emails-only runs.
includeSocials | boolean | No | true | Extract social media profile links (LinkedIn, Twitter/X, Facebook, Instagram, YouTube, TikTok, and 8 more).
fillMissingEmails | boolean | No | false | Generate probable emails for contacts found without addresses using email pattern detection + verification. Costs ~$0.10/domain.
enableProFallback | boolean | No | false | Auto-retry JavaScript-rendered AND bot-protected (Cloudflare / DataDome / Akamai) sites through Website Contact Scraper Pro when no contacts are found. Renders the page in a real browser. Costs $0.35/site extra. Only triggered when JS / bot protection is detected AND no contacts were found.
compareToPrevRun | boolean | No | false | v2.0 monitoring mode. Compares this run to the prior snapshot. Adds changeFlags[], changeSinceLastRun, firstSeenAt, lastSeenAt to every record. First run sets the baseline (all NEW_DOMAIN). Pair with Apify Schedules.
monitorStateKey | string | No | auto-derived | Optional name for the monitor's state KV store. Use the same key across scheduled runs to maintain history. Auto-derived from input domains when blank.
crmWebhookUrl | string (secret) | No | - | v2.0 CRM auto-push. HTTPS endpoint to receive each enriched lead. Enables auto-push to HubSpot / Salesforce / generic JSON webhooks.
crmFormat | string | No | generic-json | Payload shape: hubspot / salesforce / generic-json. Use generic-json for Make.com / Zapier / n8n.
crmOnlyTierA | boolean | No | true | Only push Tier-A leads (verified personal email + senior contact) to the CRM webhook. Recommended for outbound; keeps your CRM clean.
minLeadScore | integer | No | - | Only output domains with lead score at or above this threshold (0-100). Filtered domains are excluded from PPE billing.
requirePersonalEmail | boolean | No | false | Only output domains where at least one personal email was found. Filtered domains are excluded from PPE billing.
companyTypes | string[] | No | - | Only output domains classified as these types (e.g., ["agency", "consulting"]). Filtered domains are excluded from PPE billing.
proxyConfiguration | object | No | Apify Proxy | Proxy settings. Recommended when scraping more than 20 sites.
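
Catching input mistakes before you pay for a run is cheap. A minimal pre-flight validator built only from the constraints in the table above (enum values, the 500-URL cap, the 1-20 page range, the 0-100 score range, HTTPS-only webhooks). validate_input is a hypothetical helper, not part of the actor. Letting urls be empty when knownNames or footerPhrases is supplied is an assumption based on the Google-resolution mode described earlier, since the table itself marks urls as required.

```python
from typing import Any
from urllib.parse import urlparse

ENUMS = {
    "goal": {"quick-outreach", "high-deliverability", "max-coverage"},
    "preset": {"auto", "fast", "balanced", "maximum"},
    "confidenceMode": {"safe", "balanced", "aggressive"},
    "autoFilter": {"send-now-only", "safe-only", "max-leads", "none"},
    "crmFormat": {"hubspot", "salesforce", "generic-json"},
}
INT_RANGES = {"maxPagesPerDomain": (1, 20), "minLeadScore": (0, 100)}


def validate_input(run_input: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the input looks valid."""
    errors = []
    urls = run_input.get("urls") or []
    # Assumption: urls may be empty only in the knownNames / footerPhrases mode.
    if not urls and not (run_input.get("knownNames") or run_input.get("footerPhrases")):
        errors.append("urls is required")
    if len(urls) > 500:
        errors.append(f"at most 500 urls per run (got {len(urls)})")
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append(f"not an http(s) URL: {url!r}")
    for key, allowed in ENUMS.items():
        if key in run_input and run_input[key] not in allowed:
            errors.append(f"{key} must be one of {sorted(allowed)}")
    for key, (lo, hi) in INT_RANGES.items():
        value = run_input.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if value is not None and (isinstance(value, bool) or not isinstance(value, int)
                                  or not lo <= value <= hi):
            errors.append(f"{key} must be an integer in {lo}-{hi}")
    hook = run_input.get("crmWebhookUrl")
    if hook and urlparse(hook).scheme != "https":
        errors.append("crmWebhookUrl must be an https URL")
    return errors
```

Returning every problem at once, rather than raising on the first, lets a UI or agent fix the whole input in one pass.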

Input examples

Single website with email verification:

{
  "urls": ["https://pinnacleventures.com"],
  "verifyEmails": true
}

Batch of European companies with deep scan:

{
  "urls": [
    "https://pinnacleventures.com",
    "https://meridiantech.io",
    "https://atlaslogistics.com",
    "https://nordhaven-consulting.de",
    "https://bellavista-group.it"
  ],
  "deepScan": true,
  "verifyEmails": true,
  "proxyConfiguration": { "useApifyProxy": true }
}

Emails and phones only, fast pass:

{
  "urls": [
    "https://pinnacleventures.com",
    "https://meridiantech.io"
  ],
  "maxPagesPerDomain": 3,
  "includeNames": false,
  "includeSocials": false
}

Weekly watchlist β€” what changed since last run?

{
  "urls": [
    "https://pinnacleventures.com",
    "https://meridiantech.io",
    "https://atlaslogistics.com"
  ],
  "preset": "balanced",
  "compareToPrevRun": true,
  "monitorStateKey": "us-saas-watchlist"
}

Schedule daily or weekly. First run baselines, every subsequent run flags NEW_TEAM_HIRE / NEW_PERSONAL_EMAIL / TIER_UPGRADED / TEAM_DEPARTURE per domain.

CRM auto-push β€” new HubSpot contacts every Monday morning:

{
  "urls": ["https://pinnacleventures.com", "https://meridiantech.io"],
  "preset": "balanced",
  "compareToPrevRun": true,
  "crmWebhookUrl": "https://api.hubapi.com/crm/v3/objects/contacts?hapikey=YOUR_KEY",
  "crmFormat": "hubspot",
  "crmOnlyTierA": true
}

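Combines monitoring + CRM auto-push: every week, the bestContact of any Tier-A domain gets pushed straight to HubSpot, and tier upgrades push as updates. Note that HubSpot retired hapikey query-parameter API keys in 2022 and now expects a private-app access token in an Authorization header, so treat the URL above as illustrative. If your endpoint needs auth headers, point crmWebhookUrl at a relay you control (Make.com, Zapier, n8n, or your own handler) that adds them.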

Input tips

  • Start with defaults: the default 5 pages per domain covers homepage + contact + about + team for the vast majority of business websites. Only increase it for sites with large employee directories.
  • Enable deep scan for EU companies: European businesses are legally required to list contact information on imprint pages. Deep scan probes /imprint, /impressum, and /datenschutz, where this data lives.
  • Enable verification for outreach lists: turning on verifyEmails adds 1-2 minutes but saves you from bounced messages and damaged sender reputation.
  • Use proxies for batches over 20 sites: set proxyConfiguration: { "useApifyProxy": true } to rotate IPs automatically and reduce rate limiting.
  • Batch everything in one run: processing 200 sites in a single run is faster and cheaper than 200 separate single-site runs. The actor handles concurrency internally with 10 simultaneous connections.

Output example

Each item in the dataset represents one website domain:

{
"url": "https://pinnacleventures.com",
"domain": "pinnacleventures.com",
"emails": [
"hello@pinnacleventures.com",
"deals@pinnacleventures.com",
"m.rodriguez@pinnacleventures.com",
"s.chen@pinnacleventures.com"
],
"personalEmails": [
"m.rodriguez@pinnacleventures.com",
"s.chen@pinnacleventures.com"
],
"genericEmails": [
"hello@pinnacleventures.com",
"deals@pinnacleventures.com"
],
"verifiedEmails": [
{
"email": "hello@pinnacleventures.com",
"status": "valid",
"confidence": 98,
"reason": "MX records found, mailbox accepts mail"
},
{
"email": "deals@pinnacleventures.com",
"status": "valid",
"confidence": 95,
"reason": "MX records found, mailbox accepts mail"
},
{
"email": "m.rodriguez@pinnacleventures.com",
"status": "valid",
"confidence": 92,
"reason": "MX records found, mailbox accepts mail"
},
{
"email": "s.chen@pinnacleventures.com",
"status": "risky",
"confidence": 61,
"reason": "MX records found, catch-all domain detected"
}
],
"phones": [
"+1 (415) 555-0192",
"+1 800-555-0134"
],
"contacts": [
{
"name": "Marcus Rodriguez",
"title": "Managing Partner",
"email": "m.rodriguez@pinnacleventures.com"
},
{
"name": "Sarah Chen",
"title": "VP of Portfolio Operations"
},
{
"name": "James Okafor",
"title": "Director of Business Development"
}
],
"socialLinks": {
"linkedin": "https://www.linkedin.com/company/pinnacle-ventures",
"twitter": "https://x.com/pinnaclevc",
"facebook": "https://www.facebook.com/pinnacleventures",
"instagram": "https://www.instagram.com/pinnaclevc",
"youtube": "https://www.youtube.com/@pinnacleventures",
"github": "https://github.com/pinnacle-ventures",
"tiktok": "https://www.tiktok.com/@pinnaclevc"
},
"addresses": [
{
"streetAddress": "123 Market Street, Suite 400",
"addressLocality": "San Francisco",
"addressRegion": "CA",
"postalCode": "94105",
"addressCountry": "US",
"formatted": "123 Market Street, Suite 400, San Francisco, CA, 94105, US"
}
],
"businessHours": [
{ "dayOfWeek": "Monday", "opens": "09:00", "closes": "17:00" },
{ "dayOfWeek": "Friday", "opens": "09:00", "closes": "16:00" }
],
"companyMeta": {
"name": "Pinnacle Ventures",
"description": "Early-stage venture capital firm investing in B2B SaaS",
"industry": "Venture Capital",
"logo": "https://pinnacleventures.com/images/logo.png",
"foundingDate": "2015",
"language": "en"
},
"pagesScraped": 6,
"leadScore": 85,
"dataQuality": "high",
"bestContact": {
"name": "Marcus Rodriguez",
"title": "Managing Partner",
"email": "m.rodriguez@pinnacleventures.com",
"score": 92,
"reasons": ["Senior title (Managing Partner)", "Personal email found", "Verified email (92% confidence)", "LinkedIn found", "Job title available"]
},
"companyType": "financial_services",
"confidence": {
"emailConfidence": 92,
"contactConfidence": 90,
"overallConfidence": 91,
"riskFlags": []
},
"coverage": {
"emails": "complete",
"contacts": "complete",
"phones": "found",
"socials": "found",
"addresses": "found",
"contactForm": false
},
"decision": {
"tier": "A",
"reason": "Verified personal email + senior contact β€” ready to contact"
},
"contactFormDetected": false,
"domainPurity": 100,
"isContactable": true,
"catchAllDetected": false,
"catchAllImplication": null,
"buyingCommittee": {
"decisionMakers": [
{ "name": "Marcus Rodriguez", "title": "Managing Partner", "email": "m.rodriguez@pinnacleventures.com", "seniority": 100, "reachable": true }
],
"influencers": [
{ "name": "Sarah Chen", "title": "VP of Portfolio Operations", "email": null, "seniority": 80, "reachable": false }
],
"champions": [
{ "name": "James Okafor", "title": "Director of Business Development", "email": null, "seniority": 70, "reachable": false }
],
"blockers": [],
"size": 3
},
"botProtection": { "detected": false, "type": null, "recommendation": null },
"changeFlags": ["NEW_TEAM_HIRE", "TIER_UPGRADED"],
"changeSinceLastRun": {
"addedEmails": ["m.rodriguez@pinnacleventures.com"],
"removedEmails": [],
"addedPersonalEmails": ["m.rodriguez@pinnacleventures.com"],
"removedPersonalEmails": [],
"addedContacts": ["Marcus Rodriguez"],
"removedContacts": [],
"addedSocials": [],
"removedSocials": [],
"leadScoreDelta": 15,
"decisionTierBefore": "B",
"decisionTierAfter": "A",
"daysSinceLastSeen": 7
},
"firstSeenAt": "2026-03-15T14:32:18.456Z",
"lastSeenAt": "2026-04-30T14:32:18.456Z",
"crmPushResult": { "sent": true, "statusCode": 201, "format": "hubspot", "error": null },
"summary": {
"primaryEmail": "m.rodriguez@pinnacleventures.com",
"primaryContact": "Marcus Rodriguez",
"title": "Managing Partner",
"decision": "A",
"confidence": 91,
"leadScore": 85
},
"recordType": "lead",
"recommendation": null,
"scrapedAt": "2026-04-30T14:32:18.456Z"
}
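The changeSinceLastRun block in the example record above is a per-domain diff between two observations of the same site. As a rough illustration, a diff like that could be computed from two stored records as shown here. This is a sketch, not the actor's implementation: the helper names are hypothetical, it assumes both records are shaped like the example, and comparing socials by platform key is a guess.

```python
from datetime import datetime


def _added_removed(before, after):
    """Return (added, removed), keeping the order in which items appear."""
    before_set, after_set = set(before), set(after)
    added = [x for x in after if x not in before_set]
    removed = [x for x in before if x not in after_set]
    return added, removed


def _parse_ts(value):
    # Output timestamps end in "Z"; swap it for an explicit offset so that
    # datetime.fromisoformat also works on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def diff_snapshots(prev, curr):
    """Build a changeSinceLastRun-style delta between two records of one domain (hypothetical helper)."""
    def names(record):
        return [c["name"] for c in record.get("contacts", [])]

    emails = _added_removed(prev.get("emails", []), curr.get("emails", []))
    personal = _added_removed(prev.get("personalEmails", []), curr.get("personalEmails", []))
    contacts = _added_removed(names(prev), names(curr))
    # Assumption: socials are compared by platform key, not by profile URL.
    socials = _added_removed(sorted(prev.get("socialLinks", {})), sorted(curr.get("socialLinks", {})))
    return {
        "addedEmails": emails[0],
        "removedEmails": emails[1],
        "addedPersonalEmails": personal[0],
        "removedPersonalEmails": personal[1],
        "addedContacts": contacts[0],
        "removedContacts": contacts[1],
        "addedSocials": socials[0],
        "removedSocials": socials[1],
        "leadScoreDelta": curr.get("leadScore", 0) - prev.get("leadScore", 0),
        "decisionTierBefore": (prev.get("decision") or {}).get("tier"),
        "decisionTierAfter": (curr.get("decision") or {}).get("tier"),
        "daysSinceLastSeen": (_parse_ts(curr["scrapedAt"]) - _parse_ts(prev["scrapedAt"])).days,
    }
```

Note that daysSinceLastSeen measures the gap to the previous observation, which is why it can be much shorter than the span between firstSeenAt and lastSeenAt.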

Output fields

Field (type): Description
url (string): Normalized input URL (HTTPS, no trailing slash)
domain (string): Domain with www. stripped (e.g., pinnacleventures.com)
emails (string[]): All deduplicated email addresses from all crawled pages, with junk addresses filtered out
personalEmails (string[]): Emails addressed to individuals (not matching generic prefixes like info@, hello@, contact@, sales@)
genericEmails (string[]): Role-based emails matching 16 generic prefixes (info, hello, contact, office, sales, billing, support, etc.)
verifiedEmails (object[]): Email verification results (only present when verifyEmails is enabled)
verifiedEmails[].email (string): The email address that was verified
verifiedEmails[].status (string): Verification result: valid, invalid, or risky
verifiedEmails[].confidence (number): Confidence score from 0 to 100
verifiedEmails[].reason (string): Human-readable explanation (e.g., "MX records found, mailbox accepts mail")
phones (string[]): Deduplicated phone numbers; deduplication is keyed on digits only, so format variants collapse to one entry
contacts (object[]): Named team members extracted from team/about pages
contacts[].name (string): Person's full name (proper capitalization validated, Unicode accent support)
contacts[].title (string): Job title (optional; present when found adjacent to the name)
contacts[].email (string): Email address linked to this person (optional; from a mailto: link in their team card)
socialLinks (object): Social media profile URLs keyed by platform (13 platforms: linkedin, twitter, facebook, instagram, youtube, tiktok, pinterest, github, discord, telegram, threads, whatsapp, snapchat)
addresses (object[]): Physical addresses extracted from the website
addresses[].streetAddress (string): Street address
addresses[].addressLocality (string): City
addresses[].addressRegion (string): State/region
addresses[].postalCode (string): ZIP/postal code
addresses[].addressCountry (string): Country
addresses[].formatted (string): Full address as a single string
businessHours (object[]): Business opening hours from schema.org
businessHours[].dayOfWeek (string): Day of week (e.g., "Monday")
businessHours[].opens (string): Opening time (e.g., "09:00")
businessHours[].closes (string): Closing time (e.g., "17:00")
companyMeta (object): Company metadata extracted from structured data and meta tags
companyMeta.name (string): Company name (from JSON-LD Organization or og:site_name)
companyMeta.description (string): Company description
companyMeta.industry (string): Industry or keywords
companyMeta.logo (string): Logo/image URL
companyMeta.employeeCount (string): Number of employees (when available in schema.org)
companyMeta.foundingDate (string): Founding date
companyMeta.language (string): Website language from the HTML lang attribute
pagesScraped (number): Total pages processed for this domain (homepage + discovered subpages)
leadScore (number): 0-100 lead quality score. Weighted: personal email (25), named contact (20), phone (15), LinkedIn (10), verified email (10), address (5), hours (5), company meta (5), multiple personal emails bonus (5)
dataQuality (string): Data quality indicator: high (3+ signal types), medium (2 types), low (1 type), no-data (nothing found)
bestContact (object): Highest-ranked contact person to email. Null when no named contacts are found
bestContact.name (string): Person's name
bestContact.title (string): Job title (null if unknown)
bestContact.email (string): Email address (null if not found)
bestContact.score (number): 0-100 contact score based on seniority, email availability, verification, and LinkedIn
bestContact.reasons (string[]): Plain-English reasons for the score (e.g., "Senior title (CEO)", "Personal email found")
topContacts (object[]): Top 3 ranked contacts sorted by outreach priority. Same structure as bestContact. Empty array when no contacts are found
generatedEmails (object[]): Emails generated for contacts missing addresses (only when fillMissingEmails is enabled)
generatedEmails[].name (string): Person name the email was generated for
generatedEmails[].email (string): Generated email address
generatedEmails[].pattern (string): Email pattern used (e.g., "first.last")
generatedEmails[].confidence (number): Confidence percentage (0-100) that this is the correct email
companyType (string): Classified business type: saas, agency, consulting, legal, accounting, ecommerce, healthcare, real_estate, financial_services, manufacturing, education, nonprofit, construction, hospitality, media, recruitment, logistics, technology. Null when unclassifiable
recommendation (string): Actionable next step: "Use Email Pattern Finder for names without emails", "Try deepScan=true", "Use Pro version (Next.js detected)". Null when the result is complete
confidence (object): Trust breakdown: emailConfidence (0-100), contactConfidence (0-100), overallConfidence (0-100, weighted 60/40), riskFlags (catch_all_domain, emails_unverified, generic_emails_only, contains_generated_emails, javascript_site_partial_data)
coverage (object): Data completeness per signal: emails (complete/partial/missing), contacts (complete/partial/missing), phones (found/missing), socials (found/missing), addresses (found/missing), contactForm (boolean)
decision (object): Outreach readiness: tier (A/B/C) and reason. A = ready to contact, B = usable, C = needs work
contactFormDetected (boolean): True when a contact form was found; explains why no direct email may be listed
domainPurity (number): Percentage (0-100) of emails matching the website's root domain. 100 = all emails are @company.com. Low values indicate third-party or partner emails
summary (object): Flat summary for CSV/spreadsheet use: primaryEmail, primaryContact, title, decision (A/B/C), confidence (0-100), leadScore (0-100)
failureType (string): Failure classification: blocked, timeout, js-required, no-data, or parse-error (null on successful scrapes)
scrapeError (string): Human-readable error message with an actionable suggestion (present only on failed domains)
jsWarning (string): Warning when a JavaScript framework is detected and no data was extracted
botProtection (object): Bot-protection detection: { detected: boolean, type: 'cloudflare' | 'datadome' | 'akamai' | 'perimeterx' | 'imperva' | 'generic-challenge' | null, recommendation: string | null }
buyingCommittee (object): v2.0. Contacts grouped by buying-committee role: decisionMakers (CEO/founder/C-suite), influencers (VP/Director), champions (Sales/BD, usually the most reachable), blockers (Legal/Procurement). Each member: { name, title, email, seniority, reachable }. Plus size (total members).
catchAllDetected (boolean): v2.0. True when the email verifier flagged the domain as catch-all.
catchAllImplication (string): v2.0. Plain-English consequence of the catch-all flag for outreach decisions. Null when not catch-all.
isContactable (boolean): v2.0. Convenience boolean: true when this domain has a personal email or bestContact.email.
recordType (string): v2.0. Discriminator: lead for scraped domains, error for run-level errors.
changeFlags (string[]): v2.0 (monitoring mode). Stable change codes since last run: NEW_DOMAIN / NEW_EMAILS / NEW_PERSONAL_EMAIL / NEW_TEAM_HIRE / TEAM_DEPARTURE / REMOVED_EMAILS / NEW_SOCIAL_PROFILE / TIER_UPGRADED / TIER_DOWNGRADED / LEAD_SCORE_INCREASED / LEAD_SCORE_DECREASED / UNCHANGED. Empty when monitoring is off.
changeSinceLastRun (object): v2.0 (monitoring mode). Per-domain delta: { addedEmails, removedEmails, addedPersonalEmails, removedPersonalEmails, addedContacts, removedContacts, addedSocials, removedSocials, leadScoreDelta, decisionTierBefore, decisionTierAfter, daysSinceLastSeen }. Null on first observation.
firstSeenAt (string): v2.0 (monitoring mode). ISO timestamp of the first observation across monitor runs.
lastSeenAt (string): v2.0 (monitoring mode). ISO timestamp of the most recent observation.
crmPushResult (object): v2.0 (CRM auto-push). Per-record outcome: { sent, statusCode, format, error }. Null when the push was skipped (e.g., filtered out by crmOnlyTierA: true, or no email).
sendDecision (object): v3. The headline action field: { action: 'SEND_NOW' | 'VERIFY_FIRST' | 'SKIP' | 'ENRICH_MORE', riskLevel: 'low' | 'medium' | 'high', reasons: string[] }. Branch automation on action; never parse the prose.
sendPlan (object): v3. Sequence-ready execution plan: { status: 'ready' | 'verify-first' | 'enrich-more' | 'skip', priority, safeToAutomate, channel: 'email-first' | 'phone-first' | 'linkedin-first' | 'multi-channel' | 'no-channel', followUpStrategy, personalizationHint, openingAngle, replyLikelihoodHeuristic, methodology }. replyLikelihoodHeuristic is a 0-1 heuristic ranking composed of public-benchmark signals (verified-email +20pp, senior-title +10pp, etc.); it is NOT a trained ML probability. safeToAutomate=true is the gate to check before automating.
firstTouch (object): v3. Opening-line primitive: { angle, hook, line, methodology }. Generated deterministically from job-title regex + company-type lookup + companyMeta; NOT an LLM and NOT generated email copy. The line is a sentence STEM the user completes with their own value prop. Null when no usable best-contact title exists.
pipelineValue (object): v3. Relative priority within the batch: { tierWeight, contactQualityWeight, companyFitWeight, relativeScore (0-1, normalised against the strongest lead in the run), rankInBatch (1 = best), methodology }. NOT an absolute likelihood. Answers "who do I contact first?" within this list.
whyThisLead (string[]): v3. Intent signals (NOT scoring reasons): ["Partnerships role present → likely open to external collaboration", "Sales function exists → outbound motion likely", ...]. Empty array when no intent signals match.
recoveryPlan (object): v3. When a domain failed or returned thin data: { nextBestTool, nextBestActorSlug, method, confidence }. Maps each failureType to a specific next-best Apify actor or technique. Null on successful complete results.
plainEnglishSummary (string): v3. One-sentence human-readable takeaway per domain. Usable verbatim in emails, Slack messages, AI summaries, and dashboards without post-processing.
scrapedAt (string): ISO 8601 timestamp when the result was assembled
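The leadScore weights and the 60/40 overallConfidence blend can be checked by hand. A minimal sketch, assuming leadScore is a plain sum of the listed weights for the signals present and that overallConfidence weights email confidence at 60% and rounds to the nearest integer. The signal labels such as personal_email are hypothetical, and the actor may give partial credit, so this will not necessarily reproduce every score it emits.

```python
# Weights copied from the leadScore row of the table above.
LEAD_SCORE_WEIGHTS = {
    "personal_email": 25,
    "named_contact": 20,
    "phone": 15,
    "linkedin": 10,
    "verified_email": 10,
    "address": 5,
    "hours": 5,
    "company_meta": 5,
    "multiple_personal_emails": 5,
}


def lead_score(signals):
    """Sum the weight of each signal present; the weights add up to exactly 100."""
    unknown = set(signals) - set(LEAD_SCORE_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    return sum(LEAD_SCORE_WEIGHTS[s] for s in set(signals))


def overall_confidence(email_confidence, contact_confidence):
    """Blend the two sub-scores 60/40 (email weighted higher: an assumption) and round."""
    return round(0.6 * email_confidence + 0.4 * contact_confidence)
```

With the example record's emailConfidence of 92 and contactConfidence of 90, the blend is 0.6 × 92 + 0.4 × 90 = 91.2, which rounds to the 91 shown. The reverse split gives 90.8, which also rounds to 91, so the example does not settle which side gets the 60%.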

How much does it cost?

$0.15 per website. No subscription. No monthly minimum. You only pay when contact data is found β€” failed domains are free.

What you get | Websites | Max cost | Typical usable leads
Quick test | 10 | $1.50 | 6-8 Tier A/B leads
Prospect list | 100 | $15 | 40-60 usable leads
Campaign batch | 500 | $75 | 200-350 usable leads
Enterprise | 1,000 (two runs of 500) | $150 | 400-700 usable leads

"Max cost" assumes every website returns contact data. "Usable leads" = domains with at least one personal email, or a named contact with a generated email (Tier A or B).

You can set a spending limit per run. All scraped data is always delivered to the dataset regardless of budget; charges stop when your limit is reached.
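Since only websites that return contact data are charged, the cost column above is a ceiling. A small budgeting sketch that works in integer cents to avoid floating-point drift; the estimate_cost helper and the idea of plugging in an expected number of sites with data are assumptions for illustration, and the optional cap is applied to the whole batch rather than per run.

```python
import math

PRICE_CENTS_PER_SITE = 15   # $0.15 per website that returns contact data
MAX_URLS_PER_RUN = 500      # documented input limit per run


def estimate_cost(total_sites, sites_with_data, cap_cents=None):
    """Budget a batch in integer cents; only sites with contact data are charged (hypothetical helper)."""
    if not 0 <= sites_with_data <= total_sites:
        raise ValueError("sites_with_data must be between 0 and total_sites")
    charged = sites_with_data * PRICE_CENTS_PER_SITE
    if cap_cents is not None:
        # Simplification: one cap for the whole batch, not one per run.
        charged = min(charged, cap_cents)
    return {
        "runs_needed": math.ceil(total_sites / MAX_URLS_PER_RUN),
        "max_cost_usd": total_sites * PRICE_CENTS_PER_SITE / 100,
        "expected_cost_usd": charged / 100,
    }
```

For 1,000 domains with an assumed 70% hit rate, estimate_cost(1000, 700) gives 2 runs, a $150 ceiling, and an expected $105.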

Extract website contacts using the API

Python

from apify_client import ApifyClient
client = ApifyClient("YOUR_API_TOKEN")
# Start the run and wait for it to finish
run = client.actor("ryanclinton/website-contact-scraper").call(run_input={
    "urls": [
        "https://pinnacleventures.com",
        "https://meridiantech.io",
        "https://atlaslogistics.com",
    ],
    "maxPagesPerDomain": 5,
    "deepScan": True,
    "verifyEmails": True,
})
# Iterate over every record in the run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    domain = item.get("domain")  # .get() in case an error record lacks it
    personal = item.get("personalEmails", [])
    generic = item.get("genericEmails", [])
    verified = item.get("verifiedEmails", [])  # present only when verifyEmails is on
    valid_count = sum(1 for v in verified if v["status"] == "valid")
    print(f"{domain}: {len(personal)} personal, {len(generic)} generic, {valid_count} verified-valid")
    for contact in item.get("contacts", []):
        # title is optional; fall back when it is missing or null
        print(f"  {contact['name']} - {contact.get('title') or 'no title'}")

JavaScript

import { ApifyClient } from "apify-client";
// Top-level await requires an ES module (for example, a .mjs file)
const client = new ApifyClient({ token: "YOUR_API_TOKEN" });
const run = await client.actor("ryanclinton/website-contact-scraper").call({
    urls: [
        "https://pinnacleventures.com",
        "https://meridiantech.io",
        "https://atlaslogistics.com",
    ],
    maxPagesPerDomain: 5,
    deepScan: true,
    verifyEmails: true,
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const item of items) {
    // verifiedEmails is present only when verifyEmails is on
    const validEmails = (item.verifiedEmails ?? []).filter((v) => v.status === "valid");
    console.log(`${item.domain}: ${(item.personalEmails ?? []).length} personal, ${validEmails.length} verified-valid`);
    for (const contact of item.contacts ?? []) {
        console.log(`  ${contact.name} (${contact.title ?? "no title"})`);
    }
}

cURL

# Start the actor run (the call returns immediately; the run continues in the background)
curl -X POST "https://api.apify.com/v2/acts/ryanclinton~website-contact-scraper/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://pinnacleventures.com", "https://meridiantech.io"],
    "maxPagesPerDomain": 5,
    "deepScan": true,
    "verifyEmails": true
  }'
# Once the run has finished, fetch results; DATASET_ID is data.defaultDatasetId in the run response
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"
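The output table recommends branching automation on sendDecision.action rather than parsing prose, and describes sendPlan.safeToAutomate as the automation gate. A minimal Python sketch of that routing over fetched dataset items, assuming the field shapes listed in the table; the ERROR bucket, the default for records without sendDecision, and the choice to hold unsafe SEND_NOW leads as VERIFY_FIRST are illustrative policies, not actor behavior.

```python
from collections import defaultdict

ACTIONS = ("SEND_NOW", "VERIFY_FIRST", "SKIP", "ENRICH_MORE")


def route_leads(items):
    """Group domains by sendDecision.action, gating automation on sendPlan.safeToAutomate."""
    routes = defaultdict(list)
    for item in items:
        if item.get("recordType") == "error":
            routes["ERROR"].append(item.get("domain"))
            continue
        # Assumption: a record without sendDecision is treated as needing enrichment.
        action = (item.get("sendDecision") or {}).get("action", "ENRICH_MORE")
        if action not in ACTIONS:
            raise ValueError(f"unexpected action: {action}")
        safe = (item.get("sendPlan") or {}).get("safeToAutomate", False)
        if action == "SEND_NOW" and not safe:
            # Illustrative policy: hold send-ready leads that are not cleared for automation.
            action = "VERIFY_FIRST"
        routes[action].append(item["domain"])
    return dict(routes)


# Usage with the Python client shown above (not executed here):
# routes = route_leads(client.dataset(run["defaultDatasetId"]).iterate_items())
```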

Tips for best results

  1. Enable deep scan for European companies. EU regulations require businesses to display contact information on imprint pages (/impressum, /imprint). Deep scan probes 14 additional paths, including these, that standard crawling misses, often uncovering emails and phone numbers not listed on the main contact page.

  2. Enable email verification for outreach lists. The built-in verifier catches invalid addresses, disposable domains, and catch-all servers before they reach your outreach tool. This keeps bounce rates below 5% and protects your sender reputation. Filter output by verifiedEmails[].status === "valid" for the cleanest list.

  3. Enable proxies for batches over 20 sites. Apify Proxy rotates IP addresses automatically. Set proxyConfiguration: { "useApifyProxy": true } in your input. This is the single biggest factor in preventing blocks on large batches.

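  4. Filter emails by domain in post-processing. The output may include third-party emails from embedded contact forms, partner widgets, or job board integrations. After downloading, keep only emails ending in @yourtargetdomain.com. The domainPurity field shows what percentage of each record's emails already match the site's root domain, so low values point to the records that need this cleanup most.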

  5. Pair with Email Pattern Finder for gap coverage. If Website Contact Scraper returns team member names but no personal emails, feed the names and domain into Email Pattern Finder to predict addresses based on the company's first.last@, first@, or flast@ naming convention.

  6. Disable includeNames for pure email/phone runs. Name extraction performs DOM traversal with 11 CSS selectors and Schema.org queries per page. If you only need emails and phones, disabling it reduces per-page processing time.

  7. Set a spending cap for large batches. Use the run's max cost setting to cap spend at a comfortable amount. Website Contact Scraper stops gracefully at the limit and logs how many domains were processed vs. total.

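  8. Use CSV export for CRM bulk import. Download results as CSV and map columns to HubSpot, Salesforce, or Pipedrive contact import templates. The summary object (primaryEmail, primaryContact, title, decision, confidence, leadScore) is flat by design and maps most directly; array fields such as personalEmails, genericEmails, and phones may need to be split or joined to fit your CRM's columns.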
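Tips 2 and 4 can be combined into one post-processing pass. A sketch, assuming records shaped like the output table: it keeps emails on the target domain, separates verified-valid addresses from risky or unverified ones, drops invalid ones, and reports a purity percentage in the spirit of domainPurity. The clean_emails name is hypothetical, and the actor's exact purity definition (including how subdomains are treated) is an assumption here.

```python
def clean_emails(item, target_domain=None):
    """Keep on-domain emails and split them by verification status (hypothetical helper)."""
    domain = (target_domain or item["domain"]).lower()

    def on_domain(email):
        host = email.rsplit("@", 1)[-1].lower()
        # Assumption: subdomains of the target (e.g. mail.acme.com) also count.
        return host == domain or host.endswith("." + domain)

    all_emails = item.get("emails", [])
    own = [e for e in all_emails if on_domain(e)]
    status = {v["email"].lower(): v["status"] for v in item.get("verifiedEmails", [])}
    return {
        "valid": [e for e in own if status.get(e.lower()) == "valid"],
        # Risky (often catch-all) or never verified: review before sending.
        "review": [e for e in own if status.get(e.lower()) not in ("valid", "invalid")],
        # Share of emails on the site's own domain, comparable in spirit to domainPurity.
        "purity": round(100 * len(own) / len(all_emails)) if all_emails else None,
    }
```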

Combine with other Apify actors

Actor | How to combine
Email Pattern Finder | When contacts have names but no emails, predict addresses from the company's email naming convention ($0.10/domain)
Bulk Email Verifier | Verify emails separately if you ran Website Contact Scraper without verifyEmails enabled ($0.005/email)
B2B Lead Qualifier | Score scraped contacts 0-100 using company data, tech stack, and 30+ signals ($0.15/lead)
Website Contact Scraper Pro | Automatic fallback when enableProFallback is on, or use it standalone for JavaScript-heavy sites (React, Angular, Vue SPAs) that require a browser to render contact data ($0.35/site)
HubSpot Lead Pusher | Push scraped contact records directly into HubSpot as new contacts or update existing ones
Website Tech Stack Detector | Identify 100+ technologies used by each company for technographic lead scoring ($0.10/site)
B2B Lead Gen Suite | Full pipeline: input URLs → scraped contacts → enrichment → scored leads, all in one actor ($0.25/lead)

Limitations

  • No JavaScript rendering: Website Contact Scraper uses CheerioCrawler, which parses static server-rendered HTML. Single-page applications that load contact data via client-side JavaScript (React, Angular, Vue) will not have their dynamic content extracted. The actor detects these frameworks and warns you in the output. For JS-heavy sites, use Website Contact Scraper Pro.
  • Same-domain links only: Website Contact Scraper only follows links within the same domain as the input URL. Cross-domain team directories or externally hosted about pages are not discovered.
  • Name extraction depends on HTML patterns: team member detection relies on Schema.org markup, 11 recognized CSS class names, and heading-paragraph structure. Custom or unconventional layouts may not trigger any of these three extraction strategies.
  • Phone extraction uses targeted selectors: to minimize false positives, phone matching first targets header, footer, nav, address, and elements with contact/phone/info class names. Numbers formatted as bare digits without separators will not be captured.
  • No authentication support: only publicly accessible pages are processed. Login-gated employee directories, intranets, and members-only portals are not supported.
  • First social link per platform: if a page contains multiple LinkedIn profiles (e.g., a company page plus individual employee profiles), only the first matched URL per platform is recorded. Footer/header/nav links are prioritized over body links.
  • One record per domain: multiple input URLs on the same domain (e.g., acmecorp.com and www.acmecorp.com) are merged into a single output record. This is by design, to prevent duplicate billing.
  • Verification adds runtime: enabling verifyEmails adds 1-2 minutes to the run because a separate verification actor is called. For batches with 1,000+ unique emails, this may take longer.
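The url and domain field descriptions, together with the one-record-per-domain rule, imply a normalization step. If you want to de-duplicate an input list yourself before a run, something like the following sketch works. It assumes bare domains should be treated as HTTPS and that www. is the only prefix collapsed, and it drops ports and query strings, which the actor may handle differently; both helper names are hypothetical.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Return (url, domain) in the shape the url and domain fields describe."""
    if "://" not in url:
        url = "https://" + url                   # assumption: bare domains mean HTTPS
    parts = urlsplit(url)
    host = parts.hostname or ""                  # hostname is already lowercased
    normalized = f"https://{host}{parts.path.rstrip('/')}"  # ports and query strings dropped
    domain = host[4:] if host.startswith("www.") else host
    return normalized, domain


def dedupe_by_domain(urls):
    """Keep the first URL seen per domain, mirroring one output record per domain."""
    kept = {}
    for url in urls:
        normalized, domain = normalize(url)
        kept.setdefault(domain, normalized)
    return kept
```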

Integrations

  • Zapier: trigger a Zap when a run completes and push verified emails and contact names directly to your CRM or notification system
  • Make: build automated workflows that route personal vs. generic emails to different CRM fields or marketing lists
  • Google Sheets: export results directly to a Google Sheet for collaborative review, filtering by verification status, or manual enrichment
  • Apify API: trigger runs programmatically and retrieve results in JSON, CSV, XML, or Excel format using the Python or JavaScript SDK
  • Webhooks: receive an HTTP POST when a run completes and automatically trigger downstream processing in your backend
  • LangChain / LlamaIndex: feed verified contact datasets into AI agent workflows for automated research, outreach drafting, or lead qualification

Troubleshooting

  • Empty email results despite a site showing contact addresses: The site likely loads contact information via JavaScript after the initial page load. Website Contact Scraper parses only the static HTML returned by the server. Check the output for a jsWarning field. For dynamically rendered sites, switch to Website Contact Scraper Pro.

  • Run takes longer than expected for large batches: Each website crawls up to maxPagesPerDomain pages with a 30-second timeout per page. A batch of 500 sites at 5 pages each could make up to 2,500 HTTP requests. Lower maxPagesPerDomain to 3 for a faster pass. Enabling Apify Proxy can also improve speed on sites that throttle repeated requests. Email verification adds 1-2 minutes at the end.

  • Phone numbers are missing from output: Phone extraction requires recognized formatting (international prefix, parentheses, or dash/dot separators). Website Contact Scraper first checks contact-specific page areas, then falls back to full body text. Numbers formatted as bare 10-digit strings without separators are intentionally skipped to avoid false positives from zip codes, IDs, and other numeric data.

  • Some contacts have names but no emails: Name extraction and email extraction are independent processes. Not every team member lists a personal email; many sites only have a generic contact@ address. Use Email Pattern Finder to predict personal email addresses from names and the company domain.

  • Verified emails showing "risky" status: A "risky" status typically means the domain has a catch-all configuration that accepts all addresses, making it impossible to confirm whether a specific mailbox exists. These emails may still be deliverable. Use the confidence score to decide your threshold; addresses above 70% confidence are generally safe for outreach.
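The phone behavior described in the Limitations and Troubleshooting notes (formatted numbers only, bare digit runs skipped, de-duplication keyed on digits) can be approximated with a single regular expression. The pattern below is a hypothetical illustration, not the actor's own, and it will miss some valid international formats.

```python
import re

# Hypothetical pattern, not the actor's own. Either an E.164-style "+" number,
# or a number whose area code is in parentheses or followed by a separator and
# whose local part contains a separator. Bare digit runs like 4155550123 never match.
PHONE_RE = re.compile(
    r"\+\d{8,15}\b"
    r"|(?:\+\d{1,3}[\s.-]?)?(?:\(\d{2,4}\)[\s.-]?|\d{2,4}[\s.-])\d{3,4}[\s.-]\d{3,4}"
)


def extract_phones(text):
    """Return phone-like strings, de-duplicated on their digits only."""
    seen, phones = set(), []
    for match in PHONE_RE.findall(text):
        key = re.sub(r"\D", "", match)   # digits-only key collapses format variants
        if key not in seen:
            seen.add(key)
            phones.append(match)
    return phones
```

Digit-only keys collapse (415) 555-0100 and 415.555.0100, but not +1 (415) 555-0100, because the country code changes the digits.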

Responsible use

  • Website Contact Scraper only accesses publicly visible web pages available to any browser without authentication.
  • Respect website terms of service and robots.txt directives.
  • Comply with GDPR, CAN-SPAM, CASL, and other applicable data protection laws when using scraped contact data for commercial outreach.
  • Do not use extracted personal contact information for spam, harassment, or unauthorized purposes.
  • For guidance on web scraping legality, see Apify's guide.

FAQ

How many websites can Website Contact Scraper process in one run? The input accepts up to 500 URLs per run. Website Contact Scraper processes sites concurrently (up to 20 at once) at ~0.8 seconds per domain. A batch of 100 websites completes in under a minute, and 500 websites in under 10 minutes. Enable proxies for batches over 20 sites.

Does Website Contact Scraper verify email addresses? Yes. Enable the verifyEmails option and Website Contact Scraper runs MX record checks, disposable domain detection, and role-based flagging on every found email. Each verified email gets a status (valid/invalid/risky), a confidence score (0-100), and a human-readable reason. Because emails are extracted from the live website rather than a stale database, they tend to be more current than results from pre-crawled sources. Verification uses Bulk Email Verifier internally, so you don't need to start a separate run yourself, and there is no additional cost.

What is the difference between personalEmails and genericEmails in the output? Personal emails are addressed to individuals (sarah@, j.smith@, m.rodriguez@). Generic emails use role-based prefixes like info@, hello@, contact@, office@, sales@, billing@, support@, and 9 other patterns. Website Contact Scraper classifies all found emails into both arrays automatically, so you can target decision-makers directly instead of shared inboxes.
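Splitting addresses into personalEmails and genericEmails comes down to checking the part before the @. A sketch using only the seven generic prefixes named in this answer; the actor uses 16, and the other nine are not listed, so this set is deliberately incomplete. It also assumes an exact match on the whole local part, so an address like sales-team@ would be classed as personal.

```python
# The seven generic prefixes named in this answer; the actor's full list has 16.
GENERIC_PREFIXES = {"info", "hello", "contact", "office", "sales", "billing", "support"}


def classify_emails(emails):
    """Split addresses into (personal, generic) by exact match on the local part."""
    personal, generic = [], []
    for email in emails:
        local = email.split("@", 1)[0].lower()
        (generic if local in GENERIC_PREFIXES else personal).append(email)
    return personal, generic
```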

Can Website Contact Scraper extract emails hidden behind JavaScript? No. Website Contact Scraper uses CheerioCrawler, which parses static HTML. If contact emails are loaded via client-side JavaScript (common on React and Next.js sites), they will not appear in the output. Website Contact Scraper detects these frameworks and adds a jsWarning to the result. For JavaScript-rendered sites, use Website Contact Scraper Pro.

What is deep scan mode and when should I enable it? Deep scan probes 14 hidden page paths (/imprint, /impressum, /privacy-policy, /legal, /datenschutz, /support, /careers, and more) that often contain contact information not linked from the main navigation. European businesses are legally required to display contact details on imprint pages. Enable deep scan for EU companies or any site where the standard crawl returned fewer contacts than expected.

What types of email addresses does Website Contact Scraper filter out? Website Contact Scraper removes noreply@, no-reply@, donotreply@, test@, webmaster@, postmaster@, mailer-daemon@, and root@ addresses. It also filters emails ending in image, CSS, or JavaScript file extensions (.png, .jpg, .css, .js) and addresses from known infrastructure domains (sentry.io, wixpress.com, placeholder.*).
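The junk filter described in this answer can be expressed as a single predicate. A sketch built from the lists above, with one interpretation flagged: placeholder.* is read as any host starting with placeholder. The actor's real rules may be broader, and is_junk is a hypothetical name.

```python
# Lists taken from the answer above; "placeholder.*" is read as any host that
# starts with "placeholder." (an interpretation).
JUNK_LOCAL_PARTS = {"noreply", "no-reply", "donotreply", "test", "webmaster",
                    "postmaster", "mailer-daemon", "root"}
ASSET_EXTENSIONS = (".png", ".jpg", ".css", ".js")
INFRA_DOMAINS = ("sentry.io", "wixpress.com")


def is_junk(email):
    """True for addresses the filter described above would drop."""
    email = email.lower()
    local, _, host = email.partition("@")
    if local in JUNK_LOCAL_PARTS or email.endswith(ASSET_EXTENSIONS):
        return True  # system mailbox, or an asset filename such as logo@2x.png
    if host.startswith("placeholder."):
        return True
    # Match an infrastructure domain itself or any of its subdomains.
    return any(host == d or host.endswith("." + d) for d in INFRA_DOMAINS)
```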

Is it legal to scrape contact information from business websites? The legality of scraping publicly available contact information depends on your jurisdiction and how you use the data. In the US, the Ninth Circuit's 2022 decision in hiQ Labs v. LinkedIn held that scraping publicly accessible data likely does not violate the Computer Fraud and Abuse Act, although claims based on a site's terms of service can still apply. In the EU, GDPR restricts how personal data can be processed for outreach. Always review the target site's Terms of Service and consult legal counsel for your specific use case. See Apify's web scraping legality guide.

How is Website Contact Scraper different from Hunter.io or Apollo.io? Hunter.io and Apollo.io query pre-crawled databases, so the data can be days or weeks stale, and neither tells you who to email. Website Contact Scraper crawls the live website each time, then scores every domain, ranks the best contact to email, assigns an A/B/C outreach readiness tier, breaks down confidence with risk flags, and recommends what to do for incomplete results. It also auto-fills missing emails, classifies company types, and extracts social links for 13 platforms. Pricing is $0.15/site with no subscription: 100 companies cost $15, less than one month of Hunter's cheapest plan.

Can I schedule Website Contact Scraper to run on a recurring basis? Yes. Use Apify Schedules to run Website Contact Scraper daily, weekly, or at any custom cron interval. Because Website Contact Scraper extracts from the live website each time, scheduled runs capture new team members, updated phone numbers, and changed email addresses that database-based tools miss between their crawl cycles. Combine with webhooks to automatically push new results to your CRM.

How accurate is the contact name extraction? Accuracy depends on the site's HTML structure. Sites using Schema.org Person markup or standard team-card CSS patterns (.team-member, .team-card, etc.) yield near-perfect results. Website Contact Scraper uses a strict proper-name regex with Unicode accent support (it handles names like Björn, O'Brien, and Anne-Marie) and a 40-word junk-name blocklist to minimize false positives. Sites with custom or unconventional layouts may produce fewer contacts.
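A proper-name check with Unicode support can be sketched without a third-party regex library by relying on Python's Unicode-aware string methods. This is an illustration under stated assumptions (2 to 4 capitalized parts, apostrophes and hyphens allowed inside a part, a tiny hypothetical blocklist standing in for the actor's 40-word list), not the actor's actual regex.

```python
import re

# Hypothetical stand-in for the actor's ~40-word junk-name blocklist.
JUNK_NAMES = {"our team", "contact us", "read more", "learn more"}


def _is_name_part(part):
    """A capitalized word; apostrophes and hyphens may join pieces (O'Brien, Anne-Marie)."""
    for piece in re.split(r"['’-]", part):
        # str.isalpha/isupper are Unicode-aware, so "Björn" passes.
        if not (piece.isalpha() and piece[0].isupper()):
            return False
        if len(piece) > 1 and piece.isupper():
            return False  # reject all-caps words such as "CONTACT"
    return True


def looks_like_name(text):
    """True for 2 to 4 capitalized parts that are not on the blocklist."""
    text = " ".join(text.split())
    if text.lower() in JUNK_NAMES:
        return False
    parts = text.split(" ")
    return 2 <= len(parts) <= 4 and all(_is_name_part(p) for p in parts)
```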

What happens if a website is down or blocks the request? Website Contact Scraper retries each failed request up to 3 times with session pooling and persistent cookies. If all retries fail, the domain is included in the output with a scrapeError field explaining what went wrong. Failed domains are not charged in pay-per-event mode. The run continues processing all other domains without interruption.

Can I push scraped contacts into HubSpot or Salesforce automatically? Yes, built in as of v2.0. Set crmWebhookUrl to your CRM's webhook endpoint and crmFormat to hubspot, salesforce, or generic-json. Every Tier-A enriched lead is POSTed straight to your CRM after it is written to the dataset, with native field shapes (HubSpot Contact properties, Salesforce Lead fields, or full JSON for Make.com / Zapier / n8n). The default crmOnlyTierA: true keeps your CRM clean, and the per-record crmPushResult field gives you a full audit trail. For multi-step flows, HubSpot Lead Pusher and Zapier/Make webhooks are still good options.

Can I run Website Contact Scraper as a continuous monitor for new hires and team changes? Yes, built in as of v2.0. Set compareToPrevRun: true and schedule the actor (Apify Schedules → daily / weekly / cron). The first run baselines your watchlist. Every subsequent run flags NEW_TEAM_HIRE, NEW_PERSONAL_EMAIL, TEAM_DEPARTURE, TIER_UPGRADED, and 7 other change codes per domain, plus a delta block (added emails, removed emails, score delta, days since last seen). Pair it with crmWebhookUrl and you have a continuous CRM-enrichment loop: new contacts flow in automatically, tier upgrades update existing records, and you never touch a CSV.
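Monitoring and CRM push are both driven by input fields. A sketch that builds such an input and then picks out the domains whose changeFlags deserve attention; only the field names and flag codes come from this README, while the helper names, the watch list, and the placeholder webhook URL are assumptions. The schedule itself is set up separately in Apify Schedules.

```python
WATCH_FLAGS = {"NEW_TEAM_HIRE", "NEW_PERSONAL_EMAIL", "TEAM_DEPARTURE", "TIER_UPGRADED"}
CRM_FORMATS = {"hubspot", "salesforce", "generic-json"}


def monitor_input(urls, webhook_url, crm_format="hubspot", only_tier_a=True):
    """Run input for a scheduled monitor that pushes leads to a CRM webhook."""
    if crm_format not in CRM_FORMATS:
        raise ValueError(f"unsupported crmFormat: {crm_format}")
    return {
        "urls": list(urls),
        "compareToPrevRun": True,     # baseline on the first run, diff afterwards
        "crmWebhookUrl": webhook_url,
        "crmFormat": crm_format,
        "crmOnlyTierA": only_tier_a,  # documented default is true
    }


def alerts(items):
    """Return (domain, flags) pairs for records carrying any watched change flag."""
    out = []
    for item in items:
        hits = sorted(WATCH_FLAGS.intersection(item.get("changeFlags") or []))
        if hits:
            out.append((item.get("domain"), hits))
    return out


# Usage with the Python client shown earlier (not executed here):
# run = client.actor("ryanclinton/website-contact-scraper").call(
#     run_input=monitor_input(["https://pinnacleventures.com"], "https://example.com/hook"))
# print(alerts(client.dataset(run["defaultDatasetId"]).iterate_items()))
```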

What's the buying committee output and why does it matter? B2B sales rarely close on a single contact; the average B2B purchase involves 6-10 decision-makers. v2.0 groups every domain's contacts into 4 buckets: decisionMakers (CEO/founder/C-suite, seniority ≥ 90), influencers (VP/Director/Head of, 70-89), champions (Sales/BD/Partnerships at any senior level, usually the most reachable), and blockers (Legal/Procurement/Finance, email them last). Use champions for the first outbound touch, decisionMakers once the champion has connected you internally, influencers for technical buyer alignment, and skip blockers until contracts. The size field tells you how complete the committee is: a domain with 1 decision-maker and 0 champions is a Tier B opportunity at best.
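The bucket rules above can be sketched as a small classifier. It assumes seniority is already scored 0-100, as in the example record, checks the ≥ 90 threshold first, and then checks job function before the 70-89 band so that a BD director lands in champions, as James Okafor does in the example. The keyword lists and that precedence order are assumptions, not the actor's published logic.

```python
import re

BLOCKER_RE = re.compile(r"\b(legal|counsel|procurement|purchasing|finance)\b", re.IGNORECASE)
CHAMPION_RE = re.compile(r"\b(sales|bd|business development|partnerships?)\b", re.IGNORECASE)


def committee_role(title, seniority):
    """Pick a buying-committee bucket from a job title and a 0-100 seniority score."""
    title = title or ""
    if seniority >= 90:
        return "decisionMakers"  # CEO, founder, C-suite
    if BLOCKER_RE.search(title):
        return "blockers"        # Legal / Procurement / Finance
    if CHAMPION_RE.search(title):
        return "champions"       # Sales / BD / Partnerships
    if seniority >= 70:
        return "influencers"     # VP / Director / Head of (70-89)
    return None                  # below the committee bands (assumption)


def build_committee(contacts):
    """Group contact names into the four buckets plus a size total."""
    committee = {"decisionMakers": [], "influencers": [], "champions": [], "blockers": []}
    for contact in contacts:
        role = committee_role(contact.get("title"), contact["seniority"])
        if role:
            committee[role].append(contact["name"])
    committee["size"] = sum(len(members) for members in committee.values())
    return committee
```

Run on the three contacts from the example record, this reproduces the example's grouping: one decision-maker, one influencer, one champion, size 3.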

The actor used to fail silently on Cloudflare-protected sites. Did v2.0 fix that? Yes. v2.0 detects Cloudflare, DataDome, Akamai, PerimeterX, Imperva, and generic challenge pages and emits botProtection: { detected, type, recommendation } on every record. When enableProFallback: true is set, blocked pages are now auto-routed through the Pro browser fallback (previously only JavaScript-heavy pages were routed there). The recommendation field tells you the best mitigation per protection type, usually residential proxies plus browser rendering for Cloudflare / DataDome.

What this actor does NOT do

This is intentionally a focused tool. For things outside scope, use the right sibling instead:

You need… | Use this actor instead
LinkedIn profile data, connections, posts | LinkedIn Profile Scraper (third-party, TOS-sensitive)
Company database lookups (1B+ pre-crawled records) | Apollo.io / ZoomInfo (paid SaaS)
Browser-rendered JavaScript-heavy sites at scale | Website Contact Scraper Pro, or enable enableProFallback here
Generated outreach copy / email sequences | Outreach / Salesloft / Smartlead
Email pattern detection in isolation (without a website crawl) | Email Pattern Finder
Bulk email verification of an existing list | Bulk Email Verifier
Find companies via Google Maps by location/category | Google Maps Email Extractor
Multi-source waterfall enrichment of named contacts | Waterfall Contact Enrichment
Person-level enrichment (LinkedIn, social, work history) | Person Data Enrichment
30+ business-quality signals per company (hiring signals, growth, awards) | B2B Lead Qualifier
Fresh business contacts via Google Maps + website enrichment | Google Maps Lead Enricher
End-to-end outbound system (source + enrich + qualify + push) | B2B Lead Gen Suite

This actor specifically does NOT:

  • Render JavaScript natively: that's the Pro fallback's job (or set enableProFallback: true)
  • Access private databases (Apollo, ZoomInfo, Clearbit): by design, all data comes from the live public website
  • Scrape LinkedIn: TOS-hostile, and plenty of dedicated tools exist
  • Send outreach emails: it finds who to email; let your sequencer handle the messaging
  • Direct-dial phone enrichment: TCPA compliance risk, and no reliable open source
  • WHOIS / domain age / DNS health checks: out of scope, and GDPR-masked in the EU anyway
  • Funding / revenue estimates: no open API; Crunchbase / PitchBook are enterprise-licensed

Help us improve

If you encounter issues, you can help us debug faster by enabling run sharing in your Apify account:

  1. Go to Account Settings > Privacy
  2. Enable Share runs with public Actor creators

This lets us see your run details when something goes wrong, so we can fix issues faster. Your data is only visible to the actor developer, not publicly.

Support

Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom scraping solutions or enterprise integrations, reach out through the Apify platform.