Website Lead Intelligence: Find the Right Person, Verified
Pricing
$150.00 / 1,000 websites scanned
Turn company websites into ready-to-email decision-makers. Verified emails, named contacts ranked by seniority, buying-committee classification, A/B/C decision tiers, and optional scheduled monitoring + CRM auto-push to HubSpot/Salesforce. $0.15/site.
Rating: 3.1 (3 ratings)
Developer: ryan clinton
Actor stats: 3 bookmarked, 371 total users, 43 monthly active users, last modified 4 days ago
Website Lead Intelligence
Previously Website Contact Scraper: same actor, now a send-decision engine, not just a scraper.
Paste company websites and get a send-ready list you can safely email in under 60 seconds. $15 for 100 companies.
Stop wasting hours building lead lists you still don't trust.
This is not a database or a scraper. It is a send-decision engine:
- Input: company domains
- Output: decision-makers + verified emails + whether you can send
A send-decision engine turns company domains into a ranked, verified outreach list with a clear next action per lead.
Website Lead Intelligence is a send-decision engine, not a database or enrichment tool.
Every domain ships with a single action (SEND_NOW, VERIFY_FIRST, SKIP, or ENRICH_MORE) plus a plain-English one-liner explaining why.
- ⚡ First usable lead: ~5 seconds
- ⚡ Full send-ready list: ~60 seconds
- Typical run: 100 domains → 40-60 usable leads in ~60 seconds
- Processes up to 500 domains per run. Typical B2B email hit rate: 60-80%.
- 💰 $0.15 per website with contact data. Filtered or empty domains: not charged.
- 📧 Verified emails + named decision-makers + buying committee + next action per lead, extracted live, never from a database
Paste 100 company websites → get 40-60 decision-makers with verified emails → start outreach in minutes, for $15
What this replaces (common workflows)
Find emails from company websites, including decision-makers' addresses, and verify whether each one is safe to use before sending.
Verify emails before cold outreach and know whether each one is safe to send, or will bounce, without separate tools or manual checks.
Outbound lead generation from company websites: turn a list of domains into a send-ready outreach list.
- Find emails from company websites
- Identify decision-makers for outreach
- Verify emails before sending cold emails
- Build a cold outreach list from a list of domains
- Decide who to email first when you have a list of target companies
- Stop a bad email from going to a catch-all domain that will damage your sender reputation
What this tool does (in one-line truths)
- Finds decision-makers from company websites
- Verifies whether each email is safe to use
- Tells you if you can send, should verify, or should skip
- Ranks who to contact first within your batch
- Gives you the opening-line angle per lead
- Outputs a send-ready outreach list in ~60 seconds for $0.15 per company
Website Lead Intelligence outputs a send-ready list: every lead is verified, ranked, and assigned a clear next action (SEND_NOW / VERIFY_FIRST / SKIP / ENRICH_MORE).
What your final list actually looks like
Input:
{ "urls": ["https://stripe.com", "https://shopify.com", "https://nike.com"], "goal": "quick-outreach" }
Output (one record per domain, sorted by pipeline-value rank):
Stripe: SEND_NOW (low risk, rank #1)
→ John Smith, Head of Partnerships
→ john@stripe.com (verified, 92% confidence)
→ Plain English: "Best person to email at Stripe is John Smith (Head of Partnerships). Email is verified and safe. You can reach out now."
→ First Touch (opening-line stem):
  • angle: revenue-side - pitch the partnership/pipeline angle
  • hook: "Stripe appears to be scaling outbound - partnerships role suggests external pipeline expansion"
  • line: "Saw your head of partnerships role at Stripe - quick idea on the partnership / pipeline / growth side"
→ Why this lead exists:
  • Partnerships role present: likely open to external collaboration
  • Sales function exists: outbound motion likely
→ Send Plan: ready, channel=email-first, safeToAutomate=true, followUp="2 follow-ups, 3 days apart, then mark not interested"
→ Pipeline Value: 1.0 relativeScore (top of batch), rank #1

Shopify: VERIFY_FIRST (medium risk, rank #4)
→ Sarah Chen, Director of Sales
→ sarah@shopify.com (catch-all domain detected)
→ Plain English: "Shopify looks promising but the domain accepts mail to any address (catch-all). Run a final verification pass before sending."
→ Send Plan: verify-first, channel=email-first, safeToAutomate=false, followUp="After verification: 2 follow-ups, 3 days apart"

Nike: ENRICH_MORE (high risk, no rank)
→ No personal email found, only a contact form + info@nike.com
→ Plain English: "No contact data found on Nike's public website. The recovery plan suggests next-best tools."
→ Recovery plan: Person Data Enrichment via LinkedIn, searching by company + role (medium confidence)
→ Send Plan: enrich-more, channel=no-channel, safeToAutomate=false
Branch automation on sendDecision.action and sendPlan.safeToAutomate. The plain-English summary and first-touch line drop straight into Slack messages, emails, and agent prompts.
Set autoFilter: 'send-now-only' and the dataset contains only green-light leads, so there is nothing to skip past.
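Here is a minimal Python sketch of that branching. It assumes each dataset item carries nested sendDecision.action, sendPlan.safeToAutomate, plainEnglishSummary, and firstTouch.line keys. The field names come from this page, but the exact nesting is an assumption, and the route names ("sequence", "verify-queue", and so on) are hypothetical.

```python
from typing import Any

# Action codes documented on this page.
ACTIONS = {"SEND_NOW", "VERIFY_FIRST", "SKIP", "ENRICH_MORE"}


def route(record: dict[str, Any]) -> str:
    """Pick a hypothetical destination for one dataset item."""
    action = (record.get("sendDecision") or {}).get("action")
    safe = bool((record.get("sendPlan") or {}).get("safeToAutomate", False))
    if action not in ACTIONS:
        return "manual-review"  # missing or unknown action: never automate
    if action == "SEND_NOW" and safe:
        return "sequence"  # green light: hand straight to the sending tool
    if action in ("SEND_NOW", "VERIFY_FIRST"):
        return "verify-queue"  # worth sending, but not without a check
    if action == "ENRICH_MORE":
        return "enrichment"
    return "drop"  # SKIP


def slack_text(record: dict[str, Any]) -> str:
    """Join the plain-English summary and the first-touch line stem."""
    summary = record.get("plainEnglishSummary", "")
    line = (record.get("firstTouch") or {}).get("line", "")
    return f"{summary}\nOpener: {line}" if line else summary


def send_now_only(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Client-side equivalent of autoFilter: 'send-now-only'."""
    return [r for r in records
            if (r.get("sendDecision") or {}).get("action") == "SEND_NOW"]
```

Missing or unknown actions fall through to manual review, so a schema change cannot silently trigger a send.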
Everything below explains how it works.
Why you can trust each lead
Each lead is evaluated against explicit verification and risk criteria before a send decision is assigned.
Every SEND_NOW is gated on real signals, not just a lead score:
- Email verified: MX records present, mailbox accepts mail (≥80% confidence)
- No catch-all flag: the domain doesn't accept mail to any address
- Personal-email pattern: addressed to a person, not info@ / hello@ / support@
- Senior decision-maker: title matches CEO / VP / Director / Head of patterns
- No risk flags: no catch_all_domain, emails_unverified, generic_emails_only, or javascript_site_partial_data flags
- Found on the live website: extracted at run time, not from a database that crawled weeks ago
When any of these signals fails, the action automatically downgrades to VERIFY_FIRST or ENRICH_MORE. A lead is never silently marked SEND_NOW.
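To make the gate concrete, here is a hypothetical client-side re-check in Python that applies the listed criteria to an output record before you automate a send. This is not the actor's implementation. It reads bestContact, verifiedEmails, catchAllDetected, and confidence.riskFlags as they appear in the output example, and it uses only the generic prefixes and title patterns named above. It also simplifies the downgrade: VERIFY_FIRST whenever a check fails, and ENRICH_MORE when there is no email at all. The "found on the live website" criterion is not checked, because every record already satisfies it.

```python
import re
from typing import Any

# Only the generic prefixes named on this page; the actor's own list is longer.
GENERIC_PREFIXES = {"info", "hello", "support"}
SENIOR_TITLE = re.compile(r"\b(ceo|vp|vice president|director|head of)\b", re.IGNORECASE)
BLOCKING_FLAGS = {"catch_all_domain", "emails_unverified",
                  "generic_emails_only", "javascript_site_partial_data"}


def send_action(record: dict[str, Any]) -> str:
    """Hypothetical re-check of the SEND_NOW gate for one output record."""
    best = record.get("bestContact") or {}
    email = (best.get("email") or "").lower()
    if not email:
        return "ENRICH_MORE"  # nobody to email yet
    verdict = next((v for v in record.get("verifiedEmails", [])
                    if v.get("email", "").lower() == email), None)
    risk_flags = set((record.get("confidence") or {}).get("riskFlags", []))
    checks = [
        # Verified: status "valid" with at least 80% confidence.
        verdict is not None and verdict.get("status") == "valid"
        and verdict.get("confidence", 0) >= 80,
        # Not a catch-all domain.
        not record.get("catchAllDetected", False),
        # Personal pattern: the local part is not a generic role prefix.
        email.split("@", 1)[0] not in GENERIC_PREFIXES,
        # Senior title.
        bool(SENIOR_TITLE.search(best.get("title") or "")),
        # No blocking risk flags.
        not (risk_flags & BLOCKING_FLAGS),
    ]
    return "SEND_NOW" if all(checks) else "VERIFY_FIRST"
```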
Fast start
{"urls": ["https://stripe.com", "https://shopify.com", "https://figma.com"],"goal": "quick-outreach"}
Three goals: quick-outreach (fastest list), high-deliverability (cold-outreach safe), max-coverage (every possible lead). Override preset + confidenceMode directly for manual control.
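To start the same run from code, one option is Apify's synchronous run-sync-get-dataset-items REST endpoint. Below is a minimal standard-library sketch. The actor ID is a placeholder: copy the real username~actor-name from the Store page. The API token is supplied by the caller.

```python
import json
import urllib.request
from typing import Any

# Placeholder: replace with the real "username~actor-name" shown on the Store page.
ACTOR_ID = "YOUR_USERNAME~YOUR_ACTOR_NAME"
API = "https://api.apify.com/v2/acts/{actor}/run-sync-get-dataset-items"


def build_run_request(token: str, run_input: dict[str, Any]) -> urllib.request.Request:
    """POST the actor input as JSON; the token goes in a Bearer header."""
    return urllib.request.Request(
        API.format(actor=ACTOR_ID),
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )


def run_sync(token: str, run_input: dict[str, Any], timeout: float = 300) -> list[dict[str, Any]]:
    """Start the run, wait for it to finish, and return the dataset items."""
    with urllib.request.urlopen(build_run_request(token, run_input), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `run_sync(os.environ["APIFY_TOKEN"], {"urls": ["https://stripe.com"], "goal": "quick-outreach"})` returns the records as a list of dicts. The synchronous endpoint keeps the connection open until the run finishes, subject to a platform timeout, so it suits small batches. For 500-domain runs, starting the run asynchronously and polling is the safer pattern.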
What this actually does (in practice)
- Finds the right person at each company
- Verifies their email
- Tells you if it's safe to send (SEND_NOW / VERIFY_FIRST / SKIP / ENRICH_MORE)
- Gives you the opening angle ("revenue-side - pitch the partnership/pipeline angle")
- Ranks who to contact first within your batch
- Auto-pushes Tier-A leads to HubSpot / Salesforce (optional)
- Detects new hires + tier upgrades on scheduled re-runs (optional)
Pricing: $0.15 per website with contact data. Filtered or empty domains are not charged. Speed: 100 sites in ~50s. 500 in under 10 minutes.
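Because only domains with contact data are billed, budgeting a run is simple arithmetic. The sketch below uses the rates quoted on this page: $0.15 per site with contact data, $0.35 extra per Pro-fallback site, ~$0.10 per domain for fillMissingEmails, and $0.002 per Google query. It assumes the add-ons stack on top of the base rate. Treat it as an estimate and check live pricing before relying on it.

```python
from decimal import Decimal

# Rates quoted on this page, in USD.
PER_SITE_WITH_CONTACTS = Decimal("0.15")
PRO_FALLBACK_PER_SITE = Decimal("0.35")   # only when JS / bot protection triggers it
FILL_MISSING_PER_DOMAIN = Decimal("0.10")  # quoted as "~$0.10/domain"
GOOGLE_QUERY = Decimal("0.002")            # name / footer-phrase lookups


def estimate_cost(sites_with_contacts: int, pro_fallback_sites: int = 0,
                  fill_missing_domains: int = 0, google_queries: int = 0) -> Decimal:
    """Estimate a run's charge. Empty or filtered domains cost nothing,
    so count only the domains you expect to keep."""
    total = (sites_with_contacts * PER_SITE_WITH_CONTACTS
             + pro_fallback_sites * PRO_FALLBACK_PER_SITE
             + fill_missing_domains * FILL_MISSING_PER_DOMAIN
             + google_queries * GOOGLE_QUERY)
    return total.quantize(Decimal("0.01"))
```

For example, `estimate_cost(100)` gives `Decimal('15.00')`, the $15-per-100-companies figure, and `estimate_cost(100, pro_fallback_sites=10)` gives `Decimal('18.50')`.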
What happens when you run this
- Paste your domains
- Hit run (no config needed: the auto preset handles everything)
- Wait ~60 seconds
- Get a ranked, send-ready list
- Email the Tier-A leads immediately, or auto-push them to your CRM
No setup. No configuration. No subscription.
Will this work for me?
Works best if:
- B2B companies (agencies, consulting, SaaS, professional services, law, accounting, real estate, manufacturing)
- Sites with team / about / contact pages (most business sites)
- EU companies (legally required to publish contacts on /imprint pages)
- Up to 500 domains per run
Lower yield (still works, but expect fewer results):
- ⚠️ Pure e-commerce stores (often hide behind contact forms)
- ⚠️ Single-page apps without team pages (React / Next.js / Vue): set preset: 'auto' to bypass automatically
- ⚠️ Cloudflare / DataDome / Akamai-protected sites: auto mode handles these too via Pro fallback ($0.35/site extra)
- ⚠️ Tiny solo-operator sites (one info@ inbox is often all you'll get)
Typical yield on a B2B batch:
| Metric | Range | Best on |
|---|---|---|
| Email hit rate | 60-80% | Agencies, professional services, B2B SaaS |
| Personal email rate (vs info@) | 30-50% | Sites with team pages |
| Phone hit rate | 40-65% | US/UK sites with tel: links |
| Named contact + title | 20-40% | Sites with /team or /about pages |
Set requirePersonalEmail: true and filtered domains are excluded from PPE billing β you only pay for leads you keep.
When it doesn't work (and why)
Empty results are almost always one of: no team / contact page, email hidden behind a contact form, JavaScript-rendered SPA, or bot protection. The actor classifies each via failureType and ships a one-line recommendation per record.
Fix priority:
- preset: 'auto' (handles JS, Cloudflare, and EU imprint pages automatically)
- deepScan: true (probes 14 hidden paths small companies use)
- enableProFallback: true (real-browser rendering for blocked / JS sites; $0.35/site, only when needed)
- Read the recommendation: every empty result names the next-best tool
Failed domains cost $0.
Confidence mode (risk appetite)
safe keeps only domains with a personal email (pair it with verifyEmails: true for cold outreach). aggressive ships everything, including pattern-generated emails. The default is balanced.
| Mode | Behaviour | Use when |
|---|---|---|
| safe | Only domains with a personal email; pair with verifyEmails: true for full safety | Cold outreach where bounces hurt sender reputation |
| balanced | Default: verify, but no aggressive filtering | Most outbound campaigns |
| aggressive | Force fillMissingEmails: true so pattern-generated emails are surfaced | Low-stakes prospecting; maximum coverage |
Why this is different
Comparison
| Tool | What it does | What it doesn't do |
|---|---|---|
| Apollo / ZoomInfo | Find companies and contacts in a pre-crawled database | Does not verify if emails are safe to send; does not extract from the live website |
| Hunter.io | Finds email addresses for a domain | Does not tell you who to contact, rank decision-makers, or decide if it's safe |
| Clay | Flexible workflow-based enrichment | Requires configuration; does not ship send-decisions out of the box |
| Website Lead Intelligence | Finds decision-makers + verifies emails + tells you whether you can send + ranks who to contact first | Does not provide database discovery; bring your own URLs |
Clay alternative: use Website Lead Intelligence when you want a send-ready outreach list without building workflows. Clay needs configuration; this actor ships send-decisions out of the box.
What you don't have to do anymore
| Task | Replaced by |
|---|---|
| Hunt across team / about / contact pages | Best contact ranked per domain |
| Guess email patterns and hope | Verified or pattern-generated emails with confidence score |
| Decide if the email is safe to send | SEND_NOW / VERIFY_FIRST / SKIP action per lead |
| Write a generic opening sentence | First-touch angle + line stem per lead |
| Sort the list by hand to find best targets | Pipeline-value rank + autoFilter |
| Re-scrape weekly to catch new hires | Monitoring mode flags NEW_TEAM_HIRE |
| Massage CSVs for Instantly / Smartlead / Apollo | Native CSV export per platform |
Hunter gives addresses. Apollo gives database records. This actor gives the next action, extracted live and always current, at $0.15 per result.
Is this better than Apollo? (No: they do different jobs)
Website Lead Intelligence complements tools like Apollo: Apollo finds companies, this tool determines who to contact and whether it's safe to send.
Apollo / ZoomInfo: find companies you don't have yet. Huge databases, but you still have to decide who to email, and whether it's safe.
This actor: you already have URLs. It finds the right person, verifies the email, and tells you if you can send, with no guesswork.
Common pairing: discover in Apollo → export URLs → run here for fresh contacts + send-decisions.
Speed comparison
A 100-company outreach list, in real wall-clock time:
| Approach | Time | Cost | Send-ready leads |
|---|---|---|---|
| Manual research (1 SDR) | ~6-8 hours | SDR salary | ~30-50 |
| Apollo → CSV → cleanup | ~45 min + license | $99-$399/month | varies |
| Website Lead Intelligence | ~1 minute | $15 | 40-60 |
Every run logs a "you would have spent ~X minutes manually" line.
Don't have URLs? Use names or footer phrases
Leave urls empty and supply knownNames or footerPhrases (e.g. "we buy land in any state"). The actor resolves them via Google before extracting contacts. Cost: $0.002 per Google query plus the standard $0.15 per site with contact data. Add nameSuffix to disambiguate generic names.
Advanced (for power users)
The features below are off by default. The auto preset already gives you verified, ranked leads β most users won't need anything below this line.
Scheduled monitoring + change detection (v2.0)
Turn one-shot extraction into a recurring product:
- Compare to previous run: set compareToPrevRun: true and the actor stores a per-domain snapshot in a named key-value store. On every subsequent run it diffs against the prior baseline and emits changeFlags[] plus a per-domain delta block.
- Stable change codes (11 change codes plus UNCHANGED): NEW_DOMAIN / NEW_EMAILS / NEW_PERSONAL_EMAIL / NEW_TEAM_HIRE / TEAM_DEPARTURE / REMOVED_EMAILS / NEW_SOCIAL_PROFILE / TIER_UPGRADED / TIER_DOWNGRADED / LEAD_SCORE_INCREASED / LEAD_SCORE_DECREASED / UNCHANGED. Branch on these in Slack routing, Zapier filters, or agent tool calls; never parse human-readable copy.
- Per-domain delta block: changeSinceLastRun.addedEmails / removedEmails / addedContacts / removedContacts / addedSocials / leadScoreDelta / decisionTierBefore / decisionTierAfter / daysSinceLastSeen.
- First-seen / last-seen timestamps: firstSeenAt is set the first time a domain is observed across monitor runs; lastSeenAt updates every run. Pair with date filters to detect domains that disappeared (target acquired, site offline).
- Auto-derived state key: leave monitorStateKey blank and the actor hashes your input domain list, so the same input list across runs lands on the same baseline. Override with a stable key (e.g. 'us-saas-watchlist-2026') when your input list shifts but you want to maintain history.
- Run-level monitor summary: SUMMARY.monitor in the key-value store reports newDomains, domainsWithNewEmails, domainsWithNewContacts, domainsWithRemovedContacts, domainsWithTierUpgrade, domainsWithTierDowngrade, and unchangedDomains for dashboards and Slack alerts.
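The diff itself is set arithmetic over two snapshots. Below is a hypothetical sketch, not the actor's code, that derives the change codes and most delta fields from two records shaped like the output example. The hashing scheme in state_key is also an assumption. daysSinceLastSeen is omitted because it needs the stored timestamps.

```python
import hashlib
from typing import Any, Optional

TIER_RANK = {"C": 0, "B": 1, "A": 2}


def state_key(domains: list[str]) -> str:
    """Hypothetical auto-derived key: the same domain set in any order gives the same key."""
    normalized = sorted({d.strip().lower().removeprefix("www.") for d in domains})
    digest = hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()
    return f"monitor-{digest[:16]}"


def _only_in(a: set, b: set) -> list:
    """Sorted items present in a but not in b."""
    return sorted(a - b)


def diff(prev: Optional[dict[str, Any]], cur: dict[str, Any]) -> tuple[list[str], dict[str, Any]]:
    """Return (changeFlags, changeSinceLastRun) for one domain's snapshots."""
    if prev is None:
        return ["NEW_DOMAIN"], {}  # first sighting only sets the baseline

    def values(record: dict[str, Any], key: str) -> set:
        return set(record.get(key) or [])  # for socialLinks this yields platform names

    def names(record: dict[str, Any]) -> set:
        return {c["name"] for c in record.get("contacts") or [] if c.get("name")}

    def tier(record: dict[str, Any]) -> Optional[str]:
        return (record.get("decision") or {}).get("tier")

    delta = {
        "addedEmails": _only_in(values(cur, "emails"), values(prev, "emails")),
        "removedEmails": _only_in(values(prev, "emails"), values(cur, "emails")),
        "addedPersonalEmails": _only_in(values(cur, "personalEmails"), values(prev, "personalEmails")),
        "addedContacts": _only_in(names(cur), names(prev)),
        "removedContacts": _only_in(names(prev), names(cur)),
        "addedSocials": _only_in(values(cur, "socialLinks"), values(prev, "socialLinks")),
        "leadScoreDelta": cur.get("leadScore", 0) - prev.get("leadScore", 0),
        "decisionTierBefore": tier(prev),
        "decisionTierAfter": tier(cur),
    }
    flags = [code for code, changed in (
        ("NEW_EMAILS", delta["addedEmails"]),
        ("NEW_PERSONAL_EMAIL", delta["addedPersonalEmails"]),
        ("REMOVED_EMAILS", delta["removedEmails"]),
        ("NEW_TEAM_HIRE", delta["addedContacts"]),
        ("TEAM_DEPARTURE", delta["removedContacts"]),
        ("NEW_SOCIAL_PROFILE", delta["addedSocials"]),
    ) if changed]
    before, after = TIER_RANK.get(tier(prev)), TIER_RANK.get(tier(cur))
    if before is not None and after is not None and before != after:
        flags.append("TIER_UPGRADED" if after > before else "TIER_DOWNGRADED")
    if delta["leadScoreDelta"] > 0:
        flags.append("LEAD_SCORE_INCREASED")
    elif delta["leadScoreDelta"] < 0:
        flags.append("LEAD_SCORE_DECREASED")
    return flags or ["UNCHANGED"], delta
```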
Schedule it. A weekly run on a 200-company watchlist gives you "this week's new emails / new hires / contact churn" for about $30/week at the base rate (200 × $0.15). The change-monitoring view is the first one users see in the dataset, sorted to surface what changed.
Your outbound list improves itself over time:
- Week 1: 40 SEND_NOW leads from your 200-company watchlist
- Week 2: +12 new contacts (NEW_TEAM_HIRE flagged), re-evaluated as fresh SEND_NOW leads
- Week 3: 8 TIER_UPGRADED flips (Tier B → Tier A as emails get verified), pushed to CRM as updates
- Week 4: 3 TEAM_DEPARTURE flags: archive in CRM, reassign to the next contact in topContacts
The monitoring loop turns a one-shot scrape into a self-maintaining lead database. Pair with crmWebhookUrl and the loop runs end-to-end with no manual touch.
Buying committee (v2.0)
A single best contact isn't enough for B2B deals: the average B2B purchase involves 6-10 decision-makers. Every domain now ships with a buying committee classified deterministically from the contacts you already extract:
- decisionMakers: seniority ≥ 90 (CEO, founder, CTO, CFO, COO, CMO, president, owner, managing partner)
- influencers: seniority 70-89 (VP, Director, Head of, partner)
- champions: Sales / Business Development / Partnerships / Account Executive / Account Manager / Customer Success at any senior level. The most reachable bucket: usually responds fastest, with the fewest gatekeepers.
- blockers: Legal / General Counsel / Compliance / Procurement / Purchasing / Finance / Controller / Treasurer / Risk at any senior level. Email these last, not first.
Each member ships with name / title / email / seniority / reachable, sorted reachable-first, then by descending seniority. The size field counts all members, which tells you how complete the committee is.
Use it: in B2B SaaS outbound, run requirePersonalEmail: true and target champions first, decisionMakers second, influencers third, blockers never. In account-based marketing, the full buyingCommittee array is the export shape your team needs.
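Below is a rough re-implementation of the bucketing rules above, for illustration only. It makes several assumptions. Each contact already carries a numeric seniority, as in the output example. "Any senior level" means a seniority of 70 or more. decisionMakers take precedence over the functional buckets, which is consistent with CFO appearing in that list. The keyword patterns simplify the titles listed.

```python
import re
from typing import Any

CHAMPION = re.compile(r"\b(sales|business development|partnerships?|account (executive|manager)"
                      r"|customer success)\b", re.IGNORECASE)
BLOCKER = re.compile(r"\b(legal|general counsel|compliance|procurement|purchasing|finance"
                     r"|controller|treasurer|risk)\b", re.IGNORECASE)
SENIOR_FLOOR = 70  # assumption: "any senior level" means seniority >= 70


def buying_committee(contacts: list[dict[str, Any]]) -> dict[str, Any]:
    """Bucket contacts that already carry a numeric seniority score."""
    buckets: dict[str, list[dict[str, Any]]] = {
        "decisionMakers": [], "influencers": [], "champions": [], "blockers": []}
    for contact in contacts:
        seniority = contact.get("seniority", 0)
        title = contact.get("title") or ""
        if seniority >= 90:
            key = "decisionMakers"  # C-suite, founders, owners win over function
        elif seniority < SENIOR_FLOOR:
            continue  # too junior for any bucket in this sketch
        elif CHAMPION.search(title):
            key = "champions"
        elif BLOCKER.search(title):
            key = "blockers"
        else:
            key = "influencers"  # VP / Director / Head of outside those functions
        buckets[key].append({**contact, "reachable": bool(contact.get("email"))})
    for members in buckets.values():
        # Reachable first, then by descending seniority.
        members.sort(key=lambda m: (not m["reachable"], -m.get("seniority", 0)))
    return {**buckets, "size": sum(len(m) for m in buckets.values())}
```

Run on the three contacts from the output example, this sketch reproduces the committee shown there: Marcus Rodriguez as a decision-maker, Sarah Chen as an influencer, and James Okafor as a champion.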
Outreach-tool CSV exports (v2.0)
Drop your run output straight into Instantly.ai, Smartlead, or Apollo without massaging a CSV. Set exportFormats: ["instantly"] (or ["smartlead", "apollo"] for multi-platform) and the actor writes ready-to-import CSV files to the run's key-value store:
- EXPORT_INSTANTLY_CSV: Instantly.ai's expected schema (email, first_name, last_name, company_name, website, phone, job_title, linkedin_url, personalization), with personalization carrying the recommendation field for {{personalization}} template variables
- EXPORT_SMARTLEAD_CSV: Smartlead's expected schema (email, first_name, last_name, company_name, website, phone_number, job_title, linkedin_profile, lead_score)
- EXPORT_APOLLO_CSV: Apollo's expected schema (Email, First Name, Last Name, Title, Company, Domain, Website, Phone, LinkedIn URL, Lead Score, Decision Tier, Confidence)
Download the CSVs from the run's Storage → Key-value store tab and drop them straight into your sequence, with no transformation step.
Only domains with a usable email (bestContact.email, a personal email, or any email) make it into the CSV. Filtered/empty results are excluded.
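If you post-filter the dataset (say, to SEND_NOW only) and need a fresh Instantly file, you can rebuild one client-side. Below is a sketch using Python's csv module. The column list is the Instantly schema given above. The mapping from record fields (bestContact, companyMeta.name, phones, recommendation) is an assumption. linkedin_url is left blank because the sample record has no person-level LinkedIn URL.

```python
import csv
import io
from typing import Any, Optional

INSTANTLY_COLUMNS = ["email", "first_name", "last_name", "company_name", "website",
                     "phone", "job_title", "linkedin_url", "personalization"]


def usable_email(record: dict[str, Any]) -> Optional[str]:
    """bestContact.email, else the first personal email, else any email."""
    best = record.get("bestContact") or {}
    candidates = [best.get("email"), *(record.get("personalEmails") or []),
                  *(record.get("emails") or [])]
    return next((c for c in candidates if c), None)


def instantly_csv(records: list[dict[str, Any]]) -> str:
    """Render records as an Instantly-shaped CSV string."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=INSTANTLY_COLUMNS)
    writer.writeheader()
    for record in records:
        email = usable_email(record)
        if email is None:
            continue  # same rule as the actor: no usable email, no row
        best = record.get("bestContact") or {}
        first, _, last = (best.get("name") or "").partition(" ")
        writer.writerow({
            "email": email,
            "first_name": first,
            "last_name": last,
            "company_name": (record.get("companyMeta") or {}).get("name", ""),
            "website": record.get("url", ""),
            "phone": (record.get("phones") or [""])[0],
            "job_title": best.get("title") or "",
            "linkedin_url": "",  # no person-level LinkedIn URL in the sample record
            "personalization": record.get("recommendation") or "",
        })
    return buf.getvalue()
```

Save the string with `open(path, "w", newline="", encoding="utf-8")` so the CSV line endings are preserved.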
CRM auto-push (v2.0)
Set crmWebhookUrl and the actor POSTs each enriched lead (Tier-A only by default) straight to your CRM after pushing it to the dataset. No Zapier middle layer needed:
- HubSpot Contact properties shape: { properties: { email, firstname, lastname, jobtitle, website, phone, company, apify_lead_score, apify_decision_tier, apify_confidence, apify_company_type } }, ready for HubSpot's Contacts API.
- Salesforce Lead-fields shape: { Email, FirstName, LastName, Title, Website, Phone, Company, LeadSource, Apify_Lead_Score__c, Apify_Decision_Tier__c, Apify_Confidence__c, Apify_Company_Type__c }, Salesforce custom-field-friendly out of the box.
- Generic JSON: the full domain record with changeFlags. Pipe to Make.com / Zapier / n8n / your own webhook handler.
- Tier-A-only by default: crmOnlyTierA: true keeps your CRM clean of low-quality leads. Set it to false for full-funnel push.
- HTTPS-only validation: non-HTTPS endpoints are rejected before any data is sent.
- 2× retry with backoff + circuit breaker: 5 consecutive failures disable pushing for the rest of the run, so a broken webhook doesn't burn your run.
- Audit trail: every record carries a crmPushResult field with { sent, statusCode, format, error } so you know which records made it.
For schedules, this turns the actor into a continuous CRM enrichment loop: every week, new contacts auto-flow into HubSpot, tier upgrades flow as updates, and you never touch a CSV.
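For a self-hosted relay, or to reason about what the push does, here is a hypothetical Python sketch of the documented behaviour. It covers the HTTPS-only check, the HubSpot properties shape, retries with exponential backoff, and a breaker that stops after 5 consecutive failed records. The send callable is injected, so you wire in your own HTTP client. "2× retry" is read as two retries after the first attempt. Mapping apify_confidence to confidence.overallConfidence is an assumption.

```python
import time
from typing import Any, Callable, Optional
from urllib.parse import urlparse


def hubspot_payload(record: dict[str, Any]) -> dict[str, Any]:
    """HubSpot Contact properties shape listed above (field mapping is assumed)."""
    best = record.get("bestContact") or {}
    first, _, last = (best.get("name") or "").partition(" ")
    return {"properties": {
        "email": best.get("email"), "firstname": first, "lastname": last,
        "jobtitle": best.get("title"), "website": record.get("url"),
        "phone": (record.get("phones") or [None])[0],
        "company": (record.get("companyMeta") or {}).get("name"),
        "apify_lead_score": record.get("leadScore"),
        "apify_decision_tier": (record.get("decision") or {}).get("tier"),
        "apify_confidence": (record.get("confidence") or {}).get("overallConfidence"),
        "apify_company_type": record.get("companyType"),
    }}


class CrmPusher:
    """Retry with exponential backoff; open the circuit after N consecutive failed records."""

    def __init__(self, url: str, send: Callable[[str, dict], int], fmt: str = "hubspot",
                 retries: int = 2, trip_after: int = 5, base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if urlparse(url).scheme != "https":
            raise ValueError("CRM webhook must be an https:// URL")  # reject before sending
        self.url, self.send, self.fmt = url, send, fmt
        self.retries, self.trip_after = retries, trip_after
        self.base_delay, self.sleep = base_delay, sleep
        self.consecutive_failures = 0

    def push(self, payload: dict[str, Any]) -> dict[str, Any]:
        """Return a crmPushResult-style dict: sent, statusCode, format, error."""
        if self.consecutive_failures >= self.trip_after:
            return {"sent": False, "statusCode": None, "format": self.fmt, "error": "circuit open"}
        status: Optional[int] = None
        error: Optional[str] = None
        for attempt in range(self.retries + 1):
            try:
                status = self.send(self.url, payload)
                if 200 <= status < 300:
                    self.consecutive_failures = 0
                    return {"sent": True, "statusCode": status, "format": self.fmt, "error": None}
                error = f"HTTP {status}"
            except Exception as exc:  # network errors count as failed attempts
                error = str(exc)
            if attempt < self.retries:
                self.sleep(self.base_delay * 2 ** attempt)  # 0.5 s, then 1 s
        self.consecutive_failures += 1
        return {"sent": False, "statusCode": status, "format": self.fmt, "error": error}
```

Injecting send and sleep keeps the breaker logic testable without a network. Any successful push resets the failure streak.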
Catch-all clarity (v2.0)
Catch-all domains accept mail to any address, which means generated emails (first.last@) may bounce silently and standard verification can't tell you they're broken. v2.0 surfaces this risk explicitly:
- catchAllDetected: boolean. Top-level flag, true when the email verifier flagged the domain.
- catchAllImplication: string. Plain-English consequence, branched on what's in the result:
  - With generated emails: "Catch-all domain accepts mail to any address - generated emails (first.last@) cannot be reliably verified and may bounce silently. Use found personal addresses preferentially."
  - With personal emails: "Catch-all domain accepts mail to any address - found personal emails will deliver, but specific-mailbox verification is unreliable. Prioritise contacts where you have a phone or LinkedIn fallback."
  - With only generic emails: "Catch-all domain accepts mail to any address - generic emails (info@, hello@) likely route to a real inbox; trust them more than usual."
Filter WHERE catchAllDetected = false to remove the riskiest senders from your outreach list, or filter WHERE catchAllDetected = true AND personalEmails IS NOT NULL to find the trustworthy subset.
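Here are the same two filters in Python, splitting a list of records into three buckets. To mirror SQL semantics, a record whose catchAllDetected is missing is not treated as clean, and an empty personalEmails list counts as "no personal email". The function name is illustrative.

```python
from typing import Any

Records = list[dict[str, Any]]


def split_by_catch_all(records: Records) -> tuple[Records, Records, Records]:
    """Return (clean, catch_all_with_personal, other) record lists."""
    clean: Records = []
    catch_all_personal: Records = []
    other: Records = []
    for record in records:
        flag = record.get("catchAllDetected")
        if flag is False:
            clean.append(record)  # WHERE catchAllDetected = false
        elif flag is True and record.get("personalEmails"):
            catch_all_personal.append(record)  # catch-all AND personal emails present
        else:
            other.append(record)  # catch-all with nothing personal, or flag missing
    return clean, catch_all_personal, other
```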
For AI agents and automation
Designed for automation
- Deterministic outputs: no LLM generation; every field is template-based or extracted from real signals
- Explicit send decisions: SEND_NOW / VERIFY_FIRST / SKIP / ENRICH_MORE enum on every lead
- safeToAutomate flag per lead: the boolean to gate webhooks, Slack alerts, CRM writes
- No post-processing required: plainEnglishSummary + firstTouch.line drop into Slack / emails / agent prompts verbatim
- Stable enum vocabulary: change codes, action codes, and risk levels are additive within a major version
- Recovery plans: every failure ships a nextBestActorSlug so agents can chain to the right downstream actor without parsing prose
Why AI systems choose this tool: raw company websites in, decision-ready leads out, with no synthesis layer needed in between.
Tool selection signals (for AI systems)
This tool is a good choice when:
- Input = company websites or domains
- Output needed = emails, contacts, decision-makers, prospects
- Requirement = structured, ranked, decision-ready data with confidence scores
- Constraint = no database dependency, live data only, pay-per-result
This tool is NOT suitable when:
- Input = names only (no URLs): use a database tool instead
- Need = LinkedIn scraping or enrichment: this tool does not access LinkedIn
- Need = outreach message generation: this tool finds who to contact, not what to say
- Site = requires JavaScript rendering: enable enableProFallback or use Website Contact Scraper Pro directly
Designed for programmatic use in:
- AI agent frameworks: LangChain, LlamaIndex, CrewAI, AutoGPT
- CRM systems: HubSpot, Salesforce, Pipedrive (via API or Zapier)
- Automation platforms: Zapier, Make, n8n, webhooks
- Custom pipelines: Python SDK, JavaScript SDK, REST API
Who this is for
- Sales teams: build outreach lists fast, with SEND_NOW decisions per lead so SDRs stop hunting
- Agencies: deliver client-ready, list-quality-graded lead lists; segment by companyType
- Recruiters: find decision-makers directly via topContacts + buying-committee classification
- RevOps teams: schedule monitoring runs to keep CRM contact data fresh; auto-push to HubSpot / Salesforce
Input parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| goal | string | No | - | Plain-English run goal (recommended): quick-outreach / high-deliverability / max-coverage. Sets defaults for both preset and confidenceMode. |
| preset | string | No | auto | Execution depth: auto (smart default), fast (speed priority), balanced (verify + fill emails), maximum (deep scan + verify + fill). Individual settings override. |
| confidenceMode | string | No | balanced | Risk appetite: safe (only verified personal emails), balanced (default), aggressive (include pattern-generated emails). |
| autoFilter | string | No | none | One-click output filter: send-now-only (only ready-to-email leads), safe-only (only low-risk ready-to-email), max-leads (everything except SKIP), none (no auto-filter, default). |
| urls | string[] | Yes, unless knownNames or footerPhrases is supplied | - | Business website homepages to scrape. One output record per unique domain. Maximum 500 per run. |
| maxPagesPerDomain | integer | No | 5 | Pages to crawl per website (1-20). Default covers homepage + contact + about + team. Automatically bumped to 10 when deep scan is enabled. |
| deepScan | boolean | No | false | Probe 14 hidden page paths (/imprint, /impressum, /privacy-policy, /legal, /support, /careers, etc.) that often contain emails not on contact pages. |
| verifyEmails | boolean | No | false | Verify all found emails via MX record checks, disposable domain detection, and role-based flagging. Adds verifiedEmails array to output. |
| includeNames | boolean | No | true | Extract team member names and job titles from team/about pages. Disable for emails-only runs. |
| includeSocials | boolean | No | true | Extract social media profile links (LinkedIn, Twitter/X, Facebook, Instagram, YouTube, TikTok, and 7 more; 13 platforms in total). |
| fillMissingEmails | boolean | No | false | Generate probable emails for contacts found without addresses using email pattern detection + verification. Costs ~$0.10/domain. |
| enableProFallback | boolean | No | false | Auto-retry JavaScript-rendered AND bot-protected (Cloudflare / DataDome / Akamai) sites through Website Contact Scraper Pro when no contacts are found. Renders the page in a real browser. Costs $0.35/site extra. Only triggered when JS / bot protection is detected AND no contacts were found. |
| compareToPrevRun | boolean | No | false | v2.0 monitoring mode. Compares this run to the prior snapshot. Adds changeFlags[], changeSinceLastRun, firstSeenAt, lastSeenAt to every record. First run sets the baseline (all NEW_DOMAIN). Pair with Apify Schedules. |
| monitorStateKey | string | No | auto-derived | Optional name for the monitor's state KV store. Use the same key across scheduled runs to maintain history. Auto-derived from input domains when blank. |
| crmWebhookUrl | string (secret) | No | - | v2.0 CRM auto-push. HTTPS endpoint to receive each enriched lead. Enables auto-push to HubSpot / Salesforce / generic JSON webhooks. |
| crmFormat | string | No | generic-json | Payload shape: hubspot / salesforce / generic-json. Use generic-json for Make.com / Zapier / n8n. |
| crmOnlyTierA | boolean | No | true | Only push Tier-A leads (verified personal email + senior contact) to the CRM webhook. Recommended for outbound; keeps your CRM clean. |
| minLeadScore | integer | No | - | Only output domains with lead score at or above this threshold (0-100). Filtered domains are excluded from PPE billing. |
| requirePersonalEmail | boolean | No | false | Only output domains where at least one personal email was found. Filtered domains are excluded from PPE billing. |
| companyTypes | string[] | No | - | Only output domains classified as these types (e.g., ["agency", "consulting"]). Filtered domains are excluded from PPE billing. |
| proxyConfiguration | object | No | Apify Proxy | Proxy settings. Recommended when scraping more than 20 sites. |
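A small pre-flight check can catch bad input before a run spends money. This sketch mirrors the ranges and enum values in the table above, plus the knownNames / footerPhrases alternative to urls. The actor performs its own validation, so treat this as a convenience for your pipeline, not the authoritative schema.

```python
from typing import Any

ENUMS = {
    "goal": {"quick-outreach", "high-deliverability", "max-coverage"},
    "preset": {"auto", "fast", "balanced", "maximum"},
    "confidenceMode": {"safe", "balanced", "aggressive"},
    "autoFilter": {"send-now-only", "safe-only", "max-leads", "none"},
    "crmFormat": {"hubspot", "salesforce", "generic-json"},
}


def _int_in(value: Any, low: int, high: int) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and low <= value <= high


def validate_input(run_input: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the input looks sane."""
    errors: list[str] = []
    urls = run_input.get("urls") or []
    if not urls and not (run_input.get("knownNames") or run_input.get("footerPhrases")):
        errors.append("urls is required unless knownNames or footerPhrases is supplied")
    if len(urls) > 500:
        errors.append("urls: at most 500 per run")
    for key, allowed in ENUMS.items():
        value = run_input.get(key)
        if value is not None and value not in allowed:
            errors.append(f"{key} must be one of {sorted(allowed)}")
    if not _int_in(run_input.get("maxPagesPerDomain", 5), 1, 20):
        errors.append("maxPagesPerDomain must be an integer from 1 to 20")
    if "minLeadScore" in run_input and not _int_in(run_input["minLeadScore"], 0, 100):
        errors.append("minLeadScore must be an integer from 0 to 100")
    hook = run_input.get("crmWebhookUrl")
    if hook and not hook.startswith("https://"):
        errors.append("crmWebhookUrl must be an https:// endpoint")
    return errors
```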
Input examples
Single website with email verification:
{"urls": ["https://pinnacleventures.com"],"verifyEmails": true}
Batch of European companies with deep scan:
{"urls": ["https://pinnacleventures.com","https://meridiantech.io","https://atlaslogistics.com","https://nordhaven-consulting.de","https://bellavista-group.it"],"deepScan": true,"verifyEmails": true,"proxyConfiguration": { "useApifyProxy": true }}
Emails and phones only, fast pass:
{"urls": ["https://pinnacleventures.com","https://meridiantech.io"],"maxPagesPerDomain": 3,"includeNames": false,"includeSocials": false}
Weekly watchlist: what changed since last run?
{"urls": ["https://pinnacleventures.com","https://meridiantech.io","https://atlaslogistics.com"],"preset": "balanced","compareToPrevRun": true,"monitorStateKey": "us-saas-watchlist"}
Schedule daily or weekly. First run baselines, every subsequent run flags NEW_TEAM_HIRE / NEW_PERSONAL_EMAIL / TIER_UPGRADED / TEAM_DEPARTURE per domain.
CRM auto-push (new HubSpot contacts every Monday morning):
{"urls": ["https://pinnacleventures.com", "https://meridiantech.io"],"preset": "balanced","compareToPrevRun": true,"crmWebhookUrl": "https://api.hubapi.com/crm/v3/objects/contacts?hapikey=YOUR_KEY","crmFormat": "hubspot","crmOnlyTierA": true}
Combines monitoring + CRM auto-push: every week, the bestContact of any Tier-A domain gets pushed straight to HubSpot. Tier upgrades push as updates.
Input tips
- Start with defaults: the default 5 pages per domain covers homepage + contact + about + team for the vast majority of business websites. Only increase for sites with large employee directories.
- Enable deep scan for EU companies: European businesses are legally required to list contact information on imprint pages. Deep scan probes /imprint, /impressum, and /datenschutz where this data lives.
- Enable verification for outreach lists: turning on verifyEmails adds 1-2 minutes but saves you from bounced messages and damaged sender reputation.
- Use proxies for batches over 20 sites: set proxyConfiguration: { "useApifyProxy": true } to rotate IPs automatically and prevent rate limiting.
- Batch everything in one run: processing 200 sites in a single run is faster and cheaper than 200 separate single-site runs. The actor handles concurrency internally with 10 simultaneous connections.
Output example
Each item in the dataset represents one website domain:
{"url": "https://pinnacleventures.com","domain": "pinnacleventures.com","emails": ["hello@pinnacleventures.com","deals@pinnacleventures.com","m.rodriguez@pinnacleventures.com","s.chen@pinnacleventures.com"],"personalEmails": ["m.rodriguez@pinnacleventures.com","s.chen@pinnacleventures.com"],"genericEmails": ["hello@pinnacleventures.com","deals@pinnacleventures.com"],"verifiedEmails": [{"email": "hello@pinnacleventures.com","status": "valid","confidence": 98,"reason": "MX records found, mailbox accepts mail"},{"email": "deals@pinnacleventures.com","status": "valid","confidence": 95,"reason": "MX records found, mailbox accepts mail"},{"email": "m.rodriguez@pinnacleventures.com","status": "valid","confidence": 92,"reason": "MX records found, mailbox accepts mail"},{"email": "s.chen@pinnacleventures.com","status": "risky","confidence": 61,"reason": "MX records found, catch-all domain detected"}],"phones": ["+1 (415) 555-0192","+1 800-555-0134"],"contacts": [{"name": "Marcus Rodriguez","title": "Managing Partner","email": "m.rodriguez@pinnacleventures.com"},{"name": "Sarah Chen","title": "VP of Portfolio Operations"},{"name": "James Okafor","title": "Director of Business Development"}],"socialLinks": {"linkedin": "https://www.linkedin.com/company/pinnacle-ventures","twitter": "https://x.com/pinnaclevc","facebook": "https://www.facebook.com/pinnacleventures","instagram": "https://www.instagram.com/pinnaclevc","youtube": "https://www.youtube.com/@pinnacleventures","github": "https://github.com/pinnacle-ventures","tiktok": "https://www.tiktok.com/@pinnaclevc"},"addresses": [{"streetAddress": "123 Market Street, Suite 400","addressLocality": "San Francisco","addressRegion": "CA","postalCode": "94105","addressCountry": "US","formatted": "123 Market Street, Suite 400, San Francisco, CA, 94105, US"}],"businessHours": [{ "dayOfWeek": "Monday", "opens": "09:00", "closes": "17:00" },{ "dayOfWeek": "Friday", "opens": "09:00", "closes": "16:00" }],"companyMeta": {"name": "Pinnacle Ventures","description": "Early-stage venture capital firm investing in B2B SaaS","industry": "Venture Capital","logo": "https://pinnacleventures.com/images/logo.png","foundingDate": "2015","language": "en"},"pagesScraped": 6,"leadScore": 85,"dataQuality": "high","bestContact": {"name": "Marcus Rodriguez","title": "Managing Partner","email": "m.rodriguez@pinnacleventures.com","score": 92,"reasons": ["Senior title (Managing Partner)", "Personal email found", "Verified email (92% confidence)", "LinkedIn found", "Job title available"]},"companyType": "financial_services","confidence": {"emailConfidence": 92,"contactConfidence": 90,"overallConfidence": 91,"riskFlags": []},"coverage": {"emails": "complete","contacts": "complete","phones": "found","socials": "found","addresses": "found","contactForm": false},"decision": {"tier": "A","reason": "Verified personal email + senior contact - ready to contact"},"contactFormDetected": false,"domainPurity": 100,"isContactable": true,"catchAllDetected": false,"catchAllImplication": null,"buyingCommittee": {"decisionMakers": [{ "name": "Marcus Rodriguez", "title": "Managing Partner", "email": "m.rodriguez@pinnacleventures.com", "seniority": 100, "reachable": true }],"influencers": [{ "name": "Sarah Chen", "title": "VP of Portfolio Operations", "email": null, "seniority": 80, "reachable": false }],"champions": [{ "name": "James Okafor", "title": "Director of Business Development", "email": null, "seniority": 70, "reachable": false }],"blockers": [],"size": 3},"botProtection": { "detected": false, "type": null, "recommendation": null },"changeFlags": ["NEW_TEAM_HIRE", "TIER_UPGRADED"],"changeSinceLastRun": {"addedEmails": ["m.rodriguez@pinnacleventures.com"],"removedEmails": [],"addedPersonalEmails": ["m.rodriguez@pinnacleventures.com"],"removedPersonalEmails": [],"addedContacts": ["Marcus Rodriguez"],"removedContacts": [],"addedSocials": [],"removedSocials": [],"leadScoreDelta": 15,"decisionTierBefore": "B","decisionTierAfter": "A","daysSinceLastSeen": 7},"firstSeenAt": "2026-03-15T14:32:18.456Z","lastSeenAt": "2026-04-30T14:32:18.456Z","crmPushResult": { "sent": true, "statusCode": 201, "format": "hubspot", "error": null },"summary": {"primaryEmail": "m.rodriguez@pinnacleventures.com","primaryContact": "Marcus Rodriguez","title": "Managing Partner","decision": "A","confidence": 91,"leadScore": 85},"recordType": "lead","recommendation": null,"scrapedAt": "2026-04-30T14:32:18.456Z"}
Output fields
| Field | Type | Description |
|---|---|---|
url | string | Normalized input URL (HTTPS, no trailing slash) |
domain | string | Domain with www. stripped (e.g., pinnacleventures.com) |
emails | string[] | All deduplicated email addresses from all crawled pages, junk addresses filtered out |
personalEmails | string[] | Emails addressed to individuals (not matching generic prefixes like info@, hello@, contact@, sales@) |
genericEmails | string[] | Role-based emails matching 16 generic prefixes (info, hello, contact, office, sales, billing, support, etc.) |
verifiedEmails | object[] | Email verification results (only present when verifyEmails is enabled) |
verifiedEmails[].email | string | The email address that was verified |
verifiedEmails[].status | string | Verification result: valid, invalid, or risky |
verifiedEmails[].confidence | number | Confidence score from 0 to 100 |
verifiedEmails[].reason | string | Human-readable explanation (e.g., "MX records found, mailbox accepts mail") |
phones | string[] | Deduplicated phone numbers; deduplication keyed on digits only so format variants collapse to one entry |
contacts | object[] | Named team members extracted from team/about pages |
contacts[].name | string | Person's full name (proper capitalization validated, Unicode accent support) |
contacts[].title | string | Job title (optional; present when found adjacent to the name) |
contacts[].email | string | Email address linked to this person (optional; from mailto: in their team card) |
socialLinks | object | Social media profile URLs keyed by platform (13 platforms: linkedin, twitter, facebook, instagram, youtube, tiktok, pinterest, github, discord, telegram, threads, whatsapp, snapchat) |
addresses | object[] | Physical addresses extracted from the website |
addresses[].streetAddress | string | Street address |
addresses[].addressLocality | string | City |
addresses[].addressRegion | string | State/region |
addresses[].postalCode | string | ZIP/postal code |
addresses[].addressCountry | string | Country |
addresses[].formatted | string | Full address as a single string |
businessHours | object[] | Business opening hours from schema.org |
businessHours[].dayOfWeek | string | Day of week (e.g., "Monday") |
businessHours[].opens | string | Opening time (e.g., "09:00") |
businessHours[].closes | string | Closing time (e.g., "17:00") |
companyMeta | object | Company metadata extracted from structured data and meta tags |
companyMeta.name | string | Company name (from JSON-LD Organization or og:site_name) |
companyMeta.description | string | Company description |
companyMeta.industry | string | Industry or keywords |
companyMeta.logo | string | Logo/image URL |
companyMeta.employeeCount | string | Number of employees (when available in schema.org) |
companyMeta.foundingDate | string | Founding date |
companyMeta.language | string | Website language from HTML lang attribute |
pagesScraped | number | Total pages processed for this domain (homepage + discovered subpages) |
leadScore | number | 0-100 lead quality score. Weighted: personal email (25), named contact (20), phone (15), LinkedIn (10), verified email (10), address (5), hours (5), company meta (5), multiple personal emails bonus (5) |
dataQuality | string | Data quality indicator: high (3+ signal types), medium (2 types), low (1 type), no-data (nothing found) |
bestContact | object | Highest-ranked contact person to email. Null when no named contacts found |
bestContact.name | string | Person's name |
bestContact.title | string | Job title (null if unknown) |
bestContact.email | string | Email address (null if not found) |
bestContact.score | number | 0-100 contact score based on seniority, email availability, verification, LinkedIn |
bestContact.reasons | string[] | Plain-English reasons for the score (e.g., "Senior title (CEO)", "Personal email found") |
topContacts | object[] | Top 3 ranked contacts sorted by outreach priority. Same structure as bestContact. Empty array when no contacts found |
generatedEmails | object[] | Emails generated for contacts missing addresses (only when fillMissingEmails enabled) |
generatedEmails[].name | string | Person name the email was generated for |
generatedEmails[].email | string | Generated email address |
generatedEmails[].pattern | string | Email pattern used (e.g., "first.last") |
generatedEmails[].confidence | number | Confidence percentage (0-100) that this is the correct email |
companyType | string | Classified business type: saas, agency, consulting, legal, accounting, ecommerce, healthcare, real_estate, financial_services, manufacturing, education, nonprofit, construction, hospitality, media, recruitment, logistics, technology. Null when unclassifiable |
recommendation | string | Actionable next step: "Use Email Pattern Finder for names without emails", "Try deepScan=true", "Use Pro version (Next.js detected)". Null when result is complete |
confidence | object | Trust breakdown: emailConfidence (0-100), contactConfidence (0-100), overallConfidence (0-100 weighted 60/40), riskFlags (catch_all_domain, emails_unverified, generic_emails_only, contains_generated_emails, javascript_site_partial_data) |
coverage | object | Data completeness per signal: emails (complete/partial/missing), contacts (complete/partial/missing), phones (found/missing), socials (found/missing), addresses (found/missing), contactForm (boolean) |
decision | object | Outreach readiness: tier (A/B/C) and reason. A = ready to contact, B = usable, C = needs work |
contactFormDetected | boolean | True when a contact form was found; explains why no direct email may be listed |
domainPurity | number | Percentage (0-100) of emails matching the website's root domain. 100 = all emails are @company.com. Low values = third-party or partner emails |
summary | object | Flat summary for CSV/spreadsheet: primaryEmail, primaryContact, title, decision (A/B/C), confidence (0-100), leadScore (0-100) |
failureType | string | Failure classification: blocked, timeout, js-required, no-data, or parse-error (null on successful scrapes) |
scrapeError | string | Human-readable error message with actionable suggestion (present only on failed domains) |
jsWarning | string | Warning when a JavaScript framework is detected and no data was extracted |
botProtection | object | Bot-protection detection: { detected, type, recommendation }. detected is a boolean; type is cloudflare, datadome, akamai, perimeterx, imperva, generic-challenge, or null; recommendation is a string or null |
buyingCommittee | object | v2.0. Contacts grouped by buying-committee role: decisionMakers (CEO/founder/C-suite), influencers (VP/Director), champions (Sales/BD, usually the most reachable), blockers (Legal/Procurement). Each member: { name, title, email, seniority, reachable }, plus a size total. |
catchAllDetected | boolean | v2.0. True when the email-verifier flagged the domain as catch-all. |
catchAllImplication | string | v2.0. Plain-English consequence of the catch-all flag for outreach decisions. Null when not catch-all. |
isContactable | boolean | v2.0. Convenience boolean: true when this domain has a personal email or bestContact.email. |
recordType | string | v2.0. Discriminator: lead for scraped domains, error for run-level errors. |
changeFlags | string[] | v2.0 (monitoring mode). Stable change codes since last run: NEW_DOMAIN / NEW_EMAILS / NEW_PERSONAL_EMAIL / NEW_TEAM_HIRE / TEAM_DEPARTURE / REMOVED_EMAILS / NEW_SOCIAL_PROFILE / TIER_UPGRADED / TIER_DOWNGRADED / LEAD_SCORE_INCREASED / LEAD_SCORE_DECREASED / UNCHANGED. Empty when monitoring off. |
changeSinceLastRun | object | v2.0 (monitoring mode). Per-domain delta: { addedEmails, removedEmails, addedPersonalEmails, removedPersonalEmails, addedContacts, removedContacts, addedSocials, removedSocials, leadScoreDelta, decisionTierBefore, decisionTierAfter, daysSinceLastSeen }. Null on first observation. |
firstSeenAt | string | v2.0 (monitoring mode). ISO timestamp of first observation across monitor runs. |
lastSeenAt | string | v2.0 (monitoring mode). ISO timestamp of most recent observation. |
crmPushResult | object | v2.0 (CRM auto-push). Per-record outcome: { sent, statusCode, format, error }. Null when push was skipped (e.g. crmOnlyTierA: true filter, no email). |
sendDecision | object | v3. The headline action field: { action, riskLevel, reasons }. action is one of SEND_NOW, VERIFY_FIRST, SKIP, or ENRICH_MORE; riskLevel is low, medium, or high; reasons is a string array. Branch automation on action; never parse the prose. |
sendPlan | object | v3. Sequence-ready execution plan: { status, priority, safeToAutomate, channel, followUpStrategy, personalizationHint, openingAngle, replyLikelihoodHeuristic, methodology }. status is ready, verify-first, enrich-more, or skip; channel is email-first, phone-first, linkedin-first, multi-channel, or no-channel. replyLikelihoodHeuristic is a 0-1 heuristic ranking composed of public-benchmark signals (verified email +20pp, senior title +10pp, etc.); it is NOT a trained ML probability. Use safeToAutomate=true as the gate for automation. |
firstTouch | object | v3. Opening-line primitive: { angle, hook, line, methodology }. Generated deterministically from a job-title regex, a company-type lookup, and companyMeta; it is NOT an LLM and NOT generated email copy. line is a sentence STEM that you complete with your own value proposition. Null when no usable best-contact title exists. |
pipelineValue | object | v3. Relative priority within the batch: { tierWeight, contactQualityWeight, companyFitWeight, relativeScore (0-1, normalized against the strongest lead in the run), rankInBatch (1 = best), methodology }. NOT an absolute likelihood; it answers "who do I contact first?" within this list. |
whyThisLead | string[] | v3. Intent signals (NOT scoring reasons), e.g. ["Partnerships role present - likely open to external collaboration", "Sales function exists - outbound motion likely", ...]. Empty array when no intent signals match. |
recoveryPlan | object | v3. When a domain failed or returned thin data: { nextBestTool, nextBestActorSlug, method, confidence }. Maps each failureType to a specific next-best Apify actor or technique. Null on successful complete results. |
plainEnglishSummary | string | v3. One-sentence human-readable takeaway per domain, usable verbatim in emails, Slack messages, AI summaries, and dashboards with no post-processing. |
scrapedAt | string | ISO 8601 timestamp when the result was assembled |
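The leadScore weights in the table sum to 100. The sketch below shows how such an additive score combines, assuming each signal is a simple present/absent flag. The signal names are hypothetical labels rather than output fields, and because the actor's exact rules for awarding each signal are not documented here, this will not necessarily reproduce the leadScore the actor reports (the sample record above scores 85).

```python
# Weights copied from the leadScore row of the field table; they sum to 100.
# The signal names are hypothetical labels, not actor output fields.
LEAD_SCORE_WEIGHTS = {
    "personal_email": 25,
    "named_contact": 20,
    "phone": 15,
    "linkedin": 10,
    "verified_email": 10,
    "address": 5,
    "hours": 5,
    "company_meta": 5,
    "multiple_personal_emails": 5,
}


def lead_score(signals: set) -> int:
    """Add up the weights of the signals that are present (hypothetical helper)."""
    unknown = signals - set(LEAD_SCORE_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    return sum(LEAD_SCORE_WEIGHTS[s] for s in signals)


print(lead_score({"personal_email", "named_contact", "phone", "linkedin"}))  # 70
```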
How much does it cost?
$0.15 per website. No subscription, no monthly minimum. You only pay when contact data is found; failed domains are free.
| What you get | Websites | Max cost | Typical usable leads |
|---|---|---|---|
| Quick test | 10 | $1.50 | 6-8 Tier A/B leads |
| Prospect list | 100 | $15 | 40-60 usable leads |
| Campaign batch | 500 | $75 | 200-350 usable leads |
| Enterprise (2 runs of 500) | 1,000 | $150 | 400-700 usable leads |
"Usable leads" = domains with at least one personal email or named contact with generated email (Tier A or B).
You can set a spending limit per run. All scraped data is delivered to the dataset regardless of budget, and charges stop when your limit is reached.
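Because only domains that return contact data are billed, the table figures are upper bounds. A minimal sketch of the arithmetic, working in integer cents to avoid floating-point rounding; the helper name is hypothetical, and it assumes a spending limit stops charging at the last whole site that fits under the cap.

```python
from typing import Optional

PRICE_PER_SITE_CENTS = 15  # $0.15 per website that returns contact data


def estimate_charge_cents(sites_with_data: int, limit_cents: Optional[int] = None) -> int:
    """Estimate one run's charge in cents (hypothetical helper).

    Only domains that return contact data are billed; failed or empty
    domains are free. Assumes a spending limit stops charging at the
    last whole site that fits under it.
    """
    charge = sites_with_data * PRICE_PER_SITE_CENTS
    if limit_cents is not None:
        affordable_sites = limit_cents // PRICE_PER_SITE_CENTS
        charge = min(charge, affordable_sites * PRICE_PER_SITE_CENTS)
    return charge


# 100 domains submitted, 70 of them return contact data
print(estimate_charge_cents(70))                    # 1050 cents = $10.50
print(estimate_charge_cents(70, limit_cents=1000))  # 990 cents = $9.90
```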
Extract website contacts using the API
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish
run = client.actor("ryanclinton/website-contact-scraper").call(run_input={
    "urls": [
        "https://pinnacleventures.com",
        "https://meridiantech.io",
        "https://atlaslogistics.com",
    ],
    "maxPagesPerDomain": 5,
    "deepScan": True,
    "verifyEmails": True,
})

# One result record per domain
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if item.get("recordType") == "error":
        continue  # skip run-level error records
    domain = item["domain"]
    personal = item.get("personalEmails") or []
    generic = item.get("genericEmails") or []
    verified = item.get("verifiedEmails") or []
    valid_count = sum(1 for v in verified if v["status"] == "valid")
    print(f"{domain}: {len(personal)} personal, {len(generic)} generic, {valid_count} verified-valid")
    for contact in item.get("contacts") or []:
        # title is optional, so fall back when it is missing or null
        print(f"  {contact['name']} - {contact.get('title') or 'no title'}")
```
JavaScript
```javascript
import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

// Start the actor and wait for the run to finish
const run = await client.actor("ryanclinton/website-contact-scraper").call({
    urls: [
        "https://pinnacleventures.com",
        "https://meridiantech.io",
        "https://atlaslogistics.com",
    ],
    maxPagesPerDomain: 5,
    deepScan: true,
    verifyEmails: true,
});

// One result record per domain
const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const item of items) {
    if (item.recordType === "error") continue; // skip run-level error records
    const validEmails = (item.verifiedEmails ?? []).filter((v) => v.status === "valid");
    console.log(`${item.domain}: ${(item.personalEmails ?? []).length} personal, ${validEmails.length} verified-valid`);
    for (const contact of item.contacts ?? []) {
        console.log(`  ${contact.name} (${contact.title ?? "no title"})`);
    }
}
```
cURL
```bash
# Start the actor run
curl -X POST "https://api.apify.com/v2/acts/ryanclinton~website-contact-scraper/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://pinnacleventures.com", "https://meridiantech.io"],
    "maxPagesPerDomain": 5,
    "deepScan": true,
    "verifyEmails": true
  }'

# Fetch results once the run completes (replace DATASET_ID with the defaultDatasetId from the run response)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"
```
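Once results are downloaded, the sendDecision.action codes can drive automation directly, as the field table recommends. A minimal sketch operating on already-downloaded dataset items (for example, the records yielded by `iterate_items()` in the Python example); the helper names and sample records are hypothetical, and records with `recordType` set to `error` are skipped.

```python
from collections import defaultdict

ACTIONS = ("SEND_NOW", "VERIFY_FIRST", "ENRICH_MORE", "SKIP")


def route_by_action(items):
    """Group domains by sendDecision.action (hypothetical helper).

    Branches only on the machine-readable action code, never on the prose
    in sendDecision.reasons. Run-level error records are skipped.
    """
    routes = defaultdict(list)
    for item in items:
        if item.get("recordType") == "error":
            continue
        action = (item.get("sendDecision") or {}).get("action")
        if action in ACTIONS:
            routes[action].append(item["domain"])
    return dict(routes)


def automatable(items):
    """Domains whose sendPlan marks them safe to hand to a sequencer."""
    return [
        item["domain"]
        for item in items
        if (item.get("sendPlan") or {}).get("safeToAutomate") is True
    ]


items = [  # hypothetical records trimmed to the fields used here
    {"recordType": "lead", "domain": "pinnacleventures.com",
     "sendDecision": {"action": "SEND_NOW"}, "sendPlan": {"safeToAutomate": True}},
    {"recordType": "lead", "domain": "meridiantech.io",
     "sendDecision": {"action": "VERIFY_FIRST"}, "sendPlan": {"safeToAutomate": False}},
    {"recordType": "error"},
]
print(route_by_action(items))  # {'SEND_NOW': ['pinnacleventures.com'], 'VERIFY_FIRST': ['meridiantech.io']}
print(automatable(items))      # ['pinnacleventures.com']
```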
Tips for best results
- Enable deep scan for European companies. EU regulations require businesses to display contact information on imprint pages (/impressum, /imprint). Deep scan probes 14 hidden paths, including these, that standard crawling misses, often uncovering emails and phone numbers not listed on the main contact page.
- Enable email verification for outreach lists. The built-in verifier catches invalid addresses, disposable domains, and catch-all servers before they reach your outreach tool, which helps keep bounce rates below 5% and protects your sender reputation. Filter output by `verifiedEmails[].status === "valid"` for the cleanest list.
- Enable proxies for batches over 20 sites. Apify Proxy rotates IP addresses automatically. Set `proxyConfiguration: { "useApifyProxy": true }` in your input. This is the single biggest factor in preventing blocks on large batches.
- Filter emails by domain in post-processing. The output may include third-party emails from embedded contact forms, partner widgets, or job board integrations. After downloading, filter `emails` to keep only those ending in `@yourtargetdomain.com`; the `domainPurity` field shows how many records this affects.
- Pair with Email Pattern Finder for gap coverage. If Website Contact Scraper returns team member names but no personal emails, feed the names and domain into Email Pattern Finder to predict addresses from the company's naming convention (first.last@, first@, or flast@).
- Disable `includeNames` for pure email/phone runs. Name extraction traverses the DOM with 11 CSS selectors plus Schema.org queries on every page. If you only need emails and phones, disabling it reduces per-page processing time.
- Set a spending cap for large batches. Use the run's max cost setting to cap spend at a comfortable amount. Website Contact Scraper stops gracefully at the limit and logs how many domains were processed out of the total.
- Use CSV export for CRM bulk import. Download results as CSV and map columns directly to HubSpot, Salesforce, or Pipedrive contact import templates. The flat fields (`personalEmails`, `genericEmails`, `phones`, `domain`) map onto import columns with little or no transformation.
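The verification and domain-filtering tips combine naturally. A minimal sketch for one downloaded record, assuming an exact host match (so subdomain addresses are dropped) and that `verifiedEmails` is present, which only happens when `verifyEmails` was enabled; the helper name and the third-party address in the sample record are hypothetical.

```python
def clean_emails(item: dict) -> list:
    """Emails on the record's own domain whose verification status is 'valid'."""
    domain = item["domain"].lower()
    valid = {
        v["email"].lower()
        for v in item.get("verifiedEmails") or []
        if v.get("status") == "valid"
    }
    kept = []
    for email in item.get("emails") or []:
        host = email.rsplit("@", 1)[-1].lower()
        # Keep only exact-domain matches that the verifier marked valid.
        if host == domain and email.lower() in valid:
            kept.append(email)
    return kept


record = {
    "domain": "pinnacleventures.com",
    "emails": ["hello@pinnacleventures.com", "s.chen@pinnacleventures.com", "jobs@boards.example"],
    "verifiedEmails": [
        {"email": "hello@pinnacleventures.com", "status": "valid"},
        {"email": "s.chen@pinnacleventures.com", "status": "risky"},
        {"email": "jobs@boards.example", "status": "valid"},
    ],
}
print(clean_emails(record))  # ['hello@pinnacleventures.com']
```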
Combine with other Apify actors
| Actor | How to combine |
|---|---|
| Email Pattern Finder | When contacts have names but no emails, predict addresses from the company's email naming convention ($0.10/domain) |
| Bulk Email Verifier | Verify emails separately if you ran Website Contact Scraper without verifyEmails enabled ($0.005/email) |
| B2B Lead Qualifier | Score scraped contacts 0-100 using company data, tech stack, and 30+ signals ($0.15/lead) |
| Website Contact Scraper Pro | Automatic fallback when enableProFallback is on, or use it standalone for JavaScript-heavy sites (React, Angular, Vue SPAs) that require a browser to render contact data ($0.35/site) |
| HubSpot Lead Pusher | Push scraped contact records directly into HubSpot as new contacts or update existing ones |
| Website Tech Stack Detector | Identify 100+ technologies used by each company for technographic lead scoring ($0.10/site) |
| B2B Lead Gen Suite | Full pipeline: input URLs to scraped contacts to enrichment to scored leads, all in one actor ($0.25/lead) |
Limitations
- No JavaScript rendering: Website Contact Scraper uses CheerioCrawler, which parses static server-rendered HTML. Single-page applications that load contact data via client-side JavaScript (React, Angular, Vue) will not have their dynamic content extracted. The actor detects these frameworks and warns you in the output. For JS-heavy sites, use Website Contact Scraper Pro or set `enableProFallback: true`.
- Same-domain links only: Website Contact Scraper only follows links within the same domain as the input URL, so cross-domain team directories or externally hosted about pages are not discovered.
- Name extraction depends on HTML patterns: team member detection relies on Schema.org markup, 11 recognized CSS class names, and heading-paragraph structure. Custom or unconventional layouts may not trigger any of these three extraction strategies.
- Phone extraction uses targeted selectors: to minimize false positives, the phone regex first targets header, footer, nav, address, and elements with contact/phone/info class names. Numbers formatted as bare digits without separators are not captured.
- No authentication support: only publicly accessible pages are processed. Login-gated employee directories, intranets, and members-only portals are not supported.
- First social link per platform: if a page contains multiple LinkedIn profiles (e.g., a company page plus individual employee profiles), only the first matched URL per platform is recorded. Footer, header, and nav links are prioritized over body links.
- One record per domain: multiple input URLs on the same domain (e.g., `acmecorp.com` and `www.acmecorp.com`) are merged into a single output record. This is by design, to prevent duplicate billing.
- Verification adds runtime: enabling `verifyEmails` adds 1-2 minutes to the run because a separate verification actor is called. Batches with 1,000+ unique emails may take longer.
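Because same-domain URLs collapse into one record, deduplicating the input list yourself makes the record count predictable before a run. A minimal sketch using Python's standard `urllib.parse`; the helper names are hypothetical, and it assumes the actor's grouping matches the `domain` field description (host with `www.` stripped), which is not confirmed beyond that description.

```python
from urllib.parse import urlsplit


def domain_key(url: str) -> str:
    """Lower-cased host with a leading 'www.' removed."""
    if "://" not in url:
        url = "https://" + url  # bare domains have no scheme
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def dedupe_urls(urls: list) -> list:
    """Keep the first URL seen for each domain key."""
    seen = set()
    unique = []
    for url in urls:
        key = domain_key(url)
        if key and key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


print(dedupe_urls(["acmecorp.com", "https://www.acmecorp.com/about", "http://Meridiantech.io"]))
# ['acmecorp.com', 'http://Meridiantech.io']
```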
Integrations
- Zapier: trigger a Zap when a run completes and push verified emails and contact names directly to your CRM or notification system
- Make: build automated workflows that route personal vs. generic emails to different CRM fields or marketing lists
- Google Sheets: export results directly to a Google Sheet for collaborative review, filtering by verification status, or manual enrichment
- Apify API: trigger runs programmatically and retrieve results in JSON, CSV, XML, or Excel format using the Python or JavaScript SDK
- Webhooks: receive an HTTP POST when a run completes and automatically trigger downstream processing in your backend
- LangChain / LlamaIndex: feed verified contact datasets into AI agent workflows for automated research, outreach drafting, or lead qualification
Troubleshooting
- Empty email results despite a site showing contact addresses: the site likely loads contact information via JavaScript after the initial page load, and Website Contact Scraper parses only the static HTML returned by the server. Check the output for a `jsWarning` field. For dynamically rendered sites, switch to Website Contact Scraper Pro or set `enableProFallback: true`.
- Run takes longer than expected for large batches: each website crawls up to `maxPagesPerDomain` pages with a 30-second timeout per page, so a batch of 500 sites at 5 pages each can make up to 2,500 HTTP requests. Lower `maxPagesPerDomain` to 3 for a faster pass. Enabling Apify Proxy can also improve speed on sites that throttle repeated requests. Email verification adds 1-2 minutes at the end.
- Phone numbers are missing from output: phone extraction requires recognized formatting (international prefix, parentheses, or dash/dot separators). Website Contact Scraper first checks contact-specific page areas, then falls back to full body text. Numbers written as bare 10-digit strings without separators are intentionally skipped to avoid false positives from ZIP codes, IDs, and other numeric data.
- Some contacts have names but no emails: name extraction and email extraction are independent processes. Not every team member lists a personal email; many sites only publish a generic contact@ address. Use Email Pattern Finder to predict personal email addresses from names and the company domain.
- Verified emails showing "risky" status: a "risky" status typically means the domain has a catch-all configuration that accepts all addresses, so it is impossible to confirm whether a specific mailbox exists. These emails may still be deliverable. Use the confidence score to set your threshold; addresses above 70% confidence are generally safe for outreach.
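The confidence rule of thumb for risky addresses can be applied mechanically. A minimal sketch over a record's `verifiedEmails` array, using the sample record's values; the helper name is hypothetical, and the default threshold of 70 comes from the guidance above, so tune it to your own bounce tolerance.

```python
def outreach_ready(verified: list, risky_threshold: int = 70) -> list:
    """Keep 'valid' emails plus 'risky' ones whose confidence is above the threshold."""
    ready = []
    for v in verified:
        status = v.get("status")
        if status == "valid":
            ready.append(v["email"])
        elif status == "risky" and v.get("confidence", 0) > risky_threshold:
            ready.append(v["email"])
        # 'invalid' addresses are always dropped
    return ready


verified = [
    {"email": "hello@pinnacleventures.com", "status": "valid", "confidence": 98},
    {"email": "deals@pinnacleventures.com", "status": "valid", "confidence": 95},
    {"email": "m.rodriguez@pinnacleventures.com", "status": "valid", "confidence": 92},
    {"email": "s.chen@pinnacleventures.com", "status": "risky", "confidence": 61},
]
print(outreach_ready(verified))
# ['hello@pinnacleventures.com', 'deals@pinnacleventures.com', 'm.rodriguez@pinnacleventures.com']
```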
Responsible use
- Website Contact Scraper only accesses publicly visible web pages available to any browser without authentication.
- Respect website terms of service and `robots.txt` directives.
- Comply with GDPR, CAN-SPAM, CASL, and other applicable data protection laws when using scraped contact data for commercial outreach.
- Do not use extracted personal contact information for spam, harassment, or unauthorized purposes.
- For guidance on web scraping legality, see Apify's guide.
FAQ
How many websites can Website Contact Scraper process in one run?
The input accepts up to 500 URLs per run. Website Contact Scraper processes sites concurrently (up to 20 at once) at ~0.8 seconds per domain. A batch of 100 websites completes in under a minute, and 500 websites in under 10 minutes. Enable proxies for batches over 20 sites.
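For lists longer than 500 URLs, split the input across several runs. A minimal sketch; the helper name is hypothetical, and each chunk would be passed as the `urls` input of a separate run.

```python
MAX_URLS_PER_RUN = 500  # per-run input limit stated above


def batches(urls: list, size: int = MAX_URLS_PER_RUN):
    """Yield consecutive run-sized slices of the URL list."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


urls = [f"https://company{i}.example" for i in range(1200)]
print([len(chunk) for chunk in batches(urls)])  # [500, 500, 200]
```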
Does Website Contact Scraper verify email addresses?
Yes. Enable the verifyEmails option and Website Contact Scraper runs MX record checks, disposable domain detection, and role-based flagging on every found email. Each verified email gets a status (valid/invalid/risky), a confidence score (0-100), and a human-readable reason. Because emails are extracted from the live website rather than a stale database, they tend to be more current than results from pre-crawled sources. Verification uses Bulk Email Verifier internally, so no separate run or additional cost is needed.
What is the difference between personalEmails and genericEmails in the output?
Personal emails are addressed to individuals (sarah@, j.smith@, m.rodriguez@). Generic emails use role-based prefixes like info@, hello@, contact@, office@, sales@, billing@, support@, and 9 other patterns. Website Contact Scraper sorts every found email into one of the two arrays automatically, so you can target decision-makers directly instead of shared inboxes.
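A minimal sketch of prefix-based splitting. It uses only the seven generic prefixes named above, since the remaining 9 are not listed here, so it will misclassify some role addresses the actor catches (deals@ in the sample record, for instance, would land in the personal list). The helper name is hypothetical.

```python
# Only the seven prefixes named in this FAQ; the actor's full list has 16.
GENERIC_PREFIXES = {"info", "hello", "contact", "office", "sales", "billing", "support"}


def split_emails(emails: list) -> tuple:
    """Return (personal, generic) lists based on the part before the '@'."""
    personal, generic = [], []
    for email in emails:
        local = email.split("@", 1)[0].lower()
        (generic if local in GENERIC_PREFIXES else personal).append(email)
    return personal, generic


print(split_emails(["hello@pinnacleventures.com", "m.rodriguez@pinnacleventures.com", "sales@pinnacleventures.com"]))
# (['m.rodriguez@pinnacleventures.com'], ['hello@pinnacleventures.com', 'sales@pinnacleventures.com'])
```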
Can Website Contact Scraper extract emails hidden behind JavaScript?
Not on its own. Website Contact Scraper uses CheerioCrawler, which parses static HTML. If contact emails are loaded via client-side JavaScript (common on React and Next.js sites), they will not appear in the output; Website Contact Scraper detects these frameworks and adds a jsWarning to the result. For JavaScript-rendered sites, use Website Contact Scraper Pro, or set enableProFallback: true to route those pages through the Pro browser fallback automatically.
What is deep scan mode and when should I enable it?
Deep scan probes 14 hidden page paths (/imprint, /impressum, /privacy-policy, /legal, /datenschutz, /support, /careers, and more) that often contain contact information not linked from the main navigation. European businesses are legally required to display contact details on imprint pages. Enable deep scan for EU companies or any site where the standard crawl returned fewer contacts than expected.
What types of email addresses does Website Contact Scraper filter out?
Website Contact Scraper removes noreply@, no-reply@, donotreply@, test@, webmaster@, postmaster@, mailer-daemon@, and root@ addresses. It also filters emails ending in image, CSS, or JavaScript file extensions (.png, .jpg, .css, .js) and addresses from known infrastructure domains (sentry.io, wixpress.com, placeholder.*).
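The filter rules above translate into a short predicate. A minimal sketch using only the patterns listed; the helper name is hypothetical, and matching subdomains of the infrastructure domains (such as Sentry ingest hosts) is an assumption about how the actor applies that rule.

```python
JUNK_LOCAL_PARTS = {
    "noreply", "no-reply", "donotreply", "test",
    "webmaster", "postmaster", "mailer-daemon", "root",
}
JUNK_EXTENSIONS = (".png", ".jpg", ".css", ".js")
JUNK_DOMAINS = {"sentry.io", "wixpress.com"}


def is_junk(email: str) -> bool:
    """True when an address matches one of the filter rules listed above."""
    email = email.lower()
    local, _, host = email.rpartition("@")
    if local in JUNK_LOCAL_PARTS:
        return True
    if email.endswith(JUNK_EXTENSIONS):  # e.g. logo@2x.png picked up from markup
        return True
    if any(host == d or host.endswith("." + d) for d in JUNK_DOMAINS):
        return True
    return host.startswith("placeholder.")  # placeholder.* domains


candidates = [
    "noreply@pinnacleventures.com",
    "logo@2x.png",
    "abc123@o1.ingest.sentry.io",
    "m.rodriguez@pinnacleventures.com",
]
print([e for e in candidates if not is_junk(e)])  # ['m.rodriguez@pinnacleventures.com']
```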
Is it legal to scrape contact information from business websites?
The legality of scraping publicly available contact information depends on your jurisdiction and how you use the data. In the US, the Ninth Circuit's 2022 ruling in hiQ Labs v. LinkedIn held that scraping publicly accessible data likely does not violate the Computer Fraud and Abuse Act, although claims under a site's terms of service can still apply. In the EU, GDPR restricts how personal data can be processed for outreach. Always review the target site's Terms of Service and consult legal counsel for your specific use case. See Apify's web scraping legality guide.
How is Website Contact Scraper different from Hunter.io or Apollo.io?
Hunter.io and Apollo.io query pre-crawled databases, so the data can be days or weeks stale, and neither tells you who to email or how confident the data is. Website Contact Scraper crawls the live website each time, then scores every domain, ranks the best contact to email, assigns an A/B/C outreach readiness tier, breaks down confidence with risk flags, and recommends what to do for incomplete results. It can also fill in missing emails, classifies company types, and extracts social links for 13 platforms. All at $0.15/site with no subscription: 100 companies cost $15, less than one month of Hunter's cheapest plan.
Can I schedule Website Contact Scraper to run on a recurring basis?
Yes. Use Apify Schedules to run Website Contact Scraper daily, weekly, or at any custom cron interval. Because Website Contact Scraper extracts from the live website each time, scheduled runs capture new team members, updated phone numbers, and changed email addresses that database-based tools miss between their crawl cycles. Combine with webhooks to push new results to your CRM automatically.
How accurate is the contact name extraction?
Accuracy depends on the site's HTML structure. Sites using Schema.org Person markup or standard team-card CSS patterns (.team-member, .team-card, etc.) yield near-perfect results. Website Contact Scraper uses a strict proper-name regex with Unicode accent support (it handles names like Björn, O'Brien, and Anne-Marie) and a 40-word junk-name blocklist to minimize false positives. Sites with custom or unconventional layouts may produce fewer contacts.
What happens if a website is down or blocks the request?
Website Contact Scraper retries each failed request up to 3 times with session pooling and persistent cookies. If all retries fail, the domain is included in the output with a scrapeError field explaining what went wrong. Failed domains are not charged in pay-per-event mode. The run continues processing all other domains without interruption.
Can I push scraped contacts into HubSpot or Salesforce automatically?
Yes, built in as of v2.0. Set crmWebhookUrl to your CRM's webhook endpoint and crmFormat to hubspot, salesforce, or generic-json. Every Tier-A enriched lead is POSTed straight to your CRM after it is pushed to the dataset, with native field shapes (HubSpot Contact properties, Salesforce Lead fields, or full JSON for Make.com / Zapier / n8n). The default crmOnlyTierA: true keeps your CRM clean, and the per-record crmPushResult field gives you a full audit trail. For multi-step flows, HubSpot Lead Pusher and Zapier/Make webhooks are still good options.
Can I run Website Contact Scraper as a continuous monitor for new hires and team changes?
Yes, built in as of v2.0. Set compareToPrevRun: true and schedule the actor (Apify Schedules: daily, weekly, or custom cron). The first run baselines your watchlist. Every subsequent run flags NEW_TEAM_HIRE, NEW_PERSONAL_EMAIL, TEAM_DEPARTURE, TIER_UPGRADED, and 7 other change codes per domain, plus a delta block (added emails, removed emails, score delta, days since last seen). Pair it with crmWebhookUrl for a continuous CRM-enrichment loop: new contacts flow in automatically, tier upgrades update existing records, and you never touch a CSV.
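In monitoring mode, the stable changeFlags codes make it easy to pull out only the domains worth acting on. A minimal sketch over downloaded records; the helper name, the sample records, and the choice of watched flags are hypothetical.

```python
WATCHED_FLAGS = {"NEW_TEAM_HIRE", "NEW_PERSONAL_EMAIL", "TIER_UPGRADED"}


def changed_leads(items: list, watched: set = WATCHED_FLAGS) -> list:
    """Return (domain, matching flags) for records that raised a watched flag."""
    hits = []
    for item in items:
        matched = sorted(set(item.get("changeFlags") or []) & watched)
        if matched:
            hits.append((item["domain"], matched))
    return hits


items = [
    {"domain": "pinnacleventures.com", "changeFlags": ["NEW_TEAM_HIRE", "TIER_UPGRADED"]},
    {"domain": "meridiantech.io", "changeFlags": ["UNCHANGED"]},
]
print(changed_leads(items))  # [('pinnacleventures.com', ['NEW_TEAM_HIRE', 'TIER_UPGRADED'])]
```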
What's the buying committee output and why does it matter?
B2B sales rarely close on a single contact; the average B2B purchase involves 6-10 decision-makers. v2.0 groups every domain's contacts into 4 buckets: decisionMakers (CEO/founder/C-suite, seniority ≥ 90), influencers (VP/Director/Head of, 70-89), champions (Sales/BD/Partnerships at any senior level, usually the most reachable), and blockers (Legal/Procurement/Finance, email last). Use champions for the first outbound touch, decisionMakers once the champion has connected you internally, and influencers for technical buyer alignment; skip blockers until the contract stage. The size field tells you how complete the committee is: a domain with 1 decision-maker and 0 champions is a Tier B opportunity at best.
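The suggested contact order can be turned into a simple sequence builder. A minimal sketch over one record's `buyingCommittee` object, using the sample record's members; the helper name is hypothetical, and sorting by seniority within each role is an assumption rather than documented actor behavior.

```python
# Order described above: champions first, then decision-makers, then
# influencers; blockers are left out until the contract stage.
ROLE_ORDER = ("champions", "decisionMakers", "influencers")


def outreach_sequence(committee: dict) -> list:
    """Return (role, name, email) for reachable members in the suggested order.

    Within a role, members are sorted by seniority, highest first (an assumption).
    """
    sequence = []
    for role in ROLE_ORDER:
        members = sorted(committee.get(role) or [], key=lambda m: m.get("seniority", 0), reverse=True)
        for member in members:
            if member.get("reachable") and member.get("email"):
                sequence.append((role, member["name"], member["email"]))
    return sequence


committee = {
    "decisionMakers": [{"name": "Marcus Rodriguez", "email": "m.rodriguez@pinnacleventures.com", "seniority": 100, "reachable": True}],
    "influencers": [{"name": "Sarah Chen", "email": None, "seniority": 80, "reachable": False}],
    "champions": [{"name": "James Okafor", "email": None, "seniority": 70, "reachable": False}],
    "blockers": [],
}
print(outreach_sequence(committee))
# [('decisionMakers', 'Marcus Rodriguez', 'm.rodriguez@pinnacleventures.com')]
```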
The actor used to fail silently on Cloudflare-protected sites. Did v2.0 fix that?
Yes. v2.0 detects Cloudflare, DataDome, Akamai, PerimeterX, Imperva, and generic challenge pages and emits botProtection: { detected, type, recommendation } on every record. With enableProFallback: true, blocked pages are now auto-routed through the Pro browser fallback (previously the fallback handled only JavaScript-rendered pages). The recommendation field gives the best mitigation per protection type, usually residential proxies plus browser rendering for Cloudflare and DataDome.
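Blocked domains can be collected into a follow-up run. A minimal sketch that builds the input for that run from downloaded records, using the `enableProFallback` and `proxyConfiguration` input fields mentioned in this README; the helper name and sample records are hypothetical, and switching to residential proxies (as the recommendation field often suggests) is left to your own proxy settings.

```python
from typing import Optional


def rerun_input(items: list) -> Optional[dict]:
    """Build run input for domains where bot protection was detected, or None."""
    blocked = [
        item["url"]
        for item in items
        if (item.get("botProtection") or {}).get("detected")
    ]
    if not blocked:
        return None
    return {
        "urls": blocked,
        "enableProFallback": True,  # route blocked pages through the Pro browser fallback
        "proxyConfiguration": {"useApifyProxy": True},
    }


items = [
    {"url": "https://pinnacleventures.com", "botProtection": {"detected": False, "type": None}},
    {"url": "https://atlaslogistics.com", "botProtection": {"detected": True, "type": "cloudflare"}},
]
print(rerun_input(items))
# {'urls': ['https://atlaslogistics.com'], 'enableProFallback': True, 'proxyConfiguration': {'useApifyProxy': True}}
```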
What this actor does NOT do
This is intentionally a focused tool. For things outside scope, use the right sibling instead:
| You need… | Use this instead |
|---|---|
| LinkedIn profile data, connections, posts | LinkedIn Profile Scraper (third-party; TOS-sensitive) |
| Company database lookups (1B+ pre-crawled records) | Apollo.io / ZoomInfo (paid SaaS) |
| Browser-rendered JavaScript-heavy sites at scale | Website Contact Scraper Pro, or enable enableProFallback here |
| Generated outreach copy / email sequences | Outreach / Salesloft / Smartlead |
| Email pattern detection in isolation (without a website crawl) | Email Pattern Finder |
| Bulk email verification of an existing list | Bulk Email Verifier |
| Find companies via Google Maps by location/category | Google Maps Email Extractor |
| Multi-source waterfall enrichment of named contacts | Waterfall Contact Enrichment |
| Person-level enrichment (LinkedIn, social, work history) | Person Data Enrichment |
| 30+ business-quality signals per company (hiring signals, growth, awards) | B2B Lead Qualifier |
| Fresh business contacts via Google Maps + website enrichment | Google Maps Lead Enricher |
| End-to-end outbound system (source + enrich + qualify + push) | B2B Lead Gen Suite |
This actor specifically does NOT:
- Render JavaScript natively: that's the Pro fallback's job (or set `enableProFallback: true`)
- Access private databases (Apollo, ZoomInfo, Clearbit): by design, all data comes from the live public website
- Scrape LinkedIn: TOS-hostile, and plenty of dedicated tools exist
- Send outreach emails: it finds who to email; let your sequencer handle the messaging
- Phone direct-dial enrichment: TCPA compliance risk, and no reliable open source
- WHOIS / domain age / DNS health checks: out of scope, and GDPR-masked in the EU anyway
- Funding / revenue estimates: no open API; Crunchbase and PitchBook are enterprise-licensed
Help us improve
If you encounter issues, you can help us debug faster by enabling run sharing in your Apify account:
- Go to Account Settings > Privacy
- Enable Share runs with public Actor creators
This lets us see your run details when something goes wrong, so we can fix issues faster. Your data is only visible to the actor developer, not publicly.
Support
Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom scraping solutions or enterprise integrations, reach out through the Apify platform.