Website Contact Scraper
Pricing
from $150.00 / 1,000 websites scanned
Extract emails, phone numbers, team member names, job titles, and social media links from business websites. Crawls contact, about, and team pages automatically. Batch process hundreds of URLs. $0.15/site.
Website Contact Scraper extracts and verifies emails, phone numbers, team member names, job titles, and social media links from any business website. Give it a list of URLs and it returns one structured, deduplicated contact record per domain, with verified emails, personal vs. generic classification, and named contacts with titles. No other contact scraper on Apify includes built-in email verification.
The actor crawls each site's homepage, discovers contact, about, team, and leadership pages automatically, and optionally probes hidden pages like /imprint, /impressum, and /privacy-policy that often contain emails not listed anywhere else. All emails are classified as personal (sarah@company.com) or generic (info@company.com), and optional MX-level verification flags invalid and risky addresses before they reach your CRM. Batch processing, automatic deduplication, structured JSON/CSV output, no code required.
What data can you extract?
| Data Point | Source | Example |
|---|---|---|
| Personal emails | mailto links, body text, anchor hrefs | sarah.chen@pinnacleventures.com |
| Generic emails | Same sources, classified by prefix | hello@pinnacleventures.com |
| Verified emails | MX record + disposable/role checks | valid (95% confidence) |
| Phone numbers | tel: links, formatted text, body fallback | +1 (415) 555-0192 |
| Contact names | Schema.org Person, team cards, heading pairs | Marcus Rodriguez |
| Job titles | itemprop="jobTitle", adjacent text, CSS selectors | VP of Business Development |
| LinkedIn | Company pages and personal profiles | linkedin.com/company/pinnacle-ventures |
| Twitter / X | twitter.com and x.com links | x.com/pinnaclevc |
| Facebook | Page links from footer/header/nav | facebook.com/pinnacleventures |
| Instagram | Profile links | instagram.com/pinnaclevc |
| YouTube | Channel, user, and @ links | youtube.com/@pinnacleventures |
| Domain | Parsed and normalized from input URL | pinnacleventures.com |
| Pages scraped | Per-domain crawl count | 6 |
| Scrape error | Failure reason when domain returns no data | All retry attempts failed |
| Timestamp | ISO 8601 run completion time | 2026-03-25T14:32:18.456Z |
Why use Website Contact Scraper?
Building prospect lists from company websites by hand means opening each site, hunting for a contact page, scanning footers for emails, checking about pages for team names, and copying everything into a spreadsheet, then repeating that for 200 more companies. A thorough researcher might process 15 sites per hour. At that rate, 500 websites take two full working days, and the data is already stale before you finish. And you still have no idea which emails are valid.
This actor automates the entire process: scraping, classification, and verification in one run. Paste a list of URLs, press Start, and return to a structured dataset with personal and generic emails separated, verification status attached, phone numbers extracted, team members named with titles, and social profiles linked for every domain. A batch of 500 websites typically completes in under 45 minutes for roughly $75, a fraction of the cost of that manual effort, with verified data you can trust.
Built on Apify, the actor gives you production capabilities beyond a one-off script:
- Scheduling – run daily or weekly to keep contact databases fresh without manual effort
- API access – trigger runs from Python, JavaScript, or any HTTP client and pipe results directly into your stack
- Proxy rotation – scrape large batches without IP blocks using Apify's built-in residential and datacenter proxy network
- Monitoring – receive Slack or email alerts when runs fail or return unexpected result counts
- Integrations – connect directly to Zapier, Make, Google Sheets, HubSpot, or webhooks with no extra code
Features
- Built-in email verification – optionally verify all found emails using MX record checks, disposable domain detection, and role-based address flagging. Returns a `verifiedEmails` array with status (valid/invalid/risky), confidence score (0-100), and human-readable reason. Uses Bulk Email Verifier internally, so no separate run is needed
- Deep scan mode – probes 14 hidden page paths including /imprint, /impressum, /privacy-policy, /legal, /legal-notice, /datenschutz, /disclaimer, /terms, /support, /help, /faq, /jobs, and /careers. European business sites are legally required to display contact information on imprint pages, making this a high-value source for EU company data
- Email classification – automatically separates `personalEmails` (sarah@company.com, j.smith@company.com) from `genericEmails` (info@, hello@, contact@, office@, sales@, billing@, support@, and 8 more role-based prefixes)
- Three-source email extraction – from mailto: link hrefs, full body text (with script, style, and noscript nodes stripped to avoid tracking pixel leakage), and all anchor href attributes, catching emails placed anywhere on the page
- 13-pattern junk email filter – removes noreply, no-reply, donotreply, test, admin, postmaster, mailer-daemon, webmaster, and root addresses, plus emails ending in image/CSS/JS file extensions and addresses from placeholder domains (sentry.io, wixpress.io, example.com)
- Three-strategy contact name detection – (1) Schema.org Person structured data with itemprop="name" and itemprop="jobTitle", (2) 11 team-card CSS selectors (.team-member, .team-card, .staff-member, .person-card, .member-card, .leadership-card, .employee, .bio-card, .team-item, .people-card, .about-member), and (3) heading-paragraph pairs where an h3/h4 matches a strict proper-name regex and the next sibling contains one of 35+ job title keywords
- International name support – the name regex handles Unicode accents, hyphens, and apostrophes (Bjorn Lindqvist, Anne-Marie Dupont, Sean O'Brien) via the \u00C0-\u024F range
- Smart phone extraction – tel: links as the primary source, supplemented by 3 regex patterns covering international (+1 prefix), parenthesized ((555) format), and separated (dash/dot) formats. Falls back to full body text when contact-area selectors find nothing, which handles Tailwind and BEM sites that lack standard class names
- Phone validation – rejects all-same-digit sequences and sequential numbers (1234567), and requires 7-15 digits with proper formatting
- Social link extraction – LinkedIn, Twitter/X, Facebook, Instagram, and YouTube, with footer/header/nav priority over body links so company profiles are captured before employee profiles
- Smart result sorting – domains with the most data (emails + phones + contacts) appear first; failed domains sort to the bottom
- Failed domain transparency – the `scrapeError` field tells you exactly why a domain returned no data, so you can decide whether to retry with proxies or switch to Website Contact Scraper Pro
- Live progress updates – real-time status messages showing which domain is being scraped and running email/phone counts
- 40+ junk-name word filter – prevents page headings like "Free Plan" or "Our Services" from appearing in the contacts list
- Automatic contact-page discovery – follows same-domain links matching 19 contact-related path keywords
- Atomic page-slot reservation – prevents concurrent handlers from exceeding the per-domain page limit even at maximum concurrency
- Deduplication across all pages – emails by exact lowercase string, phones by digit-only key, contacts by case-insensitive name, social links first-match-per-platform (see the sketch after this list)
- Pay-per-event pricing with a per-run spending cap – the actor stops delivering results when your budget is reached, and you are only charged when data is found
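To make the deduplication rules concrete, here is a minimal Python sketch of the three keys described in the deduplication bullet above. The helper names are illustrative, not the actor's internal code:

```python
import re

def email_key(email: str) -> str:
    # Emails dedupe by exact lowercase string.
    return email.strip().lower()

def phone_key(phone: str) -> str:
    # Phones dedupe by digit-only key, so "+1 (415) 555-0192"
    # and "1.415.555.0192" collapse to one entry.
    return re.sub(r"\D", "", phone)

def contact_key(name: str) -> str:
    # Contacts dedupe by case-insensitive name.
    return name.strip().lower()

def dedupe(items, key):
    seen, unique = set(), []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            unique.append(item)
    return unique

phones = ["+1 (415) 555-0192", "1.415.555.0192", "+1 800-555-0134"]
print(dedupe(phones, phone_key))  # ['+1 (415) 555-0192', '+1 800-555-0134']
```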
Use cases for scraping website contacts
Sales prospecting and outreach
Sales development reps building targeted prospect lists paste company websites from a CRM export or LinkedIn search into the actor. The output separates personal emails from generic ones, so reps can reach decision-makers directly instead of landing in a shared inbox. Enable email verification to filter out bounced addresses before importing into Outreach, Salesloft, or Apollo sequences.
Marketing agency lead generation
Agencies building prospect databases for clients scrape industry directories, trade association member lists, or competitor customer pages to extract contact information at scale. The structured CSV output with personal vs. generic email classification maps directly to email marketing tools and CRM import templates. Deep scan mode catches EU business contacts required on imprint pages.
Recruiting and talent sourcing
Recruiters extract team pages from target companies to identify hiring managers, department heads, and engineers along with their direct contact details and LinkedIn profiles. The contacts array with names and titles makes it easy to identify the right person to reach. Verified emails mean no bounced outreach.
Business research and market mapping
Analysts conducting competitive intelligence or market mapping run batches of hundreds of competitor or prospect websites to produce a structured dataset of who works where, what their titles are, and how to reach them. The timestamp and scrapeError fields track data freshness and collection reliability.
Data enrichment for existing CRM records
Operations and RevOps teams augment existing company records in HubSpot, Salesforce, or Pipedrive with fresh contact details, verified email addresses, social profile links, and team member data scraped directly from live company websites. The verification status lets you update only confirmed-valid addresses.
Freelancer and consultant outreach
Independent consultants identify the right decision-maker to pitch at prospective client companies by scraping about and leadership pages for names, titles, and personal email addresses, rather than guessing at generic info@ addresses that rarely convert.
How to scrape website contact information
- Provide website URLs – Enter one or more business website homepages in the input form. Use the root domain (e.g., https://pinnacleventures.com), not a deep URL. The actor discovers internal pages automatically.
- Configure options – Keep `maxPagesPerDomain` at the default of 5 for most sites. Enable Deep scan for European businesses or sites with hidden contact pages. Enable Verify emails if you plan to use the addresses for outreach.
- Run the actor – Click "Start". The actor crawls each site concurrently, typically finishing 50 websites in 3-5 minutes and 500 websites in 40-60 minutes. Verification adds 1-2 minutes at the end.
- Download results – Open the Dataset tab and download your data as JSON, CSV, or Excel. Each row is one domain with its complete contact profile: classified emails, verification results, phones, team members, and social links.
Input parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `urls` | string[] | Yes | – | Business website homepages to scrape. One output record per unique domain. |
| `maxPagesPerDomain` | integer | No | 5 | Pages to crawl per website (1-20). Default covers homepage + contact + about + team. Automatically bumped to 10 when deep scan is enabled with the default value. |
| `deepScan` | boolean | No | false | Probe 14 hidden page paths (/imprint, /impressum, /privacy-policy, /legal, /support, /careers, etc.) that often contain emails not on contact pages. |
| `verifyEmails` | boolean | No | false | Verify all found emails via MX record checks, disposable domain detection, and role-based flagging. Adds a `verifiedEmails` array to the output. |
| `includeNames` | boolean | No | true | Extract team member names and job titles from team/about pages. Disable for emails-only runs. |
| `includeSocials` | boolean | No | true | Extract social media profile links (LinkedIn, Twitter/X, Facebook, Instagram, YouTube). |
| `proxyConfiguration` | object | No | Apify Proxy | Proxy settings. Recommended when scraping more than 20 sites. |
Input examples
Single website with email verification:

```json
{
  "urls": ["https://pinnacleventures.com"],
  "verifyEmails": true
}
```

Batch of European companies with deep scan:

```json
{
  "urls": [
    "https://pinnacleventures.com",
    "https://meridiantech.io",
    "https://atlaslogistics.com",
    "https://nordhaven-consulting.de",
    "https://bellavista-group.it"
  ],
  "deepScan": true,
  "verifyEmails": true,
  "proxyConfiguration": { "useApifyProxy": true }
}
```

Emails and phones only, fast pass:

```json
{
  "urls": ["https://pinnacleventures.com", "https://meridiantech.io"],
  "maxPagesPerDomain": 3,
  "includeNames": false,
  "includeSocials": false
}
```
Input tips
- Start with defaults – the default 5 pages per domain covers homepage + contact + about + team for the vast majority of business websites. Only increase for sites with large employee directories.
- Enable deep scan for EU companies – European businesses are legally required to list contact information on imprint pages. Deep scan probes /imprint, /impressum, and /datenschutz, where this data lives.
- Enable verification for outreach lists – if you plan to email the contacts, turning on `verifyEmails` adds 1-2 minutes but saves you from bounced messages and damaged sender reputation.
- Use proxies for batches over 20 sites – set `proxyConfiguration: { "useApifyProxy": true }` to rotate IPs automatically and prevent rate limiting.
- Batch everything in one run – processing 200 sites in a single run is faster and cheaper than 200 separate single-site runs. The actor handles concurrency internally with 10 simultaneous connections.
Output example
Each item in the dataset represents one website domain:
{"url": "https://pinnacleventures.com","domain": "pinnacleventures.com","emails": ["hello@pinnacleventures.com","deals@pinnacleventures.com","m.rodriguez@pinnacleventures.com","s.chen@pinnacleventures.com"],"personalEmails": ["m.rodriguez@pinnacleventures.com","s.chen@pinnacleventures.com"],"genericEmails": ["hello@pinnacleventures.com","deals@pinnacleventures.com"],"verifiedEmails": [{"email": "hello@pinnacleventures.com","status": "valid","confidence": 98,"reason": "MX records found, mailbox accepts mail"},{"email": "deals@pinnacleventures.com","status": "valid","confidence": 95,"reason": "MX records found, mailbox accepts mail"},{"email": "m.rodriguez@pinnacleventures.com","status": "valid","confidence": 92,"reason": "MX records found, mailbox accepts mail"},{"email": "s.chen@pinnacleventures.com","status": "risky","confidence": 61,"reason": "MX records found, catch-all domain detected"}],"phones": ["+1 (415) 555-0192","+1 800-555-0134"],"contacts": [{"name": "Marcus Rodriguez","title": "Managing Partner","email": "m.rodriguez@pinnacleventures.com"},{"name": "Sarah Chen","title": "VP of Portfolio Operations"},{"name": "James Okafor","title": "Director of Business Development"}],"socialLinks": {"linkedin": "https://www.linkedin.com/company/pinnacle-ventures","twitter": "https://x.com/pinnaclevc","facebook": "https://www.facebook.com/pinnacleventures","instagram": "https://www.instagram.com/pinnaclevc","youtube": "https://www.youtube.com/@pinnacleventures"},"pagesScraped": 6,"scrapedAt": "2026-03-25T14:32:18.456Z"}
Output fields
| Field | Type | Description |
|---|---|---|
| `url` | string | Normalized input URL (HTTPS, no trailing slash) |
| `domain` | string | Domain with www. stripped (e.g., pinnacleventures.com) |
| `emails` | string[] | All deduplicated email addresses from all crawled pages, junk addresses filtered out |
| `personalEmails` | string[] | Emails addressed to individuals (not matching generic prefixes like info@, hello@, contact@, sales@) |
| `genericEmails` | string[] | Role-based emails matching 16 generic prefixes (info, hello, contact, office, sales, billing, support, etc.) |
| `verifiedEmails` | object[] | Email verification results (only present when `verifyEmails` is enabled) |
| `verifiedEmails[].email` | string | The email address that was verified |
| `verifiedEmails[].status` | string | Verification result: valid, invalid, or risky |
| `verifiedEmails[].confidence` | number | Confidence score from 0 to 100 |
| `verifiedEmails[].reason` | string | Human-readable explanation (e.g., "MX records found, mailbox accepts mail") |
| `phones` | string[] | Deduplicated phone numbers; deduplication keyed on digits only so format variants collapse to one entry |
| `contacts` | object[] | Named team members extracted from team/about pages |
| `contacts[].name` | string | Person's full name (proper capitalization validated, Unicode accent support) |
| `contacts[].title` | string | Job title (optional; present when found adjacent to the name) |
| `contacts[].email` | string | Email address linked to this person (optional; from mailto: in their team card) |
| `socialLinks` | object | Social media profile URLs keyed by platform |
| `socialLinks.linkedin` | string | LinkedIn company or personal profile URL |
| `socialLinks.twitter` | string | Twitter/X profile URL |
| `socialLinks.facebook` | string | Facebook page URL |
| `socialLinks.instagram` | string | Instagram profile URL |
| `socialLinks.youtube` | string | YouTube channel URL |
| `pagesScraped` | number | Total pages processed for this domain (homepage + discovered subpages) |
| `scrapeError` | string | Error message if the domain could not be scraped (present only on failed domains) |
| `scrapedAt` | string | ISO 8601 timestamp when the result was assembled |
How much does it cost to scrape website contacts?
Website Contact Scraper uses pay-per-event pricing: you pay $0.15 per website scanned. Platform compute costs are included in the price. You are only charged when the actor finds data or completes without error; failed domains that return no data are free.
| Scenario | Websites | Cost per website | Total cost |
|---|---|---|---|
| Quick test | 1 | $0.15 | $0.15 |
| Small batch | 10 | $0.15 | $1.50 |
| Medium batch | 50 | $0.15 | $7.50 |
| Large batch | 200 | $0.15 | $30.00 |
| Enterprise | 1,000 | $0.15 | $150.00 |
You can set a maximum spending limit per run to control costs. The actor stops delivering results when your budget is reached, so you never pay more than you expect. The status log reports how many domains were processed before the limit was hit.
Compare this to Hunter.io at $49-$149/month or Clay at $149-$720/month: most Website Contact Scraper users spend $5-$30/month with no subscription commitment. Apify's free tier also includes $5 of monthly credits, which covers 33 website scans at no cost. And unlike Hunter.io or Clay, built-in email verification is included in the $0.15 price, with no separate verification tool required.
Extract website contacts using the API
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("ryanclinton/website-contact-scraper").call(
    run_input={
        "urls": [
            "https://pinnacleventures.com",
            "https://meridiantech.io",
            "https://atlaslogistics.com",
        ],
        "maxPagesPerDomain": 5,
        "deepScan": True,
        "verifyEmails": True,
    }
)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    domain = item["domain"]
    personal = item.get("personalEmails", [])
    generic = item.get("genericEmails", [])
    verified = item.get("verifiedEmails", [])
    valid_count = sum(1 for v in verified if v["status"] == "valid")
    print(f"{domain}: {len(personal)} personal, {len(generic)} generic, {valid_count} verified-valid")
    for contact in item.get("contacts", []):
        print(f"  {contact['name']} - {contact.get('title', 'no title')}")
```
JavaScript
```javascript
import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const run = await client.actor("ryanclinton/website-contact-scraper").call({
    urls: [
        "https://pinnacleventures.com",
        "https://meridiantech.io",
        "https://atlaslogistics.com",
    ],
    maxPagesPerDomain: 5,
    deepScan: true,
    verifyEmails: true,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();

for (const item of items) {
    const validEmails = (item.verifiedEmails ?? []).filter((v) => v.status === "valid");
    console.log(`${item.domain}: ${item.personalEmails.length} personal, ${validEmails.length} verified-valid`);
    for (const contact of item.contacts) {
        console.log(`  ${contact.name} (${contact.title ?? "no title"})`);
    }
}
```
cURL
```bash
# Start the actor run
curl -X POST "https://api.apify.com/v2/acts/ryanclinton~website-contact-scraper/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://pinnacleventures.com", "https://meridiantech.io"],
    "maxPagesPerDomain": 5,
    "deepScan": true,
    "verifyEmails": true
  }'

# Fetch results once the run completes (replace DATASET_ID from the run response)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"
```
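The cURL call starts the run asynchronously, so you need to wait for it to finish before fetching the dataset. A sketch of that polling loop in Python, using Apify's standard run and dataset endpoints (the run ID comes from the start response), might look like this:

```python
import time

import requests

TOKEN = "YOUR_API_TOKEN"
RUN_ID = "RUN_ID_FROM_START_RESPONSE"  # "data.id" in the start response

# Poll the run status until it reaches a terminal state.
while True:
    run = requests.get(
        f"https://api.apify.com/v2/actor-runs/{RUN_ID}",
        params={"token": TOKEN},
        timeout=30,
    ).json()["data"]
    if run["status"] in ("SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"):
        break
    time.sleep(10)

# Fetch the dataset once the run has succeeded.
if run["status"] == "SUCCEEDED":
    items = requests.get(
        f"https://api.apify.com/v2/datasets/{run['defaultDatasetId']}/items",
        params={"token": TOKEN, "format": "json"},
        timeout=30,
    ).json()
    print(f"{len(items)} domains scraped")
```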
How Website Contact Scraper works
Phase 1: URL normalization and domain deduplication
Before any crawling begins, each input URL is normalized: HTTPS is enforced, trailing slashes are stripped, and the domain is extracted with www. removed. Duplicate domains are collapsed to a single entry, so you never pay twice for the same site. Invalid URLs that fail to parse into a valid hostname (one containing a dot) are skipped with a warning. An empty result object is initialized for each unique domain, and the homepage is queued with `label: 'HOMEPAGE'`.
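As an illustration, those normalization rules roughly correspond to this Python sketch; the helper is hypothetical, not the actor's internal code:

```python
from urllib.parse import urlparse

def normalize(url: str):
    url = url.strip()
    if not url.startswith(("http://", "https://")):
        url = "https://" + url          # HTTPS is enforced
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if "." not in host:                 # invalid hostname: skip with a warning
        return None
    domain = host.removeprefix("www.")  # www. stripped for the domain key
    return "https://" + host + parsed.path.rstrip("/"), domain

seen = {}
for raw in ["pinnacleventures.com/", "https://www.pinnacleventures.com", "not-a-url"]:
    result = normalize(raw)
    if result and result[1] not in seen:  # duplicate domains collapse
        seen[result[1]] = result[0]

print(seen)  # {'pinnacleventures.com': 'https://pinnacleventures.com'}
```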
Phase 2: Homepage crawl and contact-page discovery
CheerioCrawler fetches each homepage using got-scraping with up to 10 concurrent connections, a 120 requests/minute rate limit, 30-second navigation timeout, 60-second handler timeout, and 3 automatic retries. Session pooling with persistent cookies handles sites that require session state. SSL errors are silently ignored to handle sites with invalid certificates.
On the homepage, all four extraction functions run: emails (from mailto: links, body text regex with script/style/noscript stripped, and anchor hrefs), phones (from tel: links and contact-area text with full-body fallback), social links (5 platform patterns with footer/header/nav priority), and contacts (3 strategies with Unicode name support). Results are merged into the domain's result object with cross-page deduplication.
The homepage handler then scans every <a href> for same-domain URLs matching any of 19 contact-page path segments (contact, about, team, leadership, management, executives, people, staff, company, and variations). When deep scan is enabled, 14 additional hidden paths are appended (/imprint, /impressum, /privacy-policy, /legal, /datenschutz, /support, /careers, etc.). Page slots are reserved atomically on a shared counter map, preventing concurrent handlers from exceeding the per-domain limit. A simplified sketch of the link filter follows.
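The sketch below illustrates the same-domain keyword filter with an abridged keyword list; the actor itself matches 19 path segments, and its exact matching logic may differ:

```python
from urllib.parse import urljoin, urlparse

# Abridged; the actor matches 19 contact-related path segments.
CONTACT_KEYWORDS = ["contact", "about", "team", "leadership",
                    "management", "people", "staff", "company"]

def contact_links(base_url: str, hrefs: list[str]) -> list[str]:
    base_host = urlparse(base_url).netloc
    found = []
    for href in hrefs:
        url = urljoin(base_url, href)
        parsed = urlparse(url)
        if parsed.netloc != base_host:   # same-domain links only
            continue
        path = parsed.path.lower()
        if any(kw in path for kw in CONTACT_KEYWORDS):
            found.append(url)
    return found

print(contact_links("https://pinnacleventures.com",
                    ["/contact", "/about-us", "/blog",
                     "https://x.com/pinnaclevc"]))
# ['https://pinnacleventures.com/contact', 'https://pinnacleventures.com/about-us']
```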
Phase 3: Subpage extraction and aggregation
Contact, about, team, leadership, imprint, and deep-scan pages run through the same extraction pipeline as the homepage. No additional link-following occurs on subpages; crawl depth is controlled exclusively by the homepage handler. Each page's extracted data is merged into the domain result with deduplication: emails by exact lowercase string, phones by digit-only key, contacts by case-insensitive name, social links by first-match-per-platform.
Phase 4: Email classification, verification, and output
After all pages are crawled, each domain's emails are classified into personal and generic arrays using a 16-prefix regex pattern (info, hello, contact, office, sales, billing, accounts, enquiry, support, help, team, feedback, general, mail, request). When `verifyEmails` is enabled, all unique emails across all domains are collected into a single set and verified in one batch call to Bulk Email Verifier (256 MB, 15-minute timeout). Verification results are mapped back to each domain by email address.
Results are sorted by data richness: domains with the most emails + phones + contacts appear first, failed domains last. In pay-per-event mode, a website-scanned charge event fires for each domain that produced data or completed without error. If the spending limit is reached mid-batch, the actor stops gracefully and logs how many domains remain undelivered. A summary with totals for emails, personal emails, phones, contacts, and failed domains is saved to the key-value store under the SUMMARY key.
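The classification step can be approximated with a simple prefix regex built from the prefixes listed above; this is a sketch, and the actor's exact pattern may differ:

```python
import re

# Generic role-based prefixes, as listed in Phase 4.
GENERIC = re.compile(
    r"^(info|hello|contact|office|sales|billing|accounts|enquiry|"
    r"support|help|team|feedback|general|mail|request)@",
    re.IGNORECASE,
)

def classify(emails):
    personal, generic = [], []
    for email in emails:
        (generic if GENERIC.match(email) else personal).append(email)
    return personal, generic

personal, generic = classify(
    ["hello@pinnacleventures.com", "s.chen@pinnacleventures.com"]
)
print(personal, generic)
# ['s.chen@pinnacleventures.com'] ['hello@pinnacleventures.com']
```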
Tips for best results
- Enable deep scan for European companies. EU regulations require businesses to display contact information on imprint pages (/impressum, /imprint). Deep scan probes 14 hidden paths that standard crawling misses, often uncovering emails and phone numbers not listed on the main contact page.
- Enable email verification for outreach lists. The built-in verifier catches invalid addresses, disposable domains, and catch-all servers before they reach your outreach tool. This keeps bounce rates below 5% and protects your sender reputation. Filter output by `verifiedEmails[].status === "valid"` for the cleanest list.
- Enable proxies for batches over 20 sites. Apify Proxy rotates IP addresses automatically. Set `proxyConfiguration: { "useApifyProxy": true }` in your input. This is the single biggest factor in preventing blocks on large batches.
- Filter emails by domain post-processing. The output may include third-party emails from embedded contact forms, partner widgets, or job board integrations. After downloading, filter `emails` to keep only those ending in @yourtargetdomain.com (see the sketch after this list).
- Pair with Email Pattern Finder for gap coverage. If the scraper returns team member names but no personal emails, feed the names and domain into Email Pattern Finder to predict addresses based on the company's first.last@, first@, or flast@ naming convention.
- Disable `includeNames` for pure email/phone runs. Name extraction performs DOM traversal with 11 CSS selectors and Schema.org queries per page. If you only need emails and phones, disabling it reduces per-page processing time.
- Set a spending cap for large batches. Use the run's max cost setting to cap spend at a comfortable amount. The actor stops gracefully at the limit and logs how many domains were processed vs. total.
- Use CSV export for CRM bulk import. Download results as CSV and map columns directly to HubSpot, Salesforce, or Pipedrive contact import templates. The flat structure (`personalEmails`, `genericEmails`, `phones`, `domain`) imports without transformation.
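A minimal version of the domain post-filter from the tip above (illustrative; note that addresses on subdomains would also be dropped by this check):

```python
def own_domain_emails(item: dict) -> list[str]:
    # Keep only emails on the scraped company's own domain.
    suffix = "@" + item["domain"]
    return [e for e in item.get("emails", []) if e.lower().endswith(suffix)]

item = {
    "domain": "pinnacleventures.com",
    "emails": ["hello@pinnacleventures.com", "widget@jobboard.example.com"],
}
print(own_domain_emails(item))  # ['hello@pinnacleventures.com']
```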
Combine with other Apify actors
| Actor | How to combine |
|---|---|
| Email Pattern Finder | When contacts have names but no emails, predict addresses from the company's email naming convention ($0.10/domain) |
| Bulk Email Verifier | Verify emails separately if you ran the scraper without verifyEmails enabled ($0.005/email) |
| B2B Lead Qualifier | Score scraped contacts 0-100 using company data, tech stack, and 30+ signals ($0.15/lead) |
| Website Contact Scraper Pro | Use instead for JavaScript-heavy sites (React, Angular, Vue SPAs) that require a real browser to render contact data |
| HubSpot Lead Pusher | Push scraped contact records directly into HubSpot as new contacts or update existing ones |
| Website Tech Stack Detector | Identify 100+ technologies used by each company for technographic lead scoring ($0.10/site) |
| B2B Lead Gen Suite | Full pipeline: input URLs to scraped contacts to enrichment to scored leads, all in one actor ($0.25/lead) |
Limitations
- No JavaScript rendering – the actor uses CheerioCrawler, which parses static server-rendered HTML. Single-page applications that load contact data via client-side JavaScript (React, Angular, Vue) will not have their dynamic content extracted. For JS-heavy sites, use Website Contact Scraper Pro.
- Same-domain links only – the actor only follows links within the same domain as the input URL. Cross-domain team directories or externally hosted about pages are not discovered.
- Name extraction depends on HTML patterns – team member detection relies on Schema.org markup, 11 recognized CSS class names, and heading-paragraph structure. Custom or unconventional layouts may not trigger any of the three extraction strategies.
- Phone extraction uses targeted selectors – to minimize false positives, the phone regex first targets header, footer, nav, address, and elements with contact/phone/info class names. Full body text is used only as a fallback when contact-area selectors return insufficient content. Numbers formatted as bare digits without separators will not be captured.
- No authentication support – only publicly accessible pages are processed. Login-gated employee directories, intranets, and members-only portals are not supported.
- First social link per platform – if a page contains multiple LinkedIn profiles (e.g., company page + individual employee profiles), only the first matched URL per platform is recorded. Footer/header/nav links are prioritized over body links.
- One record per domain – multiple input URLs on the same domain (e.g., acmecorp.com and www.acmecorp.com) are merged into a single output record. This is by design, to prevent duplicate billing.
- Verification adds runtime – enabling `verifyEmails` adds 1-2 minutes to the run as a separate verification actor is called with all unique emails. For very large batches (1,000+ emails), this may take longer.
Integrations
- Zapier – Trigger a Zap when a run completes and push verified emails and contact names directly to your CRM, email list, or notification system
- Make – Build automated workflows that route personal vs. generic emails to different CRM fields or marketing lists via Make's 1,500+ app connectors
- Google Sheets – Export results directly to a Google Sheet for collaborative review, filtering by verification status, or manual enrichment before CRM import
- Apify API – Trigger runs programmatically and retrieve results in JSON, CSV, XML, or Excel format; use the Python or JavaScript SDK for clean integration
- Webhooks – Receive an HTTP POST when a run completes and automatically trigger downstream processing in your own backend
- LangChain / LlamaIndex – Feed verified contact datasets into AI agent workflows for automated research, outreach drafting, or lead qualification pipelines
Troubleshooting
- Empty email results despite a site showing contact addresses – the site likely loads contact information via JavaScript after the initial page load. CheerioCrawler parses only the static HTML returned by the server. Switch to Website Contact Scraper Pro, which uses a full browser to render dynamically loaded content.
- Run takes longer than expected for large batches – each website crawls up to `maxPagesPerDomain` pages with a 30-second timeout per page. A batch of 500 sites at 5 pages each could make up to 2,500 HTTP requests. Lower `maxPagesPerDomain` to 3 for a faster pass. Enabling Apify Proxy can also improve speed on sites that throttle repeated requests. Email verification adds 1-2 minutes at the end of the run.
- Phone numbers are missing from output – phone extraction requires recognized formatting (international prefix, parentheses, or dash/dot separators). The actor first checks contact-specific page areas, then falls back to full body text. Numbers formatted as bare 10-digit strings without separators are intentionally skipped to avoid false positives from zip codes, IDs, and other numeric data.
- Some contacts have names but no emails – name extraction and email extraction are independent processes. Not every team member lists a personal email; many sites only have a generic contact@ address. Use Email Pattern Finder to predict personal email addresses from names and the company domain.
- Verified emails showing "risky" status – a "risky" status typically means the domain has a catch-all configuration that accepts all addresses, making it impossible to confirm whether a specific mailbox exists. These emails may still be deliverable. Use the confidence score to decide your threshold; addresses above 70% confidence are generally safe for outreach (see the sketch after this list).
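A small helper like the following (illustrative, not part of the actor) turns verification results into an outreach-ready list using that threshold:

```python
def outreach_list(item: dict, risky_threshold: int = 70) -> list[str]:
    # Keep "valid" emails, plus "risky" ones above the chosen confidence.
    keep = []
    for v in item.get("verifiedEmails", []):
        if v["status"] == "valid":
            keep.append(v["email"])
        elif v["status"] == "risky" and v["confidence"] >= risky_threshold:
            keep.append(v["email"])
    return keep

item = {"verifiedEmails": [
    {"email": "hello@pinnacleventures.com", "status": "valid", "confidence": 98},
    {"email": "s.chen@pinnacleventures.com", "status": "risky", "confidence": 61},
]}
print(outreach_list(item))  # ['hello@pinnacleventures.com']
```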
Responsible use
- This actor only accesses publicly visible web pages that are available to any browser without authentication.
- Respect website terms of service and `robots.txt` directives.
- Comply with GDPR, CAN-SPAM, CASL, and other applicable data protection laws when using scraped contact data for commercial outreach.
- Do not use extracted personal contact information for spam, harassment, or unauthorized purposes.
- For guidance on web scraping legality, see Apify's guide.
FAQ
How many websites can I scrape for contact information in one run?
There is no hard URL limit. The actor processes sites concurrently (up to 10 at once) and enforces per-domain page limits internally. A batch of 1,000 websites at the default 5 pages per domain typically completes in 60-90 minutes. Enable proxies for batches over 20 sites.
Does Website Contact Scraper verify email addresses?
Yes. Enable the `verifyEmails` option and the actor runs MX record checks, disposable domain detection, and role-based flagging on every found email. Each verified email gets a status (valid/invalid/risky), confidence score (0-100), and human-readable reason. This uses Bulk Email Verifier internally; no separate run or additional cost is needed.
What is the difference between personalEmails and genericEmails?
Personal emails are addressed to individuals (sarah@, j.smith@, m.rodriguez@). Generic emails use role-based prefixes like info@, hello@, contact@, office@, sales@, billing@, support@, and 9 other patterns. The actor classifies all found emails into both arrays automatically, so you can target decision-makers directly instead of shared inboxes.
Does Website Contact Scraper extract emails hidden behind JavaScript?
No. The actor uses CheerioCrawler, which parses static HTML. If contact emails are loaded via client-side JavaScript (common on React and Next.js sites), they will not appear in the output. For JavaScript-rendered sites, use Website Contact Scraper Pro.
What is deep scan mode and when should I use it?
Deep scan probes 14 hidden page paths (/imprint, /impressum, /privacy-policy, /legal, /datenschutz, /support, /careers, and more) that often contain contact information not linked from the main navigation. European businesses are legally required to display contact details on imprint pages. Enable deep scan for EU companies or any site where the standard crawl returned fewer contacts than expected.
What types of email addresses are filtered out?
The actor removes noreply@, no-reply@, donotreply@, test@, admin@, webmaster@, postmaster@, mailer-daemon@, and root@ addresses. It also filters emails ending in image, CSS, or JavaScript file extensions (.png, .jpg, .css, .js) and addresses from placeholder domains including example.com, sentry.io, wixpress.io, and placeholder.io.
Is it legal to scrape contact information from websites?
Scraping publicly available contact information is generally permitted in most jurisdictions, a position supported by the 2022 hiQ Labs v. LinkedIn ruling in the US. However, what you do with the data matters. GDPR in the EU and similar laws restrict how personal data can be processed for outreach. Always review the target site's Terms of Service. See Apify's web scraping legality guide.
How is Website Contact Scraper different from Hunter.io or Clay?
Hunter.io and Clay use proprietary databases of pre-scraped contacts: you query their index, and data freshness depends on their crawl schedule. Website Contact Scraper crawls the live website each time you run it, so results reflect the current state of the page. It also extracts structured team members with titles, classifies personal vs. generic emails, and includes built-in email verification, all at $0.15/site versus Hunter.io's $49-$149/month or Clay's $149-$720/month subscription tiers.
Can I schedule Website Contact Scraper to run on a recurring basis?
Yes. Use Apify Schedules to run the actor daily, weekly, or at any custom cron interval. This keeps prospect databases refreshed without manual effort. Combine with webhooks to automatically push new results to your CRM.
How accurate is the contact name extraction?
Accuracy depends on the site's HTML structure. Sites using Schema.org Person markup or standard team-card CSS patterns (.team-member, .team-card, etc.) yield near-perfect results. The actor uses a strict proper-name regex with Unicode accent support (handles names like Bjorn, O'Brien, Anne-Marie) and a 40-word junk-name blocklist to minimize false positives. Sites with custom or unconventional layouts may produce fewer contacts.
What happens if a website is down or returns an error?
The actor retries each failed request up to 3 times with session pooling and persistent cookies. If all retries fail, the domain is included in the output with a scrapeError field explaining what went wrong. Failed domains are not charged in pay-per-event mode. The run continues processing all other domains without interruption.
How do I push scraped contacts into my CRM automatically?
Use HubSpot Lead Pusher to push contacts directly into HubSpot after a scrape run. For Salesforce, Pipedrive, or other CRMs, use the Zapier or Make integration to route new dataset items automatically. The personal/generic email classification helps you map to the right CRM fields.
Help us improve
If you encounter issues, you can help us debug faster by enabling run sharing in your Apify account:
- Go to Account Settings > Privacy
- Enable Share runs with public Actor creators
This lets us see your run details when something goes wrong, so we can fix issues faster. Your data is only visible to the actor developer, not publicly.
Support
Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom scraping solutions or enterprise integrations, reach out through the Apify platform.