Website Data & Email Scraper - Enrichment & Validator

Pricing

from $0.005 / actor start

Extract emails, phone numbers, and social media profiles from any website. Validate email deliverability, detect digital platform accounts, and gather domain intelligence (tech stack, SSL, WHOIS, server location). The complete data enrichment toolkit for B2B lead generation.

Rating: 5.0 (2)

Developer: Expandí tu Marca

Maintained by Community

Actor stats

  • Bookmarked: 8
  • Total users: 120
  • Monthly active users: 16
  • Last modified: 18 hours ago

Website Data & Email Scraper — Enrichment & Validator

Extract verified contact data and business intelligence from any website. Give it a list of URLs and get back emails, phone numbers, social media profiles, website metadata, and optional enrichment — all in a single structured dataset.

Built for B2B lead generation, sales prospecting, and market research at scale.


What It Does

The actor visits each website you provide, crawls its most relevant pages, and extracts the contact and business data it finds there. It does not stop at the homepage: it identifies and visits internal pages such as Contact, About, and Services to maximize data yield.

The result is a clean, deduplicated dataset ready for your CRM, outreach tool, or spreadsheet.


Key Features

  • Email extraction — finds emails from visible text, HTML structure, and mailto: links. Filters out placeholder and tracking addresses automatically.
  • Phone number extraction — captures numbers from tel: links and page text, normalizes them to international E.164 format, and deduplicates regional variants (e.g. +54 11 and +5411 become the same number).
  • Social media profiles — detects direct profile links for Instagram, Facebook, LinkedIn, Twitter/X, YouTube, TikTok, WhatsApp, Telegram, Pinterest, and more.
  • Strategic internal page crawling — visits Contact, About, Services, Portfolio, Pricing, and other relevant sections to find contact data that isn't on the homepage.
  • Website metadata — extracts page title, meta description, keywords, and CMS/generator.
  • Optional enrichment — Email Domain Validation, Platform Account Detection, and Domain Intelligence (described below).

Use Cases

  • B2B lead generation — build verified contact lists from a batch of prospect websites.
  • Sales prospecting — enrich a lead list with emails and phone numbers before outreach.
  • Market research — understand what technology stack and services your target market uses.
  • Competitor analysis — gather public contact and infrastructure data for competitor websites.
  • Agency & freelance — deliver contact datasets to clients from their target industry.

Input

| Field | Description | Default |
| --- | --- | --- |
| Website URLs | One or more website URLs to scrape. Accepts domain names or full URLs. | |
| Internal Pages to Scan | How many internal pages to visit per site beyond the homepage. Options: 0 (homepage only), 5, 10, 15, or 20. More pages find more contacts; each page is charged separately. | 0 |
| Deep Site Crawl | When enabled, also follows sub-pages discovered within internal pages (e.g. blog posts, portfolio items). Each counts toward your Internal Pages limit. Requires Internal Pages > 0. | Off |
| Email Domain Validation | Validates each email address by checking if its domain has active mail records and probing mailbox reachability. | Off |
| Platform Account Detection | Checks which digital platforms are associated with personal (freemail) email addresses found on the site, such as Gmail, Yahoo, or Outlook addresses. Corporate emails are automatically skipped. | Off |
| Domain Intelligence | Gathers technical intelligence per domain: detected services, registration age, SSL certificate status, and server location. | Off |
| Proxy Configuration | Optional proxy for scraping. Residential proxies improve success rates on bot-protected websites. | None |
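
The constraints in the input table can be checked before a run is started. The following is a minimal sketch, not the actor's own validation: the parameter names (`urls`, `internal_pages`, `deep_crawl`) are hypothetical, since the actor's real input keys are not shown on this page.

```python
ALLOWED_INTERNAL_PAGES = {0, 5, 10, 15, 20}  # options listed in the input table


def validate_input(urls, internal_pages=0, deep_crawl=False):
    """Return a list of problems with a run input; an empty list means it looks valid.

    Parameter names are hypothetical; they are not the actor's real input keys.
    """
    problems = []
    if not urls:
        problems.append("at least one website URL is required")
    if internal_pages not in ALLOWED_INTERNAL_PAGES:
        problems.append("internal pages must be one of 0, 5, 10, 15, 20")
    if deep_crawl and internal_pages == 0:
        problems.append("deep site crawl requires internal pages > 0")
    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every issue with a batch configuration at once.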

What Gets Extracted

Contact Data

Emails

Raw email addresses as found on the website. Placeholder, example, and tracking addresses are filtered out automatically.
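
As an illustration of this kind of extraction and filtering, here is a minimal sketch. The regular expression and both blocklists are assumptions made for the example; the actor's actual matching and filter rules are not documented here.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Illustrative blocklists only; the actor's real filter rules are not documented.
PLACEHOLDER_DOMAINS = {"example.com", "example.org", "domain.com"}
PLACEHOLDER_LOCALS = {"you", "your", "name", "email", "user"}


def extract_emails(html):
    """Return unique, lowercased emails in first-seen order, minus obvious placeholders."""
    seen = []
    for match in EMAIL_RE.findall(html):
        email = match.lower()
        local, _, domain = email.partition("@")
        if domain in PLACEHOLDER_DOMAINS or local in PLACEHOLDER_LOCALS:
            continue  # skip placeholder or example addresses
        if email not in seen:
            seen.append(email)
    return seen
```

Because the pattern runs over the raw HTML, the same address inside a mailto: link and in the visible text is matched twice and collapsed into one entry.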

Phones (Normalized)

Phone numbers in international E.164 format (e.g. +541155978902). Regional duplicates for the same number are collapsed into one entry.
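
The collapsing of formatting variants can be sketched as follows. This simplified version assumes every number already carries a country code with a leading `+`; full E.164 normalization of national-format numbers needs region-aware parsing, which this sketch does not attempt. The 8-digit lower bound is an assumption; the 15-digit upper bound comes from E.164.

```python
import re


def to_e164(raw):
    """Collapse a number that already has a country code into +<digits>, else None."""
    digits = re.sub(r"\D", "", raw)
    if not raw.strip().startswith("+") or not 8 <= len(digits) <= 15:
        return None  # national-format or implausible numbers are not handled here
    return "+" + digits


def dedupe_phones(raw_numbers):
    """Normalize numbers and drop duplicates while preserving first-seen order."""
    normalized = (to_e164(n) for n in raw_numbers)
    return list(dict.fromkeys(n for n in normalized if n))
```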

Social Media

Direct profile URLs for the following platforms (only actual profile links — share buttons and platform homepage links are excluded):

Instagram · Facebook · LinkedIn · Twitter / X · YouTube · TikTok · WhatsApp · Telegram · Pinterest · Snapchat
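
One way to separate profile links from share buttons and platform homepages is to inspect the host and the first path segment. The sketch below covers only a few platforms, and the list of non-profile path segments is an assumption for illustration, not the actor's actual rule set.

```python
from urllib.parse import urlparse

# Illustrative subset of platforms; the mapping is an assumption.
PLATFORM_HOSTS = {
    "instagram.com": "instagram",
    "facebook.com": "facebook",
    "linkedin.com": "linkedin",
    "twitter.com": "twitter",
    "x.com": "twitter",
}
# First path segments that usually indicate a share widget rather than a profile.
NON_PROFILE_SEGMENTS = {"share", "sharer", "sharer.php", "intent", "sharing", "sharearticle"}


def classify_social_link(url):
    """Return (platform, url) for a profile-looking link, else None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    platform = PLATFORM_HOSTS.get(host)
    path = parsed.path.strip("/")
    if platform is None or not path:
        return None  # unknown platform or bare platform homepage
    if path.split("/")[0].lower() in NON_PROFILE_SEGMENTS:
        return None  # share button or similar widget link
    return platform, url
```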


Website Metadata

| Field | Description |
| --- | --- |
| websiteTitle | Page <title> tag |
| websiteDescription | Meta description |
| websiteKeywords | Meta keywords (when present) |
| websiteGenerator | CMS or site builder detected (WordPress, Wix, Squarespace, etc.) |
| internalPages | List of internal pages visited during the crawl |

Enrichment Options

Email Domain Validation

For each email found, this option runs two checks:

  1. MX record check — verifies the email domain has active mail exchange records (the domain can receive email).
  2. SMTP probe — attempts to verify the mailbox directly against the mail server. Not available for major freemail providers (Gmail, Outlook, Yahoo, etc.) since they block these probes.
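
A common way to turn SMTP probe replies into a status is to probe both the real address and a made-up address on the same domain. The sketch below assumes that technique; how the actor actually derives its status is not documented here, and the function name and parameters are hypothetical.

```python
def classify_smtp(target_code, random_code):
    """Map RCPT TO reply codes to valid / invalid / catchall / unknown.

    target_code: reply for the real address.
    random_code: reply for a made-up address on the same domain.
    Either may be None when the probe was blocked or timed out.
    """
    if target_code is None:
        return "unknown"  # probe blocked or timed out
    if 500 <= target_code < 600:
        return "invalid"  # permanent rejection (simplified: may also be a policy block)
    if target_code == 250:
        if random_code == 250:
            return "catchall"  # server accepts any recipient
        if random_code is not None and 500 <= random_code < 600:
            return "valid"  # real address accepted, made-up one rejected
    return "unknown"  # temporary failures and inconclusive combinations
```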

Each email returns:

  • isFreeMail: whether it belongs to a major free provider
  • provider: provider name (Gmail, Outlook, Yahoo, iCloud, etc.)
  • mxValid: whether the domain has active MX records
  • smtpStatus: valid, invalid, catchall, or unknown

Freemail providers recognized: Gmail, Outlook, Hotmail, Live, Yahoo, iCloud, ProtonMail, Zoho, GMX, Yandex, Mail.ru, AOL, QQ, Tutanota, Fastmail, HEY, and regional ISP providers.
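
The freemail classification can be pictured as a domain lookup. This is a sketch with a small, illustrative subset of providers; the domain-to-provider mapping and the `None` provider for corporate addresses are assumptions, not the actor's documented behavior.

```python
# Small illustrative subset; the actor recognizes more providers than listed here.
FREEMAIL_PROVIDERS = {
    "gmail.com": "Gmail",
    "outlook.com": "Outlook",
    "hotmail.com": "Hotmail",
    "yahoo.com": "Yahoo",
    "icloud.com": "iCloud",
}


def describe_email(email):
    """Return the isFreeMail and provider fields for one address."""
    domain = email.rsplit("@", 1)[-1].lower()
    provider = FREEMAIL_PROVIDERS.get(domain)
    return {"isFreeMail": provider is not None, "provider": provider}


def should_smtp_probe(email):
    """SMTP probing is only worth attempting for non-freemail domains."""
    return not describe_email(email)["isFreeMail"]
```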


Platform Account Detection

For personal email addresses (Gmail, Yahoo, Outlook, etc.) found on the site, this option checks which digital platforms have an account registered with that address.

  • Applies only to freemail addresses — corporate domain emails are automatically skipped (marked as null).
  • Each freemail email returns a platforms array with the names of platforms where the address is registered.

Domain Intelligence

Runs a technical profile of each domain. All checks run in parallel with a combined timeout.

| Field | Description |
| --- | --- |
| services | Business services detected via DNS records: Google Workspace, Microsoft 365, HubSpot, Salesforce, Shopify, Mailchimp, Zendesk, Intercom, Stripe, and 15+ more |
| whoisCreated | Domain registration date (YYYY-MM-DD) |
| whoisAgedays | Domain age in days |
| registrar | Domain registrar name |
| serverCountry | Country where the server IP is located |
| sslValid | Whether the SSL certificate is currently valid |
| sslDaysRemaining | Days until SSL certificate expires |
| sslExpiry | SSL certificate expiry date |
| sslIssuer | Certificate authority that issued the SSL |
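
Running several independent checks in parallel under one combined timeout can be sketched with `asyncio`. This is an illustration of the pattern only, not the actor's implementation; `run_checks` and the check names passed to it are hypothetical.

```python
import asyncio


async def run_checks(checks, timeout):
    """Run named coroutine factories concurrently under one shared deadline.

    Returns {name: result}; a check that fails or has not finished when the
    deadline passes maps to None.
    """
    tasks = {name: asyncio.create_task(factory()) for name, factory in checks.items()}
    done, pending = await asyncio.wait(tasks.values(), timeout=timeout)
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)  # let cancellations settle
    results = {}
    for name, task in tasks.items():
        failed = task not in done or task.cancelled() or task.exception() is not None
        results[name] = None if failed else task.result()
    return results
```

With a shared deadline, one slow lookup (WHOIS, for example) cannot delay the rest of the profile; its fields are simply left empty.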

Output Dataset

Results are organized into three views in the Apify dataset:

Overview

A quick-scan lead card per website.

url · domain · emails · phonesNormalized · socialMedia · websiteTitle · status

Website Intel

Technical and SEO metadata about each site.

url · domain · websiteTitle · websiteDescription · websiteKeywords · websiteGenerator · internalPages

Enrichment

Validation and intelligence results (only populated when enrichment options are enabled).

url · domain · emailVerification · platformDetection · domainIntel
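
When working with exported dataset items outside the Apify console, the three views can be reproduced by projecting each item onto the field lists above. A minimal sketch; the view keys (`overview`, `websiteIntel`, `enrichment`) are hypothetical names chosen for this example.

```python
# Field lists copied from the three views described above; the dict keys are hypothetical.
VIEWS = {
    "overview": ["url", "domain", "emails", "phonesNormalized", "socialMedia",
                 "websiteTitle", "status"],
    "websiteIntel": ["url", "domain", "websiteTitle", "websiteDescription",
                     "websiteKeywords", "websiteGenerator", "internalPages"],
    "enrichment": ["url", "domain", "emailVerification", "platformDetection",
                   "domainIntel"],
}


def project(item, view):
    """Return only the fields of one view; fields missing from the item become None."""
    return {field: item.get(field) for field in VIEWS[view]}
```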


Pricing

This actor uses Pay-Per-Event pricing — you only pay for what you actually process.

| Event | When it's charged |
| --- | --- |
| Website scraped | Once per URL processed (homepage) |
| Internal page scraped | Once per additional page visited beyond the homepage |
| Email verified | Once per email address run through Email Domain Validation |
| Platform detection | Once per freemail address run through Platform Account Detection |
| Domain intel | Once per domain run through Domain Intelligence |

Pricing per event is listed on the actor's page. All enrichment options are off by default — enable only what you need.
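
A rough upper bound for a run's cost follows from counting events and multiplying by the per-event prices. The sketch below assumes one domain per submitted URL and that every site yields the full number of internal pages. The event keys are hypothetical, and the prices must be supplied by the caller from the actor's page; none are given here.

```python
def count_events(sites, pages_per_site, emails, freemail_emails,
                 validate=False, detect=False, intel=False):
    """Estimate how many billable events a run produces (upper bound)."""
    return {
        "website_scraped": sites,
        "internal_page_scraped": sites * pages_per_site,
        "email_verified": emails if validate else 0,
        "platform_detection": freemail_emails if detect else 0,
        "domain_intel": sites if intel else 0,  # assumes one domain per URL
    }


def estimate_cost(prices, counts):
    """Sum price * count over all events; both dicts use the same event keys."""
    return round(sum(prices[event] * n for event, n in counts.items()), 6)
```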


Tips & Best Practices

Getting more contacts

  • Set Internal Pages to Scan to 10, 15, or 20 for data-rich results. Contact and About pages are prioritized first.
  • Enable Deep Site Crawl for sites that spread contact info across many sub-pages (agencies, portfolios, multi-location businesses).

Handling bot-protected sites

  • Some websites block automated requests. Use the Proxy Configuration option with residential proxies for better success rates.

Freemail vs. corporate emails

  • Personal email addresses (Gmail, Yahoo, etc.) are great for Platform Account Detection but SMTP probing is not available for them.
  • Corporate emails (name@company.com) can be SMTP-probed and are the primary target for Email Domain Validation.

Domain Intelligence for sales

  • Use the services field to identify companies running HubSpot (likely have a sales team), Shopify (e-commerce), or Google Workspace (cloud-first business).
  • The whoisAgedays field helps filter out very new domains (< 90 days) that may be spam or placeholder sites.
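
These two filters can be combined when post-processing results. The sketch assumes each record carries a `domainIntel` object holding the fields from the Domain Intelligence table; that nesting is an assumption. Records without WHOIS age data are excluded here, which is a choice made for this example.

```python
def is_qualified(record, required_service=None, min_age_days=90):
    """Keep records whose domain is old enough and, optionally, uses a given service."""
    intel = record.get("domainIntel") or {}
    age = intel.get("whoisAgedays")
    if age is None or age < min_age_days:
        return False  # unknown or very new domain
    services = intel.get("services") or []
    if required_service and required_service not in services:
        return False
    return True
```

For example, `[r for r in records if is_qualified(r, "HubSpot")]` keeps only established domains where HubSpot was detected.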

Limitations

  • Websites that require login, present a CAPTCHA, or rely on JavaScript-heavy infinite scroll may not yield complete results.
  • SMTP mailbox probing is blocked by major freemail providers and some corporate mail servers behind cloud gateways.
  • WHOIS data availability varies by TLD and registrar — some domains return partial or no registration data.
  • Platform Account Detection applies only to freemail addresses. Corporate emails return null for this field.
  • Concurrency scales with memory: with 512 MB the actor processes 1 website at a time; higher memory allocations enable parallel processing (up to 5 concurrent websites).