Website Data & Email Scraper - Enrichment & Validator
Pricing
from $0.005 / actor start
Extract emails, phone numbers, and social media profiles from any website. Validate email deliverability, detect digital platform accounts, and gather domain intelligence (tech stack, SSL, WHOIS, server location). The complete data enrichment toolkit for B2B lead generation.
Rating
5.0
(2)
Developer
Expandí tu Marca
Actor stats
Bookmarked: 8
Total users: 120
Monthly active users: 16
Last modified: 18 hours ago
Website Data & Email Scraper — Enrichment & Validator
Extract verified contact data and business intelligence from any website. Give it a list of URLs and get back emails, phone numbers, social media profiles, website metadata, and optional enrichment — all in a single structured dataset.
Built for B2B lead generation, sales prospecting, and market research at scale.
What It Does
The actor visits each website you provide, intelligently navigates through its most relevant pages, and extracts every piece of contact and business data it can find. It goes beyond the homepage — it identifies and visits internal pages like Contact, About, Services, and more to maximize data yield.
The result is a clean, deduplicated dataset ready for your CRM, outreach tool, or spreadsheet.
Key Features
- Email extraction — finds emails from visible text, HTML structure, and `mailto:` links. Filters out placeholder and tracking addresses automatically.
- Phone number extraction — captures numbers from `tel:` links and page text, normalizes them to international E.164 format, and deduplicates regional variants (e.g. `+54 11` and `+5411` become the same number).
- Social media profiles — detects direct profile links for Instagram, Facebook, LinkedIn, Twitter/X, YouTube, TikTok, WhatsApp, Telegram, Pinterest, and more.
- Strategic internal page crawling — visits Contact, About, Services, Portfolio, Pricing, and other relevant sections to find contact data that isn't on the homepage.
- Website metadata — extracts page title, meta description, keywords, and CMS/generator.
- Optional enrichment — Email Domain Validation, Platform Account Detection, and Domain Intelligence (described below).
Use Cases
- B2B lead generation — build verified contact lists from a batch of prospect websites.
- Sales prospecting — enrich a lead list with emails and phone numbers before outreach.
- Market research — understand what technology stack and services your target market uses.
- Competitor analysis — gather public contact and infrastructure data for competitor websites.
- Agency & freelance — deliver contact datasets to clients from their target industry.
Input
| Field | Description | Default |
|---|---|---|
| Website URLs | One or more website URLs to scrape. Accepts domain names or full URLs. | — |
| Internal Pages to Scan | How many internal pages to visit per site beyond the homepage. Options: 0 (homepage only), 5, 10, 15, or 20. More pages find more contacts, but each page visited is charged separately. | 0 |
| Deep Site Crawl | When enabled, also follows sub-pages discovered within internal pages (e.g. blog posts, portfolio items). Each counts toward your Internal Pages limit. Requires Internal Pages > 0. | Off |
| Email Domain Validation | Validates each email address by checking if its domain has active mail records and probing mailbox reachability. | Off |
| Platform Account Detection | Checks which digital platforms are associated with personal email addresses found on the site (Gmail, Yahoo, Outlook, etc.). Corporate emails are automatically skipped. | Off |
| Domain Intelligence | Gathers technical intelligence per domain: detected services, registration age, SSL certificate status, and server location. | Off |
| Proxy Configuration | Optional proxy for scraping. Residential proxies improve success rates on bot-protected websites. | None |
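The constraints in the table above can be checked before a run is started. The following is a minimal sketch of such a check. The key names (`urls`, `internalPages`, `deepCrawl`) are assumptions made for illustration and are not the actor's documented input schema, so consult the actor's input tab for the real field names.

```python
ALLOWED_INTERNAL_PAGES = {0, 5, 10, 15, 20}  # options listed in the Input table


def validate_input(run_input: dict) -> list:
    """Return a list of problems found in a hypothetical run input."""
    problems = []

    # At least one URL or domain name is needed.
    if not run_input.get("urls"):
        problems.append("at least one website URL is required")

    # Internal Pages to Scan only accepts the listed options.
    pages = run_input.get("internalPages", 0)
    if pages not in ALLOWED_INTERNAL_PAGES:
        problems.append(
            f"internalPages must be one of {sorted(ALLOWED_INTERNAL_PAGES)}"
        )

    # Deep Site Crawl requires Internal Pages > 0.
    if run_input.get("deepCrawl", False) and pages == 0:
        problems.append("deepCrawl requires internalPages > 0")

    return problems
```

An empty list means the input satisfies the documented constraints. Any returned message names the setting to fix.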
What Gets Extracted
Contact Data
Emails
Raw email addresses as found on the website. Placeholder, example, and tracking addresses are filtered out automatically.
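As a rough illustration of this kind of extraction and filtering, the sketch below pulls addresses out of raw HTML with a regular expression, drops obvious placeholders, and deduplicates. The regex, the placeholder domain list, and the asset-suffix check are assumptions for the example. The actor's actual filter rules are not documented here.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Illustrative blocklists only (assumed, not the actor's real rules).
PLACEHOLDER_DOMAINS = {"example.com", "example.org", "domain.com", "email.com"}
ASSET_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


def extract_emails(html: str) -> list:
    """Return unique, lowercased emails found in the HTML, in page order."""
    seen = set()
    result = []
    for match in EMAIL_RE.findall(html):
        email = match.lower()
        domain = email.rsplit("@", 1)[1]
        # Skip placeholder domains and image names such as "logo@2x.png".
        if domain in PLACEHOLDER_DOMAINS or email.endswith(ASSET_SUFFIXES):
            continue
        if email not in seen:
            seen.add(email)
            result.append(email)
    return result
```

Because the pattern scans the whole document, it also picks up addresses inside `mailto:` links without special handling.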
Phones (Normalized)
Phone numbers in international E.164 format (e.g. +541155978902). Regional duplicates for the same number are collapsed into one entry.
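The collapsing of regional variants can be pictured with a small sketch. It assumes every input is already written with a leading `+` and country code. Numbers in national format would need a default region, which this example deliberately leaves out. It is a simplification, not the actor's normalization logic.

```python
import re
from typing import Optional


def normalize_phone(raw: str) -> Optional[str]:
    """Reduce an internationally written number to '+' followed by digits."""
    raw = raw.strip()
    if not raw.startswith("+"):
        return None  # national format: cannot normalize without a region
    digits = re.sub(r"\D", "", raw)
    # E.164 allows at most 15 digits after the '+'.
    if not 1 <= len(digits) <= 15:
        return None
    return "+" + digits


def dedupe_phones(raw_numbers: list) -> list:
    """Normalize each number and keep the first occurrence of each result."""
    unique = []
    for raw in raw_numbers:
        normalized = normalize_phone(raw)
        if normalized and normalized not in unique:
            unique.append(normalized)
    return unique
```

With this sketch, `+54 11 5597-8902` and `+541155978902` produce the same single entry.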
Social Media
Direct profile URLs for the following platforms (only actual profile links — share buttons and platform homepage links are excluded):
Instagram · Facebook · LinkedIn · Twitter / X · YouTube · TikTok · WhatsApp · Telegram · Pinterest · Snapchat
Website Metadata
| Field | Description |
|---|---|
| `websiteTitle` | Page `<title>` tag |
| `websiteDescription` | Meta description |
| `websiteKeywords` | Meta keywords (when present) |
| `websiteGenerator` | CMS or site builder detected (WordPress, Wix, Squarespace, etc.) |
| `internalPages` | List of internal pages visited during the crawl |
Enrichment Options
Email Domain Validation
For each email found, this option runs two checks:
- MX record check — verifies the email domain has active mail exchange records (the domain can receive email).
- SMTP probe — attempts to verify the mailbox directly against the mail server. Not available for major freemail providers (Gmail, Outlook, Yahoo, etc.) since they block these probes.
Each email returns:
- `isFreeMail` — whether it belongs to a major free provider
- `provider` — provider name (Gmail, Outlook, Yahoo, iCloud, etc.)
- `mxValid` — whether the domain has active MX records
- `smtpStatus` — `valid`, `invalid`, `catchall`, or `unknown`
Freemail providers recognized: Gmail, Outlook, Hotmail, Live, Yahoo, iCloud, ProtonMail, Zoho, GMX, Yandex, Mail.ru, AOL, QQ, Tutanota, Fastmail, HEY, and regional ISP providers.
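The freemail split determines which checks apply to an address. A minimal sketch of that classification step follows. The domain table is a small illustrative subset, and `smtpProbeEligible` is a hypothetical field added for the example. Only `isFreeMail` and `provider` appear in the actor's output as described above.

```python
# Small illustrative subset; the recognized provider list is longer.
FREEMAIL_PROVIDERS = {
    "gmail.com": "Gmail",
    "outlook.com": "Outlook",
    "hotmail.com": "Hotmail",
    "yahoo.com": "Yahoo",
    "icloud.com": "iCloud",
}


def classify_email(email: str) -> dict:
    """Label an address as freemail or corporate based on its domain."""
    domain = email.rsplit("@", 1)[-1].lower()
    provider = FREEMAIL_PROVIDERS.get(domain)
    return {
        "email": email,
        "isFreeMail": provider is not None,
        "provider": provider,
        # Hypothetical helper flag: SMTP probing is skipped for freemail.
        "smtpProbeEligible": provider is None,
    }
```

An address such as `name@company.com` is classified as corporate here and would be a candidate for the SMTP probe, while a Gmail address would only receive the MX record check.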
Platform Account Detection
For personal email addresses (Gmail, Yahoo, Outlook, etc.) found on the site, this option checks which digital platforms have an account registered with that address.
- Applies only to freemail addresses — corporate domain emails are automatically skipped (marked as `null`).
- Each freemail email returns a `platforms` array with the names of platforms where the address is registered.
Domain Intelligence
Runs a technical profile of each domain. All checks run in parallel with a combined timeout.
| Field | Description |
|---|---|
| `services` | Business services detected via DNS records: Google Workspace, Microsoft 365, HubSpot, Salesforce, Shopify, Mailchimp, Zendesk, Intercom, Stripe, and 15+ more |
| `whoisCreated` | Domain registration date (YYYY-MM-DD) |
| `whoisAgedays` | Domain age in days |
| `registrar` | Domain registrar name |
| `serverCountry` | Country where the server IP is located |
| `sslValid` | Whether the SSL certificate is currently valid |
| `sslDaysRemaining` | Days until SSL certificate expires |
| `sslExpiry` | SSL certificate expiry date |
| `sslIssuer` | Certificate authority that issued the SSL |
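Running checks in parallel under one combined timeout can be sketched with `asyncio`. The two check functions below are stand-ins that only sleep, and the function names and timeout value are assumptions for illustration. The point is the pattern: all checks share one deadline, and any check that misses it is cancelled while the finished results are kept.

```python
import asyncio


async def check_ssl(domain: str) -> dict:
    await asyncio.sleep(0.01)  # stand-in for a real TLS handshake
    return {"sslValid": True}


async def check_whois(domain: str) -> dict:
    await asyncio.sleep(5)  # stand-in for a slow WHOIS server
    return {"registrar": "stub"}


async def domain_intel(domain: str, timeout: float = 0.2) -> dict:
    tasks = [
        asyncio.create_task(check_ssl(domain)),
        asyncio.create_task(check_whois(domain)),
    ]
    # One shared deadline for every check.
    done, pending = await asyncio.wait(tasks, timeout=timeout)

    # Cancel whatever did not finish in time and let the cancellation settle.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    # Merge the results of the checks that completed without error.
    result = {"domain": domain}
    for task in done:
        if task.exception() is None:
            result.update(task.result())
    return result
```

Calling `asyncio.run(domain_intel("example.com"))` returns the SSL result only, because the simulated WHOIS lookup exceeds the deadline. This mirrors how a slow lookup can leave some fields empty without blocking the rest.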
Output Dataset
Results are organized into three views in the Apify dataset:
Overview
A quick-scan lead card per website.
url · domain · emails · phonesNormalized · socialMedia · websiteTitle · status
Website Intel
Technical and SEO metadata about each site.
url · domain · websiteTitle · websiteDescription · websiteKeywords · websiteGenerator · internalPages
Enrichment
Validation and intelligence results (only populated when enrichment options are enabled).
url · domain · emailVerification · platformDetection · domainIntel
Pricing
This actor uses Pay-Per-Event pricing — you only pay for what you actually process.
| Event | When it's charged |
|---|---|
| Website scraped | Once per URL processed (homepage) |
| Internal page scraped | Once per additional page visited beyond the homepage |
| Email verified | Once per email address run through Email Domain Validation |
| Platform detection | Once per freemail address run through Platform Account Detection |
| Domain intel | Once per domain run through Domain Intelligence |
Pricing per event is listed on the actor's page. All enrichment options are off by default — enable only what you need.
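To budget a run, the event table can be turned into a simple upper-bound estimate. This sketch assumes one domain per website and that every site yields the full number of internal pages. The event keys are hypothetical labels, and prices must be supplied by the caller from the actor's page since none are stated here.

```python
def estimate_events(websites, internal_pages, emails_found=0, freemail_found=0,
                    validate_emails=False, detect_platforms=False,
                    domain_intel=False):
    """Upper-bound event counts for one run, following the pricing table."""
    return {
        "website_scraped": websites,
        # At most this many: sites with fewer pages are charged less.
        "internal_page_scraped": websites * internal_pages,
        "email_verified": emails_found if validate_emails else 0,
        "platform_detection": freemail_found if detect_platforms else 0,
        # Assumes each website maps to exactly one domain.
        "domain_intel": websites if domain_intel else 0,
    }


def estimate_cost(events: dict, prices: dict) -> float:
    """Multiply each event count by its caller-supplied unit price."""
    return sum(count * prices[name] for name, count in events.items())
```

For 100 websites with 10 internal pages each, this gives up to 100 website events and 1,000 internal page events before any enrichment is counted.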
Tips & Best Practices
Getting more contacts
- Enable Internal Pages and set it to 10–20 for data-rich results. Contact and About pages are prioritized first.
- Enable Deep Site Crawl for sites that spread contact info across many sub-pages (agencies, portfolios, multi-location businesses).
Handling bot-protected sites
- Some websites block automated requests. Use the Proxy Configuration option with residential proxies for better success rates.
Freemail vs. corporate emails
- Personal email addresses (Gmail, Yahoo, etc.) are great for Platform Account Detection but SMTP probing is not available for them.
- Corporate emails (`name@company.com`) can be SMTP-probed and are the primary target for Email Domain Validation.
Domain Intelligence for sales
- Use the `services` field to identify companies running HubSpot (likely have a sales team), Shopify (e-commerce), or Google Workspace (cloud-first business).
- The `whoisAgedays` field helps filter out very new domains (< 90 days) that may be spam or placeholder sites.
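These two filters are straightforward to apply to exported results. The sketch below assumes each record carries a `domainIntel` object containing `whoisAgedays` (a number) and `services` (a list of names). That record shape is inferred from the field tables and may differ from the real dataset.

```python
def is_established(record: dict, min_age_days: int = 90) -> bool:
    """Keep domains at least min_age_days old; keep unknown ages too."""
    intel = record.get("domainIntel") or {}
    age = intel.get("whoisAgedays")
    # Missing WHOIS data is treated as "unknown", not as "new".
    return age is None or age >= min_age_days


def uses_service(record: dict, service: str) -> bool:
    """True when the named service was detected for the record's domain."""
    intel = record.get("domainIntel") or {}
    return service in (intel.get("services") or [])


def shortlist(records: list, service: str) -> list:
    """Established domains that run the given service."""
    return [r for r in records if is_established(r) and uses_service(r, service)]
```

Treating a missing age as unknown avoids discarding domains whose TLD or registrar returns no WHOIS data.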
Limitations
- Websites that require a login or CAPTCHA, or that rely on JavaScript-heavy infinite scroll, may not yield complete results.
- SMTP mailbox probing is blocked by major freemail providers and some corporate mail servers behind cloud gateways.
- WHOIS data availability varies by TLD and registrar — some domains return partial or no registration data.
- Platform Account Detection applies only to freemail addresses. Corporate emails return `null` for this field.
- Memory scales with concurrency: 512 MB processes 1 website at a time; higher memory allocations enable parallel processing (up to 5 concurrent).