Website Email & Contact Extractor: Lead Generation Tool

Pricing

$9.99/month + usage

Extract emails, phone numbers and social media links from any website. Auto-scans homepage plus contact and about pages. Returns verified leads with LinkedIn, Twitter, Instagram profiles. Perfect for B2B outreach and lead generation.

Developer

Scrape Pilot

Maintained by Community

📧 Website Email & Contact Extractor: Lead Generation Tool

Extract emails, phone numbers, social media links, and physical addresses from any public website. This Website Email & Contact Extractor is built for lead generation, B2B prospecting, and sales outreach at scale. No login or API keys required.


🔍 What Is This Actor?

Website Email & Contact Extractor is a production-grade Apify actor that crawls a website and extracts its publicly available contact information (emails, phone numbers, social media profile links, and physical addresses) from up to 5 pages per domain: the homepage plus up to 4 sub-pages such as contact, about, team, and impressum pages.

This Website Email & Contact Extractor is designed for sales teams, marketing agencies, recruiters, and developers who need to build targeted lead lists at scale without manually visiting each website. Provide a list of URLs and get back a clean, structured dataset of contact data, ready for CRM import, outreach campaigns, or market research.

Whether you need to extract 10 emails from a niche competitor list or scrape contact data from 1,000 company websites for a B2B lead generation campaign, this actor handles it reliably with smart deduplication, false-positive filtering, and residential proxy support.


🚀 Why Choose This Website Email & Contact Extractor?

| Feature | This Actor | Manual Research | Generic Scrapers |
|---|---|---|---|
| Email extraction from any website | ✅ mailto + text + HTML | ❌ | ⚠️ Partial |
| Phone number extraction | ✅ tel: links + text | ❌ | ⚠️ |
| Social media profile links | ✅ 7 platforms | ❌ | ⚠️ |
| Physical address extraction | ✅ Schema.org + CSS | ❌ | ❌ |
| Multi-page scan per domain | ✅ Up to 5 pages | ❌ | ❌ |
| False positive filtering | ✅ Smart skip-list | ❌ | ❌ |
| Bulk URL processing | ✅ 1–1000 sites | ❌ | ⚠️ |
| Keyword filtering | ✅ Built-in | ❌ | ❌ |
| Residential proxy support | ✅ Built-in | ❌ | ⚠️ |
| Duplicate removal | ✅ Smart dedup | ❌ | ❌ |

Bottom line: This Website Email & Contact Extractor gives you filtered, deduplicated contact data from up to 5 pages per website, far faster than visiting each site by hand.


🎯 Use Cases

📬 B2B Lead Generation

Use this Website Email & Contact Extractor to build targeted prospect lists for outbound sales campaigns. Feed in a list of company websites in your target industry and automatically extract the publicly listed email addresses, phone numbers, and LinkedIn profiles from each one.

🏢 Sales Prospecting & CRM Building

Extract contact data from hundreds of business websites in your target segment. Enrich your CRM with emails, phones, and social links without manual data entry. This Website Email & Contact Extractor turns raw URL lists into actionable sales intelligence.

📣 Marketing & Outreach Campaigns

Build cold email lists, identify influencer contacts, or find PR and media contacts on publisher websites. This contact extractor collects the publicly listed email addresses present in the HTML of the homepage, contact page, about page, and other discovered contact-related pages.

🔍 Competitor Research

Scrape contact pages from competitor websites to understand their support structure, regional offices, and social media presence. The Website Email & Contact Extractor maps out the full contact footprint of any business.

🤝 Partnership & Sponsorship Outreach

Extract partnership or sponsorship contact emails from brand websites and media companies. Find the right contact quickly without browsing through multiple pages manually.

💼 Recruitment & HR

Extract contact emails from company career and team pages for headhunting and direct outreach. This Website Email & Contact Extractor finds emails even when they are not in an obvious mailto: link, including plain-text addresses and some obfuscated patterns in the page content.

🌍 Local Business Data Collection

Collect phone numbers, addresses, and email addresses from local business websites for directory building, market mapping, or regional sales coverage analysis.

📊 Market Research & Data Enrichment

Enrich a dataset of company domains with contact data in a single run: match extracted emails with LinkedIn profiles via the social_links output, and collect physical addresses and phone numbers for cross-referencing.


📊 What Data Is Extracted?

This Website Email & Contact Extractor pulls the following contact data from each website:

✉️ Email Addresses

  • Extracted from mailto: links (highest accuracy)
  • Extracted from visible page text using regex pattern matching
  • Extracted from raw HTML source including obfuscated patterns
  • False positives filtered out: no-reply, example.com, asset files, CDN domains, tracking pixels
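The three email sources and the false-positive filter above can be sketched with Python's standard re module. This is a minimal illustration under stated assumptions, not the actor's code: the regular expressions, the skip-lists, and the function names extract_emails and is_false_positive are all hypothetical.

```python
import re

# Pragmatic address pattern (an assumption), not a full RFC 5322 parser.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Address inside href="mailto:...", stopping before any ?subject=... query.
MAILTO_RE = re.compile(r"""href=["']mailto:([^"'?]+)""", re.IGNORECASE)

# Hypothetical skip-lists; the actor's real lists are not published.
SKIP_DOMAINS = ("example.com", "schema.org")
SKIP_LOCAL_PREFIXES = ("noreply", "no-reply", "donotreply")
ASSET_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".css", ".js")


def is_false_positive(email: str) -> bool:
    local, _, domain = email.partition("@")
    if domain.endswith(ASSET_SUFFIXES):  # e.g. "logo@2x.png" from an image path
        return True
    if any(domain == d or domain.endswith("." + d) for d in SKIP_DOMAINS):
        return True
    return local.startswith(SKIP_LOCAL_PREFIXES)


def extract_emails(html: str) -> list[str]:
    """Unique, lower-cased emails: mailto: links first, then any match in the raw HTML."""
    found: list[str] = []
    for raw in MAILTO_RE.findall(html) + EMAIL_RE.findall(html):
        email = raw.strip().lower()
        if EMAIL_RE.fullmatch(email) and not is_false_positive(email) and email not in found:
            found.append(email)
    return found
```

Checking mailto: matches first keeps the highest-accuracy source at the front of the list, and lower-casing means Info@ and info@ count as one address.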

📞 Phone Numbers

  • Extracted from tel: links (highest accuracy; the exact format is preserved)
  • Extracted from visible page text in common international formats
  • Validated by digit count (7–15 digits, which rejects ZIP codes and short strings)
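A rough sketch of the tel:-first extraction and the 7–15 digit check in plain Python. The candidate regex and the names extract_phones and digit_count are assumptions for illustration; the actor's real patterns are not published.

```python
import re

# Hypothetical candidate pattern: optional "+", a digit, digits/spaces/()-. and a final digit.
PHONE_CANDIDATE_RE = re.compile(r"\+?\d[\d\s().-]{5,}\d")
TEL_RE = re.compile(r"""href=["']tel:([^"']+)""", re.IGNORECASE)


def digit_count(s: str) -> int:
    return sum(ch.isdigit() for ch in s)


def extract_phones(html: str, visible_text: str) -> list[str]:
    """tel: links first (format kept as written), then candidates from visible text."""
    phones: list[str] = []
    for raw in TEL_RE.findall(html) + PHONE_CANDIDATE_RE.findall(visible_text):
        candidate = raw.strip()
        # 7-15 digits: drops ZIP codes and other short numbers, and overly long IDs.
        if 7 <= digit_count(candidate) <= 15 and candidate not in phones:
            phones.append(candidate)
    return phones
```

A digit-count check alone is coarse: a year range such as 2019-2024 has 8 digits and would pass, so a production implementation likely needs extra heuristics or a dedicated parser such as libphonenumber.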

🔗 Social Media Links

Extracted from all scanned pages for these 7 platforms:

  • LinkedIn: company pages and personal profiles
  • Facebook: business pages
  • Twitter / X: brand accounts
  • Instagram: business profiles
  • YouTube: channels
  • TikTok: brand accounts
  • GitHub: organization and user profiles

🏠 Physical Address

  • Extracted via Schema.org PostalAddress microdata (highest accuracy)
  • Falls back to CSS class/ID patterns like .address, .location

🏷️ Company Name

  • Extracted from og:site_name meta tag (most accurate)
  • Falls back to <title> tag with common suffixes stripped
  • Falls back to domain name if nothing else is found
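The three-step company name fallback could look like the following, using the standard library's html.parser. Splitting the title at the first separator (|, -, –, :, ·) is an assumed stand-in for "common suffixes stripped", and company_name is a hypothetical helper name.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class _NameParser(HTMLParser):
    """Collects og:site_name and the <title> text from a page."""

    def __init__(self):
        super().__init__()
        self.og_site_name = None
        self.title_text = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("property") == "og:site_name" and a.get("content"):
            self.og_site_name = self.og_site_name or a["content"].strip()
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_text += data


# Assumed separators, e.g. "Acme | Home" or "Acme - Contact Us".
TITLE_SPLIT_RE = re.compile(r"\s+[|\-–:·]\s+")


def company_name(html: str, url: str) -> str:
    p = _NameParser()
    p.feed(html)
    if p.og_site_name:                       # 1) og:site_name meta tag
        return p.og_site_name
    title = p.title_text.strip()
    if title:                                # 2) <title>, keeping the part before the first separator
        return TITLE_SPLIT_RE.split(title)[0].strip()
    host = urlparse(url).hostname or url     # 3) domain fallback: "www.acme-corp.com" -> "Acme Corp"
    label = host.removeprefix("www.").split(".")[0]
    return label.replace("-", " ").title()
```

Keeping the segment before the first separator works for "Acme | Home" but would return "Contact Us" for "Contact Us - Acme", which is one reason og:site_name is tried first.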

⚙️ How It Scans Each Website

This Website Email & Contact Extractor uses a smart multi-page crawl strategy to maximize contact data coverage:

Phase 1: Homepage Scan

The actor fetches the homepage and immediately extracts all emails, phones, social links, and address data. Most business websites list at least their social profiles and support email on the homepage.

Phase 2: Contact Page Discovery

Using keyword matching on link text and href attributes, the actor finds internal links to contact-related pages such as: contact, about, team, reach, connect, get-in-touch, support, imprint, impressum, legal.

Phase 3: Sub-Page Scanning

The actor visits up to pages_to_scan - 1 of the discovered contact pages (by default, 4 additional pages after the homepage). Contact pages usually contain the most complete and accurate contact data.
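Phases 2 and 3 can be sketched together: collect every <a> tag, keep same-host links whose href or link text contains one of the contact keywords, and cap the list at pages_to_scan - 1 so that the homepage plus sub-pages never exceeds pages_to_scan. The keyword tuple adds kontakt from the FAQ; the substring matching, the exact-host check, and the name discover_contact_pages are assumptions, not the actor's published logic.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Keywords from Phase 2 plus "kontakt"; simple substring matching is an assumption.
CONTACT_KEYWORDS = ("contact", "kontakt", "about", "team", "reach", "connect",
                    "get-in-touch", "support", "imprint", "impressum", "legal")


class _LinkParser(HTMLParser):
    """Collects (href, link text) pairs for every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links, self._href, self._text = [], None, []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href, self._text = dict(attrs).get("href"), []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def discover_contact_pages(html: str, base_url: str, pages_to_scan: int = 5) -> list[str]:
    parser = _LinkParser()
    parser.feed(html)
    host = urlparse(base_url).netloc
    found: list[str] = []
    for href, text in parser.links:
        url = urljoin(base_url, href).split("#")[0]   # absolute URL without fragment
        haystack = (href + " " + text).lower()
        if (urlparse(url).netloc == host              # internal links only (mailto: has no host)
                and url.rstrip("/") != base_url.rstrip("/")
                and any(k in haystack for k in CONTACT_KEYWORDS)
                and url not in found):
            found.append(url)
    # The homepage counts as one page, so at most pages_to_scan - 1 sub-pages are visited.
    return found[: max(pages_to_scan - 1, 0)]
```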

Phase 4: Deduplication & Assembly

All extracted emails, phones, socials, and addresses are deduplicated across all scanned pages and assembled into one clean record per domain. A status flag (Verified, Partial, No Data) is assigned based on completeness.
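A sketch of Phase 4, merging per-page results into one record. The per-page dictionary shape and the name merge_pages are hypothetical; the 20-email and 10-phone caps come from the output field reference, and "first link per platform wins" mirrors the social-link rules described later on this page.

```python
def merge_pages(pages: list[dict]) -> dict:
    """Merge per-page extraction results into one record per domain.

    Each page dict is assumed to hold "emails", "phones", "socials"
    (platform -> URL) and "address" (str or None); the key names are illustrative.
    """
    emails: list[str] = []
    phones: list[str] = []
    socials: dict[str, str] = {}
    address = None
    for page in pages:
        for e in page.get("emails", []):
            if e.lower() not in emails:          # case-insensitive email dedup
                emails.append(e.lower())
        for p in page.get("phones", []):
            if p not in phones:                  # exact-string phone dedup
                phones.append(p)
        for platform, url in page.get("socials", {}).items():
            socials.setdefault(platform, url)    # first valid link per platform wins
        address = address or page.get("address")  # first address found wins
    return {
        "emails": emails[:20],
        "phone_numbers": phones[:10],
        "social_links": socials,
        "address": address,
        "pages_scanned": len(pages),
    }
```

Phone numbers are compared as exact strings here, so "+1 800 555 0100" and "+1 (800) 555-0100" would both survive; normalizing numbers before merging would catch that.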

Smart Filtering

  • Email false positives removed: CDN domains, asset domains, schema.org, no-reply addresses, and common test/example addresses are all excluded automatically
  • Phone false positives removed: Strings shorter than 7 digits or longer than 15 digits are rejected
  • Social link false positives removed: Share buttons, login URLs, dialog intents, and plugin embeds are all excluded

📋 Output Fields (Full Reference)

Each record produced by this Website Email & Contact Extractor contains:

| Field | Type | Description | Example |
|---|---|---|---|
| domain | string | Website domain | "www.example.com" |
| company_name | string | Extracted company name | "Acme Corporation" |
| emails | array | All unique emails found (max 20) | ["info@example.com", "sales@example.com"] |
| phone_numbers | array | All unique phone numbers found (max 10) | ["+1 (800) 555-0100", "+44 20 7946 0958"] |
| social_links | object | Social media profile URLs by platform | {"linkedin": "https://linkedin.com/company/...", "twitter": "https://x.com/..."} |
| address | string | Physical address (if found) | "123 Main St, New York, NY 10001, US" |
| source_url | string | The input URL that was scraped | "https://www.example.com" |
| pages_scanned | integer | Number of pages crawled for this domain | 4 |
| status | string | Data completeness status | "Verified", "Partial", or "No Data" |
| extracted_at | string | ISO 8601 timestamp of extraction | "2024-11-01T10:30:00Z" |

Example social_links object with all 7 platforms found:

```json
{
  "linkedin": "https://www.linkedin.com/company/example-corp",
  "facebook": "https://www.facebook.com/ExampleCorp",
  "twitter": "https://x.com/ExampleCorp",
  "instagram": "https://www.instagram.com/examplecorp",
  "youtube": "https://www.youtube.com/@ExampleCorp",
  "tiktok": "https://www.tiktok.com/@examplecorp",
  "github": "https://github.com/example-corp"
}
```

Only platforms where a profile link is found are included. Missing platforms are simply absent from the object.


⚙️ Input Parameters

```json
{
  "target_urls": [
    "https://www.hubspot.com",
    "https://www.salesforce.com",
    "https://www.mailchimp.com"
  ],
  "target_url": "",
  "keyword": "",
  "pages_to_scan": 5,
  "max_results": 50,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| target_urls | array or string | [] | Website URLs to extract contact data from: an array with one URL per item, or a newline-separated string |
| target_url | string | "" | Single-URL shortcut; automatically added to target_urls |
| keyword | string | "" | Optional filter: only return results where this keyword appears in the domain, company name, emails, or address |
| pages_to_scan | integer | 5 | Number of pages to scan per website (1 = homepage only; 5 = homepage + up to 4 contact sub-pages) |
| max_results | integer | 50 | Maximum number of websites to process in one run |
| proxyConfiguration | object | Off | Apify proxy configuration; recommended for large-scale extraction |
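How target_urls, target_url and max_results might combine into the final URL list, as a Python sketch. Accepting both forms of target_urls and appending target_url follows the table above; de-duplicating URLs and prefixing bare domains with https:// are assumptions, and normalize_input is a hypothetical name.

```python
def normalize_input(run_input: dict) -> list[str]:
    """Build the list of URLs to process from target_urls, target_url and max_results."""
    urls = run_input.get("target_urls") or []
    if isinstance(urls, str):                  # newline-separated string form
        urls = urls.splitlines()
    single = run_input.get("target_url") or ""
    if single.strip():                         # single-URL shortcut is added to the list
        urls = list(urls) + [single]
    cleaned: list[str] = []
    for u in urls:
        u = u.strip()
        if not u:
            continue                           # skip blank lines
        if not u.startswith(("http://", "https://")):
            u = "https://" + u                 # assumption: bare domains get https://
        if u not in cleaned:                   # assumption: duplicate URLs are dropped
            cleaned.append(u)
    return cleaned[: int(run_input.get("max_results", 50))]
```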

📦 Example Inputs & Outputs

Example 1: Extract Contacts from a Single Website

Input:

```json
{
  "target_url": "https://www.hubspot.com",
  "pages_to_scan": 5
}
```

Output:

```json
{
  "domain": "www.hubspot.com",
  "company_name": "HubSpot",
  "emails": [
    "press@hubspot.com",
    "legal@hubspot.com",
    "privacy@hubspot.com"
  ],
  "phone_numbers": [
    "+1 (888) 482-7768"
  ],
  "social_links": {
    "linkedin": "https://www.linkedin.com/company/hubspot",
    "facebook": "https://www.facebook.com/hubspot",
    "twitter": "https://x.com/HubSpot",
    "instagram": "https://www.instagram.com/hubspot",
    "youtube": "https://www.youtube.com/@HubSpot"
  },
  "address": "25 First Street, 2nd Floor, Cambridge, MA 02141, United States",
  "source_url": "https://www.hubspot.com",
  "pages_scanned": 4,
  "status": "Verified",
  "extracted_at": "2024-11-01T10:30:00Z"
}
```

Example 2: Bulk Lead Generation (B2B List)

Input:

```json
{
  "target_urls": [
    "https://www.company1.com",
    "https://www.company2.io",
    "https://www.agency3.co.uk",
    "https://www.startup4.com",
    "https://www.firm5.de"
  ],
  "pages_to_scan": 4,
  "max_results": 5
}
```

Output: 5 records, one per domain, each containing deduplicated emails, phones, social links, addresses, company names, and status codes. Ready for direct CRM import.


Example 3: Keyword-Filtered Extraction

Input:

```json
{
  "target_urls": ["https://www.agency1.com", "https://www.agency2.com", "https://www.tech3.com"],
  "keyword": "marketing",
  "pages_to_scan": 3
}
```

Behavior: Only returns records where the word "marketing" appears in the domain name, company name, email addresses, or physical address. Websites that do not match are skipped with a log message.


Example 4: Homepage-Only Quick Scan

Input:

```json
{
  "target_urls": ["https://site1.com", "https://site2.com", "https://site3.com"],
  "pages_to_scan": 1
}
```

Use this for: Large lists where speed matters more than completeness. Setting pages_to_scan: 1 only scans the homepage and skips contact/about page discovery entirely.


🔍 Keyword Filtering

The keyword parameter filters results at extraction time, so you do not have to filter the dataset manually after the run.

When keyword is set, the actor checks whether the keyword appears (case-insensitive) in any of these fields for each record:

  • domain
  • company_name
  • emails (joined as a string)
  • address

If the keyword is not found in any of these fields, the record is skipped and not included in the output.
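The check described above amounts to a case-insensitive substring test over four fields. A minimal sketch, assuming a record shaped like the output fields (matches_keyword is a hypothetical name):

```python
def matches_keyword(record: dict, keyword: str) -> bool:
    """Case-insensitive check over domain, company_name, emails (joined) and address."""
    if not keyword:
        return True                            # empty keyword: no filtering
    needle = keyword.lower()
    fields = (
        record.get("domain") or "",
        record.get("company_name") or "",
        " ".join(record.get("emails") or []),  # emails joined as one string
        record.get("address") or "",
    )
    return any(needle in field.lower() for field in fields)
```

Applied to a batch, this is a one-line filter: kept = [r for r in records if matches_keyword(r, "marketing")].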

Example use cases for keyword filtering:

  • keyword: "sales" → only return companies with "sales" in their domain or email
  • keyword: "london" → only return companies with London in their address
  • keyword: "@agency.com" → only return records with an email address at agency.com
  • keyword: "recruiting" → filter for HR-related websites

🔗 Social Media Link Extraction

This Website Email & Contact Extractor searches the full raw HTML of every scanned page for profile URLs on the 7 supported social platforms. Extracted links are cleaned and validated before inclusion:

  • Tracking parameters removed from social URLs
  • Share buttons filtered out: /share, /sharer, /intent/tweet, and /dialog/feed paths are excluded
  • Login and signup pages excluded: /login and /signup paths are filtered
  • Only one link per platform: the first valid profile URL found for each platform is returned
  • Trailing slashes normalized for consistent formatting

Social links are returned as a flat object with platform names as keys, making them easy to map into CRM fields or LinkedIn enrichment workflows.
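A sketch of the cleaning rules above using urllib.parse. The host table, the blocked path fragments, and the names platform_for and extract_socials are assumptions built from the listed rules; dropping the entire query string is a simplification of "tracking parameters removed".

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical host table for the 7 supported platforms.
PLATFORM_HOSTS = {
    "linkedin": ("linkedin.com",),
    "facebook": ("facebook.com",),
    "twitter": ("twitter.com", "x.com"),
    "instagram": ("instagram.com",),
    "youtube": ("youtube.com",),
    "tiktok": ("tiktok.com",),
    "github": ("github.com",),
}
# Path fragments for share buttons, intents, dialogs, plugins and auth pages (assumed list).
BLOCKED_PATH_PARTS = ("/share", "/intent/", "/dialog/", "/plugins/", "/login", "/signup")
URL_RE = re.compile(r"""https?://[^\s"'<>]+""")


def platform_for(host: str):
    for platform, hosts in PLATFORM_HOSTS.items():
        if any(host == h or host.endswith("." + h) for h in hosts):
            return platform
    return None


def extract_socials(html: str) -> dict[str, str]:
    socials: dict[str, str] = {}
    for raw in URL_RE.findall(html):
        parts = urlsplit(raw)
        platform = platform_for(parts.hostname or "")
        path = parts.path.rstrip("/")  # normalize trailing slashes
        if platform is None or platform in socials or not path:
            continue  # unknown platform, already have one, or bare homepage link
        if any(b in path.lower() for b in BLOCKED_PATH_PARTS):
            continue  # share button, intent, dialog, plugin or login/signup page
        # Rebuild without query string or fragment, which drops ?utm_... parameters.
        socials[platform] = urlunsplit((parts.scheme, parts.netloc, path, "", ""))
    return socials
```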


🏠 Address Extraction

Physical address extraction uses a two-tier approach:

Tier 1: Schema.org Structured Data (Highest Accuracy)

Websites using PostalAddress or LocalBusiness microdata markup have structured address fields that the actor reads directly: streetAddress, addressLocality, addressRegion, postalCode, addressCountry. This markup is common on e-commerce sites, service businesses, and sites built with modern CMS platforms.

Tier 2: CSS Pattern Matching (Fallback)

For sites without Schema.org markup, the actor searches for common address containers using selectors like .address, .location, [class*='address'], [id*='address']. The text is extracted and length-validated (10–200 characters) to filter out noise.

Address extraction works best on business websites that display their address in a footer, contact page, or sidebar. It may not work for websites that intentionally hide or obfuscate their physical location.
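The two-tier lookup can be sketched with html.parser: Tier 1 reads itemprop attributes (including <meta itemprop=... content=...>), and Tier 2 falls back to the first element whose class or id contains "address" or "location" and whose text is 10–200 characters long. AddressParser and extract_address are hypothetical names, and JSON-LD markup is not handled in this sketch.

```python
from html.parser import HTMLParser
from typing import Optional

SCHEMA_FIELDS = ("streetAddress", "addressLocality", "addressRegion", "postalCode", "addressCountry")
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


class AddressParser(HTMLParser):
    """Tier 1: Schema.org itemprop fields. Tier 2: text of elements whose class/id mentions address/location."""

    def __init__(self):
        super().__init__()
        self.schema: dict = {}
        self.css_texts: list = []
        self._stack: list = []  # open elements: (tag, itemprop, css_match, text_parts)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("itemprop")
        if tag in VOID_TAGS:
            # e.g. <meta itemprop="postalCode" content="02141">
            if prop in SCHEMA_FIELDS and a.get("content"):
                self.schema.setdefault(prop, a["content"].strip())
            return
        hint = f'{a.get("class") or ""} {a.get("id") or ""}'.lower()
        self._stack.append((tag, prop, "address" in hint or "location" in hint, []))

    def handle_data(self, data):
        for _, _, _, parts in self._stack:  # text belongs to every open ancestor
            parts.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or all(open_tag != tag for open_tag, *_ in self._stack):
            return  # ignore void and unmatched end tags
        while self._stack:  # pop up to the matching open tag (tolerates unclosed children)
            open_tag, prop, css_match, parts = self._stack.pop()
            text = " ".join(" ".join(parts).split())
            if prop in SCHEMA_FIELDS and text:
                self.schema.setdefault(prop, text)
            if css_match and text:
                self.css_texts.append(text)
            if open_tag == tag:
                break


def extract_address(html: str) -> Optional[str]:
    parser = AddressParser()
    parser.feed(html)
    parts = [parser.schema[f] for f in SCHEMA_FIELDS if parser.schema.get(f)]
    if parts:                           # Tier 1
        return ", ".join(parts)
    for text in parser.css_texts:       # Tier 2, length-validated
        if 10 <= len(text) <= 200:
            return text
    return None
```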


🏷️ Status Codes Explained

Every record from this Website Email & Contact Extractor includes a status field:

| Status | Meaning |
|---|---|
| "Verified" | At least one email and at least one phone number were found |
| "Partial" | Some contact data was found, but not both an email and a phone number (for example, emails only, phones only, or social links only) |
| "No Data" | No emails, phones, or social links were found on any scanned page |

Use the status field to prioritize outreach: "Verified" records are the highest-confidence leads. "Partial" records may still be useful for social outreach even without a direct email.
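The status rules in the table above reduce to a few boolean checks. A sketch, assuming a record shaped like the output fields (assign_status is a hypothetical name); note that under these rules an address alone does not count toward any status:

```python
def assign_status(record: dict) -> str:
    has_email = bool(record.get("emails"))
    has_phone = bool(record.get("phone_numbers"))
    has_social = bool(record.get("social_links"))
    if has_email and has_phone:
        return "Verified"    # at least one email and one phone
    if has_email or has_phone or has_social:
        return "Partial"     # some data, but not both email and phone
    return "No Data"
```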


🌐 Proxy Configuration

```json
{
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
```

When to Use Proxy

  • Running bulk extraction across 50+ websites
  • Scraping websites that block known datacenter IPs (media sites, legal firms, financial services)
  • Scraping websites in regions with geo-restricted access
  • Any production-scale lead generation run

When Proxy Is Optional

  • Small runs of under 10 websites
  • Scraping small business or startup websites
  • Quick one-off lookups on a single domain

The actor uses curl_cffi with Chrome 110 browser impersonation, which mimics Chrome's TLS and HTTP/2 fingerprints. At small volumes this is usually enough to get past basic bot detection without a proxy.


⚡ Performance & Rate Limits

Speed Benchmarks

| Mode | Websites | Pages/Site | Estimated Time |
|---|---|---|---|
| Single site, full scan | 1 | 5 | ~15–30 seconds |
| Small batch | 10 | 5 | ~2–4 minutes |
| Medium batch | 50 | 3 | ~8–12 minutes |
| Large batch | 200 | 2 | ~25–35 minutes |
| Homepage only, large batch | 100 | 1 | ~8–12 minutes |

Reliability Features

  • 1.5–3 second delay between each website request to avoid rate limiting
  • 0.5–1.5 second delay between sub-pages within the same domain
  • Chrome 110 browser impersonation via curl_cffi to minimize bot detection
  • Graceful failure handling β€” failed websites are logged and skipped, never crashing the run
  • Smart false-positive filtering on both emails and phone numbers for clean output
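The delay and failure-handling behavior listed above might be structured as follows. This sketch assumes each site's page URLs are known up front (the actor itself discovers sub-pages after fetching the homepage) and takes fetch and sleep as parameters so it can run without network access; crawl is a hypothetical name, not the actor's API.

```python
import logging
import random
import time
from typing import Callable

log = logging.getLogger("extractor")


def crawl(sites: list[list[str]], fetch: Callable[[str], str],
          sleep: Callable[[float], None] = time.sleep) -> dict[str, list[str]]:
    """Fetch each site's pages (homepage first) with randomized delays.

    `fetch` stands in for the HTTP call; any exception marks the site as failed,
    which is logged and skipped instead of aborting the run.
    """
    results: dict[str, list[str]] = {}
    for i, pages in enumerate(sites):
        if not pages:
            continue
        if i > 0:
            sleep(random.uniform(1.5, 3.0))          # delay between websites
        try:
            bodies = []
            for j, url in enumerate(pages):
                if j > 0:
                    sleep(random.uniform(0.5, 1.5))  # delay between sub-pages of one domain
                bodies.append(fetch(url))
            results[pages[0]] = bodies
        except Exception as exc:
            log.warning("Skipping %s: %s", pages[0], exc)
    return results
```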

❓ FAQ

Q: What types of email addresses are extracted? A: All publicly visible emails β€” from mailto: links (highest accuracy), from page text (regex), and from raw HTML. False positives like CDN domains, asset file paths, tracking pixel emails, example.com addresses, and no-reply addresses are automatically filtered out.

Q: Can this extract emails hidden behind JavaScript or contact forms? A: No. This actor extracts emails that are present in the HTML source code. Emails that only appear after JavaScript execution or that are submitted via contact forms are not accessible without a full browser renderer.

Q: Why are some phone numbers showing unusual formats? A: Phone numbers are extracted as-is from page text to preserve their original formatting (which varies by country). You can normalize them post-extraction using a phone parsing library like libphonenumber.

Q: What happens if a website has no contact page? A: The actor scans the homepage only. If no contact sub-pages are discovered via link analysis, only homepage data is extracted. The record is returned with pages_scanned: 1.

Q: Can I use this for Gmail, Outlook, or other webmail login pages? A: No. This Website Email & Contact Extractor only works with public-facing business websites. Login-gated pages, authenticated pages, and private intranets are not accessible.

Q: Is extracted data deduplicated? A: Yes. Emails and phone numbers are deduplicated across all pages scanned for a given domain β€” you will never see the same email twice in a single record.

Q: How accurate is the company name extraction? A: Very high for websites using the og:site_name meta tag (most modern sites). For older sites, it falls back to the <title> tag with common separators stripped. As a last resort, it uses the domain name formatted as a title.

Q: Can I process 1,000 websites in one run? A: Yes. Set max_results: 1000 and provide 1,000 URLs. For very large runs, residential proxy is strongly recommended. Estimated time: 3–5 hours depending on pages_to_scan setting.

Q: Does this work on non-English websites? A: Yes. Email and phone extraction uses language-agnostic regex patterns. Contact page discovery includes German keywords (kontakt, impressum) in addition to English ones. Address extraction via Schema.org works regardless of the page language.

Q: What is the pages_to_scan parameter and what should I set it to? A: It controls how many pages are scanned per website. 1 = homepage only (fastest). 5 = homepage + up to 4 sub-pages (most complete). For lead generation, 3–5 is recommended. For large bulk runs where speed matters, 1–2 is better.


📜 Changelog

v1.0.0 (Current)

  • ✅ Email extraction from mailto: links, page text, and raw HTML
  • ✅ Phone extraction from tel: links and international format text patterns
  • ✅ Social media link extraction for 7 platforms (LinkedIn, Facebook, Twitter/X, Instagram, YouTube, TikTok, GitHub)
  • ✅ Physical address extraction via Schema.org and CSS fallback
  • ✅ Company name extraction via og:site_name, <title>, and domain fallback
  • ✅ Multi-page scan per domain with automatic contact page discovery
  • ✅ Smart false-positive filtering for emails and phones
  • ✅ Social link cleaning (removes share buttons, login redirects, tracking params)
  • ✅ Keyword filtering support
  • ✅ Verified / Partial / No Data status scoring per record
  • ✅ Residential proxy support and Chrome 110 browser impersonation via curl_cffi
  • ✅ 1.5–3 second random delay between websites for safe rate limiting
  • ✅ Bulk processing up to 1,000 websites per run

⚖️ Legal & Ethical Use

This Website Email & Contact Extractor collects contact data that is publicly visible on business websites: the same information a person would see when visiting the site in a browser.

Please follow these guidelines:

  • Only extract contact data from websites you have a legitimate reason to access
  • Use extracted emails for outreach only where applicable law permits it, and obtain opt-in consent where the law requires it
  • Comply with CAN-SPAM, GDPR, CASL, and other applicable email and data protection regulations in your jurisdiction
  • Do not use extracted data to send spam, unsolicited bulk emails, or harassing messages
  • Respect the robots.txt file and Terms of Service of each website you scrape
  • Do not use this tool for unauthorized competitive intelligence or data resale without consent

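GDPR Note: In the EU, publicly listed business contact information (company emails, phone numbers) may be processed for legitimate business purposes. However, any email address that identifies an individual, including a work address such as firstname.lastname@company.com, is personal data, and processing it requires a valid legal basis under GDPR Article 6 (for example, legitimate interest). Always consult a legal professional for your specific use case.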


🤝 Support & Feedback

  • Bug report? Open a GitHub issue or contact via the Apify actor page
  • Feature request? Drop a suggestion in the Apify Community forum
  • Works great? Please leave a ⭐ review on the Apify Store β€” it helps others find this Website Email & Contact Extractor!

Built with ❤️ on Apify · Website Email & Contact Extractor for Lead Generation
Extract emails, phones, social links & addresses from any website: fast, clean, and at scale