Website Email & Contact Extractor: Lead Generation Tool
Pricing
$9.99/month + usage
Extract emails, phone numbers and social media links from any website. Auto-scans homepage plus contact and about pages. Returns verified leads with LinkedIn, Twitter, Instagram profiles. Perfect for B2B outreach and lead generation.
Developer
Scrape Pilot
Website Email & Contact Extractor: Lead Generation Tool
Instantly extract emails, phone numbers, social media links, and addresses from any website. The most reliable Website Email & Contact Extractor on Apify, built for lead generation, B2B prospecting, and sales outreach at scale. No login, no API keys, no limits.
Table of Contents
- What Is This Actor?
- Why Choose This Website Email & Contact Extractor?
- Use Cases
- What Data Is Extracted?
- How It Scans Each Website
- Output Fields (Full Reference)
- Input Parameters
- Example Inputs & Outputs
- Keyword Filtering
- Social Media Link Extraction
- Address Extraction
- Status Codes Explained
- Proxy Configuration
- Performance & Rate Limits
- FAQ
- Changelog
- Legal & Terms of Use
What Is This Actor?
Website Email & Contact Extractor is a production-grade Apify actor that crawls any website and extracts all publicly available contact information (emails, phone numbers, social media profile links, and physical addresses) from the homepage and up to 4 sub-pages (contact, about, team, impressum) per domain.
This Website Email & Contact Extractor is designed for sales teams, marketing agencies, recruiters, and developers who need to build targeted lead lists at scale without manually visiting each website. Provide a list of URLs and receive back a clean, structured dataset of contact data, ready for CRM import, outreach campaigns, or market research.
Whether you need to extract 10 emails from a niche competitor list or scrape contact data from 1,000 company websites for a B2B lead generation campaign, this actor handles it reliably with smart deduplication, false-positive filtering, and residential proxy support.
Why Choose This Website Email & Contact Extractor?
| Feature | This Actor | Manual Research | Generic Scrapers |
|---|---|---|---|
| Email extraction from any website | ✅ mailto + text + HTML | ❌ | ⚠️ Partial |
| Phone number extraction | ✅ tel: links + text | ❌ | ⚠️ |
| Social media profile links | ✅ 7 platforms | ❌ | ⚠️ |
| Physical address extraction | ✅ Schema.org + CSS | ❌ | ❌ |
| Multi-page scan per domain | ✅ Up to 5 pages | ❌ | ❌ |
| False positive filtering | ✅ Smart skip-list | ❌ | ❌ |
| Bulk URL processing | ✅ 1–1000 sites | ❌ | ⚠️ |
| Keyword filtering | ✅ Built-in | ❌ | ❌ |
| Residential proxy support | ✅ Built-in | ❌ | ⚠️ |
| Duplicate removal | ✅ Smart dedup | ❌ | ❌ |
Bottom line: This Website Email & Contact Extractor gives you deduplicated contact data from the homepage and key contact pages of each website, faster and more consistently than manual research.
Use Cases
B2B Lead Generation
Use this Website Email & Contact Extractor to build targeted prospect lists for outbound sales campaigns. Feed in a list of company websites in your target industry and extract all decision-maker contact emails, direct phone lines, and LinkedIn profiles automatically.
Sales Prospecting & CRM Building
Extract contact data from hundreds of business websites in your target segment. Enrich your CRM with emails, phones, and social links without manual data entry. This Website Email & Contact Extractor turns raw URL lists into actionable sales intelligence.
Marketing & Outreach Campaigns
Build cold email lists, identify influencer contacts, or find PR and media contacts from publisher websites. This contact extractor finds every publicly listed email address across the homepage, contact page, and about page.
Competitor Research
Scrape contact pages from competitor websites to understand their support structure, regional offices, and social media presence. The Website Email & Contact Extractor maps out the full contact footprint of any business.
Partnership & Sponsorship Outreach
Extract partnership or sponsorship contact emails from brand websites and media companies. Find the right contact quickly without browsing through multiple pages manually.
Recruitment & HR
Extract contact emails from company career and team pages for headhunting and direct outreach. This Website Email & Contact Extractor finds emails even when they are not in an obvious `mailto:` link, including plain-text and obfuscated addresses in page content.
Local Business Data Collection
Collect phone numbers, addresses, and email addresses from local business websites for directory building, market mapping, or regional sales coverage analysis.
Market Research & Data Enrichment
Enrich a dataset of company domains with contact data. Match extracted emails with LinkedIn profiles via the `social_links` output, verify physical addresses, and cross-reference phone numbers, all in one run.
What Data Is Extracted?
This Website Email & Contact Extractor pulls the following contact data from each website:
Email Addresses
- Extracted from `mailto:` links (highest accuracy)
- Extracted from visible page text using regex pattern matching
- Extracted from raw HTML source including obfuscated patterns
- False positives filtered out: no-reply, example.com, asset files, CDN domains, tracking pixels
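The regex-plus-skip-list approach described above can be sketched roughly as follows. This is a minimal illustration, not the actor's actual code: the pattern, the skip-lists, and the `extract_emails` name are all assumptions chosen for the example.

```python
import re

# Illustrative pattern and skip-lists (assumptions, not the actor's real rules).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SKIP_DOMAINS = {"example.com", "schema.org"}
SKIP_LOCAL_PREFIXES = ("no-reply", "noreply", "donotreply")
ASSET_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".css", ".js")


def extract_emails(html: str) -> list[str]:
    """Return unique, lower-cased emails in first-seen order, minus obvious noise."""
    found: list[str] = []
    for match in EMAIL_RE.findall(html):
        email = match.lower()
        local, _, domain = email.partition("@")
        if domain in SKIP_DOMAINS or local.startswith(SKIP_LOCAL_PREFIXES):
            continue
        if email.endswith(ASSET_SUFFIXES):  # asset names such as "logo@2x.png"
            continue
        if email not in found:
            found.append(email)
    return found
```

Scanning the raw HTML with one pattern covers `mailto:` links and plain text in a single pass; decoding obfuscated addresses would need extra rules that are not shown here.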
Phone Numbers
- Extracted from `tel:` links (highest accuracy, exact format preserved)
- Extracted from visible page text supporting all international formats
- Validated by digit count (7–15 digits, which rejects ZIP codes and short strings)
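A digit-count check like the one described above might look like this. The candidate pattern and both function names are assumptions made for this sketch; only the 7 to 15 digit rule comes from the description.

```python
import re

# Loose candidate pattern (an assumption): optional "+", then digits with common separators.
PHONE_CANDIDATE_RE = re.compile(r"\+?\d[\d\s().-]{5,}\d")


def is_plausible_phone(candidate: str) -> bool:
    """Accept a candidate only if it contains 7 to 15 digits."""
    digits = re.sub(r"\D", "", candidate)
    return 7 <= len(digits) <= 15


def extract_phones(text: str) -> list[str]:
    """Return plausible phone strings with their original formatting preserved."""
    phones: list[str] = []
    for match in PHONE_CANDIDATE_RE.finditer(text):
        candidate = match.group().strip()
        if is_plausible_phone(candidate) and candidate not in phones:
            phones.append(candidate)
    return phones
```

Counting digits after stripping separators keeps the original formatting in the output while still rejecting short strings such as ZIP codes.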
Social Media Profile Links
Extracted from all pages for these 7 platforms:
- LinkedIn: company pages and personal profiles
- Facebook: business pages
- Twitter / X: brand accounts
- Instagram: business profiles
- YouTube: channels
- TikTok: brand accounts
- GitHub: organization and user profiles
Physical Address
- Extracted via Schema.org `PostalAddress` microdata (highest accuracy)
- Falls back to CSS class/ID patterns like `.address`, `.location`
Company Name
- Extracted from `og:site_name` meta tag (most accurate)
- Falls back to `<title>` tag with common suffixes stripped
- Falls back to domain name if nothing else is found
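The three-step company name fallback can be sketched as below. The regular expressions, the separator list, and the `company_name` function are illustrative assumptions; in particular, this simplified pattern only matches `<meta>` tags where `property` comes before `content`.

```python
import re
from urllib.parse import urlparse

# Simplified patterns (assumptions for this sketch).
OG_SITE_NAME_RE = re.compile(
    r'<meta[^>]+property=["\']og:site_name["\'][^>]+content=["\']([^"\']+)["\']',
    re.IGNORECASE,
)
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)
TITLE_SEPARATORS = re.compile(r"\s+[|\-–:]\s+")  # e.g. "Acme | Home"


def company_name(html: str, url: str) -> str:
    """Try og:site_name, then the <title> prefix, then the domain name."""
    og = OG_SITE_NAME_RE.search(html)
    if og:
        return og.group(1).strip()
    title = TITLE_RE.search(html)
    if title and title.group(1).strip():
        # Keep only the part before the first separator.
        return TITLE_SEPARATORS.split(title.group(1).strip())[0]
    host = (urlparse(url).hostname or "").removeprefix("www.")
    return host.split(".")[0].title()
```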
How It Scans Each Website
This Website Email & Contact Extractor uses a smart multi-page crawl strategy to maximize contact data coverage:
Phase 1: Homepage Scan
The actor fetches the homepage and immediately extracts all emails, phones, social links, and address data. Most business websites list at least their social profiles and support email on the homepage.
Phase 2: Contact Page Discovery
Using keyword matching on link text and href attributes, the actor finds internal links to contact-related pages such as: contact, about, team, reach, connect, get-in-touch, support, imprint, impressum, legal.
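As a rough sketch of this discovery step, the snippet below collects anchor tags with the standard library's `HTMLParser` and keeps same-site links whose href or text contains one of the keywords listed above. The class and function names, and the plain substring matching, are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

CONTACT_KEYWORDS = ("contact", "about", "team", "reach", "connect",
                    "get-in-touch", "support", "imprint", "impressum", "legal")


class _LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from anchor tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_contact_pages(html: str, base_url: str, limit: int = 4) -> list[str]:
    """Return up to `limit` same-site URLs that look like contact pages."""
    collector = _LinkCollector()
    collector.feed(html)
    base_host = urlparse(base_url).hostname
    pages: list[str] = []
    for href, text in collector.links:
        url = urljoin(base_url, href).split("#")[0]
        haystack = f"{href} {text}".lower()
        same_site = urlparse(url).hostname == base_host
        if same_site and url != base_url and any(k in haystack for k in CONTACT_KEYWORDS):
            if url not in pages:
                pages.append(url)
    return pages[:limit]
```

The default `limit` of 4 mirrors a `pages_to_scan` of 5 (homepage plus four sub-pages). Substring matching is deliberately loose, so a real implementation would probably need stricter rules to avoid hits such as "steam" for "team".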
Phase 3: Sub-Page Scanning
The actor visits up to `pages_to_scan - 1` discovered contact pages (default: 4 additional pages after the homepage). Contact pages almost always contain the most complete and accurate contact data.
Phase 4: Deduplication & Assembly
All extracted emails, phones, socials, and addresses are deduplicated across all scanned pages and assembled into one clean record per domain. A status flag (Verified, Partial, No Data) is assigned based on completeness.
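The assembly step could be sketched like this, assuming each scanned page yields a small dict with the same keys as the final record. The `merge_pages` name and the choice to lower-case emails before comparing them are assumptions; the caps of 20 emails and 10 phone numbers come from the output field reference.

```python
def merge_pages(pages: list[dict]) -> dict:
    """Merge per-page results into one record with order-preserving dedup."""
    emails: list[str] = []
    phones: list[str] = []
    socials: dict[str, str] = {}
    address = ""
    for page in pages:
        for email in page.get("emails", []):
            if email.lower() not in emails:  # assumption: compare case-insensitively
                emails.append(email.lower())
        for phone in page.get("phone_numbers", []):
            if phone not in phones:
                phones.append(phone)
        for platform, url in page.get("social_links", {}).items():
            socials.setdefault(platform, url)  # first link per platform wins
        address = address or page.get("address", "")
    return {
        "emails": emails[:20],
        "phone_numbers": phones[:10],
        "social_links": socials,
        "address": address,
        "pages_scanned": len(pages),
    }
```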
Smart Filtering
- Email false positives removed: CDN domains, asset domains, schema.org, no-reply addresses, and common test/example addresses are all excluded automatically
- Phone false positives removed: Strings shorter than 7 digits or longer than 15 digits are rejected
- Social link false positives removed: Share buttons, login URLs, dialog intents, and plugin embeds are all excluded
Output Fields (Full Reference)
Each record produced by this Website Email & Contact Extractor contains:
| Field | Type | Description | Example |
|---|---|---|---|
| `domain` | string | Website domain | `"www.example.com"` |
| `company_name` | string | Extracted company name | `"Acme Corporation"` |
| `emails` | array | All unique emails found (max 20) | `["info@example.com", "sales@example.com"]` |
| `phone_numbers` | array | All unique phone numbers found (max 10) | `["+1 (800) 555-0100", "+44 20 7946 0958"]` |
| `social_links` | object | Social media profile URLs by platform | `{"linkedin": "https://linkedin.com/company/...", "twitter": "https://x.com/..."}` |
| `address` | string | Physical address (if found) | `"123 Main St, New York, NY 10001, US"` |
| `source_url` | string | The input URL that was scraped | `"https://www.example.com"` |
| `pages_scanned` | integer | Number of pages crawled for this domain | `4` |
| `status` | string | Data completeness status | `"Verified"`, `"Partial"`, `"No Data"` |
| `extracted_at` | string | ISO timestamp of extraction | `"2024-11-01T10:30:00Z"` |
Social Links Object Structure

    {
        "linkedin": "https://www.linkedin.com/company/example-corp",
        "facebook": "https://www.facebook.com/ExampleCorp",
        "twitter": "https://x.com/ExampleCorp",
        "instagram": "https://www.instagram.com/examplecorp",
        "youtube": "https://www.youtube.com/@ExampleCorp",
        "tiktok": "https://www.tiktok.com/@examplecorp",
        "github": "https://github.com/example-corp"
    }

Only platforms where a profile link is found are included. Missing platforms are simply absent from the object.
Input Parameters

    {
        "target_urls": [
            "https://www.hubspot.com",
            "https://www.salesforce.com",
            "https://www.mailchimp.com"
        ],
        "target_url": "",
        "keyword": "",
        "pages_to_scan": 5,
        "max_results": 50,
        "proxyConfiguration": {
            "useApifyProxy": true,
            "apifyProxyGroups": ["RESIDENTIAL"]
        }
    }

| Parameter | Type | Default | Description |
|---|---|---|---|
| `target_urls` | array or string | `[]` | List of website URLs to extract contact data from. One URL per item or newline-separated string. |
| `target_url` | string | `""` | Single website URL shortcut, automatically added to `target_urls` |
| `keyword` | string | `""` | Optional filter: only return results where this keyword appears in domain, company name, email, or address |
| `pages_to_scan` | integer | `5` | Number of pages to scan per website (1 = homepage only; 5 = homepage + 4 contact sub-pages) |
| `max_results` | integer | `50` | Maximum number of websites to process in one run |
| `proxyConfiguration` | object | Off | Apify proxy config, recommended for large-scale extraction |
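To make the relationship between `target_urls`, `target_url`, and `max_results` concrete, here is a small sketch of how such input could be normalized into one URL list. The `normalize_targets` helper is hypothetical, and defaulting bare domains to `https://` is an assumption rather than documented behavior.

```python
def normalize_targets(target_urls=None, target_url: str = "", max_results: int = 50) -> list[str]:
    """Combine both URL inputs into one deduplicated list capped at max_results."""
    if isinstance(target_urls, str):
        candidates = target_urls.splitlines()  # newline-separated string form
    else:
        candidates = list(target_urls or [])
    if target_url:
        candidates.append(target_url)  # single-URL shortcut joins the list
    urls: list[str] = []
    for raw in candidates:
        url = raw.strip()
        if not url:
            continue
        if not url.startswith(("http://", "https://")):
            url = "https://" + url  # assumption: bare domains default to HTTPS
        if url not in urls:
            urls.append(url)
    return urls[:max_results]
```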
Example Inputs & Outputs
Example 1: Extract Contact from a Single Website
Input:

    {
        "target_url": "https://www.hubspot.com",
        "pages_to_scan": 5
    }

Output:

    {
        "domain": "www.hubspot.com",
        "company_name": "HubSpot",
        "emails": [
            "press@hubspot.com",
            "legal@hubspot.com",
            "privacy@hubspot.com"
        ],
        "phone_numbers": ["+1 (888) 482-7768"],
        "social_links": {
            "linkedin": "https://www.linkedin.com/company/hubspot",
            "facebook": "https://www.facebook.com/hubspot",
            "twitter": "https://x.com/HubSpot",
            "instagram": "https://www.instagram.com/hubspot",
            "youtube": "https://www.youtube.com/@HubSpot"
        },
        "address": "25 First Street, 2nd Floor, Cambridge, MA 02141, United States",
        "source_url": "https://www.hubspot.com",
        "pages_scanned": 4,
        "status": "Verified",
        "extracted_at": "2024-11-01T10:30:00Z"
    }

Example 2: Bulk Lead Generation β B2B List
Input:

    {
        "target_urls": [
            "https://www.company1.com",
            "https://www.company2.io",
            "https://www.agency3.co.uk",
            "https://www.startup4.com",
            "https://www.firm5.de"
        ],
        "pages_to_scan": 4,
        "max_results": 5
    }

Output: 5 records, one per domain, each containing deduplicated emails, phones, social links, addresses, company names, and status codes. Ready for direct CRM import.
Example 3: Keyword-Filtered Extraction
Input:

    {
        "target_urls": [
            "https://www.agency1.com",
            "https://www.agency2.com",
            "https://www.tech3.com"
        ],
        "keyword": "marketing",
        "pages_to_scan": 3
    }

Behavior: Only returns records where the word "marketing" appears in the domain name, company name, email addresses, or physical address. Websites that do not match are skipped with a log message.
Example 4: Homepage-Only Quick Scan
Input:

    {
        "target_urls": [
            "https://site1.com",
            "https://site2.com",
            "https://site3.com"
        ],
        "pages_to_scan": 1
    }

Use this for: Large lists where speed matters more than completeness. Setting pages_to_scan: 1 only scans the homepage and skips contact/about page discovery entirely.
Keyword Filtering
The `keyword` parameter lets you filter results at extraction time, so you do not need to filter the dataset manually after the run.
When keyword is set, the actor checks whether the keyword appears (case-insensitive) in any of these fields for each record:
- `domain`
- `company_name`
- `emails` (joined as a string)
- `address`
If the keyword is not found in any of these fields, the record is skipped and not included in the output.
Example use cases for keyword filtering:
- `keyword: "sales"` → only return companies with "sales" in their domain or email
- `keyword: "london"` → only return companies with London in their address
- `keyword: "@agency.com"` → only return records with a specific domain email format
- `keyword: "recruiting"` → filter for HR-related websites
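The matching rule described in this section amounts to a case-insensitive substring check over four fields. A minimal sketch, with `matches_keyword` as a hypothetical helper name:

```python
def matches_keyword(record: dict, keyword: str) -> bool:
    """Return True if keyword appears in domain, company name, emails, or address."""
    if not keyword:
        return True  # an empty keyword disables filtering
    needle = keyword.lower()
    haystacks = [
        record.get("domain", ""),
        record.get("company_name", ""),
        " ".join(record.get("emails", [])),
        record.get("address", ""),
    ]
    return any(needle in (value or "").lower() for value in haystacks)
```

The same function can be applied to an exported dataset if you prefer to run without a keyword and filter afterwards.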
Social Media Link Extraction
This Website Email & Contact Extractor searches the full raw HTML of every scanned page for social media profile URLs matching patterns for 7 platforms. Extracted links are cleaned and validated before inclusion:
- Tracking parameters removed from social URLs
- Share buttons filtered out: `/share`, `/sharer`, `/intent/tweet`, `/dialog/feed` paths are all excluded
- Login and signup pages excluded: `/login`, `/signup` paths filtered
- Only one link per platform: the first valid profile URL found per platform is returned
- Trailing slashes normalized for consistent formatting
Social links are returned as a flat object with platform names as keys, making them easy to map into CRM fields or LinkedIn enrichment workflows.
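The cleaning rules above can be illustrated with the sketch below. The host table, the URL pattern, and the `extract_social_links` name are assumptions; the blocked path fragments are the ones listed in this section.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Host-to-platform table (an assumption for this sketch).
PLATFORM_HOSTS = {
    "linkedin": ("linkedin.com",),
    "facebook": ("facebook.com",),
    "twitter": ("twitter.com", "x.com"),
    "instagram": ("instagram.com",),
    "youtube": ("youtube.com",),
    "tiktok": ("tiktok.com",),
    "github": ("github.com",),
}
BLOCKED_PATH_PARTS = ("/share", "/sharer", "/intent/tweet", "/dialog/feed", "/login", "/signup")
URL_RE = re.compile(r'https?://[^\s"\'<>]+')


def extract_social_links(html: str) -> dict[str, str]:
    """Return the first clean profile URL per platform found in raw HTML."""
    links: dict[str, str] = {}
    for raw in URL_RE.findall(html):
        parts = urlsplit(raw)
        host = (parts.hostname or "").removeprefix("www.")
        path = parts.path.rstrip("/")  # normalize trailing slashes
        if not path or any(blocked in path.lower() for blocked in BLOCKED_PATH_PARTS):
            continue
        for platform, hosts in PLATFORM_HOSTS.items():
            if host in hosts and platform not in links:
                # Rebuild without query string or fragment to drop tracking params.
                links[platform] = urlunsplit((parts.scheme, parts.netloc, path, "", ""))
    return links
```

Note that plain substring blocking would also reject a legitimate profile whose path happens to start with "/share", so a production filter would likely be more precise.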
Address Extraction
Physical address extraction uses a two-tier approach:
Tier 1: Schema.org Structured Data (Highest Accuracy)
Websites using PostalAddress or LocalBusiness microdata markup have structured address fields that the actor reads directly: streetAddress, addressLocality, addressRegion, postalCode, addressCountry. This is common on e-commerce sites, service businesses, and sites built with modern CMS platforms.
Tier 2: CSS Pattern Matching (Fallback)
For sites without Schema.org markup, the actor searches for common address containers using selectors like `.address`, `.location`, `[class*='address']`, `[id*='address']`. Text is extracted and length-validated (10–200 characters) to filter out noise.
Address extraction works best on business websites that display their address in a footer, contact page, or sidebar. It may not work for websites that intentionally hide or obfuscate their physical location.
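For the Tier 1 path, reading `PostalAddress` microdata can be sketched with the standard library's `HTMLParser`. The class and function names are hypothetical, and joining the fields with commas is an assumption about the output format; Tier 2 is not shown because CSS selectors need a third-party parser.

```python
from html.parser import HTMLParser

ADDRESS_FIELDS = ("streetAddress", "addressLocality", "addressRegion",
                  "postalCode", "addressCountry")


class _MicrodataAddress(HTMLParser):
    """Capture the first value of each PostalAddress itemprop."""

    def __init__(self) -> None:
        super().__init__()
        self.fields = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if prop in ADDRESS_FIELDS and prop not in self.fields:
            if attrs.get("content"):  # e.g. <meta itemprop="..." content="...">
                self.fields[prop] = attrs["content"].strip()
            else:
                self._current = prop  # value is the element's text

    def handle_data(self, data):
        if self._current and data.strip():
            self.fields[self._current] = data.strip()
            self._current = None


def schema_org_address(html: str) -> str:
    """Join the address fields that were found, in a fixed order."""
    parser = _MicrodataAddress()
    parser.feed(html)
    return ", ".join(parser.fields[f] for f in ADDRESS_FIELDS if f in parser.fields)
```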
Status Codes Explained
Every record from this Website Email & Contact Extractor includes a status field:
| Status | Meaning |
|---|---|
| `"Verified"` | At least one email and at least one phone number were found |
| `"Partial"` | Some contact data was found, but not both an email and a phone number (for example emails only, phones only, or socials only) |
| `"No Data"` | No emails, phones, or social links were found on any scanned page |
Use the status field to prioritize outreach: "Verified" records are the highest-confidence leads. "Partial" records may still be useful for social outreach even without a direct email.
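Read literally, the status table maps to a few lines of logic. The sketch below is an interpretation of that table, not the actor's source, and `completeness_status` is a hypothetical name.

```python
def completeness_status(emails: list, phone_numbers: list, social_links: dict) -> str:
    """Map extracted contact data to the three documented status values."""
    if emails and phone_numbers:
        return "Verified"
    if emails or phone_numbers or social_links:
        return "Partial"
    return "No Data"
```

Because the flag only reflects which fields were found, "Verified" says nothing about whether an email address is deliverable.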
Proxy Configuration
Recommended Setup

    {
        "proxyConfiguration": {
            "useApifyProxy": true,
            "apifyProxyGroups": ["RESIDENTIAL"]
        }
    }

When to Use Proxy
- Running bulk extraction across 50+ websites
- Scraping websites that block known datacenter IPs (media sites, legal firms, financial services)
- Scraping websites in regions with geo-restricted access
- Any production-scale lead generation run
When Proxy Is Optional
- Small runs of under 10 websites
- Scraping small business or startup websites
- Quick one-off lookups on a single domain
The actor uses curl_cffi with Chrome 110 browser impersonation, which already bypasses most bot detection for small volumes without requiring a proxy.
Performance & Rate Limits
Speed Benchmarks
| Mode | Websites | Pages/Site | Estimated Time |
|---|---|---|---|
| Single site, full scan | 1 | 5 | ~15–30 seconds |
| Small batch | 10 | 5 | ~2–4 minutes |
| Medium batch | 50 | 3 | ~8–12 minutes |
| Large batch | 200 | 2 | ~25–35 minutes |
| Homepage only, large batch | 100 | 1 | ~8–12 minutes |
Reliability Features
- 1.5–3 second delay between each website request to avoid rate limiting
- 0.5–1.5 second delay between sub-pages within the same domain
- Chrome 110 browser impersonation via `curl_cffi` to minimize bot detection
- Graceful failure handling: failed websites are logged and skipped, never crashing the run
- Smart false-positive filtering on both emails and phone numbers for clean output
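The randomized delay and skip-on-failure behavior can be sketched as a simple loop. Here `crawl_all` and its `fetch` callback are hypothetical; the real actor fetches pages with `curl_cffi`, which is left out so the example stays dependency-free.

```python
import random
import time


def crawl_all(urls, fetch, sleep=time.sleep):
    """Fetch each URL in turn, pausing between sites and skipping failures."""
    results, failures = [], []
    for index, url in enumerate(urls):
        if index:
            sleep(random.uniform(1.5, 3.0))  # randomized pause between websites
        try:
            results.append(fetch(url))
        except Exception as exc:  # record and skip; one bad site must not stop the run
            failures.append((url, str(exc)))
    return results, failures
```

Passing `sleep` as a parameter makes the loop easy to test without real waiting.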
FAQ
Q: What types of email addresses are extracted?
A: All publicly visible emails β from mailto: links (highest accuracy), from page text (regex), and from raw HTML. False positives like CDN domains, asset file paths, tracking pixel emails, example.com addresses, and no-reply addresses are automatically filtered out.
Q: Can this extract emails hidden behind JavaScript or contact forms?
A: No. This actor extracts emails that are present in the HTML source code. Emails that only appear after JavaScript execution or that are submitted via contact forms are not accessible without a full browser renderer.
Q: Why are some phone numbers showing unusual formats?
A: Phone numbers are extracted as-is from page text to preserve their original formatting (which varies by country). You can normalize them post-extraction using a phone parsing library like libphonenumber.
Q: What happens if a website has no contact page?
A: The actor scans the homepage only. If no contact sub-pages are discovered via link analysis, only homepage data is extracted. The record is returned with pages_scanned: 1.
Q: Can I use this for Gmail, Outlook, or other webmail login pages?
A: No. This Website Email & Contact Extractor only works with public-facing business websites. Login-gated pages, authenticated pages, and private intranets are not accessible.
Q: Is extracted data deduplicated?
A: Yes. Emails and phone numbers are deduplicated across all pages scanned for a given domain, so you will never see the same email twice in a single record.
Q: How accurate is the company name extraction?
A: Very high for websites using the og:site_name meta tag (most modern sites). For older sites, it falls back to the <title> tag with common separators stripped. As a last resort, it uses the domain name formatted as a title.
Q: Can I process 1,000 websites in one run?
A: Yes. Set `max_results: 1000` and provide 1,000 URLs. For very large runs, residential proxy is strongly recommended. Estimated time: 3–5 hours depending on the `pages_to_scan` setting.
Q: Does this work on non-English websites?
A: Yes. Email and phone extraction uses language-agnostic regex patterns. Contact page discovery includes German keywords (kontakt, impressum) in addition to English ones. Address extraction via Schema.org works regardless of the page language.
Q: What is the pages_to_scan parameter and what should I set it to?
A: It controls how many pages are scanned per website. 1 = homepage only (fastest). 5 = homepage + up to 4 sub-pages (most complete). For lead generation, 3–5 is recommended. For large bulk runs where speed matters, 1–2 is better.
Changelog
v1.0.0 (Current)
- ✅ Email extraction from `mailto:` links, page text, and raw HTML
- ✅ Phone extraction from `tel:` links and international format text patterns
- ✅ Social media link extraction for 7 platforms (LinkedIn, Facebook, Twitter/X, Instagram, YouTube, TikTok, GitHub)
- ✅ Physical address extraction via Schema.org and CSS fallback
- ✅ Company name extraction via `og:site_name`, `<title>`, and domain fallback
- ✅ Multi-page scan per domain with automatic contact page discovery
- ✅ Smart false-positive filtering for emails and phones
- ✅ Social link cleaning (removes share buttons, login redirects, tracking params)
- ✅ Keyword filtering support
- ✅ `Verified`/`Partial`/`No Data` status scoring per record
- ✅ Residential proxy support, plus `curl_cffi` Chrome 110 impersonation
- ✅ 1.5–3 second random delay between requests for safe rate limiting
- ✅ Bulk processing up to 1,000 websites per run
Legal & Terms of Use
This Website Email & Contact Extractor collects contact data that is publicly visible on business websites: the same information a person would see when visiting the site in a browser.
Please follow these guidelines:
- Only extract contact data from websites you have a legitimate reason to access
- Use extracted emails for opted-in outreach only where permitted by applicable law
- Comply with CAN-SPAM, GDPR, CASL, and other applicable email and data protection regulations in your jurisdiction
- Do not use extracted data to send spam, unsolicited bulk emails, or harassing messages
- Respect the `robots.txt` file and Terms of Service of each website you scrape
- Do not use this tool for unauthorized competitive intelligence or data resale without consent
GDPR Note: In the EU, publicly listed business contact information (company emails, phone numbers) may be processed for legitimate business purposes. Personal email addresses require a valid legal basis under GDPR Article 6. Always consult a legal professional for your specific use case.
Support & Feedback
- Bug report? Open a GitHub issue or contact via the Apify actor page
- Feature request? Drop a suggestion in the Apify Community forum
- Works great? Please leave a ⭐ review on the Apify Store. It helps others find this Website Email & Contact Extractor!
Built with ❤️ on Apify · Website Email & Contact Extractor for Lead Generation
Extract emails, phones, social links & addresses from any website: fast, clean, and at scale