Website Email Scraper - All Contacts

Pricing

from $2.00 / 1,000 results

Extract emails and other contact details from websites. This Apify actor crawls pages to discover emails, phone numbers, social profiles, and addresses, with configurable crawl depth, proxy support, and domain filtering. Useful for content research and lead generation.

Rating: 4.0 (3)

Developer: The Netaji (Maintained by Community)

Actor stats

  • Bookmarked: 14
  • Total users: 1.2K
  • Monthly active users: 76
  • Issues response: 1.6 hours
  • Last modified: 11 days ago

Website Email & Contact Extractor v2.1

🔍 Overview

Website Email & Contact Extractor is an Apify actor that crawls websites and extracts contact information in a clean, consistent format. It finds emails, phone numbers, social media profiles, and physical addresses — perfect for lead generation, sales outreach, local SEO, and market research.

✨ Key Features

  • Contact-First Output: Emails, phones, social profiles, and addresses in one consistent schema
  • Social Platform Detection: Automatically identifies LinkedIn, Instagram, Twitter/X, TikTok, YouTube, Facebook, GitHub, Telegram, WhatsApp, Pinterest, Snapchat, and Reddit
  • Cloudflare Email Decoding: Decodes email addresses hidden by Cloudflare's email obfuscation (data-cfemail)
  • Text + DOM Extraction: Finds contacts in visible text, mailto: / tel: links, structured markup, and social links
  • Adaptive Stealth Browser: With useStealth enabled, auto-escalates to a headless browser when pages block normal requests
  • Domain Filtering: Stay on the same domain or crawl freely
  • Consistent Schema: In flat mode (groupByPage: false), every item has the same 10 fields, with null for absent values
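For reference, Cloudflare's email obfuscation stores an address in a data-cfemail attribute as a hex string. The first byte of that string is an XOR key for the remaining bytes. The Python sketch below illustrates that decoding step. It is not the actor's own code and assumes an ASCII address. The encode_cfemail helper is hypothetical and exists only so the round trip can be checked.

```python
def decode_cfemail(encoded: str) -> str:
    """Decode a Cloudflare data-cfemail hex string (ASCII addresses assumed)."""
    key = int(encoded[:2], 16)  # first byte is the XOR key
    chars = [
        chr(int(encoded[i:i + 2], 16) ^ key)  # XOR each following byte with the key
        for i in range(2, len(encoded), 2)
    ]
    return "".join(chars)


def encode_cfemail(email: str, key: int = 0x42) -> str:
    """Hypothetical inverse of decode_cfemail, used here only to build test input."""
    return f"{key:02x}" + "".join(f"{ord(c) ^ key:02x}" for c in email)


encoded = encode_cfemail("info@example.com", key=0x5A)
print(decode_cfemail(encoded))
```

Because XOR with the same key is its own inverse, decoding an encoded address always returns the original string.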

🎯 Use Cases

  • Lead Generation: Build lists of sales prospects from company websites
  • Sales Outreach: Extract decision-maker emails and LinkedIn profiles
  • Local SEO: Collect NAP (Name, Address, Phone) data
  • Market Research: Map social presence across competitor sites
  • Recruiting: Find contact details and social profiles of teams

🛠️ Input Parameters

{
  "startUrls": [{ "url": "https://example.com" }],
  "mediaType": "all",
  "maxCrawlDepth": 2,
  "maxConcurrency": 10,
  "maxRequestRetries": 3,
  "maxUrlsToCrawl": 100,
  "useProxy": {
    "useApifyProxy": false,
    "apifyProxyGroups": [],
    "apifyProxyCountry": ""
  }
}

Parameter Details

| Parameter | Type | Description |
|---|---|---|
| startUrls | Array | List of URLs where the crawler will begin |
| mediaType | String | Contact type to extract: all, contact, email, phone, social, or address |
| maxCrawlDepth | Number | How many links deep the crawler will go |
| maxConcurrency | Number | Maximum number of parallel requests |
| maxRequestRetries | Number | Number of retry attempts for failed requests |
| maxUrlsToCrawl | Number | Maximum number of pages to process |
| useProxy | Object | Apify proxy configuration |
| useStealth | Boolean | Auto-escalate to a stealth browser when blocked; auto-enables a proxy if none is set |
| solveCloudflare | Boolean | Automatically solve Cloudflare challenges |
| includeContactText | Boolean | Scan visible page text for contacts not wrapped in links |
| groupByPage | Boolean | Combine all contacts from one page into a single dataset item (default: true) |
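The maxCrawlDepth and maxUrlsToCrawl limits, combined with same-domain filtering, behave roughly like a breadth-first crawl. The sketch below illustrates this under stated assumptions and is not the actor's implementation:

- Start URLs are treated as depth 0.
- "Same domain" means the same host as one of the start URLs.
- get_links is a hypothetical callback that stands in for fetching and parsing a page.

```python
from collections import deque
from urllib.parse import urlparse


def plan_crawl(start_urls, get_links, max_crawl_depth=2,
               max_urls_to_crawl=100, same_domain=True):
    """Return URLs in the order a breadth-first crawl would visit them.

    get_links(url) is a hypothetical callback returning the absolute URLs
    linked from a page; a real crawler would fetch and parse the page.
    """
    starts = list(dict.fromkeys(start_urls))        # drop duplicate start URLs
    allowed_hosts = {urlparse(u).netloc for u in starts}
    queue = deque((url, 0) for url in starts)        # start URLs are depth 0
    seen = set(starts)
    visited = []
    while queue and len(visited) < max_urls_to_crawl:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_crawl_depth:
            continue                                 # depth limit: do not follow links
        for link in get_links(url):
            if link in seen:
                continue
            if same_domain and urlparse(link).netloc not in allowed_hosts:
                continue                             # domain filter
            seen.add(link)
            queue.append((link, depth + 1))
    return visited


site = {
    "https://a.com/": ["https://a.com/contact", "https://a.com/about", "https://other.com/"],
    "https://a.com/contact": ["https://a.com/team"],
}
print(plan_crawl(["https://a.com/"], lambda u: site.get(u, []), max_crawl_depth=1))
```

Lowering maxCrawlDepth trims the crawl from the bottom of the link tree. Lowering maxUrlsToCrawl stops the crawl after a fixed number of pages, whatever their depth.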

📊 Output Format

By default (groupByPage: true) the actor outputs one item per crawled page, combining all contacts found on that page. Set groupByPage: false to emit one flat item per contact instead.

Grouped output (default)

{
  "sourceUrl": "https://apify.com/",
  "pageTitle": "Apify: Full-stack web scraping and data extraction platform",
  "emails": ["hello@apify.com"],
  "phones": ["+1-234-567-8900"],
  "socials": {
    "github": [
      { "url": "https://github.com/apify", "handle": "apify" }
    ],
    "twitter": [
      { "url": "https://twitter.com/apify", "handle": "apify" }
    ]
  },
  "addresses": ["123 Main St, Los Angeles, CA"],
  "foundAt": "2026-06-15T15:40:51.184Z"
}

Flat output (groupByPage: false)

Every item uses the same 10-field schema:

{
  "type": "contact",
  "contactType": "email",
  "value": "info@example.com",
  "url": null,
  "socialPlatform": null,
  "socialHandle": null,
  "sourceUrl": "https://example.com/contact",
  "pageTitle": "Contact Us",
  "foundBy": "mailto",
  "foundAt": "2026-06-15T04:01:58.105Z"
}

Output fields (flat mode)

| Field | Description |
|---|---|
| type | Always "contact" |
| contactType | email, phone, social, or address |
| value | The extracted contact value |
| url | Web URL for social profiles; null for other types |
| socialPlatform | Platform name for social items; null otherwise |
| socialHandle | Username/handle for social items; null otherwise |
| sourceUrl | Page where the contact was found |
| pageTitle | Title of the source page |
| foundBy | Detection method: dom, mailto, tel, text-scan, or cfemail |
| foundAt | ISO-8601 timestamp of when the contact was found |
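The url, socialPlatform, and socialHandle fields imply a mapping from a profile URL to a platform name and a handle. Below is a minimal sketch of such a mapping. The host table and the set of path prefixes that come before a handle (for example LinkedIn's /in/ and /company/) are hypothetical, and the actor's actual detection rules are not documented here.

```python
from urllib.parse import urlparse

# Hypothetical host -> platform table covering the platforms listed above.
PLATFORM_HOSTS = {
    "linkedin.com": "linkedin", "instagram.com": "instagram",
    "twitter.com": "twitter", "x.com": "twitter",
    "tiktok.com": "tiktok", "youtube.com": "youtube",
    "facebook.com": "facebook", "github.com": "github",
    "t.me": "telegram", "wa.me": "whatsapp",
    "pinterest.com": "pinterest", "snapchat.com": "snapchat",
    "reddit.com": "reddit",
}
# Hypothetical path segments that precede the real handle on some platforms.
PREFIX_SEGMENTS = {"in", "company", "c", "channel", "user", "u", "r", "add"}


def social_item(url):
    """Return (socialPlatform, socialHandle) for a profile URL, or None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    platform = PLATFORM_HOSTS.get(host)
    if platform is None:
        return None                      # not a recognized social platform
    segments = [s for s in parsed.path.split("/") if s]
    while segments and segments[0].lower() in PREFIX_SEGMENTS:
        segments.pop(0)                  # skip prefixes such as /in/ or /company/
    handle = segments[0].lstrip("@") if segments else None
    return platform, handle


print(social_item("https://www.instagram.com/example"))
```

For the instagram URL in the social profile example, this yields the platform "instagram" and the handle "example".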

Flat output examples

Email from a mailto: link

{
  "type": "contact",
  "contactType": "email",
  "value": "info@eversquaremedical.ca",
  "url": null,
  "socialPlatform": null,
  "socialHandle": null,
  "sourceUrl": "https://www.eversquaremedical.ca/",
  "pageTitle": "Ever Square Medical",
  "foundBy": "mailto",
  "foundAt": "2026-06-15T04:01:58.105Z"
}

Phone from a tel: link

{
  "type": "contact",
  "contactType": "phone",
  "value": "310-929-6336",
  "url": null,
  "socialPlatform": null,
  "socialHandle": null,
  "sourceUrl": "https://www.conciergehealthcarepartnersinc.com/",
  "pageTitle": "Concierge Healthcare Partners",
  "foundBy": "tel",
  "foundAt": "2026-06-15T04:01:58.084Z"
}

Social profile

{
  "type": "contact",
  "contactType": "social",
  "value": "https://www.instagram.com/example",
  "url": "https://www.instagram.com/example",
  "socialPlatform": "instagram",
  "socialHandle": "example",
  "sourceUrl": "https://example.com/about",
  "pageTitle": "About Us",
  "foundBy": "dom",
  "foundAt": "2026-06-15T04:01:58.200Z"
}
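Both output modes carry sourceUrl and pageTitle, so flat items can be regrouped client-side into something close to the grouped shape. This sketch makes two assumptions that are not documented behavior: contact types with no results become empty lists, and foundAt takes the earliest timestamp seen on the page.

```python
def group_by_page(flat_items):
    """Regroup flat contact items (groupByPage: false) into one dict per page."""
    list_keys = {"email": "emails", "phone": "phones", "address": "addresses"}
    pages = {}  # sourceUrl -> grouped item; dicts keep first-seen order
    for item in flat_items:
        page = pages.setdefault(item["sourceUrl"], {
            "sourceUrl": item["sourceUrl"],
            "pageTitle": item["pageTitle"],
            "emails": [], "phones": [], "socials": {}, "addresses": [],
            "foundAt": item["foundAt"],
        })
        if item["contactType"] == "social":
            entry = {"url": item["url"], "handle": item["socialHandle"]}
            bucket = page["socials"].setdefault(item["socialPlatform"], [])
            if entry not in bucket:
                bucket.append(entry)      # skip duplicate profiles on the same page
        else:
            values = page[list_keys[item["contactType"]]]
            if item["value"] not in values:
                values.append(item["value"])
        # Same-format ISO-8601 UTC strings compare correctly as plain strings.
        page["foundAt"] = min(page["foundAt"], item["foundAt"])
    return list(pages.values())
```

Deduplication here is exact-match only. A real pipeline might also normalize case for emails or formatting for phone numbers.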

💡 Best Practices

  • Start Small: Begin with a low maxUrlsToCrawl value to test results
  • Use Stealth for Protected Sites: Enable useStealth and solveCloudflare for Cloudflare-protected sites. Stealth auto-enables an Apify datacenter proxy if you do not provide one. The actor rotates datacenter IPs on blocks and only escalates to expensive residential proxies after repeated consecutive blocks on the same domain.
  • Optimize Depth: Most contact info is found within 1–2 levels of crawl depth
  • Target Specific Contact Types: Use mediaType to focus on emails, phones, or socials
  • Respect Websites: Use reasonable maxConcurrency values to avoid overloading sites
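The escalation policy described under "Use Stealth for Protected Sites" rotates datacenter IPs on blocks and moves a domain to residential proxies only after repeated consecutive blocks. It can be modeled as a per-domain counter. The ESCALATE_AFTER threshold and the rule that a domain stays residential once escalated are hypothetical; the actor does not publish these details.

```python
from collections import defaultdict

ESCALATE_AFTER = 3  # hypothetical threshold; the actor's real value is not documented


class ProxyEscalator:
    """Track consecutive blocks per domain and choose a proxy tier."""

    def __init__(self, escalate_after=ESCALATE_AFTER):
        self.escalate_after = escalate_after
        self.consecutive_blocks = defaultdict(int)
        self.residential_domains = set()  # assumption: escalation is permanent per domain

    def tier(self, domain):
        return "residential" if domain in self.residential_domains else "datacenter"

    def record(self, domain, blocked):
        """Record one request outcome and return the tier for the next request.

        Below the threshold, a blocked request would be retried on a fresh
        datacenter IP (session rotation itself is not modeled here).
        """
        if not blocked:
            self.consecutive_blocks[domain] = 0  # a success breaks the streak
            return self.tier(domain)
        self.consecutive_blocks[domain] += 1
        if self.consecutive_blocks[domain] >= self.escalate_after:
            self.residential_domains.add(domain)
        return self.tier(domain)
```

Counting only consecutive blocks means that an occasional block on an otherwise healthy domain never triggers the more expensive residential tier.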

📚 Examples

Extract emails only

{
  "startUrls": [{ "url": "https://company.com" }],
  "mediaType": "email",
  "maxCrawlDepth": 2,
  "maxUrlsToCrawl": 50
}

Extract all contact types

{
  "startUrls": [{ "url": "https://company.com" }],
  "mediaType": "all",
  "maxCrawlDepth": 2,
  "maxUrlsToCrawl": 100,
  "includeContactText": true
}

Collect social media profiles

{
  "startUrls": [{ "url": "https://company.com" }],
  "mediaType": "social",
  "maxCrawlDepth": 1,
  "maxUrlsToCrawl": 50
}

⚙️ Technical Implementation

The actor uses multiple extraction strategies:

  1. DOM Selectors: mailto:, tel:, social links, and structured markup
  2. Text Scanning: Regex over visible page text
  3. Cloudflare Decode: Reverses data-cfemail obfuscation
  4. Adaptive Escalation: Rotates datacenter IPs on blocks; only falls back to residential stealth when a domain repeatedly fails with datacenter proxies
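Strategies 1 and 2 can be illustrated with Python's standard html.parser: first collect mailto: and tel: links, then regex-scan the page text for addresses that are not links. This is an illustration, not the actor's code. The EMAIL_RE pattern, the case-insensitive deduplication, and treating all non-script, non-style text as "visible" are simplifying assumptions. The foundBy labels reuse the values from the output schema.

```python
import re
from html.parser import HTMLParser
from urllib.parse import unquote

# Simplified email pattern (an assumption; real-world addresses can be more varied).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class ContactParser(HTMLParser):
    """Collect mailto:/tel: links and page text from one HTML document."""

    def __init__(self):
        super().__init__()
        self.found = []        # (contactType, value, foundBy) tuples
        self.text_parts = []
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().startswith("mailto:"):
            address = unquote(href[7:].split("?")[0])  # drop ?subject=... etc.
            self.found.append(("email", address, "mailto"))
        elif href.lower().startswith("tel:"):
            self.found.append(("phone", unquote(href[4:]), "tel"))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)


def extract_contacts(html):
    parser = ContactParser()
    parser.feed(html)
    parser.close()
    results = list(parser.found)
    seen = {value.lower() for kind, value, _ in results if kind == "email"}
    # Text scan: emails present in the text but not already found as links.
    for match in EMAIL_RE.findall(" ".join(parser.text_parts)):
        if match.lower() not in seen:
            seen.add(match.lower())
            results.append(("email", match, "text-scan"))
    return results
```

Running the link pass first means an address that appears both as a link and in the text is reported once, with the more specific mailto label.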

📈 Performance Considerations

  • Processing speed depends on website complexity and response times
  • Typical extraction rates: 5–10 pages per second without proxy, 2–5 pages per second with proxy
  • Memory usage scales with concurrency and page complexity
  • Whenever a proxy is in use (including the one stealth mode enables automatically), the actor starts with datacenter proxies and escalates to residential proxies only when necessary, keeping costs low for most contact-extraction tasks

🔗 Integration Ideas

  • Connect with Apify Storage for permanent dataset archiving
  • Combine with Google Sheets integration for easy team collaboration
  • Use with Zapier or Make to automate outreach workflows