Website Contact & Lead Extractor

Pricing: from $2.00 / 1,000 results

Extracts email addresses, phone numbers, physical addresses, and social media links (LinkedIn, Twitter/X, Facebook, Instagram, YouTube) from any list of websites. Features customizable crawl depth, per-domain page limits, and proxy configuration.

Rating: 5.0 (1)

Developer: Avira (maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 3 days ago

Website Contact Scraper Apify Actor

A production-ready website contact scraper that crawls websites recursively, extracts contact information, and outputs structured data. It is suited to B2B lead generation, CRM enrichment, and automated outreach campaigns.

This Actor extracts:

  • Email addresses (filtered to remove static assets like image/font files)
  • Phone numbers (standardized and filtered for false positives)
  • Social Media links (LinkedIn, Facebook, Instagram, Twitter/X, YouTube, TikTok, GitHub, Pinterest, and Medium profiles)
  • Physical Addresses (using heuristic text checks and <address> tag parsers)
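
The email filtering described in the list above can be illustrated with a minimal sketch. The Actor's actual regular expression and extension list are not documented here, so `EMAIL_PATTERN`, `ASSET_EXTENSIONS`, and `extractEmails` are hypothetical names, and lowercasing addresses before de-duplication is an assumption.

```typescript
// Hypothetical sketch: the Actor's real pattern and extension list may differ.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Extensions that commonly yield email-like false positives such as "logo@2x.png".
const ASSET_EXTENSIONS = new Set([
  "png", "jpg", "jpeg", "gif", "svg", "webp", "woff", "ttf", "eot",
]);

function extractEmails(text: string): string[] {
  const matches: string[] = text.match(EMAIL_PATTERN) ?? [];
  const unique = new Set<string>();
  for (const raw of matches) {
    // Assumption: addresses are compared case-insensitively.
    const email = raw.toLowerCase();
    const extension = email.slice(email.lastIndexOf(".") + 1);
    if (!ASSET_EXTENSIONS.has(extension)) {
      unique.add(email);
    }
  }
  return Array.from(unique);
}
```

With this sketch, a string such as "Write to Info@Example.com (logo@2x.png)" yields only info@example.com, because the second match ends in an image extension.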

Features

  • Extremely Fast: Built with Crawlee's CheerioCrawler to parse raw HTML directly without the overhead of a headless browser.
  • 🎯 Targeted Crawling: When crawlShortcutLinksOnly is enabled, focuses the crawl on contact, about, team, and help pages to save crawl budget.
  • ⚙️ Configurable Limits: Set max pages per domain and maximum depth to prevent runaway runs on large sites.
  • 🔄 Consolidated Output: Automatically aggregates and de-duplicates contact information collected across multiple pages of the same domain into a single clean JSON record.
  • 🛡️ Proxy Support: Fully integrated with Apify Proxy or custom proxies to bypass rate limits and geoblocks.
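
The consolidated output feature can be pictured as a merge step that groups per-page results by domain. The following is a rough sketch under stated assumptions, not the Actor's implementation: `PageContacts`, `DomainRecord`, `dedupe`, and `mergeByDomain` are hypothetical names, and treating "www." hosts as the same domain is an assumption.

```typescript
// Hypothetical shape: the Actor's internal per-page type is not documented.
interface PageContacts {
  url: string;
  emails: string[];
  phones: string[];
}

// A subset of the documented output fields, kept short for the sketch.
interface DomainRecord {
  domain: string;
  emails: string[];
  phones: string[];
  pagesCrawled: number;
  crawledUrls: string[];
}

function dedupe(values: string[]): string[] {
  return Array.from(new Set(values));
}

function mergeByDomain(pages: PageContacts[]): DomainRecord[] {
  const byDomain = new Map<string, DomainRecord>();
  for (const page of pages) {
    // Assumption: "www." variants belong to the same domain record.
    const domain = new URL(page.url).hostname.replace(/^www\./, "");
    const record: DomainRecord = byDomain.get(domain) ?? {
      domain,
      emails: [],
      phones: [],
      pagesCrawled: 0,
      crawledUrls: [],
    };
    record.emails = dedupe(record.emails.concat(page.emails));
    record.phones = dedupe(record.phones.concat(page.phones));
    record.crawledUrls.push(page.url);
    record.pagesCrawled = record.crawledUrls.length;
    byDomain.set(domain, record);
  }
  return Array.from(byDomain.values());
}
```

Keeping one record per domain in a map means each page is merged as it arrives, so memory use grows with the number of unique contacts rather than the number of pages.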

Input Parameters

The Actor accepts the following inputs (defined in .actor/input_schema.json):

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| startUrls | Array | Yes | | List of site URLs to scan (e.g. [{"url": "https://crawlee.dev"}]). |
| maxDepth | Integer | No | 2 | Maximum crawl depth. Depth 1 is start URLs only. |
| maxPagesPerDomain | Integer | No | 20 | Maximum pages to crawl per start URL domain to limit budget. |
| crawlShortcutLinksOnly | Boolean | No | true | When true, skips pages that do not match contact-related keywords in their path or title. |
| proxyConfiguration | Object | No | { "useApifyProxy": true } | Apify Proxy or custom proxies configuration. |
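
How the three crawl limits might interact can be sketched as a single decision function. This is an illustration under assumptions, not the Actor's code: `SHORTCUT_KEYWORDS` is a hypothetical list taken from the page types named under Features, `shouldCrawl` and `matchesShortcut` are hypothetical names, and the rule that start URLs are always crawled is assumed.

```typescript
// Hypothetical keyword list based on the page types named under Features.
const SHORTCUT_KEYWORDS = ["contact", "about", "team", "help"];

interface CrawlLimits {
  maxDepth: number;
  maxPagesPerDomain: number;
  crawlShortcutLinksOnly: boolean;
}

// True when the URL path or the page title contains a contact-related keyword.
function matchesShortcut(url: string, title: string): boolean {
  const haystack = (new URL(url).pathname + " " + title).toLowerCase();
  return SHORTCUT_KEYWORDS.some((keyword) => haystack.indexOf(keyword) !== -1);
}

function shouldCrawl(
  url: string,
  title: string,
  depth: number, // 1 for start URLs, as in the maxDepth description
  pagesSoFar: number, // pages already crawled for this domain
  limits: CrawlLimits,
): boolean {
  if (depth > limits.maxDepth) return false;
  if (pagesSoFar >= limits.maxPagesPerDomain) return false;
  // Assumption: start URLs are crawled regardless of the keyword filter.
  if (depth === 1) return true;
  return !limits.crawlShortcutLinksOnly || matchesShortcut(url, title);
}
```

Under the default limits, a depth-2 link to /contact-us would pass, while a depth-2 link to /docs/introduction would be skipped unless crawlShortcutLinksOnly is set to false.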

Example Input JSON

{
  "startUrls": [
    { "url": "https://crawlee.dev" },
    { "url": "https://apify.com" }
  ],
  "maxDepth": 2,
  "maxPagesPerDomain": 15,
  "crawlShortcutLinksOnly": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
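
For callers who build this input programmatically, the sketch below types the fields and fills in the documented defaults. `ActorInput` and `resolveInput` are hypothetical names, and the http/https check on start URLs is an assumption rather than documented behavior.

```typescript
// Field names and defaults follow the Input Parameters table.
interface ActorInput {
  startUrls: { url: string }[];
  maxDepth?: number;
  maxPagesPerDomain?: number;
  crawlShortcutLinksOnly?: boolean;
  proxyConfiguration?: { [key: string]: unknown };
}

function resolveInput(input: ActorInput): Required<ActorInput> {
  if (!Array.isArray(input.startUrls) || input.startUrls.length === 0) {
    throw new Error("startUrls is required and must contain at least one entry");
  }
  for (const entry of input.startUrls) {
    // new URL() throws a TypeError on malformed URLs.
    const protocol = new URL(entry.url).protocol;
    // Assumption: only http and https start URLs make sense for a web crawl.
    if (protocol !== "http:" && protocol !== "https:") {
      throw new Error("Unsupported protocol in start URL: " + entry.url);
    }
  }
  return {
    startUrls: input.startUrls,
    maxDepth: input.maxDepth ?? 2,
    maxPagesPerDomain: input.maxPagesPerDomain ?? 20,
    crawlShortcutLinksOnly: input.crawlShortcutLinksOnly ?? true,
    proxyConfiguration: input.proxyConfiguration ?? { useApifyProxy: true },
  };
}
```

Resolving defaults up front makes the effective crawl budget explicit before a run starts, which helps when estimating the number of results.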

Example Output JSON

For each start URL domain, a single merged dataset row is saved:

{
  "domain": "crawlee.dev",
  "startUrl": "https://crawlee.dev",
  "emails": [
    "info@apify.com"
  ],
  "phones": [
    "+1 (234) 567-8900"
  ],
  "socialLinks": {
    "linkedin": [
      "https://linkedin.com/company/apify"
    ],
    "facebook": [
      "https://facebook.com/apifytech"
    ],
    "twitter": [
      "https://twitter.com/apify"
    ],
    "github": [
      "https://github.com/apify"
    ]
  },
  "addresses": [
    "Prague, Czech Republic"
  ],
  "pagesCrawled": 3,
  "crawledUrls": [
    "https://crawlee.dev/",
    "https://crawlee.dev/docs/introduction",
    "https://crawlee.dev/about"
  ]
}
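
When loading these rows into a CRM or spreadsheet, it can help to type the record and flatten its nested arrays. The sketch below mirrors the fields in the example output; `ContactRecord`, `LeadRow`, and `toLeadRow` are hypothetical names, and the "; " separator and column layout are illustrative choices, not part of the Actor.

```typescript
// Mirrors the fields shown in the example output.
interface ContactRecord {
  domain: string;
  startUrl: string;
  emails: string[];
  phones: string[];
  socialLinks: { [platform: string]: string[] };
  addresses: string[];
  pagesCrawled: number;
  crawledUrls: string[];
}

// Hypothetical flat row for a spreadsheet or CRM import.
interface LeadRow {
  domain: string;
  emails: string;
  phones: string;
  socialLinks: string;
  addresses: string;
}

function toLeadRow(record: ContactRecord): LeadRow {
  // Collect links from every platform into one list, in key order.
  const allSocial: string[] = [];
  for (const platform of Object.keys(record.socialLinks)) {
    for (const link of record.socialLinks[platform] ?? []) {
      allSocial.push(link);
    }
  }
  return {
    domain: record.domain,
    emails: record.emails.join("; "),
    phones: record.phones.join("; "),
    socialLinks: allSocial.join("; "),
    addresses: record.addresses.join("; "),
  };
}
```

Because socialLinks is keyed by platform and only platforms with matches appear in the example, the sketch iterates over whatever keys are present instead of assuming a fixed set.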