Website Contact Scraper – Email, Phone & Social Extractor

Pricing: from $5.00 / 1,000 results

Extract emails, phone numbers and social links (LinkedIn, Instagram, X/Twitter, Facebook, YouTube) from any website. Auto-detects Contact/About pages (depth 1) and returns one clean JSON record per scraped page, tagged with its root domain. Great for B2B lead gen, outreach, CRM enrichment and research.

Developer: Logiover (maintained by Community)

Website Contact Scraper — Email, Phone & Social Media Extractor

Extract emails, phone numbers, and LinkedIn, Instagram, Twitter/X, Facebook, and YouTube links from any website automatically. A fast B2B lead scraper on Apify: it crawls home pages and Contact/About pages in seconds, with no manual work. Built for sales teams, growth hackers, recruiters, and marketing agencies.


What Is This Actor?

Finding contact information on websites is tedious, repetitive, and doesn't scale. This actor automates it entirely. Give it a list of websites and it returns every email address, phone number, and social media profile it can find — from the homepage and automatically-detected contact pages alike.

Built for:

  • 🏢 B2B lead generation: build outreach lists from target company websites
  • 📊 Sales prospecting: enrich your CRM with extracted (not yet verified) contact data
  • 🔍 Competitive research: map competitor social presence and contact channels
  • 🤝 Recruiting & HR: find direct contact details for hiring managers
  • 📣 Marketing agencies: gather client prospect data at scale
  • 🔗 Data enrichment: add contact fields to existing domain lists

Features

  • Email extraction: finds email addresses in the page HTML; addresses obfuscated with JavaScript are only recoverable in Playwright mode
  • Phone extraction: parses tel: links (high precision) and international +CC numbers from page text
  • Social media links: automatically detects LinkedIn, Twitter/X, Instagram, Facebook, and YouTube profiles
  • Smart contact page detection: automatically crawls /contact, /contact-us, /about, /about-us, /team, and localized equivalents (/iletisim, /hakkimizda, etc.)
  • Two crawl modes: fast HTTP-only mode (CheerioCrawler) for most sites, and a Playwright browser mode for JavaScript-rendered pages
  • Manual mode switch: start with the cheap HTTP mode and set useJsBrowser: true only for sites that need it
  • Batch dataset writes: records are pushed in chunks to reduce API overhead on large runs
  • Proxy support: built-in Apify Proxy integration to avoid IP blocks
  • High concurrency: up to 20 parallel requests in HTTP mode
  • Anti-detection: browser fingerprinting enabled in Playwright mode

Output Data

Each record in the dataset represents one scraped page (homepage or contact/about page).

| Field | Type | Description |
|---|---|---|
| url | string | The exact URL that was scraped |
| rootDomain | string | Root domain extracted from the start URL (e.g. acme.com) |
| pageType | string | "Home" for start URLs, "Contact/About" for auto-detected pages |
| pageTitle | string | HTML <title> of the page |
| metaDescription | string | <meta name="description"> content |
| emails | array | List of unique email addresses found on the page |
| phones | array | List of phone numbers found via tel: links and international text matches |
| socials.linkedin | string or null | LinkedIn company or personal profile URL |
| socials.twitter | string or null | Twitter / X profile URL |
| socials.instagram | string or null | Instagram profile URL |
| socials.facebook | string or null | Facebook page URL |
| socials.youtube | string or null | YouTube channel URL |
| scrapedAt | string | ISO 8601 timestamp of when the record was scraped |

Sample Output Record

{
  "url": "https://acmecorp.io/contact",
  "rootDomain": "acmecorp.io",
  "pageType": "Contact/About",
  "pageTitle": "Contact Us — Acme Corp",
  "metaDescription": "Get in touch with the Acme Corp team.",
  "emails": ["hello@acmecorp.io", "sales@acmecorp.io"],
  "phones": ["+14155551234", "+442071234567"],
  "socials": {
    "linkedin": "https://www.linkedin.com/company/acme-corp",
    "twitter": "https://twitter.com/acmecorp",
    "instagram": "https://www.instagram.com/acmecorp",
    "facebook": "https://www.facebook.com/acmecorp",
    "youtube": null
  },
  "scrapedAt": "2025-05-15T14:22:10.000Z"
}
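
Because each record describes a single page, a domain usually appears in several records. The following is a minimal post-processing sketch, not part of the actor, that collapses page-level records into one summary per rootDomain. The function name merge_by_domain is hypothetical, and keeping the first non-null social link per platform is an assumption made for this example.

```python
from collections import defaultdict


def merge_by_domain(records):
    """Collapse page-level records into one summary per rootDomain."""
    merged = defaultdict(lambda: {"emails": set(), "phones": set(), "socials": {}})
    for rec in records:
        entry = merged[rec["rootDomain"]]
        entry["emails"].update(rec.get("emails", []))
        entry["phones"].update(rec.get("phones", []))
        # Assumption: keep the first non-null link seen for each platform.
        for platform, link in (rec.get("socials") or {}).items():
            if link and platform not in entry["socials"]:
                entry["socials"][platform] = link
    return {
        domain: {
            "emails": sorted(entry["emails"]),
            "phones": sorted(entry["phones"]),
            "socials": entry["socials"],
        }
        for domain, entry in merged.items()
    }
```

Feed it the list of records loaded from a JSON export of the dataset.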

Input Configuration

startUrls · array · required

A list of websites to scrape. Supports the full Apify requestListSources format — paste URLs directly or upload a CSV.

Accepted formats:

[
  { "url": "https://stripe.com" },
  { "url": "https://vercel.com" },
  { "url": "https://acmecorp.io" }
]

Each entry is treated as a root URL. The actor will scrape that page and, if maxDepth is 1, automatically enqueue matching contact/about sub-pages on the same domain.


maxDepth · integer · 0 or 1 · default: 1

Controls how many levels deep the actor crawls.

| Value | Behavior |
|---|---|
| 0 | Only scrapes the provided start URL (homepage only) |
| 1 | Also crawls auto-detected Contact, About, and Team pages |

Recommended: Keep at 1 to maximize contact data found. Set to 0 for speed when you only want homepage-level social links.

The actor looks for sub-pages matching these URL patterns:

/contact, /contact-us, /about, /about-us, /team,
/iletisim, /hakkimizda, /bize-ulasin, /reach-us, /reach-out

maxRequestsPerCrawl · integer · default: 200

A safety cap on the total number of pages fetched across all start URLs in a single run. Prevents runaway crawls on very large sites.

For a list of 50 domains with maxDepth: 1, a value of 200 is typically sufficient (each site contributes ~2–4 requests). Scale up for larger batches.
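
The sizing rule above is simple arithmetic. As a rough sketch, the helper below (suggested_request_cap is a hypothetical name, and the default of 4 pages per domain is the upper end of the ~2–4 range quoted above) computes a cap for a given batch:

```python
def suggested_request_cap(num_domains, max_depth=1, pages_per_domain=4):
    """Estimate a maxRequestsPerCrawl value for a batch of domains."""
    if max_depth == 0:
        # Homepage only: one request per start URL.
        return num_domains
    # Depth 1: homepage plus auto-detected contact/about pages.
    return num_domains * pages_per_domain


print(suggested_request_cap(50))  # 200, matching the default cap
```

Treat the result as a starting point and raise it if runs stop before all start URLs are processed.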


useJsBrowser · boolean · default: false

Selects which crawl engine to use.

| useJsBrowser | Engine | Speed | Cost | Use When |
|---|---|---|---|---|
| false (default) | CheerioCrawler (HTTP) | ⚡ Very fast | 💚 ~200x cheaper | Most B2B sites, standard HTML pages |
| true | PlaywrightCrawler (browser) | 🐢 Slower | 🔴 Higher cost | React/Vue/Angular SPAs, JS-rendered pages |

Rule of thumb: Start with false. If emails and phones come back empty on a site you know has them, switch to true for that domain.

In Playwright mode, the actor automatically blocks images, fonts, CSS, video, and third-party analytics scripts to keep cost and latency down.


proxyConfiguration · object · default: Apify Proxy enabled

Configures the proxy used for all HTTP requests.

{ "useApifyProxy": true }

Using a proxy is recommended for large runs to avoid rate limiting and IP blocks, especially when scraping hundreds of domains.


Usage Examples

Example 1 — Scrape a single company

{
  "startUrls": [{ "url": "https://stripe.com" }],
  "maxDepth": 1,
  "maxRequestsPerCrawl": 20,
  "useJsBrowser": false,
  "proxyConfiguration": { "useApifyProxy": true }
}

Example 2 — Bulk scrape a prospect list

{
  "startUrls": [
    { "url": "https://acmecorp.io" },
    { "url": "https://globaltech.com" },
    { "url": "https://startupxyz.io" },
    { "url": "https://saasfirm.co" }
  ],
  "maxDepth": 1,
  "maxRequestsPerCrawl": 500,
  "useJsBrowser": false,
  "proxyConfiguration": { "useApifyProxy": true }
}

Example 3 — Homepage-only quick scan (socials only)

{
  "startUrls": [
    { "url": "https://vercel.com" },
    { "url": "https://railway.app" }
  ],
  "maxDepth": 0,
  "maxRequestsPerCrawl": 50,
  "useJsBrowser": false,
  "proxyConfiguration": { "useApifyProxy": false }
}

Example 4 — JavaScript-heavy site

{
  "startUrls": [{ "url": "https://some-react-app.io" }],
  "maxDepth": 1,
  "maxRequestsPerCrawl": 20,
  "useJsBrowser": true,
  "proxyConfiguration": { "useApifyProxy": true }
}

How It Works

Step 1 — Start URL Processing

The actor fetches each URL from startUrls. The page is loaded via CheerioCrawler (HTTP) or PlaywrightCrawler (browser) depending on useJsBrowser.

Step 2 — Data Extraction

From every page, the actor extracts:

Emails:
A regex scans the full HTML source. Matches are lowercased and deduplicated, and false positives that end in a file extension (e.g. name@file.png) are filtered out.
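
To illustrate the idea, here is a minimal sketch of regex-based email extraction with an asset-extension filter. The pattern, the extension list, and the name extract_emails are assumptions made for this example; the actor's real regex and filter rules are not published here and may differ.

```python
import re

# Assumed pattern: deliberately simple, not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Assumed list of asset extensions that produce false positives.
ASSET_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".css", ".js")


def extract_emails(html):
    """Return unique, lowercased email-like strings found in raw HTML."""
    found = set()
    for match in EMAIL_RE.findall(html):
        email = match.lower()
        if email.endswith(ASSET_EXTENSIONS):
            continue  # e.g. "logo@2x.png" is an image name, not an address
        found.add(email)
    return sorted(found)
```

Retina image names such as logo@2x.png are a common source of false positives, which is why the extension check runs on every match.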

Phones:
Two-source strategy for maximum precision with minimum noise:

  • tel: links → parsed directly from <a href="tel:..."> elements (highest precision — site-declared)
  • Free text → only matches internationally-formatted numbers (+CC ...) to avoid false positives from prices, dates, and IDs
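
The two-source strategy can be sketched as follows. This is an illustration under stated assumptions, not the actor's code: both regexes, the digit-length bounds, and the names normalize and extract_phones are hypothetical.

```python
import re

# Source 1: numbers declared by the site in tel: links.
TEL_HREF_RE = re.compile(r'href=["\']tel:([^"\']+)["\']', re.IGNORECASE)
# Source 2: free text, but only numbers that start with "+" (assumed bounds).
INTL_TEXT_RE = re.compile(r"\+\d[\d\s().-]{6,18}\d")


def normalize(raw):
    """Strip formatting characters, keeping a leading '+' if present."""
    digits = re.sub(r"\D", "", raw)
    return ("+" if raw.strip().startswith("+") else "") + digits


def extract_phones(html, text):
    """Combine tel: links from the HTML with +CC numbers from visible text."""
    phones = []
    for raw in TEL_HREF_RE.findall(html) + INTL_TEXT_RE.findall(text):
        phone = normalize(raw)
        if phone not in phones:
            phones.append(phone)
    return phones
```

Requiring the leading + in free text is what keeps prices, dates, and order IDs out of the results, at the cost of missing local-format numbers.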

Social Media:
All <a href> elements are checked for known domain patterns:

  • linkedin.com/company/ or linkedin.com/in/
  • twitter.com/ or x.com/
  • instagram.com/
  • facebook.com/
  • youtube.com/

The first match per platform is recorded.
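
A minimal sketch of this first-match-per-platform logic is shown below. The href regex, the exact host comparison (which ignores subdomains other than www), and the names classify and extract_socials are simplifying assumptions for illustration.

```python
import re
from urllib.parse import urlparse

HREF_RE = re.compile(r'<a\s[^>]*?href=["\']([^"\']+)["\']', re.IGNORECASE)
PLATFORMS = ("linkedin", "twitter", "instagram", "facebook", "youtube")


def classify(url):
    """Map a URL to a platform key, or None if it is not a known profile link."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host == "linkedin.com" and parsed.path.startswith(("/company/", "/in/")):
        return "linkedin"
    if host in ("twitter.com", "x.com"):
        return "twitter"
    if host in ("instagram.com", "facebook.com", "youtube.com"):
        return host.split(".")[0]
    return None


def extract_socials(html):
    """Record the first matching link per platform; others stay None."""
    socials = {platform: None for platform in PLATFORMS}
    for href in HREF_RE.findall(html):
        platform = classify(href)
        if platform and socials[platform] is None:
            socials[platform] = href
    return socials
```

Comparing the parsed host rather than searching the whole URL avoids matching links that merely contain a platform name in their path.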

Step 3 — Contact Page Discovery (maxDepth: 1)

After processing the homepage, the actor enqueues sub-pages matching contact/about URL globs on the same domain. Each discovered sub-page goes through the same extraction process and is saved as a separate record tagged "pageType": "Contact/About".
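
The discovery step can be approximated as follows. This sketch assumes an exact match on the path (ignoring a trailing slash and letter case); the actor's actual glob semantics may be broader. The name contact_links is hypothetical, and the path set is copied from the pattern list documented under maxDepth.

```python
from urllib.parse import urljoin, urlparse

CONTACT_PATHS = {
    "/contact", "/contact-us", "/about", "/about-us", "/team",
    "/iletisim", "/hakkimizda", "/bize-ulasin", "/reach-us", "/reach-out",
}


def contact_links(page_url, hrefs):
    """Select same-domain links whose path matches a known contact/about path."""
    root = urlparse(page_url).netloc.lower()
    selected = []
    for href in hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        path = parsed.path.rstrip("/").lower()
        same_domain = parsed.netloc.lower() == root
        if same_domain and path in CONTACT_PATHS and absolute not in selected:
            selected.append(absolute)
    return selected
```

For example, given the homepage https://acmecorp.io/ and the links /contact, /pricing, and https://other.com/about, only https://acmecorp.io/contact would be enqueued.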

Step 4 — Batched Dataset Write

Results are buffered in memory (batch size: 20 records) and pushed to the Apify Dataset in chunks to minimize API overhead on large runs.
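
The buffering pattern looks roughly like this. BatchWriter is a hypothetical class written for illustration, and the push callable stands in for whatever dataset client you use; only the batch size of 20 comes from this document.

```python
class BatchWriter:
    """Buffer records and hand them to `push` in chunks of `batch_size`."""

    def __init__(self, push, batch_size=20):
        self._push = push            # callable that accepts a list of records
        self._batch_size = batch_size
        self._buffer = []

    def add(self, record):
        self._buffer.append(record)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        """Push whatever is buffered; call once more at the end of the run."""
        if self._buffer:
            self._push(list(self._buffer))
            self._buffer.clear()
```

The final flush matters: without it, up to 19 records left in the buffer would never reach the dataset.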

startUrls
    │
    ▼
Fetch page (HTTP or Browser)
    ├── Extract: emails, phones, socials, title, meta
    ├── Push to Dataset
    └── maxDepth=1? ──► Enqueue /contact, /about, /team pages
                            │
                            ▼
                        Fetch sub-page
                            │
                            ▼
                        Extract + Push

Crawl Modes In Detail

Mode 1: CheerioCrawler (HTTP-only, default)

  • Downloads raw HTML over HTTP — no browser process
  • Parses HTML with cheerio (server-side jQuery-like API)
  • Cost: ~0.002 ACU per 1,000 pages
  • Speed: Up to 20 concurrent requests
  • Best for: 90%+ of B2B company websites (server-rendered HTML)

Mode 2: PlaywrightCrawler (JS browser fallback)

  • Launches Chromium browser via Playwright
  • Waits for JavaScript to execute before extracting content
  • Cost: ~0.4 ACU per 1,000 pages (~200× more expensive than HTTP mode)
  • Speed: Up to 5 concurrent browser contexts
  • Best for: React, Vue, Angular, Next.js SPAs where HTML is rendered client-side
  • Optimizations active:
    • Blocks images, fonts, CSS, video, audio, PDFs, ZIPs
    • Blocks Google Analytics, Google Tag Manager, Hotjar, Intercom, Zendesk
    • Browser fingerprinting enabled for anti-detection
    • Minimal Chromium flags for low memory usage
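
To compare the two modes for a planned run, the per-1,000-page figures quoted above can be turned into a quick estimate. This is a back-of-the-envelope sketch; estimate_acu is a hypothetical helper, and real consumption varies with page size and response time.

```python
# ACU per 1,000 pages, as quoted for each mode above.
ACU_PER_1000_PAGES = {"http": 0.002, "browser": 0.4}


def estimate_acu(pages, mode="http"):
    """Rough ACU estimate for fetching `pages` pages in the given mode."""
    return pages / 1000 * ACU_PER_1000_PAGES[mode]


http_acu = estimate_acu(300, "http")        # about 0.0006 ACU
browser_acu = estimate_acu(300, "browser")  # about 0.12 ACU
```

The ratio between the two results is the ~200× difference mentioned above, whatever the page count.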

Performance & Cost Estimates

| Scenario | Mode | Pages | Est. Time | Est. Cost |
|---|---|---|---|---|
| 10 domains, depth 1 | HTTP | ~30 | < 30 sec | < $0.01 |
| 100 domains, depth 1 | HTTP | ~300 | ~2 min | ~$0.05 |
| 500 domains, depth 1 | HTTP | ~1,500 | ~10 min | ~$0.25 |
| 100 domains, JS mode | Playwright | ~300 | ~15 min | ~$1.00 |

Costs are estimates based on Apify ACU pricing. Actual cost depends on page size, proxy usage, and server response time.


Export Formats

Download your leads from the Apify Dataset in:

  • JSON — nested structure including emails, phones, and socials arrays
  • CSV — flat table; array fields are comma-joined strings, ready for Excel or Google Sheets
  • Excel (.xlsx) — native spreadsheet format
  • JSONL — one record per line, ideal for CRM imports and pipeline ingestion

Navigate to Storage → Dataset → Export in the Apify Console.
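
If you start from the JSON export and want to build a flat table yourself, the sketch below joins the array fields into comma-separated strings and lifts the socials object into top-level columns. The column selection and the names flatten and to_csv are choices made for this example and are not guaranteed to match the layout of the built-in CSV export.

```python
import csv
import io

SOCIAL_KEYS = ["linkedin", "twitter", "instagram", "facebook", "youtube"]
COLUMNS = ["url", "rootDomain", "pageType", "emails", "phones"] + SOCIAL_KEYS


def flatten(record):
    """Turn one nested record into a flat dict of strings."""
    socials = record.get("socials") or {}
    row = {key: record.get(key, "") for key in ("url", "rootDomain", "pageType")}
    row["emails"] = ", ".join(record.get("emails", []))
    row["phones"] = ", ".join(record.get("phones", []))
    for key in SOCIAL_KEYS:
        row[key] = socials.get(key) or ""  # null becomes an empty cell
    return row


def to_csv(records):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(flatten(record) for record in records)
    return buffer.getvalue()
```

The csv module quotes any cell that contains a comma, so the joined email and phone lists stay in a single column when opened in a spreadsheet.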


Tips for Best Results

Getting empty results?

  • Try enabling useJsBrowser: true — the site may render content client-side
  • Check that the domain is publicly accessible (no login wall)
  • Some sites load contact info via async API calls; Playwright mode handles these better

Getting too many false-positive emails?

  • The actor already filters out asset-extension strings and overly long matches
  • Post-process in your pipeline with format validation (a stricter regex check) and MX record validation (a DNS lookup)

Phone numbers missing?

  • The actor only captures tel: links and clearly international +CC formatted numbers from text
  • This is intentional — bare digit strings (local format numbers) are indistinguishable from prices and IDs
  • For maximum recall on phone numbers, use Playwright mode so tel: links rendered by JavaScript are also captured

Scaling to thousands of domains?

  • Use the Apify Scheduler to run batches of 500 domains per run
  • Or use the Apify API to trigger runs dynamically with URL lists from your CRM or database
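
When the domain list comes from a CRM or database, the run inputs can be generated rather than typed by hand. The sketch below splits a list into batches of 500 and builds one input object per run using the input fields documented above. The names to_start_url and build_run_inputs are hypothetical, defaulting bare domains to https:// is an assumption, and submitting the inputs through the Apify API or client library is left out.

```python
def to_start_url(domain):
    """Wrap a domain or URL in the {"url": ...} shape used by startUrls."""
    domain = domain.strip()
    if not domain.startswith(("http://", "https://")):
        domain = "https://" + domain  # assumption: default to HTTPS
    return {"url": domain}


def build_run_inputs(domains, batch_size=500, pages_per_domain=4):
    """Split domains into batches and build one actor input per batch."""
    cleaned = [d.strip() for d in domains if d.strip()]
    inputs = []
    for start in range(0, len(cleaned), batch_size):
        batch = cleaned[start:start + batch_size]
        inputs.append({
            "startUrls": [to_start_url(d) for d in batch],
            "maxDepth": 1,
            # Size the safety cap to the batch (see maxRequestsPerCrawl).
            "maxRequestsPerCrawl": len(batch) * pages_per_domain,
            "useJsBrowser": False,
            "proxyConfiguration": {"useApifyProxy": True},
        })
    return inputs
```

Each element of the returned list can be serialized with json.dumps and used as the input of one run.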

Limitations

  • No login support. The actor only accesses publicly available pages. Contact information behind login walls or gated portals is not accessible.
  • Email obfuscation. Some sites obfuscate email addresses with JavaScript or CSS (e.g. rendering characters via CSS content or assembling the address through DOM manipulation). HTTP mode cannot capture these; Playwright mode handles most cases.
  • Phone number precision over recall. The actor intentionally limits free-text phone extraction to internationally formatted numbers (+CC...) to avoid noise. Local-format numbers (e.g. 0800 123 456) are not captured from text — only from tel: links.
  • One link per social platform per page. If a page links to several profiles on the same platform (e.g. multiple LinkedIn profiles), only the first match is recorded.
  • No email verification. Extracted emails are not validated for deliverability. Use a separate email verification service (e.g. ZeroBounce, NeverBounce) before sending outreach.
  • Contact page detection is pattern-based. The actor matches known URL patterns. Non-standard contact page paths (e.g. /get-in-touch, /connect) will not be auto-discovered.

Frequently Asked Questions

Q: How many domains can I scrape in one run?
There is no hard limit beyond your maxRequestsPerCrawl cap. For bulk runs, set maxRequestsPerCrawl high enough to cover all domains × expected pages per domain. A practical ceiling for a single run is ~1,000 domains in HTTP mode.

Q: Can I paste a CSV list of domains?
Yes — use the Import from text option in the Apify Console's URL input field, or use the Apify API to pass startUrls programmatically.

Q: Will it find emails in images or PDFs?
No. The actor only parses HTML text content. Emails embedded in images, scanned documents, or PDFs are not extracted.

Q: Is the data stored anywhere other than my dataset?
No. All data is written exclusively to your private Apify Dataset. Nothing is stored or shared externally.

Q: Does it handle redirect chains?
Yes. Both got-scraping (HTTP mode) and Playwright follow HTTP redirects automatically.

Q: Can I run this on a schedule?
Yes — use the Apify Scheduler to run this actor on a recurring basis (daily, weekly) to keep your contact database fresh.

Q: What if a site blocks the scraper?
Enable Apify Proxy (useApifyProxy: true). If blocks persist, try enabling useJsBrowser: true, which uses browser fingerprinting to look more like a regular visitor.

Q: Is using this scraper legal?
This actor only accesses publicly available information visible to any website visitor. You are responsible for ensuring your use of the collected data complies with applicable laws (GDPR, CAN-SPAM, CCPA) and the target website's Terms of Service. Always obtain proper consent before sending outreach to scraped contacts.


Technical Details

| Property | Value |
|---|---|
| Runtime | Node.js 18+ (ES Modules) |
| Framework | Apify SDK v3 + Crawlee v3 |
| HTTP crawler | CheerioCrawler + got-scraping |
| Browser crawler | PlaywrightCrawler + Chromium |
| HTML parser | cheerio (XML/HTML mode) |
| Max concurrency (HTTP) | 20 parallel requests |
| Max concurrency (browser) | 5 browser contexts |
| Request timeout (HTTP) | 30 seconds |
| Navigation timeout (browser) | 25 seconds |
| Dataset write strategy | Batched (20 records per flush) |
| Deduplication | Built-in Crawlee request deduplication |

Changelog

v1.0

  • Initial release
  • Dual-mode crawling: CheerioCrawler (default) + PlaywrightCrawler (JS fallback)
  • Email extraction with false-positive filtering
  • Phone extraction from tel: links and international text matches
  • Social media extraction: LinkedIn, Twitter/X, Instagram, Facebook, YouTube
  • Automatic contact/about page discovery (depth 1)
  • Batch dataset writes for memory efficiency
  • Playwright optimizations: asset blocking, analytics blocking, fingerprinting

Support

If you run into unexpected empty results, parsing issues, or proxy errors, please open a support ticket via the Apify Console. Include the target URL, your input configuration, and the actor run ID to help diagnose the issue quickly.