Website Contact Scraper – Email, Phone & Social Extractor
Pricing
from $5.00 / 1,000 results
Extract emails, phone numbers and social links (LinkedIn, Instagram, X/Twitter, Facebook, YouTube) from any website. Auto-detects Contact/About pages (depth 1) and returns clean JSON per domain. Great for B2B lead gen, outreach, CRM enrichment and research.
Developer
Logiover
Maintained by Community
Website Contact Scraper — Email, Phone & Social Media Extractor

Extract emails, phone numbers, and LinkedIn, Instagram, Twitter/X, Facebook, and YouTube links from any website automatically. A fast B2B lead scraper on Apify: it crawls home pages and auto-detected Contact/About pages in seconds, with zero manual work. Built for sales teams, growth hackers, recruiters, and marketing agencies.
What Is This Actor?
Finding contact information on websites is tedious, repetitive, and doesn't scale. This actor automates it entirely. Give it a list of websites and it returns every email address, phone number, and social media profile it can find — from the homepage and automatically-detected contact pages alike.
Built for:
- 🏢 B2B lead generation: build outreach lists from target company websites
- 📊 Sales prospecting: enrich your CRM with extracted contact data (emails are not verified for deliverability)
- 🔍 Competitive research: map competitor social presence and contact channels
- 🤝 Recruiting & HR: find direct contact details for hiring managers
- 📣 Marketing agencies: gather client prospect data at scale
- 🔗 Data enrichment: add contact fields to existing domain lists
Features
- Email extraction: finds email addresses in page HTML, including some obfuscated formats (JavaScript-obfuscated addresses need Playwright mode)
- Phone extraction: parses `tel:` links (high precision) and international `+CC` numbers from page text
- Social media links: automatically detects LinkedIn, Twitter/X, Instagram, Facebook, and YouTube profiles
- Smart contact page detection: automatically crawls `/contact`, `/contact-us`, `/about`, `/about-us`, `/team`, and localized equivalents (`/iletisim`, `/hakkimizda`, etc.)
- Two crawl modes: fast HTTP-only mode (CheerioCrawler) for most sites, Playwright fallback for JavaScript-rendered pages
- Dual-mode intelligence: start with the cheap mode and switch to the JS browser only when needed
- Batch dataset writes: efficient memory use for large runs
- Proxy support: built-in Apify Proxy integration to avoid IP blocks
- High concurrency: up to 20 parallel requests in HTTP mode for maximum throughput
- Anti-detection: browser fingerprinting enabled in Playwright mode
Output Data
Each record in the dataset represents one scraped page (homepage or contact/about page).
| Field | Type | Description |
|---|---|---|
| url | string | The exact URL that was scraped |
| rootDomain | string | Root domain extracted from the start URL (e.g. acme.com) |
| pageType | string | "Home" for start URLs, "Contact/About" for auto-detected pages |
| pageTitle | string | HTML `<title>` of the page |
| metaDescription | string | `<meta name="description">` content |
| emails | array | List of unique email addresses found on the page |
| phones | array | List of phone numbers found via `tel:` links and international text matches |
| socials.linkedin | string or null | LinkedIn company or personal profile URL |
| socials.twitter | string or null | Twitter / X profile URL |
| socials.instagram | string or null | Instagram profile URL |
| socials.facebook | string or null | Facebook page URL |
| socials.youtube | string or null | YouTube channel URL |
| scrapedAt | string | ISO 8601 timestamp of when the record was scraped |
Sample Output Record
    {
      "url": "https://acmecorp.io/contact",
      "rootDomain": "acmecorp.io",
      "pageType": "Contact/About",
      "pageTitle": "Contact Us — Acme Corp",
      "metaDescription": "Get in touch with the Acme Corp team.",
      "emails": ["hello@acmecorp.io", "sales@acmecorp.io"],
      "phones": ["+14155551234", "+442071234567"],
      "socials": {
        "linkedin": "https://www.linkedin.com/company/acme-corp",
        "twitter": "https://twitter.com/acmecorp",
        "instagram": "https://www.instagram.com/acmecorp",
        "facebook": "https://www.facebook.com/acmecorp",
        "youtube": null
      },
      "scrapedAt": "2025-05-15T14:22:10.000Z"
    }
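Because each record describes a single page, one domain usually yields several records. The following Python sketch shows one way to merge them into a per-domain summary after export. The function name `merge_by_domain` and the shape of the merged entry are illustrative assumptions, not part of the actor.

```python
def merge_by_domain(records):
    """Combine page-level records into one summary per rootDomain.

    Hypothetical post-processing helper; the actor itself emits one
    record per page and does not perform this merge.
    """
    merged = {}
    for rec in records:
        domain = rec["rootDomain"]
        entry = merged.setdefault(
            domain,
            {"rootDomain": domain, "emails": [], "phones": [], "socials": {}},
        )
        # Keep first-seen order while dropping duplicates across pages.
        for key in ("emails", "phones"):
            for value in rec.get(key, []):
                if value not in entry[key]:
                    entry[key].append(value)
        # Keep the first non-null profile URL seen for each platform.
        for platform, link in (rec.get("socials") or {}).items():
            if link and not entry["socials"].get(platform):
                entry["socials"][platform] = link
    return list(merged.values())
```

Platforms that are null on every page are simply absent from the merged `socials` object.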
Input Configuration
startUrls · array · required
A list of websites to scrape. Supports the full Apify requestListSources format — paste URLs directly or upload a CSV.
Accepted formats:
    [
      { "url": "https://stripe.com" },
      { "url": "https://vercel.com" },
      { "url": "https://acmecorp.io" }
    ]
Each entry is treated as a root URL. The actor will scrape that page and, if maxDepth is 1, automatically enqueue matching contact/about sub-pages on the same domain.
maxDepth · integer · 0 or 1 · default: 1
Controls how many levels deep the actor crawls.
| Value | Behavior |
|---|---|
| 0 | Only scrapes the provided start URL (homepage only) |
| 1 | Also crawls auto-detected Contact, About, and Team pages |
Recommended: Keep at 1 to maximize contact data found. Set to 0 for speed when you only want homepage-level social links.
The actor looks for sub-pages matching these URL patterns:
/contact, /contact-us, /about, /about-us, /team, /iletisim, /hakkimizda, /bize-ulasin, /reach-us, /reach-out
maxRequestsPerCrawl · integer · default: 200
A safety cap on the total number of pages fetched across all start URLs in a single run. Prevents runaway crawls on very large sites.
For a list of 50 domains with maxDepth: 1, a value of 200 is typically sufficient (each site contributes ~2–4 requests). Scale up for larger batches.
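The sizing rule above is simple arithmetic. As a rough sketch, assuming the upper end of the ~2–4 requests per site quoted above (the helper name and its default are illustrative, not actor parameters):

```python
def estimate_max_requests(num_domains, max_depth=1, pages_per_domain=4):
    """Rough upper bound for maxRequestsPerCrawl.

    At depth 0 only the start URL is fetched; at depth 1 we assume up to
    `pages_per_domain` requests per site (homepage plus contact/about pages).
    """
    per_domain = 1 if max_depth == 0 else pages_per_domain
    return num_domains * per_domain


print(estimate_max_requests(50))               # 50 domains x 4 pages = 200
print(estimate_max_requests(50, max_depth=0))  # homepage only = 50
```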
useJsBrowser · boolean · default: false
Selects which crawl engine to use.
| Mode | Engine | Speed | Cost | Use When |
|---|---|---|---|---|
| false (default) | CheerioCrawler (HTTP) | ⚡ Very fast | 💚 ~200x cheaper | Most B2B sites, standard HTML pages |
| true | PlaywrightCrawler (browser) | 🐢 Slower | 🔴 Higher cost | React/Vue/Angular SPAs, JS-rendered pages |
Rule of thumb: Start with false. If emails and phones come back empty on a site you know has them, switch to true for that domain.
In Playwright mode, the actor automatically blocks images, fonts, CSS, video, and third-party analytics scripts to keep cost and latency down.
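One way to apply the rule of thumb is to scan the results of an HTTP-mode run for domains where no page returned an email or phone, then build a second input with `useJsBrowser` enabled for just those domains. This is a sketch only: the helper names are hypothetical, and prefixing `https://` to `rootDomain` is an assumption about how to rebuild the start URL.

```python
def domains_needing_js(records):
    """Return root domains where no scraped page produced an email or phone."""
    found = {}
    for rec in records:
        has_contact = bool(rec.get("emails") or rec.get("phones"))
        domain = rec["rootDomain"]
        found[domain] = found.get(domain, False) or has_contact
    return sorted(d for d, ok in found.items() if not ok)


def js_rerun_input(domains):
    """Build an actor input that retries the given domains in browser mode."""
    return {
        # Assumption: each site is reachable at https://<rootDomain>.
        "startUrls": [{"url": f"https://{d}"} for d in domains],
        "maxDepth": 1,
        "useJsBrowser": True,
    }
```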
proxyConfiguration · object · default: Apify Proxy enabled
Configures the proxy used for all HTTP requests.
{ "useApifyProxy": true }
Using a proxy is recommended for large runs to avoid rate limiting and IP blocks, especially when scraping hundreds of domains.
Usage Examples
Example 1 — Scrape a single company
    {
      "startUrls": [{ "url": "https://stripe.com" }],
      "maxDepth": 1,
      "maxRequestsPerCrawl": 20,
      "useJsBrowser": false,
      "proxyConfiguration": { "useApifyProxy": true }
    }
Example 2 — Bulk scrape a prospect list
    {
      "startUrls": [
        { "url": "https://acmecorp.io" },
        { "url": "https://globaltech.com" },
        { "url": "https://startupxyz.io" },
        { "url": "https://saasfirm.co" }
      ],
      "maxDepth": 1,
      "maxRequestsPerCrawl": 500,
      "useJsBrowser": false,
      "proxyConfiguration": { "useApifyProxy": true }
    }
Example 3 — Homepage-only quick scan (socials only)
    {
      "startUrls": [
        { "url": "https://vercel.com" },
        { "url": "https://railway.app" }
      ],
      "maxDepth": 0,
      "maxRequestsPerCrawl": 50,
      "useJsBrowser": false,
      "proxyConfiguration": { "useApifyProxy": false }
    }
Example 4 — JavaScript-heavy site
    {
      "startUrls": [{ "url": "https://some-react-app.io" }],
      "maxDepth": 1,
      "maxRequestsPerCrawl": 20,
      "useJsBrowser": true,
      "proxyConfiguration": { "useApifyProxy": true }
    }
How It Works
Step 1 — Start URL Processing
The actor fetches each URL from startUrls. The page is loaded via CheerioCrawler (HTTP) or PlaywrightCrawler (browser) depending on useJsBrowser.
Step 2 — Data Extraction
From every page, the actor extracts:
Emails:
A regex is scanned across the full HTML source. Matches are lowercased and deduplicated, and file-extension false positives (e.g. name@file.png) are filtered out.
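The actor's actual regex and filter list are not published here, so the following Python sketch only illustrates the general technique described above. The pattern `EMAIL_RE` and the `ASSET_EXTENSIONS` tuple are assumptions chosen for the example.

```python
import re

# Assumed pattern: a permissive "local@domain.tld" match over raw HTML.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Assumed asset extensions that produce false positives such as "logo@2x.png".
ASSET_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".css", ".js")


def extract_emails(html):
    """Return unique, lowercased email-like strings in first-seen order."""
    seen = []
    for match in EMAIL_RE.findall(html):
        email = match.lower()
        if email.endswith(ASSET_EXTENSIONS):
            continue  # drop file names that merely look like emails
        if email not in seen:
            seen.append(email)
    return seen
```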
Phones:
Two-source strategy for maximum precision with minimum noise:
- `tel:` links → parsed directly from `<a href="tel:...">` elements (highest precision, site-declared)
- Free text → only matches internationally formatted numbers (`+CC ...`) to avoid false positives from prices, dates, and IDs
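A minimal Python sketch of this two-source idea follows. Both regexes and the digit-only normalization are assumptions made for illustration; the actor's real patterns may differ.

```python
import re

# Source 1 (assumed pattern): values of href="tel:..." attributes.
TEL_HREF_RE = re.compile(r'href=["\']tel:([^"\']+)["\']', re.IGNORECASE)

# Source 2 (assumed pattern): "+" followed by 8 to 20 digits with common separators.
INTL_TEXT_RE = re.compile(r"\+\d[\d\s().-]{6,18}\d")


def normalize(raw):
    """Strip separators, keeping a leading "+" when the raw value has one."""
    digits = re.sub(r"\D", "", raw)
    return "+" + digits if raw.strip().startswith("+") else digits


def extract_phones(html):
    """Collect phones from tel: links first, then from +CC numbers in page text."""
    candidates = TEL_HREF_RE.findall(html)
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping for the text pass
    candidates += INTL_TEXT_RE.findall(text)
    phones = []
    for raw in candidates:
        phone = normalize(raw)
        if phone and phone not in phones:
            phones.append(phone)
    return phones
```

Bare digit runs without a leading `+` never match the text pattern, which is what keeps prices and years out of the results.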
Social Media:
All `<a href>` elements are checked for known domain patterns:
- `linkedin.com/company/` or `linkedin.com/in/`
- `twitter.com/` or `x.com/`
- `instagram.com/`
- `facebook.com/`
- `youtube.com/`

The first match per platform is recorded.
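As an illustration of first-match-per-platform detection, here is a Python sketch. The regexes in `PLATFORM_PATTERNS` and the regex-based `href` extraction are assumptions for the example; the actor itself walks the parsed DOM.

```python
import re
from urllib.parse import urlparse

# Assumed patterns, matched against "host/path" so that look-alike hosts
# such as "netflix.com" do not match "x.com".
PLATFORM_PATTERNS = {
    "linkedin": re.compile(r"^(?:www\.)?linkedin\.com/(?:company|in)/", re.I),
    "twitter": re.compile(r"^(?:www\.)?(?:twitter|x)\.com/", re.I),
    "instagram": re.compile(r"^(?:www\.)?instagram\.com/", re.I),
    "facebook": re.compile(r"^(?:www\.)?facebook\.com/", re.I),
    "youtube": re.compile(r"^(?:www\.)?youtube\.com/", re.I),
}

HREF_RE = re.compile(r'<a\b[^>]*\bhref=["\']([^"\']+)["\']', re.I)


def extract_socials(html):
    """Return {platform: first matching href or None}."""
    socials = {name: None for name in PLATFORM_PATTERNS}
    for href in HREF_RE.findall(html):
        parsed = urlparse(href)
        target = parsed.netloc + parsed.path
        for name, pattern in PLATFORM_PATTERNS.items():
            if socials[name] is None and pattern.match(target):
                socials[name] = href
    return socials
```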
Step 3 — Contact Page Discovery (maxDepth: 1)
After processing the homepage, the actor enqueues sub-pages matching contact/about URL globs on the same domain. Each discovered sub-page goes through the same extraction process and is saved as a separate record tagged "pageType": "Contact/About".
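The discovery step can be approximated as "resolve each link, keep it if it stays on the start domain and its path is a known contact/about path". The Python sketch below uses exact path matching as a simplification of the URL globs the actor applies; `CONTACT_PATHS` copies the patterns listed under maxDepth, and the function name is hypothetical.

```python
from urllib.parse import urljoin, urlparse

CONTACT_PATHS = {
    "/contact", "/contact-us", "/about", "/about-us", "/team",
    "/iletisim", "/hakkimizda", "/bize-ulasin", "/reach-us", "/reach-out",
}


def contact_links(start_url, hrefs):
    """Return same-domain links whose path is a known contact/about path."""
    root = urlparse(start_url).netloc.lower()
    found = []
    for href in hrefs:
        absolute = urljoin(start_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        path = parsed.path.rstrip("/").lower()
        same_domain = parsed.netloc.lower() == root
        if same_domain and path in CONTACT_PATHS and absolute not in found:
            found.append(absolute)
    return found
```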
Step 4 — Batched Dataset Write
Results are buffered in memory (batch size: 20 records) and pushed to the Apify Dataset in chunks to minimize API overhead on large runs.
    startUrls
        │
        ▼
    Fetch page (HTTP or Browser)
        │
        ├── Extract: emails, phones, socials, title, meta
        │
        ├── Push to Dataset
        │
        └── maxDepth=1? ──► Enqueue /contact, /about, /team pages
                                    │
                                    ▼
                              Fetch sub-page
                                    │
                              Extract + Push
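The batched write in Step 4 follows a common buffer-and-flush pattern. A minimal Python sketch, assuming a caller-supplied `push` callable that stands in for the dataset write (the class name is hypothetical):

```python
class BatchWriter:
    """Buffer records and hand them to `push` in fixed-size chunks."""

    def __init__(self, push, batch_size=20):
        self._push = push              # callable that receives a list of records
        self._batch_size = batch_size  # 20 matches the batch size stated above
        self._buffer = []

    def add(self, record):
        self._buffer.append(record)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        """Push any buffered records; call once more at the end of the run."""
        if self._buffer:
            self._push(list(self._buffer))
            self._buffer.clear()
```

The final `flush()` matters: without it, a run whose record count is not a multiple of the batch size would drop the last partial batch.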
Crawl Modes In Detail
Mode 1: CheerioCrawler (HTTP-only, default)
- Downloads raw HTML over HTTP — no browser process
- Parses HTML with `cheerio` (server-side jQuery-like API)
- Cost: ~0.002 ACU per 1,000 pages
- Speed: Up to 20 concurrent requests
- Best for: 90%+ of B2B company websites (server-rendered HTML)
Mode 2: PlaywrightCrawler (JS browser fallback)
- Launches Chromium browser via Playwright
- Waits for JavaScript to execute before extracting content
- Cost: ~0.4 ACU per 1,000 pages (~200× more expensive than HTTP mode)
- Speed: Up to 5 concurrent browser contexts
- Best for: React, Vue, Angular, Next.js SPAs where HTML is rendered client-side
- Optimizations active:
  - Blocks images, fonts, CSS, video, audio, PDFs, ZIPs
  - Blocks Google Analytics, Google Tag Manager, Hotjar, Intercom, Zendesk
  - Browser fingerprinting enabled for anti-detection
  - Minimal Chromium flags for low memory usage
Performance & Cost Estimates
| Scenario | Mode | Pages | Est. Time | Est. Cost |
|---|---|---|---|---|
| 10 domains, depth 1 | HTTP | ~30 | < 30 sec | < $0.01 |
| 100 domains, depth 1 | HTTP | ~300 | ~2 min | ~$0.05 |
| 500 domains, depth 1 | HTTP | ~1,500 | ~10 min | ~$0.25 |
| 100 domains, JS mode | Playwright | ~300 | ~15 min | ~$1.00 |
Costs are estimates based on Apify ACU pricing. Actual cost depends on page size, proxy usage, and server response time.
Export Formats
Download your leads from the Apify Dataset in:
- JSON: nested structure including the `emails` and `phones` arrays and the `socials` object
- CSV: flat table; array fields are comma-joined strings, ready for Excel or Google Sheets
- Excel (.xlsx): native spreadsheet format
- JSONL: one record per line, ideal for CRM imports and pipeline ingestion
Navigate to Storage → Dataset → Export in the Apify Console.
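If you export JSON and want to produce your own flat table, the flattening can be sketched in Python as below. The `", "` separator and the `socials/<platform>` column names are assumptions for this example and may not match the column layout of the Console's CSV export.

```python
import csv
import io


def flatten(record):
    """Turn one nested record into a flat dict suitable for a CSV row."""
    row = {k: v for k, v in record.items()
           if k not in ("emails", "phones", "socials")}
    row["emails"] = ", ".join(record.get("emails", []))
    row["phones"] = ", ".join(record.get("phones", []))
    for platform, link in (record.get("socials") or {}).items():
        row[f"socials/{platform}"] = link or ""
    return row


def to_csv(records):
    """Serialize records to CSV text, using the union of all columns."""
    rows = [flatten(r) for r in records]
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```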
Tips for Best Results
Getting empty results?
- Try enabling `useJsBrowser: true`; the site may render content client-side
- Check that the domain is publicly accessible (no login wall)
- Some sites load contact info via async API calls; Playwright mode handles these better
Getting too many false-positive emails?
- The actor already filters out asset-extension strings and overly long matches
- Post-process in your pipeline with a format check (regex) and, where possible, MX record validation of the domain
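A format check of this kind might look like the Python sketch below. The stricter pattern and the `PLACEHOLDER_DOMAINS` set are illustrative assumptions; MX validation is left out because it needs DNS lookups.

```python
import re

# Assumed stricter pattern: bounded local part, dot-separated labels, alphabetic TLD.
STRICT_EMAIL_RE = re.compile(
    r"^[a-z0-9._%+-]{1,64}@(?:[a-z0-9-]+\.)+[a-z]{2,24}$"
)

# Hypothetical list of domains that usually indicate template placeholders.
PLACEHOLDER_DOMAINS = {"example.com", "domain.com", "email.com"}


def clean_emails(emails):
    """Keep well-formed, non-placeholder addresses; lowercase and deduplicate."""
    kept = []
    for email in emails:
        email = email.strip().lower()
        domain = email.rpartition("@")[2]
        if (STRICT_EMAIL_RE.match(email)
                and domain not in PLACEHOLDER_DOMAINS
                and email not in kept):
            kept.append(email)
    return kept
```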
Phone numbers missing?
- The actor only captures `tel:` links and clearly international `+CC` formatted numbers from text
- This is intentional: bare digit strings (local format numbers) are indistinguishable from prices and IDs
- For maximum recall on phone numbers, use Playwright mode so `tel:` links rendered by JavaScript are also captured
Scaling to thousands of domains?
- Use the Apify Scheduler to run batches of 500 domains per run
- Or use the Apify API to trigger runs dynamically with URL lists from your CRM or database
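Splitting a large domain list into per-run inputs can be sketched as follows. The function name, the `https://` prefix, and the 4-pages-per-domain budget are assumptions for illustration; each yielded dict follows the input fields documented above and could be passed to a scheduled or API-triggered run.

```python
def batch_inputs(domains, batch_size=500, pages_per_domain=4):
    """Yield one actor input per batch of domains."""
    for start in range(0, len(domains), batch_size):
        chunk = domains[start:start + batch_size]
        yield {
            # Assumption: each domain is reachable at https://<domain>.
            "startUrls": [{"url": f"https://{d}"} for d in chunk],
            "maxDepth": 1,
            # Budget enough requests for the homepage plus sub-pages.
            "maxRequestsPerCrawl": len(chunk) * pages_per_domain,
            "useJsBrowser": False,
            "proxyConfiguration": {"useApifyProxy": True},
        }
```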
Limitations
- No login support. The actor only accesses publicly available pages. Contact information behind login walls or gated portals is not accessible.
- Email obfuscation. Some sites use JavaScript to obfuscate email addresses (e.g. rendering characters via CSS `content` or DOM manipulation). HTTP mode cannot capture these; Playwright mode handles most cases.
- Phone number precision over recall. The actor intentionally limits free-text phone extraction to internationally formatted numbers (`+CC...`) to avoid noise. Local-format numbers (e.g. `0800 123 456`) are not captured from text, only from `tel:` links.
- One LinkedIn/social link per platform per page. If a page has multiple LinkedIn profiles linked, only the first match is recorded.
- No email verification. Extracted emails are not validated for deliverability. Use a separate email verification service (e.g. ZeroBounce, NeverBounce) before sending outreach.
- Contact page detection is pattern-based. The actor matches known URL patterns. Non-standard contact page paths (e.g. `/get-in-touch`, `/connect`) will not be auto-discovered.
Frequently Asked Questions
Q: How many domains can I scrape in one run?
There is no hard limit beyond your maxRequestsPerCrawl cap. For bulk runs, set maxRequestsPerCrawl high enough to cover all domains × expected pages per domain. A practical ceiling for a single run is ~1,000 domains in HTTP mode.
Q: Can I paste a CSV list of domains?
Yes — use the Import from text option in the Apify Console's URL input field, or use the Apify API to pass startUrls programmatically.
Q: Will it find emails in images or PDFs?
No. The actor only parses HTML text content. Emails embedded in images, scanned documents, or PDFs are not extracted.
Q: Is the data stored anywhere other than my dataset?
No. All data is written exclusively to your private Apify Dataset. Nothing is stored or shared externally.
Q: Does it handle redirect chains?
Yes. Both got-scraping (HTTP mode) and Playwright follow HTTP redirects automatically.
Q: Can I run this on a schedule?
Yes — use the Apify Scheduler to run this actor on a recurring basis (daily, weekly) to keep your contact database fresh.
Q: What if a site blocks the scraper?
Enable Apify Proxy (useApifyProxy: true). If blocks persist, try enabling useJsBrowser: true, which uses browser fingerprinting to appear more human-like.
Q: Is using this scraper legal?
This actor only accesses publicly available information visible to any website visitor. You are responsible for ensuring your use of the collected data complies with applicable laws (GDPR, CAN-SPAM, CCPA) and the target website's Terms of Service. Always obtain proper consent before sending outreach to scraped contacts.
Technical Details
| Property | Value |
|---|---|
| Runtime | Node.js 18+ (ES Modules) |
| Framework | Apify SDK v3 + Crawlee v3 |
| HTTP crawler | CheerioCrawler + got-scraping |
| Browser crawler | PlaywrightCrawler + Chromium |
| HTML parser | cheerio (XML/HTML mode) |
| Max concurrency (HTTP) | 20 parallel requests |
| Max concurrency (browser) | 5 browser contexts |
| Request timeout (HTTP) | 30 seconds |
| Navigation timeout (browser) | 25 seconds |
| Dataset write strategy | Batched (20 records per flush) |
| Deduplication | Built-in Crawlee request deduplication |
Changelog
v1.0
- Initial release
- Dual-mode crawling: CheerioCrawler (default) + PlaywrightCrawler (JS fallback)
- Email extraction with false-positive filtering
- Phone extraction from `tel:` links and international text matches
- Social media extraction: LinkedIn, Twitter/X, Instagram, Facebook, YouTube
- Automatic contact/about page discovery (depth 1)
- Batch dataset writes for memory efficiency
- Playwright optimizations: asset blocking, analytics blocking, fingerprinting
Support
If you run into unexpected empty results, parsing issues, or proxy errors, please open a support ticket via the Apify Console. Include the target URL, your input configuration, and the actor run ID to help diagnose the issue quickly.