Website Contact & Lead Extractor
Pricing
from $2.00 / 1,000 results
Extracts email addresses, phone numbers, physical addresses, and social media links (LinkedIn, Twitter/X, Facebook, Instagram, YouTube) from any list of websites. Features customizable crawl depth, per-domain page limits, and proxy configuration.
Rating: 5.0 (1 rating)
Developer: Avira
Maintained by Community
Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 3 days ago
Website Contact Scraper Apify Actor
A production-ready website contact scraper designed to crawl websites recursively, extract contact information, and output structured data. Perfect for B2B lead generation, CRM enrichment, and automated outreach campaigns.
This Actor extracts:
- Email addresses (filtered to exclude matches that are really static-asset filenames, such as image or font files)
- Phone numbers (standardized and filtered for false positives)
- Social media links (LinkedIn, Facebook, Instagram, Twitter/X, YouTube, TikTok, GitHub, Pinterest, and Medium profiles)
- Physical addresses (using heuristic text checks and `<address>` tag parsing)
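The filtering described above can be sketched in TypeScript, the language of Crawlee's `CheerioCrawler`. This is a hypothetical illustration, not the Actor's source: the email regex, the asset-extension list, the 7 to 15 digit bound for phone numbers, and the `SOCIAL_HOSTS` table are assumptions, and the names `extractEmails`, `isPlausiblePhone`, `hostOf`, and `classifySocialLink` are invented for this example.

```typescript
// Hypothetical sketch of contact filtering; not the Actor's actual code.

// Simple email pattern (an assumption; real-world email syntax is broader).
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Extensions that yield false "emails" such as "logo@2x.png" (assumed list).
const ASSET_EXTENSIONS = [".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".woff", ".woff2", ".ttf"];

// Returns unique, lower-cased email candidates that are not asset filenames.
function extractEmails(text: string): string[] {
  const matches = text.match(EMAIL_RE) ?? [];
  const kept = matches
    .map((m) => m.toLowerCase())
    .filter((m) => !ASSET_EXTENSIONS.some((ext) => m.endsWith(ext)));
  return Array.from(new Set(kept));
}

// Accepts a phone candidate with 7 to 15 digits (15 is the E.164 maximum;
// the lower bound of 7 is an assumption).
function isPlausiblePhone(raw: string): boolean {
  const digits = raw.replace(/\D/g, "");
  return digits.length >= 7 && digits.length <= 15;
}

// Host-to-network table; x.com maps to "twitter" to match the output key.
const SOCIAL_HOSTS: Record<string, string> = {
  "linkedin.com": "linkedin",
  "facebook.com": "facebook",
  "instagram.com": "instagram",
  "twitter.com": "twitter",
  "x.com": "twitter",
  "youtube.com": "youtube",
  "tiktok.com": "tiktok",
  "github.com": "github",
  "pinterest.com": "pinterest",
  "medium.com": "medium",
};

// Lower-cased hostname without "www.", or null for non-absolute URLs.
function hostOf(href: string): string | null {
  try {
    return new URL(href).hostname.toLowerCase().replace(/^www\./, "");
  } catch {
    return null;
  }
}

// Returns the network key for a social profile link, or null.
function classifySocialLink(href: string): string | null {
  const host = hostOf(href);
  if (host === null) return null;
  for (const domain of Object.keys(SOCIAL_HOSTS)) {
    // Match the domain itself or any subdomain (e.g. uk.linkedin.com).
    if (host === domain || host.endsWith("." + domain)) {
      return SOCIAL_HOSTS[domain];
    }
  }
  return null;
}
```

Keeping each check as a small pure function makes the false-positive rules easy to test in isolation from the crawler.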
Features
- ⚡ Extremely Fast: Built with Crawlee's `CheerioCrawler`, which parses raw HTML directly without the overhead of a headless browser.
- 🎯 Targeted Crawling: Prioritizes contact, about, team, and help pages when `crawlShortcutLinksOnly` is enabled, to save crawl budget.
- ⚙️ Configurable Limits: Set a maximum number of pages per domain and a maximum crawl depth to prevent runaway runs on large sites.
- 🔄 Consolidated Output: Automatically aggregates and de-duplicates contact information collected across multiple pages of the same domain into a single clean JSON record.
- 🛡️ Proxy Support: Fully integrated with Apify Proxy or custom proxies to bypass rate limits and geoblocks.
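The targeted-crawling rule can be sketched as a keyword filter over candidate links. This is a hypothetical illustration: the four keywords come from the feature list above, but plain substring matching, the optional `title` argument, and the names `pathOf`, `isShortcutPage`, and `selectLinks` are assumptions. Since a page title is only known after the page is fetched, the sketch checks a title only when one is passed in.

```typescript
// Hypothetical sketch of crawlShortcutLinksOnly-style link selection.
const CONTACT_KEYWORDS = ["contact", "about", "team", "help"];

// Lower-cased URL path, or "" if the URL cannot be parsed.
function pathOf(url: string): string {
  try {
    return new URL(url).pathname.toLowerCase();
  } catch {
    return "";
  }
}

// True if the URL path or the page title mentions a contact-related keyword.
function isShortcutPage(url: string, title = ""): boolean {
  const haystack = pathOf(url) + " " + title.toLowerCase();
  return CONTACT_KEYWORDS.some((keyword) => haystack.includes(keyword));
}

// Puts shortcut pages first; when shortcutOnly is true, drops the rest.
function selectLinks(urls: string[], shortcutOnly: boolean): string[] {
  const shortcut = urls.filter((u) => isShortcutPage(u));
  if (shortcutOnly) return shortcut;
  const rest = urls.filter((u) => !isShortcutPage(u));
  return shortcut.concat(rest);
}
```

Substring matching is deliberately crude (it would also match a path like `/steam`); a production filter would likely match whole path segments instead.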
Input Parameters
The Actor accepts the following inputs (defined in `.actor/input_schema.json`):
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| startUrls | Array | Yes | None | List of site URLs to scan (e.g. `[{"url": "https://crawlee.dev"}]`). |
| maxDepth | Integer | No | 2 | Maximum crawl depth. Depth 1 is start URLs only. |
| maxPagesPerDomain | Integer | No | 20 | Maximum pages to crawl per start URL domain, to limit budget. |
| crawlShortcutLinksOnly | Boolean | No | true | When true, skips pages that do not match contact-related keywords in their path or title. |
| proxyConfiguration | Object | No | `{ "useApifyProxy": true }` | Apify Proxy or custom proxy configuration. |
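To make the `maxDepth` and `maxPagesPerDomain` semantics concrete (depth 1 is the start URL alone), here is a breadth-first sketch over a hypothetical in-memory link graph standing in for fetched pages. It is not how Crawlee schedules requests internally; `LinkGraph` and `plannedCrawl` are invented names, the defaults mirror the table, and links are assumed to be absolute URLs.

```typescript
// Hypothetical stand-in for fetched pages: page URL -> absolute outgoing links.
type LinkGraph = Record<string, string[]>;

// Breadth-first crawl plan honoring maxDepth (start URL = depth 1) and
// maxPagesPerDomain. Returns URLs in the order they would be visited.
function plannedCrawl(
  startUrl: string,
  graph: LinkGraph,
  maxDepth = 2,
  maxPagesPerDomain = 20,
): string[] {
  const startHost = new URL(startUrl).hostname;
  const seen = new Set<string>([startUrl]);
  const visited: string[] = [];
  const queue: { url: string; depth: number }[] = [{ url: startUrl, depth: 1 }];

  while (queue.length > 0 && visited.length < maxPagesPerDomain) {
    const { url, depth } = queue.shift()!;
    visited.push(url);
    // Links found at maxDepth would land at maxDepth + 1, so skip them.
    if (depth >= maxDepth) continue;
    for (const link of graph[url] ?? []) {
      // Stay on the start URL's domain and never enqueue a page twice.
      if (seen.has(link) || new URL(link).hostname !== startHost) continue;
      seen.add(link);
      queue.push({ url: link, depth: depth + 1 });
    }
  }
  return visited;
}
```

With the default `maxDepth` of 2, only the start page and the pages it links to directly are visited.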
Example Input JSON
```json
{
  "startUrls": [
    { "url": "https://crawlee.dev" },
    { "url": "https://apify.com" }
  ],
  "maxDepth": 2,
  "maxPagesPerDomain": 15,
  "crawlShortcutLinksOnly": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```
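A typed view of this input, with the documented defaults filled in, might look like the following sketch. `ActorInput` and `withDefaults` are hypothetical names; whether the Actor applies these defaults in code or relies on the input schema is not stated here, so treat this as a sketch of the documented defaults only.

```typescript
// Hypothetical typed input; defaults mirror the Input Parameters table.
interface ActorInput {
  startUrls: { url: string }[];
  maxDepth?: number;
  maxPagesPerDomain?: number;
  crawlShortcutLinksOnly?: boolean;
  proxyConfiguration?: { useApifyProxy?: boolean; [key: string]: unknown };
}

// Every top-level field present after defaults are applied.
type ResolvedInput = Required<ActorInput>;

function withDefaults(input: ActorInput): ResolvedInput {
  // startUrls is the only required field.
  if (!Array.isArray(input.startUrls) || input.startUrls.length === 0) {
    throw new Error("startUrls must be a non-empty array");
  }
  return {
    startUrls: input.startUrls,
    maxDepth: input.maxDepth ?? 2,
    maxPagesPerDomain: input.maxPagesPerDomain ?? 20,
    crawlShortcutLinksOnly: input.crawlShortcutLinksOnly ?? true,
    proxyConfiguration: input.proxyConfiguration ?? { useApifyProxy: true },
  };
}
```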
Example Output JSON
For each start URL domain, a single merged dataset row is saved:
```json
{
  "domain": "crawlee.dev",
  "startUrl": "https://crawlee.dev",
  "emails": ["info@apify.com"],
  "phones": ["+1 (234) 567-8900"],
  "socialLinks": {
    "linkedin": ["https://linkedin.com/company/apify"],
    "facebook": ["https://facebook.com/apifytech"],
    "twitter": ["https://twitter.com/apify"],
    "github": ["https://github.com/apify"]
  },
  "addresses": ["Prague, Czech Republic"],
  "pagesCrawled": 3,
  "crawledUrls": [
    "https://crawlee.dev/",
    "https://crawlee.dev/docs/introduction",
    "https://crawlee.dev/about"
  ]
}
```
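The per-domain consolidation can be sketched as a pure function that folds per-page findings into one record shaped like the example output. `PageResult`, `DomainRecord`, `uniq`, `collect`, and `mergeDomain` are hypothetical names, and de-duplication here is exact string matching, which is an assumption (the Actor may normalize values before comparing them).

```typescript
// Findings from a single crawled page (hypothetical shape).
interface PageResult {
  url: string;
  emails: string[];
  phones: string[];
  socialLinks: Record<string, string[]>;
  addresses: string[];
}

// One dataset row per start URL domain, mirroring the example output.
interface DomainRecord {
  domain: string;
  startUrl: string;
  emails: string[];
  phones: string[];
  socialLinks: Record<string, string[]>;
  addresses: string[];
  pagesCrawled: number;
  crawledUrls: string[];
}

// Removes duplicates while keeping first-seen order (Set preserves insertion order).
function uniq(values: string[]): string[] {
  return Array.from(new Set(values));
}

// Concatenates one list-valued field across all pages, then de-duplicates it.
function collect(pages: PageResult[], pick: (p: PageResult) => string[]): string[] {
  return uniq(([] as string[]).concat(...pages.map(pick)));
}

function mergeDomain(startUrl: string, pages: PageResult[]): DomainRecord {
  // Merge social links network by network so each key keeps unique URLs.
  const socialLinks: Record<string, string[]> = {};
  for (const page of pages) {
    for (const network of Object.keys(page.socialLinks)) {
      const existing = socialLinks[network] ?? [];
      socialLinks[network] = uniq(existing.concat(page.socialLinks[network]));
    }
  }
  return {
    domain: new URL(startUrl).hostname,
    startUrl,
    emails: collect(pages, (p) => p.emails),
    phones: collect(pages, (p) => p.phones),
    socialLinks,
    addresses: collect(pages, (p) => p.addresses),
    pagesCrawled: pages.length,
    crawledUrls: pages.map((p) => p.url),
  };
}
```

Merging after the crawl, rather than writing one dataset row per page, is what keeps the output at a single row per start URL domain.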