Website Contacts Crawler
Scrape websites for contact details, emails, and phone numbers.
Pricing: from $0.01 / 1,000 results
Developer: AI_Builder
Last modified: 7 days ago
Website Contacts Crawler
Automatically crawl company websites and extract contact information — emails, phone numbers, team member names, and LinkedIn profiles. Built for lead generation, sales prospecting, and business development.
What does it do?
Give it a list of company website URLs and it will:
- Crawl the homepage of each site
- Automatically discover and follow links to contact, team, about, and people pages
- Extract structured contact data from every page visited
- Output one clean JSON record per company with all contacts found
What it extracts
| Data type | Examples |
|---|---|
| Emails | contact@company.com, john.doe@company.fr |
| Phone numbers | French format (01 23 45 67 89, +33 1 23 45 67 89) and international |
| People | Names paired with roles like CEO, Director, Founder, Manager, VP |
| LinkedIn profiles | linkedin.com/in/john-doe URLs found on the site |
Smart page discovery
The crawler doesn't just hit the homepage. It automatically finds relevant subpages by:
- Trying common URL patterns (`/contact`, `/team`, `/about`, `/equipe`, `/nous-contacter`, etc.)
- Parsing links from the homepage that contain keywords like "contact", "team", "about", "people"
- Supporting both English and French page names
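The discovery step can be sketched as follows. This is a minimal illustration of the approach described above, not the Actor's actual source; the path and keyword lists here are assumed examples:

```python
from urllib.parse import urljoin, urlparse

# Assumed examples of common contact-page paths (English and French)
COMMON_PATHS = ["/contact", "/team", "/about", "/equipe", "/nous-contacter"]
KEYWORDS = ["contact", "team", "about", "people", "equipe"]

def discover_subpages(base_url, homepage_links, max_pages=5):
    """Combine common URL patterns with keyword-matched homepage links."""
    candidates = [urljoin(base_url, p) for p in COMMON_PATHS]
    for href in homepage_links:
        absolute = urljoin(base_url, href)
        # Only follow links that stay on the same domain
        if urlparse(absolute).netloc != urlparse(base_url).netloc:
            continue
        if any(kw in absolute.lower() for kw in KEYWORDS):
            candidates.append(absolute)
    # Deduplicate while preserving order, and respect the page budget
    seen, result = set(), []
    for url in candidates:
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result[:max_pages]
```

Off-domain links are dropped so the crawl never wanders onto third-party sites.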
Input
    {
      "urls": [
        "https://www.devoteam.com/fr/",
        "https://www.eskimoz.fr",
        "https://www.junto.fr"
      ],
      "maxPagesPerSite": 5,
      "delayBetweenRequests": 2
    }
| Parameter | Type | Default | Description |
|---|---|---|---|
| urls | array of strings | required | List of company website URLs to crawl |
| maxPagesPerSite | integer | 5 | Maximum pages to crawl per website (1–20), including the homepage and discovered subpages |
| delayBetweenRequests | integer | 2 | Seconds to wait between requests; increase for polite crawling |
Tip: You don't need to include https:// — the crawler adds it automatically if missing.
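That normalization might look like the following. This is a hypothetical sketch of the behavior described in the tip, not the Actor's source:

```python
def normalize_url(url: str) -> str:
    """Prefix bare domains with https:// so they can be fetched."""
    url = url.strip()
    if not url.startswith(("http://", "https://")):
        url = "https://" + url
    return url
```

Existing http:// or https:// prefixes are left untouched; only bare domains get the prefix added.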
Output
Each company produces one record in the dataset:
    {
      "company_name": "Eskimoz",
      "url": "https://www.eskimoz.fr",
      "emails": [
        "contact@eskimoz.fr",
        "alexandre.courbin@eskimoz.fr",
        "contact@eskimoz.co.uk"
      ],
      "phones": [
        "01 84 88 41 17",
        "04 28 29 59 78",
        "+44 204 525 5581"
      ],
      "people": ["Alexandre Courbin"],
      "linkedin": [
        "https://www.linkedin.com/in/vincentdpnt",
        "https://www.linkedin.com/in/joakim-fatih-940370110"
      ],
      "pages_crawled": 3
    }
| Field | Type | Description |
|---|---|---|
| company_name | string | Company name derived from the domain |
| url | string | The original URL that was crawled |
| emails | array | All unique email addresses found (max 20) |
| phones | array | All unique phone numbers found (max 10) |
| people | array | Names of people found alongside job titles (max 30) |
| linkedin | array | LinkedIn profile URLs found on the site (max 30) |
| pages_crawled | integer | Number of pages successfully crawled for this site |
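Because the list fields arrive as JSON arrays, a common next step is flattening each record into one row for a spreadsheet or CRM import. A small sketch; the field names follow the table above, while the "; " joining convention is a choice of this example, not part of the Actor:

```python
import csv
import io

# One record in the documented output shape (values from the sample above)
record = {
    "company_name": "Eskimoz",
    "url": "https://www.eskimoz.fr",
    "emails": ["contact@eskimoz.fr"],
    "phones": ["01 84 88 41 17"],
    "people": ["Alexandre Courbin"],
    "linkedin": [],
    "pages_crawled": 3,
}

def record_to_row(rec):
    """Join list fields with '; ' so each company fits one CSV row."""
    return {
        k: "; ".join(v) if isinstance(v, list) else v
        for k, v in rec.items()
    }

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=record.keys())
writer.writeheader()
writer.writerow(record_to_row(record))
```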
Use cases
- Sales prospecting — Find decision-maker contacts at target companies
- Lead generation — Build contact lists from company websites at scale
- Partner research — Identify key people at potential partner organizations
- Market research — Map out teams and contact points across an industry
- CRM enrichment — Add missing contact details to your existing company records
How it works
Input URLs → Homepage crawl → Discover subpages → Crawl contact/team/about pages → Extract data → Output dataset
- For each URL, the crawler fetches the homepage using a fast HTTP client
- It parses the homepage HTML to find links to contact, team, and about pages
- It also tries common URL patterns (e.g., `/contact`, `/equipe`, `/about-us`)
- Each discovered page is crawled and scanned for emails, phones, names, and LinkedIn URLs
- Results are deduplicated and pushed to the Apify dataset as one record per company
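The extraction and deduplication steps above can be sketched with simplified regular expressions. These patterns are illustrative assumptions; the Actor's real patterns may differ:

```python
import re

# Assumed, simplified patterns -- the Actor's real regexes may differ
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# French numbers: "01 23 45 67 89" or "+33 1 23 45 67 89"
FRENCH_PHONE_RE = re.compile(r"(?:\+33\s?|0)[1-9](?:[\s.-]?\d{2}){4}")
LINKEDIN_RE = re.compile(r"https?://(?:www\.)?linkedin\.com/in/[\w-]+")

def extract_contacts(page_text):
    """Scan page text and return deduplicated contact data."""
    def unique(seq):
        return list(dict.fromkeys(seq))  # dedupe, preserve order
    return {
        "emails": unique(EMAIL_RE.findall(page_text)),
        "phones": unique(FRENCH_PHONE_RE.findall(page_text)),
        "linkedin": unique(LINKEDIN_RE.findall(page_text)),
    }
```

The `dict.fromkeys` trick keeps the first occurrence of each value, matching the "results are deduplicated" step above.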
The crawler uses HTTP requests (not a browser), making it fast and lightweight. This works well for most company websites. Sites that rely heavily on JavaScript rendering may return fewer results.
API usage
Run via API
    curl -X POST \
      "https://api.apify.com/v2/acts/quaking_pail~contact-crawler/runs?token=YOUR_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"urls": ["https://www.eskimoz.fr", "https://www.junto.fr"], "maxPagesPerSite": 5}'
Run synchronously and get results
    curl -X POST \
      "https://api.apify.com/v2/acts/quaking_pail~contact-crawler/run-sync-get-dataset-items?token=YOUR_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"urls": ["https://www.eskimoz.fr"], "maxPagesPerSite": 5}'
Get results from last run
    GET https://api.apify.com/v2/acts/quaking_pail~contact-crawler/runs/last/dataset/items?token=YOUR_TOKEN
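The dataset endpoints return a JSON array of records in the Output shape. A small post-processing sketch; the sample records here are illustrative, and the ranking-by-email-count is a choice of this example:

```python
# Sample records in the shape documented in the Output section
items = [
    {"company_name": "Eskimoz", "emails": ["contact@eskimoz.fr"], "phones": []},
    {"company_name": "NoContact", "emails": [], "phones": []},
    {"company_name": "Junto", "emails": ["a@junto.fr", "b@junto.fr"], "phones": ["01 23 45 67 89"]},
]

def companies_with_contacts(records):
    """Keep records that yielded at least one email or phone,
    sorted by how many emails were found (most first)."""
    hits = [r for r in records if r.get("emails") or r.get("phones")]
    return sorted(hits, key=lambda r: len(r.get("emails", [])), reverse=True)
```

Filtering out empty records first is useful because JavaScript-heavy sites (see Limitations) can produce entries with no contacts at all.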
Limitations
- JavaScript-heavy sites: The crawler uses HTTP requests, not a browser. Sites that load content dynamically via JavaScript may return empty results.
- Rate limiting: Some sites may block rapid requests. Increase `delayBetweenRequests` if you encounter issues.
- LinkedIn profiles: Only profiles linked directly on the company website are found. This does not scrape LinkedIn itself.
- People extraction: Names are detected when they appear next to a job title (CEO, Director, etc.). Names without an associated role may not be captured.
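The name-plus-title heuristic mentioned above could be implemented roughly like this. The title list and patterns are assumptions for illustration, not the Actor's actual source:

```python
import re

# Assumed title list -- the Actor's real patterns may differ
TITLES = r"(?:CEO|CTO|Director|Founder|Manager|VP)"
# "Jane Doe, CEO" style: name before the title
NAME_BEFORE = re.compile(rf"([A-Z][a-z]+ [A-Z][a-z]+)\s*[,-]\s*{TITLES}")
# "CEO: Jane Doe" style: name after the title
NAME_AFTER = re.compile(rf"{TITLES}\s*[:,-]\s*([A-Z][a-z]+ [A-Z][a-z]+)")

def extract_people(text):
    """Return names that appear next to a known job title."""
    names = NAME_BEFORE.findall(text) + NAME_AFTER.findall(text)
    return list(dict.fromkeys(names))  # dedupe, keep order
```

A heuristic like this explains the limitation: a name that never appears adjacent to a recognized title simply never matches.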
Running locally
    # Clone and install
    apify create contact-crawler -t python-empty
    cd contact-crawler
    pip install -r requirements.txt

    # Set your input
    # Edit storage/key_value_stores/default/INPUT.json

    # Run
    apify run
Results are saved locally in `storage/datasets/default/`.
Built with
- Apify SDK for Python — Actor framework
- Crawlee — HTTP crawling engine
- Python regex — Contact data extraction