Website Contacts Crawler

Scrape websites searching for contact details, emails, and phone numbers.

Pricing: from $0.01 / 1,000 results
Developer: AI_Builder (Maintained by Community)
Rating: 0.0 (0 reviews)
Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified 7 days ago

Website Contacts Crawler

Automatically crawl company websites and extract contact information — emails, phone numbers, team member names, and LinkedIn profiles. Built for lead generation, sales prospecting, and business development.

What does it do?

Give it a list of company website URLs and it will:

  1. Crawl the homepage of each site
  2. Automatically discover and follow links to contact, team, about, and people pages
  3. Extract structured contact data from every page visited
  4. Output one clean JSON record per company with all contacts found

What it extracts

| Data type | Examples |
|---|---|
| Emails | contact@company.com, john.doe@company.fr |
| Phone numbers | French format (01 23 45 67 89, +33 1 23 45 67 89) and international |
| People | Names paired with roles like CEO, Director, Founder, Manager, VP |
| LinkedIn profiles | linkedin.com/in/john-doe URLs found on the site |
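The extraction step can be sketched with simple regular expressions. The patterns below are illustrative guesses at how such a crawler might match these data types; they are not the actor's published implementation.

```python
import re

# Hypothetical patterns, roughly matching the data types described above.
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
# French national numbers (01 23 45 67 89) and +33-style international forms.
PHONE_RE = re.compile(r"(?:\+\d{1,3}[ .]?)?\d(?:[ .]?\d){8,9}")
LINKEDIN_RE = re.compile(r"https?://(?:www\.)?linkedin\.com/in/[\w-]+")

def extract_contacts(page_text: str) -> dict:
    """Scan raw page text for emails, phone numbers, and LinkedIn profile URLs."""
    return {
        "emails": sorted(set(EMAIL_RE.findall(page_text))),
        "phones": sorted(set(PHONE_RE.findall(page_text))),
        "linkedin": sorted(set(LINKEDIN_RE.findall(page_text))),
    }
```

In practice, patterns like these trade precision for recall (the phone pattern will also pick up some long digit runs), which is why the actor deduplicates and caps its output lists.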

Smart page discovery

The crawler doesn't just hit the homepage. It automatically finds relevant subpages by:

  • Trying common URL patterns (/contact, /team, /about, /equipe, /nous-contacter, etc.)
  • Parsing links from the homepage that contain keywords like "contact", "team", "about", "people"
  • Supporting both English and French page names
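A minimal sketch of this discovery logic follows; `COMMON_PATHS` and `LINK_KEYWORDS` are example values illustrating the approach, not the actor's exact lists.

```python
from urllib.parse import urljoin

# Example values only -- the actor's real path and keyword lists are not published.
COMMON_PATHS = ["/contact", "/team", "/about", "/equipe", "/nous-contacter"]
LINK_KEYWORDS = ("contact", "team", "about", "people", "equipe")

def candidate_pages(base_url, homepage_links):
    """Combine common URL patterns with homepage links that contain keywords."""
    pages = [urljoin(base_url, path) for path in COMMON_PATHS]
    for href in homepage_links:
        if any(kw in href.lower() for kw in LINK_KEYWORDS):
            pages.append(urljoin(base_url, href))
    # Preserve order while removing duplicates (a path may appear in both sources).
    return list(dict.fromkeys(pages))
```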

Input

{
  "urls": [
    "https://www.devoteam.com/fr/",
    "https://www.eskimoz.fr",
    "https://www.junto.fr"
  ],
  "maxPagesPerSite": 5,
  "delayBetweenRequests": 2
}
| Parameter | Type | Default | Description |
|---|---|---|---|
| urls | array of strings | required | List of company website URLs to crawl |
| maxPagesPerSite | integer | 5 | Maximum pages to crawl per website (1–20), including the homepage and discovered subpages |
| delayBetweenRequests | integer | 2 | Seconds to wait between requests; increase for polite crawling |

Tip: You don't need to include https:// — the crawler adds it automatically if missing.
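That scheme handling can be reproduced in a few lines; this `normalize_url` helper is a hypothetical equivalent, not the actor's code.

```python
def normalize_url(url: str) -> str:
    """Prepend https:// when the scheme is missing, as the crawler does."""
    url = url.strip()
    if not url.startswith(("http://", "https://")):
        url = "https://" + url
    return url
```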

Output

Each company produces one record in the dataset:

{
  "company_name": "Eskimoz",
  "url": "https://www.eskimoz.fr",
  "emails": [
    "contact@eskimoz.fr",
    "alexandre.courbin@eskimoz.fr",
    "contact@eskimoz.co.uk"
  ],
  "phones": [
    "01 84 88 41 17",
    "04 28 29 59 78",
    "+44 204 525 5581"
  ],
  "people": [
    "Alexandre Courbin"
  ],
  "linkedin": [
    "https://www.linkedin.com/in/vincentdpnt",
    "https://www.linkedin.com/in/joakim-fatih-940370110"
  ],
  "pages_crawled": 3
}
| Field | Type | Description |
|---|---|---|
| company_name | string | Company name derived from the domain |
| url | string | The original URL that was crawled |
| emails | array | All unique email addresses found (max 20) |
| phones | array | All unique phone numbers found (max 10) |
| people | array | Names of people found alongside job titles (max 30) |
| linkedin | array | LinkedIn profile URLs found on the site (max 30) |
| pages_crawled | integer | Number of pages successfully crawled for this site |
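Assembling a record with the caps above could look like the sketch below; `build_record` and its name-derivation logic are assumptions about how such a record might be produced.

```python
from urllib.parse import urlparse

def build_record(url, emails, phones, people, linkedin, pages_crawled):
    """Assemble one dataset record, applying the documented per-field caps."""
    domain = urlparse(url).netloc
    if domain.startswith("www."):
        domain = domain[4:]
    return {
        "company_name": domain.split(".")[0].capitalize(),  # derived from the domain
        "url": url,
        "emails": sorted(set(emails))[:20],
        "phones": sorted(set(phones))[:10],
        "people": sorted(set(people))[:30],
        "linkedin": sorted(set(linkedin))[:30],
        "pages_crawled": pages_crawled,
    }
```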

Use cases

  • Sales prospecting — Find decision-maker contacts at target companies
  • Lead generation — Build contact lists from company websites at scale
  • Partner research — Identify key people at potential partner organizations
  • Market research — Map out teams and contact points across an industry
  • CRM enrichment — Add missing contact details to your existing company records

How it works

Input URLs → Homepage crawl → Discover subpages → Crawl contact/team/about pages → Extract data → Output dataset
  1. For each URL, the crawler fetches the homepage using a fast HTTP client
  2. It parses the homepage HTML to find links to contact, team, and about pages
  3. It also tries common URL patterns (e.g., /contact, /equipe, /about-us)
  4. Each discovered page is crawled and scanned for emails, phones, names, and LinkedIn URLs
  5. Results are deduplicated and pushed to the Apify dataset as one record per company
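The five steps above can be sketched as a single loop. Here `fetch`, `discover`, and `extract` are injected placeholders standing in for the stages described, not the actor's real functions.

```python
import time

def crawl_site(url, fetch, discover, extract, max_pages=5, delay=0):
    """Walk one site: homepage first, then discovered subpages, merging results."""
    found = {"emails": set(), "phones": set(), "linkedin": set()}
    homepage_html = fetch(url)                      # step 1: fetch the homepage
    pages = [url] + discover(url, homepage_html)    # steps 2-3: discover subpages
    crawled = 0
    for page in pages[:max_pages]:
        body = homepage_html if page == url else fetch(page)
        for key, values in extract(body).items():   # step 4: scan each page
            found[key].update(values)               # step 5: dedupe via sets
        crawled += 1
        time.sleep(delay)
    return {**{k: sorted(v) for k, v in found.items()}, "pages_crawled": crawled}
```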

The crawler uses HTTP requests (not a browser), making it fast and lightweight. This works well for most company websites. Sites that rely heavily on JavaScript rendering may return fewer results.

API usage

Run via API

curl -X POST \
  "https://api.apify.com/v2/acts/quaking_pail~contact-crawler/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://www.eskimoz.fr", "https://www.junto.fr"],
    "maxPagesPerSite": 5
  }'

Run synchronously and get results

curl -X POST \
  "https://api.apify.com/v2/acts/quaking_pail~contact-crawler/run-sync-get-dataset-items?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://www.eskimoz.fr"],
    "maxPagesPerSite": 5
  }'

Get results from last run

GET https://api.apify.com/v2/acts/quaking_pail~contact-crawler/runs/last/dataset/items?token=YOUR_TOKEN
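The same run-start call can be prepared from Python with only the standard library; `build_run_request` is an illustrative helper, not part of the Apify SDK.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"

def build_run_request(actor_id, token, run_input):
    """Prepare the POST that starts an actor run; send it with urllib.request.urlopen."""
    url = f"{API_BASE}/acts/{actor_id}/runs?token={token}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Calling `urllib.request.urlopen(req)` on the returned request performs the actual API call.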

Limitations

  • JavaScript-heavy sites: The crawler uses HTTP requests, not a browser. Sites that load content dynamically via JavaScript may return empty results.
  • Rate limiting: Some sites may block rapid requests. Increase delayBetweenRequests if you encounter issues.
  • LinkedIn profiles: Only profiles linked directly on the company website are found. This does not scrape LinkedIn itself.
  • People extraction: Names are detected when they appear next to a job title (CEO, Director, etc.). Names without an associated role may not be captured.
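The role-adjacency rule from the last point can be illustrated with a small regex sketch; the role list and pattern are assumptions, not the actor's implementation.

```python
import re

# Illustrative only: match "Name - Role" style pairs for a fixed role list,
# mirroring the limitation that names need an adjacent job title.
ROLES = r"(?:CEO|CTO|Director|Founder|Manager|VP)"
PERSON_RE = re.compile(rf"([A-Z][a-z]+(?:\s[A-Z][a-z]+)+)\s*[,:–-]\s*{ROLES}")

def extract_people(text: str) -> list:
    """Return names that appear directly next to a known job title."""
    return sorted(set(PERSON_RE.findall(text)))
```

A name like "Jean Dupont" with no adjacent title would not match, which is exactly the limitation described above.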

Running locally

# Clone and install
apify create contact-crawler -t python-empty
cd contact-crawler
pip install -r requirements.txt
# Set your input
# Edit storage/key_value_stores/default/INPUT.json
# Run
apify run

Results are saved locally in storage/datasets/default/.

Built with