Website Contact Extractor (HTTP)

Pricing: from $5.00 / 1,000 contacts extracted

Extract contacts from any company website: names, emails, phones, LinkedIn. Offer targeting mode ranks decision-makers by relevance to your pitch so you always reach the right person first. AI-powered, multilingual, no browser needed.

Rating: 0.0 (0 reviews)

Developer: Alessandro Santamaria (maintained by Community)

Actor stats: 0 bookmarked, 1 total user, 1 monthly active user, last modified 3 hours ago

Website Contact Extractor

Extract contact details from any company website using AI. Names, emails, phone numbers, job titles, LinkedIn profiles, and company info -- structured, validated, and ready for your CRM.

What it does

  • Finds the right pages automatically -- discovers team, contact, about, and impressum pages from the homepage navigation. Works across languages and URL patterns (/team, /kontakt, /equipe, /chi-siamo).
  • Extracts complete contact profiles -- name, salutation, academic title, position, department, email, phone, LinkedIn URL. Also extracts company-level data: general email, HR email, phone, address, industry, employee count, and social media URLs.
  • Decision-maker targeting -- provide your offer (e.g. "IT Security Consulting") and the AI ranks contacts by relevance: decision_maker, influencer, or fallback. Get the right person to pitch to, not a random list.
  • Anti-hallucination validation -- cross-references every LLM-extracted name against the source HTML. Detects and removes fabricated contacts that appear across multiple companies.

How it works

Phase 1: Page discovery and crawling

  1. Fetches the homepage and scans all navigation links
  2. Matches links against contact-related keywords in 7 languages (team, kontakt, contact, equipe, about, impressum, etc.)
  3. Falls back to common URL paths if navigation discovery finds nothing
  4. Crawls all discovered pages via HTTP (no browser -- fast and lightweight)
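The discovery steps above can be sketched as a pure function over the homepage's navigation links. This is a minimal illustration, not the actor's code. The keyword list contains only the examples named in this README. `FALLBACK_PATHS`, the `{ href, text }` link shape, and the function name are hypothetical.

```javascript
// Hypothetical keyword list built from the examples in this README
// (the actor's real list covers 7 languages and is not published here).
const CONTACT_KEYWORDS = [
  "team", "kontakt", "contact", "equipe", "about",
  "impressum", "ueber-uns", "chi-siamo",
];

// Assumed common paths to try when navigation discovery finds nothing.
const FALLBACK_PATHS = ["/team", "/kontakt", "/contact", "/about", "/impressum"];

/**
 * links: array of { href, text } taken from the homepage navigation.
 * Returns de-duplicated, absolute, same-site URLs whose path or anchor text
 * matches a contact keyword, or the fallback paths if nothing matches.
 */
function discoverContactPages(homepageUrl, links) {
  const base = new URL(homepageUrl);
  const found = new Set();
  for (const { href, text } of links) {
    if (!href) continue;
    let url;
    try {
      url = new URL(href, base); // resolves relative links such as "/team"
    } catch {
      continue; // skip malformed hrefs
    }
    if (url.hostname !== base.hostname) continue; // stay on the company site (also drops mailto:)
    const haystack = `${url.pathname} ${text ?? ""}`.toLowerCase();
    if (CONTACT_KEYWORDS.some((kw) => haystack.includes(kw))) {
      url.hash = "";
      found.add(url.href);
    }
  }
  if (found.size > 0) return [...found];
  // Step 3: fall back to common URL paths
  return FALLBACK_PATHS.map((p) => new URL(p, base).href);
}
```

For example, a German homepage linking to `/ueber-uns/team` with the text "Unser Team" yields that single URL. External social links and unrelated pages like `/blog` are ignored.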

Phase 2: Team member detail pages

  1. On team listing pages, identifies links to individual team member profiles
  2. Crawls each profile page for complete contact data (email, phone, LinkedIn)
  3. Merges detail page data with the listing page data
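Step 3 can be illustrated with a small merge helper. The precedence is an assumption: detail-page values win whenever they are non-empty, and listing-page values fill the gaps. The actor's actual merge rules are not documented here. Field names follow the output example.

```javascript
// Treat null, undefined, and blank strings as "missing".
function isEmpty(value) {
  return value === null || value === undefined ||
    (typeof value === "string" && value.trim() === "");
}

/**
 * Merge one contact from a team listing page with the data found on its
 * detail page. Assumed precedence: detail page first, listing as fallback.
 */
function mergeContact(listing, detail) {
  const merged = { ...listing };
  for (const [key, value] of Object.entries(detail)) {
    if (!isEmpty(value)) merged[key] = value; // never overwrite with an empty value
  }
  return merged;
}
```

With this policy, a phone number found only on the listing page survives even if the detail page has an empty phone field.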

Extraction and validation

  1. Cleans raw HTML to remove scripts, styles, and navigation noise
  2. Sends cleaned HTML to AI for structured extraction
  3. Validates contacts: filters invalid names, checks email domain consistency
  4. Cross-company hallucination detection removes names that appear across multiple unrelated companies, a sign of fabrication
  5. Assigns confidence scores based on data completeness and source quality
  6. Detects offline websites, parked domains, and domain seller redirects
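Steps 3 and 4 can be sketched as three checks:

- A name must literally occur in the cleaned page text it was extracted from.
- An email's domain should match the company website.
- A name reported for several different companies is suspect.

This is an illustration under assumptions, not the actor's validator. The case/whitespace normalization, subdomain handling, the threshold of 2 companies, and all function names are hypothetical.

```javascript
// Normalize for comparison: lower-case, collapse whitespace.
function normalize(text) {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// Step 3 (partial): keep a contact only if its name appears in the source text.
function nameInSource(name, sourceText) {
  return normalize(sourceText).includes(normalize(name));
}

// Step 3 (partial): does the email domain match the website domain?
// Assumption: subdomains of the site count as a match; a leading "www." is ignored.
function emailMatchesDomain(email, websiteUrl) {
  const at = email.lastIndexOf("@");
  if (at < 0) return false;
  const emailDomain = email.slice(at + 1).toLowerCase();
  const siteDomain = new URL(websiteUrl).hostname.toLowerCase().replace(/^www\./, "");
  return emailDomain === siteDomain || emailDomain.endsWith("." + siteDomain);
}

// Step 4: names seen at `threshold` or more distinct companies are suspect.
// results: array of { company_id, contacts: [{ name }] }
function findCrossCompanyNames(results, threshold = 2) {
  const companiesByName = new Map();
  for (const { company_id, contacts } of results) {
    for (const { name } of contacts) {
      const key = normalize(name);
      if (!companiesByName.has(key)) companiesByName.set(key, new Set());
      companiesByName.get(key).add(company_id); // a Set counts each company once
    }
  }
  const suspect = new Set();
  for (const [name, companies] of companiesByName) {
    if (companies.size >= threshold) suspect.add(name);
  }
  return suspect;
}
```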

Input example

Minimal input -- just provide companies and a Gemini API key:

{
  "companies": [
    {
      "company_id": "smartive-ag",
      "website_url": "https://smartive.ch",
      "company_name": "smartive AG"
    },
    {
      "company_id": "example-gmbh",
      "website_url": "https://example.de",
      "company_name": "Example GmbH",
      "team_page_url": "https://example.de/ueber-uns/team"
    }
  ],
  "llmProvider": "gemini",
  "geminiApiKey": "your-gemini-api-key"
}

If you already know the team page URL, pass it as team_page_url to skip discovery and save time.

Decision-maker targeting

Find the best contacts to pitch a specific offer:

{
  "companies": [
    {
      "company_id": "1",
      "website_url": "https://example.ch",
      "company_name": "Example AG"
    }
  ],
  "offer": "Social Media Recruiting",
  "llmProvider": "gemini",
  "geminiApiKey": "your-key"
}

Each contact receives a priority field: decision_maker (e.g. Head of HR), influencer (e.g. Marketing Manager), or fallback (other team members).
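If you consume the results in code, you can put the right person first by sorting on the priority field and breaking ties with confidence. This ordering is a sketch based on the three documented priority values. It assumes that contacts without a priority (for example, runs without an offer) sort last. The function name is hypothetical.

```javascript
// Lower rank = pitch first. Keys are the documented priority values.
const PRIORITY_RANK = { decision_maker: 0, influencer: 1, fallback: 2 };

function rankContacts(contacts) {
  return [...contacts].sort((a, b) => {
    const ra = PRIORITY_RANK[a.priority] ?? 3; // missing/unknown priority sorts last
    const rb = PRIORITY_RANK[b.priority] ?? 3;
    if (ra !== rb) return ra - rb;
    return (b.confidence ?? 0) - (a.confidence ?? 0); // higher confidence first
  });
}
```

Usage: `const [firstContact] = rankContacts(result.contacts);`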

Output example

Each company produces one result object:

{
  "company_id": "smartive-ag",
  "company_name": "smartive AG",
  "website_url": "https://smartive.ch",
  "contacts": [
    {
      "name": "Thomas Joss",
      "firstname": "Thomas",
      "lastname": "Joss",
      "salutation": "Mr.",
      "position": "COO / Co-Founder",
      "department": "Management",
      "email": "thomas@smartive.ch",
      "phone": "+41 44 123 45 67",
      "linkedin_url": "https://linkedin.com/in/thomasjoss",
      "source_url": "https://smartive.ch/team/thomas-joss",
      "priority": "decision_maker",
      "confidence": 0.95
    },
    {
      "name": "Mirco Allenspach",
      "firstname": "Mirco",
      "lastname": "Allenspach",
      "salutation": "Mr.",
      "position": "Software Engineer",
      "department": "Engineering",
      "email": "mirco@smartive.ch",
      "source_url": "https://smartive.ch/team",
      "priority": "fallback",
      "confidence": 0.80
    }
  ],
  "company_data": {
    "general_email": "hello@smartive.ch",
    "general_phone": "+41 44 552 22 22",
    "address": "Pfingstweidstrasse 60, 8005 Zurich",
    "industry": "Software Development",
    "description": "Digital agency specializing in web applications and custom software.",
    "employee_count_estimate": "30-50",
    "social_urls": {
      "linkedin": "https://linkedin.com/company/smartive",
      "instagram": "https://instagram.com/smartive_ag"
    }
  },
  "discovered_urls": {
    "team_page_url": "https://smartive.ch/team",
    "contact_page_url": "https://smartive.ch/kontakt",
    "impressum_url": "https://smartive.ch/impressum"
  },
  "metrics": {
    "pages_crawled": 8,
    "llm_calls": 3,
    "tokens_input": 4200,
    "tokens_output": 680,
    "llm_provider": "gemini",
    "llm_model": "gemini-2.0-flash",
    "processing_time_ms": 5200
  },
  "status": "success",
  "scraped_at": "2026-03-05T10:30:00Z"
}

Status values

| Status | Meaning |
|---|---|
| success | Contacts extracted successfully |
| partial | Some pages failed, but contacts were found |
| no_contacts | Website accessible, but no contacts found |
| js_rendering_suspected | No contacts found and JS rendering detected (see JavaScript-rendered pages) |
| website_offline | Domain is parked, expired, or redirects to a domain seller |
| failed | Website could not be reached, or a processing error occurred |
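Downstream code usually branches on `status`. The routing below is one possible policy, not something the actor prescribes. The bucket names are hypothetical, and so is the choice to retry `failed` companies (for example, with `proxyConfiguration` enabled).

```javascript
// Hypothetical routing policy for each documented status value.
const ROUTE_BY_STATUS = {
  success: "use",
  partial: "use",                     // contacts exist despite some failed pages
  no_contacts: "skip",
  js_rendering_suspected: "browser",  // candidates for the Browser actor
  website_offline: "drop",            // parked/expired domain: not a live lead
  failed: "retry",                    // e.g. retry with a proxy
};

// Group dataset items into buckets; unknown statuses go to "review".
function routeResults(items) {
  const buckets = {};
  for (const item of items) {
    const route = ROUTE_BY_STATUS[item.status] ?? "review";
    if (!buckets[route]) buckets[route] = [];
    buckets[route].push(item);
  }
  return buckets;
}
```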

Input parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| companies | array | required | List of companies. Each needs company_id, website_url, and company_name. Optional: team_page_url. |
| offer | string | -- | Your product/service (e.g. "IT Outsourcing"). Enables decision-maker targeting with priority ranking. |
| outputLanguage | string | "en" | Output language for descriptions and positions: en, de, fr, it, es, pt, nl, or auto (match website language). |
| llmProvider | string | "gemini" | Primary AI provider: gemini, groq, or openrouter. |
| fallbackProvider | string | -- | Second-level fallback if the primary provider hits rate limits or errors. |
| fallback2Provider | string | -- | Third-level fallback for maximum reliability. |
| geminiApiKey | string | -- | Google Gemini API key. Free tier: 1 million tokens per minute. |
| llmApiKey | string | -- | API key for Groq or OpenRouter. |
| openrouterApiKey | string | -- | API key for OpenRouter. |
| maxPagesPerSite | integer | 25 | Max pages to crawl per website (1-50). Includes team member detail pages. |
| discoverLinks | boolean | true | Smart link discovery from homepage navigation. Disable only if you provide team_page_url for every company. |
| maxConcurrency | integer | 3 | Parallel HTTP requests (1-10). Higher values are faster but increase rate-limit risk. |
| maxTokensPerCompany | integer | 20000 | Token budget for AI calls per company (500-50000). Higher budgets allow extracting more team members. |
| proxyConfiguration | object | -- | Apify proxy settings. Datacenter proxies work for 95%+ of websites. |
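Before a large run, it can help to check the input client-side against the documented ranges. This is a hypothetical pre-flight check written from the parameter table. The actor performs its own validation, and its exact behavior for out-of-range values (clamping vs. rejecting) is not documented here.

```javascript
// Documented numeric ranges [min, max] from the parameter table.
const RANGES = {
  maxPagesPerSite: [1, 50],
  maxConcurrency: [1, 10],
  maxTokensPerCompany: [500, 50000],
};
const PROVIDERS = ["gemini", "groq", "openrouter"];

// Returns a list of problems; an empty list means the input looks valid.
function validateInput(input) {
  const errors = [];
  if (!Array.isArray(input.companies) || input.companies.length === 0) {
    errors.push("companies must be a non-empty array");
  } else {
    input.companies.forEach((c, i) => {
      for (const field of ["company_id", "website_url", "company_name"]) {
        if (!c[field]) errors.push(`companies[${i}].${field} is required`);
      }
      // Without link discovery, every company needs an explicit team page.
      if (input.discoverLinks === false && !c.team_page_url) {
        errors.push(`companies[${i}].team_page_url is required when discoverLinks is false`);
      }
    });
  }
  for (const [key, [min, max]] of Object.entries(RANGES)) {
    const value = input[key];
    if (value === undefined) continue; // default applies
    if (!Number.isInteger(value) || value < min || value > max) {
      errors.push(`${key} must be an integer between ${min} and ${max}`);
    }
  }
  for (const key of ["llmProvider", "fallbackProvider", "fallback2Provider"]) {
    if (input[key] !== undefined && !PROVIDERS.includes(input[key])) {
      errors.push(`${key} must be one of ${PROVIDERS.join(", ")}`);
    }
  }
  return errors;
}
```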

AI providers

The actor supports three LLM providers with automatic fallback -- just provide your API keys and the system builds the longest possible fallback chain.

| Provider | Free tier | Speed | Model | Best for |
|---|---|---|---|---|
| Gemini (recommended) | 1M tokens/min | Fast | gemini-2.0-flash | Most use cases. Generous free tier. |
| Groq | 30 req/min | Very fast | llama-3.1-8b-instant | Speed-critical workloads. |
| OpenRouter | Some free models | Varies | mistral-small-3.1-24b | Fallback or model variety. |

Auto-discovery fallback chain

Simply provide API keys for the providers you want to use. The actor automatically detects which providers have valid keys and builds a fallback chain:

{
  "geminiApiKey": "YOUR_GEMINI_KEY",
  "llmApiKey": "YOUR_GROQ_KEY",
  "openrouterApiKey": "YOUR_OPENROUTER_KEY"
}

With this setup, the actor automatically uses: Gemini → Groq → OpenRouter. If Gemini's daily quota runs out, it instantly falls back to Groq. If Groq is rate-limited, it falls back to OpenRouter. No manual fallback configuration needed.

Error-aware fallback: The system classifies errors (quota exhaustion vs. transient rate limits vs. auth errors) and provides clear console messages explaining what happened and why it switched providers.

You can still explicitly set llmProvider, fallbackProvider, and fallback2Provider if you want to override the automatic chain order.
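The chain building and error classification described above can be sketched as follows. Several parts are assumptions, and none of this is the actor's published code:

- In auto mode, `llmApiKey` is treated as the Groq key, as in the example.
- "Valid" is approximated as "non-empty".
- Explicit provider settings are placed first, and the remaining keyed providers are appended in the default order.
- Errors are classified by HTTP status plus message text, as a rough heuristic.

```javascript
// Which input field holds each provider's key (auto mode; llmApiKey assumed = Groq).
const KEY_FIELD = { gemini: "geminiApiKey", groq: "llmApiKey", openrouter: "openrouterApiKey" };
const DEFAULT_ORDER = ["gemini", "groq", "openrouter"]; // documented: Gemini -> Groq -> OpenRouter

function buildFallbackChain(input) {
  const hasKey = (p) => typeof input[KEY_FIELD[p]] === "string" && input[KEY_FIELD[p]].trim() !== "";
  // Explicit overrides first (in the order given), then the default order; de-duplicated.
  const explicit = [input.llmProvider, input.fallbackProvider, input.fallback2Provider]
    .filter((p) => p && KEY_FIELD[p]);
  const ordered = [...new Set([...explicit, ...DEFAULT_ORDER])];
  return ordered.filter(hasKey); // keep only providers that have a key
}

// Heuristic error classes (assumed rules, not the actor's).
function classifyError(status, message = "") {
  const msg = message.toLowerCase();
  if (status === 401 || status === 403) return "auth";                    // bad key: switch, don't retry
  if (status === 429 && /quota|daily|exhausted/.test(msg)) return "quota"; // switch provider
  if (status === 429) return "rate_limit";                                 // transient: retry or switch
  if (status >= 500) return "transient";
  return "other";
}
```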

Get a free Gemini API key at aistudio.google.com/apikey.

JavaScript-rendered pages

This actor is HTTP-only and does not run a browser. Websites built with JavaScript frameworks (React, Vue, Angular) may deliver little or no content in their initial HTML. When this happens, the actor automatically detects JS-rendering signals and flags affected companies in the output.

How detection works

After fetching each page (both Phase 1 and Phase 2), the actor inspects the raw HTML for framework markers, empty root elements, and noscript warnings. If JS-rendering indicators are found and no contacts are extracted, the result status is set to "js_rendering_suspected" instead of "no_contacts".

Detection signals

| Signal | Meaning |
|---|---|
| react_root_empty | <div id="root"> or <div id="__next"> with near-empty content |
| react_markers | data-reactroot, __NEXT_DATA__, or _next/static scripts |
| vue_markers | data-v- attributes or Vue.js script references |
| angular_markers | ng-version or ng-app attributes |
| low_text_ratio | Fewer than 200 characters of text in 5,000+ characters of HTML |
| noscript_warning | <noscript> block asking the user to enable JavaScript |
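The signals in the table can be approximated with string and regex checks on the raw HTML. This is a sketch, not the actor's detector. The regexes are assumptions, as are treating "near-empty" as "whitespace-only" and the crude tag stripping used for the text ratio. Only the 200/5,000-character thresholds and the status rule come from this README.

```javascript
// Crude visible-text extraction: drop script/style/noscript, strip tags, collapse whitespace.
function visibleText(html) {
  return html
    .replace(/<(script|style|noscript)\b[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

function detectJsRendering(html) {
  const indicators = [];
  // Assumption: "near-empty" root = the div contains only whitespace.
  if (/<div[^>]*\sid=["'](?:root|__next)["'][^>]*>\s*<\/div>/i.test(html)) indicators.push("react_root_empty");
  if (/data-reactroot|__NEXT_DATA__|_next\/static/.test(html)) indicators.push("react_markers");
  if (/\sdata-v-[0-9a-f]+|\bvue(?:\.[a-z]+)*\.js\b/i.test(html)) indicators.push("vue_markers");
  if (/\sng-(?:version|app)\b/i.test(html)) indicators.push("angular_markers");
  // Documented threshold: < 200 chars of text in 5000+ chars of HTML.
  if (html.length >= 5000 && visibleText(html).length < 200) indicators.push("low_text_ratio");
  if (/<noscript\b[^>]*>[\s\S]*?enable javascript[\s\S]*?<\/noscript>/i.test(html)) indicators.push("noscript_warning");
  return indicators;
}

// Documented rule for companies with no extracted contacts.
function noContactsStatus(indicators) {
  return indicators.length > 0 ? "js_rendering_suspected" : "no_contacts";
}
```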

Flagged output example

When JS rendering is detected, the result includes additional fields:

{
  "company_id": "example-ag",
  "company_name": "Example AG",
  "website_url": "https://example.com",
  "contacts": [],
  "company_data": {},
  "status": "js_rendering_suspected",
  "js_rendering_suspected": true,
  "js_indicators": ["vue_markers", "low_text_ratio"],
  "scraped_at": "2026-03-09T10:00:00Z"
}

Two-stage pipeline with browser fallback

Use the flag to build a two-stage pipeline with the Website Contact Extractor (Browser):

Website Contact Extractor (HTTP, fast, 256MB)
├─ Normal results → use directly
└─ js_rendering_suspected → Website Contact Extractor (Browser, 1-4GB)

Why separate actors? The HTTP actor uses a minimal Docker image (~256MB) with no browser. The browser actor needs Chrome (~1024MB) and costs ~2x more per result. Keeping them separate lets you run the cheap HTTP pass on all companies and only pay for browser rendering on the ~28% that need it.

Automatic chaining: Set enablePlaywrightFallback: true and the HTTP actor automatically triggers the browser actor for JS-flagged companies. The browser run ID is saved in the key-value store.

Filter flagged results in code

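import { Actor } from 'apify';

await Actor.init();
// Open the default dataset of the current run and load all result items
const dataset = await Actor.openDataset();
const { items } = await dataset.getData();
// Results that can be used directly (not flagged as JS-rendered)
const results = items.filter((item) => item.status !== 'js_rendering_suspected');
// Flagged companies: candidates for a re-run with the browser actor
const needsBrowser = items.filter((item) => item.js_rendering_suspected === true);
for (const result of needsBrowser) {
  console.log(`Needs browser: ${result.company_name} (${(result.js_indicators ?? []).join(', ')})`);
}
console.log(`${results.length} usable results, ${needsBrowser.length} flagged for browser rendering`);
await Actor.exit();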

Limitations

  • HTTP-only extraction -- this actor does not run a browser. JS-rendered websites are automatically detected and flagged (see JavaScript-rendered pages above). Server-rendered sites and static HTML work well.
  • ATS portals with JavaScript rendering -- some applicant tracking systems (e.g. CVManager) require JavaScript to display team pages. These are detected and flagged.
  • Rate limits -- some LLM providers have strict rate limits on free tiers. Configure fallback providers to handle rate limit errors automatically.
  • DACH coverage is strongest -- the multilingual keyword discovery works in 7 languages, but German, English, French, and Italian sites have the deepest coverage. Expect ~70-80% success rate for DACH company websites, ~40-50% for international sites.
  • Team size -- for companies with 100+ team members, some contacts may be truncated by the token budget. Increase maxTokensPerCompany if you need exhaustive extraction.

Pricing

This actor uses pay-per-result pricing:

| Event | Price |
|---|---|
| Company processed | $0.01 per company |
| Contact extracted | $0.005 per contact |

Examples:

  • 100 companies, average 5 contacts each = $1.00 + $2.50 = $3.50
  • 500 companies, average 3 contacts each = $5.00 + $7.50 = $12.50

Plus standard Apify platform compute costs (minimal -- the actor uses ~256MB and processes each company in seconds).

LLM API costs are separate but effectively zero when using the Gemini Flash free tier.

Use with Google Maps Scraper

Chain with the Google Maps Scraper for a complete lead generation pipeline:

Google Maps Scraper --> find companies with websites
|
v
Website Contact Extractor --> get team contacts with emails and phones

The Google Maps Scraper has a built-in enableContactExtraction option that automatically passes results to this actor.
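If you chain the two actors yourself rather than through enableContactExtraction, you need to map the Maps results to the `companies` input. The Maps-side field names (`title`, `website`, `placeId`) are assumptions about the scraper's output, so check its dataset schema. Places without a website are skipped because the extractor needs a `website_url`. The helper names are hypothetical.

```javascript
// Turn a company name into a URL-safe id (used when no place id exists).
function slugify(text) {
  return text
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // strip accents: "Zürich" -> "Zurich"
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// places: Google Maps Scraper items (assumed shape { title, website, placeId }).
function placesToCompanies(places) {
  const seen = new Set();
  const companies = [];
  for (const place of places) {
    if (!place.website) continue; // nothing to extract without a website
    const id = place.placeId || slugify(place.title || place.website);
    if (seen.has(id)) continue;    // drop duplicate places
    seen.add(id);
    companies.push({ company_id: id, website_url: place.website, company_name: place.title });
  }
  return companies;
}
```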

FAQ

Do I need my own API key? Yes. You need a free API key from at least one LLM provider. Gemini is recommended -- get a key in 30 seconds at aistudio.google.com/apikey.

How many companies can I process in one run? There is no hard limit. The actor works through the company list in order, running up to maxConcurrency HTTP requests in parallel. For large batches (1000+ companies), consider splitting into multiple runs.
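Splitting a large company list into several run inputs is straightforward. In this sketch, the batch size of 500 is an arbitrary assumption, not a documented limit. The helper names are hypothetical.

```javascript
// Split an array into consecutive chunks of at most `size` items.
function chunk(items, size = 500) {
  if (!Number.isInteger(size) || size < 1) throw new RangeError("size must be a positive integer");
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// One run input per batch; all other settings are copied from baseInput.
function buildRunInputs(baseInput, companies, size = 500) {
  return chunk(companies, size).map((batch) => ({ ...baseInput, companies: batch }));
}
```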

What if a website blocks the request? Enable Apify proxy in the input configuration. Datacenter proxies work for the vast majority of company websites. For heavily protected sites, use residential proxies.

Can I use this actor via the Apify API? Yes. Call the actor via the Apify API and retrieve results from the default dataset. Each company produces one dataset item.
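Here is a minimal sketch using the `apify-client` package for JavaScript:

- `ACTOR_ID` is a placeholder; copy the real ID from the actor's Store page.
- The client is passed in as a parameter so the function can be exercised without network access.
- `client.actor(id).call(input)` waits for the run to finish.
- `client.dataset(id).listItems()` returns the stored items, one per company.

```javascript
// Placeholder: replace with the actor's real ID ("<username>/<actor-name>").
const ACTOR_ID = "<username>/<actor-name>";

/**
 * Run the extractor and return its dataset items (one per company).
 * `client` is an ApifyClient instance from the apify-client package.
 */
async function extractContacts(client, input) {
  // call() starts the run and resolves once the run has finished
  const run = await client.actor(ACTOR_ID).call(input);
  if (run.status !== "SUCCEEDED") {
    throw new Error(`Run ${run.id} ended with status ${run.status}`);
  }
  // Read the results from the run's default dataset
  const { items } = await client.dataset(run.defaultDatasetId).listItems();
  return items;
}

// Usage (requires `npm install apify-client` and an Apify API token):
// const { ApifyClient } = require("apify-client");
// const client = new ApifyClient({ token: process.env.APIFY_TOKEN });
// const items = await extractContacts(client, { companies: [/* ... */], geminiApiKey: "..." });
```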