Website Contact Extractor (HTTP)
Pricing
from $5.00 / 1,000 contacts extracted
Extract contacts from any company website: names, emails, phones, LinkedIn. Offer targeting mode ranks decision-makers by relevance to your pitch so you always reach the right person first. AI-powered, multilingual, no browser needed.
Developer: Alessandro Santamaria
Website Contact Extractor
Extract contact details from any company website using AI. Names, emails, phone numbers, job titles, LinkedIn profiles, and company info -- structured, validated, and ready for your CRM.
What it does
- Finds the right pages automatically -- discovers team, contact, about, and impressum pages from the homepage navigation. Works across languages and URL patterns (/team, /kontakt, /equipe, /chi-siamo).
- Extracts complete contact profiles -- name, salutation, academic title, position, department, email, phone, LinkedIn URL. Also extracts company-level data: general email, HR email, phone, address, industry, employee count, and social media URLs.
- Decision-maker targeting -- provide your offer (e.g. "IT Security Consulting") and the AI ranks contacts by relevance: decision_maker, influencer, or fallback. Get the right person to pitch to, not a random list.
- Anti-hallucination validation -- cross-references every LLM-extracted name against the source HTML. Detects and removes fabricated contacts that appear across multiple companies.
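The two anti-hallucination checks can be sketched in a few lines. This is a minimal illustration, not the actor's source. The function names, the case and whitespace normalization, and the cut-off of 3 companies are all hypothetical. sourceText stands for the page content the contact was extracted from.

```javascript
// Hypothetical sketch of the two anti-hallucination checks; not the actor's source.

// Normalize so "Thomas   Joss" in the HTML matches "thomas joss" from the LLM.
function normalize(text) {
  return text.toLowerCase().replace(/\s+/g, ' ').trim();
}

// Check 1: keep a contact only if its name literally appears in the source text.
function isGroundedInSource(contact, sourceText) {
  const name = normalize(contact.name || '');
  return name.length > 0 && normalize(sourceText).includes(name);
}

// Check 2: a real person rarely appears at many unrelated companies, so names
// found in the results of `threshold` or more companies are treated as likely
// fabrications and removed everywhere (the threshold is an assumption).
function dropCrossCompanyNames(results, threshold = 3) {
  const companiesPerName = new Map();
  for (const result of results) {
    const names = new Set(result.contacts.map((c) => normalize(c.name || '')));
    for (const name of names) {
      companiesPerName.set(name, (companiesPerName.get(name) || 0) + 1);
    }
  }
  return results.map((result) => ({
    ...result,
    contacts: result.contacts.filter(
      (c) => companiesPerName.get(normalize(c.name || '')) < threshold
    ),
  }));
}
```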
How it works
Phase 1: Page discovery and crawling
- Fetches the homepage and scans all navigation links
- Matches links against contact-related keywords in 7 languages (team, kontakt, contact, equipe, about, impressum, etc.)
- Falls back to common URL paths if navigation discovery finds nothing
- Crawls all discovered pages via HTTP (no browser -- fast and lightweight)
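Here is a minimal sketch of the Phase 1 link matching. It assumes the homepage navigation has already been parsed into { href, text } pairs. The keyword list contains only words mentioned on this page, while the actor's real list covers 7 languages. FALLBACK_PATHS and discoverContactPages are hypothetical names.

```javascript
// Hypothetical keyword list (only words mentioned on this page).
const CONTACT_KEYWORDS = ['team', 'kontakt', 'contact', 'equipe', 'about',
  'impressum', 'chi-siamo', 'ueber-uns'];
// Hypothetical default paths tried when navigation yields nothing.
const FALLBACK_PATHS = ['/team', '/kontakt', '/contact', '/about', '/impressum'];

// links: array of { href, text } taken from the homepage navigation.
function discoverContactPages(homepageUrl, links) {
  const home = new URL(homepageUrl);
  const found = new Set();
  for (const { href, text } of links) {
    let url;
    try {
      url = new URL(href, home); // resolves relative hrefs against the homepage
    } catch {
      continue; // skip malformed hrefs
    }
    if (url.hostname !== home.hostname) continue; // stay on the company's own site
    const haystack = `${url.pathname} ${text || ''}`.toLowerCase();
    if (CONTACT_KEYWORDS.some((kw) => haystack.includes(kw))) {
      url.hash = ''; // "/kontakt#form" and "/kontakt" are the same page
      found.add(url.href);
    }
  }
  if (found.size === 0) {
    // Navigation discovery found nothing: fall back to common URL paths.
    for (const path of FALLBACK_PATHS) found.add(new URL(path, home).href);
  }
  return [...found];
}
```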
Phase 2: Team member detail pages
- On team listing pages, identifies links to individual team member profiles
- Crawls each profile page for complete contact data (email, phone, LinkedIn)
- Merges detail page data with the listing page data
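One plausible merge rule for Phase 2 is "detail page wins, listing page fills the gaps". The sketch below assumes contacts are plain objects using the output field names. mergeContact is a hypothetical helper, not the actor's API.

```javascript
// Hypothetical merge rule: detail-page values win, listing-page values fill gaps.
function mergeContact(listing, detail) {
  const merged = { ...listing };
  for (const [key, value] of Object.entries(detail)) {
    // Only overwrite when the detail page actually provides a value.
    if (value !== null && value !== undefined && value !== '') merged[key] = value;
  }
  return merged;
}
```

Letting the detail page win is a reasonable default. Profile pages tend to carry the most specific data (direct email, phone, LinkedIn), while listing pages often show only name and position.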
Extraction and validation
- Cleans raw HTML to remove scripts, styles, and navigation noise
- Sends cleaned HTML to AI for structured extraction
- Validates contacts: filters invalid names, checks email domain consistency
- Cross-company hallucination detection removes names that suspiciously appear in the results for multiple companies
- Assigns confidence scores based on data completeness and source quality
- Detects offline websites, parked domains, and domain seller redirects
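Two of these steps are simple enough to sketch directly: HTML cleaning and the email domain check. The code below is an illustration only. It assumes regex-based cleaning and a strict same-domain rule for emails. cleanHtml and emailMatchesWebsite are hypothetical names, and the actor's real rules may be looser or stricter.

```javascript
// Hypothetical cleaning step: remove elements whose content is noise for
// extraction (scripts, styles, navigation) plus comments, keep other markup.
function cleanHtml(html) {
  return html
    .replace(/<(script|style|nav|noscript)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ')
    .replace(/<!--[\s\S]*?-->/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// Email domain consistency: the email's domain must equal the website's
// domain or be a subdomain of it, ignoring a leading "www.".
function emailMatchesWebsite(email, websiteUrl) {
  const at = email.lastIndexOf('@');
  if (at < 0) return false;
  const emailDomain = email.slice(at + 1).toLowerCase();
  const siteDomain = new URL(websiteUrl).hostname.toLowerCase().replace(/^www\./, '');
  return emailDomain === siteDomain || emailDomain.endsWith('.' + siteDomain);
}
```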
Input example
Minimal input -- just provide companies and a Gemini API key:
{
  "companies": [
    {
      "company_id": "smartive-ag",
      "website_url": "https://smartive.ch",
      "company_name": "smartive AG"
    },
    {
      "company_id": "example-gmbh",
      "website_url": "https://example.de",
      "company_name": "Example GmbH",
      "team_page_url": "https://example.de/ueber-uns/team"
    }
  ],
  "llmProvider": "gemini",
  "geminiApiKey": "your-gemini-api-key"
}
If you already know the team page URL, pass it as team_page_url to skip discovery and save time.
Decision-maker targeting
Find the best contacts to pitch a specific offer:
{
  "companies": [
    {
      "company_id": "1",
      "website_url": "https://example.ch",
      "company_name": "Example AG"
    }
  ],
  "offer": "Social Media Recruiting",
  "llmProvider": "gemini",
  "geminiApiKey": "your-key"
}
Each contact receives a priority field: decision_maker (e.g. Head of HR), influencer (e.g. Marketing Manager), or fallback (other team members).
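When working through contacts, you may want decision-makers first. This is a minimal sketch that uses only the priority and confidence fields from the output example. The rank table, the confidence tie-breaker, and the name sortByPriority are illustrative choices, not documented actor behavior.

```javascript
// Lower rank = contacted first; unknown or missing priorities sort last.
const PRIORITY_RANK = { decision_maker: 0, influencer: 1, fallback: 2 };

// Returns a new array: by priority, then by confidence (highest first).
function sortByPriority(contacts) {
  return [...contacts].sort((a, b) => {
    const ra = PRIORITY_RANK[a.priority] ?? 3;
    const rb = PRIORITY_RANK[b.priority] ?? 3;
    if (ra !== rb) return ra - rb;
    return (b.confidence ?? 0) - (a.confidence ?? 0);
  });
}
```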
Output example
Each company produces one result object:
{
  "company_id": "smartive-ag",
  "company_name": "smartive AG",
  "website_url": "https://smartive.ch",
  "contacts": [
    {
      "name": "Thomas Joss",
      "firstname": "Thomas",
      "lastname": "Joss",
      "salutation": "Mr.",
      "position": "COO / Co-Founder",
      "department": "Management",
      "email": "thomas@smartive.ch",
      "phone": "+41 44 123 45 67",
      "linkedin_url": "https://linkedin.com/in/thomasjoss",
      "source_url": "https://smartive.ch/team/thomas-joss",
      "priority": "decision_maker",
      "confidence": 0.95
    },
    {
      "name": "Mirco Allenspach",
      "firstname": "Mirco",
      "lastname": "Allenspach",
      "salutation": "Mr.",
      "position": "Software Engineer",
      "department": "Engineering",
      "email": "mirco@smartive.ch",
      "source_url": "https://smartive.ch/team",
      "priority": "fallback",
      "confidence": 0.80
    }
  ],
  "company_data": {
    "general_email": "hello@smartive.ch",
    "general_phone": "+41 44 552 22 22",
    "address": "Pfingstweidstrasse 60, 8005 Zurich",
    "industry": "Software Development",
    "description": "Digital agency specializing in web applications and custom software.",
    "employee_count_estimate": "30-50",
    "social_urls": {
      "linkedin": "https://linkedin.com/company/smartive",
      "instagram": "https://instagram.com/smartive_ag"
    }
  },
  "discovered_urls": {
    "team_page_url": "https://smartive.ch/team",
    "contact_page_url": "https://smartive.ch/kontakt",
    "impressum_url": "https://smartive.ch/impressum"
  },
  "metrics": {
    "pages_crawled": 8,
    "llm_calls": 3,
    "tokens_input": 4200,
    "tokens_output": 680,
    "llm_provider": "gemini",
    "llm_model": "gemini-2.0-flash",
    "processing_time_ms": 5200
  },
  "status": "success",
  "scraped_at": "2026-03-05T10:30:00Z"
}
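For a CRM import you usually want one row per contact, not one object per company. This minimal sketch uses the field names from the output example above. The row column names, the fallback to general_phone, and the name toCrmRows are illustrative choices, not part of the actor.

```javascript
// Flatten one result object into one row per contact, copying the
// company-level fields onto every row.
function toCrmRows(result) {
  const company = result.company_data || {};
  return (result.contacts || []).map((c) => ({
    company_id: result.company_id,
    company_name: result.company_name,
    website_url: result.website_url,
    contact_name: c.name,
    position: c.position ?? '',
    email: c.email ?? '',
    // Use the company's general phone when the contact has no direct number.
    phone: c.phone ?? company.general_phone ?? '',
    linkedin_url: c.linkedin_url ?? '',
    priority: c.priority ?? '',
    confidence: c.confidence ?? null,
  }));
}
```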
Status values
| Status | Meaning |
|---|---|
| success | Contacts extracted successfully |
| partial | Some pages failed but contacts were found |
| no_contacts | Website accessible but no contacts found |
| js_rendering_suspected | No contacts found, JS-rendering detected (see JavaScript-rendered pages) |
| website_offline | Domain is parked, expired, or redirects to a domain seller |
| failed | Website could not be reached or processing error |
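A downstream pipeline usually branches on status. The sketch below shows one possible routing. The action names and the decision to retry failed results are assumptions, not actor behavior. nextStep is a hypothetical helper.

```javascript
// Hypothetical routing of a result object by its status value.
function nextStep(result) {
  switch (result.status) {
    case 'success':
    case 'partial': // some pages failed, but the contacts found are usable
      return 'import';
    case 'js_rendering_suspected':
      return 'rerun_with_browser';
    case 'failed': // unreachable site or processing error: may be transient
      return 'retry';
    case 'no_contacts':
    case 'website_offline':
      return 'skip';
    default:
      throw new Error(`Unknown status: ${result.status}`);
  }
}
```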
Input parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| companies | array | required | List of companies. Each needs company_id, website_url, company_name. Optional: team_page_url. |
| offer | string | -- | Your product/service (e.g. "IT Outsourcing"). Enables decision-maker targeting with priority ranking. |
| outputLanguage | string | "en" | Output language for descriptions and positions: en, de, fr, it, es, pt, nl, or auto (match website language). |
| llmProvider | string | "gemini" | Primary AI provider: gemini, groq, or openrouter. |
| fallbackProvider | string | -- | First fallback provider, used if the primary provider hits rate limits or errors. |
| fallback2Provider | string | -- | Second fallback provider, used if the first fallback also fails. |
| geminiApiKey | string | -- | Google Gemini API key. Free tier: 1 million tokens per minute. |
| llmApiKey | string | -- | API key for Groq or OpenRouter. |
| openrouterApiKey | string | -- | API key for OpenRouter. |
| maxPagesPerSite | integer | 25 | Max pages to crawl per website (1-50). Includes team member detail pages. |
| discoverLinks | boolean | true | Smart link discovery from homepage navigation. Disable only if you provide team_page_url for every company. |
| maxConcurrency | integer | 3 | Parallel HTTP requests (1-10). Higher values are faster but increase rate limit risk. |
| maxTokensPerCompany | integer | 20000 | Token budget for AI calls per company (500-50000). Higher budgets allow extracting more team members. |
| proxyConfiguration | object | -- | Apify proxy settings. Datacenter proxies work for 95%+ of websites. |
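Before starting a paid run, you can check the input locally against the defaults and ranges in this table. This is a sketch, not the actor's own input validation, and prepareInput is a hypothetical name. The table only recommends the discoverLinks rule, but this sketch enforces it strictly.

```javascript
// Defaults and ranges copied from the parameter table.
const LIMITS = {
  maxPagesPerSite: { def: 25, min: 1, max: 50 },
  maxConcurrency: { def: 3, min: 1, max: 10 },
  maxTokensPerCompany: { def: 20000, min: 500, max: 50000 },
};

function prepareInput(input) {
  if (!Array.isArray(input.companies) || input.companies.length === 0) {
    throw new Error('companies must be a non-empty array');
  }
  input.companies.forEach((c, i) => {
    for (const field of ['company_id', 'website_url', 'company_name']) {
      if (!c[field]) throw new Error(`companies[${i}] is missing ${field}`);
    }
  });
  // Apply the documented defaults, letting explicit values win.
  const out = { outputLanguage: 'en', llmProvider: 'gemini', discoverLinks: true, ...input };
  for (const [key, { def, min, max }] of Object.entries(LIMITS)) {
    const value = out[key] ?? def;
    if (!Number.isInteger(value) || value < min || value > max) {
      throw new RangeError(`${key} must be an integer between ${min} and ${max}`);
    }
    out[key] = value;
  }
  // The table only advises this; the sketch enforces it strictly.
  if (out.discoverLinks === false && out.companies.some((c) => !c.team_page_url)) {
    throw new Error('discoverLinks=false needs team_page_url for every company');
  }
  return out;
}
```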
AI providers
The actor supports three LLM providers with automatic fallback -- just provide your API keys and the system builds the longest possible fallback chain.
| Provider | Free tier | Speed | Model | Best for |
|---|---|---|---|---|
| Gemini (recommended) | 1M tokens/min | Fast | gemini-2.0-flash | Most use cases. Generous free tier. |
| Groq | 30 req/min | Very fast | llama-3.1-8b-instant | Speed-critical workloads. |
| OpenRouter | Some free models | Varies | mistral-small-3.1-24b | Fallback or model variety. |
Auto-discovery fallback chain
Simply provide API keys for the providers you want to use. The actor automatically detects which providers have valid keys and builds a fallback chain:
{"geminiApiKey": "YOUR_GEMINI_KEY","llmApiKey": "YOUR_GROQ_KEY","openrouterApiKey": "YOUR_OPENROUTER_KEY"}
With this setup, the actor automatically uses: Gemini → Groq → OpenRouter. If Gemini's daily quota runs out, it instantly falls back to Groq. If Groq is rate-limited, it falls back to OpenRouter. No manual fallback configuration needed.
Error-aware fallback: The system classifies errors (quota exhaustion vs. transient rate limits vs. auth errors) and provides clear console messages explaining what happened and why it switched providers.
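The error classification can be approximated from the HTTP status and error message. This sketch rests on assumptions throughout, because each provider reports errors in its own format. classifyLlmError, the status and message fields on the error object, and the keyword patterns are all hypothetical.

```javascript
// Hypothetical classifier for LLM provider errors.
function classifyLlmError(err) {
  const status = err.status ?? 0;
  const message = String(err.message || '').toLowerCase();
  if (status === 401 || status === 403) return 'auth'; // bad or missing key: retrying won't help
  if (status === 429 && /quota|exhausted|per day|daily/.test(message)) return 'quota'; // won't recover soon: switch provider
  if (status === 429) return 'rate_limit'; // transient: back off or switch provider
  if (status >= 500) return 'transient'; // provider-side failure
  return 'other';
}
```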
You can still explicitly set llmProvider, fallbackProvider, and fallback2Provider if you want to override the automatic chain order.
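The chain-building logic can be sketched as follows. This is an illustration under assumptions. Following the example above, llmApiKey is treated as the Groq key, although the actor also accepts it for OpenRouter. This sketch also appends the remaining keyed providers after any explicit ones, which is an assumed behavior. buildFallbackChain is a hypothetical name.

```javascript
// Assumed mapping from provider to input key (llmApiKey used for Groq here).
const KEY_FOR_PROVIDER = { gemini: 'geminiApiKey', groq: 'llmApiKey', openrouter: 'openrouterApiKey' };
const DEFAULT_ORDER = ['gemini', 'groq', 'openrouter'];

function buildFallbackChain(input) {
  const hasKey = (p) => Boolean(input[KEY_FOR_PROVIDER[p]]);
  // Explicitly configured providers come first, in the order given.
  const explicit = [input.llmProvider, input.fallbackProvider, input.fallback2Provider].filter(Boolean);
  // Remaining providers follow in the default order (assumption).
  const order = [...explicit, ...DEFAULT_ORDER.filter((p) => !explicit.includes(p))];
  // Deduplicate and keep only providers that actually have an API key.
  return [...new Set(order)].filter(hasKey);
}
```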
Get a free Gemini API key at aistudio.google.com/apikey.
JavaScript-rendered pages
This actor is HTTP-only and does not run a browser. Websites built with JavaScript frameworks (React, Vue, Angular) may render no content in their initial HTML. When this happens, the actor automatically detects JS-rendering signals and flags affected companies in the output.
How detection works
After fetching each page (both Phase 1 and Phase 2), the actor inspects the raw HTML for framework markers, empty root elements, and noscript warnings. If JS-rendering indicators are found and no contacts are extracted, the result status is set to "js_rendering_suspected" instead of "no_contacts".
Detection signals
| Signal | Meaning |
|---|---|
| react_root_empty | <div id="root"> or <div id="__next"> with near-empty content |
| react_markers | data-reactroot, __NEXT_DATA__, or _next/static scripts |
| vue_markers | data-v- attributes or Vue.js script references |
| angular_markers | ng-version or ng-app attributes |
| low_text_ratio | Fewer than 200 characters of text in 5,000+ characters of HTML |
| noscript_warning | <noscript> block asking the user to enable JavaScript |
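The signals in the table above can be approximated with regular expressions on the raw HTML. This is a hypothetical re-implementation, not the actor's detector. Only the 200/5,000-character thresholds come from the table; the regexes are assumptions and will miss some sites.

```javascript
// Returns the list of JS-rendering indicators found in the raw HTML.
function detectJsRendering(html) {
  const indicators = [];
  // "Near-empty" is approximated here as whitespace-only.
  if (/<div\b[^>]*\sid=["'](?:root|__next)["'][^>]*>\s*<\/div>/i.test(html)) {
    indicators.push('react_root_empty');
  }
  if (/data-reactroot|__NEXT_DATA__|_next\/static/.test(html)) indicators.push('react_markers');
  if (/\sdata-v-[\w-]*|\bvue(?:\.\w+)*\.js\b/i.test(html)) indicators.push('vue_markers');
  if (/\sng-(?:version|app)\b/i.test(html)) indicators.push('angular_markers');
  // Visible text = HTML minus script/style contents and all tags.
  const text = html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ')
    .replace(/<[^>]*>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
  if (html.length >= 5000 && text.length < 200) indicators.push('low_text_ratio');
  if (/<noscript\b[^>]*>[\s\S]*?enable javascript[\s\S]*?<\/noscript>/i.test(html)) {
    indicators.push('noscript_warning');
  }
  return indicators;
}

// Documented rule: with no contacts extracted, any indicator changes the status.
function statusWhenNoContacts(indicators) {
  return indicators.length > 0 ? 'js_rendering_suspected' : 'no_contacts';
}
```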
Flagged output example
When JS rendering is detected, the result includes additional fields:
{
  "company_id": "example-ag",
  "company_name": "Example AG",
  "website_url": "https://example.com",
  "contacts": [],
  "company_data": {},
  "status": "js_rendering_suspected",
  "js_rendering_suspected": true,
  "js_indicators": ["vue_markers", "low_text_ratio"],
  "scraped_at": "2026-03-09T10:00:00Z"
}
Two-stage pipeline with browser fallback
Use the js_rendering_suspected flag to build a two-stage pipeline with the Website Contact Extractor (Browser):
Website Contact Extractor (HTTP, fast, 256MB)
├─ Normal results → use directly
└─ js_rendering_suspected → Website Contact Extractor (Browser, 1-4GB)
Why separate actors? The HTTP actor uses a minimal Docker image (~256MB) with no browser. The browser actor needs Chrome (~1024MB) and costs ~2x more per result. Keeping them separate lets you run the cheap HTTP pass on all companies and only pay for browser rendering on the ~28% that need it.
Automatic chaining: Set enablePlaywrightFallback: true and the HTTP actor automatically triggers the browser actor for JS-flagged companies. The browser run ID is saved in the key-value store.
Filter flagged results in code
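import { Actor } from 'apify';
await Actor.init();
// Read all result items from the run's default dataset.
const dataset = await Actor.openDataset();
const { items } = await dataset.getData();
// Results that can be used as-is.
const results = items.filter((item) => item.status !== 'js_rendering_suspected');
// Companies whose websites appear to need JavaScript rendering.
const needsBrowser = items.filter((item) => item.js_rendering_suspected === true);
// Re-run flagged companies with the browser actor
for (const result of needsBrowser) {
  console.log(`Needs browser: ${result.company_name} (${(result.js_indicators ?? []).join(', ')})`);
}
await Actor.exit();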
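The snippet above only logs flagged companies. To re-run them yourself, you can map the flagged results back into the companies input format. This sketch assumes the browser actor accepts the same company fields as the HTTP actor, which this page does not state explicitly. companiesNeedingBrowser is a hypothetical helper.

```javascript
// Turn flagged HTTP results back into a companies array for a second run.
function companiesNeedingBrowser(results) {
  return results
    .filter((r) => r.js_rendering_suspected === true)
    .map((r) => ({
      company_id: r.company_id,
      company_name: r.company_name,
      website_url: r.website_url,
      // Reuse a discovered team page, if any, so the second run can skip discovery.
      ...(r.discovered_urls?.team_page_url ? { team_page_url: r.discovered_urls.team_page_url } : {}),
    }));
}
```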
Limitations
- HTTP-only extraction -- this actor does not run a browser. JS-rendered websites are automatically detected and flagged (see JavaScript-rendered pages above). Server-rendered sites and static HTML work well.
- ATS portals with JavaScript rendering -- some applicant tracking systems (e.g. CVManager) require JavaScript to display team pages. These are detected and flagged.
- Rate limits -- some LLM providers have strict rate limits on free tiers. Configure fallback providers to handle rate limit errors automatically.
- DACH coverage is strongest -- the multilingual keyword discovery works in 7 languages, but German, English, French, and Italian sites have the deepest coverage. Expect ~70-80% success rate for DACH company websites, ~40-50% for international sites.
- Team size -- for companies with 100+ team members, some contacts may be truncated by the token budget. Increase maxTokensPerCompany if you need exhaustive extraction.
Pricing
This actor uses pay-per-result pricing:
| Event | Price |
|---|---|
| Company processed | $0.01 per company |
| Contact extracted | $0.005 per contact |
Examples:
- 100 companies, average 5 contacts each = $1.00 + $2.50 = $3.50
- 500 companies, average 3 contacts each = $5.00 + $7.50 = $12.50
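The pricing arithmetic can be reproduced with a small helper, for example to budget a run before starting it. This is a sketch that assumes you estimate the average contacts per company yourself. estimateCostUsd is a hypothetical name, and platform compute and LLM API costs are not included.

```javascript
// Prices from the table, in mills (tenths of a cent) so the arithmetic
// stays exact: $0.01 = 10 mills, $0.005 = 5 mills.
const MILLS_PER_COMPANY = 10;
const MILLS_PER_CONTACT = 5;

function estimateCostUsd(companies, avgContactsPerCompany) {
  const contacts = Math.round(companies * avgContactsPerCompany);
  const mills = companies * MILLS_PER_COMPANY + contacts * MILLS_PER_CONTACT;
  return mills / 1000; // dollars; excludes Apify compute and LLM API costs
}
```

estimateCostUsd(100, 5) returns 3.5 and estimateCostUsd(500, 3) returns 12.5, matching the two examples above.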
Plus standard Apify platform compute costs (minimal -- the actor uses ~256MB and processes each company in seconds).
LLM API costs are separate but effectively zero when using Gemini Flash free tier.
Use with Google Maps Scraper
Chain with the Google Maps Scraper for a complete lead generation pipeline:
Google Maps Scraper --> find companies with websites
    |
    v
Website Contact Extractor --> get team contacts with emails and phones
The Google Maps Scraper has a built-in enableContactExtraction option that automatically passes results to this actor.
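If you chain the two actors yourself instead of using enableContactExtraction, you need to map map listings to the companies input. This sketch assumes Google Maps Scraper items carry title, website, and placeId fields. Verify these against your scraper's actual output. companiesFromMaps is a hypothetical helper.

```javascript
// Assumed Google Maps Scraper item fields: title, website, placeId.
function companiesFromMaps(places) {
  const seen = new Set();
  const companies = [];
  for (const place of places) {
    if (!place.website) continue; // nothing to crawl without a website
    let host;
    try {
      host = new URL(place.website).hostname.replace(/^www\./, '');
    } catch {
      continue; // skip malformed website values
    }
    if (seen.has(host)) continue; // branches of one company often share a website
    seen.add(host);
    companies.push({
      company_id: place.placeId || host,
      company_name: place.title || host,
      website_url: place.website,
    });
  }
  return companies;
}
```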
FAQ
Do I need my own API key? Yes. You need a free API key from at least one LLM provider. Gemini is recommended -- get a key in 30 seconds at aistudio.google.com/apikey.
How many companies can I process in one run? There is no hard limit. The actor processes companies sequentially with configurable concurrency. For large batches (1000+ companies), consider splitting into multiple runs.
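Splitting a large list into multiple runs is a simple chunking step. The batch size of 500 below is an arbitrary illustration, not a documented limit, and toBatches is a hypothetical helper.

```javascript
// Split a companies array into consecutive batches, one batch per actor run.
function toBatches(companies, batchSize = 500) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < companies.length; i += batchSize) {
    batches.push(companies.slice(i, i + batchSize));
  }
  return batches;
}
```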
What if a website blocks the request? Enable Apify proxy in the input configuration. Datacenter proxies work for the vast majority of company websites. For heavily protected sites, use residential proxies.
Can I use this actor via the Apify API? Yes. Call the actor via the Apify API and retrieve results from the default dataset. Each company produces one dataset item.
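This is a minimal sketch using the official apify-client package for JavaScript. actor(id).call(input) starts a run and waits for it to finish, and dataset(id).listItems() reads the results. The actor ID is a placeholder, and runContactExtractor is a hypothetical helper. The client is passed in as a parameter, so it can be replaced with a stub in tests.

```javascript
// Usage with the real client (requires `npm install apify-client`):
//   const { ApifyClient } = require('apify-client');
//   const client = new ApifyClient({ token: process.env.APIFY_TOKEN });
//   const items = await runContactExtractor(client, '<ACTOR_ID>', input);
// '<ACTOR_ID>' is a placeholder; copy the real ID from the actor's Store page.
async function runContactExtractor(client, actorId, input) {
  // Start the run and wait until it finishes.
  const run = await client.actor(actorId).call(input);
  // Each company produces one item in the run's default dataset.
  const { items } = await client.dataset(run.defaultDatasetId).listItems();
  return items;
}
```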