
Website Contact Scraper - AI-Powered Lead Finder
AI-powered website scraper that extracts real contact data from company sites! Finds people, positions, emails & phone numbers using LLM technology. Scans team pages, contact sections & company info. Perfect for B2B lead generation and sales research.
Scrape Contact and Person Data from a Single Web Page (JavaScript)
This template scrapes a single web page in Node.js using the Apify SDK. The actor accepts a URL as input, fetches the page with Axios, parses it with Cheerio, and extracts structured person and general contact information. The results are stored in a dataset where each object has the same attributes.
Instead of just parsing headings, this actor looks for names, positions, email addresses, and phone numbers (including Swiss formats) anywhere on the page.
Included features
- Apify SDK – toolkit for building Actors
- Input schema – defines and validates a schema for the actor’s input (URL, maxPages, onlyRelevantPages)
- Dataset – store structured data where each object has the same attributes
- Axios client – promise-based HTTP client for Node.js
- Cheerio – library for parsing and traversing HTML
How it works
- `Actor.getInput()` retrieves the input object, which must include a non-empty `url`. If `url` is empty or only whitespace, the actor exits without any output.
- `axios.get(url)` fetches the HTML of the given page.
- `cheerio.load(response.data)` loads the HTML so you can parse its textual content and element structure.
- The actor removes `<script>` and `<style>` elements, then extracts the full text content of the `<body>` and the page's `<title>`.
- `extractPersonData(text, pageUrl)` scans the page text for:
  - potential names (capitalized words of 1–4 parts, excluding lines containing digits or "@");
  - nearby position titles (matching a list of German/DACH and English keywords);
  - nearby email addresses and phone numbers (Swiss +41 or German +49 patterns).
  - Each match yields an object `{ name, position?, email?, phone?, pageUrl }`; duplicates by name + email are removed.
- `extractGeneralContact(text)` collects all email addresses and phone numbers on the page, then filters out "general" addresses (e.g., `info@`, `kontakt@`, `office@`).
- The result object is assembled as:

  ```js
  {
      url,                       // the page URL
      title,                     // <title> text
      scrapedAt,                 // ISO timestamp
      persons: [ … ],            // array of { name?, position?, email?, phone?, pageUrl }
      generalContact: {
          emails: [ … ],         // all emails found
          phones: [ … ],         // all phone numbers found
          generalEmails: [ … ],  // subset of emails deemed "general"
      },
  }
  ```

- `Actor.pushData(result)` stores that object in the default dataset. If an error occurs (timeout, network, parsing), the actor still pushes a record with `{ url, error: error.message, scrapedAt }`.
- Finally, `Actor.exit()` ends the run.

Condensed sketches of this flow and of the extraction helpers follow.
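Put together, the main flow looks roughly like this. This is a minimal sketch assuming the standard `apify`, `axios`, and `cheerio` packages; the two `extract…` helpers are sketched separately below.

```js
import { Actor } from 'apify';
import axios from 'axios';
import * as cheerio from 'cheerio';

await Actor.init();

const input = await Actor.getInput();
const url = input?.url;

if (!url || !url.trim()) {
    // An empty or whitespace-only url: exit without producing any output.
    await Actor.exit();
} else {
    try {
        const response = await axios.get(url);
        const $ = cheerio.load(response.data);

        // Strip non-content elements before reading the page text.
        $('script, style').remove();
        const text = $('body').text();
        const title = $('title').text();

        await Actor.pushData({
            url,
            title,
            scrapedAt: new Date().toISOString(),
            persons: extractPersonData(text, url),        // sketched below
            generalContact: extractGeneralContact(text),  // sketched below
        });
    } catch (error) {
        // Failed runs still produce a dataset record.
        await Actor.pushData({ url, error: error.message, scrapedAt: new Date().toISOString() });
    }
    await Actor.exit();
}
```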
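The extraction helpers might look like the following. The keyword list and regexes here are illustrative assumptions, not the actor's exact internal patterns:

```js
// Illustrative sketch only: the actor's real keyword list and regexes may differ.
const EMAIL_RE = /[\w.+-]+@[\w-]+\.[\w.]+/g;
const PHONE_RE = /\+4[19][\d\s()\/.-]{7,}/g;  // loose Swiss (+41) / German (+49) match
const POSITION_RE = /(Geschäftsführer|Inhaber|Leiter\w*|CEO|CTO|Manager|Director)/;
const GENERAL_PREFIXES = ['info@', 'kontakt@', 'office@'];

function extractGeneralContact(text) {
    const emails = [...new Set(text.match(EMAIL_RE) ?? [])];
    const phones = [...new Set(text.match(PHONE_RE) ?? [])].map((p) => p.trim());
    return {
        emails,
        phones,
        // "General" addresses are identified by their local-part prefix.
        generalEmails: emails.filter((e) => GENERAL_PREFIXES.some((p) => e.toLowerCase().startsWith(p))),
    };
}

function extractPersonData(text, pageUrl) {
    const lines = text.split('\n').map((l) => l.trim()).filter(Boolean);
    const persons = [];
    const seen = new Set();
    lines.forEach((line, i) => {
        // Candidate name: 1–4 capitalized words, no digits and no "@" on the line.
        if (/[\d@]/.test(line)) return;
        if (!/^(?:[A-ZÄÖÜ][\wäöüß-]+)(?: [A-ZÄÖÜ][\wäöüß-]+){0,3}$/.test(line)) return;
        // Look in the next few lines for a position, email, and phone near the name.
        const context = lines.slice(i + 1, i + 5).join(' ');
        const person = {
            name: line,
            position: context.match(POSITION_RE)?.[0],
            email: context.match(EMAIL_RE)?.[0],
            phone: context.match(PHONE_RE)?.[0]?.trim(),
            pageUrl,
        };
        const key = `${person.name}|${person.email ?? ''}`;
        if (!seen.has(key)) {  // de-duplicate by name + email
            seen.add(key);
            persons.push(person);
        }
    });
    return persons;
}
```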
Input schema

```json
{
  "type": "object",
  "properties": {
    "url": {
      "type": "string",
      "description": "The target web page to scrape (must be a non-empty string)."
    },
    "maxPages": {
      "type": "integer",
      "description": "Maximum number of pages to process (only relevant if you extend to multi-page).",
      "default": 50
    },
    "onlyRelevantPages": {
      "type": "boolean",
      "description": "If true, only pages with certain keywords are scraped (for multi-page use).",
      "default": false
    }
  },
  "required": ["url"]
}
```

- `url` (string, required): the single page to scrape.
- `maxPages` and `onlyRelevantPages` exist to support multi-page crawling but do not affect the single-page logic.
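Accordingly, a minimal `INPUT.json` needs only the required field; the optional fields fall back to their defaults:

```json
{
  "url": "https://example.com/about"
}
```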
Example output
Each run produces one or more dataset items. A successful scrape yields at least one record like:
{"url": "https://example.com/about","title": "About Us – Example Company","scrapedAt": "2025-06-02T14:35:12.345Z","persons": [{"name": "Gabriel Ziegler","position": "Geschäftsführer","email": "info@zieglerhaustechnik.ch","phone": "+41 78 966 88 41","pageUrl": "https://example.com/about"},{"name": "Anna Meier","position": "Marketing Manager","email": "anna.meier@example.com","pageUrl": "https://example.com/about"}],"generalContact": {"emails": ["info@zieglerhaustechnik.ch","kontakt@example.com"],"phones": ["+41 44 123 45 67"],"generalEmails": ["info@zieglerhaustechnik.ch"]}}
If an error occurs, you might see:
{"url": "https://invalid-url.test","error": "getaddrinfo ENOTFOUND invalid-url.test","scrapedAt": "2025-06-02T14:35:12.345Z"}
Each field is guaranteed to exist, and fields like `persons` or `generalContact.emails` may be empty arrays if nothing is found.
Development and local testing
- Clone or pull the Actor:

  ```
  $ apify pull <ActorId>
  $ cd <ActorDirectory>
  ```

- Install dependencies:

  ```
  $ npm install
  ```

- Run locally:

  ```
  $ npx apify run
  ```

  You will be prompted to provide the `url` in the `INPUT.json` file (edit it or pass it via the CLI).

- Inspect output: after completion, check `apify_storage/datasets/default/*.json` for the scraped records.
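If you want to post-process those records programmatically, a small script along these lines can read the local dataset (the path matches the default local storage layout above):

```js
// Sketch: print each stored record's URL plus its persons (or its error).
import { readdirSync, readFileSync } from 'node:fs';
import { join } from 'node:path';

const dir = 'apify_storage/datasets/default';
for (const file of readdirSync(dir).filter((f) => f.endsWith('.json'))) {
    const record = JSON.parse(readFileSync(join(dir, file), 'utf8'));
    console.log(record.url, record.error ?? record.persons);
}
```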