Website Contact Scraper - AI-Powered Lead Finder

Under maintenance

Pricing: Pay per event

Developed by Timo Sieber

Maintained by Community

AI-powered website scraper that extracts real contact data from company sites! Finds people, positions, emails & phone numbers using LLM technology. Scans team pages, contact sections & company info. Perfect for B2B lead generation and sales research.

Scrape Contact and Person Data from a Single Web Page (JavaScript)

This template scrapes a single web page in Node.js using the Apify SDK. The actor accepts a URL as input, fetches the page with Axios, parses it with Cheerio, and extracts structured person and general contact information. The results are stored in a dataset where each object has the same attributes.

Instead of just parsing headings, this actor looks for names, positions, email addresses, and phone numbers (including Swiss formats) anywhere on the page.
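
The actor’s real matching rules live inside its code, but the kind of pattern matching described above can be sketched like this (the regular expressions and sample text below are simplified illustrations, not the actor’s actual patterns):

    // Illustrative only – simplified email and Swiss/German phone patterns.
    const sample = 'Kontakt: Anna Meier, Marketing Manager, anna.meier@example.com, Tel. +41 78 966 88 41';

    const emailPattern = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
    const phonePattern = /\+4[19][\s\d]{8,14}\d/g; // +41 (CH) or +49 (DE), digit groups with optional spaces

    console.log(sample.match(emailPattern)); // [ 'anna.meier@example.com' ]
    console.log(sample.match(phonePattern)); // [ '+41 78 966 88 41' ]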

Included features

  • Apify SDK – toolkit for building Actors
  • Input schema – defines and validates a schema for the actor’s input (URL, maxPages, onlyRelevantPages)
  • Dataset – store structured data where each object has the same attributes
  • Axios client – promise-based HTTP client for Node.js
  • Cheerio – library for parsing and traversing HTML
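
The feature list above maps onto a small set of npm dependencies. A package.json for this template would contain roughly the following (the version ranges are illustrative, not pinned by this description):

    {
      "dependencies": {
        "apify": "^3.0.0",
        "axios": "^1.6.0",
        "cheerio": "^1.0.0"
      }
    }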

How it works

  1. Actor.getInput() retrieves the input object, which must include a non-empty url. If url is empty or only whitespace, the actor exits without any output.

  2. axios.get(url) fetches the HTML of the given page.

  3. cheerio.load(response.data) loads the HTML so you can parse its textual content and element structure.

  4. The actor removes <script> and <style> elements, then extracts the full text content of the <body> and the page’s <title>.

  5. extractPersonData(text, pageUrl) scans the page text for:

    • Potential names (sequences of 1–4 capitalized words, skipping lines that contain digits or “@”).
    • Nearby position titles (matching a list of German/DACH and English keywords).
    • Nearby email addresses and phone numbers (Swiss +41 or German +49 patterns).
    • Each match yields an object { name, position?, email?, phone?, pageUrl }. Duplicates by name+email are removed.
  6. extractGeneralContact(text) collects all email addresses and phone numbers on the page and separately lists the “general” addresses (e.g., info@, kontakt@, office@).

  7. The result object is assembled with:

    {
      url,                   // the page URL
      title,                 // <title> text
      scrapedAt,             // ISO timestamp
      persons: [],           // array of { name?, position?, email?, phone?, pageUrl }
      generalContact: {
        emails: [],          // all emails found
        phones: [],          // all phone numbers found
        generalEmails: []    // subset of emails deemed “general”
      }
    }
  8. Actor.pushData(result) stores that object in the default dataset. If an error occurs (timeout, network, parsing), the actor still pushes a record with { url, error: error.message, scrapedAt }.

  9. Finally, Actor.exit() ends the run.
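
Putting these steps together, the overall flow can be sketched as below. This is a condensed illustration rather than the actor’s exact source: extractGeneralContact is reduced to a few simplified patterns here, and extractPersonData is left out for brevity.

    import { Actor } from 'apify';
    import axios from 'axios';
    import * as cheerio from 'cheerio';

    // Illustrative patterns only – the actor’s real extraction rules are more elaborate.
    const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
    const PHONE_RE = /\+4[19][\s\d]{8,14}\d/g;
    const GENERAL_PREFIXES = ['info@', 'kontakt@', 'office@'];

    // Simplified stand-in for the actor’s extractGeneralContact(text).
    function extractGeneralContact(text) {
        const emails = [...new Set(text.match(EMAIL_RE) ?? [])];
        const phones = [...new Set(text.match(PHONE_RE) ?? [])];
        const generalEmails = emails.filter((e) => GENERAL_PREFIXES.some((p) => e.startsWith(p)));
        return { emails, phones, generalEmails };
    }

    await Actor.init();

    const input = await Actor.getInput();
    const url = (input?.url ?? '').trim();

    if (!url) {
        // Step 1: no usable URL was provided, so exit without producing output.
        await Actor.exit();
    } else {
        try {
            // Steps 2–4: fetch the page, drop <script>/<style>, grab body text and <title>.
            const response = await axios.get(url, { timeout: 30000 });
            const $ = cheerio.load(response.data);
            $('script, style').remove();
            const text = $('body').text();
            const title = $('title').text().trim();

            // Steps 5–7: assemble the result object.
            const result = {
                url,
                title,
                scrapedAt: new Date().toISOString(),
                persons: [], // extractPersonData(text, url) in the real actor
                generalContact: extractGeneralContact(text),
            };

            // Step 8: store the record in the default dataset.
            await Actor.pushData(result);
        } catch (error) {
            // Failed requests still produce a record with the URL and the error message.
            await Actor.pushData({ url, error: error.message, scrapedAt: new Date().toISOString() });
        }

        // Step 9: end the run.
        await Actor.exit();
    }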

Input schema

{
  "type": "object",
  "properties": {
    "url": {
      "type": "string",
      "description": "The target web page to scrape (must be a non-empty string)."
    },
    "maxPages": {
      "type": "integer",
      "description": "Maximum number of pages to process (only relevant if you extend to multi-page).",
      "default": 50
    },
    "onlyRelevantPages": {
      "type": "boolean",
      "description": "If true, only pages with certain keywords are scraped (for multi-page use).",
      "default": false
    }
  },
  "required": ["url"]
}
  • url (string, required): the single page to scrape.
  • maxPages and onlyRelevantPages exist to support multi-page crawling but do not affect single-page logic.
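
A minimal input for a single-page run could therefore look like this (the URL is only a placeholder):

    {
      "url": "https://example.com/about"
    }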

Example output

Each run produces one or more dataset items. A successful scrape yields at least one record like:

{
  "url": "https://example.com/about",
  "title": "About Us – Example Company",
  "scrapedAt": "2025-06-02T14:35:12.345Z",
  "persons": [
    {
      "name": "Gabriel Ziegler",
      "position": "Geschäftsführer",
      "email": "info@zieglerhaustechnik.ch",
      "phone": "+41 78 966 88 41",
      "pageUrl": "https://example.com/about"
    },
    {
      "name": "Anna Meier",
      "position": "Marketing Manager",
      "email": "anna.meier@example.com",
      "pageUrl": "https://example.com/about"
    }
  ],
  "generalContact": {
    "emails": [
      "info@zieglerhaustechnik.ch",
      "kontakt@example.com"
    ],
    "phones": [
      "+41 44 123 45 67"
    ],
    "generalEmails": [
      "info@zieglerhaustechnik.ch"
    ]
  }
}

If an error occurs, you might see:

{
  "url": "https://invalid-url.test",
  "error": "getaddrinfo ENOTFOUND invalid-url.test",
  "scrapedAt": "2025-06-02T14:35:12.345Z"
}

In a successful record, every field shown above is present; fields like persons or generalContact.emails may simply be empty arrays if nothing is found.

Development and local testing

  1. Clone or pull the Actor

    apify pull <ActorId>
    cd <ActorDirectory>
  2. Install dependencies

    npm install
  3. Run locally

    npx apify run

    Provide the url in the INPUT.json file before running (edit the file directly or pass the input via the CLI).

  4. Inspect the output. After the run completes, check apify_storage/datasets/default/*.json for the scraped records.
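
If you want to inspect those records programmatically rather than opening the JSON files by hand, a small optional helper along these lines works with the local storage layout mentioned above (it is not part of the template itself):

    import { readdirSync, readFileSync } from 'node:fs';
    import { join } from 'node:path';

    // Print a one-line summary for every record in the default local dataset.
    const datasetDir = join('apify_storage', 'datasets', 'default');

    for (const file of readdirSync(datasetDir).filter((f) => f.endsWith('.json'))) {
        const record = JSON.parse(readFileSync(join(datasetDir, file), 'utf8'));
        console.log(record.url, '->', (record.persons ?? []).length, 'person(s) found');
    }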