# Website Contact Data Extractor (`techionik9993/website-contact-data-extractor`) Actor

Extract public business contact data from websites, including validated emails, phone numbers, contact/about pages, and social profiles. Delivers clean, deduplicated JSON output for CRM enrichment, lead generation, prospecting, research, and automation workflows.

- **URL**: https://apify.com/techionik9993/website-contact-data-extractor.md
- **Developed by:** [Techionik](https://apify.com/techionik9993) (community)
- **Categories:** Automation, Developer tools, Lead generation
- **Stats:** 13 total users, 0 monthly users, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $0.28 / result

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.
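
As a rough budgeting aid, the per-result charge can be multiplied out as shown in this minimal sketch. The function name `estimate_minimum_cost` is hypothetical, and the figure is only a lower bound: it uses the listed starting price of $0.28 per result and does not include Apify platform usage, which is billed separately.

```python
def estimate_minimum_cost(result_count: int, price_per_result: float = 0.28) -> float:
    """Return a lower-bound cost in USD for the per-result events only.

    Platform usage is charged on top of this and is NOT included here.
    """
    if result_count < 0:
        raise ValueError("result_count must not be negative")
    return round(result_count * price_per_result, 2)


if __name__ == "__main__":
    for count in (1, 100, 1000):
        print(count, "results: at least $", estimate_minimum_cost(count))
```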

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, built for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
The word "Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Website Contact Data Extractor

Extract publicly available business contact data from company websites in a clean, structured, and automation-ready format.

Website Contact Data Extractor is built for CRM enrichment, lead generation, prospecting, business research, data collection, and automation workflows. It extracts validated emails, phone numbers, contact pages, about pages, and major social media links from business websites.

### What This Actor Does

Website Contact Data Extractor helps you collect public contact and social information from websites without manually checking each page.

For each website, it can extract:

- Business email addresses
- Public phone numbers
- Contact page URL
- About page URL
- Facebook profile link
- Instagram profile link
- LinkedIn profile or company page
- Twitter / X profile link
- YouTube channel link
- Website domain

The output is clean, deduplicated, and returned in a consistent JSON structure.

### Best For

- Lead generation
- Sales prospecting
- CRM enrichment
- Business contact discovery
- Company research
- Market research
- Competitive analysis
- Website intelligence collection
- Contact database building
- Automation workflows using Apify, Make, n8n, Zapier, Google Sheets, or custom APIs

### Data Extracted

Each processed website returns the following fields:

| Field | Description |
|---|---|
| domain | Root domain of the website |
| emails | Valid public email addresses found on the website |
| phones | Public phone numbers found on the website |
| contactPage | Detected contact page URL |
| aboutPage | Detected about page URL |
| facebook | Facebook profile/page URL |
| instagram | Instagram profile URL |
| linkedin | LinkedIn profile/company URL |
| twitter | Twitter / X profile URL |
| youtube | YouTube channel URL |

### How It Works

1. Website Contact Data Extractor starts from the website URLs you provide.
2. It loads each website using Crawlee and CheerioCrawler.
3. It scans the page for public email addresses and phone numbers.
4. It checks mailto links, visible page text, tel links, footer content, and JSON-LD structured data.
5. It discovers contact and about pages from internal links.
6. It follows selected same-domain pages to improve contact discovery.
7. It detects major social media links.
8. It removes duplicate emails and phone numbers.
9. It filters invalid emails, invalid phone-like strings, share links, policy links, and support/social noise.
10. It saves clean structured results to the Apify dataset.
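
The email-related steps above (scanning mailto links and page text, deduplicating, and filtering invalid values) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Actor's implementation, which is built on Crawlee and CheerioCrawler. The regular expressions, the `extract_emails` name, and the list of ignored asset suffixes are all hypothetical choices.

```python
import re
from typing import List
from urllib.parse import unquote

# Assumed, deliberately simple email pattern; real validation may be stricter.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Captures the address part of href="mailto:..." up to any "?subject=" query.
MAILTO_RE = re.compile(r'href=["\']mailto:([^"\'?]+)', re.IGNORECASE)
# Asset names such as "logo@2x.png" look like emails but are not.
IGNORED_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


def extract_emails(html: str) -> List[str]:
    """Collect emails from mailto links and raw markup, lowercased and deduplicated."""
    candidates = [unquote(match) for match in MAILTO_RE.findall(html)]
    candidates += EMAIL_RE.findall(html)

    seen = set()
    emails = []
    for candidate in candidates:
        email = candidate.strip().lower()
        if not EMAIL_RE.fullmatch(email) or email.endswith(IGNORED_SUFFIXES):
            continue  # drop malformed values and image-file false positives
        if email not in seen:
            seen.add(email)
            emails.append(email)
    return emails


if __name__ == "__main__":
    page = (
        '<a href="mailto:Info@Example.com?subject=Hi">Mail us</a>'
        "<p>sales@example.com or info@example.com</p>"
        '<img src="/img/logo@2x.png">'
    )
    print(extract_emails(page))
```

Matching against the raw markup keeps the sketch dependency-free; a production crawler would more likely query parsed elements and visible text separately.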

### Input Options

#### Website URLs

Add one or more website URLs to extract contact information from.

Example input:

    {
      "startUrls": [
        {
          "url": "https://www.example.com"
        }
      ]
    }

You can process one website or multiple websites in the same run.

### Output Example

Example output item:

    {
      "domain": "example.com",
      "emails": ["info@example.com"],
      "phones": ["+1-800-123-4567"],
      "contactPage": "https://example.com/contact",
      "aboutPage": "https://example.com/about",
      "facebook": "https://facebook.com/example",
      "instagram": "https://instagram.com/example",
      "linkedin": "https://linkedin.com/company/example",
      "twitter": "https://x.com/example",
      "youtube": "https://youtube.com/@example"
    }

If a field is not found, its value is `null` or an empty array, depending on the field type.
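
Because missing values can arrive as `null` or as an empty array, downstream code may want to coerce every item into one predictable shape. The sketch below assumes that `emails` and `phones` are the array fields and that the remaining fields are single URLs, as the example output suggests; `normalize_item` is a hypothetical helper, not part of any Apify library.

```python
from typing import Any, Dict

# Assumption based on the example output: these two fields hold arrays.
LIST_FIELDS = ("emails", "phones")
# Assumption: these fields hold a single URL string or null.
URL_FIELDS = (
    "contactPage", "aboutPage", "facebook",
    "instagram", "linkedin", "twitter", "youtube",
)


def normalize_item(item: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy where list fields are always lists and URL fields are str or None."""
    record: Dict[str, Any] = {"domain": item.get("domain")}
    for field in LIST_FIELDS:
        record[field] = list(item.get(field) or [])
    for field in URL_FIELDS:
        record[field] = item.get(field) or None
    return record


if __name__ == "__main__":
    print(normalize_item({"domain": "example.com", "emails": None, "facebook": ""}))
```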

### Key Features

- Extracts public business emails
- Extracts public phone numbers
- Detects contact pages
- Detects about pages
- Extracts major social media links
- Parses JSON-LD structured data
- Reads mailto and tel links
- Uses footer-based phone detection
- Uses same-domain page discovery
- Cleans and validates extracted emails
- Filters invalid phone-like values
- Removes duplicate results
- Avoids social share and policy links
- Returns structured JSON output
- Simple input configuration
- Easy integration with automation tools

### Supported Social Platforms

Website Contact Data Extractor can detect links from:

- Facebook
- Instagram
- LinkedIn
- Twitter / X
- YouTube

### Typical Use Cases

#### CRM Enrichment

Find public emails, phone numbers, and social profiles to complete company records in your CRM.

#### Lead Generation

Collect publicly available business contact data from company websites for prospecting workflows.

#### Sales Prospecting

Build structured contact datasets that can be used for outreach preparation, research, and qualification.

#### Business Research

Gather contact pages, about pages, and social links to better understand companies and their online presence.

#### Automation Pipelines

Send extracted data to Apify integrations, Google Sheets, Make, n8n, Zapier, databases, or custom APIs.

#### Market Research

Analyze company websites and collect public contact signals at scale.

### Recommended Usage

#### For One Website

Use one start URL when you only need contact data from a single company website.

Example:

    https://www.example.com

#### For Multiple Websites

Add multiple URLs in the input to process many company domains in one run.

Example:

    {
      "startUrls": [
        { "url": "https://www.example1.com" },
        { "url": "https://www.example2.com" },
        { "url": "https://www.example3.com" }
      ]
    }
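
When the list of websites comes from a spreadsheet or CRM export, the input object can be assembled programmatically. This is a minimal sketch: `build_actor_input` is a hypothetical helper, and prepending `https://` to bare domains is an assumption made here because the input schema asks for a full URL including the scheme.

```python
from typing import Dict, Iterable, List
from urllib.parse import urlparse


def build_actor_input(websites: Iterable[str]) -> Dict[str, List[Dict[str, str]]]:
    """Turn a list of domains or URLs into the {"startUrls": [...]} input shape."""
    start_urls: List[Dict[str, str]] = []
    seen = set()
    for raw in websites:
        url = raw.strip()
        if not url:
            continue  # ignore blank rows
        if "://" not in url:
            url = "https://" + url  # assumption: default to HTTPS for bare domains
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Not a valid website URL: {raw!r}")
        if url not in seen:
            seen.add(url)
            start_urls.append({"url": url})
    return {"startUrls": start_urls}


if __name__ == "__main__":
    print(build_actor_input(["example1.com", "https://www.example2.com", ""]))
```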

### Output Access

After the run finishes, you can access results from:

- Apify Dataset
- Dataset API
- Overview table
- Raw JSON output
- CSV export
- Excel export
- JSON export
- XML export
- Apify integrations
- Webhooks
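
For the Dataset API route, exports in the formats listed above are selected with the `format` query parameter of the dataset items endpoint. The sketch below only builds the URL; the helper name `dataset_export_url` is hypothetical, and passing the token as a query parameter mirrors the OpenAPI specification further down.

```python
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
# Formats named in this README; the API may support others as well.
EXPORT_FORMATS = ("json", "csv", "xlsx", "xml")


def dataset_export_url(dataset_id: str, fmt: str = "json", token: Optional[str] = None) -> str:
    """Build a dataset items URL for the requested export format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"Unsupported format: {fmt!r}")
    query = {"format": fmt}
    if token:
        query["token"] = token
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(query)}"


if __name__ == "__main__":
    # "<DATASET_ID>" is a placeholder for run["defaultDatasetId"].
    print(dataset_export_url("<DATASET_ID>", "csv"))
```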

### Data Quality Approach

Website Contact Data Extractor is designed to prioritize clean and reliable results.

It uses:

- Strict email format validation
- Phone number length checks
- Phone normalization
- Duplicate removal
- Same-domain crawling
- Contact/about page discovery
- Social URL sanitization
- Filtering for share, help, privacy, policy, and support links

This helps reduce noisy results and keeps the output useful for professional workflows.
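
To make the phone-related checks concrete, here is a small sketch of normalization, a length check, and duplicate removal. The thresholds are assumptions (a 7-digit minimum chosen for illustration and the 15-digit E.164 maximum), the digits-only output format is a simplification that differs from the hyphenated example shown earlier, and `normalize_phone` and `dedupe_phones` are hypothetical names.

```python
import re
from typing import Iterable, List, Optional

MIN_DIGITS = 7   # assumed lower bound to reject years, ZIP codes, and similar noise
MAX_DIGITS = 15  # maximum length of an E.164 number


def normalize_phone(raw: str) -> Optional[str]:
    """Strip formatting from a phone-like string; return None if the length is implausible."""
    text = raw.strip()
    if text.lower().startswith("tel:"):
        text = text[4:].strip()
    has_plus = text.startswith("+")
    digits = re.sub(r"\D", "", text)
    if not MIN_DIGITS <= len(digits) <= MAX_DIGITS:
        return None
    return ("+" if has_plus else "") + digits


def dedupe_phones(values: Iterable[str]) -> List[str]:
    """Normalize each value and keep the first occurrence of every distinct number."""
    seen = set()
    phones = []
    for value in values:
        phone = normalize_phone(value)
        if phone and phone not in seen:
            seen.add(phone)
            phones.append(phone)
    return phones


if __name__ == "__main__":
    print(dedupe_phones(["+1 (800) 123-4567", "tel:+1-800-123-4567", "2024"]))
```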

### Notes and Limitations

- Only publicly available website information is collected.
- Websites that hide contact details behind forms may return limited data.
- Websites that require login are not supported.
- Heavily JavaScript-rendered websites may return fewer results.
- Some companies may not publish emails or phone numbers directly on their website.
- The Actor focuses on quality and stability rather than aggressive deep crawling.
- Results depend on how the target website structures its public information.

### Why Use Website Contact Data Extractor

Website Contact Data Extractor saves time by automatically collecting public contact and social information from business websites.

It is useful for teams and professionals who need structured contact data for CRM enrichment, prospecting, sales research, market research, reporting, and automation workflows.

### Technology

Built with:

- Apify SDK
- Crawlee
- CheerioCrawler
- Cheerio

### Status

Production-ready for public website contact and social information extraction.

# Actor input Schema

## `startUrls` (type: `array`):

Enter one or more website URLs to extract contact information from.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://www.cooley.com",
      "description": "Example website (you can change this)"
    }
  ]
}
````

# Actor output Schema

## `overview` (type: `string`):

No description

## `json` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://www.cooley.com",
            "description": "Example website (you can change this)"
        }
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("techionik9993/website-contact-data-extractor").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [
        {
            "url": "https://www.cooley.com",
            "description": "Example website (you can change this)",
        }
    ]
}

# Run the Actor and wait for it to finish
run = client.actor("techionik9993/website-contact-data-extractor").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://www.cooley.com",
      "description": "Example website (you can change this)"
    }
  ]
}' |
apify call techionik9993/website-contact-data-extractor --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=techionik9993/website-contact-data-extractor",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Website Contact Data Extractor",
        "description": "Extract public business contact data from websites, including validated emails, phone numbers, contact/about pages, and social profiles. Delivers clean, deduplicated JSON output for CRM enrichment, lead generation, prospecting, research, and automation workflows.",
        "version": "0.0",
        "x-build-id": "bktJnLPKjUWfxNtWF"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/techionik9993~website-contact-data-extractor/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-techionik9993-website-contact-data-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/techionik9993~website-contact-data-extractor/runs": {
            "post": {
                "operationId": "runs-sync-techionik9993-website-contact-data-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/techionik9993~website-contact-data-extractor/run-sync": {
            "post": {
                "operationId": "run-sync-techionik9993-website-contact-data-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls"
                ],
                "properties": {
                    "startUrls": {
                        "title": "Website URLs",
                        "type": "array",
                        "description": "Enter one or more website URLs to extract contact information from.",
                        "items": {
                            "type": "object",
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "Website URL",
                                    "description": "Full website URL including https://"
                                }
                            },
                            "required": [
                                "url"
                            ]
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
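
For projects that cannot use a client library, the `run-sync-get-dataset-items` path from the specification above can be called with plain HTTP. The sketch below uses only the Python standard library; the function names and the 300-second timeout are illustrative assumptions, and the response is assumed to be a JSON array of dataset items.

```python
import json
import urllib.request
from typing import Any, List
from urllib.parse import urlencode

ACTOR_PATH = "techionik9993~website-contact-data-extractor"
API_BASE = "https://api.apify.com/v2"


def build_run_sync_request(token: str, start_urls: List[str]) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{urlencode({'token': token})}"
    body = json.dumps({"startUrls": [{"url": u} for u in start_urls]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor(token: str, start_urls: List[str], timeout: float = 300.0) -> Any:
    """Send the request, wait for the run to finish, and return the parsed JSON response."""
    request = build_run_sync_request(token, start_urls)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    prepared = build_run_sync_request("<YOUR_API_TOKEN>", ["https://www.example.com"])
    print(prepared.get_method(), prepared.full_url)
    # items = run_actor("<YOUR_API_TOKEN>", ["https://www.example.com"])  # performs a paid run
```

Separating request construction from sending makes it possible to inspect or log the request before triggering a billable run.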
