# 📧Extract Emails, Socials & Contacts from Any Website✨ (`jumbled_falcon/email-phone-social-finder-website`) Actor

Instantly extract emails, social media profiles, phone numbers, and contact details from any website. Save hours of manual research and build targeted lead lists effortlessly. Handles bulk lists of 1000+ websites. Extracts from contact pages, about pages, and homepage automatically.

- **URL**: https://apify.com/jumbled_falcon/email-phone-social-finder-website.md
- **Developed by:** [Sept Solutions](https://apify.com/jumbled_falcon) (community)
- **Categories:** Lead generation
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

From $1.00 / 1,000 websites processed

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
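As a rough illustration of the per-event part of the bill, the sketch below estimates the charge for a batch of websites. It assumes the listed starting price of $1.00 per 1,000 processed websites applies to your plan; `estimate_event_cost` is a hypothetical helper, and Apify platform usage is charged on top and is not modeled here.

```python
from decimal import Decimal

# Assumption: the listed starting price of $1.00 per 1,000 processed websites.
# Platform usage is billed separately and is not included in this estimate.
PRICE_PER_1000_WEBSITES = Decimal("1.00")


def estimate_event_cost(website_count: int) -> Decimal:
    """Return the estimated per-event charge in USD for a number of websites."""
    if website_count < 0:
        raise ValueError("website_count must be non-negative")
    cost = PRICE_PER_1000_WEBSITES * website_count / 1000
    return cost.quantize(Decimal("0.0001"))


print(estimate_event_cost(2500))  # 2.5000
```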

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
The word "Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Website Contact & Social Extractor

Apify Actor that extracts contact information, social media links, and key page URLs from websites. Built with Crawlee `PlaywrightCrawler` and migrated from a production Puppeteer extraction pipeline.

### Features

- **Email extraction** — scans visible page text and `mailto:` links, deduplicates and normalizes addresses
- **Phone extraction** — matches US-style numbers in body text and `tel:` links, formats as `(AAA) BBB-CCCC` with optional `+1` prefix when explicitly present
- **Social links** — finds the first link for LinkedIn, Facebook, Instagram, Twitter/X, YouTube, TikTok, Pinterest, Snapchat, WhatsApp, Telegram, and Skype
- **Contact & about pages** — discovers and records the first contact and about page URLs on the homepage
- **Sub-page crawling** — follows same-origin links matching configurable keywords (default: `contact`, `about`, `locations`) and merges data from up to `maxLinkPages` sub-pages
- **Concurrency** — processes multiple websites in parallel via Apify/Crawlee autoscaled pool
- **Anti-bot handling** — optional stealth plugin, browser hardening, Cloudflare challenge wait, and Crawlee `handleCloudflareChallenge`
- **Resource optimization** — blocks images, media, fonts, and stylesheets on the main page (safe for text/href extraction)
- **Per-URL error isolation** — a failed URL does not stop the rest of the run
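The email feature in the list above can be pictured with a short Python sketch. This is not the Actor's source code: the regular expression is a deliberately simple assumption, and `extract_emails` is a hypothetical helper that takes the visible page text and a list of link hrefs.

```python
import re

# Assumption: a deliberately simple pattern; real-world email matching is messier.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(page_text: str, hrefs: list[str]) -> list[str]:
    """Collect emails from visible text and mailto: links, lowercased and deduplicated."""
    candidates = EMAIL_RE.findall(page_text)
    for href in hrefs:
        if href.lower().startswith("mailto:"):
            # Drop the scheme and any ?subject=... query part.
            address = href[len("mailto:"):].split("?", 1)[0]
            candidates.extend(EMAIL_RE.findall(address))
    # A dict keeps first-seen order while removing duplicates.
    seen = {}
    for email in candidates:
        seen.setdefault(email.strip().lower(), None)
    return list(seen)
```

Lowercasing before deduplication means `Info@Example.com` in the page text and `mailto:info@example.com` in a link collapse into one entry.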

### Input

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `websiteUrls` | `string[]` | *(required)* | Websites to analyze. `https://` is added automatically if missing. |
| `maxConcurrency` | `integer` | `5` | Max parallel browser tabs |
| `maxLinkPages` | `integer` | `5` | Max contact/about/location sub-pages per site |
| `requestTimeoutSecs` | `integer` | `30` | Main page navigation timeout (seconds) |
| `stealth` | `boolean` | `true` | Enable stealth plugin and browser hardening |
| `blockHeavyResources` | `boolean` | `true` | Block images, media, fonts, stylesheets on main page |
| `retries` | `integer` | `2` | Retries after first attempt (2 = up to 3 total tries) |
| `retryDelayMs` | `integer` | `2000` | Delay between retries (milliseconds) |
| `finderKeywords` | `string[]` | `["contact","about","locations"]` | Keywords matched in sub-page link hrefs |
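To make the interaction between `finderKeywords` and `maxLinkPages` concrete, here is a minimal Python sketch of same-origin, keyword-based link selection. It is an illustration under stated assumptions, not the Actor's implementation: `select_sub_pages` is a hypothetical helper, keywords are matched against the URL path only, and hosts are compared case-sensitively.

```python
from urllib.parse import urljoin, urlsplit

DEFAULT_KEYWORDS = ("contact", "about", "locations")  # defaults from the table above


def select_sub_pages(base_url, hrefs, keywords=DEFAULT_KEYWORDS, max_link_pages=5):
    """Return up to max_link_pages same-origin URLs whose path contains a keyword."""
    base = urlsplit(base_url)
    selected = []
    for href in hrefs:
        if len(selected) >= max_link_pages:
            break
        absolute = urljoin(base_url, href)  # resolve relative links
        parts = urlsplit(absolute)
        same_origin = (parts.scheme, parts.netloc) == (base.scheme, base.netloc)
        matches = any(keyword in parts.path.lower() for keyword in keywords)
        if same_origin and matches and absolute not in selected:
            selected.append(absolute)
    return selected
```

With this approach a link such as `https://other.com/about` is skipped even though it matches a keyword, because it points to a different origin.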

#### Example input

```json
{
  "websiteUrls": [
    "https://example.com",
    "https://example.org"
  ],
  "maxConcurrency": 5,
  "maxLinkPages": 5,
  "requestTimeoutSecs": 30,
  "stealth": true,
  "blockHeavyResources": true,
  "retries": 2
}
````

### Output

One dataset item per input URL.

#### Success example

```json
{
  "url": "https://example.com",
  "title": "Example Domain",
  "phones": ["(555) 123-4567"],
  "emails": ["info@example.com"],
  "linkedin": "",
  "facebook": "",
  "instagram": "",
  "twitter": "",
  "youtube": "",
  "tiktok": "",
  "pinterest": "",
  "snapchat": "",
  "whatsapp": "",
  "telegram": "",
  "skype": "",
  "contact_page_url": "https://example.com/contact",
  "about_page_url": "https://example.com/about"
}
```

#### Failure example

```json
{
  "url": "https://unreachable.example",
  "error": "page.goto: Timeout 30000ms exceeded."
}
```

#### Output fields

| Field | Type | Description |
|-------|------|-------------|
| `url` | string | Input website URL |
| `title` | string | HTML `<title>` |
| `phones` | string\[] | US-formatted phone numbers |
| `emails` | string\[] | Deduplicated emails |
| `linkedin` … `skype` | string | First matching social link (empty if none) |
| `contact_page_url` | string | First contact page href found |
| `about_page_url` | string | First about page href found |
| `error` | string | Present only when extraction failed |
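Because `error` is present only on failed items, downloaded results can be split into successes and failures with a simple check. The Python sketch below assumes items shaped like the examples above; `split_results` and `social_links` are hypothetical helper names.

```python
SOCIAL_FIELDS = [
    "linkedin", "facebook", "instagram", "twitter", "youtube", "tiktok",
    "pinterest", "snapchat", "whatsapp", "telegram", "skype",
]


def split_results(items):
    """Separate dataset items into successes and failures using the `error` field."""
    succeeded = [item for item in items if "error" not in item]
    failed = [item for item in items if "error" in item]
    return succeeded, failed


def social_links(item):
    """Return only the non-empty social fields of a successful item."""
    return {field: item[field] for field in SOCIAL_FIELDS if item.get(field)}


items = [
    {"url": "https://example.com", "emails": ["info@example.com"], "linkedin": ""},
    {"url": "https://unreachable.example", "error": "page.goto: Timeout 30000ms exceeded."},
]
succeeded, failed = split_results(items)
print(len(succeeded), len(failed))  # 1 1
```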

### Usage

#### Apify Console

1. Open the Actor in [Apify Console](https://console.apify.com).
2. Paste your input JSON.
3. Click **Start**.
4. Download results from the **Dataset** tab (JSON, CSV, Excel).

#### Apify API

```bash
curl -X POST "https://api.apify.com/v2/acts/jumbled_falcon~email-phone-social-finder-website/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"websiteUrls":["https://example.com"]}'
```

#### Apify CLI

```bash
apify call jumbled_falcon/email-phone-social-finder-website --input '{"websiteUrls":["https://example.com"]}'
```

### Local development

#### Prerequisites

- Node.js 18+
- [Apify CLI](https://docs.apify.com/cli) (optional, recommended)

#### Setup

```bash
cd backend
npm install
```

#### Run locally

Create `storage/key_value_stores/default/INPUT.json`:

```json
{
  "websiteUrls": ["https://example.com"],
  "maxConcurrency": 1,
  "stealth": true
}
```

Then run:

```bash
npm start
```

Or with Apify CLI:

```bash
apify run -p
```

Results are written to `storage/datasets/default/`.

### Apify deployment

```bash
cd backend
apify login
apify push
```

The Actor uses the `apify/actor-node-playwright-chrome:20` Docker image defined in `Dockerfile`.

### Actor Store description

**Website Contact & Social Extractor** enriches lead lists and company databases by automatically collecting emails, phone numbers, social profiles, and contact/about page URLs from any website.

Ideal for:

- **Lead generation** — build contact lists from company websites
- **Sales enrichment** — add phones and social links to CRM records
- **Market research** — collect public contact data at scale
- **Due diligence** — verify how businesses present contact information online

Runs fully in the cloud on Apify with configurable concurrency, retries, and anti-bot options.

### Limitations

- **US phone bias** — phone formatting targets US numbers; international numbers may appear unformatted
- **Same-origin sub-pages only** — contact/about/location links on external domains are not followed
- **Static extraction** — reads rendered DOM text and links; does not execute custom per-site scraping logic
- **Bot-protected sites** — heavily protected sites (Cloudflare, CAPTCHA) may return partial or empty results
- **No deep crawl** — only the homepage plus up to `maxLinkPages` keyword-matched sub-pages are visited
- **First-match social links** — returns the first anchor per platform, not all profiles
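The US phone bias is easier to see with an example. The Python sketch below formats US-style numbers as `(AAA) BBB-CCCC` and keeps a `+1` prefix only when it is explicitly present. The pattern is a simplified assumption (the Actor's real matching rules are not documented here), and `format_us_phones` is a hypothetical helper.

```python
import re

# Assumption: a simplified US pattern, not the Actor's actual expression.
US_PHONE_RE = re.compile(
    r"(?<!\d)(\+1[\s.-]?)?\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})(?!\d)"
)


def format_us_phones(text: str) -> list[str]:
    """Find US-style numbers and format them, keeping an explicit +1 prefix."""
    formatted = []
    for match in US_PHONE_RE.finditer(text):
        prefix, area, exchange, line = match.groups()
        number = f"({area}) {exchange}-{line}"
        if prefix:
            number = "+1 " + number
        if number not in formatted:
            formatted.append(number)
    return formatted
```

A number such as `+44 20 7946 0958` does not fit the three-three-four digit grouping, so this sketch skips it entirely, which mirrors why international numbers get less reliable treatment.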

### Project structure

```
backend/
├── .actor/           # Apify Actor definition and schemas
├── src/
│   ├── main.js       # Actor entry point
│   ├── crawler.js    # PlaywrightCrawler setup
│   ├── extractors.js # Page-level extraction
│   ├── link-pages.js # Sub-page discovery and extraction
│   ├── result-merger.js
│   ├── browser-hooks.js
│   ├── constants.js
│   ├── utils.js
│   └── config.js
├── Dockerfile
├── package.json
└── README.md
```

### License

ISC

# Actor input Schema

## `websiteUrls` (type: `array`):

One URL per line. Protocol is optional — https:// is added automatically.
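If you prepare the list yourself, a small pre-flight check can catch malformed entries before a run starts. The Python sketch below reuses the `pattern` from the `websiteUrls` item schema in the OpenAPI specification and prepends `https://` when the protocol is missing. `normalize_url` and `prepare_urls` are hypothetical helpers, and Python's `\w` is Unicode-aware, so the check may be slightly more permissive than the platform's own validation.

```python
import re

# Pattern copied from the `websiteUrls` item schema in the OpenAPI specification.
URL_PATTERN = re.compile(r"^(https?://)?[\w.-]+\.[a-zA-Z]{2,}(/.*)?$")


def normalize_url(raw: str) -> str:
    """Validate a URL against the schema pattern and prepend https:// if missing."""
    url = raw.strip()
    if not URL_PATTERN.match(url):
        raise ValueError(f"not a valid website URL: {raw!r}")
    if not re.match(r"https?://", url):
        url = "https://" + url
    return url


def prepare_urls(lines: str) -> list[str]:
    """Turn one-URL-per-line text into a deduplicated list, preserving order."""
    urls = [normalize_url(line) for line in lines.splitlines() if line.strip()]
    return list(dict.fromkeys(urls))
```

Deduplicating after normalization matters because the schema requires unique items, and `ny-hvac.com` and `https://ny-hvac.com` end up as the same entry.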

## Actor input object example

```json
{
  "websiteUrls": [
    "https://ny-hvac.com",
    "https://aristair.com"
  ]
}
```

# Actor output Schema

## `contacts` (type: `string`):

Extracted contact details for each website URL.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "websiteUrls": [
        "https://ny-hvac.com",
        "https://aristair.com"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("jumbled_falcon/email-phone-social-finder-website").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "websiteUrls": [
        "https://ny-hvac.com",
        "https://aristair.com",
    ],
}

# Run the Actor and wait for it to finish
run = client.actor("jumbled_falcon/email-phone-social-finder-website").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "websiteUrls": [
    "https://ny-hvac.com",
    "https://aristair.com"
  ]
}' |
apify call jumbled_falcon/email-phone-social-finder-website --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=jumbled_falcon/email-phone-social-finder-website",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "📧Extract Emails, Socials & Contacts from Any Website✨",
        "description": "Instantly extract emails, social media profiles, phone numbers, and contact details from any website. Save hours of manual research and build targeted lead lists effortlessly. Handles bulk lists of 1000+ websites. Extracts from contact pages, about pages, and homepage automatically.",
        "version": "1.0",
        "x-build-id": "rEicmZQGAmzDgjqHi"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/jumbled_falcon~email-phone-social-finder-website/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-jumbled_falcon-email-phone-social-finder-website",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/jumbled_falcon~email-phone-social-finder-website/runs": {
            "post": {
                "operationId": "runs-sync-jumbled_falcon-email-phone-social-finder-website",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/jumbled_falcon~email-phone-social-finder-website/run-sync": {
            "post": {
                "operationId": "run-sync-jumbled_falcon-email-phone-social-finder-website",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "websiteUrls"
                ],
                "properties": {
                    "websiteUrls": {
                        "title": "Website URLs",
                        "minItems": 1,
                        "uniqueItems": true,
                        "type": "array",
                        "description": "One URL per line. Protocol is optional — https:// is added automatically.",
                        "items": {
                            "type": "string",
                            "minLength": 1,
                            "pattern": "^(https?://)?[\\w.-]+\\.[a-zA-Z]{2,}(/.*)?$"
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
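The paths in the specification can also be called without a client library. The Python sketch below builds a request for the `run-sync-get-dataset-items` path using only the standard library. `build_request` and `run_actor` are hypothetical helper names, and `run_actor` assumes the endpoint responds with a JSON array of dataset items; it performs a real network call and needs a valid token.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "jumbled_falcon~email-phone-social-finder-website"


def build_request(token: str, website_urls: list[str]) -> urllib.request.Request:
    """Build a POST request for the run-sync-get-dataset-items endpoint."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    body = json.dumps({"websiteUrls": website_urls}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def run_actor(token: str, website_urls: list[str]) -> list[dict]:
    """Send the request and decode the returned items (assumes a JSON response)."""
    with urllib.request.urlopen(build_request(token, website_urls)) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect or log the URL and body before spending any credits on a run.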
