# Websites Html Email Scraper (`jazzy_projector/email-scraper-apify`) Actor

Extract email addresses from any website at scale. Crawls multiple pages per domain, deduplicates results, filters false positives, and exports a clean dataset ready for outreach. Just provide a list of URLs and let it do the work.

- **URL**: https://apify.com/jazzy_projector/email-scraper-apify.md
- **Developed by:** [Ben salem yosri](https://apify.com/jazzy_projector) (community)
- **Categories:** Lead generation, Automation, Developer tools
- **Stats:** 2 total users, 0 monthly users, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $12.00 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
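
As a rough illustration of the arithmetic, the sketch below estimates a charge from a result count. It assumes the listed starting price of $12.00 per 1,000 results applies uniformly and that every dataset row counts as one billable result. Both are assumptions, and `estimate_cost_usd` is a hypothetical helper, not part of any Apify library.

```python
from decimal import Decimal

# Assumption: the listed "from" price applies to every result.
PRICE_PER_1000_RESULTS_USD = Decimal("12.00")


def estimate_cost_usd(result_count: int) -> Decimal:
    """Estimate the charge in USD for a given number of results."""
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    cost = PRICE_PER_1000_RESULTS_USD * result_count / 1000
    return cost.quantize(Decimal("0.01"))


print(estimate_cost_usd(250))   # 3.00
print(estimate_cost_usd(1000))  # 12.00
```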

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## 📧 Email Scraper

Extract email addresses from any website — automatically, at scale.

This Actor crawls websites page by page and pulls every email address it can find: from `mailto:` links, visible text, meta tags, HTML attributes, and inline JavaScript. Give it a list of URLs and it returns a clean, deduplicated dataset of emails ready to export.

---

### 🚀 What it does

Starting from your list of URLs, the Actor:

1. Visits each website and scans the page for email addresses
2. Follows internal links to crawl deeper (up to your configured limit)
3. Deduplicates emails so you never see the same address twice
4. Records exactly which page each email was found on
5. Saves everything to a structured dataset you can export as CSV, JSON, or XLSX
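
Steps 3 and 4 in the list above can be sketched as follows. This is a minimal illustration, not the Actor's actual code: it assumes addresses are compared case-insensitively and that the first page on which an address appears is the one recorded. `dedupe_emails` is a hypothetical helper name.

```python
def dedupe_emails(findings):
    """Keep each email once, remembering the first page it was seen on.

    `findings` is an iterable of (email, page_url) tuples in crawl order.
    Assumption: comparison is case-insensitive and the first page wins.
    """
    first_seen = {}
    for email, page_url in findings:
        key = email.strip().lower()
        if key not in first_seen:
            first_seen[key] = page_url
    return first_seen


findings = [
    ("hello@acme.com", "https://acme.com/contact"),
    ("Hello@acme.com", "https://acme.com/about"),
    ("sales@acme.com", "https://acme.com/about"),
]
print(dedupe_emails(findings))
```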

It automatically skips social media platforms, review sites, and other domains that rarely expose useful contact emails (Amazon, LinkedIn, Yelp, etc.), saving you time and credits.

---

### 📥 Input

| Field | Type | Default | Description |
|---|---|---|---|
| `startUrls` | Array | — | **Required.** The websites to scrape |
| `maxPagesPerSite` | Integer | `10` | How many pages to crawl per domain |
| `maxConcurrency` | Integer | `5` | Parallel requests (higher = faster, more resource use) |
| `skipDomains` | Array | *(see below)* | Domain keywords to skip entirely |
| `proxyConfiguration` | Object | disabled | Use Apify Proxy to avoid blocks |

**Domains skipped by default:**
`amazon`, `yelp`, `facebook`, `instagram`, `reddit`, `twitter`, `linkedin`, `youtube`, `tiktok`, `pinterest`, `snapchat`, `google`, `apple`, `microsoft`, `wikipedia`, `tripadvisor`, `bbb`, `yellowpages`, `maps`, `bing`, `yahoo`, `trustpilot`, `glassdoor`, `indeed`

You can override this list entirely in the input if needed.
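
To make the idea of "domain keywords" concrete, here is a minimal sketch of one way such a filter could work. It assumes a keyword matches when it appears anywhere in the URL's hostname. The Actor's real matching rule is not documented here, so treat `should_skip` and its substring logic as hypothetical.

```python
from urllib.parse import urlparse


def should_skip(url, skip_keywords):
    """Return True if any keyword occurs in the URL's hostname.

    Assumption: plain case-insensitive substring matching on the hostname.
    """
    host = (urlparse(url).hostname or "").lower()
    return any(keyword.lower() in host for keyword in skip_keywords)


skip_keywords = ["amazon", "linkedin", "yelp"]
print(should_skip("https://www.linkedin.com/company/acme", skip_keywords))  # True
print(should_skip("https://acme.com/contact", skip_keywords))               # False
```

Under this assumption a short keyword such as `apple` would also match a hostname like `pineapple-farm.com`, which is worth keeping in mind when overriding the list.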

---

### 📤 Output

Each email found is saved as a row in the dataset:

```json
{
  "sourceUrl":   "https://acme.com",
  "rootDomain":  "acme.com",
  "email":       "hello@acme.com",
  "foundOnPage": "https://acme.com/contact",
  "scrapedAt":   "2025-06-01T09:15:00.000Z"
}
````

Export the full results from the Apify Console in **CSV**, **JSON**, **XLSX**, or **XML** with one click.
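
If you process the rows in code instead of exporting them, a small post-processing step is often useful. The sketch below groups emails by `rootDomain`, using only the field names shown in the sample row above. `emails_by_domain` is a hypothetical helper, and the second sample row is made up for illustration.

```python
from collections import defaultdict


def emails_by_domain(items):
    """Group dataset rows into {rootDomain: [email, ...]}."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item["rootDomain"]].append(item["email"])
    return dict(grouped)


items = [
    {"rootDomain": "acme.com", "email": "hello@acme.com",
     "foundOnPage": "https://acme.com/contact"},
    {"rootDomain": "acme.com", "email": "sales@acme.com",
     "foundOnPage": "https://acme.com/about"},
]
print(emails_by_domain(items))
```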

***

### 💡 Example input

```json
{
  "startUrls": [
    { "url": "https://company-a.com" },
    { "url": "https://company-b.io" },
    { "url": "https://agency-c.co.uk" }
  ],
  "maxPagesPerSite": 20,
  "maxConcurrency": 5,
  "proxyConfiguration": { "useApifyProxy": true }
}
```
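
When building the input programmatically, it can help to check values before starting a run. The sketch below enforces the ranges given in the OpenAPI specification later in this document (`maxPagesPerSite` 1 to 100, `maxConcurrency` 1 to 20). `build_input` is a hypothetical helper, and rejecting an empty URL list is an assumption on top of `startUrls` being required.

```python
def build_input(urls, max_pages_per_site=10, max_concurrency=5, use_apify_proxy=False):
    """Build an Actor input object, validating the documented ranges."""
    if not urls:
        # Assumption: an empty list is treated as an error here.
        raise ValueError("startUrls is required; pass at least one URL")
    if not 1 <= max_pages_per_site <= 100:
        raise ValueError("maxPagesPerSite must be between 1 and 100")
    if not 1 <= max_concurrency <= 20:
        raise ValueError("maxConcurrency must be between 1 and 20")
    return {
        "startUrls": [{"url": url} for url in urls],
        "maxPagesPerSite": max_pages_per_site,
        "maxConcurrency": max_concurrency,
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }


actor_input = build_input(
    ["https://company-a.com", "https://company-b.io"],
    max_pages_per_site=20,
    use_apify_proxy=True,
)
print(actor_input)
```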

***

### 🔍 Where emails are extracted from

The Actor scans all of the following on every page:

- `mailto:` links (most reliable source)
- All visible text content
- Meta tag `content` attributes
- HTML `data-*` attributes and form field values
- Inline `<script>` blocks (emails stored in JS variables)

False positives (placeholder emails like `user@example.com`, image filenames, etc.) are automatically filtered out.
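
The general technique can be sketched with a regular expression run over the raw HTML, followed by a filter. This is a simplified illustration and not the Actor's implementation: the pattern, the placeholder-domain set, and the asset-suffix list are all assumptions chosen for the example.

```python
import re
from html import unescape
from urllib.parse import unquote

# Assumption: a deliberately simple pattern, not a full RFC 5322 parser.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PLACEHOLDER_DOMAINS = {"example.com", "example.org", "example.net"}
ASSET_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


def extract_emails(html):
    """Return unique, lowercased emails found anywhere in the HTML source."""
    # Decode HTML entities and percent-encoding so obfuscated "@" signs match.
    text = unquote(unescape(html))
    found = []
    for match in EMAIL_RE.findall(text):
        email = match.lower()
        domain = email.rsplit("@", 1)[1]
        if domain in PLACEHOLDER_DOMAINS:
            continue  # placeholder address such as user@example.com
        if email.endswith(ASSET_SUFFIXES):
            continue  # image filename such as logo@2x.png
        if email not in found:
            found.append(email)
    return found


html = (
    '<a href="mailto:Hello@Acme.com">Write us</a>'
    '<img src="logo@2x.png"> Contact user@example.com'
    '<script>var contact = "sales@acme.com";</script>'
)
print(extract_emails(html))  # ['hello@acme.com', 'sales@acme.com']
```

Scanning the raw source covers `mailto:` links, attributes, and inline scripts in one pass, at the cost of more false positives, which is why the filtering step matters.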

***

### ⚙️ Tips for best results

- **Set `maxPagesPerSite` to 20–50** for thorough coverage of larger sites. Contact, About, and Team pages are crawled automatically.
- **Enable Apify Proxy** if you're scraping sites with bot protection.
- **Lower `maxConcurrency`** if you're hitting rate limits on sensitive domains.
- The Actor only follows **internal links**, so it won't wander off to third-party domains.
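
The internal-links rule in the last tip can be sketched as a hostname comparison. This is a minimal illustration that assumes "internal" means the exact same hostname as the page being crawled. The Actor may treat subdomains differently, and `internal_links` is a hypothetical helper.

```python
from urllib.parse import urljoin, urlparse


def internal_links(page_url, hrefs):
    """Resolve hrefs against page_url and keep only same-host HTTP(S) links."""
    base_host = urlparse(page_url).hostname
    kept = []
    for href in hrefs:
        absolute = urljoin(page_url, href)
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.hostname == base_host:
            kept.append(absolute)
    return kept


hrefs = ["/contact", "team", "https://twitter.com/acme", "mailto:hi@acme.com"]
print(internal_links("https://acme.com/about", hrefs))
# ['https://acme.com/contact', 'https://acme.com/team']
```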

***

### 🛠️ Local development

```bash
npm install

# Create input file
mkdir -p storage/key_value_stores/default
echo '{"startUrls":[{"url":"https://yoursite.com"}],"maxPagesPerSite":5}' \
  > storage/key_value_stores/default/INPUT.json

npm start
```

***

### 📄 License

MIT

# Actor input Schema

## `startUrls` (type: `array`):

List of URLs to start scraping from. Each URL will be treated as a separate website.

## `maxPagesPerSite` (type: `integer`):

Maximum number of pages to crawl per website.

## `maxConcurrency` (type: `integer`):

Maximum number of pages to process in parallel.

## `skipDomains` (type: `array`):

List of domain keywords to skip (e.g. 'amazon', 'facebook').

## `proxyConfiguration` (type: `object`):

Optional proxy configuration.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://example.com"
    }
  ],
  "maxPagesPerSite": 10,
  "maxConcurrency": 5,
  "skipDomains": [
    "amazon",
    "yelp",
    "facebook",
    "instagram",
    "reddit",
    "twitter",
    "linkedin",
    "youtube",
    "tiktok",
    "pinterest",
    "snapchat",
    "google",
    "apple",
    "microsoft",
    "wikipedia",
    "tripadvisor",
    "bbb",
    "yellowpages",
    "maps",
    "bing",
    "yahoo",
    "trustpilot",
    "glassdoor",
    "indeed"
  ],
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://example.com"
        }
    ],
    "proxyConfiguration": {
        "useApifyProxy": false
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("jazzy_projector/email-scraper-apify").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [{ "url": "https://example.com" }],
    "proxyConfiguration": { "useApifyProxy": False },
}

# Run the Actor and wait for it to finish
run = client.actor("jazzy_projector/email-scraper-apify").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://example.com"
    }
  ],
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}' |
apify call jazzy_projector/email-scraper-apify --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=jazzy_projector/email-scraper-apify",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Websites Html Email Scraper",
        "description": "Extract email addresses from any website at scale. Crawls multiple pages per domain, deduplicates results, filters false positives, and exports a clean dataset ready for outreach. Just Provide list of urls and let it work for you!",
        "version": "0.0",
        "x-build-id": "gsMmgQcNdeczmFCM1"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/jazzy_projector~email-scraper-apify/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-jazzy_projector-email-scraper-apify",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/jazzy_projector~email-scraper-apify/runs": {
            "post": {
                "operationId": "runs-sync-jazzy_projector-email-scraper-apify",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/jazzy_projector~email-scraper-apify/run-sync": {
            "post": {
                "operationId": "run-sync-jazzy_projector-email-scraper-apify",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls"
                ],
                "properties": {
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "List of URLs to start scraping from. Each URL will be treated as a separate website.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "maxPagesPerSite": {
                        "title": "Max pages per site",
                        "minimum": 1,
                        "maximum": 100,
                        "type": "integer",
                        "description": "Maximum number of pages to crawl per website.",
                        "default": 10
                    },
                    "maxConcurrency": {
                        "title": "Max concurrency",
                        "minimum": 1,
                        "maximum": 20,
                        "type": "integer",
                        "description": "Maximum number of pages to process in parallel.",
                        "default": 5
                    },
                    "skipDomains": {
                        "title": "Skip domains",
                        "type": "array",
                        "description": "List of domain keywords to skip (e.g. 'amazon', 'facebook').",
                        "default": [
                            "amazon",
                            "yelp",
                            "facebook",
                            "instagram",
                            "reddit",
                            "twitter",
                            "linkedin",
                            "youtube",
                            "tiktok",
                            "pinterest",
                            "snapchat",
                            "google",
                            "apple",
                            "microsoft",
                            "wikipedia",
                            "tripadvisor",
                            "bbb",
                            "yellowpages",
                            "maps",
                            "bing",
                            "yahoo",
                            "trustpilot",
                            "glassdoor",
                            "indeed"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Optional proxy configuration."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
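
For projects that call the REST API directly, the `run-sync-get-dataset-items` path from the specification above can be used with only the Python standard library. The sketch below builds the request without sending it. `build_run_sync_request` is a hypothetical helper; the URL, the `token` query parameter, and the JSON body come from the specification.

```python
import json
from urllib.parse import quote
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "jazzy_projector~email-scraper-apify"


def build_run_sync_request(token, actor_input):
    """Build a POST request that runs the Actor and returns dataset items."""
    url = (
        f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items"
        f"?token={quote(token, safe='')}"
    )
    body = json.dumps(actor_input).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_sync_request(
    "<YOUR_API_TOKEN>",
    {"startUrls": [{"url": "https://example.com"}]},
)
print(request.get_method(), request.full_url)
# To send it, pass the request to urllib.request.urlopen(request)
# and parse the response body with json.load().
```

Because the call is synchronous, the HTTP connection stays open until the run finishes, so long crawls may be better served by the asynchronous `/runs` path.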
