# Email Scraper / Extractor (`c0nst/email-scraper`) Actor

Extract unique email addresses from a list of websites. The scraper visits each URL, follows internal links, and returns one result per site with all emails found — duplicates removed, known placeholder and schema emails excluded so you only pay for real results.

- **URL**: https://apify.com/c0nst/email-scraper.md
- **Developed by:** [Kostiantyn](https://apify.com/c0nst) (community)
- **Categories:** Lead generation, E-commerce, Automation
- **Stats:** 3 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

From $100.00 / 1,000 `site-scraped` events

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Email Scraper

Extract unique email addresses from a list of websites. The scraper visits each URL, follows internal links, and returns one result per site with all emails found — duplicates removed, known placeholder and schema emails excluded so you only pay for real results.

### What does Email Scraper do?

**Email Scraper** visits the URLs you provide, crawls their internal pages, and extracts every email address found in the raw HTML — including those inside `mailto:` links, plain text, and hidden fields. Results are grouped by site, deduplicated, and stored in a structured dataset you can download as JSON, CSV, or Excel.

Running on [Apify](https://apify.com) gives you automatic proxy rotation, scheduling, API access, and cloud storage — no infrastructure to manage.

### Why use Email Scraper?

- **Lead generation** — collect contact emails from a list of target company websites
- **Sales prospecting** — find decision-maker emails from directories or partner pages
- **Data enrichment** — augment a list of domains with their publicly listed contact addresses
- **Outreach campaigns** — build verified email lists from niche industry sites
- **Research** — map contact information across a set of websites at scale

### How to use Email Scraper

1. Sign in to [Apify Console](https://console.apify.com) and open the Actor.
2. Paste your target URLs into the **Start URLs** field.
3. Click **Start** and wait for the run to finish.
4. Open the **Output** tab to view and download your results.

### Input

| Field | Type | Description |
|---|---|---|
| `startUrls` | array | URLs to scrape (required) |

Example:

```json
{
    "startUrls": [
        { "url": "https://acme.com" },
        { "url": "https://globex.com" },
        { "url": "https://initech.com" }
    ]
}
````
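
If your sites live in a spreadsheet or a plain list of domains, a small helper can produce the `startUrls` structure shown above. This is a minimal sketch: `build_input` is a hypothetical helper, not part of the Actor or the Apify client, and it assumes that bare domains should be fetched over HTTPS.

```python
import json


def build_input(sites: list[str]) -> dict:
    """Turn bare domains or URLs into the startUrls structure, dropping repeats."""
    urls: list[str] = []
    for site in sites:
        site = site.strip()
        if not site:
            continue
        # Assumption: a bare domain such as "acme.com" should be requested over HTTPS.
        url = site if site.startswith(("http://", "https://")) else f"https://{site}"
        if url not in urls:
            urls.append(url)
    return {"startUrls": [{"url": u} for u in urls]}


actor_input = build_input(["acme.com", "https://globex.com", "acme.com", " initech.com "])
print(json.dumps(actor_input, indent=4))
```

Removing repeated URLs before the run keeps the dataset to one row per distinct site.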

### Output

One dataset row is produced per input URL, containing all unique emails found across every page crawled under that site:

```json
[
    {
        "startUrl": "https://acme.com",
        "emails": [
            "sales@acme.com",
            "support@acme.com",
            "ceo@acme.com"
        ],
        "emailCount": 3,
        "pagesScraped": 12,
        "scrapedAt": "2026-04-13T20:00:00.000Z"
    },
    {
        "startUrl": "https://globex.com",
        "emails": [
            "contact@globex.com"
        ],
        "emailCount": 1,
        "pagesScraped": 8,
        "scrapedAt": "2026-04-13T20:00:05.000Z"
    }
]
```

You can download the dataset in various formats such as JSON, HTML, CSV, or Excel from the **Output** tab or via the Apify API.

### Data table

| Field | Format | Description |
|---|---|---|
| `startUrl` | URL | The input URL this result belongs to |
| `emails` | array | Unique, lowercased email addresses found across all crawled pages |
| `emailCount` | number | Total number of unique emails found |
| `pagesScraped` | number | Number of pages that contained at least one email |
| `scrapedAt` | ISO date | Timestamp of when post-processing completed |
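
Because each row holds an array of emails, outreach tools that expect one address per line may need a flattened view. The sketch below uses only the standard library and the field names from the table above; the `rows` list is sample data copied from the output example, and `to_csv` is a hypothetical helper.

```python
import csv
import io

# Sample rows shaped like the Actor's dataset items (taken from the output example).
rows = [
    {
        "startUrl": "https://acme.com",
        "emails": ["sales@acme.com", "support@acme.com", "ceo@acme.com"],
        "emailCount": 3,
    },
    {
        "startUrl": "https://globex.com",
        "emails": ["contact@globex.com"],
        "emailCount": 1,
    },
]


def to_csv(items: list[dict]) -> str:
    """Flatten one-row-per-site items into one CSV line per (site, email) pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["startUrl", "email"])
    for item in items:
        for email in item.get("emails", []):
            writer.writerow([item["startUrl"], email])
    return buffer.getvalue()


csv_text = to_csv(rows)
print(csv_text)
```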

### Pricing

This Actor uses **Pay per event** pricing — you are charged per site that returns at least one email. Sites that are crawled but yield no results are free.

| Event | When charged |
|---|---|
| `site-scraped` | Once per input URL that produced emails |

Your cost scales with useful output, not with how many pages were crawled. For example, a run over 10 sites where 7 return emails incurs 7 charges.
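
To get a rough figure before or after a run, you can count the rows that contain at least one email. This is only an estimate: it assumes the listed starting rate of $100.00 per 1,000 `site-scraped` events (that is, $0.10 per charged site), and the actual rate on your plan may differ. `estimate_cost` and the sample items are illustrative.

```python
from decimal import Decimal

# Assumption: the "from $100.00 / 1,000" list price, i.e. $0.10 per charged site.
PRICE_PER_SITE_USD = Decimal("0.10")


def estimate_cost(items: list[dict]) -> Decimal:
    """Estimate the charge as (sites with at least one email) x (price per site)."""
    charged_sites = sum(1 for item in items if item.get("emailCount", 0) > 0)
    return charged_sites * PRICE_PER_SITE_USD


# Ten hypothetical sites, of which the first seven returned emails.
run_items = [
    {"startUrl": f"https://site{i}.example", "emailCount": 2 if i < 7 else 0}
    for i in range(10)
]
print(f"Estimated charge: ${estimate_cost(run_items)}")
```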

### Advanced settings

For most use cases the defaults work well. If you need to fine-tune crawl behaviour, the following settings are available under **Advanced settings** in the Console:

| Field | Default | Description |
|---|---|---|
| `maxDepth` | `1` | How many link-levels deep to follow. `0` = start URL only, `1` = start URL plus all directly linked pages |
| `maxRequestsPerSite` | `100` | Max pages crawled per site. `0` = unlimited |

**Depth guide:**

- `0` — use when your URLs already point at a contact or about page
- `1` — recommended for most sites; covers Contact, About, Team pages linked from the homepage
- `2` — thorough crawl including blog posts, product pages and sub-sections; runs take longer
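
If you set these fields from code, checking them locally can catch a bad value before a run starts. The bounds below come from the input schema in this Actor's OpenAPI specification (`maxDepth` between 0 and 10, `maxRequestsPerSite` at least 0); `validate_settings` itself is a hypothetical helper, not something the Actor provides.

```python
def validate_settings(max_depth: int = 1, max_requests_per_site: int = 100) -> dict:
    """Check crawl settings against the documented bounds and return input fields."""
    if not 0 <= max_depth <= 10:
        raise ValueError("maxDepth must be between 0 and 10")
    if max_requests_per_site < 0:
        raise ValueError("maxRequestsPerSite must be 0 (unlimited) or greater")
    return {"maxDepth": max_depth, "maxRequestsPerSite": max_requests_per_site}


# URLs that already point at contact pages: no link following needed.
contact_page_settings = validate_settings(max_depth=0)
print(contact_page_settings)
```

The returned dictionary can be merged into the input object next to `startUrls`.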

### FAQ and disclaimers

**Is this legal?**
Email Scraper only collects data that is publicly visible in the HTML of websites you provide. You are responsible for ensuring your use complies with the target site's Terms of Service, GDPR, CAN-SPAM, and any other applicable laws.

**Why are some emails missing?**
This Actor makes plain HTTP requests without running JavaScript. Emails injected into the page by JS (contact forms, dynamic "mailto" links, obfuscation scripts) won't be captured. If a site's Contact page is crawled without errors but returns no emails, this is the likely cause. A future v2 will handle JS-rendered content.
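
The effect is easy to reproduce with a simple pattern match over raw HTML. The sketch below is illustrative only: the regular expression and `extract_emails` are assumptions for demonstration, not the Actor's actual matching rules.

```python
import re

# Assumed pattern for illustration; the Actor's real matching rules may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html: str) -> list[str]:
    """Return unique, lowercased addresses found in the raw markup, in first-seen order."""
    seen: dict[str, None] = {}
    for match in EMAIL_RE.findall(html):
        seen.setdefault(match.lower(), None)
    return list(seen)


# Addresses present in the markup itself are found and deduplicated.
static_html = '<a href="mailto:Sales@acme.com">Email</a> <p>support@acme.com, sales@acme.com</p>'

# An address assembled by a script never appears as text in the raw HTML.
js_html = '<script>var u = "info", d = "acme.com"; document.write(u + String.fromCharCode(64) + d);</script>'

print(extract_emails(static_html))
print(extract_emails(js_html))
```

The second call returns an empty list because the complete address only exists after a browser runs the script.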

**I'm seeing placeholder or test emails in the results.**
Common false positives (e.g. `user@example.com`, schema.org addresses, image filenames) are filtered out automatically. If you encounter others, open an issue.
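
If a few false positives still slip through, you can apply your own filter to the results. The sketch below shows one possible approach; the blocked domains and file suffixes are assumed examples, not the Actor's internal list, and `is_probably_real` is a hypothetical helper.

```python
# Assumed examples of placeholder domains and asset suffixes; extend as needed.
BLOCKED_DOMAINS = {"example.com", "example.org", "schema.org"}
ASSET_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


def is_probably_real(email: str) -> bool:
    """Reject placeholder domains and strings that are really image filenames."""
    email = email.lower()
    domain = email.rsplit("@", 1)[-1]
    if domain in BLOCKED_DOMAINS:
        return False
    # Retina asset names such as "logo@2x.png" look like addresses to a naive pattern.
    return not email.endswith(ASSET_SUFFIXES)


candidates = ["user@example.com", "logo@2x.png", "sales@acme.com", "info@schema.org"]
kept = [c for c in candidates if is_probably_real(c)]
print(kept)
```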

**Need a custom solution?**
Open an issue in the repository or contact us for enterprise scraping requirements.

# Actor input Schema

## `startUrls` (type: `array`):

List of website URLs to scrape for email addresses.

## `maxDepth` (type: `integer`):

How many link-levels deep to follow from each start URL. 0 = only the start URL itself, 1 = start URL + all directly linked pages (recommended). 2 = goes deeper into blog posts, product pages and sub-sections — more thorough but much slower.

## `maxRequestsPerSite` (type: `integer`):

Maximum number of pages to crawl per site. Applied independently to each URL in the list — so 200 means up to 200 pages per site, not 200 total. Set to 0 for unlimited (use with caution at depth ≥ 2).
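
Since the cap applies per site, the upper bound on pages for a whole run is the number of start URLs multiplied by the cap. A small worked sketch, with `max_pages` as a hypothetical helper:

```python
from typing import Optional


def max_pages(site_count: int, max_requests_per_site: int) -> Optional[int]:
    """Upper bound on pages crawled in a run; None when the per-site cap is 0 (unlimited)."""
    if max_requests_per_site == 0:
        return None
    return site_count * max_requests_per_site


# Three start URLs with a cap of 200 pages each.
print(max_pages(3, 200))
```

This is a ceiling, not a prediction: a site with fewer reachable pages at the chosen depth stops earlier.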

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://example.com"
    }
  ],
  "maxDepth": 1,
  "maxRequestsPerSite": 100
}
```

# Actor output Schema

## `results` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://example.com"
        }
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("c0nst/email-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "startUrls": [{ "url": "https://example.com" }] }

# Run the Actor and wait for it to finish
run = client.actor("c0nst/email-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://example.com"
    }
  ]
}' |
apify call c0nst/email-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=c0nst/email-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Email Scraper / Extractor",
        "description": "Extract unique email addresses from a list of websites. The scraper visits each URL, follows internal links, and returns one result per site with all emails found — duplicates removed, known placeholder and schema emails excluded so you only pay for real results.",
        "version": "0.0",
        "x-build-id": "UCBZ2fNesTS9VzGVH"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/c0nst~email-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-c0nst-email-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/c0nst~email-scraper/runs": {
            "post": {
                "operationId": "runs-sync-c0nst-email-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/c0nst~email-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-c0nst-email-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls"
                ],
                "properties": {
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "List of website URLs to scrape for email addresses.",
                        "default": [
                            {
                                "url": "https://example.com"
                            }
                        ],
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "maxDepth": {
                        "title": "Max Crawl Depth",
                        "minimum": 0,
                        "maximum": 10,
                        "type": "integer",
                        "description": "How many link-levels deep to follow from each start URL. 0 = only the start URL itself, 1 = start URL + all directly linked pages (recommended). 2 = goes deeper into blog posts, product pages and sub-sections — more thorough but much slower.",
                        "default": 1
                    },
                    "maxRequestsPerSite": {
                        "title": "Max Requests Per Site",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum number of pages to crawl per site. Applied independently to each URL in the list — so 200 means up to 200 pages per site, not 200 total. Set to 0 for unlimited (use with caution at depth ≥ 2).",
                        "default": 100
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
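
For projects that call the REST API directly, the `run-sync-get-dataset-items` path from the specification above can be turned into a request with the standard library alone. This is a minimal sketch, not an official client: `build_request` is a hypothetical helper, and the commented-out call at the end is the part that would actually start a run.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/c0nst~email-scraper/run-sync-get-dataset-items"


def build_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build a POST request with the token as a query parameter and JSON input as the body."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(
    "<YOUR_API_TOKEN>",
    {"startUrls": [{"url": "https://example.com"}], "maxDepth": 1},
)
print(request.get_method(), request.full_url.split("?")[0])

# Sending the request starts a real run and may incur charges:
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```

Building the request separately from sending it makes it easy to inspect or log the call (with the token redacted) before any run is started.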
