# Deep Email, Phone & Social Media Scraper (`trakk/deep-email-phone-social-media-scraper-search`) Actor

Find emails, phone numbers, social profiles, logos, and business contact details from any website list. HTTP-only, fast, clean output, with smart contact-page discovery and optional source evidence for lead generation.

- **URL**: https://apify.com/trakk/deep-email-phone-social-media-scraper-search.md
- **Developed by:** [Blynx](https://apify.com/trakk) (community)
- **Categories:** Lead generation, Automation, Developer tools
- **Stats:** 3 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

from $1.80 / 1,000 website contact leads

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price drops as your subscription plan tier goes up.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# MacOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Deep Email, Phone & Social Media Scraper

Find public business contacts from any list of websites. This actor crawls company sites with HTTP requests only, discovers likely contact pages, extracts emails, phone numbers, social profiles, logos, and business metadata, then returns clean lead records ready for CRM, outreach, enrichment, or research.

No browser. No Playwright. No Puppeteer. Just fast HTTP crawling, Chrome-like requests, smart contact-page discovery, and clean Apify dataset output.

Use it when you have a list of company websites and need answers like:

- What is the best public email for this company?
- Does the website publish phone numbers?
- Which social and messaging channels are linked from the site?
- What is the company name, logo, address, or legal metadata?
- Which page was the contact found on?

---

### What this actor extracts

#### Contacts

- Emails from `mailto:` links, visible text, HTML, Cloudflare protected emails, JSON-LD, and common `[at] / dot` obfuscation
- Phone numbers from `tel:` links, page text, and JSON-LD
- E.164 phone normalization when the number is valid
- Best email, best phone, confidence score, and source page

#### Social and messaging channels

- Facebook
- Instagram
- X / Twitter
- YouTube
- TikTok
- Pinterest
- Reddit
- Snapchat
- Discord
- Twitch
- GitHub
- Medium
- WhatsApp
- Telegram
- Yelp
- Tripadvisor
- App Store
- Google Play
- Amazon, Etsy, and eBay store/profile links

LinkedIn, Trustpilot, Google Maps, and Threads are intentionally excluded to keep the output focused and avoid duplicating separate scrapers.

#### Business and brand data

- Company name
- Legal name
- Website title and meta description
- Addresses from structured data when present
- Opening hours from structured data when present
- VAT / tax IDs when visible
- Registration numbers when visible
- Best logo URL
- Favicon
- Apple touch icon
- OpenGraph image
- Twitter card image
- Source evidence for logo/contact extraction when enabled

---

### Common use cases

- Lead generation from company website lists
- CRM enrichment
- Agency prospecting
- B2B sales research
- Directory enrichment
- Supplier and vendor research
- Startup, SaaS, ecommerce, and local business contact collection
- Marketing outreach preparation
- Checking whether company websites publish contact details
- Building internal contact intelligence datasets

---

### Quick start in Apify Console

1. Open the actor in Apify Console.
2. Paste websites into **Website URL(s) or domain(s)**.
3. For the first test, set **Number of websites to process** to `10`.
4. Keep **Pages per website** between `3` and `10`.
5. Keep **Stop early when enough contacts are found** enabled to save cost.
6. Run the actor.
7. Open the default dataset and export results as JSON, CSV, Excel, XML, RSS, or HTML.

You can paste full URLs or plain domains. These are all valid:

```text
example.com
https://example.com
https://www.example.com/contact
````

The actor normalizes plain domains to HTTPS automatically.
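
As a rough illustration of that normalization, here is a minimal Python sketch. The function name `normalize_start_url` is hypothetical, and the choice to leave an existing scheme untouched is an assumption; the actor's own rules may differ.

```python
from urllib.parse import urlsplit


def normalize_start_url(value: str) -> str:
    """Return a URL with an explicit scheme; plain domains default to HTTPS.

    Hypothetical helper: an input that already has a scheme is kept as-is.
    """
    value = value.strip()
    if "://" not in value:
        # Plain domain such as "example.com": assume HTTPS.
        value = "https://" + value
    parts = urlsplit(value)
    if not parts.netloc:
        raise ValueError(f"not a usable website: {value!r}")
    return parts.geturl()


if __name__ == "__main__":
    for raw in ("example.com", "https://example.com", "https://www.example.com/contact"):
        print(raw, "->", normalize_start_url(raw))
```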

***

### Recommended input examples

#### Simple website list

Use this for normal lead enrichment.

```json
{
  "startUrls": [
    { "url": "https://www.cloudflare.com/resource/contact-enterprise-sales/" },
    { "url": "https://www.bluehost.com/contact" },
    { "url": "https://www.wolfssl.com/contact/" }
  ],
  "maxDomains": 10,
  "maxPagesPerDomain": 5,
  "maxDepth": 1,
  "stopWhenFound": true,
  "outputMode": "summary",
  "compactOutput": true
}
```

#### Deeper contact discovery

Use this when websites are messy and contacts may be on support, about, legal, team, or office pages.

```json
{
  "startUrls": [
    { "url": "https://example.com" }
  ],
  "maxDomains": 1,
  "maxPagesPerDomain": 15,
  "maxDepth": 2,
  "useSitemap": true,
  "stopWhenFound": false,
  "includeEvidence": true,
  "outputMode": "summary"
}
```

#### Import websites from another dataset

Use this when a previous actor produced a dataset with website URLs.

```json
{
  "datasetId": "YOUR_DATASET_ID",
  "maxDomains": 1000,
  "maxPagesPerDomain": 5,
  "outputMode": "summary"
}
```

The actor reads these fields from input dataset items:

```text
website, url, domain, companyUrl, sourceUrl, finalUrl, startUrl
```
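
If you want to check in advance which value would be picked from your own dataset items, a small sketch like the one below can help. It assumes the fields are tried in the order listed above and that the first non-empty string wins; that priority order is an assumption, and `pick_website` is a hypothetical name.

```python
from typing import Optional

# Field names taken from the list above; the lookup order is an assumption.
WEBSITE_FIELDS = (
    "website", "url", "domain", "companyUrl", "sourceUrl", "finalUrl", "startUrl",
)


def pick_website(item: dict) -> Optional[str]:
    """Return the first non-empty website-like field of a dataset item."""
    for field in WEBSITE_FIELDS:
        value = item.get(field)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None


if __name__ == "__main__":
    print(pick_website({"name": "Acme", "website": "", "domain": "acme.example"}))
```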

#### Evidence and audit mode

Use this when you want to see where each email, phone, social profile, or logo came from.

```json
{
  "startUrls": [
    { "url": "https://example.com" }
  ],
  "includeEvidence": true,
  "outputMode": "summary",
  "compactOutput": true
}
```

#### Page-level debugging mode

Use this when you want one dataset row per crawled page.

```json
{
  "startUrls": [
    { "url": "https://example.com" }
  ],
  "outputMode": "pages",
  "includeEvidence": true
}
```

***

### Input fields

| Field | Type | Default | Description |
|---|---:|---:|---|
| `startUrls` | array | sample URLs | Websites or pages to scan. Use full URLs or plain domains. |
| `datasetId` | string | empty | Optional Apify dataset containing website/domain fields. |
| `maxDomains` | integer | `1` | Safety cap for unique websites processed from all inputs. Raise it for real batches. |
| `maxPagesPerDomain` | integer | `5` | Maximum pages fetched per website. Start with `3-10`. |
| `maxDepth` | integer | `1` | Link depth. `0` = start page only, `1` = linked contact/about pages, `2` = one more layer. |
| `stopWhenFound` | boolean | `true` | Stops early when a strong email plus phone or socials are found. Saves time and cost. |
| `extractEmails` | boolean | `true` | Extract normal, obfuscated, `mailto`, JSON-LD, and Cloudflare protected emails. |
| `extractPhones` | boolean | `true` | Extract and validate phone numbers. |
| `extractSocials` | boolean | `true` | Extract social, messaging, marketplace, and app profile links. |
| `extractBrandAssets` | boolean | `true` | Extract logo, favicon, OpenGraph image, and related brand images. |
| `extractBusinessData` | boolean | `true` | Extract company name, legal name, addresses, opening hours, tax IDs, registration numbers. |
| `useSitemap` | boolean | `true` | Reads `sitemap.xml` and adds contact-like URLs. |
| `followSubdomains` | boolean | `false` | Allows crawling same-root subdomains, for example `help.example.com`. |
| `countryHint` | string | `US` | Default phone country for numbers without country prefix. |
| `outputMode` | string | `summary` | `summary`, `pages`, or `both`. |
| `includeEvidence` | boolean | `false` | Adds detailed source evidence objects. |
| `compactOutput` | boolean | `true` | Removes empty arrays, empty objects, nulls, and blank strings. |
| `maxConcurrency` | integer | `10` | Number of websites processed in parallel. |
| `requestTimeoutSec` | integer | `25` | HTTP request timeout per page. |
| `maxRetries` | integer | `3` | Retry budget for HTTP and connection errors. |
| `maxProxyRetries` | integer | `3` | Extra retry budget for proxy/transport failures. |
| `proxyConfiguration` | object | Apify Proxy off | Apify Proxy settings. Keep it off for cheap tests; enable it if target sites block direct requests. |
| `userAgent` | string | empty | Optional custom user agent. Leave empty for built-in Chrome-like headers. |

Accepted `countryHint` values:

```text
US, GB, DE, FR, ES, IT, NL, PL, IN, CA, AU, BR, MX
```

The API also accepts URL aliases for convenience:

```text
websites, start_urls, urls, domains
```

***

### Output modes

#### `summary`

Default mode. Returns one clean lead row per website. Best for exports, CRM import, enrichment, and normal use.

#### `pages`

Returns one row per crawled page. Best for debugging, QA, and checking which pages had contacts.

#### `both`

Returns summary rows plus page rows. Use only when you need both lead records and page-level evidence in the same dataset.
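
When a `both` run mixes row types in one dataset, you may want to separate them after download. The sketch below relies on `recordType` being `domain` for summary rows, as described under the output fields; treating every other value as a page row is an assumption, and `split_rows` is a hypothetical helper.

```python
def split_rows(items):
    """Split dataset items into (summary_rows, other_rows) by recordType."""
    summaries, others = [], []
    for item in items:
        if item.get("recordType") == "domain":
            summaries.append(item)
        else:
            # Assumed to be page-level rows; inspect recordType in your own data.
            others.append(item)
    return summaries, others


if __name__ == "__main__":
    rows = [
        {"recordType": "domain", "domain": "example.com"},
        {"recordType": "page", "url": "https://example.com/contact"},
    ]
    summaries, pages = split_rows(rows)
    print(len(summaries), len(pages))
```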

***

### Main output fields

#### Website identity

| Field | Description |
|---|---|
| `recordType` | Usually `domain` in summary mode. |
| `domain` | Final website host. |
| `rootDomain` | Root domain used for matching. |
| `startUrl` | Original normalized input URL. |
| `finalUrl` | Final URL after redirects. |
| `status` | `ok`, `no_contacts_found`, or `failed`. |
| `statusCode` | HTTP status code of the first successful page. |
| `pageTitle` | First useful page title. |
| `metaDescription` | First useful meta description. |
| `language` | HTML language attribute when present. |
| `countryHint` | Phone country hint used by the run. |
| `pagesCrawled` | Number of pages fetched for the website. |
| `pagesMatched` | Number of pages where contacts or socials were found. |
| `crawlDepthReached` | Deepest crawl depth reached. |

#### Emails

| Field | Description |
|---|---|
| `bestEmail` | Best ranked email for outreach. |
| `bestEmailType` | `sales`, `support`, `info`, `press`, `jobs`, `privacy`, `billing`, `personal`, or `unknown`. |
| `bestEmailConfidence` | `high`, `medium`, or `low`. |
| `emails` | Unique email list. |
| `emailDetails` | Detailed email evidence when `includeEvidence` is enabled. |

#### Phones

| Field | Description |
|---|---|
| `bestPhone` | Best ranked display phone. |
| `bestPhoneE164` | Normalized E.164 phone when possible. |
| `bestPhoneConfidence` | `high`, `medium`, or `low`. |
| `phones` | Unique valid phone list. |
| `phoneDetails` | Detailed phone evidence when `includeEvidence` is enabled. |

#### Social profiles

| Field | Description |
|---|---|
| `socialProfiles` | Unified list of `{ platform, url }` records. |
| `facebookUrls` | Facebook pages/profiles. |
| `instagramUrls` | Instagram profiles. |
| `twitterUrls` | X / Twitter profiles. |
| `youtubeUrls` | YouTube channels or handles. |
| `tiktokUrls` | TikTok profiles. |
| `whatsappUrls` | WhatsApp links. |
| `telegramUrls` | Telegram links. |
| `githubUrls` | GitHub organization/user profiles. |
| `mediumUrls` | Medium profiles. |
| Other `*Urls` fields | Additional supported social, app, or marketplace links. |

#### Brand and company data

| Field | Description |
|---|---|
| `companyName` | Company name from JSON-LD or site metadata. |
| `legalName` | Legal name when structured data provides it. |
| `bestLogoUrl` | Best logo/image candidate. |
| `logoUrl` | Same as `bestLogoUrl`, for convenient exports. |
| `logoSource` | `jsonLd`, `headerLogo`, `appleTouchIcon`, `openGraph`, `twitterCard`, or `favicon`. |
| `logoConfidence` | Confidence of selected logo. |
| `faviconUrl` | Favicon URL. |
| `appleTouchIconUrl` | Apple touch icon URL. |
| `openGraphImageUrl` | OpenGraph image URL. |
| `twitterImageUrl` | Twitter card image URL. |
| `brandImages` | List of likely brand images. |
| `addresses` | Structured addresses when present. |
| `openingHours` | Structured opening hours when present. |
| `taxIds` | Visible VAT/tax IDs when detected. |
| `vatIds` | Alias of `taxIds`. |
| `registrationNumbers` | Visible company registration numbers when detected. |

#### Evidence fields

These appear when `includeEvidence` is enabled:

| Field | Description |
|---|---|
| `bestContactPage` | Best page where useful contact data was found. |
| `sourcePages` | Crawled pages with counts of found emails, phones, and socials. |
| `contactEvidence` | Full contact evidence with value, source URL, page type, confidence, and context. |
| `imageEvidence` | Logo and image extraction evidence. |
| `warnings` | Non-fatal fetch or parsing warnings. |
| `errors` | Fatal run errors for that website. |

***

### Example output

```json
{
  "recordType": "domain",
  "domain": "wolfssl.com",
  "rootDomain": "wolfssl.com",
  "startUrl": "https://www.wolfssl.com/contact/",
  "finalUrl": "https://www.wolfssl.com/contact/",
  "status": "ok",
  "statusCode": 200,
  "companyName": "wolfSSL",
  "pagesCrawled": 5,
  "pagesMatched": 5,
  "bestEmail": "support@wolfssl.com",
  "bestEmailType": "support",
  "bestEmailConfidence": "high",
  "bestPhone": "+1 (425) 245-8247",
  "bestPhoneE164": "+14252458247",
  "bestPhoneConfidence": "high",
  "emails": [
    "support@wolfssl.com",
    "facts@wolfssl.com",
    "licensing@wolfssl.com"
  ],
  "phones": [
    "+1 (425) 245-8247"
  ],
  "socialProfiles": [
    { "platform": "X / Twitter", "url": "https://twitter.com/wolfssl" },
    { "platform": "Facebook", "url": "https://www.facebook.com/wolfssl" },
    { "platform": "GitHub", "url": "https://www.github.com/wolfssl" }
  ],
  "bestLogoUrl": "https://www.wolfssl.com/wordpress/wp-content/uploads/2020/12/cropped-wolfssl_logo_300px.png"
}
```

***

### Contact discovery logic

The actor does not crawl the whole website blindly. It prioritizes URLs that usually contain contacts:

```text
/contact
/contacts
/about
/team
/staff
/support
/help
/sales
/press
/media
/locations
/offices
/impressum
/imprint
/legal
/privacy
```

When `useSitemap` is enabled, it checks `sitemap.xml` and adds only contact-like URLs from the sitemap. This helps find contact pages that are not linked from the homepage.
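
The idea of keeping only contact-like URLs and visiting the most promising ones first can be sketched as follows. This is not the actor's code: the hint list mirrors the paths above, while ranking by list position, prefix matching on path segments, and the names `contact_score` and `prioritize` are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Path hints from the list above; earlier entries rank higher (an assumption).
CONTACT_HINTS = (
    "contact", "about", "team", "staff", "support", "help", "sales", "press",
    "media", "locations", "offices", "impressum", "imprint", "legal", "privacy",
)


def contact_score(url: str) -> int:
    """Lower is better; len(CONTACT_HINTS) means no hint matched."""
    segments = [s for s in urlsplit(url).path.lower().split("/") if s]
    for rank, hint in enumerate(CONTACT_HINTS):
        # Prefix match so "/contacts" and "/about-us" also count.
        if any(segment.startswith(hint) for segment in segments):
            return rank
    return len(CONTACT_HINTS)


def prioritize(urls, limit):
    """Drop URLs without a contact hint, then return the best `limit` URLs."""
    scored = [(contact_score(u), u) for u in urls]
    matched = [pair for pair in scored if pair[0] < len(CONTACT_HINTS)]
    matched.sort(key=lambda pair: pair[0])  # stable sort keeps input order on ties
    return [u for _, u in matched][:limit]


if __name__ == "__main__":
    candidates = [
        "https://example.com/blog/post",
        "https://example.com/privacy",
        "https://example.com/about-us",
        "https://example.com/contact",
    ]
    print(prioritize(candidates, 2))
```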

***

### How ranking works

The actor returns all unique contacts, but it also chooses `bestEmail` and `bestPhone`.

Email ranking prefers:

1. Same-domain business emails
2. High-confidence sources like `mailto`, contact pages, support pages, and JSON-LD
3. Useful role emails like sales, info, support, and press
4. Lower-confidence text matches only when they look legitimate

Phone ranking prefers:

1. Valid phone numbers
2. Numbers matching the selected `countryHint`
3. Numbers found on contact, support, sales, or legal pages
4. Numbers that can be normalized to E.164
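
To make the email preferences concrete, here is a minimal sketch that turns them into a sort key. It is an illustration only: the strict lexicographic ordering, the source labels in `STRONG_SOURCES`, and the names `Candidate`, `rank_key`, and `best_email` are assumptions, not the actor's scoring.

```python
from dataclasses import dataclass

ROLE_LOCAL_PARTS = ("sales", "info", "support", "press")
# Hypothetical source labels standing in for "high-confidence sources".
STRONG_SOURCES = {"mailto", "jsonLd", "contactPage", "supportPage"}


@dataclass
class Candidate:
    email: str
    source: str


def rank_key(candidate: Candidate, root_domain: str):
    """Smaller tuples rank first: same domain, then strong source, then role address."""
    local, _, domain = candidate.email.lower().rpartition("@")
    same_domain = domain == root_domain or domain.endswith("." + root_domain)
    strong_source = candidate.source in STRONG_SOURCES
    role_address = local in ROLE_LOCAL_PARTS
    return (not same_domain, not strong_source, not role_address)


def best_email(candidates, root_domain: str):
    """Return the top-ranked email, or None when there are no candidates."""
    if not candidates:
        return None
    return min(candidates, key=lambda c: rank_key(c, root_domain)).email


if __name__ == "__main__":
    found = [
        Candidate("john@gmail.com", "mailto"),
        Candidate("info@example.com", "text"),
        Candidate("sales@example.com", "mailto"),
    ]
    print(best_email(found, "example.com"))
```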

***

### Clean output and filtering

The actor filters common junk before writing results:

- Fake emails like `test@example.com`
- Asset-like strings that look like emails but are actually images or scripts
- Sentry, schema, Wix, and placeholder domains
- Personal/free-mail noise when it appears as unrelated text on a different company site
- Social share/login URLs
- LinkedIn, Trustpilot, Google Maps, and Threads links
- Empty arrays, empty objects, nulls, and blank strings when `compactOutput` is enabled

This keeps Apify's **All fields** view clean and makes CSV/Excel exports easier to use.
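
If you post-process rows produced with `compactOutput` disabled, the same cleanup can be approximated in a few lines. The sketch below assumes the removal is applied recursively and that `0` and `false` are kept because they are real values; both points are assumptions about the behavior, and `compact` is a hypothetical helper.

```python
def _is_empty(value) -> bool:
    """True for None, blank strings, empty lists, and empty dicts."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, (list, dict)):
        return len(value) == 0
    return False


def compact(value):
    """Recursively drop empty arrays, empty objects, nulls, and blank strings."""
    if isinstance(value, dict):
        cleaned = {key: compact(item) for key, item in value.items()}
        return {key: item for key, item in cleaned.items() if not _is_empty(item)}
    if isinstance(value, list):
        cleaned = [compact(item) for item in value]
        return [item for item in cleaned if not _is_empty(item)]
    return value


if __name__ == "__main__":
    row = {"domain": "example.com", "emails": [], "legalName": None, "statusCode": 200}
    print(compact(row))
```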

***

### How to run with Apify API

Replace `YOUR_TOKEN` with your Apify API token.

```bash
curl -X POST "https://api.apify.com/v2/acts/trakk~deep-email-phone-social-media-scraper-search/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "startUrls": [
      { "url": "https://www.wolfssl.com/contact/" },
      { "url": "https://www.bluehost.com/contact" }
    ],
    "maxDomains": 2,
    "maxPagesPerDomain": 5,
    "outputMode": "summary"
  }'
```

To get dataset items after the run finishes:

```bash
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?clean=true&format=json&token=YOUR_TOKEN"
```

***

### Apify CLI commands

#### Log in

```bash
apify login
```

#### Run the actor on Apify

```bash
apify call trakk/deep-email-phone-social-media-scraper-search --input-file input.json
```

You can also call by actor ID:

```bash
apify call BtjVjAQKexpfdq5po --input-file input.json
```

#### Run and print dataset output

```bash
apify call BtjVjAQKexpfdq5po --input-file input.json --output-dataset
```

#### Check latest runs

```bash
apify runs ls BtjVjAQKexpfdq5po --limit 10 --desc
```

#### View one run

```bash
apify runs info RUN_ID
```

#### Download dataset items

```bash
apify datasets get-items DATASET_ID --format json
apify datasets get-items DATASET_ID --format csv
apify datasets get-items DATASET_ID --format xlsx
```

#### Deploy updates to Apify

```bash
apify push --force
```

***

### Local development commands

Install dependencies:

```bash
pip install -r requirements.txt
```

Run locally with Apify storage:

```bash
apify run
```

Run the Python module directly:

```bash
python -m src
```

Run unit tests:

```bash
python -m unittest discover -s tests -v
```

Validate the Apify input schema:

```bash
apify validate-schema .actor/input_schema.json
```

Deploy:

```bash
apify push --force
```

***

### Performance tips

#### Fast and cheap first run

Use:

```json
{
  "maxPagesPerDomain": 3,
  "maxDepth": 1,
  "stopWhenFound": true,
  "maxConcurrency": 10
}
```

#### Better coverage

Use:

```json
{
  "maxPagesPerDomain": 10,
  "maxDepth": 2,
  "stopWhenFound": false,
  "useSitemap": true
}
```

#### Large website lists

Recommended settings:

```json
{
  "maxPagesPerDomain": 3,
  "maxDepth": 1,
  "stopWhenFound": true,
  "compactOutput": true,
  "includeEvidence": false,
  "maxConcurrency": 10
}
```

#### When to use proxies

Apify Proxy is disabled by default so the sample run stays cheap and stable. For most public company websites, direct requests are enough. If a website blocks direct traffic, enable Apify Proxy in `proxyConfiguration`; residential proxy can help for more protected sites.

#### When to raise retries

Raise `maxRetries` and `maxProxyRetries` if some websites randomly fail with connection errors, 429, 5xx, or temporary blocks.
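
The input schema further down lists 408, 425, 429, and 5xx as the HTTP statuses covered by `maxRetries`. As a rough sketch of that classification (the function name `is_retryable` is hypothetical, and the actor's real logic may consider more signals, such as connection errors):

```python
# Status codes named in the input schema description of `maxRetries`.
RETRYABLE_STATUSES = {408, 425, 429}


def is_retryable(status_code: int) -> bool:
    """Return True for statuses that usually indicate a temporary failure."""
    return status_code in RETRYABLE_STATUSES or 500 <= status_code <= 599


if __name__ == "__main__":
    for code in (200, 404, 429, 503):
        print(code, is_retryable(code))
```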

***

### Status values

| Status | Meaning |
|---|---|
| `ok` | The website was crawled and at least one email, phone number, or social profile was found. |
| `no_contacts_found` | Pages were fetched, but no useful contacts were found. |
| `failed` | No pages were crawled successfully. Check `warnings` or `errors`. |
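
After a large run, a quick tally by `status` shows how many websites need a second pass. The sketch below assumes each summary row carries `status` and `startUrl` as documented above; `triage` is a hypothetical helper, and it uses `.get()` because `compactOutput` may drop empty fields.

```python
from collections import Counter


def triage(items):
    """Return (status counts, start URLs of failed websites worth retrying)."""
    counts = Counter(item.get("status", "unknown") for item in items)
    retry_urls = [
        item["startUrl"]
        for item in items
        if item.get("status") == "failed" and item.get("startUrl")
    ]
    return counts, retry_urls


if __name__ == "__main__":
    rows = [
        {"status": "ok", "startUrl": "https://a.example"},
        {"status": "failed", "startUrl": "https://b.example"},
        {"status": "no_contacts_found", "startUrl": "https://c.example"},
    ]
    counts, retry_urls = triage(rows)
    print(dict(counts), retry_urls)
```

The failed URLs can then be fed into a second run, for example with Apify Proxy enabled or higher retry budgets.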

***

### FAQ

**Does this actor use a browser?**

No. It is requests-only. It does not use Playwright, Puppeteer, Selenium, or a headless browser.

**Can it find emails hidden behind JavaScript?**

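Only when the address is already present in the fetched response: in the HTML, JSON-LD, a `mailto:` link, Cloudflare email protection markup, or page text. The actor does not execute JavaScript, so addresses injected by client-side scripts are missed.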
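
Two of these cases can be decoded without running any JavaScript. The sketch below is illustrative and not the actor's code. It assumes the widely used Cloudflare scheme, in which a hex string (typically found in a `data-cfemail` attribute) starts with a one-byte XOR key, and it handles only bracketed `[at]` / `(dot)` forms. The names `decode_cfemail` and `deobfuscate` are hypothetical.

```python
import re


def decode_cfemail(encoded_hex: str) -> str:
    """Decode a Cloudflare-protected email: first byte is the XOR key."""
    data = bytes.fromhex(encoded_hex)
    key, payload = data[0], data[1:]
    return bytes(byte ^ key for byte in payload).decode("utf-8")


# Bracketed or parenthesized "at" / "dot", with optional surrounding spaces.
_AT = re.compile(r"\s*[\[(]\s*at\s*[\])]\s*", re.IGNORECASE)
_DOT = re.compile(r"\s*[\[(]\s*dot\s*[\])]\s*", re.IGNORECASE)


def deobfuscate(text: str) -> str:
    """Rewrite 'info [at] example [dot] com' style text into a plain address."""
    return _DOT.sub(".", _AT.sub("@", text))


if __name__ == "__main__":
    print(deobfuscate("info [at] example [dot] com"))
```

Unbracketed spellings such as `name at example dot com` are deliberately left alone here, because they are easy to confuse with ordinary sentences.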

**Does it verify that an email inbox exists?**

No. It extracts and cleans public emails, but it does not perform SMTP verification or deliverability checks.

**Why did a website return no contacts?**

Possible reasons: the site blocks automated requests, contacts are loaded only after JavaScript execution, contacts are behind forms, or the site does not publish direct contacts.

**Can I scrape thousands of websites?**

Yes. Use `datasetId` for large input lists, keep `maxPagesPerDomain` modest, keep `stopWhenFound` enabled, and tune `maxConcurrency` based on stability.

**What is the best setting for normal lead generation?**

Use `summary` output, `compactOutput: true`, `maxPagesPerDomain: 5`, `maxDepth: 1`, `stopWhenFound: true`, and `includeEvidence: false`.

**When should I enable evidence?**

Enable `includeEvidence` when you need to audit where contacts came from, debug results, or show source URLs to a client. Keep it disabled for cleaner CSV exports.

**Can I use the result in Google Sheets, Zapier, Make, or n8n?**

Yes. Apify datasets and webhooks can feed these tools: export dataset items in the format the tool expects, or trigger a webhook when a run finishes.

**Does it scrape LinkedIn?**

No. LinkedIn is intentionally excluded. Use a dedicated LinkedIn actor if you need LinkedIn data.

**Is this legal?**

The actor extracts publicly visible website data. You are responsible for using the data legally and respecting privacy, anti-spam, GDPR, CCPA, CAN-SPAM, and other rules that apply to your use case.

***

### Best practices for clean lead lists

- Start with company homepages or contact pages.
- Keep `compactOutput` enabled.
- Use `summary` mode for CRM exports.
- Use `includeEvidence` only when you need auditability.
- Run a small sample first, then scale.
- For phone numbers published without a country prefix, set `countryHint` to the most common target country.
- For outreach, verify emails with a deliverability tool before sending campaigns.

***

### Tags

`email scraper` | `phone scraper` | `contact scraper` | `social media scraper` | `website contact extractor` | `lead generation` | `b2b leads` | `company enrichment` | `crm enrichment` | `business contacts` | `website scraper` | `email finder` | `phone number finder` | `social profile finder` | `logo extractor` | `brand data` | `sales prospecting` | `marketing outreach` | `Apify actor` | `HTTP scraper` | `no browser scraper`

***

Built for Apify. HTTP-only. Clean contact leads from website lists.

# Actor input Schema

## `startUrls` (type: `array`):

Websites to scan. You can paste full URLs or plain domains like example.com.

## `datasetId` (type: `string`):

Optional dataset with website, url, domain, companyUrl, sourceUrl, or finalUrl fields.

## `maxDomains` (type: `integer`):

Safety cap for how many unique domains/URLs are processed from all inputs.

## `maxPagesPerDomain` (type: `integer`):

Maximum pages to fetch per website. Most contacts are found in the first 3-10 pages.

## `maxDepth` (type: `integer`):

0 = homepage only. 1 = homepage plus linked contact/about pages. 2 = one more layer from those pages.

## `stopWhenFound` (type: `boolean`):

Stops crawling a website once a strong email plus phone or social channels are found. Saves time and cost.

## `extractEmails` (type: `boolean`):

Find normal, mailto, obfuscated, and Cloudflare-protected email addresses.

## `extractPhones` (type: `boolean`):

Find phone numbers from tel links and page text, then normalize valid numbers to E.164 when possible.

## `extractSocials` (type: `boolean`):

Facebook, Instagram, X/Twitter, YouTube, TikTok, Pinterest, Reddit, Snapchat, Discord, Twitch, GitHub, Medium, WhatsApp, Telegram, Yelp, TripAdvisor, app stores, marketplaces.

## `extractBrandAssets` (type: `boolean`):

Find logo, favicon, apple-touch icon, OpenGraph image, Twitter card image, and header logo URLs.

## `extractBusinessData` (type: `boolean`):

Find company name, JSON-LD organization data, addresses, opening hours, VAT/tax IDs, and registration numbers when visible.

## `useSitemap` (type: `boolean`):

Reads sitemap.xml and adds only contact-like URLs such as contact, about, team, support, impressum, legal, offices.

## `followSubdomains` (type: `boolean`):

Allow crawling links on the same root domain, e.g. help.example.com from www.example.com.

## `countryHint` (type: `string`):

Used for phone normalization when a number has no country prefix.

## `outputMode` (type: `string`):

Summary returns one clean lead per website. Pages returns one item per crawled page. Both returns summary and page evidence rows.

## `includeEvidence` (type: `boolean`):

Attach detailed evidence objects with value, source URL, page type, confidence, and context.

## `compactOutput` (type: `boolean`):

Remove empty arrays, empty objects, nulls, and blank strings from dataset items.

## `maxConcurrency` (type: `integer`):

How many websites to process in parallel.

## `requestTimeoutSec` (type: `integer`):

HTTP request timeout in seconds.

## `maxRetries` (type: `integer`):

Retry budget for 408, 425, 429, 5xx, connection, and proxy errors.

## `maxProxyRetries` (type: `integer`):

Dedicated retry budget for proxy/transport failures.

## `proxyConfiguration` (type: `object`):

Apify Proxy configuration. Disabled by default for cheap health checks; enable it if target websites block direct requests.

## `userAgent` (type: `string`):

Optional custom user agent. Leave empty for the built-in Chrome-like headers.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://www.wolfssl.com/contact/"
    }
  ],
  "maxDomains": 1,
  "maxPagesPerDomain": 5,
  "maxDepth": 1,
  "stopWhenFound": true,
  "extractEmails": true,
  "extractPhones": true,
  "extractSocials": true,
  "extractBrandAssets": true,
  "extractBusinessData": true,
  "useSitemap": true,
  "followSubdomains": false,
  "countryHint": "US",
  "outputMode": "summary",
  "includeEvidence": false,
  "compactOutput": true,
  "maxConcurrency": 10,
  "requestTimeoutSec": 25,
  "maxRetries": 3,
  "maxProxyRetries": 3,
  "proxyConfiguration": {
    "useApifyProxy": false
  },
  "userAgent": ""
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://www.wolfssl.com/contact/"
        }
    ],
    "maxDomains": 1,
    "maxPagesPerDomain": 5,
    "proxyConfiguration": {
        "useApifyProxy": false
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("trakk/deep-email-phone-social-media-scraper-search").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [{ "url": "https://www.wolfssl.com/contact/" }],
    "maxDomains": 1,
    "maxPagesPerDomain": 5,
    "proxyConfiguration": { "useApifyProxy": False },
}

# Run the Actor and wait for it to finish
run = client.actor("trakk/deep-email-phone-social-media-scraper-search").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://www.wolfssl.com/contact/"
    }
  ],
  "maxDomains": 1,
  "maxPagesPerDomain": 5,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}' |
apify call trakk/deep-email-phone-social-media-scraper-search --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=trakk/deep-email-phone-social-media-scraper-search",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Deep Email, Phone & Social Media Scraper",
        "description": "Find emails, phone numbers, social profiles, logos, and business contact details from any website list. HTTP-only, fast, clean output, with smart contact-page discovery and optional source evidence for lead generation.",
        "version": "0.1",
        "x-build-id": "k6nTOAJ0790NhcG6s"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/trakk~deep-email-phone-social-media-scraper-search/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-trakk-deep-email-phone-social-media-scraper-search",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/trakk~deep-email-phone-social-media-scraper-search/runs": {
            "post": {
                "operationId": "runs-sync-trakk-deep-email-phone-social-media-scraper-search",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/trakk~deep-email-phone-social-media-scraper-search/run-sync": {
            "post": {
                "operationId": "run-sync-trakk-deep-email-phone-social-media-scraper-search",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "startUrls": {
                        "title": "🌐 Website URL(s) or domain(s)",
                        "type": "array",
                        "description": "Websites to scan. You can paste full URLs or plain domains like example.com.",
                        "default": [
                            {
                                "url": "https://www.wolfssl.com/contact/"
                            }
                        ],
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "datasetId": {
                        "title": "📥 Input dataset",
                        "type": "string",
                        "description": "Optional dataset with website, url, domain, companyUrl, sourceUrl, or finalUrl fields."
                    },
                    "maxDomains": {
                        "title": "💯 Number of websites to process",
                        "minimum": 1,
                        "maximum": 100000,
                        "type": "integer",
                        "description": "Safety cap for how many unique domains/URLs are processed from all inputs.",
                        "default": 1
                    },
                    "maxPagesPerDomain": {
                        "title": "📄 Pages per website",
                        "minimum": 1,
                        "maximum": 500,
                        "type": "integer",
                        "description": "Maximum pages to fetch per website. Most contacts are found in the first 3-10 pages.",
                        "default": 5
                    },
                    "maxDepth": {
                        "title": "🧭 Crawl depth",
                        "minimum": 0,
                        "maximum": 5,
                        "type": "integer",
                        "description": "0 = homepage only. 1 = homepage plus linked contact/about pages. 2 = one more layer from those pages.",
                        "default": 1
                    },
                    "stopWhenFound": {
                        "title": "✅ Stop early when enough contacts are found",
                        "type": "boolean",
                        "description": "Stops crawling a website once a strong email plus phone or social channels are found. Saves time and cost.",
                        "default": true
                    },
                    "extractEmails": {
                        "title": "📧 Extract emails",
                        "type": "boolean",
                        "description": "Find normal, mailto, obfuscated, and Cloudflare-protected email addresses.",
                        "default": true
                    },
                    "extractPhones": {
                        "title": "📞 Extract phones",
                        "type": "boolean",
                        "description": "Find phone numbers from tel links and page text, then normalize valid numbers to E.164 when possible.",
                        "default": true
                    },
                    "extractSocials": {
                        "title": "📣 Extract social and messaging channels",
                        "type": "boolean",
                        "description": "Facebook, Instagram, X/Twitter, YouTube, TikTok, Pinterest, Reddit, Snapchat, Discord, Twitch, GitHub, Medium, WhatsApp, Telegram, Yelp, TripAdvisor, app stores, marketplaces.",
                        "default": true
                    },
                    "extractBrandAssets": {
                        "title": "🖼️ Extract logo and brand images",
                        "type": "boolean",
                        "description": "Find logo, favicon, apple-touch icon, OpenGraph image, Twitter card image, and header logo URLs.",
                        "default": true
                    },
                    "extractBusinessData": {
                        "title": "🏢 Extract business/legal hints",
                        "type": "boolean",
                        "description": "Find company name, JSON-LD organization data, addresses, opening hours, VAT/tax IDs, and registration numbers when visible.",
                        "default": true
                    },
                    "useSitemap": {
                        "title": "🗺️ Use sitemap for contact discovery",
                        "type": "boolean",
                        "description": "Reads sitemap.xml and adds only contact-like URLs such as contact, about, team, support, impressum, legal, offices.",
                        "default": true
                    },
                    "followSubdomains": {
                        "title": "Follow same-root subdomains",
                        "type": "boolean",
                        "description": "Allow crawling links on the same root domain, e.g. help.example.com from www.example.com.",
                        "default": false
                    },
                    "countryHint": {
                        "title": "🌍 Default phone country",
                        "enum": [
                            "US",
                            "GB",
                            "DE",
                            "FR",
                            "ES",
                            "IT",
                            "NL",
                            "PL",
                            "IN",
                            "CA",
                            "AU",
                            "BR",
                            "MX"
                        ],
                        "type": "string",
                        "description": "Used for phone normalization when a number has no country prefix.",
                        "default": "US"
                    },
                    "outputMode": {
                        "title": "📦 Output mode",
                        "enum": [
                            "summary",
                            "pages",
                            "both"
                        ],
                        "type": "string",
                        "description": "Summary returns one clean lead per website. Pages returns one item per crawled page. Both returns summary and page evidence rows.",
                        "default": "summary"
                    },
                    "includeEvidence": {
                        "title": "🧾 Include contact evidence",
                        "type": "boolean",
                        "description": "Attach detailed evidence objects with value, source URL, page type, confidence, and context.",
                        "default": false
                    },
                    "compactOutput": {
                        "title": "🧹 Compact output",
                        "type": "boolean",
                        "description": "Remove empty arrays, empty objects, nulls, and blank strings from dataset items.",
                        "default": true
                    },
                    "maxConcurrency": {
                        "title": "⚙️ Max concurrency",
                        "minimum": 1,
                        "maximum": 100,
                        "type": "integer",
                        "description": "How many websites to process in parallel.",
                        "default": 10
                    },
                    "requestTimeoutSec": {
                        "title": "Request timeout",
                        "minimum": 3,
                        "maximum": 120,
                        "type": "integer",
                        "description": "HTTP request timeout in seconds.",
                        "default": 25
                    },
                    "maxRetries": {
                        "title": "Max retries per page",
                        "minimum": 0,
                        "maximum": 20,
                        "type": "integer",
                        "description": "Retry budget for 408, 425, 429, 5xx, connection, and proxy errors.",
                        "default": 3
                    },
                    "maxProxyRetries": {
                        "title": "Max proxy retries per page",
                        "minimum": 1,
                        "maximum": 20,
                        "type": "integer",
                        "description": "Dedicated retry budget for proxy/transport failures.",
                        "default": 3
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Apify Proxy configuration. Disabled by default for cheap health checks; enable it if target websites block direct requests.",
                        "default": {
                            "useApifyProxy": false
                        }
                    },
                    "userAgent": {
                        "title": "User agent",
                        "type": "string",
                        "description": "Optional custom user agent. Leave empty for the built-in Chrome-like headers.",
                        "default": ""
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
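
The `inputSchema` in the definition above sets numeric bounds and an `outputMode` enum, and every endpoint takes the token as a required `token` query parameter. The following is a minimal Python sketch of how a caller might check an input against those bounds before sending it. The helper names (`build_input`, `build_url`, `run_sync_get_items`) are hypothetical. The base URL `https://api.apify.com/v2` and the `run-sync-get-dataset-items` path segment are assumptions based on the operation IDs and sibling paths, so confirm them against the `servers` and `paths` entries of the definition.

```python
import json
import urllib.parse
import urllib.request

# Assumed Apify API v2 base URL; the spec's "servers" entry is authoritative.
API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "trakk~deep-email-phone-social-media-scraper-search"

# (minimum, maximum) pairs copied from inputSchema in the definition above.
INTEGER_BOUNDS = {
    "maxDomains": (1, 100000),
    "maxPagesPerDomain": (1, 500),
    "maxDepth": (0, 5),
    "maxConcurrency": (1, 100),
    "requestTimeoutSec": (3, 120),
    "maxRetries": (0, 20),
    "maxProxyRetries": (1, 20),
}
OUTPUT_MODES = {"summary", "pages", "both"}


def build_input(urls, **options):
    """Return an input dict shaped like inputSchema, rejecting out-of-range values."""
    for name, value in options.items():
        if name in INTEGER_BOUNDS:
            low, high = INTEGER_BOUNDS[name]
            is_int = isinstance(value, int) and not isinstance(value, bool)
            if not is_int or not low <= value <= high:
                raise ValueError(f"{name} must be an integer in [{low}, {high}]")
    mode = options.get("outputMode", "summary")
    if mode not in OUTPUT_MODES:
        raise ValueError(f"outputMode must be one of {sorted(OUTPUT_MODES)}")
    # startUrls is an array of objects with a required "url" property.
    payload = {"startUrls": [{"url": u} for u in urls]}
    payload.update(options)
    return payload


def build_url(endpoint, token):
    """Build an endpoint URL carrying the required `token` query parameter."""
    query = urllib.parse.urlencode({"token": token})
    return f"{API_BASE}/acts/{ACTOR_ID}/{endpoint}?{query}"


def run_sync_get_items(token, payload, timeout=300):
    """POST the input and return the parsed dataset items (makes a network call)."""
    request = urllib.request.Request(
        build_url("run-sync-get-dataset-items", token),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `build_input(["https://example.com"], maxDepth=2, outputMode="both")` returns a dict that can be passed to `run_sync_get_items`. Validating locally is optional, but it surfaces an out-of-range value before a paid run is started.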

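The `/runs` endpoint returns a body shaped like `runsResponseSchema`, with the run details nested under `data`. The sketch below shows one way to pull out the fields a caller typically needs next. The function names are hypothetical. The set of terminal statuses and the `/datasets/{id}/items` path are not part of the definition above; they are assumptions taken from the general Apify API and should be checked against its reference.

```python
import urllib.parse

# Assumed Apify API v2 base URL; not stated in the schema excerpt above.
API_BASE = "https://api.apify.com/v2"

# Assumed terminal run statuses; the schema above only shows "READY" as an example.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"}


def summarize_run(response_body):
    """Extract the commonly used fields from a runsResponseSchema-shaped dict."""
    data = response_body.get("data", {})
    status = data.get("status")
    return {
        "run_id": data.get("id"),
        "status": status,
        "finished": status in TERMINAL_STATUSES,
        "dataset_id": data.get("defaultDatasetId"),
        "total_usd": data.get("usageTotalUsd"),
    }


def dataset_items_url(dataset_id, token):
    """Build the URL for reading items from the run's default dataset."""
    query = urllib.parse.urlencode({"token": token})
    return f"{API_BASE}/datasets/{dataset_id}/items?{query}"


# Example with a hand-written response; the IDs are placeholders, not real values.
sample = {
    "data": {
        "id": "RUN_ID",
        "status": "READY",
        "defaultDatasetId": "DATASET_ID",
        "usageTotalUsd": 0.00005,
    }
}
summary = summarize_run(sample)
print(summary["status"], summary["finished"])
print(dataset_items_url(summary["dataset_id"], "YOUR_TOKEN"))
```

A run that is still `READY` or `RUNNING` has not written all of its results yet, so a caller would poll the run until `finished` is true before reading the dataset.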