Pricing

from $0.60 / 1,000 website contact leads

Deep Email, Phone & Social Media Scraper

Find emails, phone numbers, social profiles, logos, and business contact details from any list of websites. The actor uses plain HTTP requests (no browser), returns clean output, discovers contact pages automatically, and can optionally record source evidence for each lead.

Developer

Blynx

What this actor extracts

Contacts

Emails from mailto: links, visible text, raw HTML, Cloudflare-protected emails, JSON-LD, and common obfuscation such as [at] and dot
Phone numbers from tel: links, page text, and JSON-LD
E.164 phone normalization when the number is valid
Best email, best phone, confidence score, and source page
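Two of the obfuscation cases above have well-known decoding rules. Cloudflare email protection stores the address as a hex string whose first byte is an XOR key for the remaining bytes, and bracketed [at] / [dot] text can be rewritten with regular expressions. The sketch below illustrates both techniques; it is not the actor's code, and the regex patterns are simplified assumptions (only bracketed forms are rewritten, so ordinary prose such as "contact us at" is left alone).

```python
import re

# Simplified email pattern; real-world address syntax is broader than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Bracketed forms only: [at], (at), {at} and [dot], (dot), {dot}, with optional spaces.
AT_RE = re.compile(r"\s*[\[({]\s*at\s*[\])}]\s*", re.IGNORECASE)
DOT_RE = re.compile(r"\s*[\[({]\s*dot\s*[\])}]\s*", re.IGNORECASE)


def decode_cfemail(hex_str: str) -> str:
    """Decode a Cloudflare data-cfemail value: byte 0 is the XOR key for the rest."""
    key = int(hex_str[:2], 16)
    return "".join(
        chr(int(hex_str[i:i + 2], 16) ^ key) for i in range(2, len(hex_str), 2)
    )


def deobfuscate(text: str) -> str:
    """Rewrite 'name [at] example [dot] com' style text into a plain address."""
    return DOT_RE.sub(".", AT_RE.sub("@", text))


def extract_emails(text: str) -> list[str]:
    """Return unique, lowercased addresses in order of first appearance."""
    found: list[str] = []
    for match in EMAIL_RE.findall(deobfuscate(text)):
        email = match.lower()
        if email not in found:
            found.append(email)
    return found
```

For example, `extract_emails("Sales [at] Acme [dot] com or mailto:info@acme.com")` returns both addresses, lowercased, in the order they appear.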

Social and platform profiles

Facebook
Instagram
X / Twitter
YouTube
TikTok
Pinterest
Reddit
Snapchat
Discord
Twitch
GitHub
Medium
WhatsApp
Telegram
Yelp
Tripadvisor
App Store
Google Play
Amazon, Etsy, and eBay store/profile links

LinkedIn, Trustpilot, Google Maps, and Threads are intentionally excluded to keep the output focused and avoid duplicating separate scrapers.

Business and brand data

Company name
Legal name
Website title and meta description
Addresses from structured data when present
Opening hours from structured data when present
VAT / tax IDs when visible
Registration numbers when visible
Best logo URL
Favicon
Apple touch icon
OpenGraph image
Twitter card image
Source evidence for logo/contact extraction when enabled

Common use cases

Lead generation from company website lists
CRM enrichment
Agency prospecting
B2B sales research
Directory enrichment
Supplier and vendor research
Startup, SaaS, ecommerce, and local business contact collection
Marketing outreach preparation
Checking whether company websites publish contact details
Building internal contact intelligence datasets

Quick start in Apify Console

Open the actor in Apify Console.
Paste websites into Website URL(s) or domain(s).
For the first test, set Number of websites to process to 10.
Keep Pages per website between 3 and 10.
Keep Stop early when enough contacts are found enabled to save cost.
Run the actor.
Open the default dataset and export results as JSON, CSV, Excel, XML, RSS, or HTML.

You can paste full URLs or plain domains. These are all valid:

example.com
https://example.com
https://www.example.com/contact

The actor normalizes plain domains to HTTPS automatically.
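The normalization step can be sketched with Python's `urllib.parse`. This is a minimal illustration that assumes any input without `://` is a bare domain (optionally followed by a path); it keeps an explicit `http://` scheme as given and drops URL fragments, which may differ from what the actor actually does.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_start_url(raw: str) -> str:
    """Turn 'example.com' or a full URL into a normalized absolute URL."""
    text = raw.strip()
    if "://" not in text:
        # Plain domain or domain/path: assume HTTPS, as described above.
        text = "https://" + text
    parts = urlsplit(text)  # urlsplit lowercases the scheme
    host = parts.hostname  # hostname is already lowercased by urlsplit
    if not host:
        raise ValueError(f"not a URL or domain: {raw!r}")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Drop the fragment; an empty path becomes "/".
    return urlunsplit((parts.scheme, netloc, parts.path or "/", parts.query, ""))
```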

Recommended input examples

Simple website list

Use this for normal lead enrichment.

{
  "startUrls": [
    { "url": "https://www.cloudflare.com/resource/contact-enterprise-sales/" },
    { "url": "https://www.bluehost.com/contact" },
    { "url": "https://www.wolfssl.com/contact/" }
  ],
  "maxDomains": 10,
  "maxPagesPerDomain": 5,
  "maxDepth": 1,
  "stopWhenFound": true,
  "outputMode": "summary",
  "compactOutput": true
}

Deeper contact discovery

Use this when websites are messy and contacts may be on support, about, legal, team, or office pages.

{
  "startUrls": [
    { "url": "https://example.com" }
  ],
  "maxDomains": 1,
  "maxPagesPerDomain": 15,
  "maxDepth": 2,
  "useSitemap": true,
  "stopWhenFound": false,
  "includeEvidence": true,
  "outputMode": "summary"
}

Import websites from another dataset

Use this when a previous actor produced a dataset with website URLs.

{
  "datasetId": "YOUR_DATASET_ID",
  "maxDomains": 1000,
  "maxPagesPerDomain": 5,
  "outputMode": "summary"
}

The actor reads these fields from input dataset items:

website, url, domain, companyUrl, sourceUrl, finalUrl, startUrl
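A hypothetical sketch of how such dataset items could be reduced to a list of websites capped by maxDomains. The field names come from the list above, but trying them in that order and de-duplicating by host (ignoring a leading `www.`) are assumptions, not documented behavior.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit

# Field names listed in this README; the lookup order is an assumption.
URL_FIELDS = ("website", "url", "domain", "companyUrl", "sourceUrl", "finalUrl", "startUrl")


def url_from_item(item: dict) -> Optional[str]:
    """Return the first non-empty URL-like field of a dataset item."""
    for field in URL_FIELDS:
        value = item.get(field)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None


def collect_websites(items: Iterable[dict], max_domains: int) -> list[str]:
    """Keep one URL per host until max_domains websites have been collected."""
    seen_hosts: set[str] = set()
    websites: list[str] = []
    for item in items:
        if len(websites) >= max_domains:
            break
        raw = url_from_item(item)
        if raw is None:
            continue  # item has no usable URL field
        host = urlsplit(raw if "://" in raw else "https://" + raw).hostname or ""
        host = host.removeprefix("www.")
        if not host or host in seen_hosts:
            continue
        seen_hosts.add(host)
        websites.append(raw)
    return websites
```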

Evidence and audit mode

Use this when you want to see where each email, phone, social profile, or logo came from.

{
  "startUrls": [
    { "url": "https://example.com" }
  ],
  "includeEvidence": true,
  "outputMode": "summary",
  "compactOutput": true
}

Page-level debugging mode

Use this when you want one dataset row per crawled page.

{
  "startUrls": [
    { "url": "https://example.com" }
  ],
  "outputMode": "pages",
  "includeEvidence": true
}

Input fields

Field	Type	Default	Description
`startUrls`	array	sample URLs	Websites or pages to scan. Use full URLs or plain domains.
`datasetId`	string	empty	Optional Apify dataset containing website/domain fields.
`maxDomains`	integer	`1`	Safety cap for unique websites processed from all inputs. Raise it for real batches.
`maxPagesPerDomain`	integer	`5`	Maximum pages fetched per website. Start with `3-10`.
`maxDepth`	integer	`1`	Link depth. `0` = start page only, `1` = linked contact/about pages, `2` = one more layer.
`stopWhenFound`	boolean	`true`	Stops crawling a website early once a strong email plus a phone number or social profiles are found. Saves time and cost.
`extractEmails`	boolean	`true`	Extract plain, obfuscated, `mailto`, JSON-LD, and Cloudflare-protected emails.
`extractPhones`	boolean	`true`	Extract and validate phone numbers.
`extractSocials`	boolean	`true`	Extract social, messaging, marketplace, and app profile links.
`extractBrandAssets`	boolean	`true`	Extract logo, favicon, OpenGraph image, and related brand images.
`extractBusinessData`	boolean	`true`	Extract company name, legal name, addresses, opening hours, tax IDs, and registration numbers.
`useSitemap`	boolean	`true`	Reads `sitemap.xml` and adds contact-like URLs.
`followSubdomains`	boolean	`false`	Allows crawling same-root subdomains, for example `help.example.com`.
`countryHint`	string	`US`	Default phone country for numbers without country prefix.
`outputMode`	string	`summary`	`summary`, `pages`, or `both`.
`includeEvidence`	boolean	`false`	Adds detailed source evidence objects.
`compactOutput`	boolean	`true`	Removes empty arrays, empty objects, nulls, and blank strings.
`maxConcurrency`	integer	`10`	Number of websites processed in parallel.
`requestTimeoutSec`	integer	`25`	HTTP request timeout per page.
`maxRetries`	integer	`3`	Retry budget for HTTP and connection errors.
`maxProxyRetries`	integer	`3`	Extra retry budget for proxy/transport failures.
`proxyConfiguration`	object	Apify Proxy off	Apify Proxy settings. Keep it off for cheap tests; enable it if target sites block direct requests.
`userAgent`	string	empty	Optional custom user agent. Leave empty for built-in Chrome-like headers.
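The interaction between maxPagesPerDomain, maxDepth, and stopWhenFound can be illustrated with a small priority-queue crawl. This is a hypothetical sketch, not the actor's implementation: the `fetch` callback, the keyword subset, and the exact stop rule (any email plus a phone number or social profile) are assumptions. Contact-like pages are crawled before other queued pages, and no link deeper than maxDepth is queued.

```python
import heapq
from itertools import count
from typing import Callable
from urllib.parse import urlsplit

# Subset of the contact-like path keywords listed in this README.
CONTACT_HINTS = ("contact", "about", "team", "support", "sales", "impressum", "imprint", "legal")

# Hypothetical fetcher: returns (same-site links on the page, contacts found on the page).
Fetch = Callable[[str], tuple[list[str], dict[str, list[str]]]]


def path_priority(url: str) -> int:
    """0 for contact-like paths, 1 otherwise (lower values are crawled first)."""
    path = urlsplit(url).path.lower()
    return 0 if any(hint in path for hint in CONTACT_HINTS) else 1


def crawl_site(start_url: str, fetch: Fetch, max_pages: int = 5,
               max_depth: int = 1, stop_when_found: bool = True) -> dict:
    order = count()  # tie-breaker keeps insertion order among equal priorities
    frontier = [(0, 0, next(order), start_url)]  # (priority, depth, seq, url)
    seen = {start_url}
    crawled: list[str] = []
    found = {"emails": set(), "phones": set(), "socials": set()}
    while frontier and len(crawled) < max_pages:
        _, depth, _, url = heapq.heappop(frontier)
        links, page = fetch(url)
        crawled.append(url)
        for key, bucket in found.items():
            bucket.update(page.get(key, ()))
        # Assumed stop rule: an email plus a phone number or a social profile.
        if stop_when_found and found["emails"] and (found["phones"] or found["socials"]):
            break
        if depth < max_depth:
            for link in links:
                if link not in seen:
                    seen.add(link)
                    heapq.heappush(frontier, (path_priority(link), depth + 1, next(order), link))
    return {"pagesCrawled": len(crawled), "crawled": crawled,
            **{key: sorted(values) for key, values in found.items()}}
```

With `stop_when_found=False` the same site is explored until either the page budget or the depth limit runs out, which is why the README suggests disabling it only for deeper coverage.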

Accepted countryHint values:

US, GB, DE, FR, ES, IT, NL, PL, IN, CA, AU, BR, MX
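To make the role of countryHint concrete, here is a deliberately simplified normalizer for numbers written without a country prefix. It is a sketch under stated assumptions, not the actor's logic: it only knows the calling codes of the values above and a rough trunk-prefix rule, and it does not truly validate numbers. A production implementation would rely on full numbering-plan metadata, for example the `phonenumbers` library (a Python port of Google's libphonenumber).

```python
import re
from typing import Optional

# ITU calling codes for the accepted countryHint values.
CALLING_CODES = {
    "US": "1", "CA": "1", "GB": "44", "DE": "49", "FR": "33", "ES": "34", "IT": "39",
    "NL": "31", "PL": "48", "IN": "91", "AU": "61", "BR": "55", "MX": "52",
}
# Simplification: countries whose national format starts with a trunk "0" that E.164 drops.
# Italy is deliberately absent because Italian landline numbers keep their leading 0.
TRUNK_ZERO = {"GB", "DE", "FR", "NL", "IN", "AU", "BR"}


def to_e164(raw: str, country_hint: str = "US") -> Optional[str]:
    text = raw.strip().removeprefix("tel:").strip()
    digits = re.sub(r"\D", "", text)
    if text.startswith("+"):
        international = digits              # already carries a country code
    elif digits.startswith("00"):
        international = digits[2:]          # "00" international dialing prefix
    else:
        hint = country_hint.upper()
        code = CALLING_CODES.get(hint)
        if code is None:
            raise ValueError(f"unsupported countryHint: {country_hint}")
        national = digits
        if code == "1":                     # NANP: optional leading 1, then 10 digits
            if len(national) == 11 and national.startswith("1"):
                national = national[1:]
            if len(national) != 10:
                return None
        elif hint in TRUNK_ZERO and national.startswith("0"):
            national = national[1:]
        international = code + national
    # E.164 allows at most 15 digits; the lower bound of 8 is an arbitrary sanity check.
    if not 8 <= len(international) <= 15:
        return None
    return "+" + international
```

For example, `to_e164("020 7946 0018", "GB")` yields `+442079460018`, while the same digits with the default `US` hint are rejected because they do not form a 10-digit NANP number.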

For convenience, the API also accepts these aliases for startUrls:

websites, start_urls, urls, domains
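A hypothetical sketch of how these aliases could be folded into a single URL list. It assumes each alias accepts either objects with a `url` key (like startUrls) or plain strings, and that aliases are merged in the order shown; the README does not specify either detail.

```python
from typing import Any

# startUrls first, then the aliases named in this README; the merge order is an assumption.
URL_KEYS = ("startUrls", "websites", "start_urls", "urls", "domains")


def collect_start_urls(run_input: dict[str, Any]) -> list[str]:
    urls: list[str] = []
    for key in URL_KEYS:
        value = run_input.get(key) or []
        if isinstance(value, str):
            value = [value]  # tolerate a single string instead of a list
        for entry in value:
            # Accept both {"url": "..."} objects and plain strings.
            url = entry.get("url") if isinstance(entry, dict) else entry
            if isinstance(url, str) and url.strip() and url.strip() not in urls:
                urls.append(url.strip())
    return urls
```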

Output modes

`summary`

Default mode. Returns one clean lead row per website. Best for exports, CRM import, enrichment, and normal use.

`pages`

Returns one row per crawled page. Best for debugging, QA, and checking which pages had contacts.

`both`

Returns summary rows plus page rows. Use only when you need both lead records and page-level evidence in the same dataset.

Main output fields

Website identity

Field	Description
`recordType`	Usually `domain` in summary mode.
`domain`	Final website host.
`rootDomain`	Root domain used for matching.
`startUrl`	Original normalized input URL.
`finalUrl`	Final URL after redirects.
`status`	`ok`, `no_contacts_found`, or `failed`.
`statusCode`	HTTP status code of the first successful page.
`pageTitle`	First useful page title.
`metaDescription`	First useful meta description.
`language`	HTML language attribute when present.
`countryHint`	Phone country hint used by the run.
`pagesCrawled`	Number of pages fetched for the website.
`pagesMatched`	Number of pages where contacts or socials were found.
`crawlDepthReached`	Deepest crawl depth reached.

Emails

Field	Description
`bestEmail`	Best ranked email for outreach.
`bestEmailType`	`sales`, `support`, `info`, `press`, `jobs`, `privacy`, `billing`, `personal`, or `unknown`.
`bestEmailConfidence`	`high`, `medium`, or `low`.
`emails`	Unique email list.
`emailDetails`	Detailed email evidence when `includeEvidence` is enabled.

Phones

Field	Description
`bestPhone`	Best ranked display phone.
`bestPhoneE164`	Normalized E.164 phone when possible.
`bestPhoneConfidence`	`high`, `medium`, or `low`.
`phones`	Unique valid phone list.
`phoneDetails`	Detailed phone evidence when `includeEvidence` is enabled.

Social profiles

Field	Description
`socialProfiles`	Unified list of `{ platform, url }` records.
`facebookUrls`	Facebook pages/profiles.
`instagramUrls`	Instagram profiles.
`twitterUrls`	X / Twitter profiles.
`youtubeUrls`	YouTube channels or handles.
`tiktokUrls`	TikTok profiles.
`whatsappUrls`	WhatsApp links.
`telegramUrls`	Telegram links.
`githubUrls`	GitHub organization/user profiles.
`mediumUrls`	Medium profiles.
Other `*Urls` fields	Additional supported social, app, or marketplace links.

Brand and company data

Field	Description
`companyName`	Company name from JSON-LD or site metadata.
`legalName`	Legal name when structured data provides it.
`bestLogoUrl`	Best logo/image candidate.
`logoUrl`	Same as `bestLogoUrl`, for convenient exports.
`logoSource`	`jsonLd`, `headerLogo`, `appleTouchIcon`, `openGraph`, `twitterCard`, or `favicon`.
`logoConfidence`	Confidence of selected logo.
`faviconUrl`	Favicon URL.
`appleTouchIconUrl`	Apple touch icon URL.
`openGraphImageUrl`	OpenGraph image URL.
`twitterImageUrl`	Twitter card image URL.
`brandImages`	List of likely brand images.
`addresses`	Structured addresses when present.
`openingHours`	Structured opening hours when present.
`taxIds`	Visible VAT/tax IDs when detected.
`vatIds`	Alias of `taxIds`.
`registrationNumbers`	Visible company registration numbers when detected.

Evidence fields

These appear when includeEvidence is enabled:

Field	Description
`bestContactPage`	Best page where useful contact data was found.
`sourcePages`	Crawled pages with counts of found emails, phones, and socials.
`contactEvidence`	Full contact evidence with value, source URL, page type, confidence, and context.
`imageEvidence`	Logo and image extraction evidence.
`warnings`	Non-fatal fetch or parsing warnings.
`errors`	Fatal run errors for that website.

Example output

{
  "recordType": "domain",
  "domain": "wolfssl.com",
  "rootDomain": "wolfssl.com",
  "startUrl": "https://www.wolfssl.com/contact/",
  "finalUrl": "https://www.wolfssl.com/contact/",
  "status": "ok",
  "statusCode": 200,
  "companyName": "wolfSSL",
  "pagesCrawled": 5,
  "pagesMatched": 5,
  "bestEmail": "support@wolfssl.com",
  "bestEmailType": "support",
  "bestEmailConfidence": "high",
  "bestPhone": "+1 (425) 245-8247",
  "bestPhoneE164": "+14252458247",
  "bestPhoneConfidence": "high",
  "emails": [
    "support@wolfssl.com",
    "facts@wolfssl.com",
    "licensing@wolfssl.com"
  ],
  "phones": [
    "+1 (425) 245-8247"
  ],
  "socialProfiles": [
    { "platform": "X / Twitter", "url": "https://twitter.com/wolfssl" },
    { "platform": "Facebook", "url": "https://www.facebook.com/wolfssl" },
    { "platform": "GitHub", "url": "https://www.github.com/wolfssl" }
  ],
  "bestLogoUrl": "https://www.wolfssl.com/wordpress/wp-content/uploads/2020/12/cropped-wolfssl_logo_300px.png"
}

Contact discovery logic

The actor does not crawl the whole website blindly. It prioritizes URLs that usually contain contacts:

/contact
/contacts
/about
/team
/staff
/support
/help
/sales
/press
/media
/locations
/offices
/impressum
/imprint
/legal
/privacy

When useSitemap is enabled, it checks sitemap.xml and adds only contact-like URLs from the sitemap. This helps find contact pages that are not linked from the homepage.
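A minimal sketch of such a sitemap filter, assuming a standard `urlset` sitemap and a simple keyword match on path segments (a segment equal to a keyword, or starting with the keyword plus a hyphen, such as `contact-us`). Sitemap index files and gzip-compressed sitemaps are not handled, and the matching rule is an assumption rather than the actor's exact heuristic. For untrusted XML, the Python documentation recommends considering `defusedxml`.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
# Path keywords from the list above.
CONTACT_KEYWORDS = (
    "contact", "contacts", "about", "team", "staff", "support", "help", "sales",
    "press", "media", "locations", "offices", "impressum", "imprint", "legal", "privacy",
)


def is_contact_like(url: str) -> bool:
    """True if a path segment equals a keyword or starts with 'keyword-'."""
    segments = [s for s in urlsplit(url).path.lower().split("/") if s]
    return any(
        seg == kw or seg.startswith(kw + "-")
        for seg in segments
        for kw in CONTACT_KEYWORDS
    )


def contact_urls_from_sitemap(xml_text: str, limit: int = 10) -> list[str]:
    """Return up to `limit` contact-like <loc> URLs from a urlset sitemap."""
    root = ET.fromstring(xml_text)
    urls = [(loc.text or "").strip() for loc in root.iter(NS + "loc")]
    return [u for u in urls if u and is_contact_like(u)][:limit]
```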

How ranking works

The actor returns all unique contacts and also selects a single bestEmail and bestPhone.

Email ranking prefers:

Same-domain business emails
High-confidence sources like mailto, contact pages, support pages, and JSON-LD
Useful role emails like sales, info, support, and press
Lower-confidence text matches only when they look legitimate
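The email preferences above can be expressed as an additive score. The weights, source labels (`mailto`, `jsonLd`, and so on), and confidence thresholds in this sketch are hypothetical; it only shows how same-domain, source, and role signals might combine into a bestEmail and a high/medium/low confidence label.

```python
from typing import Optional

# Illustrative weights only; the actor's real scoring is not documented.
SOURCE_POINTS = {"mailto": 3, "jsonLd": 3, "contactPage": 2, "supportPage": 2, "text": 0}
ROLE_POINTS = {"sales": 3, "info": 3, "support": 2, "press": 1}


def score_email(email: str, source: str, root_domain: str) -> int:
    local, _, domain = email.lower().partition("@")
    score = 0
    if domain == root_domain or domain.endswith("." + root_domain):
        score += 5                          # same-domain business address
    score += SOURCE_POINTS.get(source, 0)   # where the address was found
    score += ROLE_POINTS.get(local, 0)      # useful role mailbox
    return score


def confidence(score: int) -> str:
    """Map a score onto high/medium/low (thresholds are assumptions)."""
    return "high" if score >= 8 else "medium" if score >= 5 else "low"


def best_email(candidates: list[tuple[str, str]], root_domain: str) -> Optional[tuple[str, str]]:
    """candidates are (email, source) pairs; returns (email, confidence) or None."""
    if not candidates:
        return None
    email, source = max(candidates, key=lambda c: score_email(c[0], c[1], root_domain))
    return email, confidence(score_email(email, source, root_domain))
```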

Phone ranking prefers:

Valid phone numbers
Numbers matching the selected countryHint
Numbers found on contact, support, sales, or legal pages
Numbers that can be normalized to E.164
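If the phone preferences are read as a strict order, one way to model them is a lexicographic tuple key: validity first, then a match with the countryHint calling code, then the page type, then whether an E.164 form exists. The candidate dictionary shape and page-type labels below are hypothetical.

```python
from typing import Optional

# Hypothetical page-type labels for pages the README says are preferred.
PRIORITY_PAGES = {"contact", "support", "sales", "legal"}


def phone_key(candidate: dict, calling_code: str) -> tuple[bool, bool, bool, bool]:
    e164 = candidate.get("e164")
    return (
        bool(candidate.get("valid")),                     # valid numbers first
        bool(e164 and e164.startswith("+" + calling_code)),  # matches countryHint
        candidate.get("pageType") in PRIORITY_PAGES,      # found on a preferred page
        e164 is not None,                                 # has an E.164 form
    )


def best_phone(candidates: list[dict], calling_code: str = "1") -> Optional[dict]:
    """Tuples compare element by element, so earlier criteria dominate later ones."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: phone_key(c, calling_code))
```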

Clean output and filtering

The actor filters common junk before writing results:

Fake emails like test@example.com
Asset-like strings that look like emails but are actually images or scripts
Sentry, schema, Wix, and placeholder domains
Personal/free-mail noise when it appears as unrelated text on a different company site
Social share/login URLs
LinkedIn, Trustpilot, Google Maps, and Threads links
Empty arrays, empty objects, nulls, and blank strings when compactOutput is enabled

This keeps Apify's All fields view clean and makes CSV/Excel exports easier to use.
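Two of these filters are easy to sketch: rejecting asset-like and placeholder addresses, and the compactOutput clean-up. The domain list below is an assumed sample for illustration, not the actor's actual blocklist.

```python
import re
from typing import Any

# Strings like "logo@2x.png" match a naive email regex but are really asset names.
ASSET_SUFFIX = re.compile(r"\.(?:png|jpe?g|gif|svg|webp|js|css)$", re.IGNORECASE)
# Assumed sample of placeholder, schema, and monitoring domains (e.g. Sentry DSNs, Wix internals).
JUNK_DOMAINS = ("example.com", "example.org", "sentry.io", "wixpress.com", "schema.org")


def is_junk_email(email: str) -> bool:
    address = email.lower()
    domain = address.rpartition("@")[2]
    if ASSET_SUFFIX.search(address):
        return True
    return any(domain == d or domain.endswith("." + d) for d in JUNK_DOMAINS)


def _is_empty(value: Any) -> bool:
    return value is None or value == [] or value == {} or (
        isinstance(value, str) and not value.strip()
    )


def compact(value: Any) -> Any:
    """Recursively drop None, empty lists/dicts, and blank strings (keeps 0 and False)."""
    if isinstance(value, dict):
        cleaned = {k: compact(v) for k, v in value.items()}
        return {k: v for k, v in cleaned.items() if not _is_empty(v)}
    if isinstance(value, list):
        return [v for v in (compact(item) for item in value) if not _is_empty(v)]
    return value
```

Cleaning children before testing emptiness means a nested object that only held blank values disappears entirely, which is what keeps CSV columns from filling up with empty structures.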

How to run with Apify API

Replace YOUR_TOKEN with your Apify API token.

curl -X POST "https://api.apify.com/v2/acts/trakk~deep-email-phone-social-media-scraper-search/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "startUrls": [
      { "url": "https://www.wolfssl.com/contact/" },
      { "url": "https://www.bluehost.com/contact" }
    ],
    "maxDomains": 2,
    "maxPagesPerDomain": 5,
    "outputMode": "summary"
  }'

To get dataset items after the run finishes, use the run's `defaultDatasetId` as DATASET_ID:

curl "https://api.apify.com/v2/datasets/DATASET_ID/items?clean=true&format=json&token=YOUR_TOKEN"

Apify CLI commands

Log in

apify login

Run the actor on Apify

apify call trakk/deep-email-phone-social-media-scraper-search --input-file input.json

You can also call the actor by its ID:

apify call BtjVjAQKexpfdq5po --input-file input.json

Run and print dataset output

apify call BtjVjAQKexpfdq5po --input-file input.json --output-dataset

Check latest runs

apify runs ls BtjVjAQKexpfdq5po --limit 10 --desc

View one run

apify runs info RUN_ID

Download dataset items

apify datasets get-items DATASET_ID --format json
apify datasets get-items DATASET_ID --format csv
apify datasets get-items DATASET_ID --format xlsx

Deploy updates to Apify

apify push --force

Local development commands

Install dependencies:

pip install -r requirements.txt

Run locally with Apify storage:

apify run

Run the Python module directly:

python -m src

Run unit tests:

python -m unittest discover -s tests -v

Validate the Apify input schema:

apify validate-schema .actor/input_schema.json

Deploy:

apify push --force

Performance tips

Fast and cheap first run

Use:

{
  "maxPagesPerDomain": 3,
  "maxDepth": 1,
  "stopWhenFound": true,
  "maxConcurrency": 10
}

Better coverage

Use:

{
  "maxPagesPerDomain": 10,
  "maxDepth": 2,
  "stopWhenFound": false,
  "useSitemap": true
}

Large website lists

Recommended settings:

{
  "maxPagesPerDomain": 3,
  "maxDepth": 1,
  "stopWhenFound": true,
  "compactOutput": true,
  "includeEvidence": false,
  "maxConcurrency": 10
}

When to use proxies

Apify Proxy is disabled by default so the sample run stays cheap and stable. For most public company websites, direct requests are enough. If a website blocks direct traffic, enable Apify Proxy in proxyConfiguration; residential proxies can help with more heavily protected sites.

When to raise retries

Raise maxRetries and maxProxyRetries if some websites intermittently fail with connection errors, HTTP 429 or 5xx responses, or temporary blocks.

Status values

Status	Meaning
`ok`	The website was crawled and at least one email, phone number, or social profile was found.
`no_contacts_found`	Pages were fetched, but no useful contacts were found.
`failed`	No pages were crawled successfully. Check `warnings` or `errors`.

FAQ

Does this actor use a browser?

No. It uses plain HTTP requests only and does not run Playwright, Puppeteer, Selenium, or any headless browser.

Can it find emails hidden behind JavaScript?

Sometimes, if the email is present in the HTML, JSON-LD, mailto, Cloudflare email protection, or page text. It will not execute JavaScript.

Does it verify that an email inbox exists?

No. It extracts and cleans public emails, but it does not perform SMTP verification or deliverability checks.

Why did a website return no contacts?

Possible reasons: the site blocks automated requests, contacts are loaded only after JavaScript execution, contacts are behind forms, or the site does not publish direct contacts.

Can I scrape thousands of websites?

Yes. Use datasetId for large input lists, keep maxPagesPerDomain modest, keep stopWhenFound enabled, and tune maxConcurrency based on stability.

What is the best setting for normal lead generation?

Use summary output, compactOutput: true, maxPagesPerDomain: 5, maxDepth: 1, stopWhenFound: true, and includeEvidence: false.

When should I enable evidence?

Enable includeEvidence when you need to audit where contacts came from, debug results, or show source URLs to a client. Keep it disabled for cleaner CSV exports.

Can I use the result in Google Sheets, Zapier, Make, or n8n?

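Yes. Apify datasets can be read through the API or exported, and webhooks can notify other tools when a run finishes, so the results fit into common automation tools such as these.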

Does it scrape LinkedIn?

No. LinkedIn is intentionally excluded. Use a dedicated LinkedIn actor if you need LinkedIn data.

Is this legal?

The actor extracts publicly visible website data. You are responsible for using the data legally and respecting privacy, anti-spam, GDPR, CCPA, CAN-SPAM, and other rules that apply to your use case.

Best practices for clean lead lists

Start with company homepages or contact pages.
Keep compactOutput enabled.
Use summary mode for CRM exports.
Use includeEvidence only when you need auditability.
Run a small sample first, then scale.
For phone numbers written without a country code, set countryHint to the country where most of your target websites are based.
For outreach, verify emails with a deliverability tool before sending campaigns.

Feature requests and issues

Need a new field, a different output format, or a workflow this actor does not cover yet? Found a bug or a website response that does not parse correctly? Open an issue on the actor page in Apify Console and include the run ID, your input, what you expected, and a short example of the data you need. Clear reports help prioritize fixes and new features faster.