Deep Email, Phone & Social Media Scraper

Pricing

from $1.80 / 1,000 website contact leads

Find emails, phone numbers, social profiles, logos, and business contact details from any website list. HTTP-only, fast, clean output, with smart contact-page discovery and optional source evidence for lead generation.

Developer: Blynx. Maintained by Community.

Find public business contacts from any list of websites. This actor crawls company sites with HTTP requests only, discovers likely contact pages, extracts emails, phone numbers, social profiles, logos, and business metadata, then returns clean lead records ready for CRM, outreach, enrichment, or research.

No browser. No Playwright. No Puppeteer. Just fast HTTP crawling, Chrome-like requests, smart contact-page discovery, and clean Apify dataset output.

Use it when you have a list of company websites and need answers like:

  • What is the best public email for this company?
  • Does the website publish phone numbers?
  • Which social and messaging channels are linked from the site?
  • What is the company name, logo, address, or legal metadata?
  • Which page was the contact found on?

What this actor extracts

Contacts

  • Emails from mailto: links, visible text, HTML, Cloudflare protected emails, JSON-LD, and common [at] / dot obfuscation
  • Phone numbers from tel: links, page text, and JSON-LD
  • E.164 phone normalization when the number is valid
  • Best email, best phone, confidence score, and source page
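Two of the email sources listed above can be illustrated with a short sketch. It assumes the widely seen Cloudflare scheme in which the protected address is a hex string whose first byte is an XOR key for the remaining bytes, and it handles only bracketed `[at]` / `(dot)` style obfuscation. The function names are hypothetical and this is not the actor's own code.

```python
import re

def decode_cfemail(hex_payload: str) -> str:
    """Decode a Cloudflare-style protected email (assumed scheme: byte 0 is the XOR key)."""
    data = bytes.fromhex(hex_payload)
    key = data[0]
    return bytes(b ^ key for b in data[1:]).decode("utf-8")

def encode_cfemail(email: str, key: int = 0x42) -> str:
    """Inverse of decode_cfemail, used here only to build demo input."""
    return bytes([key] + [b ^ key for b in email.encode("utf-8")]).hex()

# Bracketed forms only, e.g. "[at]", "(at)", "[dot]", "(dot)"; bare words are not touched.
_AT = re.compile(r"\s*[\[\(]\s*at\s*[\]\)]\s*", re.IGNORECASE)
_DOT = re.compile(r"\s*[\[\(]\s*dot\s*[\]\)]\s*", re.IGNORECASE)

def deobfuscate(text: str) -> str:
    """Rewrite bracketed at/dot tokens into '@' and '.'."""
    return _DOT.sub(".", _AT.sub("@", text))

if __name__ == "__main__":
    protected = encode_cfemail("info@example.com")
    print(decode_cfemail(protected))
    print(deobfuscate("sales [at] example [dot] com"))
```

Restricting the regexes to bracketed tokens is a deliberate choice in this sketch: rewriting bare words such as "at" would create false positives in ordinary sentences.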

Social and messaging channels

  • Facebook
  • Instagram
  • X / Twitter
  • YouTube
  • TikTok
  • Pinterest
  • Reddit
  • Snapchat
  • Discord
  • Twitch
  • GitHub
  • Medium
  • WhatsApp
  • Telegram
  • Yelp
  • Tripadvisor
  • App Store
  • Google Play
  • Amazon, Etsy, and eBay store/profile links

LinkedIn, Trustpilot, Google Maps, and Threads are intentionally excluded to keep the output focused and avoid duplicating separate scrapers.

Business and brand data

  • Company name
  • Legal name
  • Website title and meta description
  • Addresses from structured data when present
  • Opening hours from structured data when present
  • VAT / tax IDs when visible
  • Registration numbers when visible
  • Best logo URL
  • Favicon
  • Apple touch icon
  • OpenGraph image
  • Twitter card image
  • Source evidence for logo/contact extraction when enabled

Common use cases

  • Lead generation from company website lists
  • CRM enrichment
  • Agency prospecting
  • B2B sales research
  • Directory enrichment
  • Supplier and vendor research
  • Startup, SaaS, ecommerce, and local business contact collection
  • Marketing outreach preparation
  • Checking whether company websites publish contact details
  • Building internal contact intelligence datasets

Quick start in Apify Console

  1. Open the actor in Apify Console.
  2. Paste websites into "Website URL(s) or domain(s)".
  3. For the first test, set "Number of websites to process" to 10.
  4. Keep "Pages per website" between 3 and 10.
  5. Keep "Stop early when enough contacts are found" enabled to save cost.
  6. Run the actor.
  7. Open the default dataset and export results as JSON, CSV, Excel, XML, RSS, or HTML.

You can paste full URLs or plain domains. These are all valid:

example.com
https://example.com
https://www.example.com/contact

The actor normalizes plain domains to HTTPS automatically.
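If you want to pre-clean a list before pasting it in, the same idea can be sketched locally. This is a minimal illustration that assumes "no scheme means HTTPS" is the whole rule; `normalize_start_url` is a hypothetical helper, not part of the actor.

```python
from urllib.parse import urlsplit

def normalize_start_url(raw: str) -> str:
    """Return a full URL, adding https:// when the input is a bare domain."""
    value = raw.strip()
    if not value:
        raise ValueError("empty URL or domain")
    # Assumed rule: anything without an explicit scheme is treated as HTTPS.
    if "://" not in value:
        value = "https://" + value
    if not urlsplit(value).hostname:
        raise ValueError(f"no host found in {raw!r}")
    return value

if __name__ == "__main__":
    for item in ["example.com", "https://example.com", "https://www.example.com/contact"]:
        print(normalize_start_url(item))
```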


Simple website list

Use this for normal lead enrichment.

{
  "startUrls": [
    { "url": "https://www.cloudflare.com/resource/contact-enterprise-sales/" },
    { "url": "https://www.bluehost.com/contact" },
    { "url": "https://www.wolfssl.com/contact/" }
  ],
  "maxDomains": 10,
  "maxPagesPerDomain": 5,
  "maxDepth": 1,
  "stopWhenFound": true,
  "outputMode": "summary",
  "compactOutput": true
}

Deeper contact discovery

Use this when websites are messy and contacts may be on support, about, legal, team, or office pages.

{
  "startUrls": [
    { "url": "https://example.com" }
  ],
  "maxDomains": 1,
  "maxPagesPerDomain": 15,
  "maxDepth": 2,
  "useSitemap": true,
  "stopWhenFound": false,
  "includeEvidence": true,
  "outputMode": "summary"
}

Import websites from another dataset

Use this when a previous actor produced a dataset with website URLs.

{
  "datasetId": "YOUR_DATASET_ID",
  "maxDomains": 1000,
  "maxPagesPerDomain": 5,
  "outputMode": "summary"
}

The actor reads these fields from input dataset items:

website, url, domain, companyUrl, sourceUrl, finalUrl, startUrl
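To check in advance whether your own dataset items would yield a usable website, you could run something like the sketch below over an export. It assumes the fields are tried in the order listed above, which the source does not state; `pick_website` is a hypothetical helper.

```python
from typing import Optional

# Field names taken from the list above; the precedence order is an assumption.
URL_FIELDS = ("website", "url", "domain", "companyUrl", "sourceUrl", "finalUrl", "startUrl")

def pick_website(item: dict) -> Optional[str]:
    """Return the first non-blank string found in the known URL fields."""
    for field in URL_FIELDS:
        value = item.get(field)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None

if __name__ == "__main__":
    items = [{"url": "", "domain": "example.com"}, {"name": "No website here"}]
    print([pick_website(i) for i in items])
```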

Evidence and audit mode

Use this when you want to see where each email, phone, social profile, or logo came from.

{
  "startUrls": [
    { "url": "https://example.com" }
  ],
  "includeEvidence": true,
  "outputMode": "summary",
  "compactOutput": true
}

Page-level debugging mode

Use this when you want one dataset row per crawled page.

{
  "startUrls": [
    { "url": "https://example.com" }
  ],
  "outputMode": "pages",
  "includeEvidence": true
}

Input fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| startUrls | array | sample URLs | Websites or pages to scan. Use full URLs or plain domains. |
| datasetId | string | empty | Optional Apify dataset containing website/domain fields. |
| maxDomains | integer | 1 | Safety cap for unique websites processed from all inputs. Raise it for real batches. |
| maxPagesPerDomain | integer | 5 | Maximum pages fetched per website. Start with 3-10. |
| maxDepth | integer | 1 | Link depth. 0 = start page only, 1 = linked contact/about pages, 2 = one more layer. |
| stopWhenFound | boolean | true | Stops early when a strong email plus phone or socials are found. Saves time and cost. |
| extractEmails | boolean | true | Extract normal, obfuscated, mailto, JSON-LD, and Cloudflare protected emails. |
| extractPhones | boolean | true | Extract and validate phone numbers. |
| extractSocials | boolean | true | Extract social, messaging, marketplace, and app profile links. |
| extractBrandAssets | boolean | true | Extract logo, favicon, OpenGraph image, and related brand images. |
| extractBusinessData | boolean | true | Extract company name, legal name, addresses, opening hours, tax IDs, registration numbers. |
| useSitemap | boolean | true | Reads sitemap.xml and adds contact-like URLs. |
| followSubdomains | boolean | false | Allows crawling same-root subdomains, for example help.example.com. |
| countryHint | string | US | Default phone country for numbers without country prefix. |
| outputMode | string | summary | summary, pages, or both. |
| includeEvidence | boolean | false | Adds detailed source evidence objects. |
| compactOutput | boolean | true | Removes empty arrays, empty objects, nulls, and blank strings. |
| maxConcurrency | integer | 10 | Number of websites processed in parallel. |
| requestTimeoutSec | integer | 25 | HTTP request timeout per page. |
| maxRetries | integer | 3 | Retry budget for HTTP and connection errors. |
| maxProxyRetries | integer | 3 | Extra retry budget for proxy/transport failures. |
| proxyConfiguration | object | Apify Proxy off | Apify Proxy settings. Keep it off for cheap tests; enable it if target sites block direct requests. |
| userAgent | string | empty | Optional custom user agent. Leave empty for built-in Chrome-like headers. |

Accepted countryHint values:

US, GB, DE, FR, ES, IT, NL, PL, IN, CA, AU, BR, MX

The API also accepts URL aliases for convenience:

websites, start_urls, urls, domains

Output modes

summary

Default mode. Returns one clean lead row per website. Best for exports, CRM import, enrichment, and normal use.

pages

Returns one row per crawled page. Best for debugging, QA, and checking which pages had contacts.

both

Returns summary rows plus page rows. Use only when you need both lead records and page-level evidence in the same dataset.


Main output fields

Website identity

| Field | Description |
| --- | --- |
| recordType | Usually domain in summary mode. |
| domain | Final website host. |
| rootDomain | Root domain used for matching. |
| startUrl | Original normalized input URL. |
| finalUrl | Final URL after redirects. |
| status | ok, no_contacts_found, or failed. |
| statusCode | HTTP status code of the first successful page. |
| pageTitle | First useful page title. |
| metaDescription | First useful meta description. |
| language | HTML language attribute when present. |
| countryHint | Phone country hint used by the run. |
| pagesCrawled | Number of pages fetched for the website. |
| pagesMatched | Number of pages where contacts or socials were found. |
| crawlDepthReached | Deepest crawl depth reached. |

Emails

| Field | Description |
| --- | --- |
| bestEmail | Best ranked email for outreach. |
| bestEmailType | sales, support, info, press, jobs, privacy, billing, personal, or unknown. |
| bestEmailConfidence | high, medium, or low. |
| emails | Unique email list. |
| emailDetails | Detailed email evidence when includeEvidence is enabled. |

Phones

| Field | Description |
| --- | --- |
| bestPhone | Best ranked display phone. |
| bestPhoneE164 | Normalized E.164 phone when possible. |
| bestPhoneConfidence | high, medium, or low. |
| phones | Unique valid phone list. |
| phoneDetails | Detailed phone evidence when includeEvidence is enabled. |

Social profiles

| Field | Description |
| --- | --- |
| socialProfiles | Unified list of { platform, url } records. |
| facebookUrls | Facebook pages/profiles. |
| instagramUrls | Instagram profiles. |
| twitterUrls | X / Twitter profiles. |
| youtubeUrls | YouTube channels or handles. |
| tiktokUrls | TikTok profiles. |
| whatsappUrls | WhatsApp links. |
| telegramUrls | Telegram links. |
| githubUrls | GitHub organization/user profiles. |
| mediumUrls | Medium profiles. |
| Other *Urls fields | Additional supported social, app, or marketplace links. |

Brand and company data

| Field | Description |
| --- | --- |
| companyName | Company name from JSON-LD or site metadata. |
| legalName | Legal name when structured data provides it. |
| bestLogoUrl | Best logo/image candidate. |
| logoUrl | Same as bestLogoUrl, for convenient exports. |
| logoSource | jsonLd, headerLogo, appleTouchIcon, openGraph, twitterCard, or favicon. |
| logoConfidence | Confidence of selected logo. |
| faviconUrl | Favicon URL. |
| appleTouchIconUrl | Apple touch icon URL. |
| openGraphImageUrl | OpenGraph image URL. |
| twitterImageUrl | Twitter card image URL. |
| brandImages | List of likely brand images. |
| addresses | Structured addresses when present. |
| openingHours | Structured opening hours when present. |
| taxIds | Visible VAT/tax IDs when detected. |
| vatIds | Alias of taxIds. |
| registrationNumbers | Visible company registration numbers when detected. |
Evidence fields

These appear when includeEvidence is enabled:

| Field | Description |
| --- | --- |
| bestContactPage | Best page where useful contact data was found. |
| sourcePages | Crawled pages with counts of found emails, phones, and socials. |
| contactEvidence | Full contact evidence with value, source URL, page type, confidence, and context. |
| imageEvidence | Logo and image extraction evidence. |
| warnings | Non-fatal fetch or parsing warnings. |
| errors | Fatal run errors for that website. |

Example output

{
  "recordType": "domain",
  "domain": "wolfssl.com",
  "rootDomain": "wolfssl.com",
  "startUrl": "https://www.wolfssl.com/contact/",
  "finalUrl": "https://www.wolfssl.com/contact/",
  "status": "ok",
  "statusCode": 200,
  "companyName": "wolfSSL",
  "pagesCrawled": 5,
  "pagesMatched": 5,
  "bestEmail": "support@wolfssl.com",
  "bestEmailType": "support",
  "bestEmailConfidence": "high",
  "bestPhone": "+1 (425) 245-8247",
  "bestPhoneE164": "+14252458247",
  "bestPhoneConfidence": "high",
  "emails": [
    "support@wolfssl.com",
    "facts@wolfssl.com",
    "licensing@wolfssl.com"
  ],
  "phones": [
    "+1 (425) 245-8247"
  ],
  "socialProfiles": [
    { "platform": "X / Twitter", "url": "https://twitter.com/wolfssl" },
    { "platform": "Facebook", "url": "https://www.facebook.com/wolfssl" },
    { "platform": "GitHub", "url": "https://www.github.com/wolfssl" }
  ],
  "bestLogoUrl": "https://www.wolfssl.com/wordpress/wp-content/uploads/2020/12/cropped-wolfssl_logo_300px.png"
}
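A record shaped like the example above can be flattened into a CRM-friendly CSV row with a few lines of Python. This is only a sketch: the column selection, the `socialUrls` column, and the `to_csv` helper are illustrative choices, not something the actor produces itself.

```python
import csv
import io

# Illustrative column choice; any summary-mode field could be listed here.
COLUMNS = ["domain", "companyName", "bestEmail", "bestPhoneE164", "status"]

def to_csv(records: list) -> str:
    """Flatten summary records into CSV text, joining social URLs with ';'."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS + ["socialUrls"], lineterminator="\n")
    writer.writeheader()
    for record in records:
        # With compactOutput enabled, empty fields are absent, so default to "".
        row = {col: record.get(col, "") for col in COLUMNS}
        row["socialUrls"] = ";".join(p["url"] for p in record.get("socialProfiles", []))
        writer.writerow(row)
    return buf.getvalue()

if __name__ == "__main__":
    sample = {
        "domain": "wolfssl.com",
        "companyName": "wolfSSL",
        "bestEmail": "support@wolfssl.com",
        "bestPhoneE164": "+14252458247",
        "status": "ok",
        "socialProfiles": [{"platform": "GitHub", "url": "https://www.github.com/wolfssl"}],
    }
    print(to_csv([sample]))
```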

Contact discovery logic

The actor does not crawl the whole website blindly. It prioritizes URLs that usually contain contacts:

/contact
/contacts
/about
/team
/staff
/support
/help
/sales
/press
/media
/locations
/offices
/impressum
/imprint
/legal
/privacy

When useSitemap is enabled, it checks sitemap.xml and adds only contact-like URLs from the sitemap. This helps find contact pages that are not linked from the homepage.
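The sitemap filtering idea can be sketched as follows. The matching rule (a path segment equals a hint, or starts with a hint followed by `-`, `_`, or `.`) and the `limit` parameter are assumptions made for illustration; the actor's real heuristics are not documented here.

```python
from urllib.parse import urlsplit

# Hints taken from the path list above.
CONTACT_HINTS = (
    "contact", "contacts", "about", "team", "staff", "support", "help", "sales",
    "press", "media", "locations", "offices", "impressum", "imprint", "legal", "privacy",
)

def is_contact_like(url: str) -> bool:
    """True when any path segment equals a hint or begins with 'hint-', 'hint_' or 'hint.'."""
    segments = [s.lower() for s in urlsplit(url).path.split("/") if s]
    for segment in segments:
        for hint in CONTACT_HINTS:
            if segment == hint or segment.startswith((hint + "-", hint + "_", hint + ".")):
                return True
    return False

def select_contact_urls(sitemap_urls: list, limit: int = 5) -> list:
    """Keep unique contact-like URLs in sitemap order, up to `limit` entries."""
    picked = []
    for url in sitemap_urls:
        if len(picked) >= limit:
            break
        if url not in picked and is_contact_like(url):
            picked.append(url)
    return picked

if __name__ == "__main__":
    urls = [
        "https://example.com/blog/post-1",
        "https://example.com/contact-us",
        "https://example.com/about/",
        "https://example.com/mediation",
    ]
    print(select_contact_urls(urls))
```

Matching on whole segments rather than substrings keeps pages like /mediation from being mistaken for /media.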


How ranking works

The actor returns all unique contacts, but it also chooses bestEmail and bestPhone.

Email ranking prefers:

  1. Same-domain business emails
  2. High-confidence sources like mailto, contact pages, support pages, and JSON-LD
  3. Useful role emails like sales, info, support, and press
  4. Lower-confidence text matches only when they look legitimate

Phone ranking prefers:

  1. Valid phone numbers
  2. Numbers matching the selected countryHint
  3. Numbers found on contact, support, sales, or legal pages
  4. Numbers that can be normalized to E.164
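One way to picture the email preferences above is as a sort key compared left to right. The sketch below covers only the first three email criteria, and the candidate shape, the source labels, and the strict priority order are all assumptions; the actor's actual scoring may weigh these signals differently.

```python
from typing import Optional

# Hypothetical labels for high-confidence sources and useful role mailboxes.
STRONG_SOURCES = {"mailto", "contact_page", "support_page", "json_ld"}
ROLE_LOCAL_PARTS = {"sales", "info", "support", "press"}

def email_rank_key(candidate: dict, site_domain: str) -> tuple:
    """Build a tuple so that same-domain beats source strength, which beats role."""
    local, _, domain = candidate["email"].lower().partition("@")
    same_domain = domain == site_domain or domain.endswith("." + site_domain)
    strong_source = candidate.get("source") in STRONG_SOURCES
    role_address = local in ROLE_LOCAL_PARTS
    return (same_domain, strong_source, role_address)

def best_email(candidates: list, site_domain: str) -> Optional[str]:
    """Return the highest-ranked email, or None when there are no candidates."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: email_rank_key(c, site_domain))["email"]

if __name__ == "__main__":
    found = [
        {"email": "john@gmail.com", "source": "mailto"},
        {"email": "jobs@example.com", "source": "text"},
        {"email": "sales@example.com", "source": "mailto"},
    ]
    print(best_email(found, "example.com"))
```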

Clean output and filtering

The actor filters common junk before writing results:

  • Fake emails like test@example.com
  • Asset-like strings that look like emails but are actually images or scripts
  • Sentry, schema, Wix, and placeholder domains
  • Personal/free-mail noise when it appears as unrelated text on a different company site
  • Social share/login URLs
  • LinkedIn, Trustpilot, Google Maps, and Threads links
  • Empty arrays, empty objects, nulls, and blank strings when compactOutput is enabled

This keeps Apify's All fields view clean and makes CSV/Excel exports easier to use.
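The compactOutput behaviour can be approximated with a small recursive function, for example to apply the same cleanup to records from other sources. This sketch assumes that numeric zero and false are kept, since the description only names empty arrays, empty objects, nulls, and blank strings; `compact` is a hypothetical helper.

```python
def _is_empty(value) -> bool:
    """Empty means None, a blank string, or a list/dict with no entries."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, (list, dict)):
        return len(value) == 0
    return False

def compact(value):
    """Recursively drop empty values from dicts and lists; keep 0 and False."""
    if isinstance(value, dict):
        cleaned = {key: compact(item) for key, item in value.items()}
        return {key: item for key, item in cleaned.items() if not _is_empty(item)}
    if isinstance(value, list):
        cleaned = [compact(item) for item in value]
        return [item for item in cleaned if not _is_empty(item)]
    return value

if __name__ == "__main__":
    record = {"bestEmail": "info@example.com", "phones": [], "legalName": None, "pagesMatched": 0}
    print(compact(record))
```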


How to run with the Apify API

Replace YOUR_TOKEN with your Apify API token.

curl -X POST "https://api.apify.com/v2/acts/trakk~deep-email-phone-social-media-scraper-search/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "startUrls": [
      { "url": "https://www.wolfssl.com/contact/" },
      { "url": "https://www.bluehost.com/contact" }
    ],
    "maxDomains": 2,
    "maxPagesPerDomain": 5,
    "outputMode": "summary"
  }'

To get dataset items after the run finishes:

curl "https://api.apify.com/v2/datasets/DATASET_ID/items?clean=true&format=json&token=YOUR_TOKEN"
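The same two calls can be prepared from Python with only the standard library. This sketch mirrors the endpoints and query parameters used in the curl commands in this section; the helper names are hypothetical, and `start_run` is defined but not called because it needs a real token and network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_BASE = "https://api.apify.com/v2"
ACTOR = "trakk~deep-email-phone-social-media-scraper-search"  # slug as used in this section

def build_run_request(token: str, run_input: dict) -> Request:
    """Prepare the POST request that starts an actor run."""
    url = f"{API_BASE}/acts/{ACTOR}/runs?{urlencode({'token': token})}"
    body = json.dumps(run_input).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"}, method="POST")

def build_items_url(token: str, dataset_id: str) -> str:
    """Build the dataset items URL with the same query parameters as the curl example."""
    query = urlencode({"clean": "true", "format": "json", "token": token})
    return f"{API_BASE}/datasets/{dataset_id}/items?{query}"

def start_run(token: str, run_input: dict) -> dict:
    """Send the request and return the parsed JSON response (requires network access)."""
    with urlopen(build_run_request(token, run_input), timeout=30) as response:
        return json.load(response)

if __name__ == "__main__":
    request = build_run_request("YOUR_TOKEN", {"maxDomains": 2, "outputMode": "summary"})
    print(request.get_method(), request.full_url)
    print(build_items_url("YOUR_TOKEN", "DATASET_ID"))
```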

Apify CLI commands

Log in

apify login

Run the actor on Apify

apify call trakk/deep-email-phone-social-media-scraper-search --input-file input.json

You can also call by actor ID:

apify call BtjVjAQKexpfdq5po --input-file input.json

Run and print dataset output

apify call BtjVjAQKexpfdq5po --input-file input.json --output-dataset

Check latest runs

apify runs ls BtjVjAQKexpfdq5po --limit 10 --desc

View one run

apify runs info RUN_ID

Download dataset items

apify datasets get-items DATASET_ID --format json
apify datasets get-items DATASET_ID --format csv
apify datasets get-items DATASET_ID --format xlsx

Deploy updates to Apify

apify push --force

Local development commands

Install dependencies:

pip install -r requirements.txt

Run locally with Apify storage:

apify run

Run the Python module directly:

python -m src

Run unit tests:

python -m unittest discover -s tests -v

Validate the Apify input schema:

apify validate-schema .actor/input_schema.json

Deploy:

apify push --force

Performance tips

Fast and cheap first run

Use:

{
  "maxPagesPerDomain": 3,
  "maxDepth": 1,
  "stopWhenFound": true,
  "maxConcurrency": 10
}

Better coverage

Use:

{
  "maxPagesPerDomain": 10,
  "maxDepth": 2,
  "stopWhenFound": false,
  "useSitemap": true
}

Large website lists

Recommended settings:

{
  "maxPagesPerDomain": 3,
  "maxDepth": 1,
  "stopWhenFound": true,
  "compactOutput": true,
  "includeEvidence": false,
  "maxConcurrency": 10
}

When to use proxies

Apify Proxy is disabled by default so the sample run stays cheap and stable. For most public company websites, direct requests are enough. If a website blocks direct traffic, enable Apify Proxy in proxyConfiguration; a residential proxy can help with more heavily protected sites.

When to raise retries

Raise maxRetries and maxProxyRetries if some websites randomly fail with connection errors, 429, 5xx, or temporary blocks.


Status values

| Status | Meaning |
| --- | --- |
| ok | The website was crawled and at least one email, phone, or social profile was found. |
| no_contacts_found | Pages were fetched, but no useful contacts were found. |
| failed | No pages were crawled successfully. Check warnings or errors. |
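After a run, the status field is a convenient way to decide what to re-run. The sketch below works on exported summary records; `summarize_statuses` and `retry_candidates` are hypothetical helpers, and re-running failed sites with a proxy or higher retry settings is only a suggestion.

```python
from collections import Counter

def summarize_statuses(records: list) -> Counter:
    """Count records per status value; missing statuses are grouped as 'unknown'."""
    return Counter(record.get("status", "unknown") for record in records)

def retry_candidates(records: list) -> list:
    """Collect start URLs of failed websites, e.g. to re-run them with a proxy enabled."""
    return [r["startUrl"] for r in records if r.get("status") == "failed" and r.get("startUrl")]

if __name__ == "__main__":
    rows = [
        {"startUrl": "https://a.example", "status": "ok"},
        {"startUrl": "https://b.example", "status": "failed"},
        {"startUrl": "https://c.example", "status": "no_contacts_found"},
    ]
    print(dict(summarize_statuses(rows)))
    print(retry_candidates(rows))
```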

FAQ

Does this actor use a browser?

No. It is requests-only. It does not use Playwright, Puppeteer, Selenium, or a headless browser.

Can it find emails hidden behind JavaScript?

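Only when the email is already present in the fetched response: in the HTML, JSON-LD, a mailto link, Cloudflare email protection markup, or page text. The actor does not execute JavaScript, so emails that are injected at runtime are missed.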

Does it verify that an email inbox exists?

No. It extracts and cleans public emails, but it does not perform SMTP verification or deliverability checks.

Why did a website return no contacts?

Possible reasons: the site blocks automated requests, contacts are loaded only after JavaScript execution, contacts are behind forms, or the site does not publish direct contacts.

Can I scrape thousands of websites?

Yes. Use datasetId for large input lists, keep maxPagesPerDomain modest, keep stopWhenFound enabled, and tune maxConcurrency based on stability.

What is the best setting for normal lead generation?

Use summary output, compactOutput: true, maxPagesPerDomain: 5, maxDepth: 1, stopWhenFound: true, and includeEvidence: false.

When should I enable evidence?

Enable includeEvidence when you need to audit where contacts came from, debug results, or show source URLs to a client. Keep it disabled for cleaner CSV exports.

Can I use the result in Google Sheets, Zapier, Make, or n8n?

Yes. Export the dataset or use Apify webhooks and integrations to pass results to these tools.

Does it scrape LinkedIn?

No. LinkedIn is intentionally excluded. Use a dedicated LinkedIn actor if you need LinkedIn data.

Is this legal?

The actor extracts publicly visible website data. You are responsible for using the data legally and respecting privacy, anti-spam, GDPR, CCPA, CAN-SPAM, and other rules that apply to your use case.


Best practices for clean lead lists

  • Start with company homepages or contact pages.
  • Keep compactOutput enabled.
  • Use summary mode for CRM exports.
  • Use includeEvidence only when you need auditability.
  • Run a small sample first, then scale.
  • For phone numbers written without a country prefix, set countryHint to the most common target country.
  • For outreach, verify emails with a deliverability tool before sending campaigns.

Tags

email scraper | phone scraper | contact scraper | social media scraper | website contact extractor | lead generation | b2b leads | company enrichment | crm enrichment | business contacts | website scraper | email finder | phone number finder | social profile finder | logo extractor | brand data | sales prospecting | marketing outreach | Apify actor | HTTP scraper | no browser scraper


Built for Apify. HTTP-only. Clean contact leads from website lists.