Deep Email, Phone & Social Media Scraper
Pricing
from $1.80 / 1,000 website contact leads
Find emails, phone numbers, social profiles, logos, and business contact details from any website list. HTTP-only, fast, clean output, with smart contact-page discovery and optional source evidence for lead generation.
Developer: Blynx (maintained by Community)
Find public business contacts from any list of websites. This actor crawls company sites with HTTP requests only, discovers likely contact pages, extracts emails, phone numbers, social profiles, logos, and business metadata, then returns clean lead records ready for CRM, outreach, enrichment, or research.
No browser. No Playwright. No Puppeteer. Just fast HTTP crawling, Chrome-like requests, smart contact-page discovery, and clean Apify dataset output.
Use it when you have a list of company websites and need answers like:
- What is the best public email for this company?
- Does the website publish phone numbers?
- Which social and messaging channels are linked from the site?
- What is the company name, logo, address, or legal metadata?
- Which page was the contact found on?
What this actor extracts
Contacts
- Emails from `mailto:` links, visible text, HTML, Cloudflare-protected emails, JSON-LD, and common `[at]` / `dot` obfuscation
- Phone numbers from `tel:` links, page text, and JSON-LD
- E.164 phone normalization when the number is valid
- Best email, best phone, confidence score, and source page
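Two of the email sources above follow well-known patterns. The minimal Python sketch below illustrates them; it is not the actor's code. It decodes Cloudflare's `data-cfemail` attribute (the first hex byte is a XOR key for the remaining bytes) and rewrites `[at]` / `(at)` and `[dot]` / `(dot)` text before running a simple email regex. The regex and the obfuscation patterns are assumptions and will miss some unusual addresses.

```python
import re

# Cloudflare email protection stores the address as hex in data-cfemail:
# the first byte is a XOR key, each following byte is one character XOR key.
CFEMAIL_RE = re.compile(r'data-cfemail="([0-9a-fA-F]+)"')
# Deliberately simple email pattern (an assumption, not RFC 5322).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# "[at]" / "(at)" and "[dot]" / "(dot)" obfuscation with optional spaces.
AT_RE = re.compile(r"\s*[\[(]\s*at\s*[\])]\s*", re.IGNORECASE)
DOT_RE = re.compile(r"\s*[\[(]\s*dot\s*[\])]\s*", re.IGNORECASE)


def decode_cfemail(hex_value: str) -> str:
    """Decode the hex payload of a Cloudflare-protected email."""
    key = int(hex_value[:2], 16)
    return "".join(
        chr(int(hex_value[i:i + 2], 16) ^ key) for i in range(2, len(hex_value), 2)
    )


def deobfuscate(text: str) -> str:
    """Turn 'sales [at] acme [dot] com' into 'sales@acme.com'."""
    return DOT_RE.sub(".", AT_RE.sub("@", text))


def find_emails(html: str) -> list:
    """Return unique, lowercased emails in first-seen order."""
    candidates = [decode_cfemail(h) for h in CFEMAIL_RE.findall(html)]
    candidates += EMAIL_RE.findall(deobfuscate(html))
    return list(dict.fromkeys(e.lower() for e in candidates))
```

Cloudflare-decoded addresses come first here simply because they are collected first; ordering for outreach is a separate ranking step.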
Social and messaging channels
- Facebook
- Instagram
- WhatsApp
- X / Twitter
- YouTube
- TikTok
- Snapchat
- Discord
- Twitch
- GitHub
- Medium
- Telegram
- Yelp
- Tripadvisor
- App Store
- Google Play
- Amazon, Etsy, and eBay store/profile links
LinkedIn, Trustpilot, Google Maps, and Threads are intentionally excluded to keep the output focused and avoid duplicating separate scrapers.
Business and brand data
- Company name
- Legal name
- Website title and meta description
- Addresses from structured data when present
- Opening hours from structured data when present
- VAT / tax IDs when visible
- Registration numbers when visible
- Best logo URL
- Favicon
- Apple touch icon
- OpenGraph image
- Twitter card image
- Source evidence for logo/contact extraction when enabled
Common use cases
- Lead generation from company website lists
- CRM enrichment
- Agency prospecting
- B2B sales research
- Directory enrichment
- Supplier and vendor research
- Startup, SaaS, ecommerce, and local business contact collection
- Marketing outreach preparation
- Checking whether company websites publish contact details
- Building internal contact intelligence datasets
Quick start in Apify Console
- Open the actor in Apify Console.
- Paste websites into Website URL(s) or domain(s).
- For the first test, set Number of websites to process to 10.
- Keep Pages per website between 3 and 10.
- Keep Stop early when enough contacts are found enabled to save cost.
- Run the actor.
- Open the default dataset and export results as JSON, CSV, Excel, XML, RSS, or HTML.
You can paste full URLs or plain domains. These are all valid:
- `example.com`
- `https://example.com`
- `https://www.example.com/contact`
The actor normalizes plain domains to HTTPS automatically.
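The normalization step can be pictured with the minimal Python sketch below. It assumes that a missing scheme means HTTPS (as stated above) and uses a hypothetical de-duplication key that only strips a leading `www.`; the actor's real `rootDomain` matching may differ, since a proper registrable-domain check needs the Public Suffix List. Applying the `maxDomains` cap after de-duplication is also an assumption.

```python
from urllib.parse import urlsplit


def normalize_start_url(raw: str) -> str:
    """Turn 'example.com' into 'https://example.com'; keep full URLs as given."""
    value = raw.strip()
    if "://" not in value:
        value = "https://" + value.lstrip("/")
    parts = urlsplit(value)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not a website URL: {raw!r}")
    return value


def site_key(url: str) -> str:
    """Hypothetical de-duplication key: lowercased host without a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def unique_websites(raw_inputs, max_domains: int) -> list:
    """Normalize inputs, keep the first URL per site, and stop at max_domains."""
    seen, result = set(), []
    for raw in raw_inputs:
        if len(result) >= max_domains:
            break
        url = normalize_start_url(raw)
        key = site_key(url)
        if key not in seen:
            seen.add(key)
            result.append(url)
    return result
```

With this key, `example.com` and `https://www.example.com/contact` count as one website, which is why raising `maxDomains` matters only for genuinely different sites.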
Recommended input examples
Simple website list
Use this for normal lead enrichment.
```json
{
  "startUrls": [
    { "url": "https://www.cloudflare.com/resource/contact-enterprise-sales/" },
    { "url": "https://www.bluehost.com/contact" },
    { "url": "https://www.wolfssl.com/contact/" }
  ],
  "maxDomains": 10,
  "maxPagesPerDomain": 5,
  "maxDepth": 1,
  "stopWhenFound": true,
  "outputMode": "summary",
  "compactOutput": true
}
```
Deeper contact discovery
Use this when websites are messy and contacts may be on support, about, legal, team, or office pages.
```json
{
  "startUrls": [{ "url": "https://example.com" }],
  "maxDomains": 1,
  "maxPagesPerDomain": 15,
  "maxDepth": 2,
  "useSitemap": true,
  "stopWhenFound": false,
  "includeEvidence": true,
  "outputMode": "summary"
}
```
Import websites from another dataset
Use this when a previous actor produced a dataset with website URLs.
```json
{
  "datasetId": "YOUR_DATASET_ID",
  "maxDomains": 1000,
  "maxPagesPerDomain": 5,
  "outputMode": "summary"
}
```
The actor reads these fields from input dataset items:
website, url, domain, companyUrl, sourceUrl, finalUrl, startUrl
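A minimal Python sketch of how a website could be picked from each input dataset item. The field names are the ones listed above; checking them in that order and taking the first non-empty string is an assumption, not documented behavior.

```python
# Field names as documented; the precedence order is an assumption.
URL_FIELDS = ("website", "url", "domain", "companyUrl", "sourceUrl", "finalUrl", "startUrl")


def website_from_item(item: dict):
    """Return the first non-empty URL-like field of a dataset item, or None."""
    for field in URL_FIELDS:
        value = item.get(field)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None


def websites_from_items(items) -> list:
    """Collect one website per item, skipping items without any URL field."""
    return [w for w in (website_from_item(i) for i in items) if w]
```

Items that carry none of these fields are skipped rather than treated as errors in this sketch.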
Evidence and audit mode
Use this when you want to see where each email, phone, social profile, or logo came from.
```json
{
  "startUrls": [{ "url": "https://example.com" }],
  "includeEvidence": true,
  "outputMode": "summary",
  "compactOutput": true
}
```
Page-level debugging mode
Use this when you want one dataset row per crawled page.
```json
{
  "startUrls": [{ "url": "https://example.com" }],
  "outputMode": "pages",
  "includeEvidence": true
}
```
Input fields
| Field | Type | Default | Description |
|---|---|---|---|
startUrls | array | sample URLs | Websites or pages to scan. Use full URLs or plain domains. |
datasetId | string | empty | Optional Apify dataset containing website/domain fields. |
maxDomains | integer | 1 | Safety cap for unique websites processed from all inputs. Raise it for real batches. |
maxPagesPerDomain | integer | 5 | Maximum pages fetched per website. Start with 3-10. |
maxDepth | integer | 1 | Link depth. 0 = start page only, 1 = linked contact/about pages, 2 = one more layer. |
stopWhenFound | boolean | true | Stops early once a strong email plus a phone number or social profiles are found. Saves time and cost. |
extractEmails | boolean | true | Extract normal, obfuscated, mailto, JSON-LD, and Cloudflare protected emails. |
extractPhones | boolean | true | Extract and validate phone numbers. |
extractSocials | boolean | true | Extract social, messaging, marketplace, and app profile links. |
extractBrandAssets | boolean | true | Extract logo, favicon, OpenGraph image, and related brand images. |
extractBusinessData | boolean | true | Extract company name, legal name, addresses, opening hours, tax IDs, registration numbers. |
useSitemap | boolean | true | Reads sitemap.xml and adds contact-like URLs. |
followSubdomains | boolean | false | Allows crawling same-root subdomains, for example help.example.com. |
countryHint | string | US | Default phone country for numbers without country prefix. |
outputMode | string | summary | summary, pages, or both. |
includeEvidence | boolean | false | Adds detailed source evidence objects. |
compactOutput | boolean | true | Removes empty arrays, empty objects, nulls, and blank strings. |
maxConcurrency | integer | 10 | Number of websites processed in parallel. |
requestTimeoutSec | integer | 25 | HTTP request timeout per page. |
maxRetries | integer | 3 | Retry budget for HTTP and connection errors. |
maxProxyRetries | integer | 3 | Extra retry budget for proxy/transport failures. |
proxyConfiguration | object | Apify Proxy off | Apify Proxy settings. Keep it off for cheap tests; enable it if target sites block direct requests. |
userAgent | string | empty | Optional custom user agent. Leave empty for built-in Chrome-like headers. |
Accepted countryHint values:
US, GB, DE, FR, ES, IT, NL, PL, IN, CA, AU, BR, MX
For convenience, the API also accepts these alias field names in place of startUrls:
websites, start_urls, urls, domains
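A minimal Python sketch of merging `startUrls` with these aliases and validating `countryHint`. Reading `startUrls` first, accepting plain strings or `{"url": ...}` objects, and falling back to `US` for an unknown hint are all assumptions for illustration.

```python
ALIASES = ("websites", "start_urls", "urls", "domains")
COUNTRY_HINTS = {"US", "GB", "DE", "FR", "ES", "IT", "NL", "PL", "IN", "CA", "AU", "BR", "MX"}


def collect_start_urls(actor_input: dict) -> list:
    """Merge startUrls with the alias fields; entries may be strings or {"url": ...}."""
    collected = []
    for key in ("startUrls",) + ALIASES:
        value = actor_input.get(key) or []
        if isinstance(value, str):  # tolerate a single URL given as a string
            value = [value]
        for entry in value:
            if isinstance(entry, dict):
                entry = entry.get("url", "")
            if isinstance(entry, str) and entry.strip():
                collected.append(entry.strip())
    return collected


def resolve_country_hint(actor_input: dict) -> str:
    """Uppercase the hint; falling back to US for unknown values is an assumption."""
    hint = str(actor_input.get("countryHint") or "US").strip().upper()
    return hint if hint in COUNTRY_HINTS else "US"
```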
Output modes
summary
Default mode. Returns one clean lead row per website. Best for exports, CRM import, enrichment, and normal use.
pages
Returns one row per crawled page. Best for debugging, QA, and checking which pages had contacts.
both
Returns summary rows plus page rows. Use only when you need both lead records and page-level evidence in the same dataset.
Main output fields
Website identity
| Field | Description |
|---|---|
recordType | Usually domain in summary mode. |
domain | Final website host. |
rootDomain | Root domain used for matching. |
startUrl | Original normalized input URL. |
finalUrl | Final URL after redirects. |
status | ok, no_contacts_found, or failed. |
statusCode | HTTP status code of the first successful page. |
pageTitle | First useful page title. |
metaDescription | First useful meta description. |
language | HTML language attribute when present. |
countryHint | Phone country hint used by the run. |
pagesCrawled | Number of pages fetched for the website. |
pagesMatched | Number of pages where contacts or socials were found. |
crawlDepthReached | Deepest crawl depth reached. |
Emails
| Field | Description |
|---|---|
bestEmail | Best ranked email for outreach. |
bestEmailType | sales, support, info, press, jobs, privacy, billing, personal, or unknown. |
bestEmailConfidence | high, medium, or low. |
emails | Unique email list. |
emailDetails | Detailed email evidence when includeEvidence is enabled. |
Phones
| Field | Description |
|---|---|
bestPhone | Best ranked display phone. |
bestPhoneE164 | Normalized E.164 phone when possible. |
bestPhoneConfidence | high, medium, or low. |
phones | Unique valid phone list. |
phoneDetails | Detailed phone evidence when includeEvidence is enabled. |
Social profiles
| Field | Description |
|---|---|
socialProfiles | Unified list of { platform, url } records. |
facebookUrls | Facebook pages/profiles. |
instagramUrls | Instagram profiles. |
twitterUrls | X / Twitter profiles. |
youtubeUrls | YouTube channels or handles. |
tiktokUrls | TikTok profiles. |
whatsappUrls | WhatsApp links. |
telegramUrls | Telegram links. |
githubUrls | GitHub organization/user profiles. |
mediumUrls | Medium profiles. |
Other *Urls fields | Additional supported social, app, or marketplace links. |
Brand and company data
| Field | Description |
|---|---|
companyName | Company name from JSON-LD or site metadata. |
legalName | Legal name when structured data provides it. |
bestLogoUrl | Best logo/image candidate. |
logoUrl | Same as bestLogoUrl, for convenient exports. |
logoSource | jsonLd, headerLogo, appleTouchIcon, openGraph, twitterCard, or favicon. |
logoConfidence | Confidence of selected logo. |
faviconUrl | Favicon URL. |
appleTouchIconUrl | Apple touch icon URL. |
openGraphImageUrl | OpenGraph image URL. |
twitterImageUrl | Twitter card image URL. |
brandImages | List of likely brand images. |
addresses | Structured addresses when present. |
openingHours | Structured opening hours when present. |
taxIds | Visible VAT/tax IDs when detected. |
vatIds | Alias of taxIds. |
registrationNumbers | Visible company registration numbers when detected. |
Evidence fields
These appear when includeEvidence is enabled:
| Field | Description |
|---|---|
bestContactPage | Best page where useful contact data was found. |
sourcePages | Crawled pages with counts of found emails, phones, and socials. |
contactEvidence | Full contact evidence with value, source URL, page type, confidence, and context. |
imageEvidence | Logo and image extraction evidence. |
warnings | Non-fatal fetch or parsing warnings. |
errors | Fatal run errors for that website. |
Example output
```json
{
  "recordType": "domain",
  "domain": "wolfssl.com",
  "rootDomain": "wolfssl.com",
  "startUrl": "https://www.wolfssl.com/contact/",
  "finalUrl": "https://www.wolfssl.com/contact/",
  "status": "ok",
  "statusCode": 200,
  "companyName": "wolfSSL",
  "pagesCrawled": 5,
  "pagesMatched": 5,
  "bestEmail": "support@wolfssl.com",
  "bestEmailType": "support",
  "bestEmailConfidence": "high",
  "bestPhone": "+1 (425) 245-8247",
  "bestPhoneE164": "+14252458247",
  "bestPhoneConfidence": "high",
  "emails": ["support@wolfssl.com", "facts@wolfssl.com", "licensing@wolfssl.com"],
  "phones": ["+1 (425) 245-8247"],
  "socialProfiles": [
    { "platform": "X / Twitter", "url": "https://twitter.com/wolfssl" },
    { "platform": "Facebook", "url": "https://www.facebook.com/wolfssl" },
    { "platform": "GitHub", "url": "https://www.github.com/wolfssl" }
  ],
  "bestLogoUrl": "https://www.wolfssl.com/wordpress/wp-content/uploads/2020/12/cropped-wolfssl_logo_300px.png"
}
```
Contact discovery logic
The actor does not crawl the whole website blindly. It prioritizes URLs that usually contain contacts:
`/contact`, `/contacts`, `/about`, `/team`, `/staff`, `/support`, `/help`, `/sales`, `/press`, `/media`, `/locations`, `/offices`, `/impressum`, `/imprint`, `/legal`, `/privacy`
When useSitemap is enabled, it checks sitemap.xml and adds only contact-like URLs from the sitemap. This helps find contact pages that are not linked from the homepage.
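The prioritization and sitemap filtering described above can be sketched in Python as follows. The keyword list is the one documented; treating its order as the priority order, and matching forms like `contact-us` or `contact.html`, are assumptions. The sitemap parser handles only a standard `urlset` file, not sitemap index files.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Documented path keywords, in an assumed priority order.
CONTACT_HINTS = ("contact", "contacts", "about", "team", "staff", "support", "help",
                 "sales", "press", "media", "locations", "offices", "impressum",
                 "imprint", "legal", "privacy")
NO_MATCH = len(CONTACT_HINTS)
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def contact_score(url: str) -> int:
    """Lower is better; NO_MATCH means the path has no contact-like segment."""
    segments = [s.lower() for s in urlsplit(url).path.split("/") if s]
    for rank, hint in enumerate(CONTACT_HINTS):
        for seg in segments:
            # Match "contact", "contact-us", "contact_us", "contact.html", ...
            if seg == hint or seg.startswith((hint + "-", hint + "_", hint + ".")):
                return rank
    return NO_MATCH


def sitemap_contact_urls(sitemap_xml: str) -> list:
    """Keep only contact-like <loc> entries from a standard urlset sitemap."""
    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    return [u for u in locs if contact_score(u) < NO_MATCH]


def prioritize(links, max_pages: int) -> list:
    """De-duplicate discovered links and keep the most contact-like ones first."""
    unique = list(dict.fromkeys(links))
    return sorted(unique, key=contact_score)[:max_pages]
```

Because Python's sort is stable, links with equal scores keep their discovery order, so the `maxPagesPerDomain` budget goes to contact-like pages before anything else.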
How ranking works
The actor returns all unique contacts, but it also chooses bestEmail and bestPhone.
Email ranking prefers:
- Same-domain business emails
- High-confidence sources like `mailto`, contact pages, support pages, and JSON-LD
- Useful role emails like sales, info, support, and press
- Lower-confidence text matches only when they look legitimate
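One way to turn these preferences into a single choice is a lexicographic sort key, sketched below in Python. The source labels (`mailto`, `contact_page`, ...), the role order, and the mapping to `high` / `medium` / `low` confidence are hypothetical; the actor's real weights are not documented here, and the "looks legitimate" check for text matches is not modeled.

```python
# Hypothetical role order and source labels mirroring the list above.
ROLE_ORDER = {"sales": 0, "info": 1, "support": 2, "press": 3}
STRONG_SOURCES = {"mailto", "contact_page", "support_page", "jsonld"}


def email_sort_key(email: str, source: str, site_root: str) -> tuple:
    local, _, domain = email.lower().partition("@")
    off_domain = not (domain == site_root or domain.endswith("." + site_root))
    weak_source = source not in STRONG_SOURCES
    role_rank = ROLE_ORDER.get(local, len(ROLE_ORDER))
    # Tuples compare left to right and False sorts before True, so same-domain
    # emails win first, then strong sources, then preferred roles.
    return (off_domain, weak_source, role_rank, email.lower())


def best_email(candidates, site_root: str):
    """candidates: (email, source) pairs. Returns (email, confidence) or None."""
    if not candidates:
        return None
    email, source = min(candidates, key=lambda c: email_sort_key(c[0], c[1], site_root))
    off_domain, weak_source, _, _ = email_sort_key(email, source, site_root)
    if not off_domain and not weak_source:
        confidence = "high"
    elif not off_domain:
        confidence = "medium"
    else:
        confidence = "low"
    return email.lower(), confidence
```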
Phone ranking prefers:
- Valid phone numbers
- Numbers matching the selected `countryHint`
- Numbers found on contact, support, sales, or legal pages
- Numbers that can be normalized to E.164
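The sketch below illustrates E.164 normalization with a country hint and the phone preferences above. It is deliberately tiny: the `PLANS` table covers only four countries with rough length checks, and real validation needs full per-country numbering metadata (for example from the `phonenumbers` Python package). The page-type labels are hypothetical.

```python
import re
from typing import Optional

# Hypothetical mini numbering-plan table: (calling code, trunk prefix, national lengths).
PLANS = {
    "US": ("1", "1", {10}),
    "CA": ("1", "1", {10}),
    "GB": ("44", "0", {9, 10}),
    "FR": ("33", "0", {9}),
}
CONTACT_PAGE_TYPES = {"contact", "support", "sales", "legal"}


def to_e164(raw: str, country_hint: str = "US") -> Optional[str]:
    text = raw.strip()
    digits = re.sub(r"\D", "", text)
    if text.startswith("+"):
        # Already international. E.164 allows at most 15 digits; the 8-digit
        # floor is an arbitrary sanity check for this sketch.
        return "+" + digits if 8 <= len(digits) <= 15 else None
    plan = PLANS.get(country_hint.upper())
    if plan is None:
        return None
    code, trunk, lengths = plan
    # Drop the national trunk prefix (leading 0 in GB/FR, leading 1 in NANP).
    if digits.startswith(trunk) and len(digits) - len(trunk) in lengths:
        digits = digits[len(trunk):]
    return f"+{code}{digits}" if len(digits) in lengths else None


def best_phone(candidates, country_hint: str = "US"):
    """candidates: (raw_phone, page_type) pairs. Returns (raw, e164) or None."""
    code = PLANS.get(country_hint.upper(), ("",))[0]

    def sort_key(candidate):
        raw, page_type = candidate
        e164 = to_e164(raw, country_hint)
        return (
            e164 is None,                               # valid numbers first
            not (e164 or "").startswith("+" + code),   # then numbers matching the hint
            page_type not in CONTACT_PAGE_TYPES,       # then contact-like pages
        )

    if not candidates:
        return None
    raw, _ = min(candidates, key=sort_key)
    return raw, to_e164(raw, country_hint)
```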
Clean output and filtering
The actor filters common junk before writing results:
- Fake emails like `test@example.com`
- Asset-like strings that look like emails but are actually images or scripts
- Sentry, schema, Wix, and placeholder domains
- Personal/free-mail noise when it appears as unrelated text on a different company site
- Social share/login URLs
- LinkedIn, Trustpilot, Google Maps, and Threads links
- Empty arrays, empty objects, nulls, and blank strings when `compactOutput` is enabled
This keeps Apify's All fields view clean and makes CSV/Excel exports easier to use.
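A minimal Python sketch of two of these filters: a junk-email check and the recursive cleanup that `compactOutput` describes. The blocklists are hypothetical examples (the actor's actual lists are not published here), and the free-mail and share-URL rules are not modeled.

```python
# Hypothetical blocklists for illustration only.
PLACEHOLDER_DOMAINS = ("example.com", "example.org", "example.net", "yourdomain.com",
                       "sentry.io", "wixpress.com", "schema.org")
ASSET_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".js", ".css")


def is_junk_email(email: str) -> bool:
    local, _, domain = email.lower().rpartition("@")
    if not local or not domain:
        return True
    if domain.endswith(ASSET_SUFFIXES):  # e.g. "logo@2x.png" is an image file name
        return True
    return any(domain == d or domain.endswith("." + d) for d in PLACEHOLDER_DOMAINS)


def _is_empty(value) -> bool:
    return value is None or value == [] or value == {} or (
        isinstance(value, str) and not value.strip())


def compact(value):
    """Recursively drop nulls, blank strings, empty lists and empty dicts.

    0 and False are kept because they carry information.
    """
    if isinstance(value, dict):
        cleaned = {k: compact(v) for k, v in value.items()}
        return {k: v for k, v in cleaned.items() if not _is_empty(v)}
    if isinstance(value, list):
        return [v for v in (compact(v) for v in value) if not _is_empty(v)]
    return value
```

Cleaning children before testing the parent means a list that only held empty objects disappears too, which is what keeps CSV columns sparse.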
How to run with Apify API
Replace YOUR_TOKEN with your Apify API token.
```shell
curl -X POST "https://api.apify.com/v2/acts/trakk~deep-email-phone-social-media-scraper-search/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "startUrls": [
      { "url": "https://www.wolfssl.com/contact/" },
      { "url": "https://www.bluehost.com/contact" }
    ],
    "maxDomains": 2,
    "maxPagesPerDomain": 5,
    "outputMode": "summary"
  }'
```
To get dataset items after the run finishes, use the run's `defaultDatasetId` as DATASET_ID:
```shell
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?clean=true&format=json&token=YOUR_TOKEN"
```
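For scripts, the same flow can be sketched in Python with only the standard library. This sketch uses Apify's `run-sync-get-dataset-items` endpoint, which starts a run, waits for it, and returns the default dataset items in one call; that endpoint has a wait-time limit (check the Apify API docs for the current value), so long batches are better started with the asynchronous `/runs` call above and polled. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR = "trakk~deep-email-phone-social-media-scraper-search"  # slug from the curl example


def build_sync_request(token: str, websites, max_pages: int = 5) -> urllib.request.Request:
    """POST the actor input to run-sync-get-dataset-items (waits for the run to finish)."""
    actor_input = {
        "startUrls": [{"url": w} for w in websites],
        "maxDomains": len(websites),
        "maxPagesPerDomain": max_pages,
        "outputMode": "summary",
    }
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        f"{API_BASE}/acts/{ACTOR}/run-sync-get-dataset-items?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_fetch(token: str, websites, timeout_sec: int = 300) -> list:
    """Start a run and return its dataset items as a list of dicts."""
    request = build_sync_request(token, websites)
    with urllib.request.urlopen(request, timeout=timeout_sec) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `rows = run_and_fetch("YOUR_TOKEN", ["https://www.wolfssl.com/contact/"])`, then read fields such as `row.get("bestEmail")` from each summary row.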
Apify CLI commands
Log in
```shell
apify login
```
Run the actor on Apify
```shell
apify call trakk/deep-email-phone-social-media-scraper-search --input-file input.json
```
You can also call by actor ID:
```shell
apify call BtjVjAQKexpfdq5po --input-file input.json
```
Run and print dataset output
```shell
apify call BtjVjAQKexpfdq5po --input-file input.json --output-dataset
```
Check latest runs
```shell
apify runs ls BtjVjAQKexpfdq5po --limit 10 --desc
```
View one run
```shell
apify runs info RUN_ID
```
Download dataset items
```shell
apify datasets get-items DATASET_ID --format json
apify datasets get-items DATASET_ID --format csv
apify datasets get-items DATASET_ID --format xlsx
```
Deploy updates to Apify
```shell
apify push --force
```
Local development commands
Install dependencies:
```shell
pip install -r requirements.txt
```
Run locally with Apify storage:
```shell
apify run
```
Run the Python module directly:
```shell
python -m src
```
Run unit tests:
```shell
python -m unittest discover -s tests -v
```
Validate the Apify input schema:
```shell
apify validate-schema .actor/input_schema.json
```
Deploy:
```shell
apify push --force
```
Performance tips
Fast and cheap first run
Use:
```json
{
  "maxPagesPerDomain": 3,
  "maxDepth": 1,
  "stopWhenFound": true,
  "maxConcurrency": 10
}
```
Better coverage
Use:
```json
{
  "maxPagesPerDomain": 10,
  "maxDepth": 2,
  "stopWhenFound": false,
  "useSitemap": true
}
```
Large website lists
Recommended settings:
```json
{
  "maxPagesPerDomain": 3,
  "maxDepth": 1,
  "stopWhenFound": true,
  "compactOutput": true,
  "includeEvidence": false,
  "maxConcurrency": 10
}
```
When to use proxies
Apify Proxy is disabled by default so the sample run stays cheap and stable. For most public company websites, direct requests are enough. If a website blocks direct traffic, enable Apify Proxy in proxyConfiguration; residential proxies can help with more heavily protected sites.
When to raise retries
Raise maxRetries and maxProxyRetries if some websites randomly fail with connection errors, 429, 5xx, or temporary blocks.
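To picture what a retry budget means, here is a minimal Python sketch of retrying with exponential backoff and jitter. It assumes that connection errors, HTTP 429, and 5xx responses are the retryable cases (as listed above) and that `max_retries` counts extra attempts after the first; it is not the actor's internal code, and the separate `maxProxyRetries` budget is not modeled.

```python
import random
import time


def is_retryable(status) -> bool:
    """None stands for a connection/transport error; 429 and 5xx are temporary."""
    return status is None or status == 429 or status >= 500


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with jitter: about 1s, 2s, 4s, ... capped at `cap`."""
    return min(cap, base * 2 ** attempt) * random.uniform(0.5, 1.0)


def fetch_with_retries(fetch, max_retries: int = 3, sleep=time.sleep):
    """Call fetch() -> (status, body) up to 1 + max_retries times."""
    attempt = 0
    while True:
        try:
            status, body = fetch()
        except (ConnectionError, TimeoutError):
            status, body = None, None
        if not is_retryable(status) or attempt >= max_retries:
            return status, body
        sleep(backoff_delay(attempt))
        attempt += 1
```

Jitter spreads retries from parallel workers apart, which matters when `maxConcurrency` is high and many requests hit the same rate limit at once.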
Status values
| Status | Meaning |
|---|---|
ok | The website was crawled and at least one contact, phone, or social profile was found. |
no_contacts_found | Pages were fetched, but no useful contacts were found. |
failed | No pages were crawled successfully. Check warnings or errors. |
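The three status values can be derived from per-page results roughly as in this Python sketch. The page-record shape (`fetched`, `emails`, `phones`, `socials`) is hypothetical, and counting only successfully fetched pages toward `pagesCrawled` is an assumption.

```python
def summarize_pages(pages) -> dict:
    """pages: per-page dicts with 'fetched' (bool) and 'emails'/'phones'/'socials' lists.

    Counting only successfully fetched pages as crawled is an assumption.
    """
    fetched = [p for p in pages if p.get("fetched")]
    matched = [p for p in fetched if p.get("emails") or p.get("phones") or p.get("socials")]
    if not fetched:
        status = "failed"
    elif matched:
        status = "ok"
    else:
        status = "no_contacts_found"
    return {"status": status, "pagesCrawled": len(fetched), "pagesMatched": len(matched)}
```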
FAQ
Does this actor use a browser?
No. It is requests-only. It does not use Playwright, Puppeteer, Selenium, or a headless browser.
Can it find emails hidden behind JavaScript?
Sometimes, if the email is present in the HTML, JSON-LD, mailto, Cloudflare email protection, or page text. It will not execute JavaScript.
Does it verify that an email inbox exists?
No. It extracts and cleans public emails, but it does not perform SMTP verification or deliverability checks.
Why did a website return no contacts?
Possible reasons: the site blocks automated requests, contacts are loaded only after JavaScript execution, contacts are behind forms, or the site does not publish direct contacts.
Can I scrape thousands of websites?
Yes. Use datasetId for large input lists, keep maxPagesPerDomain modest, keep stopWhenFound enabled, and tune maxConcurrency based on stability.
What is the best setting for normal lead generation?
Use summary output, compactOutput: true, maxPagesPerDomain: 5, maxDepth: 1, stopWhenFound: true, and includeEvidence: false.
When should I enable evidence?
Enable includeEvidence when you need to audit where contacts came from, debug results, or show source URLs to a client. Keep it disabled for cleaner CSV exports.
Can I use the result in Google Sheets, Zapier, Make, or n8n?
Yes. You can export the dataset in common formats, fetch it through the Apify API, or connect runs to these tools through Apify integrations and webhooks.
Does it scrape LinkedIn?
No. LinkedIn is intentionally excluded. Use a dedicated LinkedIn actor if you need LinkedIn data.
Is this legal?
The actor extracts publicly visible website data. You are responsible for using the data legally and respecting privacy, anti-spam, GDPR, CCPA, CAN-SPAM, and other rules that apply to your use case.
Best practices for clean lead lists
- Start with company homepages or contact pages.
- Keep `compactOutput` enabled.
- Use `summary` mode for CRM exports.
- Use `includeEvidence` only when you need auditability.
- Run a small sample first, then scale.
- For international phone numbers, set `countryHint` to the most common target country.
- For outreach, verify emails with a deliverability tool before sending campaigns.
Tags
email scraper | phone scraper | contact scraper | social media scraper | website contact extractor | lead generation | b2b leads | company enrichment | crm enrichment | business contacts | website scraper | email finder | phone number finder | social profile finder | logo extractor | brand data | sales prospecting | marketing outreach | Apify actor | HTTP scraper | no browser scraper
Built for Apify. HTTP-only. Clean contact leads from website lists.