Pricing

Pay per event

Adaptive Website Lead Extractor

Crawl public business websites with Scrapling to extract emails, phones, social profiles, contact pages, automation gaps, and lead scores for CRM-ready outreach.

Developer

Solutions Smart

Last modified

3 days ago

What It Extracts

Company name estimate
Page title and meta description
Public emails and phone numbers
Primary email and primary phone
Contact page and about page URLs
Social profile links
Address-like text when confidently detected
Contact form signals
Booking, chat, WhatsApp, and contact automation signals
Opportunity signals for missing contact automation
Optional image, video, document, audio, archive, and embed URLs
Lead score from 0 to 100
Extraction confidence from 0 to 1
Pages crawled, source pages, and non-fatal errors
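The sketch below shows one minimal way to collect public emails and phones and deduplicate them across crawled pages, with the first match promoted to primaryEmail and primaryPhone. The regular expressions and the `extract_contacts` helper are illustrative assumptions, not the Actor's actual Scrapling-based rules. A phone pattern this loose will also match strings such as dates or order numbers.

```python
import re

# Simplified, assumed patterns; the Actor's real detection rules are not documented here.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s()/-]{6,}\d")


def extract_contacts(pages: dict) -> dict:
    """Collect deduplicated emails and phones from {url: visible_text} pages.

    The first value found (in crawl order) becomes the primary candidate.
    """
    emails: list = []
    phones: list = []
    for text in pages.values():
        for match in EMAIL_RE.findall(text):
            email = match.lower()  # compare emails case-insensitively
            if email not in emails:
                emails.append(email)
        for match in PHONE_RE.findall(text):
            phone = " ".join(match.split())  # collapse internal whitespace
            if phone not in phones:
                phones.append(phone)
    return {
        "emails": emails,
        "phones": phones,
        "primaryEmail": emails[0] if emails else None,
        "primaryPhone": phones[0] if phones else None,
    }
```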

Best Use Cases

Enrich company website lists with public contact data
Find businesses with weak contact or booking infrastructure
Build review queues for AI receptionist, local SEO, web design, or CRM automation outreach
Send structured lead records into n8n, Make, Zapier, Google Sheets, Airtable, HubSpot, Pipedrive, or a custom CRM
Discover public media and document URLs referenced by crawled pages when media mode is enabled

Input

{
  "startUrls": [
    {
      "url": "https://example.com"
    }
  ],
  "maxPagesPerDomain": 20,
  "maxConcurrency": 5,
  "useStealth": false,
  "respectRobotsTxt": true,
  "extractEmails": true,
  "extractPhones": true,
  "extractSocialLinks": true,
  "extractContactPages": true,
  "extractAutomationSignals": true,
  "extractMedia": false,
  "extractImages": true,
  "extractVideos": true,
  "extractDocuments": true,
  "extractOtherMedia": true,
  "maxMediaPerDomain": 100,
  "crawlSameDomainOnly": true,
  "requestTimeoutSecs": 30,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}

Important Input Options

Field	Default	Description
`startUrls`	required	Websites or domains to crawl.
`maxPagesPerDomain`	`20`	Hard limit for pages crawled per input website.
`maxConcurrency`	`5`	Number of websites processed in parallel.
`useStealth`	`false`	Uses Scrapling's stealth browser fetcher. It is slower and intended only for public pages that need browser rendering.
`respectRobotsTxt`	`true`	Skips URLs disallowed by robots.txt.
`extractAutomationSignals`	`true`	Detects public booking, form, chat, and WhatsApp signals.
`extractMedia`	`false`	Enables media URL discovery. Files are not downloaded.
`maxMediaPerDomain`	`100`	Maximum media asset URLs returned per website.
`crawlSameDomainOnly`	`true`	Stays on the same normalized host: a crawl that starts at `docs.example.com` does not follow links to `blog.example.com`.
`proxyConfiguration`	disabled	Optional Apify Proxy configuration for public sites that rate-limit datacenter traffic.
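The crawlSameDomainOnly and respectRobotsTxt checks can be sketched with the Python standard library. The `should_crawl` helper is hypothetical, and reading "normalized host" as "the lowercased host with a leading www. stripped" is an assumption. In a real crawl, the RobotFileParser would be loaded once per host, for example with `set_url()` and `read()`.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def normalized_host(url: str) -> str:
    # Assumption: "normalized" = host with a leading "www." removed.
    # urlsplit().hostname is already lowercased.
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def should_crawl(
    start_url: str,
    candidate_url: str,
    robots: RobotFileParser,
    crawl_same_domain_only: bool = True,
    respect_robots_txt: bool = True,
    user_agent: str = "*",
) -> bool:
    """Decide whether a discovered link is eligible for crawling."""
    if crawl_same_domain_only and normalized_host(candidate_url) != normalized_host(start_url):
        return False  # e.g. docs.example.com never follows links to blog.example.com
    if respect_robots_txt and not robots.can_fetch(user_agent, candidate_url):
        return False  # disallowed by the site's robots.txt rules
    return True
```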

Media Extraction

Media discovery is disabled by default because the Actor is primarily a lead intelligence tool.

Set extractMedia to true to collect public URLs for:

images from `img` and `source` tags, `srcset` attributes, Open Graph and Twitter image metadata, icons, and CSS url(...) references
videos from video tags and public embeds such as YouTube, Vimeo, Wistia, Loom, and Vidyard
documents such as PDF, DOCX, PPTX, XLSX, CSV, and TXT
audio files, archives, and other recognized media file URLs

The Actor records media URLs only. It does not download, store, transform, or rehost media files.
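The following rough sketch shows how discovered URLs might be sorted into the images, videos, documents, and other buckets used in mediaAssets. The extension sets, the embed-host list, and the helper names are assumptions based on the formats listed above. Like the Actor, the sketch only classifies URLs and never downloads anything.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlsplit

# Assumed extension lists, based on the formats named above (not exhaustive).
MEDIA_EXTENSIONS = {
    "images": {"png", "jpg", "jpeg", "gif", "webp", "svg", "ico", "avif"},
    "videos": {"mp4", "webm", "mov"},
    "documents": {"pdf", "docx", "pptx", "xlsx", "csv", "txt"},
    "other": {"mp3", "wav", "ogg", "zip", "rar", "7z", "gz"},  # audio and archives
}
VIDEO_EMBED_HOSTS = ("youtube.com", "youtu.be", "vimeo.com", "wistia.com", "loom.com", "vidyard.com")


def parse_srcset(srcset: str) -> list:
    """Return candidate URLs from a srcset value such as 'a.png 1x, b.png 2x'."""
    return [part.split()[0] for part in srcset.split(",") if part.strip()]


def classify_media(url: str) -> Optional[str]:
    """Map a URL to an output bucket, or None if it is not a recognized media URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if any(host == h or host.endswith("." + h) for h in VIDEO_EMBED_HOSTS):
        return "videos"  # public embed, classified by host rather than extension
    ext = PurePosixPath(parts.path).suffix.lower().lstrip(".")
    for bucket, extensions in MEDIA_EXTENSIONS.items():
        if ext in extensions:
            return bucket
    return None
```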

Output

The Actor pushes one item per input website to the default dataset and stores a run summary in the default Key-Value Store under OUTPUT_SUMMARY.

Example dataset item:

{
  "startUrl": "https://example.com",
  "domain": "example.com",
  "siteHost": "example.com",
  "companyName": "Example GmbH",
  "title": "Example GmbH - Digital Services",
  "description": "Example company description...",
  "primaryEmail": "info@example.com",
  "primaryPhone": "+49 30 123456",
  "emails": ["info@example.com"],
  "phones": ["+49 30 123456"],
  "socialLinks": {
    "linkedin": "https://linkedin.com/company/example",
    "instagram": "https://instagram.com/example"
  },
  "mediaSummary": {
    "images": 12,
    "videos": 1,
    "documents": 2,
    "other": 0,
    "total": 15
  },
  "mediaAssets": {
    "images": [
      {
        "url": "https://example.com/assets/logo.png",
        "sourcePage": "https://example.com",
        "extension": "png"
      }
    ],
    "videos": [
      {
        "url": "https://www.youtube.com/embed/example",
        "sourcePage": "https://example.com"
      }
    ],
    "documents": [
      {
        "url": "https://example.com/company-brochure.pdf",
        "sourcePage": "https://example.com/about",
        "extension": "pdf"
      }
    ],
    "other": []
  },
  "contactPage": "https://example.com/contact",
  "aboutPage": "https://example.com/about",
  "addressLikeText": ["Example Street 12, 10115 Berlin"],
  "contactMethods": {
    "hasEmail": true,
    "hasPhone": true,
    "hasContactPage": true,
    "hasContactForm": true,
    "hasSocialProfile": true
  },
  "automationSignals": {
    "hasOnlineBooking": false,
    "hasChatWidget": false,
    "hasContactForm": true,
    "hasWhatsappLink": false
  },
  "opportunitySignals": {
    "missingOnlineBooking": true,
    "missingChatWidget": true,
    "missingWhatsappLink": true,
    "missingContactForm": false,
    "hasMessagingGap": true,
    "hasAutomationGap": true
  },
  "siteClassification": {
    "type": "business_website",
    "businessWebsiteLikely": true,
    "reason": "Business contact or outreach signals were detected on crawled pages."
  },
  "recommendedAction": "Prioritize outbound: public email found and automation gap detected.",
  "leadScore": 78,
  "leadScoreLabel": "high",
  "confidence": 0.84,
  "confidenceLabel": "high",
  "confidenceReasons": [
    "Crawled 12 public page(s).",
    "Public email address found.",
    "Public phone number found.",
    "Likely contact page found.",
    "Company identity inferred from page metadata, title, schema, logo, or domain."
  ],
  "pagesCrawled": 12,
  "errors": [],
  "sourcePages": [
    "https://example.com",
    "https://example.com/contact",
    "https://example.com/about"
  ],
  "crawlSummary": {
    "pagesCrawled": 12,
    "emailsFound": 1,
    "phonesFound": 1,
    "socialProfilesFound": 2,
    "mediaAssetsFound": 15,
    "contactPageFound": true,
    "aboutPageFound": true,
    "errorsFound": 0
  }
}
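The summary counters in this example appear to mirror the detail fields. mediaSummary.total is the sum of the media categories, and the crawlSummary counts match the emails, phones, socialLinks, and errors fields. These relationships are inferred from the example only. A hypothetical check like the one below can flag items where they drift apart before the items reach a CRM.

```python
def check_item_consistency(item: dict) -> list:
    """Return mismatches between summary counters and detail fields.

    The relationships checked here are inferred from the example item, not documented rules.
    """
    problems = []
    media = item.get("mediaSummary") or {}
    if media:
        parts = sum(media.get(k, 0) for k in ("images", "videos", "documents", "other"))
        if parts != media.get("total"):
            problems.append(f"mediaSummary.total is {media.get('total')}, parts sum to {parts}")
    expected = {
        "pagesCrawled": item.get("pagesCrawled"),
        "emailsFound": len(item.get("emails", [])),
        "phonesFound": len(item.get("phones", [])),
        "socialProfilesFound": len(item.get("socialLinks", {})),
        "mediaAssetsFound": media.get("total", 0),
        "contactPageFound": bool(item.get("contactPage")),
        "aboutPageFound": bool(item.get("aboutPage")),
        "errorsFound": len(item.get("errors", [])),
    }
    summary = item.get("crawlSummary") or {}
    for key, value in expected.items():
        if key in summary and summary[key] != value:
            problems.append(f"crawlSummary.{key} is {summary[key]}, expected {value}")
    return problems
```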

Output Fields

Field	Description
`domain`	Registered domain, for example `example.com`.
`siteHost`	Actual host crawled, for example `docs.example.com`.
`companyName`	Best-effort company name from title, metadata, schema, logo alt text, or domain.
`primaryEmail`, `primaryPhone`	The first selected email and phone candidates, for workflows that need a single contact value.
`emails`, `phones`	Deduplicated public contact data found on crawled pages.
`socialLinks`	Public social profile URLs grouped by platform.
`mediaSummary`, `mediaAssets`	Media counts and URLs when `extractMedia` is enabled.
`contactMethods`	Boolean summary of reachable contact methods.
`automationSignals`	Detected booking, chat, form, and WhatsApp signals.
`opportunitySignals`	Missing automation/contact signals useful for outreach review.
`siteClassification`	Best-effort site type classification: `business_website`, `documentation`, `blog`, `ecommerce`, or `unknown`.
`leadScore`	Transparent opportunity score from `0` to `100`.
`confidence`, `confidenceReasons`	Extraction confidence from `0` to `1` and short reasons explaining the confidence.
`engine`, `engineRepository`	Scraping engine metadata for auditability and workflow routing.
`crawlSummary`	Compact summary for dashboards and automation filters.

Reliability

Uses an input schema so Apify validates required input before the run starts.
Uses an output schema so users, API clients, and AI agents know where to find results.
Pushes one dataset item per input website, even when no contact data is found.
Fails gracefully per URL and records non-fatal crawl errors in the output item.
Stores a run-level OUTPUT_SUMMARY record in the default Key-Value Store.
Uses bounded crawling with maxPagesPerDomain, maxConcurrency, and request timeouts.
Runs under Apify limited permissions and does not require account credentials.
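Bounded crawling can be illustrated with asyncio. In this sketch, a semaphore caps how many websites are processed at once (maxConcurrency), each site stops after a page limit (maxPagesPerDomain), and each request gets a timeout (requestTimeoutSecs). Failures are recorded instead of aborting the run, so every input website still yields one result. This is a simplified sketch built around a caller-supplied, hypothetical `fetch_links(url)` coroutine, not the Actor's crawler.

```python
import asyncio
from collections import deque


async def crawl_site(start_url, fetch_links, max_pages, timeout_secs):
    """Breadth-first crawl of one website, stopping after max_pages successful pages."""
    seen = {start_url}
    queue = deque([start_url])
    crawled, errors = [], []
    while queue and len(crawled) < max_pages:
        url = queue.popleft()
        try:
            links = await asyncio.wait_for(fetch_links(url), timeout_secs)
        except Exception as exc:  # non-fatal: record the error and keep crawling
            errors.append(f"{url}: {type(exc).__name__}")
            continue
        crawled.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    # One result per website, even if nothing was crawled successfully.
    return {"startUrl": start_url, "pagesCrawled": len(crawled), "errors": errors}


async def crawl_all(start_urls, fetch_links, max_pages=20, max_concurrency=5, timeout_secs=30.0):
    """Crawl websites with at most max_concurrency of them in flight at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with semaphore:
            return await crawl_site(url, fetch_links, max_pages, timeout_secs)

    return await asyncio.gather(*(bounded(url) for url in start_urls))
```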

Automated Test Readiness

Apify's automated Store test expects a run with the Actor's default (prefilled) input to finish successfully and produce a non-empty default dataset within a short time window.

Recommended smoke-test input:

{
  "startUrls": [
    {
      "url": "https://docs.apify.com/"
    }
  ],
  "maxPagesPerDomain": 10,
  "maxConcurrency": 1,
  "respectRobotsTxt": true,
  "extractMedia": false
}

Expected smoke-test result:

run status: succeeded
default dataset: non-empty
one domain-level item pushed
no uncaught ReferenceError, TypeError, or Python traceback
OUTPUT_SUMMARY present in the Key-Value Store
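The expected result above can be turned into a small checklist. This sketch assumes that you have already fetched the run status, the dataset items, the OUTPUT_SUMMARY value, and the run log, for example through the Apify API. The function name and argument layout are hypothetical, and the sketch checks the log only for the Python traceback marker.

```python
from typing import Optional


def check_smoke_test(
    run_status: str,
    items: list,
    summary: Optional[dict],
    log_text: str,
    start_urls: list,
) -> list:
    """Return a list of failed expectations; an empty list means the smoke test passed."""
    failures = []
    if run_status != "SUCCEEDED":  # Apify reports a successful run as SUCCEEDED
        failures.append(f"run status is {run_status}")
    if not items:
        failures.append("default dataset is empty")
    elif len(items) != len(start_urls):
        failures.append(f"expected {len(start_urls)} item(s), got {len(items)}")
    if summary is None:
        failures.append("OUTPUT_SUMMARY record is missing")
    if "Traceback (most recent call last)" in log_text:
        failures.append("Python traceback found in the run log")
    return failures
```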

Ease of Use

Provides form-friendly input controls for URLs, crawling limits, concurrency, robots.txt, contact extraction, automation signals, media extraction, timeout, and proxy settings.
Uses conservative defaults for normal public website enrichment.
Keeps media extraction disabled by default to reduce output size and cost.
Returns CRM-friendly fields such as primaryEmail, primaryPhone, leadScore, confidence, recommendedAction, and crawlSummary.

Trust and Safety

Crawls public pages only.
Respects robots.txt when enabled.
Avoids authenticated, private, checkout, account, and obvious sensitive paths.
Does not submit forms.
Does not solve CAPTCHAs.
Does not perform aggressive anti-bot bypassing.
Does not download or rehost media files; media mode records public URLs only.

Congruency

The Actor title, description, input schema, output schema, dataset view, README, and monetization events use the same terminology:

website/domain lead record
basic lead record
qualified lead record
media assets
automation signals
lead score
confidence
crawl summary

This consistency is intentional because Apify's quality score considers whether an Actor's text, schemas, and behavior align.

Lead Score

The score is intentionally simple and transparent. It is an outreach opportunity score, not a business quality score.

Example scoring factors:

+20 email found
+20 phone found
+15 contact page found
+10 social profile found
+10 contact form found
+15 appointment-based business appears to lack online booking
+15 no chat, WhatsApp, or similar messaging automation detected
capped at 100
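If the factors are simply added and the total is capped, the example factors translate into the sketch below. The `appointment_based` flag is a hypothetical input, because the README does not say how appointment-based businesses are detected. Because these are only example factors, the Actor's real scoring evidently differs: the sample item above scores 78, which no combination of these multiples of 5 can produce. Treat this as an illustration only.

```python
def lead_score(contact_methods: dict, automation_signals: dict, appointment_based: bool) -> int:
    """Additive opportunity score using the example factors listed above, capped at 100."""
    score = 0
    if contact_methods.get("hasEmail"):
        score += 20
    if contact_methods.get("hasPhone"):
        score += 20
    if contact_methods.get("hasContactPage"):
        score += 15
    if contact_methods.get("hasSocialProfile"):
        score += 10
    if contact_methods.get("hasContactForm"):
        score += 10
    # Hypothetical input: the README does not say how "appointment-based" is detected.
    if appointment_based and not automation_signals.get("hasOnlineBooking"):
        score += 15
    if not (automation_signals.get("hasChatWidget") or automation_signals.get("hasWhatsappLink")):
        score += 15
    return min(score, 100)
```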

Use the score for prioritization and human review, not automated eligibility decisions.

Confidence

Confidence is separate from lead score. It increases when more useful pages are crawled, contact/about pages are found, contact details are detected, and multiple signals confirm the company identity. It decreases when pages fail, data is sparse, or identity/contact signals are weak.

Recommended Workflows

CRM Enrichment

Upload a list of company websites.
Extract email, phone, social profiles, contact page, score, and confidence.
Export the dataset to CSV or send it to HubSpot, Pipedrive, Airtable, or Google Sheets.
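Nested fields such as contactMethods and opportunitySignals need flattening before a CSV export or spreadsheet import. The minimal sketch below uses dotted column names, which is an assumed convention rather than one the Actor defines. The helper names are hypothetical.

```python
import csv
import io


def flatten(record: dict, prefix: str = "") -> dict:
    """Turn nested objects into dotted keys, e.g. contactMethods.hasPhone."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        elif isinstance(value, list):
            # Lists are joined into one cell; lists of objects (such as mediaAssets
            # entries) are better exported separately.
            flat[name] = "; ".join(str(v) for v in value)
        else:
            flat[name] = value
    return flat


def items_to_csv(items: list, columns: list) -> str:
    """Write the chosen columns of each flattened item as CSV text."""
    buffer = io.StringIO()
    # Unlisted keys are ignored; missing keys become empty cells.
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        writer.writerow(flatten(item))
    return buffer.getvalue()
```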

AI Receptionist or Booking Automation Leads

Filter for websites with:

phone number present
email or contact page present
opportunitySignals.hasAutomationGap = true
missing booking, chat, WhatsApp, or contact form
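The filter above could be expressed as a predicate over dataset items. This sketch reads only fields that appear in the example output. The function name is hypothetical.

```python
def is_receptionist_lead(item: dict) -> bool:
    """Apply the AI receptionist / booking automation filter described above."""
    contact = item.get("contactMethods") or {}
    gaps = item.get("opportunitySignals") or {}
    has_phone = bool(contact.get("hasPhone"))
    has_email_or_page = bool(contact.get("hasEmail") or contact.get("hasContactPage"))
    # At least one of booking, chat, WhatsApp, or contact form must be missing.
    missing_any = any(
        gaps.get(key)
        for key in ("missingOnlineBooking", "missingChatWidget", "missingWhatsappLink", "missingContactForm")
    )
    return has_phone and has_email_or_page and gaps.get("hasAutomationGap") is True and missing_any
```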

n8n Automation

Trigger this Actor from n8n.
Read the default dataset items.
Filter by leadScore, confidence, and opportunitySignals.
Send qualified records to Google Sheets, Airtable, HubSpot, Slack, or an outreach queue.

Example filter:

leadScore >= 70
AND confidence >= 0.7
AND opportunitySignals.hasAutomationGap = true
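The sketch below expresses the same example filter as a Python function. It also sorts the surviving records by leadScore, so the strongest leads enter the outreach queue first. The thresholds are the example values above, and the function name is hypothetical. Inside n8n, the equivalent logic would typically live in a Filter or Code node.

```python
def select_for_outreach(items: list, min_score: int = 70, min_confidence: float = 0.7) -> list:
    """Keep items that match the example filter, highest leadScore first."""

    def qualifies(item: dict) -> bool:
        gaps = item.get("opportunitySignals") or {}
        return (
            (item.get("leadScore") or 0) >= min_score
            and (item.get("confidence") or 0.0) >= min_confidence
            and gaps.get("hasAutomationGap") is True
        )

    return sorted((i for i in items if qualifies(i)), key=lambda i: i["leadScore"], reverse=True)
```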

Proxy and Stealth Use

The default run does not use Apify Proxy and does not use stealth fetching.

For larger public crawls or sites that rate-limit datacenter traffic, enable Apify Proxy:

{
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

Use useStealth: true only when public pages need browser rendering. This Actor does not solve CAPTCHAs, submit forms, scrape authenticated content, or perform aggressive anti-bot bypassing.

Performance Tips

Keep maxPagesPerDomain between 5 and 20 for quick enrichment.
Use 20 to 50 pages for deeper lead analysis.
Disable extractMedia unless you need media URLs.
Keep crawlSameDomainOnly enabled for cleaner results.
Use moderate concurrency for large input lists.
Enable proxy only when needed.

Limitations

Websites vary widely. Some sites hide contact details behind JavaScript, publish contact data as images, block automated requests, use ambiguous phone/address formats, or disallow crawling in robots.txt.

Automation signals are best-effort public-page signals. They should be treated as review hints, not guarantees.

Ethical Usage

Use this Actor only on public web pages and for legitimate business purposes. Respect robots.txt (keep respectRobotsTxt enabled) and comply with applicable privacy, marketing, platform, and data protection rules.

Do not use this Actor for spam, harassment, credential collection, sensitive profiling, scraping private or authenticated data, bypassing access restrictions, or deceptive outreach.

Always review leads before contacting them.

Local Business Lead Enricher & Website Contact Auditor

leadforge412/local-business-lead-enricher-website-contact-auditor

Enrich Google Maps Scraper results and local business lists with emails, phones, social links, website checks, contact pages, and transparent CRM-ready lead scores.

Mezhnun Orudzhaliev

Extract Emails Contacts Socials From Any Website

scrapio/extract-emails-contacts-socials-from-any-website

✉️ Extract emails, phones, contact pages & social profiles from any website. 🔎 Lightning-fast email extractor & contact scraper for lead gen, sales outreach, and research. 🚀 Clean, structured export to power your CRM and campaigns.

Scrapio

Website Email, Phone & Social Extractor

toolsnmoreapi/Website-Lead-Scraper

Extract business emails, phone numbers, and social profiles from websites — clean, structured, and ready for lead generation.

ToolsAPI

Website Lead Extractor, Emails, Phones & Social Profiles

george.the.developer/website-contact-scraper

Extract contact information from any website. Finds emails, phone numbers, social media profiles, and contact forms automatically. Perfect for building prospect lists, lead generation, and sales outreach. Handles JavaScript rendered pages.

George Kioko

AI Lead Intelligence Website Opportunity Finder API

shahabuddin38/AI-Lead-Intelligence-Website-Opportunity-Finder-API

Discover businesses with weak SEO, missing AI automation, poor lead capture, and outdated websites. Extract emails, phones, WhatsApp, and social profiles while generating outreach-ready opportunities, lead scores, SEO audits, and AI automation insights from public business websites.

Shahab Uddin

Website Email & Contact Extractor

wishful_knowledge/website-contact-tech-scanner

Extract public business emails, contact links, social profiles, tech stack signals, and outreach scores from domain or URL lists.

sanfeng zhang

B2B Website Contact & Company Intelligence Extractor (CRM-Ready

adinfosys-labs/b2b-website-contact-company-intelligence-extractor-crm-ready

Extract emails, phone numbers, and social links from thousands of websites. Automatically scans contact pages and returns clean, export-ready contact data.

Artashes Arakelyan

Contact Info Scraper — Extract Emails & Phones from Websites

lanky_quantifier/contact-info-scraper

Extract emails, phone numbers, and social profiles (LinkedIn, Twitter, Facebook, Instagram, YouTube, TikTok, GitHub) from any website. Crawls contact pages, footers, and team pages. B2B lead gen and CRM enrichment.

Vhub Systems

Contact Info Extractor

optimus-fulcria/contact-info-extractor

Extract emails, phone numbers, social media profiles, and addresses from any website. Auto-follows contact pages. Lead generation ready.

Fulcria Labs

Tiktok Contact Extractor

coregent/tiktok-contact-extractor

TikTok Contact Extractor finds public contact details from TikTok creators, including emails, phones, websites, Instagram, YouTube, and profile data. Use it for influencer outreach, lead generation, creator partnerships, and contact-ready CRM exports.