Website Contact & Tech Stack Scraper

Pricing

from $40.00 / 1,000 domain results

Scrape any website for emails, phone numbers, social profiles, tech stack, ad pixels, chatbot, and a lead score. One clean record per domain. Bulk lead generation + MCP server for AI agents.

Developer

Mayowa Ogedengbe

Maintained by Community

Website Contact & Tech Stack Scraper extracts emails, phone numbers, social media profiles, technology stack, advertising pixels, chatbots, and a lead score from any list of websites — returning one clean, structured record per domain. It is a bulk website contact scraper and B2B lead generation tool built for sales teams, marketing and web design agencies, and AI agents that need to enrich a company from just its URL.

Give it one website or thousands. For every domain you get reachable contacts (emails, phones, WhatsApp, contact forms), the full tech stack (CMS, ecommerce, analytics, chat, marketing automation), which ad pixels and chatbots are actually running, business signals, and a 0–100 lead score with reasons. The output schema is stable and resale-ready: every field is always present, so you can drop the dataset straight into a CRM, spreadsheet, or enrichment pipeline.

What does the Website Contact & Tech Stack Scraper do?

This website scraper crawls each domain you give it, prioritizes the pages that carry contact and company information (contact, about, team, footer), and extracts a single rich JSON record per domain. It tries cheap HTTP requests first and only launches a real browser when a page needs JavaScript, so it stays fast and cost-effective at scale.

In one run you can answer questions like:

  • Who can I contact at this company, and how? — emails (including obfuscated ones), phone numbers in E.164 format, WhatsApp links, social profiles, and the best contact-form page.
  • What technology does this site run on? — CMS, ecommerce platform, JavaScript frameworks, analytics, CDN, and server.
  • Are they spending on ads? — Meta Pixel, Google Ads, TikTok, LinkedIn, Pinterest, Reddit and other pixels, including ones injected at runtime through Google Tag Manager that static scrapers miss.
  • Do they use a chatbot, and which vendor? — Intercom, Drift, Zendesk, Tidio, Crisp, HubSpot Chat, and more.
  • Is this a good lead? — a 0–100 lead score with human-readable reasons.

Who is this website scraper for?

Each type of buyer gets one specific, high-value output from the scraper:

  • Sales & lead generation teams (SDRs): stop wasting hours finding and qualifying contacts. Get emails, phone numbers, and a lead score per domain, with role-vs-personal email classification so you can target the right inbox.
  • Ad & marketing agencies: find businesses that are (or are not) already running ads. The ad-pixel and live-ad signals tell you who has budget and who to pitch.
  • Web design & development agencies: build a pipeline of outdated sites to pitch for a redesign using tech_freshness, legacy-stack detection, and missing-SSL flags.
  • Chatbot & AI support vendors: find which businesses across a market lack a chatbot — or already run a competitor's — using chatbot presence and vendor detection.
  • AI agent developers: enrich any company from a URL live, with no pipeline to build, by calling the actor as a Model Context Protocol (MCP) server.

What data can this website scraper extract?

Every domain produces one JSON record with a stable schema (missing scalars are null, missing lists are [], keys are never omitted):

| Field group | What you get |
| --- | --- |
| Company | Name, description, logo URL, industry guess |
| Emails | Addresses from mailto:, visible text, JSON-LD, and <script> data; decodes Cloudflare and entity obfuscation; classified role vs personal; placeholder/example addresses filtered; scoped to the target's own domain (third-party addresses found on the page go to a separate emails_off_domain list); optional DNS MX validation |
| Phones | Valid numbers in E.164 format with ISO country code |
| WhatsApp | Click-to-chat links and numbers |
| Socials | LinkedIn (company + people), Twitter/X, Instagram, Facebook, YouTube, TikTok; normalized, tracking params stripped, attributed to the target (profiles belonging to others, e.g. a creator featured on the site, go to socials.other_profiles) |
| Address | Validated postal address from JSON-LD PostalAddress or footer heuristics; marketing copy and bare fragments are rejected, and sub-fields (street/city/region/postal/country) are split out when available |
| Contact form | Whether a real contact form exists and the best contact-form page URL |
| Tech stack | CMS, ecommerce platform, primary framework, analytics, chat, marketing automation, CDN, server; runtime_signals flags whether GTM-injected tags were resolved |
| Chatbot | Whether a chatbot is present, the vendor, and how it was detected |
| Ad pixels | Meta, Google Ads, GTM, LinkedIn, TikTok, X, Pinterest, Reddit, Bing, and more, including runtime/GTM-injected pixels |
| Live ads | Optional best-effort check of Meta Ad Library and Google Ads Transparency |
| Site signals | HTTPS, SSL validity, mobile-friendly hint, copyright year, last content date, tech_freshness (modern / dated / legacy / unknown) |
| Lead score | 0–100 score with the reasons that contributed |
| Status & diagnostics | A top-level status (ok / partial / blocked / failed), plus crawl metadata: pages crawled, fetch method, detected bot/WAF vendor, and a classified failure_reason (dns / tls / connection / timeout) when a site can't be fetched |
| Screenshot | Optional homepage screenshot URL |

How to use the Website Contact & Tech Stack Scraper

  1. Add your websites. Paste plain URLs into the Websites field (one per line) and/or add them under Start URLs. The scraper deduplicates by registrable domain automatically.
  2. (Optional) tune the crawl. Set maxPagesPerSite, maxDepth, and renderJavaScript (auto is recommended). Enable validateEmailMx, checkLiveAds, or captureScreenshot if you need them.
  3. Run the actor. Results are pushed to the dataset — one record per domain — which you can export to JSON, CSV, Excel, or pull via the API.

Example input

{
  "websites": ["https://example.com", "https://acme.co"],
  "maxPagesPerSite": 3,
  "renderJavaScript": "auto",
  "prioritizeContactPages": true,
  "validateEmailMx": false,
  "checkLiveAds": false,
  "captureScreenshot": false
}
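
If you prefer to start runs from code rather than the Console, the input above can be sent through the Apify REST API. The following is a minimal sketch using only the Python standard library. It assumes the synchronous `run-sync-get-dataset-items` endpoint, which suits short runs; large lists are better started asynchronously. The actor ID shown is a hypothetical placeholder guessed from the Standby URL on this page, so replace it with the real `username~actor-name` value.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"

# Hypothetical actor ID: replace with the real "username~actor-name".
ACTOR_ID = "USERNAME~website-intelligence-scraper"

# Same values as the example input above.
RUN_INPUT = {
    "websites": ["https://example.com", "https://acme.co"],
    "maxPagesPerSite": 3,
    "renderJavaScript": "auto",
    "prioritizeContactPages": True,
    "validateEmailMx": False,
    "checkLiveAds": False,
    "captureScreenshot": False,
}


def build_run_request(token: str, actor_id: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST that runs the actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(token: str, actor_id: str, run_input: dict) -> list:
    """Send the request and return the parsed list of records (one per domain)."""
    request = build_run_request(token, actor_id, run_input)
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `run_actor(your_token, ACTOR_ID, RUN_INPUT)` performs the network request; `build_run_request` alone is handy for inspecting what would be sent.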

Key input options

| Field | Default | What it does |
| --- | --- | --- |
| startUrls / websites |  | The websites to scrape (provide at least one) |
| maxPagesPerSite | 3 | Maximum pages crawled per domain. The default of 3 covers the homepage plus 2 ranked contact pages (where leads live). Raise for broader coverage. |
| maxDepth | 2 | Link depth from the start URL |
| prioritizeContactPages | true | Crawl contact / about / team pages first |
| renderJavaScript | auto | auto renders only JS-heavy pages; always / never force it |
| checkLiveAds | false | Best-effort Meta / Google live-ad check |
| captureScreenshot | false | Save a homepage screenshot |
| validateEmailMx | false | DNS MX lookup (never SMTP) |
| bypassProtection | false | When a site is behind a beatable Cloudflare challenge, retry with a stealth browser over residential proxy. Off by default for speed/cost; interactive CAPTCHAs are never attempted |
| respectRobotsTxt | true | Honor robots.txt |
| proxyConfiguration | Datacenter | Proxy settings; switch to Residential only for sites that block datacenter IPs |
| scoringWeights | defaults | Override the lead-score weights |

Example output

{
  "input_url": "https://example.com",
  "final_url": "https://example.com/",
  "domain": "example.com",
  "status": "ok",
  "crawl": {
    "pages_crawled": 7, "fetch_method": "http", "block_vendor": null,
    "failure_reason": null, "errors": []
  },
  "company": { "name": "Example Inc", "description": "…", "logo_url": "…" },
  "emails": [
    { "value": "hello@example.com", "type": "role", "obfuscated": false,
      "off_domain": false, "source_page": "https://example.com/contact" }
  ],
  "emails_off_domain": [
    { "value": "support@somepartner.com", "type": "role", "off_domain": true,
      "source_page": "https://example.com/integrations" }
  ],
  "phones": [{ "raw": "+1 555 0100", "e164": "+15550100", "country": "US" }],
  "socials": {
    "linkedin_company": "…", "twitter": "…",
    "other_profiles": [{ "platform": "youtube", "url": "https://youtube.com/c/someCreator" }]
  },
  "tech": {
    "cms": "WordPress", "analytics": ["Google Analytics 4"],
    "chat": ["Intercom"], "runtime_signals": "resolved"
  },
  "ads": { "pixels": ["meta", "gtm"], "running_ads": { "checked": false } },
  "site_signals": { "https": true, "tech_freshness": "modern" },
  "lead_score": { "score": 78, "reasons": ["role-based email", "advertising pixels: meta, gtm", "live chat: Intercom"] }
}
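
Because keys are always present, records like the one above flatten easily into rows for a CRM or spreadsheet. The sketch below is one possible approach, not part of the actor: it keeps records whose status is ok or partial and whose score meets a threshold, then emits one CSV row per on-domain email. The function names, the `min_score` default, and the chosen columns are all illustrative assumptions.

```python
import csv
import io

# Trimmed copy of the example record; real records carry every field.
SAMPLE = {
    "domain": "example.com",
    "status": "ok",
    "emails": [{"value": "hello@example.com", "type": "role"}],
    "tech": {"cms": "WordPress"},
    "lead_score": {"score": 78},
}

COLUMNS = ["domain", "email", "email_type", "cms", "lead_score"]


def contact_rows(records, min_score=50):
    """Yield one flat row per email for usable records at or above min_score."""
    for record in records:
        if record.get("status") not in ("ok", "partial"):
            continue
        # Scalars may be null, so guard before comparing.
        score = (record.get("lead_score") or {}).get("score")
        if score is None or score < min_score:
            continue
        cms = (record.get("tech") or {}).get("cms") or ""
        for email in record.get("emails") or []:
            yield {
                "domain": record["domain"],
                "email": email["value"],
                "email_type": email["type"],
                "cms": cms,
                "lead_score": score,
            }


def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, `to_csv(contact_rows([SAMPLE]))` yields a header line followed by a single row for hello@example.com.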

Use it as an MCP server for AI agents

This actor is also a Model Context Protocol (MCP) server, so an AI client — Claude Desktop, Cursor, an agent framework, or your own app — can call its scraping tools directly and enrich a company from a URL in real time.

Route 1 — Apify's hosted MCP server (no setup)

Every Apify Actor is callable through Apify's hosted MCP endpoint. Point your MCP client at it, authenticate with your own Apify API token, and scope it to this actor. Nothing to deploy.

Route 2 — this actor's dedicated MCP endpoint (Standby)

In Standby mode the actor serves MCP over Streamable HTTP at a stable URL:

https://USERNAME--website-intelligence-scraper.apify.actor/mcp

Configure your MCP client with that URL and an Authorization: Bearer <APIFY_TOKEN> header. The Apify platform validates the token for you.

MCP tools

| Tool | Description |
| --- | --- |
| scrape_website(url, options?) | Full pipeline on one domain; returns the complete record. Synchronous and fast. |
| extract_contacts(url) | Only emails, phones, WhatsApp, socials, and contact form. |
| check_tech_and_ads(url) | Only tech stack, chatbot, and ad pixels. |
| scrape_websites(urls[], options?) | Asynchronous batch: starts a run and returns runId + datasetId. |

All tools return structured JSON and never throw across the transport.
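
In practice an MCP client library handles the wire protocol for you, but it can help to see roughly what a tool call looks like. The sketch below builds (without sending) a JSON-RPC `tools/call` message as defined by the MCP specification, addressed to the Standby endpoint. It is illustrative only: a real session begins with an `initialize` handshake that is omitted here, and the `url` argument name is taken from the tool signatures listed above.

```python
import json
import urllib.request


def build_tool_call(endpoint, token, tool, arguments, request_id=1):
    """Build a JSON-RPC 2.0 'tools/call' POST for an MCP Streamable HTTP endpoint."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with plain JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# USERNAME is the placeholder from the Standby URL; substitute the real value.
ENDPOINT = "https://USERNAME--website-intelligence-scraper.apify.actor/mcp"
request = build_tool_call(
    ENDPOINT, "<APIFY_TOKEN>", "scrape_website", {"url": "https://example.com"}
)
```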

Pricing

Billing is Pay-Per-Event — you only pay for results that carry real data:

| Event | Price | When it's charged |
| --- | --- | --- |
| Domain result | $0.04 | Once per domain, only when the record is populated (≥1 contact, or ≥1 detected tech/ad/chat signal). Parked, blocked, or empty domains are never charged. |
| Bot-protection bypass | +$0.06 | Only when bypassProtection is enabled and a stealth browser over residential proxy actually ran to clear a Cloudflare-style challenge. Covers the residential bandwidth and extra renders. Not charged when no bypass was needed. |
| Live ad check | +$0.02 | Only when checkLiveAds is enabled and the check runs. |
| MCP tool call | $0.04 | Once per MCP tool invocation in Standby mode. |

The actor defaults to datacenter proxy to keep costs low; the bypassProtection option upgrades to a stealth browser over residential proxy for sites behind bot protection, and is the only path that triggers the bypass surcharge above. MCP Standby mode adds standby compute (~$0.40 per GB-hour while awake; it idles down when not in use).
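
To budget a run, you can add up the per-event prices from the table above. The helper below is a rough estimator under stated assumptions: the function and parameter names are made up for illustration, you supply your own guesses for how many domains will be populated or need a bypass, and Standby compute and any other platform usage are not included.

```python
from decimal import Decimal

# Per-event prices copied from the pricing table above (USD).
PRICES = {
    "domain_result": Decimal("0.04"),
    "bypass": Decimal("0.06"),
    "live_ad_check": Decimal("0.02"),
    "mcp_tool_call": Decimal("0.04"),
}


def estimate_cost(populated_domains, bypassed_domains=0, live_ad_checks=0, mcp_tool_calls=0):
    """Estimate event charges in USD; excludes Standby compute."""
    return (
        PRICES["domain_result"] * populated_domains
        + PRICES["bypass"] * bypassed_domains
        + PRICES["live_ad_check"] * live_ad_checks
        + PRICES["mcp_tool_call"] * mcp_tool_calls
    )


# 1,000 submitted domains, of which 800 return data and 50 needed a bypass.
print(estimate_cost(populated_domains=800, bypassed_domains=50))  # 35.00
```

Using `Decimal` rather than floats keeps the cents exact.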

Scraping publicly available data is generally legal in most jurisdictions. This actor only collects public data that any visitor can see, honors the respectRobotsTxt toggle, never performs SMTP email verification, and never crawls off the target domain. You are responsible for how you use the data, including compliance with GDPR, CCPA, and each site's terms where applicable.

Frequently asked questions

How do I scrape emails from a list of websites?

Paste your URLs into the Websites field and run the actor. Each domain returns an emails array with addresses found in mailto: links, visible text, JSON-LD, and embedded script data, including de-obfuscated Cloudflare and HTML-entity emails. Enable validateEmailMx to confirm each email domain has valid MX records (DNS only — no SMTP).
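
For background on what de-obfuscation involves, here is a short sketch of the two schemes mentioned. Cloudflare's email protection stores the address as a hex string (commonly in a `data-cfemail` attribute) whose first byte is an XOR key for the remaining bytes; HTML-entity obfuscation is undone with a standard unescape. This illustrates the widely documented techniques and is not necessarily how the actor implements them. The `encode_cfemail` helper exists only to produce test data.

```python
import html


def decode_cfemail(encoded: str) -> str:
    """Decode a Cloudflare-obfuscated email: first byte is the XOR key."""
    data = bytes.fromhex(encoded)
    key = data[0]
    return bytes(b ^ key for b in data[1:]).decode("utf-8")


def encode_cfemail(address: str, key: int = 0x5A) -> str:
    """Inverse of decode_cfemail, used here only to build example input."""
    raw = address.encode("utf-8")
    return bytes([key] + [b ^ key for b in raw]).hex()


def decode_entities(text: str) -> str:
    """Turn entity-obfuscated text such as 'hello&#64;example&#46;com' back into plain text."""
    return html.unescape(text)


print(decode_cfemail(encode_cfemail("hello@example.com")))
print(decode_entities("hello&#64;example&#46;com"))
```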

Does it separate personal emails from generic inboxes?

Yes. Every email is classified as role (shared inboxes like info@, sales@, support@, noreply@, and system mailboxes) or personal (an individual's address). This lets you target real people and skip generic shared inboxes, or do the reverse, depending on your outreach.

Does it detect Facebook and Google ad pixels?

Yes. It detects ad pixels for Meta, Google Ads, Google Tag Manager, TikTok, LinkedIn, Pinterest, Reddit, Bing, and others — including pixels injected at runtime through Google Tag Manager, which static-only scrapers miss. Enable checkLiveAds for a best-effort check of whether the advertiser has active ads.

Can it tell which CMS or technology a website uses?

Yes. The tech block reports the CMS (WordPress, Shopify, Wix, Squarespace, Webflow, and more), ecommerce platform, JavaScript frameworks, analytics, chat, marketing automation, CDN, and server, using a data-driven fingerprint engine.

How does the lead score work?

Each domain gets a 0–100 lead_score computed from weighted signals — reachable contacts, ad pixels, marketing automation, chat, a modern tech stack, and social presence — with a reasons list explaining the score. You can override the weights with scoringWeights.
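
To make the idea of a weighted score concrete, here is a toy model. It is purely hypothetical: the signal names, the weight values, and the shape of the override dictionary are invented for illustration and do not reflect the actor's real weights or the actual `scoringWeights` format.

```python
# Hypothetical weights, chosen only so that all signals together sum to 100.
DEFAULT_WEIGHTS = {
    "reachable_contacts": 30,
    "ad_pixels": 20,
    "marketing_automation": 15,
    "chat": 10,
    "modern_stack": 15,
    "social_presence": 10,
}


def lead_score(signals, weights=None):
    """Sum the weights of the signals that are present, clamped to 0-100."""
    merged = {**DEFAULT_WEIGHTS, **(weights or {})}
    score = 0
    reasons = []
    for name, weight in merged.items():
        if signals.get(name):
            score += weight
            reasons.append(name)
    return {"score": max(0, min(100, score)), "reasons": reasons}


signals = {"reachable_contacts": True, "ad_pixels": True, "chat": True}
print(lead_score(signals))
# {'score': 60, 'reasons': ['reachable_contacts', 'ad_pixels', 'chat']}
```

Passing an override such as `lead_score(signals, {"chat": 50})` shifts emphasis toward the signal you care about, which is the general idea behind adjusting weights.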

What happens when a website is down or blocks the scraper?

Each site is isolated. A failure never kills the run: the actor records the problem in that record's crawl.errors and still returns a partial record with the stable schema. Every record carries a top-level status (ok, partial, blocked, or failed); when a site can't be fetched, crawl.failure_reason classifies why (DNS, TLS, connection, timeout) and crawl.block_vendor names the bot/WAF vendor if one blocked the crawl — so you can tell a dead domain from one that's merely protected. Blocked, parked, or empty domains are not billed.
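
One way to act on these fields after a run is to sort domains into follow-up buckets. The sketch below is an illustrative policy, not something the actor does for you: the bucket names and the decision to retry timeouts and connection errors are assumptions, and it presumes failure_reason values are the lowercase strings dns, tls, connection, and timeout.

```python
def triage(records):
    """Group domains by what to do next, based on status and failure_reason."""
    buckets = {"usable": [], "retry_with_bypass": [], "retry_later": [], "dead": []}
    for record in records:
        status = record.get("status")
        reason = (record.get("crawl") or {}).get("failure_reason")
        if status in ("ok", "partial"):
            bucket = "usable"
        elif status == "blocked":
            # Protected rather than dead: a candidate for bypassProtection.
            bucket = "retry_with_bypass"
        elif reason in ("timeout", "connection"):
            # Possibly transient; worth another attempt later.
            bucket = "retry_later"
        else:
            # dns, tls, or unclassified failures.
            bucket = "dead"
        buckets[bucket].append(record["domain"])
    return buckets


records = [
    {"domain": "a.com", "status": "ok", "crawl": {"failure_reason": None}},
    {"domain": "b.com", "status": "blocked", "crawl": {"failure_reason": None}},
    {"domain": "c.com", "status": "failed", "crawl": {"failure_reason": "timeout"}},
    {"domain": "d.com", "status": "failed", "crawl": {"failure_reason": "dns"}},
]
print(triage(records))
```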

Can AI agents use this scraper?

Yes. The actor runs as an MCP (Model Context Protocol) server, so AI agents and clients like Claude Desktop and Cursor can call its tools to enrich a company from a URL live, without building a data pipeline.

How many websites can I scrape at once?

There is no hard limit — provide one URL or thousands. The actor deduplicates by domain, crawls with configurable concurrency, and pushes one record per domain to the dataset as it goes.

Use this actor to power lead lists, CRM enrichment, competitive analysis, market research, and ad-targeting audits. Export results as JSON, CSV, or Excel, or integrate via the Apify API, webhooks, and scheduling. For live, on-demand enrichment inside an AI agent, connect through the MCP server described above.