Pricing

from $10.00 / 1,000 domain results

Website Contact & Tech Stack Scraper

Scrape any website for emails, phone numbers, social profiles, tech stack, ad pixels, chatbot, and a lead score. One clean record per domain. Bulk lead generation + MCP server for AI agents.

Rating

0.0

(0)

Developer

Mayowa Ogedengbe

Last modified

11 days ago

What does the Website Contact & Tech Stack Scraper do?

This website scraper crawls each domain you give it, prioritizes the pages that carry contact and company information (contact, about, team, footer), and extracts a single rich JSON record per domain. It tries cheap HTTP requests first and only launches a real browser when a page needs JavaScript, so it stays fast and cost-effective at scale.

In one run you can answer questions like:

Who can I contact at this company, and how? — emails (including obfuscated ones), phone numbers in E.164 format, WhatsApp links, social profiles, and the best contact-form page.
What technology does this site run on? — CMS, ecommerce platform, JavaScript frameworks, analytics, CDN, and server.
Are they spending on ads? — Meta Pixel, Google Ads, TikTok, LinkedIn, Pinterest, Reddit and other pixels, including ones injected at runtime through Google Tag Manager that static scrapers miss.
Do they use a chatbot, and which vendor? — Intercom, Drift, Zendesk, Tidio, Crisp, HubSpot Chat, and more.
Is this a good lead? — a 0–100 lead score with human-readable reasons.

Who is this website scraper for?

Each type of buyer gets one high-value output:

Sales & lead generation teams (SDRs): stop wasting hours finding and qualifying contacts. Get emails, phone numbers, and a lead score per domain, with role-vs-personal email classification so you can target the right inbox.
Ad & marketing agencies: find businesses that are (or are not) already running ads. The ad-pixel and live-ad signals tell you who has budget and who to pitch.
Web design & development agencies: build a pipeline of outdated sites to pitch for a redesign using tech_freshness, legacy-stack detection, and missing-SSL flags.
Chatbot & AI support vendors: find which businesses across a market lack a chatbot — or already run a competitor's — using chatbot presence and vendor detection.
AI agent developers: enrich any company from a URL live, with no pipeline to build, by calling the actor as a Model Context Protocol (MCP) server.

What data can this website scraper extract?

Every domain produces one JSON record with a stable schema (missing scalars are null, missing lists are [], keys are never omitted):

Field group	What you get
Company	Name, description, logo URL, industry guess
Emails	Addresses from `mailto:`, visible text, JSON-LD, and `<script>` data; decodes Cloudflare and entity obfuscation; classified role vs personal; placeholder/example addresses filtered; scoped to the target's own domain (third-party addresses found on the page go to a separate `emails_off_domain` list); optional DNS MX validation
Phones	Valid numbers in E.164 format with ISO country code
WhatsApp	Click-to-chat links and numbers
Socials	LinkedIn (company + people), Twitter/X, Instagram, Facebook, YouTube, TikTok — normalized, tracking params stripped, attributed to the target (profiles belonging to others, e.g. a creator featured on the site, go to `socials.other_profiles`)
Address	Validated postal address from JSON-LD `PostalAddress` or footer heuristics — marketing copy and bare fragments are rejected, with sub-fields (street/city/region/postal/country) split out when available
Contact form	Whether a real contact form exists and the best contact-form page URL
Tech stack	CMS, ecommerce platform, primary framework, analytics, chat, marketing automation, CDN, server; `runtime_signals` flags whether GTM-injected tags were resolved
Chatbot	Whether a chatbot is present, the vendor, and how it was detected
Ad pixels	Meta, Google Ads, GTM, LinkedIn, TikTok, X, Pinterest, Reddit, Bing, and more — including runtime/GTM-injected pixels
Live ads	Optional best-effort check of Meta Ad Library and Google Ads Transparency
Site signals	HTTPS, SSL validity, mobile-friendly hint, copyright year, last content date, `tech_freshness` (modern / dated / legacy / unknown)
Lead score	0–100 score with the reasons that contributed
Status & diagnostics	A top-level `status` (`ok` / `partial` / `blocked` / `failed`), plus `crawl` metadata: pages crawled, fetch method, detected bot/WAF vendor, and a classified `failure_reason` (dns / tls / connection / timeout) when a site can't be fetched
Screenshot	Optional homepage screenshot URL

How to use the Website Contact & Tech Stack Scraper

1. Add your websites. Paste plain URLs into the Websites field (one per line) and/or add them under Start URLs. The scraper deduplicates by registrable domain automatically.
2. (Optional) Tune the crawl. Set maxPagesPerSite, maxDepth, and renderJavaScript (auto is recommended). Enable validateEmailMx, checkLiveAds, or captureScreenshot if you need them.
3. Run the actor. Results are pushed to the dataset, one record per domain. Export them as JSON, CSV, or Excel, or pull them via the API.

Example input

{
  "websites": ["https://example.com", "https://acme.co"],
  "maxPagesPerSite": 3,
  "renderJavaScript": "auto",
  "prioritizeContactPages": true,
  "validateEmailMx": false,
  "checkLiveAds": false,
  "captureScreenshot": false
}
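The same input can also be submitted programmatically. The sketch below builds (but does not send) a request to Apify's synchronous "run and get dataset items" API endpoint using only the Python standard library. The actor ID `USERNAME~website-intelligence-scraper` is an assumption derived from the placeholder Standby URL in this README, so replace it with the real ID. `build_run_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumption: the real "username~actor-name" ID is not stated in this README.
ACTOR_ID = "USERNAME~website-intelligence-scraper"
API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_input: dict, token: str,
                      actor_id: str = ACTOR_ID) -> urllib.request.Request:
    """Build a POST request that runs the actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


actor_input = {
    "websites": ["https://example.com", "https://acme.co"],
    "maxPagesPerSite": 3,
    "renderJavaScript": "auto",
}
request = build_run_request(actor_input, token="<APIFY_TOKEN>")

# To actually run it (network call, billed to your account):
# with urllib.request.urlopen(request) as response:
#     records = json.load(response)
```

The synchronous endpoint is convenient for a handful of domains. For large lists, starting an asynchronous run and reading the dataset afterwards is likely the safer choice.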

Key input options

Field	Default	What it does
`startUrls` / `websites`	—	The websites to scrape (provide at least one)
`maxPagesPerSite`	`3`	Maximum pages crawled per domain. The default of 3 covers the homepage plus the 2 highest-ranked contact pages, where most contact data lives. Raise it for broader coverage.
`maxDepth`	`2`	Link depth from the start URL
`prioritizeContactPages`	`true`	Crawl contact / about / team pages first
`renderJavaScript`	`auto`	`auto` renders only JS-heavy pages; `always` / `never` force it
`checkLiveAds`	`false`	Best-effort Meta / Google live-ad check
`captureScreenshot`	`false`	Save a homepage screenshot
`validateEmailMx`	`false`	DNS MX lookup (never SMTP)
`bypassProtection`	`false`	When a site is behind a beatable Cloudflare challenge, retry with a stealth browser over residential proxy. Off by default for speed/cost; interactive CAPTCHAs are never attempted
`respectRobotsTxt`	`true`	Honor robots.txt
`proxyConfiguration`	Datacenter	Proxy settings; switch to Residential only for sites that block datacenter IPs
`scoringWeights`	defaults	Override the lead-score weights
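As a rough illustration of the table above, the following sketch merges user input with the documented defaults and checks the two constraints the table states (at least one URL, and a valid `renderJavaScript` mode). `prepare_input` is a hypothetical client-side helper, not part of the actor. `proxyConfiguration` and `scoringWeights` are left out because the table does not give concrete default values for them.

```python
# Defaults copied from the "Key input options" table.
DEFAULTS = {
    "maxPagesPerSite": 3,
    "maxDepth": 2,
    "prioritizeContactPages": True,
    "renderJavaScript": "auto",
    "checkLiveAds": False,
    "captureScreenshot": False,
    "validateEmailMx": False,
    "bypassProtection": False,
    "respectRobotsTxt": True,
}
RENDER_MODES = {"auto", "always", "never"}


def prepare_input(user_input: dict) -> dict:
    """Fill in documented defaults and reject obviously invalid input."""
    merged = {**DEFAULTS, **user_input}
    if not merged.get("websites") and not merged.get("startUrls"):
        raise ValueError("provide at least one URL in 'websites' or 'startUrls'")
    if merged["renderJavaScript"] not in RENDER_MODES:
        raise ValueError("renderJavaScript must be 'auto', 'always', or 'never'")
    return merged


prepared = prepare_input({"websites": ["https://example.com"], "maxPagesPerSite": 5})
```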

Example output

{
  "input_url": "https://example.com",
  "final_url": "https://example.com/",
  "domain": "example.com",
  "status": "ok",
  "crawl": {
    "pages_crawled": 3, "fetch_method": "http", "block_vendor": null,
    "failure_reason": null, "errors": []
  },
  "company": { "name": "Example Inc", "description": "…", "logo_url": "…" },
  "emails": [
    { "value": "hello@example.com", "type": "role", "obfuscated": false,
      "off_domain": false, "source_page": "https://example.com/contact" }
  ],
  "emails_off_domain": [
    { "value": "support@somepartner.com", "type": "role", "off_domain": true,
      "source_page": "https://example.com/integrations" }
  ],
  "phones": [{ "raw": "+1 202 555 0100", "e164": "+12025550100", "country": "US" }],
  "socials": {
    "linkedin_company": "…", "twitter": "…",
    "other_profiles": [{ "platform": "youtube", "url": "https://youtube.com/c/someCreator" }]
  },
  "tech": {
    "cms": "WordPress", "analytics": ["Google Analytics 4"],
    "chat": ["Intercom"], "runtime_signals": "resolved"
  },
  "ads": { "pixels": ["meta", "gtm"], "running_ads": { "checked": false } },
  "site_signals": { "https": true, "tech_freshness": "modern" },
  "lead_score": { "score": 78, "reasons": ["role-based email", "advertising pixels: meta, gtm", "live chat: Intercom"] }
}
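Because keys are never omitted and missing lists are `[]`, a record can be flattened without defensive lookups. The sketch below turns records shaped like the example above into CSV rows. It assumes the stable-schema guarantee also holds for nested keys such as `tech.cms` and `ads.pixels`, which the README implies but does not state outright. `to_row`, `to_csv`, and the column choice are illustrative.

```python
import csv
import io

FIELDS = ["domain", "status", "first_email", "first_phone", "cms", "pixels", "lead_score"]


def to_row(record: dict) -> dict:
    """Flatten one domain record into a single CSV-friendly row."""
    emails, phones = record["emails"], record["phones"]
    return {
        "domain": record["domain"],
        "status": record["status"],
        "first_email": emails[0]["value"] if emails else "",
        "first_phone": phones[0]["e164"] if phones else "",
        "cms": record["tech"]["cms"] or "",  # null scalar becomes an empty cell
        "pixels": ";".join(record["ads"]["pixels"]),
        "lead_score": record["lead_score"]["score"],
    }


def to_csv(records: list) -> str:
    """Serialize records to CSV text with a fixed header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(to_row(r) for r in records)
    return buffer.getvalue()


sample = {
    "domain": "example.com",
    "status": "ok",
    "emails": [{"value": "hello@example.com", "type": "role"}],
    "phones": [],  # missing lists arrive as [], not as absent keys
    "tech": {"cms": "WordPress"},
    "ads": {"pixels": ["meta", "gtm"]},
    "lead_score": {"score": 78, "reasons": ["role-based email"]},
}
csv_text = to_csv([sample])
```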

Use it as an MCP server for AI agents

This actor is also a Model Context Protocol (MCP) server, so an AI client — Claude Desktop, Cursor, an agent framework, or your own app — can call its scraping tools directly and enrich a company from a URL in real time.

Route 1 — Apify's hosted MCP server (no setup)

Every Apify Actor is callable through Apify's hosted MCP endpoint. Point your MCP client at it, authenticate with your own Apify API token, and scope it to this actor. Nothing to deploy.

Route 2 — this actor's dedicated MCP endpoint (Standby)

In Standby mode the actor serves MCP over Streamable HTTP at a stable URL:

https://USERNAME--website-intelligence-scraper.apify.actor/mcp

Configure your MCP client with that URL and the following header:

Authorization: Bearer <APIFY_TOKEN>

The Apify platform validates the token for you.

MCP tools

Tool	Description
`scrape_website(url, options?)`	Full pipeline on one domain; returns the complete record. Synchronous and fast.
`extract_contacts(url)`	Only emails, phones, WhatsApp, socials, and contact form.
`check_tech_and_ads(url)`	Only tech stack, chatbot, and ad pixels.
`scrape_websites(urls[], options?)`	Asynchronous batch — starts a run and returns `runId` + `datasetId`.

All tools return structured JSON and never throw across the transport.
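For clients that do not ship an MCP library, the sketch below shows roughly what a tool invocation looks like on the wire: a JSON-RPC 2.0 `tools/call` message posted to the Standby URL. It only builds the request. A real MCP session starts with an `initialize` exchange that a proper MCP client handles for you, so treat this as an illustration of the message shape rather than a complete client. The URL keeps the `USERNAME` placeholder from this README, and `build_tool_call` is a hypothetical helper.

```python
import json
import urllib.request

# Placeholder URL from this README; replace USERNAME with the actor owner's name.
MCP_URL = "https://USERNAME--website-intelligence-scraper.apify.actor/mcp"


def build_tool_call(tool: str, arguments: dict, token: str,
                    request_id: int = 1) -> urllib.request.Request:
    """Build a JSON-RPC 2.0 'tools/call' request for the MCP endpoint."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with plain JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
            "Authorization": f"Bearer {token}",
        },
    )


call = build_tool_call("extract_contacts", {"url": "https://example.com"},
                       token="<APIFY_TOKEN>")
```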

Pricing

Billing is Pay-Per-Event — you only pay for results that carry real data:

Event	Price	When it's charged
Domain result	$0.04	Once per domain, only when the record is populated (≥1 contact, or ≥1 detected tech/ad/chat signal). Parked, blocked, or empty domains are never charged.
Bot-protection bypass	+$0.06	Only when `bypassProtection` is enabled and a stealth browser over residential proxy actually ran to clear a Cloudflare-style challenge. Covers the residential bandwidth and extra renders. Not charged when no bypass was needed.
Live ad check	+$0.02	Only when `checkLiveAds` is enabled and the check runs.
MCP tool call	$0.04	Once per MCP tool invocation in Standby mode.

The actor defaults to datacenter proxy to keep costs low; the bypassProtection option upgrades to a stealth browser over residential proxy for sites behind bot protection, and is the only path that triggers the bypass surcharge above. MCP Standby mode adds standby compute (~$0.40 per GB-hour while awake; it idles down when not in use).
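The event prices above make a quick budget estimate straightforward. The sketch below multiplies event counts by the listed prices. It deliberately excludes platform compute and Standby compute, and it does not attempt to model how MCP tool-call charges interact with domain-result charges, since the README does not spell that out. `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Per-event prices from the pricing table.
PRICES = {
    "domain_result": Decimal("0.04"),
    "bypass": Decimal("0.06"),
    "live_ad_check": Decimal("0.02"),
    "mcp_tool_call": Decimal("0.04"),
}


def estimate_cost(populated_domains: int, bypassed_domains: int = 0,
                  live_ad_checks: int = 0, mcp_tool_calls: int = 0) -> Decimal:
    """Sum the pay-per-event charges for a run, in US dollars."""
    return (
        PRICES["domain_result"] * populated_domains
        + PRICES["bypass"] * bypassed_domains
        + PRICES["live_ad_check"] * live_ad_checks
        + PRICES["mcp_tool_call"] * mcp_tool_calls
    )


# Example: 1,000 domains submitted, 800 come back populated,
# 50 needed the bypass, and the live-ad check ran on the 800 populated ones.
total = estimate_cost(populated_domains=800, bypassed_domains=50, live_ad_checks=800)
```

Under those assumed counts the event charges come to $32.00 + $3.00 + $16.00 = $51.00.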

Is web scraping legal?

Scraping publicly available data is generally legal in most jurisdictions. This actor only collects public data that any visitor can see, honors the respectRobotsTxt toggle, never performs SMTP email verification, and never crawls off the target domain. You are responsible for how you use the data, including compliance with GDPR, CCPA, and each site's terms where applicable.

Frequently asked questions

How do I scrape emails from a list of websites?

Paste your URLs into the Websites field and run the actor. Each domain returns an emails array with addresses found in mailto: links, visible text, JSON-LD, and embedded script data, including de-obfuscated Cloudflare and HTML-entity emails. Enable validateEmailMx to confirm each email domain has valid MX records (DNS only — no SMTP).
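To make "de-obfuscated" concrete, here is a minimal sketch of the two decodings mentioned above. Cloudflare's email protection stores the address as a hex string whose first byte is an XOR key for the remaining bytes. HTML-entity obfuscation is undone with the standard library's `html.unescape`. This shows the general technique only and is not the actor's own code.

```python
import html


def decode_cfemail(encoded: str) -> str:
    """Decode a Cloudflare 'data-cfemail' hex string (first byte is the XOR key)."""
    data = bytes.fromhex(encoded)
    key, payload = data[0], data[1:]
    return bytes(b ^ key for b in payload).decode("utf-8")


def decode_entities(text: str) -> str:
    """Decode HTML entities such as '&#64;' back to plain characters."""
    return html.unescape(text)


print(decode_cfemail("016041632f62"))               # a@b.c
print(decode_entities("info&#64;example&#46;com"))  # info@example.com
```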

Does it separate personal emails from generic inboxes?

Yes. Every email is classified as role (shared inboxes like info@, sales@, support@, noreply@, and system mailboxes) or personal (an individual's address). This lets you target real people and skip generic inboxes, or do the reverse, depending on your outreach.
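One simple way to approximate this classification is to compare the local part of the address against a list of known shared-inbox names. The list below is a small hypothetical sample, and the actor's real rules are not documented here, so treat this as a sketch of the idea only.

```python
# Hypothetical sample of shared-inbox names; the actor's actual list is not published here.
ROLE_LOCAL_PARTS = {
    "info", "sales", "support", "noreply", "no-reply",
    "admin", "contact", "hello", "billing", "postmaster",
}


def classify_email(address: str) -> str:
    """Return 'role' for shared inboxes and 'personal' for everything else."""
    local = address.split("@", 1)[0].lower()
    local = local.split("+", 1)[0]  # ignore plus-addressing tags like sales+eu@
    return "role" if local in ROLE_LOCAL_PARTS else "personal"


print(classify_email("Sales@example.com"))     # role
print(classify_email("jane.doe@example.com"))  # personal
```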

Does it detect Facebook and Google ad pixels?

Yes. It detects ad pixels for Meta, Google Ads, Google Tag Manager, TikTok, LinkedIn, Pinterest, Reddit, Bing, and others — including pixels injected at runtime through Google Tag Manager, which static-only scrapers miss. Enable checkLiveAds for a best-effort check of whether the advertiser has active ads.
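For contrast, this is roughly what a static-only check looks like: it searches the raw HTML for well-known script hosts. The signature table is a small illustrative subset chosen for this example. A check like this cannot see tags that Google Tag Manager injects after page load, which is the gap runtime detection is meant to close.

```python
# Illustrative subset of script hosts commonly associated with each pixel.
PIXEL_SIGNATURES = {
    "meta": "connect.facebook.net",
    "gtm": "googletagmanager.com/gtm.js",
    "linkedin": "snap.licdn.com",
    "tiktok": "analytics.tiktok.com",
    "bing": "bat.bing.com",
}


def detect_static_pixels(html_text: str) -> list:
    """Return pixel names whose signature appears in the raw HTML."""
    lowered = html_text.lower()
    return [name for name, marker in PIXEL_SIGNATURES.items() if marker in lowered]


page = """
<script src="https://connect.facebook.net/en_US/fbevents.js"></script>
<script src="https://www.googletagmanager.com/gtm.js?id=GTM-XXXX"></script>
"""
found = detect_static_pixels(page)
```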

Can it tell which CMS or technology a website uses?

Yes. The tech block reports the CMS (WordPress, Shopify, Wix, Squarespace, Webflow, and more), ecommerce platform, JavaScript frameworks, analytics, chat, marketing automation, CDN, and server, using a data-driven fingerprint engine.

How does the lead score work?

Each domain gets a 0–100 lead_score computed from weighted signals — reachable contacts, ad pixels, marketing automation, chat, a modern tech stack, and social presence — with a reasons list explaining the score. You can override the weights with scoringWeights.
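A weighted score of this kind can be sketched as follows. The signal names and default weights are invented for illustration, since the README does not publish the real ones. Only the overall shape (weighted signals, a 0 to 100 clamp, a reasons list, and overridable weights) follows the description above.

```python
# Hypothetical weights; the actor's real defaults are not listed in this README.
DEFAULT_WEIGHTS = {
    "has_email": 30,
    "has_phone": 15,
    "ad_pixels": 20,
    "marketing_automation": 10,
    "chat": 10,
    "modern_stack": 10,
    "socials": 5,
}


def lead_score(signals: dict, weights: dict = None) -> dict:
    """Add up the weights of all truthy signals and clamp the total to 0..100."""
    merged = {**DEFAULT_WEIGHTS, **(weights or {})}
    score, reasons = 0, []
    for name, weight in merged.items():
        if signals.get(name):
            score += weight
            reasons.append(name)
    return {"score": max(0, min(100, score)), "reasons": reasons}


signals = {"has_email": True, "ad_pixels": True, "chat": True}
default_result = lead_score(signals)
chat_heavy = lead_score(signals, weights={"chat": 40})  # emphasize chat presence
```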

What happens when a website is down or blocks the scraper?

Each site is isolated. A failure never kills the run: the actor records the problem in that record's crawl.errors and still returns a partial record with the stable schema. Every record carries a top-level status (ok, partial, blocked, or failed); when a site can't be fetched, crawl.failure_reason classifies why (DNS, TLS, connection, timeout) and crawl.block_vendor names the bot/WAF vendor if one blocked the crawl — so you can tell a dead domain from one that's merely protected. Blocked, parked, or empty domains are not billed.
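Downstream code can use these fields to decide what to do with each record. The sketch below separates usable records, protected sites, dead domains, and transient failures. The returned action labels are hypothetical and only meant to show how `status` and `crawl.failure_reason` combine.

```python
def triage(record: dict) -> str:
    """Map a record's status and failure reason to a follow-up action."""
    status = record["status"]
    crawl = record["crawl"]
    if status in ("ok", "partial"):
        return "use"
    if status == "blocked":
        # crawl["block_vendor"] names the bot/WAF vendor when one was detected.
        return "retry_with_bypass"
    if crawl["failure_reason"] == "dns":
        return "dead_domain"
    return "retry_later"  # tls, connection, or timeout


dead = {"status": "failed", "crawl": {"failure_reason": "dns", "block_vendor": None}}
protected = {"status": "blocked", "crawl": {"failure_reason": None, "block_vendor": "cloudflare"}}
slow = {"status": "failed", "crawl": {"failure_reason": "timeout", "block_vendor": None}}
```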

Can AI agents use this scraper?

Yes. The actor runs as an MCP (Model Context Protocol) server, so AI agents and clients like Claude Desktop and Cursor can call its tools to enrich a company from a URL live, without building a data pipeline.

How many websites can I scrape at once?

There is no hard limit — provide one URL or thousands. The actor deduplicates by domain, crawls with configurable concurrency, and pushes one record per domain to the dataset as it goes.
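If you want to pre-clean a list before submitting it, a simplified client-side deduplication might look like the sketch below. It groups URLs by hostname with a leading `www.` removed, which is only an approximation. True registrable-domain grouping (for example, treating `blog.example.co.uk` and `example.co.uk` as one) requires the Public Suffix List, which this sketch does not use.

```python
from urllib.parse import urlsplit


def host_key(url: str) -> str:
    """Return a lowercase hostname without a leading 'www.'."""
    if "://" not in url:
        url = "https://" + url  # allow bare domains like "acme.co"
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def dedupe_by_host(urls: list) -> list:
    """Keep the first URL seen for each hostname, preserving input order."""
    seen = {}
    for url in urls:
        seen.setdefault(host_key(url), url)
    return list(seen.values())


unique = dedupe_by_host([
    "https://example.com",
    "http://www.example.com/contact",
    "acme.co",
])
```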

Use this actor to power lead lists, CRM enrichment, competitive analysis, market research, and ad-targeting audits. Export results as JSON, CSV, or Excel, or integrate via the Apify API, webhooks, and scheduling. For live, on-demand enrichment inside an AI agent, connect through the MCP server described above.
