Website Contact & Tech Stack Scraper
Pricing
from $40.00 / 1,000 domain results
Scrape any website for emails, phone numbers, social profiles, tech stack, ad pixels, chatbot, and a lead score. One clean record per domain. Bulk lead generation + MCP server for AI agents.
Developer
Mayowa Ogedengbe
Maintained by Community
Last modified 2 days ago
Website Contact & Tech Stack Scraper extracts emails, phone numbers, social media profiles, technology stack, advertising pixels, chatbots, and a lead score from any list of websites — returning one clean, structured record per domain. It is a bulk website contact scraper and B2B lead generation tool built for sales teams, marketing and web design agencies, and AI agents that need to enrich a company from just its URL.
Give it one website or thousands. For every domain you get reachable contacts (emails, phones, WhatsApp, contact forms), the full tech stack (CMS, ecommerce, analytics, chat, marketing automation), which ad pixels and chatbots are actually running, business signals, and a 0–100 lead score with reasons. The output schema is stable and resale-ready: every field is always present, so you can drop the dataset straight into a CRM, spreadsheet, or enrichment pipeline.
What does the Website Contact & Tech Stack Scraper do?
This website scraper crawls each domain you give it, prioritizes the pages that carry contact and company information (contact, about, team, footer), and extracts a single rich JSON record per domain. It tries cheap HTTP requests first and only launches a real browser when a page needs JavaScript, so it stays fast and cost-effective at scale.
In one run you can answer questions like:
- Who can I contact at this company, and how? — emails (including obfuscated ones), phone numbers in E.164 format, WhatsApp links, social profiles, and the best contact-form page.
- What technology does this site run on? — CMS, ecommerce platform, JavaScript frameworks, analytics, CDN, and server.
- Are they spending on ads? — Meta Pixel, Google Ads, TikTok, LinkedIn, Pinterest, Reddit and other pixels, including ones injected at runtime through Google Tag Manager that static scrapers miss.
- Do they use a chatbot, and which vendor? — Intercom, Drift, Zendesk, Tidio, Crisp, HubSpot Chat, and more.
- Is this a good lead? — a 0–100 lead score with human-readable reasons.
Who is this website scraper for?
Each type of buyer gets one high-value output from this tool:
- Sales & lead generation teams (SDRs): stop wasting hours finding and qualifying contacts. Get emails, phone numbers, and a lead score per domain, with role-vs-personal email classification so you can target the right inbox.
- Ad & marketing agencies: find businesses that are (or are not) already running ads. The ad-pixel and live-ad signals tell you who has budget and who to pitch.
- Web design & development agencies: build a pipeline of outdated sites to pitch for a redesign using tech_freshness, legacy-stack detection, and missing-SSL flags.
- Chatbot & AI support vendors: find which businesses across a market lack a chatbot, or already run a competitor's, using chatbot presence and vendor detection.
- AI agent developers: enrich any company from a URL live, with no pipeline to build, by calling the actor as a Model Context Protocol (MCP) server.
What data can this website scraper extract?
Every domain produces one JSON record with a stable schema (missing scalars are
null, missing lists are [], keys are never omitted):
| Field group | What you get |
|---|---|
| Company | Name, description, logo URL, industry guess |
| Emails | Addresses from mailto:, visible text, JSON-LD, and <script> data; decodes Cloudflare and entity obfuscation; classified role vs personal; placeholder/example addresses filtered; scoped to the target's own domain (third-party addresses found on the page go to a separate emails_off_domain list); optional DNS MX validation |
| Phones | Valid numbers in E.164 format with ISO country code |
| WhatsApp | Click-to-chat links and numbers |
| Socials | LinkedIn (company + people), Twitter/X, Instagram, Facebook, YouTube, TikTok — normalized, tracking params stripped, attributed to the target (profiles belonging to others, e.g. a creator featured on the site, go to socials.other_profiles) |
| Address | Validated postal address from JSON-LD PostalAddress or footer heuristics — marketing copy and bare fragments are rejected, with sub-fields (street/city/region/postal/country) split out when available |
| Contact form | Whether a real contact form exists and the best contact-form page URL |
| Tech stack | CMS, ecommerce platform, primary framework, analytics, chat, marketing automation, CDN, server; runtime_signals flags whether GTM-injected tags were resolved |
| Chatbot | Whether a chatbot is present, the vendor, and how it was detected |
| Ad pixels | Meta, Google Ads, GTM, LinkedIn, TikTok, X, Pinterest, Reddit, Bing, and more — including runtime/GTM-injected pixels |
| Live ads | Optional best-effort check of Meta Ad Library and Google Ads Transparency |
| Site signals | HTTPS, SSL validity, mobile-friendly hint, copyright year, last content date, tech_freshness (modern / dated / legacy / unknown) |
| Lead score | 0–100 score with the reasons that contributed |
| Status & diagnostics | A top-level status (ok / partial / blocked / failed), plus crawl metadata: pages crawled, fetch method, detected bot/WAF vendor, and a classified failure_reason (dns / tls / connection / timeout) when a site can't be fetched |
| Screenshot | Optional homepage screenshot URL |
How to use the Website Contact & Tech Stack Scraper
- Add your websites. Paste plain URLs into the Websites field (one per line) and/or add them under Start URLs. The scraper deduplicates by registrable domain automatically.
- (Optional) Tune the crawl. Set maxPagesPerSite, maxDepth, and renderJavaScript (auto is recommended). Enable validateEmailMx, checkLiveAds, or captureScreenshot if you need them.
- Run the actor. Results are pushed to the dataset, one record per domain, which you can export to JSON, CSV, Excel, or pull via the API.
Example input
{
  "websites": ["https://example.com", "https://acme.co"],
  "maxPagesPerSite": 3,
  "renderJavaScript": "auto",
  "prioritizeContactPages": true,
  "validateEmailMx": false,
  "checkLiveAds": false,
  "captureScreenshot": false
}
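If the URL list comes from a file or a CRM export, the input document can be assembled programmatically. The following is a minimal sketch that uses only the field names shown in the example input. The helper `build_input` is hypothetical, and its hostname-based de-duplication is only an approximation of the registrable-domain de-duplication the actor performs itself (which needs a public-suffix list).

```python
import json
from urllib.parse import urlparse


def build_input(urls, max_pages_per_site=3):
    """Build an input document using the field names from the example input.

    Hypothetical helper: de-duplicates by hostname (minus a leading "www."),
    which only approximates registrable-domain de-duplication.
    """
    seen = set()
    websites = []
    for raw in urls:
        url = raw.strip()
        if not url:
            continue
        # Accept bare domains such as "acme.co" by assuming https.
        if "://" not in url:
            url = "https://" + url
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if not host or host in seen:
            continue
        seen.add(host)
        websites.append(url)
    return {
        "websites": websites,
        "maxPagesPerSite": max_pages_per_site,
        "renderJavaScript": "auto",
        "prioritizeContactPages": True,
        "validateEmailMx": False,
        "checkLiveAds": False,
        "captureScreenshot": False,
    }


if __name__ == "__main__":
    payload = build_input(["https://example.com", "www.example.com", "acme.co"])
    print(json.dumps(payload, indent=2))
```

The printed JSON can be pasted into the actor's input editor or sent through the Apify API.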
Key input options
| Field | Default | What it does |
|---|---|---|
| startUrls / websites | — | The websites to scrape (provide at least one) |
| maxPagesPerSite | 3 | Maximum pages crawled per domain. Default of 3 = homepage + 2 ranked contact pages (where leads live). Raise for broader coverage. |
| maxDepth | 2 | Link depth from the start URL |
| prioritizeContactPages | true | Crawl contact / about / team pages first |
| renderJavaScript | auto | auto renders only JS-heavy pages; always / never force it |
| checkLiveAds | false | Best-effort Meta / Google live-ad check |
| captureScreenshot | false | Save a homepage screenshot |
| validateEmailMx | false | DNS MX lookup (never SMTP) |
| bypassProtection | false | When a site is behind a beatable Cloudflare challenge, retry with a stealth browser over residential proxy. Off by default for speed/cost; interactive CAPTCHAs are never attempted |
| respectRobotsTxt | true | Honor robots.txt |
| proxyConfiguration | Datacenter | Proxy settings; switch to Residential only for sites that block datacenter IPs |
| scoringWeights | defaults | Override the lead-score weights |
Example output
{
  "input_url": "https://example.com",
  "final_url": "https://example.com/",
  "domain": "example.com",
  "status": "ok",
  "crawl": {
    "pages_crawled": 7,
    "fetch_method": "http",
    "block_vendor": null,
    "failure_reason": null,
    "errors": []
  },
  "company": { "name": "Example Inc", "description": "…", "logo_url": "…" },
  "emails": [
    { "value": "hello@example.com", "type": "role", "obfuscated": false, "off_domain": false, "source_page": "https://example.com/contact" }
  ],
  "emails_off_domain": [
    { "value": "support@somepartner.com", "type": "role", "off_domain": true, "source_page": "https://example.com/integrations" }
  ],
  "phones": [{ "raw": "+1 555 0100", "e164": "+15550100", "country": "US" }],
  "socials": {
    "linkedin_company": "…",
    "twitter": "…",
    "other_profiles": [{ "platform": "youtube", "url": "https://youtube.com/c/someCreator" }]
  },
  "tech": {
    "cms": "WordPress",
    "analytics": ["Google Analytics 4"],
    "chat": ["Intercom"],
    "runtime_signals": "resolved"
  },
  "ads": { "pixels": ["meta", "gtm"], "running_ads": { "checked": false } },
  "site_signals": { "https": true, "tech_freshness": "modern" },
  "lead_score": {
    "score": 78,
    "reasons": ["role-based email", "advertising pixels: meta, gtm", "live chat: Intercom"]
  }
}
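Because keys are never omitted, a record like the one above can be flattened into a spreadsheet or CRM row without existence checks. The sketch below is one possible flattening, not part of the actor: the function names and the choice of columns are illustrative, and it assumes the nested keys shown in the example output (`tech.cms`, `ads.pixels`, `lead_score.score`, `lead_score.reasons`) are always present too.

```python
import csv
import io

# Illustrative column selection; pick whatever your CRM needs.
COLUMNS = ["domain", "status", "emails", "phones", "cms",
           "ad_pixels", "lead_score", "reasons"]


def to_crm_row(record):
    """Flatten one domain record into a single spreadsheet-friendly row."""
    score = record["lead_score"]
    return {
        "domain": record["domain"],
        "status": record["status"],
        "emails": "; ".join(e["value"] for e in record["emails"]),
        "phones": "; ".join(p["e164"] for p in record["phones"]),
        "cms": record["tech"]["cms"] or "",  # missing scalars arrive as null
        "ad_pixels": "; ".join(record["ads"]["pixels"]),
        "lead_score": score["score"],
        "reasons": "; ".join(score["reasons"]),
    }


def to_csv(records):
    """Render records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for record in records:
        writer.writerow(to_crm_row(record))
    return buffer.getvalue()
```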
Use it as an MCP server for AI agents
This actor is also a Model Context Protocol (MCP) server, so an AI client — Claude Desktop, Cursor, an agent framework, or your own app — can call its scraping tools directly and enrich a company from a URL in real time.
Route 1 — Apify's hosted MCP server (no setup)
Every Apify Actor is callable through Apify's hosted MCP endpoint. Point your MCP client at it, authenticate with your own Apify API token, and scope it to this actor. Nothing to deploy.
Route 2 — this actor's dedicated MCP endpoint (Standby)
In Standby mode the actor serves MCP over Streamable HTTP at a stable URL:
https://USERNAME--website-intelligence-scraper.apify.actor/mcp
Configure your MCP client with that URL and an Authorization: Bearer <APIFY_TOKEN> header.
MCP tools
| Tool | Description |
|---|---|
| scrape_website(url, options?) | Full pipeline on one domain; returns the complete record. Synchronous and fast. |
| extract_contacts(url) | Only emails, phones, WhatsApp, socials, and contact form. |
| check_tech_and_ads(url) | Only tech stack, chatbot, and ad pixels. |
| scrape_websites(urls[], options?) | Asynchronous batch — starts a run and returns runId + datasetId. |
All tools return structured JSON and never throw across the transport.
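For readers wiring up their own client, the sketch below shows roughly what a call to one of these tools looks like on the wire. MCP is built on JSON-RPC 2.0 and invokes tools with the `tools/call` method; the `url` argument name is taken from the tool signatures in the table, while the helper names are hypothetical and the shape of `options` is not documented here. An MCP client library normally builds these messages and performs the required `initialize` handshake first, so treat this as an illustration of the payload only.

```python
import json


def build_tool_call(tool_name, arguments, request_id=1):
    """Return the JSON-RPC 2.0 body for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def build_headers(apify_token):
    """HTTP headers for a Streamable HTTP request to the Standby endpoint."""
    return {
        "Authorization": f"Bearer {apify_token}",
        "Content-Type": "application/json",
        # Streamable HTTP clients accept both plain JSON and an SSE stream.
        "Accept": "application/json, text/event-stream",
    }


if __name__ == "__main__":
    body = build_tool_call("extract_contacts", {"url": "https://example.com"})
    print(json.dumps(body, indent=2))
    print(build_headers("<APIFY_TOKEN>"))
```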
Pricing
Billing is Pay-Per-Event — you only pay for results that carry real data:
| Event | Price | When it's charged |
|---|---|---|
| Domain result | $0.04 | Once per domain, only when the record is populated (≥1 contact, or ≥1 detected tech/ad/chat signal). Parked, blocked, or empty domains are never charged. |
| Bot-protection bypass | +$0.06 | Only when bypassProtection is enabled and a stealth browser over residential proxy actually ran to clear a Cloudflare-style challenge. Covers the residential bandwidth and extra renders. Not charged when no bypass was needed. |
| Live ad check | +$0.02 | Only when checkLiveAds is enabled and the check runs. |
| MCP tool call | $0.04 | Once per MCP tool invocation in Standby mode. |
The actor defaults to datacenter proxy to keep costs low; the
bypassProtection option upgrades to a stealth browser over residential proxy
for sites behind bot protection, and is the only path that triggers the bypass
surcharge above. MCP Standby mode adds standby compute (~$0.40 per GB-hour while
awake; it idles down when not in use).
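To budget a run, the event prices in the table can be combined into a simple estimate. This is a rough sketch: the function name and parameters are illustrative, the counts are your own guesses about how many domains will be populated or need a bypass, and standby compute for MCP mode is not included.

```python
from decimal import Decimal

# Event prices copied from the pricing table above (USD).
PRICES = {
    "domain_result": Decimal("0.04"),
    "bypass": Decimal("0.06"),
    "live_ad_check": Decimal("0.02"),
    "mcp_tool_call": Decimal("0.04"),
}


def estimate_cost(populated_domains, bypassed_domains=0,
                  live_ad_checks=0, mcp_tool_calls=0):
    """Estimate event charges in USD; excludes standby compute."""
    return (
        PRICES["domain_result"] * populated_domains
        + PRICES["bypass"] * bypassed_domains
        + PRICES["live_ad_check"] * live_ad_checks
        + PRICES["mcp_tool_call"] * mcp_tool_calls
    )


if __name__ == "__main__":
    # 1,000 populated domains, 50 of which needed a bypass, live-ad check on all.
    print(estimate_cost(1000, bypassed_domains=50, live_ad_checks=1000))  # 63.00
```

Using `Decimal` rather than floats keeps the cents exact when the counts get large.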
Is web scraping legal?
Scraping publicly available data is generally legal in most jurisdictions. This
actor only collects public data that any visitor can see, honors the
respectRobotsTxt toggle, never performs SMTP email verification, and never
crawls off the target domain. You are responsible for how you use the data,
including compliance with GDPR, CCPA, and each site's terms where applicable.
Frequently asked questions
How do I scrape emails from a list of websites?
Paste your URLs into the Websites field and run the actor. Each domain
returns an emails array with addresses found in mailto: links, visible text,
JSON-LD, and embedded script data, including de-obfuscated Cloudflare and
HTML-entity emails. Enable validateEmailMx to confirm each email domain has
valid MX records (DNS only — no SMTP).
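To make the de-obfuscation step concrete, here is a minimal sketch of the two decodings mentioned above. It follows the widely documented scheme behind Cloudflare's `data-cfemail` attribute (a hex string whose first byte is an XOR key for the remaining bytes) and uses the standard library for HTML entities. The actor's own implementation is not shown in this document, so the function names and details here are assumptions.

```python
import html


def decode_cfemail(encoded_hex):
    """Decode a Cloudflare-obfuscated email (value of data-cfemail)."""
    data = bytes.fromhex(encoded_hex)
    key = data[0]  # first byte is the XOR key
    return bytes(b ^ key for b in data[1:]).decode("utf-8")


def encode_cfemail(email, key=0x5A):
    """Inverse of decode_cfemail, included only to build test inputs."""
    raw = email.encode("utf-8")
    return bytes([key] + [b ^ key for b in raw]).hex()


def decode_entities(text):
    """Decode HTML-entity obfuscation such as '&#64;' for '@'."""
    return html.unescape(text)


if __name__ == "__main__":
    token = encode_cfemail("hello@example.com")
    print(token, "->", decode_cfemail(token))
    print(decode_entities("hello&#64;example&#46;com"))
```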
Does it separate personal emails from generic inboxes?
Yes. Every email is classified as role (shared inboxes like info@, sales@,
support@, noreply@, and system mailboxes) or personal (an individual's
address). This lets you target real people and skip generic catch-alls, or do
the reverse, depending on your outreach.
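A simplified version of this classification can be sketched as a lookup on the part of the address before the `@`. The set below starts from the inboxes named above (info, sales, support, noreply); the remaining entries and the plus-tag handling are assumptions, and the actor's real rules are likely more extensive.

```python
# info/sales/support/noreply come from the text above; the rest are assumed.
ROLE_LOCAL_PARTS = {
    "info", "sales", "support", "noreply",
    "no-reply", "hello", "contact", "admin", "billing",
}


def classify_email(address):
    """Return 'role' for shared inboxes and 'personal' otherwise."""
    local = address.split("@", 1)[0].lower()
    # Treat "sales+eu@..." the same as "sales@...".
    local = local.split("+", 1)[0]
    return "role" if local in ROLE_LOCAL_PARTS else "personal"


def personal_only(emails):
    """Keep only addresses that look like an individual's inbox."""
    return [e for e in emails if classify_email(e) == "personal"]
```

When working with the actor's output you would normally filter on the `type` field it already provides rather than reclassifying.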
Does it detect Facebook and Google ad pixels?
Yes. It detects ad pixels for Meta, Google Ads, Google Tag Manager, TikTok,
LinkedIn, Pinterest, Reddit, Bing, and others — including pixels injected at
runtime through Google Tag Manager, which static-only scrapers miss. Enable
checkLiveAds for a best-effort check of whether the advertiser has active ads.
Can it tell which CMS or technology a website uses?
Yes. The tech block reports the CMS (WordPress, Shopify, Wix, Squarespace,
Webflow, and more), ecommerce platform, JavaScript frameworks, analytics, chat,
marketing automation, CDN, and server, using a data-driven fingerprint engine.
How does the lead score work?
Each domain gets a 0–100 lead_score computed from weighted signals —
reachable contacts, ad pixels, marketing automation, chat, a modern tech stack,
and social presence — with a reasons list explaining the score. You can
override the weights with scoringWeights.
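The idea of a weighted score with reasons can be illustrated as follows. Everything specific in this sketch is hypothetical: the signal names, the default weights, and the override format are invented for illustration and do not describe the actor's real scoringWeights schema or formula.

```python
# Hypothetical weights; the actor's real defaults are not documented here.
DEFAULT_WEIGHTS = {
    "reachable_contact": 30,
    "ad_pixels": 20,
    "marketing_automation": 15,
    "modern_stack": 15,
    "chat": 10,
    "social_presence": 10,
}


def lead_score(signals, weights=None):
    """Sum the weights of the signals that are present, clamped to 0-100."""
    merged = {**DEFAULT_WEIGHTS, **(weights or {})}
    score = 0
    reasons = []
    for name, weight in merged.items():
        if signals.get(name):
            score += weight
            reasons.append(name)
    return {"score": max(0, min(100, score)), "reasons": reasons}


if __name__ == "__main__":
    signals = {"reachable_contact": True, "ad_pixels": True, "chat": True}
    print(lead_score(signals))
    # An agency that cares mostly about chat could raise that weight.
    print(lead_score(signals, weights={"chat": 40}))
```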
What happens when a website is down or blocks the scraper?
Each site is isolated. A failure never kills the run: the actor records the
problem in that record's crawl.errors and still returns a partial record with
the stable schema. Every record carries a top-level status (ok, partial,
blocked, or failed); when a site can't be fetched, crawl.failure_reason
classifies why (DNS, TLS, connection, timeout) and crawl.block_vendor names the
bot/WAF vendor if one blocked the crawl — so you can tell a dead domain from one
that's merely protected. Blocked, parked, or empty domains are not billed.
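Downstream code can use these fields to decide what to do with each record. The sketch below relies only on the `status`, `crawl.failure_reason`, and `crawl.block_vendor` fields described above; the action labels it returns and the `triage` helper itself are hypothetical and should be adapted to your own workflow.

```python
def triage(record):
    """Map a record's status fields to a follow-up action (labels are illustrative)."""
    status = record["status"]
    crawl = record["crawl"]
    if status == "ok":
        return "use"
    if status == "partial":
        return "review"
    if status == "blocked":
        # A named bot/WAF vendor suggests the site is alive but protected.
        return "retry_with_bypass" if crawl["block_vendor"] else "retry_later"
    # status == "failed": a DNS failure usually means the domain is dead.
    if crawl["failure_reason"] == "dns":
        return "dead_domain"
    return "retry_later"


def group_by_action(records):
    """Group domains by the action suggested for them."""
    groups = {}
    for record in records:
        groups.setdefault(triage(record), []).append(record["domain"])
    return groups
```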
Can AI agents use this scraper?
Yes. The actor runs as an MCP (Model Context Protocol) server, so AI agents and clients like Claude Desktop and Cursor can call its tools to enrich a company from a URL live, without building a data pipeline.
How many websites can I scrape at once?
There is no hard limit — provide one URL or thousands. The actor deduplicates by domain, crawls with configurable concurrency, and pushes one record per domain to the dataset as it goes.
Related scrapers and next steps
Use this actor to power lead lists, CRM enrichment, competitive analysis, market research, and ad-targeting audits. Export results as JSON, CSV, or Excel, or integrate via the Apify API, webhooks, and scheduling. For live, on-demand enrichment inside an AI agent, connect through the MCP server described above.