# Website Contact & Tech Stack Scraper (`headply/website-intelligence-scraper`) Actor

Scrape any website for emails, phone numbers, social profiles, tech stack, ad pixels, chatbot, and a lead score. One clean record per domain. Bulk lead generation + MCP server for AI agents.

- **URL**: https://apify.com/headply/website-intelligence-scraper.md
- **Developed by:** [Mayowa Ogedengbe](https://apify.com/headply) (community)
- **Categories:** Lead generation, AI
- **Stats:** 1 total users, 1 monthly users, 0.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

from $40.00 / 1,000 domain results

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

**Website Contact & Tech Stack Scraper** extracts emails, phone numbers, social
media profiles, technology stack, advertising pixels, chatbots, and a lead score
from any list of websites — returning one clean, structured record per domain.
It is a bulk **website contact scraper** and **B2B lead generation** tool built
for sales teams, marketing and web design agencies, and AI agents that need to
enrich a company from just its URL.

Give it one website or thousands. For every domain you get reachable contacts
(emails, phones, WhatsApp, contact forms), the full tech stack (CMS, ecommerce,
analytics, chat, marketing automation), which ad pixels and chatbots are
actually running, business signals, and a 0–100 lead score with reasons. The
output schema is stable and resale-ready: every field is always present, so you
can drop the dataset straight into a CRM, spreadsheet, or enrichment pipeline.

### What does the Website Contact & Tech Stack Scraper do?

This **website scraper** crawls each domain you give it, prioritizes the pages
that carry contact and company information (contact, about, team, footer), and
extracts a single rich JSON record per domain. It tries cheap HTTP requests
first and only launches a real browser when a page needs JavaScript, so it stays
fast and cost-effective at scale.
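
The HTTP-first approach described above can be pictured as a small escalation check. The sketch below is only an assumption about how such a check might look, not the Actor's actual logic: `needs_browser` and the 200-character threshold are hypothetical, and the heuristic simply flags pages that ship scripts but almost no visible text.

```python
from html.parser import HTMLParser


class _PageStats(HTMLParser):
    """Counts visible text characters and <script> tags in an HTML document."""

    _HIDDEN_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.visible_chars = 0
        self.script_tags = 0
        self._hidden_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        if tag in self._HIDDEN_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self._HIDDEN_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        # Text inside <script>/<style> is not visible to a visitor, so skip it.
        if self._hidden_depth == 0:
            self.visible_chars += len(data.strip())


def needs_browser(html_text: str, min_visible_chars: int = 200) -> bool:
    """Hypothetical heuristic: scripts present but hardly any visible text."""
    stats = _PageStats()
    stats.feed(html_text)
    stats.close()
    return stats.script_tags > 0 and stats.visible_chars < min_visible_chars
```

A single-page-app shell such as `<div id="root"></div>` plus a script tag would be escalated, while a text-heavy static page would stay on plain HTTP.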

In one run you can answer questions like:

- **Who can I contact at this company, and how?** — emails (including obfuscated
  ones), phone numbers in E.164 format, WhatsApp links, social profiles, and the
  best contact-form page.
- **What technology does this site run on?** — CMS, ecommerce platform,
  JavaScript frameworks, analytics, CDN, and server.
- **Are they spending on ads?** — Meta Pixel, Google Ads, TikTok, LinkedIn,
  Pinterest, Reddit and other pixels, including ones injected at runtime through
  Google Tag Manager that static scrapers miss.
- **Do they use a chatbot, and which vendor?** — Intercom, Drift, Zendesk,
  Tidio, Crisp, HubSpot Chat, and more.
- **Is this a good lead?** — a 0–100 lead score with human-readable reasons.

### Who is this website scraper for?

Each type of buyer gets one high-value output from this tool:

- **Sales & lead generation teams (SDRs):** stop wasting hours finding and
  qualifying contacts. Get emails, phone numbers, and a lead score per domain,
  with role-vs-personal email classification so you can target the right inbox.
- **Ad & marketing agencies:** find businesses that are (or are not) already
  running ads. The ad-pixel and live-ad signals tell you who has budget and who
  to pitch.
- **Web design & development agencies:** build a pipeline of outdated sites to
  pitch for a redesign using `tech_freshness`, legacy-stack detection, and
  missing-SSL flags.
- **Chatbot & AI support vendors:** find which businesses across a market lack a
  chatbot — or already run a competitor's — using chatbot presence and vendor
  detection.
- **AI agent developers:** enrich any company from a URL live, with no pipeline
  to build, by calling the Actor as a **Model Context Protocol (MCP) server**.

### What data can this website scraper extract?

Every domain produces one JSON record with a stable schema (missing scalars are
`null`, missing lists are `[]`, keys are never omitted):

| Field group | What you get |
|---|---|
| **Company** | Name, description, logo URL, industry guess |
| **Emails** | Addresses from `mailto:`, visible text, JSON-LD, and `<script>` data; decodes Cloudflare and entity obfuscation; classified role vs personal; placeholder/example addresses filtered; **scoped to the target's own domain** (third-party addresses found on the page go to a separate `emails_off_domain` list); optional DNS MX validation |
| **Phones** | Valid numbers in E.164 format with ISO country code |
| **WhatsApp** | Click-to-chat links and numbers |
| **Socials** | LinkedIn (company + people), Twitter/X, Instagram, Facebook, YouTube, TikTok — normalized, tracking params stripped, **attributed to the target** (profiles belonging to others, e.g. a creator featured on the site, go to `socials.other_profiles`) |
| **Address** | Validated postal address from JSON-LD `PostalAddress` or footer heuristics — marketing copy and bare fragments are rejected, with sub-fields (street/city/region/postal/country) split out when available |
| **Contact form** | Whether a real contact form exists and the best contact-form page URL |
| **Tech stack** | CMS, ecommerce platform, primary framework, analytics, chat, marketing automation, CDN, server; `runtime_signals` flags whether GTM-injected tags were resolved |
| **Chatbot** | Whether a chatbot is present, the vendor, and how it was detected |
| **Ad pixels** | Meta, Google Ads, GTM, LinkedIn, TikTok, X, Pinterest, Reddit, Bing, and more — including runtime/GTM-injected pixels |
| **Live ads** | Optional best-effort check of Meta Ad Library and Google Ads Transparency |
| **Site signals** | HTTPS, SSL validity, mobile-friendly hint, copyright year, last content date, `tech_freshness` (modern / dated / legacy / unknown) |
| **Lead score** | 0–100 score with the reasons that contributed |
| **Status & diagnostics** | A top-level `status` (`ok` / `partial` / `blocked` / `failed`), plus `crawl` metadata: pages crawled, fetch method, detected bot/WAF vendor, and a classified `failure_reason` (dns / tls / connection / timeout) when a site can't be fetched |
| **Screenshot** | Optional homepage screenshot URL |

### How to use the Website Contact & Tech Stack Scraper

1. **Add your websites.** Paste plain URLs into the **Websites** field (one per
   line) and/or add them under **Start URLs**. The scraper deduplicates by
   registrable domain automatically.
2. **(Optional) tune the crawl.** Set `maxPagesPerSite`, `maxDepth`, and
   `renderJavaScript` (`auto` is recommended). Enable `validateEmailMx`,
   `checkLiveAds`, or `captureScreenshot` if you need them.
3. **Run the Actor.** Results are pushed to the dataset, one record per
   domain, which you can export to JSON, CSV, Excel, or pull via the API.

#### Example input

```jsonc
{
  "websites": ["https://example.com", "https://acme.co"],
  "maxPagesPerSite": 3,
  "renderJavaScript": "auto",
  "prioritizeContactPages": true,
  "validateEmailMx": false,
  "checkLiveAds": false,
  "captureScreenshot": false
}
````

#### Key input options

| Field | Default | What it does |
|---|---|---|
| `startUrls` / `websites` | — | The websites to scrape (provide at least one) |
| `maxPagesPerSite` | `3` | Maximum pages crawled per domain. Default of 3 = homepage + 2 ranked contact pages (where leads live). Raise for broader coverage. |
| `maxDepth` | `2` | Link depth from the start URL |
| `prioritizeContactPages` | `true` | Crawl contact / about / team pages first |
| `renderJavaScript` | `auto` | `auto` renders only JS-heavy pages; `always` / `never` force it |
| `checkLiveAds` | `false` | Best-effort Meta / Google live-ad check |
| `captureScreenshot` | `false` | Save a homepage screenshot |
| `validateEmailMx` | `false` | DNS MX lookup (never SMTP) |
| `bypassProtection` | `false` | When a site is behind a beatable Cloudflare challenge, retry with a stealth browser over residential proxy. Off by default for speed/cost; interactive CAPTCHAs are never attempted |
| `respectRobotsTxt` | `true` | Honor robots.txt |
| `proxyConfiguration` | Datacenter | Proxy settings; switch to Residential only for sites that block datacenter IPs |
| `scoringWeights` | defaults | Override the lead-score weights |

### Example output

```jsonc
{
  "input_url": "https://example.com",
  "final_url": "https://example.com/",
  "domain": "example.com",
  "status": "ok",
  "crawl": {
    "pages_crawled": 7, "fetch_method": "http", "block_vendor": null,
    "failure_reason": null, "errors": []
  },
  "company": { "name": "Example Inc", "description": "…", "logo_url": "…" },
  "emails": [
    { "value": "hello@example.com", "type": "role", "obfuscated": false,
      "off_domain": false, "source_page": "https://example.com/contact" }
  ],
  "emails_off_domain": [
    { "value": "support@somepartner.com", "type": "role", "off_domain": true,
      "source_page": "https://example.com/integrations" }
  ],
  "phones": [{ "raw": "+1 202 555 0100", "e164": "+12025550100", "country": "US" }],
  "socials": {
    "linkedin_company": "…", "twitter": "…",
    "other_profiles": [{ "platform": "youtube", "url": "https://youtube.com/c/someCreator" }]
  },
  "tech": {
    "cms": "WordPress", "analytics": ["Google Analytics 4"],
    "chat": ["Intercom"], "runtime_signals": "resolved"
  },
  "ads": { "pixels": ["meta", "gtm"], "running_ads": { "checked": false } },
  "site_signals": { "https": true, "tech_freshness": "modern" },
  "lead_score": { "score": 78, "reasons": ["role-based email", "advertising pixels: meta, gtm", "live chat: Intercom"] }
}
```
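
Because the schema is stable, a record can be flattened into a spreadsheet or CRM row without defensive key checks. The sketch below uses only field names that appear in the example output above; the function name `to_crm_row` and the choice of columns are illustrative assumptions.

```python
def to_crm_row(record: dict) -> dict:
    """Flatten one domain record into a single flat row (illustrative column set)."""
    emails = record.get("emails") or []
    phones = record.get("phones") or []
    tech = record.get("tech") or {}
    lead = record.get("lead_score") or {}

    def emails_of(kind: str) -> str:
        return ";".join(e["value"] for e in emails if e.get("type") == kind)

    return {
        "domain": record.get("domain"),
        "status": record.get("status"),
        "role_emails": emails_of("role"),
        "personal_emails": emails_of("personal"),
        "phones": ";".join(p["e164"] for p in phones if p.get("e164")),
        "cms": tech.get("cms"),
        "lead_score": lead.get("score"),
    }
```

Rows produced this way can be written with `csv.DictWriter`, since every row has the same keys.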

### Use it as an MCP server for AI agents

This Actor is also a **Model Context Protocol (MCP) server**, so an AI client
(Claude Desktop, Cursor, an agent framework, or your own app) can call its
scraping tools directly and enrich a company from a URL in real time.

#### Route 1: Apify's hosted MCP server (no setup)

Every Apify Actor is callable through Apify's hosted MCP endpoint. Point your
MCP client at it, authenticate with your own Apify API token, and scope it to
this Actor. Nothing to deploy.

#### Route 2: this Actor's dedicated MCP endpoint (Standby)

In Standby mode the Actor serves MCP over Streamable HTTP at a stable URL:

```
https://USERNAME--website-intelligence-scraper.apify.actor/mcp
```

Configure your MCP client with that URL and an `Authorization: Bearer <APIFY_TOKEN>` header. The Apify platform validates the token for you.
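
As a small illustration, the URL and header can be assembled in code before being handed to whichever MCP client you use. This is a sketch under assumptions: the helper name, the returned dict shape, and the `APIFY_TOKEN` environment variable are hypothetical, and the exact configuration keys depend on your MCP client. `USERNAME` is left as a parameter because it is a placeholder in the URL above.

```python
import os


def standby_mcp_endpoint(username: str, token: str) -> dict:
    """Build the Standby MCP URL and auth header described above."""
    if not username or not token:
        raise ValueError("both username and token are required")
    return {
        "url": f"https://{username}--website-intelligence-scraper.apify.actor/mcp",
        "headers": {"Authorization": f"Bearer {token}"},
    }


if __name__ == "__main__":
    # Assumption: the token is exported as APIFY_TOKEN rather than hard-coded.
    token = os.environ.get("APIFY_TOKEN", "")
    if token:
        print(standby_mcp_endpoint("USERNAME", token)["url"])
```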

#### MCP tools

| Tool | Description |
|---|---|
| `scrape_website(url, options?)` | Full pipeline on one domain; returns the complete record. Synchronous and fast. |
| `extract_contacts(url)` | Only emails, phones, WhatsApp, socials, and contact form. |
| `check_tech_and_ads(url)` | Only tech stack, chatbot, and ad pixels. |
| `scrape_websites(urls[], options?)` | Asynchronous batch — starts a run and returns `runId` + `datasetId`. |

All tools return structured JSON and never throw across the transport.

### Pricing

Billing is **Pay-Per-Event**. A domain result is charged only when the record carries real data, and Apify platform usage is billed on top of the event prices below:

| Event | Price | When it's charged |
|---|---|---|
| **Domain result** | **$0.04** | Once per domain, only when the record is populated (≥1 contact, or ≥1 detected tech/ad/chat signal). Parked, blocked, or empty domains are **never charged**. |
| **Bot-protection bypass** | **+$0.06** | Only when `bypassProtection` is enabled *and* a stealth browser over residential proxy actually ran to clear a Cloudflare-style challenge. Covers the residential bandwidth and extra renders. Not charged when no bypass was needed. |
| **Live ad check** | **+$0.02** | Only when `checkLiveAds` is enabled and the check runs. |
| **MCP tool call** | **$0.04** | Once per MCP tool invocation in Standby mode. |

The Actor defaults to **datacenter proxy** to keep costs low; the
`bypassProtection` option upgrades to a stealth browser over residential proxy
for sites behind bot protection, and is the only path that triggers the bypass
surcharge above. MCP Standby mode adds standby compute (~$0.40 per GB-hour while
awake; it idles down when not in use).
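
To budget a run, the event prices in the table can be combined into a rough estimate. The sketch below covers event charges only (platform usage and standby compute are excluded); the function name and the event keys are hypothetical labels, not billing identifiers.

```python
from decimal import Decimal

# Event prices copied from the pricing table above (USD).
PRICES = {
    "domain_result": Decimal("0.04"),
    "protection_bypass": Decimal("0.06"),
    "live_ad_check": Decimal("0.02"),
    "mcp_tool_call": Decimal("0.04"),
}


def estimate_event_cost(populated_domains: int, bypasses: int = 0,
                        live_ad_checks: int = 0, mcp_tool_calls: int = 0) -> Decimal:
    """Sum event charges; empty, parked, or blocked domains should not be counted."""
    counts = {
        "domain_result": populated_domains,
        "protection_bypass": bypasses,
        "live_ad_check": live_ad_checks,
        "mcp_tool_call": mcp_tool_calls,
    }
    if any(n < 0 for n in counts.values()):
        raise ValueError("event counts cannot be negative")
    return sum((PRICES[name] * n for name, n in counts.items()), Decimal("0"))
```

For example, 1,000 populated domains with 50 bypasses and 200 live ad checks come to $40.00 + $3.00 + $4.00 = $47.00 in event charges, before platform usage.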

### Is web scraping legal?

Scraping publicly available data is generally legal in most jurisdictions. This
Actor only collects **public data** that any visitor can see, honors the
`respectRobotsTxt` toggle, never performs SMTP email verification, and never
crawls off the target domain. You are responsible for how you use the data,
including compliance with GDPR, CCPA, and each site's terms where applicable.

### Frequently asked questions

#### How do I scrape emails from a list of websites?

Paste your URLs into the **Websites** field and run the Actor. Each domain
returns an `emails` array with addresses found in `mailto:` links, visible text,
JSON-LD, and embedded script data, including de-obfuscated Cloudflare and
HTML-entity emails. Enable `validateEmailMx` to confirm each email domain has
valid MX records (DNS only — no SMTP).
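
For readers curious what de-obfuscation involves, here is a minimal sketch of the two decodings mentioned above. It assumes the commonly documented Cloudflare scheme, in which the `data-cfemail` attribute holds a hex string whose first byte is an XOR key for the remaining bytes; it is an illustration and not necessarily how this Actor implements it.

```python
import html


def decode_cfemail(encoded_hex: str) -> str:
    """Decode a Cloudflare `data-cfemail` value: first byte is the XOR key."""
    data = bytes.fromhex(encoded_hex)
    if len(data) < 2:
        raise ValueError("encoded value is too short")
    key, payload = data[0], data[1:]
    return bytes(b ^ key for b in payload).decode("utf-8")


def decode_entities(text: str) -> str:
    """Turn HTML character references such as &#64; back into plain characters."""
    return html.unescape(text)
```

With these helpers, `decode_entities("info&#64;example&#46;com")` yields `info@example.com`.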

#### Does it separate personal emails from generic inboxes?

Yes. Every email is classified as `role` (shared inboxes like `info@`, `sales@`,
`support@`, `noreply@`, and system mailboxes) or `personal` (an individual's
address). This lets you target real people and skip generic catch-alls, or do
the reverse, depending on your outreach.
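
A classification like this can be approximated by looking at the local part of the address. The sketch below is a simplified assumption: `info`, `sales`, `support`, and `noreply` come from the answer above and `hello` from the example output, while the remaining entries and the function name are hypothetical. The Actor's real rules may differ.

```python
# Local parts treated as shared inboxes; entries beyond the documented ones are guesses.
ROLE_LOCAL_PARTS = {
    "info", "sales", "support", "noreply", "hello",
    "no-reply", "contact", "admin",
}


def classify_email(address: str) -> str:
    """Return "role" for shared inboxes and "personal" for everything else."""
    local, sep, domain = address.strip().lower().partition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    # Ignore plus-tags such as sales+eu@example.com.
    base = local.split("+", 1)[0]
    return "role" if base in ROLE_LOCAL_PARTS else "personal"
```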

#### Does it detect Facebook and Google ad pixels?

Yes. It detects ad pixels for Meta, Google Ads, Google Tag Manager, TikTok,
LinkedIn, Pinterest, Reddit, Bing, and others — including pixels injected at
runtime through Google Tag Manager, which static-only scrapers miss. Enable
`checkLiveAds` for a best-effort check of whether the advertiser has active ads.

#### Can it tell which CMS or technology a website uses?

Yes. The `tech` block reports the CMS (WordPress, Shopify, Wix, Squarespace,
Webflow, and more), ecommerce platform, JavaScript frameworks, analytics, chat,
marketing automation, CDN, and server, using a data-driven fingerprint engine.

#### How does the lead score work?

Each domain gets a 0–100 `lead_score` computed from weighted signals —
reachable contacts, ad pixels, marketing automation, chat, a modern tech stack,
and social presence — with a `reasons` list explaining the score. You can
override the weights with `scoringWeights`.

#### What happens when a website is down or blocks the scraper?

Each site is isolated. A failure never kills the run: the Actor records the
problem in that record's `crawl.errors` and still returns a partial record with
the stable schema. Every record carries a top-level `status` (`ok`, `partial`,
`blocked`, or `failed`); when a site can't be fetched, `crawl.failure_reason`
classifies why (DNS, TLS, connection, timeout) and `crawl.block_vendor` names the
bot/WAF vendor if one blocked the crawl — so you can tell a dead domain from one
that's merely protected. Blocked, parked, or empty domains are **not billed**.
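
These fields make it straightforward to decide what to do with each domain after a run. The sketch below shows one possible policy, using the documented `status` and `crawl.failure_reason` values; the bucket names and the policy itself are assumptions, not behavior of the Actor.

```python
def triage(records: list[dict]) -> dict[str, list[str]]:
    """Group domains by a suggested follow-up action (one possible policy)."""
    buckets = {"usable": [], "retry_with_bypass": [], "retry_later": [], "likely_dead": []}
    for record in records:
        status = record.get("status")
        reason = (record.get("crawl") or {}).get("failure_reason")
        if status in ("ok", "partial"):
            bucket = "usable"
        elif status == "blocked":
            # Candidate for a second run with bypassProtection enabled.
            bucket = "retry_with_bypass"
        elif reason in ("dns", "tls"):
            bucket = "likely_dead"
        else:
            # Timeouts and connection errors are often transient.
            bucket = "retry_later"
        buckets[bucket].append(record.get("domain"))
    return buckets
```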

#### Can AI agents use this scraper?

Yes. The Actor runs as an MCP (Model Context Protocol) server, so AI agents and
clients like Claude Desktop and Cursor can call its tools to enrich a company
from a URL live, without building a data pipeline.

#### How many websites can I scrape at once?

There is no hard limit: provide one URL or thousands. The Actor deduplicates by
domain, crawls with configurable concurrency, and pushes one record per domain
to the dataset as it goes.
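
If you want to pre-clean a large list before submitting it, a rough client-side deduplication might look like the sketch below. It is a deliberately naive approximation: it only lowercases the hostname and strips a leading `www.`, whereas true registrable-domain matching needs the Public Suffix List. The function names are hypothetical.

```python
from urllib.parse import urlsplit


def naive_site_key(url: str) -> str:
    """Lowercased hostname without a leading "www." (not a true registrable domain)."""
    candidate = url.strip()
    if "://" not in candidate:
        candidate = "https://" + candidate
    host = urlsplit(candidate).hostname
    if not host:
        raise ValueError(f"no hostname in {url!r}")
    return host[4:] if host.startswith("www.") else host


def dedupe_sites(urls: list[str]) -> list[str]:
    """Keep the first URL seen for each site key, preserving input order."""
    first_seen: dict[str, str] = {}
    for url in urls:
        first_seen.setdefault(naive_site_key(url), url)
    return list(first_seen.values())
```

Subdomains such as `blog.example.com` are kept separate by this sketch, so the Actor's own deduplication may still merge entries that survive it.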

### Related scrapers and next steps

Use this Actor to power lead lists, CRM enrichment, competitive analysis, market
research, and ad-targeting audits. Export results as JSON, CSV, or Excel, or
integrate via the Apify API, webhooks, and scheduling. For live, on-demand
enrichment inside an AI agent, connect through the MCP server described above.

# Actor input Schema

## `startUrls` (type: `array`):

List of websites to crawl, as a standard Apify request list. Each entry is an object with a `url`. Merged with the `websites` field.

## `websites` (type: `array`):

Convenience list of plain website URLs, one per line. Merged with `startUrls`. At least one of `startUrls` or `websites` must be non-empty.

## `maxPagesPerSite` (type: `integer`):

Hard cap on the number of pages crawled per domain. The default of 3 fetches the homepage plus the two highest-ranked contact pages (contact / about / team) and skips product/category sprawl — that's where the leads live. Raise for broader coverage; lower (e.g. 1) for homepage-only fastest mode.

## `maxDepth` (type: `integer`):

Maximum link depth from the start URL (homepage is depth 0).

## `prioritizeContactPages` (type: `boolean`):

Enqueue likely contact/about/team pages and footer links before generic internal links.

## `renderJavaScript` (type: `string`):

When to use a real browser. `auto` escalates to a browser only when a page looks JS-rendered; `always` renders every page; `never` uses HTTP only.

## `checkLiveAds` (type: `boolean`):

Best-effort check of the Meta Ad Library and Google Ads Transparency Center for active ads. Slower and gated; may return null.

## `captureScreenshot` (type: `boolean`):

Save a homepage screenshot to the key-value store and return its URL. Forces a browser render of the homepage.

## `validateEmailMx` (type: `boolean`):

DNS MX lookup for discovered email domains. No SMTP verification is ever performed.

## `bypassProtection` (type: `boolean`):

When a site is blocked by Cloudflare's passive challenge, retry with a stealth browser over residential proxy. Off by default for speed/cost — most sites don't need it. Interactive CAPTCHAs (PerimeterX, DataDome, Akamai) are never attempted (unbeatable); those fail fast regardless of this flag.

## `respectRobotsTxt` (type: `boolean`):

Honor each site's robots.txt rules while crawling.

## `proxyConfiguration` (type: `object`):

Apify proxy settings. Datacenter proxy (the default) is fast and cost-effective and works for most sites. Switch to Residential only for sites that block datacenter IPs — it is far more expensive per GB and will increase your per-domain cost.

## `maxConcurrency` (type: `integer`):

Maximum number of pages crawled in parallel across the run.

## `requestTimeoutSecs` (type: `integer`):

Per-request fetch/render budget in seconds. Higher values give slow or heavy JavaScript sites more time to render.

## `maxRequestRetries` (type: `integer`):

How many times to retry a failed request before giving up on that page.

## `scoringWeights` (type: `object`):

Optional override of the lead-score weights. Leave empty to use sensible defaults from scoring.py.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://apify.com"
    }
  ],
  "websites": [
    "https://www.example.com"
  ],
  "maxPagesPerSite": 3,
  "maxDepth": 2,
  "prioritizeContactPages": true,
  "renderJavaScript": "auto",
  "checkLiveAds": false,
  "captureScreenshot": false,
  "validateEmailMx": false,
  "bypassProtection": false,
  "respectRobotsTxt": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "maxConcurrency": 10,
  "requestTimeoutSecs": 30,
  "maxRequestRetries": 1,
  "scoringWeights": {}
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://apify.com"
        }
    ],
    "websites": [
        "https://www.example.com"
    ],
    "proxyConfiguration": {
        "useApifyProxy": true
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("headply/website-intelligence-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [{ "url": "https://apify.com" }],
    "websites": ["https://www.example.com"],
    "proxyConfiguration": { "useApifyProxy": True },
}

# Run the Actor and wait for it to finish
run = client.actor("headply/website-intelligence-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://apify.com"
    }
  ],
  "websites": [
    "https://www.example.com"
  ],
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}' |
apify call headply/website-intelligence-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=headply/website-intelligence-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Website Contact & Tech Stack Scraper",
        "description": "Scrape any website for emails, phone numbers, social profiles, tech stack, ad pixels, chatbot, and a lead score. One clean record per domain. Bulk lead generation + MCP server for AI agents.",
        "version": "0.1",
        "x-build-id": "XrvjJCwWNVaISfonF"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/headply~website-intelligence-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-headply-website-intelligence-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/headply~website-intelligence-scraper/runs": {
            "post": {
                "operationId": "runs-sync-headply-website-intelligence-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/headply~website-intelligence-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-headply-website-intelligence-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "List of websites to crawl, as a standard Apify request list. Each entry is an object with a `url`. Merged with the `websites` field.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "websites": {
                        "title": "Websites (plain URLs)",
                        "type": "array",
                        "description": "Convenience list of plain website URLs, one per line. Merged with `startUrls`. At least one of `startUrls` or `websites` must be non-empty.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxPagesPerSite": {
                        "title": "Max pages per site",
                        "minimum": 1,
                        "maximum": 200,
                        "type": "integer",
                        "description": "Hard cap on the number of pages crawled per domain. The default of 3 fetches the homepage plus the two highest-ranked contact pages (contact / about / team) and skips product/category sprawl — that's where the leads live. Raise for broader coverage; lower (e.g. 1) for homepage-only fastest mode.",
                        "default": 3
                    },
                    "maxDepth": {
                        "title": "Max crawl depth",
                        "minimum": 0,
                        "maximum": 10,
                        "type": "integer",
                        "description": "Maximum link depth from the start URL (homepage is depth 0).",
                        "default": 2
                    },
                    "prioritizeContactPages": {
                        "title": "Prioritize contact pages",
                        "type": "boolean",
                        "description": "Enqueue likely contact/about/team pages and footer links before generic internal links.",
                        "default": true
                    },
                    "renderJavaScript": {
                        "title": "Render JavaScript",
                        "enum": [
                            "auto",
                            "always",
                            "never"
                        ],
                        "type": "string",
                        "description": "When to use a real browser. `auto` escalates to a browser only when a page looks JS-rendered; `always` renders every page; `never` uses HTTP only.",
                        "default": "auto"
                    },
                    "checkLiveAds": {
                        "title": "Check live ads",
                        "type": "boolean",
                        "description": "Best-effort check of the Meta Ad Library and Google Ads Transparency Center for active ads. Slower and gated; may return null.",
                        "default": false
                    },
                    "captureScreenshot": {
                        "title": "Capture homepage screenshot",
                        "type": "boolean",
                        "description": "Save a homepage screenshot to the key-value store and return its URL. Forces a browser render of the homepage.",
                        "default": false
                    },
                    "validateEmailMx": {
                        "title": "Validate email MX records",
                        "type": "boolean",
                        "description": "DNS MX lookup for discovered email domains. No SMTP verification is ever performed.",
                        "default": false
                    },
                    "bypassProtection": {
                        "title": "Bypass bot protection (slower, costs residential proxy)",
                        "type": "boolean",
                        "description": "When a site is blocked by Cloudflare's passive challenge, retry with a stealth browser over residential proxy. Off by default for speed/cost — most sites don't need it. Interactive CAPTCHAs (PerimeterX, DataDome, Akamai) are never attempted (unbeatable); those fail fast regardless of this flag.",
                        "default": false
                    },
                    "respectRobotsTxt": {
                        "title": "Respect robots.txt",
                        "type": "boolean",
                        "description": "Honor each site's robots.txt rules while crawling.",
                        "default": true
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Apify proxy settings. Datacenter proxy (the default) is fast and cost-effective and works for most sites. Switch to Residential only for sites that block datacenter IPs — it is far more expensive per GB and will increase your per-domain cost.",
                        "default": {
                            "useApifyProxy": true
                        }
                    },
                    "maxConcurrency": {
                        "title": "Max concurrency",
                        "minimum": 1,
                        "maximum": 100,
                        "type": "integer",
                        "description": "Maximum number of pages crawled in parallel across the run.",
                        "default": 10
                    },
                    "requestTimeoutSecs": {
                        "title": "Request timeout (seconds)",
                        "minimum": 5,
                        "maximum": 300,
                        "type": "integer",
                        "description": "Per-request fetch/render budget in seconds. Higher values give slow or heavy JavaScript sites more time to render.",
                        "default": 30
                    },
                    "maxRequestRetries": {
                        "title": "Max request retries",
                        "minimum": 0,
                        "maximum": 10,
                        "type": "integer",
                        "description": "How many times to retry a failed request before giving up on that page.",
                        "default": 1
                    },
                    "scoringWeights": {
                        "title": "Lead scoring weights (advanced)",
                        "type": "object",
                        "description": "Optional override of the lead-score weights. Leave empty to use sensible defaults from scoring.py.",
                        "default": {}
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
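
The numeric bounds, defaults, and `renderJavaScript` enum in the input schema above can be checked on the client before a run is started. The following is a minimal sketch, not part of the Actor or the Apify client: `check_crawl_options`, `INTEGER_FIELDS`, and `RENDER_MODES` are hypothetical names introduced here, and only the field names, ranges, and defaults are taken from the schema.

```python
from __future__ import annotations

from typing import Any

# (minimum, maximum, default) for each integer option, copied from the input schema above.
INTEGER_FIELDS = {
    "maxPagesPerSite": (1, 200, 3),
    "maxDepth": (0, 10, 2),
    "maxConcurrency": (1, 100, 10),
    "requestTimeoutSecs": (5, 300, 30),
    "maxRequestRetries": (0, 10, 1),
}

# Allowed values of the renderJavaScript enum; the schema default is "auto".
RENDER_MODES = ("auto", "always", "never")


def check_crawl_options(options: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: fill schema defaults and reject out-of-range values."""
    checked = dict(options)  # work on a copy so the caller's dict is not mutated
    for name, (low, high, default) in INTEGER_FIELDS.items():
        value = checked.setdefault(name, default)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer")
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    mode = checked.setdefault("renderJavaScript", "auto")
    if mode not in RENDER_MODES:
        raise ValueError(f"renderJavaScript must be one of {RENDER_MODES}, got {mode!r}")
    return checked


if __name__ == "__main__":
    # Homepage-only, HTTP-only options; the remaining fields fall back to schema defaults.
    print(check_crawl_options({"maxPagesPerSite": 1, "renderJavaScript": "never"}))
```

The returned dictionary covers only the crawl options shown in this part of the schema; it would still need to be merged with the rest of the Actor input before being passed to a client call. The platform validates input against the schema as well, so a local check like this mainly serves to fail early in scripts.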
