# Adaptive Website Lead Extractor (`solutionssmart/adaptive-website-lead-extractor`) Actor

Crawl public business websites with Scrapling to extract emails, phones, social profiles, contact pages, automation gaps, and lead scores for CRM-ready outreach.

- **URL**: https://apify.com/solutionssmart/adaptive-website-lead-extractor.md
- **Developed by:** [Solutions Smart](https://apify.com/solutionssmart) (community)
- **Categories:** Lead generation, Automation, Developer tools
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, NaN bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per event

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Adaptive Website Lead Extractor

Adaptive Website Lead Extractor is an Apify Actor that turns public business websites into structured lead intelligence records.

It crawls one or more websites, inspects a limited number of same-site pages, and returns one clean dataset item per input website with contact details, social profiles, contact-page signals, automation gaps, media assets when enabled, lead score, confidence, and crawl summary.

The Actor uses [Scrapling](https://github.com/D4Vinci/Scrapling) as the core scraping and parsing engine. Scrapling is used for fetching pages, parsing HTML, selector-based extraction, adaptive element lookup where useful, and optional stealth fetching for public pages that need browser-style rendering.

This is not a generic Scrapling wrapper. Scrapling is the engine; the product is lead intelligence for agencies, sales teams, CRM enrichment, and automation workflows.

### What It Extracts

- Company name estimate
- Page title and meta description
- Public emails and phone numbers
- Primary email and primary phone
- Contact page and about page URLs
- Social profile links
- Address-like text when confidently detected
- Contact form signals
- Booking, chat, WhatsApp, and contact automation signals
- Opportunity signals for missing contact automation
- Optional image, video, document, audio, archive, and embed URLs
- Lead score from `0` to `100`
- Extraction confidence from `0` to `1`
- Pages crawled, source pages, and non-fatal errors

### Best Use Cases

- Enrich company website lists with public contact data
- Find businesses with weak contact or booking infrastructure
- Build review queues for AI receptionist, local SEO, web design, or CRM automation outreach
- Send structured lead records into n8n, Make, Zapier, Google Sheets, Airtable, HubSpot, Pipedrive, or a custom CRM
- Discover public media and document URLs referenced by crawled pages when media mode is enabled

### Input

```json
{
  "startUrls": [
    {
      "url": "https://example.com"
    }
  ],
  "maxPagesPerDomain": 20,
  "maxConcurrency": 5,
  "useStealth": false,
  "respectRobotsTxt": true,
  "extractEmails": true,
  "extractPhones": true,
  "extractSocialLinks": true,
  "extractContactPages": true,
  "extractAutomationSignals": true,
  "extractMedia": false,
  "extractImages": true,
  "extractVideos": true,
  "extractDocuments": true,
  "extractOtherMedia": true,
  "maxMediaPerDomain": 100,
  "crawlSameDomainOnly": true,
  "requestTimeoutSecs": 30,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
````

### Important Input Options

| Field | Default | Description |
| --- | --- | --- |
| `startUrls` | required | Websites or domains to crawl. |
| `maxPagesPerDomain` | `20` | Hard limit for pages crawled per input website. |
| `maxConcurrency` | `5` | Number of websites processed in parallel. |
| `useStealth` | `false` | Uses Scrapling's stealth browser fetcher. Slower, intended only for public pages that need browser rendering. |
| `respectRobotsTxt` | `true` | Skips URLs disallowed by robots.txt. |
| `extractAutomationSignals` | `true` | Detects public booking, form, chat, and WhatsApp signals. |
| `extractMedia` | `false` | Enables media URL discovery. Files are not downloaded. |
| `maxMediaPerDomain` | `100` | Maximum media asset URLs returned per website. |
| `crawlSameDomainOnly` | `true` | Stays on the same normalized host. A crawl that starts at `docs.example.com` does not follow links to `blog.example.com`. |
| `proxyConfiguration` | disabled | Optional Apify Proxy configuration for public sites that rate-limit datacenter traffic. |
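
To make the `crawlSameDomainOnly` behavior concrete, here is a minimal Python sketch of a same-host check. It is not the Actor's code: the function names are hypothetical, and stripping a leading `www.` is an assumed normalization rule that the README does not specify.

```python
from urllib.parse import urlsplit


def normalized_host(url: str) -> str:
    """Return the lowercase host without a leading 'www.' (assumed rule)."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_same_site(start_url: str, candidate_url: str) -> bool:
    """True when both URLs resolve to the same normalized host."""
    start = normalized_host(start_url)
    return bool(start) and start == normalized_host(candidate_url)
```

Under these assumptions, a link from `https://docs.example.com/` to `https://blog.example.com/post` is rejected, while a link to `https://docs.example.com/guide` is kept.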

### Media Extraction

Media discovery is disabled by default because the Actor is primarily a lead intelligence tool.

Set `extractMedia` to `true` to collect public URLs for:

- images from `img`, `source`, `srcset`, Open Graph, Twitter image, icons, and CSS `url(...)`
- videos from video tags and public embeds such as YouTube, Vimeo, Wistia, Loom, and Vidyard
- documents such as PDF, DOCX, PPTX, XLSX, CSV, and TXT
- audio files, archives, and other recognized media file URLs

The Actor records media URLs only. It does not download, store, transform, or rehost media files.
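
As an illustration of how discovered URLs could be grouped into the `images`, `videos`, `documents`, and `other` buckets, here is a small Python sketch that classifies by file extension. The document extensions come from the list above; the image, video, and other extension sets and the function name are assumptions, and the Actor's real detection (for example of embeds) may differ.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Document extensions are taken from the README; the other sets are assumed.
EXTENSIONS = {
    "images": {"png", "jpg", "jpeg", "gif", "webp", "svg"},
    "videos": {"mp4", "webm", "mov"},
    "documents": {"pdf", "docx", "pptx", "xlsx", "csv", "txt"},
    "other": {"mp3", "wav", "zip"},
}


def media_category(url: str):
    """Return the bucket name for a URL, or None if the extension is unknown."""
    # Only the URL path is inspected, so query strings do not affect the result.
    ext = PurePosixPath(urlsplit(url).path).suffix.lstrip(".").lower()
    for category, known in EXTENSIONS.items():
        if ext in known:
            return category
    return None
```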

### Output

The Actor pushes one item per input website to the default dataset and stores a run summary in the default Key-Value Store under `OUTPUT_SUMMARY`.

Example dataset item:

```json
{
  "startUrl": "https://example.com",
  "domain": "example.com",
  "siteHost": "example.com",
  "companyName": "Example GmbH",
  "title": "Example GmbH - Digital Services",
  "description": "Example company description...",
  "primaryEmail": "info@example.com",
  "primaryPhone": "+49 30 123456",
  "emails": ["info@example.com"],
  "phones": ["+49 30 123456"],
  "socialLinks": {
    "linkedin": "https://linkedin.com/company/example",
    "instagram": "https://instagram.com/example"
  },
  "mediaSummary": {
    "images": 12,
    "videos": 1,
    "documents": 2,
    "other": 0,
    "total": 15
  },
  "mediaAssets": {
    "images": [
      {
        "url": "https://example.com/assets/logo.png",
        "sourcePage": "https://example.com",
        "extension": "png"
      }
    ],
    "videos": [
      {
        "url": "https://www.youtube.com/embed/example",
        "sourcePage": "https://example.com"
      }
    ],
    "documents": [
      {
        "url": "https://example.com/company-brochure.pdf",
        "sourcePage": "https://example.com/about",
        "extension": "pdf"
      }
    ],
    "other": []
  },
  "contactPage": "https://example.com/contact",
  "aboutPage": "https://example.com/about",
  "addressLikeText": ["Example Street 12, 10115 Berlin"],
  "contactMethods": {
    "hasEmail": true,
    "hasPhone": true,
    "hasContactPage": true,
    "hasContactForm": true,
    "hasSocialProfile": true
  },
  "automationSignals": {
    "hasOnlineBooking": false,
    "hasChatWidget": false,
    "hasContactForm": true,
    "hasWhatsappLink": false
  },
  "opportunitySignals": {
    "missingOnlineBooking": true,
    "missingChatWidget": true,
    "missingWhatsappLink": true,
    "missingContactForm": false,
    "hasMessagingGap": true,
    "hasAutomationGap": true
  },
  "siteClassification": {
    "type": "business_website",
    "businessWebsiteLikely": true,
    "reason": "Business contact or outreach signals were detected on crawled pages."
  },
  "recommendedAction": "Prioritize outbound: public email found and automation gap detected.",
  "leadScore": 78,
  "leadScoreLabel": "high",
  "confidence": 0.84,
  "confidenceLabel": "high",
  "confidenceReasons": [
    "Crawled 12 public page(s).",
    "Public email address found.",
    "Public phone number found.",
    "Likely contact page found.",
    "Company identity inferred from page metadata, title, schema, logo, or domain."
  ],
  "pagesCrawled": 12,
  "errors": [],
  "sourcePages": [
    "https://example.com",
    "https://example.com/contact",
    "https://example.com/about"
  ],
  "crawlSummary": {
    "pagesCrawled": 12,
    "emailsFound": 1,
    "phonesFound": 1,
    "socialProfilesFound": 2,
    "mediaAssetsFound": 15,
    "contactPageFound": true,
    "aboutPageFound": true,
    "errorsFound": 0
  }
}
```
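
The run summary stored under `OUTPUT_SUMMARY` can be read through the Apify REST API's key-value store record endpoint. The sketch below uses only the Python standard library. It assumes the record is JSON, that `store_id` is the run's `defaultKeyValueStoreId`, and that the token is sent as a Bearer header; the function names are hypothetical, and the shape of the summary is not documented here, so the parsed value is returned as-is.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

API_BASE = "https://api.apify.com/v2"


def summary_record_url(store_id: str, key: str = "OUTPUT_SUMMARY") -> str:
    """Build the record URL for a key in the given key-value store."""
    return f"{API_BASE}/key-value-stores/{quote(store_id, safe='')}/records/{quote(key, safe='')}"


def fetch_run_summary(store_id: str, token: str):
    """Download and parse the OUTPUT_SUMMARY record (assumed to be JSON)."""
    request = Request(
        summary_record_url(store_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urlopen(request, timeout=30) as response:
        return json.load(response)
```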

### Output Fields

| Field | Description |
| --- | --- |
| `domain` | Registered domain, for example `example.com`. |
| `siteHost` | Actual host crawled, for example `docs.example.com`. |
| `companyName` | Best-effort company name from title, metadata, schema, logo alt text, or domain. |
| `primaryEmail`, `primaryPhone` | First selected contact candidates for workflow-friendly use. |
| `emails`, `phones` | Deduplicated public contact data found on crawled pages. |
| `socialLinks` | Public social profile URLs grouped by platform. |
| `mediaSummary`, `mediaAssets` | Media counts and URLs when `extractMedia` is enabled. |
| `contactMethods` | Boolean summary of reachable contact methods. |
| `automationSignals` | Detected booking, chat, form, and WhatsApp signals. |
| `opportunitySignals` | Missing automation/contact signals useful for outreach review. |
| `siteClassification` | Best-effort site type classification: `business_website`, `documentation`, `blog`, `ecommerce`, or `unknown`. |
| `leadScore` | Transparent opportunity score from `0` to `100`. |
| `confidence`, `confidenceReasons` | Extraction confidence from `0` to `1` and short reasons explaining the confidence. |
| `engine`, `engineRepository` | Scraping engine metadata for auditability and workflow routing. |
| `crawlSummary` | Compact summary for dashboards and automation filters. |

#### Reliability

- Uses an input schema so Apify validates required input before the run starts.
- Uses an output schema so users, API clients, and AI agents know where to find results.
- Pushes one dataset item per input website, even when no contact data is found.
- Fails gracefully per URL and records non-fatal crawl errors in the output item.
- Stores a run-level `OUTPUT_SUMMARY` record in the default Key-Value Store.
- Uses bounded crawling with `maxPagesPerDomain`, `maxConcurrency`, and request timeouts.
- Runs under Apify limited permissions and does not require account credentials.

#### Automated Test Readiness

Apify's automated Store test expects the Actor's default/prefilled input to finish successfully and produce a non-empty default dataset within a short time window.

Recommended smoke-test input:

```json
{
  "startUrls": [
    {
      "url": "https://docs.apify.com/"
    }
  ],
  "maxPagesPerDomain": 10,
  "maxConcurrency": 1,
  "respectRobotsTxt": true,
  "extractMedia": false
}
```

Expected smoke-test result:

- run status: succeeded
- default dataset: non-empty
- one domain-level item pushed
- no uncaught `ReferenceError`, `TypeError`, or Python traceback
- `OUTPUT_SUMMARY` present in the Key-Value Store

#### Ease of Use

- Provides form-friendly input controls for URLs, crawling limits, concurrency, robots.txt, contact extraction, automation signals, media extraction, timeout, and proxy settings.
- Uses conservative defaults for normal public website enrichment.
- Keeps media extraction disabled by default to reduce output size and cost.
- Returns CRM-friendly fields such as `primaryEmail`, `primaryPhone`, `leadScore`, `confidence`, `recommendedAction`, and `crawlSummary`.

#### Trust and Safety

- Crawls public pages only.
- Respects robots.txt when enabled.
- Avoids authenticated, private, checkout, account, and obvious sensitive paths.
- Does not submit forms.
- Does not solve CAPTCHAs.
- Does not perform aggressive anti-bot bypassing.
- Does not download or rehost media files; media mode records public URLs only.

#### Congruency

The Actor title, description, input schema, output schema, dataset view, README, and monetization events use the same terminology:

- website/domain lead record
- basic lead record
- qualified lead record
- media assets
- automation signals
- lead score
- confidence
- crawl summary

This consistency is intentional because Apify's quality score considers whether an Actor's text, schemas, and behavior align.

### Lead Score

The score is intentionally simple and transparent. It is an outreach opportunity score, not a business quality score.

Example scoring factors:

- `+20` email found
- `+20` phone found
- `+15` contact page found
- `+10` social profile found
- `+10` contact form found
- `+15` appointment-based business appears to lack online booking
- `+15` no chat, WhatsApp, or similar messaging automation detected
- capped at `100`

Use the score for prioritization and human review, not automated eligibility decisions.
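
The example factors above can be sketched as a small additive function over a dataset item. This is an illustration only, not the Actor's scoring code: the function name is hypothetical, `missingOnlineBooking` stands in for the "appointment-based business" check, `hasMessagingGap` stands in for the messaging factor, and the result will not necessarily match the `leadScore` the Actor returns.

```python
def example_lead_score(item: dict) -> int:
    """Sum the example factors for one dataset item and cap the total at 100."""
    contact = item.get("contactMethods") or {}
    gaps = item.get("opportunitySignals") or {}
    factors = [
        (20, contact.get("hasEmail")),
        (20, contact.get("hasPhone")),
        (15, contact.get("hasContactPage")),
        (10, contact.get("hasSocialProfile")),
        (10, contact.get("hasContactForm")),
        (15, gaps.get("missingOnlineBooking")),  # simplification, see lead-in
        (15, gaps.get("hasMessagingGap")),       # assumed mapping
    ]
    return min(100, sum(points for points, present in factors if present))
```

With every factor present the raw sum is `105`, so the cap at `100` applies.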

### Confidence

Confidence is separate from lead score. It increases when more useful pages are crawled, contact/about pages are found, contact details are detected, and multiple signals confirm the company identity. It decreases when pages fail, data is sparse, or identity/contact signals are weak.

### Recommended Workflows

#### CRM Enrichment

1. Upload a list of company websites.
2. Extract email, phone, social profiles, contact page, score, and confidence.
3. Export the dataset to CSV or send it to HubSpot, Pipedrive, Airtable, or Google Sheets.
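
For the export step, a flat CSV can be produced from dataset items with the Python standard library. The sketch below is one possible layout, not a format the Actor emits: the column selection and the `items_to_csv` name are assumptions, and social links are joined with `;` to fit in a single cell.

```python
import csv
import io

# Assumed column selection; all names are fields from the example dataset item.
COLUMNS = ["domain", "companyName", "primaryEmail", "primaryPhone",
           "contactPage", "leadScore", "confidence"]


def items_to_csv(items: list[dict]) -> str:
    """Flatten dataset items into CSV text with one row per website."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS + ["socialLinks"],
                            lineterminator="\n")
    writer.writeheader()
    for item in items:
        row = {name: item.get(name, "") for name in COLUMNS}
        # socialLinks is an object keyed by platform; keep only the URLs.
        row["socialLinks"] = ";".join(sorted((item.get("socialLinks") or {}).values()))
        writer.writerow(row)
    return buffer.getvalue()
```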

#### AI Receptionist or Booking Automation Leads

Filter for websites with:

- phone number present
- email or contact page present
- `opportunitySignals.hasAutomationGap = true`
- missing booking, chat, WhatsApp, or contact form
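
The criteria above can be expressed as a predicate over a dataset item. This Python sketch uses the field names from the example item; the function name is hypothetical, and treating "missing booking, chat, WhatsApp, or contact form" as "at least one of the four `missing*` flags is true" is an assumed reading.

```python
MISSING_FLAGS = ("missingOnlineBooking", "missingChatWidget",
                 "missingWhatsappLink", "missingContactForm")


def is_automation_lead(item: dict) -> bool:
    """True when the item matches the receptionist/booking lead criteria."""
    contact = item.get("contactMethods") or {}
    gaps = item.get("opportunitySignals") or {}
    reachable = contact.get("hasEmail") or contact.get("hasContactPage")
    missing_channel = any(gaps.get(flag) for flag in MISSING_FLAGS)
    return bool(
        contact.get("hasPhone")
        and reachable
        and gaps.get("hasAutomationGap")
        and missing_channel
    )
```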

#### n8n Automation

1. Trigger this Actor from n8n.
2. Read the default dataset items.
3. Filter by `leadScore`, `confidence`, and `opportunitySignals`.
4. Send qualified records to Google Sheets, Airtable, HubSpot, Slack, or an outreach queue.

Example filter:

```text
leadScore >= 70
AND confidence >= 0.7
AND opportunitySignals.hasAutomationGap = true
```
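
Outside n8n, the same filter can be applied in code before records are forwarded. The Python sketch below mirrors the three conditions of the example filter and additionally sorts the result by `leadScore`, highest first; the sorting and the `qualified_queue` name are additions for illustration, not part of the Actor.

```python
def qualified_queue(items, min_score=70, min_confidence=0.7):
    """Keep items that pass the example filter, highest lead score first."""

    def passes(item):
        gaps = item.get("opportunitySignals") or {}
        return (
            (item.get("leadScore") or 0) >= min_score
            and (item.get("confidence") or 0) >= min_confidence
            and gaps.get("hasAutomationGap") is True
        )

    # Every item that passes has a numeric leadScore, so the sort key is safe.
    return sorted(filter(passes, items), key=lambda item: item["leadScore"], reverse=True)
```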

### Proxy and Stealth Use

The default run does not use Apify Proxy and does not use stealth fetching.

For larger public crawls or sites that rate-limit datacenter traffic, enable Apify Proxy:

```json
{
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

Use `useStealth: true` only when public pages need browser rendering. This Actor does not solve CAPTCHAs, submit forms, scrape authenticated content, or perform aggressive anti-bot bypassing.

### Performance Tips

- Keep `maxPagesPerDomain` between `5` and `20` for quick enrichment.
- Use `20` to `50` pages for deeper lead analysis.
- Disable `extractMedia` unless you need media URLs.
- Keep `crawlSameDomainOnly` enabled for cleaner results.
- Use moderate concurrency for large input lists.
- Enable proxy only when needed.

### Limitations

Websites vary widely. Some sites hide contact details behind JavaScript, publish contact data as images, block automated requests, use ambiguous phone/address formats, or disallow crawling in robots.txt.

Automation signals are best-effort public-page signals. They should be treated as review hints, not guarantees.

### Ethical Usage

Use this Actor only on public web pages and for legitimate business purposes. Respect robots.txt when enabled and comply with applicable privacy, marketing, platform, and data protection rules.

Do not use this Actor for spam, harassment, credential collection, sensitive profiling, scraping private or authenticated data, bypassing access restrictions, or deceptive outreach.

Always review leads before contacting them.

# Actor input Schema

## `startUrls` (type: `array`):

Websites or domains to crawl. One lead intelligence record is produced for each domain.

## `maxPagesPerDomain` (type: `integer`):

Hard limit for internal pages crawled per input domain.

## `maxConcurrency` (type: `integer`):

Maximum number of domains processed in parallel.

## `useStealth` (type: `boolean`):

Use Scrapling's stealth browser fetcher. This is slower and intended for public pages that need JavaScript rendering; it does not perform aggressive bypassing.

## `respectRobotsTxt` (type: `boolean`):

Check robots.txt and skip disallowed paths for a generic crawler user agent.

## `extractEmails` (type: `boolean`):

Extract public email addresses from crawled pages.

## `extractPhones` (type: `boolean`):

Extract conservative phone number candidates from crawled page text.

## `extractSocialLinks` (type: `boolean`):

Extract public social profile links for common platforms.

## `extractContactPages` (type: `boolean`):

Detect likely contact and about page URLs during the crawl.

## `extractAutomationSignals` (type: `boolean`):

Detect contact forms, booking tools, chat widgets, WhatsApp links, and similar public page signals.

## `extractMedia` (type: `boolean`):

Discover public image, video, document, audio, archive, and embed URLs referenced by crawled pages. Files are not downloaded.

## `extractImages` (type: `boolean`):

When media extraction is enabled, include image URLs from img, source, srcset, poster, and CSS url references.

## `extractVideos` (type: `boolean`):

When media extraction is enabled, include video file URLs and public video embeds such as YouTube, Vimeo, Wistia, Loom, and Vidyard.

## `extractDocuments` (type: `boolean`):

When media extraction is enabled, include document URLs such as PDF, DOCX, PPTX, XLSX, CSV, and text files.

## `extractOtherMedia` (type: `boolean`):

When media extraction is enabled, include audio, archive, and other recognized media file URLs.

## `maxMediaPerDomain` (type: `integer`):

Maximum number of media asset URLs included in each domain output item.

## `crawlSameDomainOnly` (type: `boolean`):

Restrict page crawling to the same normalized host as the input URL. For example, docs.example.com stays on docs.example.com.

## `requestTimeoutSecs` (type: `integer`):

Timeout per fetched page in seconds.

## `proxyConfiguration` (type: `object`):

Optional Apify Proxy configuration. Useful for geo-distributed public crawling when targets rate-limit datacenter traffic.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://example.com"
    }
  ],
  "maxPagesPerDomain": 20,
  "maxConcurrency": 5,
  "useStealth": false,
  "respectRobotsTxt": true,
  "extractEmails": true,
  "extractPhones": true,
  "extractSocialLinks": true,
  "extractContactPages": true,
  "extractAutomationSignals": true,
  "extractMedia": false,
  "extractImages": true,
  "extractVideos": true,
  "extractDocuments": true,
  "extractOtherMedia": true,
  "maxMediaPerDomain": 100,
  "crawlSameDomainOnly": true,
  "requestTimeoutSecs": 30,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```
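
When the input object is built in code, a light client-side check can catch out-of-range values before a run is started. The Python sketch below mirrors the `minimum` and `maximum` values given for these three integer fields in the OpenAPI specification further down. The `build_input` helper is hypothetical, and rejecting an empty URL list is an assumption, since the schema only marks `startUrls` as required.

```python
# Ranges copied from the integer fields of the OpenAPI input schema.
INTEGER_LIMITS = {
    "maxPagesPerDomain": (1, 200),
    "maxConcurrency": (1, 25),
    "maxMediaPerDomain": (0, 1000),
}


def build_input(urls, **options):
    """Return an Actor input dict, raising ValueError on invalid values."""
    if not urls:
        raise ValueError("startUrls must contain at least one URL")
    for name, (low, high) in INTEGER_LIMITS.items():
        if name in options and not low <= options[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    # Options that are omitted fall back to the Actor's own defaults.
    return {"startUrls": [{"url": url} for url in urls], **options}
```

For example, `build_input(["https://example.com"], maxPagesPerDomain=10)` returns a dict that can be passed as `run_input` in the Python example in the API section.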

# Actor output Schema

## `leadResults` (type: `string`):

Structured lead intelligence records. One item is pushed for each input domain.

## `runSummary` (type: `string`):

Aggregated crawl and lead extraction summary for the Actor run.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://example.com"
        }
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("solutionssmart/adaptive-website-lead-extractor").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "startUrls": [{ "url": "https://example.com" }] }

# Run the Actor and wait for it to finish
run = client.actor("solutionssmart/adaptive-website-lead-extractor").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://example.com"
    }
  ]
}' |
apify call solutionssmart/adaptive-website-lead-extractor --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=solutionssmart/adaptive-website-lead-extractor",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Adaptive Website Lead Extractor",
        "description": "Crawl public business websites with Scrapling to extract emails, phones, social profiles, contact pages, automation gaps, and lead scores for CRM-ready outreach.",
        "version": "0.1",
        "x-build-id": "BECDTslymstuBaogY"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/solutionssmart~adaptive-website-lead-extractor/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-solutionssmart-adaptive-website-lead-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/solutionssmart~adaptive-website-lead-extractor/runs": {
            "post": {
                "operationId": "runs-sync-solutionssmart-adaptive-website-lead-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/solutionssmart~adaptive-website-lead-extractor/run-sync": {
            "post": {
                "operationId": "run-sync-solutionssmart-adaptive-website-lead-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls"
                ],
                "properties": {
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "Websites or domains to crawl. One lead intelligence record is produced for each domain.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "maxPagesPerDomain": {
                        "title": "Max pages per domain",
                        "minimum": 1,
                        "maximum": 200,
                        "type": "integer",
                        "description": "Hard limit for internal pages crawled per input domain.",
                        "default": 20
                    },
                    "maxConcurrency": {
                        "title": "Max concurrency",
                        "minimum": 1,
                        "maximum": 25,
                        "type": "integer",
                        "description": "Maximum number of domains processed in parallel.",
                        "default": 5
                    },
                    "useStealth": {
                        "title": "Use stealth browser fetching",
                        "type": "boolean",
                        "description": "Use Scrapling's stealth browser fetcher. This is slower and intended for public pages that need JavaScript rendering; it does not perform aggressive bypassing.",
                        "default": false
                    },
                    "respectRobotsTxt": {
                        "title": "Respect robots.txt",
                        "type": "boolean",
                        "description": "Check robots.txt and skip disallowed paths for a generic crawler user agent.",
                        "default": true
                    },
                    "extractEmails": {
                        "title": "Extract emails",
                        "type": "boolean",
                        "description": "Extract public email addresses from crawled pages.",
                        "default": true
                    },
                    "extractPhones": {
                        "title": "Extract phone numbers",
                        "type": "boolean",
                        "description": "Extract conservative phone number candidates from crawled page text.",
                        "default": true
                    },
                    "extractSocialLinks": {
                        "title": "Extract social links",
                        "type": "boolean",
                        "description": "Extract public social profile links for common platforms.",
                        "default": true
                    },
                    "extractContactPages": {
                        "title": "Extract contact/about pages",
                        "type": "boolean",
                        "description": "Detect likely contact and about page URLs during the crawl.",
                        "default": true
                    },
                    "extractAutomationSignals": {
                        "title": "Extract automation signals",
                        "type": "boolean",
                        "description": "Detect contact forms, booking tools, chat widgets, WhatsApp links, and similar public page signals.",
                        "default": true
                    },
                    "extractMedia": {
                        "title": "Extract media assets",
                        "type": "boolean",
                        "description": "Discover public image, video, document, audio, archive, and embed URLs referenced by crawled pages. Files are not downloaded.",
                        "default": false
                    },
                    "extractImages": {
                        "title": "Extract images",
                        "type": "boolean",
                        "description": "When media extraction is enabled, include image URLs from img, source, srcset, poster, and CSS url references.",
                        "default": true
                    },
                    "extractVideos": {
                        "title": "Extract videos",
                        "type": "boolean",
                        "description": "When media extraction is enabled, include video file URLs and public video embeds such as YouTube, Vimeo, Wistia, Loom, and Vidyard.",
                        "default": true
                    },
                    "extractDocuments": {
                        "title": "Extract documents",
                        "type": "boolean",
                        "description": "When media extraction is enabled, include document URLs such as PDF, DOCX, PPTX, XLSX, CSV, and text files.",
                        "default": true
                    },
                    "extractOtherMedia": {
                        "title": "Extract other media",
                        "type": "boolean",
                        "description": "When media extraction is enabled, include audio, archive, and other recognized media file URLs.",
                        "default": true
                    },
                    "maxMediaPerDomain": {
                        "title": "Max media assets per domain",
                        "minimum": 0,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Maximum number of media asset URLs included in each domain output item.",
                        "default": 100
                    },
                    "crawlSameDomainOnly": {
                        "title": "Crawl same site host only",
                        "type": "boolean",
                        "description": "Restrict page crawling to the same normalized host as the input URL. For example, docs.example.com stays on docs.example.com.",
                        "default": true
                    },
                    "requestTimeoutSecs": {
                        "title": "Request timeout",
                        "minimum": 5,
                        "maximum": 120,
                        "type": "integer",
                        "description": "Timeout per fetched page in seconds.",
                        "default": 30
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Optional Apify Proxy configuration. Useful for geo-distributed public crawling when targets rate-limit datacenter traffic.",
                        "default": {
                            "useApifyProxy": false
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
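
The toggles and numeric limits in the input schema above can be checked locally before a run is started. The following is a minimal sketch, not part of the Actor or of any Apify library: `build_input_options`, `DEFAULTS`, and `INTEGER_BOUNDS` are hypothetical names, and the sketch covers only the properties from `extractEmails` through `proxyConfiguration`, with defaults and bounds copied from the schema. Any other input fields the Actor needs must be added by the caller.

```python
import copy
from typing import Any

# Defaults copied from the input schema above (hypothetical helper, not an Apify API).
DEFAULTS: dict[str, Any] = {
    "extractEmails": True,
    "extractPhones": True,
    "extractSocialLinks": True,
    "extractContactPages": True,
    "extractAutomationSignals": True,
    "extractMedia": False,
    "extractImages": True,
    "extractVideos": True,
    "extractDocuments": True,
    "extractOtherMedia": True,
    "maxMediaPerDomain": 100,
    "crawlSameDomainOnly": True,
    "requestTimeoutSecs": 30,
    "proxyConfiguration": {"useApifyProxy": False},
}

# Inclusive minimum/maximum pairs, also taken from the schema.
INTEGER_BOUNDS: dict[str, tuple[int, int]] = {
    "maxMediaPerDomain": (0, 1000),
    "requestTimeoutSecs": (5, 120),
}


def build_input_options(**overrides: Any) -> dict[str, Any]:
    """Merge overrides into the schema defaults and validate types and ranges."""
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise KeyError(f"Unknown input option(s): {', '.join(unknown)}")

    # Deep copy so the nested proxyConfiguration default is never shared.
    options = copy.deepcopy(DEFAULTS)
    options.update(overrides)

    for name, default in DEFAULTS.items():
        # Every boolean toggle in the schema must stay a real boolean.
        if isinstance(default, bool) and not isinstance(options[name], bool):
            raise TypeError(f"{name} must be a boolean")

    for name, (low, high) in INTEGER_BOUNDS.items():
        value = options[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an integer")
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")

    if not isinstance(options["proxyConfiguration"], dict):
        raise TypeError("proxyConfiguration must be an object")

    return options


if __name__ == "__main__":
    # Enable media discovery but cap the number of asset URLs per domain.
    print(build_input_options(extractMedia=True, maxMediaPerDomain=250))
```

The returned value is a plain dictionary, so it can be merged with the remaining input fields and sent as the run input. Per the schema descriptions, the `extractImages`, `extractVideos`, `extractDocuments`, and `extractOtherMedia` toggles only take effect when `extractMedia` is `true`.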
