# GPT Crawler MCP — Knowledge files for ChatGPT, Claude, RAG (`kazkn/gpt-crawler-mcp`) Actor

Crawl any website and turn it into a clean knowledge file for your custom GPT, Claude Project, or RAG pipeline. Native MCP server in Standby mode + classic batch mode.

- **URL**: https://apify.com/kazkn/gpt-crawler-mcp.md
- **Developed by:** [KazKN](https://apify.com/kazkn) (community)
- **Categories:** AI, MCP servers, Developer tools
- **Stats:** 1 total user, 0 monthly users, 0.0% of runs succeeded
- **User rating**: No ratings yet

## Pricing

From $35.00 / 1,000 MCP tool calls

This Actor is priced per event plus platform usage: you pay a fixed price for specific events on top of the Apify platform resources the run consumes.
Because this Actor supports Apify Store discounts, the higher your subscription plan, the lower the price.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools that run on the Apify platform and cover all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts well-defined JSON input, performs an action that can take anywhere from a few seconds to a few hours,
and optionally produces well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor runs a web server that can be used as a website, an API, or an MCP server.
"Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## GPT Crawler MCP — Build knowledge files for ChatGPT, Claude Projects & RAG in one click

> **Crawl any website. Get a clean JSON knowledge file. Plug it into your custom GPT, Claude Project, or RAG pipeline. Now also as MCP server for AI agents.**

[![Apify Actor](https://img.shields.io/badge/Apify-Actor-97D700?logo=apify&logoColor=000)](https://apify.com/kazkn/gpt-crawler-mcp)
[![License: ISC](https://img.shields.io/badge/License-ISC-blue.svg)](#license)
[![Built on BuilderIO/gpt-crawler](https://img.shields.io/badge/Built%20on-BuilderIO%2Fgpt--crawler-orange)](https://github.com/BuilderIO/gpt-crawler)

---

### 🎯 Why this Actor

- **No more `pip install` + Python venv broken at 11pm.** One click, no setup, no local Chromium dance.
- **Built on the legendary [BuilderIO/gpt-crawler](https://github.com/BuilderIO/gpt-crawler)** (19k+ GitHub stars, ISC) — battle-tested crawl logic, wrapped for Apify and extended with MCP.
- **Pay only for what you crawl, no subscription.** $0.001 per page in batch mode (flat $0.05 per MCP tool call), hard-capped by your `maxPagesToCrawl`. No monthly fee.

---

### 📚 What is a "knowledge file"?

A single JSON (or Markdown / plain-text) file containing the cleaned content of every page on a docs site, blog, or knowledge base. You upload it to:

- **ChatGPT** → custom GPT → "Knowledge" → drop the file.
- **Claude Projects** → "Project knowledge" → drop the file.
- **RAG pipelines** → embed it, store in Pinecone / pgvector / Weaviate.
- **AI agents** → call this Actor's MCP server live, no pre-indexing.

That's the whole pitch: turn a website into LLM-ready context in one click.

---

### ⚖️ How it stacks up

| Feature | Run BuilderIO/gpt-crawler locally | Firecrawl ($39/mo+) | **GPT Crawler MCP (this Actor)** |
|---|---|---|---|
| Setup time | 15 min (clone, `npm i`, Playwright install, fight ESM errors) | 5 min (account + API key) | **0 — one click** |
| Pricing | Free + your time + your laptop | $39/mo flat | **$0.001 / page (PPE), no subscription** |
| MCP server mode for AI agents | No | No | **Yes — Apify Standby** |
| Auto retries / proxy rotation | Manual | Yes | **Yes (Apify infra)** |
| n8n / Zapier / Make integrations | No | No | **Yes (Apify connectors)** |
| Output as JSON / Markdown / plain text | JSON only | JSON / Markdown | **JSON / Markdown / TXT** |
| Headless browser (JS-rendered sites) | Yes | Yes | **Yes (Playwright + Chromium)** |

---

### 🚀 Quick start

#### 1. From the Apify Console (recommended)

1. Go to [apify.com/kazkn/gpt-crawler-mcp](https://apify.com/kazkn/gpt-crawler-mcp).
2. Click **Try for free**.
3. Paste your start URL (e.g. `https://docs.your-product.com`), set match pattern (`https://docs.your-product.com/**`) and `maxPagesToCrawl`, and click **Start**.
4. When the run finishes, download `output.json` from the **Storage → Key-value store** tab (or grab the dataset).

#### 2. From the API

```bash
curl -X POST "https://api.apify.com/v2/acts/kazkn~gpt-crawler-mcp/run-sync-get-dataset-items?token=YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://docs.your-product.com"],
    "match": "https://docs.your-product.com/**",
    "maxPagesToCrawl": 50
  }'
````

#### 3. From n8n / Zapier / Make

Search for the **Apify** connector → action **Run an Actor** → pick `kazkn/gpt-crawler-mcp` → wire your inputs.

***

### ⚙️ Input parameters

| Field | Type | Default | Description |
|---|---|---|---|
| `urls` | `string[]` | — (required) | Start URLs. Sitemap `.xml` URLs are auto-detected. |
| `match` | `string` | `**` | Glob pattern controlling which links to follow. |
| `selector` | `string` | `body` | CSS or XPath selector for content extraction. |
| `maxPagesToCrawl` | `integer` | `10` | Hard cap on pages crawled (also caps your cost). Max 1000. |
| `outputFileName` | `string` | `output.json` | Name of the combined knowledge file. |
| `outputFormat` | `enum` | `json` | `json` / `markdown` / `txt`. |
| `headless` | `boolean` | `true` | Run Chromium headless. |
| `waitForSelectorTimeout` | `integer` | `1000` | ms to wait for the selector. |
| `cookie` | `string` | — | Optional `name=value` cookie (for cookie-walls or auth). |
| `maxTokens` | `integer` | `0` | Optional cap on tokens per output file. `0` = no limit. |
| `mcpMode` | `boolean` | `false` | Run as MCP server (Standby mode). |

***

### 📦 Output format example

Each page becomes a dataset item:

```json
{
  "url": "https://docs.your-product.com/getting-started",
  "title": "Getting started — YourProduct docs",
  "html": "Welcome to YourProduct...",
  "text": "Welcome to YourProduct. This guide walks you through the first 5 minutes...",
  "tokens": 412,
  "crawledAt": "2026-04-27T09:14:22.181Z"
}
```

The combined knowledge file (Key-value store → `output.json`) is the same array, ready to upload to ChatGPT / Claude / a vector store.
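As a sketch, you can sanity-check the combined file before uploading it. The inline `pages` list below stands in for `json.load(open("output.json"))` and mirrors the item shape shown above; the token threshold is an illustrative choice, not Actor behavior:

```python
import json  # in a real run: pages = json.load(open("output.json"))

# Stand-in data mirroring the dataset-item shape above.
pages = [
    {"url": "https://docs.your-product.com/getting-started",
     "title": "Getting started", "text": "Welcome to YourProduct...", "tokens": 412},
    {"url": "https://docs.your-product.com/api",
     "title": "API reference", "text": "All endpoints require...", "tokens": 1180},
]

total_tokens = sum(p["tokens"] for p in pages)
# Drop near-empty pages (nav stubs, redirects) before uploading.
useful = [p for p in pages if p["tokens"] >= 50]
print(f"{len(useful)}/{len(pages)} pages kept, {total_tokens} tokens total")
```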

***

### 🤖 MCP server mode (for ChatGPT, Claude Desktop & AI agents)

This Actor can run as a **persistent MCP server** via [Apify Standby](https://docs.apify.com/platform/actors/running/standby). Instead of pre-crawling a site and uploading a static file, your AI agent calls the `crawl_to_knowledge` tool **live**, on demand.

#### 🔧 Tool exposed

| Tool | Description |
|---|---|
| `crawl_to_knowledge` | Crawl a website and return a JSON knowledge file (array of pages with `title`, `url`, `text`, `tokens`). |

#### 💬 Add to Claude Desktop

Edit `~/Library/Application Support/Claude/claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "gpt-crawler": {
      "url": "https://kazkn--gpt-crawler-mcp.apify.actor/mcp?token=YOUR_APIFY_TOKEN"
    }
  }
}
```

#### 🟢 Add to ChatGPT (custom GPT, MCP-compatible clients)

Use the same `https://kazkn--gpt-crawler-mcp.apify.actor/mcp?token=...` URL as your MCP server endpoint.

The tool is then callable in-conversation: *"Crawl `docs.stripe.com/api`, return the first 30 pages as a knowledge file"*.

#### ⏱️ Client compatibility & timeouts

**Read this if you see "interrupted connection" or "invalid authentication" errors.**

A `crawl_to_knowledge` call takes anywhere from **15 seconds to several minutes**, depending on `maxPagesToCrawl`, target-site latency, and JS-rendering needs:

| Pages crawled | Typical wall-clock |
|---|---|
| 5 pages | 10-25 s |
| 30 pages | 45-90 s |
| 100 pages | 2-4 min |
| 500+ pages | 5+ min |

Most MCP clients ship with default timeouts of **30 seconds** — too short. Configure your client with **120 seconds minimum** (180 s if you crawl 100+ pages).

##### Recommended timeout per client

| Client | Default | Recommended | How to configure |
|---|---|---|---|
| **Claude Desktop** | 30 s | 180 s | Add `"timeout": 180000` to your server entry in `claude_desktop_config.json` |
| **Cursor IDE** | 30 s | 180 s | Settings → MCP → Request timeout (ms) → `180000` |
| **Windsurf** | 60 s | 180 s | MCP config → `requestTimeoutMs: 180000` |
| **Continue.dev** | 30 s | 180 s | `requestTimeoutMs: 180000` in MCP config |
| **langchain-mcp** (Python) | none | 180 s | `MultiServerMCPClient(..., timeout=180)` |
| **@modelcontextprotocol/sdk** (npm) | 30 s | 180 s | `new Client({...}, { requestTimeoutMs: 180000 })` |

##### Claude Desktop config example

```json
{
  "mcpServers": {
    "gpt-crawler": {
      "type": "url",
      "url": "https://kazkn--gpt-crawler-mcp.apify.actor/mcp?token=YOUR_APIFY_TOKEN",
      "timeout": 180000
    }
  }
}
```

##### Best practices to avoid timeouts

1. **Start with `maxPagesToCrawl: 10`** to validate the site works, then scale up.
2. **One crawl at a time per client.** Sequential calls are reliable; concurrent calls hit your client's pool limit.
3. **Cold-start adds 5-8 s.** First request after idle wakes the Actor. Consecutive crawls within 60 s share a warm instance.
4. **For very large sites (1000+ pages)**, use **batch mode** (run the Actor with input from the Console) instead of MCP. Batch runs have a 1-hour default timeout and no MCP-client-side timeout to fight.

##### Troubleshooting common errors

| Error message you see | Likely cause | Fix |
|---|---|---|
| *"Invalid or expired MCP authentication"* | Client closed the connection before the crawl finished | Increase MCP timeout to 180 s |
| *"interrupted network connection"* | Same as above | Increase MCP timeout to 180 s |
| *"Tool call returned no content"* | Site blocked or no pages matched the `match` pattern | Verify `match` pattern; try `match: "**"` |
| *"403 / blocked by target site"* | Aggressive anti-bot on target | Try `headless: true` (default) or use batch mode with custom proxy |
| *"Bad Request: No valid session ID"* | You called `/mcp` without the `initialize` handshake | Use a real MCP client, not raw `curl` |

##### Verifying it works

After connecting, ask your AI assistant:

> *"Use gpt-crawler to crawl `https://docs.stripe.com/api/customers`, max 5 pages, return as JSON."*

You should see a JSON knowledge file with 5 page entries (title, url, text, tokens) within **30 seconds**. If you get "interrupted connection", your client timeout is the issue.

If problems persist after raising the timeout, **open an issue with your Apify Run ID** — server logs always tell the truth and we can pinpoint the cause.

***

### 💰 Pricing

**Pay-Per-Event (PPE):**

| Event | Price | When charged |
|---|---|---|
| Actor Start (`apify-actor-start`) | **$0.00005** (one-time per GB) | Each cold-start |
| **MCP Tool Call** (`tool-request`) ⭐ Primary | **$0.05** | Each `crawl_to_knowledge` invocation in MCP standby mode |
| Page crawled — batch only (`apify-default-dataset-item`) | **$0.001** | Each page written to dataset (batch mode only — never charged in MCP mode) |
| Capability Discovery (`list-request`) | $0.0001 | When client lists tools/resources/prompts |
| Resource Read (`resource-request`) | $0.0001 | When client reads a server resource |
| Prompt Request (`prompt-request`) | $0.0001 | When client requests a prompt |
| Completion Request (`completion-request`) | $0.0001 | When client requests a completion |

**Examples:**

- 1 MCP crawl returning 30 pages → **$0.05** (one tool-request)
- 1 MCP crawl returning 200 pages → **$0.05** (still one tool-request — the tool returns all pages in one response)
- Batch run via Console crawling 30 pages → **$0.03** (30 × $0.001)
- Batch run crawling 200 pages → **$0.20**
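The arithmetic above can be sketched as a tiny helper. `estimate_cost` is an illustrative name, not part of the Actor; prices are hardcoded from the PPE table, and the Actor-start event and platform usage are ignored:

```python
def estimate_cost(pages: int, mode: str) -> float:
    """Rough cost per run from the PPE table above (excludes the
    $0.00005 Actor-start event and platform usage)."""
    if mode == "mcp":
        return 0.05                      # flat rate per tool call, any page count
    if mode == "batch":
        return round(pages * 0.001, 5)   # one dataset-item event per page
    raise ValueError("mode must be 'mcp' or 'batch'")

print(estimate_cost(30, "mcp"))     # 0.05
print(estimate_cost(30, "batch"))   # 0.03
print(estimate_cost(200, "batch"))  # 0.2
```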

**Why MCP mode is flat-rate per call:** the tool returns the entire knowledge file in a single response, so we charge once per call regardless of page count. The page cap is enforced by your `maxPagesToCrawl` input — set it conservatively to control crawl duration.

Apify subscription tier discounts (Bronze 10%, Silver 13%, Gold 20%) apply automatically. There is **no monthly fee** — if you don't run the Actor, you don't pay.

***

### 💡 Use cases

Real workflows people use this Actor for. Pick the one closest to yours; the input config is almost identical.

#### 🎓 Build a Custom GPT for your product docs

Crawl `docs.your-product.com`, drop the JSON file into ChatGPT → Create a GPT → Knowledge. Your GPT now answers support questions in your product's voice, cites exact URLs, and stops hallucinating about features that don't exist.

#### 📊 Sales objection handler for AI agencies

Crawl your competitor's website + your own pricing page + your case studies. The combined knowledge file becomes a Custom GPT that any sales rep can talk to: *"Why are we more expensive than X?"* and the GPT answers with your pre-vetted positioning.

#### 🔎 Live RAG for customer support agents

Run the Actor in MCP standby mode. Your support agent (Claude Desktop or a custom n8n workflow) calls `crawl_to_knowledge` whenever the user asks about a topic that isn't already in the cache. Always-fresh context, zero pre-indexing.

#### 📚 Train a RAG pipeline (LangChain / LlamaIndex / pgvector)

Crawl 200 pages of technical content, get a clean JSON, embed each `text` field with OpenAI / Cohere / Voyage embeddings, store in Pinecone or pgvector. Output JSON is already chunk-friendly with `tokens` field included.
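Because each page carries a `tokens` count, you can pack pages into embedding-sized batches without re-tokenizing. A minimal sketch; `pack_pages` and the 8,000-token budget are illustrative, not part of the Actor:

```python
def pack_pages(pages, budget=8000):
    """Greedily group crawled pages into chunks whose combined `tokens`
    stay within an embedding-model budget. A page larger than the budget
    gets a chunk of its own."""
    chunks, current, used = [], [], 0
    for page in pages:
        if current and used + page["tokens"] > budget:
            chunks.append(current)
            current, used = [], 0
        current.append(page)
        used += page["tokens"]
    if current:
        chunks.append(current)
    return chunks

pages = [{"url": f"https://docs.example.com/p{i}", "text": "...", "tokens": n}
         for i, n in enumerate([3000, 4000, 2500, 6000, 500])]
chunks = pack_pages(pages, budget=8000)
print([len(c) for c in chunks])  # [2, 1, 2]
```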

#### 🛒 Competitive intelligence for B2B SaaS

Schedule a weekly crawl of your top 3 competitors' marketing sites. Diff the resulting knowledge files to detect new features, pricing changes, or messaging pivots before they hit your Slack.
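A minimal diff sketch over two knowledge files, keyed by URL. `diff_crawls` and the sample URLs are hypothetical, not part of the Actor's output:

```python
def diff_crawls(old_pages, new_pages):
    """Compare two knowledge files (lists of page dicts) by URL.
    Returns URLs that were added, removed, or whose text changed."""
    old = {p["url"]: p["text"] for p in old_pages}
    new = {p["url"]: p["text"] for p in new_pages}
    return {
        "added":   sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(u for u in old.keys() & new.keys()
                          if old[u] != new[u]),
    }

last_week = [{"url": "https://x.com/pricing", "text": "From $10/mo"},
             {"url": "https://x.com/features", "text": "A, B"}]
this_week = [{"url": "https://x.com/pricing", "text": "From $15/mo"},
             {"url": "https://x.com/enterprise", "text": "Call us"}]
print(diff_crawls(last_week, this_week))
```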

***

### 🔗 Other Actors by KazKN

If this Actor helps you, you might also like:

- **[Vinted MCP Server](https://apify.com/kazkn/vinted-mcp-server)** — MCP server exposing 5 Vinted tools (search, compare prices across 19 EU countries, trending, seller intel).
- **[Vinted Smart Scraper](https://apify.com/kazkn/vinted-smart-scraper)** — bulk Vinted data extraction with cross-country price comparison and arbitrage detection.
- **[Vinted Turbo Scraper](https://apify.com/kazkn/vinted-turbo-scraper)** — fast URL-based extractor for individual Vinted listings.
- **[App Store Localization Scraper](https://apify.com/kazkn/apple-app-store-localization-scraper)** — find App Store localization gaps across 175 regions.

***

### ❓ FAQ

#### How is this different from running BuilderIO/gpt-crawler locally?

Same crawl logic (we wrap their `core.ts` 1:1 + a tiny adapter). What you get on top: zero local setup, hosted Chromium, automatic retries, Apify proxy rotation, scheduled runs, n8n/Zapier/Make integrations, and an MCP server mode that doesn't exist upstream.

#### Will this work on JavaScript-heavy sites (React / Vue / Next.js)?

Yes. We use Playwright + Chromium under the hood (same as upstream), so client-rendered content is fully supported. Use the `selector` input to target the exact container after JS hydration.

#### Does it respect robots.txt / can I crawl any site?

You are responsible for what you crawl. The Actor will fetch what you ask it to. For competitive/copyrighted content, don't. For your own docs, your customer's docs (with permission), or public technical documentation that explicitly invites indexing — go for it.

#### Can I crawl behind a cookie-wall or auth?

Yes — use the `cookie` input (`name=value` format). For more complex auth (OAuth, multi-step login), [open an issue on GitHub](https://github.com/DataKazKN/gpt-crawler-mcp/issues) and we'll add it.
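For example, an input passing a session cookie might look like this (the domain, cookie name, and value are placeholders):

```json
{
  "urls": ["https://portal.example.com/docs"],
  "match": "https://portal.example.com/docs/**",
  "cookie": "session_id=YOUR_SESSION_VALUE",
  "maxPagesToCrawl": 25
}
```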

#### What's the difference between batch mode and MCP mode?

- **Batch:** you specify URLs once, get a file. Best for *building a static knowledge base* you'll upload to a custom GPT or RAG store.
- **MCP:** an AI agent calls the crawler *live, mid-conversation*. Best for *agentic workflows* where the URL to crawl isn't known ahead of time.

***

### 🏗️ Built on

This Actor is a thin Apify wrapper around **[BuilderIO/gpt-crawler](https://github.com/BuilderIO/gpt-crawler)** (ISC licensed, 19k+ stars). All credit for the core crawl logic goes to the Builder.io team and the upstream contributors. The original ISC license text is preserved in the source repository.

Wrapper authored by **[KazKN](https://apify.com/kazkn)** — see all my Actors on [Apify Store](https://apify.com/kazkn).

### 📜 License

ISC — same as upstream. Free for personal and commercial use.

### 🆘 Support

- 🐛 **Issues / feature requests:** open one on [GitHub](https://github.com/DataKazKN/gpt-crawler-mcp/issues) — fastest reply.
- 💬 **Apify Console:** use the **Issues** tab on the [Actor page](https://apify.com/kazkn/gpt-crawler-mcp) to report bugs directly to the maintainer with run IDs attached.
- ⭐ **Liked it?** Leave a 5-star rating on the Actor page — that's how this Actor stays alive and improves.

# Actor input Schema

## `urls` (type: `array`):

One or more URLs to start crawling from. Sitemap URLs ending in .xml are detected automatically.

## `match` (type: `string`):

Glob pattern that the crawler will follow. Use \*\* to crawl everything under a path (e.g. https://example.com/docs/\*\*).

## `selector` (type: `string`):

CSS or XPath selector pointing at the main content container. Defaults to body so you always get something.

## `maxPagesToCrawl` (type: `integer`):

Hard cap on the number of pages the crawler will visit. In batch mode you pay $0.001 per page (PPE), so this also caps your cost.

## `outputFileName` (type: `string`):

Name of the combined knowledge file saved to the key-value store (in addition to the dataset items).

## `outputFormat` (type: `string`):

Format of the combined knowledge file. JSON is the most flexible for LLM ingestion; markdown/txt are easier to read.

## `headless` (type: `boolean`):

Run Chromium in headless mode. Disable only for debugging.

## `waitForSelectorTimeout` (type: `integer`):

Maximum time to wait for the content selector to appear before extracting.

## `cookie` (type: `string`):

Optional cookie value (format: name=value) to set on every page — useful for sites behind a cookie consent or auth wall.

## `maxTokens` (type: `integer`):

Optional upper bound on tokens per output file. Useful if you target a specific LLM context window. 0 = no limit.

## `mcpMode` (type: `boolean`):

If checked, the Actor starts an MCP server on /mcp instead of running a one-shot crawl. Apify Standby mode also enables this automatically.

## Actor input object example

```json
{
  "urls": [
    "https://www.builder.io/c/docs/developers"
  ],
  "match": "https://www.builder.io/c/docs/**",
  "selector": "body",
  "maxPagesToCrawl": 10,
  "outputFileName": "output.json",
  "outputFormat": "json",
  "headless": true,
  "waitForSelectorTimeout": 1000,
  "maxTokens": 0,
  "mcpMode": false
}
```

# Actor output Schema

## `knowledgeFile` (type: `string`):

Single JSON file containing every crawled page (title, url, text, tokens). Drop this file into ChatGPT Custom GPT 'Knowledge', Claude Project 'Project knowledge', or load it into a vector DB for RAG.

## `pagesDataset` (type: `string`):

Per-page records — same data as the knowledge file but one row per page. Useful for filtering, sorting, or exporting to CSV.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "urls": [
        "https://www.builder.io/c/docs/developers"
    ],
    "match": "https://www.builder.io/c/docs/**"
};

// Run the Actor and wait for it to finish
const run = await client.actor("kazkn/gpt-crawler-mcp").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "urls": ["https://www.builder.io/c/docs/developers"],
    "match": "https://www.builder.io/c/docs/**",
}

# Run the Actor and wait for it to finish
run = client.actor("kazkn/gpt-crawler-mcp").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "urls": [
    "https://www.builder.io/c/docs/developers"
  ],
  "match": "https://www.builder.io/c/docs/**"
}' |
apify call kazkn/gpt-crawler-mcp --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=kazkn/gpt-crawler-mcp",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "GPT Crawler MCP — Knowledge files for ChatGPT, Claude, RAG",
        "description": "Crawl any website and turn it into a clean knowledge file for your custom GPT, Claude Project, or RAG pipeline. Native MCP server in Standby mode + classic batch mode.",
        "version": "0.1",
        "x-build-id": "YUFBaRGmnLSUjIOmQ"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/kazkn~gpt-crawler-mcp/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-kazkn-gpt-crawler-mcp",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/kazkn~gpt-crawler-mcp/runs": {
            "post": {
                "operationId": "runs-sync-kazkn-gpt-crawler-mcp",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/kazkn~gpt-crawler-mcp/run-sync": {
            "post": {
                "operationId": "run-sync-kazkn-gpt-crawler-mcp",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "urls"
                ],
                "properties": {
                    "urls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "One or more URLs to start crawling from. Sitemap URLs ending in .xml are detected automatically.",
                        "default": [
                            "https://www.builder.io/c/docs/developers"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "match": {
                        "title": "Match pattern",
                        "type": "string",
                        "description": "Glob pattern that the crawler will follow. Use ** to crawl everything under a path (e.g. https://example.com/docs/**).",
                        "default": "**"
                    },
                    "selector": {
                        "title": "Content selector",
                        "type": "string",
                        "description": "CSS or XPath selector pointing at the main content container. Defaults to body so you always get something.",
                        "default": "body"
                    },
                    "maxPagesToCrawl": {
                        "title": "Max pages to crawl",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Hard cap on the number of pages the crawler will visit. In batch mode you pay $0.001 per page (PPE), so this also caps your cost.",
                        "default": 10
                    },
                    "outputFileName": {
                        "title": "Output filename",
                        "type": "string",
                        "description": "Name of the combined knowledge file saved to the key-value store (in addition to the dataset items).",
                        "default": "output.json"
                    },
                    "outputFormat": {
                        "title": "Output format",
                        "enum": [
                            "json",
                            "markdown",
                            "txt"
                        ],
                        "type": "string",
                        "description": "Format of the combined knowledge file. JSON is the most flexible for LLM ingestion; markdown/txt are easier to read.",
                        "default": "json"
                    },
                    "headless": {
                        "title": "Headless browser",
                        "type": "boolean",
                        "description": "Run Chromium in headless mode. Disable only for debugging.",
                        "default": true
                    },
                    "waitForSelectorTimeout": {
                        "title": "Wait for selector (ms)",
                        "minimum": 0,
                        "maximum": 60000,
                        "type": "integer",
                        "description": "Maximum time to wait for the content selector to appear before extracting.",
                        "default": 1000
                    },
                    "cookie": {
                        "title": "Cookie",
                        "type": "string",
                        "description": "Optional cookie value (format: name=value) to set on every page — useful for sites behind a cookie consent or auth wall."
                    },
                    "maxTokens": {
                        "title": "Max tokens per output file",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Optional upper bound on tokens per output file. Useful if you target a specific LLM context window. 0 = no limit.",
                        "default": 0
                    },
                    "mcpMode": {
                        "title": "Run as MCP server (Standby)",
                        "type": "boolean",
                        "description": "If checked, the Actor starts an MCP server on /mcp instead of running a one-shot crawl. Apify Standby mode also enables this automatically.",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
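The input schema above encodes several client-side checkable constraints (the `outputFormat` enum, the 0–60000 ms range on `waitForSelectorTimeout`, and the defaults for each field). As a minimal sketch, assuming only the field names and limits shown in the schema (the validator itself is illustrative, not part of the Actor), you can fill in defaults and catch invalid values before submitting a run:

```python
# Illustrative pre-flight validation of the crawl input against the
# constraints from the input schema above. Field names and limits come
# from the schema; the helper function itself is a hypothetical sketch.

ALLOWED_FORMATS = {"json", "markdown", "txt"}  # "outputFormat" enum

def validate_input(payload: dict) -> dict:
    """Merge schema defaults into payload and reject out-of-range values."""
    merged = {
        "outputFormat": "json",          # schema default
        "headless": True,                # schema default
        "waitForSelectorTimeout": 1000,  # schema default, in ms
        "maxTokens": 0,                  # 0 = no limit (schema default)
        "mcpMode": False,                # one-shot crawl unless enabled
        **payload,                       # caller values override defaults
    }
    if merged["outputFormat"] not in ALLOWED_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(ALLOWED_FORMATS)}")
    if not 0 <= merged["waitForSelectorTimeout"] <= 60000:
        raise ValueError("waitForSelectorTimeout must be between 0 and 60000 ms")
    if merged["maxTokens"] < 0:
        raise ValueError("maxTokens must be >= 0")
    return merged

# Example: request a markdown knowledge file capped at 8000 tokens.
run_input = validate_input({"outputFormat": "markdown", "maxTokens": 8000})
```

The resulting `run_input` dictionary can then be passed as the `run_input` argument to `ActorClient.call()` in the official `apify-client` library, as shown in the integration section above.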
