# Chatgpt Detector (`sovanza.inc/chatgpt-detector`) Actor

ChatGPT Detector analyses web pages and estimates whether visible text is AI-generated, human-written, mixed, or insufficient for review. It provides probability scores, confidence bands, review priority, and explainable signals for editorial QA, moderation, compliance, and SEO audits.

- **URL**: https://apify.com/sovanza.inc/chatgpt-detector.md
- **Developed by:** [Sovanza](https://apify.com/sovanza.inc) (community)
- **Categories:** AI, SEO tools, Automation
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: 5.00 out of 5 stars

## Pricing

from $10.00 / 1,000 analysed pages

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

### AI Content Detector — ChatGPT, Claude & AI Text Analyzer with Scoring

Instantly detect whether web page content reads as human-written or AI-generated, including writing patterns commonly associated with ChatGPT, Claude, Gemini, Llama, and other LLMs. Instead of a binary label, this Actor returns **structured probability + confidence** plus **explainable reasons** for every analysis, so teams can make defensible decisions at scale.

Built for educators, publishers, content agencies, and compliance teams who need **reliable, explainable AI content detection** — not a black box.

### Overview

The AI Content Detector analyzes one or more **URLs**, extracts the main readable text, and estimates whether the content is:

- `likely_ai`
- `likely_human`
- `mixed`
- `insufficient_text`

For every page, it outputs:

- **AI probability / human probability**
- A **confidence score** and **confidence band**
- A **reviewPriority** label for triage
- A **reasons breakdown** (`topReasons`, `warningFlags`, and `signals`)

This is designed for real-world content: messy pages, mixed edits, SEO templates, and editorial posts — not just clean demo text.

### Important disclaimer

AI detection is probabilistic, not definitive. This Actor provides **risk scoring and review guidance**, not proof of authorship. Use it as a screening layer that routes borderline cases to human review.

### Key benefits

- Save hours of manual review with automated AI detection across large volumes of pages  
- Make defensible decisions with **scored, reasoned output** — not opaque labels  
- Scale screening across many URLs or internal crawls with repeatable runs  
- Improve auditability with structured signal breakdowns and exportable reports  
- Reduce reputational and compliance risk by flagging AI-like pages before they go live  

### Features

- **AI vs human-like classification** for extracted web page text  
- **Explainable scoring**: probabilities, confidence, and top reasons  
- **Hybrid detection modes**: `heuristic`, `model`, or `hybrid`  
- **Crawl support**: optionally follow internal links with caps on depth and pages per domain  
- **Content extraction pipeline**: semantic containers + readability + fallbacks to reduce boilerplate noise  
- **Long-form template signals** tuned for SEO-style patterns: heading repetition, section similarity, CTA/FAQ templates  
- **Batch processing**: multiple start URLs in one run  
- **Structured output** for dashboards and downstream automation  

### Export formats

Results are written to the Apify dataset and can be exported as **JSON**, **CSV**, **Excel**, and (where supported by Apify export options) **XML**.

### Use cases (high-value)

- **Academic integrity**: Screen student submissions hosted online (or LMS-exported pages) for AI-like authorship patterns with scored evidence.  
- **Publishing & editorial**: Audit freelance or user-submitted articles before publication to enforce authenticity policies.  
- **Content agency QA**: Verify vendors are delivering human-written work, or flag drafts for additional review.  
- **SEO & compliance**: Identify templated, AI-like pages in a publishing pipeline before indexing.  
- **HR & recruitment**: Screen public application pages/portfolios for AI-like writing patterns (use responsibly).  
- **Legal & contract review**: Flag AI-drafted public disclosures/communications that warrant additional verification.  

### Why this tool stands out

Most detectors return “AI” vs “Human” with no explanation. This Actor provides:

- A **confidence score** and **confidence band**
- A **reviewPriority** label to operationalize triage
- A structured **signals** object (so you can audit and tune thresholds)
- Clear **topReasons** so reviewers understand what triggered the verdict

It’s built to support defensible workflows — not just labels.

### How to use on Apify

#### Using the Actor

1. Open the Actor on Apify and go to the **Input** tab.  
2. Add `startUrls` and set crawl/detection options.  
3. Start the run.  
4. Open the **Dataset** tab to inspect scores and explanations.  
5. Export JSON/CSV/Excel or pull via API for automation.  

#### Input Configuration

Full schema: `INPUT_SCHEMA.json`. Example:

```json
{
  "startUrls": [
    { "url": "https://blog.apify.com/" },
    { "url": "https://en.wikipedia.org/wiki/Natural_language_processing" },
    { "url": "https://openai.com/research/" }
  ],
  "crawlLinkedPages": false,
  "maxPagesPerDomain": 1,
  "maxDepth": 0,
  "includeSubdomains": false,
  "detectionMode": "hybrid",
  "languageHint": "en",
  "minTextLength": 300,
  "includeRawText": false,
  "includeHtmlMetadata": true,
  "maxConcurrency": 5,
  "requestTimeoutSecs": 60,
  "saveMarkdown": false,
  "blockAssets": true,
  "includeDebugFields": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
````

- `startUrls` (required): URL list to analyze
- `crawlLinkedPages`: follow internal links
- `maxDepth`, `maxPagesPerDomain`, `includeSubdomains`: crawl controls
- `detectionMode`: `heuristic`, `model`, `hybrid`
- `languageHint`: optional language hint
- `minTextLength`: minimum cleaned text length for analysis
- `includeRawText`, `saveMarkdown`, `includeHtmlMetadata`: output detail controls
- `blockAssets`, `requestTimeoutSecs`, `maxConcurrency`: performance controls
- `includeDebugFields`: include extraction and threshold diagnostics in output
- `proxyConfiguration`, `userAgent`: access and request customization
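
The options above can be assembled and sanity-checked before a run is started. The following is a minimal sketch, not part of the Actor: `build_input` is a hypothetical helper, the numeric bounds are copied from the input schema in this document, and rejecting an empty URL list is an assumption on top of the schema's "required" rule.

```python
from typing import Any

# Bounds copied from the Actor input schema; the helper itself is hypothetical.
DETECTION_MODES = {"heuristic", "model", "hybrid"}
INT_BOUNDS = {
    "maxPagesPerDomain": (1, None),
    "maxDepth": (0, None),
    "minTextLength": (50, None),
    "maxConcurrency": (1, 100),
    "requestTimeoutSecs": (10, 300),
}


def build_input(urls: list[str], **options: Any) -> dict[str, Any]:
    """Build an Actor input dict and reject values outside the schema bounds."""
    if not urls:
        # Assumption: an empty list is treated the same as a missing startUrls.
        raise ValueError("startUrls is required and must not be empty")
    mode = options.get("detectionMode", "hybrid")
    if mode not in DETECTION_MODES:
        raise ValueError(f"unknown detectionMode: {mode!r}")
    for name, (low, high) in INT_BOUNDS.items():
        if name in options:
            value = options[name]
            if value < low or (high is not None and value > high):
                raise ValueError(f"{name}={value} is outside the allowed range")
    return {"startUrls": [{"url": url} for url in urls], **options}
```

The returned dict can be passed as the run input to either client library or the REST API.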

### Output

Results are stored in the Actor’s default dataset.

Each analyzed page can include:

- **Identity & metadata:** `inputUrl`, `finalUrl`, `domain`, `statusCode`, `title`, `metaDescription`, `canonicalUrl`, `language`
- **Text stats:** `wordCount`, `paragraphCount`, `sentenceCount`
- **Scoring:** `classification`, `aiProbability`, `humanProbability`, `confidence`, `confidenceBand`, `reviewPriority`
- **Signals:** lexical/structure/repetition/specificity/citation + long-form template signals
- **Explainability:** `topReasons`, `warningFlags`, `thresholdDecisionReason`
- **Debug fields (optional):** `extractionSource`, `rawExtractedLength`, `cleanedExtractedLength`
- **Optional content fields:** `rawText`, `markdown`
- **Meta:** `timestamp`

Triage helpers:

- `confidenceBand`
  - `low` for confidence `< 0.40`
  - `medium` for confidence `>= 0.40` and `< 0.70`
  - `high` for confidence `>= 0.70`
- `reviewPriority`
  - `high`: `likely_ai` with high confidence
  - `medium`: mixed or uncertain results
  - `low`: likely human with acceptable confidence
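
The triage rules above can be reproduced client-side, for example to re-derive labels after changing thresholds. This is a sketch under stated assumptions rather than the Actor's own code: "acceptable confidence" is read here as "at least the medium band", and every other case (including `insufficient_text`) falls back to `medium`.

```python
def confidence_band(confidence: float) -> str:
    """Map a confidence score in [0, 1] to the documented band."""
    if confidence < 0.40:
        return "low"
    if confidence < 0.70:
        return "medium"
    return "high"


def review_priority(classification: str, confidence: float) -> str:
    """Derive a review priority from the classification and confidence."""
    band = confidence_band(confidence)
    if classification == "likely_ai" and band == "high":
        return "high"
    # Assumption: "acceptable confidence" means the medium band or better.
    if classification == "likely_human" and band != "low":
        return "low"
    # Mixed, uncertain, and all remaining cases go to medium priority.
    return "medium"
```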

Example summary row:

```json
{
  "type": "__summary__",
  "totalUrls": 3,
  "processedUrls": 3,
  "skippedUrls": 0,
  "likelyAiCount": 1,
  "mixedCount": 1,
  "likelyHumanCount": 1,
  "insufficientTextCount": 0,
  "timestamp": "2026-03-26T10:31:45.000000+00:00"
}
```
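
If the summary row is stored alongside the page rows, as the example suggests, downstream code will usually want to separate the two. A minimal sketch follows; `split_summary` and `count_by_classification` are hypothetical helper names, and the only thing taken from the example is the `"type": "__summary__"` marker.

```python
from collections import Counter


def split_summary(items):
    """Return (summary_row_or_None, page_rows) from a list of dataset items."""
    summary = None
    pages = []
    for item in items:
        if item.get("type") == "__summary__":
            summary = item
        else:
            pages.append(item)
    return summary, pages


def count_by_classification(pages):
    """Tally page rows by their `classification` value."""
    return dict(Counter(page.get("classification") for page in pages))
```

Comparing `count_by_classification(pages)` with the counters in the summary row is one way to check that an export was not truncated.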

### Detection methodology (explainability)

The detector combines:

1. **Heuristic signals**
   - lexical diversity, burstiness proxies, repetition patterns
   - generic/explanatory phrasing and templated conclusions
   - specificity and citation grounding proxies
   - long-form template signals for SEO-style structure detection

2. **Rule-based cues**
   - formulaic transitions/conclusions
   - low source-grounding markers
   - repetitive stylistic patterns

3. **Model-style layer (`detect_with_model`)**
   - local calibrated scorer over engineered features
   - no paid external dependency
   - pluggable for future model upgrades

4. **Hybrid scoring**
   - weighted combination of heuristic and model-style signals
   - weights calibrated via benchmark separation diagnostics
   - reference/docs signal dampening for selected template cues
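
To make the layers above more concrete, here is a small illustration of two heuristic signals and a weighted combination. It is a sketch under assumptions, not the Actor's implementation: the exact features, the tokenization, and the weights used by the Actor are not published in this README, and the function names and the default weight of `0.5` are hypothetical.

```python
import re
from statistics import mean, pstdev


def lexical_diversity(text: str) -> float:
    """Type-token ratio: unique words divided by total words."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence lengths, measured in words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return pstdev(lengths) / mean(lengths)


def hybrid_score(heuristic: float, model: float, heuristic_weight: float = 0.5) -> float:
    """Weighted average of two AI-probability estimates (weight is hypothetical)."""
    if not 0.0 <= heuristic_weight <= 1.0:
        raise ValueError("heuristic_weight must be between 0 and 1")
    return heuristic_weight * heuristic + (1.0 - heuristic_weight) * model
```

Uniform sentence lengths give a burstiness near zero, which is one of the patterns such heuristics tend to treat as template-like.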

Classification thresholds (conservative defaults):

- `likely_ai` if `aiProbability >= 0.80` and `confidence >= 0.60`
- `likely_human` if `aiProbability < 0.45` with acceptable confidence
- `mixed` otherwise
- `insufficient_text` when cleaned text is below `minTextLength`
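
The thresholds above translate into a short decision function. This is a sketch, not the Actor's code: the README does not define "acceptable confidence" for `likely_human`, so the `human_min_confidence` parameter and its `0.40` default are assumptions, as is checking text length first.

```python
def classify(
    ai_probability: float,
    confidence: float,
    text_length: int,
    min_text_length: int = 300,
    human_min_confidence: float = 0.40,  # assumption: not specified in the README
) -> str:
    """Apply the documented conservative thresholds to one page."""
    if text_length < min_text_length:
        return "insufficient_text"
    if ai_probability >= 0.80 and confidence >= 0.60:
        return "likely_ai"
    if ai_probability < 0.45 and confidence >= human_min_confidence:
        return "likely_human"
    return "mixed"
```

Because both the AI and human branches require a confidence floor, a page with a high `aiProbability` but low confidence lands in `mixed`, which matches the conservative intent described here.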

### Error Handling

A single failing URL does not fail the whole run.

Handled cases include:

- invalid URLs
- timeout/network failures
- blocked/challenged pages
- JS-heavy pages with little readable text
- short or insufficient content

Failure rows still include structured fields like `inputUrl`, `classification`, `error`, and `timestamp`.
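
Since failures arrive as ordinary dataset rows, consumers need to separate them from successful analyses. A minimal sketch, assuming that a non-empty `error` field marks a failure row and that successful rows either omit the field or leave it empty (`partition_rows` is a hypothetical helper):

```python
def partition_rows(items):
    """Split dataset items into (analyzed_rows, failure_rows)."""
    analyzed, failed = [], []
    for item in items:
        if item.get("type") == "__summary__":
            continue  # the summary row is neither a success nor a failure
        if item.get("error"):
            failed.append(item)
        else:
            analyzed.append(item)
    return analyzed, failed
```

The failed list can then be retried or logged by `inputUrl` without affecting the scored rows.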

### Performance and Anti-Blocking

- Parallel processing via `maxConcurrency`
- Capped retry logic per page
- Optional heavy asset blocking (`blockAssets`)
- Apify proxy support (`proxyConfiguration`)
- URL deduplication and crawl depth/domain constraints

### Benchmarking and Calibration

Reusable benchmark suites in `benchmarks/`:

- `benchmark-editorial.json`
- `benchmark-reference.json`
- `benchmark-seo.json`
- `benchmark-thin.json`

Run benchmark diagnostics:

```bash
python scripts/benchmark_runner.py
```

Runner output includes:

- per-URL calibration rows (`classification`, `aiProbability`, `confidence`, bands, priority)
- grouped totals by benchmark family
- average template signals per family
- delta vs editorial baseline and ranked signal separation diagnostics
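
The "delta vs editorial baseline" idea can be illustrated in a few lines. This sketch does not reproduce `scripts/benchmark_runner.py`, whose output format is not shown here; it assumes each calibration row carries a hypothetical `family` key next to `aiProbability`.

```python
from collections import defaultdict
from statistics import mean


def family_deltas(rows, baseline="editorial"):
    """Average aiProbability per benchmark family, minus the baseline average."""
    by_family = defaultdict(list)
    for row in rows:
        by_family[row["family"]].append(row["aiProbability"])
    averages = {family: mean(values) for family, values in by_family.items()}
    base = averages[baseline]  # raises KeyError if the baseline family is missing
    return {family: average - base for family, average in averages.items()}
```

A large positive delta for a family suggests its pages score as more AI-like than the editorial baseline on that metric.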

### Integrations & API

- Run and export through the Apify platform
- Retrieve results via Apify API (dataset endpoints)
- Integrate with Python/Node workflows, webhooks, schedules, and automation tools
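
For environments without the client library, the synchronous endpoint listed in the OpenAPI specification can be called with the Python standard library alone. This is a sketch: the endpoint path and the `token` query parameter come from the specification in this document, while the helper names and the 300-second timeout are assumptions.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "acts/sovanza.inc~chatgpt-detector/run-sync-get-dataset-items"


def build_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Prepare a POST request that runs the Actor and returns dataset items."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        f"{API_BASE}/{ACTOR_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sync(token: str, actor_input: dict, timeout: float = 300.0) -> list:
    """Send the request and parse the JSON response body."""
    request = build_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Long crawls may exceed the time a synchronous call can stay open; in that case starting the run via the `/runs` endpoint and reading the dataset afterwards is likely the safer pattern.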

### Run Locally

```bash
pip install -r requirements.txt
python main.py
```

Use local Apify storage input (`storage/key_value_stores/default/INPUT.json`) or platform input.
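
The local input file can be generated rather than edited by hand. A minimal sketch, assuming the storage path above is relative to the project directory (`write_local_input` is a hypothetical helper):

```python
import json
from pathlib import Path


def write_local_input(actor_input: dict, project_dir: str = ".") -> Path:
    """Write the Actor input to the local key-value store location."""
    path = Path(project_dir) / "storage" / "key_value_stores" / "default" / "INPUT.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(actor_input, indent=2), encoding="utf-8")
    return path
```

Calling `write_local_input({"startUrls": [{"url": "https://example.com/blog/post-1"}]})` before `python main.py` gives the local run the same input shape as the platform.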

### Run on Apify

1. Create/upload Actor as `chatgpt-detector`
2. Configure input in the Actor UI
3. Start a run
4. Read/export dataset results

### Why choose this Actor?

- Conservative by default (false-positive resistant)
- Explainable scoring with actionable priority labels
- Benchmark-driven calibration loop built in
- Production-oriented crawling, output, and failure handling

### FAQ

#### What data does this Actor return for each analysis?

For each analyzed URL, the Actor returns `classification`, `aiProbability`, `humanProbability`, `confidence`, `confidenceBand`, and `reviewPriority`, plus `topReasons`, `warningFlags`, and a structured `signals` breakdown. Optional fields include `rawText` (when `includeRawText` is enabled) and `markdown` (when `saveMarkdown` is enabled).

#### Which AI models can this detector identify?

It does not rely on model “fingerprints”. Instead, it scores statistical and structural writing signals that often correlate with LLM-generated text. This approach generally carries over across ChatGPT, Claude, Gemini, Llama, and other models, but **edited** or **mixed** content can reduce certainty.

#### How does the confidence score work?

Confidence reflects how strongly the extracted signals support the predicted class. Middle-range confidence often indicates mixed or heavily edited text, or pages with limited signal density.

#### Can I analyze multiple items in a single run?

Yes — provide multiple entries in `startUrls`. You can also enable `crawlLinkedPages` to analyze additional internal pages with caps via `maxDepth` and `maxPagesPerDomain`.

#### Is technical experience required?

No — run it from the Apify UI. For pipelines, use the Apify API to automate runs and consume datasets.

#### How accurate is AI detection?

Accuracy depends on length, language, domain style, and how edited the text is. The Actor performs best on longer, unedited or lightly edited text. Use `confidenceBand`, `reviewPriority`, and `topReasons` to guide manual review for borderline cases.

#### What content lengths are supported?

Any length, but pages below `minTextLength` (default 300 characters of cleaned text) are labeled `insufficient_text`. Very short text has weaker signals.

#### Is this a definitive AI detector?

No. It is a probabilistic detector and review-priority tool.

#### Why do many pages show `mixed`?

Conservative thresholds intentionally reduce overconfident labels, especially on reference and editorial content.

#### Why do some pages return `insufficient_text`?

The page may be very short, blocked, dynamic, or otherwise not extractable above `minTextLength`.

#### Can I tune behavior for my domain?

Yes. Use benchmark files + `scripts/benchmark_runner.py` and reweight signals/thresholds for your content distribution.

### Actor Permissions

This Actor is designed to run with limited permissions: read input and write dataset output, with optional proxy/KV usage as configured.

### Limitations

- Highly dynamic/blocked pages can reduce extraction quality
- Mixed human+AI writing remains hard to separate perfectly
- Edited AI text can resemble human writing
- Short content has limited signal strength
- Signal behavior varies by language/domain style

### Get Started

Add URLs, run the Actor, inspect `confidenceBand` + `reviewPriority`, and iterate with benchmark diagnostics for your domain. 🚀

# Actor input Schema

## `startUrls` (type: `array`):

List of URLs to analyze for AI-generated text patterns.

## `maxPagesPerDomain` (type: `integer`):

Maximum pages to analyze per domain when crawling linked pages.

## `crawlLinkedPages` (type: `boolean`):

If enabled, recursively analyzes internal links from each start URL.

## `maxDepth` (type: `integer`):

Maximum depth for internal-link crawling when crawlLinkedPages is enabled.

## `includeSubdomains` (type: `boolean`):

If true, internal-link crawling can include subdomains of the start domain.

## `detectionMode` (type: `string`):

heuristic = rule-based only, model = model-style classifier only, hybrid = combine both.

## `languageHint` (type: `string`):

Optional language hint such as en, ur, fr.

## `minTextLength` (type: `integer`):

Skip pages with fewer than this many characters of extracted clean text.

## `includeRawText` (type: `boolean`):

If true, include extracted plain text in output.

## `includeHtmlMetadata` (type: `boolean`):

Include title, meta description, canonical URL, schema presence, and basic metadata.

## `userAgent` (type: `string`):

Optional custom User-Agent header for requests.

## `proxyConfiguration` (type: `object`):

Apify proxy settings.

## `maxConcurrency` (type: `integer`):

Maximum number of pages processed in parallel.

## `requestTimeoutSecs` (type: `integer`):

Per-page timeout for loading and extraction.

## `saveMarkdown` (type: `boolean`):

If true, include cleaned markdown-like text output.

## `blockAssets` (type: `boolean`):

Block images, fonts, media, and other heavy resources to speed up crawling.

## `includeDebugFields` (type: `boolean`):

Include extraction and threshold decision debug fields in output.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://example.com/blog/post-1"
    }
  ],
  "maxPagesPerDomain": 1,
  "crawlLinkedPages": false,
  "maxDepth": 0,
  "includeSubdomains": false,
  "detectionMode": "hybrid",
  "languageHint": "",
  "minTextLength": 300,
  "includeRawText": false,
  "includeHtmlMetadata": true,
  "userAgent": "",
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "maxConcurrency": 10,
  "requestTimeoutSecs": 60,
  "saveMarkdown": false,
  "blockAssets": true,
  "includeDebugFields": false
}
```

# Actor output Schema

## `results` (type: `string`):

Per-page AI detection results in the default dataset.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://example.com/blog/post-1"
        }
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("sovanza.inc/chatgpt-detector").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "startUrls": [{ "url": "https://example.com/blog/post-1" }] }

# Run the Actor and wait for it to finish
run = client.actor("sovanza.inc/chatgpt-detector").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://example.com/blog/post-1"
    }
  ]
}' |
apify call sovanza.inc/chatgpt-detector --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=sovanza.inc/chatgpt-detector",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Chatgpt Detector",
        "description": "ChatGPT Detector analyses web pages and estimates whether visible text is AI-generated, human-written, mixed, or insufficient for review. It provides probability scores, confidence bands, review priority, and explainable signals for editorial QA, moderation, compliance, and SEO audits.",
        "version": "0.0",
        "x-build-id": "XHSzEC7Jgb3UpDl76"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/sovanza.inc~chatgpt-detector/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-sovanza.inc-chatgpt-detector",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/sovanza.inc~chatgpt-detector/runs": {
            "post": {
                "operationId": "runs-sync-sovanza.inc-chatgpt-detector",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/sovanza.inc~chatgpt-detector/run-sync": {
            "post": {
                "operationId": "run-sync-sovanza.inc-chatgpt-detector",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls"
                ],
                "properties": {
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "List of URLs to analyze for AI-generated text patterns.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "maxPagesPerDomain": {
                        "title": "Max pages per domain",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Maximum pages to analyze per domain when crawling linked pages.",
                        "default": 1
                    },
                    "crawlLinkedPages": {
                        "title": "Crawl linked pages",
                        "type": "boolean",
                        "description": "If enabled, recursively analyzes internal links from each start URL.",
                        "default": false
                    },
                    "maxDepth": {
                        "title": "Max crawl depth",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum depth for internal-link crawling when crawlLinkedPages is enabled.",
                        "default": 0
                    },
                    "includeSubdomains": {
                        "title": "Include subdomains",
                        "type": "boolean",
                        "description": "If true, internal-link crawling can include subdomains of the start domain.",
                        "default": false
                    },
                    "detectionMode": {
                        "title": "Detection mode",
                        "enum": [
                            "heuristic",
                            "hybrid",
                            "model"
                        ],
                        "type": "string",
                        "description": "heuristic = rule-based only, model = model-style classifier only, hybrid = combine both.",
                        "default": "hybrid"
                    },
                    "languageHint": {
                        "title": "Language hint",
                        "type": "string",
                        "description": "Optional language hint such as en, ur, fr.",
                        "default": ""
                    },
                    "minTextLength": {
                        "title": "Minimum text length",
                        "minimum": 50,
                        "type": "integer",
                        "description": "Skip pages with fewer than this many characters of extracted clean text.",
                        "default": 300
                    },
                    "includeRawText": {
                        "title": "Include raw text",
                        "type": "boolean",
                        "description": "If true, include extracted plain text in output.",
                        "default": false
                    },
                    "includeHtmlMetadata": {
                        "title": "Include HTML metadata",
                        "type": "boolean",
                        "description": "Include title, meta description, canonical URL, schema presence, and basic metadata.",
                        "default": true
                    },
                    "userAgent": {
                        "title": "User agent override",
                        "type": "string",
                        "description": "Optional custom User-Agent header for requests.",
                        "default": ""
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Apify proxy settings.",
                        "default": {
                            "useApifyProxy": true
                        }
                    },
                    "maxConcurrency": {
                        "title": "Max concurrency",
                        "minimum": 1,
                        "maximum": 100,
                        "type": "integer",
                        "description": "Maximum number of pages processed in parallel.",
                        "default": 10
                    },
                    "requestTimeoutSecs": {
                        "title": "Request timeout (seconds)",
                        "minimum": 10,
                        "maximum": 300,
                        "type": "integer",
                        "description": "Per-page timeout for loading and extraction.",
                        "default": 60
                    },
                    "saveMarkdown": {
                        "title": "Save markdown text",
                        "type": "boolean",
                        "description": "If true, include cleaned markdown-like text output.",
                        "default": false
                    },
                    "blockAssets": {
                        "title": "Block heavy assets",
                        "type": "boolean",
                        "description": "Block images, fonts, media, and other heavy resources to speed up crawling.",
                        "default": true
                    },
                    "includeDebugFields": {
                        "title": "Include debug fields",
                        "type": "boolean",
                        "description": "Include extraction and threshold decision debug fields in output.",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
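
The input bounds and the run response shape in the definition above can be checked on the client side before and after a run. The following is a minimal sketch, not part of the Actor or the Apify client: `normalize_options` and `summarize_run` are hypothetical helper names, and the bounds and defaults are copied from the `maxConcurrency`, `requestTimeoutSecs`, `saveMarkdown`, `blockAssets`, and `includeDebugFields` entries of the schema. It assumes the response is already parsed into a Python dictionary.

```python
from typing import Any

# (minimum, maximum, default) copied from the input schema above.
INT_FIELDS = {
    "maxConcurrency": (1, 100, 10),
    "requestTimeoutSecs": (10, 300, 60),
}

# Boolean options and their schema defaults.
BOOL_DEFAULTS = {
    "saveMarkdown": False,
    "blockAssets": True,
    "includeDebugFields": False,
}


def normalize_options(options: dict[str, Any]) -> dict[str, Any]:
    """Fill in schema defaults and reject out-of-range values (hypothetical helper)."""
    result = dict(options)  # copy so the caller's dictionary is not mutated
    for name, (low, high, default) in INT_FIELDS.items():
        value = result.setdefault(name, default)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an integer")
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    for name, default in BOOL_DEFAULTS.items():
        value = result.setdefault(name, default)
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be a boolean")
    return result


def summarize_run(response: dict[str, Any]) -> dict[str, Any]:
    """Pick the run response fields most callers need (hypothetical helper)."""
    data = response.get("data", {})
    return {
        "runId": data.get("id"),
        "status": data.get("status"),
        "datasetId": data.get("defaultDatasetId"),
        "usageTotalUsd": data.get("usageTotalUsd"),
    }
```

Validating locally is a design choice, not a requirement: the platform applies the same schema when a run starts, so this sketch only surfaces mistakes earlier. `summarize_run` uses `dict.get` throughout so that a partial response yields `None` values instead of raising.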
