# RAG Web Browser (`scraper-engine/rag-web-browser`) Actor

- **URL**: https://apify.com/scraper-engine/rag-web-browser.md
- **Developed by:** [Scraper Engine](https://apify.com/scraper-engine) (community)
- **Categories:** AI, Agents, Developer tools
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $4.99 / 1,000 results

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## 🌐 RAG Web Browser — Search & Scrape for AI Agents & LLM Pipelines

> **One Actor. Any question. Clean Markdown back.**
> Search Google → scrape the top results → return polished Markdown / HTML / plain text — ready to drop straight into your **RAG pipeline**, **LangChain / LlamaIndex retriever**, **OpenAI Assistant**, **Claude**, **Gemini**, or **custom AI agent**.

[![Apify Actor](https://apify.com/actor-badge?actor=rag-web-browser)](https://apify.com/)

---

### ✨ Why Choose This Actor?

| 🔥 | What you get |
|---|---|
| 🚀 | **Blazing-fast** async pipeline (aiohttp + selectolax + lexbor) |
| 🧠 | **LLM-ready output** — clean Markdown by default, HTML & plain text on demand |
| 🛡️ | **Smart proxy ladder** — starts direct, auto-upgrades to datacenter → residential if a site blocks us |
| 🔁 | **Resilient retries** — 3 residential attempts before giving up |
| 🕸️ | **Bulk URLs or one search query** — single input, two modes |
| 🍪 | **Removes cookie / GDPR banners** automatically |
| 📰 | **Readability mode** — isolates article body for cleaner context |
| 💾 | **Live dataset writes** — partial results survive crashes |
| 🪟 | **Open-source friendly** — Apify SDK 3.x, Python 3.13 |

---

### 🎯 Key Features

* 🔍 **Google Search backbone** — paginated, deduped, ranked results
* 🌐 **Direct URL mode** — paste a list of URLs and skip search entirely
* 🧹 **Custom CSS scrub** — strip nav, footer, scripts, modals, ads, …
* 📑 **Per-page metadata** — title, description, language, redirect chain
* 🔢 **Per-section dataset views** — Results · Metadata · Crawl status · Content
* 🎚️ **Tunable concurrency** — 1 to 50 parallel fetches
* 🐞 **Debug mode** — see byte length, final URL, content type
* 💸 **Pay-per-usage pricing** — no separate per-event charges

---

### 📥 Input

The form matches the official [RAG Web Browser](https://apify.com/apify/rag-web-browser/input-schema) layout, plus an optional bulk **URLs** field.

| Field | Type | Default | Description |
|---|---|---|---|
| `query` | string | `web browser for RAG pipelines -site:reddit.com` | Search keywords or a single URL. |
| `urls` | array | `[]` | Optional bulk URLs — skips search when set. |
| `maxResults` | integer | `10` | Top organic results to scrape (1–100000 per the input schema). |
| `outputFormats` | array | `["markdown"]` | `text`, `markdown`, and/or `html`. |
| `serpProxyGroup` | string | `GOOGLE_SERP` | Proxy group for Google Search (`GOOGLE_SERP` or `SHADER`). |
| `serpMaxRetries` | integer | `2` | Retries when SERP fetch fails. |
| `proxyConfiguration` | object | `{ "useApifyProxy": true }` | Target-page proxies; auto-escalates to residential on block. |
| `scrapingTool` | string | `raw-http` | `raw-http` (supported) or `browser-playwright` (falls back to HTTP). |
| `removeElementsCssSelector` | string | (sensible default) | CSS to strip before extraction. |
| `htmlTransformer` | enum | `none` | `none` or `readable` (article body). |
| `maxRequestRetries` | integer | `1` | Target page retries (0–3). |
| `dynamicContentWaitSecs` | integer | `10` | For browser mode only (ignored for Raw HTTP). |
| `removeCookieWarnings` | boolean | `true` | Strip cookie & GDPR dialogs. |
| `debugMode` | boolean | `false` | Add per-page debug info. |

#### Example input

```json
{
  "query": "best web scraping libraries 2026",
  "maxResults": 5,
  "outputFormats": ["markdown"],
  "removeCookieWarnings": true,
  "proxyConfiguration": { "useApifyProxy": false }
}
````

Or scrape specific URLs:

```json
{
  "urls": [
    "https://apify.com",
    "https://playwright.dev",
    "https://crawlee.dev"
  ],
  "outputFormats": ["markdown", "text"]
}
```
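The two example inputs above can also be assembled in code. The following is a minimal sketch, assuming you want to catch obviously invalid values before starting a run; `build_run_input` is a hypothetical helper, not part of the Actor or the Apify client, and it only checks the constraints listed in the input table.

```python
ALLOWED_FORMATS = {"text", "markdown", "html"}


def build_run_input(query=None, urls=None, max_results=10, output_formats=("markdown",)):
    """Assemble an input object for the Actor and reject values the input table rules out."""
    if not query and not urls:
        raise ValueError("provide either a query or a list of URLs")
    unknown = set(output_formats) - ALLOWED_FORMATS
    if unknown:
        raise ValueError(f"unsupported output formats: {sorted(unknown)}")
    if max_results < 1:
        raise ValueError("maxResults must be at least 1")

    run_input = {"maxResults": max_results, "outputFormats": list(output_formats)}
    if urls:
        # A non-empty `urls` list skips Google Search, so the query is left out.
        run_input["urls"] = list(urls)
    else:
        run_input["query"] = query
    return run_input
```

The returned dictionary can be passed as `run_input` to the Python client call shown in the API section.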

***

### 📤 Output

Each dataset row contains:

```json
{
  "crawl": {
    "httpStatusCode": 200,
    "httpStatusMessage": "OK",
    "loadedAt": "2026-05-19T12:50:40.591Z",
    "uniqueKey": "21f8d32712",
    "requestStatus": "handled"
  },
  "searchResult": {
    "title": "RAG Web Browser",
    "description": "Web search and fetch tool for AI agents and RAG pipelines ...",
    "url": "https://apify.com/apify/rag-web-browser",
    "resultType": "ORGANIC",
    "rank": 1
  },
  "metadata": {
    "title": "RAG Web Browser · Apify",
    "description": "Web search and fetch tool for AI agents and RAG pipelines.",
    "languageCode": "en",
    "url": "https://apify.com/apify/rag-web-browser",
    "redirectedUrl": "https://apify.com/apify/rag-web-browser"
  },
  "query": "web browser for RAG pipelines -site:reddit.com",
  "markdown": "## RAG Web Browser\n\nWeb search and fetch tool for AI agents..."
}
```

The Apify Console renders the dataset with **five tabs**:

- 📋 **Overview** — everything at a glance
- 📄 **Search results** — rank, title, snippet, URL
- 📑 **Page metadata** — title, description, language, redirect chain
- 🛰️ **Crawl status** — HTTP code, request outcome, timestamps
- 📝 **Extracted content** — Markdown / HTML / plain text per page
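For RAG use, the rows usually need to be condensed into a single context string. The sketch below is one way to do that, assuming rows shaped like the example above; `pick_content` and `build_context` are hypothetical helper names, and the character budget is a rough stand-in for a real token limit.

```python
def pick_content(item, preferred=("markdown", "text", "html")):
    """Return the first available extracted format for a dataset row, or None."""
    for key in preferred:
        if item.get(key):
            return item[key]
    return None


def build_context(items, max_chars=4000):
    """Join page contents into one string, each prefixed with its source URL.

    Stops adding pages once the budget (separators not counted) would be exceeded.
    """
    parts, used = [], 0
    for item in items:
        content = pick_content(item)
        if content is None:
            continue  # e.g. a failed page with no extracted content
        url = item.get("metadata", {}).get("url", "unknown source")
        block = f"Source: {url}\n{content}"
        if used + len(block) > max_chars:
            break
        parts.append(block)
        used += len(block)
    return "\n\n".join(parts)
```

Keeping the source URL next to each page lets the model cite where a statement came from.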

***

### 🚀 How to Use (Apify Console)

1. Go to [Apify Console → Actors](https://console.apify.com/actors).
2. Open this Actor (or import it as a task).
3. Set your **🔎 Search query** *or* paste a list of **🔗 URLs**.
4. Pick which **📝 Output formats** you want (Markdown is the default).
5. Click **▶ Start**.
6. Watch the run feed — you'll see emoji-prefixed live progress: `🔎 Searching…`, `📄 Page 1 → +10 new`, `🔗 [3] Fetching …`, `✅ [3] 200 — Title…`, `📊 Progress: 5/10 (50%)`.
7. Open the **📦 Output** tab to browse results by section.
8. Export as JSON / CSV / XLSX, or pull via the Apify API.

***

### 🤖 Use via API / Integration

#### REST API

```bash
curl -X POST "https://api.apify.com/v2/acts/scraper-engine~rag-web-browser/run-sync-get-dataset-items?token=$APIFY_TOKEN" \
     -H "Content-Type: application/json" \
     -d '{
       "query": "vector database benchmarks 2026",
       "maxResults": 5,
       "outputFormats": ["markdown"]
     }'
```
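In API paths, the Actor is addressed with a tilde between user name and Actor name, as in the paths of the OpenAPI specification further down. A small sketch of building the endpoint URL in Python follows; `run_sync_url` is a hypothetical helper, and it only constructs the URL without sending a request.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://api.apify.com/v2"


def run_sync_url(actor: str, token: str) -> str:
    """Build the run-sync-get-dataset-items URL for an Actor given as 'username/name'."""
    # In API paths the slash in the Actor name is written as a tilde.
    actor_path = quote(actor.replace("/", "~"), safe="~")
    query = urlencode({"token": token})
    return f"{API_BASE}/acts/{actor_path}/run-sync-get-dataset-items?{query}"
```

POST the input JSON to the resulting URL with a `Content-Type: application/json` header, as in the `curl` call above.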

#### Python SDK

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
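run = client.actor("scraper-engine/rag-web-browser").call(run_input={
    "query": "LangChain vs LlamaIndex",
    "maxResults": 5,
})
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    # Failed pages may lack extracted content, so read fields defensively.
    title = item.get("metadata", {}).get("title", "(no title)")
    print(title, "→", item.get("markdown", "")[:200])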
```

#### Drop-in for LangChain retrievers

```python
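from apify_client import ApifyClient
from langchain_core.documents import Document

# Run the Actor and stream its dataset rows.
client = ApifyClient("YOUR_APIFY_TOKEN")
run = client.actor("scraper-engine/rag-web-browser").call(run_input={"query": "LangChain vs LlamaIndex"})
items = client.dataset(run["defaultDatasetId"]).iterate_items()

# Keep only rows that carry Markdown; failed pages have none.
docs = [
    Document(page_content=item["markdown"],
             metadata={"source": item["metadata"]["url"], "rank": item["searchResult"]["rank"]})
    for item in items if item.get("markdown")
]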
```

***

### 🛡️ How blocking & proxies are handled

You don't need to think about proxies; the Actor auto-tunes:

1. **🟢 Direct** by default (fastest, cheapest).
2. If a site blocks us → **🟡 Datacenter proxy** is engaged.
3. Still blocked? → **🔴 Residential proxy** with up to 3 retries.
4. Once residential kicks in, it **sticks** for the rest of the run so successive pages don't fight the same wall.

All escalations are logged so you can audit them, e.g. `🛡️ Switching to residential connection (sticky) — reason: site responded with 403`.
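The ladder above can be modelled as a small state machine. This is an illustrative sketch only, not the Actor's actual code: `ProxyLadder`, `fetch_with_tier`, and the set of status codes treated as "blocked" are all assumptions made for the example.

```python
TIERS = ("direct", "datacenter", "residential")
BLOCK_STATUSES = {403, 429}  # assumption: which status codes count as "blocked"
RESIDENTIAL_ATTEMPTS = 3


class ProxyLadder:
    """Tracks the proxy tier for a run and escalates when a site blocks a request."""

    def __init__(self):
        self.sticky_residential = False

    def fetch(self, url, fetch_with_tier):
        """Try tiers in order and return (tier, status).

        `fetch_with_tier(url, tier)` is a hypothetical callable returning an HTTP status code.
        """
        tiers = ("residential",) if self.sticky_residential else TIERS
        status = None
        for tier in tiers:
            if tier == "residential":
                # Once residential is engaged, it stays on for the rest of the run.
                self.sticky_residential = True
            attempts = RESIDENTIAL_ATTEMPTS if tier == "residential" else 1
            for _ in range(attempts):
                status = fetch_with_tier(url, tier)
                if status not in BLOCK_STATUSES:
                    return tier, status
        # Every tier was blocked; report the last status seen.
        return "residential", status
```

One `ProxyLadder` instance per run keeps the sticky flag shared across pages, which mirrors step 4.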

***

### 🎯 Best Use Cases

- 🧠 **RAG pipelines** — feed fresh web context to your LLM at query time
- 🤖 **AI agents** — give Claude / GPT / Gemini a real web-browsing skill
- 🔬 **Research assistants** — bulk-summarize top N results for a topic
- 📈 **Competitive intelligence** — track competitor pages on a schedule
- 📰 **Content monitoring** — convert articles to Markdown for analysis
- 🪄 **Prompt enrichment** — auto-grab fresh facts before generating text

***

### 💰 Pricing

This actor is **pay-per-usage** — you only pay for the Apify platform compute units (CUs) and proxy traffic it actually uses. There are no separate per-event charges.

| Driver | Notes |
|---|---|
| ⏱️ Compute units | Proportional to memory × runtime. Typical 10-result run = a few cents. |
| 🛡️ Datacenter proxy | Used only if a site blocks the direct request. |
| 🛡️ Residential proxy | Used as a last resort. Higher cost but unblocks most walls. |
| 💾 Storage | A few KB per dataset row. |

> Want to lower cost further? Set `maxResults` lower, enable `htmlTransformer: "readable"`, or skip `html` output.

***

### ❓ Frequently Asked Questions

**Q: Do I need to configure a proxy myself?**
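**A:** No. Escalation is automatic: if a site blocks a direct request, the Actor tries datacenter proxies, then residential. Note that the input schema default for `proxyConfiguration` is `{ "useApifyProxy": true }`, which starts on datacenter proxies; set `useApifyProxy` to `false` to start with direct requests. You only need to pick a proxy explicitly if you want a specific geography.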

**Q: How is the Markdown produced?**
**A:** We parse HTML with [selectolax](https://github.com/rushter/selectolax) (lexbor backend), strip noise via your CSS selectors, optionally isolate the article body, then convert to Markdown via [markdownify](https://github.com/matthewwfinkel/markdownify) with ATX-style headings.

**Q: Can I scrape JavaScript-heavy sites?**
**A:** This Actor uses HTTP-only fetching for maximum speed. For sites that require a full browser (heavy SPA / login flows), use a Playwright-based Actor.

**Q: Does it handle redirects?**
**A:** Yes — `metadata.redirectedUrl` captures the final URL after following redirects.

**Q: What happens if half my pages succeed and half fail?**
**A:** You still get the successful ones. Each record is pushed to the dataset **live**, so a crash mid-run cannot wipe earlier results. Failed pages are saved with `crawl.requestStatus: "failed"` and the error message.
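A short sketch of separating the two groups after a run, assuming failed rows carry `crawl.requestStatus` set to `"failed"` as described in the answer above; `partition_items` is a hypothetical helper name.

```python
def partition_items(items):
    """Split dataset rows into (succeeded, failed) by `crawl.requestStatus`."""
    succeeded, failed = [], []
    for item in items:
        status = item.get("crawl", {}).get("requestStatus")
        (failed if status == "failed" else succeeded).append(item)
    return succeeded, failed
```

The failed list can then be fed back into a second run through the `urls` field.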

**Q: Can I export results?**
**A:** Yes — JSON, CSV, XLSX, RSS, XML, HTML table, all available in the **Output** tab and via the Apify API.

***

### 📜 Cautions / Legal

- The Actor scrapes only **publicly available** web content.
- Don't use it to scrape private, gated, or authenticated content unless you have explicit authorization.
- You are responsible for legal compliance (GDPR, CCPA, site Terms of Service, robots.txt, copyright).
- Be a good citizen — avoid excessive `maxResults` on sites you do not own or operate.

***

### 📨 Support & Feedback

Found a bug or have a feature request? Open an issue from the Actor page in the Apify Console and we'll take a look. PRs welcome.

***

*Built with 💙 on the Apify platform.*

# Actor input Schema

## `query` (type: `string`):

Enter Google Search keywords or a URL of a specific web page. Supports [advanced search operators](https://blog.apify.com/how-to-scrape-google-like-a-pro/). Examples: `san francisco weather`, `https://www.cnn.com`, `function calling site:openai.com`. Leave empty only when using **URLs (optional bulk)** below.

## `urls` (type: `array`):

Skip Google Search and scrape these URLs directly. If set, the search term above is ignored.

## `maxResults` (type: `integer`):

The maximum number of top organic Google Search results whose web pages will be extracted. If the query is a URL, this field is ignored and only that page is fetched.
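Taken together, the `query`, `urls`, and `maxResults` descriptions imply three input paths. The sketch below shows one way to predict which path a given input takes; `resolve_mode` is a hypothetical helper, and the URL check (an `http`/`https` scheme plus a host) is an assumption about how the Actor recognizes a URL query.

```python
from urllib.parse import urlparse


def resolve_mode(run_input):
    """Return 'bulk-urls', 'single-url', or 'search' for an Actor input object."""
    if run_input.get("urls"):
        return "bulk-urls"  # the search term is ignored
    query = (run_input.get("query") or "").strip()
    parsed = urlparse(query)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "single-url"  # maxResults is ignored, only this page is fetched
    if query:
        return "search"
    raise ValueError("input needs a query or a non-empty urls list")
```

For example, `{"query": "function calling site:openai.com"}` resolves to `search`, while `{"query": "https://www.cnn.com"}` resolves to `single-url`.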

## `outputFormats` (type: `array`):

Select one or more formats to which the target web pages will be extracted and saved in the resulting dataset.

## `serpProxyGroup` (type: `string`):

Overrides the default Apify Proxy group used for fetching Google Search results.

## `serpMaxRetries` (type: `integer`):

The maximum number of times the Actor will retry fetching Google Search results on error. If the last attempt fails, the entire search step fails.

## `proxyConfiguration` (type: `object`):

Apify Proxy settings for scraping target web pages. When enabled, requests start on datacenter proxies. If a site blocks the request, the Actor automatically escalates to residential proxies (up to 3 retries, then stays on residential for the rest of the run).

## `scrapingTool` (type: `string`):

Raw HTTP is fast and works for most static sites. Browser (Playwright) mode is not available in this Python build — if selected, Raw HTTP is used instead.

## `removeElementsCssSelector` (type: `string`):

CSS selectors for elements removed from the DOM before conversion to text or Markdown. Set to a non-matching selector like `dummy_keep_everything` to disable removal.

## `htmlTransformer` (type: `string`):

How to transform HTML after element removal. **None** keeps the cleaned page; **Readable text** extracts the main article body.

## `maxRequestRetries` (type: `integer`):

Per-page retry budget after the first failure, before the page is skipped or proxy escalation continues.

## `dynamicContentWaitSecs` (type: `integer`):

Maximum seconds to wait for dynamic content when using Browser mode. Ignored for Raw HTTP (the default scraping tool in this build).

## `removeCookieWarnings` (type: `boolean`):

Remove cookie consent banners before extraction. Slightly increases processing time.

## `debugMode` (type: `boolean`):

Store debugging information (proxy tier, final URL, byte length) in each dataset record under the `debug` field.

## Actor input object example

```json
{
  "query": "web browser for RAG pipelines -site:reddit.com",
  "urls": [],
  "maxResults": 10,
  "outputFormats": [
    "markdown"
  ],
  "serpProxyGroup": "GOOGLE_SERP",
  "serpMaxRetries": 2,
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "scrapingTool": "raw-http",
  "removeElementsCssSelector": "nav, footer, script, style, noscript, svg, img[src^='data:'],\n[role=\"alert\"],\n[role=\"banner\"],\n[role=\"dialog\"],\n[role=\"alertdialog\"],\n[role=\"region\"][aria-label*=\"skip\" i],\n[aria-modal=\"true\"]",
  "htmlTransformer": "none",
  "maxRequestRetries": 1,
  "dynamicContentWaitSecs": 10,
  "removeCookieWarnings": true,
  "debugMode": false
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "query": "web browser for RAG pipelines -site:reddit.com"
};

// Run the Actor and wait for it to finish
const run = await client.actor("scraper-engine/rag-web-browser").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "query": "web browser for RAG pipelines -site:reddit.com" }

# Run the Actor and wait for it to finish
run = client.actor("scraper-engine/rag-web-browser").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "query": "web browser for RAG pipelines -site:reddit.com"
}' |
apify call scraper-engine/rag-web-browser --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=scraper-engine/rag-web-browser",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "RAG Web Browser",
        "description": null,
        "version": "0.2",
        "x-build-id": "WYBu9CYXfAZglVlaI"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/scraper-engine~rag-web-browser/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-scraper-engine-rag-web-browser",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/scraper-engine~rag-web-browser/runs": {
            "post": {
                "operationId": "runs-sync-scraper-engine-rag-web-browser",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/scraper-engine~rag-web-browser/run-sync": {
            "post": {
                "operationId": "run-sync-scraper-engine-rag-web-browser",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "query": {
                        "title": "Search term or URL",
                        "type": "string",
                        "description": "Enter Google Search keywords or a URL of a specific web page. Supports <a href=\"https://blog.apify.com/how-to-scrape-google-like-a-pro/\" target=\"_blank\">advanced search operators</a>. Examples: <code>san francisco weather</code>, <code>https://www.cnn.com</code>, <code>function calling site:openai.com</code>. Leave empty only when using <b>URLs (optional bulk)</b> below."
                    },
                    "urls": {
                        "title": "URLs (optional bulk)",
                        "type": "array",
                        "description": "Skip Google Search and scrape these URLs directly. If set, the search term above is ignored.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxResults": {
                        "title": "Maximum results",
                        "minimum": 1,
                        "maximum": 100000,
                        "type": "integer",
                        "description": "The maximum number of top organic Google Search results whose web pages will be extracted. If the query is a URL, this field is ignored and only that page is fetched.",
                        "default": 10
                    },
                    "outputFormats": {
                        "title": "Output formats",
                        "uniqueItems": true,
                        "type": "array",
                        "description": "Select one or more formats to which the target web pages will be extracted and saved in the resulting dataset.",
                        "items": {
                            "type": "string",
                            "enum": [
                                "text",
                                "markdown",
                                "html"
                            ],
                            "enumTitles": [
                                "Plain text",
                                "Markdown",
                                "HTML"
                            ]
                        },
                        "default": [
                            "markdown"
                        ]
                    },
                    "serpProxyGroup": {
                        "title": "SERP proxy group",
                        "enum": [
                            "GOOGLE_SERP",
                            "SHADER"
                        ],
                        "type": "string",
                        "description": "Overrides the default Apify Proxy group used for fetching Google Search results.",
                        "default": "GOOGLE_SERP"
                    },
                    "serpMaxRetries": {
                        "title": "SERP max retries",
                        "minimum": 0,
                        "maximum": 5,
                        "type": "integer",
                        "description": "The maximum number of times the Actor will retry fetching Google Search results on error. If the last attempt fails, the entire search step fails.",
                        "default": 2
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Apify Proxy settings for scraping target web pages. When enabled, requests start on datacenter proxies. If a site blocks the request, the Actor automatically escalates to residential proxies (up to 3 retries, then stays on residential for the rest of the run).",
                        "default": {
                            "useApifyProxy": true
                        }
                    },
                    "scrapingTool": {
                        "title": "Select a scraping tool",
                        "enum": [
                            "raw-http",
                            "browser-playwright"
                        ],
                        "type": "string",
                        "description": "Raw HTTP is fast and works for most static sites. Browser (Playwright) mode is not available in this Python build — if selected, Raw HTTP is used instead.",
                        "default": "raw-http"
                    },
                    "removeElementsCssSelector": {
                        "title": "Remove HTML elements (CSS selector)",
                        "type": "string",
                        "description": "CSS selectors for elements removed from the DOM before conversion to text or Markdown. Set to a non-matching selector like <code>dummy_keep_everything</code> to disable removal.",
                        "default": "nav, footer, script, style, noscript, svg, img[src^='data:'],\n[role=\"alert\"],\n[role=\"banner\"],\n[role=\"dialog\"],\n[role=\"alertdialog\"],\n[role=\"region\"][aria-label*=\"skip\" i],\n[aria-modal=\"true\"]"
                    },
                    "htmlTransformer": {
                        "title": "HTML transformer",
                        "enum": [
                            "none",
                            "readable"
                        ],
                        "type": "string",
                        "description": "How to transform HTML after element removal. <b>None</b> keeps the cleaned page; <b>Readable text</b> extracts the main article body.",
                        "default": "none"
                    },
                    "maxRequestRetries": {
                        "title": "Target page max retries",
                        "minimum": 0,
                        "maximum": 3,
                        "type": "integer",
                        "description": "Per-page retry budget after the first failure, before the page is skipped or proxy escalation continues.",
                        "default": 1
                    },
                    "dynamicContentWaitSecs": {
                        "title": "Target page dynamic content timeout",
                        "minimum": 0,
                        "maximum": 60,
                        "type": "integer",
                        "description": "Maximum seconds to wait for dynamic content when using Browser mode. Ignored for Raw HTTP (the default scraping tool in this build).",
                        "default": 10
                    },
                    "removeCookieWarnings": {
                        "title": "Remove cookie warnings",
                        "type": "boolean",
                        "description": "Remove cookie consent banners before extraction. Slightly increases processing time.",
                        "default": true
                    },
                    "debugMode": {
                        "title": "Enable debug mode",
                        "type": "boolean",
                        "description": "Store debugging information (proxy tier, final URL, byte length) in each dataset record under the <code>debug</code> field.",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "number",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "number",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
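
The `runsResponseSchema` above describes the run object returned under `data`. As a rough illustration of how a client might read it, the sketch below pulls a few fields out of a response shaped like that schema. The helper name `summarize_run`, the sample IDs, and the sample values are hypothetical; the set of terminal statuses is an assumption based on the Apify platform's documented run statuses and is not part of this schema.

```python
from typing import Any

# Assumption: run statuses that mean the run will not change any more.
# The schema itself only shows "READY" as an example value.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"}


def summarize_run(response: dict[str, Any]) -> dict[str, Any]:
    """Extract a small summary from a response shaped like runsResponseSchema."""
    run = response.get("data", {})
    # usageUsd maps each usage category to its cost in USD.
    usage_usd = run.get("usageUsd") or {}
    return {
        "run_id": run.get("id"),
        "status": run.get("status"),
        "finished": run.get("status") in TERMINAL_STATUSES,
        "dataset_id": run.get("defaultDatasetId"),
        "usage_usd_sum": sum(usage_usd.values()),
        "usage_total_usd": run.get("usageTotalUsd"),
    }


# Hypothetical response built from the example values in the schema.
sample_response = {
    "data": {
        "id": "hypothetical-run-id",
        "status": "READY",
        "defaultDatasetId": "hypothetical-dataset-id",
        "usageTotalUsd": 0.00005,
        "usageUsd": {
            "ACTOR_COMPUTE_UNITS": 0,
            "KEY_VALUE_STORE_WRITES": 0.00005,
        },
    }
}

summary = summarize_run(sample_response)
print(summary)
```

Using `.get()` with defaults keeps the helper tolerant of responses that omit optional fields, which seems prudent because the schema marks none of them as required.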
