# Smart Page Fetcher (`shelvick/smart-page-fetcher`) Actor

Fetch a batch of URLs adaptively: cheap HTTP for static pages, browser render for JavaScript pages, stealth+residential proxy only for actively defended pages. Pay per URL by the difficulty that actually worked, with browser launches amortized across the batch.

- **URL**: https://apify.com/shelvick/smart-page-fetcher.md
- **Developed by:** [Scott Helvick](https://apify.com/shelvick) (community)
- **Categories:** Developer tools, Automation, AI
- **Stats:** 2 total users, 0 monthly users, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $0.43 / 1,000 pages fetched (basic tier)

This Actor is paid per event. You are not charged for the Apify platform usage, but only a fixed price for specific events.
Since this Actor supports Apify Store discounts, the price gets lower the higher your subscription plan.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# MacOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Smart Page Fetcher

Fetch a batch of URLs adaptively. The Actor tries the cheapest method that works on each URL — plain HTTP first, then a real browser if JavaScript is needed, then a stealth + residential-proxy path only for actively defended pages. You pay only for the difficulty each URL actually needed, with browser startup amortized across the whole batch.

### What this does

Submit a list of URLs. The Actor walks each URL up an escalation chain until something works, then pushes one dataset record per URL with the requested output formats.

- **Tier 1 — basic HTTP.** Plain `GET`, no JavaScript, no proxy. Fast and cheap. Good for static pages, JSON-LD-heavy product pages, documentation, RSS-style content.
- **Tier 2 — JavaScript render.** Real browser, no stealth shims. Loads the page, runs JS, captures the rendered DOM. Good for SPAs and lazy-rendered content.
- **Tier 3 — stealth + residential proxy.** Hardened browser session routed through residential IPs in the country of your choice. Used only when the cheaper tiers can't get past bot defenses.

Each tier can be locked on or off per request. The default is `auto` on all three — escalate from cheapest, stop as soon as a tier returns usable content.
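The three flags can be read as a filter over the ordered tier list. The sketch below is one interpretation, not the Actor's code: it assumes `false` drops a tier, `true` makes the chain start at that tier (dropping cheaper ones), and, when several flags are `true`, the cheapest one wins. The helper name `tier_chain` is hypothetical.

```python
TIERS = ("basic", "js", "stealth")  # cheapest first
VALID_FLAGS = ("auto", "true", "false")


def tier_chain(basic: str = "auto", js: str = "auto", stealth: str = "auto") -> list[str]:
    """Return the ordered tiers a URL may be tried on (hypothetical helper)."""
    flags = {"basic": basic, "js": js, "stealth": stealth}
    for name, value in flags.items():
        if value not in VALID_FLAGS:
            raise ValueError(f"{name} must be one of {VALID_FLAGS}, got {value!r}")

    # 'false' removes a tier from the escalation chain entirely.
    chain = [tier for tier in TIERS if flags[tier] != "false"]
    if not chain:
        raise ValueError("all three tiers are disabled")

    # 'true' starts the chain at that tier; assumption: the cheapest 'true' wins.
    forced = [tier for tier in chain if flags[tier] == "true"]
    if forced:
        chain = chain[chain.index(forced[0]):]
    return chain


print(tier_chain())                 # ['basic', 'js', 'stealth']
print(tier_chain(basic="false"))    # ['js', 'stealth']
print(tier_chain(stealth="true"))   # ['stealth']
print(tier_chain(stealth="false"))  # ['basic', 'js']
```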

Output formats are derived from the same fetched HTML, no extra fetch charge per format:

- **`html`** — raw HTML, returned as a Key-Value Store URL (so the dataset record stays small). Byte-for-byte what the target returned — no scripts, banners, or wrappers injected into the response.
- **`text`** — boilerplate-stripped visible text (LLM-friendly)
- **`markdown`** — page content as Markdown
- **`links`** — every anchor as `{url, text, title}` with relative hrefs resolved
- **`json_ld`** — every `<script type="application/ld+json">` block, parsed
- **`og`** — OpenGraph values (`title`, `description`, `image`, `url`, `type`, `site_name`)
- **`meta`** — other meta tags as a flat dict (description, canonical, viewport, twitter:\*, etc.)
- **`a11y`** — browser accessibility tree as JSON (tier 2/3 only)
- **`screenshot`** — full-page PNG (tier 2/3 only)

`html`, `a11y`, and `screenshot` are uploaded to the Apify Key-Value Store and the dataset record stores a public URL. Smaller structured outputs stay inline.
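To make the `og`, `meta`, and `json_ld` shapes concrete, here is a minimal standard-library sketch of how such values can be pulled from fetched HTML. It is an illustration, not the Actor's extractor. It assumes `og:` keys are stored with the prefix stripped (as the input schema states), other meta tags are keyed by `name` or `property`, `<link rel="canonical">` is folded into `meta`, and later duplicates overwrite earlier ones. `HeadDataParser` and `extract_head_data` are hypothetical names.

```python
import json
from html.parser import HTMLParser


class HeadDataParser(HTMLParser):
    """Collects OpenGraph values, other meta tags, and JSON-LD blocks."""

    def __init__(self) -> None:
        super().__init__()
        self.og: dict[str, str] = {}
        self.meta: dict[str, str] = {}
        self.json_ld: list = []
        self._in_ld = False
        self._ld_buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            key = a.get("property") or a.get("name")
            content = a.get("content")
            if key is None or content is None:
                return  # e.g. <meta charset="utf-8">
            if key.startswith("og:"):
                self.og[key[3:]] = content  # strip the og: prefix
            else:
                self.meta[key] = content  # description, twitter:*, viewport, ...
        elif tag == "link" and a.get("rel") == "canonical" and a.get("href"):
            self.meta["canonical"] = a["href"]
        elif tag == "script" and (a.get("type") or "").lower() == "application/ld+json":
            self._in_ld = True
            self._ld_buf = []

    def handle_data(self, data):
        if self._in_ld:
            self._ld_buf.append(data)  # script text may arrive in several chunks

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld:
            self._in_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._ld_buf)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than failing the page


def extract_head_data(html: str) -> dict:
    parser = HeadDataParser()
    parser.feed(html)
    parser.close()
    return {"og": parser.og, "meta": parser.meta, "json_ld": parser.json_ld}


sample = """<html><head>
<meta property="og:title" content="Example">
<meta name="description" content="A demo page">
<link rel="canonical" href="https://example.com/">
<script type="application/ld+json">{"@type": "Article", "headline": "Hi"}</script>
</head><body></body></html>"""
print(extract_head_data(sample))
```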

### Why adaptive tiering matters

The cost gap between fetch methods is enormous. A plain HTTP request takes 100ms and ~$0.0005 of resources. A full stealth render against a Cloudflare-protected site can take 30 seconds and burn an entire residential proxy session worth ~$0.03. If you pre-commit to one method, you either overpay for easy pages or fail on hard ones.

A worse failure mode: the page *renders* but is silently wrong. Bot defenses sometimes serve a 200 with a JavaScript challenge interstitial that looks like a page to a naive HTTP client — the LLM downstream gets a string of obfuscated JS instead of the article body and has no way to know.

This Actor solves both ends. It picks the right method per URL — paying basic-tier rates for the 90% of URLs that don't need anything fancy, and only escalating to the expensive paths when the cheaper ones return content that fails an escalation check (known anti-bot markers, JS-required signals, the typical 403/429/503 status codes from bot defenses). The customer never has to think about which tier is right; they just submit URLs.
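The escalation check is not documented beyond the signals listed above. A minimal sketch of that kind of check follows. The marker strings and the "almost no visible text" heuristic are illustrative assumptions, not the Actor's actual rules, and `should_escalate` is a hypothetical name.

```python
import re

# Status codes the README lists as typical bot-defense responses.
BOT_DEFENSE_STATUSES = {403, 429, 503}

# Illustrative markers only; real anti-bot and JS-required pages vary and change.
ANTI_BOT_MARKERS = ("cf-chl", "just a moment...", "captcha")
JS_REQUIRED_MARKERS = ("enable javascript", "requires javascript")


def should_escalate(status: int, html: str) -> bool:
    """Return True if a cheaper tier's response should be retried on the next tier."""
    if status in BOT_DEFENSE_STATUSES:
        return True
    lowered = html.lower()
    if any(marker in lowered for marker in ANTI_BOT_MARKERS + JS_REQUIRED_MARKERS):
        return True
    # Assumption: a page shell with almost no visible text needs a JS render.
    visible = re.sub(r"<script\b.*?</script>|<style\b.*?</style>|<[^>]+>", " ", lowered, flags=re.S)
    return len(visible.split()) < 5


print(should_escalate(200, "<html><body><p>Plenty of readable article text here.</p></body></html>"))  # False
print(should_escalate(503, "<html></html>"))  # True
print(should_escalate(200, '<div id="root"></div><script src="app.js"></script>'))  # True
```

A content check like this is what catches the "200 with a challenge interstitial" case, which a status-code check alone would let through.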

Batches amortize the fixed costs too. A real browser takes 3-5 seconds to launch — paid once per batch, not once per URL. Pass 50 URLs and the browser cost spreads across all of them.

### Use cases

- Fetch a mixed batch of public pages where you don't know which will be easy and which will be defended
- Build a corpus of pages for downstream LLM extraction (use `outputs: ["markdown"]` and let the Actor pick the cheapest tier that returns the rendered content)
- Refresh structured data (JSON-LD, OpenGraph) across a list of product / article / event URLs
- Capture screenshots and accessibility trees across a list of pages for QA or audit purposes
- Pull pages from a specific country (residential proxy, tier 3, via the `country` field)

### How it compares to fixed-method scrapers

| Approach | Static HTML | JavaScript-rendered | Bot-defended | Cost on easy pages |
|---|---|---|---|---|
| Plain HTTP fetch | ✓ | ✗ | ✗ | cheapest |
| Always-stealth fetcher | ✓ | ✓ | ✓ | overpaying for easy pages |
| **Smart Page Fetcher** | ✓ | ✓ | ✓ | basic-tier price |

You don't pay tier-2 rates on a tier-1 page and you don't fail on a tier-3 page. One callable surface, one batch input, one set of output formats — regardless of which tier each URL ended up on.

### Input

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `urls` | array | ✓ | — | List of URLs to fetch, 1-500 per batch. Each entry is either a plain URL string or an object `{"url": "...", "headers": {...}}`. URLs must start with `http://` or `https://`. |
| `basic` | enum: `auto` / `true` / `false` | | `auto` | Controls the basic HTTP tier. `auto`: included in the escalation chain. `true`: starts here. `false`: skipped. |
| `js` | enum: `auto` / `true` / `false` | | `auto` | Controls the JS render tier. |
| `stealth` | enum: `auto` / `true` / `false` | | `auto` | Controls the stealth tier. |
| `outputs` | array of strings | | `["html", "markdown"]` | Any combination of `html`, `text`, `markdown`, `a11y`, `screenshot`, `links`, `json_ld`, `og`, `meta`. |
| `runtime_budget_ms` | integer (30000–3600000) | | `270000` | Total wall-clock budget for the whole batch. Unprocessed URLs come back as `deferred` (zero charge). The default keeps synchronous callers under Apify's 5-minute sync API limit with headroom. |
| `country` | string (ISO-2) | | — | Optional country code (e.g. `US`, `GB`, `DE`) forwarded to the stealth tier's residential proxy. Ignored by basic and JS tiers. |

The convenience input `url: "<single-url>"` is accepted silently as syntactic sugar for `urls: ["<single-url>"]`.
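On the client side it can help to normalize input the same way before submitting, so mistakes surface before a run starts. The sketch below is an assumption about how the documented rules combine (the `url` shorthand, string or object entries, `http://`/`https://` only, 1 to 500 entries, no keys besides `url` and `headers`). `normalize_urls` is hypothetical, and the Actor still performs its own validation.

```python
def normalize_urls(run_input: dict) -> list[dict]:
    """Return urls as [{"url": ..., "headers": {...}}, ...] (hypothetical helper)."""
    entries = run_input.get("urls")
    if not entries and run_input.get("url"):
        entries = [run_input["url"]]  # `url` is sugar for a one-item `urls`
    if not entries:
        raise ValueError("`urls` is empty or missing, and no `url` was given")
    if len(entries) > 500:
        raise ValueError("at most 500 URLs per batch")

    normalized = []
    for entry in entries:
        if isinstance(entry, str):
            entry = {"url": entry}
        if not isinstance(entry, dict) or not isinstance(entry.get("url"), str):
            raise ValueError(f"invalid urls entry: {entry!r}")
        unknown = set(entry) - {"url", "headers"}
        if unknown:
            raise ValueError(f"unknown keys in urls entry: {sorted(unknown)}")
        if not entry["url"].startswith(("http://", "https://")):
            raise ValueError(f"URL must start with http:// or https://: {entry['url']}")
        normalized.append({"url": entry["url"], "headers": dict(entry.get("headers") or {})})
    return normalized
```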

#### Per-URL request headers

When a `urls` entry uses the object form, the `headers` map sets request headers for that URL only; they don't leak to other URLs in the same batch. The headers apply the same way on all three tiers (httpx GET, Playwright context, stealth backend).

Allowed header names (case-insensitive):

- `Accept`
- `Accept-Language`
- `Accept-Encoding`
- `User-Agent`
- `Referer`
- `Content-Type`

Anything else — `Cookie`, `Authorization`, `Proxy-Authorization`, any `X-*` header, `Origin`, etc. — is rejected at input validation time. The Actor is a general-purpose unauthenticated fetcher; allowing credential or session headers would turn it into an authenticated-session proxy on demand. Use a purpose-built Actor for authenticated scraping.
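Because the allowlist is matched case-insensitively, a caller can pre-check per-URL headers locally before starting a run. A minimal sketch; `check_headers` is a hypothetical helper that mirrors the list above, not code taken from the Actor.

```python
ALLOWED_HEADERS = {
    "accept", "accept-language", "accept-encoding",
    "user-agent", "referer", "content-type",
}


def check_headers(headers: dict[str, str]) -> dict[str, str]:
    """Raise ValueError if any header name is outside the documented allowlist."""
    rejected = sorted(name for name in headers if name.lower() not in ALLOWED_HEADERS)
    if rejected:
        raise ValueError(f"headers not allowed: {', '.join(rejected)}")
    return headers


check_headers({"accept": "application/json", "User-Agent": "my-bot/1.0"})  # passes
try:
    check_headers({"Accept": "text/html", "X-Api-Key": "secret"})
except ValueError as err:
    print(err)  # headers not allowed: X-Api-Key
```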

Example: route Reddit-style listings through the JSON variant by setting `Accept`:

```json
{
  "urls": [
    "https://example.com",
    {
      "url": "https://old.reddit.com/r/programming/.json",
      "headers": { "Accept": "application/json" }
    }
  ]
}
````

### Output

One dataset record per URL, in input order.

**Success record** (a tier returned usable content):

```json
{
  "url": "https://example.com",
  "status": "success",
  "realized_tier": "basic",
  "attempted_tiers": ["basic"],
  "final_url": "https://example.com",
  "response_status": 200,
  "outputs": {
    "html": "https://api.apify.com/v2/key-value-stores/.../records/html-0.html",
    "markdown": "This domain is for use in documentation examples..."
  }
}
```

**Failure record** (every allowed tier was tried and errored):

```json
{
  "url": "https://example.com",
  "status": "failed",
  "attempted_tiers": ["basic", "js", "stealth"],
  "tier_errors": {
    "basic": "http_404",
    "js": "navigation_failed",
    "stealth": "upstream_unsolved"
  }
}
```

**Deferred record** (runtime budget exhausted before the URL was attempted):

```json
{
  "url": "https://example.com",
  "status": "deferred",
  "reason": "runtime_budget_exhausted",
  "attempted_tiers": []
}
```

Only `success` records trigger a fetch charge. `failed` and `deferred` are zero-charge.

The run's `OUTPUT.json` (visible in the run summary) is a small batch-level object:

```json
{
  "batch_size": 50,
  "by_tier": { "basic": 41, "js": 7, "stealth": 1 },
  "failed": 1,
  "deferred": 0,
  "duration_ms": 28430,
  "runtime_budget_exhausted": false
}
```
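If you only have the dataset (for example, from `run-sync-get-dataset-items`), the batch-level counts can be recomputed from the per-URL records. A small sketch, assuming records shaped like the examples above; `summarize` is hypothetical. It cannot reproduce `duration_ms`, and it infers `runtime_budget_exhausted` from the presence of a deferred record with that reason.

```python
from collections import Counter


def summarize(records: list[dict]) -> dict:
    """Rebuild OUTPUT.json-style counts from dataset records (hypothetical helper)."""
    statuses = Counter(r["status"] for r in records)
    by_tier = Counter(r["realized_tier"] for r in records if r["status"] == "success")
    return {
        "batch_size": len(records),
        "by_tier": dict(by_tier),
        "failed": statuses["failed"],  # Counter returns 0 for missing keys
        "deferred": statuses["deferred"],
        # Inferred: a deferred record with this reason implies the budget ran out.
        "runtime_budget_exhausted": any(
            r.get("reason") == "runtime_budget_exhausted" for r in records
        ),
    }


records = [
    {"url": "https://a.example", "status": "success", "realized_tier": "basic"},
    {"url": "https://b.example", "status": "success", "realized_tier": "js"},
    {"url": "https://c.example", "status": "failed", "attempted_tiers": ["basic", "js", "stealth"]},
    {"url": "https://d.example", "status": "deferred", "reason": "runtime_budget_exhausted"},
]
print(summarize(records))
```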

### Example

```json
{
  "urls": [
    "https://example.com",
    "https://news.ycombinator.com",
    "https://www.python.org/"
  ],
  "outputs": ["markdown", "links", "og"]
}
```

Via the API:

```bash
curl -X POST "https://api.apify.com/v2/acts/shelvick~smart-page-fetcher/run-sync-get-dataset-items?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://example.com"], "outputs": ["markdown"]}'
```

`run-sync-get-dataset-items` blocks until the run finishes and returns the dataset records directly. Use the async run endpoint for batches over ~50 URLs (Apify's sync endpoint has a 5-minute cap).
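The choice between the two endpoints can be driven by the batch's runtime budget. A standard-library sketch, assuming budgets at or under the 270000 ms default go to the sync endpoint and larger ones to the async `/runs` endpoint; that threshold is a rule of thumb drawn from the text above, not an Actor requirement. `build_run_request` is hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
SYNC_SAFE_BUDGET_MS = 270_000  # the Actor's default, sized to fit the 5-minute sync cap


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the sync or async run endpoint."""
    budget = run_input.get("runtime_budget_ms", SYNC_SAFE_BUDGET_MS)
    endpoint = "run-sync-get-dataset-items" if budget <= SYNC_SAFE_BUDGET_MS else "runs"
    path_id = actor_id.replace("/", "~")  # API paths use "owner~name"
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        f"{API_BASE}/acts/{path_id}/{endpoint}?{query}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_run_request(
    "shelvick/smart-page-fetcher",
    "YOUR_TOKEN",
    {"urls": ["https://example.com"], "outputs": ["markdown"], "runtime_budget_ms": 900_000},
)
print(req.get_method(), req.full_url)
```

Sending it is `urllib.request.urlopen(req)`. The async endpoint returns run metadata rather than dataset items, so you wait for the run to finish before reading its default dataset.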

### Calling from an AI agent

The Actor is designed for agent discovery and invocation.

**Apify MCP server** (`mcp.apify.com`): the Actor surfaces as a callable tool. The input schema is self-documenting, so an LLM can construct correct calls from the tool description without external context. Pay per call via x402 USDC on Base or Skyfire managed tokens.

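**Apify API client** (Python):

```python
import os

from apify_client import ApifyClient

# Assumes the token is exported as APIFY_TOKEN; any secret store works.
client = ApifyClient(token=os.environ["APIFY_TOKEN"])

# .call() starts the run and blocks until it finishes
run = client.actor("shelvick/smart-page-fetcher").call(
    run_input={"urls": ["https://example.com"], "outputs": ["markdown"]}
)
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    # failed and deferred records have no "outputs" key, hence .get()
    print(item["url"], item["status"], item.get("outputs", {}).get("markdown"))
```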

**REST API**: `/run-sync-get-dataset-items` for batches that fit under 5 minutes, the async `/runs` endpoint for larger batches.

### Pricing

Pay-per-event, billed only on success. Each URL is charged once at the tier that actually produced its content — `basic`, `js`, or `stealth`. `failed` and `deferred` URLs are free. A single Actor-start event is amortized across the whole batch. Higher tiers cost more because they involve more infrastructure (a real browser at tier 2, a real browser plus residential proxy at tier 3); on a typical mixed batch most URLs land on tier 1 and the effective per-URL cost is dominated by that floor.
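Because each URL is billed once at its realized tier and the start event once per run, the effective per-URL cost is a weighted average over the batch. The sketch below computes it from `by_tier` counts. Only the basic rate ($0.43 per 1,000, before discounts) comes from this page; the `js`, `stealth`, and start-event prices are placeholders to replace with the values on the Pricing tab. `effective_cost_per_url` is a hypothetical name.

```python
def effective_cost_per_url(by_tier: dict[str, int], price_per_1000: dict[str, float],
                           start_fee: float, batch_size: int) -> float:
    """Total run cost divided by batch size; failed/deferred URLs add no fetch charge."""
    fetch_cost = sum(count * price_per_1000[tier] / 1000 for tier, count in by_tier.items())
    return (start_fee + fetch_cost) / batch_size


prices = {"basic": 0.43, "js": 1.00, "stealth": 5.00}  # js/stealth are PLACEHOLDERS
cost = effective_cost_per_url({"basic": 41, "js": 7, "stealth": 1}, prices,
                              start_fee=0.005,  # PLACEHOLDER start-event price
                              batch_size=50)    # includes the 1 failed URL
print(f"${cost:.6f} per URL")
```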

See the **Pricing** tab on this Store page for the current per-event rates and any active subscriber discounts.

### Errors

The run itself is marked **FAILED** only on input validation problems:

- `urls` empty or missing, with no `url` either
- A URL doesn't start with `http://` or `https://`
- A `urls` object-form entry carries a header name outside the allowlist (see *Per-URL request headers* above), or an unknown key
- All three tier flags set to `false` (no tier allowed)
- `screenshot` or `a11y` requested but the tier chain excludes JS and stealth
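The last rule ties the `outputs` list to the tier flags. A sketch of the output checks, assuming `screenshot` and `a11y` are satisfiable whenever at least one browser tier (`js` or `stealth`) is not `false`; `check_outputs` is a hypothetical helper.

```python
KNOWN_OUTPUTS = {"html", "text", "markdown", "a11y", "screenshot", "links", "json_ld", "og", "meta"}
BROWSER_ONLY = {"a11y", "screenshot"}


def check_outputs(outputs: list[str], js: str = "auto", stealth: str = "auto") -> None:
    """Raise ValueError for output combinations the README says fail the run."""
    unknown = sorted(set(outputs) - KNOWN_OUTPUTS)
    if unknown:
        raise ValueError(f"unknown output formats: {unknown}")
    needs_browser = BROWSER_ONLY.intersection(outputs)
    if needs_browser and js == "false" and stealth == "false":
        raise ValueError(f"{sorted(needs_browser)} need the js or stealth tier")


check_outputs(["markdown", "screenshot"], js="false")  # ok: stealth can still render
```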

Per-URL fetch failures don't fail the run — they land in the dataset as `failed` records with a `tier_errors` map. Common reasons:

- `http_404` / `http_410` / `http_5xx` — target returned a terminal HTTP error
- `non_html_content_type: <type>` — target returned binary, JSON, or PDF; we only handle HTML/XML
- `navigation_failed` — browser couldn't reach the page (DNS, TLS, timeout)
- `upstream_unsolved` — stealth tier couldn't bypass the target's bot defenses
- `upstream_policy_block` — stealth backend refused the URL by policy (target on its denylist); not retried
- `upstream_rate_limited` — backend hit a rate limit; retried internally before giving up
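When triaging a batch, it is often enough to know why each URL failed on the last tier it reached. A sketch assuming the `tier_errors` shape shown earlier, where a code such as `non_html_content_type: <type>` carries a detail after a colon; `last_error_counts` is hypothetical and the sample records are made up for illustration.

```python
from collections import Counter


def last_error_counts(records: list[dict]) -> Counter:
    """Count failed URLs by the error code of the last tier they reached."""
    counts: Counter = Counter()
    for record in records:
        if record.get("status") != "failed":
            continue
        tiers = record.get("attempted_tiers") or []
        errors = record.get("tier_errors", {})
        if not tiers or tiers[-1] not in errors:
            continue
        code = errors[tiers[-1]].split(":", 1)[0]  # drop details like ": application/pdf"
        counts[code] += 1
    return counts


failed = [
    {"status": "failed", "attempted_tiers": ["basic", "js", "stealth"],
     "tier_errors": {"basic": "http_404", "js": "navigation_failed", "stealth": "upstream_unsolved"}},
    {"status": "failed", "attempted_tiers": ["basic", "js", "stealth"],
     "tier_errors": {"basic": "http_5xx", "js": "navigation_failed",
                     "stealth": "non_html_content_type: application/pdf"}},
    {"status": "success", "realized_tier": "basic"},
]
print(last_error_counts(failed))
```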

### Performance expectations

Latency depends on the realized tier:

- Tier 1: 100–500 ms per URL, fired in parallel up to 50 at a time
- Tier 2: 3–10 s per URL, ~3 in parallel sharing one browser (3-5 s of browser startup amortized across the batch)
- Tier 3: 15–90 s per URL, ~5 in parallel; the long tail is bot-defense challenge solving

A typical 50-URL mixed batch finishes in 30-90 seconds. Pure tier-1 batches finish in 2-5 seconds. The Actor's runtime budget defaults to 4 minutes 30 seconds (270000 ms) — raise it up to 60 minutes for very large batches.
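These figures can be combined into a rough wall-clock estimate. The sketch assumes an all-`auto` chain whose phases run one after another (all tier-1 attempts first, then tier-2 escalations, then tier-3, as the input schema describes), that each phase runs in waves of its stated concurrency, and that browser startup is paid once. It ignores Actor start overhead. `estimate_batch_seconds` and the profile table are hypothetical.

```python
import math

# (seconds per URL low, high, concurrency) per tier, from the figures above
TIER_PROFILE = {
    "basic": (0.1, 0.5, 50),
    "js": (3.0, 10.0, 3),
    "stealth": (15.0, 90.0, 5),
}
BROWSER_STARTUP_S = (3.0, 5.0)  # paid once per batch if any browser tier runs


def estimate_batch_seconds(by_tier: dict[str, int]) -> tuple[float, float]:
    """Rough (low, high) wall-clock estimate for an all-`auto` batch."""
    # With escalation, every URL hits basic; js and stealth URLs also pass through js.
    attempts = {
        "basic": sum(by_tier.values()),
        "js": by_tier.get("js", 0) + by_tier.get("stealth", 0),
        "stealth": by_tier.get("stealth", 0),
    }
    low = high = 0.0
    for tier, n in attempts.items():
        per_low, per_high, concurrency = TIER_PROFILE[tier]
        waves = math.ceil(n / concurrency)
        low += waves * per_low
        high += waves * per_high
    if attempts["js"]:
        low += BROWSER_STARTUP_S[0]
        high += BROWSER_STARTUP_S[1]
    return low, high


print(estimate_batch_seconds({"basic": 41, "js": 7, "stealth": 1}))
```

For the 49 successful URLs in the `OUTPUT.json` example this gives roughly 27 to 126 seconds, with the single stealth URL contributing most of the spread.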

### FAQ

**What if I know all my URLs need JavaScript?**
Set `basic: "false"`. The chain starts at JS and skips the wasted tier-1 attempt.

**What if I know my URLs are heavily defended?**
Set `stealth: "true"`. The chain starts directly at the stealth tier — saves the cost of two failed lower tiers.

**Can I cap costs at JS-tier price and refuse stealth?**
Yes: `stealth: "false"`. URLs that fail JS will come back as `failed` records (zero charge) instead of escalating to the expensive tier.

**What if my batch is too big to finish in 5 minutes?**
Raise `runtime_budget_ms` up to 3600000 (60 minutes) and use Apify's async run endpoint. Anything not finished before the budget runs out returns as `deferred` and you can retry just those.
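Retrying only the deferred URLs means building a new input from the dataset. A sketch, assuming you kept the original input: deferred records carry only the URL, so per-URL `headers` have to come from there. `retry_input` is hypothetical.

```python
from typing import Optional


def retry_input(original_input: dict, records: list[dict]) -> Optional[dict]:
    """Return a new run input containing only the deferred URLs, or None."""
    deferred = {r["url"] for r in records if r.get("status") == "deferred"}
    if not deferred:
        return None
    entries = original_input.get("urls") or [original_input["url"]]
    # Keep the original entry (string or object form) so headers survive the retry.
    kept = [e for e in entries if (e if isinstance(e, str) else e["url"]) in deferred]
    # Copy every other setting (tier flags, outputs, budget, country) unchanged.
    new_input = {k: v for k, v in original_input.items() if k not in ("url", "urls")}
    new_input["urls"] = kept
    return new_input
```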

**Do I get charged if a fetch fails?**
No. Charges fire only after a success record is pushed to the dataset. Failed and deferred URLs cost nothing per URL; the Actor-start fee is the only baseline charge.

**Do screenshot and a11y work on tier 1?**
No — they require a real browser. Requesting them with `js: "false"` and `stealth: "false"` is a configuration error and the run will fail at validation time. With `auto`, the start tier is bumped automatically.

**Can I crawl a site with this?**
Not directly. The Actor takes a list of URLs and fetches them; it does not follow links. Use the `links` output to feed a second invocation if you want a one-level crawl.
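A one-level crawl then means: fetch the seed with `outputs: ["links"]`, filter the returned links, and submit them as the next batch. A sketch of the filtering step, assuming the `links` output is a list of `{url, text, title}` objects with absolute URLs (as described above). Keeping only same-host links and dropping `#fragment` variants are illustrative choices, and `next_batch` is hypothetical.

```python
from urllib.parse import urldefrag, urlparse

MAX_BATCH = 500  # documented per-batch limit


def next_batch(seed_url: str, links: list[dict], limit: int = MAX_BATCH) -> list[str]:
    """Pick same-host http(s) links from a `links` output, deduplicated, in order."""
    seed_host = urlparse(seed_url).netloc
    seen, batch = set(), []
    for link in links:
        url, _fragment = urldefrag(link["url"])  # '#section' variants are the same page
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or parsed.netloc != seed_host:
            continue
        if url not in seen and url != seed_url:
            seen.add(url)
            batch.append(url)
        if len(batch) >= limit:
            break
    return batch


links = [
    {"url": "https://example.com/a", "text": "A", "title": None},
    {"url": "https://example.com/a#top", "text": "A again", "title": None},
    {"url": "mailto:hi@example.com", "text": "Mail", "title": None},
    {"url": "https://other.example/b", "text": "B", "title": None},
]
print(next_batch("https://example.com/", links))  # ['https://example.com/a']
```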

**Is the returned HTML unmodified?**
Yes — byte-for-byte what the target server returned, with no scripts or wrappers injected by us or by the storage layer. Useful when feeding the HTML to a DOM parser, a diff tool, or an LLM that's particular about its inputs.

### What this doesn't do

- **No authentication.** Read-only, unauthenticated. Per-URL headers are limited to content-negotiation and polite-identification headers (`Accept`, `Accept-Language`, `User-Agent`, `Referer`, etc.); credential or session headers (`Cookie`, `Authorization`, `X-*`) are rejected at input validation. The fetch is anonymous from the target's perspective.
- **No forms or interactions.** This is a fetcher, not a browser-automation tool. Use a dedicated Actor for clicking, scrolling, or form submission.
- **No automatic pagination.** Pass the paginated URLs as a batch yourself.
- **No PDF or binary content.** HTML/XML only. Non-HTML responses come back as `failed` with `non_html_content_type`.
- **No retries on terminal failures.** The stealth tier retries internally on transient backend errors, but a 404 or unsolved challenge is final — we don't re-queue it.

For workflows that need any of these, this Actor is the right primitive to *build on* — call it from your own orchestrator and handle the higher-level loop there.

# Actor input Schema

## `urls` (type: `array`):

List of URLs to fetch. The Actor processes them as a batch: tier-1 (plain HTTP) is tried for all URLs in parallel first, escalating any that fail to tier-2 (JavaScript render), and only the still-failed ones to tier-3 (stealth + residential proxy). Browser launch and Actor overhead are amortized across the batch, so larger batches lower the effective per-URL cost. Each entry is either a string (URL only) or an object {"url": "https://...", "headers": {"Accept": "application/json"}} to set request headers for that URL. Allowed header names: Accept, Accept-Language, Accept-Encoding, User-Agent, Referer, Content-Type. Anything else (Cookie, Authorization, X-\*) is rejected to prevent the Actor from being used as an authenticated session proxy.

## `basic` (type: `string`):

Controls the plain-HTTP tier. 'auto' lets it participate in the escalation chain (lowest cost first). 'false' skips it (use when callers know JS is required). 'true' forces the chain to start here (the default behavior in most cases).

## `js` (type: `string`):

Controls the JavaScript-render tier (real browser, no stealth, no proxy). 'auto' uses it after basic fails. 'true' starts the chain here when caller knows JS is needed. 'false' caps cost at basic (or jumps straight to stealth if basic fails too).

## `stealth` (type: `string`):

Controls the stealth tier (anti-bot bypass with rotating residential proxy). 'auto' uses it as the last fallback. 'true' starts here when caller knows the target has aggressive defenses (saves the cost of two failed lower tiers). 'false' caps the per-URL max at the JS-render tier price.

## `outputs` (type: `array`):

Output formats to compute per URL — derived from the same fetched HTML, no extra fetch charge. Choose any combination of: html (full raw HTML, returned as a Key-Value Store URL), text (boilerplate-stripped visible text), markdown (page content as Markdown), a11y (browser accessibility tree, requires JS render or higher, returned as a Key-Value Store URL), screenshot (full-page PNG, requires JS render or higher, returned as a Key-Value Store URL), links (every anchor as {url, text, title}), json\_ld (Schema.org JSON-LD blocks), og (OpenGraph values with og: prefix stripped), meta (other meta tags as a flat dict including twitter:\*). Unknown formats cause the run to fail.

## `runtime_budget_ms` (type: `integer`):

Total wall-clock budget for the whole batch. When exhausted, unprocessed URLs come back as 'deferred' records (zero charge) and the run ends cleanly. Default 270000 (4m30s) keeps synchronous callers under Apify's 5-minute sync API timeout with headroom. Raise up to 3600000 (60m) for large async batches.
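The budget behaviour can be pictured as a deadline checked before each URL starts. A minimal sketch under that assumption; the Actor's real scheduler works tier by tier in parallel and may check the budget differently. `run_with_budget` and the injected `fetch` and `clock` callables are hypothetical, and the clock is injectable so the logic can be exercised without waiting.

```python
import time
from typing import Callable


def run_with_budget(urls: list[str], fetch: Callable[[str], dict], budget_ms: int,
                    clock: Callable[[], float] = time.monotonic) -> list[dict]:
    """Fetch URLs in order; once the budget is spent, emit deferred records instead."""
    deadline = clock() + budget_ms / 1000
    records = []
    for url in urls:
        if clock() >= deadline:
            records.append({"url": url, "status": "deferred",
                            "reason": "runtime_budget_exhausted", "attempted_tiers": []})
        else:
            records.append(fetch(url))
    return records


# Fake clock and fetcher: each fetch "takes" 20 s against a 30 s budget.
fake_now = [0.0]


def fake_clock() -> float:
    return fake_now[0]


def fake_fetch(url: str) -> dict:
    fake_now[0] += 20.0
    return {"url": url, "status": "success"}


out = run_with_budget(["u1", "u2", "u3"], fake_fetch, budget_ms=30_000, clock=fake_clock)
print([r["status"] for r in out])  # ['success', 'success', 'deferred']
```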

## `country` (type: `string`):

Optional ISO-3166-1 alpha-2 country code (e.g. 'US', 'GB', 'DE'). Forwarded to the stealth tier's residential proxy for geo-targeting. Ignored by basic and JS tiers.

## Actor input object example

```json
{
  "urls": [
    "https://example.com",
    {
      "url": "https://example.org",
      "headers": {
        "Accept": "application/json"
      }
    }
  ],
  "basic": "auto",
  "js": "auto",
  "stealth": "auto",
  "outputs": [
    "html",
    "markdown"
  ],
  "runtime_budget_ms": 270000
}
```

# Actor output Schema

## `results` (type: `string`):

One dataset record per input URL — status (success/failed/deferred), realized tier, and the requested outputs (HTML/text/Markdown/a11y/screenshot URL/links/JSON-LD/OpenGraph/meta).

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "urls": [
        "https://example.com",
        {
            "url": "https://example.org",
            "headers": {
                "Accept": "application/json"
            }
        }
    ],
    "basic": "auto",
    "js": "auto",
    "stealth": "auto",
    "outputs": [
        "html",
        "markdown"
    ],
    "runtime_budget_ms": 270000
};

// Run the Actor and wait for it to finish
const run = await client.actor("shelvick/smart-page-fetcher").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "urls": [
        "https://example.com",
        {
            "url": "https://example.org",
            "headers": { "Accept": "application/json" },
        },
    ],
    "basic": "auto",
    "js": "auto",
    "stealth": "auto",
    "outputs": [
        "html",
        "markdown",
    ],
    "runtime_budget_ms": 270000,
}

# Run the Actor and wait for it to finish
run = client.actor("shelvick/smart-page-fetcher").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "urls": [
    "https://example.com",
    {
      "url": "https://example.org",
      "headers": {
        "Accept": "application/json"
      }
    }
  ],
  "basic": "auto",
  "js": "auto",
  "stealth": "auto",
  "outputs": [
    "html",
    "markdown"
  ],
  "runtime_budget_ms": 270000
}' |
apify call shelvick/smart-page-fetcher --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=shelvick/smart-page-fetcher",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Smart Page Fetcher",
        "description": "Fetch a batch of URLs adaptively: cheap HTTP for static pages, browser render for JavaScript pages, stealth+residential proxy only for actively defended pages. Pay per URL by the difficulty that actually worked, with browser launches amortized across the batch.",
        "version": "0.0",
        "x-build-id": "rCePdavLRGrC4Vudb"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/shelvick~smart-page-fetcher/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-shelvick-smart-page-fetcher",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/shelvick~smart-page-fetcher/runs": {
            "post": {
                "operationId": "runs-sync-shelvick-smart-page-fetcher",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/shelvick~smart-page-fetcher/run-sync": {
            "post": {
                "operationId": "run-sync-shelvick-smart-page-fetcher",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "urls": {
                        "title": "URLs",
                        "minItems": 1,
                        "maxItems": 500,
                        "type": "array",
                        "description": "List of URLs to fetch. The Actor processes them as a batch: tier-1 (plain HTTP) is tried for all URLs in parallel first, escalating any that fail to tier-2 (JavaScript render), and only the still-failed ones to tier-3 (stealth + residential proxy). Browser launch and Actor overhead are amortized across the batch, so larger batches lower the effective per-URL cost. Each entry is either a string (URL only) or an object {\"url\": \"https://...\", \"headers\": {\"Accept\": \"application/json\"}} to set request headers for that URL. Allowed header names: Accept, Accept-Language, Accept-Encoding, User-Agent, Referer, Content-Type. Anything else (Cookie, Authorization, X-*) is rejected to prevent the Actor from being used as an authenticated session proxy."
                    },
                    "basic": {
                        "title": "Basic HTTP tier",
                        "enum": [
                            "auto",
                            "true",
                            "false"
                        ],
                        "type": "string",
                        "description": "Controls the plain-HTTP tier. 'auto' lets it participate in the escalation chain (lowest cost first). 'false' skips it (use when callers know JS is required). 'true' forces the chain to start here (the default behavior in most cases).",
                        "default": "auto"
                    },
                    "js": {
                        "title": "JavaScript render tier",
                        "enum": [
                            "auto",
                            "true",
                            "false"
                        ],
                        "type": "string",
                        "description": "Controls the JavaScript-render tier (real browser, no stealth, no proxy). 'auto' uses it after basic fails. 'true' starts the chain here when caller knows JS is needed. 'false' caps cost at basic (or jumps straight to stealth if basic fails too).",
                        "default": "auto"
                    },
                    "stealth": {
                        "title": "Stealth + proxy tier",
                        "enum": [
                            "auto",
                            "true",
                            "false"
                        ],
                        "type": "string",
                        "description": "Controls the stealth tier (anti-bot bypass with rotating residential proxy). 'auto' uses it as the last fallback. 'true' starts here when caller knows the target has aggressive defenses (saves the cost of two failed lower tiers). 'false' caps the per-URL max at the JS-render tier price.",
                        "default": "auto"
                    },
                    "outputs": {
                        "title": "Output formats",
                        "type": "array",
                        "description": "Output formats to compute per URL, all derived from the same fetched HTML at no extra fetch charge. Choose any combination of: html (full raw HTML, returned as a Key-Value Store URL), text (boilerplate-stripped visible text), markdown (page content as Markdown), a11y (browser accessibility tree, requires JS render or higher, returned as a Key-Value Store URL), screenshot (full-page PNG, requires JS render or higher, returned as a Key-Value Store URL), links (every anchor as {url, text, title}), json_ld (Schema.org JSON-LD blocks), og (OpenGraph values with the og: prefix stripped), meta (other meta tags as a flat dict, including twitter:*). Unknown formats cause the run to fail.",
                        "items": {
                            "type": "string"
                        },
                        "default": [
                            "html",
                            "markdown"
                        ]
                    },
                    "runtime_budget_ms": {
                        "title": "Total runtime budget (ms)",
                        "minimum": 30000,
                        "maximum": 3600000,
                        "type": "integer",
                        "description": "Total wall-clock budget for the whole batch. When exhausted, unprocessed URLs come back as 'deferred' records (zero charge) and the run ends cleanly. Default 270000 (4m30s) keeps synchronous callers under Apify's 5-minute sync API timeout with headroom. Raise up to 3600000 (60m) for large async batches.",
                        "default": 270000
                    },
                    "country": {
                        "title": "Proxy geo (stealth tier only)",
                        "pattern": "^[A-Z]{2}$",
                        "type": "string",
                        "description": "Optional ISO-3166-1 alpha-2 country code (e.g. 'US', 'GB', 'DE'). Forwarded to the stealth tier's residential proxy for geo-targeting. Ignored by basic and JS tiers."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
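
Several input constraints above are easy to trip over when building the input programmatically: the tier flags are the strings `"true"`/`"false"`/`"auto"` rather than JSON booleans, only six request headers are accepted, unknown output formats fail the whole run, and `runtime_budget_ms` must fall between 30000 and 3600000. The sketch below checks those rules client-side before a run is started. It is not part of the Actor: the function name `build_input` is hypothetical, the top-level `urls` key is an assumption (the property name falls outside this excerpt), header names are matched exactly as spelled in the schema, and rejecting `a11y`/`screenshot` when both browser tiers are disabled is a conservative assumption rather than documented behavior.

```python
import re
from typing import Iterable, Optional, Sequence, Union

# Constraints copied from the input schema above.
ALLOWED_HEADERS = {"Accept", "Accept-Language", "Accept-Encoding",
                   "User-Agent", "Referer", "Content-Type"}
KNOWN_OUTPUTS = {"html", "text", "markdown", "a11y", "screenshot",
                 "links", "json_ld", "og", "meta"}
BROWSER_ONLY_OUTPUTS = {"a11y", "screenshot"}  # need JS render or higher
TIER_VALUES = {"auto", "true", "false"}        # strings, not JSON booleans
MIN_BUDGET_MS, MAX_BUDGET_MS = 30_000, 3_600_000


def build_input(urls: Iterable[Union[str, dict]], *, basic: str = "auto",
                js: str = "auto", stealth: str = "auto",
                outputs: Sequence[str] = ("html", "markdown"),
                runtime_budget_ms: int = 270_000,
                country: Optional[str] = None) -> dict:
    """Build a run_input dict, raising ValueError on values the schema rejects."""
    entries = []
    for entry in urls:
        if isinstance(entry, str):          # plain URL string
            entries.append(entry)
            continue
        headers = entry.get("headers") or {}
        # Exact-name match; the schema does not say whether the Actor ignores case.
        rejected = set(headers) - ALLOWED_HEADERS
        if rejected:
            raise ValueError(f"{entry['url']}: disallowed headers {sorted(rejected)}")
        clean = {"url": entry["url"]}
        if headers:
            clean["headers"] = dict(headers)
        entries.append(clean)

    for name, value in (("basic", basic), ("js", js), ("stealth", stealth)):
        if value not in TIER_VALUES:        # also catches True/False passed as booleans
            raise ValueError(f"{name} must be 'auto', 'true' or 'false', got {value!r}")

    unknown = set(outputs) - KNOWN_OUTPUTS
    if unknown:                             # the Actor fails the whole run on these
        raise ValueError(f"unknown output formats {sorted(unknown)}")
    # Assumption: with both browser tiers off, a11y/screenshot cannot be produced.
    if js == "false" and stealth == "false" and BROWSER_ONLY_OUTPUTS & set(outputs):
        raise ValueError("a11y and screenshot need the js or stealth tier")

    if not MIN_BUDGET_MS <= runtime_budget_ms <= MAX_BUDGET_MS:
        raise ValueError(f"runtime_budget_ms must be in [{MIN_BUDGET_MS}, {MAX_BUDGET_MS}]")

    # "urls" as the top-level key is an assumption (its name is outside this excerpt).
    run_input = {"urls": entries, "basic": basic, "js": js, "stealth": stealth,
                 "outputs": list(outputs), "runtime_budget_ms": runtime_budget_ms}
    if country is not None:
        if not re.fullmatch(r"[A-Z]{2}", country):
            raise ValueError("country must be an uppercase ISO-3166-1 alpha-2 code")
        run_input["country"] = country
    return run_input


# Example: one plain URL plus one URL with an allowed Accept header.
run_input = build_input(
    ["https://example.com",
     {"url": "https://example.com/api", "headers": {"Accept": "application/json"}}],
    outputs=["text", "links"],
    country="US",
)
```

Failing locally is cheaper than failing on the platform: a rejected `Cookie` header or a typo in `outputs` surfaces as a `ValueError` before any run is created.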

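The `runsResponseSchema` above describes the run object returned when a run is started through the REST API, wrapped in a `data` envelope; its `defaultDatasetId` is where per-URL records end up. The following is a minimal sketch, written against the 1.x `apify-client` Python package (`ApifyClient`, `actor().call()`, `dataset().iterate_items()`), of starting a batch, summarizing the run object, and collecting URLs that were deferred when `runtime_budget_ms` ran out. The helper names are hypothetical, and so are the dataset item fields used by `deferred_urls`: this excerpt only says unprocessed URLs come back as "deferred" records, not which keys carry that status. Adding 60 seconds of platform timeout on top of the runtime budget is a design choice of this sketch, not a documented requirement.

```python
import os
from typing import Iterable, Optional, Tuple

ACTOR_ID = "shelvick/smart-page-fetcher"


def run_record(response: dict) -> dict:
    """Return the run object, unwrapping the raw REST envelope {"data": {...}} if present."""
    return response.get("data", response)


def summarize_run(response: dict) -> dict:
    """Pick the run-object fields that callers usually need next."""
    run = run_record(response)
    return {
        "id": run.get("id"),
        "status": run.get("status"),
        "dataset_id": run.get("defaultDatasetId"),         # per-URL records
        "kv_store_id": run.get("defaultKeyValueStoreId"),  # run's default key-value store
        "usage_total_usd": run.get("usageTotalUsd"),
    }


def deferred_urls(items: Iterable[dict]) -> list:
    """HYPOTHETICAL: assumes each dataset item has "url" and "status" keys and that
    budget-exhausted URLs carry status == "deferred". Verify against real output."""
    return [item["url"] for item in items if item.get("status") == "deferred"]


def run_batch(run_input: dict, token: Optional[str] = None) -> Tuple[dict, list]:
    """Start the Actor, wait for it to finish, and return (summary, dataset items)."""
    # Imported lazily so the helpers above work without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token or os.environ["APIFY_TOKEN"])
    budget_s = run_input.get("runtime_budget_ms", 270_000) // 1000
    # Give the platform timeout some headroom over the Actor's own runtime budget.
    run = client.actor(ACTOR_ID).call(run_input=run_input, timeout_secs=budget_s + 60)
    if run is None:
        raise RuntimeError("no run object returned")
    summary = summarize_run(run)
    items = list(client.dataset(summary["dataset_id"]).iterate_items())
    return summary, items
```

For example, `summary, items = run_batch({"urls": ["https://example.com"]})` followed by `deferred_urls(items)` would give a list to resubmit in a later batch, since deferred records carry no charge. As noted above, both the `urls` key and the item fields are assumptions to check against a real run.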