# RAG Web Crawler: Clean Markdown + Token-Sized Chunks (`commonelements/rag-ready-crawler`) Actor

Turn any website into embeddings-ready chunks for RAG and vector databases. Structure-aware token-sized chunking, clean LLM-ready markdown, per-chunk citations and metadata, dedup, and junk filtering. Pay per result, no surprise compute bills.

- **URL**: https://apify.com/commonelements/rag-ready-crawler.md
- **Developed by:** [Harry Schoeller](https://apify.com/commonelements) (community)
- **Categories:** AI, Developer tools
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

$2.00 / 1,000 dataset item scrapeds

This Actor is paid per event. You are not charged for the Apify platform usage, but only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are a software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

````bash
# MacOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```bash

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## RAG Web Crawler — Clean Markdown + Token-Sized Chunks, Pay-Per-Result

Turn any website into **embeddings-ready chunks** with citations and predictable
per-chunk pricing. No CSS tuning, no runaway compute bills.

Generic crawlers hand you raw pages and make you build the RAG pipeline yourself.
This actor hands you clean, token-sized, deduplicated, citable chunks — at a fixed
price **per chunk you keep**.

### What it does

- **Clean LLM-ready Markdown** — `@mozilla/readability` strips nav, footers, ads,
  and cookie banners; `turndown` + GFM converts the cleaned DOM to Markdown with
  heading hierarchy, fenced code blocks, and tables preserved.
- **Structure-aware, token-budgeted chunking** — splits on the heading tree, then
  recursively sub-splits oversized sections to your token budget (default 512)
  with overlap (default 75). Code blocks and tables are kept intact, never split
  mid-block.
- **Rich per-chunk provenance** — every chunk ships source URL + deep anchor,
  page title, full headings path, content hash, token count, content type, and
  language for metadata-filtered vector search and deep-link citations.
- **Dedup + junk filtering** — exact content-hash dedup plus 64-bit SimHash
  near-duplicate collapsing, and low-information / nav-residue chunk filtering.
- **Four output formats** — `chunks-jsonl` (one record per chunk), `markdown`
  (one record per page), `langchain` (drop-in `{page_content, metadata}` Document
  JSON), and `jsonl-bulk` (flat one-record-per-chunk for DB/`COPY`/pgvector).
- **Incremental / delta sync** — on scheduled re-runs, only NEW or CHANGED pages
  are re-emitted (and billed). Makes daily/weekly crawls cheap.
- **Budget guarantee** — `maxPages` is a hard ceiling; billing is per emitted
  result, so a runaway crawl can never produce a runaway bill.

### Incremental / delta sync — cheap scheduled re-runs

Turn on **Incremental sync** (`incremental: true`) and schedule the actor to run
daily or weekly. The first run does a full crawl and seeds a per-URL content-hash
state in a named key-value store. Every later run crawls the site, but only
re-emits chunks for pages that are **new** or **changed** — unchanged pages cost
nothing. A typical weekly docs re-crawl re-emits a handful of pages instead of
hundreds.

- **State is automatic.** The state store name defaults to a deterministic hash of
  your start URLs, so a scheduled task reuses its own prior state with zero config.
  Set `stateStoreName` explicitly to share state across tasks/schedules.
- **`forceFullCrawl: true`** re-emits everything and rebuilds the baseline — use
  after changing chunking settings or to refresh a stale index.
- **`emitDeletions: true`** writes a tombstone record (`{ deleted: true, url, ... }`)
  to a separate `deletions` dataset for every URL that disappeared since the last
  run, so downstream vector stores can purge stale vectors. Tombstones are not billed.
- When incremental is ON, each emitted record carries a `change_status`
  (`new` | `changed`) in its metadata.

The run summary (`OUTPUT` key-value record) includes a delta block:
`pages_new`, `pages_changed`, `pages_unchanged`, `pages_deleted`,
`chunks_skipped_unchanged` (the spend you saved), `state_store`, `prior_run_id`.

When all incremental options are OFF (the default), behavior and output are
byte-for-byte identical to v1.0.

### Output (chunks-jsonl)

```json
{
  "id": "a1f3c9e29b2c4d10",
  "url": "https://docs.example.com/guide/install",
  "title": "Getting Started — Example Docs",
  "chunkIndex": 3,
  "chunkTotal": 11,
  "headingsPath": ["Getting Started", "Setup", "Installation"],
  "text": "## Installation\n\nInstall via npm:\n\n```bash\nnpm install crawlee\n```",
  "tokenEstimate": 498,
  "fetchedAt": "2026-06-20T14:02:11Z",
  "content_hash": "sha256:...",
  "metadata": {
    "source_url": "https://docs.example.com/guide/install",
    "deep_link": "https://docs.example.com/guide/install#installation",
    "anchor": "installation",
    "canonical_url": "https://docs.example.com/guide/install",
    "page_title": "Getting Started — Example Docs",
    "char_count": 2104,
    "content_type": "mixed",
    "language": "en",
    "last_modified": null,
    "crawl_timestamp": "2026-06-20T14:02:11Z"
  }
}
````

Each record maps 1:1 to a vector-DB upsert: `{ id, values=embed(text), metadata }`.

### Output (langchain)

One record per chunk, drop-in for LangChain — `[Document(**r) for r in dataset]`:

```json
{
  "page_content": "## Installation\n\nInstall via npm...",
  "metadata": {
    "id": "a1f3c9e29b2c4d10",
    "source": "https://docs.example.com/guide/install",
    "title": "Getting Started — Example Docs",
    "deep_link": "https://docs.example.com/guide/install#installation",
    "canonical_url": "https://docs.example.com/guide/install",
    "headings_path": ["Getting Started", "Setup", "Installation"],
    "chunk_index": 3,
    "chunk_total": 11,
    "content_type": "mixed",
    "language": "en",
    "token_estimate": 498,
    "char_count": 2104,
    "content_hash": "sha256:...",
    "last_modified": null,
    "crawl_timestamp": "2026-06-20T14:02:11Z"
  }
}
```

### Output (jsonl-bulk)

Fully flat one-record-per-chunk for generic bulk import (DB `COPY` / pgvector):

```json
{
  "id": "a1f3c9e29b2c4d10",
  "text": "## Installation\n\nInstall via npm...",
  "source_url": "https://docs.example.com/guide/install",
  "deep_link": "https://docs.example.com/guide/install#installation",
  "canonical_url": "https://docs.example.com/guide/install",
  "title": "Getting Started — Example Docs",
  "headings_path": "Getting Started > Setup > Installation",
  "chunk_index": 3,
  "chunk_total": 11,
  "content_type": "mixed",
  "language": "en",
  "token_estimate": 498,
  "char_count": 2104,
  "content_hash": "sha256:...",
  "last_modified": null,
  "crawl_timestamp": "2026-06-20T14:02:11Z"
}
```

### Input

See `.actor/input_schema.json`. Key fields: `startUrls`, `crawlScope`,
`maxCrawlDepth`, `maxPages`, `renderJs`, `outputFormat`, `chunkSize`,
`chunkOverlap`, `dedupNearDuplicates`, `filterJunkChunks`, and the incremental
sync fields `incremental`, `forceFullCrawl`, `stateStoreName`, `emitDeletions`.

### Pricing

Pay-Per-Event. Billable unit = one emitted dataset item. Deduped and
junk-filtered chunks are **not** billed.

| Event | Price |
|---|---|
| Per chunk emitted (chunks-jsonl) | $0.0008 / chunk ($0.80 / 1,000) |
| Per page emitted (markdown) | $0.002 / page ($2.00 / 1,000) |

### Run locally

```bash
npm install
npm run build
apify run    # reads .actor/INPUT.json
```

### Roadmap (v1.2+)

Inline embeddings, direct vector-DB push (Pinecone/Qdrant/Weaviate/pgvector),
`missedGraceRuns` before tombstoning, Standby low-latency mode.

# Actor input Schema

## `startUrls` (type: `array`):

One or more seed URLs to crawl.

## `crawlScope` (type: `string`):

Which links to follow.

## `maxCrawlDepth` (type: `integer`):

Link hops from a start URL. 0 = only the start URLs.

## `includeGlobs` (type: `array`):

Only crawl URLs matching these glob patterns, e.g. https://x.com/docs/\*\*

## `excludeGlobs` (type: `array`):

Skip URLs matching these globs, e.g. **/tag/**, \*\*?print=1

## `maxPages` (type: `integer`):

Hard ceiling on pages crawled. Protects against runaway crawls and runaway bills.

## `renderJs` (type: `boolean`):

Use a headless browser for JS-heavy/SPA sites. Off = fast static HTTP crawl (recommended default).

## `outputFormat` (type: `string`):

RAG chunks = one record per chunk (recommended). Markdown = one record per page. LangChain = {page\_content, metadata} Document JSON per chunk. JSONL bulk = flat one-record-per-chunk for bulk import.

## `enableChunking` (type: `boolean`):

Split each page into token-sized retrieval units. Forced ON when output format is RAG chunks.

## `chunkSize` (type: `integer`):

Target token budget per chunk (sized for your embedding model).

## `chunkOverlap` (type: `integer`):

Token overlap between adjacent chunks (typically 10-20% of chunk size).

## `dedupNearDuplicates` (type: `boolean`):

Drop exact + near-duplicate chunks (shared boilerplate, syndicated content) via content hash + SimHash.

## `filterJunkChunks` (type: `boolean`):

Drop near-empty, nav-residue, and low-information chunks.

## `proxyConfiguration` (type: `object`):

Proxy configuration for the crawl. Apify Proxy recommended for large or rate-limited sites.

## `incremental` (type: `boolean`):

When ON, pages whose content is unchanged since the last run are crawled but NOT re-emitted (not billed). Requires the actor to persist state in a named key-value store (see 'State store name').

## `forceFullCrawl` (type: `boolean`):

Override: emit every page this run even if incremental is ON, then overwrite the saved state. Use after changing chunking settings or to rebuild a stale index.

## `stateStoreName` (type: `string`):

Named key-value store that persists per-URL content hashes between runs. Defaults to a deterministic name derived from the start URLs so scheduled re-runs of the same task share state automatically. Set explicitly to share state across tasks/schedules.

## `emitDeletions` (type: `boolean`):

When incremental is ON, emit a deletion record for each URL seen last run but missing this run, so downstream vector stores can purge stale vectors. Tombstones are NOT billed.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://docs.apify.com"
    }
  ],
  "crawlScope": "same-subdomain",
  "maxCrawlDepth": 3,
  "includeGlobs": [],
  "excludeGlobs": [],
  "maxPages": 200,
  "renderJs": false,
  "outputFormat": "chunks-jsonl",
  "enableChunking": true,
  "chunkSize": 512,
  "chunkOverlap": 75,
  "dedupNearDuplicates": true,
  "filterJunkChunks": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "incremental": false,
  "forceFullCrawl": false,
  "stateStoreName": "",
  "emitDeletions": false
}
```

# Actor output Schema

## `results` (type: `string`):

No description

## `runSummary` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://docs.apify.com"
        }
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("commonelements/rag-ready-crawler").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "startUrls": [{ "url": "https://docs.apify.com" }] }

# Run the Actor and wait for it to finish
run = client.actor("commonelements/rag-ready-crawler").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://docs.apify.com"
    }
  ]
}' |
apify call commonelements/rag-ready-crawler --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=commonelements/rag-ready-crawler",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "RAG Web Crawler: Clean Markdown + Token-Sized Chunks",
        "description": "Turn any website into embeddings-ready chunks for RAG and vector databases. Structure-aware token-sized chunking, clean LLM-ready markdown, per-chunk citations and metadata, dedup, and junk filtering. Pay per result, no surprise compute bills.",
        "version": "1.1",
        "x-build-id": "cqhSZTFeSPfHr8CBl"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/commonelements~rag-ready-crawler/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-commonelements-rag-ready-crawler",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/commonelements~rag-ready-crawler/runs": {
            "post": {
                "operationId": "runs-sync-commonelements-rag-ready-crawler",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/commonelements~rag-ready-crawler/run-sync": {
            "post": {
                "operationId": "run-sync-commonelements-rag-ready-crawler",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls"
                ],
                "properties": {
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "One or more seed URLs to crawl.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "crawlScope": {
                        "title": "Crawl scope",
                        "enum": [
                            "same-domain",
                            "same-subdomain",
                            "same-path",
                            "start-urls-only"
                        ],
                        "type": "string",
                        "description": "Which links to follow.",
                        "default": "same-subdomain"
                    },
                    "maxCrawlDepth": {
                        "title": "Max crawl depth",
                        "minimum": 0,
                        "maximum": 20,
                        "type": "integer",
                        "description": "Link hops from a start URL. 0 = only the start URLs.",
                        "default": 3
                    },
                    "includeGlobs": {
                        "title": "Include URL globs",
                        "type": "array",
                        "description": "Only crawl URLs matching these glob patterns, e.g. https://x.com/docs/**",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "excludeGlobs": {
                        "title": "Exclude URL globs",
                        "type": "array",
                        "description": "Skip URLs matching these globs, e.g. **/tag/**, **?print=1",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxPages": {
                        "title": "Max pages (hard cap)",
                        "minimum": 1,
                        "maximum": 100000,
                        "type": "integer",
                        "description": "Hard ceiling on pages crawled. Protects against runaway crawls and runaway bills.",
                        "default": 200
                    },
                    "renderJs": {
                        "title": "Render JavaScript",
                        "type": "boolean",
                        "description": "Use a headless browser for JS-heavy/SPA sites. Off = fast static HTTP crawl (recommended default).",
                        "default": false
                    },
                    "outputFormat": {
                        "title": "Output format",
                        "enum": [
                            "chunks-jsonl",
                            "markdown",
                            "langchain",
                            "jsonl-bulk"
                        ],
                        "type": "string",
                        "description": "RAG chunks = one record per chunk (recommended). Markdown = one record per page. LangChain = {page_content, metadata} Document JSON per chunk. JSONL bulk = flat one-record-per-chunk for bulk import.",
                        "default": "chunks-jsonl"
                    },
                    "enableChunking": {
                        "title": "Enable chunking",
                        "type": "boolean",
                        "description": "Split each page into token-sized retrieval units. Forced ON when output format is RAG chunks.",
                        "default": true
                    },
                    "chunkSize": {
                        "title": "Chunk size (tokens)",
                        "minimum": 64,
                        "maximum": 4096,
                        "type": "integer",
                        "description": "Target token budget per chunk (sized for your embedding model).",
                        "default": 512
                    },
                    "chunkOverlap": {
                        "title": "Chunk overlap (tokens)",
                        "minimum": 0,
                        "maximum": 1024,
                        "type": "integer",
                        "description": "Token overlap between adjacent chunks (typically 10-20% of chunk size).",
                        "default": 75
                    },
                    "dedupNearDuplicates": {
                        "title": "Collapse near-duplicate chunks",
                        "type": "boolean",
                        "description": "Drop exact + near-duplicate chunks (shared boilerplate, syndicated content) via content hash + SimHash.",
                        "default": true
                    },
                    "filterJunkChunks": {
                        "title": "Filter junk chunks",
                        "type": "boolean",
                        "description": "Drop near-empty, nav-residue, and low-information chunks.",
                        "default": true
                    },
                    "proxyConfiguration": {
                        "title": "Proxy",
                        "type": "object",
                        "description": "Proxy configuration for the crawl. Apify Proxy recommended for large or rate-limited sites.",
                        "default": {
                            "useApifyProxy": true
                        }
                    },
                    "incremental": {
                        "title": "Incremental sync (only new/changed pages)",
                        "type": "boolean",
                        "description": "When ON, pages whose content is unchanged since the last run are crawled but NOT re-emitted (not billed). Requires the actor to persist state in a named key-value store (see 'State store name').",
                        "default": false
                    },
                    "forceFullCrawl": {
                        "title": "Force full crawl (ignore prior state)",
                        "type": "boolean",
                        "description": "Override: emit every page this run even if incremental is ON, then overwrite the saved state. Use after changing chunking settings or to rebuild a stale index.",
                        "default": false
                    },
                    "stateStoreName": {
                        "title": "State store name",
                        "type": "string",
                        "description": "Named key-value store that persists per-URL content hashes between runs. Defaults to a deterministic name derived from the start URLs so scheduled re-runs of the same task share state automatically. Set explicitly to share state across tasks/schedules.",
                        "default": ""
                    },
                    "emitDeletions": {
                        "title": "Emit tombstones for deleted pages",
                        "type": "boolean",
                        "description": "When incremental is ON, emit a deletion record for each URL seen last run but missing this run, so downstream vector stores can purge stale vectors. Tombstones are NOT billed.",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```