# PDF URL to Markdown, Tables & RAG Extractor (`thescrapelab/apify-pdf-url-scraper`) Actor

Extract clean Markdown, page text, tables, metadata, summaries, and AI-ready RAG chunks from PDF URLs.

- **URL**: https://apify.com/thescrapelab/apify-pdf-url-scraper.md
- **Developed by:** [Inus Grobler](https://apify.com/thescrapelab) (community)
- **Categories:** Agents, Automation, Developer tools
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

From $1.50 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price decreases on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## PDF to Markdown & AI-Ready Document Extractor

Convert PDF URLs into clean Markdown and structured JSON for AI agents, RAG pipelines, document processing workflows, scraping pipelines, and downstream Apify Actors.

This Actor downloads one PDF URL, extracts page-level content, converts the document to Markdown, optionally uses an OpenRouter LLM for cleanup, and can create source-aware RAG chunks.

### Features

- Convert PDF URLs to clean Markdown.
- Extract page-level text and page-level Markdown.
- Extract PDF metadata such as title, author, subject, creator, producer, dates, page count, file size, hash, and final URL.
- LLM modes enable table extraction and OCR fallback by default.
- Optional LLM cleanup with either the cheap or premium OpenRouter model.
- RAG-ready chunks with page references and source URL.
- Dynamic memory defaults: 512 MB for `no_llm`, 1024 MB for `llm_cheap`, and 2048 MB for `llm_premium`.
- Robust download logic with redirects, realistic headers, retries, PDF signature checks, size limits, and proxy fallback only when needed.
- One dataset item per processed page, so Apify's default result event can be used as per-page pricing.
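
The signature and size checks listed above can be pictured with a minimal sketch. The Actor's actual download code is not shown in this README, so the helper name `check_pdf_payload`, the 1024-byte search window, and the 50 MB limit are assumptions made purely for illustration.

```python
from typing import Tuple

# Every PDF file begins with a "%PDF-" header followed by the version number.
PDF_MAGIC = b"%PDF-"


def check_pdf_payload(data: bytes, max_bytes: int = 50 * 1024 * 1024) -> Tuple[bool, str]:
    """Return (ok, reason) for a downloaded payload.

    Hypothetical helper: the size limit and search window are illustrative
    values, not the Actor's real configuration.
    """
    if not data:
        return False, "empty response body"
    if len(data) > max_bytes:
        return False, f"file exceeds size limit of {max_bytes} bytes"
    # Some producers emit a few stray bytes before the header, so search the
    # first 1024 bytes instead of requiring an exact prefix.
    if PDF_MAGIC not in data[:1024]:
        return False, "missing %PDF- signature"
    return True, "ok"
```

A check like this catches the common case where a URL returns an HTML error page or login wall with a `.pdf` extension.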

### Input Options

The public Apify input form has two fields: one PDF URL and one mode.

```json
{
  "pdfUrl": "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf",
  "mode": "no_llm"
}
````

LLM cleanup example:

```json
{
  "pdfUrl": "https://example.com/document.pdf",
  "mode": "llm_cheap"
}
```

Visible fields:

- `pdfUrl`: one PDF URL.
- `mode`: `no_llm`, `llm_cheap`, or `llm_premium`.

Advanced JSON/API fields are still supported for automation and legacy integrations, but they are not shown in the public form.
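
When building the input in code, it can help to validate the two visible fields before starting a run. The sketch below mirrors the URL pattern and the mode enum from the input schema in the OpenAPI specification on this page; the function name `build_input` is hypothetical, and the advanced fields are not covered.

```python
import re

# Values taken from the Actor's published input schema.
ALLOWED_MODES = ("no_llm", "llm_cheap", "llm_premium")
URL_PATTERN = re.compile(r"^https?://.+")


def build_input(pdf_url: str, mode: str = "no_llm") -> dict:
    """Return an input object for the Actor, or raise ValueError."""
    if not URL_PATTERN.match(pdf_url):
        raise ValueError(f"pdfUrl must start with http:// or https://, got {pdf_url!r}")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {ALLOWED_MODES}, got {mode!r}")
    return {"pdfUrl": pdf_url, "mode": mode}
```

For example, `build_input("https://example.com/document.pdf", "llm_cheap")` produces the same object as the LLM cleanup example above.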

### Modes

- `no_llm`: fast PDF extraction with no LLM, OCR, or table extraction. This is the lowest-cost mode.
- `llm_cheap`: AI-ready extraction with RAG chunks, table extraction, OCR fallback, and OpenRouter cheap-model cleanup.
- `llm_premium`: AI-ready extraction with RAG chunks, table extraction, OCR fallback, and OpenRouter premium-model cleanup.
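
The default behaviour of the three modes can be summarised as a small lookup table. This is a reading aid built from the descriptions in this README, not the Actor's internal configuration; the `ModeProfile` class and `modes_with` helper are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModeProfile:
    llm_cleanup: bool
    table_extraction: bool
    ocr_fallback: bool
    rag_chunks: bool
    default_memory_mb: int


# Defaults as described in this README; advanced API fields may override them.
MODE_PROFILES = {
    "no_llm": ModeProfile(False, False, False, False, 512),
    "llm_cheap": ModeProfile(True, True, True, True, 1024),
    "llm_premium": ModeProfile(True, True, True, True, 2048),
}


def modes_with(feature: str) -> List[str]:
    """List the modes that enable the given feature by default."""
    return [name for name, profile in MODE_PROFILES.items() if getattr(profile, feature)]
```

For instance, `modes_with("table_extraction")` lists the two LLM modes, which is a quick way to confirm that `no_llm` will not return tables.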

### LLM Configuration

LLM usage is optional and off by default for cost control. In production, AI features use the OpenRouter API key configured on the Actor.

OpenRouter is the Actor's native LLM provider. The following environment variables configure it, together with two table-extraction settings:

- `OPENROUTER_API_KEY`
- `OPENROUTER_CHEAP_MODEL`
- `OPENROUTER_PREMIUM_MODEL`
- `OPENROUTER_MODEL`
- `OPENROUTER_INPUT_COST_PER_MILLION`
- `OPENROUTER_OUTPUT_COST_PER_MILLION`
- `PDF_TABLE_EXTRACTION_MAX_PAGES` (default `150`): structured table extraction is skipped for PDFs above this page count to protect memory and runtime.
- `PDFPLUMBER_ENABLE_TEXT_TABLES`: enables aggressive whitespace-based table detection. Disabled by default to avoid false positives.

Do not ask users to paste API keys into the input form. Configure keys as Actor environment variables or Apify secrets.
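
As a rough illustration of how these variables could be consumed, the sketch below reads them from the environment and derives two decisions from them. It is not the Actor's code: the helper names are hypothetical, and it assumes that the cost variables are USD per million tokens, that a missing cost defaults to `0`, and that `PDFPLUMBER_ENABLE_TEXT_TABLES` is switched on with the literal value `true`.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        # Only record whether a key exists; never log or return the key itself.
        "api_key_present": bool(env.get("OPENROUTER_API_KEY")),
        "table_max_pages": int(env.get("PDF_TABLE_EXTRACTION_MAX_PAGES", "150")),
        "text_tables": env.get("PDFPLUMBER_ENABLE_TEXT_TABLES", "false").strip().lower() == "true",
        "input_cost_per_million": float(env.get("OPENROUTER_INPUT_COST_PER_MILLION", "0")),
        "output_cost_per_million": float(env.get("OPENROUTER_OUTPUT_COST_PER_MILLION", "0")),
    }


def should_extract_tables(settings: dict, page_count: int) -> bool:
    """Table extraction is skipped above the configured page count."""
    return page_count <= settings["table_max_pages"]


def estimate_llm_cost_usd(settings: dict, input_tokens: int, output_tokens: int) -> float:
    """Estimate LLM spend, assuming the cost variables are USD per million tokens."""
    return (
        input_tokens * settings["input_cost_per_million"]
        + output_tokens * settings["output_cost_per_million"]
    ) / 1_000_000
```

Passing a plain dictionary instead of `os.environ` keeps the helpers easy to test without touching real secrets.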

### Output Format

The Actor pushes one dataset item per processed page. This means Apify's `apify-default-dataset-item` result event acts as per-page pricing for successful PDFs. Failed PDFs still push one failure row.

If every processed page has empty Markdown, the Actor suppresses dataset output for that PDF so users are not charged for empty page rows.
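
Because each dataset row is one billable result, the result charge for a PDF can be estimated from its page count. The sketch below uses the listed starting price of $1.50 per 1,000 results; the function names are hypothetical, the real price depends on your subscription plan, and treating the single failure row as billable is an assumption rather than something this README states.

```python
def billable_rows(processed_pages: int, failed: bool = False, all_pages_empty: bool = False) -> int:
    """Number of dataset rows a single PDF is expected to produce."""
    if failed:
        return 1  # one failure row (assumed to count like any other row)
    if all_pages_empty:
        return 0  # output is suppressed when every page has empty Markdown
    return processed_pages


def estimate_result_cost_usd(rows: int, price_per_1000: float = 1.50) -> float:
    """Result charge at a flat per-1,000 price, before any plan discount."""
    return rows * price_per_1000 / 1000
```

Under these assumptions, a 12-page PDF produces 12 rows, which is about $0.018 at the starting price.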

Document-level Markdown and optional artifacts are saved in the key-value store. Each page row includes document metadata plus the current page's text, Markdown, tables, links, and matching RAG chunks.

```json
{
  "sourceUrl": "https://example.com/document.pdf",
  "finalUrl": "https://example.com/document.pdf",
  "status": "success",
  "recordType": "page",
  "fileName": "document.pdf",
  "fileSizeBytes": 842193,
  "contentHash": "sha256-hash-here",
  "title": "Document title",
  "author": "Author",
  "subject": "Subject",
  "createdDate": "2026-01-01T00:00:00Z",
  "modifiedDate": "2026-02-01T00:00:00Z",
  "pageCount": 12,
  "processedPageCount": 12,
  "language": "en",
  "processedAt": "2026-05-07T00:00:00.000Z",
  "processingDurationMs": 1842,
  "mode": "llm_cheap",
  "inputMode": "llm_cheap",
  "processingMode": "ai_ready",
  "llmPreset": "llm_cheap",
  "page": 1,
  "pageNumber": 1,
  "pageIndex": 0,
  "isFirstPage": true,
  "isLastProcessedPage": false,
  "markdownText": "Markdown for this page",
  "markdown": "Markdown for this page",
  "text": "Raw page text...",
  "pageMarkdownText": "Markdown for this page",
  "pageMarkdown": "Markdown for this page",
  "pageText": "Raw page text...",
  "pages": [
    {
      "page": 1,
      "text": "Raw page text...",
      "markdown": "Markdown for this page",
      "tables": [],
      "links": [],
      "textCharCount": 1234,
      "markdownCharCount": 1250,
      "tableCount": 0,
      "linkCount": 0,
      "source": "native",
      "qualityScore": 260
    }
  ],
  "tables": [
    {
      "tableIndex": 0,
      "page": 1,
      "markdown": "| Item | Price |\n| --- | --- |\n| Example | R120 |",
      "rows": [
        {
          "Item": "Example",
          "Price": "R120"
        }
      ],
      "rowCount": 2,
      "columnCount": 2,
      "confidence": 0.82,
      "extractionMethod": "pdfplumber"
    }
  ],
  "ragChunks": [
    {
      "chunkId": "stable-short-id",
      "chunkIndex": 0,
      "pageStart": 1,
      "pageEnd": 2,
      "text": "Chunk text...",
      "markdown": "Chunk markdown...",
      "charCount": 842,
      "tokenEstimate": 211,
      "headings": ["Document heading"],
      "sourceUrl": "https://example.com/document.pdf"
    }
  ],
  "summary": "Optional summary.",
  "keywords": ["optional", "keywords"],
  "extractedData": null,
  "documentStats": {
    "markdownCharCount": 58214,
    "rawTextCharCount": 54008,
    "tableCount": 3,
    "ragChunkCount": 49,
    "emptyPageCount": 0,
    "ocrUsed": false,
    "llmCleanupUsed": false
  },
  "download": {
    "attempts": 1,
    "usedProxy": false,
    "contentType": "application/pdf"
  },
  "outputKeys": {
    "markdown": "OUTPUT_MARKDOWN"
  },
  "documentMarkdownKey": "OUTPUT_MARKDOWN",
  "warnings": [],
  "errors": []
}
```
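
Since rows arrive one per page, a consumer that only has the dataset may want to stitch pages back together. The following sketch groups page rows by `sourceUrl` and joins `pageMarkdown` in page order. The function name is hypothetical, and joining pages with a blank line is an assumption, so the result may not match the stored document-level Markdown exactly.

```python
from collections import defaultdict
from typing import Dict, Iterable


def markdown_by_document(items: Iterable[dict]) -> Dict[str, str]:
    """Rebuild per-document Markdown from page rows, ordered by pageNumber."""
    pages = defaultdict(list)
    for item in items:
        # Skip failure rows and anything that is not a page record.
        if item.get("recordType") != "page":
            continue
        pages[item["sourceUrl"]].append((item["pageNumber"], item.get("pageMarkdown", "")))
    return {
        url: "\n\n".join(md for _, md in sorted(rows, key=lambda row: row[0]))
        for url, rows in pages.items()
    }
```

Sorting by `pageNumber` matters because dataset rows are not guaranteed here to be read back in page order.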

Failed items are still pushed:

```json
{
  "sourceUrl": "https://example.com/broken.pdf",
  "status": "failed",
  "recordType": "failure",
  "processedAt": "2026-05-07T00:00:00.000Z",
  "errors": [
    {
      "step": "download",
      "message": "Failed to download PDF after retries"
    }
  ],
  "warnings": []
}
```
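
Downstream code will usually want to separate failure rows from page rows before processing. A minimal sketch, using only the `status`, `recordType`, `sourceUrl`, and `errors` fields shown above (the helper names are hypothetical):

```python
from typing import Iterable, List, Tuple


def split_results(items: Iterable[dict]) -> Tuple[List[dict], List[dict]]:
    """Partition dataset rows into (page_rows, failure_rows)."""
    pages, failures = [], []
    for item in items:
        if item.get("status") == "failed" or item.get("recordType") == "failure":
            failures.append(item)
        else:
            pages.append(item)
    return pages, failures


def describe_failure(item: dict) -> str:
    """Format a failure row as a one-line, log-friendly message."""
    reasons = "; ".join(
        f"{err.get('step', 'unknown')}: {err.get('message', '')}"
        for err in item.get("errors", [])
    )
    url = item.get("sourceUrl", "<unknown URL>")
    return f"{url} failed ({reasons or 'no error details'})"
```

Applied to the failure row above, `describe_failure` returns `https://example.com/broken.pdf failed (download: Failed to download PDF after retries)`.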

The full document Markdown is stored in the key-value store under `OUTPUT_MARKDOWN` for single-PDF runs, or `OUTPUT_MARKDOWN_001`, `OUTPUT_MARKDOWN_002`, and so on for batches. The Actor does not build one combined Markdown file for all PDFs, which keeps batch memory usage lower. Dataset items include `documentStats`, `download`, and `outputKeys` objects for monitoring and downstream automation.
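
The key naming scheme can be captured in a small helper, paired with a fetch function for the key-value store. This is a sketch under stated assumptions: the three-digit, 1-based suffix is inferred from the two example keys above, `fetch_markdown` expects a client object shaped like the official Python `ApifyClient` (its `key_value_store(...).get_record(...)` call), and the record is assumed to be a dict with a `value` entry. When a dataset row is available, reading its `documentMarkdownKey` field is more reliable than reconstructing the key.

```python
from typing import Optional


def markdown_key(position: Optional[int] = None) -> str:
    """Key for the document Markdown: no argument for single-PDF runs,
    a 1-based position for batch runs (suffix format inferred from examples)."""
    if position is None:
        return "OUTPUT_MARKDOWN"
    if position < 1:
        raise ValueError("position is 1-based")
    return f"OUTPUT_MARKDOWN_{position:03d}"


def fetch_markdown(client, store_id: str, key: str = "OUTPUT_MARKDOWN") -> Optional[str]:
    """Return the stored Markdown, or None if the record does not exist."""
    record = client.key_value_store(store_id).get_record(key)
    return None if record is None else record["value"]


# Usage with the official client (not executed here):
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_API_TOKEN>")
#   run = client.actor("thescrapelab/apify-pdf-url-scraper").call(run_input={...})
#   markdown = fetch_markdown(client, run["defaultKeyValueStoreId"])
```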

### Use Cases

- Convert PDFs to Markdown for AI prompts and agents.
- Prepare PDFs for RAG ingestion and vector databases.
- Extract page-level text with source references.
- Extract tables for finance, procurement, research, and compliance workflows.
- Clean messy PDF text with optional LLM cleanup.
- Process scanned PDFs with OCR fallback.
- Feed downstream Apify Actors with consistent document JSON.

### Cost Notes

- `no_llm` is the cheapest mode.
- `llm_cheap` uses the cheaper OpenRouter model.
- `llm_premium` uses the premium OpenRouter model for harder PDFs.
- The Actor uses 512 MB for `no_llm`, 1024 MB for `llm_cheap`, and 2048 MB for `llm_premium` by default.
- The default run timeout is 3600 seconds on Apify so large LLM PDFs have room to finish.
- OCR and table extraction are off in `no_llm` mode to keep runs cheap.
- OCR fallback and table extraction are enabled in `llm_cheap` and `llm_premium` because those modes carry the higher paid feature set.
- Large text PDFs use a fast native extraction path before heavier cleanup, which keeps `llm_cheap` more efficient.
- Very large PDFs skip structured table extraction by default after `PDF_TABLE_EXTRACTION_MAX_PAGES` to avoid timeout and memory failures.
- Table extraction uses conservative pdfplumber strategies by default. Enable `PDFPLUMBER_ENABLE_TEXT_TABLES=true` only when whitespace-based tables are more important than speed/noise control.
- Long documents are compacted before document-level LLM tasks.
- Page-image export, source PDF saving, diagnostics, OCR, and LLM tasks can increase compute or storage costs.

### Limitations

- Some scanned PDFs require OCR, and OCR quality depends on installed Tesseract language packs.
- Complex, nested, or visually designed tables may not extract perfectly.
- LLM cleanup can improve formatting but may introduce interpretation.
- Very large PDFs may take longer to process; when testing, consider limiting the page range through the advanced JSON/API fields.
- Password-protected or encrypted PDFs are not supported.
- Full embedded image extraction is not implemented yet; page PNG export is available for review.

### Local Checks

Install dependencies and run tests:

```bash
python3 -m venv .venv-local
.venv-local/bin/pip install -r requirements.txt
.venv-local/bin/python -m unittest discover
```

Run locally through the Apify runtime or CLI with an input similar to:

```json
{
  "pdfUrl": "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf",
  "mode": "no_llm"
}
```

### FAQ

#### Does it use an LLM by default?

No. The default `no_llm` mode does not use the LLM.

#### Can it process multiple PDFs?

The public form is single-URL by design. Advanced/API batch input with `pdfUrls` is still accepted for automation.

#### Does it support RAG?

Yes. `llm_cheap` and `llm_premium` create source-aware chunks by default. Advanced API users can also enable chunks in other modes.
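
Before loading chunks into a vector database, it is usually worth flattening them into one list with citation metadata. The sketch below collects `ragChunks` from all page rows and removes duplicates by `sourceUrl` and `chunkId`. Whether a chunk that spans two pages is repeated on both page rows is not stated here, so the de-duplication is a defensive assumption, and the output record shape (`id`, `text`, `metadata`) is a generic one rather than any particular vector store's format.

```python
from typing import Iterable, List


def collect_chunks(items: Iterable[dict]) -> List[dict]:
    """Flatten ragChunks from page rows into unique, ordered records."""
    unique = {}
    for item in items:
        for chunk in item.get("ragChunks") or []:
            key = (chunk.get("sourceUrl") or "", chunk["chunkId"])
            unique.setdefault(key, chunk)  # keep the first copy seen
    ordered = sorted(
        unique.values(),
        key=lambda c: (c.get("sourceUrl") or "", c.get("chunkIndex", 0)),
    )
    return [
        {
            "id": c["chunkId"],
            "text": c["text"],
            "metadata": {
                "sourceUrl": c.get("sourceUrl"),
                "pageStart": c.get("pageStart"),
                "pageEnd": c.get("pageEnd"),
                "headings": c.get("headings", []),
            },
        }
        for c in ordered
    ]
```

Keeping `pageStart`, `pageEnd`, and `sourceUrl` in the metadata lets a retrieval answer cite the exact pages it came from.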

#### Does it extract tables?

`no_llm` skips table extraction for speed and cost. `llm_cheap` and `llm_premium` enable table extraction by default using `pdfplumber` heuristics. Complex tables may still need review.

#### What happens on broken URLs?

The Actor pushes a failed dataset item with `status: "failed"` and an `errors` array describing the failed step.

#### Why are there only two inputs?

The Apify form shows only the options most users need: a PDF URL and a mode. Advanced controls remain available through JSON/API input for power users and integrations.

# Actor input Schema

## `pdfUrl` (type: `string`):

The PDF URL to convert into page-level Markdown and structured dataset rows.

## `mode` (type: `string`):

No LLM is the fastest and cheapest. LLM modes add table extraction, OCR fallback, RAG chunks, and AI cleanup.

## Actor input object example

```json
{
  "pdfUrl": "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf",
  "mode": "no_llm"
}
```

# Actor output Schema

## `dataset` (type: `string`):

Dataset items containing one row per processed PDF page with page Markdown, page text, metadata, page tables, matching RAG chunks, warnings, and errors.

## `markdown` (type: `string`):

Final Markdown saved as `OUTPUT_MARKDOWN` for single-PDF runs. Batch runs save per-PDF keys such as `OUTPUT_MARKDOWN_001`.

## `rawMarkdown` (type: `string`):

Pre-LLM Markdown saved only when LLM cleanup changed at least one page.

## `pageMarkdown` (type: `string`):

Optional page Markdown key-value store artifact.

## `diagnostics` (type: `string`):

Optional page diagnostics and extraction metadata.

## `sourcePdf` (type: `string`):

Original downloaded PDF, saved only when enabled.

## `pageImagesManifest` (type: `string`):

Optional page PNG export manifest.

## `pageImagesZip` (type: `string`):

Optional ZIP archive of rendered page PNGs.

## `files` (type: `string`):

All records in the run's default key-value store.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "pdfUrl": "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
};

// Run the Actor and wait for it to finish
const run = await client.actor("thescrapelab/apify-pdf-url-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "pdfUrl": "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf" }

# Run the Actor and wait for it to finish
run = client.actor("thescrapelab/apify-pdf-url-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "pdfUrl": "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
}' |
apify call thescrapelab/apify-pdf-url-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=thescrapelab/apify-pdf-url-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "PDF URL to Markdown, Tables & RAG Extractor",
        "description": "Extract clean Markdown, page text, tables, metadata, summaries, and AI-ready RAG chunks from PDF URLs.",
        "version": "0.0",
        "x-build-id": "r5Kz9kW3qcNHtWDIL"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/thescrapelab~apify-pdf-url-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-thescrapelab-apify-pdf-url-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/thescrapelab~apify-pdf-url-scraper/runs": {
            "post": {
                "operationId": "runs-sync-thescrapelab-apify-pdf-url-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/thescrapelab~apify-pdf-url-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-thescrapelab-apify-pdf-url-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "pdfUrl"
                ],
                "properties": {
                    "pdfUrl": {
                        "title": "PDF URL",
                        "pattern": "^https?://.+",
                        "type": "string",
                        "description": "The PDF URL to convert into page-level Markdown and structured dataset rows."
                    },
                    "mode": {
                        "title": "Mode",
                        "enum": [
                            "no_llm",
                            "llm_cheap",
                            "llm_premium"
                        ],
                        "type": "string",
                        "description": "No LLM is the fastest and cheapest. LLM modes add table extraction, OCR fallback, RAG chunks, and AI cleanup.",
                        "default": "no_llm"
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
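
For projects that cannot use a client library, the specification above is enough to call the Actor over plain HTTP. The sketch below builds a `POST` request for the `run-sync-get-dataset-items` path using only the Python standard library; the function name is hypothetical, and the request is constructed but not sent. Long LLM runs may be better served by the asynchronous `/runs` path, which returns run information immediately.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "thescrapelab~apify-pdf-url-scraper"


def build_run_sync_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build the POST request described by the OpenAPI specification."""
    # The specification passes the API token as a required query parameter.
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Sending the request (not executed here):
#   request = build_run_sync_request("<YOUR_API_TOKEN>", {"pdfUrl": "https://example.com/document.pdf"})
#   with urllib.request.urlopen(request) as response:
#       items = json.load(response)
```

Because the token travels in the query string, avoid logging the full request URL.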