# AI Training Data Curator (`vamsi-krishna/ai-training-data-curator`) Actor

Turn any public website into a clean LLM training dataset. Crawl docs, blogs, and help centers, extract readable text, filter by language, remove duplicates, and export JSON, JSONL, or CSV for fine-tuning, RAG, and AI workflows. No coding required.

- **URL**: https://apify.com/vamsi-krishna/ai-training-data-curator.md
- **Developed by:** [Vamsi Krishna](https://apify.com/vamsi-krishna) (community)
- **Categories:** AI, Developer tools, Other
- **Stats:** 2 total users, 1 monthly users, 66.7% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

from $3.00 / 1,000 saved training pages

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) or as [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## AI Training Data Curator — Turn websites into LLM training data

**Build clean datasets from any public website** for LLM fine-tuning, RAG (retrieval-augmented generation), and AI model training — without writing scrapers or cleaning HTML by hand.

Paste your URLs, run the Actor, and download structured records with page title, body text, language, and word count — ready for JSON, JSONL, or CSV export.

---

### What this Actor does

1. **Crawls** pages from your start URLs (and optional sitemap)
2. **Extracts** clean article-style text — navigation menus and boilerplate are filtered out
3. **Filters** by language and minimum text length
4. **Removes** near-duplicate pages automatically
5. **Exports** one row per page to your Apify dataset

You get training-ready text, not raw HTML.

---

### Who is it for?

- **AI & ML teams** preparing fine-tuning or pre-training corpora
- **RAG builders** indexing documentation, blogs, or knowledge bases
- **Researchers** collecting web text datasets at scale
- **Product teams** turning help centers or marketing sites into searchable AI content

---

### Quick start (3 steps)

1. **Start URLs** — Add one or more website URLs to crawl (or paste a sitemap URL).
2. **Max pages** — Choose how many pages to collect (default: 10).
3. **Run** — Open the **Dataset** tab when finished and download as JSON, JSONL, CSV, or Excel.

**Tip:** To crawl only the URLs you list (or the URLs from a sitemap), use **Crawl strategy → seeds-only**. To follow internal links, use **recurse** and keep **Stay within seed domains** enabled.

---

### Ready-made tasks (Apify Console)

Create these public tasks under **Actor → Tasks** to improve Store discoverability:

| Task | Input highlights |
|------|------------------|
| **Quick demo — single page** | `maxPages: 1`, `seeds-only`, `https://httpbin.org/html` |
| **Documentation crawl** | `recurse`, `stayWithinDomain: true`, `language: en`, `maxPages: 100` |
| **Blog archive from sitemap** | `sitemapUrl` + `maxPages: 500`, `deduplicate: true` |

---

### Store monetization (Apify Console)

Under **Actor → Monetization**:

1. Turn **off** “Pay per event + platform usage” so users see predictable pricing.
2. Set the per-page event price high enough to cover platform costs (current runs use about $0.0005 of compute per page) plus a margin.
3. Enable **Store discounts** with ~10% step-down per tier (Free → Bronze → Silver → Gold).

---

### What you get (output)

Each saved page becomes one dataset record:

| Field | What it contains |
|-------|------------------|
| `url` | Page address |
| `title` | Page title |
| `text` | Clean extracted body text |
| `language` | Detected language (e.g. `en`) |
| `wordCount` | Number of words |
| `author` | Author name, when detected |
| `publishedDate` | Publish date, when detected |

**Example record:**

```json
{
  "url": "https://example.com/blog/getting-started",
  "title": "Getting Started with RAG",
  "text": "Retrieval-augmented generation combines search with large language models...",
  "language": "en",
  "wordCount": 842,
  "author": "Jane Doe",
  "publishedDate": "2025-03-15"
}
````

***

### Common settings

| Setting | What it does |
|---------|----------------|
| **Start URLs** | Where crawling begins |
| **Sitemap URL** | Optional — load many URLs from `sitemap.xml` |
| **Max pages** | Stop after this many pages (1–100,000) |
| **Language filter** | Keep only pages in one language (e.g. `en`) |
| **Minimum text length** | Skip very short pages (menus, stubs) |
| **Remove duplicates** | Drop near-duplicate content (recommended: on) |
| **Crawl strategy** | `recurse` = follow links; `seeds-only` = only listed URLs |
| **Stay within seed domains** | Do not leave the original website |
| **Export rejected pages** | Optional second dataset showing filtered-out URLs |
| **Proxy configuration** | Use if the site blocks automated access |

***

### Example input

```json
{
  "startUrls": [{ "url": "https://docs.example.com" }],
  "maxPages": 100,
  "language": "en",
  "minTextLength": 200,
  "deduplicate": true,
  "crawlStrategy": "recurse",
  "stayWithinDomain": true
}
```

***

### Download your dataset

After a run completes:

1. Go to the run in [Apify Console](https://console.apify.com)
2. Open the **Dataset** tab
3. Click **Export** → choose JSON, JSONL, CSV, or Excel

Load the export into Hugging Face, OpenAI fine-tuning pipelines, vector databases, or your own ML workflow, mapping the fields to whatever record format each tool expects.
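If you fetch items through the API rather than the Console export, a few lines of Python can write them as JSONL (one JSON object per line). This is a minimal sketch that assumes each item is a dict with the fields from the output table; the `to_jsonl` helper and the choice of kept fields are hypothetical.

```python
import json
from typing import Iterable


def to_jsonl(items: Iterable[dict], fields: tuple[str, ...] = ("url", "title", "text")) -> str:
    """Serialize dataset items to JSONL, keeping only the chosen fields (hypothetical helper)."""
    lines = []
    for item in items:
        # Skip fields that are missing, e.g. `author` when it was not detected
        record = {key: item[key] for key in fields if key in item}
        # ensure_ascii=False keeps non-English text readable in the output
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + ("\n" if lines else "")


items = [
    {"url": "https://example.com/a", "title": "A", "text": "First page.", "wordCount": 2},
    {"url": "https://example.com/b", "title": "B", "text": "Second page.", "wordCount": 2},
]
print(to_jsonl(items), end="")
```

Dropping fields such as `wordCount` keeps each training line small; add them back to `fields` if your pipeline uses them.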

***

### FAQ

**Can I use this for RAG?**\
Yes. The output is clean text per URL, ready to chunk, embed, and load into Pinecone, Weaviate, Chroma, or similar vector stores.
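Chunking happens after export, in your own pipeline. A minimal sketch of one common approach, overlapping word windows that carry the source `url` along, is shown here; the window size, overlap, and the `chunk_record` helper are illustrative assumptions, not part of the Actor.

```python
def chunk_record(record: dict, size: int = 200, overlap: int = 40) -> list[dict]:
    """Split one dataset record's text into overlapping word windows for embedding."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = record["text"].split()
    if not words:
        return []
    step = size - overlap
    chunks = []
    # Stop once the remaining words are already covered by the previous window's overlap
    for start in range(0, max(len(words) - overlap, 1), step):
        chunks.append({
            "url": record["url"],          # keep provenance for citations in RAG answers
            "chunk": len(chunks),          # position of this chunk within the page
            "text": " ".join(words[start:start + size]),
        })
    return chunks
```

Word windows are simple and tokenizer-agnostic; if your embedding model has a strict token limit, count tokens instead of words.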

**Does it work on documentation sites and blogs?**\
Yes. It is designed for article-style pages: docs, blogs, news, help centers, and marketing content.

**Does it remove duplicate pages?**\
Yes, by default. Near-duplicate pages are detected with SimHash fingerprinting, and only one copy is kept.

**What formats can I export?**\
JSON, JSONL, CSV, and Excel from the Apify dataset. JSONL is common for LLM training pipelines.

**Do I need to code?**\
No. Configure inputs in the Apify UI and download results. API access is available if you want to automate runs.

**Is crawling always allowed?**\
No. You must have permission to access and use the content you crawl. See **Legal notice** below.

***

### Legal notice

You are responsible for complying with each website's terms of service and applicable copyright law. Only crawl sites you are allowed to access, respect `robots.txt`, and use extracted data in line with applicable regulations.

# Actor input Schema

## `startUrls` (type: `array`):

Seed URLs to begin crawling. Provide at least one start URL or a sitemapUrl.

## `sitemapUrl` (type: `string`):

Optional sitemap.xml URL; URLs are merged with startUrls

## `maxPages` (type: `integer`):

Maximum number of pages to crawl and export.

## `maxSitemapUrls` (type: `integer`):

Cap merged seed/sitemap URLs before crawling (defaults to maxPages)
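To make the merge-and-cap behavior of `sitemapUrl` and `maxSitemapUrls` concrete, here is a minimal sketch that parses an already-downloaded `sitemap.xml` string with the Python standard library. Putting start URLs first, dropping repeated URLs, and the `merge_seed_urls` name are assumptions rather than the Actor's documented behavior; nested sitemap index files are not handled.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps (sitemaps.org protocol)
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def merge_seed_urls(start_urls, sitemap_xml, max_pages, max_sitemap_urls=None):
    """Merge startUrls with <loc> entries from a sitemap, drop repeats, and cap the list."""
    # maxSitemapUrls defaults to maxPages when it is not set
    cap = max_sitemap_urls if max_sitemap_urls is not None else max_pages
    urls = [entry["url"] for entry in start_urls]
    if sitemap_xml:
        root = ET.fromstring(sitemap_xml)
        urls += [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]
    seen, merged = set(), []
    for url in urls:
        if url not in seen:  # keep the first occurrence of each URL
            seen.add(url)
            merged.append(url)
    return merged[:cap]
```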

## `minTextLength` (type: `integer`):

Minimum extracted text length in characters; shorter pages are filtered out.

## `language` (type: `string`):

ISO 639-1 code (e.g. en). Leave empty to accept all.
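The two content filters, `minTextLength` (counted in characters) and `language` (an ISO 639-1 code), can be sketched as one check. This minimal version assumes the page language has already been detected elsewhere; the reason strings are hypothetical stand-ins for whatever the Actor writes to `filterReason`.

```python
from typing import Optional


def filter_reason(text: str, detected_language: Optional[str],
                  min_text_length: int = 100, language: Optional[str] = None) -> Optional[str]:
    """Return why a page would be rejected, or None if it passes both filters."""
    if len(text) < min_text_length:  # length is measured in characters, not words
        return "too_short"           # hypothetical reason label
    # An empty or missing `language` setting accepts every page
    if language and (detected_language or "").lower() != language.lower():
        return "language_mismatch"   # hypothetical reason label
    return None
```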

## `deduplicate` (type: `boolean`):

When true, near-duplicate pages are removed using SimHash fingerprinting.

## `maxFingerprints` (type: `integer`):

Maximum SimHash fingerprints kept in memory (FIFO eviction when exceeded)
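For readers unfamiliar with SimHash, a minimal sketch of near-duplicate detection with a FIFO-bounded fingerprint store follows. The 64-bit fingerprint, word 3-gram shingles, MD5 as the token hash, and a Hamming-distance threshold of 3 are illustrative assumptions; the Actor's actual parameters are not documented here. The linear scan over stored fingerprints is fine for a sketch but slow at 50,000 entries.

```python
import hashlib
from collections import deque


def simhash(text: str, bits: int = 64) -> int:
    """Compute a SimHash fingerprint from word 3-gram shingles (bits must be a multiple of 8, <= 128)."""
    words = text.lower().split()
    shingles = [" ".join(words[i:i + 3]) for i in range(max(len(words) - 2, 1))]
    weights = [0] * bits
    for shingle in shingles:
        # Take the first bits/8 bytes of the MD5 digest as the shingle's hash
        h = int.from_bytes(hashlib.md5(shingle.encode("utf-8")).digest()[:bits // 8], "big")
        for bit in range(bits):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    # A fingerprint bit is set when most shingles voted for it
    return sum(1 << bit for bit in range(bits) if weights[bit] > 0)


class NearDuplicateFilter:
    """Keeps at most `max_fingerprints` fingerprints, evicting the oldest first (FIFO)."""

    def __init__(self, max_fingerprints: int = 50_000, max_distance: int = 3):
        # A deque with maxlen silently drops its oldest entry when full
        self.fingerprints = deque(maxlen=max_fingerprints)
        self.max_distance = max_distance

    def is_duplicate(self, text: str) -> bool:
        fp = simhash(text)
        # Near-duplicate if the Hamming distance to any stored fingerprint is small
        if any(bin(fp ^ seen).count("1") <= self.max_distance for seen in self.fingerprints):
            return True
        self.fingerprints.append(fp)
        return False
```

Because eviction is FIFO, a duplicate of a page seen more than `maxFingerprints` pages earlier can slip through; raising the cap trades memory for recall.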

## `crawlStrategy` (type: `string`):

recurse: follow links from seed pages; seeds-only: crawl only seed/sitemap URLs

## `stayWithinDomain` (type: `boolean`):

When recursing, only follow links on seed hostnames

## `includeUrlGlobs` (type: `array`):

Optional glob patterns; enqueued URLs must match at least one when set

## `excludeUrlGlobs` (type: `array`):

Optional glob patterns; matching URLs are never enqueued
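A minimal sketch of how `stayWithinDomain`, `includeUrlGlobs`, and `excludeUrlGlobs` might combine into one enqueue decision. It uses Python's `fnmatch`, whose wildcard rules (for example, `*` also matches `/`) may differ from the glob syntax the Actor actually uses, and it checks exclusions before inclusions; both are assumptions. The `should_enqueue` name is hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def should_enqueue(url, seed_urls, stay_within_domain=True, include_globs=(), exclude_globs=()):
    """Decide whether a discovered link would be added to the crawl queue."""
    if stay_within_domain:
        # Only follow links whose hostname matches one of the seed hostnames
        seed_hosts = {urlparse(seed).hostname for seed in seed_urls}
        if urlparse(url).hostname not in seed_hosts:
            return False
    if any(fnmatchcase(url, pattern) for pattern in exclude_globs):
        return False  # matching URLs are never enqueued
    if include_globs:
        # When include globs are set, a URL must match at least one of them
        return any(fnmatchcase(url, pattern) for pattern in include_globs)
    return True
```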

## `maxConcurrency` (type: `integer`):

Maximum number of pages fetched in parallel

## `minRequestDelaySecs` (type: `number`):

Minimum delay between requests for politeness (0 = no delay)
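`maxConcurrency` caps how many fetches run at once, while `minRequestDelaySecs` spaces out request starts. A minimal asyncio sketch of the two working together is shown here, assuming the delay applies globally between request starts rather than per domain; `PoliteScheduler` and `fake_fetch` are hypothetical, and `fake_fetch` only sleeps instead of making an HTTP call.

```python
import asyncio
import time


class PoliteScheduler:
    """Limits parallel fetches and enforces a minimum gap between request starts."""

    def __init__(self, max_concurrency: int = 10, min_request_delay_secs: float = 0.0):
        self._semaphore = asyncio.Semaphore(max_concurrency)
        self._lock = asyncio.Lock()
        self._delay = min_request_delay_secs
        self._last_start = float("-inf")

    async def run(self, fetch, url):
        async with self._semaphore:          # at most max_concurrency fetches in flight
            async with self._lock:           # serialize start times so the gap is respected
                wait = self._last_start + self._delay - time.monotonic()
                if wait > 0:
                    await asyncio.sleep(wait)
                self._last_start = time.monotonic()
            return await fetch(url)


async def fake_fetch(url):
    await asyncio.sleep(0.01)  # stand-in for network I/O
    return url


async def main():
    # Create the scheduler inside the running event loop
    scheduler = PoliteScheduler(max_concurrency=3, min_request_delay_secs=0.05)
    urls = [f"https://docs.example.com/page-{i}" for i in range(4)]
    return await asyncio.gather(*(scheduler.run(fake_fetch, u) for u in urls))


if __name__ == "__main__":
    print(asyncio.run(main()))
```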

## `maxDepth` (type: `integer`):

Maximum link hops from seed URLs (0 = seeds only). Leave empty for unlimited depth.
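`maxDepth` counts link hops from the seeds. A minimal breadth-first sketch over an in-memory link graph illustrates the semantics: depth 0 yields only the seeds, `None` means unlimited, and the crawl also stops at `maxPages`. The `links` dict is a hypothetical stand-in for fetching pages and extracting their links, and `crawl_order` is a hypothetical name.

```python
from collections import deque
from typing import Dict, List, Optional


def crawl_order(seeds: List[str], links: Dict[str, List[str]],
                max_depth: Optional[int] = None, max_pages: int = 10) -> List[str]:
    """Return pages in breadth-first visit order, honoring maxDepth and maxPages."""
    queue = deque((url, 0) for url in seeds)  # (url, hops from the nearest seed)
    seen = set(seeds)
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if max_depth is not None and depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for link in links.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

For example, with `graph = {"/": ["/a", "/b"], "/a": ["/a/1"]}`, `crawl_order(["/"], graph, max_depth=1)` visits only `/`, `/a`, and `/b`.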

## `exportRejectedPages` (type: `boolean`):

When true, URLs filtered out during processing are written to a separate 'rejected' dataset with filterReason

## `proxyConfiguration` (type: `object`):

Apify proxy settings for blocked or geo-restricted sites

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://docs.example.com/getting-started"
    }
  ],
  "sitemapUrl": "https://docs.example.com/sitemap.xml",
  "maxPages": 100,
  "minTextLength": 200,
  "language": "en",
  "deduplicate": true,
  "maxFingerprints": 50000,
  "crawlStrategy": "seeds-only",
  "stayWithinDomain": true,
  "maxConcurrency": 3,
  "minRequestDelaySecs": 0,
  "exportRejectedPages": false
}
```

# Actor output Schema

## `dataset` (type: `string`):

Crawled pages with url, title, text, language, word count, author, and published date.

## `rejected` (type: `string`):

Pages filtered out during crawling (only when Export rejected pages is enabled).

## `summary` (type: `string`):

Crawl statistics: pages crawled, saved, filtered, and failed.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://httpbin.org/html"
        }
    ],
    "maxPages": 1,
    "minTextLength": 50,
    "deduplicate": false,
    "crawlStrategy": "seeds-only",
    "maxConcurrency": 3
};

// Run the Actor and wait for it to finish
const run = await client.actor("vamsi-krishna/ai-training-data-curator").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [{ "url": "https://httpbin.org/html" }],
    "maxPages": 1,
    "minTextLength": 50,
    "deduplicate": False,
    "crawlStrategy": "seeds-only",
    "maxConcurrency": 3,
}

# Run the Actor and wait for it to finish
run = client.actor("vamsi-krishna/ai-training-data-curator").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://httpbin.org/html"
    }
  ],
  "maxPages": 1,
  "minTextLength": 50,
  "deduplicate": false,
  "crawlStrategy": "seeds-only",
  "maxConcurrency": 3
}' |
apify call vamsi-krishna/ai-training-data-curator --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=vamsi-krishna/ai-training-data-curator",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "AI Training Data Curator",
        "description": "Turn any public website into a clean LLM training dataset. Crawl docs, blogs, and help centers, extract readable text, filter by language, remove duplicates, and export JSON, JSONL, or CSV for fine-tuning, RAG, and AI workflows. No coding required.",
        "version": "0.0",
        "x-build-id": "5fO8t22F8I34Cq0vV"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/vamsi-krishna~ai-training-data-curator/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-vamsi-krishna-ai-training-data-curator",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/vamsi-krishna~ai-training-data-curator/runs": {
            "post": {
                "operationId": "runs-sync-vamsi-krishna-ai-training-data-curator",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/vamsi-krishna~ai-training-data-curator/run-sync": {
            "post": {
                "operationId": "run-sync-vamsi-krishna-ai-training-data-curator",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "Seed URLs to begin crawling. Provide at least one start URL or a sitemapUrl.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "sitemapUrl": {
                        "title": "Sitemap URL",
                        "type": "string",
                        "description": "Optional sitemap.xml URL; URLs are merged with startUrls"
                    },
                    "maxPages": {
                        "title": "Max pages",
                        "minimum": 1,
                        "maximum": 100000,
                        "type": "integer",
                        "description": "Maximum number of pages to crawl and export.",
                        "default": 10
                    },
                    "maxSitemapUrls": {
                        "title": "Max sitemap URLs",
                        "minimum": 1,
                        "maximum": 100000,
                        "type": "integer",
                        "description": "Cap merged seed/sitemap URLs before crawling (defaults to maxPages)"
                    },
                    "minTextLength": {
                        "title": "Minimum text length",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Minimum extracted text length in characters; shorter pages are filtered out.",
                        "default": 100
                    },
                    "language": {
                        "title": "Language filter",
                        "type": "string",
                        "description": "ISO 639-1 code (e.g. en). Leave empty to accept all."
                    },
                    "deduplicate": {
                        "title": "Remove duplicates",
                        "type": "boolean",
                        "description": "When true, near-duplicate pages are removed using SimHash fingerprinting.",
                        "default": true
                    },
                    "maxFingerprints": {
                        "title": "Max dedup fingerprints",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Maximum SimHash fingerprints kept in memory (FIFO eviction when exceeded)",
                        "default": 50000
                    },
                    "crawlStrategy": {
                        "title": "Crawl strategy",
                        "enum": [
                            "recurse",
                            "seeds-only"
                        ],
                        "type": "string",
                        "description": "recurse: follow links from seed pages; seeds-only: crawl only seed/sitemap URLs",
                        "default": "seeds-only"
                    },
                    "stayWithinDomain": {
                        "title": "Stay within seed domains",
                        "type": "boolean",
                        "description": "When recursing, only follow links on seed hostnames",
                        "default": true
                    },
                    "includeUrlGlobs": {
                        "title": "Include URL globs",
                        "type": "array",
                        "description": "Optional glob patterns; enqueued URLs must match at least one when set",
                        "items": {
                            "type": "string"
                        }
                    },
                    "excludeUrlGlobs": {
                        "title": "Exclude URL globs",
                        "type": "array",
                        "description": "Optional glob patterns; matching URLs are never enqueued",
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxConcurrency": {
                        "title": "Max concurrency",
                        "minimum": 1,
                        "maximum": 200,
                        "type": "integer",
                        "description": "Maximum number of pages fetched in parallel",
                        "default": 10
                    },
                    "minRequestDelaySecs": {
                        "title": "Min request delay (seconds)",
                        "minimum": 0,
                        "type": "number",
                        "description": "Minimum delay between requests for politeness (0 = no limit)",
                        "default": 0
                    },
                    "maxDepth": {
                        "title": "Max crawl depth",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum link hops from seed URLs (0 = seeds only). Leave empty for unlimited depth."
                    },
                    "exportRejectedPages": {
                        "title": "Export rejected pages",
                        "type": "boolean",
                        "description": "When true, URLs filtered out during processing are written to a separate 'rejected' dataset with filterReason",
                        "default": false
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Apify proxy settings for blocked or geo-restricted sites"
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```