# Kaggle Datasets & Models Scraper (`automation-lab/kaggle-scraper`) Actor

Scrape datasets and ML models from Kaggle including metadata, votes, downloads, and more

- **URL**: https://apify.com/automation-lab/kaggle-scraper.md
- **Developed by:** [Stas Persiianenko](https://apify.com/automation-lab) (community)
- **Categories:** AI, Developer tools
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per event

This Actor is paid per event. You are not charged for the Apify platform usage, but only a fixed price for specific events.
Since this Actor supports Apify Store discounts, the higher your subscription plan, the lower the price.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation, available as a [Markdown index](https://docs.apify.com/llms.txt) and as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

## Kaggle Datasets & Models Scraper

Extract metadata for datasets and ML models from [Kaggle](https://www.kaggle.com) — the world's largest data science community with 15 million users, 300,000+ datasets, and a growing library of open-source ML models. No Kaggle account or API key required.

### What Does It Do?

This actor queries Kaggle's public API and returns structured metadata about datasets and machine learning models. You can filter by keyword, sort by popularity or recency, and collect hundreds of results in seconds.

Use it to discover ML training datasets, benchmark models, track popular research topics, or automate data pipeline discovery — all without a Kaggle login.

### Who Is It For?

🧪 **ML researchers and data scientists** — Find relevant datasets for your next project. Search by topic (climate, NLP, finance) and sort by votes or downloads to surface the best-quality data fast.

🏢 **AI/ML teams at companies** — Audit the Kaggle landscape for datasets in your domain. Identify which public datasets competitors are using or which models are trending.

📊 **Data journalists and analysts** — Track trending datasets across topics. Build lists of the most-used datasets in a given field for reporting or research.

🤖 **AI application developers** — Programmatically discover training datasets and pre-trained models to power your LLM fine-tuning, classification, or computer vision pipelines.

📈 **Market researchers** — Monitor Kaggle model and dataset growth over time. Identify what AI topics are gaining traction by tracking vote counts and download trends.

### Why Use This Actor?

- ✅ **No login required** — Kaggle's public API works without authentication
- ✅ **Structured JSON output** — All fields are normalized and flattened (no nested objects)
- ✅ **Covers both datasets and models** — Scrape either or both in one run
- ✅ **Keyword search support** — Filter by any topic (e.g., "climate", "healthcare", "llama")
- ✅ **Multiple sort orders** — Hottest, most votes, downloads, recently updated
- ✅ **Pagination handled automatically** — Set `maxResults` and the actor fetches as many pages as needed to reach it
- ✅ **Fast and cheap** — Pure HTTP, no browser needed, results in seconds

### What Data Is Extracted?

#### Datasets

| Field | Description | Example |
|-------|-------------|---------|
| `type` | Record type | `"dataset"` |
| `id` | Kaggle dataset ID | `29` |
| `ref` | Owner/slug reference | `"berkeleyearth/climate-change-earth-surface-temperature-data"` |
| `url` | Full Kaggle URL | `"https://www.kaggle.com/datasets/..."` |
| `title` | Dataset title | `"Climate Change: Earth Surface Temperature Data"` |
| `subtitle` | Short description | `"Exploring global temperatures since 1750"` |
| `description` | Long description | `"This dataset contains records of surface temperatures..."` |
| `ownerName` | Owner display name | `"Berkeley Earth"` |
| `ownerRef` | Owner slug | `"organizations/berkeleyearth"` |
| `creatorName` | Creator name | `"John Doe"` |
| `licenseName` | Data license | `"CC BY-NC-SA 4.0"` |
| `totalBytes` | File size in bytes | `88843537` |
| `voteCount` | Number of votes/upvotes | `2453` |
| `downloadCount` | Total download count | `181589` |
| `viewCount` | Total page views | `1242616` |
| `kernelCount` | Number of notebooks using it | `695` |
| `currentVersionNumber` | Dataset version | `2` |
| `usabilityRating` | Kaggle usability score 0-1 | `0.76` |
| `isPrivate` | Whether dataset is private | `false` |
| `isFeatured` | Whether featured by Kaggle | `false` |
| `lastUpdated` | Last update timestamp | `"2024-02-22T08:53:54.627Z"` |
| `thumbnailImageUrl` | Thumbnail image URL | `"https://storage.googleapis.com/..."` |
| `tags` | List of topic tags | `["climate", "education", "data visualization"]` |

#### Models

| Field | Description | Example |
|-------|-------------|---------|
| `type` | Record type | `"model"` |
| `id` | Kaggle model ID | `619281` |
| `ref` | Owner/slug reference | `"kienngx/nemotron-nano-30b-trained"` |
| `url` | Full Kaggle URL | `"https://www.kaggle.com/models/..."` |
| `title` | Model title | `"Nemotron-Nano-30B variances"` |
| `subtitle` | Short description | `"LoRA fine-tuned adapter for reasoning tasks"` |
| `description` | Model description (up to 500 chars) | `"### Model Details..."` |
| `author` | Author display name | `"Ngô Xuân Kiên"` |
| `slug` | Model slug | `"nemotron-nano-30b-trained"` |
| `voteCount` | Number of votes | `55` |
| `isPrivate` | Whether model is private | `false` |
| `authorImageUrl` | Author avatar URL | `"https://storage.googleapis.com/kaggle-avatars/..."` |

### How Much Does It Cost to Scrape Kaggle Datasets?

This actor uses Pay-Per-Event (PPE) pricing. You are charged a flat fee when a run starts, plus a per-result fee that depends on your Apify subscription tier:

| Event | FREE | BRONZE | SILVER | GOLD | PLATINUM | DIAMOND |
|-------|------|--------|--------|------|----------|---------|
| Run start (one-time) | $0.005 | $0.005 | $0.005 | $0.005 | $0.005 | $0.005 |
| Per result extracted | $0.00115 | $0.001 | $0.00078 | $0.0006 | $0.0004 | $0.00028 |

**Cost examples (BRONZE tier):**
- 100 datasets = $0.005 + 100 × $0.001 = **$0.105**
- 500 datasets = $0.005 + 500 × $0.001 = **$0.505**
- 1,000 datasets + models = $0.005 + 1,000 × $0.001 = **$1.005**

**Free plan estimate:** Apify's free tier includes $5/month, which is enough for approximately **4,300 results** per month at the FREE tier rate ($0.00115/result).
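
The pricing arithmetic above is easy to script when budgeting recurring runs. A minimal sketch, assuming the prices in the table are current and that each run pays exactly one run-start fee; `estimate_cost` and `max_results_for_budget` are hypothetical helpers, not part of the Actor or the Apify client.

```python
# Per-result prices (USD) copied from the pricing table above.
PER_RESULT_PRICE = {
    "FREE": 0.00115,
    "BRONZE": 0.001,
    "SILVER": 0.00078,
    "GOLD": 0.0006,
    "PLATINUM": 0.0004,
    "DIAMOND": 0.00028,
}
RUN_START_FEE = 0.005  # charged once per run, identical on every tier


def estimate_cost(results: int, tier: str = "BRONZE", runs: int = 1) -> float:
    """Estimated PPE charge in USD for `results` items spread over `runs` runs."""
    price = PER_RESULT_PRICE[tier.upper()]
    return round(runs * RUN_START_FEE + results * price, 5)


def max_results_for_budget(budget: float, tier: str = "FREE", runs: int = 1) -> int:
    """How many results fit into `budget` after paying the run-start fees."""
    price = PER_RESULT_PRICE[tier.upper()]
    remaining = budget - runs * RUN_START_FEE
    return max(0, int(remaining // price))


print(estimate_cost(100))    # BRONZE: 0.105
print(estimate_cost(1_000))  # BRONZE: 1.005
# Free plan's $5 spread over 10 runs in a month, at the FREE tier rate:
print(max_results_for_budget(5.0, runs=10))
```

Counting run-start fees explicitly matters for frequent small runs: many tiny scheduled runs pay the $0.005 fee each time.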

### How to Use It

#### Step 1: Choose what to scrape

Set `searchMode` to:
- `"datasets"` — scrape Kaggle datasets only
- `"models"` — scrape ML models only
- `"both"` — scrape datasets and models (results split 50/50)

#### Step 2: Optionally add a search query

Use the `search` field to filter results by keyword (e.g., `"natural language processing"`, `"computer vision"`, `"finance"`).

#### Step 3: Set your sort order

For datasets: `hottest` (trending), `votes` (most upvoted), `updated` (recent), `active` (most notebooks), `published` (newest).

For models: `hotness` (trending), `downloadCount` (most downloaded).

#### Step 4: Set maxResults

Set `maxResults` to however many items you need. The actor will paginate automatically. For example, `maxResults: 200` fetches 200 items across multiple pages.
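
Automatic pagination of this kind generally reduces to a loop like the one below. This is a hedged sketch, not the actor's source code: it assumes a page size of 20 (the figure quoted in the tips section) and uses a hypothetical `fetch_page` callable in place of a real Kaggle request.

```python
import math
from typing import Callable

PAGE_SIZE = 20  # assumption: items per Kaggle list page, as quoted in the tips section


def pages_needed(max_results: int, page_size: int = PAGE_SIZE) -> int:
    """Upper bound on the page requests needed to collect `max_results` items."""
    return math.ceil(max_results / page_size)


def paginate(fetch_page: Callable[[int], list], max_results: int) -> list:
    """Collect up to `max_results` items from a 1-indexed page fetcher.

    `fetch_page` is a hypothetical stand-in for one Kaggle list request;
    an empty page means there are no more results to fetch.
    """
    items: list = []
    page = 1
    while len(items) < max_results:
        batch = fetch_page(page)
        if not batch:
            break  # the source ran out before reaching max_results
        items.extend(batch)
        page += 1
    return items[:max_results]


def fake_fetch(page: int) -> list:
    """Fake source with exactly three full pages of results."""
    return [f"item-{page}-{i}" for i in range(PAGE_SIZE)] if page <= 3 else []


print(pages_needed(200))               # 10
print(len(paginate(fake_fetch, 50)))   # 50
print(len(paginate(fake_fetch, 200)))  # 60, because the fake source has only 3 pages
```

The early `break` on an empty page is why a run can return fewer items than `maxResults` when a query has few matches.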

#### Step 5: Run and download results

Results are saved to the run's default dataset. Download them as JSON, CSV, or XLSX, or fetch them through the API.

### Input Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `searchMode` | string | `"datasets"` | What to scrape: `datasets`, `models`, or `both` |
| `search` | string | `""` | Keyword to filter results |
| `datasetSortBy` | string | `"hottest"` | Dataset sort: `hottest`, `votes`, `updated`, `active`, `published` |
| `modelSortBy` | string | `"hotness"` | Model sort: `hotness`, `downloadCount` |
| `maxResults` | integer | `100` | Maximum number of items to return |
| `maxRequestRetries` | integer | `3` | Retry attempts for failed requests |
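
Checking input locally before starting a paid run can catch mistakes such as passing a dataset sort order as `modelSortBy` (one of the zero-result causes listed in the FAQ). A minimal sketch: the allowed values and ranges are copied from the OpenAPI input schema later on this page, and `validate_input` is a hypothetical helper, not part of the Apify client.

```python
from typing import Any

# Allowed values and ranges copied from the Actor's OpenAPI input schema.
ENUMS = {
    "searchMode": {"datasets", "models", "both"},
    "datasetSortBy": {"hottest", "votes", "updated", "active", "published"},
    "modelSortBy": {"hotness", "downloadCount"},
}
INT_RANGES = {"maxResults": (1, 10_000), "maxRequestRetries": (1, 10)}


def validate_input(run_input: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the input looks valid."""
    errors = []
    for field, allowed in ENUMS.items():
        value = run_input.get(field)
        if value is not None and (not isinstance(value, str) or value not in allowed):
            errors.append(f"{field}={value!r} is not one of {sorted(allowed)}")
    for field, (low, high) in INT_RANGES.items():
        value = run_input.get(field)
        if value is None:
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
            errors.append(f"{field}={value!r} must be an integer from {low} to {high}")
    if "search" in run_input and not isinstance(run_input["search"], str):
        errors.append("search must be a string")
    return errors


# 'votes' is a dataset sort order; models only accept 'hotness' or 'downloadCount'.
print(validate_input({"searchMode": "models", "modelSortBy": "votes"}))
```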

### Output Example

```json
{
  "type": "dataset",
  "id": 29,
  "ref": "berkeleyearth/climate-change-earth-surface-temperature-data",
  "url": "https://www.kaggle.com/datasets/berkeleyearth/climate-change-earth-surface-temperature-data",
  "title": "Climate Change: Earth Surface Temperature Data",
  "subtitle": "Exploring global temperatures since 1750",
  "description": "",
  "ownerName": "Berkeley Earth",
  "ownerRef": "organizations/berkeleyearth",
  "creatorName": "[Deleted User]",
  "licenseName": "CC BY-NC-SA 4.0",
  "totalBytes": 88843537,
  "voteCount": 2453,
  "downloadCount": 181589,
  "viewCount": 1242616,
  "kernelCount": 695,
  "currentVersionNumber": 2,
  "usabilityRating": 0.7647059,
  "isPrivate": false,
  "isFeatured": false,
  "lastUpdated": "2017-05-01T17:29:10.78Z",
  "thumbnailImageUrl": "https://storage.googleapis.com/kaggle-datasets-images/29/33/default-backgrounds/dataset-thumbnail.jpg",
  "tags": ["atmospheric science", "environment", "business", "news"]
}
````
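
Dataset records carry `lastUpdated` as an ISO-8601 UTC string whose fractional seconds vary in length (`.78Z` above, `.627Z` in the field table). A minimal sketch for flagging stale datasets, assuming those two shapes plus a variant without fractional seconds; model records do not list a `lastUpdated` field, and both helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_kaggle_timestamp(value: str) -> datetime:
    """Parse timestamps like '2017-05-01T17:29:10.78Z' as timezone-aware UTC datetimes."""
    # strptime's %f accepts 1-6 fractional digits, so '.78' and '.627' both parse.
    for fmt in ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"Unrecognized timestamp: {value!r}")


def days_since_update(record: dict, now: Optional[datetime] = None) -> int:
    """Whole days between a dataset's lastUpdated and `now` (current UTC time by default)."""
    now = now or datetime.now(timezone.utc)
    return (now - parse_kaggle_timestamp(record["lastUpdated"])).days


record = {
    "ref": "berkeleyearth/climate-change-earth-surface-temperature-data",
    "lastUpdated": "2017-05-01T17:29:10.78Z",
}
print(days_since_update(record, now=datetime(2017, 5, 11, tzinfo=timezone.utc)))  # 9
```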

### Tips and Best Practices

💡 **Use keyword search for niche topics** — The `search` field is powerful. Try terms like `"llm fine-tuning"`, `"medical imaging"`, `"stock market"`, or `"speech recognition"` to find highly specific datasets.

💡 **Sort by `votes` for quality** — Highly upvoted datasets tend to be well-documented, clean, and widely used. Use `votes` sort when you want the most trusted datasets in a category.

💡 **Run `both` mode for AI model discovery** — If you're building an LLM application, run with `searchMode: "both"` and `search: "llm"` to discover both training datasets and pre-trained models in one run.

💡 **Download in CSV for spreadsheets** — Click "Export" → "CSV" in the run output tab to get a clean spreadsheet of all results.

💡 **Set a higher maxResults for comprehensive coverage** — Kaggle returns 20 items per page. For broad topic searches, set `maxResults: 500` or higher to capture more of the matching results.

💡 **Track trends over time** — Schedule this actor to run weekly and save results to a Google Sheet to track which datasets are growing in popularity.
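
Once you keep the items from two scheduled runs (for example, fetched through the API), spotting growth is a join on the record identity. A minimal sketch, assuming `(type, ref)` identifies a record across runs; `vote_growth` and the sample refs are hypothetical.

```python
def vote_growth(previous: list[dict], current: list[dict], top: int = 5) -> list[tuple[str, int]]:
    """(ref, vote increase) for records present in both snapshots, largest increase first."""
    old_votes = {(item["type"], item["ref"]): item.get("voteCount", 0) for item in previous}
    deltas = [
        (item["ref"], item.get("voteCount", 0) - old_votes[(item["type"], item["ref"])])
        for item in current
        if (item["type"], item["ref"]) in old_votes  # records new this week are skipped
    ]
    deltas.sort(key=lambda pair: pair[1], reverse=True)
    return deltas[:top]


# Made-up sample snapshots.
last_week = [
    {"type": "dataset", "ref": "a/x", "voteCount": 100},
    {"type": "dataset", "ref": "b/y", "voteCount": 50},
]
this_week = [
    {"type": "dataset", "ref": "a/x", "voteCount": 130},
    {"type": "dataset", "ref": "b/y", "voteCount": 90},
    {"type": "dataset", "ref": "c/z", "voteCount": 10},
]
print(vote_growth(last_week, this_week))  # [('b/y', 40), ('a/x', 30)]
```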

### Integrations

#### 📊 Connect to Google Sheets

Export Kaggle dataset metadata directly to Google Sheets for tracking and analysis:

1. Run this actor with your search query
2. Go to **Integrations** → **Google Sheets** in the Apify console
3. Connect your spreadsheet and select the export fields

#### 🔗 Connect to Airtable

Build a dataset discovery database in Airtable:

1. Use the Apify → Airtable integration to push results to a base
2. Create filtered views by license type, vote count, or topic tags
3. Add a formula field that converts `totalBytes` to GB for file size display
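
The size conversion is a single division; the only decision is decimal gigabytes (10^9 bytes) versus binary gibibytes (2^30 bytes). A minimal sketch using the `totalBytes` value from the output example; `bytes_to_gb` is a hypothetical helper.

```python
def bytes_to_gb(total_bytes: int, binary: bool = False, digits: int = 3) -> float:
    """Convert `totalBytes` to gigabytes (10**9 bytes) or, if binary=True, gibibytes (2**30)."""
    divisor = 2**30 if binary else 10**9
    return round(total_bytes / divisor, digits)


print(bytes_to_gb(88843537))               # 0.089
print(bytes_to_gb(88843537, binary=True))  # 0.083
```

In Airtable itself, the decimal version corresponds to a formula such as `ROUND({totalBytes} / 1000000000, 3)`, assuming the imported column is named `totalBytes`.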

#### ⚡ Use with Zapier or Make

Automate notifications when popular new datasets appear:

1. Schedule this actor to run daily
2. Use Zapier's Apify trigger to catch new results
3. Send Slack alerts or email digests for datasets with `voteCount > 500`
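
The filter in step 3, plus a "new since last run" check, fits in a few lines if you do it in a code step instead of a no-code filter. A minimal sketch, assuming you persist the `ref` values already alerted on; the helper names and sample records are hypothetical.

```python
def alert_candidates(items: list[dict], seen_refs: set[str], min_votes: int = 500) -> list[dict]:
    """Records above the vote threshold whose `ref` was not alerted on before."""
    return [
        item for item in items
        if item.get("voteCount", 0) > min_votes and item["ref"] not in seen_refs
    ]


def format_digest(items: list[dict]) -> str:
    """One line per record, suitable for a Slack message or email body."""
    return "\n".join(
        f"{item['title']} ({item['voteCount']} votes): {item['url']}" for item in items
    )


items = [
    {"ref": "a/x", "title": "Dataset X", "voteCount": 800, "url": "https://www.kaggle.com/datasets/a/x"},
    {"ref": "b/y", "title": "Dataset Y", "voteCount": 120, "url": "https://www.kaggle.com/datasets/b/y"},
    {"ref": "c/z", "title": "Dataset Z", "voteCount": 950, "url": "https://www.kaggle.com/datasets/c/z"},
]
print(format_digest(alert_candidates(items, seen_refs={"c/z"})))
# Dataset X (800 votes): https://www.kaggle.com/datasets/a/x
```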

#### 📦 Export to MongoDB or BigQuery

Store results in your data warehouse for longitudinal analysis:

1. Download results as JSONL from the dataset
2. Use `mongoimport` or BigQuery's JSON import to load
3. Query by tag, vote count, or download count over time
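
If you already hold the items in Python (for example, from the client examples below), JSONL is simply one JSON object per line. A minimal sketch; `items_to_jsonl` is a hypothetical helper and the sample records reuse values from the field tables above.

```python
import json


def items_to_jsonl(items: list[dict]) -> str:
    """Serialize records as newline-delimited JSON: one object per line."""
    # ensure_ascii=False keeps names such as "Ngô Xuân Kiên" readable in the file.
    return "".join(json.dumps(item, ensure_ascii=False) + "\n" for item in items)


items = [
    {"type": "model", "ref": "kienngx/nemotron-nano-30b-trained",
     "author": "Ngô Xuân Kiên", "voteCount": 55},
    {"type": "dataset", "ref": "berkeleyearth/climate-change-earth-surface-temperature-data",
     "voteCount": 2453, "tags": ["atmospheric science", "environment"]},
]
print(items_to_jsonl(items), end="")
```

Write the string to a file opened with `encoding="utf-8"`, then load it with, for example, `mongoimport --db kaggle --collection items --file kaggle.jsonl` (mongoimport reads one JSON document per line by default) or `bq load --source_format=NEWLINE_DELIMITED_JSON --autodetect mydataset.kaggle_items kaggle.jsonl`. The database, collection, dataset, and table names are placeholders.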

### API Usage

#### Node.js

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const run = await client.actor('automation-lab/kaggle-scraper').call({
    searchMode: 'datasets',
    search: 'natural language processing',
    datasetSortBy: 'votes',
    maxResults: 100,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Got ${items.length} datasets`);
items.forEach(item => {
    console.log(`${item.title} — ${item.voteCount} votes, ${item.downloadCount} downloads`);
});
```

#### Python

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("automation-lab/kaggle-scraper").call(run_input={
    "searchMode": "both",
    "search": "computer vision",
    "maxResults": 200,
})

items = client.dataset(run["defaultDatasetId"]).list_items().items
print(f"Extracted {len(items)} items")
for item in items:
    print(f"[{item['type']}] {item['title']} — {item.get('voteCount', 0)} votes")
```

#### cURL

```bash
curl -X POST "https://api.apify.com/v2/acts/automation-lab~kaggle-scraper/runs?token=YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "searchMode": "datasets",
    "search": "finance",
    "datasetSortBy": "hottest",
    "maxResults": 50
  }'
```
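
Without a client library, the `run-sync-get-dataset-items` endpoint listed in the OpenAPI specification on this page starts a run, waits for it to finish, and returns the dataset items in one response. A minimal standard-library sketch: `build_sync_request` is a hypothetical helper, the token is passed as a query parameter as in the cURL example above, and the request is only built here, not sent.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "automation-lab~kaggle-scraper"  # '~' separates username and Actor name in API paths


def build_sync_request(run_input: dict, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the run-sync-get-dataset-items endpoint."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_sync_request(
    {"searchMode": "datasets", "search": "finance", "maxResults": 50}, "YOUR_APIFY_TOKEN"
)
# To actually send it (blocks until the run finishes, then returns the items as JSON):
# with urllib.request.urlopen(req, timeout=300) as resp:
#     items = json.load(resp)
print(req.get_method(), req.full_url)
```

Synchronous endpoints have a maximum wait time, so for large `maxResults` values the asynchronous `/runs` endpoint used in the cURL example is the safer choice.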

### Use with Claude and MCP

You can use this actor directly from Claude Code, Claude Desktop, or any MCP-compatible AI assistant.

#### Claude Code (Terminal)

```bash
claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/kaggle-scraper"
```

Then ask Claude:

> "Find the top 20 most-voted climate change datasets on Kaggle"
>
> "Search Kaggle for NLP models and return the hottest ones"
>
> "Get 50 Kaggle datasets about healthcare, sorted by download count"

#### Claude Desktop / Cursor / VS Code

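Add the block below to your MCP client's config file. File locations and the top-level key differ between clients (for example, Cursor reads `mcpServers` from `.cursor/mcp.json`, while VS Code's `.vscode/mcp.json` uses a top-level `servers` key), so check your client's documentation. If your client only supports locally launched (stdio) servers, use the `mcp-remote` configuration in the MCP server setup section below instead.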

```json
{
  "mcpServers": {
    "apify": {
      "type": "http",
      "url": "https://mcp.apify.com?tools=automation-lab/kaggle-scraper",
      "headers": {
        "Authorization": "Bearer YOUR_APIFY_TOKEN"
      }
    }
  }
}
```

**Example prompts:**

- "Search Kaggle for 'time series' datasets and show me the ones with more than 1000 downloads"
- "What are the hottest ML models on Kaggle right now?"
- "Find datasets related to 'autonomous vehicles' and export them to a CSV"

### Legality and Terms of Service

This actor only accesses Kaggle's **public API endpoints** that are available without authentication. The data returned is publicly visible on the Kaggle website to all visitors without login.

We only collect **metadata** (titles, descriptions, vote counts, tags) — not dataset files or model weights. No personal data is collected beyond publicly displayed creator names.

**Important:** Always check Kaggle's [Terms of Service](https://www.kaggle.com/terms) and [Privacy Policy](https://www.kaggle.com/privacy) before using scraped data for commercial purposes. Respect dataset licenses (CC0, CC BY, etc.) when using the actual dataset files.

### FAQ

**Q: Do I need a Kaggle account or API key?**
A: No. This actor uses Kaggle's public API which works without authentication. No credentials needed.

**Q: Can I download the actual dataset files?**
A: No. This actor collects metadata only. To download dataset files, you need a Kaggle account and can use the official Kaggle API or CLI.

**Q: How many results can I get?**
A: `maxResults` supports up to 10,000. In practice, Kaggle's public API typically returns results from the most recent and most popular content. For very broad queries, you may get fewer unique results than requested.
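
If you merge results from several runs or overlapping queries, deduplicating on record type plus `ref` keeps one copy of each item. A minimal sketch; treating `(type, ref)` as the identity key is an assumption based on the output fields documented above, and `dedupe_records` is a hypothetical helper.

```python
def dedupe_records(items: list[dict]) -> list[dict]:
    """Drop repeated records, keeping the first occurrence of each (type, ref) pair."""
    seen: set[tuple[str, str]] = set()
    unique = []
    for item in items:
        key = (item.get("type", ""), item["ref"])
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


records = [
    {"type": "dataset", "ref": "a/x", "voteCount": 1},
    {"type": "dataset", "ref": "a/x", "voteCount": 2},  # repeat from a second query
    {"type": "model", "ref": "a/x"},                    # same ref, different record type
]
print(len(dedupe_records(records)))  # 2
```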

**Q: The actor returned 0 results for my search. Why?**
A: This can happen if:

- The search term is too specific (try a broader keyword)
- For models mode: make sure `modelSortBy` is `hotness` or `downloadCount` (not `votes` or `updated`, which the API does not support)
- Try with no `search` value first to confirm the basic mode works

**Q: Can I get private datasets?**
A: No. This actor only accesses publicly available content. Private datasets require authentication.

**Q: Does this work for Kaggle competitions too?**
A: Not yet. This actor focuses on datasets and models. Competition metadata is on a different API endpoint — contact us if you need that.

**Q: What does `usabilityRating` mean?**
A: Kaggle calculates a usability score (0-1) based on whether a dataset has a description, column descriptions, file documentation, a license, and a proper cover image. A score of 1.0 means fully documented.

### Related Scrapers

Looking for more data from the AI and ML ecosystem? Check out our other automation-lab actors:

- [Hugging Face Models Scraper](https://apify.com/automation-lab/huggingface-models-scraper) — Scrape Hugging Face models and datasets
- [arXiv Paper Scraper](https://apify.com/automation-lab/arxiv-scraper) — Scrape academic papers from arXiv
- [GitHub Repositories Scraper](https://apify.com/automation-lab/github-scraper) — Scrape GitHub repositories and star counts

# Actor input Schema

## `searchMode` (type: `string`):

Choose whether to scrape datasets, models, or both.

## `search` (type: `string`):

Optional keyword to filter results (e.g. 'climate change', 'llama', 'finance').

## `datasetSortBy` (type: `string`):

How to sort datasets. Only applies when scraping datasets.

## `modelSortBy` (type: `string`):

How to sort ML models. Only applies when scraping models.

## `maxResults` (type: `integer`):

Maximum number of items to return. When scraping both types, this limit is split evenly.

## `maxRequestRetries` (type: `integer`):

Number of retry attempts for failed HTTP requests.

## Actor input object example

```json
{
  "searchMode": "datasets",
  "datasetSortBy": "hottest",
  "modelSortBy": "hotness",
  "maxResults": 20,
  "maxRequestRetries": 3
}
```

# Actor output Schema

## `overview` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "searchMode": "datasets",
    "search": "",
    "datasetSortBy": "hottest",
    "modelSortBy": "hotness",
    "maxResults": 20,
    "maxRequestRetries": 3
};

// Run the Actor and wait for it to finish
const run = await client.actor("automation-lab/kaggle-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "searchMode": "datasets",
    "search": "",
    "datasetSortBy": "hottest",
    "modelSortBy": "hotness",
    "maxResults": 20,
    "maxRequestRetries": 3,
}

# Run the Actor and wait for it to finish
run = client.actor("automation-lab/kaggle-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "searchMode": "datasets",
  "search": "",
  "datasetSortBy": "hottest",
  "modelSortBy": "hotness",
  "maxResults": 20,
  "maxRequestRetries": 3
}' |
apify call automation-lab/kaggle-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=automation-lab/kaggle-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Kaggle Datasets & Models Scraper",
        "description": "Scrape datasets and ML models from Kaggle including metadata, votes, downloads, and more",
        "version": "0.1",
        "x-build-id": "Ul9YNUWETOEIJFf5y"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/automation-lab~kaggle-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-automation-lab-kaggle-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/automation-lab~kaggle-scraper/runs": {
            "post": {
                "operationId": "runs-sync-automation-lab-kaggle-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/automation-lab~kaggle-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-automation-lab-kaggle-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "searchMode": {
                        "title": "What to scrape",
                        "enum": [
                            "datasets",
                            "models",
                            "both"
                        ],
                        "type": "string",
                        "description": "Choose whether to scrape datasets, models, or both.",
                        "default": "datasets"
                    },
                    "search": {
                        "title": "Search query",
                        "type": "string",
                        "description": "Optional keyword to filter results (e.g. 'climate change', 'llama', 'finance')."
                    },
                    "datasetSortBy": {
                        "title": "Dataset sort order",
                        "enum": [
                            "hottest",
                            "votes",
                            "updated",
                            "active",
                            "published"
                        ],
                        "type": "string",
                        "description": "How to sort datasets. Only applies when scraping datasets.",
                        "default": "hottest"
                    },
                    "modelSortBy": {
                        "title": "Model sort order",
                        "enum": [
                            "hotness",
                            "downloadCount"
                        ],
                        "type": "string",
                        "description": "How to sort ML models. Only applies when scraping models.",
                        "default": "hotness"
                    },
                    "maxResults": {
                        "title": "Max results",
                        "minimum": 1,
                        "maximum": 10000,
                        "type": "integer",
                        "description": "Maximum number of items to return. When scraping both types, this limit is split evenly.",
                        "default": 100
                    },
                    "maxRequestRetries": {
                        "title": "Max request retries",
                        "minimum": 1,
                        "maximum": 10,
                        "type": "integer",
                        "description": "Number of retry attempts for failed HTTP requests.",
                        "default": 3
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
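The schema above describes the Actor run object that the API returns. The sketch below is one hedged way a client might consume it. It pulls out the fields most integrations need (`status`, `defaultDatasetId`, `defaultKeyValueStoreId`, `stats.computeUnits`) and cross-checks `usageTotalUsd` against the itemized `usageUsd` breakdown. It works on an already-parsed JSON dict and makes no network calls. Several parts are assumptions rather than documented behavior: `summarize_run` is a hypothetical helper, the ID strings are placeholders, and the optional top-level `data` wrapper is inferred only from this schema's nesting.

```python
from math import isclose
from typing import Any


def summarize_run(payload: dict[str, Any]) -> dict[str, Any]:
    """Extract commonly used fields from an Actor run object.

    Accepts either the run object itself or a response body that wraps it
    in a top-level "data" key (an assumption inferred from the schema nesting).
    """
    run = payload.get("data", payload)
    usage_usd = run.get("usageUsd", {})
    # Add up the per-resource USD costs so the sum can be compared with usageTotalUsd.
    itemized = sum(usage_usd.values())
    total = run.get("usageTotalUsd", 0.0)
    return {
        "status": run.get("status"),
        "datasetId": run.get("defaultDatasetId"),
        "keyValueStoreId": run.get("defaultKeyValueStoreId"),
        "computeUnits": run.get("stats", {}).get("computeUnits", 0),
        "totalUsd": total,
        "itemizedUsd": itemized,
        # Tolerant float comparison, because USD amounts are tiny fractions.
        "totalsMatch": isclose(itemized, total, rel_tol=1e-9, abs_tol=1e-12),
    }


# Values are copied from the "example" fields in the schema above.
# The IDs are hypothetical placeholders.
sample = {
    "data": {
        "status": "READY",
        "defaultDatasetId": "hypothetical-dataset-id",
        "defaultKeyValueStoreId": "hypothetical-kvs-id",
        "stats": {"computeUnits": 0},
        "usageTotalUsd": 0.00005,
        "usageUsd": {"KEY_VALUE_STORE_WRITES": 0.00005, "DATASET_WRITES": 0},
    }
}

print(summarize_run(sample))
```

This schema does not say whether `usageTotalUsd` always equals the sum of the `usageUsd` entries. Treat `totalsMatch` as a sanity check, not a guarantee. The dataset ID it returns is the one you would pass to a dataset client to fetch the scraped Kaggle records.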
