# Kaggle Datasets Scraper (`parseforge/kaggle-scraper`) Actor

Extract Kaggle dataset metadata at scale: titles, owners, descriptions, tags, license, file types, sizes, downloads, views, and votes. Filter by search, tag, user, file type, or size.

- **URL**: https://apify.com/parseforge/kaggle-scraper.md
- **Developed by:** [ParseForge](https://apify.com/parseforge) (community)
- **Categories:** AI, Developer tools, Education
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per event

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price drops as your subscription plan tier rises.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

![ParseForge Banner](https://raw.githubusercontent.com/ParseForge/branding/main/banners/parseforge-banner.png)

## 📊 Kaggle Datasets Scraper

> 🚀 **Surface every public dataset on Kaggle in seconds.** Filter by keyword, file format, license, sort order, and size. No API key, no registration, no manual CSV wrangling.

> 🕒 **Last updated:** 2026-05-06 · **📊 24 fields** per record · **Powered by the Kaggle public API** · **No browser, pure HTTP** · **Up to 1M datasets per run**

Kaggle hosts more than **400,000** public datasets contributed by data scientists, ML researchers, and academic groups, ranging from a 16 KB CSV of medical insurance costs to half-gigabyte historical stock dumps and image corpora used in published competitions. Each dataset has a rich metadata footprint that matters in practice: number of downloads, votes, view counts, the **kernelCount** of public Kaggle notebooks that consume it, the license, the file format, an automated **usabilityRating** for schema clarity, and a long-form Markdown description. This Actor turns that metadata layer into clean dataset rows you can sort, filter, and pipe into downstream tools.

This Actor is built for ML engineers picking training corpora, data scientists benchmarking model results against community baselines, AI researchers tracking which Kaggle datasets are gaining notebook traction, and academic teams sourcing reproducible inputs for coursework or thesis work. It is a pure HTTP scraper against Kaggle's public dataset endpoints, so runs are fast and cheap. It does **not** download the actual dataset files - only the metadata layer that helps you decide which ones to pull next. Output is plain JSON, ready to feed into BigQuery, a Postgres staging table, a notebook, or a Make / Zapier workflow.

### 🎯 Target Audience and Primary Use Cases

| Audience | Use Case |
| --- | --- |
| 🤖 ML engineers | Source training corpora and benchmark datasets by file format, license, and size |
| 📈 Data scientists | Track which datasets are trending, surfacing newly hot competitions and corpora |
| 🎓 AI researchers | Build reproducible bibliographies of community datasets used in papers |
| 🏫 Academic teams | Pull dataset metadata for coursework, dissertations, and lit reviews |
| 🧪 Product builders | Validate that public training data exists for a given vertical before committing engineering |

---

### 📋 What the Kaggle Datasets Scraper does

* 🔎 **Search by keyword.** Free-text query against dataset titles, descriptions, and tags. Pass `finance`, `nlp`, `medical imaging`, etc., or leave blank to browse without a keyword.
* 🗂️ **Filter by file format.** CSV, JSON, SQLite, BigQuery, or all formats. Useful when your downstream tooling only accepts one shape.
* ⚖️ **Filter by license.** Restrict to Creative Commons, GPL, Open Database License, Other, or all licenses.
* 🏷️ **Filter by tag.** Pass any Kaggle tag slug (`classification`, `nlp`, `finance`, `health`) to scope the run to a topic, technique, or domain.
* 📐 **Filter by size.** Set `minSize` / `maxSize` in bytes to keep the result set within memory or storage limits for downstream tools.
* 🥇 **Sort the way Kaggle does.** Hottest, most votes, recently updated, most active, recently published.
* 📜 **Optional full description enrichment.** When enabled, each record is enriched with the dataset's long-form Markdown description, full tag list, and version history. Disable for faster runs when you only need card-level fields.

Each output record represents one public Kaggle dataset. Alongside the title, owner, and URL, the row includes the canonical `ref` (`owner/slug`), license, total bytes, current version number, usability rating, downloads, views, votes, the count of public notebooks (kernels) that reference the dataset, the topic count, last-updated timestamp, an array of tag slugs, a compact version history, and (optionally) the full Markdown description.

> 💡 **Why it matters:** Kaggle is one of the largest public catalogues of curated ML data on the web, and it sits behind a JS-heavy UI that is hard to crawl. A clean metadata feed lets you make data-sourcing decisions at the speed of SQL.

---

### 🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to find every CC0 image-classification dataset over 100 MB and export the list as CSV.

---

### ⚙️ Input

<table>
<tr><th>Field</th><th>Type</th><th>Required</th><th>Description</th></tr>
<tr><td>search</td><td>string</td><td>no</td><td>Free-text query against titles, descriptions, and tags. Leave blank to browse without a keyword.</td></tr>
<tr><td>maxItems</td><td>integer</td><td>no</td><td>Max datasets to return. Free tier capped at 10. Paid up to 1,000,000.</td></tr>
<tr><td>sortBy</td><td>enum</td><td>no</td><td>One of hottest, votes, updated, active, published. Defaults to hottest.</td></tr>
<tr><td>fileType</td><td>enum</td><td>no</td><td>One of all, csv, json, sqlite, bigQuery.</td></tr>
<tr><td>license</td><td>enum</td><td>no</td><td>One of all, cc, gpl, odb, other.</td></tr>
<tr><td>tag</td><td>enum</td><td>no</td><td>Pick a Kaggle tag slug from a dropdown of 400+ canonical taxonomy values (e.g. classification, nlp, finance, health).</td></tr>
<tr><td>user</td><td>string</td><td>no</td><td>Filter to datasets owned by a single Kaggle user or organisation slug (e.g. timoboz, mlg-ulb, organizations/google).</td></tr>
<tr><td>minSize</td><td>integer</td><td>no</td><td>Lower bound on dataset size in bytes.</td></tr>
<tr><td>maxSize</td><td>integer</td><td>no</td><td>Upper bound on dataset size in bytes.</td></tr>
<tr><td>includeDescription</td><td>boolean</td><td>no</td><td>Fetch the dataset detail endpoint per record to add description, full tags, and version history. Defaults to true.</td></tr>
<tr><td>proxyConfiguration</td><td>object</td><td>no</td><td>Apify proxy configuration. Recommended for large jobs.</td></tr>
</table>

Example: top 100 finance datasets ranked by community votes, with full descriptions:

```json
{
    "search": "finance",
    "sortBy": "votes",
    "fileType": "all",
    "license": "all",
    "includeDescription": true,
    "maxItems": 100
}
```

Example: every CSV dataset under 50 MB tagged `nlp`, sorted by recency, no description bodies for a faster run:

```json
{
    "tag": "nlp",
    "fileType": "csv",
    "maxSize": 52428800,
    "sortBy": "updated",
    "includeDescription": false,
    "maxItems": 500
}
```
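
The two inputs above can also be assembled in code. The sketch below is one possible way to do it in Python: `build_input` and its parameter names are hypothetical helpers, not part of the Actor, while the enum values and input keys are copied from the input table. It assumes sizes are given in binary megabytes (1 MB = 1,048,576 bytes), which matches the `52428800` value in the 50 MB example.

```python
from typing import Any, Optional

BYTES_PER_MB = 1024 * 1024  # binary megabytes, matching the 52428800 example

# Enum values copied from the input table in this README.
SORT_ORDERS = {"hottest", "votes", "updated", "active", "published"}
FILE_TYPES = {"all", "csv", "json", "sqlite", "bigQuery"}
LICENSES = {"all", "cc", "gpl", "odb", "other"}


def build_input(
    search: Optional[str] = None,
    tag: Optional[str] = None,
    sort_by: str = "hottest",
    file_type: str = "all",
    license_: str = "all",
    min_size_mb: Optional[float] = None,
    max_size_mb: Optional[float] = None,
    include_description: bool = True,
    max_items: int = 10,
) -> dict[str, Any]:
    """Hypothetical helper: validate enum values and return an Actor input dict."""
    if sort_by not in SORT_ORDERS:
        raise ValueError(f"sortBy must be one of {sorted(SORT_ORDERS)}")
    if file_type not in FILE_TYPES:
        raise ValueError(f"fileType must be one of {sorted(FILE_TYPES)}")
    if license_ not in LICENSES:
        raise ValueError(f"license must be one of {sorted(LICENSES)}")
    if min_size_mb is not None and max_size_mb is not None and min_size_mb > max_size_mb:
        raise ValueError("min_size_mb must not exceed max_size_mb")

    actor_input: dict[str, Any] = {
        "sortBy": sort_by,
        "fileType": file_type,
        "license": license_,
        "includeDescription": include_description,
        "maxItems": max_items,
    }
    # Optional filters are only added when set, so blank means "no filter".
    if search:
        actor_input["search"] = search
    if tag:
        actor_input["tag"] = tag
    if min_size_mb is not None:
        actor_input["minSize"] = int(min_size_mb * BYTES_PER_MB)
    if max_size_mb is not None:
        actor_input["maxSize"] = int(max_size_mb * BYTES_PER_MB)
    return actor_input


# Rebuilds the "CSV datasets under 50 MB tagged nlp" example.
nlp_input = build_input(
    tag="nlp", file_type="csv", max_size_mb=50,
    sort_by="updated", include_description=False, max_items=500,
)
```

Validating the enums locally means a typo such as `"sortBy": "vote"` fails before a run is started, not after.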

> ⚠️ **Good to know:** the Kaggle public API tolerates direct calls but rate-limits aggressively under sustained load. For runs above a few thousand datasets, enable Apify Residential proxy in the input.
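
One way to apply that advice automatically is to pick the proxy group from the requested run size. This is a minimal sketch under two assumptions: "a few thousand" is read as 2,000 datasets, and the Actor accepts the standard Apify Proxy group name `RESIDENTIAL` in `apifyProxyGroups`. The helper `with_proxy` is hypothetical.

```python
from typing import Any

# Assumption: "a few thousand" is interpreted as 2,000; adjust as needed.
RESIDENTIAL_THRESHOLD = 2000


def with_proxy(actor_input: dict[str, Any], threshold: int = RESIDENTIAL_THRESHOLD) -> dict[str, Any]:
    """Return a copy of actor_input with proxyConfiguration chosen by run size."""
    result = dict(actor_input)  # shallow copy so the caller's dict is untouched
    max_items = result.get("maxItems", 0)
    if max_items >= threshold:
        # Assumed group name: "RESIDENTIAL" is Apify Proxy's residential pool.
        groups = ["RESIDENTIAL"]
    else:
        # An empty list mirrors the Actor's default input example.
        groups = []
    result["proxyConfiguration"] = {
        "useApifyProxy": True,
        "apifyProxyGroups": groups,
    }
    return result


small_run = with_proxy({"search": "finance", "maxItems": 100})
large_run = with_proxy({"search": "finance", "maxItems": 5000})
```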

***

### 📊 Output

Each dataset row is one public Kaggle dataset. The optional detail enrichment adds the long-form Markdown description, the canonical tag list, and the dataset's full version history.

#### 🧾 Schema

| Field | Type | Example |
| --- | --- | --- |
| 🖼️ thumbnailImageUrl | string (URL) | `https://storage.googleapis.com/kaggle-datasets-images/310/684/.../dataset-thumbnail.jpg` |
| 📛 title | string | `Credit Card Fraud Detection` |
| 🆔 ref | string | `mlg-ulb/creditcardfraud` |
| 🔗 url | string (URL) | `https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud` |
| ✏️ subtitle | string | `Anonymized credit card transactions labeled as fraudulent or genuine` |
| 👤 ownerName | string | `Machine Learning Group - ULB` |
| 🪪 ownerRef | string | `organizations/mlg-ulb` |
| 🧑‍🎨 creatorName | string | `Timo Bozsolik` |
| 🔗 creatorUrl | string | `timoboz` |
| 📜 licenseName | string | `Database: Open Database, Contents: Database Contents` |
| 📦 totalBytes | integer | `69155672` |
| 🔢 currentVersionNumber | integer | `3` |
| ⭐ usabilityRating | number | `0.85294` |
| ⬇️ downloadCount | integer | `1132640` |
| 👁️ viewCount | integer | `12618050` |
| ❤️ voteCount | integer | `13166` |
| 📓 kernelCount | integer | `5984` |
| 💬 topicCount | integer | `0` |
| ✨ isFeatured | boolean | `false` |
| 🕒 lastUpdated | string (ISO) | `2018-03-23T01:17:27.913Z` |
| 📝 description | string (Markdown) | `Context\n---\nIt is important...` |
| 🏷️ tags | array of strings | `["finance", "crime"]` |
| 📚 versions | array of objects | `[{"versionNumber":3,"creationDate":"2018-03-23T01:17:27.913Z","status":"Ready",...}]` |
| ⏱️ scrapedAt | string (ISO) | `2026-05-05T23:33:29.575Z` |

#### 📦 Sample records

<details>
<summary>Typical: a high-traffic ML benchmark dataset</summary>

```json
{
    "thumbnailImageUrl": "https://storage.googleapis.com/kaggle-datasets-images/310/684/3503c6c827ca269cc00ffa66f2a9c207/dataset-thumbnail.jpg",
    "title": "Credit Card Fraud Detection",
    "ref": "mlg-ulb/creditcardfraud",
    "url": "https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud",
    "subtitle": "Anonymized credit card transactions labeled as fraudulent or genuine",
    "ownerName": "Machine Learning Group - ULB",
    "ownerRef": "organizations/mlg-ulb",
    "creatorName": "Timo Bozsolik",
    "creatorUrl": "timoboz",
    "licenseName": "Database: Open Database, Contents: Database Contents",
    "totalBytes": 69155672,
    "currentVersionNumber": 3,
    "usabilityRating": 0.85294116,
    "downloadCount": 1132640,
    "viewCount": 12618050,
    "voteCount": 13166,
    "kernelCount": 5984,
    "topicCount": 0,
    "isFeatured": false,
    "lastUpdated": "2018-03-23T01:17:27.913Z",
    "description": "Context\n---\nIt is important that credit card companies are able to recognize fraudulent credit card transactions...",
    "tags": ["finance", "crime"],
    "versions": [
        {"versionNumber": 3, "creationDate": "2018-03-23T01:17:27.913Z", "status": "Ready", "versionNotes": "Fixed preview", "creatorName": "Timo Bozsolik"},
        {"versionNumber": 2, "creationDate": "2016-11-05T09:08:46.503Z", "status": "Ready", "versionNotes": "CSV format", "creatorName": "Andrea"},
        {"versionNumber": 1, "creationDate": "2016-11-03T13:21:36.757Z", "status": "Ready", "versionNotes": "Rdata format", "creatorName": "Andrea"}
    ],
    "scrapedAt": "2026-05-05T23:33:29.575Z"
}
```

</details>

<details>
<summary>Edge case: a half-gigabyte historical financial dataset</summary>

```json
{
    "thumbnailImageUrl": "https://storage.googleapis.com/kaggle-datasets-images/4538/7213/0ef205a10621870d2d873557864474ff/dataset-thumbnail.jpg",
    "title": "Huge Stock Market Dataset",
    "ref": "borismarjanovic/price-volume-data-for-all-us-stocks-etfs",
    "url": "https://www.kaggle.com/datasets/borismarjanovic/price-volume-data-for-all-us-stocks-etfs",
    "subtitle": "Historical daily prices and volumes of all U.S. stocks and ETFs",
    "ownerName": "Boris Marjanovic",
    "ownerRef": "borismarjanovic",
    "creatorName": "Boris Marjanovic",
    "creatorUrl": "borismarjanovic",
    "licenseName": "CC0: Public Domain",
    "totalBytes": 515591518,
    "currentVersionNumber": 3,
    "usabilityRating": 0.75,
    "downloadCount": 144622,
    "viewCount": 1238575,
    "voteCount": 4641,
    "kernelCount": 304,
    "topicCount": 0,
    "isFeatured": false,
    "lastUpdated": "2017-11-16T14:53:29.82Z",
    "description": "#### Context\n\nHigh-quality financial data is expensive to acquire and is therefore rarely shared for free...",
    "tags": ["business", "finance", "investing", "economics", "artificial intelligence"],
    "versions": [
        {"versionNumber": 3, "creationDate": "2017-11-16T14:53:29.82Z", "status": "Ready", "versionNotes": "Refreshed prices through 11/10/2017", "creatorName": "Boris Marjanovic"}
    ],
    "scrapedAt": "2026-05-05T23:33:29.892Z"
}
```

</details>

<details>
<summary>Sparse: a tiny tutorial dataset, single version, no community discussion</summary>

```json
{
    "thumbnailImageUrl": "https://storage.googleapis.com/kaggle-datasets-images/13720/18513/71003abbbd54cc65c64065c1de79a9ff/dataset-thumbnail.jpg",
    "title": "Medical Cost Personal Datasets",
    "ref": "mirichoi0218/insurance",
    "url": "https://www.kaggle.com/datasets/mirichoi0218/insurance",
    "subtitle": "Insurance Forecast by using Linear Regression",
    "ownerName": "Miri Choi",
    "ownerRef": "mirichoi0218",
    "creatorName": "Miri Choi",
    "creatorUrl": "mirichoi0218",
    "licenseName": "Database: Open Database, Contents: Database Contents",
    "totalBytes": 16425,
    "currentVersionNumber": 1,
    "usabilityRating": 0.88235295,
    "downloadCount": 402724,
    "viewCount": 1960956,
    "voteCount": 3203,
    "kernelCount": 2089,
    "topicCount": 0,
    "isFeatured": false,
    "lastUpdated": "2018-02-21T00:15:14.117Z",
    "description": "### Context\nMachine Learning with R by Brett Lantz is a book that provides an introduction to machine learning using R...",
    "tags": ["education", "health", "finance", "insurance", "healthcare"],
    "versions": [
        {"versionNumber": 1, "creationDate": "2018-02-21T00:15:14.117Z", "status": "Ready", "versionNotes": "Initial release", "creatorName": "Miri Choi"}
    ],
    "scrapedAt": "2026-05-05T23:33:30.118Z"
}
```

</details>
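
Records shaped like the samples above are easy to post-process. The sketch below derives three illustrative values per record: size in MiB, notebooks per 1,000 downloads, and days between `lastUpdated` and `scrapedAt`. The output keys `sizeMiB`, `notebooksPer1kDownloads`, and `daysSinceUpdate` are hypothetical names, not fields the Actor emits. It assumes every timestamp carries a fractional-seconds part and a trailing `Z`, as all three samples do.

```python
from datetime import datetime, timezone
from typing import Any


def parse_ts(value: str) -> datetime:
    """Parse timestamps like 2018-03-23T01:17:27.913Z or 2017-11-16T14:53:29.82Z."""
    # %f accepts one to six fractional digits, which covers both sample shapes.
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def summarize(record: dict[str, Any]) -> dict[str, Any]:
    """Derive a few comparison-friendly numbers from one dataset record."""
    downloads = record["downloadCount"]
    age = parse_ts(record["scrapedAt"]) - parse_ts(record["lastUpdated"])
    return {
        "ref": record["ref"],
        "sizeMiB": round(record["totalBytes"] / (1024 * 1024), 2),
        # Guard against division by zero for datasets with no downloads yet.
        "notebooksPer1kDownloads": (
            round(1000 * record["kernelCount"] / downloads, 2) if downloads else None
        ),
        "daysSinceUpdate": age.days,
    }


# Values taken from the "Credit Card Fraud Detection" sample record.
fraud = {
    "ref": "mlg-ulb/creditcardfraud",
    "totalBytes": 69155672,
    "downloadCount": 1132640,
    "kernelCount": 5984,
    "lastUpdated": "2018-03-23T01:17:27.913Z",
    "scrapedAt": "2026-05-05T23:33:29.575Z",
}
summary = summarize(fraud)
```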

***

### ✨ Why choose this Actor

| | Capability |
| --- | --- |
| 🪶 | **No browser overhead.** Pure HTTP against Kaggle's public dataset endpoints. Cheap to run, fast to finish. |
| 🔎 | **Six filter axes.** Keyword, sort order, file type, license, tag, size. Combine freely. |
| 📓 | **Notebook traction signal.** Every record carries `kernelCount`, the number of public Kaggle notebooks that already use the dataset. Use it to spot which datasets practitioners actually adopt. |
| ⭐ | **Usability score included.** Kaggle's automated 0-1 score on metadata completeness, schema clarity, and licensing comes for free with every row. |
| 📚 | **Full version history.** Each record carries the dataset's complete versioning trail: number, creation date, status, release notes, and contributor. |
| 🏷️ | **Tags as flat slugs.** Tags arrive as a clean string array, not nested objects. Drops straight into a SQL `text[]` or BigQuery `ARRAY<STRING>`. |
| 💾 | **Clean dataset shape.** 24 well-typed fields, no nulls on populated records, plus a direct dataset URL and thumbnail. |

> 📊 Kaggle hosts more than 400,000 public datasets. This Actor exposes the full metadata layer behind that catalogue with no manual scraping.
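
Because tags arrive as a flat string array, grouping datasets by tag takes only a few lines. The following is a small sketch; `refs_by_tag` is a hypothetical helper, and the handling of missing tags follows the input schema's note that `tags` stays null when `includeDescription` is false.

```python
from collections import defaultdict
from typing import Any, Iterable


def refs_by_tag(records: Iterable[dict[str, Any]]) -> dict[str, list[str]]:
    """Map each tag slug to the refs (owner/slug) of the datasets carrying it."""
    index: defaultdict[str, list[str]] = defaultdict(list)
    for record in records:
        # `tags` can be null when includeDescription is false, so fall back to [].
        for tag in record.get("tags") or []:
            index[tag].append(record["ref"])
    return dict(index)


# Refs and tags taken from the three sample records in this README.
samples = [
    {"ref": "mlg-ulb/creditcardfraud", "tags": ["finance", "crime"]},
    {"ref": "borismarjanovic/price-volume-data-for-all-us-stocks-etfs",
     "tags": ["business", "finance", "investing", "economics", "artificial intelligence"]},
    {"ref": "mirichoi0218/insurance",
     "tags": ["education", "health", "finance", "insurance", "healthcare"]},
]
tag_index = refs_by_tag(samples)
```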

***

### 📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
| --- | --- | --- | --- | --- | --- |
| **⭐ Kaggle Datasets Scraper** *(this Actor)* | Apify usage only | Whole public catalogue | Live | Keyword, sort, file type, license, tag, size | None, run from console |
| Official CLI | Free | Same | Live | Same | Local install, account, API token |
| Manual JSON harvesting | Free | Same | Live | DIY | Pagination, retries, parsing yourself |
| Paid live data marketplaces | High monthly | Curated subsets only | Live | Per-vendor | Account, billing, API key |
| Static community dumps | Free | Stale, partial | Months out of date | Whatever the dump captured | Find and trust the dump |

For most teams the calculus is simple: a hosted scraper that returns clean JSON is worth more than the time spent re-implementing pagination, retries, and detail enrichment.

***

### 🚀 How to use

1. 🆔 **Create a free account.** [Create a free account with $5 credit](https://console.apify.com/sign-up?fpr=vmoqkp).

2. 🔎 **Open the Actor.** Find the Kaggle Datasets Scraper on Apify Store.
3. 📝 **Fill the input form.** Set a keyword and pick the filters that matter (file type, license, tag, size, sort).
4. ▶️ **Run.** Click Start. The log streams listing pages and how many datasets have been collected.
5. ⬇️ **Export.** Download as JSON, CSV, Excel, or stream into a Make / Zapier / n8n workflow.

> ⏱️ Total time to first row: under a minute for most filter combinations.

***

### 💼 Business use cases

<table>
<tr>
<td width="50%">

#### 🤖 ML and AI engineering

- Source training corpora that match a target file format and license
- Track which datasets are gaining notebook traction this month
- Build internal data catalogues seeded with Kaggle metadata
- Pre-screen datasets by usability rating before downloading

</td>
<td width="50%">

#### 📈 Data science and analytics

- Benchmark internal models against community baselines
- Discover trending datasets in a vertical (finance, health, NLP)
- Compare licensing terms across candidate datasets at scale
- Pull a fresh top-100 list of community-favourite datasets weekly

</td>
</tr>
<tr>
<td width="50%">

#### 🎓 Academia and research

- Build reproducible bibliographies of Kaggle datasets cited in papers
- Quantify how dataset adoption evolves through `kernelCount` over time
- Seed coursework and capstone projects with curated dataset shortlists
- Track which competition datasets remain active years after the contest

</td>
<td width="50%">

#### 🧪 Product and platform teams

- Validate that public training data exists before committing engineering
- Source seed datasets for AI feature prototypes and evaluations
- Map dataset gaps your platform could fill with proprietary data
- Run weekly sweeps to feed an internal data marketplace

</td>
</tr>
</table>

***

### 🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

<table>
<tr>
<td width="50%">

#### 🎓 Research and academia

- Empirical datasets for papers, thesis work, and coursework
- Longitudinal studies tracking changes across snapshots
- Reproducible research with cited, versioned data pulls
- Classroom exercises on data analysis and ethical scraping

</td>
<td width="50%">

#### 🎨 Personal and creative

- Side projects, portfolio demos, and indie app launches
- Data visualizations, dashboards, and infographics
- Content research for bloggers, YouTubers, and podcasters
- Hobbyist collections and personal trackers

</td>
</tr>
<tr>
<td width="50%">

#### 🤝 Non-profit and civic

- Transparency reporting and accountability projects
- Advocacy campaigns backed by public-interest data
- Community-run databases for local issues
- Investigative journalism on public records

</td>
<td width="50%">

#### 🧪 Experimentation

- Prototype AI and machine-learning pipelines with real data
- Validate product-market hypotheses before engineering spend
- Train small domain-specific models on niche corpora
- Test dashboard concepts with live input

</td>
</tr>
</table>

***

### 🔌 Automating Kaggle Datasets Scraper

Run this Actor on a schedule, from your own backend, or as part of a larger pipeline.

- **Node.js** via the [Apify JS client](https://docs.apify.com/api/client/js)
- **Python** via the [Apify Python client](https://docs.apify.com/api/client/python)
- **Docs:** [Apify Actor API reference](https://docs.apify.com/api)

Schedules are first-class on Apify. Set a cron, point it at this Actor's input, and your dataset stays fresh without any glue code.

***

### ❓ Frequently Asked Questions

<details>
<summary>📥 <b>Does this download the actual dataset files?</b></summary>
No. This Actor surfaces the metadata layer only - title, owner, license, size, downloads, votes, notebook count, tags, versions, and the optional Markdown description. Each row links to the Kaggle dataset detail page, where authenticated downloads happen through Kaggle's normal flow.
</details>

<details>
<summary>🔐 <b>Do I need a Kaggle API token?</b></summary>
No. The Actor uses Kaggle's public dataset endpoints, which require no authentication. The Actor handles all the request shaping for you.
</details>

<details>
<summary>📓 <b>What does kernelCount mean?</b></summary>
The number of public Kaggle notebooks (kernels) that reference this dataset. It is one of the strongest signals of practical adoption: a dataset with thousands of notebooks is one the community actively builds on.
</details>

<details>
<summary>⭐ <b>What is usabilityRating?</b></summary>
A 0-1 score Kaggle assigns automatically, based on metadata completeness (description, tags, license), schema clarity (column types and labels), and licensing presence. Anything above 0.8 is generally well-documented.
</details>

<details>
<summary>🏷️ <b>How do I find valid tag slugs?</b></summary>
You don't have to. The tag input is a dropdown of 400+ canonical Kaggle tag slugs (subjects, techniques, tasks, topics) harvested directly from the Kaggle public dataset API. Pick one from the list, no guessing required.
</details>

<details>
<summary>🌍 <b>Do I need a proxy?</b></summary>
For small lookups, no. For sweeps over a few thousand datasets the Kaggle API will start returning HTTP 429. Enable Apify Residential proxy in the input to keep going.
</details>

<details>
<summary>📥 <b>What output formats are supported?</b></summary>
JSON, CSV, Excel, RSS, HTML, and direct streaming via the Apify dataset API.
</details>

<details>
<summary>💼 <b>Can I use the data commercially?</b></summary>
Yes for the metadata returned by this Actor. Each individual dataset on Kaggle has its own license (CC0, ODbL, Apache 2.0, GPL, custom). The licenseName field tells you which one applies, so always check before redistributing or training commercial models on the underlying files.
</details>

<details>
<summary>💳 <b>Do I need a paid Apify plan?</b></summary>
The free tier returns up to 10 datasets per run for testing. A paid plan supports up to 1,000,000 datasets per run.
</details>

<details>
<summary>🚨 <b>What happens if a run fails?</b></summary>
The Actor exits gracefully and writes one row to the dataset with an error field describing what went wrong. Re-run with the same input once the underlying issue (rate limit, proxy, network) is resolved.
</details>

<details>
<summary>⚖️ <b>Is this legal?</b></summary>
Yes. The Kaggle public dataset endpoints are read-only and unauthenticated. This Actor only reads the metadata layer. You are responsible for how you use the dataset metadata and any downstream files you choose to download from Kaggle directly.
</details>

<details>
<summary>🔄 <b>How often is the data refreshed?</b></summary>
Live. Every run hits Kaggle in real time, so the metadata reflects what is on the platform at the moment of the run.
</details>
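
As the FAQ on failed runs notes, a failed run writes one row with an error field instead of dataset records. A consumer can separate those rows before loading results downstream. This sketch assumes the field is literally named `error`; check a real failed run to confirm the key. `split_rows` is a hypothetical helper.

```python
from typing import Any, Iterable


def split_rows(items: Iterable[dict[str, Any]]) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split dataset items into (records, error_rows)."""
    records: list[dict[str, Any]] = []
    error_rows: list[dict[str, Any]] = []
    for item in items:
        # Assumption: a failed run marks its row with a non-empty "error" key.
        if item.get("error"):
            error_rows.append(item)
        else:
            records.append(item)
    return records, error_rows


records, error_rows = split_rows([
    {"ref": "mirichoi0218/insurance", "title": "Medical Cost Personal Datasets"},
    {"error": "rate limited"},  # illustrative message, not real Actor output
])
```

If `error_rows` is non-empty, re-run with the same input once the underlying issue is resolved, as the FAQ suggests.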

***

### 🔌 Integrate with any app

- [**Make**](https://www.make.com/en/integrations/apify) - drop the Actor into a no-code automation
- [**Zapier**](https://zapier.com/apps/apify/integrations) - trigger Zaps from each finished run
- [**n8n**](https://docs.n8n.io/integrations/builtin/credentials/apify/) - self-hostable workflow automation
- [**Slack**](https://docs.apify.com/platform/integrations/slack) - send completion notifications and dataset links to a channel
- [**Webhooks**](https://docs.apify.com/platform/integrations/webhooks) - POST run events to any endpoint
- [**Google Sheets**](https://apify.com/apify/google-sheets) - mirror the dataset into a Sheet for collaborators

***

### 🔗 Recommended Actors

- [**🤗 Hugging Face Model Scraper**](https://apify.com/parseforge/hugging-face-model-scraper) - public model catalogue with download counts and licenses
- [**📚 Semantic Scholar Scraper**](https://apify.com/parseforge/semantic-scholar-scraper) - peer-reviewed papers with citation metadata
- [**🧬 medRxiv Scraper**](https://apify.com/parseforge/medrxiv-scraper) - medical preprints for biomedical AI training corpora
- [**🏛️ FRED Economic Data Scraper**](https://apify.com/parseforge/fred-scraper) - public economic time series for finance and macro models
- [**🏥 ClinicalTrials Scraper**](https://apify.com/parseforge/clinicaltrials-scraper) - structured clinical trial registry data

> 💡 **Pro Tip:** browse the complete [ParseForge collection](https://apify.com/parseforge) for more public-data scrapers built with the same conventions.

***

**🆘 Need Help?** [**Open our contact form**](https://tally.so/r/BzdKgA)

***

> Disclaimer: This Actor is an independent project and is not affiliated with, endorsed by, or sponsored by Kaggle or Google LLC. It only reads public dataset metadata. You are responsible for complying with applicable laws, Kaggle's terms of service, and the per-dataset licenses when using the data downstream.

# Actor input Schema

## `search` (type: `string`):

Free-text query against dataset titles, descriptions, and tags. Leave blank to browse without a keyword (use sortBy / fileType to control discovery).

## `maxItems` (type: `integer`):

Free users: limited to 10 items (preview). Paid users: optional, up to 1,000,000.

## `sortBy` (type: `string`):

Order results by Kaggle built-in sort orders.

## `fileType` (type: `string`):

Restrict results to datasets containing the chosen file format. `all` returns every format.

## `license` (type: `string`):

Restrict results to datasets shared under the chosen license. `all` returns every license.

## `tag` (type: `string`):

Filter by a Kaggle tag slug. Leave blank for no tag filter. The list is harvested from the Kaggle public dataset API and covers the standard subject, technique, task, and topic taxonomy.

## `user` (type: `string`):

Filter to datasets owned by a single Kaggle user or organisation slug (the part after kaggle.com/, e.g. timoboz, mlg-ulb, organizations/google). Leave blank for no user filter.

## `minSize` (type: `integer`):

Lower bound on the dataset total uncompressed size in bytes.

## `maxSize` (type: `integer`):

Upper bound on the dataset total uncompressed size in bytes.

## `includeDescription` (type: `boolean`):

Fetch the dataset detail endpoint for each record to populate description, tags, and versions. When false, those fields stay null and the run is faster.

## `proxyConfiguration` (type: `object`):

Optional Apify Proxy configuration. The Kaggle API tolerates direct calls but a residential or datacenter pool is recommended for large jobs.

## Actor input object example

```json
{
  "maxItems": 10,
  "sortBy": "hottest",
  "fileType": "all",
  "license": "all",
  "includeDescription": true,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": []
  }
}
```

# Actor output Schema

## `datasets` (type: `string`):

Complete dataset with every Kaggle dataset record and all metadata fields.

## `overview` (type: `string`):

Overview of Kaggle datasets with key fields displayed in a table.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "maxItems": 10,
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": []
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("parseforge/kaggle-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "maxItems": 10,
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": [],
    },
}

# Run the Actor and wait for it to finish
run = client.actor("parseforge/kaggle-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
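
The example above passes only `maxItems`. The sketch below wraps the same client calls in a reusable generator that accepts any of the filters from the input schema. `iter_datasets` is a hypothetical helper; it takes the client as a parameter so it can be exercised with a stub, and it assumes the run object exposes a `status` key equal to `SUCCEEDED` on success.

```python
from typing import Any, Iterator

ACTOR_ID = "parseforge/kaggle-scraper"


def iter_datasets(client: Any, run_input: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Run the Actor with run_input and yield its dataset rows one by one.

    `client` is expected to be an apify_client.ApifyClient instance.
    """
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run returned no run details")
    if run.get("status") != "SUCCEEDED":
        raise RuntimeError(f"Actor run ended with status {run.get('status')!r}")
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()


# Usage (requires `pip install apify-client` and a real token):
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_API_TOKEN>")
#   for row in iter_datasets(client, {"tag": "nlp", "fileType": "csv", "maxItems": 10}):
#       print(row["ref"], row["voteCount"])
```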

## CLI example

```bash
echo '{
  "maxItems": 10,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": []
  }
}' |
apify call parseforge/kaggle-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=parseforge/kaggle-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification
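
For projects that call the REST API directly, the specification below can be exercised with nothing but the standard library. This is a rough sketch that only builds the request for the `run-sync-get-dataset-items` path; the server URL, path, and `token` query parameter come from the specification, while `build_sync_request` is a hypothetical helper. Sending it is left to the caller.

```python
import json
from typing import Any
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.apify.com/v2"
SYNC_PATH = "/acts/parseforge~kaggle-scraper/run-sync-get-dataset-items"


def build_sync_request(token: str, actor_input: dict[str, Any]) -> Request:
    """Build a POST request that runs the Actor and returns its dataset items."""
    # The spec declares `token` as a required query parameter.
    url = f"{BASE_URL}{SYNC_PATH}?{urlencode({'token': token})}"
    body = json.dumps(actor_input).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sync_request("<YOUR_API_TOKEN>", {"search": "finance", "maxItems": 10})
# To execute: urllib.request.urlopen(request), then json.load() the response body.
```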

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Kaggle Datasets Scraper",
        "description": "Extract Kaggle dataset metadata at scale: titles, owners, descriptions, tags, license, file types, sizes, downloads, views, and votes. Filter by search, tag, user, file type, or size.",
        "version": "0.2",
        "x-build-id": "VT1llegeKRXemnZx2"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/parseforge~kaggle-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-parseforge-kaggle-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/parseforge~kaggle-scraper/runs": {
            "post": {
                "operationId": "runs-sync-parseforge-kaggle-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/parseforge~kaggle-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-parseforge-kaggle-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "search": {
                        "title": "Search keyword",
                        "type": "string",
                        "description": "Free-text query against dataset titles, descriptions, and tags. Leave blank to browse without a keyword (use sortBy / fileType to control discovery)."
                    },
                    "maxItems": {
                        "title": "Max Items",
                        "minimum": 1,
                        "maximum": 1000000,
                        "type": "integer",
                        "description": "Free users: Limited to 10 items (preview). Paid users: Optional, max 1,000,000"
                    },
                    "sortBy": {
                        "title": "Sort by",
                        "enum": [
                            "hottest",
                            "votes",
                            "updated",
                            "active",
                            "published"
                        ],
                        "type": "string",
                        "description": "Order results using one of Kaggle's built-in sort orders.",
                        "default": "hottest"
                    },
                    "fileType": {
                        "title": "File type",
                        "enum": [
                            "all",
                            "csv",
                            "json",
                            "sqlite",
                            "bigQuery"
                        ],
                        "type": "string",
                        "description": "Restrict results to datasets containing the chosen file format. 'all' returns every format.",
                        "default": "all"
                    },
                    "license": {
                        "title": "License",
                        "enum": [
                            "all",
                            "cc",
                            "gpl",
                            "odb",
                            "other"
                        ],
                        "type": "string",
                        "description": "Restrict results to datasets shared under the chosen license. 'all' returns every license.",
                        "default": "all"
                    },
                    "tag": {
                        "title": "Tag",
                        "enum": [
                            "1x1 convolution",
                            "accelerators",
                            "advanced",
                            "adversarial learning",
                            "aesthetic quality",
                            "africa",
                            "agriculture",
                            "alcohol",
                            "amharic",
                            "animals",
                            "anime and manga",
                            "antarctica",
                            "arabic",
                            "art",
                            "artificial intelligence",
                            "arts and entertainment",
                            "asia",
                            "assamese",
                            "astronomy",
                            "atmospheric science",
                            "attention dropout",
                            "audio",
                            "audio classification",
                            "audio command detection",
                            "audio event classification",
                            "audio synthesis",
                            "audio-to-audio",
                            "australia",
                            "auto racing",
                            "auto-updating data",
                            "automatic speech recognition",
                            "automl",
                            "automobiles and vehicles",
                            "auxiliary classifier",
                            "aviation",
                            "banking",
                            "baseball",
                            "basketball",
                            "batch normalization",
                            "bayesian statistics",
                            "beginner",
                            "benchmark",
                            "benchmark dataset",
                            "bengali",
                            "bert",
                            "bigan",
                            "bigbigan",
                            "bigquery",
                            "binary classification",
                            "biology",
                            "biotechnology",
                            "board games",
                            "brazil",
                            "business",
                            "canada",
                            "cancer",
                            "card games",
                            "catboost",
                            "categorical",
                            "celebrities",
                            "chemistry",
                            "chichewa",
                            "china",
                            "chinese",
                            "chinese (taiwan)",
                            "cities and urban areas",
                            "classification",
                            "clothing and accessories",
                            "clustering",
                            "cnn",
                            "coding",
                            "comics and animation",
                            "computer science",
                            "computer vision",
                            "convnext",
                            "convolution",
                            "cooking and recipes",
                            "corrplot",
                            "covid19",
                            "cricket",
                            "crime",
                            "crowdfunding",
                            "culture and humanities",
                            "currencies and foreign exchange",
                            "cv2",
                            "cyber security",
                            "cycling",
                            "dailychallenge",
                            "dance",
                            "data analytics",
                            "data cleaning",
                            "data storytelling",
                            "data type",
                            "data visualization",
                            "datetime",
                            "decision tree",
                            "deep learning",
                            "deit",
                            "demographics",
                            "denoising",
                            "densenet",
                            "dentistry",
                            "deserts",
                            "diabetes",
                            "dimensionality reduction",
                            "diseases",
                            "dnn",
                            "doParallel",
                            "dplyr",
                            "dropout",
                            "drugs and medications",
                            "dutch",
                            "e-commerce services",
                            "ears and hearing",
                            "earth and nature",
                            "earth science",
                            "economics",
                            "education",
                            "efficientnet",
                            "efficientnet-b7",
                            "efficientnetv2",
                            "electricity",
                            "electronics",
                            "email and messaging",
                            "employment",
                            "energy",
                            "engineering",
                            "english",
                            "ensembling",
                            "environment",
                            "europe",
                            "evaluation",
                            "exercise",
                            "exploratory data analysis",
                            "eyes and vision",
                            "feature engineering",
                            "feature extraction",
                            "finance",
                            "finnish",
                            "fish and aquaria",
                            "food",
                            "football",
                            "forcats",
                            "forestry",
                            "french",
                            "gambling",
                            "games",
                            "gan",
                            "gender",
                            "general knowledge and reasoning",
                            "genetics",
                            "geography",
                            "geography and places",
                            "geology",
                            "geospatial analysis",
                            "german",
                            "ggplot2",
                            "global",
                            "golf",
                            "government",
                            "gpt2",
                            "gpu",
                            "gradient boosting",
                            "graph",
                            "graph neural network",
                            "greenland",
                            "gymnastics",
                            "health",
                            "health and fitness",
                            "health conditions",
                            "healthcare",
                            "heart conditions",
                            "hindi",
                            "history",
                            "hockey",
                            "holidays and cultural events",
                            "hospitals and treatment centers",
                            "hotels and accommodations",
                            "housing",
                            "hugging face",
                            "human rights",
                            "image",
                            "image augmentation",
                            "image classification",
                            "image classification logits",
                            "image generator",
                            "image segmentation",
                            "image style transfer",
                            "image super resolution",
                            "image text detection",
                            "image text recognition",
                            "image-to-image",
                            "image-to-text",
                            "income",
                            "india",
                            "indonesian",
                            "insurance",
                            "intermediate",
                            "international relations",
                            "internet",
                            "investing",
                            "IPython",
                            "italian",
                            "japan",
                            "japanese",
                            "jobs and career",
                            "json",
                            "k-means",
                            "keras",
                            "knn",
                            "korea",
                            "korean",
                            "language",
                            "languages",
                            "law",
                            "learn",
                            "lending",
                            "lightgbm",
                            "linear regression",
                            "linguistics",
                            "literature",
                            "logistic regression",
                            "lstm",
                            "lubridate",
                            "make-up and cosmetics",
                            "manufacturing",
                            "marketing",
                            "martial arts",
                            "mask r-cnn",
                            "math",
                            "mathematics",
                            "matplotlib",
                            "medicine",
                            "mental health",
                            "mexico",
                            "middle east",
                            "military",
                            "ml ethics",
                            "mobile and wireless",
                            "mobilenet v2",
                            "mobilenetv3",
                            "model comparison",
                            "model explainability",
                            "mortality",
                            "mountains",
                            "movies and tv shows",
                            "multi-head attention",
                            "multiclass classification",
                            "multilabel classification",
                            "multilingual",
                            "multimodal",
                            "museums",
                            "music",
                            "naive bayes",
                            "natural disasters",
                            "neural networks",
                            "neuroscience",
                            "news",
                            "nlp",
                            "nltk",
                            "north america",
                            "numpy",
                            "nutrition",
                            "object detection",
                            "oceania",
                            "oil and gas",
                            "online communities",
                            "optimization",
                            "other",
                            "outlier analysis",
                            "pandas",
                            "pca",
                            "people",
                            "people and society",
                            "persian",
                            "philosophy",
                            "physical science",
                            "physics",
                            "PIL",
                            "pitch extraction",
                            "plants",
                            "plotly",
                            "polish",
                            "politics",
                            "pollution",
                            "popular culture",
                            "portuguese",
                            "pose detection",
                            "pre-trained model",
                            "primary and secondary schools",
                            "programming",
                            "psychology",
                            "public health",
                            "public safety",
                            "puzzles",
                            "python",
                            "pytorch",
                            "question answering",
                            "r",
                            "racial equity",
                            "rail transport",
                            "random forest",
                            "randomForest",
                            "ratings and reviews",
                            "re",
                            "real estate",
                            "recommender systems",
                            "regression",
                            "reinforcement learning",
                            "religion and belief systems",
                            "renewable energy",
                            "research",
                            "residual block",
                            "resnet",
                            "restaurants",
                            "retail and shopping",
                            "retinanet",
                            "retrieval question answering",
                            "retrieval/ranking",
                            "rnn",
                            "roberta",
                            "robotics",
                            "russia",
                            "russian",
                            "sam",
                            "sampling",
                            "science and technology",
                            "scipy",
                            "seaborn",
                            "search engines",
                            "segmentation",
                            "sentence similarity",
                            "signal processing",
                            "simulations",
                            "sklearn",
                            "slovenian",
                            "social issues and advocacy",
                            "social networks",
                            "social science",
                            "socrata",
                            "software",
                            "south america",
                            "spaCy",
                            "spanish",
                            "speech synthesis",
                            "speech-to-text",
                            "sports",
                            "sql",
                            "standardized testing",
                            "statistical analysis",
                            "summarization",
                            "sundanese",
                            "survey analysis",
                            "svm",
                            "swedish",
                            "synthetic",
                            "t5",
                            "tabular",
                            "tabular classification",
                            "tamil",
                            "tennis",
                            "tensorflow",
                            "text",
                            "text classification",
                            "text conversation",
                            "text fill-mask",
                            "text generation",
                            "text mining",
                            "text pre-processing",
                            "text segmentation",
                            "text sequence alignment",
                            "text-to-image",
                            "text-to-speech",
                            "text-to-text generation",
                            "thai",
                            "tibble",
                            "tidyverse",
                            "time series analysis",
                            "token classification",
                            "torchvision",
                            "tpu",
                            "transfer learning",
                            "transformer",
                            "transformers",
                            "translation",
                            "transportation",
                            "travel",
                            "turkish",
                            "twi",
                            "Two Sigma x Kaggle Finance Data Repo",
                            "ukrainian",
                            "unet",
                            "united states",
                            "universities and colleges",
                            "urban planning",
                            "urdu",
                            "uzbek",
                            "vae",
                            "vgg-style",
                            "video",
                            "video classification",
                            "video games",
                            "video generation",
                            "vietnamese",
                            "vision transformer",
                            "water bodies",
                            "water sports",
                            "water transport",
                            "weather and climate",
                            "websites",
                            "whisper",
                            "word2vec skip-gram",
                            "xgboost",
                            "yolo",
                            "yolov5",
                            "yolov8",
                            "zero-shot text classification"
                        ],
                        "type": "string",
                        "description": "Filter by a Kaggle tag slug. Leave blank for no tag filter. The list is harvested from the Kaggle public dataset API and covers the standard subject, technique, task, and topic taxonomy."
                    },
                    "user": {
                        "title": "Kaggle user",
                        "type": "string",
                        "description": "Filter to datasets owned by a single Kaggle user or organisation slug (the part after kaggle.com/, e.g. timoboz, mlg-ulb, organizations/google). Leave blank for no user filter."
                    },
                    "minSize": {
                        "title": "Min size (bytes)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Lower bound on the dataset total uncompressed size in bytes."
                    },
                    "maxSize": {
                        "title": "Max size (bytes)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Upper bound on the dataset total uncompressed size in bytes."
                    },
                    "includeDescription": {
                        "title": "Include full description",
                        "type": "boolean",
                        "description": "Fetch the dataset detail endpoint for each record to populate description, tags, and versions. When false, those fields stay null and the run is faster.",
                        "default": true
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Optional Apify Proxy configuration. The Kaggle API tolerates direct calls but a residential or datacenter pool is recommended for large jobs."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "number",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
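
The run object described by the schema above reports costs in `usageUsd` (per resource) and `usageTotalUsd` (overall). As a minimal sketch of how a client might read those fields, the snippet below works on a plain dictionary shaped like the schema. The `summarize_usage` helper and the `sample_run` values are illustrative assumptions built from the schema's example values, not part of the Actor or of any Apify client library.

```python
def summarize_usage(run: dict) -> dict:
    """Summarize the cost fields of a run object shaped like the schema above.

    Hypothetical helper: it only reads the `usageUsd` and `usageTotalUsd`
    keys and makes no API calls.
    """
    usage_usd = run.get("usageUsd") or {}
    # Keep only the resources that actually cost something.
    non_zero = {name: cost for name, cost in usage_usd.items() if cost}
    return {
        "non_zero": non_zero,
        "sum_usd": sum(usage_usd.values()),
        "reported_total_usd": run.get("usageTotalUsd"),
    }


# Sample data assembled from the example values in the schema (assumption:
# a real run object would carry the same keys with real numbers).
sample_run = {
    "usageTotalUsd": 0.00005,
    "usageUsd": {
        "ACTOR_COMPUTE_UNITS": 0,
        "DATASET_READS": 0,
        "DATASET_WRITES": 0,
        "KEY_VALUE_STORE_READS": 0,
        "KEY_VALUE_STORE_WRITES": 0.00005,
        "KEY_VALUE_STORE_LISTS": 0,
    },
}

summary = summarize_usage(sample_run)
print(summary["non_zero"])
print(summary["sum_usd"], summary["reported_total_usd"])
```

Comparing `sum_usd` with `reported_total_usd` is a cheap sanity check when logging run costs. Use a tolerance rather than strict equality, since the values are floating-point USD amounts.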