# Hugging Face Datasets Scraper (`automation-lab/huggingface-datasets-scraper`) Actor

Scrape the Hugging Face datasets catalog. Filter by task, language, license, or author. Sort by downloads, likes, or trending. Extracts metadata for 200k+ ML datasets.

- **URL**: https://apify.com/automation-lab/huggingface-datasets-scraper.md
- **Developed by:** [Stas Persiianenko](https://apify.com/automation-lab) (community)
- **Categories:** AI, Developer tools
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per event

This Actor is paid per event. You are not charged for the Apify platform usage, but only a fixed price for specific events.
Since this Actor supports Apify Store discounts, the higher your subscription plan, the lower the price.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) and as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

## Hugging Face Datasets Scraper

Scrape the Hugging Face datasets catalog — search by keyword, filter by task category, language, license, or author, and sort by downloads, likes, or trending. Extract full metadata for 200,000+ public ML datasets including download counts, tags, descriptions, and direct links.

### What does it do?

This actor calls the Hugging Face Datasets API and returns structured metadata for every matching dataset. You can browse the entire catalog or narrow results by ML task, language, license type, or author/organization.

**For each dataset, the actor extracts:**

| Field | Description |
|-------|-------------|
| `id` | Unique dataset ID (e.g. `huggingface/llm-perf-dataset`) |
| `author` | Author or organization name |
| `description` | Dataset card description (up to 500 chars) |
| `downloads` | Number of downloads in the last 30 days |
| `likes` | Number of Hugging Face likes |
| `license` | License type (e.g. `mit`, `apache-2.0`, `cc-by-4.0`) |
| `taskCategories` | ML task categories (e.g. `text-classification`) |
| `languages` | Language codes (e.g. `en`, `fr`, `zh`) |
| `formats` | Data formats (e.g. `parquet`, `csv`, `json`) |
| `modalities` | Data modalities (e.g. `text`, `image`, `audio`) |
| `libraries` | Compatible ML libraries (e.g. `datasets`, `transformers`) |
| `sizeCategory` | Dataset size bucket (e.g. `1K<n<10K`, `n>1T`) |
| `tags` | All raw tags associated with the dataset |
| `lastModified` | ISO 8601 timestamp of the last update |
| `createdAt` | ISO 8601 timestamp of creation |
| `isGated` | Whether access approval is required |
| `isDisabled` | Whether the dataset has been disabled |
| `url` | Direct link to the dataset page |
| `scrapedAt` | ISO 8601 timestamp of when data was collected |

### Who is it for?

🧑‍🔬 **ML Researchers** discovering training and evaluation data — search for datasets matching your task (e.g. `question-answering`) and language (e.g. `en`) in one API call.

🏢 **AI teams building fine-tuning pipelines** — programmatically monitor which datasets are available for a given domain (NLP, computer vision, speech) and track their download trends over time.

📊 **Data scientists cataloging benchmarks** — pull dataset metadata into your own database to compare size categories, licenses, and task coverage across thousands of datasets.

🔬 **Academic researchers** aggregating dataset availability — find all datasets from a specific organization (e.g. `google`, `facebook`, `EleutherAI`) and track their usage.

🛠️ **Developer tool builders** — add HuggingFace dataset search to your app without building your own API integration.

### Why use this actor?

Hugging Face has 200,000+ public datasets. Browsing them manually is slow and doesn't scale. This actor gives you programmatic access with filtering and sorting that would otherwise require multiple API calls with custom pagination logic.

- ✅ **No browser automation** — pure HTTP API calls, extremely fast and reliable
- ✅ **Structured output** — tags are parsed into typed arrays (taskCategories, languages, formats, etc.) so you don't have to post-process raw tag strings
- ✅ **Pagination handled** — fetches all matching datasets automatically, not just the first page
- ✅ **Pay only for what you get** — PPE pricing means you pay per dataset extracted, not per minute

### How to use it

#### Step 1: Set your filters

Choose what to scrape:

- **Search query** — keyword to search across dataset names and descriptions (e.g. `text classification`, `medical imaging`)
- **Filter by task** — ML task category from the [HuggingFace tasks list](https://huggingface.co/tasks) (e.g. `text-generation`, `image-segmentation`)
- **Filter by language** — ISO 639-1 language code (e.g. `en`, `zh`, `de`)
- **Filter by license** — SPDX license ID (e.g. `mit`, `apache-2.0`, `cc-by-4.0`, `cc0-1.0`)
- **Filter by author** — organization or user name (e.g. `huggingface`, `google`, `allenai`)

#### Step 2: Choose sort order

- `downloads` — most downloaded in the last 30 days (default, best for popularity)
- `likes` — most liked by the community
- `lastModified` — recently updated datasets
- `trending` — trending datasets by Hugging Face's trending score

#### Step 3: Set max results

- Use `20` for a quick look
- Use `100` for a comprehensive dataset for a specific niche
- Use `0` for unlimited (all matching datasets)

#### Step 4: Run and export

Results appear in the dataset tab. Export as JSON, CSV, or XLSX. Use the API to integrate with your pipelines.

### Input parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `searchQuery` | string | `""` | Keyword to search across names and descriptions |
| `filterByTask` | string | `""` | ML task category (e.g. `text-classification`) |
| `filterByLanguage` | string | `""` | ISO 639-1 language code (e.g. `en`, `zh`) |
| `filterByLicense` | string | `""` | License identifier (e.g. `mit`, `apache-2.0`) |
| `filterByAuthor` | string | `""` | Author or organization name |
| `sortBy` | enum | `downloads` | Sort order: `downloads`, `likes`, `lastModified`, `trending` |
| `maxResults` | integer | `100` | Max datasets to return (0 = unlimited) |
| `maxRequestRetries` | integer | `3` | Retry attempts for failed API calls |
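
A quick local check of your input can save a wasted run. The sketch below is a hypothetical helper (`build_run_input` is not part of the actor or the Apify client): it enforces the limits from the actor's input schema (`sortBy` must be one of the four options, `maxResults` must be at least 0, `maxRequestRetries` must be between 1 and 10) and returns warnings for filter values that look malformed. The format heuristics (hyphenated lowercase task IDs, two-letter lowercase language codes) are assumptions based on the examples in this README, so they only warn instead of failing.

```python
from typing import Any

# Allowed sortBy values, taken from the input schema
SORT_OPTIONS = {"downloads", "likes", "lastModified", "trending"}


def build_run_input(
    search_query: str = "",
    task: str = "",
    language: str = "",
    license_id: str = "",
    author: str = "",
    sort_by: str = "downloads",
    max_results: int = 100,
    max_request_retries: int = 3,
) -> tuple[dict[str, Any], list[str]]:
    """Return (run_input, warnings) for the Hugging Face Datasets Scraper.

    Hypothetical helper: hard limits mirror the actor's input schema,
    while the format warnings are heuristics based on this README's examples.
    """
    # Hard constraints from the input schema raise immediately
    if sort_by not in SORT_OPTIONS:
        raise ValueError(f"sortBy must be one of {sorted(SORT_OPTIONS)}, got {sort_by!r}")
    if max_results < 0:
        raise ValueError("maxResults must be >= 0 (0 means unlimited)")
    if not 1 <= max_request_retries <= 10:
        raise ValueError("maxRequestRetries must be between 1 and 10")

    warnings: list[str] = []
    # Task IDs in this README are lowercase and hyphenated, e.g. "text-classification"
    if task and (" " in task or task != task.lower()):
        warnings.append(f"task {task!r} looks malformed; expected e.g. 'text-classification'")
    # Language filters are documented as lowercase ISO 639-1 codes, e.g. "en"
    if language and not (len(language) == 2 and language.isalpha() and language.islower()):
        warnings.append(f"language {language!r} is not a two-letter lowercase ISO 639-1 code")

    run_input = {
        "searchQuery": search_query,
        "filterByTask": task,
        "filterByLanguage": language,
        "filterByLicense": license_id,
        "filterByAuthor": author,
        "sortBy": sort_by,
        "maxResults": max_results,
        "maxRequestRetries": max_request_retries,
    }
    return run_input, warnings
```

For example, `build_run_input(task="text classification", language="EN")` returns two warnings, one per filter. The returned dict can be passed as `run_input` to the Python client call in the API usage section.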

### Output example

```json
{
  "id": "rajpurkar/squad",
  "author": "rajpurkar",
  "description": "Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles...",
  "downloads": 284512,
  "likes": 1423,
  "license": "cc-by-sa-4.0",
  "taskCategories": ["question-answering"],
  "languages": ["en"],
  "formats": ["parquet"],
  "modalities": ["text"],
  "libraries": ["datasets", "mlcroissant"],
  "sizeCategory": "100K<n<1M",
  "tags": [
    "task_categories:question-answering",
    "language:en",
    "license:cc-by-sa-4.0",
    "size_categories:100K<n<1M",
    "format:parquet",
    "modality:text",
    "library:datasets",
    "library:mlcroissant"
  ],
  "lastModified": "2024-03-15T10:22:30.000Z",
  "createdAt": "2022-03-02T23:29:22.000Z",
  "isGated": false,
  "isDisabled": false,
  "url": "https://huggingface.co/datasets/rajpurkar/squad",
  "scrapedAt": "2026-04-28T08:00:00.000Z"
}
````
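
The typed fields in this record (`taskCategories`, `languages`, `formats`, `modalities`, `libraries`, `license`, `sizeCategory`) line up with the prefixed entries in `tags`. The sketch below shows one way to derive them from raw tags; the prefix-to-field mapping is inferred from the example record above and is an assumption, not the actor's actual source code.

```python
# Tag prefix -> list-valued output field, inferred from the example record (assumption)
PREFIX_TO_FIELD = {
    "task_categories": "taskCategories",
    "language": "languages",
    "format": "formats",
    "modality": "modalities",
    "library": "libraries",
}


def parse_tags(tags: list[str]) -> dict:
    """Split raw 'prefix:value' tags into typed fields plus license and size bucket."""
    result: dict = {field: [] for field in PREFIX_TO_FIELD.values()}
    result["license"] = None
    result["sizeCategory"] = None
    for tag in tags:
        prefix, sep, value = tag.partition(":")
        if not sep:
            continue  # tags without a prefix stay only in the raw `tags` array
        if prefix in PREFIX_TO_FIELD:
            result[PREFIX_TO_FIELD[prefix]].append(value)
        elif prefix == "license" and result["license"] is None:
            result["license"] = value  # keep the first license tag
        elif prefix == "size_categories" and result["sizeCategory"] is None:
            result["sizeCategory"] = value  # e.g. "100K<n<1M"
    return result
```

Unknown prefixes (for example `region:us`) are ignored, so they remain available only in `tags`.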

### Supported filter values

#### Task categories (common values for `filterByTask`)

| Task | Filter value |
|------|-------------|
| Text classification | `text-classification` |
| Question answering | `question-answering` |
| Text generation | `text-generation` |
| Token classification | `token-classification` |
| Translation | `translation` |
| Summarization | `summarization` |
| Image classification | `image-classification` |
| Object detection | `object-detection` |
| Image segmentation | `image-segmentation` |
| Automatic speech recognition | `automatic-speech-recognition` |

See the full list at [huggingface.co/tasks](https://huggingface.co/tasks).

#### Common licenses for `filterByLicense`

`mit`, `apache-2.0`, `cc-by-4.0`, `cc-by-sa-4.0`, `cc0-1.0`, `openrail`, `gpl-3.0`, `llama2`, `llama3`

### Tips for best results

💡 **Combine filters** — Task + language + license filters work together. Run `filterByTask=text-classification`, `filterByLanguage=en`, `filterByLicense=mit` to find only MIT-licensed English text classification datasets.

💡 **Use `sortBy=trending`** — Useful for spotting datasets that are gaining attention quickly, before they reach the top of the download rankings.

💡 **Monitor a specific org** — Set `filterByAuthor=google` and `sortBy=lastModified` to track Google's latest dataset releases.

💡 **No search + no filters** — Returns the most popular datasets overall. Great for building a "top datasets" leaderboard.

💡 **`maxResults=0`** — Returns everything matching your filters. For broad queries, this can be thousands of datasets — use specific filters to narrow down first.

### How much does it cost to scrape Hugging Face datasets?

This actor uses pay-per-event (PPE) pricing — you only pay for datasets successfully extracted, not for time or failed runs.

| Plan | Price per dataset |
|------|------------------|
| FREE | $0.00115 |
| BRONZE | $0.001 |
| SILVER | $0.00078 |
| GOLD | $0.0006 |
| PLATINUM | $0.0004 |
| DIAMOND | $0.00028 |

**Example costs:**

- 100 datasets → ~$0.115 on FREE plan
- 1,000 datasets → ~$0.78–$1.00 on BRONZE/SILVER
- 10,000 datasets → ~$6 on GOLD (~$4 on PLATINUM)

Since this actor uses direct API calls (no proxies, no browser), compute costs are negligible. You get roughly **1,000 datasets per dollar** on the BRONZE plan.

The free Apify plan includes $5 of monthly credit, enough for about 4,300 datasets per month at the FREE-plan price.
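
The example costs are the number of datasets multiplied by the per-dataset price for your plan. A small sketch of that arithmetic, using the prices from the table above (prices may change, so check the actor's pricing page before budgeting). `Decimal` avoids floating-point rounding on sub-cent prices.

```python
from decimal import Decimal

# Per-dataset prices in USD from the pricing table above (may change over time)
PRICE_PER_DATASET = {
    "FREE": Decimal("0.00115"),
    "BRONZE": Decimal("0.001"),
    "SILVER": Decimal("0.00078"),
    "GOLD": Decimal("0.0006"),
    "PLATINUM": Decimal("0.0004"),
    "DIAMOND": Decimal("0.00028"),
}


def estimate_cost(num_datasets: int, plan: str) -> Decimal:
    """Estimated charge in USD for extracting `num_datasets` datasets on `plan`."""
    return num_datasets * PRICE_PER_DATASET[plan.upper()]


def datasets_for_budget(budget_usd: str, plan: str) -> int:
    """How many datasets a budget covers on `plan`, rounded down."""
    return int(Decimal(budget_usd) // PRICE_PER_DATASET[plan.upper()])
```

For instance, `datasets_for_budget("5", "FREE")` returns 4347, which is where the "about 4,300 datasets" figure for the $5 free credit comes from.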

### Integrations

#### 🔗 Connect with other actors

**Build an ML dataset pipeline:**

1. Run Hugging Face Datasets Scraper to find relevant datasets (filter by task, language, license)
2. Feed dataset IDs to a download/processing workflow
3. Store in your vector database for training

**Monitor dataset availability:**

- Schedule weekly runs to track new datasets in your niche
- Compare download counts over time to spot rising datasets
- Alert your team when a gated dataset becomes public
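
For scheduled monitoring, one approach is to keep the previous run's items and compare them with the latest run by `id`. The sketch below is a hypothetical helper, not a feature of the actor: it reports new datasets, changes in `downloads`, and datasets whose `isGated` flag went from true to false. Since `downloads` is a rolling 30-day count, a change reflects recent activity rather than growth in a cumulative total.

```python
def diff_snapshots(previous: list[dict], current: list[dict]) -> dict:
    """Compare two scraper runs keyed by dataset `id` (hypothetical helper)."""
    prev_by_id = {item["id"]: item for item in previous}
    new_ids: list[str] = []
    download_changes: dict[str, int] = {}
    ungated: list[str] = []

    for item in current:
        old = prev_by_id.get(item["id"])
        if old is None:
            new_ids.append(item["id"])  # not present in the previous run
            continue
        # Treat missing or null download counts as 0
        delta = (item.get("downloads") or 0) - (old.get("downloads") or 0)
        if delta:
            download_changes[item["id"]] = delta
        # Gated in the previous run, no longer gated now
        if old.get("isGated") and not item.get("isGated"):
            ungated.append(item["id"])

    return {"new": new_ids, "downloadChanges": download_changes, "ungated": ungated}
```

Datasets missing from the current run are deliberately not reported as removed, because a `maxResults` limit or a shift in sort order can push them out of the results without them being deleted.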

**Research cataloging:**

1. Scrape all datasets from specific organizations
2. Export to Google Sheets or Notion
3. Filter and annotate for your literature review
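
Spreadsheet tools such as Google Sheets expect flat rows, while fields like `taskCategories` and `languages` are arrays. If you export locally instead of through the Apify console, one option is to join list fields into a single cell, as in this sketch (the column selection and the `; ` separator are arbitrary choices, not something the actor prescribes):

```python
import csv
import io

# Columns to export; list-valued fields are joined into one cell
COLUMNS = ["id", "author", "downloads", "likes", "license",
           "taskCategories", "languages", "sizeCategory", "url"]


def items_to_csv(items: list[dict], columns: list[str] = COLUMNS) -> str:
    """Serialize scraper items to CSV text, flattening list fields with '; '."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for item in items:
        row = {}
        for col in columns:
            value = item.get(col)  # missing fields become empty cells
            row[col] = "; ".join(map(str, value)) if isinstance(value, list) else value
        writer.writerow(row)
    return buffer.getvalue()
```

Write the result with `open("datasets.csv", "w", newline="", encoding="utf-8")` and import the file into Google Sheets or Notion.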

#### 🤖 Use with other Hugging Face scrapers

| Actor | What it does |
|-------|-------------|
| [Hugging Face Scraper](https://apify.com/automation-lab/huggingface-scraper) | Scrape models, spaces, and papers from HuggingFace Hub |
| [Hugging Face Papers Scraper](https://apify.com/automation-lab/huggingface-papers-scraper) | Get ML papers with authors, abstracts, and citation counts |

### API usage

#### Node.js

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const run = await client.actor('automation-lab/huggingface-datasets-scraper').call({
    searchQuery: 'instruction tuning',
    filterByLanguage: 'en',
    sortBy: 'downloads',
    maxResults: 100,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Fetched ${items.length} datasets`);
items.forEach(d => console.log(d.id, d.downloads, d.license));
```

#### Python

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("automation-lab/huggingface-datasets-scraper").call(run_input={
    "searchQuery": "instruction tuning",
    "filterByLanguage": "en",
    "sortBy": "downloads",
    "maxResults": 100,
})

items = client.dataset(run["defaultDatasetId"]).list_items().items
print(f"Fetched {len(items)} datasets")
for d in items:
    print(d["id"], d["downloads"], d["license"])
```

#### cURL

```bash
## Start the actor
curl -X POST "https://api.apify.com/v2/acts/automation-lab~huggingface-datasets-scraper/runs?token=YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "searchQuery": "instruction tuning",
    "filterByLanguage": "en",
    "sortBy": "downloads",
    "maxResults": 100
  }'

## Get results after the run finishes (replace DATASET_ID with data.defaultDatasetId from the run response)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_APIFY_TOKEN"
```
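
The two cURL calls above start the run asynchronously, so the second request only returns complete results once the run has finished. The OpenAPI specification in this document also lists a `run-sync-get-dataset-items` endpoint that starts the run, waits for it, and returns the dataset items in a single response. A standard-library Python sketch of that call follows; the helper names are hypothetical, and long runs may exceed the synchronous endpoint's time limit, in which case the two-step flow is safer.

```python
import json
import urllib.parse
import urllib.request

ACTOR_ID = "automation-lab~huggingface-datasets-scraper"


def build_sync_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build a POST request to the run-sync-get-dataset-items endpoint."""
    query = urllib.parse.urlencode({"token": token})
    url = f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_get_items(token: str, run_input: dict, timeout: float = 310.0) -> list[dict]:
    """Run the actor synchronously and return its dataset items.

    The client-side timeout is an arbitrary, generous value.
    """
    request = build_sync_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Call it as `run_and_get_items("YOUR_APIFY_TOKEN", {"searchQuery": "instruction tuning", "maxResults": 100})`.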

### Use with Claude and other AI assistants (MCP)

You can use this actor directly with Claude via the Model Context Protocol (MCP), making it callable from Claude Code, Claude Desktop, Cursor, and VS Code.

#### Claude Code

```bash
claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/huggingface-datasets-scraper"
```

#### Claude Desktop / Cursor / VS Code

Add the following to your MCP client's configuration file (for Claude Desktop, `claude_desktop_config.json`; Cursor and VS Code use their own MCP settings files):

```json
{
  "mcpServers": {
    "apify": {
      "type": "http",
      "url": "https://mcp.apify.com?tools=automation-lab/huggingface-datasets-scraper",
      "headers": {
        "Authorization": "Bearer YOUR_APIFY_TOKEN"
      }
    }
  }
}
```

#### Example prompts

Once connected, try these prompts with Claude:

- *"Find the top 20 most downloaded English text-classification datasets on Hugging Face."*
- *"Get me all datasets from the 'allenai' organization sorted by likes."*
- *"List the 50 most popular MIT-licensed datasets on Hugging Face."*
- *"Find trending image segmentation datasets from this week."*

### Legality

Hugging Face's [Terms of Service](https://huggingface.co/terms-of-service) allow accessing public dataset metadata. This actor calls the official public HuggingFace API (`/api/datasets`) — the same endpoint used by the HuggingFace website and all official client libraries.

- ✅ Only public dataset metadata is collected
- ✅ Gated datasets are listed (metadata only) but their contents are NOT downloaded
- ✅ Private datasets are excluded by default
- ✅ Respects API rate limits with automatic retry and backoff

### FAQ

**Q: Can I scrape private or gated datasets?**
A: The actor lists gated datasets (those requiring access approval) in metadata, but cannot access their content. Fully private datasets are not accessible via the public API.

**Q: How many datasets are on Hugging Face?**
A: There are 200,000+ public datasets as of 2026, growing rapidly. Use `maxResults: 0` with specific filters to see all matches.

**Q: The actor returned fewer results than my maxResults — why?**
A: Your filters matched fewer datasets than requested. Try broadening your filters (e.g., remove the license filter) to get more results.

**Q: What does `sortBy=trending` actually measure?**
A: Hugging Face's trending score is a proprietary algorithm combining recent downloads, likes, and activity. It changes daily and highlights fast-rising datasets.

**Q: Can I filter by multiple tasks at once?**
A: The current version supports one task filter per run. To get results for multiple tasks, run the actor separately for each task and combine the outputs.
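
Following that approach, the sketch below runs the actor once per task with the official Python client and merges the outputs, keeping only the first copy of datasets that match more than one task. `collect_for_tasks` and `merge_by_id` are hypothetical helpers; the client is passed in so the merge logic can be reused or tested on its own.

```python
from collections.abc import Iterable

ACTOR_ID = "automation-lab/huggingface-datasets-scraper"


def merge_by_id(result_sets: Iterable[Iterable[dict]]) -> list[dict]:
    """Concatenate result sets, keeping the first occurrence of each dataset `id`."""
    seen: set[str] = set()
    merged: list[dict] = []
    for items in result_sets:
        for item in items:
            if item["id"] not in seen:
                seen.add(item["id"])
                merged.append(item)
    return merged


def collect_for_tasks(client, tasks: list[str], base_input: dict) -> list[dict]:
    """Run the actor once per task; `client` is an apify_client.ApifyClient."""
    result_sets = []
    for task in tasks:
        run = client.actor(ACTOR_ID).call(run_input={**base_input, "filterByTask": task})
        if run is None:
            raise RuntimeError(f"Run for task {task!r} did not return run info")
        items = client.dataset(run["defaultDatasetId"]).list_items().items
        result_sets.append(items)
    return merge_by_id(result_sets)
```

The merged list keeps each run's order in sequence, so re-sort it (for example by `downloads`) if you need a single global ranking.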

**Q: The actor ran but I got 0 results — what happened?**
A: Check that your filter values are valid. Task categories use hyphens (e.g., `text-classification`, not `text classification`). Language codes are lowercase ISO 639-1 codes (e.g., `en`, `fr`, `zh`). License IDs must match Hugging Face's exact format (e.g., `apache-2.0`, `cc-by-4.0`).

### Related actors

- [Hugging Face Scraper](https://apify.com/automation-lab/huggingface-scraper) — Scrape HuggingFace models, spaces, and papers
- [Hugging Face Papers Scraper](https://apify.com/automation-lab/huggingface-papers-scraper) — Get ML research papers with authors and abstracts

# Actor input Schema

## `searchQuery` (type: `string`):

Keyword to search across dataset names and descriptions. Leave empty to browse all datasets.

## `filterByTask` (type: `string`):

Filter datasets by ML task category (e.g. text-classification, image-segmentation, question-answering). See https://huggingface.co/tasks for valid values.

## `filterByLanguage` (type: `string`):

Filter datasets by language code (e.g. en, fr, zh, de). Uses ISO 639-1 codes.

## `filterByLicense` (type: `string`):

Filter datasets by license (e.g. mit, apache-2.0, cc-by-4.0, cc0-1.0).

## `filterByAuthor` (type: `string`):

Filter datasets by author or organization name (e.g. huggingface, google, facebook).

## `sortBy` (type: `string`):

How to sort the results.

## `maxResults` (type: `integer`):

Maximum number of datasets to return. Use 0 for unlimited (all datasets matching your filters).

## `maxRequestRetries` (type: `integer`):

Number of retry attempts for failed requests before giving up.

## Actor input object example

```json
{
  "searchQuery": "text classification",
  "sortBy": "downloads",
  "maxResults": 20,
  "maxRequestRetries": 3
}
```

# Actor output Schema

## `overview` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "searchQuery": "text classification",
    "filterByTask": "",
    "filterByLanguage": "",
    "filterByLicense": "",
    "filterByAuthor": "",
    "sortBy": "downloads",
    "maxResults": 20,
    "maxRequestRetries": 3
};

// Run the Actor and wait for it to finish
const run = await client.actor("automation-lab/huggingface-datasets-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "searchQuery": "text classification",
    "filterByTask": "",
    "filterByLanguage": "",
    "filterByLicense": "",
    "filterByAuthor": "",
    "sortBy": "downloads",
    "maxResults": 20,
    "maxRequestRetries": 3,
}

# Run the Actor and wait for it to finish
run = client.actor("automation-lab/huggingface-datasets-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "searchQuery": "text classification",
  "filterByTask": "",
  "filterByLanguage": "",
  "filterByLicense": "",
  "filterByAuthor": "",
  "sortBy": "downloads",
  "maxResults": 20,
  "maxRequestRetries": 3
}' |
apify call automation-lab/huggingface-datasets-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=automation-lab/huggingface-datasets-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Hugging Face Datasets Scraper",
        "description": "Scrape the Hugging Face datasets catalog. Filter by task, language, license, or author. Sort by downloads, likes, or trending. Extracts metadata for 200k+ ML datasets.",
        "version": "0.1",
        "x-build-id": "r1axgviDJVFzDJryw"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/automation-lab~huggingface-datasets-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-automation-lab-huggingface-datasets-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/automation-lab~huggingface-datasets-scraper/runs": {
            "post": {
                "operationId": "runs-sync-automation-lab-huggingface-datasets-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/automation-lab~huggingface-datasets-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-automation-lab-huggingface-datasets-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "searchQuery": {
                        "title": "🔍 Search query",
                        "type": "string",
                        "description": "Keyword to search across dataset names and descriptions. Leave empty to browse all datasets."
                    },
                    "filterByTask": {
                        "title": "🏷️ Filter by task category",
                        "type": "string",
                        "description": "Filter datasets by ML task category (e.g. text-classification, image-segmentation, question-answering). See https://huggingface.co/tasks for valid values."
                    },
                    "filterByLanguage": {
                        "title": "🌍 Filter by language",
                        "type": "string",
                        "description": "Filter datasets by language code (e.g. en, fr, zh, de). Uses ISO 639-1 codes."
                    },
                    "filterByLicense": {
                        "title": "📄 Filter by license",
                        "type": "string",
                        "description": "Filter datasets by license (e.g. mit, apache-2.0, cc-by-4.0, cc0-1.0)."
                    },
                    "filterByAuthor": {
                        "title": "👤 Filter by author",
                        "type": "string",
                        "description": "Filter datasets by author or organization name (e.g. huggingface, google, facebook)."
                    },
                    "sortBy": {
                        "title": "📊 Sort by",
                        "enum": [
                            "downloads",
                            "likes",
                            "lastModified",
                            "trending"
                        ],
                        "type": "string",
                        "description": "How to sort the results.",
                        "default": "downloads"
                    },
                    "maxResults": {
                        "title": "Max datasets",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum number of datasets to return. Use 0 for unlimited (all datasets matching your filters).",
                        "default": 100
                    },
                    "maxRequestRetries": {
                        "title": "Max request retries",
                        "minimum": 1,
                        "maximum": 10,
                        "type": "integer",
                        "description": "Number of retry attempts for failed requests before giving up.",
                        "default": 3
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
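The run object described by this schema bundles the run status, options, storage IDs, and a per-service cost breakdown (`usageUsd`) together with a total (`usageTotalUsd`). The following is a minimal sketch of pulling the commonly used fields out of such an object. It assumes the run arrives as a plain dictionary that keeps the API's camelCase keys, for example the dictionary returned by the Python client's `client.actor(...).call(run_input=...)`. `summarize_run` is a hypothetical helper, not part of the Apify client, and the sample values are copied from the schema's `example` fields rather than from a real run.

```python
from typing import Any


def summarize_run(run: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: extract key fields from a run object shaped like the schema above."""
    usage_usd = run.get("usageUsd") or {}
    # Keep only the cost items that contributed a non-zero amount.
    nonzero_costs = {name: cost for name, cost in usage_usd.items() if cost}
    return {
        "status": run.get("status"),
        # Results of the run are stored in its default dataset.
        "datasetId": run.get("defaultDatasetId"),
        "memoryMbytes": (run.get("options") or {}).get("memoryMbytes"),
        "totalUsd": run.get("usageTotalUsd", 0),
        "costBreakdown": nonzero_costs,
    }


# Sample values taken from the schema's "example" fields; this is not real run data.
sample_run = {
    "status": "READY",
    "options": {"build": "latest", "timeoutSecs": 300, "memoryMbytes": 1024, "diskMbytes": 2048},
    "usageTotalUsd": 0.00005,
    "usageUsd": {
        "ACTOR_COMPUTE_UNITS": 0,
        "DATASET_WRITES": 0,
        "KEY_VALUE_STORE_WRITES": 0.00005,
    },
}

summary = summarize_run(sample_run)
print(summary)
```

Filtering out zero entries keeps the breakdown readable, since most of the `usageUsd` keys in the schema are typically zero for a short run. Because this Actor is priced per event, the platform usage shown in `usageUsd` may not match what you are actually billed.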
