# Hugging Face Scraper — AI Models, Datasets, Spaces & Papers (`logiover/huggingface-hub-intelligence-scraper`) Actor

Export every AI model, dataset, space and daily paper from the Hugging Face Hub. Filter by task, library (transformers, diffusers, GGUF), language, license, author. Sort by downloads, likes, trending. Sibling files + README. Public HF API, no token. For AI builders, ML research, RAG and VC AI intel.

- **URL**: https://apify.com/logiover/huggingface-hub-intelligence-scraper.md
- **Developed by:** [Logiover](https://apify.com/logiover) (community)
- **Categories:** Automation, Developer tools, Business
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $2.00 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) or as [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Hugging Face Scraper — AI Models, Datasets, Spaces & Papers Discovery

Discover and export **every AI model, dataset, space, and daily research paper** on the **Hugging Face Hub** — the world's largest open AI repository (~1M+ models, ~200k+ datasets, ~500k+ spaces). Filter by **task** (text-generation, embeddings, ASR, TTS, vision, etc.), **library** (`transformers`, `diffusers`, `sentence-transformers`, `GGUF`, `MLX`, `ONNX`), **language**, **base model**, **license**, **author / organization**. Sort by **downloads**, **likes**, **recently-updated** or **trending**.

Built on the **official open Hugging Face Hub API** — no token, no proxy, no scraping. Per item: full metadata, tag taxonomy, sibling files, README content, model card data and **direct Hub URLs**.

Perfect for **AI tool builders**, **ML researchers**, **RAG / fine-tuning teams**, **AI model marketplaces**, **VC analysts tracking AI talent**, and any **competitive-intelligence workflow** in the 2026 AI ecosystem.

---

### 🚀 What does this Hugging Face scraper do?

Five entity types — all in one normalized schema:

| Entity | What you get | Catalog size |
|--------|--------------|--------------|
| `models` | Model weights + config + tokenizer + adapters | ~1M+ |
| `datasets` | Training, evaluation, instruction-tuning, multimodal datasets | ~200k+ |
| `spaces` | Hosted Gradio / Streamlit / Docker demo apps | ~500k+ |
| `papers` | Curated daily research papers with upvotes & author lists | ~30 new / day |
| `collections` | Curated lists by HF users | dynamic |

Every record carries **author**, **downloads**, **likes**, **task / pipeline tag**, **library**, **license**, **language tags**, **base-model lineage**, **dataset lineage**, **last-modified timestamp** and **direct Hub URL** — ready for a leaderboard, monitoring dashboard, or RAG pipeline.

---

### 💡 Use cases

- **AI tool discovery / model marketplaces** — daily refresh of every new model in a niche (e.g. all `text-generation` GGUF models above 1k downloads)
- **RAG / fine-tuning pipelines** — discover every dataset matching `task_categories:question-answering` + `language:en`
- **VC / talent intel** — pull every model by `author=meta-llama` or `author=mistralai` for portfolio monitoring; track which authors are accumulating likes fastest
- **AI release monitoring** — alert when a new model in your watchlist is uploaded (`sort=createdAt` + `modifiedFrom`)
- **Hub indexing** — feed every README into a vector DB for semantic search over the Hub
- **Competitive analysis** — sort by downloads + filter by `library=diffusers` to map the image-generation landscape
- **Daily paper digests** — pull every Hugging Face Daily Papers entry with upvotes, authors and abstracts
- **Model marketplace seeding** — bulk-export every Apache-2.0 model in a category to bootstrap a model store

---

### ⚙️ Input configuration

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `entityType` | `string` | `"models"` | `models` / `datasets` / `spaces` / `papers` / `collections`. |
| `search` | `string` | `""` | Free-text search over name + description. |
| `author` | `string` | `""` | Author / organization filter (`mistralai`, `meta-llama`, `Qwen`, `stabilityai`). |
| `pipelineTag` | `string` | `""` | Models only: task tag (`text-generation`, `automatic-speech-recognition`, `image-to-text`, etc.). |
| `library` | `string` | `""` | Library filter (`transformers`, `diffusers`, `sentence-transformers`, `gguf`, `mlx`, `onnx`). |
| `language` | `string` | `""` | Language tag (`en`, `tr`, `multilingual`). |
| `tags` | `string[]` | `[]` | Extra HF tag filters (`license:apache-2.0`, `base_model:meta-llama/Llama-3-8B`). |
| `sort` | `string` | `"downloads"` | `downloads` / `likes` / `lastModified` / `createdAt` / `trendingScore`. |
| `sortDirection` | `string` | `"-1"` | `-1` (desc) / `1` (asc). |
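| `maxResults` | `integer` | `500` | Hard cap on returned records. `0` = unlimited, although the OpenAPI input schema declares `minimum: 1`, so `0` may be rejected by input validation. |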
| `fetchDetails` | `boolean` | `false` | Make one extra API call per item to fetch siblings list + cardData + full license + gated/disabled flags. |
| `fetchReadme` | `boolean` | `false` | Requires `fetchDetails`. Pulls the raw README.md content (model / dataset / space card). |
| `minDownloads` | `integer` | `0` | Client-side filter — drop items below this download count. |
| `minLikes` | `integer` | `0` | Client-side filter — drop items below this like count. |
| `modifiedFrom` | `string` | `null` | Drop items last-modified before this date (`YYYY-MM-DD`). |
| `papersStartDate` | `string` | `null` | Papers-only: range start. Defaults to last 30 days. |
| `papersEndDate` | `string` | `null` | Papers-only: range end. Defaults to today. |
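
The table above implies a few constraints (enumerated values, `fetchReadme` depending on `fetchDetails`). A minimal sketch of a helper that merges overrides onto the documented defaults and checks those constraints before a run is started; `build_run_input` is a hypothetical name, not part of the Actor or the Apify client:

```python
from typing import Any

# Allowed values copied from the input table above.
ENTITY_TYPES = {"models", "datasets", "spaces", "papers", "collections"}
SORT_FIELDS = {"downloads", "likes", "lastModified", "createdAt", "trendingScore"}
SORT_DIRECTIONS = {"-1", "1"}


def build_run_input(**overrides: Any) -> dict[str, Any]:
    """Merge overrides onto the documented defaults and sanity-check the result."""
    run_input: dict[str, Any] = {
        "entityType": "models",
        "sort": "downloads",
        "sortDirection": "-1",
        "maxResults": 500,
        "fetchDetails": False,
        "fetchReadme": False,
    }
    run_input.update(overrides)

    if run_input["entityType"] not in ENTITY_TYPES:
        raise ValueError(f"unknown entityType: {run_input['entityType']!r}")
    if run_input["sort"] not in SORT_FIELDS:
        raise ValueError(f"unknown sort field: {run_input['sort']!r}")
    if run_input["sortDirection"] not in SORT_DIRECTIONS:
        raise ValueError("sortDirection must be the string '-1' or '1'")
    # The table states that fetchReadme requires fetchDetails.
    if run_input["fetchReadme"] and not run_input["fetchDetails"]:
        raise ValueError("fetchReadme requires fetchDetails to be enabled")
    return run_input
```

The resulting dictionary can be passed as `run_input` to the Python client call shown in the API section.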

---

### 📦 Output fields

#### Common (all entity types)

| Field | Description | Example |
|-------|-------------|---------|
| `entityType` | `model` / `dataset` / `space` / `paper` / `collection` | `"model"` |
| `id` | Full HF ID (`{author}/{name}` or paper ID) | `"Qwen/Qwen3-0.6B"` |
| `internalId` | HF's internal `_id` | `"645d.."` |
| `author` | Author / organization | `"Qwen"` |
| `name` | Short name (the part after `/`) | `"Qwen3-0.6B"` |
| `description` | Description (typically populated for datasets only) | `"..."` |
| `downloads` | Total downloads (all time) | `18506640` |
| `likes` | Community likes | `1248` |
| `trendingScore` | HF trending algorithm score | `42.3` |
| `tags` | Full tag list (verbatim from HF) | `["transformers","safetensors","qwen3","..."]` |
| `languages` | Languages parsed from tags | `["en","fr"]` |
| `datasets` | Linked datasets (model only) | `["dataset:teknium/OpenHermes-2.5"]` |
| `baseModel` | Base-model lineage (model only) | `["base_model:meta-llama/Llama-3-8B"]` |
| `license` | License (parsed from tags or cardData) | `"apache-2.0"` |
| `gated` | Gating status (when `fetchDetails`) | `"manual"` / `false` |
| `private` | Private flag | `false` |
| `disabled` | Disabled flag | `false` |
| `lastModified` | Last-modified timestamp | `"2026-05-06T22:..."` |
| `createdAt` | Creation timestamp | `"2024-..."` |
| `sha` | Commit SHA | `"3866cf9..."` |
| `url` | Direct Hub URL | `"https://huggingface.co/Qwen/Qwen3-0.6B"` |
| `scrapedAt` | UTC scrape timestamp | `"2026-05-18T07:30:00Z"` |
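
Several of these fields (`license`, `baseModel`, `datasets`) are described as parsed from the tag list. A rough sketch of how such parsing could look, mirroring the example values in the table (license without its prefix, lineage entries with theirs); `parse_tags` is a hypothetical helper and the Actor's actual logic may differ:

```python
from typing import Optional


def parse_tags(tags: list[str]) -> dict[str, object]:
    """Split a Hugging Face tag list into license, base-model and dataset lineage."""
    license_id: Optional[str] = None
    base_models: list[str] = []
    datasets: list[str] = []
    for tag in tags:
        if tag.startswith("license:") and license_id is None:
            # Keep only the identifier, e.g. "apache-2.0".
            license_id = tag.split(":", 1)[1]
        elif tag.startswith("base_model:"):
            # Kept verbatim, matching the `baseModel` example above.
            base_models.append(tag)
        elif tag.startswith("dataset:"):
            # Kept verbatim, matching the `datasets` example above.
            datasets.append(tag)
    return {"license": license_id, "baseModel": base_models, "datasets": datasets}
```

This is useful downstream when `fetchDetails` is off and only the raw `tags` array is available.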

#### Model-specific

| Field | Description |
|-------|-------------|
| `pipelineTag` | Primary task tag |
| `libraryName` | `transformers`, `diffusers`, etc. |
| `siblings` | File list (`fetchDetails`) |
| `fileCount` | Number of files |
| `modelIndex` | Evaluation results (`fetchDetails`) |
| `cardData` | Full card data dict (`fetchDetails`) |
| `readme` | Raw README.md (`fetchReadme`) |

#### Space-specific

| Field | Description |
|-------|-------------|
| `sdk` | `gradio` / `streamlit` / `docker` / `static` |
| `spaceRuntime` | Runtime status, hardware, sleep config |

#### Paper-specific

| Field | Description |
|-------|-------------|
| `paperId` | Arxiv-style ID |
| `paperAuthors` | List of author names |
| `paperSummary` | Abstract |
| `paperPublishedAt` | Publication date |
| `paperUpvotes` | HF Daily Papers upvotes |

---

### 🧪 Example inputs

#### 1. Top 200 text-generation models, with details and READMEs

```json
{
  "entityType": "models",
  "pipelineTag": "text-generation",
  "sort": "downloads",
  "sortDirection": "-1",
  "maxResults": 200,
  "fetchDetails": true,
  "fetchReadme": true
}
````

#### 2. Every Llama-3 fine-tune in the last 30 days

```json
{
  "entityType": "models",
  "search": "llama-3",
  "tags": ["base_model:meta-llama/Llama-3-8B"],
  "sort": "lastModified",
  "modifiedFrom": "2026-04-18",
  "maxResults": 1000
}
```

#### 3. Top 500 datasets for instruction tuning (English)

```json
{
  "entityType": "datasets",
  "search": "instruct",
  "language": "en",
  "tags": ["task_categories:text-generation"],
  "sort": "likes",
  "maxResults": 500
}
```

#### 4. Recent Daily Papers (last 14 days, top upvoted)

```json
{
  "entityType": "papers",
  "papersStartDate": "2026-05-04",
  "papersEndDate": "2026-05-18",
  "maxResults": 200
}
```
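
According to the "How it works" section, a papers run issues one `daily_papers` request per day in the range. A small sketch of that expansion, assuming both bounds are inclusive (the Actor's exact boundary handling is not documented); `daily_paper_urls` is a hypothetical helper:

```python
from datetime import date, timedelta

DAILY_PAPERS_URL = "https://huggingface.co/api/daily_papers"


def daily_paper_urls(start: str, end: str) -> list[str]:
    """Return one daily_papers URL per day from start to end (YYYY-MM-DD, inclusive)."""
    first, last = date.fromisoformat(start), date.fromisoformat(end)
    if first > last:
        raise ValueError("start date must not be after end date")
    day_count = (last - first).days + 1
    return [
        f"{DAILY_PAPERS_URL}?date={(first + timedelta(days=offset)).isoformat()}"
        for offset in range(day_count)
    ]
```

With inclusive bounds, the input above (`2026-05-04` to `2026-05-18`) expands to 15 requests.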

#### 5. Top 100 spaces from a specific org, by likes

```json
{
  "entityType": "spaces",
  "author": "huggingface",
  "sort": "likes",
  "maxResults": 100
}
```

#### 6. Apache-2.0 LLM models over 100k downloads, with full details

```json
{
  "entityType": "models",
  "pipelineTag": "text-generation",
  "tags": ["license:apache-2.0"],
  "sort": "downloads",
  "minDownloads": 100000,
  "maxResults": 500,
  "fetchDetails": true
}
```
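
`minDownloads`, `minLikes` and `modifiedFrom` are described as client-side filters, so the same check can be reproduced on exported records. A minimal sketch, assuming the record fields from the output table and ISO 8601 `lastModified` timestamps; `passes_floors` is a hypothetical name:

```python
from typing import Any, Optional


def passes_floors(
    record: dict[str, Any],
    min_downloads: int = 0,
    min_likes: int = 0,
    modified_from: Optional[str] = None,
) -> bool:
    """Return True when a record clears the download, like and date floors."""
    if (record.get("downloads") or 0) < min_downloads:
        return False
    if (record.get("likes") or 0) < min_likes:
        return False
    if modified_from:
        last_modified = record.get("lastModified")
        if not last_modified:
            return False
        # ISO timestamps sort lexicographically, so comparing the
        # YYYY-MM-DD prefix is enough for a date-level cutoff.
        if last_modified[:10] < modified_from:
            return False
    return True
```

Because the floors are applied after listing, a run like the one above may still page through many items below the threshold; sorting by `downloads` descending keeps the qualifying items at the front.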

#### 7. Turkish-language models

```json
{
  "entityType": "models",
  "language": "tr",
  "sort": "downloads",
  "maxResults": 200
}
```

***

### 🧠 How it works

1. **List endpoint** → `GET https://huggingface.co/api/{models|datasets|spaces}?search=&author=&pipeline_tag=&library=&language=&filter=&sort=&direction=&limit=100&cursor=...`
2. **Cursor pagination**: HF returns `Link: <url>; rel="next"` headers; the Actor extracts the `cursor` param and walks forward until the result cap is reached or the listing is exhausted.
3. **Detail enrichment** → `GET https://huggingface.co/api/{type}/{id}` returns sibling file list, cardData, modelIndex, spaceRuntime, gated/disabled flags.
4. **README fetch** → `GET https://huggingface.co/{id}/raw/main/README.md` (or `/datasets/{id}/...` for datasets).
5. **Daily Papers** → `GET https://huggingface.co/api/daily_papers?date=YYYY-MM-DD` per day in the requested range; merged into a single dataset.
6. **Normalization** — every entity is mapped to the same flat record shape so cross-entity analytics are trivial.
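
The cursor-pagination step above can be sketched as a small parser for the `Link` response header. This is an illustration built on the standard library only, not the Actor's actual code; `next_cursor` is a hypothetical name:

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Matches the target URL of the rel="next" entry in a Link header.
_NEXT_LINK = re.compile(r'<([^>]+)>\s*;\s*rel="next"')


def next_cursor(link_header: Optional[str]) -> Optional[str]:
    """Extract the `cursor` query parameter from a Link header, or None on the last page."""
    if not link_header:
        return None
    match = _NEXT_LINK.search(link_header)
    if not match:
        return None
    query = parse_qs(urlparse(match.group(1)).query)
    values = query.get("cursor")
    return values[0] if values else None
```

A paging loop would call this on each response and stop when it returns `None` or when `maxResults` is reached.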

No authentication. The Hugging Face Hub API is intentionally open.

***

### 🛑 Limits & notes

- **Hub size is enormous**. Listing all models (~1M) at the default page size takes ~10k API calls. Use filters aggressively.
- **`gated:auto` / `gated:manual` items** return metadata but file downloads from `siblings` require an HF token (out of scope here).
- **`private` items** are not returned by the public API.
- **HF rate limits**: ~600 req/min/IP for anonymous calls in normal conditions. The Actor uses retry with backoff.
- **Daily Papers archive** goes back to ~2023-10. Older dates return empty arrays.
- **`fetchReadme`** can be heavy (each README is 5–100 KB). For large runs, disable it and pull READMEs only for the top items.
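
The figures above allow a back-of-the-envelope estimate of run size. A rough sketch, assuming a page size of 100, one extra request per item for `fetchDetails`, one more for `fetchReadme`, and the ~600 req/min anonymous limit; retries and backoff are ignored, and both function names are hypothetical:

```python
import math


def estimate_requests(
    items: int,
    page_size: int = 100,
    fetch_details: bool = False,
    fetch_readme: bool = False,
) -> int:
    """Estimate the number of HTTP requests needed to list and enrich `items` records."""
    list_calls = math.ceil(items / page_size)
    # fetchReadme only applies when fetchDetails is enabled.
    per_item = int(fetch_details) + int(fetch_details and fetch_readme)
    return list_calls + items * per_item


def estimate_minutes(requests: int, requests_per_minute: int = 600) -> float:
    """Lower bound on wall-clock minutes at a fixed request rate."""
    return requests / requests_per_minute
```

Under these assumptions, listing 1,000,000 models needs 10,000 list calls (about 17 minutes at 600 req/min), while the same run with `fetchDetails` needs roughly 1,010,000 requests (about 28 hours).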

***

### 💰 Pricing

Monetized via **pay-per-event** on Apify: you pay per item saved. The Hugging Face API itself is free.

***

### ❓ FAQ

**Does this download the actual model weights?**
No — only metadata. The `url` and `siblings` fields give you direct download URLs (`https://huggingface.co/{id}/resolve/main/{file}`) which you can fetch downstream.
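
A minimal sketch of building those download URLs from a model record. It assumes each `siblings` entry is either a plain filename or an object with an `rfilename` key (the shape the Hub API uses); the Actor's exact output shape is not documented here, and `sibling_download_urls` is a hypothetical helper:

```python
from typing import Any
from urllib.parse import quote


def sibling_download_urls(record: dict[str, Any], revision: str = "main") -> list[str]:
    """Build https://huggingface.co/{id}/resolve/{revision}/{file} URLs for a model record."""
    base = f"https://huggingface.co/{record['id']}/resolve/{revision}"
    urls: list[str] = []
    for sibling in record.get("siblings") or []:
        # Accept both {"rfilename": "..."} objects and bare filename strings.
        filename = sibling.get("rfilename") if isinstance(sibling, dict) else sibling
        if filename:
            # quote() keeps "/" so files in subfolders resolve correctly.
            urls.append(f"{base}/{quote(filename)}")
    return urls
```

The sketch covers the model URL pattern given above; gated repositories still require an HF token to download.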

**Can I get model evaluation benchmarks (MMLU, ARC, etc.)?**
Yes — enable `fetchDetails`; `modelIndex` returns the structured evaluation results when the author has published them.

**Does it cover Hugging Face Inference Endpoints / Inference API status?**
No. Those are gated/paid features. This Actor provides read-only Hub catalog data.

**Can I scrape user / organization profiles?**
Use `author=<name>` as a filter — every model/dataset/space from that org is enumerated. For dedicated user-page data, open an issue.

**How is this different from the official `huggingface_hub` Python library?**
The library is a thin client for the same endpoints. This Actor adds: cross-entity normalization, pagination guardrails, README/sibling enrichment, dataset/collection mode, optional download/like floors, and Apify-native output (CSV/Excel/JSONL export, scheduling, webhook integrations).

**Will it work for private/enterprise models?**
No. Token-based access is out of scope; the Actor targets the public catalog only.

***

### 🔗 Related Actors

- `logiover/github-repository-scraper` — combine with HF authors to find their GitHub orgs
- `logiover/substack-newsletter-scraper` — track which AI newsletter authors also publish on HF
- `logiover/apple-podcasts-episode-scraper` — find AI podcasts and join with HF model author names
- `logiover/sitemap-to-url-crawler` — crawl an HF author's portfolio website for full attribution

***

### 🆘 Support

Need user profiles, organization members, or HF Inference Endpoint status? Open an issue on the Actor's Apify page.

***

### Changelog

- **2026-05-20** — Maintenance pass: reviewed the input schema and default values for a smooth one-click start, and rebuilt the Actor on the latest base image.

*Last reviewed: 2026-05-20.*

# Actor input Schema

## `entityType` (type: `string`):

Which entity to enumerate from the Hugging Face Hub. 'models' = ML models (~1M+), 'datasets' = training/evaluation datasets (~200k+), 'spaces' = hosted demo apps, 'papers' = daily curated research papers, 'collections' = curated lists of models/datasets/spaces.

## `search` (type: `string`):

Free-text search over name + description. Examples: 'llama', 'whisper', 'mistral', 'phi', 'instruct'. Leave empty to enumerate all entities matching filters.

## `author` (type: `string`):

Restrict to a single author or organization (e.g. 'mistralai', 'meta-llama', 'openai-community', 'stabilityai'). Substring match.

## `pipelineTag` (type: `string`):

Filter models by primary task. Common values: 'text-generation', 'text-classification', 'token-classification', 'feature-extraction', 'sentence-similarity', 'fill-mask', 'translation', 'summarization', 'question-answering', 'image-to-text', 'text-to-image', 'image-classification', 'object-detection', 'audio-classification', 'automatic-speech-recognition' (ASR), 'text-to-speech' (TTS), 'reinforcement-learning'.

## `library` (type: `string`):

Filter by library. Common: 'transformers', 'diffusers', 'sentence-transformers', 'gguf', 'mlx', 'onnx', 'safetensors', 'pytorch', 'jax', 'tensorflow', 'datasets'.

## `language` (type: `string`):

Filter by primary language. ISO 639-1 codes ('en', 'fr', 'de', 'tr', 'zh', 'ja', 'es') or special: 'multilingual'.

## `tags` (type: `array`):

Restrict to items whose tag list contains all of these. Examples: `['safetensors','region:us']`, `['license:apache-2.0','llama']`, `['gated:false']`. Hugging Face supports complex tag filters at the API level.

## `sort` (type: `string`):

Field to sort by. 'downloads' = most downloaded, 'likes' = community favorites, 'lastModified' = recently updated, 'createdAt' = newest creations, 'trendingScore' = HF's trending algorithm.

## `sortDirection` (type: `string`):

Sort direction: '-1' = descending (highest first), '1' = ascending.

## `maxResults` (type: `integer`):

Hard cap on records returned. Set to 0 for unlimited (auto-paginates). Be aware: the full models catalog is ~1M+ items.

## `fetchDetails` (type: `boolean`):

When enabled, for every model / dataset / space / paper, the actor makes an additional call to `/api/{type}/{id}` to fetch richer fields: full README content, sibling file list, model card data, dataset card data, config files, gated status, license, citation. Adds 1 HTTP request per item.

## `fetchReadme` (type: `boolean`):

When enabled (requires 'Fetch Full Details'), the actor also pulls the raw model/dataset card (README.md) for each item. Useful for AI training datasets, model documentation indexing, and RAG over the Hub.

## `minDownloads` (type: `integer`):

Drop entities with fewer than this many downloads. Useful for filtering out abandoned or experimental items.

## `minLikes` (type: `integer`):

Drop entities with fewer than this many likes.

## `modifiedFrom` (type: `string`):

Drop items last-modified before this date (YYYY-MM-DD). Useful for tracking what's new this week / month.

## `papersStartDate` (type: `string`):

For entityType='papers' only. Pull papers from the daily-papers archive starting at this date (YYYY-MM-DD). Defaults to last 30 days.

## `papersEndDate` (type: `string`):

For entityType='papers' only. Pull papers up to this date (YYYY-MM-DD). Defaults to today.

## Actor input object example

```json
{
  "entityType": "models",
  "search": "",
  "author": "",
  "pipelineTag": "",
  "library": "",
  "language": "",
  "tags": [],
  "sort": "downloads",
  "sortDirection": "-1",
  "maxResults": 500,
  "fetchDetails": false,
  "fetchReadme": false,
  "minDownloads": 0,
  "minLikes": 0,
  "modifiedFrom": null,
  "papersStartDate": null,
  "papersEndDate": null
}
```

# Actor output Schema

## `entityType` (type: `string`):

model | dataset | space | paper | collection

## `id` (type: `string`):

HF entity ID

## `author` (type: `string`):

Author/org

## `downloads` (type: `string`):

Total downloads

## `likes` (type: `string`):

Community likes

## `pipelineTag` (type: `string`):

Pipeline tag

## `libraryName` (type: `string`):

Library

## `license` (type: `string`):

License

## `lastModified` (type: `string`):

Last modified

## `createdAt` (type: `string`):

Creation date

## `url` (type: `string`):

Hub URL

## `scrapedAt` (type: `string`):

Scrape timestamp

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {};

// Run the Actor and wait for it to finish
const run = await client.actor("logiover/huggingface-hub-intelligence-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {}

# Run the Actor and wait for it to finish
run = client.actor("logiover/huggingface-hub-intelligence-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
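
Where the client library is not available, the `run-sync-get-dataset-items` endpoint listed in the OpenAPI specification can be called directly. A minimal standard-library sketch that only builds the request; the path and the `token` query parameter come from that specification, while `build_run_sync_request` is a hypothetical helper name:

```python
import json
from typing import Any
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "logiover~huggingface-hub-intelligence-scraper"


def build_run_sync_request(token: str, run_input: dict[str, Any]) -> Request:
    """Prepare a POST request that runs the Actor and returns its dataset items."""
    query = urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items?{query}"
    return Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request; the call blocks until the run finishes, so it suits small `maxResults` values better than large exports.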

## CLI example

```bash
echo '{}' |
apify call logiover/huggingface-hub-intelligence-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=logiover/huggingface-hub-intelligence-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Hugging Face Scraper — AI Models, Datasets, Spaces & Papers",
        "description": "Export every AI model, dataset, space and daily paper from the Hugging Face Hub. Filter by task, library (transformers, diffusers, GGUF), language, license, author. Sort by downloads, likes, trending. Sibling files + README. Public HF API, no token. For AI builders, ML research, RAG and VC AI intel.",
        "version": "1.0",
        "x-build-id": "SSfe7zG0uNfXGOkCc"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/logiover~huggingface-hub-intelligence-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-logiover-huggingface-hub-intelligence-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/logiover~huggingface-hub-intelligence-scraper/runs": {
            "post": {
                "operationId": "runs-sync-logiover-huggingface-hub-intelligence-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/logiover~huggingface-hub-intelligence-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-logiover-huggingface-hub-intelligence-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "entityType": {
                        "title": "Entity Type",
                        "enum": [
                            "models",
                            "datasets",
                            "spaces",
                            "papers",
                            "collections"
                        ],
                        "type": "string",
                        "description": "Which entity to enumerate from the Hugging Face Hub. 'models' = ML models (~1M+), 'datasets' = training/evaluation datasets (~200k+), 'spaces' = hosted demo apps, 'papers' = daily curated research papers, 'collections' = curated lists of models/datasets/spaces.",
                        "default": "models"
                    },
                    "search": {
                        "title": "Search Query",
                        "type": "string",
                        "description": "Free-text search over name + description. Examples: 'llama', 'whisper', 'mistral', 'phi', 'instruct'. Leave empty to enumerate all entities matching filters.",
                        "default": ""
                    },
                    "author": {
                        "title": "Author / Organization Filter",
                        "type": "string",
                        "description": "Restrict to a single author or organization (e.g. 'mistralai', 'meta-llama', 'openai-community', 'stabilityai'). Substring match.",
                        "default": ""
                    },
                    "pipelineTag": {
                        "title": "Task / Pipeline Tag (models only)",
                        "type": "string",
                        "description": "Filter models by primary task. Common values: 'text-generation', 'text-classification', 'token-classification', 'feature-extraction', 'sentence-similarity', 'fill-mask', 'translation', 'summarization', 'question-answering', 'image-to-text', 'text-to-image', 'image-classification', 'object-detection', 'audio-classification', 'automatic-speech-recognition' (ASR), 'text-to-speech' (TTS), 'reinforcement-learning'.",
                        "default": ""
                    },
                    "library": {
                        "title": "Library Filter (models / datasets)",
                        "type": "string",
                        "description": "Filter by library. Common: 'transformers', 'diffusers', 'sentence-transformers', 'gguf', 'mlx', 'onnx', 'safetensors', 'pytorch', 'jax', 'tensorflow', 'datasets'.",
                        "default": ""
                    },
                    "language": {
                        "title": "Language Tag (models / datasets)",
                        "type": "string",
                        "description": "Filter by primary language. ISO 639-1 codes ('en', 'fr', 'de', 'tr', 'zh', 'ja', 'es') or special: 'multilingual'.",
                        "default": ""
                    },
                    "tags": {
                        "title": "Additional Tags Filter",
                        "type": "array",
                        "description": "Restrict to items whose tag list contains all of these. Examples: ['safetensors','region:us'], ['license:apache-2.0','llama'], ['gated:false']. Hugging Face supports complex tag filters at the API level.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "sort": {
                        "title": "Sort By",
                        "enum": [
                            "downloads",
                            "likes",
                            "lastModified",
                            "createdAt",
                            "trendingScore"
                        ],
                        "type": "string",
                        "description": "Field to sort by. 'downloads' = most downloaded, 'likes' = community favorites, 'lastModified' = recently updated, 'createdAt' = newest creations, 'trendingScore' = HF's trending algorithm.",
                        "default": "downloads"
                    },
                    "sortDirection": {
                        "title": "Sort Direction",
                        "enum": [
                            "-1",
                            "1"
                        ],
                        "type": "string",
                        "description": "Sort direction: '-1' = descending (highest first), '1' = ascending.",
                        "default": "-1"
                    },
                    "maxResults": {
                        "title": "Maximum Results",
                        "minimum": 1,
                        "maximum": 1000000,
                        "type": "integer",
                        "description": "Hard cap on records returned. Set to 0 for unlimited (auto-paginates). Be aware: the full models catalog is ~1M+ items.",
                        "default": 500
                    },
                    "fetchDetails": {
                        "title": "Fetch Full Details Per Item",
                        "type": "boolean",
                        "description": "When enabled, for every model / dataset / space / paper, the Actor makes an additional call to `/api/{type}/{id}` to fetch richer fields: full README content, sibling file list, model card data, dataset card data, config files, gated status, license, citation. Adds 1 HTTP request per item.",
                        "default": false
                    },
                    "fetchReadme": {
                        "title": "Fetch README Content",
                        "type": "boolean",
                        "description": "When enabled (requires 'Fetch Full Details'), the Actor also pulls the raw model/dataset card (README.md) for each item. Useful for AI training datasets, model documentation indexing, and RAG over the Hub.",
                        "default": false
                    },
                    "minDownloads": {
                        "title": "Minimum Downloads (client-side filter)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Drop entities with fewer than this many downloads. Useful for filtering out abandoned or experimental items.",
                        "default": 0
                    },
                    "minLikes": {
                        "title": "Minimum Likes (client-side filter)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Drop entities with fewer than this many likes.",
                        "default": 0
                    },
                    "modifiedFrom": {
                        "title": "Modified From (ISO date)",
                        "type": "string",
                        "description": "Drop items last-modified before this date (YYYY-MM-DD). Useful for tracking what's new this week / month.",
                        "default": null
                    },
                    "papersStartDate": {
                        "title": "Daily Papers — Start Date",
                        "type": "string",
                        "description": "For entityType='papers' only. Pull papers from the daily-papers archive starting at this date (YYYY-MM-DD). Defaults to last 30 days.",
                        "default": null
                    },
                    "papersEndDate": {
                        "title": "Daily Papers — End Date",
                        "type": "string",
                        "description": "For entityType='papers' only. Pull papers up to this date (YYYY-MM-DD). Defaults to today.",
                        "default": null
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
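
The input schema above declares several constraints that are easy to get wrong when building the input by hand: `sortDirection` is a string enum (`"-1"` or `"1"`), not an integer; `maxResults` is bounded to 1..1,000,000 by the declared `minimum` and `maximum` (the description mentions `0`, but this sketch follows the declared bounds); and `fetchReadme` is described as requiring `fetchDetails`. The following is a minimal, local pre-flight check, not part of the Actor itself. The function name `validate_input` and the error wording are hypothetical, and the strict `YYYY-MM-DD` parsing is an assumption based on the field descriptions.

```python
from datetime import datetime

# Allowed values copied from the input schema above.
SORT_FIELDS = ("downloads", "likes", "lastModified", "createdAt", "trendingScore")
SORT_DIRECTIONS = ("-1", "1")  # strings in the schema, not integers
DATE_FIELDS = ("modifiedFrom", "papersStartDate", "papersEndDate")


def _is_int(value) -> bool:
    """True for real integers; bool is excluded even though it subclasses int."""
    return isinstance(value, int) and not isinstance(value, bool)


def validate_input(run_input: dict) -> list[str]:
    """Hypothetical helper: return a list of problems; empty means the input looks valid."""
    problems = []

    sort = run_input.get("sort", "downloads")
    if sort not in SORT_FIELDS:
        problems.append(f"sort: {sort!r} is not one of {list(SORT_FIELDS)}")

    direction = run_input.get("sortDirection", "-1")
    if direction not in SORT_DIRECTIONS:
        problems.append(f"sortDirection: {direction!r} must be the string '-1' or '1'")

    max_results = run_input.get("maxResults", 500)
    if not _is_int(max_results) or not 1 <= max_results <= 1_000_000:
        problems.append(f"maxResults: {max_results!r} is outside 1..1000000")

    for key in ("minDownloads", "minLikes"):
        value = run_input.get(key, 0)
        if not _is_int(value) or value < 0:
            problems.append(f"{key}: {value!r} must be an integer >= 0")

    # The fetchReadme description states that it requires fetchDetails.
    if run_input.get("fetchReadme", False) and not run_input.get("fetchDetails", False):
        problems.append("fetchReadme: requires fetchDetails to be enabled")

    for key in DATE_FIELDS:
        value = run_input.get(key)
        if value is None:  # null is the schema default for all date fields
            continue
        try:
            datetime.strptime(value, "%Y-%m-%d")
        except (TypeError, ValueError):
            problems.append(f"{key}: {value!r} is not a YYYY-MM-DD date")

    return problems


if __name__ == "__main__":
    candidate = {"sort": "trendingScore", "sortDirection": -1, "fetchReadme": True}
    for problem in validate_input(candidate):
        print(problem)
```

Running such a check before starting a run can catch type mistakes, such as passing `-1` as a number, before any paid events are incurred. The platform may apply its own schema validation as well, so this is a convenience rather than a replacement.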
