# Hugging Face Models Scraper - AI/ML Data (`benthepythondev/huggingface-models-scraper`) Actor

Search Hugging Face for AI/ML models or datasets by keyword and get structured data: id, author, task, downloads, likes, library, tags, license and dates. Fast and reliable via the public Hugging Face Hub API. For AI/ML market research, model discovery and trend tracking.

- **URL:** https://apify.com/benthepythondev/huggingface-models-scraper.md
- **Developed by:** [ben](https://apify.com/benthepythondev) (community)
- **Categories:** AI, Developer tools, Business
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded, 0 bookmarks
- **User rating:** No ratings yet

## Pricing

from $2.00 / 1,000 results

This Actor uses pay-per-event pricing: you are not charged for Apify platform usage, only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
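As a rough illustration of per-result pricing, here is a hypothetical upper-bound estimate. It assumes each returned record is billed once at the listed $2.00 per 1,000 results (the listing says "from", so your actual rate may differ), ignores any other events the Actor might charge for, and assumes a run returns at most `len(searchTerms) × maxPerTerm` records. De-duplication or terms with few matches can only lower that count.

```python
def estimate_max_cost_usd(num_terms: int, max_per_term: int, price_per_1000: float = 2.00) -> float:
    """Hypothetical helper: upper-bound cost of one run.

    Assumes every returned record is billed once at `price_per_1000` per 1,000
    results and that a run returns at most num_terms * max_per_term records.
    """
    max_results = num_terms * max_per_term
    return round(max_results * price_per_1000 / 1000, 4)


# Two terms ("llama", "mistral") with maxPerTerm = 50 -> at most 100 results.
print(estimate_max_cost_usd(2, 50))  # 0.2
```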

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) and as [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## 🤗 Hugging Face Models Scraper

Search **Hugging Face** for AI/ML **models** or **datasets** by keyword and get clean, structured data — id, author, task (pipeline tag), downloads, likes, library, tags, license, created/updated dates and URL. Powered by the public Hugging Face Hub API, so it's fast and reliable: no browser, no login, no API key, no blocks.

Built for AI/ML market research, model discovery, trend tracking and building model/dataset catalogs. Export to JSON/CSV/Excel, run on a schedule, call via API, or connect to Make, Zapier or n8n.

### 🔎 What is the Hugging Face Models Scraper?

Give it keywords (e.g. "llama", "whisper") and it returns matching models (or datasets) as structured rows, sorted by downloads, likes, trending or last-modified — optionally filtered by task. Perfect for finding the most popular models in a niche and tracking how they move over time.

#### What data does it extract?

- **Id, author and name**
- **Task** (text-generation, ASR, image-classification, …) and **library**
- **Downloads** (recent + all-time) and **likes**
- **Trending score**
- **Tags** and **license**
- **Gated / private** flags
- **Created** and **last-modified** dates and the **URL**

### ⬇️ Input

| Field | Type | Description |
|-------|------|-------------|
| `searchTerms` | array | Keywords to search, e.g. `llama`. |
| `type` | string | `model` or `dataset`. Default `model`. |
| `sort` | string | `downloads`, `likes`, `lastModified` or `trendingScore`. Default `downloads`. |
| `task` | string | Optional pipeline tag (models only), e.g. `text-generation`. |
| `maxPerTerm` | integer | Max results per term, 1-100. Default `25`. |

#### Example input

```json
{
  "searchTerms": ["llama", "mistral"],
  "type": "model",
  "sort": "downloads",
  "maxPerTerm": 50
}
````

### ⬆️ Output

One record per model (or per dataset when `type` is `dataset`). Example model record:

```json
{
  "id": "meta-llama/Llama-3.1-8B-Instruct",
  "type": "model",
  "author": "meta-llama",
  "name": "Llama-3.1-8B-Instruct",
  "task": "text-generation",
  "library": "transformers",
  "downloads": 3120044,
  "likes": 3815,
  "trending_score": 41,
  "tags": ["transformers", "safetensors", "llama", "conversational"],
  "license": "llama3.1",
  "gated": "manual",
  "last_modified": "2026-05-12T10:21:33.000Z",
  "url": "https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct",
  "query": "llama"
}
```
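To show how these output fields could relate to the Hub's raw metadata, here is a minimal mapping sketch. It assumes the raw JSON uses the public Hub API field names `id`, `author`, `pipeline_tag`, `library_name`, `downloads`, `likes`, `trendingScore`, `tags`, `gated` and `lastModified`, and that the license comes from a `license:<name>` tag (in line with the FAQ's "parsed from the model tags"). `to_record` and `parse_license` are hypothetical helpers, not the Actor's actual code.

```python
from typing import Any, Optional


def parse_license(tags: list[str]) -> Optional[str]:
    """Return the value of the first 'license:<name>' tag, or None if absent."""
    for tag in tags:
        if tag.startswith("license:"):
            return tag.split(":", 1)[1]
    return None


def to_record(raw: dict[str, Any], query: str, kind: str = "model") -> dict[str, Any]:
    """Hypothetical mapping from raw Hub metadata (assumed field names) to the output shape."""
    item_id = raw["id"]
    author, sep, name = item_id.partition("/")
    if not sep:  # legacy ids such as "gpt2" have no "owner/" prefix
        author, name = None, item_id
    tags = raw.get("tags", [])
    base = "https://huggingface.co/" if kind == "model" else "https://huggingface.co/datasets/"
    return {
        "id": item_id,
        "type": kind,
        "author": raw.get("author") or author,
        "name": name,
        "task": raw.get("pipeline_tag"),     # usually None for datasets
        "library": raw.get("library_name"),  # usually None for datasets
        "downloads": raw.get("downloads", 0),
        "likes": raw.get("likes", 0),
        "trending_score": raw.get("trendingScore"),
        "tags": tags,
        "license": parse_license(tags),
        "gated": raw.get("gated", False),
        "last_modified": raw.get("lastModified"),
        "url": base + item_id,
        "query": query,
    }
```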

### 💡 Use cases

- 🤖 **AI/ML research** — find the most-downloaded models for a task.
- 📈 **Trend tracking** — monitor likes/downloads over time.
- 🗂️ **Catalogs** — build a dataset of models for analysis or a dashboard.
- 🔌 **LLM / app pipelines** — feed structured model metadata into your tools.

### ❓ FAQ

**Do I need an API key or login?** No — it uses the public Hugging Face Hub API.

**Models and datasets?** Both — set `type`.

**Can I filter by task?** Yes — set `task` (pipeline tag) for models.

**How is it sorted?** By downloads, likes, trending or last-modified.

**Does it include license info?** Yes — parsed from the model tags.

**How does pricing work?** Pay per result returned (model or dataset). No subscription.

**Is it legal?** It uses the public Hugging Face Hub API. Use responsibly and within their terms.

### ⚙️ How it works

The scraper calls the Hugging Face Hub API directly and returns clean rows: no browser, no login and no API key to manage. That keeps runs fast, cheap and dependable, and it's why the Actor keeps passing its daily health check instead of breaking on an anti-bot wall. You give it keywords and choose a sort and a limit; it then requests the full metadata for each term and de-duplicates results as it goes. The same input shape works whether you want the top 10 models or thousands of rows across many queries: only `searchTerms` and `maxPerTerm` (at most 100 per term) change.
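A minimal sketch of that request-and-dedupe loop. It assumes the Actor queries the public `https://huggingface.co/api/models` (or `/api/datasets`) endpoint with the Hub's `search`, `sort`, `direction`, `limit`, `full` and `pipeline_tag` query parameters; those names come from the public Hub API as we understand it, not from this Actor's source, and `build_search_url`, `fetch_json` and `dedupe` are hypothetical helpers.

```python
import json
from typing import Any, Iterable, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

HUB_API = "https://huggingface.co/api"  # public Hub API base (assumed endpoint)


def build_search_url(term: str, kind: str = "model", sort: str = "downloads",
                     limit: int = 25, task: Optional[str] = None) -> str:
    """Build a Hub search URL for one term, sorted descending (direction=-1)."""
    params = {"search": term, "sort": sort, "direction": -1, "limit": limit, "full": "true"}
    if task and kind == "model":  # the task (pipeline tag) filter only applies to models
        params["pipeline_tag"] = task
    return f"{HUB_API}/{kind}s?{urlencode(params)}"


def fetch_json(url: str) -> list[dict[str, Any]]:
    """GET a Hub search URL and decode the JSON list it returns (network call)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def dedupe(records: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep the first record seen for each id, preserving order."""
    seen: set[str] = set()
    unique = []
    for record in records:
        if record["id"] not in seen:
            seen.add(record["id"])
            unique.append(record)
    return unique
```

Across several terms this could be combined as `dedupe(r for t in ["llama", "mistral"] for r in fetch_json(build_search_url(t)))`; keeping the first occurrence means a model matching two terms is attributed to the earlier term.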

### 👥 Who uses Hugging Face data?

Model and dataset metadata is valuable to ML engineers, researchers, founders and analysts. A researcher finds the strongest baselines for a task; a founder tracks which open models are gaining traction; an analyst builds a leaderboard of downloads and likes; a tool maker feeds the structured data into a recommender or dashboard. Because every record is plain JSON with consistent fields, it drops straight into a spreadsheet, database, BI tool or LLM pipeline with no custom parsing.

### 📤 Export, schedule & integrate

Every run is saved to a dataset you can export to **JSON, CSV, Excel, XML or RSS**, or pull through the **Apify API**. Wire it into **Make, Zapier, n8n, Google Sheets, Slack** or your **own database**, run it on a **schedule** (hourly, daily or weekly) to keep your data fresh, and call it from AI agents through the **Apify MCP server**.
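If you pull items through the API (for example with `iterate_items()` as in the Python example) rather than using the built-in export, the list-valued `tags` field has to be flattened before it fits in a CSV cell. A minimal stdlib sketch; `to_csv`, the column order and the `;` separator are our own choices, not Apify's export format.

```python
import csv
import io
from typing import Any, Iterable

# Column order is an arbitrary choice; fields follow the sample output record.
FIELDS = ["id", "type", "author", "name", "task", "library", "downloads", "likes",
          "trending_score", "license", "gated", "last_modified", "url", "query", "tags"]


def to_csv(items: Iterable[dict[str, Any]]) -> str:
    """Serialize records to CSV text, joining the list-valued `tags` with ';'.

    Missing fields become empty cells; fields not in FIELDS are dropped.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        row = dict(item)
        row["tags"] = ";".join(item.get("tags") or [])
        writer.writerow(row)
    return buf.getvalue()
```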

### 💡 Tips for best results

- Sort by `trendingScore` to catch rising models early.
- Use `task` to focus on one modality (e.g. `automatic-speech-recognition`).
- Schedule recurring runs and diff the output to track download/like growth.
- Combine model + dataset runs to map a whole research area.
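The "diff the output" tip can be sketched as follows: join two runs' records on `id` and compute download and like deltas. `growth_report` is a hypothetical helper that only uses the `id`, `downloads` and `likes` fields from the sample record. If `downloads` is a rolling recent-downloads count (the "recent + all-time" note suggests it may be), deltas can be negative and show momentum rather than cumulative growth.

```python
from typing import Any


def growth_report(previous: list[dict[str, Any]], current: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Compare two runs by id; return deltas sorted by download change (largest first).

    Records present only in `current` count as new (previous values of 0).
    Records that disappeared from `current` are ignored.
    """
    before = {r["id"]: r for r in previous}
    rows = []
    for rec in current:
        old = before.get(rec["id"], {})
        rows.append({
            "id": rec["id"],
            "new": rec["id"] not in before,
            "downloads_delta": rec.get("downloads", 0) - old.get("downloads", 0),
            "likes_delta": rec.get("likes", 0) - old.get("likes", 0),
        })
    return sorted(rows, key=lambda row: row["downloads_delta"], reverse=True)
```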

### ❓ More FAQ

**How fresh is the data?** It is fetched live on each run — schedule runs to keep it current.

**Can I get more results?** Yes. Raise `maxPerTerm` (up to 100 per term) or add more search terms.

**Can I run it automatically?** Yes — use Apify Schedules (cron) for hands-off runs.

**Which export formats?** JSON, CSV, Excel, XML and RSS, plus the Apify API.

**Can AI agents use it?** Yes — it's available via the Apify API and MCP server.

### 🔗 You might also like

- [GitHub Repository Scraper](https://apify.com/benthepythondev/github-repository-scraper) — repos, stars & topics.
- [PyPI Package Scraper](https://apify.com/benthepythondev/pypi-package-scraper) — Python package data.
- [arXiv Papers Scraper](https://apify.com/benthepythondev/arxiv-papers-scraper) — AI/ML research papers.

***

**Keywords:** hugging face scraper, huggingface api, ai models data, ml model metadata, model downloads, model discovery, llm research, ai market research, huggingface datasets, model leaderboard, transformers, ai trends, machine learning data, model catalog

# Actor input schema

## `searchTerms` (type: `array`):

Keywords to search models/datasets, e.g. 'llama', 'whisper'.

## `type` (type: `string`):

Whether to search models or datasets: `model` or `dataset` (default `model`).

## `sort` (type: `string`):

How results are ranked: `downloads`, `likes`, `lastModified` or `trendingScore` (default `downloads`).

## `task` (type: `string`):

Pipeline tag, e.g. 'text-generation', 'automatic-speech-recognition'.

## `maxPerTerm` (type: `integer`):

Max results per search term, from 1 to 100 (default `25`).

## Actor input object example

```json
{
  "searchTerms": [
    "llama"
  ],
  "type": "model",
  "sort": "downloads",
  "maxPerTerm": 25
}
```
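To catch input mistakes before starting a paid run, here is a sketch that applies the defaults and limits listed in this document's input schema (`type` in `model`/`dataset`, default `model`; `sort` in `downloads`/`likes`/`lastModified`/`trendingScore`, default `downloads`; `maxPerTerm` 1-100, default 25). We additionally assume at least one non-empty search term is needed. `normalize_input` is hypothetical, and the Actor's own validation may differ, for example by clamping rather than rejecting out-of-range values.

```python
from typing import Any

TYPES = {"model", "dataset"}
SORTS = {"downloads", "likes", "lastModified", "trendingScore"}


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Fill schema defaults and raise ValueError for values the schema does not allow."""
    data = {"type": "model", "sort": "downloads", "maxPerTerm": 25, **raw}
    terms = data.get("searchTerms")
    # Assumption: a run needs at least one non-empty keyword.
    if not terms or not all(isinstance(t, str) and t.strip() for t in terms):
        raise ValueError("searchTerms must be a non-empty list of non-empty strings")
    if data["type"] not in TYPES:
        raise ValueError(f"type must be one of {sorted(TYPES)}")
    if data["sort"] not in SORTS:
        raise ValueError(f"sort must be one of {sorted(SORTS)}")
    if not isinstance(data["maxPerTerm"], int) or not 1 <= data["maxPerTerm"] <= 100:
        raise ValueError("maxPerTerm must be an integer between 1 and 100")
    return data
```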

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "searchTerms": [
        "llama"
    ],
    "type": "model",
    "sort": "downloads"
};

// Run the Actor and wait for it to finish
const run = await client.actor("benthepythondev/huggingface-models-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "searchTerms": ["llama"],
    "type": "model",
    "sort": "downloads",
}

# Run the Actor and wait for it to finish
run = client.actor("benthepythondev/huggingface-models-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
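The tip "Combine model + dataset runs" can be scripted on top of the Python example above. This hypothetical function uses only the `actor(...).call(run_input=...)` and `dataset(...).iterate_items()` calls shown there, and treats a missing run object or a `status` other than `"SUCCEEDED"` as a failure. Accepting the client as a parameter keeps it easy to test with a stand-in object.

```python
from typing import Any

ACTOR_ID = "benthepythondev/huggingface-models-scraper"


def collect_models_and_datasets(client: Any, terms: list[str], max_per_term: int = 25) -> dict[str, list[dict]]:
    """Run the Actor once per type; return {'model': [...], 'dataset': [...]}.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    results: dict[str, list[dict]] = {}
    for kind in ("model", "dataset"):
        run_input = {"searchTerms": terms, "type": kind, "maxPerTerm": max_per_term}
        run = client.actor(ACTOR_ID).call(run_input=run_input)
        # call() may return None; a finished run can also have failed or timed out.
        if run is None or run.get("status") != "SUCCEEDED":
            raise RuntimeError(f"Actor run for type={kind!r} did not succeed")
        results[kind] = list(client.dataset(run["defaultDatasetId"]).iterate_items())
    return results
```

With a real client: `collect_models_and_datasets(ApifyClient("<YOUR_API_TOKEN>"), ["whisper"])`.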

## CLI example

```bash
echo '{
  "searchTerms": [
    "llama"
  ],
  "type": "model",
  "sort": "downloads"
}' |
apify call benthepythondev/huggingface-models-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=benthepythondev/huggingface-models-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Hugging Face Models Scraper - AI/ML Data",
        "description": "Search Hugging Face for AI/ML models or datasets by keyword and get structured data: id, author, task, downloads, likes, library, tags, license and dates. Fast and reliable via the public Hugging Face Hub API. For AI/ML market research, model discovery and trend tracking.",
        "version": "1.0",
        "x-build-id": "2Q7A50LT0vNCWL1wR"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/benthepythondev~huggingface-models-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-benthepythondev-huggingface-models-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/benthepythondev~huggingface-models-scraper/runs": {
            "post": {
                "operationId": "runs-sync-benthepythondev-huggingface-models-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/benthepythondev~huggingface-models-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-benthepythondev-huggingface-models-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "searchTerms": {
                        "title": "Search terms",
                        "type": "array",
                        "description": "Keywords to search models/datasets, e.g. 'llama', 'whisper'.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "type": {
                        "title": "Type",
                        "enum": [
                            "model",
                            "dataset"
                        ],
                        "type": "string",
                        "description": "Models or datasets.",
                        "default": "model"
                    },
                    "sort": {
                        "title": "Sort by",
                        "enum": [
                            "downloads",
                            "likes",
                            "lastModified",
                            "trendingScore"
                        ],
                        "type": "string",
                        "description": "Ranking.",
                        "default": "downloads"
                    },
                    "task": {
                        "title": "Task filter (optional)",
                        "type": "string",
                        "description": "Pipeline tag, e.g. 'text-generation', 'automatic-speech-recognition'."
                    },
                    "maxPerTerm": {
                        "title": "Max per term",
                        "minimum": 1,
                        "maximum": 100,
                        "type": "integer",
                        "description": "Max results per search term.",
                        "default": 25
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
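For projects without an Apify client, the `run-sync-get-dataset-items` path in the spec above can be called with plain HTTP. This stdlib sketch sends the Actor input as the JSON request body and the token as the `token` query parameter, as the spec describes; `build_request` and `run_sync` are hypothetical helper names and the 300-second client timeout is an arbitrary choice. For long runs, the asynchronous `/runs` path may be a better fit.

```python
import json
from typing import Any
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://api.apify.com/v2"
ACTOR = "benthepythondev~huggingface-models-scraper"  # "~" replaces "/" in API paths


def build_request(token: str, run_input: dict[str, Any]) -> Request:
    """Prepare (but do not send) a POST to run-sync-get-dataset-items."""
    url = f"{API}/acts/{ACTOR}/run-sync-get-dataset-items?{urlencode({'token': token})}"
    body = json.dumps(run_input).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"}, method="POST")


def run_sync(token: str, run_input: dict[str, Any], timeout: float = 300) -> list[dict[str, Any]]:
    """Run the Actor, wait for it to finish, and return its dataset items."""
    with urlopen(build_request(token, run_input), timeout=timeout) as resp:
        return json.load(resp)
```

Usage: `items = run_sync("<YOUR_API_TOKEN>", {"searchTerms": ["llama"], "type": "model", "sort": "downloads"})`.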
