HuggingFace Model Scraper - AI/ML Model Data
Pricing: Pay per event
Scrape AI/ML model metadata from the HuggingFace Hub. Extract model names, task types, download counts, likes, libraries, authors, tags, licenses, model sizes, and model card excerpts. Filter by task type, library, author, and search query.
Developer: BowTiedRaccoon (maintained by Community)
Extract AI/ML model metadata from the HuggingFace Hub, which hosts over 1 million public models and serves as a central repository for the AI/ML community. This actor queries the public HuggingFace API to retrieve model names, task types, download counts, popularity metrics (likes), licenses, libraries, and model card excerpts.
What You Can Do
- Browse top models sorted by total downloads, likes, trending score, or last-modified date
- Filter by task type (text-generation, image-classification, sentence-similarity, and 25+ other pipeline tags)
- Filter by ML library (transformers, diffusers, sentence-transformers, GGUF, ONNX, and more)
- Filter by author/organization (meta-llama, google, microsoft, BAAI, Qwen, etc.)
- Search by keyword across model names and descriptions
- Extract model card excerpts — first 500 characters of each model's README
- Get spaces usage — count of HuggingFace Spaces using each model
- Retrieve dataset provenance — datasets referenced in model card metadata
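The three detail-only fields in the list above (model card excerpt, Spaces usage, dataset provenance) can be derived roughly as follows. This is a minimal sketch, not the actor's code. It assumes the model detail response carries a spaces list and a cardData.datasets entry, and that YAML front matter is stripped from the README before the 500-character excerpt is taken (the example output suggests this, but it is an assumption). The function names are hypothetical.

```python
import re
from typing import Any

EXCERPT_CHARS = 500  # excerpt length documented for readme_excerpt

# Assumption: a leading YAML front-matter block (--- ... ---) is dropped before excerpting.
_FRONT_MATTER = re.compile(r"\A---\s*\n.*?\n---\s*\n", re.DOTALL)


def readme_excerpt(readme_text: str, limit: int = EXCERPT_CHARS) -> str:
    """Return the first `limit` characters of a model card, minus YAML front matter."""
    body = _FRONT_MATTER.sub("", readme_text, count=1)
    return body.strip()[:limit]


def enrichment_fields(detail: dict[str, Any], readme_text: str) -> dict[str, Any]:
    """Derive the detail-only fields from a model-detail response (assumed shape)."""
    card = detail.get("cardData") or {}
    datasets = card.get("datasets") or []
    if isinstance(datasets, str):  # a card may list a single dataset as a bare string
        datasets = [datasets]
    return {
        "readme_excerpt": readme_excerpt(readme_text),
        "spaces_count": len(detail.get("spaces") or []),
        "datasets_used": list(datasets),
    }
```

Missing keys fall back to empty values, so a model without a card or Spaces still yields a complete record.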
Use Cases
- AI market intelligence — track which models are gaining downloads and likes
- VC and investment research — monitor model ecosystem trends by organization
- Enterprise model evaluation — shortlist foundation models by task type, license, and popularity
- Competitive analysis — compare model adoption across ML libraries and providers
- Dataset discovery — find which training datasets are most commonly used
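For the dataset-discovery use case, a minimal sketch that tallies the datasets_used array (described under Output Fields) across a batch of scraped records. It assumes records are plain dicts with the documented field names; top_datasets is a hypothetical helper, not part of the actor.

```python
from collections import Counter
from typing import Any, Iterable


def top_datasets(records: Iterable[dict[str, Any]], n: int = 10) -> list[tuple[str, int]]:
    """Count how many scraped models reference each dataset in datasets_used."""
    counts: Counter[str] = Counter()
    for record in records:
        # set() so a model that lists the same dataset twice is counted once
        counts.update(set(record.get("datasets_used") or []))
    return counts.most_common(n)
```

Because the counts come from model card metadata, they reflect what authors declare, not verified training data.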
Input Parameters
| Parameter | Description | Default |
|---|---|---|
| searchQuery | Search across model names and descriptions | None |
| pipelineTag | Filter by task type (text-generation, image-classification, etc.) | All tasks |
| library | Filter by ML framework (transformers, diffusers, gguf, etc.) | All libraries |
| author | Filter by author or organization username | All authors |
| sortBy | Sort by downloads, likes, lastModified, or trending | downloads |
| maxItems | Maximum number of records to return (0 = unlimited) | 10 |
| proxyConfiguration | Optional proxy settings | Disabled |
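One way the filters above could translate into a request against the list endpoint is sketched below. The query-parameter names (search, author, pipeline_tag, filter for library tags, sort, direction, limit) and the mapping of trending to the sort key trendingScore reflect our understanding of the public Hub API, not the actor's documented behavior; build_list_url and the page size of 100 are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlencode

API_URL = "https://huggingface.co/api/models"

# Assumed mapping from the actor's sortBy values to Hub API sort keys.
SORT_KEYS = {
    "downloads": "downloads",
    "likes": "likes",
    "lastModified": "lastModified",
    "trending": "trendingScore",
}


def build_list_url(
    search_query: str | None = None,
    pipeline_tag: str | None = None,
    library: str | None = None,
    author: str | None = None,
    sort_by: str = "downloads",
    page_size: int = 100,  # hypothetical page size, not documented by the actor
) -> str:
    """Translate the actor's input parameters into a list-endpoint URL (sketch)."""
    if sort_by not in SORT_KEYS:
        raise ValueError(f"unsupported sortBy: {sort_by!r}")
    # direction=-1 requests descending order (most downloads/likes first)
    params: dict[str, str | int] = {"sort": SORT_KEYS[sort_by], "direction": -1, "limit": page_size}
    if search_query:
        params["search"] = search_query
    if pipeline_tag:
        params["pipeline_tag"] = pipeline_tag
    if library:
        params["filter"] = library  # libraries appear as tags on the Hub
    if author:
        params["author"] = author
    return f"{API_URL}?{urlencode(params)}"
```

For example, build_list_url(pipeline_tag="text-generation", library="gguf", sort_by="trending") yields a URL requesting trending GGUF text-generation models; unset filters are simply omitted from the query string.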
Output Fields
Each record contains:
| Field | Type | Description |
|---|---|---|
| model_id | string | Full model identifier (e.g., meta-llama/Llama-3.3-70B-Instruct) |
| model_name | string | Short model name without the author prefix |
| pipeline_tag | string | Primary task type (text-generation, sentence-similarity, etc.) |
| downloads_total | integer | Total all-time download count |
| downloads_30d | integer or null | Download count in the last 30 days; null when unavailable |
| likes | integer | Number of likes on HuggingFace |
| library | string | Primary ML library (transformers, diffusers, etc.) |
| author | string | Model author or organization username |
| tags | array | Tags including language, dataset references, and framework tags |
| license | string | License identifier (apache-2.0, mit, llama3.3, etc.) |
| model_size_params | string or null | Parameter count, if encoded in tags (e.g., 7B, 13B, 70B); otherwise null |
| last_modified | string | ISO 8601 timestamp of last update |
| readme_excerpt | string | First 500 characters of the model card README |
| spaces_count | integer | Number of HuggingFace Spaces using this model |
| datasets_used | array | Datasets referenced in model card metadata |
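The two derived fields model_name and model_size_params can be approximated as below. This is a sketch under stated assumptions: a size tag is assumed to look like "7B", "1.5B", "13b" or "350M", and the first such tag wins. Both helper names are hypothetical. Note that, consistent with the table, only tags are inspected, so a size appearing only in the model name (such as the 70B in Llama-3.3-70B-Instruct) is not picked up.

```python
import re
from typing import Optional

# Assumption: a parameter-count tag is a number followed by B (billions) or M (millions).
_SIZE_TAG = re.compile(r"^(\d+(?:\.\d+)?)([bm])$", re.IGNORECASE)


def model_name(model_id: str) -> str:
    """Drop the "author/" prefix; IDs without a prefix are returned unchanged."""
    return model_id.split("/", 1)[-1]


def model_size_params(tags: list[str]) -> Optional[str]:
    """Return the first tag that looks like a parameter count (upper-cased), else None."""
    for tag in tags:
        match = _SIZE_TAG.match(tag.strip())
        if match:
            return f"{match.group(1)}{match.group(2).upper()}"
    return None
```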
Example Output
```json
{
  "model_id": "sentence-transformers/all-MiniLM-L6-v2",
  "model_name": "all-MiniLM-L6-v2",
  "pipeline_tag": "sentence-similarity",
  "downloads_total": 262278076,
  "downloads_30d": null,
  "likes": 4833,
  "library": "sentence-transformers",
  "author": "sentence-transformers",
  "tags": ["sentence-transformers", "pytorch", "onnx", "safetensors", "bert", "en"],
  "license": "apache-2.0",
  "model_size_params": null,
  "last_modified": "2025-03-06T13:37:44.000Z",
  "readme_excerpt": "# all-MiniLM-L6-v2\nThis is a sentence-transformers model...",
  "spaces_count": 100,
  "datasets_used": ["s2orc", "ms_marco", "gooaq", "natural_questions"]
}
```
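For the enterprise-evaluation use case, records like the example output above can be shortlisted by task type, license, and popularity. A minimal sketch: shortlist is a hypothetical helper, and nullable or missing numeric fields are treated as 0 so that records with null values do not break the filter or the sort.

```python
import json
from typing import Any

# Record copied from the example output above, trimmed to the fields used here.
EXAMPLE = json.loads(
    '{"model_id": "sentence-transformers/all-MiniLM-L6-v2", "pipeline_tag": "sentence-similarity",'
    ' "downloads_total": 262278076, "downloads_30d": null, "likes": 4833, "license": "apache-2.0"}'
)


def shortlist(records: list[dict[str, Any]], task: str, licenses: set[str], min_likes: int = 0) -> list[str]:
    """Return model_ids matching a task and a license allow-list, most-liked first."""
    hits = [
        r for r in records
        if r.get("pipeline_tag") == task
        and r.get("license") in licenses
        and (r.get("likes") or 0) >= min_likes
    ]
    hits.sort(key=lambda r: r.get("likes") or 0, reverse=True)
    return [r["model_id"] for r in hits]
```

For instance, shortlist([EXAMPLE], "sentence-similarity", {"apache-2.0", "mit"}) returns ["sentence-transformers/all-MiniLM-L6-v2"].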
Technical Notes
- No authentication required — uses the public HuggingFace Hub API
- No proxy required — the API is publicly accessible without IP restrictions
- Rate limits — HuggingFace rate-limits unauthenticated API traffic, so a courtesy 100 ms delay is applied between detail fetches
- Pagination — handles cursor-based pagination automatically, allowing retrieval of any number of models
- Two-pass enrichment — basic metadata is retrieved from the list endpoint; detailed fields (readme_excerpt, spaces_count, datasets_used) are fetched from the model detail endpoint
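The pagination and maxItems behavior described above can be sketched as a generator that follows next-page links. This assumes the Hub returns the next page's URL in an RFC 8288 Link response header with rel="next" (our assumption about the transport, not something the actor documents). The page fetcher is injected so the sketch stays independent of any HTTP client; iter_models, next_page_url, and Fetch are hypothetical names. Reusing the 100 ms courtesy delay between page requests is our choice; the actor documents it only for detail fetches.

```python
import re
import time
from typing import Any, Callable, Iterator, Optional

# A fetcher returns (parsed JSON list of models, Link response header or None).
Fetch = Callable[[str], tuple[list[dict[str, Any]], Optional[str]]]

_NEXT_LINK = re.compile(r'<([^>]+)>;\s*rel="next"')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Extract the rel="next" URL from a Link header, if present."""
    if not link_header:
        return None
    match = _NEXT_LINK.search(link_header)
    return match.group(1) if match else None


def iter_models(first_url: str, fetch: Fetch, max_items: int = 10, delay_s: float = 0.1) -> Iterator[dict[str, Any]]:
    """Yield models page by page until pages run out or max_items is hit (0 = unlimited)."""
    url: Optional[str] = first_url
    yielded = 0
    while url:
        models, link = fetch(url)
        for model in models:
            yield model
            yielded += 1
            if max_items and yielded >= max_items:
                return  # stop without requesting another page
        url = next_page_url(link)
        if url:
            time.sleep(delay_s)  # courtesy pause before the next page request
```

Stopping as soon as maxItems is reached avoids requesting a page whose contents would be discarded.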
Data Source
HuggingFace Hub API — https://huggingface.co/api/models