Pricing

Pay per event

HuggingFace Model Scraper - AI/ML Model Data

Scrape AI/ML model metadata from the HuggingFace Hub. Extract model names, task types, download counts, likes, libraries, authors, tags, licenses, model sizes, and model card excerpts. Filter by task type, library, author, and search query.

Developer

BowTiedRaccoon

Last modified: a month ago

HuggingFace Model Scraper

Scrape AI/ML model metadata from the HuggingFace Hub — over 1M public models. Returns model name, task type, download counts, likes, library, author, tags, license, parameter size, model-card README excerpt, Spaces count, and referenced datasets. Filter by task, library, author, or free-text search.

HuggingFace Scraper Features

Queries HuggingFace's public API directly. No HTML scraping, no authentication.
Filters by task type, ML library, author, or search query — combine them as needed
Sorts by downloads, likes, trending, or last-modified
Enriches each model with a model-card README excerpt, Spaces count, and dataset references from the cardData YAML front matter
Extracts license identifier (apache-2.0, mit, cc-by-4.0, etc.) and parameter size (7B, 13B, 70B) from tag patterns
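Handles cursor pagination by following the rel="next" URL in the API's Link response header (RFC 8288, which obsoletes RFC 5988), so a 10K-result run pages through the results without manual cursor handling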
No proxies needed. The Hub is public.
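
The license and parameter-size extraction listed above can be sketched roughly as follows. This is a minimal illustration, not the actor's actual code: the `license:` tag prefix, the size regex, and the `split_tags` helper are all assumptions made for the example.

```python
import re

# Assumption: the Hub exposes the license as a tag of the form "license:<id>".
LICENSE_PREFIX = "license:"

# Assumption: a parameter size looks like "7b", "13B", or "1.5b" and stands
# alone or is delimited by non-alphanumeric characters (so "4bit" is ignored).
SIZE_PATTERN = re.compile(r"(?<![A-Za-z0-9.])(\d+(?:\.\d+)?)[bB](?![A-Za-z0-9])")


def split_tags(raw_tags):
    """Return (license_id, size, remaining_tags) for one model's tag list."""
    license_id = None
    size = None
    remaining = []
    for tag in raw_tags:
        if tag.startswith(LICENSE_PREFIX):
            # License tags are pulled out and not kept in the tag list.
            license_id = tag[len(LICENSE_PREFIX):]
            continue
        if size is None:
            match = SIZE_PATTERN.search(tag)
            if match:
                size = match.group(1) + "B"  # normalise to upper-case "B"
        remaining.append(tag)
    return license_id, size, remaining
```

Keeping the size regex strict (digits immediately followed by `b` and a delimiter) trades recall for fewer false positives on tags such as quantization labels.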

Who Uses HuggingFace Model Data?

ML engineers — find the most-downloaded models for a specific task or framework without browsing the Hub manually
AI tooling builders — feed model metadata into agent platforms, model routers, or evaluation harnesses
Researchers — track adoption signals (downloads, likes, Spaces count) across model families over time
Procurement and licensing teams — pull license identifiers across hundreds of models in one pass for compliance review
Market analysts — monitor which authors and organizations are gaining traction on the Hub

How HuggingFace Scraper Works

Pick your filters: task type (text-generation, image-classification), library (transformers, diffusers), author, or search query. Sort by downloads, likes, trending, or last modified.
The scraper hits the HuggingFace /api/models endpoint with your filters and walks every page of results using cursor pagination from the response Link header.
For each model in the list, a follow-up detail fetch pulls Spaces count, cardData datasets, and the README. The README is stripped of YAML front matter and truncated to a 500-character excerpt.
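
The pagination step can be sketched as below. It is a hedged illustration rather than the actor's implementation: `fetch` is a hypothetical callable you supply that returns the decoded model list and the raw Link header value for a URL, and `next_page_url` and `walk_pages` are names invented for this example.

```python
import re

# Matches one entry of a Link header, e.g. <https://...>; rel="next"
_LINK_PART = re.compile(r'<([^>]+)>\s*;\s*rel="?([^";,]+)"?')


def next_page_url(link_header):
    """Return the rel="next" URL from a Link header value, or None."""
    if not link_header:
        return None
    for url, rel in _LINK_PART.findall(link_header):
        if rel == "next":
            return url
    return None


def walk_pages(fetch, first_url, max_items=0):
    """Yield models page by page until the API stops advertising a next page.

    fetch(url) must return (models, link_header).
    max_items=0 means no limit, mirroring the input field of the same name.
    """
    url = first_url
    seen = 0
    while url:
        models, link_header = fetch(url)
        for model in models:
            yield model
            seen += 1
            if max_items and seen >= max_items:
                return
        url = next_page_url(link_header)
```

Passing `fetch` in as a parameter keeps the paging logic independent of any HTTP client, so it can be exercised offline with canned pages.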

Input

Top text-generation models by downloads

{
  "pipelineTag": "text-generation",
  "sortBy": "downloads",
  "maxItems": 50
}

All models from a single author

{
  "author": "meta-llama",
  "sortBy": "downloads",
  "maxItems": 100
}

Free-text search

{
  "searchQuery": "llama",
  "library": "transformers",
  "sortBy": "likes",
  "maxItems": 25
}

Field	Type	Default	Description
`searchQuery`	string	`""`	Free-text search across model names, authors, and descriptions. Empty means browse all.
`pipelineTag`	string	`""`	Filter by primary task type (text-generation, image-classification, automatic-speech-recognition, etc.). Empty means all tasks.
`library`	string	`""`	Filter by ML framework (transformers, diffusers, sentence-transformers, timm, etc.). Empty means all libraries.
`author`	string	`""`	Filter by author or organization (e.g. `meta-llama`, `google`, `microsoft`).
`sortBy`	string	`downloads`	One of `downloads`, `likes`, `lastModified`, `trending`.
`maxItems`	integer	`10`	Maximum models to return. Set to `0` for unlimited — though the Hub has 1M+ public models, so filters are recommended.
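
As a rough sketch of how the defaults in the table combine with a partial input, the snippet below fills in missing fields and checks the two constrained ones. The `normalize_input` helper is hypothetical, and rejecting unknown keys is an assumption of this example, not documented actor behaviour.

```python
# Defaults and allowed sort values are taken from the input table above.
DEFAULTS = {
    "searchQuery": "",
    "pipelineTag": "",
    "library": "",
    "author": "",
    "sortBy": "downloads",
    "maxItems": 10,
}
SORT_OPTIONS = {"downloads", "likes", "lastModified", "trending"}


def normalize_input(raw):
    """Return a complete input dict, raising ValueError on invalid values."""
    unknown = set(raw) - set(DEFAULTS)
    if unknown:
        # Assumption: unknown keys are treated as mistakes in this sketch.
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    cfg = {**DEFAULTS, **raw}
    if cfg["sortBy"] not in SORT_OPTIONS:
        raise ValueError(f"sortBy must be one of {sorted(SORT_OPTIONS)}")
    max_items = cfg["maxItems"]
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 0:
        raise ValueError("maxItems must be a non-negative integer (0 = unlimited)")
    return cfg
```

For example, `normalize_input({"pipelineTag": "text-generation", "maxItems": 50})` keeps the two supplied values and falls back to `sortBy: "downloads"` and empty strings for the other filters.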

HuggingFace Scraper Output Fields

{
  "model_name": "Meta-Llama-3-8B-Instruct",
  "model_id": "meta-llama/Meta-Llama-3-8B-Instruct",
  "pipeline_tag": "text-generation",
  "downloads_total": 4823910,
  "downloads_30d": 612400,
  "likes": 3812,
  "library": "transformers",
  "author": "meta-llama",
  "tags": ["text-generation", "conversational", "llama-3", "en"],
  "license": "llama3",
  "model_size_params": "8B",
  "last_modified": "2026-04-12T10:24:33.000Z",
  "readme_excerpt": "Meta Llama 3 is a family of large language models (LLMs) developed by Meta...",
  "spaces_count": 412,
  "datasets_used": ["meta-llama/Meta-Llama-3-eval"]
}

Field	Type	Description
`model_name`	string	Human-readable model name (without the author prefix)
`model_id`	string	Full model identifier in `author/model-name` format
`pipeline_tag`	string	Primary task type (text-generation, image-classification, etc.)
`downloads_total`	integer	All-time download count
`downloads_30d`	integer	Download count in the last 30 days
`likes`	integer	Number of likes on HuggingFace
`library`	string	Primary ML library (transformers, diffusers, etc.)
`author`	string	Model author or organization username
`tags`	string[]	Tags including language, dataset, and custom labels (license tags are stripped)
`license`	string	License identifier (apache-2.0, mit, cc-by-4.0, etc.)
`model_size_params`	string	Parameter count (7B, 13B, 70B, 175B) when present in tags
`last_modified`	string	ISO 8601 timestamp of the model's last update
`readme_excerpt`	string	First 500 characters of the model card README, YAML front matter stripped
`spaces_count`	integer	Number of HuggingFace Spaces that reference this model
`datasets_used`	string[]	Datasets declared in the model card's YAML front matter
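
The `readme_excerpt` field could be produced along these lines. This is only a sketch under stated assumptions: front matter is taken to be a leading block delimited by `---` lines, the 500-character limit is applied after trimming surrounding whitespace, and `readme_excerpt` is a helper name chosen for this example.

```python
import re

# A leading YAML front matter block: "---" line, optional content, "---" line.
FRONT_MATTER = re.compile(
    r"\A---[ \t]*\r?\n(?:.*?\r?\n)?---[ \t]*(?:\r?\n|\Z)",
    re.DOTALL,
)


def readme_excerpt(readme, limit=500):
    """Strip leading YAML front matter and return the first `limit` characters."""
    body = FRONT_MATTER.sub("", readme, count=1)
    return body.strip()[:limit]
```

A README without front matter passes through unchanged apart from the trim and the truncation.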

FAQ

How do I scrape HuggingFace?

HuggingFace Scraper calls the public Hub API directly, with no API key or login. At the default settings it stays under the unauthenticated rate limit. Set your filters and the actor handles pagination and enrichment.

Does HuggingFace Scraper need proxies?

HuggingFace Scraper runs without proxies. The Hub API is publicly accessible and the actor stays well under the unauthenticated rate limit with a 100ms courtesy delay between detail fetches.

What data does HuggingFace Scraper return?

HuggingFace Scraper returns 15 fields per model: model name, model ID, task, downloads (total and 30-day), likes, library, author, tags, license, parameter size, last-modified timestamp, README excerpt, Spaces count, and dataset references.

Can I filter HuggingFace models by license?

HuggingFace Scraper doesn't filter by license at the API level, but the license field is parsed from each model's tags. Run the scrape with your other filters and post-filter the dataset by license — apache-2.0, mit, or whatever the compliance review allows.
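
Post-filtering by license might look like the sketch below. It assumes the dataset has already been loaded as a list of dicts shaped like the output example, and `filter_by_license` is a hypothetical helper, not part of the actor.

```python
def filter_by_license(records, allowed):
    """Keep records whose `license` field is in `allowed` (case-insensitive).

    Records with a missing or empty license are dropped, which is the
    conservative choice for a compliance review.
    """
    allowed_lower = {name.lower() for name in allowed}
    return [
        record
        for record in records
        if (record.get("license") or "").lower() in allowed_lower
    ]


# Illustrative records only; real rows come from the actor's dataset.
sample = [
    {"model_id": "org/a", "license": "apache-2.0"},
    {"model_id": "org/b", "license": "llama3"},
    {"model_id": "org/c", "license": None},
    {"model_id": "org/d", "license": "MIT"},
]
permissive = filter_by_license(sample, {"apache-2.0", "mit"})
```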

How much does HuggingFace Scraper cost to run?

HuggingFace Scraper uses pay-per-event pricing at the default 1.0 coefficient. You are charged per record saved, so cost scales linearly with the number of models returned: a 500-model run is billed as 500 record events. There is no browser time and no proxy bill.

Need More Features?

Need additional model fields, a GGUF or safetensors filter, or model-card body extraction beyond the 500-character excerpt? File an issue or get in touch.

Why Use HuggingFace Scraper?

Direct API access — pulls structured JSON from the official Hub API, no HTML parsing, no breakage when the site redesigns
Enriched output — model-card README excerpt, Spaces count, and dataset references come from a second API call so each record carries more than just list-view metadata
Filter combinations — task + library + author + sort, all in one input, so you don't have to script the cartesian product yourself
