Pricing

from $3.00 / 1,000 results

HuggingFace Hub Scraper

Scrape the Hugging Face Hub: search and fetch models, datasets, and spaces with full metadata, including downloads, likes, license, pipeline tag, library, tags, and files. Pure HTTP, no authentication required.

Developer

Crawler Bros

Last modified

2 months ago

What this actor does

7 modes: search, byModel, byDataset, bySpace, byUser, trending, byUrl
Three entity catalogs: models, datasets, spaces (the search and trending modes pick the catalog via entityType)
Filters: pipeline tag, library, license, language, author/org, min downloads, min likes
Server-side sort: trending score, downloads, likes, last modified, created at
URL auto-detection: paste any huggingface.co/<repo> or /datasets/<id> or /spaces/<id> or /users/<u> URL — the actor figures out the kind
Optional ?full=true: include sibling files, cardData, config metadata
Empty fields are omitted — every record only contains populated fields
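
The URL auto-detection described above could be approximated as follows. This is a minimal sketch, not the actor's actual code: the function name `detect_kind` is hypothetical, and it assumes the URL points at the repo root (no `/tree/main` or similar suffix).

```python
from urllib.parse import urlparse

# Path prefixes that mark a non-model entity (taken from the list above).
PREFIX_TO_KIND = {"datasets": "dataset", "spaces": "space", "users": "user"}


def detect_kind(url: str) -> tuple[str, str]:
    """Return (kind, identifier) for a Hub URL.

    Hypothetical helper: anything without a known prefix is treated as a
    model, so a bare single-segment path is read as a legacy model ID.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if not parts:
        raise ValueError(f"no repo or user in URL: {url}")
    kind = PREFIX_TO_KIND.get(parts[0])
    if kind is None:
        # No prefix: owner/name, or a single-segment legacy model ID.
        return "model", "/".join(parts[:2])
    rest = parts[1:]
    if not rest:
        raise ValueError(f"missing identifier after /{parts[0]}/: {url}")
    if kind == "user":
        return kind, rest[0]
    return kind, "/".join(rest[:2])


print(detect_kind("https://huggingface.co/spaces/lmarena-ai/chatbot-arena"))
```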

Output

The actor emits one flat record per repo or user. The fields below may appear; empty fields are omitted:

Common

recordType — model / dataset / space / user
repoId — full Hub identifier (e.g. google-bert/bert-base-uncased)
owner — author / organization slug
sha, createdAt, lastModified
downloads, likes, trendingScore
license, tags, languages
scrapedAt

Model-only

modelName, modelType, architectures[]
pipelineTag, libraryName
trainedOnDatasets[], arxivIds[]
maskToken, fileCount, files[]
modelUrl

Dataset-only

datasetName, description
taskCategories[], taskIds[]
modalities[], formats[], sizeCategories[]
paperswithcodeId, fileCount, files[]
datasetUrl

Space-only

spaceName, sdk, runtimeStage
title, emoji, host, subdomain
fileCount, files[]
spaceUrl

User-only

username, fullName, avatarUrl, isPro
numModels, numDatasets, numSpaces
numFollowers, numFollowing, numLikes, numUpvotes
numPapers, numDiscussions
orgs[], profileUrl
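
Because one dataset can mix record types and any field may be absent, consumer code is safest when it branches on recordType and reads fields with `.get()`. The sketch below illustrates that pattern; the helper names and the sample records are illustrative assumptions, and only the field names come from the lists above.

```python
from collections import defaultdict

# Which URL field each record type carries, per the field lists above.
URL_FIELD = {
    "model": "modelUrl",
    "dataset": "datasetUrl",
    "space": "spaceUrl",
    "user": "profileUrl",
}


def group_by_type(records):
    """Bucket records by recordType; records without one go under 'unknown'."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec.get("recordType", "unknown")].append(rec)
    return dict(groups)


def record_url(rec):
    """Return the type-specific URL, or None if the field was omitted."""
    field = URL_FIELD.get(rec.get("recordType"))
    return rec.get(field) if field else None


# Illustrative records, not real actor output.
records = [
    {"recordType": "model", "repoId": "example-org/model-a",
     "modelUrl": "https://huggingface.co/example-org/model-a"},
    {"recordType": "dataset", "repoId": "example-org/dataset-b"},  # URL omitted
    {"recordType": "user", "username": "example-user",
     "profileUrl": "https://huggingface.co/example-user"},
]

for rec in records:
    print(rec["recordType"], record_url(rec))
```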

Input

Field	Type	Default	Description
`mode`	enum	`search`	One of the 7 modes
`entityType`	enum	`models`	`models` / `datasets` / `spaces` (mode=search/trending)
`searchQuery`	string	`bert`	Free-text query
`repoIds`	array	–	Repo IDs or URLs (mode=byModel/byDataset/bySpace)
`username`	string	–	Username (mode=byUser)
`startUrls`	array	–	Hub URLs (mode=byUrl) — kind auto-detected
`pipelineTag`	enum	–	Filter models by task tag (43 options)
`libraryName`	enum	–	Filter models by library (17 options)
`authorFilter`	string	–	Constrain to org/author slug
`license`	enum	–	License filter (29 options)
`language`	string	–	2-letter language code
`sort`	enum	–	`trendingScore` / `downloads` / `likes` / `lastModified` / `createdAt`
`direction`	enum	`desc`	`desc` or `asc`
`minDownloads`	integer	–	Drop records below this download count
`minLikes`	integer	–	Drop records below this like count
`includeFullDetails`	boolean	`false`	Pass `?full=true` for siblings/config
`maxItems`	integer	`50`	Hard cap (1–10000)
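
To catch mistakes before starting a run, the input can be checked client-side against the table above. The following is a sketch under stated assumptions: `build_input` is a hypothetical helper, and the per-mode required fields are inferred from the Description column rather than confirmed by the actor.

```python
MODES = {"search", "byModel", "byDataset", "bySpace", "byUser", "trending", "byUrl"}
ENTITY_TYPES = {"models", "datasets", "spaces"}
SORTS = {"trendingScore", "downloads", "likes", "lastModified", "createdAt"}

# Assumption: the field each lookup mode needs, inferred from the table.
REQUIRED_BY_MODE = {
    "byModel": "repoIds",
    "byDataset": "repoIds",
    "bySpace": "repoIds",
    "byUser": "username",
    "byUrl": "startUrls",
}


def build_input(mode="search", max_items=50, **fields):
    """Validate and assemble an input object; unset (None) fields are dropped."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if not 1 <= max_items <= 10000:
        raise ValueError("maxItems must be between 1 and 10000")
    fields = {k: v for k, v in fields.items() if v is not None}
    required = REQUIRED_BY_MODE.get(mode)
    if required and not fields.get(required):
        raise ValueError(f"mode {mode!r} needs {required!r}")
    if fields.get("entityType", "models") not in ENTITY_TYPES:
        raise ValueError(f"unknown entityType: {fields['entityType']!r}")
    if "sort" in fields and fields["sort"] not in SORTS:
        raise ValueError(f"unknown sort: {fields['sort']!r}")
    if fields.get("direction", "desc") not in {"desc", "asc"}:
        raise ValueError("direction must be 'desc' or 'asc'")
    return {"mode": mode, "maxItems": max_items, **fields}


print(build_input("search", entityType="models", searchQuery="bert", sort="downloads"))
```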

Examples

Search top BERT models

{
  "mode": "search",
  "entityType": "models",
  "searchQuery": "bert",
  "sort": "downloads",
  "maxItems": 50
}

Trending text-generation models

{
  "mode": "trending",
  "entityType": "models",
  "pipelineTag": "text-generation",
  "maxItems": 25
}

Lookup a specific dataset

{
  "mode": "byDataset",
  "repoIds": ["rajpurkar/squad_v2"]
}

Lookup by URL (auto-detect)

{
  "mode": "byUrl",
  "startUrls": [
    "https://huggingface.co/google-bert/bert-base-uncased",
    "https://huggingface.co/datasets/squad",
    "https://huggingface.co/spaces/lmarena-ai/chatbot-arena"
  ]
}

User profile

{
  "mode": "byUser",
  "username": "julien-c"
}

Reliability

Direct calls to the official huggingface.co/api/* endpoints
Exponential backoff retries on 429, 500–504
Page size capped at 100 (the API hard cap); paginated via ?skip=N&limit=N
No proxy needed — works from datacenter IPs
No cookies / API token required for read access
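
The pagination and retry behaviour listed above can be pictured with a small sketch. It is not the actor's implementation: the function names, the five-attempt limit, the one-second base delay, and the jitter are all assumptions. The `request` callable stands in for a real HTTP call and returns a `(status, body)` pair.

```python
import random
import time

# Statuses the list above names as retried: 429 and 500 through 504.
RETRYABLE = {429, 500, 501, 502, 503, 504}


def page_windows(max_items, page_size=100):
    """Yield (skip, limit) pairs that cover max_items, never exceeding page_size."""
    skip = 0
    while skip < max_items:
        limit = min(page_size, max_items - skip)
        yield skip, limit
        skip += limit


def call_with_backoff(request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request() and retry retryable statuses with doubling delays."""
    for attempt in range(max_attempts):
        status, body = request()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            # Delay doubles each attempt; a little jitter spreads out retries.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
    return status, body  # still failing after the last attempt


print(list(page_windows(250)))
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting.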

Limitations

Private repos require a user access token; this actor only exposes the public read API.
The language and license filters are forwarded to the Hub API and applied server-side. The Hub's filtering is best-effort: some matching repos lack a language:<code> / license:<id> tag (the metadata lives in the model card). If you need strict tag-based filtering, post-filter the dataset on languages[] / license.
The upstream API rate-limits ?full=true requests more aggressively; expect slower runs when it is enabled with a large maxItems.
Single-segment legacy repo IDs (e.g. bert-base-uncased) are auto-resolved by the API to their canonical owner-prefixed form (e.g. google-bert/bert-base-uncased).
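
The strict post-filtering suggested above might look like this. It is a sketch that assumes `license` is a single string and `languages` a list of codes in each record; `strict_filter` and the sample records are hypothetical.

```python
def strict_filter(records, language=None, license_id=None):
    """Keep only records whose own fields match; missing fields never match."""
    kept = []
    for rec in records:
        if language is not None and language not in rec.get("languages", []):
            continue
        if license_id is not None and rec.get("license") != license_id:
            continue
        kept.append(rec)
    return kept


# Illustrative records, not real actor output.
records = [
    {"repoId": "example-org/model-a", "license": "mit", "languages": ["en", "fr"]},
    {"repoId": "example-org/model-b", "license": "apache-2.0", "languages": ["en"]},
    {"repoId": "example-org/model-c", "languages": ["fr"]},  # license omitted
]

french_mit = strict_filter(records, language="fr", license_id="mit")
print([rec["repoId"] for rec in french_mit])
```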

FAQ

Do I need a Hugging Face account / API token? No. The Hub's read API is public.

How fresh is the data? Real-time — every run hits the live API.

Can I download model weights? No. This actor exposes Hub metadata only: repo info, file list, license, tags, and so on. To download weights, use the huggingface_hub Python library with the repoId from this actor's output.
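
As a rough illustration of that hand-off, the sketch below picks model repo IDs out of the actor's records and passes them to `snapshot_download` from huggingface_hub. The helper names and the `hf_downloads` directory are assumptions, and the sample records are illustrative. The import sits inside the function so the selection step works without the library installed.

```python
def model_repo_ids(records):
    """Collect repoId values from model records, skipping records without one."""
    return [
        rec["repoId"]
        for rec in records
        if rec.get("recordType") == "model" and rec.get("repoId")
    ]


def download_models(records, local_root="hf_downloads"):
    """Download every file of each model repo; returns the local folder paths."""
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    paths = []
    for repo_id in model_repo_ids(records):
        # snapshot_download fetches the whole repo and returns its local path.
        paths.append(
            snapshot_download(repo_id=repo_id, local_dir=f"{local_root}/{repo_id}")
        )
    return paths


# Illustrative records, not real actor output.
records = [
    {"recordType": "model", "repoId": "example-org/model-a"},
    {"recordType": "dataset", "repoId": "example-org/dataset-b"},
    {"recordType": "user", "username": "example-user"},
]

print(model_repo_ids(records))
```

Full model repos can be large, so it may be worth checking the files[] and fileCount fields before calling `download_models`.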

Why are some fields missing? Empty / null fields are omitted — only populated fields appear in the output.

Why does my license filter return fewer results than expected? Many repos don't tag their license. Records without a license:* tag are excluded when the license filter is set.
