HuggingFace Models Scraper

Pricing

from $2.00 / 1,000 models scraped

Scrapes AI/ML model metadata from HuggingFace (huggingface.co/models) via the official API. Extracts model ID, downloads, likes, task type, library, tags, and more. Supports keyword search, filters by author/organization, pipeline tag, and library, and a configurable sort order.

Rating

0.0 (0)

Developer

tzmyk

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 2 days ago

Scrape AI/ML model metadata from HuggingFace, one of the largest repositories of open-source machine learning models.

Extracts structured data including model ID, download counts, likes, task type (pipeline tag), ML library, tags, gated status, and timestamps. Data comes from the official HuggingFace API rather than from parsed HTML pages, which avoids fragile scraping and unexpected rate limiting.

What it does

  • Fetches models from the HuggingFace public API with full metadata
  • Supports filtering by keyword search, author/organization, task type, and library
  • Supports sorting by downloads, likes, or date
  • Paginates automatically up to your specified limit (up to 10,000 models)
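The automatic pagination described above can be sketched as a loop that keeps requesting pages until the limit is reached or no pages remain. This is a minimal illustration, not the actor's actual code: `PageFetcher`, `iter_models`, and `fake_fetch` are hypothetical names, and the cursor-based page shape is an assumption made so the example runs without network access.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical page fetcher: takes a cursor (None for the first page) and
# returns (items, next_cursor). next_cursor is None on the last page.
PageFetcher = Callable[[Optional[str]], Tuple[List[Dict], Optional[str]]]


def iter_models(fetch_page: PageFetcher, max_models: int) -> Iterator[Dict]:
    """Yield items page by page until max_models is reached or pages run out."""
    cursor: Optional[str] = None
    yielded = 0
    while yielded < max_models:
        items, cursor = fetch_page(cursor)
        for item in items:
            if yielded >= max_models:
                return
            yield item
            yielded += 1
        # Stop when the source reports no further page or returns nothing.
        if cursor is None or not items:
            return


def fake_fetch(cursor: Optional[str]) -> Tuple[List[Dict], Optional[str]]:
    """Stand-in for a real API call: two pages of made-up model IDs."""
    pages = {
        None: ([{"id": "a"}, {"id": "b"}], "p2"),
        "p2": ([{"id": "c"}], None),
    }
    return pages[cursor]
```

Calling `list(iter_models(fake_fetch, 2))` stops after the first page, while a larger limit walks both pages and then ends because the second page has no next cursor.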

Use cases

  • AI research: Track which models are trending by downloads or likes
  • Competitive intelligence: Monitor what models a specific organization has published
  • Dataset building: Collect model metadata for ML benchmarks or surveys
  • Lead generation: Find organizations actively publishing models in your domain
  • Content & newsletters: Curate the most popular or newest models by task type

Input

| Field | Type | Default | Description |
|---|---|---|---|
| search | string | | Keyword search to filter models |
| author | string | | Filter by author or organization (e.g. meta-llama) |
| pipelineTag | string | | Filter by task type (e.g. text-generation, image-classification) |
| libraryName | string | | Filter by ML library (e.g. transformers, diffusers) |
| sort | select | downloads | Sort by: downloads, likes, createdAt, lastModified |
| maxModels | integer | 100 | Max models to return (1–10,000) |
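The defaults and ranges listed for the input fields could be enforced roughly as follows. This is only a sketch of the kind of validation such an input schema implies; `normalize_input`, `ALLOWED_SORTS`, and `DEFAULTS` are hypothetical names, and the actor's real checks and error messages may differ.

```python
from typing import Any, Dict

# Values taken from the input fields listed above.
ALLOWED_SORTS = {"downloads", "likes", "createdAt", "lastModified"}
DEFAULTS: Dict[str, Any] = {"sort": "downloads", "maxModels": 100}
STRING_FIELDS = ("search", "author", "pipelineTag", "libraryName")


def normalize_input(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Apply defaults and reject values outside the documented ranges."""
    # Drop explicit nulls so they fall back to the defaults.
    cfg = {**DEFAULTS, **{k: v for k, v in raw.items() if v is not None}}

    if cfg["sort"] not in ALLOWED_SORTS:
        raise ValueError(f"sort must be one of {sorted(ALLOWED_SORTS)}")

    max_models = cfg["maxModels"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(max_models, bool) or not isinstance(max_models, int):
        raise ValueError("maxModels must be an integer")
    if not 1 <= max_models <= 10_000:
        raise ValueError("maxModels must be between 1 and 10,000")

    for key in STRING_FIELDS:
        if key in cfg and not isinstance(cfg[key], str):
            raise ValueError(f"{key} must be a string")
    return cfg
```

For example, `normalize_input({"search": "llama"})` fills in `sort` and `maxModels` from the defaults, while `{"maxModels": 0}` raises a `ValueError`.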

Example input

{
  "search": "llama",
  "pipelineTag": "text-generation",
  "sort": "downloads",
  "maxModels": 50
}
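One way to start a run with the example input above from Python is the `apify-client` package. The sketch below assumes that package is installed and that an API token is available. The actor ID shown is a guess based on the developer name and is not confirmed by this page, so copy the real ID from the Apify Store listing.

```python
from typing import Any, Dict, List

# Hypothetical actor ID; replace it with the ID shown on the actor's page.
ACTOR_ID = "tzmyk/huggingface-models-scraper"

# Same values as the example input above.
RUN_INPUT: Dict[str, Any] = {
    "search": "llama",
    "pipelineTag": "text-generation",
    "sort": "downloads",
    "maxModels": 50,
}


def run_scraper(token: str, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start the actor, wait for it to finish, and return the dataset items."""
    # Imported lazily so this module loads even without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (requires network access and a valid token):
#   records = run_scraper("<YOUR_APIFY_TOKEN>", RUN_INPUT)
#   for record in records:
#       print(record["modelId"], record["downloads"])
```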

Output

One record per model saved to the default dataset.

| Field | Type | Description |
|---|---|---|
| modelId | string | Full model ID (e.g. meta-llama/Llama-3.1-8B-Instruct) |
| author | string or null | Author or organization name |
| downloads | number or null | Download count (30-day rolling total as reported by HuggingFace) |
| likes | number or null | Like count |
| pipelineTag | string or null | Task type (e.g. text-generation) |
| libraryName | string or null | ML library (e.g. transformers) |
| tags | string[] | All tags including datasets, licenses, frameworks |
| gated | boolean or null | Whether model access requires approval |
| createdAt | string or null | Creation date (ISO 8601) |
| lastModified | string or null | Last modified date (ISO 8601) |
| url | string | Direct URL to the model page |
| scrapedAt | string | Timestamp when this record was scraped |
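To illustrate how the output fields above could be derived, the sketch below maps one raw API item to an output record. The raw key names (`id`, `pipeline_tag`, `library_name`) and the handling of `gated` are assumptions about the API response, not confirmed by this page, and `to_record` is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def to_record(item: Dict[str, Any], now: Optional[datetime] = None) -> Dict[str, Any]:
    """Build an output record; missing optional fields become None."""
    # Assumption: the raw item carries the model ID under "id" or "modelId".
    model_id = item.get("id") or item.get("modelId")
    if not model_id:
        raise ValueError("item has no model ID")

    # Assumption: the raw gated value may be False or a non-empty string;
    # any truthy value is treated as gated, and a missing value stays None.
    raw_gated = item.get("gated")
    gated = None if raw_gated is None else bool(raw_gated)

    now = now or datetime.now(timezone.utc)
    return {
        "modelId": model_id,
        "author": item.get("author"),
        "downloads": item.get("downloads"),
        "likes": item.get("likes"),
        "pipelineTag": item.get("pipeline_tag"),
        "libraryName": item.get("library_name"),
        "tags": list(item.get("tags") or []),
        "gated": gated,
        "createdAt": item.get("createdAt"),
        "lastModified": item.get("lastModified"),
        "url": f"https://huggingface.co/{model_id}",
        # ISO 8601 in UTC with millisecond precision and a "Z" suffix.
        "scrapedAt": now.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
    }
```

Passing a fixed `now` keeps the function deterministic, which makes the `scrapedAt` value easy to check in tests.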

Example output

{
  "modelId": "sentence-transformers/all-MiniLM-L6-v2",
  "author": "sentence-transformers",
  "downloads": 208493944,
  "likes": 4598,
  "pipelineTag": "sentence-similarity",
  "libraryName": "sentence-transformers",
  "tags": ["sentence-transformers", "pytorch", "onnx", "license:apache-2.0"],
  "gated": false,
  "createdAt": "2022-03-02T23:29:05.000Z",
  "lastModified": "2025-03-06T13:37:44.000Z",
  "url": "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2",
  "scrapedAt": "2026-03-22T03:46:43.767Z"
}
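When working with records like the one above, the ISO 8601 timestamps usually need parsing first. The sketch below shows one way to do that with the standard library and to compute how long ago a model was last modified. `parse_ts` and `days_since_update` are hypothetical helper names.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def parse_ts(value: Optional[str]) -> Optional[datetime]:
    """Parse a timestamp such as 2025-03-06T13:37:44.000Z; None passes through."""
    if value is None:
        return None
    # Older Python versions do not accept a trailing "Z" in fromisoformat,
    # so convert it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def days_since_update(record: Dict[str, Any], now: datetime) -> Optional[int]:
    """Whole days between lastModified and now, or None if it is missing."""
    modified = parse_ts(record.get("lastModified"))
    if modified is None:
        return None
    return (now - modified).days


example = {"lastModified": "2025-03-06T13:37:44.000Z"}
reference = datetime(2025, 3, 16, 13, 37, 44, tzinfo=timezone.utc)
print(days_since_update(example, reference))
```

Because `createdAt` and `lastModified` can be null, both helpers return `None` rather than raising when a value is missing.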

Features

  • Official API: Uses the HuggingFace REST API directly; no fragile HTML parsing
  • Automatic pagination: Fetches successive pages until your limit is reached
  • Polite rate limiting: 500 ms delay between API calls
  • Robust input validation: Clear error messages for invalid inputs
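The fixed delay between API calls mentioned above can be sketched as a small wrapper that sleeps between consecutive calls. This illustrates the general technique under the stated 500 ms figure and is not the actor's implementation; `throttled` is a hypothetical name, and the `sleep` parameter is injectable only so the behavior can be tested without waiting.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")

DELAY_SECONDS = 0.5  # 500 ms between calls, as stated in the feature list


def throttled(
    calls: Iterable[Callable[[], T]],
    delay: float = DELAY_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Run each call in order, sleeping `delay` seconds between consecutive calls."""
    for index, call in enumerate(calls):
        # No delay before the first call, only between calls.
        if index > 0:
            sleep(delay)
        yield call()
```

With three calls, the wrapper sleeps twice, so the added wait grows linearly with the number of pages fetched.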

Notes

  • Results include public models only; private models are not accessible
  • The gated field indicates whether a model requires access approval from the author
  • The HuggingFace API does not handle every sort order equally well when combined with search; sorting by downloads works best for broad searches
  • Download counts are 30-day rolling totals as reported by HuggingFace

Support

Found a bug or have a feature request? Please open an issue or contact the author through the Apify platform.