HuggingFace Model Scraper - AI/ML Model Data

Pricing: Pay per event

Scrape AI/ML model metadata from the HuggingFace Hub. Extract model names, task types, download counts, likes, libraries, authors, tags, licenses, model sizes, and model card excerpts. Filter by task type, library, author, and search query.

Rating: 0.0 (0 ratings)

Developer: BowTiedRaccoon (Maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 18 hours ago


Extract AI/ML model metadata from the HuggingFace Hub, which hosts over 1 million public models and is a central repository for the AI/ML community. This actor queries the public HuggingFace Hub API to retrieve model names, task types, download counts, likes, licenses, libraries, and model card excerpts.

What You Can Do

  • Browse top models sorted by total downloads, likes, trending score, or recently modified
  • Filter by task type (text-generation, image-classification, sentence-similarity, and 25+ other pipeline tags)
  • Filter by ML library (transformers, diffusers, sentence-transformers, GGUF, ONNX, and more)
  • Filter by author/organization (meta-llama, google, microsoft, BAAI, Qwen, etc.)
  • Search by keyword across model names and descriptions
  • Extract model card excerpts — first 500 characters of each model's README
  • Get spaces usage — count of HuggingFace Spaces using each model
  • Retrieve dataset provenance — datasets referenced in model card metadata
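The README excerpt feature can be approximated in a few lines. This is a minimal sketch, assuming the actor removes the YAML front matter (the `---`-delimited metadata block at the top of a HuggingFace model card) before taking the first 500 characters, which is what the example output's excerpt starting at the `# ...` title suggests. The helper names `strip_front_matter` and `readme_excerpt` are hypothetical.

```python
def strip_front_matter(text: str) -> str:
    """Remove a leading YAML block delimited by '---' lines, if present."""
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "".join(lines[i + 1:])
    return text  # unterminated block: leave the text unchanged


def readme_excerpt(readme: str, limit: int = 500) -> str:
    """First `limit` characters of the model card body (hypothetical helper)."""
    return strip_front_matter(readme).lstrip()[:limit]


# Illustrative model card text (front matter + body).
card = (
    "---\nlicense: apache-2.0\ntags:\n- sentence-transformers\n---\n\n"
    "# all-MiniLM-L6-v2\nThis is a sentence-transformers model..."
)
print(readme_excerpt(card))
```

Stripping the metadata block first keeps the excerpt focused on human-readable text; the license and dataset information it contains is already exposed through separate fields.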

Use Cases

  • AI market intelligence — track which models are gaining downloads and likes
  • VC and investment research — monitor model ecosystem trends by organization
  • Enterprise model evaluation — shortlist foundation models by task type, license, and popularity
  • Competitive analysis — compare model adoption across ML libraries and providers
  • Dataset discovery — find which training datasets are most commonly used
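For the market-intelligence and dataset-discovery use cases, a typical next step is to aggregate the scraped records. This is a minimal sketch, assuming the records have already been loaded as Python dicts using the field names from the output schema; the helper names and the sample records are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Iterable


def downloads_by_author(records: Iterable[dict]) -> dict[str, int]:
    """Sum downloads_total per author; null counts are treated as 0."""
    totals: dict[str, int] = defaultdict(int)
    for rec in records:
        totals[rec.get("author") or "unknown"] += rec.get("downloads_total") or 0
    return dict(totals)


def top_datasets(records: Iterable[dict], n: int = 5) -> list[tuple[str, int]]:
    """Count how many models reference each dataset in datasets_used."""
    counts = Counter()
    for rec in records:
        # Count each dataset at most once per model.
        counts.update(set(rec.get("datasets_used") or []))
    return counts.most_common(n)


# Hypothetical records, trimmed to the fields used above.
records = [
    {"author": "org-a", "downloads_total": 1200, "datasets_used": ["ms_marco", "s2orc"]},
    {"author": "org-a", "downloads_total": 300, "datasets_used": ["ms_marco"]},
    {"author": "org-b", "downloads_total": None, "datasets_used": []},
]
print(downloads_by_author(records))
print(top_datasets(records))
```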

Input Parameters

| Parameter | Description | Default |
|---|---|---|
| searchQuery | Search across model names and descriptions | |
| pipelineTag | Filter by task type (text-generation, image-classification, etc.) | All tasks |
| library | Filter by ML framework (transformers, diffusers, gguf, etc.) | All libraries |
| author | Filter by author or organization username | All authors |
| sortBy | Sort by downloads, likes, lastModified, or trending | downloads |
| maxItems | Maximum number of records to return (0 = unlimited) | 10 |
| proxyConfiguration | Optional proxy settings | Disabled |
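The input parameters map naturally onto query parameters of the public list endpoint. The sketch below shows one plausible mapping and is an assumption, not the actor's confirmed implementation. It relies on the Hub API accepting `search`, `author`, `pipeline_tag`, `filter` (used here for the library), `sort`, `direction`, and `limit`, and on `trendingScore` being the API's key for the trending sort. `build_list_url` and `SORT_KEYS` are hypothetical names. `maxItems` is not a query parameter in this sketch; it would cap the total across pages.

```python
from urllib.parse import urlencode

API_URL = "https://huggingface.co/api/models"

# Assumed mapping from the actor's sortBy values to Hub API sort keys.
SORT_KEYS = {
    "downloads": "downloads",
    "likes": "likes",
    "lastModified": "lastModified",
    "trending": "trendingScore",
}


def build_list_url(search_query=None, pipeline_tag=None, library=None,
                   author=None, sort_by="downloads", page_size=100):
    """Build a list-endpoint URL from actor-style inputs; empty filters are omitted."""
    if sort_by not in SORT_KEYS:
        raise ValueError(f"unsupported sortBy: {sort_by!r}")
    params = {
        "search": search_query,
        "pipeline_tag": pipeline_tag,
        "filter": library,  # assumption: library is passed as a filter tag
        "author": author,
        "sort": SORT_KEYS[sort_by],
        "direction": -1,  # descending: highest / most recent first
        "limit": page_size,
    }
    kept = {k: v for k, v in params.items() if v not in (None, "")}
    return f"{API_URL}?{urlencode(kept)}"


print(build_list_url(pipeline_tag="sentence-similarity", library="sentence-transformers"))
```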

Output Fields

Each record contains:

| Field | Type | Description |
|---|---|---|
| model_id | string | Full model identifier (e.g., meta-llama/Llama-3.3-70B-Instruct) |
| model_name | string | Short model name without the author prefix |
| pipeline_tag | string | Primary task type (text-generation, sentence-similarity, etc.) |
| downloads_total | integer | Total all-time download count |
| downloads_30d | integer | Download count in the last 30 days (when available) |
| likes | integer | Number of likes on HuggingFace |
| library | string | Primary ML library (transformers, diffusers, etc.) |
| author | string | Model author or organization username |
| tags | array | Tags including language, dataset references, and framework tags |
| license | string | License identifier (apache-2.0, mit, llama3.3, etc.) |
| model_size_params | string | Parameter count if encoded in tags (7B, 13B, 70B, etc.) |
| last_modified | string | ISO 8601 timestamp of last update |
| readme_excerpt | string | First 500 characters of the model card README |
| spaces_count | integer | Number of HuggingFace Spaces using this model |
| datasets_used | array | Datasets referenced in model card metadata |
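Two fields, `model_name` and `model_size_params`, are derived rather than copied from the API. One way to derive them, as a sketch: the model name is everything after the first `/` in the model ID, and a size is taken from a tag that consists only of a number followed by `B` or `M`. That tag format is an assumption about what "encoded in tags" means, and the helper names are hypothetical.

```python
import re
from typing import Optional

# A tag made only of a number plus B (billions) or M (millions), e.g. "7b", "70B", "1.5B".
# Assumption: this is how parameter counts appear when encoded in tags.
_SIZE_TAG = re.compile(r"^(\d+(?:\.\d+)?)([bm])$", re.IGNORECASE)


def model_name_from_id(model_id: str) -> str:
    """Drop the 'author/' prefix; IDs without a prefix are returned unchanged."""
    return model_id.split("/", 1)[-1]


def size_from_tags(tags: list[str]) -> Optional[str]:
    """Return the first size-like tag, normalized to e.g. '70B', or None."""
    for tag in tags:
        match = _SIZE_TAG.match(tag.strip())
        if match:
            return match.group(1) + match.group(2).upper()
    return None


# The tag list here is hypothetical.
print(model_name_from_id("meta-llama/Llama-3.3-70B-Instruct"), size_from_tags(["text-generation", "70b"]))
```

With the tags from the example output (`sentence-transformers`, `pytorch`, `onnx`, ...), no tag matches, which is consistent with that record's `model_size_params` being `null`.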

Example Output

{
  "model_id": "sentence-transformers/all-MiniLM-L6-v2",
  "model_name": "all-MiniLM-L6-v2",
  "pipeline_tag": "sentence-similarity",
  "downloads_total": 262278076,
  "downloads_30d": null,
  "likes": 4833,
  "library": "sentence-transformers",
  "author": "sentence-transformers",
  "tags": ["sentence-transformers", "pytorch", "onnx", "safetensors", "bert", "en"],
  "license": "apache-2.0",
  "model_size_params": null,
  "last_modified": "2025-03-06T13:37:44.000Z",
  "readme_excerpt": "# all-MiniLM-L6-v2\nThis is a sentence-transformers model...",
  "spaces_count": 100,
  "datasets_used": ["s2orc", "ms_marco", "gooaq", "natural_questions"]
}
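Before feeding records into downstream tools, it can help to check them against the types in the Output Fields table. This is a sketch built only from that table; treating every field except `model_id` as nullable is an assumption based on the `null` values in the example record, and `validate_record` is a hypothetical helper.

```python
# Expected JSON types per field, taken from the Output Fields table.
FIELD_TYPES = {
    "model_id": str, "model_name": str, "pipeline_tag": str,
    "downloads_total": int, "downloads_30d": int, "likes": int,
    "library": str, "author": str, "tags": list, "license": str,
    "model_size_params": str, "last_modified": str, "readme_excerpt": str,
    "spaces_count": int, "datasets_used": list,
}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks valid."""
    problems = []
    for field, expected in FIELD_TYPES.items():
        if field not in record:
            problems.append(f"missing field: {field}")
            continue
        value = record[field]
        if value is None:
            # Assumption: only model_id is required to be non-null.
            if field == "model_id":
                problems.append("model_id must not be null")
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, expected) or isinstance(value, bool):
            problems.append(f"{field}: expected {expected.__name__}, got {type(value).__name__}")
    return problems


# A deliberately incomplete record: reports every missing field plus the wrong type for likes.
print(validate_record({"model_id": "sentence-transformers/all-MiniLM-L6-v2", "likes": "4833"}))
```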

Technical Notes

  • No authentication required: uses the public HuggingFace Hub API
  • No proxy required: the API is publicly accessible without IP restrictions, so proxyConfiguration can stay disabled
  • Rate limits: unauthenticated access is rate-limited, but the limits are generous; the actor adds a courtesy 100 ms delay between detail fetches
  • Pagination: cursor-based pagination is handled automatically, so any number of models can be retrieved (up to maxItems)
  • Two-pass enrichment: basic metadata comes from the list endpoint; the detail fields (readme_excerpt, spaces_count, datasets_used) come from the per-model detail endpoint
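The pagination and two-pass enrichment notes can be sketched with the standard library. Several details are assumptions, labeled in the code as well: the list endpoint advertises the next page in a `Link` header with `rel="next"`, list items carry an `id` field, and the detail endpoint `https://huggingface.co/api/models/{model_id}` returns `cardData.datasets` and a `spaces` list. The README excerpt is left out for brevity. All function names are hypothetical, and the fetcher is passed in so the logic can run without network access.

```python
import json
import re
import time
import urllib.request
from typing import Callable, Iterable, Iterator, Optional

# Assumed detail endpoint; IDs like "author/name" are used as-is in the path.
DETAIL_URL = "https://huggingface.co/api/models/{model_id}"
_NEXT_LINK = re.compile(r'<([^>]+)>\s*;\s*rel="next"')

# A fetcher takes a URL and returns (parsed JSON body, response headers as a dict).
Fetcher = Callable[[str], tuple]


def http_fetch_json(url: str) -> tuple:
    """Unauthenticated GET using only the standard library."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp), dict(resp.headers)


def next_page_url(headers: dict) -> Optional[str]:
    """Return the rel="next" URL from a Link header (header name matched case-insensitively)."""
    link = next((v for k, v in headers.items() if k.lower() == "link"), "")
    match = _NEXT_LINK.search(link)
    return match.group(1) if match else None


def list_models(first_url: str, max_items: int, fetch_json: Fetcher) -> Iterator[dict]:
    """First pass: follow cursor pages; max_items == 0 means unlimited."""
    url, yielded = first_url, 0
    while url:
        items, headers = fetch_json(url)
        for item in items:
            yield item
            yielded += 1
            if max_items and yielded >= max_items:
                return  # stop without requesting another page
        url = next_page_url(headers)


def enrich(items: Iterable[dict], fetch_json: Fetcher, delay_s: float = 0.1) -> Iterator[dict]:
    """Second pass: one detail request per model, pausing delay_s between requests."""
    for i, item in enumerate(items):
        if i:
            time.sleep(delay_s)  # courtesy delay, 100 ms by default
        detail, _ = fetch_json(DETAIL_URL.format(model_id=item["id"]))
        datasets = (detail.get("cardData") or {}).get("datasets") or []
        yield {
            **item,
            "spaces_count": len(detail.get("spaces") or []),
            # cardData.datasets may be a single string or a list.
            "datasets_used": [datasets] if isinstance(datasets, str) else list(datasets),
        }
```

Chaining the two passes, e.g. `enrich(list_models(first_url, 10, http_fetch_json), http_fetch_json)` where `first_url` is a list-endpoint URL, mirrors the flow described in the notes. Injecting the fetcher is a design choice that keeps the paging and merging logic testable offline.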

Data Source

HuggingFace Hub API: https://huggingface.co/api/models