Huggingface Model Scraper

Pricing

Pay per usage

Huggingface Model Scraper. Extract structured data with automatic pagination, proxy rotation, and JSON/CSV export. Pay only for results.

Rating

0.0 (0)

Developer

Donny

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 15 hours ago

Hugging Face Model Scraper

What it does

Hugging Face Model Scraper extracts detailed information about machine learning models hosted on the Hugging Face Hub. It queries the Hugging Face API to retrieve model metadata, including model IDs, authors, download counts, likes, tags, pipeline types, and modification dates. You can search for models by keyword, task type, or any other search term the Hugging Face platform supports. Results are sorted by download count so that the most popular and widely used models appear first.

Why use it

The Hugging Face Hub hosts hundreds of thousands of machine learning models, making manual discovery and comparison time-consuming. This actor automates the process of searching and cataloging models, which is valuable for ML engineers evaluating model options, researchers tracking model popularity trends, and companies conducting competitive analysis in the AI space. By extracting structured data, you can easily compare models across metrics like downloads, likes, and supported tasks without manually browsing through individual model pages.

How it works

  1. The actor accepts a search query and maximum results count as input parameters.
  2. It constructs a request to the Hugging Face API endpoint with the specified search parameters.
  3. Using CheerioCrawler, it fetches and parses the JSON response from the API.
  4. For each model returned, it extracts key metadata fields and formats them into a consistent structure.
  5. Results are sorted by download count (descending) and pushed to the Apify dataset.
  6. If no models match the search query, a fallback record is created to indicate empty results.
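
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the actor's actual source: the endpoint URL, the raw API field names (`id`, `pipeline_tag`, and so on), the function names, and the shape of the fallback record are all assumptions made for the example.

```python
from urllib.parse import urlencode

# Assumed endpoint; this README does not name the exact URL the actor calls.
API_BASE = "https://huggingface.co/api/models"


def build_request_url(search_query: str, max_results: int) -> str:
    """Step 2: build the API request URL from the two input parameters."""
    params = {"search": search_query, "limit": max_results}
    return f"{API_BASE}?{urlencode(params)}"


def normalize_model(raw: dict) -> dict:
    """Step 4: map one raw API item onto the output fields listed in this README."""
    model_id = raw.get("id") or raw.get("modelId") or ""
    # If the item carries no author field, fall back to the part before "/".
    author = raw.get("author") or (model_id.split("/")[0] if "/" in model_id else "")
    return {
        "modelId": model_id,
        "author": author,
        "downloads": raw.get("downloads") or 0,
        "likes": raw.get("likes") or 0,
        "tags": raw.get("tags") or [],
        "pipeline": raw.get("pipeline_tag") or "",
        "lastModified": raw.get("lastModified") or "",
        "url": f"https://huggingface.co/{model_id}",
    }


def process_response(raw_models: list, search_query: str) -> list:
    """Steps 4 to 6: normalize, sort by downloads (descending), or emit a fallback."""
    records = [normalize_model(m) for m in raw_models]
    if not records:
        # Hypothetical fallback shape; the real record layout is not documented here.
        return [{"searchQuery": search_query, "message": "No models found"}]
    return sorted(records, key=lambda r: r["downloads"], reverse=True)
```

Fetching the JSON (step 3) is left out so the sketch runs without network access; any HTTP client could supply `raw_models`.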

Input parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| searchQuery | String | text-generation | Search term for finding models (e.g., text-generation, sentiment-analysis, gpt) |
| maxResults | Integer | 50 | Maximum number of models to return (1-200) |
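
As an illustration of how these parameters and their defaults fit together, the sketch below applies the documented defaults and checks the documented 1-200 range. The helper name `parse_input` is hypothetical, and rejecting out-of-range values with an error is an assumption; the actor may clamp or handle them differently.

```python
# Defaults taken from the input parameters listed above.
DEFAULTS = {"searchQuery": "text-generation", "maxResults": 50}


def parse_input(raw: dict) -> dict:
    """Merge user input with defaults and validate it (hypothetical helper)."""
    merged = {**DEFAULTS, **{k: v for k, v in raw.items() if v is not None}}
    query = merged["searchQuery"]
    limit = merged["maxResults"]

    if not isinstance(query, str) or not query.strip():
        raise ValueError("searchQuery must be a non-empty string")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int):
        raise ValueError("maxResults must be an integer")
    if not 1 <= limit <= 200:
        raise ValueError("maxResults must be between 1 and 200")

    return {"searchQuery": query.strip(), "maxResults": limit}
```

Calling `parse_input({})` yields the defaults, so a run with no input searches for `text-generation` and returns up to 50 models.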

Output fields

| Field | Type | Description |
| --- | --- | --- |
| modelId | String | Full model identifier (author/model-name) |
| author | String | Model author or organization |
| downloads | Number | Total download count |
| likes | Number | Number of likes on the model page |
| tags | Array | List of tags associated with the model |
| pipeline | String | Pipeline task type (e.g., text-generation, image-classification) |
| lastModified | String | Date of the last modification |
| url | String | Direct link to the model page on Hugging Face |
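
To show how these fields might be consumed downstream, here is a small sketch that types one output record and picks the most-downloaded model for each pipeline. The `ModelRecord` type and the `top_model_per_pipeline` helper are illustrative names, not part of the actor.

```python
from typing import Dict, List, TypedDict


class ModelRecord(TypedDict):
    """One dataset item, mirroring the output fields listed above."""
    modelId: str
    author: str
    downloads: int
    likes: int
    tags: List[str]
    pipeline: str
    lastModified: str
    url: str


def top_model_per_pipeline(records: List[ModelRecord]) -> Dict[str, ModelRecord]:
    """Return the record with the highest download count for each pipeline."""
    best: Dict[str, ModelRecord] = {}
    for rec in records:
        current = best.get(rec["pipeline"])
        if current is None or rec["downloads"] > current["downloads"]:
            best[rec["pipeline"]] = rec
    return best
```

The input would typically be the list loaded from the dataset's JSON export.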

Cost estimate

This actor uses Cheerio-based scraping with minimal resource consumption. A typical run fetching 50 models costs approximately $0.001 in Apify platform credits. The default 1024 MB memory setting provides ample resources for all standard queries.

Tips

  • Use specific pipeline task names like "text-generation", "text-classification", or "image-segmentation" for more targeted results.
  • Increase maxResults to 200 when you need comprehensive coverage of available models for a particular task.
  • Schedule regular runs to track how model popularity changes over time.
  • Combine results with the OpenAI Status Monitor to maintain a full picture of the AI tooling landscape.
  • Check out the ArXiv Paper Search actor to find the research papers behind popular models.
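
For the scheduled-run tip, one way to track popularity changes is to compare the datasets from two runs. The sketch below assumes each run's results have been loaded as a list of records with the `modelId` and `downloads` fields described under Output fields; the function name `download_deltas` is hypothetical.

```python
def download_deltas(previous: list, current: list) -> list:
    """Compare two runs and report the change in downloads per model.

    Models that appear only in the current run are skipped because they
    have no baseline to compare against.
    """
    before = {rec["modelId"]: rec["downloads"] for rec in previous}
    changes = []
    for rec in current:
        old = before.get(rec["modelId"])
        if old is None:
            continue
        changes.append({"modelId": rec["modelId"], "delta": rec["downloads"] - old})
    # Largest gains first.
    return sorted(changes, key=lambda c: c["delta"], reverse=True)
```

Models with the largest positive `delta` gained the most downloads between the two runs.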