Huggingface Model Scraper
Pricing
Pay per usage
Huggingface Model Scraper. Extract structured data with automatic pagination, proxy rotation, and JSON/CSV export. Pay only for results.
Developer: Donny
Actor stats
- Bookmarked: 0
- Total users: 2
- Monthly active users: 1
- Last modified: 15 hours ago
Hugging Face Model Scraper
What it does
Hugging Face Model Scraper extracts detailed information about machine learning models hosted on the Hugging Face Hub. It queries the Hugging Face API to retrieve model metadata, including model IDs, authors, download counts, likes, tags, pipeline types, and modification dates. You can search for models by keyword, task type, or any other search term the Hugging Face platform supports. Results are sorted by download count so that the most popular and widely used models appear first.
Why use it
The Hugging Face Hub hosts hundreds of thousands of machine learning models, making manual discovery and comparison time-consuming. This actor automates the process of searching and cataloging models, which is valuable for ML engineers evaluating model options, researchers tracking model popularity trends, and companies conducting competitive analysis in the AI space. By extracting structured data, you can easily compare models across metrics like downloads, likes, and supported tasks without manually browsing through individual model pages.
How it works
- The actor accepts a search query and maximum results count as input parameters.
- It constructs a request to the Hugging Face API endpoint with the specified search parameters.
- Using CheerioCrawler, it fetches and parses the JSON response from the API.
- For each model returned, it extracts key metadata fields and formats them into a consistent structure.
- Results are sorted by download count (descending) and pushed to the Apify dataset.
- If no models match the search query, a fallback record is created to indicate empty results.
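The steps above can be sketched in Python as a rough illustration. This is not the actor's source code, which uses CheerioCrawler. It assumes the public listing endpoint `https://huggingface.co/api/models` with `search` and `limit` query parameters, and raw response keys such as `id`, `pipeline_tag`, and `lastModified`. The function names and the shape of the fallback record are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint: the public Hub model listing API.
API_URL = "https://huggingface.co/api/models"


def build_request_url(search_query: str, max_results: int) -> str:
    """Build the API URL for one search, limited to max_results models."""
    params = {"search": search_query, "limit": max_results}
    return f"{API_URL}?{urlencode(params)}"


def to_record(raw: dict) -> dict:
    """Map one raw API item to the output structure (raw key names are assumptions)."""
    model_id = raw.get("id") or raw.get("modelId") or ""
    # Fall back to the part before "/" when no explicit author is present.
    author = raw.get("author") or (model_id.split("/")[0] if "/" in model_id else "")
    return {
        "modelId": model_id,
        "author": author,
        "downloads": raw.get("downloads", 0),
        "likes": raw.get("likes", 0),
        "tags": raw.get("tags", []),
        "pipeline": raw.get("pipeline_tag", ""),
        "lastModified": raw.get("lastModified", ""),
        "url": f"https://huggingface.co/{model_id}",
    }


def process(raw_models: list, search_query: str) -> list:
    """Format every model, then sort by downloads in descending order."""
    records = [to_record(model) for model in raw_models]
    if not records:
        # Hypothetical fallback record marking an empty result set.
        return [{"searchQuery": search_query, "message": "No models found"}]
    return sorted(records, key=lambda record: record["downloads"], reverse=True)
```

Keeping URL construction, field mapping, and sorting in separate pure functions means each step can be checked against a saved API response without making a network call.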
Input parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `searchQuery` | String | `text-generation` | Search term for finding models (e.g., text-generation, sentiment-analysis, gpt) |
| `maxResults` | Integer | `50` | Maximum number of models to return (1-200) |
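As an illustration of how these two parameters fit together, the sketch below applies the documented defaults and keeps `maxResults` inside the documented 1-200 range. Whether the actor itself clamps or rejects out-of-range values is not stated, so the clamping here is an assumption, and `normalize_input` is a hypothetical helper name.

```python
from typing import Optional

# Defaults and limits taken from the input parameter table.
DEFAULTS = {"searchQuery": "text-generation", "maxResults": 50}
MIN_RESULTS, MAX_RESULTS = 1, 200


def normalize_input(raw: Optional[dict]) -> dict:
    """Fill in defaults and clamp maxResults to the documented range."""
    merged = {**DEFAULTS, **(raw or {})}
    # An empty or whitespace-only query falls back to the default term.
    query = str(merged["searchQuery"]).strip() or DEFAULTS["searchQuery"]
    # Assumption: out-of-range values are clamped rather than rejected.
    max_results = max(MIN_RESULTS, min(MAX_RESULTS, int(merged["maxResults"])))
    return {"searchQuery": query, "maxResults": max_results}
```

For example, `normalize_input({"searchQuery": "sentiment-analysis", "maxResults": 500})` would yield a `maxResults` of 200 under this clamping assumption.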
Output fields
| Field | Type | Description |
|---|---|---|
| `modelId` | String | Full model identifier (author/model-name) |
| `author` | String | Model author or organization |
| `downloads` | Number | Total download count |
| `likes` | Number | Number of likes on the model page |
| `tags` | Array | List of tags associated with the model |
| `pipeline` | String | Pipeline task type (e.g., text-generation, image-classification) |
| `lastModified` | String | Date of the last modification |
| `url` | String | Direct link to the model page on Hugging Face |
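Because `tags` is an array, it needs flattening if you post-process the JSON export into a spreadsheet yourself. The sketch below shows one way to do that with Python's standard `csv` module. The sample record uses made-up placeholder values, and joining tags with `;` is an illustrative choice, not necessarily how the platform's own CSV export behaves.

```python
import csv
import io

# Column order follows the output fields table.
FIELDS = ["modelId", "author", "downloads", "likes",
          "tags", "pipeline", "lastModified", "url"]

# Placeholder record for illustration only (not real scraped data).
SAMPLE = {
    "modelId": "example-org/example-model",
    "author": "example-org",
    "downloads": 1200,
    "likes": 34,
    "tags": ["pytorch", "text-generation"],
    "pipeline": "text-generation",
    "lastModified": "2024-01-01T00:00:00.000Z",
    "url": "https://huggingface.co/example-org/example-model",
}


def records_to_csv(records: list) -> str:
    """Serialize output records to CSV text, joining tags with ';'."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for record in records:
        row = dict(record)  # copy so the caller's record is not modified
        row["tags"] = ";".join(record.get("tags", []))
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    print(records_to_csv([SAMPLE]))
```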
Cost estimate
This actor uses Cheerio-based scraping with minimal resource consumption. A typical run fetching 50 models costs approximately $0.001 in Apify platform credits. The default 1024 MB memory setting provides ample resources for all standard queries.
Tips
- Use specific pipeline task names like "text-generation", "text-classification", or "image-segmentation" for more targeted results.
- Increase `maxResults` to 200 when you need comprehensive coverage of available models for a particular task.
- Schedule regular runs to track how model popularity changes over time.
- Combine results with the OpenAI Status Monitor to maintain a full picture of the AI tooling landscape.
- Check out the ArXiv Paper Search actor to find the research papers behind popular models.
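For the tip about scheduled runs, one way to track popularity over time is to compare the datasets from two runs. The sketch below is a minimal example that assumes each run's results have been loaded as a list of records with the `modelId` and `downloads` fields described above. `download_changes` is a hypothetical helper, not part of the actor.

```python
def download_changes(previous: list, current: list) -> list:
    """Return per-model download changes between two runs, largest gain first."""
    # Index the earlier run by model ID, skipping records without one
    # (such as a fallback record for an empty result set).
    before = {
        record["modelId"]: record.get("downloads", 0)
        for record in previous
        if "modelId" in record
    }
    changes = []
    for record in current:
        model_id = record.get("modelId")
        # Only models present in both runs can be compared.
        if model_id in before:
            change = record.get("downloads", 0) - before[model_id]
            changes.append({"modelId": model_id, "change": change})
    return sorted(changes, key=lambda item: item["change"], reverse=True)
```

Models that appear in only one of the two runs are left out here. Depending on your analysis, you may prefer to report them separately as new or dropped entries.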