Hugging Face Models Scraper - AI/ML Data
Pricing
from $2.00 / 1,000 results
Search Hugging Face for AI/ML models or datasets by keyword and get structured data: id, author, task, downloads, likes, library, tags, license and dates. Fast and reliable via the public Hugging Face Hub API. For AI/ML market research, model discovery and trend tracking.
Developer
ben
Maintained by Community
Hugging Face Models Scraper
Search Hugging Face for AI/ML models or datasets by keyword and get clean, structured data: id, author, task (pipeline tag), downloads, likes, library, tags, license, created/updated dates and URL. Powered by the public Hugging Face Hub API, so it's fast and reliable: no browser, no login, no API key, no blocks.
Built for AI/ML market research, model discovery, trend tracking and building model/dataset catalogs. Export to JSON/CSV/Excel, run on a schedule, call via API, or connect to Make, Zapier or n8n.
What is the Hugging Face Models Scraper?
Give it keywords (e.g. "llama", "whisper") and it returns matching models (or datasets) as structured rows, sorted by downloads, likes, trending or last-modified, and optionally filtered by task. Perfect for finding the most popular models in a niche and tracking how they move over time.
What data does it extract?
- Id, author and name
- Task (text-generation, ASR, image-classification, …) and library
- Downloads (recent + all-time) and likes
- Trending score
- Tags and license
- Gated / private flags
- Created and last-modified dates and the URL
Input
| Field | Type | Description |
|---|---|---|
| `searchTerms` | array | Keywords to search, e.g. `llama`. |
| `type` | string | `model` or `dataset`. |
| `sort` | string | `downloads`, `likes`, `lastModified` or `trendingScore`. |
| `task` | string | Optional pipeline tag, e.g. `text-generation`. |
| `maxPerTerm` | integer | Max results per term. Default 25. |
Example input
{
  "searchTerms": ["llama", "mistral"],
  "type": "model",
  "sort": "downloads",
  "maxPerTerm": 50
}
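
As a rough illustration of the input rules in the table above, the following Python sketch validates values before a run is started. `build_input` and its argument names are hypothetical helpers and not part of the actor; only the field names, the allowed values and the default of 25 come from the table.

```python
from typing import Any, Dict, Iterable, Optional

# Allowed values, copied from the input table.
ALLOWED_TYPES = {"model", "dataset"}
ALLOWED_SORTS = {"downloads", "likes", "lastModified", "trendingScore"}


def build_input(
    search_terms: Iterable[str],
    type_: str = "model",
    sort: str = "downloads",
    task: Optional[str] = None,
    max_per_term: int = 25,
) -> Dict[str, Any]:
    """Hypothetical helper: check the values and return the actor input as a dict."""
    terms = [t.strip() for t in search_terms if t and t.strip()]
    if not terms:
        raise ValueError("searchTerms needs at least one non-empty keyword")
    if type_ not in ALLOWED_TYPES:
        raise ValueError(f"type must be one of {sorted(ALLOWED_TYPES)}")
    if sort not in ALLOWED_SORTS:
        raise ValueError(f"sort must be one of {sorted(ALLOWED_SORTS)}")
    if max_per_term < 1:
        raise ValueError("maxPerTerm must be a positive integer")

    payload: Dict[str, Any] = {
        "searchTerms": terms,
        "type": type_,
        "sort": sort,
        "maxPerTerm": max_per_term,
    }
    if task:  # task is optional, so it is only sent when set
        payload["task"] = task
    return payload
```

Calling `build_input(["llama", "mistral"], max_per_term=50)` yields the same dictionary as the example input above.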
Output
One record per model or dataset:
{
  "id": "meta-llama/Llama-3.1-8B-Instruct",
  "type": "model",
  "author": "meta-llama",
  "name": "Llama-3.1-8B-Instruct",
  "task": "text-generation",
  "library": "transformers",
  "downloads": 3120044,
  "likes": 3815,
  "trending_score": 41,
  "tags": ["transformers", "safetensors", "llama", "conversational"],
  "license": "llama3.1",
  "gated": "manual",
  "last_modified": "2026-05-12T10:21:33.000Z",
  "url": "https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct",
  "query": "llama"
}
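
To show how a record like the one above might be consumed, here is a minimal Python sketch that loads it into a typed object and parses the timestamp. The `ModelRecord` class and `parse_record` function are hypothetical names chosen for illustration; the field names are taken from the sample record, and the assumption that missing counts should default to 0 is mine.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ModelRecord:
    """Illustrative subset of the output fields shown in the sample record."""
    id: str
    author: str
    task: Optional[str]
    downloads: int
    likes: int
    license: Optional[str]
    last_modified: Optional[datetime]


def parse_record(raw: Dict[str, Any]) -> ModelRecord:
    """Convert one raw output row (a dict) into a ModelRecord."""
    stamp = raw.get("last_modified")
    # The sample uses a trailing "Z"; swapping it for "+00:00" keeps
    # datetime.fromisoformat happy on Python versions before 3.11.
    parsed = datetime.fromisoformat(stamp.replace("Z", "+00:00")) if stamp else None
    return ModelRecord(
        id=raw["id"],
        author=raw.get("author", ""),
        task=raw.get("task"),
        downloads=int(raw.get("downloads") or 0),
        likes=int(raw.get("likes") or 0),
        license=raw.get("license"),
        last_modified=parsed,
    )
```

Fields that the sketch does not model (for example `tags` or `query`) are simply ignored, so the same function keeps working if a run returns extra keys.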
Use cases
- AI/ML research: find the most-downloaded models for a task.
- Trend tracking: monitor likes/downloads over time.
- Catalogs: build a dataset of models for analysis or a dashboard.
- LLM / app pipelines: feed structured model metadata into your tools.
FAQ
Do I need an API key or login? No. It uses the public Hugging Face Hub API.
Models and datasets? Both. Set `type` to `model` or `dataset`.
Can I filter by task? Yes. Set `task` (the pipeline tag) when searching models.
How is it sorted? By downloads, likes, trending score or last-modified date, chosen with `sort`.
Does it include license info? Yes. It is parsed from the model tags.
How does pricing work? Pay per result returned. No subscription.
Is it legal? It uses the public Hugging Face Hub API. Use responsibly and within Hugging Face's terms.
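
The FAQ says the license is parsed from the model tags. One plausible way to do that is sketched below in Python, assuming the Hub's convention of tags shaped like `license:apache-2.0`. The function name `license_from_tags` is hypothetical, and the actor's actual parsing may differ.

```python
from typing import Iterable, Optional


def license_from_tags(tags: Iterable[str]) -> Optional[str]:
    """Return the value of the first 'license:<name>' tag, or None if absent."""
    prefix = "license:"
    for tag in tags:
        if tag.startswith(prefix):
            # An empty value after the prefix is treated as "no license".
            return tag[len(prefix):] or None
    return None
```

A model without any license tag comes back as `None`, which is easy to filter on when building a catalog.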
How it works
The scraper calls the Hugging Face Hub API directly and returns clean rows, with no browser, login or API key to manage. That keeps runs fast, cheap and dependable, and it's why the actor keeps passing its daily health check instead of breaking on an anti-bot wall. You give it keywords, choose a sort and a limit, and it requests the full model metadata and de-duplicates as it goes. The same input shape works whether you want the top 10 models or thousands across many queries; only `maxPerTerm` changes.
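
The flow described above (build a request per keyword, fetch full metadata, de-duplicate by id) can be sketched in Python roughly as follows. This is not the actor's source code. The endpoint `https://huggingface.co/api/models` and the `search`, `sort`, `direction`, `limit` and `full` query parameters belong to the public Hub API; passing the task as `pipeline_tag`, and the helper names `build_search_url`, `fetch` and `dedupe`, are assumptions made for this example.

```python
import json
from typing import Any, Dict, Iterable, List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

HUB_API = "https://huggingface.co/api"


def build_search_url(term: str, type_: str = "model", sort: str = "downloads",
                     task: Optional[str] = None, limit: int = 25) -> str:
    """Build a Hub search URL for one keyword (hypothetical helper)."""
    params: Dict[str, Any] = {
        "search": term,
        "sort": sort,
        "direction": -1,   # descending order
        "limit": limit,
        "full": "true",    # ask for full metadata
    }
    if task and type_ == "model":
        params["pipeline_tag"] = task  # assumed parameter name for the task filter
    return f"{HUB_API}/{type_}s?{urlencode(params)}"


def fetch(url: str) -> List[Dict[str, Any]]:
    """Fetch one page of results; needs network access."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def dedupe(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first occurrence of each id, preserving order."""
    seen = set()
    unique = []
    for rec in records:
        if rec["id"] not in seen:
            seen.add(rec["id"])
            unique.append(rec)
    return unique
```

De-duplicating on `id` matters when several keywords match the same model: a search for "llama" and another for "instruct" can both return the same repository.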
Who uses Hugging Face data?
Model and dataset metadata is valuable to ML engineers, researchers, founders and analysts. A researcher finds the strongest baselines for a task; a founder tracks which open models are gaining traction; an analyst builds a leaderboard of downloads and likes; a tool maker feeds the structured data into a recommender or dashboard. Because every record is plain JSON with consistent fields, it drops straight into a spreadsheet, database, BI tool or LLM pipeline with no custom parsing.
Export, schedule & integrate
Every run is saved to a dataset you can export to JSON, CSV, Excel, XML or RSS, or pull through the Apify API. Wire it into Make, Zapier, n8n, Google Sheets, Slack or your own database, run it on a schedule (hourly, daily or weekly) to keep your data fresh, and call it from AI agents through the Apify MCP server.
Tips for best results
- Sort by `trendingScore` to catch rising models early.
- Use `task` to focus on one modality (e.g. `automatic-speech-recognition`).
- Schedule recurring runs and diff the output to track download/like growth.
- Combine model + dataset runs to map a whole research area.
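
The tip about diffing scheduled runs could look something like the Python sketch below. It assumes each run has been exported as a list of records with `id` and `downloads` fields, as in the sample output; `download_growth` is a hypothetical helper name.

```python
from typing import Any, Dict, Iterable, List, Tuple


def download_growth(previous: Iterable[Dict[str, Any]],
                    current: Iterable[Dict[str, Any]]) -> List[Tuple[str, int]]:
    """Return (id, download delta) pairs for ids present in both runs, largest gain first."""
    before = {rec["id"]: rec.get("downloads", 0) for rec in previous}
    growth = []
    for rec in current:
        if rec["id"] in before:
            growth.append((rec["id"], rec.get("downloads", 0) - before[rec["id"]]))
    return sorted(growth, key=lambda item: item[1], reverse=True)
```

Models that appear only in the newer run are skipped here because they have no baseline; listing them separately as "new entries" is a reasonable extension.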
More FAQ
How fresh is the data? It is fetched live on each run, so schedule runs to keep it current.
Can I get more results? Yes. Raise `maxPerTerm` and it requests more from the Hub.
Can I run it automatically? Yes. Use Apify Schedules (cron) for hands-off runs.
Which export formats? JSON, CSV, Excel, XML and RSS, plus the Apify API.
Can AI agents use it? Yes. It's available via the Apify API and MCP server.
You might also like
- GitHub Repository Scraper: repos, stars & topics.
- PyPI Package Scraper: Python package data.
- arXiv Papers Scraper: AI/ML research papers.
Keywords: hugging face scraper, huggingface api, ai models data, ml model metadata, model downloads, model discovery, llm research, ai market research, huggingface datasets, model leaderboard, transformers, ai trends, machine learning data, model catalog