Hugging Face Models Scraper - AI/ML Data

Pricing

from $2.00 / 1,000 results

Search Hugging Face for AI/ML models or datasets by keyword and get structured data: id, author, task, downloads, likes, library, tags, license and dates. Fast and reliable via the public Hugging Face Hub API. For AI/ML market research, model discovery and trend tracking.

Rating: 0.0 (0)

Developer: ben (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 3 days ago

🤗 Hugging Face Models Scraper

Search Hugging Face for AI/ML models or datasets by keyword and get clean, structured data: id, author, task (pipeline tag), downloads, likes, library, tags, license, created/updated dates and URL. Powered by the public Hugging Face Hub API, so it's fast and reliable: no browser, no login, no API key, no blocks.

Built for AI/ML market research, model discovery, trend tracking and building model/dataset catalogs. Export to JSON/CSV/Excel, run on a schedule, call via API, or connect to Make, Zapier or n8n.

🔎 What is the Hugging Face Models Scraper?

Give it keywords (e.g. "llama", "whisper") and it returns matching models (or datasets) as structured rows, sorted by downloads, likes, trending score or last-modified date, and optionally filtered by task. It is well suited to finding the most popular models in a niche and tracking how they move over time.

What data does it extract?

  • Id, author and name
  • Task (text-generation, ASR, image-classification, …) and library
  • Downloads (recent + all-time) and likes
  • Trending score
  • Tags and license
  • Gated / private flags
  • Created and last-modified dates and the URL

⬇️ Input

| Field | Type | Description |
| --- | --- | --- |
| searchTerms | array | Keywords to search, e.g. llama. |
| type | string | model or dataset. |
| sort | string | downloads, likes, lastModified or trendingScore. |
| task | string | Optional pipeline tag, e.g. text-generation. |
| maxPerTerm | integer | Max results per term. Default 25. |

Example input

{
  "searchTerms": ["llama", "mistral"],
  "type": "model",
  "sort": "downloads",
  "maxPerTerm": 50
}
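
The input fields above can also be assembled and checked in code before a run is started. The following is a minimal sketch, not the actor's own validation: the helper name `build_input` is hypothetical, the allowed values are copied from the input table, and leaving out `task` when it is unset is an assumption about how the actor treats missing optional fields.

```python
from typing import Iterable, Optional

# Allowed values as listed in the input table.
ALLOWED_TYPES = {"model", "dataset"}
ALLOWED_SORTS = {"downloads", "likes", "lastModified", "trendingScore"}


def build_input(search_terms: Iterable[str], type_: str = "model",
                sort: str = "downloads", task: Optional[str] = None,
                max_per_term: int = 25) -> dict:
    """Return an input dict using the field names from the input table.

    Hypothetical helper: it only checks values locally and never calls the actor.
    """
    terms = [t.strip() for t in search_terms if t and t.strip()]
    if not terms:
        raise ValueError("searchTerms needs at least one non-empty keyword")
    if type_ not in ALLOWED_TYPES:
        raise ValueError(f"type must be one of {sorted(ALLOWED_TYPES)}")
    if sort not in ALLOWED_SORTS:
        raise ValueError(f"sort must be one of {sorted(ALLOWED_SORTS)}")
    if max_per_term < 1:
        raise ValueError("maxPerTerm must be a positive integer")

    run_input = {
        "searchTerms": terms,
        "type": type_,
        "sort": sort,
        "maxPerTerm": max_per_term,
    }
    if task:  # assumption: the optional field is simply omitted when unset
        run_input["task"] = task
    return run_input
```

Calling `build_input(["llama", "mistral"], max_per_term=50)` produces the same dictionary as the example input, and a typo such as `sort="stars"` fails locally before any paid run is started.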

⬆️ Output

One record per model (or per dataset, when type is dataset):

{
  "id": "meta-llama/Llama-3.1-8B-Instruct",
  "type": "model",
  "author": "meta-llama",
  "name": "Llama-3.1-8B-Instruct",
  "task": "text-generation",
  "library": "transformers",
  "downloads": 3120044,
  "likes": 3815,
  "trending_score": 41,
  "tags": ["transformers", "safetensors", "llama", "conversational"],
  "license": "llama3.1",
  "gated": "manual",
  "last_modified": "2026-05-12T10:21:33.000Z",
  "url": "https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct",
  "query": "llama"
}
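
Records in this shape can be loaded into typed objects and ranked with the standard library alone. The sketch below assumes an exported JSON array of such records; `ModelRecord`, `parse_records` and `top_by_downloads` are illustrative names, and the second sample row is made up purely to have something to rank against.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelRecord:
    """A subset of the output fields shown above."""
    id: str
    author: str
    task: Optional[str]
    downloads: int
    likes: int
    license: Optional[str]
    url: str


def parse_records(raw_json: str) -> List[ModelRecord]:
    """Parse a JSON array of output records, tolerating missing optional fields."""
    return [
        ModelRecord(
            id=row["id"],
            author=row.get("author", ""),
            task=row.get("task"),
            downloads=int(row.get("downloads") or 0),
            likes=int(row.get("likes") or 0),
            license=row.get("license"),
            url=row.get("url", ""),
        )
        for row in json.loads(raw_json)
    ]


def top_by_downloads(records: List[ModelRecord], n: int = 10) -> List[ModelRecord]:
    """Return the n records with the highest download counts."""
    return sorted(records, key=lambda r: r.downloads, reverse=True)[:n]


# First row mirrors the example output; the second is a made-up placeholder.
SAMPLE = json.dumps([
    {"id": "meta-llama/Llama-3.1-8B-Instruct", "author": "meta-llama",
     "task": "text-generation", "downloads": 3120044, "likes": 3815,
     "license": "llama3.1",
     "url": "https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct"},
    {"id": "example-org/placeholder-model", "author": "example-org",
     "downloads": 1200, "likes": 3},
])

if __name__ == "__main__":
    for rec in top_by_downloads(parse_records(SAMPLE), n=5):
        print(rec.id, rec.downloads, rec.likes)
```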

💡 Use cases

  • 🤖 AI/ML research: find the most-downloaded models for a task.
  • 📈 Trend tracking: monitor likes/downloads over time.
  • 🗂️ Catalogs: build a dataset of models for analysis or a dashboard.
  • 🔌 LLM / app pipelines: feed structured model metadata into your tools.

❓ FAQ

Do I need an API key or login? No, it uses the public Hugging Face Hub API.

Models and datasets? Both are supported; set type to model or dataset.

Can I filter by task? Yes, set task (pipeline tag) for models.

How is it sorted? By downloads, likes, trending score or last-modified date, depending on sort.

Does it include license info? Yes, parsed from the model tags.

How does pricing work? Pay per model returned. No subscription.

Is it legal? It uses the public Hugging Face Hub API. Use responsibly and within their terms.
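
The FAQ answer on license info says the value is parsed from the model tags. One plausible way to do that, assuming the Hub exposes the license as a tag of the form `license:<identifier>`, is sketched below. The actor's real parsing logic is not shown in this listing, and `license_from_tags` is a hypothetical helper.

```python
from typing import Iterable, Optional


def license_from_tags(tags: Iterable[str]) -> Optional[str]:
    """Return the license identifier from the first 'license:<id>' tag, if any.

    Assumes licenses appear as tags prefixed with 'license:'; returns None
    when no such tag exists or the identifier is empty.
    """
    prefix = "license:"
    for tag in tags:
        if tag.startswith(prefix):
            value = tag[len(prefix):].strip()
            return value or None
    return None
```

A model whose tags include `license:apache-2.0` would yield `apache-2.0`, while a model with no license tag yields `None`, so downstream code should treat the field as optional.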

⚙️ How it works

The scraper calls the Hugging Face Hub API directly and returns clean rows, with no browser, no login and no API key to manage. That keeps runs fast, cheap and dependable, and it's why the actor keeps passing its daily health check instead of breaking on an anti-bot wall. You give it keywords, choose a sort order and a limit, and it requests the full model metadata and de-duplicates as it goes. The same input shape works whether you want the top 10 models or thousands across many queries: only maxPerTerm changes.
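
The flow described above (one Hub API request per keyword, then de-duplication) can be approximated as follows. This is a sketch under assumptions, not the actor's source: it uses the public `/api/models` and `/api/datasets` endpoints with the `search`, `sort`, `direction`, `limit`, `full` and `filter` query parameters, and the function names are illustrative. How the actor maps its input onto these parameters is not stated in the listing.

```python
import json
from typing import Dict, Iterable, List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

HUB_API = "https://huggingface.co/api"


def build_search_url(term: str, type_: str = "model", sort: str = "downloads",
                     task: Optional[str] = None, limit: int = 25) -> str:
    """Build a Hub search URL for one keyword (descending order, full metadata)."""
    endpoint = "models" if type_ == "model" else "datasets"
    params = {"search": term, "sort": sort, "direction": -1,
              "limit": limit, "full": "true"}
    if task and type_ == "model":
        params["filter"] = task  # assumption: the task is passed as a tag filter
    return f"{HUB_API}/{endpoint}?{urlencode(params)}"


def fetch_json(url: str) -> List[Dict]:
    """Fetch and decode one page of results (requires network access)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def dedupe_by_id(rows: Iterable[Dict]) -> List[Dict]:
    """Keep the first occurrence of each id, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique.append(row)
    return unique


if __name__ == "__main__":
    rows = []
    for term in ["llama", "mistral"]:
        rows.extend(fetch_json(build_search_url(term, limit=5)))
    for row in dedupe_by_id(rows):
        print(row["id"], row.get("downloads"))
```

De-duplicating by `id` matters because the same repository can match several keywords; keeping the first occurrence preserves the sort order of the earlier query.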

👥 Who uses Hugging Face data?

Model and dataset metadata is valuable to ML engineers, researchers, founders and analysts. A researcher finds the strongest baselines for a task; a founder tracks which open models are gaining traction; an analyst builds a leaderboard of downloads and likes; a tool maker feeds the structured data into a recommender or dashboard. Because every record is plain JSON with consistent fields, it drops straight into a spreadsheet, database, BI tool or LLM pipeline with no custom parsing.

📤 Export, schedule & integrate

Every run is saved to a dataset you can export to JSON, CSV, Excel, XML or RSS, or pull through the Apify API. Wire it into Make, Zapier, n8n, Google Sheets, Slack or your own database, run it on a schedule (hourly, daily or weekly) to keep your data fresh, and call it from AI agents through the Apify MCP server.
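
For the "pull through the Apify API" route, a run can be started and its rows returned in one HTTP call. The sketch below assumes the Apify API v2 `run-sync-get-dataset-items` endpoint and a bearer token in the `Authorization` header. The actor ID in the usage example is a placeholder, since the listing does not state the real one, and the helper names are hypothetical.

```python
import json
from urllib.request import Request, urlopen

APIFY_API = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict,
                      fmt: str = "json") -> Request:
    """Build a POST request that runs an actor and returns its dataset items.

    actor_id uses the 'username~actor-name' form; fmt selects the export format.
    """
    url = f"{APIFY_API}/acts/{actor_id}/run-sync-get-dataset-items?format={fmt}"
    return Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(actor_id: str, token: str, run_input: dict) -> list:
    """Send the request and decode the JSON rows (needs network and a valid token)."""
    with urlopen(build_run_request(actor_id, token, run_input), timeout=300) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Placeholder actor ID and token: replace both with real values.
    rows = run_actor(
        "someuser~hugging-face-models-scraper",
        "YOUR_APIFY_TOKEN",
        {"searchTerms": ["llama"], "type": "model", "sort": "downloads", "maxPerTerm": 10},
    )
    print(len(rows), "rows")
```

The synchronous endpoint suits short runs; for large `maxPerTerm` values across many keywords, starting the run asynchronously and reading the dataset afterwards is likely the safer pattern.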

💡 Tips for best results

  • Sort by trendingScore to catch rising models early.
  • Use task to focus on one modality (e.g. automatic-speech-recognition).
  • Schedule recurring runs and diff the output to track download/like growth.
  • Combine model + dataset runs to map a whole research area.
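
The tip about diffing scheduled runs can be sketched as a comparison of two snapshots keyed by `id`. This assumes each snapshot is a list of output records with `id`, `downloads` and `likes` fields; the function name `growth` and the sample numbers are illustrative only.

```python
from typing import Dict, List


def growth(previous: List[Dict], current: List[Dict]) -> List[Dict]:
    """Compute download and like deltas for ids present in both snapshots.

    Rows that appear only in the current snapshot are skipped, since there is
    no baseline to compare against. Results are sorted by download growth.
    """
    prev_by_id = {row["id"]: row for row in previous}
    deltas = []
    for row in current:
        old = prev_by_id.get(row["id"])
        if old is None:
            continue  # new in this run, no baseline yet
        deltas.append({
            "id": row["id"],
            "downloads_delta": row["downloads"] - old["downloads"],
            "likes_delta": row["likes"] - old["likes"],
        })
    return sorted(deltas, key=lambda d: d["downloads_delta"], reverse=True)


if __name__ == "__main__":
    # Made-up numbers for illustration.
    last_week = [{"id": "org/a", "downloads": 100, "likes": 5},
                 {"id": "org/b", "downloads": 50, "likes": 1}]
    this_week = [{"id": "org/a", "downloads": 150, "likes": 7},
                 {"id": "org/b", "downloads": 300, "likes": 1},
                 {"id": "org/c", "downloads": 10, "likes": 0}]
    for d in growth(last_week, this_week):
        print(d)
```

A delta can be negative if the download figure is a rolling-window count instead of an all-time total, so it is worth checking which of the two a given field holds before reading a drop as lost interest.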

❓ More FAQ

How fresh is the data? It is fetched live on each run, so schedule runs to keep it current.

Can I get more results? Yes, raise maxPerTerm; it requests more from the Hub.

Can I run it automatically? Yes, use Apify Schedules (cron) for hands-off runs.

Which export formats? JSON, CSV, Excel, XML and RSS, plus the Apify API.

Can AI agents use it? Yes, it's available via the Apify API and MCP server.

Keywords: hugging face scraper, huggingface api, ai models data, ml model metadata, model downloads, model discovery, llm research, ai market research, huggingface datasets, model leaderboard, transformers, ai trends, machine learning data, model catalog