Hugging Face Model Scraper

Pricing

Pay per event

Developed by

ParseForge

Maintained by Community

Collect models from Hugging Face Hub via public API endpoints. Get metadata including author, downloads, likes, lastModified, task, library, license, tags and filenames.

Last modified

5 days ago

🤖 Hugging Face Intelligence Scraper (Models)

Collect model intelligence from Hugging Face Hub via public API endpoints. Get metadata including author, downloads, likes, lastModified, task, library, license, tags and filenames. Built for analysts, researchers, and developers who need fast insights with no browser automation.
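To give a feel for what "public API endpoints" means here, the sketch below builds a request URL for the Hub's public model listing endpoint. How the Actor queries the Hub internally is not documented on this page, so treat the endpoint usage and the helper name `build_models_url` as illustrative assumptions rather than the Actor's actual implementation.

```python
from urllib.parse import urlencode

# Public Hub listing endpoint. Whether the Actor calls it exactly
# this way is an assumption made for illustration.
HF_MODELS_ENDPOINT = "https://huggingface.co/api/models"


def build_models_url(search: str, sort: str = "downloads", limit: int = 20) -> str:
    """Return a URL that lists models matching `search`, highest `sort` value first."""
    params = {
        "search": search,
        "sort": sort,
        "direction": -1,  # the Hub API uses -1 for descending order
        "limit": limit,
    }
    return f"{HF_MODELS_ENDPOINT}?{urlencode(params)}"


url = build_models_url("bert")
print(url)
```

Opening the printed URL in a browser returns plain JSON, which is why no browser automation is needed for this kind of collection.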

🎯 What does it collect?

✅ Model id, name, URL
✅ Author
✅ Downloads, likes
✅ Last modified, createdAt
✅ Task (pipeline tag), library
✅ License, tags

How to use

Example run: query "bert", 20 items, sorted by downloads.

Input

Fields supported:

  • query (string): free text search
  • task (string): e.g., text-classification, image-classification, text-generation
  • library (string): e.g., transformers, diffusers, timm
  • license (string): e.g., apache-2.0, mit, cc-by-4.0
  • language (string): e.g., en, zh, multi
  • sort (enum): downloads | likes | lastModified | trending
  • direction (enum): asc | desc
  • maxItems (integer): max models to return

Here is an example input written in JSON:

{
  "query": "bert",
  "sort": "downloads",
  "direction": "desc",
  "maxItems": 100
}

Pro Tip: Combine multiple filters to narrow down results. For example, search for "bert" models with task "text-classification" and library "transformers" for highly targeted results.
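The combined filter from the tip can be assembled and checked before you paste it into the input form. This is a minimal sketch: the field names and allowed values come from the list above, while the helper `build_input` and its validation rules are hypothetical and not part of the Actor.

```python
import json

# Allowed values taken from the documented input fields.
ALLOWED_SORT = {"downloads", "likes", "lastModified", "trending"}
ALLOWED_DIRECTION = {"asc", "desc"}
FILTER_FIELDS = ("query", "task", "library", "license", "language")


def build_input(sort="downloads", direction="desc", max_items=100, **filters):
    """Validate values against the documented fields and return the input dict."""
    if sort not in ALLOWED_SORT:
        raise ValueError(f"sort must be one of {sorted(ALLOWED_SORT)}")
    if direction not in ALLOWED_DIRECTION:
        raise ValueError(f"direction must be one of {sorted(ALLOWED_DIRECTION)}")
    if not isinstance(max_items, int) or max_items < 1:
        raise ValueError("max_items must be a positive integer")
    unknown = set(filters) - set(FILTER_FIELDS)
    if unknown:
        raise ValueError(f"unknown filter fields: {sorted(unknown)}")
    # Keep only the filters that were actually provided.
    actor_input = {name: filters[name] for name in FILTER_FIELDS if filters.get(name)}
    actor_input.update(sort=sort, direction=direction, maxItems=max_items)
    return actor_input


actor_input = build_input(query="bert", task="text-classification", library="transformers")
print(json.dumps(actor_input, indent=2))
```

Catching a misspelled sort value locally is cheaper than discovering it after a run has started.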

Output

After the Actor finishes its run, you'll get a dataset with the output. The number of items in the dataset is capped by the maxItems value you set. You can download the results as CSV, Excel, or JSON.

Here's an example of scraped Hugging Face model data:

{
  "id": "google-bert/bert-base-uncased",
  "name": "google-bert/bert-base-uncased",
  "url": "https://huggingface.co/google-bert/bert-base-uncased",
  "author": "google-bert",
  "downloads": 54018364,
  "likes": 2423,
  "private": false,
  "gated": false,
  "disabled": false,
  "sha": "86b5e0934494bd15c9632b12f734a8a67f723594",
  "lastModified": "2024-02-19T11:06:12.000Z",
  "createdAt": "2022-03-02T23:29:04.000Z",
  "task": "fill-mask",
  "library": "transformers",
  "license": "apache-2.0",
  "language": ["en"],
  "datasets": ["bookcorpus", "wikipedia"],
  "tags": ["exbert"],
  "files": [
    ".gitattributes",
    "LICENSE",
    "README.md",
    "config.json",
    "model.safetensors",
    "pytorch_model.bin",
    "tokenizer.json",
    "tokenizer_config.json",
    "vocab.txt"
  ]
}

What You Get: Complete model metadata including popularity metrics (downloads, likes), technical details (task, library, license), training information (datasets, language), and available model files.

Download Options: CSV, Excel, or JSON formats for easy analysis in your business tools.
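If you download JSON and want to build your own flat table from it, list-valued fields such as language, datasets, tags and files need to be collapsed into single cells. The sketch below shows one way to do that with the standard library. The helper names and the ";" separator are assumptions for illustration, and the result is not necessarily the same layout as the platform's own CSV export.

```python
import csv
import io

# Fields that hold lists in the example output above.
LIST_FIELDS = ("language", "datasets", "tags", "files")


def flatten_record(record: dict) -> dict:
    """Join list-valued fields with ';' so the record fits on one CSV row."""
    flat = dict(record)
    for field in LIST_FIELDS:
        flat[field] = ";".join(record.get(field) or [])
    return flat


def to_csv(records: list) -> str:
    """Render records as CSV text, using the first record's keys as columns."""
    rows = [flatten_record(r) for r in records]
    if not rows:
        return ""
    buffer = io.StringIO()
    # Keys missing from later rows become empty cells; extra keys are ignored.
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# A shortened copy of the example record shown above.
sample = {
    "id": "google-bert/bert-base-uncased",
    "downloads": 54018364,
    "likes": 2423,
    "license": "apache-2.0",
    "language": ["en"],
    "datasets": ["bookcorpus", "wikipedia"],
    "tags": ["exbert"],
}
print(to_csv([sample]))
```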

⚡ Why choose this scraper?

✅ API-first, fast: Uses Hugging Face public API endpoints (no browser)
✅ Flexible filtering: query, task, library, license, language, sorting
✅ Comprehensive data: Get downloads, likes, tasks, licenses, files, and more
✅ User-Friendly: No coding needed, just set filters and go

⏰ Time Savings: Save hours compared to manual model research and tracking
💰 Cost Efficiency: Fraction of the cost of maintaining custom tracking infrastructure

🔧 How to use

  1. 📝 Sign Up: Create a free Apify account (takes 2 minutes)
  2. 🔍 Find the Scraper: Visit the Hugging Face Intelligence Scraper page
  3. ⚙️ Set Input: Add your filters and max items
  4. 🚀 Run It: Click "Start" and let it collect your data
  5. 📥 Download Data: Get your results in the "Dataset" tab as CSV, Excel, or JSON

⏱️ Total Time: 5 minutes setup, 10-30 minutes for data collection
🎯 No Technical Skills Required: Everything is point-and-click

Business Use Cases

AI/ML Researchers:

  • Track trending models in your research area
  • Monitor model popularity metrics (downloads, likes)
  • Identify popular architectures and libraries
  • Discover datasets used for training

ML Engineers:

  • Find production-ready models for specific tasks
  • Compare models by popularity and recency
  • Identify licensing requirements before deployment
  • Track model updates and new releases

Data Scientists:

  • Build comprehensive model catalogs
  • Analyze AI/ML trends and adoption patterns
  • Identify suitable pre-trained models for projects
  • Monitor emerging techniques and libraries

Product Managers:

  • Track competitive AI/ML landscape
  • Monitor adoption of different model types
  • Identify popular solutions for product features
  • Support AI strategy with market intelligence

Integrate with any app and automate your workflow

Hugging Face Intelligence Scraper can be connected with almost any cloud service or web app thanks to integrations on the Apify platform.

Alternatively, you can use webhooks to carry out an action whenever an event occurs, e.g. get a notification whenever a run successfully finishes.
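As a rough sketch of the receiving side, the function below reads a webhook request body and pulls out the dataset ID of a successfully finished run. It assumes the platform's default webhook payload, with `eventType` and `resource` keys. If you customise the payload template, the keys will differ. The function name and the example ID are placeholders.

```python
import json
from typing import Optional


def dataset_id_from_webhook(body: str) -> Optional[str]:
    """Return the run's default dataset ID, or None for any other event type."""
    payload = json.loads(body)
    # Only react to runs that finished successfully.
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")


# Placeholder payload for illustration; real payloads carry more fields.
example_body = json.dumps(
    {
        "eventType": "ACTOR.RUN.SUCCEEDED",
        "resource": {"defaultDatasetId": "example-dataset-id"},
    }
)
print(dataset_id_from_webhook(example_body))
```

The returned ID can then be used to download the results or to trigger the next step in your own pipeline.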

Using with the Apify API

For advanced users who want to automate this process, you can control the scraper programmatically with the Apify API. This allows you to schedule regular data collection and integrate with your existing business tools.

  • Node.js: Install the apify-client NPM package
  • Python: Use the apify-client PyPI package
  • See the Apify API reference for full details
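A minimal Python sketch of such a programmatic run is shown below, using the apify-client package. The Actor ID is a hypothetical placeholder (copy the real one from the Actor's page), the helper names are illustrative, and the token is read from an `APIFY_TOKEN` environment variable, which is a convention assumed here rather than a requirement.

```python
import os

# Hypothetical Actor ID: replace it with the ID shown on the Actor's page.
ACTOR_ID = "parseforge/hugging-face-model-scraper"


def build_run_input(query: str, max_items: int = 100) -> dict:
    """Build an input object using the documented input fields."""
    return {
        "query": query,
        "sort": "downloads",
        "direction": "desc",
        "maxItems": max_items,
    }


def run_scraper(token: str, run_input: dict) -> list:
    """Start the Actor, wait for it to finish, and return the dataset items."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("The Actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    token = os.environ.get("APIFY_TOKEN")
    if token:
        for model in run_scraper(token, build_run_input("bert", max_items=20)):
            print(model["id"], model["downloads"])
    else:
        print("Set APIFY_TOKEN to run this example")
```

Keeping the client import inside `run_scraper` lets the input-building helper be reused and tested on machines where apify-client is not installed.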

💰 Pricing

  • Start price: $0.005 per run
  • Price per 1,000 results: $5.00 (i.e., $0.005 per result)

Non-paying users must set maxItems, and the value cannot exceed 100. Paying users can set maxItems as high as 1,000,000; if they leave it undefined, the number of results is unlimited.
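To estimate what a run might cost from the two prices above, the arithmetic can be sketched as follows. It assumes a run is charged the start price once plus the per-result price for every item returned. Any other platform charges are not covered, and the function name is illustrative.

```python
from decimal import Decimal

# Prices as listed above, in USD.
START_PRICE = Decimal("0.005")
PRICE_PER_RESULT = Decimal("0.005")  # $5.00 per 1,000 results


def estimate_cost(results: int) -> Decimal:
    """Estimate the event charges for one run returning `results` items."""
    if results < 0:
        raise ValueError("results must not be negative")
    return START_PRICE + PRICE_PER_RESULT * results


print(estimate_cost(100))   # 0.505
print(estimate_cost(1000))  # 5.005
```

Decimal is used instead of float so that sub-cent prices add up exactly.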

Frequently Asked Questions

Q: How accurate is the data? A: We collect data directly from Hugging Face's public API in real time, so results reflect what the Hub reports at the moment of the run.

Q: Can I schedule regular runs? A: Yes! Use the Apify scheduler or API to schedule daily, weekly, or monthly runs automatically. Perfect for tracking model trends over time.

Q: What's the rate limit? A: We respect Hugging Face's API limits. The scraper handles rate limiting automatically.

Q: Can I get model descriptions and READMEs? A: Currently, the scraper focuses on metadata. For full READMEs, you can use the model URLs provided in the output.

Q: What if I need help? A: Our support team is available. Contact us through the Apify platform.

Q: Is my data secure? A: Absolutely. All data is encrypted in transit and at rest. We never share your data with third parties.

Need Help? Our support team is here to help you get the most out of this tool.