Pricing

from $2.00 / 1,000 results

Hugging Face Scraper - Models, Datasets, Papers

Hugging Face data export tool: scrape models, datasets & daily papers without a token. Export to CSV/JSON. A no-login Hugging Face API alternative.

Developer

Logiover

Last modified: 6 days ago

Hugging Face Scraper — Models, Datasets, Spaces & Daily Papers (No Token)

Scrape and export every AI model, dataset, space and daily research paper from the Hugging Face Hub — the world's largest open AI repository (1M+ models, 200k+ datasets). Filter by task, library, language, license and author; sort by downloads, likes or trending; and get flat, export-ready rows with downloads, likes, license, tags, timestamps and optional full model cards. Reads the public HF Hub API — no token, no login, no proxy.

🏆 Why this Hugging Face scraper?

20+ fields per record · thousands of models/datasets per run · reads the public Hugging Face Hub API (no token) · filter by task / library / language / license / author · optional README + sibling files · export to JSON / CSV / Excel. The no-login Hugging Face API alternative for AI/ML tracking, RAG corpora and VC AI intelligence.

✨ What this Actor does / Key features

🤗 5 entity types — models (~1M+), datasets (~200k+), spaces (hosted demos), daily curated research papers and collections.
🔎 Deep filtering — free-text search, author/org, task (pipeline tag), library (transformers, diffusers, GGUF, MLX, ONNX…), language and arbitrary tags.
📊 Popularity signals — downloads, likes and HF trending score, plus minDownloads / minLikes client-side thresholds to drop abandoned items.
🔀 Flexible sorting — by downloads, likes, recently updated, recently created or trending, ascending or descending.
🗓️ Recency & date windows — modifiedFrom for "what's new", plus dedicated papersStartDate / papersEndDate for the daily-papers archive.
📄 Rich per-item detail — enable fetchDetails and fetchReadme to add the full model/dataset card, sibling file list, license, gated status and citation.
🏷️ Parsed HF tag system — library, languages, base model, datasets and license extracted from the Hub's tag list.
⚡ Auto-pagination — set maxResults to 0 and the Actor walks the entire matching catalog for you.
🔑 No token, no login, no proxy — the public HF Hub API serves metadata anonymously.
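The minDownloads / minLikes thresholds are described as client-side, and modifiedFrom drops older items. If you want to re-filter an exported dataset yourself (for example, to try a stricter threshold without re-running), a minimal Python sketch like the one below works on rows shaped like the output sample. `apply_thresholds` is a hypothetical helper, not the Actor's own code, and treating a missing counter as 0 is an assumption.

```python
from datetime import date
from typing import Iterable, List, Optional


def apply_thresholds(
    rows: Iterable[dict],
    min_downloads: int = 0,
    min_likes: int = 0,
    modified_from: Optional[str] = None,  # "YYYY-MM-DD", like the modifiedFrom input
) -> List[dict]:
    """Hypothetical post-filter mirroring minDownloads / minLikes / modifiedFrom."""
    cutoff = date.fromisoformat(modified_from) if modified_from else None
    kept = []
    for row in rows:
        # Missing counters count as 0, so such rows fail any positive threshold.
        if (row.get("downloads") or 0) < min_downloads:
            continue
        if (row.get("likes") or 0) < min_likes:
            continue
        if cutoff is not None:
            stamp = row.get("lastModified")
            # Timestamps look like "2026-06-28T14:02:41.000Z"; the first 10 chars are the date.
            if not stamp or date.fromisoformat(stamp[:10]) < cutoff:
                continue
        kept.append(row)
    return kept
```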

🚀 Quick start (3 steps)

Configure — choose an Entity Type (models, datasets, spaces, daily papers, collections) and add filters (search, author, task, library, language).
Run — click Start. The Actor auto-paginates the matching catalog and streams rows into your dataset.
Get your data — open the Output tab and export to JSON, CSV, Excel or XML, or pull it via the Apify API.
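Step 3 mentions pulling results via the Apify API. Here is a minimal standard-library sketch, assuming Apify's `run-sync-get-dataset-items` endpoint and bearer-token auth. The Actor ID `logiover~huggingface-scraper` is a hypothetical placeholder (copy the real one from the Actor's API tab), and the request is only built here, not sent.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
# Hypothetical Actor ID: replace with the real "username~actor-name" from the Store page.
ACTOR_ID = "logiover~huggingface-scraper"


def build_run_request(run_input: dict, token: str, fmt: str = "json") -> urllib.request.Request:
    """Build (but do not send) a synchronous run that returns the dataset items."""
    query = urllib.parse.urlencode({"format": fmt})
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # an Apify API token, not a Hugging Face token
        },
        method="POST",
    )


run_input = {"entityType": "models", "pipelineTag": "text-generation", "library": "gguf",
             "sort": "downloads", "sortDirection": "-1", "maxResults": 500}
req = build_run_request(run_input, token="MY_APIFY_TOKEN")
# To actually execute the run: urllib.request.urlopen(req).read()
```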

📥 Input

Pick an entityType and add whichever filters you need — everything else is optional.

Example — top text-generation GGUF models by downloads

{
  "entityType": "models",
  "pipelineTag": "text-generation",
  "library": "gguf",
  "sort": "downloads",
  "sortDirection": "-1",
  "maxResults": 500
}

Example — track one org's latest releases

{
  "entityType": "models",
  "author": "mistralai",
  "sort": "lastModified",
  "sortDirection": "-1",
  "fetchDetails": true,
  "fetchReadme": true,
  "maxResults": 200
}

Example: daily research papers from the last 30 days

{
  "entityType": "papers",
  "papersStartDate": "2026-06-06",
  "papersEndDate": "2026-07-06",
  "sort": "trendingScore",
  "maxResults": 0
}
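The example window above spans the 30 days ending on the run date, which matches the documented default for papersStartDate / papersEndDate. A small sketch for generating that window on a schedule; `papers_window` is a hypothetical helper, and whether the Actor treats both end dates as inclusive is an assumption.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def papers_window(today: Optional[date] = None, days: int = 30) -> Tuple[str, str]:
    """Return (papersStartDate, papersEndDate) as YYYY-MM-DD strings for the last `days` days."""
    end = today or date.today()
    start = end - timedelta(days=days)
    return start.isoformat(), end.isoformat()


start, end = papers_window(date(2026, 7, 6))
papers_input = {"entityType": "papers", "papersStartDate": start, "papersEndDate": end,
                "sort": "trendingScore", "maxResults": 0}
```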

Field	Type	Description
`entityType`	string (enum)	`models`, `datasets`, `spaces`, `papers` or `collections`.
`search`	string	Free-text search over name + description (e.g. `llama`, `whisper`, `mistral`).
`author`	string	Restrict to a single author/org (e.g. `mistralai`, `meta-llama`, `stabilityai`).
`pipelineTag`	string	Filter models by task, e.g. `text-generation`, `automatic-speech-recognition`, `text-to-image`.
`library`	string	Filter by library, e.g. `transformers`, `diffusers`, `gguf`, `mlx`, `onnx`.
`language`	string	Filter by language (ISO 639-1: `en`, `fr`, `de`, `tr`, `zh`… or `multilingual`).
`tags`	array	Restrict to items whose tag list contains all of these, e.g. `["license:apache-2.0","llama"]`.
`sort`	string (enum)	`downloads`, `likes`, `lastModified`, `createdAt` or `trendingScore`.
`sortDirection`	string (enum)	`-1` descending (default) or `1` ascending.
`maxResults`	integer	Hard cap on records. `0` = unlimited (auto-paginates the full catalog).
`fetchDetails`	boolean	Fetch richer per-item fields (file list, model/dataset card data, gated status, license).
`fetchReadme`	boolean	Also pull the raw README/model card (requires `fetchDetails`).
`minDownloads` / `minLikes`	integer	Drop entities below this download/like threshold.
`modifiedFrom`	string	Drop items last-modified before this date (`YYYY-MM-DD`).
`papersStartDate` / `papersEndDate`	string	Daily-papers date window (`papers` only; defaults to last 30 days).
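The field table implies a few cross-field rules: fetchReadme needs fetchDetails, dates use YYYY-MM-DD, and the papers dates only apply to `papers`. A pre-flight check like the sketch below can catch mistakes before you spend a run. `validate_input` is hypothetical and encodes only what the table states; the Actor's real input schema may enforce more or fewer constraints.

```python
from datetime import date

ENTITY_TYPES = {"models", "datasets", "spaces", "papers", "collections"}
SORT_FIELDS = {"downloads", "likes", "lastModified", "createdAt", "trendingScore"}
SORT_DIRECTIONS = {"-1", "1"}


def validate_input(run_input: dict) -> list:
    """Return a list of problems found in a run input (empty list = looks valid)."""
    problems = []
    if run_input.get("entityType") not in ENTITY_TYPES:
        problems.append("entityType must be one of " + ", ".join(sorted(ENTITY_TYPES)))
    if "sort" in run_input and run_input["sort"] not in SORT_FIELDS:
        problems.append("unknown sort field")
    # sortDirection defaults to "-1" (descending) per the table.
    if str(run_input.get("sortDirection", "-1")) not in SORT_DIRECTIONS:
        problems.append("sortDirection must be '-1' or '1'")
    max_results = run_input.get("maxResults", 0)
    if isinstance(max_results, bool) or not isinstance(max_results, int) or max_results < 0:
        problems.append("maxResults must be a non-negative integer (0 = unlimited)")
    if run_input.get("fetchReadme") and not run_input.get("fetchDetails"):
        problems.append("fetchReadme requires fetchDetails")
    for key in ("modifiedFrom", "papersStartDate", "papersEndDate"):
        if key in run_input:
            try:
                date.fromisoformat(run_input[key])
            except (TypeError, ValueError):
                problems.append(f"{key} must be YYYY-MM-DD")
    uses_paper_dates = "papersStartDate" in run_input or "papersEndDate" in run_input
    if uses_paper_dates and run_input.get("entityType") != "papers":
        problems.append("papersStartDate / papersEndDate apply to entityType 'papers' only")
    return problems
```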

📤 Output

One row per model / dataset / space / paper — exportable to JSON, CSV, Excel or XML. The Output tab also ships a ready-made Overview table. Here is a trimmed sample record:

{
  "entityType": "model",
  "id": "meta-llama/Meta-Llama-3-8B-Instruct",
  "name": "Meta-Llama-3-8B-Instruct",
  "author": "meta-llama",
  "title": "Meta Llama 3 8B Instruct",
  "description": "Meta Llama 3 family of instruction-tuned models.",
  "downloads": 1893421,
  "likes": 3921,
  "trendingScore": 128,
  "pipelineTag": "text-generation",
  "libraryName": "transformers",
  "tags": ["text-generation", "safetensors", "llama", "license:llama3"],
  "languages": ["en"],
  "baseModel": "meta-llama/Meta-Llama-3-8B",
  "license": "llama3",
  "gated": true,
  "private": false,
  "disabled": false,
  "fileCount": 17,
  "createdAt": "2024-04-17T09:35:12.000Z",
  "lastModified": "2026-06-28T14:02:41.000Z",
  "url": "https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct",
  "scrapedAt": "2026-07-06T12:00:00.000Z"
}
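In the sample record, `license` ("llama3") corresponds to the `license:llama3` entry in `tags`, which is how the Hub's prefixed tag system encodes such facets. A minimal sketch of grouping tags by their prefix; `split_prefixed_tags` is a hypothetical helper, and the Actor's own parsing rules (for example, how it detects bare language or library tags) may differ.

```python
from collections import defaultdict
from typing import Dict, List


def split_prefixed_tags(tags: List[str]) -> Dict[str, List[str]]:
    """Group "prefix:value" tags by prefix; tags without a colon go under the "" key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for tag in tags:
        # Split on the first colon only, so values may themselves contain colons.
        prefix, sep, value = tag.partition(":")
        if sep:
            groups[prefix].append(value)
        else:
            groups[""].append(tag)
    return dict(groups)


sample_tags = ["text-generation", "safetensors", "llama", "license:llama3"]
groups = split_prefixed_tags(sample_tags)
license_value = (groups.get("license") or [None])[0]
```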

💡 Use cases

AI / ML model tracking — monitor new and trending models for a task or library (e.g. all text-generation GGUF models) over time.
Competitive AI intelligence — watch a specific org's releases (mistralai, meta-llama, stabilityai) by downloads and likes.
Dataset discovery — enumerate training/evaluation datasets by language, task or license for sourcing.
RAG / fine-tuning corpora — pull model and dataset cards (READMEs) at scale to index the Hub.
Research & VC scouting — track daily papers and rising models to spot emerging AI trends early.
License auditing — filter by license to build a compliant shortlist of models for commercial use.
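For the license-auditing use case, a shortlist can be built from exported rows by checking the `license` field against an allowlist. The allowlist below is a hypothetical example: deciding which licenses are acceptable for commercial use is a legal judgement for your team, not something this sketch settles.

```python
from typing import Iterable, List, Set

# Hypothetical allowlist; adjust to your own legal review.
ALLOWED_LICENSES: Set[str] = {"apache-2.0", "mit", "bsd-3-clause"}


def license_shortlist(rows: Iterable[dict], allowed: Set[str] = ALLOWED_LICENSES) -> List[str]:
    """Return ids of rows whose license (case-insensitive) is in the allowlist."""
    return [
        row["id"]
        for row in rows
        if (row.get("license") or "").lower() in allowed
    ]
```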

👥 Who uses it

AI/ML engineers & researchers · MLOps and platform teams · RAG and fine-tuning builders indexing the Hub · VC and market analysts scouting AI trends · data journalists tracking the open-model landscape · developer-tool vendors powering model-search features.

💰 Pricing

This Actor runs on a simple pay-per-result model — you pay for the records you extract, with no separate Apify platform fees to calculate. Try it on the free tier first, then scale up. See the Pricing tab on this page for the current rate.
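As a rough worked estimate at the listed starting rate of $2.00 per 1,000 results: 500 results cost 500 × 2.00 / 1,000 = $1.00, and the full ~1M-model catalog would be about $2,000. The sketch assumes a flat per-result rate; check the Pricing tab for the rate that actually applies to your plan.

```python
def estimate_cost(results: int, usd_per_1000: float = 2.00) -> float:
    """Rough pay-per-result estimate; the default is the "from $2.00 / 1,000 results" starting rate."""
    return round(results * usd_per_1000 / 1000, 2)
```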

❓ Frequently Asked Questions

Is it legal to scrape Hugging Face? The Actor reads publicly available Hub metadata served by the official public API. You are responsible for using the data in compliance with Hugging Face's terms and each repo's model/dataset license.

Does Hugging Face have a public API? Is this a Hugging Face API alternative? Yes to both. Hugging Face exposes a public Hub API, and this Actor wraps it and returns flat, export-ready rows, so it works as a no-login Hugging Face API alternative for teams that just want bulk metadata without writing API calls or handling pagination.

Do I need a Hugging Face token or login? No. The public Hub API serves model, dataset, space and paper metadata anonymously, so no Hugging Face token, account or proxy is required; you only need an Apify account to run the Actor.

Can I scrape Hugging Face without a token? Yes. The Actor reads the public Hub API anonymously, so you can scrape Hugging Face without a token, login or proxy — as a dataset-list scraper, model scraper or daily-papers scraper.

How do I export Hugging Face data to CSV or JSON? Run the Actor with your filters, then download the resulting dataset as CSV, JSON, JSONL, Excel or XML from the run page or via the Apify API.
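If you download JSON and want a CSV locally (for example, after post-filtering), list fields such as `tags` and `languages` need flattening. A minimal sketch using Python's `csv` module; joining list values with "|" is an assumption for illustration, and Apify's own CSV export may lay out arrays differently.

```python
import csv
import io
from typing import Iterable, List


def rows_to_csv(rows: Iterable[dict], columns: List[str]) -> str:
    """Serialise the selected columns to CSV text, joining list values with "|"."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys not in `columns`; missing keys become empty cells.
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({
            key: "|".join(map(str, value)) if isinstance(value, list) else value
            for key, value in row.items()
        })
    return buffer.getvalue()
```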

How much data can I get? The full models catalog holds over 1M items and the datasets catalog over 200k. Set maxResults to 0 to auto-paginate the entire matching catalog, or narrow with filters to keep runs fast and focused.

Can I scrape all models for a specific task or library? Yes. Set the pipeline tag (e.g. text-generation) and/or library (e.g. gguf, diffusers), set maxResults to 0, and the Actor paginates the whole matching catalog.
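The maxResults semantics (a hard cap, or 0 for the whole matching catalog) boil down to a cap-or-exhaust pagination loop. The sketch below shows that logic against a hypothetical cursor-based `fetch_page` callback; the cursor contract is an assumption for illustration, not how the Actor or the Hub API necessarily paginates.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical contract: fetch_page(cursor) -> (items, next_cursor); next_cursor is None on the last page.
PageFetcher = Callable[[Optional[str]], Tuple[List[dict], Optional[str]]]


def paginate(fetch_page: PageFetcher, max_results: int = 0) -> Iterator[dict]:
    """Yield items page by page; stop after max_results, or exhaust all pages when it is 0."""
    cursor: Optional[str] = None
    yielded = 0
    while True:
        items, cursor = fetch_page(cursor)
        for item in items:
            yield item
            yielded += 1
            # Stop as soon as the cap is hit, without fetching another page.
            if max_results and yielded >= max_results:
                return
        if cursor is None:
            return


# Tiny in-memory stand-in for a paginated API, used for demonstration.
pages = {None: ([{"id": 1}, {"id": 2}], "p2"), "p2": ([{"id": 3}, {"id": 4}], "p3"), "p3": ([{"id": 5}], None)}
calls: List[Optional[str]] = []


def fake_fetch(cursor: Optional[str]) -> Tuple[List[dict], Optional[str]]:
    calls.append(cursor)
    return pages[cursor]
```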

Can I pull model cards / READMEs? Yes. Enable fetchDetails and fetchReadme to add the raw model/dataset card and richer fields (file list, license, gated status) to each record — ideal for RAG and indexing.

How do I track a specific organization's model releases? Enter the org name in the author field (e.g. mistralai or meta-llama) and sort by lastModified to monitor that team's latest models and datasets by date.

🔗 More AI & research intelligence scrapers by logiover

Building an AI-intelligence pipeline? Pair the HF Hub scraper with the rest of the suite:

Scraper	Actor
📦 npm packages	npm Package Intelligence Scraper
🐙 GitHub	GitHub Repository Scraper · GitHub Activity Stream
📄 Papers	arXiv Paper Scraper · Semantic Scholar Research Scraper
📰 News	Google News Scraper · News Intelligence Scraper
🔬 Research	AI Deep Research · Company Deep Research Scraper
🛡️ Security	CVE Security Advisory Monitor
🔎 Web extract	AI Web Extract · AI Web Search
📚 Docs	Docs Knowledge Base Scraper
🔑 SEO	SERP Keyword Research

👉 Browse all logiover scrapers on Apify Store — 180+ actors across real estate, jobs, crypto, social media & B2B data.

⏰ Scheduling & integration

Schedule this Actor on Apify to keep an always-fresh view of the open-source AI landscape: new models this week, a rival org's releases, or the daily-papers feed. Export results to JSON, CSV or Excel, pull them through the Apify API, or connect the dataset to Google Sheets, webhooks or your data warehouse via Make, n8n or Zapier.
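With scheduled runs, "what's new since last time" is a comparison of the `id` field across two exported datasets. A minimal sketch of that comparison; `new_since_last_run` is a hypothetical helper, and where you keep the previous run's rows (a file, a key-value store, a warehouse table) is up to you.

```python
from typing import Iterable, List


def new_since_last_run(previous: Iterable[dict], current: Iterable[dict]) -> List[dict]:
    """Return rows whose id did not appear in the previous run, keeping the current run's order."""
    seen = {row["id"] for row in previous}
    return [row for row in current if row["id"] not in seen]
```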

⭐ Support & feedback

Found a bug or need an extra field? Open an issue on the Issues tab — response is usually fast. If this Actor saves you time, a ★★★★★ review on the Store page genuinely helps and is hugely appreciated. 🙏

⚖️ Legal

This Actor extracts only publicly available Hub metadata and is intended for legitimate research, analytics and tooling use. You are responsible for complying with Hugging Face's terms of service, each repository's license, and any applicable local laws.

📝 Changelog

2026-07-06

✨ README overhaul: richer output sample with the real schema, ready-to-run example scenarios, full field reference, AI-intelligence suite cross-links, and clearer quick-start.

2026-07-01

Maintenance pass: re-verified end-to-end on live data within the 5-minute quality window on the default input. Sharpened Store SEO metadata and expanded the FAQ with high-intent long-tail questions. Added ready-to-run example tasks.

2026-06-15

Reliability pass: re-verified end-to-end on live data with real-world inputs. Routine maintenance build.

2026-06-07

Docs: added coverage for Hugging Face API alternative, exporting Hub data to CSV/JSON, and scraping Hugging Face without a token.

Hugging Face Scraper — AI Models & Datasets

hichemdev/huggingface-scraper

Scrape Hugging Face models and datasets: downloads, likes, task, library, tags, author and dates. Track trending AI models via the official Hugging Face API.

Hichem Ben Moussa

Hugging Face Trending Scraper

funny_electrician/Korak1903

Hugging Face Trending Scraper: Tracks daily trending models and datasets to provide market intelligence.

Milton Gardener

Hugging Face Insights Scraper — Models, Datasets & Spaces

brilliant_gum/huggingface-insights-scraper

Scrape Hugging Face models, datasets, spaces, and daily papers with downloads, likes, parameters, tags, and growth tracking between runs. Filter by pipeline, library, author, or keyword.

Yuliia Kulakova

Hugging Face Models Scraper

fetch_cat/hugging-face-models-scraper

🤗 Scrape public Hugging Face model metadata, downloads, likes, tags, licenses, and update signals for AI market research.

Hanna Nosova

Hugging Face Scraper

straightforward_hydra/huggingface-scraper

AI model intelligence from the open Hugging Face Hub API: trending models, datasets and spaces by task, author and library. No API key.

Dev D

Hugging Face Models Scraper — Search, Downloads, Likes, Tags

seemuapps/huggingface-models-scraper

Search Hugging Face for models by task, tag, or keyword and export downloads, likes, library, license, and tags to a clean dataset.

Andrew

Hugging Face Models Scraper - AI/ML Data

benthepythondev/huggingface-models-scraper

Search Hugging Face for AI/ML models or datasets by keyword and get structured data: id, author, task, downloads, likes, library, tags, license and dates. Fast and reliable via the public Hugging Face Hub API. For AI/ML market research, model discovery and trend tracking.

Ben

Hugging Face Scraper - Trending Models, Datasets & Spaces

arjunannamalai/huggingface-trending-scraper

Scrape trending, most-downloaded and most-liked Hugging Face models, datasets and spaces. Filter by author, task or keyword. No token required.

Arjun Annamalai

Hugging Face Model & Dataset Scraper

cloud9_ai/huggingface-scraper

Search and extract ML models and datasets from Hugging Face Hub. Get model cards, download stats, tasks, and architectures. No API key needed.

cloud9

Hugging Face Scraper - Models Datasets Spaces

openclawmara/huggingface-scraper

Scrape Hugging Face models, datasets, and Spaces. Extracts metadata, downloads, likes, tags, and usage stats. Ideal for AI model discovery, competitive analysis, and tracking trending ML resources.