Pricing

$5.00 / 1,000 models scraped

Hugging Face Scraper - Models Datasets Spaces

Scrape Hugging Face models, datasets, and Spaces. Extracts metadata, downloads, likes, tags, and usage stats. Ideal for AI model discovery, competitive analysis, and tracking trending ML resources.

Developer

OpenClaw Mara

Last modified

2 months ago

🤗 Hugging Face Scraper — AI Models, Datasets & Spaces

Structured data from the world's largest open-source AI hub. $0.005 per item.

Scrape Hugging Face for models, datasets, and Spaces. Search by task, library, author, or keyword. Extract model cards, download counts, likes, tags, pipeline tags, library info, and full metadata. No authentication required — powered by Hugging Face's public API.

Perfect for AI market research, competitive intelligence on open-source AI, RAG pipelines over model cards, and monitoring the ML ecosystem in real time.

🚀 What does this Actor do?

Hugging Face has become the registry for open-source AI. This Actor turns it into a structured data source you can automate in four modes:

models — Browse the full model registry. Filter by task (text-generation, image-classification, automatic-speech-recognition and 16 more), author, search query, and sort order (trending, downloads, likes, lastModified, created).
datasets — Discover ML datasets with metadata: size, downloads, tags, likes.
spaces — List deployed ML demos and apps on HF Spaces.
model_details — Deep-dive into specific models by ID. Returns full model cards, pipeline tag, library info, tensor types, and download statistics.

Everything comes back as clean JSON, ready to drop into a vector DB, a dashboard, or a fine-tuning pipeline.

💡 Use Cases

1. AI market research & trend tracking

Track which open-source models are gaining traction week-over-week. Run the Actor weekly with sort: "trending" and compare downloads and likes between runs.

{
  "mode": "models",
  "task": "text-generation",
  "sort": "trending",
  "limit": 100
}
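
Comparing two weekly runs can be done client-side once both result sets are saved. The sketch below is one possible approach, not part of the Actor: compare_runs is a hypothetical helper, and it assumes each item carries the id, downloads, and likes fields shown in the output example.

```python
from typing import Dict, List


def compare_runs(previous: List[dict], current: List[dict]) -> dict:
    """Report models that are new in `current` and per-model changes for the rest."""
    prev_by_id: Dict[str, dict] = {item["id"]: item for item in previous}

    # Models that did not appear in the previous run
    new_ids = [item["id"] for item in current if item["id"] not in prev_by_id]

    # Changes for models present in both runs
    deltas = []
    for item in current:
        old = prev_by_id.get(item["id"])
        if old is None:
            continue
        deltas.append({
            "id": item["id"],
            "downloads_delta": item["downloads"] - old["downloads"],
            "likes_delta": item["likes"] - old["likes"],
        })

    # Largest download gain first
    deltas.sort(key=lambda d: d["downloads_delta"], reverse=True)
    return {"new": new_ids, "deltas": deltas}
```

Keeping new entries separate from deltas avoids treating a model's full download count as a one-week gain.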

2. Competitive monitoring of AI labs

Watch specific organizations — Meta, Google, Mistral, Stability AI, Alibaba, DeepSeek — for new releases.

{
  "mode": "models",
  "author": "meta-llama",
  "sort": "lastModified",
  "limit": 50
}
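
To turn that listing into a "new releases" alert, one option is to filter results by their created timestamp. This is a minimal sketch with hypothetical helper names (parse_ts, released_since); it assumes created is an ISO 8601 string with a trailing "Z", as in the output example.

```python
from datetime import datetime, timezone
from typing import List


def parse_ts(value: str) -> datetime:
    """Parse an ISO timestamp such as 2025-01-15T12:30:00.000Z."""
    # Rewriting the "Z" suffix keeps fromisoformat working on Python versions before 3.11
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def released_since(items: List[dict], cutoff: datetime) -> List[dict]:
    """Keep only items created at or after the cutoff (cutoff must be timezone-aware)."""
    return [item for item in items if parse_ts(item["created"]) >= cutoff]
```

For example, released_since(items, datetime(2025, 1, 1, tzinfo=timezone.utc)) keeps only models first published in 2025 or later.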

3. RAG / fine-tuning corpus from model cards

Pull full model cards for a curated list of models and load them into a vector store to power an "AI knowledge assistant."

{
  "mode": "model_details",
  "modelIds": [
    "meta-llama/Llama-3.1-8B",
    "mistralai/Mistral-7B-v0.3",
    "google/gemma-2-9b"
  ]
}
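
Before embedding, model cards usually need to be split into smaller pieces. The following is a rough sketch of character-based chunking with overlap; chunk_model_cards and its parameters are illustrative names, and a production pipeline might split on Markdown headings or tokens instead.

```python
from typing import List


def chunk_model_cards(items: List[dict], max_chars: int = 1000, overlap: int = 100) -> List[dict]:
    """Split each non-empty modelCard into overlapping character chunks."""
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be >= 0 and smaller than max_chars")

    step = max_chars - overlap
    chunks = []
    for item in items:
        text = (item.get("modelCard") or "").strip()
        if not text:
            continue  # models without a README come back with an empty card

        start, n = 0, 0
        while True:
            chunks.append({
                "chunk_id": f'{item["id"]}#{n}',
                "model_id": item["id"],
                "text": text[start:start + max_chars],
            })
            if start + max_chars >= len(text):
                break  # this chunk reached the end of the card
            start += step
            n += 1
    return chunks
```

Each chunk keeps the model ID so search hits in the vector store can be traced back to the model they came from.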

4. ML dataset discovery for training pipelines

Find datasets by keyword and download volume, which is useful for shortlisting candidates for fine-tuning or evaluation.

{
  "mode": "datasets",
  "search": "instruction",
  "sort": "downloads",
  "limit": 50
}

📊 Output Example

{
  "id": "meta-llama/Llama-3.1-8B",
  "author": "meta-llama",
  "pipeline_tag": "text-generation",
  "downloads": 4523891,
  "likes": 1253,
  "tags": ["pytorch", "safetensors", "llama", "text-generation", "en"],
  "created": "2024-06-18T00:00:00.000Z",
  "lastModified": "2025-01-15T12:30:00.000Z",
  "library_name": "transformers",
  "modelCard": "Llama 3.1 is a family of large language models...",
  "task": "text-generation"
}

⚙️ Input Parameters

Parameter	Type	Description
`mode`	enum	`models`, `datasets`, `spaces`, or `model_details` (required)
`search`	string	Keyword search — e.g. `"llama"`, `"sentiment"`, `"bert"`
`author`	string	Filter by org/user — `"meta-llama"`, `"google"`, `"mistralai"`, `"openai-community"`
`task`	enum	19 ML tasks: `text-generation`, `image-classification`, `translation`, `summarization`, `fill-mask`, `text-to-image`, `automatic-speech-recognition`, and more
`sort`	enum	`trending`, `downloads`, `likes`, `lastModified`, `created`
`limit`	int	1–1000 (default 50)
`modelIds`	array	For `model_details` mode: `["meta-llama/Llama-3.1-8B", "google/gemma-2-9b"]`
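
It can help to check an input object against the table above before starting a run. The sketch below is a hypothetical client-side validator, not the Actor's own schema. It assumes modelIds is mandatory in model_details mode, and it does not check task because only some of the 19 task values are listed here.

```python
from typing import List

MODES = {"models", "datasets", "spaces", "model_details"}
SORTS = {"trending", "downloads", "likes", "lastModified", "created"}


def validate_input(run_input: dict) -> List[str]:
    """Return a list of problems found in the input; an empty list means it looks valid."""
    errors = []

    mode = run_input.get("mode")
    if mode not in MODES:
        errors.append(f"mode must be one of {sorted(MODES)}")

    sort = run_input.get("sort")
    if sort is not None and sort not in SORTS:
        errors.append(f"sort must be one of {sorted(SORTS)}")

    limit = run_input.get("limit", 50)  # documented default
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= 1000:
        errors.append("limit must be an integer between 1 and 1000")

    # Assumption: model_details cannot run without at least one model ID
    if mode == "model_details" and not run_input.get("modelIds"):
        errors.append("modelIds is required in model_details mode")

    return errors
```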

📤 Output Fields

Field	Description
`id`	Full model/dataset/space ID (`author/name`)
`author`	Organization or user that published it
`pipeline_tag`	Primary ML task
`downloads`	Total download count
`likes`	Community likes
`tags`	Array of framework, license, language, and architecture tags
`library_name`	Primary library (`transformers`, `diffusers`, `sentence-transformers`, etc.)
`created` / `lastModified`	ISO timestamps for monitoring freshness
`modelCard`	Full README content (in `model_details` mode)

💰 Pricing & Performance

Pay-per-event: $0.005 per item scraped (model, dataset, space, or model detail).
Typical monthly cost: about $2–$5 for weekly tracking (roughly four runs per month) of 100–250 top models.
Speed: ~100 items/minute in list modes, ~30 items/minute in model_details (each call fetches the full model card).
No HF account / token required — uses the public API.
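
The per-item price and the approximate speeds above are enough for a quick budget estimate. This is a small worked example with hypothetical function names, assuming four runs per month for a weekly schedule.

```python
PRICE_PER_ITEM_USD = 0.005  # per-item price quoted above
ITEMS_PER_MINUTE = {"list": 100, "model_details": 30}  # approximate speeds quoted above


def monthly_cost_usd(items_per_run: int, runs_per_month: int) -> float:
    """Cost of a recurring schedule, rounded to cents."""
    return round(items_per_run * runs_per_month * PRICE_PER_ITEM_USD, 2)


def run_minutes(items: int, kind: str = "list") -> float:
    """Rough run duration; kind is "list" or "model_details"."""
    return items / ITEMS_PER_MINUTE[kind]


if __name__ == "__main__":
    for items in (100, 250):
        cost = monthly_cost_usd(items, runs_per_month=4)
        print(f"{items} items weekly: ${cost:.2f}/month, about {run_minutes(items):.1f} min per run")
```

With four runs per month, 100 items per run comes to $2.00 and 250 items per run comes to $5.00.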

🔌 Integrations

Zapier / Make / n8n — schedule weekly trend scans and push deltas to Slack, Notion, or Airtable.
LangChain / LlamaIndex — feed model_details output straight into a RAG pipeline to build an "AI model advisor."
Vector DBs (Pinecone, Weaviate, Qdrant, pgvector) — embed modelCard content for semantic search over the open-source AI landscape.
Apify SDK / webhooks — run on a schedule and POST new trending entries to your own endpoint.
Google Sheets / BigQuery — export to CSV via Apify's dataset export and build dashboards on top.
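
For scripted integrations, a run can also be started from Python with the apify-client package. Treat this as a sketch under assumptions: the Actor ID is a placeholder because this page does not state it, the token is read from an APIFY_TOKEN environment variable by convention, and build_trending_input and run_actor are illustrative helper names.

```python
import os
from typing import List

ACTOR_ID = "<username>/<actor-name>"  # placeholder: replace with this Actor's real ID


def build_trending_input(task: str, limit: int = 100) -> dict:
    """Input for the trend-tracking use case described earlier."""
    return {"mode": "models", "task": task, "sort": "trending", "limit": limit}


def run_actor(token: str, actor_id: str, run_input: dict) -> List[dict]:
    """Start a run, wait for it to finish, and return the dataset items."""
    # Imported here so the helpers above work without apify-client installed
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    items = run_actor(os.environ["APIFY_TOKEN"], ACTOR_ID, build_trending_input("text-generation"))
    print(f"Fetched {len(items)} items")
```

The returned list can then be posted to a webhook, written to a sheet, or passed to an embedding step.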

❓ FAQ

Do I need a Hugging Face account or token? No. The Actor uses the public HF API, so there is no account to create and no token to manage.

How fresh is the data? Real-time. Every run hits the HF API live. Trending rankings, download counts, and new releases appear as soon as HF publishes them.

Can I get the full model card text? Yes — use mode: "model_details" with modelIds. The Actor fetches each model's full README/model card.

What's the difference between downloads and trending? downloads = all-time cumulative. trending = HF's internal momentum signal (recent downloads + likes velocity). Use trending to catch rising stars before they hit top-downloads lists.

Can I filter by license (Apache, MIT, Llama-license)? Not directly in input, but license shows up in the tags array of each result — you can filter client-side.
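
Client-side license filtering might look like the sketch below. It assumes the Actor passes through Hugging Face's tags unchanged, where the license appears as a tag of the form license:apache-2.0; license_of and filter_by_license are hypothetical helper names.

```python
from typing import Iterable, List, Optional


def license_of(item: dict) -> Optional[str]:
    """Return the license ID from a "license:<id>" tag, or None if no such tag exists."""
    for tag in item.get("tags", []):
        if tag.startswith("license:"):
            return tag.split(":", 1)[1]
    return None


def filter_by_license(items: List[dict], allowed: Iterable[str]) -> List[dict]:
    """Keep items whose license tag matches one of the allowed IDs (case-insensitive)."""
    allowed_lower = {name.lower() for name in allowed}
    return [item for item in items if (license_of(item) or "").lower() in allowed_lower]
```

Items without a license tag are dropped by this filter, which is the cautious choice when the goal is to shortlist permissively licensed models.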

Why are some model cards empty? A small fraction of models on HF don't ship a README. Those come back with modelCard: "". Everything else is populated.

🔑 Keywords

Hugging Face scraper, AI model database, ML model tracker, open source AI data, LLM directory, Hugging Face API alternative, model cards extraction, AI trending models, Hugging Face datasets scraper, Hugging Face Spaces scraper, transformer models data, AI ecosystem monitoring, ML model comparison, fine-tuning dataset discovery, AI competitive intelligence, RAG over model cards.

📝 Changelog

v1.0 — Initial release. 4 modes (models, datasets, spaces, model_details), 19 task filters, 5 sort options, up to 1000 results per run.
