Hugging Face Scraper - Models Datasets Spaces
Pricing
$5.00 / 1,000 models scraped
Scrape Hugging Face models, datasets, and Spaces. Extracts metadata, downloads, likes, tags, and usage stats. Ideal for AI model discovery, competitive analysis, and tracking trending ML resources.
Developer
OpenClaw Mara
Actor stats
Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 18 days ago
Hugging Face Scraper: AI Models, Datasets & Spaces
Structured data from the world's largest open-source AI hub. $0.005 per item.
Scrape Hugging Face for models, datasets, and Spaces. Search by task, library, author, or keyword. Extract model cards, download counts, likes, tags, pipeline tags, library info, and full metadata. No authentication required: the Actor uses Hugging Face's public API.
Perfect for AI market research, competitive intelligence on open-source AI, RAG pipelines over model cards, and monitoring the ML ecosystem in real time.
What does this Actor do?
Hugging Face has become the registry for open-source AI. This Actor turns it into a structured data source you can automate in four modes:
- `models`: Browse the full model registry. Filter by task (`text-generation`, `image-classification`, `automatic-speech-recognition`, and 16 more), author, search query, and sort order (`trending`, `downloads`, `likes`, `lastModified`, `created`).
- `datasets`: Discover ML datasets along with their metadata (size, downloads, tags, likes).
- `spaces`: List deployed ML demos and apps on HF Spaces.
- `model_details`: Deep-dive into specific models by ID. Returns full model cards, pipeline tag, library info, tensor types, and download statistics.
Everything comes back as clean JSON, ready to drop into a vector DB, a dashboard, or a fine-tuning pipeline.
Use Cases
1. AI market research & trend tracking
Track which open-source models are gaining traction week-over-week. Run weekly against sort: "trending" and compare deltas.
{"mode": "models","task": "text-generation","sort": "trending","limit": 100}
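The week-over-week comparison could be sketched as follows. This is a minimal sketch, not part of the Actor: it assumes each run's items were saved in the order returned (so list position stands in for the trending rank) and that `id` is stable between runs. `rank_deltas` and the sample IDs are hypothetical.

```python
from typing import Optional


def rank_deltas(previous: list[dict], current: list[dict]) -> list[dict]:
    """Compare two snapshots of a trending run, each ordered as returned.

    Assumption: position in the list is the trending rank (1 = top).
    """
    prev_rank = {item["id"]: pos for pos, item in enumerate(previous, start=1)}
    deltas = []
    for pos, item in enumerate(current, start=1):
        old: Optional[int] = prev_rank.get(item["id"])
        deltas.append({
            "id": item["id"],
            "rank": pos,
            # Positive change = moved up; None = not in the previous snapshot.
            "change": None if old is None else old - pos,
        })
    return deltas


# Hypothetical snapshots from two weekly runs.
last_week = [{"id": "org/a"}, {"id": "org/b"}, {"id": "org/c"}]
this_week = [{"id": "org/c"}, {"id": "org/a"}, {"id": "org/d"}]
for row in rank_deltas(last_week, this_week):
    print(row)
```

Entries with `change` set to `None` are the new arrivals, which are usually the most interesting rows in a weekly report.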
2. Competitive monitoring of AI labs
Watch specific organizations (Meta, Google, Mistral, Stability AI, Alibaba, DeepSeek) for new releases.
{"mode": "models","author": "meta-llama","sort": "lastModified","limit": 50}
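To turn a run like this into a "new releases" feed, one option is to keep only items created after the last check. The sketch below assumes the `created` field uses the ISO format shown in the output example; `parse_ts`, `new_releases`, and the sample IDs are hypothetical names for illustration.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse timestamps like '2025-01-15T12:30:00.000Z' into aware UTC datetimes."""
    # Replacing the trailing 'Z' keeps this working on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def new_releases(items: list[dict], since: datetime) -> list[dict]:
    """Keep items whose `created` timestamp is later than `since`."""
    return [item for item in items if parse_ts(item["created"]) > since]


# Hypothetical items shaped like the Actor's output.
items = [
    {"id": "org/old-model", "created": "2024-06-18T00:00:00.000Z"},
    {"id": "org/new-model", "created": "2025-01-15T12:30:00.000Z"},
]
cutoff = datetime(2025, 1, 1, tzinfo=timezone.utc)
print([item["id"] for item in new_releases(items, cutoff)])
```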
3. RAG / fine-tuning corpus from model cards
Pull full model cards for a curated list of models and feed them into a vector store as an "AI knowledge assistant."
{"mode": "model_details","modelIds": ["meta-llama/Llama-3.1-8B","mistralai/Mistral-7B-v0.3","google/gemma-2-9b"]}
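Before embedding, model cards usually need to be split into chunks. The following is a minimal character-based sketch, assuming each item carries `id`, `pipeline_tag`, and `modelCard` as in the output example; `chunk_model_card`, the chunk sizes, and the record layout are assumptions, and a real pipeline would more likely split on headings or tokens.

```python
def chunk_model_card(item: dict, max_chars: int = 1000, overlap: int = 100) -> list[dict]:
    """Split one item's `modelCard` into overlapping chunks with metadata."""
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be >= 0 and smaller than max_chars")
    text = item.get("modelCard") or ""
    step = max_chars - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({
            "id": f'{item["id"]}#{len(chunks)}',
            "text": text[start:start + max_chars],
            "metadata": {
                "model_id": item["id"],
                "pipeline_tag": item.get("pipeline_tag"),
            },
        })
        # Stop once this chunk reaches the end of the card.
        if start + max_chars >= len(text):
            break
    return chunks


# Hypothetical item with a 25-character card.
card = {"id": "org/demo-model", "pipeline_tag": "text-generation", "modelCard": "x" * 25}
chunks = chunk_model_card(card, max_chars=10, overlap=2)
print(len(chunks), chunks[0]["id"])
```

Items with an empty `modelCard` produce no chunks, so they drop out of the corpus without extra filtering.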
4. ML dataset discovery for training pipelines
Find datasets by task and download volume, which is useful for auto-selecting candidates for fine-tuning or evaluation.
{"mode": "datasets","search": "instruction","sort": "downloads","limit": 50}
Output Example
{"id": "meta-llama/Llama-3.1-8B","author": "meta-llama","pipeline_tag": "text-generation","downloads": 4523891,"likes": 1253,"tags": ["pytorch", "safetensors", "llama", "text-generation", "en"],"created": "2024-06-18T00:00:00.000Z","lastModified": "2025-01-15T12:30:00.000Z","library_name": "transformers","modelCard": "Llama 3.1 is a family of large language models...","task": "text-generation"}
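For downstream code it can help to load items into a typed structure. The sketch below mirrors the fields in the example above; `HubItem` is a hypothetical class, the defaults are assumptions for fields that may be absent in some modes, and unknown keys are simply ignored.

```python
import json
from dataclasses import dataclass, field, fields


@dataclass
class HubItem:
    id: str
    author: str = ""
    pipeline_tag: str = ""
    downloads: int = 0
    likes: int = 0
    tags: list[str] = field(default_factory=list)
    library_name: str = ""
    created: str = ""
    lastModified: str = ""
    modelCard: str = ""

    @classmethod
    def from_dict(cls, raw: dict) -> "HubItem":
        # Drop keys this sketch does not model (for example `task`).
        known = {f.name for f in fields(cls)}
        return cls(**{key: value for key, value in raw.items() if key in known})


raw = json.loads(
    '{"id": "meta-llama/Llama-3.1-8B", "author": "meta-llama", '
    '"downloads": 4523891, "likes": 1253, "tags": ["pytorch", "safetensors"], '
    '"task": "text-generation"}'
)
item = HubItem.from_dict(raw)
print(item.id, item.downloads, item.tags)
```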
Input Parameters
| Parameter | Type | Description |
|---|---|---|
| `mode` | enum | models, datasets, spaces, or model_details (required) |
| `search` | string | Keyword search, e.g. "llama", "sentiment", "bert" |
| `author` | string | Filter by org or user, e.g. "meta-llama", "google", "mistralai", "openai-community" |
| `task` | enum | 19 ML tasks: text-generation, image-classification, translation, summarization, fill-mask, text-to-image, automatic-speech-recognition, and more |
| `sort` | enum | trending, downloads, likes, lastModified, created |
| `limit` | int | 1 to 1000 (default 50) |
| `modelIds` | array | For model_details mode: ["meta-llama/Llama-3-8B", "google/gemma-7b"] |
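Validating an input locally before starting a run can catch typos early. This is a minimal sketch based only on the table above; `build_input` is a hypothetical helper, and two behaviours are assumptions: that `model_details` needs a non-empty `modelIds` list and takes no other filters, and that task names are left unchecked because the table lists only some of the 19 values.

```python
MODES = {"models", "datasets", "spaces", "model_details"}
SORTS = {"trending", "downloads", "likes", "lastModified", "created"}


def build_input(mode: str, **options) -> dict:
    """Check options against the parameter table and return a run input dict."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}")
    if mode == "model_details":
        # Assumption: this mode only needs the list of model IDs.
        model_ids = list(options.get("modelIds") or [])
        if not model_ids:
            raise ValueError("model_details needs a non-empty modelIds list")
        return {"mode": mode, "modelIds": model_ids}

    sort = options.get("sort")
    if sort is not None and sort not in SORTS:
        raise ValueError(f"sort must be one of {sorted(SORTS)}")
    limit = options.get("limit", 50)
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")

    run_input = {"mode": mode, "limit": limit}
    for key in ("search", "author", "task", "sort"):
        if options.get(key) is not None:
            run_input[key] = options[key]
    return run_input


print(build_input("models", task="text-generation", sort="trending", limit=100))
```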
Output Fields
| Field | Description |
|---|---|
| `id` | Full model/dataset/space ID (author/name) |
| `author` | Organization or user that published it |
| `pipeline_tag` | Primary ML task |
| `downloads` | Total download count |
| `likes` | Community likes |
| `tags` | Array of framework, license, language, and architecture tags |
| `library_name` | Primary library (transformers, diffusers, sentence-transformers, etc.) |
| `created` / `lastModified` | ISO timestamps for monitoring freshness |
| `modelCard` | Full README content (in model_details mode) |
Pricing & Performance
- Pay-per-event: $0.005 per item scraped (model, dataset, space, or model detail).
- Typical monthly cost: about $2 to $5 for weekly tracking of 100 to 250 top models (four runs a month at $0.005 per item).
- Speed: ~100 items/minute in list modes, ~30 items/minute in `model_details` (each call fetches the full model card).
- No HF account or token required; the Actor uses the public API.
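The pricing and speed figures above make it easy to budget a schedule in advance. The sketch below is plain arithmetic on those quoted numbers, assuming four runs per month for a weekly schedule; `estimate_run` is a hypothetical helper and the speeds are the approximate values from the list, not guarantees.

```python
PRICE_PER_ITEM_USD = 0.005  # pay-per-event price quoted above
ITEMS_PER_MINUTE = {"list": 100, "model_details": 30}  # approximate speeds quoted above


def estimate_run(items_per_run: int, runs_per_month: int, mode: str = "list") -> dict:
    """Rough monthly cost and per-run duration for a scheduled scrape."""
    monthly_cost = items_per_run * runs_per_month * PRICE_PER_ITEM_USD
    minutes_per_run = items_per_run / ITEMS_PER_MINUTE[mode]
    return {
        "monthly_cost_usd": round(monthly_cost, 2),
        "minutes_per_run": round(minutes_per_run, 1),
    }


print(estimate_run(100, 4))
print(estimate_run(250, 4, mode="model_details"))
```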
Integrations
- Zapier / Make / n8n: schedule weekly trend scans and push deltas to Slack, Notion, or Airtable.
- LangChain / LlamaIndex: feed `model_details` output straight into a RAG pipeline to build an "AI model advisor."
- Vector DBs (Pinecone, Weaviate, Qdrant, pgvector): embed `modelCard` content for semantic search over the open-source AI landscape.
- Apify SDK / webhooks: run on a schedule and POST new trending entries to your own endpoint.
- Google Sheets / BigQuery: export to CSV via Apify's dataset export and build dashboards on top.
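For scripted use, a run can be started from Python with the `apify-client` package. This is a minimal sketch under stated assumptions: `ACTOR_ID` is a placeholder (the real ID is on the Actor's page and is not given here), `run_and_fetch` is a hypothetical helper, and the client is passed in so the function works with any object that behaves like `apify_client.ApifyClient`.

```python
from typing import Any, Iterator

ACTOR_ID = "<username>/<actor-name>"  # placeholder: copy the real ID from the Actor's page


def run_and_fetch(client: Any, actor_id: str, run_input: dict) -> Iterator[dict]:
    """Start a run, wait for it to finish, and yield items from its default dataset.

    `client` is expected to behave like `apify_client.ApifyClient`.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if not run:
        raise RuntimeError("Actor run did not return run details")
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()


# Usage with the real client (requires `pip install apify-client` and an Apify token):
# from apify_client import ApifyClient
# client = ApifyClient("<YOUR_APIFY_TOKEN>")
# for item in run_and_fetch(client, ACTOR_ID, {"mode": "spaces", "limit": 10}):
#     print(item["id"])
```

Passing the client in rather than constructing it inside the function keeps the token out of the helper and makes it easy to substitute a stub in tests.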
FAQ
Do I need a Hugging Face account or token?
No. The Actor uses the public HF API, so there is no auth to set up and no token scopes to manage.
How fresh is the data?
Real-time. Every run hits the HF API live. Trending rankings, download counts, and new releases appear as soon as HF publishes them.
Can I get the full model card text?
Yes. Use mode: "model_details" with modelIds. The Actor fetches each model's full README/model card.
What's the difference between downloads and trending?
downloads = all-time cumulative. trending = HF's internal momentum signal (recent downloads + likes velocity). Use trending to catch rising stars before they hit top-downloads lists.
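One way to surface "rising stars" is to run the Actor twice, once sorted by `trending` and once by `downloads`, and keep the trending models that are not yet in the top-downloads list. The sketch below assumes both result lists are already loaded; `rising_stars` and the sample IDs are hypothetical.

```python
def rising_stars(trending: list[dict], top_downloads: list[dict]) -> list[str]:
    """IDs present in the trending run but absent from the top-downloads run."""
    established = {item["id"] for item in top_downloads}
    return [item["id"] for item in trending if item["id"] not in established]


# Hypothetical results from two runs with different `sort` values.
trending_run = [{"id": "org/newcomer"}, {"id": "org/classic"}]
downloads_run = [{"id": "org/classic"}, {"id": "org/veteran"}]
print(rising_stars(trending_run, downloads_run))
```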
Can I filter by license (Apache, MIT, Llama-license)?
Not directly in the input, but the license shows up in the tags array of each result, so you can filter client-side.
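A client-side license filter could look like the sketch below. It assumes license tags use the `license:<identifier>` form seen on the Hugging Face Hub (for example `license:apache-2.0`); the output example in this README does not show such a tag, so check a real result before relying on it. `filter_by_license` and the sample items are hypothetical.

```python
def filter_by_license(items: list[dict], licenses: list[str]) -> list[dict]:
    """Keep items that carry at least one of the given license tags.

    Assumption: licenses appear in `tags` as 'license:<identifier>'.
    """
    wanted = {f"license:{name.lower()}" for name in licenses}
    return [
        item for item in items
        if wanted & {tag.lower() for tag in item.get("tags", [])}
    ]


# Hypothetical items shaped like the Actor's output.
results = [
    {"id": "org/open-model", "tags": ["pytorch", "license:apache-2.0"]},
    {"id": "org/custom-model", "tags": ["pytorch", "license:other"]},
    {"id": "org/untagged-model", "tags": ["safetensors"]},
]
print([item["id"] for item in filter_by_license(results, ["apache-2.0", "mit"])])
```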
Why are some model cards empty?
A small fraction of models on HF don't ship a README. Those come back with modelCard: "". Everything else is populated.
Keywords
Hugging Face scraper, AI model database, ML model tracker, open source AI data, LLM directory, Hugging Face API alternative, model cards extraction, AI trending models, Hugging Face datasets scraper, Hugging Face Spaces scraper, transformer models data, AI ecosystem monitoring, ML model comparison, fine-tuning dataset discovery, AI competitive intelligence, RAG over model cards.
Changelog
- v1.0: Initial release. 4 modes (models, datasets, spaces, model_details), 19 task filters, 5 sort options, up to 1000 results per run.