Hugging Face Spaces Scraper
Pricing
Pay per event
Scrape Hugging Face Spaces: get space IDs, SDKs, likes, tags, authors, descriptions, and live URLs. Filter by SDK, author, or tag. Sort by likes or trending.
Developer
Stas Persiianenko
Extract data from Hugging Face Spaces — AI demos, interactive apps, and machine learning tools built with Gradio, Streamlit, Docker, and more. Filter by SDK, author, or tags. Sort by likes, trending score, or recency. No API key required.
What does it do?
This actor scrapes the public Hugging Face Spaces catalog using the official Hugging Face REST API. It extracts metadata about AI demo spaces: who built them, what SDK they use, how many likes they have, their tags, descriptions, and direct URLs — both to the HuggingFace profile page and to the live running space.
You can filter results by:
- 🔍 Search query — keyword matching across space names and descriptions
- 🛠 SDK — Gradio, Streamlit, Docker, or Static
- 👤 Author — spaces from a specific user or organization (e.g. `stabilityai`, `google`)
- 🏷 Tag — any HuggingFace tag (e.g. `language:en`, `task:text-generation`, `license:mit`)
- 📊 Sort — by likes, trending score, creation date, or last modified date
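As a rough illustration of how these filters could translate into a request against the public Spaces endpoint, here is a minimal sketch. The endpoint URL comes from this README; the mapping of filters to the `search`, `author`, `filter`, `sort`, `direction`, and `limit` query parameters is an assumption for illustration and not necessarily what the actor does internally. `build_spaces_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

API_URL = "https://huggingface.co/api/spaces"  # endpoint named in this README


def build_spaces_url(search="", author="", tag="", sdk="", sort="likes", limit=100):
    """Build a Spaces listing URL. The parameter mapping is an assumption."""
    params = []
    if search:
        params.append(("search", search))
    if author:
        params.append(("author", author))
    # Assumed: the SDK and the tag are both sent as repeated `filter` values.
    for value in (sdk, tag):
        if value:
            params.append(("filter", value))
    params.append(("sort", sort))
    params.append(("direction", "-1"))  # assumed: -1 means descending
    params.append(("limit", str(limit)))
    return f"{API_URL}?{urlencode(params)}"


print(build_spaces_url(search="text to speech", sdk="gradio", limit=50))
```

Only non-empty filters are added to the query, which matches the "leave empty to browse all spaces" behavior described in this README.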
Who is it for?
🧑‍🔬 AI researchers tracking the ecosystem
Scan the Spaces catalog for specific task types or SDKs. Monitor which new demos are trending in your research area — computer vision, NLP, audio, or multimodal AI.
🏢 Product teams and competitive analysts
Track AI tool releases from competitor organizations. Watch what spaces a specific company (e.g. Meta, Google, Stability AI) is publishing and when they go live.
📊 Data scientists building ML registries
Populate databases of publicly available AI demos. Build search engines, comparison tools, or leaderboards on top of the extracted space metadata.
🤖 Developers building AI-powered pipelines
Feed space metadata into LLM workflows, vector databases, or recommendation systems that surface relevant AI tools to end users.
🎓 Educators and AI course creators
Discover teaching demos and Gradio apps for specific ML tasks. Find spaces with the most likes for a given topic to recommend to students.
Why use it?
- ✅ No API key needed — the Hugging Face public API is open
- ✅ Full metadata — ID, author, SDK, likes, all tags, description, live space URL, dates
- ✅ Flexible filtering — combine search, SDK, author, and tag filters simultaneously
- ✅ Fast and cheap — HTTP-only, no browser, low compute cost
- ✅ Pagination handled — fetch hundreds or thousands of spaces in one run
- ✅ Follows rate limits — exponential backoff and retry logic built in
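The retry behavior mentioned in the last point can be pictured with a small sketch of exponential backoff. This is not the actor's actual code: the base delay, the cap, and the helper names `backoff_delays` and `call_with_retries` are assumptions chosen for illustration.

```python
import time


def backoff_delays(max_retries=3, base=1.0, cap=30.0):
    """Delays in seconds before each retry: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def call_with_retries(fetch, max_retries=3, sleep=time.sleep):
    """Call fetch(); on failure wait with exponential backoff and try again.

    Catching every Exception keeps the sketch short. A real client would
    retry only on network errors and 429/5xx responses.
    """
    delays = backoff_delays(max_retries)
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted, surface the last error
            sleep(delays[attempt])
```

The `sleep` argument is injectable so the waiting can be replaced in tests. The default of three retries mirrors the `maxRetries` input default.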
Data extracted
| Field | Type | Description |
|---|---|---|
| `id` | string | Full space ID in owner/space-name format |
| `author` | string | Author or organization username |
| `spaceName` | string | Space name without owner prefix |
| `sdk` | string | Framework: gradio, streamlit, docker, or static |
| `likes` | number | Number of likes on this space |
| `tags` | array | All tags (SDK, region, tasks, languages, etc.) |
| `description` | string | Short description from the space README card |
| `cardTitle` | string | Display title from the README card |
| `license` | string | License identifier (e.g. apache-2.0, mit) |
| `createdAt` | string | ISO 8601 creation timestamp |
| `lastModified` | string | ISO 8601 last modified timestamp |
| `spaceUrl` | string | Live running space URL (e.g. https://xxx.hf.space) |
| `url` | string | HuggingFace profile page URL |
| `scrapedAt` | string | ISO 8601 timestamp of when data was scraped |
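Several of these fields are related to the space ID. The sketch below shows one way `author`, `spaceName`, `url`, and `spaceUrl` could be derived from `id`. The subdomain rule (lowercase, non-alphanumeric characters replaced by hyphens) is a simplifying assumption; real `hf.space` subdomains may differ for long or unusual names, so treat the actor's `spaceUrl` output as the source of truth. `derive_fields` is a hypothetical helper.

```python
import re


def derive_fields(space_id):
    """Derive related fields from an 'owner/space-name' ID (illustrative only)."""
    author, space_name = space_id.split("/", 1)
    # Assumed simplification of the hf.space subdomain rule.
    subdomain = re.sub(r"[^a-z0-9]+", "-", space_id.lower()).strip("-")
    return {
        "id": space_id,
        "author": author,
        "spaceName": space_name,
        "url": f"https://huggingface.co/spaces/{space_id}",
        "spaceUrl": f"https://{subdomain}.hf.space",
    }


print(derive_fields("stabilityai/stable-diffusion"))
```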
How to use it
Step 1 — Open the actor
Go to Hugging Face Spaces Scraper on Apify Store and click Try for free.
Step 2 — Configure your input
Set your filters in the input form:
- Search query — enter a keyword (e.g. "text to speech") or leave empty to browse all spaces
- SDK — choose a framework or leave as "All SDKs"
- Author — enter an organization name to scrape their spaces (e.g. `facebook`)
- Tag — enter a HuggingFace tag filter (e.g. `language:en`)
- Sort by — choose likes, trending, or date
- Max results — set how many spaces to extract
Step 3 — Run and export
Click Start. When the run completes, download results as JSON, CSV, or Excel from the Dataset tab.
Input parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `searchQuery` | string | `""` | Keyword to search across space names and descriptions |
| `sdk` | string | `""` | Filter by SDK: gradio, streamlit, docker, static, or empty for all |
| `author` | string | `""` | Filter by author/organization username |
| `tag` | string | `""` | Filter by tag (e.g. license:mit, language:fr) |
| `sortBy` | string | `likes` | Sort by: likes, createdAt, lastModified, trendingScore |
| `maxResults` | integer | `100` | Maximum spaces to extract (use 0 for unlimited) |
| `batchSize` | integer | `100` | Items per API request (max 100) |
| `maxRetries` | integer | `3` | Retries on failed requests |
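To see how `maxResults` and `batchSize` interact, the following sketch plans the page sizes a run would need. It assumes the actor simply requests pages of up to `batchSize` items until `maxResults` is reached, which is a guess at the behavior rather than a description of the implementation. `plan_batches` is a hypothetical helper.

```python
def plan_batches(max_results, batch_size=100):
    """Return the list of page sizes needed to collect max_results items.

    Returns None when max_results is 0, because an unlimited run has no
    fixed plan and just pages until the API runs out of results.
    """
    batch_size = max(1, min(batch_size, 100))  # documented maximum is 100
    if max_results == 0:
        return None
    sizes = []
    remaining = max_results
    while remaining > 0:
        size = min(batch_size, remaining)
        sizes.append(size)
        remaining -= size
    return sizes


print(plan_batches(250))      # three requests
print(plan_batches(250, 40))  # seven requests
```

Under this assumption a smaller `batchSize` only increases the number of requests. It does not change the number of spaces extracted or the per-space charge.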
Output example
```json
{
  "id": "stabilityai/stable-diffusion",
  "author": "stabilityai",
  "spaceName": "stable-diffusion",
  "sdk": "gradio",
  "likes": 8432,
  "tags": ["gradio", "region:us", "license:creativeml-openrail-m"],
  "private": false,
  "description": "Stable Diffusion is a state-of-the-art text-to-image model",
  "cardTitle": "Stable Diffusion",
  "license": "creativeml-openrail-m",
  "createdAt": "2022-08-22T13:00:00.000Z",
  "lastModified": "2024-01-15T10:32:00.000Z",
  "spaceUrl": "https://stabilityai-stable-diffusion.hf.space",
  "url": "https://huggingface.co/spaces/stabilityai/stable-diffusion",
  "scrapedAt": "2026-04-28T09:00:00.000Z"
}
```
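The timestamp fields are ISO 8601 strings with a trailing `Z`. A minimal sketch of parsing them in Python, for example to measure how long a space has been maintained, might look like this. `parse_ts` and `days_between` are hypothetical helpers, not part of the actor.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an ISO 8601 timestamp such as '2022-08-22T13:00:00.000Z'."""
    # fromisoformat() accepts a trailing "Z" only on Python 3.11+,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def days_between(item, start="createdAt", end="lastModified"):
    """Whole days elapsed between two timestamp fields of one record."""
    return (parse_ts(item[end]) - parse_ts(item[start])).days


record = {
    "createdAt": "2022-08-22T13:00:00.000Z",
    "lastModified": "2024-01-15T10:32:00.000Z",
}
print(days_between(record))
```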
Tips and tricks
- 💡 Trending spaces — use `sortBy: trendingScore` to discover what the community is excited about right now
- 💡 Combine filters — all filters are applied simultaneously (AND logic), so `sdk=gradio` + `author=google` returns only Google's Gradio spaces
- 💡 Scrape an org's whole portfolio — set `author=huggingface` (or any org) and `maxResults=0` to get all their public spaces
- 💡 Tag syntax — HuggingFace tags use colon notation: `language:en`, `license:apache-2.0`, `task:image-classification`
- 💡 Monitor new releases — sort by `createdAt` to see the newest spaces first
- 💡 Find live demos — the `spaceUrl` field gives you the running app URL you can embed or test directly
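Because tags use the colon notation described above, the `tags` array in each result can be split into groups on the client side. The sketch below is one way to do that. The `plain` bucket for tags without a colon is a naming choice made here, and `group_tags` is a hypothetical helper.

```python
from collections import defaultdict


def group_tags(tags):
    """Group 'key:value' tags by key; tags without a colon go under 'plain'."""
    grouped = defaultdict(list)
    for tag in tags:
        key, sep, value = tag.partition(":")
        if sep:
            grouped[key].append(value)
        else:
            grouped["plain"].append(tag)
    return dict(grouped)


print(group_tags(["gradio", "region:us", "license:creativeml-openrail-m"]))
```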
How much does it cost to scrape Hugging Face Spaces?
The actor uses pay-per-event (PPE) pricing — you are charged per space extracted, not per run minute.
| Plan | Price per space |
|---|---|
| Free | $0.00115 |
| Bronze | $0.001 |
| Silver | $0.00078 |
| Gold | $0.00060 |
| Platinum | $0.00040 |
| Diamond | $0.00028 |
Estimate:
- 100 spaces → ~$0.10 (Bronze)
- 1,000 spaces → ~$1.00 (Bronze)
- 10,000 spaces → ~$10 (Bronze) or ~$2.80 (Diamond)
There is also a small one-time start fee of $0.005 per run.
HuggingFace's public Spaces catalog has 500,000+ spaces. A full catalog scrape at Diamond tier costs approximately $140.
Free plan: Apify's free plan includes $5 in monthly credits, enough to scrape ~4,300 spaces at Free-tier pricing ($0.00115 per space).
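The estimates above can be reproduced with a short calculation. This sketch uses the per-space prices and the $0.005 start fee from this section; `estimate_cost` and `spaces_for_budget` are hypothetical helpers, and actual billing is determined by Apify.

```python
from decimal import Decimal

# Per-space prices from the pricing table in this README.
PRICE_PER_SPACE = {
    "Free": Decimal("0.00115"),
    "Bronze": Decimal("0.001"),
    "Silver": Decimal("0.00078"),
    "Gold": Decimal("0.00060"),
    "Platinum": Decimal("0.00040"),
    "Diamond": Decimal("0.00028"),
}
START_FEE = Decimal("0.005")  # charged once per run


def estimate_cost(spaces, plan="Bronze", runs=1):
    """Estimated cost in USD for extracting `spaces` items over `runs` runs."""
    return PRICE_PER_SPACE[plan] * spaces + START_FEE * runs


def spaces_for_budget(budget, plan="Free"):
    """How many spaces fit in a USD budget for a single run."""
    usable = Decimal(str(budget)) - START_FEE
    return int(usable // PRICE_PER_SPACE[plan])


print(estimate_cost(10_000, "Diamond"))
print(spaces_for_budget(5, "Free"))
```

`Decimal` is used instead of floats so that prices with five decimal places add up exactly.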
Integrations
Export to Google Sheets
Use the Export to Google Sheets integration in Apify Console to automatically write extracted spaces to a spreadsheet. Perfect for tracking an organization's space portfolio or building a weekly trending report.
Airtable database of AI demos
Connect to Airtable using Apify's built-in webhooks. Each run can append new spaces to a base, enabling you to build a curated database of AI tools with custom fields and views.
Slack notifications on new trending spaces
Schedule a daily run with sortBy: trendingScore and pipe the results to Slack using an Apify webhook → Zapier → Slack flow. Get alerted to new viral demos every morning.
LLM-powered space categorization
Export JSON results and feed them to an LLM (Claude, GPT-4) to auto-categorize spaces by use case, assign difficulty ratings, or summarize what each space does, then import the output back into your own database.
Vector search over space descriptions
Embed the description and cardTitle fields using an embedding model and store in Pinecone or Weaviate. Enable semantic search over the full HuggingFace Spaces catalog.
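Before embedding, the two fields need to be combined into one string, and many spaces have an empty `description`. A minimal sketch of that preparation step is shown below. Falling back to `spaceName` when both fields are empty is an assumption made here, and `embedding_text` is a hypothetical helper; the embedding call itself depends on the model and vector store you choose and is left out.

```python
def embedding_text(item):
    """Build the text to embed for one scraped space record."""
    parts = [item.get("cardTitle") or "", item.get("description") or ""]
    parts = [p.strip() for p in parts if p.strip()]
    if not parts:
        # Assumed fallback: use the space name so every record has some text.
        return item.get("spaceName", "")
    return ". ".join(parts)


print(embedding_text({
    "cardTitle": "Stable Diffusion",
    "description": "Stable Diffusion is a state-of-the-art text-to-image model",
    "spaceName": "stable-diffusion",
}))
```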
API usage
Node.js
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

// Start the actor and wait for the run to finish
const run = await client.actor('automation-lab/huggingface-spaces-scraper').call({
    searchQuery: 'text to image',
    sdk: 'gradio',
    sortBy: 'likes',
    maxResults: 100,
});

// Read the extracted spaces from the run's default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Extracted ${items.length} spaces`);
console.log(items[0]);
```
Python
```python
from apify_client import ApifyClient

client = ApifyClient(token='YOUR_APIFY_TOKEN')

# Start the actor and wait for the run to finish
run = client.actor('automation-lab/huggingface-spaces-scraper').call(run_input={
    'searchQuery': 'text to image',
    'sdk': 'gradio',
    'sortBy': 'likes',
    'maxResults': 100,
})

# list_items() returns a ListPage object; the records are on its .items attribute
items = client.dataset(run['defaultDatasetId']).list_items().items
for item in items:
    print(item['id'], item['likes'])
```
cURL
```bash
# Start a run
curl -X POST "https://api.apify.com/v2/acts/automation-lab~huggingface-spaces-scraper/runs" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"searchQuery":"text to image","sdk":"gradio","maxResults":50}'

# Get results (replace DATASET_ID with run's defaultDatasetId)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?limit=100" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN"
```
Use with Claude and MCP
You can query this scraper directly from Claude Code, Claude Desktop, Cursor, or VS Code using the Apify MCP server.
Claude Code (terminal)
```bash
claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/huggingface-spaces-scraper"
```
Claude Desktop / Cursor / VS Code
Add to your MCP config:
```json
{
  "mcpServers": {
    "apify": {
      "command": "npx",
      "args": ["-y", "@apify/mcp-server"],
      "env": {
        "APIFY_TOKEN": "YOUR_APIFY_TOKEN",
        "ACTORS": "automation-lab/huggingface-spaces-scraper"
      }
    }
  }
}
```
Example prompts
- "Find the top 50 most-liked Gradio spaces for image generation"
- "Get all spaces published by stabilityai and export as CSV"
- "Show me spaces that were created in the last 7 days, sorted by trending score"
- "List all Docker-based spaces with more than 1000 likes"
Legality — Is it legal to scrape Hugging Face Spaces?
This actor uses the official, publicly documented Hugging Face REST API (https://huggingface.co/api/spaces) — the same API that powers the HuggingFace website itself. There is no scraping of HTML, no bypassing of rate limits, and no collection of private or authenticated data.
Only public spaces are accessible (private spaces require authentication which this actor does not use). The extracted metadata is publicly visible to any visitor on huggingface.co.
Always review HuggingFace's Terms of Service and their API usage policies before using this data commercially.
FAQ
Q: Does this require a HuggingFace API key?
A: No. The Spaces API is publicly accessible without authentication.
Q: Can I scrape private spaces?
A: No. This actor only accesses public spaces. Private spaces require authentication, which is not supported.
Q: How many spaces can I scrape?
A: HuggingFace has 500,000+ public spaces. Set maxResults: 0 for an unlimited run that will fetch all available spaces matching your filters.
Q: Why are some descriptions empty?
A: Not all spaces have a short_description in their README card. Many spaces (especially older ones) were created without a card description. The cardTitle field is more consistently populated.
Q: I'm getting fewer results than expected — what's wrong?
A: The HuggingFace API may return fewer results when combining strict filters. Try relaxing one filter at a time. Also check that your tag syntax uses HuggingFace's colon format (e.g. license:mit not just mit).
Q: The actor timed out — what should I do?
A: Increase the timeout in Advanced Settings, or reduce maxResults to scrape in smaller batches. For very large runs (10,000+ spaces), consider splitting by SDK type and merging results.
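If you split a large scrape into one run per SDK, the per-run datasets can be merged afterwards. The sketch below assumes each run's items are already loaded as lists of records and removes duplicates by `id`. `merge_runs` is a hypothetical helper, and keeping the record from the later batch is a choice made here.

```python
def merge_runs(*batches):
    """Merge item lists from several runs, dropping duplicate space IDs."""
    merged = {}
    for batch in batches:
        for item in batch:
            merged[item["id"]] = item  # a later batch overwrites an earlier one
    # Sort by likes, highest first, to mirror the default sortBy value.
    return sorted(merged.values(), key=lambda it: it.get("likes", 0), reverse=True)


gradio_items = [{"id": "a/one", "likes": 5}, {"id": "b/two", "likes": 9}]
docker_items = [{"id": "c/three", "likes": 7}, {"id": "a/one", "likes": 6}]
print([it["id"] for it in merge_runs(gradio_items, docker_items)])
```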
Q: How do I get spaces from a specific task category?
A: Use the tag filter with a task tag (e.g. tag: task:text-generation). You can find valid tag values by browsing the HuggingFace Spaces UI and checking the filter panel.
Related scrapers
Looking for more HuggingFace data? Check out our other automation-lab scrapers:
- 🤖 Hugging Face Scraper — models, datasets, spaces, and papers in one actor
- 📄 Hugging Face Papers Scraper — research papers with abstracts, authors, and upvotes
- 📦 Hugging Face Datasets Scraper — ML datasets with downloads, licenses, and metadata