Hugging Face Spaces Scraper

Pricing

Pay per event

Scrape Hugging Face Spaces: get space IDs, SDKs, likes, tags, authors, descriptions, and live URLs. Filter by SDK, author, or tag. Sort by likes or trending.

Developer

Stas Persiianenko

Maintained by Community

Extract data from Hugging Face Spaces — AI demos, interactive apps, and machine learning tools built with Gradio, Streamlit, Docker, and more. Filter by SDK, author, or tags. Sort by likes, trending score, or recency. No API key required.

What does it do?

This actor scrapes the public Hugging Face Spaces catalog using the official Hugging Face REST API. It extracts metadata about AI demo spaces: who built them, what SDK they use, how many likes they have, their tags, descriptions, and two direct URLs, one to the space's page on huggingface.co and one to the live running space.

You can filter results by:

  • 🔍 Search query — keyword matching across space names and descriptions
  • 🛠 SDK — Gradio, Streamlit, Docker, or Static
  • 👤 Author — spaces from a specific user or organization (e.g. stabilityai, google)
  • 🏷 Tag — any HuggingFace tag (e.g. language:en, task:text-generation, license:mit)
  • 📊 Sort — by likes, trending score, creation date, or last modified date
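As a rough illustration of how these filters could translate into a single request, the sketch below builds a query URL for the Hugging Face Spaces endpoint (https://huggingface.co/api/spaces). The parameter names search, author, filter, sort, direction, and limit follow the public Hub API as we understand it. Sending both the SDK and the tag through filter is an assumption about how the actor works, and build_spaces_url is a hypothetical helper, not part of the actor.

```python
from urllib.parse import urlencode

HF_SPACES_API = "https://huggingface.co/api/spaces"


def build_spaces_url(search_query="", sdk="", author="", tag="",
                     sort_by="likes", limit=100):
    """Build a Spaces listing URL from the actor-style filters (hypothetical helper)."""
    params = {}
    if search_query:
        params["search"] = search_query
    if author:
        params["author"] = author
    # Assumption: SDK and tag are both passed as repeated `filter` values.
    filters = [value for value in (sdk, tag) if value]
    if filters:
        params["filter"] = filters
    params["sort"] = sort_by
    params["direction"] = -1  # descending order
    params["limit"] = limit
    # doseq=True expands the filter list into repeated query parameters.
    return f"{HF_SPACES_API}?{urlencode(params, doseq=True)}"


if __name__ == "__main__":
    print(build_spaces_url(search_query="text to image", sdk="gradio", author="google"))
```

Empty filters are simply left out of the query, which mirrors the "leave empty for all" behaviour described in the input form.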

Who is it for?

🧑‍🔬 AI researchers tracking the ecosystem

Scan the Spaces catalog for specific task types or SDKs. Monitor which new demos are trending in your research area — computer vision, NLP, audio, or multimodal AI.

🏢 Product teams and competitive analysts

Track AI tool releases from competitor organizations. Watch what spaces a specific company (e.g. Meta, Google, Stability AI) is publishing and when they go live.

📊 Data scientists building ML registries

Populate databases of publicly available AI demos. Build search engines, comparison tools, or leaderboards on top of the extracted space metadata.

🤖 Developers building AI-powered pipelines

Feed space metadata into LLM workflows, vector databases, or recommendation systems that surface relevant AI tools to end users.

🎓 Educators and AI course creators

Discover teaching demos and Gradio apps for specific ML tasks. Find spaces with the most likes for a given topic to recommend to students.

Why use it?

  • No API key needed — the Hugging Face public API is open
  • Full metadata — ID, author, SDK, likes, all tags, description, live space URL, dates
  • Flexible filtering — combine search, SDK, author, and tag filters simultaneously
  • Fast and cheap — HTTP-only, no browser, low compute cost
  • Pagination handled — fetch hundreds or thousands of spaces in one run
  • Follows rate limits — exponential backoff and retry logic built in
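The retry behaviour mentioned in the last bullet can be pictured with a small sketch. This is not the actor's code: the base delay of one second, the doubling factor, and the helper names backoff_delays and call_with_retries are all assumptions chosen for illustration.

```python
import time


def backoff_delays(max_retries=3, base_seconds=1.0, factor=2.0):
    """Return the wait before each retry: base, base*factor, base*factor^2, ..."""
    return [base_seconds * factor ** attempt for attempt in range(max_retries)]


def call_with_retries(fn, max_retries=3, base_seconds=1.0, sleep=time.sleep):
    """Call fn(); on failure wait with exponential backoff and try again."""
    delays = backoff_delays(max_retries, base_seconds)
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            # A real client would retry only transient errors (e.g. HTTP 429 or 5xx).
            if attempt == max_retries:
                raise
            sleep(delays[attempt])


if __name__ == "__main__":
    print(backoff_delays(3))
```

Passing sleep as a parameter keeps the helper easy to test without actually waiting.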

Data extracted

| Field | Type | Description |
| --- | --- | --- |
| id | string | Full space ID in owner/space-name format |
| author | string | Author or organization username |
| spaceName | string | Space name without owner prefix |
| sdk | string | Framework: gradio, streamlit, docker, or static |
| likes | number | Number of likes on this space |
| tags | array | All tags (SDK, region, tasks, languages, etc.) |
| description | string | Short description from the space README card |
| cardTitle | string | Display title from the README card |
| license | string | License identifier (e.g. apache-2.0, mit) |
| createdAt | string | ISO 8601 creation timestamp |
| lastModified | string | ISO 8601 last modified timestamp |
| spaceUrl | string | Live running space URL (e.g. https://xxx.hf.space) |
| url | string | URL of the space's page on huggingface.co |
| scrapedAt | string | ISO 8601 timestamp of when data was scraped |
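Several of these fields can be derived from the space ID alone. The sketch below shows one way to do that. The subdomain rule (lowercase, with every run of non-alphanumeric characters replaced by a hyphen) is an assumption inferred from the output example on this page, and derive_fields is a hypothetical helper rather than the actor's implementation.

```python
import re


def derive_fields(space_id):
    """Split an owner/space-name ID and build both URLs (hypothetical helper)."""
    author, space_name = space_id.split("/", 1)
    # Assumption: the live subdomain is "<owner>-<name>", lowercased,
    # with any run of other characters collapsed to a single hyphen.
    subdomain = re.sub(r"[^a-z0-9]+", "-", f"{author}-{space_name}".lower()).strip("-")
    return {
        "id": space_id,
        "author": author,
        "spaceName": space_name,
        "spaceUrl": f"https://{subdomain}.hf.space",
        "url": f"https://huggingface.co/spaces/{space_id}",
    }


if __name__ == "__main__":
    print(derive_fields("stabilityai/stable-diffusion"))
```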

How to use it

Step 1 — Open the actor

Go to Hugging Face Spaces Scraper on Apify Store and click Try for free.

Step 2 — Configure your input

Set your filters in the input form:

  1. Search query — enter a keyword (e.g. "text to speech") or leave empty to browse all spaces
  2. SDK — choose a framework or leave as "All SDKs"
  3. Author — enter an organization name to scrape their spaces (e.g. facebook)
  4. Tag — enter a HuggingFace tag filter (e.g. language:en)
  5. Sort by — choose likes, trending, or date
  6. Max results — set how many spaces to extract

Step 3 — Run and export

Click Start. When the run completes, download results as JSON, CSV, or Excel from the Dataset tab.

Input parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| searchQuery | string | "" | Keyword to search across space names and descriptions |
| sdk | string | "" | Filter by SDK: gradio, streamlit, docker, static, or empty for all |
| author | string | "" | Filter by author/organization username |
| tag | string | "" | Filter by tag (e.g. license:mit, language:fr) |
| sortBy | string | likes | Sort by: likes, createdAt, lastModified, trendingScore |
| maxResults | integer | 100 | Maximum spaces to extract (use 0 for unlimited) |
| batchSize | integer | 100 | Items per API request (max 100) |
| maxRetries | integer | 3 | Retries on failed requests |
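To make the defaults and limits above concrete, here is a minimal sketch that fills in missing parameters, rejects values outside the documented options, and works out how a bounded run would be split into requests. The helper names normalize_input and batch_sizes are hypothetical, and clamping batchSize to 100 instead of raising an error is an assumption.

```python
ALLOWED_SORT = {"likes", "createdAt", "lastModified", "trendingScore"}
ALLOWED_SDK = {"", "gradio", "streamlit", "docker", "static"}
DEFAULTS = {
    "searchQuery": "", "sdk": "", "author": "", "tag": "",
    "sortBy": "likes", "maxResults": 100, "batchSize": 100, "maxRetries": 3,
}


def normalize_input(raw):
    """Merge user input over the documented defaults and validate it."""
    cfg = {**DEFAULTS, **raw}
    if cfg["sortBy"] not in ALLOWED_SORT:
        raise ValueError(f"unsupported sortBy: {cfg['sortBy']!r}")
    if cfg["sdk"] not in ALLOWED_SDK:
        raise ValueError(f"unsupported sdk: {cfg['sdk']!r}")
    if cfg["maxResults"] < 0:
        raise ValueError("maxResults must be 0 (unlimited) or positive")
    # Assumption: oversized batches are clamped to the documented maximum of 100.
    cfg["batchSize"] = max(1, min(cfg["batchSize"], 100))
    return cfg


def batch_sizes(max_results, batch_size):
    """Request sizes for a bounded run; empty when max_results is 0 (unlimited)."""
    full, rest = divmod(max_results, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


if __name__ == "__main__":
    cfg = normalize_input({"sdk": "gradio", "maxResults": 250})
    print(batch_sizes(cfg["maxResults"], cfg["batchSize"]))
```

With maxResults set to 0 the plan is empty, and a caller would instead keep requesting pages until one comes back empty.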

Output example

{
    "id": "stabilityai/stable-diffusion",
    "author": "stabilityai",
    "spaceName": "stable-diffusion",
    "sdk": "gradio",
    "likes": 8432,
    "tags": ["gradio", "region:us", "license:creativeml-openrail-m"],
    "private": false,
    "description": "Stable Diffusion is a state-of-the-art text-to-image model",
    "cardTitle": "Stable Diffusion",
    "license": "creativeml-openrail-m",
    "createdAt": "2022-08-22T13:00:00.000Z",
    "lastModified": "2024-01-15T10:32:00.000Z",
    "spaceUrl": "https://stabilityai-stable-diffusion.hf.space",
    "url": "https://huggingface.co/spaces/stabilityai/stable-diffusion",
    "scrapedAt": "2026-04-28T09:00:00.000Z"
}
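Records shaped like the example above are easy to post-process locally. The actor's input has no minimum-likes or date-range filter, so the sketch below applies both after the run. select_spaces and parse_ts are hypothetical helpers, and the field names are taken from the output example.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(value):
    """Parse an ISO 8601 timestamp such as 2022-08-22T13:00:00.000Z."""
    # Replacing the Z suffix keeps this working on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def select_spaces(items, min_likes=0, created_within_days=None, now=None):
    """Keep spaces with enough likes and, optionally, a recent creation date."""
    now = now or datetime.now(timezone.utc)
    selected = []
    for item in items:
        if item["likes"] < min_likes:
            continue
        if created_within_days is not None:
            age = now - parse_ts(item["createdAt"])
            if age > timedelta(days=created_within_days):
                continue
        selected.append(item)
    # Most-liked spaces first.
    return sorted(selected, key=lambda it: it["likes"], reverse=True)


if __name__ == "__main__":
    sample = [{"id": "stabilityai/stable-diffusion", "likes": 8432,
               "createdAt": "2022-08-22T13:00:00.000Z"}]
    print(select_spaces(sample, min_likes=1000))
```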

Tips and tricks

  • 💡 Trending spaces — use sortBy: trendingScore to discover what the community is excited about right now
  • 💡 Combine filters — all filters are applied simultaneously (AND logic), so sdk=gradio + author=google returns only Google's Gradio spaces
  • 💡 Scrape an org's whole portfolio — set author=huggingface (or any org) and maxResults=0 to get all their public spaces
  • 💡 Tag syntax — HuggingFace tags use colon notation: language:en, license:apache-2.0, task:image-classification
  • 💡 Monitor new releases — sort by createdAt to see the newest spaces first
  • 💡 Find live demos — the spaceUrl field gives you the running app URL you can embed or test directly

How much does it cost to scrape Hugging Face Spaces?

The actor uses pay-per-event (PPE) pricing — you are charged per space extracted, not per run minute.

| Plan | Price per space |
| --- | --- |
| Free | $0.00115 |
| Bronze | $0.001 |
| Silver | $0.00078 |
| Gold | $0.00060 |
| Platinum | $0.00040 |
| Diamond | $0.00028 |

Estimate:

  • 100 spaces → ~$0.10 (Bronze)
  • 1,000 spaces → ~$1.00 (Bronze)
  • 10,000 spaces → ~$10 (Bronze) or ~$2.80 (Diamond)

There is also a small start fee of $0.005, charged once per run.

HuggingFace's public Spaces catalog has 500,000+ spaces. A full catalog scrape at Diamond tier costs approximately $140.

Free plan: Apify's free plan includes $5 in monthly credits, enough to scrape ~4,300 spaces at the Free-tier price of $0.00115 per space.
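The arithmetic behind these estimates can be reproduced with a short sketch. The prices and the start fee come from the table and text above; estimate_cost and spaces_within_budget are hypothetical helpers, and Decimal is used so that the small per-space prices do not pick up floating-point rounding errors.

```python
from decimal import Decimal

# Per-space prices and the per-run start fee, in USD, as listed above.
PRICE_PER_SPACE = {
    "Free": Decimal("0.00115"),
    "Bronze": Decimal("0.001"),
    "Silver": Decimal("0.00078"),
    "Gold": Decimal("0.00060"),
    "Platinum": Decimal("0.00040"),
    "Diamond": Decimal("0.00028"),
}
START_FEE = Decimal("0.005")


def estimate_cost(n_spaces, plan="Bronze"):
    """Cost of one run: start fee plus the per-space price times the count."""
    return START_FEE + n_spaces * PRICE_PER_SPACE[plan]


def spaces_within_budget(budget, plan="Free"):
    """How many spaces a single run can extract for the given budget."""
    remaining = Decimal(str(budget)) - START_FEE
    return max(0, int(remaining // PRICE_PER_SPACE[plan]))


if __name__ == "__main__":
    print(estimate_cost(10_000, "Diamond"))
    print(spaces_within_budget(5, "Free"))
```

The rounded figures in the estimate list leave out the start fee, which is why a single 10,000-space run at Diamond comes to $2.805 here rather than $2.80.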

Integrations

Export to Google Sheets

Use the Export to Google Sheets integration in Apify Console to automatically write extracted spaces to a spreadsheet. Perfect for tracking an organization's space portfolio or building a weekly trending report.

Airtable database of AI demos

Connect to Airtable using Apify's built-in webhooks. Each run can append new spaces to a base, enabling you to build a curated database of AI tools with custom fields and views.

Slack alerts for trending spaces

Schedule a daily run with sortBy: trendingScore and pipe the results to Slack using an Apify webhook → Zapier → Slack flow. Get alerted to new viral demos every morning.
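For readers who prefer to build the Slack message themselves, the sketch below turns a run's items into a message payload. It assumes the message is posted to a Slack incoming webhook, which accepts a JSON body with a text field, rather than going through Zapier as described above. slack_digest is a hypothetical helper and only builds the payload; it does not send anything.

```python
import json


def slack_digest(items, top_n=5):
    """Build a Slack webhook payload listing the first top_n spaces."""
    # The dataset order is kept, since the run was already sorted by trending score.
    lines = [
        f"{rank}. <{item['url']}|{item['id']}> ({item['likes']} likes, {item['sdk']})"
        for rank, item in enumerate(items[:top_n], start=1)
    ]
    return json.dumps({"text": "Trending Hugging Face Spaces\n" + "\n".join(lines)})


if __name__ == "__main__":
    sample = [{"id": "stabilityai/stable-diffusion", "likes": 8432, "sdk": "gradio",
               "url": "https://huggingface.co/spaces/stabilityai/stable-diffusion"}]
    print(slack_digest(sample))
```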

LLM-powered space categorization

Export JSON results and feed to an LLM (Claude, GPT-4) to auto-categorize spaces by use case, assign difficulty ratings, or summarize what each space does — then import back to your own database.

Vector search over space descriptions

Embed the description and cardTitle fields using an embedding model and store in Pinecone or Weaviate. Enable semantic search over the full HuggingFace Spaces catalog.
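Before embedding, each record needs to be reduced to a single string. The sketch below shows one way to do that. embedding_text is a hypothetical helper, and falling back to the space ID when both fields are empty is an assumption, added because some spaces have no description.

```python
def embedding_text(item):
    """Join cardTitle and description into one string for an embedding model."""
    title = (item.get("cardTitle") or "").strip()
    description = (item.get("description") or "").strip()
    parts = [part for part in (title, description) if part]
    # Assumption: fall back to the space ID when the card has no text at all.
    return ". ".join(parts) if parts else item["id"]


if __name__ == "__main__":
    print(embedding_text({
        "id": "stabilityai/stable-diffusion",
        "cardTitle": "Stable Diffusion",
        "description": "Stable Diffusion is a state-of-the-art text-to-image model",
    }))
```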

API usage

Node.js

import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token
const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

// Start the actor and wait for the run to finish
const run = await client.actor('automation-lab/huggingface-spaces-scraper').call({
    searchQuery: 'text to image',
    sdk: 'gradio',
    sortBy: 'likes',
    maxResults: 100,
});

// Read the extracted spaces from the run's default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Extracted ${items.length} spaces`);
console.log(items[0]);

Python

from apify_client import ApifyClient

# Authenticate with your Apify API token
client = ApifyClient(token='YOUR_APIFY_TOKEN')

# Start the actor and wait for the run to finish
run = client.actor('automation-lab/huggingface-spaces-scraper').call(run_input={
    'searchQuery': 'text to image',
    'sdk': 'gradio',
    'sortBy': 'likes',
    'maxResults': 100,
})

# list_items() returns a ListPage; the records are in its .items attribute
dataset = client.dataset(run['defaultDatasetId']).list_items()
for item in dataset.items:
    print(item['id'], item['likes'])

cURL

# Start a run
curl -X POST "https://api.apify.com/v2/acts/automation-lab~huggingface-spaces-scraper/runs" \
    -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"searchQuery":"text to image","sdk":"gradio","maxResults":50}'

# Get results (replace DATASET_ID with run's defaultDatasetId)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?limit=100" \
    -H "Authorization: Bearer YOUR_APIFY_TOKEN"
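The two cURL calls above start a run and read its dataset, but the run needs time to finish in between. A minimal standard-library sketch of that waiting step follows. It assumes the run object is returned under a data key with id, status, and defaultDatasetId fields, and that the run can be polled at /v2/actor-runs/RUN_ID. The helper names and the five-second polling interval are hypothetical.

```python
import json
import time
import urllib.request

API = "https://api.apify.com/v2"
TERMINAL = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"}


def get_json(url, token):
    """GET a JSON document from the Apify API with bearer authentication."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def run_info(response):
    """Pull the run ID, status, and dataset ID out of a run response."""
    data = response["data"]
    return data["id"], data["status"], data["defaultDatasetId"]


def wait_for_dataset(run_id, token, poll_seconds=5, fetch=get_json):
    """Poll the run until it reaches a terminal status, then return its dataset ID."""
    while True:
        _, status, dataset_id = run_info(fetch(f"{API}/actor-runs/{run_id}", token))
        if status in TERMINAL:
            if status != "SUCCEEDED":
                raise RuntimeError(f"run {run_id} ended with status {status}")
            return dataset_id
        time.sleep(poll_seconds)
```

The returned dataset ID can then be substituted for DATASET_ID in the second cURL call.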

Use with Claude and MCP

You can query this scraper directly from Claude Code, Claude Desktop, Cursor, or VS Code using the Apify MCP server.

Claude Code (terminal)

claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/huggingface-spaces-scraper"

Claude Desktop / Cursor / VS Code

Add to your MCP config:

{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": ["-y", "@apify/mcp-server"],
            "env": {
                "APIFY_TOKEN": "YOUR_APIFY_TOKEN",
                "ACTORS": "automation-lab/huggingface-spaces-scraper"
            }
        }
    }
}

Example prompts

  • "Find the top 50 most-liked Gradio spaces for image generation"
  • "Get all spaces published by stabilityai and export as CSV"
  • "Show me spaces that were created in the last 7 days, sorted by trending score"
  • "List all Docker-based spaces with more than 1000 likes"

Is it legal to scrape Hugging Face Spaces?

This actor uses the official, publicly documented Hugging Face REST API (https://huggingface.co/api/spaces), the same API that powers the Hugging Face website itself. There is no scraping of HTML, no bypassing of rate limits, and no collection of private or authenticated data.

Only public spaces are accessible (private spaces require authentication which this actor does not use). The extracted metadata is publicly visible to any visitor on huggingface.co.

Always review HuggingFace's Terms of Service and their API usage policies before using this data commercially.

FAQ

Q: Does this require a Hugging Face API key?
A: No. The Spaces API is publicly accessible without authentication.

Q: Can I scrape private spaces?
A: No. This actor only accesses public spaces. Private spaces require authentication, which is not supported.

Q: How many spaces can I scrape?
A: Hugging Face has 500,000+ public spaces. Set maxResults: 0 for an unlimited run that fetches all available spaces matching your filters.

Q: Why are some descriptions empty?
A: Not all spaces have a short_description in their README card. Many spaces (especially older ones) were created without a card description. The cardTitle field is more consistently populated.

Q: I'm getting fewer results than expected. What's wrong?
A: The Hugging Face API may return fewer results when strict filters are combined. Try relaxing one filter at a time. Also check that your tag syntax uses Hugging Face's colon format (e.g. license:mit, not just mit).

Q: The actor timed out. What should I do?
A: Increase the timeout in Advanced Settings, or reduce maxResults to scrape in smaller batches. For very large runs (10,000+ spaces), consider splitting by SDK type and merging results.

Q: How do I get spaces from a specific task category?
A: Use the tag filter with a task tag (e.g. tag: task:text-generation). You can find valid tag values by browsing the Hugging Face Spaces UI and checking the filter panel.
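The advice on very large runs (split by SDK, then merge) can be sketched as follows. merge_runs is a hypothetical helper. It assumes each run's items have been downloaded as a list of records and that id identifies a space uniquely across runs.

```python
def merge_runs(*runs):
    """Merge item lists from several runs, keeping one record per space ID."""
    merged = {}
    for items in runs:
        for item in items:
            # A later run overwrites an earlier record with the same ID.
            merged[item["id"]] = item
    # Most-liked spaces first, matching the actor's default sort.
    return sorted(merged.values(), key=lambda it: it["likes"], reverse=True)


if __name__ == "__main__":
    gradio_run = [{"id": "a/x", "likes": 3}]
    docker_run = [{"id": "b/y", "likes": 9}]
    print(merge_runs(gradio_run, docker_run))
```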

Looking for more HuggingFace data? Check out our other automation-lab scrapers: