Hugging Face Collections Scraper
Developer: Stas Persiianenko
Extract curated collections of AI models, datasets, and spaces from Hugging Face: trending, top-voted, or filtered by organization. No API key required.
What does it do?
This actor scrapes Hugging Face Collections: curated groupings of models, datasets, and spaces organized by researchers, companies, and the community. You can:
- Browse trending or most-upvoted public collections
- Fetch all collections from a specific user or organization (e.g., Google, Meta, Mistral AI)
- Retrieve specific collections by slug for targeted extraction
For each collection, you get the title, description, owner info, upvotes, theme, all contained items (model/dataset/space IDs, types, authors, likes, downloads), and more.
The actor calls the official Hugging Face public API: no login, no API key, and no browser automation required.
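As a rough illustration of what a plain-HTTP call to the public API could look like, the sketch below builds a request URL for the endpoint this README names in its legality section (huggingface.co/api/collections). The query parameter names (`owner`, `sort`, `limit`) and both helper functions are assumptions made for illustration; they are not taken from the actor's source code.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint mentioned in this README; the query parameter names are assumptions.
API_URL = "https://huggingface.co/api/collections"


def build_collections_url(owner=None, sort=None, limit=None):
    """Build a request URL, skipping parameters that were not provided."""
    params = {"owner": owner, "sort": sort, "limit": limit}
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{API_URL}?{query}" if query else API_URL


def fetch_collections(url, timeout=30):
    """Send an unauthenticated GET request and decode the JSON body."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_collections(build_collections_url(owner="google", limit=10))` would perform the actual network request; no token or login is involved.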
Who is it for?
AI Researchers & Data Scientists
Monitor which model collections are gaining traction in the community. Track when leading labs (Google, Meta, Mistral, Cohere) publish new curated groupings of models. Use collection membership as a signal for quality filtering.
Enterprise AI Teams
Audit what models competitors have curated. Build automated pipelines to import model metadata from industry-relevant collections into your internal model registry or catalog.
Market Intelligence Analysts
Track the AI ecosystem by monitoring trending collections over time. Identify which organizations are publishing the most influential model groupings. Measure upvote velocity as a proxy for community interest.
MLOps & Tooling Developers
Build collection-aware model discovery tools. Automatically fetch and sync collection contents into your model management platform. Power recommendation engines with social signals from Hugging Face collections.
Why use this actor?
- No authentication required: the Hugging Face Collections API is fully public
- Fast & cheap: pure HTTP, no browser overhead, near-zero compute cost
- Pagination handled: automatically follows cursor-based pagination to fetch any number of collections
- Multiple modes: browse globally, filter by owner, or fetch specific slugs
- Structured output: each item in a collection includes type, author, likes, downloads, pipeline tag
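To make the "pagination handled" point concrete, here is a minimal sketch of following cursor-based pagination until a collection limit is reached. It assumes the next page is advertised through a `Link` response header with `rel="next"`, which is a common convention but is not confirmed by this README. The names `parse_next_link`, `iter_collections`, and `fetch_page` are hypothetical.

```python
import re

# Matches one entry such as: <https://example.test/page?cursor=abc>; rel="next"
_NEXT_LINK = re.compile(r'<([^>]+)>\s*;\s*rel="next"')


def parse_next_link(link_header):
    """Return the URL tagged rel="next" in a Link header, or None."""
    if not link_header:
        return None
    match = _NEXT_LINK.search(link_header)
    return match.group(1) if match else None


def iter_collections(fetch_page, first_url, max_collections):
    """Yield collections page by page until the limit or the last page.

    fetch_page is a hypothetical callable: url -> (list_of_items, link_header).
    """
    url, yielded = first_url, 0
    while url and yielded < max_collections:
        items, link_header = fetch_page(url)
        for item in items:
            if yielded >= max_collections:
                return
            yield item
            yielded += 1
        # Stop when the server no longer advertises a next page.
        url = parse_next_link(link_header)
```

Passing the fetcher in as a callable keeps the loop independent of any HTTP library and makes it easy to exercise with canned pages.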
Data extracted
| Field | Description |
|---|---|
| slug | Unique collection identifier (e.g., google/gemma-2-...) |
| title | Collection display title |
| description | Collection description |
| ownerName | Hugging Face username or org |
| ownerType | user or org |
| ownerUrl | Link to owner's Hugging Face profile |
| collectionUrl | Full URL to the collection |
| upvotes | Number of upvotes |
| theme | Collection theme color |
| private | Whether the collection is private |
| gating | Whether access is gated |
| lastUpdated | ISO timestamp of last update |
| itemCount | Number of items in collection |
| itemTypes | Comma-separated item types (model, dataset, space) |
| items | Array of collection items with id, type, author, likes, downloads, pipelineTag |
| scrapedAt | Extraction timestamp |
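The two summary fields, itemCount and itemTypes, can be derived from the items array. The sketch below shows one way to do that; the first-seen ordering and the bare comma separator are assumptions, since the README only says the types are comma-separated, and `summarize_items` is a hypothetical helper name.

```python
def summarize_items(items):
    """Return (itemCount, itemTypes) for a list of collection items."""
    # Keep first-seen order so the result is deterministic.
    seen = []
    for item in items:
        item_type = item.get("type")
        if item_type and item_type not in seen:
            seen.append(item_type)
    return len(items), ",".join(seen)
```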
How much does it cost to scrape Hugging Face collections?
This actor uses Pay-Per-Event (PPE) pricing: you pay only for what you extract.
| Tier | Start fee | Per collection |
|---|---|---|
| FREE | $0.001 | $0.00115 |
| BRONZE | $0.001 | $0.001 |
| SILVER | $0.001 | $0.00078 |
| GOLD | $0.001 | $0.0006 |
| PLATINUM | $0.001 | $0.0004 |
| DIAMOND | $0.001 | $0.00028 |
Example costs (BRONZE tier):
- 20 trending collections: ~$0.021
- 100 collections from Meta: ~$0.101
- 500 top collections: ~$0.501
With a free Apify account (up to $5 of free usage per month), you can extract roughly 4,300 collections per month at no cost ($5 divided by the FREE-tier rate of $0.00115 per collection, before start fees).
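The arithmetic behind these figures can be reproduced with a small estimator. This is a sketch that assumes a single run (one start fee) and uses the tier prices from the table above; the function names are hypothetical. `Decimal` is used to avoid floating-point rounding noise in small currency amounts.

```python
from decimal import Decimal

# Prices copied from the pricing table in this README.
START_FEE = Decimal("0.001")
PER_COLLECTION = {
    "FREE": Decimal("0.00115"),
    "BRONZE": Decimal("0.001"),
    "SILVER": Decimal("0.00078"),
    "GOLD": Decimal("0.0006"),
    "PLATINUM": Decimal("0.0004"),
    "DIAMOND": Decimal("0.00028"),
}


def estimate_cost(collections, tier="BRONZE"):
    """Estimated cost in USD of one run that extracts `collections` items."""
    return START_FEE + PER_COLLECTION[tier] * collections


def max_collections_for_budget(budget, tier="FREE"):
    """Largest number of collections one run can extract within `budget` USD."""
    remaining = Decimal(str(budget)) - START_FEE
    if remaining < 0:
        return 0
    return int(remaining // PER_COLLECTION[tier])
```

For example, `estimate_cost(100)` reproduces the $0.101 figure for 100 collections at the BRONZE tier.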
How to use
Step 1: Choose your mode
Select from three scraping modes:
- Browse: gets trending or most-voted public collections
- Owner: fetches all collections by a specific user or organization
- Slugs: retrieves exact collections you specify
Step 2: Configure limits
Set maxCollections to control how many collections to extract. Default is 100.
Step 3: Run and export
Click Start and wait for results. Export to JSON, CSV, or connect to downstream workflows.
Input parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| mode | string | browse, owner, or slugs | browse |
| sort | string | trending or upvotes (browse mode only) | trending |
| owner | string | Username or org name (owner mode only) | none |
| collectionSlugs | array | List of collection slugs (slugs mode only) | [] |
| maxCollections | integer | Max collections to extract | 100 |
| includeItems | boolean | Include item details in output | true |
| maxRequestRetries | integer | Retry attempts per failed request | 3 |
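The maxRequestRetries parameter caps how often a failed request is retried. The README does not describe the retry strategy, so the sketch below simply assumes exponential backoff to show how such a cap typically behaves. `call_with_retries` and its `base_delay` and `sleep` arguments are hypothetical.

```python
import time


def call_with_retries(func, max_request_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying failed attempts with exponential backoff."""
    attempt = 0
    while True:
        try:
            return func()
        except Exception:
            # Give up once the retry budget is spent and surface the error.
            if attempt >= max_request_retries:
                raise
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

With the default of 3, a request is attempted at most four times: the initial call plus three retries.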
Example inputs
Browse trending collections:
```json
{
  "mode": "browse",
  "sort": "trending",
  "maxCollections": 50
}
```
Collections from a specific organization:
```json
{
  "mode": "owner",
  "owner": "google",
  "maxCollections": 100
}
```
Fetch specific collections:
```json
{
  "mode": "slugs",
  "collectionSlugs": [
    "google/gemma-2-665d5624d9e0312f5dfb1a1a",
    "meta-llama/llama-3-1-669233f0b30c5aa8b7b40b52"
  ]
}
```
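Before starting a run, it can help to check an input object against the parameter table. The sketch below encodes the rules that table implies (allowed modes, which fields each mode needs, slugs carrying an owner prefix). Whether the actor enforces exactly these rules is an assumption, and `validate_input` is a hypothetical helper.

```python
def validate_input(run_input):
    """Return a list of problems found in an actor input dict (empty if none)."""
    problems = []
    mode = run_input.get("mode", "browse")
    if mode not in ("browse", "owner", "slugs"):
        problems.append(f"unknown mode: {mode!r}")
    if mode == "browse" and run_input.get("sort", "trending") not in ("trending", "upvotes"):
        problems.append("sort must be 'trending' or 'upvotes'")
    if mode == "owner" and not run_input.get("owner"):
        problems.append("owner mode requires 'owner'")
    if mode == "slugs":
        slugs = run_input.get("collectionSlugs", [])
        if not slugs:
            problems.append("slugs mode requires 'collectionSlugs'")
        for slug in slugs:
            # A full slug looks like "owner/name-id".
            owner, sep, name = slug.partition("/")
            if not (owner and sep and name):
                problems.append(f"slug missing owner prefix: {slug!r}")
    max_collections = run_input.get("maxCollections", 100)
    if not isinstance(max_collections, int) or max_collections < 1:
        problems.append("maxCollections must be a positive integer")
    return problems
```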
Output example
```json
{
  "slug": "google/gemma-2-665d5624d9e0312f5dfb1a1a",
  "title": "Gemma 2",
  "description": "Google's Gemma 2 open models collection.",
  "ownerName": "google",
  "ownerType": "org",
  "ownerUrl": "https://huggingface.co/google",
  "collectionUrl": "https://huggingface.co/collections/google/gemma-2-665d5624d9e0312f5dfb1a1a",
  "upvotes": 1245,
  "theme": "blue",
  "private": false,
  "gating": false,
  "lastUpdated": "2025-12-01T10:00:00.000Z",
  "itemCount": 5,
  "itemTypes": "model",
  "items": [
    {
      "id": "google/gemma-2-2b",
      "type": "model",
      "author": "google",
      "position": 0,
      "likes": 1892,
      "downloads": 554321,
      "pipelineTag": "text-generation",
      "lastModified": "2025-11-20T08:00:00.000Z"
    }
  ],
  "scrapedAt": "2026-05-04T12:00:00.000Z"
}
```
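Because each record nests its items in an array, spreadsheet-style analysis usually needs one row per item. The sketch below flattens records shaped like the output example into CSV text; the column names and both helper functions are illustrative choices, not part of the actor.

```python
import csv
import io

FIELDNAMES = ["collectionSlug", "itemId", "itemType", "likes", "downloads", "pipelineTag"]


def flatten_items(collection):
    """Yield one flat dict per item, tagged with its parent collection."""
    for item in collection.get("items", []):
        yield {
            "collectionSlug": collection["slug"],
            "itemId": item["id"],
            "itemType": item["type"],
            "likes": item.get("likes"),
            "downloads": item.get("downloads"),
            "pipelineTag": item.get("pipelineTag"),
        }


def to_csv(collections):
    """Render all items of all collections as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for collection in collections:
        writer.writerows(flatten_items(collection))
    return buffer.getvalue()
```

Records scraped with includeItems disabled have no items and therefore contribute no rows.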
Tips & tricks
- Use sort: "upvotes" for quality signals: collections with many upvotes tend to contain high-quality, vetted models
- Owner mode is ideal for competitive intelligence: fetch all collections from google, meta-llama, mistralai, cohere regularly
- Disable includeItems for fast metadata-only runs: useful when you just need collection counts and upvote rankings
- Slugs mode for targeted monitoring: watch specific high-value collections (e.g., official Llama 3 collection) for new additions
- Combine with Hugging Face Models Scraper: use collection item IDs as seeds to fetch full model details
Integrations
Google Sheets: AI model tracking dashboard
Use the Apify Google Sheets integration to append trending collection data weekly. Build a dashboard tracking which organizations are publishing new model groupings and their upvote velocity.
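Upvote velocity is not a field in the output, so it has to be computed from two runs. A minimal sketch, assuming two records of the same collection taken at different times and using the upvotes and scrapedAt fields from the output schema (the function names are hypothetical):

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse an ISO timestamp such as 2026-05-04T12:00:00.000Z."""
    # Older Python versions reject a trailing "Z" in fromisoformat().
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def upvote_velocity(earlier, later):
    """Upvotes gained per day between two records of the same collection."""
    elapsed = parse_timestamp(later["scrapedAt"]) - parse_timestamp(earlier["scrapedAt"])
    days = elapsed.total_seconds() / 86400
    if days <= 0:
        raise ValueError("later snapshot must be newer than earlier snapshot")
    return (later["upvotes"] - earlier["upvotes"]) / days
```

A collection that goes from 1245 to 1259 upvotes across a week of snapshots would score 2.0 upvotes per day.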
Slack alerts on new trending collections
Chain this actor with the Slack integration to send a weekly digest of top trending collections. Set maxCollections: 10 and sort: trending as the alert input.
Model catalog enrichment pipeline
Run this actor nightly for a curated list of organization slugs. Feed the output into your internal model registry to automatically tag models that appear in official company collections.
Make / Zapier automation
Use collection membership changes to trigger downstream workflows, e.g., automatically download or evaluate newly added models when a watched collection is updated.
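Detecting a membership change comes down to comparing the item IDs of two snapshots of the same collection. The sketch below is one hypothetical way to do it; `membership_changes` is not provided by the actor and would live in your own workflow code.

```python
def membership_changes(previous, current):
    """Return (added, removed) item IDs between two snapshots of a collection."""
    before = [item["id"] for item in previous.get("items", [])]
    after = [item["id"] for item in current.get("items", [])]
    before_set, after_set = set(before), set(after)
    # Preserve the order in which items appear in each snapshot.
    added = [item_id for item_id in after if item_id not in before_set]
    removed = [item_id for item_id in before if item_id not in after_set]
    return added, removed
```

A non-empty `added` list is the natural trigger for a download or evaluation step.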
API usage
Node.js (apify-client)
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('automation-lab/huggingface-collections-scraper').call({
    mode: 'browse',
    sort: 'trending',
    maxCollections: 50,
});

// Read the scraped collections from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```
Python
```python
from apify_client import ApifyClient

client = ApifyClient(token="YOUR_API_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("automation-lab/huggingface-collections-scraper").call(
    run_input={
        "mode": "owner",
        "owner": "google",
        "maxCollections": 100,
    }
)

# Print the title and upvote count of each scraped collection.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], item["upvotes"])
```
cURL
```bash
curl -X POST \
  "https://api.apify.com/v2/acts/automation-lab~huggingface-collections-scraper/runs" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"mode": "browse","sort": "upvotes","maxCollections": 100}'
```
MCP (Claude, Cursor, VS Code)
Use this actor directly from AI assistants via the Apify MCP server.
Claude Code:
```bash
claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/huggingface-collections-scraper"
```
Claude Desktop / Cursor / VS Code: add to your MCP config:
```json
{
  "mcpServers": {
    "apify": {
      "type": "http",
      "url": "https://mcp.apify.com?tools=automation-lab/huggingface-collections-scraper",
      "headers": {
        "Authorization": "Bearer YOUR_API_TOKEN"
      }
    }
  }
}
```
Example prompts:
- "Fetch the top 20 trending Hugging Face collections and list them by upvotes"
- "Get all collections published by google on HuggingFace"
- "Scrape the Llama 3 collection from meta-llama and show me the models it contains"
Legality & terms of service
This actor uses Hugging Face's public API (huggingface.co/api/collections), which is freely accessible without authentication and intended for programmatic access. All data returned is publicly visible on the HuggingFace website.
- Only public collections are accessible
- Private or gated collections cannot be accessed
- No login credentials are used or stored
- Respects the public API rate limits
- Use responsibly in accordance with Hugging Face Terms of Service
FAQ
Q: Do I need a Hugging Face API key?
No. The Collections API is fully public and requires no authentication.
Q: Can I scrape private collections?
No. The public API only returns public collections. Private collections require Hugging Face authentication, which this actor does not support.
Q: How many collections can I extract?
There is no fixed cap: the actor paginates through all available results. In practice, Hugging Face has tens of thousands of public collections.
Q: Why am I getting fewer results than expected?
Some owners have few or no public collections, so the available results may fall short of your maxCollections limit. Check the actor logs for details.
Q: The actor ran but returned 0 results. What happened?
For owner mode, verify the username/org name is correct (case-sensitive, e.g., google not Google). For slugs mode, verify the full slug including the owner prefix (e.g., google/gemma-2-abc123).
Q: Can I get the full model details for items in a collection?
This actor returns the core item metadata (ID, type, author, likes, downloads). For full model cards and metadata, use the Hugging Face Models Scraper with the model IDs as input.
Related scrapers
- Hugging Face Scraper: scrape model cards, parameters, and full metadata
- Hugging Face Datasets Scraper: extract dataset metadata and download links
- Hugging Face Papers Scraper: scrape ML research papers with AI summaries
- Hugging Face Spaces Scraper: discover and extract AI demo spaces