Hugging Face Collections Scraper

Deprecated

Scrape Hugging Face curated collections of AI models, datasets, and spaces. Browse trending or top-voted collections, or filter by organization. No API key required.

Developer

Stas Persiianenko

Maintained by Community

Extract curated collections of AI models, datasets, and spaces from Hugging Face: trending, top-voted, or filtered by organization. No API key required.

🤔 What does it do?

This actor scrapes Hugging Face Collections, which are curated groupings of models, datasets, and spaces organized by researchers, companies, and the community. You can:

  • 🔥 Browse trending or most-upvoted public collections
  • 👤 Fetch all collections from a specific user or organization (e.g., Google, Meta, Mistral AI)
  • 🔗 Retrieve specific collections by slug for targeted extraction

For each collection, you get the title, description, owner info, upvotes, theme, all contained items (model/dataset/space IDs, types, authors, likes, downloads), and more.

The actor calls the official Hugging Face public API: no login, no API key, and no browser automation required.

👥 Who is it for?

🧑‍🔬 AI Researchers & Data Scientists

Monitor which model collections are gaining traction in the community. Track when leading labs (Google, Meta, Mistral, Cohere) publish new curated groupings of models. Use collection membership as a signal for quality filtering.

🏒 Enterprise AI Teams

Audit what models competitors have curated. Build automated pipelines to import model metadata from industry-relevant collections into your internal model registry or catalog.

📊 Market Intelligence Analysts

Track the AI ecosystem by monitoring trending collections over time. Identify which organizations are publishing the most influential model groupings. Measure upvote velocity as a proxy for community interest.

🛠️ MLOps & Tooling Developers

Build collection-aware model discovery tools. Automatically fetch and sync collection contents into your model management platform. Power recommendation engines with social signals from HF collections.

💡 Why use this actor?

  • No authentication required: the Hugging Face Collections API is fully public
  • Fast and cheap: pure HTTP, no browser overhead, near-zero compute cost
  • Pagination handled: automatically follows cursor-based pagination to fetch any number of collections
  • Multiple modes: browse globally, filter by owner, or fetch specific slugs
  • Structured output: each item in a collection includes type, author, likes, downloads, and pipeline tag
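
The cursor-based pagination mentioned above can be sketched roughly as follows. This is an illustration under assumptions, not the actor's source code: it assumes the public endpoint returns a JSON array per page and advertises the next page in an HTTP Link header with rel="next", and that a `sort` query parameter is accepted. The helper names `next_page_url` and `fetch_collections` are hypothetical.

```python
import json
import re
import urllib.request
from typing import Iterator, Optional

API_URL = "https://huggingface.co/api/collections"  # endpoint named in this README

# Matches one entry of an HTTP Link header, e.g. <https://...>; rel="next"
LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="?([^",;\s]+)"?')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the URL marked rel="next" in a Link header, or None on the last page."""
    if not link_header:
        return None
    for url, rel in LINK_RE.findall(link_header):
        if rel == "next":
            return url
    return None


def fetch_collections(max_collections: int = 100, sort: str = "trending") -> Iterator[dict]:
    """Yield up to max_collections collections, following next-page links."""
    url: Optional[str] = f"{API_URL}?sort={sort}"  # assumption: `sort` query parameter
    yielded = 0
    while url and yielded < max_collections:
        with urllib.request.urlopen(url, timeout=30) as response:
            page = json.load(response)  # assumption: the body is a JSON array
            url = next_page_url(response.headers.get("Link"))
        for collection in page:
            if yielded >= max_collections:
                return
            yield collection
            yielded += 1
```

Iterating over `fetch_collections(max_collections=20)` would stop after 20 records or when no next link is returned, whichever comes first. The raw API fields may differ from the field names this actor outputs.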

📦 Data extracted

| Field | Description |
| --- | --- |
| slug | Unique collection identifier (e.g., google/gemma-2-...) |
| title | Collection display title |
| description | Collection description |
| ownerName | Hugging Face username or org |
| ownerType | user or org |
| ownerUrl | Link to owner's HF profile |
| collectionUrl | Full URL to the collection |
| upvotes | Number of upvotes |
| theme | Collection theme color |
| private | Whether private |
| gating | Whether gated access |
| lastUpdated | ISO timestamp of last update |
| itemCount | Number of items in collection |
| itemTypes | Comma-separated item types (model, dataset, space) |
| items | Array of collection items with id, type, author, likes, downloads, pipelineTag |
| scrapedAt | Extraction timestamp |

💰 How much does it cost to scrape Hugging Face collections?

This actor uses Pay-Per-Event (PPE) pricing: you pay only for what you extract.

| Tier | Start fee | Per collection |
| --- | --- | --- |
| FREE | $0.001 | $0.00115 |
| BRONZE | $0.001 | $0.001 |
| SILVER | $0.001 | $0.00078 |
| GOLD | $0.001 | $0.0006 |
| PLATINUM | $0.001 | $0.0004 |
| DIAMOND | $0.001 | $0.00028 |

Example costs (BRONZE tier):

  • 20 trending collections → ~$0.021
  • 100 collections from Meta → ~$0.101
  • 500 top collections → ~$0.501

With a free Apify account (up to $5 free compute/month), you can extract roughly 4,300 collections per month at no cost, since $5 divided by the FREE-tier rate of $0.00115 per collection is about 4,347.
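
The arithmetic behind these figures can be reproduced with a small estimator. This is a sketch based only on the pricing table above; it assumes one start fee per run and no other platform charges, and the function names are hypothetical.

```python
from decimal import Decimal

START_FEE = Decimal("0.001")  # charged once per run, per the table above
PER_COLLECTION = {
    "FREE": Decimal("0.00115"),
    "BRONZE": Decimal("0.001"),
    "SILVER": Decimal("0.00078"),
    "GOLD": Decimal("0.0006"),
    "PLATINUM": Decimal("0.0004"),
    "DIAMOND": Decimal("0.00028"),
}


def estimate_cost(tier: str, collections: int) -> Decimal:
    """Estimated USD cost of a single run: start fee plus the per-collection price."""
    if collections < 0:
        raise ValueError("collections must be non-negative")
    return START_FEE + PER_COLLECTION[tier.upper()] * collections


def collections_for_budget(tier: str, budget_usd: str) -> int:
    """How many collections a single run can extract within budget_usd."""
    remaining = Decimal(budget_usd) - START_FEE
    if remaining < 0:
        return 0
    return int(remaining // PER_COLLECTION[tier.upper()])
```

For example, `estimate_cost("BRONZE", 20)` gives 0.021, matching the first example cost, and `collections_for_budget("FREE", "5")` gives 4346 for a single run.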

🚀 How to use

Step 1: Choose your mode

Select from three scraping modes:

  • Browse: gets trending or most-upvoted public collections
  • Owner: fetches all collections by a specific user or organization
  • Slugs: retrieves exactly the collections you specify

Step 2: Configure limits

Set maxCollections to control how many collections to extract. The default is 100.

Step 3: Run and export

Click Start and wait for results. Export to JSON or CSV, or connect to downstream workflows.

⚙️ Input parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| mode | string | browse, owner, or slugs | browse |
| sort | string | trending or upvotes (browse mode only) | trending |
| owner | string | Username or org name (owner mode only) | (none) |
| collectionSlugs | array | List of collection slugs (slugs mode only) | [] |
| maxCollections | integer | Max collections to extract | 100 |
| includeItems | boolean | Include item details in output | true |
| maxRequestRetries | integer | Retry attempts per failed request | 3 |

Example inputs

Browse trending collections:

{
  "mode": "browse",
  "sort": "trending",
  "maxCollections": 50
}

Collections from a specific organization:

{
  "mode": "owner",
  "owner": "google",
  "maxCollections": 100
}

Fetch specific collections:

{
  "mode": "slugs",
  "collectionSlugs": [
    "google/gemma-2-665d5624d9e0312f5dfb1a1a",
    "meta-llama/llama-3-1-669233f0b30c5aa8b7b40b52"
  ]
}
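
When inputs are generated programmatically, a small builder can catch mode and parameter mismatches before a run is started. The sketch below encodes only the rules stated in the parameter table; the function `build_input` is hypothetical and is not part of the actor or the Apify client.

```python
from typing import Any, Dict, List, Optional

VALID_MODES = {"browse", "owner", "slugs"}
VALID_SORTS = {"trending", "upvotes"}


def build_input(
    mode: str = "browse",
    sort: str = "trending",
    owner: Optional[str] = None,
    collection_slugs: Optional[List[str]] = None,
    max_collections: int = 100,
    include_items: bool = True,
) -> Dict[str, Any]:
    """Build an actor input dict, keeping only the fields relevant to the chosen mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if max_collections < 1:
        raise ValueError("maxCollections must be at least 1")

    run_input: Dict[str, Any] = {
        "mode": mode,
        "maxCollections": max_collections,
        "includeItems": include_items,
    }
    if mode == "browse":
        if sort not in VALID_SORTS:
            raise ValueError(f"sort must be one of {sorted(VALID_SORTS)}")
        run_input["sort"] = sort  # sort applies to browse mode only
    elif mode == "owner":
        if not owner:
            raise ValueError("owner mode requires an owner")
        run_input["owner"] = owner
    else:  # slugs mode
        if not collection_slugs:
            raise ValueError("slugs mode requires at least one collection slug")
        run_input["collectionSlugs"] = list(collection_slugs)
    return run_input
```

The returned dict can be passed as `run_input` to the Apify client call shown in the API usage section.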

📤 Output example

{
  "slug": "google/gemma-2-665d5624d9e0312f5dfb1a1a",
  "title": "Gemma 2",
  "description": "Google's Gemma 2 open models collection.",
  "ownerName": "google",
  "ownerType": "org",
  "ownerUrl": "https://huggingface.co/google",
  "collectionUrl": "https://huggingface.co/collections/google/gemma-2-665d5624d9e0312f5dfb1a1a",
  "upvotes": 1245,
  "theme": "blue",
  "private": false,
  "gating": false,
  "lastUpdated": "2025-12-01T10:00:00.000Z",
  "itemCount": 5,
  "itemTypes": "model",
  "items": [
    {
      "id": "google/gemma-2-2b",
      "type": "model",
      "author": "google",
      "position": 0,
      "likes": 1892,
      "downloads": 554321,
      "pipelineTag": "text-generation",
      "lastModified": "2025-11-20T08:00:00.000Z"
    }
  ],
  "scrapedAt": "2026-05-04T12:00:00.000Z"
}
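
Because `items` is a nested array, spreadsheet-style analysis is often easier with one row per item. The following sketch flattens a record shaped like the output example above; the row column names and both helper functions are illustrative choices, not something the actor produces.

```python
import csv
import io
from typing import Dict, Iterable, List

COLUMNS = ["collectionSlug", "collectionTitle", "itemId", "itemType",
           "author", "likes", "downloads", "pipelineTag"]


def flatten_items(collection: dict) -> List[Dict[str, object]]:
    """Return one flat row per item in a collection record."""
    rows = []
    for item in collection.get("items") or []:  # items is absent when includeItems is false
        rows.append({
            "collectionSlug": collection.get("slug"),
            "collectionTitle": collection.get("title"),
            "itemId": item.get("id"),
            "itemType": item.get("type"),
            "author": item.get("author"),
            "likes": item.get("likes"),
            "downloads": item.get("downloads"),
            "pipelineTag": item.get("pipelineTag"),
        })
    return rows


def rows_to_csv(rows: Iterable[Dict[str, object]]) -> str:
    """Serialize flattened rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling `rows_to_csv(flatten_items(record))` on each dataset record yields CSV text that can be appended to a file or sheet.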

💡 Tips & tricks

  • Use sort: "upvotes" for quality signals. Collections with many upvotes tend to contain high-quality, vetted models
  • Owner mode is ideal for competitive intelligence: fetch all collections from google, meta-llama, mistralai, cohere regularly
  • Disable includeItems for fast metadata-only runs, which is useful when you just need collection counts and upvote rankings
  • Use slugs mode for targeted monitoring: watch specific high-value collections (e.g., the official Llama 3 collection) for new additions
  • Combine with the Hugging Face Models Scraper: use collection item IDs as seeds to fetch full model details
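
The last tip, using collection item IDs as seeds, might look like the sketch below. It only collects unique model IDs from records shaped like this actor's output; how those IDs are passed to another scraper depends on that scraper's input schema, which is not described here. The function name `model_seed_ids` is hypothetical.

```python
from typing import Iterable, List


def model_seed_ids(collections: Iterable[dict]) -> List[str]:
    """Collect unique model IDs across collections, preserving first-seen order."""
    seen = set()
    seeds: List[str] = []
    for collection in collections:
        for item in collection.get("items") or []:
            item_id = item.get("id")
            # Keep models only; datasets and spaces are skipped.
            if item.get("type") == "model" and item_id and item_id not in seen:
                seen.add(item_id)
                seeds.append(item_id)
    return seeds
```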

🔌 Integrations

Google Sheets: AI model tracking dashboard

Use the Apify Google Sheets integration to append trending collection data weekly. Build a dashboard tracking which organizations are publishing new model groupings and their upvote velocity.
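
Upvote velocity is not a field in the output, but it can be derived from two snapshots of the same collection. A minimal sketch, assuming both records carry the `upvotes` and `scrapedAt` fields shown in the output example; the function names are hypothetical.

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO timestamp such as 2026-05-04T12:00:00.000Z."""
    # A trailing "Z" is rewritten because older Python versions reject it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def upvotes_per_day(earlier: dict, later: dict) -> float:
    """Average upvotes gained per day between two snapshots of one collection."""
    elapsed = parse_timestamp(later["scrapedAt"]) - parse_timestamp(earlier["scrapedAt"])
    days = elapsed.total_seconds() / 86400
    if days <= 0:
        raise ValueError("the later snapshot must be scraped after the earlier one")
    return (later["upvotes"] - earlier["upvotes"]) / days
```

With weekly runs, a collection that goes from 1200 to 1245 upvotes over seven days has a velocity of about 6.4 upvotes per day.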

Slack: weekly trending digest

Chain this actor with the Slack integration to send a weekly digest of top trending collections. Set maxCollections: 10 and sort: trending as the alert input.

Model catalog enrichment pipeline

Run this actor nightly for a curated list of organization slugs. Feed the output into your internal model registry to automatically tag models that appear in official company collections.

Make / Zapier automation

Use collection membership changes to trigger downstream workflows, e.g., automatically download or evaluate newly added models when a watched collection is updated.
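
Detecting a membership change comes down to comparing two runs of the same collection. The sketch below assumes both snapshots were scraped with includeItems enabled and identifies items by their type and id; `added_items` is a hypothetical helper, and the trigger itself (Make, Zapier, or a webhook) is left out.

```python
from typing import List


def added_items(previous: dict, current: dict) -> List[dict]:
    """Return items present in the current snapshot but not in the previous one."""
    known = {(item.get("type"), item.get("id")) for item in previous.get("items") or []}
    return [
        item
        for item in current.get("items") or []
        if (item.get("type"), item.get("id")) not in known
    ]
```

A non-empty result is the signal to fire the downstream workflow for each newly added item.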

🖥️ API usage

Node.js (Apify SDK)

import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token.
const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('automation-lab/huggingface-collections-scraper').call({
    mode: 'browse',
    sort: 'trending',
    maxCollections: 50,
});

// Read the scraped collections from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Python

from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient(token="YOUR_API_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("automation-lab/huggingface-collections-scraper").call(run_input={
    "mode": "owner",
    "owner": "google",
    "maxCollections": 100,
})

# Iterate over the scraped collections in the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], item["upvotes"])

cURL

curl -X POST \
  "https://api.apify.com/v2/acts/automation-lab~huggingface-collections-scraper/runs" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "browse",
    "sort": "upvotes",
    "maxCollections": 100
  }'

🤖 MCP (Claude, Cursor, VS Code)

Use this actor directly from AI assistants via the Apify MCP server.

Claude Code:

claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/huggingface-collections-scraper"

Claude Desktop / Cursor / VS Code (add to your MCP config):

{
  "mcpServers": {
    "apify": {
      "type": "http",
      "url": "https://mcp.apify.com?tools=automation-lab/huggingface-collections-scraper",
      "headers": {
        "Authorization": "Bearer YOUR_API_TOKEN"
      }
    }
  }
}

Example prompts:

  • "Fetch the top 20 trending Hugging Face collections and list them by upvotes"
  • "Get all collections published by google on HuggingFace"
  • "Scrape the Llama 3 collection from meta-llama and show me the models it contains"

⚖️ Legality & terms of service

This actor uses Hugging Face's public API (huggingface.co/api/collections), which is freely accessible without authentication and intended for programmatic access. All data returned is publicly visible on the Hugging Face website.

  • Only public collections are accessible
  • Private or gated collections cannot be accessed
  • No login credentials are used or stored
  • Respects the public API rate limits
  • Use responsibly in accordance with Hugging Face Terms of Service

❓ FAQ

Q: Do I need a Hugging Face API key? No. The Collections API is fully public and requires no authentication.

Q: Can I scrape private collections? No. The public API only returns public collections. Private collections require Hugging Face authentication, which this actor does not support.

Q: How many collections can I extract? Theoretically unlimited: the actor paginates through all available results. In practice, Hugging Face has tens of thousands of public collections.

Q: Why am I getting fewer results than expected? Some owners may have no public collections, or fewer collections may match your mode and owner settings than your maxCollections limit allows. Check the actor logs for details.

Q: The actor ran but returned 0 results. What happened? For owner mode, verify the username/org name is correct (case-sensitive, e.g., google not Google). For slugs mode, verify the full slug including the owner prefix (e.g., google/gemma-2-abc123).
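
A quick pre-flight check can catch the most common causes of empty runs before any credits are spent. This is a heuristic sketch: the slug pattern (owner prefix, a slash, then a title ending in a hyphen and a hexadecimal ID) is inferred from the examples in this README and is an assumption, and `preflight_warnings` is a hypothetical helper. It does not change letter case, because names are case-sensitive.

```python
import re
from typing import Iterable, List, Optional

# Assumed slug shape, inferred from this README's examples: owner/title-<hex id>
SLUG_RE = re.compile(r"^[^/\s]+/[^/\s]+-[0-9a-f]+$")


def preflight_warnings(owner: Optional[str] = None, slugs: Iterable[str] = ()) -> List[str]:
    """Return human-readable warnings for inputs that are likely to yield 0 results."""
    warnings: List[str] = []
    if owner is not None:
        # A profile URL or padded string is a frequent mistake; expect a bare name.
        if owner != owner.strip() or "/" in owner or owner.startswith("http"):
            warnings.append(f"owner {owner!r} should be a bare username or org name")
    for slug in slugs:
        if not SLUG_RE.match(slug):
            warnings.append(f"slug {slug!r} should look like 'owner/title-id'")
    return warnings
```

An empty list means the inputs pass these basic checks; it does not guarantee that the owner or collection actually exists.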

Q: Can I get the full model details for items in a collection? This actor returns the core item metadata (ID, type, author, likes, downloads). For full model cards and metadata, use the Hugging Face Models Scraper with the model IDs as input.