Hugging Face Insights Scraper — Models, Datasets & Spaces

Pricing

from $0.005 / model scraped

Scrape Hugging Face models, datasets, spaces, and daily papers with downloads, likes, parameters, tags, and growth tracking between runs. Filter by pipeline, library, author, or keyword.

Developer

Yuliia Kulakova

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 5 days ago

Hugging Face Insights Scraper

Scrape AI models, datasets, Spaces, and daily research papers from Hugging Face — with downloads, likes, parameters, growth tracking, and smart filters.

Why this scraper

Hugging Face hosts 1M+ models, 300K+ datasets, and a daily feed of trending research papers, but the site itself offers only a search bar and infinite scroll. There is no way to bulk-export results, compare models by parameter count, or see which models gained traction this week compared with last.

This scraper turns Hugging Face into a structured intelligence feed. Filter by pipeline task, ML library, author, or keyword. Get model sizes, architecture details, and popularity analytics. Track download and like growth between scheduled runs. Export to CSV or JSON, or pipe results directly into your dashboard.


What you get

Models — the full picture

  • Name, author, downloads, likes, pipeline task, ML library
  • Parameter count and size tier (tiny / small / medium / large / xlarge / massive)
  • Architecture details (LlamaForCausalLM, MistralForCausalLM, etc.)
  • License, language tags, base model, gated/private status
  • Inference status (warm/cold)
  • Popularity score, engagement ratio, downloads per day, model age

Datasets — structured metadata

  • Name, author, downloads, likes, license
  • Task categories (text-generation, question-answering, etc.)
  • Size category (1K–10K, 10K–100K, 100K–1M, etc.)
  • Language tags, creation date, last modified

Spaces — AI demos and apps

  • Name, author, likes, SDK (Gradio, Streamlit, Docker)
  • Runtime info, tags, creation date

Daily Papers — cutting-edge research

  • Title, full abstract, AI-generated summary and keywords
  • Authors, upvotes, comment count
  • GitHub repo link and star count
  • Arxiv URL, thumbnail, publication date

Smart filters — get exactly what you need

  • Filter by keyword, author/org, pipeline task, ML library
  • Minimum downloads and likes thresholds
  • Parameter range (e.g., only 1B–10B models)
  • Exclude gated or private items
  • Sort by downloads, likes, trending, recently created, or recently modified
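
If you have already exported results and want to narrow them further offline, the same thresholds can be re-applied locally. The sketch below is only an illustration: `filter_models` and its argument names are hypothetical, and it assumes each item carries the `downloads`, `likes`, and `parameters` fields shown in the output sample.

```python
from typing import Iterable, Optional


def filter_models(
    items: Iterable[dict],
    min_downloads: int = 0,
    min_likes: int = 0,
    min_params: Optional[int] = None,
    max_params: Optional[int] = None,
) -> list:
    """Keep only items that meet every threshold (hypothetical local helper)."""
    kept = []
    for item in items:
        if item.get("downloads", 0) < min_downloads:
            continue
        if item.get("likes", 0) < min_likes:
            continue
        if min_params is not None or max_params is not None:
            params = item.get("parameters")  # absent unless details were fetched
            if params is None:
                continue  # range cannot be verified, so drop the item
            if min_params is not None and params < min_params:
                continue
            if max_params is not None and params > max_params:
                continue
        kept.append(item)
    return kept
```

For example, `filter_models(items, min_params=1_000_000_000, max_params=10_000_000_000)` keeps only models in the 1B–10B range and drops items with no parameter count.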

Growth tracking between runs

  • Persistent snapshot store tracks downloads and likes over time
  • On subsequent runs: downloadsDelta, downloadsPerHour, likesDelta, trend (up/down/flat)
  • See which models are gaining or losing momentum
  • Perfect for scheduled monitoring of AI model trends
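
To make the growth fields concrete, here is a minimal sketch of how two snapshots of the same item could be compared. It is not the actor's actual implementation: the `takenAt` timestamp key is hypothetical, and deriving `trend` from the sign of the download delta is an assumption.

```python
from datetime import datetime


def growth(prev: dict, curr: dict) -> dict:
    """Compare two snapshots of one item taken at different times.

    `takenAt` is a hypothetical key; the trend rule is an assumption.
    """
    hours = (curr["takenAt"] - prev["takenAt"]).total_seconds() / 3600
    downloads_delta = curr["downloads"] - prev["downloads"]
    likes_delta = curr["likes"] - prev["likes"]

    if downloads_delta > 0:
        trend = "up"
    elif downloads_delta < 0:
        trend = "down"
    else:
        trend = "flat"

    return {
        "downloadsDelta": downloads_delta,
        # Guard against two snapshots with the same timestamp.
        "downloadsPerHour": round(downloads_delta / hours, 2) if hours > 0 else 0.0,
        "likesDelta": likes_delta,
        "trend": trend,
    }


previous = {"takenAt": datetime(2024, 1, 1), "downloads": 1000, "likes": 10}
current = {"takenAt": datetime(2024, 1, 2), "downloads": 1240, "likes": 12}
result = growth(previous, current)
```

The first run has nothing to compare against, which is why the growth fields only appear on subsequent runs.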

Detailed enrichment (optional)

  • Fetch full model details: exact parameter count, architectures, model type
  • Size tier classification: tiny (<500M) → massive (100B+)
  • Popularity score combining downloads and community engagement
  • Downloads per day normalized by model age
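
The tier classification can be pictured as a lookup against a sorted list of boundaries. In this sketch the 500M and 100B limits come from the list above and the 3B–10B range from the output sample; the 30B split between "large" and "xlarge" is a guess for illustration only, since the listing does not state it.

```python
from bisect import bisect_right

# Lower bound of each tier after "tiny", in parameters.
# 500M, 3B, 10B and 100B appear in this listing; 30B is an assumed split.
BOUNDS = [
    500_000_000,
    3_000_000_000,
    10_000_000_000,
    30_000_000_000,   # assumption: not stated in the listing
    100_000_000_000,
]
TIERS = ["tiny", "small", "medium", "large", "xlarge", "massive"]


def size_tier(parameters: int) -> str:
    """Return the tier whose range contains `parameters`."""
    return TIERS[bisect_right(BOUNDS, parameters)]
```

With these boundaries, the 8,030,261,248-parameter model from the output sample lands in "medium", which matches its `sizeTier` value.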

Example use cases

  • AI researchers: Track trending models in your field, monitor new papers daily
  • ML engineers: Find the best model for your task — filter by pipeline, size, and popularity
  • Investors: Monitor which AI companies are gaining traction on Hugging Face
  • Data teams: Build a dataset catalog filtered by task, size, and license
  • Content creators: Track what's hot in AI this week for newsletters and reports
  • Competitive intelligence: Monitor specific orgs (OpenAI, Meta, Google) and their model releases

Input examples

Trending models right now:

{
  "resourceType": "models",
  "sort": "trending",
  "maxResults": 50
}

LLMs from Meta with full details:

{
  "resourceType": "models",
  "author": "meta-llama",
  "pipeline_tag": "text-generation",
  "sort": "downloads",
  "maxResults": 20,
  "fetchDetails": true
}

Popular code datasets:

{
  "resourceType": "datasets",
  "search": "code",
  "sort": "likes",
  "minLikes": 50,
  "maxResults": 30
}

Today's research papers:

{
  "resourceType": "papers",
  "maxResults": 50
}

Image generation models with 10K+ downloads:

{
  "resourceType": "models",
  "pipeline_tag": "text-to-image",
  "sort": "downloads",
  "minDownloads": 10000,
  "maxResults": 20
}

Output sample (model)

{
  "type": "model",
  "id": "meta-llama/Llama-3.1-8B-Instruct",
  "author": "meta-llama",
  "downloads": 9980754,
  "likes": 6137,
  "pipeline": "text-generation",
  "library": "transformers",
  "parameters": 8030261248,
  "sizeTier": "medium (3B-10B)",
  "architectures": ["LlamaForCausalLM"],
  "modelType": "llama",
  "license": "llama3.1",
  "language": ["en", "de", "fr", "it", "pt", "hi", "es", "th"],
  "popularityScore": 3208,
  "downloadsPerDay": 14157,
  "engagementRatio": 61.49,
  "ageDays": 705,
  "url": "https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct"
}
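
Two of the derived fields in this sample can be reproduced from the raw counts. The formulas below are inferred from the sample values, not taken from the actor's source: `downloadsPerDay` matches downloads divided by `ageDays`, and `engagementRatio` matches likes per 100,000 downloads. The `popularityScore` formula is not stated, so it is left out.

```python
def downloads_per_day(downloads: int, age_days: int) -> int:
    """Average daily downloads over the model's lifetime (inferred formula)."""
    return round(downloads / age_days) if age_days > 0 else downloads


def engagement_ratio(likes: int, downloads: int) -> float:
    """Likes per 100,000 downloads (inferred formula)."""
    return round(likes / downloads * 100_000, 2) if downloads > 0 else 0.0


# Values from the output sample above.
per_day = downloads_per_day(9980754, 705)
ratio = engagement_ratio(6137, 9980754)
```

Both results agree with the sample (`14157` and `61.49`), which suggests but does not prove that the actor computes them this way.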

Integrations

Connect this scraper to any tool in your stack:

  • Google Sheets — auto-sync model rankings weekly
  • Slack / Discord — get alerts when a new trending model appears
  • Webhooks — trigger your pipeline when new data lands
  • API — fetch results programmatically from any language
  • Zapier / Make — connect to 5000+ apps without code
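
For the API route, one option is Apify's synchronous run endpoint, which starts an actor and returns its dataset items in a single call. The sketch below uses only the Python standard library. The actor ID is a placeholder because this listing does not state it, and `build_run_request` and `run_actor` are hypothetical helper names.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build a POST request that runs an actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(actor_id: str, token: str, run_input: dict) -> list:
    """Send the request and parse the JSON array of result items."""
    with urllib.request.urlopen(build_run_request(actor_id, token, run_input)) as resp:
        return json.load(resp)


# "username~actor-name" is a placeholder; substitute the real actor ID.
request = build_run_request(
    "username~actor-name",
    "YOUR_APIFY_TOKEN",
    {"resourceType": "models", "sort": "trending", "maxResults": 50},
)
```

Calling `run_actor(...)` with a real actor ID and token performs the network request. For long runs, starting the run asynchronously and reading the dataset afterwards is likely the safer choice.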

Cost

This actor uses pay-per-result pricing at $5.00 per 1,000 results ($0.005 per item). You only pay for the data you get — no platform usage fees on top.

Example run                          Results   Cost
Top 50 trending models               50        $0.25
All meta-llama models with details   ~20       $0.10
100 text-to-image models             100       $0.50
Today's research papers              ~50       $0.25
1,000 most downloaded models         1,000     $5.00

Platform compute costs are minimal — a typical 100-item run finishes in under 10 seconds.
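
To budget a run before starting it, the per-result price can be multiplied out directly. This is a small sketch using the $0.005 rate stated above; `estimate_cost` is a hypothetical helper, and `Decimal` is used so the cents do not drift with floating-point rounding.

```python
from decimal import Decimal, ROUND_HALF_UP

PRICE_PER_RESULT = Decimal("0.005")  # $5.00 per 1,000 results


def estimate_cost(results: int) -> Decimal:
    """Estimated charge in USD for a run that returns `results` items."""
    total = PRICE_PER_RESULT * results
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For instance, `estimate_cost(50)` gives `Decimal("0.25")` and `estimate_cost(1000)` gives `Decimal("5.00")`, matching the table.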


Limitations

  • Hugging Face API rate limit: 500 requests per 5 minutes (handled automatically with throttling)
  • Parameter count requires fetchDetails: true and is only available for models with safetensors weights
  • Papers endpoint returns daily papers only (no historical archive search)
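
The actor handles throttling for you, but if you call Hugging Face yourself alongside it, a sliding-window limiter is one common way to stay under a limit like 500 requests per 5 minutes. The class below is a generic sketch, not the actor's actual throttling code; the class name and its injectable `clock` and `sleep` arguments are illustrative choices.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls in any `window_seconds` span."""

    def __init__(self, max_requests=500, window_seconds=300.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # timestamps of recent calls, oldest first

    def _evict(self, now: float) -> None:
        # Forget calls that have left the window.
        while self._stamps and now - self._stamps[0] >= self._window:
            self._stamps.popleft()

    def acquire(self) -> None:
        """Block until another request is allowed, then record it."""
        now = self._clock()
        self._evict(now)
        if len(self._stamps) >= self._max:
            # Wait until the oldest recorded call falls out of the window.
            self._sleep(self._window - (now - self._stamps[0]))
            now = self._clock()
            self._evict(now)
        self._stamps.append(now)
```

Call `limiter.acquire()` before each request. Passing a fake clock and sleep function makes the limiter easy to test without real waiting.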