Deprecated

Pricing

from $0.00005 / actor start

Huggingface Model Page Scraper

Scrapes metadata and model card text from Hugging Face model pages.

Rating

0.0

(0)

Developer

Akash Kumar Naik

Last modified

10 days ago

Hugging Face Model Page Scraper

Scrape Hugging Face model pages and return structured model metadata for analytics, monitoring, and downstream pipelines.

What This Actor Does

This Actor collects metadata from Hugging Face model pages, including:

model ID, title, and author
likes and download counters
pipeline tag, library, license, languages, tags
creation/last-modified timestamps
model card text (optional)

It uses CheerioCrawler for fast HTTP scraping and reads model metadata from the page's ModelHeader payload.
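The payload read described above can be sketched roughly as follows. This is not the Actor's actual code: it assumes the payload is entity-encoded JSON in a `data-props` attribute on an element marked `data-target="ModelHeader"`, and it uses a dependency-free regex where the Actor would use a Cheerio selector. The attribute names, the `extractModelHeader` helper, and the `model.id` shape of the sample are all illustrative assumptions.

```typescript
// Assumption: the payload is JSON stored in a data-props attribute on an element
// marked data-target="ModelHeader". Neither attribute name comes from this README.

/** Decode the few HTML entities commonly found in attribute values. */
function decodeEntities(value: string): string {
  return value
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&"); // last, so "&amp;quot;" becomes "&quot;" and not '"'
}

/** Return the parsed payload object, or null when it is missing or malformed. */
function extractModelHeader(html: string): Record<string, unknown> | null {
  // Regex stand-in for a Cheerio selector; assumes the tag contains no raw ">".
  const tag = html.match(/<[^>]*\bdata-target="ModelHeader"[^>]*>/);
  if (!tag) return null;
  const props = tag[0].match(/\bdata-props="([^"]*)"/);
  if (!props) return null;
  try {
    const parsed: unknown = JSON.parse(decodeEntities(props[1] ?? ""));
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      return null;
    }
    return parsed as Record<string, unknown>;
  } catch {
    return null; // malformed JSON is treated the same as a missing payload
  }
}

// Hypothetical sample markup, for illustration only.
const sampleHtml =
  '<div data-target="ModelHeader" data-props="{&quot;model&quot;:' +
  '{&quot;id&quot;:&quot;google-bert/bert-base-uncased&quot;,&quot;likes&quot;:2577}}"></div>';
const header = extractModelHeader(sampleHtml);
```

Returning `null` instead of throwing lets the caller decide whether a missing payload should become an error row.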

Who It Is For

ML platform teams tracking model popularity and changes
data engineers building model catalogs
research teams collecting model metadata at scale

Key Features

Accepts full model URLs, owner/model IDs, and listing URLs like https://huggingface.co/models
Deduplicates model inputs automatically
Optional model card text extraction
Emits structured dataset output for each model
Stores failures as output rows with an error field
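Input handling along these lines could look like the sketch below. It is a guess at the behavior, not the Actor's implementation: the `parseInput` and `dedupeModelIds` helpers, the `owner/model` ID pattern, and the choice to treat only `/models` as a listing URL are assumptions made for illustration. Single-segment model IDs without an owner are treated as invalid here.

```typescript
// Hypothetical helpers; names and matching rules are illustrative assumptions.
type ParsedInput =
  | { kind: "model"; modelId: string }
  | { kind: "listing"; url: string }
  | { kind: "invalid"; raw: string };

const HF_ORIGIN = "https://huggingface.co";
// Exactly two path segments: owner/model.
const ID_PATTERN = /^[\w.-]+\/[\w.-]+$/;

function parseInput(raw: string): ParsedInput {
  const value = raw.trim();
  if (ID_PATTERN.test(value)) return { kind: "model", modelId: value };
  if (value.indexOf(HF_ORIGIN + "/") !== 0) return { kind: "invalid", raw };
  // Strip query string or fragment, then any trailing slashes.
  const path = value
    .slice(HF_ORIGIN.length + 1)
    .replace(/[?#].*$/, "")
    .replace(/\/+$/, "");
  if (path === "models") return { kind: "listing", url: value };
  if (ID_PATTERN.test(path)) return { kind: "model", modelId: path };
  return { kind: "invalid", raw };
}

/** Keep the first occurrence of each model ID, whether given as a URL or a bare ID. */
function dedupeModelIds(inputs: string[]): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const raw of inputs) {
    const parsed = parseInput(raw);
    if (parsed.kind !== "model" || seen.has(parsed.modelId)) continue;
    seen.add(parsed.modelId);
    result.push(parsed.modelId);
  }
  return result;
}
```

Normalizing every input to an `owner/model` ID before comparing is what lets a full URL and a bare ID for the same model collapse into one request.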

Typical Use Cases

Daily monitoring of selected Hugging Face models
Building internal dashboards for likes/download trends
Enriching model registry records with tags and license data

Input Parameters

Provide at least one URL in startUrls.

Parameter	Type	Required	Default	Description
`startUrls`	array	Yes	`[{ "url": "https://huggingface.co/models" }]`	Model page URLs and/or listing URLs.
`maxItems`	integer	No	`30`	Maximum model pages to scrape.
`includeModelCardText`	boolean	No	`false`	Include extracted model card text in output.

Example input:

{
  "startUrls": [
    { "url": "https://huggingface.co/models" },
    { "url": "https://huggingface.co/google-bert/bert-base-uncased" }
  ],
  "maxItems": 30,
  "includeModelCardText": false
}
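For local experiments, the defaults from the parameter table can be applied in code. The sketch below assumes a hypothetical `resolveInput` helper and assumes that missing or invalid values should fall back to the documented defaults; the Actor itself may validate input differently.

```typescript
// Field names and defaults mirror the parameter table above; the helper is hypothetical.
interface ActorInput {
  startUrls: { url: string }[];
  maxItems: number;
  includeModelCardText: boolean;
}

const DEFAULT_START_URL = "https://huggingface.co/models";
const DEFAULT_MAX_ITEMS = 30;

function resolveInput(raw: Partial<ActorInput> | null): ActorInput {
  const given = raw ?? {};
  const startUrls =
    given.startUrls && given.startUrls.length > 0
      ? given.startUrls
      : [{ url: DEFAULT_START_URL }];
  // Assumption: non-numeric or non-positive limits fall back to the default.
  const maxItems =
    typeof given.maxItems === "number" && given.maxItems >= 1
      ? Math.floor(given.maxItems)
      : DEFAULT_MAX_ITEMS;
  return {
    startUrls,
    maxItems,
    includeModelCardText: given.includeModelCardText === true,
  };
}
```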

Output Format

Each dataset item includes fields such as:

{
  "url": "https://huggingface.co/google-bert/bert-base-uncased",
  "canonicalUrl": "https://huggingface.co/google-bert/bert-base-uncased",
  "modelId": "google-bert/bert-base-uncased",
  "title": "google-bert/bert-base-uncased · Hugging Face",
  "author": "google-bert",
  "likes": 2577,
  "downloads": 60873623,
  "downloadsAllTime": 2734678037,
  "pipelineTag": "fill-mask",
  "libraryName": "transformers",
  "license": "apache-2.0",
  "languages": ["en"],
  "tags": ["transformers", "pytorch"],
  "createdAt": "2022-03-02T23:29:04.000Z",
  "lastModified": "2024-02-19T11:06:12.000Z",
  "private": false,
  "gated": false,
  "cardExists": true,
  "inference": "warm",
  "statusCode": 200,
  "modelCardText": null,
  "scrapedAt": "2026-02-28T00:00:00.000Z",
  "error": null
}

Edge cases:

Invalid inputs are skipped, and a warning is written to the log.
Failed requests are still pushed to the dataset, with the error field populated.
modelCardText is null when includeModelCardText is false or when no card text is available.
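A failure row of the kind described above might be built as in the following sketch. The `FailureRow` type and `toErrorRow` helper are hypothetical, and which fields the Actor actually fills on failure is an assumption; only the field names are taken from the sample output.

```typescript
// Subset of the output fields shown above; which ones are set on failure is assumed.
interface FailureRow {
  url: string;
  modelId: string | null;
  statusCode: number | null;
  modelCardText: string | null;
  scrapedAt: string;
  error: string;
}

function toErrorRow(
  url: string,
  error: unknown,
  statusCode: number | null = null,
  now: Date = new Date(),
): FailureRow {
  // Accept anything that was thrown, not only Error instances.
  const message = error instanceof Error ? error.message : String(error);
  return {
    url,
    modelId: null,
    statusCode,
    modelCardText: null,
    scrapedAt: now.toISOString(),
    error: message,
  };
}
```

Passing `now` as a parameter keeps the helper deterministic in tests.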

Quick Start

Run Locally

Install dependencies:

$ npm install

Create input file:

storage/key_value_stores/default/INPUT.json

Run the Actor:

$ apify run

Deploy To Apify

$ apify push

Pricing Expectations

This Actor supports Pay Per Event charging: it calls Actor.charge(...) per successful model scrape when both of the following hold:

Actor monetization is set to PAY_PER_EVENT on Apify.
An event named result exists in the Actor monetization settings.

If the Actor is not on PAY_PER_EVENT pricing, it continues scraping without charging.

Project PPE reference file: .actor/pay-per-event.json.
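The charging decision can be illustrated with a small pure function. This sketch assumes a hypothetical `planCharge` helper that receives the pricing model as a string; how the Actor actually detects its pricing model is not stated in this README. The returned object uses the `eventName` and `count` shape that would be passed to Actor.charge(...), which should be checked against the Apify SDK version in use.

```typescript
// Hypothetical helper: decides whether a charge should be issued for one scrape.
interface ChargeRequest {
  eventName: string;
  count: number;
}

const RESULT_EVENT = "result"; // event name from the monetization settings above

function planCharge(
  pricingModel: string | undefined,
  scrapeSucceeded: boolean,
): ChargeRequest | null {
  // No charge outside PAY_PER_EVENT pricing; scraping continues regardless.
  if (pricingModel !== "PAY_PER_EVENT") return null;
  // Only successful model scrapes are billable.
  if (!scrapeSucceeded) return null;
  return { eventName: RESULT_EVENT, count: 1 };
}
```

Keeping the decision separate from the SDK call makes the billing rule easy to test without an Apify account.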

To reduce cost:

lower maxItems
lower maxConcurrency
increase requestDelayMs when large batches are not urgent

FAQ

Why did I get no results?

Make sure startUrls contains at least one valid model input, for example https://huggingface.co/google-bert/bert-base-uncased.

Do I need a browser crawler?

No. This Actor uses HTTP + Cheerio for speed because target metadata is available in server-rendered HTML payloads.

Why do I see rows with `error`?

Failed requests are intentionally emitted so batch runs keep a complete audit trail of success/failure per model.
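On the consuming side, a downstream job can split a batch by the error field. The sketch below assumes rows carry at least `url` and `error`, as in the sample output; the `partitionRows` helper is hypothetical.

```typescript
// Minimal shape needed for the split; real rows carry many more fields.
interface AuditRow {
  url: string;
  error: string | null;
}

/** Separate successful rows from failed ones without dropping either group. */
function partitionRows<T extends AuditRow>(rows: T[]): { ok: T[]; failed: T[] } {
  const ok: T[] = [];
  const failed: T[] = [];
  for (const row of rows) {
    // A row counts as failed only when error holds a non-empty message.
    if (row.error !== null && row.error !== "") {
      failed.push(row);
    } else {
      ok.push(row);
    }
  }
  return { ok, failed };
}
```

The `failed` list can then drive retries or alerts, while `ok` feeds the catalog or dashboard.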

Can I scrape private or gated models?

Public page metadata is scraped. Access-restricted content may be limited depending on model gating and authentication state.

Legal And Compliance

Respect Hugging Face Terms of Service and robots policy.
Only scrape data you are allowed to collect and store.
Review licensing fields (license) before downstream usage of model artifacts.

Support

For issues or enhancements, open an issue in your project repository or update this Actor code directly in this project.

Related Links

Apify Actors docs: https://docs.apify.com/platform/actors/development
Apify SDK for JavaScript: https://docs.apify.com/sdk/js
Crawlee docs: https://crawlee.dev
