Pricing

from $2.00 / 1,000 results

Hugging Face Datasets Scraper - AI Dataset Metadata

Scrape Hugging Face dataset search results: dataset IDs, authors, downloads, likes, tags and update timestamps.

Rating

0.0

(0)

Developer

Ben

Last modified

6 days ago

What does this actor do?

The Hugging Face Datasets Scraper - AI Dataset Metadata fetches live records from Hugging Face Datasets, normalizes the response, and pushes one dataset item per result. Instead of handling raw nested API responses, rate-limit retries, field cleanup, pagination details, and export formatting yourself, you get a maintained Apify actor with predictable input and output fields.

The actor is designed for practical data work. The default run is kept small so Apify's daily checks finish quickly, and you can raise maxResults for production. It does not use a browser, residential proxy, login session, or paid unblocker, so each run consists of plain HTTP requests only. That keeps the cost low and leaves fewer components that can fail.

Why use this actor?

Teams often need public metadata in repeatable workflows: package monitoring, research discovery, AI dataset collection, market mapping, compliance review, enrichment, trend tracking, or competitive analysis. The hard part is not running one request; it is keeping the connector consistent, scheduled, documented, exportable, and easy for non-developers to reuse.

This actor gives you that connector. You can run it manually, schedule it, call it from the Apify API, or connect it to Make, Zapier, or n8n. You can also download the dataset as JSON, CSV, Excel, XML, RSS, or an HTML table.

Input

{
  "query": "sentiment",
  "maxResults": 25
}

query is the keyword, package name, topic, entity, organization, model, or dataset term to search for. maxResults controls how many rows are pushed to the dataset. Start with a small value while testing and increase it once your workflow is stable.
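As a rough illustration of passing this input from code, the sketch below builds a request for the Apify API v2 "run actor synchronously and get dataset items" endpoint using only the Python standard library. The actor ID and token are placeholders you supply yourself, the function names are made up for this example, and the input checks are this sketch's own rather than the actor's documented validation.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, query: str,
                      max_results: int = 25) -> urllib.request.Request:
    """Build (but do not send) a POST request that runs the actor synchronously.

    actor_id is a placeholder in the form "username~actor-name".
    """
    # These checks are assumptions of this sketch, not documented actor rules.
    if not query.strip():
        raise ValueError("query must not be empty")
    if max_results < 1:
        raise ValueError("maxResults must be at least 1")

    # The request body is the actor input shown above.
    body = json.dumps({"query": query, "maxResults": max_results}).encode("utf-8")
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def run_actor(actor_id: str, token: str, query: str, max_results: int = 25) -> list:
    """Send the request and return the dataset items as a list of dicts."""
    request = build_run_request(actor_id, token, query, max_results)
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect the URL and body before spending any credits on a real run.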

Output

Every result is pushed to the default Apify dataset as a flat JSON object. Field names vary by source, but typical rows include identifiers, names or titles, descriptions, URLs, timestamps, counts, scores, owners, tags, references, and the original search term.

{
  "name": "example result",
  "description": "Normalized metadata from Hugging Face Datasets",
  "url": "https://example.com/result",
  "source": "Hugging Face Datasets",
  "search": "sentiment"
}

Common use cases

Use this actor for source monitoring, public data enrichment, internal search indexes, lead and account research, AI/RAG dataset preparation, technical due diligence, package ecosystem reports, research discovery, SEO content research, dashboards, and recurring CSV exports.

For developer and package sources, it helps track projects, packages, maintainers, download signals, repository links, and descriptions. For research and data sources, it helps collect papers, datasets, organizations, entities, taxonomy records, and metadata that can be joined with your own systems.

Data quality

The actor reads live public data at runtime. HTML snippets are cleaned, nested fields are flattened where useful, and each row includes a source and search field so scheduled runs can be merged safely. Lists are capped to practical sizes in the output to avoid creating oversized records.

Because this actor uses public endpoints, data availability depends on the upstream service. If a result is removed, renamed, or updated upstream, the next run reflects that change. This is useful for monitoring workflows where freshness matters more than static snapshots.
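One way to merge scheduled runs on those fields is sketched below. It assumes every row carries the source, search, and url fields from the sample output and treats that triple as the row's identity; the function name is hypothetical, and if your rows expose a more stable identifier you would key on that instead.

```python
def merge_runs(older: list[dict], newer: list[dict]) -> list[dict]:
    """Merge two runs keyed on (source, search, url); rows from the newer run win."""
    merged: dict[tuple, dict] = {}
    for row in older + newer:
        key = (row.get("source"), row.get("search"), row.get("url"))
        # A later row replaces an earlier row that has the same key.
        merged[key] = row
    return list(merged.values())
```

Note that this keeps rows that have disappeared upstream. If you want the merged table to mirror the latest run exactly, use the newer run on its own.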

Reliability

This is a direct HTTP actor. It avoids browser automation, CAPTCHA workflows, cookie state, and proxy dependencies. Requests use timeouts, follow redirects, retry on failure, and send a normal browser-like user agent. That makes the actor suitable for Apify scheduled runs and daily store reliability tests.
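The general pattern of a request with a timeout, retries, and a browser-like user agent can be sketched as follows. This is not the actor's source code: the function name, retry count, backoff schedule, and user agent string are all assumptions chosen for illustration. The standard library's urlopen follows redirects by default.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retries(url, attempts=3, timeout=10.0, backoff=1.0,
                       opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch a URL, retrying with exponential backoff on network errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # Hypothetical user agent string, for illustration only.
    request = urllib.request.Request(
        url, headers={"User-Agent": "Mozilla/5.0 (compatible; example-client)"}
    )
    last_error = None
    for attempt in range(attempts):
        try:
            with opener(request, timeout=timeout) as response:
                return response.read()
        except (urllib.error.URLError, TimeoutError) as error:
            last_error = error
            # Wait 1x, 2x, 4x ... the base delay, but not after the final attempt.
            if attempt < attempts - 1:
                sleep(backoff * 2 ** attempt)
    raise last_error
```

The opener and sleep parameters exist only so the function can be exercised without network access. A production version would usually skip retries for client errors such as 404.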

If the upstream API changes, the actor can be patched while keeping the same Apify input and output workflow for users. Downstream systems should rely on stable identifiers and URLs where available.

Pricing

The actor uses pay-per-event pricing. A small run-start fee covers orchestration, and the result event is charged per dataset item. This keeps small monitoring jobs affordable while allowing larger exports when needed.
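To budget a run, the two charges can be added up as in the sketch below. The per-result rate defaults to the listed starting price of $2.00 per 1,000 results; the run-start fee is not stated on this page, so it is left as a parameter that defaults to zero. The function name is hypothetical.

```python
def estimate_run_cost(results: int, price_per_1000: float = 2.00,
                      start_fee: float = 0.0) -> float:
    """Estimate the cost in USD of one run: start fee plus per-result charges."""
    if results < 0:
        raise ValueError("results must not be negative")
    # start_fee is an assumption; check the actor's pricing tab for the real value.
    return start_fee + results * price_per_1000 / 1000
```

At the listed rate, the default run of 25 results comes to about $0.05 in result charges before the run-start fee is added.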

FAQ

Does this require an API key?

No. The default workflow uses public endpoints and does not require user credentials.

Can I run it on a schedule?

Yes. Create a saved task with your query and schedule it hourly, daily, weekly, or monthly.

Can I export the data?

Yes. Apify datasets export to JSON, CSV, Excel, XML, RSS, and HTML table. You can also consume results through the Apify API.

Is this a browser scraper?

No. It uses direct HTTP requests for speed, low cost, and reliability.

Can I use it for enrichment?

Yes. Keep the identifier, URL, source, and search fields in your warehouse so you can join results with internal records.

Related actors

You might also like: StepStone Scraper, Open VSX Extensions Scraper, NuGet Package Scraper, CISA KEV Scraper, NVD CVE Scraper, Crossref Papers Scraper, Hugging Face Models Scraper, GitLab Projects Scraper, and Wikipedia Scraper.

Keywords

Hugging Face Datasets scraper, Hugging Face Datasets API, public data scraper, metadata scraper, Apify Hugging Face Datasets, JSON export, CSV export, research data, developer tools data, dataset scraper, monitoring actor, no-code data extraction, automation workflow, business data enrichment

Production workflow tips

For recurring monitoring, create separate saved tasks for your most important topics instead of one giant run. Smaller scheduled runs are easier to compare over time, easier to retry, and cheaper to debug. If you use the output for alerts, compare stable IDs first, then compare descriptions, counts, timestamps, URLs, or status fields.
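The comparison order described above (IDs first, then other fields) might look like the following sketch. It assumes the url field serves as the stable ID and watches only the description field by default; both choices are placeholders to adapt to the fields your rows actually contain, and the function name is made up for this example.

```python
def diff_runs(previous: list[dict], current: list[dict],
              id_field: str = "url", watch: tuple = ("description",)) -> dict:
    """Compare two runs by stable ID, then by selected fields."""
    prev = {row[id_field]: row for row in previous}
    curr = {row[id_field]: row for row in current}

    # Step 1: compare IDs to find rows that appeared or disappeared.
    added = [curr[key] for key in curr if key not in prev]
    removed = [prev[key] for key in prev if key not in curr]

    # Step 2: for IDs present in both runs, compare the watched fields.
    changed = [
        curr[key]
        for key in curr
        if key in prev
        and any(prev[key].get(field) != curr[key].get(field) for field in watch)
    ]
    return {"added": added, "removed": removed, "changed": changed}
```

An alerting job could then notify only when one of the three lists is non-empty.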

For data warehouse workflows, store the Apify run ID and the search field with each record. That makes it easy to trace where a row came from, rebuild a historical snapshot, or merge multiple actor outputs into one table without losing source context.
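A minimal sketch of that tagging step is shown below. The apify_run_id column name and the function name are hypothetical; the run ID is whatever identifier Apify reports for the run that produced the rows.

```python
def tag_rows(rows: list[dict], run_id: str, search: str) -> list[dict]:
    """Return copies of the rows with the run ID and search term attached."""
    return [
        {
            **row,
            "apify_run_id": run_id,  # hypothetical column name for the warehouse
            # Keep the row's own search value if the actor already set one.
            "search": row.get("search", search),
        }
        for row in rows
    ]
```

The function copies each row instead of modifying it in place, so the raw output stays available for other steps in the pipeline.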

Maintenance approach

This actor intentionally uses official or public endpoints with simple response formats. There is no login to expire and no browser fingerprint to maintain. That makes it a practical part of a larger portfolio of reliable data connectors rather than a fragile one-off script.

Hugging Face Models Scraper

fetch_cat/hugging-face-models-scraper

🤗 Scrape public Hugging Face model metadata, downloads, likes, tags, licenses, and update signals for AI market research.

Hanna Nosova

Hugging Face Models Scraper — Search, Downloads, Likes, Tags

seemuapps/huggingface-models-scraper

Search Hugging Face for models by task, tag, or keyword and export downloads, likes, library, license, and tags to a clean dataset.

Andrew

Hugging Face Scraper — AI Models & Datasets

hichemdev/huggingface-scraper

Scrape Hugging Face models and datasets: downloads, likes, task, library, tags, author and dates. Track trending AI models via the official Hugging Face API.

Hichem Ben Moussa

Hugging Face Scraper - Models, Datasets, Papers

logiover/huggingface-hub-intelligence-scraper

Hugging Face data export tool: scrape models, datasets & daily papers without a token. Export to CSV/JSON. A no-login Hugging Face API alternative.

Logiover

Hugging Face Datasets Scraper

parseforge/hugging-face-datasets-scraper

Scrape dataset metadata from Hugging Face Hub. Extract names, authors, download counts, likes, trending scores, task categories, size categories, languages, licenses, tags and descriptions. Filter by search query, task type, language, or license. Sort by trending, downloads, likes, or last modified.

ParseForge

Hugging Face Models Scraper - AI/ML Data

benthepythondev/huggingface-models-scraper

Search Hugging Face for AI/ML models or datasets by keyword and get structured data: id, author, task, downloads, likes, library, tags, license and dates. Fast and reliable via the public Hugging Face Hub API. For AI/ML market research, model discovery and trend tracking.

Ben

Hugging Face Model & Dataset Scraper

cloud9_ai/huggingface-scraper

Search and extract ML models and datasets from Hugging Face Hub. Get model cards, download stats, tasks, and architectures. No API key needed.

cloud9

Hugging Face Scraper - Models Datasets Spaces

openclawmara/huggingface-scraper

Scrape Hugging Face models, datasets, and Spaces. Extracts metadata, downloads, likes, tags, and usage stats. Ideal for AI model discovery, competitive analysis, and tracking trending ML resources.

OpenClaw Mara

Hugging Face Scraper

straightforward_hydra/huggingface-scraper

AI model intelligence from the open Hugging Face Hub API: trending models, datasets and spaces by task, author and library. No API key.

Dev D

Hugging Face Model Scraper

parseforge/hugging-face-model-scraper

Collect models from Hugging Face Hub via public API endpoints. Get metadata including author, downloads, likes, lastModified, task, library, license, tags and filenames.

ParseForge

5.0
