Hugging Face Datasets Scraper

Pricing

Pay per event

Scrape dataset metadata from Hugging Face Hub. Extract names, authors, download counts, likes, trending scores, task categories, size categories, languages, licenses, tags and descriptions. Filter by search query, task type, language, or license. Sort by trending, downloads, likes, or last modified.

Developer

ParseForge

Maintained by Community

Last modified: 14 days ago

📊 Hugging Face Datasets Scraper

🕒 Last updated: 2026-05-05

Collect comprehensive dataset metadata from Hugging Face Hub without any coding required. Perfect for researchers building ML benchmarks, companies monitoring trending datasets, or analysts comparing dataset sizes and download trends. Search by keyword, filter by task category and language, and export dataset metadata as CSV, JSON, or Excel in minutes.

The Hugging Face Datasets Scraper collects structured metadata for up to 1,000,000 datasets per run from Hugging Face Hub, with no authentication needed.

✨ What Does It Do

  • 🎯 Dataset ID - Identify each dataset uniquely in author/name format for tracking and cross-referencing
  • πŸ“ Description - Capture full dataset descriptions to understand what each dataset contains
  • πŸ‘€ Author - Track which teams or creators published each dataset
  • πŸ“₯ Download Count - Monitor how many times each dataset has been downloaded to gauge adoption
  • πŸ‘ Like Count - See community engagement metrics for each dataset
  • 🏷️ Tags - Collect descriptive tags to classify datasets by topic and use case
  • πŸ’Ύ Size Categories - Get size category metrics to understand dataset scale and requirements
  • πŸ“… Created and Modified Dates - Track when datasets were published and last updated
  • πŸ”’ Access Status - Identify private or gated datasets requiring special permissions
  • βš–οΈ License - Collect license types to ensure legal compliance for your use case
  • 🌍 Languages - See which languages are covered in each dataset
  • πŸ”₯ Trending Score - Track which datasets are gaining momentum on the platform

🔧 Input

  • Max Items - Set how many datasets to collect (free users up to 100, paid users up to 1,000,000)
  • Search Query - Enter keywords to filter datasets by name and description
  • Task Category - Narrow results to specific ML task types like text-classification or image-classification
  • Language - Filter datasets by language code like "en" for English or "zh" for Chinese
  • License - Filter datasets by license type like "apache-2.0", "mit", or "cc-by-4.0"
  • Sort By - Choose how to rank results: Trending Score, Downloads, Likes, or Last Modified
  • Sort Direction - Order results descending (highest first) or ascending (lowest first)

Example input:

{
    "maxItems": 100,
    "query": "text classification",
    "taskCategory": "text-classification",
    "language": "en",
    "license": "apache-2.0",
    "sort": "downloads",
    "direction": "desc"
}
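The same input can be assembled and checked in code before a run is started. The sketch below is illustrative only: the field names come from the example input above, but the accepted `sort` values other than "downloads" and the "asc" direction are assumptions, and `run_actor` is a hypothetical helper that relies on the `apify-client` package.

```python
from typing import Any, Dict, List, Optional

# Assumption: only "downloads" and "desc" appear in the example input;
# the other values are guesses based on the Sort By and Sort Direction options.
ASSUMED_SORT_KEYS = {"trending", "downloads", "likes", "lastModified"}
ASSUMED_DIRECTIONS = {"asc", "desc"}


def build_run_input(
    max_items: int = 100,
    query: Optional[str] = None,
    task_category: Optional[str] = None,
    language: Optional[str] = None,
    license_id: Optional[str] = None,
    sort: str = "downloads",
    direction: str = "desc",
) -> Dict[str, Any]:
    """Build the Actor input, leaving out filters that were not set."""
    if not 1 <= max_items <= 1_000_000:
        raise ValueError("maxItems must be between 1 and 1,000,000")
    if sort not in ASSUMED_SORT_KEYS:
        raise ValueError(f"unknown sort key: {sort!r}")
    if direction not in ASSUMED_DIRECTIONS:
        raise ValueError(f"unknown direction: {direction!r}")
    run_input = {
        "maxItems": max_items,
        "query": query,
        "taskCategory": task_category,
        "language": language,
        "license": license_id,
        "sort": sort,
        "direction": direction,
    }
    return {key: value for key, value in run_input.items() if value is not None}


def run_actor(token: str, actor_id: str, run_input: Dict[str, Any]) -> List[dict]:
    """Hypothetical helper: start a run and return its dataset items."""
    # Imported lazily so build_run_input works without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Calling `build_run_input(query="text classification", language="en")` yields a dictionary that can be pasted into the input form as JSON or passed to `run_actor` together with your API token and the Actor's ID from its Store page.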

📊 Output

Each dataset record includes up to 21 data fields. Download as JSON, CSV, or Excel.

🎯 Dataset IDπŸ“ Dataset NameπŸ‘€ Author
πŸ“„ Description🏷️ Task CategoriesπŸ’Ύ Size Categories
🌍 Languagesβš–οΈ LicenseπŸ“₯ Downloads
πŸ‘ LikesπŸ”₯ Trending ScoreπŸ“… Created At
πŸ“… Last ModifiedπŸ”— Dataset URLπŸ”’ Is Private
πŸ”“ Is Gated🎫 Tagsβœ… Is Disabled
πŸ” Latest SHAπŸ“‹ Scraped At⚠️ Error Messages
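Once exported as JSON, these records can be filtered and ranked with a few lines of code. The following is a minimal sketch, not the Actor's documented schema: the key names (`id`, `license`, `downloads`, `isGated`, `isPrivate`, `error`) are hypothetical stand-ins for the display labels above, and the sample records are made up.

```python
from typing import Any, Dict, Iterable, List


def top_datasets(
    records: Iterable[Dict[str, Any]], license_id: str, limit: int = 3
) -> List[str]:
    """Return IDs of the most-downloaded open datasets under one license."""
    usable = [
        r for r in records
        if r.get("license") == license_id
        and not r.get("isGated")    # skip datasets that need approval
        and not r.get("isPrivate")  # skip private datasets
        and not r.get("error")      # skip rows where scraping failed
    ]
    usable.sort(key=lambda r: r.get("downloads") or 0, reverse=True)
    return [r["id"] for r in usable[:limit]]


# Made-up records for illustration only.
sample = [
    {"id": "org-a/reviews", "license": "mit", "downloads": 1200, "isGated": False},
    {"id": "org-b/news", "license": "mit", "downloads": 5400, "isGated": False},
    {"id": "org-c/medical", "license": "mit", "downloads": 9000, "isGated": True},
    {"id": "org-d/tweets", "license": "apache-2.0", "downloads": 700, "isGated": False},
]
print(top_datasets(sample, "mit"))  # ['org-b/news', 'org-a/reviews']
```

Check the keys in your own export before reusing this, since the real field names may differ.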

💎 Why Choose the Hugging Face Datasets Scraper?

| Feature | Our Actor | Similar Tools |
|---|---|---|
| No authentication required | ✔️ | ❌ |
| Direct access to live data (fastest) | ✔️ | ❌ |
| Filter by task category | ✔️ | ❌ |
| Filter by language | ✔️ | ❌ |
| Filter by license | ✔️ | ❌ |
| Sort by trending, likes, downloads, or date | ✔️ | Partial |
| Download counts included | ✔️ | ❌ |
| Like counts included | ✔️ | ❌ |
| Up to 1,000,000 datasets per run | ✔️ | ❌ |
| Free tier up to 100 datasets | ✔️ | ✔️ |
| Export to CSV, JSON, Excel | ✔️ | ✔️ |

📋 How to Use

No technical skills required. Follow these simple steps:

  1. Sign Up: Create a free account with $5 credit
  2. Find the Tool: Search for "Hugging Face Datasets Scraper" in the Apify Store and configure your input
  3. Run It: Click "Start" and watch your results appear

That's it. No coding, no setup, no complicated configuration. When the run finishes, export your data in CSV, Excel, or JSON format.

🎯 Business Use Cases

  • πŸ“Š ML Researchers - Search for datasets by task category to find training data for natural language processing projects, saving weeks of manual research
  • 🏒 Data Scientists - Monitor download trends on trending datasets to identify emerging benchmarks that competitors are using
  • πŸ“ˆ Product Managers - Track which datasets have the most community engagement to understand which domains are hot in AI development


✨ Why choose this Actor

Capability
🎯Built for the job. Scoped specifically to this data source so you skip the parser engineering entirely.
πŸ”–Structured output. Clean, typed fields ready for analysis, dashboards, or downstream pipelines.
⚑Fast. Optimized request patterns return results in seconds, not minutes.
πŸ”Always fresh. Every run pulls live data, so the dataset reflects the source as of run time.
🌐No infra to manage. Apify handles proxies, retries, scaling, scheduling, and storage.
πŸ›‘οΈReliable. Battle-tested across many runs and edge cases, with graceful error handling.
🚫No code required. Configure in the UI, run from CLI, schedule via cron, or call from any language with the Apify SDK.

πŸ“Š Production-grade structured data without the engineering overhead of building and maintaining your own scraper.


📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
|---|---|---|---|---|---|
| ⭐ Hugging Face Datasets Scraper (this Actor) | $5 free credit, then pay-per-use | Full source coverage | Live per run | Source-native filters supported | ⚡ 2 min |
| Build your own scraper | Engineering hours | Full once built | Whenever you maintain it | Custom code | 🐢 Days to weeks |
| Paid managed APIs | $$$ monthly | Vendor-defined | Live | Vendor-defined | ⏳ Hours |
| Third-party data dumps | Varies | Subset, often stale | Periodic | None | 🕒 Variable |

Pick this Actor when you want broad coverage, server-side filtering, and no pipeline maintenance.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the Hugging Face Datasets Scraper page on the Apify Store.
  3. 🎯 Set input. Configure the input fields in the form (or paste a JSON object), then set maxItems.
  4. 🚀 Run it. Click Start and let the Actor collect your data.
  5. 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.


💼 Business use cases

📊 Data & Analytics

  • Build trend reports and dashboards from live source data
  • Feed BI tools, warehouses, and ML pipelines with structured records
  • Run periodic snapshots to track changes over time
  • Compare segments, regions, or categories with consistent fields
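Periodic snapshots become useful once two runs are compared. The sketch below shows one way to compute download growth between two exports. The `id` and `downloads` keys are hypothetical names for the Dataset ID and Downloads fields, and the numbers are invented.

```python
from typing import Any, Dict, Iterable, List, Tuple


def download_growth(
    previous: Iterable[Dict[str, Any]], current: Iterable[Dict[str, Any]]
) -> List[Tuple[str, int]]:
    """Return (dataset ID, download increase) pairs, largest increase first."""
    before = {r["id"]: r.get("downloads") or 0 for r in previous}
    growth = [
        (r["id"], (r.get("downloads") or 0) - before[r["id"]])
        for r in current
        if r["id"] in before  # datasets new in `current` have no baseline
    ]
    return sorted(growth, key=lambda pair: pair[1], reverse=True)


# Invented snapshots for illustration only.
monday = [
    {"id": "org-a/reviews", "downloads": 1000},
    {"id": "org-b/news", "downloads": 5000},
]
friday = [
    {"id": "org-a/reviews", "downloads": 1600},
    {"id": "org-b/news", "downloads": 5100},
    {"id": "org-e/new", "downloads": 40},
]
print(download_growth(monday, friday))
# [('org-a/reviews', 600), ('org-b/news', 100)]
```

Datasets that appear only in the newer snapshot are skipped here. Reporting them separately as new arrivals would be an equally reasonable choice.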

🏒 Operations & Strategy

  • Monitor competitor moves, pricing, and inventory shifts
  • Build internal directories and lookup tools backed by current data
  • Power workflows that depend on fresh source records
  • Cut manual data-gathering time from hours to minutes

🎯 Marketing & Growth

  • Identify market opportunities and trending topics
  • Research target audiences and customer personas at scale
  • Power lead-generation pipelines with verified records
  • Track sentiment, reviews, or social signals over time

🛠️ Engineering & Product

  • Prototype features that need real-world data without owning a crawler
  • Replace fragile in-house scrapers with a managed Actor
  • Wire datasets into your apps via the Apify API or webhooks
  • Skip the proxy, retry, and parsing maintenance entirely

🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Empirical datasets for papers, thesis work, and coursework
  • Longitudinal studies tracking changes across snapshots
  • Reproducible research with cited, versioned data pulls
  • Classroom exercises on data analysis and ethical scraping

🎨 Personal and creative

  • Side projects, portfolio demos, and indie app launches
  • Data visualizations, dashboards, and infographics
  • Content research for bloggers, YouTubers, and podcasters
  • Hobbyist collections and personal trackers

🀝 Non-profit and civic

  • Transparency reporting and accountability projects
  • Advocacy campaigns backed by public-interest data
  • Community-run databases for local issues
  • Investigative journalism on public records

🧪 Experimentation

  • Prototype AI and machine-learning pipelines with real data
  • Validate product-market hypotheses before engineering spend
  • Train small domain-specific models on niche corpora
  • Test dashboard concepts with live input

❓ Frequently Asked Questions

🔌 Integrate with any app

Hugging Face Datasets Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe results into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh data into your product backend, or alert your team in Slack.
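On the receiving side, a webhook handler mostly needs to pull the dataset ID out of the request body. The sketch below assumes Apify's default webhook payload, in which `eventType` names the event and `resource` carries the run object. A custom payload template would need different parsing, and the function name is hypothetical.

```python
from typing import Any, Dict, Optional


def dataset_id_from_webhook(payload: Dict[str, Any]) -> Optional[str]:
    """Return the finished run's dataset ID, or None for other events."""
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None  # ignore failed, aborted, or timed-out runs
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")


# Trimmed, made-up payload for illustration only.
example_payload = {
    "eventType": "ACTOR.RUN.SUCCEEDED",
    "resource": {"id": "run-1", "defaultDatasetId": "abc123"},
}
print(dataset_id_from_webhook(example_payload))  # abc123
```

The returned ID can then be used to fetch the run's items and push them into your backend or post a summary to Slack.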


💡 More ParseForge Actors

Browse our complete collection of data extraction tools for more.

🚀 Ready to Start?

Create a free account with $5 credit and collect your first 100 datasets for free. No coding, no setup.

🆘 Need Help?

  • Check the FAQ section above for common questions
  • Visit the Apify support page for documentation and tutorials
  • Contact us through the Tally contact form to request a new scraper, propose a custom project, or report an issue

⚠️ Disclaimer

This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Hugging Face or any of its subsidiaries. All trademarks mentioned are the property of their respective owners.


💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.