Hugging Face Datasets Scraper

Pricing

$5.00/month + usage

Scrape dataset metadata from Hugging Face Hub. Extract names, authors, download counts, likes, trending scores, task categories, size categories, languages, licenses, tags and descriptions. Filter by search query, task type, language, or license. Sort by trending, downloads, likes, or last modified.

Developer

ParseForge

Maintained by Community

📊 Hugging Face Datasets Scraper

Collect comprehensive dataset metadata from Hugging Face Hub without any coding required. Perfect for researchers building ML benchmarks, companies monitoring trending datasets, or analysts comparing dataset sizes and download trends. Search by keyword, filter by task category and language, and export dataset metadata as CSV, JSON, or Excel in minutes.

The Hugging Face Datasets Scraper collects structured metadata for up to 1,000,000 datasets per run from Hugging Face Hub, with no authentication needed.

✨ What Does It Do

  • 🎯 Dataset ID - Identify each dataset uniquely in author/name format for tracking and cross-referencing
  • 📝 Description - Capture full dataset descriptions to understand what each dataset contains
  • 👤 Author - Track which teams or creators published each dataset
  • 📥 Download Count - Monitor how many times each dataset has been downloaded to gauge adoption
  • 👍 Like Count - See community engagement metrics for each dataset
  • 🏷️ Tags - Collect descriptive tags to classify datasets by topic and use case
  • 💾 Size Categories - Get size category metrics to understand dataset scale and requirements
  • 📅 Created and Modified Dates - Track when datasets were published and last updated
  • 🔒 Access Status - Identify private or gated datasets requiring special permissions
  • ⚖️ License - Collect license types to ensure legal compliance for your use case
  • 🌍 Languages - See which languages are covered in each dataset
  • 🔥 Trending Score - Track which datasets are gaining momentum on the platform

🔧 Input

  • Max Items - Set how many datasets to collect (free users up to 100, paid users up to 1,000,000)
  • Search Query - Enter keywords to filter datasets by name and description
  • Task Category - Narrow results to specific ML task types like text-classification or image-classification
  • Language - Filter datasets by language code like "en" for English or "zh" for Chinese
  • License - Filter datasets by license type like "apache-2.0", "mit", or "cc-by-4.0"
  • Sort By - Choose how to rank results: Trending Score, Downloads, Likes, or Last Modified
  • Sort Direction - Order results descending (highest first) or ascending (lowest first)

Example input:

{
  "maxItems": 100,
  "query": "text classification",
  "taskCategory": "text-classification",
  "language": "en",
  "license": "apache-2.0",
  "sort": "downloads",
  "direction": "desc"
}
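
The example input can also be assembled programmatically. The following is a minimal sketch, not the Actor's own validation logic: `build_input` is a hypothetical helper, and only the `"downloads"` and `"desc"` values are confirmed by the example. The other sort and direction spellings are guesses based on the option labels.

```python
import json

# Assumed allowed values: only "downloads" and "desc" appear in the example
# input; the other spellings are guesses based on the Sort By and
# Sort Direction option labels.
ASSUMED_SORTS = {"trending", "downloads", "likes", "lastModified"}
ASSUMED_DIRECTIONS = {"desc", "asc"}
MAX_ITEMS_LIMIT = 1_000_000  # paid-plan ceiling stated in the Input section


def build_input(max_items=100, query=None, task_category=None,
                language=None, license_id=None,
                sort="downloads", direction="desc"):
    """Return an input dict shaped like the example, omitting unset filters."""
    if not 1 <= max_items <= MAX_ITEMS_LIMIT:
        raise ValueError(f"maxItems must be between 1 and {MAX_ITEMS_LIMIT}")
    if sort not in ASSUMED_SORTS:
        raise ValueError(f"unexpected sort value: {sort!r}")
    if direction not in ASSUMED_DIRECTIONS:
        raise ValueError(f"unexpected direction value: {direction!r}")

    run_input = {"maxItems": max_items, "sort": sort, "direction": direction}
    optional = {
        "query": query,
        "taskCategory": task_category,
        "language": language,
        "license": license_id,
    }
    # Keep only the filters that were actually provided.
    run_input.update({key: value for key, value in optional.items() if value})
    return run_input


example = build_input(
    max_items=100,
    query="text classification",
    task_category="text-classification",
    language="en",
    license_id="apache-2.0",
)
print(json.dumps(example, indent=2))
```

Validating before a run catches typos such as a misspelled sort key early. Adjust the assumed value sets once the Actor's real input schema is known.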

📊 Output

Each dataset includes up to 20 data fields. Download as JSON, CSV, or Excel.

🎯 Dataset ID | 📝 Dataset Name | 👤 Author
📄 Description | 🏷️ Task Categories | 💾 Size Categories
🌍 Languages | ⚖️ License | 📥 Downloads
👍 Likes | 🔥 Trending Score | 📅 Created At
📅 Last Modified | 🔗 Dataset URL | 🔒 Is Private
🔓 Is Gated | 🎫 Tags | ✅ Is Disabled
🔐 Latest SHA | 📋 Scraped At | ⚠️ Error Messages
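
Once exported as JSON, the records can be filtered and ranked with a few lines of Python. This is a sketch under stated assumptions: the key names (`id`, `license`, `downloads`, `likes`) are inferred from the field labels and may differ in a real export, and the records below are made-up placeholders, not real Hub data.

```python
import json

# Made-up placeholder records, not real Hub data. The key names are
# assumptions inferred from the field labels; check them against an export.
exported = json.loads("""
[
  {"id": "example-org/dataset-a", "license": "apache-2.0", "downloads": 1200, "likes": 15},
  {"id": "example-org/dataset-b", "license": "mit", "downloads": 300, "likes": 40},
  {"id": "example-user/dataset-c", "license": "apache-2.0", "downloads": 4500, "likes": 7}
]
""")


def top_by_downloads(records, license_id, limit=10):
    """Return records with the given license, most downloaded first."""
    matching = [r for r in records if r.get("license") == license_id]
    # Treat a missing or null download count as zero when sorting.
    ranked = sorted(matching, key=lambda r: r.get("downloads") or 0, reverse=True)
    return ranked[:limit]


for record in top_by_downloads(exported, "apache-2.0"):
    print(record["id"], record["downloads"])
```

To use a real export, replace the inline string with the contents of the downloaded JSON file.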

💎 Why Choose the Hugging Face Datasets Scraper?

Feature | Our Actor | Similar Tools
No authentication required | ✔️ | ❌
Direct access to live data (fastest) | ✔️ | ❌
Filter by task category | ✔️ | ❌
Filter by language | ✔️ | ❌
Filter by license | ✔️ | ❌
Sort by trending, likes, downloads, or date | ✔️ | Partial
Download counts included | ✔️ | ❌
Like counts included | ✔️ | ❌
Up to 1,000,000 datasets per run | ✔️ | ❌
Free tier up to 100 datasets | ✔️ | ✔️
Export to CSV, JSON, Excel | ✔️ | ✔️

📋 How to Use

No technical skills required. Follow these simple steps:

  1. Sign Up: Create a free account with $5 credit
  2. Find the Tool: Search for "Hugging Face Datasets Scraper" in the Apify Store and configure your input
  3. Run It: Click "Start" and watch your results appear

That's it. No coding, no setup, no complicated configuration. Now you can export your data in CSV, Excel, or JSON format.

🎯 Business Use Cases

  • 📊 ML Researchers - Search for datasets by task category to find training data for natural language processing projects, saving weeks of manual research
  • 🏢 Data Scientists - Monitor download trends to identify emerging benchmarks that competitors are using
  • 📈 Product Managers - Track which datasets have the most community engagement to understand which domains are hot in AI development

❓ FAQ

🔍 How does this scraper work? The Hugging Face Datasets Scraper connects to Hugging Face's public dataset listing, so no account is needed and data is always current.
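
For orientation, the public Hub exposes a dataset listing endpoint that accepts search, filter, and sort parameters. The sketch below only builds a request URL for that endpoint and makes no network call. Whether the Actor uses this exact endpoint is an assumption, as are the mapping of the Actor's filters to `task_categories:`, `language:`, and `license:` filter tags and the helper name `build_listing_url`.

```python
from urllib.parse import urlencode

# Public Hub listing endpoint. That the Actor calls this endpoint is an
# assumption; the page only says it reads the "public dataset listing".
HUB_DATASETS_API = "https://huggingface.co/api/datasets"


def build_listing_url(search=None, task_category=None, language=None,
                      license_id=None, sort="downloads", descending=True,
                      limit=100):
    """Build a listing URL; the filter-tag prefixes are assumed mappings."""
    filters = []
    if task_category:
        filters.append(f"task_categories:{task_category}")
    if language:
        filters.append(f"language:{language}")
    if license_id:
        filters.append(f"license:{license_id}")

    params = {"sort": sort, "limit": limit}
    if descending:
        params["direction"] = -1  # -1 requests descending order
    if search:
        params["search"] = search
    if filters:
        params["filter"] = filters  # doseq=True repeats the key per value

    return f"{HUB_DATASETS_API}?{urlencode(params, doseq=True)}"


print(build_listing_url(
    search="text classification",
    task_category="text-classification",
    language="en",
    license_id="apache-2.0",
))
```

Because the listing is public, a URL like this can be opened without a token, which is consistent with the no-account claim above.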

📊 Is the data accurate? Yes. The dataset metadata, including download and like counts, comes directly from Hugging Face at the time of each run.

📅 Can I schedule regular runs? Absolutely. You can set up recurring collections on any schedule (hourly, daily, or weekly) to track how datasets grow over time.
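
Each scheduled run yields a snapshot, and comparing two snapshots shows growth. A minimal sketch, assuming exported records carry `id` and `downloads` keys (inferred from the field labels, not confirmed); the snapshot values are made-up placeholders.

```python
def download_growth(previous, current):
    """Map dataset id to the change in downloads between two runs.

    Datasets missing from the earlier snapshot are skipped because no
    baseline exists for them.
    """
    baseline = {r["id"]: r.get("downloads") or 0 for r in previous}
    return {
        r["id"]: (r.get("downloads") or 0) - baseline[r["id"]]
        for r in current
        if r["id"] in baseline
    }


# Made-up placeholder snapshots, not real Hub data.
earlier_run = [
    {"id": "example-org/dataset-a", "downloads": 1000},
    {"id": "example-org/dataset-b", "downloads": 250},
]
later_run = [
    {"id": "example-org/dataset-a", "downloads": 1100},
    {"id": "example-org/dataset-b", "downloads": 700},
    {"id": "example-org/dataset-new", "downloads": 50},
]

growth = download_growth(earlier_run, later_run)
# Print the fastest-growing datasets first.
for dataset_id, delta in sorted(growth.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{dataset_id}: {delta:+d}")
```

Keying on the dataset ID keeps the comparison stable even when sort order changes between runs.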

⚖️ Is it legal to collect this data? Yes, you are collecting public metadata from Hugging Face. The data is freely available to anyone. Make sure you comply with the licenses of individual datasets if you plan to use the underlying data.

🛡️ Will Hugging Face block me? No, this scraper uses Hugging Face's public dataset listing, so there's no blocking risk. You're using the same interface that powers the website.

⚡ How long does a run take? A typical run collecting 100 datasets takes about 10-30 seconds depending on network speed. Larger runs collecting 1,000+ datasets take 2-5 minutes.

⚠️ Are there any limits? Free users can collect up to 100 results per run. Paid users can collect up to 1,000,000 results per run.

🔗 Integrate Hugging Face Datasets Scraper with any app

💡 More ParseForge Actors

Browse our complete collection of data extraction tools for more.

🚀 Ready to Start?

Create a free account with $5 credit and collect your first 100 datasets for free. No coding, no setup.

🆘 Need Help?

  • Check the FAQ section above for common questions
  • Visit the Apify support page for documentation and tutorials
  • Contact us via the Tally contact form to request a new scraper, propose a custom project, or report an issue

⚠️ Disclaimer

This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Hugging Face or any of its subsidiaries. All trademarks mentioned are the property of their respective owners.