Hugging Face Datasets Scraper
Pricing: $5.00/month + usage
Scrape dataset metadata from Hugging Face Hub. Extract names, authors, download counts, likes, trending scores, task categories, size categories, languages, licenses, tags and descriptions. Filter by search query, task type, language, or license. Sort by trending, downloads, likes, or last modified.
Developer: ParseForge

Hugging Face Datasets Scraper
Collect comprehensive dataset metadata from Hugging Face Hub with no coding required. It suits researchers building ML benchmarks, companies monitoring trending datasets, and analysts comparing dataset sizes and download trends. Search by keyword, filter by task category and language, and export dataset metadata as CSV, JSON, or Excel in minutes.
The Hugging Face Datasets Scraper collects structured dataset metadata from Hugging Face Hub, up to 1,000,000 datasets per run, with no authentication needed.
What Does It Do
- Dataset ID - Identify each dataset uniquely in author/name format for tracking and cross-referencing
- Description - Capture full dataset descriptions to understand what each dataset contains
- Author - Track which teams or creators published each dataset
- Download Count - Monitor how many times each dataset has been downloaded to gauge adoption
- Like Count - See community engagement metrics for each dataset
- Tags - Collect descriptive tags to classify datasets by topic and use case
- Size Categories - Get size category metrics to understand dataset scale and requirements
- Created and Modified Dates - Track when datasets were published and last updated
- Access Status - Identify private or gated datasets requiring special permissions
- License - Collect license types to ensure legal compliance for your use case
- Languages - See which languages are covered in each dataset
- Trending Score - Track which datasets are gaining momentum on the platform
Input
- Max Items - Set how many datasets to collect (free users up to 100, paid users up to 1,000,000)
- Search Query - Enter keywords to filter datasets by name and description
- Task Category - Narrow results to specific ML task types like text-classification or image-classification
- Language - Filter datasets by language code like "en" for English or "zh" for Chinese
- License - Filter datasets by license type like "apache-2.0", "mit", or "cc-by-4.0"
- Sort By - Choose how to rank results: Trending Score, Downloads, Likes, or Last Modified
- Sort Direction - Order results descending (highest first) or ascending (lowest first)
Example input:
{"maxItems": 100, "query": "text classification", "taskCategory": "text-classification", "language": "en", "license": "apache-2.0", "sort": "downloads", "direction": "desc"}
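The input fields above can be assembled and sanity-checked before a run. The following is a minimal sketch, not part of the Actor: `build_input` is a hypothetical helper, the key names are taken from the example input, and the lower bound of 1 for `maxItems` is an assumption. Only the tier limits quoted in the Input section are enforced; the accepted values for `sort` and `direction` are not validated because this page does not list them in full.

```python
import json

# Limits quoted in the Input section: 100 items on the free tier,
# 1,000,000 on a paid plan.
FREE_LIMIT = 100
PAID_LIMIT = 1_000_000


def build_input(max_items, paid=False, **filters):
    """Return an input dict shaped like the example input.

    Hypothetical helper, not part of the Actor. Filter key names
    (query, taskCategory, language, license, sort, direction) follow
    the example input; filters left empty are dropped.
    """
    limit = PAID_LIMIT if paid else FREE_LIMIT
    # Assumption: at least one item must be requested.
    if not 1 <= max_items <= limit:
        raise ValueError(f"maxItems must be between 1 and {limit}")
    run_input = {"maxItems": max_items}
    # Keep only the filters that were actually set.
    run_input.update({key: value for key, value in filters.items() if value})
    return run_input


example = build_input(
    100,
    query="text classification",
    taskCategory="text-classification",
    language="en",
    license="apache-2.0",
    sort="downloads",
    direction="desc",
)
print(json.dumps(example, indent=2))
```

Checking the limit locally is only a convenience; the Actor remains the authority on what input it accepts.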
Output
Each dataset includes up to 20 data fields. Download as JSON, CSV, or Excel.
| Dataset ID | Dataset Name | Author |
|---|---|---|
| Description | Task Categories | Size Categories |
| Languages | License | Downloads |
| Likes | Trending Score | Created At |
| Last Modified | Dataset URL | Is Private |
| Is Gated | Tags | Is Disabled |
| Latest SHA | Scraped At | Error Messages |
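Once a run is exported as JSON, the records can be filtered and ranked with a few lines of Python. The sketch below is illustrative only: the sample records are made up, and the key names `id`, `downloads`, and `license` are assumptions, since the table above shows display names rather than the exact keys in the export. Check them against a real export before relying on them.

```python
import json

# Made-up sample in the shape of a JSON export. The key names ("id",
# "downloads", "license") are assumptions, not confirmed by this page.
sample_export = """
[
  {"id": "example-org/dataset-a", "downloads": 1200, "license": "apache-2.0"},
  {"id": "example-org/dataset-b", "downloads": 4500, "license": "mit"},
  {"id": "example-org/dataset-c", "downloads": null, "license": "apache-2.0"}
]
"""


def top_by_downloads(items, license_name=None, n=10):
    """Return the n most-downloaded items, optionally limited to one license."""
    if license_name is not None:
        items = [item for item in items if item.get("license") == license_name]
    # Treat a missing or null download count as 0 so sorting never fails.
    return sorted(items, key=lambda item: item.get("downloads") or 0, reverse=True)[:n]


items = json.loads(sample_export)
for item in top_by_downloads(items, license_name="apache-2.0"):
    print(item["id"], item.get("downloads"))
```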
Why Choose the Hugging Face Datasets Scraper?
| Feature | Our Actor | Similar Tools |
|---|---|---|
| No authentication required | ✔️ | ❌ |
| Direct access to live data (fastest) | ✔️ | ❌ |
| Filter by task category | ✔️ | ❌ |
| Filter by language | ✔️ | ❌ |
| Filter by license | ✔️ | ❌ |
| Sort by trending, likes, downloads, or date | ✔️ | Partial |
| Download counts included | ✔️ | ❌ |
| Like counts included | ✔️ | ❌ |
| Up to 1,000,000 datasets per run | ✔️ | ❌ |
| Free tier up to 100 datasets | ✔️ | ✔️ |
| Export to CSV, JSON, Excel | ✔️ | ✔️ |
How to Use
No technical skills required. Follow these simple steps:
- Sign Up: Create a free account with $5 credit
- Find the Tool: Search for "Hugging Face Datasets Scraper" in the Apify Store and configure your input
- Run It: Click "Start" and watch your results appear
That's it. No coding, no setup, no complicated configuration. Now you can export your data in CSV, Excel, or JSON format.
Business Use Cases
- ML Researchers - Search for datasets by task category to find training data for natural language processing projects, saving weeks of manual research
- Data Scientists - Monitor download trends on trending datasets to identify emerging benchmarks that competitors are using
- Product Managers - Track which datasets have the most community engagement to understand which domains are hot in AI development
FAQ
How does this scraper work? The Hugging Face Datasets Scraper connects to Hugging Face's public dataset listing, so no account is needed and data is always current.
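The Actor's internals are not documented on this page. As a rough illustration of what querying a public dataset listing can look like, the sketch below builds a request URL for the Hugging Face Hub's public `/api/datasets` endpoint. That the Actor uses this endpoint is an assumption, and `listing_url` is a hypothetical helper. The sketch only builds the URL and sends no request.

```python
from urllib.parse import urlencode

# Public Hub listing endpoint. Whether the Actor calls it is an assumption.
API_URL = "https://huggingface.co/api/datasets"


def listing_url(query=None, task=None, language=None, license_name=None,
                sort="downloads", descending=True, limit=100):
    """Build a listing URL from search, filter, sort, and limit options.

    Hypothetical helper for illustration only.
    """
    params = []
    if query:
        params.append(("search", query))
    # The Hub expresses these filters as prefixed tags, one `filter` each.
    tag_filters = (
        ("task_categories", task),
        ("language", language),
        ("license", license_name),
    )
    for prefix, value in tag_filters:
        if value:
            params.append(("filter", f"{prefix}:{value}"))
    params.append(("sort", sort))
    if descending:
        # The Hub API uses direction=-1 for descending order.
        params.append(("direction", "-1"))
    params.append(("limit", str(limit)))
    return f"{API_URL}?{urlencode(params)}"


url = listing_url(
    query="text classification",
    task="text-classification",
    language="en",
    license_name="apache-2.0",
)
print(url)
```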
Is the data accurate? Yes. Download and like counts are read directly from Hugging Face in real time, so they match what the Hub reports when the run executes.
Can I schedule regular runs? Absolutely. You can set up recurring collections on any schedule (hourly, daily, or weekly) to track how datasets grow over time.
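Tracking growth across scheduled runs comes down to comparing two exports. The sketch below shows one way to do that under stated assumptions: the snapshots are made-up sample data, `download_growth` is a hypothetical helper, and the key names `id` and `downloads` are assumed rather than taken from a real export.

```python
# Made-up snapshots from two scheduled runs. The key names "id" and
# "downloads" are assumptions about the export format.
last_week = [
    {"id": "example-org/dataset-a", "downloads": 1000},
    {"id": "example-org/dataset-b", "downloads": 400},
]
this_week = [
    {"id": "example-org/dataset-a", "downloads": 1150},
    {"id": "example-org/dataset-b", "downloads": 900},
    {"id": "example-org/dataset-c", "downloads": 50},
]


def download_growth(previous, current):
    """Return (dataset ID, download change) pairs, largest change first.

    Only datasets present in both snapshots are compared; datasets that
    first appear in `current` have no baseline and are skipped.
    """
    before = {item["id"]: item.get("downloads") or 0 for item in previous}
    growth = {
        item["id"]: (item.get("downloads") or 0) - before[item["id"]]
        for item in current
        if item["id"] in before
    }
    return sorted(growth.items(), key=lambda pair: pair[1], reverse=True)


ranked = download_growth(last_week, this_week)
for dataset_id, change in ranked:
    print(dataset_id, change)
```

Skipping datasets without a baseline keeps new arrivals from looking like sudden spikes; they can be reported separately if that matters for your analysis.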
Is it legal to collect this data? Yes, you are collecting public metadata from Hugging Face. The data is freely available to anyone. Make sure you comply with the licenses of individual datasets if you plan to use the underlying data.
Will Hugging Face block me? No. This scraper uses Hugging Face's public dataset listing, the same interface that powers the website, so there's no blocking risk.
How long does a run take? A typical run collecting 100 datasets takes about 10-30 seconds depending on network speed. Larger runs collecting 1,000+ datasets take 2-5 minutes.
Are there any limits? Free users can collect up to 100 results per run. Paid users can collect up to 1,000,000 results per run.
Integrate Hugging Face Datasets Scraper with any app
- Make - Automate workflows
- Zapier - Connect 5000+ apps
- GitHub - Version control integration
- Slack - Get notifications
- Airbyte - Data pipelines
- Google Drive - Export to spreadsheets
More ParseForge Actors
- Crunchbase Scraper - Extract startup data, investor info, and funding rounds
- Etsy Scraper - Collect product listings, prices, and seller ratings
- SEC 13F Holdings Scraper - Monitor institutional investor portfolios
- Indeed Scraper - Extract job postings and applicant insights
- Redfin Scraper - Collect real estate listings and price history
Browse our complete collection of data extraction tools for more.
Ready to Start?
Create a free account with $5 credit and collect your first 100 datasets for free. No coding, no setup.
Need Help?
- Check the FAQ section above for common questions
- Visit the Apify support page for documentation and tutorials
- Contact us through the Tally contact form to request a new scraper, propose a custom project, or report an issue
Disclaimer
This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Hugging Face or any of its subsidiaries. All trademarks mentioned are the property of their respective owners.