Pricing

Pay per usage

Kaggle Dataset Scraper - ML Dataset Metadata

Scrape Kaggle datasets and competitions. Extract dataset names, download counts, file sizes, usability ratings, tags, and license info.

Rating

0.0

(0)

Developer

Donny Nguyen

Last modified: an hour ago

Kaggle Dataset Scraper

Extract dataset metadata and competition data from Kaggle at scale. Scrape dataset names, authors, download counts, vote counts, usability ratings, file sizes, tags, licenses, and more from listing pages, individual datasets, and competition pages.

Features

Dataset listings - Scrape the main Kaggle datasets directory with pagination support
Individual datasets - Extract detailed metadata from specific dataset pages
Competition pages - Scrape competition listings and metadata
User profiles - Extract all datasets from a specific Kaggle user
Search results - Scrape datasets matching a search query
Embedded data extraction - Parses __NEXT_DATA__, component props, and Kaggle State for comprehensive data capture
Smart pagination - Automatically enqueues next pages when more results are needed
Proxy support - Optional residential proxy for higher success rates
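
As an illustration of the embedded data extraction feature, the sketch below pulls the JSON payload out of a `<script id="__NEXT_DATA__">` element using only the Python standard library. It is not the actor's own code (the actor is described as Cheerio-based); the class and function names are hypothetical, and the `datasets` key in the sample HTML is invented for the demonstration. Only the `__NEXT_DATA__` script id and the `props.pageProps` nesting follow the usual Next.js convention.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collects the text inside <script id="__NEXT_DATA__">."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    @property
    def payload(self):
        return "".join(self._chunks)


def extract_next_data(html):
    """Return the parsed __NEXT_DATA__ JSON, or None if it is absent or invalid."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    if not parser.payload.strip():
        return None
    try:
        return json.loads(parser.payload)
    except json.JSONDecodeError:
        return None


# Hypothetical page fragment; real Kaggle pages may be structured differently.
sample_html = (
    '<html><body><script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"datasets": [{"title": "Example"}]}}}'
    "</script></body></html>"
)

data = extract_next_data(sample_html)
print(data["props"]["pageProps"]["datasets"][0]["title"])
```

Returning `None` instead of raising lets a caller fall back to other sources (component props, other embedded state) when a page carries no `__NEXT_DATA__` block.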

Input Parameters

Parameter	Type	Default	Description
`urls`	Array of strings	`["https://www.kaggle.com/datasets"]`	Kaggle URLs to scrape. Supports dataset listings, individual datasets, competitions, user profiles, and search queries.
`maxResults`	Integer	`100`	Maximum number of results to extract. Range: 1-10,000.
`useResidentialProxy`	Boolean	`false`	Use residential proxies for better success rates. Increases cost but reduces blocking.
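
The parameters above can be assembled into a run input as in the following sketch. The helper `build_run_input` is hypothetical and not part of the actor. It applies the documented defaults and the documented 1-10,000 range for `maxResults` before producing the JSON object.

```python
import json

# Default taken from the parameter table above.
DEFAULT_URLS = ["https://www.kaggle.com/datasets"]


def build_run_input(urls=None, max_results=100, use_residential_proxy=False):
    """Build an input object using the documented parameter names and defaults."""
    if isinstance(max_results, bool) or not isinstance(max_results, int):
        raise TypeError("maxResults must be an integer")
    if not 1 <= max_results <= 10_000:
        raise ValueError("maxResults must be between 1 and 10,000")

    url_list = list(urls) if urls else list(DEFAULT_URLS)
    if not all(isinstance(u, str) and u for u in url_list):
        raise ValueError("urls must be non-empty strings")

    return {
        "urls": url_list,
        "maxResults": max_results,
        "useResidentialProxy": bool(use_residential_proxy),
    }


run_input = build_run_input(
    urls=["https://www.kaggle.com/datasets?search=topic"],
    max_results=500,
)
print(json.dumps(run_input, indent=4))
```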

Supported URL Formats

https://www.kaggle.com/datasets - Main dataset directory
https://www.kaggle.com/datasets?search=topic - Search results
https://www.kaggle.com/datasets/owner/dataset-name - Individual dataset
https://www.kaggle.com/username/datasets - User's datasets
https://www.kaggle.com/competitions - Competition listings
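
The five URL shapes listed above can be told apart by their path segments and query string. The sketch below shows one way to do so. The function `classify_kaggle_url` and its return labels are hypothetical, and the actor's actual routing logic may differ.

```python
from urllib.parse import parse_qs, urlparse


def classify_kaggle_url(url):
    """Map a Kaggle URL to one of the documented page types."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in ("www.kaggle.com", "kaggle.com"):
        return "unsupported"

    segments = [s for s in parsed.path.split("/") if s]
    query = parse_qs(parsed.query)

    if segments == ["datasets"]:
        # /datasets?search=topic is a search; bare /datasets is the directory.
        return "search" if query.get("search") else "dataset_listing"
    if len(segments) == 3 and segments[0] == "datasets":
        # /datasets/owner/dataset-name
        return "dataset"
    if len(segments) == 2 and segments[1] == "datasets":
        # /username/datasets
        return "user_datasets"
    if segments and segments[0] == "competitions":
        return "competitions"
    return "unsupported"


for example in (
    "https://www.kaggle.com/datasets",
    "https://www.kaggle.com/datasets?search=topic",
    "https://www.kaggle.com/datasets/owner/dataset-name",
    "https://www.kaggle.com/username/datasets",
    "https://www.kaggle.com/competitions",
):
    print(classify_kaggle_url(example), example)
```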

Output Fields

Field	Type	Description
`datasetName`	String	Name/title of the dataset
`author`	String	Username of the dataset creator
`description`	String	Dataset subtitle or description
`fileSize`	String	Total file size (e.g., "1.5 GB", "245.3 MB")
`downloadCount`	Integer	Number of times the dataset has been downloaded
`voteCount`	Integer	Number of upvotes/votes
`usabilityRating`	Number	Kaggle usability rating (0-10 scale)
`lastUpdated`	String	Date the dataset was last updated
`tags`	Array	List of tags/keywords associated with the dataset
`license`	String	License type (e.g., "CC0", "CC BY-SA 4.0")
`url`	String	Direct URL to the dataset on Kaggle
`scrapedAt`	String	ISO 8601 timestamp of when the data was collected

Example Output

{
    "datasetName": "Netflix Movies and TV Shows",
    "author": "shivamb",
    "description": "Listings of all movies and TV shows available on Netflix",
    "fileSize": "3.2 MB",
    "downloadCount": 245000,
    "voteCount": 1892,
    "usabilityRating": 8.8,
    "lastUpdated": "2025-09-15",
    "tags": ["movies and tv shows", "arts and entertainment", "netflix"],
    "license": "CC0: Public Domain",
    "url": "https://www.kaggle.com/datasets/shivamb/netflix-shows",
    "scrapedAt": "2026-02-11T12:00:00.000Z"
}
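
Because `fileSize` is returned as a human-readable string, sorting or filtering by size needs a conversion step. The following sketch is one way to do that. The helper `parse_file_size` is hypothetical, and it assumes decimal units (1 MB = 1,000,000 bytes); Kaggle's own rounding and unit base are not documented here, so treat the result as approximate.

```python
import re
from decimal import Decimal

# Assumption: decimal (SI) multipliers, not binary (1024-based) ones.
_UNITS = {"B": 1, "KB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4}
_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([KMGT]?B)\s*$", re.IGNORECASE)


def parse_file_size(text):
    """Convert a string such as "1.5 GB" to an approximate byte count."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized file size: {text!r}")
    number, unit = match.groups()
    # Decimal avoids binary floating-point rounding on values like 245.3.
    return int(Decimal(number) * _UNITS[unit.upper()])


record = {"datasetName": "Netflix Movies and TV Shows", "fileSize": "3.2 MB"}
print(record["datasetName"], parse_file_size(record["fileSize"]))
```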

Example Use Cases

Data science research - Discover and catalog datasets for machine learning projects
Competitive analysis - Track the most popular and downloaded datasets across categories
Trend analysis - Monitor which data topics are gaining traction on Kaggle
Dataset discovery - Find datasets by tag, license, or popularity for specific research needs
Academic research - Build a comprehensive index of available open datasets
Competition tracking - Monitor active Kaggle competitions and their engagement

Cost Estimate

This actor uses Utility tier Pay-Per-Event pricing at $0.0003 per result.

Results	Estimated Cost
100	$0.03
1,000	$0.30
3,333	~$1.00
10,000	$3.00

Approximately 3,333 results per $1.00.
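
The table follows directly from the per-result price. A minimal sketch of the arithmetic is shown below, with hypothetical helper names. It covers only the $0.0003 per-result charge and ignores any compute or proxy costs.

```python
from decimal import Decimal

# Per-result price stated above.
PRICE_PER_RESULT = Decimal("0.0003")


def estimate_cost(results):
    """Per-result charge in USD for a given number of results."""
    if results < 0:
        raise ValueError("results must be non-negative")
    return PRICE_PER_RESULT * results


def results_for_budget(budget_usd):
    """Largest whole number of results that fits within a USD budget."""
    return int(Decimal(str(budget_usd)) // PRICE_PER_RESULT)


for n in (100, 1_000, 3_333, 10_000):
    print(f"{n:>6} results: ${estimate_cost(n):.2f}")

print(results_for_budget("1.00"), "results per $1.00")
```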

Compute costs are minimal since this is a Cheerio-based scraper (no browser overhead). A typical run of 100 results completes in under 2 minutes using ~256 MB memory.

Limitations

JavaScript-rendered content - Kaggle uses heavy client-side rendering (React/Next.js). Some pages may yield fewer results than visible in a browser. The scraper compensates by extracting data from embedded JSON, __NEXT_DATA__, and component props.
Rate limiting - Kaggle may throttle or block requests at high concurrency. Set `useResidentialProxy` to `true` for large-scale scraping.
Private datasets - Only publicly accessible datasets can be scraped. Private or organization-only datasets require authentication.
Login-gated content - Some Kaggle pages require login to view full content. The scraper extracts what is available without authentication.
Dynamic loading - Kaggle uses infinite scroll on listing pages. The scraper handles pagination via URL parameters but may not capture all items from dynamically loaded content.
Frontend changes - Kaggle may update its frontend structure at any time, which could break extraction or reduce its accuracy.
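
URL-parameter pagination of the kind mentioned above can be sketched as follows. The parameter name `page` is an assumption made for illustration; this document does not state which query parameter Kaggle or the actor uses. Both helper names are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def next_page_url(url, param="page"):
    """Return the URL with the (assumed) page parameter incremented by one."""
    parsed = urlparse(url)
    current = 1  # a URL without the parameter is treated as page 1
    kept = []
    for key, value in parse_qsl(parsed.query, keep_blank_values=True):
        if key == param:
            current = int(value) if value.isdigit() else 1
        else:
            kept.append((key, value))
    kept.append((param, str(current + 1)))
    return urlunparse(parsed._replace(query=urlencode(kept)))


def should_enqueue_next(collected, max_results, items_on_page):
    """Request another page only if more results are needed and the last page had items."""
    return collected < max_results and items_on_page > 0


url = "https://www.kaggle.com/datasets?search=topic"
print(next_page_url(url))
print(next_page_url(next_page_url(url)))
```

Stopping when a page returns no items guards against looping forever on listings whose remaining entries are only loaded dynamically.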
