Pricing

Pay per event

Kaggle Datasets Scraper

Extract Kaggle dataset metadata at scale: titles, owners, descriptions, tags, license, file types, sizes, downloads, views, and votes. Filter by search, tag, user, file type, or size.

Developer: ParseForge

Last modified: 3 days ago

📊 Kaggle Datasets Scraper

🚀 Surface every public dataset on Kaggle in seconds. Filter by keyword, file format, license, tag, and size, then pick a sort order. No API key, no registration, no manual CSV wrangling.

Kaggle hosts more than 400,000 public datasets contributed by data scientists, ML researchers, and academic groups, ranging from a 16 KB CSV of medical insurance costs to half-gigabyte historical stock dumps and image corpora used in published competitions. Each dataset has a rich metadata footprint that matters in practice: number of downloads, votes, view counts, the kernelCount of public Kaggle notebooks that consume it, the license, the file format, an automated usabilityRating for schema clarity, and a long-form Markdown description. This Actor turns that metadata layer into clean dataset rows you can sort, filter, and pipe into downstream tools.

This Actor is built for ML engineers picking training corpora, data scientists benchmarking model results against community baselines, AI researchers tracking which Kaggle datasets are gaining notebook traction, and academic teams sourcing reproducible inputs for coursework or thesis work. It is a pure HTTP scraper against Kaggle's public dataset endpoints, so runs are fast and cheap. It does not download the actual dataset files - only the metadata layer that helps you decide which ones to pull next. Output is plain JSON, ready to feed into BigQuery, a Postgres staging table, a notebook, or a Make / Zapier workflow.

🎯 Target Audience and Primary Use Cases

Audience	Use Case
🤖 ML engineers	Source training corpora and benchmark datasets by file format, license, and size
📈 Data scientists	Track which datasets are trending, surfacing newly hot competitions and corpora
🎓 AI researchers	Build reproducible bibliographies of community datasets used in papers
🏫 Academic teams	Pull dataset metadata for coursework, dissertations, and lit reviews
🧪 Product builders	Validate that public training data exists for a given vertical before committing engineering

📋 What the Kaggle Datasets Scraper does

🔎 Search by keyword. Free-text query against dataset titles, descriptions, and tags. Pass finance, nlp, medical imaging, etc., or leave blank to browse without a keyword.
🗂️ Filter by file format. CSV, JSON, SQLite, BigQuery, or all formats. Useful when your downstream tooling only accepts one shape.
⚖️ Filter by license. Restrict to Creative Commons, GPL, Open Database License, Other, or all licenses.
🏷️ Filter by tag. Pass any Kaggle tag slug (classification, nlp, finance, health) to scope the run to a topic, technique, or domain.
📐 Filter by size. Set minSize / maxSize in bytes to keep the result set within memory or storage limits for downstream tools.
🥇 Sort the way Kaggle does. Hottest, most votes, recently updated, most active, recently published.
📜 Optional full description enrichment. When enabled, each record is enriched with the dataset's long-form Markdown description, full tag list, and version history. Disable for faster runs when you only need card-level fields.
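
The filters above could be assembled into a run input along the following lines. This is only a sketch under stated assumptions: `minSize` and `maxSize` are the only field names given in this README, and every other key (`search`, `fileType`, `tag`, `includeDescription`) and the exact file-type strings are hypothetical placeholders. Check the Actor's input form for the real names before relying on them.

```python
from typing import Any, Optional

# File formats taken from the feature list above; the exact strings the
# Actor expects are an assumption.
FILE_TYPES = {"all", "csv", "json", "sqlite", "bigquery"}


def build_run_input(
    search: str = "",
    file_type: str = "all",
    tag: Optional[str] = None,
    min_size: Optional[int] = None,
    max_size: Optional[int] = None,
    include_description: bool = False,
) -> dict[str, Any]:
    """Build an input dict, rejecting obviously invalid filter combinations."""
    if file_type not in FILE_TYPES:
        raise ValueError(f"unknown file type: {file_type!r}")
    for name, value in (("minSize", min_size), ("maxSize", max_size)):
        if value is not None and value < 0:
            raise ValueError(f"{name} must be a non-negative byte count")
    if min_size is not None and max_size is not None and min_size > max_size:
        raise ValueError("minSize must not exceed maxSize")

    run_input: dict[str, Any] = {
        "search": search,                           # hypothetical key
        "fileType": file_type,                      # hypothetical key
        "includeDescription": include_description,  # hypothetical key
    }
    if tag:
        run_input["tag"] = tag                      # hypothetical key
    if min_size is not None:
        run_input["minSize"] = min_size             # named in this README, in bytes
    if max_size is not None:
        run_input["maxSize"] = max_size             # named in this README, in bytes
    return run_input


# Example: CSV finance datasets no larger than 50 MiB.
example = build_run_input(search="finance", file_type="csv", max_size=50 * 1024 * 1024)
print(example)
```

Validating the size bounds locally catches a swapped `minSize` / `maxSize` pair before a run is started.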

Each output record represents one public Kaggle dataset. Alongside the title, owner, and URL, the row includes the canonical ref (owner/slug), license, total bytes, current version number, usability rating, downloads, views, votes, the count of public notebooks (kernels) that reference the dataset, the topic count, last-updated timestamp, an array of tag slugs, a compact version history, and (optionally) the full Markdown description.
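
As a rough illustration of working with such records, the sketch below shortlists rows by size and usability and ranks them by notebook count. The names `ref`, `usabilityRating`, and `kernelCount` appear in this README; `totalBytes`, `licenseName`, and `votes` are assumed key names, the 0 to 1 usability scale is an assumption, and the two sample rows are invented for illustration rather than real output.

```python
from typing import Any

# Two made-up rows purely for illustration; they are not real scraper output.
records: list[dict[str, Any]] = [
    {"ref": "owner-a/example-one", "licenseName": "CC0-1.0", "totalBytes": 16_384,
     "usabilityRating": 0.9, "votes": 120, "kernelCount": 40},
    {"ref": "owner-b/example-two", "licenseName": "GPL-2.0", "totalBytes": 500_000_000,
     "usabilityRating": 0.6, "votes": 300, "kernelCount": 5},
]


def shortlist(rows: list[dict[str, Any]], max_bytes: int,
              min_usability: float) -> list[dict[str, Any]]:
    """Keep rows under a size budget and above a usability floor,
    ordered by how many public notebooks reference them."""
    kept = [
        r for r in rows
        if r.get("totalBytes", 0) <= max_bytes
        and r.get("usabilityRating", 0.0) >= min_usability
    ]
    return sorted(kept, key=lambda r: r.get("kernelCount", 0), reverse=True)


for row in shortlist(records, max_bytes=100_000_000, min_usability=0.8):
    print(row["ref"], row["kernelCount"])
```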

💡 Why it matters: Kaggle is one of the largest public catalogues of curated ML data on the web, and it sits behind a JS-heavy UI that is hard to crawl. A clean metadata feed lets you make data-sourcing decisions at the speed of SQL.

⚠️ Good to know: the Kaggle public API tolerates direct calls but rate-limits aggressively under sustained load. For runs above a few thousand datasets, enable Apify Residential proxy in the input.

🚀 How to use

🆔 Create a free account. Sign up for a free Apify account, which comes with $5 of credit.
🔎 Open the Actor. Find the Kaggle Datasets Scraper on Apify Store.
📝 Fill the input form. Set a keyword and pick the filters that matter (file type, license, tag, size, sort).
▶️ Run. Click Start. The log streams listing pages and how many datasets have been collected.
⬇️ Export. Download as JSON, CSV, Excel, or stream into a Make / Zapier / n8n workflow.

⏱️ Total time to first row: under a minute for most filter combinations.
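
Once a run is exported as JSON, the rows can be staged in a SQL table for querying. The sketch below uses Python's built-in `sqlite3` with an in-memory database as a stand-in for a real staging table. The inline JSON string stands in for an exported file, its values are invented, and the `downloads` and `votes` key names are assumptions (only `ref` is named in this README).

```python
import json
import sqlite3

# Stand-in for an exported JSON file; values are invented for illustration.
exported = json.dumps([
    {"ref": "owner-a/example-one", "downloads": 1500, "votes": 120},
    {"ref": "owner-b/example-two", "downloads": 90, "votes": 300},
])


def load_rows(conn: sqlite3.Connection, payload: str) -> int:
    """Insert exported records into a staging table and return how many were read."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS datasets ("
        "ref TEXT PRIMARY KEY, downloads INTEGER, votes INTEGER)"
    )
    rows = [(r["ref"], r.get("downloads"), r.get("votes")) for r in json.loads(payload)]
    # INSERT OR REPLACE keeps repeated loads idempotent on the ref key.
    conn.executemany("INSERT OR REPLACE INTO datasets VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
load_rows(conn, exported)
top = conn.execute("SELECT ref FROM datasets ORDER BY votes DESC LIMIT 1").fetchone()
print(top[0])
```

Keying the table on `ref` means a later run over the same filters overwrites stale rows instead of duplicating them.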

🔗 Recommended Actors

🤗 Hugging Face Model Scraper - public model catalogue with download counts and licenses
📚 Semantic Scholar Scraper - peer-reviewed papers with citation metadata
🧬 medRxiv Scraper - medical preprints for biomedical AI training corpora
🏛️ FRED Economic Data Scraper - public economic time series for finance and macro models
🏥 ClinicalTrials Scraper - structured clinical trial registry data

💡 Pro Tip: browse the complete ParseForge collection for more public-data scrapers built with the same conventions.

Disclaimer: This Actor is an independent project and is not affiliated with, endorsed by, or sponsored by Kaggle or Google LLC. It only reads public dataset metadata. You are responsible for complying with applicable laws, Kaggle's terms of service, and the per-dataset licenses when using the data downstream.

🆘 Need Help?

If you hit a bug, have questions about setup, or need a scraper we haven't built yet, open our contact form or write to parseforge@protonmail.com. We also take on paid custom data projects.

For faster answers, join our Discord. It's the best place to get support and suggest new Actors.
