Kaggle Dataset Scraper

Pricing

Pay per usage

Rating

0.0

(0)

Developer

Donny Nguyen

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 5 days ago

Scrape Kaggle to find datasets and extract their metadata, including names, descriptions, download counts, vote counts, file sizes, and usability scores. Search by keyword and sort results by votes, last update, or hotness.

What does Kaggle Dataset Scraper do?

This scraper searches Kaggle for datasets matching your queries and extracts structured information from the search results. For each dataset it collects the author, download count, vote count, usability rating, and the other fields listed under Output Data. It is intended for discovering relevant datasets for data science projects.

Why scrape Kaggle datasets?

Kaggle hosts over 200,000 public datasets used by millions of data scientists worldwide. Finding the right dataset for your project often requires searching through many options. This actor automates the discovery process by collecting and structuring dataset metadata so you can quickly compare and evaluate options.

Input Configuration

  • Search Queries: List of search terms to find datasets (e.g. "natural language processing", "time series")
  • Sort By: Result ordering (votes, last updated, or hotness)
  • Max Datasets: Maximum number of datasets to collect
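
As a rough sketch, the three options above could be assembled into a run input like the one below. Only the `maxDatasets` key is named on this page; the `searchQueries` and `sortBy` keys, the sort values, and the `build_run_input` helper are assumptions made for illustration and should be checked against the actor's input schema.

```python
# Hypothetical input builder. Only "maxDatasets" is named on this page;
# the other key names and the sort values are assumptions.
ALLOWED_SORTS = {"votes", "updated", "hottest"}  # assumed values


def build_run_input(queries, sort_by="votes", max_datasets=50):
    """Return a dict shaped like the actor input described above."""
    # Drop empty or whitespace-only queries before validating.
    cleaned = [q.strip() for q in queries if q and q.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty search query is required")
    if sort_by not in ALLOWED_SORTS:
        raise ValueError(f"sort_by must be one of {sorted(ALLOWED_SORTS)}")
    if max_datasets < 1:
        raise ValueError("max_datasets must be a positive integer")
    return {
        "searchQueries": cleaned,     # assumed key name
        "sortBy": sort_by,            # assumed key name
        "maxDatasets": max_datasets,  # key name mentioned in the tips section
    }


run_input = build_run_input(
    ["natural language processing", "time series"],
    sort_by="votes",
    max_datasets=100,
)
```

Validating the input locally is a design choice that catches an empty query list or an unknown sort value before a paid run starts.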

Output Data

Each scraped dataset includes:

  • query: The search query that found this dataset
  • name: Dataset title
  • description: Brief description of the dataset
  • size: Total file size
  • downloadCount: Number of times downloaded
  • voteCount: Community upvotes
  • author: Dataset creator username
  • license: License type
  • usabilityRating: Kaggle usability score
  • url: Direct link to the dataset page
  • scrapedAt: Collection timestamp
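
To illustrate how these fields might be used to compare datasets, the sketch below filters records by usability and ranks them by votes. It assumes `usabilityRating` and `voteCount` arrive as numbers (this page does not state the value types or the rating scale), and both the sample records and the `shortlist` helper are made up for illustration.

```python
# Illustrative records only: the values are invented, not real scrape output.
sample_items = [
    {"name": "Example A", "voteCount": 120, "usabilityRating": 8.8},
    {"name": "Example B", "voteCount": 300, "usabilityRating": None},
    {"name": "Example C", "voteCount": 45, "usabilityRating": 10.0},
]


def shortlist(items, min_usability=8.0):
    """Keep items rated at least min_usability, most-voted first."""
    kept = []
    for item in items:
        rating = item.get("usabilityRating")
        # Skip records with a missing or non-numeric rating.
        if isinstance(rating, (int, float)) and rating >= min_usability:
            kept.append(item)
    # Treat a missing vote count as zero when ranking.
    return sorted(kept, key=lambda it: it.get("voteCount") or 0, reverse=True)


names = [item["name"] for item in shortlist(sample_items)]
```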

Use Cases

  • Dataset discovery for machine learning projects
  • Benchmarking popular datasets in specific domains
  • Research on data availability for NLP, computer vision, etc.
  • Competitive analysis of dataset popularity trends
  • Building curated dataset collections

Integrations

Export results to Google Sheets or Airtable for team collaboration. Connect to Slack for notifications about trending datasets. Schedule weekly runs to discover newly published datasets in your area of interest.
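
If a manual hand-off is preferred over a built-in integration, one option is to write the records to CSV and import that file into a spreadsheet. The sketch below uses only the Python standard library; the column order follows the Output Data list, and the `to_csv` helper and the sample record are hypothetical.

```python
import csv
import io

# Column order taken from the Output Data list above.
FIELDS = [
    "query", "name", "description", "size", "downloadCount", "voteCount",
    "author", "license", "usabilityRating", "url", "scrapedAt",
]


def to_csv(items):
    """Serialize records to CSV text, blanking missing fields and dropping unknown ones."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


# Made-up record for illustration; "extra" is not a known field and is ignored.
csv_text = to_csv([
    {"query": "time series", "name": "Example dataset", "voteCount": 12, "extra": "x"},
])
```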

Tips for Best Results

Use specific search queries that match the domain you are interested in. Sort by votes to find the most popular and well-regarded datasets. Use the maxDatasets parameter to control the scope of your search.

Built with Crawlee and the Apify SDK. See more scrapers by consummate_mandala on Apify Store.