Pricing

from $3.00 / 1,000 results

Kaggle Scraper

Scrape Kaggle datasets, competitions, notebooks, and user profiles. Datasets are open via the public API; competitions and notebooks need Kaggle API credentials.

Developer

Crawler Bros

Last modified

2 months ago

What this actor does

8 modes: search, byDataset, byCompetition, byNotebook, byUser, trendingDatasets, trendingNotebooks, byUrl
Two auth tiers:
- Public (no auth): datasets search/list/view, byUser, trendingDatasets, byUrl for datasets/users
- Auth required: competitions, notebooks (kernels), trendingNotebooks
Filters: owner, sort order, file type, license family, min votes, min downloads, min usability, max size
URL auto-detection: paste any kaggle.com/datasets/<owner>/<slug>, /competitions/<slug>, /code/<owner>/<slug>, or user URL
Empty fields are omitted — every record only contains populated fields
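
As a rough illustration of the URL auto-detection described above, the sketch below maps the listed URL shapes to a record type and ref. It is not the actor's actual code: the function name `classify_kaggle_url`, the regular expressions, and the `_RESERVED` set of non-user path segments are all assumptions made for this example.

```python
import re
from urllib.parse import urlparse

# Hypothetical patterns, derived only from the URL shapes listed above.
_PATTERNS = [
    ("dataset", re.compile(r"^/datasets/([^/]+)/([^/]+)/?$")),
    ("competition", re.compile(r"^/competitions/([^/]+)/?$")),
    ("kernel", re.compile(r"^/code/([^/]+)/([^/]+)/?$")),
    ("user", re.compile(r"^/([^/]+)/?$")),
]

# Assumption: top-level paths that are site sections rather than usernames.
_RESERVED = {"datasets", "competitions", "code"}


def classify_kaggle_url(url):
    """Return (record_type, ref) for a Kaggle URL, or None if unrecognized."""
    if "://" not in url:
        url = "https://" + url  # accept pasted URLs without a scheme
    parsed = urlparse(url)
    if parsed.netloc.lower() not in ("kaggle.com", "www.kaggle.com"):
        return None
    for record_type, pattern in _PATTERNS:
        match = pattern.match(parsed.path)
        if match is None:
            continue
        if record_type == "user" and match.group(1) in _RESERVED:
            return None
        return record_type, "/".join(match.groups())
    return None
```

Checking the more specific patterns first keeps a dataset or notebook URL from being mistaken for a user profile.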

Output

Each record is a flat dict. Field names you might see (omit-empty applies):

Common

recordType — dataset / competition / kernel / user
ref — Kaggle reference (e.g. heptapod/titanic)
scrapedAt

Dataset

datasetId, title, subtitle, description
ownerName, ownerRef, creatorName, creatorUrl
licenseName, lastUpdated
totalBytes, downloadCount, voteCount, viewCount, kernelCount
currentVersionNumber, usabilityRating
isPrivate, isFeatured, thumbnailImageUrl
tags[], files[], fileCount
datasetUrl

Competition

competitionId, title, description, category
organizationName, organizationRef, tags
deadline, enabledDate, evaluationMetric
rewardType, rewardQuantity, teamCount
submissionsDisabled, isKernelsSubmissionsOnly
competitionUrl

Kernel (notebook)

kernelId, title, author, language, kernelType
lastRunTime, totalVotes, totalViews, totalComments
kernelUrl

User

username, displayName, profileUrl
totalDatasetsListed
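
Because empty fields are omitted, downstream code should not assume any given key is present. The sketch below shows one way to read records defensively with `dict.get`; the helper name `summarize` is hypothetical, and only field names from the lists above are used.

```python
from collections import Counter


def summarize(records):
    """Count records per recordType and total the dataset download counts.

    Missing fields are treated as absent rather than as errors, since
    the output omits empty fields.
    """
    counts = Counter(r.get("recordType", "unknown") for r in records)
    downloads = sum(
        r.get("downloadCount", 0)
        for r in records
        if r.get("recordType") == "dataset"
    )
    return counts, downloads
```

A dataset record without `downloadCount` simply contributes 0 to the total instead of raising `KeyError`.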

Input

Field	Type	Default	Description
`mode`	enum	`search`	One of the 8 modes
`searchQuery`	string	`titanic`	Free-text query
`datasetRefs`	array	–	`owner/slug` refs (mode=byDataset)
`competitionRefs`	array	–	Competition slugs (mode=byCompetition, auth)
`kernelRefs`	array	–	`owner/slug` refs (mode=byNotebook, auth)
`userSlugs`	array	–	Usernames (mode=byUser)
`startUrls`	array	–	Kaggle URLs (mode=byUrl)
`ownerSlug`	string	–	Filter to user/org
`sortBy`	enum	`hottest`	`hottest` / `votes` / `updated` / `active` / `published`
`fileType`	enum	`all`	`all` / `csv` / `sqlite` / `json` / `bigQuery`
`licenseGroup`	enum	`all`	`all` / `cc` / `gpl` / `odb` / `other`
`minVotes`	integer	–	Drop below this vote count
`minDownloads`	integer	–	Drop below this download count
`minUsability`	number	–	Drop below this usability rating (e.g. `0.8`)
`maxSizeBytes`	integer	–	Drop datasets larger than this
`kernelSortBy`	enum	`hotness`	Notebook sort key (auth modes)
`kernelLanguage`	enum	`all`	Notebook language (auth modes)
`kernelType`	enum	`all`	`all` / `script` / `notebook` (auth modes)
`kaggleUsername`	string	–	Required for competition / notebook modes
`kaggleApiKey`	string (secret)	–	Required for competition / notebook modes
`maxItems`	integer	`50`	Hard cap (1–10000)
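
Before starting a run, it can help to check an input against the rules in the table. The following is a minimal sketch under stated assumptions: `check_input` is a hypothetical helper, not part of the actor, and it covers only the mode names, the credential requirement, and the `maxItems` range documented here.

```python
ALL_MODES = {
    "search", "byDataset", "byCompetition", "byNotebook",
    "byUser", "trendingDatasets", "trendingNotebooks", "byUrl",
}
# Modes documented as requiring kaggleUsername and kaggleApiKey.
AUTH_MODES = {"byCompetition", "byNotebook", "trendingNotebooks"}


def check_input(run_input):
    """Return a list of problems in an input dict (empty list means none found)."""
    problems = []
    mode = run_input.get("mode", "search")  # documented default
    if mode not in ALL_MODES:
        problems.append(f"unknown mode: {mode!r}")
    has_credentials = bool(
        run_input.get("kaggleUsername") and run_input.get("kaggleApiKey")
    )
    if mode in AUTH_MODES and not has_credentials:
        problems.append(f"mode {mode!r} needs kaggleUsername and kaggleApiKey")
    max_items = run_input.get("maxItems", 50)  # documented default
    if not isinstance(max_items, int) or not 1 <= max_items <= 10000:
        problems.append("maxItems must be an integer between 1 and 10000")
    return problems
```

The check does not inspect `startUrls`, so a `byUrl` input that points at a competition or notebook would still need credentials even though this helper reports no problem.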

Examples

Search top Titanic datasets

{
  "mode": "search",
  "searchQuery": "titanic",
  "sortBy": "votes",
  "maxItems": 25
}

Trending CSV datasets with high usability

{
  "mode": "trendingDatasets",
  "fileType": "csv",
  "minUsability": 0.8,
  "maxItems": 50
}

Lookup a specific dataset

{
  "mode": "byDataset",
  "datasetRefs": ["heptapod/titanic"]
}

Browse a user's datasets

{
  "mode": "byUser",
  "userSlugs": ["heptapod"]
}

Lookup by URL (auto-detect)

{
  "mode": "byUrl",
  "startUrls": [
    "https://www.kaggle.com/datasets/heptapod/titanic",
    "https://www.kaggle.com/heptapod"
  ]
}

Competition lookup (auth required)

{
  "mode": "byCompetition",
  "competitionRefs": ["titanic"],
  "kaggleUsername": "your-username",
  "kaggleApiKey": "your-api-key"
}

How to get Kaggle API credentials

1. Sign in to kaggle.com.
2. Go to Account settings → "API" → "Create New Token".
3. A kaggle.json file downloads. Use its username and key fields here as kaggleUsername and kaggleApiKey.
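
The mapping from kaggle.json to the actor input can be sketched as follows. The function name `credentials_from_kaggle_json` is hypothetical; the `username` and `key` field names are the ones named in the steps above.

```python
import json
from pathlib import Path


def credentials_from_kaggle_json(path):
    """Read a kaggle.json file and return the two credential input fields."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return {
        "kaggleUsername": data["username"],
        "kaggleApiKey": data["key"],
    }
```

The returned dict can be merged into any auth-mode input, for example `{"mode": "byCompetition", "competitionRefs": ["titanic"], **credentials_from_kaggle_json("kaggle.json")}`.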

You only need credentials for byCompetition, byNotebook, and trendingNotebooks modes. All dataset modes work without auth.

Reliability

Direct calls to the official kaggle.com/api/v1/* endpoints
Exponential backoff retries on 429, 500–504
HTML 404 fallback handling (Kaggle redirects unknown refs to a 404 HTML page)
No proxy needed — works from datacenter IPs
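
For readers unfamiliar with the retry behaviour listed above, here is a generic sketch of exponential backoff on 429 and 500–504. It is not the actor's implementation: `send` is a hypothetical zero-argument callable returning `(status_code, body)`, and the attempt count, base delay, and jitter are illustrative assumptions.

```python
import random
import time

RETRYABLE = {429, 500, 501, 502, 503, 504}


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    The wait doubles after each retryable response (1s, 2s, 4s, ...),
    plus up to 10% random jitter. Returns the last (status, body) pair.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))
    return status, body
```

Passing `sleep` as a parameter makes the function easy to test without real waiting.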

Limitations

The Kaggle public API exposes user info indirectly; byUser records are derived from the user's first listed datasets and contain only username, displayName, and a count of listed datasets.
Competitions, notebooks (kernels), and trending notebooks all require Kaggle API credentials: these endpoints are authenticated and return 401 Unauthenticated without them.
The license filter accepts only the broad values all, cc, gpl, odb, and other; finer-grained licenses like cc-by-sa-4.0 are returned in the output's licenseName field but cannot be filtered server-side.
Only the current version of each dataset is returned; version history is not enumerated.
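
Since fine-grained licenses cannot be filtered server-side, one workaround is to filter the output on licenseName after the run. The sketch below is a hedged example: `filter_by_license` is a hypothetical helper, and because the exact formatting of licenseName values is not specified here, it compares names after stripping case and punctuation.

```python
import re


def _normalize(name):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def filter_by_license(records, wanted):
    """Keep records whose licenseName matches wanted, ignoring case and punctuation."""
    target = _normalize(wanted)
    return [
        r for r in records
        if _normalize(r.get("licenseName", "")) == target
    ]
```

Records without a licenseName are dropped, which is consistent with the omit-empty output.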

FAQ

Do I need a Kaggle account? Only for competitions / notebooks. Dataset search and lookup work anonymously.

How fresh is the data? Real-time — every run hits the live Kaggle API.

Can I download dataset files? No. This actor exposes Kaggle metadata — refs, file lists, vote / download counts, license, etc. To download files, use the Kaggle CLI with the ref from this actor's output.
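
As a sketch of that hand-off, the helper below turns a dataset `ref` from the output into a Kaggle CLI command. The function name `kaggle_download_command` is hypothetical; it assumes the Kaggle CLI is installed and configured with its own credentials, and it only builds the argument list rather than running it.

```python
import re
import shlex

_REF_PATTERN = re.compile(r"^[^/\s]+/[^/\s]+$")  # owner/slug


def kaggle_download_command(ref, out_dir="."):
    """Build a `kaggle datasets download` command for an owner/slug ref."""
    if not _REF_PATTERN.match(ref):
        raise ValueError(f"expected an owner/slug ref, got {ref!r}")
    return ["kaggle", "datasets", "download", "-d", ref, "-p", out_dir, "--unzip"]


if __name__ == "__main__":
    # Print a copy-pasteable shell command for one ref.
    print(shlex.join(kaggle_download_command("heptapod/titanic", "data")))
```

The list form can be passed straight to `subprocess.run(cmd, check=True)` without involving a shell.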

Why are some fields missing? Empty / null fields are omitted — only populated fields appear in the output.

Why does the daily test run only return datasets? The default prefill targets dataset search, which works without credentials. Competition and notebook modes need kaggleUsername and kaggleApiKey; once you provide them, all 8 modes are available.
