Kaggle Scraper
Pricing
from $3.00 / 1,000 results
Scrape Kaggle datasets, competitions, notebooks, and user profiles. Datasets are open via the public API; competitions and notebooks need Kaggle API credentials.
Rating: 5.0 (17)
Developer: Crawler Bros
Maintained by Community
Actor stats
Bookmarked: 17
Total users: 2
Monthly active users: 1
Last modified: 2 days ago
Scrape Kaggle — the world's largest data-science community. Search public datasets, fetch by ref or URL, browse trending datasets, list a user's datasets, and (with API credentials) pull competitions and notebooks (kernels). Pure HTTP via the official Kaggle public API at kaggle.com/api/v1/*.
What this actor does
- 8 modes: `search`, `byDataset`, `byCompetition`, `byNotebook`, `byUser`, `trendingDatasets`, `trendingNotebooks`, `byUrl`
- Two auth tiers:
  - Public (no auth): datasets search/list/view, `byUser`, `trendingDatasets`, `byUrl` for datasets/users
  - Auth required: competitions, notebooks (kernels), `trendingNotebooks`
- Filters: owner, sort order, file type, license family, min votes, min downloads, min usability, max size
- URL auto-detection: paste any `kaggle.com/datasets/<owner>/<slug>`, `/competitions/<slug>`, `/code/<owner>/<slug>`, or user URL
- Empty fields are omitted: every record contains only populated fields
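The URL auto-detection described above can be pictured with a short sketch. This is not the actor's own code: the function name `classify_kaggle_url`, the reserved-segment set, and the rule that a single path segment means a user profile are all assumptions made for illustration.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse

# Assumption: top-level path segments that are not usernames.
RESERVED_SEGMENTS = {"datasets", "competitions", "code"}


def classify_kaggle_url(url: str) -> Optional[Tuple[str, str]]:
    """Return (recordType, ref) for a Kaggle URL, or None if it is not recognized."""
    if "://" not in url:
        # Accept bare "kaggle.com/..." input by adding a scheme before parsing.
        url = "https://" + url
    parsed = urlparse(url)
    if parsed.netloc.lower() not in ("kaggle.com", "www.kaggle.com"):
        return None
    parts = [segment for segment in parsed.path.split("/") if segment]
    if len(parts) >= 3 and parts[0] == "datasets":
        return ("dataset", f"{parts[1]}/{parts[2]}")
    if len(parts) >= 2 and parts[0] == "competitions":
        return ("competition", parts[1])
    if len(parts) >= 3 and parts[0] == "code":
        return ("kernel", f"{parts[1]}/{parts[2]}")
    if len(parts) == 1 and parts[0] not in RESERVED_SEGMENTS:
        # Assumption: a single remaining segment is a user profile slug.
        return ("user", parts[0])
    return None
```

A helper like this could be used to pre-sort a mixed list of URLs, for example to check which ones point at competitions or notebooks and therefore need credentials.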
Output
Each record is a flat dict. Field names you might see (omit-empty applies):
Common
- `recordType`: `dataset` / `competition` / `kernel` / `user`
- `ref`: Kaggle reference (e.g. `heptapod/titanic`)
- `scrapedAt`
Dataset
- `datasetId`, `title`, `subtitle`, `description`
- `ownerName`, `ownerRef`, `creatorName`, `creatorUrl`
- `licenseName`, `lastUpdated`
- `totalBytes`, `downloadCount`, `voteCount`, `viewCount`, `kernelCount`
- `currentVersionNumber`, `usabilityRating`
- `isPrivate`, `isFeatured`, `thumbnailImageUrl`
- `tags[]`, `files[]`, `fileCount`
- `datasetUrl`
Competition
- `competitionId`, `title`, `description`, `category`
- `organizationName`, `organizationRef`, `tags`
- `deadline`, `enabledDate`, `evaluationMetric`
- `rewardType`, `rewardQuantity`, `teamCount`
- `submissionsDisabled`, `isKernelsSubmissionsOnly`
- `competitionUrl`
Kernel (notebook)
- `kernelId`, `title`, `author`, `language`, `kernelType`
- `lastRunTime`, `totalVotes`, `totalViews`, `totalComments`
- `kernelUrl`
User
- `username`, `displayName`, `profileUrl`
- `totalDatasetsListed`
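Because empty fields are omitted, code that consumes these records should not assume any key is present. A minimal sketch of defensive access follows; the function name `summarize_record` and the sample values in the usage note are illustrative, not actual actor output.

```python
from typing import Any, Dict


def summarize_record(record: Dict[str, Any]) -> str:
    """Build a one-line summary using only the fields that are present."""
    record_type = record.get("recordType", "unknown")
    ref = record.get("ref", "?")
    title = record.get("title")
    # Datasets carry voteCount; kernels carry totalVotes. Either may be absent.
    votes = record.get("voteCount", record.get("totalVotes"))

    parts = [f"[{record_type}] {ref}"]
    if title is not None:
        parts.append(title)
    if votes is not None:
        parts.append(f"{votes} votes")
    return " | ".join(parts)
```

For instance, `summarize_record({"recordType": "user", "ref": "heptapod"})` yields `[user] heptapod`, with no error for the missing title or vote fields.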
Input
| Field | Type | Default | Description |
|---|---|---|---|
| `mode` | enum | search | One of the 8 modes |
| `searchQuery` | string | titanic | Free-text query |
| `datasetRefs` | array | – | owner/slug refs (mode=byDataset) |
| `competitionRefs` | array | – | Competition slugs (mode=byCompetition, auth) |
| `kernelRefs` | array | – | owner/slug refs (mode=byNotebook, auth) |
| `userSlugs` | array | – | Usernames (mode=byUser) |
| `startUrls` | array | – | Kaggle URLs (mode=byUrl) |
| `ownerSlug` | string | – | Filter to user/org |
| `sortBy` | enum | hottest | hottest / votes / updated / active / published |
| `fileType` | enum | all | all / csv / sqlite / json / bigQuery |
| `licenseGroup` | enum | all | all / cc / gpl / odb / other |
| `minVotes` | integer | – | Drop below this vote count |
| `minDownloads` | integer | – | Drop below this download count |
| `minUsability` | integer | – | Drop below this usability rating |
| `maxSizeBytes` | integer | – | Drop datasets larger than this |
| `kernelSortBy` | enum | hotness | Notebook sort key (auth modes) |
| `kernelLanguage` | enum | all | Notebook language (auth modes) |
| `kernelType` | enum | all | script / notebook (auth modes) |
| `kaggleUsername` | string | – | Required for competition / notebook modes |
| `kaggleApiKey` | string (secret) | – | Required for competition / notebook modes |
| `maxItems` | integer | 50 | Hard cap (1–10000) |
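A few of the constraints in the table can be checked before starting a run. The sketch below is a hypothetical pre-flight validator, not part of the actor; it covers only the mode list, the `maxItems` range, and the credential requirement for the three auth-only modes, and it assumes the documented defaults apply when a field is missing.

```python
from typing import Any, Dict, List

MODES = {
    "search", "byDataset", "byCompetition", "byNotebook",
    "byUser", "trendingDatasets", "trendingNotebooks", "byUrl",
}
# Modes the documentation lists as requiring Kaggle API credentials.
AUTH_MODES = {"byCompetition", "byNotebook", "trendingNotebooks"}


def validate_run_input(run_input: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the input looks fine."""
    problems: List[str] = []

    mode = run_input.get("mode", "search")  # documented default
    if mode not in MODES:
        problems.append(f"unknown mode: {mode}")

    max_items = run_input.get("maxItems", 50)  # documented default
    is_int = isinstance(max_items, int) and not isinstance(max_items, bool)
    if not is_int or not 1 <= max_items <= 10000:
        problems.append("maxItems must be an integer between 1 and 10000")

    has_credentials = bool(run_input.get("kaggleUsername")) and bool(run_input.get("kaggleApiKey"))
    if mode in AUTH_MODES and not has_credentials:
        problems.append(f"mode {mode} requires kaggleUsername and kaggleApiKey")

    return problems
```

Note that this sketch does not inspect `startUrls`, so a `byUrl` run that points at competition or notebook URLs would still need credentials even though the validator reports no problems.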
Examples
Search top Titanic datasets
{"mode": "search","searchQuery": "titanic","sortBy": "votes","maxItems": 25}
Trending CSV datasets with high usability
{"mode": "trendingDatasets","fileType": "csv","minUsability": 0.8,"maxItems": 50}
Lookup a specific dataset
{"mode": "byDataset","datasetRefs": ["heptapod/titanic"]}
Browse a user's datasets
{"mode": "byUser","userSlugs": ["heptapod"]}
Lookup by URL (auto-detect)
{"mode": "byUrl","startUrls": ["https://www.kaggle.com/datasets/heptapod/titanic","https://www.kaggle.com/heptapod"]}
Competition lookup (auth required)
{"mode": "byCompetition","competitionRefs": ["titanic"],"kaggleUsername": "your-username","kaggleApiKey": "your-api-key"}
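The JSON inputs above can also be submitted programmatically. The following is a minimal sketch using the `apify-client` Python package; `ACTOR_ID` is a placeholder (the actor's real ID is not stated here), and the `APIFY_TOKEN` environment variable name and the helper names are assumptions.

```python
import os
from typing import Any, Dict, List

# Hypothetical placeholder: replace with the actor's real ID from its store page.
ACTOR_ID = "<username>/<actor-name>"


def build_search_input(query: str, max_items: int = 25) -> Dict[str, Any]:
    """Build the same input as the 'Search top Titanic datasets' example."""
    return {
        "mode": "search",
        "searchQuery": query,
        "sortBy": "votes",
        "maxItems": max_items,
    }


def run_actor(run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start a run, wait for it to finish, and return the dataset items."""
    # Imported lazily so the input builder works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

With a valid token and actor ID in place, `run_actor(build_search_input("titanic"))` would return the records as plain dicts, subject to the omit-empty behavior described under Output.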
How to get Kaggle API credentials
- Sign in to kaggle.com.
- Go to Account settings → "API" → "Create New Token".
- A `kaggle.json` file downloads. Use the `username` and `key` fields here as `kaggleUsername` and `kaggleApiKey`.
You only need credentials for byCompetition, byNotebook, and trendingNotebooks modes. All dataset modes work without auth.
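The mapping from the downloaded token file to the actor's input fields can be sketched as follows. The function name is hypothetical; the `username` and `key` field names come from the steps above.

```python
import json
from pathlib import Path
from typing import Dict, Union


def credentials_from_kaggle_json(path: Union[str, Path]) -> Dict[str, str]:
    """Read a kaggle.json token file and return the two actor input fields."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return {
        "kaggleUsername": data["username"],
        "kaggleApiKey": data["key"],
    }
```

The returned dict can be merged into any auth-mode input, for example `{"mode": "byCompetition", "competitionRefs": ["titanic"], **credentials_from_kaggle_json("kaggle.json")}`. Treat the file like a password and keep it out of version control.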
Reliability
- Direct calls to the official `kaggle.com/api/v1/*` endpoints
- Exponential backoff retries on `429`, `500`–`504`
- HTML 404 fallback handling (Kaggle redirects unknown refs to a 404 HTML page)
- No proxy needed — works from datacenter IPs
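To illustrate the retry behavior listed above, here is a generic exponential-backoff sketch built on the standard library. It is not the actor's implementation: the attempt count, the base delay, the doubling schedule, and the reading of "500–504" as every status from 500 through 504 are assumptions.

```python
import time
import urllib.error
import urllib.request

# Assumption: "429, 500-504" means these six status codes.
RETRYABLE_STATUSES = {429, 500, 501, 502, 503, 504}


def fetch_with_backoff(url, max_attempts=5, base_delay=1.0,
                       opener=urllib.request.urlopen, sleep=time.sleep):
    """GET a URL, retrying retryable HTTP errors with delays of 1x, 2x, 4x, ... base_delay."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            is_last_attempt = attempt == max_attempts - 1
            if err.code not in RETRYABLE_STATUSES or is_last_attempt:
                raise  # non-retryable status, or retries exhausted
            sleep(base_delay * 2 ** attempt)
```

The `opener` and `sleep` parameters exist only so the function can be exercised without network access or real waiting; production code would often add random jitter to the delay as well.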
Limitations
- The Kaggle public API exposes user info indirectly; `byUser` records are derived from the user's first listed datasets and contain only `username`, `displayName`, and a count of listed datasets.
- Competitions, notebooks (kernels), and trending notebooks all require Kaggle API credentials; these are private endpoints (they return `401 Unauthenticated` without auth).
- The license filter passes one of 5 broad families (`cc` / `gpl` / `odb` / `other` / `all`); finer-grained licenses like `cc-by-sa-4.0` are returned in the output's `licenseName` field but cannot be filtered server-side.
- Single-version datasets only: version history is not enumerated.
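Since finer-grained licenses cannot be filtered server-side, one workaround is to request the broad family and narrow the results afterwards. A minimal sketch, assuming `licenseName` holds a string comparable to the wanted license after trimming and lowercasing (the exact format of that field is not specified here):

```python
from typing import Any, Dict, Iterable, List


def filter_by_license(records: Iterable[Dict[str, Any]], wanted: str) -> List[Dict[str, Any]]:
    """Keep records whose licenseName matches `wanted`, ignoring case and surrounding spaces."""
    wanted_norm = wanted.strip().lower()
    return [
        record
        for record in records
        # Records without licenseName (omitted because empty) never match.
        if str(record.get("licenseName", "")).strip().lower() == wanted_norm
    ]
```

For example, a run with `licenseGroup` set to `cc` could be post-filtered with `filter_by_license(items, "cc-by-sa-4.0")`.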
FAQ
Do I need a Kaggle account? Only for competitions / notebooks. Dataset search and lookup work anonymously.
How fresh is the data? Real-time — every run hits the live Kaggle API.
Can I download dataset files? No. This actor exposes Kaggle metadata — refs, file lists, vote / download counts, license, etc. To download files, use the Kaggle CLI with the ref from this actor's output.
Why are some fields missing? Empty / null fields are omitted — only populated fields appear in the output.
Why does the daily test run only return datasets? The default prefill targets dataset search, which works without credentials. Competition and notebook modes become available once you provide kaggleUsername + kaggleApiKey, which makes all 8 modes usable.
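For the file-download question above, a `ref` from this actor's output can be turned into a Kaggle CLI invocation. The helper below is a hypothetical sketch that only builds the argument list; it assumes the `kaggle` command-line tool is installed and has its own credentials configured.

```python
from typing import List


def kaggle_download_command(ref: str, dest_dir: str = ".") -> List[str]:
    """Build a `kaggle datasets download` command for an owner/slug dataset ref."""
    owner, sep, slug = ref.partition("/")
    if not sep or not owner or not slug or "/" in slug:
        raise ValueError(f"expected an owner/slug ref, got: {ref!r}")
    # -d selects the dataset, -p sets the target directory, --unzip extracts the archive.
    return ["kaggle", "datasets", "download", "-d", ref, "-p", dest_dir, "--unzip"]
```

The resulting list can be passed to `subprocess.run(command, check=True)`; passing a list rather than a shell string avoids quoting problems with unusual refs.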