Khan Academy Scraper
Pricing
from $3.00 / 1,000 results
Scrape free, CC-licensed educational content from Khan Academy. Search by keyword, fetch by path, URL, or subject, list all courses, and look up videos by YouTube ID. Returns videos with download URLs, articles, exercises, courses, and units.
Developer: Crawler Bros (maintained by Community)
Last modified: 17 days ago
Scrape free, CC-licensed educational content from Khan Academy. The scraper exposes search, the full topic browser, by-subject course listings, direct path/URL lookups for videos / articles / exercises / courses, and a YouTube-ID lookup for any video Khan Academy publishes.
Khan Academy is freely accessible: no login or API key is required. The scraper uses Khan Academy's own public GraphQL endpoints (the same ones the website calls) and returns clean, structured records with rich video metadata (download URLs, subtitle languages, durations) for downstream use in research, content discovery, dataset building, or curriculum tools.
Features
- Search — full-text search across videos, articles, exercises and topics
- Topic browser — list every course Khan Academy offers, grouped by category
- By subject — list all courses under a major root subject (Math, Science, Computing, Test Prep, etc.)
- By path / URL — fetch a single course, video, article or exercise directly
- By YouTube video ID — look up any Khan Academy-published video by its YouTube ID
- Optional unit expansion — emit a record per course-unit alongside the course record
- Filtering — restrict by subject, content kind, duration window, or keyword in title/description
Use cases
- Educators: build curriculum playlists, find aligned exercises and articles for a topic
- Researchers: assemble open educational resource (OER) datasets, all CC-licensed
- Content creators: discover Khan Academy videos for a topic; pull download URLs and metadata
- EdTech: map a topic graph, mirror lesson content into your platform
- Translators / accessibility tools: detect subtitle availability per language
Input
The actor accepts a single input object. Required field: mode.
| Field | Type | Description |
|---|---|---|
| `mode` | enum | One of `search`, `byPaths`, `byUrls`, `bySubject`, `topicBrowser`, `byVideoIds` |
| `searchQuery` | string | Free-text query (`mode=search`) |
| `subject` | enum | Root subject for `mode=bySubject`: `math`, `science`, `computing`, `humanities`, `test-prep`, `ela`, `economics-finance-domain`, `partner-content`, `college-careers-more`, `khan-for-educators`, `ai-activities` |
| `subjects` | enum[] | Restrict `mode=search` or `mode=topicBrowser` to these subjects |
| `contentKinds` | enum[] | Restrict `mode=search` to specific content kinds: `Video`, `Article`, `Exercise`, `Topic` |
| `paths` | string[] | Khan Academy content paths (e.g. `math/algebra`) |
| `urls` | string[] | Full khanacademy.org URLs |
| `videoIds` | string[] | YouTube IDs to resolve back into KA videos |
| `containsKeyword` | string | Drop records whose title/description/keywords do not contain this string (case-insensitive) |
| `minDurationSeconds` | integer | Drop videos shorter than this |
| `maxDurationSeconds` | integer | Drop videos longer than this |
| `includeUnits` | boolean | For course paths/URLs: emit one record per unit in addition to the course record |
| `maxItems` | integer | Hard cap on emitted records (default 50, max 5000) |
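The three filter fields (`containsKeyword`, `minDurationSeconds`, `maxDurationSeconds`) can be read as one predicate applied to each record. The sketch below illustrates that reading in Python; it is not the actor's actual implementation. The function name `keep_record` is hypothetical, and it assumes that duration limits only affect records that carry a `durationSeconds` value.

```python
from typing import Any, Optional


def keep_record(
    record: dict[str, Any],
    contains_keyword: Optional[str] = None,
    min_duration_seconds: Optional[int] = None,
    max_duration_seconds: Optional[int] = None,
) -> bool:
    """Return True if the record passes the (assumed) filter semantics."""
    if contains_keyword:
        needle = contains_keyword.lower()
        # Search title, description and keywords, case-insensitively.
        keywords = record.get("keywords", "")
        if isinstance(keywords, list):
            keywords = " ".join(keywords)
        haystack = " ".join(
            [record.get("title", ""), record.get("description", ""), keywords]
        ).lower()
        if needle not in haystack:
            return False

    duration = record.get("durationSeconds")
    if duration is not None:
        # Assumption: duration limits only affect records that have a duration.
        if min_duration_seconds is not None and duration < min_duration_seconds:
            return False
        if max_duration_seconds is not None and duration > max_duration_seconds:
            return False
    return True


# Hypothetical records, trimmed to the fields the filter looks at.
records = [
    {"title": "Intro to negative numbers", "durationSeconds": 576},
    {"title": "Photosynthesis", "durationSeconds": 90},
    {"title": "Negative numbers review", "kind": "Article"},
]
kept = [r["title"] for r in records if keep_record(r, "negative", 120, 900)]
print(kept)
```

The same predicate can be applied client-side to a downloaded dataset when a different keyword or duration window is needed without re-running the actor.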
Example: list all courses under Math
{"mode": "bySubject", "subject": "math", "maxItems": 50}
Example: search videos about photosynthesis
{"mode": "search", "searchQuery": "photosynthesis", "contentKinds": ["Video"], "maxItems": 25}
Example: fetch a specific course + its units
{"mode": "byUrls", "urls": ["https://www.khanacademy.org/math/algebra"], "includeUnits": true}
Example: full topic-browser tree
{"mode": "topicBrowser","maxItems": 500}
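The example inputs above can also be assembled and sanity-checked in code before a run is started. The following Python sketch shows one way to do that. The helper `build_input`, the `REQUIRED_BY_MODE` mapping (inferred from the field descriptions), and the lower bound of 1 for `maxItems` are assumptions for illustration and are not part of the actor.

```python
import json
from typing import Any

MODES = {"search", "byPaths", "byUrls", "bySubject", "topicBrowser", "byVideoIds"}

# Assumption: which extra field each mode needs, inferred from the input table.
REQUIRED_BY_MODE = {
    "search": "searchQuery",
    "byPaths": "paths",
    "byUrls": "urls",
    "bySubject": "subject",
    "byVideoIds": "videoIds",
}


def build_input(mode: str, max_items: int = 50, **fields: Any) -> dict[str, Any]:
    """Assemble an input object and reject obviously invalid combinations."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    # The documented cap is 5000; the lower bound of 1 is an assumption.
    if not 1 <= max_items <= 5000:
        raise ValueError("maxItems must be between 1 and 5000")
    required = REQUIRED_BY_MODE.get(mode)
    if required and not fields.get(required):
        raise ValueError(f"mode={mode} needs a non-empty {required!r}")
    return {"mode": mode, "maxItems": max_items, **fields}


run_input = build_input(
    "search", 25, searchQuery="photosynthesis", contentKinds=["Video"]
)
print(json.dumps(run_input, indent=2))
```

The resulting dictionary has the same shape as the JSON examples and can be serialized and passed as the actor input.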
Output
Each record is pushed to the default dataset. Fields are emitted only when populated (no nulls). Common fields across record types:
- `id`: Khan Academy content ID (e.g. `19647488`, `x2f8bb11595b61c86`)
- `slug`: URL-safe slug
- `kind`: one of `Video`, `Article`, `Exercise`, `Topic`, `Course`, `Unit`, `Project`
- `title`: translated title
- `description`: HTML-stripped description
- `url`: direct Khan Academy URL
- `subject`: primary root subject slug
- `recordType`: stable record-type label (`video`, `article`, `exercise`, `course`, `unit`, etc.)
- `scrapedAt`: UTC ISO timestamp
Video-specific:
- `youtubeId` + `youtubeUrl`
- `durationSeconds`
- `thumbnailUrl`
- `downloadUrls`: `{m3u8, mp4, mp4-low, mp4-low-ios, png}` direct CDN URLs
- `subtitleLanguages`: language codes with translated subtitles
- `authorNames`, `keywords`, `dateAdded`, `license`, `language`, `educationalLevel`
Course-specific:
- `unitCount`, `lessonCount`, `masteryEnabled`, `curriculumKey`, `iconUrl`
Sample video record
{
  "id": "19647488",
  "slug": "negative-numbers-introduction",
  "kind": "Video",
  "title": "Intro to negative numbers",
  "description": "Mysterious negative numbers! What ARE they? ...",
  "youtubeId": "Hlal9ME2Aig",
  "youtubeUrl": "https://www.youtube.com/watch?v=Hlal9ME2Aig",
  "durationSeconds": 576,
  "thumbnailUrl": "https://cdn.kastatic.org/googleusercontent/...",
  "authorNames": ["Sal Khan"],
  "downloadUrls": {
    "m3u8": "https://cdn.kastatic.org/ka-youtube-converted/Hlal9ME2Aig.m3u8/Hlal9ME2Aig.m3u8",
    "mp4": "https://cdn.kastatic.org/ka-youtube-converted/Hlal9ME2Aig.mp4/Hlal9ME2Aig.mp4"
  },
  "dateAdded": "2011-02-20T16:51:16Z",
  "language": "en",
  "license": "cc-by-nc-sa",
  "url": "https://www.khanacademy.org/math/arithmetic-home/negative-numbers/neg-num-intro/v/negative-numbers-introduction",
  "subject": "math",
  "recordType": "video",
  "scrapedAt": "2026-05-21T09:17:47Z"
}
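Because fields are emitted only when populated, code that consumes records should not assume any optional key exists. The Python sketch below picks a download URL from a record such as the sample above and falls back gracefully when `downloadUrls` is absent. The helper name `pick_download_url` and the preference order are assumptions chosen for illustration, and the article record is hypothetical.

```python
from typing import Any, Optional

# Assumption: prefer a full-quality MP4, then the smaller variants, then HLS.
PREFERRED_FORMATS = ("mp4", "mp4-low", "mp4-low-ios", "m3u8")


def pick_download_url(record: dict[str, Any]) -> Optional[str]:
    """Return the first available download URL in preference order, or None."""
    urls = record.get("downloadUrls") or {}
    for fmt in PREFERRED_FORMATS:
        if urls.get(fmt):
            return urls[fmt]
    return None


# Trimmed copy of the sample video record.
video = {
    "id": "19647488",
    "title": "Intro to negative numbers",
    "durationSeconds": 576,
    "downloadUrls": {
        "m3u8": "https://cdn.kastatic.org/ka-youtube-converted/Hlal9ME2Aig.m3u8/Hlal9ME2Aig.m3u8",
        "mp4": "https://cdn.kastatic.org/ka-youtube-converted/Hlal9ME2Aig.mp4/Hlal9ME2Aig.mp4",
    },
}
# Hypothetical non-video record: it has no downloadUrls key at all.
article = {"kind": "Article", "title": "Example article"}

video_url = pick_download_url(video)
article_url = pick_download_url(article)
print(video_url)
print(article_url)
```

Using `.get()` with a default keeps the same code path working for videos, articles, courses, and units.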
Data source
Khan Academy is a non-profit educational organization that provides free, CC-licensed (CC-BY-NC-SA) lessons in math, science, the arts, computer programming, economics, and more.
This scraper hits Khan Academy's public GraphQL endpoints — the same ones used by their website. No login, cookies or API key are required, and the actor runs on the free Apify plan without any paid proxy add-ons.
FAQ
Does this require login or an API key?
No. Khan Academy's content is freely accessible. The scraper uses public endpoints with no authentication.
Do I need to provide a proxy?
No. The scraper works from datacenter IPs out of the box.
What is the license of the returned data?
Khan Academy content is licensed under CC-BY-NC-SA 3.0 unless otherwise noted. You can reuse it for non-commercial purposes with attribution. Verify the license field on each record.
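One way to act on that advice is to tally the `license` values in a dataset before reusing it. The Python sketch below is a minimal illustration. The helper names are hypothetical, the sample records are made up, and treating a missing `license` field as `"unknown"` is an assumption that follows from fields being omitted when not populated.

```python
from collections import Counter
from typing import Any, Iterable


def license_counts(records: Iterable[dict[str, Any]]) -> Counter:
    """Count records per license value; records without one count as 'unknown'."""
    return Counter(r.get("license", "unknown") for r in records)


def with_license(
    records: Iterable[dict[str, Any]], expected: str = "cc-by-nc-sa"
) -> list[dict[str, Any]]:
    """Keep only records whose license field equals the expected value."""
    return [r for r in records if r.get("license") == expected]


# Hypothetical records, trimmed to the fields used here.
records = [
    {"id": "a", "license": "cc-by-nc-sa"},
    {"id": "b", "license": "cc-by-nc-sa"},
    {"id": "c"},  # license omitted because it was not populated
]
counts = license_counts(records)
reusable_ids = [r["id"] for r in with_license(records)]
print(counts)
print(reusable_ids)
```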
Can I get transcripts?
Khan Academy stores subtitle files per language; the actor exposes the available subtitle languages in subtitleLanguages but not the transcript text itself. To fetch the transcript bodies, pass the record's youtubeId to the YouTube transcript API or to KA's subtitles endpoint.
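Checking subtitle availability needs only the `subtitleLanguages` field. The Python sketch below selects the videos that list a given language. It assumes `subtitleLanguages` is a list of language-code strings; the helper name and the sample records (including their IDs and language codes) are hypothetical.

```python
from typing import Any, Iterable


def videos_with_subtitles(
    records: Iterable[dict[str, Any]], language: str
) -> list[dict[str, Any]]:
    """Return records whose subtitleLanguages contain the given code."""
    wanted = language.lower()
    return [
        r
        for r in records
        # Records without the field are treated as having no subtitles.
        if wanted in (code.lower() for code in r.get("subtitleLanguages", []))
    ]


# Hypothetical records, trimmed to the fields used here.
records = [
    {"youtubeId": "VIDEO_A", "subtitleLanguages": ["es", "fr"]},
    {"youtubeId": "VIDEO_B", "subtitleLanguages": ["fr"]},
    {"youtubeId": "VIDEO_C"},
]
spanish_ids = [r["youtubeId"] for r in videos_with_subtitles(records, "ES")]
print(spanish_ids)
```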
Why are some titles different from what I see on the site?
Khan Academy localizes content per visitor region. The actor uses the en/US locale by default.
What if Khan Academy updates their GraphQL schema?
The cacheable persisted queries are tied to a publish version (pcv) that the actor fetches dynamically. Search queries are sent as fully inlined GraphQL strings so they survive schema bumps that don't change field shape.
Can I run this on the free Apify plan?
Yes. No proxy, no add-ons, no user-supplied credentials needed.