Khan Academy Scraper

Pricing

from $3.00 / 1,000 results

Scrape free, CC-licensed educational content from Khan Academy. Search by keyword, fetch by path, URL or subject, list all courses, or look up videos by YouTube ID. Returns videos with download URLs, articles, exercises, courses and units.

Rating: 0.0 (0)

Developer: Crawler Bros (Maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 17 days ago

Scrape free, CC-licensed educational content from Khan Academy. The scraper exposes search, the full topic browser, by-subject course listings, direct path/URL lookups for videos / articles / exercises / courses, and a YouTube-ID lookup for any video Khan Academy publishes.

Khan Academy is a freely accessible educational platform — no login or API key required. The scraper uses Khan Academy's own public GraphQL endpoints (the same ones the website calls) and returns clean structured records with rich video metadata (download URLs, subtitles, durations) for downstream use in research, content discovery, dataset building, or curriculum tools.

Features

  • Search — full-text search across videos, articles, exercises and topics
  • Topic browser — list every course Khan Academy offers, grouped by category
  • By subject — list all courses under a major root subject (Math, Science, Computing, Test Prep, etc.)
  • By path / URL — fetch a single course, video, article or exercise directly
  • By YouTube video ID — look up any Khan Academy-published video by its YouTube ID
  • Optional unit expansion — emit a record per course-unit alongside the course record
  • Filtering — restrict by subject, content kind, duration window, or keyword in title/description
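
The filtering options can be pictured as a predicate applied to each record. The sketch below is a client-side approximation in Python, not the actor's actual implementation. `passes_filters` is a hypothetical helper. It assumes that `keywords` is a list of strings (or a single string) and that duration limits only apply to records that carry `durationSeconds`.

```python
from typing import Any, Optional


def passes_filters(
    record: dict[str, Any],
    contains_keyword: Optional[str] = None,
    min_duration: Optional[int] = None,
    max_duration: Optional[int] = None,
) -> bool:
    """Approximate the containsKeyword and duration filters for one record."""
    if contains_keyword:
        keywords = record.get("keywords", [])
        if isinstance(keywords, str):  # assumption: may arrive as one string
            keywords = [keywords]
        haystack = " ".join(
            [record.get("title", ""), record.get("description", ""), *keywords]
        ).lower()
        # Case-insensitive substring match, as described for containsKeyword.
        if contains_keyword.lower() not in haystack:
            return False

    duration = record.get("durationSeconds")
    if duration is not None:  # assumption: records without a duration pass
        if min_duration is not None and duration < min_duration:
            return False
        if max_duration is not None and duration > max_duration:
            return False
    return True
```

A predicate like this can also be applied to an already downloaded dataset to narrow it further without re-running the actor.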

Use cases

  • Educators: build curriculum playlists, find aligned exercises and articles for a topic
  • Researchers: assemble open educational resource (OER) datasets, all CC-licensed
  • Content creators: discover Khan Academy videos for a topic; pull download URLs and metadata
  • EdTech: map a topic graph, mirror lesson content into your platform
  • Translators / accessibility tools: detect subtitle availability per language

Input

The actor accepts a single input object. Required field: mode.

| Field | Type | Description |
|---|---|---|
| mode | enum | One of search, byPaths, byUrls, bySubject, topicBrowser, byVideoIds |
| searchQuery | string | Free-text query (mode=search) |
| subject | enum | Root subject for mode=bySubject: math, science, computing, humanities, test-prep, ela, economics-finance-domain, partner-content, college-careers-more, khan-for-educators, ai-activities |
| subjects | enum[] | Restrict mode=search or mode=topicBrowser to these subjects |
| contentKinds | enum[] | Restrict mode=search to specific content kinds: Video, Article, Exercise, Topic |
| paths | string[] | Khan Academy content paths (e.g. math/algebra) |
| urls | string[] | Full khanacademy.org URLs |
| videoIds | string[] | YouTube IDs to resolve back into KA videos |
| containsKeyword | string | Drop records whose title/description/keywords do not contain this string (case-insensitive) |
| minDurationSeconds | integer | Drop videos shorter than this |
| maxDurationSeconds | integer | Drop videos longer than this |
| includeUnits | boolean | For course paths/URLs: emit one record per unit in addition to the course record |
| maxItems | integer | Hard cap on emitted records (default 50, max 5000) |
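
Before submitting a run, the input object can be checked locally. The following Python sketch is one way to do that. `build_input` is a hypothetical helper, and the pairing of each mode with a field it needs (for example `search` with `searchQuery`) is an assumption inferred from the field descriptions. The actor itself only documents `mode` as required. The lower bound of 1 for `maxItems` is also assumed.

```python
from typing import Any

MODES = {"search", "byPaths", "byUrls", "bySubject", "topicBrowser", "byVideoIds"}

# Assumed pairing of each mode with the field it needs; topicBrowser needs none.
REQUIRED_BY_MODE = {
    "search": "searchQuery",
    "byPaths": "paths",
    "byUrls": "urls",
    "bySubject": "subject",
    "byVideoIds": "videoIds",
}


def build_input(mode: str, max_items: int = 50, **fields: Any) -> dict[str, Any]:
    """Return an input object, raising ValueError on obvious mistakes."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    required = REQUIRED_BY_MODE.get(mode)
    if required and not fields.get(required):
        raise ValueError(f"mode={mode} expects a non-empty {required!r}")
    if not 1 <= max_items <= 5000:  # documented maximum is 5000
        raise ValueError("maxItems must be between 1 and 5000")
    # Drop unset fields so the payload only carries what was provided.
    payload = {key: value for key, value in fields.items() if value is not None}
    payload["mode"] = mode
    payload["maxItems"] = max_items
    return payload
```

For instance, `build_input("bySubject", subject="math")` yields an object with `mode`, `subject` and the default `maxItems` of 50.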

Example: list all courses under Math

{
  "mode": "bySubject",
  "subject": "math",
  "maxItems": 50
}

Example: search videos about photosynthesis

{
  "mode": "search",
  "searchQuery": "photosynthesis",
  "contentKinds": ["Video"],
  "maxItems": 25
}

Example: fetch a specific course + its units

{
  "mode": "byUrls",
  "urls": ["https://www.khanacademy.org/math/algebra"],
  "includeUnits": true
}
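
The `paths` and `urls` fields address the same kind of content. Judging by the `math/algebra` example in the field table, a content path appears to be the URL path without its leading slash. The Python sketch below converts one form to the other under that assumption. `url_to_path` is a hypothetical helper, and whether the actor treats the two forms as fully equivalent is not stated in the source.

```python
from urllib.parse import urlparse


def url_to_path(url: str) -> str:
    """Turn a full khanacademy.org URL into a content path like 'math/algebra'."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host != "khanacademy.org" and not host.endswith(".khanacademy.org"):
        raise ValueError(f"not a khanacademy.org URL: {url}")
    # Assumption: a content path is the URL path without surrounding slashes.
    return parsed.path.strip("/")
```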

Example: full topic-browser tree

{
  "mode": "topicBrowser",
  "maxItems": 500
}

Output

Each record is pushed to the default dataset. Fields are emitted only when populated (no nulls). Common fields across record types:

  • id — Khan Academy content ID (e.g. 19647488, x2f8bb11595b61c86)
  • slug — URL-safe slug
  • kind — One of Video, Article, Exercise, Topic, Course, Unit, Project
  • title — Translated title
  • description — HTML-stripped description
  • url — Direct Khan Academy URL
  • subject — Primary root subject slug
  • recordType — Stable record-type label (video, article, exercise, course, unit, etc.)
  • scrapedAt — UTC ISO timestamp

Video-specific:

  • youtubeId + youtubeUrl
  • durationSeconds
  • thumbnailUrl
  • downloadUrls: direct CDN URLs keyed by format (possible keys: m3u8, mp4, mp4-low, mp4-low-ios, png)
  • subtitleLanguages — language codes with translated subtitles
  • authorNames, keywords, dateAdded, license, language, educationalLevel

Course-specific:

  • unitCount, lessonCount, masteryEnabled, curriculumKey, iconUrl
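
Because fields are emitted only when populated, consumer code should not assume that any type-specific key is present. The Python sketch below groups a mixed result set by `recordType` and sums video durations using `dict.get` with defaults. The helper names are hypothetical, and the `"unknown"` bucket for records without a `recordType` is an assumption added for safety.

```python
from collections import defaultdict
from typing import Any, Iterable


def group_by_record_type(
    records: Iterable[dict[str, Any]],
) -> dict[str, list[dict[str, Any]]]:
    """Bucket records by their recordType label."""
    groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for record in records:
        groups[record.get("recordType", "unknown")].append(record)
    return dict(groups)


def total_video_seconds(records: Iterable[dict[str, Any]]) -> int:
    """Sum durationSeconds over video records, treating a missing value as 0."""
    return sum(
        record.get("durationSeconds", 0)
        for record in records
        if record.get("recordType") == "video"
    )
```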

Sample video record

{
  "id": "19647488",
  "slug": "negative-numbers-introduction",
  "kind": "Video",
  "title": "Intro to negative numbers",
  "description": "Mysterious negative numbers! What ARE they? ...",
  "youtubeId": "Hlal9ME2Aig",
  "youtubeUrl": "https://www.youtube.com/watch?v=Hlal9ME2Aig",
  "durationSeconds": 576,
  "thumbnailUrl": "https://cdn.kastatic.org/googleusercontent/...",
  "authorNames": ["Sal Khan"],
  "downloadUrls": {
    "m3u8": "https://cdn.kastatic.org/ka-youtube-converted/Hlal9ME2Aig.m3u8/Hlal9ME2Aig.m3u8",
    "mp4": "https://cdn.kastatic.org/ka-youtube-converted/Hlal9ME2Aig.mp4/Hlal9ME2Aig.mp4"
  },
  "dateAdded": "2011-02-20T16:51:16Z",
  "language": "en",
  "license": "cc-by-nc-sa",
  "url": "https://www.khanacademy.org/math/arithmetic-home/negative-numbers/neg-num-intro/v/negative-numbers-introduction",
  "subject": "math",
  "recordType": "video",
  "scrapedAt": "2026-05-21T09:17:47Z"
}
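
A record like the one above can be post-processed with the standard library alone. The Python sketch below picks a download URL, parses the timestamps and formats the duration. The helper names and the format preference order (`mp4`, then `mp4-low`, then `m3u8`) are assumptions for illustration, not something the actor prescribes.

```python
from datetime import datetime
from typing import Any, Optional

# Assumed preference order; adjust to whatever the downstream player needs.
PREFERRED_FORMATS = ("mp4", "mp4-low", "m3u8")


def pick_download_url(record: dict[str, Any]) -> Optional[str]:
    """Return the first available URL in preference order, or None."""
    urls = record.get("downloadUrls", {})
    for fmt in PREFERRED_FORMATS:
        if fmt in urls:
            return urls[fmt]
    return None


def parse_utc(timestamp: str) -> datetime:
    """Parse timestamps such as dateAdded or scrapedAt into aware datetimes."""
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def format_duration(seconds: int) -> str:
    """Render durationSeconds as minutes:seconds, e.g. 576 -> '9:36'."""
    minutes, remainder = divmod(seconds, 60)
    return f"{minutes}:{remainder:02d}"
```

Records without `downloadUrls` (articles, exercises, courses) yield `None` from `pick_download_url`, which keeps the helper safe to call on mixed result sets.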

Data source

Khan Academy is a non-profit educational organization that provides free, CC-licensed (CC-BY-NC-SA) lessons in math, science, the arts, computer programming, economics, and more.

This scraper hits Khan Academy's public GraphQL endpoints — the same ones used by their website. No login, cookies or API key are required, and the actor runs on the free Apify plan without any paid proxy add-ons.

FAQ

Does this require login or an API key? No. Khan Academy's content is freely accessible. The scraper uses public endpoints with no authentication.

Do I need to provide a proxy? No. The scraper works from datacenter IPs out-of-the-box.

What is the license of the returned data? Khan Academy content is licensed under CC-BY-NC-SA 3.0 unless otherwise noted. You can reuse it for non-commercial purposes with attribution. Verify the license field on each record.
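
One way to act on that advice is to separate records whose `license` matches what you expect from those that need a manual look. The Python sketch below assumes license values look like the sample record's `cc-by-nc-sa`. `split_by_license` is a hypothetical helper, and records with no `license` field are routed to review rather than assumed to be covered.

```python
from typing import Any, Iterable

# Assumption: license values look like the sample record's "cc-by-nc-sa".
EXPECTED_LICENSES = frozenset({"cc-by-nc-sa"})


def split_by_license(
    records: Iterable[dict[str, Any]],
    expected: frozenset = EXPECTED_LICENSES,
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Return (matching, needs_review) based on each record's license field."""
    matching: list[dict[str, Any]] = []
    needs_review: list[dict[str, Any]] = []
    for record in records:
        license_id = str(record.get("license", "")).lower()
        target = matching if license_id in expected else needs_review
        target.append(record)
    return matching, needs_review
```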

Can I get transcripts? Khan Academy stores subtitle files per language, and the actor lists the available languages in subtitleLanguages. It does not return the transcript text itself. To fetch that, pass the record's youtubeId to the YouTube transcript API or to KA's subtitles endpoint.
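
Subtitle availability can be checked from the dataset alone. The Python sketch below assumes `subtitleLanguages` is a list of language-code strings such as `"es"`, which the source does not spell out. The helper names are hypothetical.

```python
from collections import Counter
from typing import Any, Iterable


def has_subtitles(record: dict[str, Any], language: str) -> bool:
    """True if the record lists the given language code (case-insensitive)."""
    codes = record.get("subtitleLanguages", [])
    return language.lower() in {str(code).lower() for code in codes}


def subtitle_coverage(records: Iterable[dict[str, Any]]) -> Counter:
    """Count how many records offer subtitles in each language."""
    coverage: Counter = Counter()
    for record in records:
        # A set ensures each language is counted once per record.
        coverage.update({str(c).lower() for c in record.get("subtitleLanguages", [])})
    return coverage
```

Records selected this way still need a follow-up request with their `youtubeId` to retrieve the transcript text.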

Why are some titles different from what I see on the site? Khan Academy localizes content per visitor region. The actor uses the en/US locale by default.

What if Khan Academy updates their GraphQL schema? The cacheable persisted queries are tied to a publish version (pcv) that the actor fetches dynamically. Search queries are sent as fully inlined GraphQL strings so they survive schema bumps that don't change field shape.

Can I run this on the free Apify plan? Yes. No proxy, no add-ons, no user-supplied credentials needed.