Pricing

Pay per event

TED Talks Transcript Scraper

Extracts full transcripts from TED.com talks in any available language. Returns timed segments (JSON), plain text, SRT, and WebVTT formats alongside speaker metadata, tags, and multi-language availability.

Developer

BowTiedRaccoon

19 days ago

Last modified

TED Talks Transcript Scraper Features

Extracts complete transcripts with millisecond-accurate timing (427 cues for an average talk)
Returns four formats per transcript: JSON segments, plain text, SRT, and WebVTT — most actors pick one format and call it done
Collects speaker name, role, full bio, event name, recorded date, duration, view count, and topic tags alongside the transcript
Reports all available language codes so you can plan multi-language runs
Fetches only the native language by default, or every translation the talk has, or a specific list you provide
Accepts custom start URLs for targeted scraping of individual talks
Discovers all talks automatically via TED's year-by-year sitemap index when no URLs are given
No proxy required — TED serves transcripts publicly, no auth or Cloudflare management involved

What Can You Do With TED Transcript Data?

NLP researchers — Build or extend corpora for text classification, summarization, or speaker style analysis; the TED-LIUM speech recognition benchmark is built from TED talks, and this actor gives you fresh transcript text from the same source
Language-learning app developers — Pull parallel transcripts (English audio + Japanese subtitles) for aligned bilingual reading and listening exercises
AI training teams — Collect multi-speaker, multi-language text at scale; TED's volunteer-translated transcripts cover 100+ languages with consistent quality
Public speaking coaches — Analyze rhetorical structure, pacing cues, and paragraph breaks across thousands of talks
Translation quality researchers — Compare the same content across 60+ language variants for benchmarking MT and human translation output
Educators and content curators — Build searchable archives of transcript text with metadata for curriculum alignment or topic discovery

How TED Talks Transcript Scraper Works

1. Seed the run. If you provide startUrls, those talks are processed directly. Otherwise the scraper walks TED's year-by-year sitemap index (2006–2025) and collects every talk URL up to your maxItems budget.
2. For each talk, the scraper fetches the transcript page HTML and parses the embedded __NEXT_DATA__ JSON blob. This yields the numeric talk ID, speaker details, event name, dates, view count, tags, and the full list of available language codes.
3. Using the language list, the scraper calls TED's public subtitles API, one request per language, and retrieves millisecond-timed caption cues.
4. The cues are assembled into four transcript formats, merged with the talk metadata, and saved as one dataset record per language.
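The parsing and assembly steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actor's source code: `extract_next_data`, `format_timestamp`, and `cues_to_srt` are hypothetical helper names, the sample HTML and its `talkId` key are invented, and the only details taken from this page are that the talk page embeds a `__NEXT_DATA__` script tag and that each cue carries `start_ms`, `duration_ms`, and `text`.

```python
import json
import re

# Next.js pages embed their page data in a <script id="__NEXT_DATA__"> tag.
NEXT_DATA_RE = re.compile(
    r'<script[^>]*id="__NEXT_DATA__"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def extract_next_data(html: str) -> dict:
    """Return the parsed __NEXT_DATA__ JSON blob from a page's HTML."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        raise ValueError("no __NEXT_DATA__ script tag found")
    return json.loads(match.group(1))


def format_timestamp(ms: int, separator: str = ",") -> str:
    """Format milliseconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{separator}{millis:03d}"


def cues_to_srt(cues: list[dict]) -> str:
    """Assemble timed cues into an SRT document (numbered from 1)."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        start = cue["start_ms"]
        end = start + cue["duration_ms"]  # end time is derived, not stored
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{cue['text']}"
        )
    return "\n\n".join(blocks) + "\n"


# Invented sample page: the real blob's key layout is not documented here.
sample_html = (
    '<html><script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"talkId": "66"}}</script></html>'
)
page_data = extract_next_data(sample_html)

# First cue of the sample output record shown further down this page.
sample_cues = [
    {"start_ms": 2103, "duration_ms": 2575, "text": "Good morning. How are you?"}
]
srt = cues_to_srt(sample_cues)
```

The WebVTT variant would differ mainly in the `WEBVTT` header line and the `.` millisecond separator, which is why `format_timestamp` takes the separator as a parameter.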

Input

{
  "maxItems": 15,
  "startUrls": [
    { "url": "https://www.ted.com/talks/ken_robinson_says_schools_kill_creativity" }
  ],
  "languages": ["en", "ja"],
  "fetchAllLanguages": false
}

Field	Type	Default	Description
`maxItems`	integer	15	Maximum transcript records to save. One record = one talk × one language.
`startUrls`	array	—	Specific TED talk URLs to scrape. When empty, the scraper discovers talks from the sitemap.
`languages`	array	—	ISO 639-1 codes to fetch (e.g. `["en", "ja", "es"]`). Leave empty for the talk's native language only.
`fetchAllLanguages`	boolean	false	When true, fetches every available translation for each talk. Overrides `languages`.
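
The precedence between `languages` and `fetchAllLanguages` in the table can be written out as a small function. This is a sketch of the documented rules only: `resolve_languages` is a hypothetical name, and dropping requested codes that a talk does not offer is an assumption based on the FAQ note that the scraper skips missing transcripts.

```python
def resolve_languages(run_input: dict, available: list[str], native: str) -> list[str]:
    """Pick the languages to fetch for one talk, following the input table."""
    if run_input.get("fetchAllLanguages", False):
        # fetchAllLanguages overrides any `languages` list.
        return list(available)
    requested = run_input.get("languages") or []
    if not requested:
        # Empty or missing list: native language only.
        return [native]
    # Keep only the requested codes that this talk actually offers.
    return [code for code in requested if code in available]


available = ["en", "ja", "es"]
both = resolve_languages({"languages": ["en", "ja"], "fetchAllLanguages": False}, available, "en")
everything = resolve_languages({"languages": ["en"], "fetchAllLanguages": True}, available, "en")
native_only = resolve_languages({}, available, "en")
```

Each returned code corresponds to one dataset record for that talk, so the length of the list is what counts against `maxItems`.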

Fetch all languages for a single talk:

{
  "maxItems": 100,
  "startUrls": [
    { "url": "https://www.ted.com/talks/ken_robinson_says_schools_kill_creativity" }
  ],
  "fetchAllLanguages": true
}

Ken Robinson's talk has 64 translations, so that input produces 64 records, which fits within the maxItems limit of 100.

TED Talks Transcript Scraper Output Fields

{
  "talk_id": "66",
  "slug": "sir_ken_robinson_do_schools_kill_creativity",
  "title": "Do schools kill creativity?",
  "speaker_name": "Sir Ken Robinson",
  "speaker_role": "Author, educator",
  "speaker_bio": "Creativity expert Sir Ken Robinson challenged the way we educate our children...",
  "event": "TED2006",
  "recorded_date": "2006-02-25",
  "published_date": "2006-06-27T00:11:00.000Z",
  "duration_seconds": 1148,
  "language": "en",
  "language_name": "English",
  "tags": "culture, education, creativity, dance, parenting, teaching, kids",
  "description": "Sir Ken Robinson makes an entertaining and profoundly moving case for creating an education system...",
  "view_count": 80149052,
  "thumbnail_url": "https://pi.tedcdn.com/r/pe.tedcdn.com/...",
  "canonical_url": "https://www.ted.com/talks/sir_ken_robinson_do_schools_kill_creativity",
  "available_languages": "pt-br, el, eo, en, vi, ca, it, sv, cs, ar, ...",
  "transcript_plain": "Good morning. How are you? (Audience) Good. It's been great, hasn't it?...",
  "transcript_srt": "1\n00:00:02,103 --> 00:00:04,678\nGood morning. How are you?\n\n2\n...",
  "transcript_vtt": "WEBVTT\n\n1\n00:00:02.103 --> 00:00:04.678\nGood morning. How are you?\n\n2\n...",
  "transcript_segments": "[{\"start_ms\":2103,\"duration_ms\":2575,\"text\":\"Good morning. How are you?\",\"start_of_paragraph\":true},...]"
}

Field	Type	Description
`talk_id`	string	Numeric TED talk ID
`slug`	string	Canonical URL slug
`title`	string	Talk title in English
`speaker_name`	string	Speaker display name
`speaker_role`	string	One-line speaker description
`speaker_bio`	string	Full speaker biography
`event`	string	Event where the talk was given (e.g. TED2006, TEDxBoston)
`recorded_date`	string	Recording date (YYYY-MM-DD)
`published_date`	string	Publication date (ISO 8601)
`duration_seconds`	number	Talk duration in seconds
`language`	string	Language code for this transcript (ISO 639-1, or a regional variant such as `pt-br`)
`language_name`	string	Full language name in English
`tags`	string	Comma-separated TED topic tags
`description`	string	Talk abstract
`view_count`	number	Total view count across platforms
`thumbnail_url`	string	Talk thumbnail image URL
`canonical_url`	string	Canonical TED.com URL
`available_languages`	string	Comma-separated codes of all available translations
`transcript_plain`	string	Full transcript as plain text
`transcript_srt`	string	Transcript in SRT subtitle format
`transcript_vtt`	string	Transcript in WebVTT format
`transcript_segments`	string	JSON-serialized timed cue array: `[{start_ms, duration_ms, text, start_of_paragraph}]`
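
Because `tags`, `available_languages`, and `transcript_segments` arrive as strings, a consumer usually decodes them before use. The following is a minimal sketch, assuming records shaped like the sample above; `parse_record` is a hypothetical helper name and the record below is a trimmed copy of that sample.

```python
import json


def split_csv(value: str) -> list[str]:
    """Split a comma-separated field into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_record(record: dict) -> dict:
    """Return a copy of the record with string-encoded fields decoded."""
    parsed = dict(record)
    parsed["tags"] = split_csv(record["tags"])
    parsed["available_languages"] = split_csv(record["available_languages"])
    # transcript_segments is a JSON string, not a nested array.
    parsed["transcript_segments"] = json.loads(record["transcript_segments"])
    return parsed


record = {
    "talk_id": "66",
    "tags": "culture, education, creativity",
    "available_languages": "pt-br, el, eo, en",
    "transcript_segments": (
        '[{"start_ms":2103,"duration_ms":2575,'
        '"text":"Good morning. How are you?","start_of_paragraph":true}]'
    ),
}
parsed = parse_record(record)
```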

🔍 FAQ

How do I scrape TED talk transcripts?

TED Talks Transcript Scraper handles discovery automatically. Provide a startUrls list for specific talks or leave it empty to pull from the sitemap. Set maxItems to cap the output, then run.

How much does TED Talks Transcript Scraper cost to run?

TED Talks Transcript Scraper charges $0.003 per transcript record (one talk × one language) plus a small platform start fee. Fetching the English transcript for 100 talks costs roughly $0.30.
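
The per-record arithmetic can be checked with a short sketch. `estimate_cost` is a hypothetical helper; it uses only the $0.003 per-record price quoted above and leaves out the platform start fee, whose amount is not stated here.

```python
from decimal import Decimal

PRICE_PER_RECORD = Decimal("0.003")  # USD per record, as quoted in the answer above


def estimate_cost(talks: int, languages_per_talk: int = 1) -> Decimal:
    """Estimate the per-record charge; one record = one talk x one language."""
    records = talks * languages_per_talk
    return records * PRICE_PER_RECORD


english_only = estimate_cost(100)        # 100 talks, one language each
all_translations = estimate_cost(1, 64)  # one talk in 64 languages
```

`Decimal` is used instead of `float` so that the totals stay exact: 100 records come to $0.30, and a single talk fetched in 64 languages comes to $0.192.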

Can I get transcripts in multiple languages?

Yes. Set fetchAllLanguages: true to retrieve every translation for each talk, or pass a languages array with specific ISO 639-1 codes. A popular talk like Ken Robinson's "Do Schools Kill Creativity?" has 64 language variants.

Does TED Talks Transcript Scraper need proxies?

No. TED publishes transcripts publicly — no authentication, no Cloudflare challenge, no residential proxy required. The scraper runs on standard infrastructure at a courteous pace.

What format do the timed segments come in?

Each record includes transcript_segments as a JSON string containing an array of cue objects: {start_ms, duration_ms, text, start_of_paragraph}. Timing is in milliseconds, matching TED's source data. SRT and VTT formats are derived from the same cue data.
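
As an illustration of working with the cue objects, the sketch below rebuilds paragraphs from the `start_of_paragraph` flag and derives each cue's end time. The helper names are hypothetical, and only the first cue comes from the sample record; the other two cues have invented text and timings. Joining cue text with a space is an assumption that suits English and may not suit every language.

```python
import json


def cue_end_ms(cue: dict) -> int:
    """End time of a cue in milliseconds (start plus duration)."""
    return cue["start_ms"] + cue["duration_ms"]


def cues_to_paragraphs(cues: list[dict]) -> list[str]:
    """Group cue text into paragraphs using the start_of_paragraph flag."""
    paragraphs: list[list[str]] = []
    for cue in cues:
        # Open a new paragraph on a flagged cue, or if none is open yet.
        if cue.get("start_of_paragraph") or not paragraphs:
            paragraphs.append([])
        paragraphs[-1].append(cue["text"])
    return [" ".join(parts) for parts in paragraphs]


# transcript_segments is delivered as a JSON string, so decode it first.
transcript_segments = json.dumps([
    {"start_ms": 2103, "duration_ms": 2575, "text": "Good morning. How are you?", "start_of_paragraph": True},
    {"start_ms": 4700, "duration_ms": 1500, "text": "It's been great, hasn't it?", "start_of_paragraph": False},
    {"start_ms": 7000, "duration_ms": 2000, "text": "I've been blown away.", "start_of_paragraph": True},
])
cues = json.loads(transcript_segments)
paragraphs = cues_to_paragraphs(cues)
```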

Are transcripts available for all TED talks?

Most established talks have English transcripts. Translations depend on TED's volunteer community — popular talks often have 50+ languages, while talks published in the last few months may have none yet. The scraper logs a warning and skips talks with no available transcripts for the requested language.

Need More Features?

Need filtering by event, speaker, or topic? Custom language combinations? File an issue or get in touch.

Why Use TED Talks Transcript Scraper?

Four formats, one run — plain text, SRT, WebVTT, and timestamped JSON segments in a single record; most alternatives force you to choose one and convert the rest yourself
Multi-language by design — fetch all 64+ translations of a talk with a single flag, which is the part that makes this corpus useful for NLP alignment work
No setup required — public access, no API keys, no proxies, sitemap-driven discovery out of the box
