TED Talk Transcript Scraper — TXT, SRT & VTT (No Login)

Pricing

from $1.00 / 1,000 records returned

Extract any TED Talk's transcript via TED's own public API — no login, no ASR. Full text, timestamped segments & SRT/VTT in any available language, plus speaker, views, topics and TED's AI takeaway. Point it at talk URLs or a topic/speaker page. $2 per 1,000 talks.

Developer

Scrapers Delight

Maintained by Community

🎤 TED Talk Transcript Scraper — TXT, SRT, VTT

Get any TED Talk's transcript instantly — no login, no AI transcription. TED publishes a transcript for every talk in dozens of languages, and this actor reads it straight from TED's own API: full text, timestamped segments, and ready-to-use SRT/VTT — plus the speaker, view count, topics, and TED's AI takeaway. Point it at talk URLs or a whole TED topic/speaker page.

Because the transcript already exists, there's no speech-to-text compute — it's fast and cheap.


What does it do?

For each TED talk you give it (by URL or harvested from a TED page), it returns:

  • 📝 Full transcript (plain text) — always included
  • ⏲️ Timestamped segments: {start, end, text}
  • 🎬 SRT / VTT subtitles — ready for any editor
  • 🎤 Speaker, duration, view count, recorded/published dates
  • 🏷️ Topics + 💡 TED's AI takeaway headline
  • 🌍 Any available language

No ASR, no API key — it reads TED's published transcript.
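
To make the relationship between the timestamped segments and the SRT output concrete, here is a minimal sketch of how segments shaped like {start, end, text} can be rendered as SRT. It assumes start and end are expressed in seconds, which this page does not state; the function names are hypothetical and this is not the actor's own code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments shaped like {start, end, text} as SRT cues.

    Assumption: start/end are seconds. Adjust if the dataset uses milliseconds.
    """
    cues = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        cues.append(f"{index}\n{timing}\n{seg['text'].strip()}")
    # Cues are separated by a blank line; the file ends with a newline.
    return "\n\n".join(cues) + "\n"
```

If you request srt in transcriptFormats you do not need this step; it is mainly useful when you only stored segments and want subtitles later.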


What data does it extract?

For every talk:

  • 🆔 talk_id, slug, 🔗 url, 🏷️ title
  • 🎤 speaker, ⏱️ duration_sec, 👁️ views, 📅 recorded_on, published_at
  • 🏷️ topics[], 💡 takeaway_headline, 📝 description
  • 🌍 language, 📄 transcript, ⏲️ segments[], 🎬 srt, vtt, segment_count
  • is_new (monitor), 🕒 scraped_at
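
The field list above can be sketched as a typed record, which helps when loading dataset items into your own code. The field names come from the list; the Python types are assumptions (for example, that views is an integer and the dates are strings), and missing_fields is a hypothetical helper.

```python
from typing import TypedDict, get_type_hints


class Segment(TypedDict):
    start: float  # assumed to be seconds
    end: float
    text: str


class TalkRecord(TypedDict, total=False):
    # Field names are from the actor's field list; types are guesses.
    talk_id: str
    slug: str
    url: str
    title: str
    speaker: str
    duration_sec: int
    views: int
    recorded_on: str
    published_at: str
    topics: list[str]
    takeaway_headline: str
    description: str
    language: str
    transcript: str
    segments: list[Segment]
    srt: str
    vtt: str
    segment_count: int
    is_new: bool
    scraped_at: str


EXPECTED_FIELDS = tuple(get_type_hints(TalkRecord))


def missing_fields(record: dict) -> list[str]:
    """Return the documented field names that are absent from a record."""
    return [name for name in EXPECTED_FIELDS if name not in record]
```

Some fields are probably only present when requested (for example srt and vtt), so a non-empty result from missing_fields is not necessarily an error.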

Who is it for?

  • ✍️ Writers & content teams repurposing talks into articles, quotes, and summaries.
  • 🤖 AI / RAG dataset builders assembling clean, multilingual speech text.
  • 🔎 Researchers & educators searching talk content and citing passages.
  • 🌍 Localization teams pulling transcripts across languages.

How to use it (step by step)

  1. Click Try for free.
  2. Paste one or more talk URLs (e.g. https://www.ted.com/talks/{slug}) — or a TED topic/speaker page URL.
  3. (Optional) set a language and extra formats (srt, vtt, segments).
  4. Click Start, then open the Dataset tab to view/export.
  5. (Optional) set monitorMode + a pageUrl + a Schedule to capture new talks automatically.

Quick start

{ "talkUrls": ["https://www.ted.com/talks/bill_gates_the_next_outbreak_we_re_not_ready"], "transcriptFormats": ["txt", "srt"] }
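
The same input can be sent from code. The sketch below uses the official apify-client Python package (pip install apify-client); ACTOR_ID is a placeholder because this page does not give the actor's ID, and build_run_input and run_actor are hypothetical helper names.

```python
ACTOR_ID = "<username>/<actor-name>"  # placeholder: copy the real ID from the actor's page


def build_run_input(talk_urls, formats=("txt", "srt")) -> dict:
    """Build the actor input shown in the quick start."""
    return {"talkUrls": list(talk_urls), "transcriptFormats": list(formats)}


def run_actor(run_input: dict, token: str) -> list[dict]:
    """Start the actor, wait for it to finish, and return the dataset items."""
    # Imported lazily so the helper above works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A typical call would be run_actor(build_run_input(["https://www.ted.com/talks/bill_gates_the_next_outbreak_we_re_not_ready"]), token), where token is your Apify API token.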

Input

| Field | What it does |
| --- | --- |
| talkUrls | TED talk URLs / slugs |
| pageUrl | a TED topic/speaker/playlist page to harvest talk links from |
| language | transcript language code (default en) |
| transcriptFormats | txt · segments · srt · vtt |
| includeTakeaways | add topics, description, and TED's AI takeaway |
| maxTalks | hard cap per run (0 = unlimited) |
| monitorMode, alertOnNewTalk | recurring new-talk watcher + alerts |
| webhookUrl, slackWebhookUrl, emailRecipients | alert channels |
| proxyConfiguration, requestConcurrency | proxy + parallelism |
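
Before starting a run, the input can be checked client-side against the fields above. This is a minimal sketch: the rules are inferred from the field descriptions (the four format names, maxTalks as a non-negative cap, and at least one of talkUrls or pageUrl), and the actor's own validation may differ. validate_input is a hypothetical helper.

```python
ALLOWED_FORMATS = {"txt", "segments", "srt", "vtt"}


def validate_input(run_input: dict) -> list[str]:
    """Return a list of problems found in an actor input (empty means OK)."""
    problems = []
    # The actor needs something to work on: explicit talks or a page to harvest.
    if not run_input.get("talkUrls") and not run_input.get("pageUrl"):
        problems.append("provide talkUrls or pageUrl")
    unknown = set(run_input.get("transcriptFormats", [])) - ALLOWED_FORMATS
    if unknown:
        problems.append(f"unknown transcriptFormats: {sorted(unknown)}")
    max_talks = run_input.get("maxTalks", 0)  # 0 means unlimited
    if not isinstance(max_talks, int) or max_talks < 0:
        problems.append("maxTalks must be a non-negative integer")
    return problems
```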

Output

Each talk is one dataset record (fields above). Export to JSON, CSV, Excel, HTML, or RSS, or fetch via the Apify API.
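
For the API route, the Apify dataset items endpoint accepts a format query parameter. The sketch below only builds the download URL; dataset_items_url is a hypothetical helper, the dataset ID comes from your run, and a token query parameter or Authorization header is needed unless the dataset is public.

```python
from urllib.parse import urlencode

# Export names from this page mapped to the API's format values.
API_FORMATS = {"json", "csv", "xlsx", "html", "rss"}


def dataset_items_url(dataset_id: str, fmt: str = "json", fields=None) -> str:
    """Build the Apify API URL for downloading a dataset's items."""
    if fmt not in API_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    params = {"format": fmt}
    if fields:
        # Restrict the export to selected columns, e.g. talk_id and title.
        params["fields"] = ",".join(fields)
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode(params)}"
```

Excel export corresponds to fmt="xlsx".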


How much does it cost?

Pay-per-event — and with no transcription compute, it's cheap:

| Event | What it covers | Suggested price |
| --- | --- | --- |
| lot-scraped | each talk returned | ~$0.003 / talk |
| lot-detail-enriched | each transcript fetched | ~$0.003 / talk |
| monitor-run-completed | each scheduled watch run | ~$0.05 / run |
| new-lot-detected | each new talk | ~$0.02 / talk |
| alert-delivered | each Slack/email/webhook push | ~$0.005 / alert |

(Final per-event prices are set on the actor's pricing page.)
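
A rough budget can be worked out from the suggested prices in the table. The sketch below assumes that both lot-scraped and lot-detail-enriched are charged once per talk, which the table implies but does not state; estimate_cost is a hypothetical helper and the actor's pricing page remains the authority.

```python
from decimal import Decimal

# Suggested per-event prices from the table above (USD).
SUGGESTED_PRICES = {
    "lot-scraped": Decimal("0.003"),
    "lot-detail-enriched": Decimal("0.003"),
    "monitor-run-completed": Decimal("0.05"),
    "new-lot-detected": Decimal("0.02"),
    "alert-delivered": Decimal("0.005"),
}


def estimate_cost(talks: int, monitor_runs: int = 0, new_talks: int = 0, alerts: int = 0) -> Decimal:
    """Estimate a run's cost in USD, assuming two per-talk events fire for every talk."""
    p = SUGGESTED_PRICES
    per_talk = p["lot-scraped"] + p["lot-detail-enriched"]
    return (
        talks * per_talk
        + monitor_runs * p["monitor-run-completed"]
        + new_talks * p["new-lot-detected"]
        + alerts * p["alert-delivered"]
    )
```

Under these assumptions, estimate_cost(1000) comes to 6 USD, and a small monitored run with 10 talks, 1 watch run, 2 new talks, and 2 alerts comes to 0.16 USD.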


How does it work without AI transcription?

TED publishes a human/edited transcript for each talk, in many languages, and exposes it through a public API. This actor reads that existing transcript — it does not run speech-to-text, so there's no GPU/compute cost and results are instant.


TED talks and their transcripts are published publicly, and TED talks are generally released under a Creative Commons BY-NC-ND license. The output is talk content and public stats, not personal data. Scraping public data is generally legal, but you are responsible for how you use it: review TED's Terms of Service and the talks' Creative Commons license, and attribute and limit redistribution accordingly.


FAQ

Which languages? Whatever TED offers for the talk (often dozens). Set language; talks without that language are flagged.

Is there a Whisper/ASR step? No — it reads TED's own transcript, so it's fast and cheap.

Can I get subtitles? Yes — add srt and/or vtt to transcriptFormats.
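
Once a record carries srt and vtt fields, saving them as subtitle files is a few lines. This is a minimal sketch that assumes both fields hold the complete subtitle text as strings; write_subtitles and the slug.language.ext naming scheme are hypothetical choices, not something the actor does.

```python
from pathlib import Path


def write_subtitles(record: dict, out_dir: str) -> list[Path]:
    """Write a record's srt/vtt fields to files named <slug>.<language>.<ext>."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for ext in ("srt", "vtt"):
        body = record.get(ext)
        if not body:
            continue  # format was not requested or is unavailable for this talk
        path = out / f"{record['slug']}.{record.get('language', 'en')}.{ext}"
        path.write_text(body, encoding="utf-8")
        written.append(path)
    return written
```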

Can I grab a whole topic or speaker's talks? Yes — set pageUrl to a TED topic/speaker page and the actor harvests the talk links. Add monitorMode to catch new talks.
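
If you prefer to handle new-talk notifications yourself rather than use the built-in alert channels, the is_new field can be filtered downstream. The sketch assumes is_new is a boolean set on records from monitor runs; new_talks and format_alert are hypothetical helpers.

```python
def new_talks(records: list[dict]) -> list[dict]:
    """Keep only the records a monitor run flagged as new."""
    return [r for r in records if r.get("is_new") is True]


def format_alert(record: dict) -> str:
    """Build a one-line message for a newly detected talk."""
    speaker = record.get("speaker", "unknown speaker")
    return f"New TED talk: {record['title']} ({speaker}) {record['url']}"
```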

How do I export? JSON, CSV, Excel, HTML, or RSS from the Dataset tab, or via the Apify API.


Feedback

Want full-playlist crawling, speaker bios, or another language default? Open an issue on the actor.