
MIT OpenCourseWare Lecture Transcript Scraper

Pricing

from $4.00 per 1,000 records returned


Extract MIT OpenCourseWare video-lecture transcripts with no login and no ASR. Give it a course (it crawls every lecture) or specific lecture URLs and get the full transcript text, timestamped segments, and SRT/VTT, plus course and lecture titles. Free, Creative Commons-licensed educational content.


Rating: 0.0 (0 ratings)

Developer: Scrapers Delight

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 2 days ago


🎓 MIT OpenCourseWare Lecture Transcript Scraper

Pull MIT OpenCourseWare video-lecture transcripts with no login and no AI transcription. MIT OCW publishes transcripts (caption tracks) for its video lectures, and this actor reads them: full text, timestamped segments, and SRT/VTT, plus course and lecture titles. Give it a course (it crawls every lecture) or specific lecture URLs.

It reads OCW's own captions, so no speech-to-text compute is needed, which keeps runs fast and cheap. (MIT OCW is free, Creative Commons-licensed educational content.)


What does it do?

For each lecture (from a course crawl or direct URLs) it returns:

  • 📝 Full transcript (plain text) — always included
  • ⏲️ Timestamped segments ({start, end, text})
  • 🎬 SRT / VTT subtitles
  • 🏷️ Course title + lecture title

No ASR, no API key — it reads the published .vtt caption track.
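The actor's parsing code isn't published, but turning a WebVTT caption track into {start, end, text} segments is straightforward. A minimal Python sketch, assuming a standard WebVTT file; the parse_vtt name and the inline-tag stripping are illustrative choices, not the actor's own implementation:

```python
import re

# A cue timing line, e.g. "00:01:02.500 --> 00:01:05.000 align:start".
# WebVTT makes the hours field optional, so "01:02.500" also matches;
# any cue settings after the end time are ignored.
TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)
TAG = re.compile(r"<[^>]+>")  # inline markup such as <v Speaker> or <c>


def to_seconds(hours, minutes, secs, millis):
    """Convert captured timestamp fields to seconds (missing hours = 0)."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(secs) + int(millis) / 1000


def parse_vtt(vtt_text):
    """Split a WebVTT file into [{start, end, text}] segments."""
    segments = []
    # Blocks are separated by blank lines; the WEBVTT header and NOTE
    # blocks contain no timing line and are skipped.
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if not match:
                continue  # e.g. an optional cue identifier line
            fields = match.groups()
            text = " ".join(ln.strip() for ln in lines[i + 1:] if ln.strip())
            segments.append({
                "start": to_seconds(*fields[:4]),
                "end": to_seconds(*fields[4:]),
                "text": TAG.sub("", text).strip(),
            })
            break
    return segments
```

Multi-line cue text is joined with spaces, which suits a plain-text transcript; keep the line breaks instead if you need caption-faithful output.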


What data does it extract?

For every lecture: url, course_title, lecture_title, transcript, segments[], srt, vtt, segment_count, is_new (used by monitor mode), scraped_at.
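The srt and transcript fields can in principle be derived from segments[]. A sketch, assuming start and end are in seconds; the helper names are hypothetical and the actor's exact output formatting may differ. Note that SRT puts a comma before the milliseconds where VTT uses a dot.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render [{start, end, text}] segments as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        cues.append(f"{index}\n{timing}\n{seg['text']}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)


def segments_to_text(segments):
    """Join segment texts into one plain-text transcript."""
    return " ".join(seg["text"] for seg in segments)
```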


Who is it for?

  • 🎓 Learners & educators turning lectures into searchable notes and study guides.
  • 🤖 AI / RAG builders — rigorous, structured lecture content is excellent training/retrieval data.
  • 🌍 Localization / accessibility workflows.

How to use it (step by step)

  1. Click Try for free.
  2. Paste a course URL (https://ocw.mit.edu/courses/{slug}/) — or specific lecture URLs.
  3. (Optional) add segments, srt, and/or vtt to transcriptFormats.
  4. Click Start, then open the Dataset tab to view or export the results.
  5. (Optional) set monitorMode + a Schedule to capture lectures as courses update.
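How monitor mode decides is_new isn't documented here. One plausible approach, sketched as an assumption: keep the set of lecture URLs returned by earlier scheduled runs (persisted between runs, for example in a key-value store) and flag any URL not yet in it. flag_new_lectures is a hypothetical name.

```python
def flag_new_lectures(records, seen_urls):
    """Return (records with is_new set, updated set of seen URLs).

    Assumption: a lecture counts as new when its url was not returned
    by any earlier run. The input set is not modified.
    """
    updated = set(seen_urls)
    flagged = []
    for record in records:
        is_new = record["url"] not in updated
        flagged.append({**record, "is_new": is_new})
        updated.add(record["url"])
    return flagged, updated
```

The first scheduled run flags everything as new; later runs flag only lectures added to the course since the previous run.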

Quick start

{ "courseUrls": ["https://ocw.mit.edu/courses/6-0001-introduction-to-computer-science-and-programming-in-python-fall-2016/"], "transcriptFormats": ["txt", "srt"] }
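To run the same input from code instead of the console, Apify's run-sync-get-dataset-items endpoint starts a run and returns its dataset items in one request. A stdlib-only Python sketch; the actor ID is a placeholder (Apify actor IDs take the form username~actor-name, so copy the real one from the actor's page) and the token is your own Apify API token.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id, token, run_input):
    """Build a POST that runs the actor and returns its dataset items.

    actor_id is a placeholder such as "<username>~<actor-name>".
    """
    actor_path = urllib.parse.quote(actor_id, safe="~")
    url = f"{API_BASE}/acts/{actor_path}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(actor_id, token, run_input, timeout=300):
    """Send the request and decode the JSON list of lecture records."""
    request = build_run_request(actor_id, token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Synchronous endpoints can time out on long crawls; for large courses, starting an asynchronous run and fetching the dataset afterwards may be more robust.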

Input

Field | What it does
courseUrls | OCW course URLs (the actor crawls each course's lectures)
lectureUrls | Specific lecture resource URLs
transcriptFormats | txt · segments · srt · vtt
maxLectures | Hard cap on lectures per run (0 = all)
monitorMode, alertOnNewLecture | Recurring watcher and new-lecture alerts
webhookUrl, slackWebhookUrl, emailRecipients | Alert channels
proxyConfiguration, requestConcurrency | Proxy settings and parallelism
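A hypothetical helper that assembles and sanity-checks an input object before a run. The course-URL pattern is an assumption based on the https://ocw.mit.edu/courses/{slug}/ form from the steps above, so the actor itself may accept URLs this check rejects.

```python
import re

# Assumed shape of an OCW course URL: lowercase slug of letters, digits, hyphens.
COURSE_URL = re.compile(r"^https://ocw\.mit\.edu/courses/[a-z0-9-]+/?$")
FORMATS = {"txt", "segments", "srt", "vtt"}


def build_input(course_urls=(), lecture_urls=(), formats=("txt",), max_lectures=0):
    """Return an input dict for the actor, raising ValueError on obvious mistakes."""
    for url in course_urls:
        if not COURSE_URL.match(url):
            raise ValueError(f"not an OCW course URL: {url}")
    unknown = set(formats) - FORMATS
    if unknown:
        raise ValueError(f"unsupported transcriptFormats: {sorted(unknown)}")
    if max_lectures < 0:
        raise ValueError("maxLectures must be >= 0 (0 = all)")
    if not course_urls and not lecture_urls:
        raise ValueError("give at least one course URL or lecture URL")
    return {
        "courseUrls": list(course_urls),
        "lectureUrls": list(lecture_urls),
        "transcriptFormats": list(formats),
        "maxLectures": max_lectures,
    }
```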

Output

Each lecture is one dataset record (fields above). Export to JSON, CSV, Excel, HTML, or RSS, or fetch via the Apify API.
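A JSON export from the Dataset tab is a list of these records. A sketch that writes each lecture's srt field to its own file; the file-naming scheme and the helper names are illustrative choices, and records are skipped when srt wasn't requested.

```python
import json
import re
from pathlib import Path


def safe_filename(title):
    """Reduce a lecture title to a lowercase, hyphen-separated file stem."""
    stem = re.sub(r"[^A-Za-z0-9]+", "-", title).strip("-").lower()
    return stem or "lecture"


def write_subtitles(records, out_dir):
    """Write each record's srt field to out_dir as NNN-<title>.srt.

    NNN is the record's position in the export, which keeps lectures in
    order and avoids collisions between identical titles. Records with
    no srt value are skipped. Returns the written paths.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, record in enumerate(records, start=1):
        if not record.get("srt"):
            continue
        stem = safe_filename(record.get("lecture_title") or "")
        path = out / f"{index:03d}-{stem}.srt"
        path.write_text(record["srt"], encoding="utf-8")
        written.append(path)
    return written


def load_export(json_path):
    """Load a dataset exported as JSON (a list of lecture records)."""
    return json.loads(Path(json_path).read_text(encoding="utf-8"))
```

Usage: write_subtitles(load_export("dataset.json"), "subtitles"), where both paths are whatever you chose when exporting.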


How much does it cost?

Pay-per-event — and with no transcription compute, it's cheap:

Event | What it covers | Suggested price
lot-scraped | each lecture returned | ~$0.003 / lecture
lot-detail-enriched | each transcript fetched | ~$0.003 / lecture
monitor-run-completed | each scheduled watch run | ~$0.05 / run
new-lot-detected | each new lecture | ~$0.02 / lecture
alert-delivered | each Slack/email/webhook push | ~$0.005 / alert

(Final per-event prices are set on the actor's pricing page.)
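A rough cost estimate under the suggested prices, assuming each returned lecture triggers one lot-scraped and one lot-detail-enriched event. The headline price on the listing ($4.00 per 1,000 records) may not match these suggestions, so pass your own prices dict for real budgeting. For a 25-lecture course scraped once, the suggested prices give 25 × ($0.003 + $0.003) = $0.15.

```python
from decimal import Decimal

# Suggested per-event prices from the pricing table; placeholders only,
# since the actor's pricing page sets the final values.
SUGGESTED_PRICES = {
    "lot-scraped": Decimal("0.003"),
    "lot-detail-enriched": Decimal("0.003"),
    "monitor-run-completed": Decimal("0.05"),
    "new-lot-detected": Decimal("0.02"),
    "alert-delivered": Decimal("0.005"),
}


def estimate_cost(lectures, monitor_runs=0, new_lectures=0, alerts=0, prices=SUGGESTED_PRICES):
    """Estimate cost in USD, assuming one lot-scraped and one
    lot-detail-enriched event per lecture returned."""
    counts = {
        "lot-scraped": lectures,
        "lot-detail-enriched": lectures,
        "monitor-run-completed": monitor_runs,
        "new-lot-detected": new_lectures,
        "alert-delivered": alerts,
    }
    # Decimal avoids binary floating-point rounding in money arithmetic.
    return sum((prices[event] * n for event, n in counts.items()), Decimal("0"))
```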


MIT OpenCourseWare is published free to the public under a Creative Commons BY-NC-SA license. This actor reads those public transcripts. You must comply with the CC BY-NC-SA terms — attribute MIT OCW, non-commercial use, share-alike — and review OCW's site terms. You are responsible for your use.


FAQ

Does it crawl a whole course? Yes. Give it a course URL and it finds every video lecture and fetches its transcript.

Is there a Whisper/ASR step? No — it reads OCW's .vtt captions, so it's fast and cheap.

Can I get subtitles? Yes — add srt and/or vtt to transcriptFormats.

How do I export? JSON, CSV, Excel, HTML, or RSS from the Dataset tab, or via the Apify API.


Feedback

Want PDF-notes extraction or per-department crawling? Open an issue on the actor.