MIT OpenCourseWare Lecture Transcript Scraper
Pricing
from $4.00 per 1,000 records returned
Extract MIT OpenCourseWare video-lecture transcripts — no login, no ASR. Give it a course (crawls every lecture) or specific lecture URLs and get the full transcript text, timestamped segments, and SRT/VTT, plus course and lecture titles. Free, Creative-Commons educational content.
Developer: Scrapers Delight (maintained by Community)
Last modified: 2 days ago
🎓 MIT OpenCourseWare Lecture Transcript Scraper
Pull MIT OpenCourseWare video-lecture transcripts — no login, no AI transcription. MIT OCW publishes a transcript for every lecture, and this actor reads it: full text, timestamped segments, and SRT/VTT, plus course and lecture titles. Give it a course (it crawls every lecture) or specific lecture URLs.
It reads OCW's own captions, so there's no speech-to-text compute — fast and cheap. (MIT OCW is free, Creative-Commons educational content.)
What does it do?
For each lecture (from a course crawl or direct URLs) it returns:
- 📝 Full transcript (plain text) — always included
- ⏲️ Timestamped segments ({start, end, text})
- 🎬 SRT / VTT subtitles
- 🏷️ Course title + lecture title
No ASR, no API key — it reads the published .vtt caption track.
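Because the caption track is plain WebVTT text, turning it into `{start, end, text}` segments only needs a small parser. The sketch below is an illustration, not the actor's code: it assumes cue timings of the form `HH:MM:SS.mmm --> HH:MM:SS.mmm` (or `MM:SS.mmm`), ignores cue settings and NOTE/STYLE blocks, joins multi-line cue text with spaces, and returns times in seconds (the units the actor uses in `segments` are an assumption).

```python
import re
from typing import TypedDict


class Segment(TypedDict):
    start: float  # seconds (assumed unit)
    end: float    # seconds (assumed unit)
    text: str


# Matches "HH:MM:SS.mmm" or "MM:SS.mmm" WebVTT timestamps.
_TS = r"(?:\d+:)?\d{2}:\d{2}\.\d{3}"
_CUE_TIMING = re.compile(rf"^({_TS})\s+-->\s+({_TS})")


def vtt_time_to_seconds(ts: str) -> float:
    """Convert a WebVTT timestamp to seconds."""
    parts = ts.split(":")
    seconds = float(parts[-1])
    minutes = int(parts[-2])
    hours = int(parts[-3]) if len(parts) == 3 else 0
    return hours * 3600 + minutes * 60 + seconds


def parse_vtt(vtt: str) -> list[Segment]:
    """Parse WebVTT text into {start, end, text} segments (minimal sketch)."""
    segments: list[Segment] = []
    # Blocks are separated by blank lines; the header block has no timing line.
    for block in re.split(r"\n\s*\n", vtt.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            m = _CUE_TIMING.match(line.strip())
            if m:
                # The cue payload is every line after the timing line.
                text = " ".join(ln.strip() for ln in lines[i + 1:]).strip()
                segments.append({
                    "start": vtt_time_to_seconds(m.group(1)),
                    "end": vtt_time_to_seconds(m.group(2)),
                    "text": text,
                })
                break
    return segments
```

Optional cue identifiers (like a leading `1`) are skipped because only the line matching the timing pattern starts a cue.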
What data does it extract?
For every lecture: url, course_title, lecture_title, transcript, segments[], srt, vtt, segment_count, is_new (monitor), scraped_at.
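For downstream code it can help to pin those fields to types. The sketch below is one reading of the record shape; the field types are assumptions, not documented behavior (for example `scraped_at` as an ISO-8601 string, `segments`/`srt`/`vtt` possibly missing unless requested, and `segment_count` equal to `len(segments)`). `check_record` is a hypothetical helper, not part of the actor.

```python
from typing import Optional, TypedDict


class Segment(TypedDict):
    start: float
    end: float
    text: str


class LectureRecord(TypedDict):
    """Assumed shape of one dataset record (types are not documented)."""
    url: str
    course_title: str
    lecture_title: str
    transcript: str                    # always included
    segments: Optional[list[Segment]]  # assumed absent unless requested
    srt: Optional[str]
    vtt: Optional[str]
    segment_count: int
    is_new: bool                       # set in monitor mode
    scraped_at: str                    # assumed ISO-8601 timestamp


def check_record(rec: dict) -> list[str]:
    """Return problems found in one record; an empty list means it looks OK."""
    problems = []
    for key in ("url", "course_title", "lecture_title", "transcript"):
        if not isinstance(rec.get(key), str) or not rec[key]:
            problems.append(f"missing or empty {key}")
    segs = rec.get("segments")
    # Assumption: segment_count counts the entries in segments.
    if segs is not None and rec.get("segment_count") != len(segs):
        problems.append("segment_count does not match len(segments)")
    return problems
```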
Who is it for?
- 🎓 Learners & educators turning lectures into searchable notes and study guides.
- 🤖 AI / RAG builders — rigorous, structured lecture content is excellent training/retrieval data.
- 🌍 Localization / accessibility workflows.
How to use it (step by step)
- Click Try for free.
- Paste a course URL (https://ocw.mit.edu/courses/{slug}/) or specific lecture URLs.
- (Optional) add srt/vtt/segments to transcriptFormats.
- Click Start, then open the Dataset tab to view/export.
- (Optional) set monitorMode + a Schedule to capture lectures as courses update.
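Monitor mode marks lectures the watcher has not seen before with `is_new`. How the actor tracks this internally isn't described here; the sketch below shows one plausible approach, assuming a lecture is identified by its `url` and the URLs from earlier runs are kept in a set. `mark_new` is a hypothetical name.

```python
def mark_new(records: list[dict], seen_urls: set[str]) -> list[dict]:
    """Return copies of records with is_new set, and record their URLs as seen.

    Illustration only: assumes a lecture is identified by its URL, which
    is not confirmed as the actor's actual monitor logic.
    """
    out = []
    for rec in records:
        url = rec["url"]
        out.append({**rec, "is_new": url not in seen_urls})
        seen_urls.add(url)
    return out
```

For this to work across scheduled runs, `seen_urls` would have to be persisted between runs rather than rebuilt each time.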
Quick start
{ "courseUrls": ["https://ocw.mit.edu/courses/6-0001-introduction-to-computer-science-and-programming-in-python-fall-2016/"], "transcriptFormats": ["txt", "srt"] }
Input
| Field | What it does |
|---|---|
| courseUrls | OCW course URLs (crawls each course's lectures) |
| lectureUrls | specific lecture resource URLs |
| transcriptFormats | txt · segments · srt · vtt |
| maxLectures | hard cap per run (0 = all) |
| monitorMode, alertOnNewLecture | recurring watcher + alerts |
| webhookUrl, slackWebhookUrl, emailRecipients | alert channels |
| proxyConfiguration, requestConcurrency | proxy + parallelism |
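If you build inputs programmatically, a quick local check against the fields in the table can catch typos before a paid run. This is not the actor's input schema: it assumes at least one of `courseUrls`/`lectureUrls` is required, that course URLs follow the `https://ocw.mit.edu/courses/{slug}/` pattern with lowercase slugs, and that `maxLectures` is a non-negative integer. `validate_input` is hypothetical.

```python
import re

ALLOWED_FORMATS = {"txt", "segments", "srt", "vtt"}
# Assumed pattern: https://ocw.mit.edu/courses/{slug}/
COURSE_URL = re.compile(r"^https://ocw\.mit\.edu/courses/[a-z0-9-]+/?$")


def validate_input(run_input: dict) -> list[str]:
    """Return human-readable problems with a run input (empty list = OK)."""
    errors = []
    courses = run_input.get("courseUrls", [])
    lectures = run_input.get("lectureUrls", [])
    if not courses and not lectures:
        errors.append("provide courseUrls or lectureUrls")
    for url in courses:
        if not COURSE_URL.match(url):
            errors.append(f"not an OCW course URL: {url}")
    for fmt in run_input.get("transcriptFormats", ["txt"]):
        if fmt not in ALLOWED_FORMATS:
            errors.append(f"unknown transcript format: {fmt}")
    max_lectures = run_input.get("maxLectures", 0)
    if not isinstance(max_lectures, int) or max_lectures < 0:
        errors.append("maxLectures must be an integer >= 0 (0 = all)")
    return errors
```

Run against the Quick start input, it should return an empty list.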
Output
Each lecture is one dataset record (fields above). Export to JSON, CSV, Excel, HTML, or RSS, or fetch via the Apify API.
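For the API route, Apify's `POST /v2/acts/{actorId}/run-sync-get-dataset-items` endpoint starts a run with a JSON input and returns the run's dataset items in the response. The sketch below only builds that request with the standard library. The actor ID is not given in this listing, so `actor_id` (in `username~actor-name` form) and the token are placeholders you must supply; long course crawls may exceed the synchronous endpoint's timeout, in which case an asynchronous run is the safer choice.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build a POST that runs the actor and returns its dataset items.

    actor_id is a placeholder such as "username~actor-name"; it is not
    stated in this listing.
    """
    # Percent-encode the ID for the URL path, leaving "~" as-is.
    path = urllib.parse.quote(actor_id, safe="~")
    url = f"{API_BASE}/acts/{path}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` and calling `json.load` on the response should yield a list of lecture records with the fields described above.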
How much does it cost?
Pay-per-event — and with no transcription compute, it's cheap:
| Event | What it covers | Suggested price |
|---|---|---|
| lot-scraped | each lecture returned | ~$0.003 / lecture |
| lot-detail-enriched | each transcript fetched | ~$0.003 / lecture |
| monitor-run-completed | each scheduled watch run | ~$0.05 / run |
| new-lot-detected | each new lecture | ~$0.02 / lecture |
| alert-delivered | each Slack/email/webhook push | ~$0.005 / alert |
(Final per-event prices are set on the actor's pricing page.)
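As a rough worked example using only the suggested prices above (final prices may differ), a one-off run fires `lot-scraped` and `lot-detail-enriched` once per lecture, about $0.006 per lecture, so a hypothetical 24-lecture course comes to roughly 24 × $0.006 ≈ $0.14. The sketch assumes each returned lecture fires both events exactly once and that a one-off run incurs no monitor or alert events.

```python
# Suggested per-event prices from the table above (USD); final prices may differ.
PRICES = {
    "lot-scraped": 0.003,
    "lot-detail-enriched": 0.003,
    "monitor-run-completed": 0.05,
    "new-lot-detected": 0.02,
    "alert-delivered": 0.005,
}


def estimate_cost(lectures: int, monitor_runs: int = 0,
                  new_lectures: int = 0, alerts: int = 0) -> float:
    """Estimate a bill in USD from event counts.

    Assumes every returned lecture fires lot-scraped and
    lot-detail-enriched exactly once.
    """
    total = (
        lectures * (PRICES["lot-scraped"] + PRICES["lot-detail-enriched"])
        + monitor_runs * PRICES["monitor-run-completed"]
        + new_lectures * PRICES["new-lot-detected"]
        + alerts * PRICES["alert-delivered"]
    )
    return round(total, 4)
```

For a monitored course, add the per-run and per-alert events: four scheduled runs that find two new lectures and send two alerts add 4 × $0.05 + 2 × $0.02 + 2 × $0.005 = $0.25 on top of the per-lecture events.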
Is it legal to scrape OCW transcripts?
MIT OpenCourseWare is published free to the public under a Creative Commons BY-NC-SA license. This actor reads those public transcripts. You must comply with the CC BY-NC-SA terms — attribute MIT OCW, non-commercial use, share-alike — and review OCW's site terms. You are responsible for your use.
FAQ
Does it crawl a whole course?
Yes. Give it a course URL and it finds every video lecture and fetches its published transcript.
Is there a Whisper/ASR step?
No — it reads OCW's .vtt captions, so it's fast and cheap.
Can I get subtitles?
Yes — add srt and/or vtt to transcriptFormats.
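The actor produces the SRT and VTT strings for you. If you only requested `segments` and want subtitles later, the conversion is mechanical: SRT numbers each cue and writes times as `HH:MM:SS,mmm`, with a comma before the milliseconds where WebVTT uses a period. The sketch assumes segment times are in seconds, which this listing does not state.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render {start, end, text} segments as numbered SRT cues."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text']}\n"
        )
    # SRT cues are separated by a blank line.
    return "\n".join(cues)
```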
How do I export?
JSON, CSV, Excel, HTML, or RSS from the Dataset tab, or via the Apify API.
Feedback
Want PDF-notes extraction or per-department crawling? Open an issue on the actor.