Coursera Transcript Scraper — Lecture Subtitles
Pricing
from $4.00 / 1,000 records returned
Extract Coursera lecture transcripts from the course's own subtitle tracks — no login, no ASR. By course slug, returns each open lecture's transcript as text, timestamped segments, and SRT/VTT, in any of 30+ available languages. Gated lectures are flagged, not faked.
Developer
Scrapers Delight
Maintained by Community
🎓 Coursera Transcript Scraper — Lecture Subtitles (TXT / SRT / VTT)
Pull the transcript of any open Coursera lecture straight from the course's own subtitle tracks — no login, no AI transcription. Give it a course slug and it returns every available lecture's transcript as clean text, timestamped segments, and ready-to-use SRT/VTT — in any of the 30+ languages Coursera provides. Enrollment-gated lectures are flagged honestly, never faked.
No speech-to-text compute — it reads Coursera's existing captions, so it's fast and cheap.
What does it do?
For each course slug you provide, it walks Coursera's public catalog API to list the lectures, then fetches each open lecture's subtitle track and returns, per lecture:
- 📝 Full transcript (plain text) — always included
- ⏲️ Timestamped segments: {start, end, text}
- 🎬 SRT / VTT subtitles: drop into a video editor or LMS
- 🌍 Language + the full list of available languages (often 30+)
- 🔒 needs_auth flag for enrollment-gated lectures (no fake data)
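To illustrate how the timestamped segments relate to the SRT output, here is a minimal sketch that renders {start, end, text} segments as SRT cues. It assumes start and end are numbers in seconds, which the listing does not state, and the helper names srt_timestamp and segments_to_srt are hypothetical. It is not the actor's own code.

```python
from typing import Iterable, Mapping


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Render {start, end, text} segments as numbered SRT cues.

    Assumption: start and end are seconds (int or float).
    """
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

If the srt format is requested in transcriptFormats, the actor already returns this string, so a helper like this is only useful when you store segments and want to regenerate subtitles later.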
What data does it extract?
For every lecture:
- 🏷️ course_slug, course_id, course_title
- 🎬 lecture_id, lecture_title
- 🌍 language, available_languages[]
- 📄 transcript, ⏲️ segments[], srt, vtt, segment_count
- 🔒 needs_auth, ✨ is_new (monitor), 🕒 scraped_at
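As a sketch of how these fields might be consumed, the snippet below separates open lectures from enrollment-gated ones using the needs_auth flag. The field names come from the list above; the function name split_by_access and the assumption that needs_auth is a boolean are illustrative.

```python
from typing import Iterable


def split_by_access(records: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Return (open_records, gated_records) based on the needs_auth flag.

    Assumption: needs_auth is a boolean; a missing flag is treated as open.
    """
    open_records: list[dict] = []
    gated_records: list[dict] = []
    for record in records:
        if record.get("needs_auth"):
            # Gated lectures are flagged and carry no transcript.
            gated_records.append(record)
        else:
            open_records.append(record)
    return open_records, gated_records
```

Downstream steps such as summarization or indexing can then run on the open records only, while the gated list shows which lectures were skipped.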
Who is it for?
- 📚 Ed-tech & course builders repurposing lecture content into notes, summaries, and study guides.
- 🤖 AI / RAG dataset builders assembling clean, multilingual instructional text.
- 🌍 Localization teams pulling subtitles across languages.
- 🧑‍🎓 Learners & researchers searching lecture content as text.
How to use it (step by step)
- Click Try for free.
- Enter one or more course slugs (e.g. machine-learning) or /learn/{slug} URLs.
- (Optional) set a language (default en) and extra formats (srt, vtt, segments).
- Click Start, then open the Dataset tab to view/export.
- (Optional) set monitorMode + a Schedule to capture new lectures as courses update.
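The actor accepts both bare slugs and /learn/{slug} URLs, so no preprocessing is required. If you want to deduplicate or log inputs on your side, the following sketch shows one way to reduce either form to a slug. The function name to_course_slug is hypothetical and the parsing rules are an assumption, not the actor's own logic.

```python
from urllib.parse import urlparse


def to_course_slug(value: str) -> str:
    """Accept a bare slug or a /learn/{slug} URL or path and return the slug."""
    value = value.strip()
    # Full URLs are reduced to their path; bare slugs and paths are used as-is.
    path = urlparse(value).path if "://" in value else value
    parts = [part for part in path.split("/") if part]
    if "learn" in parts:
        idx = parts.index("learn")
        if idx + 1 < len(parts):
            return parts[idx + 1]
        raise ValueError(f"no slug after /learn/ in {value!r}")
    if len(parts) == 1:
        return parts[0]
    raise ValueError(f"cannot extract a course slug from {value!r}")
```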
Quick start
{ "courseSlugs": ["machine-learning"], "language": "en", "transcriptFormats": ["txt", "srt"] }
Input
| Field | What it does |
|---|---|
| courseSlugs | Coursera course slugs or /learn/{slug} URLs |
| language | preferred subtitle language (falls back to first available) |
| transcriptFormats | txt · segments · srt · vtt |
| maxLecturesPerCourse | hard cap per course (0 = all) |
| includeLocked | also list enrollment-gated lectures (flagged, no transcript) |
| monitorMode, alertOnNewLecture | recurring new-lecture watcher + alerts |
| webhookUrl, slackWebhookUrl, emailRecipients | alert channels |
| proxyConfiguration, requestConcurrency | proxy + parallelism |
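A small sketch of assembling the input object in code, with basic validation before a run is started. The field names are taken from the table; the value types (an integer for maxLecturesPerCourse, a boolean for includeLocked) and the helper name build_input are assumptions.

```python
ALLOWED_FORMATS = {"txt", "segments", "srt", "vtt"}


def build_input(
    course_slugs: list[str],
    language: str = "en",
    formats: tuple[str, ...] = ("txt",),
    max_lectures: int = 0,
    include_locked: bool = False,
) -> dict:
    """Build an input dict for the actor, rejecting obviously invalid values."""
    if not course_slugs:
        raise ValueError("at least one course slug is required")
    unknown = set(formats) - ALLOWED_FORMATS
    if unknown:
        raise ValueError(f"unknown transcript formats: {sorted(unknown)}")
    if max_lectures < 0:
        raise ValueError("max_lectures must be 0 (all) or a positive cap")
    return {
        "courseSlugs": list(course_slugs),
        "language": language,
        "transcriptFormats": list(formats),
        "maxLecturesPerCourse": max_lectures,  # 0 = all lectures
        "includeLocked": include_locked,
    }
```

The resulting dict can be serialized with json.dumps and pasted into the actor's input editor, or passed as the run input when starting the actor through the Apify API.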
Output
Each lecture is one dataset record (fields above). Export to JSON, CSV, Excel, HTML, or RSS, or fetch via the Apify API.
How much does it cost?
Pay-per-event — and with no transcription compute, it's cheap:
| Event | What it covers | Suggested price |
|---|---|---|
| lot-scraped | each lecture returned | ~$0.003 / lecture |
| lot-detail-enriched | each subtitle track fetched | ~$0.003 / lecture |
| monitor-run-completed | each scheduled watch run | ~$0.05 / run |
| new-lot-detected | each new lecture | ~$0.02 / lecture |
| alert-delivered | each Slack/email/webhook push | ~$0.005 / alert |
(Final per-event prices are set on the actor's pricing page.)
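For a rough budget, the suggested prices can be combined into a simple estimate. This sketch assumes that events are charged independently and add up linearly, which the listing does not confirm, and it uses the suggested figures from the table rather than the final prices. The function name estimate_cost is hypothetical.

```python
from decimal import Decimal

# Suggested per-event prices in USD, copied from the table above.
SUGGESTED_PRICES = {
    "lot-scraped": Decimal("0.003"),
    "lot-detail-enriched": Decimal("0.003"),
    "monitor-run-completed": Decimal("0.05"),
    "new-lot-detected": Decimal("0.02"),
    "alert-delivered": Decimal("0.005"),
}


def estimate_cost(event_counts: dict[str, int]) -> Decimal:
    """Sum price * count over all events (assumes simple linear billing)."""
    total = Decimal("0")
    for name, count in event_counts.items():
        if count < 0:
            raise ValueError(f"negative count for {name!r}")
        # An unknown event name raises KeyError instead of being ignored.
        total += SUGGESTED_PRICES[name] * count
    return total
```

Under these assumptions, a run that returns 100 lectures and fetches subtitle tracks for 80 of them, `estimate_cost({"lot-scraped": 100, "lot-detail-enriched": 80})`, comes to about $0.54.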
How does it work without AI transcription?
Coursera publishes a subtitle (.vtt) track for each lecture's video, in many languages. This actor uses Coursera's public onDemand catalog APIs to find the subtitle URL for each open lecture and parses it — it does not run speech-to-text, so there's no GPU/compute cost.
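The parsing step can be pictured with a minimal WebVTT reader that turns cue blocks into {start, end, text} segments. This is a sketch of the general technique, not the actor's implementation: it handles standard cue timing lines, skips the header and NOTE blocks, and does not strip inline cue tags. The name parse_vtt is hypothetical.

```python
import re

# A cue timing line, e.g. "00:00:01.000 --> 00:00:04.200" (hours are optional).
CUE_TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _seconds(hours, minutes, secs, millis) -> float:
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(secs) + int(millis) / 1000


def parse_vtt(vtt_text: str) -> list[dict]:
    """Parse WebVTT text into a list of {start, end, text} segments."""
    segments = []
    normalized = vtt_text.replace("\r\n", "\n").strip()
    # Blocks (header, notes, cues) are separated by blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = CUE_TIMING.match(line.strip())
            if match:
                groups = match.groups()
                # Everything after the timing line is the cue payload.
                text = " ".join(part.strip() for part in lines[i + 1:] if part.strip())
                segments.append({
                    "start": _seconds(*groups[:4]),
                    "end": _seconds(*groups[4:]),
                    "text": text,
                })
                break  # at most one timing line per block
    return segments
```

Joining the text of every segment with spaces yields a plain transcript, which is why no speech-to-text step is needed.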
Is it legal to scrape Coursera transcripts?
This actor reads publicly available lecture subtitles for open / audit-accessible lectures; it cannot access enrollment-gated lectures (those are flagged needs_auth with no transcript). The content is instructional material, not personal data. Scraping public data is generally legal, but course content is copyrighted — you are responsible for your use: review Coursera's Terms of Service and respect the content owners' rights. Don't redistribute copyrighted material.
FAQ
Does it work on paid/locked lectures?
No. Only open / audit-accessible lectures expose a subtitle URL. Locked lectures are reported with needs_auth=true and no transcript.
Which languages?
Whatever the course provides, often 30+. Set language to the one you prefer; if the course does not offer it, the actor falls back to the first available language.
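The fallback rule can be expressed in a few lines. This sketch assumes an exact match on the language code and that "first available" means the first entry of available_languages; the actor's actual matching rules are not documented here, and pick_language is a hypothetical name.

```python
from typing import Optional


def pick_language(preferred: str, available: list[str]) -> Optional[str]:
    """Return the preferred language if offered, else the first available one."""
    if preferred in available:
        return preferred
    # Fall back to the first listed language; None if there are no subtitles.
    return available[0] if available else None
```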
Is there a Whisper/ASR step?
No. It reads Coursera's own captions, so it's fast and cheap.
Can I get subtitles?
Yes — add srt and/or vtt to transcriptFormats.
How do I export?
JSON, CSV, Excel, HTML, or RSS from the Dataset tab, or via the Apify API.
Feedback
Want full-course bundling, speaker labels, or another language default? Open an issue on the actor.