Coursera Transcript Scraper — Lecture Subtitles

Pricing

From $4.00 per 1,000 records returned

Extract Coursera lecture transcripts from the course's own subtitle tracks — no login, no ASR. By course slug, returns each open lecture's transcript as text, timestamped segments, and SRT/VTT, in any of 30+ available languages. Gated lectures are flagged, not faked.

Developer

Scrapers Delight

Maintained by Community

🎓 Coursera Transcript Scraper — Lecture Subtitles (TXT / SRT / VTT)

Pull the transcript of any open Coursera lecture straight from the course's own subtitle tracks — no login, no AI transcription. Give it a course slug and it returns every available lecture's transcript as clean text, timestamped segments, and ready-to-use SRT/VTT — in any of the 30+ languages Coursera provides. Enrollment-gated lectures are flagged honestly, never faked.

No speech-to-text compute — it reads Coursera's existing captions, so it's fast and cheap.


What does it do?

For each course slug you provide, it walks Coursera's public catalog API to list the lectures, then fetches each open lecture's subtitle track and returns, per lecture:

  • 📝 Full transcript (plain text) — always included
  • ⏲️ Timestamped segments: {start, end, text}
  • 🎬 SRT / VTT subtitles — drop into a video editor or LMS
  • 🌍 Language + the full list of available languages (often 30+)
  • 🔒 needs_auth flag for enrollment-gated lectures (no fake data)
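
As a rough illustration of how the timestamped segments relate to the SRT output, the sketch below turns a list of {start, end, text} segments into SRT text. It assumes start and end are seconds stored as numbers; the listing does not state the units, and the helper names srt_timestamp and segments_to_srt are hypothetical, not part of the actor.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list) -> str:
    """Build SRT text from segments shaped like {start, end, text}.

    Assumption: start/end are seconds. Each cue is a 1-based index,
    a timing line, and the cue text, separated by blank lines.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}"
        )
    return "\n\n".join(blocks) + "\n"
```

If the actor already returns srt, there is no need for this; it is only useful when you requested segments alone and want subtitles later.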

What data does it extract?

For every lecture:

  • 🏷️ course_slug, course_id, course_title
  • 🎬 lecture_id, lecture_title
  • 🌍 language, available_languages[]
  • 📄 transcript, ⏲️ segments[], srt, vtt, segment_count
  • 🔒 needs_auth, ✨ is_new (monitor), 🕒 scraped_at

Who is it for?

  • 📚 Ed-tech & course builders repurposing lecture content into notes, summaries, and study guides.
  • 🤖 AI / RAG dataset builders assembling clean, multilingual instructional text.
  • 🌍 Localization teams pulling subtitles across languages.
  • 🧑‍🎓 Learners & researchers searching lecture content as text.

How to use it (step by step)

  1. Click Try for free.
  2. Enter one or more course slugs (e.g. machine-learning) or /learn/{slug} URLs.
  3. (Optional) set a language (default en) and extra formats (srt, vtt, segments).
  4. Click Start, then open the Dataset tab to view/export.
  5. (Optional) set monitorMode + a Schedule to capture new lectures as courses update.

Quick start

{ "courseSlugs": ["machine-learning"], "language": "en", "transcriptFormats": ["txt", "srt"] }
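
The same input can be sent from code. The sketch below assumes the apify-client Python package is installed and that you have an Apify API token; the actor ID is not given in this listing, so it is passed in as a parameter and shown as a placeholder. The function names build_run_input and run_actor are hypothetical.

```python
def build_run_input(slugs, language="en", formats=("txt", "srt")):
    """Assemble the actor input shown in the quick start."""
    return {
        "courseSlugs": list(slugs),
        "language": language,
        "transcriptFormats": list(formats),
    }


def run_actor(token: str, actor_id: str, run_input: dict) -> list:
    """Start a run, wait for it to finish, and return the dataset items.

    actor_id is whatever the actor's page shows; it is not stated here.
    """
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    import os

    items = run_actor(
        os.environ["APIFY_TOKEN"],
        "<username>/<actor-name>",  # placeholder: copy the real ID from the actor page
        build_run_input(["machine-learning"]),
    )
    print(f"{len(items)} lecture records returned")
```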

Input

Field                                         What it does
courseSlugs                                   Coursera course slugs or /learn/{slug} URLs
language                                      preferred subtitle language (falls back to first available)
transcriptFormats                             txt · segments · srt · vtt
maxLecturesPerCourse                          hard cap per course (0 = all)
includeLocked                                 also list enrollment-gated lectures (flagged, no transcript)
monitorMode, alertOnNewLecture                recurring new-lecture watcher + alerts
webhookUrl, slackWebhookUrl, emailRecipients  alert channels
proxyConfiguration, requestConcurrency        proxy + parallelism
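
Since courseSlugs accepts either a bare slug or a /learn/{slug} URL, it can help to normalize your own lists before submitting them. The following is a minimal sketch of one way to do that on the client side; normalize_course_slug is a hypothetical helper, and the actor's own parsing rules may differ.

```python
from urllib.parse import urlparse


def normalize_course_slug(value: str) -> str:
    """Return the bare slug from either a slug or a /learn/{slug} URL."""
    value = value.strip()
    # Full URLs: keep only the path, dropping host, query string and fragment.
    path = urlparse(value).path if "://" in value else value
    parts = [p for p in path.split("/") if p]
    if len(parts) == 1:
        return parts[0]  # already a bare slug
    if "learn" in parts:
        idx = parts.index("learn")
        if idx + 1 < len(parts):
            return parts[idx + 1]  # the segment right after /learn/
    raise ValueError(f"cannot extract a course slug from {value!r}")
```

For example, both "machine-learning" and "https://www.coursera.org/learn/machine-learning" reduce to the same slug, which makes de-duplicating a mixed input list straightforward.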

Output

Each lecture is one dataset record (fields above). Export to JSON, CSV, Excel, HTML, or RSS, or fetch via the Apify API.
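
When processing exported records, the needs_auth flag is the natural thing to branch on. The sketch below separates lectures that carry a transcript from those that do not. The field names come from the list above; the sample records are made up for illustration, and whether gated records carry an empty or a missing transcript field is an assumption handled with .get().

```python
def split_by_access(records):
    """Separate records with a transcript from gated or empty ones."""
    with_text, without_text = [], []
    for rec in records:
        if rec.get("needs_auth") or not rec.get("transcript"):
            without_text.append(rec)
        else:
            with_text.append(rec)
    return with_text, without_text


# Illustrative records only; the values are invented.
sample = [
    {"lecture_id": "a1", "lecture_title": "Welcome",
     "needs_auth": False, "transcript": "Hello and welcome."},
    {"lecture_id": "b2", "lecture_title": "Week 2 review",
     "needs_auth": True},
]
with_text, without_text = split_by_access(sample)
print(len(with_text), "with transcript;", len(without_text), "gated or empty")
```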


How much does it cost?

Pay-per-event — and with no transcription compute, it's cheap:

Event                  What it covers                  Suggested price
lot-scraped            each lecture returned           ~$0.003 / lecture
lot-detail-enriched    each subtitle track fetched     ~$0.003 / lecture
monitor-run-completed  each scheduled watch run        ~$0.05 / run
new-lot-detected       each new lecture                ~$0.02 / lecture
alert-delivered        each Slack/email/webhook push   ~$0.005 / alert

(Final per-event prices are set on the actor's pricing page.)
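
To get a feel for the numbers, the sketch below multiplies event counts by the suggested prices listed above. It is only an estimate: the final prices are set on the pricing page, and which events fire for a given lecture is an assumption here (the example assumes one lot-scraped and one lot-detail-enriched event per lecture with a transcript). The names SUGGESTED_PRICES_USD and estimate_cost are hypothetical.

```python
# Suggested per-event prices from the listing, in USD (approximate).
SUGGESTED_PRICES_USD = {
    "lot-scraped": 0.003,
    "lot-detail-enriched": 0.003,
    "monitor-run-completed": 0.05,
    "new-lot-detected": 0.02,
    "alert-delivered": 0.005,
}


def estimate_cost(event_counts: dict) -> float:
    """Sum count * suggested price over the events in a run."""
    total = sum(
        SUGGESTED_PRICES_USD[name] * count
        for name, count in event_counts.items()
    )
    return round(total, 4)


# Assumed scenario: 100 lectures, each with one subtitle track fetched.
print(estimate_cost({"lot-scraped": 100, "lot-detail-enriched": 100}))
```

Under those assumptions, 100 lectures come to 100 × $0.003 + 100 × $0.003, or about $0.60.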


How does it work without AI transcription?

Coursera publishes a subtitle (.vtt) track for each lecture's video, in many languages. This actor uses Coursera's public onDemand catalog APIs to find the subtitle URL for each open lecture and parses it — it does not run speech-to-text, so there's no GPU/compute cost.
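
To make the parsing step concrete, here is a minimal sketch of reading a .vtt track into {start, end, text} segments. It is not the actor's code: it handles only basic cue blocks, ignores cue settings, and does not strip inline tags such as voice spans. The function name parse_vtt is hypothetical, and times are returned in seconds by assumption.

```python
import re

# Matches "HH:MM:SS.mmm --> HH:MM:SS.mmm"; the hours part is optional in WebVTT.
CUE_TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _to_seconds(h, m, s, ms):
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_vtt(vtt_text: str) -> list:
    """Parse WebVTT text into segments shaped like {start, end, text}."""
    segments = []
    normalized = vtt_text.replace("\r\n", "\n").strip()
    # Blocks (header, notes, cues) are separated by blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = CUE_TIMING.search(line)
            if not match:
                continue  # cue identifier, header, or note line
            groups = match.groups()
            # Everything after the timing line is the cue text.
            text = " ".join(t.strip() for t in lines[i + 1:] if t.strip())
            if text:
                segments.append({
                    "start": _to_seconds(*groups[:4]),
                    "end": _to_seconds(*groups[4:]),
                    "text": text,
                })
            break
    return segments


def to_plain_text(segments: list) -> str:
    """Join cue texts into one plain-text transcript."""
    return " ".join(seg["text"] for seg in segments)
```

Because this is plain text processing on an already-published caption file, the work per lecture is a download plus a regular-expression pass, which is why no transcription model is involved.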


This actor reads publicly available lecture subtitles for open / audit-accessible lectures; it cannot access enrollment-gated lectures (those are flagged needs_auth with no transcript). The content is instructional material, not personal data. Scraping public data is generally legal, but course content is copyrighted — you are responsible for your use: review Coursera's Terms of Service and respect the content owners' rights. Don't redistribute copyrighted material.


FAQ

Does it work on paid/locked lectures? No. Only open / audit-accessible lectures expose a subtitle URL. Locked lectures are reported with needs_auth=true and no transcript.

Which languages? Whatever the course provides — often 30+. Set language; it falls back to the first available.
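
The fallback rule can be pictured with a few lines of code. This is a sketch of the behaviour as described, using exact string matching; how the actor treats regional variants (for example pt versus pt-BR) is not documented here, and pick_language is a hypothetical name.

```python
from typing import Optional


def pick_language(preferred: str, available: list) -> Optional[str]:
    """Return the preferred language if offered, else the first available one."""
    if preferred in available:
        return preferred
    # No match: fall back to the first listed language, or None if there are none.
    return available[0] if available else None
```

Comparing the returned record's language field with the language you requested tells you whether a fallback happened for that lecture.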

Is there a Whisper/ASR step? No — it reads Coursera's own captions, so it's fast and cheap.

Can I get subtitles? Yes — add srt and/or vtt to transcriptFormats.

How do I export? JSON, CSV, Excel, HTML, or RSS from the Dataset tab, or via the Apify API.


Feedback

Want full-course bundling, speaker labels, or another language default? Open an issue on the actor.