Pricing

from $0.50 / 1,000 transcriptions

YouTube Transcript Scraper AI

Extract YouTube transcripts with AI-powered fallback when captions are unavailable. Enter a URL or search query, get clean timestamped JSON with segments and word-level timings. Ideal for content repurposing, LLM training data, and video accessibility workflows.

Rating: 5.0 (1)

Developer: Epic Scrapers

Last modified: a month ago

YouTube Transcript Scraper ⭐

Extract transcripts, captions, and AI-powered transcriptions from any YouTube video. When YouTube has no captions available, this scraper automatically falls back to AI speech-to-text, so you never come back empty-handed.

What Makes This Different

Most YouTube transcript scrapers can only extract existing captions — if a video has no caption track, they return an error. This scraper is different.

It uses a three-tier approach:

YouTube captions (manual or auto-generated) — extracted via yt-dlp with full subtitle parsing
AI transcription — if no captions exist, the audio is downloaded and transcribed using an enterprise-grade speech-to-text engine
Speaker diarization — optionally identifies who said what, with per-utterance labels and timestamps

This means you can extract transcripts from any public YouTube video — including podcasts, interviews, lectures, live streams, and music videos — regardless of whether the uploader enabled captions.

🚀 Features

Extraction & Fallback

Automatic mode — tries YouTube captions first, falls back to AI transcription if none found. Set it and forget it.
Captions-only mode — strict mode that only returns existing YouTube captions, never uses AI
AI-only mode — always transcribes via AI, even when captions are available (useful for comparing quality)
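
The three modes above reduce to a small decision rule. The following is a minimal sketch of that rule, not the actor's actual code: the function name `resolve_source` is hypothetical, and the mode strings and return values are taken from the `transcriptionMode` input and the `source` output field documented on this page.

```python
MODES = ("auto", "ai_only", "captions_only")


def resolve_source(mode: str, captions_available: bool) -> str:
    """Return the `source` value a video would be expected to get.

    Hypothetical helper: it only mirrors the mode descriptions above.
    """
    if mode not in MODES:
        raise ValueError(f"unknown transcriptionMode: {mode!r}")
    if mode == "ai_only":
        # AI-only mode transcribes even when captions exist.
        return "ai_transcription"
    if captions_available:
        # Both auto and captions_only use an existing caption track.
        return "captions"
    # No captions: strict mode gives up, auto falls back to AI.
    return "none" if mode == "captions_only" else "ai_transcription"


print(resolve_source("auto", captions_available=False))
```

The sketch ignores failures; a video that cannot be downloaded or transcribed would end up with `source` set to `error` instead.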

AI Transcription

Enterprise-grade AI — powered by a state-of-the-art speech-to-text engine with industry-leading accuracy
Speaker diarization — identifies up to 10 unique speakers with per-utterance labels and precise timestamps. Perfect for interviews, panel discussions, and podcasts
Multi-language translation — translate transcripts into any target language (e.g. es, fr, de, ja) while preserving utterance structure

Technical

yt-dlp + Deno — uses yt-dlp with Deno-based JS challenge solving for reliable YouTube access, bypassing the limitations of the InnerTube API
Residential proxy support — uses Apify residential proxies to avoid geo-blocking and rate limiting
Bulk processing — pass any number of YouTube video URLs in a single run
No YouTube Data API key required — no quotas, no OAuth, no API key management
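
To illustrate the caption lookup that yt-dlp makes possible, here is a minimal sketch that picks a caption track from a yt-dlp metadata dictionary. It assumes `info` has the shape returned by yt-dlp's `YoutubeDL.extract_info(url, download=False)`, where `subtitles` and `automatic_captions` map language codes to lists of track dictionaries. The helper name `pick_caption_track` and the choice to prefer manual captions over auto-generated ones are assumptions, not documented behavior of this actor.

```python
from typing import Optional


def pick_caption_track(info: dict, lang: str = "en", ext: str = "vtt") -> Optional[dict]:
    """Pick a caption track, trying manual captions before auto-generated ones."""
    for kind in ("subtitles", "automatic_captions"):
        tracks = (info.get(kind) or {}).get(lang) or []
        for track in tracks:
            if track.get("ext") == ext:
                return {"kind": kind, "lang": lang, "url": track["url"]}
    # No matching track: a caller in auto mode would fall back to AI here.
    return None


# Hand-written stand-in for a yt-dlp info dict (no network access needed).
example_info = {
    "subtitles": {},
    "automatic_captions": {
        "en": [
            {"ext": "json3", "url": "https://example.invalid/en.json3"},
            {"ext": "vtt", "url": "https://example.invalid/en.vtt"},
        ]
    },
}
print(pick_caption_track(example_info))
```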

📋 What You Get

Every scraped video returns structured data with 18+ fields per result. The main fields are:

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| `url` | `string` | Full YouTube video URL | `https://www.youtube.com/watch?v=dQw4w9WgXcQ` |
| `source` | `string` | Source of the transcript: `captions`, `ai_transcription`, `error`, or `none` | `captions` |
| `transcript` | `string \| null` | Full plain-text transcript when source is `captions` | `"We're no strangers to love..."` |
| `transcription` | `object \| null` | Full AI transcription result object when source is `ai_transcription` | `{ full_transcript: "...", utterances: [...] }` |
| `transcription.full_transcript` | `string \| null` | Complete concatenated transcript from AI | `"Welcome to today's lecture on..."` |
| `transcription.utterances` | `array \| null` | Array of per-utterance segments with speaker labels and timestamps | `[{ "start": 0.5, "end": 3.2, "text": "...", "speaker": "SPEAKER_00" }]` |
| `transcription.languages` | `array \| null` | Detected languages and confidence scores | `[{ "language": "en", "confidence": 0.98 }]` |
| `transcription.translated_transcripts` | `object \| null` | Translated transcripts keyed by target language | `{ "es": { "full_transcript": "...", "utterances": [...] } }` |
| `error` | `string \| null` | Error message if extraction failed | `"No captions available"` |

Status Values

| `source` | Meaning |
| --- | --- |
| `captions` | Transcript extracted from YouTube's caption track |
| `ai_transcription` | Transcript generated by AI speech-to-text from downloaded audio |
| `error` | Extraction failed |
| `none` | Captions-only mode with no captions available |
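
Because the transcript lives in a different field depending on `source`, downstream code usually branches on that value first. The following is a minimal sketch of such a branch; the helper name `transcript_text` is hypothetical, and the field names follow the tables above.

```python
from typing import Optional


def transcript_text(item: dict) -> Optional[str]:
    """Return the plain-text transcript of one dataset item, if it has one."""
    source = item.get("source")
    if source == "captions":
        return item.get("transcript")
    if source == "ai_transcription":
        return (item.get("transcription") or {}).get("full_transcript")
    if source == "error":
        # Surface the actor's error message instead of silently skipping.
        raise RuntimeError(item.get("error") or "extraction failed")
    # "none" (captions-only mode, no captions) or an unrecognized value.
    return None


item = {"url": "https://youtu.be/jNQXAC9IVRw", "source": "none"}
print(transcript_text(item))
```

Raising on `error` is a design choice for this sketch; a batch job might prefer to log the message and continue.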

❓ Frequently Asked Questions

How do I use this actor?

1. Open the actor on Apify Console
2. Paste one or more YouTube video URLs into the urlList field
3. Select your preferred transcription mode (Auto, AI Only, or Captions Only)
4. (Optional) Enable speaker diarization or translation
5. Click Run and wait for the results
6. Export the dataset as JSON, CSV, Excel, or HTML

What if a video has no captions?

In Auto mode (the default), the actor will download the video's audio track and transcribe it using AI. You'll get back a full transcript with timestamps — just as if captions were available. In Captions Only mode, it will skip videos without captions. In AI Only mode, it always uses AI transcription regardless.

Does this work with YouTube Shorts and live streams?

Yes. The actor accepts any standard YouTube URL format: youtube.com/watch?v=..., youtu.be/..., and youtube.com/shorts/.... Completed live streams (VODs) are fully supported. Streams that are still live are not supported.
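
If you want to deduplicate or pre-check URLs before a run, the three URL shapes listed above can be normalized to a video ID on the client side. This is a minimal sketch using only the standard library; `video_id` is a hypothetical helper and says nothing about how the actor itself parses URLs.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def video_id(url: str) -> Optional[str]:
    """Extract the video ID from a watch, youtu.be, or Shorts URL."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    segments = [s for s in parts.path.split("/") if s]
    if host == "youtu.be":
        return segments[0] if segments else None
    if host in ("youtube.com", "m.youtube.com"):
        if parts.path == "/watch":
            return parse_qs(parts.query).get("v", [None])[0]
        if len(segments) >= 2 and segments[0] == "shorts":
            return segments[1]
    # Anything else is treated as unsupported by this sketch.
    return None


print(video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))
```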

What languages are supported?

For captions extraction, any language that YouTube provides captions for. For AI transcription, the engine supports 100+ languages with state-of-the-art accuracy for English, Spanish, French, German, Portuguese, Japanese, Korean, Arabic, Hindi, and many more.

Can I process hundreds of videos at once?

Yes. Pass an array of video URLs, hundreds if needed, in the urlList field. Videos are processed sequentially, with cleanup after each one. There is no hard limit on the number of URLs; the only constraint is the Apify platform's per-run timeout.

Do I need a YouTube API key or OAuth?

No. This actor uses yt-dlp with Deno-based JS challenge solving, which does not require any YouTube Data API key, OAuth setup, or quota management. For AI transcription, you will need an API key for the AI transcription service (set as the appropriate environment variable).

How is this different from other YouTube transcript scrapers on Apify?

Most competitors only extract existing captions — if the video has no captions, they return an error or null. This actor is unique in its AI transcription fallback when captions aren't available, plus its speaker diarization capability (identifying who said what). No other YouTube transcript actor on Apify offers diarization with per-utterance labels.

📥 Input

| Input | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `urlList` | `array<string>` | ✅ Yes | (none) | YouTube video URLs to extract transcripts from |
| `transcriptionMode` | `string` | ❌ No | `auto` | `auto` (captions → AI fallback), `ai_only`, or `captions_only` |
| `diarization` | `boolean` | ❌ No | `false` | Identify different speakers with per-utterance labels and timestamps |
| `translationEnabled` | `boolean` | ❌ No | `false` | Translate the transcript to other languages |
| `translationLanguages` | `array<string>` | ❌ No | `["es"]` | Target language codes (e.g. `es`, `fr`, `de`). Only used when `translationEnabled` is `true` |
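
When inputs are generated by a script, it can help to validate them before starting a run. The following is a minimal sketch that builds an input object from the fields in the table above. The helper name `build_input` is hypothetical, and rejecting an empty URL list is an assumption based on `urlList` being required.

```python
from typing import Iterable, Optional

MODES = {"auto", "ai_only", "captions_only"}


def build_input(
    urls: Iterable[str],
    mode: str = "auto",
    diarization: bool = False,
    translate_to: Optional[Iterable[str]] = None,
) -> dict:
    """Build an actor input dict, failing early on obviously invalid values."""
    url_list = list(urls)
    if not url_list:
        raise ValueError("urlList is required and must not be empty")
    if mode not in MODES:
        raise ValueError(f"transcriptionMode must be one of {sorted(MODES)}")
    languages = list(translate_to or [])
    run_input = {
        "urlList": url_list,
        "transcriptionMode": mode,
        "diarization": diarization,
        "translationEnabled": bool(languages),
    }
    if languages:
        # translationLanguages is only used when translation is enabled.
        run_input["translationLanguages"] = languages
    return run_input


print(build_input(["https://youtu.be/jNQXAC9IVRw"], translate_to=["es", "fr"]))
```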

Example Input

{
  "urlList": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/jNQXAC9IVRw"
  ],
  "transcriptionMode": "auto",
  "diarization": true,
  "translationEnabled": false
}
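
An input like the one above can also be submitted from code instead of the Console. This is a minimal sketch assuming the `apify-client` Python package, whose `ApifyClient.actor(...).call(run_input=...)` and `dataset(...).iterate_items()` methods are used here. The actor ID `<username>/<actor-name>` is a placeholder to replace with the ID shown in Apify Console, and reading the token from an `APIFY_TOKEN` environment variable is an assumption of this sketch.

```python
from typing import Any


def run_and_collect(client: Any, actor_id: str, run_input: dict) -> list:
    """Start the actor, wait for it to finish, and return its dataset items."""
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the actor run returned no run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def main() -> None:
    import os

    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])  # assumed variable name
    run_input = {
        "urlList": ["https://youtu.be/jNQXAC9IVRw"],
        "transcriptionMode": "auto",
        "diarization": True,
        "translationEnabled": False,
    }
    # Placeholder actor ID: replace with the real one before running.
    for item in run_and_collect(client, "<username>/<actor-name>", run_input):
        print(item["url"], item["source"])
```

Calling `main()` performs a real, billable run, so it is left for you to invoke explicitly. Passing the client in as a parameter keeps `run_and_collect` easy to test with a stub.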

Example Output (Captions Source)

{
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "source": "captions",
  "transcript": "We're no strangers to love\nYou know the rules and so do I\nA full commitment's what I'm thinking of\nYou wouldn't get this from any other guy"
}

Example Output (AI Transcription with Diarization and Translation)

[
  {
    "url": "https://www.youtube.com/watch?v=jNQXAC9IVRw",
    "source": "ai_transcription",
    "metadata": {
      "audio_duration": 19.008,
      "number_of_distinct_channels": 1,
      "billing_time": 19.008,
      "transcription_time": 8.95
    },
    "transcription": {
      "utterances": [
        {
          "text": "All right,",
          "language": "en",
          "start": 1.297,
          "end": 1.517,
          "confidence": 0.33,
          "channel": 0,
          "words": [
            {
              "word": "All",
              "start": 1.297,
              "end": 1.4169999999999998,
              "confidence": 0.32
            },
            {
              "word": " right,",
              "start": 1.418,
              "end": 1.517,
              "confidence": 0.35
            }
          ],
          "speaker": 0
        },
        {
          "text": "so here we are in front of the elephants.",
          "language": "en",
          "start": 1.518,
          "end": 3.379,
          "confidence": 0.32,
          "channel": 0,
          "words": [
            {
              "word": " so",
              "start": 1.518,
              "end": 1.597,
              "confidence": 0.04
            },
            {
              "word": " here",
              "start": 1.617,
              "end": 1.837,
              "confidence": 0.65
            },
            {
              "word": " we",
              "start": 1.838,
              "end": 1.917,
              "confidence": 0.03
            },
            {
              "word": " are",
              "start": 1.918,
              "end": 2.077,
              "confidence": 0.63
            },
            {
              "word": " in",
              "start": 2.0780000000000003,
              "end": 2.198,
              "confidence": 0
            },
            {
              "word": " front",
              "start": 2.318,
              "end": 2.458,
              "confidence": 0.44
            },
            {
              "word": " of",
              "start": 2.459,
              "end": 2.538,
              "confidence": 0.33
            },
            {
              "word": " the",
              "start": 2.539,
              "end": 2.638,
              "confidence": 0.25
            },
            {
              "word": " elephants.",
              "start": 2.918,
              "end": 3.379,
              "confidence": 0.53
            }
          ],
          "speaker": 0
        },
        {
          "text": "Cool thing about these guys is that they have really,",
          "language": "en",
          "start": 5.2,
          "end": 8.163,
          "confidence": 0.59,
          "channel": 0,
          "words": [
            {
              "word": " Cool",
              "start": 5.2,
              "end": 5.38,
              "confidence": 0.44
            },
            {
              "word": " thing",
              "start": 5.4,
              "end": 5.6,
              "confidence": 0.58
            },
            {
              "word": " about",
              "start": 5.841,
              "end": 5.981,
              "confidence": 0.19
            },
            {
              "word": " these",
              "start": 6.021,
              "end": 6.201,
              "confidence": 0.36
            },
            {
              "word": " guys",
              "start": 6.221,
              "end": 6.601,
              "confidence": 0.78
            },
            {
              "word": " is",
              "start": 7.022,
              "end": 7.142,
              "confidence": 0.89
            },
            {
              "word": " that",
              "start": 7.143,
              "end": 7.242,
              "confidence": 0.68
            },
            {
              "word": " they",
              "start": 7.243,
              "end": 7.362,
              "confidence": 0.93
            },
            {
              "word": " have",
              "start": 7.382,
              "end": 7.522,
              "confidence": 0.74
            },
            {
              "word": " really,",
              "start": 7.922,
              "end": 8.163,
              "confidence": 0.34
            }
          ],
          "speaker": 0
        },
        {
          "text": "really,",
          "language": "en",
          "start": 9.143999999999998,
          "end": 9.424,
          "confidence": 0.78,
          "channel": 0,
          "words": [
            {
              "word": " really,",
              "start": 9.143999999999998,
              "end": 9.424,
              "confidence": 0.78
            }
          ],
          "speaker": 0
        },
        {
          "text": "really long trunks,",
          "language": "en",
          "start": 9.623999999999999,
          "end": 12.366,
          "confidence": 0.42,
          "channel": 0,
          "words": [
            {
              "word": " really",
              "start": 9.623999999999999,
              "end": 9.844000000000001,
              "confidence": 0.8
            },
            {
              "word": " long",
              "start": 10.143999999999998,
              "end": 10.445,
              "confidence": 0.09
            },
            {
              "word": " trunks,",
              "start": 12.046,
              "end": 12.366,
              "confidence": 0.38
            }
          ],
          "speaker": 0
        },
        {
          "text": "and that's that's cool.",
          "language": "en",
          "start": 12.827000000000002,
          "end": 13.707999999999998,
          "confidence": 0.47,
          "channel": 0,
          "words": [
            {
              "word": " and",
              "start": 12.827000000000002,
              "end": 12.947,
              "confidence": 0.66
            },
            {
              "word": " that's",
              "start": 12.948,
              "end": 13.227,
              "confidence": 0.4
            },
            {
              "word": " that's",
              "start": 13.286999999999999,
              "end": 13.486999999999998,
              "confidence": 0.43
            },
            {
              "word": " cool.",
              "start": 13.527000000000001,
              "end": 13.707999999999998,
              "confidence": 0.41
            }
          ],
          "speaker": 0
        },
        {
          "text": "And that's pretty much all there is to say.",
          "language": "en",
          "start": 16.97,
          "end": 18.432,
          "confidence": 0.62,
          "channel": 0,
          "words": [
            {
              "word": " And",
              "start": 16.97,
              "end": 17.11,
              "confidence": 0.78
            },
            {
              "word": " that's",
              "start": 17.13,
              "end": 17.291,
              "confidence": 0.45
            },
            {
              "word": " pretty",
              "start": 17.331,
              "end": 17.471,
              "confidence": 0.36
            },
            {
              "word": " much",
              "start": 17.511,
              "end": 17.671,
              "confidence": 0.96
            },
            {
              "word": " all",
              "start": 17.750999999999998,
              "end": 17.910999999999998,
              "confidence": 0.83
            },
            {
              "word": " there",
              "start": 17.930999999999997,
              "end": 18.051,
              "confidence": 0.04
            },
            {
              "word": " is",
              "start": 18.052,
              "end": 18.111,
              "confidence": 0.42
            },
            {
              "word": " to",
              "start": 18.112,
              "end": 18.211,
              "confidence": 0.84
            },
            {
              "word": " say.",
              "start": 18.250999999999998,
              "end": 18.432,
              "confidence": 0.93
            }
          ],
          "speaker": 0
        }
      ],
      "full_transcript": "All right, so here we are in front of the elephants. Cool thing about these guys is that they have really, really, really long trunks, and that's that's cool. And that's pretty much all there is to say.",
      "languages": [
        "en"
      ]
    },
    "diarization": {
      "success": true,
      "is_empty": false,
      "exec_time": 7,
      "results": [
        {
          "text": "All right,",
          "language": "en",
          "start": 1.297,
          "end": 1.517,
          "confidence": 0.33,
          "channel": 0,
          "words": [
            {
              "word": "All",
              "start": 1.297,
              "end": 1.4169999999999998,
              "confidence": 0.32
            },
            {
              "word": " right,",
              "start": 1.418,
              "end": 1.517,
              "confidence": 0.35
            }
          ],
          "speaker": 0
        },
        {
          "text": "so here we are in front of the elephants.",
          "language": "en",
          "start": 1.518,
          "end": 3.379,
          "confidence": 0.32,
          "channel": 0,
          "words": [
            {
              "word": " so",
              "start": 1.518,
              "end": 1.597,
              "confidence": 0.04
            },
            {
              "word": " here",
              "start": 1.617,
              "end": 1.837,
              "confidence": 0.65
            },
            {
              "word": " we",
              "start": 1.838,
              "end": 1.917,
              "confidence": 0.03
            },
            {
              "word": " are",
              "start": 1.918,
              "end": 2.077,
              "confidence": 0.63
            },
            {
              "word": " in",
              "start": 2.0780000000000003,
              "end": 2.198,
              "confidence": 0
            },
            {
              "word": " front",
              "start": 2.318,
              "end": 2.458,
              "confidence": 0.44
            },
            {
              "word": " of",
              "start": 2.459,
              "end": 2.538,
              "confidence": 0.33
            },
            {
              "word": " the",
              "start": 2.539,
              "end": 2.638,
              "confidence": 0.25
            },
            {
              "word": " elephants.",
              "start": 2.918,
              "end": 3.379,
              "confidence": 0.53
            }
          ],
          "speaker": 0
        },
        {
          "text": "Cool thing about these guys is that they have really,",
          "language": "en",
          "start": 5.2,
          "end": 8.163,
          "confidence": 0.59,
          "channel": 0,
          "words": [
            {
              "word": " Cool",
              "start": 5.2,
              "end": 5.38,
              "confidence": 0.44
            },
            {
              "word": " thing",
              "start": 5.4,
              "end": 5.6,
              "confidence": 0.58
            },
            {
              "word": " about",
              "start": 5.841,
              "end": 5.981,
              "confidence": 0.19
            },
            {
              "word": " these",
              "start": 6.021,
              "end": 6.201,
              "confidence": 0.36
            },
            {
              "word": " guys",
              "start": 6.221,
              "end": 6.601,
              "confidence": 0.78
            },
            {
              "word": " is",
              "start": 7.022,
              "end": 7.142,
              "confidence": 0.89
            },
            {
              "word": " that",
              "start": 7.143,
              "end": 7.242,
              "confidence": 0.68
            },
            {
              "word": " they",
              "start": 7.243,
              "end": 7.362,
              "confidence": 0.93
            },
            {
              "word": " have",
              "start": 7.382,
              "end": 7.522,
              "confidence": 0.74
            },
            {
              "word": " really,",
              "start": 7.922,
              "end": 8.163,
              "confidence": 0.34
            }
          ],
          "speaker": 0
        },
        {
          "text": "really,",
          "language": "en",
          "start": 9.143999999999998,
          "end": 9.424,
          "confidence": 0.78,
          "channel": 0,
          "words": [
            {
              "word": " really,",
              "start": 9.143999999999998,
              "end": 9.424,
              "confidence": 0.78
            }
          ],
          "speaker": 0
        },
        {
          "text": "really long trunks,",
          "language": "en",
          "start": 9.623999999999999,
          "end": 12.366,
          "confidence": 0.42,
          "channel": 0,
          "words": [
            {
              "word": " really",
              "start": 9.623999999999999,
              "end": 9.844000000000001,
              "confidence": 0.8
            },
            {
              "word": " long",
              "start": 10.143999999999998,
              "end": 10.445,
              "confidence": 0.09
            },
            {
              "word": " trunks,",
              "start": 12.046,
              "end": 12.366,
              "confidence": 0.38
            }
          ],
          "speaker": 0
        },
        {
          "text": "and that's that's cool.",
          "language": "en",
          "start": 12.827000000000002,
          "end": 13.707999999999998,
          "confidence": 0.47,
          "channel": 0,
          "words": [
            {
              "word": " and",
              "start": 12.827000000000002,
              "end": 12.947,
              "confidence": 0.66
            },
            {
              "word": " that's",
              "start": 12.948,
              "end": 13.227,
              "confidence": 0.4
            },
            {
              "word": " that's",
              "start": 13.286999999999999,
              "end": 13.486999999999998,
              "confidence": 0.43
            },
            {
              "word": " cool.",
              "start": 13.527000000000001,
              "end": 13.707999999999998,
              "confidence": 0.41
            }
          ],
          "speaker": 0
        },
        {
          "text": "And that's pretty much all there is to say.",
          "language": "en",
          "start": 16.97,
          "end": 18.432,
          "confidence": 0.62,
          "channel": 0,
          "words": [
            {
              "word": " And",
              "start": 16.97,
              "end": 17.11,
              "confidence": 0.78
            },
            {
              "word": " that's",
              "start": 17.13,
              "end": 17.291,
              "confidence": 0.45
            },
            {
              "word": " pretty",
              "start": 17.331,
              "end": 17.471,
              "confidence": 0.36
            },
            {
              "word": " much",
              "start": 17.511,
              "end": 17.671,
              "confidence": 0.96
            },
            {
              "word": " all",
              "start": 17.750999999999998,
              "end": 17.910999999999998,
              "confidence": 0.83
            },
            {
              "word": " there",
              "start": 17.930999999999997,
              "end": 18.051,
              "confidence": 0.04
            },
            {
              "word": " is",
              "start": 18.052,
              "end": 18.111,
              "confidence": 0.42
            },
            {
              "word": " to",
              "start": 18.112,
              "end": 18.211,
              "confidence": 0.84
            },
            {
              "word": " say.",
              "start": 18.250999999999998,
              "end": 18.432,
              "confidence": 0.93
            }
          ],
          "speaker": 0
        }
      ],
      "error": null
    },
    "translation": {
      "success": true,
      "is_empty": false,
      "results": [
        {
          "languages": [
            "es"
          ],
          "full_transcript": "Muy bien, aquí estamos frente a los elefantes. Lo genial de estos animales es que tienen trompas muy largas, y eso es genial. Y eso es prácticamente todo lo que hay que decir.",
          "utterances": [
            {
              "words": [
                {
                  "word": "Muy",
                  "start": 1.297,
                  "end": 1.4169999999999998,
                  "confidence": 0.32
                },
                {
                  "word": " bien,",
                  "start": 1.418,
                  "end": 1.517,
                  "confidence": 0.35
                }
              ],
              "text": "Muy bien,",
              "language": "es",
              "start": 1.297,
              "end": 1.517,
              "channel": 0,
              "speaker": 0,
              "confidence": 0.33499999999999996
            },
            {
              "words": [
                {
                  "word": " aquí",
                  "start": 1.518,
                  "end": 1.837,
                  "confidence": 0.34500000000000003
                },
                {
                  "word": " estamos",
                  "start": 1.838,
                  "end": 2.077,
                  "confidence": 0.33
                },
                {
                  "word": " frente",
                  "start": 2.0780000000000003,
                  "end": 2.198,
                  "confidence": 0
                },
                {
                  "word": " a",
                  "start": 2.318,
                  "end": 2.458,
                  "confidence": 0.44
                },
                {
                  "word": " los",
                  "start": 2.459,
                  "end": 2.638,
                  "confidence": 0.29000000000000004
                },
                {
                  "word": " elefantes.",
                  "start": 2.918,
                  "end": 3.379,
                  "confidence": 0.53
                }
              ],
              "text": " aquí estamos frente a los elefantes.",
              "language": "es",
              "start": 1.518,
              "end": 3.379,
              "channel": 0,
              "speaker": 0,
              "confidence": 0.3222222222222222
            },
            {
              "words": [
                {
                  "word": " Lo",
                  "start": 5.2,
                  "end": 5.38,
                  "confidence": 0.44
                },
                {
                  "word": " genial",
                  "start": 5.4,
                  "end": 5.6,
                  "confidence": 0.58
                },
                {
                  "word": " de",
                  "start": 5.841,
                  "end": 5.981,
                  "confidence": 0.19
                },
                {
                  "word": " estos",
                  "start": 6.021,
                  "end": 6.201,
                  "confidence": 0.36
                },
                {
                  "word": " animales",
                  "start": 6.221,
                  "end": 6.601,
                  "confidence": 0.78
                },
                {
                  "word": " es",
                  "start": 7.022,
                  "end": 7.362,
                  "confidence": 0.8475
                },
                {
                  "word": " que",
                  "start": 7.382,
                  "end": 7.522,
                  "confidence": 0.74
                },
                {
                  "word": " tienen",
                  "start": 7.922,
                  "end": 8.163,
                  "confidence": 0.34
                }
              ],
              "text": " Lo genial de estos animales es que tienen",
              "language": "es",
              "start": 5.2,
              "end": 8.163,
              "channel": 0,
              "speaker": 0,
              "confidence": 0.593
            },
            {
              "words": [
                {
                  "word": " trompas",
                  "start": 9.143999999999998,
                  "end": 9.424,
                  "confidence": 0.78
                }
              ],
              "text": " trompas",
              "language": "es",
              "start": 9.143999999999998,
              "end": 9.424,
              "channel": 0,
              "speaker": 0,
              "confidence": 0.78
            },
            {
              "words": [
                {
                  "word": " muy",
                  "start": 9.623999999999999,
                  "end": 9.844000000000001,
                  "confidence": 0.8
                },
                {
                  "word": " largas,",
                  "start": 10.143999999999998,
                  "end": 10.445,
                  "confidence": 0.09
                },
                {
                  "word": " y",
                  "start": 12.046,
                  "end": 12.366,
                  "confidence": 0.38
                }
              ],
              "text": " muy largas, y",
              "language": "es",
              "start": 9.623999999999999,
              "end": 12.366,
              "channel": 0,
              "speaker": 0,
              "confidence": 0.42333333333333334
            },
            {
              "words": [
                {
                  "word": " eso",
                  "start": 12.827000000000002,
                  "end": 13.227,
                  "confidence": 0.53
                },
                {
                  "word": " es",
                  "start": 13.286999999999999,
                  "end": 13.486999999999998,
                  "confidence": 0.43
                },
                {
                  "word": " genial.",
                  "start": 13.527000000000001,
                  "end": 13.707999999999998,
                  "confidence": 0.41
                }
              ],
              "text": " eso es genial.",
              "language": "es",
              "start": 12.827000000000002,
              "end": 13.707999999999998,
              "channel": 0,
              "speaker": 0,
              "confidence": 0.475
            },
            {
              "words": [
                {
                  "word": " Y eso",
                  "start": 16.97,
                  "end": 17.11,
                  "confidence": 0.78
                },
                {
                  "word": " es",
                  "start": 17.13,
                  "end": 17.291,
                  "confidence": 0.45
                },
                {
                  "word": " prácticamente",
                  "start": 17.331,
                  "end": 17.471,
                  "confidence": 0.36
                },
                {
                  "word": " todo",
                  "start": 17.511,
                  "end": 17.671,
                  "confidence": 0.96
                },
                {
                  "word": " lo",
                  "start": 17.750999999999998,
                  "end": 17.910999999999998,
                  "confidence": 0.83
                },
                {
                  "word": " que",
                  "start": 17.930999999999997,
                  "end": 18.051,
                  "confidence": 0.04
                },
                {
                  "word": " hay",
                  "start": 18.052,
                  "end": 18.111,
                  "confidence": 0.42
                },
                {
                  "word": " que",
                  "start": 18.112,
                  "end": 18.211,
                  "confidence": 0.84
                },
                {
                  "word": " decir.",
                  "start": 18.250999999999998,
                  "end": 18.432,
                  "confidence": 0.93
                }
              ],
              "text": " Y eso es prácticamente todo lo que hay que decir.",
              "language": "es",
              "start": 16.97,
              "end": 18.432,
              "channel": 0,
              "speaker": 0,
              "confidence": 0.6233333333333334
            }
          ],
          "error": null
        }
      ],
      "exec_time": 1.4865992790013551,
      "error": null
    }
  }
]

🛠️ Technical Details

How It Works

The actor uses a two-phase extraction pipeline. Phase 1 extracts existing YouTube caption tracks using yt-dlp, with Deno-powered JS challenge solving; this covers the vast majority of videos. If no captions are available and AI transcription is enabled, Phase 2 downloads the audio stream (Opus at 48 kHz, YouTube format 251), sends it to an AI transcription service, and polls until the job completes.
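The caption-first, AI-second flow can be sketched roughly as follows. This is a minimal illustration, not the actor's actual code: `fetch_captions` and `transcribe_audio` are hypothetical callables standing in for the yt-dlp and speech-to-text steps, the `ai_only` mode name is assumed, and the result keys are illustrative.

```python
from typing import Callable, Optional


def extract_transcript(
    video_url: str,
    fetch_captions: Callable[[str], Optional[str]],  # hypothetical Phase 1 helper
    transcribe_audio: Callable[[str], str],          # hypothetical Phase 2 helper
    mode: str = "auto",
) -> dict:
    """Try captions first, then fall back to AI transcription when the mode allows it."""
    result = {"url": video_url, "source": None, "text": None, "error": None}
    try:
        if mode != "ai_only":
            captions = fetch_captions(video_url)
            if captions:
                result.update(source="captions", text=captions)
                return result
            if mode == "captions_only":
                # Strict mode: report that nothing was found, never call the AI step.
                result["source"] = "none"
                return result
        # Phase 2: no captions found (or AI was forced), so transcribe the audio.
        result.update(source="ai", text=transcribe_audio(video_url))
    except Exception as exc:
        # Capture the failure in the result instead of aborting the whole run.
        result["error"] = str(exc)
    return result
```

Passing the two phases in as callables keeps the control flow easy to test without network access; a real implementation would wire in the downloader and the transcription client at those two points.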

Error Handling

Missing captions — gracefully falls back to AI transcription (in auto mode) or returns a none status (in captions_only mode)
Audio download failure — if yt-dlp fails to download the audio track (e.g. the video is private, deleted, or geo-blocked), the actor returns a clear error status
AI transcription failure — if the transcription service is unreachable, the API key is invalid, or transcription times out, the error is captured and returned
Missing API key — when AI transcription is requested but no API key is configured for the transcription service, the actor logs a warning, records an error result for that video, and continues with the remaining videos
Temporary files — all downloaded audio files are cleaned up immediately after transcription to avoid disk bloat on long runs
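The cleanup and error-capture behaviour described above could look something like the sketch below. It is an assumption-laden illustration: `download` and `transcribe` are hypothetical callables, the `.webm` suffix is a guess at the container for the Opus stream, and the returned keys are illustrative rather than the actor's real schema.

```python
import os
import tempfile
from typing import Callable


def transcribe_with_cleanup(
    download: Callable[[str], None],   # hypothetical: writes the audio to the given path
    transcribe: Callable[[str], str],  # hypothetical: returns text for the audio at that path
) -> dict:
    """Download to a temporary file, transcribe it, and always delete the file afterwards."""
    fd, path = tempfile.mkstemp(suffix=".webm")
    os.close(fd)  # only the path is needed; the downloader reopens the file itself
    try:
        download(path)
        return {"text": transcribe(path), "error": None}
    except Exception as exc:
        # Download and transcription failures both end up as a readable message.
        return {"text": None, "error": str(exc)}
    finally:
        # Runs on success and on failure, so long runs do not accumulate audio files.
        if os.path.exists(path):
            os.remove(path)
```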

Data Integrity

No duplicate data — each video URL produces exactly one result in the dataset
Clean caption text — captions are returned as plain text, with SRT/VTT timestamps stripped during parsing
Full audit trail — every result includes a source field indicating exactly how the transcript was obtained
No silent failures — every error is captured with a human-readable message in the error field

💘 Comparison: YouTube Transcript Scrapers

Feature	YouTube Transcript Scraper ⭐	akash9078	scrapesmith	crawlerbros
YouTube captions extraction	✅	✅	✅	✅
AI transcription fallback (no captions)	✅ — speech-to-text engine	❌	❌	✅ — Whisper (local)
Speaker diarization	✅ — per-utterance labels	❌	❌	❌
Translation	✅ — multi-language AI translation	✅ — Mistral AI	❌	❌
Bulk processing	✅ — unlimited URLs	✅	✅ — state migration	✅
Timestamped segments	✅ — via utterances	❌	✅ — per segment	✅ — per segment
Rich video metadata	❌ (focused on transcript)	✅ — title, views, thumbnails	✅ — title, views, channel, duration	✅ — title, views, channel, duration
Proxy support	✅ — Apify residential	✅ — built-in rotation	❌ (not required)	❌ (not required)
No YouTube API key	✅ — yt-dlp + Deno	✅ — InnerTube API	✅ — session cookies	✅ — multiple fallback paths
Pricing model	Per-result	Per-result	Per-result	Per-result

💡 Use Cases

AI Training Data Pipeline

A machine learning engineer is building a large-scale speech-to-text training dataset. They need to collect thousands of hours of transcribed speech across multiple domains — news, podcasts, lectures, interviews, and casual conversations. Many of the most valuable videos (podcasts, interviews) don't have captions because they're too long for uploaders to manually caption, and auto-captions may be disabled.

Using this actor in auto mode, the engineer can feed in video URLs from any domain and get back clean, structured transcripts for every video — captions where they exist, AI transcription where they don't. The transcription.full_transcript field provides the complete text, while transcription.languages gives confidence scores for filtering. With speaker diarization enabled, the resulting dataset includes natural speaker turns, which is invaluable for training dialogue-aware models.
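As one way to do that filtering, the sketch below drops low-confidence utterances before they enter a training set. It assumes utterance objects shaped like those in the example output (with `text` and `confidence` keys); the 0.6 threshold is an arbitrary illustration, not a recommended value.

```python
def filter_utterances(utterances, min_confidence=0.6):
    """Keep only utterances whose confidence meets the threshold."""
    return [u for u in utterances if u.get("confidence", 0.0) >= min_confidence]


# Two utterances taken from the example output, trimmed to the relevant keys.
sample = [
    {"text": " eso es genial.", "confidence": 0.475},
    {"text": " Y eso es prácticamente todo lo que hay que decir.",
     "confidence": 0.6233333333333334},
]
kept = filter_utterances(sample)
print([u["text"].strip() for u in kept])
```

Utterances with no `confidence` key are treated as 0.0 and dropped, which is the conservative choice for training data.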

The engineer can run this actor on a weekly schedule via Apify's scheduler, feeding in new videos from a curated channel list, and the dataset grows continuously without any manual intervention.

Podcast and Interview Analysis

A media analyst needs to extract actionable insights from a library of 500+ podcast episodes. Each episode features multiple guests discussing specific topics, but only a handful have captions. Manual transcription would cost thousands of dollars and take weeks.

With diarization enabled, the actor labels the speaker of every utterance and returns structured data showing who said what and when. The analyst can filter by speaker to isolate a specific guest's comments across dozens of episodes, or search the full transcript corpus for mentions of a specific topic. The utterances array, with start and end timestamps for each utterance, makes it straightforward to clip specific soundbites for social media or to create show notes with timestamp-accurate references.
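A minimal sketch of the per-speaker filtering, assuming utterances carry the `speaker`, `start`, `end`, and `text` keys seen in the example output. The helper names are hypothetical.

```python
from collections import defaultdict


def format_timestamp(seconds):
    """Render a second offset as MM:SS for show notes."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def utterances_by_speaker(utterances):
    """Group utterances under their diarization speaker label."""
    grouped = defaultdict(list)
    for u in utterances:
        grouped[u.get("speaker")].append(
            {"at": format_timestamp(u["start"]), "text": u["text"].strip()}
        )
    return dict(grouped)
```

Calling `utterances_by_speaker(utterances)[0]` would then return every line attributed to speaker 0, each with a timestamp that can be pasted into show notes.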

The result: 500 podcast episodes transcribed and searchable in a few hours, for a fraction of the cost of professional transcription services.

Multilingual Content Localization

A content strategist for a global SaaS company wants to repurpose the CEO's quarterly all-hands videos into blog posts and social content for Spanish, French, and German markets. The videos were published without any captions.

By enabling translationEnabled: true and setting translationLanguages: ["es", "fr", "de"], the actor transcribes the videos and returns the full transcript in each target language. The transcription.translated_transcripts object contains separate full_transcript and utterance structures for every requested language. The strategist feeds the English transcript into their CMS as a blog draft and the translated versions into the appropriate regional channels.

No external translation service needed — the actor handles transcription and translation in one run, cutting the localization workflow from three steps to one.

Competitive Intelligence Monitoring

A product marketing manager needs to track what competitors are saying in their quarterly earnings calls, product launch videos, and conference presentations. These videos are typically long-form (45-90 minutes) and rarely have captions.

The manager sets up a scheduled Apify actor run every Monday morning with a curated list of competitor video URLs. The actor downloads each video's audio, transcribes it, and pushes the results to an Apify dataset. With speaker diarization enabled, the manager can distinguish between the CEO's prepared remarks and analyst Q&A segments.

The dataset feeds into a Slack webhook that alerts the team about specific keyword mentions ("partnership", "new feature", "pricing change"). The full transcripts are searchable in a vector database for downstream analysis. The entire pipeline runs automatically — no manual transcription, no missed videos.
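The keyword-matching step of such a pipeline might be sketched as below. This only covers the matching; posting to Slack and indexing in a vector database are left out. The function name is hypothetical, and the utterance keys (`text`, `start`) are assumed to match the example output.

```python
import re


def find_keyword_mentions(utterances, keywords):
    """Return one record per (utterance, keyword) match, using whole-word, case-insensitive search."""
    mentions = []
    for u in utterances:
        for kw in keywords:
            pattern = r"\b" + re.escape(kw) + r"\b"
            if re.search(pattern, u["text"], flags=re.IGNORECASE):
                mentions.append(
                    {"keyword": kw, "start": u["start"], "text": u["text"].strip()}
                )
    return mentions
```

Whole-word matching avoids false alerts on substrings; each returned record keeps the utterance start time so an alert can link to the right moment in the video.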

Academic Research — Discourse Analysis

A linguistics PhD student is studying discourse patterns in a corpus of 200 TED Talks. They need precise, time-aligned transcripts to analyze turn-taking, filler word usage, and rhetorical structures across different speakers and topics.

TED Talks typically have captions, but the student also wants to analyze off-script remarks and audience Q&A sessions, which are often uncaptioned. The actor's auto mode handles both: captioned talks are extracted quickly via yt-dlp, while uncaptioned Q&A segments trigger the AI fallback.

The transcription.utterances array provides sub-second timing precision for every phrase, enabling quantitative analysis of speaking pace, pause duration, and speaker overlap. The transcription.languages field confirms language detection for multilingual talks. The structured JSON output integrates directly with the student's Python analysis pipeline using pandas and spaCy.
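As a rough illustration of that kind of timing analysis, the sketch below computes gaps between consecutive utterances and a words-per-minute rate. It assumes utterances with `start`, `end`, and `text` keys as in the example output; the function names are hypothetical, and counting whitespace-separated tokens is a simplification of real word segmentation.

```python
def gaps_between_utterances(utterances):
    """Seconds between the end of one utterance and the start of the next.

    A positive value is a pause; a negative value means the utterances overlap.
    """
    ordered = sorted(utterances, key=lambda u: u["start"])
    return [nxt["start"] - cur["end"] for cur, nxt in zip(ordered, ordered[1:])]


def words_per_minute(utterance):
    """Speaking pace of a single utterance, counting whitespace-separated tokens."""
    duration = utterance["end"] - utterance["start"]
    if duration <= 0:
        return 0.0
    return len(utterance["text"].split()) / duration * 60.0
```

The same per-utterance dictionaries can be loaded into a pandas DataFrame when the analysis needs grouping by speaker or talk.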

Accessibility Compliance for Public Content

A university's digital accessibility coordinator needs to ensure that all 300+ publicly posted lecture videos on the department's YouTube channel have accompanying transcripts for hearing-impaired students. Many of the older videos predate YouTube's auto-captioning and have no transcript whatsoever.

Using the actor with default auto mode, the coordinator feeds the entire channel's video list into a single run. Videos with existing captions are extracted instantly. The remaining videos — the bulk of the work — are automatically transcribed via AI with no manual review needed. The source field clearly distinguishes between native captions and AI-generated transcripts.

The result: a complete transcript archive for every public lecture, exportable as JSON for the learning management system or CSV for the accessibility audit. The entire compliance project goes from weeks of manual work to a single afternoon of setup.
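For the audit export, a short sketch like the following could summarize which transcripts came from native captions and which from AI. It relies on the `source` and `error` fields described above; the `url` key and the function name are assumptions made for illustration.

```python
import csv
import io


def audit_csv(results):
    """Build a CSV summary with one row per video: URL, transcript source, and error (if any)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["url", "source", "error"], lineterminator="\n"
    )
    writer.writeheader()
    for item in results:
        writer.writerow(
            {
                "url": item.get("url", ""),        # assumed key name
                "source": item.get("source", ""),
                "error": item.get("error") or "",  # null errors become empty cells
            }
        )
    return buffer.getvalue()
```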

🌐 Supported URL Formats

https://www.youtube.com/watch?v=VIDEO_ID
https://youtu.be/VIDEO_ID
https://www.youtube.com/shorts/VIDEO_ID
https://www.youtube.com/embed/VIDEO_ID
https://www.youtube.com/live/VIDEO_ID
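For callers who want to normalize or deduplicate URLs before a run, the formats above can be reduced to a video ID with a small helper such as the one below. This is a sketch using only the standard library; how the actor itself parses URLs is not documented here, and the function name is hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID for the URL formats listed above, or None if unrecognized."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    parts = [p for p in parsed.path.split("/") if p]
    if host == "youtu.be":
        return parts[0] if parts else None
    if host == "youtube.com":
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        if len(parts) >= 2 and parts[0] in {"shorts", "embed", "live"}:
            return parts[1]
    return None
```

Mapping every input through `extract_video_id` before submitting a batch makes it easy to spot the same video listed under two different URL formats.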

⚠️ Disclaimer

This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by YouTube, Google LLC, or any of their subsidiaries. All trademarks are the property of their respective owners.

This Actor accesses only publicly available transcript and caption data from youtube.com. You are solely responsible for ensuring your use complies with YouTube's Terms of Service and applicable laws.


