Youtube Transcript Scraper

Pricing

from $40.00 / 1,000 results

Extract transcripts and captions from YouTube videos with language selection support. Returns timestamped segments, full concatenated text, and basic video metadata.

Rating: 5.0 (8)

Developer: Crawler Bros (maintained by Community)

Actor stats

  • Bookmarked: 9
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 5 days ago

What does it do?

This scraper extracts transcripts (subtitles/captions) from YouTube videos. It supports both manually created and auto-generated captions, with optional language selection and translation.

For each video, you get:

  • Timestamped transcript segments (start time, duration, text)
  • Full concatenated transcript as plain text
  • Transcript language and type (manual vs auto-generated)
  • List of all available transcript languages
  • Basic video metadata (title, channel, views, duration, thumbnail)

Input

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| videoUrls | array | Yes | | YouTube video URLs, short links, or plain video IDs |
| language | string | No | "" (auto) | Preferred transcript language code (e.g., en, es, fr) |
| includeAutoGenerated | boolean | No | true | Include auto-generated captions when manual ones aren't available |

Supported URL formats

  • https://www.youtube.com/watch?v=VIDEO_ID
  • https://youtu.be/VIDEO_ID
  • https://www.youtube.com/shorts/VIDEO_ID
  • https://www.youtube.com/embed/VIDEO_ID
  • Plain video ID (11 characters, e.g., dQw4w9WgXcQ)
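
The formats above all reduce to the same 11-character video ID. As a rough illustration of how such inputs could be normalized, here is a minimal Python sketch. The function name `extract_video_id` is hypothetical, and accepting `youtube.com` without the `www.` prefix is an assumption; this is not the Actor's own parsing code.

```python
import re
from typing import Optional
from urllib.parse import urlparse, parse_qs

# Video IDs are assumed to be 11 characters from letters, digits, "-" and "_".
_VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> Optional[str]:
    """Return the 11-character video ID, or None if the input is not recognized."""
    value = value.strip()
    if _VIDEO_ID_RE.fullmatch(value):
        return value  # already a plain video ID

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    segments = [s for s in parsed.path.split("/") if s]

    candidate = None
    if host == "youtu.be" and segments:
        # Short link: https://youtu.be/VIDEO_ID
        candidate = segments[0]
    elif host in ("youtube.com", "www.youtube.com"):
        if parsed.path == "/watch":
            # Watch page: the ID is in the "v" query parameter.
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif len(segments) >= 2 and segments[0] in ("shorts", "embed"):
            # Shorts and embed pages: the ID is the second path segment.
            candidate = segments[1]

    if candidate and _VIDEO_ID_RE.fullmatch(candidate):
        return candidate
    return None
```

Inputs that match none of the listed formats return `None` rather than raising, so a caller can report them individually.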

Output

Each video produces one dataset item:

| Field | Type | Description |
|---|---|---|
| video_id | string | YouTube video ID |
| title | string | Video title |
| channel_name | string | Channel name |
| channel_id | string | Channel ID |
| duration_seconds | integer | Video duration in seconds |
| views | integer | View count |
| published_date | string | Publish date (YYYY-MM-DD) |
| thumbnail | string | Thumbnail URL |
| transcript_language | string | Language code of the transcript |
| transcript_language_name | string | Language name |
| is_auto_generated | boolean | Whether the transcript is auto-generated |
| available_languages | array | All available transcript languages |
| segments | array | Timestamped transcript segments |
| segment_count | integer | Number of segments |
| full_text | string | Full transcript as plain text |
| success | boolean | Whether the scrape succeeded |
| error | string | Error message (null if successful) |
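
Because every item carries `success` and `error`, a consumer can separate usable transcripts from failures before doing anything else. The sketch below shows one way to do that in Python. The helper names `split_results` and `summarize` are hypothetical, and it assumes the dataset items have already been loaded as dictionaries with the fields listed above.

```python
from typing import Any, Dict, Iterable, List, Tuple

Item = Dict[str, Any]


def split_results(items: Iterable[Item]) -> Tuple[List[Item], List[Item]]:
    """Separate dataset items into (succeeded, failed) using the `success` field."""
    succeeded: List[Item] = []
    failed: List[Item] = []
    for item in items:
        (succeeded if item.get("success") else failed).append(item)
    return succeeded, failed


def summarize(item: Item) -> str:
    """Build a one-line, human-readable summary of a dataset item."""
    if not item.get("success"):
        return f"{item.get('video_id')}: FAILED ({item.get('error')})"
    kind = "auto-generated" if item.get("is_auto_generated") else "manual"
    return (
        f"{item.get('video_id')}: {item.get('segment_count')} segments, "
        f"{item.get('transcript_language')} ({kind})"
    )
```

Checking `success` first avoids reading fields such as `segments` or `full_text` on items where the scrape failed and those fields may be empty.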

Segment format

{
  "start": "0.000",
  "dur": "3.500",
  "text": "We're no strangers to love"
}
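
Note that `start` and `dur` are strings in the example above, so they need converting before any arithmetic. The following Python sketch turns segments into timestamped lines. It assumes both values are decimal seconds, which the example suggests but the listing does not state, and the helper names are hypothetical.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segment_lines(segments: List[Dict[str, str]]) -> List[str]:
    """Render each segment as '[start - end] text'."""
    lines = []
    for seg in segments:
        # Assumption: "start" and "dur" are decimal seconds encoded as strings.
        start = float(seg["start"])
        end = start + float(seg["dur"])
        lines.append(
            f"[{format_timestamp(start)} - {format_timestamp(end)}] {seg['text']}"
        )
    return lines
```

For the sample segment, `segment_lines` yields `[00:00:00.000 - 00:00:03.500] We're no strangers to love`.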

Available languages format

{
  "code": "en",
  "name": "English",
  "is_auto_generated": true
}

Input examples

Basic usage

{
  "videoUrls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"]
}

Multiple videos with language selection

{
  "videoUrls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/9bZkp7q19f0"
  ],
  "language": "en"
}

Spanish transcript

{
  "videoUrls": ["dQw4w9WgXcQ"],
  "language": "es",
  "includeAutoGenerated": true
}
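
To run the Actor from code rather than from the Apify Console, the same input can be built and submitted programmatically. The sketch below uses the `apify-client` Python package. The helper names are hypothetical, the Actor ID is not given in this listing and must be supplied by the caller, and rejecting an empty `videoUrls` list is an assumption based on the parameter being required.

```python
from typing import Any, Dict, List


def build_run_input(
    video_urls: List[str],
    language: str = "",
    include_auto_generated: bool = True,
) -> Dict[str, Any]:
    """Build an input object using the parameter names from the Input table."""
    if not video_urls:
        # Assumption: an empty list is treated as invalid since videoUrls is required.
        raise ValueError("videoUrls is required and must not be empty")
    return {
        "videoUrls": list(video_urls),
        "language": language,
        "includeAutoGenerated": include_auto_generated,
    }


def run_actor(token: str, actor_id: str, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start a run, wait for it to finish, and return the dataset items."""
    # Imported lazily so build_run_input works without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A call might look like `run_actor(token, actor_id, build_run_input(["dQw4w9WgXcQ"], language="es"))`, with `token` and `actor_id` taken from your own Apify account and the Actor's store page.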

How it works

  1. For each video URL, extracts the video ID
  2. Fetches the transcript using YouTube's internal transcript API
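  3. If a specific language is requested, looks for a transcript in that language; if none exists, tries to translate an available transcript into it
  4. Falls back to auto-generated captions if manual ones aren't available (when includeAutoGenerated is true)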
  5. Fetches basic video metadata from the video page
  6. Returns everything as a structured dataset item
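
Steps 3 and 4 describe a selection order among the available transcript tracks. The Python sketch below is one plausible reading of that order, written against the `available_languages` entry format. It is an assumption for illustration, not the Actor's actual logic, and `choose_track` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

Track = Dict[str, object]


def choose_track(
    available: List[Track],
    language: str = "",
    include_auto_generated: bool = True,
) -> Tuple[Optional[Track], bool]:
    """Return (track, needs_translation); track is None when nothing is usable."""
    # Drop auto-generated tracks when the caller has excluded them.
    usable = [
        t for t in available
        if include_auto_generated or not t["is_auto_generated"]
    ]
    # Prefer manual tracks: False sorts before True, and the sort is stable.
    usable.sort(key=lambda t: bool(t["is_auto_generated"]))
    if not usable:
        return None, False

    if language:
        for track in usable:
            if track["code"] == language:
                return track, False
        # Requested language is absent: translate from the best remaining track.
        return usable[0], True

    # No preference given: take the best track as-is.
    return usable[0], False
```

Returning a `needs_translation` flag keeps the choice of source track separate from the translation request itself, which makes each decision easy to test in isolation.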

Limitations

  • Some videos have transcripts/captions disabled by the creator
  • Age-restricted videos may not be accessible
  • Private or deleted videos cannot be scraped
  • Auto-generated captions may contain errors
  • Translation quality depends on YouTube's translation engine