Youtube Multilanguage Transcript Extractor

Pricing

$6.00 / 1,000 results


Developed by

Real Noob


Maintained by Community

Youtube Multilanguage Transcript Extractor fetches structured metadata and rich subtitles/transcripts for any public YouTube video, even when the video is geo-restricted or the captions require on-the-fly machine translation. It is designed for large-scale use.


Total users: 6

Monthly users: 6

Runs succeeded: >99%

Last modified: 8 days ago


How it Works

  1. Input parsing - Validates the url field (a full YouTube URL or a bare 11-character video ID).
  2. Original transcripts - Optionally retrieves additional caption tracks in the original spoken languages listed in preferredOrigLang.
  3. Result storage - Returns a single, self-contained JSON object.
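The url validation in step 1 can be sketched as follows. This is a hypothetical helper, not the Actor's own code: it assumes the field arrives as a standard watch URL, a youtu.be short link, a /shorts/, /embed/ or /live/ path, or a bare ID, and that a valid ID is exactly 11 characters drawn from letters, digits, "-" and "_".

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are exactly 11 characters from letters, digits, "-" and "_".
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")

# Hypothetical set of path prefixes whose second path segment is the video ID.
ID_PATH_PREFIXES = {"shorts", "embed", "live"}


def extract_video_id(url_or_id: str) -> str:
    """Return the 11-character video ID, or raise ValueError (hypothetical helper)."""
    candidate = url_or_id.strip()
    if VIDEO_ID_RE.fullmatch(candidate):
        return candidate  # already a bare video ID

    parsed = urlparse(candidate)
    host = (parsed.hostname or "").lower()
    parts = [p for p in parsed.path.split("/") if p]

    if host == "youtu.be" and parts:
        # Short links: https://youtu.be/<id>
        candidate = parts[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            # Watch URLs: https://www.youtube.com/watch?v=<id>&...
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        elif len(parts) >= 2 and parts[0] in ID_PATH_PREFIXES:
            # Path-style URLs: /shorts/<id>, /embed/<id>, /live/<id>
            candidate = parts[1]
        else:
            candidate = ""
    else:
        candidate = ""  # not a YouTube host

    if not VIDEO_ID_RE.fullmatch(candidate):
        raise ValueError(f"not a YouTube URL or video ID: {url_or_id!r}")
    return candidate
```

For example, extract_video_id("https://youtu.be/dQw4w9WgXcQ") returns "dQw4w9WgXcQ", the same ID as the watch URL in the example input. Checking the bare-ID form first keeps the common case cheap and avoids treating an ID as a relative URL.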

Input

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | (none) | Full YouTube URL or just the 11-character video ID. |
| preferredLangs | string or string[] | No | ["en"] | Language codes (ISO 639-1) to try for translated captions, in order of preference. |
| preferredOrigLang | string or string[] | No | [] | Languages for which to retrieve original (untranslated) captions. Useful for building bilingual corpora. |
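Because preferredLangs and preferredOrigLang accept either a single string or an array, a caller has to coerce them to one shape before use. The sketch below does that in Python while applying the defaults from the table. normalize_langs and normalize_input are hypothetical names; dropping duplicates and falling back to the default when a list is empty are assumptions, not documented Actor behavior.

```python
from typing import Optional, Union

# A language field may be a single code or a list of codes ("string | string[]").
LangField = Optional[Union[str, list[str]]]


def normalize_langs(value: LangField, default: list[str]) -> list[str]:
    """Coerce a language field to an ordered, de-duplicated list (hypothetical helper)."""
    if value is None:
        return list(default)
    if isinstance(value, str):
        value = [value]
    codes: list[str] = []
    for code in value:
        code = code.strip()
        if code and code not in codes:  # skip blanks and repeats, keep preference order
            codes.append(code)
    return codes or list(default)  # assumption: an empty list means "use the default"


def normalize_input(raw: dict) -> dict:
    """Apply the documented defaults: preferredLangs -> ["en"], preferredOrigLang -> []."""
    if not raw.get("url"):
        raise ValueError("'url' is required")
    return {
        "url": raw["url"],
        "preferredLangs": normalize_langs(raw.get("preferredLangs"), ["en"]),
        "preferredOrigLang": normalize_langs(raw.get("preferredOrigLang"), []),
    }
```

For instance, normalize_input({"url": "dQw4w9WgXcQ", "preferredOrigLang": "ro"}) returns {"url": "dQw4w9WgXcQ", "preferredLangs": ["en"], "preferredOrigLang": ["ro"]}. Codes are only stripped, not lowercased, so region-qualified codes such as "zh-Hans" pass through unchanged.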

Example Input

{
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "preferredLangs": ["en"],
  "preferredOrigLang": ["ro", "fr"]
}

Output

The Actor returns a single record with the following schema. Unlike the camelCase input fields, output keys are intentionally kept in snake_case:

| Key | Type | Description |
|---|---|---|
| video_id | string | 11-character YouTube video ID. |
| title | string | Video title. |
| channel_id | string | Uploader’s channel ID. |
| channel_name | string | Channel display name. |
| upload_date | string (RFC 3339) | Original publish date in UTC. |
| url | string | Canonical watch URL. |
| duration_seconds | integer | Video length in seconds. |
| view_count | integer | Public view count at crawl time. |
| translated_language | string | Language code actually used for the translated transcript. |
| translated_transcript | array | Sequential caption segments, each with start, end, and text. |
| original_transcripts | object (language code → array) | Map keyed by language code; each value is an array of segments with the same shape as translated_transcript. |
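Both transcript fields share one segment shape, so a consumer can parse them with the same routine. The sketch below turns a record into typed segments; it assumes start and end are offsets in seconds (the sample encodes them as decimal strings such as "6.839") and rejects segments that end before they start. Segment, parse_segments and parse_record are hypothetical names, not part of the Actor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float  # assumption: seconds from the start of the video
    end: float    # assumption: seconds from the start of the video
    text: str


def parse_segments(raw_segments: list[dict]) -> list[Segment]:
    """Convert raw {start, end, text} dicts into Segments (hypothetical helper)."""
    # float() accepts both numeric strings like "6.839" and plain numbers.
    segments = [Segment(float(s["start"]), float(s["end"]), s["text"]) for s in raw_segments]
    for seg in segments:
        if seg.end < seg.start:
            raise ValueError(f"segment ends before it starts: {seg}")
    return segments


def parse_record(record: dict) -> dict:
    """Parse the transcript fields of one output record (hypothetical helper)."""
    return {
        "video_id": record["video_id"],
        "translated_language": record.get("translated_language"),
        "translated": parse_segments(record.get("translated_transcript", [])),
        "originals": {
            lang: parse_segments(segs)
            for lang, segs in record.get("original_transcripts", {}).items()
        },
    }
```

Keeping original_transcripts as a dict keyed by language code mirrors the output schema, so a caller can look up any language requested through preferredOrigLang directly.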

Sample (truncated)

{
  "video_id": "dQw4w9WgXcQ",
  "title": "Rick Astley - Never Gonna Give You Up (Video)",
  "channel_id": "UCuAXFkgsw1L7xaCfnd5JJOw",
  "channel_name": "Rick Astley",
  "upload_date": "2009-10-25T07:57:33+00:00",
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "duration_seconds": 212,
  "view_count": 138562947,
  "translated_language": "en",
  "translated_transcript": [
    { "start": "0.000", "end": "6.839", "text": "We're no strangers to love…" }
  ],
  "original_transcripts": {
    "ja": [
      { "start": "0.000", "end": "6.839", "text": "愛に不慣れじゃない…" }
    ]
  }
}
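Since preferredOrigLang is pitched at bilingual corpora, one plausible downstream step is pairing each translated segment with the original-language segment it overlaps most in time. The following is a minimal greedy sketch under stated assumptions: timestamps are seconds, each translated segment gets at most one best match, and the nested loop costs O(n*m) comparisons. align_bilingual is a hypothetical helper, not an Actor feature, and tracks whose segments are split very differently may need a smarter aligner.

```python
def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the time interval shared by [a_start, a_end] and [b_start, b_end]."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def align_bilingual(record: dict, orig_lang: str) -> list[tuple[str, str]]:
    """Pair translated text with the best-overlapping original text (hypothetical helper)."""
    translated = record.get("translated_transcript", [])
    original = record.get("original_transcripts", {}).get(orig_lang, [])
    pairs: list[tuple[str, str]] = []
    for t in translated:
        ts, te = float(t["start"]), float(t["end"])  # assumption: seconds as strings or numbers
        best, best_overlap = None, 0.0
        for o in original:
            ov = overlap(ts, te, float(o["start"]), float(o["end"]))
            if ov > best_overlap:
                best, best_overlap = o, ov
        if best is not None:  # translated segments with no overlapping original are skipped
            pairs.append((t["text"], best["text"]))
    return pairs
```

Applied to the sample record above with orig_lang="ja", it returns a single pair: ("We're no strangers to love…", "愛に不慣れじゃない…"). Requesting a language that is absent from original_transcripts simply yields an empty list.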