YouTube & Podcast Transcript Extractor

Pricing

from $3.00 / 1,000 YouTube transcripts

Extracts existing transcripts from YouTube videos, playlists and channels, plus podcast feeds (Podcasting 2.0). Returns LLM-ready text and timestamped segments.

Developer

Prooflio AI

Maintained by Community

11 days ago

Last modified

Extracts existing transcripts and returns them as clean, LLM-ready text:

  • YouTube — caption tracks (auto-generated or uploaded) from individual videos, whole playlists, or entire channels
  • Podcasts — transcripts declared in the RSS feed via the Podcasting 2.0 <podcast:transcript> tag (SRT, VTT, JSON, or plain text, normalized to plain text)
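
As an illustration of the podcast path, the sketch below shows one way to discover `<podcast:transcript>` tags in an RSS feed using only the Python standard library. It is not the Actor's own code: the function name `find_transcripts` and the sample feed are hypothetical, and the sketch assumes the feed declares the standard Podcasting 2.0 namespace URI.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Namespace URI used by Podcasting 2.0 tags in RSS feeds.
PODCAST_NS = {"podcast": "https://podcastindex.org/namespace/1.0"}


def find_transcripts(rss_xml: str) -> List[Dict[str, Optional[str]]]:
    """Return one record per <podcast:transcript> tag found inside each <item>."""
    root = ET.fromstring(rss_xml)
    found = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        for tag in item.findall("podcast:transcript", PODCAST_NS):
            found.append({
                "episode": title,
                "url": tag.get("url"),            # where the transcript file lives
                "type": tag.get("type"),          # MIME type, e.g. text/vtt
                "language": tag.get("language"),  # optional attribute, may be None
            })
    return found


# Hypothetical feed used only to exercise the function.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:podcast="https://podcastindex.org/namespace/1.0">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 1</title>
      <podcast:transcript url="https://example.com/ep1.vtt" type="text/vtt" language="en" />
      <podcast:transcript url="https://example.com/ep1.srt" type="application/x-subrip" />
    </item>
    <item>
      <title>Episode 2</title>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for record in find_transcripts(SAMPLE_FEED):
        print(record)
```

Episodes without a transcript tag (like "Episode 2" in the sample) simply yield no records, which matches the note that only existing transcripts are read.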

Each transcript becomes one dataset item with both a full plain-text transcript and optional timestamped segments. Failures are recorded per item, so one bad video never fails the whole run.

This Actor reads transcripts that already exist. It does not transcribe audio. For videos or episodes with no existing captions or transcript, see "Extending" below.

Input

| Field | Type | Default | Description |
|---|---|---|---|
| videos | string[] | - | YouTube video URLs or 11-char IDs. |
| playlists | string[] | - | Playlist URLs or IDs; each is expanded into its videos. |
| channels | string[] | - | Channel URLs, @handles, or UC… IDs; uploads are expanded into videos. |
| podcastFeeds | string[] | - | Podcast RSS feed URLs. |
| language | string | en | Preferred language code; falls back to the first available track. |
| includeSegments | boolean | true | Include timestamped segments alongside the plain text. |
| maxVideosPerSource | integer | 50 | Cap on videos pulled from each playlist/channel. |
| maxEpisodesPerFeed | integer | 10 | Episodes to process per feed. |
| proxyConfiguration | object | Apify Proxy | Proxy settings. Residential is strongly recommended for YouTube. |

{
  "channels": ["https://www.youtube.com/@veritasium"],
  "playlists": ["https://www.youtube.com/playlist?list=PLxxxx"],
  "maxVideosPerSource": 25,
  "language": "en",
  "proxyConfiguration": { "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] }
}

Videos gathered from the videos, playlists, and channels inputs are de-duplicated before extraction, so the same video is never extracted (or charged) twice.
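
The de-duplication step can be pictured as reducing every input to its 11-character video ID and keeping the first occurrence of each. The sketch below is an assumption about how that could work, not the Actor's actual implementation; `extract_video_id` and `dedupe_video_ids` are hypothetical names, and only a few common URL shapes are handled.

```python
import re
from typing import Iterable, List, Optional
from urllib.parse import parse_qs, urlparse

# YouTube video IDs are 11 characters drawn from this alphabet.
VIDEO_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID for a bare ID or a common YouTube URL, else None."""
    value = value.strip()
    if VIDEO_ID_RE.match(value):
        return value
    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    candidate = None
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            parts = [p for p in parsed.path.split("/") if p]
            if len(parts) >= 2 and parts[0] in {"shorts", "embed", "live"}:
                candidate = parts[1]
    if candidate and VIDEO_ID_RE.match(candidate):
        return candidate
    return None


def dedupe_video_ids(values: Iterable[str]) -> List[str]:
    """Collapse inputs to unique video IDs, preserving first-seen order."""
    seen = {}
    for value in values:
        video_id = extract_video_id(value)
        if video_id is not None:
            seen.setdefault(video_id, None)  # dict keeps insertion order
    return list(seen)


if __name__ == "__main__":
    print(dedupe_video_ids([
        "dQw4w9WgXcQ",
        "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
        "https://youtu.be/dQw4w9WgXcQ?t=10",
    ]))
```

Keying on the ID rather than the raw URL is what lets a video that appears in both a playlist and a channel count only once.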

Output

One item per transcript. The Output tab shows a curated Overview table (source, title, language, URL, transcript); all fields are available in the "All fields" view.

{
  "source": "youtube",
  "videoId": "dQw4w9WgXcQ",
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "title": "...",
  "language": "en",
  "isAutoGenerated": true,
  "transcript": "full plain text ...",
  "segments": [{ "start": 0.0, "duration": 3.2, "text": "..." }]
}
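
For downstream use, the segments array in an item like the one above can be rendered as timestamped lines. The following is a minimal sketch that assumes `start` is expressed in seconds, as the sample values suggest; the helper names `format_timestamp` and `render_segments` are illustrative and not part of the Actor.

```python
from typing import Any, Dict


def format_timestamp(seconds: float) -> str:
    """Format a start offset in seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_segments(item: Dict[str, Any]) -> str:
    """Return one '[HH:MM:SS] text' line per segment.

    Items without segments (for example when includeSegments is false)
    produce an empty string.
    """
    lines = []
    for segment in item.get("segments") or []:
        lines.append(f"[{format_timestamp(segment['start'])}] {segment['text']}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical item shaped like the output example.
    sample_item = {
        "source": "youtube",
        "transcript": "hello world",
        "segments": [
            {"start": 0.0, "duration": 3.2, "text": "hello"},
            {"start": 3.2, "duration": 2.0, "text": "world"},
        ],
    }
    print(render_segments(sample_item))
```

When only the plain text is needed, the `transcript` field can be used directly and the segments ignored.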

A note on YouTube blocking

YouTube aggressively blocks datacenter IPs and may serve a bot challenge instead of the page. If you see "YouTube likely served a bot challenge", switch proxyConfiguration to a residential group. This is the single biggest reliability factor for this Actor, and it's also the main cost driver — see pricing notes below.
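
If run inputs are built programmatically, the switch to residential proxies can be applied with a small helper like the one sketched here. It reuses only the `proxyConfiguration` keys shown in the input example; the function name `with_residential_proxy` is hypothetical.

```python
import copy
from typing import Any, Dict


def with_residential_proxy(run_input: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the run input that uses the RESIDENTIAL proxy group.

    The original dictionary is left unchanged so it can be reused as-is.
    """
    updated = copy.deepcopy(run_input)
    updated["proxyConfiguration"] = {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
    }
    return updated


if __name__ == "__main__":
    base_input = {
        "videos": ["dQw4w9WgXcQ"],
        "language": "en",
        "proxyConfiguration": {"useApifyProxy": True},
    }
    print(with_residential_proxy(base_input))
```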

This Actor accesses publicly available caption tracks and publicly published podcast RSS transcripts. Automated access to YouTube is restricted by its Terms of Service; you are responsible for ensuring your use complies with the terms of any site you target and with applicable law and copyright. Transcripts are the intellectual property of their creators — use the output accordingly.