YouTube Transcript Tool Server (MCP-compatible)
Status: Under maintenance
Pricing: from $0.01 / 1,000 results
Fetch YouTube video transcripts + metadata via tool-style invocations using public endpoints (no API key). Tools: video_transcript, video_metadata, channel_videos, playlist_videos.
Apify YouTube Transcript Tool Server (MCP-style)
An Apify Actor that exposes YouTube transcripts + metadata as a small set of tools an LLM agent can call — one Actor run = one tool call. Uses only public endpoints: the watch page HTML, the timedtext caption API, the oEmbed endpoint, and the channel/playlist RSS feeds. No API key required.
Tools
| Tool | Purpose |
|---|---|
| `video_transcript` | Fetch a video's caption track as timestamped segments + merged full text |
| `video_metadata` | Title, author, thumbnail (via oEmbed) |
| `channel_videos` | Latest videos from a channel (RSS feed) |
| `playlist_videos` | Videos from a public playlist (RSS feed) |
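
As a rough illustration of the public endpoints behind these tools, the sketch below builds the watch-page, oEmbed, and RSS feed URLs. The function names are hypothetical and the Actor's own client may construct its requests differently; the URL patterns are the commonly used public YouTube ones.

```typescript
// URL builders for the public endpoints the tools rely on.
// Function names are hypothetical, chosen for illustration only.

// Watch page for a single video (the transcript scrape starts from this page).
function watchUrl(videoId: string): string {
  return `https://www.youtube.com/watch?v=${encodeURIComponent(videoId)}`;
}

// oEmbed endpoint: returns title, author, and thumbnail as JSON.
function oembedUrl(videoId: string): string {
  return `https://www.youtube.com/oembed?format=json&url=${encodeURIComponent(watchUrl(videoId))}`;
}

// Atom/RSS feed listing a channel's latest videos.
function channelFeedUrl(channelId: string): string {
  return `https://www.youtube.com/feeds/videos.xml?channel_id=${encodeURIComponent(channelId)}`;
}

// Atom/RSS feed listing a public playlist's videos.
function playlistFeedUrl(playlistId: string): string {
  return `https://www.youtube.com/feeds/videos.xml?playlist_id=${encodeURIComponent(playlistId)}`;
}
```

None of these URLs needs an API key, which is why the tools can run on public endpoints alone.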
Input
{
  "tool": "video_transcript",
  "args": { "video_id": "dQw4w9WgXcQ", "language": "en", "allow_auto": true }
}
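
The input is a tool name plus an args object, so it maps naturally onto a discriminated union. The sketch below is dependency-free and only checks the one required string argument per tool; the project itself uses Zod schemas in types.ts. Only the video_transcript args appear in this README, so the argument names for the other three tools, and the `parseInput` helper itself, are assumptions.

```typescript
// Hypothetical input shapes. Only the video_transcript args are documented;
// the other argument names are assumptions made for illustration.
type ToolInput =
  | { tool: "video_transcript"; args: { video_id: string; language?: string; allow_auto?: boolean } }
  | { tool: "video_metadata"; args: { video_id: string } }
  | { tool: "channel_videos"; args: { channel_id: string } }
  | { tool: "playlist_videos"; args: { playlist_id: string } };

// The one required string argument for each tool (assumed, see above).
const REQUIRED_ARG: Record<ToolInput["tool"], string> = {
  video_transcript: "video_id",
  video_metadata: "video_id",
  channel_videos: "channel_id",
  playlist_videos: "playlist_id",
};

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Validate raw input and narrow it to ToolInput, or throw a descriptive error.
// Optional arguments (language, allow_auto) are passed through unchecked here.
function parseInput(raw: unknown): ToolInput {
  if (!isRecord(raw)) throw new Error("input must be an object");
  const tool = raw.tool;
  if (typeof tool !== "string" || !Object.prototype.hasOwnProperty.call(REQUIRED_ARG, tool)) {
    throw new Error(`unknown tool: ${String(tool)}`);
  }
  const args = raw.args;
  if (!isRecord(args)) throw new Error("args must be an object");
  const key = REQUIRED_ARG[tool as ToolInput["tool"]];
  if (typeof args[key] !== "string" || args[key] === "") {
    throw new Error(`${tool} requires a non-empty string "${key}"`);
  }
  return { tool, args } as unknown as ToolInput;
}
```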
Example output: video_transcript
{
  "video_id": "dQw4w9WgXcQ",
  "language": "en",
  "language_name": "English",
  "is_auto_generated": false,
  "segments": [
    { "start_s": 0.0, "duration_s": 2.0, "text": "Hello world." },
    { "start_s": 2.0, "duration_s": 1.5, "text": "Welcome to the talk." }
  ],
  "full_text": "Hello world. Welcome to the talk."
}
If no captions are available the response is { video_id, not_found: true, reason: "no_transcript_available" }.
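
A caller therefore has to distinguish the two response shapes before reading segments. The following is a minimal sketch of one way to do that with a type guard; the interface and function names are hypothetical, while the field names are taken from the examples above.

```typescript
interface Segment { start_s: number; duration_s: number; text: string }

interface TranscriptFound {
  video_id: string;
  language: string;
  language_name: string;
  is_auto_generated: boolean;
  segments: Segment[];
  full_text: string;
}

interface TranscriptNotFound { video_id: string; not_found: true; reason: string }

type TranscriptResult = TranscriptFound | TranscriptNotFound;

// Type guard: true when the Actor reported that no transcript could be returned.
function isNotFound(r: TranscriptResult): r is TranscriptNotFound {
  return "not_found" in r && r.not_found === true;
}

// Produce a one-line summary for either shape of the response.
function summarize(r: TranscriptResult): string {
  if (isNotFound(r)) return `no transcript for ${r.video_id}: ${r.reason}`;
  return `${r.segments.length} segments (${r.language}), ${r.full_text.length} chars`;
}
```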
Example output: channel_videos
{
  "channel_id": "UCsomechannel",
  "returned": 15,
  "videos": [
    {
      "video_id": "vidaaa11111",
      "title": "First Video",
      "author": "Sample Channel",
      "published_at": "2026-05-01T00:00:00+00:00",
      "watch_url": "https://www.youtube.com/watch?v=vidaaa11111"
    }
  ]
}
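
To show how feed entries could map onto the `videos` array above, here is a small regex-based sketch. It assumes the feed is Atom XML with `<entry>` elements containing `<yt:videoId>`, `<title>`, `<author><name>`, and `<published>`; the helper names are hypothetical, and the Actor's real parser may well use a proper XML library instead.

```typescript
interface FeedVideo {
  video_id: string;
  title: string;
  author: string;
  published_at: string;
  watch_url: string;
}

// Raw inner content of the first <tag>...</tag> in xml, or "" if absent.
function rawTag(xml: string, tag: string): string {
  const m = new RegExp("<" + tag + "(?:\\s[^>]*)?>([\\s\\S]*?)</" + tag + ">").exec(xml);
  return m ? m[1].trim() : "";
}

// Decode the five predefined XML entities (&amp; last, to avoid double decoding).
function decodeEntities(s: string): string {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&");
}

// Turn an Atom feed document into the flat video list shown in the example output.
function parseFeed(xml: string): FeedVideo[] {
  const videos: FeedVideo[] = [];
  const entryRe = /<entry>([\s\S]*?)<\/entry>/g;
  let m: RegExpExecArray | null;
  while ((m = entryRe.exec(xml)) !== null) {
    const entry = m[1];
    const videoId = rawTag(entry, "yt:videoId");
    if (videoId === "") continue; // skip entries without a video id
    videos.push({
      video_id: videoId,
      title: decodeEntities(rawTag(entry, "title")),
      author: decodeEntities(rawTag(rawTag(entry, "author"), "name")),
      published_at: rawTag(entry, "published"),
      watch_url: "https://www.youtube.com/watch?v=" + videoId,
    });
  }
  return videos;
}
```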
Run locally
npm install
npm run build
apify run --input-file=./examples/transcript.json
Tests
npm test
22 tests across test/client.test.ts (watch-page parser + caption picker + json3 parser + RSS parser + oEmbed + retry) and test/tools.test.ts (the four tools end-to-end with a mocked fetch).
How transcript extraction works
- Fetch `https://www.youtube.com/watch?v=<id>` with a normal browser UA.
- Parse the `ytInitialPlayerResponse` JSON embedded in the page.
- Walk `captions.playerCaptionsTracklistRenderer.captionTracks[]` and pick the best match for the requested language (or fall back to the first available, optionally skipping auto-generated tracks).
- Fetch the picked track's `baseUrl` with `&fmt=json3` to get structured `{events: [{tStartMs, dDurationMs, segs}]}` data.
- Flatten + clean into `[{ start_s, duration_s, text }]`.
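
The track-picking and flattening steps can be sketched roughly as follows. This is an illustration under assumptions, not the Actor's actual code: the function names are hypothetical, `kind: "asr"` as the marker for auto-generated tracks and `utf8` as the text field of each seg are based on commonly observed payloads, and the preference for manual tracks within a language is one reasonable reading of "best match".

```typescript
// Assumed shapes: `kind === "asr"` marks an auto-generated track and each seg
// carries its text in `utf8`. Both are assumptions, not documented guarantees.
interface CaptionTrack { baseUrl: string; languageCode: string; kind?: string }
interface Json3Event { tStartMs?: number; dDurationMs?: number; segs?: { utf8?: string }[] }
interface Segment { start_s: number; duration_s: number; text: string }

// Pick a track for the requested language, else fall back to the first usable one.
function pickTrack(tracks: CaptionTrack[], language: string, allowAuto: boolean): CaptionTrack | undefined {
  const usable = tracks.filter((t) => allowAuto || t.kind !== "asr");
  const sameLang = usable.filter((t) => t.languageCode === language);
  // Within the requested language, prefer a manual track over an auto-generated one.
  const manual = sameLang.filter((t) => t.kind !== "asr");
  return manual[0] || sameLang[0] || usable[0];
}

// Append fmt=json3 to the track URL, respecting any existing query string.
function json3Url(baseUrl: string): string {
  return baseUrl + (baseUrl.indexOf("?") === -1 ? "?" : "&") + "fmt=json3";
}

// Flatten json3 events into timestamped segments, dropping empty ones.
function flattenEvents(events: Json3Event[]): Segment[] {
  const out: Segment[] = [];
  for (const ev of events) {
    const text = (ev.segs || [])
      .map((s) => s.utf8 || "")
      .join("")
      .replace(/\s+/g, " ") // collapse newlines and runs of whitespace
      .trim();
    if (text === "") continue; // skip newline-only or empty events
    out.push({
      start_s: (ev.tStartMs || 0) / 1000,
      duration_s: (ev.dDurationMs || 0) / 1000,
      text,
    });
  }
  return out;
}

// Merge segment texts into the single full_text string.
function fullText(segments: Segment[]): string {
  return segments.map((s) => s.text).join(" ");
}
```

Keeping these steps as pure functions means they can be tested against fixtures without any network access.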
Because this is a scrape of public surfaces, YouTube can break it. The actor logs the failure mode and returns not_found with a reason rather than crashing.
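
One way to get that behaviour is to wrap each scrape step so that any thrown error is logged and converted into a not_found result. The sketch below assumes a hypothetical `orNotFound` helper and an injectable logger; the only reason string documented in this README is "no_transcript_available", so any other reason value would be an assumption.

```typescript
interface NotFound { video_id: string; not_found: true; reason: string }

// Run `work`; if it throws, log the failure mode and return a not_found result
// instead of letting the error crash the run. `orNotFound` is a hypothetical helper.
async function orNotFound<T>(
  videoId: string,
  reason: string,
  work: () => Promise<T>,
  log: (msg: string) => void = (msg) => console.warn(msg),
): Promise<T | NotFound> {
  try {
    return await work();
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    log(`[${videoId}] ${reason}: ${detail}`);
    return { video_id: videoId, not_found: true, reason };
  }
}
```

A tool runner could then call `orNotFound(videoId, "no_transcript_available", () => fetchTranscript(videoId))`, where `fetchTranscript` stands in for whatever function performs the actual scrape.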
Architecture
src/
  main.ts              Apify entry: reads input, dispatches to runTool
  tools/index.ts       Single dispatcher; one async runner per tool
  youtube-client.ts    Watch-page scraper + json3 parser + oEmbed + RSS parser
  types.ts             Zod schemas for input + outputs
test/
  fixtures.ts          Realistic watch HTML + transcript + RSS + oEmbed fixtures
  client.test.ts       16 tests on the HTTP/parsing layer
  tools.test.ts        6 tests on the dispatcher
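
The "single dispatcher, one async runner per tool" layout could look roughly like the sketch below. `runTool` is named in the layout above, but its signature, the `makeDispatcher` factory, and the runner types are assumptions made for illustration.

```typescript
type ToolName = "video_transcript" | "video_metadata" | "channel_videos" | "playlist_videos";
type ToolArgs = Record<string, unknown>;
type ToolRunner = (args: ToolArgs) => Promise<unknown>;

// Build a runTool function from a table holding one async runner per tool.
// Injecting the table keeps the dispatcher testable with stub runners.
function makeDispatcher(runners: Record<ToolName, ToolRunner>) {
  return async function runTool(tool: string, args: ToolArgs): Promise<unknown> {
    // Own-property check so names like "toString" are not treated as tools.
    if (!Object.prototype.hasOwnProperty.call(runners, tool)) {
      throw new Error(`unknown tool: ${tool}`);
    }
    return runners[tool as ToolName](args);
  };
}
```

Because each Actor run is one tool call, the entry point only needs to read the input, call `runTool(input.tool, input.args)` once, and store the result.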
License
MIT