YouTube Transcript API & MCP Server | Audio to Text
Pricing
from $50.00 / 1,000 transcripts (MCP)
Extract full verbatim transcripts from any YouTube video, even without captions. Works as a standalone batch API or an MCP server for AI agents. Pay per transcript.
Developer
Andok
Last modified
21 days ago
YouTube Transcript MCP Server for AI Agents
Give your AI agents real-time access to YouTube video transcripts through the Model Context Protocol. Unlike caption-based scrapers, this actor uses Google Gemini to transcribe directly from audio — so it works on videos without subtitles, with poor auto-captions, or in any language. Connect it to Claude, GPT, or any MCP-compatible agent and let them pull transcripts on demand.
Features
- Works on any video — AI-powered transcription from audio, not dependent on YouTube captions
- MCP-native — exposes a `transcribe_youtube` tool via the Model Context Protocol for agent integration
- Speaker identification — automatically labels different speakers in the transcript
- Verbatim output — every spoken word captured, no summarization or shortcuts
- One-shot mode — also runs as a standard Apify Actor for single-video batch processing
- Bring your own key — uses your Google Gemini API key (free tier available)
MCP Setup
Add this server to any MCP-compatible AI agent's configuration:
    {
      "mcpServers": {
        "youtube-transcript": {
          "type": "url",
          "url": "https://actors-mcp-server.apify.actor/sse",
          "headers": {
            "Authorization": "Bearer YOUR_APIFY_TOKEN"
          }
        }
      }
    }
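If you prefer to generate this configuration rather than paste a token into a file by hand, a minimal Python sketch could look like the following. The helper name `build_mcp_config` and the `APIFY_TOKEN` environment variable are assumptions made for illustration; only the JSON structure comes from the setup above.

```python
import json
import os


def build_mcp_config(apify_token: str) -> dict:
    """Return the MCP client configuration from the setup above as a dict."""
    if not apify_token:
        raise ValueError("An Apify API token is required")
    return {
        "mcpServers": {
            "youtube-transcript": {
                "type": "url",
                "url": "https://actors-mcp-server.apify.actor/sse",
                "headers": {"Authorization": f"Bearer {apify_token}"},
            }
        }
    }


if __name__ == "__main__":
    # APIFY_TOKEN is an assumed variable name; fall back to the placeholder.
    token = os.environ.get("APIFY_TOKEN", "YOUR_APIFY_TOKEN")
    print(json.dumps(build_mcp_config(token), indent=2))
```

Reading the token from the environment keeps it out of version control; where the resulting JSON needs to be saved depends on your MCP client.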
MCP Tool: transcribe_youtube
| Parameter | Type | Required | Description |
|---|---|---|---|
| `videoUrl` | string | Yes | YouTube video URL or video ID to transcribe |
| `includeMetadata` | boolean | No | Include word count and speakers list in the response |
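MCP clients normally build tool calls for you, but it can help to see what a call with these parameters looks like on the wire. The sketch below constructs a JSON-RPC 2.0 `tools/call` message as defined by the Model Context Protocol. The helper name `build_tool_call` is hypothetical, and the sketch leaves `includeMetadata` out unless you set it, because the default is not documented here.

```python
import json
from typing import Any, Dict, Optional


def build_tool_call(
    video_url: str,
    include_metadata: Optional[bool] = None,
    request_id: int = 1,
) -> Dict[str, Any]:
    """Build an MCP `tools/call` request for the transcribe_youtube tool."""
    if not video_url.strip():
        raise ValueError("videoUrl is required")
    arguments: Dict[str, Any] = {"videoUrl": video_url}
    # Only send the optional flag when the caller chose a value.
    if include_metadata is not None:
        arguments["includeMetadata"] = include_metadata
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "transcribe_youtube", "arguments": arguments},
    }


if __name__ == "__main__":
    call = build_tool_call("https://youtu.be/jNQXAC9IVRw", include_metadata=True)
    print(json.dumps(call, indent=2))
```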
Input
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `geminiApiKey` | string | Yes | — | Your Google Gemini API key for AI transcription. Get one free at Google AI Studio. |
| `videoUrl` | string | No | — | YouTube URL for one-shot mode. When provided with the API key, runs a single transcription instead of starting the MCP server. |
Input Example (One-Shot Mode)
    {
      "geminiApiKey": "your-gemini-api-key",
      "videoUrl": "https://www.youtube.com/watch?v=jNQXAC9IVRw"
    }
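One-shot mode can also be started programmatically. The sketch below prepares a request to the Apify API's synchronous "run Actor and get dataset items" endpoint using only the standard library. The actor ID is not stated in this README, so `ACTOR_ID` is a placeholder you would need to replace, and the helper names are hypothetical.

```python
import json
import urllib.request

# Placeholder: replace with this actor's real ID ("username~actor-name").
ACTOR_ID = "USERNAME~ACTOR-NAME"


def build_run_request(
    apify_token: str, actor_id: str, gemini_api_key: str, video_url: str
) -> urllib.request.Request:
    """Prepare a POST that runs the actor once and returns its dataset items."""
    url = f"https://api.apify.com/v2/acts/{actor_id}/run-sync-get-dataset-items"
    body = json.dumps({"geminiApiKey": gemini_api_key, "videoUrl": video_url})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {apify_token}",
        },
    )


def run_once(request: urllib.request.Request, timeout: float = 300.0) -> list:
    """Send the request and decode the JSON array of dataset items."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For long videos a synchronous call may time out before the run finishes; starting the run asynchronously and reading the dataset afterwards is likely the safer pattern in that case.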
Output
Each transcription produces a structured result with the full transcript text, word count, and speaker labels.
Key output fields:
| Field | Type | Description |
|---|---|---|
| `videoUrl` | string | The video URL that was transcribed |
| `title` | string | Video title |
| `transcript` | string | Full verbatim transcript with speaker labels |
| `wordCount` | number | Total word count of the transcript |
| `speakers` | array | List of identified speakers |
| `processedAt` | string | ISO timestamp of when the transcription completed |
Output Example
    {
      "videoUrl": "https://www.youtube.com/watch?v=jNQXAC9IVRw",
      "title": "Introduction to Machine Learning — Full Course",
      "transcript": "Speaker 1: Welcome to today's lecture on machine learning fundamentals. We'll start with supervised learning. Speaker 2: Thanks for having me. Let me walk through the key concepts...",
      "wordCount": 4820,
      "speakers": ["Speaker 1", "Speaker 2"],
      "error": null,
      "processedAt": "2026-03-01T20:00:00.000Z"
    }
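Because the transcript arrives as a single string, downstream analysis usually starts by splitting it into speaker turns. The following sketch assumes labels always take the form `Speaker N:` as in the example above; if the actor emits other label styles (for example, real names), the pattern would need adjusting. The function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Assumption: every turn begins with a label such as "Speaker 1:".
SPEAKER_LABEL = re.compile(r"(Speaker \d+):\s*")


def split_turns(transcript: str) -> List[Tuple[str, str]]:
    """Split a labelled transcript into (speaker, text) pairs in order."""
    # With a capturing group, re.split returns
    # [text_before_first_label, label1, text1, label2, text2, ...].
    parts = SPEAKER_LABEL.split(transcript)
    return [
        (label, text.strip())
        for label, text in zip(parts[1::2], parts[2::2])
    ]


def turns_from_result(result: Dict) -> List[Tuple[str, str]]:
    """Return the speaker turns of one output record, or raise on error."""
    if result.get("error"):
        raise RuntimeError(f"Transcription failed: {result['error']}")
    return split_turns(result.get("transcript", ""))
```

Any text that appears before the first label is dropped by this sketch, which is acceptable for the example format but worth checking against real output.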
Pricing
Pay per event on the Apify platform.
| Event | Description |
|---|---|
| Transcript (MCP) | One video transcribed via AI |
Note: You also pay for Gemini API usage through your own API key. The Gemini free tier covers a generous amount of transcription.
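For budgeting, the listed starting price of $50.00 per 1,000 transcripts works out to $0.05 per video on the Apify side. The sketch below does that arithmetic; it assumes the starting price applies to every transcript and excludes Gemini API charges, which are billed separately through your own key. The function name is hypothetical.

```python
from decimal import Decimal

# Listed starting price: $50.00 per 1,000 transcripts (Apify platform fee only).
PRICE_PER_1000 = Decimal("50.00")


def estimate_platform_cost(transcripts: int) -> Decimal:
    """Estimate the Apify fee in USD for a number of transcripts."""
    if transcripts < 0:
        raise ValueError("transcripts must be non-negative")
    cost = PRICE_PER_1000 * transcripts / 1000
    return cost.quantize(Decimal("0.01"))


if __name__ == "__main__":
    for count in (1, 250, 1000):
        print(count, "transcripts:", estimate_platform_cost(count), "USD")
```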
Trying It Out
To test the actor before committing to a workflow:
- Get a free Gemini API key at Google AI Studio
- Run the actor in one-shot mode with a short video URL
- Review the transcript output in the Apify Console dataset view
For MCP integration, configure your AI agent using the MCP setup above and provide your Apify token for authentication.
Use Cases
- AI agent research — let agents pull video transcripts on demand during multi-step reasoning
- Uncaptioned video access — transcribe videos that have no captions or only poor auto-generated ones
- Multilingual transcription — get transcripts from videos in any spoken language
- Podcast analysis — extract full conversations with speaker labels for sentiment or topic analysis
- Accessibility workflows — generate transcripts for video libraries that lack subtitles
Related Actors
| Actor | What it adds |
|---|---|
| YouTube Transcript Scraper | Faster, free caption extraction when videos already have subtitles — no API key needed |
| YouTube Video Metadata Extractor | Get views, likes, tags, and SEO data to complement your transcripts |
| YouTube Subscriber Counter | Track channel growth metrics alongside video-level transcript data |
Supported URL Formats
- https://www.youtube.com/watch?v=VIDEO_ID
- https://youtu.be/VIDEO_ID
- https://youtube.com/embed/VIDEO_ID
- Just the video ID: dQw4w9WgXcQ
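If you want to validate or normalize inputs before sending them to the actor, the formats listed above can be reduced to a bare video ID on the client side. This is a sketch of one way to do that, not the actor's own parsing logic. It assumes video IDs are 11 characters drawn from letters, digits, `-`, and `_`, which matches current YouTube IDs but is not a documented guarantee; `extract_video_id` is a hypothetical helper.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: IDs are 11 characters from [A-Za-z0-9_-].
VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> str:
    """Return the video ID from a watch, short, or embed URL, or a bare ID."""
    value = value.strip()
    if VIDEO_ID.fullmatch(value):
        return value  # already a bare ID

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]

    candidate = ""
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com":
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        elif parsed.path.startswith("/embed/"):
            candidate = parsed.path[len("/embed/"):].split("/")[0]

    if VIDEO_ID.fullmatch(candidate):
        return candidate
    raise ValueError(f"Unsupported YouTube URL or ID: {value!r}")
```

Rejecting unknown hosts early avoids paying for a run that would fail on an unsupported URL.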
Notes
- Transcription quality depends on audio clarity. Background music or overlapping speakers may reduce accuracy.
- Very long videos (2+ hours) may take several minutes to process.
- The Gemini API key is never stored — it is used only during the actor run.
