YouTube Transcript API & MCP Server | Audio to Text

Pricing

from $50.00 / 1,000 transcripts (MCP)

Extract full verbatim transcripts from any YouTube video, even without captions. Works as a standalone batch API or an MCP server for AI agents. Pay per transcript.

Developer

Andok

Maintained by Community

Actor stats

  • Bookmarked: 1
  • Total users: 1
  • Monthly active users: 1
  • Last modified: 21 days ago

YouTube Transcript MCP Server for AI Agents

Give your AI agents real-time access to YouTube video transcripts through the Model Context Protocol. Unlike caption-based scrapers, this actor uses Google Gemini to transcribe directly from audio — so it works on videos without subtitles, with poor auto-captions, or in any language. Connect it to Claude, GPT, or any MCP-compatible agent and let them pull transcripts on demand.

Features

  • Works on any video — AI-powered transcription from audio, not dependent on YouTube captions
  • MCP-native — exposes a transcribe_youtube tool via the Model Context Protocol for agent integration
  • Speaker identification — automatically labels different speakers in the transcript
  • Verbatim output — every spoken word captured, no summarization or shortcuts
  • One-shot mode — also runs as a standard Apify Actor for single-video batch processing
  • Bring your own key — uses your Google Gemini API key (free tier available)

MCP Setup

Add this server to any MCP-compatible AI agent's configuration:

{
  "mcpServers": {
    "youtube-transcript": {
      "type": "url",
      "url": "https://actors-mcp-server.apify.actor/sse",
      "headers": {
        "Authorization": "Bearer YOUR_APIFY_TOKEN"
      }
    }
  }
}

MCP Tool: transcribe_youtube

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| videoUrl | string | Yes | YouTube video URL or video ID to transcribe |
| includeMetadata | boolean | No | Include word count and speakers list in the response |
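
As a rough illustration of how an agent might invoke this tool: MCP clients call tools with a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool name and an `arguments` object. The sketch below only builds that message and sends nothing. The helper name `build_transcribe_request` and the request `id` are assumptions for illustration, the tool name is taken from the table above (the server may expose it under a different or prefixed name), and the transport is left to your MCP client.

```python
import json
from typing import Any, Dict, Optional


def build_transcribe_request(video_url: str,
                             include_metadata: Optional[bool] = None,
                             request_id: int = 1) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` message for transcribe_youtube.

    Hypothetical helper: it assembles the payload only.
    """
    if not video_url:
        raise ValueError("videoUrl is required")
    arguments: Dict[str, Any] = {"videoUrl": video_url}
    # includeMetadata is optional, so it is sent only when explicitly set.
    if include_metadata is not None:
        arguments["includeMetadata"] = include_metadata
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "transcribe_youtube", "arguments": arguments},
    }


if __name__ == "__main__":
    request = build_transcribe_request("https://youtu.be/jNQXAC9IVRw", include_metadata=True)
    print(json.dumps(request, indent=2))
```

In practice an MCP client library builds this message for you; the sketch is mainly useful for checking which argument names the tool expects.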

Input

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| geminiApiKey | string | Yes | | Your Google Gemini API key for AI transcription. Get one free at Google AI Studio. |
| videoUrl | string | No | | YouTube URL for one-shot mode. When provided with the API key, runs a single transcription instead of starting the MCP server. |

Input Example (One-Shot Mode)

{
  "geminiApiKey": "your-gemini-api-key",
  "videoUrl": "https://www.youtube.com/watch?v=jNQXAC9IVRw"
}
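
A minimal sketch of starting a one-shot run from Python, assuming the `apify-client` package is installed and your token is in an `APIFY_TOKEN` environment variable (that variable name is an assumption). The Actor ID is not given on this page, so `actor_id` is a placeholder you must supply. The helper names are illustrative, and reading results from the run's default dataset is assumed from the dataset view mentioned under Trying It Out.

```python
import os
from typing import Dict, List, Optional


def build_run_input(gemini_api_key: str, video_url: Optional[str] = None) -> Dict[str, str]:
    """Assemble the Actor input described in the Input table."""
    if not gemini_api_key:
        raise ValueError("geminiApiKey is required")
    run_input = {"geminiApiKey": gemini_api_key}
    # videoUrl is optional; including it selects one-shot mode.
    if video_url:
        run_input["videoUrl"] = video_url
    return run_input


def run_one_shot(actor_id: str, run_input: Dict[str, str]) -> List[dict]:
    """Start a run, wait for it to finish, and return its dataset items."""
    # Imported lazily so build_run_input works without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Usage would look like `run_one_shot("<ACTOR_ID>", build_run_input(key, "https://www.youtube.com/watch?v=jNQXAC9IVRw"))`, with `<ACTOR_ID>` replaced by the identifier shown in your Apify Console.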

Output

Each transcription produces a structured result with the full transcript text, word count, and speaker labels.

Key output fields:

| Field | Type | Description |
| --- | --- | --- |
| videoUrl | string | The video URL that was transcribed |
| title | string | Video title |
| transcript | string | Full verbatim transcript with speaker labels |
| wordCount | number | Total word count of the transcript |
| speakers | array | List of identified speakers |
| processedAt | string | ISO timestamp of when the transcription completed |

Output Example

{
  "videoUrl": "https://www.youtube.com/watch?v=jNQXAC9IVRw",
  "title": "Introduction to Machine Learning — Full Course",
  "transcript": "Speaker 1: Welcome to today's lecture on machine learning fundamentals. We'll start with supervised learning. Speaker 2: Thanks for having me. Let me walk through the key concepts...",
  "wordCount": 4820,
  "speakers": ["Speaker 1", "Speaker 2"],
  "error": null,
  "processedAt": "2026-03-01T20:00:00.000Z"
}
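
For downstream analysis it can help to break the single `transcript` string into per-speaker turns. The sketch below is one possible approach, not part of the actor: it assumes every turn begins with a label from `speakers` followed by a colon, as in the example above, and the function name `split_turns` is hypothetical. A spoken phrase that happens to contain a label and a colon would be split incorrectly.

```python
import re
from typing import Any, Dict, List, Tuple


def split_turns(result: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Split `transcript` into (speaker, text) pairs using the `speakers` list."""
    if result.get("error"):
        raise RuntimeError(f"transcription failed: {result['error']}")
    transcript = result["transcript"]
    speakers = result.get("speakers") or []
    if not speakers:
        # No labels to split on: return the whole text as one unlabeled turn.
        return [("", transcript.strip())]
    labels = "|".join(re.escape(label) for label in speakers)
    # With one capturing group, re.split yields
    # [preamble, label, text, label, text, ...].
    parts = re.split(rf"({labels}):\s*", transcript)
    turns = [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts) - 1, 2)]
    # Keep any text that appears before the first label as an unlabeled turn.
    if parts[0].strip():
        turns.insert(0, ("", parts[0].strip()))
    return turns
```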

Pricing

Pay per event on the Apify platform.

| Event | Description |
| --- | --- |
| Transcript (MCP) | One video transcribed via AI |

Note: You also pay for Gemini API usage through your own API key. The Gemini free tier covers a generous amount of transcription.
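
To budget the Apify side of the bill, the listed starting price of $50.00 per 1,000 transcripts works out to $0.05 per video. The sketch below does that arithmetic. It assumes the starting price applies to your plan, which may not hold, and it excludes Gemini API charges; the function name is illustrative.

```python
from decimal import Decimal

# Listed starting price: $50.00 per 1,000 transcripts (your tier may differ).
PRICE_PER_1000 = Decimal("50.00")


def apify_event_cost(transcripts: int, price_per_1000: Decimal = PRICE_PER_1000) -> Decimal:
    """Estimated Apify charge in USD, excluding any Gemini API usage."""
    if transcripts < 0:
        raise ValueError("transcripts must be non-negative")
    return (price_per_1000 * transcripts / 1000).quantize(Decimal("0.01"))
```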

Trying It Out

To test the actor before committing to a workflow:

  1. Get a free Gemini API key at Google AI Studio
  2. Run the actor in one-shot mode with a short video URL
  3. Review the transcript output in the Apify Console dataset view

For MCP integration, configure your AI agent using the MCP setup above and provide your Apify token for authentication.

Use Cases

  • AI agent research — let agents pull video transcripts on demand during multi-step reasoning
  • Uncaptioned video access — transcribe videos that have no captions or only poor auto-generated ones
  • Multilingual transcription — get transcripts from videos in any spoken language
  • Podcast analysis — extract full conversations with speaker labels for sentiment or topic analysis
  • Accessibility workflows — generate transcripts for video libraries that lack subtitles

Related Actors

| Actor | What it adds |
| --- | --- |
| YouTube Transcript Scraper | Faster, free caption extraction when videos already have subtitles; no API key needed |
| YouTube Video Metadata Extractor | Get views, likes, tags, and SEO data to complement your transcripts |
| YouTube Subscriber Counter | Track channel growth metrics alongside video-level transcript data |

Supported URL Formats

  • https://www.youtube.com/watch?v=VIDEO_ID
  • https://youtu.be/VIDEO_ID
  • https://youtube.com/embed/VIDEO_ID
  • Just the video ID: dQw4w9WgXcQ
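
If you want to validate input on the client before spending a paid run, the four formats above can be normalized to a bare video ID. This is a sketch of client-side checking only; the actor's own parsing rules are not documented here. It assumes the usual 11-character ID alphabet, and URLs without a scheme (for example `youtube.com/watch?v=...`) are rejected.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters from [A-Za-z0-9_-].
_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID for the listed formats, or None if unrecognized."""
    value = value.strip()
    if _ID_RE.fullmatch(value):
        return value  # already a bare video ID
    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    candidate = None
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com":
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith("/embed/"):
            candidate = parsed.path[len("/embed/"):].split("/")[0]
    if candidate and _ID_RE.fullmatch(candidate):
        return candidate
    return None
```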

Notes

  • Transcription quality depends on audio clarity. Background music or overlapping speakers may reduce accuracy.
  • Very long videos (2+ hours) may take several minutes to process.
  • The Gemini API key is never stored — it is used only during the actor run.