Actor Youtube Transcript
Under maintenance
Extract transcripts from any YouTube video, with no API key needed. Supports batch processing, parallel fetching, auto-retry with residential proxies, and multi-language captions. Output is LLM-ready Markdown, built for RAG pipelines, LangChain, LlamaIndex, and AI automation workflows.
Pricing: Pay per usage
Developer: Foudhil Riahi
Maintained by Community
YouTube Transcript Extractor — RAG & AI Ready
Extract clean, structured transcripts from any captioned YouTube video in seconds. No YouTube API key required. Batch-ready. Built for AI pipelines that need to index video content at scale.
Why developers choose this actor
| Feature | This actor |
|---|---|
| API key required | ❌ None needed |
| Batch processing | ✅ Hundreds of videos per run |
| Parallel fetching | ✅ Up to 10 concurrent videos |
| Timestamp support | ✅ Optional [MM:SS] per line |
| Multi-language | ✅ Any language, priority order |
| Auto-retry on block | ✅ 3 attempts, fresh proxy each time |
| LLM-ready output | ✅ Clean paragraphs, no post-processing |
| Cloud IP bypass | ✅ Residential proxy routing built-in |
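The auto-retry row can be pictured as a small retry loop. The sketch below is an assumption about the general pattern, not the actor's actual implementation: `BlockedError`, `fetch_with_retry`, and the `fetch` callable are hypothetical names introduced only for illustration.

```python
from typing import Callable, Optional, Sequence


class BlockedError(Exception):
    """Hypothetical error meaning the request was blocked by the remote side."""


def fetch_with_retry(
    fetch: Callable[[str, str], str],
    video_id: str,
    proxies: Sequence[str],
    max_attempts: int = 3,
) -> str:
    """Call `fetch(video_id, proxy)` up to `max_attempts` times.

    Each attempt uses the next proxy in `proxies`, so a blocked attempt
    is never repeated through the same proxy while unused ones remain.
    """
    if not proxies:
        raise ValueError("at least one proxy is required")
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        proxy = proxies[attempt % len(proxies)]
        try:
            return fetch(video_id, proxy)
        except BlockedError as exc:
            last_error = exc  # remember the failure and try the next proxy
    raise RuntimeError(
        f"{video_id}: still blocked after {max_attempts} attempts"
    ) from last_error
```

Only `BlockedError` triggers a retry here; any other exception propagates immediately, which keeps permanent failures such as disabled captions from being retried.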
Input
| Field | Type | Required | Description |
|---|---|---|---|
| `videoUrl` | string | one of the two | Single YouTube URL or video ID |
| `videoUrls` | array | one of the two | List of URLs/IDs for batch mode |
| `includeTimestamps` | boolean | no | Prefix each line with `[MM:SS]` (default: `false`) |
| `languages` | array | no | Language priority list (default: `["en","en-US","en-GB"]`) |
| `maxConcurrency` | integer | no | Parallel videos, 1–10 (default: `3`) |
| `proxyConfiguration` | object | recommended | Use Apify Residential proxies to bypass YouTube cloud IP blocks |
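A small helper can assemble the input object from the fields in the table and catch out-of-range values before a run is started. This is a minimal sketch: `build_run_input` is a hypothetical client-side helper, and the checks only mirror the documented ranges and defaults.

```python
from typing import Any, Dict, List, Optional


def build_run_input(
    video_urls: List[str],
    include_timestamps: bool = False,
    languages: Optional[List[str]] = None,
    max_concurrency: int = 3,
    use_residential_proxy: bool = True,
) -> Dict[str, Any]:
    """Build an input dict using the field names from the table above."""
    if not video_urls:
        raise ValueError("provide at least one URL or video ID")
    if not 1 <= max_concurrency <= 10:
        raise ValueError("maxConcurrency must be between 1 and 10")

    run_input: Dict[str, Any] = {}
    # One video goes into videoUrl, several go into videoUrls.
    if len(video_urls) == 1:
        run_input["videoUrl"] = video_urls[0]
    else:
        run_input["videoUrls"] = list(video_urls)

    run_input["includeTimestamps"] = include_timestamps
    run_input["languages"] = languages or ["en", "en-US", "en-GB"]
    run_input["maxConcurrency"] = max_concurrency
    if use_residential_proxy:
        run_input["proxyConfiguration"] = {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        }
    return run_input
```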
Supported URL formats:
- https://www.youtube.com/watch?v=VIDEO_ID
- https://youtu.be/VIDEO_ID
- https://www.youtube.com/shorts/VIDEO_ID
- https://www.youtube.com/embed/VIDEO_ID
- VIDEO_ID (a bare 11-character ID also works)
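To deduplicate a batch before submitting it, the listed formats can be normalized to a bare video ID on the client side. The function below is a sketch under assumptions: `extract_video_id` is a hypothetical helper, it covers only the formats listed above, and the ID character set (letters, digits, `_`, `-`) is the commonly observed one rather than something this page specifies.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed shape of a video ID: 11 characters from A-Z, a-z, 0-9, "_" and "-".
_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID for a supported URL or bare ID, else None."""
    value = value.strip()
    if _ID_RE.fullmatch(value):
        return value  # already a bare ID

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    candidate = None
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com":
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in ("shorts", "embed"):
                candidate = parts[1]

    if candidate and _ID_RE.fullmatch(candidate):
        return candidate
    return None
```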
Output
Each video produces one result object.
Successful transcript:

    {
      "videoId": "jNQXAC9IVRw",
      "youtubeUrl": "https://www.youtube.com/watch?v=jNQXAC9IVRw",
      "language": "en",
      "wordCount": 39,
      "durationMinutes": 0.3,
      "segmentCount": 9,
      "transcript": "All right, so here we are in front of the elephants...",
      "status": "success"
    }

Failed video (disabled captions, private, etc.):

    {
      "videoId": "abc123",
      "youtubeUrl": "https://www.youtube.com/watch?v=abc123",
      "status": "error",
      "error": "Transcripts are disabled for this video"
    }

| Field | Description |
|---|---|
| `videoId` | 11-character YouTube video ID |
| `youtubeUrl` | Full YouTube URL |
| `language` | Language code of the fetched transcript |
| `wordCount` | Total word count of the transcript |
| `durationMinutes` | Video duration in minutes |
| `segmentCount` | Number of caption segments |
| `transcript` | Clean Markdown text, ready for LLM ingestion |
| `status` | `success` or `error` |
| `error` | Error description (only present on failure) |
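Because every video yields one object with a `status` field, a batch can be split into successes and failures before further processing. The sketch below assumes only the fields documented above; `split_results` and `summarize` are hypothetical helper names.

```python
from typing import Any, Dict, Iterable, List, Tuple

Item = Dict[str, Any]


def split_results(items: Iterable[Item]) -> Tuple[List[Item], List[Item]]:
    """Separate result objects into (successes, failures) by `status`."""
    successes: List[Item] = []
    failures: List[Item] = []
    for item in items:
        if item.get("status") == "success":
            successes.append(item)
        else:
            failures.append(item)
    return successes, failures


def summarize(items: Iterable[Item]) -> Dict[str, Any]:
    """Count outcomes, total the words, and collect error messages per video."""
    successes, failures = split_results(items)
    return {
        "succeeded": len(successes),
        "failed": len(failures),
        "totalWords": sum(item.get("wordCount", 0) for item in successes),
        "errors": {item.get("videoId"): item.get("error") for item in failures},
    }
```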
Example inputs
Single video:

    {
      "videoUrl": "https://www.youtube.com/watch?v=jNQXAC9IVRw",
      "proxyConfiguration": { "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] }
    }

Batch of videos with timestamps:

    {
      "videoUrls": [
        "https://www.youtube.com/watch?v=VIDEO_1",
        "https://www.youtube.com/watch?v=VIDEO_2",
        "https://www.youtube.com/watch?v=VIDEO_3"
      ],
      "includeTimestamps": true,
      "maxConcurrency": 5,
      "proxyConfiguration": { "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] }
    }

Non-English content:

    {
      "videoUrl": "https://www.youtube.com/watch?v=VIDEO_ID",
      "languages": ["fr", "fr-FR", "en"],
      "proxyConfiguration": { "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] }
    }

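One plausible reading of a language priority list is "take the first listed language that the video offers". The sketch below illustrates that reading only; `pick_language` is a hypothetical function, and how the actor actually resolves the list (for example, what it does when nothing matches) is not specified here.

```python
from typing import Iterable, Optional, Sequence


def pick_language(available: Iterable[str], priority: Sequence[str]) -> Optional[str]:
    """Return the first code in `priority` that appears in `available`.

    Returns None when no listed language is available.
    """
    available_set = set(available)
    for code in priority:
        if code in available_set:
            return code
    return None
```

With `["fr", "fr-FR", "en"]`, a video that offers both French and English captions would resolve to `fr`, and one that offers only English would fall back to `en`.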
Use cases
- RAG knowledge bases — ingest entire YouTube channels into vector databases (LangChain, LlamaIndex, Pinecone, Weaviate)
- AI video summarization — feed transcripts to GPT-4 / Claude for summaries, key points, action items
- Content repurposing pipelines — convert video content to blog posts, newsletters, social media threads
- Podcast transcription — extract transcripts from YouTube-hosted podcast episodes
- Competitive intelligence — monitor competitor product demos, webinars, conference talks
- Educational tools — index courses and lectures for search and Q&A
- Multilingual pipelines — extract captions in the original language for translation workflows
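For the RAG use case, a transcript usually has to be split into chunks before it is embedded. The sketch below shows one simple approach, fixed-size word windows with overlap; `chunk_words` and its default sizes are illustrative assumptions, not something the actor does for you.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Split `text` into chunks of up to `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

Each chunk can then be stored alongside the `videoId` and `youtubeUrl` of its source item so that retrieved passages can be traced back to the video.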
Code examples
Python (Apify client)

    from apify_client import ApifyClient

    client = ApifyClient("YOUR_API_TOKEN")

    # Start the actor and wait for the run to finish.
    run = client.actor("foudhilriahi/youtube-transcript-extractor").call(
        run_input={
            "videoUrls": [
                "https://www.youtube.com/watch?v=VIDEO_1",
                "https://www.youtube.com/watch?v=VIDEO_2",
            ],
            "maxConcurrency": 5,
            "proxyConfiguration": {
                "useApifyProxy": True,
                "apifyProxyGroups": ["RESIDENTIAL"],
            },
        }
    )

    # Each dataset item is one result object; skip the failed videos.
    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        if item["status"] == "success":
            print(f"{item['videoId']}: {item['wordCount']} words")
            print(item["transcript"][:200])

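LangChain integration

    from langchain_community.utilities import ApifyWrapper
    from langchain_core.documents import Document

    # ApifyWrapper reads the token from the APIFY_API_TOKEN environment variable.
    apify = ApifyWrapper()

    loader = apify.call_actor(
        actor_id="foudhilriahi/youtube-transcript-extractor",
        run_input={
            "videoUrls": ["https://www.youtube.com/watch?v=VIDEO_ID"],
            "proxyConfiguration": {
                "useApifyProxy": True,
                "apifyProxyGroups": ["RESIDENTIAL"],
            },
        },
        # The mapping function must return a Document, not a plain string.
        dataset_mapping_function=lambda item: Document(
            page_content=item.get("transcript") or "",
            metadata={"source": item.get("youtubeUrl")},
        ),
    )

    docs = loader.load()
    # docs is now ready for your RAG pipeline
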
n8n / Make automation
- Add an Apify node
- Actor ID: `foudhilriahi/youtube-transcript-extractor`
- Input: `{ "videoUrl": "{{ $json.youtubeUrl }}", "proxyConfiguration": { "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] } }`
- Connect output to your vector database, Google Sheets, or email node
- Schedule the workflow to run daily
Pricing
Pay Per Event — you only pay for successful transcripts.
| Volume | Price per transcript |
|---|---|
| Any | $0.05 per successful extraction |
Failed videos (disabled captions, private videos, no captions available) are not charged.
Cost example: 1,000 transcripts = $50. At a typical video length of 30 minutes, that's 30,000 minutes (500 hours) of transcribed content for $50.
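The cost of a finished run can be estimated from its results, since only successful extractions are charged. A minimal sketch, assuming the flat $0.05 rate from the table and the documented `status` field; `estimate_cost` is a hypothetical helper and does not query actual billing data.

```python
from decimal import Decimal
from typing import Any, Dict, Iterable

# Flat rate from the pricing table above.
PRICE_PER_TRANSCRIPT = Decimal("0.05")


def estimate_cost(items: Iterable[Dict[str, Any]]) -> Decimal:
    """Return the estimated charge in USD: successful items times the flat rate."""
    successful = sum(1 for item in items if item.get("status") == "success")
    return successful * PRICE_PER_TRANSCRIPT
```

`Decimal` is used instead of `float` so that totals such as 1,000 × 0.05 come out exact.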
Notes & limitations
- Videos must have captions available (manually added or auto-generated by YouTube)
- Private and age-restricted videos cannot be transcribed
- YouTube blocks cloud IP ranges — residential proxy configuration is required for reliable operation (pre-filled in the input)
- Auto-generated captions may contain minor transcription errors, especially for technical terms