
Actor Youtube Transcript

Under maintenance

Pricing

Pay per usage


Extract transcripts from any YouTube video — no API key needed. Supports batch processing, parallel fetching, auto-retry with residential proxies, and multi-language captions. Output is LLM-ready Markdown, built for RAG pipelines, LangChain, LlamaIndex, and AI automation workflows.


Rating: 0.0 (0 ratings)

Developer: Foudhil Riahi (maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 0 monthly active users, last modified 4 days ago


YouTube Transcript Extractor — RAG & AI Ready

Extract clean, structured transcripts in seconds from any public YouTube video that has captions. No YouTube API key required. Batch-ready. Built for AI pipelines that need to index video content at scale.


Why developers choose this actor

| Feature | This actor |
|---|---|
| API key required | ❌ None needed |
| Batch processing | ✅ Hundreds of videos per run |
| Parallel fetching | ✅ Up to 10 concurrent videos |
| Timestamp support | ✅ Optional [MM:SS] prefix per line |
| Multi-language | ✅ Any language, in priority order |
| Auto-retry on block | ✅ 3 attempts, fresh proxy each time |
| LLM-ready output | ✅ Clean paragraphs, no post-processing |
| Cloud IP bypass | ✅ Built-in residential proxy routing |
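
The retry behaviour in the table is described only at the feature level. A minimal sketch of the "3 attempts, fresh proxy each time" pattern follows; `fetch`, `new_proxy_url`, and `BlockedError` are hypothetical stand-ins (a transcript fetcher that raises when YouTube blocks the current IP, and a source of new residential proxy URLs), not the actor's actual code.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class BlockedError(Exception):
    """Hypothetical error: YouTube rejected the request from the current IP."""


def fetch_with_retry(
    fetch: Callable[[str], T],
    new_proxy_url: Callable[[], str],
    max_attempts: int = 3,
) -> T:
    """Call fetch(proxy_url), switching to a fresh proxy after every block.

    Re-raises the last BlockedError if all attempts are blocked.
    """
    last_error: Optional[BlockedError] = None
    for _ in range(max_attempts):
        proxy_url = new_proxy_url()  # new proxy (and IP) for every attempt
        try:
            return fetch(proxy_url)
        except BlockedError as err:
            last_error = err  # blocked: loop again with another proxy
    if last_error is None:
        raise ValueError("max_attempts must be at least 1")
    raise last_error


# Usage with fake callables: the first two proxies are "blocked".
if __name__ == "__main__":
    proxies = iter(["http://proxy-1", "http://proxy-2", "http://proxy-3"])

    def fake_fetch(proxy_url: str) -> str:
        if proxy_url != "http://proxy-3":
            raise BlockedError(f"blocked via {proxy_url}")
        return "transcript text"

    print(fetch_with_retry(fake_fetch, lambda: next(proxies)))  # prints "transcript text"
```

Only `BlockedError` triggers a retry in this sketch; any other exception propagates immediately, on the assumption that it is not proxy-related.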

Input

| Field | Type | Required | Description |
|---|---|---|---|
| videoUrl | string | One of videoUrl / videoUrls | Single YouTube URL or video ID |
| videoUrls | array | One of videoUrl / videoUrls | List of URLs or IDs for batch mode |
| includeTimestamps | boolean | No | Prefix each line with [MM:SS] (default: false) |
| languages | array | No | Language priority list (default: ["en", "en-US", "en-GB"]) |
| maxConcurrency | integer | No | Number of videos fetched in parallel, 1–10 (default: 3) |
| proxyConfiguration | object | Recommended | Apify Residential proxies, used to bypass YouTube's blocking of cloud IPs |
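
The field table can be checked on the client side before a run starts. A minimal sketch, assuming you send exactly one of `videoUrl` or `videoUrls` (the table only says one of the two is required); `build_run_input` is a hypothetical helper, not part of the actor or of `apify_client`.

```python
from typing import Any, Dict, List, Optional

DEFAULT_LANGUAGES = ["en", "en-US", "en-GB"]  # default from the input table


def build_run_input(
    video_url: Optional[str] = None,
    video_urls: Optional[List[str]] = None,
    include_timestamps: bool = False,
    languages: Optional[List[str]] = None,
    max_concurrency: int = 3,
    use_residential_proxy: bool = True,
) -> Dict[str, Any]:
    """Build an actor input dict using the documented field names and ranges."""
    # Assumption: exactly one of the two URL fields is sent.
    if (video_url is None) == (video_urls is None):
        raise ValueError("provide exactly one of video_url or video_urls")
    if not 1 <= max_concurrency <= 10:
        raise ValueError("max_concurrency must be between 1 and 10")

    run_input: Dict[str, Any] = {
        "includeTimestamps": include_timestamps,
        "languages": list(languages) if languages else list(DEFAULT_LANGUAGES),
        "maxConcurrency": max_concurrency,
    }
    if video_url is not None:
        run_input["videoUrl"] = video_url
    else:
        run_input["videoUrls"] = list(video_urls or [])
    if use_residential_proxy:
        # Residential proxies are recommended to avoid YouTube's cloud IP blocks
        run_input["proxyConfiguration"] = {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        }
    return run_input
```

The returned dict can be passed as `run_input` to `client.actor(...).call()`.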

Supported URL formats:

https://www.youtube.com/watch?v=VIDEO_ID
https://youtu.be/VIDEO_ID
https://www.youtube.com/shorts/VIDEO_ID
https://www.youtube.com/embed/VIDEO_ID
VIDEO_ID ← bare 11-character ID also works
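
Before sending a batch, it can help to normalise inputs to bare IDs, for example to remove duplicates. A sketch that accepts the five formats listed above, using only the Python standard library. It assumes IDs are exactly 11 characters drawn from letters, digits, `-` and `_`, and it is not the actor's own parser.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed ID shape: 11 characters from [A-Za-z0-9_-]
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(value: str) -> Optional[str]:
    """Return the 11-character video ID from a supported URL or bare ID, else None."""
    value = value.strip()
    if VIDEO_ID_RE.fullmatch(value):
        return value  # already a bare ID

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    candidate: Optional[str] = None

    if host == "youtu.be":
        # https://youtu.be/VIDEO_ID
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            # https://www.youtube.com/watch?v=VIDEO_ID
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            # https://www.youtube.com/shorts/VIDEO_ID or /embed/VIDEO_ID
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in ("shorts", "embed"):
                candidate = parts[1]

    if candidate and VIDEO_ID_RE.fullmatch(candidate):
        return candidate
    return None
```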

Output

Each video produces one result object.

Successful transcript:

{
  "videoId": "jNQXAC9IVRw",
  "youtubeUrl": "https://www.youtube.com/watch?v=jNQXAC9IVRw",
  "language": "en",
  "wordCount": 39,
  "durationMinutes": 0.3,
  "segmentCount": 9,
  "transcript": "All right, so here we are in front of the elephants...",
  "status": "success"
}

Failed video (disabled captions, private, etc.):

{
  "videoId": "abc123",
  "youtubeUrl": "https://www.youtube.com/watch?v=abc123",
  "status": "error",
  "error": "Transcripts are disabled for this video"
}

| Field | Description |
|---|---|
| videoId | 11-character YouTube video ID |
| youtubeUrl | Full YouTube URL |
| language | Language code of the fetched transcript |
| wordCount | Total word count of the transcript |
| durationMinutes | Video duration in minutes |
| segmentCount | Number of caption segments |
| transcript | Clean Markdown text, ready for LLM ingestion |
| status | success or error |
| error | Error description (present only on failure) |
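
How the documented fields could relate to raw caption segments is sketched below. This assumes segments arrive as (start time in seconds, text) pairs, that empty segments are dropped before counting, and that the plain-text transcript is a single space-joined paragraph. `Segment` and `build_result` are hypothetical names, and `durationMinutes` is left out because the page does not say how it is derived.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Segment:
    start: float  # seconds from the start of the video (assumed shape)
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as [MM:SS]; minutes keep counting past 59."""
    minutes, secs = divmod(int(seconds), 60)
    return f"[{minutes:02d}:{secs:02d}]"


def build_result(
    video_id: str,
    language: str,
    segments: List[Segment],
    include_timestamps: bool = False,
) -> Dict[str, Any]:
    """Assemble a success object with the field names documented above."""
    # Collapse line breaks and repeated spaces, then drop empty segments
    kept = [(s.start, " ".join(s.text.split())) for s in segments]
    kept = [(start, text) for start, text in kept if text]

    if include_timestamps:
        transcript = "\n".join(f"{format_timestamp(start)} {text}" for start, text in kept)
    else:
        transcript = " ".join(text for _, text in kept)

    return {
        "videoId": video_id,
        "youtubeUrl": f"https://www.youtube.com/watch?v={video_id}",
        "language": language,
        # Count words in the caption text only, never in the timestamp prefixes
        "wordCount": sum(len(text.split()) for _, text in kept),
        "segmentCount": len(kept),
        "transcript": transcript,
        "status": "success",
    }
```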

Example inputs

Single video:

{
  "videoUrl": "https://www.youtube.com/watch?v=jNQXAC9IVRw",
  "proxyConfiguration": { "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] }
}

Batch of videos with timestamps:

{
  "videoUrls": [
    "https://www.youtube.com/watch?v=VIDEO_1",
    "https://www.youtube.com/watch?v=VIDEO_2",
    "https://www.youtube.com/watch?v=VIDEO_3"
  ],
  "includeTimestamps": true,
  "maxConcurrency": 5,
  "proxyConfiguration": { "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] }
}
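
The `maxConcurrency` setting bounds how many videos are fetched at once. A sketch of that idea using `asyncio.Semaphore`, where `fetch_one` is a hypothetical coroutine that returns one result object. Per-video failures become error objects in the shape documented above instead of aborting the batch. This illustrates the technique and is not the actor's implementation.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Result = Dict[str, Any]


async def fetch_all(
    video_ids: List[str],
    fetch_one: Callable[[str], Awaitable[Result]],
    max_concurrency: int = 3,
) -> List[Result]:
    """Fetch every video, never running more than max_concurrency at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(video_id: str) -> Result:
        async with semaphore:  # waits here while max_concurrency fetches are active
            try:
                return await fetch_one(video_id)
            except Exception as err:  # one bad video must not abort the batch
                return {
                    "videoId": video_id,
                    "youtubeUrl": f"https://www.youtube.com/watch?v={video_id}",
                    "status": "error",
                    "error": str(err),
                }

    # gather returns results in the same order as video_ids
    return list(await asyncio.gather(*(guarded(v) for v in video_ids)))
```

Call it with `asyncio.run(fetch_all(ids, fetch_one, max_concurrency=5))`.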

Non-English content:

{
  "videoUrl": "https://www.youtube.com/watch?v=VIDEO_ID",
  "languages": ["fr", "fr-FR", "en"],
  "proxyConfiguration": { "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] }
}
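
The `languages` list is a priority order. A sketch of the selection rule it implies, assuming case-insensitive matching on language codes and returning `None` when nothing matches (the page does not say what the actor does in that case). `pick_language` is a hypothetical helper.

```python
from typing import Iterable, Optional


def pick_language(available: Iterable[str], priority: Iterable[str]) -> Optional[str]:
    """Return the first code in `priority` that the video offers, or None."""
    # Map lower-cased codes back to the spelling the video actually uses
    by_lower = {code.lower(): code for code in available}
    for code in priority:
        match = by_lower.get(code.lower())
        if match is not None:
            return match
    return None


# For the French example above: captions offered in "en" and "fr-FR"
print(pick_language(["en", "fr-FR"], ["fr", "fr-FR", "en"]))  # prints fr-FR
```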

Use cases

  • RAG knowledge bases — ingest entire YouTube channels into vector databases (LangChain, LlamaIndex, Pinecone, Weaviate)
  • AI video summarization — feed transcripts to GPT-4 / Claude for summaries, key points, action items
  • Content repurposing pipelines — convert video content to blog posts, newsletters, social media threads
  • Podcast transcription — extract transcripts from YouTube-hosted podcast episodes
  • Competitive intelligence — monitor competitor product demos, webinars, conference talks
  • Educational tools — index courses and lectures for search and Q&A
  • Multilingual pipelines — extract captions in the original language for translation workflows

Code examples

Python (Apify client)

from apify_client import ApifyClient

# Replace with your Apify API token
client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish
run = client.actor("foudhilriahi/youtube-transcript-extractor").call(
    run_input={
        "videoUrls": [
            "https://www.youtube.com/watch?v=VIDEO_1",
            "https://www.youtube.com/watch?v=VIDEO_2",
        ],
        "maxConcurrency": 5,
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        },
    }
)

# Each video is one item in the run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if item["status"] == "success":
        print(f"{item['videoId']}: {item['wordCount']} words")
        print(item["transcript"][:200])  # first 200 characters
    else:
        print(f"{item['videoId']}: {item['error']}")
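
Failed videos appear in the same dataset with `status` set to `error`. A small helper that separates them, assuming the item shapes shown in the Output section. `split_results` is a hypothetical name; with the client above you would pass it `client.dataset(run["defaultDatasetId"]).iterate_items()`.

```python
from typing import Any, Dict, Iterable, List, Tuple

Item = Dict[str, Any]


def split_results(items: Iterable[Item]) -> Tuple[List[Item], Dict[str, str]]:
    """Return (successful items, {videoId: error message}) from dataset items."""
    successes: List[Item] = []
    errors: Dict[str, str] = {}
    for item in items:
        if item.get("status") == "success":
            successes.append(item)
        else:
            # Error items carry an "error" field; fall back if it is missing
            errors[item.get("videoId", "?")] = item.get("error", "unknown error")
    return successes, errors
```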

LangChain integration

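from langchain_community.utilities import ApifyWrapper
from langchain_core.documents import Document

# ApifyWrapper reads your token from the APIFY_API_TOKEN environment variable
apify = ApifyWrapper()
loader = apify.call_actor(
    actor_id="foudhilriahi/youtube-transcript-extractor",
    run_input={
        "videoUrls": ["https://www.youtube.com/watch?v=VIDEO_ID"],
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        },
    },
    # The mapping function must turn each dataset item into a LangChain Document
    dataset_mapping_function=lambda item: Document(
        page_content=item.get("transcript", ""),
        metadata={"source": item.get("youtubeUrl"), "videoId": item.get("videoId")},
    ),
)
docs = loader.load()
# docs is now ready for your RAG pipeline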
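
Long transcripts are usually split into smaller, overlapping pieces before embedding. A dependency-free sketch of a word-window chunker; the window and overlap sizes are arbitrary assumptions, and LangChain's own text splitters (for example `RecursiveCharacterTextSplitter`) are an alternative.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into windows of chunk_size words; neighbours share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the last word
    return chunks
```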

n8n / Make automation

  1. Add an Apify node
  2. Actor ID: foudhilriahi/youtube-transcript-extractor
  3. Input: { "videoUrl": "{{ $json.youtubeUrl }}", "proxyConfiguration": { "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] } }
  4. Connect output to your vector database, Google Sheets, or email node
  5. Schedule daily — done

Pricing

Pay Per Event — you only pay for successful transcripts.

| Volume | Price per transcript |
|---|---|
| Any | $0.05 per successful extraction |

Failed videos (disabled captions, private videos, no captions available) are not charged.

Cost example: 1,000 successful transcripts cost $50. At a typical video length of 30 minutes, that is 30,000 minutes (500 hours) of transcribed content for $50.
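
The cost example as a small calculation, assuming only successful extractions are billed at the flat $0.05 rate listed above; it models nothing beyond that per-transcript charge. `estimate_cost` and `transcribed_minutes` are hypothetical helpers.

```python
from decimal import Decimal

PRICE_PER_SUCCESS = Decimal("0.05")  # USD per successful transcript


def estimate_cost(successful: int, failed: int = 0) -> Decimal:
    """Per-transcript charge; failed videos are counted but not billed."""
    if successful < 0 or failed < 0:
        raise ValueError("counts must be non-negative")
    return PRICE_PER_SUCCESS * successful


def transcribed_minutes(successful: int, avg_minutes: float) -> float:
    """Total minutes of video covered by the successful transcripts."""
    return successful * avg_minutes


print(estimate_cost(1000, failed=50))  # prints 50.00
print(transcribed_minutes(1000, 30))   # prints 30000
```

`Decimal` keeps the currency arithmetic exact, which matters once per-item prices are summed over large batches.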


Notes & limitations

  • Videos must have captions available (manually added or auto-generated by YouTube)
  • Private and age-restricted videos cannot be transcribed
  • YouTube blocks cloud IP ranges — residential proxy configuration is required for reliable operation (pre-filled in the input)
  • Auto-generated captions may contain minor transcription errors, especially for technical terms
