CopyThat Actor Caption Direct

Pricing

$0.05 / 1,000 minutes

Developed by

CopyThat

Maintained by Community

Extract YouTube transcripts instantly. Fast, affordable, accurate.

Last modified

17 hours ago

Caption Direct — Stop Wrestling with yt-dlp and Proxies

Tired of managing yt-dlp or yt-transcript-api, or debugging failed proxy rotations at 2 a.m.? Caption Direct handles YouTube transcript extraction so you can focus on building the features your users actually want.


The Problem: YouTube Infrastructure Is a Time Sink

Every developer building with YouTube transcripts faces the same headaches:

Real-World Scenario

"A developer at an EdTech startup spends 2 weeks implementing yt-dlp + residential proxies for transcript extraction. Within a month, YouTube updates their API and everything breaks. The team spends another week debugging proxy rotation failures, cookie expirations, and rate limits. Meanwhile, their actual product features are stuck in the backlog."

This is where Caption Direct comes in.


Why Developers Choose Caption Direct

Ship Features, Not Infrastructure

Stop wasting engineering time on:

  • ❌ Maintaining yt-dlp or yt-transcript-api dependencies
  • ❌ Managing residential and datacenter proxy rotations
  • ❌ Debugging YouTube API changes every few months
  • ❌ Handling rate limits, cookie refreshes, and captcha failures

Let CopyThat maintain the code while you build the rest.

🚀 Fast & Reliable

  • Process entire playlists: up to 150 videos per request
  • Typical speed: ~10 seconds for 10 videos, roughly 60-75 seconds for 50 videos
  • Auto-translation included: all transcripts automatically translated to English by default
  • Zero speech processing overhead—uses official YouTube captions

🌍 75+ Languages with Auto-Translation

Caption Direct automatically translates all fetched transcripts to English by default—no configuration needed.

  • Non-English captions detected and translated automatically
  • Override with target_language param for any of 77 supported languages
  • Smart language detection prevents unnecessary translation
  • Perfect for multilingual content platforms

Example: Japanese captions → English transcript (automatic)

Example: French captions → Spanish transcript (target_language: "es")

💰 Introductory Pricing (Limited Time)

🎉 Launch Special: $0.15 per 1,000 minutes until November 16, 2025

After November 16th, pricing adjusts to standard rates:

  • $16.38 per 1,000 minutes ($0.01638/min)

Lock in launch pricing now—test Caption Direct before the price change.

| Duration | Launch Price (until Nov 16) | Regular Price (after Nov 16) |
|---|---|---|
| 10 minutes | $0.0015 | $0.16 |
| 30 minutes | $0.0045 | $0.49 |
| 1 hour | $0.009 | $0.98 |
| 1,000 minutes | $0.15 | $16.38 |
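
The table rows follow from the two per-1,000-minute rates quoted above. A minimal sketch of that arithmetic, useful for budgeting a batch before running it; the function and constant names are illustrative and not part of the actor:

```python
from decimal import Decimal

# Rates quoted on this page, in USD per 1,000 minutes of video.
LAUNCH_RATE_PER_1K = Decimal("0.15")
REGULAR_RATE_PER_1K = Decimal("16.38")


def estimate_cost(minutes, rate_per_1k):
    """Return the estimated charge in USD for a number of video minutes."""
    return Decimal(minutes) * rate_per_1k / Decimal(1000)


if __name__ == "__main__":
    # Reproduce the rows of the pricing table.
    for minutes in (10, 30, 60, 1000):
        launch = estimate_cost(minutes, LAUNCH_RATE_PER_1K)
        regular = estimate_cost(minutes, REGULAR_RATE_PER_1K)
        print(f"{minutes:>5} min  launch ${launch}  regular ${regular:.2f}")
```

Decimal is used instead of float so that small per-video amounts are not distorted by binary rounding.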

Who Should Use Caption Direct?

Developers & Builders

  • NLP pipeline builders tired of proxy management
  • Content platforms indexing thousands of videos
  • EdTech developers building searchable lecture libraries
  • API integration teams who don't want to maintain YouTube scraping infrastructure

Product Teams

  • Multilingual platforms needing auto-translation at scale
  • Analytics products processing video transcripts for insights
  • Search engines indexing YouTube content
  • Developer tools integrating YouTube transcripts without headaches

Use Cases That Save Engineering Time

1. EdTech Platform: Searchable Lecture Library

Challenge: Building a course discovery platform that indexes 50,000+ YouTube educational videos. Initial yt-dlp implementation breaks every 2-3 weeks requiring constant maintenance.

Solution: Replace yt-dlp with Caption Direct API.

Results:

  • Engineering time saved: 40 hours/month (no more proxy debugging)
  • Processing speed: 150 videos every ~2.5 minutes
  • Uptime: 99.9% (vs. 85% with self-managed yt-dlp)
  • Team focus: Shipped 3 new features while transcripts "just worked"

2. Content Discovery Startup: Multilingual Indexing

Challenge: Index cooking videos from 12 languages for recipe extraction. Current system: 3 different APIs + manual translation management + constant failures.

Solution: Single Caption Direct integration with auto-translation.

Investment: $16.38 per 1,000 minutes (after promo)

Results:

  • Reduced API integrations: 3 → 1
  • Auto-translation: Zero configuration (detects non-English automatically)
  • Processing: 10,000 videos/week with no manual intervention
  • Cost savings: 60% cheaper than previous multi-API setup

3. Developer Tool: Video Transcript API

Challenge: Offering YouTube transcript extraction as a developer API. Managing proxies for 500+ users costs $2,000/month and requires 2 dedicated engineers.

Solution: White-label Caption Direct as backend infrastructure.

Investment: Pay-per-use pricing (no infrastructure costs)

Results:

  • Infrastructure costs: $2,000/month → $0
  • Engineering team: 2 → 0 (reallocated to product features)
  • Reliability: 99.9% uptime guarantee
  • Customer satisfaction: 45% increase (fewer failures)

How It Works

┌────────────────────┐
│    YouTube URL     │
└─────────┬──────────┘
          ▼
┌────────────────────┐
│ Caption Extraction │
│   (official CC)    │
└─────────┬──────────┘
          ▼
┌────────────────────┐
│ Language Detection │
└─────────┬──────────┘
          ▼
┌────────────────────┐
│    Translation     │
│    (if needed)     │
└─────────┬──────────┘
          ▼
┌────────────────────┐
│  Timestamped JSON  │
└────────────────────┘

Key Advantage: No proxy management, no API maintenance, no YouTube API changes breaking your code. You get clean JSON transcripts. We handle everything else.


Input

Basic Usage

{
  "url": "https://www.youtube.com/watch?v=VIDEO_ID"
}

Process Playlists

{
  "url": "https://www.youtube.com/playlist?list=PLxxx",
  "offset": 0,
  "limit": 150
}

Custom Output Language

{
  "url": "https://www.youtube.com/watch?v=VIDEO_ID",
  "target_language": "es"
}

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | required | YouTube video, playlist, or channel URL |
| target_language | string | "en" | Output language (auto-translates from source). Supports 77 languages |
| languages | string[] | ["en"] | Preferred source caption languages to search |
| offset | number | 0 | Playlist pagination: skip first N videos |
| limit | number | 150 | Playlist pagination: max videos per request (max: 150) |
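
The constraints in the parameter table can be checked before a run is started. The helper below is a hypothetical sketch, not part of the actor or the Apify client: `build_input` and `MAX_LIMIT` are names made up for illustration, and the lower bound of 1 for `limit` is an assumption, since the table only states the maximum.

```python
MAX_LIMIT = 150  # per-request cap stated in the parameters table


def build_input(url, target_language="en", languages=None, offset=0, limit=MAX_LIMIT):
    """Assemble an actor input dict, checking the documented constraints."""
    if not url:
        raise ValueError("url is required")
    if offset < 0:
        raise ValueError("offset must be >= 0")
    # Assumption: a limit below 1 is treated as invalid here.
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    return {
        "url": url,
        "target_language": target_language,
        "languages": list(languages) if languages else ["en"],
        "offset": offset,
        "limit": limit,
    }
```

For example, `build_input("https://www.youtube.com/playlist?list=PLxxx", target_language="es")` returns a dict that can be passed as the run input, with the table's defaults filled in for the remaining fields.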

Output

{
  "videos": [
    {"id": "VIDEO_ID", "title": "...", "duration": 1222}
  ],
  "transcripts": [
    {
      "video_id": "VIDEO_ID",
      "segments": [
        {"start": 8.32, "end": 10.749, "text": "transcript text here"}
      ],
      "coverage_pct": 1.0,
      "source": "captions"
    }
  ],
  "pagination": {
    "total": 300,
    "offset": 0,
    "limit": 150,
    "returned": 150
  }
}
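
Because each segment carries `start` and `end` times in seconds, the output maps easily onto subtitle formats. As an illustration only, here is a small sketch that renders a `segments` list as SRT text. The helper names are hypothetical, and the actor itself is only documented to return JSON.

```python
def to_srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render transcript segments as SRT subtitle text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = to_srt_timestamp(seg["start"])
        end = to_srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text']}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)
```

Calling `segments_to_srt(transcript["segments"])` on one entry of `transcripts` yields text that can be written to a `.srt` file.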

Performance

| Metric | Value |
|---|---|
| Speed (English, no translation) | ~1-2s per video, ~10s for 10 videos, ~75s for 50 videos, ~4 min for 150 videos |
| Speed (with translation) | ~2-3s per video |
| Languages | 77 (auto-translation included) |
| Cost (until Nov 16) | $0.15 per 1,000 minutes |
| Cost (after Nov 16) | $16.38 per 1,000 minutes |
| Reliability | 99.9% uptime |

Real-World Examples

Processing 50-video playlist (avg 15 min/video) - English transcripts:

  • Total duration: 750 minutes
  • Processing time: ~75 seconds
  • Cost (launch pricing): $0.1125
  • Cost (regular pricing): $12.29

Processing 3-video playlist (avg 30 min/video) - Translated to Vietnamese:

  • Total duration: 90 minutes
  • Processing time: ~8 seconds (includes translation time)
  • Cost (launch pricing): $0.0135
  • Cost (regular pricing): $1.47
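
The processing times in these examples are consistent with the per-video ranges in the Performance table. A rough sketch of that estimate follows, assuming time scales linearly with the number of videos, which the page does not state outright. The names are illustrative.

```python
# Per-video ranges in seconds, taken from the Performance table.
SECONDS_PER_VIDEO = {
    False: (1, 2),  # English captions, no translation
    True: (2, 3),   # with translation
}


def estimate_seconds(video_count, translated=False):
    """Return a (low, high) processing-time range in seconds."""
    low, high = SECONDS_PER_VIDEO[translated]
    return video_count * low, video_count * high
```

For the 50-video English playlist this gives a range of 50 to 100 seconds, which brackets the ~75 seconds quoted. For the 3-video translated playlist it gives 6 to 9 seconds, which brackets the ~8 seconds quoted.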

Supported Languages (77 Total)

All transcripts are automatically translated to English by default. Override this with the target_language parameter.

English, Spanish, French, German, Hindi, Portuguese, Italian, Dutch, Russian, Japanese, Chinese, Korean, Arabic, Turkish, Swedish, Danish, Norwegian, Indonesian, Polish, Czech, Ukrainian, Thai, Vietnamese, Hebrew, Greek, Romanian, Bulgarian, Slovak, Croatian, Finnish, Hungarian, Lithuanian, Latvian, Slovenian, Estonian, Afrikaans, Albanian, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bosnian, Catalan, Welsh, Georgian, Galician, Gujarati, Hausa, Icelandic, Irish, Kannada, Kazakh, Khmer, Lao, Macedonian, Malayalam, Marathi, Malay, Maltese, Mongolian, Burmese, Nepali, Persian, Punjabi, Serbian, Sinhala, Swahili, Tagalog, Tamil, Telugu, Tajik, Urdu, Uzbek, Xhosa, Yiddish, Zulu


Important: Closed Caption Requirement

⚠️ Closed Captions Required

Caption Direct on Apify requires YouTube videos with existing closed captions (CC) or subtitles. This includes:

  • ✅ Manually uploaded captions
  • ✅ Auto-generated captions (YouTube's automatic captioning)
  • ✅ Community-contributed captions

Videos without closed captions cannot be processed on Apify.
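
In a batch run it helps to know which videos came back without usable captions, for example to route them to an ASR service. The sketch below relies on the FAQ's statement that videos without CC return empty segments. It also treats a video with no transcript entry at all as missing, which is an assumption. The function name is hypothetical.

```python
def videos_without_captions(result):
    """Return IDs of videos whose transcript has no segments."""
    # Collect the IDs that have at least one transcript segment.
    with_segments = {
        t["video_id"] for t in result.get("transcripts", []) if t.get("segments")
    }
    # Anything listed under "videos" but absent from that set is missing captions.
    return [v["id"] for v in result.get("videos", []) if v["id"] not in with_segments]
```

Here `result` is one output object in the shape shown in the Output section.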


🎯 Need ASR for Non-CC Videos?

For YouTube videos without closed captions, visit CopyThat.ch where Caption Direct includes integrated Automatic Speech Recognition (ASR) with additional benefits:

  • No captions required — ASR automatically transcribes audio from any video
  • Better pricing — Volume discounts up to 25% off
  • CLI access — Streamlined command-line interface for developers
  • Token-based bundles — Buy in bulk, save more
  • Enhanced features — Direct API access, webhooks, and automation tools
  • Same quality — Auto-translation to 77 languages

Professional-grade ASR transcription with better pricing and developer tools at CopyThat.ch/developers-api


Getting Started

1. On Apify Console

  1. Visit Caption Direct on Apify
  2. Paste your YouTube URL
  3. Click "Start"
  4. Get transcripts in seconds

2. Via Apify API (Python client)

from apify_client import ApifyClient

client = ApifyClient('YOUR_APIFY_TOKEN')

# Run Caption Direct and wait for the run to finish
run = client.actor('copythat/caption-direct').call(
    run_input={"url": "https://www.youtube.com/watch?v=VIDEO_ID"}
)

# Access results: each dataset item holds a "transcripts" list
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    transcripts = item["transcripts"]
    for transcript in transcripts:
        print(f"Video: {transcript['video_id']}")
        for segment in transcript["segments"]:
            print(f"[{segment['start']:.2f}s] {segment['text']}")
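
Playlists longer than 150 videos need several runs, each with a different `offset`. The following sketch shows one way to drive that loop from the `pagination` object in the output. `fetch_all` and `run_page` are hypothetical names: `run_page` stands for any function you supply that takes an input dict, starts a run, and returns one output object. Keeping the network call outside the loop makes the pagination logic easy to test.

```python
def fetch_all(run_page, url, page_size=150):
    """Collect transcripts for a whole playlist, one page at a time.

    run_page is a caller-supplied function that takes an input dict and
    returns one output object (with "transcripts" and "pagination").
    """
    transcripts = []
    offset = 0
    while True:
        page = run_page({"url": url, "offset": offset, "limit": page_size})
        transcripts.extend(page["transcripts"])
        pagination = page["pagination"]
        offset += pagination["returned"]
        # Stop when the playlist is exhausted or a page comes back empty.
        if pagination["returned"] == 0 or offset >= pagination["total"]:
            return transcripts
```

With the Apify client, `run_page` could wrap the actor call from the example above and return the dataset item for that run. That assumes each run writes a single item in the documented output shape.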

3. JavaScript SDK

const { ApifyClient } = require('apify-client');
const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

// CommonJS has no top-level await, so the calls run inside an async function
async function main() {
  // Run Caption Direct
  const run = await client.actor('copythat/caption-direct').call({
    url: 'https://www.youtube.com/watch?v=VIDEO_ID',
    target_language: 'en',
  });
  // Get results
  const { items } = await client.dataset(run.defaultDatasetId).listItems();
  console.log('Transcripts:', items[0].transcripts);
}

main().catch(console.error);

Why CopyThat?

No Vendor Lock-In

Standard JSON output works with any stack. Move between providers or build your own solution later—no proprietary formats or APIs to migrate from.

Transparent Pricing

  • Launch pricing: $0.15/1K min (until Nov 16)
  • Regular pricing: $16.38/1K min (after Nov 16)
  • Pay only for what you use
  • No monthly minimums or hidden fees

Production-Ready Infrastructure

We handle YouTube API changes, proxy management, rate limiting, and error recovery so you don't have to.


FAQs

Q: What happens after November 16th?
A: Pricing changes from $0.15/1K min to $16.38/1K min. Lock in launch pricing now.

Q: How many videos can I process?
A: Up to 150 videos per request. Use pagination for larger playlists.

Q: How does auto-translation work?
A: Automatically detects non-English content and translates to English (or your specified target_language). Zero configuration needed.

Q: Do I need to manage proxies?
A: No. Caption Direct handles all infrastructure.

Q: What if a video doesn't have captions?
A: Caption Direct returns empty segments for videos without CC. For ASR support, visit CopyThat.ch.

Q: Can I get transcripts in multiple languages?
A: Yes! Set target_language to any of 77 supported languages.

Q: How long does processing take?
A: ~10-15 seconds for 10 videos and roughly 60-75 seconds for 50 videos. Processing time grows with batch size at about 1-2 seconds per video (2-3 seconds with translation).


Stop debugging yt-dlp. Start using Caption Direct →

Need study materials? Check out Course Forge

Want dubbed audio? Try Dub Master

Need ASR or advanced features? Visit CopyThat.ch for full API access