Youtube Transcript Scraper

Pricing: from $9.52 / 1,000 results

Developed by Delowar Munna. Maintained by Community.

Last modified 3 days ago

YouTube Transcript Scraper 📝

Lightning-fast transcript extraction with pay-per-result pricing

Extract comprehensive transcript data from YouTube videos using the Supadata transcript API and the official YouTube Data API v3. Get paragraph-formatted transcript text, timed segments, and metadata with 15 complete fields in just 1-2 seconds per video.

⭐ Why Choose This Scraper?

  • 💰 Pay Per Result: Only pay for successful transcript extractions
  • Fastest: 1-2 seconds per video (3-4x faster than competitors)
  • 🎯 Most Reliable: 99%+ success rate, never blocked by YouTube
  • 📊 Most Complete: 15 comprehensive fields with paragraph formatting
  • 🚀 No Commitment: No monthly fees, use when you need it
  • 💡 Perfect For: All use cases from occasional to high-volume extraction

📊 High-Volume Users (10,000+ transcripts/month)? Contact us for enterprise pricing and volume discounts!



🚀 Key Features

  • Lightning Fast: 1-2 seconds per video (330x faster than translation-enabled tools)
  • 📝 Paragraph Formatting: Transcript text formatted with natural paragraph breaks (~40 words each)
  • 🌍 Multi-Language Support: Auto-detects transcript language in 30+ languages
  • 🎯 Manual & Auto Captions: Extracts both human-created and auto-generated transcripts
  • 📊 15 Complete Fields: Comprehensive data with metadata and timed segments
  • 🚀 API-Only Architecture: No browser automation = faster, more reliable, no blocking issues
  • 🎬 Video Metadata: Complete video information (title, channel, duration, views, likes)
  • 📦 Multiple URL Formats: Supports full URLs, short URLs (youtu.be), and raw video IDs
  • 📄 Subtitle Ready: Timed segments with millisecond precision for SRT/VTT generation
  • 🌐 Formatted Language Display: Shows "English (en)" instead of "en" for better readability

Best for: Content analysis, SEO optimization, accessibility, research, subtitle generation
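The "English (en)" display described under Formatted Language Display can be produced with a simple code-to-name lookup. This is a hypothetical sketch, not the actor's source: the `LANGUAGE_NAMES` table, the `format_language` helper, and the fallback to the bare code for unknown languages are all assumptions, and a complete table would need an entry for each of the 30+ supported languages.

```python
# Hypothetical lookup table; a full version would list every supported language.
LANGUAGE_NAMES = {
    "en": "English",
    "es": "Spanish",
    "bn": "Bengali",
    "fr": "French",
    "de": "German",
    "hi": "Hindi",
}


def format_language(code: str) -> str:
    """Return a display string such as "English (en)" for a language code.

    Regional variants like "en-US" are looked up by their base code.
    Unknown codes are returned unchanged (an assumed fallback).
    """
    base = code.split("-")[0].lower()
    name = LANGUAGE_NAMES.get(base)
    return f"{name} ({code})" if name else code


print(format_language("bn"))  # Bengali (bn)
```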


🎯 At a Glance

| Feature | Value |
|---|---|
| Speed | ~1-2 s per video (API-only, no browser) |
| Throughput | 30-60 transcripts/minute (1,800-3,600/hour) |
| Fields | 15 complete fields (100% reliability for transcripts) |
| Formatting | Paragraph breaks (~40 words each, \n\n separators) |
| Segments | Timestamped (millisecond precision) |
| Architecture | API-only (Supadata + YouTube Data API v3) |
| Concurrency | 20 parallel requests (optimized automatically) |
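The Formatting row describes paragraphs of roughly 40 words separated by a blank line. How the actor picks its break points is not documented. One plausible approach, sketched here as an assumption, is to accumulate segment text and close a paragraph at the first segment boundary after the 40-word target is reached, so no caption line is cut in half. `to_paragraphs` and `target_words` are hypothetical names.

```python
def to_paragraphs(segment_texts: list[str], target_words: int = 40) -> str:
    """Join caption segments into paragraphs of roughly `target_words` words.

    A paragraph is closed only at a segment boundary, once it holds at least
    `target_words` words. Paragraphs are separated by a blank line.
    """
    paragraphs: list[str] = []
    current: list[str] = []
    for text in segment_texts:
        words = text.split()
        if not words:
            continue  # skip empty or whitespace-only segments
        current.extend(words)
        if len(current) >= target_words:
            paragraphs.append(" ".join(current))
            current = []
    if current:  # flush the final, possibly shorter, paragraph
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)
```

Breaking only at segment boundaries means paragraphs can run slightly over 40 words; splitting at sentence punctuation would read more naturally but auto-generated captions often have none.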

💡 Why This Scraper?

Traditional YouTube transcript tools rely on browser automation, which is slow and unreliable. YouTube Transcript Scraper calls APIs directly instead (Supadata for transcripts, YouTube Data API v3 for metadata) for speed and reliability:

| Metric | YouTube Transcript Scraper | Traditional Browser-Based Tools |
|---|---|---|
| Architecture | ✅ API-only (fast, reliable) | ❌ Browser automation (slow) |
| Time per video | ~1-2 s | ~3-8 s (browser overhead) |
| YouTube blocking | ✅ Never blocked (API access) | ❌ Often blocked (bot detection) |
| Fields extracted | 15 complete fields | 5-10 fields |
| Paragraph formatting | ✅ Built-in (~40 words/paragraph) | ❌ Raw text only |
| Language display | ✅ "English (en)" formatting | ❌ "en" only |
| Timestamp precision | Milliseconds | Sometimes missing |
| Transcripts per minute | 30-60 | 10-20 (browser limits) |
| Reliability | 99%+ (API-based) | 70-85% (blocking, errors) |

Performance Advantages:

  • No Browser Overhead: Direct API access = 3x faster than browser-based extraction
  • No Blocking: APIs never trigger YouTube's bot detection
  • Complete Data: Full transcript text + structured segments with timestamps
  • High Reliability: 99%+ success rate for videos with transcripts
  • Business Ready: All data needed for content analysis, SEO, accessibility

📋 Input Parameters

| Field | Key | Type | Default | Description |
|---|---|---|---|---|
| Video References | videoRefs | Array | [] | Full URLs, short URLs (youtu.be), or raw 11-character video IDs |

Important Notes:

  • 📝 Video Formats: Accepts watch URLs, short URLs (youtu.be), or raw video IDs
  • 🌍 Language Auto-Detection: Automatically detects and extracts transcripts in the video's original language
  • Performance: Optimized for speed and reliability with API-only architecture
  • 🎯 Zero Configuration: Works immediately - no API keys or setup required
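All three accepted reference formats resolve to the same 11-character video ID. The actor's own parsing logic is not documented, so the following is a hypothetical sketch using only the Python standard library. It handles `youtube.com/watch?v=ID` and `youtu.be/ID` URLs plus raw IDs, and deliberately ignores other URL shapes the notes above do not mention. `normalize_video_ref` is an illustrative name.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# YouTube video IDs are 11 characters drawn from letters, digits, "-" and "_".
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")
WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def normalize_video_ref(ref: str) -> Optional[str]:
    """Return the video ID for a watch URL, a youtu.be URL, or a raw ID.

    Returns None when no valid ID can be extracted.
    """
    ref = ref.strip()
    if VIDEO_ID_RE.fullmatch(ref):
        return ref  # already a raw video ID
    parsed = urlparse(ref)
    host = (parsed.hostname or "").lower()
    candidate: Optional[str] = None
    if host == "youtu.be":
        # Short URL: the ID is the first path component, e.g. /dQw4w9WgXcQ
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in WATCH_HOSTS and parsed.path == "/watch":
        # Watch URL: the ID is the "v" query parameter
        candidate = parse_qs(parsed.query).get("v", [None])[0]
    if candidate and VIDEO_ID_RE.fullmatch(candidate):
        return candidate
    return None
```

Normalizing each entry before submitting also makes it easy to drop duplicates, for example when a list mixes a watch URL and a youtu.be URL for the same video.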

🔧 How It Works

YouTube Transcript Scraper uses a modern API-only architecture for maximum reliability and speed:

Technology Stack

  1. Supadata API (Primary Transcript Extraction)

    • Third-party transcript API (not operated by Google or YouTube)
    • 99%+ reliability for all languages
    • Fast extraction (~500ms per video)
    • No browser automation required
  2. YouTube Data API v3 (Video Metadata)

    • Official Google API for video information
    • Provides title, channel, duration, views, likes
    • Instant metadata retrieval
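The YouTube Data API v3 reports a video's length in `contentDetails.duration` as an ISO 8601 duration string such as `PT12M49S`, while the output's `durationSec` field is a plain number of seconds. The actor's conversion code is not published; the sketch below is one way to do it, assuming only day, hour, minute, and second components appear. `iso8601_duration_to_seconds` is a hypothetical helper name.

```python
import re

# Matches ISO 8601 durations made of day/hour/minute/second parts, e.g. "PT12M49S", "P0D".
DURATION_RE = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?"
)


def iso8601_duration_to_seconds(value: str) -> int:
    """Convert a duration such as "PT12M49S" to whole seconds."""
    match = DURATION_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    parts = {name: int(num) if num else 0 for name, num in match.groupdict().items()}
    return (
        parts["days"] * 86_400
        + parts["hours"] * 3_600
        + parts["minutes"] * 60
        + parts["seconds"]
    )


print(iso8601_duration_to_seconds("PT12M49S"))  # 769
```

For reference, 12 minutes 49 seconds is 769 seconds, the same `durationSec` value shown in the English example output.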

Why API-Only Architecture?

Traditional browser-based scrapers face many challenges:

  • ❌ YouTube bot detection and blocking
  • ❌ Slow page loading and rendering
  • ❌ High resource usage (Chrome instances)
  • ❌ Frequent 429 rate limit errors
  • ❌ Complex proxy management

Our API-only approach eliminates all these issues:

  • No Blocking: APIs use official access methods
  • 3x Faster: No browser overhead
  • Lower Costs: No proxy or browser infrastructure needed
  • 99%+ Reliability: Direct API access
  • Scalable: Handle high-volume requests easily

📤 Output Schema

Comprehensive Transcript Data: 15 Complete Fields

| # | Field | Type | Description |
|---|---|---|---|
| 1 | type | String (const: "video") | Record type for filtering |
| 2 | videoId | String | YouTube video ID (11 characters) |
| 3 | PageURL | String | Full YouTube watch URL |
| 4 | title | String | Video title |
| 5 | channelId | String | Channel ID (UC...) |
| 6 | transcriptLanguage | String | Detected language (e.g., "English (en)", "Spanish (es)") |
| 7 | transcriptType | Enum: "manual" / "auto" | Manual vs. auto-generated captions |
| 8 | transcriptText | String | Full transcript with paragraph formatting (\n\n breaks) |
| 9 | segments | Array | Timed segments (startMs, durMs, text) |
| 10 | hasTranscript | Boolean | Whether a transcript was successfully extracted |
| 11 | durationSec | Number | Video duration in seconds |
| 12 | publishedAt | String (ISO 8601) | Video publish date |
| 13 | fetchedAt | String (ISO 8601) | Timestamp of extraction |
| 14 | viewCount | Number | Total views |
| 15 | likeCount | Number | Total likes |
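When loading results into a downstream pipeline, it can help to check each record against this table before use. The sketch below is a hypothetical consumer-side validator, not something the actor provides. It assumes the 11 metadata fields are present on every record, and checks the 4 transcript fields (including the startMs/durMs/text keys of each segment) only when `hasTranscript` is true, since what the actor emits for those fields otherwise is not documented.

```python
from typing import Any, Dict, List, Tuple

# Types for the 11 fields assumed to be present on every record.
METADATA_FIELDS: Dict[str, Tuple[type, ...]] = {
    "type": (str,),
    "videoId": (str,),
    "PageURL": (str,),
    "title": (str,),
    "channelId": (str,),
    "hasTranscript": (bool,),
    "durationSec": (int, float),
    "publishedAt": (str,),
    "fetchedAt": (str,),
    "viewCount": (int, float),
    "likeCount": (int, float),
}
# Types for the 4 transcript fields, checked only when hasTranscript is true.
TRANSCRIPT_FIELDS: Dict[str, Tuple[type, ...]] = {
    "transcriptLanguage": (str,),
    "transcriptType": (str,),
    "transcriptText": (str,),
    "segments": (list,),
}


def schema_problems(record: Dict[str, Any]) -> List[str]:
    """Return a list of schema problems; an empty list means the record looks valid."""
    problems: List[str] = []
    has_transcript = record.get("hasTranscript") is True
    expected = dict(METADATA_FIELDS)
    if has_transcript:
        expected.update(TRANSCRIPT_FIELDS)

    for field, types in expected.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], types):
            problems.append(f"wrong type for {field}: {type(record[field]).__name__}")

    if record.get("type") != "video":
        problems.append('type should be "video"')
    if has_transcript and record.get("transcriptType") not in ("manual", "auto"):
        problems.append('transcriptType should be "manual" or "auto"')

    segments = record.get("segments")
    if has_transcript and isinstance(segments, list):
        for i, seg in enumerate(segments):
            if not isinstance(seg, dict) or not all(k in seg for k in ("startMs", "durMs", "text")):
                problems.append(f"segments[{i}] lacks startMs/durMs/text")
    return problems
```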

Segment Structure

Each segment in the segments array contains:

| Field | Type | Description |
|---|---|---|
| startMs | Number | Segment start time (milliseconds) |
| durMs | Number | Segment duration (milliseconds) |
| text | String | Segment text content |
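Since each segment carries `startMs` and `durMs`, an SRT file is a direct translation: each cue gets a running number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, and its text. The sketch below is a hypothetical consumer-side helper (`segments_to_srt` and `clip_overlaps` are illustrative names). Auto-generated captions often overlap, as in the example output where the first segment ends at 2,879 ms but the second starts at 1,280 ms; the optional `clip_overlaps` flag ends each cue when the next one begins.

```python
from typing import Any, Dict, List


def ms_to_srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def segments_to_srt(segments: List[Dict[str, Any]], clip_overlaps: bool = False) -> str:
    """Build an SRT document from segments with startMs, durMs and text.

    With clip_overlaps=True, each cue ends no later than the next cue starts,
    which avoids stacked lines when auto-captions overlap.
    """
    blocks = []
    for i, seg in enumerate(segments):
        start = int(seg["startMs"])
        end = start + int(seg["durMs"])
        if clip_overlaps and i + 1 < len(segments):
            end = min(end, int(segments[i + 1]["startMs"]))
        blocks.append(
            f"{i + 1}\n{ms_to_srt_time(start)} --> {ms_to_srt_time(end)}\n{seg['text']}\n"
        )
    return "\n".join(blocks)
```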

Why These Fields Matter:

  • 📝 Complete Transcript: Paragraph-formatted text + structured segments with precise timestamps
  • 🌍 Language Intelligence: Auto-detection with formatted display ("English (en)")
  • 🎬 Video Context: Title, channel, duration, views, likes for complete picture
  • 📄 Subtitle Generation: Segments with timestamps for SRT/VTT creation
  • 💼 Business-Ready: All data needed for content analysis, SEO, accessibility

📊 Output Examples

[Screenshots: output table view for an English video and a Bengali video]

Example Output - Complete Transcript Data (JSON)

English Video Example:

[{
"type": "video",
"videoId": "eLqveVYFWc4",
"PageURL": "https://www.youtube.com/watch?v=eLqveVYFWc4",
"title": "Example Video Title",
"channelId": "UCbo-KbSjJDG6JWQ_MTZ_rNA",
"transcriptLanguage": "English (en)",
"transcriptType": "auto",
"transcriptText": "This is the full transcript text with paragraph formatting...\n\n(Full transcript continues with natural paragraph breaks)",
"segments": [
{
"startMs": 80,
"durMs": 2799,
"text": "First segment text"
},
{
"startMs": 1280,
"durMs": 3280,
"text": "Second segment text"
},
{
"startMs": 2880,
"durMs": 2559,
"text": "Third segment text"
}
],
"hasTranscript": true,
"durationSec": 769,
"publishedAt": "2025-01-20T10:00:00.000Z",
"fetchedAt": "2025-10-28T05:21:51.641Z",
"viewCount": 148755,
"likeCount": 6463
}]

Bengali Video Example:

[{
"type": "video",
"videoId": "UmmMLV5BaKQ",
"PageURL": "https://www.youtube.com/watch?v=UmmMLV5BaKQ",
"title": "Breaking: বেরিয়ে আসছে ...বিমানবন্দরে আগুনের গোপন কাহিনী | বিশ্লেষক: আমিরুল মোমেনীন মানিক",
"channelId": "UCWzOfBhVmuRmAd5bzxXP2CA",
"transcriptLanguage": "Bengali (bn)",
"transcriptType": "auto",
"transcriptText": "আসসালামু আলাইকুম প্রিয় দর্শক আমিরুল মুমিনন মানিক আপনাদের প্রত্যেককে আমন্ত্রণ জানাচ্ছি চেঞ্জ টিভির লাইভ পর্যালোচনায় প্রিয় দর্শক আজ 18ই অক্টোবর 2025 এই মুহূর্তে রাত 8টা বেজে 59 মিনিট আমাদের আলোচনার বিষয় হচ্ছে...\n\n(Full Bengali transcript continues with paragraph formatting)",
"segments": [
{
"startMs": 2560,
"durMs": 4400,
"text": "আসসালামু আলাইকুম প্রিয় দর্শক আমিরুল"
},
{
"startMs": 4960,
"durMs": 5040,
"text": "মুমিনন মানিক আপনাদের প্রত্যেককে আমন্ত্রণ"
},
{
"startMs": 6960,
"durMs": 6960,
"text": "জানাচ্ছি চেঞ্জ টিভির লাইভ পর্যালোচনায়"
}
],
"hasTranscript": true,
"durationSec": 904,
"publishedAt": "2025-10-18T15:26:57Z",
"fetchedAt": "2025-10-27T12:34:01.574Z",
"viewCount": 913141,
"likeCount": 11498
}]

English gloss of the Bengali sample values: the title reads roughly "Breaking: Coming out ... the secret story of the airport fire | Analyst: Amirul Momenin Manik", and the transcript opens "Assalamu alaikum, dear viewers. Amirul Muminon Manik welcomes each of you to Change TV's live review. Dear viewers, today is 18 October 2025, and it is now 8:59 pm. The subject of our discussion is..."

🎬 Quick Start

Example 1: List of YouTube Video URLs

{
"videoRefs": [
"https://www.youtube.com/watch?v=DOtJEwVsJic",
"https://www.youtube.com/watch?v=eLqveVYFWc4",
"https://www.youtube.com/watch?v=gYXaPTDatis",
"https://www.youtube.com/watch?v=ip8FEYOQob0"
]
}

Example 2: Video ID List

{
"videoRefs": [
"dQw4w9WgXcQ",
"jNQXAC9IVRw",
"5oAnKSCP4do"
]
}

Example 3: Mix of Video URLs and IDs

{
"videoRefs": [
"iG9CE55wbtY",
"UyyjU8fzEYU",
"https://youtu.be/jNQXAC9IVRw",
"https://www.youtube.com/watch?v=Cm_Juzt9H2o"
]
}

💪 Performance & Benchmarks

Speed Benchmarks

| Video Length | Segments | Processing Time | Total Fields |
|---|---|---|---|
| 5 minutes | ~200 | ~1-2 seconds | 15 |
| 10 minutes | ~400 | ~1-2 seconds | 15 |
| 15 minutes | ~600 | ~1-2 seconds | 15 |
| 30 minutes | ~1,200 | ~2-3 seconds | 15 |
| 60 minutes | ~2,400 | ~3-4 seconds | 15 |

Throughput Comparison

| Videos | YouTube Transcript Scraper | Traditional Browser Scrapers |
|---|---|---|
| 10 videos | ~10-20 seconds | ~30-60 seconds |
| 50 videos | ~1-2 minutes | ~5-8 minutes |
| 100 videos | ~2-3 minutes | ~10-15 minutes |
| 500 videos | ~10-15 minutes | ~40-70 minutes |
| 1,000 videos | ~20-30 minutes | ~80-140 minutes |
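These figures follow roughly from dividing the batch size by the quoted throughput of 30-60 transcripts per minute. The sketch below turns that arithmetic into a planning helper and adds a cost estimate at the listed starting price of $9.52 per 1,000 results. `estimate_run` is a hypothetical name; actual run time and pricing may differ, and since the README states that only successful extractions are billed, videos without transcripts would lower the cost.

```python
def estimate_run(
    videos: int,
    per_minute: tuple[float, float] = (30, 60),
    price_per_1000: float = 9.52,
) -> tuple[float, float, float]:
    """Return (fastest minutes, slowest minutes, cost in USD) for a batch.

    Uses the quoted throughput range and the listed starting price,
    assuming every video yields a billed result.
    """
    slow_rate, fast_rate = per_minute
    fastest = videos / fast_rate
    slowest = videos / slow_rate
    cost = videos / 1000 * price_per_1000
    return fastest, slowest, cost


fast, slow, cost = estimate_run(1_000)
print(f"{fast:.0f}-{slow:.0f} minutes, about ${cost:.2f}")  # 17-33 minutes, about $9.52
```

The 17-33 minute range brackets the ~20-30 minutes listed for 1,000 videos.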

Why So Fast?

  • ✅ API-only (no browser startup/rendering)
  • ✅ Parallel processing (20 concurrent requests)
  • ✅ No YouTube blocking or retries
  • ✅ Direct API access to transcript data
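The README states that the actor runs 20 parallel requests but does not publish its code (it is built on Node.js and Crawlee). The Python sketch below only illustrates the general pattern of capping in-flight requests with `asyncio.Semaphore`. `run_bounded` and the `fetch` callback are hypothetical stand-ins for whatever transcript and metadata calls a real implementation makes, and `demo` uses a fake fetch to show that no more than 20 calls ever run at once.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple, TypeVar

T = TypeVar("T")


async def run_bounded(
    refs: List[str], fetch: Callable[[str], Awaitable[T]], limit: int = 20
) -> List[T]:
    """Call `fetch` for every ref with at most `limit` calls in flight.

    Results come back in the same order as `refs`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def worker(ref: str) -> T:
        async with semaphore:  # waits while `limit` calls are already running
            return await fetch(ref)

    return list(await asyncio.gather(*(worker(ref) for ref in refs)))


async def demo(n: int = 50) -> Tuple[List[str], int]:
    """Run a fake fetch over n IDs and report the peak number of concurrent calls."""
    active = 0
    peak = 0

    async def fake_fetch(ref: str) -> str:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stand-in for network latency
        active -= 1
        return ref.upper()

    results = await run_bounded([f"id{i}" for i in range(n)], fake_fetch)
    return results, peak
```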

📚 Use Cases

Content Analysis & Research

  • NLP Analysis: Extract transcripts for sentiment analysis, topic modeling, keyword extraction
  • Market Research: Analyze competitor video content at scale
  • Academic Research: Study video content patterns, themes, and trends
  • Content Summarization: Generate summaries from full transcript text

SEO & Marketing

  • SEO Optimization: Convert video content to text for search engine indexing
  • Content Repurposing: Transform video transcripts into blog posts, articles, social media
  • Keyword Research: Analyze transcript text for keywords and themes
  • Competitive Analysis: Study competitor video strategies through transcript analysis
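For keyword research, a word-frequency count over `transcriptText` is a reasonable first pass. This is an illustrative sketch, not a feature of the actor: `top_keywords` is a hypothetical helper, and its stop-word list is a tiny English-only placeholder. It splits on whitespace rather than using a regex word pattern, so scripts with combining vowel signs, such as Bengali, are not broken apart mid-word.

```python
import string
from collections import Counter
from typing import List, Tuple

# Tiny illustrative English stop-word list; swap in a fuller, language-specific list.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "for", "i", "in", "is",
    "it", "of", "on", "or", "so", "that", "the", "this", "to", "we", "with", "you",
}
# ASCII punctuation plus curly quotes, the ellipsis character, and the danda (U+0964).
PUNCTUATION = string.punctuation + "\u201c\u201d\u2018\u2019\u2026\u0964"


def top_keywords(transcript_text: str, n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent non-stop-words in a transcript."""
    tokens = (word.strip(PUNCTUATION) for word in transcript_text.lower().split())
    counts = Counter(tok for tok in tokens if tok and tok not in STOP_WORDS)
    return counts.most_common(n)
```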

Accessibility & Subtitles

  • Accessibility: Create text versions of video content for hearing-impaired users
  • Subtitle Generation: Create SRT/VTT subtitle files from transcript segments
  • Caption Management: Extract and manage closed captions for video platforms
  • Multi-Platform: Use transcripts across different platforms and applications
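WebVTT, the caption format used by HTML5 `<track>` elements, needs a `WEBVTT` header, timestamps with a period before the milliseconds (`HH:MM:SS.mmm`), and `&` and `<` escaped inside cue text. The sketch below builds one from the `segments` array; `segments_to_vtt` is a hypothetical consumer-side helper, not part of the actor.

```python
from typing import Any, Dict, List


def ms_to_vtt_time(ms: int) -> str:
    """Format milliseconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def segments_to_vtt(segments: List[Dict[str, Any]]) -> str:
    """Build a WebVTT document from segments with startMs, durMs and text."""
    cues = []
    for seg in segments:
        start = int(seg["startMs"])
        end = start + int(seg["durMs"])
        # "&" and "<" must be escaped inside WebVTT cue text.
        text = seg["text"].replace("&", "&amp;").replace("<", "&lt;")
        cues.append(f"{ms_to_vtt_time(start)} --> {ms_to_vtt_time(end)}\n{text}")
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"
```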

Content Creation

  • Video Editing: Use timestamps to find exact moments in videos
  • Clip Creation: Identify key segments for social media clips
  • Script Analysis: Study successful video scripts and patterns
  • Quality Control: Review video content at scale
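To find exact moments for editing or clipping, the segments can be searched for a phrase and each hit turned into a watch link that starts at that point via YouTube's `t` URL parameter. The helpers below are hypothetical illustrations; note that a phrase split across two segments will not be found by this simple per-segment check.

```python
from typing import Any, Dict, List, Tuple


def find_moments(segments: List[Dict[str, Any]], phrase: str) -> List[Tuple[int, str]]:
    """Return (startMs, text) for every segment containing `phrase`, case-insensitively."""
    needle = phrase.casefold()
    return [
        (int(seg["startMs"]), seg["text"])
        for seg in segments
        if needle in seg["text"].casefold()
    ]


def watch_url_at(video_id: str, start_ms: int) -> str:
    """Build a watch URL that starts playback at the given offset (whole seconds)."""
    return f"https://www.youtube.com/watch?v={video_id}&t={start_ms // 1000}s"
```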

❓ FAQ

Q: Do I need to provide API keys? A: No! The actor handles all API integrations automatically. Just provide video URLs.

Q: What if a video doesn't have transcripts? A: The scraper will set hasTranscript: false and still extract available metadata (title, channel, duration, views, likes).

Q: What video URL formats are supported? A: All formats:

  • Full URL: https://www.youtube.com/watch?v=dQw4w9WgXcQ
  • Short URL: https://youtu.be/dQw4w9WgXcQ
  • Video ID only: dQw4w9WgXcQ

Q: How accurate are the timestamps? A: Timestamps are millisecond-precision and come directly from YouTube's official transcript data, ensuring high accuracy for subtitle generation.

Q: Does it work with live streams or premieres? A: Only after the video is published and transcripts are available. Live streams and ongoing premieres typically don't have transcripts yet.

Q: How much does it cost to run? A: Approximately $47/month for 1,000 videos (Supadata API cost). YouTube Data API v3 is free within its default quota of 10,000 units per day.

Q: Can I translate the transcripts? A: The actor extracts transcripts in the video's original language. For translation, you can use the transcript data with external translation services.


🛠️ Technologies

Built with modern APIs for maximum performance:

  • Supadata API: High-reliability transcript extraction (99%+ success rate)
  • YouTube Data API v3: Official Google API for video metadata
  • Crawlee Framework: Enterprise-grade web crawling with queue management
  • Node.js 18+: Fast async processing and API handling

Why These Technologies?

  • API-Only: No browser overhead, 3x faster than traditional scrapers
  • No Blocking: Official APIs never trigger YouTube's bot detection
  • High Reliability: 99%+ success rate with automatic fallbacks
  • Cost-Effective: Optimal balance of speed, quality, and cost
  • Scalable: Handle high-volume requests efficiently

📋 Best Practices

  1. Start Small: Test with 3-5 videos before bulk processing
  2. Check Availability: Not all videos have transcripts (check hasTranscript field)
  3. Language Auto-Detection: Transcripts are extracted in the video's original language
  4. Paragraph Formatting: Automatic paragraph breaks make transcripts more readable
  5. Subtitle Generation: Use segments with timestamps for creating SRT/VTT files
  6. Export Formats: Download as JSON, CSV, or Excel for further analysis
  7. Filter Results: Filter by hasTranscript: true for analysis workflows
  8. Batch Processing: Process videos in batches of 100-500 for optimal performance
  9. Monitor Costs: Track Supadata API usage (30,000 credits/month on Mega plan)
  10. Backup Data: Save extracted transcripts for future use
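Practices 2, 7, and 8 can be combined in a small script: split a long reference list into input objects of a fixed batch size, then keep only records with `hasTranscript: true` once results are back. This is a hypothetical sketch: it assumes dataset items have been loaded as Python dicts (for example from a JSON export), uses only the documented `videoRefs` input key, and leaves out how each run is started.

```python
from typing import Any, Dict, Iterable, Iterator, List


def batched_inputs(video_refs: List[str], batch_size: int = 100) -> Iterator[Dict[str, List[str]]]:
    """Yield one actor input object ({"videoRefs": [...]}) per batch of refs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(video_refs), batch_size):
        yield {"videoRefs": video_refs[start : start + batch_size]}


def with_transcripts(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only dataset records where a transcript was extracted."""
    return [record for record in records if record.get("hasTranscript") is True]
```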

📜 Version

v2.5.0 - Production Ready

Current Features:

  • ✅ 15 comprehensive fields per video
  • ✅ Paragraph formatting with natural breaks
  • ✅ Multi-language support (30+ languages)
  • ✅ Millisecond-precision timing
  • ✅ API-only architecture (no browser)
  • ✅ Formatted language display ("English (en)")
  • ✅ Lightning-fast extraction (1-2 seconds per video)
  • ✅ 99%+ reliability with official APIs

🤝 Compliance

  • Intended for legitimate content analysis, SEO, accessibility, and research
  • Extracts only publicly available YouTube transcript data
  • Uses official APIs for data access
  • Designed for content repurposing and business intelligence
  • Respects YouTube's Terms of Service
  • Users responsible for compliance with applicable laws in their jurisdiction

💬 Support

  • Issues: Report via Apify support or GitHub
  • Feature Requests: Contact us with your use case
  • Documentation: Comprehensive examples and guides included

Built with ❤️ for lightning-fast transcript extraction and content analysis