Extract Youtube Transcript

Deprecated

Pricing

$5.00 / 1,000 results

Developed by Jimmy

Maintained by Community

Extract YouTube video captions & subtitles with multi-language support. Get JSON, text, SRT, or WebVTT formats. Flexible timestamp options, metadata control, and robust error handling. Perfect for content creators, researchers, and accessibility needs. 🎬✨

YouTube Transcript Extractor

An Apify actor that extracts captions and subtitles from YouTube videos, with multi-language support and four output formats (JSON, plain text, SRT, and WebVTT).

🚀 Live Actor: https://apify.com/jimbiano/extract-youtube-transcript

Features

  • ✅ Extract transcripts from YouTube videos using video URL or ID
  • ✅ Support for multiple languages with automatic detection
  • ✅ Multiple output formats: JSON, plain text, SRT, and WebVTT
  • ✅ Handle both manual and auto-generated captions
  • ✅ Structured error reporting with typed error codes
  • ✅ Preserve or strip HTML formatting
  • ✅ Optional metadata and configurable timestamps
  • ✅ Translation support (when available)

Input Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| videoUrl | string | No* | YouTube video URL (e.g., https://www.youtube.com/watch?v=dQw4w9WgXcQ) |
| videoId | string | No* | YouTube video ID (e.g., dQw4w9WgXcQ) |
| languages | array | No | Preferred language codes (e.g., ["en", "es", "fr"]) |
| includeGenerated | boolean | No | Include auto-generated transcripts (default: true) |
| includeManual | boolean | No | Include manually created transcripts (default: true) |
| outputFormat | string | No | Output format: json, text, srt, or vtt (default: json) |
| preserveFormatting | boolean | No | Preserve HTML formatting (default: false) |
| translateTo | string | No | Language code to translate the transcript into |
| includeTimestamps | boolean | No | Include timing information (default: true) |
| timestampFormat | string | No | Timestamp format: start_only, start_end, or all (default: start_only) |
| includeMetadata | boolean | No | Include metadata information (default: false) |
| simplifiedOutput | boolean | No | Return only the transcript array (default: false) |

*Either videoUrl or videoId must be provided.
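The defaults and the either/or rule in the table above can be expressed as a small validation step. This is an illustrative sketch only: `normalizeInput` and the constant names are hypothetical and are not taken from the actor's source.

```javascript
// Hypothetical helper: applies the documented defaults and enforces that
// at least one of videoUrl / videoId is present. Not the actor's own code.
const DEFAULTS = {
  includeGenerated: true,
  includeManual: true,
  outputFormat: 'json',
  preserveFormatting: false,
  includeTimestamps: true,
  timestampFormat: 'start_only',
  includeMetadata: false,
  simplifiedOutput: false,
};

const OUTPUT_FORMATS = ['json', 'text', 'srt', 'vtt'];
const TIMESTAMP_FORMATS = ['start_only', 'start_end', 'all'];

function normalizeInput(raw) {
  if (!raw.videoUrl && !raw.videoId) {
    throw new Error('Either videoUrl or videoId must be provided.');
  }
  // Later spread properties win, so user-supplied values override defaults.
  const input = { ...DEFAULTS, ...raw };
  if (!OUTPUT_FORMATS.includes(input.outputFormat)) {
    throw new Error(`Unsupported outputFormat: ${input.outputFormat}`);
  }
  if (!TIMESTAMP_FORMATS.includes(input.timestampFormat)) {
    throw new Error(`Unsupported timestampFormat: ${input.timestampFormat}`);
  }
  return input;
}
```

For example, `normalizeInput({ videoId: 'dQw4w9WgXcQ' })` yields an object with `outputFormat: 'json'` and `timestampFormat: 'start_only'`, matching the defaults listed in the table.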

Supported URL Formats

The actor supports various YouTube URL formats:

  • Standard: https://www.youtube.com/watch?v=VIDEO_ID
  • Short: https://youtu.be/VIDEO_ID
  • Embed: https://www.youtube.com/embed/VIDEO_ID
  • Shorts: https://www.youtube.com/shorts/VIDEO_ID
  • Video ID only: VIDEO_ID
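A client-side sketch of how the URL shapes listed above might be reduced to a video ID. The helper name `extractVideoId` is hypothetical, and it assumes IDs are 11 characters drawn from `[A-Za-z0-9_-]`, which is stricter than whatever check the actor itself applies.

```javascript
// Hypothetical helper: returns the video ID for each URL shape listed
// above, or null if none matches. Assumes 11-character IDs.
const ID_PATTERN = /^[A-Za-z0-9_-]{11}$/;

function extractVideoId(input) {
  const value = input.trim();
  // Bare video ID.
  if (ID_PATTERN.test(value)) return value;

  let url;
  try {
    url = new URL(value);
  } catch {
    return null; // Neither a bare ID nor a parseable URL.
  }

  const host = url.hostname.replace(/^www\./, '');
  let candidate = null;
  if (host === 'youtu.be') {
    // Short: https://youtu.be/VIDEO_ID
    candidate = url.pathname.slice(1);
  } else if (host === 'youtube.com') {
    if (url.pathname === '/watch') {
      // Standard: https://www.youtube.com/watch?v=VIDEO_ID
      candidate = url.searchParams.get('v');
    } else {
      // Embed and Shorts: /embed/VIDEO_ID and /shorts/VIDEO_ID
      const match = url.pathname.match(/^\/(?:embed|shorts)\/([^/?#]+)/);
      if (match) candidate = match[1];
    }
  }
  return candidate && ID_PATTERN.test(candidate) ? candidate : null;
}
```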

Output Format

JSON Output Examples

Default Output (start timestamp only, no metadata)

{
  "success": true,
  "videoId": "dQw4w9WgXcQ",
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "transcript": [
    {
      "text": "We're no strangers to love",
      "start": 0.5
    },
    {
      "text": "You know the rules and so do I",
      "start": 2.8
    }
  ]
}

With All Timestamps and Metadata

{
  "success": true,
  "videoId": "dQw4w9WgXcQ",
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "transcript": [
    {
      "text": "We're no strangers to love",
      "start": 0.5,
      "end": 2.8,
      "duration": 2.3,
      "offset": 0.5
    },
    {
      "text": "You know the rules and so do I",
      "start": 2.8,
      "end": 4.9,
      "duration": 2.1,
      "offset": 2.8
    }
  ],
  "metadata": {
    "videoId": "dQw4w9WgXcQ",
    "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "extractedAt": "2024-01-15T10:30:00.000Z",
    "totalSegments": 156,
    "totalDuration": 212.5,
    "language": "en",
    "preserveFormatting": false,
    "outputFormat": "json",
    "includeTimestamps": true,
    "timestampFormat": "all",
    "includeMetadata": true
  }
}
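The two JSON examples differ only in which timing fields each segment carries. The sketch below reproduces those shapes; it assumes each raw segment has `text`, `start`, and `duration` in seconds, and the `start_end` shape (`text`, `start`, `end`) is inferred from the parameter name rather than shown in an example. `shapeSegment` is a hypothetical name.

```javascript
// Hypothetical helper mirroring the JSON examples above.
// Assumes each raw segment carries `text`, `start` and `duration` in seconds.
function shapeSegment(raw, { includeTimestamps = true, timestampFormat = 'start_only' } = {}) {
  const segment = { text: raw.text };
  if (!includeTimestamps) return segment;

  segment.start = raw.start;
  if (timestampFormat === 'start_end' || timestampFormat === 'all') {
    segment.end = round(raw.start + raw.duration);
  }
  if (timestampFormat === 'all') {
    segment.duration = raw.duration;
    segment.offset = raw.start; // Same value as start in the example output.
  }
  return segment;
}

// Rounds to milliseconds to hide floating-point noise such as
// 0.1 + 0.2 = 0.30000000000000004.
function round(seconds) {
  return Math.round(seconds * 1000) / 1000;
}
```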

Simplified Output (transcript array only)

[
  {
    "text": "We're no strangers to love",
    "start": 0.5
  },
  {
    "text": "You know the rules and so do I",
    "start": 2.8
  }
]

Error Output

{
  "success": false,
  "videoId": "invalid123",
  "videoUrl": "https://www.youtube.com/watch?v=invalid123",
  "error": "No transcript found for this video",
  "errorType": "NO_TRANSCRIPT_AVAILABLE",
  "extractedAt": "2024-01-15T10:30:00.000Z"
}

Usage Examples

Basic Usage

{
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
}

Extract Specific Languages

{
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "languages": ["en", "es"],
  "outputFormat": "json"
}
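One plausible way to resolve `languages` together with `includeManual` and `includeGenerated` is sketched below: walk the preferred codes in order and prefer manual captions over auto-generated ones. The track shape `{ languageCode, isGenerated }`, the helper name `selectTrack`, and the "return null when nothing matches" rule are all assumptions, not documented actor behavior.

```javascript
// Hypothetical helper: picks one caption track for the requested languages.
// Assumes each track looks like { languageCode: 'en', isGenerated: false }.
function selectTrack(tracks, { languages = [], includeManual = true, includeGenerated = true } = {}) {
  const allowed = tracks.filter((t) => (t.isGenerated ? includeGenerated : includeManual));
  // Manual captions first; the stable sort keeps the original order otherwise.
  const ranked = [...allowed].sort((a, b) => Number(a.isGenerated) - Number(b.isGenerated));

  if (languages.length === 0) return ranked[0] ?? null;
  for (const code of languages) {
    const match = ranked.find((t) => t.languageCode === code);
    if (match) return match;
  }
  return null; // None of the preferred languages is available.
}
```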

Generate SRT Subtitles

{
  "videoId": "dQw4w9WgXcQ",
  "outputFormat": "srt",
  "includeGenerated": true
}
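SRT and WebVTT differ mainly in the cue header and the millisecond separator (`,` in SRT, `.` in WebVTT). The serializers below are a sketch assuming segments shaped like `{ text, start, duration }` in seconds; `formatTime`, `toSrt`, and `toVtt` are hypothetical names, not the actor's functions.

```javascript
// Hypothetical serializers; assumes segments shaped like
// { text, start, duration } with times in seconds.
function formatTime(seconds, separator) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)}${separator}${pad(ms, 3)}`;
}

function toSrt(segments) {
  return segments
    .map((seg, i) => {
      const start = formatTime(seg.start, ',');
      const end = formatTime(seg.start + seg.duration, ',');
      // SRT cues are numbered from 1 and separated by a blank line.
      return `${i + 1}\n${start} --> ${end}\n${seg.text}\n`;
    })
    .join('\n');
}

function toVtt(segments) {
  const cues = segments.map((seg) => {
    const start = formatTime(seg.start, '.');
    const end = formatTime(seg.start + seg.duration, '.');
    return `${start} --> ${end}\n${seg.text}\n`;
  });
  // WebVTT files must begin with the "WEBVTT" signature line.
  return ['WEBVTT\n', ...cues].join('\n');
}
```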

Transcript Without Timestamps

{
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "includeTimestamps": false,
  "outputFormat": "json"
}

Simplified Array Output

{
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "simplifiedOutput": true,
  "timestampFormat": "start_only"
}

Full Information with Metadata

{
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "timestampFormat": "all",
  "includeMetadata": true,
  "includeTimestamps": true
}

Plain Text Output

{
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "outputFormat": "text",
  "preserveFormatting": false
}
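With `preserveFormatting` set to false, inline markup in caption text has to be removed before segments are joined into plain text. A minimal sketch, assuming captions may contain simple tags like `<i>` and a handful of HTML entities; the helper name `toPlainText` and the space separator are assumptions, and a real HTML parser would be more robust than these regexes.

```javascript
// Hypothetical helper for outputFormat "text". With preserveFormatting=false,
// tags such as <i>...</i> are stripped and a few common entities decoded.
const ENTITIES = { '&amp;': '&', '&lt;': '<', '&gt;': '>', '&quot;': '"', '&#39;': "'" };

function toPlainText(segments, { preserveFormatting = false } = {}) {
  return segments
    .map((seg) => {
      if (preserveFormatting) return seg.text;
      // Strip tags first, then decode entities, so "&lt;b&gt;" survives as literal text.
      return seg.text
        .replace(/<[^>]*>/g, '')
        .replace(/&(?:amp|lt|gt|quot|#39);/g, (entity) => ENTITIES[entity]);
    })
    .join(' ');
}
```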

Error Handling

When extraction fails, the actor returns an error object (see Error Output above) with success set to false and an errorType from the following list:

  • TRANSCRIPTS_DISABLED: Video has transcripts disabled
  • NO_TRANSCRIPT_AVAILABLE: No transcripts found for the video
  • VIDEO_UNAVAILABLE: Video is private or unavailable
  • AGE_RESTRICTED: Video is age-restricted
  • NETWORK_ERROR: Network connectivity issues
  • RATE_LIMITED: Too many requests (temporary)
  • UNKNOWN_ERROR: Other unexpected errors
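A caller can branch on `success` and `errorType` to decide whether a failed video is worth retrying. In this sketch, `classifyResult` is a hypothetical name, and treating NETWORK_ERROR as transient and UNKNOWN_ERROR as permanent are assumptions; only RATE_LIMITED is described above as temporary.

```javascript
// Hypothetical client-side handling of one result item.
// RATE_LIMITED is documented as temporary; NETWORK_ERROR is assumed transient.
const RETRYABLE = new Set(['RATE_LIMITED', 'NETWORK_ERROR']);

function classifyResult(item) {
  if (item.success) {
    return { status: 'ok', transcript: item.transcript };
  }
  if (RETRYABLE.has(item.errorType)) {
    return { status: 'retry', reason: item.error };
  }
  // TRANSCRIPTS_DISABLED, NO_TRANSCRIPT_AVAILABLE, VIDEO_UNAVAILABLE,
  // AGE_RESTRICTED and UNKNOWN_ERROR are treated as permanent here.
  return { status: 'failed', errorType: item.errorType, reason: item.error };
}
```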

Local Development

Prerequisites

  • Node.js 18 or higher
  • npm or yarn

Setup

  1. Clone the repository
  2. Install dependencies:
    $ npm install
  3. Run tests:
    $ npm test
  4. Run the actor locally:
    $ npm start

Testing

The actor includes a test suite that validates core functionality:

$ node src/test.js

API Usage

Using Apify API

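// apify-client exports ApifyClient as a named export.
const { ApifyClient } = require('apify-client');

const client = new ApifyClient({
  token: 'YOUR_APIFY_TOKEN',
});

const input = {
  videoUrl: 'https://www.youtube.com/watch?v=dQw4w9WgXcQ',
  languages: ['en'],
  outputFormat: 'json',
  timestampFormat: 'start_only',
  includeMetadata: false,
};

// CommonJS modules do not support top-level await, so wrap the calls.
(async () => {
  // Start the actor and wait for the run to finish.
  const run = await client.actor('jimbiano/extract-youtube-transcript').call(input);
  // Read the results from the run's default dataset.
  const { items } = await client.dataset(run.defaultDatasetId).listItems();
  console.log(items[0]);
})().catch(console.error);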

Using cURL

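curl -X POST "https://api.apify.com/v2/acts/jimbiano~extract-youtube-transcript/runs" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "outputFormat": "json",
    "timestampFormat": "start_only",
    "includeMetadata": false
  }'

In API paths, the "/" between the username and the actor name is written as "~". This request starts the run and returns the run object immediately; the transcript is written to the run's default dataset.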
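To start the run and receive the results in a single request, Apify also offers a run-sync-get-dataset-items endpoint. The sketch below calls it with Node 18's built-in `fetch`; the helper names and reading the token from an `APIFY_TOKEN` environment variable are assumptions of this example, and it assumes default (non-simplified) output, where the first dataset item holds the result object.

```javascript
// Sketch using Node 18's built-in fetch and Apify's run-sync-get-dataset-items
// endpoint. Helper names and the APIFY_TOKEN variable are assumptions.
const ACTOR_ID = 'jimbiano~extract-youtube-transcript'; // "~" replaces "/" in API paths

function buildRunSyncUrl(actorId) {
  return `https://api.apify.com/v2/acts/${actorId}/run-sync-get-dataset-items`;
}

async function fetchTranscript(input, token = process.env.APIFY_TOKEN) {
  const response = await fetch(buildRunSyncUrl(ACTOR_ID), {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(input),
  });
  if (!response.ok) {
    throw new Error(`Apify API returned HTTP ${response.status}`);
  }
  // The endpoint responds with the run's dataset items as a JSON array.
  const items = await response.json();
  return items[0];
}

// Usage (not executed here):
// fetchTranscript({ videoUrl: 'https://www.youtube.com/watch?v=dQw4w9WgXcQ' })
//   .then((item) => console.log(item));
```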

Using Apify Console

  1. Visit: https://apify.com/jimbiano/extract-youtube-transcript
  2. Click "Try for free"
  3. Enter your input:
    {
    "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
    }
  4. Click "Start" to run the actor

Limitations

  • Depends on YouTube's transcript availability
  • Some videos may not have transcripts
  • Rate limiting may apply for high-volume usage
  • Age-restricted videos may require authentication

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Submit a pull request

License

MIT License - see LICENSE file for details

Support

For issues and questions:

  • Create an issue on GitHub
  • Contact support through Apify Console
  • Check the Apify documentation

Changelog

v1.1.0

  • NEW: Optional timestamp control (includeTimestamps, timestampFormat)
  • NEW: Optional metadata inclusion (includeMetadata)
  • NEW: Simplified output option (simplifiedOutput)
  • NEW: Flexible timestamp formats: start_only (default), start_end, all
  • IMPROVED: More granular control over output structure
  • IMPROVED: Better default settings for cleaner output

v1.0.0

  • Initial release
  • Basic transcript extraction
  • Multi-language support
  • Multiple output formats
  • Error handling