Changelog
All notable changes to the YouTube Transcript Scraper will be documented in this file.
The format is based on Keep a Changelog,
and this project adheres to Semantic Versioning.
[1.0.0] - 2024-12-19
Added
- Complete YouTube Transcript API implementation in Node.js
- Support for multiple output formats (JSON, SRT, WebVTT, Text)
- Language fallback support with priority ordering
- Transcript translation functionality
- Comprehensive error handling for various YouTube API errors
- Proxy support to work around IP blocking
- Batch processing of multiple videos
- State persistence for long-running jobs
- Support for different YouTube URL formats (watch, youtu.be, shorts)
- Detection of both automatically generated and manually created transcripts
- Option to preserve HTML formatting in transcript text
- Extensive documentation and examples
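The supported URL formats (watch, youtu.be, shorts) all have to be normalized to a bare video ID before fetching. The sketch below is a minimal illustration, not this package's code: `extractVideoId` is a hypothetical name, and it assumes video IDs are 11 characters drawn from letters, digits, `-` and `_`.

```javascript
// Hypothetical helper (not necessarily this package's API).
// Assumption: a video ID is 11 chars of [A-Za-z0-9_-].
const ID_PATTERN = /^[A-Za-z0-9_-]{11}$/;

function extractVideoId(input) {
  // A bare video ID is returned unchanged.
  if (ID_PATTERN.test(input)) return input;

  let url;
  try {
    url = new URL(input);
  } catch {
    return null; // neither a URL nor a bare ID
  }

  // Treat www.youtube.com and m.youtube.com like youtube.com.
  const host = url.hostname.replace(/^(www\.|m\.)/, '');
  let candidate = null;

  if (host === 'youtu.be') {
    // https://youtu.be/<id>
    candidate = url.pathname.slice(1).split('/')[0];
  } else if (host === 'youtube.com') {
    if (url.pathname === '/watch') {
      // https://www.youtube.com/watch?v=<id>
      candidate = url.searchParams.get('v');
    } else if (url.pathname.startsWith('/shorts/')) {
      // https://www.youtube.com/shorts/<id>
      candidate = url.pathname.split('/')[2];
    }
  }

  return candidate && ID_PATTERN.test(candidate) ? candidate : null;
}

console.log(extractVideoId('https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=42')); // prints dQw4w9WgXcQ
console.log(extractVideoId('https://youtu.be/dQw4w9WgXcQ')); // prints dQw4w9WgXcQ
console.log(extractVideoId('https://youtube.com/shorts/dQw4w9WgXcQ')); // prints dQw4w9WgXcQ
```

Unrecognized hosts and paths return `null` rather than guessing, so callers can report a clear input error.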
Features
- Core API: Full implementation of YouTube transcript fetching
- Language Support: Multi-language with fallback mechanism
- Output Formats: JSON, SRT, WebVTT, and plain text
- Translation: Built-in transcript translation support
- Error Handling: Comprehensive error types and messages
- Proxy Support: Built-in proxy configuration to work around IP blocking
- Batch Processing: Process multiple videos in a single run
- State Management: Persistent state for long-running operations
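To make the SRT and WebVTT output formats concrete, here is a minimal sketch of how transcript segments could be serialized. It assumes each segment is `{ text, start, duration }` with times in seconds (mirroring the fields of the Python library's transcript snippets). `toSrt`, `toWebVtt`, and `formatTimestamp` are hypothetical helpers, not necessarily the package's own `toSRT`/`toWebVTT` methods.

```javascript
// Hypothetical formatters; the segment shape { text, start, duration } (seconds) is an assumption.
function formatTimestamp(seconds, decimalSeparator) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)}${decimalSeparator}${pad(ms, 3)}`;
}

// SRT: numbered cues, comma before milliseconds, blank line between cues.
function toSrt(segments) {
  return segments
    .map((seg, i) => {
      const start = formatTimestamp(seg.start, ',');
      const end = formatTimestamp(seg.start + seg.duration, ',');
      return `${i + 1}\n${start} --> ${end}\n${seg.text}\n`;
    })
    .join('\n');
}

// WebVTT: "WEBVTT" header, period before milliseconds, no cue numbers required.
function toWebVtt(segments) {
  const cues = segments.map((seg) => {
    const start = formatTimestamp(seg.start, '.');
    const end = formatTimestamp(seg.start + seg.duration, '.');
    return `${start} --> ${end}\n${seg.text}\n`;
  });
  return `WEBVTT\n\n${cues.join('\n')}`;
}

const segments = [
  { text: 'First line', start: 0.0, duration: 1.5 },
  { text: 'Second line', start: 1.5, duration: 2.25 },
];
console.log(toSrt(segments));
console.log(toWebVtt(segments));
```

Both formats share the same cue timing logic; only the header, cue numbering, and millisecond separator differ.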
Technical Implementation
- Built on Apify SDK for actor management
- Uses got-scraping for HTTP requests with browser-like headers to reduce bot detection
- JSDOM for HTML parsing and JavaScript execution
- tough-cookie for cookie management
- xmldom for XML transcript parsing
- Comprehensive error handling and retry logic
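The retry logic is not specified further here. One common approach is retrying with exponential backoff, and the following is a sketch under that assumption: `withRetry` and its `retries`, `baseDelayMs`, and `shouldRetry` options are hypothetical, not this package's API.

```javascript
// Hypothetical helper: withRetry and its options are illustrative only.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(fn, { retries = 3, baseDelayMs = 500, shouldRetry = () => true } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      // Stop on the final attempt or on errors that retrying cannot fix.
      if (attempt === retries || !shouldRetry(err)) break;
      // Exponential backoff: base, 2 * base, 4 * base, ...
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
}

// Usage: a call that fails twice with a transient error, then succeeds.
(async () => {
  let calls = 0;
  const result = await withRetry(async () => {
    calls += 1;
    if (calls < 3) throw new Error('transient failure');
    return 'ok';
  }, { baseDelayMs: 10 });
  console.log(result, calls); // prints "ok 3"
})();
```

`shouldRetry` lets permanent failures (for example, a video with transcripts disabled) fail fast instead of waiting through every backoff step.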
Documentation
- Complete README with usage examples
- Input/output format documentation
- Error handling guide
- Language code reference
- Development and deployment instructions
Examples
- Basic usage examples
- Multi-language configuration
- Translation examples
- Different output format examples
- Batch processing examples
- Test script for local development
Migration from Python Version
This Node.js implementation closely follows the API of the Python youtube-transcript-api library:
Similar API Structure
Node.js:
```javascript
// Assumes YouTubeTranscriptApi is imported from this package and the code runs in an async context
const api = new YouTubeTranscriptApi();
const transcript = await api.fetch(videoId, ['en']);
```
Python (youtube-transcript-api):
```python
from youtube_transcript_api import YouTubeTranscriptApi

api = YouTubeTranscriptApi()
transcript = api.fetch(video_id, languages=['en'])
```
Key Differences
- Async/await pattern instead of synchronous calls
- JavaScript object notation instead of Python dataclasses
- Built-in output formatting methods (toSRT, toWebVTT, etc.)
- Apify-specific optimizations for web scraping
Feature Parity
- ✅ Transcript fetching with language support
- ✅ Transcript listing functionality
- ✅ Translation support
- ✅ Error handling
- ✅ Proxy support
- ✅ HTML formatting preservation
- ✅ Multiple output formats (enhanced)
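Language fallback with priority ordering can be read the same way as the Python library's `languages` list: codes are tried in order and the first available match wins. A minimal sketch, assuming a hypothetical track shape `{ languageCode, isGenerated }`, a hypothetical `pickTranscript` helper, and a hypothetical `NoTranscriptFoundError`. Preferring a manually created track over an auto-generated one for the same code is a design choice of this sketch.

```javascript
// Hypothetical sketch: names and the track shape are assumptions, not this package's API.
class NoTranscriptFoundError extends Error {
  constructor(requested) {
    super(`No transcript found for any of: ${requested.join(', ')}`);
    this.name = 'NoTranscriptFoundError';
  }
}

function pickTranscript(tracks, languages) {
  // Walk the requested codes in priority order.
  for (const code of languages) {
    // Prefer a manually created track, then fall back to an auto-generated one.
    const manual = tracks.find((t) => t.languageCode === code && !t.isGenerated);
    if (manual) return manual;
    const generated = tracks.find((t) => t.languageCode === code && t.isGenerated);
    if (generated) return generated;
  }
  throw new NoTranscriptFoundError(languages);
}

const tracks = [
  { languageCode: 'en', isGenerated: true },
  { languageCode: 'de', isGenerated: false },
];
console.log(pickTranscript(tracks, ['fr', 'de', 'en']).languageCode); // prints de
```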
Enhanced Features
- 🆕 Built-in output formatters (SRT, WebVTT, Text)
- 🆕 Apify actor integration
- 🆕 Batch processing capabilities
- 🆕 State persistence
- 🆕 Enhanced error messages
- 🆕 Better proxy handling
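Batch processing can be sketched as a bounded worker pool, so a long list of videos does not open an unlimited number of concurrent requests. This is an illustrative sketch, not the actor's implementation: `processBatch` and its default concurrency of 3 are hypothetical, and each result records success or failure so one failing video does not abort the batch.

```javascript
// Hypothetical sketch: processBatch is illustrative, not this package's API.
async function processBatch(items, worker, concurrency = 3) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until none are left.
  // `next++` is safe because JavaScript runs this code on a single thread.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { ok: true, value: await worker(items[index], index) };
      } catch (error) {
        // Record the failure and keep going with the rest of the batch.
        results[index] = { ok: false, error };
      }
    }
  }

  const runners = Array.from({ length: Math.min(concurrency, items.length) }, runner);
  await Promise.all(runners);
  return results;
}

// Usage: the third item fails, the others succeed, results keep input order.
(async () => {
  const ids = ['a', 'b', 'c', 'd'];
  const results = await processBatch(ids, async (id) => {
    if (id === 'c') throw new Error('no transcript');
    return id.toUpperCase();
  }, 2);
  console.log(results.map((r) => (r.ok ? r.value : r.error.message)));
  // prints [ 'A', 'B', 'no transcript', 'D' ]
})();
```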