
YouTube Transcript Scraper
Pricing
$3.00 / 1,000 transcripts
A cheap, reliable YouTube transcript scraper.
Total users: 6
Monthly users: 6
Last modified: 12 days ago
This project is a YouTube Transcript Scraper that retrieves transcripts for YouTube videos using the youtube-transcript-api library. It supports proxy management via WebShare, rate-limit handling, and concurrent processing of multiple videos. When YouTube blocks requests, the scraper rotates proxies so that processing can continue.
Features
- Retrieve Transcripts: Fetch transcripts for YouTube videos in multiple formats (JSON, SRT, VTT, or plain text).
- Proxy Management: Use WebShare proxies to bypass IP bans and rate limits.
- Concurrency Control: Process multiple videos concurrently with configurable concurrency limits.
- Retry Mechanism: Automatically retry failed requests up to a configurable number of attempts.
- Rate Limiting Handling: Detect and handle rate limits by switching proxies and adding cooldown periods.
- Custom Logging: Minimal logging in non-debug mode with detailed logs available in debug mode.
- Language Preference: Specify preferred languages for transcripts.
Input Parameters
Parameter | Type | Description |
---|---|---|
videoUrls | List | List of YouTube video URLs to process. |
language | String | Preferred language for transcripts (e.g., en for English). |
includeAutoGenerated | Boolean | Whether to include auto-generated transcripts. |
outputFormat | String | Output format for transcripts (json, txt, srt, or vtt). |
maxConcurrency | Integer | Maximum number of videos to process concurrently. |
minDelayBetweenVideos | Integer | Minimum delay (in seconds) between processing batches of videos. |
cooldownPeriod | Integer | Cooldown period (in seconds) when rate limits are detected. |
youtubeApiKey | String | YouTube Data API key for fetching video metadata (optional). |
useWebshareProxy | Boolean | Whether to route requests through WebShare proxies. |
debugMode | Boolean | Enable debug mode for detailed logging. |
Example Input
```json
{
  "videoUrls": ["https://youtu.be/oFqwcngsts8", "https://youtu.be/another_video_id"],
  "language": "en",
  "includeAutoGenerated": true,
  "outputFormat": "json",
  "maxConcurrency": 3,
  "minDelayBetweenVideos": 5,
  "cooldownPeriod": 60,
  "youtubeApiKey": "YOUR_YOUTUBE_API_KEY",
  "useWebshareProxy": true,
  "debugMode": false
}
```
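The input accepts short youtu.be links, while each output record reports a videoId and a full watch URL. A minimal sketch of that mapping is shown here, assuming only the two URL shapes that appear in this README. The helper names extract_video_id and canonical_url are hypothetical and are not taken from the scraper's code.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a youtu.be or youtube.com/watch URL, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None
    if host in ("youtube.com", "m.youtube.com") and parsed.path == "/watch":
        # Standard watch links carry the ID in the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


def canonical_url(video_id: str) -> str:
    """Build the full watch URL used in the output records."""
    return f"https://www.youtube.com/watch?v={video_id}"


if __name__ == "__main__":
    vid = extract_video_id("https://youtu.be/oFqwcngsts8")
    print(vid, canonical_url(vid))
```

Other URL shapes (embeds, shorts, playlists) are deliberately left out of this sketch and return None.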
Output
```json
[
  {
    "videoId": "oFqwcngsts8",
    "url": "https://www.youtube.com/watch?v=oFqwcngsts8",
    "success": true,
    "language": "en",
    "transcript": [
      { "start": 0.2, "duration": 1.233, "text": "Hello there." },
      { "start": 1.433, "duration": 2.7, "text": "You see that? Those are three" },
      { "start": 4.133, "duration": 3, "text": "AI agents all chasing a cuboid." },
      { "start": 7.7, "duration": 3.783, "text": "Now, they have already been trained\non this channel before." },
      { "start": 12.416, "duration": 2.6, "text": "And today we have something" },
      { "start": 15.016, "duration": 3, "text": "even more exciting." },
      { "start": 24.016, "duration": 1.867, "text": "This is a centipede." },
      { "start": 25.883, "duration": 3.717, "text": "A multi-legged organism\nbelonging to family myriapods." },
      { "start": 29.633, "duration": 5.45, "text": "At the moment, it is nothing\nbut a 3D model capable of nothingness." },
      { "start": 35.1, "duration": 3.116, "text": "In order to bring it alive,\nit needs a brain." },
      { "start": 38.25, "duration": 3, "text": "Ladies and gentlemen, let's call the brain" },
      { "start": 41.25, "duration": 3, "text": "a neural network." },
      { "start": 48.616, "duration": 0.734, "text": "Today" },
      { "start": 49.35, "duration": 4.383, "text": "we are using a network with 2048 nodes\nand two layers." },
      { "start": 53.75, "duration": 2.7, "text": "We will train it\nusing a classic reinforcement" },
      { "start": 56.45, "duration": 3.033, "text": "algorithm called proximal policy\noptimization." },
      { "start": 59.516, "duration": 1.65, "text": "Now hold your horses, folks." },
      { "start": 61.166, "duration": 4.384, "text": "You see, this centipede\nmodel is very clean and realistic." },
      { "start": 65.566, "duration": 5.05, "text": "Too clean, I must say. The first prototype\nwas nowhere near this accurate." },
      { "start": 70.65, "duration": 3.833, "text": "In fact, it was an absolute nightmare." },
      { "start": 79.5, "duration": 2.4, "text": "Ladies and gentlemen, introducing." },
      { "start": 81.9, "duration": 0.916, "text": "He don't bite." },
      { "start": 82.816, "duration": 4.667, "text": "This mutation has seven body segments\nand 14 elongated limbs." },
      { "start": 87.5, "duration": 3.716, "text": "We will now attempt to train this creature\nto walk." },
      { "start": 91.25, "duration": 4.716, "text": "To achieve this, we will give the AI\nfull body control over all of its limbs." },
      { "start": 95.966, "duration": 3, "text": "Precisely two degrees of freedom\nper leg in the" },
      { "start": 98.966, "duration": 3, "text": "Y and Z rotational axis is." },
      { "start": 102.35, "duration": 3.4, "text": "We will reward the\nAI for matching the arrows direction" },
      { "start": 105.75, "duration": 3.733, "text": "and maintaining velocity\nas fast as it possibly can." },
      { "start": 109.5, "duration": 3.05, "text": "For self-awareness,\nwe will feed the neural network" },
      { "start": 112.55, "duration": 3.25, "text": "the XYZ,\nthe angles of each controllable bone." },
      { "start": 115.8, "duration": 2.75, "text": "This helps the agent understand\nits current pose." },
      { "start": 118.55, "duration": 3.216, "text": "The angles will be stacked twice,\nso the AI can also" },
      { "start": 121.766, "duration": 3, "text": "infer angular velocities over time." },
      { "start": 125.666, "duration": 3.05, "text": "Each leg is equipped\nwith a ground detection raycast." },
      { "start": 128.716, "duration": 3.9, "text": "This provides the agent\nlive data of its distance to the ground." },
      { "start": 133.933, "duration": 0.667, "text": "Gentlemen and" },
      { "start": 134.6, "duration": 3.333, "text": "ladies, we are now ready\nfor our first training session." },
      { "start": 137.933, "duration": 5.283, "text": "Let's find out how such a creature\nwould walk if one were to exist in nature." },
      { "start": 230.116, "duration": 1.35, "text": "Well, that was dramatic." },
      { "start": 231.466, "duration": 1.784, "text": "Do you know what's even more dramatic?" },
      { "start": 233.25, "duration": 3, "text": "The second prototype." },
      { "start": 237.983, "duration": 1.35, "text": "Our second prototype" },
      { "start": 239.333, "duration": 4.117, "text": "is much closer to a real life\ncentipede, boasting 21" },
      { "start": 243.45, "duration": 4.033, "text": "body segments\nwith a whopping 42 individual legs." },
      { "start": 247.5, "duration": 4.466, "text": "We will train this prototype\nusing similar methods as the first one." },
      { "start": 251.966, "duration": 3.384, "text": "Comrades, let's now train\na real centipede." },
      { "start": 332.416, "duration": 1.3, "text": "Congratulations!" },
      { "start": 333.716, "duration": 5, "text": "Our AI now exhibits locomotion\nsimilar to real life myriapods." },
      { "start": 339.233, "duration": 1.983, "text": "A biped has two legs." },
      { "start": 341.216, "duration": 3.65, "text": "Quadrupedal\nfour and myriapods have many legs." },
      { "start": 344.866, "duration": 3.234, "text": "The resulting movement\nis an excellent achievement." },
      { "start": 348.116, "duration": 3.834, "text": "Let me tell you why. You see the goal of an\nAI agent" },
      { "start": 351.95, "duration": 4.233, "text": "in a reinforcement learning\nsimulation is to maximize its rewards." },
      { "start": 356.216, "duration": 3.45, "text": "We rewarded the\nAI for matching the arrow's direction" },
      { "start": 359.666, "duration": 2.934, "text": "and most importantly, maintaining velocity." },
      { "start": 362.6, "duration": 3.483, "text": "The resulting gait is very close\nto real life centipedes," },
      { "start": 366.083, "duration": 3.417, "text": "and scientifically it's called a metachronal gait." },
      { "start": 369.5, "duration": 4.016, "text": "A metachronal gait is a wave-like pattern\nof leg movements" },
      { "start": 373.516, "duration": 2.5, "text": "that ripples across the animal's body." },
      { "start": 376.016, "duration": 3.834, "text": "The legs don't move all at once,\nbut in a sequence," },
      { "start": 379.85, "duration": 4.2, "text": "as some legs push off, others\nprepare to make contact with the ground." },
      { "start": 384.05, "duration": 2.516, "text": "This creates an illusion of a wave." },
      { "start": 386.566, "duration": 4.717, "text": "What fascinates me is,\ndespite this being a simplified simulation" },
      { "start": 391.283, "duration": 5.333, "text": "of a centipede body, the AI still learned\nto move in a metachronal manner." },
      { "start": 396.616, "duration": 1.3, "text": "Looks like evolution and" },
      { "start": 397.916, "duration": 4.917, "text": "our AI are in harmony\non how a many-legged organisms should walk." },
      { "start": 406.616, "duration": 1.384, "text": "Folks programing these" },
      { "start": 408, "duration": 3.3, "text": "AIs require knowledge of coding\nand mathematics" },
      { "start": 411.316, "duration": 3.067, "text": "concepts which can be difficult to master\non your own." },
      { "start": 414.383, "duration": 4.733, "text": "Luckily, it's 2025\nand you don't need an MIT degree" },
      { "start": 419.116, "duration": 3.067, "text": "to get started. With Brilliant's\ninteractive lessons," },
      { "start": 422.183, "duration": 2.783, "text": "Having fun\nwhile learning has never been easier," },
      { "start": 424.966, "duration": 5.2, "text": "they break down complex concepts into bite\nsized, engaging lessons, a method proven" },
      { "start": 430.166, "duration": 4.284, "text": "to be six times more effective\nfor learning than boring lecture videos." },
      { "start": 434.45, "duration": 3.183, "text": "Their programing courses\nteach you how to think in code" },
      { "start": 437.633, "duration": 4.583, "text": "rather than just memorizing syntax\nwith lessons on loops, conditionals," },
      { "start": 442.216, "duration": 4.034, "text": "and algorithms that get you building\nreal programs from day one." },
      { "start": 446.25, "duration": 5.416, "text": "For those curious about AI, Brilliant's\ncourses on how large language models work" },
      { "start": 451.666, "duration": 4.8, "text": "provides fascinating insights into modern\nAI systems, showing you everything" },
      { "start": 456.466, "duration": 4.867, "text": "from how models build vocabularies to how\nthey make predictions and generate text." },
      { "start": 461.333, "duration": 4.117, "text": "To try, everything brilliant\nhas to offer free for a full 30 days" },
      { "start": 465.45, "duration": 4.216, "text": "visit brilliant.org/cozmouz\nor scan the QR code on screen." },
      { "start": 469.666, "duration": 3.517, "text": "You will also get 20% off\nan annual premium subscription." },
      { "start": 475.583, "duration": 1.433, "text": "It was 3 a.m." },
      { "start": 477.016, "duration": 4.184, "text": "and I was done with this project,\nbut something was bugging me." },
      { "start": 481.2, "duration": 5.5, "text": "While the model we trained is decent,\nit is very slow, I want faster." },
      { "start": 486.716, "duration": 1.584, "text": "So besides leg movement," },
      { "start": 488.3, "duration": 3.933, "text": "we will now also allow the\nAI to control its body segments." },
      { "start": 492.233, "duration": 0.933, "text": "Remember folks," },
      { "start": 493.166, "duration": 4.7, "text": "you enjoy the training montage\nwith delightful music in the background." },
      { "start": 497.866, "duration": 4.334, "text": "But for the AI,\nevery second of existence is mere" },
      { "start": 502.233, "duration": 3.383, "text": "torture, a negative penalty\nevery time step." },
      { "start": 505.616, "duration": 5.9, "text": "To make things worse, we will now\nalso bombard the AI with projectiles." },
      { "start": 511.516, "duration": 3.584, "text": "Comrades, this is cozmouz signing out." }
    ]
  }
]
```
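Each transcript entry in the JSON output carries a start time, a duration, and the caption text, which is enough to derive the other output formats. The following is a rough sketch of how such entries could be rendered as SRT. The function names format_timestamp and to_srt are illustrative assumptions, and the scraper's own formatter may differ.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Dict]) -> str:
    """Render entries shaped like {"start", "duration", "text"} as SRT cues."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = seg["start"]
        end = start + seg["duration"]  # SRT needs an end time, not a duration
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{seg['text']}"
        )
    # Cues are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    sample = [
        {"start": 0.2, "duration": 1.233, "text": "Hello there."},
        {"start": 1.433, "duration": 2.7, "text": "You see that? Those are three"},
    ]
    print(to_srt(sample))
```

A VTT variant would follow the same structure, with a WEBVTT header line and a period instead of a comma before the milliseconds.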
Features and Their Purposes
1. Retrieve Transcripts
- Fetch transcripts for YouTube videos in multiple formats.
- Supports both manual and auto-generated transcripts.
2. Proxy Management
- Uses WebShare proxies to bypass IP bans and rate limits.
3. Concurrency Control
- Processes multiple videos concurrently to improve efficiency.
- Configurable maxConcurrency parameter to control the number of concurrent requests.
4. Retry Mechanism
- Retries failed requests up to 10 times (configurable).
- Handles transient errors like network issues or temporary IP bans.
5. Rate Limiting Handling
- Detects rate limits (HTTP 429) and switches proxies.
- Adds a cooldown period before retrying to avoid further bans.
6. Custom Logging
- Minimal logging in non-debug mode for cleaner output.
- Detailed logs in debug mode for troubleshooting.
7. Language Preference
- Specify preferred languages for transcripts (e.g., en for English, es for Spanish).
- Falls back to auto-generated transcripts if preferred language is unavailable.
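The concurrency limit described above can be illustrated with a short sketch. It assumes an asyncio-based design in which a semaphore caps the number of in-flight requests. process_all and the fetch callable are hypothetical names, and the scraper itself may be structured differently.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def process_all(
    video_ids: List[str],
    fetch: Callable[[str], Awaitable[list]],
    max_concurrency: int = 3,
) -> List[Dict]:
    """Run fetch(video_id) for every ID with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(video_id: str) -> Dict:
        async with semaphore:  # blocks while max_concurrency workers are active
            try:
                transcript = await fetch(video_id)
                return {"videoId": video_id, "success": True, "transcript": transcript}
            except Exception as exc:
                # One failed video should not abort the rest of the batch.
                return {"videoId": video_id, "success": False, "error": str(exc)}

    # gather returns results in the same order as the input IDs.
    return await asyncio.gather(*(worker(v) for v in video_ids))


if __name__ == "__main__":
    async def demo_fetch(video_id: str) -> list:
        # Stand-in for a real transcript request.
        await asyncio.sleep(0.1)
        return [{"start": 0.0, "duration": 1.0, "text": f"demo for {video_id}"}]

    print(asyncio.run(process_all(["a", "b", "c", "d"], demo_fetch, max_concurrency=2)))
```

Capturing the exception inside the worker keeps the per-video success flag seen in the output format, rather than failing the whole run.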
Debugging
Enable debug mode by setting the debugMode parameter to true in the input JSON. This will:
- Show detailed logs, including errors and stack traces.
- Log sensitive information like proxy details (use with caution).
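As a rough illustration of the two logging levels, the sketch below switches a standard-library logger between a quiet mode and a detailed mode based on a debug flag. The logger name, the choice of INFO for non-debug mode, and the mask_proxy helper are assumptions for illustration. The README states that proxy details are logged in debug mode, so masking is shown only as an optional precaution.

```python
import logging
from urllib.parse import urlsplit, urlunsplit


def configure_logging(debug_mode: bool) -> logging.Logger:
    """Return a logger that is verbose only when debug_mode is true."""
    logger = logging.getLogger("transcript_scraper")  # hypothetical logger name
    logger.setLevel(logging.DEBUG if debug_mode else logging.INFO)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def mask_proxy(proxy_url: str) -> str:
    """Replace credentials in a URL like http://user:pass@host:port with ***."""
    parts = urlsplit(proxy_url)
    if parts.username is None:
        return proxy_url
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, f"***@{host}", parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    log = configure_logging(debug_mode=True)
    log.debug("using proxy %s", mask_proxy("http://user:secret@proxy.example.com:8080"))
```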
Error Handling
Common Errors and Solutions
- YouTube Blocking Requests
  - Increase the cooldownPeriod to avoid triggering rate limits.
- No Transcript Found
  - Ensure the video has a transcript available.
  - Check if the video is restricted or private.
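To show how cooldownPeriod interacts with retries, here is a minimal sketch of a retry loop that waits and optionally switches proxy when a rate limit is hit. The RateLimited exception, the fetch_with_retry function, and the switch_proxy callback are hypothetical stand-ins, not the scraper's real API. The default of 10 attempts mirrors the retry count mentioned in this README.

```python
import time
from typing import Callable, Optional


class RateLimited(Exception):
    """Hypothetical marker for an HTTP 429 response from YouTube."""


def fetch_with_retry(
    fetch: Callable[[str], object],
    video_id: str,
    max_retries: int = 10,
    cooldown_period: float = 60,
    sleep: Callable[[float], None] = time.sleep,
    switch_proxy: Optional[Callable[[], None]] = None,
):
    """Call fetch(video_id), retrying on rate limits and transient network errors."""
    last_error: Optional[Exception] = None
    for attempt in range(1, max_retries + 1):
        try:
            return fetch(video_id)
        except RateLimited as exc:
            last_error = exc
            if switch_proxy is not None:
                switch_proxy()  # try a different exit IP on the next attempt
            if attempt < max_retries:
                sleep(cooldown_period)  # back off before retrying
        except (ConnectionError, TimeoutError) as exc:
            last_error = exc  # transient network error: retry without cooldown
    # Any other exception (for example, no transcript available) propagates
    # immediately, since retrying cannot fix it.
    raise RuntimeError(f"giving up on {video_id} after {max_retries} attempts") from last_error
```

Passing sleep as a parameter keeps the cooldown testable without real waiting. A longer cooldown_period trades throughput for a lower chance of repeated blocks.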