YouTube Scraper

Pricing

$5.00 / 1,000 videos

Developed by

Streamers

Maintained by Apify

YouTube crawler and video scraper. Alternative YouTube API with no limits or quotas. Extract and download channel name, likes, number of views, and number of subscribers.

4.6 (29)

765

Total users: 19.7k
Monthly users: 3.4k
Runs succeeded: >99%
Issues response: 2.5 days
Last modified: a day ago

title: Crawlee, the web scraping and browser automation library
id: g1Ll9OlFwEQ
url: https://www.youtube.com/watch?v=g1Ll9OlFwEQ
viewCount: 10150
likes: 136
channelName: Apify
numberOfSubscribers: 6640
duration: 00:03:15

title: Crawlee for Python: Build reliable crawlers. Fast.
id: Ejhudr7e-h4
url: https://www.youtube.com/watch?v=Ejhudr7e-h4
viewCount: 916
likes: 23
channelName: Apify
numberOfSubscribers: 6640
duration: 00:03:39

title: Build a Web Scraper from Scratch | JavaScript | Playwright | Crawlee
id: DOtJEwVsJic
url: https://www.youtube.com/watch?v=DOtJEwVsJic
viewCount: 4411
likes: 109
channelName: deejaydev
numberOfSubscribers: 1680
duration: 00:22:43

The data above is synthetic and does not reflect real-world values.

Identical runs sometimes lack the subtitle text property (srt/plaintext)

Open

xobit.net opened this issue
8 days ago

OLD: When I run the same URL but with www, the subtitle is properly included (as srt or plaintext attribute).

NEW: See my comment further down

lukas.prusa

Hi, thanks for opening this issue!

I'm not sure I properly understand what you mean. I ran both the www. and non-www. versions of the URL and received subtitles in both runs. Could you clarify further or link the affected runs?

Thanks!

xobit.net

7 days ago

I'm sorry, I didn't analyze it properly. The problem with missing subtitles seems to be more complicated, and I'm not sure yet what is causing it. Could the cause be a difference between Web and API runs?

I have two runs (cCdQKqUl4plhJ5Mb0 and OA8ehajbRv589hHg2) with the same input:

{
    "downloadSubtitles": true,
    "hasCC": false,
    "hasLocation": false,
    "hasSubtitles": false,
    "is360": false,
    "is3D": false,
    "is4K": false,
    "isBought": false,
    "isHD": false,
    "isHDR": false,
    "isLive": false,
    "isVR180": false,
    "maxResultStreams": 0,
    "maxResults": 1,
    "maxResultsShorts": 0,
    "preferAutoGeneratedSubtitles": false,
    "saveSubsToKVS": false,
    "startUrls": [
        {
            "url": "https://youtube.com/watch?v=XSWSLIvSTEU",
            "method": "GET"
        }
    ],
    "subtitlesFormat": "plaintext",
    "subtitlesLanguage": "any"
}

In one run (triggered via API in this case), the output lacks the subtitle text (but has the object):

"subtitles": [
    {
        "srtUrl": null,
        "type": "auto_generated",
        "language": "en"
    },
    {
        "srtUrl": null,
        "type": "user_generated",
        "language": "en-US"
    }
],
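
For anyone consuming the dataset, a small check like the one below could flag items in this state. It is only a sketch: the "plaintext" key is taken from the output shown in this thread, while the "srt" key is an assumption based on the issue title, and the function name is made up for illustration.

```python
from typing import Any

# Keys that may carry the subtitle text. "plaintext" appears in the thread's
# output; "srt" is an assumption based on the issue title.
TEXT_KEYS = ("plaintext", "srt")


def subtitles_missing_text(item: dict[str, Any]) -> list[dict[str, Any]]:
    """Return subtitle entries with no non-empty text under any of TEXT_KEYS."""
    missing = []
    for entry in item.get("subtitles") or []:
        if not any(entry.get(key) for key in TEXT_KEYS):
            missing.append(entry)
    return missing


# Same shape as the incomplete output: the objects exist, the text does not.
incomplete_item = {"subtitles": [
    {"srtUrl": None, "type": "auto_generated", "language": "en"},
    {"srtUrl": None, "type": "user_generated", "language": "en-US"},
]}
for entry in subtitles_missing_text(incomplete_item):
    print(f"missing text: {entry['type']} ({entry['language']})")
```

An empty return value means every subtitle entry carried text; an item without a "subtitles" key is also treated as having nothing missing.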

Log:

2025-05-12T14:18:57.745Z ACTOR: Pulling Docker image of build QEcBFHLESKOMqhI1c from registry.
2025-05-12T14:18:57.855Z ACTOR: Creating Docker container.
2025-05-12T14:18:57.903Z ACTOR: Starting Docker container.
2025-05-12T14:19:00.689Z INFO System info {"apifyVersion":"3.3.2","apifyClientVersion":"2.12.1","crawleeVersion":"3.12.1","osType":"Linux","nodeVersion":"v18.20.8"}
2025-05-12T14:19:01.710Z INFO Starting scraper with startUrls, ignoring searchQueries
2025-05-12T14:19:02.150Z INFO CheerioCrawler: Starting the crawler.
2025-05-12T14:19:09.278Z INFO handling detail url https://youtube.com/watch?v=XSWSLIvSTEU&ucbcb=1
2025-05-12T14:19:12.491Z INFO Fetching subtitles for https://youtube.com/watch?v=XSWSLIvSTEU&ucbcb=1
2025-05-12T14:19:12.891Z WARN Reached global limit of 1 items pushed to the dataset. No more items will be pushed
2025-05-12T14:19:12.986Z INFO CheerioCrawler: All requests from the queue have been processed, the crawler will shut down.
2025-05-12T14:19:12.987Z INFO Persisting state for videos...
2025-05-12T14:19:12.988Z INFO Persisting state for comments...
2025-05-12T14:19:13.162Z INFO CheerioCrawler: Final request statistics: {"requestsFinished":1,"requestsFailed":0,"retryHistogram":[1],"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":10699,"requestsFinishedPerMinute":5,"requestsFailedPerMinute":0,"requestTotalDurationMillis":10699,"requestsTotal":1,"crawlerRuntimeMillis":11327}
2025-05-12T14:19:13.163Z INFO CheerioCrawler: Finished! Total 1 requests: 1 succeeded, 0 failed. {"terminal":true}
2025-05-12T14:19:13.233Z INFO Persisting state for videos...
2025-05-12T14:19:13.234Z INFO Persisting state for comments...

Another run (same input), triggered via Web, has the subtitle text:

"subtitles": [
    {
        "srtUrl": null,
        "type": "auto_generated",
        "language": "en",
        "plaintext": "welcome to the show everyone bernardo\ncastro and tom campbell ... long transcript here"
    },
    {
        "srtUrl": null,
        "type": "user_generated",
        "language": "en-US",
        "plaintext": "Welcome to the show everyone, Bernardo Kastrupp and ... long transcript here"
    }
],

Log (same as the other):

2025-05-11T21:20:53.750Z ACTOR: Pulling Docker image of build QEcBFHLESKOMqhI1c from registry.
2025-05-11T21:20:53.873Z ACTOR: Creating Docker container.
2025-05-11T21:20:53.914Z ACTOR: Starting Docker container.
2025-05-11T21:20:55.516Z INFO System info {"apifyVersion":"3.3.2","apifyClientVersion":"2.12.1","crawleeVersion":"3.12.1","osType":"Linux","nodeVersion":"v18.20.8"}
2025-05-11T21:20:56.218Z INFO Starting scraper with startUrls, ignoring searchQueries
2025-05-11T21:20:56.537Z INFO CheerioCrawler: Starting the crawler.
2025-05-11T21:21:01.517Z INFO handling detail url https://youtube.com/watch?v=XSWSLIvSTEU&ucbcb=1
2025-05-11T21:21:03.926Z INFO Fetching subtitles for https://youtube.com/watch?v=XSWSLIvSTEU&ucbcb=1
2025-05-11T21:21:04.328Z INFO CheerioCrawler: All requests from the queue have been processed, the crawler will shut down.
2025-05-11T21:21:04.331Z INFO Persisting state for videos...
2025-05-11T21:21:04.333Z INFO Persisting state for comments...
2025-05-11T21:21:04.506Z INFO CheerioCrawler: Final request statistics: {"requestsFinished":1,"requestsFailed":0,"retryHistogram":[1],"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":7666,"requestsFinishedPerMinute":7,"requestsFailedPerMinute":0,"requestTotalDurationMillis":7666,"requestsTotal":1,"crawlerRuntimeMillis":8180}
2025-05-11T21:21:04.509Z INFO CheerioCrawler: Finished! Total 1 requests: 1 succeeded, 0 failed. {"terminal":true}
2025-05-11T21:21:04.566Z INFO Persisting state for videos...
2025-05-11T21:21:04.569Z INFO Persisting state for comments...
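
One way to check how closely two run logs match is to strip the timestamps and list the messages that appear in only one of them. The sketch below does that on a few lines copied from the two logs above; the helper names are made up for illustration. Whether any differing line is related to the missing subtitle text is not established here.

```python
import re

# Matches the leading ISO-8601 timestamp that prefixes each log line.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z\s+")


def messages(log_text: str) -> list[str]:
    """Strip timestamps and blank lines, keeping messages in order."""
    return [TIMESTAMP.sub("", line) for line in log_text.splitlines() if line.strip()]


def only_in_first(first_log: str, second_log: str) -> list[str]:
    """Messages that appear in first_log but nowhere in second_log."""
    seen = set(messages(second_log))
    return [message for message in messages(first_log) if message not in seen]


api_log = """\
2025-05-12T14:19:12.491Z INFO Fetching subtitles for https://youtube.com/watch?v=XSWSLIvSTEU&ucbcb=1
2025-05-12T14:19:12.891Z WARN Reached global limit of 1 items pushed to the dataset. No more items will be pushed
2025-05-12T14:19:12.987Z INFO Persisting state for videos...
"""
web_log = """\
2025-05-11T21:21:03.926Z INFO Fetching subtitles for https://youtube.com/watch?v=XSWSLIvSTEU&ucbcb=1
2025-05-11T21:21:04.331Z INFO Persisting state for videos...
"""
for message in only_in_first(api_log, web_log):
    print(message)
```

Lines that embed run-specific numbers, such as the final request statistics, will always show up as different, so they are best ignored when reading the result.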

lukas.prusa

Oh, thanks for investigating it further. I wasn't thorough enough in my search either.

After checking it out in more detail, it turns out that it only fails sometimes, in about 1 in 5 runs. The origin of the run, whether the API or the web page, doesn't affect it. It just seems to be failing randomly right now.

We will investigate and fix this :)

I will keep you updated here, thanks!
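
Until a fix lands, a caller could work around an intermittent failure like this by re-running and checking the output. The sketch below is not the actor's API: `run_once` is a hypothetical stand-in for whatever starts a run and returns its dataset item, and the check assumes the "plaintext" subtitles format used in this thread. If failures were independent at the roughly 1-in-5 rate mentioned above (an assumption), the chance that n attempts all fail would be 0.2 to the power n.

```python
from typing import Any, Callable

Item = dict[str, Any]


def has_subtitle_text(item: Item) -> bool:
    """True if the item has subtitle entries and each carries non-empty "plaintext"."""
    entries = item.get("subtitles") or []
    return bool(entries) and all(entry.get("plaintext") for entry in entries)


def run_until_complete(run_once: Callable[[], Item], max_attempts: int = 3) -> Item:
    """Call run_once (hypothetical) until its item has subtitle text or attempts run out."""
    for _ in range(max_attempts):
        item = run_once()
        if has_subtitle_text(item):
            return item
    raise RuntimeError(f"no subtitle text after {max_attempts} attempts")


def chance_all_fail(failure_rate: float, attempts: int) -> float:
    """Probability that every attempt fails, assuming independent attempts."""
    return failure_rate ** attempts


# Fake runner: the first call returns the incomplete shape, the second the complete one.
results = iter([
    {"subtitles": [{"type": "auto_generated", "language": "en"}]},
    {"subtitles": [{"type": "auto_generated", "language": "en", "plaintext": "welcome"}]},
])
item = run_until_complete(lambda: next(results))
print(item["subtitles"][0]["plaintext"])
print(chance_all_fail(0.2, 3))
```

Each retry is a separate run, so the attempt limit is worth keeping small.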