Youtube Scraper

  • streamers/youtube-scraper
  • Users 565
  • Runs 6.8k
  • Created by Streamers

YouTube crawler and video scraper. Alternative YouTube API with no limits or quotas. Extract and download channel name, likes, number of views, and number of subscribers.


This changelog summarizes all changes to the YouTube actors provided by the Streamers organization. If a change affects only a subset of actors, those actors are listed in parentheses.

2023-09-12

Features

  • Added basic sorting and filtering for channel videos; more options may follow (streamers/youtube-main, streamers/youtube-channel)

2023-08-28

Fixes

  • Fixed a bug where not all comments were scraped for some videos (streamers/youtube-comments)

2023-08-17

Fixes

  • The proxy is used again.

Features

  • You can now apply search filters and sorting in the input (streamers/youtube-main)

2023-08-14

Fixes

  • The scraper respects limits again; they were broken in the previous release (*).

Features

  • The search workflow can now also pick up Shorts and Live videos from the corresponding tabs. Set the appropriate limit for each video type.

2023-07-31

Input changes

  • You can now pass 0 as the maximum limit for Shorts and streams (bernardo/youtube-scraper).

2023-07-25

Fixes

  • The scraper now correctly extracts the comment count for big videos

2023-07-24

Fixes

  • The scraper can now correctly scrape videos without a description

2023-07-07

Features

  • Video items now include fromYTUrl and inputChannelUrl fields in the output. fromYTUrl tells on which page the video was scraped; inputChannelUrl points to the channel URL as specified in the input (it may differ from channelUrl, although both lead to the same channel).
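A minimal Python sketch of how a consumer of the dataset might use inputChannelUrl to map results back to the input list, since channelUrl can be a different URL form for the same channel. The sample items, their URLs, and the helper group_by_input_channel are hypothetical; only the field names fromYTUrl, inputChannelUrl, and channelUrl come from this changelog.

```python
from collections import defaultdict


def group_by_input_channel(items):
    """Group video items by the channel URL that was given in the input."""
    groups = defaultdict(list)
    for item in items:
        # channelUrl may be a different URL form for the same channel,
        # so inputChannelUrl is used to map results back to the input.
        # Items that lack the field are collected under the key None.
        groups[item.get("inputChannelUrl")].append(item)
    return dict(groups)


# Hypothetical sample items; only the field names come from the changelog.
sample = [
    {"title": "Video 1",
     "inputChannelUrl": "https://www.youtube.com/@example",
     "channelUrl": "https://www.youtube.com/channel/UC_EXAMPLE",
     "fromYTUrl": "https://www.youtube.com/@example/videos"},
    {"title": "Video 2",
     "inputChannelUrl": "https://www.youtube.com/@example",
     "channelUrl": "https://www.youtube.com/channel/UC_EXAMPLE",
     "fromYTUrl": "https://www.youtube.com/@example/shorts"},
]

grouped = group_by_input_channel(sample)
for channel, videos in grouped.items():
    # fromYTUrl shows which page each video was found on.
    print(channel, [v["fromYTUrl"] for v in videos])
```

Keying on inputChannelUrl rather than channelUrl keeps the grouping aligned with whatever URL form the user originally supplied.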

2023-07-06

Features

  • Autogenerated channels are now parsed as the "recent" tab

2023-07-04

Fixes

  • Subs are now pushed to dataset items, as they used to be

2023-06-22

Fixes

  • If a channel doesn't exist, the scraper now detects it and does not retry scraping that channel
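The changelog does not describe how the check is implemented. As an illustration only, the general pattern of treating a missing channel as a non-retryable error can be sketched as follows; ChannelNotFoundError, scrape_with_retries, and the fake fetchers are hypothetical names, not part of the actor.

```python
class ChannelNotFoundError(Exception):
    """Hypothetical error for a channel page that does not exist."""


def scrape_with_retries(url, fetch, max_attempts=3):
    """Call fetch(url), retrying transient errors but never a missing channel."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        try:
            return fetch(url)
        except ChannelNotFoundError:
            # A channel that doesn't exist won't appear on retry: stop at once.
            return None
        except Exception as err:  # anything else is treated as transient
            last_error = err
    raise last_error


# Usage: count how often each fake fetcher is called.
calls = {"missing": 0, "flaky": 0}


def missing_channel(url):
    calls["missing"] += 1
    raise ChannelNotFoundError(url)


def flaky_channel(url):
    calls["flaky"] += 1
    if calls["flaky"] < 3:
        raise ConnectionError("temporary failure")
    return {"channelUrl": url}


print(scrape_with_retries("https://www.youtube.com/@missing", missing_channel))
print(scrape_with_retries("https://www.youtube.com/@flaky", flaky_channel))
print(calls)
```

The missing channel is requested once and yields None, while the flaky one succeeds on its third attempt, so retries are spent only where they can help.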

2023-06-20

Features

  • You can now select the format in which subtitles are saved: plaintext, vtt, srt, or xml

Fixes

  • The scraper now handles subtitle locales better (previously it often failed to download subtitles for some locales because it couldn't match, for example, fr with fr-FR)
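A sketch of the kind of locale matching described above, assuming a requested code and a list of available track codes: an exact match (ignoring case and _ versus -) wins, otherwise the base language is compared so that fr and fr-FR match each other. pick_subtitle_locale is a hypothetical helper; the actor's actual rules may differ.

```python
def _norm(code):
    """Normalize a locale code: 'fr_FR' and 'FR-fr' both become 'fr-fr'."""
    return code.replace("_", "-").lower()


def pick_subtitle_locale(requested, available):
    """Return the best available locale for `requested`, or None."""
    wanted = _norm(requested)
    # 1. Prefer an exact match, ignoring case and "_" vs "-".
    for code in available:
        if _norm(code) == wanted:
            return code
    # 2. Fall back to the same base language, so "fr" matches "fr-FR"
    #    and "fr-FR" matches "fr".
    base = wanted.split("-")[0]
    for code in available:
        if _norm(code).split("-")[0] == base:
            return code
    return None


print(pick_subtitle_locale("fr", ["en", "fr-FR"]))
```

Checking exact matches first means a request for pt-BR is not satisfied by pt-PT when both tracks exist.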

2023-06-02

Features

  • You can now input playlist URLs (in the format https://www.youtube.com/playlist?list=PLObrtcm1Kw6PmbXg8bmfJN-o2Hgx8sidf) and scrape all videos from them.

Fixes

  • You can now submit URLs in the youtu.be/id format. In addition, incorrect URLs are now skipped instead of causing the scraper to exit.
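A sketch of normalizing start URLs in the spirit of this release: youtu.be/id short links and playlist URLs are recognized, and anything else is logged and skipped rather than stopping the run. classify_url and collect_start_urls are hypothetical helpers; the sketch ignores channel URLs and other forms the actor accepts.

```python
from urllib.parse import parse_qs, urlparse


def classify_url(url):
    """Return ("video", id), ("playlist", id), or None if unsupported."""
    url = url.strip()
    if "://" not in url:
        url = "https://" + url  # accept bare "youtu.be/<id>"
    try:
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
    except ValueError:
        return None
    if host.startswith("www."):
        host = host[4:]
    query = parse_qs(parsed.query)
    if host == "youtu.be":
        video_id = parsed.path.lstrip("/").split("/")[0]
        return ("video", video_id) if video_id else None
    if host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch" and query.get("v"):
            return ("video", query["v"][0])
        if parsed.path == "/playlist" and query.get("list"):
            return ("playlist", query["list"][0])
    return None


def collect_start_urls(urls):
    """Keep recognized URLs; skip the rest instead of aborting the run."""
    valid = []
    for url in urls:
        kind = classify_url(url)
        if kind is None:
            print(f"Skipping unsupported URL: {url}")
            continue
        valid.append(kind)
    return valid


print(collect_start_urls([
    "youtu.be/VIDEO_ID",
    "https://www.youtube.com/playlist?list=PLObrtcm1Kw6PmbXg8bmfJN-o2Hgx8sidf",
    "not a url",
]))
```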

2023-06-01 (0.0.107)

BREAKING CHANGES

  • Removed dislikes field from the output as they are no longer publicly available.
  • Removed details field from the output, which was a full HTML version of the description. Use text and descriptionLinks instead.

Features

  • Added the descriptionLinks field to the output. It contains all links found in the description, including some that the text field alone would not capture.

Changes

  • The scraper is now significantly faster (and thus cheaper) because it no longer requires a full browser interaction.
  • extendOutputFunction and extendScraperFunction are deprecated. They are still supported, and we will reach out to users who use them regularly before removing them completely.

Fixes

  • Scrolling through videos can now be resumed from any point, which makes the scraper much more reliable
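The changelog does not say how resumption works. One common approach, sketched here as an assumption, is to checkpoint the continuation cursor after each page so a restarted run continues from the last saved page instead of from the top. fetch_page, the cursor strings, and the in-memory state dict are hypothetical stand-ins for real page requests and persisted state.

```python
def scrape_all(fetch_page, state):
    """Collect items page by page, checkpointing the cursor in `state`."""
    items = state.setdefault("items", [])
    cursor = state.get("cursor")
    while not state.get("done"):
        page_items, next_cursor = fetch_page(cursor)
        items.extend(page_items)
        # Save the cursor only after the page's items are stored, so a
        # restart re-requests just the page that failed.
        state["cursor"] = next_cursor
        if next_cursor is None:
            state["done"] = True
        cursor = next_cursor
    return items


# Hypothetical pages keyed by cursor; None is the first page.
PAGES = {None: (["v1", "v2"], "c1"), "c1": (["v3"], "c2"), "c2": (["v4"], None)}
crashed = {"c2": False}


def flaky_fetch(cursor):
    # Fail once on the last page to simulate a crash mid-scroll.
    if cursor == "c2" and not crashed["c2"]:
        crashed["c2"] = True
        raise ConnectionError("simulated crash")
    return PAGES[cursor]


state = {}
try:
    scrape_all(flaky_fetch, state)
except ConnectionError:
    pass  # the run dies mid-way; state still holds the last cursor

result = scrape_all(flaky_fetch, state)
print(result)  # ['v1', 'v2', 'v3', 'v4']
```

The second call starts at cursor c2, so already collected videos are neither fetched again nor duplicated.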

2023-04-25

Fixes

  • Video duration is now correctly extracted
  • Description is now correctly extracted

2023-03-29

Update

  • Added new fields to the output when processing a channelUrl:

    {
      "channelTotalVideos": 3200,
      "channelDescription": "Learn how to speak English with the BBC...",
      "channelLocation": "United Kingdom",
      "channelJoinedDate": "Jun 17, 2008",
      "channelTotalViews": "261,770,375"
    }
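channelTotalViews arrives as a formatted string while channelTotalVideos is a number, so downstream code may want to normalize the channel fields. A minimal sketch, assuming comma thousands separators and English month abbreviations as in the sample above; normalize_channel_fields is a hypothetical helper, and other locales would need different parsing.

```python
from datetime import datetime


def normalize_channel_fields(item):
    """Return a copy of `item` with numeric views and a parsed join date."""
    out = dict(item)
    views = item.get("channelTotalViews")
    if isinstance(views, str):
        # "261,770,375" -> 261770375 (assumes comma thousands separators)
        out["channelTotalViews"] = int(views.replace(",", ""))
    joined = item.get("channelJoinedDate")
    if isinstance(joined, str):
        # "Jun 17, 2008" -> date(2008, 6, 17) (assumes English month names)
        out["channelJoinedDate"] = datetime.strptime(joined, "%b %d, %Y").date()
    return out


sample = {
    "channelTotalVideos": 3200,
    "channelLocation": "United Kingdom",
    "channelJoinedDate": "Jun 17, 2008",
    "channelTotalViews": "261,770,375",
}
normalized = normalize_channel_fields(sample)
print(normalized["channelTotalViews"], normalized["channelJoinedDate"])
```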

2023-03-29

Features

  • Added "saveStreams" feature.

2023-02-22

Features

  • Added thumbnailUrl to video item output

2023-01-13

Fixes

  • The title is now extracted as plain text, without HTML
  • Full URLs are now extracted from the description

2022-11-30

Features

  • Added "saveShorts" feature.

2022-07-20

Fixes

  • Correctly handle videos with comments turned off.
  • Added commentsTurnedOff to the output.

2022-06-10

Fixes

  • Channel page without /watch selector

2021-09-15

Features

  • Added the ability to scrape video comments. See the maxComments input field.

2021-06-16

Features

  • Revamped subtitle downloading: added the ability to download all available subtitles (availability is defined by languages) and to prefer automatically generated subtitles over user-generated ones.

2021-06-14

Features

  • Added the subtitle type to the output (extendedOutputFunction). Note: you must set the downloadSubtitles variable to true for this feature to take effect.

2021-06-11

Features

  • Subtitles are now downloadable (saved to KeyValueStore as videoID_languageCode)

2021-05-21

Features

  • Updated SDK

Fixes

  • Fixed searches randomly returning zero results
  • The consent dialog is now clicked through

2021-04-14

Fixes

  • Fixed changed selector that completely prevented the scrape

2021-03-21

Features

  • Updated SDK version for session pool changes
  • Added the handlePageTimeoutSecs parameter to INPUT_SCHEMA

2021-03-15

Fixes

  • Fixed selector causing no data scraped
  • Removed stealth causing issues with new layout

2020-09-27

  • Increased waiting timeouts to better handle concurrency
  • Added saving screenshots on errors
  • Better handling of captchas: the page is automatically retried and the browser is restarted with a new proxy
  • verboseLog is off by default
  • Added information on how many videos were enqueued, and improved logging overall