YouTube Scraper

  • bernardo/youtube-scraper
  • Modified
  • Users 6k
  • Runs 91.3k
  • Created by Author's avatarBernard O.

YouTube crawler and video scraper. Alternative YouTube API with no limits or quotas. Extract and download channel name, likes, number of views, and number of subscribers.

2023-04-25

Fixes

  • Video duration is now correctly extracted
  • Description is now correctly extracted

2023-03-29

Update

  • Added new fields to the output in case of processing channelUrl: { "channelTotalVideos": 3200, "channelDescription": "Learn how to speak English with the BBC...", "channelLocation": "United Kingdom", "channelJoinedDate": "Jun 17, 2008", "channelTotalViews": "261,770,375", }

2023-03-29

Feautures

  • Added "saveStreams" feature.

2023-02-22

Features

  • Added thumbnailUrl to video item output

2023-01-13

Fixes

  • Extract the title text only without html
  • Extract the description full urls

2022-11-30

Feautures

  • Added "saveShorts" feature.

2022-07-20

Fixes

  • Correctly handle videos with comments turned off.
  • Add commentsTurnedOff to output.

2022-06-10

Fixes:

  • Channel page without /watch selector

2021-09-15

Features

  • Add possibility to scrape video comments. See maxComments input field.

2021-06-16 Features

  • Revamped subtitles downloading - added possibility to download all available subtitles (availability defined by languages) and to prefer automatically generated subtitles before the user generated ones.

2021-06-14 Features:

  • Add subtitle type to output (extendedOutputFunction). Note: You must set downloadSubtitles variable to true for this feature to take effect.

2021-06-11 Features:

  • Subtitles are now downloadable (saved to KeyValueStore as videoID_languageCode)

2021-05-21 Features:

  • Update SDK

Fixes

  • Random zero results when searching
  • Click consent dialog

2021-04-14 Fixes

  • Fixed changed selector that completely prevented the scrape

2021-03-21 Features:

  • Updated SDK version for session pool changes
  • Add handlePageTimeoutSecs parameter to INPUT_SCHEMA

2021-03-15 Fixes:

  • Fixed selector causing no data scraped
  • Removed stealth causing issues with new layout

2020-09-27

  • Increased waiting timeouts to better handle concurrency
  • Added saving screenshots on errors
  • Better handling of Captchas, a page is automatically retried and the browser is restarted with a new proxy
  • verboseLog is off by default
  • Added info how many videos were enqueued and overall better logging