
Youtube Scraper
- streamers/youtube-scraper
- Created by Streamers
YouTube crawler and video scraper. Alternative YouTube API with no limits or quotas. Extract and download channel name, likes, number of views, and number of subscribers.
This changelog summarizes all changes to the YouTube actors provided by the Streamers organization. If a change affects only a subset of actors, those actors are listed in parentheses.
2023-09-12
Features
- Added basic sorting/filtering for channel videos. More might be coming in the future (streamers/youtube-main, streamers/youtube-channel).
2023-08-28
Fixes
- For some videos, not all comments were scraped. This is now fixed (streamers/youtube-comments).
2023-08-17
Fixes
- The proxy is used again.
Features
- You can now apply search filters and sorting in the input (streamers/youtube-main).
2023-08-14
Fixes
- The scraper again respects limits, which were broken in the previous release (*).
Features
- The search workflow can now also pick up Shorts and Live videos from the corresponding tabs. Set the appropriate limits for each type of video.
2023-07-31
Input changes
- You can now pass 0 as the max limit for shorts/streams (bernardo/youtube-scraper).
2023-07-25
Fixes
- The scraper now correctly extracts comment count for big videos
2023-07-24
Fixes
- The scraper can now correctly scrape videos without a description
2023-07-07
Features
- Videos now have fromYTUrl and inputChannelUrl fields in the output. The first one tells on which page the video was scraped; the other one points to the channel URL as specified in the input (it may differ from channelUrl, although both lead to the same channel).
2023-07-06
Features
- Autogenerated channels are now parsed as the "recent" tab
2023-07-04
Fixes
- Subs are now pushed to dataset items, as they used to be
2023-06-22
Fixes
- If a channel doesn't exist, the scraper now detects this and does not retry scraping that channel
2023-06-20
Features
- You can now select in what format to save subtitles: plaintext, vtt, srt or xml
Fixes
- The scraper now handles some subtitle locales better (for some of them, it would often not download subtitles because it couldn't match fr with fr-FR, for example)
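The locale fix above can be illustrated with a minimal sketch. This is not the scraper's actual code; the function name `match_locale` and the fallback order (exact match first, then same primary language) are assumptions made for illustration.

```python
from typing import Optional, Sequence


def match_locale(requested: str, available: Sequence[str]) -> Optional[str]:
    """Return the available subtitle locale that best matches `requested`.

    Hypothetical helper: tries an exact (case-insensitive) match first,
    then falls back to any locale sharing the same primary language,
    so that "fr" can match "fr-FR" and vice versa.
    """
    wanted = requested.replace("_", "-").lower()
    # Map normalized locale -> original spelling, preserving input order.
    normalized = {loc.replace("_", "-").lower(): loc for loc in available}

    # 1. Exact match, e.g. "fr-FR" == "fr-fr".
    if wanted in normalized:
        return normalized[wanted]

    # 2. Same primary language subtag, e.g. "fr" vs "fr-FR".
    base = wanted.split("-")[0]
    for key, original in normalized.items():
        if key.split("-")[0] == base:
            return original

    return None
```

With this sketch, `match_locale("fr", ["en", "fr-FR"])` returns `"fr-FR"`, and a request with no related locale returns `None`.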
2023-06-02
Features
- You can now input playlist URLs (in the format of https://www.youtube.com/playlist?list=PLObrtcm1Kw6PmbXg8bmfJN-o2Hgx8sidf) and scrape all videos from them.
Fixes
- You can now submit URLs in the youtu.be/id format. In addition, if you submit incorrect URLs, they are skipped instead of making the scraper exit as before.
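The URL handling described above (accepting youtu.be/id links and skipping invalid URLs rather than failing) could look roughly like the following sketch. The function names and the set of accepted hosts are assumptions for illustration, not the scraper's real implementation.

```python
from typing import List, Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a watch or youtu.be URL, or None if unrecognized."""
    # Allow scheme-less input such as "youtu.be/<id>".
    if "://" not in url:
        url = "https://" + url
    try:
        parsed = urlparse(url)
    except ValueError:
        return None  # Malformed URL: treat as invalid instead of raising.

    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    if host == "youtu.be":
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None
    if host in ("youtube.com", "m.youtube.com") and parsed.path == "/watch":
        ids = parse_qs(parsed.query).get("v")
        return ids[0] if ids else None
    return None


def normalize_urls(urls: List[str]) -> List[str]:
    """Convert recognized URLs to a canonical watch URL and skip the rest."""
    result = []
    for url in urls:
        video_id = extract_video_id(url)
        if video_id is None:
            continue  # Skip invalid input rather than aborting the whole run.
        result.append(f"https://www.youtube.com/watch?v={video_id}")
    return result
```

Skipping rather than raising keeps one bad entry in a long input list from ending the run, which matches the behavior the changelog entry describes.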
2023-06-01 (0.0.107)
BREAKING CHANGES
- Removed the dislikes field from the output, as dislikes are no longer publicly available.
- Removed the details field from the output, which was a full HTML version of the description. Use text and descriptionLinks instead.
Features
- Added the descriptionLinks field to the output, which contains all links found in the description. Some of them would not be extracted by the text field alone.
Changes
- The scraper is now significantly faster (and thus cheaper) because it no longer requires a full browser interaction.
- extendOutputFunction and extendScraperFunction are deprecated. They will still be supported, and we will reach out to users who regularly use them before we completely remove them.
Fixes
- Scrolling through videos can now be restored from any point, which makes the scraper much more reliable
2023-04-25
Fixes
- Video duration is now correctly extracted
- Description is now correctly extracted
2023-03-29
Update
- Added new fields to the output when processing a channelUrl: { "channelTotalVideos": 3200, "channelDescription": "Learn how to speak English with the BBC...", "channelLocation": "United Kingdom", "channelJoinedDate": "Jun 17, 2008", "channelTotalViews": "261,770,375" }
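Note that in the sample above, channelTotalViews and channelJoinedDate are formatted strings rather than numbers or ISO dates. A minimal sketch of post-processing them follows; the helper names are hypothetical, and the date format is inferred only from the single sample value shown, so other items may need different handling.

```python
from datetime import date, datetime

# Sample item copied from the changelog entry above.
channel_item = {
    "channelTotalVideos": 3200,
    "channelDescription": "Learn how to speak English with the BBC...",
    "channelLocation": "United Kingdom",
    "channelJoinedDate": "Jun 17, 2008",
    "channelTotalViews": "261,770,375",
}


def parse_total_views(value: str) -> int:
    """Turn a comma-grouped count such as "261,770,375" into an int."""
    return int(value.replace(",", ""))


def parse_joined_date(value: str) -> date:
    """Parse a date such as "Jun 17, 2008".

    Assumes English month abbreviations; %b depends on the process locale.
    """
    return datetime.strptime(value, "%b %d, %Y").date()


total_views = parse_total_views(channel_item["channelTotalViews"])
joined = parse_joined_date(channel_item["channelJoinedDate"])
```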
2023-03-29
Features
- Added "saveStreams" feature.
2023-02-22
Features
- Added thumbnailUrl to video item output
2023-01-13
Fixes
- Extract only the title text, without HTML
- Extract full URLs from the description
2022-11-30
Features
- Added "saveShorts" feature.
2022-07-20
Fixes
- Correctly handle videos with comments turned off.
- Add commentsTurnedOff to output.
2022-06-10
Fixes:
- Channel page without /watch selector
2021-09-15
Features
- Add the possibility to scrape video comments. See the maxComments input field.
2021-06-16
Features
- Revamped subtitle downloading: added the possibility to download all available subtitles (availability is defined by languages) and to prefer automatically generated subtitles over user-generated ones.
2021-06-14
Features
- Add subtitle type to output (extendedOutputFunction). Note: You must set the downloadSubtitles variable to true for this feature to take effect.
2021-06-11
Features
- Subtitles are now downloadable (saved to KeyValueStore as videoID_languageCode)
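The videoID_languageCode key format can be sketched as below. Because YouTube video IDs may themselves contain underscores, splitting a key on "_" is ambiguous; this sketch instead assumes the common 11-character video ID length. The helper names are hypothetical and this is not the actor's own code.

```python
from typing import Tuple

VIDEO_ID_LENGTH = 11  # Assumption: standard YouTube video ID length.


def subtitle_key(video_id: str, language_code: str) -> str:
    """Build a key in the videoID_languageCode format."""
    return f"{video_id}_{language_code}"


def split_subtitle_key(key: str) -> Tuple[str, str]:
    """Split a key back into (video_id, language_code).

    Uses a fixed-width video ID rather than splitting on "_",
    since the ID itself may contain underscores.
    """
    if len(key) < VIDEO_ID_LENGTH + 2 or key[VIDEO_ID_LENGTH] != "_":
        raise ValueError(f"Unexpected subtitle key format: {key!r}")
    return key[:VIDEO_ID_LENGTH], key[VIDEO_ID_LENGTH + 1:]
```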
2021-05-21
Features
- Update SDK
Fixes
- Random zero results when searching
- Click consent dialog
2021-04-14
Fixes
- Fixed changed selector that completely prevented the scrape
2021-03-21
Features
- Updated SDK version for session pool changes
- Add handlePageTimeoutSecs parameter to INPUT_SCHEMA
2021-03-15
Fixes
- Fixed selector causing no data scraped
- Removed stealth causing issues with new layout
2020-09-27
- Increased waiting timeouts to better handle concurrency
- Added saving screenshots on errors
- Better handling of captchas: the page is automatically retried and the browser is restarted with a new proxy
- verboseLog is off by default
- Added info on how many videos were enqueued and overall better logging