
Tiktok Hashtag Scraper
clockworks/tiktok-hashtag-scraper
Scrape TikTok hashtags the fast and easy way. Just add one or more hashtags and the scraper will extract TikTok data on each video that mentions it: URLs, likes, country of creation, video and music metadata, TikTok creator data. Download data as JSON, HTML and use it in your apps and data projects.
This changelog summarizes all changes of the TikTok actors provided by the Clockworks organization. The specific actors that are affected are listed for each change.
2023-11-18
Fixes
- Fixed a bug where scraping comments and profiles failed. (tiktok-profile, tiktok-comments, tiktok-paid)
2023-09-29
Features
- You can now search for profiles (Accounts) by username. You can pass a list of "Profile Queries" (profilesQueries) and set a limit for each query by providing the "Max profiles per query" (maxProfilesPerQuery). The actor will then search for the profiles and scrape the videos of each profile. (tiktok-profile)
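As an illustration, a run input using these two fields could be assembled as shown here. This is a minimal sketch: only the field names profilesQueries and maxProfilesPerQuery come from the entry above. The helper name, the sample queries, and the validation rules are assumptions and not part of the actor.

```python
def build_profile_search_input(queries, max_profiles_per_query):
    """Build an input payload for a profile search (hypothetical helper)."""
    # Assumption: blank queries are dropped and whitespace is trimmed.
    cleaned = [q.strip() for q in queries if q and q.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty profile query is required")
    if max_profiles_per_query < 1:
        raise ValueError("max_profiles_per_query must be at least 1")
    return {
        "profilesQueries": cleaned,
        "maxProfilesPerQuery": max_profiles_per_query,
    }


# Example with made-up queries.
profile_input = build_profile_search_input(["whisky", " cooking "], 5)
```

The resulting dictionary can be serialized to JSON and used as the actor input.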
2023-08-29
Fixes
- Cheerio has been unblocked. You can now scrape posts quickly again if you request <= 15 videos for tiktok-hashtag, tiktok-sound or <= 30 for tiktok-profile
Features
- You can now disable cheerio/HTTP querying in the input by setting disableCheerioBoost and disableEnrichAuthorStats to true. This will help if the problem with quick scraping ever arises again. (tiktok-hashtag, tiktok-profile, tiktok-sound, tiktok-free, tiktok-paid)
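The thresholds in this entry can be restated as a small check. This is a sketch only: the limits (15 and 30) and the disableCheerioBoost flag come from the entry above, while the function name and the idea of a single lookup table are assumptions. How the actors decide internally is not documented here.

```python
# Thresholds taken from the 2023-08-29 entry; everything else is illustrative.
FAST_PATH_LIMITS = {
    "tiktok-hashtag": 15,
    "tiktok-sound": 15,
    "tiktok-profile": 30,
}


def uses_fast_path(actor, requested_videos, disable_cheerio_boost=False):
    """Return True when a request would qualify for the quick HTTP path."""
    if disable_cheerio_boost:
        return False
    limit = FAST_PATH_LIMITS.get(actor)
    # Actors without a documented threshold are treated as not eligible.
    return limit is not None and requested_videos <= limit
```

Keeping the limits in one table makes it easy to adjust them if the actors change their thresholds later.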
2023-08-21
Changes
- Boosting with Cheerio and the additional querying for missing author stats have been temporarily disabled. They will be re-enabled once TikTok's blocking is bypassed
2023-08-15
Fixes
- The crawler will rotate proxies if a sound is unavailable under the current country (tiktok-sound)
2023-08-14
Features
- If a sound is blocked in some country, the scraper will retry (tiktok-sound)
2023-08-10
Features
- You can now download slideshow images by toggling a corresponding input (all, except for tiktok-comments)
- Output videos now have a field telling whether they are slideshows (all, except for tiktok-comments)
- If the fast Cheerio crawler fails to load a page even with retries, the slower crawler will be used as a fallback (all, except for tiktok-comments)
2023-07-28
Features
- Output now contains submittedVideoUrl in addition to webVideoUrl. It copies the post URL from the input and may differ from webVideoUrl, but both lead to the same post. It is present only if you input direct post URLs. (tiktok-video, tiktok-comments-scraper)
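One use of submittedVideoUrl is to match output items back to the URLs that were submitted. The sketch below assumes output items are plain dictionaries. The function name and the sample URLs are hypothetical, and only the two field names come from the entry above.

```python
def index_by_submitted_url(items):
    """Map each submitted post URL to the output item scraped from it."""
    index = {}
    for item in items:
        submitted = item.get("submittedVideoUrl")
        # The field is present only when direct post URLs were given as input.
        if submitted is not None:
            index[submitted] = item
    return index


# Made-up items for illustration; real items carry many more fields.
items = [
    {"submittedVideoUrl": "https://example.invalid/short-a",
     "webVideoUrl": "https://example.invalid/full-a"},
    {"webVideoUrl": "https://example.invalid/full-b"},
]
lookup = index_by_submitted_url(items)
```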
2023-07-26
Fixes
- Videos with sensitive content, which require a login, are now skipped gracefully (tiktok-video, tiktok-comments-scraper)
2023-07-25
BREAKING CHANGES
- Proxies have been removed from input. Apify's datacenter proxies are always chosen, as they used to be by default (for all scrapers)
- Scrape info about private/empty channels has also been removed from input, and set to true by default. If you applied this option and set it to false previously, you should experience no changes (tiktok-profile, tiktok-paid, tiktok-free)
Changes
- The maximum memory is now limited to 4096 MB for all pay-per-result actors (tiktok-hashtag, tiktok-profile, tiktok-sound, tiktok-video)
2023-07-11
Fixes
- URLs in the format of https://www.tiktok.com/t/.../ are now also recognized as post URLs.
2023-07-04
Features
- URLs in the format of vt.tiktok.com are now also supported as post URLs.
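The two short-link shapes added on 2023-07-11 and 2023-07-04 can be told apart with a simple heuristic. This is a sketch under assumptions: the function name, the accepted hosts beyond those named in the entries, and the path-segment rules are illustrative, and the actors' real URL matching may differ. The link codes used with it are placeholders.

```python
from urllib.parse import urlparse


def looks_like_short_post_url(url):
    """Heuristic check for the two short-link shapes named in the changelog."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    segments = [s for s in parsed.path.split("/") if s]
    # Shape 1: vt.tiktok.com followed by at least one path segment (assumption).
    if host == "vt.tiktok.com":
        return len(segments) >= 1
    # Shape 2: www.tiktok.com/t/<code>/ ; accepting the bare host is an assumption.
    if host in ("www.tiktok.com", "tiktok.com"):
        return len(segments) == 2 and segments[0] == "t"
    return False
```

Parsing with urlparse rather than a single regular expression keeps the host and path checks separate, which makes it easier to add further shapes.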
2023-06-29
Fixes
- Now correctly utilizes proxy settings when boosting the scraper with Cheerio and querying for author stats. Previously it would often fail. (tiktok-profile-scraper, tiktok-hashtag-scraper, tiktok-sound-scraper)
2023-06-26
Fixes
- Fixed a bug where, in some cases, no comments were returned. (tiktok-comments-scraper)
- Fixed a bug during reply counting. Previously it would sometimes stop too early, especially if the requested number was low. (tiktok-comments-scraper)
- If the author stats are missing, the scraper will now make an additional quick request to the author page to get them. These stats get cached, so the query is made only once. (tiktok-profile-scraper, tiktok-hashtag-scraper, tiktok-sound-scraper)
2023-06-23
Features
- You can now scrape replies. Note that it is currently not guaranteed that all of them will be scraped. (tiktok-comments-scraper)
- Scraping is up to 4x faster if you limit maxResultsPerPage to 30 for posts and 15 for hashtags. This is because the scraper can utilize non-browser requests without the need to scroll. (tiktok-profile-scraper, tiktok-hashtag-scraper; will be added to tiktok-scraper once it is converted to a price-per-result system)
- Added oldestPostDate and scrapeLastNDays to only scrape posts from now back to a certain date. (tiktok-profile-scraper)
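The date limits added in this entry both amount to a cutoff date. The sketch below shows one way such a cutoff could be computed. It assumes oldestPostDate is an ISO date string (YYYY-MM-DD) and that, when both options are set, the more recent cutoff applies. Neither assumption is confirmed by the changelog, and the function names are hypothetical.

```python
from datetime import date, timedelta


def resolve_cutoff(today, oldest_post_date=None, scrape_last_n_days=None):
    """Return the earliest post date to keep, or None when no limit is set."""
    candidates = []
    if oldest_post_date is not None:
        # Assumption: the input is an ISO date string such as "2023-06-01".
        candidates.append(date.fromisoformat(oldest_post_date))
    if scrape_last_n_days is not None:
        candidates.append(today - timedelta(days=scrape_last_n_days))
    # Assumption: when both are given, the stricter (more recent) cutoff wins.
    return max(candidates) if candidates else None


def keep_post(post_date, cutoff):
    """Keep a post when there is no cutoff or it is not older than the cutoff."""
    return cutoff is None or post_date >= cutoff
```

Passing today in as a parameter, instead of calling date.today() inside the function, keeps the calculation reproducible.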
2023-05-25
Features
- You can now scrape videos that use a given sound from URLs like https://www.tiktok.com/music/plan-7214283318660073474 (tiktok-sound-scraper)
2023-05-19
Features
- You can now enable the flag in input (scrapeEmptyChannelInfo) that will allow you to save info about private or empty channels even if they don't contain any posts (tiktok-profile-scraper)
2023-04-19
Fixes
- Won't wait for the first XHR response with hashtags if the data in the initial script is enough.
- Doesn't try to retry on 404 pages or private accounts anymore
2023-04-12
Features
- Progress now tracks the last video reached while scrolling at a certain page (both for hashtags and profiles) and the comments scraped for a post. If scrolling fails (e.g. due to a captcha) the crawler will try to restore the scroll at the last video/comment, so as not to scroll all the way down again. If restoration fails though, it should fall back to the old behavior and print a warning.
- Changed the default of shouldRetryStuckComments to true, since it is now possible to restore scroll in such cases.
- When comment or post list crawlers manage to scrape new videos by scrolling, retryCount is reset to 0.
Fixes
- The hashtag route now catches the initial XHR response that adds usually up to 15 new videos which previously weren't scraped
- Added failed request handler to post detail route, so that no matter what error happens during comment scrape, it's going to push partial results if retries are exhausted.
2023-04-03
Features
- New option to download cover images. Can also specify an optional KVStore name (shared with video download). Analogous to video download from 2023-03-27. If opted in, will replace the link in videoMeta.coverUrl with a link to the KVStore. Also added originalCoverUrl pointing to the TikTok CDN cover URL.
2023-03-27
Features
- New option to download videos. Can also specify an optional KVStore name. If opted in, will replace the link in mediaUrls with a link to the KVStore. Will also place downloadAddr in videoMeta pointing to the KVStore and originalDownloadAddr pointing to the TikTok CDN
2023-03-07
Fixes
- Hotfixed broken post URLs
2023-03-06
Fixes
- Don't stop scrolling if there are still more videos to load (this happened when the initial video count was lower than expected; now we check dynamically whether there are more videos to load)
2023-02-22
Fixes
- Don't get stuck if all videos or comments were already loaded before scrolling (this bug happened when there were fewer than 20 videos or comments)
2023-02-03
Fixes
- Improved scrolling for videos that got stuck on 30 videos loaded. It is still a bit clunky and there is a lot of blocking but should work with few retries.
2023-01-27
Features
- loginCookies are no longer required for scraping comments
- The comment crawling is rewritten to scrape from underneath the post
- No login sessions are created or managed anymore by the actor
- loginCookies on input are deprecated and generate a warning in the log
2022-8-14
Features
- Output was updated to include more properties. New properties:
{
  "locationCreated": "CA",
  "isAd": false,
  "isMuted": false,
  "authorMeta": {
    "bioLink": "https://www.thewhiskyexplorer.ca",
    "commerceUserInfo": {
      "commerceUser": true,
      "category": "Food/Beverage",
      "categoryButton": false
    },
    "isUnderAge18": false,
    "privateAccount": false,
    "region": "CA",
    "roomId": "",
    "ttSeller": false
  },
  "musicMeta": {
    "coverMediumUrl": ...,
    "musicId": "7105825676251351814"
  },
  "videoMeta": {
    "coverUrl": ...,
    "definition": "720p",
    "format": "mp4"
  }
}
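A consumer of the output might read a few of these newer properties as sketched below. The field names are taken from the sample above. The function name, the chosen summary keys, and the fallback values are assumptions. The sample item omits the cover URLs because they are elided in the changelog.

```python
def summarize_item(item):
    """Pick a few of the newer fields out of one output item (illustrative)."""
    author = item.get("authorMeta", {})
    commerce = author.get("commerceUserInfo", {})
    video = item.get("videoMeta", {})
    return {
        "country": item.get("locationCreated"),
        "is_ad": item.get("isAd", False),
        "private_account": author.get("privateAccount", False),
        # Report a category only for accounts flagged as commerce users.
        "commerce_category": commerce.get("category") if commerce.get("commerceUser") else None,
        "definition": video.get("definition"),
    }


# Reduced sample built from the values shown in the changelog entry.
sample = {
    "locationCreated": "CA",
    "isAd": False,
    "authorMeta": {
        "commerceUserInfo": {"commerceUser": True, "category": "Food/Beverage"},
        "privateAccount": False,
    },
    "videoMeta": {"definition": "720p", "format": "mp4"},
}
summary = summarize_item(sample)
```

Using .get with defaults keeps the function usable on items scraped before these properties were added.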
2022-2-1
Features
- Add the possibility to scrape comments under login using loginCookies
- Add login session management to avoid blocking of the account used for scraping comments
- Searching hashtag and number of views for this hashtag are now stored in the output
Fixes
- Actor does not deduplicate videos for different hashtags - improves the accuracy of the number of outputted items
- Improved logs
2022-1-14
Fixes
- fixed scraping of authorMeta data from scripts - affected: post urls and first batch of videos for hashtag and profile
2022-1-10
Fixes
- updated scraping of hashtags and profiles - TikTok randomly displays two types of scripts with data
- fixed number of output items to be the same as resultsPerPage
- fixed Timed out error when waiting for the xhr response with data - the scraper now scrolls until it receives the response or the waiting/scrolling times out
- more readable error messages
- fixed progress caching
- empty strings are no longer accepted as hashtag, postUrl or profile
2022-1-4
Fixes
- updated scraping of individual posts - TikTok randomly displays two types of scripts with data
2021-10-20
Fixes
- when page.waitingForResponse times out, it retires the session and restarts the browser. This should prevent looping of timeouts on request retries
Features
- TikTok sometimes sends a request for the same data twice. This behavior won't affect the total number of output items specified on input. (Also, duplicate videos across hashtag/profile searches will be scraped only once, but won't be counted toward the number of output items for the specific search)
- Sometimes there are more than 6 videos loaded on the search page. The scraper won't push them into the outputted results, so that the number of results remains consistent according to the specification on input.
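The two behaviors described above, ignoring repeated responses and not exceeding the requested number of results, can be sketched as follows. This is an illustration rather than the actor's code: the function name, the batch structure, and the use of an "id" key to identify a video are assumptions.

```python
def collect_unique(batches, limit):
    """Flatten response batches, dropping repeated ids and capping the total."""
    if limit <= 0:
        return []
    seen = set()
    results = []
    for batch in batches:
        for video in batch:
            video_id = video["id"]  # assumption: each video carries an "id"
            if video_id in seen:
                continue  # a repeated response must not inflate the count
            seen.add(video_id)
            results.append(video)
            if len(results) == limit:
                return results  # extra videos on the page are not pushed
    return results
```

Tracking seen ids in a set keeps the duplicate check constant-time per video, so repeated responses add little overhead.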
2021-10-18
Fixes
- computation of outputLength is no longer dependent on persisted progress, meaning scraping of more than one hashtag/profile is now working properly
2021-10-15
Fixes
- handlePageFunction does not time out when resultsPerPage is set low
2021-10-14
Features
- New output structure
- Added the possibility to scrape more than the first page of results (regulated by resultsPerPage input)
- Scrapes user profiles defined on input by username in profiles
- Added optional attributes maxConcurrency, maxRequestRetries and resultsPerPage to input
- If resultsPerPage is not specified, it defaults to 10 and the minimal value is 1
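The resultsPerPage rule from this entry (default 10, minimum 1) could be applied to a raw input as sketched here. The function name is hypothetical. Rejecting values below 1 rather than clamping them is an assumption, since the changelog does not say how invalid values are handled.

```python
def normalize_input(raw):
    """Apply the documented resultsPerPage rule to a raw input dict."""
    normalized = dict(raw)  # copy so the caller's dict is left untouched
    results_per_page = normalized.get("resultsPerPage")
    if results_per_page is None:
        normalized["resultsPerPage"] = 10  # documented default
    elif results_per_page < 1:
        # Assumption: values below the documented minimum are rejected.
        raise ValueError("resultsPerPage must be at least 1")
    return normalized
```

The optional maxConcurrency and maxRequestRetries attributes pass through unchanged because no defaults are documented for them here.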
Developer
Maintained by Apify