TikTok Discover Scraper
This Actor is paid per event
Scrape TikTok Discover data. Just add one or more hashtags and the scraper will extract related videos, tag breadcrumbs, similar trends, and subtopics. Export scraped data, run the scraper via API, schedule and monitor runs, or integrate with other tools.
This changelog summarizes all changes to the TikTok actors provided by the Clockworks organization. The specific actors that are affected are listed for each change.
2024-11-11
Features
- Can now download profile avatars (tiktok-profile, tiktok-paid) and music cover images (tiktok-sound, tiktok-paid)
- Can now filter profile videos with a "From:" date filter (to scrape old videos)
2024-10-29
Fixes
- Can now scrape slideshows from shortened links again
2024-10-27
Changes
- Slightly decreased the number of collected hashtag videos (by up to 25%), but the cost should go down about 3 times (tiktok-paid)
2024-10-14
Fixes
- Profile info is now pushed when a user only has pinned videos that do not match date filters
2024-10-11
Fixes
- Fixed rare cases of detecting a profile as "not found", despite it existing, and of collecting 0 videos
2024-10-03
Changes
- Slightly improved waiting times when scraping profiles and post details
2024-09-30
Changes
- Hashtag scraping logic is updated to collect more results, although at a lower speed
Features
- Added extraction of a post's "Paid partnership" status under the new isSponsored output field (works for all input options, except for search by queries, because of a limitation on TikTok's side)
2024-09-20
Fixes
- Profiles can now scrape more than 30 videos (tiktok-profile, tiktok-paid, tiktok-data-extractor)
- Geo-blocking (e.g. for India) is now detected and an attempt to rotate proxies is made
2024-09-17
Fixes
- Captcha when scraping profiles is now quickly detected and a RESIDENTIAL proxy is used to bypass it (tiktok-profile)
2024-09-11
Features
- Posts now have a profile link under authorMeta
2024-09-03
BREAKING
- Info for empty profiles is now under the authorMeta property
Fixes
- The input field is now correct for empty profiles
2024-08-28
Fixes
- Mentions are now correctly extracted
2024-08-19
Fixes
- Reduced the number of cases where fewer than 100 results would be scraped from a sound page, even though there were more videos (tiktok-sound)
2024-08-08
Fixes
- Fixed a very rare case when a profile with just 1 video would get stuck in an infinite retry loop (tiktok-profile)
Features
- A new input option to exclude pinned posts from profiles is now present (tiktok-profile, tiktok-paid, tiktok-data-extractor)
2024-08-06
Fixes
- Profile scraping has been fixed for certain proxy groups (tiktok-profile)
2024-07-26
Changes
- Somewhat improved speed of comment scraping (tiktok-comments-scraper)
2024-07-19
Fixes
- Hashtag scraping now more consistently returns at least 100 results (if the hashtag has that many videos) (tiktok-hashtag, tiktok-paid, tiktok-data-extractor)
2024-07-16
Fixes
- Hashtag scrape should now again scrape at least 100 results (tiktok-hashtag, tiktok-paid, tiktok-data-extractor)
2024-07-12
Fixes
- Fixed rare cases when the whole account scrape would fail if a profile contained a faulty video (tiktok-profile)
- Handled rare cases when profile scraping would get stuck with "No videos returned, but cursor is updated, retrying..." (tiktok-profile)
- Handled rare cases when a profile would time out waiting for initial user details (tiktok-profile)
2024-07-11
Fixes
- Reduced number of cases when search scrapes timed out without results (tiktok-paid, tiktok-data-extractor)
- Pinned posts are scraped again
2024-07-04
Fixes
- The input output field is now present again for missing profiles (tiktok-profile)
2024-07-01
Fixes
- Improved success rate for search (tiktok-paid, tiktok-data-extractor)
2024-06-26
Fixes
- Slight improvement of pagination logic (tiktok-comments)
- The input field is back for output items (tiktok-profile)
2024-06-13
Fixes
- Search scrapes again collect more than one page of results (tiktok-paid, tiktok-data-extractor)
- Profiles are now scraped in full again (tiktok-profile)
2024-06-11
Fixes
- Updated Crawlee version to avoid "TargetClosedError" error and added retries in case it happens anyway
- Fixed cases when comment scrape would end too early (tiktok-comments)
2024-06-10
Fixes
- Improved video download for some proxy groups
- Fixed cases when covers would fail to be downloaded
- Hashtag scrape will now try scraping more results: up to 800-900 for some tags as opposed to 20-50 previously (tiktok-hashtag)
2024-06-07
Fixes
- Covers are now stored correctly in the Key-Value store
- Eliminated rare cases where the scraper reported a video as not found even though it exists
2024-06-04
Changes
- If you provide an incorrect URL, the actor won't fail, but will push an error result to the dataset and carry on with valid URLs
2024-05-31
Fixes
- Fixed rare crash occurrence when search crawling fails (tiktok-search)
2024-05-29
Fixes
- Fixed account search functionality (tiktok-paid, tiktok-data-extractor)
- Fixed scraping of posts from shortened post URLs
2024-05-27
Changes
- During a run, you can see periodic aggregated statistics in the status message: how many profiles/hashtags/etc. have been finished and how many media files have been downloaded.
2024-05-28
Changes
- Downloaded media now have the author's name and date posted in the 2nd and 3rd segments of filenames
2024-05-27
Fixes
- Correctly push results for accounts with no videos (tiktok-profile)
2024-05-24
Fixes
- A rare occurrence when profile scraping would end prematurely is now fixed (tiktok-profile)
2024-05-20
Fixes
- Fixed a bug when the scraper would fail if a profile contained sensitive content (tiktok-profile)
- Fixed an early scraper finish related to the above bug (tiktok-profile)
- Fixed scraping of slideshows with the vm. shortened URL format (tiktok-video)
2024-05-07
Features
- You can now select a country for the proxy under the "Proxy settings" section if you want to scrape geolocked videos. This will increase the cost of scraping (tiktok-paid)
2024-05-03
Fixes
- Data duplication is prevented in some cases of post detail scraping (tiktok-video)
2024-05-02
Fixes
- Video download links are now not created for slideshow posts
2024-04-30
Fixes
- Fixed faulty video download in some cases
2024-04-16
Fixes
- When downloading videos, fixed cases when the actor would crash from a discovered malformed URL
2024-04-08
Fixes
- Posts with sensitive content are now correctly handled again (tiktok-video, tiktok-data-extractor, tiktok-paid)
2024-04-05
Fixes
- Improved navigation speed for post scraping (tiktok-video, tiktok-data-extractor, tiktok-paid)
2024-04-04
Fixes
- Fixed cases when input terms/queries would not be listed in the output items (tiktok-profile, tiktok-paid, tiktok-data-extractor)
- Author info is now collected in full when it is missing
2024-04-03
Fixes
- Fixed a bug when the actor would fail for some profile links that do not exist (tiktok-profile)
Features
- Subtitle download is now supported (where it's present on the video) (tiktok-video, tiktok-paid, tiktok-free, tiktok-sound, tiktok-hashtag, tiktok-profile)
2024-04-01
Changes
- Download logic changes to speed up the process (*)
2024-03-25
BREAKING CHANGES
- Max memory limit is now 4GB (tiktok-data-extractor)
Features
- If a profile doesn't have videos that match your date filter, the actor will still push an item with the account information to the dataset (tiktok-profile)
2024-03-22
Fixes
- Video downloading is restored (tiktok-profile, tiktok-hashtag)
2024-03-21
Fixes
- Cases where search returns a captcha are now retried (it used to report 0 results) (tiktok-search)
Features
- Erroneous outputs (e.g. empty profiles or not-found hashtags) now contain an input field that matches your exact input for that item (e.g. profile name or URL)
2024-03-19
Features
- For comments you can now see a link to the commenter's avatar thumbnail (tiktok-comments)
2024-03-06
Fixes
- For some proxy groups the reliability of profile scraping has been improved (tiktok-profile)
2024-03-05
BREAKING CHANGES
- For profiles, hashtags, and posts that are not found, and for search queries that return no results, you will now get an item in the dataset with the URL property and the reason this profile/tag has not been scraped
2024-02-22
Fixes
- Videos are now properly downloaded again (*)
2024-02-21
Fixes
- A rare case of the Actor failing is now fixed (tiktok-comments)
2024-02-19
Fixes
- API boosting has been unblocked
2024-02-12
Changes
- API boosting is temporarily disabled as TikTok has introduced some blocking that is being worked on (tiktok-hashtag, tiktok-profile)
2024-01-31
Fixes
- Hashtag name from input is now again in the output items (searchHashtag) (tiktok-hashtag)
2024-01-29
Changes
- New feature for improved scraping speed is now applied by default for 50% of users as part of A/B testing (tiktok-profile, tiktok-hashtag)
2024-01-29
Fixes
- URLs that contain "/photo/" in them are now correctly handled (tiktok-video)
- Several improvements for comment scraping that prevent the actor from getting stuck (tiktok-comments)
2024-01-17
Changes
- Input fields for enabling experimental API scraping have been removed. This behaviour will be enabled by default once it is more polished, and for now you can manually enable it by passing in "enableCheerioBoost": true in the JSON editor.
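
A minimal sketch of how that JSON could be assembled in Python. Only the enableCheerioBoost key comes from this changelog; the helper name build_input and the hashtags field are assumptions made for illustration, and the real input schema of each actor may differ.

```python
import json


def build_input(base_input: dict, enable_boost: bool = True) -> str:
    """Return JSON text that could be pasted into the JSON editor."""
    run_input = dict(base_input)  # copy so the caller's dict stays untouched
    run_input["enableCheerioBoost"] = enable_boost  # key named in the changelog
    return json.dumps(run_input, indent=2)


# "hashtags" is a hypothetical field used only for illustration.
print(build_input({"hashtags": ["cooking"]}))
```

The helper copies the base input rather than mutating it, so the same base can be reused with and without the flag when comparing runs.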
2024-01-04
Features
- Experimental boosting can now be applied to hashtags. As with profile boosting (see 2023-12-29), tick the option "Boost hashtag scraping" and a faster version will be deployed (tiktok-hashtag)
2023-12-29
Features
- Boosting for profiles is back: tick the "Boost profile scraping" option and it will be faster. The downside is that TikTok might suddenly change their website, in which case this way of scraping will likely start failing. Under the hood it uses API requests as opposed to launching a full-blown browser simulation. (tiktok-profile)
2023-12-15
Fixes
- Fixed an issue with validating profile URLs. (tiktok-profile, tiktok-comments, tiktok-paid, tiktok-free)
2023-12-15
Fixes
- Fixed bug where waiting for page reload when blocked could crash the actor with TimeoutError. (All TikTok actors)
2023-12-14
Fixes
- Fixed a bug where the Actor was failing to query author data. (All TikTok actors)
- Fixed issue where the actor sometimes gets stuck (tiktok-profile, tiktok-comments, tiktok-paid, tiktok-free)
2023-12-12
Fixes
- Fixed a bug where the Actor was failing to scrape profiles. (All TikTok actors)
Changes
- Plain HTTP stopped working for start URLs, so a full browser is now used for everything. This will make profiles -> posts runs significantly slower and more costly. We are looking into options for enabling plain HTTP again
2023-11-18
Fixes
- Fixed a bug where scraping comments and profiles failed. (tiktok-profile, tiktok-comments, tiktok-paid)
2023-09-29
Features
- You can now search for profiles (Accounts) by username. You can pass a list of "Profile Queries" (profilesQueries) and set a limit for each query by providing the "Max profiles per query" (maxProfilesPerQuery). The actor will then search for the profiles and scrape the videos of each profile. (tiktok-profile)
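
As a rough illustration, the two fields could be combined into an input object like this. The keys profilesQueries and maxProfilesPerQuery are taken from the entry above; the function name and the validation rules (dropping blank queries, requiring a limit of at least 1) are assumptions of this sketch, not documented actor behavior.

```python
import json


def build_profile_search_input(queries, max_profiles_per_query):
    """Build an input dict for searching profiles by username."""
    # Assumption: blank queries are not useful, so they are dropped here.
    cleaned = [q.strip() for q in queries if q and q.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty profile query is required")
    # Assumption: a per-query limit below 1 makes no sense.
    if max_profiles_per_query < 1:
        raise ValueError("max_profiles_per_query must be at least 1")
    return {
        "profilesQueries": cleaned,
        "maxProfilesPerQuery": max_profiles_per_query,
    }


run_input = build_profile_search_input(["whisky", " coffee "], 5)
print(json.dumps(run_input, indent=2))
```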
2023-08-29
Fixes
- Cheerio has been unblocked, so you can now scrape posts quickly again if you request <= 15 videos for tiktok-hashtag and tiktok-sound, or <= 30 for tiktok-profile
Features
- You can now disable cheerio/HTTP querying in the input by setting disableCheerioBoost and disableEnrichAuthorStats to true. This will help if the problem with quick scraping ever arises again. (tiktok-hashtag, tiktok-profile, tiktok-sound, tiktok-free, tiktok-paid)
2023-08-21
Changes
- Boosting with Cheerio and additional querying for missing author stats have been temporarily disabled. They will return once TikTok blocking is bypassed
2023-08-15
Fixes
- The crawler will rotate proxies if a sound is unavailable under the current country (tiktok-sound)
2023-08-14
Features
- If a sound is blocked in some country, the scraper will retry (tiktok-sound)
2023-08-10
Features
- You can now download slideshow images by toggling a corresponding input (all, except for tiktok-comments)
- Output videos now have a field telling whether they are slideshows (all, except for tiktok-comments)
- If the fast Cheerio crawler fails to load a page even with retries, the slower crawler will be used as a fallback (all, except for tiktok-comments)
2023-07-28
Features
- Output now contains submittedVideoUrl in addition to webVideoUrl. It copies the post URL from the input and may differ from the webVideoUrl, but both would lead to the same post. Will be present if you input direct post URLs. (tiktok-video, tiktok-comments-scraper)
2023-07-26
Fixes
- Videos with sensitive content, which require a login, are now skipped gracefully (tiktok-video, tiktok-comments-scraper)
2023-07-25
BREAKING CHANGES
- Proxies have been removed from input. Apify's datacenter proxies are always chosen, as they used to be by default (for all scrapers)
- "Scrape info about private/empty channels" has also been removed from input, and set to true by default. If you applied this option and set it to false previously, you should experience no changes (tiktok-profile, tiktok-paid, tiktok-free)
Changes
- The maximum memory is now limited to 4096 MB for all pay-per-result actors (tiktok-hashtag, tiktok-profile, tiktok-sound, tiktok-video)
2023-07-11
Fixes
- URLs in the format of https://www.tiktok.com/t/.../ are now also recognized as post URLs.
2023-07-04
Features
- URLs in the format of vt.tiktok.com are now also supported as post URLs.
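
To check your own input list against the URL shapes this changelog mentions (vt.tiktok.com and vm. short links, the /t/ path, and /video/ or /photo/ post paths), a small pre-filter along these lines may help. This is a sketch written for illustration, not the actors' own matching logic; the function name and the returned labels are hypothetical.

```python
from urllib.parse import urlparse

# Short-link hosts mentioned in this changelog.
SHORT_LINK_HOSTS = {"vt.tiktok.com", "vm.tiktok.com"}


def classify_post_url(url: str) -> str:
    """Return 'short-link', 'post', or 'unknown' for a candidate post URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    segments = [s for s in parsed.path.split("/") if s]

    if host in SHORT_LINK_HOSTS:
        return "short-link"
    if host == "tiktok.com" or host.endswith(".tiktok.com"):
        # https://www.tiktok.com/t/<code>/ is a shortened link as well.
        if len(segments) >= 2 and segments[0] == "t":
            return "short-link"
        # Regular posts and slideshows carry /video/ or /photo/ in the path.
        if "video" in segments or "photo" in segments:
            return "post"
    return "unknown"


for candidate in ["https://vt.tiktok.com/ZSabc123/", "https://example.com/video/1"]:
    print(candidate, classify_post_url(candidate))
```

URLs without a scheme are reported as unknown, because urlparse cannot extract a hostname from them.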
2023-06-29
Fixes
- Now correctly utilizes proxy settings when boosting the scraper with Cheerio and querying for author stats. Previously it would often fail. (tiktok-profile-scraper, tiktok-hashtag-scraper, tiktok-sound-scraper)
2023-06-26
Fixes
- Fixed a bug when in some cases it would not return any comments. (tiktok-comments-scraper)
- Fixed a bug during reply counting. Previously it would sometimes stop too early, especially if the requested number is low. (tiktok-comments-scraper)
- If the author stats are missing, the scraper will now make an additional quick request to the author page to get them. These stats get cached, so the query is made only one time. (tiktok-profile-scraper, tiktok-hashtag-scraper, tiktok-sound-scraper)
2023-06-23
Features
- You can now scrape replies. Note that currently it's not guaranteed that all of them are going to get scraped. (tiktok-comments-scraper)
- Scraping is up to 4x faster if you limit maxResultsPerPage to 30 for posts and 15 for hashtags. This is because the scraper can utilize non-browser requests without the need to scroll. (tiktok-profile-scraper, tiktok-hashtag-scraper, will be added to tiktok-scraper once it is converted to a price-per-result system).
- Added oldestPostDate and scrapeLastNDays to only scrape posts from now up to a certain date. (tiktok-profile-scraper)
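
The date filtering can be pictured as a cutoff date: posts on or after the cutoff are kept. The sketch below only illustrates that idea. It assumes oldestPostDate is an ISO date string (YYYY-MM-DD), that scrapeLastNDays is a whole number of days, and that the stricter (more recent) cutoff wins when both are set; none of these details are confirmed by the changelog.

```python
from datetime import date, timedelta
from typing import Optional


def cutoff_date(
    today: date,
    oldest_post_date: Optional[str] = None,
    scrape_last_n_days: Optional[int] = None,
) -> Optional[date]:
    """Return the earliest post date to keep, or None when no filter is set."""
    candidates = []
    if oldest_post_date:
        candidates.append(date.fromisoformat(oldest_post_date))
    if scrape_last_n_days is not None:
        candidates.append(today - timedelta(days=scrape_last_n_days))
    # Assumption: when both filters are given, the more recent cutoff applies.
    return max(candidates) if candidates else None


def is_in_range(post_day: date, cutoff: Optional[date]) -> bool:
    """Keep a post when there is no cutoff or it is not older than the cutoff."""
    return cutoff is None or post_day >= cutoff


cutoff = cutoff_date(date(2023, 6, 23), scrape_last_n_days=7)
print(cutoff, is_in_range(date(2023, 6, 20), cutoff))
```

Passing today explicitly keeps the calculation deterministic, which makes it easier to test than calling date.today() inside the function.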
2023-05-25
Features
- You can now scrape videos using some music at URLs like https://www.tiktok.com/music/plan-7214283318660073474 (tiktok-sound-scraper)
2023-05-19
Features
- You can now enable the flag in input (scrapeEmptyChannelInfo) that will allow you to save info about private or empty channels even if they don't contain any posts (tiktok-profile-scraper)
2023-04-19
Fixes
- Won't wait for the first XHR response with hashtags if the data in the initial script is enough.
- Doesn't try to retry on 404 pages or private accounts anymore
2023-04-12
Features
- Progress now tracks the last video reached while scrolling at a certain page (both for hashtags and profiles) and the comments scraped for a post. If scrolling fails (e.g. due to a captcha) the crawler will try to restore the scroll at the last video/comment, so as not to scroll all the way down again. If restoration fails though, it should fall back to the old behavior and print a warning.
- Changed the default of shouldRetryStuckComments to true, since it is now possible to restore scroll in such cases.
- When comment or post list crawlers manage to scrape new videos by scrolling, retryCount is reset to 0.
Fixes
- The hashtag route now catches the initial XHR response, which usually adds up to 15 new videos that previously weren't scraped
- Added failed request handler to post detail route, so that no matter what error happens during comment scrape, it's going to push partial results if retries are exhausted.
2023-04-03
Features
- New option to download cover images. Can also specify an optional KVStore name (shared with video download). Analogous to video download from 2023-03-27. If opted in, will replace the link in videoMeta.coverUrl with a link to the KVStore. Also added originalCoverUrl pointing to the TikTok CDN cover URL.
2023-03-27
Features
- New option to download videos. Can also specify an optional KVStore name. If opted in, will replace the link in mediaUrls with a link to the KVStore. Will also place downloadAddr in videoMeta pointing to the KVStore and originalDownloadAddr pointing to the TikTok CDN
2023-03-07
Fixes
- Hotfixed broken post URLs
2023-03-06
Fixes
- Don't stop scrolling if there are still more videos to load (this happened when the initial video count was lower than expected; now we check dynamically whether there are more videos to load)
2023-02-22
Fixes
- Don't get stuck if all videos or comments were already loaded before scrolling (this bug happened when there were less than 20 videos or comments)
2023-02-03
Fixes
- Improved scrolling for videos that got stuck on 30 videos loaded. It is still a bit clunky and there is a lot of blocking but should work with few retries.
2023-01-27
Features
- loginCookies are no longer required for scraping comments
- The comment crawling is rewritten to scrape from underneath the post
- no login sessions are created or managed anymore by the actor
- loginCookies on input are deprecated and generate a warning in the log
2022-8-14
Features
- Output was updated to include more properties. New properties:
{
  "locationCreated": "CA",
  "isAd": false,
  "isMuted": false,
  "authorMeta": {
    "bioLink": "https://www.thewhiskyexplorer.ca",
    "commerceUserInfo": {
      "commerceUser": true,
      "category": "Food/Beverage",
      "categoryButton": false
    },
    "isUnderAge18": false,
    "privateAccount": false,
    "region": "CA",
    "roomId": "",
    "ttSeller": false
  },
  "musicMeta": {
    "coverMediumUrl": ...,
    "musicId": "7105825676251351814"
  },
  "videoMeta": {
    "coverUrl": ...,
    "definition": "720p",
    "format": "mp4"
  },
}
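
For downstream processing, these nested properties are easiest to read defensively, since older items may not carry them. A minimal sketch, assuming each dataset item is a plain dict shaped like the example above; the function name and the flattened key names are made up for illustration.

```python
def summarize_item(item: dict) -> dict:
    """Flatten a few of the newer output properties into one row."""
    author = item.get("authorMeta") or {}
    commerce = author.get("commerceUserInfo") or {}
    video = item.get("videoMeta") or {}
    return {
        "location": item.get("locationCreated"),
        "is_ad": bool(item.get("isAd", False)),
        "author_region": author.get("region"),
        # Only report a category for accounts flagged as commerce users.
        "commerce_category": (
            commerce.get("category") if commerce.get("commerceUser") else None
        ),
        "definition": video.get("definition"),
    }


# Sample values copied from the changelog example; elided fields are omitted.
sample = {
    "locationCreated": "CA",
    "isAd": False,
    "authorMeta": {
        "commerceUserInfo": {"commerceUser": True, "category": "Food/Beverage"},
        "region": "CA",
    },
    "videoMeta": {"definition": "720p", "format": "mp4"},
}
print(summarize_item(sample))
```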
2022-2-1
Features
- Add the possibility to scrape comments under login using loginCookies
- Add login session management to avoid blocking of the account used for scraping comments
- Searching hashtag and number of views for this hashtag are now stored in the output
Fixes
- Actor does not deduplicate videos for different hashtags - improves the accuracy of the number of outputted items
- Improved logs
2022-1-14
Fixes
- fixed scraping of authorMeta data from scripts - affected: post urls and first batch of videos for hashtag and profile
2022-1-10
Fixes
- updated scraping of hashtags and profiles - TikTok randomly displays two types of scripts with data
- fixed number of output items to be the same as resultsPerPage
- fixed Timed out error when waiting for the xhr response with data - the scraper now scrolls until it receives the response or the waiting/scrolling times out
- more readable error messages
- fixed progress caching
- empty strings are no longer accepted as hashtag, postUrl or profile
2022-1-4
Fixes
- updated scraping of individual posts - TikTok randomly displays two types of scripts with data
2021-10-20
Fixes
- when page.waitingForResponse times out, the scraper retires the session and restarts the browser. This should prevent looping of timeouts on request retries
Features
- TikTok sometimes sends a request for the same data two times. This behavior won't affect the total number of output items specified on input. (Also, duplicate videos across hashtag/profile searches will be scraped only once, but won't be counted towards the number of output items for the specific search)
- Sometimes there are more than 6 videos loaded on the search page. The scraper won't push them into the outputted results, so that the number of results remains consistent according to the specification on input.
2021-10-18
Fixes
- computation of outputLength is no longer dependent on persisted progress, meaning scraping of more than one hashtag/profile is now working properly
2021-10-15
Fixes
- handlePageFunction no longer times out when resultsPerPage is set low
2021-10-14
Features
- New output structure
- Added the possibility to scrape more than the first page of results (regulated by resultsPerPage input)
- Scrapes user profiles defined on input by username in profiles
- Added optional attributes maxConcurrency, maxRequestRetries and resultsPerPage to input
- If resultsPerPage is not specified, it defaults to 10; the minimum value is 1
- Created in May 2024