TikTok Hashtag Scraper

Pay $5.00 for 1,000 videos

clockworks/tiktok-hashtag-scraper

Scrape TikTok hashtags the fast and easy way. Just add one or more hashtags and the scraper will extract TikTok data on each video that mentions them: URLs, likes, country of creation, video and music metadata, and TikTok creator data. Download the data as JSON or HTML and use it in your apps and data projects.

This changelog summarizes all changes to the TikTok actors provided by the Clockworks organization. The affected actors are listed for each change.

2023-11-18

Fixes

  • Fixed a bug where scraping comments and profiles failed. (tiktok-profile, tiktok-comments, tiktok-paid)

2023-09-29

Features

  • You can now search for profiles (Accounts) by username. You can pass a list of "Profile Queries" (profilesQueries) and set a limit for each query by providing the "Max profiles per query" (maxProfilesPerQuery). The actor will then search for the profiles and scrape the videos of each profile. (tiktok-profile)
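
As a rough illustration, the input for a username search might be assembled like this. The field names profilesQueries and maxProfilesPerQuery come from the entry above; the helper name, the validation rules, and the example usernames are assumptions made for this sketch and are not part of the actor.

```python
import json


def build_profile_search_input(queries, max_profiles_per_query):
    """Build an input object for a username search (hypothetical helper)."""
    # Assumption: blank queries are not useful, so they are dropped here.
    cleaned = [q.strip() for q in queries if q and q.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty profile query is required")
    if max_profiles_per_query < 1:
        raise ValueError("max_profiles_per_query must be at least 1")
    return {
        "profilesQueries": cleaned,
        "maxProfilesPerQuery": max_profiles_per_query,
    }


# Placeholder usernames, used only to show the resulting shape.
actor_input = build_profile_search_input(["example_user", " another_user "], 5)
print(json.dumps(actor_input, indent=2))
```

The resulting object can then be pasted into the actor's JSON input or passed to whichever client you use to start the run.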

2023-08-29

Fixes

  • Cheerio has been unblocked. You can now scrape posts quickly again if you request <= 15 videos for tiktok-hashtag or tiktok-sound, or <= 30 for tiktok-profile.

Features

  • You can now disable Cheerio/HTTP querying in the input by setting disableCheerioBoost and disableEnrichAuthorStats to true. This will help if the problem with quick scraping ever arises again. (tiktok-hashtag, tiktok-profile, tiktok-sound, tiktok-free, tiktok-paid)
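
The thresholds and flags from this release can be summarized in a small sketch. The limits (15 and 30) and the input names disableCheerioBoost and disableEnrichAuthorStats are taken from the entries above; the function and the lookup table are hypothetical and only model the stated rule, not the actor's internal logic.

```python
# Limits as stated in the changelog entry; the table itself is illustrative.
FAST_PATH_LIMITS = {
    "tiktok-hashtag": 15,
    "tiktok-sound": 15,
    "tiktok-profile": 30,
}


def can_use_fast_path(actor, requested_videos, disable_cheerio_boost=False):
    """Return True if a request is small enough for the quick HTTP path."""
    if disable_cheerio_boost:
        return False
    limit = FAST_PATH_LIMITS.get(actor)
    return limit is not None and requested_videos <= limit


# Input fragment that opts out of both quick-request features.
opt_out_input = {"disableCheerioBoost": True, "disableEnrichAuthorStats": True}

print(can_use_fast_path("tiktok-hashtag", 15))
print(can_use_fast_path("tiktok-profile", 31))
```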

2023-08-21

Changes

  • Boosting with Cheerio and the additional querying for missing author stats have been temporarily disabled. They will be re-enabled once TikTok's blocking is bypassed.

2023-08-15

Fixes

  • The crawler will rotate proxies if a sound is unavailable in the current country (tiktok-sound)

2023-08-14

Features

  • If a sound is blocked in some country, the scraper will retry (tiktok-sound)

2023-08-10

Features

  • You can now download slideshow images by toggling a corresponding input (all, except for tiktok-comments)
  • Output videos now have a field telling whether they are slideshows (all, except for tiktok-comments)
  • If the fast Cheerio crawler fails to load a page even with retries, the slower crawler will be used as a fallback (all, except for tiktok-comments)
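
The fallback behavior described in the last item can be pictured with a minimal sketch. The function names, the retry count of 3, and the stub fetchers are all assumptions for illustration; the changelog only states that the slower crawler takes over once the fast one has failed with retries.

```python
def fetch_with_fallback(url, fast_fetch, slow_fetch, max_fast_retries=3):
    """Try the fast fetcher a few times, then fall back to the slow one."""
    for _ in range(max_fast_retries):
        try:
            return "fast", fast_fetch(url)
        except Exception:
            # Assumption: any error from the fast path counts as a failed try.
            continue
    return "slow", slow_fetch(url)


def always_failing_fast_fetch(url):
    # Stub that simulates the quick HTTP request being blocked.
    raise ConnectionError("blocked")


def slow_browser_fetch(url):
    # Stub standing in for a full browser page load.
    return f"<html>rendered {url}</html>"


# The URL is a placeholder used only for this demonstration.
crawler_used, page = fetch_with_fallback(
    "https://www.tiktok.com/tag/example",
    always_failing_fast_fetch,
    slow_browser_fetch,
)
print(crawler_used)
```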

2023-07-28

Features

  • Output now contains submittedVideoUrl in addition to webVideoUrl. It copies the post URL from the input and may differ from webVideoUrl, but both lead to the same post. It is present if you input direct post URLs. (tiktok-video, tiktok-comments-scraper)

2023-07-26

Fixes

  • Videos with sensitive content, which require a login, are now skipped gracefully (tiktok-video, tiktok-comments-scraper)

2023-07-25

BREAKING CHANGES

  • Proxies have been removed from input. Apify's datacenter proxies are always chosen, as they used to be by default (for all scrapers)
  • Scrape info about private/empty channels has also been removed from input, and set to true by default. If you applied this option and set it to false previously, you should experience no changes (tiktok-profile, tiktok-paid, tiktok-free)

Changes

  • The maximum memory is now limited to 4096 MB for all pay-per-result actors (tiktok-hashtag, tiktok-profile, tiktok-sound, tiktok-video)

2023-07-11

Fixes

  • Now URLs in the format of https://www.tiktok.com/t/.../ are also recognized as post URLs.

2023-07-04

Features

  • Now URLs in the format of vt.tiktok.com are also supported as post URLs.
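
A hedged sketch of how the two short-link formats (https://www.tiktok.com/t/.../ from 2023-07-11 and vt.tiktok.com from this entry) could be recognized is shown below. The helper name and the regular expressions are assumptions, and the canonical /@user/video/<id> pattern is included as an assumed baseline that the changelog itself does not spell out.

```python
import re
from urllib.parse import urlparse

# Short links of the form /t/<code>/ on the main domain.
_SHORT_PATH = re.compile(r"^/t/[^/]+/?$")
# Assumed canonical post path: /@<user>/video/<numeric id>.
_CANONICAL_PATH = re.compile(r"^/@[^/]+/video/\d+/?$")


def looks_like_post_url(url):
    """Return True if the URL matches one of the recognized post formats."""
    parsed = urlparse(url)  # URLs without a scheme have an empty host here
    host = parsed.netloc.lower()
    path = parsed.path
    if host == "vt.tiktok.com":
        # Any non-empty path is treated as a short code.
        return len(path.strip("/")) > 0
    if host in ("www.tiktok.com", "tiktok.com"):
        return bool(_SHORT_PATH.match(path) or _CANONICAL_PATH.match(path))
    return False


print(looks_like_post_url("https://www.tiktok.com/t/abc123/"))
print(looks_like_post_url("https://vt.tiktok.com/abc123/"))
print(looks_like_post_url("https://www.tiktok.com/@example_user"))
```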

2023-06-29

Fixes

  • Now correctly utilizes proxy settings when boosting the scraper with Cheerio and querying for author stats. Previously it would often fail. (tiktok-profile-scraper, tiktok-hashtag-scraper, tiktok-sound-scraper)

2023-06-26

Fixes

  • Fixed a bug where, in some cases, no comments were returned. (tiktok-comments-scraper)
  • Fixed a bug during reply counting. Previously it would sometimes stop too early, especially if the requested number is low. (tiktok-comments-scraper)
  • If the author stats are missing, the scraper will now make an additional quick request to the author page to get them. These stats get cached, so the query is made only one time. (tiktok-profile-scraper, tiktok-hashtag-scraper, tiktok-sound-scraper)
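
The "query once, then reuse" behavior for author stats can be sketched as follows. authorMeta is a real output field; the class, the name and stats keys, and the fake fetcher are hypothetical stand-ins used only to show the caching idea.

```python
class AuthorStatsCache:
    """Remember author stats so each author is requested at most once."""

    def __init__(self, fetch_stats):
        self._fetch_stats = fetch_stats
        self._stats_by_author = {}
        self.requests_made = 0

    def get(self, username):
        if username not in self._stats_by_author:
            self.requests_made += 1
            self._stats_by_author[username] = self._fetch_stats(username)
        return self._stats_by_author[username]


def fill_missing_author_stats(videos, cache):
    # Only videos whose (hypothetical) "stats" key is missing are enriched.
    for video in videos:
        author = video["authorMeta"]
        if author.get("stats") is None:
            author["stats"] = cache.get(author["name"])
    return videos


def fake_fetch_stats(username):
    # Stub for the quick request to the author page.
    return {"followers": 100}


videos = [
    {"authorMeta": {"name": "alice", "stats": None}},
    {"authorMeta": {"name": "alice"}},
    {"authorMeta": {"name": "bob", "stats": {"followers": 7}}},
]
cache = AuthorStatsCache(fake_fetch_stats)
fill_missing_author_stats(videos, cache)
print(cache.requests_made)
```

Two videos by the same author trigger a single lookup, and authors whose stats are already present trigger none.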

2023-06-23

Features

  • You can now scrape replies. Note that currently it's not guaranteed that all of them are going to get scraped. (tiktok-comments-scraper)
  • Scraping is up to 4x faster if you limit maxResultsPerPage to 30 for posts and 15 for hashtags. This is because the scraper can utilize non-browser requests without the need to scroll. (tiktok-profile-scraper, tiktok-hashtag-scraper, will be added to tiktok-scraper once it is converted to a price-per-result system).
  • Added oldestPostDate and scrapeLastNDays to only scrape posts from now up to a certain date. (tiktok-profile-scraper)
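
One way to think about oldestPostDate and scrapeLastNDays is as two ways of expressing the same cutoff. The sketch below assumes that oldestPostDate is an ISO date string, that scrapeLastNDays is a whole number of days, and that the more restrictive value wins when both are set; none of these details are confirmed by the changelog.

```python
from datetime import date, datetime, timedelta, timezone


def resolve_cutoff(oldest_post_date=None, scrape_last_n_days=None, now=None):
    """Return the earliest allowed post time, or None if no limit is set."""
    now = now or datetime.now(timezone.utc)
    candidates = []
    if oldest_post_date:
        day = date.fromisoformat(oldest_post_date)
        candidates.append(
            datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
        )
    if scrape_last_n_days is not None:
        candidates.append(now - timedelta(days=scrape_last_n_days))
    # Assumption: when both are given, the later (stricter) cutoff applies.
    return max(candidates) if candidates else None


def is_recent_enough(post_created_at, cutoff):
    return cutoff is None or post_created_at >= cutoff


fixed_now = datetime(2023, 6, 23, tzinfo=timezone.utc)
cutoff = resolve_cutoff("2023-06-01", 7, now=fixed_now)
print(cutoff.isoformat())
```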

2023-05-25

Features

2023-05-19

Features

  • You can now enable the flag in input (scrapeEmptyChannelInfo) that will allow you to save info about private or empty channels even if they don't contain any posts (tiktok-profile-scraper)

2023-04-19

Fixes

  • Won't wait for the first XHR response with hashtags if the data in the initial script is enough.
  • No longer retries on 404 pages or private accounts.

2023-04-12

Features

  • Progress now tracks the last video reached while scrolling at a certain page (both for hashtags and profiles) and the comments scraped for a post. If scrolling fails (e.g. due to a captcha) the crawler will try to restore the scroll at the last video/comment, so as not to scroll all the way down again. If restoration fails though, it should fall back to the old behavior and print a warning.
  • Changed the default of shouldRetryStuckComments to true, since it is now possible to restore the scroll in such cases.
  • When comment or post list crawlers manage to scrape new videos by scrolling, retryCount is reset to 0.
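
The retryCount reset described above can be modeled with a short simulation. This is only a sketch of the counting rule: the function, the batch representation, and the limit of 2 retries are assumptions, and the actual scroll restoration is not modeled.

```python
def scroll_until_done(batches, target, max_retries=3):
    """Consume scroll results, resetting the retry counter on progress.

    `batches` is an iterable of lists of item IDs, one list per scroll
    attempt; an empty list means the scroll produced nothing new.
    """
    seen = []
    retry_count = 0
    for batch in batches:
        new_items = [item for item in batch if item not in seen]
        if new_items:
            seen.extend(new_items)
            retry_count = 0  # progress was made, so start counting afresh
        else:
            retry_count += 1
            if retry_count > max_retries:
                break
        if len(seen) >= target:
            break
    return seen[:target], retry_count


# Three empty scrolls in total, but never more than two in a row.
items, retries = scroll_until_done([[1, 2], [], [], [3], [], [4]], 4, max_retries=2)
print(items, retries)
```

Without the reset, the three empty scrolls in this run would add up and exceed the limit before the fourth item is reached.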

Fixes

  • The hashtag route now catches the initial XHR response, which usually adds up to 15 new videos that previously weren't scraped.
  • Added a failed request handler to the post detail route, so that whatever error happens during comment scraping, partial results are pushed once retries are exhausted.

2023-04-03

Features

  • New option to download cover images. You can also specify an optional KVStore name (shared with video download). This is analogous to the video download added on 2023-03-27. If opted in, the link in videoMeta.coverUrl is replaced with a link to the KVStore. Also added originalCoverUrl, which points to the TikTok CDN cover URL.

2023-03-27

Features

  • New option to download videos. You can also specify an optional KVStore name. If opted in, the link in mediaUrls is replaced with a link to the KVStore. In addition, videoMeta gets downloadAddr, which points to the KVStore, and originalDownloadAddr, which points to the TikTok CDN.
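
To make the field changes concrete, here is a minimal sketch of how an output item might look before and after opting in. The field names mediaUrls, videoMeta, downloadAddr, and originalDownloadAddr come from the entry above; the helper, the assumption that mediaUrls is a list, and the placeholder URLs are illustrative only.

```python
import copy


def apply_video_download(item, cdn_url, kv_store_url):
    """Return a copy of the item with links rewritten to the stored video."""
    updated = copy.deepcopy(item)
    # Assumption: mediaUrls is a list holding a single link per video.
    updated["mediaUrls"] = [kv_store_url]
    video_meta = updated.setdefault("videoMeta", {})
    video_meta["downloadAddr"] = kv_store_url
    video_meta["originalDownloadAddr"] = cdn_url
    return updated


# Placeholder URLs, not real TikTok or Apify addresses.
before = {"mediaUrls": ["https://cdn.example/video.mp4"], "videoMeta": {}}
after = apply_video_download(
    before,
    cdn_url="https://cdn.example/video.mp4",
    kv_store_url="https://kv.example/video.mp4",
)
print(after["videoMeta"])
```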

2023-03-07

Fixes

  • Hotfixed broken post URLs

2023-03-06

Fixes

  • Don't stop scrolling if there are still more videos to load. (This happened when the initial video count was less than expected; now we check dynamically whether there are more videos to load.)

2023-02-22

Fixes

  • Don't get stuck if all videos or comments were already loaded before scrolling (this bug happened when there were fewer than 20 videos or comments)

2023-02-03

Fixes

  • Improved scrolling for pages that got stuck at 30 videos loaded. It is still a bit clunky and there is a lot of blocking, but it should work with a few retries.

2023-01-27

Features

  • loginCookies are no longer required for scraping comments
  • The comment crawling has been rewritten to scrape from underneath the post
  • No login sessions are created or managed by the actor anymore
  • loginCookies on input are deprecated and generate a warning in the log

2022-08-14

Features

  • Output was updated to include more properties. New properties:
{
  "locationCreated": "CA",
  "isAd": false,
  "isMuted": false,
  "authorMeta": {
    "bioLink": "https://www.thewhiskyexplorer.ca",
    "commerceUserInfo": {
      "commerceUser": true,
      "category": "Food/Beverage",
      "categoryButton": false
    },
    "isUnderAge18": false,
    "privateAccount": false,
    "region": "CA",
    "roomId": "",
    "ttSeller": false
  },
  "musicMeta": {
    "coverMediumUrl": ...,
    "musicId": "7105825676251351814"
  },
  "videoMeta": {
    "coverUrl": ...,
    "definition": "720p",
    "format": "mp4"
  }
}

2022-02-01

Features

  • Added the possibility to scrape comments while logged in, using loginCookies
  • Added login session management to avoid blocking of the account used for scraping comments
  • The searched hashtag and the number of views for that hashtag are now stored in the output

Fixes

  • The actor does not deduplicate videos across different hashtags, which improves the accuracy of the number of outputted items
  • Improved logs

2022-01-14

Fixes

  • Fixed scraping of authorMeta data from scripts. Affected: post URLs and the first batch of videos for hashtags and profiles

2022-01-10

Fixes

  • Updated scraping of hashtags and profiles, since TikTok randomly displays two types of scripts with data
  • Fixed the number of output items to be the same as resultsPerPage
  • Fixed the "Timed out" error when waiting for the XHR response with data. The scraper now scrolls until it receives the response or the waiting/scrolling times out
  • More readable error messages
  • Fixed progress caching
  • Empty strings are no longer accepted as hashtag, postUrl or profile

2022-01-04

Fixes

  • Updated scraping of individual posts, since TikTok randomly displays two types of scripts with data

2021-10-20

Fixes

  • When page.waitingForResponse times out, the actor retires the session and restarts the browser. This should prevent timeouts from looping on request retries

Features

  • TikTok sometimes sends a request for the same data twice. This behavior no longer affects the total number of outputted items specified on input. (Duplicate videos within a hashtag/profile search are also scraped only once, and are not counted toward the number of outputted items for that search.)
  • Sometimes more than 6 videos are loaded on the search page. The scraper won't push the extra videos into the output, so that the number of results stays consistent with what was specified on input.
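
The two rules above (skip duplicates without counting them, and never exceed the requested number of results) can be sketched together. The function and the id key are hypothetical; the sketch only demonstrates the counting behavior, not how the actor detects duplicates.

```python
def collect_unique(batches, limit):
    """Gather up to `limit` videos, ignoring repeated IDs."""
    seen_ids = set()
    results = []
    for batch in batches:
        for video in batch:
            if video["id"] in seen_ids:
                # Duplicates are skipped and do not count toward the limit.
                continue
            if len(results) >= limit:
                return results
            seen_ids.add(video["id"])
            results.append(video)
    return results


# The second batch repeats the first, as when the same data is sent twice.
batches = [
    [{"id": "a"}, {"id": "b"}],
    [{"id": "a"}, {"id": "b"}],
    [{"id": "c"}, {"id": "d"}],
]
unique = collect_unique(batches, 3)
print([video["id"] for video in unique])
```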

2021-10-18

Fixes

  • Computation of outputLength no longer depends on persisted progress, so scraping more than one hashtag/profile now works properly

2021-10-15

Fixes

  • handlePageFunction no longer times out when resultsPerPage is set low

2021-10-14

Features

  • New output structure
  • Added the possibility to scrape more than the first page of results (regulated by resultsPerPage input)
  • Scrapes user profiles defined on input by username in profiles
  • Added optional attributes maxConcurrency, maxRequestRetries and resultsPerPage to input
  • If resultsPerPage is not specified, it defaults to 10; the minimum value is 1
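
The default and minimum for resultsPerPage can be expressed as a small input check. The default of 10 and the minimum of 1 come from the entry above; the helper name and the choice to raise an error on invalid values (rather than clamp them) are assumptions.

```python
DEFAULT_RESULTS_PER_PAGE = 10
MIN_RESULTS_PER_PAGE = 1


def normalize_results_per_page(raw_input):
    """Return the effective resultsPerPage for an input object."""
    value = raw_input.get("resultsPerPage")
    if value is None:
        return DEFAULT_RESULTS_PER_PAGE
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("resultsPerPage must be an integer")
    if value < MIN_RESULTS_PER_PAGE:
        raise ValueError("resultsPerPage must be at least 1")
    return value


print(normalize_results_per_page({}))
print(normalize_results_per_page({"resultsPerPage": 25}))
```
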
Developer
Maintained by Apify