Tiktok Scraper 2.0

Pricing

from $2.00 / 1,000 results

Scrape TikTok profiles, user videos, keyword results, video metrics, media links, hashtags, account metadata, and transcript or caption text into clean Apify datasets. Built for monitoring accounts, tracking TikTok trends, and feeding structured TikTok data into analytics workflows.

Rating: 0.0 (0)
Developer: Inus Grobler (maintained by Community)
Bookmarked: 1
Total users: 27
Monthly active users: 6
Last modified: 8 days ago

Scrape TikTok profiles, user videos, keyword search results, video metrics, media links, and transcript text into clean Apify dataset rows.

This Actor is designed for monitoring TikTok accounts, collecting public video analytics, tracking keyword trends, and feeding structured TikTok data into dashboards, warehouses, or enrichment pipelines.

Features

  • Scrape videos from TikTok usernames
  • Scrape videos from TikTok keyword queries
  • Collect profile metadata for user runs when enabled
  • Extract video metrics including plays, likes, comments, shares, saves, duration, hashtags, and music title
  • Extract subtitle transcript text when exposed by TikTok metadata
  • Fall back to cleaned caption text when no transcript track is available
  • Return compact pipeline-friendly rows by default
  • Label row quality with metadata_quality and missing_fields
  • Store continuation checkpoints between runs to skip already-seen videos
  • Write run diagnostics to OUTPUT_SUMMARY

Workflows

Users

Use workflow: "users" to scrape one or more TikTok accounts.

{
  "workflow": "users",
  "users": ["tiktok", "khaby.lame"],
  "maxVideosPerUser": 10,
  "skipProfileScrape": false,
  "outputMode": "compact"
}

Keywords

Use workflow: "keywords" to scrape videos discovered from TikTok search and hashtag pages.

{
  "workflow": "keywords",
  "keywords": ["funny", "dog training"],
  "maxVideosPerKeyword": 10,
  "outputMode": "compact"
}

Input Reference

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| workflow | string | users | Required. Use users or keywords. |
| users | array[string] | - | TikTok usernames, @handles, or profile URLs. Required for users workflow. Max 100. |
| keywords | array[string] | - | Keyword queries. Required for keywords workflow. Max 100. |
| maxVideosPerUser | integer | 10 | Maximum videos to collect per user. |
| maxVideosPerKeyword | integer | 10 | Maximum videos to collect per keyword. |
| engine | string | auto | Primary browser engine: auto, playwright, or pydoll. |
| outputMode | string | compact | compact for smaller records, full for more technical fields. |
| maxDatasetItems | integer | 5000 | Maximum dataset rows pushed for the run. Use 0 for no additional cap. |
| skipProfileScrape | boolean | true | Faster users workflow. Set to false for richer account metadata. |
| includePlaybackUrl | boolean | true | Include direct playback URLs. These are often signed and short-lived. |
| includeTranscriptText | boolean | true | Include transcript or caption fallback text in rows. |
| includeMediaLinksMeta | boolean | true | Include expiry/signature metadata for media URLs. |
| enableApifyContinuation | boolean | true | Load checkpoint state from the named key-value store. |
| resetApifyContinuation | boolean | false | Ignore previous checkpoint state and replace it after the run. |
| continuationStoreName | string | tiktok-scraper-2-0-continuation | Named key-value store for checkpoints. Normalized to lowercase letters, digits, and hyphens. |
| continuationStateKey | string | CONTINUATION_STATE | Key used inside the continuation store. |
| minDelaySec | number | 1.1 | Minimum randomized browser delay. |
| maxDelaySec | number | 2.4 | Maximum randomized browser delay. |
| ytdlpTimeoutSec | integer | 240 | Timeout for metadata/transcript fallback calls. |
| ytdlpChunkSize | integer | 40 | Profile playlist items requested per yt-dlp chunk. Larger values can improve high-volume user runs. |
| ytdlpMaxRawScan | integer | 200 | Maximum raw playlist entries scanned per user. Use 0 for no additional scan cap. |
| ytdlpMaxVideosPerUser | integer | 0 | Optional yt-dlp-specific cap per user. Use 0 to follow maxVideosPerUser. |
| ytdlpDetailEnrichLimit | integer | 10 | Maximum per-video detail enrichments per user. Lower values are faster but may reduce metadata depth. |
| ytdlpTranscriptDetailEnrichLimit | integer | 80 | Maximum per-video detail enrichments used for transcript discovery. |
| ytdlpTranscriptSubtitleDownloadLimit | integer | 8 | Maximum subtitle-download transcript fallback attempts per user. |
| disableScraplingFallback | boolean | false | Disable the secondary fallback path. |

Provide either users or keywords, not both.
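As a rough illustration of the "either users or keywords, not both" rule, the sketch below builds an input object client-side before a run is started. The helper name build_run_input and its arguments are hypothetical and not part of the Actor; only the field names and the limit of 100 entries come from the table above.

```python
from typing import Any, Dict, List

MAX_SEEDS = 100  # "Max 100" for users and keywords, per the input table


def build_run_input(workflow: str, seeds: List[str], max_videos: int = 10, **extra: Any) -> Dict[str, Any]:
    """Return an input dict that contains exactly one of `users` or `keywords`.

    Hypothetical helper: the checks mirror the documented rules, not the
    Actor's own validation code.
    """
    if workflow not in ("users", "keywords"):
        raise ValueError("workflow must be 'users' or 'keywords'")
    if not seeds:
        raise ValueError(f"the {workflow} workflow needs at least one entry")
    if len(seeds) > MAX_SEEDS:
        raise ValueError(f"at most {MAX_SEEDS} entries are allowed")
    if "users" in extra or "keywords" in extra:
        raise ValueError("pass entries through the `seeds` argument only")

    # The list field shares its name with the workflow value.
    limit_field = "maxVideosPerUser" if workflow == "users" else "maxVideosPerKeyword"
    return {"workflow": workflow, workflow: list(seeds), limit_field: max_videos, **extra}
```

Any other documented field, such as outputMode, can be passed as a keyword argument and is copied into the result unchanged.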

Continuation

Continuation is enabled by default and helps avoid returning the same videos across repeated runs.

The Actor stores checkpoint state in the named key-value store configured by continuationStoreName, under the key configured by continuationStateKey.

For each user or keyword, continuation stores:

  • latest known from_epoch
  • latest continuation_token
  • recent video_ids for duplicate suppression
  • last scrape timestamp
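To inspect what has been checkpointed, the stored record can be read back with the Apify Python client. This is a minimal sketch that assumes the named store lives in the account that ran the Actor and that `client` is an `ApifyClient` instance; the function name is hypothetical, and the exact shape of the stored value is not documented here, so it is returned as-is.

```python
from typing import Any, Optional


def load_continuation_state(
    client: Any,
    store_name: str = "tiktok-scraper-2-0-continuation",
    state_key: str = "CONTINUATION_STATE",
) -> Optional[Any]:
    """Return the stored checkpoint value, or None when no record exists yet.

    Note: get_or_create creates an empty store if the name does not exist.
    """
    store = client.key_value_stores().get_or_create(name=store_name)
    record = client.key_value_store(store["id"]).get_record(state_key)
    return record["value"] if record else None
```

The defaults match the continuationStoreName and continuationStateKey defaults from the input reference; pass your own values if you changed them.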

To force a fresh run, set:

{
  "resetApifyContinuation": true
}

You can also provide a one-off lower bound directly in an item:

  • tiktok|from=2026-01-01T00:00:00Z
  • khaby.lame|from=1778336344
  • funny|from=2026-01-01

An explicit from value in the input takes priority over stored continuation state.
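The item|from=value entries shown above can be generated rather than typed by hand. The helper below is a hypothetical sketch: it formats datetimes as UTC ISO 8601 strings and passes epoch seconds or date strings through unchanged, matching the three example forms. Treating naive datetimes as UTC is an assumption.

```python
from datetime import datetime, timezone
from typing import Union


def seed_with_from(name: str, since: Union[datetime, int, str]) -> str:
    """Append a `|from=` lower bound to a username or keyword."""
    if isinstance(since, datetime):
        if since.tzinfo is None:
            # Assumption: naive datetimes are meant as UTC.
            since = since.replace(tzinfo=timezone.utc)
        value = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    else:
        # Epoch seconds or an already formatted date string.
        value = str(since)
    return f"{name}|from={value}"
```

The returned strings go directly into the users or keywords array of the input.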

Dataset Output

Each dataset item is a video row. Compact rows start with:

  • video_id
  • video_url
  • caption_text
  • transcript

Common fields include:

  • create_time and create_time_epoch
  • author_username and author_id
  • hashtags
  • play_count, like_count, comment_count, share_count, save_count
  • duration_s
  • music_title
  • media_links
  • transcript_detail
  • has_transcript
  • has_caption_text_fallback
  • metadata_quality
  • missing_fields
  • account for user workflow rows
  • search_keyword for keyword workflow rows
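For spreadsheet or warehouse loading, a subset of the fields listed above can be flattened to CSV. The sketch below is illustrative only: the column selection is arbitrary, and it assumes hashtags arrives as a list of strings, which this page does not confirm.

```python
import csv
import io
from typing import Any, Dict, Iterable

# Hypothetical column selection drawn from the documented field names.
COLUMNS = [
    "video_id", "video_url", "author_username", "create_time",
    "play_count", "like_count", "comment_count", "share_count",
    "save_count", "duration_s", "hashtags", "metadata_quality",
]


def rows_to_csv(rows: Iterable[Dict[str, Any]]) -> str:
    """Serialize dataset rows to CSV text, leaving missing fields blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        flat = {col: row.get(col, "") for col in COLUMNS}
        # Assumption: hashtags is a list; join it into one cell.
        if isinstance(flat["hashtags"], (list, tuple)):
            flat["hashtags"] = " ".join(str(tag) for tag in flat["hashtags"])
        writer.writerow(flat)
    return buffer.getvalue()
```

Fields that are absent from a row, which can happen for partial or url_only rows, end up as empty cells instead of raising an error.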

Transcript Fields

has_transcript is true when subtitle/caption metadata was exposed and parsed.

has_caption_text_fallback is true when no transcript track was available and the Actor used cleaned caption text instead.

transcript_detail.status may be:

  • found
  • fallback_description
  • missing
  • disabled
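Because caption fallback text is not a spoken transcript, downstream code may want to tag where each row's text came from. The following sketch uses only the flags and status values listed above; the function name and the returned labels are hypothetical.

```python
from typing import Any, Dict, Optional


def text_source(row: Dict[str, Any]) -> Optional[str]:
    """Classify a row's text as 'transcript', 'caption', or None."""
    status = (row.get("transcript_detail") or {}).get("status")
    if row.get("has_transcript") or status == "found":
        return "transcript"
    if row.get("has_caption_text_fallback") or status == "fallback_description":
        return "caption"
    # Status is "missing" or "disabled", or the fields are absent.
    return None
```

A speech-analysis pipeline could then keep only rows labeled transcript, while a keyword index might accept both labels.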

Metadata Quality

metadata_quality helps downstream systems decide whether a row is complete enough for analytics:

  • full: core metadata is present
  • partial: usable video row with some missing fields
  • url_only: mostly just a discovered video URL

missing_fields lists important absent fields such as create_time_epoch, author_id, play_count, or duration_s.
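One way a downstream job might apply these labels is sketched below. The set of required fields is a hypothetical choice for an analytics pipeline; adjust it to whatever your own reports depend on.

```python
from typing import Any, Dict, Iterable, Iterator

# Hypothetical: fields this example pipeline cannot work without.
REQUIRED_FOR_ANALYTICS = frozenset({"play_count", "create_time_epoch"})


def analytics_ready(
    rows: Iterable[Dict[str, Any]],
    required: Iterable[str] = REQUIRED_FOR_ANALYTICS,
) -> Iterator[Dict[str, Any]]:
    """Yield rows that are not url_only and are not missing a required field."""
    required_set = set(required)
    for row in rows:
        if row.get("metadata_quality") == "url_only":
            continue
        missing = set(row.get("missing_fields") or [])
        if missing & required_set:
            continue
        yield row
```

Rows marked partial still pass as long as none of the required fields appear in missing_fields.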

Run Summary

Every run writes OUTPUT_SUMMARY to the default key-value store.

Useful fields include:

  • status
  • videos_scraped
  • videos_with_transcript
  • videos_with_caption_text_fallback
  • videos_with_any_transcript_text
  • metadata_quality_counts
  • health
  • continuation_loaded_entities
  • continuation_loaded_id_entities
  • continuation_updates
  • dataset_items_pushed

The health object includes:

  • total_rows
  • sparse_row_count
  • empty_entity_count
  • seed_count
  • empty_seed_count
  • challenge_seed_count
  • success_rate
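A monitoring job could use the health counters to flag runs that mostly hit empty or challenged seeds. The sketch below assumes seed_count, empty_seed_count, and challenge_seed_count are integer counts, which this page does not state explicitly; it deliberately avoids success_rate because its scale is not documented here. The function name is hypothetical.

```python
from typing import Any, Dict


def seed_problem_rates(summary: Dict[str, Any]) -> Dict[str, float]:
    """Return the share of seeds that were empty or hit a challenge page."""
    health = summary.get("health") or {}
    seeds = health.get("seed_count") or 0
    if seeds <= 0:
        # No seeds reported: nothing to divide by.
        return {"empty": 0.0, "challenge": 0.0}
    return {
        "empty": (health.get("empty_seed_count") or 0) / seeds,
        "challenge": (health.get("challenge_seed_count") or 0) / seeds,
    }
```

With the Apify Python client, the summary itself can likely be fetched via client.key_value_store(run["defaultKeyValueStoreId"]).get_record("OUTPUT_SUMMARY") and passed in as the record's value.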

Performance Tips

  • Use outputMode: "compact" for most API and dataset workflows.
  • Set includePlaybackUrl: false when you do not need direct signed playback URLs.
  • Keep skipProfileScrape: true for faster user runs.
  • Set skipProfileScrape: false only when you need richer account metadata.
  • Use smaller maxVideosPerUser and maxVideosPerKeyword for frequent monitoring.
  • Use continuation for repeated monitoring runs.
  • Increase ytdlpTimeoutSec when transcript extraction is more important than speed.

For high-volume user monitoring where speed and lower compute usage matter more than transcript extraction, use compact rows and disable transcript/media enrichment:

{
  "workflow": "users",
  "users": ["tiktok", "netflix", "nba"],
  "maxVideosPerUser": 80,
  "maxDatasetItems": 1000,
  "outputMode": "compact",
  "includePlaybackUrl": false,
  "includeTranscriptText": false,
  "includeMediaLinksMeta": false,
  "skipProfileScrape": true,
  "ytdlpChunkSize": 80,
  "ytdlpMaxRawScan": 80,
  "ytdlpDetailEnrichLimit": 0,
  "ytdlpTranscriptDetailEnrichLimit": 0,
  "ytdlpTranscriptSubtitleDownloadLimit": 0
}

API Usage

Python

from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient("YOUR_APIFY_TOKEN")

run_input = {
    "workflow": "users",
    "users": ["tiktok", "khaby.lame"],
    "maxVideosPerUser": 5,
    "skipProfileScrape": False,
    "outputMode": "compact",
    "includePlaybackUrl": False,
}

# Start the Actor and wait for the run to finish.
run = client.actor("thescrapelab/tiktok-scraper-2-0").call(run_input=run_input)

# Iterate over the video rows in the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["video_url"], item.get("play_count"), item.get("metadata_quality"))

HTTP

curl -sS -X POST \
  "https://api.apify.com/v2/acts/thescrapelab~tiktok-scraper-2-0/run-sync-get-dataset-items?format=json&clean=true" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "workflow": "keywords",
    "keywords": ["funny"],
    "maxVideosPerKeyword": 5,
    "outputMode": "compact",
    "includePlaybackUrl": false
  }'

Notes And Limitations

  • TikTok changes page behavior often, and some runs can encounter challenge or login pages.
  • Keyword scraping uses both search and hashtag discovery paths, but availability can vary by region and timing.
  • Transcript tracks are only returned when exposed by TikTok metadata or fallback sources.
  • Caption fallback is not a true spoken transcript.
  • Direct media URLs can be signed and expire quickly.
  • Public follower/following lists are not reliably available from TikTok public web pages.