Tiktok Scraper 2.0
Pricing
from $2.00 / 1,000 results
Scrape TikTok profiles, user videos, keyword results, video metrics, media links, hashtags, account metadata, and transcript or caption text into clean Apify datasets. Built for monitoring accounts, tracking TikTok trends, and feeding structured TikTok data into analytics workflows.
Developer
Inus Grobler
Maintained by Community
Actor stats
- Bookmarked: 1
- Total users: 27
- Monthly active users: 6
- Last modified: 8 days ago
Scrape TikTok profiles, user videos, keyword search results, video metrics, media links, and transcript text into clean Apify dataset rows.
This Actor is designed for monitoring TikTok accounts, collecting public video analytics, tracking keyword trends, and feeding structured TikTok data into dashboards, warehouses, or enrichment pipelines.
Features
- Scrape videos from TikTok usernames
- Scrape videos from TikTok keyword queries
- Collect profile metadata for user runs when enabled
- Extract video metrics including plays, likes, comments, shares, saves, duration, hashtags, and music title
- Extract subtitle transcript text when exposed by TikTok metadata
- Fall back to cleaned caption text when no transcript track is available
- Return compact pipeline-friendly rows by default
- Label row quality with metadata_quality and missing_fields
- Store continuation checkpoints between runs to skip already-seen videos
- Write run diagnostics to OUTPUT_SUMMARY
Workflows
Users
Use workflow: "users" to scrape one or more TikTok accounts.
{
  "workflow": "users",
  "users": ["tiktok", "khaby.lame"],
  "maxVideosPerUser": 10,
  "skipProfileScrape": false,
  "outputMode": "compact"
}
Keywords
Use workflow: "keywords" to scrape videos discovered from TikTok search and hashtag pages.
{
  "workflow": "keywords",
  "keywords": ["funny", "dog training"],
  "maxVideosPerKeyword": 10,
  "outputMode": "compact"
}
Input Reference
| Field | Type | Default | Description |
|---|---|---|---|
| workflow | string | users | Required. Use users or keywords. |
| users | array[string] | - | TikTok usernames, @handles, or profile URLs. Required for users workflow. Max 100. |
| keywords | array[string] | - | Keyword queries. Required for keywords workflow. Max 100. |
| maxVideosPerUser | integer | 10 | Maximum videos to collect per user. |
| maxVideosPerKeyword | integer | 10 | Maximum videos to collect per keyword. |
| engine | string | auto | Primary browser engine: auto, playwright, or pydoll. |
| outputMode | string | compact | compact for smaller records, full for more technical fields. |
| maxDatasetItems | integer | 5000 | Maximum dataset rows pushed for the run. Use 0 for no additional cap. |
| skipProfileScrape | boolean | true | Faster users workflow. Set to false for richer account metadata. |
| includePlaybackUrl | boolean | true | Include direct playback URLs. These are often signed and short-lived. |
| includeTranscriptText | boolean | true | Include transcript or caption fallback text in rows. |
| includeMediaLinksMeta | boolean | true | Include expiry/signature metadata for media URLs. |
| enableApifyContinuation | boolean | true | Load checkpoint state from the named key-value store. |
| resetApifyContinuation | boolean | false | Ignore previous checkpoint state and replace it after the run. |
| continuationStoreName | string | tiktok-scraper-2-0-continuation | Named key-value store for checkpoints. Normalized to lowercase letters, digits, and hyphens. |
| continuationStateKey | string | CONTINUATION_STATE | Key used inside the continuation store. |
| minDelaySec | number | 1.1 | Minimum randomized browser delay. |
| maxDelaySec | number | 2.4 | Maximum randomized browser delay. |
| ytdlpTimeoutSec | integer | 240 | Timeout for metadata/transcript fallback calls. |
| ytdlpChunkSize | integer | 40 | Profile playlist items requested per yt-dlp chunk. Larger values can improve high-volume user runs. |
| ytdlpMaxRawScan | integer | 200 | Maximum raw playlist entries scanned per user. Use 0 for no additional scan cap. |
| ytdlpMaxVideosPerUser | integer | 0 | Optional yt-dlp-specific cap per user. Use 0 to follow maxVideosPerUser. |
| ytdlpDetailEnrichLimit | integer | 10 | Maximum per-video detail enrichments per user. Lower values are faster but may reduce metadata depth. |
| ytdlpTranscriptDetailEnrichLimit | integer | 80 | Maximum per-video detail enrichments used for transcript discovery. |
| ytdlpTranscriptSubtitleDownloadLimit | integer | 8 | Maximum subtitle-download transcript fallback attempts per user. |
| disableScraplingFallback | boolean | false | Disable the secondary fallback path. |
Provide either users or keywords, not both.
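The rules in the table (one workflow per run, at most 100 entries, a per-workflow video limit) can be checked on the client side before starting a run. The following is a minimal sketch, not part of the Actor: `build_run_input` and `MAX_ENTITIES` are hypothetical names, and rejecting an empty list is an assumption based on the fields being marked as required.

```python
from typing import Any

MAX_ENTITIES = 100  # "Max 100" per the Input Reference table


def build_run_input(workflow: str, entities: list[str],
                    max_videos: int = 10, **extra: Any) -> dict[str, Any]:
    """Build an input dict for one workflow, enforcing the documented rules."""
    if workflow not in ("users", "keywords"):
        raise ValueError("workflow must be 'users' or 'keywords'")
    if not entities:
        # Assumption: an empty list is treated as a missing required field.
        raise ValueError(f"the {workflow} workflow needs at least one entry")
    if len(entities) > MAX_ENTITIES:
        raise ValueError(f"at most {MAX_ENTITIES} {workflow} per run")
    # Only one of users/keywords may be present, so reject the other key.
    other = "keywords" if workflow == "users" else "users"
    if other in extra:
        raise ValueError("provide either users or keywords, not both")
    limit_key = "maxVideosPerUser" if workflow == "users" else "maxVideosPerKeyword"
    # The workflow name doubles as the name of the list field.
    return {"workflow": workflow, workflow: list(entities),
            limit_key: max_videos, **extra}


run_input = build_run_input("users", ["tiktok", "khaby.lame"],
                            max_videos=5, outputMode="compact")
```

Any other field from the table can be passed as a keyword argument and is copied into the input unchanged.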
Continuation
Continuation is enabled by default and helps avoid returning the same videos across repeated runs.
The Actor stores checkpoint state in the named key-value store configured by continuationStoreName, under the key configured by continuationStateKey.
For each user or keyword, continuation stores:
- latest known from_epoch
- latest continuation_token
- recent video_ids for duplicate suppression
- last scrape timestamp
To force a fresh run, set:
{"resetApifyContinuation": true}
You can also provide a one-off lower bound directly in an item:
tiktok|from=2026-01-01T00:00:00Z
khaby.lame|from=1778336344
funny|from=2026-01-01
An explicit from value in the input takes priority over stored continuation state.
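When the lower bound comes from a scheduler or a database timestamp, the item string can be built programmatically. This is a small sketch that produces the ISO 8601 UTC form used in the examples above; `with_lower_bound` is a hypothetical helper name, and treating naive datetimes as UTC is an assumption of the sketch, not documented Actor behavior.

```python
from datetime import datetime, timezone


def with_lower_bound(entity: str, since: datetime) -> str:
    """Append a |from=<ISO 8601 UTC> lower bound to a username or keyword."""
    if since.tzinfo is None:
        # Assumption: treat naive datetimes as UTC rather than guessing a local zone.
        since = since.replace(tzinfo=timezone.utc)
    stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{entity}|from={stamp}"


# Mix bounded and unbounded entries in the same users list.
users = [
    with_lower_bound("tiktok", datetime(2026, 1, 1, tzinfo=timezone.utc)),
    "khaby.lame",
]
print(users)  # ['tiktok|from=2026-01-01T00:00:00Z', 'khaby.lame']
```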
Dataset Output
Each dataset item is a video row. Compact rows start with:
- video_id
- video_url
- caption_text
- transcript
Common fields include:
- create_time and create_time_epoch
- author_username and author_id
- hashtags
- play_count, like_count, comment_count, share_count, save_count
- duration_s
- music_title
- media_links
- transcript_detail
- has_transcript
- has_caption_text_fallback
- metadata_quality
- missing_fields
- account for user workflow rows
- search_keyword for keyword workflow rows
Transcript Fields
has_transcript is true when subtitle/caption metadata was exposed and parsed.
has_caption_text_fallback is true when no transcript track was available and the Actor used cleaned caption text instead.
transcript_detail.status may be:
- found
- fallback_description
- missing
- disabled
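Because caption fallback is not a spoken transcript, downstream code may want to label where each row's text came from. The sketch below relies only on the two boolean flags described above; `transcript_source` is a hypothetical helper and the sample rows are made up for illustration.

```python
from collections import Counter


def transcript_source(row: dict) -> str:
    """Classify a dataset row by the origin of its transcript text."""
    if row.get("has_transcript"):
        return "transcript"
    if row.get("has_caption_text_fallback"):
        return "caption_fallback"
    return "none"


# Made-up rows; real rows come from the run's dataset.
sample_rows = [
    {"video_id": "1", "has_transcript": True, "has_caption_text_fallback": False},
    {"video_id": "2", "has_transcript": False, "has_caption_text_fallback": True},
    {"video_id": "3", "has_transcript": False, "has_caption_text_fallback": False},
]
source_counts = Counter(transcript_source(row) for row in sample_rows)
```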
Metadata Quality
metadata_quality helps downstream systems decide whether a row is complete enough for analytics:
- full: core metadata is present
- partial: usable video row with some missing fields
- url_only: mostly just a discovered video URL
missing_fields lists important absent fields such as create_time_epoch, author_id, play_count, or duration_s.
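One way a pipeline might use these two fields is to keep a row only when it is more than a bare URL and none of the fields a given analysis needs are missing. This is a sketch under the assumption that missing_fields is a list of field-name strings; `usable_for_analytics` and the sample rows are hypothetical.

```python
def usable_for_analytics(row: dict,
                         required: tuple[str, ...] = ("play_count", "create_time_epoch")) -> bool:
    """Return True when the row carries every field this analysis depends on."""
    if row.get("metadata_quality") == "url_only":
        return False
    # Assumption: missing_fields is a list of field names (possibly absent or empty).
    missing = set(row.get("missing_fields") or [])
    return not missing.intersection(required)


# Made-up rows for illustration.
rows = [
    {"video_id": "1", "metadata_quality": "full", "missing_fields": []},
    {"video_id": "2", "metadata_quality": "partial", "missing_fields": ["author_id"]},
    {"video_id": "3", "metadata_quality": "partial", "missing_fields": ["play_count"]},
    {"video_id": "4", "metadata_quality": "url_only", "missing_fields": ["play_count", "duration_s"]},
]
kept_ids = [row["video_id"] for row in rows if usable_for_analytics(row)]
```

Passing a different `required` tuple lets each consumer set its own completeness bar without discarding partial rows globally.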
Run Summary
Every run writes OUTPUT_SUMMARY to the default key-value store.
Useful fields include:
- status
- videos_scraped
- videos_with_transcript
- videos_with_caption_text_fallback
- videos_with_any_transcript_text
- metadata_quality_counts
- health
- continuation_loaded_entities
- continuation_loaded_id_entities
- continuation_updates
- dataset_items_pushed
The health object includes:
- total_rows
- sparse_row_count
- empty_entity_count
- seed_count
- empty_seed_count
- challenge_seed_count
- success_rate
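A monitoring job could turn the summary into a short list of warnings. The sketch below assumes the count fields are integers; the README does not state their exact types, so treat the checks as illustrative. `summary_warnings` is a hypothetical helper and the sample summary is made up.

```python
def summary_warnings(summary: dict) -> list[str]:
    """Return human-readable warnings derived from an OUTPUT_SUMMARY dict."""
    warnings = []
    health = summary.get("health") or {}
    # Assumption: count fields are integers; missing or null values count as 0.
    if (summary.get("dataset_items_pushed") or 0) == 0:
        warnings.append("no rows were pushed to the dataset")
    if (health.get("challenge_seed_count") or 0) > 0:
        warnings.append("some seeds hit a challenge page")
    if (health.get("empty_seed_count") or 0) > 0:
        warnings.append("some seeds returned no videos")
    return warnings


# Made-up summary for illustration.
example_summary = {
    "status": "ok",
    "dataset_items_pushed": 12,
    "health": {"seed_count": 3, "empty_seed_count": 1, "challenge_seed_count": 0},
}
found = summary_warnings(example_summary)
```

With the Python client, the summary would typically be read from the run's default key-value store, for example via `client.key_value_store(run["defaultKeyValueStoreId"]).get_record("OUTPUT_SUMMARY")`, whose result holds the stored object under `value`.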
Performance Tips
- Use outputMode: "compact" for most API and dataset workflows.
- Set includePlaybackUrl: false when you do not need direct signed playback URLs.
- Keep skipProfileScrape: true for faster user runs.
- Set skipProfileScrape: false only when you need richer account metadata.
- Use smaller maxVideosPerUser and maxVideosPerKeyword for frequent monitoring.
- Use continuation for repeated monitoring runs.
- Increase ytdlpTimeoutSec when transcript extraction is more important than speed.
For high-volume user monitoring where speed and lower compute usage matter more than transcript extraction, use compact rows and disable transcript/media enrichment:
{
  "workflow": "users",
  "users": ["tiktok", "netflix", "nba"],
  "maxVideosPerUser": 80,
  "maxDatasetItems": 1000,
  "outputMode": "compact",
  "includePlaybackUrl": false,
  "includeTranscriptText": false,
  "includeMediaLinksMeta": false,
  "skipProfileScrape": true,
  "ytdlpChunkSize": 80,
  "ytdlpMaxRawScan": 80,
  "ytdlpDetailEnrichLimit": 0,
  "ytdlpTranscriptDetailEnrichLimit": 0,
  "ytdlpTranscriptSubtitleDownloadLimit": 0
}
API Usage
Python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run_input = {
    "workflow": "users",
    "users": ["tiktok", "khaby.lame"],
    "maxVideosPerUser": 5,
    "skipProfileScrape": False,
    "outputMode": "compact",
    "includePlaybackUrl": False,
}

run = client.actor("thescrapelab/tiktok-scraper-2-0").call(run_input=run_input)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["video_url"], item.get("play_count"), item.get("metadata_quality"))
HTTP
curl -sS -X POST \
  "https://api.apify.com/v2/acts/thescrapelab~tiktok-scraper-2-0/run-sync-get-dataset-items?format=json&clean=true" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "workflow": "keywords",
    "keywords": ["funny"],
    "maxVideosPerKeyword": 5,
    "outputMode": "compact",
    "includePlaybackUrl": false
  }'
Notes And Limitations
- TikTok changes page behavior often, and some runs can encounter challenge or login pages.
- Keyword scraping uses both search and hashtag discovery paths, but availability can vary by region and timing.
- Transcript tracks are only returned when exposed by TikTok metadata or fallback sources.
- Caption fallback is not a true spoken transcript.
- Direct media URLs can be signed and expire quickly.
- Public follower/following lists are not reliably available from TikTok public web pages.
