TikTok Video Text Extractor
Pricing
from $0.90 / 1,000 videos
Extract all visible on-screen text from public TikTok videos — overlays, captions, stickers, hashtags, and watermarks — with timestamps, type, position, and language. Use for content intelligence, brand monitoring, creator analytics, and turning short-form video text into searchable structured data.
Developer
rainminer
Maintained by Community
The TikTok Video Text Extractor is an Apify Actor that reads public TikTok videos and returns every piece of visible on-screen text as structured data. Creators and brands overlay text, captions, stickers, hashtags, and watermarks directly onto their videos — this Actor unlocks that content for search, analysis, and database ingestion without manual review.
Each video is processed by AI vision and the results are returned as a flat, structured dataset of text segments — each with its timestamp, type, screen position, language, and confidence rating.
Key Features
- Full on-screen text extraction: Captures animated overlays, editor captions, TikTok stickers, watermarks, hashtags, and @mentions.
- Timestamp-aware: Records the MM:SS timestamp when each text element first appears.
- Type classification: Distinguishes overlay, caption, sticker, watermark, hashtag, mention, and other text types.
- Multilingual: Detects the ISO 639-1 language code for each segment from the text itself.
- Position detection: Classifies text placement as top, center, or bottom of screen.
- Confidence rating: High/medium/low rating based on text clarity in the video frame.
- No login required: Works with any public TikTok video.
Why Extract Text from TikTok Videos?
TikTok is a primary publishing surface for businesses, creators, and brands. Product drops, event announcements, hiring notices, and pricing updates are routinely shared only as video overlays — never structured or indexed. This Actor makes that content machine-readable for:
- Content intelligence and brand monitoring: tracking what competitors publish on TikTok.
- Food and hospitality: capturing daily specials and seasonal menus announced via video.
- Event aggregators: extracting event names, dates, and lineup text from promotional videos.
- Retail and e-commerce: indexing product drops, discount codes, and launch dates.
- Market research: tracking pricing, offers, and messaging trends across accounts.
- Accessibility tools: converting visual video text to readable formats.
Who Is It For?
- Marketing and analytics teams monitoring brand or competitor TikTok content at scale.
- Product and data teams building structured datasets from TikTok video content.
- Developers integrating TikTok text extraction into discovery or monitoring pipelines.
- Researchers studying visual communication trends in short-form video.
Input Schema
    {
      "videoUrls": ["https://www.tiktok.com/@tiktok/video/7018846970735028738"],
      "maxItems": 10
    }
videoUrls is required. All other fields are optional.
| Field | Type | Default | Description |
|---|---|---|---|
| videoUrls | Array of strings | — | Public TikTok video URLs (@user/video/..., vm.tiktok.com/...) |
| maxItems | Integer | 10 | Maximum number of videos to process in a single run |
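As a rough illustration of the input rules above, the following Python sketch builds a run input and drops URLs that do not look like TikTok video links. The two regular expressions are assumptions inferred only from the URL shapes shown in the table; the Actor's actual validation may accept or reject different forms. The helper names `looks_like_tiktok_video_url` and `build_run_input` are hypothetical.

```python
import re
from typing import Any

# Assumed URL shapes, inferred from the examples in the input table.
# The Actor's real acceptance rules are not documented here and may differ.
_LONG_URL = re.compile(
    r"^https?://(?:www\.)?tiktok\.com/@[\w.\-]+/video/\d+/?(?:\?.*)?$"
)
_SHORT_URL = re.compile(r"^https?://vm\.tiktok\.com/[A-Za-z0-9]+/?$")


def looks_like_tiktok_video_url(url: str) -> bool:
    """Return True if the URL matches one of the assumed patterns."""
    return bool(_LONG_URL.match(url) or _SHORT_URL.match(url))


def build_run_input(video_urls: list[str], max_items: int = 10) -> dict[str, Any]:
    """Build the input object: videoUrls is required, maxItems defaults to 10."""
    urls = [u.strip() for u in video_urls if looks_like_tiktok_video_url(u.strip())]
    if not urls:
        raise ValueError("videoUrls must contain at least one TikTok video URL")
    if max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    return {"videoUrls": urls, "maxItems": max_items}
```

The resulting dictionary has the same shape as the JSON example above and could be passed as the run input through whichever Apify client or API call you use to start the Actor.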
Output Schema
Each dataset item represents one video and all the on-screen text found in it:
    {
      "videoUrl": "https://www.tiktok.com/@tiktok/video/7018846970735028738",
      "videoId": "7018846970735028738",
      "duration": 42.5,
      "textSegments": [
        {
          "text": "Summer sale — 50% off today only",
          "timestamp": "00:02",
          "type": "overlay",
          "position": "center",
          "language": "en",
          "confidence": "high"
        }
      ],
      "scrapedAt": "2026-06-01T08:15:42.000Z"
    }
| Field | Description |
|---|---|
| videoUrl | Normalized canonical URL of the video |
| videoId | TikTok numeric video ID or short-link identifier |
| duration | Video length in seconds (from the downloaded file) |
| textSegments | Array of all on-screen text elements found |
| textSegments[].text | The visible text content as it appears on screen |
| textSegments[].timestamp | MM:SS when the text first appears; null if indeterminate |
| textSegments[].type | One of overlay, caption, sticker, watermark, hashtag, mention, other |
| textSegments[].position | One of top, center, bottom (vertical screen position); null if the text moves |
| textSegments[].language | ISO 639-1 language code detected from the text, e.g. "en", "es"; null if indeterminate |
| textSegments[].confidence | One of high, medium, low; extraction confidence based on text clarity |
| scrapedAt | ISO timestamp of when this video was processed |
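Because each dataset item nests its segments, downstream tools that expect one row per text element may need a flattening step. The Python sketch below shows one way to do that and to convert the MM:SS timestamp into seconds. The function names and the `seconds` column are illustrative choices, not part of the Actor's output.

```python
from typing import Any, Optional


def timestamp_to_seconds(ts: Optional[str]) -> Optional[int]:
    """Convert an "MM:SS" string to whole seconds; None stays None."""
    if ts is None:
        return None
    minutes, seconds = ts.split(":")
    return int(minutes) * 60 + int(seconds)


def flatten_item(item: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn one per-video dataset item into one row per text segment."""
    rows = []
    for seg in item.get("textSegments", []):
        rows.append({
            "videoId": item["videoId"],
            "videoUrl": item["videoUrl"],
            # "seconds" is an added convenience column, not an Actor field.
            "seconds": timestamp_to_seconds(seg.get("timestamp")),
            "text": seg["text"],
            "type": seg.get("type"),
            "position": seg.get("position"),
            "language": seg.get("language"),
            "confidence": seg.get("confidence"),
        })
    return rows
```

Fields documented as nullable (timestamp, position, language) are read with `.get()` so a missing or null value becomes `None` in the row rather than raising an error.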
How It Works
- Validate inputs — each URL is checked against accepted TikTok video URL patterns and normalized to a canonical form.
- Download (retriable) — each video is fetched via a dedicated crawler step using Apify Proxy. Failed downloads are retried automatically with a different proxy (up to 5 attempts).
- Process — the downloaded file is analyzed by AI vision to extract all visible on-screen text in a single pass.
- Structured output — each text segment is classified by type, position, language, and confidence.
- Push to dataset — one dataset row is pushed per video containing all its text segments.
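The retriable download step can be pictured roughly as the loop below. This is a generic sketch and not the Actor's source code: the `fetch` callable, the proxy list, and the round-robin rotation are all assumptions made for illustration. Only the limit of 5 attempts and the skip-with-warning behavior come from the description in this document.

```python
from typing import Callable, Optional, Sequence


def download_with_retries(
    url: str,
    fetch: Callable[[str, str], bytes],  # hypothetical: fetch(url, proxy) -> bytes
    proxies: Sequence[str],
    max_attempts: int = 5,
) -> Optional[bytes]:
    """Try the download up to max_attempts times, changing proxy each time."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        # Assumed rotation strategy: cycle through the proxy list in order.
        proxy = proxies[attempt % len(proxies)]
        try:
            return fetch(url, proxy)
        except Exception as exc:  # any failure triggers the next attempt
            last_error = exc
    # All attempts failed: warn and skip this video.
    print(f"warning: skipping {url} after {max_attempts} attempts: {last_error}")
    return None
```

Returning `None` instead of raising lets a caller continue with the remaining videos, which mirrors the skip-with-warning behavior described under Notes and Limitations.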
Notes and Limitations
- Public videos only: Private accounts and videos that require login to view are not supported.
- Video availability: Deleted or expired videos will fail to download and are skipped with a warning.
- OCR accuracy: Fast-moving, small-font, or low-contrast text may yield lower-confidence extractions.
- Rate limiting: TikTok may rate-limit requests at high volume. Reduce concurrency or add delays between runs if you encounter failures.
- Video size: Very large videos are skipped automatically.
- Audio not included: Spoken content is intentionally excluded — only text visually rendered on screen is extracted.
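Given the OCR accuracy caveat above, a consumer may want to discard weaker extractions before indexing. The Python sketch below filters segments by the confidence field. The ordering low < medium < high and the default threshold of "medium" are assumptions chosen for illustration, and `keep_confident` is a hypothetical helper name.

```python
from typing import Any

# Assumed ordering of the three documented confidence labels.
_RANK = {"low": 0, "medium": 1, "high": 2}


def keep_confident(
    segments: list[dict[str, Any]], minimum: str = "medium"
) -> list[dict[str, Any]]:
    """Keep segments whose confidence is at or above the given label."""
    threshold = _RANK[minimum]
    return [
        seg for seg in segments
        # Unknown or missing labels rank below "low" and are dropped.
        if _RANK.get(seg.get("confidence"), -1) >= threshold
    ]
```

Whether to drop low-confidence segments or keep them flagged for manual review depends on the use case; a brand-monitoring search index might keep everything, while automated discount-code extraction might require "high".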