Douyin Transcript Scraper | AI Speech-to-Text (抖音)
Pricing
$25.00 / 1,000 transcript minutes
Turn any Douyin (抖音) video into text. Real AI speech recognition with best-in-class Mandarin Chinese accuracy — works on videos with no captions. Full text + timestamped sentences as clean JSON.
Developer
Jackie Chen
Maintained by Community
Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 20 hours ago
Douyin Transcript Scraper — AI Speech-to-Text for 抖音 Videos
Turn any Douyin (抖音) video into text with real AI speech recognition. Paste video links, get back the full spoken transcript plus timestamped sentences as clean JSON — with best-in-class Mandarin Chinese accuracy, ready for LLMs, RAG pipelines, content research, and subtitle workflows.
This is real ASR, not caption scraping. Most "transcript" tools only download captions a creator happened to upload — and Douyin videos almost never have them. This Actor runs industrial speech recognition on the video's audio track, so it works on any Douyin video with speech, captions or not. Chinese (Mandarin) recognition is its strongest language.
Unofficial. This Actor is not affiliated with, authorized, or endorsed by Douyin / ByteDance. It is an independent tool that processes publicly available content. Use it in compliance with Douyin's terms and all applicable laws; you are responsible for how you use the retrieved data.
What you get per video
- Full transcript (fullText): the complete spoken content as one string.
- Timestamped sentences (sentences[]): each with startMs / endMs, ready for subtitles, deep links, and clip selection.
- Video metadata: author, caption, duration, publish time, play / like / comment / share / collect counts, and cover image URL, so every transcript arrives with its engagement context attached.
Quick start
- Open the Actor and press Run — the default input works out of the box.
- Replace the example with your own video links: full URLs (https://www.douyin.com/video/123…), short share links (https://v.douyin.com/XXXX/), pasted share text, or bare video IDs.
- Language defaults to Chinese; switch to Auto for mixed or non-Chinese content.
- Collect results from the Dataset tab as JSON / CSV / Excel, or pull them via the Apify API and MCP from your own code.
No proxies, no cookies, no login — everything runs server-side.
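If you want to deduplicate or sanity-check a link list before a run, the input shapes listed above can be told apart with a few regular expressions. This is a minimal sketch, not the Actor's own parser: the patterns, the 15-digit minimum for bare IDs, and the function name `classify_input` are all assumptions made for illustration.

```python
import re

# Assumed patterns, inferred only from the link shapes listed in Quick start.
FULL_URL = re.compile(r"douyin\.com/video/(\d+)")
SHORT_URL = re.compile(r"https?://v\.douyin\.com/[A-Za-z0-9_-]+/?")
BARE_ID = re.compile(r"\d{15,}")  # assumption: video IDs are long digit strings


def classify_input(raw: str) -> tuple[str, str]:
    """Return (kind, value) for one pasted input line (hypothetical helper)."""
    text = raw.strip()
    match = FULL_URL.search(text)
    if match:
        return ("video_id", match.group(1))
    match = SHORT_URL.search(text)
    if match:
        # Works for pasted share text too, since the link is searched for
        # anywhere in the string. Resolving the short link is not done here.
        return ("short_link", match.group(0))
    if BARE_ID.fullmatch(text):
        return ("video_id", text)
    return ("unknown", text)
```

Inputs that come back as `unknown` are worth reviewing by hand before you submit them; the Actor itself may still accept forms this sketch does not recognize.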
Example output
{
  "videoId": "7641539662222270120",
  "videoUrl": "https://www.douyin.com/video/7641539662222270120",
  "description": "东北街头 12元红烧大肘子盖饭,夯爆了~ #地方特色美食 #路边摊美味",
  "author": "jiexiaomeishi",
  "authorName": "街边小美食",
  "durationSec": 67.3,
  "playCount": 1240000,
  "diggCount": 98000,
  "language": "zh",
  "detectedSpeech": true,
  "sentenceCount": 31,
  "fullText": "家人们今天来到东北街头,这家红烧大肘子盖饭只要十二块…",
  "sentences": [
    { "text": "家人们今天来到东北街头,", "startMs": 320, "endMs": 2480 },
    { "text": "这家红烧大肘子盖饭只要十二块。", "startMs": 2480, "endMs": 5100 }
  ]
}
What people build with it
- Chinese-market research — index what 抖音 creators actually say (not just captions) to understand trends, slang, and selling points in your category.
- Viral-hook mining — pull transcripts of the top videos in your niche and study the exact opening lines and structures that earn views on Douyin.
- Cross-border content — transcribe Chinese videos, then translate and repurpose them for TikTok, YouTube, or your own market.
- LLM & RAG pipelines — build Chinese-language corpora from real short-video speech, with engagement scores as a free quality signal.
- E-commerce & livestream research — capture how top sellers pitch products in 带货 clips, verbatim.
- Subtitles & translation — timestamped sentences drop straight into SRT/VTT generation and dubbing workflows.
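For the subtitle use case, here is a minimal sketch of turning a record's `sentences` array into SRT text. It assumes each sentence carries `text`, `startMs`, and `endMs` as in the example output; the helper names are made up for illustration.

```python
def ms_to_srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def sentences_to_srt(sentences: list[dict]) -> str:
    """Build SRT cues, numbered from 1, one cue per sentence."""
    blocks = []
    for index, sentence in enumerate(sentences, start=1):
        start = ms_to_srt_time(sentence["startMs"])
        end = ms_to_srt_time(sentence["endMs"])
        blocks.append(f"{index}\n{start} --> {end}\n{sentence['text']}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

WebVTT differs mainly in its `WEBVTT` header line and in using a period instead of a comma before the milliseconds, so the same loop adapts with small changes.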
Pricing & billing
Pay per audio minute: $0.025/min, with every started minute billed as a full minute. A 0:50 clip bills 1 minute; a 1:10 clip bills 2 minutes. Timestamped sentences are included free, with no separate subtitle or export charge. There is no per-run start fee and no separate compute or platform fee: the price you see is the price you pay.
Failed fetches, deleted videos, image posts, and failed transcriptions are not charged — you only pay for audio we actually transcribe.
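To budget a batch, the per-minute rounding described above can be sketched as follows. This is an estimate, not the Actor's billing code: treating a clip of exactly 60 seconds as 1 minute and a zero-length clip as free are assumptions, and the function names are illustrative.

```python
import math
from decimal import Decimal

PRICE_PER_MINUTE = Decimal("0.025")  # USD per audio minute, from the pricing above


def billed_minutes(duration_sec: float) -> int:
    """Round up so that any started minute counts as a full minute."""
    if duration_sec <= 0:
        return 0  # assumption: nothing to transcribe, nothing billed
    return math.ceil(duration_sec / 60)


def estimate_cost(durations_sec: list[float]) -> Decimal:
    """Sum billed minutes per video, then apply the per-minute price."""
    total_minutes = sum(billed_minutes(d) for d in durations_sec)
    return PRICE_PER_MINUTE * total_minutes
```

Rounding happens per video, so many short clips cost more than one long clip with the same total audio: ten 10-second clips bill 10 minutes, while a single 100-second clip bills 2.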
Why this Actor
- Best-in-class Chinese ASR — Mandarin recognition is its strongest language, where Western transcript tools struggle most.
- Works without captions — real speech recognition on the audio track; Douyin videos rarely carry captions to scrape.
- Engagement context included — every transcript ships with play / like / comment / share / collect counts, so you can rank by performance immediately.
- Direct API, no headless browser — fast, stable runs with nothing to babysit.
- No login, no cookies — we never touch your accounts, so there's no ban risk.
- Structured JSON — export to CSV, Excel, or JSON, or pull straight from the API / MCP.
Tips for better results
- Feed it the winners: find the top-performing videos in your niche first (search / profile scrapers), then transcribe just those.
- Keep the language on Chinese for 抖音 content — it noticeably improves accuracy over Auto.
- Sentence timestamps let you deep-link to the exact second a phrase is spoken, handy for review and clip selection.
- detectedSpeech: false flags music-only / no-speech videos so you can filter them out downstream.
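The filtering and timestamp tips above could look like this in post-processing. It is a sketch that assumes dataset items shaped like the example output (`detectedSpeech`, `playCount`, `sentences`); both helper names are hypothetical.

```python
from typing import Optional


def rank_spoken_videos(items: list[dict], top_n: int = 10) -> list[dict]:
    """Drop no-speech videos, then return the most-played ones first."""
    spoken = [item for item in items if item.get("detectedSpeech")]
    spoken.sort(key=lambda item: item.get("playCount", 0), reverse=True)
    return spoken[:top_n]


def find_phrase_ms(item: dict, phrase: str) -> Optional[int]:
    """Return startMs of the first sentence containing the phrase, if any."""
    for sentence in item.get("sentences", []):
        if phrase in sentence["text"]:
            return sentence["startMs"]
    return None
```

`find_phrase_ms` only matches a phrase that falls inside a single sentence; a phrase that spans two sentences would need a search over `fullText` instead.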
FAQ
Do I need an account, cookies, or to log in anywhere? No. The Actor talks to fast, direct HTTP APIs server-side — you just provide video links and run it.
Does it work on videos without captions? Yes — that's the point. It runs real speech recognition on the audio, so captions are never required (and Douyin videos rarely have them).
How good is the Chinese accuracy? Mandarin is the model's strongest language; it handles fast colloquial speech, regional accents, and product pitches well, and returns punctuated sentences.
How am I billed? Per audio minute, at $0.025/min, with each video rounded up to the next whole minute. Videos that can't be fetched or transcribed are not charged.
Can I run it on a schedule or call it from my app? Yes — use Apify Schedules, the REST API, the JavaScript / Python clients, or the MCP server. See the API tab.
Is this affiliated with Douyin? No. It's an independent tool that processes publicly available content. Use it in line with the platform's terms and applicable law.
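For the question about calling the Actor from your own app, a rough sketch with the official `apify-client` Python package might look like the following. The input field names (`videoUrls`, `language`) and the `actor_id` value you pass in are placeholders, not confirmed by this page; check the Actor's Input and API tabs for the real schema and ID. The token is assumed to be in an `APIFY_TOKEN` environment variable.

```python
import os


def build_run_input(video_links: list[str], language: str = "zh") -> dict:
    # Hypothetical field names; replace them with the Actor's real input schema.
    return {"videoUrls": video_links, "language": language}


def fetch_transcripts(actor_id: str, video_links: list[str]) -> list[dict]:
    """Start a run, wait for it to finish, and return its dataset items."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor(actor_id).call(run_input=build_run_input(video_links))
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

`call` blocks until the run finishes, which is convenient for small batches; for large batches or scheduled jobs, starting the run and reading the dataset later is usually the better fit.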