Pricing

from $3.00 / 1,000 results

Twitter / X Video Transcript Scraper

Extract transcripts from Twitter/X video posts. Returns timestamped segments from native Twitter captions (WebVTT), with automatic Whisper AI fallback for uncaptioned videos.

Developer

Crawler Bros

Last modified: 2 months ago

Features

Native captions first — intercepts Twitter's built-in WebVTT subtitle tracks for fastest, most accurate results
Whisper AI fallback — uses faster-whisper to transcribe audio when no native captions are available
Timestamped segments — every output row includes startTime, endTime, and text for precise video navigation
Full transcript — each row also carries the complete joined transcript for easy search
Flexible method control — choose auto (native → Whisper), native only, or Whisper only
Multi-language support — native captions in any language; optional language hint for Whisper
Anti-detection — Playwright Firefox with stealth fingerprinting, randomised viewports/user-agents, and human-like delays

Input

| Field | Type | Required | Description |
|---|---|---|---|
| `postUrls` | string[] | ✅ | Twitter/X video post URLs (`twitter.com` or `x.com` both accepted) |
| `cookies` | string | ✅ | Twitter/X session cookies JSON (`auth_token` + `ct0` required) |
| `transcriptionMethod` | select | | `auto` (default), `native`, or `whisper` |
| `whisperModel` | select | | `tiny`, `base` (default), `small`, `medium`, `large-v2` |
| `language` | string | | ISO 639-1 hint for Whisper (e.g. `en`, `es`, `fr`) |
| `proxyConfiguration` | object | | Apify proxy settings |
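
As an illustration of the input fields above, here is a minimal Python sketch that assembles and validates an input object before a run. The helper name `build_run_input` and its validation rules are assumptions made for this example, not part of the actor; only the field names and allowed values come from the table.

```python
from urllib.parse import urlparse

ALLOWED_HOSTS = {"twitter.com", "x.com"}
ALLOWED_METHODS = {"auto", "native", "whisper"}
ALLOWED_MODELS = {"tiny", "base", "small", "medium", "large-v2"}


def build_run_input(post_urls, cookies_json, method="auto", model="base", language=None):
    """Hypothetical helper: build an input dict using the documented field names."""
    if not post_urls:
        raise ValueError("postUrls must contain at least one URL")
    for url in post_urls:
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if host not in ALLOWED_HOSTS:
            raise ValueError(f"not a twitter.com or x.com URL: {url}")
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unknown transcriptionMethod: {method}")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unknown whisperModel: {model}")

    run_input = {
        "postUrls": list(post_urls),
        "cookies": cookies_json,          # JSON string, see the cookie section
        "transcriptionMethod": method,
        "whisperModel": model,
    }
    if language:
        run_input["language"] = language  # optional ISO 639-1 hint for Whisper
    return run_input
```

Optional fields are left out when unset, so the defaults listed in the table still apply.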

How to get Twitter cookies

1. Log in to x.com in your browser.
2. Open DevTools → Application → Cookies → https://x.com.
3. Copy the auth_token and ct0 cookie values.
4. Export all cookies as JSON (e.g. using the EditThisCookie browser extension).
5. Paste the JSON array into the cookies input field.

Cookies expire periodically — re-export if you see expired_cookies errors.
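
Before pasting the export, it can help to confirm that both required cookies are present. The sketch below assumes the export is a JSON array of objects with `name` and `value` keys, which is the shape EditThisCookie produces; the function name `missing_required_cookies` is hypothetical.

```python
import json

REQUIRED_COOKIES = ("auth_token", "ct0")


def missing_required_cookies(cookies_json):
    """Return the required cookie names that are absent or empty in the export."""
    cookies = json.loads(cookies_json)
    present = {
        c.get("name")
        for c in cookies
        if isinstance(c, dict) and c.get("value")  # ignore entries with no value
    }
    return [name for name in REQUIRED_COOKIES if name not in present]
```

An empty list means both `auth_token` and `ct0` were found. It does not tell you whether the session is still valid.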

Output

Each dataset row represents one transcript segment. Tweet metadata is repeated on every row for easy filtering.

| Field | Type | Description |
|---|---|---|
| `tweetUrl` | string | Canonical `x.com/…/status/…` URL |
| `tweetId` | string | Numeric tweet ID |
| `authorUsername` | string | Twitter handle (without `@`) |
| `authorName` | string | Display name |
| `tweetText` | string | Tweet caption / body text |
| `publishedAt` | string | ISO 8601 publish timestamp |
| `language` | string | ISO 639-1 language code |
| `transcriptMethod` | string | `native` or `whisper` |
| `transcriptAvailable` | boolean | `false` for tweets with no extractable transcript |
| `segmentIndex` | integer | 0-based position within the transcript |
| `startTime` | float | Segment start time in seconds |
| `endTime` | float | Segment end time in seconds |
| `text` | string | Segment transcript text |
| `fullTranscript` | string | All segments joined into one string |
| `scrapedAt` | string | ISO 8601 scrape timestamp |

Sample output record

{
  "tweetUrl": "https://x.com/NASA/status/1858131747319566780",
  "tweetId": "1858131747319566780",
  "authorUsername": "NASA",
  "authorName": "NASA",
  "tweetText": "Watch our latest discovery announcement…",
  "publishedAt": "2024-11-17T18:30:00.000Z",
  "language": "en",
  "transcriptMethod": "native",
  "transcriptAvailable": true,
  "segmentIndex": 0,
  "startTime": 0.0,
  "endTime": 3.44,
  "text": "We made a remarkable discovery this week",
  "fullTranscript": "We made a remarkable discovery this week that changes our understanding of the solar system.",
  "scrapedAt": "2025-01-15T10:22:33.456Z"
}
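
Because each row is a single segment, downstream code usually regroups rows by tweet. The following sketch does that with plain Python over a list of row dicts shaped like the record above. The function names are illustrative, and the `mm:ss` label format is an assumption chosen for readability.

```python
from collections import defaultdict


def format_timestamp(seconds):
    """Render a time in seconds as mm:ss (fractions of a second are dropped)."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def transcripts_by_tweet(rows):
    """Group segment rows by tweetId and return timestamped lines in order."""
    grouped = defaultdict(list)
    for row in rows:
        if row.get("transcriptAvailable"):  # skip tweets with no transcript
            grouped[row["tweetId"]].append(row)

    result = {}
    for tweet_id, segments in grouped.items():
        segments.sort(key=lambda r: r["segmentIndex"])
        result[tweet_id] = [
            f"[{format_timestamp(s['startTime'])}] {s['text']}" for s in segments
        ]
    return result
```

Sorting on `segmentIndex` keeps the lines in playback order even if the dataset is read out of order.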

Transcription Methods

| Method | When to use | Speed | Accuracy |
|---|---|---|---|
| `auto` | Default; tries native captions first, then falls back to Whisper | Fast when native captions are available | High |
| `native` | Only want videos with Twitter captions | Fastest | Highest (verbatim) |
| `whisper` | All videos, including those without captions | Slower | High (model-dependent) |
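
The selection logic in the table can be sketched as a small function. This is only an illustration of the described behaviour, not the actor's source code: `fetch_native` and `run_whisper` are hypothetical callables that return a list of segments (or an empty list or `None` when nothing is found).

```python
def pick_transcript(method, fetch_native, run_whisper):
    """Return (transcriptMethod, segments) following the auto/native/whisper rules."""
    if method not in ("auto", "native", "whisper"):
        raise ValueError(f"unknown method: {method}")

    if method in ("auto", "native"):
        segments = fetch_native()
        if segments:
            return "native", segments
        if method == "native":
            # Native-only mode never falls back; the tweet has no transcript.
            return None, []

    # Reached for "whisper", or for "auto" when no native captions exist.
    segments = run_whisper()
    return ("whisper", segments) if segments else (None, [])
```

A result of `(None, [])` corresponds to a row with `transcriptAvailable: false`.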

Whisper Model Selection

| Model | Parameters | Speed | Use case |
|---|---|---|---|
| `tiny` | 39 M | Fastest | Quick drafts, high-volume runs |
| `base` | 74 M | Fast | Default; good balance |
| `small` | 244 M | Medium | Better accuracy for accented speech |
| `medium` | 769 M | Slow | High accuracy |
| `large-v2` | 1550 M | Slowest | Best quality, multiple languages |
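
For readers who want to reproduce the Whisper path locally, the sketch below shows how faster-whisper segments could be mapped onto the output fields. It is an approximation under stated assumptions: the CPU device, the `int8` compute type, the rounding, and joining segments with a single space are choices made for this example, and the actor's own settings are not documented here.

```python
def segments_to_rows(segments):
    """Map objects with .start, .end and .text onto the documented segment fields."""
    rows = []
    for index, seg in enumerate(segments):
        rows.append({
            "segmentIndex": index,
            "startTime": round(seg.start, 2),
            "endTime": round(seg.end, 2),
            "text": seg.text.strip(),
        })
    # Assumption: the full transcript is the segment texts joined by spaces.
    full = " ".join(r["text"] for r in rows)
    for r in rows:
        r["fullTranscript"] = full
    return rows


def transcribe_file(audio_path, model_name="base", language=None):
    """Transcribe a local audio file with faster-whisper (third-party package)."""
    from faster_whisper import WhisperModel  # imported lazily; pip install faster-whisper

    model = WhisperModel(model_name, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path, language=language)
    return segments_to_rows(segments)  # segments is a lazy generator
```

`transcribe_file("clip.mp3", model_name="small", language="en")` would return one dict per segment, where `clip.mp3` is a placeholder path.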

Memory Requirements for Long Videos (Whisper)

The actor automatically splits long audio into 10-minute chunks, so there is no video length limit. However, Whisper keeps the model and current chunk in RAM simultaneously:

| Video length | Recommended memory |
|---|---|
| Up to ~30 minutes | 2048 MB (default) |
| 30 minutes to 2 hours | 4096 MB |
| 2 hours+ | 8192 MB |

To set memory in the Apify UI: open your actor run → Input → Options → Memory. Native-caption runs have no meaningful memory requirement regardless of video length.
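
The memory table and the 10-minute chunking rule translate into a short calculation. The helper names below are hypothetical, and the thresholds are copied from the table; how the actor handles a final partial chunk is an assumption (here it counts as one more chunk).

```python
import math

CHUNK_SECONDS = 10 * 60  # the actor splits long audio into 10-minute chunks


def recommended_memory_mb(duration_seconds):
    """Look up the recommended memory for a Whisper run of the given length."""
    minutes = duration_seconds / 60
    if minutes <= 30:
        return 2048
    if minutes <= 120:
        return 4096
    return 8192


def chunk_count(duration_seconds):
    """Number of 10-minute chunks, counting a trailing partial chunk."""
    return max(1, math.ceil(duration_seconds / CHUNK_SECONDS))
```

For a 90-minute video this gives 4096 MB and 9 chunks.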

Limitations

Cookies required — Twitter restricts video access to authenticated sessions
Native captions availability — Not all Twitter videos have auto-generated captions; use whisper method for full coverage
Rate limits — Twitter may throttle rapid scraping; the actor applies human-like delays between requests
Proxy recommended — For high-volume runs, use Apify residential proxy to avoid IP bans

FAQ

Q: Why do I need cookies?
A: Twitter requires authentication to serve video pages and caption tracks. Without cookies the actor cannot access video content.

Q: What if a video has no captions and I use transcriptionMethod=native?
A: The actor outputs a single row per tweet with transcriptAvailable: false and no segment fields. Switch to transcriptionMethod=auto or transcriptionMethod=whisper to transcribe those videos with Whisper.

Q: Can I scrape multiple videos at once?
A: Yes. Add multiple URLs to postUrls. The actor processes them sequentially, with delays between requests to avoid rate limiting.

Q: Does this work with Twitter Spaces audio?
A: No. Twitter Spaces use a different streaming format. This actor targets video posts only.

Q: How do I filter by language?
A: All output rows include a language field. Use Apify's dataset filtering to select rows by language code.
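
If you export the dataset and filter it yourself, the same language selection takes a few lines of Python. This sketch assumes rows are dicts with the documented output fields; both function names are illustrative.

```python
def rows_in_language(rows, language_code):
    """Keep segment rows whose language field matches the given ISO 639-1 code."""
    wanted = language_code.lower()
    return [
        r for r in rows
        if r.get("transcriptAvailable") and (r.get("language") or "").lower() == wanted
    ]


def tweets_without_transcript(rows):
    """List tweet URLs that came back with transcriptAvailable set to false."""
    return sorted({r["tweetUrl"] for r in rows if not r.get("transcriptAvailable")})
```

The second helper gives you a list of URLs to re-run with the `auto` or `whisper` method.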
