# Twitter / X Video Transcript Scraper (`crawlerbros/twitter-transcript-scraper`) Actor

Extract transcripts from Twitter/X video posts. Returns timestamped segments using native Twitter captions (WebVTT) with automatic Whisper AI fallback for uncaptioned videos.

- **URL**: https://apify.com/crawlerbros/twitter-transcript-scraper.md
- **Developed by:** [Crawler Bros](https://apify.com/crawlerbros) (community)
- **Categories:** AI, Social media, Videos
- **Stats:** 1 total users, 0 monthly users, 0.0% runs succeeded, 7 bookmarks
- **User rating**: 5.00 out of 5 stars

## Pricing

from $3.00 / 1,000 results

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.
Because this Actor supports Apify Store discounts, the price decreases as your subscription plan tier increases.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Twitter / X Video Transcript Scraper

Extract full, timestamped transcripts from Twitter/X video posts — automatically using native Twitter captions (WebVTT) with Whisper AI speech-to-text as a fallback for uncaptioned videos.

### Features

- **Native captions first**: intercepts Twitter's built-in WebVTT subtitle tracks for the fastest, most accurate results
- **Whisper AI fallback**: uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) to transcribe audio when no native captions are available
- **Timestamped segments**: every output row includes `startTime`, `endTime`, and `text` for precise video navigation
- **Full transcript**: each row also carries the complete joined transcript for easy search
- **Flexible method control**: choose `auto` (native captions first, then Whisper), `native`, or `whisper`
- **Multi-language support**: native captions in any language; optional language hint for Whisper
- **Anti-detection**: Playwright Firefox with stealth fingerprinting, randomised viewports/user-agents, and human-like delays

### Input

| Field | Type | Required | Description |
|---|---|---|---|
| `postUrls` | string[] | ✅ | Twitter/X video post URLs (`twitter.com` or `x.com` both accepted) |
| `cookies` | string | ✅ | Twitter/X session cookies JSON (`auth_token` + `ct0` required) |
| `transcriptionMethod` | select | | `auto` (default), `native`, or `whisper` |
| `whisperModel` | select | | `tiny`, `base` (default), `small`, `medium`, `large-v2` |
| `language` | string | | ISO 639-1 hint for Whisper (e.g. `en`, `es`, `fr`) |
| `proxyConfiguration` | object | | Apify proxy settings |

#### How to get Twitter cookies

1. Log in to [x.com](https://x.com) in your browser
2. Open DevTools → **Application** → **Cookies** → `https://x.com`
3. Copy the `auth_token` and `ct0` cookie values
4. Export all cookies as JSON (e.g. using the [EditThisCookie](https://chrome.google.com/webstore/detail/editthiscookie/fngmhnnpilhplaeedifhccceomclgfbg) browser extension)
5. Paste the JSON array into the `cookies` input field

Cookies expire periodically — re-export if you see `expired_cookies` errors.
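
Before starting a run, it can help to check that the exported cookies actually contain the two required entries. The following is a minimal sketch, assuming the export is a list of objects with `name` and `value` keys as in the format shown in the input schema; the helper name `build_cookies_input` and the placeholder values are hypothetical.

```python
import json

REQUIRED_COOKIES = {"auth_token", "ct0"}  # names required per the input table


def build_cookies_input(exported: list[dict]) -> str:
    """Return the exported cookie list as a JSON string for the `cookies` field.

    Raises ValueError if a required cookie is missing or has an empty value.
    """
    present = {c.get("name") for c in exported if c.get("value")}
    missing = REQUIRED_COOKIES - present
    if missing:
        raise ValueError(f"missing required cookies: {sorted(missing)}")
    return json.dumps(exported)


# Placeholder values only; real values come from your own browser session.
example = [
    {"name": "auth_token", "value": "PLACEHOLDER", "domain": ".x.com"},
    {"name": "ct0", "value": "PLACEHOLDER", "domain": ".x.com"},
]
cookies_field = build_cookies_input(example)
```

The resulting string can be pasted into the `cookies` input field or passed as the `cookies` value in an API call.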

### Output

Each dataset row represents **one transcript segment**. Tweet metadata is repeated on every row for easy filtering.

| Field | Type | Description |
|---|---|---|
| `tweetUrl` | string | Canonical `x.com/…/status/…` URL |
| `tweetId` | string | Numeric tweet ID |
| `authorUsername` | string | Twitter handle (without `@`) |
| `authorName` | string | Display name |
| `tweetText` | string | Tweet caption / body text |
| `publishedAt` | string | ISO 8601 publish timestamp |
| `language` | string | ISO 639-1 language code |
| `transcriptMethod` | string | `native` or `whisper` |
| `transcriptAvailable` | boolean | `false` for tweets with no extractable transcript |
| `segmentIndex` | integer | 0-based position within the transcript |
| `startTime` | float | Segment start time in seconds |
| `endTime` | float | Segment end time in seconds |
| `text` | string | Segment transcript text |
| `fullTranscript` | string | All segments joined into one string |
| `scrapedAt` | string | ISO 8601 scrape timestamp |
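
Because `startTime` and `endTime` are plain seconds, rows can be turned back into caption cues with a little arithmetic. This is a sketch, not part of the Actor: the helper names are hypothetical, and the output follows the standard WebVTT `HH:MM:SS.mmm --> HH:MM:SS.mmm` cue timing format.

```python
def to_vtt_timestamp(seconds: float) -> str:
    """Format a number of seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def row_to_cue(row: dict) -> str:
    """Render one dataset row as a WebVTT cue (timing line plus text)."""
    start = to_vtt_timestamp(row["startTime"])
    end = to_vtt_timestamp(row["endTime"])
    return f"{start} --> {end}\n{row['text']}"
```

For a row with `startTime` 0.0 and `endTime` 3.44, the timing line reads `00:00:00.000 --> 00:00:03.440`.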

#### Sample output record

```json
{
  "tweetUrl": "https://x.com/NASA/status/1858131747319566780",
  "tweetId": "1858131747319566780",
  "authorUsername": "NASA",
  "authorName": "NASA",
  "tweetText": "Watch our latest discovery announcement…",
  "publishedAt": "2024-11-17T18:30:00.000Z",
  "language": "en",
  "transcriptMethod": "native",
  "transcriptAvailable": true,
  "segmentIndex": 0,
  "startTime": 0.0,
  "endTime": 3.44,
  "text": "We made a remarkable discovery this week",
  "fullTranscript": "We made a remarkable discovery this week that changes our understanding of the solar system.",
  "scrapedAt": "2025-01-15T10:22:33.456Z"
}
```
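
Since every row is a single segment, downstream code usually needs to regroup rows per tweet. The sketch below does this with the documented `tweetId`, `segmentIndex`, `text`, and `transcriptAvailable` fields. The function name is hypothetical, and joining segments with a single space is an assumption; the separator used for `fullTranscript` is not documented here.

```python
from collections import defaultdict


def transcripts_by_tweet(rows: list[dict]) -> dict[str, str]:
    """Map tweetId to transcript text rebuilt from its ordered segments."""
    segments = defaultdict(list)
    for row in rows:
        # Rows with transcriptAvailable == False carry no segment fields.
        if row.get("transcriptAvailable") and "text" in row:
            segments[row["tweetId"]].append((row["segmentIndex"], row["text"]))
    return {
        tweet_id: " ".join(text for _, text in sorted(parts))
        for tweet_id, parts in segments.items()
    }
```

Sorting by `segmentIndex` keeps the result stable even if rows arrive out of order.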

### Transcription Methods

| Method | When to use | Speed | Accuracy |
|---|---|---|---|
| `auto` | Default — tries native first, Whisper fallback | Fast when native available | High |
| `native` | Only want videos with Twitter captions | Fastest | Highest (verbatim) |
| `whisper` | All videos, including those without captions | Slower | High (model-dependent) |
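
The decision logic in the table can be sketched as follows. This illustrates the documented behaviour only and is not the Actor's source code: `pick_transcript`, `fetch_native`, and `run_whisper` are hypothetical names, and the two callables stand in for the caption lookup and the speech-to-text step.

```python
from typing import Callable, Optional

Segments = list[dict]  # each dict holds startTime, endTime, text


def pick_transcript(
    method: str,
    fetch_native: Callable[[], Optional[Segments]],
    run_whisper: Callable[[], Segments],
) -> tuple[Optional[str], Segments]:
    """Return (transcriptMethod, segments); (None, []) means no transcript."""
    if method not in {"auto", "native", "whisper"}:
        raise ValueError(f"unknown transcriptionMethod: {method!r}")
    if method in {"auto", "native"}:
        segments = fetch_native()
        if segments:
            return "native", segments
        if method == "native":
            return None, []  # native-only runs never fall back to Whisper
    return "whisper", run_whisper()
```

With `auto`, Whisper runs only when the native lookup returns nothing, which is why `auto` is fast whenever captions exist.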

### Whisper Model Selection

| Model | Parameters | Speed | Use case |
|---|---|---|---|
| `tiny` | 39 M | Fastest | Quick drafts, high-volume runs |
| `base` | 74 M | Fast | Default, good balance |
| `small` | 244 M | Medium | Better accuracy for accented speech |
| `medium` | 769 M | Slow | High accuracy |
| `large-v2` | 1550 M | Slowest | Best quality, multiple languages |

### Memory Requirements for Long Videos (Whisper)

The Actor automatically splits long audio into 10-minute chunks, so **there is no video length limit**. However, Whisper keeps the model and the current chunk in RAM simultaneously:

| Video length | Recommended memory |
|---|---|
| Up to ~30 minutes | 2048 MB (default) |
| 30 min – 2 hours | 4096 MB |
| 2 hours+ | 8192 MB |

To set memory in the Apify UI: open your Actor run → **Input** → **Options** → **Memory**. Native-caption runs have no meaningful memory requirement regardless of video length.
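
When runs are started from code, the table above can be encoded as a small helper. This is a sketch under stated assumptions: both function names are hypothetical, the boundary handling (exactly 30 minutes maps to 2048 MB, exactly 2 hours to 4096 MB) is one reading of the table, and `chunk_bounds` only illustrates what 10-minute chunking means rather than reproducing the Actor's implementation.

```python
def recommended_memory_mb(video_minutes: float) -> int:
    """Pick a memory setting from the table of recommended values."""
    if video_minutes <= 30:
        return 2048
    if video_minutes <= 120:
        return 4096
    return 8192


def chunk_bounds(duration_s: float, chunk_s: float = 600.0) -> list[tuple[float, float]]:
    """Split [0, duration_s) into consecutive (start, end) windows of chunk_s seconds."""
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds
```

With the Python client, the chosen value can be passed as the `memory_mbytes` argument of `client.actor(...).call(...)`.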

### Limitations

- **Cookies required**: Twitter restricts video access to authenticated sessions
- **Native captions availability**: Not all Twitter videos have auto-generated captions; use the `whisper` method for full coverage
- **Rate limits**: Twitter may throttle rapid scraping; the Actor applies human-like delays between requests
- **Proxy recommended**: For high-volume runs, use Apify residential proxy to avoid IP bans

### FAQ

**Q: Why do I need cookies?**
Twitter requires authentication to serve video pages and caption tracks. Without cookies, the Actor cannot access video content.

**Q: What if a video has no captions and I use `transcriptionMethod: "native"`?**
The Actor outputs a single row per tweet with `transcriptAvailable: false` and no segment fields. Set `transcriptionMethod` to `auto` or `whisper` to use Whisper AI for those videos.

**Q: Can I scrape multiple videos at once?**
Yes. Add multiple URLs to `postUrls`. The Actor processes them sequentially with delays to avoid rate limiting.

**Q: Does this work with Twitter Spaces audio?**
No. Twitter Spaces use a different streaming format. This Actor targets video posts only.

**Q: How do I filter by language?**
All output rows include a `language` field. Use Apify's dataset filtering to select rows by language code.
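
Rows can also be filtered client-side after download. The sketch below assumes the rows have already been fetched as a list of dictionaries; both helper names are hypothetical. The second helper collects the tweets that came back with `transcriptAvailable: false`, which is handy for a follow-up run with `transcriptionMethod` set to `whisper`.

```python
def rows_in_language(rows: list[dict], code: str) -> list[dict]:
    """Keep only rows whose `language` field matches the given code."""
    return [row for row in rows if row.get("language") == code]


def urls_without_transcript(rows: list[dict]) -> list[str]:
    """Return unique tweet URLs that have no transcript, in first-seen order."""
    urls: list[str] = []
    for row in rows:
        if row.get("transcriptAvailable") is False and row["tweetUrl"] not in urls:
            urls.append(row["tweetUrl"])
    return urls
```

The list returned by `urls_without_transcript` can be passed directly as `postUrls` for the next run.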

# Actor input Schema

## `postUrls` (type: `array`):

One or more Twitter/X video post URLs to transcribe. Both `twitter.com` and `x.com` domains are accepted.

## `cookies` (type: `string`):

**Required.** Twitter/X authentication cookies in JSON format. Needed to access video pages and trigger caption loading. Export from your browser's DevTools → Application → Cookies → `x.com`. Must include `auth_token` and `ct0`. Format: `[{"name":"auth_token","value":"...","domain":".x.com"}, ...]`.

## `transcriptionMethod` (type: `string`):

How to obtain the transcript. **Auto** tries native Twitter captions first and falls back to Whisper AI if none are found. **Native only** skips videos without captions. **Whisper only** always uses Whisper AI speech-to-text.

## `whisperModel` (type: `string`):

Whisper AI model to use when transcribing with speech-to-text. Larger models are more accurate but slower. `base` is a good balance for most use cases. Long videos are automatically split into 10-minute chunks — there is no length limit.

## `language` (type: `string`):

Optional ISO 639-1 language code (e.g. `en`, `es`, `fr`) to hint Whisper AI. Improves accuracy when the video language is known. Leave blank for auto-detection. Has no effect when using native captions.

## `proxyConfiguration` (type: `object`):

Optional Apify proxy settings. Recommended if you encounter rate limits. Leave empty to run without a proxy.

## Actor input object example

```json
{
  "postUrls": [
    "https://x.com/NASA/status/1234567890123456789",
    "https://twitter.com/elonmusk/status/9876543210987654321"
  ],
  "transcriptionMethod": "auto",
  "whisperModel": "base",
  "language": "en"
}
```
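
An input object can be checked locally against the constraints described above before a run is started. This is a minimal sketch, not the platform's own validation: the function name is hypothetical, and accepting an optional `www.` prefix on the host is an assumption.

```python
from urllib.parse import urlparse

METHODS = {"auto", "native", "whisper"}
MODELS = {"tiny", "base", "small", "medium", "large-v2"}


def input_problems(actor_input: dict) -> list[str]:
    """Return human-readable problems; an empty list means the input looks valid."""
    problems = []
    urls = actor_input.get("postUrls") or []
    if not urls:
        problems.append("postUrls must contain at least one URL")
    for url in urls:
        host = (urlparse(url).hostname or "").removeprefix("www.")
        if host not in {"x.com", "twitter.com"}:
            problems.append(f"unexpected host in {url}")
    if not actor_input.get("cookies"):
        problems.append("cookies is required")
    if actor_input.get("transcriptionMethod", "auto") not in METHODS:
        problems.append("transcriptionMethod must be auto, native, or whisper")
    if actor_input.get("whisperModel", "base") not in MODELS:
        problems.append("whisperModel must be one of the documented model names")
    return problems
```

Applied to the example object above, the check reports that `cookies` is missing, since that field is required but has no example value.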

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "postUrls": [
        "https://x.com/TuckerCarlson/status/1843375397024485778"
    ],
    "cookies": "", // Required: paste your exported cookies JSON array here as a string
    "transcriptionMethod": "auto",
    "whisperModel": "base"
};

// Run the Actor and wait for it to finish
const run = await client.actor("crawlerbros/twitter-transcript-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "postUrls": ["https://x.com/TuckerCarlson/status/1843375397024485778"],
    "cookies": "",  # Required: paste your exported cookies JSON array here as a string
    "transcriptionMethod": "auto",
    "whisperModel": "base",
}

# Run the Actor and wait for it to finish
run = client.actor("crawlerbros/twitter-transcript-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "postUrls": [
    "https://x.com/TuckerCarlson/status/1843375397024485778"
  ],
  "cookies": "",
  "transcriptionMethod": "auto",
  "whisperModel": "base"
}' |
apify call crawlerbros/twitter-transcript-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=crawlerbros/twitter-transcript-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Twitter / X Video Transcript Scraper",
        "description": "Extract transcripts from Twitter/X video posts. Returns timestamped segments using native Twitter captions (WebVTT) with automatic Whisper AI fallback for uncaptioned videos",
        "version": "1.0",
        "x-build-id": "BfJGevRdlkcog0j22"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/crawlerbros~twitter-transcript-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-crawlerbros-twitter-transcript-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/crawlerbros~twitter-transcript-scraper/runs": {
            "post": {
                "operationId": "runs-sync-crawlerbros-twitter-transcript-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/crawlerbros~twitter-transcript-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-crawlerbros-twitter-transcript-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "postUrls",
                    "cookies"
                ],
                "properties": {
                    "postUrls": {
                        "title": "Tweet / Post URLs",
                        "minItems": 1,
                        "type": "array",
                        "description": "One or more Twitter/X video post URLs to transcribe. Both `twitter.com` and `x.com` domains are accepted.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "cookies": {
                        "title": "Twitter / X Cookies (Required)",
                        "type": "string",
                        "description": "**Required.** Twitter/X authentication cookies in JSON format. Needed to access video pages and trigger caption loading. Export from your browser's DevTools → Application → Cookies → `x.com`. Must include `auth_token` and `ct0`. Format: `[{\"name\":\"auth_token\",\"value\":\"...\",\"domain\":\".x.com\"}, ...]`."
                    },
                    "transcriptionMethod": {
                        "title": "Transcription Method",
                        "enum": [
                            "auto",
                            "native",
                            "whisper"
                        ],
                        "type": "string",
                        "description": "How to obtain the transcript. **Auto** tries native Twitter captions first and falls back to Whisper AI if none are found. **Native only** skips videos without captions. **Whisper only** always uses Whisper AI speech-to-text.",
                        "default": "auto"
                    },
                    "whisperModel": {
                        "title": "Whisper Model Size",
                        "enum": [
                            "tiny",
                            "base",
                            "small",
                            "medium",
                            "large-v2"
                        ],
                        "type": "string",
                        "description": "Whisper AI model to use when transcribing with speech-to-text. Larger models are more accurate but slower. `base` is a good balance for most use cases. Long videos are automatically split into 10-minute chunks — there is no length limit.",
                        "default": "base"
                    },
                    "language": {
                        "title": "Language Hint (Whisper only)",
                        "type": "string",
                        "description": "Optional ISO 639-1 language code (e.g. `en`, `es`, `fr`) to hint Whisper AI. Improves accuracy when the video language is known. Leave blank for auto-detection. Has no effect when using native captions."
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Optional Apify proxy settings. Recommended if you encounter rate limits. Leave empty to run without a proxy."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
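
For projects without a client library, the `run-sync-get-dataset-items` path from the specification above can be called directly. The sketch below only builds the request, using the Python standard library. The function name is hypothetical, and the token is passed as the `token` query parameter as the specification describes.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "crawlerbros~twitter-transcript-scraper"


def build_sync_request(token: str, actor_input: dict) -> Request:
    """Build a POST request that runs the Actor and returns its dataset items."""
    query = urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    return Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(build_sync_request(token, actor_input))` blocks until the run finishes, so long Whisper runs may be better served by the asynchronous `/runs` path.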
