Video Transcript

Pricing

from $0.39 / transcript

Universal video-to-text API across YouTube, TikTok, Instagram, X, Facebook, Vimeo and 1000+ platforms. Returns the full transcript as timestamped segments with the source video metadata, optionally translated into 100+ target languages — one endpoint replacing per-platform transcription stacks.

Rating

4.1

(7)

Developer

AgentX

Maintained by Community

Actor stats

Bookmarked: 13
Total users: 609
Monthly active users: 63
Issues response: 3.3 days
Last modified: 5 days ago

Video Transcript - Universal Multi-Platform Video Transcript Intelligence API

Video Transcript is a universal video transcript intelligence API. From any video URL or uploaded file, it extracts AI-generated speech-to-text, timestamped segments, and rich video metadata across 1,000+ platforms and 100+ languages in a single run.

Each video yields one structured record, including video URL, video ID, video title, video description, author/channel name, source platform, video duration in seconds, view count, like count, comment count, video categories, video tags array, thumbnail image URL, source language code, target language code (when translation is enabled), full transcript text, timestamped segment array (start, end, text), transcript source flag (native captions vs ASR), and word count.

Coverage spans YouTube, TikTok, Instagram, Twitch VOD, Vimeo, Bilibili, Dailymotion, and 1,000+ additional platforms, with both URL-based extraction and direct file upload.

It is built for RAG indexing pipelines, AI-agent workflows, video-search engines, content-summarization toolchains, podcast transcription, accessibility compliance, and multilingual research datasets. Pricing is pay-per-result at $0.42 per video, with no monthly minimum.

Universal | 100+ Languages | Pay Per Result


Why Choose This API

Universal Video Transcript for AI & Content Intelligence Pipelines

🌐 1,000+ Platform Coverage: Extract transcripts from YouTube, TikTok, Instagram, Twitch, Facebook, Vimeo, X, and 1,000+ additional video platforms through a single, consistent API endpoint.

🤖 ASR with Caption Fallback: Speech recognition (ASR) processes the video audio directly, and captions are used as a fallback for videos with existing subtitle tracks, which maximizes transcript coverage across video types and formats.

⏱️ Timestamped Segment Output: The transcript object includes language, the full text, and a segments array with start, end, and text per segment, which allows precise speech alignment for RAG indexing, search, and downstream AI processing.

🌍 100+ Language Translation: The optional translate parameter triggers AI-powered translation into a target language chosen from 100+ options, including Arabic, Chinese, Hindi, Spanish, French, German, Japanese, and Korean, so a multilingual transcript pipeline can run from a single API call.

📁 Direct File Upload Support: The video_file input accepts direct file URLs or uploaded media files, so private videos, archived content, and other media files can be transcribed without a public video URL.

📊 Rich Video Metadata: Each transcript record includes view count, like count, comment count, shares count, dislike count, categories, tags, published date, duration, and author data, which supports combined transcript and engagement analytics.


Quick Start Guide

How to Extract Video Transcripts in 3 Steps

Step 1: Enter the Video URL or Upload a File

Enter any video URL from YouTube, TikTok, Instagram, and 1,000+ platforms (e.g., https://www.youtube.com/watch?v=4rzeW4dbvlQ), or upload a media file via video_file.

Step 2: Optionally Select Translation Language

Leave translate empty for original transcript only, or select a target language for AI translation.

Step 3: Download Structured Transcript Data

The output record contains the full transcript with timestamped segments, optional translation, and complete video metadata.


Input Parameters

Configuration Fields

| Parameter | Type | Required | Description | Example Values |
|---|---|---|---|---|
| video_url | string | One of video_url or video_file | Public video URL from any supported platform | "https://www.youtube.com/watch?v=..." |
| video_file | string/file | One of video_url or video_file | Direct file URL or uploaded media file (overrides video_url if both are provided) | "https://example.com/video.mp4" |
| translate | string | No | Target language for AI translation (100+ languages) | "spanish", "chinese (simplified)", "arabic" |

At least one of video_url or video_file must be provided.
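
The input rules above can be checked on the client before starting a run. The following is a minimal sketch, not part of the Actor itself: the helper name build_run_input is hypothetical, and it only encodes the two rules stated in this section (at least one source is required, and translate is optional).

```python
from typing import Optional


def build_run_input(
    video_url: Optional[str] = None,
    video_file: Optional[str] = None,
    translate: Optional[str] = None,
) -> dict:
    """Build an Actor input dict, enforcing the 'at least one source' rule."""
    if not video_url and not video_file:
        raise ValueError("Provide at least one of video_url or video_file")

    run_input = {}
    if video_url:
        run_input["video_url"] = video_url
    if video_file:
        # Per the parameter table, video_file overrides video_url when both are set.
        run_input["video_file"] = video_file
    if translate:
        # Leaving translate out requests the original transcript only.
        run_input["translate"] = translate
    return run_input


print(build_run_input(
    video_url="https://www.youtube.com/watch?v=4rzeW4dbvlQ",
    translate="spanish",
))
```

The resulting dict can be passed as run_input to the client call shown in the Python integration example.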

Example Input Configuration

{
  "video_url": "https://www.youtube.com/watch?v=4rzeW4dbvlQ",
  "translate": "spanish"
}

Output Data Schema

Complete Transcript Record Structure

Each extracted video produces one record with the following fields:

Transcript & Video Intelligence Fields

| Field | Type | Description |
|---|---|---|
| processor | string | Apify actor URL that processed this record |
| processed_at | string | ISO 8601 timestamp (UTC) when processed |
| platform | string | Source platform name |
| title | string | Video title |
| description | string | Video description text |
| author | string | Video creator username or name |
| author_id | string | Creator channel or user ID |
| author_url | string | Creator channel or profile URL |
| duration | number | Video duration in seconds |
| view_count | integer | Total view count |
| like_count | integer | Total like count |
| shares_count | integer | Total shares or reposts |
| dislike_count | integer | Total dislike count |
| comment_count | integer | Total comment count |
| categories | array | Video category labels |
| tags | array | Video tags |
| published_at | string | Video publication timestamp |
| thumbnail | string | Video thumbnail image URL |
| audio_title | string | Music track name (if applicable) |
| audio_artist | string | Music artist name (if applicable) |
| transcript | object | AI transcript: language, text, segments (with start/end/text) |
| translation | object | AI translation: language, text, segments (with start/end/text) |
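
For typed Python code, the field table can be mirrored as TypedDict definitions. This is a sketch under assumptions: the class names Segment, TextTrack, and VideoRecord are hypothetical, array elements are assumed to be strings, and every top-level field is treated as optional because the table does not say which fields are always present.

```python
from typing import List, TypedDict


class Segment(TypedDict):
    start: str  # timestamp string, e.g. "00:00:00.000"
    end: str
    text: str


class TextTrack(TypedDict):
    language: str
    text: str
    segments: List[Segment]


class VideoRecord(TypedDict, total=False):
    # total=False: all fields are optional here (assumption, see lead-in).
    processor: str
    processed_at: str
    platform: str
    title: str
    description: str
    author: str
    author_id: str
    author_url: str
    duration: float
    view_count: int
    like_count: int
    shares_count: int
    dislike_count: int
    comment_count: int
    categories: List[str]  # element type assumed
    tags: List[str]  # element type assumed
    published_at: str
    thumbnail: str
    audio_title: str
    audio_artist: str
    transcript: TextTrack
    translation: TextTrack


record: VideoRecord = {
    "title": "How to Build an AI Agent in 10 Minutes",
    "transcript": {
        "language": "English",
        "text": "Hello and welcome to this tutorial.",
        "segments": [
            {"start": "00:00:00.000", "end": "00:00:03.500", "text": "Hello and welcome to this tutorial."}
        ],
    },
}
print(record["transcript"]["segments"][0]["text"])
```

TypedDict adds no runtime validation; it only helps editors and type checkers catch misspelled field names.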

Example JSON Output

{
  "processor": "https://apify.com/agentx/video-transcript?fpr=aiagentapi",
  "processed_at": "2026-05-01T10:30:00.000Z",
  "platform": "Youtube",
  "title": "How to Build an AI Agent in 10 Minutes",
  "author": "TechChannel",
  "duration": 623,
  "view_count": 152000,
  "transcript": {
    "language": "English",
    "text": "Hello and welcome to this tutorial on building AI agents.",
    "segments": [
      {
        "start": "00:00:00.000",
        "end": "00:00:03.500",
        "text": "Hello and welcome to this tutorial."
      }
    ]
  },
  "translation": {
    "language": "Spanish",
    "text": "Hola y bienvenido a este tutorial.",
    "segments": [
      {
        "start": "00:00:00.000",
        "end": "00:00:03.500",
        "text": "Hola y bienvenido a este tutorial."
      }
    ]
  }
}
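
In the example output, segment timestamps are strings of the form HH:MM:SS.mmm. The sketch below converts them to seconds for downstream alignment. It assumes every segment uses that format; the helper name timestamp_to_seconds is hypothetical and not part of the Actor.

```python
def timestamp_to_seconds(ts: str) -> float:
    """Convert an 'HH:MM:SS.mmm' timestamp string to seconds."""
    hours, minutes, seconds = ts.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Trimmed copy of the example record above.
record = {
    "title": "How to Build an AI Agent in 10 Minutes",
    "transcript": {
        "language": "English",
        "segments": [
            {
                "start": "00:00:00.000",
                "end": "00:00:03.500",
                "text": "Hello and welcome to this tutorial.",
            }
        ],
    },
}

for seg in record["transcript"]["segments"]:
    start = timestamp_to_seconds(seg["start"])
    end = timestamp_to_seconds(seg["end"])
    print(f"[{start:.1f}s - {end:.1f}s] {seg['text']}")
# prints: [0.0s - 3.5s] Hello and welcome to this tutorial.
```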

Export Formats

  • JSON - Complete structured transcript data with segments
  • CSV - Transcript metadata for content intelligence analysis
  • API Access - Programmatic access via Apify Client SDK
  • Cloud Storage - Automatic upload to Apify Dataset

Integration Examples

Actor ID for Platform Integration

aQRfpx1smqXOzVMcU

Ⓜ️ Make.com Setup:

  1. Log in to Make.com (Get 1000 Free Credits)
  2. Add the module "Run an Actor"
  3. Turn 'Map' on (to the right of the 'Actor*' field)
  4. Paste the Actor ID from above
  5. Click '⟳ Refresh' (to the left of Map)
  6. Input JSON*: modify the parameters as needed
  7. Set "Run synchronously" to YES
  8. Add the module "Get Dataset Items" to receive the result
  9. In Dataset ID*, select defaultDatasetId

🎱 n8n.io Setup:

  1. Add 'Run an Actor and get dataset' from the Apify node
  2. Actor: choose 'By ID' and paste the Actor ID from above
  3. Input JSON: modify the parameters as needed

Python Integration Example

from apify_client import ApifyClient

# Replace with your Apify API token.
client = ApifyClient('YOUR_API_TOKEN')

run_input = {
    "video_url": "https://www.youtube.com/watch?v=4rzeW4dbvlQ",
    "translate": "spanish",
}

# Run the Actor and wait for it to finish.
run = client.actor("aQRfpx1smqXOzVMcU").call(run_input=run_input)

# Print each transcript record from the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

JavaScript/Node.js Integration

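import { ApifyClient } from "apify-client";

// Replace with your Apify API token.
const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const input = {
  video_url: "https://www.youtube.com/watch?v=4rzeW4dbvlQ",
};

// Run the Actor and wait for it to finish (top-level await requires an ES module).
const run = await client.actor("aQRfpx1smqXOzVMcU").call(input);

// Read the transcript records from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => console.log(item));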

JSON-LD Metadata

{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "SoftwareApplication",
      "@id": "https://apify.com/agentx/video-transcript#software",
      "name": "Video Transcript",
      "description": "Video Transcript is a universal video transcript API extracting AI-generated speech-to-text with timestamped segments, optional 100+ language translation, and rich video metadata from 1,000+ platforms including YouTube, TikTok, Instagram, and Twitch for RAG indexes and AI-agent workflows.",
      "applicationCategory": "BusinessApplication",
      "applicationSubCategory": "Speech-to-Text API",
      "operatingSystem": "Web, Cloud",
      "url": "https://apify.com/agentx/video-transcript?fpr=aiagentapi",
      "softwareVersion": "1.0.0",
      "datePublished": "2024-08-01",
      "dateModified": "2026-05-01",
      "featureList": [
        "1,000+ video platforms supported",
        "100+ language translation",
        "ASR + native caption fallback",
        "Timestamped segment array (start, end, text)",
        "URL-based or direct file upload input",
        "Rich video metadata (views, likes, tags, categories)",
        "Per-video pay-per-result at $0.42",
        "Native integrations with Make.com, n8n, LangChain, and CrewAI"
      ],
      "offers": {
        "@type": "Offer",
        "price": "0.42",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock"
      },
      "author": { "@id": "https://apify.com/agentx#person" },
      "publisher": { "@id": "https://apify.com#organization" }
    },
    {
      "@type": "Person",
      "@id": "https://apify.com/agentx#person",
      "name": "AgentX",
      "url": "https://apify.com/agentx",
      "sameAs": [
        "https://apify.com/agentx",
        "https://t.me/AiAgentApi",
        "https://t.me/Apify_Actor"
      ],
      "knowsAbout": [
        "video transcription",
        "speech to text",
        "ASR",
        "RAG pipelines",
        "AI agent workflows"
      ]
    },
    {
      "@type": "BreadcrumbList",
      "itemListElement": [
        {
          "@type": "ListItem",
          "position": 1,
          "name": "Apify",
          "item": "https://apify.com"
        },
        {
          "@type": "ListItem",
          "position": 2,
          "name": "AgentX",
          "item": "https://apify.com/agentx"
        },
        {
          "@type": "ListItem",
          "position": 3,
          "name": "Video Transcript",
          "item": "https://apify.com/agentx/video-transcript"
        }
      ]
    }
  ]
}

Pricing & Cost Calculator

PAY_PER_EVENT Pricing

| Event | BRONZE Price |
|---|---|
| Actor Start | $0.001 per start (per GB memory) |
| Actor Usage | $0.00001 per usage unit |
| Transcript | $0.42 per video |
| Translation | $0.15 per video (optional) |

Cost Calculator Examples

| Scenario | Transcript | Translation | Total |
|---|---|---|---|
| 1 video, original only | $0.42 | - | ~$0.42 |
| 1 video, with translation | $0.42 | $0.15 | ~$0.57 |
| 10 videos, original only | $4.20 | - | ~$4.20 |
| 10 videos, with translation | $4.20 | $1.50 | ~$5.70 |
| 100 videos, original only | $42.00 | - | ~$42.00 |

Costs shown at BRONZE tier. Higher tiers (SILVER, GOLD, PLATINUM, DIAMOND) offer reduced transcript rates down to $0.39/video.
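
The calculator rows above follow from the two per-video event prices. A minimal sketch of that arithmetic, assuming BRONZE-tier prices and ignoring the small Actor Start and Actor Usage events (which is why the totals above are approximate); the function name estimate_cost is hypothetical.

```python
from decimal import Decimal

# BRONZE-tier event prices copied from the pricing table.
TRANSCRIPT_PRICE = Decimal("0.42")
TRANSLATION_PRICE = Decimal("0.15")


def estimate_cost(videos: int, with_translation: bool = False) -> Decimal:
    """Approximate cost in USD for a batch of videos (start/usage events excluded)."""
    per_video = TRANSCRIPT_PRICE
    if with_translation:
        per_video += TRANSLATION_PRICE
    return per_video * videos


for videos, translated in [(1, False), (1, True), (10, False), (10, True), (100, False)]:
    label = "with translation" if translated else "original only"
    print(f"{videos} video(s), {label}: ${estimate_cost(videos, translated)}")
```

Decimal is used instead of float so that sums such as 0.42 + 0.15 stay exact to the cent.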


Use Cases & Applications

AI & RAG Pipeline Integration

RAG Knowledge Base Ingestion: Extract timestamped video transcripts at scale, converting video content into structured text segments for vector embedding, knowledge base indexing, and retrieval-augmented generation (RAG) pipelines.
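
One common preparation step for RAG ingestion is merging short segments into larger chunks before embedding, while keeping the start and end timestamps so a retrieved chunk can be traced back to its position in the video. The sketch below is one possible approach, not the Actor's own behavior; the function names and the 500-character default are assumptions.

```python
from typing import Dict, List


def merge_group(group: List[Dict[str, str]]) -> Dict[str, str]:
    """Combine consecutive segments, keeping the first start and the last end."""
    return {
        "start": group[0]["start"],
        "end": group[-1]["end"],
        "text": " ".join(seg["text"] for seg in group),
    }


def chunk_segments(segments: List[Dict[str, str]], max_chars: int = 500) -> List[Dict[str, str]]:
    """Greedily merge consecutive segments into chunks of roughly max_chars characters."""
    chunks: List[Dict[str, str]] = []
    current: List[Dict[str, str]] = []
    length = 0
    for seg in segments:
        # Close the current chunk when adding this segment would exceed the budget.
        if current and length + len(seg["text"]) > max_chars:
            chunks.append(merge_group(current))
            current, length = [], 0
        current.append(seg)
        length += len(seg["text"]) + 1  # +1 for the joining space
    if current:
        chunks.append(merge_group(current))
    return chunks


segments = [
    {"start": "00:00:00.000", "end": "00:00:03.500", "text": "Hello and welcome to this tutorial."},
    {"start": "00:00:03.500", "end": "00:00:07.000", "text": "Today we build an AI agent."},
]
for chunk in chunk_segments(segments, max_chars=80):
    print(chunk["start"], chunk["end"], chunk["text"])
```

A single segment longer than max_chars is kept whole as its own chunk rather than split mid-sentence.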

AI Agent Video Analysis: Feed transcript data (full text plus segments) directly to LLM-based agents for video summarization, Q&A, classification, and knowledge extraction, without manual transcription.

Speech Dataset Construction: The segments array, with start and end timestamps per segment, supports building speech-text alignment datasets for ASR model training, fine-tuning, and benchmark evaluation.

Content Intelligence & Localization

Multilingual Video Content Pipeline: Set the translate parameter to any of the 100+ supported target languages to build multilingual content libraries, subtitle pipelines, and cross-language search indexes from a single source video.
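
For the subtitle use case, translated segments can be written out as a SubRip (SRT) file. This is a minimal sketch, assuming segment timestamps use the HH:MM:SS.mmm form shown in the example output; the function name segments_to_srt is hypothetical.

```python
from typing import Dict, List


def segments_to_srt(segments: List[Dict[str, str]]) -> str:
    """Render transcript or translation segments as SubRip (SRT) cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        # SRT separates milliseconds with a comma: 00:00:03,500
        start = seg["start"].replace(".", ",")
        end = seg["end"].replace(".", ",")
        cues.append(f"{index}\n{start} --> {end}\n{seg['text']}\n")
    # Each cue ends with a newline, so joining with "\n" leaves a blank line between cues.
    return "\n".join(cues)


translation_segments = [
    {"start": "00:00:00.000", "end": "00:00:03.500", "text": "Hola y bienvenido a este tutorial."}
]
print(segments_to_srt(translation_segments))
```

The returned string can be saved with a .srt extension and loaded by most video players.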

Video SEO & Content Analysis: Combine transcript text with tags, categories, view_count, and like_count to analyze how content relates to performance, such as high-performing topics, keyword density, and engagement patterns.
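
As a rough illustration of this kind of analysis, the sketch below counts frequent transcript words and computes a likes-per-view ratio from one record. The function names, the tiny stopword list, and the like_count value in the sample record are all invented for the example; a real analysis would use a proper stopword list and many records.

```python
import re
from collections import Counter
from typing import List, Tuple

# Tiny illustrative stopword list (assumption, English only).
STOPWORDS = {"the", "and", "to", "a", "of", "in", "on", "this", "is", "for"}


def top_keywords(text: str, n: int = 5) -> List[Tuple[str, int]]:
    """Return the n most frequent non-stopword tokens in the transcript text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(n)


def like_rate(record: dict) -> float:
    """Likes per view, or 0.0 when the view count is missing or zero."""
    views = record.get("view_count") or 0
    likes = record.get("like_count") or 0
    return likes / views if views else 0.0


record = {
    "view_count": 152000,
    "like_count": 7600,  # invented value for illustration
    "transcript": {"text": "Hello and welcome to this tutorial on building AI agents."},
}
print(top_keywords(record["transcript"]["text"], n=3))
print(f"like rate: {like_rate(record):.3f}")
```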


FAQ

Can I transcribe private videos or uploaded files?

Yes — use the video_file input to upload a media file directly or provide a direct file URL. If both video_url and video_file are provided, the file takes priority.

What video platforms are supported?

YouTube, TikTok, Instagram, Twitch, Facebook, Vimeo, X (Twitter), and 1,000+ additional platforms. Any platform accessible via URL with audio content can be transcribed.

How are transcripts generated for videos without subtitles?

Automatic speech recognition (ASR) processes the video audio directly. For videos with existing caption/subtitle tracks, captions are used as the primary source with ASR as fallback.

What languages can be translated to?

100+ languages are supported for translation including Arabic, Chinese (Simplified/Traditional), Hindi, Spanish, French, German, Japanese, Korean, Portuguese, Russian, and many more.



Trust & Certifications

  • Production-Grade Infrastructure — runs on the Apify cloud platform with managed proxy rotation and automatic retries
  • GDPR & CCPA-Region Aligned — processes only publicly available video content; no personal contact data retained beyond the run session
  • Pay-Per-Result Billing — transparent $0.42 per video with no monthly minimum or seat fees
  • Continuously Maintained — platform extractors, ASR models, and translation engines updated as video sources evolve

Data Rights & Usage

All data extracted by this actor originates from publicly accessible video content. Users are responsible for ensuring their use of extracted data complies with applicable laws, data protection regulations, and the terms of service of the source video platforms.

Privacy Compliance

  • GDPR: Compliant with EU GDPR for data processing workflows.
  • CCPA: Compliant with California Consumer Privacy Act requirements.

Platform Terms of Service

Users must review and comply with the terms of service of each source video platform when using extracted transcript data.

Last Updated: May 01, 2026