Video to Text

Pricing

Pay per event

💎$0.24💎/Video (Any Duration). AI-powered transcription from 1000+ platforms with automatic language detection, time-stamped segments, and instant translation to 100+ languages.

Rating: 4.5 (3)

Developer: CheapGET (Maintained by Community)

Actor stats:

  • Bookmarked: 6
  • Total users: 136
  • Monthly active users: 9
  • Issues response: 3.3 hours
  • Last modified: a day ago

Video To Text

Transform any video into accurate text transcripts with AI-powered speech recognition and translate to 100+ languages instantly.

Extract audio from videos across 1000+ platforms, generate precise time-stamped transcripts with automatic language detection, and translate to any target language, all with enterprise-grade reliability and parallel processing for maximum speed.

🀝 Support & Community

πŸ“§ Support: Contact Us πŸ’¬ Community: Telegram Group

πŸ† Key Features

🎀 AI-Powered Transcription

  • πŸ€– Advanced AI Model: State-of-the-art speech recognition with optimized performance and accuracy
  • 🌍 Auto Language Detection: Automatically detects spoken language from 100+ supported languages
  • ⏱️ Time-Stamped Segments: Precise timestamps for each segment with millisecond accuracy (HH:MM:SS.mmm format)
  • 🎯 High Accuracy: Optimized for various accents, speaking styles, and audio quality conditions

🌐 Multi-Language Translation

  • πŸ“ 100+ Languages: Support for all major world languages including regional variants
  • πŸ”„ AI Translator: Advanced translation service with context preservation
  • ⚑ Parallel Processing: Multi-threaded translation for faster processing of long transcripts
  • 🎯 Context-Aware: Maintains meaning and context across segment boundaries

πŸ“Š Comprehensive Data Extraction

  • πŸ“Ή Video Metadata: Title, description, duration, publish date, platform identification
  • πŸ‘€ Author Information: Creator name, channel ID, profile URLs
  • πŸ“ˆ Engagement Metrics: Views, likes, dislikes, comments, shares (when available)
  • 🎡 Audio Details: Track titles, artists, audio quality information
  • πŸ–ΌοΈ Thumbnail: High-quality video thumbnail in PNG format

💰 Pricing

| Resource | Cost | Description |
| --- | --- | --- |
| Actor Usage | $0.00001 | Charged for Actor runtime, proxy, and storage. Cost depends on resource consumption during execution |
| Transcript | $0.25 | Charged once per video. Includes AI speech recognition and subtitle generation with timestamps |
| Translation | $0.10 | Charged once per video when translation is requested. Includes AI-powered text translation to the target language |

Example Cost Calculation:

  • Transcribing 10 videos without translation
    Cost: (10 × $0.25) = $2.50 + runtime fees

  • Transcribing 10 videos with translation to Spanish
    Cost: (10 × $0.25) + (10 × $0.10) = $3.50 + runtime fees
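
The arithmetic above can be sketched as a small helper. This is only an illustration built from the per-event prices in the pricing table; the function name is hypothetical, and runtime fees are left out because they depend on resource consumption.

```python
from decimal import Decimal

# Per-event prices copied from the pricing table; runtime fees are not included.
TRANSCRIPT_PRICE = Decimal("0.25")
TRANSLATION_PRICE = Decimal("0.10")


def estimate_event_cost(video_count: int, translate: bool = False) -> Decimal:
    """Return the per-event charge in USD for a batch, before runtime fees."""
    if video_count < 0:
        raise ValueError("video_count must be non-negative")
    per_video = TRANSCRIPT_PRICE
    if translate:
        per_video += TRANSLATION_PRICE
    return per_video * video_count


print(estimate_event_cost(10))                  # 2.50
print(estimate_event_cost(10, translate=True))  # 3.50
```

`Decimal` is used instead of `float` so that currency sums such as 0.25 + 0.10 stay exact.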

🌟 Why choose this Actor?

Built for content creators, researchers, and automation workflows, this Actor transforms videos into structured, searchable text data with translation capabilities.

| Feature | Video To Text | Rev.ai | Otter.ai | Descript |
| --- | --- | --- | --- | --- |
| Pricing Model | ✅ Pay per use | ⚠️ Per minute | ⚠️ Monthly plans | ⚠️ Subscription |
| Platforms | ✅ 1000+ sites | ❌ Upload only | ❌ Upload only | ❌ Upload only |
| Languages | ✅ 100+ languages | ⚠️ 30+ languages | ⚠️ English focus | ⚠️ Limited |
| Translation | ✅ Included | ❌ Not supported | ❌ Not supported | ❌ Not supported |
| Timestamps | ✅ Millisecond | ✅ Yes | ✅ Yes | ✅ Yes |
| API Access | ✅ Full API | ✅ Yes | ⚠️ Limited | ⚠️ Limited |
| Setup Time | ✅ Instant | ⚠️ Account req. | ⚠️ Account req. | ⚠️ Complex setup |
| Min. Cost | ✅ $0.25 | ⚠️ $0.02/min | ⚠️ $8.33/month | ⚠️ $12/month |

💻 Input Parameters

Video To Text input configuration showing 2 parameters: Video URL (text input for any platform URL) and Target Language (dropdown selector for translation language)

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| video_url | string | ✅ Yes | Video URL from any supported platform. Supports YouTube, TikTok, Instagram, Twitter, Facebook, Vimeo, Twitch, and 1000+ other platforms |
| translate | string | ❌ No | Target language for translation. Choose from 100+ supported languages (e.g., "english", "chinese", "spanish", "french"). Leave empty to skip translation |

🌍 Supported Languages

The service supports 100+ languages including:

Major Languages: English, Chinese (Simplified/Traditional), Spanish, French, German, Japanese, Korean, Arabic, Hindi, Portuguese, Russian, Italian, Dutch, Swedish, Norwegian, Danish, Finnish, Polish, Czech, Hungarian, Romanian, Bulgarian, Greek, Turkish, Hebrew, Thai, Vietnamese, Indonesian, Malay, Filipino, Swahili

Regional Languages: Including various African, Asian, European, and American indigenous languages

Default Behavior: If translate is not specified or left empty, only the original transcript will be generated without translation

📝 Example Input

{
  "video_url": "https://www.youtube.com/watch?v=2TK9tFZoBRg",
  "translate": "spanish"
}
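
As a rough sketch, the same input could be built and submitted from Python with the `apify-client` package. The helper names are hypothetical, the Actor name `cheapget/video-to-text` is taken from the `processor` URL in the example output, and lowercasing the language name is an assumption based on the lowercase examples in the parameter table.

```python
from typing import Optional


def build_run_input(video_url: str, translate: Optional[str] = None) -> dict:
    """Build the Actor input; `translate` is left out when no translation is wanted."""
    if not video_url.startswith(("http://", "https://")):
        raise ValueError("video_url must be an absolute http(s) URL")
    run_input = {"video_url": video_url}
    if translate:
        # Assumption: language names are passed in lowercase, as in the examples
        run_input["translate"] = translate.strip().lower()
    return run_input


def run_video_to_text(token: str, run_input: dict) -> list:
    """Start a run and return its dataset items (needs `pip install apify-client`)."""
    from apify_client import ApifyClient  # lazy import so build_run_input works without it

    client = ApifyClient(token)
    run = client.actor("cheapget/video-to-text").call(run_input=run_input)
    if run is None:
        raise RuntimeError("The Actor run did not finish")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


print(build_run_input("https://www.youtube.com/watch?v=2TK9tFZoBRg", "Spanish"))
```

Calling `run_video_to_text("<YOUR_API_TOKEN>", run_input)` starts a billable run, so it is not invoked in the sketch itself.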

📤 Output Structure

Video To Text output data showing source transcript with time-stamped segments, target translation with corresponding timestamps, and comprehensive video metadata including platform, author, and engagement metrics

| Field | Type | Description |
| --- | --- | --- |
| processor | string | URL of the Apify actor that processed this data |
| processed_at | string | ISO 8601 timestamp when the data was processed (format: YYYY-MM-DDTHH:MM:SS+00:00) |
| platform | string | Source platform identifier (e.g., Youtube, TikTok, Instagram, Twitter, Facebook) |
| title | string | Video title |
| description | string | Video description/caption text |
| author | string | Channel/creator name |
| author_id | string | Channel/creator unique identifier |
| author_url | string | Channel/creator profile URL |
| duration | integer | Video duration in seconds |
| audio_title | string | Audio track title (if music metadata is present) |
| audio_artist | string | Audio artist name (if music metadata is present) |
| view_count | integer | Number of views (platform-dependent availability) |
| like_count | integer | Number of likes (platform-dependent availability) |
| dislike_count | integer | Number of dislikes (platform-dependent availability) |
| shares_count | integer | Number of shares/reposts (platform-dependent availability) |
| comment_count | integer | Number of comments (platform-dependent availability) |
| categories | array | Video categories assigned by the platform |
| tags | array | Video tags/keywords |
| published_at | string | ISO 8601 timestamp when the video was published (format: YYYY-MM-DDTHH:MM:SS+00:00) |
| thumbnail | string | URL to the video thumbnail image (PNG format, stored in Apify key-value store) |
| transcript | object | Original transcription data with language detection and time-stamped segments |
| transcript.language | string | Detected language name (e.g., "English", "Spanish", "Chinese") |
| transcript.text | string | Full transcribed text concatenated from all segments |
| transcript.segments | array | Array of time-stamped transcript segments |
| transcript.segments[].start | string | Segment start time in HH:MM:SS.mmm format |
| transcript.segments[].end | string | Segment end time in HH:MM:SS.mmm format |
| transcript.segments[].text | string | Transcribed text for this segment |
| translation | object | Translated transcription data (only present if translation was requested) |
| translation.language | string | Target language name (e.g., "English", "Spanish", "Chinese") |
| translation.text | string | Full translated text concatenated from all segments |
| translation.segments | array | Array of time-stamped translation segments |
| translation.segments[].start | string | Segment start time in HH:MM:SS.mmm format (matches original transcript timing) |
| translation.segments[].end | string | Segment end time in HH:MM:SS.mmm format (matches original transcript timing) |
| translation.segments[].text | string | Translated text for this segment |
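
Because `translation` is only present when requested and the engagement counts depend on the platform, code that reads an item should tolerate missing fields. The following is a minimal sketch under that assumption; `summarize_item` is a hypothetical helper, not part of the Actor.

```python
from datetime import datetime


def summarize_item(item: dict) -> dict:
    """Pick a few fields from one dataset item, tolerating absent optional ones."""
    transcript = item.get("transcript") or {}
    translation = item.get("translation")  # only present when translation was requested
    return {
        "title": item.get("title"),
        # ISO 8601 with a +00:00 offset parses directly into an aware datetime
        "processed_at": datetime.fromisoformat(item["processed_at"]),
        "language": transcript.get("language"),
        "segment_count": len(transcript.get("segments", [])),
        "views": item.get("view_count"),  # may be missing or None on some platforms
        "translated_to": translation.get("language") if translation else None,
    }


sample = {
    "title": "Introduction to Machine Learning",
    "processed_at": "2024-01-15T10:30:00+00:00",
    "transcript": {
        "language": "English",
        "segments": [{"start": "00:00:00.000", "end": "00:00:05.000", "text": "Welcome."}],
    },
}
print(summarize_item(sample))
```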

📤 Example Output

{
  "processor": "https://apify.com/cheapget/video-to-text?fpr=aiagentapi",
  "processed_at": "2024-01-15T10:30:00+00:00",
  "platform": "Youtube",
  "title": "Introduction to Machine Learning",
  "description": "Learn the basics of machine learning in this comprehensive tutorial...",
  "author": "Tech Education Channel",
  "author_id": "UC123456789",
  "author_url": "https://www.youtube.com/channel/UC123456789",
  "duration": 180,
  "audio_title": null,
  "audio_artist": null,
  "view_count": 1000000,
  "like_count": 50000,
  "dislike_count": 100,
  "shares_count": 5000,
  "comment_count": 2500,
  "categories": ["Education", "Technology"],
  "tags": ["machine learning", "tutorial", "AI", "python"],
  "published_at": "2024-01-01T00:00:00+00:00",
  "thumbnail": "https://api.apify.com/v2/key-value-stores/abc123/records/video_thumbnail.png",
  "transcript": {
    "language": "English",
    "text": " Welcome to this tutorial on machine learning. Today we'll explore the fundamental concepts. Machine learning is a subset of artificial intelligence.",
    "segments": [
      {
        "start": "00:00:00.000",
        "end": "00:00:05.000",
        "text": "Welcome to this tutorial on machine learning."
      },
      {
        "start": "00:00:05.000",
        "end": "00:00:10.000",
        "text": "Today we'll explore the fundamental concepts."
      },
      {
        "start": "00:00:10.000",
        "end": "00:00:15.000",
        "text": "Machine learning is a subset of artificial intelligence."
      }
    ]
  },
  "translation": {
    "language": "Spanish",
    "text": " Bienvenido a este tutorial sobre aprendizaje automático. Hoy exploraremos los conceptos fundamentales. El aprendizaje automático es un subconjunto de la inteligencia artificial.",
    "segments": [
      {
        "start": "00:00:00.000",
        "end": "00:00:05.000",
        "text": "Bienvenido a este tutorial sobre aprendizaje automático."
      },
      {
        "start": "00:00:05.000",
        "end": "00:00:10.000",
        "text": "Hoy exploraremos los conceptos fundamentales."
      },
      {
        "start": "00:00:10.000",
        "end": "00:00:15.000",
        "text": "El aprendizaje automático es un subconjunto de la inteligencia artificial."
      }
    ]
  }
}
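
The HH:MM:SS.mmm strings in the segments can be converted to seconds for sorting or alignment. The sketch below assumes, as the field table states, that translated segments share the timing of the original ones and can therefore be paired by position; both function names are hypothetical.

```python
def timestamp_to_seconds(value: str) -> float:
    """Convert an 'HH:MM:SS.mmm' string to seconds."""
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def pair_segments(item: dict) -> list:
    """Pair original and translated segments by position (assumes matching timing)."""
    source = item["transcript"]["segments"]
    target = item.get("translation", {}).get("segments", [])
    return [
        (timestamp_to_seconds(s["start"]), timestamp_to_seconds(s["end"]), s["text"], t["text"])
        for s, t in zip(source, target)
    ]


print(timestamp_to_seconds("00:00:05.000"))  # 5.0
print(timestamp_to_seconds("01:02:03.500"))  # 3723.5
```

If no translation was requested, `pair_segments` returns an empty list because there is nothing to pair.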

🔌 Integrations

Seamlessly connect this actor to your existing pipelines via the Apify API.

Ⓜ️ Make.com Integration

Get Started with Make.com (1000 Free Credits) 🎁

┌──────────────────────────────────────────────┐
│ Step 1: Configure Actor Module               │
│ ├─ Add Module: "Run an Actor"                │
│ ├─ Enable Map: Toggle ON                     │
│ ├─ Actor ID: E9f5oS7cOn2Kgw0uy               │
│ ├─ Refresh: Click Refresh button             │
│ └─ Input JSON: Add video URL and language    │
└──────────────────────────────────────────────┘
                       ↓
┌──────────────────────────────────────────────┐
│ Step 2: Set Execution Mode                   │
│ └─ Run synchronously: YES                    │
└──────────────────────────────────────────────┘
                       ↓
┌──────────────────────────────────────────────┐
│ Step 3: Retrieve Results                     │
│ ├─ Add Module: "Get Dataset Items"           │
│ └─ Dataset ID: defaultDatasetId              │
└──────────────────────────────────────────────┘

🎱 N8N.io Integration

Open Source Workflow Automation ⚡

┌──────────────────────────────────────────────┐
│ Step 1: Add Apify Node                       │
│ ├─ Search: "Run an Actor and get dataset"    │
│ └─ Category: Apify                           │
└──────────────────────────────────────────────┘
                       ↓
┌──────────────────────────────────────────────┐
│ Step 2: Configure Actor                      │
│ ├─ Selection Mode: By ID                     │
│ ├─ Actor ID: E9f5oS7cOn2Kgw0uy               │
│ └─ Paste from Actor ID section above         │
└──────────────────────────────────────────────┘
                       ↓
┌──────────────────────────────────────────────┐
│ Step 3: Set Input Parameters                 │
│ └─ Modify Input JSON with video URL          │
└──────────────────────────────────────────────┘

📚 API Documentation

  • MCP API - Model Context Protocol integration
  • Python API - Complete Python client documentation with examples
  • JavaScript API - Node.js and browser integration guide

🏗️ Metadata for Developers (JSON-LD)

{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "Video To Text - AI Transcription and Translation",
  "alternateName": [
    "Video Transcription Service",
    "Speech to Text Converter",
    "Video Translation Tool",
    "AI Subtitle Generator"
  ],
  "applicationCategory": "DeveloperApplication",
  "applicationSubCategory": "AI Transcription and Translation",
  "operatingSystem": "Cloud",
  "offers": {
    "@type": "Offer",
    "price": "0.25",
    "priceCurrency": "USD",
    "priceValidUntil": "2099-12-31",
    "availability": "https://schema.org/InStock"
  },
  "description": "Transform any video into accurate text transcripts with AI-powered speech recognition. Supports 1000+ platforms, automatic language detection, time-stamped segments, and translation to 100+ languages. Perfect for content creators, researchers, and automation workflows.",
  "featureList": [
    "Automatic language detection from 100+ languages",
    "Time-stamped segments with millisecond precision",
    "Translation to 100+ target languages",
    "Support for 1000+ video platforms",
    "Parallel processing for maximum speed",
    "Comprehensive video metadata extraction",
    "High-quality thumbnail extraction",
    "Export to JSON format",
    "API integration ready for automation"
  ],
  "keywords": "video transcription, speech to text, AI transcription, video translation, subtitle generator, automatic transcription, multilingual transcription, video to text converter, speech recognition API, content transcription, video accessibility, subtitle automation, transcript generation, language translation, video processing, audio transcription, AI speech recognition, automated subtitles, video analytics, content localization",
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.9",
    "ratingCount": "500",
    "bestRating": "5"
  },
  "author": {
    "@type": "Organization",
    "name": "cheapget",
    "url": "https://apify.com/cheapget"
  },
  "softwareVersion": "0.1",
  "datePublished": "2024-01-01",
  "dateModified": "2024-01-15"
}

🚀 Performance Tips

Optimize your transcription runs for speed, cost, and data quality with these best practices:

💰 Cost Optimization

  • Test First: Start with 1-2 videos to verify output quality before processing large batches
  • Skip Translation: Leave translate empty if you only need the original transcript to save $0.10 per video
  • Batch Processing: Process multiple videos in parallel to maximize efficiency and reduce overall runtime costs

⚑ Speed Optimization

  • Increase Memory: For videos longer than 15 minutes, increase the Memory setting in Run options (4096 MB or 8192 MB recommended)
  • Parallel Processing: The Actor automatically uses parallel processing with multiple proxies for faster downloads
  • Short Videos: Videos under 5 minutes process fastest, typically completing in 30-60 seconds

🛡️ Data Quality Tips

  • Clear Audio: Best results with clear speech and minimal background noise
  • Public Videos: Only publicly accessible videos can be processed
  • Supported Formats: All major video formats are supported through automatic audio extraction
  • Language Detection: Automatic language detection works best with at least 10 seconds of clear speech

📊 Best Practices

  • Retry Logic: The Actor automatically retries transcription up to 8 times for reliability
  • Proxy Rotation: Multiple proxies are used automatically to ensure successful downloads
  • Error Handling: Failed downloads or transcriptions will return clear error messages
  • Time Accuracy: Timestamps are accurate to milliseconds for precise subtitle generation

❓ FAQ

What video platforms are supported?

We support 1000+ platforms including YouTube, TikTok, Instagram, Twitter, Facebook, Vimeo, Twitch, LinkedIn, Pinterest, Snapchat, and many more through our advanced extraction engine.

How long does processing take?

Processing time depends on video length and whether translation is requested:

  • Short videos (0-5 min): 30-60 seconds for transcription only
  • Medium videos (5-15 min): 1-3 minutes for transcription only
  • Long videos (15+ min): 3-10 minutes for transcription only
  • With translation: Add 30-60 seconds for parallel translation processing

Can I transcribe videos in any language?

Yes! The Actor supports 100+ languages with automatic detection. The AI model can accurately transcribe speech in all major world languages including English, Chinese, Spanish, French, German, Japanese, Korean, Arabic, Hindi, and many more.

What if the video has no speech?

If the video contains no speech or only background noise, the transcription will fail with an error message: "Transcript generation failed. The video may contain no speech or have excessive background noise."

How does translation work?

Translation uses AI Translator with parallel processing. The original transcript is first generated with time stamps, then each segment is translated independently while preserving the original timing. This ensures accurate translations while maintaining synchronization with the video.
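
The idea of translating segments independently while keeping their timing can be illustrated as follows. This is not the Actor's implementation: `fake_translate` is a hypothetical stand-in for a real translation call, and the thread pool only demonstrates the general pattern.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_translate(text: str) -> str:
    """Hypothetical stand-in for a translation call; it only tags the text."""
    return f"[es] {text}"


def translate_segments(segments: list, translate=fake_translate, workers: int = 4) -> list:
    """Translate each segment on its own, leaving start/end untouched."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so segments stay aligned
        texts = list(pool.map(translate, (seg["text"] for seg in segments)))
    return [
        {"start": seg["start"], "end": seg["end"], "text": text}
        for seg, text in zip(segments, texts)
    ]


segments = [
    {"start": "00:00:00.000", "end": "00:00:05.000", "text": "Welcome."},
    {"start": "00:00:05.000", "end": "00:00:10.000", "text": "Let's begin."},
]
for seg in translate_segments(segments):
    print(seg["start"], seg["end"], seg["text"])
```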

What output formats are available?

The Actor outputs data in JSON format by default. You can export results to CSV or Excel formats using Apify's dataset export features. Time-stamped segments make it easy to generate SRT or VTT subtitle files.
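
As one possible approach, the segments can be turned into SRT cues with a few lines of Python. The function name is hypothetical; the only format detail relied on is that SRT separates milliseconds with a comma, while the Actor's timestamps use a dot.

```python
def segments_to_srt(segments: list) -> str:
    """Render transcript segments as SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        # SRT writes HH:MM:SS,mmm, so swap the dot for a comma
        start = seg["start"].replace(".", ",")
        end = seg["end"].replace(".", ",")
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)


example_segments = [
    {"start": "00:00:00.000", "end": "00:00:05.000", "text": "Welcome to this tutorial on machine learning."},
    {"start": "00:00:05.000", "end": "00:00:10.000", "text": "Today we'll explore the fundamental concepts."},
]
print(segments_to_srt(example_segments))
```

For VTT, the same loop would keep the dot in the timestamps and prepend a `WEBVTT` header line followed by a blank line.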

How can I speed up processing for large videos?

For faster processing of large videos, increase the Memory setting in Run options. Higher memory allocation (4096 MB or 8192 MB) will significantly improve processing speed for videos longer than 15 minutes.
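
When runs are started through the Python client rather than the Console, memory can be passed per call. The helper below is a hypothetical sketch that applies the 15-minute guideline above; it assumes you already know the video's approximate duration, and it relies on the `memory_mbytes` argument of `ActorClient.call` in the `apify-client` package.

```python
def call_options(duration_seconds: int, long_video_memory: int = 4096) -> dict:
    """Return extra keyword arguments for ActorClient.call based on video length."""
    if duration_seconds > 15 * 60:
        return {"memory_mbytes": long_video_memory}
    return {}  # keep the Actor's default memory for shorter videos


print(call_options(5 * 60))         # {}
print(call_options(40 * 60))        # {'memory_mbytes': 4096}
print(call_options(40 * 60, 8192))  # {'memory_mbytes': 8192}

# Usage with the client (not executed here):
# client.actor("cheapget/video-to-text").call(run_input=run_input, **call_options(40 * 60))
```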

Are private videos supported?

No, only publicly accessible videos can be processed. The Actor cannot access videos that require authentication or are restricted to specific users.

🏷️ Video To Text

🔥 Search Terms: video transcription API, speech to text converter, AI video transcription, video to text service, automatic subtitle generator, video translation API, multilingual video processing, audio transcription service, video content analysis, speech recognition API, video subtitle generator, AI transcription tool, video accessibility service, content localization API, video SEO optimization, automated transcription service, video metadata extraction, cross-platform video processing, enterprise video transcription, video content management API, youtube transcription, tiktok transcription, instagram transcription, video analytics, subtitle automation, transcript generation, language translation

💼 Use Case: video-accessibility multilingual-content automated-subtitles content-localization video-seo-optimization enterprise-transcription educational-content media-production content-marketing video-analytics cross-platform-processing speech-analysis content-management video-archiving translation-services accessibility-compliance content-strategy video-workflow-automation multimedia-processing digital-content-optimization research-analysis market-research social-media-monitoring content-creation podcast-transcription interview-transcription lecture-transcription meeting-transcription

This actor extracts publicly available data only. It does not bypass authentication, access private content, or violate platform terms of service. You are responsible for:

  • Data Rights: Ensuring you have permission to collect and use the extracted data
  • Privacy Compliance: Adhering to GDPR, CCPA, and other applicable privacy laws when processing data
  • Platform Terms: Respecting the platform's terms of service and usage policies
  • Ethical Use: Using extracted data responsibly and in compliance with applicable laws

Related Actors

  • Best Job Search - Aggregates job listings from LinkedIn, Indeed, Glassdoor, ZipRecruiter, and regional platforms. Automatically selects optimal platforms based on target country across 60+ regions.
  • Glassdoor Job Search - Extracts crowd-sourced salary ranges, company ratings, employee review counts, and workplace culture data unique to Glassdoor's platform.
  • Indeed Job Search - Scrapes job postings with salary disclosure data, full descriptions, and company profiles from Indeed's aggregated listings across 60+ countries.
  • LinkedIn Job Search - Captures applicant counts, company growth indicators, skills taxonomy, and hiring team visibility specific to LinkedIn's professional network.
  • Best Video Downloader - Downloads videos in 4K/HD/SD quality from 1000+ platforms including YouTube, TikTok, Instagram, and Twitter. Extracts metadata, comment threads, and engagement statistics.
  • TikTok Video Downloader - Downloads watermark-free TikTok videos with quality selection. Captures hashtag trends, audio track details, creator profiles, and viral metrics.
  • Youtube Video Downloader - Downloads YouTube videos with selectable quality. Extracts video metadata, comment sections, thumbnail images, and channel statistics.
  • TikTok Live Recorder - Records TikTok live streams with real-time viewer count tracking, streamer profile data, and engagement metrics during broadcast.
  • TikTok Video Profile - Extracts 50+ data points per TikTok video including metadata, engagement statistics, nested comment threads, and creator information.
  • Video To Text - Transcribes videos from 1000+ platforms using AI. Detects language automatically, generates time-stamped segments, and translates to 100+ languages.
  • Instagram To Text - Transcribes Instagram videos with automatic language detection and multi-language translation capabilities.
  • Social Media Marketing - Generates 864 unique variations from a single video using AI. Creates platform-specific content across 12 platforms, 12 writing tones, and 6 AI models with styled images.
  • Reddit User Profile - Analyzes Reddit user activity with forensic timeline reconstruction, karma distribution, influence patterns, and moderator role identification.
  • Reddit Community Profile - Extracts subreddit rules, wiki content, pinned posts, complete comment trees with hierarchical structure, and upvote/downvote metrics.
  • Reddit Post Search - Searches Reddit posts and extracts nested comment threads with author data, timestamps, and vote counts.
  • Telegram Group Member - Extracts member profiles from Telegram groups. Offers standard mode for public groups and deep search mode for discovering hidden members and historical data.
  • Telegram Channel Message - Scrapes Telegram channel messages with media downloads. Captures view counts, reply threads, forward chains, and reaction data.
  • Telegram Profile - Batch extracts Telegram profiles for users, bots, groups, and channels using MTProto. Retrieves verification status, premium indicators, and privacy settings.
  • Google Business Profile - Extracts Google Business listings from Maps including business details, customer reviews, star ratings, photos, and geographic coordinates.
  • X Community Profile - Scrapes Twitter/X community profiles with follower statistics, engagement metrics, and member activity data.