Video to Text
Pricing
Pay per event
$0.24/Video (Any Duration). AI-powered transcription from 1000+ platforms with automatic language detection, time-stamped segments, and instant translation to 100+ languages.
Rating: 4.5 (3)
Developer: CheapGET
Transform any video into accurate text transcripts with AI-powered speech recognition and translate to 100+ languages instantly.
Extract audio from videos across 1000+ platforms, generate precise time-stamped transcripts with automatic language detection, and translate to any target language, all with enterprise-grade reliability and parallel processing for maximum speed.
Support & Community
- Support: Contact Us
- Community: Telegram Group
Key Features
AI-Powered Transcription
- Advanced AI Model: State-of-the-art speech recognition with optimized performance and accuracy
- Auto Language Detection: Automatically detects spoken language from 100+ supported languages
- Time-Stamped Segments: Precise timestamps for each segment with millisecond accuracy (HH:MM:SS.mmm format)
- High Accuracy: Optimized for various accents, speaking styles, and audio quality conditions
Multi-Language Translation
- 100+ Languages: Support for all major world languages including regional variants
- AI Translator: Advanced translation service with context preservation
- Parallel Processing: Multi-threaded translation for faster processing of long transcripts
- Context-Aware: Maintains meaning and context across segment boundaries
Comprehensive Data Extraction
- Video Metadata: Title, description, duration, publish date, platform identification
- Author Information: Creator name, channel ID, profile URLs
- Engagement Metrics: Views, likes, dislikes, comments, shares (when available)
- Audio Details: Track titles, artists, audio quality information
- Thumbnail: High-quality video thumbnail in PNG format
Pricing
| Resource | Cost | Description |
|---|---|---|
| Actor Usage | $0.00001 | Charged for Actor runtime, proxy and storage. Cost depends on resource consumption during execution |
| Transcript | $0.25 | Charged once per video. Includes AI speech recognition and subtitle generation with timestamps |
| Translation | $0.10 | Charged once per video when translation is requested. Includes AI-powered text translation to target language |
Example Cost Calculation:
- Transcribing 10 videos without translation: (10 × $0.25) = $2.50 + runtime fees
- Transcribing 10 videos with translation to Spanish: (10 × $0.25) + (10 × $0.10) = $3.50 + runtime fees
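The arithmetic above can be sketched as a small helper. This is only an illustration: the name `estimate_cost` is hypothetical, the two per-video prices are copied from the pricing table, and runtime fees are left out because they depend on resource consumption during the run.

```python
from decimal import Decimal

# Per-video prices copied from the pricing table (USD).
TRANSCRIPT_PRICE = Decimal("0.25")
TRANSLATION_PRICE = Decimal("0.10")


def estimate_cost(video_count: int, translate: bool = False) -> Decimal:
    """Estimate per-event charges for a batch, excluding runtime fees."""
    if video_count < 0:
        raise ValueError("video_count must be non-negative")
    per_video = TRANSCRIPT_PRICE
    if translate:
        per_video += TRANSLATION_PRICE
    return video_count * per_video


if __name__ == "__main__":
    print(estimate_cost(10))                  # 10 videos, transcript only
    print(estimate_cost(10, translate=True))  # 10 videos, with translation
```

`Decimal` is used instead of `float` so that sums of cent values stay exact.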
Why choose this Actor?
Built for content creators, researchers, and automation workflows, this Actor transforms videos into structured, searchable text data with translation capabilities.
| Feature | Video To Text | Rev.ai | Otter.ai | Descript |
|---|---|---|---|---|
| Pricing Model | Pay per use | Per minute | Monthly plans | Subscription |
| Platforms | 1000+ sites | Upload only | Upload only | Upload only |
| Languages | 100+ languages | 30+ languages | English focus | Limited |
| Translation | Included | Not supported | Not supported | Not supported |
| Timestamps | Millisecond | Yes | Yes | Yes |
| API Access | Full API | Yes | Limited | Limited |
| Setup Time | Instant | Account required | Account required | Complex setup |
| Min. Cost | $0.25 | $0.02/min | $8.33/month | $12/month |
Input Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| video_url | string | Yes | Video URL from any supported platform. Supports YouTube, TikTok, Instagram, Twitter, Facebook, Vimeo, Twitch, and 1000+ other platforms |
| translate | string | No | Target language for translation. Choose from 100+ supported languages (e.g., "english", "chinese", "spanish", "french"). Leave empty to skip translation |
Supported Languages
The service supports 100+ languages including:
Major Languages: English, Chinese (Simplified/Traditional), Spanish, French, German, Japanese, Korean, Arabic, Hindi, Portuguese, Russian, Italian, Dutch, Swedish, Norwegian, Danish, Finnish, Polish, Czech, Hungarian, Romanian, Bulgarian, Greek, Turkish, Hebrew, Thai, Vietnamese, Indonesian, Malay, Filipino, Swahili
Regional Languages: Including various African, Asian, European, and American indigenous languages
Default Behavior: If translate is not specified or left empty, only the original transcript will be generated without translation
Example Input
{"video_url": "https://www.youtube.com/watch?v=2TK9tFZoBRg","translate": "spanish"}
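As a rough sketch, the same input could be sent from Python with the `apify-client` package. The helper names `build_run_input` and `run_video_to_text` are hypothetical, and the Actor name `cheapget/video-to-text` is an assumption taken from the `processor` URL in the example output; an Apify API token is required for the actual run.

```python
from typing import Any, Dict, List, Optional


def build_run_input(video_url: str, translate: Optional[str] = None) -> Dict[str, str]:
    """Build the Actor input; `translate` is omitted when no translation is wanted."""
    if not video_url.startswith(("http://", "https://")):
        raise ValueError("video_url must be an absolute http(s) URL")
    run_input = {"video_url": video_url}
    if translate:
        run_input["translate"] = translate
    return run_input


def run_video_to_text(
    token: str, video_url: str, translate: Optional[str] = None
) -> List[Dict[str, Any]]:
    """Run the Actor and return the items from its default dataset."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    # Assumed Actor name; adjust it if the Actor is published under another name.
    run = client.actor("cheapget/video-to-text").call(
        run_input=build_run_input(video_url, translate)
    )
    if run is None:
        raise RuntimeError("Actor run did not return any run information")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Calling `run_video_to_text("<YOUR_API_TOKEN>", "https://www.youtube.com/watch?v=2TK9tFZoBRg", "spanish")` would then be expected to return a list of result items shaped like the output described in the next section.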
Output Structure
| Field | Type | Description |
|---|---|---|
| processor | string | URL of the Apify actor that processed this data |
| processed_at | string | ISO 8601 timestamp when the data was processed (format: YYYY-MM-DDTHH:MM:SS+00:00) |
| platform | string | Source platform identifier (e.g., Youtube, TikTok, Instagram, Twitter, Facebook) |
| title | string | Video title |
| description | string | Video description/caption text |
| author | string | Channel/creator name |
| author_id | string | Channel/creator unique identifier |
| author_url | string | Channel/creator profile URL |
| duration | integer | Video duration in seconds |
| audio_title | string | Audio track title (if music metadata is present) |
| audio_artist | string | Audio artist name (if music metadata is present) |
| view_count | integer | Number of views (platform-dependent availability) |
| like_count | integer | Number of likes (platform-dependent availability) |
| dislike_count | integer | Number of dislikes (platform-dependent availability) |
| shares_count | integer | Number of shares/reposts (platform-dependent availability) |
| comment_count | integer | Number of comments (platform-dependent availability) |
| categories | array | Video categories assigned by the platform |
| tags | array | Video tags/keywords |
| published_at | string | ISO 8601 timestamp when the video was published (format: YYYY-MM-DDTHH:MM:SS+00:00) |
| thumbnail | string | URL to the video thumbnail image (PNG format, stored in Apify key-value store) |
| transcript | object | Original transcription data with language detection and time-stamped segments |
| transcript.language | string | Detected language name (e.g., "English", "Spanish", "Chinese") |
| transcript.text | string | Full transcribed text concatenated from all segments |
| transcript.segments | array | Array of time-stamped transcript segments |
| transcript.segments[].start | string | Segment start time in HH:MM:SS.mmm format |
| transcript.segments[].end | string | Segment end time in HH:MM:SS.mmm format |
| transcript.segments[].text | string | Transcribed text for this segment |
| translation | object | Translated transcription data (only present if translation was requested) |
| translation.language | string | Target language name (e.g., "English", "Spanish", "Chinese") |
| translation.text | string | Full translated text concatenated from all segments |
| translation.segments | array | Array of time-stamped translation segments |
| translation.segments[].start | string | Segment start time in HH:MM:SS.mmm format (matches original transcript timing) |
| translation.segments[].end | string | Segment end time in HH:MM:SS.mmm format (matches original transcript timing) |
| translation.segments[].text | string | Translated text for this segment |
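Because `start` and `end` are strings in HH:MM:SS.mmm format, numeric work such as measuring segment length needs a conversion first. The following is a minimal sketch; the helper names `timestamp_to_seconds` and `segment_duration` are hypothetical and not part of the Actor.

```python
import re

# HH:MM:SS.mmm, as described in the output table.
_TIMESTAMP = re.compile(r"^(\d{2,}):([0-5]\d):([0-5]\d)\.(\d{3})$")


def timestamp_to_seconds(value: str) -> float:
    """Convert an HH:MM:SS.mmm string to seconds."""
    match = _TIMESTAMP.match(value)
    if match is None:
        raise ValueError(f"not an HH:MM:SS.mmm timestamp: {value!r}")
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def segment_duration(segment: dict) -> float:
    """Length of one transcript or translation segment in seconds."""
    return timestamp_to_seconds(segment["end"]) - timestamp_to_seconds(segment["start"])
```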
Example Output
{
  "processor": "https://apify.com/cheapget/video-to-text?fpr=aiagentapi",
  "processed_at": "2024-01-15T10:30:00+00:00",
  "platform": "Youtube",
  "title": "Introduction to Machine Learning",
  "description": "Learn the basics of machine learning in this comprehensive tutorial...",
  "author": "Tech Education Channel",
  "author_id": "UC123456789",
  "author_url": "https://www.youtube.com/channel/UC123456789",
  "duration": 180,
  "audio_title": null,
  "audio_artist": null,
  "view_count": 1000000,
  "like_count": 50000,
  "dislike_count": 100,
  "shares_count": 5000,
  "comment_count": 2500,
  "categories": ["Education", "Technology"],
  "tags": ["machine learning", "tutorial", "AI", "python"],
  "published_at": "2024-01-01T00:00:00+00:00",
  "thumbnail": "https://api.apify.com/v2/key-value-stores/abc123/records/video_thumbnail.png",
  "transcript": {
    "language": "English",
    "text": " Welcome to this tutorial on machine learning. Today we'll explore the fundamental concepts. Machine learning is a subset of artificial intelligence.",
    "segments": [
      {"start": "00:00:00.000", "end": "00:00:05.000", "text": "Welcome to this tutorial on machine learning."},
      {"start": "00:00:05.000", "end": "00:00:10.000", "text": "Today we'll explore the fundamental concepts."},
      {"start": "00:00:10.000", "end": "00:00:15.000", "text": "Machine learning is a subset of artificial intelligence."}
    ]
  },
  "translation": {
    "language": "Spanish",
    "text": " Bienvenido a este tutorial sobre aprendizaje automático. Hoy exploraremos los conceptos fundamentales. El aprendizaje automático es un subconjunto de la inteligencia artificial.",
    "segments": [
      {"start": "00:00:00.000", "end": "00:00:05.000", "text": "Bienvenido a este tutorial sobre aprendizaje automático."},
      {"start": "00:00:05.000", "end": "00:00:10.000", "text": "Hoy exploraremos los conceptos fundamentales."},
      {"start": "00:00:10.000", "end": "00:00:15.000", "text": "El aprendizaje automático es un subconjunto de la inteligencia artificial."}
    ]
  }
}
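One way to consume an item like this is to line up each original segment with its translation, for example to build a bilingual table. The sketch below assumes, as the field descriptions state, that translated segments reuse the `start` and `end` values of the original segments; `paired_segments` is a hypothetical helper name.

```python
from typing import Dict, Iterator, Tuple


def paired_segments(item: Dict) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (start, end, original_text, translated_text) rows.

    Segments without a matching translation get an empty string, which
    also covers items where no translation was requested.
    """
    translated = {
        (seg["start"], seg["end"]): seg["text"]
        for seg in (item.get("translation") or {}).get("segments", [])
    }
    for seg in item["transcript"]["segments"]:
        key = (seg["start"], seg["end"])
        yield seg["start"], seg["end"], seg["text"], translated.get(key, "")
```

Matching on the timing pair rather than on list position keeps the rows aligned even if one list turns out shorter than the other.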
Integrations
Seamlessly connect this actor to your existing pipelines via the Apify API.
Make.com Integration
Get Started with Make.com (1000 Free Credits)
Step 1: Configure Actor Module
- Add Module: "Run an Actor"
- Enable Map: Toggle ON
- Actor ID: E9f5oS7cOn2Kgw0uy
- Refresh: Click Refresh button
- Input JSON: Add video URL and language
Step 2: Set Execution Mode
- Run synchronously: YES
Step 3: Retrieve Results
- Add Module: "Get Dataset Items"
- Dataset ID: defaultDatasetId
N8N.io Integration
Open Source Workflow Automation
Step 1: Add Apify Node
- Search: "Run an Actor and get dataset"
- Category: Apify
Step 2: Configure Actor
- Selection Mode: By ID
- Actor ID: E9f5oS7cOn2Kgw0uy (paste from the Actor ID section above)
Step 3: Set Input Parameters
- Modify Input JSON with video URL
API Documentation
- MCP API - Model Context Protocol integration
- Python API - Complete Python client documentation with examples
- JavaScript API - Node.js and browser integration guide
Metadata for Developers (JSON-LD)
{"@context": "https://schema.org","@type": "SoftwareApplication","name": "Video To Text - AI Transcription and Translation","alternateName": ["Video Transcription Service","Speech to Text Converter","Video Translation Tool","AI Subtitle Generator"],"applicationCategory": "DeveloperApplication","applicationSubCategory": "AI Transcription and Translation","operatingSystem": "Cloud","offers": {"@type": "Offer","price": "0.25","priceCurrency": "USD","priceValidUntil": "2099-12-31","availability": "https://schema.org/InStock"},"description": "Transform any video into accurate text transcripts with AI-powered speech recognition. Supports 1000+ platforms, automatic language detection, time-stamped segments, and translation to 100+ languages. Perfect for content creators, researchers, and automation workflows.","featureList": ["Automatic language detection from 100+ languages","Time-stamped segments with millisecond precision","Translation to 100+ target languages","Support for 1000+ video platforms","Parallel processing for maximum speed","Comprehensive video metadata extraction","High-quality thumbnail extraction","Export to JSON format","API integration ready for automation"],"keywords": "video transcription, speech to text, AI transcription, video translation, subtitle generator, automatic transcription, multilingual transcription, video to text converter, speech recognition API, content transcription, video accessibility, subtitle automation, transcript generation, language translation, video processing, audio transcription, AI speech recognition, automated subtitles, video analytics, content localization","aggregateRating": {"@type": "AggregateRating","ratingValue": "4.9","ratingCount": "500","bestRating": "5"},"author": {"@type": "Organization","name": "cheapget","url": "https://apify.com/cheapget"},"softwareVersion": "0.1","datePublished": "2024-01-01","dateModified": "2024-01-15"}
Performance Tips
Optimize your transcription runs for speed, cost, and data quality with these best practices:
Cost Optimization
- Test First: Start with 1-2 videos to verify output quality before processing large batches
- Skip Translation: Leave the translate parameter empty if you only need the original transcript to save $0.10 per video
- Batch Processing: Process multiple videos in parallel to maximize efficiency and reduce overall runtime costs
Speed Optimization
- Increase Memory: For videos longer than 15 minutes, increase the Memory setting in Run options (4096 MB or 8192 MB recommended)
- Parallel Processing: The Actor automatically uses parallel processing with multiple proxies for faster downloads
- Short Videos: Videos under 5 minutes process fastest, typically completing in 30-60 seconds
Data Quality Tips
- Clear Audio: Best results with clear speech and minimal background noise
- Public Videos: Only publicly accessible videos can be processed
- Supported Formats: All major video formats are supported through automatic audio extraction
- Language Detection: Automatic language detection works best with at least 10 seconds of clear speech
Best Practices
- Retry Logic: The Actor automatically retries transcription up to 8 times for reliability
- Proxy Rotation: Multiple proxies are used automatically to ensure successful downloads
- Error Handling: Failed downloads or transcriptions will return clear error messages
- Time Accuracy: Timestamps are accurate to milliseconds for precise subtitle generation
FAQ
What video platforms are supported?
We support 1000+ platforms including YouTube, TikTok, Instagram, Twitter, Facebook, Vimeo, Twitch, LinkedIn, Pinterest, Snapchat, and many more through our advanced extraction engine.
How long does processing take?
Processing time depends on video length and whether translation is requested:
- Short videos (0-5 min): 30-60 seconds for transcription only
- Medium videos (5-15 min): 1-3 minutes for transcription only
- Long videos (15+ min): 3-10 minutes for transcription only
- With translation: Add 30-60 seconds for parallel translation processing
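For planning batches, the ranges above can be turned into a rough lookup. This is only a sketch of the published estimates, not a guarantee of actual run time; `estimated_seconds` is a hypothetical name, and treating a video of exactly 5 or 15 minutes as belonging to the longer bracket is an assumption.

```python
from typing import Tuple


def estimated_seconds(duration_seconds: int, translate: bool = False) -> Tuple[int, int]:
    """Return a (low, high) processing-time estimate in seconds."""
    if duration_seconds < 5 * 60:
        low, high = 30, 60        # short videos
    elif duration_seconds < 15 * 60:
        low, high = 60, 180       # medium videos: 1-3 minutes
    else:
        low, high = 180, 600      # long videos: 3-10 minutes
    if translate:
        # Translation adds roughly 30-60 seconds on top.
        low, high = low + 30, high + 60
    return low, high
```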
Can I transcribe videos in any language?
Yes! The Actor supports 100+ languages with automatic detection. The AI model can accurately transcribe speech in all major world languages including English, Chinese, Spanish, French, German, Japanese, Korean, Arabic, Hindi, and many more.
What if the video has no speech?
If the video contains no speech or only background noise, the transcription will fail with an error message: "Transcript generation failed. The video may contain no speech or have excessive background noise."
How does translation work?
Translation uses an AI translator with parallel processing. The original transcript is generated first, with timestamps; each segment is then translated independently and keeps its original start and end times. This keeps the translation synchronized with the video.
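The idea of translating segments independently while keeping their timing can be illustrated with a short sketch. This is not the Actor's actual implementation: `translate_segments` is a hypothetical name, and `translate_text` stands in for whatever translation call is used.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def translate_segments(
    segments: List[Dict[str, str]],
    translate_text: Callable[[str], str],
    max_workers: int = 4,
) -> List[Dict[str, str]]:
    """Translate each segment's text in parallel, leaving start/end untouched."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so timing stays aligned.
        texts = list(pool.map(translate_text, (seg["text"] for seg in segments)))
    return [
        {"start": seg["start"], "end": seg["end"], "text": text}
        for seg, text in zip(segments, texts)
    ]
```

Because only the `text` field is replaced, the translated list can be used as a drop-in subtitle track for the same video.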
What output formats are available?
The Actor outputs data in JSON format by default. You can export results to CSV or Excel formats using Apify's dataset export features. Time-stamped segments make it easy to generate SRT or VTT subtitle files.
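As an illustration of the subtitle use case, the following sketch renders a `segments` array as SRT text. The helper name `segments_to_srt` is hypothetical; the only conversion needed is that SRT writes the millisecond separator as a comma rather than a dot.

```python
from typing import Dict, List


def segments_to_srt(segments: List[Dict[str, str]]) -> str:
    """Render time-stamped segments as SubRip (SRT) text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        # SRT uses a comma before the milliseconds instead of a dot.
        start = seg["start"].replace(".", ",")
        end = seg["end"].replace(".", ",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}")
    return "\n\n".join(blocks) + ("\n" if blocks else "")
```

Passing `item["transcript"]["segments"]` or `item["translation"]["segments"]` produces a subtitle track in the original or the target language. A WebVTT variant would keep the dot in the timestamps and start the file with a `WEBVTT` header line.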
How can I speed up processing for large videos?
For faster processing of large videos, increase the Memory setting in Run options. Higher memory allocation (4096 MB or 8192 MB) will significantly improve processing speed for videos longer than 15 minutes.
Are private videos supported?
No, only publicly accessible videos can be processed. The Actor cannot access videos that require authentication or are restricted to specific users.
Legal & Compliance
This actor extracts publicly available data only. It does not bypass authentication, access private content, or violate platform terms of service. You are responsible for:
- Data Rights: Ensuring you have permission to collect and use the extracted data
- Privacy Compliance: Adhering to GDPR, CCPA, and other applicable privacy laws when processing data
- Platform Terms: Respecting the platform's terms of service and usage policies
- Ethical Use: Using extracted data responsibly and in compliance with applicable laws
Related Actors
- Best Job Search - Aggregates job listings from LinkedIn, Indeed, Glassdoor, ZipRecruiter, and regional platforms. Automatically selects optimal platforms based on target country across 60+ regions.
- Glassdoor Job Search - Extracts crowd-sourced salary ranges, company ratings, employee review counts, and workplace culture data unique to Glassdoor's platform.
- Indeed Job Search - Scrapes job postings with salary disclosure data, full descriptions, and company profiles from Indeed's aggregated listings across 60+ countries.
- LinkedIn Job Search - Captures applicant counts, company growth indicators, skills taxonomy, and hiring team visibility specific to LinkedIn's professional network.
- Best Video Downloader - Downloads videos in 4K/HD/SD quality from 1000+ platforms including YouTube, TikTok, Instagram, and Twitter. Extracts metadata, comment threads, and engagement statistics.
- TikTok Video Downloader - Downloads watermark-free TikTok videos with quality selection. Captures hashtag trends, audio track details, creator profiles, and viral metrics.
- YouTube Video Downloader - Downloads YouTube videos with selectable quality. Extracts video metadata, comment sections, thumbnail images, and channel statistics.
- TikTok Live Recorder - Records TikTok live streams with real-time viewer count tracking, streamer profile data, and engagement metrics during broadcast.
- TikTok Video Profile - Extracts 50+ data points per TikTok video including metadata, engagement statistics, nested comment threads, and creator information.
- Video To Text - Transcribes videos from 1000+ platforms using AI. Detects language automatically, generates time-stamped segments, and translates to 100+ languages.
- Instagram To Text - Transcribes Instagram videos with automatic language detection and multi-language translation capabilities.
- Social Media Marketing - Generates 864 unique variations from a single video using AI. Creates platform-specific content across 12 platforms, 12 writing tones, and 6 AI models with styled images.
- Reddit User Profile - Analyzes Reddit user activity with forensic timeline reconstruction, karma distribution, influence patterns, and moderator role identification.
- Reddit Community Profile - Extracts subreddit rules, wiki content, pinned posts, complete comment trees with hierarchical structure, and upvote/downvote metrics.
- Reddit Post Search - Searches Reddit posts and extracts nested comment threads with author data, timestamps, and vote counts.
- Telegram Group Member - Extracts member profiles from Telegram groups. Offers standard mode for public groups and deep search mode for discovering hidden members and historical data.
- Telegram Channel Message - Scrapes Telegram channel messages with media downloads. Captures view counts, reply threads, forward chains, and reaction data.
- Telegram Profile - Batch extracts Telegram profiles for users, bots, groups, and channels using MTProto. Retrieves verification status, premium indicators, and privacy settings.
- Google Business Profile - Extracts Google Business listings from Maps including business details, customer reviews, star ratings, photos, and geographic coordinates.
- X Community Profile - Scrapes Twitter/X community profiles with follower statistics, engagement metrics, and member activity data.

