Universal Speech to Text Transcriber

Transcribe audio from videos stored on Google Drive, Dropbox, GitHub Raw, OneDrive, SoundCloud, iCloud, AWS S3, GCS, X, TikTok, and more. Converts share links to direct downloads for fast, accurate transcripts with timestamps and easy API integration.

Developer: TicTech (maintained by Community)

Multi-Provider Audio Transcriber

Extract and transcribe audio from video and audio files across cloud storage, social platforms, and direct media links. Paste a public URL, get a timestamped transcript with automatic language detection.

What You Can Transcribe

Paste one public URL per run from any of the sources below.

Video & audio platforms

  • TikTok — public video links
  • SoundCloud — public tracks and podcasts (single track URLs only, not /sets/ playlists)
  • Vimeo — public videos
  • Twitter / X — public video posts
  • Facebook — public videos (facebook.com, fb.watch)
  • Twitch — public clips and VODs
  • Reddit — public video posts
  • Dailymotion — public videos
  • LinkedIn — public video posts

Many other public video/audio URLs are attempted automatically when a link looks downloadable.

Cloud storage

  • Google Drive, Dropbox, GitHub Raw, OneDrive / SharePoint, Box, iCloud Drive
  • AWS S3, Google Cloud Storage, Azure Blob, Backblaze B2

Share links are converted to direct downloads automatically where needed.

Direct media links

  • Public URLs that return the actual file (not an HTML page)
  • CDN links: CloudFront, Akamai, custom CDNs
  • Direct file URLs: .mp4, .mp3, .wav, .m4a, and other common formats
  • Presigned or signed URLs that point straight to media

Examples:

{ "start_urls": "https://soundcloud.com/artist/track-name" }
{ "start_urls": "https://cdn.example.com/podcast/episode.mp3" }

Note: Only publicly accessible content is supported. Private, login-required, geo-blocked, or age-restricted media may fail.

Smart processing: Detects file type automatically — extracts audio from video or transcribes audio files directly.

Why Use This Tool?

  • Automate tedious work: No more manual transcription—get results in minutes, not hours.
  • Unlock hidden insights: Search, analyze, and repurpose your video and audio content for new opportunities.
  • Boost accessibility: Make your content inclusive for all audiences with accurate transcripts.
  • Enhance discoverability: Improve SEO and compliance with rich, timestamped text.
  • Stay ahead: Track trends, monitor competitors, and optimize your content strategy with ease.

Why Choose This Tool?

  • 🔄 Multi-Provider Support: Cloud storage, social/video platforms, and direct CDN/media links
  • 🎯 Dual File Processing: Handle both video and audio files with a single tool - no need for separate solutions
  • ⚡ Smart Automation: Automatic link conversion, file type detection, and processing optimization
  • 🛡️ Enterprise Reliability: Built-in retry logic, memory optimization, and robust error handling
  • 🌍 Language Intelligence: Automatic language detection across the languages listed under Supported Languages
  • 📊 Rich Output: Timestamped transcripts with metadata for comprehensive content analysis
  • 💾 Memory Efficient: Optimized for cloud environments with automatic cleanup and resource management

Features

  • Multi-Provider Support: Cloud storage, video platforms (TikTok, Vimeo, etc.), and direct CDN links
  • Dual File Support: Handles both video and audio files seamlessly
  • Smart Processing: Automatically detects file type and processes accordingly
  • Automatic Link Conversion: Converts share links to direct download URLs automatically
  • Smart Download Strategy: Platform-specific optimization for each provider
  • Retry Logic: Robust download with automatic retry mechanisms
  • Timestamped Output: Transcript with timestamps and the detected language
  • Language Detection: Detects the dominant spoken language (see Supported Languages below)
  • Memory Optimization: Memory-optimized processing with automatic cleanup
  • File Size Limits: Maximum 3GB per file to ensure reliable processing and cost efficiency
  • Memory Requirements: Larger files require increased Apify run memory allocation (see the Memory Requirements section for details)

Pricing

Base + Per-Second Model

This Actor uses a pay-per-event pricing model with a base charge plus per-second transcription:

Pricing Structure

  • actor start: $0.01, charged once per Actor run
  • transcription-second: $0.0025 per second of audio transcribed

Key Features

  • Actor start: $0.01, charged once per run
  • Per-second transcription: charged on the actual audio duration
  • Transcription duration is rounded up to the nearest second
  • No hidden fees or additional charges
  • Events are tracked automatically by the Apify platform

Pricing Examples

Successful Transcription (30-second file)

  • actor start: $0.01
  • transcription-second × 30: $0.075
  • Total: $0.085

Short Video (2 seconds)

  • actor start: $0.01
  • transcription-second × 2: $0.005
  • Total: $0.015

Failed Transcription

  • actor start: $0.01
  • Total: $0.01

Long Video (5 minutes)

  • actor start: $0.01
  • transcription-second × 300: $0.75
  • Total: $0.76
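
The examples above follow one formula: the start charge plus the per-second price times the duration rounded up to a whole second. A minimal sketch for estimating a run before starting it is shown below. The function and constant names are illustrative and not part of the Actor; the prices are the ones listed under Pricing Structure.

```python
import math
from decimal import Decimal

# Event prices as listed under Pricing Structure.
ACTOR_START_USD = Decimal("0.01")    # charged once per run
PER_SECOND_USD = Decimal("0.0025")   # charged per transcribed second


def estimate_cost(duration_sec: float, succeeded: bool = True) -> Decimal:
    """Estimate the charge for one run (illustrative helper, not an Actor API)."""
    if not succeeded:
        # A failed transcription is charged only the start event.
        return ACTOR_START_USD
    billed_seconds = math.ceil(duration_sec)  # duration is rounded up
    return ACTOR_START_USD + PER_SECOND_USD * billed_seconds


if __name__ == "__main__":
    for seconds in (2, 30, 300):
        print(seconds, "seconds:", estimate_cost(seconds), "USD")
```

Decimal is used instead of float so the totals match the listed prices without binary rounding noise.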

Why This Pricing Model?

  • Fair: Base charge covers setup costs, per-second covers actual processing
  • Transparent: Clear base + usage pricing structure
  • Predictable: Easy to estimate costs before running
  • Cost-effective: Only pay for actual audio duration beyond base charge
  • Simple: Just two event types to track
  • Low cost on failure: A failed run is charged only the $0.01 actor start event

Note: Pricing is based on actual processing events, so you pay only for what the Actor does: the $0.01 start event plus the seconds transcribed. There are no hidden fees.

Supported Sources (reference)

The list below mirrors What You Can Transcribe above with a few extra format notes.

Cloud Storage

  • Google Drive – share links auto-converted to direct downloads
  • Dropbox – share links auto-converted to direct downloads
  • GitHub Raw – direct links to files in repositories (/raw/ URLs)
  • OneDrive & SharePoint – share links auto-converted to direct downloads
  • Box – share links auto-converted to direct downloads
  • iCloud Drive – share links auto-converted to direct downloads
  • AWS S3 – public buckets, objects, and presigned URLs
  • Google Cloud Storage – public objects and signed URLs
  • Azure Blob Storage – public blobs and SAS URLs
  • Backblaze B2 – public files and download URLs
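
To illustrate what "share links auto-converted to direct downloads" can mean in practice, here is a rough sketch covering two providers. It uses widely known public URL patterns (the Google Drive `uc?export=download&id=` form and the Dropbox `dl=1` parameter). Whether the Actor uses these exact rewrites internally is an assumption, and `to_direct_download` is a hypothetical helper name.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def to_direct_download(url: str) -> str:
    """Rewrite a share link to a direct-download URL where a pattern is known.

    Illustrative only: handles Google Drive and Dropbox, and returns any
    other URL unchanged.
    """
    parts = urlparse(url)
    host = parts.netloc.lower()

    if host == "drive.google.com":
        # Share links look like /file/d/FILE_ID/view
        match = re.match(r"^/file/d/([^/]+)", parts.path)
        if match:
            file_id = match.group(1)
            return f"https://drive.google.com/uc?export=download&id={file_id}"

    if host in ("dropbox.com", "www.dropbox.com"):
        # dl=0 opens the preview page, dl=1 requests the file itself.
        query = dict(parse_qsl(parts.query))
        query["dl"] = "1"
        return urlunparse(parts._replace(query=urlencode(query)))

    return url


if __name__ == "__main__":
    print(to_direct_download("https://drive.google.com/file/d/1ABC123DEF456/view"))
    print(to_direct_download("https://www.dropbox.com/s/abc123def456/video.mp4?dl=0"))
```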

Video & Social Platforms

Public posts and videos from these platforms (via yt-dlp):

  • TikTok – tiktok.com, vm.tiktok.com
  • Vimeo – public videos
  • Twitter / X – public video posts
  • Facebook – public videos (facebook.com, fb.watch)
  • Twitch – public clips and VODs
  • Reddit – public video posts
  • Dailymotion – public videos
  • SoundCloud – public single tracks (not /sets/ playlists)
  • LinkedIn – public video posts

Note: Only publicly accessible content is supported. Private, login-required, geo-blocked, or age-restricted media may fail.

Direct Media Links

Any direct file URL that returns the actual audio or video file over HTTP(S):

  • CDN links (e.g. CloudFront, Akamai, custom CDNs)
  • Signed or presigned URLs that point directly to .mp4, .mp3, .wav, etc.
  • Public hosting URLs where the response is the media file (not an HTML page)

Examples:

{ "start_urls": "https://cdn.example.com/media/episode.mp3" }
{ "start_urls": "https://d111111abcdef8.cloudfront.net/video.mp4" }
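
Before submitting a direct link, it can help to check that the URL returns media rather than an HTML page. The sketch below does this with a HEAD request and a look at the Content-Type header. It is a pre-flight check you could run yourself, not a description of the Actor's own validation; the helper names are hypothetical, and some servers reject HEAD requests or report a generic content type.

```python
import urllib.request

MEDIA_PREFIXES = ("audio/", "video/")


def is_media_content_type(content_type: str) -> bool:
    """Return True if a Content-Type header value looks like a media file."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    main_type = content_type.split(";", 1)[0].strip().lower()
    # Many storage services serve files as generic binary data.
    return main_type.startswith(MEDIA_PREFIXES) or main_type == "application/octet-stream"


def url_returns_media(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request and inspect the reported content type."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return is_media_content_type(response.headers.get("Content-Type", ""))
```

A result of False for a share page (for example a `text/html` response) suggests the link still points at a viewer page rather than the file.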

Supported File Types

Video Files

  • MP4, AVI, MOV, WMV, FLV, WebM, MKV, 3GP, and other common video formats
  • Audio extraction: Automatically extracts audio track for transcription
  • Optimized audio extraction: 16kHz mono output with 64k bitrate for minimal file size
  • Format detection: Automatically detects video format and handles accordingly
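
If you want to pre-extract audio locally with the same target settings (16kHz, mono, 64k bitrate), the sketch below builds an equivalent ffmpeg command. This README does not state which tool the Actor uses for extraction, so ffmpeg is an assumption here and `build_extract_command` is a hypothetical helper. The flags themselves are standard ffmpeg options.

```python
import subprocess


def build_extract_command(video_path: str, audio_path: str) -> list[str]:
    """Build an ffmpeg command that extracts a small speech-friendly audio track."""
    return [
        "ffmpeg",
        "-i", video_path,   # input video file
        "-vn",              # drop the video stream
        "-ac", "1",         # mono
        "-ar", "16000",     # 16 kHz sample rate
        "-b:a", "64k",      # 64 kbit/s audio bitrate
        audio_path,         # output format is inferred from the extension
    ]


def extract_audio(video_path: str, audio_path: str) -> None:
    """Run the command; requires ffmpeg to be installed and on PATH."""
    subprocess.run(build_extract_command(video_path, audio_path), check=True)
```

Uploading the extracted audio instead of the original video also keeps the file well under the 3GB limit.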

Audio Files

  • MP3, WAV, FLAC, M4A, AAC, OGG, WMA, AIFF, and other common audio formats
  • Direct transcription: Transcribes audio files without conversion
  • High-quality processing: Maintains original audio quality during transcription
  • Format detection: Automatically identifies audio format and handles accordingly
  • Longer duration support: Audio files are much smaller than video files, allowing transcription of longer content within the 3GB limit

Note: This tool intelligently processes both video and audio files. For videos, it extracts the audio track before transcription. For audio files, it transcribes directly for maximum efficiency.

💡 Audio vs Video: A 1-hour audio file (MP3) is typically 50-100MB, while a 1-hour video file can be 1-5GB. Use audio files for longer content to maximize transcription duration within the file size limit!

Use Cases

  • Content Creators: Generate captions, repurpose content, and analyze performance.
  • Marketers: Track brand mentions, analyze trends, and optimize strategy.
  • Researchers: Study video and audio content and extract insights from public discourse.
  • Businesses: Create training materials, monitor feedback, and maintain compliance records.
  • Developers: Transcribe documentation videos, tutorials, and demos stored in cloud platforms.
  • Podcasters: Transcribe audio episodes for accessibility and SEO.
  • Educators: Convert lecture recordings to searchable text.

Supported Languages

  • Spanish: es
  • English: en
  • Hindi: hi
  • Japanese: ja
  • Russian: ru
  • Ukrainian: uk
  • Swedish: sv
  • Chinese: zh
  • Portuguese: pt
  • Dutch: nl
  • Turkish: tr
  • French: fr
  • German: de
  • Indonesian: id
  • Korean: ko
  • Italian: it

Input Examples

Basic Input

{
"start_urls": "https://drive.google.com/file/d/example/view"
}

Provider-Specific Examples

Google Drive

{
"start_urls": "https://drive.google.com/file/d/1ABC123DEF456/view"
}

Dropbox

{
"start_urls": "https://www.dropbox.com/s/abc123def456/video.mp4?dl=0"
}

GitHub Raw

{
"start_urls": "https://raw.githubusercontent.com/username/repo/main/video.mp4"
}

OneDrive

{
"start_urls": "https://1drv.ms/v/s!ABC123DEF456"
}

Box

{
"start_urls": "https://app.box.com/s/abc123def456"
}

AWS S3

{
"start_urls": "https://bucket-name.s3.amazonaws.com/video.mp4"
}

TikTok

{
"start_urls": "https://www.tiktok.com/@user/video/VIDEO_ID"
}

Direct CDN / media file

{
"start_urls": "https://cdn.example.com/files/podcast.mp3"
}

Output Example (Success)

{
"sourceUrl": "https://drive.google.com/file/d/example/view",
"videoId": "unique-video-id",
"status": "success",
"durationSec": 45.2,
"transcript": "[0.00s - 2.50s] Welcome to my channel! [2.50s - 8.10s] Today I'm going to show you how to make the perfect pasta dish.",
"detected_language": "en",
"timestamp": "2024-01-01T12:00:00.000Z"
}

Output Example (Unsupported URL)

{
"sourceUrl": "https://example.com/not-a-media-link",
"videoId": "unique-video-id",
"status": "success",
"transcript": "The input link is not supported by this Actor. Please review the README for supported cloud storage services, video platforms, and direct media/CDN link formats, then submit a publicly accessible URL.",
"durationSec": 0,
"timestamp": "2024-01-01T12:00:00.000Z"
}

Output Fields

  • sourceUrl: Original file URL
  • videoId: Unique identifier for the file
  • status: "success" or "failed"
  • durationSec: File duration in seconds (only for success)
  • transcript: Transcription with timestamps (only for success) or error message (for failed outputs)
  • detected_language: BCP-47 code for the detected language (only for success)
  • timestamp: Time the file was processed, as an ISO 8601 UTC string
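
The transcript field is a single string with inline markers such as `[0.00s - 2.50s]`. A small sketch for splitting it into structured segments follows. It assumes every segment uses exactly the marker format shown in the success example; the `Segment` type and `parse_transcript` name are illustrative.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


# Matches markers like "[0.00s - 2.50s]" as shown in the output example.
MARKER_RE = re.compile(r"\[(\d+(?:\.\d+)?)s - (\d+(?:\.\d+)?)s\]\s*")


def parse_transcript(transcript: str) -> list[Segment]:
    """Split a timestamped transcript string into segments."""
    markers = list(MARKER_RE.finditer(transcript))
    segments = []
    for index, marker in enumerate(markers):
        # The text of a segment runs until the next marker or end of string.
        if index + 1 < len(markers):
            text_end = markers[index + 1].start()
        else:
            text_end = len(transcript)
        text = transcript[marker.end():text_end].strip()
        segments.append(Segment(float(marker.group(1)), float(marker.group(2)), text))
    return segments
```

A transcript that holds an error message instead of markers yields an empty list, which is one way to spot an unsupported URL.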

Usage

  1. Provide a single URL – cloud storage, video platform, or direct media/CDN link
  2. Ensure the file is publicly accessible
  3. Check file size - maximum 3GB per file
  4. Set appropriate Apify run memory - larger files require more memory (see Memory Requirements below)
  5. Generate transcript with automatic language detection
  6. Receive timestamped transcript in JSON format

Note: Only one URL can be processed per actor run. For multiple files, run the actor separately for each URL.
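
Since each run accepts one URL, a batch has to be driven from the client side. Here is a hedged sketch using the `apify-client` Python package (`pip install apify-client`). `ACTOR_ID` is a placeholder you must replace with this Actor's ID from the Apify Store, the token is read from an `APIFY_TOKEN` environment variable, and the run is assumed to come back as a dictionary, which depends on the client version. The usability check also requires a positive duration, because the unsupported-URL example above carries an error message with `durationSec` 0.

```python
import os

ACTOR_ID = "<username>/<actor-name>"  # placeholder: copy the real ID from the Store page


def build_run_inputs(urls: list[str]) -> list[dict]:
    """One input object per URL, since each run accepts a single URL."""
    return [{"start_urls": url} for url in urls]


def is_usable(item: dict) -> bool:
    """Treat an item as a real transcript only if it succeeded with audio."""
    return item.get("status") == "success" and item.get("durationSec", 0) > 0


def transcribe_all(urls: list[str], memory_mbytes: int = 1024) -> list[dict]:
    """Start one run per URL and collect the dataset items."""
    from apify_client import ApifyClient  # third-party package

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    results = []
    for run_input in build_run_inputs(urls):
        run = client.actor(ACTOR_ID).call(run_input=run_input, memory_mbytes=memory_mbytes)
        if run is None:
            continue  # the run did not finish; skip it
        results.extend(client.dataset(run["defaultDatasetId"]).iterate_items())
    return results
```

Runs are started one after another here for simplicity. Each run is billed its own start event.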

Memory Requirements

Important: Larger files require more memory allocation. Please adjust the Apify platform's run memory setting based on your file size:

  • Small files (< 100MB): 128MB - 512MB memory is sufficient
  • Medium files (100MB - 1GB): 512MB - 2GB memory recommended
  • Large files (1GB - 2GB): 2GB - 4GB memory recommended
  • Very large files (2GB - 3GB): 4GB - 8GB memory required
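
The tiers above can be turned into a simple lookup when setting run memory programmatically. The sketch below picks the upper end of each recommended range, which is a conservative choice of ours rather than an official rule, and it assumes MB and GB mean powers of 1024. The function name is hypothetical.

```python
MB = 1024 * 1024
GB = 1024 * MB

# (largest file size in the tier, memory in MB), taken from the upper
# end of each range listed under Memory Requirements.
MEMORY_TIERS = [
    (100 * MB, 512),
    (1 * GB, 2048),
    (2 * GB, 4096),
    (3 * GB, 8192),
]


def recommended_memory_mb(file_size_bytes: int) -> int:
    """Return a run memory setting (in MB) for a file of the given size."""
    for max_size, memory_mb in MEMORY_TIERS:
        if file_size_bytes <= max_size:
            return memory_mb
    raise ValueError("File exceeds the 3GB per-file limit")


if __name__ == "__main__":
    print(recommended_memory_mb(750 * MB))
```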

⚠️ Memory Warning: If you encounter memory errors or timeouts when processing larger files, increase the Apify run memory setting in the actor configuration. The actor supports up to 8GB of memory. Bigger files use proportionally more memory during download, audio extraction, and transcription.

💡 Pro Tip: Audio files are much smaller than video files! If you have long audio content to transcribe, consider using audio files (MP3, WAV, etc.) instead of video files. This allows you to transcribe much longer durations while staying within the 3GB file size limit.

Technical Details

  • Unified Processing: Single interface handles all supported providers and file types
  • Intelligent Detection: Automatically identifies cloud storage provider and file format
  • Smart Link Conversion: Converts share links to direct download URLs for maximum compatibility
  • Robust Download: Built-in retry logic and error handling for reliable file processing
  • Optimized Processing:
    • Videos: Efficient audio extraction with minimal quality loss
    • Audio: Direct transcription without unnecessary conversion
  • Quality Assurance: Automatic duration validation and file size limits
  • Memory Management: Optimized for cloud environments with automatic cleanup
  • Security: No persistent data storage - files are processed and immediately deleted

Memory Optimization Features

  • Efficient Processing: Optimized for cloud environments with automatic cleanup
  • Chunked downloads (8KB chunks) to minimize memory usage
  • Automatic file cleanup after processing
  • Garbage collection at key processing stages
  • Memory monitoring throughout the process
  • Optimized audio extraction with minimal quality loss
  • Retry logic prevents memory waste from failed downloads
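
As an illustration of why chunked downloads keep memory use low, the sketch below copies a stream in 8KB pieces so that only one chunk is held in memory at a time. It shows the general technique and is not the Actor's actual download code; `copy_in_chunks` is a hypothetical name.

```python
from typing import BinaryIO

CHUNK_SIZE = 8 * 1024  # 8KB, matching the chunk size mentioned above


def copy_in_chunks(source: BinaryIO, destination: BinaryIO,
                   chunk_size: int = CHUNK_SIZE) -> int:
    """Copy a binary stream chunk by chunk and return the bytes written."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read means end of stream
            break
        destination.write(chunk)
        total += len(chunk)
    return total
```

For a real download, the source could be the response object from `urllib.request.urlopen(url)` and the destination a file opened with mode `"wb"`.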

⚠️ Important: While the actor is optimized for memory efficiency, processing larger files (especially videos over 1GB) requires adequate memory allocation. Always increase the Apify run memory setting in the actor configuration when processing larger files to avoid out-of-memory errors. See the Memory Requirements section above for recommended memory allocations based on file size.

Example INPUT.json

{
"start_urls": "https://drive.google.com/file/d/1ABC123DEF456/view"
}

Supported URL formats:

  • Google Drive: https://drive.google.com/file/d/FILE_ID/view
  • Dropbox: https://www.dropbox.com/s/FILE_ID/filename.mp4?dl=0
  • GitHub Raw: https://raw.githubusercontent.com/user/repo/main/video.mp4
  • OneDrive: https://1drv.ms/v/s!FILE_ID
  • Box: https://app.box.com/s/FILE_ID
  • TikTok: https://www.tiktok.com/@user/video/VIDEO_ID
  • Direct CDN: https://cdn.example.com/media/file.mp3

For more details, contact the maintainer at contact@tictech.id.