Audio Transcriber
Automates audio transcription from multiple sources (files or links). Normalizes input format to ensure optimal processing. Generates word-for-word transcriptions maintaining references to source audio, perfect for datasets requiring traceability and regulatory compliance.

Pricing: Pay per event

Rating: 5.0 (1 review)

Developer: ParseForge · Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 79
  • Monthly active users: 27
  • Last modified: 2 days ago


🎤 Audio Transcriber

🚀 Convert speech to text in seconds. Upload audio files and get accurate transcriptions. Supports multiple languages. No coding, no transcription accounts required.

🕒 Last updated: 2026-05-08 · 🌐 Multi-language · 🎧 Any audio format · 📝 Full transcription · 🚫 No auth required

Convert audio recordings to clean, structured text without juggling transcription tools or paying per-minute fees. The Actor accepts one or more audio file URLs (MP3, WAV, AIFF, AAC, OGG, FLAC, M4A and similar), runs each through an AI transcription pipeline, and returns the full transcript in your dataset. Built for podcasters, journalists, researchers, meeting teams, and any workflow that turns spoken audio into searchable text.

The output is a structured record per file: a back-reference to the input URL, the full transcription, a timestamp, and an error field if something fails. Hand the dataset off to your editor, summarizer, or downstream pipeline. Every run is processed live, so there is no upload cap or vendor lock-in.

| 👥 Built for | 🎯 Primary use cases |
| --- | --- |
| Podcasters and creators | Generate episode transcripts and show notes |
| Journalists and researchers | Convert recorded interviews into searchable text |
| Meeting and operations teams | Auto-transcribe Zoom and Teams recordings |
| Content marketing | Repurpose webinars into blog posts and shorts |
| Accessibility teams | Produce captions and transcripts for compliance |
| Localization workflows | Get base text ready for translation pipelines |

📋 What the Audio Transcriber does

  • 🎧 Audio input. Accepts one or more audio file URLs in common formats (MP3, WAV, AIFF, AAC, OGG, FLAC, M4A).
  • 🌐 Language hint. Pass an ISO 639-1 language code (e.g. en, es, fr, pt) to bias the model toward the right phonetics and vocabulary.
  • 📝 Full transcription. Returns the complete text of each audio file as a single string per record.
  • 🆔 Back-reference. Every record includes the original audio URL so you can rejoin transcripts to source files.
  • ⏱️ Timestamp. Every record carries a timestamp field with the time the transcript was produced.
  • Per-file error reporting. If a file fails (corrupt, unsupported, unreadable URL) the error appears on its own record without breaking the run.

The Actor processes files in the order you provide them. Records stream into the dataset as transcripts complete, so you can start consuming results before the run finishes. Manual transcription typically takes 4-6 hours per hour of audio; this Actor returns the same text in minutes.

💡 Why it matters: spoken audio is everywhere (podcasts, interviews, meetings) but most data tooling is text-first. A reliable speech-to-text step unlocks search, summarization, translation, and analytics workflows that would otherwise be impossible.


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing audio upload, a live run, and how to feed the transcript into a summarization workflow.


⚙️ Input

| Field | Type | Name | Description |
| --- | --- | --- | --- |
| audioFileUrl | array of strings | Audio File URL | Required. One or more audio file URLs (MP3, WAV, AIFF, AAC, OGG, FLAC, M4A). Files are processed in the order you provide them. |
| language | string | Language | ISO 639-1 language code (e.g. en, es, fr, pt) to guide the model. Leave empty for auto-detect. |

Example 1. English podcast episode transcription.

{
  "audioFileUrl": [
    "https://example.com/podcast/episode-12.mp3"
  ],
  "language": "en"
}

Example 2. Batch processing of 3 Spanish-language interviews.

{
  "audioFileUrl": [
    "https://example.com/interview-1.wav",
    "https://example.com/interview-2.wav",
    "https://example.com/interview-3.wav"
  ],
  "language": "es"
}

⚠️ Good to Know: the audio URL must be publicly reachable. If your file lives in a private bucket, generate a signed URL valid for the run's duration before passing it in.
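Since a single bad URL shows up as an error record rather than a failed run, it can pay to sanity-check inputs on your side before starting a run. A minimal sketch (the format list and the two-letter ISO 639-1 check mirror the input table above; this helper is illustrative and not part of the Actor itself):

```python
from urllib.parse import urlparse

# Formats listed in the input table above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".aiff", ".aac", ".ogg", ".flac", ".m4a"}

def validate_input(audio_urls, language=None):
    """Return a list of problems with a prospective run input (empty list = looks OK)."""
    problems = []
    for url in audio_urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https"):
            problems.append(f"{url}: not an http(s) URL")
        # Signed URLs often carry query strings, so check the path only.
        if not any(parsed.path.lower().endswith(ext) for ext in SUPPORTED_EXTENSIONS):
            problems.append(f"{url}: extension not in the documented format list")
    if language is not None and not (len(language) == 2 and language.isalpha()):
        problems.append(f"{language!r}: expected a two-letter ISO 639-1 code")
    return problems
```

This only catches malformed inputs; whether a URL is actually reachable is still decided at run time, where failures land on per-file error records.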


📊 Output

The dataset returns one record per audio file. Each record carries the original URL, the full transcription text, a timestamp, and an optional error message if processing failed. Consume the dataset as JSON, CSV, Excel, XML, or RSS via the Apify console or API.

🧾 Schema

| Field | Type | Example |
| --- | --- | --- |
| 🔗 audioReference | string (url) | https://example.com/podcast/episode-12.mp3 |
| 📝 transcription | string | Welcome back to the show, today we're talking about... |
| 📅 timestamp | ISO datetime | 2026-05-08T12:00:00.000Z |
| error | string or null | null |

📦 Sample records

1. Typical record (English podcast)

{
  "audioReference": "https://example.com/podcast/episode-12.mp3",
  "transcription": "Welcome back to the show, today we're talking about how small teams can ship faster without burning out. Our guest today has been building products at venture-backed startups for over a decade and has a lot to share.",
  "timestamp": "2026-05-08T12:00:00.000Z",
  "error": null
}

2. Spanish interview (multilingual hint)

{
  "audioReference": "https://example.com/interview-2.wav",
  "transcription": "Buenos dias, gracias por tomarse el tiempo de hablar conmigo. Mi primera pregunta tiene que ver con como empezo el proyecto y que les motivo a escoger esa direccion en particular.",
  "timestamp": "2026-05-08T12:00:00.000Z",
  "error": null
}

3. Error record (file unreadable)

{
  "audioReference": "https://example.com/broken-link.mp3",
  "transcription": null,
  "timestamp": "2026-05-08T12:00:00.000Z",
  "error": "Could not download audio: HTTP 404"
}
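Because every record carries both `audioReference` and `error`, downstream code can split a finished dataset into usable transcripts and failures in a few lines. A minimal sketch using only the field names from the schema table above:

```python
def partition_records(records):
    """Split dataset records into a URL -> transcript map and a list of failed records."""
    transcripts, failures = {}, []
    for record in records:
        if record.get("error"):
            failures.append(record)
        else:
            # audioReference lets you rejoin each transcript to its source file.
            transcripts[record["audioReference"]] = record["transcription"]
    return transcripts, failures
```

Feed the failures list back into a retry run (after fixing the URLs) and hand the transcript map to your summarizer or search index.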

✨ Why choose this Actor

  • 🎯 Built for the job. Audio-to-text only, no extra knobs to learn or configure.
  • 🌐 Multi-language. Pass an ISO code for biased decoding, or leave it empty for auto-detect.
  • ⚡ Fast. Most files transcribe in 1-3 minutes per minute of audio.
  • 🔁 Live processing. Every run executes end to end with no caching of input audio.
  • ☁️ No infra to manage. Apify handles compute, scaling, scheduling, and storage.
  • 🛡️ Reliable. Per-file error reporting means one bad URL does not kill the whole run.
  • 🚫 No code required. Configure in the UI, run from the CLI, schedule via cron, or call from any language with the Apify SDK.

📊 Production-grade speech-to-text without the engineering overhead of building and maintaining your own pipeline.


📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Accuracy | Setup |
| --- | --- | --- | --- | --- | --- |
| ⭐ Audio Transcriber (this Actor) | $5 free credit, then pay-per-use | Any audio URL | Live per run | High | ⚡ 2 min |
| Manual transcription | Hours of human time | High control | Per file | Highest | 🐢 Hours per file |
| Paid transcription SaaS | $$$ monthly | High | Live | High | ⏳ Hours of integration |
| Self-hosted models | Engineering hours | High once built | Live | Variable | 🐢 Days to weeks |

Pick this Actor when you want fast, reliable transcription without owning the infrastructure or paying per-minute SaaS fees.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the Audio Transcriber page on the Apify Store.
  3. 🎯 Add your audio. Paste one or more audio URLs into audioFileUrl and (optionally) set language.
  4. 🚀 Run it. Click Start and let the Actor transcribe each file.
  5. 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to first transcript: 3-5 minutes for a short clip.


💼 Business use cases

📊 Content and editorial

  • Generate searchable transcripts for podcasts and webinars
  • Repurpose audio into blog posts, social clips, and newsletters
  • Build show notes and chapter markers from spoken word
  • Index your back-catalog for full-text search

🏢 Operations and meetings

  • Auto-transcribe internal recordings and standups
  • Build searchable archives of customer calls
  • Pull action items from leadership all-hands
  • Capture compliance evidence from recorded sessions

🎯 Research and journalism

  • Turn interview recordings into editable text
  • Speed up qualitative research and coding
  • Build public archives of speeches and statements
  • Localize source audio for translation pipelines

🛠️ Engineering and product

  • Add speech-to-text to your product without owning a model
  • Wire transcripts into AI summarization workflows
  • Build accessibility features and captions
  • Prototype voice-driven features quickly

🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Empirical datasets for papers, thesis work, and coursework
  • Longitudinal studies tracking changes across snapshots
  • Reproducible research with cited, versioned data pulls
  • Classroom exercises on data analysis and ethical scraping

🎨 Personal and creative

  • Side projects, portfolio demos, and indie app launches
  • Data visualizations, dashboards, and infographics
  • Content research for bloggers, YouTubers, and podcasters
  • Hobbyist collections and personal trackers

🤝 Non-profit and civic

  • Transparency reporting and accountability projects
  • Advocacy campaigns backed by public-interest data
  • Community-run databases for local issues
  • Investigative journalism on public records

🧪 Experimentation

  • Prototype AI and machine-learning pipelines with real data
  • Validate product-market hypotheses before engineering spend
  • Train small domain-specific models on niche corpora
  • Test dashboard concepts with live input

🔌 Automating Audio Transcriber

This Actor exposes a REST endpoint, so you can drive it from any language or workflow tool.

Schedules. Use Apify Scheduler to transcribe a folder of audio URLs on a cron cadence. Combine with webhooks to trigger downstream summarization or translation workflows the moment a transcript is ready.
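On the receiving end of such a webhook, your handler mostly needs the dataset ID of the finished run. The sketch below assumes Apify's documented webhook payload shape (`eventType` plus a `resource` object carrying `defaultDatasetId`); verify the exact fields against your own webhook deliveries before relying on it.

```python
def dataset_id_from_webhook(payload):
    """Extract the dataset ID from an Apify run webhook payload.

    Returns None unless the run succeeded, so failed or aborted runs
    do not trigger downstream summarization or translation steps.
    """
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    return payload.get("resource", {}).get("defaultDatasetId")
```

With the dataset ID in hand, fetch the items via the Apify API and pass the transcripts to the next stage of your pipeline.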


❓ Frequently Asked Questions

💳 Do I need a paid Apify plan to run this Actor?

No. You can start right now on the free Apify plan, which includes $5 in monthly credit. That is enough to run the Actor several times and explore the output. Paid plans unlock higher item caps, more concurrent runs, and larger datasets. Create a free Apify account here.

🚨 What happens if my run fails or returns no results?

Failed runs are not charged. If a single audio URL fails, the Actor records the error on that record only and continues with the rest of the batch. If the whole run fails, re-run it or open our contact form and we will look into it.

📏 How long can my audio files be?

There is no hard cap. Longer files take proportionally longer to process. We recommend splitting recordings longer than 60 minutes into smaller chunks for faster results and easier downstream editing.
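If you do pre-split long recordings, the arithmetic is simple. The helper below only illustrates the 60-minute rule of thumb from this answer; the second offsets it produces could drive a splitting tool such as ffmpeg's `-ss`/`-t` options (the Actor itself imposes no such boundaries):

```python
def chunk_bounds(duration_seconds, chunk_seconds=60 * 60):
    """Return (start, end) second offsets covering a recording in fixed-size chunks."""
    bounds = []
    start = 0
    while start < duration_seconds:
        end = min(start + chunk_seconds, duration_seconds)
        bounds.append((start, end))
        start = end
    return bounds
```

A 2.5-hour recording, for example, yields three chunks: two full hours and a final 30-minute remainder.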

🎼 What audio formats are supported?

MP3, WAV, AIFF, AAC, OGG, FLAC, and M4A are all supported. Pass any public URL pointing to one of these formats. Streams (HLS, DASH) are not supported.

🌐 Which languages does it handle?

Most major languages are supported. Pass an ISO 639-1 code like en, es, fr, de, pt, it, ja, zh to bias the model. Leave the field empty for auto-detect.

🧑‍💻 Can I call this Actor from my own code?

Yes. Apify exposes every Actor as a REST endpoint and ships first-class SDKs for Node.js and Python. You can start a run, read the dataset, and handle webhooks from your own app in a few lines.
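In Python that can look like the sketch below, using the `apify-client` package (`pip install apify-client`). The Actor ID `parseforge/audio-transcriber` is an assumption derived from the developer and Actor names on this page; check the API tab of the Actor page for the exact ID.

```python
def build_run_input(audio_urls, language=None):
    """Assemble the run input documented in the Input section above."""
    run_input = {"audioFileUrl": list(audio_urls)}
    if language:
        run_input["language"] = language
    return run_input

def transcribe(token, audio_urls, language=None,
               actor_id="parseforge/audio-transcriber"):  # assumed ID; see the Actor's API tab
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    # call() starts the run and waits for it to finish.
    run = client.actor(actor_id).call(run_input=build_run_input(audio_urls, language))
    # One record per input file, in input order.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Pass your Apify API token and a list of public audio URLs; the return value is the list of dataset records described in the Output section.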

📤 How do I export the data?

Every Apify dataset can be downloaded in one click as CSV, JSON, JSONL, Excel, HTML, XML, or RSS. You can also pull results programmatically via the Apify API or stream into BigQuery, S3, and other destinations through built-in integrations.

📅 Can I schedule the Actor to run automatically?

Yes. Use the Apify scheduler to run the Actor on any cadence, from hourly to monthly. Drop new audio URLs into the input each cycle, or wire the Actor to fire on a webhook from your CMS or recording platform.

🏪 Can I use the data commercially?

Yes. Transcripts of audio you have rights to are yours to use in your own internal pipelines, products, and reports.

💼 Which plan should I pick for production use?

Apify's Starter and Scale plans are designed for production workloads. They give you faster instances, more concurrent runs, and higher quotas. Pick the plan that matches your audio volume and refresh cadence; the in-app pricing calculator will help you size it.

🛠️ Can you add timestamps or speaker labels?

Open the contact form and tell us about your use case. We add features regularly when there is a clear use case behind the request.

⚖️ Is it legal to transcribe the audio I submit?

Yes, provided you have rights to the audio. You are responsible for compliance with copyright, privacy, and consent laws in your jurisdiction.


🔌 Integrate with any app

Audio Transcriber connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe results into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a transcript completes, like firing a summarization actor or pinging a Slack channel.


💡 Pro Tip: browse the complete ParseForge collection for more automation Actors.


🆘 Need Help? Open our contact form to request a new actor, propose a custom project, or report an issue.


⚠️ Disclaimer. This Actor is an independent tool. It accesses only audio you supply by URL and is intended for legitimate research, productivity, and content workflows. Users are responsible for ensuring they hold the rights to transcribe the audio they submit and for compliance with copyright, privacy, and consent laws in their jurisdiction.