Transcribe Zoom Meeting to Text — Bulk Meeting Transcription

Pricing

from $0.15 / 1,000 audio seconds processed

Transcribe Zoom recordings to text in bulk. Speaker labels for host and participants, word-level timestamps, SRT/VTT export. 99+ languages. Try free.

Developer

SIÁN OÜ

Maintained by Community

Transcribe Zoom recordings to text in bulk. Plug into n8n, Zapier, or Make to auto-transcribe every cloud recording the moment it lands — no Otter subscription, no manual upload. Speaker labels for host and participants, word-level timestamps, SRT/VTT subtitles, 99+ languages.


How to transcribe a Zoom recording in 4 steps

  1. Drop in Zoom recording URLs — pull URLs from the Zoom Cloud Recording API (/v2/users/{userId}/recordings) and feed them straight into the audioUrls field. Or upload local .mp4/.m4a exports via audioFiles.
  2. Pick your options — auto-detect language, toggle speaker diarization to separate host from participants, optionally translate non-English meetings to English.
  3. Run the actor — recordings process 10 at a time in parallel on the paid tier; an entire team's weekly recordings can be transcribed in minutes.
  4. Download results — every recording lands in the dataset with the transcript, segment + word-level timestamps, speaker labels, and ready-to-use SRT/VTT subtitle strings.

Supported formats: MP4, MOV, M4A, MP3, WAV, FLAC, AAC, OPUS, OGG, WebM. Max 1 GB per file on the paid tier.
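
Step 1 can be sketched in Python as follows. This is a minimal sketch, not the actor's own tooling: it assumes the Zoom recordings listing has already been fetched and parsed into a dict, and that it follows the shape I understand the Zoom API to return (a `meetings` list whose entries carry `recording_files` with `file_type` and `download_url`). The helper names `pick_recording_url` and `build_actor_input` are hypothetical. It also assumes the download URLs are reachable by the actor without extra authentication, which may not hold for every Zoom account.

```python
from typing import Any, Optional

# Prefer the audio-only export (smaller), then fall back to the video file.
PREFERRED_FILE_TYPES = ("M4A", "MP4")


def pick_recording_url(meeting: dict[str, Any]) -> Optional[str]:
    """Return one download URL per meeting so no meeting is transcribed twice."""
    files = meeting.get("recording_files", [])
    for wanted in PREFERRED_FILE_TYPES:
        for rec in files:
            file_type = str(rec.get("file_type", "")).upper()
            if file_type == wanted and rec.get("download_url"):
                return rec["download_url"]
    return None


def build_actor_input(zoom_response: dict[str, Any], diarize: bool = True) -> dict[str, Any]:
    """Turn a parsed recordings listing into the actor's input object."""
    urls = []
    for meeting in zoom_response.get("meetings", []):
        url = pick_recording_url(meeting)
        if url is not None:
            urls.append(url)
    # audioUrls and speakerDiarization are the input fields shown on this page.
    return {"audioUrls": urls, "speakerDiarization": diarize}
```

Picking a single file per meeting matters because Zoom typically stores both an MP4 and an M4A for the same call; sending both would bill the same audio twice.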


Example output — Zoom meeting transcript with speaker labels

{
  "transcript": "Thanks for joining everyone, let's start with the launch checklist. So Sarah, where are we on the email sequence?...",
  "detected_language": "en",
  "duration": 1842.5,
  "segments": [
    {
      "id": 0,
      "text": "Thanks for joining everyone, let's start with the launch checklist.",
      "start": 0.18,
      "end": 4.92,
      "speaker": "SPEAKER_00",
      "language": "en",
      "words": [
        { "word": "Thanks", "start": 0.18, "end": 0.62, "speaker": "SPEAKER_00" }
      ]
    },
    {
      "id": 1,
      "text": "So Sarah, where are we on the email sequence?",
      "start": 5.40,
      "end": 8.12,
      "speaker": "SPEAKER_00",
      "language": "en",
      "words": []
    }
  ],
  "srt": "1\n00:00:00,180 --> 00:00:04,920\nThanks for joining everyone, let's start with the launch checklist.\n\n2\n00:00:05,400 --> 00:00:08,120\nSo Sarah, where are we on the email sequence?",
  "vtt": "WEBVTT\n\n00:00:00.180 --> 00:00:04.920\nThanks for joining everyone, let's start with the launch checklist.\n\n00:00:05.400 --> 00:00:08.120\nSo Sarah, where are we on the email sequence?",
  "speakers": ["SPEAKER_00", "SPEAKER_01", "SPEAKER_02"],
  "languages": ["en"],
  "fileSizeMB": 84.3,
  "success": true
}

Every result includes the full transcript, segment-level timestamps, word-level timestamps, language detection, recording duration in seconds, file size, ready-to-use srt and vtt subtitle strings, and (when speaker diarization is enabled) speaker labels per segment and per word.
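
When a run covers many recordings, it can help to separate usable results from failed ones before routing them anywhere. The sketch below relies on the `success` and `duration` fields from the example output. How a failed item actually looks is not documented here, so treating a missing or false `success` as a failure is an assumption, and `split_results` and `summarize` are hypothetical helper names.

```python
def split_results(items: list[dict]) -> tuple[list[dict], list[dict]]:
    """Partition dataset items into (succeeded, failed)."""
    succeeded = [item for item in items if item.get("success")]
    # Assumption: anything without success == true counts as failed.
    failed = [item for item in items if not item.get("success")]
    return succeeded, failed


def summarize(items: list[dict]) -> dict:
    """Count outcomes and total the transcribed audio in minutes."""
    succeeded, failed = split_results(items)
    total_seconds = sum(item.get("duration", 0.0) for item in succeeded)
    return {
        "succeeded": len(succeeded),
        "failed": len(failed),
        "audio_minutes": round(total_seconds / 60, 1),
    }
```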


Built for sales, support, and ops teams

  • 📞 Sales call coaching — review every demo, identify objection-handling gaps, share quote-level highlights
  • 🎧 Customer support QA — sample escalations, score CSAT-affecting moments, train new agents
  • 🗂️ Meeting archives — searchable transcripts of every quarterly review, planning meeting, and stakeholder sync
  • ⚖️ Compliance review — tamper-evident transcripts of regulated conversations
  • 🤝 Sales hand-off notes — auto-generate transcripts from discovery calls so AEs and CSMs share the same source of truth

The killer combo: n8n / Zapier trigger on Zoom Cloud Recording → this actor → your Notion/Slack/CRM. Zero manual steps.


Speaker diarization for host and participants

Toggle the Speaker Diarization input to separate every speaker in the meeting. Each segment and each word receives a speaker label (SPEAKER_00, SPEAKER_01, …), so you can build clean transcripts where the host's questions and each participant's answers are clearly distinguished. Powered by pyannote-audio. Charged per audio second; only billed when enabled.
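
One way to turn diarized segments into a readable, attributed transcript is to merge consecutive segments from the same speaker into a single turn. This is a minimal sketch that uses only the `segments[].text`, `start`, `end`, and `speaker` fields from the example output; the function names are hypothetical and the `UNKNOWN` fallback label is an assumption for runs where diarization was off.

```python
def to_speaker_turns(segments: list[dict]) -> list[dict]:
    """Merge consecutive same-speaker segments into turns."""
    turns: list[dict] = []
    for seg in segments:
        speaker = seg.get("speaker", "UNKNOWN")  # assumed fallback label
        text = seg["text"].strip()
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["text"] += " " + text
            turns[-1]["end"] = seg["end"]
        else:
            turns.append({"speaker": speaker, "start": seg["start"],
                          "end": seg["end"], "text": text})
    return turns


def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS, dropping the fractional part."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def render(turns: list[dict]) -> str:
    """Render turns as one '[HH:MM:SS] SPEAKER: text' line each."""
    return "\n".join(
        f"[{format_timestamp(t['start'])}] {t['speaker']}: {t['text']}" for t in turns
    )
```

Mapping `SPEAKER_00` to a real name (for example, the host) has to happen downstream, since the labels themselves are anonymous.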


SRT / VTT export for internal video archives

Every transcription returns ready-to-use srt and vtt subtitle strings. Save the field value as a .srt or .vtt file and:

  • Caption an internal video library (Notion, Confluence, SharePoint, Drive)
  • Build a searchable training-material archive for new hire onboarding
  • Add HTML5 <track> accessibility captions to embedded recordings

Set Timestamp Granularities to word for cue precision down to individual words.
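
Saving the subtitle strings to disk could look like the sketch below. It assumes a result item shaped like the example output (string fields `srt` and `vtt`); `save_subtitles` and the file stem are hypothetical names chosen for illustration.

```python
import tempfile
from pathlib import Path


def save_subtitles(item: dict, out_dir: str, stem: str) -> list[Path]:
    """Write the item's srt and vtt strings to <stem>.srt and <stem>.vtt."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for ext in ("srt", "vtt"):
        content = item.get(ext)
        if not content:
            continue  # skip formats missing from this item
        if not content.endswith("\n"):
            content += "\n"  # end the file with a newline
        path = out / f"{stem}.{ext}"
        path.write_text(content, encoding="utf-8")
        written.append(path)
    return written


# Usage with a shortened result item (cue text trimmed for illustration).
sample_item = {
    "srt": "1\n00:00:00,180 --> 00:00:04,920\nThanks for joining everyone.",
    "vtt": "WEBVTT\n\n00:00:00.180 --> 00:00:04.920\nThanks for joining everyone.",
}
with tempfile.TemporaryDirectory() as tmp:
    for path in save_subtitles(sample_item, tmp, "weekly-sync"):
        print(path.name)
```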


Why teams choose this Zoom transcriber

  • API-first integration — built to plug into n8n, Zapier, Make, or your own webhook on the Zoom Cloud Recording event
  • 🎤 Host vs participant separation via pyannote-audio diarization — clean attributed transcripts for sales coaching and QA
  • ⏱️ Word-level timestamps for every word — find any quote in a 90-minute call in seconds
  • 🎬 SRT and VTT included on every successful run — caption internal training videos with no extra step
  • 🌍 99+ languages — global teams, multilingual customers, EU offices all supported
  • 🇪🇺 EU-region processing for GDPR-aligned workflows
  • 💰 Pay per audio second — no Otter subscription, no minimums; only pay for the recordings you actually transcribe
  • 🚀 10 concurrent files on the paid tier (3 on the free tier), so an entire team's weekly recordings finish in minutes rather than hours
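
The word-level timestamps mentioned in the list above make quote lookup straightforward. The following sketch searches `segments[].words[]` for a phrase and returns where it was spoken. It assumes each word entry has the `word`, `start`, `end`, and `speaker` keys shown in the example output; `normalize` and `find_phrase` are hypothetical helpers, and matching is a simple case-insensitive comparison that ignores punctuation.

```python
import re


def normalize(token: str) -> str:
    """Lowercase a token and strip punctuation (apostrophes are kept)."""
    return re.sub(r"[^\w']", "", token.lower())


def find_phrase(segments: list[dict], phrase: str) -> list[dict]:
    """Return start/end/speaker for every occurrence of phrase."""
    target = [t for t in (normalize(p) for p in phrase.split()) if t]
    if not target:
        return []
    # Flatten all words across segments, dropping punctuation-only tokens.
    indexed = [(normalize(w["word"]), w)
               for seg in segments for w in seg.get("words", [])]
    indexed = [(tok, w) for tok, w in indexed if tok]
    hits = []
    n = len(target)
    for i in range(len(indexed) - n + 1):
        if [tok for tok, _ in indexed[i:i + n]] == target:
            first, last = indexed[i][1], indexed[i + n - 1][1]
            hits.append({"start": first["start"], "end": last["end"],
                         "speaker": first.get("speaker")})
    return hits
```

Because the search runs over the flattened word list, a phrase that spans two segments is still found.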

Use cases

  • 📞 Sales teams auto-transcribing every demo and discovery call for coaching, deal review, and CRM enrichment
  • 🎧 Customer support leaders sampling escalation calls for QA scoring and agent training
  • 🗂️ Operations teams archiving every all-hands, planning meeting, and quarterly review as a searchable text record
  • ⚖️ Compliance officers maintaining tamper-evident transcripts of regulated conversations (finance, healthcare, legal)
  • 🤝 Account managers generating clean handoff notes between SDR → AE → CSM stages
  • 🌐 Localization teams translating multilingual customer calls to English for internal review
  • 🧪 Product researchers transcribing user interviews and usability sessions for thematic coding

Pricing & tiers

Pay only for the audio seconds you actually transcribe. No subscriptions, no minimums.

FREE tier                           | PAID tier
Perfect for testing and small jobs  | Built for production volume
Up to 5 recordings per run          | Unlimited recordings per run
50 MB max per file                  | 1 GB max per file
200 MB / 20 minutes monthly         | Unlimited monthly volume
3 concurrent files                  | 10 concurrent files (10× parallel)
No credit card required             | $0.0005 per audio second
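
To stay within the free tier's limit of 5 recordings per run, a longer URL list can be split into several run inputs. This is a minimal sketch with a hypothetical helper name, `batch_inputs`; it does not get around the free tier's monthly cap of 200 MB / 20 minutes, which still applies across runs.

```python
FREE_TIER_MAX_RECORDINGS_PER_RUN = 5  # from the tier comparison above


def batch_inputs(urls: list[str],
                 batch_size: int = FREE_TIER_MAX_RECORDINGS_PER_RUN,
                 diarize: bool = False) -> list[dict]:
    """Split urls into actor inputs of at most batch_size recordings each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        {"audioUrls": urls[i:i + batch_size], "speakerDiarization": diarize}
        for i in range(0, len(urls), batch_size)
    ]
```

Each returned dict can be passed as the input of one actor run.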

Optional add-ons (only billed when enabled):

Feature               | Price
Speaker diarization   | $0.0001 per audio second
Translate to English  | $0.0003 per audio second
EU-region processing  | $0.0007 per audio second (replaces base $0.0005)

A 60-minute meeting (3,600 audio seconds) with diarization on the paid tier costs approximately $2.16: $1.80 for transcription (3,600 × $0.0005) plus $0.36 for diarization (3,600 × $0.0001). Compare that with per-seat meeting-transcription SaaS pricing.
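
The arithmetic above generalizes to a small estimator. The rates come from the tables in this section. Two assumptions are baked in: add-ons stack additively on top of the base rate, and the EU rate replaces the base rate (as the add-on table states). Actual billing may include discounts or rounding not modeled here, and `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal

BASE_RATE = Decimal("0.0005")         # paid tier, per audio second
EU_RATE = Decimal("0.0007")           # replaces the base rate
DIARIZATION_RATE = Decimal("0.0001")  # add-on
TRANSLATION_RATE = Decimal("0.0003")  # add-on


def estimate_cost(audio_seconds: float, diarization: bool = False,
                  translate: bool = False, eu_region: bool = False) -> Decimal:
    """Estimate the paid-tier cost in USD, rounded to cents."""
    rate = EU_RATE if eu_region else BASE_RATE
    if diarization:
        rate += DIARIZATION_RATE
    if translate:
        rate += TRANSLATION_RATE
    return (Decimal(str(audio_seconds)) * rate).quantize(Decimal("0.01"))


print(estimate_cost(3600, diarization=True))  # 2.16
```

Decimal arithmetic is used so that sub-cent rates do not pick up floating-point noise.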


Integration examples

JavaScript / Node.js

// Requires the apify-client package and an ES module context (top-level await).
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('sian.agency/transcribe-zoom-meeting-to-text').call({
    audioUrls: ['https://your-org.zoom.us/recording/share/...'],
    speakerDiarization: true,
});

// Each recording produces one item in the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0].transcript);
console.log(items[0].srt);

Python

# Requires the apify-client package (pip install apify-client).
from apify_client import ApifyClient

client = ApifyClient('YOUR_APIFY_TOKEN')

# Start the actor and wait for the run to finish.
run = client.actor('sian.agency/transcribe-zoom-meeting-to-text').call(run_input={
    'audioUrls': ['https://your-org.zoom.us/recording/share/...'],
    'speakerDiarization': True,
})

# Each recording produces one item in the run's default dataset.
items = client.dataset(run['defaultDatasetId']).list_items().items
print(items[0]['transcript'])
print(items[0]['vtt'])

cURL

curl -X POST 'https://api.apify.com/v2/acts/sian.agency~transcribe-zoom-meeting-to-text/run-sync-get-dataset-items?token=YOUR_APIFY_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
    "audioUrls": ["https://your-org.zoom.us/recording/share/..."],
    "speakerDiarization": true
  }'

n8n / Zapier / Make

Wire this actor onto the Zoom Cloud Recording Completed webhook. Pass the recording URL into audioUrls, capture transcript, segments[].words[], srt, and vtt from the dataset, then route to Notion (meeting notes), Slack (digest), Salesforce/HubSpot (call attached to deal), or Google Sheets (compliance log) — no transformation step needed.
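
If you route results from your own code rather than a no-code tool, a Slack digest could be assembled as in the sketch below. It builds the JSON body for a Slack incoming webhook (a `text` field) from the `duration`, `speakers`, `detected_language`, and `transcript` fields of a result item. `build_slack_digest`, the meeting title, and the excerpt length are hypothetical choices, and sending the payload is left to whatever HTTP client you already use.

```python
def build_slack_digest(item: dict, title: str, max_chars: int = 300) -> dict:
    """Build a Slack incoming-webhook payload summarizing one transcript."""
    minutes = round(item.get("duration", 0) / 60)
    speakers = item.get("speakers") or []
    language = item.get("detected_language", "unknown")
    transcript = item.get("transcript", "")
    excerpt = transcript[:max_chars].rstrip()
    if len(transcript) > max_chars:
        excerpt += "..."  # mark the excerpt as truncated
    lines = [
        f"*{title}*",
        f"{minutes} min, {len(speakers)} speakers, language: {language}",
        excerpt,
    ]
    return {"text": "\n".join(lines)}
```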


FAQ

How accurate is Zoom meeting transcription? Powered by an industrial speech-to-text pipeline tuned for natural conversation. Accuracy is typically 95–99% on Zoom audio with clean microphones, lower on speakerphone or noisy environments. Word-level timestamps are returned even when accuracy is imperfect, so you can verify and correct quote attributions quickly.

What audio and video formats are supported? MP4, MOV, M4A, MP3, WAV, FLAC, AAC, OPUS, OGG, WebM. Zoom Cloud Recordings export as MP4 or M4A, both work directly. Max 50 MB per file on the free tier, 1 GB on the paid tier.
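
A pre-flight check against these limits can catch unsupported files before a run starts. This is a sketch under stated assumptions: the extension list mirrors the formats named above, 1 GB is taken as 1,024 MB (the exact cutoff is not specified here), and `check_file` is a hypothetical helper that inspects only the file name and size, not the actual contents.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".m4a", ".mp3", ".wav",
                        ".flac", ".aac", ".opus", ".ogg", ".webm"}
# Assumption: 1 GB is treated as 1024 MB.
MAX_FILE_MB = {"free": 50, "paid": 1024}


def check_file(name: str, size_bytes: int, tier: str = "paid") -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(name).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'no extension'}")
    limit_mb = MAX_FILE_MB[tier]
    size_mb = size_bytes / (1024 * 1024)
    if size_mb > limit_mb:
        problems.append(f"{size_mb:.1f} MB exceeds the {limit_mb} MB {tier}-tier limit")
    return problems
```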

Can I transcribe non-English meetings? Yes — auto-detection across 99+ languages including Spanish, French, German, Mandarin, Japanese, Portuguese, Arabic, Hindi, and many more. Toggle Translate to English to receive an English transcript alongside the timestamps.

Is speaker diarization included? Yes, opt-in via the Speaker Diarization toggle. Each segment and word gets labeled SPEAKER_00, SPEAKER_01, etc. Powered by pyannote-audio. Billed at $0.0001 per audio second only when enabled.

How does pricing work? Pay-per-audio-second. The free tier covers small jobs and testing without a credit card. The paid tier is $0.0005 per second of audio, plus optional add-ons for diarization, translation, and EU processing. No subscriptions, no per-seat fees.

Can I integrate this with n8n, Zapier, or Make? Yes — that's the primary use case. The actor exposes a standard Apify run/dataset API. Trigger on the Zoom Cloud Recording Completed event, run the actor, route the dataset record into your downstream tools.

Does this work with Zoom cloud recordings or only local files? Both. Pass cloud recording URLs from the Zoom API directly, or upload local MP4/M4A exports. The actor downloads and transcribes either source.

What about Microsoft Teams, Google Meet, or Webex recordings? They work too. Despite the Zoom-specific name, this actor accepts a meeting recording URL or file from any platform, as long as it is in a supported format. Zoom is just the most common source.


Use this actor only on meeting recordings you have rights to transcribe — your own meetings, recordings with all participants' consent, or properly licensed content. Some jurisdictions require all-party consent for recording calls; you are responsible for compliance. The actor does not retain audio or transcripts beyond the run's lifetime. EU-region processing is available via the EU Processing toggle for GDPR-aligned workflows. SIÁN Agency provides this actor as-is.


Support

Join the Telegram support group, email support@sian-agency.online, or open an issue on the SIÁN Agency Apify Store page.


More from SIÁN Agency

Browse the full SIÁN Agency Apify Store for all available actors.