Pricing

from $0.15 / 1,000 audio seconds processed

Transcribe Podcast to Text — Show Notes, SRT & Timestamps

Transcribe podcast episodes to text in bulk. Speaker labels for hosts and guests, word-level timestamps, SRT/VTT for show notes. 99+ languages.

Rating

0.0

(0)

Developer

SIÁN OÜ

Last modified

5 days ago

How to transcribe a podcast in 4 steps

1. Paste your episode URLs. Most podcast hosts expose the raw .mp3 URL on the episode page or in the RSS feed. Drop the URLs into the Podcast Episode URLs field, one per line, or upload local files via audioFiles.
2. Pick your options. Auto-detect the language or pick from 99+, toggle speaker diarization to label host vs guests, and optionally translate non-English shows to English.
3. Run the actor. Episodes process 10 at a time in parallel on the paid tier (3 at a time on the free tier), so an entire season's backlog can be transcribed in one run.
4. Download results. Every episode lands in the dataset with the transcript, segment-level and word-level timestamps, speaker labels, and ready-to-use SRT/VTT subtitle strings.
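
For step 1, the episode URLs can usually be read straight out of the show's RSS feed. The sketch below assumes a standard RSS 2.0 feed where each episode carries an `<enclosure>` element; `enclosure_urls` and `SAMPLE_FEED` are illustrative names, not part of the actor.

```python
import xml.etree.ElementTree as ET


def enclosure_urls(rss_xml: str) -> list[str]:
    """Return the media enclosure URLs from an RSS 2.0 feed, in feed order."""
    root = ET.fromstring(rss_xml)
    urls = []
    for enclosure in root.iter("enclosure"):
        url = enclosure.get("url")
        mime = enclosure.get("type", "")
        # Keep audio/video enclosures; tolerate feeds that omit the MIME type.
        if url and (not mime or mime.startswith(("audio/", "video/"))):
            urls.append(url)
    return urls


# Hypothetical feed used only to demonstrate the function.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 2</title>
      <enclosure url="https://example.com/audio/episode-2.mp3" type="audio/mpeg" length="123"/>
    </item>
    <item>
      <title>Episode 1</title>
      <enclosure url="https://example.com/audio/episode-1.mp3" type="audio/mpeg" length="456"/>
    </item>
    <item>
      <title>Cover art only</title>
      <enclosure url="https://example.com/img/cover.png" type="image/png" length="789"/>
    </item>
  </channel>
</rss>"""

# One URL per line, ready to paste into the Podcast Episode URLs field.
print("\n".join(enclosure_urls(SAMPLE_FEED)))
```

Fetching the feed itself (for example with `urllib.request`) is left out so the sketch stays offline.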

Supported formats: MP3, M4A, WAV, FLAC, AAC, OPUS, OGG, MP4, MOV, WebM. Max 1 GB per file on the paid tier.
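
A quick pre-flight check can catch unsupported files before a run is started. This is a minimal sketch: the extension list and size limits are copied from this page, `check_episode` is a hypothetical helper, and treating 1 GB as 1024 MB is an assumption. Judging the format from the URL's extension is only a heuristic, since some hosts serve audio from URLs without one.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

SUPPORTED_EXTENSIONS = {
    ".mp3", ".m4a", ".wav", ".flac", ".aac",
    ".opus", ".ogg", ".mp4", ".mov", ".webm",
}
# Per-file limits from the tier table; 1 GB taken as 1024 MB (assumption).
MAX_FILE_MB = {"free": 50, "paid": 1024}


def check_episode(url: str, size_mb: float, tier: str = "paid") -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    # Look only at the URL path so query strings do not hide the extension.
    ext = PurePosixPath(urlparse(url).path).suffix.lower()
    problems = []
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    limit = MAX_FILE_MB[tier]
    if size_mb > limit:
        problems.append(f"{size_mb} MB exceeds the {limit} MB {tier}-tier limit")
    return problems


print(check_episode("https://traffic.libsyn.com/yourshow/episode-42.mp3", 38.4))
```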

Example output — podcast transcript with speaker labels

{
  "transcript": "Welcome back to the show. Today I'm joined by... Thanks for having me, this is awesome...",
  "detected_language": "en",
  "duration": 2647.34,
  "segments": [
    {
      "id": 0,
      "text": "Welcome back to the show.",
      "start": 0.32,
      "end": 1.96,
      "speaker": "SPEAKER_00",
      "language": "en",
      "words": [
        { "word": "Welcome", "start": 0.32, "end": 0.74, "speaker": "SPEAKER_00" },
        { "word": "back",    "start": 0.74, "end": 0.98, "speaker": "SPEAKER_00" }
      ]
    },
    {
      "id": 1,
      "text": "Thanks for having me, this is awesome.",
      "start": 4.18,
      "end": 6.92,
      "speaker": "SPEAKER_01",
      "language": "en",
      "words": []
    }
  ],
  "srt": "1\n00:00:00,320 --> 00:00:01,960\nWelcome back to the show.\n\n2\n00:00:04,180 --> 00:00:06,920\nThanks for having me, this is awesome.",
  "vtt": "WEBVTT\n\n00:00:00.320 --> 00:00:01.960\nWelcome back to the show.\n\n00:00:04.180 --> 00:00:06.920\nThanks for having me, this is awesome.",
  "speakers": ["SPEAKER_00", "SPEAKER_01"],
  "languages": ["en"],
  "fileSizeMB": 38.4,
  "success": true
}

Every result includes the full transcript, segment-level timestamps, word-level timestamps, language detection, episode duration in seconds, file size, ready-to-use srt and vtt subtitle strings, and (when speaker diarization is enabled) speaker labels per segment and per word.
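
As an illustration of working with this record, the sketch below turns `segments` into timestamped show-notes lines. It assumes only the field names shown in the example output; `format_timestamp` and `show_notes` are hypothetical helpers, and the speaker prefix is skipped when diarization was off.

```python
def format_timestamp(seconds: float) -> str:
    """Format a second offset as HH:MM:SS, dropping the fractional part."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def show_notes(item: dict) -> str:
    """Build one '[HH:MM:SS] SPEAKER: text' line per segment."""
    lines = []
    for seg in item["segments"]:
        speaker = seg.get("speaker")
        prefix = f"{speaker}: " if speaker else ""
        lines.append(f"[{format_timestamp(seg['start'])}] {prefix}{seg['text']}")
    return "\n".join(lines)


# Trimmed copy of the example output above.
item = {
    "segments": [
        {"text": "Welcome back to the show.", "start": 0.32, "speaker": "SPEAKER_00"},
        {"text": "Thanks for having me, this is awesome.", "start": 4.18, "speaker": "SPEAKER_01"},
    ]
}
print(show_notes(item))
```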

Built for podcasters

📝 Show notes — auto-generate timestamped show notes for every episode and ship them with publication
✍️ Blog repurposing — turn an interview episode into 2,000-word SEO content in minutes
🎬 YouTube subtitles — upload SRT files alongside your video podcast for instant captions
🦻 Accessibility — give listeners with hearing impairments a full text version of every show
🔍 Episode SEO — searchable transcripts mean your back catalog gets discovered via Google for every quote
📊 Quote pulling — find the exact 12-second clip with the soundbite, with word-level timestamps for precise editing
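
Quote pulling from word-level timestamps can be sketched as a simple phrase search over `segments[].words[]`. The helper name `find_quote` and the punctuation-insensitive matching are assumptions made for illustration; the field names come from the example output.

```python
import string

_STRIP = string.punctuation + string.whitespace


def _norm(word: str) -> str:
    """Lowercase a token and trim surrounding punctuation and whitespace."""
    return word.strip(_STRIP).lower()


def find_quote(item: dict, phrase: str):
    """Return (start, end) in seconds for the first match of phrase, else None."""
    target = [_norm(w) for w in phrase.split()]
    if not target:
        return None
    # Flatten every segment's word list into one timeline.
    words = [w for seg in item["segments"] for w in seg.get("words", [])]
    tokens = [_norm(w["word"]) for w in words]
    n = len(target)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            return words[i]["start"], words[i + n - 1]["end"]
    return None
```

The returned pair can be passed to an audio editor or clipping tool as cut points.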

Speaker diarization for hosts and guests

Toggle the Speaker Diarization input to automatically separate the host from each guest. Each segment and each word receives a speaker label (SPEAKER_00, SPEAKER_01, …) — perfect for interview-style shows where attribution matters. Powered by pyannote-audio. Charged per audio second; only billed when enabled.

SRT / VTT export for YouTube uploads

Every transcription returns ready-to-use srt and vtt subtitle strings. Save the field value as a .srt or .vtt file and:

Upload alongside your video podcast on YouTube for instant accurate captions
Add HTML5 <track> accessibility captions to embedded episodes on your site
Build a chapter-and-quote searchable archive of your back catalog

Set Timestamp Granularities to word for cue precision down to individual words.
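
Saving the subtitle strings to disk takes only a few lines. This sketch assumes a dataset item shaped like the example output; `save_subtitles` and the file stem are illustrative names.

```python
from pathlib import Path


def save_subtitles(item: dict, out_dir: str, stem: str) -> list[Path]:
    """Write the item's srt and vtt strings to <stem>.srt / <stem>.vtt."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for ext in ("srt", "vtt"):
        text = item.get(ext)
        if not text:
            continue  # field missing or empty, e.g. on a failed item
        path = out / f"{stem}.{ext}"
        # Subtitle files are UTF-8 text; end with a single trailing newline.
        path.write_text(text.rstrip("\n") + "\n", encoding="utf-8")
        written.append(path)
    return written
```

Calling `save_subtitles(items[0], "subtitles", "episode-42")` after a run would produce `subtitles/episode-42.srt` and `subtitles/episode-42.vtt`.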

Why podcasters choose this transcriber

✅ Episode-URL-first — works directly with raw .mp3 URLs from Spotify for Podcasters, Buzzsprout, Libsyn, Transistor, Acast, Megaphone, RSS feeds, and any other host
🎤 Host vs guest separation via pyannote-audio diarization — perfect for interview shows
⏱️ Word-level timestamps — pull the exact 12-second clip for soundbites without scrubbing audio
🎬 SRT and VTT included on every successful run — drop straight into YouTube uploads of your video podcast
📝 Show-notes-ready output — segments[] gives you natural chapter boundaries and quotable lines
🌍 99+ languages — supports multilingual podcasts and international shows
🇪🇺 EU-region processing for GDPR-aligned workflows
💰 Pay per audio second — a 60-minute episode with diarization is ~$2.16, vs $30+ at typical podcast transcription services

Use cases

🎙️ Interview-show podcasters — generate publication-ready transcripts of every guest episode for show notes and blog posts
📺 Video podcast creators — get SRT files ready to upload to YouTube alongside the episode video
📊 Podcast network operators — bulk-transcribe an entire show's back catalog overnight for SEO and discovery
✍️ Solo content creators — turn a 60-minute episode into a 2,000-word blog post in one workflow
🔍 Researchers studying podcasts — academic, journalistic, or market research projects requiring searchable text corpora
🦻 Accessibility-conscious shows — provide a full transcript on every episode page for hearing-impaired listeners
🌐 Translators and localizers — transcribe foreign-language podcasts and translate them to English in a single pass

Pricing & tiers

Pay only for the audio seconds you actually transcribe. No subscriptions, no minimums.

FREE tier	PAID tier
Perfect for testing and small jobs	Built for production volume
Up to 5 episodes per run	Unlimited episodes per run
50 MB max per file	1 GB max per file
200 MB / 20 minutes monthly	Unlimited monthly volume
3 concurrent files	10 concurrent files (10× parallel)
No credit card required	$0.0005 per audio second

Optional add-ons (only billed when enabled):

Feature	Price
Speaker diarization	$0.0001 per audio second
Translate to English	$0.0003 per audio second
EU-region processing	$0.0007 per audio second (replaces base $0.0005)

A 60-minute episode with diarization on the paid tier costs approximately $2.16 ($1.80 transcription + $0.36 diarization). A whole 30-episode season with diarization: ~$65.
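
The arithmetic above can be reproduced with a small estimator. The rates are the ones listed on this page; `estimate_cost` is a hypothetical helper, and it assumes add-ons simply stack on top of the base (or EU) rate, as in the $1.80 + $0.36 example. Actual billing is determined by the platform.

```python
from decimal import Decimal

# Per-audio-second rates as listed in the pricing tables.
BASE_RATE = Decimal("0.0005")
EU_RATE = Decimal("0.0007")  # replaces the base rate
DIARIZATION_RATE = Decimal("0.0001")
TRANSLATION_RATE = Decimal("0.0003")


def estimate_cost(audio_seconds, diarization=False, translate=False, eu_region=False) -> Decimal:
    """Estimate the paid-tier cost in USD, rounded to cents."""
    rate = EU_RATE if eu_region else BASE_RATE
    if diarization:
        rate += DIARIZATION_RATE
    if translate:
        rate += TRANSLATION_RATE
    # Decimal avoids binary floating-point drift on small per-second rates.
    return (rate * Decimal(str(audio_seconds))).quantize(Decimal("0.01"))


print(estimate_cost(3600, diarization=True))       # one 60-minute episode
print(estimate_cost(30 * 3600, diarization=True))  # a 30-episode season
```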

Integration examples

JavaScript / Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const run = await client.actor('sian.agency/transcribe-podcast-to-text').call({
    audioUrls: ['https://traffic.libsyn.com/yourshow/episode-42.mp3'],
    speakerDiarization: true,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0].transcript);
console.log(items[0].srt); // ready to upload to YouTube

Python

from apify_client import ApifyClient

client = ApifyClient('YOUR_APIFY_TOKEN')

run = client.actor('sian.agency/transcribe-podcast-to-text').call(run_input={
    'audioUrls': ['https://traffic.libsyn.com/yourshow/episode-42.mp3'],
    'speakerDiarization': True,
})

items = client.dataset(run['defaultDatasetId']).list_items().items
print(items[0]['transcript'])
print(items[0]['vtt'])

cURL

curl -X POST 'https://api.apify.com/v2/acts/sian.agency~transcribe-podcast-to-text/run-sync-get-dataset-items?token=YOUR_APIFY_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
    "audioUrls": ["https://traffic.libsyn.com/yourshow/episode-42.mp3"],
    "speakerDiarization": true
  }'

n8n / Zapier / Make

Wire this actor onto an "RSS feed updated" trigger or your podcast host's webhook. Pass the new episode URL into audioUrls, capture transcript, segments[].words[], srt, and vtt from the dataset, then route to your CMS (publish show notes), YouTube (upload SRT), or your blog (auto-draft a repurposed post).

FAQ

How accurate is podcast transcription? Powered by an industrial speech-to-text pipeline tuned for natural conversation. Accuracy is typically 95–99% on professionally recorded podcasts, lower on remote-recorded shows with poor audio. Word-level timestamps are returned even when accuracy is imperfect, so you can verify and correct quote attributions quickly.

What audio and video formats are supported? MP3 (most common podcast format), M4A, WAV, FLAC, AAC, OPUS, OGG, MP4, MOV, WebM. Max 50 MB per file on the free tier, 1 GB per file on the paid tier (long-form interview shows up to ~10 hours fit comfortably).

Can I transcribe non-English podcasts? Yes — auto-detection across 99+ languages. Toggle Translate to English to receive an English transcript alongside the timestamps — perfect for translating foreign-language shows for English-speaking audiences.

Is speaker diarization included? Yes, opt-in via the Speaker Diarization toggle. Each segment and word gets labeled SPEAKER_00 (host), SPEAKER_01 (first guest), etc. Powered by pyannote-audio. Billed at $0.0001 per audio second only when enabled.

How does pricing work? Pay-per-audio-second. The free tier covers small jobs and testing without a credit card. The paid tier is $0.0005 per second of audio. A 60-minute episode with diarization is ~$2.16 — versus dedicated podcast transcription services typically charging $30+ for the same length.

Can I integrate this with my podcast hosting workflow? Yes. The actor exposes a standard Apify run/dataset API. Trigger on an RSS feed update or your podcast host's webhook (Buzzsprout, Transistor, Acast, Megaphone), run the actor, route the dataset record into your CMS, blog, YouTube uploader, or social media scheduler.

Does this work with Spotify for Podcasters, Apple Podcasts, etc.? The actor consumes the raw .mp3 URL from your podcast host, not the Spotify or Apple Podcasts page. All major hosts (Spotify for Podcasters, Buzzsprout, Libsyn, Transistor, Acast, Megaphone, Captivate, RedCircle) expose direct .mp3 URLs in their RSS feed or episode page.

How long does a transcription take? A 60-minute episode takes 1–3 minutes on the paid tier. A 30-episode season can be transcribed in 30–60 minutes (parallelized).

Legal disclaimer

Use this actor only on podcast episodes you have rights to transcribe — your own shows, content with creator consent, or properly licensed material. Many podcast feeds are publicly accessible, but transcript publishing rights vary by show and platform. The actor does not retain audio or transcripts beyond the run's lifetime. EU-region processing is available via the EU Processing toggle for GDPR-aligned workflows. SIÁN Agency provides this actor as-is.

Support

Join the Telegram support group, email apify@sian-agency.online, or open an issue on the SIÁN Agency Apify Store page.

More from SIÁN Agency

Platform-specific scrapers + transcribers:

Browse the full SIÁN Agency Apify Store for all available actors.

Transcribe Video to Text & Audio to Text — 99+ Languages

sian.agency/INCREDIBLY-FAST-audio-transcriber

Transcribe video to text and audio to text in bulk on Apify. 99+ languages, word-level timestamps, speaker diarization, SRT/VTT export. Try free.

SIÁN OÜ

Transcribe Voice Memo to Text — Speaker Labels & Timestamps

sian.agency/transcribe-voice-memo-to-text

Transcribe iPhone and Android voice memos to text. Speaker labels, word-level timestamps, SRT/VTT. Bulk upload, 99+ languages. Try free.

SIÁN OÜ

Transcribe Interview to Text — for Journalists & Researchers

sian.agency/transcribe-interview-to-text

Transcribe interviews and recorded conversations to text. Speaker labels for interviewer and guest, word-level timestamps, SRT/VTT. Try free.

SIÁN OÜ

Transcribe Zoom Meeting to Text — Bulk Meeting Transcription

sian.agency/transcribe-zoom-meeting-to-text

Transcribe Zoom recordings to text in bulk. Speaker labels for host and participants, word-level timestamps, SRT/VTT export. 99+ languages. Try free.

SIÁN OÜ
