Transcribe Podcast to Text — Show Notes, SRT & Timestamps

Pricing

from $0.15 / 1,000 audio seconds processed

Transcribe podcast episodes to text in bulk. Speaker labels for hosts and guests, word-level timestamps, SRT/VTT for show notes. 99+ languages.

Developer

SIÁN OÜ

Maintained by Community

Transcribe podcast episodes to text in bulk. Drop direct episode URLs from your hosting provider (Spotify for Podcasters, Buzzsprout, Libsyn, Transistor, Acast, Megaphone, …) or upload local episode files. Speaker labels for host and guests, word-level timestamps for SEO-friendly show notes, ready-to-use SRT/VTT for video podcast exports. 99+ languages.


How to transcribe a podcast in 4 steps

  1. Paste your episode URLs — most podcast hosts expose the raw .mp3 URL on the episode page or in the RSS feed. Drop them into the Podcast Episode URLs field one per line, or upload local files via audioFiles.
  2. Pick your options — auto-detect language or pick from 99+, toggle speaker diarization to label host vs guests, optionally translate non-English shows to English.
  3. Run the actor — episodes process 10 at a time in parallel on the paid tier; an entire season's backlog can be transcribed in one run.
  4. Download results — every episode lands in the dataset with the transcript, segment + word-level timestamps, speaker labels, and ready-to-use SRT/VTT subtitle strings.

Supported formats: MP3, M4A, WAV, FLAC, AAC, OPUS, OGG, MP4, MOV, WebM. Max 50 MB per file on the free tier, 1 GB per file on the paid tier.
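Step 1 mentions that hosts expose the raw episode URL in the RSS feed. A minimal sketch of collecting those URLs, assuming a standard RSS 2.0 feed where each episode carries an `<enclosure url="...">` element; the feed content, the `example.com` URLs, and the helper name `enclosure_urls` are hypothetical and only illustrate the idea:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Extensions taken from the supported-formats list above.
SUPPORTED_EXTENSIONS = {
    ".mp3", ".m4a", ".wav", ".flac", ".aac",
    ".opus", ".ogg", ".mp4", ".mov", ".webm",
}


def enclosure_urls(rss_xml: str) -> list[str]:
    """Return enclosure URLs whose path ends in a supported extension."""
    root = ET.fromstring(rss_xml)
    urls = []
    for enclosure in root.iter("enclosure"):
        url = enclosure.get("url", "")
        # Compare against the URL path only, so query strings are ignored.
        path = urlparse(url).path.lower()
        if any(path.endswith(ext) for ext in SUPPORTED_EXTENSIONS):
            urls.append(url)
    return urls


# Hypothetical feed, for illustration only.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <item><title>Episode 42</title>
    <enclosure url="https://example.com/show/episode-42.mp3?source=rss" type="audio/mpeg" length="1"/>
  </item>
  <item><title>Cover art</title>
    <enclosure url="https://example.com/show/cover.jpg" type="image/jpeg" length="1"/>
  </item>
</channel></rss>"""

# One URL per line, ready to paste into the Podcast Episode URLs field.
print("\n".join(enclosure_urls(SAMPLE_FEED)))
```

The resulting list can be passed as `audioUrls` in the integration examples further down. Some hosts serve audio through redirecting URLs without a file extension; those would be skipped by this filter and need handling of their own.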


Example output — podcast transcript with speaker labels

{
  "transcript": "Welcome back to the show. Today I'm joined by... Thanks for having me, this is awesome...",
  "detected_language": "en",
  "duration": 2647.34,
  "segments": [
    {
      "id": 0,
      "text": "Welcome back to the show.",
      "start": 0.32,
      "end": 1.96,
      "speaker": "SPEAKER_00",
      "language": "en",
      "words": [
        { "word": "Welcome", "start": 0.32, "end": 0.74, "speaker": "SPEAKER_00" },
        { "word": "back", "start": 0.74, "end": 0.98, "speaker": "SPEAKER_00" }
      ]
    },
    {
      "id": 1,
      "text": "Thanks for having me, this is awesome.",
      "start": 4.18,
      "end": 6.92,
      "speaker": "SPEAKER_01",
      "language": "en",
      "words": []
    }
  ],
  "srt": "1\n00:00:00,320 --> 00:00:01,960\nWelcome back to the show.\n\n2\n00:00:04,180 --> 00:00:06,920\nThanks for having me, this is awesome.",
  "vtt": "WEBVTT\n\n00:00:00.320 --> 00:00:01.960\nWelcome back to the show.\n\n00:00:04.180 --> 00:00:06.920\nThanks for having me, this is awesome.",
  "speakers": ["SPEAKER_00", "SPEAKER_01"],
  "languages": ["en"],
  "fileSizeMB": 38.4,
  "success": true
}

Every result includes the full transcript, segment-level timestamps, word-level timestamps, language detection, episode duration in seconds, file size, ready-to-use srt and vtt subtitle strings, and (when speaker diarization is enabled) speaker labels per segment and per word.
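As an illustration of how the segment-level timestamps can become show notes, here is a minimal sketch that turns one dataset item into timestamped lines. It assumes the field names shown in the example output (`segments`, `start`, `text`, `speaker`); the helper names `format_clock` and `show_notes` are hypothetical:

```python
def format_clock(seconds: float) -> str:
    """Format an offset in seconds as HH:MM:SS (fractions are truncated)."""
    whole = int(seconds)
    hours, remainder = divmod(whole, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def show_notes(item: dict) -> list[str]:
    """Build one '[HH:MM:SS] SPEAKER: text' line per segment."""
    lines = []
    for segment in item.get("segments", []):
        # The speaker key is expected only when diarization was enabled.
        speaker = segment.get("speaker")
        prefix = f"{speaker}: " if speaker else ""
        lines.append(f"[{format_clock(segment['start'])}] {prefix}{segment['text']}")
    return lines


# Trimmed copy of the example output above.
item = {
    "segments": [
        {"text": "Welcome back to the show.", "start": 0.32, "end": 1.96,
         "speaker": "SPEAKER_00"},
        {"text": "Thanks for having me, this is awesome.", "start": 4.18,
         "end": 6.92, "speaker": "SPEAKER_01"},
    ]
}

for line in show_notes(item):
    print(line)
```

In practice consecutive segments would probably be merged into longer chapters before publishing; this sketch keeps one line per segment to stay close to the raw output.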


Built for podcasters

  • 📝 Show notes — auto-generate timestamped show notes for every episode and ship them with publication
  • ✍️ Blog repurposing — turn an interview episode into 2,000-word SEO content in minutes
  • 🎬 YouTube subtitles — upload SRT files alongside your video podcast for instant captions
  • 🦻 Accessibility — give listeners with hearing impairments a full text version of every show
  • 🔍 Episode SEO — searchable transcripts mean your back catalog gets discovered via Google for every quote
  • 📊 Quote pulling — find the exact 12-second clip with the soundbite, with word-level timestamps for precise editing

Speaker diarization for hosts and guests

Toggle the Speaker Diarization input to automatically separate the host from each guest. Each segment and each word receives a speaker label (SPEAKER_00, SPEAKER_01, …) — perfect for interview-style shows where attribution matters. Powered by pyannote-audio. Charged per audio second; only billed when enabled.
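The generic labels can be post-processed on the client side. The sketch below, which assumes the segment shape from the example output, totals speaking time per label and swaps labels for display names. The helper names and the "Host" name are hypothetical, and which label belongs to which person should be confirmed by listening to a segment or two:

```python
from collections import defaultdict


def speaking_time(segments: list[dict]) -> dict[str, float]:
    """Sum segment durations (in seconds) per speaker label."""
    totals = defaultdict(float)
    for segment in segments:
        label = segment.get("speaker", "UNKNOWN")
        totals[label] += segment["end"] - segment["start"]
    # Round to centiseconds to hide floating-point noise.
    return {label: round(total, 2) for label, total in totals.items()}


def rename_speakers(segments: list[dict], names: dict[str, str]) -> list[dict]:
    """Return copies of the segments with labels replaced by display names."""
    renamed = []
    for segment in segments:
        label = segment.get("speaker")
        # Labels without a mapping are kept unchanged.
        renamed.append({**segment, "speaker": names.get(label, label)})
    return renamed


segments = [
    {"text": "Welcome back to the show.", "start": 0.32, "end": 1.96,
     "speaker": "SPEAKER_00"},
    {"text": "Thanks for having me, this is awesome.", "start": 4.18,
     "end": 6.92, "speaker": "SPEAKER_01"},
]

print(speaking_time(segments))
print(rename_speakers(segments, {"SPEAKER_00": "Host"})[0]["speaker"])
```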


SRT / VTT export for YouTube uploads

Every transcription returns ready-to-use srt and vtt subtitle strings. Save the field value as a .srt or .vtt file and:

  • Upload alongside your video podcast on YouTube for instant accurate captions
  • Add HTML5 <track> accessibility captions to embedded episodes on your site
  • Build a chapter-and-quote searchable archive of your back catalog

Set Timestamp Granularities to word for cue precision down to individual words.
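For word-level cues built on the client side, a minimal sketch follows. It assumes the `words` entries have the `word`, `start`, and `end` fields shown in the example output, and it produces one SRT cue per word; the helper names are hypothetical, and the `srt` string the actor returns may be structured differently:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words: list[dict]) -> str:
    """Build an SRT document with one numbered cue per word."""
    cues = []
    for index, word in enumerate(words, start=1):
        start = srt_timestamp(word["start"])
        end = srt_timestamp(word["end"])
        cues.append(f"{index}\n{start} --> {end}\n{word['word']}")
    # SRT cues are separated by a blank line.
    return "\n\n".join(cues)


# Word entries copied from the example output above.
words = [
    {"word": "Welcome", "start": 0.32, "end": 0.74, "speaker": "SPEAKER_00"},
    {"word": "back", "start": 0.74, "end": 0.98, "speaker": "SPEAKER_00"},
]

print(words_to_srt(words))
```

Single-word cues are useful for karaoke-style highlighting or frame-accurate clip editing; for ordinary captions the segment-level `srt` field is usually the better fit.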


Why podcasters choose this transcriber

  • Episode-URL-first — works directly with raw .mp3 URLs from Spotify for Podcasters, Buzzsprout, Libsyn, Transistor, Acast, Megaphone, RSS feeds, and any other host
  • 🎤 Host vs guest separation via pyannote-audio diarization — perfect for interview shows
  • ⏱️ Word-level timestamps — pull the exact 12-second clip for soundbites without scrubbing audio
  • 🎬 SRT and VTT included on every successful run — drop straight into YouTube uploads of your video podcast
  • 📝 Show-notes-ready output — segments[] gives you natural chapter boundaries and quotable lines
  • 🌍 99+ languages — supports multilingual podcasts and international shows
  • 🇪🇺 EU-region processing for GDPR-aligned workflows
  • 💰 Pay per audio second — a 60-minute episode with diarization is ~$2.16, vs $30+ at typical podcast transcription services

Use cases

  • 🎙️ Interview-show podcasters — generate publication-ready transcripts of every guest episode for show notes and blog posts
  • 📺 Video podcast creators — get SRT files ready to upload to YouTube alongside the episode video
  • 📊 Podcast network operators — bulk-transcribe an entire show's back catalog overnight for SEO and discovery
  • ✍️ Solo content creators — turn a 60-minute episode into a 2,000-word blog post in one workflow
  • 🔍 Researchers studying podcasts — academic, journalistic, or market research projects requiring searchable text corpora
  • 🦻 Accessibility-conscious shows — provide a full transcript on every episode page for hearing-impaired listeners
  • 🌐 Translators and localizers — transcribe foreign-language podcasts and translate them to English in a single pass

Pricing & tiers

Pay only for the audio seconds you actually transcribe. No subscriptions, no minimums.

FREE tier                           | PAID tier
Perfect for testing and small jobs  | Built for production volume
Up to 5 episodes per run            | Unlimited episodes per run
50 MB max per file                  | 1 GB max per file
200 MB / 20 minutes monthly         | Unlimited monthly volume
3 concurrent files                  | 10 concurrent files (10× parallel)
No credit card required             | $0.0005 per audio second

Optional add-ons (only billed when enabled):

Feature               | Price
Speaker diarization   | $0.0001 per audio second
Translate to English  | $0.0003 per audio second
EU-region processing  | $0.0007 per audio second (replaces base $0.0005)

A 60-minute episode with diarization on the paid tier costs approximately $2.16 ($1.80 transcription + $0.36 diarization). A whole 30-episode season with diarization: ~$65.
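The arithmetic behind those figures can be reproduced with a small estimator. This is a sketch that assumes the listed per-second rates simply add up, as the worked example suggests, and that no volume discounts or platform fees apply; the function name `estimate_cost` is hypothetical:

```python
from decimal import Decimal

# Rates in USD per audio second, copied from the tables above.
BASE_RATE = Decimal("0.0005")
EU_RATE = Decimal("0.0007")          # replaces the base rate when enabled
DIARIZATION_RATE = Decimal("0.0001")
TRANSLATION_RATE = Decimal("0.0003")


def estimate_cost(audio_seconds: int, diarization: bool = False,
                  translate: bool = False, eu_region: bool = False) -> Decimal:
    """Estimate the paid-tier cost in USD, rounded to cents."""
    rate = EU_RATE if eu_region else BASE_RATE
    if diarization:
        rate += DIARIZATION_RATE
    if translate:
        rate += TRANSLATION_RATE
    return (rate * audio_seconds).quantize(Decimal("0.01"))


one_hour = 60 * 60
print(estimate_cost(one_hour, diarization=True))       # one 60-minute episode
print(estimate_cost(30 * one_hour, diarization=True))  # a 30-episode season
```

`Decimal` is used instead of floats so that the per-second rates multiply out to exact cent values.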


Integration examples

JavaScript / Node.js

import { ApifyClient } from 'apify-client';

// Replace with your own API token (ideally read from an environment variable).
const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('sian.agency/transcribe-podcast-to-text').call({
    audioUrls: ['https://traffic.libsyn.com/yourshow/episode-42.mp3'],
    speakerDiarization: true,
});

// Each episode is one item in the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0].transcript);
console.log(items[0].srt); // ready to upload to YouTube

Python

from apify_client import ApifyClient

# Replace with your own API token (ideally read from an environment variable).
client = ApifyClient('YOUR_APIFY_TOKEN')

# Start the actor and wait for the run to finish.
run = client.actor('sian.agency/transcribe-podcast-to-text').call(run_input={
    'audioUrls': ['https://traffic.libsyn.com/yourshow/episode-42.mp3'],
    'speakerDiarization': True,
})

# Each episode is one item in the run's default dataset.
items = client.dataset(run['defaultDatasetId']).list_items().items
print(items[0]['transcript'])
print(items[0]['vtt'])

cURL

curl -X POST 'https://api.apify.com/v2/acts/sian.agency~transcribe-podcast-to-text/run-sync-get-dataset-items?token=YOUR_APIFY_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
    "audioUrls": ["https://traffic.libsyn.com/yourshow/episode-42.mp3"],
    "speakerDiarization": true
  }'

n8n / Zapier / Make

Wire this actor onto an "RSS feed updated" trigger or your podcast host's webhook. Pass the new episode URL into audioUrls, capture transcript, segments[].words[], srt, and vtt from the dataset, then route to your CMS (publish show notes), YouTube (upload SRT), or your blog (auto-draft a repurposed post).


FAQ

How accurate is podcast transcription? Powered by an industrial speech-to-text pipeline tuned for natural conversation. Accuracy is typically 95–99% on professionally recorded podcasts, lower on remote-recorded shows with poor audio. Word-level timestamps are returned even when accuracy is imperfect, so you can verify and correct quote attributions quickly.

What audio and video formats are supported? MP3 (most common podcast format), M4A, WAV, FLAC, AAC, OPUS, OGG, MP4, MOV, WebM. Max 50 MB per file on the free tier, 1 GB per file on the paid tier (long-form interview shows up to ~10 hours fit comfortably).
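How many hours fit under a size limit depends on the encoding bitrate. A rough back-of-the-envelope sketch, assuming constant-bitrate audio, decimal megabytes (1 MB = 1,000,000 bytes), and no container overhead; the function name and the bitrates are illustrative, not values stated by the actor:

```python
def max_duration_hours(size_limit_mb: float, bitrate_kbps: float) -> float:
    """Approximate hours of constant-bitrate audio that fit in a size limit."""
    size_bits = size_limit_mb * 1_000_000 * 8
    seconds = size_bits / (bitrate_kbps * 1000)
    return seconds / 3600


# Paid-tier limit (1 GB) and free-tier limit (50 MB) at common MP3 bitrates.
for limit_mb in (1000, 50):
    for kbps in (128, 192):
        hours = max_duration_hours(limit_mb, kbps)
        print(f"{limit_mb} MB at {kbps} kbps: about {hours:.1f} h")
```

Under these assumptions a 1 GB file holds roughly 17 hours at 128 kbps and about 11.5 hours at 192 kbps, while the 50 MB free-tier limit is reached after about 52 minutes at 128 kbps.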

Can I transcribe non-English podcasts? Yes — auto-detection across 99+ languages. Toggle Translate to English to receive an English transcript alongside the timestamps — perfect for translating foreign-language shows for English-speaking audiences.

Is speaker diarization included? Yes, opt-in via the Speaker Diarization toggle. Each segment and word gets labeled SPEAKER_00 (host), SPEAKER_01 (first guest), etc. Powered by pyannote-audio. Billed at $0.0001 per audio second only when enabled.

How does pricing work? Pay-per-audio-second. The free tier covers small jobs and testing without a credit card. The paid tier is $0.0005 per second of audio. A 60-minute episode with diarization is ~$2.16 — versus dedicated podcast transcription services typically charging $30+ for the same length.

Can I integrate this with my podcast hosting workflow? Yes. The actor exposes a standard Apify run/dataset API. Trigger on an RSS feed update or your podcast host's webhook (Buzzsprout, Transistor, Acast, Megaphone), run the actor, route the dataset record into your CMS, blog, YouTube uploader, or social media scheduler.

Does this work with Spotify for Podcasters, Apple Podcasts, etc.? The actor consumes the raw .mp3 URL from your podcast host, not the Spotify or Apple Podcasts page. All major hosts (Spotify for Podcasters, Buzzsprout, Libsyn, Transistor, Acast, Megaphone, Captivate, RedCircle) expose direct .mp3 URLs in their RSS feed or episode page.

How long does a transcription take? A 60-minute episode takes 1–3 minutes on the paid tier. A 30-episode season can be transcribed in 30–60 minutes (parallelized).


Use this actor only on podcast episodes you have rights to transcribe — your own shows, content with creator consent, or properly licensed material. Many podcast feeds are publicly accessible, but transcript publishing rights vary by show and platform. The actor does not retain audio or transcripts beyond the run's lifetime. EU-region processing is available via the EU Processing toggle for GDPR-aligned workflows. SIÁN Agency provides this actor as-is.


Support

Join the Telegram support group, email support@sian-agency.online, or open an issue on the SIÁN Agency Apify Store page.


More from SIÁN Agency

Browse the full SIÁN Agency Apify Store for all available actors.