Pricing

from $0.15 / 1,000 seconds of video processed

Transcribe Interview to Text — for Journalists & Researchers

Transcribe interviews and recorded conversations to text. Speaker labels for interviewer and guest, word-level timestamps, SRT/VTT. Try free.

Developer

SIÁN OÜ

How to transcribe an interview in 4 steps

1. Upload your interview recordings: drop .m4a, .mp3, .wav, .mp4, or any other supported format into the Upload Interview Recordings field. Bulk uploads are supported.
2. Pick your options: auto-detect the language or pick from 99+ languages, toggle speaker diarization to separate the interviewer from each guest, and optionally translate non-English interviews to English.
3. Run the actor: recordings process up to 10 at a time in parallel on the paid tier (3 at a time on the free tier), so an entire project's interviews can be transcribed in one run.
4. Download results: every recording lands in the dataset with the transcript, segment-level and word-level timestamps, speaker labels, and ready-to-use SRT/VTT subtitle strings.

Supported formats: M4A, MP3, WAV, FLAC, AAC, OPUS, OGG, MP4, MOV, WebM. Max 1 GB per file on the paid tier.

Example output — interview transcript with speaker labels

{
  "transcript": "Interviewer: Tell me about the first time you realized... Guest: Honestly, it was when my mentor pulled me aside and said...",
  "detected_language": "en",
  "duration": 1432.7,
  "segments": [
    {
      "id": 0,
      "text": "Tell me about the first time you realized you wanted to do this.",
      "start": 0.42,
      "end": 4.18,
      "speaker": "SPEAKER_00",
      "language": "en",
      "words": [
        { "word": "Tell",  "start": 0.42, "end": 0.61, "speaker": "SPEAKER_00" },
        { "word": "me",    "start": 0.61, "end": 0.74, "speaker": "SPEAKER_00" }
      ]
    },
    {
      "id": 1,
      "text": "Honestly, it was when my mentor pulled me aside.",
      "start": 4.86,
      "end": 8.94,
      "speaker": "SPEAKER_01",
      "language": "en",
      "words": []
    }
  ],
  "srt": "1\n00:00:00,420 --> 00:00:04,180\nTell me about the first time you realized you wanted to do this.\n\n2\n00:00:04,860 --> 00:00:08,940\nHonestly, it was when my mentor pulled me aside.",
  "vtt": "WEBVTT\n\n00:00:00.420 --> 00:00:04.180\nTell me about the first time you realized you wanted to do this.\n\n00:00:04.860 --> 00:00:08.940\nHonestly, it was when my mentor pulled me aside.",
  "speakers": ["SPEAKER_00", "SPEAKER_01"],
  "languages": ["en"],
  "fileSizeMB": 12.6,
  "success": true
}

Every result includes the full transcript, segment-level timestamps, word-level timestamps, language detection, recording duration in seconds, file size, ready-to-use srt and vtt subtitle strings, and (when speaker diarization is enabled) speaker labels per segment and per word.
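
As a rough illustration of working with this record, the sketch below turns the segments array into readable "[MM:SS] SPEAKER: text" lines. The helper names format_timestamp and to_readable_lines are hypothetical and not part of the actor; the field names (segments, start, text, speaker) are taken from the example output above, and the "UNKNOWN" fallback for runs without diarization is an assumption.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS, or H:MM:SS once past one hour."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def to_readable_lines(item: dict) -> list[str]:
    """Turn one dataset record into '[MM:SS] SPEAKER: text' lines."""
    lines = []
    for segment in item.get("segments", []):
        # Assumption: "speaker" is absent when diarization was not enabled.
        speaker = segment.get("speaker", "UNKNOWN")
        stamp = format_timestamp(segment["start"])
        lines.append(f"[{stamp}] {speaker}: {segment['text']}")
    return lines


# Sample record trimmed from the example output above.
sample = {
    "segments": [
        {"text": "Tell me about the first time you realized you wanted to do this.",
         "start": 0.42, "end": 4.18, "speaker": "SPEAKER_00"},
        {"text": "Honestly, it was when my mentor pulled me aside.",
         "start": 4.86, "end": 8.94, "speaker": "SPEAKER_01"},
    ]
}

for line in to_readable_lines(sample):
    print(line)
```

The same function can be applied to every item returned by the dataset, one call per recording.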

Built for journalists, qualitative researchers, market researchers

📰 Journalists — turn phone-recorded interviews into clean transcripts ready for quote pulling
🧪 Qualitative researchers — preserve participant voices with speaker separation for thematic analysis
📊 Market researchers — bulk-transcribe focus group and 1:1 interview tapes
📚 Oral historians — searchable, time-stamped archives of long-form interviews
🎙️ Podcasters publishing interview shows — transcripts for show notes, blog repurposing, and SEO

Speaker diarization (interviewer / guest separation)

Toggle the Speaker Diarization input and the actor automatically labels every segment and every word with the speaker it came from. Labels follow the order in which voices first appear: SPEAKER_00 is typically the interviewer, SPEAKER_01 the first guest, SPEAKER_02 the second, and so on. This makes it straightforward to extract clean quote attributions for journalism or to code qualitative data for research. Powered by pyannote-audio. Charged per audio second and only billed when enabled.
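
Diarized segments are often shorter than a full speaking turn, so for quote pulling it can help to merge consecutive segments from the same speaker. The following is a minimal sketch of that idea, not something the actor does for you; merge_turns is a hypothetical helper, and it assumes each segment carries the speaker, start, end, and text fields shown in the example output.

```python
def merge_turns(segments: list[dict]) -> list[dict]:
    """Merge consecutive segments by the same speaker into one turn."""
    turns: list[dict] = []
    for seg in segments:
        speaker = seg.get("speaker")
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["text"] += " " + seg["text"]
            turns[-1]["end"] = seg["end"]
        else:
            # New speaker: start a new turn.
            turns.append({
                "speaker": speaker,
                "start": seg["start"],
                "end": seg["end"],
                "text": seg["text"],
            })
    return turns


segments = [
    {"speaker": "SPEAKER_00", "start": 0.42, "end": 4.18, "text": "Tell me about the first time."},
    {"speaker": "SPEAKER_01", "start": 4.86, "end": 8.94, "text": "Honestly, it was my mentor."},
    {"speaker": "SPEAKER_01", "start": 9.10, "end": 12.30, "text": "She pulled me aside."},
]

for turn in merge_turns(segments):
    print(f"{turn['speaker']} ({turn['start']}-{turn['end']}): {turn['text']}")
```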

Translate foreign-language interviews to English

Toggle Translate to English and the actor returns the transcript translated into English while preserving timing — perfect for conducting interviews in your subject's native language and publishing in English. Combine with Speaker Diarization to get clean, attributed quotes in both directions. Charged separately when enabled.

SRT / VTT subtitle export

Every transcription returns ready-to-use srt and vtt subtitle strings. Save the field value as a .srt or .vtt file and:

Publish a video version of the interview with subtitles for YouTube, Vimeo, or your CMS
Add HTML5 <track> accessibility captions to embedded video
Build a searchable interview archive with timestamps

Set Timestamp Granularities to word for cue precision down to individual words.
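
Saving the subtitle strings to disk could look like the sketch below. save_subtitles is a hypothetical helper; the only things taken from the actor's output are the srt and vtt field names. The file stem and output directory are whatever you choose.

```python
from pathlib import Path


def save_subtitles(item: dict, stem: str, out_dir: str = ".") -> list[Path]:
    """Write the 'srt' and 'vtt' fields of one dataset record to files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for field in ("srt", "vtt"):
        content = item.get(field)
        if not content:
            # Skip fields that are missing or empty in this record.
            continue
        path = out / f"{stem}.{field}"
        path.write_text(content + "\n", encoding="utf-8")
        written.append(path)
    return written
```

For example, save_subtitles(items[0], "interview-with-source", "subtitles") would write subtitles/interview-with-source.srt and subtitles/interview-with-source.vtt.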

Why interviewers choose this transcriber

✅ Interviewer ↔ guest separation out of the box via pyannote-audio diarization — clean attributed quotes ready for publication
⏱️ Word-level timestamps for every word — find any quote in a 90-minute interview in seconds
🌐 Translate non-English interviews to English in the same run — perfect for international journalism and cross-cultural research
🎬 SRT and VTT subtitles included for video versions of interviews
🌍 99+ languages: automatic detection, no manual selection required
🇪🇺 EU-region processing for GDPR-aligned research workflows
💰 Pay per audio second — no per-minute Rev.com markups, no Otter subscription
🚀 10× parallel on the paid tier — an entire research project's worth of interviews done in one run
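
To make the word-level timestamp point concrete, here is a minimal sketch of locating a quote in a result record. find_phrase is a hypothetical helper, and the matching rule (case-insensitive, punctuation ignored) is an assumption for illustration; the segments[].words[] structure follows the example output.

```python
import re


def _normalize(word: str) -> str:
    """Lowercase a word and drop punctuation and whitespace."""
    return re.sub(r"[^\w']", "", word).lower()


def find_phrase(item: dict, phrase: str) -> list[dict]:
    """Return start/end/speaker for every occurrence of phrase."""
    target = [t for t in (_normalize(w) for w in phrase.split()) if t]
    if not target:
        return []
    # Flatten all words so a phrase may span segment boundaries.
    words = [w for seg in item.get("segments", []) for w in seg.get("words", [])]
    tokens = [_normalize(w["word"]) for w in words]
    n = len(target)
    hits = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            hits.append({
                "start": words[i]["start"],
                "end": words[i + n - 1]["end"],
                "speaker": words[i].get("speaker"),
            })
    return hits


sample = {
    "segments": [{
        "words": [
            {"word": "Tell", "start": 0.42, "end": 0.61, "speaker": "SPEAKER_00"},
            {"word": "me", "start": 0.61, "end": 0.74, "speaker": "SPEAKER_00"},
        ]
    }]
}
print(find_phrase(sample, "tell me"))
```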

Use cases

📰 Investigative journalists transcribing source interviews and pulling attributed quotes for stories
🎙️ Long-form podcasters generating publication-ready transcripts of every guest interview
🧪 Qualitative researchers coding participant transcripts in NVivo, Atlas.ti, or MAXQDA
📊 Market research firms transcribing focus groups and customer 1:1 sessions for thematic analysis
📚 Oral history projects preserving long-form recorded interviews with timestamped speaker tracks
🎓 Academic researchers conducting qualitative fieldwork in foreign-language contexts (transcribe + translate in one pass)
✍️ Authors and biographers working from hours of recorded conversations for book material
🎬 Documentary filmmakers preparing rough-cut transcripts of interview tape for editing

Pricing & tiers

Pay only for the audio seconds you actually transcribe. No subscriptions, no minimums.

FREE tier	PAID tier
Perfect for testing and small jobs	Built for production volume
Up to 5 interviews per run	Unlimited interviews per run
50 MB max per file	1 GB max per file
200 MB / 20 minutes monthly	Unlimited monthly volume
3 concurrent files	10 concurrent files (10× parallel)
No credit card required	$0.0005 per audio second
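
Before starting a run, a batch can be checked against the per-run and per-file limits in the table above. The sketch below is illustrative only: check_batch and the LIMITS mapping are hypothetical, it assumes 1 GB means 1024 MB, and it does not track the free tier's monthly quota.

```python
from typing import Optional

# Per-run and per-file limits copied from the tier table.
# Assumption: "1 GB" is treated as 1024 MB.
LIMITS: dict[str, dict[str, Optional[int]]] = {
    "free": {"max_files_per_run": 5, "max_file_mb": 50},
    "paid": {"max_files_per_run": None, "max_file_mb": 1024},
}


def check_batch(file_sizes_mb: dict[str, float], tier: str = "free") -> list[str]:
    """Return a list of human-readable problems; empty means the batch fits."""
    limits = LIMITS[tier]
    problems = []
    max_files = limits["max_files_per_run"]
    if max_files is not None and len(file_sizes_mb) > max_files:
        problems.append(
            f"{len(file_sizes_mb)} files exceeds the {max_files}-file limit per run"
        )
    for name, size in file_sizes_mb.items():
        if size > limits["max_file_mb"]:
            problems.append(
                f"{name} is {size} MB, over the {limits['max_file_mb']} MB per-file limit"
            )
    return problems


print(check_batch({"interview.m4a": 12.6, "panel.mp4": 180.0}, tier="free"))
```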

Optional add-ons (only billed when enabled):

Feature	Price
Speaker diarization	$0.0001 per audio second
Translate to English	$0.0003 per audio second
EU-region processing	$0.0007 per audio second (replaces base $0.0005)

A 60-minute interview (3,600 audio seconds) with diarization on the paid tier costs approximately $2.16 ($1.80 transcription + $0.36 diarization). Compare that with Rev.com's $1.50/min ($90 for the same interview).
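
The arithmetic behind that estimate can be reproduced with a small helper. This is a sketch using the per-second rates listed in the tables above; estimate_cost is a hypothetical function, and actual billing is determined by Apify, not by this calculation.

```python
from decimal import Decimal

# Per-audio-second rates copied from the pricing tables.
BASE_RATE = Decimal("0.0005")
EU_RATE = Decimal("0.0007")          # replaces the base rate
DIARIZATION_RATE = Decimal("0.0001")
TRANSLATION_RATE = Decimal("0.0003")


def estimate_cost(audio_seconds, diarization=False, translate=False,
                  eu_region=False) -> Decimal:
    """Estimate the paid-tier cost in USD, rounded to cents."""
    rate = EU_RATE if eu_region else BASE_RATE
    if diarization:
        rate += DIARIZATION_RATE
    if translate:
        rate += TRANSLATION_RATE
    return (rate * Decimal(str(audio_seconds))).quantize(Decimal("0.01"))


# 60-minute interview with diarization: 3600 * (0.0005 + 0.0001)
print(estimate_cost(3600, diarization=True))  # prints 2.16
```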

Integration examples

JavaScript / Node.js

import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token.
const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('sian.agency/transcribe-interview-to-text').call({
    audioFiles: ['https://example.com/interview-with-source.m4a'],
    speakerDiarization: true,
    translateToEnglish: false,
});

// Read the results from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0].transcript);
console.log(items[0].srt);

Python

from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient('YOUR_APIFY_TOKEN')

# Start the actor and wait for the run to finish.
run = client.actor('sian.agency/transcribe-interview-to-text').call(run_input={
    'audioFiles': ['https://example.com/interview-with-source.m4a'],
    'speakerDiarization': True,
    'translateToEnglish': False,
})

# Read the results from the run's default dataset.
items = client.dataset(run['defaultDatasetId']).list_items().items
print(items[0]['transcript'])
print(items[0]['vtt'])

cURL

curl -X POST 'https://api.apify.com/v2/acts/sian.agency~transcribe-interview-to-text/run-sync-get-dataset-items?token=YOUR_APIFY_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
    "audioFiles": ["https://example.com/interview.m4a"],
    "speakerDiarization": true
  }'

n8n / Zapier / Make

Wire this actor onto a "new file in research-recordings folder" trigger (Dropbox, Google Drive, OneDrive). The dataset record returned per item includes transcript, segments[].words[], srt, and vtt — drop them into Notion (research database), Airtable (interview log), MAXQDA/NVivo (qualitative coding), or Google Docs (story drafts).

FAQ

How accurate is interview transcription? Powered by an industrial-grade speech-to-text pipeline tuned for natural conversation. Accuracy is typically 95–99% on clean studio or quiet-room interviews, and lower on phone-recorded or noisy field interviews. Word-level timestamps are returned even when accuracy is imperfect, so you can verify and correct quote attributions quickly.

What audio and video formats are supported? M4A, MP3, WAV, FLAC, AAC, OPUS, OGG, MP4, MOV, WebM. Max 50 MB per file on the free tier, 1 GB on the paid tier.

Can I transcribe foreign-language interviews? Yes — auto-detection across 99+ languages including Spanish, French, German, Mandarin, Japanese, Portuguese, Arabic, Hindi, Russian, and many more. Toggle Translate to English to receive an English transcript alongside the timestamped original.

Is speaker diarization included? Yes, opt-in via the Speaker Diarization toggle. Each segment and word gets labeled in order of first speaking turn: SPEAKER_00 (typically the interviewer), SPEAKER_01 (typically the first guest), etc. Powered by pyannote-audio. Billed at $0.0001 per audio second only when enabled.

How does pricing work? Pay-per-audio-second. The free tier covers small jobs and testing without a credit card. The paid tier is $0.0005 per second of audio. A 1-hour interview with diarization is approximately $2.16 — versus Rev.com at ~$90 for the same length.

Can I integrate this into my qualitative research workflow? Yes. The actor exposes a standard Apify run/dataset API. The dataset record includes transcript, segments[].words[], srt, and vtt ready to feed into NVivo, MAXQDA, Atlas.ti, Dovetail, or any qualitative analysis tool that accepts plain text or VTT.

What if my interview is multi-speaker (panel, focus group)? Speaker diarization handles up to ~6 distinct speakers reliably. Each speaker is labeled SPEAKER_00 through SPEAKER_N in temporal order of first speaking turn.
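
Because labels are assigned by order of first speaking turn, you will usually want to map them to real names once you know who spoke first. A minimal sketch of that mapping follows; rename_speakers and the example names are hypothetical, and the record layout follows the example output.

```python
def _relabel(entry: dict, names: dict[str, str]) -> dict:
    """Copy an entry, swapping its speaker label when a name is known."""
    if "speaker" not in entry:
        return dict(entry)
    label = entry["speaker"]
    return {**entry, "speaker": names.get(label, label)}


def rename_speakers(item: dict, names: dict[str, str]) -> list[dict]:
    """Return segments (and their words) with SPEAKER_NN replaced by names."""
    renamed = []
    for seg in item.get("segments", []):
        new_seg = _relabel(seg, names)
        new_seg["words"] = [_relabel(w, names) for w in seg.get("words", [])]
        renamed.append(new_seg)
    return renamed


sample = {
    "segments": [
        {"text": "Tell me more.", "speaker": "SPEAKER_00",
         "words": [{"word": "Tell", "speaker": "SPEAKER_00"}]},
        {"text": "Honestly, it was my mentor.", "speaker": "SPEAKER_01", "words": []},
    ]
}
# Hypothetical names; unmapped labels are left unchanged.
names = {"SPEAKER_00": "Interviewer", "SPEAKER_01": "Guest"}
for seg in rename_speakers(sample, names):
    print(f"{seg['speaker']}: {seg['text']}")
```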

How long does a transcription take? A 60-minute interview takes 1–3 minutes on the paid tier. Bulk batches of 10 interviews complete in 5–10 minutes (parallelized).

Legal disclaimer

Use this actor only on interviews you have rights to transcribe — your own recordings with subject consent, properly licensed media, or material covered by journalistic source agreements. Some jurisdictions require subject consent for recording; you are responsible for compliance with applicable laws and IRB requirements for academic research. The actor does not retain audio or transcripts beyond the run's lifetime. EU-region processing is available via the EU Processing toggle for GDPR-aligned workflows. SIÁN Agency provides this actor as-is.

Support

Join the Telegram support group, email apify@sian-agency.online, or open an issue on the SIÁN Agency Apify Store page.

More from SIÁN Agency

Platform-specific scrapers + transcribers:

Browse the full SIÁN Agency Apify Store for all available actors.

Transcribe Voice Memo to Text — Speaker Labels & Timestamps

sian.agency/transcribe-voice-memo-to-text

Transcribe iPhone and Android voice memos to text. Speaker labels, word-level timestamps, SRT/VTT. Bulk upload, 99+ languages. Try free.

Transcribe Video to Text & Audio to Text — 99+ Languages

sian.agency/INCREDIBLY-FAST-audio-transcriber

Transcribe video to text and audio to text in bulk on Apify. 99+ languages, word-level timestamps, speaker diarization, SRT/VTT export. Try free.

Transcribe Podcast to Text — Show Notes, SRT & Timestamps

sian.agency/transcribe-podcast-to-text

Transcribe podcast episodes to text in bulk. Speaker labels for hosts and guests, word-level timestamps, SRT/VTT for show notes. 99+ languages.

Transcribe Zoom Meeting to Text — Bulk Meeting Transcription

sian.agency/transcribe-zoom-meeting-to-text

Transcribe Zoom recordings to text in bulk. Speaker labels for host and participants, word-level timestamps, SRT/VTT export. 99+ languages. Try free.
