Transcribe Interview to Text — for Journalists & Researchers
Pricing
from $0.15 / 1,000 seconds of video processed
Transcribe interviews and recorded conversations to text. Speaker labels for interviewer and guest, word-level timestamps, SRT/VTT. Try free.
Developer: SIÁN OÜ
Transcribe interviews and recorded conversations to text. Built for journalists, qualitative researchers, market researchers, and anyone with hours of interview tape. Speaker labels for interviewer and guest, word-level timestamps for precise quote extraction, SRT/VTT subtitles, 99+ languages.
How to transcribe an interview in 4 steps
- Upload your interview recordings: drop .m4a, .mp3, .wav, .mp4, or any other supported format into the Upload Interview Recordings field. Bulk uploads are supported.
- Pick your options: auto-detect the language or pick from 99+, toggle speaker diarization to separate the interviewer from each guest, and optionally translate non-English interviews to English.
- Run the actor: recordings process 10 at a time in parallel on the paid tier, so an entire project's interviews can be transcribed in one run.
- Download results: every recording lands in the dataset with the transcript, segment- and word-level timestamps, speaker labels, and ready-to-use SRT/VTT subtitle strings.
Supported formats: M4A, MP3, WAV, FLAC, AAC, OPUS, OGG, MP4, MOV, WebM. Max 1 GB per file on the paid tier.
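Before uploading a large batch, you can check files against the limits listed on this page. This is a minimal sketch under a few assumptions. It assumes the extension list and per-file caps shown here are current. It also assumes "MB" and "GB" are binary units (1 MB = 1,048,576 bytes). `check_recording` is a hypothetical helper and is not part of the actor's API.

```python
from pathlib import Path

# Assumption: formats and per-file limits copied from this page; they may change.
SUPPORTED_EXTENSIONS = {
    ".m4a", ".mp3", ".wav", ".flac", ".aac", ".opus", ".ogg", ".mp4", ".mov", ".webm",
}
# Assumption: binary megabytes/gigabytes.
MAX_BYTES = {"free": 50 * 1024 * 1024, "paid": 1024 * 1024 * 1024}


def check_recording(name: str, size_bytes: int, tier: str = "paid") -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = Path(name).suffix.lower()  # case-insensitive extension check
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES[tier]:
        problems.append(f"too large for {tier} tier: {size_bytes} bytes")
    return problems


print(check_recording("source-call.M4A", 12_600_000))
print(check_recording("long-field-tape.wav", 80 * 1024 * 1024, tier="free"))
```

The check only looks at the file extension and size. It does not inspect the audio itself, so a corrupt file with a valid extension will still pass.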
Example output — interview transcript with speaker labels
```json
{
  "transcript": "Interviewer: Tell me about the first time you realized... Guest: Honestly, it was when my mentor pulled me aside and said...",
  "detected_language": "en",
  "duration": 1432.7,
  "segments": [
    {
      "id": 0,
      "text": "Tell me about the first time you realized you wanted to do this.",
      "start": 0.42,
      "end": 4.18,
      "speaker": "SPEAKER_00",
      "language": "en",
      "words": [
        { "word": "Tell", "start": 0.42, "end": 0.61, "speaker": "SPEAKER_00" },
        { "word": "me", "start": 0.61, "end": 0.74, "speaker": "SPEAKER_00" }
      ]
    },
    {
      "id": 1,
      "text": "Honestly, it was when my mentor pulled me aside.",
      "start": 4.86,
      "end": 8.94,
      "speaker": "SPEAKER_01",
      "language": "en",
      "words": []
    }
  ],
  "srt": "1\n00:00:00,420 --> 00:00:04,180\nTell me about the first time you realized you wanted to do this.\n\n2\n00:00:04,860 --> 00:00:08,940\nHonestly, it was when my mentor pulled me aside.",
  "vtt": "WEBVTT\n\n00:00:00.420 --> 00:00:04.180\nTell me about the first time you realized you wanted to do this.\n\n00:00:04.860 --> 00:00:08.940\nHonestly, it was when my mentor pulled me aside.",
  "speakers": ["SPEAKER_00", "SPEAKER_01"],
  "languages": ["en"],
  "fileSizeMB": 12.6,
  "success": true
}
```
Every result includes the full transcript, segment-level timestamps, word-level timestamps, language detection, recording duration in seconds, file size, ready-to-use srt and vtt subtitle strings, and (when speaker diarization is enabled) speaker labels per segment and per word.
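To turn segment-level speaker labels into attributed quotes, you can merge consecutive segments from the same speaker into turns. This is a minimal sketch that assumes a dataset item shaped like the example output. Mapping SPEAKER_00 to "Interviewer" and SPEAKER_01 to "Guest" is an illustrative assumption, and you should check it per recording. `speaker_turns` is a hypothetical helper, not part of the actor.

```python
import json

# A trimmed dataset item in the shape of the example output.
item = json.loads("""
{
  "segments": [
    {"id": 0, "text": "Tell me about the first time you realized you wanted to do this.",
     "start": 0.42, "end": 4.18, "speaker": "SPEAKER_00"},
    {"id": 1, "text": "Honestly, it was when my mentor pulled me aside.",
     "start": 4.86, "end": 8.94, "speaker": "SPEAKER_01"}
  ]
}
""")


def speaker_turns(segments, names=None):
    """Merge consecutive same-speaker segments into (name, start, end, text) turns."""
    names = names or {}
    turns = []
    for seg in segments:
        # Segments lack a speaker field when diarization is off.
        label = seg.get("speaker", "UNKNOWN")
        name = names.get(label, label)
        if turns and turns[-1][0] == name:
            # Same speaker kept talking: extend the previous turn.
            prev_name, start, _, text = turns[-1]
            turns[-1] = (prev_name, start, seg["end"], f"{text} {seg['text']}")
        else:
            turns.append((name, seg["start"], seg["end"], seg["text"]))
    return turns


# Hypothetical label-to-role mapping; verify it against the audio.
roles = {"SPEAKER_00": "Interviewer", "SPEAKER_01": "Guest"}
for name, start, end, text in speaker_turns(item["segments"], roles):
    print(f"[{start:.2f}-{end:.2f}] {name}: {text}")
```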
Built for journalists, qualitative researchers, market researchers
- 📰 Journalists — turn phone-recorded interviews into clean transcripts ready for quote pulling
- 🧪 Qualitative researchers — preserve participant voices with speaker separation for thematic analysis
- 📊 Market researchers — bulk-transcribe focus group and 1:1 interview tapes
- 📚 Oral historians — searchable, time-stamped archives of long-form interviews
- 🎙️ Podcasters publishing interview shows — transcripts for show notes, blog repurposing, and SEO
Speaker diarization (interviewer / guest separation)
Toggle the Speaker Diarization input and the actor labels every segment and every word with the speaker it came from. Labels are assigned in order of first speaking turn. SPEAKER_00 is whoever speaks first (usually the interviewer), SPEAKER_01 is the next voice (usually the first guest), SPEAKER_02 the one after that, and so on. This makes it straightforward to attribute quotes for journalism or to code qualitative data for research. Powered by pyannote-audio. Charged per audio second; only billed when enabled.
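Labels follow speaking order rather than role. A quick per-speaker talk-time tally can help confirm which label is the interviewer, who often speaks less than the guest. This is a minimal sketch over the `segments` array. The third segment below is made-up sample data, and `talk_time` is a hypothetical helper.

```python
from collections import defaultdict


def talk_time(segments):
    """Sum segment durations (end - start, in seconds) per speaker label."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.get("speaker", "UNKNOWN")] += seg["end"] - seg["start"]
    # Round to hide floating-point noise from the subtractions.
    return {label: round(secs, 2) for label, secs in totals.items()}


# Illustrative segments; the third one is invented for the example.
segments = [
    {"start": 0.42, "end": 4.18, "speaker": "SPEAKER_00"},
    {"start": 4.86, "end": 8.94, "speaker": "SPEAKER_01"},
    {"start": 9.10, "end": 12.00, "speaker": "SPEAKER_00"},
]
print(talk_time(segments))
```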
Translate foreign-language interviews to English
Toggle Translate to English and the actor returns the transcript translated into English while preserving timing — perfect for conducting interviews in your subject's native language and publishing in English. Combine with Speaker Diarization to get clean, attributed quotes in both directions. Charged separately when enabled.
SRT / VTT subtitle export
Every transcription returns ready-to-use srt and vtt subtitle strings. Save the field value as a .srt or .vtt file and:
- Publish a video version of the interview with subtitles for YouTube, Vimeo, or your CMS
- Add HTML5 `<track>` accessibility captions to embedded video (use the .vtt file, since browsers expect WebVTT in `<track>`)
- Build a searchable interview archive with timestamps
Set Timestamp Granularities to word for cue precision down to individual words.
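Saving the subtitle strings is a plain file write. This is a minimal sketch that assumes `item` is one dataset item with the `srt` and `vtt` fields shown in the example output. `save_subtitles` and its file-naming scheme are hypothetical.

```python
from pathlib import Path


def save_subtitles(item: dict, stem: str, out_dir: str = ".") -> list[Path]:
    """Write the item's `srt` and `vtt` strings to <stem>.srt and <stem>.vtt."""
    written = []
    for field in ("srt", "vtt"):
        text = item.get(field)
        if text:  # skip missing or empty subtitle fields
            path = Path(out_dir) / f"{stem}.{field}"
            path.write_text(text, encoding="utf-8")
            written.append(path)
    return written
```

For example, `save_subtitles(items[0], "interview-with-source")` writes `interview-with-source.srt` and `interview-with-source.vtt` to the current directory.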
Why interviewers choose this transcriber
- ✅ Interviewer ↔ guest separation out of the box via pyannote-audio diarization — clean attributed quotes ready for publication
- ⏱️ Word-level timestamps for every word — find any quote in a 90-minute interview in seconds
- 🌐 Translate non-English interviews to English in the same run — perfect for international journalism and cross-cultural research
- 🎬 SRT and VTT subtitles included for video versions of interviews
- 🌍 99+ languages — automatic detection, no manual selection
- 🇪🇺 EU-region processing for GDPR-aligned research workflows
- 💰 Pay per audio second — no per-minute Rev.com markups, no Otter subscription
- 🚀 10× parallel on the paid tier — an entire research project's worth of interviews done in one run
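To find a quote with the word-level timestamps, you can search `segments[].words[]` for a phrase. This is a minimal sketch that returns the start time, end time, and speaker of each match. The normalization (lower-casing and stripping punctuation except apostrophes) is an assumption, and `find_phrase` is a hypothetical helper.

```python
import re


def _norm(token: str) -> str:
    """Lower-case a token and drop punctuation and whitespace (apostrophes kept)."""
    return re.sub(r"[^\w']", "", token.lower())


def find_phrase(segments: list[dict], phrase: str) -> list[tuple]:
    """Return (start, end, speaker) for every occurrence of `phrase` in the word list."""
    target = [t for t in (_norm(p) for p in phrase.split()) if t]
    if not target:
        return []
    # Flatten word-level entries across all segments, in order.
    words = [w for seg in segments for w in seg.get("words", [])]
    tokens = [_norm(w["word"]) for w in words]
    n = len(target)
    hits = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            first, last = words[i], words[i + n - 1]
            hits.append((first["start"], last["end"], first.get("speaker")))
    return hits


segments = [{"words": [
    {"word": "Tell", "start": 0.42, "end": 0.61, "speaker": "SPEAKER_00"},
    {"word": "me", "start": 0.61, "end": 0.74, "speaker": "SPEAKER_00"},
]}]
print(find_phrase(segments, "tell me"))
```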
Use cases
- 📰 Investigative journalists transcribing source interviews and pulling attributed quotes for stories
- 🎙️ Long-form podcasters generating publication-ready transcripts of every guest interview
- 🧪 Qualitative researchers coding participant transcripts in NVivo, Atlas.ti, or MAXQDA
- 📊 Market research firms transcribing focus groups and customer 1:1 sessions for thematic analysis
- 📚 Oral history projects preserving long-form recorded interviews with timestamped speaker tracks
- 🎓 Academic researchers conducting qualitative fieldwork in foreign-language contexts (transcribe + translate in one pass)
- ✍️ Authors and biographers working from hours of recorded conversations for book material
- 🎬 Documentary filmmakers preparing rough-cut transcripts of interview tape for editing
Pricing & tiers
Pay only for the audio seconds you actually transcribe. No subscriptions, no minimums.
| FREE tier | PAID tier |
|---|---|
| Perfect for testing and small jobs | Built for production volume |
| Up to 5 interviews per run | Unlimited interviews per run |
| 50 MB max per file | 1 GB max per file |
| 200 MB / 20 minutes monthly | Unlimited monthly volume |
| 3 concurrent files | 10 concurrent files (10× parallel) |
| No credit card required | $0.0005 per audio second |
Optional add-ons (only billed when enabled):
| Feature | Price |
|---|---|
| Speaker diarization | $0.0001 per audio second |
| Translate to English | $0.0003 per audio second |
| EU-region processing | $0.0007 per audio second (replaces base $0.0005) |
A 60-minute interview with diarization on the paid tier costs approximately $2.16 ($1.80 transcription + $0.36 diarization). Compare to Rev.com's $1.50/min ($90 for the same interview).
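The arithmetic above generalizes to any length and add-on combination. This is a minimal sketch for the paid tier only. It uses the per-second rates from the tables on this page and does not model free-tier quotas or future rate changes. `estimate_cost` is a hypothetical helper.

```python
# Per-audio-second rates in USD, copied from the pricing tables on this page.
BASE_RATE = 0.0005
EU_RATE = 0.0007          # replaces BASE_RATE when EU-region processing is on
DIARIZATION_RATE = 0.0001
TRANSLATION_RATE = 0.0003


def estimate_cost(minutes: float, diarization: bool = False,
                  translate: bool = False, eu: bool = False) -> float:
    """Estimated paid-tier cost in USD for `minutes` of audio."""
    seconds = minutes * 60
    rate = EU_RATE if eu else BASE_RATE
    if diarization:
        rate += DIARIZATION_RATE
    if translate:
        rate += TRANSLATION_RATE
    return round(seconds * rate, 2)


print(estimate_cost(60, diarization=True))  # 2.16
```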
Integration examples
JavaScript / Node.js
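```javascript
// Run as an ES module (e.g. a .mjs file) so top-level await is allowed.
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('sian.agency/transcribe-interview-to-text').call({
    audioFiles: ['https://example.com/interview-with-source.m4a'],
    speakerDiarization: true,
    translateToEnglish: false,
});

// Each processed recording is one item in the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0].transcript);
console.log(items[0].srt);
```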
Python
```python
from apify_client import ApifyClient

client = ApifyClient('YOUR_APIFY_TOKEN')

# Start the actor and block until the run finishes.
run = client.actor('sian.agency/transcribe-interview-to-text').call(run_input={
    'audioFiles': ['https://example.com/interview-with-source.m4a'],
    'speakerDiarization': True,
    'translateToEnglish': False,
})

# Each processed recording is one item in the run's default dataset.
items = client.dataset(run['defaultDatasetId']).list_items().items
print(items[0]['transcript'])
print(items[0]['vtt'])
```
cURL
```bash
curl -X POST 'https://api.apify.com/v2/acts/sian.agency~transcribe-interview-to-text/run-sync-get-dataset-items?token=YOUR_APIFY_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{"audioFiles": ["https://example.com/interview.m4a"], "speakerDiarization": true}'
```
n8n / Zapier / Make
Wire this actor to a "new file in research-recordings folder" trigger (Dropbox, Google Drive, OneDrive). Each dataset item includes transcript, segments[].words[], srt, and vtt. Drop them into Notion (research database), Airtable (interview log), MAXQDA/NVivo (qualitative coding), or Google Docs (story drafts).
FAQ
How accurate is interview transcription? Powered by a production speech-to-text pipeline tuned for natural conversation. Accuracy is typically 95–99% on clean studio or quiet-room interviews, and lower on phone-recorded or noisy field interviews. Word-level timestamps are returned even when accuracy is imperfect. You can jump to each quote, verify it against the audio, and correct attributions quickly.
What audio and video formats are supported? M4A, MP3, WAV, FLAC, AAC, OPUS, OGG, MP4, MOV, WebM. Max 50 MB per file on the free tier, 1 GB on the paid tier.
Can I transcribe foreign-language interviews? Yes — auto-detection across 99+ languages including Spanish, French, German, Mandarin, Japanese, Portuguese, Arabic, Hindi, Russian, and many more. Toggle Translate to English to receive an English transcript alongside the timestamped original.
Is speaker diarization included?
Yes, opt-in via the Speaker Diarization toggle. Each segment and word is labeled SPEAKER_00, SPEAKER_01, etc. in order of first speaking turn. SPEAKER_00 is usually the interviewer and SPEAKER_01 the first guest. Powered by pyannote-audio. Billed at $0.0001 per audio second only when enabled.
How does pricing work? Pay-per-audio-second. The free tier covers small jobs and testing without a credit card. The paid tier is $0.0005 per second of audio. A 1-hour interview with diarization is approximately $2.16 — versus Rev.com at ~$90 for the same length.
Can I integrate this into my qualitative research workflow?
Yes. The actor exposes a standard Apify run/dataset API. The dataset record includes transcript, segments[].words[], srt, and vtt ready to feed into NVivo, MAXQDA, Atlas.ti, Dovetail, or any qualitative analysis tool that accepts plain text or VTT.
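For tools that import plain text, you can render the segments as one timestamped, speaker-labeled line each. This is a minimal sketch. The `[HH:MM:SS] Speaker: text` layout is an assumption, and whether a given tool (NVivo, MAXQDA, Atlas.ti, Dovetail) parses those timestamps depends on that tool. `to_plain_text` is a hypothetical helper.

```python
def to_plain_text(segments, names=None):
    """Render segments as '[HH:MM:SS] Speaker: text' lines, one per segment."""
    names = names or {}
    lines = []
    for seg in segments:
        # Truncate to whole seconds, then split into hours, minutes, seconds.
        hh, rem = divmod(int(seg["start"]), 3600)
        mm, ss = divmod(rem, 60)
        label = seg.get("speaker", "UNKNOWN")
        lines.append(f"[{hh:02d}:{mm:02d}:{ss:02d}] {names.get(label, label)}: {seg['text']}")
    return "\n".join(lines)


segments = [
    {"start": 0.42, "speaker": "SPEAKER_00",
     "text": "Tell me about the first time you realized you wanted to do this."},
    {"start": 4.86, "speaker": "SPEAKER_01",
     "text": "Honestly, it was when my mentor pulled me aside."},
]
print(to_plain_text(segments))
```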
What if my interview is multi-speaker (panel, focus group)?
Speaker diarization handles up to ~6 distinct speakers reliably. Each speaker is labeled SPEAKER_00 through SPEAKER_N in temporal order of first speaking turn.
How long does a transcription take? A 60-minute interview takes 1–3 minutes on the paid tier. Bulk batches of 10 interviews complete in 5–10 minutes (parallelized).
Legal disclaimer
Use this actor only on interviews you have rights to transcribe — your own recordings with subject consent, properly licensed media, or material covered by journalistic source agreements. Some jurisdictions require subject consent for recording; you are responsible for compliance with applicable laws and IRB requirements for academic research. The actor does not retain audio or transcripts beyond the run's lifetime. EU-region processing is available via the EU Processing toggle for GDPR-aligned workflows. SIÁN Agency provides this actor as-is.
Support
Join the Telegram support group, email support@sian-agency.online, or open an issue on the SIÁN Agency Apify Store page.
More from SIÁN Agency
Platform-specific scrapers + transcribers:
- Instagram AI Transcript Extractor
- Best TikTok AI Transcript Extractor
- YouTube Shorts AI Transcript Extractor
- Facebook AI Transcript Extractor
Browse the full SIÁN Agency Apify Store for all available actors.