YouTube Audio Segment Downloader
Pricing: from $40.00 per 1,000 audio minutes processed
Download YouTube audio tracks or selected audio segments as MP3, M4A, or WAV.
Developer: Entertained Rattlesnake (maintained by Community)
Download YouTube audio (or a specific segment), transcribe it, and push the transcript to the Apify Dataset.
The Actor supports MP3, M4A, and WAV output formats. When startTime and/or endTime are provided, the Actor downloads only the requested section using yt-dlp --download-sections instead of downloading the full audio first. Audio files are stored in the Key-Value Store; the Dataset receives one item per successful video with the same minimal schema as entertained_rattlesnake/youtube-subtitle-scraper-ai.
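As a rough illustration of the segment download described above, the sketch below assembles a yt-dlp command line that uses `--download-sections`. The function name `build_ytdlp_args` and the choice of the other flags (`--extract-audio`, `--audio-format`, `--audio-quality`) are assumptions for illustration; the Actor's actual invocation may differ.

```python
def build_ytdlp_args(url, audio_format="mp3", audio_quality="192k",
                     start=None, end=None):
    """Build a yt-dlp argument list (hypothetical helper, not the Actor's code)."""
    args = [
        "yt-dlp",
        "--extract-audio",
        "--audio-format", audio_format,
        "--audio-quality", audio_quality,
    ]
    if start is not None or end is not None:
        # yt-dlp time ranges are written as "*START-END"; "inf" means "to the end".
        section = f"*{start or '0:00'}-{end or 'inf'}"
        args += ["--download-sections", section]
    args.append(url)
    return args


if __name__ == "__main__":
    print(build_ytdlp_args(
        "https://www.youtube.com/watch?v=qb15hppy9rI",
        start="00:01:30",
        end="00:03:00",
    ))
```

When neither bound is given, no `--download-sections` flag is added and the full audio track would be fetched.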
Use this Actor only for videos you own, videos licensed for reuse, or videos where you have permission to download and process the audio.
## Features
- Download YouTube audio
- Download only a selected audio segment (`90`, `1:30`, `00:01:30`)
- Supports MP3, M4A, and WAV
- Audio files saved to Key-Value Store
- Built-in transcription via an HTTP endpoint (default: Whisper)
- Optional `[HH:MM:SS.mmm]` timestamps in the transcript
- Dataset schema compatible with `entertained_rattlesnake/youtube-subtitle-scraper-ai`
- Built-in proxy fallback (residential → datacenter → auto → none)
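The three segment time formats listed above (`90`, `1:30`, `00:01:30`) all denote the same offset. A minimal sketch of how such values could be normalized to seconds follows; `parse_time_to_seconds` is a hypothetical helper, and the Actor's own parsing rules (for example, whether fractional seconds are accepted) are not documented here.

```python
def parse_time_to_seconds(value):
    """Convert "SS", "MM:SS", or "HH:MM:SS" into a whole number of seconds."""
    parts = str(value).strip().split(":")
    if not 1 <= len(parts) <= 3 or any(not p.isdigit() for p in parts):
        raise ValueError(f"unsupported time value: {value!r}")
    seconds = 0
    for part in parts:
        # Each further colon-separated field shifts the running total by 60.
        seconds = seconds * 60 + int(part)
    return seconds


if __name__ == "__main__":
    for sample in ("90", "1:30", "00:01:30"):
        print(sample, "->", parse_time_to_seconds(sample))
```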
## Input

```json
{
  "videos": ["https://www.youtube.com/watch?v=qb15hppy9rI"],
  "format": "mp3",
  "audioQuality": "192k",
  "startTime": "00:01:30",
  "endTime": "00:03:00",
  "fileNamePrefix": "youtube-audio",
  "maxFileSizeMb": 500,
  "transcribe": true,
  "transcribeTimeoutSeconds": 600,
  "timestamps": false
}
```

- `videos`: list of YouTube URLs or 11-character video IDs.
- `timestamps`: if `true`, each line of the returned `text` is prefixed with `[HH:MM:SS.mmm]`.
- `transcribe`, `transcribeTimeoutSeconds`: control transcription behaviour. The transcription endpoint URL is hardcoded inside the Actor and is not user-configurable.
- `startTime` / `endTime`: optional segment bounds.

## Output

Each successfully transcribed video produces one Dataset item with the schema below. Failed runs are written to the Key-Value Store as `FAILED_<videoId>.json` and are NOT pushed to the Dataset (this matters for pay-per-result monetization).

| Field | Type | Description |
| --- | --- | --- |
| `videoUrl` | `string` (uri) | Canonical URL of the YouTube video. |
| `videoId` | `string` | 11-character YouTube video identifier. |
| `videoTitle` | `string \| null` | Video title as reported by YouTube. |
| `channelName` | `string \| null` | Display name of the channel that published the video. |
| `views` | `string \| null` | View count at the time of extraction. |
| `text` | `string` | Full transcript. Includes `[HH:MM:SS.mmm]` prefixes when `timestamps` is `true`. |

Example item:

```json
[
  {
    "videoUrl": "https://www.youtube.com/watch?v=qb15hppy9rI",
    "videoId": "qb15hppy9rI",
    "videoTitle": "З нами сьогодні Ігор Закотинський...",
    "channelName": "Example channel",
    "views": "1933",
    "text": "З нами сьогодні Ігор Закотинський, той самий син маминої подриги..."
  }
]
```
### Side outputs (Key-Value Store)
- `AUDIO_<videoId>_<timestamp>.<ext>`: converted audio file.
- `TRANSCRIPT_<videoId>_<timestamp>.json`: full transcription response (segments, language, duration, etc.).
- `FAILED_<videoId>.json`: failure details for items that did not produce a Dataset record.
- `SUMMARY`: run summary.
## Proxy
The Actor uses Apify Proxy automatically; there is no input field for it. Each yt-dlp call iterates through a built-in fallback chain until one succeeds:
- `RESIDENTIAL` group (preferred, because YouTube blocks datacenter IPs aggressively)
- Datacenter group (`BUYPROXIES94952`)
- Default Apify Proxy (auto)
- No proxy (last resort)
A separate proxy session is used per video so retries for the same video stick to the same IP.
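The fallback chain above can be pictured as a loop that tries each proxy option in order and stops at the first success. The sketch below is an assumption about the general shape of that logic, not the Actor's source: `run_with_proxy_fallback`, the `attempt` callback, and the label `"auto"` are hypothetical names, and `None` stands in for "no proxy".

```python
# Order mirrors the documented chain; None means "no proxy" (last resort).
FALLBACK_CHAIN = ("RESIDENTIAL", "BUYPROXIES94952", "auto", None)


def run_with_proxy_fallback(attempt, chain=FALLBACK_CHAIN):
    """Call attempt(proxy) for each option in order; return the first success.

    Returns a (proxy, result) pair. Raises RuntimeError if every option fails.
    """
    errors = []
    for proxy in chain:
        try:
            return proxy, attempt(proxy)
        except Exception as exc:  # a real implementation would narrow this
            errors.append((proxy, repr(exc)))
    raise RuntimeError(f"all proxy options failed: {errors}")


if __name__ == "__main__":
    def fake_download(proxy):
        # Simulate a blocked residential attempt followed by a success.
        if proxy == "RESIDENTIAL":
            raise ConnectionError("blocked")
        return f"downloaded via {proxy}"

    print(run_with_proxy_fallback(fake_download))
```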
## Transcription
When transcribe is true, the saved audio file is sent to the built-in transcription endpoint as multipart/form-data with a single file field. The endpoint is expected to return JSON with at least status, language, duration_seconds, text, and segments. If transcription fails, the video is treated as failed (no Dataset record).
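To illustrate how a `segments` array could be turned into transcript text with optional `[HH:MM:SS.mmm]` prefixes, here is a minimal sketch. It assumes each segment carries a `start` offset in seconds and a `text` string (a Whisper-style shape); the actual field names in the endpoint's response, and whether a space follows the prefix, are not specified in this document.

```python
def format_timestamp(seconds):
    """Render an offset in seconds as "[HH:MM:SS.mmm]"."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}]"


def render_transcript(segments, timestamps=False):
    """Join segment texts, one per line, optionally prefixed with a timestamp."""
    lines = []
    for seg in segments:
        text = seg["text"].strip()  # "start" and "text" are assumed field names
        if timestamps:
            text = f"{format_timestamp(seg['start'])} {text}"
        lines.append(text)
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "text": " Hello."},
        {"start": 90.5, "text": " Second line."},
    ]
    print(render_transcript(demo, timestamps=True))
```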
## Pricing (Pay per event)
The Actor is monetized via Apify's Pay-per-event model. Two events are emitted by the Actor and must be configured in the Apify Console (Actor → Monetization → Pay per event):
| Event ID | When it fires | Recommended price (USD) | Covers |
|---|---|---|---|
| `video-started` | Once per video, immediately after YouTube metadata is fetched successfully. | $0.05 | Fixed overhead: metadata fetch, proxy retries, KV-store writes, run startup. |
| `audio-minute-processed` | Once per successful Dataset item, with `count = ceil(durationSeconds / 60)`. Duration is taken from the transcription response when available, otherwise from the requested segment / video metadata. | $0.03 per minute | Residential proxy data transfer (~$8/GB), CPU/RAM compute, external data transfer, and transcription server time. |
Failed videos are not charged for audio-minute-processed (only video-started is charged once the metadata fetch succeeds).
### Example prices for the end user
| Audio length | Charge breakdown | Total |
|---|---|---|
| 1 minute | $0.05 + 1 × $0.03 | $0.08 |
| 5 minutes | $0.05 + 5 × $0.03 | $0.20 |
| 15 minutes | $0.05 + 15 × $0.03 | $0.50 |
| 30 minutes | $0.05 + 30 × $0.03 | $0.95 |
| 60 minutes | $0.05 + 60 × $0.03 | $1.85 |
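The totals in the table follow from one `video-started` charge plus `ceil(durationSeconds / 60)` `audio-minute-processed` charges. The sketch below reproduces that arithmetic using the recommended prices; the function names are hypothetical, and amounts are kept in integer cents to avoid floating-point rounding.

```python
import math

VIDEO_STARTED_CENTS = 5   # recommended $0.05 per video
AUDIO_MINUTE_CENTS = 3    # recommended $0.03 per started minute


def billed_minutes(duration_seconds):
    """Number of audio-minute-processed events: partial minutes round up."""
    return math.ceil(duration_seconds / 60)


def total_charge_cents(duration_seconds, succeeded=True):
    """Total charge for one video whose metadata fetch succeeded."""
    total = VIDEO_STARTED_CENTS
    if succeeded:
        # Failed videos are only charged the video-started event.
        total += billed_minutes(duration_seconds) * AUDIO_MINUTE_CENTS
    return total


if __name__ == "__main__":
    for minutes in (1, 5, 15, 30, 60):
        cents = total_charge_cents(minutes * 60)
        print(f"{minutes} min -> ${cents / 100:.2f}")
```

Because partial minutes round up, a 61-second clip is billed as 2 minutes.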
### How to configure in Apify Console
- Open the Actor → Monetization → choose Pay per event.
- Add the two events above with the exact IDs `video-started` and `audio-minute-processed` and the prices from the table.
- Optionally set a global `maxTotalChargeUsd` per run as a safety cap.
The SDK calls (`Actor.charge({ eventName, count })`) live in `src/charging.js` and are invoked from `src/processVideo.js`. If an event is not configured in the Console, the SDK silently skips the charge: the Actor still runs, but the user is not billed for that event.