YouTube Audio Segment Downloader

Pricing

from $40.00 / 1,000 audio minutes processed

Download YouTube audio tracks or selected audio segments as MP3, M4A, or WAV.

Developer

Entertained Rattlesnake

Maintained by Community

Download YouTube audio (or a specific segment), transcribe it, and push the transcript to the Apify Dataset.

The Actor supports MP3, M4A, and WAV output formats. When startTime and/or endTime are provided, the Actor downloads only the requested section using yt-dlp --download-sections instead of downloading the full audio first. Audio files are stored in the Key-Value Store; the Dataset receives one item per successful video with the same minimal schema as entertained_rattlesnake/youtube-subtitle-scraper-ai.
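The segment download described above could be wired up roughly as follows. This is a minimal sketch, not the Actor's actual code: the function and parameter names (`toClock`, `buildSectionArgs`, `startSeconds`, `endSeconds`) are hypothetical, and it assumes the bounds have already been converted to seconds. It relies only on yt-dlp's documented `--download-sections` syntax, where a leading `*` marks a time range and `inf` stands for the end of the media.

```javascript
// Format a number of seconds as HH:MM:SS for use in a yt-dlp time range.
function toClock(totalSeconds) {
  const s = Math.floor(totalSeconds);
  const pad = (n) => String(n).padStart(2, '0');
  return `${pad(Math.floor(s / 3600))}:${pad(Math.floor((s % 3600) / 60))}:${pad(s % 60)}`;
}

// Build the extra yt-dlp arguments for an optional segment.
// startSeconds / endSeconds are hypothetical names; either may be null or undefined.
function buildSectionArgs(startSeconds, endSeconds) {
  if (startSeconds == null && endSeconds == null) {
    return []; // no bounds given: download the full track
  }
  const from = toClock(startSeconds ?? 0);
  const to = endSeconds == null ? 'inf' : toClock(endSeconds);
  // The leading "*" tells yt-dlp this is a time range, not a chapter regex.
  return ['--download-sections', `*${from}-${to}`];
}

console.log(buildSectionArgs(90, 180));
```

With only a start time the range runs to `inf`, and with only an end time it starts at `00:00:00`, which matches the "startTime and/or endTime" wording.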

Use this Actor only for videos you own, videos licensed for reuse, or videos where you have permission to download and process the audio.

Features

  • Download YouTube audio
  • Download only a selected audio segment, with bounds given as seconds (90), MM:SS (1:30), or HH:MM:SS (00:01:30)
  • Supports MP3, M4A, and WAV
  • Audio files saved to Key-Value Store
  • Built-in transcription via an HTTP endpoint (default: Whisper); the endpoint URL is set inside the Actor, not through the input
  • Optional [HH:MM:SS.mmm] timestamps in the transcript
  • Dataset schema compatible with entertained_rattlesnake/youtube-subtitle-scraper-ai
  • Built-in proxy fallback (residential → datacenter → auto → none)

Input

```json
{
  "videos": [
    "https://www.youtube.com/watch?v=qb15hppy9rI"
  ],
  "format": "mp3",
  "audioQuality": "192k",
  "startTime": "00:01:30",
  "endTime": "00:03:00",
  "fileNamePrefix": "youtube-audio",
  "maxFileSizeMb": 500,
  "transcribe": true,
  "transcribeTimeoutSeconds": 600,
  "timestamps": false
}
```

- `videos` — list of YouTube URLs or 11-character video IDs.
- `timestamps` — if `true`, each line of the returned `text` is prefixed with `[HH:MM:SS.mmm]`.
- `transcribe`, `transcribeTimeoutSeconds` — control transcription behaviour. The transcription endpoint URL is hardcoded inside the Actor and is not user-configurable.
- `startTime` / `endTime` — optional segment bounds.
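
The three accepted time notations (`90`, `1:30`, `00:01:30`) all denote the same offset. The sketch below shows one way such values might be normalized to seconds and sanity-checked. It is an illustration only: `parseTimeToSeconds` and `resolveSegment` are hypothetical names, and the exact validation rules the Actor applies (for example, whether fractional seconds are allowed) are not documented here.

```javascript
// Parse "90", "1:30" or "00:01:30" into a number of seconds.
// Throws on anything else so that bad input fails early.
function parseTimeToSeconds(value) {
  const text = String(value).trim();
  if (!/^\d+(:\d{1,2}){0,2}$/.test(text)) {
    throw new Error(`Unsupported time format: "${value}"`);
  }
  // Fold the colon-separated parts left to right: ((h * 60) + m) * 60 + s.
  return text.split(':').reduce((total, part) => total * 60 + Number(part), 0);
}

// Convert the optional startTime / endTime input fields to seconds.
// Assumption: an end bound that is not after the start bound is an error.
function resolveSegment({ startTime, endTime }) {
  const start = startTime == null ? undefined : parseTimeToSeconds(startTime);
  const end = endTime == null ? undefined : parseTimeToSeconds(endTime);
  if (start !== undefined && end !== undefined && end <= start) {
    throw new Error('endTime must be later than startTime');
  }
  return { start, end };
}

console.log(resolveSegment({ startTime: '00:01:30', endTime: '00:03:00' }));
```
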

## Output

Each successfully transcribed video produces one Dataset item with the schema below. Failed videos are written to the Key-Value Store as `FAILED_<videoId>.json` and are NOT pushed to the Dataset (this matters for billing, because audio minutes are charged only for successful Dataset items).

| Field | Type | Description |
| --- | --- | --- |
| `videoUrl` | `string` (uri) | Canonical URL of the YouTube video. |
| `videoId` | `string` | 11-character YouTube video identifier. |
| `videoTitle` | `string \| null` | Video title as reported by YouTube. |
| `channelName` | `string \| null` | Display name of the channel that published the video. |
| `views` | `string \| null` | View count at the time of extraction. |
| `text` | `string` | Full transcript. Includes `[HH:MM:SS.mmm]` prefixes when `timestamps` is `true`. |

Example item:

```json
[
  {
    "videoUrl": "https://www.youtube.com/watch?v=qb15hppy9rI",
    "videoId": "qb15hppy9rI",
    "videoTitle": "З нами сьогодні Ігор Закотинський...",
    "channelName": "Example channel",
    "views": "1933",
    "text": "З нами сьогодні Ігор Закотинський, той самий син маминої подриги..."
  }
]
```

Side outputs (Key-Value Store)

  • AUDIO_<videoId>_<timestamp>.<ext> — converted audio file.
  • TRANSCRIPT_<videoId>_<timestamp>.json — full transcription response (segments, language, duration, etc.).
  • FAILED_<videoId>.json — failure details for items that did not produce a Dataset record.
  • SUMMARY — run summary.

Proxy

The Actor uses Apify Proxy automatically — there is no input field for it. Each yt-dlp call iterates through a built-in fallback chain until one succeeds:

  1. RESIDENTIAL group (preferred — YouTube blocks datacenter IPs aggressively)
  2. Datacenter group (BUYPROXIES94952)
  3. Default Apify Proxy (auto)
  4. No proxy (last resort)

A separate proxy session is used per video so retries for the same video stick to the same IP.
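
The fallback chain above amounts to "try each option in order and stop at the first success". A minimal sketch of that pattern follows. It is not the Actor's implementation: `withProxyFallback` and the `attempt` callback are hypothetical, the chain entries are just labels taken from the list above, and per-video session handling is left out.

```javascript
// Fallback order described above. The labels are illustrative; how each one
// maps to a real proxy URL is left to the caller.
const PROXY_CHAIN = ['RESIDENTIAL', 'BUYPROXIES94952', 'auto', null];

// Call `attempt(proxy)` with each entry in order and return the first success.
// `attempt` is a hypothetical async callback that runs yt-dlp with that proxy.
async function withProxyFallback(attempt, chain = PROXY_CHAIN) {
  const errors = [];
  for (const proxy of chain) {
    try {
      return { proxy, result: await attempt(proxy) };
    } catch (err) {
      // Remember why this option failed, then move on to the next one.
      errors.push(`${proxy ?? 'no proxy'}: ${err.message}`);
    }
  }
  throw new Error(`All proxy options failed:\n${errors.join('\n')}`);
}
```

Collecting the per-option errors makes the final failure message useful when every option, including the direct connection, is blocked.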

Transcription

When transcribe is true, the saved audio file is sent to the built-in transcription endpoint as multipart/form-data with a single file field. The endpoint is expected to return JSON with at least status, language, duration_seconds, text, and segments. If transcription fails, the video is treated as failed (no Dataset record).
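
To illustrate how such a response might be checked and turned into the Dataset `text` field, here is a hedged sketch. The required field names come from the paragraph above; everything else is an assumption, in particular the helper names and the shape of each segment (`{ start: <seconds>, text: <string> }`), which the endpoint may define differently.

```javascript
const REQUIRED_FIELDS = ['status', 'language', 'duration_seconds', 'text', 'segments'];

// Format a number of seconds as HH:MM:SS.mmm.
function formatTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  const h = Math.floor(totalMs / 3600000);
  const m = Math.floor((totalMs % 3600000) / 60000);
  const s = Math.floor((totalMs % 60000) / 1000);
  return `${pad(h)}:${pad(m)}:${pad(s)}.${pad(totalMs % 1000, 3)}`;
}

// Validate the response and build the transcript text.
// Assumption: each segment looks like { start: <seconds>, text: <string> }.
function buildTranscriptText(response, timestamps = false) {
  const missing = REQUIRED_FIELDS.filter((field) => !(field in response));
  if (missing.length > 0) {
    throw new Error(`Transcription response is missing: ${missing.join(', ')}`);
  }
  if (!timestamps) return response.text;
  // One line per segment, each prefixed with its start time.
  return response.segments
    .map((seg) => `[${formatTimestamp(seg.start)}] ${seg.text.trim()}`)
    .join('\n');
}
```

Throwing on a missing field fits the rule that a failed transcription means no Dataset record for that video.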

Pricing (Pay per event)

The Actor is monetized via Apify's Pay-per-event model. Two events are emitted by the Actor and must be configured in the Apify Console (Actor → Monetization → Pay per event):

| Event ID | When it fires | Recommended price (USD) | Covers |
| --- | --- | --- | --- |
| `video-started` | Once per video, immediately after YouTube metadata is fetched successfully. | $0.05 | Fixed overhead: metadata fetch, proxy retries, KV-store writes, run startup. |
| `audio-minute-processed` | Once per successful Dataset item, with `count = ceil(durationSeconds / 60)`. Duration is taken from the transcription response when available, otherwise from the requested segment / video metadata. | $0.03 per minute | Residential proxy data transfer (~$8/GB), CPU/RAM compute, external data transfer, and transcription server time. |

Failed videos are not charged for audio-minute-processed (only video-started is charged once the metadata fetch succeeds).

Example prices for the end user

| Audio length | Charge breakdown | Total |
| --- | --- | --- |
| 1 minute | $0.05 + 1 × $0.03 | $0.08 |
| 5 minutes | $0.05 + 5 × $0.03 | $0.20 |
| 15 minutes | $0.05 + 15 × $0.03 | $0.50 |
| 30 minutes | $0.05 + 30 × $0.03 | $0.95 |
| 60 minutes | $0.05 + 60 × $0.03 | $1.85 |
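
The totals above follow from one fixed charge plus a per-minute charge on the duration rounded up to whole minutes. The short sketch below reproduces them using the recommended prices; `estimateChargeUsd` is a hypothetical helper, and the amounts are kept in cents so that floating-point rounding does not distort the result.

```javascript
// Recommended prices from the event table, in cents to avoid floating-point drift.
const VIDEO_STARTED_CENTS = 5;
const AUDIO_MINUTE_CENTS = 3;

// Estimated charge in USD for one successfully processed video.
function estimateChargeUsd(durationSeconds) {
  const minutes = Math.ceil(durationSeconds / 60); // partial minutes round up
  return (VIDEO_STARTED_CENTS + minutes * AUDIO_MINUTE_CENTS) / 100;
}

for (const minutes of [1, 5, 15, 30, 60]) {
  console.log(`${minutes} min -> $${estimateChargeUsd(minutes * 60).toFixed(2)}`);
}
```

Because of the rounding, a 61-second segment is billed as 2 minutes ($0.11 at these prices).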

How to configure in Apify Console

  1. Open the Actor → Monetization → choose Pay per event.
  2. Add the two events above with the exact IDs video-started and audio-minute-processed and the prices from the table.
  3. Optionally set a global maxTotalChargeUsd per run as a safety cap.

The SDK calls (Actor.charge({ eventName, count })) live in src/charging.js and are invoked from src/processVideo.js. If an event is not configured in the Console, the SDK silently skips the charge: the Actor still runs, but it does not bill the user.
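
The contents of src/charging.js are not shown here, so the following is only a guess at what such helpers might look like. The event IDs, the `charge({ eventName, count })` call shape, and the duration fallback order come from the text above. The function names and the `actor` parameter are hypothetical; passing the object in (rather than importing the SDK) keeps the sketch runnable without any dependency.

```javascript
// Pick the duration source in the documented order: transcription response,
// then requested segment, then video metadata. Returns whole billable minutes.
function billableMinutes({ transcription, segmentSeconds, metadataSeconds }) {
  const duration = transcription?.duration_seconds ?? segmentSeconds ?? metadataSeconds;
  if (!Number.isFinite(duration) || duration <= 0) return 0;
  return Math.ceil(duration / 60);
}

// `actor` is any object exposing charge({ eventName, count }),
// for example the Apify SDK's Actor object.
async function chargeVideoStarted(actor) {
  return actor.charge({ eventName: 'video-started', count: 1 });
}

// Call only after the Dataset item was pushed, so failed videos are not billed.
async function chargeAudioMinutes(actor, durations) {
  const count = billableMinutes(durations);
  if (count === 0) return null; // nothing to bill
  return actor.charge({ eventName: 'audio-minute-processed', count });
}
```
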