YouTube Audio Downloader Free

Downloads audio from YouTube videos and stores them as files in the key-value store with metadata in the dataset.

Pricing

Pay per usage

Developer

Epic Scrapers

Last modified

4 days ago

requirements.txt

# Feel free to add your Python dependencies below. For formatting guidelines, see:
# https://pip.pypa.io/en/latest/reference/requirements-file-format/

apify >= 3.0.0, < 4.0.0
yt-dlp >= 2024.0.0

Dockerfile

# First, specify the base Docker image.
# You can see the Docker images from Apify at https://hub.docker.com/r/apify/.
# You can also use any other image from Docker Hub.
FROM apify/actor-python:3.14

# Install ffmpeg for audio extraction / format conversion
USER root
RUN apt-get update -qq \
    && apt-get install -y -qq --no-install-recommends ffmpeg \
    && rm -rf /var/lib/apt/lists/*

USER myuser

# Second, copy just requirements.txt into the Actor image,
# since it should be the only file that affects the dependency install in the next step,
# in order to speed up the build
COPY --chown=myuser:myuser requirements.txt ./

# Install the packages specified in requirements.txt.
# Print the installed Python version, the pip version,
# and all installed packages with their versions for debugging.
RUN echo "Python version:" \
 && python --version \
 && echo "Pip version:" \
 && pip --version \
 && echo "Installing dependencies:" \
 && pip install -r requirements.txt \
 && echo "All installed Python packages:" \
 && pip freeze

# Next, copy the remaining files and directories with the source code.
# Since we do this after installing the dependencies, quick build will be really fast
# for most source file changes.
COPY --chown=myuser:myuser . ./

# Use compileall to ensure the runnability of the Actor Python code.
RUN python -m compileall -q my_actor/

# Specify how to launch the source code of your Actor.
CMD ["python", "-m", "my_actor"]

.dockerignore

.git
.mise.toml
.nvim.lua
storage

# The rest is copied from https://github.com/github/gitignore/blob/main/Python.gitignore

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
.python-version

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
#   in version control.
#   https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

# Visual Studio Code
#  Ignores the folder created by VS Code when changing workspace settings, doing debugger
#  configuration, etc. Can be commented out to share Workspace Settings within a team
.vscode

# Zed editor
#  Ignores the folder created when setting Project Settings in the Zed editor. Can be commented out
#  to share Project Settings within a team
.zed

.gitignore

.mise.toml
.nvim.lua
storage

# The rest is copied from https://github.com/github/gitignore/blob/main/Python.gitignore

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
.python-version

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
#   in version control.
#   https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

# Visual Studio Code
#  Ignores the folder created by VS Code when changing workspace settings, doing debugger
#  configuration, etc. Can be commented out to share Workspace Settings within a team
.vscode

# Zed editor
#  Ignores the folder created when setting Project Settings in the Zed editor. Can be commented out
#  to share Project Settings within a team
.zed

# Added by Apify CLI
node_modules

AGENTS.md

# Apify Actors Development Guide

Important: Before you begin, fill in the `generatedBy` property in the meta section of `.actor/actor.json`. Replace it with the tool and model you're currently using, such as "Claude Code with Claude Sonnet 4.5". This helps Apify monitor and improve AGENTS.md for specific AI tools and models.

## What are Apify Actors?

- Actors are serverless programs that run in the cloud. They're inspired by the UNIX philosophy - programs that do one thing well and can be easily combined to build complex systems.
- Actors are programs packaged as Docker images that run in isolated containers.

## Core Concepts

- Accept well-defined JSON input
- Perform isolated tasks (web scraping, automation, data processing)
- Produce structured JSON output to datasets and/or store data in key-value stores
- Can run from seconds to hours or even indefinitely
- Persist state and can be restarted

## Do

- accept well-defined JSON input and produce structured JSON output
- use Apify SDK (`apify`) for code running ON Apify platform
- validate input early with proper error handling and fail gracefully
- use CheerioCrawler for static HTML content (10x faster than browsers)
- use PlaywrightCrawler only for JavaScript-heavy sites and dynamic content
- use router pattern (createCheerioRouter/createPlaywrightRouter) for complex crawls
- implement retry strategies with exponential backoff for failed requests
- use proper concurrency settings (HTTP: 10-50, Browser: 1-5)
- set sensible defaults in `.actor/input_schema.json` for all optional fields
- set up output schema in `.actor/output_schema.json`
- clean and validate data before pushing to dataset
- use semantic CSS selectors and fallback strategies for missing elements
- respect robots.txt, ToS, and implement rate limiting with delays
- check which tools (cheerio/playwright/crawlee) are installed before applying guidance
- use `Actor.log` for logging (censors sensitive data)
- implement readiness probe handler for standby Actors
- handle the `aborting` event to gracefully shut down when Actor is stopped

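The "retry strategies with exponential backoff" item above can be sketched with the standard library alone. `retry_with_backoff` and its parameters are hypothetical names chosen for illustration; they are not part of the Apify SDK or Crawlee, and a crawler class may already retry failed requests for you.

```python
import asyncio
import random
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar('T')


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    *,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
) -> T:
    """Run `operation`, retrying failures with exponentially growing delays.

    Hypothetical helper; assumes max_attempts >= 1.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # The upper bound doubles each attempt (base, 2*base, 4*base, ...),
            # capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter keeps many clients from retrying in lockstep.
            await asyncio.sleep(random.uniform(0, delay))
    raise AssertionError('unreachable when max_attempts >= 1')
```

Catching bare `Exception` keeps the sketch short; in a real Actor, retry only errors that are plausibly transient (timeouts, HTTP 429/5xx) and let validation errors fail immediately.
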
## Don't

- do not rely on `Dataset.getInfo()` for final counts on Cloud platform
- do not use browser crawlers when HTTP/Cheerio works (massive performance gains with HTTP)
- do not hard code values that should be in input schema or environment variables
- do not skip input validation or error handling
- do not overload servers - use appropriate concurrency and delays
- do not scrape prohibited content or ignore Terms of Service
- do not store personal/sensitive data unless explicitly permitted
- do not use deprecated options like `requestHandlerTimeoutMillis` on CheerioCrawler (v3.x)
- do not use `additionalHttpHeaders` - use `preNavigationHooks` instead
- do not assume that local storage is persistent or automatically synced to Apify Console - when running locally with `apify run`, the `storage/` directory is local-only and is NOT pushed to the Cloud
- do not disable standby mode (`usesStandbyMode: false`) without explicit permission

## Logging

- **ALWAYS use `Actor.log` for logging** - This logger contains critical security logic including censoring sensitive data (Apify tokens, API keys, credentials) to prevent accidental exposure in logs

### Available Log Levels

The Apify Actor logger provides the following methods for logging:

- `Actor.log.debug()` - Debug level logs (detailed diagnostic information)
- `Actor.log.info()` - Info level logs (general informational messages)
- `Actor.log.warning()` - Warning level logs (warning messages for potentially problematic situations)
- `Actor.log.error()` - Error level logs (error messages for failures)
- `Actor.log.exception()` - Exception level logs (for exceptions with stack traces)

**Best practices:**

- Use `Actor.log.debug()` for detailed operation-level diagnostics (inside functions)
- Use `Actor.log.info()` for general informational messages (API requests, successful operations)
- Use `Actor.log.warning()` for potentially problematic situations (validation failures, unexpected states)
- Use `Actor.log.error()` for actual errors and failures
- Use `Actor.log.exception()` for caught exceptions with stack traces

75
76Handle the `aborting` event to terminate the Actor quickly when stopped by user or platform, minimizing costs especially for PPU/PPE+U billing.
77
78```python
79import asyncio
80
81async def on_aborting() -> None:
82    # Persist any state, do any cleanup you need, and terminate the Actor using `await Actor.exit()` explicitly as soon as possible
83    # This will help ensure that the Actor is doing best effort to honor any potential limits on costs of a single run set by the user
84    # Wait 1 second to allow Crawlee/SDK state persistence operations to complete
85    # This is a temporary workaround until SDK implements proper state persistence in the aborting event
86    await asyncio.sleep(1)
87    await Actor.exit()
88
89Actor.on('aborting', on_aborting)
90```
91
92## Standby Mode
93
94- **NEVER disable standby mode (`usesStandbyMode: false`) in `.actor/actor.json` without explicit permission** - Actor Standby mode solves this problem by letting you have the Actor ready in the background, waiting for the incoming HTTP requests. In a sense, the Actor behaves like a real-time web server or standard API server instead of running the logic once to process everything in batch. Always keep `usesStandbyMode: true` unless there is a specific documented reason to disable it
95- **ALWAYS implement readiness probe handler for standby Actors** - Handle the `x-apify-container-server-readiness-probe` header at GET / endpoint to ensure proper Actor lifecycle management
96
97You can recognize a standby Actor by checking the `usesStandbyMode` property in `.actor/actor.json`. Only implement the readiness probe if this property is set to `true`.
98
99### Readiness Probe Implementation Example
100
101```python
102# Apify standby readiness probe
103from http.server import SimpleHTTPRequestHandler
104
105class GetHandler(SimpleHTTPRequestHandler):
106    def do_GET(self):
107        # Handle Apify standby readiness probe
108        if 'x-apify-container-server-readiness-probe' in self.headers:
109            self.send_response(200)
110            self.end_headers()
111            self.wfile.write(b'Readiness probe OK')
112            return
113
114        self.send_response(200)
115        self.end_headers()
116        self.wfile.write(b'Actor is ready')
117```
118
119Key points:
120
121- Detect the `x-apify-container-server-readiness-probe` header in incoming requests
122- Respond with HTTP 200 status code for both readiness probe and normal requests
123- This enables proper Actor lifecycle management in standby mode
124
## Commands

```bash
# Bootstrap & local development
apify create [name]                    # Create new Actor project from a template
apify init                             # Initialize Actor in current directory
apify run                              # Run Actor locally with simulated platform env
apify run --purge                      # Run after clearing previous local storage
apify validate-schema                  # Validate .actor/input_schema.json

# Authentication & account
apify login                            # Authenticate account (token stored in ~/.apify)
apify logout                           # Remove stored credentials
apify info                             # Print currently authenticated account info

# Deployment & remote execution
apify push                             # Deploy Actor to platform per .actor/actor.json
apify pull <actor>                     # Download Actor code from the platform
apify call <actor>                     # Execute Actor remotely on the platform
apify actors build <actor>             # Create a new build of an Actor
apify runs ls                          # List recent runs

# Discovery (search Apify Store for community Actors)
apify actors search "<query>" --user-agent <your-agent-name>
apify actors info <actor>              # Get details about a specific Actor

# Secrets (referenced from actor.json via "@mySecret")
apify secrets add <name> <value>       # Store a secret locally; uploaded on push
apify secrets ls                       # List stored secret keys

# Direct API access
apify api <endpoint>                   # Send an authenticated HTTP request to Apify API

# Help
apify help                             # List all commands
apify <command> --help                 # Get help for a specific command
```

Note: If no dedicated Actor exists for your target, search Apify Store for community options with `apify actors search "<query>" --user-agent <your-agent-name>` before building from scratch.

Tip: Inside a running Actor, prefer the SDK (`Actor.get_input()`, `Actor.push_data()`, `Actor.set_value()`) over the equivalent `apify actor` runtime subcommands.

## Apify Platform Environment

When the Actor runs on the Apify platform, the API token is automatically available via the `APIFY_TOKEN` environment variable (note: the variable is `APIFY_TOKEN`, not `APIFY_API_TOKEN`). The Apify SDK reads it automatically, so you do not need to pass it explicitly. Locally, run `apify login` once and the SDK will use your stored credentials.

## Safety and Permissions

Allowed without prompt:

- read files with `Actor.get_value()`
- push data with `Actor.push_data()`
- set values with `Actor.set_value()`
- enqueue requests to RequestQueue
- run locally with `apify run`

Ask first:

- npm/pip package installations
- apify push (deployment to cloud)
- proxy configuration changes (requires paid plan)
- Dockerfile changes affecting builds
- deleting datasets or key-value stores

## Project Structure

.actor/
├── actor.json # Actor config: name, version, env vars, runtime settings
├── input_schema.json # Input validation & Console form definition
└── output_schema.json # Specifies where an Actor stores its output
src/
└── main.js # Actor entry point and orchestrator
storage/ # Local-only storage for development (NOT synced to Cloud)
├── datasets/ # Output items (JSON objects)
├── key_value_stores/ # Files, config, INPUT
└── request_queues/ # Pending crawl requests
Dockerfile # Container image definition
AGENTS.md # AI agent instructions (this file)

## Local vs Cloud Storage

When running locally with `apify run`, the Apify SDK emulates Cloud storage APIs using the local `storage/` directory. This local storage behaves differently from Cloud storage:

- **Local storage is NOT persistent** - The `storage/` directory is meant for local development and testing only. Data stored there (datasets, key-value stores, request queues) exists only on your local disk.
- **Local storage is NOT automatically pushed to Apify Console** - Running `apify run` does not upload any storage data to the Apify platform. The data stays local.
- **Each local run may overwrite previous data** - The local `storage/` directory is reused between runs, but this is local-only behavior, not Cloud persistence.
- **Cloud storage only works when running on Apify platform** - After deploying with `apify push` and running the Actor in the Cloud, storage calls (`Actor.push_data()`, `Actor.set_value()`, etc.) interact with real Apify Cloud storage, which is then visible in the Apify Console.
- **To verify Actor output, deploy and run in Cloud** - Do not rely on local `storage/` contents as proof that data will appear in the Apify Console. Always test by deploying (`apify push`) and running the Actor on the platform.

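When a log line should make clear which kind of storage a run is writing to, the distinction above can be sketched as a small check. It assumes the platform sets the `APIFY_IS_AT_HOME` environment variable to a truthy value (`1` or `true`) for Cloud runs. `running_on_platform` and `describe_storage` are hypothetical helper names; the SDK may offer its own equivalent.

```python
import os
from collections.abc import Mapping


def running_on_platform(env: Mapping[str, str] = os.environ) -> bool:
    """Return True when the run appears to be on the Apify platform.

    Assumption: APIFY_IS_AT_HOME is '1' (or 'true') only for Cloud runs.
    """
    return env.get('APIFY_IS_AT_HOME', '').strip().lower() in {'1', 'true'}


def describe_storage(env: Mapping[str, str] = os.environ) -> str:
    """Describe where storage calls from this run end up."""
    if running_on_platform(env):
        return 'Apify Cloud storage (visible in Apify Console)'
    return 'local storage/ directory (not synced to the Cloud)'
```

Logging `describe_storage()` at startup is one way to avoid mistaking local `storage/` contents for Cloud output.
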
## Actor Input Schema

The input schema defines the input parameters for an Actor. It's a JSON object comprising various field types supported by the Apify platform.

### Structure

```json
{
    "title": "<INPUT-SCHEMA-TITLE>",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        /* define input fields here */
    },
    "required": []
}
```

### Example

```json
{
    "title": "E-commerce Product Scraper Input",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "startUrls": {
            "title": "Start URLs",
            "type": "array",
            "description": "URLs to start scraping from (category pages or product pages)",
            "editor": "requestListSources",
            "default": [{ "url": "https://example.com/category" }],
            "prefill": [{ "url": "https://example.com/category" }]
        },
        "followVariants": {
            "title": "Follow Product Variants",
            "type": "boolean",
            "description": "Whether to scrape product variants (different colors, sizes)",
            "default": true
        },
        "maxRequestsPerCrawl": {
            "title": "Max Requests per Crawl",
            "type": "integer",
            "description": "Maximum number of pages to scrape (0 = unlimited)",
            "default": 1000,
            "minimum": 0
        },
        "proxyConfiguration": {
            "title": "Proxy Configuration",
            "type": "object",
            "description": "Proxy settings for anti-bot protection",
            "editor": "proxy",
            "default": { "useApifyProxy": false }
        },
        "locale": {
            "title": "Locale",
            "type": "string",
            "description": "Language/country code for localized content",
            "default": "cs",
            "enum": ["cs", "en", "de", "sk"],
            "enumTitles": ["Czech", "English", "German", "Slovak"]
        }
    },
    "required": ["startUrls"]
}
```

## Actor Output Schema

The Actor output schema builds upon the schemas for the dataset and key-value store. It specifies where an Actor stores its output and defines templates for accessing that output. Apify Console uses these output definitions to display run results.

### Structure

```json
{
    "actorOutputSchemaVersion": 1,
    "title": "<OUTPUT-SCHEMA-TITLE>",
    "properties": {
        /* define your outputs here */
    }
}
```

### Example

```json
{
    "actorOutputSchemaVersion": 1,
    "title": "Output schema of the files scraper",
    "properties": {
        "files": {
            "type": "string",
            "title": "Files",
            "template": "{{links.apiDefaultKeyValueStoreUrl}}/keys"
        },
        "dataset": {
            "type": "string",
            "title": "Dataset",
            "template": "{{links.apiDefaultDatasetUrl}}/items"
        }
    }
}
```

### Output Schema Template Variables

- `links` (object) - Contains quick links to the most commonly used URLs
- `links.publicRunUrl` (string) - Public run URL in format `https://console.apify.com/view/runs/:runId`
- `links.consoleRunUrl` (string) - Console run URL in format `https://console.apify.com/actors/runs/:runId`
- `links.apiRunUrl` (string) - API run URL in format `https://api.apify.com/v2/actor-runs/:runId`
- `links.apiDefaultDatasetUrl` (string) - API URL of the default dataset in format `https://api.apify.com/v2/datasets/:defaultDatasetId`
- `links.apiDefaultKeyValueStoreUrl` (string) - API URL of the default key-value store in format `https://api.apify.com/v2/key-value-stores/:defaultKeyValueStoreId`
- `links.containerRunUrl` (string) - URL of a webserver running inside the run in format `https://<containerId>.runs.apify.net/`
- `run` (object) - Contains the same information about the run as the `GET Run` API endpoint returns
- `run.defaultDatasetId` (string) - ID of the default dataset
- `run.defaultKeyValueStoreId` (string) - ID of the default key-value store

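To see what a template such as `{{links.apiDefaultDatasetUrl}}/items` resolves to, the substitution can be imitated locally. This is only an illustration of the idea: `render_template` is a hypothetical function, the platform's actual template engine is not shown in this guide, and the dataset ID in the usage note is made up.

```python
import re
from typing import Any

# Matches {{ dotted.variable.path }} with optional inner whitespace.
TEMPLATE_VARIABLE = re.compile(r'\{\{\s*([A-Za-z0-9_.]+)\s*\}\}')


def render_template(template: str, context: dict[str, Any]) -> str:
    """Replace each {{a.b}} placeholder with context['a']['b']."""

    def lookup(match: re.Match[str]) -> str:
        value: Any = context
        for part in match.group(1).split('.'):
            value = value[part]  # raises KeyError for an unknown variable
        return str(value)

    return TEMPLATE_VARIABLE.sub(lookup, template)
```

For example, with `context = {'links': {'apiDefaultDatasetUrl': 'https://api.apify.com/v2/datasets/abc123'}}`, rendering the `dataset` template from the example yields `https://api.apify.com/v2/datasets/abc123/items`.
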
## Dataset Schema Specification

The dataset schema defines how your Actor's output data is structured, transformed, and displayed in the Output tab in the Apify Console.

### Example

Consider an example Actor that calls `Actor.push_data()` to store data in the dataset:

```python
# Dataset push example (Python)
import asyncio
from datetime import datetime
from apify import Actor

async def main():
    await Actor.init()

    # Actor code
    await Actor.push_data({
        'numericField': 10,
        'pictureUrl': 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_92x30dp.png',
        'linkUrl': 'https://google.com',
        'textField': 'Google',
        'booleanField': True,
        'dateField': datetime.now().isoformat(),
        'arrayField': ['#hello', '#world'],
        'objectField': {},
    })

    # Exit successfully
    await Actor.exit()

if __name__ == '__main__':
    asyncio.run(main())
```

To set up the Actor's output tab UI, reference a dataset schema file in `.actor/actor.json`:

```json
{
    "actorSpecification": 1,
    "name": "book-library-scraper",
    "title": "Book Library Scraper",
    "version": "1.0.0",
    "storages": {
        "dataset": "./dataset_schema.json"
    }
}
```

Then create the dataset schema in `.actor/dataset_schema.json`:

```json
{
    "actorSpecification": 1,
    "fields": {},
    "views": {
        "overview": {
            "title": "Overview",
            "transformation": {
                "fields": [
                    "pictureUrl",
                    "linkUrl",
                    "textField",
                    "booleanField",
                    "arrayField",
                    "objectField",
                    "dateField",
                    "numericField"
                ]
            },
            "display": {
                "component": "table",
                "properties": {
                    "pictureUrl": {
                        "label": "Image",
                        "format": "image"
                    },
                    "linkUrl": {
                        "label": "Link",
                        "format": "link"
                    },
                    "textField": {
                        "label": "Text",
                        "format": "text"
                    },
                    "booleanField": {
                        "label": "Boolean",
                        "format": "boolean"
                    },
                    "arrayField": {
                        "label": "Array",
                        "format": "array"
                    },
                    "objectField": {
                        "label": "Object",
                        "format": "object"
                    },
                    "dateField": {
                        "label": "Date",
                        "format": "date"
                    },
                    "numericField": {
                        "label": "Number",
                        "format": "number"
                    }
                }
            }
        }
    }
}
```

### Structure

```json
{
    "actorSpecification": 1,
    "fields": {},
    "views": {
        "<VIEW_NAME>": {
            "title": "string (required)",
            "description": "string (optional)",
            "transformation": {
                "fields": ["string (required)"],
                "unwind": ["string (optional)"],
                "flatten": ["string (optional)"],
                "omit": ["string (optional)"],
                "limit": "integer (optional)",
                "desc": "boolean (optional)"
            },
            "display": {
                "component": "table (required)",
                "properties": {
                    "<FIELD_NAME>": {
                        "label": "string (optional)",
                        "format": "text|number|date|link|boolean|image|array|object (optional)"
                    }
                }
            }
        }
    }
}
```

**Dataset Schema Properties:**

- `actorSpecification` (integer, required) - Specifies the version of dataset schema structure document (currently only version 1)
- `fields` (JSONSchema object, required) - Schema of one dataset object (use JsonSchema Draft 2020-12 or compatible)
- `views` (DatasetView object, required) - Object with API and UI views description

**DatasetView Properties:**

- `title` (string, required) - Visible in UI Output tab and API
- `description` (string, optional) - Only available in API response
- `transformation` (ViewTransformation object, required) - Data transformation applied when loading from Dataset API
- `display` (ViewDisplay object, required) - Output tab UI visualization definition

**ViewTransformation Properties:**

- `fields` (string[], required) - Fields to present in output (order matches column order)
- `unwind` (string[], optional) - Deconstructs nested children into parent object
- `flatten` (string[], optional) - Transforms nested object into flat structure
- `omit` (string[], optional) - Removes specified fields from output
- `limit` (integer, optional) - Maximum number of results (default: all)
- `desc` (boolean, optional) - Sort order (true = newest first)

**ViewDisplay Properties:**

- `component` (string, required) - Only `table` is available
- `properties` (Object, optional) - Keys matching `transformation.fields` with ViewDisplayProperty values

**ViewDisplayProperty Properties:**

- `label` (string, optional) - Table column header
- `format` (string, optional) - One of: `text`, `number`, `date`, `link`, `boolean`, `image`, `array`, `object`

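The property rules listed above can be turned into a quick local sanity check before `apify push`. The sketch below covers only the rules stated in this section (version, required keys, `table` component, display keys matching `transformation.fields`, allowed formats). `check_dataset_schema` is a hypothetical helper, not an Apify tool, and it does not replace the platform's own validation.

```python
from typing import Any

ALLOWED_FORMATS = {'text', 'number', 'date', 'link', 'boolean', 'image', 'array', 'object'}


def check_dataset_schema(schema: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: list[str] = []
    if schema.get('actorSpecification') != 1:
        problems.append('actorSpecification must be 1')
    for key in ('fields', 'views'):
        if not isinstance(schema.get(key), dict):
            problems.append(f'"{key}" is required and must be an object')

    views = schema.get('views') if isinstance(schema.get('views'), dict) else {}
    for name, view in views.items():
        if not view.get('title'):
            problems.append(f'view "{name}": "title" is required')

        fields = (view.get('transformation') or {}).get('fields')
        if not isinstance(fields, list):
            problems.append(f'view "{name}": "transformation.fields" is required')
            fields = []

        display = view.get('display') or {}
        if display.get('component') != 'table':
            problems.append(f'view "{name}": "display.component" must be "table"')

        for field, prop in (display.get('properties') or {}).items():
            if field not in fields:
                problems.append(f'view "{name}": "{field}" is not in transformation.fields')
            fmt = prop.get('format')
            if fmt is not None and fmt not in ALLOWED_FORMATS:
                problems.append(f'view "{name}": unknown format "{fmt}" for "{field}"')
    return problems
```

Loading `.actor/dataset_schema.json` with `json.load` and printing the returned list is enough to catch a misspelled field name or format early.
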
508## Key-Value Store Schema Specification
509
510The key-value store schema organizes keys into logical groups called collections for easier data management.
511
512### Example
513
514Consider an example Actor that calls `Actor.setValue()` to save records into the key-value store:
515
516```python
517# Key-Value Store set example (Python)
518import asyncio
519from apify import Actor
520
521async def main():
522    await Actor.init()
523
524    # Actor code
525    await Actor.set_value('document-1', 'my text data', content_type='text/plain')
526
527    image_id = '123'          # example placeholder
528    image_buffer = b'...'     # bytes buffer with image data
529    await Actor.set_value(f'image-{image_id}', image_buffer, content_type='image/jpeg')
530
531    # Exit successfully
532    await Actor.exit()
533
534if __name__ == '__main__':
535    asyncio.run(main())
536```

To configure the key-value store schema, reference a schema file in `.actor/actor.json`:

```json
{
    "actorSpecification": 1,
    "name": "data-collector",
    "title": "Data Collector",
    "version": "1.0.0",
    "storages": {
        "keyValueStore": "./key_value_store_schema.json"
    }
}
```

Then create the key-value store schema in `.actor/key_value_store_schema.json`:

```json
{
    "actorKeyValueStoreSchemaVersion": 1,
    "title": "Key-Value Store Schema",
    "collections": {
        "documents": {
            "title": "Documents",
            "description": "Text documents stored by the Actor",
            "keyPrefix": "document-"
        },
        "images": {
            "title": "Images",
            "description": "Images stored by the Actor",
            "keyPrefix": "image-",
            "contentTypes": ["image/jpeg"]
        }
    }
}
```
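
To make the grouping concrete, the sketch below shows how record keys could be mapped to the collections defined above. `collections_for_key` is a hypothetical helper, not part of the Apify SDK, and it assumes that `key` means an exact match and `keyPrefix` means a simple string-prefix match.

```python
# Hypothetical helper for illustration only; not an Apify SDK function.
def collections_for_key(schema: dict, key: str) -> list[str]:
    """Return the IDs of collections whose `key` or `keyPrefix` matches `key`."""
    matches = []
    for collection_id, collection in schema.get('collections', {}).items():
        exact = collection.get('key')
        prefix = collection.get('keyPrefix')
        if exact is not None and key == exact:
            matches.append(collection_id)
        elif prefix is not None and key.startswith(prefix):
            matches.append(collection_id)
    return matches


# Same collections as in the example schema above (descriptions omitted).
schema = {
    'actorKeyValueStoreSchemaVersion': 1,
    'title': 'Key-Value Store Schema',
    'collections': {
        'documents': {'title': 'Documents', 'keyPrefix': 'document-'},
        'images': {'title': 'Images', 'keyPrefix': 'image-', 'contentTypes': ['image/jpeg']},
    },
}

print(collections_for_key(schema, 'document-1'))  # ['documents']
print(collections_for_key(schema, 'image-123'))   # ['images']
print(collections_for_key(schema, 'INPUT'))       # []
```

Keys that match no collection, such as `INPUT` here, simply fall outside every group in this sketch.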

### Structure

```json
{
    "actorKeyValueStoreSchemaVersion": 1,
    "title": "string (required)",
    "description": "string (optional)",
    "collections": {
        "<COLLECTION_NAME>": {
            "title": "string (required)",
            "description": "string (optional)",
            "key": "string (conditional - use key OR keyPrefix)",
            "keyPrefix": "string (conditional - use key OR keyPrefix)",
            "contentTypes": ["string (optional)"],
            "jsonSchema": "object (optional)"
        }
    }
}
```

**Key-Value Store Schema Properties:**

- `actorKeyValueStoreSchemaVersion` (integer, required) - Version of key-value store schema structure document (currently only version 1)
- `title` (string, required) - Title of the schema
- `description` (string, optional) - Description of the schema
- `collections` (Object, required) - Object where each key is a collection ID and value is a Collection object

**Collection Properties:**

- `title` (string, required) - Collection title shown in UI tabs
- `description` (string, optional) - Description appearing in UI tooltips
- `key` (string, conditional\*) - Single specific key for this collection
- `keyPrefix` (string, conditional\*) - Prefix for keys included in this collection
- `contentTypes` (string[], optional) - Allowed content types for validation
- `jsonSchema` (object, optional) - JSON Schema Draft 07 format for `application/json` content type validation

\*Either `key` or `keyPrefix` must be specified for each collection, but not both.
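
The "exactly one of `key` or `keyPrefix`" rule is easy to get wrong when editing the schema by hand. Below is a minimal local check, assuming the schema has already been loaded into a dict; `collection_errors` is a hypothetical helper and covers only the required `title` and the key rule, not the full specification.

```python
# Hypothetical pre-flight check; the platform's own validation is authoritative.
def collection_errors(collection_id: str, collection: dict) -> list[str]:
    """Return human-readable problems found in one collection definition."""
    errors = []
    if not isinstance(collection.get('title'), str):
        errors.append(f'{collection_id}: "title" is required and must be a string')
    has_key = 'key' in collection
    has_prefix = 'keyPrefix' in collection
    # Valid only when exactly one of the two is present.
    if has_key == has_prefix:
        errors.append(f'{collection_id}: specify exactly one of "key" or "keyPrefix"')
    return errors


print(collection_errors('images', {'title': 'Images', 'keyPrefix': 'image-'}))
# []
print(collection_errors('broken', {'title': 'Broken', 'key': 'a', 'keyPrefix': 'a-'}))
# ['broken: specify exactly one of "key" or "keyPrefix"']
```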

## Actor README

**Always generate a README.md file as part of Actor development.** The README is the Actor's public landing page on Apify Store - it serves as SEO, first impression, documentation, and support page combined.

### Required: Generate README automatically

When building an Actor, always create a `README.md` in the project root. Do not wait for the user to ask for it. The README is a critical part of a complete Actor.

### README structure

Write in Markdown. Use H2 (`##`) for main sections (these become the table of contents) and H3 (`###`) for subsections. Do not use H1 - the Actor name is automatically the H1. Aim for at least 300 words.

Include these sections in order:

1. **What does [Actor name] do?** - 2-3 sentences explaining what it does, what data it extracts, and how to try it. Link to the target website. Mention Apify platform advantages (API access, scheduling, integrations, proxy rotation, monitoring).
2. **Why use [Actor name]?** - Business use cases and benefits.
3. **How to use [Actor name]** - Numbered step-by-step tutorial. Keep it simple and reassuring.
4. **Input** - Describe input fields. Reference the Input tab. Optionally include a screenshot or JSON example of the input schema.
5. **Output** - Show a simplified JSON output example. Mention "You can download the dataset in various formats such as JSON, HTML, CSV, or Excel."
6. **Data table** - If the Actor extracts data, include a table of the main data fields it outputs.
7. **Pricing / Cost estimation** - Set expectations on cost. Mention free tier limits if applicable. Frame as "How much does it cost to scrape [target site]?"
8. **Tips or Advanced options** - How to optimize runs, limit compute units, improve speed or accuracy.
9. **FAQ, disclaimers, and support** - Legality disclaimer for scrapers, known limitations, link to Issues tab for feedback, mention custom solution availability.
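
The structural rules above (no H1, H2 main sections, at least 300 words) can be checked mechanically before publishing. The following is a rough sketch, not an official Apify tool: `readme_problems` is a hypothetical helper that counts words by splitting on whitespace and ignores headings inside fenced code blocks.

```python
import re


# Hypothetical lint helper; the thresholds mirror the guidance above.
def readme_problems(markdown: str, min_words: int = 300) -> list[str]:
    """Return a list of structural problems found in a README draft."""
    headings = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith('```'):
            in_fence = not in_fence  # toggle on opening and closing fences
            continue
        if in_fence:
            continue  # '#' inside code is a comment, not a heading
        match = re.match(r'^(#{1,6})\s+(.*)', line)
        if match:
            headings.append((len(match.group(1)), match.group(2).strip()))

    problems = []
    if any(level == 1 for level, _ in headings):
        problems.append('H1 heading found; the Actor name is already the H1')
    if not any(level == 2 for level, _ in headings):
        problems.append('no H2 sections; these form the table of contents')
    word_count = len(markdown.split())
    if word_count < min_words:
        problems.append(f'only {word_count} words; aim for at least {min_words}')
    return problems


for problem in readme_problems('# Title\n\nshort text'):
    print(problem)
```

A draft that starts with `## What does [Actor name] do?` and has enough body text returns an empty list.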

### README best practices

- Write SEO-friendly headings with relevant keywords (e.g., "How to scrape [site] data" not just "Tutorial")
- Bold the most important words in the intro
- The first 25% of the README matters most - front-load the value proposition
- Match the tone to the target audience: simple language for no-code users, technical details for developers
- Include a JSON output example showing 1-2 representative items
- Reference these top Actors for README best practices: https://apify.com/apify/instagram-scraper and https://apify.com/compass/crawler-google-places
- Embed YouTube video URLs on their own line (Apify Console auto-renders them)
- Use HTML for image sizing if needed; CSS is not supported

## MCP Tools

### Apify MCP

If the Apify MCP server is configured, use these tools for documentation:

- `search-apify-docs` - Search documentation
- `fetch-apify-docs` - Get full doc pages

Otherwise, reference: `@https://mcp.apify.com/`

### Playwright MCP (debugging)

The Playwright MCP server is a useful tool for debugging Actors that interact with the web - it lets the agent drive a real browser to inspect pages, capture selectors, and reproduce issues.

Install with the Claude Code CLI:

```bash
claude mcp add playwright npx @playwright/mcp@latest
```

Or add it manually to your MCP config:

```json
{
    "mcpServers": {
        "playwright": {
            "command": "npx",
            "args": ["@playwright/mcp@latest"]
        }
    }
}
```

## Resources

- [docs.apify.com/llms.txt](https://docs.apify.com/llms.txt) - Quick reference
- [docs.apify.com/llms-full.txt](https://docs.apify.com/llms-full.txt) - Complete docs
- [crawlee.dev](https://crawlee.dev) - Crawlee documentation
- [whitepaper.actor](https://raw.githubusercontent.com/apify/actor-whitepaper/refs/heads/master/README.md) - Complete Actor specification

.actor/input_schema.json

{
  "title": "YouTube Audio Downloader Input",
  "description": "Input schema for the YouTube Audio Downloader Actor.",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "startUrls": {
      "title": "YouTube Video URLs",
      "type": "array",
      "description": "One or more YouTube video URLs to download audio from",
      "editor": "stringList",
      "prefill": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
      "minItems": 1,
      "uniqueItems": true
    },
    "audioFormat": {
      "title": "Audio Format",
      "type": "string",
      "description": "Output audio format. 'best' keeps the original container without re-encoding.",
      "default": "mp3",
      "prefill": "mp3",
      "editor": "select",
      "enum": [
        "mp3",
        "m4a",
        "opus",
        "best"
      ],
      "enumTitles": [
        "MP3 (192 kbps)",
        "M4A (AAC)",
        "Opus (smallest, good quality)",
        "Best (original format, no conversion)"
      ]
    }
  },
  "required": [
    "startUrls"
  ]
}
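
The constraints in this schema (`minItems`, `uniqueItems`, the `enum`, and the `mp3` default) can be mirrored in a small local check, which may help when the Actor code is run outside the platform. This is only a sketch: `normalize_input` and `ALLOWED_FORMATS` are hypothetical names and are not part of the Actor's source.

```python
# Hypothetical local mirror of the input schema above.
ALLOWED_FORMATS = ('mp3', 'm4a', 'opus', 'best')  # same values as the schema enum


def normalize_input(raw: dict) -> dict:
    """Validate raw Actor input and fill in the schema default for audioFormat."""
    urls = raw.get('startUrls')
    if not isinstance(urls, list) or len(urls) < 1:  # required, minItems: 1
        raise ValueError('startUrls must be a non-empty list')
    if not all(isinstance(u, str) for u in urls):
        raise ValueError('startUrls must contain only strings')
    if len(set(urls)) != len(urls):  # uniqueItems: true
        raise ValueError('startUrls must not contain duplicates')

    audio_format = raw.get('audioFormat', 'mp3')  # default: mp3
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f'audioFormat must be one of {ALLOWED_FORMATS}')
    return {'startUrls': urls, 'audioFormat': audio_format}


print(normalize_input({'startUrls': ['https://www.youtube.com/watch?v=dQw4w9WgXcQ']}))
```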

.actor/actor.json

{
	"$schema": "https://apify.com/schemas/v1/actor.ide.json",
	"actorSpecification": 1,
	"name": "youtube-audio-downloader",
	"title": "YouTube Audio Downloader",
	"description": "Downloads audio from YouTube videos and stores them as files in the key-value store with metadata in the dataset.",
	"version": "0.0",
	"buildTag": "latest",
	"meta": {
		"templateId": "python-empty",
		"generatedBy": "Claude Code with Claude Sonnet 4.5"
	},
	"dockerfile": "../Dockerfile"
}

my_actor/__init__.py

my_actor/py.typed

my_actor/main.py

"""YouTube Audio Downloader

Apify Actor that downloads audio from YouTube videos and stores the files
in the key-value store, with metadata pushed to the dataset.
"""

import asyncio
import os
import random
import tempfile
from pathlib import Path
from uuid import uuid4

from apify import Actor, ProxyConfiguration
from crawlee.events import Event
from yt_dlp import YoutubeDL

# Maximum retry attempts on YouTube bot detection
_MAX_RETRIES = 3

# Sleep range (seconds) between processing URLs to avoid rate-limiting
_SLEEP_BETWEEN_URLS = (5, 12)


def _audio_opts(audio_format: str) -> tuple[str, list | None]:
    """Return (format_spec, postprocessors) for the requested audio format."""
    if audio_format == 'mp3':
        return 'bestaudio/best', [
            {'key': 'FFmpegExtractAudio', 'preferredcodec': 'mp3', 'preferredquality': '192'},
        ]
    if audio_format == 'm4a':
        return 'bestaudio/best', [
            {'key': 'FFmpegExtractAudio', 'preferredcodec': 'm4a'},
        ]
    if audio_format == 'opus':
        # Keep opus in its native webm container – no re-encode needed.
        return 'bestaudio[ext=webm]/bestaudio/best', None
    # 'best' – keep whatever the best audio format is, no conversion.
    return 'bestaudio/best', None


def _mime_type(ext: str) -> str:
    return {
        'mp3': 'audio/mpeg',
        'm4a': 'audio/mp4',
        'webm': 'audio/webm',
        'opus': 'audio/ogg',
        'ogg': 'audio/ogg',
        'aac': 'audio/aac',
        'wav': 'audio/wav',
        'flac': 'audio/flac',
    }.get(ext, 'application/octet-stream')


async def _get_proxy_url(
    proxy_cfg: ProxyConfiguration | None,
    session_id: str,
) -> str | None:
    """Get a proxy URL for the given session, or None if no proxy configured."""
    if proxy_cfg is None:
        return None
    try:
        return await proxy_cfg.new_url(session_id=session_id)
    except Exception:
        Actor.log.warning('Failed to get proxy URL, continuing without proxy')
        return None


def _is_bot_error(error: Exception) -> bool:
    """Check if the yt-dlp error is a YouTube bot detection."""
    msg = str(error)
    # Match around the apostrophe in "you're", which may be straight or typographic
    return 'Sign in to confirm' in msg and 'not a bot' in msg


async def _download_with_retry(
    url: str,
    format_spec: str,
    postprocessors: list | None,
    tmpdir: str,
    proxy_cfg: ProxyConfiguration | None = None,
) -> dict:
    """Download audio from *url*, retrying with a new proxy on bot detection.

    Returns the yt-dlp info dict on success.
    Raises the last error after exhausting retries.
    """
    last_error: Exception | None = None

    for attempt in range(1, _MAX_RETRIES + 1):
        # Build yt-dlp options
        ydl_opts: dict = {
            'format': format_spec,
            'outtmpl': str(Path(tmpdir) / '%(title)s.%(ext)s'),
            'quiet': True,
            'no_warnings': True,
            # Don't let yt-dlp retry internally; we control retries with fresh IPs
            'retries': 0,
            'fragment_retries': 0,
            'extractor_retries': 0,
            # Timeout for socket reads during download
            'socket_timeout': 30,
            # Use the Safari player client; YouTube is less aggressive with Safari.
            # The Python API expects a dict of argument name -> list of values.
            'extractor_args': {'youtube': {'player_client': ['web_safari']}},
        }
        if postprocessors:
            ydl_opts['postprocessors'] = postprocessors

        # Each attempt (including the first) gets its own proxy session
        session_id = f'ytaudio{uuid4().hex[:8]}'
        proxy_url = await _get_proxy_url(proxy_cfg, session_id)

        if proxy_url:
            ydl_opts['proxy'] = proxy_url
            Actor.log.info(f'  Proxy session: {session_id}')
        else:
            Actor.log.info('  Proxy not configured')
        if attempt > 1:
            Actor.log.info(f'  Attempt {attempt}/{_MAX_RETRIES}')

        try:
            # Bound the wait so a stuck yt-dlp call can't block this coroutine
            # indefinitely; the worker thread itself is not cancelled on timeout.
            with YoutubeDL(ydl_opts) as ydl:
                info = await asyncio.wait_for(
                    asyncio.to_thread(ydl.extract_info, url, download=True),
                    timeout=120,
                )
            return info  # success
        except Exception as exc:
            last_error = exc
            if attempt >= _MAX_RETRIES:
                # Out of retries: give up
                raise

            error_str = str(exc)
            if _is_bot_error(exc):
                Actor.log.warning(
                    f'  Bot detected on attempt {attempt}/{_MAX_RETRIES}, '
                    f'rotating proxy and retrying…'
                )
            elif isinstance(exc, asyncio.TimeoutError):
                Actor.log.warning(f'  Timeout on attempt {attempt}/{_MAX_RETRIES}, retrying…')
            else:
                Actor.log.warning(
                    f'  Error on attempt {attempt}/{_MAX_RETRIES}: {error_str[:120]}, '
                    f'rotating proxy and retrying…'
                )
            continue

    # Shouldn't reach here, but guard against empty last_error
    raise last_error or RuntimeError(f'Failed to download {url}')


async def main() -> None:
    """Entry point for the Apify Actor."""
    async with Actor:
        # ── Platform KV store URL (for dataset links) ────────────────────
        store_id = os.environ.get('APIFY_DEFAULT_KEY_VALUE_STORE_ID')
        kv_store_base = f'https://api.apify.com/v2/key-value-stores/{store_id}/records' if store_id else None

        # ── Input ──────────────────────────────────────────────────────────
        inp = await Actor.get_input() or {}
        start_urls = inp.get('startUrls', [])
        audio_format = inp.get('audioFormat', 'mp3')

        # Validate input
        start_urls = [u.strip() for u in start_urls if isinstance(u, str) and u.strip()]
        if not start_urls:
            raise ValueError('startUrls must contain at least one valid URL')

        # ── Abort handler (graceful shutdown) ──────────────────────────────
        aborted = False

        async def on_aborting() -> None:
            nonlocal aborted
            aborted = True
            # Actor.exit() below ends the run about a second later, so an
            # in-flight download is not guaranteed to finish.
            Actor.log.info('Abort signal received – exiting shortly')
            await asyncio.sleep(1)
            await Actor.exit()

        Actor.on(Event.ABORTING, on_aborting)

        # ── Resolve format options once ────────────────────────────────────
        format_spec, postprocessors = _audio_opts(audio_format)
        Actor.log.info(f'Audio format: {audio_format}')

        # Create proxy configuration (RESIDENTIAL proxies)
        proxy_cfg = await Actor.create_proxy_configuration(
            groups=['RESIDENTIAL'],
        )
        if proxy_cfg:
            Actor.log.info('Apify proxy configured – using RESIDENTIAL proxies with retry on bot detection')
        else:
            Actor.log.warning('No proxy available – will retry without proxy (may not help against bot detection)')

        # ── Process each URL ───────────────────────────────────────────────
        for idx, url in enumerate(start_urls):
            if aborted:
                break

            Actor.log.info(f'[{idx + 1}/{len(start_urls)}] Processing: {url}')

            try:
                with tempfile.TemporaryDirectory() as tmpdir:
                    info = await _download_with_retry(
                        url=url,
                        format_spec=format_spec,
                        postprocessors=postprocessors,
                        tmpdir=tmpdir,
                        proxy_cfg=proxy_cfg,
                    )

                    video_id = info['id']
                    video_title = info.get('title', 'Unknown')
                    duration = info.get('duration', 0)
                    Actor.log.info(f'  Video: "{video_title}" ({video_id}), {duration}s')

                    # Locate the produced file, skipping partial-download leftovers
                    # that a failed attempt may have left in the same directory
                    files = [
                        p for p in Path(tmpdir).iterdir()
                        if p.is_file() and p.suffix not in ('.part', '.ytdl')
                    ]
                    if not files:
                        raise RuntimeError('No audio file was produced by yt-dlp')

                    audio_path = files[0]
                    file_ext = audio_path.suffix.lstrip('.')
                    file_size = audio_path.stat().st_size
                    Actor.log.info(f'  Downloaded: {audio_path.name} ({file_size} bytes)')

                    if aborted:
                        break

                    # Store the audio binary in the key-value store
                    kv_key = f'audio-{video_id}'
                    content_type = _mime_type(file_ext)
                    with open(audio_path, 'rb') as f:
                        await Actor.set_value(kv_key, f.read(), content_type=content_type)

                # Push metadata to the dataset
                item = {
                    'video_id': video_id,
                    'video_url': url,
                    'video_title': video_title,
                    'duration': duration,
                    'audio_format': file_ext,
                    'file_size_bytes': file_size,
                    'kv_store_key': kv_key,
                    'status': 'downloaded',
                }
                if kv_store_base:
                    item['audio_url'] = f'{kv_store_base}/{kv_key}'
                await Actor.push_data(item)
                Actor.log.info(f'  Done: {video_title}')

            except Exception as e:
                Actor.log.exception(f'  Failed: {e}')
                await Actor.push_data({
                    'video_url': url,
                    'status': 'error',
                    'error': str(e),
                })

            # Random delay between URLs to avoid rate-limiting patterns
            if not aborted:
                await _sleep_between_urls(idx, len(start_urls))

        Actor.log.info('Finished processing all URLs')


async def _sleep_between_urls(idx: int, total: int) -> None:
    """Sleep a random interval between URLs to avoid rate-limiting."""
    if idx < total - 1:  # don't sleep after the last URL
        delay = random.uniform(*_SLEEP_BETWEEN_URLS)
        Actor.log.info(f'Sleeping {delay:.1f}s before next URL to avoid rate-limiting…')
        await asyncio.sleep(delay)


if __name__ == '__main__':
    asyncio.run(main())

my_actor/__main__.py

import asyncio

from .main import main

if __name__ == '__main__':
    asyncio.run(main())

YouTube Audio Downloader Free

requirements.txt

Dockerfile

.dockerignore

.gitignore

AGENTS.md

.actor/input_schema.json

.actor/actor.json

my_actor/__init__.py

my_actor/py.typed

my_actor/main.py

my_actor/__main__.py
