# PDF to MP3 - Convert PDF, EPUB, DOCX & Text to Audiobook (`marielise.dev/pdf-to-mp3`) Actor

Convert PDF, EPUB, DOCX, Markdown, HTML, TXT, and RTF to MP3 audiobooks. Free Microsoft Edge TTS (no API key) with OCR for scanned PDFs, 70+ languages, and optional OpenAI or ElevenLabs voices. ~$0.04/min.

- **URL**: https://apify.com/marielise.dev/pdf-to-mp3.md
- **Developed by:** [Marielise](https://apify.com/marielise.dev) (community)
- **Categories:** AI, Automation, Developer tools
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $40.00 / 1,000 audio minutes generated

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Text to Audio Narrator

![TTS: Edge free + OpenAI + ElevenLabs](https://img.shields.io/badge/TTS-Edge%20free%20%2B%20OpenAI%20%2B%20ElevenLabs-2563eb)
![Formats: PDF · DOCX · EPUB · MD · TXT · HTML · RTF](https://img.shields.io/badge/formats-PDF%20·%20DOCX%20·%20EPUB%20·%20MD%20·%20TXT%20·%20HTML%20·%20RTF-444)
![OCR: scanned PDFs](https://img.shields.io/badge/OCR-scanned%20PDFs-16a34a)
![Output: MP3 audiobook](https://img.shields.io/badge/output-MP3%20audiobook-orange)

Turn any document — **PDF, DOCX, EPUB, Markdown, plain text, HTML, or RTF** — into an **MP3 audiobook** in one run. Paste a URL, upload a file, drop in raw text, or send base64 bytes. Pick a voice. Click run. Get a downloadable MP3. No prompts to chain, no manual chunking, no ffmpeg gymnastics. Just clean audio at the end.

Even **scanned / image-only PDFs** work: when a page has no text layer, the Actor automatically OCRs it (Tesseract) and narrates the recovered text.

Use the **free Edge TTS** voices by default (400+ neural voices, 70+ languages, no API key) — or bring your own OpenAI / ElevenLabs key for premium voices and steerable narration.

**Perfect for:** Anyone with a backlog of Project Gutenberg / Calibre EPUBs and zero time to read them, researchers listening to arXiv PDFs on a commute, business users turning Word DOCX reports into audio briefings, devs converting READMEs and blog drafts into audio for proofing, knowledge workers narrating long reports, accessibility-first publishers, podcast producers prototyping audiobook conversions, students reviewing textbook chapters on the go, journalists drafting voiceovers, and Substack writers exporting their newsletter to audio.

### Features

<table>
<thead>
<tr>
<th>Feature</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Seven input formats</strong></td>
<td>PDF (text-layer), DOCX (Word, via mammoth), EPUB (ebook, spine-ordered chapters), Markdown (syntax stripped), plain text, HTML (tags stripped), RTF (control codes stripped). Auto-detected from magic bytes / mimetype / extension</td>
</tr>
<tr>
<td><strong>OCR for scanned PDFs</strong></td>
<td>Pages with no selectable text layer are auto-rendered and OCR'd with Tesseract (7 languages: EN, ES, FR, DE, IT, PT, NL) so scanned books and photographed pages narrate too. Only the pages that actually need OCR are processed and billed</td>
</tr>
<tr>
<td><strong>Encrypted PDF support</strong></td>
<td>Provide the password via <code>pdfPassword</code> to decrypt and narrate password-protected PDFs</td>
</tr>
<tr>
<td><strong>SSRF-guarded URL fetch + proxy</strong></td>
<td>Document URLs are validated against private / internal address ranges before fetching. Optional Apify proxy for hosts that block datacenter IPs</td>
</tr>
<tr>
<td><strong>Four ways to provide it</strong></td>
<td>Public URL, file upload, base64 paste, or raw text paste (great for blog drafts &amp; ChatGPT replies)</td>
</tr>
<tr>
<td><strong>Free Edge TTS by default</strong></td>
<td>Microsoft Edge neural voices, no API key, no per-character cost — works out of the box</td>
</tr>
<tr>
<td><strong>One document in, one (or many) MP3s out</strong></td>
<td>Short inputs produce a single MP3. Long ones are auto-split into chapter-sized parts, plus a shareable INDEX.html page with inline players + download links</td>
</tr>
<tr>
<td><strong>Multiple TTS engines</strong></td>
<td>Edge TTS (FREE, default), OpenAI gpt-4o-mini-tts (steerable), OpenAI tts-1 / tts-1-hd, ElevenLabs Flash v2.5 / Turbo v2.5</td>
</tr>
<tr>
<td><strong>BYOK for premium models</strong></td>
<td>OpenAI and ElevenLabs models require your own API key (we never mark up the provider price)</td>
</tr>
<tr>
<td><strong>Steerable narration</strong></td>
<td>With gpt-4o-mini-tts you can prompt the voice ("calm audiobook narrator", "energetic podcast host", "slow and deliberate") without retraining</td>
</tr>
<tr>
<td><strong>Auto language detection</strong></td>
<td>Edge TTS auto-picks a matching voice based on the text language. Or force a specific Azure voice name (e.g. <code>en-US-AndrewNeural</code>)</td>
</tr>
<tr>
<td><strong>Markdown-aware</strong></td>
<td>Markdown syntax is stripped (no "asterisk asterisk bold asterisk asterisk"). Headings, lists, links, code fences, tables, and inline formatting all read as natural prose</td>
</tr>
<tr>
<td><strong>Page / section range support</strong></td>
<td>Narrate the whole document, a single chapter, or a custom slice. Works on PDFs (real pages) and all other formats (~3000-char pseudo-pages). Range syntax: <code>1-10</code>, <code>1,3,5</code>, <code>1-3,7-9</code></td>
</tr>
<tr>
<td><strong>Smart chunking</strong></td>
<td>Text is split on paragraph and sentence boundaries before TTS. Hard cuts respect word boundaries so chunks don't start mid-word</td>
</tr>
<tr>
<td><strong>Pre-flight cost preview + hard cap</strong></td>
<td>Every run prints an estimated ceiling cost before TTS starts and writes a <code>PREVIEW</code> key. Set <code>maxCostUsd</code> to abort before TTS if the estimate is too high — and it also clamps the actual audio-minute charge so the final bill never exceeds your cap</td>
</tr>
<tr>
<td><strong>Provider-adaptive concurrency</strong></td>
<td>Auto-caps parallel TTS calls per provider (ElevenLabs 2, Edge 8, OpenAI 10) so free tiers don't hit 429 storms</td>
</tr>
<tr>
<td><strong>Resume failed runs</strong></td>
<td>Already-synthesized chunks are cached. If a long run times out or fails partway, the next run picks up where it left off — no re-paying for TTS already done</td>
</tr>
<tr>
<td><strong>Skip-failed-chunks mode</strong></td>
<td>If a single chunk keeps failing after retries, skip it and keep narrating the rest of the document (configurable). Auth / quota errors always abort cleanly</td>
</tr>
<tr>
<td><strong>ffmpeg concat + ID3 tags</strong></td>
<td>Chunks are stitched into valid MP3 containers with correct duration metadata and ID3 tags (title, album, track, genre=Audiobook) so players show proper info. No "audio glitch at minute 4" bugs</td>
</tr>
<tr>
<td><strong>Transparent pricing</strong></td>
<td>Pay-per-event: a start fee, per page narrated, per audio minute, and per OCR'd page. No surcharges, no markups on BYOK providers</td>
</tr>
</tbody>
</table>

### How to Use

#### Step 1: Provide a document

Any **one** of:
- **Document URL** — any publicly reachable URL ending in `.pdf`, `.docx`, `.epub`, `.md`, `.markdown`, `.txt`, `.html`, `.htm`, or `.rtf`
- **Upload file** — drag and drop a PDF / DOCX / EPUB / MD / TXT / HTML / RTF (uploaded to the run's key-value store)
- **Base64 content** — paste raw base64 bytes; format auto-detected from magic bytes / mimetype
- **Raw text paste** — paste prose, Markdown, or HTML directly into the `text` field (perfect for blog drafts, ChatGPT replies, READMEs)

> Scanned / image-only PDF pages (no text layer) are automatically OCR'd when `enableOcr` is on (default). For encrypted / password-protected PDFs, supply `pdfPassword`.

#### Step 2: Pick a voice and model

| Model | Best for | API key required |
|-------|----------|------------------|
| `edge-tts` (default) | Free, long books, multi-language | **No key** — free |
| `openai-gpt-4o-mini-tts` | Steerable narration with style instructions | OpenAI key (BYOK) |
| `openai-tts-1` | Bulk runs, low cost, supports `speed` | OpenAI key (BYOK) |
| `openai-tts-1-hd` | High-quality OpenAI audio | OpenAI key (BYOK) |
| `elevenlabs-flash-v2_5` | Fast, real-time-quality voices | ElevenLabs key (BYOK) |
| `elevenlabs-turbo-v2_5` | Highest-quality ElevenLabs voices | ElevenLabs key (BYOK) |

#### Step 3: (Optional) Set a range, voice, speed, or instructions

- `voice` — leave blank for auto. For Edge TTS use an Azure ShortName like `en-US-AndrewNeural`, `es-ES-ElviraNeural`. For OpenAI: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`, `coral`, `sage`. For ElevenLabs: a voice ID.
- `language` — `auto` (recommended) or a specific ISO code (Edge TTS only — OpenAI voices are multilingual).
- `pageRange` — e.g. `1-10` or `1,3,5` or `1-3,7-9`. Empty = whole document. For non-PDF formats, "pages" are ~3000-char sections.
- `speed` — 0.25 to 4.0. Only applies to `openai-tts-1` / `openai-tts-1-hd`.
- `instructions` — free-form style hint for `openai-gpt-4o-mini-tts`, e.g. *"Calm, slow audiobook narrator with a neutral accent."*
- `enableOcr` — on by default. Auto-OCRs scanned PDF pages that have no text layer. Turn off to fail fast on scans instead.
- `pdfPassword` — password for encrypted PDFs.
- `proxyConfiguration` — optional Apify proxy, used only for the Document URL fetch.

#### Step 4: Run and download

The Actor:
1. Downloads / decodes / reads the input
2. Detects the format (PDF magic bytes + content-type + extension + content sniff)
3. Extracts and normalises the text (page-range aware, Markdown / HTML aware)
4. Splits into TTS-sized chunks at sentence boundaries (word-boundary safe hard cuts)
5. Synthesises each chunk with the chosen provider in parallel
6. Folds chunks into chapter-sized parts as they complete (ffmpeg concat)
7. Uploads each part to the key-value store + writes a shareable INDEX.html

You'll find the result in:
- The **dataset** — one row with metadata (`indexUrl`, `audioUrl`, `partsCount`, `parts[]`, `durationSeconds`, `chars`, `pagesProcessed`, `cost`, `status`)
- The **key-value store** — each MP3 part, the `INDEX.html` page, the `PREVIEW` estimate, and the `OUTPUT` record

### Input Reference

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `documentUrl` | string | one of | Public URL (PDF / DOCX / EPUB / MD / TXT / HTML / RTF) |
| `documentFile` | file | one of | Upload a PDF / DOCX / EPUB / MD / TXT / HTML / RTF from your device |
| `documentBase64` | string | one of | Base64-encoded document bytes |
| `text` | string | one of | Paste raw prose, Markdown, or HTML directly |
| `model` | enum | no | TTS model (default `edge-tts`, free) |
| `voice` | string | no | Edge ShortName, OpenAI voice, or ElevenLabs voice ID |
| `language` | enum | no | Auto-detect (default) or specific ISO code |
| `speed` | number | no | 0.25 to 4.0, default 1.0. tts-1 / tts-1-hd only |
| `instructions` | string | no | Free-form style for gpt-4o-mini-tts |
| `pageRange` | string | no | e.g. `1-10` or `1,3,5`. Empty = full document |
| `chunkSize` | integer | no | 500 to 4096, default 4000 (auto-clamped to 2500 for ElevenLabs) |
| `concurrency` | integer | no | 1 to 20 parallel TTS requests, default 5 (auto-clamped per provider) |
| `resume` | boolean | no | Skip already-synthesized chunks from previous runs (default true) |
| `skipFailedChunks` | boolean | no | Skip individual chunk failures instead of aborting (default true) |
| `maxPartMb` | integer | no | Max size per MP3 part, default 40MB |
| `maxCostUsd` | number | no | Hard cap (min 0.02). Aborts before TTS if the estimate exceeds it, and clamps the audio-minute charge so the final bill never exceeds the cap |
| `enableOcr` | boolean | no | OCR scanned / image-only PDF pages (default true) |
| `pdfPassword` | secret | no | Password for encrypted PDFs |
| `proxyConfiguration` | object | no | Apify proxy for the Document URL fetch |
| `openaiApiKey` | secret | **required for openai-*** | Your OpenAI API key (BYOK) |
| `elevenlabsApiKey` | secret | **required for elevenlabs-*** | Your ElevenLabs API key (BYOK) |
| `debug` | boolean | no | Verbose logs |

### Output Example

```json
{
    "indexUrl": "https://api.apify.com/v2/key-value-stores/.../records/INDEX",
    "audioUrl": "https://api.apify.com/v2/key-value-stores/.../records/narration-abc123-part001.mp3",
    "audioKvKey": "narration-abc123-part001.mp3",
    "durationSeconds": 1843.2,
    "partsCount": 2,
    "parts": [
        { "part": 1, "key": "narration-abc123-part001.mp3", "url": "https://...", "durationSeconds": 1200, "bytes": 12500000 },
        { "part": 2, "key": "narration-abc123-part002.mp3", "url": "https://...", "durationSeconds": 643.2, "bytes": 6700000 }
    ],
    "chars": 48210,
    "pagesProcessed": 24,
    "ocrPagesProcessed": 0,
    "voice": "en-US-AndrewNeural",
    "model": "edge-tts",
    "cost": 2.13,
    "status": "success",
    "chunksTotal": 13,
    "chunksSucceeded": 13,
    "chunksFailed": 0,
    "generatedAt": "2026-06-02T10:30:00.000Z"
}
````

### Use Cases

- **Any ebook → free audiobook** — drop a Project Gutenberg EPUB and listen to a full classic novel. Spine-ordered chapters narrate in the right sequence.
- **Word doc → audio briefing** — drop your DOCX report and listen on a commute instead of skimming on screen.
- **Audiobook prototyping** — convert your ebook PDF into MP3 to validate narrator tone before commissioning a human voiceover.
- **Research papers on the go** — listen to arXiv PDFs during a commute or workout.
- **README → audio** — paste your project README and listen to your own docs to spot rough explanations.
- **Blog draft proofing** — paste a Markdown blog draft and listen to it before publishing. You'll hear awkward phrasing you'd never catch reading.
- **ChatGPT reply → podcast snippet** — copy a long ChatGPT response into the `text` field and listen as audio.
- **Accessibility** — generate audio versions of internal documentation for screen-reader-light workflows.
- **Onboarding** — pipe HR PDFs (handbooks, policies) into audio for distributed teams.
- **Newsletter audio versions** — automatically narrate weekly reports (PDF, MD, or HTML) for paying subscribers.
- **Language learning** — narrate text in different voices and speeds to practise listening comprehension.
- **Substack → audio export** — export a post as HTML and narrate it for a podcast feed.

### Pricing

This Actor uses **Pay Per Event** so you only pay for the work the run actually does. No premium-voice surcharges, no provider markups.

| Event | Price (USD) | When charged |
|-------|------------:|--------------|
| `actor-start` | $0.02 | Once per run, after the document loads successfully |
| `pdf-page-narrated` | $0.05 | Once per page (PDF) or per ~3000-char section (non-PDF formats) successfully narrated |
| `audio-minute-generated` | $0.03 | Once per minute of MP3 output |
| `ocr-page-processed` | $0.10 | Only for scanned / image-only PDF pages that had no text layer and were recovered via OCR. Text-layer PDFs never pay this |

**Typical cost example** — a 20-page research paper (~40k chars, ~50 minutes of audio):

- `actor-start`: $0.02
- 20 × `pdf-page-narrated`: $1.00
- \~50 × `audio-minute-generated`: $1.50
- **Total: ~$2.52** for the full paper
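
The arithmetic above can be reproduced in a few lines. This is a minimal sketch that uses the event prices from the pricing table; the helper name `estimate_cost` is hypothetical and not part of the Actor, and how the Actor rounds fractional audio minutes is not stated here, so the sketch simply multiplies.

```python
# Event prices in USD, copied from the pricing table above.
PRICES = {
    "actor-start": 0.02,
    "pdf-page-narrated": 0.05,
    "audio-minute-generated": 0.03,
    "ocr-page-processed": 0.10,
}


def estimate_cost(pages, audio_minutes, ocr_pages=0):
    """Estimate the Apify charge for one run (hypothetical helper)."""
    total = (
        PRICES["actor-start"]
        + pages * PRICES["pdf-page-narrated"]
        + audio_minutes * PRICES["audio-minute-generated"]
        + ocr_pages * PRICES["ocr-page-processed"]
    )
    return round(total, 2)


# The 20-page paper with about 50 minutes of audio from the example above.
print(estimate_cost(pages=20, audio_minutes=50))  # 2.52
```

Provider costs for OpenAI / ElevenLabs (BYOK) are billed separately and are not included in this estimate.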

For OpenAI / ElevenLabs models, this is **all you pay Apify**. You pay the provider directly with your own API key on top — that's the whole point of BYOK: no markup.

#### Pre-flight cost preview & hard cap

Every run writes a `PREVIEW` key to the key-value store BEFORE TTS starts, with pages to process, estimated audio minutes, and the estimated ceiling cost. The same numbers are printed to the run log.

Set the optional **`maxCostUsd`** input to enforce a hard cap: if the estimate exceeds it, the run aborts cleanly before any TTS — you only pay the actor-start fee (plus any OCR already performed). The actual audio-minute charge is also clamped to the cap, so even if the produced audio runs longer than estimated (slow speech, CJK scripts) the final bill never exceeds your cap. Combine with Apify's run-level **Max total charge** as a second belt-and-suspenders limit.
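
As a rough illustration of the two behaviours described above (abort before TTS, then clamp the audio-minute charge), here is a minimal sketch. The function names and the exact clamping formula are assumptions made for illustration; the Actor's internal logic may differ.

```python
AUDIO_MINUTE_PRICE = 0.03  # USD per audio minute, from the pricing table


def preflight_ok(estimated_total, max_cost_usd=None):
    """Return False when the run should abort before any TTS (assumed rule)."""
    return max_cost_usd is None or estimated_total <= max_cost_usd


def clamped_audio_charge(other_charges, audio_minutes, max_cost_usd=None):
    """Audio-minute charge, limited so other_charges + audio stays within the cap."""
    audio = audio_minutes * AUDIO_MINUTE_PRICE
    if max_cost_usd is None:
        return round(audio, 2)
    # Whatever room is left under the cap after start and page fees.
    room = max(0.0, max_cost_usd - other_charges)
    return round(min(audio, room), 2)


# Estimate was $2.52, cap is $2.52, but the audio ran 60 minutes instead of 50.
print(preflight_ok(2.52, 2.52))
print(clamped_audio_charge(other_charges=1.02, audio_minutes=60, max_cost_usd=2.52))
```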

### BYOK — Bring Your Own Key

| Model family | Key field | Where to get it | Free tier? |
|--------------|-----------|-----------------|------------|
| `edge-tts` (default) | — | **No key needed** | Yes — completely free |
| `openai-*` | `openaiApiKey` | https://platform.openai.com/api-keys | OpenAI charges per character |
| `elevenlabs-*` | `elevenlabsApiKey` | https://elevenlabs.io/app/settings/api-keys | ElevenLabs free tier available |

The Actor never logs your keys (`isSecret: true`) and never proxies your calls through our servers. Your key talks directly to the provider from inside the Actor's run.

### FAQ

#### Which document formats are supported?

- **PDF** (`.pdf`) — text-layer PDFs extract natively; scanned / image-only pages are auto-OCR'd (Tesseract).
- **DOCX** (`.docx`) — Word documents, parsed with `mammoth`. Styles, lists, tables, footnotes handled natively.
- **EPUB** (`.epub`) — ebooks. Walked in spine order so chapters narrate in the right sequence. HTML stripped per chapter.
- **Markdown** (`.md`, `.markdown`, `.mdx`) — syntax stripped so the voice reads natural prose.
- **Plain text** (`.txt`, `.text`) — UTF-8, BOM handled.
- **HTML** (`.html`, `.htm`, `.xhtml`) — tags stripped, entities decoded, scripts and styles removed.
- **RTF** (`.rtf`) — control codes stripped, unicode escapes and hex bytes decoded.

#### Does it work on scanned PDFs?

Yes. When a PDF page has no selectable text layer, the Actor renders it (poppler `pdftoppm`) and runs OCR (Tesseract) to recover the text, then narrates it. OCR runs only on pages that need it, and those pages are billed via the `ocr-page-processed` event ($0.10/page). Built-in OCR languages: English, Spanish, French, German, Italian, Portuguese, Dutch (others fall back to English). Turn it off with `enableOcr: false` to fail fast on scans instead.

#### Does it work on password-protected PDFs?

Yes. Pass the password in the `pdfPassword` input and the Actor decrypts the PDF before extraction.

#### What about ODT, MOBI, AZW3, or Pages?

Not supported in v0.1. Convert ODT to DOCX first; for Kindle formats, convert via Calibre to EPUB.

#### Why do OpenAI / ElevenLabs models require BYOK?

So we never mark up the provider price. Pay Apify for the Actor work, pay OpenAI / ElevenLabs directly for the TTS calls. Cleaner, cheaper, more honest. For zero-key zero-friction runs, the default `edge-tts` is free and gives great quality in 70+ languages.

#### How is "page" defined for non-PDF formats?

There are no real pages, so the Actor splits the cleaned text into ~3000-char pseudo-pages, roughly the length of one PDF page of prose. This keeps `pageRange` and per-page billing fair across formats.
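
To get a feel for how many pseudo-pages a text will produce, the split can be approximated as below. This is only a sketch under the assumption that pages break at the last space before the 3000-character limit; the Actor's real splitting rule is not documented here, and `pseudo_pages` is a hypothetical name.

```python
def pseudo_pages(text, page_chars=3000):
    """Split text into sections of at most page_chars characters (assumed rule)."""
    pages = []
    start = 0
    while start < len(text):
        end = min(start + page_chars, len(text))
        if end < len(text):
            # Prefer the last space inside the window so no word is cut in half.
            cut = text.rfind(" ", start, end)
            if cut > start:
                end = cut
        page = text[start:end].strip()
        if page:
            pages.append(page)
        start = end
    return pages


sample = "word " * 1500  # 7500 characters of filler text
print(len(pseudo_pages(sample)))  # number of billable sections in this sketch
```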

#### How long can the document be?

PDFs up to 50 MB. EPUB up to 40 MB. DOCX up to 30 MB. TXT / MD / HTML / RTF up to 20 MB of decoded text. There is no hard page limit. Long inputs are auto-split into chapter-sized MP3 parts (configurable via `maxPartMb`).

#### What if my run times out or fails partway?

Re-run with the same input. The `resume` option (on by default) skips already-synthesized chunks via a shared cache, so you only pay TTS for the missing pieces.

#### Can I get word-level timestamps?

v0.1 does not emit timestamps. Coming in a future version.

#### Can I use multiple speakers / podcast mode?

Not in v0.1. Single-voice narration only.

#### What audio format is produced?

MP3, playable on any standard audio player. Edge TTS produces 24 kHz mono; OpenAI / ElevenLabs use their default high-quality output.

#### Can I override the voice with a custom ElevenLabs voice?

Yes — paste any ElevenLabs voice ID into the `voice` field when using an `elevenlabs-*` model. You can clone your own voice in your ElevenLabs account and use that ID here.

#### Is my API key safe?

Yes. API keys are marked `isSecret: true` in the input schema and are never logged or persisted.

### Built and maintained by Equipinico

Need a custom variant (different language model, custom voices, SSML support, podcast multi-speaker)? Reach out via the Apify Store contact link.

# Actor input Schema

## `documentUrl` (type: `string`):

Public URL of the document you want to narrate. Supported formats: PDF (.pdf), Word (.docx), EPUB (.epub), Markdown (.md, .markdown, .mdx), plain text (.txt), HTML (.html, .htm, .xhtml), Rich Text (.rtf). Either documentUrl, documentFile, documentBase64, or text must be provided. NOTE: scanned/image-only PDFs are supported via automatic OCR (see enableOcr). Encrypted/password-protected PDFs are supported via the pdfPassword input. Encrypted DOCX files are not supported.
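
Before starting a run it can help to check the input locally. The sketch below encodes two rules stated in this document: at least one document source must be present, and `openai-*` / `elevenlabs-*` models need their matching key. The helper `check_input` is hypothetical and runs on the client side only; the Actor performs its own validation.

```python
SOURCE_FIELDS = ("documentUrl", "documentFile", "documentBase64", "text")


def check_input(run_input):
    """Return a list of problems found in the input dict (empty list means OK)."""
    problems = []
    if not any(run_input.get(field) for field in SOURCE_FIELDS):
        problems.append("provide one of: " + ", ".join(SOURCE_FIELDS))

    model = run_input.get("model", "edge-tts")  # edge-tts is the documented default
    if model.startswith("openai-") and not run_input.get("openaiApiKey"):
        problems.append("openaiApiKey is required for openai-* models")
    if model.startswith("elevenlabs-") and not run_input.get("elevenlabsApiKey"):
        problems.append("elevenlabsApiKey is required for elevenlabs-* models")
    return problems


print(check_input({"documentUrl": "https://example.com/document.pdf"}))
print(check_input({"text": "Hello", "model": "openai-tts-1"}))
```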

## `documentFile` (type: `array`):

Drag and drop or select a local file. Accepted: .pdf, .docx, .epub, .md, .markdown, .txt, .html, .htm, .rtf. Uploaded to a key-value store and narrated. If multiple are provided, only the first is used.

## `documentBase64` (type: `string`):

Alternative input (e.g. for the CLI/API): base64-encoded document bytes. Useful when the document is not publicly hosted and you are not using the upload widget. Format auto-detected from magic bytes / content.
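
A minimal sketch of preparing this field from a local file, using only the Python standard library. The helper name `build_base64_input` is hypothetical, and it assumes the field expects plain base64 without a data-URI prefix.

```python
import base64
from pathlib import Path


def build_base64_input(path):
    """Read a local document and return an input dict with documentBase64 set."""
    data = Path(path).read_bytes()
    # b64encode returns bytes; decode to str so the dict is JSON-serializable.
    return {"documentBase64": base64.b64encode(data).decode("ascii")}
```

The returned dict can be merged with other fields such as `model` or `pageRange` and passed as the Actor input.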

## `text` (type: `string`):

Paste prose, Markdown, or HTML directly. Quickest way to narrate a blog post draft, a long ChatGPT reply, or a README. Auto-detected as Markdown if it contains heading or list syntax.

## `model` (type: `string`):

Which text-to-speech engine to use. edge-tts is FREE (Microsoft Edge neural voices, no API key needed, no per-character cost) and the recommended default for long books. All OpenAI and ElevenLabs models require BYOK (Bring Your Own Key): you pay the provider directly using your own API key, in addition to this actor's small per-page fee.

## `language` (type: `string`):

Language of the text. 'Auto-detect' reads the content and picks a matching voice automatically (recommended). Pick a specific language to force it. Only used for Edge TTS, where each voice is locked to one language; OpenAI voices are multilingual and follow the text. Ignored if you set an explicit Voice below.

## `voice` (type: `string`):

Leave blank to auto-pick by language (recommended).

• Edge TTS (free) — Azure ShortName. Popular examples:

- English: en-US-AndrewNeural, en-US-AvaNeural, en-GB-SoniaNeural, en-AU-NatashaNeural
- Spanish: es-ES-ElviraNeural, es-MX-DaliaNeural
- French: fr-FR-DeniseNeural, fr-CA-SylvieNeural
- German: de-DE-KatjaNeural
- Italian: it-IT-ElsaNeural
- Portuguese: pt-BR-FranciscaNeural, pt-PT-RaquelNeural
- 400+ voices total covering 70+ languages.

• OpenAI: alloy, echo, fable, onyx, nova, shimmer, coral, sage.

• ElevenLabs: a voice ID (e.g. 21m00Tcm4TlvDq8ikWAM for Rachel).

## `speed` (type: `number`):

Playback speed multiplier. 1.0 = normal pace. Range 0.25 to 4.0. Only applies to OpenAI tts-1 / tts-1-hd. gpt-4o-mini-tts ignores this (use instructions instead).

## `instructions` (type: `string`):

Free-form style guidance for the gpt-4o-mini-tts model, e.g. 'Calm, slow audiobook narrator' or 'Energetic podcast host'. Ignored by other models.

## `pageRange` (type: `string`):

Optional 1-indexed range. For PDFs this is the actual page range. For DOCX / EPUB / TXT / MD / HTML / RTF, the cleaned text is split into ~3000-char pseudo-pages so the same range syntax still works (EPUBs are walked in spine order). Examples: '1-10', '1,3,5', '1-3,7-9'. Leave empty for the full document.
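
The range syntax can be expanded into page numbers as sketched below. This is an illustrative parser, not the Actor's own; in particular, silently dropping pages beyond the document length is an assumption, and `parse_page_range` is a hypothetical name.

```python
def parse_page_range(spec, total_pages):
    """Expand a spec such as '1-3,7-9' into a sorted list of 1-indexed pages."""
    if not spec.strip():
        # Empty spec means the full document.
        return list(range(1, total_pages + 1))

    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
        else:
            first = last = int(part)
        for page in range(first, last + 1):
            if 1 <= page <= total_pages:  # assumption: out-of-range pages are ignored
                pages.add(page)
    return sorted(pages)


print(parse_page_range("1-3,7-9", total_pages=10))  # [1, 2, 3, 7, 8, 9]
```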

## `enableOcr` (type: `boolean`):

When a PDF page has no selectable text layer (scanned documents, photographed pages), run it through OCR (Tesseract) to recover the text and narrate it. Only pages that actually need OCR are processed and billed (ocr-page-processed event). Leave on for 'just works' behavior; turn off to fail fast on scans instead.

## `pdfPassword` (type: `string`):

Password to decrypt a password-protected PDF before extraction. Leave blank for normal PDFs.

## `chunkSize` (type: `integer`):

Characters per TTS request. OpenAI accepts up to ~4096. ElevenLabs caps at 2500 (free / starter plans) - the actor auto-clamps to 2500 for ElevenLabs models. Smaller chunks recover better from errors but cost the same in total.
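
To illustrate what a chunk is, here is a simplified sketch of sentence-based chunking with word-boundary hard cuts. It is an approximation written for this page, not the Actor's implementation; `chunk_text` and the sentence regex are assumptions.

```python
import re


def chunk_text(text, chunk_size=4000):
    """Pack sentences into chunks of at most chunk_size characters."""
    # Split into paragraphs first, then into sentences after ., ! or ?
    sentences = []
    for paragraph in re.split(r"\n\s*\n", text):
        sentences.extend(s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s)

    chunks, current = [], ""
    for sentence in sentences:
        # A sentence longer than the limit is hard-cut at the last space that fits.
        while len(sentence) > chunk_size:
            cut = sentence.rfind(" ", 0, chunk_size)
            cut = cut if cut > 0 else chunk_size
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:cut])
            sentence = sentence[cut:].lstrip()
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(chunk_text("One. Two! Three?", chunk_size=10))  # ['One. Two!', 'Three?']
```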

## `concurrency` (type: `integer`):

How many TTS chunks to synthesize in parallel. 5 is a safe default. Auto-clamped per provider (Edge 8, OpenAI 10, ElevenLabs 2) to avoid rate limits. Range: 1-20.
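
The clamping rules for `concurrency` and `chunkSize` can be summarised in code. The sketch below uses the limits quoted in this document (Edge 8, OpenAI 10, ElevenLabs 2, and 2500 characters for ElevenLabs); the helper `effective_settings` and the way the provider is derived from the model name are assumptions.

```python
PROVIDER_CONCURRENCY_CAPS = {"edge": 8, "openai": 10, "elevenlabs": 2}
ELEVENLABS_MAX_CHUNK = 2500


def effective_settings(model, concurrency=5, chunk_size=4000):
    """Return the values a run would effectively use after clamping (sketch)."""
    provider = model.split("-", 1)[0]  # 'edge-tts' -> 'edge', 'openai-tts-1' -> 'openai'
    cap = PROVIDER_CONCURRENCY_CAPS.get(provider, 20)

    concurrency = max(1, min(concurrency, 20, cap))  # documented range is 1-20
    chunk_size = max(500, min(chunk_size, 4096))     # documented range is 500-4096
    if provider == "elevenlabs":
        chunk_size = min(chunk_size, ELEVENLABS_MAX_CHUNK)
    return {"concurrency": concurrency, "chunkSize": chunk_size}


print(effective_settings("elevenlabs-flash-v2_5"))
print(effective_settings("edge-tts", concurrency=20))
```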

## `resume` (type: `boolean`):

If a previous run for the same document + voice + model failed or timed out, skip already-synthesized chunks and continue. Uses a named key-value store called 'pdf-audio-cache' shared across runs. Disable to force a full re-narration.

## `skipFailedChunks` (type: `boolean`):

If a single chunk keeps failing after retries (e.g. malformed text), skip it with a warning and keep narrating the rest of the document. The output status becomes 'partial' and failed chunk indexes are listed. Auth/quota errors always abort regardless, since every chunk would fail. Disable to stop on the first failed chunk.

## `maxPartMb` (type: `integer`):

Long documents are split into multiple MP3 parts so each stays a manageable, chapter-sized file. When a part reaches this size it is finalized and uploaded, and disk is freed. Lower values = more, smaller files. Range 1-500.

## `maxCostUsd` (type: `number`):

Hard cap on actor charges. If the pre-flight estimate exceeds this number the run aborts BEFORE any TTS happens (you only pay the actor-start fee plus any OCR already done). The actual audio-minute charge is also clamped to this cap, so the final bill never exceeds it even if the produced audio runs longer than estimated. Must be at least 0.02. Leave blank to disable. Caps the actor's PPE charges only - your OpenAI / ElevenLabs provider costs (BYOK) are separate and not bounded by this.

## `openaiApiKey` (type: `string`):

Your own OpenAI API key. REQUIRED whenever you pick an openai-\* model (tts-1, tts-1-hd, gpt-4o-mini-tts). You pay OpenAI directly for the TTS calls and we charge a small per-page actor fee on top. Not needed for Edge TTS (free) or ElevenLabs models. Get a key at https://platform.openai.com/api-keys

## `elevenlabsApiKey` (type: `string`):

Your ElevenLabs API key. REQUIRED whenever you pick an elevenlabs-\* model. You pay ElevenLabs directly for the audio generation and we charge a small per-page actor fee on top. Get a key at https://elevenlabs.io/app/settings/api-keys

## `proxyConfiguration` (type: `object`):

Optional proxy used only when fetching from a Document URL. Helps with hosts that block datacenter IPs. Ignored for uploaded files, base64, and raw text.

## `debug` (type: `boolean`):

Enable verbose logging for troubleshooting.

## Actor input object example

```json
{
  "documentUrl": "https://example.com/document.pdf",
  "text": "# Chapter 1\n\nIt was the best of times, it was the worst of times...",
  "model": "edge-tts",
  "language": "auto",
  "voice": "en-US-AndrewNeural",
  "speed": 1,
  "instructions": "Speak slowly and deliberately, like an audiobook narrator.",
  "pageRange": "1-10",
  "enableOcr": true,
  "chunkSize": 4000,
  "concurrency": 5,
  "resume": true,
  "skipFailedChunks": true,
  "maxPartMb": 40,
  "maxCostUsd": 5,
  "proxyConfiguration": {
    "useApifyProxy": false
  },
  "debug": false
}
```

# Actor output Schema

## `audio` (type: `string`):

Audio file metadata view: indexUrl, audioUrl, durationSeconds, partsCount, chars, pagesProcessed, voice, model, cost, status.

## `indexPage` (type: `string`):

Shareable single-page overview with inline players + download links for every audio part.

## `preview` (type: `string`):

Pre-flight estimate written before TTS starts: pages to process, estimated audio minutes, and ceiling cost in USD.

## `keyValueStore` (type: `string`):

Raw key-value store containing every MP3 part, the INDEX.html, the PREVIEW, and the OUTPUT record.
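
Once a run finishes, the download links can be collected from the dataset item. The sketch below relies on the `parts`, `part`, `url`, and `audioUrl` fields shown in the README's output example; whether single-part runs always include `parts` is an assumption, so the helper (hypothetical name `part_urls`) falls back to `audioUrl`.

```python
def part_urls(item):
    """Return the MP3 URLs of a result item, ordered by part number."""
    parts = item.get("parts") or []
    if parts:
        return [p["url"] for p in sorted(parts, key=lambda p: p["part"])]
    # Fallback for items without a parts list (assumed shape).
    return [item["audioUrl"]] if item.get("audioUrl") else []


example_item = {
    "audioUrl": "https://example.com/part001.mp3",  # placeholder URLs
    "parts": [
        {"part": 2, "url": "https://example.com/part002.mp3"},
        {"part": 1, "url": "https://example.com/part001.mp3"},
    ],
}
print(part_urls(example_item))
```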

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "documentUrl": "https://www.orimi.com/pdf-test.pdf",
    "instructions": "Calm, clear audiobook narrator with a neutral accent."
};

// Run the Actor and wait for it to finish
const run = await client.actor("marielise.dev/pdf-to-mp3").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "documentUrl": "https://www.orimi.com/pdf-test.pdf",
    "instructions": "Calm, clear audiobook narrator with a neutral accent.",
}

# Run the Actor and wait for it to finish
run = client.actor("marielise.dev/pdf-to-mp3").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "documentUrl": "https://www.orimi.com/pdf-test.pdf",
  "instructions": "Calm, clear audiobook narrator with a neutral accent."
}' |
apify call marielise.dev/pdf-to-mp3 --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=marielise.dev/pdf-to-mp3",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "PDF to MP3 - Convert PDF, EPUB, DOCX & Text to Audiobook",
        "description": "Convert PDF, EPUB, DOCX, Markdown, HTML, TXT, and RTF to MP3 audiobooks. Free Microsoft Edge TTS (no API key) with OCR for scanned PDFs, 70+ languages, and optional OpenAI or ElevenLabs voices. ~$0.04/min.",
        "version": "0.0",
        "x-build-id": "9ItByvFGFeXIoxx7l"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/marielise.dev~pdf-to-mp3/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-marielise.dev-pdf-to-mp3",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/marielise.dev~pdf-to-mp3/runs": {
            "post": {
                "operationId": "runs-sync-marielise.dev-pdf-to-mp3",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/marielise.dev~pdf-to-mp3/run-sync": {
            "post": {
                "operationId": "run-sync-marielise.dev-pdf-to-mp3",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "documentUrl": {
                        "title": "Document URL",
                        "type": "string",
                        "description": "Public URL of the document you want to narrate. Supported formats: PDF (.pdf), Word (.docx), EPUB (.epub), Markdown (.md, .markdown, .mdx), plain text (.txt), HTML (.html, .htm, .xhtml), Rich Text (.rtf). Either documentUrl, documentFile, documentBase64, or text must be provided. NOTE: scanned/image-only PDFs are supported via automatic OCR (see enableOcr). Encrypted/password-protected PDFs are supported via the pdfPassword input. Encrypted DOCX files are not supported."
                    },
                    "documentFile": {
                        "title": "Or upload a file from your device",
                        "type": "array",
                        "description": "Drag and drop or select a local file. Accepted: .pdf, .docx, .epub, .md, .markdown, .txt, .html, .htm, .rtf. Uploaded to a key-value store and narrated. If multiple are provided, only the first is used."
                    },
                    "documentBase64": {
                        "title": "Or paste Base64-encoded document",
                        "type": "string",
                        "description": "Alternative input (e.g. for the CLI/API): base64-encoded document bytes. Useful when the document is not publicly hosted and you are not using the upload widget. Format auto-detected from magic bytes / content."
                    },
                    "text": {
                        "title": "Or paste raw text / Markdown",
                        "type": "string",
                        "description": "Paste prose, Markdown, or HTML directly. Quickest way to narrate a blog post draft, a long ChatGPT reply, or a README. Auto-detected as Markdown if it contains heading or list syntax."
                    },
                    "model": {
                        "title": "TTS Model",
                        "enum": [
                            "edge-tts",
                            "openai-gpt-4o-mini-tts",
                            "openai-tts-1",
                            "openai-tts-1-hd",
                            "elevenlabs-flash-v2_5",
                            "elevenlabs-turbo-v2_5"
                        ],
                        "type": "string",
                        "description": "Which text-to-speech engine to use. edge-tts is FREE (Microsoft Edge neural voices, no API key needed, no per-character cost) and the recommended default for long books. All OpenAI and ElevenLabs models require BYOK (Bring Your Own Key): you pay the provider directly using your own API key, in addition to this actor's small per-page fee.",
                        "default": "edge-tts"
                    },
                    "language": {
                        "title": "Language",
                        "enum": [
                            "auto",
                            "en",
                            "es",
                            "fr",
                            "de",
                            "it",
                            "pt",
                            "nl",
                            "ru",
                            "pl",
                            "tr",
                            "ar",
                            "zh",
                            "ja",
                            "ko",
                            "hi",
                            "uk",
                            "sv",
                            "no",
                            "da",
                            "fi",
                            "cs",
                            "el",
                            "he",
                            "th",
                            "vi",
                            "id",
                            "ro",
                            "hu"
                        ],
                        "type": "string",
                        "description": "Language of the text. 'Auto-detect' reads the content and picks a matching voice automatically (recommended). Pick a specific language to force it. Only used for Edge TTS, where each voice is locked to one language; OpenAI voices are multilingual and follow the text. Ignored if you set an explicit Voice below.",
                        "default": "auto"
                    },
                    "voice": {
                        "title": "Voice (optional, overrides language)",
                        "type": "string",
                        "description": "Leave blank to auto-pick by language (recommended).\n\n• Edge TTS (free) — Azure ShortName. Popular examples:\n  - English: en-US-AndrewNeural, en-US-AvaNeural, en-GB-SoniaNeural, en-AU-NatashaNeural\n  - Spanish: es-ES-ElviraNeural, es-MX-DaliaNeural\n  - French: fr-FR-DeniseNeural, fr-CA-SylvieNeural\n  - German: de-DE-KatjaNeural\n  - Italian: it-IT-ElsaNeural\n  - Portuguese: pt-BR-FranciscaNeural, pt-PT-RaquelNeural\n  - 400+ voices total covering 70+ languages.\n\n• OpenAI: alloy, echo, fable, onyx, nova, shimmer, coral, sage.\n\n• ElevenLabs: a voice ID (e.g. 21m00Tcm4TlvDq8ikWAM for Rachel)."
                    },
                    "speed": {
                        "title": "Speech Speed",
                        "minimum": 0.25,
                        "maximum": 4,
                        "type": "number",
                        "description": "Playback speed multiplier. 1.0 = normal pace. Range 0.25 to 4.0. Only applies to OpenAI tts-1 / tts-1-hd. gpt-4o-mini-tts ignores this (use instructions instead).",
                        "default": 1
                    },
                    "instructions": {
                        "title": "Voice Instructions (gpt-4o-mini-tts only)",
                        "type": "string",
                        "description": "Free-form style guidance for the gpt-4o-mini-tts model, e.g. 'Calm, slow audiobook narrator' or 'Energetic podcast host'. Ignored by other models."
                    },
                    "pageRange": {
                        "title": "Page / Section Range",
                        "type": "string",
                        "description": "Optional 1-indexed range. For PDFs this is the actual page range. For DOCX / EPUB / TXT / MD / HTML / RTF, the cleaned text is split into ~3000-char pseudo-pages so the same range syntax still works (EPUBs are walked in spine order). Examples: '1-10', '1,3,5', '1-3,7-9'. Leave empty for the full document."
                    },
                    "enableOcr": {
                        "title": "OCR scanned / image-only PDFs",
                        "type": "boolean",
                        "description": "When a PDF page has no selectable text layer (scanned documents, photographed pages), run it through OCR (Tesseract) to recover the text and narrate it. Only pages that actually need OCR are processed and billed (ocr-page-processed event). Leave on for 'just works' behavior; turn off to fail fast on scans instead.",
                        "default": true
                    },
                    "pdfPassword": {
                        "title": "PDF Password (for encrypted PDFs)",
                        "type": "string",
                        "description": "Password to decrypt a password-protected PDF before extraction. Leave blank for normal PDFs."
                    },
                    "chunkSize": {
                        "title": "Chunk Size (characters)",
                        "minimum": 500,
                        "maximum": 4096,
                        "type": "integer",
                        "description": "Characters per TTS request. OpenAI accepts up to ~4096. ElevenLabs caps at 2500 (free / starter plans) - the actor auto-clamps to 2500 for ElevenLabs models. Smaller chunks recover better from errors but cost the same in total.",
                        "default": 4000
                    },
                    "concurrency": {
                        "title": "Parallel TTS Requests",
                        "minimum": 1,
                        "maximum": 20,
                        "type": "integer",
                        "description": "How many TTS chunks to synthesize in parallel. 5 is a safe default. Auto-clamped per provider (Edge 8, OpenAI 10, ElevenLabs 2) to avoid rate limits. Range: 1-20.",
                        "default": 5
                    },
                    "resume": {
                        "title": "Resume from previous run (recommended for books)",
                        "type": "boolean",
                        "description": "If a previous run for the same document + voice + model failed or timed out, skip already-synthesized chunks and continue. Uses a named key-value store called 'pdf-audio-cache' shared across runs. Disable to force a full re-narration.",
                        "default": true
                    },
                    "skipFailedChunks": {
                        "title": "Skip failed chunks instead of aborting",
                        "type": "boolean",
                        "description": "If a single chunk keeps failing after retries (e.g. malformed text), skip it with a warning and keep narrating the rest of the document. The output status becomes 'partial' and failed chunk indexes are listed. Auth/quota errors always abort regardless, since every chunk would fail. Disable to stop on the first failed chunk.",
                        "default": true
                    },
                    "maxPartMb": {
                        "title": "Max size per audio part (MB)",
                        "minimum": 1,
                        "maximum": 500,
                        "type": "integer",
                        "description": "Long documents are split into multiple MP3 parts so each stays a manageable, chapter-sized file. When a part reaches this size it is finalized and uploaded, and disk is freed. Lower values = more, smaller files. Range 1-500.",
                        "default": 40
                    },
                    "maxCostUsd": {
                        "title": "Max cost ceiling (USD, optional)",
                        "minimum": 0.02,
                        "type": "number",
                        "description": "Hard cap on actor charges. If the pre-flight estimate exceeds this number the run aborts BEFORE any TTS happens (you only pay the actor-start fee plus any OCR already done). The actual audio-minute charge is also clamped to this cap, so the final bill never exceeds it even if the produced audio runs longer than estimated. Must be at least 0.02. Leave blank to disable. Caps the actor's PPE charges only - your OpenAI / ElevenLabs provider costs (BYOK) are separate and not bounded by this."
                    },
                    "openaiApiKey": {
                        "title": "OpenAI API Key (REQUIRED for OpenAI models)",
                        "type": "string",
                        "description": "Your own OpenAI API key. REQUIRED whenever you pick an openai-* model (tts-1, tts-1-hd, gpt-4o-mini-tts). You pay OpenAI directly for the TTS calls and we charge a small per-page actor fee on top. Not needed for Edge TTS (free) or ElevenLabs models. Get a key at https://platform.openai.com/api-keys"
                    },
                    "elevenlabsApiKey": {
                        "title": "ElevenLabs API Key (REQUIRED for ElevenLabs models)",
                        "type": "string",
                        "description": "Your ElevenLabs API key. REQUIRED whenever you pick an elevenlabs-* model. You pay ElevenLabs directly for the audio generation and we charge a small per-page actor fee on top. Get a key at https://elevenlabs.io/app/settings/api-keys"
                    },
                    "proxyConfiguration": {
                        "title": "Proxy (for Document URL fetch)",
                        "type": "object",
                        "description": "Optional proxy used only when fetching from a Document URL. Helps with hosts that block datacenter IPs. Ignored for uploaded files, base64, and raw text.",
                        "default": {
                            "useApifyProxy": false
                        }
                    },
                    "debug": {
                        "title": "Debug Mode",
                        "type": "boolean",
                        "description": "Enable verbose logging for troubleshooting.",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
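If you prefer to call the REST endpoint from the specification directly rather than use a client library, the request can be assembled with the Python standard library alone. The following is a minimal sketch under stated assumptions: the helper names `check_input` and `build_request` and the `NUMERIC_BOUNDS` table are hypothetical, and the bounds are copied by hand from the `inputSchema` above, so the server remains the source of truth for validation.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "marielise.dev~pdf-to-mp3"

# (minimum, maximum) bounds copied from the inputSchema above; None means unbounded.
NUMERIC_BOUNDS = {
    "speed": (0.25, 4),
    "chunkSize": (500, 4096),
    "concurrency": (1, 20),
    "maxPartMb": (1, 500),
    "maxCostUsd": (0.02, None),
}


def check_input(actor_input):
    """Return a list of problems found locally; an empty list means none were found."""
    problems = []
    sources = ("documentUrl", "documentFile", "documentBase64", "text")
    if not any(actor_input.get(key) for key in sources):
        problems.append("one of " + ", ".join(sources) + " must be provided")
    for key, (low, high) in NUMERIC_BOUNDS.items():
        if key not in actor_input:
            continue
        value = actor_input[key]
        if value < low or (high is not None and value > high):
            problems.append(f"{key}={value} is outside the allowed range")
    return problems


def build_request(actor_input, token):
    """Build (but do not send) the POST request for run-sync-get-dataset-items."""
    problems = check_input(actor_input)
    if problems:
        raise ValueError("; ".join(problems))
    # The specification declares the token as a required query parameter.
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_request(
        {"documentUrl": "https://www.orimi.com/pdf-test.pdf", "pageRange": "1-3"},
        token="<YOUR_API_TOKEN>",
    )
    # Sending the request starts a billable run, so it is left commented out:
    # with urllib.request.urlopen(request) as response:
    #     items = json.loads(response.read())
    print(request.get_method(), request.full_url)
```

Checking the ranges locally only saves a round trip for obviously invalid values. It does not replace the Actor's own validation or the `maxCostUsd` ceiling.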