Convert PDF, EPUB, DOCX, Markdown, HTML, TXT, and RTF to MP3 audiobooks. Free Microsoft Edge TTS (no API key) with OCR for scanned PDFs, 70+ languages, and optional OpenAI or ElevenLabs voices. About $0.04 per audio minute.
All notable changes to this Actor will be documented in this file.
[0.1.1] - 2026-06-08
Production-hardening: OCR, modern PDF engine, security, billing safety
New capabilities
OCR fallback for scanned / image-only PDFs — pages with no text layer are
rendered automatically (poppler pdftoppm), OCR'd (Tesseract: EN, ES, FR, DE, IT,
PT, NL), and narrated. A new ocr-page-processed billing event ($0.10/page) is
charged only for pages that actually need OCR. Toggle with enableOcr.
Encrypted PDF support — decrypt password-protected PDFs via pdfPassword.
Proxy support — optional proxyConfiguration for the Document URL fetch.
ID3 tags on every MP3 part (title / album / track / genre=Audiobook).
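A minimal TypeScript sketch of how the per-page OCR charge could be derived. The `PageText` shape and the `pagesNeedingOcr` and `ocrChargeUsd` helpers are hypothetical names, not the Actor's code; only the $0.10/page rate and the enableOcr toggle come from the entry above. It assumes a page needs OCR when its text layer is empty or whitespace-only.

```typescript
// Hypothetical sketch: decide which pages need OCR and what that would cost.
// Only the $0.10/page rate and the enableOcr toggle come from the changelog.
const OCR_PRICE_PER_PAGE_USD = 0.10;

interface PageText {
  pageNumber: number; // 1-based page number
  text: string;       // text-layer content; empty for scanned pages
}

// Assumption: a page needs OCR when its text layer is empty or whitespace.
function pagesNeedingOcr(pages: PageText[], enableOcr: boolean): number[] {
  if (!enableOcr) return [];
  return pages
    .filter((p) => p.text.trim().length === 0)
    .map((p) => p.pageNumber);
}

// Work in whole cents to avoid floating-point drift, then convert to USD.
function ocrChargeUsd(ocrPageCount: number): number {
  const centsPerPage = Math.round(OCR_PRICE_PER_PAGE_USD * 100);
  return (ocrPageCount * centsPerPage) / 100;
}
```

With this shape, a document whose pages all carry a text layer yields an empty list and therefore no OCR charge.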
Engineering
Replaced pdf-parse (2018 PDF.js, unmaintained) with unpdf — current,
serverless-friendly PDF.js. Per-page text is now read by direct array index,
so extraction no longer depends on a render hook firing in page order.
Real unit test suite (Node test runner) for chunking, page-range parsing,
SSRF address checks, format detection, strippers, key sanitization, voice +
OCR-language mapping. ESLint flat config added.
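As an illustration of page-range parsing combined with direct array indexing, here is a small TypeScript sketch. The `parsePageRange` and `selectPages` helpers and the "1-3,5" range syntax are assumptions made for this example; the changelog does not specify the Actor's actual syntax or function names.

```typescript
// Hypothetical sketch: parse a page-range string and pick pages by index.
// The "1-3,5" syntax is an assumption; the changelog does not define it.
function parsePageRange(spec: string, totalPages: number): number[] {
  const selected = new Set<number>();
  for (const part of spec.split(",")) {
    const match = /^\s*(\d+)\s*(?:-\s*(\d+)\s*)?$/.exec(part);
    if (!match) throw new Error(`Invalid page range: "${part}"`);
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start) {
      throw new Error(`Invalid page range: "${part}"`);
    }
    // Clamp to the document length instead of failing on overshoot.
    for (let page = start; page <= Math.min(end, totalPages); page++) {
      selected.add(page);
    }
  }
  return Array.from(selected).sort((a, b) => a - b);
}

// pages[i] holds the text of page i + 1, so selection is a plain array index.
function selectPages(pages: string[], spec: string): string[] {
  return parsePageRange(spec, pages.length).map((n) => pages[n - 1]);
}
```

Because each page's text sits at a fixed index, selecting pages cannot drift out of order the way a callback-driven extraction could.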
Security
SSRF guard on Document URL fetch: rejects non-http(s) schemes and any host
resolving to private / loopback / link-local / CGNAT ranges (incl. cloud
metadata 169.254.169.254). Redirects are followed manually and re-validated
at every hop.
Added .dockerignore so .env / secrets / local state never enter image layers.
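The address checks described above can be sketched roughly as follows. This is an illustrative TypeScript example, not the Actor's implementation: `isBlockedIpv4`, `ipv4ToInt`, and `isAllowedScheme` are hypothetical names, and the sketch covers IPv4 only. A real guard would also need to apply the check to every address a hostname resolves to (including IPv6) and repeat it at each redirect hop, as the entry notes.

```typescript
// Hypothetical sketch of an IPv4 blocklist check for an SSRF guard.
// Ranges: private (RFC 1918), loopback, link-local, CGNAT (RFC 6598).
const BLOCKED_IPV4_CIDRS: Array<[string, number]> = [
  ["10.0.0.0", 8],     // private
  ["100.64.0.0", 10],  // CGNAT
  ["127.0.0.0", 8],    // loopback
  ["169.254.0.0", 16], // link-local, includes 169.254.169.254
  ["172.16.0.0", 12],  // private
  ["192.168.0.0", 16], // private
];

// Convert dotted-quad notation to an unsigned integer, or null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

function isBlockedIpv4(ip: string): boolean {
  const addr = ipv4ToInt(ip);
  if (addr === null) return true; // fail closed on unparseable input
  return BLOCKED_IPV4_CIDRS.some(([base, prefix]) => {
    const baseInt = ipv4ToInt(base) as number;
    const size = Math.pow(2, 32 - prefix); // addresses in the block
    return addr >= baseInt && addr < baseInt + size;
  });
}

// Only http and https URLs are allowed through.
function isAllowedScheme(url: string): boolean {
  const { protocol } = new URL(url);
  return protocol === "http:" || protocol === "https:";
}
```

Failing closed on malformed input is a deliberate choice in this sketch: an address that cannot be parsed is treated as blocked, not allowed.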
Billing safety
maxCostUsd now also clamps the actual audio-minute charge (not just the
pre-flight estimate), so slow-speech / CJK runs can't bill past the cap.
maxCostUsd: 0 (or below the $0.02 floor) is now rejected instead of silently
disabling the cap.
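A minimal TypeScript sketch of the cap behaviour described above. The function names are hypothetical, and the sketch treats the roughly $0.04/min figure quoted at the top of this file as the per-audio-minute rate, which is an assumption; the $0.02 floor comes from the entry itself.

```typescript
// Hypothetical sketch of the cap logic; names and rate usage are assumptions.
// The $0.04/min rate and $0.02 floor are the figures quoted in this file.
const PRICE_PER_MINUTE_USD = 0.04;
const MIN_COST_CAP_USD = 0.02;

// Reject caps of 0 or below the floor instead of treating them as "no cap".
function validateMaxCostUsd(maxCostUsd: number): number {
  if (!Number.isFinite(maxCostUsd) || maxCostUsd < MIN_COST_CAP_USD) {
    throw new Error(
      `maxCostUsd must be at least $${MIN_COST_CAP_USD.toFixed(2)}`
    );
  }
  return maxCostUsd;
}

// Clamp the actual audio-minute charge, not just the pre-flight estimate.
function clampedAudioChargeUsd(audioMinutes: number, maxCostUsd: number): number {
  const cap = validateMaxCostUsd(maxCostUsd);
  const rawCharge = audioMinutes * PRICE_PER_MINUTE_USD;
  return Math.min(rawCharge, cap);
}
```

For example, a run that produces 100 audio minutes under a $1.00 cap would be charged $1.00 in this sketch, not the uncapped $4.00.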
[0.1.0] - 2026-06-02
Initial public release as Text to Audio Narrator
Multi-format document narration: PDF, Markdown, plain text, and HTML in, MP3 out.
Supported inputs (7 formats)
PDF (.pdf) — native text-layer extraction (no OCR)
DOCX (.docx) — Word documents via mammoth: styles, lists, tables, footnotes
EPUB (.epub) — ebooks via epub2: spine-ordered chapter walk, HTML stripped per chapter
Markdown (.md, .markdown, .mdx) — syntax stripped before TTS so the voice reads natural prose
Plain text (.txt, .text) — UTF-8 with BOM handling
HTML (.html, .htm, .xhtml) — tags stripped, entities decoded