LibriVox Audiobooks Scraper

Pricing: from $10.00 / 1,000 result items

Pull free public domain audiobooks from LibriVox: title, author, narrator, language, runtime, chapter count, genre, copyright year, description, RSS feed, and MP3 download URLs. Export to JSON, CSV, or Excel for educators, podcasters, language learners, and audio content libraries.

Rating: 0.0 (0)

Developer: ParseForge (maintained by Community)

Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified a day ago

📚 LibriVox Audiobooks Scraper

🚀 Export the world's largest public-domain audiobook library in seconds. Browse 20,000+ free audiobooks from LibriVox, filter by title, author, language, or genre, and pull every section's reader credit, runtime, and direct audio URL. No login, no manual catalog scrape.

🕒 Last updated: 2026-05-23 · 📊 23 fields per record · 📚 20,000+ audiobooks · 🎤 100k+ volunteer readings · 🌍 multilingual catalog

The LibriVox Audiobooks Scraper queries the LibriVox catalog and returns 23 structured fields per audiobook, including the title, author list, primary author, language, copyright year, runtime in human-readable and seconds form, description, genres, translators, plus direct links to the LibriVox page, RSS podcast feed, ZIP download, Internet Archive page, and original Project Gutenberg text source. Optional extended mode adds the full sections list with per-section reader credits, individual playtimes, and per-file audio URLs.

The catalog includes classic literature read aloud (Project Gutenberg titles), original LibriVox productions, multilingual works, and short-story collections. Most readings are in English but the project covers dozens of other languages including French, German, Spanish, Italian, Dutch, Portuguese, Latin, Japanese, and Mandarin. This Actor turns the catalog into clean CSV, Excel, JSON, or XML in under five minutes.

🎯 Target Audience | 💡 Primary Use Cases
Audiobook app developers, podcast networks, education and EdTech teams, accessibility specialists, public libraries, audio content curators | Stock audiobook apps with free content, classroom listening assignments, accessibility libraries for visually impaired users, podcast feed generation, language-learning audio decks

📋 What the LibriVox Audiobooks Scraper does

Four filters plus an extended-mode toggle, combinable in a single run:

  • 🔎 Title substring search. Case-insensitive title match, for example pride or monte cristo.
  • ✍️ Author substring search. Match by author last name or full name.
  • 🌐 Language filter. Filter to a specific LibriVox language (English, French, German, Spanish, Japanese, etc.).
  • 🎭 Genre filter. Pick one of 28 LibriVox genres (Romance, Crime & Mystery, Philosophy, Poetry, and more).
  • 📑 Extended mode toggle. When enabled, each record carries the full sections list with reader credits, runtimes, and audio URLs.

Each record includes the LibriVox ID, title, full author list, primary author, language, copyright year, section count, total runtime (human-readable and seconds), full description (plain text and HTML), genres, translators when applicable, and the canonical LibriVox page, RSS podcast feed, ZIP archive URL, Project Gutenberg source link, Internet Archive page, and any other reference URLs.

💡 Why it matters: LibriVox is the canonical free audiobook archive, but its catalog page is paginated and its section structure is nested under each book. Building your own crawler means walking thousands of pages and threading section lookups. This Actor returns everything as flat structured rows ready for a database or content platform.
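
As a rough illustration of "flat structured rows ready for a database", the sketch below loads records into an in-memory SQLite table. The table name, the column selection, and the sample record are assumptions made for this example; only the field names come from the schema in this README.

```python
import json
import sqlite3

# Hypothetical sample shaped like the Output schema; values are illustrative.
SAMPLE = [
    {
        "librivoxId": "100",
        "title": "Pride and Prejudice",
        "primaryAuthor": "Jane Austen",
        "language": "English",
        "totalTimeSeconds": 41746,
        "genres": ["General Fiction", "Romance"],
        "urlRss": "https://librivox.org/rss/100",
    }
]


def load_records(conn, records):
    """Insert audiobook records into an `audiobooks` table, one row per book."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS audiobooks (
               librivox_id TEXT PRIMARY KEY,
               title TEXT,
               primary_author TEXT,
               language TEXT,
               total_seconds INTEGER,
               genres_json TEXT,
               url_rss TEXT
           )"""
    )
    rows = [
        (
            r["librivoxId"],
            r.get("title"),
            r.get("primaryAuthor"),
            r.get("language"),
            r.get("totalTimeSeconds"),
            json.dumps(r.get("genres", [])),  # arrays are stored as JSON text
            r.get("urlRss"),
        )
        for r in records
    ]
    # INSERT OR REPLACE keeps repeated loads idempotent on librivox_id.
    conn.executemany(
        "INSERT OR REPLACE INTO audiobooks VALUES (?, ?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
inserted = load_records(conn, SAMPLE)
```

Keying the table on librivoxId means a later run can be loaded over an earlier one without creating duplicate rows.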


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded audiobook catalog.


⚙️ Input

Input | Type | Default | Behavior
maxItems | integer | 10 | Audiobooks to return. Free plan caps at 10, paid plan at 1,000,000.
title | string | "" | Case-insensitive title substring.
author | string | "" | Author last-name substring.
language | string | "" | Language name as used by LibriVox.
genre | string | "" | One of 28 LibriVox genres.
extended | boolean | true | When true, include sections list with reader credits and audio URLs.

Example: 50 Jane Austen audiobooks in English.

{
  "maxItems": 50,
  "author": "austen",
  "language": "English",
  "extended": true
}

Example: 20 French-language poetry audiobooks with full section list.

{
  "maxItems": 20,
  "language": "French",
  "genre": "Poetry",
  "extended": true
}
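
If you assemble inputs in code rather than by hand, a small helper can keep them tidy. The sketch below is one possible approach, not part of the Actor: `build_input` and `FILTER_KEYS` are hypothetical names, and the rule of dropping empty filters is an assumption based on the empty-string defaults in the input table.

```python
# Filter names taken from the input table; the helper itself is hypothetical.
FILTER_KEYS = {"title", "author", "language", "genre"}


def build_input(max_items=10, extended=True, **filters):
    """Return an input dict, dropping empty filters and rejecting unknown keys."""
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    unknown = set(filters) - FILTER_KEYS
    if unknown:
        raise ValueError(f"unknown input keys: {sorted(unknown)}")
    run_input = {"maxItems": max_items, "extended": bool(extended)}
    # An empty string means "no filter" in the input table, so omit blanks.
    run_input.update({k: v.strip() for k, v in filters.items() if v and v.strip()})
    return run_input


austen = build_input(50, author="austen", language="English")
poetry = build_input(20, language="French", genre="Poetry")
```

The two calls reproduce the example inputs shown above.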

⚠️ Good to Know: LibriVox sections are individual chapters or tracks read by volunteer narrators. When extended is true the per-section reader credit, length, and audio file URL are exposed for every track in the book. Expect 10-100 sections per long novel.
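
Because sections are nested under each book, per-track work usually starts by flattening them. The following is a minimal sketch that assumes each section carries the chapter, reader, playtime, and audioUrl keys shown in the schema example; the output key names bookTitle and track are made up for this illustration.

```python
def flatten_sections(record):
    """Yield one flat dict per section, carrying the parent book's ID and title."""
    # `sections` may be missing or empty when extended mode is off.
    for index, section in enumerate(record.get("sections") or [], start=1):
        yield {
            "librivoxId": record.get("librivoxId"),
            "bookTitle": record.get("title"),
            "track": index,
            "chapter": section.get("chapter"),
            "reader": section.get("reader"),
            "playtime": section.get("playtime"),
            "audioUrl": section.get("audioUrl"),
        }


# Hypothetical record; the audio URL is a placeholder, not a real file.
book = {
    "librivoxId": "100",
    "title": "Pride and Prejudice",
    "sections": [
        {
            "chapter": "Chapter 1",
            "reader": "Karen Savage",
            "playtime": "00:14:23",
            "audioUrl": "https://example.org/chapter01.mp3",
        }
    ],
}
tracks = list(flatten_sections(book))
```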


📊 Output

Each audiobook record contains 23 fields. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

Field | Type | Example
🆔 librivoxId | string | "100"
📚 title | string | "Pride and Prejudice"
✍️ authors | array | [{"first_name":"Jane","last_name":"Austen"}]
👤 primaryAuthor | string | "Jane Austen"
🌐 language | string | "English"
📅 copyrightYear | string | "1813"
🔢 numSections | number | 61
⏱️ totalTime | string | "11:35:46"
⏲️ totalTimeSeconds | number | 41746
📝 description | string | "Pride and Prejudice is the second novel by Jane Austen..."
📄 descriptionHtml | string | "<p>Pride and Prejudice is the second novel..."
🎭 genres | array | ["General Fiction", "Romance"]
🌍 translators | array | []
🔗 urlLibrivox | string | "https://librivox.org/pride-and-prejudice-by-jane-austen/"
📻 urlRss | string | "https://librivox.org/rss/100"
📦 urlZipFile | string | "https://www.archive.org/.../pride_and_prejudice_64kb_mp3.zip"
🌐 urlProject | string or null | "https://www.gutenberg.org/ebooks/1342"
🔗 urlOther | string or null | null
📚 urlInternetArchive | string | "https://archive.org/details/pride_and_prejudice_0809_librivox"
📖 urlTextSource | string or null | "https://www.gutenberg.org/files/1342/1342-h/1342-h.htm"
🔢 sectionsCount | number | 61
🎤 sections | array | [{"chapter":"Chapter 1","reader":"Karen Savage","playtime":"00:14:23","audioUrl":"..."}]
🕒 scrapedAt | ISO 8601 | "2026-05-23T00:00:00.000Z"
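
The schema carries the runtime twice, as totalTime and totalTimeSeconds. As a quick sanity check, the sketch below converts one form to the other; it assumes totalTime always uses the H:MM:SS layout shown in the example row, and both function names are hypothetical.

```python
def hms_to_seconds(value):
    """Convert an "HH:MM:SS" string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def runtime_is_consistent(record):
    """Check that totalTime and totalTimeSeconds describe the same duration."""
    return hms_to_seconds(record["totalTime"]) == record["totalTimeSeconds"]


# Values copied from the schema example rows above.
example = {"totalTime": "11:35:46", "totalTimeSeconds": 41746}
ok = runtime_is_consistent(example)  # 11*3600 + 35*60 + 46 = 41746
```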


✨ Why choose this Actor

Capability
📚 20,000+ audiobook catalog. Every public-domain title LibriVox has produced.
🎯 Multi-dimensional filters. Title, author, language, and genre combine in a single run.
🎤 Per-section reader credits. Extended mode exposes chapter-level narrator, runtime, and audio URL.
📻 RSS podcast feeds included. Drop straight into a podcast player.
Fast. 10 audiobooks in under 5 seconds, 1,000 in under 5 minutes.
🔁 Always fresh. Live catalog reads on every run.
🚫 No authentication. Public archive, no key required.

📊 LibriVox is the canonical free audiobook library and a foundation for any audio content product targeting public-domain works.


📈 How it compares to alternatives

Approach | Cost | Coverage | Refresh | Filters | Setup
⭐ LibriVox Audiobooks Scraper (this Actor) | $5 free credit, then pay-per-use | 20,000+ audiobooks | Live per run | title, author, language, genre | ⚡ 2 min
Commercial audiobook libraries | $14.95+/month | Curated paid catalog | Daily | Limited | 🐢 Days
Custom site scraper | Free engineering | Full | Cron driven | Hand built | ⏳ Weeks
Per-book browsing | Free | One book at a time | Manual | UI only | 🕒 Painful

Pick this Actor when you want a clean, filterable feed of the entire LibriVox catalog with zero parser maintenance.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the LibriVox Audiobooks Scraper page on the Apify Store.
  3. 🎯 Set input. Add optional title, author, language, or genre filters, and keep extended mode enabled if you want per-section data.
  4. 🚀 Run it. Click Start and let the Actor collect catalog records.
  5. 📥 Download. Grab your results from the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.


💼 Business use cases

🎧 Audiobook Apps

  • Stock a freemium audiobook app with public-domain titles
  • Build a kids' audiobook section from children's fiction
  • Generate themed audio libraries (romance, mystery, classics)
  • Add multilingual content packs to existing apps

🎙️ Podcast Networks

  • Spin up "classic literature" podcast feeds from RSS URLs
  • Curate themed reading series for syndication
  • Source content for an audio newsletter or daily-listen app
  • Build chaptered podcast versions of long novels
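
One way to hand a batch of RSS feeds to a podcast player is an OPML subscription list. The sketch below builds one from the urlRss field using only the standard library; the function name and the list title are assumptions, and it is an illustration rather than a feature of the Actor.

```python
import xml.etree.ElementTree as ET


def build_opml(records, list_title="LibriVox classics"):
    """Return an OPML 2.0 document with one outline per record that has an RSS URL."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = list_title
    body = ET.SubElement(opml, "body")
    for record in records:
        if not record.get("urlRss"):
            continue  # skip records without a feed
        ET.SubElement(
            body,
            "outline",
            type="rss",
            text=record.get("title", ""),
            xmlUrl=record["urlRss"],
        )
    return ET.tostring(opml, encoding="unicode")


# Hypothetical records shaped like the Output schema.
opml_text = build_opml(
    [
        {"title": "Pride and Prejudice", "urlRss": "https://librivox.org/rss/100"},
        {"title": "No feed here", "urlRss": None},
    ]
)
```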

🎓 Education & Accessibility

  • Stock classroom listening assignments for literature classes
  • Build accessibility libraries for visually impaired users
  • Augment ESL programs with audio for graded readers
  • Provide reading-along audio for early literacy programs

📚 Library Apps & Catalogs

  • Add a free audiobook collection to a digital library catalog
  • Source ISBN-less audiobook records for cataloging projects
  • Build a "listen-along" companion to a Gutenberg etext app
  • Curate themed reading lists with audio companions

🔌 Automating LibriVox Audiobooks Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Weekly refreshes keep a downstream audiobook catalog topped up with the latest LibriVox publications.
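
A programmatic run with the Python client might look like the sketch below. It uses the apify-client calls `actor(...).call(run_input=...)` and `dataset(...).iterate_items()`, and assumes a client version where `call()` returns a dict-like run. The Actor ID string and the APIFY_TOKEN environment variable name are assumptions; copy the real ID from the Actor's Store page.

```python
import os


def fetch_audiobooks(client, actor_id, run_input):
    """Start the Actor, wait for it to finish, and return the dataset items."""
    run = client.actor(actor_id).call(run_input=run_input)
    if not run:
        raise RuntimeError("Actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def main():
    # Imported lazily so the helper above can be exercised without the package.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    # Hypothetical Actor ID; replace it with the one shown on the Store page.
    actor_id = "parseforge/librivox-audiobooks-scraper"
    books = fetch_audiobooks(
        client, actor_id, {"maxItems": 50, "author": "austen", "language": "English"}
    )
    for book in books:
        print(book["librivoxId"], book["title"])
```

Call `main()` with APIFY_TOKEN set to run it for real. Passing the client in as an argument keeps `fetch_audiobooks` easy to test with a stub.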


🌟 Beyond business use cases

Public-domain audiobooks power more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Reception studies of classic literature
  • Audiobook narration and voice-acting research
  • Reproducible corpora citing exact dataset pulls
  • Cross-language comparative literature with audio

🎨 Personal and creative

  • Curated bedtime story collections for parents
  • Mood-based reading lists for hobbyist apps
  • Visualization dashboards of reader hours by language
  • Themed playlists for road trips or long walks
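
For the dashboard idea in particular, the aggregation step can be very small. The sketch below sums totalTimeSeconds per language and reports hours; the function name and the "Unknown" bucket are assumptions, and the sample values are invented.

```python
from collections import defaultdict


def hours_by_language(records):
    """Return total runtime in hours per language, sorted by language name."""
    totals = defaultdict(int)
    for record in records:
        language = record.get("language") or "Unknown"
        totals[language] += record.get("totalTimeSeconds") or 0
    return {lang: round(secs / 3600, 2) for lang, secs in sorted(totals.items())}


# Invented sample values, shaped like the Output schema.
summary = hours_by_language(
    [
        {"language": "English", "totalTimeSeconds": 7200},
        {"language": "English", "totalTimeSeconds": 1800},
        {"language": "French", "totalTimeSeconds": 3600},
    ]
)
```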

🤝 Non-profit and civic

  • Free audio libraries for under-resourced schools
  • Audio access for visually impaired community members
  • Senior-living center listening programs
  • Language-revitalization audio packs for minority tongues

🧪 Experimentation

  • Train automatic speech recognition on volunteer narration
  • Build alignment datasets pairing audio with Gutenberg text
  • Prototype voice-cloning research with diverse readers
  • Test podcast-publishing pipelines with real RSS feeds


❓ Frequently Asked Questions

🧩 How does it work?

Pick optional title, author, language, or genre filters and choose whether to include per-section detail. Click Start and the Actor returns clean rows with audio links, RSS feeds, ZIP archives, and reader credits.

📏 How complete is the metadata?

LibriVox metadata is curated by volunteer catalogers. Most fields are populated, with the occasional gap for very old additions. The description, genres, and section reader credits are typically complete.

🔁 How often is the catalog refreshed?

LibriVox publishes new audiobooks weekly. Every Actor run hits the live catalog, so new releases appear in your dataset right away.
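
To spot what changed between two runs, you can compare record IDs. This is a minimal sketch assuming librivoxId is a stable identifier for each audiobook; the function name and sample records are hypothetical.

```python
def new_releases(previous_records, current_records):
    """Return records from the current run whose librivoxId was not seen before."""
    seen = {record["librivoxId"] for record in previous_records}
    return [r for r in current_records if r["librivoxId"] not in seen]


# Invented sample data for two consecutive runs.
last_week = [{"librivoxId": "100", "title": "Pride and Prejudice"}]
this_week = last_week + [{"librivoxId": "20001", "title": "A newly published title"}]
fresh = new_releases(last_week, this_week)
```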

🌐 Which languages are supported?

Most audiobooks are in English, but LibriVox covers dozens of other languages including French, German, Spanish, Italian, Dutch, Portuguese, Latin, Japanese, Mandarin, and more. Use the language input to filter.

🎤 Do I get per-chapter audio URLs?

Yes, when extended is true. Each section carries the chapter name, reader credit, playtime, and a direct MP3 URL hosted by the Internet Archive.

⏰ Can I schedule regular runs?

Yes. Use Apify Schedules to trigger this Actor on any cron interval (weekly is recommended for new releases).

⚖️ Is it legal to use these audiobooks?

Yes. LibriVox audiobooks are public domain in the United States, and the source texts are typically public-domain works from Project Gutenberg. Always verify the copyright status in your jurisdiction.

💼 Can I use these audiobooks commercially?

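Yes, in most cases. LibriVox volunteers dedicate their recordings to the public domain, so you can use, remix, and resell them. The underlying texts are public domain in the United States but may still be under copyright elsewhere, so check local law before distributing outside the US. Attribution to the volunteer readers is a courtesy, not a requirement.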

💳 Do I need a paid Apify plan to use this Actor?

No. The free plan covers testing and small runs (10 records per run). A paid plan unlocks the higher cap, scheduling, and concurrency.

🔁 What happens if a run fails or gets interrupted?

Apify retries transient errors automatically. If a run still fails, inspect the log, fix the input, and restart. Partial datasets are preserved.

🆘 What if I need help?

Our support team is here. Use the Apify platform messaging or the Tally form linked below.


🔌 Integrate with any app

LibriVox Audiobooks Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe audiobook data into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh audiobook records into your catalog or alert your content team in Slack.
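
On the receiving side, a webhook handler mostly needs the dataset ID of the finished run. The sketch below assumes Apify's default webhook payload, with an eventType string and a resource object holding the run's defaultDatasetId; if you customise the payload template, adjust the keys. The function name and the sample IDs are hypothetical.

```python
def dataset_id_from_webhook(payload):
    """Return the finished run's dataset ID, or None for any other event."""
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")


# Hypothetical payload following the assumed default shape.
sample_payload = {
    "eventType": "ACTOR.RUN.SUCCEEDED",
    "eventData": {"actorId": "abc123", "actorRunId": "run456"},
    "resource": {"id": "run456", "defaultDatasetId": "dataset789"},
}
dataset_id = dataset_id_from_webhook(sample_payload)
```

From there, the dataset ID can be passed to the Apify client or API to fetch the new records.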


💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the LibriVox project. All trademarks mentioned are the property of their respective owners. Only publicly available catalog data is collected. LibriVox audiobooks are dedicated to the public domain.