DOAB Directory of Open Access Books Scraper

Pricing

from $11.00 / 1,000 result items

Browse the Directory of Open Access Books (DOAB) with peer-reviewed academic titles. Capture title, authors, publisher, ISBN, DOI, subjects, language, publication year, abstract, and download URL. Export to JSON, CSV, or Excel for libraries, researchers, and content aggregation.

Developer

ParseForge

Maintained by Community

📚 DOAB Directory of Open Access Books Scraper

🚀 Export the global open-access book catalog in seconds. Pull 70,000+ peer-reviewed academic titles across 25+ subject areas in 50+ languages. No API key, no registration, no manual catalog scraping.

🕒 Last updated: 2026-05-23 · 📊 16 fields per record · 📚 70,000+ books · 🌍 50+ languages · 🏛️ 25+ subject areas

The DOAB Scraper exports the Directory of Open Access Books, a community-maintained catalog of peer-reviewed scholarly monographs and edited volumes that anyone can read, download, and redistribute. Each record carries 16 fields with authors, publishers, subjects, licenses, ISBNs, DOIs, abstracts, and direct download links to the full text. The underlying catalog is curated by libraries and publishers worldwide and is one of the most cited open-access references in higher education.

Coverage spans the humanities, social sciences, STEM, law, and the arts across 70,000+ titles, 700+ publishers, and 50+ languages. Every book is released under a Creative Commons or equivalent open license. This Actor turns that catalog into a CSV, Excel, JSON, or XML download in under five minutes.

| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| Academic librarians, OER advocates, researchers, university publishers, digital humanities labs, repository managers | Library catalog enrichment, OER course design, bibliometric studies, repository ingest, open-access discovery, syllabus building |

📋 What the DOAB Scraper does

Four discovery workflows in a single run:

  • 🔍 Full-text search. Query titles, authors, and abstracts across the entire catalog.
  • 🗣️ Language filter. Restrict to English, German, Spanish, French, or any of 50+ languages.
  • 🏛️ Subject filter. Narrow to philosophy, history, mathematics, sociology, and 25+ disciplines.
  • 🆔 Direct handle lookup. Pull a single book by its DOAB handle when you already know the ID.

Each record includes UUID, DOAB handle, title, authors, publisher, language, subjects, ISBNs, DOI, abstract, license, publication date, and direct PDF/EPUB download URLs.

💡 Why it matters: open-access monographs are the backbone of modern OER programs and digital library collections. Building your own ingest pipeline means dealing with OAI-PMH, MARC, Dublin Core mappings, and inconsistent metadata. This Actor returns a clean, normalized record on every run.


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to pull a subject-filtered slice into a downloadable dataset.


⚙️ Input

| Input | Type | Default | Behavior |
|---|---|---|---|
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
| searchQuery | string | "*" | Full-text query across titles, authors, abstracts. Use * for everything. |
| language | string | "" | ISO language code (e.g. en, de, es, fr). |
| subject | string | "" | Subject area (e.g. philosophy, mathematics, history). |
| publisher | string | "" | Publisher name filter. |
| handleId | string | "" | Direct lookup by DOAB handle. Overrides search. |
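
The input fields above can also be assembled in code. The following is a minimal sketch, not part of the Actor itself: `build_input` is a hypothetical helper, and it assumes the Actor applies its documented defaults when an optional field is omitted.

```python
from typing import Any


def build_input(
    max_items: int = 10,
    search_query: str = "*",
    language: str = "",
    subject: str = "",
    publisher: str = "",
    handle_id: str = "",
) -> dict[str, Any]:
    """Assemble a run input using the field names from the input table."""
    if max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    run_input: dict[str, Any] = {"maxItems": max_items}
    if handle_id:
        # handleId overrides search, so the search filters are left out.
        run_input["handleId"] = handle_id
        return run_input
    run_input["searchQuery"] = search_query or "*"
    # Assumption: omitted optional filters fall back to the documented "" default.
    for key, value in (("language", language), ("subject", subject), ("publisher", publisher)):
        if value:
            run_input[key] = value
    return run_input
```

Calling `build_input(50, language="en", subject="philosophy")` produces the same payload as the first example in this section.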

Example: 50 open-access philosophy books in English.

    {
      "maxItems": 50,
      "searchQuery": "*",
      "language": "en",
      "subject": "philosophy"
    }

Example: up to 200 titles from a specific publisher.

    {
      "maxItems": 200,
      "publisher": "Cambridge University Press"
    }

⚠️ Good to Know: DOAB indexes peer-reviewed open-access scholarly books only. Conference proceedings, working papers, and trade titles fall outside the scope. License terms vary by title, so always check the per-record license field before redistribution.


📊 Output

Each book record contains 16 fields. Download the dataset as CSV, Excel, JSON, or XML.
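
The same downloads are available over HTTP once a run has finished. The sketch below builds an export URL for the Apify dataset items endpoint. `dataset_export_url` is a hypothetical helper, the dataset ID comes from your own run, and non-public datasets additionally require an API token, which is not shown here.

```python
from urllib.parse import urlencode

# Format names used by the Apify dataset items endpoint; "xlsx" is the Excel export.
EXPORT_FORMATS = {"csv", "xlsx", "json", "xml"}


def dataset_export_url(dataset_id: str, fmt: str = "csv") -> str:
    """Return the download URL for a dataset in the requested format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    query = urlencode({"format": fmt})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"
```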

🧾 Schema

| Field | Type | Example |
|---|---|---|
| 🆔 uuid | string | "3c2f7e91-4b88-..." |
| 🔗 handle | string | "20.500.12854/12345" |
| 📖 title | string | "Open Science by Design" |
| 👤 authors | array | ["Smith, Jane", "Doe, John"] |
| 🏢 publisher | string | "National Academies Press" |
| 🗣️ language | string | "English" |
| 🏛️ subjects | array | ["Open science", "Research policy"] |
| 🔖 isbn | array | ["978-0-309-31441-1"] |
| 🆔 doi | string \| null | "10.17226/25116" |
| 📝 abstract | string | "This report explores..." |
| ⚖️ license | string | "CC BY 4.0" |
| 📅 publicationDate | string | "2018-07-19" |
| ⬇️ downloadUrls | array | ["https://library.oapen.org/..."] |
| 🔗 detailUrl | string | "https://directory.doabooks.org/handle/..." |
| 🕒 lastModified | ISO 8601 | "2024-11-02T00:00:00.000Z" |
| 🕒 scrapedAt | ISO 8601 | "2026-05-23T00:00:00.000Z" |
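
For typed downstream code, the 16 fields can be mirrored in a small data class. This is one possible mapping rather than an official client type: `BookRecord` and `from_item` are illustrative names, and the fallbacks for missing optional fields are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class BookRecord:
    uuid: str
    handle: str
    title: str
    authors: list[str] = field(default_factory=list)
    publisher: str = ""
    language: str = ""
    subjects: list[str] = field(default_factory=list)
    isbn: list[str] = field(default_factory=list)
    doi: Optional[str] = None
    abstract: str = ""
    license: str = ""
    publication_date: str = ""
    download_urls: list[str] = field(default_factory=list)
    detail_url: str = ""
    last_modified: str = ""
    scraped_at: str = ""

    @classmethod
    def from_item(cls, item: dict[str, Any]) -> "BookRecord":
        """Build a record from one dataset item, tolerating missing optional fields."""
        return cls(
            uuid=item["uuid"],
            handle=item["handle"],
            title=item["title"],
            authors=list(item.get("authors") or []),
            publisher=item.get("publisher") or "",
            language=item.get("language") or "",
            subjects=list(item.get("subjects") or []),
            isbn=list(item.get("isbn") or []),
            doi=item.get("doi"),
            abstract=item.get("abstract") or "",
            license=item.get("license") or "",
            publication_date=item.get("publicationDate") or "",
            download_urls=list(item.get("downloadUrls") or []),
            detail_url=item.get("detailUrl") or "",
            last_modified=item.get("lastModified") or "",
            scraped_at=item.get("scrapedAt") or "",
        )
```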


✨ Why choose this Actor

| | Capability |
|---|---|
| 📚 | Comprehensive coverage. 70,000+ peer-reviewed open-access books from 700+ publishers. |
| 🔍 | Flexible discovery. Search, language, subject, publisher, and handle lookups combine in a single run. |
| ⚖️ | License-aware. Every record carries an explicit Creative Commons (or equivalent) license string. |
| 🔗 | Persistent identifiers. DOAB handle, DOI, ISBN, and UUID for downstream cataloging and citation. |
| ⚡ | Fast. 10 books in under 5 seconds, 10,000 records in a few minutes. |
| 🔁 | Always fresh. Pulls live catalog data, so new accessions appear in the next run. |
| 🚫 | No authentication. Works against the public DOAB catalog. No login or key needed. |

📊 Open-access monographs are reshaping how universities build reading lists, fund publishing, and measure impact. This Actor turns that catalog into a queryable dataset.


📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
|---|---|---|---|---|---|
| ⭐ DOAB Scraper (this Actor) | $5 free credit, then pay-per-use | 70,000+ open-access books | Live per run | search, language, subject, publisher, handle | ⚡ 2 min |
| Manual catalog browsing | Free | Full | Live | Limited UI | 🐢 Hours per query |
| OAI-PMH harvest scripts | Free | Full | Custom | Build your own | ⏳ Days |
| Generic library APIs | Varies | Mixed | Varies | Mixed schema | 🕒 Variable |

Pick this Actor when you want clean DOAB records on demand without writing a harvester.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the DOAB Directory of Open Access Books Scraper page on the Apify Store.
  3. 🎯 Set input. Pick a subject, language, or publisher (or leave defaults for a wide pull) and set maxItems.
  4. 🚀 Run it. Click Start and let the Actor collect your data.
  5. 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.


💼 Business use cases

🏛️ Academic Libraries

  • Bulk catalog ingest for open-access monographs
  • Subject-curated reading lists for faculty
  • Discovery layer enrichment with DOIs and ISBNs
  • Collection gap analysis by language and discipline

🎓 OER Programs

  • Course-aligned open textbook discovery
  • License-filtered reading packs for syllabi
  • Cost-savings reporting for adoption committees
  • Faculty alerting on new subject releases

📊 Bibliometrics & Research

  • Open-access publishing trends by subject
  • Publisher market-share analyses
  • Multilingual scholarship mapping
  • Funder open-access compliance audits

📰 Publishers & Repositories

  • Competitive benchmarking against peers
  • Title-level metadata QA against canonical DOAB
  • Repository mirror builds with curated subsets
  • Marketing reports for OA-program funders

🔌 Automating DOAB Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Daily or weekly refreshes keep downstream library catalogs in sync automatically.
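
A minimal Python sketch of a programmatic run follows. It relies on the `actor(...).call(run_input=...)` and `dataset(...).iterate_items()` methods of the apify-client package. `run_and_collect` is a hypothetical helper, and `ACTOR_ID` is a placeholder: copy the real ID from the Actor's page on the Apify Store.

```python
from typing import Any

# Placeholder, not the real ID: replace with the value shown on the Actor's Store page.
ACTOR_ID = "<username>/<actor-name>"


def run_and_collect(client: Any, run_input: dict[str, Any], actor_id: str = ACTOR_ID) -> list[dict[str, Any]]:
    """Start the Actor, wait for it to finish, and return the dataset items."""
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    # Each finished run stores its results in a default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Typical usage (requires `pip install apify-client` and your own API token):
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")
#   books = run_and_collect(client, {"maxItems": 50, "language": "en", "subject": "philosophy"})
```

Passing the client in as an argument keeps the helper easy to test with a stub and avoids hard-coding a token.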


🌟 Beyond business use cases

Open scholarship has reach well beyond commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Reproducible bibliographies for peer-reviewed papers
  • Digital humanities corpora with permissive licenses
  • Multilingual scholarship surveys across decades
  • Open-data exercises for library and information science

🎨 Personal and creative

  • Hobbyist reading queues sorted by subject and license
  • Indie publishing benchmarking and inspiration
  • Personal Zotero / Calibre library ingest
  • Travel-and-learn itineraries with regional reading lists

🤝 Non-profit and civic

  • Community college reading rooms with zero-cost texts
  • Prison-education and refugee-learning curricula
  • Civic literacy projects with public-domain anchors
  • Translator-volunteer pipelines for under-served languages

🧪 Experimentation

  • Train NLP models on permissively licensed long-form text
  • Prototype recommender systems for OA discovery
  • Build LLM tools that cite real, open scholarly sources
  • Test catalog-merging algorithms against canonical metadata


❓ Frequently Asked Questions

🧩 How does it work?

Configure your search, language, subject, or publisher filters in the input form, click Start, and the Actor queries the DOAB catalog and emits a clean structured record per book. No browser automation, no captchas, no setup.

📏 How complete are the records?

Most records include title, authors, publisher, language, subjects, ISBN, and a download URL. Abstracts, DOIs, and license strings depend on what the contributing publisher submitted to DOAB. Older records can have sparser metadata than recent accessions.
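
To see how sparse a particular pull is, you can measure the share of records that have each field filled in. This is a small sketch over plain dataset items. `field_completeness` is an illustrative helper, and it treats `None`, empty strings, and empty lists as missing.

```python
from collections import Counter
from typing import Any, Iterable, Sequence


def field_completeness(records: Iterable[dict[str, Any]], fields: Sequence[str]) -> dict[str, float]:
    """Return, per field, the fraction of records with a non-empty value."""
    rows = list(records)
    if not rows:
        return {name: 0.0 for name in fields}
    filled: Counter = Counter()
    for row in rows:
        for name in fields:
            if row.get(name) not in (None, "", []):
                filled[name] += 1
    return {name: filled[name] / len(rows) for name in fields}
```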

🔁 How often is the catalog refreshed?

DOAB receives new and updated records continuously from contributing publishers. Every run of this Actor fetches live data, so new accessions appear automatically.

🗣️ Which languages are covered?

DOAB indexes books in 50+ languages, with strong representation in English, German, Spanish, French, Italian, Portuguese, and Dutch. Pass an ISO code to the language input to filter.

⏰ Can I schedule regular runs?

Yes. Use Apify Schedules to run this Actor on any cron interval (hourly, daily, weekly) and keep a downstream catalog in sync.
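
On the consuming side, a scheduled run can be folded into an existing catalog by keying on `uuid` and comparing `lastModified`. The sketch below is one way to do that. `merge_into_catalog` is a hypothetical helper, and it assumes all timestamps use the same UTC ISO 8601 format shown in the schema, so plain string comparison orders them correctly.

```python
from typing import Any, Iterable


def merge_into_catalog(catalog: dict[str, dict[str, Any]], fresh: Iterable[dict[str, Any]]) -> tuple[int, int]:
    """Upsert fresh records into catalog (keyed by uuid); return (added, updated)."""
    added = updated = 0
    for record in fresh:
        key = record["uuid"]
        existing = catalog.get(key)
        if existing is None:
            catalog[key] = record
            added += 1
        elif record.get("lastModified", "") > existing.get("lastModified", ""):
            # Same-format ISO 8601 strings sort chronologically.
            catalog[key] = record
            updated += 1
    return added, updated
```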

⚖️ Is it legal to collect this data?

DOAB metadata is openly available, and every indexed book carries an explicit open license (typically Creative Commons). Always check the per-record license string before redistributing the full-text downloads.

💼 Can I use this data commercially?

Yes for metadata. For the underlying book content, check the license field on each record. CC BY permits commercial use with attribution; licenses with a NonCommercial (NC) clause, such as CC BY-NC, do not.
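
A rough pre-filter on the license field might look like the sketch below. It is a heuristic and not legal advice. `permits_commercial_use` is a hypothetical helper, the exact license strings the Actor emits may differ from the "CC BY 4.0" example in the schema, and anything that is not recognizably Creative Commons is conservatively treated as not permitted.

```python
import re


def permits_commercial_use(license_string: str) -> bool:
    """Heuristic: True for Creative Commons licenses without a NonCommercial clause."""
    lowered = license_string.lower()
    tokens = set(re.findall(r"[a-z0-9]+", lowered))
    compact = re.sub(r"[^a-z0-9]", "", lowered)
    is_cc = bool(tokens & {"cc", "cc0", "creativecommons"})
    non_commercial = "nc" in tokens or "noncommercial" in compact
    # Unknown or non-CC strings return False so they get a manual review.
    return is_cc and not non_commercial
```

A NoDerivatives (ND) clause does not block commercial use but does restrict adaptations, so check it separately if you plan to modify the text.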

💳 Do I need a paid Apify plan to use this Actor?

No. The free Apify plan is enough for testing and small runs (10 records per run). A paid plan lifts the limit and gives you access to scheduling, higher concurrency, and larger datasets.

🔁 What happens if a run fails or gets interrupted?

Apify automatically retries transient errors. If a run still fails, inspect the log in the Runs tab, fix the input, and re-run. Partial datasets from failed runs are preserved.

📚 Does it return the full book PDF?

The dataset returns direct download URLs per record. You can follow those URLs to fetch the PDF or EPUB files.
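
Fetching a file from a record could be sketched as follows, using only the standard library. `download_first` is an illustrative helper. It assumes the first entry in `downloadUrls` is the file you want and that the URL path ends in a usable file name, neither of which the schema guarantees.

```python
import os
import shutil
import urllib.request
from typing import Any, Callable, Optional
from urllib.parse import urlparse


def download_first(
    record: dict[str, Any],
    dest_dir: str,
    opener: Callable[[str], Any] = urllib.request.urlopen,
) -> Optional[str]:
    """Save the first download URL of a record into dest_dir; return the file path."""
    urls = record.get("downloadUrls") or []
    if not urls:
        return None
    url = urls[0]
    # Fall back to the record UUID when the URL path has no file name.
    name = os.path.basename(urlparse(url).path) or f"{record.get('uuid', 'book')}.bin"
    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, name)
    with opener(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)  # stream to disk instead of loading into memory
    return path
```

Check the record's license before redistributing anything you download this way.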

🆘 What if I need help?

Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.


🔌 Integrate with any app

DOAB Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe book metadata into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh book records into your library system, or alert your team in Slack.
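
On the receiving end, a webhook handler mainly needs the ID of the dataset the run produced. The sketch below assumes Apify's default webhook payload, in which `eventType` names the event and `resource` holds the run object. `dataset_id_from_webhook` is a hypothetical helper, and a custom payload template would need different parsing.

```python
from typing import Any, Optional


def dataset_id_from_webhook(payload: dict[str, Any]) -> Optional[str]:
    """Return the run's default dataset ID for successful runs, else None."""
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")
```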


💡 Pro Tip: browse the complete ParseForge collection for more open-data and research scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by DOAB, OAPEN, or any of its contributing publishers. All trademarks mentioned are the property of their respective owners. Only publicly available open-access catalog data is collected.