DOAB Directory of Open Access Books Scraper
Pricing
from $11.00 / 1,000 result items
Browse the Directory of Open Access Books (DOAB) with peer-reviewed academic titles. Capture title, authors, publisher, ISBN, DOI, subjects, language, publication year, abstract, and download URL. Export to JSON, CSV, or Excel for libraries, researchers, and content aggregation.
Developer
ParseForge
Maintained by Community

📚 DOAB Directory of Open Access Books Scraper
🚀 Export the global open-access book catalog in seconds. Pull 70,000+ peer-reviewed academic titles across 25+ subject areas in 50+ languages. No DOAB API key, no DOAB registration, no manual catalog scraping.
🕒 Last updated: 2026-05-23 · 📊 16 fields per record · 📚 70,000+ books · 🌍 50+ languages · 🏛️ 25+ subject areas
The DOAB Scraper exports the Directory of Open Access Books, a community-maintained catalog of peer-reviewed scholarly monographs and edited volumes that anyone can read, download, and redistribute. Each record carries 16 fields, including authors, publisher, subjects, license, ISBNs, DOI, abstract, and direct download links to the full text. The underlying catalog is curated by libraries and publishers worldwide and is one of the most cited open-access references in higher education.
Coverage spans the humanities, social sciences, STEM, law, and the arts across 70,000+ titles, 700+ publishers, and 50+ languages. Every book is released under a Creative Commons or equivalent open license. This Actor turns that catalog into a CSV, Excel, JSON, or XML download in under five minutes.
| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| Academic librarians, OER advocates, researchers, university publishers, digital humanities labs, repository managers | Library catalog enrichment, OER course design, bibliometric studies, repository ingest, open-access discovery, syllabus building |
📋 What the DOAB Scraper does
Five discovery workflows in a single run:
- 🔍 Full-text search. Query titles, authors, and abstracts across the entire catalog.
- 🗣️ Language filter. Restrict to English, German, Spanish, French, or any of 50+ languages.
- 🏛️ Subject filter. Narrow to philosophy, history, mathematics, sociology, or any of 25+ disciplines.
- 🏢 Publisher filter. Restrict results to a named publisher.
- 🆔 Direct handle lookup. Pull a single book by its DOAB handle when you already know the ID.
Each record includes UUID, DOAB handle, title, authors, publisher, language, subjects, ISBNs, DOI, abstract, license, publication date, and direct PDF/EPUB download URLs.
💡 Why it matters: open-access monographs are the backbone of modern OER programs and digital library collections. Building your own ingest pipeline means dealing with OAI-PMH, MARC, Dublin Core mappings, and inconsistent metadata. This Actor returns a clean, normalized record on every run.
🎬 Full Demo
🚧 Coming soon: a 3-minute walkthrough showing how to pull a subject-filtered slice into a downloadable dataset.
⚙️ Input
| Input | Type | Default | Behavior |
|---|---|---|---|
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
| searchQuery | string | "*" | Full-text query across titles, authors, abstracts. Use * for everything. |
| language | string | "" | ISO language code (e.g. en, de, es, fr). |
| subject | string | "" | Subject area (e.g. philosophy, mathematics, history). |
| publisher | string | "" | Publisher name filter. |
| handleId | string | "" | Direct lookup by DOAB handle. Overrides search. |
Example: 50 open-access philosophy books in English.
{"maxItems": 50, "searchQuery": "*", "language": "en", "subject": "philosophy"}
Example: up to 200 titles from a specific publisher.
{"maxItems": 200, "publisher": "Cambridge University Press"}
⚠️ Good to Know: DOAB indexes peer-reviewed open-access scholarly books only. Conference proceedings, working papers, and trade titles fall outside the scope. License terms vary by title, so always check the per-record license field before redistribution.
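When runs are launched from code, the input fields above can be assembled by a small helper. The following is a minimal Python sketch, not part of the Actor: `build_run_input` is a hypothetical name, and sending only the handle when `handleId` is set is a client-side reading of the "Overrides search" note. The Actor itself may treat combined inputs differently.

```python
from typing import Any


def build_run_input(
    max_items: int = 10,
    search_query: str = "*",
    language: str = "",
    subject: str = "",
    publisher: str = "",
    handle_id: str = "",
) -> dict[str, Any]:
    """Build an input object using the field names from the input table."""
    if max_items < 1:
        raise ValueError("maxItems must be a positive integer")

    # Assumption: a handle lookup makes the other filters irrelevant,
    # so only the handle is sent alongside maxItems.
    if handle_id:
        return {"maxItems": max_items, "handleId": handle_id}

    run_input: dict[str, Any] = {"maxItems": max_items, "searchQuery": search_query}
    # Empty strings are the documented defaults, so unset filters are omitted.
    optional = {"language": language, "subject": subject, "publisher": publisher}
    run_input.update({key: value for key, value in optional.items() if value})
    return run_input


if __name__ == "__main__":
    print(build_run_input(max_items=50, language="en", subject="philosophy"))
```

Omitting unset filters keeps the payload identical to the hand-written examples, which makes it easier to compare a scripted run with one started from the input form.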
📊 Output
Each book record contains 16 fields. Download the dataset as CSV, Excel, JSON, or XML.
🧾 Schema
| Field | Type | Example |
|---|---|---|
| 🆔 uuid | string | "3c2f7e91-4b88-..." |
| 🔗 handle | string | "20.500.12854/12345" |
| 📖 title | string | "Open Science by Design" |
| 👤 authors | array | ["Smith, Jane", "Doe, John"] |
| 🏢 publisher | string | "National Academies Press" |
| 🗣️ language | string | "English" |
| 🏛️ subjects | array | ["Open science", "Research policy"] |
| 🔖 isbn | array | ["978-0-309-31441-1"] |
| 🆔 doi | string or null | "10.17226/25116" |
| 📝 abstract | string | "This report explores..." |
| ⚖️ license | string | "CC BY 4.0" |
| 📅 publicationDate | string | "2018-07-19" |
| ⬇️ downloadUrls | array | ["https://library.oapen.org/..."] |
| 🔗 detailUrl | string | "https://directory.doabooks.org/handle/..." |
| 🕒 lastModified | ISO 8601 | "2024-11-02T00:00:00.000Z" |
| 🕒 scrapedAt | ISO 8601 | "2026-05-23T00:00:00.000Z" |
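If you consume the JSON export in your own pipeline, the array fields and the nullable doi need some handling before they fit a flat table. The sketch below shows one way to do that in Python. The function names and the "; " separator are illustrative choices, and the list of array fields is taken from the schema table rather than from the Actor's source.

```python
import csv
import io
from typing import Any

# Fields typed as "array" in the schema table.
ARRAY_FIELDS = ("authors", "subjects", "isbn", "downloadUrls")


def flatten_record(record: dict[str, Any], sep: str = "; ") -> dict[str, str]:
    """Turn one record into a flat row of strings for tabular output."""
    row: dict[str, str] = {}
    for key, value in record.items():
        if key in ARRAY_FIELDS and isinstance(value, list):
            row[key] = sep.join(str(item) for item in value)
        elif value is None:
            row[key] = ""  # e.g. a record without a DOI
        else:
            row[key] = str(value)
    return row


def records_to_csv(records: list[dict[str, Any]]) -> str:
    """Serialize records to CSV text, using the first record's keys as columns."""
    if not records:
        return ""
    rows = [flatten_record(record) for record in records]
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), extrasaction="ignore", restval=""
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Joining with "; " rather than a comma avoids clashing with author names stored as "Surname, Given name".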
📦 Sample records
✨ Why choose this Actor
| Capability | |
|---|---|
| 📚 | Comprehensive coverage. 70,000+ peer-reviewed open-access books from 700+ publishers. |
| 🔍 | Flexible discovery. Search, language, subject, publisher, and handle lookups combine in a single run. |
| ⚖️ | License-aware. Every record carries an explicit Creative Commons (or equivalent) license string. |
| 🔗 | Persistent identifiers. DOAB handle, DOI, ISBN, and UUID for downstream cataloging and citation. |
| ⚡ | Fast. 10 books in under 5 seconds, 10,000 records in a few minutes. |
| 🔁 | Always fresh. Pulls live catalog data, so new accessions appear in the next run. |
| 🚫 | No authentication. Works against the public DOAB catalog. No login or key needed. |
📊 Open-access monographs are reshaping how universities build reading lists, fund publishing, and measure impact. This Actor turns that catalog into a queryable dataset.
📈 How it compares to alternatives
| Approach | Cost | Coverage | Refresh | Filters | Setup |
|---|---|---|---|---|---|
| ⭐ DOAB Scraper (this Actor) | $5 free credit, then pay-per-use | 70,000+ open-access books | Live per run | search, language, subject, publisher, handle | ⚡ 2 min |
| Manual catalog browsing | Free | Full | Live | Limited UI | 🐢 Hours per query |
| OAI-PMH harvest scripts | Free | Full | Custom | Build your own | ⏳ Days |
| Generic library APIs | Varies | Mixed | Varies | Mixed schema | 🕒 Variable |
Pick this Actor when you want clean DOAB records on demand without writing a harvester.
🚀 How to use
- 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
- 🌐 Open the Actor. Go to the DOAB Directory of Open Access Books Scraper page on the Apify Store.
- 🎯 Set input. Pick a subject, language, or publisher (or leave defaults for a wide pull) and set maxItems.
- 🚀 Run it. Click Start and let the Actor collect your data.
- 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.
⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.
💼 Business use cases
🔌 Automating DOAB Scraper
Control the scraper programmatically for scheduled runs and pipeline integrations:
- 🟢 Node.js. Install the apify-client NPM package.
- 🐍 Python. Use the apify-client PyPI package.
- 📚 See the Apify API documentation for full details.
The Apify Schedules feature lets you trigger this Actor on any cron interval. Daily or weekly refreshes keep downstream library catalogs in sync automatically.
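A programmatic run with the Python client might look roughly like the sketch below. Treat it as a starting point under stated assumptions: `ACTOR_ID` is a placeholder (copy the real ID from the Actor's Store page), the token is passed in by the caller, and the code assumes the dict-style return values of the `apify-client` 1.x releases, so newer client versions may need small changes.

```python
from typing import Any, Iterator, Optional

# Placeholder, not the real ID: copy it from the Actor's page on the Apify Store.
ACTOR_ID = "<username>/<actor-name>"

# Same input as the philosophy example in the Input section.
RUN_INPUT: dict[str, Any] = {
    "maxItems": 50,
    "searchQuery": "*",
    "language": "en",
    "subject": "philosophy",
}


def fetch_books(
    token: str,
    run_input: Optional[dict[str, Any]] = None,
    actor_id: str = ACTOR_ID,
) -> Iterator[dict[str, Any]]:
    """Start a run, wait for it to finish, and yield the dataset items."""
    # Imported lazily so this module loads even without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input or RUN_INPUT)
    if run is None:
        raise RuntimeError("The Actor run did not return any run details")
    # Each finished run stores its results in a default dataset.
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()
```

Calling `fetch_books(token)` with a valid API token yields one dict per book, with the keys listed in the schema table.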
🌟 Beyond business use cases
Open scholarship has reach well beyond commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.
❓ Frequently Asked Questions
🧩 How does it work?
Configure your search, language, subject, or publisher filters in the input form, click Start, and the Actor queries the DOAB catalog and emits a clean structured record per book. No browser automation, no captchas, no setup.
📏 How complete are the records?
Most records include title, authors, publisher, language, subjects, ISBN, and a download URL. Abstracts, DOIs, and license strings depend on what the contributing publisher submitted to DOAB. Older records can have sparser metadata than recent accessions.
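To see how sparse a particular pull actually is, you can measure the share of records that have each field filled in. This is a small illustrative helper, not something the Actor provides. It assumes missing values arrive as null, an empty string, or an empty array, which may not match every record.

```python
from typing import Any, Iterable


def is_filled(value: Any) -> bool:
    """Treat None, empty strings, and empty lists as missing."""
    return value not in (None, "", [])


def field_completeness(
    records: Iterable[dict[str, Any]], fields: list[str]
) -> dict[str, float]:
    """Return, per field, the fraction of records with a non-empty value."""
    items = list(records)
    if not items:
        return {field: 0.0 for field in fields}
    return {
        field: sum(is_filled(item.get(field)) for item in items) / len(items)
        for field in fields
    }
```

Running it over `["abstract", "doi", "license"]` for a downloaded dataset gives a quick check on whether the slice is complete enough for your catalog or study.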
🔁 How often is the catalog refreshed?
DOAB receives new and updated records continuously from contributing publishers. Every run of this Actor fetches live data, so new accessions appear automatically.
🗣️ Which languages are covered?
DOAB indexes books in 50+ languages, with strong representation in English, German, Spanish, French, Italian, Portuguese, and Dutch. Pass an ISO code to the language input to filter.
⏰ Can I schedule regular runs?
Yes. Use Apify Schedules to run this Actor on any cron interval (hourly, daily, weekly) and keep a downstream catalog in sync.
⚖️ Is this data legal to use?
DOAB metadata is openly available, and every indexed book carries an explicit open license (typically Creative Commons). Always check the per-record license string before redistributing the full-text downloads.
💼 Can I use this data commercially?
Yes, for metadata. For the underlying book content, check the license field on each record. CC BY and CC BY-SA permit commercial use with attribution, while licenses with an NC (NonCommercial) term do not.
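A first-pass filter on the license field can be sketched as follows. It assumes license strings use the short Creative Commons form shown in the schema example ("CC BY 4.0"). Anything it cannot recognize, including URL-style license values, returns None so that a person can review it. The function name is hypothetical, and the result is a triage aid rather than legal advice.

```python
from typing import Optional


def allows_commercial_use(license_text: Optional[str]) -> Optional[bool]:
    """Return True or False for short-form CC licenses, None when unsure."""
    if not license_text:
        return None
    # Normalize "CC BY-NC-ND 4.0" to tokens: CC, BY, NC, ND, 4.0
    tokens = license_text.upper().replace("-", " ").split()
    if not tokens:
        return None
    if tokens[0] == "CC0":
        return True  # public-domain dedication
    if tokens[0] != "CC":
        return None  # not a recognizable short-form CC license
    # The NonCommercial element is the one that rules out commercial use.
    return "NC" not in tokens
```

Records that come back as None are worth routing to manual review rather than dropping, since older accessions can carry free-text license statements.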
💳 Do I need a paid Apify plan to use this Actor?
No. The free Apify plan is enough for testing and small runs (10 records per run). A paid plan lifts the limit and gives you access to scheduling, higher concurrency, and larger datasets.
🔁 What happens if a run fails or gets interrupted?
Apify automatically retries transient errors. If a run still fails, inspect the log in the Runs tab, fix the input, and re-run. Partial datasets from failed runs are preserved.
📚 Does it return the full book PDF?
The dataset returns direct download URLs per record. You can follow those URLs to fetch the PDF or EPUB files.
🆘 What if I need help?
Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.
🔌 Integrate with any app
DOAB Scraper connects to any cloud service via Apify integrations:
- Make - Automate multi-step workflows
- Zapier - Connect with 5,000+ apps
- Slack - Get run notifications in your channels
- Airbyte - Pipe book metadata into your warehouse
- GitHub - Trigger runs from commits and releases
- Google Drive - Export datasets straight to Sheets
You can also use webhooks to trigger downstream actions when a run finishes. Push fresh book records into your library system, or alert your team in Slack.
🔗 Recommended Actors
- 📖 arXiv Scraper - Open-access preprints across physics, math, and computer science
- 🧪 OSF Scraper - Open Science Framework projects and registrations
- 📊 Figshare Scraper - Research data and figures with DOIs
- 🌍 GBIF Biodiversity Scraper - Global biodiversity occurrence records
- 🩺 ClinicalTrials.gov Scraper - Registered clinical trials worldwide
💡 Pro Tip: browse the complete ParseForge collection for more open-data and research scrapers.
🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.
⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by DOAB, OAPEN, or any of their contributing publishers. All trademarks mentioned are the property of their respective owners. Only publicly available open-access catalog data is collected.