# Open Library Scraper — Book Metadata in Bulk (`devilscrapes/openlibrary-books-scraper`) Actor

Search the Open Library API (the Internet Archive's open book catalogue) and export structured book metadata — title, authors, ISBNs, subjects, publish year, cover URL, edition count, OpenLibrary ID — to JSON or CSV. We handle pagination and retries across 30M+ works.

- **URL**: https://apify.com/devilscrapes/openlibrary-books-scraper.md
- **Developed by:** [DevilScrapes](https://apify.com/devilscrapes) (community)
- **Categories:** News, AI
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per event

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

<div align="center">
  <img src=".actor/icon.svg" width="160" alt="Devil Scrapes mark" />

## Open Library Scraper — Book Metadata in Bulk

**💰 $1.50 / 1 000 results** &nbsp;·&nbsp; pay only for results &nbsp;·&nbsp; no credit card to try

_We do the dirty work so your dataset stays clean._ 😈

Search Open Library (the Internet Archive's open book catalogue) and get structured book metadata — title, authors, ISBNs, subjects, publish year, cover image, edition count, OpenLibrary ID. We handle the pagination, retries, and rate-limit pacing so you get clean typed rows across the catalogue's 30M+ works.

</div>

---

### 🎯 What this scrapes

Open Library is the Internet Archive's catalogue of 30M+ works: an open bibliographic source that many book-metadata pipelines lean on. When Goodreads shut down its developer API in 2020, it left a gap that developers are still trying to fill. Open Library covers much of it, with no licensing hurdles, no API key friction, and free bulk export, provided you can navigate the pagination and handle the upstream's occasional rate limiting.

This Actor turns a free-form query (title, author, ISBN, subject) into typed dataset rows with cover URL, subjects, edition count, and the canonical Open Library key. We pace requests against the upstream, retry on transient errors, and surface partial successes loudly — so your library, recommender, or research dataset gets the rows it expects.

### 🔥 What we handle for you

- 🛡️ **Browser fingerprint rotation** — `curl-cffi` impersonates real Chrome / Firefox / Safari TLS handshakes so the target sees a browser, not Python.
- 🌐 **Residential proxy rotation** via Apify Proxy, when you enable it in `proxyConfiguration` (it is off by default): fresh session and exit IP on every block.
- 🔁 **Retries with exponential backoff** on `408 / 429 / 5xx` — up to 5 attempts per page, `Retry-After` honoured.
- 🧱 **Rate-limit-aware pacing** — when the target pushes back, we slow down instead of getting banned.
- 🧊 **Clean, typed dataset rows** — Pydantic-validated, ISO-8601 timestamps, stable IDs, JSON / CSV / Excel export straight from the Apify Console.
- 💰 **Pay-Per-Event pricing** — you only pay for results that hit your dataset. No data, no charge.
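
The retry and pacing rules in the list above can be pictured as a small delay calculator. This is a minimal sketch, not the Actor's source: the helper names `should_retry` and `backoff_delay`, the 1-second base, and the 60-second cap are assumptions made for illustration. Only the status codes, the five-attempt limit, and the `Retry-After` rule come from the list.

```python
from typing import Optional

MAX_ATTEMPTS = 5  # "up to 5 attempts per page" from the list above


def should_retry(status: int, attempt: int) -> bool:
    """Retry on 408, 429, and any 5xx while attempts remain (attempt is 1-based)."""
    retryable = status in (408, 429) or 500 <= status <= 599
    return retryable and attempt < MAX_ATTEMPTS


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before the next attempt.

    A server-supplied Retry-After value wins; otherwise the delay doubles
    with each attempt (base, 2*base, 4*base, ...) up to `cap`.
    `base` and `cap` are illustrative values, not the Actor's settings.
    """
    if retry_after is not None:
        return max(0.0, retry_after)
    return min(cap, base * 2 ** (attempt - 1))
```

Production clients usually add random jitter to the computed delay so that parallel workers do not retry in lockstep.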

### 💡 Use cases

- **Goodreads alternative API** — rebuild the bibliographic data layer Goodreads took away in 2020. Title, authors, ISBNs, subjects, cover URL, edition count — the fields every book-rec app needs.
- **ISBN lookup at bulk scale** — enrich a CSV of book titles with ISBNs, authors, and covers in one run. Better unit economics than a per-request ISBN lookup API.
- **Free book metadata API** — feed a reading-list dashboard, a library catalogue app, or a fiction-RAG backend with structured Open Library data. No licensing restrictions on bibliographic metadata.
- **Discovery pipelines** — list every Asimov novel + edition count for a fan-site backend, or enumerate every title tagged "machine learning" for a curated reading list.
- **Digital humanities** — seed subject-tag corpora for distant-reading research, cultural-analytics, or AI-tutor curriculum ingestion.

### ⚙️ How to use it

1. Click **Try for free** at the top of the page.
2. Fill in the input form — most fields have sensible defaults.
3. Click **Start**. Output streams into the run's dataset.
4. Export from **Storage → Dataset** as JSON, CSV, or Excel — or fetch via the API.

### 📥 Input

| Field | Type | Required | Default | Notes |
|---|---|:--:|---|---|
| `searchQuery` | `string` | **yes** | `"isaac asimov foundation"` | Free-text search. Open Library matches across title, author, subject, ISBN. |
| `searchField` | `string` | no | `"all"` | Narrow which field your query targets: `all`, `title`, `author`, `subject`, or `isbn`. `all` matches everywhere. |
| `maxResults` | `integer` | no | `30` | Max books to return (1 to 1000). The API caps each page at 100; we paginate. |
| `language` | `string` | no | `""` | 3-letter ISO 639-2 code, e.g. `eng`, `spa`, `fre`. Leave empty for all. |
| `proxyConfiguration` | `object` | no | `{"useApifyProxy": false}` | Open Library is open. Proxy optional. |

#### Example input

```json
{
  "searchQuery": "foundation asimov",
  "searchField": "all",
  "maxResults": 3,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
````
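
The `maxResults` note in the input table (100 rows per page, paginated) comes down to simple arithmetic. The following is a minimal sketch, assuming 1-based page numbers and a fixed page size; `plan_pages` is a hypothetical helper and not part of the Actor.

```python
import math
from typing import List, Tuple

PAGE_SIZE = 100  # per-page cap stated in the input table


def plan_pages(max_results: int, page_size: int = PAGE_SIZE) -> List[Tuple[int, int]]:
    """Return (page_number, rows_to_keep) pairs covering max_results rows.

    Every page is requested at full size so page offsets stay aligned;
    only the last page is trimmed to the remaining count.
    """
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    pages = math.ceil(max_results / page_size)
    plan = []
    for page in range(1, pages + 1):
        remaining = max_results - (page - 1) * page_size
        plan.append((page, min(page_size, remaining)))
    return plan
```

For example, `plan_pages(250)` yields three pages, keeping 100, 100, and 50 rows.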

### 📤 Output

Every row is one dataset item.

| Field | Type | Notes |
|---|---|---|
| `openlibrary_key` | `string` | Open Library work key (e.g. `/works/OL12345W`). |
| `title` | `string` | Work title. |
| `subtitle` | `string` or `null` | Subtitle, when present. |
| `authors` | `array` | Author names. |
| `first_publish_year` | `integer` or `null` | Earliest publication year recorded. |
| `edition_count` | `integer` | Number of editions Open Library tracks. |
| `languages` | `array` | Language codes detected across editions. |
| `subjects` | `array` | Subject tags (up to 30, truncated). |
| `isbns` | `array` | ISBNs detected (10 and 13). |
| `publishers` | `array` | Publishers across editions (deduped). |
| `cover_id` | `integer` or `null` | Open Library cover image ID. |
| `cover_url_l` | `string` or `null` | Large cover image URL. |
| `ratings_average` | `number` or `null` | Average rating where Open Library has one. |
| `ratings_count` | `integer` or `null` | Rating count. |
| `ebook_access` | `string` or `null` | Open Library's e-book availability: `public`, `borrowable`, `no_ebook`, or `printdisabled`. |
| `work_url` | `string` | Canonical Open Library URL. |
| `scraped_at` | `string` | ISO-8601 timestamp of when this row was recorded. |

#### Example output

```json
{
  "openlibrary_key": "/works/OL471576W",
  "title": "Foundation",
  "authors": [
    "Isaac Asimov"
  ],
  "first_publish_year": 1951,
  "edition_count": 142,
  "work_url": "https://openlibrary.org/works/OL471576W"
}
```
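
The example above shows only some of the fields in the output table, so consumer code is safer if it tolerates absent or `null` values. Here is a minimal sketch of a typed wrapper; the `BookRow` class is hypothetical, models only a subset of the fields, and its defaults (empty lists, `None`, `0`) are assumptions rather than guarantees made by the Actor.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class BookRow:
    openlibrary_key: str
    title: str
    work_url: str
    authors: List[str] = field(default_factory=list)
    isbns: List[str] = field(default_factory=list)
    first_publish_year: Optional[int] = None
    edition_count: int = 0
    cover_url_l: Optional[str] = None

    @classmethod
    def from_item(cls, item: Dict[str, Any]) -> "BookRow":
        """Build a row from one dataset item, ignoring fields not modelled here."""
        return cls(
            openlibrary_key=item["openlibrary_key"],
            title=item["title"],
            work_url=item["work_url"],
            authors=list(item.get("authors") or []),
            isbns=list(item.get("isbns") or []),
            first_publish_year=item.get("first_publish_year"),
            edition_count=int(item.get("edition_count") or 0),
            cover_url_l=item.get("cover_url_l"),
        )


# The example item from this section, parsed into a typed row.
row = BookRow.from_item({
    "openlibrary_key": "/works/OL471576W",
    "title": "Foundation",
    "authors": ["Isaac Asimov"],
    "first_publish_year": 1951,
    "edition_count": 142,
    "work_url": "https://openlibrary.org/works/OL471576W",
})
```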

### 💰 Pricing

Pay-Per-Event — you pay only when these events fire:

| Event | USD | What it is |
|---|---:|---|
| `actor-start` | $0.005 | One-off warm-up charge per run |
| `result` | $0.0015 | Per dataset item |

Example: 1 000 results at the rates above ≈ **$1.50**. No subscription, no minimum, no card to start — Apify gives every new account $5 of free credit.
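
The ≈ $1.50 figure covers the `result` events; each run also adds one `actor-start` charge. A small estimator makes the arithmetic explicit. This is a sketch using the two rates from the table above; `estimate_cost` is a hypothetical helper, and the actual bill is whatever Apify records for the run.

```python
from decimal import Decimal

ACTOR_START_USD = Decimal("0.005")  # per run, from the pricing table
RESULT_USD = Decimal("0.0015")      # per dataset item, from the pricing table


def estimate_cost(results: int, runs: int = 1) -> Decimal:
    """Estimated charge in USD for `results` dataset items spread over `runs` runs."""
    if results < 0 or runs < 1:
        raise ValueError("results must be >= 0 and runs must be >= 1")
    return ACTOR_START_USD * runs + RESULT_USD * results
```

One run returning 1 000 results comes to $1.505 under these rates; splitting the same 1 000 results across ten runs comes to $1.55.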

### 🚧 Limitations

- Search uses Open Library's relevance ranking — for canonical bibliographic data (LCSH/Dewey), use a dedicated MARC source. Subjects are tags, not curated taxonomies.
- This Actor exports **metadata only** — titles, ISBNs, authors, subjects, cover URLs, publish years. It does not download book text or full-text content. For public-domain full-text, follow `work_url` to the Internet Archive reader.
- Open Library has thinner rating data than Goodreads. Treat `ratings_average` with caution for niche or older works.

### ❓ FAQ

**Is this a Goodreads alternative API?**

Yes, for the bibliographic data layer. Goodreads shut their developer API in December 2020. Open Library provides the same core fields — title, authors, ISBNs, subjects, cover URL, edition count — under a fully open licence. This Actor is the managed bulk-export layer on top of that catalogue.

**Can I do ISBN lookup in bulk?**

Yes. Pass `searchField: "isbn"` with a specific ISBN-10 or ISBN-13 as `searchQuery`, or use `searchField: "all"` with a title + author combination to retrieve ISBNs at scale. Each result row returns the full `isbns` array for all editions of a work.
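
A bulk ISBN lookup could be scripted roughly as follows. This is a sketch under stated assumptions: the helper names are hypothetical, `maxResults: 1` assumes one work per ISBN is enough, and the ISBN check covers only length and characters, not the checksum. The client calls are the same ones used in the Python example in the API section.

```python
import re
from typing import Dict, Iterable, List

ACTOR_ID = "devilscrapes/openlibrary-books-scraper"


def normalize_isbn(raw: str) -> str:
    """Strip hyphens and spaces; accept ISBN-10 or ISBN-13 shapes only."""
    isbn = re.sub(r"[\s-]", "", raw).upper()
    if not re.fullmatch(r"\d{9}[\dX]|\d{13}", isbn):
        raise ValueError(f"not an ISBN-10 or ISBN-13: {raw!r}")
    return isbn


def build_isbn_inputs(isbns: Iterable[str]) -> List[Dict[str, object]]:
    """One Actor input per distinct ISBN, using the documented input fields."""
    seen = dict.fromkeys(normalize_isbn(i) for i in isbns)  # dedupe, keep order
    return [{"searchQuery": isbn, "searchField": "isbn", "maxResults": 1} for isbn in seen]


def run_lookups(token: str, isbns: Iterable[str]) -> List[dict]:
    """Run the Actor once per ISBN and collect the rows (needs `apify-client`)."""
    from apify_client import ApifyClient  # lazy import: helpers above work without it

    client = ApifyClient(token)
    rows: List[dict] = []
    for run_input in build_isbn_inputs(isbns):
        run = client.actor(ACTOR_ID).call(run_input=run_input)
        rows.extend(client.dataset(run["defaultDatasetId"]).iterate_items())
    return rows
```

Note that one run per ISBN triggers one `actor-start` event each time, so very large lists are cheaper when title and author queries can be batched into fewer runs.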

**Where's the book description / blurb?**

The search API doesn't include long descriptions; for those, follow up with `/works/{key}.json`. We surface enough to enrich a catalogue or recommendation engine.
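
A follow-up request for the description might look like the sketch below. It assumes the `openlibrary_key` value from a dataset row (for example `/works/OL471576W`) and that the work JSON stores `description` either as a plain string or as an object with a `value` field; the function names are hypothetical and the code is independent of the Actor.

```python
import json
from typing import Any, Optional
from urllib.request import urlopen


def work_json_url(openlibrary_key: str) -> str:
    """Turn a work key such as '/works/OL471576W' into its JSON endpoint URL."""
    key = openlibrary_key.strip("/")
    if not key.startswith("works/"):
        raise ValueError(f"not a work key: {openlibrary_key!r}")
    return f"https://openlibrary.org/{key}.json"


def extract_description(work: dict) -> Optional[str]:
    """Return the description text, or None when the work has none."""
    desc: Any = work.get("description")
    if isinstance(desc, dict):  # assumed shape: {"type": "/type/text", "value": "..."}
        desc = desc.get("value")
    if isinstance(desc, str) and desc.strip():
        return desc.strip()
    return None


def fetch_description(openlibrary_key: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch the work record over HTTP and pull out its description."""
    with urlopen(work_json_url(openlibrary_key), timeout=timeout) as resp:
        return extract_description(json.load(resp))
```

Calling `fetch_description(row["openlibrary_key"])` for each row adds one upstream request per work, so pace these calls the same way you would any other Open Library traffic.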

**Why are some ISBNs missing?**

Older works weren't always catalogued with ISBNs. We return what Open Library has.

**Can I download the book text?**

Not via this Actor — we export metadata only. Visit `work_url` and follow Open Library's reader flow for public-domain full text.

**What about the Open Library API directly?**

Open Library's `/search.json` endpoint is public, but handling pagination, rate-limit pacing, retries, and clean typed output at scale is the work this Actor absorbs. We handle the blocks so you get consistent rows.

**Is the data licensed for commercial use?**

Open Library's bibliographic metadata is released under CC0 (public domain). Always verify the licence terms for your specific use case at openlibrary.org.

### 💬 Your feedback

Spotted a bug, hit a weird edge case, or need a new field? Open an
issue on the Actor's **Issues** tab on Apify Console — we ship
fixes weekly and we read every report.

***

<div align="center">

Built by **[Devil Scrapes](https://apify.com/DevilScrapes)** 😈 — a small fleet of
opinionated public-data Actors. Honest pricing, real engineering, zero fine print.

</div>

# Actor input Schema

## `searchQuery` (type: `string`):

Free-text search. Open Library matches across title, author, subject, ISBN.

## `searchField` (type: `string`):

Narrow which field your query targets. `all` matches everywhere.

## `maxResults` (type: `integer`):

Max books to return. API caps per page at 100; we paginate.

## `language` (type: `string`):

3-letter ISO 639-2 code, e.g. `eng`, `spa`, `fre`. Leave empty for all.

## `proxyConfiguration` (type: `object`):

Open Library is open. Proxy optional.

## Actor input object example

```json
{
  "searchQuery": "title:dune author:herbert",
  "searchField": "all",
  "maxResults": 30,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```
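
Before starting a run, an input object can be checked locally against the documented constraints. The sketch below is a hypothetical client-side check, not the platform's own validation; the allowed `searchField` values and the 1 to 1000 range for `maxResults` are taken from the OpenAPI specification further down this page.

```python
from typing import Any, Dict

SEARCH_FIELDS = {"all", "title", "author", "subject", "isbn"}


def validate_input(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the documented defaults and reject values outside the documented limits."""
    query = raw.get("searchQuery")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("searchQuery is required")

    search_field = raw.get("searchField", "all")
    if search_field not in SEARCH_FIELDS:
        raise ValueError(f"searchField must be one of {sorted(SEARCH_FIELDS)}")

    max_results = raw.get("maxResults", 30)
    if isinstance(max_results, bool) or not isinstance(max_results, int):
        raise ValueError("maxResults must be an integer")
    if not 1 <= max_results <= 1000:
        raise ValueError("maxResults must be between 1 and 1000")

    return {
        "searchQuery": query.strip(),
        "searchField": search_field,
        "maxResults": max_results,
        "language": raw.get("language", ""),
        "proxyConfiguration": raw.get("proxyConfiguration", {"useApifyProxy": False}),
    }
```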

# Actor output Schema

## `datasetItems` (type: `string`):

All dataset items as JSON.

## `datasetItemsCsv` (type: `string`):

Same data exported to CSV.

## `datasetView` (type: `string`):

Open the run dataset in the Console.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "searchQuery": "isaac asimov foundation",
    "language": "",
    "proxyConfiguration": {
        "useApifyProxy": false
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("devilscrapes/openlibrary-books-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "searchQuery": "isaac asimov foundation",
    "language": "",
    "proxyConfiguration": { "useApifyProxy": False },
}

# Run the Actor and wait for it to finish
run = client.actor("devilscrapes/openlibrary-books-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "searchQuery": "isaac asimov foundation",
  "language": "",
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}' |
apify call devilscrapes/openlibrary-books-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=devilscrapes/openlibrary-books-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Open Library Scraper — Book Metadata in Bulk",
        "description": "Search the Open Library API (the Internet Archive's open book catalogue) and export structured book metadata — title, authors, ISBNs, subjects, publish year, cover URL, edition count, OpenLibrary ID — to JSON or CSV. We handle pagination and retries across 30M+ works.",
        "version": "0.4",
        "x-build-id": "wbDguLwfU62JnOPmy"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/devilscrapes~openlibrary-books-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-devilscrapes-openlibrary-books-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/devilscrapes~openlibrary-books-scraper/runs": {
            "post": {
                "operationId": "runs-sync-devilscrapes-openlibrary-books-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/devilscrapes~openlibrary-books-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-devilscrapes-openlibrary-books-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "searchQuery"
                ],
                "properties": {
                    "searchQuery": {
                        "title": "Search query",
                        "type": "string",
                        "description": "Free-text search. Open Library matches across title, author, subject, ISBN."
                    },
                    "searchField": {
                        "title": "Field hint",
                        "enum": [
                            "all",
                            "title",
                            "author",
                            "subject",
                            "isbn"
                        ],
                        "type": "string",
                        "description": "Narrow which field your query targets. <code>all</code> matches everywhere.",
                        "default": "all"
                    },
                    "maxResults": {
                        "title": "Max books",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Max books to return. API caps per page at 100; we paginate.",
                        "default": 30
                    },
                    "language": {
                        "title": "Language filter (3-letter ISO)",
                        "type": "string",
                        "description": "3-letter ISO-639-2 code, e.g. <code>eng</code>, <code>spa</code>, <code>fre</code>. Leave empty for all."
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Open Library is open. Proxy optional.",
                        "default": {
                            "useApifyProxy": false
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
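
For projects without the Apify client, the `run-sync-get-dataset-items` path from the specification above can be called with the standard library alone. This is a minimal sketch: `build_request` and `run_sync` are hypothetical helpers, the 300-second timeout is an arbitrary choice, and the response is assumed to be a JSON array of dataset items.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/devilscrapes~openlibrary-books-scraper/run-sync-get-dataset-items"


def build_request(token: str, run_input: dict) -> Request:
    """Build the POST request: token in the query string, Actor input as the JSON body."""
    url = f"{BASE_URL}{ACTOR_PATH}?{urlencode({'token': token})}"
    body = json.dumps(run_input).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"}, method="POST")


def run_sync(token: str, run_input: dict, timeout: float = 300.0) -> list:
    """Run the Actor, wait for it to finish, and return the dataset items."""
    with urlopen(build_request(token, run_input), timeout=timeout) as resp:
        return json.load(resp)
```

A call such as `run_sync("<YOUR_API_TOKEN>", {"searchQuery": "isaac asimov foundation", "maxResults": 3})` blocks until the run completes, so it suits short runs; for longer ones the `/runs` path returns immediately with run information instead.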
