
Project Gutenberg Scraper

Pricing

from $10.00 / 1,000 results

Scrape Project Gutenberg (gutenberg.org). Search 70K+ free public domain ebooks. Extract titles, authors, subjects, download formats (EPUB, Kindle, TXT, HTML), and full metadata.

Developer

lulz bot

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 16 hours ago

Scrape the Project Gutenberg free eBook catalog. Search 70,000+ public domain books by title, author, topic, or language. Get complete metadata, subjects, bookshelves, and download links for every format (EPUB, HTML, plain text, Kindle).

Features

  • Search by title/author: Find books by any keyword
  • Filter by topic: Browse by subject like "science fiction", "philosophy", "children"
  • Filter by language: English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese
  • Full metadata: Authors with birth/death years, subjects, bookshelves, download counts
  • Download links: Direct URLs for EPUB, HTML, plain text, Kindle, and cover images
  • Pagination: Automatically follows paginated results up to your limit
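The pagination feature can be pictured as a loop that keeps requesting result pages until it has collected the requested number of books or a page comes back empty. This is a minimal sketch, not the Actor's code: `fetch_page` is a hypothetical stand-in for whatever request the scraper actually makes, and the 25-results-per-page size in the demo is an assumption.

```python
from typing import Callable

# Hypothetical page fetcher: takes a zero-based page index and returns
# the list of book records on that page (empty when results run out).
PageFetcher = Callable[[int], list]


def collect_books(fetch_page: PageFetcher, max_results: int) -> list:
    """Follow paginated results until max_results books are gathered or pages run out."""
    books = []
    page = 0
    while len(books) < max_results:
        batch = fetch_page(page)
        if not batch:  # an empty page means there are no more results
            break
        # Keep only as many items as are still needed to reach the limit.
        books.extend(batch[: max_results - len(books)])
        page += 1
    return books


if __name__ == "__main__":
    # Fake catalog of 60 records served 25 per page (page size is an assumption).
    catalog = [{"id": i} for i in range(1, 61)]

    def fake_fetch(page):
        start = page * 25
        return catalog[start : start + 25]

    print(len(collect_books(fake_fetch, 50)))   # 50
    print(len(collect_books(fake_fetch, 100)))  # 60
```

Trimming the last batch keeps the result at exactly the limit, while the empty-page check stops early when the catalog has fewer matches than requested.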

Output Fields

Field          Description
id             Gutenberg book ID
title          Book title
authors        Array of authors, each with name, birthYear, deathYear
subjects       Array of Library of Congress subject headings
bookshelves    Array of Gutenberg bookshelves
languages      Array of language codes (e.g. "en", "fr")
downloadCount  Total download count
formats        Object with epub, html, txt, kindle, and coverImage URLs
copyright      Copyright status (boolean)
mediaType      Media type (usually "Text")
scrapedAt      ISO 8601 timestamp of when the record was scraped
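If you consume the dataset from Python, it can help to map each record onto typed objects. The sketch below follows the field names in the table above; which fields may be missing or null is an assumption, so everything except `id` and `title` gets a default, and null format URLs are dropped.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Author:
    name: str
    birth_year: Optional[int] = None  # from "birthYear"; assumed nullable
    death_year: Optional[int] = None  # from "deathYear"; assumed nullable


@dataclass
class Book:
    id: int
    title: str
    authors: list = field(default_factory=list)
    subjects: list = field(default_factory=list)
    bookshelves: list = field(default_factory=list)
    languages: list = field(default_factory=list)
    download_count: int = 0
    formats: dict = field(default_factory=dict)
    copyright: Optional[bool] = None
    media_type: Optional[str] = None
    scraped_at: Optional[str] = None


def parse_book(item: dict) -> Book:
    """Convert one dataset item (camelCase keys) into a Book."""
    authors = [
        Author(a.get("name", ""), a.get("birthYear"), a.get("deathYear"))
        for a in (item.get("authors") or [])
    ]
    # Keep only formats that actually have a URL.
    formats = {k: v for k, v in (item.get("formats") or {}).items() if v}
    return Book(
        id=item["id"],
        title=item["title"],
        authors=authors,
        subjects=item.get("subjects") or [],
        bookshelves=item.get("bookshelves") or [],
        languages=item.get("languages") or [],
        download_count=item.get("downloadCount") or 0,
        formats=formats,
        copyright=item.get("copyright"),
        media_type=item.get("mediaType"),
        scraped_at=item.get("scrapedAt"),
    )
```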

Input Options

  • Search Query: Search by title or author name
  • Topic: Filter by subject or bookshelf
  • Language: Filter by language
  • Max Results: Maximum number of books to return (default 50, max 5,000)
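A client can validate its input before starting a run. In this sketch the key names (`searchQuery`, `topic`, `language`, `maxResults`) are hypothetical, as is the assumption that the language filter takes ISO 639-1 codes; check the Actor's input schema for the real names. The 50 default and 5,000 cap come from the list above; the minimum of 1 is an assumption.

```python
from typing import Optional

# ISO 639-1 codes for the languages listed under Features.
# Whether the Actor expects codes or full names is an assumption.
SUPPORTED_LANGUAGES = {
    "english": "en", "french": "fr", "german": "de", "spanish": "es",
    "italian": "it", "portuguese": "pt", "chinese": "zh", "japanese": "ja",
}

DEFAULT_MAX_RESULTS = 50
MAX_RESULTS_CAP = 5000


def build_run_input(
    search_query: Optional[str] = None,
    topic: Optional[str] = None,
    language: Optional[str] = None,
    max_results: int = DEFAULT_MAX_RESULTS,
) -> dict:
    """Assemble and validate a run input; the key names are hypothetical."""
    if not 1 <= max_results <= MAX_RESULTS_CAP:
        raise ValueError(f"max_results must be between 1 and {MAX_RESULTS_CAP}")
    run_input = {"maxResults": max_results}
    if search_query:
        run_input["searchQuery"] = search_query
    if topic:
        run_input["topic"] = topic
    if language:
        code = SUPPORTED_LANGUAGES.get(language.lower())
        if code is None:
            raise ValueError(f"unsupported language: {language!r}")
        run_input["language"] = code
    return run_input
```

For example, `build_run_input(topic="science fiction", language="French", max_results=200)` returns `{"maxResults": 200, "topic": "science fiction", "language": "fr"}`.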

Use Cases

  • Digital library building: Collect download links for public domain books in bulk
  • Literary research: Analyze authors, subjects, and popularity trends
  • NLP/AI training: Gather text corpora by language or topic
  • Education: Find free reading materials by subject area
  • Data journalism: Analyze the most popular public domain works
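Several of these use cases reduce to simple operations over the output records. The helpers below are a sketch that assumes items shaped like the Output Fields table; they rank books by `downloadCount`, collect plain-text URLs for one language code, and count subject headings. Missing or null fields are treated as empty.

```python
from collections import Counter


def top_by_downloads(books: list, n: int = 10) -> list:
    """Return the n most-downloaded books (data journalism)."""
    return sorted(books, key=lambda b: b.get("downloadCount") or 0, reverse=True)[:n]


def text_urls_for_language(books: list, lang: str) -> list:
    """Plain-text URLs for books tagged with a language code (corpus building)."""
    urls = []
    for b in books:
        txt = (b.get("formats") or {}).get("txt")
        if txt and lang in (b.get("languages") or []):
            urls.append(txt)
    return urls


def subject_counts(books: list) -> Counter:
    """How often each subject heading appears across the results (literary research)."""
    return Counter(s for b in books for s in (b.get("subjects") or []))
```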

Example Output

{
  "id": 1342,
  "title": "Pride and Prejudice",
  "authors": [{"name": "Austen, Jane", "birthYear": 1775, "deathYear": 1817}],
  "subjects": ["Courtship -- Fiction", "England -- Fiction", "Sisters -- Fiction"],
  "bookshelves": ["Best Books Ever Listings"],
  "languages": ["en"],
  "downloadCount": 75892,
  "formats": {
    "epub": "https://www.gutenberg.org/ebooks/1342.epub3.images",
    "html": "https://www.gutenberg.org/files/1342/1342-h/1342-h.htm",
    "txt": "https://www.gutenberg.org/files/1342/1342-0.txt"
  },
  "copyright": false,
  "scrapedAt": "2026-04-26T12:00:00.000Z"
}
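The example record lists only epub, html, and txt under formats, even though the field table also mentions kindle and coverImage, so a consumer should not assume every format key is present. This sketch picks the first available URL from a preference order; the order itself is an arbitrary choice, and the JSON is an abbreviated copy of the example above.

```python
import json
from typing import Optional

# Abbreviated copy of the example output above.
EXAMPLE = """
{
  "id": 1342,
  "title": "Pride and Prejudice",
  "formats": {
    "epub": "https://www.gutenberg.org/ebooks/1342.epub3.images",
    "html": "https://www.gutenberg.org/files/1342/1342-h/1342-h.htm",
    "txt": "https://www.gutenberg.org/files/1342/1342-0.txt"
  }
}
"""

# Preference order is an arbitrary choice for this sketch.
PREFERRED_FORMATS = ("epub", "kindle", "html", "txt")


def best_download_url(book: dict, preferred=PREFERRED_FORMATS) -> Optional[str]:
    """Return the first available URL in preference order, or None."""
    formats = book.get("formats") or {}
    for fmt in preferred:
        url = formats.get(fmt)
        if url:
            return url
    return None


book = json.loads(EXAMPLE)
print(best_download_url(book))                     # the epub URL
print(best_download_url(book, ("kindle", "txt")))  # the txt URL, since kindle is absent
```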

Run on Apify

This scraper runs on the Apify platform, a full-stack web scraping and automation cloud. Sign up for a free account to get started with a 30-day trial of all features.
