Goodreads Booklist Scraper

Pricing

$4.99/month + usage

Scrape book data from Goodreads, including titles, authors, ratings, and publication info, using an AWS Lambda API.

Rating

0.0 (0)

Developer

ZeroBreak (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 7 days ago

Goodreads Booklist Scraper

An Apify Actor that scrapes book information from Goodreads through an AWS Lambda API.

Features

  • Multiple Search Queries: Scrape multiple search terms in one run
  • Concurrent Processing: Process multiple queries simultaneously (up to 10 concurrent requests)
  • Flexible Configuration: Custom pages and books per page for each query
  • Rate Limiting: Built-in cap on concurrent requests, set with maxConcurrentRequests
  • Error Handling: Robust error handling with detailed logging
  • AWS Lambda Integration: Uses serverless Lambda function for scraping
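
The concurrency cap described above can be pictured with a small sketch. This README does not show how the actor implements it; the version below assumes an asyncio.Semaphore, and run_limited and fake_scrape are hypothetical names standing in for the real Lambda call.

```python
import asyncio


async def run_limited(queries, worker, max_concurrent=5):
    """Run worker(query) for every query, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(query):
        # Only max_concurrent coroutines can be inside this block at once.
        async with semaphore:
            return await worker(query)

    # gather returns results in the same order as the input queries.
    return await asyncio.gather(*(guarded(q) for q in queries))


async def fake_scrape(query):
    # Hypothetical placeholder: the actual Lambda request is not documented here.
    await asyncio.sleep(0.01)
    return {"search_term": query, "status": "success"}


if __name__ == "__main__":
    results = asyncio.run(
        run_limited(["tolkien", "stephen king"], fake_scrape, max_concurrent=2)
    )
    print(results)
```

A semaphore keeps the code simple: every query is scheduled immediately, but only max_concurrent of them are in flight at any moment.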

Input

Required Fields

searchQueries (Array)

List of search terms to scrape. Each entry can be either:

  • Simple strings: ["python programming", "machine learning"]
  • Objects with custom settings:
    [
      {
        "searchTerm": "tolkien",
        "pages": 5,
        "booksPerPage": 30
      }
    ]

Optional Fields

pages (Integer)

  • Default number of pages to scrape per search term
  • Range: 1-15
  • Default: 1

booksPerPage (Integer)

  • Default maximum books to return per page
  • Range: 1-100
  • Default: 20

maxConcurrentRequests (Integer)

  • Maximum number of concurrent API requests
  • Range: 1-10
  • Default: 5
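
To make the interplay between top-level defaults and per-query settings concrete, here is a minimal sketch of how mixed searchQueries entries could be normalized. It assumes that values set on a query object take precedence over the top-level pages and booksPerPage; normalize_queries is a hypothetical helper, not part of the actor.

```python
# Documented defaults for the optional top-level fields.
DEFAULTS = {"pages": 1, "booksPerPage": 20}


def normalize_queries(actor_input):
    """Turn every searchQueries entry into a dict with explicit settings."""
    # Top-level values from the input override the documented defaults.
    defaults = {key: actor_input.get(key, value) for key, value in DEFAULTS.items()}
    normalized = []
    for entry in actor_input["searchQueries"]:
        if isinstance(entry, str):
            entry = {"searchTerm": entry}
        # Assumption: per-query settings win over the top-level ones.
        normalized.append({**defaults, **entry})
    return normalized


if __name__ == "__main__":
    example = {
        "searchQueries": ["harry potter", {"searchTerm": "tolkien", "pages": 10}],
        "booksPerPage": 25,
    }
    for query in normalize_queries(example):
        print(query)
```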

Output

The actor stores results in the Apify dataset. Each result contains:

{
  "search_term": "python programming",
  "status": "success",
  "pages_requested": 5,
  "books_per_page": 15,
  "data": {
    "search_term": "python programming",
    "total_pages_scraped": 5,
    "total_books_found": 75,
    "books": [
      {
        "id": "123456",
        "title": "Python Crash Course",
        "url": "https://www.goodreads.com/book/show/123456",
        "cover_image": "https://...",
        "authors": [
          {
            "name": "Eric Matthes",
            "url": "https://www.goodreads.com/author/show/...",
            "role": "Author"
          }
        ],
        "average_rating": 4.5,
        "ratings_count": 12345,
        "publication_year": 2019,
        "publication_info": "published 2019",
        "rank": 1
      }
    ]
  }
}
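
Because each result nests its books under data.books, downstream analysis is often easier with one flat row per book. The sketch below shows one way to do that with the field names from the sample above; flatten_books is a hypothetical helper, and missing fields are left as None rather than guessed.

```python
def flatten_books(item):
    """Return one flat dict per book found in a single dataset item."""
    data = item.get("data") or {}
    rows = []
    for book in data.get("books", []):
        rows.append({
            "search_term": item.get("search_term"),
            "rank": book.get("rank"),
            "title": book.get("title"),
            # Join all author names into a single readable string.
            "authors": ", ".join(a["name"] for a in book.get("authors", [])),
            "average_rating": book.get("average_rating"),
            "ratings_count": book.get("ratings_count"),
            "publication_year": book.get("publication_year"),
            "url": book.get("url"),
        })
    return rows


if __name__ == "__main__":
    sample = {
        "search_term": "python programming",
        "status": "success",
        "data": {"books": [{
            "title": "Python Crash Course",
            "authors": [{"name": "Eric Matthes", "role": "Author"}],
            "average_rating": 4.5,
            "rank": 1,
        }]},
    }
    print(flatten_books(sample))
```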

Example Input

Simple Example

{
  "searchQueries": [
    "python programming",
    "machine learning",
    "web development"
  ],
  "pages": 3,
  "booksPerPage": 25
}

Advanced Example

{
  "searchQueries": [
    "harry potter",
    {
      "searchTerm": "tolkien",
      "pages": 10,
      "booksPerPage": 50
    },
    {
      "searchTerm": "stephen king",
      "pages": 5,
      "booksPerPage": 30
    }
  ],
  "pages": 1,
  "booksPerPage": 20,
  "maxConcurrentRequests": 3
}

Usage Limits

  • Pages per query: 1-15 (Lambda enforced)
  • Books per page: 1-100 (Lambda enforced)
  • Concurrent requests: 1-10 (Actor enforced)
  • Request timeout: 300 seconds (5 minutes)
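
Checking an input against these limits before starting a run can save a failed request. The following is a rough sketch only: check_limits is a hypothetical helper that mirrors the ranges listed above, and how the actor or the Lambda function actually rejects out-of-range values is not described in this README.

```python
# Ranges copied from the Usage Limits list above.
LIMITS = {
    "pages": (1, 15),
    "booksPerPage": (1, 100),
    "maxConcurrentRequests": (1, 10),
}


def check_limits(settings):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for field, (low, high) in LIMITS.items():
        if field not in settings:
            continue  # Omitted optional fields fall back to their defaults.
        value = settings[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not low <= value <= high:
            problems.append(
                f"{field} must be an integer in {low}-{high}, got {value!r}"
            )
    return problems


if __name__ == "__main__":
    print(check_limits({"pages": 3, "booksPerPage": 25}))
    print(check_limits({"pages": 16, "maxConcurrentRequests": 0}))
```

The same function can be applied to the top-level input and to each object inside searchQueries, since both use the pages and booksPerPage field names.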

Use Cases

  • 📖 Book research and analysis
  • 📊 Market research for publishers
  • 🎓 Academic research
  • 📚 Reading list generation
  • 🔍 Book discovery and recommendations

Error Handling

The actor handles various error scenarios:

  • Network errors: Timeout and connection issues
  • API errors: Invalid API key, rate limiting
  • Lambda errors: Scraping failures
  • Invalid input: Missing search queries

All errors are logged and included in the output for debugging.
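
Since errors end up in the output alongside successful results, a consumer will usually want to separate the two. The sketch below assumes that failed items carry a status other than "success"; the README only shows the success value, so the exact shape of an error item is an assumption, and split_by_status is a hypothetical helper.

```python
def split_by_status(items):
    """Partition dataset items into (succeeded, failed) lists."""
    succeeded, failed = [], []
    for item in items:
        # Assumption: anything not marked "success" is treated as a failure.
        if item.get("status") == "success":
            succeeded.append(item)
        else:
            failed.append(item)
    return succeeded, failed


if __name__ == "__main__":
    items = [
        {"search_term": "tolkien", "status": "success"},
        {"search_term": "stephen king", "status": "error"},
    ]
    ok, bad = split_by_status(items)
    print(len(ok), "succeeded;", len(bad), "failed")
```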

Support

For issues or questions:

  • Check the actor logs for detailed error messages
  • Verify your environment variables are set correctly
  • Ensure your Lambda function is running and accessible