Pricing

from $0.01 / 1,000 results

BooktoScrape.com

This scraper extracts book information from books.toscrape.com, automatically crawling all pages and saving titles, prices, availability, and URLs to the Apify Dataset. Built with Apify SDK and PuppeteerCrawler for reliable cloud and local execution.

Developer

Yugesh

Last modified

3 months ago

Features

Scrapes all paginated listing pages from books.toscrape.com
Extracts:
- Title
- Price
- Availability text
- URL of the product page
Streams results directly into Apify Dataset
Automatically discovers and crawls next pages
Works on both local environment and Apify cloud
Uses Apify.main() for proper actor lifecycle handling

Project Structure

scraper.js - Main actor script
package.json - Dependencies and start command
actor.json - Actor metadata for Apify platform
README.txt - Documentation (this file)

Local Setup

Install dependencies: npm install
Start the scraper: node scraper.js
Output will be written to the local Apify dataset folder: apify_storage/datasets/default/
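
To inspect a local run, the records can be read back from the dataset folder. The sketch below assumes the local storage writes one JSON object per file with zero-padded names (for example 000000001.json); readLocalDataset is a hypothetical helper, not part of the actor.

```javascript
const fs = require('fs');
const path = require('path');

// Read every JSON record file from a local dataset directory.
// Assumption: one JSON object per file, with zero-padded file names.
function readLocalDataset(dir) {
  return fs.readdirSync(dir)
    .filter((name) => name.endsWith('.json'))
    .sort() // zero-padded names sort in the order they were written
    .map((name) => JSON.parse(fs.readFileSync(path.join(dir, name), 'utf8')));
}

// Usage, with the path from the setup steps above:
// const books = readLocalDataset('apify_storage/datasets/default');
// console.log(`${books.length} books read`);
```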

Running on Apify Cloud

Upload or push this project to your Apify actor.
Build the actor.
Run the actor.
After completion, open the Dataset tab to download results as: JSON, JSONL, CSV, Excel, or XML.
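
Results can also be fetched without the web console. As a rough sketch, assuming the public Apify API v2 dataset-items endpoint, a download URL can be built as follows. The dataset ID is a placeholder you would copy from the run, datasetItemsUrl is a hypothetical helper, and a private dataset would additionally need an API token.

```javascript
// Formats accepted by this sketch; "xlsx" is the Excel export.
const FORMATS = ['json', 'jsonl', 'csv', 'xlsx', 'xml'];

// Build a download URL for a dataset's items in the requested format.
function datasetItemsUrl(datasetId, format = 'json') {
  if (!FORMATS.includes(format)) {
    throw new Error(`Unsupported format: ${format}`);
  }
  const url = new URL(
    `https://api.apify.com/v2/datasets/${encodeURIComponent(datasetId)}/items`,
  );
  url.searchParams.set('format', format);
  return url.toString();
}

// Usage (placeholder ID):
// console.log(datasetItemsUrl('YOUR_DATASET_ID', 'csv'));
```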

How the Scraper Works

Starts at page: https://books.toscrape.com/catalogue/page-1.html
Parses each product block using Cheerio.
Pushes each record to the Apify Dataset using: Apify.pushData({ title, price, availability, url });
Detects and enqueues pagination links.
Continues until no more pages exist.
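
The steps above amount to a follow-the-next-link loop. The sketch below shows only that control flow, not the actor's actual code: loadPage is a hypothetical callback that returns the parsed records and the "next" link for a URL, a job the crawler and its request queue handle in the real script.

```javascript
// Resolve a "next" link (often relative, e.g. "page-2.html") against the current page URL.
function resolveNext(currentUrl, nextHref) {
  return nextHref ? new URL(nextHref, currentUrl).href : null;
}

// Visit listing pages in order until a page has no "next" link.
// loadPage(url) is a hypothetical callback resolving to { records, nextHref }.
async function crawlAll(startUrl, loadPage) {
  const results = [];
  const seen = new Set(); // guards against pagination loops
  let url = startUrl;
  while (url && !seen.has(url)) {
    seen.add(url);
    const { records, nextHref } = await loadPage(url);
    results.push(...records);
    url = resolveNext(url, nextHref);
  }
  return results;
}
```

Resolving the link against the current page URL matters because listing pages commonly use relative hrefs; the seen set simply stops the loop if a page ever links back to one already visited.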

Output Example

{
  "title": "A Light in the Attic",
  "price": "£51.77",
  "availability": "In stock",
  "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"
}
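
Because price and availability are stored as display strings, downstream code may want typed values. The following is a minimal post-processing sketch; the added fields priceValue, currency, and inStock are hypothetical and are not produced by the actor.

```javascript
// Derive typed fields from a scraped record without altering the original ones.
function normalizeBook(record) {
  const price = String(record.price);
  const match = price.match(/(\d+(?:\.\d+)?)/); // first decimal number in the string
  return {
    ...record,
    priceValue: match ? Number(match[1]) : null,
    currency: price.includes('£') ? 'GBP' : null,
    inStock: /in stock/i.test(String(record.availability)),
  };
}

// Usage:
// normalizeBook({ title: 'A Light in the Attic', price: '£51.77', availability: 'In stock', url: '...' });
```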

Technologies Used

Node.js
Apify SDK
PuppeteerCrawler
Axios
Cheerio

Configuration Notes

Adjust concurrency in scraper.js: maxConcurrency: 5
Increasing this value speeds up crawling but puts more load on both the target site and the machine running the actor.
A CheerioCrawler version can also be created if required.
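
If editing scraper.js for every run is inconvenient, the concurrency value could be read from the environment instead. This is only a sketch: the variable name MAX_CONCURRENCY and the helper resolveMaxConcurrency are assumptions, and the fallback of 5 mirrors the value mentioned above.

```javascript
// Read a positive integer concurrency limit from the environment,
// falling back to a default when the variable is unset or invalid.
// MAX_CONCURRENCY is a hypothetical variable name.
function resolveMaxConcurrency(env = process.env, fallback = 5) {
  const parsed = Number.parseInt(env.MAX_CONCURRENCY, 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

// Usage inside the crawler options in scraper.js:
// maxConcurrency: resolveMaxConcurrency(),
```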

Troubleshooting

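Empty dataset: Parsing probably failed, for example because the selectors matched nothing. Check the run log for errors.
"Module not found" on Apify: A dependency is missing from package.json. Add it and rebuild the actor.
Actor finishes too quickly: The pagination link was probably not detected, so only the first page was crawled.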
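
A quick way to spot partial parsing failures is to check the exported records for missing fields. The sketch below assumes the four output fields listed earlier; findIncompleteRecords is a hypothetical helper, not part of the actor.

```javascript
const REQUIRED_FIELDS = ['title', 'price', 'availability', 'url'];

// Return the index and missing field names of every record that is
// missing a required field or has it as an empty or non-string value.
function findIncompleteRecords(items) {
  return items
    .map((item, index) => ({
      index,
      missing: REQUIRED_FIELDS.filter(
        (field) => typeof item[field] !== 'string' || item[field].trim() === '',
      ),
    }))
    .filter((entry) => entry.missing.length > 0);
}

// Usage:
// const problems = findIncompleteRecords(items);
// if (problems.length > 0) console.warn('Incomplete records:', problems);
```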
