Free eBook Scraper

tugkan/gutenberg-scraper

Scrape free eBooks from Project Gutenberg and download eBooks in all formats. Search for keywords and pick a language.

Author: Tuğkan Cengiz
  • Users: 29
  • Runs: 51

.editorconfig

root = true

[*]
indent_style = space
indent_size = 4
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true
end_of_line = lf
# editorconfig-tools is unable to ignore long strings or URLs
max_line_length = null

.eslintrc

{
  "extends": "@apify"
}

.gitignore

apify_storage
node_modules

.prettierrc.js

module.exports = {
    printWidth: 80,
    singleQuote: true,
    trailingComma: 'all',
};

Dockerfile

# Dockerfile contains instructions on how to build a Docker image that will contain
# all the code and configuration needed to run your actor. For a full
# Dockerfile reference, see https://docs.docker.com/engine/reference/builder/

# First, specify the base Docker image. Apify provides the following base images
# for your convenience:
#  apify/actor-node-basic (Node.js 10 on Alpine Linux, small and fast image)
#  apify/actor-node-chrome (Node.js 10 + Chrome on Debian)
#  apify/actor-node-chrome-xvfb (Node.js 10 + Chrome + Xvfb on Debian)
# For more information, see https://apify.com/docs/actor#base-images
# Note that you can use any other image from Docker Hub.
FROM apify/actor-node-basic

# Second, copy just package.json and package-lock.json since they are the only files
# that affect NPM install in the next step
COPY package*.json ./

# Install NPM packages, skip optional and development dependencies to keep the
# image small. Avoid logging too much and print the dependency tree for debugging
RUN npm --quiet set progress=false \
 && npm install --only=prod --no-optional \
 && echo "Installed NPM packages:" \
 && npm list \
 && echo "Node.js version:" \
 && node --version \
 && echo "NPM version:" \
 && npm --version

# Next, copy the remaining files and directories with the source code.
# Since we do this after NPM install, quick builds will be really fast
# for simple source file changes.
COPY . ./

# Specify how to run the source code
CMD npm start

INPUT_SCHEMA.json

{
    "title": "Gutenberg.org Scraper",
    "description": "An actor that scrapes ebooks from gutenberg.org",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "search":{
            "title": "Search keyword",
            "type": "string",
            "description": "Keyword that you want to search in Gutenberg.org",
            "editor": "textfield"
        },
        "language":{
            "title": "Language",
            "type": "string",
            "description": "Select a language that Gutenberg.org provides",
            "editor": "select",
            "default": "none",
            "enum": ["none", "en","zh","da","nl","eo","fi","fr","de","el","hu","it","la","pt","es","sv","tl","af","ale","ar","arp","brx","br","bg","rmr","ca","ceb","cs","et","fa","fy","fur","gla","gl","kld","grc","he","is","ilo","ia","iu","ga","ja","csb","kha","ko","lt","mi","myn","enm","nah","nap","nav","nai","no","oc","oji","ang","pl","ro","ru","sa","sr","sl","bgs","te","cy"],
            "enumTitles": ["None", "English","Chinese","Danish","Dutch","Esperanto","Finnish","French","German","Greek","Hungarian","Italian","Latin","Portuguese","Spanish","Swedish","Tagalog","Afrikaans","Aleut","Arabic","Arapaho","Bodo","Breton","Bulgarian","Caló","Catalan","Cebuano","Czech","Estonian","Farsi","Frisian","Friulian","Gaelic, Scottish","Galician","Gamilaraay","Greek, Ancient","Hebrew","Icelandic","Iloko","Interlingua","Inuktitut","Irish","Japanese","Kashubian","Khasi","Korean","Lithuanian","Maori","Mayan Languages","Middle English","Nahuatl","Napoletano-Calabrese","Navajo","North American Indian","Norwegian","Occitan","Ojibwa","Old English","Polish","Romanian","Russian","Sanskrit","Serbian","Slovenian","Tagabawa","Telugu","Welsh","Yiddish"]
        },
        "startUrls": {
            "title": "Start URLs",
            "type": "array",
            "description": "URLs to start with. It should be category or product detail URLs",
            "prefill": [
                { "url": "https://www.gutenberg.org/browse/recent/last7" },
                { "url": "https://www.gutenberg.org/browse/titles/h" }
            ],
            "editor": "requestListSources"
        },
        "maxItems":{
            "title": "Max Items",
            "type": "integer",
            "description": "Maximum number of items you want on your results",
            "editor": "number"
        },
        "extendOutputFunction": {
            "title": "Extend output function",
            "type": "string",
            "nullable": true,
            "description": "Function that takes a JQuery handle ($) as argument and returns data that will be merged with the default output",
            "prefill": "($) => { return {} }",
            "editor": "javascript"
        },
        "proxyConfig": {
            "title": "Proxy configuration",
            "type": "object",
            "description": "Optionally use Apify Proxy",
            "prefill": { "useApifyProxy": true, "apifyProxyGroups": ["SHADER"] },
            "editor": "proxy"
        }
    },
    "required": ["proxyConfig"]
}

README.md

# Actor - Gutenberg.org Scraper

Gutenberg.org Scraper is an [Apify actor](https://apify.com/actors) for extracting data about ebooks from [Gutenberg.org](https://gutenberg.org). It allows you to search for keywords and pick a language. It is built on top of the [Apify SDK](https://sdk.apify.com/), and you can run it both on the [Apify platform](https://my.apify.com) and locally.

- [Gutenberg.org Scraper Input Parameters](#input-parameters)
- [Gutenberg.org Scraper Input Example](#input-example)
- [Gutenberg.org Scraper Ebook Output](#output)
- [Extend output function](#extend-output-function)
- [Compute Unit Consumption](#compute-unit-consumption)
- [During The Run](#during-the-run)
- [Gutenberg.org Export](#export)


## Gutenberg.org Scraper Input Parameters

The input of this scraper should be JSON specifying which pages on Gutenberg should be visited. The supported fields are:

| Field | Type | Description |
| ----- | ---- | ----------- |
| search | String | (optional) The keyword that you want to search on Gutenberg |
| language | String | (optional) A language code that Gutenberg provides. You can fetch all ebooks of a language with it |
| startUrls | Array | (optional) List of Gutenberg URLs. You should only provide "search" or "browse" URLs |
| maxItems | Integer | (optional) Maximum number of items that the output will contain |
| extendOutputFunction | String | (optional) Function that takes a jQuery handle ($) as argument and returns data that will be merged with the default output. More information in [Extend output function](#extend-output-function) |
| proxyConfig | Object | (required) Proxy configuration |

This solution requires the use of **Proxy servers**, either your own proxy servers or you can use <a href="https://www.apify.com/docs/proxy">Apify Proxy</a>.


### Gutenberg Scraper Input Example
```json
{
    "proxyConfig": { "useApifyProxy": true },
    "startUrls": [
        { "url": "https://www.gutenberg.org/browse/recent/last7" },
        { "url": "https://www.gutenberg.org/browse/titles/h" }
    ]
}
```
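
If you only want keyword search results, a minimal input could look like this (the values are illustrative):
```json
{
    "proxyConfig": { "useApifyProxy": true },
    "search": "sherlock holmes",
    "maxItems": 100
}
```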

## Gutenberg Ebook Output
The structure of each item in Gutenberg ebooks looks like this:
```json
{
  "author": "United States. National Park Service",
  "title": "Cumberland Island: Junior Ranger Program Activity Guide for Ages 5-7",
  "language": "English",
  "htmlURL": "https://www.gutenberg.org/files/61452/61452-h/61452-h.htm",
  "epubURL": "https://www.gutenberg.org/ebooks/61452.epub.images?session_id=24e44a13d40847bb8d8b13a9216689880a3221cf",
  "kindleURL": "https://www.gutenberg.org/ebooks/61452.kindle.images?session_id=24e44a13d40847bb8d8b13a9216689880a3221cf",
  "plainTextURL": "https://www.gutenberg.org/files/61452/61452-0.txt"
}
```

### Extend output function

You can use this function to update the default output of this actor. This function gets a jQuery handle `$` as an argument, so you can choose what data from the page you want to scrape. The output from this function will be merged with the default output.

The return value of this function has to be an object!

You can return fields to achieve 3 different things:
- Add a new field - Return an object with a field that is not in the default output
- Change a field - Return an existing field with a new value
- Remove a field - Return an existing field with the value `undefined`
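
For example, a minimal sketch of such a function that does all three (the `interactionCount` selector is illustrative, not guaranteed to match the live page):

```js
($) => {
    return {
        // Add a new field that is not in the default output (illustrative selector)
        downloads: $('td[itemprop="interactionCount"]').text().trim(),
        // Change an existing field
        title: $('h1[itemprop="name"]').text().trim().toUpperCase(),
        // Remove a field from the default output
        plainTextURL: undefined,
    };
}
```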


### Compute Unit Consumption
The actor is optimized to run fast and scrape as many items as possible. Therefore, it forefronts all ebook detail requests. If the actor isn't blocked too often, it will scrape ~250 ebooks in 3 minutes using ~0.0235 compute units (roughly 0.09 compute units per 1,000 ebooks).

## During the Run

During the run, the actor will output messages letting you know what is going on. Each message always contains a short label specifying which page from the provided list is currently being processed.
When items are loaded from the page, you should see a message about this event with a loaded item count and total item count for each page.

If you provide incorrect input to the actor, it will immediately stop with a failure state and output an explanation of what is wrong.

## Gutenberg Export

During the run, the actor stores results into a dataset. Each item is a separate item in the dataset.

You can manage the results in any language (Python, PHP, Node.js/NPM). See the FAQ or <a href="https://www.apify.com/docs/api" target="_blank">our API reference</a> to learn more about getting results from this Gutenberg actor.
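
For example, a minimal Node.js sketch that fetches the items of a run's dataset over the Apify API (`DATASET_ID` is a placeholder for your dataset ID):

```js
// Fetch all items of a dataset as JSON from the Apify API.
const https = require('https');

https.get('https://api.apify.com/v2/datasets/DATASET_ID/items?format=json', (res) => {
    let body = '';
    res.on('data', (chunk) => { body += chunk; });
    res.on('end', () => console.log(JSON.parse(body)));
});
```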

apify.json

{
	"name": "gutenberg-scraper",
	"version": "0.0",
	"buildTag": "latest",
	"template": "cheerio_crawler"
}

package-lock.json

This file is 3819 lines long. Only the first 50 are shown.

{
	"name": "apify-gutenberg-actor",
	"version": "0.0.1",
	"lockfileVersion": 1,
	"requires": true,
	"dependencies": {
		"@apify/eslint-config": {
			"version": "0.0.3",
			"resolved": "https://registry.npmjs.org/@apify/eslint-config/-/eslint-config-0.0.3.tgz",
			"integrity": "sha512-WbjC0Xv1bEWN9DcOayv5y4Zygv4N8zPq/XZQygHfq+As+P6sxK5sSrAQzHIHWd+9jNX4TI9iJDcPqFzxsAxTgw==",
			"dev": true,
			"requires": {
				"eslint-config-airbnb": "^17.1.1",
				"eslint-config-airbnb-base": "^13.2.0",
				"eslint-plugin-import": "^2.18.2",
				"eslint-plugin-jsx-a11y": "^6.2.3",
				"eslint-plugin-promise": "^4.2.1",
				"eslint-plugin-react": "^7.14.3"
			}
		},
		"@apify/http-request": {
			"version": "1.1.5",
			"resolved": "https://registry.npmjs.org/@apify/http-request/-/http-request-1.1.5.tgz",
			"integrity": "sha512-wnT95dZKMbSl3K/K4hdDEnzGt/9cmf216jp3S3yDrAzlIA4eB3kDanI4gClaZ+yOsY3BRe8WNUPMIeCgWfksqQ==",
			"requires": {
				"apify-shared": "^0.1.67",
				"got": "^9.6.0",
				"proxy-agent": "^3.1.1"
			}
		},
		"@apify/ps-tree": {
			"version": "1.1.3",
			"resolved": "https://registry.npmjs.org/@apify/ps-tree/-/ps-tree-1.1.3.tgz",
			"integrity": "sha512-+hIr8EaTRd9fsOiNNzf1Fi8Tm9qs8cdPBZjuq5fXDV6SOCdi2ZyQlcQSzc8lY0hb+UhBib1WPixtCdKLL169WA==",
			"requires": {
				"event-stream": "3.3.4"
			}
		},
		"@babel/code-frame": {
			"version": "7.8.3",
			"resolved": "https://registry.npmjs.org/@babel/code-frame/-/code-frame-7.8.3.tgz",
			"integrity": "sha512-a9gxpmdXtZEInkCSHUJDLHZVBgb1QS0jhss4cPP93EW7s+uC5bikET2twEF3KV+7rDblJcmNvTR7VJejqd2C2g==",
			"dev": true,
			"requires": {
				"@babel/highlight": "^7.8.3"
			}
		},
		"@babel/highlight": {
			"version": "7.8.3",
			"resolved": "https://registry.npmjs.org/@babel/highlight/-/highlight-7.8.3.tgz",

package.json

{
	"name": "gutenberg-scraper",
	"version": "0.0.1",
	"description": "Gutenberg.org ebooks scraper.",
	"dependencies": {
		"apify": "^0.19.0"
	},
	"devDependencies": {
		"@apify/eslint-config": "0.0.3",
		"eslint": "^6.7.2"
	},
	"scripts": {
		"start": "node src/main.js"
	},
	"author": "Tugkan Cengiz",
	"license": "ISC"
}

src/extractors.js

// Fetch offers from browse list page
const fetchEbooksFromBrowse = ($) => {
    return $('a[href]')
        .filter((i, el) => $(el).attr('href').match(/ebooks\/[0-9]/g))
        .map((i, el) => `https://www.gutenberg.org${$(el).attr('href')}`)
        .get();
};

// Fetch offers from search list page
const fetchEbooksFromSearch = ($) => {
    return $('.booklink a[href]')
        .filter((i, el) => $(el).attr('href').match(/ebooks\/[0-9]/g))
        .map((i, el) => `https://www.gutenberg.org${$(el).attr('href')}`)
        .get();
};

// Fetch ebook defaults
const fetchEbook = ($) => {
    // Extract each download link first. A jQuery selection is always truthy,
    // so presence must be checked on the extracted href, not on the selection
    // itself (otherwise missing links would yield ".../undefined" URLs).
    const htmlHref = $('.files td[property="dcterms:format"]')
        .filter((i, el) => ($(el).attr('content') || '').includes('text/html'))
        .find('a')
        .attr('href');
    const epubHref = $('.files td[content="application/epub+zip"] a').attr('href');
    const kindleHref = $('.files td[content="application/x-mobipocket-ebook"] a').attr('href');
    const plainTextHref = $('.files td[content="text/plain; charset=utf-8"] a').attr('href');

    return {
        author: $('a[itemprop="creator"]').text().trim(),
        title: $('h1[itemprop="name"]').text().trim(),
        language: $('tr[itemprop="inLanguage"]').text().trim().replace(/Language/, '')
            .trim(),
        htmlURL: htmlHref ? `https://www.gutenberg.org${htmlHref}` : null,
        epubURL: epubHref ? `https://www.gutenberg.org${epubHref}` : null,
        kindleURL: kindleHref ? `https://www.gutenberg.org${kindleHref}` : null,
        plainTextURL: plainTextHref ? `https://www.gutenberg.org${plainTextHref}` : null,
    };
};

module.exports = {
    fetchEbooksFromBrowse,
    fetchEbooksFromSearch,
    fetchEbook,
};
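
A minimal local check of these extractors, assuming `cheerio` is available and using a hand-written HTML fragment instead of a real Gutenberg page:

```js
const cheerio = require('cheerio');
const { fetchEbook } = require('./extractors');

// Hand-written fragment mimicking the relevant parts of an ebook page.
const html = `
<h1 itemprop="name">Example Book</h1>
<a itemprop="creator">Example Author</a>
<table><tr itemprop="inLanguage"><td>Language English</td></tr></table>
<table class="files">
  <tr><td content="application/epub+zip"><a href="/ebooks/1.epub.images">EPUB</a></td></tr>
</table>`;

console.log(fetchEbook(cheerio.load(html)));
// => { author: 'Example Author', title: 'Example Book', language: 'English',
//      epubURL: 'https://www.gutenberg.org/ebooks/1.epub.images', ... }
```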

src/main.js

const Apify = require('apify');
const tools = require('./tools');

const {
    utils: { log },
} = Apify;

// Create crawler
Apify.main(async () => {
    log.info('PHASE -- STARTING ACTOR.');

    global.userInput = await Apify.getInput();
    const { search, language, startUrls, proxyConfig } = global.userInput;
    log.info('ACTOR OPTIONS: -- ', global.userInput);

    // Input validation
    if (!proxyConfig) {
        throw new Error('Actor must use proxy! Aborting.');
    }

    if (!search && !language && (!startUrls || startUrls.length === 0)) {
        throw new Error('Actor must have at least one of the following attributes: starting url, search or language! Aborting.');
    }

    // Create request queue
    const requestQueue = await Apify.openRequestQueue();

    // Initialize first request
    const pages = await tools.getSources();
    for (const page of pages) {
        await requestQueue.addRequest({ ...page });
    }

    // Create route
    const router = tools.createRouter({ requestQueue });


    log.info('PHASE -- SETTING UP CRAWLER.');
    const crawler = new Apify.CheerioCrawler({
        requestQueue,
        handlePageTimeoutSecs: 120,
        requestTimeoutSecs: 120,
        ignoreSslErrors: true,
        ...proxyConfig,
        useSessionPool: true,
        handlePageFunction: async (context) => {
            const { $, request, response, session } = context;
            log.debug(`CRAWLER -- Processing ${request.url}`);

            const isBlocked = $('h2').text().includes('How can I get unblocked?');

            // Status code check
            if (!response || response.statusCode !== 200 || isBlocked) {
                session.markBad();
                throw new Error(`We got blocked by target on ${request.url}`);
            } else {
                session.markGood();
            }

            // Redirect to route
            await router(request.userData.label, context);
        },
    });

    log.info('PHASE -- STARTING CRAWLER.');

    await crawler.run();

    log.info('PHASE -- ACTOR FINISHED.');
});

src/routes.js

This file is 103 lines long. Only the first 50 are shown.

// routes.js
const Apify = require('apify');
const extractors = require('./extractors');

const {
    utils: { log },
} = Apify;


// ItemCount
let itemCount = 0;

// BROWSE page crawler.
// Fetches the ebooks from the list
exports.BROWSE = async ({ $, request }, { requestQueue }) => {
    log.info(`CRAWLER -- Browse: Fetching ebooks from the url: ${request.url}`);


    // Check for error
    if ($('html').html().includes('There are too many')) {
        log.error(`There are too many books on the website; the list doesn't show them: ${request.url}`);
        process.exit(0);
    }

    // Fetch all current ebooks listed
    const ebooks = extractors.fetchEbooksFromBrowse($);

    // Add them to request queue
    for (const ebook of ebooks) {
        await requestQueue.addRequest(
            {
                url: ebook,
                userData: {
                    label: 'EBOOK',
                },
            },
            { forefront: true },
        );
    }

    log.debug(`CRAWLER -- ${ebooks.length} ebooks added to queue`);
};

// SEARCH page crawler.
// Fetches the ebooks from search page and
// Adds next page to request queue
exports.SEARCH = async ({ $, request }, { requestQueue }) => {
    const { page, baseURL } = request.userData;
    log.info(`CRAWLER -- Search: Fetching ebooks from the url: ${request.url} page: ${page}`);

src/tools.js

const Apify = require('apify');
const routes = require('./routes');

const {
    utils: { log },
} = Apify;

// Retrieves sources and returns object for request list
exports.getSources = async () => {
    log.debug('Getting sources');

    // Get user input
    const { search, language, startUrls } = global.userInput;

    // Build start URLs
    if (language && language !== 'none') {
        return [{
            url: `https://www.gutenberg.org/browse/languages/${language}`,
            userData: {
                label: 'BROWSE',
                baseURL: `https://www.gutenberg.org/browse/languages/${language}`,
            },
        }];
    }

    if (search) {
        // Replace all whitespace (not just the first space) when building the query
        const query = search.toLowerCase().replace(/\s+/g, '+');
        return [{
            url: `https://www.gutenberg.org/ebooks/search/?query=${query}`,
            userData: {
                label: 'SEARCH',
                page: 1,
                baseURL: `https://www.gutenberg.org/ebooks/search/?query=${query}`,
            },
        }];
    }

    if (startUrls && startUrls.length > 0) {
        return startUrls.map((startUrl) => {
            let label = '';
            let page = null;

            // Check route dependent on url
            if (startUrl.url.match(/\/ebooks\/[0-9]/g)) {
                label = 'EBOOK';
            }

            if (startUrl.url.match(/\/browse\//g)) {
                label = 'BROWSE';
                page = 1;
            }

            if (startUrl.url.match(/ebooks\/search/g)) {
                label = 'SEARCH';
                page = 1;
            }

            return {
                url: startUrl.url,
                userData: {
                    label,
                    ...(page ? { page } : {}),
                },
            };
        });
    }

    // Unreachable once the input validation in main.js has passed,
    // but return an empty list as a safeguard.
    return [];
};

// Create router
exports.createRouter = (globalContext) => {
    return async function (routeName, requestContext) {
        const route = routes[routeName];
        if (!route) throw new Error(`No route for name: ${routeName}`);
        log.debug(`Invoking route: ${routeName}`);
        return route(requestContext, globalContext);
    };
};
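
The router itself is just a label-to-handler dispatcher. A self-contained sketch of the same pattern:

```js
// Self-contained sketch of the label-based dispatch used above.
const routes = {
    GREET: async (ctx, globalContext) => `hello ${ctx.name} via ${globalContext.transport}`,
};

const createRouter = (globalContext) => async (routeName, requestContext) => {
    const route = routes[routeName];
    if (!route) throw new Error(`No route for name: ${routeName}`);
    return route(requestContext, globalContext);
};

(async () => {
    const router = createRouter({ transport: 'http' });
    console.log(await router('GREET', { name: 'world' })); // hello world via http
})();
```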