
Cheerio Scraper
Pricing
Pay per usage

Crawls websites using raw HTTP requests, parses the HTML with the Cheerio library, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This actor is a high-performance alternative to apify/web-scraper for websites that do not need JavaScript to render their content.
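The difference between the two crawling modes mentioned above can be sketched in a few lines of plain Node.js. This is only an illustration of the crawl loop, not the actor's actual implementation: `crawl`, `loadPage`, `demoSite`, and `demoLoadPage` are hypothetical names, and `loadPage` stands in for the "HTTP request plus Cheerio parsing" step, which a real run would perform over the network.

```javascript
// Minimal crawl loop sketch (hypothetical helper, not the actor's code).
// `loadPage(url)` is an injected stand-in for "fetch the page and parse it
// with Cheerio"; it must resolve to { title, links }.
async function crawl({ startUrls, loadPage, recursive = false, maxPages = 100 }) {
    const queue = [...new Set(startUrls)]; // de-duplicated list of URLs to visit
    const seen = new Set(queue);           // URLs already queued or visited
    const results = [];

    while (queue.length > 0 && results.length < maxPages) {
        const url = queue.shift();
        const { title, links } = await loadPage(url);
        results.push({ url, title });

        // List mode: only the start URLs are visited.
        if (!recursive) continue;

        // Recursive mode: enqueue every link not seen before.
        for (const link of links) {
            if (!seen.has(link)) {
                seen.add(link);
                queue.push(link);
            }
        }
    }
    return results;
}

// In-memory "website" used instead of real HTTP requests (assumed data).
const demoSite = {
    'https://example.com/': { title: 'Home', links: ['https://example.com/a', 'https://example.com/b'] },
    'https://example.com/a': { title: 'A', links: ['https://example.com/', 'https://example.com/b'] },
    'https://example.com/b': { title: 'B', links: [] },
};
const demoLoadPage = async (url) => demoSite[url];
```

Calling `crawl({ startUrls: ['https://example.com/'], loadPage: demoLoadPage, recursive: true })` visits all three demo pages, while the same call with `recursive: false` visits only the start URL. Injecting `loadPage` keeps the loop independent of any particular HTTP or parsing library.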
4.7 (11)
137
Total users: 7.4k
Monthly users: 771
Runs succeeded: >99%
Last modified: 14 days ago
Dockerfile
# First, specify the base Docker image. You can read more about
# the available images at https://sdk.apify.com/docs/guides/docker-images
# You can also use any other image from Docker Hub.
FROM apify/actor-node:16

# Second, copy just package.json and package-lock.json since those are the only
# files that affect "npm install" in the next step, to speed up the build.
COPY package*.json ./

# Install NPM packages, skip optional and development dependencies to
# keep the image small. Avoid logging too much and print the dependency
# tree for debugging
RUN npm --quiet set progress=false \
 && npm install --only=prod --no-optional \
 && echo "Installed NPM packages:" \
 && (npm list || true) \
 && echo "Node.js version:" \
 && node --version \
 && echo "NPM version:" \
 && npm --version

# Next, copy the remaining files and directories with the source code.
# Since we do this after NPM install, quick build will be really fast
# for most source file changes.
COPY . ./

# Optionally, specify how to launch the source code of your actor.
# By default, Apify's base Docker images define the CMD instruction
# that runs the Node.js source code using the command specified
# in the "scripts.start" section of the package.json file.
# In short, the instruction looks something like this:
#
# CMD npm start
main.js
// This is the main Node.js source code file of your actor.
// It is referenced from the "scripts" section of the package.json file.

const Apify = require('apify');

Apify.main(async () => {
    // Get input of the actor.
    // If you'd like to have your input checked and generate a user
    // interface for it, add INPUT_SCHEMA.json file to your actor.
    // For more information, see https://docs.apify.com/actors/development/input-schema
    const input = await Apify.getInput();
    console.log('Input:');
    console.dir(input);

    // Do something useful here...

    // Save output
    const output = {
        receivedInput: input,
        message: 'Hello sir!',
    };
    console.log('Output:');
    console.dir(output);
    await Apify.setValue('OUTPUT', output);
});
package.json
{
    "name": "my-actor",
    "version": "0.0.1",
    "dependencies": {
        "apify": "^2.2.2"
    },
    "scripts": {
        "start": "node main.js"
    },
    "author": "Me!"
}