Cheerio Scraper

Pricing

Pay per usage

Developed by Apify

Maintained by Apify

Crawls websites using raw HTTP requests, parses the HTML with the Cheerio library, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This actor is a high-performance alternative to apify/web-scraper for websites that do not require JavaScript.
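To make the difference between crawling a list of URLs and recursive crawling concrete, here is a minimal, dependency-free sketch. It is not the actor's implementation: the names `crawl`, `extractLinks`, `fetchHtml`, `pageFunction`, and `maxDepth` are hypothetical, and a simple regular expression stands in for Cheerio's HTML parsing so that the example runs without installing anything.

```javascript
// Hypothetical sketch: none of these names come from the Cheerio Scraper source.
// A regex stands in for Cheerio's HTML parsing so the example has no dependencies.

// Collect absolute link targets from an HTML string.
// Simplified on purpose: real-world HTML should go through a proper parser.
function extractLinks(html, baseUrl) {
    const links = [];
    const hrefPattern = /<a\s[^>]*href="([^"]+)"/gi;
    let match;
    while ((match = hrefPattern.exec(html)) !== null) {
        try {
            links.push(new URL(match[1], baseUrl).href);
        } catch (err) {
            // Skip hrefs that cannot be resolved to a valid URL.
        }
    }
    return links;
}

// Crawl breadth-first from startUrls.
// maxDepth = 0 processes only the given list of URLs;
// a larger maxDepth also follows links discovered on each page.
async function crawl({ startUrls, fetchHtml, pageFunction, maxDepth = 0 }) {
    const seen = new Set(startUrls);
    const queue = startUrls.map((url) => ({ url, depth: 0 }));
    const results = [];

    while (queue.length > 0) {
        const { url, depth } = queue.shift();
        const html = await fetchHtml(url);
        results.push(await pageFunction({ url, html }));

        if (depth >= maxDepth) continue;
        for (const link of extractLinks(html, url)) {
            if (!seen.has(link)) {
                seen.add(link); // De-duplicate so each URL is fetched once.
                queue.push({ url: link, depth: depth + 1 });
            }
        }
    }
    return results;
}
```

Here `fetchHtml` is whatever function performs the raw HTTP request and resolves to the response body, and `pageFunction` is the user-supplied extraction code. Passing them in as parameters keeps the crawl loop testable against an in-memory map of pages.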

Rating: 4.7 (11)

137

Total users: 7.4k

Monthly users: 771

Runs succeeded: >99%

Last modified: 14 days ago

Dockerfile

# First, specify the base Docker image. You can read more about
# the available images at https://sdk.apify.com/docs/guides/docker-images
# You can also use any other image from Docker Hub.
FROM apify/actor-node:16
# Second, copy just package.json and package-lock.json since those are the only
# files that affect "npm install" in the next step, to speed up the build.
COPY package*.json ./
# Install NPM packages, skip optional and development dependencies to
# keep the image small. Avoid logging too much and print the dependency
# tree for debugging
RUN npm --quiet set progress=false \
    && npm install --only=prod --no-optional \
    && echo "Installed NPM packages:" \
    && (npm list || true) \
    && echo "Node.js version:" \
    && node --version \
    && echo "NPM version:" \
    && npm --version
# Next, copy the remaining files and directories with the source code.
# Since we do this after NPM install, quick build will be really fast
# for most source file changes.
COPY . ./
# Optionally, specify how to launch the source code of your actor.
# By default, Apify's base Docker images define the CMD instruction
# that runs the Node.js source code using the command specified
# in the "scripts.start" section of the package.json file.
# In short, the instruction looks something like this:
#
# CMD npm start

main.js

// This is the main Node.js source code file of your actor.
// It is referenced from the "scripts" section of the package.json file.

const Apify = require('apify');

Apify.main(async () => {
    // Get input of the actor.
    // If you'd like to have your input checked and generate a user
    // interface for it, add INPUT_SCHEMA.json file to your actor.
    // For more information, see https://docs.apify.com/actors/development/input-schema
    const input = await Apify.getInput();
    console.log('Input:');
    console.dir(input);

    // Do something useful here...

    // Save output
    const output = {
        receivedInput: input,
        message: 'Hello sir!',
    };
    console.log('Output:');
    console.dir(output);
    await Apify.setValue('OUTPUT', output);
});

package.json

{
    "name": "my-actor",
    "version": "0.0.1",
    "dependencies": {
        "apify": "^2.2.2"
    },
    "scripts": {
        "start": "node main.js"
    },
    "author": "Me!"
}