Web Scraper

Developed by Apify

Maintained by Apify

Crawls arbitrary websites using a web browser and extracts structured data from web pages using a provided JavaScript function. The Actor supports both recursive crawling and lists of URLs, and automatically manages concurrency for maximum performance.
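As an illustration of the "provided JavaScript function" mentioned above, here is a minimal sketch of a page function. It assumes the function runs in the browser context of each loaded page and receives a `context` object exposing `request` and `log`. The `h1` selector and the output field names (`url`, `title`, `headings`) are illustrative assumptions, not part of this listing.

```javascript
// A sketch of a page function, assuming it is executed inside the page,
// so the global `document` refers to the page currently being crawled.
async function pageFunction(context) {
    const { request, log } = context;

    // Collect the page title and the trimmed text of every <h1> element.
    // The 'h1' selector is only an example; adjust it to the target site.
    const title = document.title;
    const headings = Array.from(
        document.querySelectorAll('h1'),
        (el) => el.textContent.trim(),
    );

    log.info(`Extracted ${headings.length} heading(s) from ${request.url}`);

    // The returned object is the structured record for this page.
    return { url: request.url, title, headings };
}
```

Recursive crawling and URL lists are configured through the Actor input rather than in this function, so the sketch covers only the extraction step.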

Rating: 4.5 (22)

Pricing: Pay per usage

689

Total users: 82.2k

Monthly users: 4k

Runs succeeded: >99%

Issue response: 32 days

Last modified: 16 days ago

Dockerfile

# First, specify the base Docker image. You can read more about
# the available images at https://sdk.apify.com/docs/guides/docker-images
# You can also use any other image from Docker Hub.
FROM apify/actor-node:16
# Second, copy just package.json and package-lock.json since those are the only
# files that affect "npm install" in the next step, to speed up the build.
COPY package*.json ./
# Install NPM packages, skip optional and development dependencies to
# keep the image small. Avoid logging too much and print the dependency
# tree for debugging
RUN npm --quiet set progress=false \
    && npm install --only=prod --no-optional \
    && echo "Installed NPM packages:" \
    && (npm list || true) \
    && echo "Node.js version:" \
    && node --version \
    && echo "NPM version:" \
    && npm --version
# Next, copy the remaining files and directories with the source code.
# Since we do this after NPM install, quick build will be really fast
# for most source file changes.
COPY . ./
# Optionally, specify how to launch the source code of your actor.
# By default, Apify's base Docker images define the CMD instruction
# that runs the Node.js source code using the command specified
# in the "scripts.start" section of the package.json file.
# In short, the instruction looks something like this:
#
# CMD npm start

main.js

// This is the main Node.js source code file of your actor.
// It is referenced from the "scripts" section of the package.json file.

const Apify = require('apify');

Apify.main(async () => {
    // Get input of the actor.
    // If you'd like to have your input checked and generate a user
    // interface for it, add INPUT_SCHEMA.json file to your actor.
    // For more information, see https://docs.apify.com/actors/development/input-schema
    const input = await Apify.getInput();
    console.log('Input:');
    console.dir(input);

    // Do something useful here...

    // Save output
    const output = {
        receivedInput: input,
        message: 'Hello sir!',
    };
    console.log('Output:');
    console.dir(output);
    await Apify.setValue('OUTPUT', output);
});
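The flow in main.js (read the input, build an output object, store it under the `OUTPUT` key) can be tried locally without the `apify` package by passing the storage in as an argument. The following is only a sketch under that assumption: `createMemoryStorage` and `run` are hypothetical names introduced here, and the in-memory object merely mimics the two calls main.js uses.

```javascript
// Hypothetical in-memory stand-in for the two storage calls used by main.js.
function createMemoryStorage(input) {
    const values = new Map();
    return {
        getInput: async () => input,
        setValue: async (key, value) => { values.set(key, value); },
        values, // exposed so the stored record can be inspected afterwards
    };
}

// Same steps as main.js, with the storage injected instead of required.
async function run(storage) {
    const input = await storage.getInput();

    const output = {
        receivedInput: input,
        message: 'Hello sir!',
    };

    await storage.setValue('OUTPUT', output);
    return output;
}
```

Calling `run(createMemoryStorage({ url: 'https://example.com' }))` resolves to the output object and leaves the same object stored under `OUTPUT` in the fake storage.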

package.json

{
    "name": "my-actor",
    "version": "0.0.1",
    "dependencies": {
        "apify": "^2.2.2"
    },
    "scripts": {
        "start": "node main.js"
    },
    "author": "Me!"
}