
Web Scraper
Crawls arbitrary websites using a web browser and extracts structured data from web pages using a provided JavaScript function. The Actor supports both recursive crawling and lists of URLs, and automatically manages concurrency for maximum performance.
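As a rough illustration of the "provided JavaScript function" mentioned above, the sketch below shows what such a page function might look like. It assumes the function receives a single `context` object exposing `request.url` and a jQuery handle as `context.jQuery`. The selectors, the returned field names, and the `makeFakeJQuery` stub are hypothetical and exist only so the snippet runs with plain Node.js outside the Actor.

```javascript
// Hypothetical page function: receives a context object and returns one
// structured record per page. The `context.jQuery` and `context.request`
// properties are assumptions about what the Actor passes in.
async function pageFunction(context) {
    const $ = context.jQuery;
    return {
        url: context.request.url,
        title: $('title').text().trim(),
        heading: $('h1').first().text().trim(),
    };
}

// Minimal stand-in for jQuery so the sketch can run locally.
// It only supports the calls used above: $(selector).first().text().
function makeFakeJQuery(textBySelector) {
    return (selector) => {
        const wrapper = {
            first: () => wrapper,
            text: () => textBySelector[selector] || '',
        };
        return wrapper;
    };
}

// Example context with made-up page content.
const exampleContext = {
    request: { url: 'https://example.com/' },
    jQuery: makeFakeJQuery({ title: ' Example Domain ', h1: 'Example Domain' }),
};

pageFunction(exampleContext).then((record) => console.log(record));
```

In a real run, the returned object would be the structured record stored for that page; here it is only printed.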
4.5 (22)
Pricing: Pay per usage
689
Total users: 82.2k
Monthly users: 4k
Runs succeeded: >99%
Issue response: 32 days
Last modified: 16 days ago
Dockerfile
# First, specify the base Docker image. You can read more about
# the available images at https://sdk.apify.com/docs/guides/docker-images
# You can also use any other image from Docker Hub.
FROM apify/actor-node:16

# Second, copy just package.json and package-lock.json since those are the only
# files that affect "npm install" in the next step, to speed up the build.
COPY package*.json ./

# Install NPM packages, skip optional and development dependencies to
# keep the image small. Avoid logging too much and print the dependency
# tree for debugging
RUN npm --quiet set progress=false \
 && npm install --only=prod --no-optional \
 && echo "Installed NPM packages:" \
 && (npm list || true) \
 && echo "Node.js version:" \
 && node --version \
 && echo "NPM version:" \
 && npm --version

# Next, copy the remaining files and directories with the source code.
# Since we do this after NPM install, quick build will be really fast
# for most source file changes.
COPY . ./

# Optionally, specify how to launch the source code of your actor.
# By default, Apify's base Docker images define the CMD instruction
# that runs the Node.js source code using the command specified
# in the "scripts.start" section of the package.json file.
# In short, the instruction looks something like this:
#
# CMD npm start
main.js
// This is the main Node.js source code file of your actor.
// It is referenced from the "scripts" section of the package.json file.

const Apify = require('apify');

Apify.main(async () => {
    // Get input of the actor.
    // If you'd like to have your input checked and generate a user
    // interface for it, add INPUT_SCHEMA.json file to your actor.
    // For more information, see https://docs.apify.com/actors/development/input-schema
    const input = await Apify.getInput();
    console.log('Input:');
    console.dir(input);

    // Do something useful here...

    // Save output
    const output = {
        receivedInput: input,
        message: 'Hello sir!',
    };
    console.log('Output:');
    console.dir(output);
    await Apify.setValue('OUTPUT', output);
});
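One thing the "Do something useful here..." placeholder in main.js could do is normalize the input before using it. The sketch below is only an illustration: it assumes the input may be missing or malformed, and the `normalizeInput` helper, the `startUrls` and `maxPages` fields, and the default of 10 are all hypothetical names and values, not part of the template.

```javascript
// Hypothetical helper: turns whatever the actor received as input into a
// predictable shape, falling back to defaults for missing or invalid fields.
function normalizeInput(input) {
    // Treat null, primitives and arrays as "no input".
    const isPlainObject = input !== null && typeof input === 'object' && !Array.isArray(input);
    const safe = isPlainObject ? input : {};

    // Keep only non-empty string URLs.
    const startUrls = Array.isArray(safe.startUrls) ? safe.startUrls : [];
    const cleanedUrls = startUrls.filter((url) => typeof url === 'string' && url.trim() !== '');

    // Accept only positive integers for maxPages, otherwise use the default.
    const maxPages = Number.isInteger(safe.maxPages) && safe.maxPages > 0 ? safe.maxPages : 10;

    return { startUrls: cleanedUrls, maxPages };
}

console.log(normalizeInput(null));
console.log(normalizeInput({ startUrls: ['https://example.com', ''], maxPages: 3 }));
```

Inside `Apify.main`, such a helper could be called right after `Apify.getInput()` so the rest of the code never has to check for a missing input.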
package.json
{
    "name": "my-actor",
    "version": "0.0.1",
    "dependencies": {
        "apify": "^2.2.2"
    },
    "scripts": {
        "start": "node main.js"
    },
    "author": "Me!"
}