
Example Selenium

apify/example-selenium

Example of loading a web page in headless Chrome using Selenium WebDriver.


Author: Apify

Dockerfile

FROM apify/actor-node-chrome:beta

COPY package.json ./

# Install NPM packages, skipping optional and development dependencies to
# keep the image small. Keep logging to a minimum and print the dependency
# tree for debugging. The parentheses limit `|| true` to `npm list`, so a
# failed `npm install` still fails the build.
RUN npm --quiet set progress=false \
 && npm install --only=prod --no-optional \
 && echo "Installed NPM packages:" \
 && (npm list || true) \
 && echo "Node.js version:" \
 && node --version \
 && echo "NPM version:" \
 && npm --version \
 && echo "Google Chrome version:" \
 && bash -c "$APIFY_CHROME_EXECUTABLE_PATH --version" \
 && echo "ChromeDriver version:" \
 && chromedriver --version

# Next, copy the remaining files and directories with the source code.
# Because this happens after the NPM install, Docker can reuse the cached
# dependency layer, so rebuilds after most source-file changes are fast.
COPY . ./

# Optionally, specify how to launch the source code of your actor.
# By default, Apify's base Docker images define the CMD instruction
# that runs the source code using the command specified
# in the "scripts.start" section of the package.json file.
# In short, the instruction looks something like this:
# CMD npm start
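The Dockerfile only prints the Chrome and ChromeDriver versions at build time. If you want the actor to fail fast on a mismatch, a minimal Node.js sketch like the one below could compare the two major versions at startup. It assumes the `--version` output formats shown in the comments, that `APIFY_CHROME_EXECUTABLE_PATH` is set as the Dockerfile above expects, and that `chromedriver` is on the PATH; `majorVersion`, `versionsMatch`, and `checkInstalledVersions` are hypothetical helper names.

```javascript
const { execFileSync } = require('child_process');

// Extract the major version from output such as
// "Google Chrome 77.0.3865.90" or "ChromeDriver 77.0.3865.40 (...)".
// Returns null if no four-part version number is found.
function majorVersion(versionOutput) {
    const match = /(\d+)\.\d+\.\d+\.\d+/.exec(versionOutput);
    return match ? Number(match[1]) : null;
}

// True only when both outputs contain a version and the major numbers agree.
function versionsMatch(chromeOutput, driverOutput) {
    const chromeMajor = majorVersion(chromeOutput);
    const driverMajor = majorVersion(driverOutput);
    return chromeMajor !== null && chromeMajor === driverMajor;
}

// Runs both binaries with --version, as the Dockerfile does at build time,
// and throws if the major versions differ. Assumes APIFY_CHROME_EXECUTABLE_PATH
// is set and chromedriver is on the PATH.
function checkInstalledVersions() {
    const chromePath = process.env.APIFY_CHROME_EXECUTABLE_PATH;
    if (!chromePath) {
        throw new Error('APIFY_CHROME_EXECUTABLE_PATH is not set');
    }
    const chromeOutput = execFileSync(chromePath, ['--version']).toString();
    const driverOutput = execFileSync('chromedriver', ['--version']).toString();
    if (!versionsMatch(chromeOutput, driverOutput)) {
        throw new Error(`Version mismatch: ${chromeOutput.trim()} vs ${driverOutput.trim()}`);
    }
    return { chromeOutput, driverOutput };
}
```

Comparing only major versions mirrors the check ChromeDriver itself performs when it refuses a session with "This version of ChromeDriver only supports Chrome version N"; it is a coarse sanity check, not a full compatibility guarantee.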

INPUT_SCHEMA.json

{
    "title": "Input schema for Selenium example",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "url": {
            "title": "URL",
            "type": "string",
            "description": "URL to open with Selenium WebDriver",
            "editor": "textfield",
            "prefill": "https://www.example.com"
        },
        "userAgent": {
            "title": "User agent",
            "type": "string",
            "description": "Optional User-Agent string for Chrome to use",
            "editor": "textfield",
            "prefill": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36"
        },
        "proxy": {
            "title": "Proxy configuration",
            "type": "object",
            "description": "Select the proxies that Chrome should use.",
            "prefill": { "useApifyProxy": true },
            "editor": "proxy"
        }
    },
    "required": ["url"]
}
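The schema makes only `url` required. A hypothetical `inputToLaunchOptions` helper could validate the input before any browser starts and map it onto the options that `launchChromeWebdriver()` in main.js reads (`headless`, `userAgent`, `proxyUrl`), plus the validated URL. This is a sketch under assumptions: the schema has no headless field, so it always sets `headless: true`, and turning the `proxy` object into a concrete proxy URL is left to the caller, who passes it in as the second argument.

```javascript
// Hypothetical helper: validates input shaped like INPUT_SCHEMA.json and
// returns launch options. proxyUrl must be resolved by the caller.
function inputToLaunchOptions(input, proxyUrl = null) {
    if (!input || typeof input.url !== 'string' || input.url.trim() === '') {
        throw new Error('Input field "url" is required and must be a non-empty string');
    }
    // The WHATWG URL constructor throws a TypeError for non-absolute URLs
    // and normalizes the rest (e.g. adds a trailing "/" to a bare host).
    const url = new URL(input.url).href;

    if (input.userAgent !== undefined && typeof input.userAgent !== 'string') {
        throw new Error('Input field "userAgent" must be a string');
    }

    return {
        url,
        headless: true, // assumption: the schema has no headless field
        userAgent: input.userAgent || null,
        proxyUrl,
    };
}
```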

README.md

# Selenium-controlled Chrome example

This actor shows how to use Selenium with Chrome on Apify.

To use Selenium with Chrome in an actor, you need a Docker image that has Chrome
and a matching version of ChromeDriver installed on the global PATH. You can use our
base images [apify/actor-node-chrome](https://github.com/apifytech/apify-actor-docker/blob/master/node-chrome/Dockerfile)
or [apify/actor-node-chrome-xvfb](https://github.com/apifytech/apify-actor-docker/blob/master/node-chrome-xvfb/Dockerfile) (for non-headless mode),
or build your own custom image.
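The choice between the two base images comes down to whether Chrome runs headless. A hypothetical helper like the one below mirrors the argument handling in main.js and fails early when non-headless mode has no X display to draw on. It assumes, as is standard for X clients, that the Xvfb display is exposed through the `DISPLAY` environment variable; `chromeArgumentsFor` is an illustrative name, not part of any library.

```javascript
// Hypothetical helper: builds Chrome command-line arguments like main.js does.
// `env` defaults to process.env and is a parameter so it can be overridden.
function chromeArgumentsFor({ headless, userAgent } = {}, env = process.env) {
    const args = ['--disable-translate', '--safebrowsing-disable-auto-update'];
    if (headless) {
        // Same flags main.js adds in headless mode.
        args.push('--headless', '--no-sandbox');
    } else if (!env.DISPLAY) {
        // Without headless mode Chrome needs an X server, e.g. Xvfb.
        throw new Error('Non-headless Chrome needs an X display (DISPLAY is not set); use an Xvfb-based image');
    }
    if (userAgent) {
        args.push(`--user-agent=${userAgent}`);
    }
    return args;
}
```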

main.js

This file is 110 lines long; only the first 50 lines are shown.

const Apify = require('apify');
const { Capabilities, Builder, logging } = require('selenium-webdriver');
const chrome = require('selenium-webdriver/chrome');
const proxy = require('selenium-webdriver/proxy');
const { anonymizeProxy } = require('proxy-chain');

const launchChromeWebdriver = async (options) => {
    let anonymizedProxyUrl = null;

    // logging.installConsoleHandler();
    // logging.getLogger('webdriver.http').setLevel(logging.Level.ALL);

    // See https://github.com/SeleniumHQ/selenium/wiki/DesiredCapabilities for reference.
    const capabilities = new Capabilities();
    capabilities.set('browserName', 'chrome');

    // Chrome-specific options.
    // ChromeDriver, which Selenium uses to control Chrome, already launches the
    // browser with a list of command-line switches for automation; here we add
    // a few more (inspired by Lighthouse, see lighthouse/lighthouse-cli/chrome-launcher).
    const chromeOptions = new chrome.Options();
    chromeOptions.addArguments('--disable-translate');
    chromeOptions.addArguments('--safebrowsing-disable-auto-update');

    if (options.headless) {
        chromeOptions.addArguments('--headless', '--no-sandbox');
    }

    if (options.userAgent) {
        chromeOptions.addArguments(`--user-agent=${options.userAgent}`);
    }

    if (options.extraChromeArguments) {
        chromeOptions.addArguments(options.extraChromeArguments);
    }

    const builder = new Builder();

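    // Chrome's --proxy-server switch does not accept credentials, so for proxies
    // that require authentication, anonymizeProxy() from proxy-chain starts a local
    // proxy server without authentication that forwards to the upstream proxy.
    // NOTE: To view the effective proxy settings in Chrome, open chrome://net-internals/#proxy
    if (options.proxyUrl) {
        // Assign to the outer variable (no `const`) so the URL stays available
        // outside this block, e.g. for closing the local proxy later.
        anonymizedProxyUrl = await anonymizeProxy(options.proxyUrl);
        chromeOptions.addArguments(`--proxy-server=${anonymizedProxyUrl}`);
    }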

    const webDriver = builder
        .setChromeOptions(chromeOptions)
        .withCapabilities(capabilities)
        .build();
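`anonymizeProxy()` leaves a local proxy server running, and the WebDriver session keeps Chrome running, until each is explicitly closed. A minimal sketch of a cleanup wrapper follows. It assumes, hypothetically, that the launcher resolves to `{ webDriver, anonymizedProxyUrl }`, because the part of main.js that returns from `launchChromeWebdriver()` is not shown. The cleanup function is passed in as a parameter; in the actor, `closeProxy` would be proxy-chain's `closeAnonymizedProxy`, whose second argument controls whether open connections are closed too.

```javascript
// Hypothetical helper: runs `work` with a WebDriver and always cleans up,
// even when `work` throws.
async function withWebDriver(launch, work, closeProxy) {
    const { webDriver, anonymizedProxyUrl } = await launch();
    try {
        return await work(webDriver);
    } finally {
        // quit() ends the WebDriver session and closes the browser.
        await webDriver.quit();
        // Stop the local proxy started by anonymizeProxy(), if one was started;
        // `true` asks it to close open connections as well.
        if (anonymizedProxyUrl) {
            await closeProxy(anonymizedProxyUrl, true);
        }
    }
}
```

Taking the launcher and `closeProxy` as parameters keeps the wrapper free of hard dependencies, so it can be exercised with fake objects. In the actor it might be called as `withWebDriver(() => launchChromeWebdriver(options), (driver) => driver.get(url), closeAnonymizedProxy)`, again assuming the return shape above.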

package.json

{
    "name": "selenium-chrome-example",
    "version": "0.0.1",
    "dependencies": {
        "apify": "^0.16.0",
        "proxy-chain": "^0.3.2",
        "selenium-webdriver": "^4.0.0-alpha.5"
    },
    "scripts": {
        "start": "node main.js"
    },
    "author": "Apify"
}