
Screenshot Taker

jancurn/screenshot-taker

Takes a screenshot of one or more web pages using the Chrome browser. The actor lets you set a custom viewport size, page load timeout, delay, proxies, and output image format.


Author: Jan Čurn
  • Users: 149
  • Runs: 62,994

Dockerfile

# First, specify the base Docker image (Node.js 16 with Puppeteer and Chrome preinstalled)
FROM apify/actor-node-puppeteer-chrome:16

# Second, copy just package.json since it should be the only file
# that affects NPM install in the next step
COPY package.json ./

# Install NPM packages, skip optional and development dependencies to
# keep the image small. Avoid logging too much and print the dependency
# tree for debugging
RUN npm --quiet set progress=false \
 && npm install --only=prod --no-optional \
 && echo "Installed NPM packages:" \
 && (npm list || true) \
 && echo "Node.js version:" \
 && node --version \
 && echo "NPM version:" \
 && npm --version

# Next, copy the remaining files and directories with the source code.
# Since this happens after NPM install, Docker can reuse the cached
# install layer, so rebuilds are fast for most source file changes.
COPY . ./

INPUT_SCHEMA.json

(Excerpt: first 50 of 103 lines.)

{
    "title": "Schema for the actor",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "urls": {
            "title": "Page URLs",
            "type": "array",
            "description": "List of URLs of web pages to take the screenshot of.",
            "prefill": [
                { "url": "https://www.example.com" },
                { "url": "https://sdk.apify.com" }
            ],
            "editor": "requestListSources"
        },
        "pageLoadTimeoutSecs": {
            "title": "Page load timeout",
            "type": "integer",
            "description": "Timeout for the web page load, in seconds. If the web page does not load within this time, the load is considered failed and is retried, just like other page load errors.",
            "minimum": 1,
            "maximum": 180,
            "default": 60,
            "unit": "seconds"
        },
        "pageMaxRetryCount": {
            "title": "Page retry count",
            "type": "integer",
            "description": "How many times to retry loading the page after an error or timeout.",
            "minimum": 0,
            "maximum": 10,
            "default": 2
        },
        "waitUntil": {
            "title": "Wait until",
            "type": "string",
            "description": "Indicates when to consider the navigation to the page as succeeded. For more details, see the <code>waitUntil</code> parameter of the <a href='https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagegotourl-options' target='_blank' rel='noopener'>Page.goto()</a> function in the Puppeteer documentation.",
            "default": "load",
            "enum": [
                "load",
                "domcontentloaded",
                "networkidle0",
                "networkidle2"
            ],
            "enumTitles": [
                "The load event is fired (load)",
                "The DOMContentLoaded event is fired (domcontentloaded)",
                "There are no more than 0 network connections for at least 500 ms (networkidle0)",
                "There are no more than 2 network connections for at least 500 ms (networkidle2)"
            ],
            "editor": "select"
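
The `pageLoadTimeoutSecs` and `waitUntil` fields correspond to the `timeout` and `waitUntil` options of Puppeteer's `page.goto()`. A minimal sketch of that mapping, assuming a hypothetical helper `buildGotoOptions` that applies the defaults and limits from the schema excerpt above (the actor's own main.js may validate differently or rely on the platform to enforce the schema):

```javascript
// Allowed values and limits copied from the INPUT_SCHEMA.json excerpt.
const WAIT_UNTIL_VALUES = ['load', 'domcontentloaded', 'networkidle0', 'networkidle2'];
const TIMEOUT_MIN_SECS = 1;
const TIMEOUT_MAX_SECS = 180;
const TIMEOUT_DEFAULT_SECS = 60;

/**
 * Hypothetical helper: turns actor input into the options object
 * passed as the second argument of Puppeteer's page.goto(url, options).
 */
function buildGotoOptions(input = {}) {
    let secs = Number.isInteger(input.pageLoadTimeoutSecs)
        ? input.pageLoadTimeoutSecs
        : TIMEOUT_DEFAULT_SECS;
    // Clamp into the range the schema allows (1 to 180 seconds).
    secs = Math.min(TIMEOUT_MAX_SECS, Math.max(TIMEOUT_MIN_SECS, secs));

    // Fall back to the schema default when the value is missing or unknown.
    const waitUntil = WAIT_UNTIL_VALUES.includes(input.waitUntil) ? input.waitUntil : 'load';

    // Puppeteer expects the timeout in milliseconds.
    return { timeout: secs * 1000, waitUntil };
}
```

For example, `buildGotoOptions({ pageLoadTimeoutSecs: 30, waitUntil: 'networkidle2' })` yields `{ timeout: 30000, waitUntil: 'networkidle2' }`, while an empty input falls back to 60 seconds and `'load'`.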

README.md

# Screenshot taker

This Apify actor takes a screenshot of one or more
web pages using the Chrome browser.
The actor lets you set a custom viewport size,
page load timeout, delay, proxies, and output image format.

## Results

The screenshots are stored in the default key-value store
associated with the actor run. In addition, for each input web page,
the default dataset contains a record such as:

```
{
  "request": {
    "url": "https://www.example.com",
    "method": "GET",
    "payload": null,
    "userData": {}
  },
  "response": {
    "status": 200,
    "headers": {
      "status": "200",
      "content-encoding": "gzip",
      "cache-control": "max-age=604800",
      "content-type": "text/html; charset=UTF-8",
      "content-length": "606"
    }
  },
  "finishedAt": "2019-07-14T16:16:56.230Z",
  "screenshotUrl": "https://api.apify.com/v2/key-value-stores/x2xiRLsycdTpFQFSo/records/screenshot-2c730012.jpeg"
}
```
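
Each `screenshotUrl` points at the Apify API endpoint for a single record in the run's key-value store. The sketch below shows one way to produce such a record key and URL. The hashing scheme in the hypothetical `screenshotKey` helper is an assumption (the actual key derivation is in the part of main.js not shown here); hashing the page URL keeps the key short and limited to the characters that key-value store keys allow, since raw URLs contain `/` and `:`.

```javascript
const crypto = require('crypto');

// Apify key-value store keys may only contain a-zA-Z0-9!-_.'() (max 256 chars).
const KEY_REGEX = /^[a-zA-Z0-9!\-_.'()]{1,256}$/;

// Hypothetical: derive a stable record key from the page URL by hashing it.
function screenshotKey(pageUrl, format = 'jpeg') {
    const hash = crypto.createHash('sha256').update(pageUrl).digest('hex').slice(0, 8);
    const key = `screenshot-${hash}.${format}`;
    if (!KEY_REGEX.test(key)) throw new Error(`Invalid key: ${key}`);
    return key;
}

// Builds the API URL of a record, following the pattern of screenshotUrl above.
function recordUrl(storeId, key) {
    return `https://api.apify.com/v2/key-value-stores/${encodeURIComponent(storeId)}/records/${encodeURIComponent(key)}`;
}
```

Because the key depends only on the page URL, taking a screenshot of the same page twice in one run overwrites the earlier record instead of creating a duplicate.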

If an error occurs while loading or processing a web page,
the page is retried up to `pageMaxRetryCount` times (see the input schema).
If the error persists,
the resulting dataset contains a record such as the following:

```
{
  "request": {
    "url": "https://non-existing-page.net",
    "method": "GET",
    "payload": null,
    "userData": {}
  },
  "response": null,
  "finishedAt": "2019-07-14T16:24:41.257Z",
  "errorMessages": [
    "Error: net::ERR_NAME_NOT_RESOLVED at https://non-existing-page.net\n    at navigate ...",
    "Error: net::ERR_NAME_NOT_RESOLVED at https://non-existing-page.net\n    at navigate ...",
    "Error: net::ERR_NAME_NOT_RESOLVED at https://non-existing-page.net\n    at navigate ..."
  ]
}
```
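
When post-processing a run, the two record shapes above can be told apart by the presence of `screenshotUrl` or `errorMessages`. A minimal sketch, assuming the dataset items have already been loaded into an array (for example, downloaded as JSON); `summarizeResults` is a hypothetical helper, and the attempt count relies on each failed attempt adding one error message, as the example above suggests (three messages with the default `pageMaxRetryCount` of 2):

```javascript
/**
 * Hypothetical helper: splits dataset records (shaped like the examples above)
 * into successful screenshots and failed pages.
 */
function summarizeResults(items) {
    const succeeded = [];
    const failed = [];
    for (const item of items) {
        if (item.screenshotUrl) {
            succeeded.push({ url: item.request.url, screenshotUrl: item.screenshotUrl });
        } else {
            const errors = item.errorMessages || [];
            failed.push({
                url: item.request.url,
                // One error message per failed attempt (initial load + retries).
                attempts: errors.length,
                // Keep only the first line of the last error, without the stack trace.
                lastError: errors.length ? errors[errors.length - 1].split('\n')[0] : null,
            });
        }
    }
    return { succeeded, failed };
}
```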

main.js

(Excerpt: first 50 of 106 lines.)

const _ = require('underscore');
const Apify = require('apify');

Apify.main(async () => {
    const input = await Apify.getInput();

    const requestList = await Apify.openRequestList('my-urls', input.urls);
    const keyValueStore = await Apify.openKeyValueStore();

    // May resolve to undefined if the proxy configuration cannot be initialized.
    const proxyConfiguration = await Apify.createProxyConfiguration();

    const crawler = new Apify.BasicCrawler({
        requestList,
        handleRequestTimeoutSecs: 120,
        maxRequestRetries: input.pageMaxRetryCount,

        handleRequestFunction: async ({ request }) => {

            // BEFORE PAGE IS NAVIGATED TO
            // Launch a new Chrome instance for this request, using a proxy URL
            // when a proxy configuration is available.
            const browser = await Apify.launchPuppeteer({
                proxyUrl: proxyConfiguration ? proxyConfiguration.newUrl() : undefined,
                useChrome: true,
                stealth: true,
            });
            const page = await browser.newPage();

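            if (input.viewportWidth || input.viewportHeight) {
                // page.setViewport() needs both dimensions, so fall back to
                // Puppeteer's default 800x600 for whichever one is missing.
                const width = input.viewportWidth || 800;
                const height = input.viewportHeight || 600;
                log(request, `Setting page viewport to ${width}x${height}`);
                await page.setViewport({ width, height });
            }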

            log(request, 'Loading page');
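            const response = await page.goto(request.url, {
                timeout: input.pageLoadTimeoutSecs * 1000,
                // Honor the "Wait until" input option (Puppeteer's default is 'load').
                waitUntil: input.waitUntil || 'load',
            });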
            request.response = {
                status: response.status(),
                headers: response.headers(),
            };

            // Wait (if requested)
            if (input.delaySecs > 0) {
                await new Promise(resolve => setTimeout(resolve, input.delaySecs * 1000));
            }

            log(request, `Taking screenshot`);

package.json

{
    "name": "my-actor",
    "version": "0.0.1",
    "dependencies": {
        "apify": "^2.0.0",
        "puppeteer": "*",
        "underscore": "^1.13.6"
    },
    "scripts": {
        "start": "node main.js"
    },
    "author": "Me!"
}