Webpage DOM & CSS Analyzer

This Actor is deprecated

This Actor is unavailable because the developer has decided to deprecate it.

jancurn/example-analyze-dom-css

Example showing how to use headless Chromium with Puppeteer to open a web page, fetch the list of DOM nodes on the page, and obtain CSS styling information for each HTML element. The Actor uses the Chrome DevTools Protocol to access the required browser functionality.
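The Actor's main.js rejects any input that is not a JSON object with a "url" field. A slightly stricter check could look like the sketch below. The helper name `parseInput` and the restriction to http/https are assumptions made for illustration and are not part of the original Actor.

```javascript
// Hypothetical helper (not part of the Actor): validates the INPUT object
// before any browser is launched.
function parseInput(input) {
    if (!input || typeof input.url !== 'string') {
        throw new Error('Invalid input, must be a JSON object with the "url" field!');
    }
    // The WHATWG URL constructor throws a TypeError on malformed URLs.
    const url = new URL(input.url);
    // Assumption: only http(s) pages are worth opening in the browser.
    if (url.protocol !== 'http:' && url.protocol !== 'https:') {
        throw new Error(`Unsupported protocol: ${url.protocol}`);
    }
    return { url: url.href };
}

console.log(parseInput({ url: 'https://example.com' }));
// prints { url: 'https://example.com/' }
```

Validating up front makes a bad input fail fast with a clear message instead of surfacing later as a navigation error.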

Dockerfile

# This is a template for a Dockerfile used to run Actors in the Actor system.
# The base image name below is set during the Actor build, based on user settings.
# IMPORTANT: The base image must set a correct working directory, such as /usr/src/app or /home/user
FROM apify/actor-node-puppeteer-chrome:15-10.1.0

# Copy just package.json and package-lock.json first, since they are
# the only files that affect "npm install" in the next step. This speeds up the build.
COPY package*.json ./

# Install NPM packages, skip optional and development dependencies to
# keep the image small. Avoid logging too much and print the dependency
# tree for debugging
RUN npm --quiet set progress=false \
 && npm install --only=prod --no-optional \
 && echo "Installed NPM packages:" \
 && (npm list --all || true) \
 && echo "Node.js version:" \
 && node --version \
 && echo "NPM version:" \
 && npm --version

# Copy source code to container
# Do this in the last step, to have a fast build if only the source code changed
COPY --chown=myuser:myuser . ./

# NOTE: The CMD is already defined by the base image.
# Uncomment this for local node inspector debugging:
# CMD [ "node", "--inspect=0.0.0.0:9229", "main.js" ]

main.js

const Apify = require('apify');

Apify.main(async () => {
    // Read the Actor input; it must be a JSON object with a "url" field.
    const input = await Apify.getValue('INPUT');

    if (!input || !input.url) throw new Error('Invalid input, must be a JSON object with the "url" field!');

    console.log('Launching Puppeteer...');
    const browser = await Apify.launchPuppeteer();

    console.log(`Opening URL: ${input.url}`);
    const page = await browser.newPage();
    await page.goto(input.url);

    // Open a raw Chrome DevTools Protocol session and enable the DOM and CSS domains.
    console.log('Starting a CDP session');
    const client = await page.target().createCDPSession();
    await client.send('DOM.enable');
    await client.send('CSS.enable');

    console.log('Fetching list of DOM nodes');
    const { nodes } = await client.send('DOM.getFlattenedDocument');

    console.log(`Analyzing CSS for each of the ${nodes.length} nodes`);
    for (const node of nodes) {
        // nodeType 1 is ELEMENT_NODE; only elements have matched CSS styles.
        if (node.nodeType === 1) {
            node.matchedStyle = await client.send('CSS.getMatchedStylesForNode', {
                nodeId: node.nodeId,
            });
        }
    }

    // Store the annotated node list as the Actor output, then release the browser.
    await Apify.setValue('OUTPUT', nodes);
    await browser.close();
});
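The OUTPUT record produced above is a flat array of DOM nodes, where each element node carries a `matchedStyle` object returned by `CSS.getMatchedStylesForNode`. As a rough sketch of how such output might be condensed, the hypothetical helper below (`summarizeMatchedStyles` is not part of the Actor) lists the matched selectors per element. The sample array is hand-written to mimic the protocol's shape and is not real Actor output.

```javascript
// Hypothetical helper: condenses the OUTPUT array into one row per element node.
function summarizeMatchedStyles(nodes) {
    return nodes
        // Keep only element nodes (nodeType 1) that were annotated with styles.
        .filter((node) => node.nodeType === 1 && node.matchedStyle)
        .map((node) => {
            // matchedCSSRules is optional in the protocol response.
            const ruleMatches = node.matchedStyle.matchedCSSRules || [];
            return {
                nodeId: node.nodeId,
                tag: node.nodeName.toLowerCase(),
                // Full selector text of every rule that matched this element.
                selectors: ruleMatches.map((match) => match.rule.selectorList.text),
                // Total number of declarations across all matched rules.
                declarations: ruleMatches.reduce(
                    (count, match) => count + match.rule.style.cssProperties.length,
                    0,
                ),
            };
        });
}

// Hand-written sample in the shape of the Actor's OUTPUT (illustrative only).
const sample = [
    { nodeId: 1, nodeType: 9, nodeName: '#document' },
    {
        nodeId: 2,
        nodeType: 1,
        nodeName: 'H1',
        matchedStyle: {
            matchedCSSRules: [{
                rule: {
                    selectorList: { text: 'h1' },
                    style: {
                        cssProperties: [
                            { name: 'font-size', value: '2em' },
                            { name: 'margin', value: '0' },
                        ],
                    },
                },
                matchingSelectors: [0],
            }],
        },
    },
    { nodeId: 3, nodeType: 3, nodeName: '#text' },
];

console.log(summarizeMatchedStyles(sample));
```

A summary like this is much smaller than the raw OUTPUT, which repeats full rule definitions for every element they match.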

package.json

{
    "name": "apify-project",
    "version": "0.0.1",
    "description": "",
    "author": "It's not you it's me",
    "license": "ISC",
    "dependencies": {
        "apify": "1.3.4",
        "puppeteer": "10.1.0"
    },
    "scripts": {
        "start": "node main.js"
    }
}
Developer
Maintained by Community