Puppeteer Scraper

apify/puppeteer-scraper

Crawls websites with headless Chrome and the Puppeteer library, using server-side Node.js code that you provide. This crawler is an alternative to apify/web-scraper that gives you finer control over the process. It supports both recursive crawling and lists of URLs, and it can log in to websites.

Author: Apify Technologies
  • Users: 1,065
  • Runs: 774,065

To run the code examples, you need to have an Apify account. Replace <YOUR_API_TOKEN> in the code with your API token. For a more detailed explanation, please read about running actors via the API in Apify Docs.

const { ApifyClient } = require('apify-client');

// Initialize the ApifyClient with API token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare actor input
const input = {
    "startUrls": [
        {
            "url": "https://apify.com"
        }
    ],
    "pseudoUrls": [
        {
            "purl": "https://apify.com[(/[\\w-]+)?]"
        }
    ],
    "linkSelector": "a",
    "pageFunction": async function pageFunction(context) {
        const { page, request, log } = context;
        const title = await page.title();
        log.info(`URL: ${request.url} TITLE: ${title}`);
        return {
            url: request.url,
            title
        };
    },
    "preNavigationHooks": `// We need to return array of (possibly async) functions here.
        // The functions accept two arguments: the "crawlingContext" object
        // and "gotoOptions".
        [
            async (crawlingContext, gotoOptions) => {
                const { page } = crawlingContext;
                // ...
            },
        ]`,
    "postNavigationHooks": `// We need to return array of (possibly async) functions here.
        // The functions accept a single argument: the "crawlingContext" object.
        [
            async (crawlingContext) => {
                const { page } = crawlingContext;
                // ...
            },
        ]`,
    "proxyConfiguration": {
        "useApifyProxy": false
    },
    "initialCookies": [],
    "waitUntil": [
        "networkidle2"
    ],
    "customData": {}
};

(async () => {
    // Run the actor and wait for it to finish
    const run = await client.actor("apify/puppeteer-scraper").call(input);

    // Fetch and print actor results from the run's dataset (if any)
    console.log('Results from dataset');
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    items.forEach((item) => {
        console.dir(item);
    });
})();
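
The "pseudoUrls" entry in the input above uses a pseudo-URL, which is a URL where the parts enclosed in [] brackets are treated as regular expressions and everything else is matched literally. The following is a minimal sketch of how such a pattern could be turned into a RegExp, to illustrate which links the example would follow. The helper name pseudoUrlToRegExp is hypothetical and is not part of apify-client, and the case-insensitive flag is an assumption; the actor's own matching logic may differ in details.

```javascript
// Hypothetical helper (not part of apify-client): converts a pseudo-URL
// into a RegExp. Text inside the outermost [ ] is copied as regex source,
// everything outside is escaped so it matches literally.
function pseudoUrlToRegExp(purl) {
    let pattern = '^';
    let depth = 0; // current nesting depth of [ ] brackets

    for (const ch of purl.trim()) {
        if (ch === '[' && depth === 0) {
            // Start of a regex section: wrap it in a group
            depth = 1;
            pattern += '(';
        } else if (ch === '[') {
            // Nested bracket, e.g. a character class such as [\w-]
            depth += 1;
            pattern += ch;
        } else if (ch === ']' && depth === 1) {
            // End of the regex section
            depth = 0;
            pattern += ')';
        } else if (ch === ']' && depth > 1) {
            depth -= 1;
            pattern += ch;
        } else if (depth > 0) {
            // Inside a regex section: copy verbatim
            pattern += ch;
        } else {
            // Outside a regex section: escape regex metacharacters
            pattern += ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
        }
    }

    // Assumption: matching is case-insensitive
    return new RegExp(`${pattern}$`, 'i');
}

// The pseudo-URL from the input above
const matcher = pseudoUrlToRegExp('https://apify.com[(/[\\w-]+)?]');

console.log(matcher.test('https://apify.com'));              // true
console.log(matcher.test('https://apify.com/store'));        // true
console.log(matcher.test('https://apify.com/store/actors')); // false
```

Under these assumptions, the example input would enqueue the home page and pages one path segment below it, but not deeper paths or other domains.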