Playwright Scraper

apify/playwright-scraper

Crawls websites with a headless Chromium, Chrome, or Firefox browser and the Playwright library, using server-side Node.js code that you provide. Supports both recursive crawling and a list of URLs, and can log in to websites.
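
Because the Node.js code is handed to the Actor as part of its input, it helps to see what happens to a function when that input is turned into JSON. The following is a minimal sketch, assuming the input is serialized with standard JSON rules before it is sent; the helper name toSource is hypothetical and not part of any Apify library.

```javascript
// Hypothetical helper: convert a page function to its source text so that it
// survives JSON serialization of the Actor input.
function toSource(fn) {
    return fn.toString();
}

async function pageFunction(context) {
    const { page, request } = context;
    return { url: request.url, title: await page.title() };
}

// A function-valued property is silently omitted by JSON.stringify...
const dropped = JSON.stringify({ pageFunction });

// ...whereas its source text is kept as an ordinary string.
const kept = JSON.parse(JSON.stringify({ pageFunction: toSource(pageFunction) }));

console.log(dropped);
console.log(kept.pageFunction.startsWith('async function pageFunction'));
```

Under that assumption, passing the source text (or writing the function inside a template literal) is the safer way to supply pageFunction.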

The code example below shows how to run the Actor and get its results. To run the code, you need an Apify account. Replace <YOUR_API_TOKEN> in the code with your API token, which you can find under Settings > Integrations in Apify Console.
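
Instead of pasting the token into source code, it can be read from the environment. This is only a sketch: the helper resolveToken is hypothetical, and the variable name APIFY_TOKEN is an assumption that you can change to suit your setup.

```javascript
// Hypothetical helper: read the API token from an environment-like object
// instead of hard-coding it. The variable name APIFY_TOKEN is an assumption.
function resolveToken(env = process.env) {
    const token = env.APIFY_TOKEN;
    // Reject a missing value and the unreplaced placeholder alike.
    if (!token || token === '<YOUR_API_TOKEN>') {
        throw new Error('Set APIFY_TOKEN to your Apify API token.');
    }
    return token;
}
```

The result could then be passed as the token option when constructing the client, for example `new ApifyClient({ token: resolveToken() })`.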

import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with API token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://crawlee.dev"
        }
    ],
    "globs": [
        {
            "glob": "https://crawlee.dev/*/*"
        }
    ],
    "pseudoUrls": [],
    "excludes": [
        {
            "glob": "/**/*.{png,jpg,jpeg,pdf}"
        }
    ],
    "linkSelector": "a",
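    // The page function is passed as a string of JavaScript source; a function
    // value would be dropped when the input is serialized to JSON.
    "pageFunction": `async function pageFunction(context) {
        const { page, request, log } = context;
        const title = await page.title();
        log.info(\`URL: \${request.url} TITLE: \${title}\`);
        return {
            url: request.url,
            title
        };
    }`,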
    "proxyConfiguration": {
        "useApifyProxy": true
    },
    "initialCookies": [],
    "launcher": "chromium",
    "waitUntil": "networkidle",
    "preNavigationHooks": `// We need to return array of (possibly async) functions here.
        // The functions accept two arguments: the "crawlingContext" object
        // and "gotoOptions".
        [
            async (crawlingContext, gotoOptions) => {
                const { page } = crawlingContext;
                // ...
            },
        ]`,
    "postNavigationHooks": `// We need to return array of (possibly async) functions here.
        // The functions accept a single argument: the "crawlingContext" object.
        [
            async (crawlingContext) => {
                const { page } = crawlingContext;
                // ...
            },
        ]`,
    "customData": {}
};

(async () => {
    // Run the Actor and wait for it to finish
    const run = await client.actor("apify/playwright-scraper").call(input);

    // Fetch and print Actor results from the run's dataset (if any)
    console.log('Results from dataset');
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    items.forEach((item) => {
        console.dir(item);
    });
})();
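
The sample above reads the results with a single listItems() call. For larger datasets it may be preferable to read the items in pages. The following is a sketch only: fetchAllItems is a hypothetical helper, and it assumes that listItems accepts offset and limit options and resolves to an object with an items array.

```javascript
// Hypothetical helper: collect every item from a dataset by paging through it.
// `dataset` is any object with a listItems({ offset, limit }) method that
// resolves to an object with an `items` array, such as client.dataset(id).
async function fetchAllItems(dataset, pageSize = 1000) {
    const all = [];
    let offset = 0;
    while (true) {
        const { items } = await dataset.listItems({ offset, limit: pageSize });
        all.push(...items);
        // A short (or empty) page means the end of the dataset was reached.
        if (items.length < pageSize) break;
        offset += items.length;
    }
    return all;
}
```

With the client from the sample, it could be called as `await fetchAllItems(client.dataset(run.defaultDatasetId))`.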

Developer
Maintained by Apify

Actor stats
  • 446 users
  • 30.9k runs
  • Modified 5 days ago
