Crawlee + Playwright + Chrome

Web scraper example with Crawlee, Playwright and headless Chrome. Playwright is more modern, user-friendly and harder to block than Puppeteer.

src/main.ts

/**
 * This template is a production-ready boilerplate for developing with `PlaywrightCrawler`.
 * Use this to bootstrap your projects using the most up-to-date code.
 * If you're looking for examples or want to learn more, see README.
 */

// For more information, see https://crawlee.dev
import { PlaywrightCrawler } from '@crawlee/playwright';
// For more information, see https://docs.apify.com/sdk/js
import { Actor } from 'apify';

// This is an ESM project, so relative imports must specify file extensions.
// Read more about this here: https://nodejs.org/docs/latest-v18.x/api/esm.html#mandatory-file-extensions
// Note that we need to use `.js` even inside TS files.
import { router } from './routes.js';

interface Input {
    startUrls: {
        url: string;
        method?: 'GET' | 'HEAD' | 'POST' | 'PUT' | 'DELETE' | 'TRACE' | 'OPTIONS' | 'CONNECT' | 'PATCH';
        headers?: Record<string, string>;
        userData: Record<string, unknown>;
    }[];
    maxRequestsPerCrawl: number;
}

// Initialize the Apify SDK
await Actor.init();

// Structure of input is defined in input_schema.json
const { startUrls = ['https://apify.com'], maxRequestsPerCrawl = 100 } =
    (await Actor.getInput<Input>()) ?? ({} as Input);

// The `checkAccess` flag ensures the proxy credentials are valid, but the check can take a few hundred milliseconds.
// Disable it for short runs if you are sure your proxy configuration is correct.
const proxyConfiguration = await Actor.createProxyConfiguration({ checkAccess: true });

const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    maxRequestsPerCrawl,
    requestHandler: router,
    launchContext: {
        launchOptions: {
            args: [
                '--disable-gpu', // Mitigates the "crashing GPU process" issue in Docker containers
            ],
        },
    },
});

await crawler.run(startUrls);

// Exit successfully
await Actor.exit();
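
The listing above mixes two input shapes: the `Input` interface declares `startUrls` as objects with a `url` field, while the fallback default is an array of plain strings. If you would rather work with a single shape before handing the list to the crawler, one option is to normalize the input first. The following is a minimal, dependency-free sketch under that assumption; `resolveInput`, `StartUrl`, `RawInput` and `ResolvedInput` are hypothetical names and not part of the template or the Apify SDK.

```typescript
// Hypothetical helper, not part of the template: applies defaults and
// converts every start URL into the object form ({ url, userData? }).

interface StartUrl {
    url: string;
    userData?: Record<string, unknown>;
}

interface RawInput {
    // Accept both plain strings and request-like objects.
    startUrls?: (string | StartUrl)[];
    maxRequestsPerCrawl?: number;
}

interface ResolvedInput {
    startUrls: StartUrl[];
    maxRequestsPerCrawl: number;
}

// Same default values as in the listing above.
const DEFAULT_START_URLS: (string | StartUrl)[] = ['https://apify.com'];
const DEFAULT_MAX_REQUESTS = 100;

function resolveInput(raw: RawInput | null | undefined): ResolvedInput {
    const source: RawInput = raw ?? {};
    const startUrls = source.startUrls ?? DEFAULT_START_URLS;
    return {
        // Wrap plain strings so that every entry has the same shape.
        startUrls: startUrls.map((entry) => (typeof entry === 'string' ? { url: entry } : entry)),
        maxRequestsPerCrawl: source.maxRequestsPerCrawl ?? DEFAULT_MAX_REQUESTS,
    };
}
```

In the template, the value returned by `Actor.getInput()` could be passed through such a helper before calling `crawler.run()`.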

PlaywrightCrawler template

This template is a production-ready boilerplate for developing an Actor with PlaywrightCrawler. Use this to bootstrap your projects using the most up-to-date code.
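
The main file passes a `router` imported from `./routes.js` as the crawler's `requestHandler`, but that file is not shown here. To illustrate the idea of label-based routing, below is a toy, dependency-free model: handlers are registered per label, and a request without a matching label falls back to the default handler. This is a sketch for understanding only, not Crawlee's implementation; `createToyRouter`, `ToyRequest` and `ToyContext` are hypothetical names. In a real project you would normally use the router factory that Crawlee provides instead.

```typescript
// Toy model of label-based routing. NOT Crawlee's implementation.

interface ToyRequest {
    url: string;
    label?: string;
}

interface ToyContext {
    request: ToyRequest;
}

type Handler = (context: ToyContext) => Promise<void>;

function createToyRouter() {
    const handlers = new Map<string, Handler>();
    let defaultHandler: Handler | undefined;

    // Pick the handler registered for the label, or fall back to the default one.
    const getHandler = (label?: string): Handler => {
        const handler = (label !== undefined ? handlers.get(label) : undefined) ?? defaultHandler;
        if (!handler) {
            throw new Error(`No handler registered for label "${label ?? '(none)'}"`);
        }
        return handler;
    };

    // The router itself is a function, which is why it can serve as a request handler.
    const route = async (context: ToyContext): Promise<void> => {
        await getHandler(context.request.label)(context);
    };

    return Object.assign(route, {
        addHandler(label: string, handler: Handler): void {
            handlers.set(label, handler);
        },
        addDefaultHandler(handler: Handler): void {
            defaultHandler = handler;
        },
        getHandler,
    });
}
```

With this model, a start page would typically be processed by the default handler, while requests that carry a label such as `detail` would be dispatched to the handler registered under that label.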

We decided to split Apify SDK into two libraries, Crawlee and Apify SDK v3. Crawlee will retain all the crawling and scraping-related tools and will always strive to be the best web scraping library for its community. At the same time, Apify SDK will continue to exist, but keep only the Apify-specific features related to building Actors on the Apify platform. Read the upgrading guide to learn about the changes.

Resources

If you're looking for examples or want to learn more, visit:

Crawlee + Apify Platform guide
Documentation and examples
Node.js tutorials in Academy
Scraping single-page applications with Playwright
How to scale Puppeteer and Playwright
Integration with Zapier, Make, GitHub, Google Drive and other apps
Video guide on getting scraped data using Apify API
A short guide on how to build web scrapers using code templates

Related templates

Crawlee + Cheerio

A scraper example that uses Cheerio to parse HTML. It's fast, but it can't run the website's JavaScript or pass JS anti-scraping challenges.

Starter

One‑Page HTML Scraper with Cheerio

Scrape a single page at a provided URL with Axios and extract data from the page's HTML with Cheerio.

Starter

Crawlee + Puppeteer + Chrome

Example of a Puppeteer and headless Chrome web scraper. Headless browsers render JavaScript and are harder to block, but they're slower than plain HTTP.

Crawlee + Playwright + Camoufox

Web scraper example with Crawlee, Playwright and headless Camoufox. Camoufox is a custom stealthy fork of Firefox. Try this template if you're facing anti-scraping challenges.

Playwright + Chrome Test Runner

Example of using the Playwright Test project to run automated website tests in the cloud and display their results. Usable as an API.

Empty TypeScript project

Empty template with a basic Actor structure built on the Apify SDK, so you can easily add your own functionality.

Starter

Already have a solution in mind?

Sign up for a free Apify account and deploy your code to the platform in just a few minutes! If you want a head start without coding it yourself, browse our Store of existing solutions.
