Crawlee + Playwright + Chrome
Web scraper example with Crawlee, Playwright and headless Chrome. Playwright is more modern, user-friendly and harder to block than Puppeteer.
src/main.js
/**
 * This template is a production ready boilerplate for developing with `PlaywrightCrawler`.
 * Use this to bootstrap your projects using the most up-to-date code.
 * If you're looking for examples or want to learn more, see README.
 */

// For more information, see https://docs.apify.com/sdk/js
import { Actor } from 'apify';
// For more information, see https://crawlee.dev
import { PlaywrightCrawler } from 'crawlee';
// This is an ESM project, and as such, it requires you to specify extensions in your relative imports.
// Read more about this here: https://nodejs.org/docs/latest-v18.x/api/esm.html#mandatory-file-extensions
import { router } from './routes.js';

// Initialize the Apify SDK
await Actor.init();

// Read the Actor input, falling back to a default start URL when none is provided
const {
    startUrls = ['https://crawlee.dev'],
} = await Actor.getInput() ?? {};

const proxyConfiguration = await Actor.createProxyConfiguration();

const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    requestHandler: router,
    launchContext: {
        launchOptions: {
            args: [
                '--disable-gpu', // Mitigates the "crashing GPU process" issue in Docker containers
            ],
        },
    },
});

await crawler.run(startUrls);

// Exit successfully
await Actor.exit();
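The input handling in src/main.js relies on two JavaScript defaults working together: `?? {}` covers a run with no input at all, and the destructuring default covers an input that lacks `startUrls`. The sketch below isolates that logic so it can be tried without the Apify SDK. The helper name `resolveStartUrls` is hypothetical and not part of Apify SDK or Crawlee, and it assumes `Actor.getInput()` resolves to `null` or `undefined` when no input is given.

```javascript
// Hypothetical helper that mirrors the destructuring in src/main.js.
// `resolveStartUrls` is an illustration only, not an Apify SDK or Crawlee API.
function resolveStartUrls(input, fallback = ['https://crawlee.dev']) {
    // `input ?? {}` handles a missing input (null or undefined).
    // The destructuring default handles an input object without `startUrls`.
    const { startUrls = fallback } = input ?? {};
    return startUrls;
}

// No input at all: the fallback is used.
console.log(resolveStartUrls(null));

// Input present but without `startUrls`: the fallback is used.
console.log(resolveStartUrls({}));

// Input with `startUrls`: the provided value wins.
console.log(resolveStartUrls({ startUrls: ['https://example.com'] }));

// A destructuring default only replaces `undefined`,
// so an explicit `null` is passed through unchanged.
console.log(resolveStartUrls({ startUrls: null }));
```

One consequence worth noting is the last case: if an input could carry `startUrls: null`, the template's destructuring would pass `null` on to `crawler.run()`, so validating the input before running may be worthwhile.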
PlaywrightCrawler template
This template is a production-ready boilerplate for developing an Actor with PlaywrightCrawler. Use this to bootstrap your projects using the most up-to-date code.
We decided to split Apify SDK into two libraries, Crawlee and Apify SDK v3. Crawlee will retain all the crawling and scraping-related tools and will always strive to be the best web scraping library for its community. At the same time, Apify SDK will continue to exist, but will keep only the Apify-specific features related to building Actors on the Apify platform. Read the upgrading guide to learn about the changes.
Resources
If you're looking for examples or want to learn more, visit:
- Crawlee + Apify Platform guide
- Documentation and examples
- Node.js tutorials in Academy
- Scraping single-page applications with Playwright
- How to scale Puppeteer and Playwright
- Integration with Zapier, Make, GitHub, Google Drive and other apps
- Video guide on getting data using Apify API
- A short guide on how to create Actors using code templates
Scrape a single page from a provided URL with Axios and extract data from the page's HTML with Cheerio.
A scraper example that uses Cheerio to parse HTML. It's fast, but it can't run the website's JavaScript or pass JS anti-scraping challenges.
Example of a Puppeteer and headless Chrome web scraper. Headless browsers render JavaScript and are harder to block, but they're slower than plain HTTP.
Skeleton project that helps you quickly bootstrap `CheerioCrawler` in JavaScript. It's best for developers who already know Apify SDK and Crawlee.
Example of running Cypress tests and saving their results on the Apify platform. JSON results are saved to Dataset, videos to Key-value store.
Empty template with a basic Actor structure built on Apify SDK, so you can easily add your own functionality.