Bootstrap CheerioCrawler
Skeleton project that helps you quickly bootstrap `CheerioCrawler` in JavaScript. It's best for developers who already know Apify SDK and Crawlee.
src/main.js
import { Actor } from 'apify';
import { CheerioCrawler } from 'crawlee';
// This is an ESM project, so relative imports must include the file extension.
// Read more: https://nodejs.org/docs/latest-v18.x/api/esm.html#mandatory-file-extensions
import { router } from './routes.js';

// Initialize the Actor before using any other Actor functionality.
await Actor.init();

const proxyConfiguration = await Actor.createProxyConfiguration();

const crawler = new CheerioCrawler({
    proxyConfiguration,
    // Every request is handled by the router defined in routes.js.
    requestHandler: router,
});

await crawler.run(['https://example.com']);

// Exit the Actor once the crawl has finished.
await Actor.exit();
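The `router` imported from `./routes.js` is not shown above. In Crawlee, a router is a function that sends each request to the handler registered for the request's label and falls back to a default handler when no label matches. The dependency-free sketch below mimics that dispatch so the idea can be tried without installing anything. `createSketchRouter`, `listHandler`, and `detailHandler` are hypothetical names used only for illustration, and this is not Crawlee's implementation; a real project would typically create the router with Crawlee's `createCheerioRouter()`.

```javascript
// Hypothetical mimic of label-based routing; NOT Crawlee's implementation.
function createSketchRouter() {
    const routes = new Map();
    let defaultHandler = null;

    // Pick the handler for a label, falling back to the default handler.
    const getHandler = (label) => {
        if (label && routes.has(label)) return routes.get(label);
        if (defaultHandler) return defaultHandler;
        throw new Error(`No handler for label "${label}" and no default handler`);
    };

    // The router itself is a function, so it can be passed as `requestHandler`.
    const router = (context) => getHandler(context.request.label)(context);
    router.addHandler = (label, handler) => { routes.set(label, handler); };
    router.addDefaultHandler = (handler) => { defaultHandler = handler; };
    router.getHandler = getHandler;
    return router;
}

const router = createSketchRouter();

// Illustrative handlers; real handlers would extract data from the page.
const listHandler = async ({ request }) => `list page: ${request.url}`;
const detailHandler = async ({ request }) => `detail page: ${request.url}`;

router.addDefaultHandler(listHandler);
router.addHandler('detail', detailHandler);

// A request labeled 'detail' is dispatched to detailHandler.
router({ request: { url: 'https://example.com/item/1', label: 'detail' } })
    .then((result) => console.log(result));
// prints "detail page: https://example.com/item/1"
```

Keeping handlers behind labels means `main.js` stays unchanged as new page types are added; only the routes file grows.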
Bootstrap CheerioCrawler template
This is a project skeleton that helps you bootstrap CheerioCrawler web scraping projects in JavaScript faster. It will always use the most up-to-date configuration and include all the common files. It's made for developers already familiar with the Apify SDK and Crawlee libraries.
If you're looking for examples or want to learn how to use Apify, Apify SDK, or Crawlee, check out the other templates.
Resources
- Video tutorial on building a scraper using CheerioCrawler
- Written tutorial on building a scraper using CheerioCrawler
- How to scrape a dynamic page using Cheerio
- Video guide on getting data using Apify API
- Integration with GitHub, Zapier, Make, Google Drive and others
- A short guide on how to create Actors using code templates