Cheerio Scraper

Developed by

Apify

Crawls websites using raw HTTP requests, parses the HTML with the Cheerio library, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This Actor is a high-performance alternative to apify/web-scraper for websites that do not require JavaScript.

4.7 (10)

Pricing

Pay per usage

177

Total users: 9.1K

Monthly users: 924

Runs succeeded: >99%

Issues response: 12 days

Last modified: 2 months ago

Developer tools

Open source

Add waiting time

Closed

raphael123 opened this issue

Hi there, would it be possible to add a loading time in seconds before crawling the page? There is some content on the site I'm trying to scrape that takes about 5s to load, and it's not picking it up currently.

Thanks!

Jindřich Bär (jindrich.bar)

Hello @raphael123 and thank you for your interest in this Actor!

Unfortunately, the Cheerio Scraper will never load the dynamic content you are talking about, because it cannot execute client-side JavaScript. This is what makes the Cheerio Scraper exceptionally fast, but it also means it can only be used to crawl pages that don't rely on client-side rendering (e.g. pages that fetch extra data via asynchronous requests after the initial page load).

Your best bet is to use one of our other Actors: Web Scraper. It's very similar to the Cheerio Scraper, with one big difference: under the hood, Web Scraper uses an actual instance of Google Chrome, so it can render the webpage as if you had just opened it in your web browser. With that Actor, you can wait for some time in the custom Page function to make sure all the content has been loaded.

Did this answer your question? I'll close this issue now, but feel free to ask any additional questions in case of any problems. Thanks!
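
The waiting approach suggested in the reply above could look roughly like the following. This is a minimal sketch, not the Actor's documented API: it assumes the Page function may be `async`, that it runs inside the rendered page (so `document` is available), and that it receives a `context` object exposing `request.url`. The `sleep` helper, the `WAIT_MS` value, the optional `waitMs` parameter, and the `.late-content` selector are hypothetical placeholders chosen for illustration.

```javascript
// Hypothetical helper: resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Placeholder wait time, matching the ~5 s mentioned in the issue.
const WAIT_MS = 5000;

// Sketch of a Page function for a browser-based scraper.
// Assumption: the scraper calls it with a single `context` argument,
// so `waitMs` falls back to WAIT_MS in a real run.
async function pageFunction(context, waitMs = WAIT_MS) {
    // Give client-side scripts time to fetch and render the extra content.
    await sleep(waitMs);

    // `.late-content` is a made-up selector; replace it with the element
    // that appears late on the target site. Outside a browser there is no
    // `document`, so the lookup is skipped.
    const el = typeof document !== 'undefined'
        ? document.querySelector('.late-content')
        : null;

    return {
        url: context.request.url,
        lateContent: el ? el.textContent.trim() : null,
    };
}
```

A fixed delay is the simplest option but slows down every page by the full wait time. Where the scraper offers a way to wait for a specific element to appear, that is usually the more efficient choice.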

Puppeteer Scraper

apify/puppeteer-scraper

Crawls websites with headless Chrome and the Puppeteer library using provided server-side Node.js code. This crawler is an alternative to apify/web-scraper that gives you finer control over the process. Supports both recursive crawling and lists of URLs. Supports logging in to websites.

Apify

8.4K

5.0

Web Scraper

apify/web-scraper

Crawls arbitrary websites using a web browser and extracts structured data from web pages using a provided JavaScript function. The Actor supports both recursive crawling and lists of URLs, and automatically manages concurrency for maximum performance.

Apify

90K

4.4

BeautifulSoup Scraper

apify/beautifulsoup-scraper

Crawls websites using raw HTTP requests. It parses the HTML with the BeautifulSoup library and extracts data from the pages using Python code. Supports both recursive crawling and lists of URLs. This Actor is a Python alternative to Cheerio Scraper.

Apify

870

4.2

Vanilla JS Scraper

mstephen190/vanilla-js-scraper

Scrape the web using familiar JavaScript methods! Crawls websites using raw HTTP requests, parses the HTML with the JSDOM package, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This Actor is a non-jQuery alternative to Cheerio Scraper.

Matthias Stephens

471

Playwright Scraper

apify/playwright-scraper

Crawls websites with a headless Chromium, Chrome, or Firefox browser and the Playwright library using provided server-side Node.js code. Supports both recursive crawling and lists of URLs. Supports logging in to websites.

Apify

3.6

HTML Scraper pro

scrapingxpert/html-scraper-pro

HTML Scraper Pro extracts the HTML source code and metadata from websites. It retrieves the full HTML content of web pages, the page title, and the HTTP status code. This tool is suited to data extraction, website analysis, and archiving.

scrapingxpert

100

Camoufox Scraper

apify/camoufox-scraper

Crawls websites with the stealthy Camoufox browser and the Playwright library using provided server-side Node.js code. Supports both recursive crawling and lists of URLs. Supports logging in to websites.

Apify

Javascript Library Detail Scraper

cykieffodh/javascript-library-detail-scraper

Javascript Library Detail Scraper

Michael Laflin

JSDOM Scraper

apify/jsdom-scraper

Parses the HTML using the JSDOM library, providing the same DOM API as browsers do (e.g. `window`). It is able to process client-side JavaScript without using a real browser. Performance-wise, it stands somewhere between the Cheerio Scraper and the browser scrapers.

Apify

4.3

Nodejs Runner

martin.forejt/nodejs-runner

This Actor allows you to quickly run arbitrary JavaScript code in a real Node.js environment, making it ideal for testing, debugging, or executing small scripts without setting up a local Node.js instance. The Actor spawns a separate Node.js process to run the provided code and captures the logs.