Pricing

Pay per usage

Cheerio Scraper

Developed by

Apify

Crawls websites using raw HTTP requests, parses the HTML with the Cheerio library, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This actor is a high-performance alternative to apify/web-scraper for websites that do not require JavaScript.
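
As a rough illustration of the extraction step described above, a page function might look like the sketch below. It assumes the context object exposes the Cheerio handle as `$` and the current request as `request`; the returned field names (`url`, `title`, `headings`) are made up for the example.

```javascript
// Sketch of a page function, assuming context.$ is the Cheerio handle for the
// loaded page and context.request describes the current request.
async function pageFunction(context) {
    const { $, request } = context;

    // Collect the trimmed text of every <h1> element on the page.
    const headings = [];
    $('h1').each((index, el) => {
        headings.push($(el).text().trim());
    });

    // The returned object becomes one result record (field names are illustrative).
    return {
        url: request.url,
        title: $('title').text().trim(),
        headings,
    };
}
```

Because the function only reads from `context`, it can be exercised locally with a small stand-in for `$` before running it against real pages.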

4.8 (11)

Issues response

12 days

Last modified

3 months ago

Developer tools

Open source

Change Log

3.0.15 (2024-10-25)

Updated Crawlee version to v3.11.5 and SDK to v3.2.6.
Updated Node.js version to v22.

3.0.14 (2024-04-09)

Updated Crawlee version to v3.8.0.
Updated the scraper to use the new request queue.

3.0.11 (2023-08-22)

Updated Crawlee version to v3.5.2.
Updated Node.js version to v18.
Added new options:
- Exclude Glob Patterns (excludes): Glob patterns to match links in the page that you want to exclude from being enqueued.
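
To make the effect of exclude patterns concrete, here is a simplified sketch of filtering candidate links against a list of globs. The `{ glob }` object shape, the helper names `globToRegExp` and `filterLinks`, and the example URLs are assumptions for illustration; the matcher handles only `*` and `**`, so the scraper's actual glob matching may behave differently for other syntax.

```javascript
// Simplified glob matcher for illustration only: `**` matches any characters,
// `*` matches any characters except `/`. Everything else is matched literally.
function globToRegExp(glob) {
    const escaped = glob.replace(/[.+^${}()|[\]\\?]/g, '\\$&');
    const pattern = escaped
        .replace(/\*\*/g, '\u0000') // temporarily protect `**`
        .replace(/\*/g, '[^/]*')
        .replace(/\u0000/g, '.*');
    return new RegExp(`^${pattern}$`);
}

// Keep only the links that match none of the exclude patterns.
function filterLinks(links, excludes) {
    const matchers = excludes.map(({ glob }) => globToRegExp(glob));
    return links.filter((url) => !matchers.some((re) => re.test(url)));
}

// Hypothetical exclude list: skip images under example.com and any PDF.
const excludes = [
    { glob: 'https://example.com/**/*.png' },
    { glob: '**/*.pdf' },
];

const kept = filterLinks(
    [
        'https://example.com/a',
        'https://example.com/img/logo.png',
        'https://example.com/docs/file.pdf',
    ],
    excludes,
);
console.log(kept);
```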

3.0 (`version-3`)

Rewritten from Apify SDK to Crawlee; see the v3 migration guide for more details.
Proxy usage is now required.
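
A run input that satisfies the proxy requirement might look like the sketch below. The values are illustrative, the `proxyConfiguration` shape shown is an assumption about the simplest setup, and `assertProxyConfigured` is a hypothetical helper, not part of the scraper.

```javascript
// Sketch of a run input with proxy enabled. Values are illustrative.
const input = {
    startUrls: [{ url: 'https://example.com' }],
    // The page function is passed as source text.
    pageFunction: `async function pageFunction(context) {
        return { url: context.request.url };
    }`,
    // Assumed shape: let the platform's proxy handle all requests.
    proxyConfiguration: { useApifyProxy: true },
};

// Hypothetical helper: fail early when the input has no proxy settings.
function assertProxyConfigured(runInput) {
    if (!runInput.proxyConfiguration) {
        throw new Error('proxyConfiguration is required');
    }
    return runInput;
}

assertProxyConfigured(input);
```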

2.0 (`version-2`)

The main difference between v1 and v2 of the scrapers is the upgrade of the SDK to v2, which requires Node.js v15.10+. SDK v2 uses HTTP/2 for the requests made by cheerio-scraper, and HTTP/2 support in older Node.js versions was too buggy, so we decided to drop support for them. If you need to run on an older Node.js version, use SDK v1.

Please refer to the SDK 1.0 migration guide for more details about functional changes in the SDK. SDK v2 essentially only changes the required Node.js version and has no other breaking changes.

The deprecated useRequestQueue option has been removed.
- RequestQueue is now always used.
The deprecated context.html getter of cheerio-scraper has been removed.
- Use context.body instead.
The prepareRequestFunction input option is deprecated.
- Use pre/postNavigationHooks instead.
puppeteerPool and autoscaledPool have been removed from the crawlingContext object.
- puppeteerPool was replaced by browserPool.
- autoscaledPool and browserPool are available on the crawler property of the crawlingContext object.
The custom "Key-value store name" option in Advanced configuration is now fixed; previously the default store was always used.
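
The replacements listed above could be used roughly as follows. This is a sketch under assumptions: `body` is treated as either a string or a Buffer, the hook's second argument name (`gotOptions`) and the `x-example-header` header are hypothetical, and the exact hook signature depends on the SDK version.

```javascript
// context.body replaces the removed context.html getter.
async function pageFunction(context) {
    const { body, request, crawler } = context;

    // Normalize the response body to a string (it may arrive as a Buffer).
    const html = Buffer.isBuffer(body) ? body.toString('utf8') : String(body);

    // autoscaledPool is read from the crawler property instead of the context root.
    const pool = crawler && crawler.autoscaledPool;

    return {
        url: request.url,
        htmlLength: html.length,
        hasAutoscaledPool: Boolean(pool),
    };
}

// Pre-navigation hooks stand in for prepareRequestFunction: each hook can
// adjust the request before it is sent.
const preNavigationHooks = [
    async (crawlingContext, gotOptions) => {
        crawlingContext.request.headers = {
            ...crawlingContext.request.headers,
            'x-example-header': 'demo', // hypothetical header for illustration
        };
    },
];
```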

Web Scraper

apify/web-scraper

Crawls arbitrary websites using a web browser and extracts structured data from web pages using a provided JavaScript function. The Actor supports both recursive crawling and lists of URLs, and automatically manages concurrency for maximum performance.

Apify

94K

4.5

Puppeteer Scraper

apify/puppeteer-scraper

Crawls websites with headless Chrome and the Puppeteer library using provided server-side Node.js code. This crawler is an alternative to apify/web-scraper that gives you finer control over the process. Supports both recursive crawling and lists of URLs. Supports logging in to websites.

Apify

5.0

Playwright Scraper

apify/playwright-scraper

Crawls websites with a headless Chromium, Chrome, or Firefox browser and the Playwright library using provided server-side Node.js code. Supports both recursive crawling and lists of URLs. Supports logging in to websites.

Apify

2.4K

4.7

Metadata Extractor

jancurn/extract-metadata

A small, efficient actor that loads a web page, parses its HTML using the Cheerio library, and extracts metadata from the <HEAD> tag, such as the page title, description, and author.

Jan Čurn

1.3K

HTML Scraper pro

scrapingxpert/html-scraper-pro

The HTML Scraper Pro is a powerful tool designed to extract the HTML source code and metadata from websites. It uses advanced web scraping techniques to retrieve the full HTML content of web pages, the page title, and the HTTP status code. This tool is ideal for data extraction, website analysis, and archiving.

scrapingxpert

118

Fast Website Content Crawler

6sigmag/fast-website-content-crawler

A high-performance web scraper that rapidly extracts and analyzes content from multiple websites simultaneously. Perfect for competitive research, content aggregation, and website structure analysis.

David Deng

1.7K

4.7

BeautifulSoup Scraper

apify/beautifulsoup-scraper

Crawls websites using raw HTTP requests. It parses the HTML with the BeautifulSoup library and extracts data from the pages using Python code. Supports both recursive crawling and lists of URLs. This Actor is a Python alternative to Cheerio Scraper.

Apify

886

4.2

Website Content Crawler

apify/website-content-crawler

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

Apify

70K

4.4

Vanilla JS Scraper

mstephen190/vanilla-js-scraper

Scrape the web using familiar JavaScript methods! Crawls websites using raw HTTP requests, parses the HTML with the JSDOM package, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This actor is a non-jQuery alternative to Cheerio Scraper.

Matthias Stephens

475

Enhanced Deep Content Crawler

assertive_analogy/advanced-crawler

A fast, Python-powered web crawler with smart content extraction, JS support, metadata capture, and duplicate detection. Ideal for SEO, content migration, and e-commerce scraping. Reliable, scalable, and easy to customize.