Cheerio Scraper

Pricing: Pay per usage


Developed by Apify

Maintained by Apify

Crawls websites using raw HTTP requests, parses the HTML with the Cheerio library, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This actor is a high-performance alternative to apify/web-scraper for websites that do not require JavaScript.

Rating: 4.7 (10)
Total users: 8.6K
Monthly users: 897
Runs succeeded: >99%
Issues response: 11 days
Last modified: 2 months ago


Exclude certain paths from queue processing

Closed

Storytome_PODs opened this issue a month ago

Hi, we want to exclude certain URL paths, such as /about and /events, from being queued for processing. We tried adding a pre-navigation hook to skip these paths. It does skip them, but the requests still stay in the queue, which makes the scrape take longer. Is it possible to remove them from the queue?

ryanhemmings, 24 days ago

no

jindrich.bar

Hello, and thank you for your interest in this Actor!

You can use the Exclude Glob Patterns input option to prevent URLs that match a pattern from being enqueued. See the attached screenshot for an example: with this setup, the crawler enqueues neither /about nor /blog on the crawlee.dev website.
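A minimal sketch of the idea behind this option, assuming simplified glob semantics (`**` matches across `/`, while `*` and `?` do not) rather than the actor's actual matcher: discovered links are tested against the exclude patterns before they are enqueued, so excluded pages never enter the request queue at all. A pre-navigation hook, by contrast, only runs once a request has already been queued and picked up for processing. The helper names `globToRegExp` and `filterExcluded` and the sample URLs are hypothetical. In the actor's input JSON the option is, as far as I know, an `excludes` array of `{ "glob": "..." }` objects; check the input schema before relying on that.

```javascript
// Hypothetical sketch: filter discovered links against exclude globs
// *before* enqueueing, so excluded pages never reach the request queue.
// Simplified glob semantics (an assumption, not the actor's real matcher):
//   `**` matches any sequence of characters, including `/`
//   `*`  matches any sequence of characters except `/`
//   `?`  matches exactly one character except `/`

function globToRegExp(glob) {
  let re = '';
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === '*') {
      if (glob[i + 1] === '*') {
        re += '.*'; // `**` crosses path segments
        i++; // skip the second '*'
      } else {
        re += '[^/]*'; // `*` stays within one path segment
      }
    } else if (ch === '?') {
      re += '[^/]';
    } else {
      // Escape regex metacharacters so they match literally
      re += ch.replace(/[.+^${}()|[\]\\]/g, '\\$&');
    }
  }
  return new RegExp(`^${re}$`); // anchor: the whole URL must match
}

// Return only the URLs that match none of the exclude globs
function filterExcluded(urls, excludeGlobs) {
  const patterns = excludeGlobs.map(globToRegExp);
  return urls.filter((url) => !patterns.some((re) => re.test(url)));
}

// Hypothetical links found on a page
const discovered = [
  'https://crawlee.dev/docs/quick-start',
  'https://crawlee.dev/about',
  'https://crawlee.dev/blog',
  'https://crawlee.dev/blog/some-post',
  'https://crawlee.dev/api/core',
];

// Exclude both the bare paths and everything beneath them
const excludeGlobs = [
  'https://crawlee.dev/about',
  'https://crawlee.dev/about/**',
  'https://crawlee.dev/blog',
  'https://crawlee.dev/blog/**',
];

// Keeps only the /docs/quick-start and /api/core URLs
const toEnqueue = filterExcluded(discovered, excludeGlobs);
console.log(toEnqueue);
```

Note that `https://crawlee.dev/blog/**` on its own would not exclude `/blog` itself in this sketch, which is why both the bare path and the `/**` variant are listed.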

I'll close this issue now, but feel free to ask additional questions if you have any. Cheers!