
Cheerio Scraper
Pricing: Pay per usage

Crawls websites using raw HTTP requests, parses the HTML with the Cheerio library, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This Actor is a high-performance alternative to apify/web-scraper for websites that do not require JavaScript to render their content.
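As a rough illustration of the two crawling modes (a fixed list of URLs vs. recursive link following), here is a minimal sketch of a crawl loop with a deduplicating queue. It is not the Actor's source code: `crawl` and `FAKE_SITE` are hypothetical, and the in-memory link map stands in for fetching each page over HTTP and extracting its links with Cheerio, so the example runs offline without npm packages.

```javascript
// Hypothetical link graph standing in for "fetch page + parse with Cheerio + collect links".
const FAKE_SITE = {
  'https://example.com/': ['https://example.com/a', 'https://example.com/b'],
  'https://example.com/a': ['https://example.com/b', 'https://example.com/'],
  'https://example.com/b': [],
};

// Sketch of a crawl loop: FIFO queue plus a "seen" set so each URL is enqueued once.
function crawl(startUrls, { recursive = true, maxRequests = 100 } = {}) {
  const queue = [...new Set(startUrls)]; // pending requests, deduplicated
  const seen = new Set(queue);           // every URL that has ever been enqueued
  const handled = [];                    // URLs processed, in processing order

  while (queue.length > 0 && handled.length < maxRequests) {
    const url = queue.shift();
    const links = FAKE_SITE[url] ?? []; // unknown URLs simply have no links
    handled.push(url);
    if (!recursive) continue;           // list-of-URLs mode: do not follow links
    for (const link of links) {
      if (!seen.has(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return handled;
}

console.log(crawl(['https://example.com/']));                      // recursive mode
console.log(crawl(['https://example.com/a'], { recursive: false })); // list mode
```

The `seen` set is what keeps recursive crawling from looping forever on pages that link back to each other.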
Exclude certain paths from queue processing
Closed
Hi, we want to exclude certain URL paths, such as /about and /events, from being queued for processing. We tried adding a pre-navigation hook to exclude these paths. It skips them, but they still remain in the queue, which makes the scrape take longer. Is it possible to remove them from the queue?
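A toy model may help explain the longer run time: if excluded URLs are only skipped after they are taken from the queue (roughly what a navigation-time hook does), each one still costs a queue operation, whereas filtering before enqueueing keeps them out entirely. `EXCLUDED_PATHS`, `isExcluded`, `simulate`, and the example URLs are hypothetical, and the model does not reproduce Crawlee's internals.

```javascript
// Hypothetical exclusion list, taken from the paths mentioned in the question.
const EXCLUDED_PATHS = ['/about', '/events'];

// True if the URL's path is an excluded path or lies underneath one.
function isExcluded(url) {
  const { pathname } = new URL(url);
  return EXCLUDED_PATHS.some((p) => pathname === p || pathname.startsWith(p + '/'));
}

// Counts how many requests are dequeued vs. actually fetched.
function simulate(discoveredUrls, filterAtEnqueue) {
  const queue = filterAtEnqueue
    ? discoveredUrls.filter((u) => !isExcluded(u)) // excluded URLs never enter the queue
    : [...discoveredUrls];                          // everything is enqueued
  let dequeued = 0;
  let fetched = 0;
  while (queue.length > 0) {
    const url = queue.shift();
    dequeued += 1;                 // every queued request costs a dequeue
    if (isExcluded(url)) continue; // "skip" at navigation time
    fetched += 1;
  }
  return { dequeued, fetched };
}

const urls = [
  'https://example.com/',
  'https://example.com/about',
  'https://example.com/events/2024',
  'https://example.com/products',
];
console.log(simulate(urls, false)); // { dequeued: 4, fetched: 2 }
console.log(simulate(urls, true));  // { dequeued: 2, fetched: 2 }
```

Both strategies fetch the same pages; only the enqueue-time filter avoids the extra queue traffic.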
ryanhemmings
no
Hello, and thank you for your interest in this Actor!
You can use the Exclude Glob Patterns input option to keep URLs that match a pattern from being enqueued. See the attached screenshot for an example: with that configuration, the crawler enqueues neither /about nor /blog on the crawlee.dev website.
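The effect of exclude globs can be sketched as a filter applied to discovered links before they are enqueued. This is a minimal sketch under stated assumptions: `globToRegExp` and `filterEnqueue` are hypothetical helpers, the hand-written matcher supports only `**`, `*`, and `?` (no braces or character classes), the example URLs are hypothetical, and the Actor's real glob semantics may differ.

```javascript
// Hypothetical glob matcher: `**` = any characters, `*` = any characters except "/",
// `?` = one character except "/". Everything else is matched literally.
function globToRegExp(glob) {
  let source = '';
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === '*' && glob[i + 1] === '*') {
      source += '.*';
      i++; // consume the second '*'
    } else if (ch === '*') {
      source += '[^/]*';
    } else if (ch === '?') {
      source += '[^/]';
    } else {
      source += ch.replace(/[.+^${}()|[\]\\]/g, '\\$&'); // escape regex metacharacters
    }
  }
  return new RegExp(`^${source}$`); // anchored: the whole URL must match
}

// Drop every link that matches at least one exclude glob, before enqueueing.
function filterEnqueue(links, excludeGlobs) {
  const patterns = excludeGlobs.map(globToRegExp);
  return links.filter((url) => !patterns.some((re) => re.test(url)));
}

const excludeGlobs = [
  'https://crawlee.dev/about',
  'https://crawlee.dev/about/**',
  'https://crawlee.dev/blog',
  'https://crawlee.dev/blog/**',
];
const found = [
  'https://crawlee.dev/docs/quick-start',
  'https://crawlee.dev/about',
  'https://crawlee.dev/blog/2023/some-post',
  'https://crawlee.dev/api/core',
];
console.log(filterEnqueue(found, excludeGlobs)); // keeps only the /docs/... and /api/... URLs
```

With this matcher, `.../about/**` does not match `.../about` itself (the trailing slash is required), which is why each path gets two patterns. In Crawlee code, `enqueueLinks()` also accepts an `exclude` option for this purpose; its exact matching rules are documented by Crawlee.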
I'll close this issue now, but feel free to ask additional questions if you have any. Cheers!