Cheerio Scraper
Crawls websites using raw HTTP requests, parses the HTML with the Cheerio library, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This Actor is a high-performance alternative to apify/web-scraper for websites that do not require JavaScript.
Good morning. I found various bugs in the latest Cheerio Scraper and spoke in chat with Tsveta about them. Besides this, the fact that you oblige users to use your proxy on a free account is a deal-breaker that will drive users away, but anyway:
If I use one of my custom HTTP proxies, I always get an error, even though the proxy is working correctly. The proxy returns error 400, so it seems that Apify is not making the proper calls.
Besides this, the interface blocks adding the socks5:// protocol, even though your documentation states that it is supported: https://apify.com/apify/cheerio-scraper#proxy-configuration
Hello, we are aware of this problem and are looking into it now.
As for mandatory proxies, that will stay, but we will soon provide Apify proxies to free users indefinitely.
Another bug I spotted is that globs are not shown once you reload the page after saving, although they are saved correctly. If you can't see them and you don't know how to use the JSON editor, it's tricky for most users.
Thanks, this is a one-time issue since the Glob input type was just changed. It will be fixed ASAP.
You're welcome, regards
SOCKS proxies are not supported; we will fix the docs. As I said, very soon free accounts will be granted free proxies forever.
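For anyone hitting the same input error, here is a minimal sketch of a pre-flight check on a custom proxy URL before pasting it into the Actor input. It assumes, based only on the reply above, that `http:` is the accepted scheme and that `socks5:` is rejected. The helper name `checkProxyUrl` and its `allowedProtocols` parameter are hypothetical and are not part of Apify or Crawlee.

```typescript
// Hypothetical helper for illustration only; not an Apify or Crawlee API.
type ProxyCheck =
  | { ok: true; url: URL }
  | { ok: false; reason: string };

// Assumption: only "http:" is accepted, since the thread confirms that
// custom HTTP proxies are intended to work and SOCKS is not supported.
function checkProxyUrl(
  raw: string,
  allowedProtocols: string[] = ["http:"],
): ProxyCheck {
  let url: URL;
  try {
    // The standard URL parser throws on strings without a valid scheme.
    url = new URL(raw);
  } catch {
    return { ok: false, reason: `not a valid URL: ${raw}` };
  }
  // URL.protocol includes the trailing colon, e.g. "http:" or "socks5:".
  if (!allowedProtocols.includes(url.protocol)) {
    return { ok: false, reason: `unsupported protocol ${url.protocol}` };
  }
  return { ok: true, url };
}
```

Calling `checkProxyUrl("socks5://proxy.example.com:1080")` returns a failed check that names the rejected protocol, while an `http://` URL with credentials passes. This only validates the URL's shape; it does not test whether the proxy itself answers correctly.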
A little bit of a mess over there, sorry to say: features removed, documentation incorrect... I'm a hobbyist, but how can companies build a reliable product on this kind of premise?
Many thanks, so I will look forward to the developments.
You are right. Most devs build new Actors with Crawlee and the Apify SDK directly, so the doc issue escaped us. As for the changes, we are aware that these events were badly executed; it is not the norm.
Actor Metrics
443 monthly users
-
93 stars
>99% runs succeeded
28 days response time
Created in Apr 2019
Modified 2 months ago