Web Scraper

Pricing

Pay per usage

Developed by

Apify

Maintained by Apify

Crawls arbitrary websites using a web browser and extracts structured data from web pages using a provided JavaScript function. The Actor supports both recursive crawling and lists of URLs, and automatically manages concurrency for maximum performance.

4.5 (23)

859

Total users: 88K
Monthly users: 4.5K
Runs succeeded: >99%
Issues response: 8.1 days
Last modified: a month ago

Reclaiming failed request back to the list or queue. requestHandler timed out after 60 seconds.

Closed

Dean Sofer (proloser) opened this issue a year ago

It looks like it's crawling the page correctly, but I can't figure out why this error is occurring, and I'd prefer to preserve my usage.

https://console.apify.com/actors/moJRLRc85AitArpNN/runs/An5339u0xNa1UKP7D#log

jindrich.bar

Hello, and thank you for your interest in this Actor!

The issue you describe seems to appear randomly. It might be related to the asynchronous requests you are making inside the Page Function. Unfortunately, I cannot provide much more help with your custom code, as I don't know what you are trying to achieve. As a quick remedy, you can raise the requestHandler timeout by increasing the value of the Performance and Limits > Page Function timeout input option.
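One way to keep a slow asynchronous call from stalling the Page Function until the requestHandler timeout fires is to bound it with a shorter timeout of its own. The following is a minimal sketch, not the code from the run above: `withTimeout` is a hypothetical helper, and the simulated slow promise stands in for whatever request the Page Function makes.

```javascript
// Hypothetical helper: resolve with `fallback` if `promise` has not settled within `ms`.
function withTimeout(promise, ms, fallback) {
    let timer;
    const timeout = new Promise((resolve) => {
        timer = setTimeout(() => resolve(fallback), ms);
    });
    // Whichever settles first wins; the timer is cleared either way.
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Simulated slow request (an assumption standing in for the real async call).
const slow = new Promise((resolve) => setTimeout(() => resolve('late'), 200));

withTimeout(slow, 50, null).then((value) => console.log(value)); // logs null
```

Inside a Page Function the call could look like `const coords = await withTimeout(geocode(address), 10000, null);`, where `geocode` is a placeholder for the user's own lookup. The page then still returns a result, with empty coordinates, instead of being reclaimed and retried.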

By the way - the website you are scraping seems to be completely server-side rendered (and static, i.e., without client-side JS). This means you can process it with our Cheerio Scraper as well. Cheerio Scraper is much faster than Web Scraper, as it doesn't use a web browser to load the page (it uses a simple HTTP request and an HTML parser instead). I see that most of your custom code uses jQuery - migrating it to Cheerio should be fairly easy, as Cheerio supports a comprehensive subset of jQuery syntax.
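To illustrate how little changes in such a migration, here is a sketch of a Cheerio Scraper Page Function. It assumes the Actor passes the Cheerio handle as `context.$`; the selectors (`article.event`, `h2`, `.address`) are hypothetical and would need to match the actual blog markup.

```javascript
// Sketch of a Cheerio Scraper Page Function; all selectors are hypothetical.
async function pageFunction(context) {
    // In Cheerio Scraper, `$` is a Cheerio instance loaded with the page HTML,
    // so jQuery-style traversal code carries over mostly unchanged.
    const { $, request } = context;
    const events = [];

    $('article.event').each((index, element) => {
        events.push({
            title: $(element).find('h2').text().trim(),
            address: $(element).find('.address').text().trim(),
        });
    });

    return { url: request.url, events };
}
```

The main difference from the Web Scraper version is where `$` comes from: there it is `context.jQuery` running inside the browser page, here it is a server-side parser, so anything that relies on the DOM being live (events, computed styles) has no equivalent.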

Migrating to Cheerio Scraper should give you your results much faster (up to 20x speed improvement) and definitely save you some platform credits as well.

I'll keep this issue open - feel free to ask additional questions if you have any - or close this issue, if you don't. Cheers!

proloser

I am scraping this WordPress blog for events around my city and converting them into entries with scheduling details and lat/long coordinates so they can be displayed on a map. The async request I'm making geocodes each event's address for that purpose.

I will look into parsing the site with Cheerio, but I am guessing I'd then have to use another Actor to geocode the address, as I would not be able to do this in Cheerio, right?
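For reference, a Cheerio Scraper Page Function is an async function running in Node.js, so it can make its own HTTP requests, and geocoding does not have to move to a separate Actor. The sketch below assumes a global `fetch` is available in the Actor's Node.js runtime. The geocoding endpoint, its `{ lat, lon }` response shape, and the `.event-address` selector are all hypothetical placeholders.

```javascript
// Hypothetical endpoint and response shape ({ lat, lon }); replace with a real geocoding service.
const GEOCODER_URL = 'https://geocoder.example.com/search';

// Look up coordinates for an address; `fetchImpl` is injectable so the helper can be tested offline.
async function geocodeAddress(address, fetchImpl = fetch) {
    const url = `${GEOCODER_URL}?q=${encodeURIComponent(address)}`;
    const response = await fetchImpl(url);
    if (!response.ok) return null; // treat HTTP errors as "no coordinates"
    const { lat, lon } = await response.json();
    return { lat: Number(lat), lng: Number(lon) };
}

async function pageFunction(context) {
    const { $, request } = context;
    const address = $('.event-address').first().text().trim(); // hypothetical selector
    const coords = address ? await geocodeAddress(address) : null;
    return { url: request.url, address, coords };
}
```

If the Actor's Page Function input accepts only a single function, the helper and the constant would need to be declared inside `pageFunction` rather than next to it.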

proloser

Hello, I figured out how to do it with Cheerio (their documentation is horrible) and it works great! Thanks.