
Booking Reviews Scraper
Pay $2.00 for 1,000 reviews

A scraper to get reviews from hotels, apartments, and other accommodations listed on the Booking.com portal. Extract data using hotel URLs: review text, ratings, stars, basic reviewer info, length of stay, liked/disliked parts, room info, date of stay, and more. Download in JSON, HTML, Excel, or CSV.
Abnormal execution, many errors and very long execution time.
Today I ran 4 batches of about 100K reviews each. The first 3 runs finished without problems, with excellent execution times of about 20 minutes each. The last run took 1.5 hours and reported several errors in the log. Unfortunately, the number of results returned is slightly lower than expected. Is this kind of behavior predictable? Is it my input or the platform? Does it have to do with billing? (I am working on overage.) I'm trying to figure out if the batches I use are too large, though I wouldn't say so, considering the speed of this scraper. Thanks. Mauro.

Hi, thanks for opening this issue!
Looking at your run, there are a few problems:
- Some of the hotels in fact have no reviews. You can easily find those in the output by the field:
"error": "no_reviews_found"
- Some input URLs are getting redirected to the home page because they are invalid. Most likely those hotels have since been removed from Booking.com. Unfortunately, we are not catching these at the moment, so you won't see them in the output. You will have to filter them out from the logs, which will be painful. It could be automated, though, by selecting lines that contain:
ERROR CustomRequestsCheerioCrawler: Request failed and reached maximum retries. Error: We were redirected to different URL
We will fix the handling and saving of error items for these redirected URLs :)
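As a stopgap until the fix lands, both failure modes above can be filtered with a few lines of Python. This is a minimal sketch assuming the run output was exported as JSON items and the run log saved to a plain text file; the function names and sample data here are purely illustrative, not part of the scraper.

```python
# Marker from the log line the reply mentions for redirected (invalid) URLs.
REDIRECT_MARKER = "We were redirected to different URL"

def split_results(items):
    """Separate real review items from 'no reviews' placeholder items."""
    ok = [i for i in items if i.get("error") != "no_reviews_found"]
    empty = [i for i in items if i.get("error") == "no_reviews_found"]
    return ok, empty

def redirected_lines(log_lines):
    """Collect log lines for requests that failed due to a redirect."""
    return [line for line in log_lines if REDIRECT_MARKER in line]

# Illustrative sample data, mirroring the output fields described above:
items = [
    {"hotel": "A", "review": "Great stay"},
    {"hotel": "B", "error": "no_reviews_found"},
]
ok, empty = split_results(items)
```

From the redirected log lines you can then extract the failed URLs and diff them against your input list to see exactly which hotels were skipped.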
Larger batches are fine if you are okay with the longer scrape time, but I wouldn't recommend them for cases like this where something fails. Whether the failure comes from a bug in the scraper or from wrong user input, you will still have to start over, or filter out the failed items and rerun only those. With smaller runs, that's obviously easier to do.
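If you want to go the smaller-runs route, splitting your input URL list into fixed-size batches is straightforward. A quick sketch (the batch size and URL list here are made up for illustration):

```python
def chunk(urls, size):
    """Yield successive batches of at most `size` URLs."""
    for i in range(0, len(urls), size):
        yield urls[i:i + size]

# Illustrative input list; in practice this would be your hotel URLs.
urls = [f"https://www.booking.com/hotel/example-{n}.html" for n in range(10)]
batches = list(chunk(urls, 4))  # three batches: 4 + 4 + 2 URLs
```

Each batch can then be submitted as its own run, so a failure only forces you to rerun one small slice instead of the whole input.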

I will keep you updated here with the progress on handling the redirected URLs, but you can of course continue with the scraping in the meantime :) Thanks!
humatics
Thanks, very thorough. The links shouldn't be that stale, but I can't rule out these kinds of problems; I simply obtained them from the voyager Booking scraper a few weeks ago. Overall the scraper works as expected, and we are very satisfied. Mauro.