Booking Scraper
Pay $5.00 for 1,000 results
Scrape Booking.com with this hotel scraper and get data about accommodation. You can crawl by keywords or URLs for hotel prices, ratings, addresses, number of reviews, and stars. You can also download all that room and hotel data in a few clicks as CSV, JSON, HTML, or Excel.
Just to be clear, Booking.com does have a weird way of showing addresses. For example: https://www.booking.com/hotel/jp/shangri-la-tokyo.html?selected_currency=EUR&lang=en-us&group_adults=2&group_children=0&no_rooms=1
Address: 100-8283 Tokyo-to, Chiyoda-ku, Marunouchi Trust Tower Main, 1-8-3 Marunouchi,, Japan
It has an empty part (note the double comma), which is usually a city or district. Could you also scrape the address hierarchy from the breadcrumbs: Japan > Tokyo-to > Tokyo > Chiyoda? The URL structure is actually very helpful for working out what each link is: country, region, city, district, etc.
Returning a list (or, even better, a dictionary) would be super helpful for saving the location hierarchy.
Thank you!
Hello, thanks for this suggestion! We can add breadcrumbs for sure 👌 Getting the names such as Chiyoda or Tokyo will be straightforward, but adding breadcrumb URLs will be a bit challenging. We'll try our best to add those too 🙂
Thanks for the quick response! I was considering an additional dictionary like this:
{
  "country": "Japan",
  "region": "Tokyo-to",
  ...
}
However, having the entire breadcrumb trail would be even better:
[
  {
    "url": "https://www.booking.com/country/jp.html...",
    "title": "Japan"
  },
  ...
]
With the breadcrumb structure, I can deduce the hierarchy based on the URL structure or other heuristics.
A plain list of titles wouldn't be very useful:
["Japan", "Tokyo-to", "Tokyo"]
It's difficult to determine the exact hierarchy from just the titles, especially since different countries have varying location structures.
Including raw breadcrumbs would be a fantastic addition. It offers flexibility, allowing users to decide how to utilize that information best.
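To illustrate how a hierarchy could be deduced from breadcrumb URLs, here is a minimal Python sketch. It assumes that the first path segment names the location level (for example district or country) and the second is the country code, which is inferred only from the URLs quoted in this thread. The function name classify_breadcrumb and the returned keys are hypothetical.

```python
from urllib.parse import urlparse


def classify_breadcrumb(url: str) -> dict:
    """Split a breadcrumb URL into level, country code, and location slugs.

    Assumption: paths look like /<level>/<country_code>/<slug>/.../<slug>.html,
    as in the examples from this thread. Other layouts are not handled.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        return {"level": None, "country_code": None, "slugs": []}
    # The last segment carries the extension and sometimes a locale,
    # e.g. "chiyoda.en-gb.html" -> "chiyoda".
    parts[-1] = parts[-1].split(".")[0]
    return {"level": parts[0], "country_code": parts[1], "slugs": parts[2:]}


info = classify_breadcrumb("https://www.booking.com/district/jp/tokyo/chiyoda.html")
print(info)
```

For the district URL above, the level is "district", the country code is "jp", and the slugs are "tokyo" and "chiyoda". A plain list of titles carries none of this information.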
> However, having the entire breadcrumb trail would be even better:
Yeah I was thinking of adding breadcrumbs basically in the same format:
{
  "breadcrumbs": [
    {
      "name": "Chiyoda",
      "fullName": "Hotels in Chiyoda",
      "link": "https://www.booking.com/district/jp/tokyo/chiyoda.html"
    }
  ]
}
Personally, I think it's a little easier to work with compared to the following dictionary format:
{
  "breadcrumbs": {
    "Chiyoda": {
      "name": "Hotels in Chiyoda",
      "link": "https://www.booking.com/district/jp/tokyo/chiyoda.html"
    }
  }
}
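One practical reason to prefer the list format: it keeps the order of the trail explicit and tolerates repeated names, while a lookup keyed by name can still be derived in one line when needed. A small Python sketch, using only the Chiyoda entry from this thread (the variable names are illustrative):

```python
# Breadcrumbs in the list format, using only the entry shown in this thread.
breadcrumbs = [
    {
        "name": "Chiyoda",
        "fullName": "Hotels in Chiyoda",
        "link": "https://www.booking.com/district/jp/tokyo/chiyoda.html",
    },
]

# Derive the dictionary-style view on demand. If two breadcrumbs shared
# a name, the later one would overwrite the earlier one here.
by_name = {crumb["name"]: crumb for crumb in breadcrumbs}

print(by_name["Chiyoda"]["link"])
```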
> A plain list of titles wouldn't be very useful
I suppose we'll manage to extract the links as well; we just need to build them from parameters such as dest_type (district), the search string (chiyoda), and the country code (jp). That's because full URLs such as https://www.booking.com/district/jp/tokyo/chiyoda.html are not available directly in the HTML data our Actor works with.
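As a rough illustration of that idea, the sketch below assembles a structured URL from its parts in Python. It is not the Actor's actual code: the function name build_structured_link is hypothetical, the slug normalization (lower-casing, spaces to hyphens) is an assumption, and the city slug (tokyo) is passed in by the caller because the parameters listed above do not say where it comes from.

```python
from typing import Sequence


def build_structured_link(dest_type: str, country_code: str, slugs: Sequence[str]) -> str:
    """Join destination parts into a Booking-style URL (hypothetical helper).

    Assumption: slugs are lower-case with hyphens instead of spaces.
    Real Booking.com slugs may follow other rules.
    """
    clean = [s.strip().lower().replace(" ", "-") for s in slugs]
    path = "/".join([dest_type, country_code.lower(), *clean])
    return f"https://www.booking.com/{path}.html"


print(build_structured_link("district", "jp", ["tokyo", "Chiyoda"]))
```

With the values from this thread, the call reproduces https://www.booking.com/district/jp/tokyo/chiyoda.html.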
Anyway, the issue is ready in our backlog and we'll let you know once this new feature gets published!
Hello, we have just published the new version of the Actor with breadcrumbs extraction 🙂
There are basically 2 modes, depending on whether you run the Actor with checkIn and checkOut info or without it.
Run with checkIn + checkOut
The Actor collects 2 types of links: primary (link) and alternative (altLink).
The primary link includes the searchresults substring and also contains search parameters such as the checkIn and checkOut specified in the input.
The alternative link is built by the Actor using the parameters from the primary link and it's a bit experimental (there are many edge cases being handled).
The alternative link is something extra that is not available on Booking when you're browsing hotels with checkIn and checkOut specified (you can test this in your web browser).
Example run: https://console.apify.com/view/runs/tBue6u9UMVLuhVlQB
Example breadcrumb:
{
  "name": "Chiyoda",
  "fullName": "Hotels in Chiyoda",
  "link": "https://www.booking.com/searchresults.en-gb.html?label=gen173nr-1FCAsodUIQc2hhbmdyaS1sYS10b2t5b0gJWARotAKIAQGYAQm4AQfIAQzYAQHoAQH4AQOIAgGoAgO4Asvg7bYGwAIB0gIkZTUyZTZkZGEtZjIwMC00NzI4LTgxNjAtMmI0MGViZmMxNTIz2AIF4AIB&sid=9ce2dab9d2dcc077d60360b65be7fcf2&checkin=2024-09-19&checkout=2024-09-20&dest_id=308&dest_type=district&ss=Chiyoda&",
  "altLink": "https://www.booking.com/district/jp/tokyo/chiyoda.html"
}
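If you want to read the search parameters back out of a primary link like the one above, Python's standard urllib.parse is enough. This is a minimal sketch on the consumer side; the function name search_params and the choice of keys are illustrative, and the sample link is the one from the example with the label and sid parameters left out for brevity.

```python
from urllib.parse import parse_qs, urlparse


def search_params(link: str) -> dict:
    """Return selected query parameters from a searchresults link."""
    query = parse_qs(urlparse(link).query)
    keys = ("checkin", "checkout", "dest_id", "dest_type", "ss")
    # parse_qs maps each key to a list of values; keep the first one.
    return {key: query[key][0] for key in keys if key in query}


# Shortened version of the example link (label and sid omitted).
link = (
    "https://www.booking.com/searchresults.en-gb.html"
    "?checkin=2024-09-19&checkout=2024-09-20"
    "&dest_id=308&dest_type=district&ss=Chiyoda&"
)
print(search_params(link))
```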
Run without checkIn + checkOut
The Actor only collects a single primary link (link) and the alternative link (altLink) is always null. This is because Booking doesn't return the link with the searchresults substring in this case and provides us with the more structured link directly. Thanks to that, the extracted link is more reliable than the altLink from the previous example: it is constructed by Booking and not by our Actor.
Example run: https://console.apify.com/view/runs/2ZN4EKwPs6hoCsYhP
Example breadcrumb:
{
  "name": "Chiyoda",
  "fullName": "Hotels in Chiyoda",
  "link": "https://www.booking.com/district/jp/tokyo/chiyoda.en-gb.html?label=gen173nr-1FCAsodUIQc2hhbmdyaS1sYS10b2t5b0gJWARosgKIAQGYAQm4ARfIAQzYAQHoAQH4AQOIAgGoAgO4AsPf7bYGwAIB0gIkMDhmZTYxZDktOTkwMC00OTgyLTg5OWUtZThjNmIzODE1MWU42AIF4AIB&sid=35653b42a7ee5220836aca11d3e33fb9&breadcrumb=hotel&",
  "altLink": null
}
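Since the two modes put the structured URL in different fields, code that consumes the output may want one helper that handles both. The Python sketch below is one possible approach, not part of the Actor: it takes altLink when it is present, falls back to link otherwise, and drops the query string. The name structured_link is hypothetical, and keep in mind that altLink is described above as experimental.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def structured_link(breadcrumb: dict) -> Optional[str]:
    """Pick the structured URL of a breadcrumb and strip its query string."""
    # Runs with dates fill altLink; runs without dates leave it null and
    # return the structured URL in link instead.
    candidate = breadcrumb.get("altLink") or breadcrumb.get("link")
    if not candidate:
        return None
    scheme, netloc, path, _query, _fragment = urlsplit(candidate)
    return urlunsplit((scheme, netloc, path, "", ""))


# Breadcrumb from a run without dates (query string shortened).
crumb = {
    "name": "Chiyoda",
    "fullName": "Hotels in Chiyoda",
    "link": "https://www.booking.com/district/jp/tokyo/chiyoda.en-gb.html?breadcrumb=hotel&",
    "altLink": None,
}
print(structured_link(crumb))
```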