Booking Scraper

Pricing

$5.00 / 1,000 results

voyager/booking-scraper

Developed by

Voyager

Maintained by Apify

Scrape Booking with this hotels scraper and get data about accommodation on Booking.com. You can crawl by keywords or URLs for hotel prices, ratings, addresses, number of reviews, stars. You can also download all that room and hotel data from Booking.com with a few clicks: CSV, JSON, HTML, and Excel

4.4 (10)

67

Monthly users: 286

Runs succeeded: >99%

Response time: 1.3 days

Last modified: 19 days ago

Invalid address for some hotels

Closed
2tunnels opened this issue
9 months ago

Just to be clear, Booking.com does have a weird way of showing addresses. For example: https://www.booking.com/hotel/jp/shangri-la-tokyo.html?selected_currency=EUR&lang=en-us&group_adults=2&group_children=0&no_rooms=1

Address: 100-8283 Tokyo-to, Chiyoda-ku, Marunouchi Trust Tower Main, 1-8-3 Marunouchi,, Japan

The address has an empty part (between the last two commas), which is usually a city or district. Could you also scrape the address hierarchy from the breadcrumbs: Japan > Tokyo-to > Tokyo > Chiyoda? The URL structure is actually very helpful for telling what each link is: country, region, city, district, etc.

Returning a list (or, even better, a dictionary) would be super helpful for preserving the location hierarchy.

Thank you!
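To make the reported gap concrete, here is a minimal sketch that splits the address string quoted above on commas and reports which positions are empty. The helper names `split_address` and `empty_positions` are hypothetical and are not part of the Actor.

```python
def split_address(address: str) -> list:
    """Split a comma-separated address into stripped parts, keeping empty ones."""
    return [part.strip() for part in address.split(",")]


def empty_positions(address: str) -> list:
    """Return the indexes of address parts that are empty after stripping."""
    return [i for i, part in enumerate(split_address(address)) if not part]


# Address string exactly as quoted in this issue.
address = (
    "100-8283 Tokyo-to, Chiyoda-ku, Marunouchi Trust Tower Main, "
    "1-8-3 Marunouchi,, Japan"
)
parts = split_address(address)
gaps = empty_positions(address)
print(parts)
print(gaps)
```

A non-empty `gaps` list flags records where the address alone is not enough and the breadcrumbs would be needed to recover the missing city or district.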

lhotanok

Hello, thanks for this suggestion! We can add breadcrumbs for sure 👌 Getting the names such as Chiyoda or Tokyo will be straightforward, but adding breadcrumb URLs will be a bit challenging. We'll try our best to add those too 🙂

2tunnels

9 months ago

Thanks for the quick response! I was considering an additional dictionary like this:

{
  "country": "Japan",
  "region": "Tokyo-to",
  ...
}

However, having the entire breadcrumb trail would be even better:

[
  {
    "url": "https://www.booking.com/country/jp.html...",
    "title": "Japan"
  },
  ...
]

With the breadcrumb structure, I can deduce the hierarchy based on the URL structure or other heuristics.

A plain list of titles wouldn't be very useful:

["Japan", "Tokyo-to", "Tokyo"]

It's difficult to determine the exact hierarchy from just the titles, especially since different countries have varying location structures.

Including raw breadcrumbs would be a fantastic addition. It offers flexibility, allowing users to decide how to utilize that information best.
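As a rough illustration of deducing the hierarchy from URL structure, the sketch below reads the first path segment of each breadcrumb URL and uses it as the level name. It assumes the `url`/`title` keys proposed above. Only the `country` and `district` path prefixes appear in this thread; `region` and `city` are assumptions that would need checking against real Booking.com URLs. The example URLs are illustrative.

```python
from typing import Optional
from urllib.parse import urlparse

# "country" and "district" appear in this thread; "region" and "city" are assumed.
KNOWN_LEVELS = {"country", "region", "city", "district"}


def breadcrumb_level(url: str) -> Optional[str]:
    """Return the first URL path segment if it names a known location level."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if segments and segments[0] in KNOWN_LEVELS:
        return segments[0]
    return None


def to_hierarchy(breadcrumbs: list) -> dict:
    """Map each recognised level to the breadcrumb title; skip unknown levels."""
    hierarchy = {}
    for crumb in breadcrumbs:
        level = breadcrumb_level(crumb["url"])
        if level is not None:
            hierarchy[level] = crumb["title"]
    return hierarchy


# Illustrative breadcrumb trail in the proposed url/title format.
example = [
    {"url": "https://www.booking.com/country/jp.html", "title": "Japan"},
    {"url": "https://www.booking.com/district/jp/tokyo/chiyoda.html", "title": "Chiyoda"},
]
print(to_hierarchy(example))
```

A plain list of titles could not be processed this way, since the level information lives only in the URL.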

lhotanok

> However, having the entire breadcrumb trail would be even better:

Yeah I was thinking of adding breadcrumbs basically in the same format:

{
  "breadcrumbs": [
    {
      "name": "Chiyoda",
      "fullName": "Hotels in Chiyoda",
      "link": "https://www.booking.com/district/jp/tokyo/chiyoda.html"
    }
  ]
}

Personally, I think it's a little easier to work with compared to the following dictionary format:

{
  "breadcrumbs": {
    "Chiyoda": {
      "name": "Hotels in Chiyoda",
      "link": "https://www.booking.com/district/jp/tokyo/chiyoda.html"
    }
  }
}
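If the dictionary shape is ever preferred, it can be derived from the list shape on the consumer side. The following is a small sketch with a hypothetical helper, `index_by_name`, applied to the Chiyoda entry shown above. It is not part of the Actor's output.

```python
def index_by_name(breadcrumbs: list) -> dict:
    """Convert the list format into the dictionary format keyed by short name."""
    return {
        crumb["name"]: {"name": crumb["fullName"], "link": crumb["link"]}
        for crumb in breadcrumbs
    }


# The list-format item from the example above.
item = {
    "breadcrumbs": [
        {
            "name": "Chiyoda",
            "fullName": "Hotels in Chiyoda",
            "link": "https://www.booking.com/district/jp/tokyo/chiyoda.html",
        }
    ]
}
by_name = index_by_name(item["breadcrumbs"])
print(by_name)
```

Going the other way is lossier: a dictionary keyed by name would silently merge two breadcrumbs that share a name, whereas the list keeps every entry in its original order.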

> A plain list of titles wouldn't be very useful

I suppose we'll manage to extract the links as well. We just need to build them from parameters such as dest_type (district), the search string (chiyoda) and the country code (jp). That's because full URLs such as https://www.booking.com/district/jp/tokyo/chiyoda.html are not available directly in the HTML data our Actor works with.
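A minimal sketch of how such a link might be assembled from those parameters. This is not the Actor's code: the function `build_alt_link`, the `parent_slugs` argument (used here for the `tokyo` segment, whose origin the thread does not specify) and the slug rules are all assumptions, and real Booking.com URLs likely have edge cases this ignores.

```python
def slugify(text: str) -> str:
    """Lowercase the text and join whitespace-separated words with hyphens."""
    return "-".join(text.strip().lower().split())


def build_alt_link(dest_type, country_code, search_string, parent_slugs=()):
    """Assemble a Booking-style landing URL from search parameters (assumed pattern)."""
    path = [dest_type, country_code.lower(), *parent_slugs, slugify(search_string)]
    return "https://www.booking.com/" + "/".join(path) + ".html"


# Parameters taken from the Chiyoda example; "tokyo" is supplied by hand.
alt_link = build_alt_link("district", "jp", "Chiyoda", parent_slugs=("tokyo",))
print(alt_link)
```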

Anyway, the issue is ready in our backlog and we'll let you know once this new feature gets published!

lhotanok

Hello, we have just published the new version of the Actor with breadcrumbs extraction 🙂

There are basically two modes, depending on whether you run the Actor with checkIn and checkOut or without them.

Run with checkIn + checkOut

The Actor collects 2 types of links: primary (link) and alternative (altLink). The primary link includes the searchresults substring and also contains search parameters such as the checkIn and checkOut specified in the input. The alternative link is built by the Actor from the parameters of the primary link and is a bit experimental (many edge cases are being handled). It is something extra that is not available on Booking when you're browsing hotels with checkIn and checkOut specified (you can test this in your web browser).

Example run: https://console.apify.com/view/runs/tBue6u9UMVLuhVlQB

Example breadcrumb:

{
  "name": "Chiyoda",
  "fullName": "Hotels in Chiyoda",
  "link": "https://www.booking.com/searchresults.en-gb.html?label=gen173nr-1FCAsodUIQc2hhbmdyaS1sYS10b2t5b0gJWARotAKIAQGYAQm4AQfIAQzYAQHoAQH4AQOIAgGoAgO4Asvg7bYGwAIB0gIkZTUyZTZkZGEtZjIwMC00NzI4LTgxNjAtMmI0MGViZmMxNTIz2AIF4AIB&sid=9ce2dab9d2dcc077d60360b65be7fcf2&checkin=2024-09-19&checkout=2024-09-20&dest_id=308&dest_type=district&ss=Chiyoda&",
  "altLink": "https://www.booking.com/district/jp/tokyo/chiyoda.html"
}
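For consumers who want the search parameters out of the primary link, a short sketch using Python's standard `urllib.parse` follows. The helper name `search_params` is hypothetical, and the link below is the example above with the `label` and `sid` parameters left out for brevity.

```python
from urllib.parse import parse_qs, urlparse

# Query parameters visible in the example primary link.
WANTED = ("checkin", "checkout", "dest_id", "dest_type", "ss")


def search_params(link: str) -> dict:
    """Extract the search-related query parameters from a primary link."""
    query = parse_qs(urlparse(link).query)
    # parse_qs returns a list per key; take the first value of each.
    return {key: query[key][0] for key in WANTED if key in query}


primary = (
    "https://www.booking.com/searchresults.en-gb.html"
    "?checkin=2024-09-19&checkout=2024-09-20"
    "&dest_id=308&dest_type=district&ss=Chiyoda&"
)
params = search_params(primary)
print(params)
```

All values come back as strings, so `dest_id` would need an explicit `int()` conversion if a number is required.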

Run without checkIn + checkOut

The Actor only collects a single primary link (link) and t... [trimmed]

Pricing

Pricing model

Pay per result 

This Actor is paid per result. You are not charged for Apify platform usage; you pay only a fixed price per 1,000 items in the Actor's output dataset.

Price per 1,000 items

$5.00
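A back-of-the-envelope cost estimate at this rate can be sketched as below. It assumes the charge scales proportionally with the number of items, which this page does not confirm; billing may instead be rounded or applied differently.

```python
from decimal import Decimal

# Price stated on this page: $5.00 per 1,000 items.
PRICE_PER_1000 = Decimal("5.00")


def estimated_cost(items: int) -> Decimal:
    """Estimate the cost in USD for a number of output items (proportional, assumed)."""
    return (PRICE_PER_1000 * items / 1000).quantize(Decimal("0.01"))


print(estimated_cost(250))
print(estimated_cost(1000))
```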