Tripadvisor Scraper
Pay $3.00 for 1,000 results

maxcopell/tripadvisor
This unofficial Tripadvisor API is a data extraction tool able to get data on hotels, restaurants, things to do, vacation rentals, attractions, tours, and public trips. Get pricing, contact details, amenities, awards, ratings, and more. Download your data in Excel, JSON, CSV, and other formats.

Scraping with inputting startUrl of a specific hotel isn't working

Closed

agenthub opened this issue
3 months ago

Say for example:

"""
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0',
    'Content-type': 'application/json'
}

params = {
    "includeAttractions": False,
    "includeRestaurants": False,
    "includeHotels": True,
    "includeVacationRentals": False,
    "startUrls": [
        "https://www.tripadvisor.com/Hotel_Review-g297550-d302126-Reviews-Jaz_Makadi_Oasis_Resort-Makadi_Bay_Hurghada_Red_Sea_and_Sinai.html",
    ],
    "language": "en",
    "currency": "USD",
    "maxItems": 1,
    "endPage": 1,
    "extendOutputFunction": "($) => { return {} }",
    "customMapFunction": "(object) => { return {...object} }",
    "proxy": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"]
    }
}
"""

It's getting rejected with:

{'error': {'type': 'invalid-input', 'message': 'Input is not valid: Items in input.startUrls at positions [0] do not contain valid URLs'}}

Even though: "https://www.tripadvisor.com/Hotel_Review-g297550-d302126-Reviews-Jaz_Makadi_Oasis_Resort-Makadi_Bay_Hurghada_Red_Sea_and_Sinai.html"

is indeed a valid URL

lukas.prusa

Hi, thanks for opening this issue!

I think you have the input format for this Actor mixed up. The input you provided is in a completely different format from the one this scraper accepts.

The error you are seeing is expected: the provided start URL was not in a valid format.

This is the default input for the scraper:

{
    "currency": "USD",
    "includeAiReviewsSummary": false,
    "includeAttractions": true,
    "includeHotels": true,
    "includePriceOffers": false,
    "includeRestaurants": true,
    "includeTags": true,
    "includeVacationRentals": false,
    "language": "en",
    "locationFullName": "Chicago",
    "maxItemsPerQuery": 10
}

I hope this helps, thanks and happy scraping!

3 months ago

Can you give me an example of a start URL in a valid format? Your documentation says you can pass startUrls as an input attribute.

The JSON example you provide as the “default input” doesn’t include startUrls, but you say in the documentation that this is a valid input.

Is the documentation wrong?

lukas.prusa

Sorry for the inconvenience, the automatically generated documentation has a slightly flawed UI there. The input for start URLs is an array of source objects, each with a "url" key, rather than an array of plain strings.

So in your case you can simply use:

"startUrls": [
    {
        "url": "https://www.tripadvisor.com/Hotel_Review-g297550-d302126-Reviews-Jaz_Makadi_Oasis_Resort-Makadi_Bay_Hurghada_Red_Sea_and_Sinai.html"
    }
]
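
For anyone building the input in Python, as in the original post, here is a minimal sketch of wrapping plain URL strings into that array-of-objects shape. The helper name to_start_url_sources is hypothetical and not part of any Apify library. Combining startUrls with only "language" and "currency" is also an assumption; check the Actor's input schema for the fields it actually accepts.

```python
import json


def to_start_url_sources(urls):
    """Wrap plain URL strings in {"url": ...} objects (hypothetical helper).

    Items that are already dicts are copied through unchanged.
    """
    sources = []
    for item in urls:
        if isinstance(item, str):
            sources.append({"url": item})
        else:
            sources.append(dict(item))
    return sources


hotel_url = (
    "https://www.tripadvisor.com/Hotel_Review-g297550-d302126-Reviews-"
    "Jaz_Makadi_Oasis_Resort-Makadi_Bay_Hurghada_Red_Sea_and_Sinai.html"
)

# Assumption: only fields that appear elsewhere in this thread are used here.
run_input = {
    "startUrls": to_start_url_sources([hotel_url]),
    "language": "en",
    "currency": "USD",
}

# Serialize to JSON, which also turns Python True/False into true/false
# if boolean fields are added later.
print(json.dumps(run_input, indent=4))
```

Serializing with json.dumps also avoids hand-written JSON pitfalls such as trailing commas.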

Also, you can try out the UI input schema, which nicely formats it for you and makes it easier to edit.

Thanks!

3 months ago

Thank you, that worked!

lukas.prusa

Awesome, I'm glad it helped.

Developer
Maintained by Apify
Actor metrics
  • 285 monthly users
  • 51 stars
  • 96.7% runs succeeded
  • 1.3 days response time
  • Created in Nov 2019
  • Modified about 2 hours ago
Categories