Tripadvisor Reviews Scraper

Pay $2.00 for 1,000 reviews

maxcopell/tripadvisor-reviews
Get and download reviews for chosen places on Tripadvisor. Extract the review text, URL, rating, date of travel, published date, basic reviewer info, owner's response, helpful votes, images, review language, place details. Download reviews in XML, JSON, CSV.

Review count doesn't line up -- way to re-run on missing?

Closed

kcarriere opened this issue a month ago

I just ran a decent-sized job. A few locations had a return rate below 100% (66%, 14%, 66%, 32.5%, 80%, and 49%). For example, locationId==10105731 returned a suspiciously round 21,500 reviews. There's definitely contradictory information on TripAdvisor's side -- it lists 32,092 reviews but also "showing results of..." 25,416.

Regardless, that's between 3.9k and 10.6k reviews not scraped.
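As a quick check of that gap, here is a minimal sketch that redoes the arithmetic with the three counts quoted above. The variable names are only illustrative; nothing here comes from the scraper itself.

```python
# Counts quoted in this issue for locationId 10105731.
returned = 21_500            # reviews the run actually returned
listed_total = 32_092        # total review count shown on the Tripadvisor page
showing_results_of = 25_416  # the "showing results of..." figure on the same page

# Missing reviews under each of the two contradictory totals.
missing_low = showing_results_of - returned
missing_high = listed_total - returned

print(f"Missing between {missing_low:,} and {missing_high:,} reviews")
```

Which of the two totals is the right baseline is not something the numbers alone can settle, so the result is best read as a range.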

The log at some point reads for this location: "2024-12-19T13:02:39.104Z INFO Reached max reviews to enqueue per query". That sounds like the scraper potentially got bounced out? I'm not sure what that means.

Sorting the JSON, it's definitely scraping the most recent reviews (12/16/2024, 12/05/2024, 11/28/2024). It's hard to tell whether it's scraping the oldest ones. I can't necessarily re-run because the filter is "only scrape reviews since" and not "only scrape reviews before".
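One way to see how far back a run reached is to compute the oldest and newest published dates in the exported JSON. The sketch below assumes each review is a dict with a `publishedDate` key holding an ISO date string (`YYYY-MM-DD`); that key name and format are assumptions for illustration and should be adjusted to whatever the export actually contains.

```python
from datetime import date


def published_date_range(reviews, date_key="publishedDate"):
    """Return (oldest, newest) dates found in a list of review dicts, or None.

    `date_key` is a hypothetical field name; reviews without it are skipped.
    """
    dates = [date.fromisoformat(r[date_key]) for r in reviews if r.get(date_key)]
    if not dates:
        return None
    return min(dates), max(dates)


# Sample built from the three dates mentioned in this issue.
sample = [
    {"publishedDate": "2024-12-16"},
    {"publishedDate": "2024-12-05"},
    {"publishedDate": "2024-11-28"},
]
print(published_date_range(sample))
```

If the oldest date is far more recent than the place's earliest reviews on Tripadvisor, the run most likely stopped before reaching the older ones.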

Just trying to figure out the missing data discrepancy here, and how I could solve this.

lukas.prusa

Hi, thanks for opening this issue and your patience! Sorry this issue got a bit lost for us.

The problem - You've set the global max items limit to 249.5k, which is exactly where the crawler stopped. The setting can be found at the bottom of the page.

The resolution - Understandably, you are now missing results for some places. I've gone over the logs and found the ones that did not finish. These are the ones that you will most likely have to rescrape:

1. https://www.tripadvisor.com/Attraction_Review-g28970-d10105731-Reviews-Lincoln_Memorial-Washington_DC_District_of_Columbia.html
2. https://www.tripadvisor.com/Attraction_Review-g60763-d1687489-Reviews-The_National_9_11_Memorial_Museum-New_York_City_New_York.html
3. https://www.tripadvisor.com/Attraction_Review-g60982-d104386-Reviews-USS_Arizona_Memorial-Honolulu_Oahu_Hawaii.html
4. https://www.tripadvisor.com/Attraction_Review-g187147-d188709-Reviews-Arc_de_Triomphe-Paris_Ile_de_France.html
5. https://www.tripadvisor.com/Attraction_Review-g187323-d617423-Reviews-The_Holocaust_Memorial_Memorial_to_the_Murdered_Jews_of_Europe-Berlin.html
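To match these URLs against the location IDs in an existing dataset, the numeric ID can be pulled out of each URL. This is a rough sketch that assumes the `-d<digits>-` segment is the location ID; that reading rests only on the first URL containing `d10105731`, the same number as the locationId quoted earlier in this issue. The helper name is illustrative.

```python
import re

# URLs listed above as unfinished.
UNFINISHED_URLS = [
    "https://www.tripadvisor.com/Attraction_Review-g28970-d10105731-Reviews-Lincoln_Memorial-Washington_DC_District_of_Columbia.html",
    "https://www.tripadvisor.com/Attraction_Review-g60763-d1687489-Reviews-The_National_9_11_Memorial_Museum-New_York_City_New_York.html",
    "https://www.tripadvisor.com/Attraction_Review-g60982-d104386-Reviews-USS_Arizona_Memorial-Honolulu_Oahu_Hawaii.html",
    "https://www.tripadvisor.com/Attraction_Review-g187147-d188709-Reviews-Arc_de_Triomphe-Paris_Ile_de_France.html",
    "https://www.tripadvisor.com/Attraction_Review-g187323-d617423-Reviews-The_Holocaust_Memorial_Memorial_to_the_Murdered_Jews_of_Europe-Berlin.html",
]

# Assumed pattern: the location ID is the digit run in the "-d<digits>-" segment.
LOCATION_ID = re.compile(r"-d(\d+)-")


def location_id(url):
    """Return the digits of the first -d<digits>- segment, or None if absent."""
    match = LOCATION_ID.search(url)
    return match.group(1) if match else None


unfinished_ids = [location_id(url) for url in UNFINISHED_URLS]
print(unfinished_ids)
```

Reviews already collected for these IDs can then be set aside or deduplicated before merging in the results of a rerun.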

I hope this helps, thanks and happy scraping!

Developer
Maintained by Apify

Actor Metrics

  • 235 monthly users

  • 57 stars

  • >99% runs succeeded

  • 6.4 days response time

  • Created in Jan 2023

  • Modified 4 days ago