Google Maps Scraper

compass/crawler-google-places

Extract data from hundreds of Google Maps locations and businesses. Get Google Maps data including reviews, images, contact info, opening hours, location, popular times, prices & more. Export scraped data, run the scraper via API, schedule and monitor runs, or integrate with other tools.

Incomplete image collection

Closed

mvpolyakov opened this issue
4 months ago

I am trying to collect images from cafes, supplying a list of URLs previously retrieved by the scraper. In many cases the scraper saves far fewer image URLs than the business has. If the place has 40 images it may return 20; out of 90 it will save 33. The max is set at 100 (if the business has over 100 images, the scraper will usually get close to 100). I have increased the timeout so that it is not an issue (the run terminates in less time). In the 40-image case above, I have manually verified that all 40 are visible online.

Is there some internal Google Maps setting that could be preventing certain images from being downloaded? Is it a scraping problem? I haven't tried rescraping the same businesses to see if results are consistent...
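
For anyone wanting to quantify a shortfall like the one described above, here is a minimal sketch. It assumes each exported dataset item is a dict with a `url` key and an `imageUrls` list; those field names, the `expected` mapping of manually verified counts, and the example.com place URLs are all assumptions for illustration, not a confirmed description of the scraper's output.

```python
def image_shortfall(items, expected, url_key="url", images_key="imageUrls"):
    """Return {place_url: (collected, expected, ratio)} for places below the expected count.

    `url_key` and `images_key` are assumed field names; adjust them to the real export.
    """
    report = {}
    for item in items:
        place = item.get(url_key)
        if place not in expected:
            continue  # no manually verified count for this place
        # Count unique URLs so duplicates do not hide a shortfall.
        collected = len(set(item.get(images_key) or []))
        want = expected[place]
        if collected < want:
            report[place] = (collected, want, round(collected / want, 2))
    return report


# Hypothetical sample data, not taken from a real run.
items = [
    {"url": "https://example.com/cafe-a", "imageUrls": ["a1", "a2"]},
    {"url": "https://example.com/cafe-b", "imageUrls": ["b1", "b2", "b3"]},
]
expected = {"https://example.com/cafe-a": 4, "https://example.com/cafe-b": 3}
shortfall = image_shortfall(items, expected)
print(shortfall)
```

Only places that fall short appear in the report, so an empty dict means every checked place reached its expected count.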

ondrejklinovsky

Hi, thanks for the report. I tested one of the places in your run and it seems like a scraping problem: I got a different number of images every time I ran it, and always fewer than 100. We'll take a look at it; it should be done by the end of next week.
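
A run-to-run check like the one mentioned above could be sketched as follows. This is only an illustration: it assumes you already have two lists of image URLs for the same place, and that the part after the last `=` in a URL is just a size suffix (such as `w1920-h1080-k-no`), so it can be ignored when comparing. The sample URLs are made up.

```python
def compare_runs(run_a, run_b):
    """Compare two lists of image URLs collected for the same place."""

    def normalize(url):
        # Assumption: everything after the last "=" is a size/format suffix,
        # so URLs differing only there refer to the same image.
        return url.rsplit("=", 1)[0]

    a = {normalize(u) for u in run_a}
    b = {normalize(u) for u in run_b}
    union = a | b
    return {
        "only_in_a": len(a - b),
        "only_in_b": len(b - a),
        "in_both": len(a & b),
        # Jaccard similarity: 1.0 means both runs returned the same images.
        "jaccard": len(a & b) / len(union) if union else 1.0,
    }


# Hypothetical URLs for illustration only.
run_a = [
    "https://img.example.com/p/AAA=w1920-h1080-k-no",
    "https://img.example.com/p/BBB=w1920-h1080-k-no",
]
run_b = [
    "https://img.example.com/p/BBB=w408-h306-k-no",
    "https://img.example.com/p/CCC=w408-h306-k-no",
]
result = compare_runs(run_a, run_b)
print(result)
```

A low Jaccard value across repeated runs of the same place would point to the scraper rather than to anything fixed on the Google Maps side.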

mvpolyakov

4 months ago

Fantastic! Thanks so much.

mvpolyakov

4 months ago

One question, not directly related, but maybe you will know: some of the image URLs returned by the scraper have a gps-proxy component (e.g. https://lh3.googleusercontent.com/gps-proxy/ALd4DhF4E8S8Vk6Cn9YOwVOPXByZQw5scNbXFaEqP89b33GWZQuX5vHWygFUxR7WMNk6DCco8KovVrpVDXLC2K1EAeSwARytY028ybF8HVbsy6HmVIN-Ac4JOiB7k2bd64aaOIxvVZUXNUqGuKFrCMzmL__WkKrt2VxcIOiirnQ8ZP2qvRaFCQ7CJYw=w1920-h1080-k-no) and cannot be loaded. Do you know what that's about? Would that be consistent from run to run, or are some images served dynamically from proxies for load balancing or something? For some businesses, a high percentage of URLs look like that.
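
To see how many URLs per business look like that, the results can be split on the first path segment. This is a minimal sketch using only the standard library; the `split_gps_proxy` helper and the shortened sample URLs are hypothetical, and it says nothing about why such URLs fail to load.

```python
from urllib.parse import urlparse


def split_gps_proxy(urls):
    """Split image URLs into (regular, proxied) by their first path segment."""
    regular, proxied = [], []
    for url in urls:
        path = urlparse(url).path
        first_segment = path.lstrip("/").split("/", 1)[0]
        if first_segment == "gps-proxy":
            proxied.append(url)
        else:
            regular.append(url)
    return regular, proxied


# Placeholder tokens; real URLs carry long opaque identifiers.
urls = [
    "https://lh3.googleusercontent.com/gps-proxy/EXAMPLE1=w1920-h1080-k-no",
    "https://lh3.googleusercontent.com/p/EXAMPLE2=w1920-h1080-k-no",
]
regular, proxied = split_gps_proxy(urls)
share = len(proxied) / len(urls)
print(f"{len(proxied)} of {len(urls)} URLs are gps-proxy ({share:.0%})")
```

Running this over the same business in several runs would show whether the share of gps-proxy URLs is stable or changes from run to run.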

ondrejklinovsky

Hmm, I haven't seen such URLs before. Thank you for reporting it, we'll look into it as well.

ondrejklinovsky

Hi, we've just released a new version and the image collection should be fixed.

Regarding gps-proxy URLs: we're still discussing how to handle them. We'll keep you posted.

mvpolyakov

4 months ago

Fantastic! Thank you.

ondrejklinovsky

Hi, we've just released a new version that should be able to handle gps-proxy URLs. See the Changelog.
