Google Maps Extractor
Pay $20.00 for 1,000 results


compass/google-maps-extractor
Extract data from hundreds of places fast. Scrape Google Maps by keyword, category, location, URLs & other filters. Get addresses, contact info, opening hours, popular times, prices, menus & more. Export scraped data, run the scraper via API, schedule and monitor runs, or integrate with other tools.

User avatar

Crawling Spaces Dubious in Accuracy

Closed

effusive_memento opened this issue
18 days ago

Hi,

I was having a conversation with Ondrej, and there seemed to be a problem with a POI scrape not returning the expected results in Hong Kong Taikoo District.

After a bit of tweaking Ondrej produced a script for me that considerably increased POI scrapes. The reason was that the App version that I was working with was outdated and a certain masking was taking place due to a browser window masking off a zoomed crop. Great. Problem solved.

However, now that I'm scraping the neighbouring districts "Central" and "Causeway Bay", I'm getting MUCH LOWER counts than expected, particularly since everybody intuitively knows that these areas are much more metropolitan than Taikoo and should therefore have significantly higher POI counts.

https://console.apify.com/actors/runs/ImcGKnn3ONLARQCCh#output
https://console.apify.com/actors/runs/U1HQB2KAwnzL8VHQa#output

Can someone have a look at this? I've started paying out of pocket to prove to my company that Apify is THE tool for scraping POI data but so far it's only been proved to be unreliable.

Best,

Adrian

User avatar

Hello Adrian,

Thanks a lot for your report. Ondrej is on vacation, so we will push for this early next week. From preliminary testing, I don't think our scraper is way off. We are finalizing a new system that should be faster and have a higher capture rate, so we will report on it ASAP.

User avatar

Hello Adrian,

My last message was mistaken. I had not noticed that some pages failed because there was not enough memory in your account for OCR analysis. We will launch a new version that doesn't require OCR next week. It is released under the debug build now.

Please check if these counts make more sense for your areas:
https://console.apify.com/view/runs/dmehHr5ARJHbLltPy - 2191 results
https://console.apify.com/view/runs/UX3VWzVGY0siNSULh - 2702 results

User avatar

effusive_memento

14 days ago

Hi Lukas,

Thanks for getting back to me so soon.

The respective 2191 and 2702 results are better than what we scraped but unfortunately not in line with our expectations.

We originally used the Taikoo area as the initial benchmarking region for our POI scrapes. Initial efforts by our team resulted in around 2190 POIs. On inspecting the scraped dataset and manually comparing it against Google Maps, we could see notable differences. In correspondence with Ondrej, we were told that a bug fix would result in a much higher POI count. The resulting 3284 POIs were much closer to expectations (Google Maps Scraper results: https://api.apify.com/v2/key-value-stores/5Uwj3Rr9jnhIXNIeF/records/results-map).

We know that Taikoo (3284, https://api.apify.com/v2/key-value-stores/5Uwj3Rr9jnhIXNIeF/records/results-map) is considerably less populated than our other 3 benchmarks, so it is surprising that the Apify scrapes of Causeway Bay (2702, https://api.apify.com/v2/key-value-stores/CtpxiuSfRv2kWc39U/records/results-map) and Kwun Tong (2191, https://api.apify.com/v2/key-value-stores/Vdbqr7lFzBuDHuhDF/records/results-map) actually returned lower POI counts. Our ultimate goal is to benchmark 4 key regions of HK. If we could argue that Apify's Google scrapes carry a fixed margin of error across the sites, I think this would soften the blow. However, it's not clear that the margin of error is consistent.

I will run another Apify scrape on the Central area using the attached CentralRegion.geojson file. Could I request that you do the same? We are trying to get the maximum number of POIs and we'd like them to be consistent. If you notice any incorrect settings in CentralRegion.geojson, please let me know where the mistakes are so that we can scrape as many POIs as possible.
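For anyone preparing a similar geofence, a minimal sketch of a sanity check on a GeoJSON Polygon may help before a run. It only tests generic GeoJSON rules (closed rings, at least 4 positions, longitude before latitude); it does not know what the Actor itself accepts. The function name `check_polygon` and the sample coordinates are hypothetical and are not taken from CentralRegion.geojson.

```python
def check_polygon(geometry):
    """Return a list of problems found in a GeoJSON Polygon geometry (empty if none)."""
    if geometry.get("type") != "Polygon":
        return [f"expected type 'Polygon', got {geometry.get('type')!r}"]
    rings = geometry.get("coordinates") or []
    if not rings:
        return ["polygon has no rings"]
    problems = []
    for i, ring in enumerate(rings):
        if len(ring) < 4:
            problems.append(f"ring {i}: needs at least 4 positions")
        if ring and ring[0] != ring[-1]:
            problems.append(f"ring {i}: first and last positions differ (ring not closed)")
        # GeoJSON positions are [longitude, latitude]; a latitude outside
        # [-90, 90] usually means the two values were swapped.
        for lon, lat, *_ in ring:
            if not (-180 <= lon <= 180 and -90 <= lat <= 90):
                problems.append(f"ring {i}: position out of range, possibly swapped lat/lon")
                break
    return problems


# Hypothetical rectangle roughly around Hong Kong Island (illustrative values only).
sample = {
    "type": "Polygon",
    "coordinates": [[
        [114.15, 22.28], [114.17, 22.28], [114.17, 22.29],
        [114.15, 22.29], [114.15, 22.28],
    ]],
}
print(check_polygon(sample))
```

An empty list means none of these generic problems were found; it does not guarantee that the polygon covers the intended district.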

Kind Regards,

Adrian Ma RIBA Part II, BSc. MA. Urban Analyst | Digital Infrastructure Researcher

T 852 94995014

E adrianma@ovalpartnership.com

14/F, Malaysia Building, 50 Gloucester Road, Wan Chai, Hong Kong

www.ovalpartnership.com


User avatar

Hello,

Thanks for the feedback. We will do a more thorough analysis. However, it is important to note that the Actor simply returns the points of interest visible on Google Maps, so if there are fewer places than you would expect based on population, there is nothing we can do (that would need a different scraper). So basically, any pin on the map that this tool doesn't scrape is a bug, but if there is no pin, we of course cannot scrape it.

User avatar

Do you have any specific places on your side that you know exist on Google Maps but are not in the scraped results? It would help us immensely. Thanks in advance.
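One way to produce such a list is to compare a hand-picked set of known places against the exported dataset. The sketch below assumes each scraped item is a dict with the place name under a `title` key; that field name and the helper names are assumptions, so adjust them to whatever the export actually contains.

```python
def normalize(name):
    """Lower-case a name and collapse whitespace so trivial differences don't matter."""
    return " ".join(name.casefold().split())


def missing_places(expected_names, scraped_items, name_field="title"):
    """Return the expected names that have no exact (normalized) match in the scrape."""
    scraped = {
        normalize(item[name_field])
        for item in scraped_items
        if item.get(name_field)
    }
    return [name for name in expected_names if normalize(name) not in scraped]


# Illustrative data only; a real check would load the exported dataset instead.
expected = ["Hysan Place", "SOGO", "Times Square"]
scraped_sample = [{"title": "hysan  place"}, {"title": "Times Square"}]
print(missing_places(expected, scraped_sample))
```

Exact matching is deliberately strict: a place listed on Google Maps under a slightly different name will show up as missing, so the output is a list of candidates to check by hand, not proof of a scraper bug.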

User avatar

effusive_memento

14 days ago

Hi team,

Thanks for the responses.

In an ideal world, all POIs from restaurants to public toilets would be scraped. Granted, this is likely not possible due to Google's API restrictions. However, so long as the margin of error is the same across all sites, I think this is acceptable.

Right now, Taikoo is our critical plot, as our client has in-depth knowledge of the site. Ondrej was kind enough to provide us with a scrape in excess of 6000 POIs. On inspection, my intuition is that it would have passed our client's sniff test, particularly given its quite accurate depiction of the 2 major shopping malls in Taikoo: City Plaza One and Two.

Having said that, in terms of benchmarking neighbouring districts, it's a bit unbelievable that Causeway Bay or Central would have fewer POIs than Taikoo. They are both much more established, diverse, and dense than Taikoo. Ideally those catchments should return a minimum of 6000 POIs. Otherwise, I think that would broadly mean that the efficacy of scraping is inconsistent.

If we get fewer than 6000 POIs, we'd have to start post-rationalising. I'm not sure how we'd go about this just yet, but as an example, it could be that the density of SMEs in Central is much higher and hence they were not picked up by the scrapers. Perhaps... so long as there's a rational explanation for the discrepancy, we should be okay.

In Central, the Landmark complex, IFC, and Manning House should display an increased density of scrapes, although almost the entire street frontage should have shops, including some alleyways.

In Causeway Bay we have a similar situation: less affluent but potentially even denser than both Taikoo and Central, with increased density showing up in Hysan Place, Sogo, Times Square, Windsor, the Lee Garden complexes, and Fashion Walk.

The names listed above are all major shopping complexes that should be comparable to City Plaza in Taikoo.

Hope that helps but please feel free to get in touch if there are any other questions.

User avatar

Thank you for your feedback. We will dig into validation more. As I wrote, the density of shops in the areas is a good guideline, but some might not be listed on Google Maps, and the scraper of course does not find those.

Also, only the debug version should be used for evaluation (until we release it, likely tomorrow), since it is the fixed version.

User avatar

Hello,

I'm coming back as I realized we made a mistake with the data from the 2 runs I shared, and we can indeed provide much more data.

Here are the 2 runs (excluding closed places):
https://console.apify.com/view/runs/C9pdEv3dKZiGMVxPu - 6544 results
https://console.apify.com/view/runs/mPOcnUkFdbuwjqERN - 6455 results

The reason I shared only 2,700 results previously was that I chose "onlyDataFromSearchPage": true for speed, but it doesn't work with "scrapeDirectories": true, so we missed most of the places that are inside a parent place.
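To avoid repeating this mistake, the run input can be checked before starting a run. The following is a minimal sketch: the two option names come from the message above, while the function name `conflicting_options` and the warning text are hypothetical, and the Actor may handle this combination differently in later versions.

```python
def conflicting_options(run_input):
    """Return warnings for option combinations reported in this thread as incompatible."""
    warnings = []
    # Per the thread: with onlyDataFromSearchPage enabled, scrapeDirectories
    # does not take effect, so places inside a parent place are missed.
    if run_input.get("onlyDataFromSearchPage") and run_input.get("scrapeDirectories"):
        warnings.append(
            "onlyDataFromSearchPage=true does not work with scrapeDirectories=true; "
            "places inside a parent place will be missed"
        )
    return warnings


fast_input = {"onlyDataFromSearchPage": True, "scrapeDirectories": True}
full_input = {"onlyDataFromSearchPage": False, "scrapeDirectories": True}
print(conflicting_options(fast_input))
print(conflicting_options(full_input))
```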

User avatar

effusive_memento

13 days ago

hi Lukas,

Thanks for getting back. Some promising results.

Having looked at the data, I notice that both scrapes are from Causeway Bay, with a difference of about 90 plots.

I'm guessing the discrepancy is due to the scraper itself?

Also, if you could run a scrape for Central, that would be incredibly helpful. I'm including the GeoJSON geofence above.
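A quick way to see where two runs of the same area diverge is to compare them by a stable identifier. This is only a sketch: it assumes each exported item carries a unique `placeId` field, which should be confirmed against the actual export, and `compare_runs` is a hypothetical helper name.

```python
def compare_runs(run_a, run_b, key="placeId"):
    """Compare two lists of scraped items by a unique key and report the differences."""
    ids_a = {item[key] for item in run_a if key in item}
    ids_b = {item[key] for item in run_b if key in item}
    return {
        "only_in_a": sorted(ids_a - ids_b),
        "only_in_b": sorted(ids_b - ids_a),
        "shared": len(ids_a & ids_b),
    }


# Illustrative items only; real input would be the two exported datasets.
run_a = [{"placeId": "p1"}, {"placeId": "p2"}, {"placeId": "p3"}]
run_b = [{"placeId": "p2"}, {"placeId": "p3"}, {"placeId": "p4"}]
print(compare_runs(run_a, run_b))
```

The places that appear in only one run are the ones worth attaching to a bug report, since they show which pins were captured inconsistently.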

User avatar

Hello,

Thank you, you are right, I messed up the different locations. Getting different results is definitely a bug we will look into ASAP. I will update you with the Central (and rename the tasks on my account).

User avatar

Hello,

The new version (the same as the runs I shared last) has been released. For now, I will close this issue, but we are continuing to validate to ensure a 100% capture rate. If you see that we missed any places, please open another issue.

Thank you and happy to help with anything.

User avatar

effusive_memento

12 days ago

Great thanks Lukas. Haven’t had a chance to look at it today but will get back on it tomorrow.

Developer
Maintained by Apify
Actor metrics
  • 536 monthly users
  • 92.8% runs succeeded
  • 1.2 days response time
  • Created in Feb 2024
  • Modified about 5 hours ago