Google Maps Scraper

compass/crawler-google-places

Extract data from hundreds of Google Maps locations and businesses. Get Google Maps data including reviews, images, contact info, opening hours, location, popular times, prices & more. Export scraped data, run the scraper via API, schedule and monitor runs, or integrate with other tools.

Crawler capturing unrelated business categories not specified in the search terms

Open

rh_analytics opened this issue
22 days ago

Recently when I run the crawler it is capturing a large number of businesses from unrelated business categories that are not specified in the search terms. This has not happened previously and I have not changed any of the specifications of my runs to cause this. This is taking up more of my monthly usage for data, restricting my ability to work and taking up more time to clean the data files to extract what I want. Is there a way to resolve this issue within the actor?

User avatar

Hi, could you please share the old and new run? I'll take a look.

If you don't want any unrelated businesses, try using a category filter. You can also limit the number of places per search term.

rh_analytics

21 days ago

Hi,

When I run the actor I am specifying the business types in the search terms and defining the area by JSON code.

When reviewing the output of the most recent run (see attached), over 75% of the business categories scraped (variable: ‘categoryName’) were not even closely related to the business categories included in my search terms (see attached).

This resulted in over 50% of the businesses scraped coming from these categories that are not related to my search terms.

Are you able to see why this is happening and if there is any way I can avoid this in the future?

rh_analytics

17 days ago

Hi Ondrej,

Do you have any update on this?

Thanks

User avatar

Hi, thanks for the data, and I apologize for the late response.

You're probably getting more results because we released a new version of the actor with improved search capabilities: it uses the "search this area" feature (see screenshot), which lets us find more places.

Regarding why you see many unrelated categories:

  1. categoryName contains only the main category of a place. All of a place's categories are stored in categories. Take, for example, this place (629th row in the dataset): its main category is Gas station, but its other categories are ATM, Cafe and restaurant. So these places should be fine.
  2. Sometimes Google Maps gives us results that it thinks are "close enough" to what we want, so it returns places like this (its categories are Motel and Hotel).
  3. You have enabled scraping directories ("scrapeDirectories": true). This tells the actor to scrape places that are inside other places (e.g. stores, restaurants, etc. in malls). In your run, the actor found this restaurant that has an ATM inside, so it scraped the ATM as well.
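Point 1 can be checked directly on an exported dataset. The following is a minimal sketch, not the actor's own logic: it assumes each item carries a `categoryName` string and a `categories` list as described above, and the sample record, the `match_level` helper and the `search_terms` list are made up for illustration.

```python
def match_level(place, search_terms):
    """Return 'main', 'secondary' or 'none' depending on where a search term matches."""
    terms = {t.lower() for t in search_terms}

    # The main category is the only value stored in categoryName.
    main = (place.get("categoryName") or "").lower()
    if main in terms:
        return "main"

    # All categories of the place, including the main one, live in categories.
    others = [c.lower() for c in place.get("categories") or []]
    if any(c in terms for c in others):
        return "secondary"
    return "none"


# Hypothetical record modelled on the gas station example above.
place = {
    "categoryName": "Gas station",
    "categories": ["Gas station", "ATM", "Cafe", "Restaurant"],
}
search_terms = ["restaurant", "cafe"]

print(match_level(place, search_terms))
```

A place that reports "secondary" looks unrelated when only `categoryName` is inspected, even though one of its categories matches a search term.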

You can get rid of the places from points 2 and 3 by adding category filters. Based on your list of search terms, I'd use the following categories:

"categoryFilterWords": [
    "restaurant",
    "bar",
    "coffee shop",
    "cafe",
    "ice cream shop",
    "sandwich shop",
    "pizza delivery service",
    "pizza takeaway",
    "juice shop",
    "brewery",
    "diner",
    "steak house",
    "dessert shop",
    "bubble tea store",
    "tea house",
    "pub",
    "food court",
    "bistro",
    "brewpub",
    "acai shop",
    "chinese takeaway",
    "sushi takeaway",
    "art cafe",
    "creperie"
],

rh_analytics

14 days ago

Hi Ondrej,

Thanks for getting back to me with a detailed response. I can see Zuzka Pelechová has closed the issue but I still have further questions.

With regard to the proposed solution of applying category filters: does each one have to be selected manually, or can I paste them in? If I apply those filters, will I still capture all of the categories in my search terms? That is essential to me.

Thanks for your explanation. Now that I understand the scrape better, I have reviewed the extract again. I have attached the working file, which contains just the categories, CID and URL for reference.

When looking at all categories associated with a business, I found:

  • Of the 3,627 businesses scraped, only 1,458 (40%) have a category that matches the search terms.
  • Of the 3,627 businesses scraped, only 1,859 (51%) have a category that matches the search terms or is relevant and close to the search terms.
  • Of the 3,627 businesses scraped, only 3,093 (84%) either match, are relevant, or are located inside another place, so that scraping directories ("scrapeDirectories": true) captures them.

So there are 579 (16%) businesses, spanning 460 different business categories, that are neither relevant nor located in a space that would be scraped as part of the ("scrapeDirectories": true) function. Is there any other way to avoid capturing these (and not spend my data allowance on them)?

If I turn off scraping directories ("scrapeDirectories": true), is this the right place to do it (screenshot attached), and will this remove all the listings from unrelated categories?

If not, I either need to build a workaround on my side or change scraping tools.

Thanks for your help

rh_analytics

3 days ago

Hi there,

I’m just wondering if someone is still looking at this?

User avatar

Hi, sorry for the late reply, I took a week off.

You can either paste the categories into the JSON editor or select them manually with the select tool. There is a chance that some of the "correct" places will get discarded if you apply category filters. If you don't want this to happen, I'd suggest not using the category filter and instead filtering the results on your own.
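Filtering on your own could look roughly like the sketch below. It assumes the exported items have the `categoryName` and `categories` fields discussed earlier in the thread; the `split_by_category` helper, the `title` field and the sample records are hypothetical and only stand in for a real export.

```python
from collections import Counter


def split_by_category(places, wanted_categories):
    """Split places into (kept, discarded) by exact, case-insensitive category match."""
    wanted = {c.lower() for c in wanted_categories}
    kept, discarded = [], []
    for place in places:
        # Look at the main category and every additional category.
        cats = [place.get("categoryName") or ""] + list(place.get("categories") or [])
        if any(c.lower() in wanted for c in cats):
            kept.append(place)
        else:
            discarded.append(place)
    return kept, discarded


# Hypothetical sample standing in for an exported dataset.
places = [
    {"title": "A", "categoryName": "Cafe", "categories": ["Cafe", "Coffee shop"]},
    {"title": "B", "categoryName": "Gas station", "categories": ["Gas station", "ATM", "Cafe"]},
    {"title": "C", "categoryName": "Motel", "categories": ["Motel", "Hotel"]},
]

kept, discarded = split_by_category(places, ["cafe", "restaurant"])
dropped_categories = Counter(p["categoryName"] for p in discarded)

print([p["title"] for p in kept])
print(dropped_categories.most_common())
```

Because the discarded places are kept in a separate list, they can be reviewed before anything is thrown away. The unrelated places are still scraped in this approach; only the cleanup is automated.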

If you turn off scrapeDirectories, you'll remove the places from point 3 (see my answer above), but you'll still get some "close enough" places (point 2).
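For reference, a run input with directory scraping turned off might be assembled as in the sketch below. Only `scrapeDirectories` and `categoryFilterWords` appear in this thread; `searchStringsArray` and `maxCrawledPlacesPerSearch` are assumed field names that should be checked against the actor's input schema, and all values are placeholders.

```python
import json

# Shortened, illustrative subset of the category list suggested earlier.
category_filter_words = ["restaurant", "bar", "cafe"]

run_input = {
    # Assumed field name and placeholder search terms.
    "searchStringsArray": ["restaurant", "cafe"],
    # Assumed field name; caps the number of places per search term.
    "maxCrawledPlacesPerSearch": 200,
    # Skip places located inside other places (point 3).
    "scrapeDirectories": False,
    # Optional: drop places whose categories match none of these words.
    "categoryFilterWords": category_filter_words,
}

# Text that can be pasted into the JSON editor.
print(json.dumps(run_input, indent=4))
```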

If you decide to use the category filter, note that using restaurant will accept any restaurant, e.g. "Mandarin restaurant", which is not in your search terms. If you want only specific restaurants, you need to apply more specific categories, e.g. french restaurant, brunch restaurant, etc.
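The difference between a broad and a specific filter word can be illustrated with a small sketch. It assumes the filter behaves like a case-insensitive substring match, which the reply implies but does not state; `passes_filter` is a hypothetical helper, not part of the actor.

```python
def passes_filter(category, filter_words):
    """Assumed behaviour: a category passes if it contains any filter word."""
    cat = category.lower()
    return any(word.lower() in cat for word in filter_words)


categories = ["Mandarin restaurant", "French restaurant", "Brunch restaurant", "Motel"]

broad = ["restaurant"]
specific = ["french restaurant", "brunch restaurant"]

# The broad word accepts every kind of restaurant.
print([c for c in categories if passes_filter(c, broad)])
# The specific words accept only the listed restaurant types.
print([c for c in categories if passes_filter(c, specific)])
```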

Developer
Maintained by Apify
Actor metrics
  • 4.1k monthly users
  • 97.8% runs succeeded
  • 2.7 days response time
  • Created in Nov 2018
  • Modified 29 minutes ago