Google Maps Scraper
- compass/crawler-google-places
- Created by Compass
- Users: 34.4k
- Runs: 3.6M
Extract data from hundreds of Google Maps businesses and locations in seconds. Get Google Maps data including reviews, images, opening hours, location, popular times & more. Go beyond the limits of the official Google Places API. Download data with Google Maps extractor in JSON, CSV, Excel and more.
2023-05-25 (0.14.206)
Fixes
- Searches that redirect to a single place work again (this broke in the last update)
- Properly handle searches that don't find anything (Google changed the design of the page)
- If the scraper reaches `maxCrawledPlacesPerSearch` or gets too many empty or redirect searches in a row (currently 20), it quickly removes the rest of the searches for the same term from the queue. Previously, it would blindly keep trying to scrape there, wasting resources.
- Automatically resurrect the run if it hits an out-of-memory error. This started happening recently, so we are still investigating the root cause.
- Shuffle the starting map locations to reduce the number of duplicates or missing queries at the start.
- Respect `maxCrawledPlacesPerSearch` when `onlyDataFromSearchPage` is used.
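The early-stop rule in the first fix of this entry can be sketched as a counter of consecutive unproductive searches. This is a minimal illustration and not the actor's code. It assumes a hypothetical model in which each queued search for one term reports how many places it yielded, with 0 standing for an empty or redirect search. The threshold of 20 comes from the entry. The names `process_searches` and `MAX_EMPTY_IN_A_ROW` are made up.

```python
from typing import Iterable, Optional, Tuple

# Threshold quoted in the changelog ("currently 20").
MAX_EMPTY_IN_A_ROW = 20


def process_searches(
    places_per_search: Iterable[int], max_places: Optional[int] = None
) -> Tuple[int, int]:
    """Return (searches processed, places kept) for one search term.

    Hypothetical model: each element is how many places one queued search
    (one map area) yielded; 0 stands for an empty or redirect search.
    """
    processed = 0
    collected = 0
    empty_in_a_row = 0
    for found in places_per_search:
        processed += 1
        collected += found
        # Reset the streak whenever a search yields something.
        empty_in_a_row = empty_in_a_row + 1 if found == 0 else 0
        if max_places is not None and collected >= max_places:
            # Per-search limit reached: skip the remaining queued searches.
            return processed, max_places
        if empty_in_a_row >= MAX_EMPTY_IN_A_ROW:
            # Too many unproductive searches in a row: skip the rest.
            break
    return processed, collected
```

Once the loop exits, the remaining searches in the iterable are never consumed. That mirrors removing them from the queue instead of trying them one by one.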
Features
- On every failed run, the actor triggers another actor, `lukaskrivka/actor-fail-manager`, via webhook.
  - It analyzes the error and, if appropriate, resurrects the run (e.g. in case of an out-of-memory error).
  - It sends a report to the author so the issue can be fixed promptly or the user experience improved if needed.
2023-05-22
Fixes
- Retry pages with `allPlacesNoSearchAction` that produce captchas (this can happen at very high scraping speeds).
- Rotate browsers more often with `allPlacesNoSearchAction` to prevent captchas. This slows down the scraping slightly.
- Better validate the longitude and latitude order in `customGeolocation`. Fail if the values are out of bounds, and add a warning if they look unreasonable (e.g. the point lies in the ocean).
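GeoJSON positions are ordered [longitude, latitude], so a common input mistake is writing [latitude, longitude]. Below is a minimal sketch of the bounds check, assuming a plain GeoJSON position as input. `validate_point` is a hypothetical helper, not the actor's validator. It does not reproduce the "inside the ocean" warning, because that would need land/water data.

```python
from typing import Sequence, Tuple


def validate_point(position: Sequence[float]) -> Tuple[float, float]:
    """Check a GeoJSON position, which is ordered [longitude, latitude]."""
    if len(position) < 2:
        raise ValueError("a position needs at least [longitude, latitude]")
    lng, lat = float(position[0]), float(position[1])
    if not -180 <= lng <= 180:
        raise ValueError(f"longitude {lng} is out of bounds [-180, 180]")
    if not -90 <= lat <= 90:
        # If the values would be valid the other way round, they were
        # probably swapped (a common mistake: writing [lat, lng]).
        swapped_ok = -90 <= lng <= 90 and -180 <= lat <= 180
        hint = " (did you swap longitude and latitude?)" if swapped_ok else ""
        raise ValueError(f"latitude {lat} is out of bounds [-90, 90]{hint}")
    return lng, lat
```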
Changes
- Add a new option, `onlyDataFromSearchPage` (replaces `exportPlaceUrls`), which extracts some of the data from the search page without visiting the place's detail page.
- Deprecate the `exportPlaceUrls` input option. As usual, we keep it backward compatible for a long time.
- Each output place item can contain at most 5,000 reviews. If a place has more reviews, a duplicate place item is stored with the next 5,000 reviews, and so on. For example, a place with 50,000 reviews produces 10 items for the same place in the dataset. This limitation comes from the size limit of a single item in an Apify dataset.
- Deprecate the `"allPlacesNoSearchAction": "all_places_no_search_mouse"` input option because it was extremely slow. It now automatically falls back to `all_places_no_search_ocr`, which is the only remaining option.
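The 5,000-review limit above works like fixed-size chunking: the place fields are repeated and the reviews are split into consecutive slices. This is a minimal sketch, assuming the reviews sit in a list under a `reviews` key. That key and the `split_place_by_reviews` helper are assumptions for illustration, not the actor's code.

```python
from typing import Any, Dict, List

# Per-item review limit stated in the changelog.
MAX_REVIEWS_PER_ITEM = 5000


def split_place_by_reviews(
    place: Dict[str, Any], chunk_size: int = MAX_REVIEWS_PER_ITEM
) -> List[Dict[str, Any]]:
    """Split one place into several items with at most chunk_size reviews each."""
    reviews = place.get("reviews") or []
    if len(reviews) <= chunk_size:
        return [place]
    items = []
    for start in range(0, len(reviews), chunk_size):
        item = dict(place)  # shallow copy: the other place fields are repeated
        item["reviews"] = reviews[start:start + chunk_size]
        items.append(item)
    return items
```

With 50,000 reviews, this yields the 10 items mentioned in the entry. A place with 5,001 reviews would yield 2 items, the second holding a single review.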
2023-05-02
Hotfixes
- Scrolling in search and images didn't work at all for a while because the panel was moved.
- Remove invalid image URLs
2023-04-15
Changes
- Removed `proxyConfiguration` from the input schema. This scraper works well with the default datacenter proxies, and changing them was causing issues. For special cases, a proxy can still be passed in the `proxyConfiguration` field of the JSON input.
2023-04-11
BREAKING CHANGES
- Adjusted automatic zooming to use a lower zoom for very small areas. Users rightfully complained that such a high zoom made scrapes too inefficient. The zoom curve is now flatter, which means a slightly higher zoom for larger areas and a significantly lower zoom for small areas. The highest automatic zoom is now capped at 17. The new example values are:
  - United States - zoom 10 (10,371,139 km²)
  - Germany - zoom 12 (380,878 km²)
  - London - zoom 15 (1,595 km²)
  - Manhattan - zoom 16 (87.5 km²)
  - Soho - zoom 17 (0.35 km²)
2023-04-10
Features
- Correctly implement the full GeoJSON specification for `customGeolocation`; you can now provide any valid `type`.
- Cache geolocation resolutions in the global KV store to speed up the start and reduce dependency on the OpenStreetMap API.
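Since `customGeolocation` follows the GeoJSON specification, a polygon input has to satisfy the GeoJSON ring rules. Below is a sketch that checks the basic structure of a `Polygon`, assuming the input is a plain GeoJSON geometry object; check the readme for the exact shape the actor expects. `validate_polygon` and the coordinates in `example_area` are hypothetical. Only two rules from RFC 7946 are checked: rings are closed and have at least 4 positions. Winding order and self-intersection are not checked.

```python
from typing import Any, Dict


def validate_polygon(geometry: Dict[str, Any]) -> None:
    """Minimal structural check of a GeoJSON Polygon (RFC 7946, section 3.1.6)."""
    if geometry.get("type") != "Polygon":
        raise ValueError("expected a geometry of type 'Polygon'")
    rings = geometry.get("coordinates")
    if not isinstance(rings, list) or not rings:
        raise ValueError("a Polygon needs at least one linear ring")
    for ring in rings:
        # A linear ring is closed, so it has at least 4 positions.
        if len(ring) < 4:
            raise ValueError("a linear ring needs at least 4 positions")
        if ring[0] != ring[-1]:
            raise ValueError("a linear ring must end where it starts")


# Hypothetical rectangle around lower Manhattan; positions are [longitude, latitude].
example_area = {
    "type": "Polygon",
    "coordinates": [[
        [-74.02, 40.70],
        [-73.97, 40.70],
        [-73.97, 40.75],
        [-74.02, 40.75],
        [-74.02, 40.70],
    ]],
}
validate_polygon(example_area)
```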
2023-04-06
BREAKING CHANGES & fixes
- Fixed and changed review translation after Google changed it. The `text` field now contains the original text, and `textTranslated` contains the translated text.
- Due to this change, the `reviewsTranslation` input setting is no longer needed and was removed; both texts are included when available.
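With both fields in the output, a consumer can choose which text to use per review. Here is a small sketch, assuming `textTranslated` is missing or empty when no translation exists. The `review_text` helper and the sample strings in the test are hypothetical.

```python
from typing import Any, Dict, Optional


def review_text(review: Dict[str, Any], prefer_translation: bool = True) -> Optional[str]:
    """Return the translated text when wanted and present, else the original."""
    translated = review.get("textTranslated")
    if prefer_translation and translated:
        return translated
    # Fall back to the original text (None if the review has no text at all).
    return review.get("text")
```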
Features
- Added `reviewContext` and `reviewDetailedRating` to all reviews where available. Examples are in the readme.
2023-04-04
BREAKING CHANGES
- `adjustZoomDynamically` is now used for all geolocation input types!
- `locationQuery` is now the prominent location input, with a prefilled value.
Fixes
- `customGeolocation` applies the correct zoom again (this broke during the last release).
2023-03-29
BREAKING CHANGES
- Added the `adjustZoomDynamically` input option. It changes the zoom from a constant table based on the geolocation type (country = zoom 12, city = zoom 15, etc.) to a value calculated from the area of the found location. In practice, very big countries might get a zoom 1-2 levels lower, while very small areas might get a zoom 2-5 levels higher for a more detailed scrape. Some examples from the new calculation:
  - The maximum zoom from this calculation is 19
  - United States - zoom 9 (10,371,139 km²)
  - Germany - zoom 11 (380,878 km²)
  - London - zoom 16 (1,595 km²)
  - Manhattan - zoom 17 (87.5 km²)
  - Soho - zoom 19 (0.35 km²)
- Set `adjustZoomDynamically` to true for `customGeolocation`.
We plan to make this the default zoom setting in the near future.
2023-03-22
Features
- Added support for shortened URLs (e.g. `https://goo.gl/maps/...`)
Fixes
- Extract the `permanentlyClosed` value from JSON data
- Fixed a rare problem where, after a migration, not all URLs were processed
2023-03-15
Fixes
- The `menu` output field is extracted correctly again. The full URL of the menu is now provided.
2023-03-14
Features
- Added the `webResults` field to the output. You have to enable it in the input with the `includeWebResults` field. Enabling it has a small performance impact.
2023-03-13
Fixes
- Large, low-density countries like Russia and Canada are now scraped with a lower zoom to make the scrape more efficient. This applies only when the whole country is scraped.
2023-03-09
Features
- Add the `locationQuery` input field. It can be used instead of `country`, `state`, `city`, etc. if those don't match. This is mostly useful for very small states or regions, but it can also be used for a free-text description of the location.
- Support the country Dominica
2023-02-22
Features
- Add `reviewerPhotoUrl` and `reviewImageUrls` fields to the review output
2023-02-22
Features
- Add the `similarHotelsNearby` field to the output.
2023-02-16
Fixes
- Updated `price` and `description` extraction to support more languages.
2023-02-15
Features
- Improved `reviews` extraction; it's now faster and can extract more reviews.
2023-01-25
Fixes
- Fixed an issue where the `temporarilyClosed` field was not extracted properly in some cases.
Features
- Added the `updatesFromCustomers` field to the output.
"updatesFromCustomers": { "text": "Disneyland California Adventure small area with large park all inclusive celebrations. This is a glimpse into Los Reyes parade. I'm a true fan. Thanks", "language": "en", "postDate": "a week ago", "postedBy": { "name": "Kayla Arredondo", "url": "https://www.google.com/maps/contrib/102968882116587973980?hl=en-US", "title": "Local Guide", "totalReviews": 225 }, "media": [ { "link": "https://lh3.googleusercontent.com/ggms/AF1QipNNaoT0NSbcWOPSduvZNqJ0kSqUs-dod32FeBtr=m18", "postTime": "a week ago" } ] }
- Added the `questionsAndAnswers` field to the output.
"questionsAndAnswers": { "question": "Which is the best easier way to drop off a family to Disneyland Park", "answer": "best way for drop off family is at down town Disney. Drop them off then you can take a short walk to the park. ", "askDate": "5 years ago", "askedBy": { "name": "Cecilia Salcedo", "url": "https://www.google.com/maps/contrib/109041536347893604294" }, "answerDate": "5 years ago", "answeredBy": { "name": "Gabby Lujan", "url": "https://www.google.com/maps/contrib/105966144333216697667" } }
2023-01-24
Fixes
- Fixed `reserveTableUrl` extraction for restaurants.
Features
- Add `reviewsFilterString` to the input, which lets you filter reviews by a search string.
- Add the `googleFoodUrl` field to the output.
2023-01-24
Fixes
- Fixed place URL normalization sometimes not working. All place detail URL formats should work now; please open an issue if you find one that doesn't.
2023-01-24
Fixes
- Fixed and reworked `peopleAlsoSearch`. It now has the following format, and more fields will be added to it:
"peopleAlsoSearch": [ { "category": "Czech restaurants", "title": "Restaurant Mlýnec", "reviewsCount": 2561, "totalScore": 4.7 } ]
Changes
- `popularTimesHistogram`, `openingHours`, `additionalInfo` and `peopleAlsoSearch` are now always added to the data. This means the `includeHistogram`, `includeOpeningHours`, `additionalInfo` and `includePeopleAlsoSearch` input fields no longer have any effect.
  - To exclude these from the data on the Apify platform, use the `omit` URL parameter (e.g. add `&omit=popularTimesHistogram,openingHours,additionalInfo,peopleAlsoSearch` to the dataset URL). This can also be chosen in the export UI.
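Here is a sketch of building such a download URL, assuming the Apify API v2 dataset items endpoint (`/v2/datasets/{datasetId}/items`) with its `format` and `omit` query parameters. `DATASET_ID` is a placeholder, and `dataset_items_url` is a hypothetical helper.

```python
from typing import List
from urllib.parse import urlencode

# Fields named in the changelog entry above.
OMIT_FIELDS = ["popularTimesHistogram", "openingHours", "additionalInfo", "peopleAlsoSearch"]


def dataset_items_url(dataset_id: str, omit: List[str], fmt: str = "json") -> str:
    """Build an Apify API URL that downloads dataset items without the given fields."""
    # Keep the commas readable instead of percent-encoding them as %2C.
    query = urlencode({"format": fmt, "omit": ",".join(omit)}, safe=",")
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"
```

For example, `dataset_items_url("DATASET_ID", OMIT_FIELDS)` ends with `?format=json&omit=popularTimesHistogram,openingHours,additionalInfo,peopleAlsoSearch`.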
2023-01-12
Features
- Add the `reserveTableUrl` field to the output for restaurants.
- Add `reviewsTags` and `placesTags` fields to the output.
2023-01-13
BREAKING CHANGE
- Opening hours:
  - Removed the trailing "," after the day
  - Always start with Monday (but only for English)
2023-01-12
Features
- Add the `description` field to the output.
- Hotel prices are now scraped, and the selected `checkInDate` and `checkOutDate` are added to the output (hotel prices depend on these dates).
- If the place is a hotel, add the `moreHotelsOptions` field to the output.
2023-01-10
Changes
- The crawler now sets the default maximum concurrency based on the provided memory in GB. Currently, it is set to 4 times the memory, so a 4 GB actor stops scaling up at a concurrency of 16. This should prevent the crawler from overscaling and running into network timeouts. You can still override this value with the `maxConcurrency` input field.
- The crawler sets the starting concurrency to half the memory in GB; this simply helps it start faster.
- Slowed down upscaling to make the crawling smoother and reduce timeouts.
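The two defaults above are simple arithmetic on the run's memory. Below is a sketch with a hypothetical `default_concurrency` helper. The 4x and 1/2 factors come from the entry. Rounding down, the floor of 1 and the cap of the start value at the maximum are assumptions.

```python
from typing import Optional, Tuple


def default_concurrency(memory_gb: float, max_concurrency: Optional[int] = None) -> Tuple[int, int]:
    """Return (starting concurrency, maximum concurrency) for a run."""
    # Maximum: 4x the memory in GB unless the maxConcurrency input overrides it.
    maximum = max_concurrency if max_concurrency is not None else int(memory_gb * 4)
    # Start at half the memory in GB; rounding down and the floor of 1 are assumptions.
    starting = max(1, int(memory_gb / 2))
    # Assumption: never start above the maximum.
    return min(starting, maximum), maximum
```

For the 4 GB example in the entry, this gives a starting concurrency of 2 and a maximum of 16.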
2023-01-09
Fixes
- Gas price updateAt field is extracted correctly again (before this fix all dates were from 1970).
2023-01-02
Fixes
- All tiny countries (and states) now work properly (some only if used without other geolocation parameters like city).
2022-12-22
Features
- Add `searchMatching` to the input, which lets you specify how the search term should match the place name.
2022-12-16
Fixes
- Some countries, like Korea, Tanzania and Congo, were not found by the scraper.
2022-12-06
Features
- Added `hotelStars` to the output (example value "5-star hotel").
2022-11-22
Changes (to simplify input)
- Removed the `lat` and `lng` input fields from the input schema, but they keep working if passed in the input. Prefer geolocation options like `city` or `country` instead. You can also still use coordinates in direct URLs.
- Removed the `maxAutomaticZoomOut` input field from the input schema. It also keeps working as before.
Features
- Added `claimThisBusiness` to the output.
2022-11-21
Fixes
- Fixed wrong location assigned to some smaller countries.
2022-11-10
Features
- Added `imagesCount` to the output. It is included even if you don't extract the image URLs.
2022-09-23
Fixes
- BREAKING CHANGE: Removed `maxCrawledPlaces` from the input completely (use `maxCrawledPlacesPerSearch` instead)
- Fixed `maxCrawledPlacesPerSearch` causing the scraper to hang in some cases
2022-09-06
Fixes
- Fixed unstable image extraction
2022-09-05
Fixes
- Final round of optimizations and fixes of the search process. The scraper is now probably the fastest it has ever been, finally reaching about 100 places per compute unit even when using geolocation.
2022-09-02
Fixes
- Several optimizations to speed up the search page (scrolling & enqueueing places)
- Fixed extraction of images
2022-08-16
Fixes
- Improved extraction of additional info for hotels.
2022-08-15
Fixes
- Fixed actor sometimes finishing prematurely when there were still requests in the queue (caused by the new background enqueueing system)
2022-08-05
Fixes
- Fixed occasional duplicate reviews.
- Fixed extraction of the temporarilyClosed field.
2022-08-03
Fixes
- Fixed reviews extraction. After a change on Google's side, the scraper was returning only up to 10 reviews. Now it works fully again. Sorting by `newest` doesn't work properly yet, though.
2022-07-21
Fixes
- Finish quickly when fewer than 120 places are found on a page. The previous implementation waited several extra seconds.
2022-07-20
Fixes
- Search pages now use scrolling instead of pagination. This makes the crawling a little slower and reduces the maximum number of places per page from 400 to 120. Use geolocation with zoom to work around this reduction. We might increase the default zoom by 1 in the near future.
2022-05-19
Features
- Added `gasPrices` to the output. To the best of our knowledge, it is available only for gas stations in the US.
2022-05-02
Fixes
- `subTitle` extraction works now
2022-04-04
Fixes
- Blocked responses on the search page now properly retry the request (no more unhandled promise rejection)
- Smoother search page pagination
- More informative logs
- Fixed consent approval if browser crashes
2022-03-16
Fixes
- Combining `maxCrawledPlaces` with `exportPlaceUrls` was giving an inconsistent number of results.
2022-03-14
Features
- Added `allPlacesNoSearch` to the input. This option lets you scrape all places shown on the map without any search term.
- Added `reviewsStartDate` to the input to extract only reviews newer than this date.
- Added `radiusKm` to the `Point` type in `customGeolocation`.
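The date comparison behind `reviewsStartDate` can be sketched as a post-filter on review timestamps. This assumes `publishedAtDate` holds an ISO 8601 string and that "newer than" means on or after the start date. Both are assumptions, and the helpers are hypothetical. The sketch only shows the comparison, not how the actor avoids loading older reviews.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def parse_iso(value: str) -> datetime:
    """Parse an ISO 8601 date or timestamp into a timezone-aware datetime."""
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Treat values without an offset (e.g. "2023-01-01") as UTC.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def reviews_since(reviews: List[Dict[str, Any]], start_date: str) -> List[Dict[str, Any]]:
    """Keep reviews published on or after start_date; undated reviews are dropped."""
    start = parse_iso(start_date)
    return [
        review
        for review in reviews
        if review.get("publishedAtDate") and parse_iso(review["publishedAtDate"]) >= start
    ]
```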
2022-03-04
Improvement
- `additionalInfo` extraction is faster now.
- `additionalInfo` extraction for hotels and similar categories is more complete now: data that is not displayed on the Google page but is present in the Google response is also extracted.
2022-03-03
- Lowered the default zoom values. The previous setup made the scraping too slow and costly. The new defaults speed up the scraping a lot while missing only a few places. You can still manually override the `zoom` parameter. The new default values are:
  - `country` or `state` -> 12
  - `county` -> 14
  - `city` -> 15
  - `postalCode` -> 16
  - no geolocation -> 12
2022-02-28
Fixes
- `location` extraction works in (almost) all cases now (search URLs and URLs with place IDs will always work).
2022-02-21
Features
- Added `oneReviewPerRow` to the input to expand reviews to one per output row
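Expanding reviews to one per row is a flatten operation: the place fields are repeated for every review. Below is a sketch, assuming the reviews sit in a `reviews` list. The `one_review_per_row` helper and the `review` key in each output row are hypothetical. The actor's real row layout may differ, so check an actual output before relying on it.

```python
from typing import Any, Dict, List


def one_review_per_row(place: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Expand a place into one row per review, repeating the place fields."""
    reviews = place.get("reviews") or []
    base = {key: value for key, value in place.items() if key != "reviews"}
    if not reviews:
        # Keep the place even when it has no reviews.
        return [base]
    # "review" is a hypothetical key name for the single review in each row.
    return [{**base, "review": review} for review in reviews]
```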
2022-02-17
Fixes
- `openingHours` extraction works in almost all cases now (search URLs and URLs with place IDs will always work).
2022-01-12
- Start URLs now work correctly from uploaded CSV files or Google Sheets. Previously, part of the URL was trimmed.
2022-01-11
- Renamed the `polygon` input field to `customGeolocation`
- Added a more detailed section to the readme on how to provide your own exact coordinates
2022-01-11
Breaking changes
We decided it is time to change several default parameters to make the user experience smoother. These changes should not have a big effect on current users.
- `city` and other geolocation parameters take precedence over `lat` & `long` if both are used (in 99% of cases, users want the automatic location splitting to get the most results, which doesn't work with direct `lat` & `long`).
- `zoom` no longer has a default value of 12. Instead, it changes based on the geolocation type like this:
  - `country` or `state` -> 12
  - `county` -> 14
  - `city` -> 17
  - `postalCode` -> 18
  - no geolocation -> 12
Users can still specify the zoom to override this behavior.
See the readme for more details.
2021-12-14
Breaking change
- `reviewsSort` is now set to `newest` by default. This is because some places don't yield all reviews with other sortings (we are not sure if this is a bug or a silent block on Google's side).
2021-11-15
Fixes
- `exportPlaceUrls` now properly dedupes the URLs
- Added the `categories` field listing all categories the place is listed in
2021-11-11
Fixes
- Fixed `additionalInfo` for hotels
- Fixed `exportPlaceUrls` not checking for the correct geolocation
2021-11-09
Fixes
- The `website` field now displays the full URL. This fixes the issue of blank `facebook.com` links.
2021-11-05
Fixes
- Fixed the new layout of `additionalInfo`
2021-11-03
Fixes
- Improved reliability of scraping place detail, reviews and images (improving scrolling and back button interaction)
2021-10-13
Features
- Added `menu` to the output
- Added `price` to the output
2021-10-07
Fixes
- Fixed `popularTimesHistogram`, which caused a crash on some pages
2021-09-27
Fixes
- Fixed image extraction and made it optional (it should not crash the whole scrape)
2021-09-15
Fixes
- Fixed `temporarilyClosed` and `permanentlyClosed`
- Added a step that normalizes input Start URLs, because those with the wrong format don't contain JSON data
2021-09-14
Fixes
- Fixed popular times live and histogram
2021-09-10
https://github.com/drobnikj/crawler-google-places/pull/185 https://github.com/drobnikj/crawler-google-places/issues/181
Fixes
- In roughly 10% of cases, the reviews came in the wrong order and fewer of them were returned. We haven't found the root cause yet, but we retry the page so the output gets corrected.
2021-09-07
Breaking fix
- If you did not pass `maxReviews` in the input at all (`undefined`), the scraper scraped 5 reviews by default. That contradicted the input schema description, so it now scrapes 0 reviews in those cases.
2021-09-01
Fixes
- Fixed `placeId` extraction that was broken for some inputs
- Fixed missing `imageUrls`
Features
- Added option to input URLs with CID (Google My Business Listing ID) to start URLs, e.g. https://maps.google.com/?cid=12640514468890456789
- Added `cid` to the output
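A CID start URL uses the `?cid=` query format shown in the example above. Below is a sketch of building and parsing such URLs with the standard library; both helpers are hypothetical. Keep the CID as a string. The example value 12640514468890456789 is larger than the signed 64-bit maximum (9223372036854775807) and far beyond what a JavaScript number or JSON double can represent exactly.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse


def cid_to_url(cid: str) -> str:
    """Build a start URL in the ?cid= format shown in the changelog."""
    return "https://maps.google.com/?" + urlencode({"cid": cid})


def cid_from_url(url: str) -> Optional[str]:
    """Return the cid query parameter of a Google Maps URL, if present."""
    values = parse_qs(urlparse(url).query).get("cid")
    return values[0] if values else None
```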
2021-08-25
Fixes
- Fixed `maxCrawledPlaces` not finishing quickly for large country-wide searches. `maxCrawledPlacesPerSearch` still has this problem.
2021-08-12
Fixes
- Fixed a problem where `startUrls` sometimes didn't pick up all provided URLs (due to automatic `uniqueKey` resolution)
- `likesCount` in reviews
2021-08-06
Fixes
- `maxCrawledPlaces` now compares against the total sum of places from all searches
Features
- Added `maxCrawledPlacesPerSearch` to limit the maximum number of places per search term or search URL
2021-07-26
Fixes
- Address is now parsed correctly into components even when you supply direct place IDs
- Migrated code from `apify` 0.22.5 to 1.3.1
2021-07-13
- Added `county` to the geolocation options
2021-06-03
Fixes (hopefully last fixes after the layout change)
- Scraping all images per place works again
- Fixed `additionalInfo`
- Fixed `openingHours`
2021-06-03
Fixes
- Fix handling of search pages without results
- Skip empty searches that users sometimes submit accidentally
2021-05-25
Features
- Added the `orderBy` attribute to the scraped results
2021-05-18
Fixes
- Fully or partially fixed consent screen issues
- Should also help with `Failed to set the 'innerHTML' property on 'Element': This document requires 'TrustedHTML' assignment.`, which is caused by injecting jQuery into the consent screen
2021-04-29
Fixes
- Fixed `reviewsTranslation`
2021-04-28
Fixes after Google changed the layout. Not everything is fixed yet; the next batch of fixes is coming ASAP!
- Fixed additional data
- Fixed search pagination getting into infinite loop
- Fixed empty search handling
- Fixed reviews not being scraped
- Fixed `totalScore`
2021-03-22
Warning - The next version will be a breaking one, as we will remove personal data from reviews by default. You will have to explicitly enable the fields below.
Features
- Added input fields to selectively pick which personal data fields to scrape: `scrapeReviewerName`, `scrapeReviewerId`, `scrapeReviewerUrl`, `scrapeReviewId`, `scrapeReviewUrl`, `scrapeResponseFromOwnerText`
2021-03-17
Fixes
- Removed duplicate reviews; all reviews are now scraped correctly
- `reviewsSort` finally works correctly
- Reviews scraping is now significantly faster
- Handle an error that occasionally happened when scraping a huge number of reviews
Features
- Added `reviewsDistribution`
- Added `publishedAtDate` (exact date), `responseFromOwnerDate` and `responseFromOwnerText` for each review
2021-03-10
Fixes:
- `totalScore` and `reviewsCount` are now correctly extracted for all languages
- `startUrls` now work correctly on non-.com domains and on place detail pages
2021-02-02
Fixes:
- A search keyword that leads to a single place only (like "London Eye") now works correctly
2021-01-27
Features:
- Address is parsed into `neighborhood`, `street`, `city`, `postalCode`, `state` and `countryCode` fields
- Added the `reviewsTranslation` option to adjust how Google translates reviews from non-English languages
- Parsing ads. This means a few more results. Ads have the `"isAdvertisement": true` field.
- Added the `useCachedPlaces` option to load places from your KV Store. Useful if you need to scrape the same places regularly.
- Added the `polygon` option to provide your own geolocation polygon.
Fixes:
- This one is big. We removed the infamous `Place is outside of required location (polygon)` error. The location of a place is now checked during pagination, and places outside the area are skipped. This means a massive speed-up of the scraper.
2021-01-11
Features:
- Automatic screenshots of errors to see what went wrong
- Added `searchPageUrl` to the output
- Added the `PLACES-OUT-OF-POLYOGON` record to the Key-Value store, so you can check which places were excluded.
Fixes:
- Fixed rare bug with saving stats
- Improved review sorting, but it is still not ideal; more work needs to be done
2020-11-16
- Added postal code geolocation to input
- Improved errors when location is not found
- Optimization - Removed geolocation data from intermediate requests
2020-10-29
- Fixed handling of Google consent screen
- Better input validation and deprecation logs
- Changed the default for `maxImages` to `1`, as the main image doesn't require scrolling
- `imageUrls` are returned in the highest resolution
2020-10-27
- Removed the `forceEng` input in favor of `language`
2020-10-15
- The default setup now uses `maxImages: 0` and `maxReviews: 0` to improve efficiency
2020-10-01
- Added several browser options to the input: `maxConcurrency`, `maxPageRetries`, `pageLoadTimeoutSec`, `maxPagesPerBrowser`, `useChrome`
- Revamped the input schema and readme
- Added `reviewerNumberOfReviews` and `isLocalGuide` to reviews
2020-09-22
- Added a few extra review fields (ID, URL)
2020-07-23 small features
New features
- add an option for caching place location
- add an option for sorting of reviews
- add stats logging
2020-07 polygon search and bug fixes
breaking change
- reworked input search string
Bug fixes
- opening hour parsing (#39)
- separate locatedIn field (#32)
- update readme
New features
- extract additional info - Service Options, Highlights, Offerings,.. (#41)
- add `maxReviews`, `maxImages` (#40)
- add `temporarilyClosed` and `permanentlyClosed` flags (#33)
- allow scraping only place URLs (#29)
- add `forceEnglish` flag to the input (#24, #21)
- add searching in a polygon using nominatim.org
- add `startUrls`
- added `maxAutomaticZoomOut` to limit how far Google can zoom out (it naturally zooms out as you press next page in search)