SCOUTR Nordics (Google Maps Scraper)
Under maintenance. A cheaper Google Maps scraper for Nordic countries. Geocodes a start address, finds businesses within a radius, and extracts name, address, website, and phone. Visits each website to fetch real contact information. Other countries will be added in separate actors once the code is fully optimized.
An Apify Actor that:
- Geocodes a start address (Nominatim, with maps.co fallback).
- Searches Google Local results (`tbm=lcl`) for your keywords across a hex grid of centers within a given radius.
- Extracts name, address, rating, reviews, phone, website, and Maps URL from local result cards.
- If no website is present, performs a fallback web search (Google with DuckDuckGo/Bing fallback) constrained to the country’s ccTLD.
- Crawls the website (same registered domain, a few pages) to collect emails, additional phones, and social links when enabled.
- Normalizes phones to E.164 for Nordic countries and filters out junk emails/platform domains.
- Adds compact debug logs so you can tell if it’s actually working or just philosophizing.
Optimized for Nordic countries: Norway, Sweden, Denmark, Finland, Iceland. Language (hl), region (gl), and ccTLD filters adapt automatically based on geocoding or a manual country override.
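The per-country adaptation can be illustrated with a small lookup table. This is a sketch: the `NORDIC_SETTINGS` and `settings_for` names are assumptions for illustration, not the actor's internal API, though the `hl`/`gl`/ccTLD values themselves are standard.

```python
# Illustrative mapping of supported countries to Google language (hl),
# region (gl), and ccTLD bias. Names here are assumptions; the hl/gl
# codes and ccTLDs are the standard ones for each country.
NORDIC_SETTINGS = {
    "NO": {"hl": "no", "gl": "no", "cctld": ".no"},
    "SE": {"hl": "sv", "gl": "se", "cctld": ".se"},
    "DK": {"hl": "da", "gl": "dk", "cctld": ".dk"},
    "FI": {"hl": "fi", "gl": "fi", "cctld": ".fi"},
    "IS": {"hl": "is", "gl": "is", "cctld": ".is"},
}

def settings_for(country_iso2, override=None):
    """Resolve search settings, honoring a manual country_override.

    Falls back to Norway for unknown codes (an assumption for this sketch).
    """
    code = (override or country_iso2 or "").upper()
    return NORDIC_SETTINGS.get(code, NORDIC_SETTINGS["NO"])
```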
ToS & legality: Scraping Google HTML may violate their Terms of Service. Use responsibly, at low rates, with proxies as needed, and comply with local laws and site policies. This Actor avoids official Google APIs and parses public HTML only.
What’s new (2025-11-07)
Quality & correctness
- Address cleanup: removes hours/amenities blobs like `Åpen · Stenger 17 · 61 25 13 31 · Henting i butikk · Levering`. Also extracts the address from both `/maps/dir//…` and `/maps/place/…` slugs, picking the richest candidate and preferring those that contain a postcode and city.
- Deduplication: global cross-tile and cross-keyword dedupe by `(name, address)` so the same venue isn't emitted multiple times.
Stability & performance
- Anti-stall work queue: keyword × tile jobs are queued and processed by a bounded worker pool; each keyword respects a global `max_per_keyword` cap across all tiles.
- Backoff & pacing: per-request sleeps, retry/backoff on `429`/`403`, and a gentle HTML fetch cadence to reduce throttling.
- Compact debug logs: high-signal log lines at each major step: `[boot]`, `[geocode]`, `[sched]`, `[tile]`, `[google.lcl]`, `[site.search]`, `[crawl]`, `[addr]`, `[push]`, `[done]`.
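The address cleanup described above can be sketched as a segment filter over the ` · `-separated card text. This is a simplified sketch: the patterns and word lists are illustrative, not the actor's actual rules.

```python
import re

# Illustrative patterns for segments that look like opening hours or
# phone numbers, plus a few amenity labels seen on Nordic result cards.
HOURS_RE = re.compile(
    r"(Åpen|Öppet|Åben|Auki|Opið|Stenger|Stänger|Lukker|Open|Closes)", re.I
)
PHONE_RE = re.compile(r"^\+?[\d\s\-()]{6,}$")
AMENITY_WORDS = {"Henting i butikk", "Levering", "Delivery", "Pickup"}

def clean_address(card_text):
    """Drop hours/phone/amenity segments from ' · '-separated card text."""
    kept = []
    for seg in (s.strip() for s in card_text.split("·")):
        if not seg or HOURS_RE.search(seg) or PHONE_RE.match(seg) or seg in AMENITY_WORDS:
            continue
        kept.append(seg)
    return ", ".join(kept)
```

Fed the blob from the example above together with an address segment, only the address survives.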
Input
```json
{
  "start_address": "Karl Johans gate 1, Oslo, Norway",
  "range_meters": 2000,
  "keywords": ["restaurant", "florist", "bakery"],
  "max_per_keyword": 120,
  "tile_m": 1200,
  "fetch_emails": true,
  "fallback_search": "google",
  "max_pages_per_site": 5,
  "request_timeout_s": 20,
  "concurrency": 6,
  "country_override": null,
  "respect_robots": false,
  "user_agent": null
}
```
Parameters
| Key | Type | Default | Description |
|---|---|---|---|
| `start_address` | string | — | Geocoding center (Nominatim; maps.co fallback). |
| `range_meters` | integer | `2000` | Radius around the center to cover with a hex grid of search points. |
| `keywords` | array[string] | — | Categories/queries searched via Google Local (`tbm=lcl`). |
| `max_per_keyword` | integer | `120` | Hard cap on total items per keyword across all grid tiles; prevents task explosion. |
| `tile_m` | integer | `1200` | Approximate spacing between grid points. Smaller means denser coverage and more requests. |
| `fetch_emails` | boolean | `true` | If true, crawl discovered websites (same registered domain) for emails/phones/socials. |
| `fallback_search` | enum | `"google"` | Fallback site-search provider order; rotates across Google, DuckDuckGo, and Bing. |
| `max_pages_per_site` | integer | `5` | Crawl budget per site. |
| `request_timeout_s` | integer | `20` | HTTP timeout for requests. |
| `concurrency` | integer | `6` | Worker count for keyword × tile jobs, with a separate limiter for site crawling. |
| `country_override` | string | `null` | Force a country ISO2 code (NO/SE/DK/FI/IS) when geocoding is ambiguous. |
| `respect_robots` | boolean | `false` | When true, skip paths disallowed by robots.txt during the site crawl. |
| `user_agent` | string | `null` | Override the default desktop UA if needed. |
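The coverage implied by `range_meters` and `tile_m` can be sketched as a hex lattice of search centers. This is a simplified flat-earth approximation, not the actor's exact geometry: rows are spaced `tile_m·√3/2` apart and every other row is offset by half a tile, which gives hexagonal packing.

```python
import math

def hex_grid(lat, lng, range_m, tile_m):
    """Generate hex-grid centers (lat, lng) covering a radius around a point.

    Simplified sketch: uses a local metres-per-degree approximation
    rather than proper geodesic math.
    """
    points = []
    m_per_deg_lat = 111_320.0                      # metres per degree of latitude
    m_per_deg_lng = m_per_deg_lat * math.cos(math.radians(lat))
    row_step = tile_m * math.sqrt(3) / 2           # vertical spacing of hex rows
    rows = int(range_m // row_step) + 1
    cols = int(range_m // tile_m) + 1
    for r in range(-rows, rows + 1):
        dy = r * row_step
        x_off = (tile_m / 2) if r % 2 else 0.0     # stagger odd rows by half a tile
        for c in range(-cols, cols + 1):
            dx = c * tile_m + x_off
            if math.hypot(dx, dy) <= range_m:      # keep only points inside the radius
                points.append((lat + dy / m_per_deg_lat, lng + dx / m_per_deg_lng))
    return points
```

With the defaults (`range_meters=2000`, `tile_m=1200`) this yields a small handful of centers, which is why shrinking `tile_m` raises request counts quickly.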
Output
Each dataset item is a JSON object like:
```json
{
  "_type": "listing",
  "source": "google_lcl",
  "keyword": "florist",
  "search_center": {"lat": 59.9139, "lng": 10.7522},
  "start_center": {"lat": 59.9139, "lng": 10.7522},
  "distance_m": 340,
  "country": "NO",
  "name": "Blomst AS",
  "address": "Dronningens gate 1, 0152 Oslo",
  "rating": 4.6,
  "reviews": 37,
  "phone": "+4722000000",
  "phones_from_site": ["+4722000000"],
  "gmaps_url": "https://www.google.com/maps/place/...",
  "website": "https://blomst.no/",
  "emails": ["post@blomst.no"],
  "social": {"instagram": ["https://www.instagram.com/blomst/"]},
  "fallback_used": false
}
```
Field notes
- `address`: cleaned to exclude hours/amenities/phones; prefers variants containing postcode and city.
- `distance_m`: if the Maps URL provides coordinates, the distance from the start center; else `null`.
- `phone`: first phone parsed from the local card text, normalized to E.164 for the target country.
- `phones_from_site`: phones harvested from the website, normalized to E.164.
- `emails`: filtered to avoid platform domains (CDNs, analytics, ESPs). Same-domain addresses are prioritized; common freemail is allowed.
- `fallback_used`: `true` if the website came from the fallback web search rather than the local card.
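The E.164 normalization behind `phone` and `phones_from_site` can be sketched as follows. The country calling codes are standard; the function itself is a simplified sketch, not the actor's full rule set (real normalization handles more prefix cases).

```python
# Standard Nordic country calling codes.
DIAL_CODES = {"NO": "47", "SE": "46", "DK": "45", "FI": "358", "IS": "354"}

def to_e164(raw, country):
    """Normalize a raw phone string to E.164 for an ISO2 Nordic country."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    code = DIAL_CODES.get(country.upper())
    if not code or not digits:
        return None
    if raw.strip().startswith("+"):         # already in international form
        return "+" + digits
    if digits.startswith("00" + code):      # 0047... style international prefix
        return "+" + digits[2:]
    return "+" + code + digits.lstrip("0")  # national number, trunk zero dropped
```

For example, a Norwegian card phone `22 00 00 00` normalizes to `+4722000000`, matching the output sample above.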
How it works
1. Geocoding: resolves `start_address` to lat/lng and a rich `address` dict used to set `gl`/`hl` and ccTLD constraints.
2. Hex grid: builds a compact hexagon grid of points covering `range_meters`.
3. Local search: for each `(keyword, grid point)` pair, the actor fetches `tbm=lcl` HTML pages in steps of 10 results, with backoff on throttling.
4. Parsing: extracts name, Maps URL, rating, reviews, and a cleaned address from three sources:
   - card text (with category/headline and hours removed)
   - the `/maps/dir//…` address segment
   - the `/maps/place/…` slug when it looks address-like
   The richest candidate wins, preferring ones with postcode + city.
5. Website: uses visible "Website" buttons and, if missing, performs a ccTLD-biased fallback search.
6. Crawl (optional): up to `max_pages_per_site` pages, prioritizing `/kontakt`, `/contact`, `/about`, etc.
   - Emails from visible text, `mailto:` links, JSON-LD, meta tags, attributes, Cloudflare `data-cfemail`, JavaScript string tricks, and base64.
   - Phones normalized to E.164 with country rules; times/prices rejected.
   - Social links captured when present.
7. Deduplication: `(name, address)` is used as a global key across all tiles and keywords.
8. Throttling/backoff: per-request pacing, retry/backoff on `429`/`403`, and modest default concurrency.
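The global deduplication step can be sketched as a normalized key plus a shared seen-set. The helper names here are illustrative, not the actor's internals.

```python
def dedupe_key(name, address):
    """Case- and whitespace-normalized (name, address) pair."""
    def norm(s):
        return " ".join(s.lower().split())
    return (norm(name), norm(address))

seen = set()  # shared across all tiles and keywords

def should_push(item):
    """Return True only the first time a (name, address) pair is seen."""
    key = dedupe_key(item.get("name", ""), item.get("address", ""))
    if key in seen:
        return False
    seen.add(key)
    return True
```

Because the key is normalized, the same venue found in two overlapping tiles, or with stray whitespace differences, is emitted only once.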
Debug logging
The actor writes compact, high-signal logs:
- [boot] startup, input summary, proxy usage
- [geocode] geocoding provider progress
- [sched] number of tiles, number of keywords, job queue size, workers
- [tile] which keyword/tile is executing
- [google.lcl] fetch/page status, result counts, backoffs
- [site.search] fallback site search attempts/hits
- [crawl] per-site crawl start and summary (emails/phones found)
- [addr] address pipeline result
- [push] each pushed item with short summary
- [done] completion
Tip: control verbosity via the Apify log level, e.g. set `APIFY_LOG_LEVEL=DEBUG` for detailed traces.
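Because the tags are stable bracketed prefixes, a run's log output can be summarized with a quick count per tag. A sketch over the tag format listed above:

```python
import re
from collections import Counter

# The ten tags documented above, matched as bracketed prefixes.
TAG_RE = re.compile(
    r"\[(boot|geocode|sched|tile|google\.lcl|site\.search|crawl|addr|push|done)\]"
)

def tag_counts(log_text):
    """Count occurrences of each [tag] in a run's log output."""
    return Counter(m.group(1) for m in TAG_RE.finditer(log_text))
```

A healthy run shows ongoing `tile`, `google.lcl`, and `push` counts; a stalled one shows `sched` with nothing after it.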
Performance tips
- Keep `concurrency` modest; prefer residential proxies if you see many `429`/`403` responses.
- For large areas, increase `tile_m` before increasing `range_meters`.
- Use `max_per_keyword` to prevent runaway item counts when keywords are broad.
- Set `fetch_emails=false` for fast discovery runs; re-crawl email data in a second pass.
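The retry behavior on `429`/`403` follows the common exponential-backoff-with-jitter pattern, sketched below. The parameter values and function shape are illustrative, not the actor's actual defaults.

```python
import random
import time

def fetch_with_backoff(fetch, url, max_retries=4, base_delay=1.0):
    """Retry `fetch(url)` on 429/403 with exponential backoff plus jitter.

    `fetch` is any callable returning an object with a `status` attribute.
    """
    resp = fetch(url)
    for attempt in range(max_retries):
        if resp.status not in (429, 403):
            return resp
        # 1x, 2x, 4x ... base_delay, plus jitter to avoid synchronized retries
        time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
        resp = fetch(url)
    return resp
```

Jitter matters at `concurrency > 1`: without it, throttled workers all retry at the same instant and get throttled again.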
Troubleshooting
Duplicates in output
- Ensure you’re on the current build with global `(name, address)` dedupe across tiles and keywords.
Addresses include hours or are missing postcode/city
- The new address resolver strips hours/amenities and merges candidates from the card text and Maps URL slugs, preferring those with postcode + city.
Actor “stalls” with many keywords
- The job queue and per-keyword caps prevent stalls. Check the logs for `[sched] jobs queued=… workers=…` and ongoing `[google.lcl]` or `[crawl]` lines.
- If logs are quiet, raise `APIFY_LOG_LEVEL` to `DEBUG` temporarily to observe progress.
Few or no emails
- Many sites hide emails; try increasing `max_pages_per_site` slightly.
- Ensure `fetch_emails=true`.
- Consider running during local business hours, when some sites expose contact widgets.
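One obfuscation scheme the crawl handles, Cloudflare's `data-cfemail` attribute, is a simple XOR encoding: the first hex byte is the key and every following byte is the email XORed with it. The scheme is well known; the decoder below is a self-contained sketch.

```python
def decode_cfemail(cfemail):
    """Decode a Cloudflare data-cfemail hex string into a plain email.

    The first byte is the XOR key; each subsequent byte is one
    character of the address XORed with that key.
    """
    key = int(cfemail[:2], 16)
    return "".join(
        chr(int(cfemail[i:i + 2], 16) ^ key) for i in range(2, len(cfemail), 2)
    )
```

Such strings appear in markup like `<a data-cfemail="…">[email protected]</a>`, so a site that "has no emails" in the visible text may still yield addresses this way.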
Roadmap / future improvements
- Deeper Maps parsing: structured card parsing for more stable address and coordinates extraction.
- Entity resolution: fuzzy dedupe across alternate names and addresses.
- Smart pagination: adaptive stop rules based on per-keyword coverage quality.
- Site crawl heuristics: sitemap discovery and targeted link scoring for contact pages.
- Language models for email extraction: context-aware extraction where regex/DOM misses.
- Rate-aware scheduler: dynamic concurrency based on recent throttle signals.
- Optional CSV/Parquet export: direct tabular outputs with schema validation.
Notes & tips
- Blocking: Google may throttle or redirect. Use Apify proxy with appropriate pools and keep concurrency conservative.
- Languages: parsing handles multiple container patterns and Nordic website-button labels (`nettside`, `webbplats`, etc.).
- Email quality: filters out ESP/CDN/analytics/platform domains; allows common freemail.
- Robots: enable `respect_robots` when policy requires it; note that some sites block generic crawler paths.
- Legal: always ensure your use complies with local laws and target site terms.
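The email-quality filtering described above can be sketched as a domain blocklist plus same-domain-first ordering. The blocklist here is illustrative; the actor's real list is larger.

```python
# Illustrative platform/ESP/CDN blocklist (the actor's real list is larger).
PLATFORM_DOMAINS = {
    "sentry.io", "wixpress.com", "mailchimp.com",
    "google-analytics.com", "cloudflare.com",
}

def filter_emails(emails, site_domain):
    """Drop platform-domain emails and list same-domain addresses first."""
    kept = [
        e for e in emails
        if e.rsplit("@", 1)[-1].lower() not in PLATFORM_DOMAINS
    ]
    # Stable sort: same-domain addresses (key False) come before freemail/others.
    return sorted(
        kept, key=lambda e: e.rsplit("@", 1)[-1].lower() != site_domain.lower()
    )
```

Freemail addresses (Gmail, Outlook, etc.) pass through because small Nordic businesses frequently use them as their only contact.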
Supported & planned regions
| Region | Status | Details | Link |
|---|---|---|---|
| Nordics | Optimized | Last optimized: 2025-11-07 (NO/SE/DK/FI/IS) | https://apify.com/odaudlegur/scoutr-nordics-google-maps-scraper |
| Western EU | Planned | — | — |
| Eastern EU | Planned | — | — |
| North America | Not started | — | — |
| South America | Not started | — | — |
| East/SE Asia | Not started | — | — |
| Middle East | Not started | — | — |
| Africa | Not started | — | — |
| Oceania | Not started | — | — |
Create an issue if you’d like your country prioritized.
Changelog
- 2025-11-07
  - Address cleanup to remove hours/amenities/inline phones from address text.
  - Address enrichment from `/maps/dir//…` and `/maps/place/…` slugs; prefer postcode + city.
  - Global `(name, address)` dedupe across tiles and keywords.
  - Anti-stall queue with per-keyword caps; improved backoff on throttling.
  - Compact debug logs per major step.
- 2025-11-04
  - Nordic tuning for language/region and ccTLD-biased fallback search.
  - Email filtering improvements and E.164 phone normalization.
Disclaimer & License
This Apify Actor is provided “as is”, without warranty of any kind — express or implied — including but not limited to the warranties of merchantability, fitness for a particular purpose, and non-infringement. Please follow local laws and do not use for malicious purposes.
ToS & legality (Reminder): Scraping Google HTML may violate their Terms of Service. Use responsibly, at low rates, with proxies if needed, and comply with local laws and site policies. This Actor avoids official Google APIs and parses public HTML only.
© 2025 SLSH. All rights reserved. Copying or modifying the source code is prohibited.