SCOUTR Nordics (Google Maps Scraper)
Specialized Google Maps scraper for the Nordic countries. Geocodes a start address, finds businesses within a radius, and extracts name, address, website, and phone. Visits each website to fetch real contact information. Support for other countries will be added in separate Actors once the code is fully optimized.
An Apify Actor that:
- Geocodes a start address (Nominatim, with maps.co fallback + caching and safety guards).
- Searches Google Local results (`tbm=lcl`) for your keywords across a hex grid of centers within a given radius.
- Extracts name, address, rating, reviews, phone, website, and Maps URL from local result cards.
- If no website is present, performs a fallback web search (Google with DuckDuckGo/Bing fallback) constrained to the country’s ccTLD.
- Crawls the website (same registered domain, a few pages) to collect emails, additional phones, and social links when enabled.
- Normalizes phones to E.164 for Nordic countries and filters out junk emails/platform domains.
- Adds compact debug logs so you can tell if it’s actually working instead of silently hanging.
Optimized for the Nordic countries: Norway, Sweden, Denmark, Finland, and Iceland. Language (`hl`), region (`gl`), and ccTLD filters adapt automatically based on geocoding or a manual country override.
ToS & legality: Scraping Google HTML may violate their Terms of Service. Use responsibly, at low rates, with proxies as needed, and comply with local laws and site policies. This Actor avoids official Google APIs and parses public HTML only.
What’s new (2025-11-13)
Quality & correctness
- Single record per venue: each `(name, address)` combination is emitted once per keyword. There is no longer a “basic record” followed by a second “enriched record” in the dataset.
- Global deduplication: cross-tile, cross-keyword dedupe by `(name, address)`, so the same venue isn’t emitted multiple times when it appears in overlapping tiles.
- Keyword buckets & expansion: you can now use broad category labels like `"Food & Drinks"` or `"Retail & Stores"`; the actor expands them into multiple concrete search terms (e.g. `restaurant`, `bakery`, `café`, etc.) and then deduplicates the results (see the sketch after this list).
- Distance fallback (optional): `enable_distance_fallback` can estimate `distance_m` by geocoding the listing address when the Maps URL doesn’t expose coordinates.
- Address enrichment with caching: address-based geocoding (for enrichment and distance fallback) is cached in a TTL cache, so the same address is only geocoded once per run, improving consistency and speed.
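Below is a minimal sketch of how such bucket expansion can work. The bucket names and the terms inside them are illustrative assumptions; the Actor’s real bucket lists are internal.

```python
# Illustrative sketch of keyword bucket expansion.
# BUCKETS contents are assumptions, not the Actor's real lists.
BUCKETS = {
    "food & drinks": ["restaurant", "bakery", "café", "bar", "pub", "pizzeria"],
    "retail & stores": ["supermarket", "clothing store", "bookstore", "kiosk"],
}

def expand_keywords(keywords: list[str]) -> list[str]:
    """Expand bucket labels into concrete queries, then dedupe (order preserved)."""
    expanded: list[str] = []
    for kw in keywords:
        expanded.extend(BUCKETS.get(kw.strip().lower(), [kw]))
    return list(dict.fromkeys(expanded))  # dedupe, keeping first-seen order

print(expand_keywords(["Food & Drinks", "florist"]))
# ['restaurant', 'bakery', 'café', 'bar', 'pub', 'pizzeria', 'florist']
```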
Stability & performance
- Prime tile execution & early stop: the actor processes all keywords on the first grid point (page 0) before starting the worker pool. If no results are found for any keyword on that first tile, the run ends early with an empty dataset instead of wasting time on a hopeless grid.
- Anti-stall work queue: keyword × tile jobs are queued and processed by a bounded worker pool; each keyword respects a global `max_per_keyword` cap across all tiles.
- Global Google rate limiting: Google Local requests are funneled through a shared gate + token bucket, so multiple tiles/keywords don’t collectively hammer Google and trigger hard throttling (see the sketch after this list).
- Backoff & pacing: per-request sleeps, retry/backoff on `429`/`403`, and a gentle HTML fetch cadence to reduce throttling.
- Timeout guards:
  - Hard timeouts around tile processing (worker jobs) and dataset writes to prevent long-running stalls on a single tile or record.
  - Timeboxed geocoding and address enrichment so slow geocoders cannot freeze a page.
- Geocoding hard-fail guard: after repeated Nominatim/maps.co failures, the actor disables further geocoding for the rest of the run, so the rest of the pipeline can continue.
- Compact debug logs: high-signal log lines at each major step: `[boot]`, `[geocode]`, `[sched]`, `[prime]`, `[tile]`, `[google.lcl]`, `[site.search]`, `[crawl]`, `[addr]`, `[push]`, `[done]`.
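For intuition, the shared gate + token bucket pattern looks roughly like the asyncio sketch below. This is a generic illustration of the technique, not the Actor’s actual code.

```python
import asyncio
import time

class TokenBucket:
    """Generic token bucket: ~`rate` requests/second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate, self.capacity = rate, float(capacity)
        self.tokens = float(capacity)
        self.updated = time.monotonic()
        self.gate = asyncio.Lock()  # shared gate: workers pass through one at a time

    async def acquire(self) -> None:
        async with self.gate:
            while True:
                now = time.monotonic()
                self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
                self.updated = now
                if self.tokens >= 1.0:
                    self.tokens -= 1.0
                    return
                await asyncio.sleep((1.0 - self.tokens) / self.rate)

# All keyword x tile workers would share one bucket, e.g.:
#   bucket = TokenBucket(rate=0.5, capacity=2)  # ~1 request every 2 seconds
#   await bucket.acquire()  # before each Google Local fetch
```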
Input
{"start_address": "Karl Johans gate 1, Oslo, Norway","range_meters": 2000,"keywords": ["restaurant", "florist", "bakery"],"max_per_keyword": 120,"tile_m": 1200,"fetch_emails": true,"fallback_search": "google","max_pages_per_site": 5,"request_timeout_s": 20,"concurrency": 6,"country_override": null,"respect_robots": false,"user_agent": null,"enable_distance_fallback": false}
Parameters
| Key | Type | Default | Description |
|---|---|---|---|
| `start_address` | string | — | Geocoding center (Nominatim; maps.co fallback). |
| `range_meters` | integer | 2000 | Radius around the center to cover with a hex grid of search points. |
| `keywords` | array[string] | — | Categories/queries searched via Google Local (`tbm=lcl`). Can be specific terms (`"florist"`, `"dentist"`) or broad buckets like `"Food & Drinks"`. |
| `max_per_keyword` | integer | 120 | Hard cap on total items per keyword across all grid tiles. Prevents task explosion. |
| `tile_m` | integer | 1200 | Approximate spacing between grid points. Smaller means denser coverage and more requests. |
| `fetch_emails` | boolean | true | If true, crawl discovered websites (same registered domain) for emails/phones/socials. |
| `fallback_search` | enum | `"google"` | Fallback site-search provider order; rotates across Google, DuckDuckGo, and Bing. |
| `max_pages_per_site` | integer | 5 | Crawl budget per site. |
| `request_timeout_s` | integer | 20 | HTTP timeout for requests, in seconds. |
| `concurrency` | integer | 6 | Worker count for keyword × tile jobs, with a separate limiter for site crawling. |
| `country_override` | string | null | Force country ISO2 (NO/SE/DK/FI/IS) when geocoding is ambiguous. |
| `respect_robots` | boolean | false | When true, skip paths disallowed by robots.txt during the site crawl. |
| `user_agent` | string | null | Override the default desktop UA if needed. |
| `enable_distance_fallback` | boolean | false | When true, geocodes listing addresses to estimate `distance_m` if Maps coordinates are missing. |
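To start runs programmatically, something like the following should work with the official Apify Python client. The Actor ID is inferred from the store URL later in this README; verify it (and your token) in the Apify Console.

```python
from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Actor ID inferred from the store URL in this README; verify before use.
run = client.actor("odaudlegur/scoutr-nordics-google-maps-scraper").call(
    run_input={
        "start_address": "Karl Johans gate 1, Oslo, Norway",
        "range_meters": 2000,
        "keywords": ["florist"],
        "fetch_emails": True,
    }
)

# Iterate the finished run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item.get("name"), item.get("website"), item.get("emails"))
```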
Output
Each dataset item is a JSON object like:
{"_type": "listing","source": "google_lcl","keyword": "florist","search_center": {"lat": 59.9139, "lng": 10.7522},"start_center": {"lat": 59.9139, "lng": 10.7522},"distance_m": 340,"country": "NO","name": "Blomst AS","address": "Dronningens gate 1, 0152 Oslo","rating": 4.6,"reviews": 37,"phone": "+4722000000","phones_from_site": ["+4722000000"],"gmaps_url": "https://www.google.com/maps/place/...","maps_url": "https://www.google.com/maps/place/...","website": "https://blomst.no/","emails": ["post@blomst.no"],"social": {"instagram": ["https://www.instagram.com/blomst/"]},"fallback_used": false,"lat": 59.9139,"lon": 10.7522,"start_address": "Karl Johans gate 1, Oslo, Norway","range_meters": 2000,"keywords": ["restaurant", "florist", "bakery"]}
Field notes
- `address`: cleaned to exclude hours/amenities/phones; prefers variants containing postcode and city.
- `distance_m` (see the sketch after this list):
  - Primary: distance from the start center when the Maps URL exposes coordinates.
  - Optional fallback: when `enable_distance_fallback=true`, attempts to geocode the listing address and compute the distance from the start center.
  - May be `null` when neither coordinate source is available or geocoding fails.
- `phone`: the first phone parsed from the local card text, normalized to E.164 for the target country.
- `phones_from_site`: phones harvested from the website, normalized to E.164.
- `emails`: filtered to avoid platforms (CDNs, analytics, ESPs). Same-domain addresses are prioritized; common freemail is allowed.
- `fallback_used`: `true` if the website came from the fallback web search rather than the local card.
- Uniqueness: per keyword, each `(name, address)` pair appears at most once in the dataset, even if it shows up in multiple tiles.
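For reference, `distance_m` is a straight-line (great-circle) distance. A standard haversine computation like the sketch below reproduces the idea; whether the Actor uses exactly this formula is an assumption.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two WGS84 points."""
    r = 6_371_000.0  # mean Earth radius in meters
    dp, dl = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dp / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dl / 2) ** 2
    return 2 * r * asin(sqrt(a))

# Start center from the example output vs. a hypothetical listing nearby:
print(round(haversine_m(59.9139, 10.7522, 59.9108, 10.7465)))  # ~470 m
```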
How it works
- Geocoding: resolves `start_address` to lat/lng and a rich `address` dict used to set `gl`/`hl` and ccTLD constraints.
- Keyword expansion: the `keywords` list is normalized and expanded. Broad buckets like `"Food & Drinks"` or `"Retail & Stores"` fan out into many concrete queries (e.g. `restaurant`, `bakery`, `café`, `supermarket`), which are then deduped.
- Hex grid: builds a compact hexagonal grid of points covering `range_meters` (see the sketch after this list).
- Prime pass for early results: the first grid point is processed for all keywords (page 0) before spinning up the worker pool, so the dataset gets initial rows quickly. If nothing is found on this prime tile, the actor stops early with an empty dataset.
- Local search: for each `(keyword, grid point)` pair, the actor fetches `tbm=lcl` HTML pages in steps of 10 results, with backoff on throttling.
- Parsing: extracts name, Maps URL, rating, reviews, and a cleaned address using three sources:
  - the card text (with category/headline and hours removed),
  - the `/maps/dir//…` address segment,
  - the `/maps/place/…` slug when it looks address-like.
  It then picks the richest candidate, preferring ones with postcode + city.
- Website: uses visible “Website” buttons and, if missing, performs a ccTLD-biased fallback search.
- Crawl (optional): visits up to `max_pages_per_site` pages, prioritizing `/kontakt`, `/contact`, `/about`, etc.
  - Emails are taken from visible text, `mailto:` links, JSON-LD, meta tags, attributes, Cloudflare `data-cfemail`, JavaScript string tricks, and base64.
  - Phones are normalized to E.164 with country rules; times and prices are rejected.
  - Social links are captured if present.
- Distance fallback (optional): when `enable_distance_fallback` is true and Maps coordinates are missing, the actor geocodes the listing address and recomputes `distance_m`.
- Deduplication: `(name, address)` is used as a global key across all tiles and keywords; only one enriched record per venue per keyword is emitted.
- Throttling/backoff: per-request pacing, retry/backoff on `429`/`403`, a global token bucket for Google Local, and modest default concurrency.
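The hex grid step can be pictured with simple offset geometry. The sketch below is a rough illustration; the spacing math and the local meters-to-degrees conversion are assumptions, not the Actor’s exact code.

```python
from math import cos, radians, sin

def hex_grid(lat: float, lng: float, range_m: float, tile_m: float):
    """Yield (lat, lng) centers on a hexagonal lattice covering a circle of radius range_m."""
    m_per_deg_lat = 111_320.0                       # meters per degree of latitude
    m_per_deg_lng = 111_320.0 * cos(radians(lat))   # shrinks with latitude
    row_pitch = tile_m * sin(radians(60))           # vertical spacing of hex rows
    rows, cols = int(range_m / row_pitch) + 1, int(range_m / tile_m) + 1
    for r in range(-rows, rows + 1):
        y = r * row_pitch                           # north offset, meters
        x_shift = tile_m / 2 if r % 2 else 0.0      # stagger alternate rows
        for c in range(-cols, cols + 1):
            x = c * tile_m + x_shift                # east offset, meters
            if x * x + y * y <= range_m * range_m:  # keep points inside the circle
                yield lat + y / m_per_deg_lat, lng + x / m_per_deg_lng

print(len(list(hex_grid(59.9139, 10.7522, range_m=2000, tile_m=1200))), "search centers")
```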
Debug logging
The actor writes compact, high-signal logs:
- `[boot]` startup, input summary, proxy usage
- `[geocode]` geocoding provider progress
- `[sched]` number of tiles, number of keywords, job queue size, workers
- `[prime]` initial prime tile/keyword execution for early dataset results (and early exit when no results exist)
- `[tile]` which keyword/tile is executing
- `[google.lcl]` fetch/page status, result counts, backoffs
- `[site.search]` fallback site search attempts/hits
- `[crawl]` per-site crawl start and summary (emails/phones found)
- `[addr]` address pipeline result
- `[push]` each pushed item with a short summary
- `[done]` completion
Performance tips
- Keep `concurrency` modest; prefer residential proxies if you see many `429`/`403` responses.
- For large areas, increase `tile_m` before increasing `range_meters`.
- Use `max_per_keyword` to prevent runaway item counts when keywords are broad (especially with buckets like `"Food & Drinks"`).
- Set `fetch_emails=false` for fast discovery runs; re-crawl for email data in a second pass.
- If you only care about approximate distance and many listings lack coordinates, consider enabling `enable_distance_fallback`.
FAQ
I’m not technical. How am I supposed to use this?
In simple terms:
1. Pick a starting point. Type a real-world address into `start_address` (for example: “Karl Johans gate 1, Oslo, Norway”). This is the center of your search.
2. Choose how far to search. Set `range_meters` to how far around that point you care about. 2,000–3,000 meters covers a neighborhood or city center; larger values mean more area, more results, and more time.
3. Tell it what you’re looking for. In `keywords`, you can:
   - put specific things: `"restaurant"`, `"florist"`, `"dentist"`, `"kindergarten"`;
   - or use broad groups like `"Food & Drinks"`, `"Retail & Stores"`, `"Health & Beauty"`. The actor automatically expands these into many detailed searches.
4. Decide if you want emails or just places.
   - `fetch_emails = true`: slower, but tries to visit each website to collect emails, phones, and social links.
   - `fetch_emails = false`: much faster; returns basic listing info (name, address, phone, website, etc.).
5. Run the actor and wait. The actor will:
   - geocode your start address,
   - sweep the area tile by tile,
   - collect matching businesses into the dataset.
6. Download your dataset. When it’s done, open the dataset in Apify and export it as JSON, CSV, XLSX, or whatever format you prefer.
What counts as a “keyword”? Can I just say “Food & Drinks”?
Yes.
You have two options:
- Specific keyword. Example: `"florist"` → the actor searches Google for florists near your area.
- Bucket keyword (broad group). Example: `"Food & Drinks"` → the actor internally expands this into many concrete queries like `restaurant`, `bakery`, `café`, `kiosk`, `bar`, `pub`, `pizzeria`, etc., then deduplicates the results.
Using buckets is helpful when you don’t know all the specific labels but you want broad coverage. Just remember: broader buckets = more searches = more time and more results.
Why does it take so long sometimes?
Because the actor is doing a lot of work to avoid getting you blocked and to squeeze out contact data:
- It scans an area, not just a single point. Your `range_meters` is turned into a hexagonal grid of many small “tiles” around your start address. Each tile runs a search for each keyword.
- Google results come in pages of 10. For each tile, the actor walks the results page by page and parses each listing card.
- It has to be polite to Google. The actor:
  - slows down between requests,
  - backs off when it sees “too many requests” or access-denied responses,
  - uses a global rate limiter so multiple workers don’t overload Google at once.
  This means more safety, but no instant gratification.
- Website crawling is heavier than just searching. If `fetch_emails=true`:
  - each site may have several pages visited (`/contact`, `/about`, etc.),
  - the actor reads the HTML and JavaScript, tries to decode obfuscated emails, and normalizes phone numbers,
  - all of this has to respect timeouts so one slow site doesn’t freeze the whole run.
- Safety timeouts and retries. There are time limits around:
  - tile processing,
  - geocoding,
  - website crawling,
  - writing to the dataset.
  These keep things from hanging forever, but they also mean the actor would rather wait a bit and retry than fail immediately.

So if you give it:
- a large radius,
- very broad buckets (`"Food & Drinks"`, `"Retail & Stores"`, `"Events & Hospitality"`),
- `fetch_emails = true`,
- and a high `max_per_keyword`,

…it will happily eat time while collecting a large, enriched lead list.
How do I run it for “maximum data”?
Use settings like this when you want as many enriched leads as possible, not speed:
- `range_meters`: 3,000–7,000 (or more, depending on city size).
- `tile_m`: 1,000–1,500 (denser tiles if you want fewer gaps).
- `keywords`: a mix of specific terms and buckets (e.g. `"Food & Drinks"`, `"Retail & Stores"`, `"Health & Beauty"`).
- `max_per_keyword`: 200–400 (or higher, but be mindful of dataset size).
- `fetch_emails`: `true`.
- `max_pages_per_site`: 5–8 (more pages = a better chance of finding emails).
- `enable_distance_fallback`: `true` if you care about distance and many listings miss coordinates.
- `concurrency`: moderate (e.g. 6–10) with decent proxies to avoid throttling.
This mode is for when you’re building contact/lead lists and are fine with the job taking longer.
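Translated into a concrete input, a maximum-data run might look like this (values picked from the ranges above):

```json
{
  "start_address": "Karl Johans gate 1, Oslo, Norway",
  "range_meters": 5000,
  "tile_m": 1200,
  "keywords": ["Food & Drinks", "Retail & Stores", "Health & Beauty"],
  "max_per_keyword": 300,
  "fetch_emails": true,
  "max_pages_per_site": 6,
  "enable_distance_fallback": true,
  "concurrency": 8
}
```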
How do I run it for a “fast scan”?
If you just want a quick overview of places, try:
- `range_meters`: 1,000–2,000.
- `tile_m`: 500–2,000 (fewer tiles).
- `keywords`: more focused (`"restaurant"`, `"café"`, `"florist"`); avoid huge buckets.
- `max_per_keyword`: 50–100.
- `fetch_emails`: `false` (this is the biggest speed boost).
- `max_pages_per_site`: ignored if `fetch_emails=false`.
- `enable_distance_fallback`: `false` (skip the extra geocoding).
- `concurrency`: 4–6.
You’ll still get name, address, phone, website, Maps URL, rating, reviews, but you skip the heavier website crawling work.
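As a concrete input, a fast scan might look like this (values picked from the ranges above):

```json
{
  "start_address": "Karl Johans gate 1, Oslo, Norway",
  "range_meters": 1500,
  "tile_m": 1500,
  "keywords": ["restaurant", "café", "florist"],
  "max_per_keyword": 80,
  "fetch_emails": false,
  "enable_distance_fallback": false,
  "concurrency": 4
}
```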
Why does it sometimes finish quickly with no results?
Two common reasons:
1. The prime tile found nothing. The actor first tries all keywords at the first grid point. If literally nothing is returned for any of them, it assumes that either:
   - the keywords are not a good match for that area, or
   - there is some issue with the search that will repeat everywhere.
   In that case it stops early and returns an empty dataset instead of pointlessly scanning the rest of the grid.
2. Your keywords are very niche or misspelled. If you use very specific or misspelled keywords, Google may have few or no local results. Try:
   - more general terms (`"restaurant"` instead of `"organic vegan fine dining"`), or
   - bucket labels like `"Food & Drinks"`.
Troubleshooting
No or very late results in the dataset
- The actor runs a prime pass on the first grid point for all keywords (page 0) before the worker pool starts. You should see early results from that tile; if not, check the logs around `[prime]`, `[tile]`, and `[push]`.
- If the prime tile returns zero results for all keywords, the actor ends early with an empty dataset by design.
- Verify the dataset view is filtered to the current run.
Duplicates in output
- Current builds emit one enriched record per `(name, address, keyword)`.
- If you still see identical rows, check for:
  - slight address variations (e.g. a different postcode or formatting), as illustrated in the sketch below;
  - different keywords targeting the same venue (this is expected: uniqueness is per keyword, not global).
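The sketch below shows why near-identical addresses survive dedupe: the key is the literal `(name, address)` text, so any textual difference yields a new key. The normalization shown is an assumption about the general approach, not the Actor’s exact rules.

```python
import re

def dedupe_key(name: str, address: str) -> tuple[str, str]:
    """Whitespace/case-normalized (name, address) key; different text stays distinct."""
    norm = lambda s: re.sub(r"\s+", " ", s).strip().lower()
    return norm(name), norm(address)

seen: set[tuple[str, str]] = set()
for name, addr in [
    ("Blomst AS", "Dronningens gate 1, 0152 Oslo"),
    ("Blomst AS", "Dronningens gate 1, Oslo"),  # same venue, postcode missing
]:
    key = dedupe_key(name, addr)
    print(key, "-> duplicate" if key in seen else "-> new row")
    seen.add(key)
# The second entry becomes a separate row because its address text differs.
```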
Addresses include hours or are missing postcode/city
- The address resolver strips hours/amenities and merges candidates from the card text and Maps URL slugs, preferring those with postcode + city.
- If the card address is very minimal, enabling `enable_distance_fallback` may also improve downstream data quality via geocoding enrichment.
Actor “stalls” with many keywords
- The job queue and per-keyword caps prevent stalls. Check the logs for `[sched] jobs queued=… workers=…` and ongoing `[google.lcl]` or `[crawl]` lines.
- If the logs are quiet, raise `APIFY_LOG_LEVEL` to `DEBUG` temporarily to observe progress.
Few or no emails
- Many sites hide emails; try increasing `max_pages_per_site` slightly. Obfuscated emails are often still decodable (see the sketch below).
- Ensure `fetch_emails=true`.
- Consider running during local business hours when some sites expose contact widgets.
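As one example of what “hidden emails” means: Cloudflare’s email obfuscation stores the address in a `data-cfemail` attribute, XOR-encoded with a one-byte key, and decoding it is mechanical:

```python
def decode_cfemail(cfemail: str) -> str:
    """Decode a Cloudflare data-cfemail hex string: byte 0 is the XOR key."""
    data = bytes.fromhex(cfemail)
    key = data[0]
    return bytes(b ^ key for b in data[1:]).decode("utf-8")

# Hypothetical value scraped from <a data-cfemail="...">:
print(decode_cfemail("42322d313602202e2d2f31366c2c2d"))  # -> post@blomst.no
```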
Roadmap / future improvements
- Deeper Maps parsing: structured card parsing for more stable address and coordinates extraction.
- Entity resolution: fuzzy dedupe across alternate names and addresses.
- Smart pagination: adaptive stop rules based on per-keyword coverage quality.
- Site crawl heuristics: sitemap discovery and targeted link scoring for contact pages.
- Language models for email extraction: context-aware extraction where regex/DOM misses.
- Rate-aware scheduler: dynamic concurrency based on recent throttle signals.
- Optional CSV/Parquet export: direct tabular outputs with schema validation.
Notes & tips
- Blocking: Google may throttle or redirect. Use Apify Proxy with appropriate pools and keep concurrency conservative.
- Languages: parsing handles multiple container patterns and Nordic website-button labels (`nettside`, `webbplats`, etc.).
- Email quality: filters out ESP/CDN/analytics/platform domains; allows common freemail.
- Robots: enable `respect_robots` when policy requires it (see the sketch after this list); note that some sites block generic crawler paths.
- Legal: always ensure your use complies with local laws and target-site terms.
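When `respect_robots` is enabled, the check is conceptually a standard robots.txt lookup. Python’s standard library covers the idea; this is a minimal sketch of the concept (with a hypothetical site), not the Actor’s code.

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://blomst.no/robots.txt")  # hypothetical target site
rp.read()

ua = "Mozilla/5.0"  # whichever user_agent the run is configured with
for url in ("https://blomst.no/kontakt", "https://blomst.no/admin"):
    print(url, "->", "allowed" if rp.can_fetch(ua, url) else "disallowed by robots.txt")
```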
Supported & planned regions
| Region | Status | Details | Link |
|---|---|---|---|
| Nordics | Optimized | Last optimized: 2025-11-13 (NO/SE/DK/FI/IS) | https://apify.com/odaudlegur/scoutr-nordics-google-maps-scraper |
| Western EU | Planned | — | — |
| Eastern EU | Planned | — | — |
| North America | Not started | — | — |
| South America | Not started | — | — |
| East/SE Asia | Not started | — | — |
| Middle East | Not started | — | — |
| Africa | Not started | — | — |
| Oceania | Not started | — | — |
Create an issue if you’d like your country prioritized.
Changelog
- 2025-11-13
  - Added keyword bucket expansion (e.g. `"Food & Drinks"`, `"Retail & Stores"`) that fans out into multiple concrete queries.
  - Introduced cached, timeboxed geocoding with a TTL cache and a hard-fail guard to keep runs alive under provider throttling.
  - Added global Google Local rate limiting (shared gate + token bucket) and stricter job-level timeouts around tile processing, geocoding, and dataset writes.
  - Prime tile now runs all keywords on the first grid point and ends early if no results are found for any keyword.
  - Improved logging around `[push]`, distance fallback, and geocoding; added a default maps.co API key for smoother fallback behavior.
  - Added the FAQ explaining non-technical usage and speed vs. data tradeoffs.
- 2025-11-11
  - Single enriched record per `(name, address, keyword)`; removed separate “basic then enriched” pushes to avoid duplicates.
  - Prime tile/keyword processed before the worker pool to ensure early dataset results.
  - Added `enable_distance_fallback` to optionally geocode listing addresses for distance estimation.
  - Added hard timeouts around tile processing and dataset writes; minor logging improvements (`[prime]`, clearer `[push]`).
- 2025-11-07
  - Address cleanup to remove hours/amenities/inline phones from address text.
  - Address enrichment from `/maps/dir//…` and `/maps/place/…` slugs; prefer postcode + city.
  - Global `(name, address)` dedupe across tiles and keywords.
  - Anti-stall queue with per-keyword caps; improved backoff on throttling.
  - Compact debug logs per major step.
- 2025-11-04
  - Nordic tuning for language/region and ccTLD-biased fallback search.
  - Email filtering improvements and E.164 phone normalization.
Disclaimer & License
This Apify Actor is provided “as is”, without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, and non-infringement. Follow local laws, do not use it for malicious purposes, and do not use this code to spam.
ToS & legality (Reminder): Scraping Google HTML may violate their Terms of Service. Use responsibly, at low rates, with proxies if needed, and comply with local laws and site policies. This Actor avoids official Google APIs and parses public HTML only.
© 2025 SLSH. All rights reserved. Copying or modifying the source code is prohibited.


