SERPKey Nordics
Under maintenance

Pricing

from $1.00 / 1,000 results

Retrieve latest contact information (email/phone/address) by submitting a list of business names. Optimized for Nordic Countries.

Developer

SLASH

Maintained by Community

SERPKey

An Apify Actor that:

  1. Crawls websites to extract contact information from any URL you provide.
  2. Collects emails, phone numbers, and social media profiles from visible text, meta tags, JSON-LD, and obfuscated sources.
  3. Normalizes phone numbers to E.164 format for Scandinavian countries (Norway, Sweden, Denmark, Finland, Iceland).
  4. Filters junk emails from platforms, CDNs, analytics, and ESPs while prioritizing domain-matched emails.
  5. Detects social media links across Facebook, Instagram, LinkedIn, X/Twitter, YouTube, TikTok, and Pinterest.
  6. Optionally extracts listing items from directory-like pages (e.g., member lists, restaurant directories).
  7. Crawls same-domain pages with smart prioritization for contact/about pages.

Optimized for the Nordic countries: Norway, Sweden, Denmark, Finland, and Iceland. Phone patterns, language keywords, and email providers are tuned for Nordic websites.

Privacy & legality: Always respect website terms of service and robots.txt directives. Use responsibly and comply with local data protection laws (GDPR, etc.). This Actor crawls public HTML only.


What's new (2026-01-15)

Quality & correctness

  • Multi-source contact extraction: collects emails from mailto links, visible text, JSON-LD schemas, meta tags, and JavaScript obfuscation patterns.
  • Scandinavian phone optimization: automatically detects and normalizes Norwegian (+47), Swedish (+46), Danish (+45), Finnish (+358), and Icelandic (+354) phone numbers.
  • Smart social media detection: finds social profiles using JSON-LD sameAs, meta tags, anchor attributes, and relaxed brand filtering to catch more profiles.
  • Email quality filtering: removes analytics/CDN/ESP domains, bans wixpress.com and placeholder domains, prioritizes domain-matched emails.
  • Listing extraction: optionally detects and extracts same-domain links with meaningful text for directory-style pages.

Stability & performance

  • Cached URL parsing: uses an LRU cache to avoid re-parsing the same URLs, improving performance on multi-page crawls.
  • Robots.txt respect (optional): parses and respects robots.txt when enabled.
  • Smart page prioritization: prioritizes /kontakt, /contact, /om-oss, /about pages using Scandinavian keywords.
  • Timeout guards: configurable HTTP timeouts with sensible defaults.
  • Deduplication: emails and phones are deduplicated per site.
  • Compact logging: clear debug logs at each crawl stage for transparency.
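
The cached URL parsing mentioned above can be approximated with Python's functools.lru_cache. This is a minimal sketch, not the Actor's actual code; the helper names parsed_host and same_site and the cache size are assumptions made for illustration.

```python
from functools import lru_cache
from urllib.parse import urlparse


@lru_cache(maxsize=4096)  # cache size is an arbitrary, assumed value
def parsed_host(url: str) -> str:
    """Return the lowercase host of a URL, without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def same_site(url_a: str, url_b: str) -> bool:
    """Rough same-domain check built on the cached host lookup."""
    return parsed_host(url_a) == parsed_host(url_b)
```

Repeated calls with the same URL string are served from the cache, which matters when the same links appear on every page of a site (navigation, footer).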

Input

{
  "start_urls": [
    {"url": "https://goderom.no"}
  ],
  "max_pages_per_site": 6,
  "timeout_seconds": 30,
  "respect_robots_txt": false,
  "extract_listings": false,
  "strict_brand_filter": false
}

Parameters

Key | Type | Default | Description
start_urls | array[object] | (none) | List of URLs to crawl. Each item should be an object with a url field.
max_pages_per_site | integer | 6 | Maximum number of pages to crawl per root URL on the same domain.
timeout_seconds | integer | 30 | HTTP read timeout in seconds for requests.
respect_robots_txt | boolean | false | If enabled, SERPKey will respect robots.txt when deciding what pages to crawl.
extract_listings | boolean | false | If enabled, extracts listing items from directory-like pages (e.g., member lists, restaurant directories).
strict_brand_filter | boolean | false | If enabled, only accept social media URLs that contain the brand/domain name. Disabled by default for coverage.
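
If you start runs from a script, a small helper can build the input object from plain URL strings and the documented defaults. This is a sketch under assumptions: build_input and DEFAULTS are hypothetical names, and the commented-out lines show how a run might be started with the apify-client package, using a placeholder Actor ID because this README does not state one.

```python
from typing import Any

# Defaults copied from the parameter table above.
DEFAULTS: dict[str, Any] = {
    "max_pages_per_site": 6,
    "timeout_seconds": 30,
    "respect_robots_txt": False,
    "extract_listings": False,
    "strict_brand_filter": False,
}


def build_input(urls: list[str], **overrides: Any) -> dict[str, Any]:
    """Build an Actor input dict from URL strings plus optional overrides."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    if not urls:
        raise ValueError("start_urls must contain at least one URL")
    return {"start_urls": [{"url": u} for u in urls], **DEFAULTS, **overrides}


run_input = build_input(["https://goderom.no"], max_pages_per_site=4)

# Starting a run needs the apify-client package and an API token.
# "<ACTOR_ID>" is a placeholder; the real ID is not given in this README.
# from apify_client import ApifyClient
# client = ApifyClient("<YOUR_API_TOKEN>")
# run = client.actor("<ACTOR_ID>").call(run_input=run_input)
# items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
```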

Output

Each dataset item is a JSON object like:

{
  "record_type": "site",
  "root_url": "https://goderom.no",
  "website_details": "ok",
  "pages_crawled": 6,
  "email1": "gro@goderom.no",
  "email2": "post@goderom.no",
  "email3": "n/a",
  "email4": "n/a",
  "email5": "n/a",
  "phone1": "+4722334455",
  "phone2": "+4799887766",
  "phone3": "n/a",
  "social_facebook": "https://www.facebook.com/goderom",
  "social_instagram": "https://www.instagram.com/goderom",
  "social_linkedin": "https://www.linkedin.com/company/goderom",
  "social_x": "n/a",
  "social_youtube": "n/a",
  "social_tiktok": "n/a",
  "social_pinterest": "n/a"
}

When extract_listings is enabled, additional records with record_type: "listing_item" are included:

{
  "record_type": "listing_item",
  "root_url": "https://example.no",
  "item_name": "Medlem: Oslo Blomster AS",
  "item_url": "https://example.no/members/oslo-blomster",
  "item_source_page": "https://example.no/members"
}

Field notes

  • record_type: Either "site" (main contact record) or "listing_item" (extracted directory entry).
  • website_details: "ok" if site is accessible, "unavailable" if unreachable.
  • pages_crawled: Number of pages successfully crawled for this site.
  • email1-5: Up to 5 unique emails found, prioritizing domain-matched emails. Shows "n/a" when fewer than 5 emails exist.
  • phone1-3: Up to 3 unique phone numbers, normalized to E.164 format (e.g., +4722334455). Shows "n/a" when fewer than 3 phones exist.
  • social_*: Social media URLs, cleaned of tracking parameters. Shows "n/a" when not found.
  • Email quality: Filters out ESPs, CDNs, analytics domains, and banned domains (wixpress.com, mysite.com, example.com).
  • Phone normalization: Automatically adds country codes to local numbers based on domain TLD context.
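
Because the numbered fields are padded with "n/a", downstream code usually wants them collapsed into lists. The following is a minimal post-processing sketch based on the field names shown above; collect_contacts is a hypothetical helper, not part of the Actor.

```python
NA = "n/a"


def collect_contacts(record: dict) -> dict:
    """Collapse email1-5, phone1-3 and social_* fields of a "site" record."""
    emails = [record.get(f"email{i}", NA) for i in range(1, 6)]
    phones = [record.get(f"phone{i}", NA) for i in range(1, 4)]
    socials = {
        key.removeprefix("social_"): value
        for key, value in record.items()
        if key.startswith("social_") and value != NA
    }
    return {
        "root_url": record.get("root_url"),
        "emails": [e for e in emails if e != NA],
        "phones": [p for p in phones if p != NA],
        "socials": socials,
    }
```

Records with record_type "listing_item" do not carry these fields, so filter on record_type == "site" before calling the helper.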

How it works

  1. URL normalization: Ensures all URLs have proper schemes (defaults to https://).
  2. Site availability check: Tests if the root URL is accessible before crawling.
  3. Page crawling: Crawls up to max_pages_per_site pages on the same registered domain.
  4. Smart prioritization: Scores links by priority keywords:
    • Norwegian: kontakt, om-oss, personvern, finn-oss
    • Swedish: kontakta, om-oss, integritet, dataskydd
    • Danish: kontakt, om-os, privatlivspolitik
    • Finnish: yhteystiedot, meistä, tietosuoja
    • Icelandic: hafa-samband, um-okkur
    • English: contact, about, privacy
  5. Multi-source email extraction:
    • mailto: links
    • Visible page text with regex patterns
    • JSON-LD schemas (sameAs, organization data)
    • Meta tags
    • Cloudflare data-cfemail decoding
    • JavaScript string patterns
  6. Phone number extraction:
    • tel: links
    • Visible text with Scandinavian phone patterns
    • Meta tags with telephone/phone properties
    • Context-aware normalization (adds +47 to 8-digit Norwegian numbers, etc.)
  7. Social media detection:
    • JSON-LD sameAs arrays
    • Meta property tags (og:url, etc.)
    • Anchor tags with social indicators (classes, aria-labels)
    • Link rel tags
    • Expanded domain matching (fb.com, instagr.am, youtu.be, etc.)
  8. Listing extraction (optional):
    • Identifies same-domain links with meaningful text
    • Filters out generic link text ("read more", "les mer", etc.)
    • Excludes links to files (PDFs, images, documents)
  9. Robots.txt respect (optional): Parses and follows robots.txt disallow rules when enabled.
  10. Deduplication: Emails and phones are deduplicated per site; social links are unique per platform.
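
Step 5 lists Cloudflare data-cfemail decoding. Cloudflare's email obfuscation stores a hex string whose first byte is an XOR key for the remaining bytes, so decoding takes only a few lines. The sketch below illustrates that scheme and is not the Actor's own code; encode_cfemail is a hypothetical inverse helper included only to produce a sample value.

```python
def decode_cfemail(encoded: str) -> str:
    """Decode the value of a Cloudflare data-cfemail attribute."""
    data = bytes.fromhex(encoded)
    key, payload = data[0], data[1:]
    # Every payload byte is XOR-ed with the first byte (the key).
    return bytes(b ^ key for b in payload).decode("utf-8")


def encode_cfemail(email: str, key: int = 0x5A) -> str:
    """Inverse of decode_cfemail, used here only to build example input."""
    return bytes([key] + [b ^ key for b in email.encode("utf-8")]).hex()


sample = encode_cfemail("post@goderom.no")
print(decode_cfemail(sample))
```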

Debug logging

The actor writes clear, actionable logs:

  • Site availability: Whether each root URL is accessible
  • Crawl progress: Page-by-page crawl status with URLs
  • Social detection: When social profiles are found and via which method
  • Contact extraction: Summary of emails and phones found
  • Brand token: Computed brand identifier for social filtering
  • Errors: Clear error messages with exception types

Performance tips

  • Set max_pages_per_site lower (3-4) for faster runs on large sites.
  • Use respect_robots_txt: false for maximum coverage (but respect site policies).
  • Enable extract_listings: true only when scraping directory-style sites.
  • Increase timeout_seconds for slow-responding sites. It is an HTTP read timeout, so it does not help with content that is rendered by JavaScript.
  • For bulk processing, batch URLs in groups of 10-20 per run.
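
The batching tip can be scripted. This sketch splits a long URL list into start_urls batches of a chosen size; batched_start_urls is a hypothetical helper name, and the default of 20 simply follows the upper end of the range suggested above.

```python
from typing import Iterator


def batched_start_urls(urls: list[str], size: int = 20) -> Iterator[list[dict]]:
    """Yield start_urls lists containing at most `size` entries each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield [{"url": u} for u in urls[start:start + size]]


all_urls = [f"https://example{i}.no" for i in range(45)]
batches = list(batched_start_urls(all_urls))
print([len(b) for b in batches])
```

Each yielded list can be used directly as the start_urls value of one run.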

FAQ

I'm not technical. How am I supposed to use this?

In simple terms:

  1. Add website URLs. In start_urls, add the websites you want to extract contacts from:

    {
      "start_urls": [
        {"url": "https://example.no"},
        {"url": "https://another-site.se"}
      ]
    }
  2. Decide how deep to crawl. Set max_pages_per_site to control how many pages to check per website:

    • 3-4 pages: Fast, good for simple sites
    • 6-8 pages: Balanced, good for most sites
    • 10+ pages: Thorough, for complex sites with many pages
  3. Run the actor and wait. The actor will:

    • Visit each website
    • Crawl multiple pages looking for contact info
    • Extract and normalize emails, phones, and social profiles
  4. Download your dataset. When done, export as CSV, JSON, or Excel from the Apify dataset view.


What kind of websites work best with SERPKey?

SERPKey works best with:

  • Business websites (companies, consultants, agencies)
  • Professional services (lawyers, accountants, architects)
  • Local businesses (restaurants, shops, salons)
  • Organizations (associations, non-profits, clubs)
  • Directory pages (member lists, partner pages)

It's optimized for Scandinavian websites but works globally. Scandinavian phone numbers get special handling for accurate normalization.


Why are some social media profiles marked as "n/a"?

Three main reasons:

  1. The website doesn't have that social platform. Not every business uses all platforms. Many skip TikTok, Pinterest, or X/Twitter.

  2. Social links are JavaScript-rendered. Some websites load social icons via JavaScript that BeautifulSoup can't see. These require browser-based scraping (not supported yet).

  3. Strict brand filtering is enabled. With strict_brand_filter: true, only social URLs containing the brand name are accepted. Try setting it to false for better coverage.

Tip: Social links are most reliably found when they're in:

  • JSON-LD schema markup
  • Meta tags
  • Direct anchor links in HTML
  • Footer/header sections

Why does it find generic emails like "gmail.com" instead of company emails?

SERPKey prioritizes domain-matched emails (e.g., post@goderom.no for goderom.no) but allows generic providers like Gmail, Outlook, Yahoo as fallbacks.

This is by design because:

  1. Many small businesses use personal email addresses
  2. Some freelancers only have Gmail/Outlook
  3. Contact forms may link to generic emails

If you only want domain-matched emails, filter the results afterward:

  • In CSV: keep only rows where the part of the email after "@" matches the website's domain
  • In JSON: check whether the email address ends with the site's registered domain (e.g., goderom.no)
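
A minimal post-filter along these lines is sketched below. It assumes the output field names shown earlier and treats the site's host without a leading "www." as its domain; site_domain and domain_matched_emails are hypothetical helpers. Sites served from other subdomains would need a public-suffix-aware library, which this sketch deliberately leaves out.

```python
from urllib.parse import urlparse


def site_domain(root_url: str) -> str:
    """Host of the root URL, lowercased and without a leading 'www.'."""
    host = (urlparse(root_url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def domain_matched_emails(record: dict) -> list[str]:
    """Return only the emails whose domain matches the site's domain."""
    domain = site_domain(record["root_url"])
    matched = []
    for i in range(1, 6):
        email = record.get(f"email{i}", "n/a")
        if email == "n/a" or "@" not in email:
            continue
        email_domain = email.rsplit("@", 1)[1].lower()
        # Accept the exact domain or any subdomain of it.
        if email_domain == domain or email_domain.endswith("." + domain):
            matched.append(email)
    return matched
```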

Can it extract contacts from PDF files or images?

No. SERPKey only processes HTML content. It cannot:

  • Read text from images (no OCR)
  • Extract data from PDFs
  • Parse Word/Excel documents

However, if a website has:

  • A "Contact Us" page with visible text → ✅ Works
  • An email in a mailto: link → ✅ Works
  • A phone in <meta> tags → ✅ Works
  • Contact info only in a PDF → ❌ Doesn't work

How do I run it for "maximum data extraction"?

Use these settings for thorough contact extraction:

{
  "start_urls": [{"url": "https://example.no"}],
  "max_pages_per_site": 10,
  "timeout_seconds": 45,
  "respect_robots_txt": false,
  "extract_listings": false,
  "strict_brand_filter": false
}

Explanation:

  • max_pages_per_site: 10: Checks up to 10 pages (contact, about, team, services, etc.)
  • timeout_seconds: 45: Waits longer for slow sites
  • respect_robots_txt: false: Maximum coverage (but respect site policies manually)
  • strict_brand_filter: false: Accepts more social profiles

How do I run it for a "quick scan"?

For fast extraction on simple sites:

{
  "start_urls": [{"url": "https://example.no"}],
  "max_pages_per_site": 3,
  "timeout_seconds": 20,
  "respect_robots_txt": false,
  "extract_listings": false,
  "strict_brand_filter": false
}

With a 3-page limit, the crawl typically covers the homepage plus the highest-priority pages (usually contact and about), which is often enough for basic contact info.


What does "strict_brand_filter" do?

When strict_brand_filter: false (default):

  • Accepts any social profile link that looks legitimate
  • Example: For goderom.no, accepts all of:
    • facebook.com/goderom (contains brand)
    • facebook.com/goderominterior (close match)
    • facebook.com/some-other-page (if link has social indicators)

When strict_brand_filter: true:

  • Only accepts social URLs containing the exact brand token
  • Example: For goderom.no (brand token: "goderom"):
    • facebook.com/goderom → ✅ Accepted
    • facebook.com/interior-design → ❌ Rejected (doesn't contain "goderom")

Recommendation: Keep it false unless you're getting too many false positives.
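
The strict check can be pictured as a substring test against a brand token. The sketch below assumes the token is the first label of the site's host (so "goderom" for goderom.no), which matches the example above but may not be how the Actor computes it; brand_token and accept_social are hypothetical names, and the relaxed branch stands in for heuristics that are not shown here.

```python
from urllib.parse import urlparse


def brand_token(root_url: str) -> str:
    """Assumed brand token: first host label, ignoring a leading 'www.'."""
    host = (urlparse(root_url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host.split(".")[0]


def accept_social(social_url: str, root_url: str, strict: bool = False) -> bool:
    """Apply the strict brand filter to a candidate social profile URL."""
    if not strict:
        return True  # relaxed mode: other heuristics (not shown) decide
    token = brand_token(root_url)
    path = urlparse(social_url).path.lower()
    return bool(token) and token in path
```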


Does it work with JavaScript-heavy sites (React, Vue, Angular)?

Partially. SERPKey uses BeautifulSoup which parses the initial HTML response. It cannot:

  • Wait for JavaScript to load
  • Click buttons or interact with the page
  • See content that only appears after JS execution

However, many modern sites server-side render their content or include contact info in:

  • Initial HTML
  • JSON-LD scripts
  • Meta tags

SERPKey can read all of these.

For JavaScript-heavy sites, consider:

  1. Checking if contact info is in the raw HTML (view source)
  2. Using browser-based scrapers if needed
  3. Looking for API endpoints that return JSON
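
For the first suggestion, one quick check is to look for contact fields inside JSON-LD blocks in the raw HTML (the text you see under "view source"). This is a standard-library sketch, not the Actor's parser; json_ld_contacts is a hypothetical helper and the regular expression is a simplification that may miss unusual markup.

```python
import json
import re

JSON_LD_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)


def json_ld_contacts(html: str) -> dict:
    """Collect "email" and "telephone" string values from JSON-LD blocks."""
    found = {"email": [], "telephone": []}

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key in found and isinstance(value, str):
                    found[key].append(value)
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    for block in JSON_LD_RE.findall(html):
        try:
            walk(json.loads(block))
        except json.JSONDecodeError:
            continue  # skip malformed JSON-LD
    return found
```

If this returns values for a page's source, the contact data is present in the initial HTML and does not depend on JavaScript execution.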

Troubleshooting

No emails found

  • Some sites hide emails behind contact forms or images.
  • Try increasing max_pages_per_site to check more pages.
  • Check if emails are in PDFs (not supported).
  • Verify emails aren't behind login/authentication.

Phone numbers in wrong format

  • SERPKey normalizes to E.164 format: a plus sign, the country code, and the number, with no spaces (e.g., +4722334455).
  • Norwegian 8-digit numbers get +47 added automatically.
  • Other countries may need manual formatting if context is unclear.

Social profiles marked "n/a" but they exist

  • Try setting strict_brand_filter: false.
  • Check if social links load via JavaScript (not supported).
  • Verify links are actual profiles, not share buttons.

Website shows "unavailable"

  • The site may be down, blocking scrapers, or behind authentication.
  • Check if site requires cookies/login.
  • Try accessing manually to verify it's online.

"listing_item" records appearing unexpectedly

  • Disable with extract_listings: false.
  • This feature is for directory-style pages only.

Scandinavian optimizations

SERPKey includes special handling for Nordic countries:

Phone numbers

  • Norway: Detects +47 format, adds country code to 8-digit numbers
  • Sweden: Handles +46 with 9-10 digits, removes leading 0
  • Denmark: Detects +45 format with 8 digits
  • Finland: Handles +358 with 9-10 digits
  • Iceland: Detects +354 with 7 digits
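
The rules above can be sketched as a small lookup keyed by the site's TLD. This is an illustration under assumptions, not the Actor's implementation: RULES and normalize_phone are hypothetical names, the digit lengths are taken from the list above, a single leading 0 is treated as a trunk prefix, and numbers carrying a different country's code are simply rejected.

```python
import re
from typing import Optional

# TLD -> (country code, accepted national number lengths), per the list above.
RULES = {
    "no": ("47", {8}),
    "se": ("46", {9, 10}),
    "dk": ("45", {8}),
    "fi": ("358", {9, 10}),
    "is": ("354", {7}),
}


def normalize_phone(raw: str, tld: str) -> Optional[str]:
    """Return an E.164-style string, or None if the number does not fit."""
    rule = RULES.get(tld.lower().lstrip("."))
    if rule is None:
        return None
    code, lengths = rule
    digits = re.sub(r"\D", "", raw)
    international = raw.strip().startswith("+")
    if not international and digits.startswith("00"):
        digits, international = digits[2:], True  # 00 international prefix
    if international:
        if not digits.startswith(code):
            return None  # other countries are out of scope for this sketch
        digits = digits[len(code):]
    if digits.startswith("0"):
        digits = digits[1:]  # drop one trunk-prefix zero
    if len(digits) not in lengths:
        return None
    return f"+{code}{digits}"
```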

Priority keywords

Crawls prioritize pages with these terms:

  • Norwegian: kontakt, om-oss, personvern
  • Swedish: kontakta, integritet, dataskydd
  • Danish: om-os, privatlivspolitik
  • Finnish: yhteystiedot, tietosuoja
  • Icelandic: hafa-samband, persónuvernd
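
Keyword-based prioritization can be sketched as a simple scoring function over URL paths. The weights below (contact pages over about pages over privacy pages) are assumptions chosen for illustration, as are the names KEYWORD_WEIGHTS, score_link, and prioritize; the Actor's actual scoring is not documented here.

```python
from urllib.parse import unquote, urlparse

# Assumed weights: contact pages first, then about pages, then privacy pages.
KEYWORD_WEIGHTS = {
    "kontakt": 3, "contact": 3, "yhteystiedot": 3, "hafa-samband": 3,
    "om-os": 2, "about": 2, "um-okkur": 2,
    "personvern": 1, "integritet": 1, "dataskydd": 1,
    "privatlivspolitik": 1, "tietosuoja": 1, "persónuvernd": 1,
}


def score_link(url: str) -> int:
    """Highest weight of any keyword found in the URL path, else 0."""
    path = unquote(urlparse(url).path).lower()
    return max((w for k, w in KEYWORD_WEIGHTS.items() if k in path), default=0)


def prioritize(urls: list[str], limit: int) -> list[str]:
    """Return up to `limit` URLs, best-scoring first (ties keep input order)."""
    return sorted(urls, key=score_link, reverse=True)[:limit]
```

Substring matching means "kontakt" also covers "kontakta" and "om-os" also covers "om-oss", so those variants need no separate entries.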

Email providers

Recognizes Scandinavian providers as legitimate:

  • Norwegian: online.no, getmail.no, c2i.net, start.no
  • Swedish: spray.se, passagen.se, bredband.net
  • Danish: jubii.dk, post.tele.dk
  • Finnish: suomi24.fi, luukku.com

Language headers

Sends Accept-Language: no,nb,nn,sv,da,fi,is,en-US to prefer Scandinavian content.
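
To reproduce the same language preference when checking a page yourself, you can attach the header to a standard-library request. This is a sketch; build_request is a hypothetical helper, and the commented-out line shows where the documented default timeout would apply.

```python
from urllib.request import Request

ACCEPT_LANGUAGE = "no,nb,nn,sv,da,fi,is,en-US"


def build_request(url: str) -> Request:
    """Create a request that asks servers for Nordic-language content first."""
    return Request(url, headers={"Accept-Language": ACCEPT_LANGUAGE})


req = build_request("https://example.no")
# from urllib.request import urlopen
# html = urlopen(req, timeout=30).read()  # 30 s matches the default timeout
```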


Roadmap / future improvements

  • Browser automation: Playwright/Puppeteer support for JavaScript-rendered sites
  • CAPTCHA handling: Automatic CAPTCHA solving for protected sites
  • PDF parsing: Extract contacts from linked PDF documents
  • Image OCR: Extract text from contact info images
  • Form detection: Identify and optionally fill contact forms
  • Multi-language detection: Automatically detect and handle multiple languages per site
  • Bulk URL import: CSV/Excel upload for large URL lists
  • Custom regex patterns: User-defined patterns for specialized data extraction
  • Webhook notifications: Real-time alerts when specific contacts are found

Changelog

  • 2026-01-15
    • FINALLY out of Beta - Sorry for the long wait, had to be off for a few weeks.
    • Initial release optimized for Scandinavian countries.
    • Multi-source email extraction (mailto, text, JSON-LD, meta tags, obfuscation patterns).
    • Scandinavian phone number normalization with country-specific rules.
    • Smart social media detection with relaxed brand filtering.
    • Optional listing extraction for directory-style pages.
    • Robots.txt respect (optional).
    • URL parsing cache for performance.
    • Priority keyword support for Nordic languages.
    • Email filtering for quality (removes CDN/ESP/platform domains).

Disclaimer & License

This Apify Actor is provided "as is", without warranty of any kind — express or implied — including but not limited to the warranties of merchantability, fitness for a particular purpose, and non-infringement. Please follow local laws, respect website terms of service, and do not use this code to spam or violate privacy regulations.

I will find you

Privacy & legality (Reminder): Always respect website robots.txt, terms of service, and applicable data protection laws (GDPR, etc.). Use responsibly and ethically. This Actor parses public HTML only.

© 2026 SLASH. All rights reserved. Copying or modifying the source code is prohibited.