Hitta.se Lead Scraper (Beta)

Pricing: Pay per event


Developed by SLASH

Maintained by Community

Retrieve business leads from hitta.se the easy way. This actor retrieves each business's name, address, email addresses, phone numbers, and social links.


Hitta.se Scraper

Please note: This actor is currently in beta, and minimum pricing applies until it leaves beta. If you are subscribed, you will receive an email when the price is adjusted.
(apify-actor-start: $0.00001 / apify-default-dataset-item: $0.00001 until out of beta)
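For a rough sense of scale, here is a small cost estimate. It assumes each run is billed one apify-actor-start event and each pushed result one apify-default-dataset-item event at the beta prices above. Apify's actual billing rules (for example, how many start events a run incurs) may differ, so `starts` is left as a parameter. `estimate_run_cost` is a hypothetical helper, not part of the actor.

```python
# Beta prices quoted above, in USD per event.
ACTOR_START_PRICE = 0.00001   # apify-actor-start
DATASET_ITEM_PRICE = 0.00001  # apify-default-dataset-item


def estimate_run_cost(items: int, starts: int = 1) -> float:
    """Estimate the beta cost of a run that pushes `items` dataset items."""
    return starts * ACTOR_START_PRICE + items * DATASET_ITEM_PRICE


print(f"${estimate_run_cost(1000):.5f}")
```

Under these assumptions, a run that pushes 1,000 results costs about $0.01001.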

Apify Actor: Hitta.se scraper (listing → detail pages → tiny same-domain contact crawl)
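The first stage of that pipeline, reading a listing page, can be sketched with Python's standard-library HTML parser. This sketch assumes that detail links contain `/verksamhet/` and that the next-page link's visible text is exactly "Nästa". Hitta's real markup may differ. `ListingParser` is a hypothetical name, not the actor's code.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin


class ListingParser(HTMLParser):
    """Collect /verksamhet/ detail links and the “Nästa” link from a listing page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.detail_urls: list[str] = []
        self.next_url: str | None = None
        self._href: str | None = None  # href of the <a> element currently open
        self._text: list[str] = []     # text seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "a" or self._href is None:
            return
        url = urljoin(self.base_url, self._href)
        if "/verksamhet/" in url:
            if url not in self.detail_urls:  # keep first-seen order, no duplicates
                self.detail_urls.append(url)
        elif "".join(self._text).strip() == "Nästa" and self.next_url is None:
            self.next_url = url
        self._href = None


page = ('<a href="/verksamhet/exempel-ab/1">Exempel AB</a>'
        '<a href="/nacka/företag/3">Nästa</a>')
parser = ListingParser("https://www.hitta.se/nacka/företag/2")
parser.feed(page)
print(parser.detail_urls, parser.next_url)
```

A crawler would queue `detail_urls` for detail-page extraction and keep following `next_url` until it is `None` or a limit is reached.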

Updates in this version

  • If there is no website, or the website is banned/unavailable, all valid detail-page emails are now kept (not only generic ones).
  • Email slug fallback now unescapes HTML and matches both:
    • <slug>-button-email-<email>
    • button-email-<email> (broader pattern)
  • Unescape attribute values before extracting emails to handle entity-encoded addresses.
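A minimal sketch of the slug fallback follows. It unescapes HTML entities first, then looks for emails after `button-email-`. The email regex is a simplified assumption, not the actor's exact "strict regex", and `emails_from_slugs` is a hypothetical name.

```python
from __future__ import annotations

import html
import re

# Simplified email pattern (an assumption, not the actor's exact regex).
EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"

# "button-email-<email>" is the broader pattern; it also matches inside
# "<slug>-button-email-<email>", because the slug is just text before it.
SLUG_EMAIL = re.compile(r"button-email-(" + EMAIL + r")")


def emails_from_slugs(raw_html: str) -> list[str]:
    """Unescape HTML entities first, then pull emails out of button-email slugs."""
    text = html.unescape(raw_html)  # e.g. "&#64;" becomes "@"
    found: list[str] = []
    for match in SLUG_EMAIL.finditer(text):
        email = match.group(1).lower()
        if email not in found:  # de-duplicate, keep first-seen order
            found.append(email)
    return found


sample = '<a data-track="exempel-ab-button-email-info&#64;exempel.se">Mejla</a>'
print(emails_from_slugs(sample))
```

Unescaping before matching is what lets an entity-encoded address such as `info&#64;exempel.se` be recovered.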

Kept (still working as intended)

  • max_results with “Nästa” pagination (25 per page).
  • Website is taken from canonical/og:url/JSON-LD first, then from scored anchors; junk domains are banned.
  • website_details: one of ok | 404 | unavailable | banned | n/a.
  • Banned: hitta.dixa.help, dixa.help, biluppgifter.se, and a specific DNB URL.
  • Detail-page email extraction from Hitta UI + mailto + strict regex + slug fallback.
  • Same-domain mini-crawl for emails + social links when website is OK.
  • Swedish address heuristics; phone extraction; categories.
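The website_details values and the ban list can be combined into a single classifier. This is a sketch under several assumptions. The HTTP status is passed in rather than fetched. `None` means the request failed. Statuses other than 404 that fall outside 2xx/3xx map to `unavailable`. The DNB URL is omitted because this README does not name it. `classify_website` and `is_banned` are hypothetical helpers.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Banned hosts listed in this README (the specific DNB URL is not named, so it is omitted).
BANNED_HOSTS = {"hitta.dixa.help", "dixa.help", "biluppgifter.se"}


def is_banned(url: str) -> bool:
    """True if the URL's host is a banned host or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == b or host.endswith("." + b) for b in BANNED_HOSTS)


def classify_website(url: str | None, status: int | None) -> str:
    """Map a website URL and its HEAD/GET status to a website_details value."""
    if not url:
        return "n/a"
    if is_banned(url):
        return "banned"
    if status is None:  # request failed (timeout, DNS error, ...)
        return "unavailable"
    if status == 404:
        return "404"
    if 200 <= status < 400:
        return "ok"
    return "unavailable"  # assumption: other error statuses count as unavailable


print(classify_website("https://www.exempel.se", 200))
```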

Features

  • Crawls listing pages on Hitta.se, discovers company detail pages, and pushes normalized contact data.
  • Extracts from detail pages:
    • name
    • categories (best-effort)
    • phone
    • address (Swedish heuristics and JSON-LD)
    • website (structured first, then scored anchors with bans)
    • email1..N (strict parsing, de-duplicated)
    • website_details (status as above)
    • Socials if available: social_facebook, social_instagram, social_linkedin, social_x, social_youtube, social_tiktok, social_pinterest
  • If website is OK, performs a tiny same-domain crawl (configurable) to discover more emails and socials.
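The same-domain mini-crawl can be pictured as a breadth-first walk capped at a page budget. In this sketch, the `fetch` callable (URL in, HTML or `None` out) stands in for the actor's HTTP client. Links are found with a simple `href` regex rather than a full HTML parser, and only emails are collected; social-link extraction is left out. `mini_crawl` is a hypothetical name.

```python
from __future__ import annotations

import re
from collections import deque
from typing import Callable
from urllib.parse import urljoin, urlparse

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
HREF_RE = re.compile(r'href="([^"]+)"')


def mini_crawl(start_url: str, fetch: Callable[[str], str | None], max_pages: int = 3) -> set[str]:
    """Breadth-first crawl of at most max_pages fetches on start_url's host; returns emails."""
    host = urlparse(start_url).hostname
    queue, seen, emails = deque([start_url]), {start_url}, set()
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        page = fetch(url)
        fetched += 1  # failed fetches also count against the budget
        if page is None:
            continue
        emails.update(e.lower() for e in EMAIL_RE.findall(page))
        for href in HREF_RE.findall(page):
            link = urljoin(url, href).split("#")[0]
            # mailto: links have no hostname, so they are never queued.
            if urlparse(link).hostname == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return emails
```

Capping fetches rather than depth keeps the cost per business predictable, which matches the "tiny" crawl described above.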

Input Configuration

Example input JSON:

{
  "start_urls": [
    { "url": "https://www.hitta.se/nacka/företag/2" }
  ],
  "max_depth": 3,
  "headers": {
    "User-Agent": "Mozilla/5.0 ...",
    "Accept-Language": "sv-SE,sv;q=0.9,en-US;q=0.8,en;q=0.7"
  },
  "timeout_seconds": 30,
  "site_email_max_pages": 3,
  "max_results": 0
}

Input fields (type, default):

  • start_urls (array, default [{ "url": "https://www.hitta.se/nacka/företag/2" }]): One or more Hitta listing URLs.
  • max_depth (integer, default 3): Crawl depth from the start URLs. Listing depth is reused for pagination.
  • headers (object, default: see code defaults): HTTP headers for requests.
  • timeout_seconds (integer, default 30): Read timeout for HTTP requests, in seconds.
  • site_email_max_pages (integer, default 3): Maximum number of pages to crawl on the extracted website's domain for extra contacts.
  • max_results (integer, default 0 = no cap): Maximum number of detail results to push. Pagination follows “Nästa” at 25 results per page.
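Here is a sketch of how these fields might be resolved. It assumes user input simply overrides the documented defaults; the header defaults are not listed here, so they are left out. It also assumes every listing page holds 25 results, which is an approximation. `resolve_input` and `listing_pages_needed` are hypothetical helpers, not the actor's code.

```python
from __future__ import annotations

import math

PAGE_SIZE = 25  # results per listing page, as documented for “Nästa” pagination

# Documented defaults; "headers" is omitted because its default is not listed.
DEFAULTS = {
    "start_urls": [{"url": "https://www.hitta.se/nacka/företag/2"}],
    "max_depth": 3,
    "timeout_seconds": 30,
    "site_email_max_pages": 3,
    "max_results": 0,
}


def resolve_input(user_input: dict | None) -> dict:
    """Overlay user-supplied keys on the defaults (shallow merge)."""
    merged = dict(DEFAULTS)
    merged.update(user_input or {})
    return merged


def listing_pages_needed(max_results: int) -> int | None:
    """Listing pages to visit for max_results; None means no cap (0)."""
    if max_results <= 0:
        return None
    return math.ceil(max_results / PAGE_SIZE)


config = resolve_input({"max_results": 60})
print(config["timeout_seconds"], listing_pages_needed(config["max_results"]))
```

With `max_results` set to 60, three listing pages would be needed (25 + 25 + 10).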

How It Works

  1. Listing pages: extracts detail links using /verksamhet/ anchors. Follows “Nästa” to paginate.

  2. Detail pages: extracts:

    • Website from canonical, og:url, and JSON-LD; otherwise scores external anchors and bans junk domains.

    • Address from JSON-LD, microdata, and Swedish heuristics (street tokens + postcode check).

    • Phone from tel: links or strict regex for Swedish formats.

    • Emails from:

      • Hitta UI attributes (unescaped)
      • mailto: links
      • Strict regex on visible text
      • Slug-pattern fallback: <slug>-button-email-<email> and button-email-<email>
  3. Website status: HEAD/GET to classify website_details as ok, 404, unavailable, banned, or n/a.

  4. Same-domain mini-crawl (if website OK): fetch up to site_email_max_pages pages for more emails and socials.

  5. Email policy:

    • If there is no website, or the website is banned/unavailable, keep all valid detail-page emails.
    • If the website is OK, keep emails whose domain matches the website's base domain or belongs to a generic provider (Gmail, Outlook, etc.).
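The email policy in step 5 can be sketched as a filter. This sketch makes several assumptions. The base domain is approximated by the last two host labels, with no Public Suffix List. The generic-provider set is illustrative rather than the actor's real list. Any website_details value other than `ok` (including `404` and `n/a`) is treated like the no-website case. `apply_email_policy` and `base_domain` are hypothetical names.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Illustrative generic providers ("Gmail, Outlook, etc."); not the actor's real list.
GENERIC_PROVIDERS = {"gmail.com", "outlook.com", "hotmail.com"}


def base_domain(host: str) -> str:
    """Naive base domain: last two labels (ignores suffixes such as co.uk)."""
    return ".".join(host.lower().split(".")[-2:])


def apply_email_policy(emails: list[str], website: str | None, website_details: str) -> list[str]:
    # No usable website: keep every valid detail-page email.
    if not website or website_details != "ok":
        return list(emails)
    site = base_domain(urlparse(website).hostname or "")
    kept = []
    for email in emails:
        domain = email.rsplit("@", 1)[-1].lower()
        # Website OK: keep same-base-domain emails and generic-provider emails.
        if base_domain(domain) == site or domain in GENERIC_PROVIDERS:
            kept.append(email)
    return kept


print(apply_email_policy(["info@exempel.se", "x@gmail.com", "y@other.se"],
                         "https://www.exempel.se", "ok"))
```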

Example Output

{
  "source_url": "https://www.hitta.se/foeretag/exempel-ab/123456",
  "name": "Exempel AB",
  "categories": "Bygg, Renovering",
  "phone": "08 123 45 67",
  "address": "Exempelgatan 10, 123 45 Stockholm",
  "website": "https://www.exempel.se",
  "email1": "info@exempel.se",
  "email2": "support@exempel.se",
  "website_details": "ok",
  "social_facebook": "https://www.facebook.com/exempel-ab",
  "social_instagram": "https://www.instagram.com/exempel-ab",
  "social_linkedin": "https://www.linkedin.com/company/exempel-ab",
  "social_x": "https://www.x.com/exempel-ab",
  "social_youtube": "https://www.youtube.com/exempel-ab",
  "social_tiktok": "https://www.tiktok.com/exempel-ab",
  "social_pinterest": "https://www.pinterest.com/exempel-ab"
}

If no valid emails are found, the actor emits "email1": "n/a".
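The email1..N numbering and the "n/a" fallback can be expressed in a few lines. This sketch assumes de-duplication is case-insensitive. It lowercases addresses for that purpose, although email local parts can technically be case-sensitive. `email_fields` is a hypothetical helper.

```python
from __future__ import annotations


def email_fields(emails: list[str]) -> dict[str, str]:
    """Number unique emails as email1..N; emit email1 = "n/a" when none remain."""
    # dict.fromkeys de-duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(e.strip().lower() for e in emails if e.strip()))
    if not unique:
        return {"email1": "n/a"}
    return {f"email{i}": email for i, email in enumerate(unique, start=1)}


print(email_fields(["info@exempel.se", "Info@exempel.se", "support@exempel.se"]))
```

The resulting dictionary can be merged into the output record next to the other fields.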


Notes

  • Pagination: respects Hitta’s “Nästa” flow, ~25 results per page.
  • Bans: hitta.dixa.help, dixa.help, biluppgifter.se, and a specific DNB marketing URL are excluded.
  • Email hygiene: strict regex, HTML-unescape, de-duplication, and tracking-pattern filtering.
  • Address quality: prefers JSON-LD PostalAddress, then microdata, then heuristics requiring Swedish postcode + street token.
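The heuristic fallback, which requires both a Swedish postcode and a street token, might look like the sketch below. The postcode pattern follows the usual five-digit "123 45" form followed by a town name. The street-token list is an illustrative assumption, not the actor's actual list. `looks_like_swedish_address` is a hypothetical name.

```python
from __future__ import annotations

import re

# Swedish postcodes have five digits, usually written "123 45", followed by the town.
POSTCODE = re.compile(r"\b\d{3}\s?\d{2}\s+[A-ZÅÄÖ]")

# Illustrative street-name endings; an assumption, not the actor's actual token list.
STREET_TOKENS = ("gatan", "gata", "vägen", "väg", "torget", "gränd", "allén", "backen", "stigen")


def looks_like_swedish_address(text: str) -> bool:
    """Heuristic: require both a street token and a postcode followed by a town."""
    words = text.lower().replace(",", " ").split()
    has_street = any(word.endswith(STREET_TOKENS) for word in words)
    return has_street and POSTCODE.search(text) is not None


print(looks_like_swedish_address("Exempelgatan 10, 123 45 Stockholm"))
```

Requiring both signals rejects fragments such as a bare town name or a "Box 123" postal address.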

Disclaimer & License

This Apify Actor is provided "as is", without warranty of any kind — express or implied — including but not limited to the warranties of merchantability, fitness for a particular purpose, and non-infringement. Use it, modify it, break it, or improve it — but you do so at your own risk.

© 2025 SLSH. All rights reserved. Copying or modifying the source code is prohibited.