PeachParser (Beta)

Crawls arbitrary websites, checks which are alive, and scans them for emails and social links. Filters common telemetry and template junk.

Pricing: Pay per event
Rating: 0.0 (0)
Developer: SLASH (Maintained by Community)
Actor stats: 3 bookmarks · 3 total users · 2 monthly active users · last modified 3 days ago


PeachParser

PeachParser is an Apify Actor developed by SLASH for crawling websites to extract:

  • Emails (from visible content and mailto: links)
  • Social profiles (Facebook, Instagram, LinkedIn, X, YouTube, TikTok, Pinterest)
  • Optional listing items from directory-like pages (for example, lists of choirs, restaurants, or organizations)

It is optimized for small to medium websites where you want both contact details and, optionally, all items listed on a directory page (such as https://www.sverigeskorforbund.se/korer).
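As an illustration, a run could be started from the Apify Python client. The actor ID `slash/peachparser` and the `start_urls` field are assumptions based on this README, not a confirmed input schema; `extract_listings`, `max_pages_per_site`, and `respect_robots_txt` are the inputs documented below.

```python
# Hypothetical run input for PeachParser. "start_urls" and the actor ID
# below are assumed names; extract_listings, max_pages_per_site, and
# respect_robots_txt are inputs documented in this README.
run_input = {
    "start_urls": [{"url": "https://www.sverigeskorforbund.se/korer"}],
    "max_pages_per_site": 20,
    "extract_listings": True,
    "respect_robots_txt": True,
}

# Starting the run would then look roughly like this:
# from apify_client import ApifyClient
# client = ApifyClient("<APIFY_TOKEN>")
# run = client.actor("slash/peachparser").call(run_input=run_input)
# items = client.dataset(run["defaultDatasetId"]).list_items().items
```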


Key Features

  1. Site-level contact extraction

    • Extracts emails from:
      • mailto: links
      • Visible page text (up to a configurable limit)
    • Filters:
      • Removes tracking / telemetry addresses
      • Blocks placeholder / bogus domains (e.g. mysite.com, *.wixpress.com)
      • Accepts emails that match the website’s domain or come from generic providers (Gmail, Outlook, etc.)
  2. Social profile detection

    • Detects social links from:
      • JSON-LD (sameAs arrays)
      • Regular anchor tags (<a href="...">)
    • Supports:
      • Facebook
      • Instagram
      • LinkedIn
      • X (Twitter)
      • YouTube
      • TikTok
      • Pinterest
    • Applies a brand token (derived from the hostname) to avoid irrelevant social links from third-party widgets when possible.
  3. Optional listing extraction

    • When extract_listings is enabled, PeachParser tries to identify listing items on pages:
      • Looks for same-domain <a> links with meaningful text
      • Skips obviously generic link text such as “read more”, “les mer”, “more info”, etc.
      • Avoids file downloads and non-HTML resources
    • Each listing item is stored as a separate dataset record with:
      • record_type = "listing_item"
      • item_name
      • item_url
      • item_source_page
  4. Smart crawling

    • Restricts crawling to a single domain (supports www. and bare domain equivalence)
    • Skips non-HTML responses and resources with unwanted file extensions (.pdf, images, archives, etc.)
    • Prioritizes URLs with contact-related keywords (kontakt, contact, om-oss, about, personvern, etc.)
    • Respects max_pages_per_site to control workload
  5. Robots.txt (optional)

    • When respect_robots_txt is enabled, PeachParser:
      • Fetches and parses robots.txt (with a short timeout and size limit)
      • Uses it to decide whether a URL may be crawled
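The email filtering in feature 1 can be sketched as below. The exact block lists and generic-provider set that PeachParser uses are not published, so the ones here are illustrative assumptions.

```python
import re
from urllib.parse import urlparse

# Illustrative sketch of the email filter described above; the concrete
# bogus-domain and generic-provider lists are assumptions, not PeachParser's.
GENERIC_PROVIDERS = {"gmail.com", "outlook.com", "hotmail.com", "yahoo.com"}
BOGUS_DOMAINS = {"mysite.com", "example.com"}
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(page_text: str, site_url: str) -> set[str]:
    site_domain = urlparse(site_url).hostname.removeprefix("www.")
    found = set()
    for email in EMAIL_RE.findall(page_text):
        domain = email.split("@", 1)[1].lower()
        # Drop placeholder / template addresses (e.g. *.wixpress.com).
        if domain in BOGUS_DOMAINS or domain.endswith(".wixpress.com"):
            continue
        # Accept same-domain addresses and common free-mail providers.
        if domain == site_domain or domain in GENERIC_PROVIDERS:
            found.add(email.lower())
    return found
```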
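The brand-token heuristic from feature 2 might look like the following: a social link is kept only when the profile path contains a token derived from the site's hostname. PeachParser's actual matching rules may be more lenient, so treat this as a sketch.

```python
from urllib.parse import urlparse

# Hosts listed under "Social profile detection" above (plus twitter.com).
SOCIAL_HOSTS = {"facebook.com", "instagram.com", "linkedin.com", "x.com",
                "twitter.com", "youtube.com", "tiktok.com", "pinterest.com"}

def is_relevant_social(link: str, site_url: str) -> bool:
    # Normalize away the optional "www." prefix before comparing hosts.
    host = (urlparse(link).hostname or "").removeprefix("www.")
    if host not in SOCIAL_HOSTS:
        return False
    # Brand token: the first label of the site's hostname, e.g. "acmechoir".
    brand = urlparse(site_url).hostname.removeprefix("www.").split(".")[0]
    return brand.lower() in urlparse(link).path.lower()
```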
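Feature 3's listing-item filter can be approximated as follows; the generic-text and extension lists here are illustrative, but the record fields match the ones documented above.

```python
# Sketch of the listing-item heuristic. GENERIC_TEXT and SKIP_EXTENSIONS
# are illustrative; the output fields mirror the documented record shape.
GENERIC_TEXT = {"read more", "les mer", "more info", "learn more"}
SKIP_EXTENSIONS = (".pdf", ".jpg", ".png", ".zip", ".docx")

def listing_item(text: str, href: str, source_page: str, site_domain: str):
    text = " ".join(text.split())  # collapse whitespace in anchor text
    if not text or text.lower() in GENERIC_TEXT:
        return None  # obviously generic link text
    if site_domain not in href or href.lower().endswith(SKIP_EXTENSIONS):
        return None  # off-domain link or non-HTML resource
    return {
        "record_type": "listing_item",
        "item_name": text,
        "item_url": href,
        "item_source_page": source_page,
    }
```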
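For feature 5, the allow/deny decision can be reproduced with the standard library's `urllib.robotparser`; PeachParser's own fetcher (with its timeout and size limit) is not published, so this only shows the rule-checking step on an already-fetched robots.txt body.

```python
from urllib.robotparser import RobotFileParser

# Checks whether a URL may be crawled, given an already-fetched robots.txt.
# The user-agent string "PeachParser" is an assumption.
def allowed(robots_txt: str, url: str, agent: str = "PeachParser") -> bool:
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)
```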

Notes

  • Keep max_pages_per_site modest for reliability and to avoid hitting rate limits.
  • Results depend on site structure and the presence of contact information in public pages.
  • Respect terms of service and local laws.

Supported & planned regions

Region          Status        Details
Nordics         Optimized     Last optimized: 2025-11-11 (NO/SE/DK/FI/IS)
Western EU      Planned
Eastern EU      Planned
North America   Not started
South America   Not started
East/SE Asia    Not started
Middle East     Not started
Africa          Not started
Oceania         Not started

Create an issue if you’d like your country prioritized.


Disclaimer & License

This Apify Actor is provided “as is”, without warranty of any kind — express or implied — including but not limited to the warranties of merchantability, fitness for a particular purpose, and non-infringement. Please follow local laws and do not use for malicious purposes.

ToS & legality (Reminder): Great scraping comes with great responsibility. Follow local laws and do not use my code to spam.

I will find you

© 2025 SLASH. All rights reserved. Copying or modifying the source code is prohibited.