PeachParser (Beta)
Pricing: pay per event
Crawls arbitrary websites, checks which are alive, and extracts emails and social links. Filters common telemetry and template junk.
PeachParser
PeachParser is an Apify Actor developed by SLASH for crawling websites to extract:
- Emails (from visible content and `mailto:` links)
- Social profiles (Facebook, Instagram, LinkedIn, X, YouTube, TikTok, Pinterest)
- Optional listing items from directory-like pages (for example, lists of choirs, restaurants, or organizations)

It is optimized for small to medium websites where you want both contact details and, optionally, all items listed on a directory page (such as https://www.sverigeskorforbund.se/korer).
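A typical run through the Apify Python client might look like the sketch below. The Actor ID (`SLASH/peachparser`) and the `start_urls` field name are assumptions; `extract_listings`, `max_pages_per_site`, and `respect_robots_txt` are the options documented below. Check the Actor's input schema for the exact names.

```python
# Hypothetical run input for PeachParser; field names other than the three
# documented options are guesses, not the Actor's confirmed schema.
run_input = {
    "start_urls": [{"url": "https://www.sverigeskorforbund.se/korer"}],
    "extract_listings": True,       # also emit listing_item records
    "max_pages_per_site": 20,       # keep modest for reliability
    "respect_robots_txt": True,
}

if __name__ == "__main__":
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient("<APIFY_TOKEN>")
    run = client.actor("SLASH/peachparser").call(run_input=run_input)
    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        print(item.get("record_type"), item)
```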
Key Features
- Site-level contact extraction
  - Extracts emails from:
    - `mailto:` links
    - Visible page text (up to a configurable limit)
  - Filters:
    - Removes tracking / telemetry addresses
    - Blocks placeholder / bogus domains (e.g. `mysite.com`, `*.wixpress.com`)
    - Accepts emails that match the website's domain or come from generic providers (Gmail, Outlook, etc.)
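The filtering rules above can be sketched as a small function. This is a hypothetical re-implementation of the described behavior, not the Actor's actual code; the domain and provider lists are illustrative.

```python
import re

# Minimal sketch of the email filtering described above (assumed lists).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
GENERIC_PROVIDERS = {"gmail.com", "outlook.com", "hotmail.com", "yahoo.com"}
BOGUS_DOMAINS = {"mysite.com", "example.net"}
BLOCKED_SUFFIXES = (".wixpress.com",)  # template/telemetry junk

def extract_emails(text: str, site_domain: str) -> set:
    """Collect emails from visible text, keeping only plausible contacts."""
    found = set()
    for email in EMAIL_RE.findall(text):
        domain = email.split("@")[1].lower()
        if domain in BOGUS_DOMAINS or domain.endswith(BLOCKED_SUFFIXES):
            continue  # placeholder or telemetry address
        if domain == site_domain or domain in GENERIC_PROVIDERS:
            found.add(email.lower())
    return found
```

For example, `extract_emails("post@example.org, info@gmail.com, x@mysite.com", "example.org")` keeps the first two addresses and drops the bogus-domain one.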
- Social profile detection
  - Detects social links from:
    - JSON-LD (`sameAs` arrays)
    - Regular anchor tags (`<a href="...">`)
  - Supports:
    - Facebook
    - Instagram
    - LinkedIn
    - X (Twitter)
    - YouTube
    - TikTok
    - Pinterest
  - Applies a brand token (derived from the hostname) to avoid irrelevant social links from third-party widgets when possible.
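The brand-token idea can be illustrated as follows. This is a sketch under assumptions (the Actor's actual token derivation and host list are not published); `brand_token` and `is_relevant_social` are hypothetical names.

```python
from urllib.parse import urlparse

# Assumed set of recognized social hosts, per the supported platforms above.
SOCIAL_HOSTS = {"facebook.com", "instagram.com", "linkedin.com", "x.com",
                "twitter.com", "youtube.com", "tiktok.com", "pinterest.com"}

def brand_token(hostname: str) -> str:
    """Derive a brand token from the site hostname, e.g. 'www.peach-shop.no' -> 'peachshop'."""
    core = hostname.lower().removeprefix("www.").split(".")[0]
    return core.replace("-", "")

def is_relevant_social(link: str, site_hostname: str) -> bool:
    """Keep a social link only if its path mentions the site's brand token."""
    parsed = urlparse(link)
    host = parsed.netloc.lower().removeprefix("www.")
    if host not in SOCIAL_HOSTS:
        return False
    return brand_token(site_hostname) in parsed.path.lower().replace("-", "")
```

This way a share widget pointing at `facebook.com/sharer` is rejected, while `facebook.com/peachshop` on `www.peach-shop.no` passes.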
- Optional listing extraction
  - When `extract_listings` is enabled, PeachParser tries to identify listing items on pages:
    - Looks for same-domain `<a>` links with meaningful text
    - Skips obviously generic link text such as “read more”, “les mer”, “more info”, etc.
    - Avoids file downloads and non-HTML resources
  - Each listing item is stored as a separate dataset record with:
    - `record_type = "listing_item"`
    - `item_name`
    - `item_url`
    - `item_source_page`
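A record in that shape could be built like this. The helper name and the generic-text list are illustrative; the four output fields match the dataset schema described above.

```python
from typing import Optional

# Assumed stoplist of generic link texts, per the examples above.
GENERIC_TEXT = {"read more", "les mer", "more info", "learn more", "click here"}

def listing_record(text: str, href: str, source_page: str) -> Optional[dict]:
    """Turn a same-domain anchor into a listing_item record, or None for junk links."""
    name = " ".join(text.split())  # collapse whitespace
    if len(name) < 3 or name.lower() in GENERIC_TEXT:
        return None  # not a meaningful listing entry
    return {
        "record_type": "listing_item",
        "item_name": name,
        "item_url": href,
        "item_source_page": source_page,
    }
```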
- Smart crawling
  - Restricts crawling to a single domain (treats `www.` and the bare domain as equivalent)
  - Skips non-HTML responses and resources with unwanted file extensions (`.pdf`, images, archives, etc.)
  - Prioritizes URLs with contact-related keywords (`kontakt`, `contact`, `om-oss`, `about`, `personvern`, etc.)
  - Respects `max_pages_per_site` to control workload
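The same-domain check and keyword prioritization can be sketched as two small helpers. These are hypothetical re-implementations of the behavior described above, with illustrative keyword and extension lists.

```python
from urllib.parse import urlparse

CONTACT_KEYWORDS = ("kontakt", "contact", "om-oss", "about", "personvern")
SKIP_EXTENSIONS = (".pdf", ".jpg", ".png", ".zip", ".gz")

def same_site(url: str, site_host: str) -> bool:
    """Treat 'www.example.com' and 'example.com' as the same site."""
    host = urlparse(url).netloc.lower().removeprefix("www.")
    return host == site_host.lower().removeprefix("www.")

def crawl_priority(url: str) -> int:
    """Lower number = crawled earlier; contact-like pages jump the queue."""
    path = urlparse(url).path.lower()
    if path.endswith(SKIP_EXTENSIONS):
        return 99  # effectively skipped
    return 0 if any(k in path for k in CONTACT_KEYWORDS) else 1
```

Sorting the frontier by `crawl_priority` makes `/kontakt` pages surface before blog posts, which matters when `max_pages_per_site` is small.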
- Robots.txt (optional)
  - When `respect_robots_txt` is enabled, PeachParser:
    - Fetches and parses `robots.txt` (with a short timeout and size limit)
    - Uses it to decide whether a URL may be crawled
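For reference, the standard library already covers the parsing half of this. A minimal sketch using `urllib.robotparser` (the Actor's own parser may differ; the fetch step with its timeout and size limit is omitted here):

```python
from urllib.robotparser import RobotFileParser

def build_robots(robots_txt: str, base_url: str) -> RobotFileParser:
    """Parse an already-fetched robots.txt body into a can_fetch() checker."""
    rp = RobotFileParser(base_url + "/robots.txt")
    rp.parse(robots_txt.splitlines())
    return rp
```

Usage: `build_robots(body, "https://example.com").can_fetch("PeachParser", url)` returns whether `url` may be crawled under the site's rules.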
Notes
- Keep `max_pages_per_site` modest for reliability and to avoid hitting rate limits.
- Results depend on site structure and the presence of contact information in public pages.
- Respect terms of service and local laws.
Supported & planned regions
| Region | Status | Details | Link |
|---|---|---|---|
| Nordics | Optimized | Last optimized: 2025-11-11 (NO/SE/DK/FI/IS) | — |
| Western EU | Planned | — | — |
| Eastern EU | Planned | — | — |
| North America | Not started | — | — |
| South America | Not started | — | — |
| East/SE Asia | Not started | — | — |
| Middle East | Not started | — | — |
| Africa | Not started | — | — |
| Oceania | Not started | — | — |
Create an issue if you’d like your country prioritized.
Disclaimer & License
This Apify Actor is provided “as is”, without warranty of any kind — express or implied — including but not limited to the warranties of merchantability, fitness for a particular purpose, and non-infringement. Please follow local laws and do not use for malicious purposes.
ToS & legality (Reminder): Great scraping comes with great responsibility. Follow local laws and do not use my code to spam.
© 2025 SLASH. All rights reserved. Copying or modifying the source code is prohibited.