Contact Details Scraper

vdrmota/contact-info-scraper

Free email extractor to extract and download emails, phone numbers, Facebook, Twitter, LinkedIn, and Instagram profiles from any website. Extract contact information at scale from lists of URLs and download the data as Excel, CSV, JSON, HTML, and XML.

Scrape email returns an image path

Closed

Happy Hour Hunter (Happy_Hour_Hunter) opened this issue 8 months ago

I've run into a case where the following input:

{
  "startUrls": [
    {
      "url": "https://www.hotelsouthmelbourne.com/"
    }
  ],
  "maxRequests": 5,
  "maxRequestsPerStartUrl": 1,
  "maxDepth": 2,
  "sameDomain": true,
  "considerChildFrames": true,
  "proxyConfig": {
    "useApifyProxy": true
  }
}

returns three emails:

"emails": [
  "7@300x-100.jpg",
  "17@300x-100.jpg",
  "info@hotelsouthmelbourne.com"
],

As you can see, the two .jpg filenames match the regex pattern for an email address. Could the scraper be updated to exclude known image extensions, and maybe font extensions too? It used to be common practice to put an @ symbol in filenames (e.g. logo@2x.png) to serve assets for different pixel-density displays, so filtering out known file types would make this otherwise great scraper a bit more robust.

Thanks,
Dav
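To illustrate how this happens, here is a minimal Python sketch using a hypothetical, deliberately permissive email pattern. It is not the actor's actual regex, which isn't shown in this thread. Any pattern that accepts "something@something.letters" will also accept density-suffixed image names; the page fragment below is made up for the example.

```python
import re

# Hypothetical, deliberately permissive pattern (NOT the actor's actual regex):
# a local part, "@", a domain label, optional further labels, then "." + 2+ letters.
NAIVE_EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.[A-Za-z]{2,}")

# Made-up page fragment with image names shaped like the ones in the report.
page = (
    '<img src="/uploads/7@300x-100.jpg">'
    '<img src="/uploads/17@300x-100.jpg">'
    '<a href="mailto:info@hotelsouthmelbourne.com">Email us</a>'
)

# "/" and ":" are not in the local-part class, so each match starts right after them.
matches = NAIVE_EMAIL_RE.findall(page)
print(matches)
# ['7@300x-100.jpg', '17@300x-100.jpg', 'info@hotelsouthmelbourne.com']
```

Because ".jpg" satisfies the "dot plus letters" ending, the regex alone cannot tell a filename from an address; some extra check on the result is needed.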

Hi,

The issue has been fixed in the latest build: we added a check for image and font file extensions.
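A check like the one described could look roughly as follows. This is a minimal Python sketch, assuming the filter runs on already-extracted candidates and rejects any candidate whose text after the last dot is a known image or font extension. The extension list and the function names (looks_like_asset_filename, filter_emails) are hypothetical and not taken from the actor's code.

```python
# Hypothetical extension blocklist; the actor's real list may differ.
IMAGE_AND_FONT_EXTENSIONS = frozenset({
    # images
    "jpg", "jpeg", "png", "gif", "webp", "svg", "bmp", "ico", "tif", "tiff", "avif",
    # fonts
    "woff", "woff2", "ttf", "otf", "eot",
})


def looks_like_asset_filename(candidate: str) -> bool:
    """Return True if the text after the last dot is a known image/font extension."""
    _, dot, ext = candidate.rpartition(".")
    return bool(dot) and ext.lower() in IMAGE_AND_FONT_EXTENSIONS


def filter_emails(candidates: list[str]) -> list[str]:
    """Drop candidates ending in an image or font extension, preserving order."""
    return [c for c in candidates if not looks_like_asset_filename(c)]


candidates = ["7@300x-100.jpg", "17@300x-100.jpg", "info@hotelsouthmelbourne.com"]
print(filter_emails(candidates))
# ['info@hotelsouthmelbourne.com']
```

Checking only the final extension keeps the test cheap and leaves ordinary addresses untouched, provided the blocklist holds asset extensions rather than anything used as a real top-level domain.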

Developer
Maintained by Apify
Actor metrics
  • 838 monthly users
  • 99.9% runs succeeded
  • 31 days response time
  • Created in May 2019
  • Modified 4 days ago