Email ✉️ & Phone ☎️ Extractor

7 days trial then $30.00/month - No credit card required now


anchor/email-phone-extractor

Extract emails, phone numbers, and other useful contact information (Twitter, LinkedIn, ...) from any list of websites you provide. Best tool for contact lead generation. Export data in structured formats and dominate your outreach game. Capture your leads almost for free, fast, and without limits.


Not extracting all data?

Closed

bomp opened this issue
3 months ago

Hi, I'm testing your scraper, but I can't find a way to make it work. For example, to start I would like to scrape all contact information for all the companies on this site: https://www.mwcbarcelona.com/exhibitors/

I have tested it, but it is not extracting all the data. See these examples:

https://www.mwcbarcelona.com/exhibitors/28808-alat

{
  "depth": 1,
  "url": "https://www.mwcbarcelona.com/exhibitors/28808-alat",
  "domain": "mwcbarcelona.com",
  "emails": [],
  "phones": [],
  "phonesUncertain": [],
  "linkedIns": [],
  "twitters": [ "https://twitter.com/mwchub" ],
  "instagrams": [ "https://www.instagram.com/gsmaonline/", "https://www.instagram.com/mwcseries/" ],
  "facebooks": [ "https://www.facebook.com/alatTechnologies/", "https://www.facebook.com/mobileworldcongress/", "https://www.facebook.com/share.php" ],
  "youtubes": [ null ]
}

Or this one:

{
  "depth": 1,
  "url": "https://www.mwcbarcelona.com/exhibitors/26905-kyndryl",
  "domain": "mwcbarcelona.com",
  "emails": [],
  "phones": [],
  "phonesUncertain": [],
  "linkedIns": [],
  "twitters": [ "https://twitter.com/mwchub" ],
  "instagrams": [ "https://www.instagram.com/gsmaonline/", "https://www.instagram.com/mwcseries/" ],
  "facebooks": [ "https://www.facebook.com/mobileworldcongress/", "https://www.facebook.com/share.php" ],
  "youtubes": [ "https://youtu.be/5JDPyJ3WKqY", "https://youtu.be/IVjhbOM31PI", null ]
}

In both cases it is not extracting the website domain, LinkedIn, or Twitter.

What am I doing wrong?


guillim (anchor)

3 months ago

Hello 👋

Thanks for your message. And congrats, you definitely caught a bug with Twitter! I forgot to update the link matching for x.com after Elon changed the domain, and that's why. I will push a fix in the coming days.
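To illustrate the kind of fix described here, below is a minimal Python sketch of a profile-link matcher that accepts both the legacy twitter.com host and the newer x.com host. The regex, the `twitter_handle` helper, and the list of excluded paths are hypothetical illustrations, not the Actor's actual code.

```python
import re
from typing import Optional

# Hypothetical pattern (not the Actor's real code): accept the legacy
# twitter.com host and the newer x.com host, with an optional "www."
# or "mobile." prefix, followed by a 1-15 character handle and an
# optional trailing slash, query string, or fragment.
PROFILE_RE = re.compile(
    r"https?://(?:www\.|mobile\.)?(?:twitter|x)\.com/([A-Za-z0-9_]{1,15})/?(?:[?#].*)?$",
    re.IGNORECASE,
)

# Illustrative, non-exhaustive set of first path segments that are not profiles.
NON_PROFILE_PATHS = {"intent", "share", "home", "search", "hashtag"}


def twitter_handle(url: str) -> Optional[str]:
    """Return the handle if `url` looks like a Twitter/X profile link, else None."""
    match = PROFILE_RE.match(url.strip())
    if not match or match.group(1).lower() in NON_PROFILE_PATHS:
        return None
    return match.group(1)


if __name__ == "__main__":
    for link in ("https://twitter.com/mwchub", "https://x.com/Kyndryl",
                 "https://x.com/alat_tech", "https://example.com/x"):
        print(link, "->", twitter_handle(link))
```

Matching both hosts in one pattern keeps older twitter.com links working while picking up x.com links such as https://x.com/Kyndryl.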

However, I am not sure I understand your second point about "domain", because in the example you quote I see "domain": "mwcbarcelona.com", which looks right to me. Can you tell me more?


bomp

3 months ago

Let me give you some examples. On this page, https://www.mwcbarcelona.com/exhibitors/26905-kyndryl, the social links and website are:

https://x.com/Kyndryl
https://www.linkedin.com/company/kyndryl/mycompany/
http://www.kyndryl.com/

On this other page, https://www.mwcbarcelona.com/exhibitors/28808-alat, they are:

https://x.com/alat_tech
https://www.linkedin.com/company/alat-technologies
https://www.facebook.com/alatTechnologies/
https://www.alat.com/
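As a rough illustration of what extracting these links involves, here is a minimal Python sketch that collects every `<a href>` on a page, drops links pointing back to the crawled site, and sorts the rest into social fields or an "other_external" bucket (where a company website such as http://www.kyndryl.com/ would land). The `SOCIAL_HOSTS` mapping, `classify_links`, and the "other_external" name are hypothetical; this is not how the Actor actually works. Note that it would presumably still pick up the event's own social accounts from the page header or footer (e.g. https://twitter.com/mwchub), so a real extractor would also need a way to scope to the exhibitor's section.

```python
from html.parser import HTMLParser
from typing import Dict, List
from urllib.parse import urljoin, urlparse

# Hypothetical host-to-field mapping, named after the output fields in the
# thread; NOT the Actor's real configuration. Only exact hosts are covered
# (e.g. "uk.linkedin.com" would fall through to "other_external").
SOCIAL_HOSTS = {
    "twitter.com": "twitters",
    "x.com": "twitters",
    "linkedin.com": "linkedIns",
    "facebook.com": "facebooks",
    "instagram.com": "instagrams",
    "youtube.com": "youtubes",
    "youtu.be": "youtubes",
}


class LinkCollector(HTMLParser):
    """Collects absolute URLs from every <a href="..."> on a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


def bare_host(url: str) -> str:
    """Lower-cased host name without a leading "www."."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def classify_links(html: str, page_url: str) -> Dict[str, List[str]]:
    collector = LinkCollector(page_url)
    collector.feed(html)
    page_host = bare_host(page_url)
    result: Dict[str, List[str]] = {}
    for link in collector.links:
        if urlparse(link).scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, javascript:, ...
        host = bare_host(link)
        if host == page_host:
            continue  # internal link on the crawled site itself
        field = SOCIAL_HOSTS.get(host, "other_external")
        bucket = result.setdefault(field, [])
        if link not in bucket:
            bucket.append(link)  # keep first-seen order, no duplicates
    return result


if __name__ == "__main__":
    # Made-up HTML that mimics the links listed for the Kyndryl page.
    sample = (
        '<a href="/exhibitors/">Back to exhibitors</a>'
        '<a href="https://x.com/Kyndryl">X</a>'
        '<a href="https://www.linkedin.com/company/kyndryl/mycompany/">LinkedIn</a>'
        '<a href="http://www.kyndryl.com/">Website</a>'
    )
    print(classify_links(sample, "https://www.mwcbarcelona.com/exhibitors/26905-kyndryl"))
```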

I have two questions:

  1. Can you make the Actor extract those links from the page?
  2. Is there any way to restrict/limit the crawled pages using a regex? For example, with the start URL https://www.mwcbarcelona.com/exhibitors/, the Actor would only scrape pages whose links follow this format: https://www.mwcbarcelona.com/exhibitors/[^\s]+

Here are some example matches with this regex:

https://www.mwcbarcelona.com/exhibitors/26881-zeroerror-ai
https://www.mwcbarcelona.com/exhibitors/26253-6wind
https://www.mwcbarcelona.com/exhibitors/28761-a-champs-interactive-training-solutions
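The regex from the question can be checked against these URLs with a few lines of Python. This is a sketch for testing the pattern only; `should_enqueue` is a hypothetical helper, not part of the Actor, and the literal dots are escaped here so that "." matches only a dot.

```python
import re

# Pattern from the question with the literal dots escaped.
# fullmatch() requires the entire URL to match, not just a prefix.
EXHIBITOR_RE = re.compile(r"https://www\.mwcbarcelona\.com/exhibitors/[^\s]+")


def should_enqueue(url: str) -> bool:
    """Hypothetical crawl filter: keep only exhibitor detail pages."""
    return EXHIBITOR_RE.fullmatch(url) is not None


if __name__ == "__main__":
    candidates = [
        "https://www.mwcbarcelona.com/exhibitors/26881-zeroerror-ai",
        "https://www.mwcbarcelona.com/exhibitors/26253-6wind",
        "https://www.mwcbarcelona.com/exhibitors/28761-a-champs-interactive-training-solutions",
        "https://www.mwcbarcelona.com/exhibitors/",  # the listing page itself
        "https://www.mwcbarcelona.com/agenda",       # hypothetical unrelated page
    ]
    for url in candidates:
        print(should_enqueue(url), url)
```

Because `+` requires at least one character after `/exhibitors/`, the listing page itself does not match; whether the start URL is still crawled regardless of the filter depends on the tool. Also note that `[^\s]+` accepts any non-whitespace suffix, including deeper paths and query strings such as `/exhibitors/?page=2`.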

Thanks


guillim (anchor)

2 months ago

Hello,

1 - You were right about the missing Twitter and LinkedIn links. I pushed a fix today, and you should see them in the results now. About the "domain", however, it's out of the scope of this Actor because of the number of external links websites usually have. Is that something you really need? If so, I would like more info on why it is necessary so that I can find a way.

2 - Yes, you can restrict the URLs with the "Pseudo urls" property, which does exactly what you request (see the INPUT documentation).
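On the Apify platform, a pseudo-URL is written as a URL in which the parts enclosed in square brackets are treated as regular expressions, so a pattern like `https://www.mwcbarcelona.com/exhibitors/[.+]` would correspond to the regex in the question. The exact input format and matching rules of this Actor's "Pseudo urls" field are not confirmed here. The following Python function is a simplified, hypothetical re-implementation of the idea, meant only to show how a pseudo-URL maps to a regex; it assumes the bracketed regex does not itself contain square brackets.

```python
import re


def pseudo_url_to_regex(purl: str):
    """Simplified sketch: text outside [...] is literal, text inside is a regex.

    Assumption: bracketed parts contain no nested '[' or ']'.
    """
    parts = []
    pos = 0
    for m in re.finditer(r"\[(.*?)\]", purl):
        parts.append(re.escape(purl[pos:m.start()]))  # literal URL text
        parts.append("(?:" + m.group(1) + ")")         # embedded regex
        pos = m.end()
    parts.append(re.escape(purl[pos:]))                # trailing literal text
    # Anchor both ends so the whole URL must match.
    return re.compile("^" + "".join(parts) + "$")


if __name__ == "__main__":
    rx = pseudo_url_to_regex("https://www.mwcbarcelona.com/exhibitors/[.+]")
    for url in ("https://www.mwcbarcelona.com/exhibitors/26253-6wind",
                "https://www.mwcbarcelona.com/exhibitors/"):
        print(bool(rx.match(url)), url)
```

Escaping the literal parts matters: without `re.escape`, the dots in the host name would match any character.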

Developer
Maintained by Community
Actor metrics
  • 260 monthly users
  • 25 stars
  • 97.3% runs succeeded
  • 17 hours response time
  • Created in Oct 2021
  • Modified 8 days ago