Email ✉️ & Phone ☎️ Extractor
7-day trial, then $30.00/month - No credit card required now
Extract emails, phone numbers, and other contact information such as Twitter, LinkedIn, and Instagram profiles from the websites you provide. Best for lead generation and data enrichment. Export the data in structured formats and dominate your outreach game. Capture your leads fast, almost for free, and without limits.
Hi, I'm testing your scraper, but I'm still trying to find the way to make it work. For example, to start I would like to scrape all contact information from all companies on this site: https://www.mwcbarcelona.com/exhibitors/ I have tested it, but it is not extracting all the data. See these examples:
https://www.mwcbarcelona.com/exhibitors/28808-alat
{
  "depth": 1,
  "url": "https://www.mwcbarcelona.com/exhibitors/28808-alat",
  "domain": "mwcbarcelona.com",
  "emails": [],
  "phones": [],
  "phonesUncertain": [],
  "linkedIns": [],
  "twitters": [
    "https://twitter.com/mwchub"
  ],
  "instagrams": [
    "https://www.instagram.com/gsmaonline/",
    "https://www.instagram.com/mwcseries/"
  ],
  "facebooks": [
    "https://www.facebook.com/alatTechnologies/",
    "https://www.facebook.com/mobileworldcongress/",
    "https://www.facebook.com/share.php"
  ],
  "youtubes": [
    null
  ]
},
Or this one
{ "depth": 1, "url": "https://www.mwcbarcelona.com/exhibitors/26905-kyndryl", "domain": "mwcbarcelona.com", "emails": [], "phones": [], "phonesUncertain": [], "linkedIns": [], "twitters": [ "https://twitter.com/mwchub" ], "instagrams": [ "https://www.instagram.com/gsmaonline/", "https://www.instagram.com/mwcseries/" ], "facebooks": [ "https://www.facebook.com/mobileworldcongress/", "https://www.facebook.com/share.php" ], "youtubes": [ "https://youtu.be/5JDPyJ3WKqY", "https://youtu.be/IVjhbOM31PI", null ] },
In both cases it... [trimmed]
Hello 👋
Thanks for your message. And congratulations, you definitely caught a bug for Twitter! I forgot to account for the x.com links since Elon changed the domain, that's why! I will push a fix in the coming days.
However, I am not sure I understand your second point about "domain", because in the example you quote I see domain: mwcbarcelona.com, which sounds right to me. Can you tell me more?
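For context, a minimal sketch of what such a fix could look like, assuming the extractor finds profile links with a regular expression (the pattern and helper name below are illustrative, not the actor's actual code):

    // Match profile links on both the old twitter.com domain and the new x.com domain.
    const TWITTER_OR_X_REGEX = /https?:\/\/(?:www\.)?(?:twitter\.com|x\.com)\/[A-Za-z0-9_]{1,15}/g;

    function extractTwitterLinks(html: string): string[] {
      // Deduplicate so the same profile is not reported twice per page.
      return [...new Set(html.match(TWITTER_OR_X_REGEX) ?? [])];
    }

    // extractTwitterLinks('<a href="https://x.com/Kyndryl">Follow us</a>')
    // -> ["https://x.com/Kyndryl"]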
Let me give you some examples. In this case: https://www.mwcbarcelona.com/exhibitors/26905-kyndryl the social links and website are:
https://x.com/Kyndryl
https://www.linkedin.com/company/kyndryl/mycompany/
http://www.kyndryl.com/
In this other case: https://www.mwcbarcelona.com/exhibitors/28808-alat they are:
https://x.com/alat_tech
https://www.linkedin.com/company/alat-technologies
https://www.facebook.com/alatTechnologies/
https://www.alat.com/
I have two questions:
- Can you make the actor extract those links from the page?
- Is there any way to restrict or limit the pages that are crawled using a regex, so that, for example, with this start URL: https://www.mwcbarcelona.com/exhibitors/ the actor only scrapes the pages whose links follow this format: https://www.mwcbarcelona.com/exhibitors/[^\s]+ (see the small filtering sketch after the example matches below)
Here are some example matches with this regex:
https://www.mwcbarcelona.com/exhibitors/26881-zeroerror-ai
https://www.mwcbarcelona.com/exhibitors/26253-6wind
https://www.mwcbarcelona.com/exhibitors/28761-a-champs-interactive-training-solutions
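A minimal sketch of that kind of filter, assuming the crawler collects candidate links and keeps only the ones matching the exhibitor pattern (the variable and function names are illustrative, not part of the actor):

    // Keep only exhibitor detail pages; the bare listing URL and unrelated pages are dropped.
    const EXHIBITOR_REGEX = /^https:\/\/www\.mwcbarcelona\.com\/exhibitors\/[^\s]+$/;

    const candidates: string[] = [
      "https://www.mwcbarcelona.com/exhibitors/26881-zeroerror-ai",
      "https://www.mwcbarcelona.com/exhibitors/",
      "https://www.mwcbarcelona.com/agenda/",
    ];

    const toCrawl = candidates.filter((url) => EXHIBITOR_REGEX.test(url));
    // toCrawl -> ["https://www.mwcbarcelona.com/exhibitors/26881-zeroerror-ai"]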
Thanks
Hello,
1 - You were right about the Twitter and LinkedIn links that were missing. I pushed a fix today, and you should be able to see the results now. About the "domain", however, it's out of the scope of this actor due to the number of external links websites usually have. Is that something you really need to get? If so, I would like more info on why it is necessary so that I can find a way.
2 - Yes, you can restrict the URLs using the "Pseudo urls" property, which does exactly what you request (see the INPUT documentation).
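For reference, a hedged sketch of what that input could look like, assuming the actor follows the common Apify convention where a pseudo-URL wraps a regular expression in square brackets; the exact field names (startUrls, pseudoUrls) are assumptions, so check the actor's INPUT documentation:

    // Hypothetical run input restricting the crawl to exhibitor detail pages.
    const input = {
      startUrls: [{ url: "https://www.mwcbarcelona.com/exhibitors/" }],
      pseudoUrls: [
        // Square brackets wrap a regular expression, per the Apify pseudo-URL convention.
        { purl: "https://www.mwcbarcelona.com/exhibitors/[.+]" },
      ],
    };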
Actor Metrics
190 monthly users
50 stars
>99% runs succeeded
3.3 hours response time
Created in Oct 2021
Modified 2 months ago