Email ✉️ & Phone ☎️ Extractor

7 days trial then $30.00/month - No credit card required now

anchor/email-phone-extractor

Extract emails, phone numbers, and other contact information like Twitter, LinkedIn, Instagram... from websites you provide. Best for lead generation and data enrichment. Export data in structured formats and dominate your outreach game. Capture your leads almost for free, fast, and without limits.

Does not really work

Closed

spr123 opened this issue 3 months ago
Anchor (anchor)

3 months ago

Hello, thanks so much for using my actor, and thank you also for your feedback; it's very valuable to me. I ran it on my side with the same Start URL and it found 4 emails and some phone numbers. So it's working, and the problem you have is a bit different. I looked at your logs and found that you need more memory. It's very easy to fix: in the INPUT settings, open "Run options" and increase the memory (1024 MB is the minimum to make sure it works). See the attached images for more details. Hope it helps.

spr123

3 months ago

I increased it as you mentioned, but it seemed to be making a lot of requests (85), so I aborted it. It was also expensive because of all these requests.

How do I get the actor to only search the Home page and the Contact us page?

It does not make any sense; it just crawls through tons of pages hoping to find contact details.

For example, it searched this page:

https://mosmannaustralia.com/en-gb/collections/all-bamboo-underwear

Would it not make sense for the actor to search the home page and, if nothing is found, then look at links like "Contact us"?

Anchor (anchor)

3 months ago

I agree. Gimme 10 min and I will answer your question.

Anchor (anchor)

3 months ago

I think it's a simple fix: in the INPUT settings, please set the maximum pages per start URL to 1. This should do exactly what you want. You can also select "stay within domain", but that is more for when you need more than 1 page per start URL. Let me know if it works.

Anchor (anchor)

3 months ago

Can you try setting "maximum link depth" to 0?

spr123

3 months ago

Nope, I got zero results. Any chance you could test this? This is costing me money every time I run it. You have the URL :)

Anchor (anchor)

3 months ago

I did run it on my side, and it works fine; that's what I was saying in my first message! Here are my results (see pictures). I also took a screenshot of my INPUT so you can see exactly what I put. I hope it helps.

spr123

3 months ago

Are you putting in the URL of the contact page? We added the home page. So, just to clarify: we have 1000s of home page URLs we want to scan. What we are asking is that, if there are no contact details on the home page, the actor then scans pages with, for example, the word "contact" in them.

Anchor (anchor)

3 months ago

Two options for the behaviour of the Actor:

  • you put a list of URLs with "maximum link depth" to 0 and if there are no emails, it will return nothing and stop there
  • you allow the crawler to click on links and see by itself. However, there is no conditional "if word contact in the link" that you can use, unless you add a list of pseudoURLs with the regex for all the homepages and the word contact. It sounds to me like a very complicated way, though. Let me know if it helps.

Anchor (anchor)

3 months ago
  • Last option, you put directly the contact pages instead of the homepages

spr123

3 months ago

"Last option, you put directly the contact pages instead of the homepages"

Not an option: we have 1000s of home page URLs, and I don't think other users would have the actual contact page URL either. People using scrapers like yours would generally have the home page.

"you put a list of URLs with "maximum link depth" to 0 and if there are no emails, it will return nothing and stop there"

This will only pull contact details if they are on the home page; many, many websites do not have their contact information on the home page but on another page.

"you allow the crawler to click on links and see by itself. "

This makes no sense: you could be crawling 100s of pages that will never have contact information, like the URL I gave earlier:

https://mosmannaustralia.com/en-gb/collections/all-bamboo-underwear

"However, there is no conditional 'if word contact in the link' that you can use, unless you add a list of pseudoURLs with the regex for all the homepages and the word contact. It sounds to me like a very complicated way, though. Let me know if it helps."

I am not sure I understand your last comment.

Surely it would make sense to scan the home page and also a page that contains the word "Contact". I don't see how this is complicated at all; it makes a lot of sense to me. Your actor is made to scrape contact details, right?

As per the example below, many, many websites do not have their contact details on the home page, but on a separate page like the one below.... [trimmed]

Anchor (anchor)

3 months ago

I found an easy way. In the settings, add a pseudoURL: it's a regex rule that every link must match, otherwise it's discarded.

In our case, something like

.*(contact|help|service).*

Should do the trick. Feel free to test your own regex using regexr.com for instance.
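
As a quick offline check of which links that pattern keeps, here is a small Python sketch using the standard `re` module as a stand-in for the Actor's own matching. It assumes the rule is applied to the full URL and is case-sensitive; how the Actor actually evaluates pseudoURLs may differ, and the example.com links are made up for illustration.

```python
import re

# The pattern suggested above. re.fullmatch requires the whole URL to match,
# which the leading and trailing ".*" allow. Matching is case-sensitive.
PSEUDO_URL_PATTERN = re.compile(r".*(contact|help|service).*")

links = [
    "https://mosmannaustralia.com/en-gb/collections/all-bamboo-underwear",
    "https://example.com/pages/contact-us",  # hypothetical link
    "https://example.com/customer-service",  # hypothetical link
    "https://example.com/Contact",           # hypothetical link, capital C
]

# Keep only the links that a crawler applying this rule would follow.
kept = [url for url in links if PSEUDO_URL_PATTERN.fullmatch(url)]

for url in links:
    print("follow " if url in kept else "discard", url)
```

The collection page from this thread is discarded, and so is `/Contact` because of the capital C. In Python a case-insensitive variant is `(?i).*(contact|help|service).*`; whether the Actor accepts inline flags like this is not confirmed here.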

Let me know if it solves your inquiry

Anchor (anchor)

3 months ago

For your info, I tested it and it was working 🙂

spr123

3 months ago

Anchor (anchor)

3 months ago

Yeah, it's because you forgot to change:

  • the "maxDepth" to something like 2 (so that the Actor does not stop at the Start URL)
  • the "maxRequests" to something like 100 (so that the Actor can click on a few links without being blocked by this limit)
  • the "maxRequestsPerStartUrl" to something like 3 (so that the Actor can click on at least one link, if you want it to click on the contact page for instance)
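
Put together, the settings in the list above might look like the run input below, written as a Python dict and printed as JSON. This is a hypothetical sketch: `maxDepth`, `maxRequests` and `maxRequestsPerStartUrl` are the names used in this thread, while the `startUrls` and `pseudoUrls` keys and their shapes are assumptions, not confirmed from the Actor's input schema.

```python
import json

# Hypothetical run input. Only maxDepth, maxRequests and
# maxRequestsPerStartUrl are named in this thread; the "startUrls" and
# "pseudoUrls" keys and their shapes are assumptions.
run_input = {
    "startUrls": [{"url": "https://example.com/"}],           # assumed shape
    "pseudoUrls": [{"purl": ".*(contact|help|service).*"}],  # assumed shape
    "maxDepth": 2,                # do not stop at the Start URL
    "maxRequests": 100,           # high enough not to block a few extra clicks
    "maxRequestsPerStartUrl": 3,  # Start URL plus at least one followed link
}

print(json.dumps(run_input, indent=2))
```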

Anchor (anchor)

3 months ago

I see that you set "onlyOneEmailPerDomain": true. Note that once the Actor finds at least 1 email, it will skip any other URL with the same domain. So if you want emails from different pages of the same website, you had better change this to false.
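
The skipping rule described above can be modelled in a few lines. This is a hypothetical Python sketch of the behaviour as stated in this message (once a domain yields an email, its remaining URLs are skipped), not the Actor's actual code; the function name, URLs and emails are invented for illustration.

```python
from urllib.parse import urlparse


def urls_processed(urls, emails_on_page, only_one_email_per_domain):
    """Return the URLs that get processed, in crawl order.

    urls: URLs in the order the crawler would reach them.
    emails_on_page: maps a URL to the emails found on it (made-up data).
    """
    domains_with_email = set()
    processed = []
    for url in urls:
        domain = urlparse(url).hostname
        if only_one_email_per_domain and domain in domains_with_email:
            continue  # this domain already produced an email, so skip it
        processed.append(url)
        if emails_on_page.get(url):
            domains_with_email.add(domain)
    return processed


urls = [
    "https://example.com/",
    "https://example.com/contact",
    "https://example.org/",
]
emails = {
    "https://example.com/": ["sales@example.com"],
    "https://example.com/contact": ["support@example.com"],
}

print(urls_processed(urls, emails, only_one_email_per_domain=True))
print(urls_processed(urls, emails, only_one_email_per_domain=False))
```

With the flag on, the contact page (and the second email on it) is never reached; with it off, all three URLs are processed.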

spr123

3 months ago

Getting closer thanks :)

The phone number is wrong, though; it does not seem to handle spaces.

Any chance you can also scrape addresses?

Anchor (anchor)

3 months ago

I fixed it. You should now see Australian numbers (at least the one from your website).
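
Handling spaces means the phone pattern has to allow separators between digits. The sketch below is hypothetical, since the Actor's actual fix is not shown in this thread: it assumes Australian landline and mobile numbers (a leading 0 or +61, then 2, 3, 4, 7 or 8, then eight more digits), allows a single space or hyphen between digits, and strips the separators afterwards. Formats such as 1300 numbers are not covered, and the sample numbers are made up.

```python
import re

# Hypothetical pattern: +61 or a leading 0, then an area/mobile digit
# (2, 3, 4, 7 or 8), then exactly 8 more digits, each optionally preceded
# by one space or hyphen.
AU_PHONE = re.compile(r"(?:\+61[ -]?|0)[2-478](?:[ -]?\d){8}")


def find_au_phones(text):
    """Return matched numbers with spaces and hyphens removed."""
    return [re.sub(r"[ -]", "", m.group(0)) for m in AU_PHONE.finditer(text)]


sample = "Call us on 02 9999 9999 or +61 412 345 678."  # made-up numbers
print(find_au_phones(sample))  # ['0299999999', '+61412345678']
```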

As for postal addresses, I am sorry, but that is technically impossible due to the variety of ways they are written and formatted.

Hope you finally get something working out !!!

Let me know if I can close this issue after that.

Anchor (anchor)

3 months ago

Closing, but feel free to reopen if necessary.

Developer
Maintained by Community

Actor Metrics

  • 190 monthly users

  • 50 stars

  • >99% runs succeeded

  • 3.3 hours response time

  • Created in Oct 2021

  • Modified 2 months ago