Email ✉️ & Phone ☎️ Extractor
7 days trial then $30.00/month - No credit card required now
Extract emails, phone numbers, and other contact information like Twitter, LinkedIn, Instagram... from websites you provide. Best for lead generation and data enrichment. Export data in structured formats and dominate your outreach game. Capture your leads almost for free, fast, and without limits.
Did not pick up email, phone or address
https://mosmannaustralia.com/pages/customer-service-help-page
Hello, thanks so much for using my Actor, and thank you for your feedback; it's very valuable to me. I ran the Actor on my side with the same Start URL and it found 4 emails and some phone numbers. So it is working, and the problem you have is a bit different. I looked at your logs and found that the run needs more memory. It's very easy to fix: in the INPUT settings, open "Run options" and add more memory (1024 is the minimum to make sure it works). See the attached images for more details. Hope it helps.
I increased it as you mentioned, but it seemed to be making a great many requests (85), so I aborted it. (It was also expensive because of all these requests.)
How do I get the actor to search only the Home page and the Contact us page?
It does not make any sense; it just crawls through tons of pages hoping to find contact details.
For example, it searched this page:
https://mosmannaustralia.com/en-gb/collections/all-bamboo-underwear
Would it not make sense for the actor to search the home page first and, if nothing is found, then look at links like Contact us?
I agree. Gimme 10min and I will answer your question
I think it's a simple fix: in the INPUT settings, please set the maximum pages per start URL to 1. This should do exactly what you want. You can also select "stay within domain", but that is more for when you need more than 1 page per start URL. Let me know if it works.
No, it did not work, unfortunately.
https://console.apify.com/actors/bxrabKhLv1c3fLmoj/runs/ngBV1HMYik6stXQei#output
Can you try setting "maximum link depth" to 0?
Nope, got zero results. Any chance you could test this? This is costing me money every time I run it. You have the URL :)
I did run it on my side, and it works fine. That's what I was saying in my first message! Here are my results (see pictures). I also took a screenshot of my INPUT so you can see exactly what I entered. I hope it helps.
Are you putting in the URL of the contact page? We added the home page. Just to clarify: we have thousands of home page URLs that we want to scan. We are asking whether, if there are no contact details on the home page, the actor can then scan other pages, for example those with the word "contact" in them.
There are three options for the behaviour of the Actor:
- you put a list of URLs with "maximum link depth" to 0 and if there are no emails, it will return nothing and stop there
- you allow the crawler to click on links and see by itself. However, there is no conditional "if word contact in the link" that you can use, unless you add a list of pseudoURLs with the regex for all the homepages and the word contact. It sounds to me like a very complicated way though. Let me know if it helps
- Last option, you put directly the contact pages instead of the homepages
"Last option, you put directly the contact pages instead of the homepages"
Not an option: we have thousands of home page URLs, and I don't think other users would have the actual contact page URL either. People using scrapers like yours would generally only have the home page.
"you put a list of URLs with "maximum link depth" to 0 and if there are no emails, it will return nothing and stop there"
This will only pull contact details if they are on the home page. Many websites do not put their contact information on the home page but on another page.
"you allow the crawler to click on links and see by itself. "
This makes no sense: you could be crawling hundreds of pages that will never have contact information, like the URL I gave earlier:
https://mosmannaustralia.com/en-gb/collections/all-bamboo-underwear
"However, there is no conditional "if word contact in the link" that you can use. unless you add a list of pseudoURLs with the regex for all the homepages and the word contact. It sounds to me like a very complicated way though. Let me know if it helps."
I am not sure I understand your last comment.
Surely it would make sense to be able to scan the home page and also scan a page that contains the word "Contact". I don't see how this is complicated at all, and it makes a lot of sense to me. Your actor is made to scrape contact details, right?
as per the below example, and many many websites do not have their contact details on the home page, but a separate page like the below one.... [trimmed]
I found an easy way. In the settings, add a pseudoURL: it's a regex rule that every link must match, otherwise the link is discarded.
In our case, something like
.*(contact|help|service).*
should do the trick. Feel free to test your own regex using regexr.com, for instance.
Let me know if it solves your problem.
For your info, I tested it and it was working 🙂
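As a quick local check of the pattern above, here is a minimal Python sketch. How the Actor applies the rule internally is not shown in this thread, so the use of a full-string match, the case-insensitive flag, and the helper name `keep_link` are all assumptions made for illustration. The URLs are the ones mentioned earlier in this thread.

```python
import re

# Pattern suggested in the thread. IGNORECASE is an assumption added here so
# that links such as ".../Contact-Us" would also be kept.
LINK_FILTER = re.compile(r".*(contact|help|service).*", re.IGNORECASE)


def keep_link(url: str) -> bool:
    """Hypothetical helper: True if the whole URL matches the link filter."""
    return LINK_FILTER.fullmatch(url) is not None


if __name__ == "__main__":
    candidates = [
        "https://mosmannaustralia.com/pages/customer-service-help-page",
        "https://mosmannaustralia.com/en-gb/collections/all-bamboo-underwear",
    ]
    for url in candidates:
        # Prints whether each link would be followed or discarded.
        print("keep" if keep_link(url) else "skip", url)
```

With this filter, the customer service page is kept and the product collection page is skipped, which is the behaviour asked for above.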
Apologies, it did not seem to work for me.
https://console.apify.com/actors/runs/Q6LBcfKnG48TqItY6#output
Yes, that's because you forgot to change:
- the "maxDepth" to something like 2 (so that the Actor does not stop at the Start URL)
- the "maxRequests" to something like 100 (so that the Actor can click on a few links without being blocked by this limit)
- the "maxRequestsPerStartUrl" to something like 3 (so that the Actor can click on at least one link, if you want it to click on the contact page for instance)
I see that you decided to set "onlyOneEmailPerDomain": true. Note that once the Actor finds at least 1 email, it will skip any other URL with the same domain. So if you want emails from the same website but from different pages, you had better change this to false.
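Put together, the settings discussed above might look like the following sketch. Only `maxDepth`, `maxRequests`, `maxRequestsPerStartUrl` and `onlyOneEmailPerDomain` are named in this thread; the `startUrls` and `pseudoUrls` keys, their shapes, and the helper `build_run_input` are assumptions for illustration, so check the Actor's INPUT tab for the real field names before using it.

```python
import json


def build_run_input(start_urls, link_pattern):
    """Hypothetical helper that assembles the INPUT discussed in the thread."""
    return {
        # Assumed key name and shape for the list of home pages.
        "startUrls": [{"url": url} for url in start_urls],
        # Assumed key name for the link filter rule.
        "pseudoUrls": [link_pattern],
        # Values below are the ones suggested in the thread.
        "maxDepth": 2,                   # do not stop at the Start URL
        "maxRequests": 100,              # overall request limit
        "maxRequestsPerStartUrl": 3,     # allow at least one followed link
        "onlyOneEmailPerDomain": False,  # keep emails from several pages
    }


if __name__ == "__main__":
    run_input = build_run_input(
        ["https://mosmannaustralia.com/"],
        r".*(contact|help|service).*",
    )
    print(json.dumps(run_input, indent=2))
```

Keeping `maxRequestsPerStartUrl` low is what limits the cost per website, while `maxRequests` caps the whole run, so the latter would likely need raising for a list of thousands of home pages.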
Getting closer thanks :)
The phone number is wrong though; it does not seem to like spaces.
Any chance you can also scrape addresses?
I fixed it. You should now see Australian numbers (at least the one from your website).
About the postal address, I am sorry, but that is technically impossible due to the variety of ways addresses are written and formatted.
Hope you finally get something working!
Let me know if I can close this issue after that.
Closing but feel free to reopen if necessary
Actor Metrics
190 monthly users
50 stars
>99% runs succeeded
3.3 hours response time
Created in Oct 2021
Modified 2 months ago