Email ✉️ & Phone ☎️ Extractor

anchor/email-phone-extractor

7 days trial then $30.00/month - No credit card required now
Extract emails, phone numbers, and other useful contact information from any list of websites you provide. Lets you specify URL patterns for your bot to follow. Export your data in structured formats and dominate your outreach game.


Don't 100% understand these 3 settings

Closed

Michall2117 opened this issue
19 days ago

What is the difference between these 3: Maximum pages per start URL, Maximum link depth, Total maximum pages?


Michall2117

19 days ago

see image


guillim (anchor)

19 days ago

Hello and thanks for your message

I will update the documentation so that it is better from now on.

Here is what I can tell you about the three settings meanwhile:

Maximum pages per start URL: This setting determines the maximum number of pages the crawler will fetch for each start URL you provide. For example, if this setting is set to 100, the crawler will fetch at most 100 pages from each start URL during its crawl.

Maximum link depth: This setting specifies the maximum number of links away from the initial URL that the crawler will follow during its traversal. In other words, it limits how many levels deep the crawler will explore from the starting URL. For instance, if the maximum link depth is set to 3, the crawler will only follow links up to three levels away from the initial URL. Basically, links that are not directly reachable from the URL you specify have a lower chance of being crawled.

Total maximum pages: This setting sets the overall limit on the total number of pages fetched during the crawling process. Once this limit is reached, the crawler stops fetching additional pages, regardless of how many pages it has fetched from each individual start URL or the depth of the links it has followed.
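To make the interplay between the three limits concrete, here is a minimal Python sketch of a breadth-first crawler that enforces all three caps. All names (`crawl`, `get_links`, the parameters) are hypothetical illustrations, not the actor's real implementation:

```python
from collections import deque

def crawl(start_urls, get_links, max_pages_per_start_url,
          max_link_depth, total_max_pages):
    """Illustrative sketch only: shows how the three limits
    could interact, not the actor's actual crawler."""
    crawled = []
    for start in start_urls:
        if len(crawled) >= total_max_pages:
            break  # Total maximum pages: global cap across all start URLs
        pages_for_this_start = 0
        queue = deque([(start, 0)])  # (url, depth from this start URL)
        seen = {start}
        while queue:
            url, depth = queue.popleft()
            if len(crawled) >= total_max_pages:
                break  # global cap hit mid-crawl
            if pages_for_this_start >= max_pages_per_start_url:
                break  # Maximum pages per start URL: per-start cap
            crawled.append(url)
            pages_for_this_start += 1
            if depth < max_link_depth:  # Maximum link depth: stop expanding
                for link in get_links(url):
                    if link not in seen:
                        seen.add(link)
                        queue.append((link, depth + 1))
    return crawled
```

Each limit cuts the crawl off independently: whichever cap is hit first wins.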

I hope it helps


Michall2117-owner

17 days ago


Your request (166689) has been answered - see below. You can add further comments by replying to this email.


Michael R, Apr 4, 2024, 15:25 CEST

Hi and thanks

so as I understand you then:

Maximum pages per start URL: will "always" stay on the domain - e.g. www.domain1. /home - www.domain1. /contact - www.domain1. /product etc. ??

Maximum link depth: will "count" all other domains - e.g. starting URL: www.domain1. /home - www.domain2. /home - www.domain3. /home etc. ??

Total maximum pages: will count all the sites that are visited - e.g. www.domain1. /home + www.domain1. /contact + www.domain1. /product + www.domain2. /home + www.domain3. /home = 5 sites - and if my list is 100 domains and I only set this to 5, then it will stop crawling after the first 2 domains ??

Have a lovely day!

Kind regards

Michael Roger

Phone: +45 7060 3553

Save NoR ApS / CVR: 10101913

Just-Half-Price dk / Travel-Deal dk / My-Price dk / Deal-Koeb dk




guillim (anchor)

16 days ago

Almost :)

"Maximum pages per start URL: will "always" stay on the domain - e.g. www.domain1. /home - www.domain1. /contact - www.domain1. /product etc. ??"

---> When you talk about staying on the domain, that is related to another parameter called "Stay within domain". If you don't check this box, then if domain1 has a link to domain2, your crawler will follow that link, and it will count toward the "Maximum pages per start URL" limit. To avoid this behaviour, you need to check the box or specify some pseudoUrls.
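Roughly, "Stay within domain" acts as a same-host filter on each discovered link. A minimal Python sketch of that idea (the `should_follow` helper is hypothetical, not the actor's code):

```python
from urllib.parse import urlparse

def should_follow(link, start_url, stay_within_domain):
    """Hypothetical sketch of a 'Stay within domain' check."""
    if not stay_within_domain:
        # Cross-domain links are followed too, and they still count
        # toward the per-start-URL page limit.
        return True
    # With the box checked, only links on the same host are followed.
    return urlparse(link).netloc == urlparse(start_url).netloc
```

With the box checked, a link from domain1 to domain2 is simply skipped instead of consuming the page budget.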

"Maximum link depth: will "count" all other domains - e.g. starting URL: www.domain1. /home - www.domain2. /home - www.domain3. /home etc. ??"

---> yes

"Total maximum pages: will count all the sites that are visited - e.g. www.domain1. /home + www.domain1. /contact + www.domain1. /product + www.domain2. /home + www.domain3. /home = 5 sites - and if my list is 100 domains and I only set this to 5, then it will stop crawling after the first 2 domains ??"

---> You almost got it right. In your case, you have listed 5 pages. So if your pages limit is 5, domain1, domain2, and domain3 will get crawled. But if I add another one, it will not:

1. www.domain1. /home -> crawled
2. www.domain1. /contact -> crawled
3. www.domain1. /product -> crawled
4. www.domain2. /home -> crawled
5. www.domain3. /home -> crawled
6. www.domain4. /home -> not crawled
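The cutoff in the list above is just a global counter over pages in the order the crawler reaches them. A tiny Python sketch of that idea (the queue contents mirror the example; variable names are made up):

```python
# Hypothetical queue of pages in the order the crawler reaches them.
queue = [
    "www.domain1. /home",
    "www.domain1. /contact",
    "www.domain1. /product",
    "www.domain2. /home",
    "www.domain3. /home",
    "www.domain4. /home",
]
TOTAL_MAX_PAGES = 5

crawled = queue[:TOTAL_MAX_PAGES]  # the first five pages are fetched
skipped = queue[TOTAL_MAX_PAGES:]  # the sixth page is never visited
```

So the limit counts pages, not domains: a single domain with many pages can use up the whole budget.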


Closing this as a stale issue. Feel free to reopen.

Developer
Maintained by Community
Actor metrics
  • 218 monthly users
  • 99.6% runs succeeded
  • 1.2 days response time
  • Created in Oct 2021
  • Modified about 1 month ago