
Naked Domains Analyzer

jancurn/analyze-domains

Crawls and downloads web pages hosted on a provided list of naked domains (e.g. example.com). The actor stores an HTML snapshot, a screenshot, the text body, and the HTTP response headers of every page. It also extracts email addresses, phone numbers, and social handles for Facebook, Twitter, LinkedIn, and Instagram.


Author: Jan Čurn
  • Modified
  • Users: 103
  • Runs: 2,960

Domains

domains

Optional

string

List of domains to crawl. The domains must be naked, i.e. specified without a protocol and sub-domains (e.g. example.com).
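As a rough illustration of what "naked" means here, the following sketch rejects values that carry a protocol, a path, or a www. prefix. The helper name looks_like_naked_domain is hypothetical and not part of the actor, and the check is only a heuristic: without a public suffix list it cannot tell that a value such as blog.example.com contains a sub-domain.

```python
import re

# Hypothetical helper, not part of the actor: a rough heuristic for
# spotting values that are clearly NOT naked domains.
_LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
_DOMAIN_RE = re.compile(rf"{_LABEL}(?:\.{_LABEL})+")


def looks_like_naked_domain(value: str) -> bool:
    """Return True if value has no protocol, no path, and no www. prefix."""
    value = value.strip().lower()
    if "://" in value or "/" in value or value.startswith("www."):
        return False
    # Require at least two dot-separated labels, e.g. "example.com".
    return _DOMAIN_RE.fullmatch(value) is not None
```

Running such a check before starting the actor can catch entries like http://example.com or www.example.com early.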

Domains file URL

domainsFileUrl

Optional

string

URL of a text file that contains the list of domains to crawl. The domains must be naked, i.e. specified without a protocol and sub-domains (e.g. example.com). This field is useful if you have a large number of domains.

Domains file offset

domainsFileOffset

Optional

integer

Indicates how many domains from the file should be skipped at the beginning. This is useful if you only want to crawl a portion of the domains.

Domains file count

domainsFileCount

Optional

integer

Indicates how many domains from the file should be crawled, starting from the offset. This is useful if you only want to crawl a portion of the domains. Leave empty to crawl all domains.
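The offset and count fields together select a window of the domains file. A minimal sketch of that selection follows, assuming the file lists one domain per line and that blank lines are ignored (neither is stated above); select_domains is a hypothetical name, not the actor's own code.

```python
from typing import List, Optional


def select_domains(
    lines: List[str], offset: int = 0, count: Optional[int] = None
) -> List[str]:
    """Skip `offset` domains, then keep `count` of them (all if count is None)."""
    # Assumption: one domain per line, blank lines carry no domain.
    domains = [line.strip() for line in lines if line.strip()]
    end = None if count is None else offset + count
    return domains[offset:end]
```

For a file with the domains a.com, b.com, c.com, and d.com, an offset of 1 and a count of 2 would select b.com and c.com under these assumptions.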

Use Chrome

useChrome

Optional

boolean

If checked, the actor uses Chrome instead of Puppeteer's Chromium for crawling. This might help prevent some pages from blocking the crawler.

Use Apify Proxy

useApifyProxy

Optional

boolean

If checked, the actor uses Apify Proxy to access the target pages. This might help prevent some pages from blocking the crawler.

Max page retries

maxRequestRetries

Optional

integer

Indicates how many times the crawler retries loading a page after an error.

Crawl domain links

crawlLinkCount

Optional

integer

Indicates how many links found on the main page that point to the same domain should also be crawled.
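One way to picture this option is as a filter over the links found on the main page. The sketch below is an assumption about the selection logic, not the actor's actual implementation: it treats a link as "same domain" when its host equals the domain or is a sub-domain of it, keeps page order, and skips duplicates. The function name pick_same_domain_links is hypothetical.

```python
from typing import List
from urllib.parse import urljoin, urlparse


def pick_same_domain_links(
    page_url: str, hrefs: List[str], domain: str, limit: int
) -> List[str]:
    """Return up to `limit` absolute http(s) links that stay on `domain`."""
    picked: List[str] = []
    for href in hrefs:
        if len(picked) >= limit:
            break
        absolute = urljoin(page_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        host = (parsed.hostname or "").lower()
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, javascript: and similar
        # Assumption: sub-domains such as www. count as the same domain.
        if host != domain and not host.endswith("." + domain):
            continue
        if absolute not in picked:
            picked.append(absolute)
    return picked
```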

Crawl HTTPS version

crawlHttpsVersion

Optional

boolean

If checked, the actor attempts to crawl the HTTPS version of the website (e.g. https://example.com for domain example.com).

Crawl www. sub-domain

crawlWwwSubdomain

Optional

boolean

If checked, the actor attempts to crawl the www. sub-domain of the website (e.g. http://www.example.com for domain example.com).
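The HTTPS and www. options can be read as adding URL variants for each domain. The sketch below assumes that the plain http:// URL is always crawled and that each option adds exactly the variant shown in its example. Whether the actor also tries the combined https://www. variant is not stated, so the sketch leaves it out. The function candidate_urls is a hypothetical name.

```python
from typing import List


def candidate_urls(
    domain: str,
    crawl_https_version: bool = False,
    crawl_www_subdomain: bool = False,
) -> List[str]:
    """List the URL variants to try for one naked domain."""
    urls = [f"http://{domain}"]  # assumption: the plain HTTP URL is always tried
    if crawl_https_version:
        urls.append(f"https://{domain}")
    if crawl_www_subdomain:
        urls.append(f"http://www.{domain}")
    return urls
```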

Save screenshots

saveScreenshot

Optional

boolean

If checked, the actor stores screenshots of all loaded pages into the key-value store.

Save HTML content

saveHtml

Optional

boolean

If checked, the actor stores HTML content of all loaded pages into the key-value store.

Save text content

saveText

Optional

boolean

If checked, the actor stores text content of all loaded pages into the dataset results.

Consider child frames

considerChildFrames

Optional

boolean

If checked, the actor also searches for social handles in the content of first-level child frames. In that case, 'page.text' contains the combined text of the main frame and its direct child frames.
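Putting the fields together, an input object for the actor might look like the sketch below. The field names and types come from the list above; every value is illustrative only and does not reflect the actor's defaults, which are not given here.

```python
import json

# Illustrative values only; field names and types follow the list above.
actor_input = {
    "domains": "example.com",      # or supply domainsFileUrl instead
    "domainsFileOffset": 0,
    "useChrome": False,
    "useApifyProxy": True,
    "maxRequestRetries": 2,
    "crawlLinkCount": 3,
    "crawlHttpsVersion": True,
    "crawlWwwSubdomain": True,
    "saveScreenshot": True,
    "saveHtml": True,
    "saveText": False,
    "considerChildFrames": False,
}

# Serialize to the JSON form in which the input would be passed to the actor.
payload = json.dumps(actor_input, indent=2)
print(payload)
```

Since all fields are marked Optional, the object can be trimmed to the options you actually need.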