Website Checker Runner Puppeteer

lukaskrivka/website-checker-puppeteer

Checks the provided website using Puppeteer. This is a low-level runner; most likely you want to use the high-level master actor instead: https://apify.com/lukaskrivka/website-checker

Author: Lukáš Křivka
  • Users: 6
  • Runs: 24

URLs to check

urlsToCheck

Required

array

A static list of URLs to check for captchas. To be able to add new URLs on the fly, enable the Use request queue option. For details, see Start URLs in README.

Proxy Configuration

proxyConfiguration

Optional

object

Specifies proxy servers that will be used by the scraper in order to hide its origin. For details, see Proxy configuration in README.
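
As a rough illustration of how the two inputs described so far might be combined, the sketch below builds an input object in Python. The `{"url": ...}` item shape for `urlsToCheck` and the `useApifyProxy` key are assumptions based on common Apify input conventions; this page does not confirm them. `build_input` is a hypothetical helper, not part of the actor.

```python
import json


def build_input(urls, use_apify_proxy=True):
    """Build a minimal input dict for the checker (hypothetical helper)."""
    # urlsToCheck is marked Required; rejecting an empty list is a choice
    # made for this sketch, not documented behaviour.
    if not urls:
        raise ValueError("urlsToCheck is required and must not be empty")
    return {
        # Assumption: each entry is an object with a "url" key.
        "urlsToCheck": [{"url": u} for u in urls],
        # Assumption: the usual Apify proxy input shape.
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }


actor_input = build_input(["https://www.example.com"])
print(json.dumps(actor_input, indent=2))
```

If the actor expects a different shape for either field, only the two dictionary entries need to change.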

Enabled

saveSnapshot

Optional

boolean

Saves the page HTML for Cheerio, and the HTML plus a screenshot for Puppeteer/Playwright.

Link Selector

linkSelector

Optional

string

A CSS selector that specifies which links on the page (<a> elements with an href attribute) should be followed and added to the request queue. This setting applies only if Use request queue is enabled. To filter the links added to the queue, use the Pseudo-URLs setting. If Link selector is empty, page links are ignored. For details, see Link selector in README.

Pseudo-URLs

pseudoUrls

Optional

array

Specifies what kind of URLs found by Link selector should be added to the request queue. A pseudo-URL is a URL with regular expressions enclosed in [] brackets, e.g. http://www.example.com/[.*]. This setting only applies if the Use request queue option is enabled. If Pseudo-URLs are omitted, the actor enqueues all links matched by the Link selector. For details, see Pseudo-URLs in README.
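
To make the pseudo-URL idea concrete, here is a minimal Python sketch that turns a pseudo-URL into a regular expression: text inside the outermost [] brackets is kept as a regex, and everything else is matched literally. It is an illustration of the concept only, not the actor's own implementation; `pseudo_url_to_regex` is a hypothetical name, and details such as case sensitivity are assumptions.

```python
import re


def pseudo_url_to_regex(pseudo_url: str) -> re.Pattern:
    """Compile a pseudo-URL such as http://www.example.com/[.*] to a regex."""
    pattern = []
    depth = 0  # nesting level of [] brackets
    for ch in pseudo_url:
        if ch == "[":
            depth += 1
            if depth == 1:
                continue  # outermost "[" only switches to regex mode
        elif ch == "]" and depth > 0:
            depth -= 1
            if depth == 0:
                continue  # outermost "]" switches back to literal mode
        # Inside brackets keep the character as regex syntax,
        # outside brackets escape it so it matches literally.
        pattern.append(ch if depth > 0 else re.escape(ch))
    if depth != 0:
        raise ValueError(f"Unbalanced brackets in pseudo-URL: {pseudo_url!r}")
    return re.compile("".join(pattern))


matcher = pseudo_url_to_regex("http://www.example.com/[.*]")
for link in ["http://www.example.com/products/42", "http://other.example.com/"]:
    print(link, bool(matcher.fullmatch(link)))
```

Using `fullmatch` means the whole link has to fit the pattern, so a link on a different host is rejected even if it contains the same path.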

Repeat checks on provided URLs

repeatChecksOnProvidedUrls

Optional

integer

Accesses each provided URL the given number of times. Useful for testing the same URL repeatedly or for bypassing blocking of the first page.
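
One way to picture repeated checks: Apify request queues deduplicate requests by a unique key, so visiting the same URL several times needs a distinct key per visit. The sketch below shows that idea in Python. How this actor actually does it is not stated here; the `uniqueKey` format, the helper name, and treating the setting as the total number of visits are all assumptions.

```python
def expand_repeated_checks(urls, repeat_count):
    """Return one request dict per visit, each with its own unique key."""
    requests = []
    for url in urls:
        for attempt in range(repeat_count):
            requests.append({
                "url": url,
                # Assumed key format; it only needs to differ per visit.
                "uniqueKey": f"{url}#check-{attempt}",
            })
    return requests


for request in expand_repeated_checks(["https://www.example.com"], 3):
    print(request)
```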

Max number of pages checked per domain

maxNumberOfPagesCheckedPerDomain

Optional

integer

The maximum number of pages that the checker will load. The checker will stop when this limit is reached. It's always a good idea to set this limit in order to prevent excess platform usage for misconfigured scrapers. Note that the actual number of pages loaded might be slightly higher than this value. If set to 0, there is no limit.
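
The per-domain limit can be sketched as a simple counter keyed by hostname, with 0 meaning no limit as described above. This is an illustrative Python sketch under the assumption that "domain" means the URL's hostname; `cap_pages_per_domain` is a hypothetical helper, not the actor's code.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def cap_pages_per_domain(urls, max_pages_per_domain):
    """Keep at most max_pages_per_domain URLs per hostname (0 = no limit)."""
    kept = []
    counts = defaultdict(int)
    for url in urls:
        domain = urlsplit(url).hostname or ""
        if max_pages_per_domain and counts[domain] >= max_pages_per_domain:
            continue  # limit reached for this domain, skip the page
        counts[domain] += 1
        kept.append(url)
    return kept


pages = [
    "https://a.example/1",
    "https://a.example/2",
    "https://b.example/1",
    "https://a.example/3",
]
print(cap_pages_per_domain(pages, 2))
```

A strict filter like this never overshoots; the slight overshoot mentioned above would plausibly come from pages that are already being processed in parallel when the limit is hit.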

Maximum concurrent pages checked per domain

maxConcurrentPagesCheckedPerDomain

Optional

integer

Specifies the maximum number of pages that can be processed by the checker in parallel for one domain. The checker automatically increases and decreases concurrency based on available system resources. This option enables you to set an upper limit, for example to reduce the load on a target website.

Maximum number of concurrent domains checked

maxConcurrentDomainsChecked

Optional

integer

Specifies the maximum number of domains that should be checked at a time. This setting is relevant when passing in more than one URL to check.

Retire browser instance after request count

retireBrowserInstanceAfterRequestCount

Optional

integer

The number of requests after which the browser instance is retired and replaced. Pick a higher number for lower resource consumption, or a lower number to rotate (test) more proxies.

Headfull browser (XVFB)

puppeteer.headfull

Optional

boolean

Only works for Puppeteer type!

Use Chrome

puppeteer.useChrome

Optional

boolean

Only works for Puppeteer type! Note that Chrome is not guaranteed to work with Puppeteer.

Wait for

puppeteer.waitFor

Optional

string

Only works for Puppeteer type. Waits on each page. You can provide either a number of milliseconds or a selector.
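
Since this option is a single string that can hold either a duration or a selector, a caller has to tell the two apart. The Python sketch below shows one plausible rule: a value made only of digits is a time in milliseconds, anything else is a selector. The rule and the `parse_wait_for` helper are assumptions for illustration; the actor's own parsing may differ.

```python
def parse_wait_for(value: str):
    """Classify a waitFor value as ("timeout_ms", int) or ("selector", str)."""
    text = value.strip()
    if text.isdecimal():
        # Digits only: treat the value as a duration in milliseconds.
        return ("timeout_ms", int(text))
    # Anything else is passed through as a selector.
    return ("selector", text)


print(parse_wait_for("2000"))
print(parse_wait_for("#main-content"))
```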