Website Checker Runner Puppeteer

Pricing

Pay per usage

Checks the provided website using Puppeteer. This is a low-level runner; most likely you want to use the high-level master actor: https://apify.com/lukaskrivka/website-checker

Rating

0.0

(0)

Developer

Lukáš Křivka

Maintained by Community

Actor stats

Bookmarked

Total users: 235

Monthly active users

Last modified: 4 months ago

URLs to check

urlsToCheck (array, required)

A static list of URLs to check for captchas. To be able to add new URLs on the fly, enable the Use request queue option.

For details, see Start URLs in README.
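As a rough illustration of the static URL list described above, the following Python sketch builds a minimal input object and prints it as JSON. The `{"url": ...}` item shape is an assumption based on how Apify request lists are commonly written, and the example URLs are placeholders; check the README for the exact format.

```python
import json

# Hypothetical input for this runner. The {"url": ...} item shape is an
# assumption (common Apify request-list format), not confirmed by this page.
actor_input = {
    "urlsToCheck": [
        {"url": "https://www.example.com/"},
        {"url": "https://www.example.org/products"},
    ],
}

# Print the JSON that would be pasted into the actor's input editor.
print(json.dumps(actor_input, indent=2))
```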

Proxy Configuration

proxyConfiguration (object, optional)

Specifies proxy servers that will be used by the scraper in order to hide its origin.

For details, see Proxy configuration in README.

Default value of this property is {}
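A possible way to assemble the proxy object is sketched below in Python. The helper name `build_proxy_configuration` is hypothetical, and the field names `useApifyProxy`, `apifyProxyGroups`, and `proxyUrls` are assumed from the usual Apify proxy input format rather than taken from this page. With no arguments it returns the documented default `{}`.

```python
def build_proxy_configuration(use_apify_proxy=False, groups=None, proxy_urls=None):
    """Return a proxyConfiguration object (hypothetical helper).

    Field names follow the usual Apify proxy input format, which is an
    assumption here. With no arguments the result is the default {}.
    """
    config = {}
    if use_apify_proxy:
        config["useApifyProxy"] = True
        if groups:
            # Optional list of proxy group names.
            config["apifyProxyGroups"] = list(groups)
    elif proxy_urls:
        # Custom proxy servers instead of Apify Proxy.
        config["useApifyProxy"] = False
        config["proxyUrls"] = list(proxy_urls)
    return config


print(build_proxy_configuration())
print(build_proxy_configuration(use_apify_proxy=True, groups=["RESIDENTIAL"]))
```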

Enabled

saveSnapshot (boolean, optional)

Saves the HTML for Cheerio, and the HTML plus a screenshot for Puppeteer/Playwright.

Link Selector

linkSelector (string, optional)

A CSS selector saying which links on the page (<a> elements with href attribute) shall be followed and added to the request queue. This setting only applies if Use request queue is enabled. To filter the links added to the queue, use the Pseudo-URLs setting.

If Link selector is empty, the page links are ignored.

For details, see Link selector in README.

Pseudo-URLs

pseudoUrls (array, optional)

Specifies what kind of URLs found by Link selector should be added to the request queue. A pseudo-URL is a URL with regular expressions enclosed in [] brackets, e.g. http://www.example.com/[.*]. This setting only applies if the Use request queue option is enabled.

If Pseudo-URLs are omitted, the actor enqueues all links matched by the Link selector.

For details, see Pseudo-URLs in README.

Default value of this property is []
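To make the pseudo-URL idea concrete, here is a minimal Python sketch that turns a pseudo-URL into a regular expression: text inside `[]` is kept as a regex and everything else is matched literally. The function name `pseudo_url_to_regex` is hypothetical, and this is not the actor's own implementation; details such as case sensitivity may differ.

```python
import re


def pseudo_url_to_regex(pseudo_url):
    """Compile a pseudo-URL into a regex (illustrative sketch only).

    Text inside top-level [] brackets is used as a regular expression;
    all other text is escaped and matched literally.
    """
    parts = []
    buffer = []
    depth = 0
    for ch in pseudo_url:
        if ch == "[":
            if depth == 0:
                # End of a literal section: escape it and start a regex section.
                parts.append(re.escape("".join(buffer)))
                buffer = []
            else:
                buffer.append(ch)  # nested bracket belongs to the regex itself
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
            if depth == 0:
                # End of a regex section: keep it verbatim in a group.
                parts.append("(?:" + "".join(buffer) + ")")
                buffer = []
            else:
                buffer.append(ch)
        else:
            buffer.append(ch)
    if depth != 0:
        raise ValueError("unbalanced '[' in pseudo-URL")
    parts.append(re.escape("".join(buffer)))
    return re.compile("".join(parts))


pattern = pseudo_url_to_regex("http://www.example.com/[.*]")
print(bool(pattern.fullmatch("http://www.example.com/products/1")))
print(bool(pattern.fullmatch("http://www.example.org/products/1")))
```

Using `fullmatch` means the whole URL has to match the pattern, not just a prefix of it.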

Repeat checks on provided URLs

repeatChecksOnProvidedUrls (integer, optional)

Accesses each provided URL multiple times. Useful for testing the same URL repeatedly or for getting past blocking that affects only the first page load.

Max number of pages checked per domain

maxNumberOfPagesCheckedPerDomain (integer, optional)

The maximum number of pages that the checker will load. The checker will stop when this limit is reached. It's always a good idea to set this limit in order to prevent excess platform usage for misconfigured scrapers. Note that the actual number of pages loaded might be slightly higher than this value.

If set to 0, there is no limit.

Default value of this property is 100
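The per-domain page limit can be pictured with a small counter, sketched here in Python. The class name `DomainPageLimiter` is hypothetical and this is not the actor's code; it only shows the rule that 0 means no limit and that the default is 100. A real crawler with parallel requests may overshoot slightly, as noted above.

```python
from collections import Counter
from urllib.parse import urlparse


class DomainPageLimiter:
    """Counts pages per domain and refuses new ones past the limit (sketch)."""

    def __init__(self, max_pages_per_domain=100):
        self.max_pages = max_pages_per_domain
        self.counts = Counter()

    def try_acquire(self, url):
        """Return True and count the page if the domain is under its limit."""
        domain = urlparse(url).hostname or ""
        # A limit of 0 means "no limit".
        if self.max_pages != 0 and self.counts[domain] >= self.max_pages:
            return False
        self.counts[domain] += 1
        return True


limiter = DomainPageLimiter(max_pages_per_domain=2)
for path in ("a", "b", "c"):
    print(path, limiter.try_acquire("https://www.example.com/" + path))
```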

Maximum concurrent pages checked per domain

maxConcurrentPagesCheckedPerDomain (integer, optional)

Specifies the maximum number of pages that can be processed by the checker in parallel for one domain. The checker automatically increases and decreases concurrency based on available system resources. This option enables you to set an upper limit, for example to reduce the load on a target website.

Default value of this property is 50

Maximum number of concurrent domains checked

maxConcurrentDomainsChecked (integer, optional)

Specifies the maximum number of domains that should be checked at a time. This setting is relevant when passing in more than one URL to check.

Default value of this property is 5
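The two concurrency limits (pages per domain, default 50, and domains at a time, default 5) can be illustrated with nested semaphores. The Python sketch below is an assumption about how such limits interact, not the actor's implementation; `check_all` and `check_page` are hypothetical names, and the automatic scaling based on system resources is not modeled.

```python
import asyncio
from collections import defaultdict
from urllib.parse import urlparse


async def check_all(urls, check_page, max_domains=5, max_pages_per_domain=50):
    """Run check_page(url) for every URL under two upper limits (sketch)."""
    # Group the URLs by hostname so each domain gets its own page limit.
    by_domain = defaultdict(list)
    for url in urls:
        by_domain[urlparse(url).hostname].append(url)

    domain_slots = asyncio.Semaphore(max_domains)

    async def check_domain(domain_urls):
        async with domain_slots:  # at most max_domains domains at once
            page_slots = asyncio.Semaphore(max_pages_per_domain)

            async def check_one(url):
                async with page_slots:  # at most N pages of this domain at once
                    return await check_page(url)

            return await asyncio.gather(*(check_one(u) for u in domain_urls))

    groups = await asyncio.gather(*(check_domain(u) for u in by_domain.values()))
    return [result for group in groups for result in group]


async def fake_check(url):
    # Stand-in for a real page check.
    await asyncio.sleep(0.01)
    return url


demo_urls = ["https://site%d.example/%d" % (d, p) for d in range(3) for p in range(4)]
print(len(asyncio.run(check_all(demo_urls, fake_check, max_domains=2, max_pages_per_domain=2))))
```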

Retire browser instance after request count

retireBrowserInstanceAfterRequestCount (integer, optional)

The number of requests after which the browser instance is retired and replaced with a new one. Pick a higher number to reduce resource consumption, or a lower number to rotate (test) more proxies.

Default value of this property is 10
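The retirement rule can be sketched as a simple request counter. In the Python sketch below, `BrowserRotator` and its `launch` callback are hypothetical stand-ins, not part of the actor; the point is only that a fresh browser is launched once the current one has served the configured number of requests.

```python
class BrowserRotator:
    """Hands out a browser and replaces it after a request count (sketch)."""

    def __init__(self, launch, retire_after=10):
        self.launch = launch              # hypothetical callback creating a browser
        self.retire_after = retire_after
        self.browser = None
        self.served = 0

    def get(self):
        """Return the browser to use for the next request."""
        if self.browser is None or self.served >= self.retire_after:
            # A real runner would also close the retired browser here.
            self.browser = self.launch()
            self.served = 0
        self.served += 1
        return self.browser


launched = []
rotator = BrowserRotator(launch=lambda: launched.append("browser") or len(launched))
for _ in range(25):
    rotator.get()
print(len(launched))
```

With the default of 10, 25 requests are spread over three browser instances.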

Headfull browser (XVFB)

puppeteer.headfull (boolean, optional)

Only works for Puppeteer type!

Use Chrome

puppeteer.useChrome (boolean, optional)

Only works for Puppeteer type! Note that Chrome is not guaranteed to work with Puppeteer.

Wait for

puppeteer.waitFor (string, optional)

Only works for Puppeteer type. Waits on each page. Provide either a number of milliseconds or a selector.
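Since this single string can hold either a duration or a selector, a runner has to tell the two apart. The Python sketch below shows one plausible rule, assuming that a value made only of digits is a number of milliseconds and anything else is a selector. The function name `parse_wait_for` is hypothetical and the actor's actual parsing may differ.

```python
import re


def parse_wait_for(value):
    """Classify a waitFor string as a timeout or a selector (sketch)."""
    text = value.strip()
    if re.fullmatch(r"[0-9]+", text):
        # Only digits: treat the value as a wait time in milliseconds.
        return ("timeout_ms", int(text))
    # Anything else is passed through as a selector.
    return ("selector", text)


print(parse_wait_for("2000"))
print(parse_wait_for("#content"))
```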

Website Checker Runner Playwright

lukaskrivka/website-checker-playwright

Checks the provided website using Playwright. This is a low-level runner; most likely you want to use the high-level master actor: https://apify.com/lukaskrivka/website-checker

Lukáš Křivka

159

Website Checker Runner Cheerio

lukaskrivka/website-checker-cheerio

Checks the provided website using Cheerio. This is a low-level runner; most likely you want to use the high-level master actor: https://apify.com/lukaskrivka/website-checker

Lukáš Křivka

297

Website-checker-starter

vaclavrut/website-checker-starter

Works with lukaskrivka/website-checker. This actor accepts multiple URLs on input, starts website-checker with 10 runs at a time, and stores all data in one dataset.

Vaclav Rut

Example Puppeteer

apify/example-puppeteer

Example showing how to use headless Chromium with Puppeteer to open a web page, determine its dimensions, save a screenshot, and print the page to PDF. This actor must use images with Puppeteer (Node.js 8 + Puppeteer on Debian).

Apify

291

4.6

Monitoring Runner

apify/monitoring-runner

The monitoring runner is a part of the Apify Monitoring Suite (apify/monitoring). See its readme for more information and how to use this.

Apify

136

4.5

Website Checker Workload

lukaskrivka/website-checker-workload

Creates reasonable workloads for analyzing any website with the Website Checker actor and combines the resulting data. This is the easiest way to analyze any website for compute unit usage and anti-scraping blocking.

Lukáš Křivka

Example Code Runner (Puppeteer)

apify/example-code-runner-puppeteer

Generic Actor to run code examples from the documentation via "Run on Apify" links.

Apify

633

4.3

Website extract

mrahil/my-actor

It is a website extractor.

Mohammed Rahil

134

Puppeteer Scraper

apify/puppeteer-scraper

Crawls websites with headless Chrome and the Puppeteer library using provided server-side Node.js code. This crawler is an alternative to apify/web-scraper that gives you finer control over the process. Supports both recursive crawling and lists of URLs. Supports logging in to websites.

Apify

11K

4.9