Website Checker Workload
lukaskrivka/website-checker-workload
Creates reasonable workloads for analyzing any website with the Website Checker actor and combines the resulting data. This is the easiest way to analyze any website for compute unit usage and anti-scraping blocking.

This actor runs a Website Checker for each proxy group with both the browser (Puppeteer) and Cheerio scrapers. The checks run in parallel with reasonable default values, and the output of all checkers is combined into a single breakdown. This gives you a good idea of how difficult and costly scraping the site will be with different methods, and saves the time you would otherwise spend on manual checks.
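
As a rough illustration of the workload described above, the sketch below enumerates one check per scraper type and proxy group. The helper name `plan_checks` and the `scraper/proxyGroup` key format (borrowed from the example output) are assumptions made for illustration; this is not the actor's actual code.

```python
from itertools import product


def plan_checks(run_browser, run_cheerio, proxy_groups):
    """Return one key per (scraper type, proxy group) pair (hypothetical helper)."""
    scrapers = []
    if run_browser:
        scrapers.append("puppeteer")
    if run_cheerio:
        scrapers.append("cheerio")
    # Every enabled scraper type is paired with every proxy group.
    return [f"{scraper}/{group}" for scraper, group in product(scrapers, proxy_groups)]


checks = plan_checks(True, True, ["auto", "BUYPROXIES84958"])
print(checks)
```

With both scraper types enabled and two proxy groups, the list holds four entries, which matches the four checkers mentioned in the output example.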

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| website | String | https://apify.com | Website URL where you want to start checking |
| runBrowser | Boolean | true | Run the checker with a browser |
| runCheerio | Boolean | true | Run the checker with Cheerio |
| proxyGroups | Array | ['auto', 'BUYPROXIES84958'] | List of proxy groups you want to test. Can also be auto to run with all proxies |
| maxPagesPerCheck | Number | 200 | Maximum number of pages for each check |
| runInParallel | Boolean | true | Run the checks in parallel |
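
The input fields can be assembled programmatically before starting a run. The sketch below is only an illustration: the defaults are copied from the input fields listed above, while `DEFAULT_INPUT`, `build_input`, and the unknown-field check are hypothetical client-side conveniences, not part of the actor.

```python
import copy

# Defaults copied from the documented input fields; the helper is hypothetical.
DEFAULT_INPUT = {
    "website": "https://apify.com",
    "runBrowser": True,
    "runCheerio": True,
    "proxyGroups": ["auto", "BUYPROXIES84958"],
    "maxPagesPerCheck": 200,
    "runInParallel": True,
}


def build_input(**overrides):
    """Merge caller-supplied fields over the documented defaults."""
    unknown = set(overrides) - set(DEFAULT_INPUT)
    if unknown:
        # Client-side guard against typos in field names (an assumption,
        # not behavior documented for the actor itself).
        raise ValueError(f"Unknown input fields: {sorted(unknown)}")
    merged = copy.deepcopy(DEFAULT_INPUT)
    merged.update(overrides)
    return merged


run_input = build_input(website="https://example.com", maxPagesPerCheck=50)
print(run_input)
```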

Output

The output is saved to the default key-value store as the OUTPUT record. It combines the output of all Website Checker runs and adds the compute units each run spent.
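
Reading that record from a script could look roughly like the sketch below. It assumes `client` behaves like `ApifyClient` from the `apify-client` Python package and that `run` is the run dictionary returned once the actor call finishes; `fetch_output` is a hypothetical helper name.

```python
def fetch_output(client, run):
    """Read the OUTPUT record from a finished run's default key-value store.

    `client` is assumed to behave like apify_client.ApifyClient and `run`
    like the dictionary describing a finished actor run.
    """
    store = client.key_value_store(run["defaultKeyValueStoreId"])
    record = store.get_record("OUTPUT")
    if record is None:
        raise LookupError("OUTPUT record not found; has the run finished?")
    # The record wraps the stored JSON under the "value" key.
    return record["value"]
```

With `apify-client` installed, `run` would typically come from `client.actor("lukaskrivka/website-checker-workload").call(run_input=...)`, after which `fetch_output(client, run)` returns the combined breakdown as a dictionary.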

For example, for an input consisting of

{
    "runBrowser": true,
    "runCheerio": true,
    "proxyGroups": ["auto", "BUYPROXIES84958"]
}

the actor will run 4 checkers, one for each combination of scraper type and proxy group:

{
    "puppeteer/auto": {
        "computeUnits": 0.45,
        "pagesPerComputeUnit": 444,
        "timeouted": 0,
        "failedToLoadOther": 9,
        "accessDenied": 0,
        "recaptcha": 0,
        "distilCaptcha": 24,
        "statusCodes": {
            "200": 3,
            "401": 2,
            "403": 5,
            "405": 24
        },
        "total": 43
    },
    "puppeteer/BUYPROXIES84958": {
        "computeUnits": 0.45,
        "pagesPerComputeUnit": 444,
        "timeouted": 0,
        "failedToLoadOther": 9,
        "accessDenied": 0,
        "recaptcha": 0,
        "distilCaptcha": 24,
        "statusCodes": {
            "200": 3,
            "401": 2,
            "403": 5,
            "405": 24
        },
        "total": 43
    },
    "cheerio/auto": {
        "computeUnits": 0.05,
        "pagesPerComputeUnit": 4000,
        "timeouted": 0,
        "failedToLoadOther": 9,
        "accessDenied": 0,
        "recaptcha": 0,
        "distilCaptcha": 24,
        "statusCodes": {
            "200": 3,
            "401": 2,
            "403": 5,
            "405": 24
        },
        "total": 43
    },
    "cheerio/BUYPROXIES84958": {
        "computeUnits": 0.05,
        "pagesPerComputeUnit": 4000,
        "timeouted": 0,
        "failedToLoadOther": 9,
        "accessDenied": 0,
        "recaptcha": 0,
        "distilCaptcha": 24,
        "statusCodes": {
            "200": 3,
            "401": 2,
            "403": 5,
            "405": 24
        },
        "total": 43
    }
}
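
One way to read a breakdown like the one above is to rank the checks by cost and compare how many pages loaded cleanly. The sketch below is a hedged illustration: `summarize` is a hypothetical helper, and treating HTTP 200 responses as "successful" pages is an assumption, not something the actor defines. The sample is a trimmed copy of two entries from the example output.

```python
def summarize(output):
    """Return (key, success_rate, pages_per_compute_unit) rows, cheapest first."""
    rows = []
    for key, stats in output.items():
        total = stats["total"]
        # Assumption: a page counts as successful when it returned HTTP 200.
        ok = stats["statusCodes"].get("200", 0)
        success_rate = ok / total if total else 0.0
        rows.append((key, round(success_rate, 3), stats["pagesPerComputeUnit"]))
    # A higher pagesPerComputeUnit means more pages for the same cost.
    return sorted(rows, key=lambda row: row[2], reverse=True)


sample = {
    "puppeteer/auto": {
        "pagesPerComputeUnit": 444,
        "statusCodes": {"200": 3, "401": 2, "403": 5, "405": 24},
        "total": 43,
    },
    "cheerio/auto": {
        "pagesPerComputeUnit": 4000,
        "statusCodes": {"200": 3, "401": 2, "403": 5, "405": 24},
        "total": 43,
    },
}

for row in summarize(sample):
    print(row)
```

A low success rate combined with many 403 or 405 responses suggests the site is blocking that scraper and proxy combination, regardless of how cheap it is per page.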
Developer
Maintained by Community
  • Created in Nov 2020