Website Checker Workload
Creates reasonable workloads for analyzing any website with the Website Checker actor and combines the resulting data. This is the easiest way to analyze any website for compute unit usage and anti-scraping blocking.
This actor runs a Website Checker for each proxy group and for both the browser (Puppeteer) and Cheerio scrapers. The checks run in parallel with reasonable default values, and the output of all checkers is combined into a single breakdown. This gives you a good idea of how difficult and costly it will be to scrape the site with different methods, and can save the time you would otherwise spend on manual checks.
Input
Field | Type | Default | Description |
---|---|---|---|
website | String | https://apify.com | Website URL where you want to start checking |
runBrowser | Boolean | true | Run the check with a browser (Puppeteer) |
runCheerio | Boolean | true | Run the check with Cheerio |
proxyGroups | Array | ['auto', 'BUYPROXIES84958'] | List of proxy groups you want to test. Can also be auto to run with all proxies |
maxPagesPerCheck | Number | 200 | Maximum number of pages for each check |
runInParallel | Boolean | true | Run the individual checks in parallel |
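
To make the relationship between these fields and the number of checks concrete, here is a minimal Python sketch that lists the scraper and proxy group combinations a given input implies. It is an illustration only, not the actor's actual code: the function name `plan_checks` is hypothetical, and the `<scraper>/<proxy group>` label format is an assumption modeled on the keys shown in the Output section.

```python
from itertools import product


def plan_checks(run_input: dict) -> list[str]:
    """Return one label per (scraper, proxy group) combination."""
    # Defaults are copied from the input table above.
    run_browser = run_input.get("runBrowser", True)
    run_cheerio = run_input.get("runCheerio", True)
    proxy_groups = run_input.get("proxyGroups", ["auto", "BUYPROXIES84958"])

    scrapers = []
    if run_browser:
        scrapers.append("puppeteer")
    if run_cheerio:
        scrapers.append("cheerio")

    # Assumed label format: "<scraper>/<proxy group>".
    return [f"{scraper}/{group}" for scraper, group in product(scrapers, proxy_groups)]


run_input = {
    "website": "https://apify.com",
    "runBrowser": True,
    "runCheerio": True,
    "proxyGroups": ["auto", "BUYPROXIES84958"],
    "maxPagesPerCheck": 200,
}

checks = plan_checks(run_input)
print(checks)

# Upper bound on pages loaded across all checks.
print(len(checks) * run_input["maxPagesPerCheck"])
```

With both scrapers enabled and two proxy groups, the sketch yields four combinations, so at most 4 × 200 = 800 pages are loaded across all checks.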
Output
The output is saved to the default Key-Value store as the OUTPUT record. It combines the output of all Website Checker runs and adds the compute units spent by each run.
For example, for an input consisting of:

```json
{
    "runBrowser": true,
    "runCheerio": true,
    "proxyGroups": ["auto", "BUYPROXIES84958"]
}
```
The actor will run 4 checkers, one for each combination of scraper and proxy group, and produce an output like this:
```json
{
    "puppeteer/auto": {
        "computeUnits": 0.45,
        "pagesPerComputeUnit": 444,
        "timeouted": 0,
        "failedToLoadOther": 9,
        "accessDenied": 0,
        "recaptcha": 0,
        "distilCaptcha": 24,
        "statusCodes": {
            "200": 3,
            "401": 2,
            "403": 5,
            "405": 24
        },
        "total": 43
    },
    "puppeteer/BUYPROXIES84958": {
        "computeUnits": 0.45,
        "pagesPerComputeUnit": 444,
        "timeouted": 0,
        "failedToLoadOther": 9,
        "accessDenied": 0,
        "recaptcha": 0,
        "distilCaptcha": 24,
        "statusCodes": {
            "200": 3,
            "401": 2,
            "403": 5,
            "405": 24
        },
        "total": 43
    },
    "cheerio/auto": {
        "computeUnits": 0.05,
        "pagesPerComputeUnit": 4000,
        "timeouted": 0,
        "failedToLoadOther": 9,
        "accessDenied": 0,
        "recaptcha": 0,
        "distilCaptcha": 24,
        "statusCodes": {
            "200": 3,
            "401": 2,
            "403": 5,
            "405": 24
        },
        "total": 43
    },
    "cheerio/BUYPROXIES84958": {
        "computeUnits": 0.05,
        "pagesPerComputeUnit": 4000,
        "timeouted": 0,
        "failedToLoadOther": 9,
        "accessDenied": 0,
        "recaptcha": 0,
        "distilCaptcha": 24,
        "statusCodes": {
            "200": 3,
            "401": 2,
            "403": 5,
            "405": 24
        },
        "total": 43
    }
}
```
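
One way to read such a breakdown is to rank the checks by cost and look at how many pages came back with a 200 status. The following Python sketch does that for two entries copied from the example output. The helper name `summarize` and the `okShare` metric (count of 200 responses divided by `total`) are assumptions made for illustration; the actor itself does not report them.

```python
import json

# Two entries copied from the example output, with some fields omitted.
output = json.loads("""
{
    "puppeteer/auto": {
        "computeUnits": 0.45,
        "pagesPerComputeUnit": 444,
        "distilCaptcha": 24,
        "statusCodes": {"200": 3, "401": 2, "403": 5, "405": 24},
        "total": 43
    },
    "cheerio/auto": {
        "computeUnits": 0.05,
        "pagesPerComputeUnit": 4000,
        "distilCaptcha": 24,
        "statusCodes": {"200": 3, "401": 2, "403": 5, "405": 24},
        "total": 43
    }
}
""")


def summarize(breakdown: dict) -> list[dict]:
    """Rank checks from cheapest to most expensive per page."""
    rows = []
    for label, stats in breakdown.items():
        ok_pages = stats["statusCodes"].get("200", 0)
        rows.append({
            "check": label,
            "computeUnits": stats["computeUnits"],
            "pagesPerComputeUnit": stats["pagesPerComputeUnit"],
            # Hypothetical metric: share of pages that returned HTTP 200.
            "okShare": ok_pages / stats["total"] if stats["total"] else 0.0,
        })
    # More pages per compute unit means cheaper scraping.
    return sorted(rows, key=lambda row: row["pagesPerComputeUnit"], reverse=True)


for row in summarize(output):
    print(f'{row["check"]}: {row["pagesPerComputeUnit"]} pages/CU, {row["okShare"]:.0%} OK')
```

In the example data, `pagesPerComputeUnit` appears to equal the 200-page limit divided by `computeUnits` (200 / 0.45 ≈ 444 and 200 / 0.05 = 4000), although the formula is not stated here, so treat that reading as an inference.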
Created in Nov 2020