Apr 5, 2024
Performance improvements, API updates, and the new Adaptive Playwright Crawler
New
Update
Actor
API
Performance improvements
As part of our continuous performance improvement initiative, we're happy to announce that we successfully improved the Apify API response time by 50% on average and the 90th-percentile startup time of Actors by about 20%. We will continue improving Apify in this direction.
API updates
User limits endpoint now returns maxConcurrentActorJobs
and activeActorJobCount
properties enabling users to keep an eye on the concurrency limit.
We also added the missing endpoint /actor-builds/:build-id/log,
allowing you to quickly access the log of certain builds without a need for an Actor run ID.
Adaptive Playwright Crawler
Try out Crawlee's new AdaptivePlaywrightCrawler class abstraction, which is an extension of PlaywrightCrawler that uses a more limited request handler interface so that it's able to switch to HTTP-only crawling when it detects that it may be possible. This way, you can achieve lower costs when crawling multiple websites.
1const crawler = new AdaptivePlaywrightCrawler({ 2 renderingTypeDetectionRatio: 0.1, 3 async requestHandler({ querySelector, pushData, enqueueLinks, request, log }) { 4 // This function is called to extract data from a single web page 5 const $prices = await querySelector('span.price') 6 7 await pushData({ 8 url: request.url, 9 price: $prices.filter(':contains("$")').first().text(), 10 }) 11 12 await enqueueLinks({ selector: '.pagination a' }) 13 }, 14}); 15 16await crawler.run([ 17 'http://www.example.com/page-1', 18 'http://www.example.com/page-2', 19]);