
Web Scraper
Pricing
Pay per usage

Crawls arbitrary websites using a web browser and extracts structured data from web pages using a provided JavaScript function. The Actor supports both recursive crawling and lists of URLs, and automatically manages concurrency for maximum performance.
4.5 (23)
854
Total users: 88K
Monthly users: 4.5K
Runs succeeded: >99%
Issues response: 9 days
Last modified: a month ago
Crawl is stopping after 40 listings with no error message
Closed
The crawl is stopping after 40 listings with no error message, even though there are many more (over 800). I don't understand the issue. There are only 10 listings per page, so if it were a pagination issue, I would expect it to pull only those 10. And if it were an issue with the code, I would expect it to pull nothing, or at least to report an error.
Hello @ColabReggie and thank you for your interest in this Actor!
It is a pagination issue - the Actor never visits any index page other than the first one. All the additional (~30) listing results are being enqueued from the "You May Also Be Interested In" sections of the (first 10) listings (see e.g. https://medspa.com/listing/removery-denver and scroll all the way down).
There are multiple ways of scraping such websites with this Actor; I'll present the one I consider the easiest and most flexible:
Note: If you're in a hurry, I fixed your code in this run - feel free to copy the input :)
- Remove the Link Selector (leave the field blank). Because of different page types, we'll handle the link enqueueing ourselves.
- For the first start URL, click the button that says Advanced and add {"label": "START"} to the user data field. This way, we'll be able to tell the index page apart from the actual listings.
- In your page function, you can do something like:
async function pageFunction(context) {
    const { url, userData: { label } } = context.request;
    const $ = context.jQuery;
    const log = context.log.info;

    if (label === 'START') {
        log('Scraping the index page on ' + url);
        // Find and enqueue all the listings from the current page
        const listingsOnThisPage = $('.lf-item > a').map((_, element) => $(element).attr('href'));
        for (let listingUrl of ... [trimmed]
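
Because the snippet above is cut off, here is a minimal, self-contained sketch of the same label-based idea. It is not the code from the fixed run, only an illustration under a few assumptions: the 'LISTING' label and the 'a.next' next-page selector are hypothetical placeholders (inspect the site's pagination and substitute the real selector), jQuery injection is enabled so that context.jQuery is available, and the returned object stands in for whatever fields you actually want to extract.

```javascript
async function pageFunction(context) {
    const { url, userData: { label } } = context.request;
    const $ = context.jQuery; // assumes jQuery injection is enabled in the Actor input

    // ASSUMPTION: placeholder selector for the "next page" link on index pages.
    const NEXT_PAGE_SELECTOR = 'a.next';

    if (label === 'START') {
        // Collect listing links from this index page only.
        const hrefs = $('.lf-item > a')
            .map((_, element) => $(element).attr('href'))
            .get();

        // A Set drops duplicate links found on the same page.
        for (const href of new Set(hrefs)) {
            if (!href) continue;
            await context.enqueueRequest({
                url: new URL(href, url).href, // resolve relative links
                userData: { label: 'LISTING' }, // hypothetical label for detail pages
            });
        }

        // Enqueue the next index page with the same START label so that
        // pagination continues beyond the first page.
        const nextHref = $(NEXT_PAGE_SELECTOR).attr('href');
        if (nextHref) {
            await context.enqueueRequest({
                url: new URL(nextHref, url).href,
                userData: { label: 'START' },
            });
        }
        return null; // index pages carry no listing data of their own
    }

    // Listing page: extraction is site-specific, so only the URL is returned here.
    return { url };
}
```

Since listing pages never enqueue anything in this sketch, links from sections such as "You May Also Be Interested In" are not followed, and every listing is reached through an index page instead.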