
Web Scraper
Pricing
Pay per usage

Web Scraper
Crawls arbitrary websites using a web browser and extracts structured data from web pages using a provided JavaScript function. The Actor supports both recursive crawling and lists of URLs, and automatically manages concurrency for maximum performance.
4.5 (22)
700
Total users: 82.6k
Monthly users: 4k
Runs succeeded: >99%
Issue response: 32 days
Last modified: 20 days ago
Crawler only goes for 3 pages and stops
Closed
I'm using Web Scraper. When I run it, the crawler only visits 3 pages and then stops. Also, why does it try to scrape the same URLs every time I run it? Shouldn't they be excluded, since they have already been scraped?
Hello and thank you for your interest in this Actor!
The Actor only crawls 3 webpages because you tell it to only enqueue links that start with https://kubernetes.io/docs/home/ (by setting the glob option to https://kubernetes.io/docs/home/**). Now check out the webpage: the links in there lead to https://kubernetes.io/docs/concepts/..., https://kubernetes.io/docs/setup/..., etc. If you want to crawl these links as well, set the glob to something like https://kubernetes.io/docs/**.
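To illustrate why the narrower glob filters out most links, here is a minimal Python sketch of glob matching. It is not the Actor's actual matching code: glob_to_regex and matches are hypothetical helpers, and the sketch assumes simplified semantics where ** matches any characters (including slashes) and * stays within one path segment. The sample URLs are just the path prefixes mentioned above.

```python
import re

def glob_to_regex(glob: str) -> re.Pattern:
    """Translate a simplified glob into a regex (assumed semantics, see lead-in)."""
    parts = []
    i = 0
    while i < len(glob):
        if glob.startswith("**", i):
            parts.append(".*")        # ** may span several path segments
            i += 2
        elif glob[i] == "*":
            parts.append("[^/]*")     # * stays inside a single segment
            i += 1
        else:
            parts.append(re.escape(glob[i]))  # everything else is literal
            i += 1
    return re.compile("".join(parts) + r"\Z")

def matches(glob: str, url: str) -> bool:
    """Return True if the whole URL matches the glob."""
    return glob_to_regex(glob).match(url) is not None

# Path prefixes taken from the discussion above.
links = [
    "https://kubernetes.io/docs/home/",
    "https://kubernetes.io/docs/concepts/",
    "https://kubernetes.io/docs/setup/",
]

narrow = "https://kubernetes.io/docs/home/**"
wide = "https://kubernetes.io/docs/**"

for url in links:
    print(url, "narrow:", matches(narrow, url), "wide:", matches(wide, url))
```

Under these assumptions, only the /docs/home/ link passes the narrow glob, while all three pass the wider one, which is why widening the glob lets the crawler enqueue the remaining documentation pages.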
Regarding your second question: every Actor run is separate and stateless (at least in the case of Web Scraper). We believe that this is the way to go, because you can't know whether a page has changed until you scrape it and see the content.
I'll close this issue now, but feel free to ask any additional questions. Cheers!