Web Scraper

Pricing

Pay per usage

Developed by Apify

Maintained by Apify

Crawls arbitrary websites using a web browser and extracts structured data from web pages using a provided JavaScript function. The Actor supports both recursive crawling and lists of URLs, and automatically manages concurrency for maximum performance.

4.5 (22)

700

Total users: 82.6k
Monthly users: 4k
Runs succeeded: >99%
Issue response: 32 days
Last modified: 20 days ago

Crawler only goes for 3 pages and stops

Closed

demonstrative_space opened this issue a year ago

I'm using Web Scraper. When I run it, the crawler only visits 3 pages and then stops. Also, why does it try to scrape the same URLs every time I run it? Shouldn't they be excluded since they have already been scraped?

jindrich.bar

Hello and thank you for your interest in this Actor!

The Actor only crawls 3 webpages because you tell it to only enqueue links that start with https://kubernetes.io/docs/home/ (by setting the glob option to https://kubernetes.io/docs/home/**). Now check out the webpage - the links in there lead to https://kubernetes.io/docs/concepts/..., https://kubernetes.io/docs/setup/... etc. If you want to crawl these links as well, set the glob to something like https://kubernetes.io/docs/**.
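To see why the narrower glob filters out most links, here is a minimal Python sketch of the matching logic. It is only an illustration: the helper names glob_to_regex and matches are hypothetical, the glob semantics are a simplified assumption (** matches anything including slashes, * matches anything except a slash), and this is not the Actor's actual matcher. The sample links are just the URL prefixes mentioned above.

```python
import re


def glob_to_regex(glob: str) -> re.Pattern:
    """Translate a simplified URL glob into a compiled regex.

    Assumed semantics (a simplification, not the Actor's real matcher):
    '**' matches any characters, including '/';
    '*' matches any characters except '/'.
    """
    parts = []
    i = 0
    while i < len(glob):
        if glob.startswith("**", i):
            parts.append(".*")
            i += 2
        elif glob[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            # Everything else is treated as a literal character.
            parts.append(re.escape(glob[i]))
            i += 1
    return re.compile("".join(parts))


def matches(glob: str, url: str) -> bool:
    """Return True if the whole URL matches the glob."""
    return glob_to_regex(glob).fullmatch(url) is not None


NARROW = "https://kubernetes.io/docs/home/**"
BROAD = "https://kubernetes.io/docs/**"

# Illustrative links only; real pages have longer paths.
links = [
    "https://kubernetes.io/docs/home/",
    "https://kubernetes.io/docs/concepts/",
    "https://kubernetes.io/docs/setup/",
]

if __name__ == "__main__":
    for link in links:
        print(link, "narrow:", matches(NARROW, link), "broad:", matches(BROAD, link))
```

Under these assumptions, only links under /docs/home/ pass the narrow glob, so the crawler has very little to enqueue, while the broader glob accepts every link under /docs/.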

Regarding your second question - every Actor Run is separate and stateless (at least in the case of Web Scraper). We believe that this is the way to go - because you don't know if a page hasn't changed... until you scrape it and see the content.

I'll close this issue now, but feel free to ask any additional questions. Cheers!