Web Scraper
apify/web-scraper

Crawls arbitrary websites using the Chrome browser and extracts data from pages using provided JavaScript code. The Actor supports both recursive crawling and lists of URLs, and automatically manages concurrency for maximum performance. This is Apify's basic tool for web crawling and scraping.

Crawler only goes for 3 pages and stops

Closed

demonstrative_space opened this issue a month ago

I'm using Web Scraper. When I run it, the crawler only goes through 3 pages and stops. Also, why does it try to scrape the same URLs every time I run it? Shouldn't they be excluded, since they were already scraped?

Hello and thank you for your interest in this Actor!

The Actor only crawls 3 webpages because you told it to enqueue only links that start with https://kubernetes.io/docs/home/ (by setting the glob option to https://kubernetes.io/docs/home/**). Now look at that webpage: its links lead to https://kubernetes.io/docs/concepts/..., https://kubernetes.io/docs/setup/..., etc., which don't match the glob, so they are never enqueued. If you want to crawl these links as well, set the glob to something like https://kubernetes.io/docs/**.
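To make the glob behaviour concrete, here is a minimal Python sketch of how a URL filter like this decides which links to enqueue. It assumes a simplified glob dialect (`**` matches any characters including `/`, `*` matches any characters except `/`, everything else is literal); this is an approximation for illustration, not the matcher Web Scraper actually uses. The functions `glob_to_regex` and `should_enqueue` and the example links are hypothetical.

```python
import re

def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Simplified glob (assumption): '**' matches anything, including '/',
    '*' matches anything except '/', all other characters are literal."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("^" + "".join(parts) + "$")

def should_enqueue(url: str, glob: str) -> bool:
    """Return True if a discovered link matches the glob and would be enqueued."""
    return glob_to_regex(glob).match(url) is not None

NARROW = "https://kubernetes.io/docs/home/**"  # the glob from the question
BROAD = "https://kubernetes.io/docs/**"        # the suggested replacement

# Hypothetical links, shaped like the ones found on the docs home page.
links = [
    "https://kubernetes.io/docs/home/example-page/",
    "https://kubernetes.io/docs/concepts/example-page/",
    "https://kubernetes.io/docs/setup/example-page/",
]

# Only the /docs/home/ link passes NARROW; all three pass BROAD.
for link in links:
    print(link, should_enqueue(link, NARROW), should_enqueue(link, BROAD))
```

With the narrow glob, every link that leaves `/docs/home/` is filtered out, so the crawl runs out of pages almost immediately; widening the glob to `/docs/**` lets those links into the queue.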

Regarding your second question: every Actor run is separate and stateless (at least in the case of Web Scraper). We believe this is the right approach, because you can't know whether a page has changed until you scrape it and see the content.
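A minimal Python sketch of what "stateless" means in practice, assuming a simplified crawl loop (the `run_crawl` function and its parameters are hypothetical, not Web Scraper's API): duplicate URLs are skipped within a single run, but the set of already-seen URLs is created fresh at the start of every run, so a second run scrapes the same pages again.

```python
from collections import deque
from typing import Callable, Dict, List

def run_crawl(
    start_urls: List[str],
    links_on_page: Dict[str, List[str]],
    scrape: Callable[[str], str],
) -> List[str]:
    """Simulate one stateless run (hypothetical model, not Apify's code)."""
    seen = set()                  # fresh for every run: nothing carries over
    queue = deque(start_urls)
    results = []
    while queue:
        url = queue.popleft()
        if url in seen:           # deduplication only within this run
            continue
        seen.add(url)
        results.append(scrape(url))
        queue.extend(links_on_page.get(url, []))
    return results

# Hypothetical site graph: page "a" links to "b" and back to itself.
site = {"a": ["b", "a"], "b": ["a"]}

first = run_crawl(["a"], site, lambda url: url)
second = run_crawl(["a"], site, lambda url: url)
print(first, second)  # both runs scrape the same pages
```

Persisting `seen` between runs would skip already-scraped URLs, but, as noted above, it would also skip pages whose content has changed since the last run.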

I'll close this issue now, but feel free to ask any additional questions. Cheers!

Developer
Maintained by Apify
Actor metrics
  • 3.7k monthly users
  • 98.8% runs succeeded
  • 3.6 days response time
  • Created in Mar 2019
  • Modified about 1 month ago