Web Scraper
apify/web-scraper

Crawls arbitrary websites using the Chrome browser and extracts data from pages using provided JavaScript code. The Actor supports both recursive crawling and lists of URLs, and it automatically manages concurrency for maximum performance. This is Apify's basic tool for web crawling and scraping.

Crawling not working well

Closed

agat opened this issue 3 months ago

I attempted to crawl the website https://jcyared.com, setting the maximum number of pages per crawl (maxPagesPerCrawl parameter) to 20. However, I only managed to retrieve 2 pages. Could someone explain why this might have occurred?

There are two problems in your input:

  • You set maxCrawlingDepth to 0, which means nothing nested will be enqueued.
  • You set the glob to https://jcyared.com/*, which also prevents nesting: a single * matches https://jcyared.com/foo but not https://jcyared.com/foo/bar. You want https://jcyared.com/** to allow multiple slashes in the URL path.
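To make the difference between * and ** concrete, here is a simplified sketch of the two wildcard rules. It assumes that * matches any run of characters except / and that ** also matches /. The glob_to_regex and glob_matches helpers are hypothetical. They are not the Actor's actual matcher, which may support more glob syntax (such as ?, character classes, or braces).

```python
import re


def glob_to_regex(glob: str) -> str:
    """Translate a URL glob into a regex string (simplified, hypothetical).

    Assumed rules: '**' matches any characters, including '/';
    '*' matches any characters except '/'; everything else is literal.
    """
    parts = []
    i = 0
    while i < len(glob):
        if glob.startswith("**", i):
            parts.append(".*")       # may cross path segments
            i += 2
        elif glob[i] == "*":
            parts.append("[^/]*")    # stays within one path segment
            i += 1
        else:
            parts.append(re.escape(glob[i]))  # literal character
            i += 1
    return "".join(parts)


def glob_matches(glob: str, url: str) -> bool:
    # The whole URL must match, not just a prefix.
    return re.fullmatch(glob_to_regex(glob), url) is not None


print(glob_matches("https://jcyared.com/*", "https://jcyared.com/foo"))       # True
print(glob_matches("https://jcyared.com/*", "https://jcyared.com/foo/bar"))   # False
print(glob_matches("https://jcyared.com/**", "https://jcyared.com/foo/bar"))  # True
```

Under these rules, each * is confined to a single path segment, so any link that is two or more segments deep is rejected by the /* glob.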

The second one is the important bit. Here is a run with both fixed, which seems to work as expected (I aborted it after a few minutes, but it had already gone through more than 40 pages):

https://console.apify.com/view/runs/pDfb703n0fEdaUHyv
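The following toy model shows why maxPagesPerCrawl = 20 can still yield only 2 pages. It runs a breadth-first crawl over a hypothetical in-memory link graph under example.com, not the real site. The crawl function, its max_pages and max_depth parameters, and the glob rule (* stays within one path segment, ** crosses slashes) are all assumptions for illustration. They only loosely mirror the Actor's maxPagesPerCrawl, maxCrawlingDepth and globs inputs. The point is that the page limit is an upper bound: the crawl stops as soon as the queue runs out of URLs that pass the glob filter.

```python
import re
from collections import deque
from typing import Dict, List, Optional


def glob_matches(glob: str, url: str) -> bool:
    # Assumed rule: '**' crosses '/', a single '*' does not.
    pattern = re.escape(glob).replace(r"\*\*", ".*").replace(r"\*", "[^/]*")
    return re.fullmatch(pattern, url) is not None


def crawl(start_url: str, links: Dict[str, List[str]], glob: str,
          max_pages: int, max_depth: Optional[int] = None) -> List[str]:
    """Breadth-first crawl of an in-memory link graph (hypothetical model).

    max_pages is an upper bound on visited pages, not a target.
    max_depth=None means no depth limit in this sketch.
    """
    visited: List[str] = []
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if max_depth is not None and depth >= max_depth:
            continue  # too deep: do not enqueue links from this page
        for link in links.get(url, []):
            if link not in seen and glob_matches(glob, link):
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Hypothetical site structure, for illustration only.
SITE = {
    "https://example.com/": ["https://example.com/about",
                             "https://example.com/blog/post-1"],
    "https://example.com/about": ["https://example.com/blog/post-2"],
    "https://example.com/blog/post-1": ["https://example.com/blog/post-2"],
    "https://example.com/blog/post-2": [],
}

narrow = crawl("https://example.com/", SITE, "https://example.com/*", max_pages=20)
wide = crawl("https://example.com/", SITE, "https://example.com/**", max_pages=20)
shallow = crawl("https://example.com/", SITE, "https://example.com/**",
                max_pages=20, max_depth=0)
print(len(narrow))   # 2: /blog/... links are filtered out by the single '*'
print(len(wide))     # 4: '**' lets the nested /blog/ pages through
print(len(shallow))  # 1: in this model, depth 0 keeps only the start URL
```

With the narrow glob, the queue empties after two pages even though the limit is 20. Widening the glob to ** lets the crawl reach the nested /blog/ pages. How the real Actor treats a maxCrawlingDepth of 0 is documented in its input schema, and this sketch does not assert it.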

Developer
Maintained by Apify
Actor metrics
  • 3.4k monthly users
  • 99.9% runs succeeded
  • 3.2 days response time
  • Created in Mar 2019
  • Modified about 2 months ago