![Web Scraper avatar](https://images.apifyusercontent.com/rSycYnQcYLGbeVmu0KEvfJzQCBrJH7XWIv1O6VJVk1U/rs:fill:92:92/aHR0cHM6Ly9hcGlmeS1pbWFnZS11cGxvYWRzLXByb2QuczMuYW1hem9uYXdzLmNvbS9tb0pSTFJjODVBaXRBcnBOTi9abjh2YldUaWthN2FuQ1FNbi1TRC0wMi0wMi5wbmc.webp)
Web Scraper
Crawls arbitrary websites using the Chrome browser and extracts data from pages using provided JavaScript code. The actor supports both recursive crawling and lists of URLs, and it automatically manages concurrency for maximum performance. This is Apify's basic tool for web crawling and scraping.
Crawling not working well
Closed
I attempted to crawl the website https://jcyared.com, setting the maximum number of pages per crawl (maxPagesPerCrawl parameter) to 20. However, I only managed to retrieve 2 pages. Could someone explain why this might have occurred?
![adamek avatar](https://apify-image-uploads-prod.s3.amazonaws.com/EgPtw3oej6TaDt5qn/4My5YgvNjXFwBaiEw-IMG_1676_%28kopie%29.jpg)
There are two problems in your input:
- You set `maxCrawlingDepth` to 0, which means nothing nested will be enqueued.
- You set the globs to `https://jcyared.com/*`, which means no nesting as well (this accepts `https://jcyared.com/foo` but not `https://jcyared.com/foo/bar`). You want `https://jcyared.com/**` to allow multiple slashes in the URL path.
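
To make the difference between the two glob patterns concrete, here is a minimal sketch. It assumes the common glob convention in which `*` matches within a single path segment and `**` can span slashes. The `globToRegExp` helper is hypothetical and written only for this illustration; it is not the scraper's own matching code, which likely supports more syntax.

```javascript
// Simplified glob-to-RegExp conversion, for illustration only.
// Assumption: "*" matches any run of characters except "/",
// while "**" matches any run of characters, including "/".
function globToRegExp(glob) {
  let pattern = '';
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === '*') {
      if (glob[i + 1] === '*') {
        pattern += '.*'; // "**" may cross path segments
        i++; // skip the second asterisk
      } else {
        pattern += '[^/]*'; // "*" stays inside one path segment
      }
    } else {
      // Escape characters that are special in regular expressions.
      pattern += ch.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
    }
  }
  return new RegExp(`^${pattern}$`);
}

const singleStar = globToRegExp('https://jcyared.com/*');
const doubleStar = globToRegExp('https://jcyared.com/**');

const urls = ['https://jcyared.com/foo', 'https://jcyared.com/foo/bar'];
for (const url of urls) {
  // Prints the URL, then whether "/*" matches, then whether "/**" matches.
  console.log(url, singleStar.test(url), doubleStar.test(url));
}
// https://jcyared.com/foo true true
// https://jcyared.com/foo/bar false true
```

Under these assumptions, the single-star pattern rejects every URL with more than one path segment, so links to nested pages would never be enqueued.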
The second one is the important bit. Here is a run with those two fixed, which seems to work as expected (I aborted it after a few minutes, but it had already gone through more than 40 pages):