Web Scraper
Crawls arbitrary websites using the Chrome browser and extracts data from pages using provided JavaScript code. The actor supports both recursive crawling and lists of URLs, and automatically manages concurrency for maximum performance. This is Apify's basic tool for web crawling and scraping.
I attempted to crawl the website https://jcyared.com, setting the maximum number of pages per crawl (maxPagesPerCrawl parameter) to 20. However, I only managed to retrieve 2 pages. Could someone explain why this might have occurred?
There are two problems in your input:
- You set `maxCrawlingDepth` to 0, which means nothing nested will be enqueued.
- You set the globs to `https://jcyared.com/*`, which also prevents nesting: it accepts `https://jcyared.com/foo` but not `https://jcyared.com/foo/bar`. You want `https://jcyared.com/**` to allow multiple slashes in the URL path.
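To make the difference between the two globs concrete, here is a minimal sketch of the usual glob convention, where `*` stops at a slash and `**` does not. The `globToRegExp` helper is hypothetical and heavily simplified; it is not the matcher the actor uses internally, and real glob libraries support far more syntax.

```javascript
// Simplified glob-to-RegExp conversion, for illustration only.
// Assumption: `*` matches any run of characters except `/`,
// while `**` matches any run of characters including `/`.
function globToRegExp(glob) {
  let pattern = '';
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === '*') {
      if (glob[i + 1] === '*') {
        pattern += '.*'; // `**` crosses path segments
        i++; // skip the second asterisk
      } else {
        pattern += '[^/]*'; // `*` stays within one path segment
      }
    } else {
      // Escape characters that have a special meaning in a RegExp.
      pattern += ch.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
    }
  }
  return new RegExp(`^${pattern}$`);
}

const singleStar = globToRegExp('https://jcyared.com/*');
const doubleStar = globToRegExp('https://jcyared.com/**');

console.log(singleStar.test('https://jcyared.com/foo'));     // true
console.log(singleStar.test('https://jcyared.com/foo/bar')); // false
console.log(doubleStar.test('https://jcyared.com/foo'));     // true
console.log(doubleStar.test('https://jcyared.com/foo/bar')); // true
```

With the single-star glob, any link whose path contains more than one segment is rejected, so only top-level pages can be enqueued.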
The second problem (the glob) is the important one. Here is a run with those two fixed, which seems to work as expected (I aborted it after a few minutes, but it had already gone through more than 40 pages):
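For reference, an input with both fixes applied might look roughly like the sketch below. The field names follow the Web Scraper input as I understand it, but treat them as assumptions and check them against the actor's input schema. The `maxCrawlingDepth` value of 10 is an arbitrary illustrative choice, and `pageFunction` along with any other required fields is left out for brevity.

```javascript
// Hypothetical input sketch with both fixes applied (not the original poster's actual input).
const input = {
  startUrls: [{ url: 'https://jcyared.com' }],
  linkSelector: 'a[href]', // assumed: follow ordinary links on each page
  // Fix 2: `**` lets the glob match nested paths such as /foo/bar.
  globs: [{ glob: 'https://jcyared.com/**' }],
  // Fix 1: a non-zero depth; 10 is an arbitrary value chosen for illustration.
  maxCrawlingDepth: 10,
  maxPagesPerCrawl: 20, // the limit from the original question
};

console.log(JSON.stringify(input, null, 2));
```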