Web Scraper

apify/web-scraper

Crawls arbitrary websites using the Chrome browser and extracts data from pages using JavaScript code. The Actor supports both recursive crawling and lists of URLs and automatically manages concurrency for maximum performance. This is Apify's basic tool for web crawling and scraping.

We passed 40 websites -> it returned 1 description

Closed

Nikita Sviridenko (nikita-sviridenko) opened this issue
a month ago

https://www.loom.com/share/0feaf14b3b18436ebb8752389381b63e

Issue Overview:

We encountered significant challenges while attempting to scrape multiple websites. Despite submitting a batch of around 40 URLs, only two were successfully processed, even though the system indicated success. The key issues are as follows:

  1. Bulk Processing Limitation: The current setup does not support efficient bulk processing of websites. Handling a large input, such as thousands of URLs, is infeasible without creating individual tasks for each, which is both impractical and costly.

  2. Error Tracking and Transparency: The system does not provide a way to map errors to specific URLs, making it difficult to identify and address issues for individual websites.

  3. Processing Failures: Most of the submitted URLs were ignored, with no clear indication of why this occurred, despite there being no apparent limitations (e.g., task limits).

Resolution Needed: A more scalable solution is required to process bulk website inputs effectively and provide detailed feedback for each URL, including handling errors systematically.

nikita-sviridenko

Attaching input

jindrich.bar

Hello, and thank you for your interest in this Actor!

Note that you're setting the maxResultsPerCrawl input to 1, so the Actor stops after producing one result. The Actor log confirms this: "User set limit of 1 results was reached. Finishing the crawl."
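For illustration, here is a minimal sketch of how an input for a flat list of URLs could be assembled so that the result limit never falls below the number of submitted URLs. The helper name build_run_input and the example page function are hypothetical and not taken from the attached input; startUrls, pageFunction, and maxResultsPerCrawl are assumed to be the relevant input fields.

```python
import json

# Hypothetical page function: returns the page URL and title for each page.
# It is only an example; the page function in the attached input may differ.
EXAMPLE_PAGE_FUNCTION = """
async function pageFunction(context) {
    return { url: context.request.url, title: document.title };
}
"""


def build_run_input(urls, max_results=None):
    """Build an input dict for a list of URLs (illustrative helper).

    If max_results is not given, it defaults to the number of URLs,
    so a limit of 1 cannot silently cut the crawl short.
    """
    urls = [u.strip() for u in urls if u.strip()]
    if max_results is None:
        max_results = len(urls)
    return {
        "startUrls": [{"url": u} for u in urls],
        "pageFunction": EXAMPLE_PAGE_FUNCTION,
        "maxResultsPerCrawl": max_results,
    }


if __name__ == "__main__":
    run_input = build_run_input(["https://example.com", "https://example.org"])
    # Print the JSON so it can be pasted into the Actor's input editor.
    print(json.dumps(run_input, indent=2))
```

If the crawl also follows links, the number of results can exceed the number of start URLs, so a higher explicit limit (such as the 100 used in the fixed run) may be the safer choice.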

See my "fixed" run here, where I just set the maxResultsPerCrawl option to 100. Even though some pages are still missing (the servers were unreachable, even from my own computer), the Actor produces 46 results.
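To see which submitted URLs produced no result, one option is to compare the input list with the dataset items after the run. The sketch below assumes each result item carries a "url" field, which depends on what the page function returns; the helper names are hypothetical.

```python
def normalize(url):
    """Trim whitespace and a trailing slash so near-identical URLs match."""
    return url.strip().rstrip("/")


def find_missing_urls(submitted_urls, result_items):
    """Return the submitted URLs that have no matching result item.

    Assumes each item is a dict with a "url" key (an assumption about
    the page function's output, not a guarantee of the Actor).
    """
    seen = {normalize(item["url"]) for item in result_items if item.get("url")}
    return [u for u in submitted_urls if normalize(u) not in seen]


if __name__ == "__main__":
    submitted = ["https://example.com/", "https://example.org"]
    results = [{"url": "https://example.com", "title": "Example Domain"}]
    for url in find_missing_urls(submitted, results):
        print("No result for:", url)
```

Pages that redirect to a different address will show up as missing under this comparison, so the output is a list of candidates to check in the run log rather than a definitive error report.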

I'll close this issue now, but feel free to ask additional questions if you have any. Cheers!

Developer
Maintained by Apify
