Web Scraper

Pricing

Pay per usage

apify/web-scraper

Developed by

Apify

Maintained by Apify

Crawls arbitrary websites using the Chrome browser and extracts structured data from web pages using a provided JavaScript function. The Actor supports both recursive crawling and lists of URLs, and automatically manages concurrency for maximum performance.

4.5 (22)

563

Monthly users: 3.5k

Runs succeeded: >99%

Response time: 10 days

Last modified: 2 months ago

We passed 40 websites -> it returned 1 description

Closed
Nikita Sviridenko (nikita-sviridenko) opened this issue 4 months ago

https://www.loom.com/share/0feaf14b3b18436ebb8752389381b63e

Issue Overview:

We encountered significant challenges while attempting to scrape multiple websites. We submitted a batch of around 40 URLs, but only two were successfully processed, even though the system indicated success. The key issues are as follows:

  1. Bulk Processing Limitation: The current setup does not support efficient bulk processing of websites. Handling a large input, such as thousands of URLs, is infeasible without creating individual tasks for each, which is both impractical and costly.

  2. Error Tracking and Transparency: The system does not provide a way to map errors to specific URLs, making it difficult to identify and address issues for individual websites.

  3. Processing Failures: Most of the submitted URLs were ignored, with no clear indication of why this occurred, despite there being no apparent limitations (e.g., task limits).

Resolution Needed: A more scalable solution is required to process bulk website inputs effectively and provide detailed feedback for each URL, including handling errors systematically.
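
The per-URL feedback requested in point 2 can be approximated on the client side by comparing the submitted URLs with the URLs found in the returned records. The following is a minimal sketch, not part of the Actor itself. It assumes that every result record carries the page URL in a field named `url` (a hypothetical name that depends on what the page function returns), and the helper `split_by_outcome` is invented for illustration.

```python
def normalize(url):
    """Crude normalization so trivial differences do not cause mismatches."""
    return url.strip().rstrip("/")


def split_by_outcome(submitted_urls, records, url_field="url"):
    """Split submitted URLs into those with a result and those without.

    `records` is a list of dicts, for example the items of a run's dataset.
    `url_field` is an assumption: the page function must return the URL
    under this key for the comparison to work.
    """
    returned = {
        normalize(record[url_field])
        for record in records
        if record.get(url_field)
    }
    processed = [u for u in submitted_urls if normalize(u) in returned]
    missing = [u for u in submitted_urls if normalize(u) not in returned]
    return processed, missing


if __name__ == "__main__":
    submitted = ["https://a.example", "https://b.example"]
    results = [{"url": "https://a.example/", "title": "A"}]
    done, todo = split_by_outcome(submitted, results)
    print("processed:", done)
    print("missing:", todo)
```

Pages that redirect to a different address will show up as missing under this simple comparison, so the missing list is a starting point for investigation rather than a definitive error report.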

nikita-sviridenko

Attaching input

jindrich.bar

Hello, and thank you for your interest in this Actor!

Note that you're setting the maxResultsPerCrawl input to 1. This means the Actor will stop after producing one result. You can also see this in the Actor log: "User set limit of 1 results was reached. Finishing the crawl."

In my "fixed" run, I just set the maxResultsPerCrawl option to 100. Even though some pages are still missing (the servers were unreachable, even from my own computer), the Actor produces 46 results.

I'll close this issue now, but feel free to ask additional questions if you have any. Cheers!
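
The fix described in the reply above comes down to making maxResultsPerCrawl at least as large as the number of URLs submitted. Below is a minimal sketch of building such an input for a batch of URLs. The field names `startUrls`, `pageFunction`, and `maxResultsPerCrawl` are assumed to match the Actor's input schema and should be checked against its input tab. The helper `build_run_input` and the `headroom` parameter are hypothetical and exist only for illustration.

```python
# Assumption: the page function runs in the browser and receives a context
# object exposing the current request. It returns the URL with every record
# so that results can be traced back to the submitted URLs.
PAGE_FUNCTION = """
async function pageFunction(context) {
    return {
        url: context.request.url,
        title: document.title,
    };
}
"""


def build_run_input(urls, page_function=PAGE_FUNCTION, headroom=0):
    """Build an Actor input whose result limit covers every start URL.

    `headroom` (hypothetical) leaves room for extra results, for example
    when the crawl also follows links from the start pages.
    """
    # Drop blanks and duplicates while keeping the original order.
    unique_urls = list(dict.fromkeys(u.strip() for u in urls if u.strip()))
    return {
        "startUrls": [{"url": u} for u in unique_urls],
        "pageFunction": page_function,
        "maxResultsPerCrawl": len(unique_urls) + headroom,
    }


if __name__ == "__main__":
    run_input = build_run_input(
        ["https://a.example", "https://b.example", "https://a.example"]
    )
    print(run_input["maxResultsPerCrawl"])
```

Deriving the limit from the URL list, rather than hard-coding it, avoids the situation in this issue where a leftover value of 1 silently ended the crawl after the first result.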

Pricing

This Actor is free to use; you pay only for Apify platform usage.