Website Content Crawler

apify/website-content-crawler

Automatically crawl and extract text content from websites with documentation, knowledge bases, help centers, or blogs. This Actor is designed to provide data to feed, fine-tune, or train large language models such as ChatGPT or LLaMA.

A clear error message with the webhook for sites that have no parsed data

Closed

sai_sampath opened this issue
a month ago

Because Cheerio does not handle dynamic content well, some sites return no data at all.

A clear error message should be added for these cases so that we can implement a fallback in our own code.
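Until such an error message exists, a client-side check is one way to trigger the fallback. The following is a minimal sketch, assuming each dataset item is a dict with `url` and `text` fields; the helper name `find_empty_pages` and the `min_chars` threshold are hypothetical and not part of the Actor.

```python
from typing import Iterable


def find_empty_pages(items: Iterable[dict], min_chars: int = 1) -> list[str]:
    """Return the URLs of items whose extracted text is missing or too short."""
    empty_urls = []
    for item in items:
        # "text" and "url" are assumed field names for a crawled item.
        text = (item.get("text") or "").strip()
        if len(text) < min_chars:
            empty_urls.append(item.get("url", "<unknown>"))
    return empty_urls


# Hypothetical sample items standing in for a run's dataset.
items = [
    {"url": "https://example.com/docs", "text": "Getting started ..."},
    {"url": "https://example.com/app", "text": ""},
    {"url": "https://example.com/spa", "text": None},
]

needs_fallback = find_empty_pages(items)
print(needs_fallback)
```

Any URL returned here could then be re-crawled with a browser-based crawler type or handled by other fallback logic.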

Please check the data for the example run given below. A clear indication in cases like this would be good.

Thank you.

Hello again!

Unfortunately, it seems that you have deleted the run you're linking here. Could you please share a link to an actual run, so we can take a look at that? In general, it's quite hard to (automatically) decide whether something is missing from the crawled webpage contents.

However, if you want to use Cheerio where possible but still be able to parse dynamic content on websites, we recently released a new crawler type, "Adaptive switching between Playwright and raw HTTP client". The name says it all: it continuously compares the Cheerio and browser output and switches between the two to optimize crawl speed and cost while still keeping all the content. Give it a try and let us know how it goes.
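As a rough illustration of selecting that crawler type programmatically, the sketch below builds a run input. The input keys `startUrls` and `crawlerType` and the values `"playwright:adaptive"` and `"cheerio"` are assumptions that should be checked against the Actor's input schema; `build_run_input` is a hypothetical helper.

```python
import json

# Assumed crawlerType values; verify them against the Actor's input schema.
ADAPTIVE = "playwright:adaptive"
RAW_HTTP = "cheerio"


def build_run_input(start_urls: list[str], crawler_type: str = ADAPTIVE) -> dict:
    """Build a run input dict that selects the given crawler type."""
    return {
        "startUrls": [{"url": url} for url in start_urls],
        "crawlerType": crawler_type,
    }


run_input = build_run_input(["https://example.com/docs"])
print(json.dumps(run_input, indent=2))
```

The resulting dict could then be passed as the run input when starting the Actor, for example through the Apify console or an API client.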

Cheers!

Developer
Maintained by Apify
Actor metrics
  • 2k monthly users
  • 99.9% runs succeeded
  • 2.9 days response time
  • Created in Mar 2023
  • Modified 3 days ago