Website Content Crawler

Automatically crawl and extract text content from websites with documentation, knowledge bases, help centers, or blogs. This Actor is designed to provide data to feed, fine-tune, or train large language models such as ChatGPT or LLaMA.

Include HTML Element (instead of exclude)


matthias.amberg opened this issue
3 months ago

Many CMS-generated web pages have a marker that identifies the actual content of the page (excluding navigation and similar elements). For instance as a


It would be nice if the HTML processing settings let you explicitly include certain HTML query paths (selectors), instead of only excluding them.
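
To illustrate the request, here is a minimal sketch of include-based extraction in Python, using only the standard library. It is not how the Actor works and not one of its settings. The names `IncludeOnlyExtractor` and `extract_included_text`, the `include_tag` / `include_id` parameters, and the `id="content"` marker in the sample page are all hypothetical. The sketch also assumes well-formed markup in which every non-void element has an explicit closing tag.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class IncludeOnlyExtractor(HTMLParser):
    """Collect text only from inside elements that match an include rule.

    Hypothetical helper for illustration; supports matching by tag name
    and/or by id attribute rather than full CSS selectors.
    """

    def __init__(self, include_tag=None, include_id=None):
        super().__init__()
        self.include_tag = include_tag
        self.include_id = include_id
        self.depth = 0      # > 0 while the parser is inside an included element
        self.chunks = []    # text fragments found inside included elements

    def _matches(self, tag, attrs):
        if self.include_tag is not None and tag != self.include_tag:
            return False
        if self.include_id is not None and dict(attrs).get("id") != self.include_id:
            return False
        # With no rule configured, nothing is included.
        return self.include_tag is not None or self.include_id is not None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0 or self._matches(tag, attrs):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            self.chunks.append(text)


def extract_included_text(html, include_tag=None, include_id=None):
    """Return the text found inside matching elements, joined by spaces."""
    parser = IncludeOnlyExtractor(include_tag, include_id)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Sample page with a hypothetical content marker (id="content").
page = (
    '<html><body><nav>Home | About</nav>'
    '<main id="content"><h1>Title</h1><p>Body<br>text</p></main>'
    '<footer>Imprint</footer></body></html>'
)
print(extract_included_text(page, include_id="content"))  # prints "Title Body text"
```

With an include rule, the navigation and footer never need to be listed for exclusion; anything outside the marked element is dropped by default.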

Hello @matthias.amberg and thank you for your interest in this Actor!

I see why you might want something like this, but at the same time, it feels like our current extraction setup is robust enough that there should be no need for it. In other words, our HTML extractors should do this step automatically.

Either way, I'm very curious about this - we'll be happy to hear about your use case for this! Do you have an example of a page where this feature would help you? Thanks!

Maintained by Apify
Actor metrics
  • 2k monthly users
  • 99.7% runs succeeded
  • 2.9 days response time
  • Created in Mar 2023
  • Modified 9 days ago