Website Content Crawler

apify/website-content-crawler

Automatically crawl and extract text content from websites with documentation, knowledge bases, help centers, or blogs. This Actor is designed to provide data to feed, fine-tune, or train large language models such as ChatGPT or LLaMA.

Can we limit the number of pages inside a child?

Open

sai_sampath opened this issue
20 days ago

Often, the websites being crawled turn out either too big or too small for a given depth configuration.

To work around this, we can limit the total number of pages, but then some pages do not get scraped at all.

To handle this, could we have a new parameter, say maxPagesPerChild, that limits the number of pages each child can add?

Can you please look into this use case? Thank you.

Hello and thank you for your interest in this Actor!

There were some similar feature requests earlier, if I remember correctly. Back then, we backlogged those because it was just one user's idea, but now that we see that multiple users could benefit from such a feature, we'll increase the priority on this.

However, implementing this requires non-trivial changes to the Actor and the way it tracks visited pages. We'll look into this and keep you posted here.
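To illustrate why the tracking has to change, here is a rough sketch of what a per-child page budget could look like. It is not the Actor's implementation: it crawls an in-memory link graph instead of real pages, `max_pages_per_child` is the hypothetical parameter from the request, and treating a "child" as a page linked directly from the start URL is an assumption. The point is that every queued URL has to remember which child it descends from, and the crawler needs a counter per child rather than one global page count.

```python
from collections import deque


def crawl(links, start, max_pages_per_child):
    """Breadth-first crawl of an in-memory link graph with a per-child budget.

    links: dict mapping each URL to the list of URLs it links to.
    A "child" is assumed to be a page linked directly from `start`.
    Returns the crawl order and the number of pages crawled per child.
    """
    visited = {start}
    order = [start]
    pages_per_child = {}  # child root -> pages crawled in that subtree

    # Each direct link from the start page opens its own budgeted subtree.
    queue = deque((child, child) for child in links.get(start, []))

    while queue:
        url, root = queue.popleft()
        if url in visited:
            continue  # already crawled via another path
        if pages_per_child.get(root, 0) >= max_pages_per_child:
            continue  # this child's budget is spent; leave the page uncrawled
        visited.add(url)
        pages_per_child[root] = pages_per_child.get(root, 0) + 1
        order.append(url)
        # Descendants inherit the child root, so they count against its budget.
        for nxt in links.get(url, []):
            if nxt not in visited:
                queue.append((nxt, root))

    return order, pages_per_child


# Hypothetical site: a large docs section next to a small blog.
site = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/a", "/docs/b", "/docs/c"],
    "/blog": ["/blog/1"],
}
order, counts = crawl(site, "/", max_pages_per_child=2)
print(order)
print(counts)
```

A page is marked as visited only once it is actually crawled, so a page skipped because one child ran out of budget can still be picked up through another child. Whether that is the desired behaviour is one of the open design questions.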

Cheers!

Developer
Maintained by Apify
Actor metrics
  • 2k monthly users
  • 99.9% runs succeeded
  • 2.9 days response time
  • Created in Mar 2023
  • Modified 3 days ago