Website Content Crawler

Pricing

Pay per usage


Developed by

Apify

Maintained by Apify

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

3.9 (41)

1545

Total users

60K

Monthly users

7.8K

Runs succeeded

>99%

Issues response

7.9 days

Last modified

3 days ago


Can we limit the number of pages inside a child?

Closed

sai_sampath opened this issue
a year ago

Often, with only the depth configuration to tune, the crawl ends up either too big or too small.

To work around this, we can limit the total number of pages, but then some pages do not get scraped.

To handle this, could we have a new parameter, say maxPagesPerChild, that limits the number of pages a child can add?
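To make the request concrete, here is a minimal sketch of the idea in Python. It is not the Actor's code: it walks a small in-memory link graph breadth-first, `max_depth` and `max_pages` are stand-ins for the existing depth and page limits, and `max_pages_per_child` is the hypothetical new parameter, interpreted here as a cap on how many new pages a single page may add to the queue.

```python
from collections import deque


def crawl(links, start, max_depth, max_pages, max_pages_per_child=None):
    """Breadth-first walk over an in-memory link graph.

    links: dict mapping a URL to the list of URLs it links to.
    max_pages_per_child: hypothetical cap on how many new pages one
    page may add to the queue (None means no cap).
    """
    crawled = [start]          # pages accepted for crawling, in order
    seen = {start}             # de-duplication set
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue           # do not follow links below the depth limit
        added = 0              # pages this particular page has added
        for child in links.get(url, []):
            if len(crawled) >= max_pages:
                return crawled  # global page budget is used up
            if max_pages_per_child is not None and added >= max_pages_per_child:
                break          # this page has used its own budget
            if child in seen:
                continue
            seen.add(child)
            crawled.append(child)
            queue.append((child, depth + 1))
            added += 1
    return crawled


# Made-up site: a large blog section and a small docs section.
links = {
    "/": ["/blog", "/docs"],
    "/blog": ["/blog/1", "/blog/2", "/blog/3", "/blog/4"],
    "/docs": ["/docs/a", "/docs/b"],
}

# Global limit only: the blog uses up the budget, the docs pages are skipped.
without_cap = crawl(links, "/", max_depth=2, max_pages=7)

# With the hypothetical per-page cap, both sections get a share.
with_cap = crawl(links, "/", max_depth=2, max_pages=7, max_pages_per_child=2)
```

In this toy graph both runs stop at 7 pages, but only the capped run reaches `/docs/a` and `/docs/b`. The cap needs a counter per parent page in addition to the global set of seen URLs.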

Can you please look into this use case? Thank you.

jindrich.bar

Hello and thank you for your interest in this Actor!

There were some similar feature requests earlier, if I remember correctly. Back then, we backlogged those, because it was just one user's idea - but now that we see that multiple users could benefit from such a feature, we'll increase the priority on this.

However, implementing this requires non-trivial changes to the Actor and the way it tracks visited pages. We'll look into this and keep you posted here.

Cheers!

jiri.spilka

Hi, I apologize for the very late response.

We have not prioritized this feature as we haven't received similar requests. Currently, we don’t plan to implement it in the near future.

I’m sorry about this. I’ll go ahead and close this issue for now, but we may revisit this decision later. Thank you for your understanding.

Best regards, Jiri