
Website Content Crawler
Pricing
Pay per usage

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
4.6 (38)
1310
Total users: 49.4k
Monthly users: 6.9k
Runs succeeded: >99%
Issue response: 3.8 days
Last modified: 7 days ago
This crawler took too much time and too much compute power
Closed
This crawler ran for more than 12 hours. It probably went into a loop and kept running for hours until the billing limit was reached.

Hi, thank you for using Website Content Crawler.
From the Actor run logs, there is no indication that the crawler went into a loop - it skipped all pages that it had already crawled. It seems that the website you were trying to crawl is too large and contains too many pages.
This can cause crawling to take a long time. To prevent this, we recommend setting the maxPages parameter or specifying the Actor run timeout option. You can also exclude some pages from being crawled by setting excludeUrlGlobs.
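As a rough illustration of the limits described above, here is a minimal Python sketch. It is not taken from the Actor's documentation. The field name maxPages follows the wording of this reply and should be checked against the Actor's current input schema. The startUrls and excludeUrlGlobs shapes (lists of objects with "url" and "glob" keys) are assumptions, and the helper names build_run_input and run_crawler are hypothetical. The run timeout is passed through the apify-client package's call method.

```python
from typing import Any, Dict, List


def build_run_input(start_url: str, max_pages: int, exclude_globs: List[str]) -> Dict[str, Any]:
    """Build a crawler input that caps the crawl size and skips unwanted URLs."""
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    return {
        # Assumed shape: a list of objects with a "url" key.
        "startUrls": [{"url": start_url}],
        # Field name as written in the reply; verify it against the input schema.
        "maxPages": max_pages,
        # Assumed shape: a list of objects with a "glob" key.
        "excludeUrlGlobs": [{"glob": g} for g in exclude_globs],
    }


def run_crawler(token: str, run_input: Dict[str, Any], timeout_secs: int):
    """Start the Actor with a hard run timeout (needs `pip install apify-client`)."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    # timeout_secs bounds the whole run, so a very large site cannot
    # keep consuming compute until the billing limit is reached.
    return client.actor("apify/website-content-crawler").call(
        run_input=run_input,
        timeout_secs=timeout_secs,
    )
```

For example, build_run_input("https://example.com", 500, ["https://example.com/archive/**"]) followed by run_crawler(token, run_input, timeout_secs=3600) would cap the crawl at 500 pages and one hour, whichever comes first, under the assumptions stated above.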
Sorry that this happened to you. Please set the limits next time or keep an eye on the Actor run.
Jakub