Website Content Crawler

Pricing

Pay per usage

apify/website-content-crawler

Developed by

Apify

Maintained by Apify

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

4.6 (38)

1.1k

Monthly users

6k

Runs succeeded

>99%

Response time

2.3 days

Last modified

7 days ago

This crawler took too much time and too much compute power

Closed
H24 opened this issue a month ago

This crawler ran for more than 12 hours. It probably went into a loop and kept running until the billing limit was reached.

jakub.kopecky

Hi, thank you for using Website Content Crawler.

The Actor run logs give no indication that the crawler went into a loop: it skipped every page it had already crawled. It seems that the website you were trying to crawl is very large and contains too many pages.

A site of that size can take a long time to crawl. To prevent this, we recommend setting the maxPages parameter or specifying a timeout for the Actor run. You can also exclude some pages from the crawl by setting excludeUrlGlobs.
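
As a rough illustration of the limits mentioned above, the following Python sketch builds a run input that caps the page count and excludes some URLs, then starts the Actor with a run timeout. Several things here are assumptions rather than confirmed details: the key name maxPages is copied from this reply and may be spelled differently in the Actor's input schema, the object shapes used for startUrls and excludeUrlGlobs are a guess at the expected format, and the helper names build_run_input and run_with_limits are hypothetical. The second function relies on the apify-client package being installed.

```python
from typing import Any

# Key name as written in the reply above. This is an assumption: check the
# Actor's input schema for the exact spelling before relying on it.
MAX_PAGES_KEY = "maxPages"


def build_run_input(start_url: str, max_pages: int, exclude_globs: list[str]) -> dict[str, Any]:
    """Build a crawler input that caps the page count and skips unwanted URLs."""
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    return {
        # Assumed shape: a list of objects with a "url" field.
        "startUrls": [{"url": start_url}],
        MAX_PAGES_KEY: max_pages,
        # Assumed shape: a list of objects with a "glob" field.
        "excludeUrlGlobs": [{"glob": pattern} for pattern in exclude_globs],
    }


def run_with_limits(token: str, run_input: dict[str, Any], timeout_secs: int):
    """Start the Actor with a hard run timeout (needs `pip install apify-client`)."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    return client.actor("apify/website-content-crawler").call(
        run_input=run_input,
        timeout_secs=timeout_secs,
    )
```

For example, `build_run_input("https://example.com", 500, ["https://example.com/archive/**"])` produces an input limited to 500 pages that skips the archive section, and passing it to `run_with_limits` with `timeout_secs=3600` would ask the platform to stop the run after one hour even if the page cap is never reached.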

Sorry that this happened to you. Please set the limits next time or keep an eye on the Actor run.

Jakub

Pricing

Pricing model

Pay per usage

This Actor is free to use; you pay only for the Apify platform usage it consumes.