Website Content Crawler
Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
Rating: 4.6 (38)
Pricing: Pay per usage
Monthly users: 6k
Runs succeeded: >99%
Response time: 2.3 days
Last modified: 7 days ago
Can the Website Content Crawler Split Runs to Finish Within 15 Minutes for Large Sites?
I am building an AI chatbot that allows customers to either provide their website's sitemap or simply enter the base URL along with the total number of pages to be crawled.
However, when the number of URLs reaches around 500, the crawling process can take an hour or more to complete. During this time, users are left waiting with no indication of progress. To improve the user experience, we are looking for a solution within Apify that enables the crawling process to be split into multiple runs, each limited to a maximum of X URLs (e.g., 50). This would allow us to provide users with quicker progress updates.
Our question is: Is it possible to split the crawling process into multiple runs so that each run finishes within 15 minutes, even for large websites?

Hi, thank you for using the Website Content Crawler.
First, you can try increasing the memory limit of the Actor run—for example, to 16 GB—and boosting the initial concurrency in the Crawler settings for better performance. I’d recommend trying this simpler approach first. Please see this example run: https://console.apify.com/view/runs/WASVGBad1KDxNIRLj
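If you start the run programmatically, a minimal sketch with the Apify Python client could look like the one below. The input field names (startUrls, maxCrawlPages, initialConcurrency) are my assumptions about the Website Content Crawler's input schema, so please verify them against the Actor's documentation; the token, start URL, and page limit are placeholders.

```python
# Minimal sketch: start a Website Content Crawler run with 16 GB of memory
# and a higher initial concurrency (pip install apify-client).
# The input field names are assumptions; verify them in the Actor's input schema.
from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

run = client.actor("apify/website-content-crawler").call(
    run_input={
        "startUrls": [{"url": "https://example.com"}],  # placeholder start URL
        "maxCrawlPages": 500,                           # placeholder page limit
        "initialConcurrency": 20,                       # start with more parallel requests
    },
    memory_mbytes=16384,  # 16 GB for the run
)
print(run["id"], run["status"])
```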
For progress updates, you can retrieve the Actor run status, which the Website Content Crawler updates automatically (e.g., "Crawled 270/508 pages, 0 failed requests, desired concurrency 24"). This can be accessed via the SDK or the API endpoint https://docs.apify.com/api/v2/actor-run-get, using the statusMessage attribute.
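As a sketch of that progress check with the Apify Python client (the token, run ID, and polling interval below are placeholders):

```python
# Sketch: poll the run object and print its statusMessage until the run finishes.
import time
from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")
run_client = client.run("<RUN_ID>")  # placeholder run ID

while True:
    run = run_client.get()
    # e.g. "Crawled 270/508 pages, 0 failed requests, desired concurrency 24"
    print(run.get("statusMessage"))
    if run["status"] in ("SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"):
        break
    time.sleep(15)  # poll every 15 seconds
```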
Yes, you can split the crawling into multiple Actor runs on Apify, capping each at 50 URLs to stay under 15 minutes. Since runs don't share state, divide the website into independent sections (e.g., by URL patterns) to avoid overlap. For speed, use the Cheerio crawler type: it's faster but skips JavaScript rendering.
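A rough sketch of that splitting approach, again with the Apify Python client; the section URLs are purely illustrative, and the maxCrawlPages and crawlerType fields are assumed from the Actor's input schema as above:

```python
# Sketch: start one capped run per independent site section so each run stays short.
from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Illustrative, non-overlapping sections of the target website.
sections = [
    "https://example.com/docs",
    "https://example.com/blog",
    "https://example.com/help",
]

run_ids = []
for url in sections:
    # start() returns immediately, so the sections are crawled in parallel runs.
    run = client.actor("apify/website-content-crawler").start(
        run_input={
            "startUrls": [{"url": url}],
            "maxCrawlPages": 50,       # cap each run at 50 pages
            "crawlerType": "cheerio",  # plain-HTTP crawler: faster, no JS rendering
        },
    )
    run_ids.append(run["id"])

print(run_ids)  # poll each run's statusMessage as shown above
```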
Let me know if this works for you.
Jakub
Pricing
Pricing model: Pay per usage
This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage.