
Website Content Crawler
Pricing
Pay per usage

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
4.6 (38)
1310
Total users: 49.4k
Monthly users: 6.9k
Runs succeeded: >99%
Issue response: 3.8 days
Last modified: 7 days ago
Feature Request: Automatic Recrawling Within Same Task Run for RAG System Integration
Open
I'm currently using the Website Content Crawler in my Retrieval-Augmented Generation (RAG) system to extract and process website content. My goal is to automatically recrawl pages after a fixed interval, such as every 30 days, all within the same task run, so that my system stays up to date with the latest information. Could you please advise whether such a feature is currently supported, or whether there are recommended approaches to achieve it? If not, I would like to suggest it as a feature request for future development.

Hi,
Thank you for using Website Content Crawler!
For your use case, you can use Apify Schedules (https://docs.apify.com/platform/schedules) to schedule monthly task runs. Additionally, you can set up a webhook integration in the task (https://docs.apify.com/platform/integrations) to receive notifications when the task completes, and then retrieve the results.
Let me know if this works for you,
Jakub
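
For anyone wiring this up programmatically, here is a rough Python sketch of the two steps suggested above: creating a schedule that runs the task, and finding the results once a webhook reports that the run finished. It rests on several assumptions that should be checked against the Apify API reference: the `POST /v2/schedules` endpoint and its field names (`cronExpression`, `isExclusive`, `actions` with `RUN_ACTOR_TASK`), the default webhook payload exposing the run under `resource` with a `defaultDatasetId`, and the `/v2/datasets/{id}/items` endpoint. The function names, the schedule name `monthly-recrawl`, and the task ID are made up for illustration. Note that a cron expression cannot express "every 30 days" exactly, so the sketch uses the first day of each month as an approximation.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"  # assumed base URL of the Apify API


def build_schedule_payload(task_id, cron="0 0 1 * *", timezone="UTC"):
    """Build a request body for a schedule that runs one Actor task.

    Field names follow the Apify schedules API as I understand it;
    treat them as assumptions and verify them before relying on this.
    The default cron fires at 00:00 on the 1st of every month.
    """
    return {
        "name": "monthly-recrawl",  # hypothetical schedule name
        "isEnabled": True,
        "isExclusive": True,  # assumed to prevent overlapping runs
        "cronExpression": cron,
        "timezone": timezone,
        "actions": [{"type": "RUN_ACTOR_TASK", "actorTaskId": task_id}],
    }


def create_schedule(token, payload):
    """POST the payload to the schedules endpoint (performs a network call)."""
    request = urllib.request.Request(
        f"{API_BASE}/schedules",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def dataset_items_url(webhook_body):
    """Derive the results URL from a webhook notification body.

    Assumes the default payload template, where the finished run is
    under "resource" and carries a "defaultDatasetId".
    """
    dataset_id = webhook_body["resource"]["defaultDatasetId"]
    return f"{API_BASE}/datasets/{dataset_id}/items?format=json"
```

A webhook receiver would pass the parsed JSON body of the notification to `dataset_items_url` and then fetch that URL with the same token to load the fresh pages into the RAG index. `create_schedule(token, build_schedule_payload("<your-task-id>"))` only needs to be called once.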