Website Content Crawler

Pricing: Pay per usage

Developed by

Apify

Maintained by Apify

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

4.6 (38)

1310

Total users: 49.4k
Monthly users: 6.9k
Runs succeeded: >99%
Issue response: 3.8 days
Last modified: 7 days ago

Feature Request: Automatic Recrawling Within Same Task Run for RAG System Integration

Open

sprouto_net opened this issue 20 days ago

I'm currently using the Website Content Crawler in my Retrieval-Augmented Generation (RAG) system to extract and process website content. My goal is to automatically recrawl pages after a fixed interval, such as every 30 days, all within the same task run, so that my system stays up to date with the latest information. Could you please advise whether this is currently supported, or whether there is a recommended approach to achieve it? If not, I would like to suggest it as a feature request for future development.

jakub.kopecky

Hi,

Thank you for using Website Content Crawler!

For your use case, you can use Apify Schedules (https://docs.apify.com/platform/schedules) to schedule monthly task runs. Additionally, you can set up a webhook integration in the task (https://docs.apify.com/platform/integrations) to receive notifications when the task completes, and then retrieve the results.

Let me know if this works for you,

Jakub
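
For readers who want to script the approach suggested above, here is a minimal Python sketch. It is not taken from the thread. It assumes the schedule is created through the Apify API's schedules endpoint with a JSON body, and that the webhook uses Apify's default payload, where the run object sits under `resource`. The field names (`cronExpression`, `actions`, `RUN_ACTOR_TASK`, `defaultDatasetId`) reflect my understanding of the Apify documentation and should be checked against the current API reference. The helper names and the IDs are hypothetical. Note that a cron expression gives a monthly run on the 1st, not a strict 30-day interval.

```python
from typing import Optional

API_BASE = "https://api.apify.com/v2"


def build_monthly_schedule(task_id: str, name: str = "monthly-recrawl") -> dict:
    """Build a request body for a schedule that runs a saved task once a month.

    Assumption: the body would be sent as JSON to the schedules endpoint
    together with an API token. The HTTP call itself is left out here.
    """
    return {
        "name": name,
        "isEnabled": True,
        "isExclusive": True,
        # Standard cron: 00:00 on day 1 of every month (monthly, not exactly 30 days).
        "cronExpression": "0 0 1 * *",
        "timezone": "UTC",
        "actions": [{"type": "RUN_ACTOR_TASK", "actorTaskId": task_id}],
    }


def dataset_items_url(webhook_payload: dict) -> Optional[str]:
    """Return the dataset items URL for a succeeded run, or None otherwise.

    Assumption: the payload has the default shape, with `eventType` at the
    top level and the run object (including `defaultDatasetId`) in `resource`.
    """
    if webhook_payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    dataset_id = (webhook_payload.get("resource") or {}).get("defaultDatasetId")
    if not dataset_id:
        return None
    return f"{API_BASE}/datasets/{dataset_id}/items?format=json"


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    print(build_monthly_schedule("my-task-id"))
    print(dataset_items_url({
        "eventType": "ACTOR.RUN.SUCCEEDED",
        "resource": {"defaultDatasetId": "abc123"},
    }))
```

In a RAG pipeline, the webhook handler would fetch the items from the returned URL and re-index them, replacing the documents from the previous crawl.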