
Website Content Crawler

apify/website-content-crawler

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.


Automating Web Content Crawling for Real-Time Updates

Open

glovebubble opened this issue 4 months ago

I am using a web content crawler API, and I want the crawler to run every time the blog I am crawling adds new content or edits existing content. How can I listen for changes to the blog?

janbuchar

Hello, and thank you for your interest in Website Content Crawler! I see two options here. First, you could set up a Schedule for your crawling task and write a script that compares each new result with the previous one. Second, you could use the content-checker Actor: have it trigger a webhook when it finishes, inspect its result in the webhook handler, and call Website Content Crawler only if the content has changed.
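A minimal Python sketch of the compare-and-trigger logic described in the reply, under stated assumptions: each crawl result is a list of dataset items with `url` and `text` fields, and `fingerprint`, `diff_crawls`, and `build_crawler_run_request` are hypothetical helpers, not part of any Apify library. The request targets the Apify API endpoint for starting an Actor run and sets only the `startUrls` input; fetching the previous and current dataset items is left out, and the request is built but never sent.

```python
import hashlib
import json
import urllib.request

# Assumption: each item looks like {"url": ..., "text": ...}, similar to
# the items Website Content Crawler stores in its dataset.


def fingerprint(items):
    """Map each URL to a SHA-256 hash of its extracted text."""
    return {
        item["url"]: hashlib.sha256(item.get("text", "").encode("utf-8")).hexdigest()
        for item in items
    }


def diff_crawls(previous, current):
    """Compare two crawl results and report added, changed, and removed URLs."""
    old, new = fingerprint(previous), fingerprint(current)
    added = sorted(url for url in new if url not in old)
    removed = sorted(url for url in old if url not in new)
    changed = sorted(url for url in new if url in old and new[url] != old[url])
    return {"added": added, "changed": changed, "removed": removed}


def build_crawler_run_request(start_urls, token):
    """Build (but do not send) a POST request that would start a new
    Website Content Crawler run for the given URLs. Only startUrls is set;
    any other crawler input is left at its defaults (assumption)."""
    url = "https://api.apify.com/v2/acts/apify~website-content-crawler/runs"
    body = json.dumps({"startUrls": [{"url": u} for u in start_urls]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical results of two scheduled runs of the same crawling task.
    previous = [
        {"url": "https://blog.example.com/a", "text": "Hello"},
        {"url": "https://blog.example.com/b", "text": "Old post"},
    ]
    current = [
        {"url": "https://blog.example.com/a", "text": "Hello, edited"},
        {"url": "https://blog.example.com/c", "text": "New post"},
    ]
    diff = diff_crawls(previous, current)
    print(diff)

    to_recrawl = diff["added"] + diff["changed"]
    if to_recrawl:
        req = build_crawler_run_request(to_recrawl, token="YOUR_APIFY_TOKEN")
        # Passing req to urllib.request.urlopen() would actually start the run.
        print(req.get_method(), req.full_url)
```

Hashing the extracted text rather than storing full copies keeps the comparison cheap between scheduled runs; the same `diff_crawls` check could also run inside a webhook handler before deciding whether to start a new crawl.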

Developer
Maintained by Apify

Actor Metrics

  • 3.9k monthly users

  • 709 stars

  • >99% runs succeeded

  • 2.1 days response time

  • Created in Mar 2023

  • Modified 17 days ago