![Website Content Crawler avatar](https://images.apifyusercontent.com/1VrdawICnxIwM4X5JzRJHPBmLx0OpmiNxtHGGLmxdu8/rs:fill:92:92/aHR0cHM6Ly9hcGlmeS1pbWFnZS11cGxvYWRzLXByb2QuczMuYW1hem9uYXdzLmNvbS9hWUcwbDlzN2RiQjdqM2diUy9QZlRvRU5rSlp4YWh6UER1My1DbGVhblNob3RfMjAyMy0wMy0yOF9hdF8xMC40MC4yMF8yeC5wbmc.webp)
Website Content Crawler
Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗LangChain, LlamaIndex, and the wider LLM ecosystem.
bug: Actor gets stuck on large sitemaps
Open
Based on the logs from this Actor's run, it looks like it was migrated from host to host without much else happening. It would be helpful to know what the issue was here and whether there is a way for us to avoid it in the future.
We were also charged for quite a bit of usage without anything to show for it so perhaps we could get some of that credited back?
Hello and thank you for your interest in this Actor!
This seems like an internal error in the Actor. We are working on the fix (we already have it prepared, now we just need to test it properly). I'll let our support team know and they'll look into reimbursing your expenses promptly.
I'll also keep this issue open to track the underlying issue - and will keep you posted once this gets fixed.
Thank you - and sorry for the inconvenience.
Just so you know - our Support team just reimbursed you the lost credits as a credit offset.
The issue is still there (i.e. running the Actor now will likely result in the same outcome) - we'll keep you posted once that changes.
Cheers!
MavenAGI
Thanks for the prompt reply and wonderful support!
MavenAGI
We just hit this issue again on another run[1]. Is there something we can check for (programmatically) to determine if we're running into this bug? Most of our runs complete without a problem so the error case is definitely an outlier.
Hello again,
we're sorry to hear that you've run into problems with this Actor again.
You can set up monitoring alerts on the Alerts tab. There you can, for example, have Apify send you an email if the run duration exceeds a given limit.
To further mitigate the credit loss, you can also lower the Actor timeout (see Run options > Timeout). After checking your previous runs, I found that all of the ones that didn't fail finished in under 41,000 seconds (roughly 11.4 hours). The current timeout limit of 604,800 seconds (7 days) might therefore be excessive and can be lowered.
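For a programmatic version of the same duration check, a minimal sketch follows. It assumes you already have the run's details as a dictionary with `status` and `startedAt` fields (for instance from the Apify API); the helper name `run_exceeds_duration` and the 41,000-second threshold are illustrative choices, not part of the Actor or the platform.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: runs still in progress report one of these statuses.
ACTIVE_STATUSES = {"READY", "RUNNING"}


def run_exceeds_duration(
    run: dict,
    max_duration_secs: int,
    now: Optional[datetime] = None,
) -> bool:
    """Return True if the run is still active and older than the limit.

    `run` is assumed to carry `status` and `startedAt`, where `startedAt`
    is either a timezone-aware datetime or an ISO 8601 string.
    """
    if run.get("status") not in ACTIVE_STATUSES:
        # Finished, failed, or aborted runs are no longer burning credits.
        return False

    started_at = run["startedAt"]
    if isinstance(started_at, str):
        # Normalize a trailing "Z" so older Python versions can parse it.
        started_at = datetime.fromisoformat(started_at.replace("Z", "+00:00"))

    now = now or datetime.now(timezone.utc)
    return now - started_at > timedelta(seconds=max_duration_secs)


if __name__ == "__main__":
    # Hypothetical run record, checked 12 hours after it started.
    example_run = {"status": "RUNNING", "startedAt": "2024-01-01T00:00:00.000Z"}
    check_time = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    print(run_exceeds_duration(example_run, 41_000, now=check_time))
```

A scheduled job could call this helper for each active run and then notify you or abort the run; how the run record is fetched and what happens on a positive result are left open here, since they depend on your setup.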
Thank you again for bringing this up and sorry for the inconvenience.