Website Content Crawler

apify/website-content-crawler

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗LangChain, LlamaIndex, and the wider LLM ecosystem.

bug: Actor gets stuck on large sitemaps

Open

MavenAGI opened this issue 2 months ago

Based on the logs from this Actor's run, it appears the run was migrated from host to host without much else happening. It would be helpful to know what the issue was and whether there is a way for us to avoid it in the future.

We were also charged for a significant amount of usage with nothing to show for it, so perhaps some of that could be credited back?

jindrich.bar

Hello and thank you for your interest in this Actor!

This seems like an internal error in the Actor. We are working on a fix (it is already prepared; we just need to test it properly). I'll let our support team know, and they'll look into reimbursing your expenses promptly.

I'll also keep this issue open to track the underlying problem and will keep you posted once it is fixed.

Thank you - and sorry for the inconvenience.

jindrich.bar

Just so you know, our Support team has reimbursed the lost credits to you as a credit offset.

The issue is still present (i.e., running the Actor now will likely produce the same outcome); we'll keep you posted once that changes.

Cheers!

MavenAGI

2 months ago

Thanks for the prompt reply and wonderful support!

MavenAGI

2 months ago

We just hit this issue again on another run [1]. Is there something we can check programmatically to determine whether we're running into this bug? Most of our runs complete without a problem, so the error case is definitely an outlier.

[1] https://console.apify.com/organization/5WhuE8XiPsnLiYsmv/actors/aYG0l9s7dbB7j3gbS/runs/vMRA3ElqRIaV3TQ4x#log

jindrich.bar

Hello again,

We're sorry to hear that you've run into problems with this Actor again.

You can set up monitoring alerts on the Alerts tab; for example, you can have Apify send you an email if the run duration exceeds a given limit.

To further limit credit loss, you can also lower the Actor timeout (see Run options > Timeout). After checking your previous runs, I found that all of the successful ones finished in under 41,000 seconds (about 11.4 hours). The current timeout of 604,800 seconds (7 days) is therefore likely much higher than needed and can be lowered.
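The timeout and alert threshold suggested above can be derived from past run durations. Below is a minimal sketch, assuming you have already collected the durations (in seconds) of your previous successful runs; the helper names `suggest_timeout_secs` and `looks_stuck`, the 1.5× safety factor, and the sample durations other than 41,000 seconds are hypothetical and not part of any Apify API. Note that the elapsed-time check is only a proxy: it flags runs that exceed the expected duration, not the underlying bug itself.

```python
from datetime import datetime, timedelta, timezone


def suggest_timeout_secs(successful_durations_secs, safety_factor=1.5):
    """Hypothetical helper: longest successful run multiplied by a safety margin."""
    if not successful_durations_secs:
        raise ValueError("need at least one successful run duration")
    if safety_factor < 1:
        raise ValueError("safety_factor must be >= 1")
    return int(max(successful_durations_secs) * safety_factor)


def looks_stuck(started_at, now, expected_max_secs):
    """Hypothetical heuristic: flag a run still going past the longest expected duration.

    A long run is not necessarily caused by the sitemap bug; this is only a signal.
    """
    elapsed_secs = (now - started_at).total_seconds()
    return elapsed_secs > expected_max_secs


if __name__ == "__main__":
    # Sample durations (seconds) of past successful runs; only 41000 comes from the thread.
    durations = [28800, 36000, 41000]
    timeout = suggest_timeout_secs(durations)
    print(f"suggested timeout: {timeout} s ({timeout / 3600:.1f} h)")
    # Compare with the current limit of 604,800 s (7 days).
    print(f"current limit is {604800 / timeout:.1f}x larger")

    # A run that has been going for 12 hours exceeds the longest successful run.
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(looks_stuck(start, start + timedelta(hours=12), max(durations)))
```

With these sample numbers the suggested timeout is roughly 17 hours, still far below the 7-day limit; the same threshold could also serve as the run-duration limit for an alert on the Alerts tab.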

Thank you again for bringing this up and sorry for the inconvenience.

Developer: Maintained by Apify
Actor metrics
  • 2.8k monthly users
  • 317 stars
  • 100.0% runs succeeded
  • 4 days response time
  • Created in Mar 2023
  • Modified 1 day ago