Website Content Crawler

apify/website-content-crawler

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

The request queue hasn't had activity for 300s, resetting internal state

Closed

MavenAGI opened this issue
2 months ago

We have multiple crawls that are logging this message over and over again without making any progress.
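
A minimal sketch of how a caller might spot this symptom in a run's log text before it consumes more credits. It only counts lines containing the message quoted in the issue title. The function names and the `max_warnings` threshold are hypothetical choices for illustration, not Apify settings, and fetching the log or aborting the run is left out.

```python
# The message text is copied from the issue title above.
STALL_MESSAGE = (
    "The request queue hasn't had activity for 300s, resetting internal state"
)


def count_stall_warnings(log_text: str) -> int:
    """Count log lines that contain the stall message."""
    return sum(1 for line in log_text.splitlines() if STALL_MESSAGE in line)


def looks_stalled(log_text: str, max_warnings: int = 3) -> bool:
    """Return True once the message has appeared at least max_warnings times.

    max_warnings is an arbitrary illustrative threshold (an assumption).
    """
    return count_stall_warnings(log_text) >= max_warnings
```

A short burst of this message may be harmless, so the threshold is best tuned to how long a run is allowed to sit idle.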

Oscardz

This is a known issue that we are currently investigating. Once it's fixed, I will gladly reimburse the money for those runs. Sorry for the inconvenience; I will keep you posted on the progress of the fix.

MavenAGI

2 months ago

Thank you. It seems to be related to the 0.3.50 release as we didn't see it happening before then and most of our runs with that build have hit this issue. We're updating our code to request runs with the 0.3.49 build as a workaround.
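
The workaround described above, requesting a specific build, can be sketched as follows. This assumes the caller passes in an `ApifyClient` instance from the `apify-client` Python package and that its `actor(...).call(...)` method accepts a `build` argument. The helper name `run_with_pinned_build` is hypothetical, and the `startUrls` input shape should be checked against the Actor's input schema.

```python
ACTOR_ID = "apify/website-content-crawler"


def run_with_pinned_build(client, start_urls, build="0.3.49"):
    """Start the Actor on a specific build instead of the default one.

    `client` is assumed to be an apify_client.ApifyClient instance.
    `build` defaults to 0.3.49, the version mentioned in this thread.
    """
    # Assumed input shape: a list of {"url": ...} objects under "startUrls".
    run_input = {"startUrls": [{"url": url} for url in start_urls]}
    return client.actor(ACTOR_ID).call(run_input=run_input, build=build)
```

Pinning a build is only a stopgap; once a fixed version is confirmed, dropping the `build` argument returns the run to the default build.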

Oscardz

Yes, you are right. There was a bug in the Request Queue, but it's been fixed in the latest version of the Actor. We apologize for the inconvenience, and you will be reimbursed for the spending accordingly.

reachable_mule

2 months ago

i have also had this issue - and the run consumed a large amount of $ credits. How do I request reimbursement?

jiri.spilka

I apologize for the delay. The issue has been fixed in the latest version, 0.3.52.

As for the reimbursement, I'm currently looking into it. Please allow me some time.

axstv

2 months ago

We're still seeing this issue in 0.3.52 as well, with the same log message: "The request queue hasn't had activity for 300s, resetting internal state"

I assume that this will be reimbursed as well

jiri.spilka

@reachable_mule

"i have also had this issue - and the run consumed a large amount of $ credits. How do I request reimbursement?"

I got the information that it was reimbursed. Note that you need to increase the platform limit (Billing > Platform limit) by the reimbursed amount to make use of the reimbursement.

jiri.spilka

@axstv I've checked your runs and noticed the log message "The request queue hasn't had activity for 300s, resetting internal state", but it occurs only for a limited period of time. Other users faced it for hours.

The bigger issue seems to be with handling sitemaps. You're trying to retrieve two results, but the attempt to fetch the sitemaps is causing a block, and it takes around 15 minutes to actually start scraping.

I've created a new issue to track it, and we'll take a closer look. Please subscribe to the new issue. I'll close this issue, and we will discuss everything in the new one.

In the meantime, if you don’t need to use sitemaps, could you retry with Consider URLs from Sitemaps set to False? I tried this and was able to get all the results in under 60 seconds.
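
For API users, the suggestion above might look like the following input sketch. It assumes `useSitemaps` is the input key behind the "Consider URLs from Sitemaps" option and that `startUrls` takes a list of `{"url": ...}` objects; both should be verified against the Actor's input schema. The helper name `build_run_input` is hypothetical.

```python
import json


def build_run_input(start_urls, use_sitemaps=False):
    """Build an Actor input with sitemap discovery switched off by default.

    "useSitemaps" is an assumed key name for the
    "Consider URLs from Sitemaps" option.
    """
    return {
        "startUrls": [{"url": url} for url in start_urls],
        "useSitemaps": use_sitemaps,
    }


if __name__ == "__main__":
    # Print the JSON that would be sent as the run input.
    print(json.dumps(build_run_input(["https://example.com"]), indent=2))
```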

reachable_mule

2 months ago

Yes I will try that and thank you very much for the amazing support.

Developer
Maintained by Apify

Actor Metrics

  • 3.9k monthly users

  • 711 stars

  • >99% runs succeeded

  • 2.2 days response time

  • Created in Mar 2023

  • Modified 17 days ago