Website Content Crawler
No credit card required
Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
We have multiple crawls that are logging this message over and over again without making any progress.
This is a known issue that we are investigating at the moment. Once it's fixed, I will gladly reimburse the money for those runs. Sorry for the inconvenience; I will keep you posted on the progress of the fix.
Thank you. It seems to be related to the 0.3.50 release, as we didn't see it before then and most of our runs with that build have hit this issue. As a workaround, we're updating our code to request runs with the 0.3.49 build.
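A minimal sketch of that workaround, assuming the official `apify-client` Python package, whose `ActorClient.call()` accepts a `build` argument (a build tag or version number) and waits for the run to finish. The helper names, the start URL, and the `APIFY_TOKEN` environment variable are illustrative assumptions, not part of the thread.

```python
import os

ACTOR_ID = "apify/website-content-crawler"
# Last build reported in this thread as unaffected by the request queue bug.
PINNED_BUILD = "0.3.49"


def build_call_kwargs(start_url: str, build: str = PINNED_BUILD) -> dict:
    """Hypothetical helper: keyword arguments for ActorClient.call() with a pinned build."""
    return {
        "run_input": {"startUrls": [{"url": start_url}]},
        "build": build,  # tag or version number of the Actor build to run
    }


def run_pinned(start_url: str):
    """Start the Actor on the pinned build and wait for it to finish (needs apify-client)."""
    # Imported lazily so build_call_kwargs() works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    return client.actor(ACTOR_ID).call(**build_call_kwargs(start_url))
```

For example, `run_pinned("https://example.com/")` returns the finished run object. Keeping the build in one constant makes it easy to drop the pin (for instance, back to the `"latest"` tag) once a fixed version is released.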
Yes, you are right. There was a bug in the request queue, but it has been fixed in the latest version of the Actor. We apologize for the inconvenience, and you will be reimbursed accordingly.
I have also had this issue, and the run consumed a large amount of credits ($). How do I request reimbursement?
I apologize for the delay. The issue has been fixed in the latest version, 0.3.52.
As for the reimbursement, I'm currently looking into it. Please allow me some time.
We're still seeing this issue in 0.3.52 as well, with the same log message: "The request queue hasn't had activity for 300s, resetting internal state".
I assume this will be reimbursed as well.
@reachable_mule
I've been informed that it was reimbursed. Note that to make use of the reimbursement, you need to increase your platform limit (Billing > Platform limit) by the reimbursed amount.
@axstv I've checked your runs and noticed the log message "The request queue hasn't had activity for 300s, resetting internal state", but in your case it appears only for a limited period of time, whereas other users saw it for hours.
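One way to tell a brief stall from one that lasts hours is to count how often this message repeats in a run's log. A minimal sketch, assuming each repeat marks roughly another 300 s window without queue activity (inferred from the message text, not confirmed in the thread), and assuming the official `apify-client` Python package (`ApifyClient.log(...).get()` and `ApifyClient.run(...).abort()`) plus an `APIFY_TOKEN` environment variable. The helper names and the one-hour threshold are hypothetical.

```python
import os

STALL_MESSAGE = "The request queue hasn't had activity for 300s, resetting internal state"
STALL_INTERVAL_SECS = 300  # taken from the message text itself


def estimate_stall_secs(log_text: str) -> int:
    """Rough estimate of idle time: each occurrence is assumed to mark another 300 s window."""
    return log_text.count(STALL_MESSAGE) * STALL_INTERVAL_SECS


def is_stuck(log_text: str, max_stall_secs: int = 3600) -> bool:
    """True once the estimated idle time reaches the (hypothetical) threshold."""
    return estimate_stall_secs(log_text) >= max_stall_secs


def abort_if_stuck(run_id: str, max_stall_secs: int = 3600) -> bool:
    """Fetch the run log and abort the run if it looks stuck (needs apify-client)."""
    # Imported lazily so the pure helpers above work without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    log_text = client.log(run_id).get() or ""
    if is_stuck(log_text, max_stall_secs):
        client.run(run_id).abort()
        return True
    return False
```

The match is an exact string comparison, so it only works if the log line is worded exactly as quoted in this thread; a short stall like the one described above stays below the threshold, while one lasting hours trips it.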
The bigger issue seems to be the handling of sitemaps. You're trying to retrieve two results, but fetching the sitemaps blocks the run, and it takes around 15 minutes before scraping actually starts.
I've created a new issue to track it, and we'll take a closer look. Please subscribe to the new issue. Let me close this issue, and we will discuss everything in the new one.
In the meantime, if you don't need to use sitemaps, could you retry with "Consider URLs from Sitemaps" set to False?
I tried this and was able to get all the results in under 60 seconds.
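When the crawler is started through the API rather than the Console, the same option has to be set in the input JSON. A minimal sketch, assuming the "Consider URLs from Sitemaps" checkbox maps to a boolean `useSitemaps` input field and start URLs go in `startUrls`; check both names against the Actor's input schema before relying on them. The `crawler_input` helper is hypothetical.

```python
import json


def crawler_input(start_urls: list, use_sitemaps: bool = False) -> dict:
    """Build a Website Content Crawler input with sitemap discovery off by default."""
    return {
        "startUrls": [{"url": url} for url in start_urls],
        # Assumed input field behind the "Consider URLs from Sitemaps" option.
        "useSitemaps": use_sitemaps,
    }


# Print the JSON that would be pasted into the Console or sent as the run input.
print(json.dumps(crawler_input(["https://example.com/"]), indent=2))
```

Defaulting `use_sitemaps` to `False` matches the suggested workaround for small crawls, where sitemap fetching can dominate the run time; large crawls that depend on sitemap discovery would pass `True` explicitly.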
Yes, I will try that. Thank you very much for the amazing support.
Actor Metrics
3.9k monthly users
711 stars
>99% runs succeeded
2.2 days response time
Created in Mar 2023
Modified 17 days ago