Similarweb Scraper
7 days trial then $15.00/month - No credit card required now
The most comprehensive Similarweb Scraper you will ever find. Obtain data on website popularity and receive it in formats such as JSON, XML, CSV, Excel, or an HTML table.
Please do not close this issue before properly checking the logs of the run. I encountered this error several times before reaching the monthly budget limit, and I had to resurrect the actor several times. The monthly budget limit has nothing to do with this repeated error. Please check the logs carefully. Thanks
Hey there,
I passed the information to the Engineering Team, and they are investigating the root cause of the problem. Will let you know soon.
Best
Hey again,
Thank you very much for reaching out. It seems like the actor starts to bloat after running for a certain amount of time. Can you please try the actor with a smaller input and check how it goes? The actor already has a retry mechanism, so it should be fine to start it with an input of 1K.
I'll keep the issue open until it is resolved. Best
Thanks for your reply. I have a dataset of around 13,700 URLs. Do I have to try 1K at a time? Is there any way to clear the bloated actor or fine-tune it so it does not bloat for a larger dataset like mine?
Hey again,
A couple of weeks ago, this would have worked. Unfortunately, Similarweb has since integrated a CAPTCHA into their system, and it pops up randomly in front of users. We can avoid it as much as possible by mimicking real users with the help of residential proxies. However, we do not currently have any other way around it. Of course, the Engineering team is working (and will keep working) on the problem. In the meantime, you could try external proxies such as Brightdata's Unblocker Proxies. That should help in the long run.
The main problem is that these protection services start to identify the bot over time and block the actor. That's why the actor bloats up. When the detection happens at the OS level, splitting the actor's run into multiple small chunks works. When it happens at the proxy level, a larger or specialized proxy pool is needed.
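As a rough illustration of splitting a run into smaller chunks, here is a minimal Python sketch that cuts a URL list into batches of 1,000. The `chunked` helper and the placeholder URLs are hypothetical, and how each batch is passed to the actor depends on its input schema, which is not shown in this thread.

```python
from typing import Iterator, List


def chunked(urls: List[str], size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each at most `size` items long."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


# Placeholder data standing in for the real list of about 13,700 URLs.
urls = [f"https://example-{i}.com" for i in range(13_700)]

# Each batch would be submitted as the input of a separate, smaller run.
batches = list(chunked(urls, size=1000))
print(len(batches), len(batches[-1]))  # 14 700
```

With 13,700 URLs this gives 13 full batches plus one final batch of 700, so each run stays small enough to restart on its own if it gets blocked.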
I hope this helps. Best
Closing the issue because it has gone stale, but feel free to open a new one.