Similarweb Scraper

7 days trial then $15.00/month - No credit card required now

epctex/similarweb-scraper

The most comprehensive Similarweb Scraper you will ever find. Obtain data on website popularity and receive it in formats such as JSON, XML, CSV, Excel, or an HTML table.

There was an uncaught exception during the run of the Actor and it was not handled

Closed

world33 opened this issue
4 months ago

Please do not close this issue before properly checking the logs of the run. I encountered this error several times before reaching the monthly budget limit, and I had to resurrect the actor several times. The monthly budget limit has nothing to do with this repeated error. Please check the logs carefully. Thanks

epctex (epctex)

4 months ago

Hey there,

I passed the information to the Engineering Team, and they are investigating the root cause of the problem. Will let you know soon.

Best

epctex (epctex)

4 months ago

Hey again,

Thank you very much for reaching out. It seems the actor starts to bloat after running for a certain amount of time. Can you please try the actor with a smaller input and check how it goes? The actor already has a retry mechanism, so it should be fine to start it with a 1K input.

I'll keep the issue open until it is resolved. Best

world33

4 months ago

Thanks for your reply. I have a dataset of around 13,700 URLs. Do I have to try 1K at a time? Is there any way to clear the bloated actor, or to fine-tune it so it does not bloat on a larger dataset like mine?

epctex (epctex)

4 months ago

Hey again,

A couple of weeks ago, this would have worked. Unfortunately, Similarweb has since integrated a CAPTCHA into their system, and it pops up randomly in front of users. We can avoid it as much as possible by mimicking real users with the help of Residential Proxies, but we do not currently have any other workaround. Of course, the Engineering team is working (and will keep working) on the problem. In the meantime, you could try external proxies such as Brightdata's Unblocker Proxies, which should help in the long run.

The main problem is that these protection services eventually identify the bot account and block the actor. That's why the actor bloats. When the detection happens at the OS level, splitting the actor's run into multiple small chunks works. When it happens at the proxy level, a larger or specialized proxy pool is needed.

I hope this helps. Best
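
For anyone wanting to try the chunking approach described above, here is a minimal Python sketch of splitting a large URL list into batches of 1K. It is only an illustration, not part of the actor: `chunked`, `run_in_batches`, and the `run_batch` callback are hypothetical names, and `run_batch` stands in for whatever you use to start one actor run with a given list of URLs.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(urls: Sequence[str], size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` URLs."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])


def run_in_batches(
    urls: Sequence[str],
    run_batch: Callable[[List[str]], None],  # hypothetical: starts one run
    size: int = 1000,
) -> List[int]:
    """Call `run_batch` once per batch and return the indices that failed."""
    failed: List[int] = []
    for index, batch in enumerate(chunked(urls, size)):
        try:
            run_batch(batch)
        except Exception:
            # Remember the failed batch so it can be retried separately.
            failed.append(index)
    return failed
```

With a dataset of around 13,700 URLs and a batch size of 1,000, this gives 14 separate runs, the last one with 700 URLs. Failed batch indices are returned so that only those batches need to be rerun.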

epctex (epctex)

4 months ago

Closing the issue because it has gone stale, but feel free to open a new one.

Developer
Maintained by Community

Actor Metrics

  • 22 monthly users

  • 3 stars

  • >99% runs succeeded

  • 18 hours response time

  • Created in Oct 2023

  • Modified 2 days ago
