Similarweb Quick Scraper

Pay $10.00 for 1,000 results

mscraper/similarweb-quick-scraper

A quick scraper for Similarweb. Get needed data instantly for domains of your choice. Export accumulated data into formats such as HTML, JSON, or Excel.

Failed URLs

Closed

world33 opened this issue
a year ago

Hello,

  1. Are failed URLs inevitable in every actor run because of bad proxies and the strong bot-mitigation protections Similarweb is adopting?
  2. Is there any way to rerun the actor with only the failed URLs once a run has finished? Or is there a way to download the failed URLs as well, so that I can identify them quickly and start a new run with only those? Identifying the failed URLs is very time-consuming otherwise. Maybe the downloaded results could include a field called Status that says Succeeded or Failed for each URL originally added to the input field. At the moment only the succeeded URLs are exported, if I am not wrong. Thanks
mscraper
  1. At the moment, Similarweb only blocks IPs. I recommend trying Oxylabs or Bright Data and passing their proxy string as a custom proxy.
  2. I will save all failed keywords to the key-value store to simplify the process. I can't add them to the dataset because each one would then cost the same as a successful result.

I will let you know here when I add it.
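
For anyone unsure what a "proxy string" looks like, here is a minimal sketch that assembles one in the common `scheme://user:password@host:port` form. The host, port, and credentials are placeholders, not real Oxylabs or Bright Data values, and `build_proxy_url` is a hypothetical helper; check your provider's dashboard for the actual endpoint and the actor's input form for the custom proxy field.

```python
from urllib.parse import quote


def build_proxy_url(username, password, host, port, scheme="http"):
    """Return a proxy URL of the form scheme://user:password@host:port."""
    # Percent-encode the credentials so characters such as '@' or ':'
    # inside them cannot be mistaken for URL delimiters.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"


# Placeholder values for illustration only.
proxy_url = build_proxy_url("my-user", "p@ss:word", "proxy.example.com", 8000)
print(proxy_url)
```

The resulting string is what you would paste into the custom proxy setting.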

mscraper

OK, the actor now saves a failed-websites.json file to the key-value store if any websites fail. It does this as the last step of the run, just before finishing.
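
A rough sketch of how the failed-websites.json file could be used to prepare a retry run and to get the Succeeded/Failed status requested above. It rests on several assumptions that are not confirmed in this thread: that the record is a JSON array of website strings, that its key is literally `failed-websites.json`, and that the actor's input field is called `websites`. The function names are hypothetical as well. The record is read through the Apify API's key-value store record endpoint, using the run's default key-value store ID.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def record_url(store_id, key):
    """URL of a single record in an Apify key-value store."""
    return f"{API_BASE}/key-value-stores/{store_id}/records/{key}"


def fetch_failed_websites(store_id, token, key="failed-websites.json"):
    """Download the failed-websites record (the key name is an assumption)."""
    request = urllib.request.Request(
        record_url(store_id, key),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def status_by_website(submitted, failed):
    """Label every submitted website as Succeeded or Failed."""
    failed_set = set(failed)
    return {
        site: ("Failed" if site in failed_set else "Succeeded")
        for site in submitted
    }


def build_retry_input(original_input, failed, field="websites"):
    """Copy the original input, keeping only the failed websites.

    `field` is a hypothetical input name; use the one from the actor's input.
    """
    retry_input = dict(original_input)
    # dict.fromkeys removes duplicates while preserving order.
    retry_input[field] = list(dict.fromkeys(failed))
    return retry_input
```

With these helpers, something like `build_retry_input(my_input, fetch_failed_websites(store_id, token))` would give the input for a second run that contains only the websites that failed the first time.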

world33

a year ago

Thank you very much!

Developer
Maintained by Community

Actor Metrics

  • 35 monthly users

  • 15 stars

  • >99% runs succeeded

  • Created in Jun 2023

  • Modified 4 months ago
