Extended GPT Scraper
Extract data from any website and feed it into GPT via the OpenAI API. Use ChatGPT to proofread content, analyze sentiment, summarize reviews, extract contact details, and much more.
Hello,
Could you have a look at our Actor's log? We have around 3,000 websites to crawl. It goes well at the beginning, but after a while a lot of errors appear when crawling very basic websites. I'm OK with the SSL errors, but not with errors on websites that have no problems.
Could you please fine-tune your script to avoid so many retries on websites that load correctly?
pwoChdGX9BndgCIrJ
Anyone here?
Hi Hosting, thanks for opening this issue!
Yes, I can confirm that this is a bug in the scraper. Exactly as you say, this is a problem for large crawls.
Basically, the crawler automatically scales the request concurrency up and down, but it looks like for large enough crawls with a lot of memory, it periodically scales up and down too rapidly. We will investigate this and find a proper value for the scaling function so that it stays at an optimal speed.
By the way, we will also look into the SSL errors. I think we could adjust the browser to ignore them, because that's most likely just a security concern on the website's side. I can't promise that, though, because it might not be possible to turn that check off in the browser.
I will keep you updated here, thanks!
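To illustrate the scaling problem described in the reply above, here is a minimal simulation sketch. It is not the scraper's actual autoscaling code: the `ConcurrencyScaler` class, the `step_ratio` parameter, and the fixed `capacity` threshold are all hypothetical names and values chosen for illustration. It only shows why a large scaling step overshoots and oscillates, while a small one settles near the sustainable concurrency.

```python
from dataclasses import dataclass


@dataclass
class ConcurrencyScaler:
    """Hypothetical autoscaler (illustration only, not the scraper's code)."""

    min_concurrency: int = 1
    max_concurrency: int = 200
    step_ratio: float = 0.05  # assumed: fraction of current concurrency changed per tick
    current: int = 1

    def tick(self, overloaded: bool) -> int:
        # The step grows with the current concurrency, but is at least 1.
        step = max(1, int(self.current * self.step_ratio))
        if overloaded:
            self.current = max(self.min_concurrency, self.current - step)
        else:
            self.current = min(self.max_concurrency, self.current + step)
        return self.current


def simulate(step_ratio: float, capacity: int = 20, ticks: int = 200) -> int:
    """Return the peak concurrency reached when the system counts as
    overloaded whenever concurrency exceeds `capacity` (an assumed threshold)."""
    scaler = ConcurrencyScaler(step_ratio=step_ratio)
    peak = 0
    for _ in range(ticks):
        overloaded = scaler.current > capacity
        peak = max(peak, scaler.tick(overloaded))
    return peak


if __name__ == "__main__":
    print("aggressive scaling peak:", simulate(step_ratio=1.0))
    print("damped scaling peak:", simulate(step_ratio=0.05))
```

With the aggressive ratio the simulated concurrency doubles past the threshold and then collapses, repeating the cycle. With the small ratio it climbs more slowly but stays within one step of the threshold.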
OK, thank you. Please keep us posted, more large crawls to come.
Hi again, thanks for your patience!
We've just updated the scraper with both fixes :) It will now scale appropriately and ignore HTTPS certificate errors on the broken websites. Note that it will scale up a little more slowly than before, but at least it should no longer overshoot.
Try it out and let me know how it works, thanks!
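For readers who want to reproduce the "ignore HTTPS certificate errors" behaviour in their own scripts, the sketch below shows one way to do it with Python's standard library. This is an assumption-laden illustration, not how the scraper does it (the reply above refers to a browser setting); the helper names `make_lenient_context` and `make_lenient_opener` are hypothetical. Disabling verification removes protection against tampering, so it only makes sense for reading public pages from sites with broken certificates.

```python
import ssl
import urllib.request


def make_lenient_context() -> ssl.SSLContext:
    """Build an SSL context that skips certificate verification."""
    context = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can be set to CERT_NONE.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context


def make_lenient_opener() -> urllib.request.OpenerDirector:
    """Return a URL opener that accepts expired or self-signed certificates."""
    handler = urllib.request.HTTPSHandler(context=make_lenient_context())
    return urllib.request.build_opener(handler)
```

An opener built this way can be used with `opener.open(url, timeout=...)` in place of `urllib.request.urlopen`.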
Actor Metrics
79 monthly users
46 stars
>99% runs succeeded
5.8 days response time
Created in Jun 2023
Modified 6 days ago