Extended GPT Scraper

drobnikj/extended-gpt-scraper
Extract data from any website and feed it into GPT via the OpenAI API. Use ChatGPT to proofread content, analyze sentiment, summarize reviews, extract contact details, and much more.

Some domains are disappearing from the results

Closed

cooperative_bureau opened this issue
a month ago

Hi, I am uploading a list of domains to process. After the task completes, I only get results for part of the domains and the rest are “lost”. Moreover, when I load the unprocessed ones again in the next task, they are processed.

Here is an example of one domain that was missing from the results twice and was only processed on the third attempt: https://console.apify.com/actors/tasks/t12CcmeQfXMszMbIZ/runs/wtoXCHjOZ1aNSAbAq#output

Thanks!

cooperative_bureau

a month ago

Would it be possible to include records in the results for domains that could not be processed? Personally, that would solve my problem =)
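
One possible way to find the “lost” domains after a run is to compare the input list against the URLs that appear in the results. The following is a minimal sketch, not part of the Actor: the function names and the `url` field name are assumptions for illustration, and the result items are assumed to be plain dictionaries exported from the run's output.

```python
from urllib.parse import urlparse


def normalize_domain(value: str) -> str:
    """Reduce a URL or bare domain to a lowercase hostname without 'www.'."""
    value = value.strip().lower()
    # urlparse only recognizes the host when a scheme is present
    if "://" not in value:
        value = "http://" + value
    host = urlparse(value).hostname or ""
    return host[4:] if host.startswith("www.") else host


def find_missing_domains(input_domains, result_items, url_field="url"):
    """Return the input domains that have no matching item in the results.

    `url_field` is an assumed field name; adjust it to the actual output schema.
    """
    seen = {
        normalize_domain(item[url_field])
        for item in result_items
        if item.get(url_field)
    }
    return [d for d in input_domains if normalize_domain(d) not in seen]


if __name__ == "__main__":
    inputs = ["example.com", "https://www.example.org/", "example.net"]
    results = [{"url": "https://example.com/about"}, {"url": "http://example.org"}]
    print(find_missing_domains(inputs, results))
```

The returned list can then be loaded into the next task instead of re-running the whole input.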


cooperative_bureau

a month ago

If I now re-load these 182 failed domains into the job, half of them will be successfully processed.

cooperative_bureau

a month ago

That makes 1600 domains that I am processing for the third time, and they need at least one more pass.

lukas.prusa

Hi, thanks for opening this issue!

Yes, I agree adding this would make sense :) From what I understand, two things here would improve this situation:

  • Increasing the max request retries setting, or allowing it to be changed (currently it's just 3)
  • Pushing failed items to the output

Both of these would make sense :) We will investigate and discuss it with the team. I will keep you updated here, thanks!
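
The two ideas above can be sketched roughly as follows. This is an illustrative sketch only, not the Actor's actual code: `process_all`, the `fetch` callable, and the record fields (`status`, `error`, `attempts`) are hypothetical names. The default of 3 retries mirrors the value mentioned above.

```python
def process_all(urls, fetch, max_retries=3):
    """Process each URL with a configurable retry limit.

    Every URL yields exactly one output record, so failures are
    visible in the output instead of silently disappearing.
    """
    output = []
    total_attempts = 1 + max_retries  # first try plus the retries
    for url in urls:
        last_error = None
        for _ in range(total_attempts):
            try:
                data = fetch(url)
            except Exception as exc:  # broad on purpose for this sketch
                last_error = str(exc)
                continue
            output.append({"url": url, "status": "ok", "data": data})
            break
        else:
            # All attempts failed: push a failure record to the output.
            output.append({
                "url": url,
                "status": "failed",
                "error": last_error,
                "attempts": total_attempts,
            })
    return output
```

With records like these, the failed domains can be filtered by `status` and re-submitted, rather than being worked out by hand.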

cooperative_bureau

a month ago

Lukáš, thank you. I'll hold off for now and wait for your message =)

paja

Hi again, just a little update - this Actor is not really our top priority at the moment, but your suggestions are really good so we will implement them eventually. Just be patient with us, please :) We'll let you know once it's done!

lukas.prusa

Hi again, I'm happy to inform you that we've just released an update to the scraper :)

  • All failed pages will now be pushed to output
  • We've also improved the handling of the GPT requests, though don't expect anything substantial. I noticed that your run was hitting rate limiting from OpenAI. This update should help with that a little, although it's still not a full rate limit management solution.
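
For context, a common client-side way to cope with rate limiting is to retry with exponential backoff and jitter. The sketch below is a generic illustration, not what the scraper does internally: `RateLimitError` is a hypothetical stand-in for whatever error the API client raises on a "too many requests" response, and `call_with_backoff` and its parameters are assumed names.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a client's 'too many requests' error."""


def call_with_backoff(call, max_attempts=5, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Call `call()`, retrying on RateLimitError with exponential backoff.

    The delay doubles after each failed attempt (capped at `max_delay`),
    plus random jitter so parallel workers do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.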

Let me know how it works now, thanks and happy scraping!

Developer
Maintained by Apify

Actor Metrics

  • 79 monthly users

  • 46 stars

  • >99% runs succeeded

  • 5.8 days response time

  • Created in Jun 2023

  • Modified 6 days ago