🔥 Linkedin Companies & Profiles Bulk Scraper

2 days trial then $29.00/month - No credit card required now


bebity/linkedin-premium-actor

Companies & Profiles LinkedIn scraper. Get comprehensive profiles of individuals and companies based on your keywords and filters. Unleash the power of data! 🌐🔍

Duplicated URLs - not resolved

Open

bronto_vidar opened this issue
5 months ago

Hello, the duplicated URLs issue hasn't been resolved even though it was closed. The linked run had exactly 29k unique, valid URLs as input. We had to kill the run when it reached 31k results and was still going, and the output also contains multiple duplicated URLs.

This issue is becoming really annoying: the run had been going for more than 5 hours, costing us money again, before we had to kill it and start it over, as we have already had to do multiple times in the past...

influential_hoopoe

5 months ago

Hello,

Please contact us at contact@bebity.io with the sample input and results so we can investigate this issue. We have some limitations due to Apify, and we are therefore not able to guarantee 100% that the scraper will fully match your needs. We are currently the only $30 scraper able to scrape LinkedIn profiles and companies with no connection required and no daily limits. If you need 99.9% uptime or very high-quality, high-volume results, please contact us for custom solutions.

bronto_vidar

5 months ago

The run is linked in this issue and contains both the input and the results. The results alone are too big to send by email (~380 MB), so please use the run linked here. Unfortunately, the issue doesn't happen on every run. Retrying the job with exactly the same data (without changing anything) usually helps, but that's not a feasible solution: the original run has to be aborted and run again, which more than doubles the CPU cost.

influential_hoopoe

5 months ago

Thanks. We tried with your input and weren't able to reproduce the issue. To assist you further, could you please let us know how frequently this problem occurs? Any additional details you can provide will help us diagnose and resolve the issue.

bronto_vidar

5 months ago

It just happened again, and this time restarting the original run (the one linked to this issue) didn't fix it. I can't say how often it happens, as we would have to implement monitoring on our side to compare the numbers. We usually spot the issue on longer-running jobs, where the output count is higher than the input URL count and the results contain multiple duplicated URLs. By the way, I had to kill the run twice today, with each instance running for over 5 hours and costing $5 in resources (I just want to point out that this issue alone cost us a third of the monthly rent today).
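The monitoring described above (comparing input URL count with output count and spotting duplicates) can be sketched in a few lines. This is a minimal sketch, assuming the run's results have been exported as a list of dicts and that each result carries its source URL under a `url` key; that field name is a hypothetical placeholder, so adjust `url_key` to whatever the actor's output actually uses.

```python
from collections import Counter
from typing import Iterable


def check_run_output(input_urls: Iterable[str], results: list[dict], url_key: str = "url") -> dict:
    """Compare input URLs with scraped results and report duplicated URLs.

    `url_key` is an assumption: the actor's real output field may differ.
    """
    inputs = set(input_urls)
    # Count how many results were produced for each URL.
    counts = Counter(item.get(url_key) for item in results)
    duplicates = {url: n for url, n in counts.items() if n > 1}
    return {
        "input_count": len(inputs),
        "output_count": len(results),
        "unique_output_count": len(counts),
        "duplicates": duplicates,
        # Output exceeding input, or any duplicate, is the symptom reported here.
        "suspicious": len(results) > len(inputs) or bool(duplicates),
    }


def dedupe_by_url(results: list[dict], url_key: str = "url") -> list[dict]:
    """Keep only the first result for each URL, preserving the original order."""
    seen = set()
    unique = []
    for item in results:
        key = item.get(url_key)
        if key in seen:
            continue
        seen.add(key)
        unique.append(item)
    return unique


if __name__ == "__main__":
    urls = ["a", "b", "c"]
    scraped = [{"url": "a"}, {"url": "b"}, {"url": "a"}, {"url": "c"}]
    print(check_run_output(urls, scraped))
    print(dedupe_by_url(scraped))
```

Deduplicating after the fact doesn't recover the wasted compute, but running `check_run_output` periodically on partial results would flag a runaway run early.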

boundless_hood

5 months ago

Hey, we've got the same issue. Our runs keep going again and again until they time out. We are trying to monitor them so we can set the right timeout based on the volume of inputs, but it has generated a lot of duplicates :/
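Setting the timeout from the input volume, as described above, could look like the following. This is a minimal sketch: `secs_per_url`, `safety_factor`, and `min_secs` are hypothetical parameters, and the per-URL time would have to be measured from your own successful runs (total run time divided by URLs processed), not taken from this thread.

```python
import math


def estimate_timeout_secs(
    input_count: int,
    secs_per_url: float,
    safety_factor: float = 1.5,
    min_secs: int = 300,
) -> int:
    """Return a run timeout (in seconds) scaled to the number of input URLs.

    All parameters are hypothetical tuning knobs, not values from the actor.
    """
    if input_count < 0 or secs_per_url <= 0 or safety_factor < 1:
        raise ValueError("invalid parameters")
    # Expected duration, padded by the safety factor and rounded up.
    estimate = math.ceil(input_count * secs_per_url * safety_factor)
    # Never go below a floor, so tiny inputs still get time to start up.
    return max(min_secs, estimate)


if __name__ == "__main__":
    # Hypothetical: 29,000 URLs at an assumed 0.5 s per URL.
    print(estimate_timeout_secs(29_000, 0.5))
```

A timeout like this only caps the cost of a runaway run; it doesn't prevent the duplicates themselves.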

Developer
Maintained by Community
Actor metrics
  • 231 monthly users
  • 98.5% runs succeeded
  • 15.0 days response time
  • Created in Jul 2023
  • Modified 20 days ago