This Actor is unavailable because the developer has decided to deprecate it.
Leboncoin Scraper
Extremely fast scraper that extracts ads from leboncoin.fr
Hello Victor,
I tried to retrieve specific ads using your actor, as discussed in our first conversation. I added an array of startUrls like this:
[
  {"url": "https://www.leboncoin.fr/ventes_immobilieres/2198189945.htm"},
  {"url": "https://www.leboncoin.fr/ventes_immobilieres/2201767813.htm"},
  {"url": "https://www.leboncoin.fr/ventes_immobilieres/2205212373.htm"},
  {"url": "https://www.leboncoin.fr/ventes_immobilieres/2197876906.htm"},
  {"url": "https://www.leboncoin.fr/ventes_immobilieres/2197875868.htm"},
  ...
]
But the runs timed out after 1 hour. You can see them here if you have access:
- 2022-10-10 17:34 - https://console.apify.com/actors/1gq6JJBYQFbM7kbke/runs/itbXMsyZ1xxLazbIc#log
- 2022-10-10 17:34 - https://console.apify.com/actors/1gq6JJBYQFbM7kbke/runs/kILUxgEEGj30qu12o#log
I can lower the number of URLs if this is the issue, but perhaps you have better insight into the origin of the timeouts?
Best regards,
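For readers who want to try the "fewer URLs per run" idea mentioned above, here is a minimal Python sketch that builds the startUrls array from ad IDs and splits it into smaller batches. The function name `start_url_batches`, the `batch_size` default, and the idea of running one batch per run are assumptions for illustration, not part of the Actor.

```python
def start_url_batches(ad_ids, category="ventes_immobilieres", batch_size=10):
    """Build startUrls entries from ad IDs and split them into batches.

    Hypothetical helper: the URL pattern is copied from the examples in
    this thread, and batch_size is an arbitrary illustrative default.
    """
    urls = [
        {"url": f"https://www.leboncoin.fr/{category}/{ad_id}.htm"}
        for ad_id in ad_ids
    ]
    # Slice the full list into consecutive chunks of at most batch_size items.
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]
```

Each returned batch could then be passed as the startUrls input of a separate run.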
Hi Basile, It appears that some of the ads have been disabled. I will issue an update that ignores ads that have been disabled instead of continually retrying.
Hello Basile, I issued the update that skips disabled ads. Please confirm that the scraper works as expected before I close the issue.
Best regards,
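The behaviour described in this reply, giving up immediately on a disabled ad instead of retrying it, can be sketched roughly as follows. This is an illustration in Python, not the Actor's actual code: `AdDisabledError`, `fetch_ad`, and `max_retries` are hypothetical names.

```python
from typing import Callable, Optional


class AdDisabledError(Exception):
    """Hypothetical error raised when the site reports that an ad is disabled."""


def fetch_with_retries(
    url: str,
    fetch_ad: Callable[[str], dict],
    max_retries: int = 3,
) -> Optional[dict]:
    """Return the ad data, or None if the ad is disabled or every attempt fails."""
    for _ in range(max_retries):
        try:
            return fetch_ad(url)
        except AdDisabledError:
            # A disabled ad will never succeed, so stop instead of retrying.
            return None
        except Exception:
            # Possibly transient failure (for example a timeout): try again.
            continue
    return None
```

The point of the design is that only failures that might be transient consume retries; a permanent condition ends the request on the first attempt.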
Hello Victor,
Sorry for the delay. This is better: my last run took 17 minutes instead of hitting the 1-hour timeout: https://console.apify.com/actors/1gq6JJBYQFbM7kbke/runs/Qc0rucuXXh3enWALm#log
But I only got 13 results out of the 50 URLs requested. Looking at the logs, I understand that you detected that some of the ads have been disabled, and that some of the ads are in an unknown state (timeout). Am I correct?
# Ad disabled example
2022-10-24T17:43:29.839Z ERROR BasicCrawler: [This ad is disabled] https://www.leboncoin.fr/ventes_immobilieres/2226162047.htm [WebSocket is not open] Cette annonce est désactivée
# Ad timeout example
2022-10-24T17:56:00.552Z ERROR BasicCrawler: Request failed and reached maximum retries. requestHandler timed out after 60 seconds (tCx1fKWrrmlNWEg). {"id":"tCx1fKWrrmlNWEg","url":"https://www.leboncoin.fr/ventes_immobilieres/2226153374.htm","method":"GET","uniqueKey":"https://www.leboncoin.fr/ventes_immobilieres/2226153374.htm"}
If I am right, I need to know whether each missing ad is disabled or timed out. Would it be possible for you to add those ads to the results list with a specific status? For example:
[
  {
    "list_id": 2226162047,
    "status": "disabled",
    "url": "https://www.leboncoin.fr/ventes_immobilieres/2226162047.htm"
  },
  {
    "list_id": 2226153374,
    "status": "unknown",
    "url": "https://www.leboncoin.fr/ventes_immobilieres/2226153374.htm"
  }
]
Best regards,
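Until such a status field exists in the results, the same information can be approximated from the run log. The Python sketch below turns the two kinds of log lines quoted above into records shaped like the requested example. It assumes the log lines keep exactly the format shown in this thread; `status_records` and `record` are hypothetical helper names.

```python
import json
import re

# The ad ID is the number right before ".htm" in the URL.
AD_ID = re.compile(r"/(\d+)\.htm")
# A disabled ad is logged as "[This ad is disabled] <url> ...".
DISABLED = re.compile(r"\[This ad is disabled\]\s+(\S+)")


def record(url, status):
    """Build one result entry in the shape requested in the message above."""
    match = AD_ID.search(url)
    list_id = int(match.group(1)) if match else None
    return {"list_id": list_id, "status": status, "url": url}


def status_records(log_lines):
    """Classify log lines as 'disabled' or 'unknown' (timed out) ads."""
    records = []
    for line in log_lines:
        disabled = DISABLED.search(line)
        if disabled:
            records.append(record(disabled.group(1), "disabled"))
        elif "timed out" in line:
            # Assumption: the failed request is a JSON object ending the line.
            start = line.find("{")
            if start != -1:
                request = json.loads(line[start:])
                records.append(record(request["url"], "unknown"))
    return records
```

Lines that match neither pattern are ignored, so the function can be fed the whole log.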
Hi Basile, you are correct in your assessment. I will work on an update that returns the ads with their status ASAP.
You're the best, thanks Victor!
Thanks. Please confirm that the updated actor has fixed this issue. Best regards
Hello Victor,
Sorry, I confirm the fix. Thank you!