Leboncoin Scraper
Deprecated

This Actor is unavailable because the developer has decided to deprecate it.

mcdowell/leboncoin-scraper

Extremely fast scraper that extracts ads from leboncoin.fr

Retrieve specific ads

Closed

BasileDataimo opened this issue
2 years ago

Hello Victor,

I tried to retrieve specific ads using your Actor, as discussed in our first conversation. I passed an array of startUrls like this:

[
  {"url": "https://www.leboncoin.fr/ventes_immobilieres/2198189945.htm"},
  {"url": "https://www.leboncoin.fr/ventes_immobilieres/2201767813.htm"},
  {"url": "https://www.leboncoin.fr/ventes_immobilieres/2205212373.htm"},
  {"url": "https://www.leboncoin.fr/ventes_immobilieres/2197876906.htm"},
  {"url": "https://www.leboncoin.fr/ventes_immobilieres/2197875868.htm"},
  ...
]

But the run timed out after 1 hour. You can see the runs here if you have access:

I can lower the number of URLs if that is the issue, but perhaps you have better insight into the cause of the timeouts?

Best regards,

User avatar

Hi Basile, it appears that some of the ads have been disabled. I will issue an update that skips disabled ads instead of retrying them continually.
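
For readers following along, the idea of skipping disabled ads instead of retrying them can be sketched roughly as below. This is only an illustration in Python, not the Actor's actual code: `AdDisabledError`, `fetch_with_retries`, and the `fetch` callable are hypothetical names, and the real update may work differently.

```python
class AdDisabledError(Exception):
    """Hypothetical marker raised when the site reports an ad as disabled."""


def fetch_with_retries(url, fetch, max_retries=3):
    """Call fetch(url), retrying other failures but never a disabled ad.

    `fetch` is any callable supplied by the caller (an assumption here).
    Returns a (status, payload, attempts) tuple.
    """
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return "ok", fetch(url), attempt
        except AdDisabledError as exc:
            # Retrying cannot help: the ad is gone, so give up immediately.
            return "disabled", str(exc), attempt
        except Exception as exc:
            # Any other failure is treated as transient and retried.
            last_error = exc
    # All attempts failed for a reason other than a disabled ad.
    return "unknown", str(last_error), max_retries
```

The point of the design is that a disabled ad costs one attempt rather than the full retry budget, which is what keeps a run with many dead URLs from hitting the overall timeout.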

User avatar

Hello Basile, I issued the update that skips disabled ads. Please confirm that the scraper works as expected before I close the issue.

Best regards,

BasileDataimo

2 years ago

Hello Victor,

Sorry for the delay. This is better: my last run took 17 minutes instead of hitting the 1-hour timeout: https://console.apify.com/actors/1gq6JJBYQFbM7kbke/runs/Qc0rucuXXh3enWALm#log

But I only got 13 results for the 50 URLs I requested. Looking at the logs, I understand that you detected that some of the ads have been disabled, and that some are in an unknown state (timeout). Am I correct?

# Ad disabled example
2022-10-24T17:43:29.839Z ERROR BasicCrawler: [This ad is disabled] https://www.leboncoin.fr/ventes_immobilieres/2226162047.htm [WebSocket is not open] Cette annonce est désactivée
# Ad timeout example
2022-10-24T17:56:00.552Z ERROR BasicCrawler: Request failed and reached maximum retries. requestHandler timed out after 60 seconds (tCx1fKWrrmlNWEg). {"id":"tCx1fKWrrmlNWEg","url":"https://www.leboncoin.fr/ventes_immobilieres/2226153374.htm","method":"GET","uniqueKey":"https://www.leboncoin.fr/ventes_immobilieres/2226153374.htm"}
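
In the meantime, the two kinds of log line above can be told apart from a downloaded log with something like the following. It is a rough Python sketch that assumes the log keeps exactly the wording shown in these two examples; `classify_log_line` is a made-up helper, not part of the Actor.

```python
import re

# Matches a leboncoin ad URL ending in ".htm", as seen in the log lines.
URL_RE = re.compile(r"https://www\.leboncoin\.fr/\S+?\.htm")


def classify_log_line(line):
    """Return (url, status) for a recognised ERROR line, otherwise None."""
    match = URL_RE.search(line)
    if match is None:
        return None
    if "[This ad is disabled]" in line:
        return match.group(0), "disabled"
    if "reached maximum retries" in line:
        # The request kept failing (e.g. handler timeout): state is unknown.
        return match.group(0), "unknown"
    return None
```

Any line that does not contain an ad URL plus one of the two markers is ignored, so ordinary INFO lines pass through as None.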

If I am right, I need to know whether the missing ads are disabled or timed out. Would it be possible for you to add those ads to the results list with a specific status? For example:

[
    {
        "list_id": 2226162047,
        "status": "disabled",
        "url": "https://www.leboncoin.fr/ventes_immobilieres/2226162047.htm"
    },
    {
        "list_id": 2226153374,
        "status": "unknown",
        "url": "https://www.leboncoin.fr/ventes_immobilieres/2226153374.htm"
    }
]
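
To make the request concrete, here is roughly how such placeholder records could be produced on the client side by comparing the requested URLs with the returned items. It is a Python sketch under two assumptions: every returned item carries a "url" field, and the ad ID is the number just before ".htm". The helper names `list_id_from_url` and `add_missing_ads` are invented for this example.

```python
import re


def list_id_from_url(url):
    """Extract the numeric ad ID from a URL ending in '/<digits>.htm'."""
    match = re.search(r"/(\d+)\.htm$", url)
    return int(match.group(1)) if match else None


def add_missing_ads(requested_urls, results, disabled_urls=()):
    """Return results plus one placeholder per requested URL that is absent.

    URLs listed in `disabled_urls` get status "disabled"; every other
    missing URL gets status "unknown".
    """
    seen = {item["url"] for item in results}
    disabled = set(disabled_urls)
    merged = list(results)  # leave the caller's list untouched
    for url in requested_urls:
        if url in seen:
            continue
        merged.append({
            "list_id": list_id_from_url(url),
            "status": "disabled" if url in disabled else "unknown",
            "url": url,
        })
        seen.add(url)  # avoid duplicate placeholders for repeated URLs
    return merged
```

With this shape, the number of output records always equals the number of distinct requested URLs, so a caller can tell at a glance which ads need to be retried and which can be dropped.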

Best regards,

User avatar

Hi Basile, you are correct in your assessment. I will work on an update that returns the ads with their status ASAP.

BasileDataimo

2 years ago

You're the best, thanks Victor!

User avatar

Thanks. Please confirm that the updated Actor has fixed this issue. Best regards,

BasileDataimo

a year ago

Hello Victor,

Sorry for the delay. I confirm the fix, thank you!

Developer
Maintained by Community