Leboncoin Scraper

Deprecated

Pricing

$30.00/month + usage

Developed by

Victor McDowell

Maintained by Community

Extremely fast scraper that extracts ads from leboncoin.fr

0.0 (0)

1

Total users

41

Monthly users

1

Last modified

3 years ago

Retrieve specific ads

Closed

BasileDataimo opened this issue
3 years ago

Hello Victor,

I tried to retrieve specific ads using your actor, as discussed in our first conversation. I added an array of startUrls like this:

[
{"url": "https://www.leboncoin.fr/ventes_immobilieres/2198189945.htm"},
{"url": "https://www.leboncoin.fr/ventes_immobilieres/2201767813.htm"},
{"url": "https://www.leboncoin.fr/ventes_immobilieres/2205212373.htm"},
{"url": "https://www.leboncoin.fr/ventes_immobilieres/2197876906.htm"},
{"url": "https://www.leboncoin.fr/ventes_immobilieres/2197875868.htm"},
...
]

But the runs timed out after 1 hour. You can see the runs here if you have access:

I can lower the number of URLs if that is the issue, but perhaps you have better insight into the cause of the timeouts?

Best regards,

mcdowell

Hi Basile, it appears that some of the ads have been disabled. I will issue an update that skips disabled ads instead of retrying them continually.
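
To illustrate the idea of skipping disabled ads instead of retrying them, here is a minimal Python sketch. It is not the actor's actual code: `AdDisabledError`, `fetch_with_retries`, and the `fetch` callable are hypothetical names introduced for illustration, and the retry limit of 3 is an assumption.

```python
class AdDisabledError(Exception):
    """Hypothetical marker for an ad page that reports itself as disabled."""


def fetch_with_retries(url, fetch, max_retries=3):
    """Call fetch(url), retrying transient failures but not disabled ads.

    Returns (status, result), where status is "ok", "disabled" or "failed".
    """
    for attempt in range(1, max_retries + 1):
        try:
            return "ok", fetch(url)
        except AdDisabledError:
            # Retrying cannot bring a disabled ad back, so give up right away.
            return "disabled", None
        except Exception:
            # Any other error is treated as transient until retries run out.
            if attempt == max_retries:
                return "failed", None
    return "failed", None
```

With this shape, a disabled ad costs a single request, while a timeout or network error is retried up to the limit.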

mcdowell

Hello Basile, I issued the update that skips disabled ads. Please confirm that the scraper works as expected before I close the issue.

Best regards,

BasileDataimo

3 years ago

Hello Victor,

Sorry for the delay. This is better: my last run lasted 17 minutes instead of timing out after 1 hour: https://console.apify.com/actors/1gq6JJBYQFbM7kbke/runs/Qc0rucuXXh3enWALm#log

But I only got 13 results for the 50 URLs requested. Looking at the logs, I understand that you detected that some of the ads have been disabled, and that some are in an unknown state (timeout). Am I correct?

# Ad disabled example
2022-10-24T17:43:29.839Z ERROR BasicCrawler: [This ad is disabled] https://www.leboncoin.fr/ventes_immobilieres/2226162047.htm [WebSocket is not open] Cette annonce est désactivée
# Ad timeout example
2022-10-24T17:56:00.552Z ERROR BasicCrawler: Request failed and reached maximum retries. requestHandler timed out after 60 seconds (tCx1fKWrrmlNWEg). {"id":"tCx1fKWrrmlNWEg","url":"https://www.leboncoin.fr/ventes_immobilieres/2226153374.htm","method":"GET","uniqueKey":"https://www.leboncoin.fr/ventes_immobilieres/2226153374.htm"}
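
As a stopgap, the two kinds of log lines above can be told apart from the log text alone. The sketch below is a minimal Python example that assumes the log format stays exactly as shown in these two samples; `classify_log_line` and the status labels are hypothetical names chosen for illustration.

```python
import re

# Matches the first leboncoin ad URL found on a log line.
URL_RE = re.compile(r"https://www\.leboncoin\.fr/\S+?\.htm")


def classify_log_line(line):
    """Return (url, status) for a recognised ERROR line, else None."""
    if "ERROR" not in line:
        return None
    match = URL_RE.search(line)
    if match is None:
        return None
    if "This ad is disabled" in line:
        status = "disabled"
    elif "timed out" in line:
        status = "unknown"
    else:
        return None
    return match.group(0), status
```

Running it over every line of a run log gives one (url, status) pair per failed ad.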

If I am right, I need to know whether the missing ads are disabled or timed out. Would it be possible for you to add those ads to the results list with a specific status? For example:

[
  {
    "list_id": 2226162047,
    "status": "disabled",
    "url": "https://www.leboncoin.fr/ventes_immobilieres/2226162047.htm"
  },
  {
    "list_id": 2226153374,
    "status": "unknown",
    "url": "https://www.leboncoin.fr/ventes_immobilieres/2226153374.htm"
  }
]

Best regards,

mcdowell

Hi Basile, you are correct in your assessment. I will work on an update that returns the ads with their status ASAP.
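
For reference, a record in the shape requested above could be built from an ad URL and a status roughly as follows. This is a minimal Python sketch, not the actor's implementation: `status_record` is a hypothetical helper, and it assumes the ad ID is the numeric part just before `.htm`, as in the URLs in this thread.

```python
import json
import re


def status_record(url, status):
    """Build a {"list_id", "status", "url"} record for one ad URL."""
    match = re.search(r"/(\d+)\.htm$", url)
    if match is None:
        raise ValueError(f"no ad id found in {url!r}")
    return {"list_id": int(match.group(1)), "status": status, "url": url}


if __name__ == "__main__":
    records = [
        status_record(
            "https://www.leboncoin.fr/ventes_immobilieres/2226162047.htm",
            "disabled",
        ),
        status_record(
            "https://www.leboncoin.fr/ventes_immobilieres/2226153374.htm",
            "unknown",
        ),
    ]
    print(json.dumps(records, indent=2))
```

Emitting such records alongside the successful results lets the caller account for every requested URL.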

BasileDataimo

3 years ago

You're the best, thanks Victor!

mcdowell

Thanks. Please confirm that the updated actor has fixed this issue. Best regards

BasileDataimo

3 years ago

Hello Victor,

Sorry, I confirm the fix. Thank you!