Leboncoin Scraper

Deprecated

Pricing

$30.00/month + usage


Developed by

Victor McDowell

Maintained by Community

Extremely fast scraper that extracts ads from leboncoin.fr

0.0 (0)

1

Monthly users

1

Last modified

3 years ago

Retrieve specific ads

Closed

BasileDataimo opened this issue
3 years ago

Hello Victor,

I tried to retrieve specific ads using your actor, as discussed in our first conversation. I added an array of startUrls like this:

[
  {"url": "https://www.leboncoin.fr/ventes_immobilieres/2198189945.htm"},
  {"url": "https://www.leboncoin.fr/ventes_immobilieres/2201767813.htm"},
  {"url": "https://www.leboncoin.fr/ventes_immobilieres/2205212373.htm"},
  {"url": "https://www.leboncoin.fr/ventes_immobilieres/2197876906.htm"},
  {"url": "https://www.leboncoin.fr/ventes_immobilieres/2197875868.htm"},
  ...
]
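For reference, an input of this shape can be generated from a list of ad IDs. This is a minimal sketch, assuming the actor accepts `startUrls` as a list of `{"url": ...}` objects as shown above and that every ad lives under the `/ventes_immobilieres/<id>.htm` path; `build_start_urls` is a hypothetical helper, not part of the actor.

```python
import json

# Hypothetical helper: build the actor's `startUrls` input from leboncoin ad IDs.
# Assumes every ad uses the /ventes_immobilieres/<id>.htm path, as in the URLs above.
BASE = "https://www.leboncoin.fr/ventes_immobilieres/{}.htm"


def build_start_urls(ad_ids):
    """Return a list of {"url": ...} objects, one per ad ID, skipping duplicate IDs."""
    seen = set()
    start_urls = []
    for ad_id in ad_ids:
        if ad_id in seen:
            continue  # the same ad requested twice would only cost an extra request
        seen.add(ad_id)
        start_urls.append({"url": BASE.format(ad_id)})
    return start_urls


if __name__ == "__main__":
    ids = [2198189945, 2201767813, 2205212373]
    print(json.dumps({"startUrls": build_start_urls(ids)}, indent=2))
```

Deduplicating the IDs up front is a cheap way to keep the request count down before a run starts.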

But the run timed out after 1 hour. You can see the runs here, if you have access:

I can reduce the number of URLs if that is the problem, but perhaps you have better insight into what is causing the timeouts?
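One way to lower the number of URLs per run is to split the list into batches and start one run per batch, so that a slow or stuck ad only affects its own batch. This is a minimal sketch under that assumption; `batch_start_urls` and the batch size of 10 are hypothetical choices, not actor settings.

```python
import json

# Hypothetical sketch: split a long list of ad URLs into smaller batches so that
# each actor run (and its timeout) covers fewer ads. The batch size is arbitrary.


def batch_start_urls(urls, batch_size=10):
    """Yield successive lists of {"url": ...} objects, each at most batch_size long."""
    if batch_size < 1:
        # Raised on first iteration, since this is a generator.
        raise ValueError("batch_size must be at least 1")
    for i in range(0, len(urls), batch_size):
        yield [{"url": u} for u in urls[i:i + batch_size]]


if __name__ == "__main__":
    urls = [f"https://www.leboncoin.fr/ventes_immobilieres/{n}.htm" for n in range(25)]
    for batch in batch_start_urls(urls, batch_size=10):
        # Each printed object could be used as the input of a separate run.
        print(json.dumps({"startUrls": batch}))
```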

Best regards,

mcdowell

Hi Basile, it appears that some of the ads have been disabled. I will issue an update that skips disabled ads instead of retrying them over and over.
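The idea behind this fix can be illustrated with a plain retry loop that treats a disabled ad as a permanent failure and a timeout as a possibly transient one. This is an illustrative sketch only, not the actor's code; `AdDisabledError`, `fetch_ad`, `scrape_with_retries` and `MAX_RETRIES` are hypothetical names.

```python
# Illustrative sketch only, not the actor's code: a retry loop that gives up
# immediately on a permanent error (ad disabled) instead of retrying it.
# AdDisabledError, fetch_ad, scrape_with_retries and MAX_RETRIES are hypothetical.


class AdDisabledError(Exception):
    """Raised by a fetcher when the page says the ad is disabled."""


MAX_RETRIES = 3


def scrape_with_retries(url, fetch_ad, max_retries=MAX_RETRIES):
    """Return (status, data). Timeouts are retried; disabled ads are not."""
    for _attempt in range(max_retries + 1):  # first try plus max_retries retries
        try:
            return "ok", fetch_ad(url)
        except AdDisabledError:
            # Permanent condition: retrying would only waste time.
            return "disabled", None
        except TimeoutError:
            # Possibly transient: try again until the retries are exhausted.
            continue
    return "unknown", None
```

With this split, a disabled ad costs one request instead of a full series of retries, which is why skipping them can shorten a run considerably.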

mcdowell

Hello Basile, I issued the update that skips disabled ads. Please confirm that the scraper works as expected before I close the issue.

Best regards,

BasileDataimo

3 years ago

Hello Victor,

Sorry for the delay. This is better: my last run took 17 minutes instead of hitting the 1-hour timeout. https://console.apify.com/actors/1gq6JJBYQFbM7kbke/runs/Qc0rucuXXh3enWALm#log

But I only got 13 results out of the 50 URLs requested. Looking at the logs, I understand that you detected that some of the ads have been disabled, and that others are in an unknown state (timeout). Am I correct?

# Ad disabled example
2022-10-24T17:43:29.839Z ERROR BasicCrawler: [This ad is disabled] https://www.leboncoin.fr/ventes_immobilieres/2226162047.htm [WebSocket is not open] Cette annonce est désactivée
# Ad timeout example
2022-10-24T17:56:00.552Z ERROR BasicCrawler: Request failed and reached maximum retries. requestHandler timed out after 60 seconds (tCx1fKWrrmlNWEg). {"id":"tCx1fKWrrmlNWEg","url":"https://www.leboncoin.fr/ventes_immobilieres/2226153374.htm","method":"GET","uniqueKey":"https://www.leboncoin.fr/ventes_immobilieres/2226153374.htm"}
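Until the actor reports statuses itself, the two log formats quoted above are enough to tell the cases apart. This is a minimal sketch, assuming every disabled ad logs `[This ad is disabled] <url>` and every exhausted request logs `reached maximum retries` followed by a JSON object with a `url` field; `classify_log` is a hypothetical helper and other line formats are simply ignored.

```python
import json
import re

# Sketch: classify ads from ERROR log lines in the two formats quoted above.
# Assumes the disabled-ad line contains "[This ad is disabled] <url>" and the
# exhausted-retries line ends with a JSON object containing "url".
DISABLED_RE = re.compile(r"\[This ad is disabled\] (\S+)")
RETRIES_RE = re.compile(r"reached maximum retries.*?(\{.*\})\s*$")


def classify_log(lines):
    """Map ad URL -> "disabled" or "unknown" based on the log lines."""
    statuses = {}
    for line in lines:
        m = DISABLED_RE.search(line)
        if m:
            statuses[m.group(1)] = "disabled"
            continue
        m = RETRIES_RE.search(line)
        if m:
            url = json.loads(m.group(1))["url"]
            # Don't overwrite a definite "disabled" with "unknown".
            statuses.setdefault(url, "unknown")
    return statuses
```

Feeding it the run's log lines (for example `classify_log(open("run.log", encoding="utf-8"))`, with a hypothetical file name) gives one entry per failed URL.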

If I am right, I need to know whether the missing ads are disabled or timed out. Would it be possible for you to add those ads to the results list with a specific status? For example:

[
    {
        "list_id": 2226162047,
        "status": "disabled",
        "url": "https://www.leboncoin.fr/ventes_immobilieres/2226162047.htm"
    },
    {
        "list_id": 2226153374,
        "status": "unknown",
        "url": "https://www.leboncoin.fr/ventes_immobilieres/2226153374.htm"
    }
]
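The requested records can be derived from three lists: the URLs that were requested, the ones that produced results, and the ones reported as disabled. This is a minimal sketch, assuming `list_id` is the number in the ad URL and that the caller already has those lists (for example from the run's dataset and log); `missing_ad_records` and `list_id_from_url` are hypothetical names.

```python
import json
import re

# Sketch only: builds the extra result records requested above for ads that were
# requested but not scraped. Assumes list_id is the number in the ad URL.
LIST_ID_RE = re.compile(r"/(\d+)\.htm$")


def list_id_from_url(url):
    """Extract the numeric ad ID from .../<id>.htm, or None if the URL doesn't match."""
    m = LIST_ID_RE.search(url)
    return int(m.group(1)) if m else None


def missing_ad_records(requested_urls, scraped_urls, disabled_urls):
    """One record per requested-but-missing ad: "disabled" if reported as such, else "unknown"."""
    scraped, disabled = set(scraped_urls), set(disabled_urls)
    records = []
    for url in requested_urls:
        if url in scraped:
            continue  # already present in the normal results
        status = "disabled" if url in disabled else "unknown"
        records.append({"list_id": list_id_from_url(url), "status": status, "url": url})
    return records


if __name__ == "__main__":
    base = "https://www.leboncoin.fr/ventes_immobilieres/{}.htm"
    requested = [base.format(i) for i in (2198189945, 2226162047, 2226153374)]
    extra = missing_ad_records(requested, scraped_urls=requested[:1], disabled_urls=requested[1:2])
    print(json.dumps(extra, indent=4))
```

Treating "not scraped and not reported disabled" as `unknown` keeps every requested URL accounted for, even when a failure was not logged in a recognizable way.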

Best regards,

mcdowell

Hi Basile, your assessment is correct. I will work on an update that returns these ads with their status as soon as possible.

BasileDataimo

3 years ago

You're the best, thanks Victor!

mcdowell

Thanks. Please confirm that the updated actor has fixed this issue. Best regards,

BasileDataimo

2 years ago

Hello Victor,

Sorry, I confirm the fix. Thank you!

Pricing

Pricing model

Rental 

To use this Actor, you have to pay a monthly rental fee to the developer. The rent is subtracted from your prepaid usage every month after the free trial period. You also pay for the Apify platform usage.

Free trial

3 days

Price

$30.00