seloger mass products scraper (by search URL) ⚡

azzouzana/seloger-mass-products-scraper-by-search-url

3 days trial then $25.00/month - No credit card required now

🔥 Very simple! Enter the link to the search page and get the results! ⚡ Quickly extract detailed property information (title, description, photos, energy ratings, price, contacts, transport and more) at low cost, with export to JSON, CSV, HTML, EXCEL...

Start Issue

Closed

xo7 opened this issue a month ago

Hi, it seems that since the last version (0.0.172) the actor crashes at startup with my URL:

https://www.seloger.com/list.htm?projects=2,5&types=2,12,11,1&natures=1,2,4&places=[{%22subDivisions%22:[%2275%22]}]&surface=NaN/45&sort=d_dt_crea&mandatorycommodities=0&enterprise=0&qsVersion=1.0&m=search_refine-redirection-search_results

with log:

2024-12-13T18:27:13.713Z ACTOR: Pulling Docker image of build jZtjIWkrVu7xa6wDJ from repository.
2024-12-13T18:27:14.412Z ACTOR: Creating Docker container.
2024-12-13T18:27:14.511Z ACTOR: Starting Docker container.
2024-12-13T18:27:18.781Z INFO System info {"apifyVersion":"3.2.6","apifyClientVersion":"2.10.0","crawleeVersion":"3.12.1","osType":"Linux","nodeVersion":"v20.18.1"}
2024-12-13T18:27:18.930Z bypassing bot protection... Please be patient :)
2024-12-13T18:28:16.408Z WARN Request: We've encountered a POST Request with a payload. This is fine. Just letting you know that if your requests point to the same URL and differ only in method and payload, you should see the "useExtendedUniqueKey" option of Request constructor.
2024-12-13T18:28:30.094Z pass through....

and it finishes without any result or query. Can you fix that?

thanks
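
The WARN line in the log above is incidental to the crash: Crawlee is pointing out that requests to the same URL are deduplicated by URL alone unless useExtendedUniqueKey is set on the Request, which folds the method and payload into the unique key. A minimal sketch, with a placeholder URL and payload (not the actor's actual request):

import { RequestQueue, Request } from 'crawlee';

const queue = await RequestQueue.open();

// Without useExtendedUniqueKey, two POST requests to the same URL with
// different payloads would be treated as duplicates and only one would run.
await queue.addRequest(new Request({
    url: 'https://example.com/search',        // placeholder URL
    method: 'POST',
    payload: JSON.stringify({ page: 1 }),     // placeholder payload
    useExtendedUniqueKey: true,
}));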

azzouzana

Hi!

checking this...

azzouzana

Well, it seems the Apify FR proxies were flagged and blocked by DataDome. I'll set it up to use my own proxies (which I pay for) within the hour, and later I'll change it so the Actor's users can input their own proxies (something I had avoided so far to keep it as simple as possible for users).
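
A rough sketch of what that proxy switch can look like with the Apify SDK; the proxyUrls input field and the RESIDENTIAL group are assumptions for illustration, not necessarily the actor's actual setup:

import { Actor } from 'apify';

await Actor.init();
const input = await Actor.getInput();

// If the user supplies their own proxy URLs (hypothetical 'proxyUrls' input
// field), use those; otherwise fall back to an Apify proxy configuration.
const proxyConfiguration = input?.proxyUrls?.length
    ? await Actor.createProxyConfiguration({ proxyUrls: input.proxyUrls })
    : await Actor.createProxyConfiguration({ groups: ['RESIDENTIAL'] });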

azzouzana

I have a question: I noticed that you're always trying to scrape all of the 7K listings. Is that needed? (I'm asking to understand your use case)

azzouzana

I'll update it to use proxies from more EU countries, not only France; that should help. Will let you know.

azzouzana

Should be up again! Could you please confirm? And please reach out so we can discuss your specific use case and see whether improvements/adjustments could be made :) My Discord username is @azzouzana

jeremy.xo7

a month ago

I'm trying to fetch new listings with specific filters once a day.

Thanks for your help, I will test it.

azzouzana

I can plan to work on a mode that would definitely help you out: based on the outcome of previous executions, the actor would only scrape new listings and return delisted items.

azzouzana

Hi, I’ve released the Delta Mode feature, which, based on a checkbox input, instructs the actor to return only new or delisted ads since its last run. To use it, please use version 0.1. Test it out with a small listing count first and let me know how it works. Thank you!
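
A minimal sketch of how such a delta comparison can work, assuming each ad has a stable id; the SEEN-ADS store name and the field names are illustrative, not necessarily what the actor uses:

import { Actor } from 'apify';

// currentAds: array of ads scraped from the listing pages in this run
// (assumed shape { id, ... }).
const store = await Actor.openKeyValueStore('SEEN-ADS'); // illustrative store name
const previousIds = new Set((await store.getValue('ids')) ?? []);

const currentIds = new Set(currentAds.map((ad) => ad.id));

const newAds = currentAds.filter((ad) => !previousIds.has(ad.id));
const delistedIds = [...previousIds].filter((id) => !currentIds.has(id));

// Persist the current snapshot for the next run.
await store.setValue('ids', [...currentIds]);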

xo7

a month ago

I tried the new version but it fails with the message: "2024-12-18T04:36:22.055Z Not paying user, only handling first 50 results. To get all results, please subscribe" (can you check it?)

Some points, if this can help you in the future:

  • An easy way to improve performance and avoid "caching" could be an input param to select a date, so the actor only scrapes details for ads published after that date (this can be more efficient than fetching everything); see the sketch after this list.
  • Another quick tip: limit the number of ads to fetch.

Thanks
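
A sketch of the date-cutoff idea from the first point above; the publishedAfter input field and the publicationDate property are hypothetical names for illustration:

import { Actor } from 'apify';

await Actor.init();
const input = await Actor.getInput();

// 'ads' stands for the listings collected from the search pages.
// Only ads published after the cutoff get the expensive detail scrape.
const cutoff = input?.publishedAfter ? new Date(input.publishedAfter) : null;
const adsToScrape = cutoff
    ? ads.filter((ad) => new Date(ad.publicationDate) > cutoff)
    : ads;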

azzouzana

Thanks a lot for the feedback.

I've just pushed an attempt at the isPaying check, please let me know how it goes. (If you're a paid Apify user and you're still facing that, please let me know; it's most likely something with the Apify platform, but I believe it should be good.) Regardless, to test the new monitoring mode, could you please try it with a listing that has fewer than 50 results and let me know your feedback.

  • An input param to select a date and only scrape details for ads published after this date => Thanks, definitely makes sense! Noted!
  • Limit the number of ads to fetch => I previously worked on this but it didn't work well with monitoring mode enabled; I'll have to think about this again. They probably have to be mutually exclusive.

xo7

a month ago

Hi,

I'm progressing in my testing. It seems better, but I encountered an error:

2024-12-20T01:29:01.023Z /usr/src/app/node_modules/@crawlee/core/storages/dataset.js:41
2024-12-20T01:29:01.026Z         throw new Error(`Data item${s}is too large (size: ${bytes} bytes, limit: ${limitBytes} bytes)`);
2024-12-20T01:29:01.028Z               ^
2024-12-20T01:29:01.030Z
2024-12-20T01:29:01.032Z Error: Data item is too large (size: 71529285 bytes, limit: 9436240 bytes)
2024-12-20T01:29:01.034Z     at checkAndSerialize (/usr/src/app/node_modules/@crawlee/core/storages/dataset.js:41:15)
2024-12-20T01:29:01.036Z     at Dataset.pushData (/usr/src/app/node_modules/@crawlee/core/storages/dataset.js:206:29)
2024-12-20T01:29:01.038Z     at Actor.pushData (/usr/src/app/node_modules/apify/actor.js:527:24)
2024-12-20T01:29:01.040Z     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
2024-12-20T01:29:01.043Z     at async file:///usr/src/app/src/main.js:86:5

Can you catch this error and continue the process?

Best
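
A minimal sketch of catching the oversized-item error so the run can continue; the real fix later was to split the output into smaller items, but a guard like this avoids the hard crash:

import { Actor } from 'apify';

try {
    // 'result' stands for whatever aggregated object the run was about to push.
    await Actor.pushData(result);
} catch (err) {
    // Dataset items over ~9 MB are rejected by the API; log and keep going
    // instead of letting the whole run fail.
    console.warn(`Skipping item that could not be pushed: ${err.message}`);
}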

azzouzana

Something with the dataset size limit. Please share the run and I'll check this first thing tomorrow. (Also, did you confirm monitoring is OK with a not-so-large search result set?)

xo7

a month ago

You can find the run here : https://console.apify.com/organization/TzEYl4RGm5rKPyOU5/actors/dqFjeUv7Nrv7lRatk/runs/ZMRSNKZwQFCvDOmOs

Regarding the documentation (if this can help):

The size of the data is limited by the receiving API and therefore pushData() will only allow objects whose JSON representation is smaller than 9MB. When an array is passed, none of the included objects may be larger than 9MB, but the array itself may be of any size.

The function internally chunks the array into separate items and pushes them sequentially. The chunking process is stable (keeps order of data), but it does not provide a transaction safety mechanism. Therefore, in the event of an uploading error (after several automatic retries), the function's Promise will reject and the dataset will be left in a state where some of the items have already been saved to the dataset while other items from the source array were not. To overcome this limitation, the developer may, for example, read the last item saved in the dataset and re-attempt the save of the data from this item onwards to prevent duplicates.

Regarding monitoring mode: with a small base it seems to work as expected.

Thanks

xo7

a month ago

I think this is related, as you now try to send 1 line with everything in "newsAds" (and for a big result set you reach the 9MB limit).

I think you should use the same output as before (1 line per ad) and maybe add a "state": "new" or "state": "delisted" field in the row; this would be more useful for debugging and checking results in the console.
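
A sketch of that output shape, one dataset item per ad with a status field; the arrays and the apify_monitoring_status field name (the one the actor later exposed) are assumptions here, not the actor's actual code:

import { Actor } from 'apify';

// One dataset item per ad keeps every item far below the 9 MB limit, and
// pushData() chunks the array into individual items automatically.
await Actor.pushData([
    ...newAds.map((ad) => ({ ...ad, apify_monitoring_status: 'new' })),
    ...delistedAds.map((ad) => ({ ...ad, apify_monitoring_status: 'delisted' })),
]);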

azzouzana

Thanks for the feedback!

For the size limitation, that's definitely it. Will work on it this weekend.

xo7

23 days ago

Hello, any news?

azzouzana

Hey 👋 I've worked on adjusting the delta mode, and there's a field "apify_monitoring_status" which signifies whether the ad is new or delisted. Could you test it out with a small listing and let me know? Thanks!

xo7

20 days ago

Hey, it seems this works (yay), but I have an issue regarding monitoring mode.

Between each execution, it seems monitoring mode detects all URLs as "new" (and so crawls the whole list). Can you share how you identify an ad as "new"? Can you confirm whether this is based on the permalink without parameters?

Thanks
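
One common way to build such a "seen before" key is to strip the query string and hash from the permalink before comparing; whether the actor does exactly this is not confirmed here, but a sketch:

// Reduce a listing URL to a stable key by dropping query parameters and hash,
// so tracking parameters don't make the same ad look "new" on every run.
const normalizePermalink = (permalink) => {
    const url = new URL(permalink);
    return `${url.origin}${url.pathname}`;
};

// Illustrative URL only:
normalizePermalink('https://www.seloger.com/annonces/achat/appartement/paris/123456.htm?m=search_results');
// => 'https://www.seloger.com/annonces/achat/appartement/paris/123456.htm'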

xo7

20 days ago

If this can help you: it seems monitoring detection occurs after fetching, because at the end I have a dataset output only with "new" and "delisted". Can you update the code to only "deep scrape" ads that are "new"?

Best
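
A sketch of the ordering being requested here: run the delta comparison on the listing results first, then enqueue detail ("deep scrape") requests only for ads classified as new. It reuses the previousIds set and normalizePermalink helper from the earlier sketches and assumes a Crawlee crawler instance and listingAds / delistedAds arrays; none of this is the actor's actual code:

// Compare against the previous run before any detail page is fetched,
// so only genuinely new ads cost a deep scrape.
const newAds = listingAds.filter(
    (ad) => !previousIds.has(normalizePermalink(ad.permalink)),
);

// Only new ads get a detail request enqueued.
await crawler.addRequests(
    newAds.map((ad) => ({ url: ad.permalink, label: 'DETAIL' })),
);

// Delisted ads need no fetching; push them straight to the dataset.
await Actor.pushData(
    delistedAds.map((ad) => ({ ...ad, apify_monitoring_status: 'delisted' })),
);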

azzouzana

Hi!

Thanks for the feedback and for catching that bug, and thanks also for bearing with me :) Definitely worth it. Working on it today. I'll let you know.

xo7

13 days ago

Hi,

Happy new year! I'm coming back to you about the issue (can you fix it?).

Thanks

xo7

13 days ago

It seems OK now... Thanks for your help.

azzouzana

Hi & happy new year!

This has been OK since last week, but I forgot to follow up here. Thanks for your feedback! I'm closing this issue now.

Developer
Maintained by Community

Actor Metrics

  • 6 monthly users

  • 2 stars

  • 99% runs succeeded

  • 1.6 hours response time

  • Created in Jul 2024

  • Modified 20 days ago