Doctolib

3-day free trial, then $9.00/month. No credit card required now.

anchor/doctolib

Scraping Doctolib is now super easy and cheap! Extract phone numbers, names, contact details, opening hours, images and addresses of medics, doctors, hospitals and more. Best part: you can even customize what info to extract from Doctolib!

Scraping not complete

Closed

serjio opened this issue 4 months ago

Hi! I used your actor from Apify to scrape doctors from Doctolib.fr but ran into a problem: in a browser, this search (https://www.doctolib.fr/medecin-generaliste/france?language=16) returns 81 results, but the scraper returns only 20 (the log says 21, but the first entry is empty). See the attached log. Could you suggest what might be causing this? Or is it due to the site's anti-scraping protection?

serjio

4 months ago

After the fix, the scraper found 30 pages but saved 0 results (timeout error); see the attached log.
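A timeout on a single selector can make a whole page fail if the pageFunction awaits `innerText()` with no fallback. Below is a minimal sketch of a timeout-tolerant field reader, assuming a Playwright-style locator whose `innerText({ timeout })` rejects when the element never appears. The helper name `innerTextOrNull` and the mock locators are hypothetical; this is not necessarily how the actor's own `context.innerTextwrapper` is implemented.

```javascript
// Hypothetical helper: read an element's text, but return null instead of
// throwing when the locator times out, so one missing field does not fail the page.
async function innerTextOrNull(locator, timeoutMs = 6000) {
  try {
    return await locator.innerText({ timeout: timeoutMs });
  } catch (err) {
    // Playwright rejects with a TimeoutError when the element never appears;
    // this sketch treats any failure as "field not found".
    return null;
  }
}

// Mock locators standing in for page.locator(...) so the sketch runs without a browser.
const found = { innerText: async () => 'Dr Jane Doe' };
const missing = {
  innerText: ({ timeout }) =>
    new Promise((_, reject) =>
      setTimeout(() => reject(new Error(`Timeout ${timeout}ms exceeded`)), 10)),
};

(async () => {
  const record = {
    nom: await innerTextOrNull(found),
    tarif: await innerTextOrNull(missing),
  };
  console.log(record); // { nom: 'Dr Jane Doe', tarif: null }
})();
```

With this pattern a doctor page missing, say, the payment section still yields a record with `null` in that field rather than no record at all.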

Anchor (anchor)

4 months ago

Thanks for opening this issue :)

There is one thing you might try: reset the "pageFunction" to its default value, and let me know if this fixes it. I think the cause is that you were upgraded to version 0.5 of the Actor while your last INPUT was kept. Since I changed the pageFunction in that version, your copy needs to be updated as well. Or, if you prefer, here is the JSON you can use as the INPUT:

{ "hideSearchPages": true, "maxPagesPerCrawl": 90, "pageFunction": "async function pageFunction(context) {\n\n let data = {}\n let userData = context.request.userData\n data.url = context.request.url\n data.label = userData.label\n \n if(userData && userData.label === 'doctor'){ \n data.nom = await context.page.locator('#main-content h1').innerText({timeout:6000})\n data.tarif = await context.innerTextwrapper(context,'#payment_means')\n data.horaire_contact = await context.innerTextwrapper(context,'#openings_and_contact')\n data.description = await context.innerTextwrapper(context,'.dl-profile-bio')\n data.specialite = await context.innerTextwrapper(context,'.dl-profile-header-speciality')\n data.expertise = await context.innerTextwrapper(context,'#skills')\n try{\n data.phones = await context.getPhones(data.horaire_contact)\n }catch(e){\n context.log.info('Phones not found',e); \n }\n try{\n data.image = await cont... [trimmed]
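The `pageFunction` field in the INPUT is a plain string of JavaScript source, which is why it appears above with escaped `\n` sequences. A minimal sketch of generating such an INPUT from a real function in Node.js: the field names `hideSearchPages`, `maxPagesPerCrawl` and `pageFunction` come from the JSON above, while the function body is a shortened placeholder, not the actor's default pageFunction.

```javascript
// Placeholder pageFunction: a shortened illustration, NOT the actor's default.
async function pageFunction(context) {
  const data = {};
  const userData = context.request.userData;
  data.url = context.request.url;
  data.label = userData.label;
  return data;
}

// The INPUT stores pageFunction as source text, so serialize it with toString().
const input = {
  hideSearchPages: true,
  maxPagesPerCrawl: 90,
  pageFunction: pageFunction.toString(),
};

// JSON.stringify escapes the newlines inside the function source as \n.
const json = JSON.stringify(input, null, 2);
console.log(json);
```

Generating the field this way avoids hand-escaping quotes and newlines when editing a long pageFunction.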

Anchor (anchor)

4 months ago

Guessing this worked, so I'm closing the issue. Feel free to reopen if necessary.

Developer
Maintained by Community

Actor Metrics

  • 9 monthly users

  • 4 stars

  • >99% runs succeeded

  • Created in Jul 2022

  • Modified 21 days ago

Categories