Doctolib

Pricing

$9.00/month + usage

Developed by Anchor

Maintained by Community

Scraping Doctolib is now super easy and cheap! Extract phone numbers, names, contact details, opening hours, images, and addresses of medics, doctors, and hospitals. Best part: you can even customize which information to extract from Doctolib!

1.0 (1)

5

Total users: 157
Monthly users: 12
Runs succeeded: 88%
Issues response: 22 hours
Last modified: 14 days ago

Output only URL?

Closed

alexalexalexalex opened this issue
a month ago

I tried to scrape Doctolib based on the tutorial that you have provided, and unfortunately, no data is collected. If this is a UX issue, please update the tutorial. If this is a scraper issue, please fix that. Let me know, thanks.

Anchor (anchor)

a month ago

Hello, thanks for trying the Doctolib scraper. I am sorry that it did not work out of the box for you. Let me help you with that. Can you provide your logs and Actor run input so that I can reproduce the problem on my side? Thanks.

alexalexalexalex

a month ago

Sorry, I don't understand where to get the logs, so I'll just copy and paste what I have:

alexalexalexalex

a month ago

{
  "hideSearchPages": true,
  "pageFunction": "async function pageFunction(context) {\n\n let data = {}\n let userData = context.request.userData\n data.url = context.request.url\n data.label = userData.label\n \n if(userData && userData.label === 'doctor'){ \n data.nom = await context.page.locator('#main-content h1').innerText({timeout:6000})\n data.tarif = await context.innerTextwrapper(context,'#payment_means')\n data.horaire_contact = await context.innerTextwrapper(context,'#openings_and_contact')\n data.description = await context.innerTextwrapper(context,'.dl-profile-bio')\n data.specialite = await context.innerTextwrapper(context,'.dl-profile-header-speciality')\n data.expertise = await context.innerTextwrapper(context,'#skills')\n try{\n data.phones = await context.getPhones(data.horaire_contact)\n }catch(e){\n context.log.info('Phones not found',e); \n }\n try{\n data.image = await context.page.locator('.dl-profile img').first().getAttribute('src',{timeout:2000})\n if(data.image.startsWith('/')){ data.image = 'https:' + data.image}\n }catch(e){\n context.log.info('Image not found',e); \n } \n \n }else{\n context.log.info('we are not on a doctor page: so a search or pagination page.');\n userData.label = 'doctor';\n const elements = context.page.locator('.search-result-card a[href]');\n const links = await elements.evaluateAll(elems => elems.map(elem => elem.getAttribute('href')));\n let extenstion = 'fr'\n if(context.request.url.includes('doctolib.de')){ extenstion = 'de' }\n if(context.request.url.includes('doctolib.it')){ extenstion = 'it' }\n links.forEach(async link => {\n if(link.startsWith('/')){ link = `https://www.doctolib.${extenstion}${link}` }\n await context.enqueueRequest(link, userData , true);\n })\n\n }\n context.log.info(`ending this page now`);\n delete data.label\n return data;\n}\n",
  "startUrls": [
    { "url": "https://www.doctolib.de/privatklinik/stuttgart/lipoklinik?pid=practice-624868&phs=true&page=1&insurance_sector=public&highlight%5Bspeciality_ids%5D%5B%5D=1297", "method": "GET" },
    { "url": "https://www.doctolib.de/hautarzt/stuttgart/sandra-teufel?pid=practice-228783&phs=true&page=1&index=1&insurance_sector=public", "method": "GET" },
    { "url": "https://www.doctolib.de/plastische-und-asthetische-chirurgie/stuttgart/jens-neumann?pid=practice-561047&phs=true&page=1&index=2&insurance_sector=public", "method": "GET" },
    { "url": "https://www.doctolib.de/plastische-und-asthetische-chirurgie/stuttgart/mirela-anghel-bota?pid=practice-578733&phs=true&page=1&index=3&insurance_sector=public", "method": "GET" }
  ]
}

alexalexalexalex

a month ago

Ah found it!

alexalexalexalex

a month ago

It seems that I cannot upload the log file here.

alexalexalexalex

a month ago

2025-06-04T09:12:08.857Z ACTOR: Pulling Docker image of build aPSuTUe6Z2JFtF9vM from registry.
2025-06-04T09:12:22.108Z ACTOR: Creating Docker container.
2025-06-04T09:12:22.306Z ACTOR: Starting Docker container.
2025-06-04T09:12:22.511Z Starting X virtual framebuffer using: Xvfb :99 -ac -screen 0 1920x1080x24+32 -nolisten tcp
2025-06-04T09:12:22.512Z Executing main command
2025-06-04T09:12:23.655Z INFO System info {"apifyVersion":"2.3.2","apifyClientVersion":"2.9.3","osType":"Linux","nodeVersion":"v16.20.2"}
2025-06-04T09:12:24.117Z INFO FingerprintInjector: Successfully initialized.
2025-06-04T09:12:24.119Z INFO Starting the crawl.
2025-06-04T09:12:24.171Z INFO PlaywrightCrawler:AutoscaledPool: state {"currentConcurrency":0,"desiredConcurrency":2,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":null},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.6,"actualRatio":null},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":null},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":null}}}
2025-06-04T09:12:31.601Z INFO Page opened, url:https://www.doctolib.de/privatklinik/stuttgart/lipoklinik?pid=practice-624868&phs=true&page=1&insurance_sector=public&highlight%5Bspeciality_ids%5D%5B%5D=1297 - label:undefined
2025-06-04T09:12:31.632Z INFO Page opened, url:https://www.doctolib.de/hautarzt/stuttgart/sandra-teufel?pid=practice-228783&phs=true&page=1&index=1&insurance_sector=public - label:undefined
2025-06-04T09:12:31.664Z INFO we are not on a doctor page: so a search or pagination page.
2025-06-04T09:12:31.684Z INFO we are not on a doctor page: so a search or pagination page.
2025-06-04T09:12:31.711Z INFO ending this page now
2025-06-04T09:12:31.781Z INFO ending this page now
2025-06-04T09:12:39.377Z INFO Page opened, url:https://www.doctolib.de/plastische-und-asthetische-chirurgie/stuttgart/mirela-anghel-bota?pid=practice-578733&phs=true&page=1&index=3&insurance_sector=public - label:undefined
2025-06-04T09:12:39.678Z INFO we are not on a doctor page: so a search or pagination page.
2025-06-04T09:12:40.085Z INFO Page opened, url:https://www.doctolib.de/plastische-und-asthetische-chirurgie/stuttgart/jens-neumann?pid=practice-561047&phs=true&page=1&index=2&insurance_sector=public - label:undefined
2025-06-04T09:12:40.166Z INFO ending this page now
2025-06-04T09:12:40.682Z INFO we are not on a doctor page: so a search or pagination page.
2025-06-04T09:12:40.783Z INFO ending this page now
2025-06-04T09:12:40.890Z INFO PlaywrightCrawler: All the requests from request list and/or request queue have been processed, the crawler will shut down.
2025-06-04T09:12:41.625Z INFO PlaywrightCrawler: Final request statistics: {"requestsFinished":4,"requestsFailed":0,"retryHistogram":[4],"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":8099,"requestsFinishedPerMinute":14,"requestsFailedPerMinute":0,"requestTotalDurationMillis":32397,"requestsTotal":4,"crawlerRuntimeMillis":17512}
2025-06-04T09:12:41.625Z INFO Crawl finished.
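
Reading the log: every "Page opened" entry reports label:undefined, and each one is followed by the "we are not on a doctor page" message, so all four start URLs went through the search-page branch of the pageFunction rather than the doctor branch. The following is a minimal sketch for checking this on a longer log. It assumes the log line format shown above; the helper name openedPageLabels is hypothetical and not part of the actor.

```javascript
// Hypothetical helper: collect the URL and label reported by each
// "Page opened" log line. Assumes the line format seen in this run's log.
function openedPageLabels(logText) {
  const pattern = /Page opened, url:(\S+) - label:(\S+)/g;
  const pages = [];
  for (const match of logText.matchAll(pattern)) {
    pages.push({ url: match[1], label: match[2] });
  }
  return pages;
}

// Two lines copied from the log above, used as sample input.
const sampleLog = [
  '2025-06-04T09:12:31.632Z INFO Page opened, url:https://www.doctolib.de/hautarzt/stuttgart/sandra-teufel?pid=practice-228783&phs=true&page=1&index=1&insurance_sector=public - label:undefined',
  '2025-06-04T09:12:31.664Z INFO we are not on a doctor page: so a search or pagination page.',
].join('\n');

const pages = openedPageLabels(sampleLog);
// The label is logged as text, so a missing label shows up as the string "undefined".
const unlabeled = pages.filter((page) => page.label !== 'doctor');
console.log(`${unlabeled.length} of ${pages.length} opened pages lacked the 'doctor' label`);
```

If every opened page lacks the label, the pageFunction never reaches the branch that fills in the doctor fields.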

alexalexalexalex

a month ago

What I do want is a list with all the doctors and all the website domains scraped from the links that I have provided above. Unfortunately, this only works in about 10-15% of the entries. The majority of them don't provide any website link of the individual doctors.
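
For the "website domains" part of this request, a post-processing step could reduce whatever website URLs the scraper returns to a list of unique domains. This is only a sketch under assumptions: it takes a plain array of URL strings (the actual output column name is not stated in this thread), the helper name websiteDomains is hypothetical, and it cannot recover websites that a doctor's profile does not list.

```javascript
// Hypothetical helper: reduce website URLs to unique domains,
// skipping Doctolib's own domains and values that are not valid URLs.
function websiteDomains(urls) {
  const domains = new Set();
  for (const raw of urls) {
    let hostname;
    try {
      hostname = new URL(raw).hostname;
    } catch (err) {
      continue; // Empty or malformed value: nothing to extract.
    }
    const domain = hostname.replace(/^www\./, '');
    // Ignore doctolib.fr, doctolib.de, doctolib.it and their subdomains.
    if (!/(^|\.)doctolib\.[a-z]+$/.test(domain)) {
      domains.add(domain);
    }
  }
  return [...domains];
}

const domains = websiteDomains([
  'https://lipoklinik.com/',
  'https://www.doctolib.de/hautarzt/stuttgart/sandra-teufel',
  '',
]);
console.log(domains);
```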

Anchor (anchor)

a month ago

Let me try to help you. I opened your first link: https://www.doctolib.de/privatklinik/stuttgart/lipoklinik?pid=practice-624868&phs=true&page=1&insurance_sector=public&highlight%5Bspeciality_ids%5D%5B%5D=1297

This does indeed look like a doctor page. Do you want to extract the website "https://lipoklinik.com/"? Do I understand correctly?

Anchor (anchor)

a month ago

I added a new column to the output in version 0.7 this morning; it may fit your need. Also, the input is supposed to contain search URLs, not doctor URLs. If you really want to use doctor URLs, you will need to add "label":"doctor" to the userData of each input URL.
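
A minimal sketch of that second option, assuming the actor accepts a userData object on each start URL entry the way Apify request objects generally do. The helper name labelAsDoctor is hypothetical, and the sample URL is one of those from the input posted earlier in this thread.

```javascript
// Hypothetical helper: mark every start URL as a doctor page so that the
// pageFunction takes its `userData.label === 'doctor'` branch.
// Assumption: the actor reads `userData` from each start URL entry.
function labelAsDoctor(startUrls) {
  return startUrls.map((entry) => ({
    ...entry,
    userData: { ...(entry.userData || {}), label: 'doctor' },
  }));
}

const input = {
  hideSearchPages: true,
  startUrls: [
    {
      url: 'https://www.doctolib.de/hautarzt/stuttgart/sandra-teufel?pid=practice-228783&phs=true&page=1&index=1&insurance_sector=public',
      method: 'GET',
    },
  ],
};

// Build a patched copy of the input; the original object is left unchanged.
const patched = { ...input, startUrls: labelAsDoctor(input.startUrls) };
console.log(JSON.stringify(patched, null, 2));
```

The printed JSON can be pasted into the Actor input in place of the original startUrls. With search URLs as input, no label is needed, because the pageFunction sets it itself when it enqueues the doctor links.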

Anchor (anchor)

22 days ago

Closing this issue. Feel free to reopen it if necessary.

Anchor (anchor)

14 days ago

If you are happy with this Actor, may I ask you to rate it 5 ⭐? I would really appreciate it :)