🩺 WebMD Doctor Scraper

Pricing

Pay per usage

Extract detailed doctor profiles, practice locations, and medical ratings from WebMD. The actor is optimized for speed and data accuracy. Residential proxies are strongly recommended to reduce the chance of being blocked.

Developer

Shahid Irfan

Maintained by Community

WebMD Doctor Scraper

Extract physician and healthcare provider information from WebMD. The actor uses Playwright (Firefox) to get past anti-bot challenges, then extracts listings from the embedded window.__INITIAL_STATE__ JSON first (fast), with HTML parsing as a fallback.

Features

  • Multiple Data Extraction Methods: Tries the embedded JSON state first, then JSON-LD on detail pages, then HTML parsing
  • JSON State First (Priority 1): Fast extraction by parsing the embedded window.__INITIAL_STATE__ JSON
  • Playwright Firefox: Solves JS/cookie challenges so that page requests return real content
  • Comprehensive Data Collection: Extracts names, specialties, contact information, locations, websites, and biographies
  • Efficient Pagination: Automatically handles multi-page results with customizable limits
  • Optional Detail Extraction: Fetch full provider profiles for in-depth information
  • Structured Output: Consistent JSON schema for all extracted data
  • High Performance: Concurrent requests with optimized session pool management
  • Proxy Support: Full integration with Apify Proxy for reliable operation

Output Schema

Each doctor profile includes the following fields:

Field     | Type   | Description
name      | string | Full name of the physician
specialty | string | Medical specialty/field
phone     | string | Contact phone number
address   | string | Full address (street, city, state, zip)
website   | string | Provider website or practice URL
bio       | string | HTML-formatted biography/credentials
bio_text  | string | Plain text version of biography
url       | string | Direct link to doctor's profile on WebMD
source    | string | Data source identifier

Example Output:

{
  "name": "Dr. John Smith, MD",
  "specialty": "Family Medicine",
  "phone": "(555) 123-4567",
  "address": "123 Medical Plaza Drive, Springfield, IL 62701",
  "website": "https://www.smithmedical.com",
  "bio": "<p>Dr. Smith is a board-certified family medicine physician with 15 years of experience...</p>",
  "bio_text": "Dr. Smith is a board-certified family medicine physician with 15 years of experience...",
  "url": "https://doctor.webmd.com/providers/[provider-id]",
  "source": "webmd.com"
}
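The relationship between bio and bio_text, and a quick completeness check against the schema, can be sketched in Python. This is an illustration only, not the actor's own code: the helper names html_to_text and missing_fields are hypothetical, and the actor may derive bio_text differently.

```python
from html.parser import HTMLParser

# Field names taken from the output schema table above.
FIELDS = ("name", "specialty", "phone", "address", "website",
          "bio", "bio_text", "url", "source")


class _TextCollector(HTMLParser):
    """Collects text nodes and ignores tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(html: str) -> str:
    """Strip tags and collapse whitespace (hypothetical bio -> bio_text step)."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()  # flush any buffered text
    return " ".join("".join(collector.parts).split())


def missing_fields(record: dict) -> list[str]:
    """List schema fields that are absent or empty in a record."""
    return [field for field in FIELDS if not record.get(field)]
```

Running missing_fields over a downloaded dataset is a cheap way to spot runs where detail collection did not fill in every field.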

Quick Start

Basic Usage

The simplest way to get started is to use default settings:

{
  "specialty": "family-medicine",
  "results_wanted": 50
}

This will scrape the first 50 family medicine doctors from WebMD.

Advanced Configuration

For more control over scraping behavior:

{
  "specialty": "cardiology",
  "location": "New York",
  "results_wanted": 100,
  "max_pages": 10,
  "collectDetails": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

Configuration Options

Input Parameters

specialty (string, optional)

  • Medical specialty to search for
  • Default: family-medicine
  • Examples: cardiology, dermatology, pediatrics, neurology
  • Use URL slug format (hyphens between words)

location (string, optional)

  • Geographic location filter
  • Default: Empty (searches nationwide)
  • Examples: New York, California, Texas
  • Leave empty for all locations

results_wanted (integer, optional)

  • Maximum number of doctor profiles to extract
  • Default: 50
  • Minimum: 1
  • Maximum recommended: 500

max_pages (integer, optional)

  • Safety limit on number of listing pages to process
  • Default: 5
  • Minimum: 1
  • Each page typically contains 10-20 profiles

collectDetails (boolean, optional)

  • Whether to visit individual doctor profiles for complete information
  • Default: true
  • If false: Returns only profile URLs without detailed data

Note: the extraction order (embedded window.__INITIAL_STATE__ JSON first, HTML fallback second) is a built-in default and is not exposed as an actor input.

maxConcurrency (integer, optional)

  • Maximum number of parallel requests
  • Default: 10

debug (boolean, optional)

  • Saves small debug artifacts to Key-Value Store when blocked or when API JSON is invalid
  • Default: false

startUrl / startUrls (string/array, optional)

  • Custom WebMD search URLs to start from
  • Overrides specialty and location parameters
  • Example: https://doctor.webmd.com/providers/specialty/family-medicine

proxyConfiguration (object, optional)

  • Proxy settings for requests
  • Recommended: Use Apify Proxy (useApifyProxy: true)
  • Improves reliability and helps avoid rate limiting
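To catch input mistakes before starting a run, the documented defaults and minimums can be checked locally. The sketch below is a hypothetical helper, not part of the actor: build_input and to_slug are illustrative names, and the actor's own validation may differ.

```python
import re

# Defaults copied from the parameter list above.
DEFAULTS = {
    "specialty": "family-medicine",
    "location": "",
    "results_wanted": 50,
    "max_pages": 5,
    "collectDetails": True,
    "maxConcurrency": 10,
    "debug": False,
}
# Optional inputs that have no documented default.
OPTIONAL_KEYS = {"startUrl", "startUrls", "proxyConfiguration"}


def to_slug(name: str) -> str:
    """Turn 'Family Medicine' into 'family-medicine' (hyphens between words)."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def build_input(**overrides) -> dict:
    """Merge overrides into the defaults and enforce the documented minimums."""
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unknown input keys: {sorted(unknown)}")
    run_input = {**DEFAULTS, **overrides}
    run_input["specialty"] = to_slug(run_input["specialty"])
    for key in ("results_wanted", "max_pages"):
        if int(run_input[key]) < 1:
            raise ValueError(f"{key} must be at least 1")
    return run_input
```

For example, build_input(specialty="Family Medicine", results_wanted=25) yields an input object with the slug family-medicine and every other parameter at its documented default.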

How It Works

Scraping Process

  1. Initialization: Loads your input configuration
  2. Listing Extraction (Priority 1): Parses embedded window.__INITIAL_STATE__ JSON to extract providers
  3. Detail Collection (optional): Fetches provider pages for JSON-LD/HTML enrichment
  4. HTML Fallback (optional): Uses HTML parsing when JSON extraction yields no results
  5. Storage: Saves all extracted data to the Apify Dataset

Data Extraction Methods

The actor employs a multi-tier approach for maximum data quality:

1. Embedded JSON State (Priority 1)

  • Parses window.__INITIAL_STATE__ for fast, stable listing extraction
  • Avoids brittle CSS selector scraping for search pages
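As a rough illustration of this first tier, the sketch below pulls the object assigned to window.__INITIAL_STATE__ out of raw page HTML. It assumes the assignment is a plain JSON object literal; the shape of the state on WebMD pages is not documented here, so any key names you read from the result are your own assumption.

```python
import json

MARKER = "window.__INITIAL_STATE__"


def extract_initial_state(html: str):
    """Return the object assigned to window.__INITIAL_STATE__, or None."""
    start = html.find(MARKER)
    if start == -1:
        return None
    brace = html.find("{", start)
    if brace == -1:
        return None
    try:
        # raw_decode parses one JSON value and ignores whatever follows it,
        # so the trailing ";" and closing </script> tag do not matter.
        state, _end = json.JSONDecoder().raw_decode(html, brace)
    except json.JSONDecodeError:
        return None
    return state
```

Returning None on any failure lets a caller move on to the next extraction tier instead of raising.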

2. JSON-LD Extraction (Priority 2)

  • Extracts schema.org Physician data from provider detail pages (when present)
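A minimal sketch of this tier, using only the Python standard library: it collects application/ld+json script blocks and keeps items whose @type is Physician. The function name find_physicians is hypothetical, and the sketch does not handle nested @graph containers, which real pages may use.

```python
import json
from html.parser import HTMLParser


class _JsonLdCollector(HTMLParser):
    """Collects the raw text of <script type="application/ld+json"> blocks."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def _is_physician(item) -> bool:
    """True when @type is 'Physician' or a list containing it."""
    if not isinstance(item, dict):
        return False
    types = item.get("@type")
    if isinstance(types, str):
        types = [types]
    return isinstance(types, list) and "Physician" in types


def find_physicians(html: str) -> list[dict]:
    """Return all schema.org Physician objects found in JSON-LD blocks."""
    collector = _JsonLdCollector()
    collector.feed(html)
    collector.close()
    found = []
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the page
        items = data if isinstance(data, list) else [data]
        found.extend(item for item in items if _is_physician(item))
    return found
```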

3. HTML Parsing (Priority 3)

  • Intelligent fallback to CSS selectors
  • Searches multiple class names and attribute patterns
  • Handles variations in page markup
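The multi-tier approach as a whole amounts to "try each method in priority order and keep the first non-empty result". A generic sketch of that control flow follows; the tier labels and the extractor callables are placeholders you would supply, not the actor's internals.

```python
from typing import Callable, Iterable

Extractor = Callable[[str], list]


def extract_with_fallback(html: str, extractors: Iterable[tuple[str, Extractor]]):
    """Run extractors in priority order; return (label, records) of the first hit."""
    for label, extractor in extractors:
        try:
            records = extractor(html)
        except Exception:
            # A failing tier is treated like an empty one so the next tier runs.
            records = []
        if records:
            return label, records
    return None, []
```

Returning the label alongside the records makes it easy to log which tier produced each page's data, which helps when a site change silently pushes everything onto the HTML fallback.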

Common Use Cases

Case 1: Specialty Search by State

Search for all pediatricians in a specific state:

{
  "specialty": "pediatrics",
  "location": "California",
  "results_wanted": 100,
  "max_pages": 10
}

Case 2: Quick Verification

Get just the profile URLs without details for quick verification:

{
  "specialty": "dermatology",
  "results_wanted": 25,
  "collectDetails": false
}

Case 3: Comprehensive Research

Extract detailed profiles for all specialists in a region:

{
  "specialty": "neurology",
  "location": "New York",
  "results_wanted": 200,
  "max_pages": 20,
  "collectDetails": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

Best Practices

Optimal Settings

For Small Datasets (< 100 profiles)

  • Set max_pages: 3-5
  • Use results_wanted: 50-100
  • Enable proxy for reliability

For Large Datasets (100-500+ profiles)

  • Set max_pages: 10-20
  • Use proxy configuration (recommended)
  • Increase actor memory if needed

For Production Use

  • Always use Apify Proxy (useApifyProxy: true)
  • Set reasonable results_wanted limits
  • Monitor actor logs for errors
  • Test with small batches first
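Since each listing page typically holds 10-20 profiles, a safe max_pages value can be estimated from results_wanted. The helper below is a hypothetical sketch that assumes the lower bound of 10 profiles per page unless told otherwise.

```python
import math


def estimate_max_pages(results_wanted: int, per_page: int = 10) -> int:
    """Pages needed to reach results_wanted at per_page profiles per page."""
    if results_wanted < 1 or per_page < 1:
        raise ValueError("results_wanted and per_page must be at least 1")
    return math.ceil(results_wanted / per_page)
```

With the conservative default, 50 results map to 5 pages and 200 results to 20 pages, which matches the pairings used in the examples in this document.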

Performance Tips

  1. Use Proxies: Apify Proxy helps avoid rate limiting and improves stability
  2. Set Realistic Limits: Balance between data completeness and runtime
  3. Enable Details Selectively: Detail scraping is thorough but slower
  4. Monitor Resources: Watch memory usage during execution
  5. Batch Large Requests: Split very large searches into multiple runs
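One way to batch a large search is to generate one input object per location and start a separate run for each. Splitting by location is an assumption made for this sketch; batch_inputs is a hypothetical helper, and other splits (for example by specialty) work the same way.

```python
def batch_inputs(specialty: str, locations: list[str],
                 results_wanted: int = 100, max_pages: int = 10) -> list[dict]:
    """Build one actor input per location so each run stays small."""
    return [
        {
            "specialty": specialty,
            "location": location,
            "results_wanted": results_wanted,
            "max_pages": max_pages,
            "proxyConfiguration": {"useApifyProxy": True},
        }
        for location in locations
    ]
```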

Troubleshooting

Common Issues

Issue: Limited results returned

  • Solution: Increase max_pages value
  • Check specialty name spelling
  • Verify location parameter format

Issue: “Just a moment…” / blocked responses

  • Solution: Use Apify Proxy (Residential recommended) and reduce maxConcurrency
  • Enable debug: true to store small blocked-response snippets in Key-Value Store
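When inspecting the stored snippets, a simple title check is often enough to tell a challenge page from real content. The sketch below is an assumption about what a blocked response looks like, based only on the "Just a moment" title quoted above; the marker list is illustrative and may need extending.

```python
import re

# Lower-case title fragments that suggest an anti-bot interstitial (assumed).
BLOCK_MARKERS = ("just a moment",)


def looks_blocked(html: str) -> bool:
    """Return True when the page title matches a known challenge marker."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    title = match.group(1).strip().lower() if match else ""
    return any(marker in title for marker in BLOCK_MARKERS)
```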

Issue: Slow performance

  • Solution: Reduce results_wanted or max_pages
  • Enable a proxy so that concurrent requests are less likely to be throttled
  • Increase actor memory allocation

Issue: Missing data fields

  • Solution: Ensure collectDetails: true
  • Check if WebMD page structure changed
  • Verify proxy connectivity

Issue: Actor timeout

  • Solution: Reduce max_pages or results_wanted
  • Increase requestTimeoutSecs in actor.json
  • Use proxy to improve response times

Output Dataset

All results are saved to an Apify Dataset with the following characteristics:

  • Format: JSON
  • Records: Individual doctor profiles
  • Sorting: By discovery order
  • Deduplication: Automatic (unique URLs)
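If you merge datasets from several runs, the same URL-based deduplication can be repeated locally. This is a sketch of the idea, not the actor's implementation; it keeps the first record seen for each url and leaves records without a url untouched.

```python
def dedupe_by_url(records: list[dict]) -> list[dict]:
    """Keep the first record per url, preserving discovery order."""
    seen: set[str] = set()
    unique = []
    for record in records:
        url = record.get("url")
        if url:
            if url in seen:
                continue  # later duplicate of an already kept profile
            seen.add(url)
        unique.append(record)
    return unique
```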

Accessing Results

Results can be downloaded in multiple formats:

  • JSON (native format)
  • CSV (for spreadsheet analysis)
  • XML (for integration)
  • JSONL (for streaming)
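If you only have the JSON export on disk, CSV and JSONL can also be produced locally with the Python standard library. The helper names below are hypothetical, and the column list is whatever subset of the schema you choose.

```python
import csv
import io
import json


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)


def to_csv(records: list[dict], fields: list[str]) -> str:
    """Serialize selected fields as CSV; missing values become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Leaving the HTML-formatted bio column out of the CSV field list and keeping bio_text usually gives a cleaner spreadsheet.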

Data Quality & Compliance

  • Source Verification: All data extracted directly from public WebMD pages
  • Rate Limiting: Respects WebMD's terms of service with appropriate delays
  • Data Consistency: Validated against schema before storage
  • Error Handling: Robust error management with detailed logging
  • Privacy: No PII collection beyond publicly available information

Compatibility

  • Target: WebMD Doctor Directory (doctor.webmd.com)
  • Browser: Required (Playwright Firefox)
  • JavaScript: Handles both static and dynamically-loaded content
  • Encoding: Full UTF-8 support

Input Template

Save this as INPUT.json for easy reuse:

{
  "specialty": "family-medicine",
  "location": "",
  "results_wanted": 50,
  "max_pages": 5,
  "collectDetails": true
}

Rate Limits

  • Concurrent Requests: 10 simultaneous connections by default (see maxConcurrency)
  • Request Timeout: 60 seconds per request
  • Retry Attempts: Up to 3 retries on failure
  • Session Pool: Automatic session rotation for reliability
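The retry limit above can be pictured with a small generic wrapper. This is a sketch under assumptions: the exponential backoff and the with_retries name are illustrative, and the actor's actual retry timing is not documented here.

```python
import time


def with_retries(task, retries: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Call task(); on failure retry up to `retries` times with backoff."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            if attempt >= retries:
                raise  # retries exhausted, surface the last error
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s by default
            attempt += 1
```

Passing sleep as a parameter keeps the wrapper easy to test without real delays.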

Version History

v2.0.0 (2025-12-13)

  • Complete conversion from jobs scraper to doctor scraper
  • WebMD-specific selectors and data extraction
  • Enhanced error handling and logging
  • Improved pagination logic
  • Added JSON-LD extraction support
  • Better performance and reliability

Support

For issues, feature requests, or questions, please refer to the Apify documentation or contact support.


Last Updated: December 13, 2025
Scraper Version: 2.0.0
Target Website: WebMD Doctor Directory