🩺 WebMD Doctor Scraper
Pricing: Pay per usage
Efficiently extract detailed doctor profiles, practice locations, and contact details from WebMD. This lightweight actor is optimized for speed and data accuracy. Residential proxies are highly recommended to keep runs stable and reduce blocking.
Developer: Shahid Irfan
Extract physician and healthcare provider information from WebMD. The actor uses Playwright (Firefox) to get past anti-bot challenges, then extracts listings from the embedded `window.__INITIAL_STATE__` JSON first (fast), falling back to HTML parsing when that yields nothing.
Features
- Multiple Data Extraction Methods: Prioritizes embedded JSON state, falls back to intelligent HTML parsing
- JSON State First (Priority 1): Fast extraction by parsing the embedded `window.__INITIAL_STATE__` JSON
- Playwright Firefox: Solves JS/cookie challenges so API/HTML requests return real content
- Comprehensive Data Collection: Extracts names, specialties, contact information, locations, websites, and biographies
- Efficient Pagination: Automatically handles multi-page results with customizable limits
- Optional Detail Extraction: Fetch full provider profiles for in-depth information
- Structured Output: Consistent JSON schema for all extracted data
- High Performance: Concurrent requests with optimized session pool management
- Proxy Support: Full integration with Apify Proxy for reliable operation
Output Schema
Each doctor profile includes the following fields:
| Field | Type | Description |
|---|---|---|
| name | string | Full name of the physician |
| specialty | string | Medical specialty/field |
| phone | string | Contact phone number |
| address | string | Full address (street, city, state, zip) |
| website | string | Provider website or practice URL |
| bio | string | HTML-formatted biography/credentials |
| bio_text | string | Plain-text version of the biography |
| url | string | Direct link to the doctor's profile on WebMD |
| source | string | Data source identifier |
Example Output:
{"name": "Dr. John Smith, MD","specialty": "Family Medicine","phone": "(555) 123-4567","address": "123 Medical Plaza Drive, Springfield, IL 62701","website": "https://www.smithmedical.com","bio": "<p>Dr. Smith is a board-certified family medicine physician with 15 years of experience...</p>","bio_text": "Dr. Smith is a board-certified family medicine physician with 15 years of experience...","url": "https://doctor.webmd.com/providers/[provider-id]","source": "webmd.com"}
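The schema carries the biography twice: `bio` as HTML and `bio_text` as plain text. How the actor derives one from the other is not documented, so the following is a minimal sketch assuming `bio_text` is simply `bio` with tags removed, entities decoded, and whitespace collapsed. It uses only Python's standard library; `DoctorRecord`, `TextCollector`, and `html_to_text` are hypothetical names for illustration.

```python
from html.parser import HTMLParser
from typing import TypedDict


class DoctorRecord(TypedDict):
    """One dataset item, mirroring the Output Schema table (all fields are strings)."""
    name: str
    specialty: str
    phone: str
    address: str
    website: str
    bio: str
    bio_text: str
    url: str
    source: str


# Tags treated as word boundaries so adjacent paragraphs don't run together.
BLOCK_TAGS = {"p", "br", "div", "li", "ul", "ol"}


class TextCollector(HTMLParser):
    """Collects text nodes and drops markup; HTMLParser decodes entities like &amp;."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs) -> None:
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag) -> None:
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(bio_html: str) -> str:
    """Derive a bio_text-style string from a bio HTML fragment (assumed conversion)."""
    collector = TextCollector()
    collector.feed(bio_html)
    collector.close()  # flush any buffered trailing text
    # Collapse runs of whitespace (including block-tag separators) to single spaces.
    return " ".join("".join(collector.parts).split())
```

For the example record, `html_to_text(record["bio"])` reproduces its `bio_text` value. `DoctorRecord` only helps static type checkers; it does not validate data at runtime.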
Quick Start
Basic Usage
The simplest way to get started is to use default settings:
{"specialty": "family-medicine","results_wanted": 50}
This will scrape the first 50 family medicine doctors from WebMD.
Advanced Configuration
For more control over scraping behavior:
{"specialty": "cardiology","location": "New York","results_wanted": 100,"max_pages": 10,"collectDetails": true,"proxyConfiguration": {"useApifyProxy": true}}
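The same input can be sent from code instead of the Console. A minimal sketch using the `apify-client` Python package (`ActorClient.call` and `DatasetClient.iterate_items`); check it against the client version you install. `ACTOR_ID` is a hypothetical placeholder (take the real ID from the actor's Store page), and `build_run_input` / `run_actor` are illustrative helper names, not part of the actor.

```python
from typing import Any

# Hypothetical placeholder: replace with the actor's real ID from the Apify Store.
ACTOR_ID = "<username>/webmd-doctor-scraper"


def build_run_input(specialty: str, location: str = "", results_wanted: int = 50,
                    max_pages: int = 5, collect_details: bool = True) -> dict[str, Any]:
    """Assemble a run input using the parameter names documented in this README."""
    return {
        "specialty": specialty,
        "location": location,
        "results_wanted": results_wanted,
        "max_pages": max_pages,
        "collectDetails": collect_details,
        "proxyConfiguration": {"useApifyProxy": True},
    }


def run_actor(token: str, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Start a run, wait for it to finish, and return its dataset items."""
    # Imported lazily so the input builder works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not start")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

For example, `run_actor(token, build_run_input("cardiology", "New York", 100, 10))` mirrors the Advanced Configuration above.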
Configuration Options
Input Parameters
specialty (string, optional)
- Medical specialty to search for
- Default: `family-medicine`
- Examples: `cardiology`, `dermatology`, `pediatrics`, `neurology`
- Use URL slug format (hyphens between words)
location (string, optional)
- Geographic location filter
- Default: empty (searches nationwide)
- Examples: `New York`, `California`, `Texas`
- Leave empty for all locations
results_wanted (integer, optional)
- Maximum number of doctor profiles to extract
- Default: `50`
- Minimum: `1`
- Maximum recommended: `500`
max_pages (integer, optional)
- Safety limit on the number of listing pages to process
- Default: `5`
- Minimum: `1`
- Each page typically contains 10-20 profiles
collectDetails (boolean, optional)
- Whether to visit individual doctor profiles for complete information
- Default: `true`
- If `false`: returns only profile URLs without detailed data
The extraction strategy is not configurable: the actor always tries the embedded `window.__INITIAL_STATE__` JSON first and falls back to HTML parsing internally. Both behaviors use built-in defaults and are not exposed as actor inputs.
maxConcurrency (integer, optional)
- Maximum number of parallel requests
- Default: `10`
debug (boolean, optional)
- Saves small debug artifacts to the Key-Value Store when a request is blocked or the API JSON is invalid
- Default: `false`
startUrl / startUrls (string/array, optional)
- Custom WebMD search URLs to start from
- Overrides the `specialty` and `location` parameters
- Example: `https://doctor.webmd.com/providers/specialty/family-medicine`
proxyConfiguration (object, optional)
- Proxy settings for requests
- Recommended: use Apify Proxy (`useApifyProxy: true`)
- Improves reliability and helps avoid rate limiting
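The parameters above can be checked client-side before starting a run. The actor's own validation is not documented, so this is a minimal sketch that applies the documented defaults and minimums and lets `startUrl`/`startUrls` override `specialty`/`location`. The slug rule (lowercase, non-alphanumeric runs collapsed to hyphens) is an assumption inferred from the example values; `DEFAULTS`, `to_slug`, and `normalize_input` are hypothetical names.

```python
import re
from typing import Any

# Defaults as listed under Input Parameters.
DEFAULTS: dict[str, Any] = {
    "specialty": "family-medicine",
    "location": "",
    "results_wanted": 50,
    "max_pages": 5,
    "collectDetails": True,
    "maxConcurrency": 10,
    "debug": False,
}


def to_slug(text: str) -> str:
    """'Family Medicine' -> 'family-medicine' (assumed URL slug rule)."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Apply documented defaults and minimums; start URLs replace specialty/location."""
    cfg = {**DEFAULTS, **raw}
    cfg["specialty"] = to_slug(cfg["specialty"])
    for key in ("results_wanted", "max_pages"):
        if int(cfg[key]) < 1:
            raise ValueError(f"{key} must be at least 1")
    # Accept either a single startUrl string or a startUrls list.
    start_urls = raw.get("startUrls") or ([raw["startUrl"]] if raw.get("startUrl") else [])
    cfg.pop("startUrl", None)
    if start_urls:
        cfg["startUrls"] = list(start_urls)
        cfg.pop("specialty")
        cfg.pop("location")
    return cfg
```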
How It Works
Scraping Process
- Initialization: Loads your input configuration
- Listing Extraction (Priority 1): Parses the embedded `window.__INITIAL_STATE__` JSON to extract providers
- Detail Collection (optional): Fetches provider pages for JSON-LD/HTML enrichment
- HTML Fallback (optional): Uses HTML parsing when JSON extraction yields no results
- Storage: Saves all extracted data to the Apify Dataset
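The listing step can be illustrated on a rendered HTML string. This minimal sketch assumes the page contains an assignment of the form `window.__INITIAL_STATE__ = {...};` whose object literal is strict JSON; where providers sit inside that object is not documented here, so the sketch returns the whole state. It uses `json.JSONDecoder.raw_decode`, which parses one JSON value starting at a given index and reports where it ended, so the trailing `;</script>` is ignored. `extract_initial_state` is a hypothetical name.

```python
import json
from typing import Any, Optional

MARKER = "window.__INITIAL_STATE__"


def extract_initial_state(html: str) -> Optional[Any]:
    """Return the JSON object assigned to window.__INITIAL_STATE__, or None."""
    start = html.find(MARKER)
    if start == -1:
        return None  # no embedded state: a caller would fall back to HTML parsing
    # Skip past the marker and the '=' to the first opening brace.
    brace = html.find("{", start + len(MARKER))
    if brace == -1:
        return None
    try:
        # raw_decode parses exactly one JSON value and ignores what follows it.
        state, _end = json.JSONDecoder().raw_decode(html, brace)
    except json.JSONDecodeError:
        return None  # e.g. a JS object literal that is not strict JSON
    return state
```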
Data Extraction Methods
The actor employs a multi-tier approach for maximum data quality:
1. Embedded JSON State (Priority 1)
- Parses `window.__INITIAL_STATE__` for fast, stable listing extraction
- Avoids brittle CSS selector scraping for search pages
2. JSON-LD Extraction (Priority 2)
- Extracts schema.org `Physician` data from provider detail pages (when present)
3. HTML Parsing (Priority 3)
- Intelligent fallback to CSS selectors
- Searches multiple class names and attribute patterns
- Handles variations in page markup
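The JSON-LD tier and the priority ordering can be sketched with the standard library alone. This is an illustration under stated assumptions, not the actor's code: `physician_from_jsonld` reads every `<script type="application/ld+json">` block and returns the first node typed `Physician` (including nodes inside an `@graph` wrapper), and `first_successful` runs any list of extractor callables in priority order. A CSS-selector tier is not reproduced; a hypothetical `parse_with_selectors` function could simply be appended to the list.

```python
import json
from html.parser import HTMLParser
from typing import Any, Callable, Optional


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._inside = False

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag: str) -> None:
        if tag == "script":
            self._inside = False

    def handle_data(self, data: str) -> None:
        if self._inside:
            self.blocks[-1] += data


def is_physician(node: Any) -> bool:
    """True if a JSON-LD node declares the schema.org Physician type."""
    if not isinstance(node, dict):
        return False
    kind = node.get("@type")
    return kind == "Physician" or (isinstance(kind, list) and "Physician" in kind)


def physician_from_jsonld(html: str) -> Optional[dict[str, Any]]:
    """Return the first schema.org Physician node found in JSON-LD, if any."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD: try the next block
        # A block may hold one node, a list of nodes, or an @graph wrapper.
        if isinstance(data, dict):
            nodes = data.get("@graph", [data])
        elif isinstance(data, list):
            nodes = data
        else:
            continue
        for node in nodes:
            if is_physician(node):
                return node
    return None


def first_successful(html: str, tiers: list[Callable[[str], Any]]) -> Any:
    """Run extraction tiers in priority order; return the first non-empty result."""
    for extract in tiers:
        result = extract(html)
        if result:
            return result
    return None
```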
Common Use Cases
Case 1: Regional Doctor Search
Search for all pediatricians in a specific state:
{"specialty": "pediatrics","location": "California","results_wanted": 100,"max_pages": 10}
Case 2: Quick Verification
Get just the profile URLs without details for quick verification:
{"specialty": "dermatology","results_wanted": 25,"collectDetails": false}
Case 3: Comprehensive Research
Extract detailed profiles for all specialists in a region:
{"specialty": "neurology","location": "New York","results_wanted": 200,"max_pages": 20,"collectDetails": true,"proxyConfiguration": {"useApifyProxy": true}}
Best Practices
Optimal Settings
For Small Datasets (< 100 profiles)
- Set `max_pages`: 3-5
- Use `results_wanted`: 50-100
- Enable proxy for reliability
For Large Datasets (100-500+ profiles)
- Set `max_pages`: 10-20
- Use proxy configuration (recommended)
- Increase actor memory if needed
For Production Use
- Always use Apify Proxy (`useApifyProxy: true`)
- Set reasonable `results_wanted` limits
- Monitor actor logs for errors
- Test with small batches first
Performance Tips
- Use Proxies: Apify Proxy reduces rate limiting and improves stability
- Set Realistic Limits: Balance between data completeness and runtime
- Enable Details Selectively: Detail scraping is thorough but slower
- Monitor Resources: Watch memory usage during execution
- Batch Large Requests: Split very large searches into multiple runs
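One way to split a very large search is one run per location, copying every other setting. A minimal sketch; `split_by_location` is a hypothetical helper, and nothing here guarantees that per-location results are disjoint, so merged datasets may still contain duplicate profiles.

```python
from typing import Any


def split_by_location(base_input: dict[str, Any], locations: list[str]) -> list[dict[str, Any]]:
    """Turn one large search into one smaller run input per location.

    Every other setting (specialty, results_wanted, max_pages, proxy, ...) is
    copied unchanged from base_input; nested dicts are shared, not deep-copied.
    """
    return [{**base_input, "location": loc} for loc in locations]
```

Each resulting dict can be submitted as the input of a separate run.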
Troubleshooting
Common Issues
Issue: Limited results returned
- Solution: Increase the `max_pages` value
- Check specialty name spelling
- Verify location parameter format
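Whether `max_pages` is large enough is simple arithmetic: with listing pages typically holding 10-20 profiles, reaching `results_wanted` needs at least ceil(results_wanted / profiles_per_page) pages. The helper names below are hypothetical, and the 10-20 range is the README's typical figure, not a guarantee.

```python
import math


def pages_needed(results_wanted: int, profiles_per_page: int) -> int:
    """Listing pages required to reach results_wanted at a given page size."""
    return math.ceil(results_wanted / profiles_per_page)


def max_pages_range(results_wanted: int) -> tuple[int, int]:
    """(best case, worst case) pages, assuming 20 and 10 profiles per page."""
    return pages_needed(results_wanted, 20), pages_needed(results_wanted, 10)
```

For 200 results this gives 10 to 20 pages, and for the default 50 results it gives 3 to 5, so the default `max_pages: 5` only just covers the worst case.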
Issue: “Just a moment…” / blocked responses
- Solution: Use Apify Proxy (Residential recommended) and reduce `maxConcurrency`
- Enable `debug: true` to store small blocked-response snippets in the Key-Value Store
Issue: Slow performance
- Solution: Reduce `results_wanted` or `max_pages`
- Enable proxies so concurrent requests are less likely to be throttled
- Increase actor memory allocation
Issue: Missing data fields
- Solution: Ensure `collectDetails: true`
- Check whether the WebMD page structure has changed
- Verify proxy connectivity
Issue: Actor timeout
- Solution: Reduce `max_pages` or `results_wanted`
- Increase `requestTimeoutSecs` in actor.json
- Use proxy to improve response times
Output Dataset
All results are saved to an Apify Dataset with the following characteristics:
- Format: JSON
- Records: Individual doctor profiles
- Sorting: By discovery order
- Deduplication: Automatic (unique URLs)
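Deduplication presumably applies within a single run; when merging datasets from several runs (for example, one per location), the same rule can be reapplied client-side. A minimal sketch, assuming the `url` field identifies a profile; `dedupe_by_url` is a hypothetical name, and records without a URL are kept as-is.

```python
from typing import Any, Iterable


def dedupe_by_url(records: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep the first record seen for each profile URL, preserving discovery order."""
    seen: set[str] = set()
    unique: list[dict[str, Any]] = []
    for record in records:
        url = record.get("url")
        if url and url in seen:
            continue  # later duplicate of an already-kept profile
        if url:
            seen.add(url)
        unique.append(record)
    return unique
```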
Accessing Results
Results can be downloaded in multiple formats:
- JSON (native format)
- CSV (for spreadsheet analysis)
- XML (for integration)
- JSONL (for streaming)
Data Quality & Compliance
- Source Verification: All data extracted directly from public WebMD pages
- Rate Limiting: Respects WebMD's terms of service with appropriate delays
- Data Consistency: Validated against schema before storage
- Error Handling: Robust error management with detailed logging
- Privacy: No PII collection beyond publicly available information
Compatibility
- Target: WebMD Doctor Directory (doctor.webmd.com)
- Browser: Required (Playwright Firefox)
- JavaScript: Handles both static and dynamically-loaded content
- Encoding: Full UTF-8 support
Input Template
Save this as INPUT.json for easy reuse:
{"specialty": "family-medicine","location": "","results_wanted": 50,"max_pages": 5,"collectDetails": true}
Rate Limits
- Concurrent Requests: Up to 10 simultaneous connections by default (set via `maxConcurrency`)
- Request Timeout: 60 seconds per request
- Retry Attempts: Up to 3 retries on failure
- Session Pool: Automatic session rotation for reliability
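The actor's internal retry schedule is not documented beyond "up to 3 retries". If you wrap your own calls (for example, starting runs from a script), a comparable policy can be sketched as below; exponential backoff is an illustrative choice, and `with_retries` is a hypothetical helper. The `sleep` parameter is injectable so the delays can be inspected without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], retries: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn; on an exception, retry up to `retries` more times with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ... by default
    raise AssertionError("unreachable")
```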
Version History
v2.0.0 (2025-12-13)
- Complete conversion from jobs scraper to doctor scraper
- WebMD-specific selectors and data extraction
- Enhanced error handling and logging
- Improved pagination logic
- Added JSON-LD extraction support
- Better performance and reliability
Support
For issues, feature requests, or questions, please refer to the Apify documentation or contact support.
Last Updated: December 13, 2025
Scraper Version: 2.0.0
Target Website: WebMD Doctor Directory