CMS Hospital Price Transparency Scraper
Pricing
Pay per event
Extract hospital standard charges from CMS-mandated machine-readable files (MRF). Parses CMS v1/v2/v3 JSON schemas into rows by billing code (CPT, HCPCS, MS-DRG, NDC) and payer/plan. Fetches hospital identity from the CMS enrollment dataset. Filter by state, CCN, code type, billing code, or payer.
Developer
BowTiedRaccoon
Maintained by Community
Scrapes hospital standard charges from machine-readable files mandated by the CMS Hospital Price Transparency rule. Parses CMS JSON schemas (v1, v2, v3) into structured rows by billing code — CPT, HCPCS, MS-DRG, Revenue Code, NDC — payer, and plan. Also retrieves hospital identity data for ~6,000 US hospitals from the CMS enrollment dataset.
CMS Hospital Price Transparency Scraper Features
- Parses CMS JSON MRF files (v1/v2 flat schema and v3 nested schema) from any user-supplied URL
- Auto-detects schema version — no configuration needed
- Returns negotiated rates by payer and plan: dollar amounts, percentages, and algorithm descriptions
- Returns gross charges, cash/self-pay prices, and de-identified min/max rates
- Fetches hospital identity records (CCN, NPI, address) from the CMS enrollment dataset
- Filters by billing code type (CPT, HCPCS, MS-DRG, RC, NDC), specific billing code, payer name substring, or state
- Three modes: `mrf_parse` for a single file URL, `hospital_list` for enrollment data, `discover_and_parse` for the combined pipeline
- No proxy required: CMS and GitHub are open public APIs
Who Uses CMS Hospital Price Transparency Data?
- Healthcare price-comparison startups — Build comparison tools on top of actual negotiated rates across hospitals and payers
- Employers and self-funded health plans — Compare in-network rates to negotiate better contracts or choose preferred networks
- Benefits consultants and brokers — Analyze payer/plan rate variation for clients by procedure code
- Journalists and researchers — Track compliance, investigate pricing disparities, publish hospital cost analyses
- Healthcare data vendors — Supplement CMS enrollment records with charge data to build comprehensive hospital intelligence datasets
- Academic institutions — Study pricing patterns across regions, facility types, and payer mixes
How CMS Hospital Price Transparency Scraper Works
- Pick a mode. `mrf_parse` takes a single MRF URL and parses it. `hospital_list` pages through the CMS enrollment dataset and returns hospital identity records. `discover_and_parse` does both.
- For MRF parsing, the scraper fetches the JSON file and auto-detects whether it uses the v3 nested schema (with `standard_charges[]` and `payers_information[]`) or the v1/v2 flat schema. It handles both without any configuration.
- Apply optional filters (billing code type, specific code, payer name substring). The scraper applies them during parsing, so only matching rows reach the output.
- Results land in the Apify dataset as structured JSON: one row per payer/plan combination per billing code per service setting, which is as granular as the CMS standard requires.
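The auto-detection step can be pictured with a small sketch. This is not the scraper's actual code. It assumes the nested layout can be recognized by finding any object whose `standard_charges[]` entries carry a `payers_information[]` list, and treats everything else as flat. The function names and the sample documents, including the `standard_charge_information` wrapper key, are hypothetical.

```python
from typing import Any


def is_nested_schema(node: Any) -> bool:
    """Return True if any object in the tree has a standard_charges[] list
    whose entries carry a payers_information[] list."""
    if isinstance(node, dict):
        charges = node.get("standard_charges")
        if isinstance(charges, list) and any(
            isinstance(c, dict) and isinstance(c.get("payers_information"), list)
            for c in charges
        ):
            return True
        # Keep searching deeper in case the marker keys sit below this level.
        return any(is_nested_schema(value) for value in node.values())
    if isinstance(node, list):
        return any(is_nested_schema(item) for item in node)
    return False


def detect_layout(mrf: Any) -> str:
    """Classify a parsed MRF document as 'nested' or 'flat' (assumed labels)."""
    return "nested" if is_nested_schema(mrf) else "flat"


# Hypothetical sample documents, for illustration only.
nested_doc = {
    "hospital_name": "EXAMPLE REGIONAL MEDICAL CENTER",
    "standard_charge_information": [
        {
            "description": "MRI Brain without contrast",
            "standard_charges": [
                {
                    "setting": "outpatient",
                    "payers_information": [{"payer_name": "Aetna"}],
                }
            ],
        }
    ],
}
flat_doc = [{"code": "70551", "payer": "Aetna", "standard_charge": 1240}]

print(detect_layout(nested_doc))
print(detect_layout(flat_doc))
```

A recursive search like this loads the whole document into memory, so it suits files that fit in RAM. Very large files would need a streaming approach.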
Input
```json
{
  "mode": "mrf_parse",
  "mrfUrl": "https://example-hospital.com/standard-charges.json",
  "billingCodeType": "CPT",
  "maxItems": 1000,
  "sp_intended_usage": "Rate comparison for employer health plan negotiation",
  "sp_improvement_suggestions": "None"
}
```
| Field | Type | Default | Description |
|---|---|---|---|
| `mode` | string | `mrf_parse` | `mrf_parse` parses a single MRF URL. `hospital_list` returns CMS enrollment records. `discover_and_parse` combines both. |
| `mrfUrl` | string | - | MRF JSON URL to fetch and parse. Required for `mrf_parse` mode. |
| `stateFilter` | string | - | Two-letter state code (e.g. CA, TX). Filters hospital list results. |
| `hospitalCcn` | string | - | CMS Certification Number to filter to a single hospital. |
| `billingCodeType` | string | - | Filter by code system: CPT, HCPCS, MS-DRG, APR-DRG, RC, NDC, or Internal. Leave blank for all. |
| `billingCode` | string | - | Specific billing code to filter (e.g. 70551). Leave blank for all. |
| `payerFilter` | string | - | Case-insensitive payer name substring filter. |
| `maxItems` | integer | 15 | Maximum records to return. 0 = unlimited. |
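To make the filter semantics concrete, here is a minimal sketch of how the three row filters and `maxItems` could combine. It is an illustration, not the scraper's implementation. The `filter_rows` function and the sample rows are hypothetical, and it assumes that code type and billing code match exactly, while the payer filter is a case-insensitive substring match as described in the table.

```python
from typing import Any, Dict, Iterable, Iterator, Optional

Row = Dict[str, Any]


def filter_rows(
    rows: Iterable[Row],
    billing_code_type: Optional[str] = None,
    billing_code: Optional[str] = None,
    payer_filter: Optional[str] = None,
    max_items: int = 0,
) -> Iterator[Row]:
    """Yield rows that pass every filter that is set; stop after max_items (0 = unlimited)."""
    emitted = 0
    for row in rows:
        # Assumption: exact match on code system and billing code.
        if billing_code_type and row.get("billing_code_type") != billing_code_type:
            continue
        if billing_code and row.get("billing_code") != billing_code:
            continue
        # Case-insensitive substring match on the payer name.
        if payer_filter and payer_filter.lower() not in (row.get("payer_name") or "").lower():
            continue
        yield row
        emitted += 1
        if max_items and emitted >= max_items:
            return


# Hypothetical rows using the output field names.
sample_rows = [
    {"billing_code_type": "CPT", "billing_code": "70551", "payer_name": "Aetna"},
    {"billing_code_type": "CPT", "billing_code": "70551", "payer_name": "Cigna"},
    {"billing_code_type": "MS-DRG", "billing_code": "470", "payer_name": "Aetna Better Health"},
]

for row in filter_rows(sample_rows, billing_code_type="CPT", payer_filter="AETNA"):
    print(row)
```

Because the function is a generator, rows that fail a filter are dropped as they are read, which mirrors the idea of filtering during parsing rather than afterwards.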
Hospital List Mode Input
```json
{
  "mode": "hospital_list",
  "stateFilter": "TX",
  "maxItems": 500,
  "sp_intended_usage": "Building a hospital database for Texas",
  "sp_improvement_suggestions": "None"
}
```
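When inputs are generated programmatically, it can help to check them before starting a run. The sketch below builds an input object from the fields documented above and applies a few checks taken from the table: a known mode, an `mrfUrl` for `mrf_parse`, a two-letter state code, and a non-negative `maxItems`. The `build_run_input` helper is hypothetical, and the actor may validate its input differently.

```python
from typing import Any, Dict

MODES = {"mrf_parse", "hospital_list", "discover_and_parse"}


def build_run_input(mode: str = "mrf_parse", **fields: Any) -> Dict[str, Any]:
    """Assemble an input dict and reject combinations the input table rules out."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "mrf_parse" and not fields.get("mrfUrl"):
        raise ValueError("mrfUrl is required for mrf_parse mode")

    state = fields.get("stateFilter")
    if state is not None and (len(state) != 2 or not state.isalpha()):
        raise ValueError("stateFilter must be a two-letter state code")

    max_items = fields.get("maxItems", 15)  # documented default
    if not isinstance(max_items, int) or max_items < 0:
        raise ValueError("maxItems must be a non-negative integer (0 = unlimited)")

    run_input: Dict[str, Any] = {"mode": mode, "maxItems": max_items}
    # Drop unset filters so blank values keep their documented "all" meaning.
    run_input.update({k: v for k, v in fields.items() if v not in (None, "")})
    return run_input


print(build_run_input("hospital_list", stateFilter="TX", maxItems=500))
```

The resulting dictionary has the same shape as the JSON examples above and can be supplied as the run input when the actor is started.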
CMS Hospital Price Transparency Scraper Output Fields
MRF Parse Mode
Returns one row per payer/plan/billing code combination.
```json
{
  "hospital_name": "EXAMPLE REGIONAL MEDICAL CENTER",
  "hospital_ccn": "",
  "hospital_npi": "",
  "hospital_address": "",
  "hospital_city": "",
  "hospital_state": "",
  "hospital_zip": "",
  "mrf_url": "https://example-hospital.com/standard-charges.json",
  "mrf_version": "3.0.0",
  "mrf_last_updated": "2025-01-15",
  "billing_code": "70551",
  "billing_code_type": "CPT",
  "description": "MRI Brain without contrast",
  "payer_name": "Aetna",
  "plan_name": "Aetna PPO Standard",
  "setting": "outpatient",
  "methodology": "fee schedule",
  "standard_charge_gross": 4200,
  "standard_charge_discounted_cash": 1890,
  "standard_charge_negotiated_dollar": 1240,
  "standard_charge_negotiated_percentage": null,
  "standard_charge_negotiated_algorithm": "",
  "standard_charge_min": 980,
  "standard_charge_max": 1600,
  "estimated_amount": 1240,
  "additional_payer_notes": "",
  "record_type": "charge_row"
}
```
| Field | Type | Description |
|---|---|---|
| `hospital_name` | string | Hospital name from MRF header |
| `hospital_ccn` | string | CMS Certification Number (populated in `hospital_list` mode) |
| `hospital_npi` | string | National Provider Identifier |
| `hospital_address` | string | Street address |
| `hospital_city` | string | City |
| `hospital_state` | string | State abbreviation |
| `hospital_zip` | string | ZIP code |
| `mrf_url` | string | Source MRF file URL |
| `mrf_version` | string | CMS schema version (e.g. 3.0.0) |
| `mrf_last_updated` | string | Date the MRF was last updated |
| `billing_code` | string | Billing code (e.g. 70551) |
| `billing_code_type` | string | Code system: CPT, HCPCS, MS-DRG, APR-DRG, RC, NDC, Internal |
| `description` | string | Service or item description |
| `payer_name` | string | Payer name |
| `plan_name` | string | Plan name |
| `setting` | string | Service setting: inpatient, outpatient, or both |
| `methodology` | string | Rate methodology (fee schedule, percent of total billed charges, etc.) |
| `standard_charge_gross` | number | Gross / chargemaster price |
| `standard_charge_discounted_cash` | number | Cash / self-pay discount price |
| `standard_charge_negotiated_dollar` | number | Negotiated dollar amount |
| `standard_charge_negotiated_percentage` | number | Negotiated percentage of gross charge |
| `standard_charge_negotiated_algorithm` | string | Algorithm description when rate is formula-based |
| `standard_charge_min` | number | De-identified minimum negotiated charge |
| `standard_charge_max` | number | De-identified maximum negotiated charge |
| `estimated_amount` | number | Estimated allowed amount |
| `additional_payer_notes` | string | Additional payer or plan notes |
| `record_type` | string | `charge_row` for MRF data, `hospital_info` for enrollment data |
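As an example of working with these fields downstream, the sketch below groups charge rows for one billing code by payer and summarizes the negotiated dollar amounts. It is a hypothetical consumer-side helper, not part of the scraper. It assumes the dataset items have already been loaded as a list of dictionaries, and the sample rows are invented for illustration.

```python
from collections import defaultdict
from statistics import median
from typing import Any, Dict, Iterable, List


def negotiated_by_payer(rows: Iterable[Dict[str, Any]], billing_code: str) -> Dict[str, Dict[str, float]]:
    """Summarize standard_charge_negotiated_dollar per payer for one billing code."""
    rates: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        # Skip hospital_info rows and rows for other codes.
        if row.get("record_type") != "charge_row":
            continue
        if row.get("billing_code") != billing_code:
            continue
        rate = row.get("standard_charge_negotiated_dollar")
        if rate is None:  # percentage- or algorithm-based rates carry no dollar amount
            continue
        rates[row.get("payer_name", "")].append(rate)
    return {
        payer: {"min": min(values), "median": median(values), "max": max(values)}
        for payer, values in rates.items()
    }


# Invented sample rows using the output field names.
sample = [
    {"record_type": "charge_row", "billing_code": "70551", "payer_name": "Aetna",
     "standard_charge_negotiated_dollar": 1240},
    {"record_type": "charge_row", "billing_code": "70551", "payer_name": "Aetna",
     "standard_charge_negotiated_dollar": 1300},
    {"record_type": "charge_row", "billing_code": "70551", "payer_name": "Cigna",
     "standard_charge_negotiated_dollar": 1100},
    {"record_type": "charge_row", "billing_code": "70551", "payer_name": "Cigna",
     "standard_charge_negotiated_dollar": None},
    {"record_type": "hospital_info", "billing_code": "", "payer_name": ""},
]

print(negotiated_by_payer(sample, "70551"))
```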
Hospital List Mode
Returns one row per hospital from the CMS enrollment dataset.
```json
{
  "hospital_name": "MEMORIAL HOSPITAL OF LARAMIE COUNTY",
  "hospital_ccn": "530012",
  "hospital_npi": "1568469223",
  "hospital_address": "214 E 23RD ST",
  "hospital_city": "CHEYENNE",
  "hospital_state": "WY",
  "hospital_zip": "82001",
  "mrf_url": "",
  "mrf_version": "",
  "mrf_last_updated": "",
  "billing_code": "",
  "billing_code_type": "",
  "description": "",
  "payer_name": "",
  "plan_name": "",
  "setting": "",
  "methodology": "",
  "standard_charge_gross": null,
  "standard_charge_discounted_cash": null,
  "standard_charge_negotiated_dollar": null,
  "standard_charge_negotiated_percentage": null,
  "standard_charge_negotiated_algorithm": "",
  "standard_charge_min": null,
  "standard_charge_max": null,
  "estimated_amount": null,
  "additional_payer_notes": "",
  "record_type": "hospital_info"
}
```
🔍 FAQ
How do I scrape hospital prices from CMS machine-readable files?
CMS Hospital Price Transparency Scraper handles this in mrf_parse mode. Supply the MRF URL in the mrfUrl field, set optional filters, and run. The scraper fetches the file, auto-detects whether it uses CMS v1/v2 or v3 schema, and outputs one structured row per payer/plan/billing code combination.
Where do I find hospital MRF URLs?
CMS does not publish a single comprehensive directory of MRF URLs. Individual hospitals publish their own files on their websites, typically in a "price transparency" or "standard charges" section. The CMS Hospital Price Transparency enforcement dataset tracks compliance but does not always include direct file links. Third-party aggregators such as Turquoise Health and DoltHub's hospital-price-transparency project maintain compiled URL lists.
What billing code types does this scraper support?
CMS Hospital Price Transparency Scraper supports the following billing code types: CPT, HCPCS, MS-DRG, APR-DRG, Revenue Code (RC), NDC (drug codes), and Internal. Filter using the billingCodeType input field.
How much does this scraper cost to run?
CMS Hospital Price Transparency Scraper uses pay-per-event pricing: $0.10 per run start plus $0.001 per record. A run parsing 10,000 charge rows from a single MRF file costs about $10.10. Use maxItems to control scope on large hospital chargemasters.
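The arithmetic behind that figure is one run-start fee plus a per-record fee. A small sketch, using only the two prices quoted above, makes it easy to budget a run before choosing `maxItems`. The `estimate_cost` helper is hypothetical, and `Decimal` is used here only to avoid floating-point rounding in currency values.

```python
from decimal import Decimal

RUN_START_FEE = Decimal("0.10")    # per run start, as quoted above
PER_RECORD_FEE = Decimal("0.001")  # per record, as quoted above


def estimate_cost(records: int) -> Decimal:
    """Estimated cost in USD for one run that returns `records` rows."""
    if records < 0:
        raise ValueError("records must be non-negative")
    return RUN_START_FEE + PER_RECORD_FEE * records


for n in (1_000, 10_000, 100_000):
    print(n, estimate_cost(n))
```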
Does this scraper need proxies?
No. CMS data APIs and GitHub-hosted example files are public and don't require proxies. The proxyConfiguration input field is available if your target MRF is hosted somewhere that rate-limits, but for standard CMS data sources it isn't necessary.
Can I filter to a single hospital?
Yes. Use hospitalCcn with the hospital's CMS Certification Number to filter hospital_list mode. For MRF parsing, supply the specific hospital's MRF URL in mrfUrl.
Need More Features?
Need support for CSV MRF format, streaming parsing for 1GB+ files, or a different data source? File an issue or get in touch.
Why Use CMS Hospital Price Transparency Scraper?
- CMS schema coverage — Handles CMS v1, v2, and v3 JSON schemas with auto-detection. Most hospital-built parsers handle only one version.
- Dual-mode output — Returns charge rows from MRF files and hospital identity data from the CMS enrollment API in the same output schema, so you can join the two datasets without additional ETL.
- Affordable at scale — At $0.001 per record, parsing 100,000 charge rows costs $100, which is less than most healthcare data subscriptions charge per hospital.