CMS Open Data Scraper

Pricing

from $3.75 / 1,000 result items

Export healthcare datasets from the Centers for Medicare & Medicaid Services Open Data portal. Pull provider directories, hospital quality, drug spending, Medicare enrollment, and 3,000+ other CMS datasets with metadata and row-level data.

Developer

ParseForge

Maintained by Community

🏥 CMS Open Data Scraper

🚀 Export the U.S. Medicare and Medicaid data catalog in seconds. Search 5,000+ healthcare datasets by keyword, theme, or publisher, and pull the rows behind each one. No registration, no manual CSV wrangling.

🕒 Last updated: 2026-05-21 · 📊 13 fields per catalog record · 🏥 5,000+ datasets · 🇺🇸 CMS national catalog · 2 modes

The CMS Open Data Scraper exports the official Centers for Medicare & Medicaid Services open data catalog. Each catalog record returns 13 fields, including dataset identifier, title, description, publisher, contact, keyword and theme tags, modified date, access level, landing page, and download link. Switch to dataset mode and the same Actor returns the rows behind any catalog entry.

The catalog covers every public CMS dataset: hospital cost reports, Medicare Part B and Part D drug spending, provider directories, Medicaid enrollment, nursing-home compare, marketplace open enrollment, hospital quality indicators, and thousands more. Coverage spans every state, every Medicare Administrative Contractor region, and every measurement program CMS publishes.

| 🎯 Target Audience | 💡 Primary Use Cases |
| --- | --- |
| Healthcare analysts, hospital finance teams, payer pricing teams, policy researchers, health-tech founders, journalists | Provider directory enrichment, drug-spend benchmarking, hospital cost analysis, quality-score lookups, payer comparison, claims research |

📋 What the CMS Open Data Scraper does

Two run modes in a single Actor, plus filtering and live refresh:

  • 🗂️ Catalog mode. List every CMS dataset matching your search term, with metadata, tags, and the download link for each one.
  • 📥 Dataset mode. Pull the rows behind a specific dataset slug straight into your dataset.
  • 🏷️ Multi-dimensional filtering. Restrict by keyword, theme, publisher, access level, or last-modified date.
  • 🔁 Always current. Every run fetches the live catalog state, so your downstream dataset reflects what CMS published today.

Each catalog record carries identifiers, descriptive metadata (title, description, contact), classification tags (keyword, theme), provenance (modified date, access level), and ready-to-use links (landing page, download URL).

💡 Why it matters: the CMS catalog is one of the richest open datasets in U.S. healthcare, but the listing surface is fragmented and the per-dataset download formats vary widely. This Actor gives you a single clean shape for both the catalog and the rows underneath it.


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded CMS dataset.


⚙️ Input

| Input | Type | Default | Behavior |
| --- | --- | --- | --- |
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
| mode | string | "catalog" | catalog lists datasets. dataset pulls rows from a specific slug. |
| searchQuery | string | "hospital" | Catalog keyword search. Example: medicare, drug spending, nursing home. |
| keyword | string | "" | Match a value in the dataset's keyword tags (for example quality, physician). |
| theme | string | "" | Match the dataset's theme or category (for example Medicare, Hospitals). |
| publisher | string | "" | Match the publishing organization name. |
| accessLevel | string | "" | public, restricted public, or non-public. |
| modifiedSince | string | "" | Keep only datasets modified on or after this ISO date. |
| datasetSlug | string | "" | CMS dataset identifier for dataset mode. Find it in catalog output. |

Example: list every hospital-quality dataset modified since 2024.

{
  "maxItems": 100,
  "mode": "catalog",
  "searchQuery": "hospital",
  "keyword": "quality",
  "modifiedSince": "2024-01-01"
}

Example: pull rows from a specific dataset slug.

{
  "maxItems": 500,
  "mode": "dataset",
  "datasetSlug": "9wzi-peqs"
}
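The two inputs above can also be assembled in code. The following is a minimal sketch, not part of the Actor itself: `build_input` is a hypothetical helper, and its client-side checks are assumptions derived from the Input table rather than the Actor's own validation.

```python
from datetime import date
from typing import Any

# Values listed for accessLevel in the Input table ("" means no filter).
ACCESS_LEVELS = {"", "public", "restricted public", "non-public"}


def build_input(mode: str = "catalog", max_items: int = 10, **filters: Any) -> dict[str, Any]:
    """Assemble an input object using the field names from the Input table."""
    if mode not in ("catalog", "dataset"):
        raise ValueError(f"mode must be 'catalog' or 'dataset', got {mode!r}")
    if mode == "dataset" and not filters.get("datasetSlug"):
        raise ValueError("dataset mode needs a datasetSlug")
    if (filters.get("accessLevel") or "") not in ACCESS_LEVELS:
        raise ValueError("accessLevel must be public, restricted public, or non-public")
    if filters.get("modifiedSince"):
        # Raises ValueError if the string is not an ISO date such as 2024-01-01.
        date.fromisoformat(filters["modifiedSince"])
    # Drop empty filters so only the fields you actually set are sent.
    payload = {key: value for key, value in filters.items() if value not in ("", None)}
    return {"mode": mode, "maxItems": max_items, **payload}


catalog_input = build_input(
    max_items=100, searchQuery="hospital", keyword="quality", modifiedSince="2024-01-01"
)
dataset_input = build_input("dataset", 500, datasetSlug="9wzi-peqs")
```

Failing early on a missing slug or a malformed date avoids spending a run on an input that cannot return anything useful.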

⚠️ Good to Know: CMS dataset coverage varies by program. Some entries are continuously refreshed, others are quarterly or annual releases. Check the modified field on each catalog record to confirm freshness before pulling rows.
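One way to apply that freshness check to downloaded catalog records is sketched below. `fresh_records` is a hypothetical helper, and the sample records are illustrative only; it assumes `modified` is an ISO date string as shown in the schema, and it trims any time component in case a full timestamp is returned.

```python
from datetime import date


def fresh_records(records: list[dict], cutoff: str) -> list[dict]:
    """Return records whose `modified` date is on or after `cutoff` (ISO date)."""
    limit = date.fromisoformat(cutoff)
    kept = []
    for record in records:
        modified = record.get("modified")
        if not modified:
            continue  # no date to check, so do not treat the record as fresh
        # Keep only the date part in case a full timestamp is present (an assumption).
        if date.fromisoformat(modified[:10]) >= limit:
            kept.append(record)
    return kept


sample = [
    {"title": "Older release", "modified": "2020-12-08"},
    {"title": "Recent release", "modified": "2025-03-01"},
    {"title": "Undated entry"},
]
recent = fresh_records(sample, "2024-01-01")
```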


📊 Output

Each catalog record contains 13 fields. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

| Field | Type | Example |
| --- | --- | --- |
| 🆔 identifier | string | catalog ID URL (e.g. /dataset/7cf9662e-7c5c-4fe0-a8c6-828edf81a23c) |
| 📝 title | string | "AHRQ Patient Safety Indicator 11 (PSI-11) Measure Rates" |
| 📄 description | string | full dataset summary |
| 🏢 publisher | string | "Centers for Medicare & Medicaid Services" |
| 📇 contactPoint | object | { "name": "...", "email": "..." } |
| 🏷️ keyword | array | ["Medicare","Hospitals & Facilities","Safety of Care"] |
| 🗂️ theme | array | ["Medicare"] |
| 📅 modified | string | "2020-12-08" |
| 🔓 accessLevel | string | "public" |
| 🔗 landingPage | string | "https://data.cms.gov/quality-of-care/..." |
| 📥 downloadUrl | string or null | direct CSV or JSON download |
| 📊 rows | array | sample rows (dataset mode only) |
| 🕒 scrapedAt | ISO 8601 | "2026-05-21T22:14:33.381Z" |
| error | string or null | populated only on failure |
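For typed pipelines, the schema above can be mirrored as a Python `TypedDict`. This is a sketch under assumptions: the class names and the `unknown_fields` helper are hypothetical, the Python types are inferred from the Type and Example columns, and every field is treated as optional because the table does not say which ones are always present.

```python
from typing import Any, Optional, TypedDict


class ContactPoint(TypedDict, total=False):
    name: str
    email: str


class CatalogRecord(TypedDict, total=False):
    identifier: str
    title: str
    description: str
    publisher: str
    contactPoint: ContactPoint
    keyword: list[str]
    theme: list[str]
    modified: str  # ISO date, e.g. "2020-12-08"
    accessLevel: str
    landingPage: str
    downloadUrl: Optional[str]
    rows: list[dict[str, Any]]  # dataset mode only
    scrapedAt: str  # ISO 8601 timestamp
    error: Optional[str]  # populated only on failure


# Field names in schema order, derived from the class so the two cannot drift apart.
SCHEMA_FIELDS = tuple(CatalogRecord.__annotations__)


def unknown_fields(record: dict[str, Any]) -> list[str]:
    """List keys in a record that the schema table does not mention."""
    return sorted(set(record) - set(SCHEMA_FIELDS))
```

Running `unknown_fields` over a fresh export is a cheap way to notice when the output shape changes between runs.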


✨ Why choose this Actor

  • 🏥 Full CMS catalog. Every dataset CMS publishes, queryable by keyword, theme, publisher, or modified date.
  • 🔀 Two run modes. Catalog mode for discovery, dataset mode for row-level pulls.
  • 🏷️ Multi-dimensional filtering. Combine keyword, theme, publisher, and access level in a single run.
  • Fast. 10 catalog entries in under 5 seconds, 1,000 in under a minute.
  • 🔁 Always fresh. Every run pulls the live catalog state, so your dataset reflects current CMS publications.
  • 🇺🇸 Official U.S. source. The underlying catalog is maintained by the Centers for Medicare & Medicaid Services and is widely used by healthcare analysts.
  • 🚫 No keys to manage. No personal token required for the default run.

📊 The CMS catalog is the backbone of U.S. healthcare data: provider directories, drug spend, hospital quality, and Medicare and Medicaid program metrics all live here.


📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
| --- | --- | --- | --- | --- | --- |
| ⭐ CMS Open Data Scraper (this Actor) | $5 free credit, then pay-per-use | 5,000+ datasets | Live per run | search, keyword, theme, publisher, access, date | ⚡ 2 min |
| Build your own pipeline | Free, but engineering hours | Full catalog if you build it | Manual | DIY | 🐢 Days |
| Commercial healthcare data vendors | $1,000+/month | Vendor-curated subset | Vendor cadence | Vendor's | ⏳ Hours |
| Manual CSV downloads | Free | Stale snapshots | Quarterly at best | None | 🕒 Variable |

Pick this Actor when you want every CMS dataset in a single normalized shape, with row-level pulls available on demand.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the CMS Open Data Scraper page on the Apify Store.
  3. 🎯 Set input. Choose catalog or dataset mode, type a search term, set maxItems.
  4. 🚀 Run it. Click Start and let the Actor collect your data.
  5. 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.


💼 Business use cases

🏥 Hospital Finance & Strategy

  • Hospital cost-report benchmarking
  • Medicare wage-index lookups
  • Disproportionate-share hospital analysis
  • Service-line revenue modeling

💊 Payer & PBM Pricing

  • Part D drug-spend trend tracking
  • Brand-vs-generic substitution research
  • Provider directory enrichment
  • Network-adequacy analysis

📊 Healthcare Analytics & BI

  • Quality-score dashboards by facility
  • Readmission and complication tracking
  • Cross-state Medicaid enrollment comparison
  • Long-term-care utilization analysis

📰 Policy Research & Journalism

  • CMS rule-change impact modeling
  • Investigative reporting on provider outliers
  • Federal-program transparency research
  • Public-health surveillance datasets

🔌 Automating CMS Open Data Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Weekly refreshes keep your downstream healthcare warehouse in sync as CMS publishes new datasets and updates existing ones.
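A programmatic run with the Python `apify-client` package might look like the sketch below. The `ACTOR_ID` value is a placeholder, not the real identifier (copy it from the Actor's Store page), and `run_actor` is a hypothetical wrapper around the client's `actor(...).call(...)` and `dataset(...).iterate_items()` methods.

```python
from typing import Any, Iterator

# Placeholder only: replace with the Actor ID shown on the Store page.
ACTOR_ID = "<username>/<actor-name>"


def run_actor(
    token: str, run_input: dict[str, Any], actor_id: str = ACTOR_ID
) -> Iterator[dict[str, Any]]:
    """Start a run, wait for it to finish, and yield the items it stored."""
    # Imported lazily so this module loads even where the package is absent.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    # Each run writes to its own default dataset; stream the items from it.
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()


# Usage (needs a valid API token and the real Actor ID):
# for record in run_actor("MY_APIFY_TOKEN", {"mode": "catalog", "searchQuery": "hospital"}):
#     print(record["title"], record["modified"])
```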


🌟 Beyond business use cases

Healthcare open data powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Health-services research with cited reference values
  • Policy analysis for Medicare and Medicaid program design
  • Coursework in health informatics, biostatistics, and public health
  • Reproducible studies with versioned dataset pulls

🎨 Personal and creative

  • Hobby data-journalism projects
  • Indie health-tech prototypes
  • Educational content for healthcare educators
  • Visualizations and explainer dashboards

🤝 Non-profit and civic

  • Patient-advocacy organizations tracking provider quality
  • Community-health needs assessments
  • Investigative reporting on Medicare fraud or quality outliers
  • Civic-data hackathons and transparency projects

🧪 Experimentation

  • Train healthcare-classification models
  • Prototype agent pipelines that summarize dataset catalogs
  • Build cost-of-care comparison dashboards
  • Validate provider directory matching rules


❓ Frequently Asked Questions

🧩 How does it work?

Pick catalog mode to list datasets matching your search term, or dataset mode and pass a slug to pull the rows from a specific dataset. The Actor returns a single clean shape per record, ready to download as CSV, Excel, JSON, or XML.

📏 How fresh is the data?

CMS updates its catalog continuously. Every run of this Actor pulls the latest state, so your dataset reflects current CMS publications as of run time. Use the modified field on each catalog record to verify freshness per dataset.

🏥 Does it include hospital cost reports?

Yes. Search for cost report in catalog mode to surface every cost-report dataset CMS publishes, including hospital, skilled-nursing, hospice, and home-health.

💊 Can I pull Medicare Part D drug spend?

Yes. Use catalog mode with searchQuery: "drug spending" to find the latest Part D spending dataset, then run dataset mode with that slug to pull the rows.
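The two-step workflow above can be sketched as follows. Both helpers are hypothetical, and the sample records are illustrative only. Because this page does not say which catalog field carries the slug, the sketch takes the slug as an explicit argument instead of deriving it from a record.

```python
from typing import Any, Optional

# Step 1 input: search the catalog for drug-spending datasets.
CATALOG_INPUT = {"mode": "catalog", "searchQuery": "drug spending", "maxItems": 50}


def latest_record(records: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Pick the catalog record with the most recent `modified` date."""
    dated = [r for r in records if r.get("modified")]
    if not dated:
        return None
    # ISO dates in the same format sort chronologically as plain strings.
    return max(dated, key=lambda r: r["modified"])


def dataset_input(slug: str, max_items: int = 500) -> dict[str, Any]:
    """Step 2 input: pull rows for the slug read from the catalog output."""
    return {"mode": "dataset", "datasetSlug": slug, "maxItems": max_items}


catalog_results = [
    {"title": "Spending by drug, earlier release", "modified": "2023-02-14"},
    {"title": "Spending by drug, later release", "modified": "2024-06-30"},
]
newest = latest_record(catalog_results)
```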

🔓 What does accessLevel mean?

public means anyone can download. restricted public requires registration or data-use agreement. non-public is metadata only. Filter on this field if you only want fully open data.
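If you filter after download rather than with the `accessLevel` input, a small helper such as the hypothetical `split_by_access` below keeps only fully open records and reports how many entries fall under each level. The sample records are illustrative only.

```python
from collections import Counter
from typing import Any


def split_by_access(records: list[dict[str, Any]]) -> tuple[list[dict[str, Any]], dict[str, int]]:
    """Return (public records, count of records per access level)."""
    # Records without an accessLevel are counted under "unknown".
    counts = Counter(r.get("accessLevel") or "unknown" for r in records)
    open_records = [r for r in records if r.get("accessLevel") == "public"]
    return open_records, dict(counts)


sample = [
    {"title": "A", "accessLevel": "public"},
    {"title": "B", "accessLevel": "restricted public"},
    {"title": "C", "accessLevel": "public"},
    {"title": "D"},
]
open_only, level_counts = split_by_access(sample)
```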

⏰ Can I schedule regular runs?

Yes. Use Apify Schedules to run this Actor on any cron interval (daily, weekly, monthly) and keep a downstream healthcare warehouse in sync.

💼 Can I use this data commercially?

Yes. CMS publishes its catalog as U.S. government open data, and public-access datasets are free to use, including in commercial products. You are responsible for complying with dataset-specific terms, any project-specific CMS obligations, and downstream regulatory requirements.

💳 Do I need a paid Apify plan to use this Actor?

No. The free Apify plan is enough for testing and small runs (10 records per run). A paid plan lifts the limit and gives you access to scheduling, higher concurrency, and larger datasets.

🔁 What happens if a run fails or gets interrupted?

Apify automatically retries transient errors. If a run still fails, you can inspect the log in the Runs tab, fix the input, and re-run. Partial datasets from interrupted runs are preserved so you never lose progress.

🆘 What if I need help?

Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.


🔌 Integrate with any app

CMS Open Data Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe healthcare data into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh CMS data into your analytics backend, or alert your team in Slack.
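On the receiving side, a webhook handler mostly needs the ID of the dataset the finished run wrote to. The sketch below assumes Apify's default webhook payload, where `eventType` names the event and `resource` holds the run object including `defaultDatasetId`; if you customize the payload template, adjust the keys accordingly. `dataset_id_from_webhook` is a hypothetical helper.

```python
import json
from typing import Optional


def dataset_id_from_webhook(body: str) -> Optional[str]:
    """Return the run's default dataset ID for successful runs, else None."""
    payload = json.loads(body)
    # Ignore failed, aborted, or timed-out runs.
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")


example_body = json.dumps(
    {"eventType": "ACTOR.RUN.SUCCEEDED", "resource": {"defaultDatasetId": "abc123"}}
)
dataset_id = dataset_id_from_webhook(example_body)
```

With the dataset ID in hand, the downstream job can fetch the new items and load them into the analytics backend.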


💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the Centers for Medicare & Medicaid Services or the U.S. Department of Health and Human Services. All trademarks mentioned are the property of their respective owners. Only publicly available CMS open data is collected.