Open Payments Scraper

Pricing: Pay per event

Extract official CMS Open Payments records, dataset metadata, and CSV URLs by year, payment type, NPI, company, and state.

Developer: Stas Persiianenko

Maintained by Community

Extract CMS Open Payments payment records, dataset metadata, and CSV download URLs from the official public CMS Open Payments APIs.

Open Payments Scraper helps compliance teams, healthcare analysts, journalists, pharma/medtech researchers, and sales operations teams turn the large CMS Open Payments catalog into clean Apify datasets. Use it to pull payments by reporting year, payment type, physician NPI, recipient name, company name, and state, or to discover official CSV files for bulk processing.

What does Open Payments Scraper do?

Open Payments Scraper connects to the public CMS Open Payments metadata and datastore APIs.

It can:

  • ๐Ÿฅ Extract detailed general, research, and ownership payment records.
  • ๐Ÿ“… Select CMS reporting years such as 2024, 2023, or 2022.
  • ๐Ÿ”Ž Filter output by recipient NPI, recipient name, company name, and state.
  • ๐Ÿ“„ Return dataset metadata only for lightweight discovery.
  • ๐Ÿ”— Return official CMS CSV download URLs for bulk ETL workflows.
  • ๐Ÿงพ Preserve the raw CMS API record for auditability.

Who is it for?

Open Payments Scraper is useful for teams that need repeatable access to CMS payment transparency data.

  • Compliance teams monitoring physician-industry relationships.
  • Healthcare analytics teams building payment dashboards.
  • Pharma and medtech commercial teams researching customer segments.
  • Journalists investigating healthcare payments.
  • Academic researchers analyzing Open Payments trends.
  • Data engineers who need stable CSV URLs and dataset identifiers.

Why use this actor?

The CMS website and bulk files are comprehensive, but the files are very large. This actor gives you a controlled Apify interface with row limits, filters, and normalized output.

Instead of manually navigating CMS pages, downloading huge CSV files, and writing one-off scripts, you can run a saved Apify task and export the results as JSON, CSV, Excel, or through the API.

What data can you extract?

| Field | Description |
|---|---|
| recordType | Row type: payment, dataset_metadata, or csv_url |
| datasetIdentifier | CMS dataset UUID |
| datasetTitle | CMS dataset title |
| datasetYear | Reporting year, when available |
| datasetType | general, research, ownership, summary, or profile |
| recipientName | Physician, entity, or teaching hospital name |
| recipientNpi | Covered recipient or principal investigator NPI |
| recipientState | Recipient state code, e.g. CA |
| companyName | Manufacturer, GPO, or submitting organization |
| paymentAmountUsd | Payment amount in USD |
| paymentDate | Payment date as reported by CMS |
| paymentNature | Nature of payment or transfer of value |
| csvDownloadUrl | Official CMS CSV download URL |
| rawRecord | Full original CMS API row |

How much does it cost to scrape CMS Open Payments data?

This actor uses pay-per-event pricing.

  • Start event: $0.005 per run.
  • Per result item: Free $0.000032381, Starter/BRONZE $0.000028158, Scale/SILVER $0.000021963, Business/GOLD $0.000016895, Platinum $0.000011263, Diamond $0.00001.
  • Each saved payment, metadata, or CSV URL row is charged as one result item.
  • Use metadataOnly or csvUrlsOnly for cheap discovery runs before bulk extraction.
  • Keep maxItems low while testing filters, then raise it for production runs.
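As a rough budget check, a run costs the $0.005 start event plus one per-item charge for every saved row. The sketch below turns the price list above into a small estimator; the plan keys are labels chosen here, and Apify's own billing remains the authoritative figure.

```python
# Per-item prices in USD, copied from the pricing list; the keys are illustrative labels.
PRICE_PER_ITEM_USD = {
    "free": 0.000032381,
    "starter": 0.000028158,
    "scale": 0.000021963,
    "business": 0.000016895,
    "platinum": 0.000011263,
    "diamond": 0.00001,
}
START_EVENT_USD = 0.005  # charged once per run


def estimate_run_cost(items: int, plan: str = "free") -> float:
    """Estimate one run's cost; every saved payment, metadata, or CSV URL row counts as one item."""
    if items < 0:
        raise ValueError("items must be non-negative")
    return START_EVENT_USD + items * PRICE_PER_ITEM_USD[plan]


for n in (100, 10_000, 1_000_000):
    print(f"{n:>9,} items on the Free plan: ${estimate_run_cost(n):.4f}")
```

For example, a 100-row test run on the Free plan comes to about $0.0082 (0.005 + 100 × 0.000032381), which is why small maxItems values are cheap for filter testing.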

Quick start

  1. Open the actor on Apify.
  2. Choose one or more reporting years.
  3. Choose dataset types such as general, research, or ownership.
  4. Optionally enter an NPI, recipient name, company name, or state.
  5. Set maxItems.
  6. Run the actor.
  7. Export the dataset or consume it through the Apify API.

Input options

years

Array of CMS reporting years. Example: [2024].

datasetTypes

Choose one or more dataset families:

  • general
  • research
  • ownership
  • summary
  • profile
  • all

datasetIdentifiers

Optional CMS dataset UUIDs. If you already know the exact dataset identifier, this overrides year/type selection.

recipientNpi

Filter by covered recipient or principal investigator NPI.

recipientName

Case-insensitive filter for physician, entity, or teaching hospital names.

companyName

Case-insensitive filter for manufacturer, GPO, or reporting entity names.

state

Two-letter state code such as CA, NY, or TX.

metadataOnly

Return one row per matching CMS dataset with title, identifier, modified date, source URL, and download links.

csvUrlsOnly

Return one row per matching dataset focused on official CSV download URLs.

maxItems

Maximum number of output rows to save.
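To avoid spending a run on a typo, you can sanity-check an input locally first. The sketch below is a hypothetical client-side helper (validate_input is not part of the actor) that checks only the constraints described above: years as an array, datasetTypes from the listed families, a two-letter state, a 10-digit NPI, and a positive maxItems. The actor may apply its own, different validation.

```python
from typing import Any

# Dataset families listed under datasetTypes.
ALLOWED_DATASET_TYPES = {"general", "research", "ownership", "summary", "profile", "all"}


def validate_input(run_input: dict[str, Any]) -> list[str]:
    """Return a list of problems in an actor input; an empty list means nothing obvious is wrong.

    Hypothetical helper: it mirrors the documented options, not the actor's own code.
    """
    problems: list[str] = []

    years = run_input.get("years", [])
    if not isinstance(years, list) or not all(isinstance(y, int) for y in years):
        problems.append("years must be an array of integers, e.g. [2024]")

    dataset_types = run_input.get("datasetTypes", [])
    if not isinstance(dataset_types, list):
        problems.append("datasetTypes must be an array")
    else:
        unknown = sorted(set(dataset_types) - ALLOWED_DATASET_TYPES)
        if unknown:
            problems.append(f"unknown datasetTypes: {unknown}")

    state = run_input.get("state")
    if state is not None and not (isinstance(state, str) and len(state) == 2 and state.isalpha()):
        problems.append("state must be a two-letter code such as CA, NY, or TX")

    # An NPI is a 10-digit number; accept it as a string or an int.
    npi = run_input.get("recipientNpi")
    if npi is not None and not (str(npi).isdigit() and len(str(npi)) == 10):
        problems.append("recipientNpi must be a 10-digit NPI")

    max_items = run_input.get("maxItems")
    if max_items is not None and not (isinstance(max_items, int) and max_items > 0):
        problems.append("maxItems must be a positive integer")

    return problems


example = {"years": [2024], "datasetTypes": ["general"], "state": "CA", "maxItems": 100}
print(validate_input(example))
```

The example input from the next section passes all checks, so the call prints an empty list.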

Example input

{
  "years": [2024],
  "datasetTypes": ["general"],
  "state": "CA",
  "maxItems": 100
}

Output example

{
  "recordType": "payment",
  "datasetIdentifier": "e6b17c6a-2534-4207-a4a1-6746a14911ff",
  "datasetTitle": "2024 General Payment Data",
  "datasetYear": "2024",
  "datasetType": "general",
  "recipientName": "Jane Smith",
  "recipientNpi": "1234567890",
  "recipientState": "CA",
  "companyName": "Example Medical Inc.",
  "paymentAmountUsd": 25.0,
  "paymentDate": "2024-05-10",
  "paymentNature": "Food and Beverage",
  "csvDownloadUrl": "https://download.cms.gov/openpayments/...csv"
}
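Once exported, items shaped like the output example can be post-processed with plain Python. A minimal sketch, assuming the rows are already loaded into a list of dicts (for instance from an exported JSON file): it keeps only payment rows, totals paymentAmountUsd per companyName, and returns the largest single payments. The sample rows are illustrative only, not real CMS data.

```python
from collections import defaultdict
from collections.abc import Iterable
from typing import Any


def summarize_payments(items: Iterable[dict[str, Any]], top_n: int = 5):
    """Total paymentAmountUsd per companyName and return the top_n largest payment rows."""
    # Skip metadata and CSV URL rows, plus any payment row without an amount.
    payments = [
        it for it in items
        if it.get("recordType") == "payment" and it.get("paymentAmountUsd") is not None
    ]
    totals: dict[str, float] = defaultdict(float)
    for p in payments:
        totals[p.get("companyName") or "(unknown)"] += float(p["paymentAmountUsd"])
    largest = sorted(payments, key=lambda p: float(p["paymentAmountUsd"]), reverse=True)[:top_n]
    return dict(totals), largest


# Illustrative rows only, not real CMS data.
sample = [
    {"recordType": "payment", "companyName": "Example Medical Inc.", "paymentAmountUsd": 25.0},
    {"recordType": "payment", "companyName": "Example Medical Inc.", "paymentAmountUsd": 150.0},
    {"recordType": "payment", "companyName": "Other Device Co.", "paymentAmountUsd": 80.0},
    {"recordType": "csv_url", "csvDownloadUrl": "https://download.cms.gov/openpayments/...csv"},
]
totals, largest = summarize_payments(sample, top_n=2)
print(totals)
print([p["paymentAmountUsd"] for p in largest])
```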

Tips for better results

  • Start with metadataOnly: true to see available datasets.
  • Use csvUrlsOnly: true when you want CMS bulk files instead of paginated rows.
  • Use exact NPI values for targeted provider checks.
  • Use state filters for regional compliance reviews.
  • Increase maxItems only after confirming your filters.
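The first and last tips combine into a two-step workflow: a cheap metadata probe, then a real extraction pinned to the datasets it found through datasetIdentifiers. The helpers below are a hypothetical sketch (discovery_input and pinned_input are names chosen here) that only build input objects and never call the API; they assume metadata rows carry recordType "dataset_metadata" and a datasetIdentifier, as listed in the field table.

```python
import copy
from typing import Any


def discovery_input(base: dict[str, Any]) -> dict[str, Any]:
    """Copy a production input and switch it to a metadata-only discovery run."""
    probe = copy.deepcopy(base)
    probe["metadataOnly"] = True
    probe.pop("csvUrlsOnly", None)  # keep the probe in metadata mode only
    return probe


def pinned_input(base: dict[str, Any], metadata_rows: list[dict[str, Any]],
                 max_items: int = 100) -> dict[str, Any]:
    """Build a payment-mode input limited to the datasets found by a discovery run."""
    # Deduplicate identifiers while keeping the order in which they were returned.
    ids = list(dict.fromkeys(
        row["datasetIdentifier"] for row in metadata_rows
        if row.get("recordType") == "dataset_metadata" and row.get("datasetIdentifier")
    ))
    run_input = copy.deepcopy(base)
    run_input.pop("metadataOnly", None)
    run_input.pop("csvUrlsOnly", None)
    run_input["datasetIdentifiers"] = ids  # documented to override year/type selection
    run_input["maxItems"] = max_items
    return run_input
```

With the apify-client package, you would pass discovery_input(base) to client.actor('automation-lab/open-payments-scraper').call(run_input=...), collect the rows with client.dataset(run['defaultDatasetId']).iterate_items(), and start the production run with pinned_input(base, rows, max_items=...) once the filters look right.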

Integrations

Open Payments Scraper works with common Apify workflows:

  • Export to Google Sheets for compliance review.
  • Send dataset items to Make or Zapier.
  • Load JSONL output into Snowflake, BigQuery, or Postgres.
  • Schedule recurring monitoring tasks.
  • Combine with lead enrichment or healthcare-provider datasets.
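For warehouse loads, JSONL means one JSON object per line. A minimal sketch of writing dataset items to a JSONL file with the standard library; dropping rawRecord by default is a choice made here to keep rows small (the FAQ notes that rawRecord can be large), so set drop_raw=False if you need the full CMS row for auditing. The function name and file path are illustrative.

```python
import json
from collections.abc import Iterable
from typing import Any


def write_jsonl(items: Iterable[dict[str, Any]], path: str, drop_raw: bool = True) -> int:
    """Write one JSON object per line to `path` and return the number of rows written."""
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        for item in items:
            # Optionally omit the bulky rawRecord column before loading.
            row = {k: v for k, v in item.items() if not (drop_raw and k == "rawRecord")}
            fh.write(json.dumps(row, ensure_ascii=False) + "\n")
            count += 1
    return count
```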

API usage

Node.js

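import { ApifyClient } from 'apify-client';

// Read the API token from the environment instead of hard-coding it.
// Top-level await requires an ES module (a .mjs file or "type": "module").
const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Start the actor and wait for the run to finish.
const run = await client.actor('automation-lab/open-payments-scraper').call({
    years: [2024],
    datasetTypes: ['general'],
    maxItems: 100,
});

// Fetch the saved rows from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Saved ${items.length} rows to dataset ${run.defaultDatasetId}`);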

Python

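import os
from apify_client import ApifyClient

# Read the API token from the environment instead of hard-coding it.
client = ApifyClient(os.environ['APIFY_TOKEN'])

# Start the actor and wait for the run to finish.
run = client.actor('automation-lab/open-payments-scraper').call(run_input={
    'years': [2024],
    'datasetTypes': ['general'],
    'maxItems': 100,
})

# Iterate over the rows saved to the run's default dataset.
for item in client.dataset(run['defaultDatasetId']).iterate_items():
    print(item.get('recordType'), item.get('paymentAmountUsd'))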

cURL

curl -X POST "https://api.apify.com/v2/acts/automation-lab~open-payments-scraper/runs?token=$APIFY_TOKEN" \
-H 'Content-Type: application/json' \
-d '{"years":[2024],"datasetTypes":["general"],"maxItems":100}'

MCP usage

Use this actor from MCP-compatible clients through Apify MCP.

MCP URL:

https://mcp.apify.com/?tools=automation-lab/open-payments-scraper

Claude Code setup:

claude mcp add --transport http apify-open-payments "https://mcp.apify.com/?tools=automation-lab/open-payments-scraper"

MCP JSON configuration:

{
  "mcpServers": {
    "apify-open-payments": {
      "url": "https://mcp.apify.com/?tools=automation-lab/open-payments-scraper"
    }
  }
}

Example prompts:

  • "Run Open Payments Scraper for 2024 general payments in California and summarize the largest payments."
  • "Find CMS Open Payments CSV URLs for 2024 research and ownership payment datasets."
  • "Extract 100 Open Payments records for a specific physician NPI."

Data freshness

The actor reads the live CMS Open Payments metadata API. CMS controls dataset publication schedules, modified dates, and CSV file paths.

Legality

CMS Open Payments data is public United States government transparency data. You are responsible for using it lawfully and for complying with your organization's privacy, compliance, and data-retention policies.

FAQ

Why did I get fewer rows than maxItems?

Your filters may be narrow, or the selected dataset may have fewer matching rows within the bounded scan window. Try metadata discovery first, broaden filters, or use the official CSV URL for bulk processing.

Why is rawRecord large?

CMS records contain many columns. The actor keeps the raw row so analysts can audit values that are not normalized into top-level fields.

Should I use CSV mode or payment mode?

Use payment mode for API-ready samples and filtered exports. Use CSV URL mode for warehouse-scale ETL.
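In CSV URL mode you get official file locations rather than rows. A minimal sketch for streaming one of those files with Python's standard library and keeping only one state's rows. The column name Recipient_State is an assumption based on the CMS general payments layout (research and ownership files may name it differently), so check it against the file header; stream_csv_from_url is a hypothetical helper, and csv_url would be a csvDownloadUrl value from a csvUrlsOnly run.

```python
import csv
import io
import urllib.request
from collections.abc import Iterable, Iterator


def filter_csv_rows(lines: Iterable[str], state: str,
                    state_col: str = "Recipient_State") -> Iterator[dict[str, str]]:
    """Yield CSV rows whose state column equals `state`, ignoring case."""
    target = state.upper()
    for row in csv.DictReader(lines):
        if (row.get(state_col) or "").upper() == target:
            yield row


def stream_csv_from_url(csv_url: str, state: str) -> Iterator[dict[str, str]]:
    """Stream a CSV over HTTP and filter it without holding the whole file in memory."""
    with urllib.request.urlopen(csv_url) as resp:
        # utf-8-sig also strips a leading byte-order mark if the file has one.
        text = io.TextIOWrapper(resp, encoding="utf-8-sig", newline="")
        yield from filter_csv_rows(text, state)
```

Because rows are yielded one at a time, memory use stays flat regardless of file size.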


Support

If a CMS field changes or a dataset identifier stops working, open an issue on the actor page with the run ID and input used.