USA HealthData.gov HHS Open Data Scraper

Pricing: Pay per event

Collect health data catalog information from HealthData.gov. Filter by category, tags, view type, authority, and search terms to find exactly what you need. Perfect for researchers, data analysts, and healthcare professionals who need to discover and access public health datasets efficiently.

Developer: ParseForge · Maintained by Community

📊 USA HealthData.gov HHS Open Data Scraper

🚀 Collect health datasets, stories, charts, and maps from the U.S. HHS Open Data Catalog in seconds. Filter by category, tags, view type, and authority. No coding, no API keys required.

🕒 Last updated: 2026-04-16 · 📊 40 fields · 🏥 HHS Open Data · 📂 Datasets, Stories, Charts, Maps

The HealthData.gov Scraper automates the discovery and collection of health datasets from the U.S. Department of Health and Human Services Open Data Catalog. Each record includes the dataset name, unique ID, description, publisher, contact information, categories, tags, view and download counts, license details, file download links, and timestamps. You can filter by keyword, category (CDC, FDA, CMS, NIH, HHS), tags, view type (datasets, stories, charts, maps, files), authority (official or community), and sort order. Free users can collect up to 10 items per run, while paid users can retrieve up to 1,000,000 records.

Whether you are a healthcare researcher tracking new CDC datasets, a data scientist building a catalog of open health data, or a policy analyst monitoring HHS publications, this tool eliminates the manual browsing that HealthData.gov requires. Results export to JSON, CSV, or Excel, making it easy to load records into your database, BI tool, or analysis pipeline. Schedule recurring runs to automatically detect new datasets as they are published. The scraper handles pagination, normalizes metadata fields, and processes multiple content types including datasets, stories, charts, maps, and downloadable files.

| Target Audience | Use Cases |
| --- | --- |
| Healthcare Researchers | Discover and catalog open health datasets for analysis |
| Data Scientists | Build metadata indexes of available HHS data sources |
| Policy Analysts | Monitor new publications from CDC, FDA, CMS, and NIH |
| Public Health Teams | Track epidemiological datasets and surveillance data |
| Journalists | Find health data for investigative reporting |
| Academic Institutions | Locate research datasets for grant-funded projects |

📋 What the HealthData.gov Scraper does

  • 📝 Dataset names and IDs - capture the title, unique identifier, and description for every item in the HHS catalog
  • 🔗 Direct URLs - collect working links to each dataset page for quick access and verification
  • 📊 Engagement metrics - pull view counts and download counts to identify the most popular datasets
  • 👤 Publisher and contact info - identify which health authority published the data and how to reach them
  • 🏷️ Categories and tags - classify items by health topic, authority (CDC, FDA, CMS), and custom tags
  • 📁 File downloads - extract download links with format and size information for each available file

The scraper connects to the HealthData.gov catalog API and iterates through results using your specified filters. It processes datasets, stories, charts, maps, files, and calendars. Each record is normalized with consistent field names and pushed to an Apify dataset in real time. The tool supports both URL-based browsing (paste a HealthData.gov browse URL) and filter-based searching (set keywords and categories directly).
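The iteration described above can be pictured roughly as follows. This is a minimal sketch, not the Actor's actual implementation: `fetch_page` is a hypothetical callable standing in for whatever request the scraper sends to the catalog, and the default page size of 20 is only an assumed value.

```python
from typing import Callable, Iterator

# Hypothetical signature: takes a 1-based page number and a page size,
# returns the raw records on that page (an empty list when exhausted).
FetchPage = Callable[[int, int], list[dict]]


def iterate_catalog(fetch_page: FetchPage, max_items: int, page_size: int = 20) -> Iterator[dict]:
    """Yield records page by page until max_items is reached or pages run out."""
    yielded = 0
    page = 1
    while yielded < max_items:
        records = fetch_page(page, page_size)
        if not records:  # an empty page means there is nothing left to read
            return
        for record in records:
            if yielded >= max_items:
                return
            yield record
            yielded += 1
        page += 1


# Stand-in data source so the sketch runs without network access.
def fake_fetch_page(page: int, page_size: int) -> list[dict]:
    catalog = [{"datasetId": f"id-{i}"} for i in range(45)]
    start = (page - 1) * page_size
    return catalog[start:start + page_size]


first_ten = list(iterate_catalog(fake_fetch_page, max_items=10))
everything = list(iterate_catalog(fake_fetch_page, max_items=1000))
print(len(first_ten), len(everything))
```

Passing the page fetcher in as a parameter keeps the loop independent of any particular HTTP client, which also makes the `maxItems` cut-off easy to test offline.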

💡 Why it matters: HealthData.gov hosts thousands of datasets from dozens of health agencies. Manually browsing and cataloging this content is time-consuming. This scraper gives you structured metadata for the entire catalog in minutes.


🎬 Full Demo

🚧 Coming soon...


⚙️ Input

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| startUrl | string | No | Direct URL to a HealthData.gov browse page. Use this OR search filters, not both. |
| maxItems | integer | No | Maximum items to collect. Free: 10. Paid: up to 1,000,000. |
| q | string | No | Search term to find datasets (e.g., "diabetes", "vaccination"). |
| category | string | No | Filter by category: CDC, FDA, CMS, HHS, NIH, Hospital, State. |
| tags | string | No | Filter by tags (comma-separated values). |
| limitTo | string | No | Content type: Datasets, Stories, Charts, Maps, Forms, Files, Calendars. |
| authority | string | No | Official health agency data or community-contributed content. |
| sortBy | string | No | Sort order: newest, alpha, most_accessed, relevance, recently_updated. |

Example 1: Browse newest datasets

{
  "startUrl": "https://healthdata.gov/browse?sortBy=newest&page=1&pageSize=20",
  "maxItems": 50
}

Example 2: Search for vaccination data

{
  "q": "vaccination",
  "category": "Health",
  "limitTo": "datasets",
  "sortBy": "most_accessed",
  "maxItems": 100
}

⚠️ Good to Know: Free users are automatically limited to 10 items per run. Use either startUrl OR the search filters (q, category, tags), not both at the same time. The limitTo field lets you focus on specific content types like datasets or charts.
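If you build inputs in code, a small pre-flight check can catch the conflicts noted above before a run starts. The helper below is a sketch under assumptions: `validate_input` and its `paid` flag are hypothetical names invented for illustration, and the Actor applies its own limits regardless of what you check locally.

```python
SEARCH_FILTER_KEYS = ("q", "category", "tags")
SORT_ORDERS = {"newest", "alpha", "most_accessed", "relevance", "recently_updated"}
FREE_PLAN_LIMIT = 10
PAID_PLAN_LIMIT = 1_000_000


def validate_input(run_input: dict, paid: bool = False) -> dict:
    """Return a checked copy of run_input, raising ValueError on conflicts."""
    used_filters = [key for key in SEARCH_FILTER_KEYS if run_input.get(key)]
    if run_input.get("startUrl") and used_filters:
        raise ValueError(
            "Use startUrl or search filters, not both "
            f"(got startUrl plus {', '.join(used_filters)})"
        )
    sort_by = run_input.get("sortBy")
    if sort_by is not None and sort_by not in SORT_ORDERS:
        raise ValueError(f"Unknown sortBy value: {sort_by!r}")
    checked = dict(run_input)
    # Clamp maxItems locally so the request matches the documented plan limits.
    cap = PAID_PLAN_LIMIT if paid else FREE_PLAN_LIMIT
    if "maxItems" in checked:
        checked["maxItems"] = max(1, min(int(checked["maxItems"]), cap))
    return checked


free_input = validate_input({"q": "vaccination", "sortBy": "most_accessed", "maxItems": 100})
paid_input = validate_input({"q": "vaccination", "maxItems": 100}, paid=True)
print(free_input["maxItems"], paid_input["maxItems"])
```

The sort orders and plan limits come straight from the input table; everything else is a convenience layer you can drop.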


📊 Output

🧾 Schema

| Emoji | Field | Type | Description |
| --- | --- | --- | --- |
| 📝 | datasetId | string | Unique identifier for the dataset |
| 🏷️ | datasetName | string | Title of the dataset or resource |
| 🔗 | datasetUrl | string | Direct link to the dataset page |
| 📄 | description | string | Full description of the dataset |
| 👤 | publisher | string | Agency or organization that published the data |
| 📧 | contactEmail | string | Contact email for the dataset publisher |
| 🏷️ | categories | array | Topic categories assigned to the dataset |
| 🔖 | tags | array | Topic tags for filtering and discovery |
| 📊 | viewCount | number | Total number of views |
| 📥 | downloadCount | number | Total number of downloads |
| 📜 | license | string | License type for the dataset |
| 📅 | createdAt | string | Date the dataset was first published |
| 📅 | publicationDate | string | Official publication date |
| 🔄 | lastUpdated | string | Most recent update timestamp |
| 📁 | downloads | array | Available file downloads with format and size |
| 🕐 | scrapedAt | string | Timestamp of data collection |
| ⚠️ | error | string | Error message if processing failed |
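
For typed pipelines, the schema above can be mirrored in a Python `TypedDict`. This is only a sketch: the element types inside the arrays are assumptions (the schema says "array" without further detail), `top_by_downloads` is a hypothetical helper, and the sample records are made up for illustration rather than taken from the catalog.

```python
from typing import TypedDict


class CatalogRecord(TypedDict, total=False):
    datasetId: str
    datasetName: str
    datasetUrl: str
    description: str
    publisher: str
    contactEmail: str
    categories: list[str]   # element type assumed
    tags: list[str]         # element type assumed
    viewCount: float
    downloadCount: float
    license: str
    createdAt: str
    publicationDate: str
    lastUpdated: str
    downloads: list[dict]   # per-file shape is not documented here
    scrapedAt: str
    error: str


def top_by_downloads(items: list[CatalogRecord], n: int = 5) -> list[CatalogRecord]:
    """Drop records that carry an error, then rank the rest by downloadCount."""
    usable = [item for item in items if not item.get("error")]
    return sorted(usable, key=lambda item: item.get("downloadCount") or 0, reverse=True)[:n]


# Made-up records for illustration only; they are not real catalog entries.
sample: list[CatalogRecord] = [
    {"datasetId": "aaaa-1111", "datasetName": "Example A", "downloadCount": 120},
    {"datasetId": "bbbb-2222", "datasetName": "Example B", "downloadCount": 950},
    {"datasetId": "cccc-3333", "error": "processing failed"},
    {"datasetId": "dddd-4444", "datasetName": "Example D"},
]
ranked = top_by_downloads(sample, n=2)
print([record["datasetId"] for record in ranked])
```

Declaring the type with `total=False` reflects that any field may be missing, for example when a record only carries an `error` message.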


✨ Why choose this Actor

| Feature | This Actor | Alternatives |
| --- | --- | --- |
| Filter by health authority (CDC, FDA, CMS, NIH) | Yes | No |
| Multiple content types (datasets, stories, charts, maps) | Yes | Datasets only |
| View and download count metrics | Yes | Rarely included |
| Publisher and contact information | Yes | No |
| File download links with format info | Yes | No |
| Up to 1,000,000 results per run | Yes | Capped lower |
| Export to JSON, CSV, and Excel | Yes | JSON only |

📊 HealthData.gov hosts thousands of health datasets from over a dozen federal agencies. This scraper gives you structured access to the full catalog with engagement metrics and file download links.


📈 How it compares to alternatives

| Capability | This Actor | Manual Browsing | Generic Web Scrapers |
| --- | --- | --- | --- |
| Health-specific filters (category, tags, authority) | Yes | Yes | No |
| Engagement metrics (views, downloads) | Yes | Visible per page | No |
| Automatic pagination | Yes | No | Partial |
| Multiple content types in one run | Yes | Manual switching | No |
| Scheduled recurring runs | Yes | No | Varies |
| No coding required | Yes | Yes | No |

This scraper is purpose-built for HealthData.gov and handles the catalog's specific API structure, content types, and metadata fields out of the box.


🚀 How to use

  1. Sign up - Create a free Apify account with $5 credit
  2. Find the Actor - Search for "HealthData.gov Scraper" in the Apify Store
  3. Configure your search - Set keywords, category, content type, and max items
  4. Start the run - Click "Start" and watch results appear in real time
  5. Export your data - Download as JSON, CSV, or Excel from the dataset tab

🕒 Typical run time: 30 seconds to 2 minutes for up to 50 items. Larger runs with 500+ items may take 5 to 15 minutes.


💼 Business use cases

Healthcare Research

  • Discover new CDC and NIH datasets for epidemiological studies
  • Build metadata catalogs of available health data sources
  • Track dataset updates to ensure analyses use current data
  • Identify high-download datasets for literature review context

Public Health Monitoring

  • Monitor new HHS publications weekly for surveillance data
  • Track COVID-19 and infectious disease dataset updates
  • Catalog hospital quality and safety datasets by state
  • Build notification systems for new health data releases

Policy and Journalism

  • Find data sources for health policy analysis and reporting
  • Track which health agencies are publishing the most data
  • Identify trending datasets by view and download counts
  • Build evidence bases for policy recommendations

Data Engineering

  • Catalog available APIs and downloadable files for pipeline planning
  • Monitor dataset freshness and update frequency
  • Build automated ingestion workflows triggered by new publications
  • Track license types across datasets for compliance
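
As one concrete take on the license-tracking idea above, the exported records can be tallied by their `license` field. This is a minimal sketch: `license_summary` is a hypothetical helper, and the records and license strings below are invented for illustration.

```python
from collections import Counter


def license_summary(items: list[dict]) -> Counter:
    """Count records per license value, grouping missing values together."""
    return Counter(
        (item.get("license") or "unspecified")
        for item in items
        if not item.get("error")  # skip records that failed to process
    )


# Made-up records for illustration only.
records = [
    {"datasetId": "aaaa-1111", "license": "Public Domain"},
    {"datasetId": "bbbb-2222", "license": "Public Domain"},
    {"datasetId": "cccc-3333", "license": None},
    {"datasetId": "dddd-4444", "error": "processing failed"},
]
summary = license_summary(records)
for name, count in summary.most_common():
    print(f"{name}: {count}")
```

Records without a license are counted under "unspecified" so that gaps show up in a compliance review instead of being dropped silently.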


🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Empirical datasets for papers, thesis work, and coursework
  • Longitudinal studies tracking changes across snapshots
  • Reproducible research with cited, versioned data pulls
  • Classroom exercises on data analysis and ethical scraping

🎨 Personal and creative

  • Side projects, portfolio demos, and indie app launches
  • Data visualizations, dashboards, and infographics
  • Content research for bloggers, YouTubers, and podcasters
  • Hobbyist collections and personal trackers

🤝 Non-profit and civic

  • Transparency reporting and accountability projects
  • Advocacy campaigns backed by public-interest data
  • Community-run databases for local issues
  • Investigative journalism on public records

🧪 Experimentation

  • Prototype AI and machine-learning pipelines with real data
  • Validate product-market hypotheses before engineering spend
  • Train small domain-specific models on niche corpora
  • Test dashboard concepts with live input

❓ Frequently Asked Questions

💳 Do I need a paid Apify plan to run this actor?

No. You can start right now on the free Apify plan, which includes $5 in free monthly credit. That is enough to run this actor several times and explore the output before committing to anything. Paid plans unlock higher limits, more concurrent runs, and larger datasets. Create a free Apify account here to get started.

🚨 What happens if my run fails or returns no results?

Failed runs are not charged. If the source site changes, proxies get rate-limited, or a specific input matches nothing, re-run the actor or open our contact form and we will investigate. You can also check the run log in the Apify console to see why the run stopped.

📏 How many items can I scrape per run?

Free users are limited to 10 items per run so you can preview the output and confirm the actor works for your use case. Paid users can raise maxItems up to 1,000,000 per run. Upgrade here if you need full scale.

🕒 How fresh is the data?

Every run fetches live data at the moment of execution. There is no cache or delay: the records you get reflect what the source returned at that moment. Schedule the actor to maintain a rolling snapshot of the data you need.

🧑‍💻 Can I call this actor from my own code?

Yes. Apify exposes every actor as a REST endpoint and ships first-class SDKs for Node.js and Python. You can start a run, read the dataset, and handle webhooks from your own app in a few lines. All you need is your Apify API token.

📤 How do I export the data?

Every Apify dataset can be downloaded in one click from the console as CSV, JSON, JSONL, Excel, HTML, XML, or RSS. You can also pull results programmatically via the Apify API or stream them into BigQuery, S3, and other destinations through built-in integrations.
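
For programmatic exports, the sketch below builds a download URL for a dataset in a chosen format. It assumes the Apify API v2 dataset items endpoint and its `format` and `limit` query parameters; `dataset_items_url` is a hypothetical helper and `abc123` is a placeholder ID, so check the official API reference before relying on it.

```python
from typing import Optional
from urllib.parse import urlencode

# Formats mentioned above, using the API's short names (Excel is "xlsx").
EXPORT_FORMATS = {"json", "jsonl", "csv", "xlsx", "html", "xml", "rss"}


def dataset_items_url(dataset_id: str, fmt: str = "csv", limit: Optional[int] = None) -> str:
    """Build the items URL for a dataset in the requested export format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"Unsupported export format: {fmt!r}")
    params: dict = {"format": fmt}
    if limit is not None:
        params["limit"] = limit
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode(params)}"


# "abc123" is a placeholder; use the defaultDatasetId of your own run.
url = dataset_items_url("abc123", "xlsx", limit=50)
print(url)
```

When you request the URL, send your API token in an `Authorization: Bearer` header rather than embedding it in the query string, so it does not end up in logs or shared links.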

📅 Can I schedule the actor to run automatically?

Yes. Use the Apify scheduler to run the actor on any cadence, from hourly to monthly. Results are saved to your dataset and can be delivered to webhooks, email, Slack, cloud storage, or automation tools such as Zapier and Make.


🔌 Automating HealthData.gov Scraper

Node.js example:

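// Top-level await requires an ES module (for example, a .mjs file)
import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token
const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the Actor and wait for the run to finish
const run = await client.actor('parseforge/healthdata-scraper').call({
    q: 'vaccination',
    maxItems: 50,
    sortBy: 'newest',
});

// Read the records from the run's default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);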

Python example:

from apify_client import ApifyClient

# Authenticate with your Apify API token
client = ApifyClient('YOUR_API_TOKEN')

# Start the Actor and wait for the run to finish
run = client.actor('parseforge/healthdata-scraper').call(run_input={
    'q': 'vaccination',
    'maxItems': 50,
    'sortBy': 'newest',
})

# Read the records from the run's default dataset
items = list(client.dataset(run['defaultDatasetId']).iterate_items())
print(items)

Schedules: Set up daily or weekly runs to detect new health datasets as they are published. Combine with Slack or email integrations to get notified whenever new data matches your search criteria.
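
One way to turn scheduled runs into "new dataset" alerts is to compare the `datasetId` values of the latest run with those of the previous one. The snippet is a sketch under assumptions: `new_records` is a hypothetical helper, the records are made up, and loading the two result sets (for example from two Apify datasets) is left to you.

```python
def new_records(previous: list[dict], current: list[dict]) -> list[dict]:
    """Return records from the current run whose datasetId was not seen before."""
    seen = {item["datasetId"] for item in previous if item.get("datasetId")}
    return [
        item for item in current
        if item.get("datasetId") and item["datasetId"] not in seen
    ]


# Made-up records standing in for the output of two consecutive runs.
last_week = [{"datasetId": "aaaa-1111"}, {"datasetId": "bbbb-2222"}]
this_week = [
    {"datasetId": "bbbb-2222"},
    {"datasetId": "cccc-3333", "datasetName": "Example C"},
    {"error": "processing failed"},  # no datasetId, so it is ignored
]
fresh = new_records(last_week, this_week)
print([item["datasetId"] for item in fresh])
```

Only the entries in `fresh` need to be forwarded to Slack or email, which keeps notifications limited to datasets that were not in the earlier snapshot.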

🔌 Integrate with any app

  • Make - Automate health data workflows and route datasets to your team
  • Zapier - Connect to 5,000+ apps and trigger actions on new health data
  • Slack - Get notifications when new datasets match your criteria
  • Airbyte - Stream health data metadata into your data warehouse
  • GitHub - Version control your scraper configurations
  • Google Drive - Export results directly to Google Sheets

| Actor | Description |
| --- | --- |
| GSA eLibrary Scraper | Collect government contractor and vendor data from the GSA eLibrary |
| USAspending Scraper | Extract federal spending data and contract information |
| PR Newswire Scraper | Collect press releases and news articles from PR Newswire |
| FINRA BrokerCheck Scraper | Search broker and firm registration data from the FINRA registry |
| FAA Aircraft Registry Scraper | Look up aircraft registration records by N-number from the FAA |

💡 Pro Tip: Combine the HealthData.gov Scraper with the USAspending Scraper to cross-reference health datasets with federal health spending records.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue. We typically respond within 24 hours.


Disclaimer: This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the U.S. Department of Health and Human Services, HealthData.gov, CDC, FDA, CMS, or NIH. All trademarks mentioned are the property of their respective owners.