USA HealthData.gov HHS Open Data Scraper
Pricing: Pay per event
Collect health data catalog information from HealthData.gov. Filter by category, tags, view type, authority, and search terms to find exactly what you need. Built for researchers, data analysts, and healthcare professionals who need to discover and access public health datasets efficiently.
Developer: ParseForge

📊 USA HealthData.gov HHS Open Data Scraper
🚀 Collect health datasets, stories, charts, and maps from the U.S. HHS Open Data Catalog. Filter by category, tags, view type, and authority. No coding and no API keys required.
🕒 Last updated: 2026-04-16 · 📊 40 fields · 🏥 HHS Open Data · 📂 Datasets, Stories, Charts, Maps
The HealthData.gov Scraper automates the discovery and collection of health datasets from the U.S. Department of Health and Human Services Open Data Catalog. Each record includes the dataset name, unique ID, description, publisher, contact information, categories, tags, view and download counts, license details, file download links, and timestamps. You can filter by keyword, category (CDC, FDA, CMS, NIH, HHS), tags, view type (datasets, stories, charts, maps, files), authority (official or community), and sort order. Free users can collect up to 10 items per run, while paid users can retrieve up to 1,000,000 records.
Whether you are a healthcare researcher tracking new CDC datasets, a data scientist building a catalog of open health data, or a policy analyst monitoring HHS publications, this tool eliminates the manual browsing that HealthData.gov requires. Results export to JSON, CSV, or Excel, making it easy to load records into your database, BI tool, or analysis pipeline. Schedule recurring runs to automatically detect new datasets as they are published. The scraper handles pagination, normalizes metadata fields, and processes multiple content types including datasets, stories, charts, maps, and downloadable files.
| Target Audience | Use Cases |
|---|---|
| Healthcare Researchers | Discover and catalog open health datasets for analysis |
| Data Scientists | Build metadata indexes of available HHS data sources |
| Policy Analysts | Monitor new publications from CDC, FDA, CMS, and NIH |
| Public Health Teams | Track epidemiological datasets and surveillance data |
| Journalists | Find health data for investigative reporting |
| Academic Institutions | Locate research datasets for grant-funded projects |
📋 What the HealthData.gov Scraper does
- 📝 Dataset names and IDs - capture the title, unique identifier, and description for every item in the HHS catalog
- 🔗 Direct URLs - collect working links to each dataset page for quick access and verification
- 📊 Engagement metrics - pull view counts and download counts to identify the most popular datasets
- 👤 Publisher and contact info - identify which health authority published the data and how to reach them
- 🏷️ Categories and tags - classify items by health topic, authority (CDC, FDA, CMS), and custom tags
- 📁 File downloads - extract download links with format and size information for each available file
The scraper connects to the HealthData.gov catalog API and iterates through results using your specified filters. It processes datasets, stories, charts, maps, files, and calendars. Each record is normalized with consistent field names and pushed to an Apify dataset in real time. The tool supports both URL-based browsing (paste a HealthData.gov browse URL) and filter-based searching (set keywords and categories directly).
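The catalog walk described above can be pictured as a loop over offset-based pages. The sketch below is illustrative only and is not the actor's source code. It assumes that HealthData.gov, which appears to run on the Socrata platform, is queried through the Socrata Discovery API at `https://api.us.socrata.com/api/catalog/v1`, and that the actor's `q`, `category`, `tags`, `limitTo`, and `authority` inputs map onto that API's `q`, `categories`, `tags`, `only`, and `provenance` parameters. The endpoint, that mapping, the page size of 100, and the helper names `build_page_params` and `page_urls` are all assumptions.

```python
from urllib.parse import urlencode

# Assumed endpoint: the Socrata Discovery API, restricted to the healthdata.gov domain.
CATALOG_API = "https://api.us.socrata.com/api/catalog/v1"


def build_page_params(filters: dict, offset: int, limit: int) -> dict:
    """Translate actor-style filters into one page of (assumed) Discovery API params."""
    params = {"domains": "healthdata.gov", "offset": offset, "limit": limit}
    if filters.get("q"):
        params["q"] = filters["q"]
    if filters.get("category"):
        params["categories"] = filters["category"]
    if filters.get("tags"):
        # The actor takes comma-separated tags; this sketch passes them through unchanged.
        params["tags"] = filters["tags"]
    if filters.get("limitTo"):
        # Assumed mapping: "Datasets" -> only=datasets.
        params["only"] = filters["limitTo"].lower()
    if filters.get("authority"):
        # Assumed mapping: "Official"/"Community" -> provenance=official/community.
        params["provenance"] = filters["authority"].lower()
    return params


def page_urls(filters: dict, max_items: int, page_size: int = 100) -> list[str]:
    """Return one request URL per page until max_items records would be covered."""
    urls = []
    offset = 0
    while offset < max_items:
        limit = min(page_size, max_items - offset)
        urls.append(f"{CATALOG_API}?{urlencode(build_page_params(filters, offset, limit))}")
        offset += limit
    return urls
```

For example, `page_urls({"q": "vaccination"}, max_items=250)` yields three URLs with offsets 0, 100, and 200, the last one asking for 50 records. A real client would also stop as soon as a page comes back with fewer results than requested, since the catalog may hold fewer matches than `maxItems`.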
💡 Why it matters: HealthData.gov hosts thousands of datasets from dozens of health agencies. Manually browsing and cataloging this content is time-consuming. This scraper gives you structured metadata for the entire catalog in minutes.
🎬 Full Demo
🚧 Coming soon...
⚙️ Input
| Field | Type | Required | Description |
|---|---|---|---|
| startUrl | string | No | Direct URL to a HealthData.gov browse page. Use this OR search filters, not both. |
| maxItems | integer | No | Maximum items to collect. Free: 10. Paid: up to 1,000,000. |
| q | string | No | Search term to find datasets (e.g., "diabetes", "vaccination"). |
| category | string | No | Filter by category: CDC, FDA, CMS, HHS, NIH, Hospital, State. |
| tags | string | No | Filter by tags (comma-separated values). |
| limitTo | string | No | Content type: Datasets, Stories, Charts, Maps, Forms, Files, Calendars. |
| authority | string | No | Official health agency data or community-contributed content. |
| sortBy | string | No | Sort order: newest, alpha, most_accessed, relevance, recently_updated. |
Example 1: Browse newest datasets
```json
{
  "startUrl": "https://healthdata.gov/browse?sortBy=newest&page=1&pageSize=20",
  "maxItems": 50
}
```
Example 2: Search for vaccination data
```json
{
  "q": "vaccination",
  "category": "Health",
  "limitTo": "datasets",
  "sortBy": "most_accessed",
  "maxItems": 100
}
```
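Example 1 passes a HealthData.gov browse URL as `startUrl`. A minimal standard-library sketch of how the query string of such a URL could be split into filter values follows; the helper name `browse_url_to_filters` is hypothetical, and the actor may interpret `startUrl` differently.

```python
from urllib.parse import urlparse, parse_qs


def browse_url_to_filters(start_url: str) -> dict:
    """Split a healthdata.gov/browse URL into a flat dict of its query parameters."""
    parsed = urlparse(start_url)
    if parsed.netloc not in ("healthdata.gov", "www.healthdata.gov"):
        raise ValueError(f"not a HealthData.gov URL: {start_url}")
    if not parsed.path.startswith("/browse"):
        raise ValueError(f"not a browse page: {start_url}")
    # parse_qs returns a list per key; keep the first value of each parameter.
    return {key: values[0] for key, values in parse_qs(parsed.query).items()}


filters = browse_url_to_filters("https://healthdata.gov/browse?sortBy=newest&page=1&pageSize=20")
print(filters)  # {'sortBy': 'newest', 'page': '1', 'pageSize': '20'}
```

Values come back as strings, so `page` and `pageSize` would need an `int()` conversion before any arithmetic.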
⚠️ Good to Know: Free users are automatically limited to 10 items per run. Use either `startUrl` OR the search filters (`q`, `category`, `tags`), not both at the same time. The `limitTo` field lets you focus on specific content types like datasets or charts.
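The rules in this note (one input mode at a time, a 10-item cap on free runs, and a fixed set of sort orders from the input table) can be checked before a run is started. This is a hypothetical client-side pre-flight sketch: `validate_input` and its `is_paid` flag are illustrative names, and the actor itself may enforce these rules differently. Following the note, an oversized `maxItems` is capped rather than rejected, while a `startUrl` combined with search filters raises an error.

```python
SORT_ORDERS = {"newest", "alpha", "most_accessed", "relevance", "recently_updated"}
FILTER_KEYS = ("q", "category", "tags")
FREE_LIMIT = 10
PAID_LIMIT = 1_000_000


def validate_input(run_input: dict, is_paid: bool) -> dict:
    """Return a checked copy of run_input, or raise ValueError on a conflicting input."""
    checked = dict(run_input)
    if checked.get("startUrl") and any(checked.get(key) for key in FILTER_KEYS):
        raise ValueError("use startUrl OR search filters (q, category, tags), not both")
    sort_by = checked.get("sortBy")
    if sort_by is not None and sort_by not in SORT_ORDERS:
        raise ValueError(f"unknown sortBy: {sort_by!r}")
    if "maxItems" in checked:
        # Cap (not reject) oversized requests, mirroring the free-tier note.
        cap = PAID_LIMIT if is_paid else FREE_LIMIT
        checked["maxItems"] = min(checked["maxItems"], cap)
    return checked
```

With this check, Example 2 passes unchanged on a paid plan, while on a free plan its `maxItems` of 100 is reduced to 10.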
📊 Output
🧾 Schema
| Emoji | Field | Type | Description |
|---|---|---|---|
| 📝 | datasetId | string | Unique identifier for the dataset |
| 🏷️ | datasetName | string | Title of the dataset or resource |
| 🔗 | datasetUrl | string | Direct link to the dataset page |
| 📄 | description | string | Full description of the dataset |
| 👤 | publisher | string | Agency or organization that published the data |
| 📧 | contactEmail | string | Contact email for the dataset publisher |
| 🏷️ | categories | array | Topic categories assigned to the dataset |
| 🔖 | tags | array | Topic tags for filtering and discovery |
| 📊 | viewCount | number | Total number of views |
| 📥 | downloadCount | number | Total number of downloads |
| 📜 | license | string | License type for the dataset |
| 📅 | createdAt | string | Date the dataset was first published |
| 📅 | publicationDate | string | Official publication date |
| 🔄 | lastUpdated | string | Most recent update timestamp |
| 📁 | downloads | array | Available file downloads with format and size |
| 🕐 | scrapedAt | string | Timestamp of data collection |
| ⚠️ | error | string | Error message if processing failed |
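Because every record carries `downloadCount` and an optional `error` field, a downloaded result set is easy to post-process. A minimal sketch that drops failed records and ranks the rest by downloads; it assumes the items are plain dicts shaped like the schema above, and `top_by_downloads` is a hypothetical helper name.

```python
def top_by_downloads(items: list[dict], n: int = 5) -> list[tuple[str, int]]:
    """Return (datasetName, downloadCount) for the n most-downloaded successful records."""
    # Records with a non-empty error field failed processing; leave them out.
    ok = [item for item in items if not item.get("error")]
    # Records without a downloadCount are treated as having 0 downloads.
    ranked = sorted(ok, key=lambda item: item.get("downloadCount") or 0, reverse=True)
    return [(item.get("datasetName", ""), item.get("downloadCount") or 0) for item in ranked[:n]]
```

The same pattern works for `viewCount` by swapping the key.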
📦 Sample records
✨ Why choose this Actor
| Feature | This Actor | Alternatives |
|---|---|---|
| Filter by health authority (CDC, FDA, CMS, NIH) | Yes | No |
| Multiple content types (datasets, stories, charts, maps) | Yes | Datasets only |
| View and download count metrics | Yes | Rarely included |
| Publisher and contact information | Yes | No |
| File download links with format info | Yes | No |
| Up to 1,000,000 results per run | Yes | Capped lower |
| Export to JSON, CSV, and Excel | Yes | JSON only |
📊 HealthData.gov hosts thousands of health datasets from over a dozen federal agencies. This scraper gives you structured access to the full catalog with engagement metrics and file download links.
📈 How it compares to alternatives
| Capability | This Actor | Manual Browsing | Generic Web Scrapers |
|---|---|---|---|
| Health-specific filters (category, tags, authority) | Yes | Yes | No |
| Engagement metrics (views, downloads) | Yes | Visible per page | No |
| Automatic pagination | Yes | No | Partial |
| Multiple content types in one run | Yes | Manual switching | No |
| Scheduled recurring runs | Yes | No | Varies |
| No coding required | Yes | Yes | No |
This scraper is purpose-built for HealthData.gov and handles the catalog's specific API structure, content types, and metadata fields out of the box.
🚀 How to use
- Sign up - Create a free Apify account with $5 credit
- Find the Actor - Search for "HealthData.gov Scraper" in the Apify Store
- Configure your search - Set keywords, category, content type, and max items
- Start the run - Click "Start" and watch results appear in real time
- Export your data - Download as JSON, CSV, or Excel from the dataset tab
🕒 Typical run time: 30 seconds to 2 minutes for up to 50 items. Larger runs with 500+ items may take 5 to 15 minutes.
💼 Business use cases
- Healthcare Research
- Public Health Monitoring
- Policy and Journalism
- Data Engineering
🌟 Beyond business use cases
Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.
❓ Frequently Asked Questions
💳 Do I need a paid Apify plan to run this actor?
No. You can start right now on the free Apify plan, which includes $5 in free monthly credit. That is enough to run this actor several times and explore the output before committing to anything. Paid plans unlock higher limits, more concurrent runs, and larger datasets. Create a free Apify account to get started.
🚨 What happens if my run fails or returns no results?
Failed runs are not charged. If the source site changes, proxies get rate-limited, or a specific input matches nothing, re-run the actor or open our contact form and we will investigate. You can also check the run log in the Apify console to see why the run stopped.
📏 How many items can I scrape per run?
Free users are limited to 10 items per run so you can preview the output and confirm the actor works for your use case. Paid users can raise maxItems up to 1,000,000 per run. Upgrade to a paid plan if you need full-scale runs.
🕒 How fresh is the data?
Every run fetches live data at the moment of execution. There is no cache or delay: the records you get reflect what the source returned at that moment. Schedule the actor to maintain a rolling snapshot of the data you need.
🧑‍💻 Can I call this actor from my own code?
Yes. Apify exposes every actor as a REST endpoint and ships first-class SDKs for Node.js and Python. You can start a run, read the dataset, and handle webhooks from your own app in a few lines. All you need is your Apify API token.
📤 How do I export the data?
Every Apify dataset can be downloaded in one click from the console as CSV, JSON, JSONL, Excel, HTML, XML, or RSS. You can also pull results programmatically via the Apify API or stream them into BigQuery, S3, and other destinations through built-in integrations.
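For programmatic export, the Apify API serves dataset items through its `GET /v2/datasets/{datasetId}/items` endpoint, with the output format chosen by the `format` query parameter (`xlsx` corresponds to Excel). The sketch below only builds the request; `dataset_items_request` is a hypothetical helper, and sending the token in an `Authorization: Bearer` header instead of the query string is a choice made here to keep it out of URLs and logs.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"
FORMATS = {"json", "jsonl", "csv", "xlsx", "html", "xml", "rss"}


def dataset_items_request(dataset_id: str, token: str, fmt: str = "csv") -> Request:
    """Build (but do not send) a GET request for a dataset's items in the given format."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    url = f"{API_BASE}/datasets/{dataset_id}/items?{urlencode({'format': fmt})}"
    # The token travels in a header so it does not end up in the URL.
    return Request(url, headers={"Authorization": f"Bearer {token}"})
```

To download, pass the request to `urllib.request.urlopen` and read the response body; `dataset_id` is the `defaultDatasetId` of a finished run.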
📅 Can I schedule the actor to run automatically?
Yes. Use the Apify scheduler to run the actor on any cadence, from hourly to monthly. Results are saved to your dataset and can be delivered to webhooks, email, Slack, cloud storage, or automation tools such as Zapier and Make.
🔌 Automating HealthData.gov Scraper
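Node.js example (run as an ES module so top-level `await` is allowed):
```javascript
import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token.
const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('parseforge/healthdata-scraper').call({
    q: 'vaccination',
    maxItems: 50,
    sortBy: 'newest',
});

// Read the records from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```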
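Python example:
```python
from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient('YOUR_API_TOKEN')

# Start the actor and wait for the run to finish.
run = client.actor('parseforge/healthdata-scraper').call(run_input={
    'q': 'vaccination',
    'maxItems': 50,
    'sortBy': 'newest',
})

# Read every record from the run's default dataset.
items = list(client.dataset(run['defaultDatasetId']).iterate_items())
print(items)
```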
Schedules: Set up daily or weekly runs to detect new health datasets as they are published. Combine with Slack or email integrations to get notified whenever new data matches your search criteria.
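Detecting new datasets between scheduled runs comes down to comparing `datasetId` values against the previous snapshot. A minimal sketch, assuming you persist the previous run's IDs yourself (for example, in a file or key-value store); `new_datasets` is a hypothetical helper name.

```python
def new_datasets(previous_ids: set[str], current_items: list[dict]) -> list[dict]:
    """Return records from the current run whose datasetId was not seen before."""
    seen = set(previous_ids)  # copy so the caller's snapshot is not modified
    fresh = []
    for item in current_items:
        dataset_id = item.get("datasetId")
        # Skip records without an ID, failed records, and duplicates within this run.
        if not dataset_id or item.get("error") or dataset_id in seen:
            continue
        seen.add(dataset_id)
        fresh.append(item)
    return fresh
```

Sending a Slack or email notification only when the returned list is non-empty keeps alerts limited to genuinely new publications.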
🔌 Integrate with any app
- Make - Automate health data workflows and route datasets to your team
- Zapier - Connect to 5,000+ apps and trigger actions on new health data
- Slack - Get notifications when new datasets match your criteria
- Airbyte - Stream health data metadata into your data warehouse
- GitHub - Version control your scraper configurations
- Google Drive - Export results directly to Google Sheets
🔗 Recommended Actors
| Actor | Description |
|---|---|
| GSA eLibrary Scraper | Collect government contractor and vendor data from the GSA eLibrary |
| USAspending Scraper | Extract federal spending data and contract information |
| PR Newswire Scraper | Collect press releases and news articles from PR Newswire |
| FINRA BrokerCheck Scraper | Search broker and firm registration data from the FINRA registry |
| FAA Aircraft Registry Scraper | Look up aircraft registration records by N-number from the FAA |
💡 Pro Tip: Combine the HealthData.gov Scraper with the USAspending Scraper to cross-reference health datasets with federal health spending records.
🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue. We typically respond within 24 hours.
Disclaimer: This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the U.S. Department of Health and Human Services, HealthData.gov, CDC, FDA, CMS, or NIH. All trademarks mentioned are the property of their respective owners.
