# CDC Health Statistics Scraper (`compute-edge/cdc-health-statistics-scraper`) Actor

Extract public health data from CDC's open data portal (data.cdc.gov). Access mortality causes, COVID-19 deaths, vaccination coverage, chronic disease indicators, birth rates, foodborne outbreaks, and diabetes surveillance. 8 curated datasets with filtering, sorting, and full pagination via the Socr

- **URL**: https://apify.com/compute-edge/cdc-health-statistics-scraper.md
- **Developed by:** [Compute Edge](https://apify.com/compute-edge) (community)
- **Categories:** Lead generation, Automation
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $2.00 / 1,000 results

This Actor is paid per event. You are not charged for the Apify platform usage, but only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## CDC Health Statistics Scraper

Extract **public health data** from the **CDC's open data portal** (data.cdc.gov). Access mortality causes, COVID-19 death counts, vaccination coverage, chronic disease indicators, birth rates, foodborne outbreaks, and diabetes surveillance data — 8 curated datasets with powerful filtering, sorting, and full pagination via the Socrata SODA API.

This Actor is designed for **public health researchers**, **epidemiologists**, **data journalists**, **healthcare analytics teams**, and anyone building applications that rely on authoritative US health statistics. Get structured JSON output ready for dashboards, statistical analysis, or RAG pipelines.

### Key Features

| Feature | Description |
|---------|-------------|
| **8 CDC datasets** | Mortality, COVID-19, vaccinations, chronic disease, births, foodborne outbreaks, diabetes |
| **Full-text search** | Search across all fields with keyword queries |
| **SQL-like filtering** | Use Socrata $where clauses for precise filtering |
| **Flexible sorting** | Sort by any field, ascending or descending |
| **Automatic pagination** | Fetches all pages up to your maxResults limit |
| **No API key required** | Uses the free public Socrata SODA API |
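
The automatic pagination listed above can be pictured as an offset/limit loop. The following is a minimal sketch, not the Actor's actual implementation: `fetch_page` is a hypothetical callable standing in for a SODA request with `$limit` and `$offset`, and the page size of 1000 is an assumption.

```python
from typing import Callable, Iterator, List


def paginate(
    fetch_page: Callable[[int, int], List[dict]],
    max_results: int = 5000,
    page_size: int = 1000,
) -> Iterator[dict]:
    """Yield records page by page until max_results is reached.

    max_results == 0 means "no limit": continue until a short or empty page.
    """
    offset = 0
    while True:
        limit = page_size
        if max_results > 0:
            remaining = max_results - offset
            if remaining <= 0:
                return
            limit = min(page_size, remaining)
        page = fetch_page(offset, limit)
        yield from page
        if len(page) < limit:  # a short page means the data is exhausted
            return
        offset += len(page)


# In-memory stand-in for the remote API (hypothetical data, no network access).
_rows = [{"id": i} for i in range(2500)]


def fake_fetch(offset: int, limit: int) -> List[dict]:
    return _rows[offset:offset + limit]
```

With the stand-in above, `list(paginate(fake_fetch, max_results=0))` walks all 2,500 rows, while a positive `max_results` stops early and trims the final page request.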

### Available Datasets

| Dataset | Description |
|---------|-------------|
| **Leading Causes of Death** | Age-adjusted death rates by cause, state, and year |
| **COVID-19 Deaths by Demographics** | Provisional COVID-19 death counts by age, sex, and race |
| **Vaccination Coverage (Children)** | National and state-level childhood vaccination rates |
| **Chronic Disease Indicators** | State-level chronic disease indicators and risk factors |
| **Birth Rates** | Birth rates by age of mother, race, and Hispanic origin |
| **Underlying Cause of Death (ICD-10)** | Death counts and rates by ICD-10 codes |
| **Foodborne Disease Outbreaks** | National foodborne outbreak surveillance data |
| **Diabetes Prevalence by County** | County-level diagnosed diabetes prevalence |

### How to Scrape CDC Health Statistics

1. **Go to this Actor's page** on the Apify Store
2. **Click "Start"** to open the input configuration form
3. **Select a Dataset** — choose from 8 CDC datasets
4. **Set search query** (optional) — e.g., "heart disease" or "California"
5. **Set filter expression** (optional) — e.g., `year='2023' AND state='California'`
6. **Set sort field** (optional) — sort by any field in the dataset
7. **Set Max Results** — default: 5000, set to 0 for unlimited
8. **Click "Start"** to run the Actor
9. **Download your data** in JSON, CSV, or Excel from the Dataset tab

### Input Example

```json
{
    "dataset": "mortality_causes",
    "searchQuery": "heart disease",
    "filterExpression": "year='2021'",
    "sortField": "deaths",
    "sortDescending": true,
    "maxResults": 100
}
````
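
Before starting a run, it can help to validate the input locally. The sketch below checks an input object against the constraints published in this Actor's input schema (the required `dataset` field, its allowed values, and `maxResults >= 0`). The `validate_input` helper is hypothetical and not part of any Apify library.

```python
# Allowed values copied from the "dataset" enum in the Actor's input schema.
ALLOWED_DATASETS = {
    "mortality_causes", "covid_deaths", "vaccination_coverage",
    "chronic_disease", "natality", "wonder_mortality",
    "foodborne_outbreaks", "diabetes_atlas",
}


def validate_input(actor_input: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []

    dataset = actor_input.get("dataset")
    if dataset is None:
        errors.append("dataset is required")
    elif dataset not in ALLOWED_DATASETS:
        errors.append(f"unknown dataset: {dataset!r}")

    # bool is a subclass of int in Python, so reject it explicitly.
    max_results = actor_input.get("maxResults", 5000)
    if isinstance(max_results, bool) or not isinstance(max_results, int) or max_results < 0:
        errors.append("maxResults must be a non-negative integer")

    for key in ("searchQuery", "filterExpression", "sortField"):
        if key in actor_input and not isinstance(actor_input[key], str):
            errors.append(f"{key} must be a string")

    if "sortDescending" in actor_input and not isinstance(actor_input["sortDescending"], bool):
        errors.append("sortDescending must be a boolean")

    return errors
```

Calling `validate_input` on the Input Example returns an empty list; an input without `dataset` returns a single error message.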

### Output Example

```json
{
    "year": "2021",
    "causeName": "Heart disease",
    "state": "California",
    "deaths": "65432",
    "aadr": "142.5",
    "source": "NCHS - Leading Causes of Death, United States"
}
```
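
In the example record above, numeric values such as `deaths` and `aadr` arrive as strings. A minimal sketch of converting them for analysis follows. It assumes the field names shown in the example, which may differ between datasets, and the helper name is illustrative.

```python
def coerce_numeric(record: dict, int_fields=("deaths",), float_fields=("aadr",)) -> dict:
    """Return a copy of the record with selected string fields parsed as numbers.

    Values that cannot be parsed are left unchanged.
    """
    out = dict(record)
    for name, parse in [(n, int) for n in int_fields] + [(n, float) for n in float_fields]:
        if name in out:
            try:
                out[name] = parse(out[name])
            except (TypeError, ValueError):
                pass  # keep the original value, e.g. a suppressed or missing count
    return out
```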

### Pricing

This Actor uses **pay-per-result** pricing:

| Event | Price |
|-------|-------|
| Actor Start | $0.00005 |
| Per result | $0.002 |

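A typical run of 1,000 records costs approximately $2.00 in Actor fees (1,000 results × $0.002, plus a single $0.00005 start event). Because this Actor is paid per event, Apify platform usage is not charged separately.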
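
The arithmetic behind that estimate can be written as a small sketch using the two event prices from the table above. The function name is illustrative, and `Decimal` is used only to avoid floating-point rounding in currency values.

```python
from decimal import Decimal

ACTOR_START_USD = Decimal("0.00005")  # "Actor Start" event price
PER_RESULT_USD = Decimal("0.002")     # "Per result" event price


def estimate_actor_fees(results: int, runs: int = 1) -> Decimal:
    """Estimate Actor fees in USD for a number of results across one or more runs."""
    if results < 0 or runs < 0:
        raise ValueError("results and runs must be non-negative")
    return ACTOR_START_USD * runs + PER_RESULT_USD * results
```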

### Use Cases

- **Public health research** — Analyze mortality trends, disease prevalence, and vaccination coverage
- **Epidemiology** — Study cause-of-death patterns by state, age group, and demographic
- **Healthcare analytics** — Feed CDC data into dashboards and reporting tools
- **Data journalism** — Build visualizations of health trends for news stories
- **Policy analysis** — Compare health indicators across states for policy research
- **RAG/LLM pipelines** — Structured health data ready for AI-powered analysis

### FAQ

#### Is it legal to scrape CDC data?

Yes. This Actor uses the official CDC Socrata Open Data API, which provides free public access to CDC datasets. The data is published by a U.S. government agency for public use. No authentication is required.

#### How Much Does It Cost to Scrape CDC Health Data?

See the pricing table above. At $0.002 per result, fetching 5,000 mortality records costs approximately $10.00 in Actor fees.

#### Can I export CDC data to Excel or CSV?

Yes. Apify supports exporting results in JSON, CSV, Excel, XML, and other formats directly from the Dataset tab after a run completes.
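
If you fetch items through the API instead of the Dataset tab, a CSV can also be produced locally with the standard library. This is a sketch under the assumption that `items` is a list of flat dictionaries like the Output Example; the helper name is hypothetical.

```python
import csv
import io


def items_to_csv(items: list) -> str:
    """Serialize a list of flat dict records to CSV text."""
    # Collect columns in first-seen order so records with differing keys still fit.
    fieldnames = []
    for item in items:
        for key in item:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)  # missing keys are filled with restval
    return buffer.getvalue()
```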

#### How often is the CDC data updated?

Update frequency varies by dataset. Mortality data is updated annually, COVID-19 data is updated weekly, and vaccination data is updated periodically. You can schedule this Actor to run at any interval.

#### What filter expressions can I use?

The Actor supports Socrata $where clause syntax, which is similar to SQL. Examples: `year='2023'`, `state='Texas' AND deaths > 1000`, `cause_name LIKE '%cancer%'`. See the [Socrata SoQL documentation](https://dev.socrata.com/docs/queries/) for full syntax.
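
To see how these inputs relate to SODA query parameters, here is a hedged sketch that maps an input object onto `$q`, `$where`, `$order`, `$limit`, and `$offset`. The mapping is an assumption about how such an Actor could work, not its confirmed implementation, and `DATASET_ID` is a placeholder rather than a real CDC resource identifier.

```python
from urllib.parse import urlencode

DATASET_ID = "xxxx-xxxx"  # placeholder, NOT a real data.cdc.gov resource ID


def build_soda_url(actor_input: dict, limit: int = 1000, offset: int = 0) -> str:
    """Build a SODA query URL from Actor-style input fields (assumed mapping)."""
    params = {}
    if actor_input.get("searchQuery"):
        params["$q"] = actor_input["searchQuery"]
    if actor_input.get("filterExpression"):
        params["$where"] = actor_input["filterExpression"]
    if actor_input.get("sortField"):
        direction = "DESC" if actor_input.get("sortDescending", True) else "ASC"
        params["$order"] = f"{actor_input['sortField']} {direction}"
    params["$limit"] = limit
    params["$offset"] = offset
    # urlencode percent-encodes "$", quotes, and "=" inside values.
    return f"https://data.cdc.gov/resource/{DATASET_ID}.json?{urlencode(params)}"
```

Empty strings are skipped, so an input with only `filterExpression` set produces a URL with just `$where`, `$limit`, and `$offset`.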

### Other Scrapers by SeatSignal

- [FDA OpenFDA Scraper](https://apify.com/seatsignal/fda-openfda-scraper) — Extract FDA drug adverse events, labeling, and NDC data
- [FDA Food Recalls Scraper](https://apify.com/seatsignal/fda-food-recalls-scraper) — Extract FDA food recall enforcement reports
- [Medicare Provider Scraper](https://apify.com/seatsignal/medicare-provider-scraper) — Extract Medicare provider and facility data
- [NIH Grants Scraper](https://apify.com/seatsignal/nih-grants-scraper) — Extract NIH research grant awards data
- [ClinicalTrials Scraper](https://apify.com/seatsignal/clinicaltrials-scraper) — Extract clinical trial data from ClinicalTrials.gov

### Legal Disclaimer

This Actor accesses publicly available data from the CDC's open data portal (data.cdc.gov) via the Socrata SODA API. The data is published by the U.S. Centers for Disease Control and Prevention for public use. This Actor does not bypass any authentication or access controls. Users are responsible for ensuring their use of the data complies with applicable laws and regulations.

For questions or support, please open an issue on this Actor's page.

# Actor input Schema

## `dataset` (type: `string`):

Which CDC dataset to query.

## `searchQuery` (type: `string`):

Full-text search term across all fields (uses Socrata $q parameter). E.g., 'heart disease' or 'California'.

## `filterExpression` (type: `string`):

Socrata $where clause for filtering. SQL-like syntax. E.g., "year='2023' AND state='California'" or "deaths > 1000".

## `sortField` (type: `string`):

Field name to sort results by. Leave empty for API default ordering.

## `sortDescending` (type: `boolean`):

Sort in descending order. Only applies when Sort Field is set.

## `maxResults` (type: `integer`):

Maximum number of records to return. Set to 0 for unlimited.

## Actor input object example

```json
{
  "dataset": "mortality_causes",
  "sortDescending": true,
  "maxResults": 5000
}
```

# Actor output Schema

## `dataset` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input ("dataset" is the only required field in the input schema)
const input = {
    "dataset": "mortality_causes",
    "searchQuery": "",
    "filterExpression": "",
    "sortField": ""
};

// Run the Actor and wait for it to finish
const run = await client.actor("compute-edge/cdc-health-statistics-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input ("dataset" is the only required field in the input schema)
run_input = {
    "dataset": "mortality_causes",
    "searchQuery": "",
    "filterExpression": "",
    "sortField": "",
}

# Run the Actor and wait for it to finish
run = client.actor("compute-edge/cdc-health-statistics-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "dataset": "mortality_causes",
  "searchQuery": "",
  "filterExpression": "",
  "sortField": ""
}' |
apify call compute-edge/cdc-health-statistics-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=compute-edge/cdc-health-statistics-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "CDC Health Statistics Scraper",
        "description": "Extract public health data from CDC's open data portal (data.cdc.gov). Access mortality causes, COVID-19 deaths, vaccination coverage, chronic disease indicators, birth rates, foodborne outbreaks, and diabetes surveillance. 8 curated datasets with filtering, sorting, and full pagination via the Socr",
        "version": "0.0",
        "x-build-id": "ZFrfV33iiZxvzdctf"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/compute-edge~cdc-health-statistics-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-compute-edge-cdc-health-statistics-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/compute-edge~cdc-health-statistics-scraper/runs": {
            "post": {
                "operationId": "runs-sync-compute-edge-cdc-health-statistics-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/compute-edge~cdc-health-statistics-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-compute-edge-cdc-health-statistics-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "dataset"
                ],
                "properties": {
                    "dataset": {
                        "title": "Dataset",
                        "enum": [
                            "mortality_causes",
                            "covid_deaths",
                            "vaccination_coverage",
                            "chronic_disease",
                            "natality",
                            "wonder_mortality",
                            "foodborne_outbreaks",
                            "diabetes_atlas"
                        ],
                        "type": "string",
                        "description": "Which CDC dataset to query.",
                        "default": "mortality_causes"
                    },
                    "searchQuery": {
                        "title": "Search Query",
                        "type": "string",
                        "description": "Full-text search term across all fields (uses Socrata $q parameter). E.g., 'heart disease' or 'California'."
                    },
                    "filterExpression": {
                        "title": "Filter Expression",
                        "type": "string",
                        "description": "Socrata $where clause for filtering. SQL-like syntax. E.g., \"year='2023' AND state='California'\" or \"deaths > 1000\"."
                    },
                    "sortField": {
                        "title": "Sort Field",
                        "type": "string",
                        "description": "Field name to sort results by. Leave empty for API default ordering."
                    },
                    "sortDescending": {
                        "title": "Sort Descending",
                        "type": "boolean",
                        "description": "Sort in descending order. Only applies when Sort Field is set.",
                        "default": true
                    },
                    "maxResults": {
                        "title": "Max Results",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum number of records to return. Set to 0 for unlimited.",
                        "default": 5000
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
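
For projects without an official client, the `run-sync-get-dataset-items` path from the specification above can be called directly. The sketch below only constructs the request with Python's standard library; sending it (for example with `urllib.request.urlopen`) requires a valid token. The helper name is illustrative.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/compute-edge~cdc-health-statistics-scraper/run-sync-get-dataset-items"


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that runs the Actor synchronously."""
    url = f"{BASE_URL}{ACTOR_PATH}?{urlencode({'token': token})}"
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request("<YOUR_API_TOKEN>", {"dataset": "mortality_causes", "maxResults": 10})
# To execute the run and read the dataset items:
#   with urllib.request.urlopen(request) as response:
#       items = json.load(response)
```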
