Water Quality Data Scraper
Water Quality Data Scraper. Extract structured data with automatic pagination, proxy rotation, and JSON/CSV export. Pay only for results.
What does Water Quality Data Scraper do?
Water Quality Data Scraper is an Apify actor that extracts water quality reports, contamination data, and violation records from US Environmental Protection Agency (EPA) databases. It queries the EPA Envirofacts SDWIS (Safe Drinking Water Information System) API to retrieve water system information, regulatory violations, and lead/copper testing results for any US state. The actor produces structured datasets suited to public health research, environmental journalism, community advocacy, and regulatory compliance analysis.
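
To make the kind of request described above more concrete, here is a minimal sketch of how a state-filtered Envirofacts query URL might be assembled. It is not the actor's actual code. The base URL, the `ROWS/first:last/JSON` path layout, and the `WATER_SYSTEM` / `STATE_CODE` table and column names are all assumptions that should be checked against the current EPA Envirofacts documentation; `build_query_url` is a hypothetical helper.

```python
from urllib.parse import quote

# Assumption: base URL and path layout of the Envirofacts REST service.
# Verify both against the EPA documentation before relying on them.
BASE_URL = "https://data.epa.gov/efservice"


def build_query_url(table: str, column: str, value: str, max_results: int) -> str:
    """Build a URL that filters `table` on `column == value` (hypothetical helper)."""
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    last_row = max_results - 1  # assumed zero-based, inclusive row range
    parts = [table, column, value, "ROWS", f"0:{last_row}", "JSON"]
    # Keep ':' unescaped so the row range stays readable in the path.
    return BASE_URL + "/" + "/".join(quote(part, safe=":") for part in parts)


# Table and column names below are assumptions, not confirmed by this README.
url = build_query_url("WATER_SYSTEM", "STATE_CODE", "NY", 50)
print(url)
```

The sketch only builds the URL and sends no request, so it can be inspected safely before pointing any HTTP client at the EPA service.
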
Why use Water Quality Data Scraper?
Access to clean drinking water is a fundamental public health concern, and the EPA maintains extensive databases tracking water quality across the United States. However, navigating these databases directly is complex and time-consuming. This actor simplifies the process by making targeted API calls and delivering the data in a clean, structured format. Whether you are investigating water quality issues in your community, conducting academic research on environmental health, or building data visualizations for journalism, this tool provides the data foundation you need.
How to use Water Quality Data Scraper
- Open the actor on the Apify platform.
- Enter a two-letter US state code in the `stateCode` field (e.g., NY, CA, TX, FL).
- Set `maxResults` to control how many records to retrieve from each data source.
- Click Start to begin fetching data from the EPA.
- Download results from the Dataset tab in JSON, CSV, or Excel format.
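
The same steps can also be scripted instead of clicked through. The sketch below prepares a request for the Apify API's `run-sync-get-dataset-items` endpoint using only the Python standard library. The actor ID `username~water-quality-data-scraper` and the token value are placeholders, not the real identifiers of this actor, and `build_run_request` and `run_actor` are hypothetical helper names.

```python
import json
import urllib.request
from urllib.parse import urlencode

APIFY_API = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, state_code: str,
                      max_results: int = 50) -> urllib.request.Request:
    """Prepare a POST that runs the actor and returns its dataset items."""
    query = urlencode({"format": "json"})
    url = f"{APIFY_API}/acts/{actor_id}/run-sync-get-dataset-items?{query}"
    # The body mirrors the two input fields described in this README.
    body = json.dumps({"stateCode": state_code, "maxResults": max_results})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def run_actor(request: urllib.request.Request) -> list:
    """Send the prepared request and decode the JSON list of records."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Placeholders: replace the actor ID and token with your own values.
request = build_run_request("username~water-quality-data-scraper", "YOUR_APIFY_TOKEN", "NY", 50)
print(request.full_url)
```

Calling `run_actor(request)` performs the network call and consumes platform credits, so the example stops at building the request.
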
Input Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| `stateCode` | string | Two-letter US state code | `"NY"` |
| `maxResults` | integer | Maximum records per data source | `50` |
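
When the input is generated by a script, it can help to validate it before starting a run. The following is a minimal sketch of such a check based only on the table above. `ActorInput` and `parse_input` are hypothetical names, and trimming and upper-casing the state code is a convenience added here, not documented actor behavior.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ActorInput:
    state_code: str = "NY"   # default from the parameter table
    max_results: int = 50    # default from the parameter table

    def to_json_dict(self) -> dict:
        """Return the input using the actor's documented field names."""
        return {"stateCode": self.state_code, "maxResults": self.max_results}


def parse_input(raw: dict) -> ActorInput:
    """Validate a raw input dict, falling back to the documented defaults."""
    state = str(raw.get("stateCode", "NY")).strip().upper()
    if not re.fullmatch(r"[A-Z]{2}", state):
        raise ValueError(f"stateCode must be two letters, got {state!r}")
    max_results = raw.get("maxResults", 50)
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(max_results, bool) or not isinstance(max_results, int):
        raise ValueError("maxResults must be an integer")
    if max_results < 1:
        raise ValueError("maxResults must be at least 1")
    return ActorInput(state, max_results)


print(parse_input({"stateCode": "mi", "maxResults": 200}).to_json_dict())
```

The check only confirms the shape of the code (two letters); it does not verify that the value is an actual US state abbreviation.
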
Output
The dataset contains the following fields for each record:
| Field | Description |
|---|---|
| `system` | Name of the public water system |
| `state` | State code |
| `contaminant` | Contaminant name or code |
| `level` | Measured contamination level |
| `mcl` | Maximum contaminant level (regulatory limit) |
| `violationType` | Type of regulatory violation |
| `sampleDate` | Date of the sample or compliance period |
| `source` | Data source within the EPA database |
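
As one way to work with these fields, the sketch below flags records whose measured `level` is above the `mcl`. It assumes both values are expressed in the same unit and may arrive as numbers, numeric strings, or empty values; the README does not specify the field types, so the parsing is deliberately defensive. The sample records are invented purely for illustration and are not real EPA data.

```python
from typing import Optional


def to_float(value) -> Optional[float]:
    """Convert a field to float, returning None when it is missing or non-numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def exceeds_mcl(record: dict) -> bool:
    """True when both values are present and the level is above the limit."""
    level = to_float(record.get("level"))
    mcl = to_float(record.get("mcl"))
    return level is not None and mcl is not None and level > mcl


# Invented sample records for illustration only.
records = [
    {"system": "Example Water District A", "contaminant": "Lead", "level": "0.021", "mcl": "0.015"},
    {"system": "Example Water District B", "contaminant": "Lead", "level": 0.004, "mcl": 0.015},
    {"system": "Example Water District C", "contaminant": "Copper", "level": None, "mcl": "1.3"},
]

flagged = [r["system"] for r in records if exceeds_mcl(r)]
print(flagged)
```

Records with a missing `level` or `mcl` are skipped rather than treated as exceedances, which keeps violation-only rows from being miscounted.
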
Cost Estimate
This actor makes lightweight API calls to the EPA Envirofacts service and requires minimal compute resources. A typical run costs approximately $0.001-0.003 on the Apify platform. The actor runs on 512 MB of memory by default. Each run queries multiple EPA endpoints (water systems, violations, and lead/copper data) to provide comprehensive coverage.
Tips and Best Practices
- Use standard two-letter US state abbreviations: NY (New York), CA (California), TX (Texas), FL (Florida), MI (Michigan), PA (Pennsylvania), etc.
- The actor queries three separate EPA data sources: water system registrations, violation records, and lead/copper test results. This provides different perspectives on water quality for each state.
- For states with major water quality concerns (e.g., MI for Flint), increase `maxResults` to capture more violation records.
- Combine this data with the Air Quality Monitor actor for a comprehensive environmental health analysis of specific regions.
- The EPA data may have reporting delays, so the most recent records might be from several months ago.
- Contaminant codes can be looked up on the EPA website for human-readable names if they appear as numeric codes.
- Schedule regular runs to track changes in violation status over time.
- Export to CSV for easy import into GIS tools to map water quality issues geographically.
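
The last two tips (scheduled runs and CSV export) can be combined in a small post-processing step. The sketch below compares the records from two runs, keeps only those that are new, and writes them as CSV. The choice of `system`, `contaminant`, `violationType`, and `sampleDate` as the identifying key is an assumption, since the README does not define a unique record ID, and the sample records are invented for illustration.

```python
import csv
import io

# Column order follows the Output table in this README.
FIELDS = ["system", "state", "contaminant", "level", "mcl",
          "violationType", "sampleDate", "source"]


def record_key(record: dict) -> tuple:
    """Assumed identity of a record; adjust if the data carries a real ID."""
    return (record.get("system"), record.get("contaminant"),
            record.get("violationType"), record.get("sampleDate"))


def new_records(previous: list, current: list) -> list:
    """Return records from the current run that were absent from the previous one."""
    seen = {record_key(r) for r in previous}
    return [r for r in current if record_key(r) not in seen]


def to_csv(records: list) -> str:
    """Serialize records to CSV text; missing fields become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Invented sample data for illustration only.
last_week = [{"system": "Example Water District A", "state": "NY",
              "contaminant": "Lead", "violationType": "MCL", "sampleDate": "2024-01-15"}]
this_week = last_week + [{"system": "Example Water District B", "state": "NY",
                          "contaminant": "Copper", "violationType": "MCL",
                          "sampleDate": "2024-02-01"}]

print(to_csv(new_records(last_week, this_week)))
```

The output fields listed in this README do not include coordinates, so mapping the CSV in a GIS tool would likely require joining it to a separate location dataset.
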
