Water Quality Data Scraper

Pricing

Pay per usage

Water Quality Data Scraper. Extract structured data with automatic pagination, proxy rotation, and JSON/CSV export. Pay only for results.

Developer

Donny

Maintained by Community

What does Water Quality Data Scraper do?

Water Quality Data Scraper is an Apify actor that extracts water quality reports, contamination data, and violation records from US Environmental Protection Agency (EPA) databases. It queries the EPA Envirofacts SDWIS (Safe Drinking Water Information System) API to retrieve water system information, regulatory violations, and lead/copper testing results for any US state. The resulting structured datasets are suited to public health research, environmental journalism, community advocacy, and regulatory compliance analysis.
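
To make the idea of querying Envirofacts concrete, here is a minimal sketch of how a request URL for one state might be assembled. The base URL, the path layout, the table names, and the PRIMACY_AGENCY_CODE filter column are all assumptions made for illustration. They are not taken from this actor's source and should be checked against the current EPA Envirofacts documentation.

```python
from urllib.parse import quote

# ASSUMPTION: base URL and path layout of the Envirofacts REST service.
BASE_URL = "https://data.epa.gov/efservice"

# ASSUMPTION: hypothetical table names for the three data sources the actor
# describes (water systems, violations, lead/copper results).
SDWIS_TABLES = {
    "water_systems": "WATER_SYSTEM",
    "violations": "VIOLATION",
    "lead_copper": "LCR_SAMPLE_RESULT",
}


def build_query_url(table: str, state_code: str, max_results: int,
                    state_column: str = "PRIMACY_AGENCY_CODE") -> str:
    """Build a URL asking for up to max_results JSON rows for one state."""
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    state = state_code.strip().upper()
    # ASSUMPTION: row ranges are zero-based and inclusive, e.g. 0:49 = 50 rows.
    last_row = max_results - 1
    return (
        f"{BASE_URL}/{quote(table)}/{quote(state_column)}/{quote(state)}"
        f"/ROWS/0:{last_row}/JSON"
    )


if __name__ == "__main__":
    for label, table in SDWIS_TABLES.items():
        print(label, build_query_url(table, "ny", 50))
```

The sketch only builds URLs and performs no network requests, so it can be adapted safely once the real endpoint details are confirmed.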

Why use Water Quality Data Scraper?

Access to clean drinking water is a fundamental public health concern, and the EPA maintains extensive databases tracking water quality across the United States. However, navigating these databases directly is complex and time-consuming. This actor simplifies the process by making targeted API calls and delivering the data in a clean, structured format. Whether you are investigating water quality issues in your community, conducting academic research on environmental health, or building data visualizations for journalism, this tool provides the data foundation you need.

How to use Water Quality Data Scraper

  1. Open the actor on the Apify platform.
  2. Enter a two-letter US state code in the stateCode field (e.g., NY, CA, TX, FL).
  3. Set maxResults to control how many records to retrieve from each data source.
  4. Click Start to begin fetching data from the EPA.
  5. Download results from the Dataset tab in JSON, CSV, or Excel format.

Input Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| stateCode | string | Two-letter US state code | "NY" |
| maxResults | integer | Maximum records per data source | 50 |
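
The two parameters above map to a small JSON input object. The following sketch shows one way to build and sanity-check that object before starting a run. The helper name `build_input` and the validation rules (exactly two letters, a positive integer) are assumptions for illustration, not the actor's documented behavior.

```python
import re

# Defaults as listed in the Input Parameters table.
DEFAULT_INPUT = {"stateCode": "NY", "maxResults": 50}


def build_input(state_code=None, max_results=None) -> dict:
    """Return an input dict, falling back to the documented defaults."""
    run_input = dict(DEFAULT_INPUT)
    if state_code is not None:
        run_input["stateCode"] = state_code.strip().upper()
    if max_results is not None:
        run_input["maxResults"] = max_results

    # ASSUMPTION: a valid state code is exactly two ASCII letters.
    if not re.fullmatch(r"[A-Z]{2}", run_input["stateCode"]):
        raise ValueError("stateCode must be a two-letter US state code")

    # ASSUMPTION: maxResults must be a positive integer (bool is rejected).
    limit = run_input["maxResults"]
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
        raise ValueError("maxResults must be a positive integer")
    return run_input


if __name__ == "__main__":
    print(build_input("mi", 200))
```

The resulting dictionary can be pasted into the actor's JSON input editor or passed to an API client as the run input.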

Output

The dataset contains the following fields for each record:

| Field | Description |
| --- | --- |
| system | Name of the public water system |
| state | State code |
| contaminant | Contaminant name or code |
| level | Measured contamination level |
| mcl | Maximum contaminant level (regulatory limit) |
| violationType | Type of regulatory violation |
| sampleDate | Date of the sample or compliance period |
| source | Data source within the EPA database |
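
As an illustration of working with these fields, the sketch below flags records whose measured level is above the regulatory limit. It assumes that `level` and `mcl` may arrive as numbers, numeric strings, or missing values, which the field list does not specify. The sample records are made-up placeholders, not real EPA data.

```python
def to_float(value):
    """Convert a value to float, returning None if it is missing or non-numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def exceeds_mcl(record: dict) -> bool:
    """True when both level and mcl are numeric and level is above mcl."""
    level = to_float(record.get("level"))
    mcl = to_float(record.get("mcl"))
    return level is not None and mcl is not None and level > mcl


if __name__ == "__main__":
    # Made-up records using the documented field names.
    records = [
        {"system": "Example Water A", "contaminant": "Lead", "level": "0.020", "mcl": 0.015},
        {"system": "Example Water B", "contaminant": "Lead", "level": 0.004, "mcl": 0.015},
        {"system": "Example Water C", "contaminant": "Copper", "level": None, "mcl": 1.3},
    ]
    for record in filter(exceeds_mcl, records):
        print(record["system"], record["contaminant"])
```

Records with a missing or non-numeric level are skipped rather than treated as exceedances, which keeps the filter conservative.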

Cost Estimate

This actor makes lightweight API calls to the EPA Envirofacts service and requires minimal compute resources. A typical run costs approximately $0.001 to $0.003 on the Apify platform and uses 512 MB of memory by default. Each run queries multiple EPA endpoints (water systems, violations, and lead/copper data) to provide comprehensive coverage.
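
For budgeting repeated runs, the per-run figure can be scaled up with simple arithmetic. This sketch assumes every run falls inside the quoted $0.001 to $0.003 range and ignores any platform fees or pricing changes. The function name is hypothetical.

```python
# Per-run cost range quoted in the Cost Estimate section, in US dollars.
COST_PER_RUN_LOW = 0.001
COST_PER_RUN_HIGH = 0.003


def estimate_cost(runs: int) -> tuple[float, float]:
    """Return (low, high) estimated total cost in USD for a number of runs."""
    if runs < 0:
        raise ValueError("runs must not be negative")
    return runs * COST_PER_RUN_LOW, runs * COST_PER_RUN_HIGH


if __name__ == "__main__":
    # One run per day for each of 50 states over 30 days (assumed schedule).
    low, high = estimate_cost(50 * 30)
    print(f"Estimated cost: ${low:.2f} to ${high:.2f}")
```

Under these assumptions, a daily run for all 50 states over a 30-day month would come to roughly $1.50 to $4.50.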

Tips and Best Practices

  • Use standard two-letter US state abbreviations: NY (New York), CA (California), TX (Texas), FL (Florida), MI (Michigan), PA (Pennsylvania), etc.
  • The actor queries three separate EPA data sources: water system registrations, violation records, and lead/copper test results. This provides different perspectives on water quality for each state.
  • For states with major water quality concerns (e.g., MI for Flint), increase maxResults to capture more violation records.
  • Combine this data with the Air Quality Monitor actor for a comprehensive environmental health analysis of specific regions.
  • The EPA data may have reporting delays, so the most recent records might be from several months ago.
  • If contaminants appear as numeric codes, look them up on the EPA website to find their human-readable names.
  • Schedule regular runs to track changes in violation status over time.
  • Export to CSV for easy import into GIS tools to map water quality issues geographically.
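
For the tip about tracking changes over time, one possible approach is to compare the datasets from two scheduled runs. The sketch below assumes a record can be identified by the combination of `system`, `contaminant`, `violationType`, and `sampleDate`. That key is a guess, since the actor does not document a unique identifier. Records missing from the later run may also simply have fallen outside the maxResults limit.

```python
def violation_key(record: dict) -> tuple:
    """ASSUMPTION: these four fields together identify a violation record."""
    return (
        record.get("system"),
        record.get("contaminant"),
        record.get("violationType"),
        record.get("sampleDate"),
    )


def diff_runs(previous: list[dict], current: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (added, dropped) records between two runs, preserving order."""
    previous_keys = {violation_key(r) for r in previous}
    current_keys = {violation_key(r) for r in current}
    added = [r for r in current if violation_key(r) not in previous_keys]
    dropped = [r for r in previous if violation_key(r) not in current_keys]
    return added, dropped


if __name__ == "__main__":
    # Made-up records for illustration only.
    last_month = [
        {"system": "Example Water A", "contaminant": "Lead",
         "violationType": "Monitoring", "sampleDate": "2024-01-01"},
    ]
    this_month = last_month + [
        {"system": "Example Water B", "contaminant": "Nitrate",
         "violationType": "MCL", "sampleDate": "2024-02-01"},
    ]
    added, dropped = diff_runs(last_month, this_month)
    print(len(added), "added;", len(dropped), "dropped")
```

Each list of differences can then be exported to CSV alongside the full dataset for mapping or reporting.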