NOAA ERDDAP Dataset Catalog Scraper — Ocean & Climate Data

Pricing

from $3.00 / 1,000 results

Catalog every dataset published by any NOAA ERDDAP server (CoastWatch, AOML, ICOADS, Upwell, and others). Filter by title, institution, or data type (grid vs table). Returns dataset IDs, summaries, and direct griddap/tabledap/WMS/FGDC URLs for oceanographic and climate research pipelines.

Developer

Compute Edge

Maintained by Community

NOAA ERDDAP Dataset Catalog Scraper

Catalog every dataset published by any NOAA ERDDAP server, a primary access point for oceanographic, climate, and atmospheric data. ERDDAP (the Environmental Research Division's Data Access Program) is a data server developed at NOAA. This Actor fetches the dataset index from the ERDDAP server you specify and returns structured dataset metadata, direct data access URLs (griddap, tabledap, WMS), and metadata record URLs for scientific research pipelines.

ERDDAP servers are deployed across NOAA's ecosystem: CoastWatch (coastal oceanography), AOML (Atlantic Oceanographic), ICOADS (climate observations), Upwell (real-time environmental data), and others. This Actor discovers and catalogs datasets across any publicly accessible ERDDAP instance.

Key Features

  • Complete dataset enumeration — Fetch 100+ datasets per ERDDAP server with pagination
  • Multi-server support — Query any NOAA ERDDAP server (CoastWatch, AOML, ICOADS, Upwell, etc.) or custom ERDDAP instances
  • Flexible filtering — Filter by dataset title, institution, or data type (gridded vs. tabular)
  • Full metadata extraction — Dataset ID, title, summary, institution, access URLs (griddap, tabledap, WMS, FGDC, ISO 19115)
  • Direct data access URLs — Pre-built URLs for each dataset's griddap/tabledap interface, WMS service, metadata records
  • No authentication required — Public ERDDAP APIs, no credentials needed
  • Scientific pipeline ready — Clean JSON output structured for integration with data processing workflows, Jupyter notebooks, RAG systems

Use Cases

  • Scientific data discovery — Build searchable catalogs of oceanographic, climate, and atmospheric datasets
  • Data integration pipelines — Automatically discover and ingest NOAA datasets into your research infrastructure
  • Climate & weather research — Index datasets by institution (NOAA regional offices, universities, research organizations)
  • Environmental monitoring — Discover real-time and historical datasets (sea surface temperature, wind, currents, precipitation)
  • Reproducible science — Capture metadata snapshots of datasets used in analyses (versioning, attribution, availability)
  • Data governance — Audit which datasets are available, accessible, and updated across NOAA ERDDAP network
  • RAG pipeline ingestion — Structured dataset metadata ready for LLM-based research assistant systems

Output Data Fields

| Field | Type | Description |
| --- | --- | --- |
| datasetId | string | Unique ERDDAP dataset identifier (e.g., erdSST3day, jplG1SST) |
| title | string | Human-readable dataset title |
| summary | string | Dataset description and scientific context |
| institution | string | Data provider (NOAA CoastWatch, AOML, ICOADS, etc.) |
| accessible | string | Access status ("true" if publicly accessible) |
| griddapUrl | string | Direct URL to gridded data access interface (null if N/A) |
| tabledapUrl | string | Direct URL to tabular data access interface (null if N/A) |
| wmsUrl | string | Web Map Service URL for visualization (null if N/A) |
| infoUrl | string | Dataset information and metadata page |
| makeAGraphUrl | string | Interactive data viewer URL |
| fgdcUrl | string | FGDC XML metadata record URL |
| iso19115Url | string | ISO 19115 XML metadata record URL |
| backgroundInfoUrl | string | Full dataset documentation URL |
| rssUrl | string | RSS feed for dataset updates |
| server | string | ERDDAP server base URL |

How to Scrape NOAA ERDDAP Dataset Catalogs

1. Open the Actor

Navigate to the NOAA ERDDAP Dataset Catalog Scraper on Apify Store and click Start.

2. Configure the Input (Optional)

The Actor comes with sensible defaults. Customize as needed:

  • ERDDAP Server URL — Base URL of the ERDDAP instance to catalog

    • Default: https://coastwatch.pfeg.noaa.gov/erddap (CoastWatch — most comprehensive)
    • Alternatives:
      • https://upwell.pfeg.noaa.gov/erddap (real-time coastal data)
      • https://erddap.aoml.noaa.gov/hdb/erddap (Atlantic Oceanographic & Meteorological Laboratory)
      • https://erddap.icoads.noaa.gov/erddap (International Comprehensive Ocean-Atmosphere Data Set)
  • Title Contains — Filter to datasets whose title includes this phrase (case-insensitive)

    • Example: "temperature" will match "Sea Surface Temperature", "GHRSST Temperature"
  • Institution Contains — Filter to datasets from a specific data provider

    • Example: "CoastWatch" will match all NOAA CoastWatch datasets
  • Data Type Filter — Limit to specific data format

    • any (default) — All dataset types
    • griddap — Gridded data (netCDF-style arrays, e.g., satellite sea surface temperature)
    • tabledap — Tabular data (CSV-style rows, e.g., buoy observations, station time series)
  • Max Results — Maximum number of datasets to return (default 1000, max 20,000)

3. Click Start

The Actor will fetch the catalog from your chosen ERDDAP server, apply filters, and return structured dataset metadata.
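The fetch-and-filter step can be approximated outside the Actor. The sketch below is not the Actor's actual implementation. It assumes the table-style JSON layout that ERDDAP returns for its catalog index (`table.columnNames` plus `table.rows`). The column names used here (`Title`, `Institution`, `griddap`, `tabledap`, `Dataset ID`), the helper names, and the sample payload are assumptions for illustration.

```python
from typing import Any


def rows_to_records(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn an ERDDAP table-style JSON payload into a list of dicts."""
    table = payload["table"]
    names = table["columnNames"]
    return [dict(zip(names, row)) for row in table["rows"]]


def matches(record, title_contains="", institution_contains="", data_type="any"):
    """Apply the three filters described above (case-insensitive substrings)."""
    title = (record.get("Title") or "").lower()
    institution = (record.get("Institution") or "").lower()
    if title_contains.lower() not in title:
        return False
    if institution_contains.lower() not in institution:
        return False
    if data_type in ("griddap", "tabledap"):
        # An empty or missing URL means the dataset is not served that way.
        return bool(record.get(data_type))
    return True


# Tiny hand-written payload standing in for a real server response.
sample = {
    "table": {
        "columnNames": ["griddap", "tabledap", "Title", "Institution", "Dataset ID"],
        "rows": [
            ["https://example.org/erddap/griddap/demoSst", "",
             "Sea Surface Temperature", "NOAA CoastWatch", "demoSst"],
            ["", "https://example.org/erddap/tabledap/demoBuoy",
             "Buoy Observations", "Example University", "demoBuoy"],
        ],
    }
}

records = rows_to_records(sample)
gridded_temperature = [
    r["Dataset ID"] for r in records
    if matches(r, title_contains="temperature", data_type="griddap")
]
print(gridded_temperature)
```

An empty filter string matches every record, which mirrors the behavior of leaving a filter field blank.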

4. Download Results

From the Dataset tab, export as JSON, CSV, Excel, or XML. Use the JSON output directly in Python/R scripts or import into data tools.

Input Example

{
  "serverUrl": "https://coastwatch.pfeg.noaa.gov/erddap",
  "titleContains": "temperature",
  "institutionContains": "CoastWatch",
  "dataTypeFilter": "griddap",
  "maxResults": 50
}
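When runs are launched from a script, it can help to validate the input before submitting it. The following is a minimal sketch that uses the field names from the example above and the limits stated in the configuration section. The helper name `build_run_input`, the lower bound of 1 for `maxResults`, and the choice to omit empty filters are assumptions, not documented Actor behavior.

```python
ALLOWED_DATA_TYPES = {"any", "griddap", "tabledap"}
MAX_RESULTS_LIMIT = 20_000  # upper bound stated in the configuration section


def build_run_input(server_url="https://coastwatch.pfeg.noaa.gov/erddap",
                    title_contains="", institution_contains="",
                    data_type_filter="any", max_results=1000):
    """Return a validated input dict in the shape of the example above."""
    if data_type_filter not in ALLOWED_DATA_TYPES:
        raise ValueError(f"dataTypeFilter must be one of {sorted(ALLOWED_DATA_TYPES)}")
    if not 1 <= max_results <= MAX_RESULTS_LIMIT:
        raise ValueError(f"maxResults must be between 1 and {MAX_RESULTS_LIMIT}")
    run_input = {
        "serverUrl": server_url.rstrip("/"),
        "dataTypeFilter": data_type_filter,
        "maxResults": max_results,
    }
    # Assumption: empty filters are left out rather than sent as "".
    if title_contains:
        run_input["titleContains"] = title_contains
    if institution_contains:
        run_input["institutionContains"] = institution_contains
    return run_input


run_input = build_run_input(title_contains="temperature",
                            institution_contains="CoastWatch",
                            data_type_filter="griddap", max_results=50)
print(run_input)
```

The resulting dictionary can then be pasted into the Actor's JSON input editor or passed to whichever API client you use to start runs.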

Output Example

{
  "datasetId": "erdSST3day",
  "title": "NOAA Global SST Analysis V3",
  "summary": "NOAA OI SST provides daily global sea surface temperature from blended satellite and in-situ measurements on a 0.25 degree grid.",
  "institution": "NOAA CoastWatch",
  "accessible": "true",
  "griddapUrl": "https://coastwatch.pfeg.noaa.gov/erddap/griddap/erdSST3day",
  "tabledapUrl": null,
  "wmsUrl": "https://coastwatch.pfeg.noaa.gov/erddap/wms/erdSST3day",
  "infoUrl": "https://coastwatch.pfeg.noaa.gov/erddap/info/erdSST3day/index.html",
  "makeAGraphUrl": "https://coastwatch.pfeg.noaa.gov/erddap/griddap/erdSST3day.graph",
  "fgdcUrl": "https://coastwatch.pfeg.noaa.gov/erddap/metadata/fgdc/xml/erdSST3day_fgdc.xml",
  "iso19115Url": "https://coastwatch.pfeg.noaa.gov/erddap/metadata/iso19115/xml/erdSST3day_iso19115.xml",
  "backgroundInfoUrl": "https://coastwatch.pfeg.noaa.gov/erddap/info/erdSST3day",
  "rssUrl": "https://coastwatch.pfeg.noaa.gov/erddap/rss/erdSST3day.rss",
  "server": "https://coastwatch.pfeg.noaa.gov/erddap"
}
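Because `griddapUrl` and `tabledapUrl` are null when they do not apply, a record's data type can be inferred from whichever URL is present. A small sketch of that check follows. The second record (`exampleBuoys`) and the helper name `data_type_of` are hypothetical and exist only for contrast.

```python
import json
from collections import defaultdict

# Two abbreviated records in the Actor's output shape; the second is a
# hypothetical tabular dataset added for contrast.
records = json.loads("""
[
  {"datasetId": "erdSST3day",
   "griddapUrl": "https://coastwatch.pfeg.noaa.gov/erddap/griddap/erdSST3day",
   "tabledapUrl": null},
  {"datasetId": "exampleBuoys",
   "griddapUrl": null,
   "tabledapUrl": "https://example.org/erddap/tabledap/exampleBuoys"}
]
""")


def data_type_of(record):
    """Return 'griddap', 'tabledap', or 'unknown' based on which URL is set."""
    if record.get("griddapUrl"):
        return "griddap"
    if record.get("tabledapUrl"):
        return "tabledap"
    return "unknown"


# Group dataset IDs by inferred data type.
by_type = defaultdict(list)
for record in records:
    by_type[data_type_of(record)].append(record["datasetId"])

print(dict(by_type))
```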

Pricing

This Actor fetches dataset catalogs from NOAA ERDDAP public APIs.

  • Cost per run: ~$0.001-0.002 (API request, no browser required)
  • Actor start event: Default platform rate
  • Per-result pricing: $0.002/dataset

A typical run catalogs 100-500 datasets (CoastWatch server). Full pagination of a large ERDDAP instance (2000+ datasets) takes 2-5 minutes and costs approximately $2-5 in actor fees plus minimal Apify compute.
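As a rough worked example of the arithmetic, the per-result portion of a run's cost is the dataset count multiplied by the per-dataset rate. The sketch below assumes the $0.002 rate quoted in this section and leaves out the Actor start event and platform compute. Treat the rate as a parameter and check the current store listing before budgeting.

```python
PER_RESULT_USD = 0.002  # per-dataset rate quoted in this section (assumption: still current)


def estimate_result_fees(n_datasets, per_result_usd=PER_RESULT_USD):
    """Per-result fees only; excludes the start event and platform compute."""
    return round(n_datasets * per_result_usd, 2)


for n in (100, 500, 2000):
    print(f"{n} datasets -> ${estimate_result_fees(n):.2f}")
```

For 2,000 datasets this gives $4.00, which sits inside the $2-5 range quoted above.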

Use Cases Explained

Scientific Data Discovery

Researchers can search the ERDDAP network for datasets matching their study area or variables without manually visiting each server. Example: "Find all datasets with 'chlorophyll' in the title from NOAA CoastWatch."

Climate & Weather Data Processing

Climate researchers use this to automatically discover and ingest NOAA datasets. The output includes direct griddap/tabledap URLs ready for subsetting in Python (via netCDF4) or R (via rerddap).

Reproducible Science

Capture the metadata and access URLs of datasets used in a published analysis. Store this as supplementary material to ensure future readers can locate and cite the exact datasets.
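One way to make such a snapshot citable is to store the records together with a retrieval timestamp and a content hash. This is a minimal sketch under stated assumptions: the snapshot layout (`retrievedAt`, `sha256`, `datasets`) and the helper name `make_snapshot` are illustrative choices, the second record is hypothetical, and the date is a placeholder.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_snapshot(records, retrieved_at=None):
    """Bundle records with a retrieval time and an order-independent hash."""
    retrieved_at = retrieved_at or datetime.now(timezone.utc)
    # Sort so the hash does not depend on the order the Actor returned rows in.
    ordered = sorted(records, key=lambda r: (r.get("server") or "", r["datasetId"]))
    canonical = json.dumps(ordered, sort_keys=True, separators=(",", ":"))
    return {
        "retrievedAt": retrieved_at.isoformat(),
        "sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "datasets": ordered,
    }


records = [
    {"datasetId": "erdSST3day", "server": "https://coastwatch.pfeg.noaa.gov/erddap"},
    {"datasetId": "exampleBuoys", "server": "https://example.org/erddap"},
]
# Placeholder date; omit retrieved_at to use the current UTC time.
snapshot = make_snapshot(records, retrieved_at=datetime(2024, 1, 1, tzinfo=timezone.utc))
print(snapshot["retrievedAt"], snapshot["sha256"][:12])
```

The hash covers catalog metadata only. It says nothing about whether the underlying data values have changed.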

Environmental Monitoring Dashboards

Ingest ERDDAP dataset metadata into a data catalog system. Alert teams when new datasets are added or updated.
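The Actor itself returns a catalog per run, so alerting has to be built on top by comparing two runs. A sketch of that comparison, keyed on `datasetId`, is shown below. The helper name `diff_catalogs` and the sample records are hypothetical.

```python
def diff_catalogs(previous, current):
    """Compare two Actor outputs from the same server by datasetId."""
    prev = {r["datasetId"]: r for r in previous}
    curr = {r["datasetId"]: r for r in current}
    return {
        "added": sorted(curr.keys() - prev.keys()),
        "removed": sorted(prev.keys() - curr.keys()),
        # "changed" means the catalog metadata differs, not the data itself.
        "changed": sorted(i for i in curr.keys() & prev.keys() if curr[i] != prev[i]),
    }


last_week = [
    {"datasetId": "demoA", "title": "Demo A"},
    {"datasetId": "demoB", "title": "Demo B"},
]
today = [
    {"datasetId": "demoB", "title": "Demo B (reprocessed)"},
    {"datasetId": "demoC", "title": "Demo C"},
]
print(diff_catalogs(last_week, today))
```

Feeding the `added` and `changed` lists into a notification step is left to your own tooling.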

Data Governance & Compliance

Audit the NOAA datasets your organization depends on. Confirm that each one is still listed and accessible, and compare catalog snapshots over time to see what has been added, removed, or changed.
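A simple audit report can be derived from the output fields alone. The sketch below counts datasets per institution and lists those whose `accessible` value is not "true". The helper name `summarize` and the sample records are hypothetical.

```python
from collections import Counter


def summarize(records):
    """Return (datasets per institution, IDs not marked accessible)."""
    by_institution = Counter(r.get("institution") or "(unknown)" for r in records)
    not_accessible = [r["datasetId"] for r in records if r.get("accessible") != "true"]
    return by_institution, not_accessible


records = [
    {"datasetId": "demoA", "institution": "NOAA CoastWatch", "accessible": "true"},
    {"datasetId": "demoB", "institution": "NOAA CoastWatch", "accessible": "true"},
    {"datasetId": "demoC", "institution": None, "accessible": "false"},
]
by_institution, not_accessible = summarize(records)
print(by_institution.most_common())
print(not_accessible)
```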

Example: Integrating into Python Workflow

import json

import xarray as xr

# Load the dataset list exported from the Actor's output
with open('erddap_datasets.json') as f:
    datasets = json.load(f)

# Filter to datasets with "temperature" in the title
sst_datasets = [d for d in datasets if 'temperature' in (d.get('title') or '').lower()]

# Open the first few gridded datasets and list their variables
for dataset in sst_datasets[:3]:
    dataset_id = dataset['datasetId']
    griddap_url = dataset['griddapUrl']
    if not griddap_url:
        continue  # tabular-only dataset, nothing to open as a grid
    print(f"Loading {dataset_id}...")
    try:
        # griddap URLs are OPeNDAP endpoints, so xarray (with the netCDF4
        # backend installed) can open them lazily without downloading a file
        ds = xr.open_dataset(griddap_url)
        print(f"  Variables: {list(ds.data_vars)}")
        ds.close()
    except Exception as e:
        print(f"  Error: {e}")

FAQ

Which ERDDAP servers does this Actor support?

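This Actor works with any public ERDDAP server. NOAA operates several instances, including CoastWatch, Upwell, AOML, and ICOADS (their server URLs are listed in the input configuration above).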

You can also query university or research organization ERDDAP instances.

How many datasets are on a typical ERDDAP server?

CoastWatch has 200+ datasets. Smaller regional instances may have 50-100. The Actor will paginate through all available datasets up to your maxResults limit.

What's the difference between griddap and tabledap?

  • griddap — Gridded data access. Returns multi-dimensional arrays (e.g., sea surface temperature at lat/lon/time). Optimized for spatial/temporal subsetting.
  • tabledap — Tabular data access. Returns rows like a CSV or SQL query (e.g., individual buoy measurements). Optimized for filtering by attributes.

Both can return netCDF, CSV, JSON, or other formats. Use the URLs provided to construct custom data download requests.

How often is ERDDAP data updated?

ERDDAP servers catalog real-time satellite data, model output, and historical observations. Update frequencies vary by dataset: real-time (daily or more frequent), monthly, or static. Check the dataset's RSS feed (included in output) for update notifications.

Is ERDDAP data free to use?

Yes. NOAA ERDDAP data is public and typically available under the NOAA Open Data policy or Creative Commons licenses. See the dataset's metadata page for licensing details. Most are freely available for research, education, and operational use.

How do I download actual data from these datasets?

Use the griddapUrl or tabledapUrl included in the output. These URLs point to ERDDAP's data access interface. For example:

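  • Gridded data: {griddapUrl}.nc?variable[(start):(stop)][(start):(stop)][(start):(stop)] downloads a netCDF subset, with one bracketed range per dimension in the dataset's dimension order (for example time, latitude, longitude)
  • Tabular data: {tabledapUrl}.csv?var1,var2&constraint downloads CSV, with a comma-separated variable list followed by &-separated constraints such as time>=2020-01-01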

ERDDAP supports multiple output formats (netCDF, CSV, JSON, GeoTIFF, etc.). See ERDDAP's API documentation for syntax.
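The request patterns above can be assembled programmatically. The following sketch builds percent-encoded griddap and tabledap URLs from a record's base URL. The function names, the variable names (`sst`, `temperature`), and the example.org URLs are hypothetical, and the dimension order and valid ranges must come from each dataset's own metadata.

```python
from urllib.parse import quote, unquote


def griddap_url_for(griddap_url, variable, ranges, file_type=".nc"):
    """Build a griddap request; ranges holds one (start, stop) pair per dimension."""
    subset = "".join(f"[({start}):({stop})]" for start, stop in ranges)
    # Percent-encode brackets and other special characters in the query.
    return f"{griddap_url}{file_type}?{quote(variable + subset, safe='')}"


def tabledap_url_for(tabledap_url, variables, constraints=(), file_type=".csv"):
    """Build a tabledap request: variable list, then &-separated constraints."""
    query = quote(",".join(variables), safe=",")
    for constraint in constraints:
        query += "&" + quote(constraint, safe="=")
    return f"{tabledap_url}{file_type}?{query}"


grid_url = griddap_url_for(
    "https://example.org/erddap/griddap/demoSst", "sst",
    [("2020-01-01T00:00:00Z", "2020-01-02T00:00:00Z"), (30, 40), (-130, -120)],
)
table_url = tabledap_url_for(
    "https://example.org/erddap/tabledap/demoBuoy",
    ["time", "temperature"], ["time>=2020-01-01"],
)
print(unquote(grid_url))  # decoded form, easier to read
print(table_url)
```

Values in parentheses are interpreted by ERDDAP as coordinate values rather than array indices, which is usually what you want when subsetting by date or latitude.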

Can I query datasets across multiple ERDDAP servers?

Run the Actor once per server. Results will include the server field so you can identify which server each dataset came from.
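After running the Actor once per server, the outputs can be combined into one catalog. The sketch below keys each record on the pair (`server`, `datasetId`), since the same dataset ID may appear on more than one server. The helper name `merge_runs` and the sample records are hypothetical.

```python
def merge_runs(*runs):
    """Combine several Actor outputs, de-duplicating on (server, datasetId)."""
    merged = {}
    for records in runs:
        for record in records:
            # Later runs overwrite earlier ones for the same server and ID.
            merged[(record["server"], record["datasetId"])] = record
    return list(merged.values())


coastwatch_run = [
    {"server": "https://coastwatch.pfeg.noaa.gov/erddap", "datasetId": "demoSst"},
]
upwell_run = [
    {"server": "https://upwell.pfeg.noaa.gov/erddap", "datasetId": "demoSst"},
    {"server": "https://upwell.pfeg.noaa.gov/erddap", "datasetId": "demoWind"},
]
catalog = merge_runs(coastwatch_run, upwell_run, upwell_run)
print(len(catalog))
```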

This Actor accesses publicly available NOAA ERDDAP datasets. No authentication bypass or terms-of-service violation is involved. All data extracted is from public ERDDAP catalog APIs (/info/index.json). Users are responsible for ensuring their use of the extracted data complies with NOAA's Open Data policy and applicable licenses. For questions about data licensing, visit the NOAA ERDDAP documentation at https://coastwatch.pfeg.noaa.gov/erddap/information.html.

Support

For issues, questions, or feature requests, contact the Actor developer through the Apify Store.