NOAA ERDDAP Dataset Catalog Scraper — Ocean & Climate Data
Pricing
from $3.00 / 1,000 results
Catalog every dataset published by any NOAA ERDDAP server (CoastWatch, AOML, ICOADS, Upwell, and others). Filter by title, institution, or data type (grid vs table). Returns dataset IDs, summaries, and direct griddap/tabledap/WMS/FGDC URLs for oceanographic and climate research pipelines.
Developer
Compute Edge
Maintained by Community
Last modified: 3 days ago
NOAA ERDDAP Dataset Catalog Scraper
Catalog every dataset published by any NOAA ERDDAP server, an authoritative source for oceanographic, climate, and atmospheric data. This Actor fetches the complete dataset index from the ERDDAP server you choose and returns structured dataset metadata, direct data access URLs (griddap, tabledap, WMS), and metadata record URLs for scientific research pipelines.
ERDDAP servers are deployed across NOAA's ecosystem: CoastWatch (coastal oceanography), AOML (Atlantic Oceanographic and Meteorological Laboratory), ICOADS (climate observations), Upwell (real-time environmental data), and others. This Actor discovers and catalogs datasets on any publicly accessible ERDDAP instance.
Key Features
- Complete dataset enumeration — Fetch 100+ datasets per ERDDAP server with pagination
- Multi-server support — Query any NOAA ERDDAP server (CoastWatch, AOML, ICOADS, Upwell, etc.) or custom ERDDAP instances
- Flexible filtering — Filter by dataset title, institution, or data type (gridded vs. tabular)
- Full metadata extraction — Dataset ID, title, summary, institution, access URLs (griddap, tabledap, WMS, FGDC, ISO 19115)
- Direct data access URLs — Pre-built URLs for each dataset's griddap/tabledap interface, WMS service, metadata records
- No authentication required — Public ERDDAP APIs, no credentials needed
- Scientific pipeline ready — Clean JSON output structured for integration with data processing workflows, Jupyter notebooks, RAG systems
Use Cases
- Scientific data discovery — Build searchable catalogs of oceanographic, climate, and atmospheric datasets
- Data integration pipelines — Automatically discover and ingest NOAA datasets into your research infrastructure
- Climate & weather research — Index datasets by institution (NOAA regional offices, universities, research organizations)
- Environmental monitoring — Discover real-time and historical datasets (sea surface temperature, wind, currents, precipitation)
- Reproducible science — Capture metadata snapshots of datasets used in analyses (versioning, attribution, availability)
- Data governance — Audit which datasets are available, accessible, and updated across NOAA ERDDAP network
- RAG pipeline ingestion — Structured dataset metadata ready for LLM-based research assistant systems
Output Data Fields
| Field | Type | Description |
|---|---|---|
| datasetId | string | Unique ERDDAP dataset identifier (e.g., erdSST3day, jplG1SST) |
| title | string | Human-readable dataset title |
| summary | string | Dataset description and scientific context |
| institution | string | Data provider (NOAA CoastWatch, AOML, ICOADS, etc.) |
| accessible | string | Access status ("true" if publicly accessible) |
| griddapUrl | string | Direct URL to gridded data access interface (null if N/A) |
| tabledapUrl | string | Direct URL to tabular data access interface (null if N/A) |
| wmsUrl | string | Web Map Service URL for visualization (null if N/A) |
| infoUrl | string | Dataset information and metadata page |
| makeAGraphUrl | string | Interactive data viewer URL |
| fgdcUrl | string | FGDC XML metadata record URL |
| iso19115Url | string | ISO 19115 XML metadata record URL |
| backgroundInfoUrl | string | Full dataset documentation URL |
| rssUrl | string | RSS feed for dataset updates |
| server | string | ERDDAP server base URL |
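When consuming these records, it helps to check which access interfaces a dataset actually offers before building a download request. The sketch below assumes the field names in the table above and that unavailable URLs arrive as null (None after JSON parsing). `DatasetRecord` and `access_kinds` are hypothetical helpers for illustration, not part of the Actor.

```python
from typing import Optional, TypedDict


class DatasetRecord(TypedDict, total=False):
    """Subset of the documented output fields (illustrative, not exhaustive)."""
    datasetId: str
    title: str
    griddapUrl: Optional[str]
    tabledapUrl: Optional[str]
    wmsUrl: Optional[str]


def access_kinds(record: DatasetRecord) -> list[str]:
    """Return the access interfaces that have a non-empty URL in the record."""
    fields = {"griddap": "griddapUrl", "tabledap": "tabledapUrl", "wms": "wmsUrl"}
    return [kind for kind, key in fields.items() if record.get(key)]


record: DatasetRecord = {
    "datasetId": "erdSST3day",
    "griddapUrl": "https://coastwatch.pfeg.noaa.gov/erddap/griddap/erdSST3day",
    "tabledapUrl": None,
    "wmsUrl": "https://coastwatch.pfeg.noaa.gov/erddap/wms/erdSST3day",
}
print(access_kinds(record))  # ['griddap', 'wms']
```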
How to Scrape NOAA ERDDAP Dataset Catalogs
1. Open the Actor
Navigate to the NOAA ERDDAP Dataset Catalog Scraper on Apify Store and click Start.
2. Configure the Input (Optional)
The Actor comes with sensible defaults. Customize as needed:
- ERDDAP Server URL: Base URL of the ERDDAP instance to catalog
  - Default: https://coastwatch.pfeg.noaa.gov/erddap (CoastWatch, most comprehensive)
  - Alternatives:
    - https://upwell.pfeg.noaa.gov/erddap (real-time coastal data)
    - https://erddap.aoml.noaa.gov/hdb/erddap (Atlantic Oceanographic & Meteorological Laboratory)
    - https://erddap.icoads.noaa.gov/erddap (International Comprehensive Ocean-Atmosphere Data Set)
- Title Contains: Filter to datasets whose title includes this phrase (case-insensitive)
  - Example: "temperature" will match "Sea Surface Temperature", "GHRSST Temperature"
- Institution Contains: Filter to datasets from a specific data provider
  - Example: "CoastWatch" will match all NOAA CoastWatch datasets
- Data Type Filter: Limit to a specific data format
  - any (default): All dataset types
  - griddap: Gridded data (netCDF-style arrays, e.g., satellite sea surface temperature)
  - tabledap: Tabular data (CSV-style rows, e.g., buoy observations, station time series)
- Max Results: Maximum number of datasets to return (default 1000, max 20,000)
3. Click Start
The Actor will fetch the catalog from your chosen ERDDAP server, apply filters, and return structured dataset metadata.
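The Actor's internals are not documented here, so the following is only a sketch of how a fetch-then-filter step could work. It assumes the server's /info/index.json response uses ERDDAP's usual table layout (a `columnNames` list plus `rows`) and that the columns are named `Title`, `Institution`, `Dataset ID`, `griddap`, and `tabledap`. The function names and the sample payload are illustrative.

```python
def rows_to_records(payload: dict) -> list[dict]:
    """Turn {"table": {"columnNames": [...], "rows": [...]}} into a list of dicts."""
    table = payload["table"]
    return [dict(zip(table["columnNames"], row)) for row in table["rows"]]


def matches(record: dict, title_contains: str = "", institution_contains: str = "",
            data_type: str = "any") -> bool:
    """Apply case-insensitive substring filters and the griddap/tabledap filter."""
    if title_contains.lower() not in record.get("Title", "").lower():
        return False
    if institution_contains.lower() not in record.get("Institution", "").lower():
        return False
    # An empty griddap/tabledap column means the dataset lacks that interface.
    return data_type == "any" or bool(record.get(data_type))


# Hand-written sample in the assumed layout (not real server output).
sample = {"table": {
    "columnNames": ["griddap", "tabledap", "Title", "Institution", "Dataset ID"],
    "rows": [
        ["https://example.org/erddap/griddap/sstGrid", "",
         "Sea Surface Temperature", "NOAA CoastWatch", "sstGrid"],
        ["", "https://example.org/erddap/tabledap/buoyObs",
         "Buoy Temperature Observations", "NOAA NDBC", "buoyObs"],
    ],
}}

records = rows_to_records(sample)
gridded = [r["Dataset ID"] for r in records
           if matches(r, "temperature", data_type="griddap")]
print(gridded)  # ['sstGrid']
```

Against a live server, the payload would come from an HTTP GET of the server's /info/index.json endpoint rather than from an inline sample.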
4. Download Results
From the Dataset tab, export as JSON, CSV, Excel, or XML. Use the JSON output directly in Python/R scripts or import into data tools.
Input Example
    {
      "serverUrl": "https://coastwatch.pfeg.noaa.gov/erddap",
      "titleContains": "temperature",
      "institutionContains": "CoastWatch",
      "dataTypeFilter": "griddap",
      "maxResults": 50
    }
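If you generate inputs programmatically, a small builder can catch invalid values before a run starts. This is a minimal sketch that uses the input keys from the example above and the limits stated in the configuration step (default 1000, max 20,000). The lower bound of 1 and the helper name `build_run_input` are assumptions, and the Actor may validate differently.

```python
import json

ALLOWED_TYPES = {"any", "griddap", "tabledap"}
MAX_RESULTS_LIMIT = 20_000  # upper bound stated in the input description


def build_run_input(server_url="https://coastwatch.pfeg.noaa.gov/erddap",
                    title_contains="", institution_contains="",
                    data_type="any", max_results=1000) -> dict:
    """Assemble an input object, raising ValueError on out-of-range values."""
    if data_type not in ALLOWED_TYPES:
        raise ValueError(f"dataTypeFilter must be one of {sorted(ALLOWED_TYPES)}")
    if not 1 <= max_results <= MAX_RESULTS_LIMIT:  # lower bound of 1 is assumed
        raise ValueError(f"maxResults must be between 1 and {MAX_RESULTS_LIMIT}")
    run_input = {
        "serverUrl": server_url.rstrip("/"),
        "dataTypeFilter": data_type,
        "maxResults": max_results,
    }
    # Optional filters are only included when set.
    if title_contains:
        run_input["titleContains"] = title_contains
    if institution_contains:
        run_input["institutionContains"] = institution_contains
    return run_input


run_input = build_run_input(title_contains="temperature",
                            data_type="griddap", max_results=50)
print(json.dumps(run_input, indent=2))
```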
Output Example
    {
      "datasetId": "erdSST3day",
      "title": "NOAA Global SST Analysis V3",
      "summary": "NOAA OI SST provides daily global sea surface temperature from blended satellite and in-situ measurements on a 0.25 degree grid.",
      "institution": "NOAA CoastWatch",
      "accessible": "true",
      "griddapUrl": "https://coastwatch.pfeg.noaa.gov/erddap/griddap/erdSST3day",
      "tabledapUrl": null,
      "wmsUrl": "https://coastwatch.pfeg.noaa.gov/erddap/wms/erdSST3day",
      "infoUrl": "https://coastwatch.pfeg.noaa.gov/erddap/info/erdSST3day/index.html",
      "makeAGraphUrl": "https://coastwatch.pfeg.noaa.gov/erddap/tabledap/erdSST3day.htmlTable",
      "fgdcUrl": "https://coastwatch.pfeg.noaa.gov/erddap/metadata/fgdc/xml/erdSST3day_fgdc.xml",
      "iso19115Url": "https://coastwatch.pfeg.noaa.gov/erddap/metadata/iso19115/xml/erdSST3day_iso19115.xml",
      "backgroundInfoUrl": "https://coastwatch.pfeg.noaa.gov/erddap/info/erdSST3day",
      "rssUrl": "https://coastwatch.pfeg.noaa.gov/erddap/rss/erdSST3day.rss",
      "server": "https://coastwatch.pfeg.noaa.gov/erddap"
    }
Pricing
This Actor fetches dataset catalogs from NOAA ERDDAP public APIs.
- Cost per run: ~$0.001-0.002 (API request, no browser required)
- Actor start event: Default platform rate
- Per-result pricing: $0.002/dataset
A typical run catalogs 100-500 datasets (CoastWatch server). Full pagination of a large ERDDAP instance (2000+ datasets) takes 2-5 minutes and costs approximately $2-5 in actor fees plus minimal Apify compute.
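For budgeting, the per-result portion of a run can be estimated with simple arithmetic. The sketch below uses the $0.002/dataset figure from this section and leaves the rate as a parameter, since the listing header quotes a different figure ($3.00 per 1,000 results). It excludes the Actor start event and compute charges, and `estimate_result_fees` is a hypothetical helper.

```python
def estimate_result_fees(num_datasets: int, price_per_result: float = 0.002) -> float:
    """Per-result fees only; start-event and compute charges are excluded."""
    return round(num_datasets * price_per_result, 2)


for count in (100, 500, 2000):
    print(f"{count} datasets: ${estimate_result_fees(count):.2f}")
```

At the $0.002 rate, 2,000 datasets come to about $4.00 in per-result fees, which falls within the $2-5 range quoted above.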
Use Cases Explained
Scientific Data Discovery
Researchers can search the ERDDAP network for datasets matching their study area or variables without manually visiting each server. Example: "Find all datasets with 'chlorophyll' in the title from NOAA CoastWatch."
Climate & Weather Data Processing
Climate researchers use this to automatically discover and ingest NOAA datasets. The output includes direct griddap/tabledap URLs ready for subsetting in Python (via netCDF4) or R (via rerddap).
Reproducible Science
Capture the metadata and access URLs of datasets used in a published analysis. Store this as supplementary material to ensure future readers can locate and cite the exact datasets.
Environmental Monitoring Dashboards
Ingest ERDDAP dataset metadata into a data catalog system. Alert teams when new datasets are added or updated.
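One way to detect newly added datasets is to compare the output of two runs by datasetId. This is a minimal sketch that assumes you keep each run's output as a list of records. `diff_catalogs` and the dataset ID `newChlaDaily` are made up for illustration.

```python
def diff_catalogs(previous: list[dict], current: list[dict]) -> dict:
    """Compare two Actor outputs by datasetId and report added/removed IDs."""
    old_ids = {d["datasetId"] for d in previous}
    new_ids = {d["datasetId"] for d in current}
    return {"added": sorted(new_ids - old_ids), "removed": sorted(old_ids - new_ids)}


yesterday = [{"datasetId": "erdSST3day"}, {"datasetId": "jplG1SST"}]
today = [{"datasetId": "erdSST3day"}, {"datasetId": "newChlaDaily"}]
print(diff_catalogs(yesterday, today))
# {'added': ['newChlaDaily'], 'removed': ['jplG1SST']}
```

This only catches additions and removals in the catalog. Detecting updates to an existing dataset's data would need another signal, such as the dataset's RSS feed.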
Data Governance & Compliance
Audit access to NOAA data across your organization. Discover which datasets are available and track their update frequency.
Example: Integrating into Python Workflow
    import json

    import xarray as xr

    # Load the dataset list exported from the Actor's output
    with open('erddap_datasets.json') as f:
        datasets = json.load(f)

    # Keep datasets whose title mentions temperature
    sst_datasets = [d for d in datasets if 'temperature' in d['title'].lower()]

    # Open the first few gridded datasets and list their variables
    for dataset in sst_datasets[:3]:
        dataset_id = dataset['datasetId']
        griddap_url = dataset['griddapUrl']
        if griddap_url:
            print(f"Loading {dataset_id}...")
            try:
                # A griddap URL is also an OPeNDAP endpoint, so xarray can open it
                # lazily; data is transferred only when a subset is actually read.
                ds = xr.open_dataset(griddap_url)
                print(f"  Variables: {list(ds.data_vars)}")
            except Exception as e:
                print(f"  Error: {e}")
FAQ
Which ERDDAP servers does this Actor support?
This Actor works with any public ERDDAP server. NOAA operates several regional instances:
- CoastWatch (default): https://coastwatch.pfeg.noaa.gov/erddap — Most datasets, ocean-focused
- Upwell: https://upwell.pfeg.noaa.gov/erddap — Real-time environmental data
- AOML: https://erddap.aoml.noaa.gov/hdb/erddap — Atlantic region oceanography
- ICOADS: https://erddap.icoads.noaa.gov/erddap — International climate observations
You can also query university or research organization ERDDAP instances.
How many datasets are on a typical ERDDAP server?
CoastWatch has 200+ datasets. Smaller regional instances may have 50-100. The Actor will paginate through all available datasets up to your maxResults limit.
What's the difference between griddap and tabledap?
- griddap — Gridded data access. Returns multi-dimensional arrays (e.g., sea surface temperature at lat/lon/time). Optimized for spatial/temporal subsetting.
- tabledap — Tabular data access. Returns rows like a CSV or SQL query (e.g., individual buoy measurements). Optimized for filtering by attributes.
Both can return netCDF, CSV, JSON, or other formats. Use the URLs provided to construct custom data download requests.
How often is ERDDAP data updated?
ERDDAP servers catalog real-time satellite data, model output, and historical observations. Update frequencies vary by dataset: real-time (daily or more frequent), monthly, or static. Check the dataset's RSS feed (included in output) for update notifications.
Is ERDDAP data free to use?
Yes. NOAA ERDDAP data is public and typically available under the NOAA Open Data policy or Creative Commons licenses. See the dataset's metadata page for licensing details. Most are freely available for research, education, and operational use.
How do I download actual data from these datasets?
Use the griddapUrl or tabledapUrl included in the output. These URLs point to ERDDAP's data access interface. For example:
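- Gridded data: {griddapUrl}.nc?variable[(timeStart):(timeStop)][(latStart):(latStop)][(lonStart):(lonStop)] to download netCDF (dimension names and order vary by dataset)
- Tabular data: {tabledapUrl}.csv?variable1,variable2&constraint1&constraint2 (for example, &time>=2020-01-01) to download CSV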
ERDDAP supports multiple output formats (netCDF, CSV, JSON, GeoTIFF, etc.). See ERDDAP's API documentation for syntax.
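As a rough illustration of the request syntax, the helpers below assemble griddap and tabledap URLs from the URLs in the Actor output. They are a sketch, not an official client: the variable names (`sst`, `wtmp`), the example.org URLs, and the function names are placeholders, and the dimension order must match the dataset you are querying. Values in parentheses select by coordinate value rather than by index.

```python
from urllib.parse import quote


def griddap_query(griddap_url: str, variable: str, ranges: list[tuple],
                  fmt: str = "nc") -> str:
    """Build a griddap request; each (start, stop) pair selects by coordinate value."""
    dims = "".join(f"[({start}):({stop})]" for start, stop in ranges)
    return f"{griddap_url}.{fmt}?{variable}{dims}"


def tabledap_query(tabledap_url: str, variables: list[str], constraints: list[str],
                   fmt: str = "csv") -> str:
    """Build a tabledap request: comma-separated variables, then &-joined constraints."""
    # Percent-encode characters such as '>' and '<' inside each constraint.
    encoded = "".join("&" + quote(c, safe="=") for c in constraints)
    return f"{tabledap_url}.{fmt}?{','.join(variables)}{encoded}"


grid_url = griddap_query(
    "https://example.org/erddap/griddap/sstGrid", "sst",
    [("2020-01-01T00:00:00Z", "2020-01-02T00:00:00Z"), (30, 40), (-130, -120)],
)
table_url = tabledap_query(
    "https://example.org/erddap/tabledap/buoyObs",
    ["time", "wtmp"], ["time>=2020-01-01"],
)
print(grid_url)
print(table_url)
```

The griddap brackets are left unencoded for readability. Some HTTP clients may require percent-encoding them as well.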
Can I query datasets across multiple ERDDAP servers?
Run the Actor once per server. Results will include the server field so you can identify which server each dataset came from.
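After running the Actor against several servers, the outputs can be combined into one catalog keyed on the server field. The sketch below assumes each run's output is loaded as a list of records with `server` and `datasetId` fields. `merge_runs` and the dataset ID `exampleWind` are made up for illustration.

```python
from collections import Counter


def merge_runs(*runs: list[dict]) -> list[dict]:
    """Concatenate runs, keeping the first record for each (server, datasetId)."""
    seen: set[tuple[str, str]] = set()
    merged: list[dict] = []
    for run in runs:
        for record in run:
            key = (record["server"], record["datasetId"])
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged


coastwatch = [
    {"server": "https://coastwatch.pfeg.noaa.gov/erddap", "datasetId": "erdSST3day"},
]
upwell = [
    {"server": "https://upwell.pfeg.noaa.gov/erddap", "datasetId": "erdSST3day"},
    {"server": "https://upwell.pfeg.noaa.gov/erddap", "datasetId": "exampleWind"},
]

combined = merge_runs(coastwatch, upwell, upwell)  # the repeated run adds nothing
per_server = Counter(r["server"] for r in combined)
print(len(combined), dict(per_server))
```

Keying on the pair rather than on datasetId alone keeps datasets that share an ID across servers as separate entries.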
Legal Disclaimer
This Actor accesses publicly available NOAA ERDDAP datasets. No authentication bypass or terms-of-service violation is involved. All data extracted is from public ERDDAP catalog APIs (/info/index.json). Users are responsible for ensuring their use of the extracted data complies with NOAA's Open Data policy and applicable licenses. For questions about data licensing, visit the NOAA ERDDAP documentation at https://coastwatch.pfeg.noaa.gov/erddap/information.html.
Other Scrapers by SeatSignal
- CISA Known Exploited Vulnerabilities (KEV) Scraper — Extract CVE threat intelligence
- NIST NVD Scraper — Extract NIST National Vulnerability Database
- NHTSA Vehicle Safety Scraper — Extract vehicle recalls
- OSHA Inspections Scraper — Extract OSHA inspection data
- FDA OpenFDA Scraper — Extract FDA drug and device safety
Support
For issues, questions, or feature requests, contact the Actor developer through the Apify Store.