Data.gov Dataset Search

Pricing

from $2.00 / 1,000 datasets fetched

Search 300K+ US government open datasets on Data.gov. Filter by keyword, organization, tags, and format (CSV, JSON, API). Find federal, state, and local data from EPA, NOAA, NASA, Census Bureau. Free, no API key required.

Developer

ryan clinton

Maintained by Community

Search and extract metadata from 300,000+ datasets in the official United States government open data catalog at Data.gov. This Apify actor queries the Data.gov CKAN API to find datasets by keyword, publishing organization, topic tag, or file format -- returning structured, machine-readable results with direct download links for every resource.

Whether you need environmental monitoring data from NOAA, public health datasets from the CDC, economic indicators from the Census Bureau, or transportation records from the DOT, this actor provides programmatic access to the entire Data.gov catalog. No API key is required -- the CKAN API backing Data.gov is completely free and open.

Each result includes the dataset title, description, publishing organization, tags, available file formats, individual resource download URLs, access level, creation and modification timestamps, and a direct link to the dataset page on Data.gov. Results are available in JSON, CSV, Excel, or any other format supported by the Apify platform.


  • No API integration required -- skip writing CKAN API code, handling pagination, and parsing nested responses. The actor handles all of that and returns clean, consistently structured JSON.
  • Structured output at scale -- extract up to 500 datasets per run with consistent field names, ready for databases, spreadsheets, or downstream pipelines.
  • Powerful filtering -- combine keyword search with organization, tag, and format filters to precisely target datasets from specific agencies in specific formats.
  • Automation-ready -- schedule recurring searches to monitor new datasets from agencies you track, or trigger downstream workflows via webhooks when new data appears.
  • No authentication needed -- the Data.gov CKAN API is free and open. No API keys, tokens, or registration required.
  • Direct download links -- every result includes resource URLs so you can go straight from search results to downloading CSV, JSON, XML, or API endpoints.

Key features

  • Full-text search across 300,000+ datasets from federal, state, and local government agencies
  • Organization filtering to limit results to specific agencies (e.g., epa-gov, noaa-gov, nasa-gov, census-gov)
  • Tag-based filtering for topic searches (e.g., health, environment, transportation, energy)
  • Format filtering to find datasets available as CSV, JSON, API, XML, shapefile, or other formats
  • Four sort options -- Most Popular, Most Relevant, Recently Updated, or Name A-Z
  • Resource extraction with direct download URLs, format labels, and descriptions for every file in each dataset
  • Automatic format deduplication -- resource formats are normalized to uppercase and deduplicated per dataset
  • Rate-limited pagination with 200ms delays between API pages for reliable bulk extraction
  • Access level detection -- extracts whether datasets are public, restricted public, or non-public from CKAN extras
  • Lightweight execution -- runs on 256 MB memory, completing most searches in 10-30 seconds
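
The format deduplication and access level detection listed above could look roughly like the sketch below. It is not the actor's source code: the function names are hypothetical, and it assumes each CKAN package carries a `resources` list with a `format` field and an `extras` list of key/value pairs that includes an `accessLevel` key.

```python
from __future__ import annotations

from typing import Any


def normalize_formats(resources: list[dict[str, Any]]) -> list[str]:
    """Uppercase each resource format and drop duplicates, keeping first-seen order."""
    seen: dict[str, None] = {}
    for resource in resources:
        fmt = (resource.get("format") or "").strip().upper()
        if fmt:  # skip resources with no format label
            seen.setdefault(fmt, None)
    return list(seen)


def extract_access_level(extras: list[dict[str, Any]]) -> str | None:
    """Return the value stored under the (assumed) 'accessLevel' extras key, if any."""
    for extra in extras:
        if extra.get("key") == "accessLevel":
            return extra.get("value")
    return None


# Hypothetical CKAN package fragment, for illustration only
package = {
    "resources": [{"format": "csv"}, {"format": "CSV"}, {"format": "pdf"}, {"format": ""}],
    "extras": [{"key": "accessLevel", "value": "public"}],
}
print(normalize_formats(package["resources"]))
print(extract_access_level(package["extras"]))
```

Keeping first-seen order means the `formats` list mirrors the order in which resources appear in the dataset.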

How to use

Apify Console

  1. Go to the Data.gov Dataset Search actor page on Apify.
  2. Click Try for free to open the actor in the Apify Console.
  3. Enter your search parameters:
    • Type keywords into Search Query (e.g., "air quality monitoring").
    • Optionally set Organization to restrict to one agency (e.g., "epa-gov").
    • Optionally add a Tag filter (e.g., "environment").
    • Optionally specify a Resource Format (e.g., "CSV").
    • Choose a Sort By option (default: Most Popular).
    • Set Max Results between 1 and 500 (default: 50).
  4. Click Start to run the actor.
  5. When the run finishes, view and download results from the Dataset tab in JSON, CSV, Excel, or other formats.

Python

from apify_client import ApifyClient

# Authenticate with your Apify API token
client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish
run = client.actor("ryanclinton/datagov-dataset-search").call(run_input={
    "query": "air quality monitoring",
    "organization": "epa-gov",
    "format": "CSV",
    "sortBy": "views_recent desc",
    "maxResults": 100,
})

# Print each dataset followed by its downloadable resources
for dataset in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{dataset['title']} -- {dataset['organizationTitle']}")
    for resource in dataset["resources"]:
        print(f"  {resource['format']}: {resource['url']}")

JavaScript

import { ApifyClient } from "apify-client";

// Authenticate with your Apify API token
const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

// Start the actor and wait for the run to finish
const run = await client.actor("ryanclinton/datagov-dataset-search").call({
    query: "air quality monitoring",
    organization: "epa-gov",
    format: "CSV",
    sortBy: "views_recent desc",
    maxResults: 100,
});

// Print each dataset followed by its downloadable resources
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((dataset) => {
    console.log(`${dataset.title} -- ${dataset.organizationTitle}`);
    dataset.resources.forEach((r) => console.log(`  ${r.format}: ${r.url}`));
});

Input parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| query | String | No | -- | Keywords to search for datasets (e.g., "climate change", "water quality") |
| organization | String | No | -- | Filter by publishing organization slug (e.g., "epa-gov", "noaa-gov") |
| tags | String | No | -- | Filter by a single dataset tag (e.g., "health", "environment", "transportation") |
| format | String | No | -- | Filter by resource format (e.g., "CSV", "JSON", "API", "XML") |
| sortBy | String | No | views_recent desc | Sort order: "views_recent desc" (Most Popular), "score desc" (Most Relevant), "metadata_modified desc" (Recently Updated), or "name asc" (Name A-Z) |
| maxResults | Integer | No | 50 | Maximum number of datasets to return (1--500) |

Example input

{
    "query": "water quality monitoring",
    "organization": "epa-gov",
    "tags": "environment",
    "format": "CSV",
    "sortBy": "metadata_modified desc",
    "maxResults": 100
}

Tips

  • If no query or filters are provided, the actor returns the most popular datasets across the entire catalog.
  • Organization slugs follow the format agency-gov. Common examples: epa-gov, noaa-gov, nasa-gov, census-gov, usda-gov, dot-gov, hhs-gov, doe-gov, dod-gov, dhs-gov.
  • Combine multiple filters for precise results -- for example, query: "PM2.5" with organization: "epa-gov" and format: "CSV".
  • Use sortBy: "metadata_modified desc" to find actively maintained, recently refreshed datasets.
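
To avoid typos in `sortBy` values or out-of-range limits, the input can be assembled by a small helper before the run starts. The sketch below is one possible approach: `build_input` and `SORT_OPTIONS` are hypothetical names, while the sort values and the 1 to 500 range come from this page.

```python
from __future__ import annotations

# Console labels mapped to the sortBy values documented on this page
SORT_OPTIONS = {
    "Most Popular": "views_recent desc",
    "Most Relevant": "score desc",
    "Recently Updated": "metadata_modified desc",
    "Name A-Z": "name asc",
}


def build_input(query: str | None = None, organization: str | None = None,
                tags: str | None = None, fmt: str | None = None,
                sort_label: str = "Most Popular", max_results: int = 50) -> dict:
    """Build an actor input dict, rejecting unknown sort labels and bad limits."""
    if sort_label not in SORT_OPTIONS:
        raise ValueError(f"Unknown sort option: {sort_label!r}")
    if not 1 <= max_results <= 500:
        raise ValueError("maxResults must be between 1 and 500")
    run_input = {"sortBy": SORT_OPTIONS[sort_label], "maxResults": max_results}
    optional = {"query": query, "organization": organization, "tags": tags, "format": fmt}
    # Leave out filters that were not provided so the actor applies its defaults
    run_input.update({key: value for key, value in optional.items() if value})
    return run_input


print(build_input(query="PM2.5", organization="epa-gov", fmt="CSV",
                  sort_label="Recently Updated", max_results=100))
```

The returned dict can be passed as `run_input` to the Python client call shown earlier.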

Output

Example output

{
    "datasetId": "e4c3b2a1-5f6d-7890-abcd-ef1234567890",
    "title": "U.S. Hourly Climate Normals (1991-2020)",
    "description": "The U.S. Climate Normals are a large suite of data products that provide information about typical climate conditions for thousands of weather station locations across the United States. Climate normals act as a baseline for evaluating how current weather and climate conditions compare to what is normal or expected.",
    "organization": "noaa-gov",
    "organizationTitle": "National Oceanic and Atmospheric Administration, Department of Commerce",
    "tags": [
        "climate",
        "normals",
        "hourly",
        "temperature",
        "precipitation",
        "wind"
    ],
    "formats": ["CSV", "PDF", "HTML"],
    "resources": [
        {
            "name": "Hourly Climate Normals - Temperature",
            "format": "CSV",
            "url": "https://www.ncei.noaa.gov/data/normals-hourly/archive/us-climate-normals_hourly-temperature.csv",
            "description": "Hourly temperature normals for U.S. weather stations"
        },
        {
            "name": "Documentation",
            "format": "PDF",
            "url": "https://www.ncei.noaa.gov/data/normals-hourly/doc/Normals_Hourly_Documentation.pdf",
            "description": "Technical documentation for hourly climate normals"
        }
    ],
    "resourceCount": 5,
    "created": "2021-05-15T14:30:00.000000",
    "modified": "2024-08-22T09:15:00.000000",
    "accessLevel": "public",
    "datagovUrl": "https://catalog.data.gov/dataset/u-s-hourly-climate-normals-1991-2020",
    "extractedAt": "2026-02-19T12:00:00.000Z"
}

Output fields

| Field | Type | Description |
| --- | --- | --- |
| datasetId | String | Unique CKAN identifier for the dataset |
| title | String | Dataset title as published on Data.gov |
| description | String | Dataset description (truncated to 1,000 characters) |
| organization | String | Organization slug (e.g., "noaa-gov") |
| organizationTitle | String | Full organization name (e.g., "National Oceanic and Atmospheric Administration, Department of Commerce") |
| tags | Array | List of topic tags assigned to the dataset |
| formats | Array | Deduplicated list of resource formats in uppercase (e.g., ["CSV", "JSON", "PDF"]) |
| resources | Array | Individual resource files with name, format, url, and description |
| resourceCount | Integer | Total number of resources (files/endpoints) in the dataset |
| created | String | ISO timestamp when the dataset was first published |
| modified | String | ISO timestamp when the dataset was last updated |
| accessLevel | String | Access classification -- typically "public", "restricted public", or "non-public" |
| datagovUrl | String | Direct URL to the dataset page on catalog.data.gov |
| extractedAt | String | ISO timestamp when the actor extracted this record |
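
Because `resources` is a nested array, tools that expect one row per file may need it flattened first. The sketch below shows one way to do that with the standard library. The helper names and the chosen columns are illustrative, and the field names are the ones from the output table above.

```python
from __future__ import annotations

import csv
import io
from typing import Any, Iterable, Iterator

COLUMNS = ["datasetId", "title", "organization", "format", "url"]


def resource_rows(records: Iterable[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield one flat row per resource, repeating the parent dataset fields."""
    for record in records:
        for resource in record.get("resources", []):
            yield {
                "datasetId": record.get("datasetId"),
                "title": record.get("title"),
                "organization": record.get("organization"),
                "format": resource.get("format"),
                "url": resource.get("url"),
            }


def to_csv(records: Iterable[dict[str, Any]]) -> str:
    """Serialize the flattened rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(resource_rows(records))
    return buffer.getvalue()


# Trimmed-down record in the shape of the example output above
sample = [{
    "datasetId": "e4c3b2a1-5f6d-7890-abcd-ef1234567890",
    "title": "U.S. Hourly Climate Normals (1991-2020)",
    "organization": "noaa-gov",
    "resources": [
        {"format": "CSV", "url": "https://www.ncei.noaa.gov/data/normals-hourly/archive/us-climate-normals_hourly-temperature.csv"},
        {"format": "PDF", "url": "https://www.ncei.noaa.gov/data/normals-hourly/doc/Normals_Hourly_Documentation.pdf"},
    ],
}]
print(to_csv(sample))
```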

Use cases

  • Academic research -- survey all government datasets on a topic like "climate change" or "opioid epidemic" to find primary data sources for papers and dissertations.
  • Journalism and FOIA work -- discover publicly available government datasets relevant to investigative stories before filing records requests.
  • Data science project sourcing -- find high-quality, structured government data in CSV or JSON format for machine learning training, analysis, or visualization projects.
  • Policy analysis -- locate datasets from specific agencies to evaluate the impact of federal programs, regulations, or spending initiatives.
  • Grant proposal research -- identify existing government datasets relevant to proposed research to demonstrate awareness of prior data and avoid duplication.
  • Open data monitoring -- schedule recurring runs to track new dataset publications from agencies like EPA, NOAA, or the Census Bureau and get notified when new data appears.
  • Data pipeline integration -- automatically discover and catalog government data sources, then feed resource URLs into ETL pipelines for regular data ingestion.
  • Civic technology -- find municipal and state government datasets for building public-facing applications, dashboards, and transparency tools.
  • Competitive intelligence -- monitor government contract, spending, and procurement datasets to track agency priorities and funding patterns.
  • Environmental compliance -- locate EPA monitoring data, emissions inventories, and facility-level environmental datasets for regulatory compliance work.

API & integration

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Call the actor by its ID and wait for the run to finish
run = client.actor("1pdOsFEBvCm5RzMfM").call(run_input={
    "query": "greenhouse gas emissions",
    "organization": "epa-gov",
    "format": "CSV",
    "maxResults": 200,
})

# Summarize each dataset returned by the run
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{item['title']} ({item['resourceCount']} resources)")

JavaScript

import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

// Call the actor by its ID and wait for the run to finish
const run = await client.actor("1pdOsFEBvCm5RzMfM").call({
    query: "greenhouse gas emissions",
    organization: "epa-gov",
    format: "CSV",
    maxResults: 200,
});

// Fetch the results and report how many datasets matched
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Found ${items.length} datasets`);

cURL

curl -X POST "https://api.apify.com/v2/acts/1pdOsFEBvCm5RzMfM/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "greenhouse gas emissions",
    "organization": "epa-gov",
    "format": "CSV",
    "maxResults": 200
  }'

Integrations

Connect Data.gov Dataset Search to your existing tools and workflows:

  • Google Sheets -- automatically export results to a spreadsheet after each run
  • Slack -- receive notifications when new datasets match your search criteria
  • Webhooks -- trigger custom workflows when the actor finishes
  • Zapier / Make -- connect to 5,000+ apps through Apify's integration platform
  • Amazon S3 -- save results directly to cloud storage for data lake ingestion
  • Google BigQuery -- load dataset metadata into BigQuery for SQL-based analysis

How it works

  1. Parse input -- the actor reads your search parameters (query, organization, tags, format, sort order, max results).
  2. Build CKAN query -- keywords, tags, and format filters are combined into a CKAN search query string. Organization filters are applied as fq (filter query) parameters.
  3. Paginate API calls -- the actor requests up to 100 datasets per page from the Data.gov CKAN API at catalog.data.gov/api/3/action/package_search, continuing until it reaches your maxResults limit or exhausts matching results.
  4. Rate limiting -- a 200ms delay is inserted between each API page request to respect the Data.gov API servers.
  5. Transform results -- each raw CKAN package is transformed into a clean output object with normalized fields, deduplicated formats, extracted access levels, and constructed Data.gov URLs.
  6. Push to dataset -- all transformed results are pushed to the Apify dataset for download in JSON, CSV, Excel, or other formats.

Data.gov Dataset Search Pipeline
================================

Input Parameters        CKAN API Requests        Transform & Output
==================      =================        ==================
[ query        ]        catalog.data.gov         [ datasetId     ]
[ organization ]  --->  /api/3/action/    --->   [ title         ]
[ tags         ]        package_search           [ description   ]
[ format       ]                                 [ organization  ]
[ sortBy       ]        +-- Page 1 (100)         [ tags[]        ]
[ maxResults   ]        +-- Page 2 (100)         [ formats[]     ]
                        +-- Page 3 (100)         [ resources[]   ]
                        +-- ...                  [ resourceCount ]
                        |                        [ created       ]
                        200ms delay              [ modified      ]
                        between pages            [ accessLevel   ]
                                                 [ datagovUrl    ]
                                                 [ extractedAt   ]
                                                        |
                                                        v
                                                  Apify Dataset
                                                 (JSON/CSV/Excel)
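
The query-building and pagination steps above can be sketched as follows. This is an approximation under stated assumptions, not the actor's implementation: the helper names are hypothetical, and the way keywords, tags, and formats are folded into `q` (including the `tags` and `res_format` field names) is a guess at a reasonable CKAN query. The endpoint, the 100-per-page size, the `fq` organization filter, and the 200ms delay come from the description above.

```python
from __future__ import annotations

import json
import time
import urllib.parse
import urllib.request
from typing import Any, Callable

API_URL = "https://catalog.data.gov/api/3/action/package_search"
PAGE_SIZE = 100  # datasets requested per page


def build_params(query: str = "", organization: str = "", tags: str = "",
                 fmt: str = "", sort_by: str = "views_recent desc",
                 rows: int = PAGE_SIZE, start: int = 0) -> dict[str, str]:
    """Fold keyword, tag, and format into q; send the organization as fq."""
    terms = [query] if query else []
    if tags:
        terms.append(f'tags:"{tags}"')        # assumed field name
    if fmt:
        terms.append(f'res_format:"{fmt}"')   # assumed field name
    params = {"q": " ".join(terms) or "*:*", "sort": sort_by,
              "rows": str(rows), "start": str(start)}
    if organization:
        params["fq"] = f'organization:"{organization}"'
    return params


def fetch_page_http(params: dict[str, str]) -> dict[str, Any]:
    """Call package_search and return its 'result' object (count and results)."""
    url = f"{API_URL}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)["result"]


def search(fetch_page: Callable[[dict[str, str]], dict[str, Any]],
           max_results: int = 50, delay_seconds: float = 0.2,
           **filters: str) -> list[dict[str, Any]]:
    """Page through results until max_results is reached or matches run out."""
    collected: list[dict[str, Any]] = []
    start = 0
    while True:
        rows = min(PAGE_SIZE, max_results - len(collected))
        result = fetch_page(build_params(rows=rows, start=start, **filters))
        page = result.get("results", [])
        collected.extend(page)
        start += len(page)
        if not page or len(collected) >= max_results or start >= result.get("count", 0):
            break
        time.sleep(delay_seconds)  # pause between pages, as in step 4
    return collected[:max_results]
```

Passing the fetch function as an argument keeps the loop testable offline. A live call would look like `search(fetch_page_http, max_results=150, query="air quality", organization="epa-gov")`.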

Performance & cost

| Scenario | Max Results | Typical Duration | Memory | Estimated Cost |
| --- | --- | --- | --- | --- |
| Quick search | 10 | 5--8 seconds | 256 MB | ~$0.0003 |
| Default search | 50 | 10--15 seconds | 256 MB | ~$0.0005 |
| Medium batch | 200 | 20--40 seconds | 256 MB | ~$0.0015 |
| Full extraction | 500 | 60--120 seconds | 256 MB | ~$0.003 |
  • Free tier: Apify provides $5 of free platform credit monthly -- enough for approximately 10,000 default runs or 1,600 full extractions at the estimated costs above.
  • No external API costs: The Data.gov CKAN API is completely free with no rate limits enforced at the query level.
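
As a rough check on how far the free credit stretches, the estimated per-run costs from the table can be divided into the $5 monthly credit. The snippet below does only that arithmetic. The results are upper bounds based on the table's estimates, so actual usage may come out lower.

```python
from decimal import Decimal

FREE_CREDIT_USD = Decimal("5.00")

# Estimated cost per run, copied from the table above
ESTIMATED_COST_USD = {
    "Quick search (10 results)": Decimal("0.0003"),
    "Default search (50 results)": Decimal("0.0005"),
    "Medium batch (200 results)": Decimal("0.0015"),
    "Full extraction (500 results)": Decimal("0.003"),
}


def runs_within_credit(credit: Decimal, cost_per_run: Decimal) -> int:
    """Whole number of runs that fit inside the credit at the given cost."""
    return int(credit // cost_per_run)


for scenario, cost in ESTIMATED_COST_USD.items():
    print(f"{scenario}: about {runs_within_credit(FREE_CREDIT_USD, cost):,} runs")
```

`Decimal` is used instead of floats so that the division is not thrown off by binary rounding.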

Limitations

  • 500 dataset limit per run -- the actor caps results at 500 per execution. For broader coverage, run multiple searches with different filters or sort orders.
  • Metadata only -- the actor extracts dataset metadata and resource URLs, not the actual data files. Use the resource URLs to download the files in a separate step.
  • Description truncation -- dataset descriptions are truncated to 1,000 characters to keep output sizes manageable.
  • Single tag filter -- only one tag can be applied per run. If you need multi-tag filtering, use keyword search with tag names in the query field.
  • Organization slug format -- organization filters use CKAN slugs (e.g., "epa-gov"), not full agency names. Check Data.gov for the correct slug.
  • CKAN API availability -- the actor depends on the Data.gov CKAN API being online. Government API outages during maintenance windows or shutdowns may cause failures.
  • No geospatial search -- the actor searches by text, tags, and organization but does not support bounding box or geographic filtering.
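
One way to work around the single-tag and 500-result limits is to run the actor once per tag and combine the outputs afterwards. The sketch below merges result lists by `datasetId` and can then keep only the datasets that carry every requested tag. The helper names are hypothetical, and the input is assumed to be lists of records in the documented output shape.

```python
from __future__ import annotations

from typing import Any, Iterable


def merge_runs(*runs: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Combine several result lists, keeping the first record seen per datasetId."""
    merged: dict[str, dict[str, Any]] = {}
    for items in runs:
        for item in items:
            merged.setdefault(item["datasetId"], item)
    return list(merged.values())


def with_all_tags(records: Iterable[dict[str, Any]], required: Iterable[str]) -> list[dict[str, Any]]:
    """Keep records whose tags include every required tag (case-insensitive)."""
    wanted = {tag.lower() for tag in required}
    return [record for record in records
            if wanted <= {tag.lower() for tag in record.get("tags", [])}]


# Hypothetical outputs from two runs, one per tag
health_run = [{"datasetId": "a", "tags": ["health", "environment"]},
              {"datasetId": "b", "tags": ["health"]}]
environment_run = [{"datasetId": "a", "tags": ["health", "environment"]},
                   {"datasetId": "c", "tags": ["Environment"]}]

combined = merge_runs(health_run, environment_run)
print([record["datasetId"] for record in combined])
print([record["datasetId"] for record in with_all_tags(combined, ["health", "environment"])])
```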

Responsible use

  • Respect government infrastructure -- while the Data.gov API has no strict rate limits, avoid running excessive concurrent requests. The actor includes built-in 200ms delays between pages.
  • Attribution -- when using government datasets in publications or applications, credit the publishing agency and link back to the original dataset on Data.gov.
  • Data accuracy -- government datasets vary in quality, timeliness, and completeness. Always verify data freshness by checking the modified timestamp and reading the dataset documentation.
  • Terms of use -- most Data.gov datasets are in the public domain, but some may have specific license terms. Check the access level and any associated license before redistribution.
  • Scheduling frequency -- for monitoring new datasets, daily or weekly schedules are sufficient. The catalog does not update frequently enough to warrant hourly polling.
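
For the freshness check and for scheduled monitoring, results can be filtered by their `modified` timestamp after each run. The sketch below assumes the timestamps are timezone-naive ISO strings like the one in the example output. `modified_since` is a hypothetical helper name.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any, Iterable


def modified_since(records: Iterable[dict[str, Any]], cutoff: datetime) -> list[dict[str, Any]]:
    """Keep records whose 'modified' timestamp is at or after the cutoff."""
    fresh = []
    for record in records:
        stamp = record.get("modified")
        if stamp and datetime.fromisoformat(stamp) >= cutoff:
            fresh.append(record)
    return fresh


# Hypothetical records using the timestamp format from the example output
records = [
    {"title": "Recently refreshed", "modified": "2024-08-22T09:15:00.000000"},
    {"title": "Stale", "modified": "2021-05-15T14:30:00.000000"},
    {"title": "No timestamp"},
]
for record in modified_since(records, datetime(2024, 1, 1)):
    print(record["title"])
```

On a daily or weekly schedule, the cutoff would typically be the time of the previous run.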

FAQ

Do I need a Data.gov API key to use this actor? No. The Data.gov CKAN API is completely free and open. No API key, registration, or authentication is required.

What types of datasets are available on Data.gov? Data.gov contains 300,000+ datasets from hundreds of federal agencies, state governments, and local authorities. Topics include environmental monitoring, public health, economic indicators, transportation, education, energy, agriculture, housing, criminal justice, and more. Formats range from CSV and JSON to APIs, shapefiles, KML, and PDFs.

Can I download the actual data files using this actor? The actor returns metadata including direct download URLs in the resources array. Use those URLs to download the actual data files. For automated downloading, feed the URLs into a script or another Apify actor.
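
A minimal download script along those lines might look like the sketch below. It uses only the Python standard library, the helper names are hypothetical, and it assumes each URL points directly at a file. Some resource URLs on Data.gov lead to landing pages or APIs instead, so results should be checked.

```python
from __future__ import annotations

import os
import urllib.parse
import urllib.request
from typing import Any, Iterable


def pick_urls(records: Iterable[dict[str, Any]], fmt: str = "CSV") -> list[str]:
    """Collect resource URLs whose format matches fmt (case-insensitive)."""
    return [resource["url"]
            for record in records
            for resource in record.get("resources", [])
            if resource.get("url") and (resource.get("format") or "").upper() == fmt.upper()]


def local_name(url: str) -> str:
    """Derive a file name from the URL path, with a fallback for bare paths."""
    return os.path.basename(urllib.parse.urlparse(url).path) or "download"


def download_all(urls: Iterable[str], dest_dir: str) -> list[str]:
    """Download each URL into dest_dir and return the saved file paths."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        path = os.path.join(dest_dir, local_name(url))
        urllib.request.urlretrieve(url, path)  # performs the network request
        saved.append(path)
    return saved
```

With `items` holding the records from a run, `download_all(pick_urls(items, "CSV"), "downloads")` would save every CSV resource into a local `downloads` folder.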

How often is the Data.gov catalog updated? The catalog is continuously updated as federal agencies publish new datasets and refresh existing ones. Sort by "Recently Updated" to find the latest additions and modifications.

What is the maximum number of results I can retrieve? 500 datasets per run. For broader coverage, run the actor multiple times with different search criteria, organizations, or sort orders.

How do I find the correct organization slug? Organization slugs follow the pattern agency-gov. Common examples: epa-gov, noaa-gov, nasa-gov, census-gov, usda-gov, dot-gov. You can also visit an agency's page on Data.gov and check the URL for the exact slug.

Can I search for datasets from state or local governments? Yes. Many state and local government organizations publish data on Data.gov. Include state or city names in your keyword search, or use the organization filter if you know the entity's CKAN slug.

What sort options are available? Four sort options: "Most Popular" (views_recent desc -- default), "Most Relevant" (score desc -- best for keyword searches), "Recently Updated" (metadata_modified desc), and "Name A-Z" (name asc).

Can I filter by multiple tags at once? The tag filter supports one tag per run. To search across multiple topics, include tag names as keywords in the query field (e.g., "health environment") or run the actor multiple times with different tag values.

Does this actor work during government shutdowns? The Data.gov API may experience downtime during federal government shutdowns or maintenance periods. If the API is unavailable, the actor will return an error. Re-run when service is restored.

How do I schedule this actor to run automatically? In the Apify Console, click Schedule on the actor run page to set up daily, weekly, or custom cron-based schedules. You can also configure webhooks to notify external services when each run completes.

What does the accessLevel field mean? The accessLevel field indicates the dataset's openness classification from the CKAN extras metadata. Values are typically "public" (freely available), "restricted public" (available with conditions), or "non-public" (limited access). Most Data.gov datasets are public.


| Actor | Description |
| --- | --- |
| FRED Economic Data Search | Search the Federal Reserve Economic Data database for U.S. economic time series. Pairs with Data.gov for comprehensive economic research. |
| World Bank Development Indicators | Search World Bank indicators for international economic and social data. Compare U.S. government data with global benchmarks. |
| BLS US Economic Data Search | Search Bureau of Labor Statistics for employment, inflation, and productivity data. Complements Data.gov labor and economic datasets. |
| USAspending Federal Spending | Search federal spending and contract data from USAspending.gov. Combine with Data.gov to correlate agency budgets with published datasets. |
| Congress Bill Search | Search congressional legislation and bill text. Cross-reference with Data.gov datasets to track data mandates in federal law. |
| Federal Register Search | Search the Federal Register for proposed and final rules. Discover regulatory data requirements that lead to new Data.gov publications. |