Data.gov Dataset Search
Pricing
from $2.00 / 1,000 datasets fetched
Search 300K+ US government open datasets on Data.gov. Filter by keyword, organization, tags, and format (CSV, JSON, API). Find federal, state, and local data from EPA, NOAA, NASA, Census Bureau. Free, no API key required.
Developer
ryan clinton
Search and extract metadata from 300,000+ datasets in the official United States government open data catalog at Data.gov. This Apify actor queries the Data.gov CKAN API to find datasets by keyword, publishing organization, topic tag, or file format -- returning structured, machine-readable results with direct download links for every resource.
Whether you need environmental monitoring data from NOAA, public health datasets from the CDC, economic indicators from the Census Bureau, or transportation records from the DOT, this actor provides programmatic access to the entire Data.gov catalog. No API key is required -- the CKAN API backing Data.gov is completely free and open.
Each result includes the dataset title, description, publishing organization, tags, available file formats, individual resource download URLs, access level, creation and modification timestamps, and a direct link to the dataset page on Data.gov. Results are available in JSON, CSV, Excel, or any other format supported by the Apify platform.
Why use Data.gov Dataset Search?
- No API integration required -- skip writing CKAN API code, handling pagination, and parsing nested responses. The actor handles all of that and returns clean, consistently structured JSON.
- Structured output at scale -- extract up to 500 datasets per run with consistent field names, ready for databases, spreadsheets, or downstream pipelines.
- Powerful filtering -- combine keyword search with organization, tag, and format filters to precisely target datasets from specific agencies in specific formats.
- Automation-ready -- schedule recurring searches to monitor new datasets from agencies you track, or trigger downstream workflows via webhooks when new data appears.
- No authentication needed -- the Data.gov CKAN API is free and open. No API keys, tokens, or registration required.
- Direct download links -- every result includes resource URLs so you can go straight from search results to downloading CSV, JSON, XML, or API endpoints.
Key features
- Full-text search across 300,000+ datasets from federal, state, and local government agencies
- Organization filtering to limit results to specific agencies (e.g., `epa-gov`, `noaa-gov`, `nasa-gov`, `census-gov`)
- Tag-based filtering for topic searches (e.g., health, environment, transportation, energy)
- Format filtering to find datasets available as CSV, JSON, API, XML, shapefile, or other formats
- Four sort options -- Most Popular, Most Relevant, Recently Updated, or Name A-Z
- Resource extraction with direct download URLs, format labels, and descriptions for every file in each dataset
- Automatic format deduplication -- resource formats are normalized to uppercase and deduplicated per dataset
- Rate-limited pagination with 200ms delays between API pages for reliable bulk extraction
- Access level detection -- extracts whether datasets are public, restricted public, or non-public from CKAN extras
- Lightweight execution -- runs on 256 MB memory, completing most searches in 10-30 seconds
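The format deduplication described above can be sketched in a few lines of Python. This is an illustrative sketch, not the actor's actual source: the function name `normalize_formats`, the first-seen ordering, and the skipping of empty labels are all assumptions.

```python
def normalize_formats(resources):
    """Return unique, uppercased format labels in first-seen order.

    Assumption: resources without a format label are skipped.
    """
    seen = set()
    formats = []
    for resource in resources:
        label = (resource.get("format") or "").strip().upper()
        if label and label not in seen:
            seen.add(label)
            formats.append(label)
    return formats


# Hypothetical resource list mixing cases, padding, and missing labels.
sample_resources = [
    {"format": "csv"},
    {"format": "CSV"},
    {"format": " pdf "},
    {"format": ""},
    {"format": None},
]
print(normalize_formats(sample_resources))  # ['CSV', 'PDF']
```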
How to use
Apify Console
- Go to the Data.gov Dataset Search actor page on Apify.
- Click Try for free to open the actor in the Apify Console.
- Enter your search parameters:
- Type keywords into Search Query (e.g., "air quality monitoring").
- Optionally set Organization to restrict to one agency (e.g., "epa-gov").
- Optionally add a Tag filter (e.g., "environment").
- Optionally specify a Resource Format (e.g., "CSV").
- Choose a Sort By option (default: Most Popular).
- Set Max Results between 1 and 500 (default: 50).
- Click Start to run the actor.
- When the run finishes, view and download results from the Dataset tab in JSON, CSV, Excel, or other formats.
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("ryanclinton/datagov-dataset-search").call(
    run_input={
        "query": "air quality monitoring",
        "organization": "epa-gov",
        "format": "CSV",
        "sortBy": "views_recent desc",
        "maxResults": 100,
    }
)

# Print each dataset followed by its downloadable resources.
for dataset in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{dataset['title']} -- {dataset['organizationTitle']}")
    for resource in dataset["resources"]:
        print(f"  {resource['format']}: {resource['url']}")
```
JavaScript
```javascript
import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

// Start the actor and wait for the run to finish.
const run = await client.actor("ryanclinton/datagov-dataset-search").call({
    query: "air quality monitoring",
    organization: "epa-gov",
    format: "CSV",
    sortBy: "views_recent desc",
    maxResults: 100,
});

// Print each dataset followed by its downloadable resources.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((dataset) => {
    console.log(`${dataset.title} -- ${dataset.organizationTitle}`);
    dataset.resources.forEach((r) => console.log(`  ${r.format}: ${r.url}`));
});
```
Input parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `query` | String | No | -- | Keywords to search for datasets (e.g., "climate change", "water quality") |
| `organization` | String | No | -- | Filter by publishing organization slug (e.g., "epa-gov", "noaa-gov") |
| `tags` | String | No | -- | Filter by a single dataset tag (e.g., "health", "environment", "transportation") |
| `format` | String | No | -- | Filter by resource format (e.g., "CSV", "JSON", "API", "XML") |
| `sortBy` | String | No | `views_recent desc` | Sort order: `views_recent desc` (Most Popular), `score desc` (Most Relevant), `metadata_modified desc` (Recently Updated), or `name asc` (Name A-Z) |
| `maxResults` | Integer | No | 50 | Maximum number of datasets to return (1--500) |
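As a rough illustration of how these parameters fit together, the sketch below builds a run input from the Console's sort labels and checks the `maxResults` range before a run is started. The helper name `build_run_input` and its argument names are hypothetical; the sort strings are the ones listed in the FAQ.

```python
# Console label -> sortBy value, as listed in the FAQ.
SORT_OPTIONS = {
    "Most Popular": "views_recent desc",
    "Most Relevant": "score desc",
    "Recently Updated": "metadata_modified desc",
    "Name A-Z": "name asc",
}


def build_run_input(query=None, organization=None, tags=None,
                    resource_format=None, sort_label="Most Popular",
                    max_results=50):
    """Build an actor input dict, omitting filters that are not set."""
    if sort_label not in SORT_OPTIONS:
        raise ValueError(f"Unknown sort option: {sort_label!r}")
    if not 1 <= max_results <= 500:
        raise ValueError("maxResults must be between 1 and 500")

    run_input = {"sortBy": SORT_OPTIONS[sort_label], "maxResults": max_results}
    optional = {
        "query": query,
        "organization": organization,
        "tags": tags,
        "format": resource_format,
    }
    # Only include filters the caller actually provided.
    run_input.update({key: value for key, value in optional.items() if value})
    return run_input


print(build_run_input(query="water quality", organization="epa-gov",
                      sort_label="Recently Updated", max_results=100))
```

The resulting dictionary can be passed as `run_input` to the Apify client call shown in the Python example.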
Example input
```json
{
    "query": "water quality monitoring",
    "organization": "epa-gov",
    "tags": "environment",
    "format": "CSV",
    "sortBy": "metadata_modified desc",
    "maxResults": 100
}
```
Tips
- If no query or filters are provided, the actor returns the most popular datasets across the entire catalog.
- Organization slugs follow the format `agency-gov`. Common examples: `epa-gov`, `noaa-gov`, `nasa-gov`, `census-gov`, `usda-gov`, `dot-gov`, `hhs-gov`, `doe-gov`, `dod-gov`, `dhs-gov`.
- Combine multiple filters for precise results -- for example, `query: "PM2.5"` with `organization: "epa-gov"` and `format: "CSV"`.
- Use `sortBy: "metadata_modified desc"` to find actively maintained, recently refreshed datasets.
Output
Example output
```json
{
    "datasetId": "e4c3b2a1-5f6d-7890-abcd-ef1234567890",
    "title": "U.S. Hourly Climate Normals (1991-2020)",
    "description": "The U.S. Climate Normals are a large suite of data products that provide information about typical climate conditions for thousands of weather station locations across the United States. Climate normals act as a baseline for evaluating how current weather and climate conditions compare to what is normal or expected.",
    "organization": "noaa-gov",
    "organizationTitle": "National Oceanic and Atmospheric Administration, Department of Commerce",
    "tags": ["climate", "normals", "hourly", "temperature", "precipitation", "wind"],
    "formats": ["CSV", "PDF", "HTML"],
    "resources": [
        {
            "name": "Hourly Climate Normals - Temperature",
            "format": "CSV",
            "url": "https://www.ncei.noaa.gov/data/normals-hourly/archive/us-climate-normals_hourly-temperature.csv",
            "description": "Hourly temperature normals for U.S. weather stations"
        },
        {
            "name": "Documentation",
            "format": "PDF",
            "url": "https://www.ncei.noaa.gov/data/normals-hourly/doc/Normals_Hourly_Documentation.pdf",
            "description": "Technical documentation for hourly climate normals"
        }
    ],
    "resourceCount": 5,
    "created": "2021-05-15T14:30:00.000000",
    "modified": "2024-08-22T09:15:00.000000",
    "accessLevel": "public",
    "datagovUrl": "https://catalog.data.gov/dataset/u-s-hourly-climate-normals-1991-2020",
    "extractedAt": "2026-02-19T12:00:00.000Z"
}
```
Output fields
| Field | Type | Description |
|---|---|---|
| `datasetId` | String | Unique CKAN identifier for the dataset |
| `title` | String | Dataset title as published on Data.gov |
| `description` | String | Dataset description (truncated to 1,000 characters) |
| `organization` | String | Organization slug (e.g., "noaa-gov") |
| `organizationTitle` | String | Full organization name (e.g., "National Oceanic and Atmospheric Administration, Department of Commerce") |
| `tags` | Array | List of topic tags assigned to the dataset |
| `formats` | Array | Deduplicated list of resource formats in uppercase (e.g., ["CSV", "JSON", "PDF"]) |
| `resources` | Array | Individual resource files with `name`, `format`, `url`, and `description` |
| `resourceCount` | Integer | Total number of resources (files/endpoints) in the dataset |
| `created` | String | ISO timestamp when the dataset was first published |
| `modified` | String | ISO timestamp when the dataset was last updated |
| `accessLevel` | String | Access classification -- typically "public", "restricted public", or "non-public" |
| `datagovUrl` | String | Direct URL to the dataset page on catalog.data.gov |
| `extractedAt` | String | ISO timestamp when the actor extracted this record |
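Because each record nests its files under `resources`, a common first step is to flatten the output into one row per downloadable file. The sketch below is one way to do that for a single format; the function name `resource_urls` and the sample records (with placeholder `example.org` URLs) are illustrative assumptions, while the field names follow the table above.

```python
def resource_urls(items, wanted_format):
    """Return one row per resource whose format matches wanted_format."""
    wanted = wanted_format.strip().upper()
    rows = []
    for item in items:
        for resource in item.get("resources", []):
            label = (resource.get("format") or "").strip().upper()
            if label == wanted and resource.get("url"):
                rows.append({
                    "datasetId": item["datasetId"],
                    "title": item["title"],
                    "url": resource["url"],
                })
    return rows


# Hypothetical actor output, trimmed to the fields used here.
sample_items = [
    {
        "datasetId": "a1",
        "title": "Hourly Normals",
        "resources": [
            {"format": "CSV", "url": "https://example.org/normals.csv"},
            {"format": "PDF", "url": "https://example.org/normals.pdf"},
        ],
    },
    {
        "datasetId": "b2",
        "title": "Station List",
        "resources": [{"format": "csv", "url": "https://example.org/stations.csv"}],
    },
]

for row in resource_urls(sample_items, "csv"):
    print(row["datasetId"], row["url"])
```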
Use cases
- Academic research -- survey all government datasets on a topic like "climate change" or "opioid epidemic" to find primary data sources for papers and dissertations.
- Journalism and FOIA work -- discover publicly available government datasets relevant to investigative stories before filing records requests.
- Data science project sourcing -- find high-quality, structured government data in CSV or JSON format for machine learning training, analysis, or visualization projects.
- Policy analysis -- locate datasets from specific agencies to evaluate the impact of federal programs, regulations, or spending initiatives.
- Grant proposal research -- identify existing government datasets relevant to proposed research to demonstrate awareness of prior data and avoid duplication.
- Open data monitoring -- schedule recurring runs to track new dataset publications from agencies like EPA, NOAA, or the Census Bureau and get notified when new data appears.
- Data pipeline integration -- automatically discover and catalog government data sources, then feed resource URLs into ETL pipelines for regular data ingestion.
- Civic technology -- find municipal and state government datasets for building public-facing applications, dashboards, and transparency tools.
- Competitive intelligence -- monitor government contract, spending, and procurement datasets to track agency priorities and funding patterns.
- Environmental compliance -- locate EPA monitoring data, emissions inventories, and facility-level environmental datasets for regulatory compliance work.
API & integration
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Call the actor by its ID and wait for the run to finish.
run = client.actor("1pdOsFEBvCm5RzMfM").call(
    run_input={
        "query": "greenhouse gas emissions",
        "organization": "epa-gov",
        "format": "CSV",
        "maxResults": 200,
    }
)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{item['title']} ({item['resourceCount']} resources)")
```
JavaScript
```javascript
import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

// Call the actor by its ID and wait for the run to finish.
const run = await client.actor("1pdOsFEBvCm5RzMfM").call({
    query: "greenhouse gas emissions",
    organization: "epa-gov",
    format: "CSV",
    maxResults: 200,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Found ${items.length} datasets`);
```
cURL
```bash
curl -X POST "https://api.apify.com/v2/acts/1pdOsFEBvCm5RzMfM/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "greenhouse gas emissions",
    "organization": "epa-gov",
    "format": "CSV",
    "maxResults": 200
  }'
```
Integrations
Connect Data.gov Dataset Search to your existing tools and workflows:
- Google Sheets -- automatically export results to a spreadsheet after each run
- Slack -- receive notifications when new datasets match your search criteria
- Webhooks -- trigger custom workflows when the actor finishes
- Zapier / Make -- connect to 5,000+ apps through Apify's integration platform
- Amazon S3 -- save results directly to cloud storage for data lake ingestion
- Google BigQuery -- load dataset metadata into BigQuery for SQL-based analysis
How it works
- Parse input -- the actor reads your search parameters (query, organization, tags, format, sort order, max results).
- Build CKAN query -- keywords, tags, and format filters are combined into a CKAN search query string. Organization filters are applied as `fq` (filter query) parameters.
- Paginate API calls -- the actor requests up to 100 datasets per page from the Data.gov CKAN API at `catalog.data.gov/api/3/action/package_search`, continuing until it reaches your `maxResults` limit or exhausts matching results.
- Rate limiting -- a 200ms delay is inserted between each API page request to respect the Data.gov API servers.
- Transform results -- each raw CKAN package is transformed into a clean output object with normalized fields, deduplicated formats, extracted access levels, and constructed Data.gov URLs.
- Push to dataset -- all transformed results are pushed to the Apify dataset for download in JSON, CSV, Excel, or other formats.
```
Data.gov Dataset Search Pipeline
================================

Input Parameters        CKAN API Requests        Transform & Output
==================      =================        ==================
[ query        ]        catalog.data.gov         [ datasetId     ]
[ organization ]  --->  /api/3/action/     --->  [ title         ]
[ tags         ]        package_search           [ description   ]
[ format       ]                                 [ organization  ]
[ sortBy       ]        +-- Page 1 (100)         [ tags[]        ]
[ maxResults   ]        +-- Page 2 (100)         [ formats[]     ]
                        +-- Page 3 (100)         [ resources[]   ]
                        +-- ...                  [ resourceCount ]
                        |                        [ created       ]
                        200ms delay              [ modified      ]
                        between pages            [ accessLevel   ]
                                                 [ datagovUrl    ]
                                                 [ extractedAt   ]
                                                         |
                                                         v
                                                  Apify Dataset
                                                 (JSON/CSV/Excel)
```
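The query-building and pagination steps above might look roughly like the following Python sketch. It is not the actor's source code. The `q`, `fq`, `sort`, `rows`, and `start` parameters and the `result.count` / `result.results` response fields belong to CKAN's `package_search` action, but the exact query syntax the actor builds is an assumption here (the `tags:`, `res_format:`, and `organization:` field filters and the `*:*` match-all fallback), as are the helper names `build_params`, `fetch_page`, and `search`.

```python
import json
import time
import urllib.parse
import urllib.request

API_URL = "https://catalog.data.gov/api/3/action/package_search"
PAGE_SIZE = 100  # datasets requested per API page


def build_params(query="", organization="", tags="", resource_format="",
                 sort_by="views_recent desc", rows=PAGE_SIZE, start=0):
    """Combine keyword, tag, and format into q; organization goes into fq."""
    terms = [query] if query else []
    if tags:
        terms.append(f'tags:"{tags}"')  # assumed field filter syntax
    if resource_format:
        terms.append(f'res_format:"{resource_format}"')  # assumed
    params = {"q": " ".join(terms) or "*:*", "sort": sort_by,
              "rows": rows, "start": start}
    if organization:
        params["fq"] = f'organization:"{organization}"'
    return params


def fetch_page(params):
    """GET one page and return the CKAN `result` object."""
    url = f"{API_URL}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)["result"]


def search(fetch, max_results, delay_s=0.2, **filters):
    """Page through results until max_results is reached or matches run out."""
    results = []
    start = 0
    while len(results) < max_results:
        rows = min(PAGE_SIZE, max_results - len(results))
        page = fetch(build_params(rows=rows, start=start, **filters))
        batch = page["results"]
        results.extend(batch)
        start += len(batch)
        if not batch or start >= page["count"]:
            break  # no more matching datasets
        if len(results) < max_results:
            time.sleep(delay_s)  # pause between pages
    return results[:max_results]
```

Passing the fetch function in as an argument keeps the paging logic testable without network access; a live call would be `search(fetch_page, 150, query="air quality", organization="epa-gov")`.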
Performance & cost
| Scenario | Max Results | Typical Duration | Memory | Estimated Cost |
|---|---|---|---|---|
| Quick search | 10 | 5--8 seconds | 256 MB | ~$0.0003 |
| Default search | 50 | 10--15 seconds | 256 MB | ~$0.0005 |
| Medium batch | 200 | 20--40 seconds | 256 MB | ~$0.0015 |
| Full extraction | 500 | 60--120 seconds | 256 MB | ~$0.003 |
- Free tier: Apify provides $5 of free platform credit monthly -- at the estimated costs above, enough for roughly 10,000 default runs ($5 / $0.0005) or about 1,600 full extractions ($5 / $0.003).
- No external API costs: The Data.gov CKAN API is completely free with no rate limits enforced at the query level.
Limitations
- 500 dataset limit per run -- the actor caps results at 500 per execution. For broader coverage, run multiple searches with different filters or sort orders.
- Metadata only -- the actor extracts dataset metadata and resource URLs, not the actual data files. Use the resource URLs to download the files in a separate step.
- Description truncation -- dataset descriptions are truncated to 1,000 characters to keep output sizes manageable.
- Single tag filter -- only one tag can be applied per run. If you need multi-tag filtering, use keyword search with tag names in the query field.
- Organization slug format -- organization filters use CKAN slugs (e.g., "epa-gov"), not full agency names. Check Data.gov for the correct slug.
- CKAN API availability -- the actor depends on the Data.gov CKAN API being online. Government API outages during maintenance windows or shutdowns may cause failures.
- No geospatial search -- the actor searches by text, tags, and organization but does not support bounding box or geographic filtering.
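To work around the 500-dataset cap and the single-tag limit, the results of several runs can be merged and deduplicated on `datasetId`. The sketch below assumes each run's items have already been loaded into a list; the function name `merge_runs` and the sample records are hypothetical.

```python
def merge_runs(*runs):
    """Merge item lists from several runs, keeping the first copy of each dataset."""
    merged = {}
    for items in runs:
        for item in items:
            # setdefault keeps the first record seen for a given datasetId.
            merged.setdefault(item["datasetId"], item)
    return list(merged.values())


# Hypothetical results from two runs with different tag filters.
health_run = [{"datasetId": "a1", "title": "Air Quality"},
              {"datasetId": "b2", "title": "Asthma Rates"}]
environment_run = [{"datasetId": "a1", "title": "Air Quality"},
                   {"datasetId": "c3", "title": "Water Quality"}]

combined = merge_runs(health_run, environment_run)
print([item["datasetId"] for item in combined])  # ['a1', 'b2', 'c3']
```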
Responsible use
- Respect government infrastructure -- while the Data.gov API has no strict rate limits, avoid running excessive concurrent requests. The actor includes built-in 200ms delays between pages.
- Attribution -- when using government datasets in publications or applications, credit the publishing agency and link back to the original dataset on Data.gov.
- Data accuracy -- government datasets vary in quality, timeliness, and completeness. Always verify data freshness by checking the `modified` timestamp and reading the dataset documentation.
- Terms of use -- most Data.gov datasets are in the public domain, but some may have specific license terms. Check the access level and any associated license before redistribution.
- Scheduling frequency -- for monitoring new datasets, daily or weekly schedules are sufficient. The catalog does not update frequently enough to warrant hourly polling.
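The freshness check suggested above can be automated by parsing the `modified` timestamp of each record. This is a minimal sketch: the function name `is_stale` and the one-year default threshold are arbitrary choices, and it assumes the timestamps are timezone-naive UTC values in the format shown in the example output.

```python
from datetime import datetime, timedelta, timezone


def is_stale(item, max_age_days=365, now=None):
    """Return True if the dataset was last modified more than max_age_days ago."""
    modified = datetime.fromisoformat(item["modified"])
    # Assumption: `modified` is naive UTC, so compare against naive UTC "now".
    reference = now or datetime.now(timezone.utc).replace(tzinfo=None)
    return reference - modified > timedelta(days=max_age_days)


record = {"title": "U.S. Hourly Climate Normals (1991-2020)",
          "modified": "2024-08-22T09:15:00.000000"}

# Fixed reference date so the example is reproducible.
print(is_stale(record, now=datetime(2026, 2, 19, 12, 0)))  # True
```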
FAQ
Do I need a Data.gov API key to use this actor? No. The Data.gov CKAN API is completely free and open. No API key, registration, or authentication is required.
What types of datasets are available on Data.gov? Data.gov contains 300,000+ datasets from hundreds of federal agencies, state governments, and local authorities. Topics include environmental monitoring, public health, economic indicators, transportation, education, energy, agriculture, housing, criminal justice, and more. Formats range from CSV and JSON to APIs, shapefiles, KML, and PDFs.
Can I download the actual data files using this actor? The actor returns metadata including direct download URLs in the `resources` array. Use those URLs to download the actual data files. For automated downloading, feed the URLs into a script or another Apify actor.
How often is the Data.gov catalog updated? The catalog is continuously updated as federal agencies publish new datasets and refresh existing ones. Sort by "Recently Updated" to find the latest additions and modifications.
What is the maximum number of results I can retrieve? 500 datasets per run. For broader coverage, run the actor multiple times with different search criteria, organizations, or sort orders.
How do I find the correct organization slug? Organization slugs follow the pattern `agency-gov`. Common examples: `epa-gov`, `noaa-gov`, `nasa-gov`, `census-gov`, `usda-gov`, `dot-gov`. You can also visit an agency's page on Data.gov and check the URL for the exact slug.
Can I search for datasets from state or local governments? Yes. Many state and local government organizations publish data on Data.gov. Include state or city names in your keyword search, or use the organization filter if you know the entity's CKAN slug.
What sort options are available? Four sort options: "Most Popular" (views_recent desc -- default), "Most Relevant" (score desc -- best for keyword searches), "Recently Updated" (metadata_modified desc), and "Name A-Z" (name asc).
Can I filter by multiple tags at once? The tag filter supports one tag per run. To search across multiple topics, include tag names as keywords in the query field (e.g., "health environment") or run the actor multiple times with different tag values.
Does this actor work during government shutdowns? The Data.gov API may experience downtime during federal government shutdowns or maintenance periods. If the API is unavailable, the actor will return an error. Re-run when service is restored.
How do I schedule this actor to run automatically? In the Apify Console, click Schedule on the actor run page to set up daily, weekly, or custom cron-based schedules. You can also configure webhooks to notify external services when each run completes.
What does the `accessLevel` field mean? The `accessLevel` field indicates the dataset's openness classification from the CKAN extras metadata. Values are typically "public" (freely available), "restricted public" (available with conditions), or "non-public" (limited access). Most Data.gov datasets are public.
Related actors
| Actor | Description |
|---|---|
| FRED Economic Data Search | Search the Federal Reserve Economic Data database for U.S. economic time series. Pairs with Data.gov for comprehensive economic research. |
| World Bank Development Indicators | Search World Bank indicators for international economic and social data. Compare U.S. government data with global benchmarks. |
| BLS US Economic Data Search | Search Bureau of Labor Statistics for employment, inflation, and productivity data. Complements Data.gov labor and economic datasets. |
| USAspending Federal Spending | Search federal spending and contract data from USAspending.gov. Combine with Data.gov to correlate agency budgets with published datasets. |
| Congress Bill Search | Search congressional legislation and bill text. Cross-reference with Data.gov datasets to track data mandates in federal law. |
| Federal Register Search | Search the Federal Register for proposed and final rules. Discover regulatory data requirements that lead to new Data.gov publications. |