Pricing

from $2.00 / 1,000 datasets fetched

Data.gov Dataset Search

Search and extract metadata from 300,000+ datasets in the official United States government open data catalog at [Data.gov](https://catalog.data.gov/).

Developer

Ryan Clinton

Last modified

2 months ago

Why use Data.gov Dataset Search?

No API integration required -- skip writing CKAN API code, handling pagination, and parsing nested responses. The actor handles all of that and returns clean, flat JSON.
Structured output at scale -- extract up to 500 datasets per run with consistent field names, ready for databases, spreadsheets, or downstream pipelines.
Powerful filtering -- combine keyword search with organization, tag, and format filters to precisely target datasets from specific agencies in specific formats.
Automation-ready -- schedule recurring searches to monitor new datasets from agencies you track, and use webhooks to trigger downstream workflows when each run finishes.
No authentication needed -- the Data.gov CKAN API is free and open. No API keys, tokens, or registration required.
Direct download links -- every result includes resource URLs, so you can go straight from search results to the underlying CSV, JSON, and XML files or API endpoints.

Key features

Full-text search across 300,000+ datasets from federal, state, and local government agencies
Organization filtering to limit results to specific agencies (e.g., epa-gov, noaa-gov, nasa-gov, census-gov)
Tag-based filtering for topic searches (e.g., health, environment, transportation, energy)
Format filtering to find datasets available as CSV, JSON, API, XML, shapefile, or other formats
Four sort options -- Most Popular, Most Relevant, Recently Updated, or Name A-Z
Resource extraction with direct download URLs, format labels, and descriptions for every file in each dataset
Automatic format deduplication -- resource formats are normalized to uppercase and deduplicated per dataset
Rate-limited pagination with 200ms delays between API pages for reliable bulk extraction
Access level detection -- extracts whether datasets are public, restricted public, or non-public from CKAN extras
Lightweight execution -- runs on 256 MB memory, completing most searches in 10-30 seconds
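The format deduplication and access-level detection listed above could look roughly like the sketch below. It assumes the raw CKAN package shape returned by `package_search`: a `resources` list whose items carry a `format` string, and an `extras` list of `{"key": ..., "value": ...}` pairs that includes an `accessLevel` key. The helper names are hypothetical and not taken from the actor's code.

```python
from typing import Any, Optional


def normalize_formats(resources: list[dict[str, Any]]) -> list[str]:
    """Uppercase each resource format, drop blanks and duplicates, keep first-seen order."""
    seen: dict[str, None] = {}
    for resource in resources:
        fmt = (resource.get("format") or "").strip().upper()
        if fmt:
            seen.setdefault(fmt, None)
    return list(seen)


def access_level(extras: list[dict[str, Any]]) -> Optional[str]:
    """Return the value of the (assumed) `accessLevel` entry in CKAN extras, if present."""
    for entry in extras:
        if entry.get("key") == "accessLevel":
            return entry.get("value")
    return None


package = {
    "resources": [{"format": "csv"}, {"format": "CSV"}, {"format": "pdf"}, {"format": ""}],
    "extras": [{"key": "accessLevel", "value": "public"}],
}
print(normalize_formats(package["resources"]))  # ['CSV', 'PDF']
print(access_level(package["extras"]))          # public
```

Using a dict as an ordered set keeps the formats in the order they first appear, which makes repeated runs produce stable output.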

How to use

Apify Console

Go to the Data.gov Dataset Search actor page on Apify.
Click Try for free to open the actor in the Apify Console.
Enter your search parameters:
- Type keywords into Search Query (e.g., "air quality monitoring").
- Optionally set Organization to restrict to one agency (e.g., "epa-gov").
- Optionally add a Tag filter (e.g., "environment").
- Optionally specify a Resource Format (e.g., "CSV").
- Choose a Sort By option (default: Most Popular).
- Set Max Results between 1 and 500 (default: 50).
Click Start to run the actor.
When the run finishes, view and download results from the Dataset tab in JSON, CSV, Excel, or other formats.

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("ryanclinton/datagov-dataset-search").call(run_input={
    "query": "air quality monitoring",
    "organization": "epa-gov",
    "format": "CSV",
    "sortBy": "views_recent desc",
    "maxResults": 100
})

for dataset in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{dataset['title']} -- {dataset['organizationTitle']}")
    for resource in dataset["resources"]:
        print(f"  {resource['format']}: {resource['url']}")

JavaScript

import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const run = await client.actor("ryanclinton/datagov-dataset-search").call({
    query: "air quality monitoring",
    organization: "epa-gov",
    format: "CSV",
    sortBy: "views_recent desc",
    maxResults: 100,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((dataset) => {
    console.log(`${dataset.title} -- ${dataset.organizationTitle}`);
    dataset.resources.forEach((r) => console.log(`  ${r.format}: ${r.url}`));
});

Input parameters

Parameter	Type	Required	Default	Description
`query`	String	No	--	Keywords to search for datasets (e.g., "climate change", "water quality")
`organization`	String	No	--	Filter by publishing organization slug (e.g., "epa-gov", "noaa-gov")
`tags`	String	No	--	Filter by dataset tag (e.g., "health", "environment", "transportation")
`format`	String	No	--	Filter by resource format (e.g., "CSV", "JSON", "API", "XML")
`sortBy`	String	No	`views_recent desc`	Sort order: `views_recent desc` (Most Popular), `score desc` (Most Relevant), `metadata_modified desc` (Recently Updated), or `name asc` (Name A-Z)
`maxResults`	Integer	No	50	Maximum number of datasets to return (1--500)
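If you build inputs programmatically, a small client-side check against this table can catch typos before a run starts. This is a hypothetical guard, not part of the actor: the sort values come from the FAQ below, and whether the actor itself rejects or ignores invalid values is not documented here.

```python
from typing import Any

# Sort values and labels as listed in this README.
SORT_OPTIONS = {
    "views_recent desc": "Most Popular",
    "score desc": "Most Relevant",
    "metadata_modified desc": "Recently Updated",
    "name asc": "Name A-Z",
}


def validate_input(run_input: dict[str, Any]) -> dict[str, Any]:
    """Fill documented defaults and reject values outside the documented ranges."""
    cleaned = {"sortBy": "views_recent desc", "maxResults": 50, **run_input}
    if cleaned["sortBy"] not in SORT_OPTIONS:
        raise ValueError(f"Unsupported sortBy: {cleaned['sortBy']!r}")
    max_results = cleaned["maxResults"]
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(max_results, bool) or not isinstance(max_results, int) or not 1 <= max_results <= 500:
        raise ValueError("maxResults must be an integer between 1 and 500")
    for key in ("query", "organization", "tags", "format"):
        if key in cleaned and not isinstance(cleaned[key], str):
            raise ValueError(f"{key} must be a string")
    return cleaned


print(validate_input({"query": "water quality"}))
# {'sortBy': 'views_recent desc', 'maxResults': 50, 'query': 'water quality'}
```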

Example input

{
    "query": "water quality monitoring",
    "organization": "epa-gov",
    "tags": "environment",
    "format": "CSV",
    "sortBy": "metadata_modified desc",
    "maxResults": 100
}

Tips

If no query or filters are provided, the actor returns the most popular datasets across the entire catalog.
Federal agency slugs typically follow the pattern agency-gov. Common examples: epa-gov, noaa-gov, nasa-gov, census-gov, usda-gov, dot-gov, hhs-gov, doe-gov, dod-gov, dhs-gov.
Combine multiple filters for precise results -- for example, query: "PM2.5" with organization: "epa-gov" and format: "CSV".
Use sortBy: "metadata_modified desc" to find actively maintained, recently refreshed datasets.

Output

Example output

{
    "datasetId": "e4c3b2a1-5f6d-7890-abcd-ef1234567890",
    "title": "U.S. Hourly Climate Normals (1991-2020)",
    "description": "The U.S. Climate Normals are a large suite of data products that provide information about typical climate conditions for thousands of weather station locations across the United States. Climate normals act as a baseline for evaluating how current weather and climate conditions compare to what is normal or expected.",
    "organization": "noaa-gov",
    "organizationTitle": "National Oceanic and Atmospheric Administration, Department of Commerce",
    "tags": [
        "climate",
        "normals",
        "hourly",
        "temperature",
        "precipitation",
        "wind"
    ],
    "formats": ["CSV", "PDF", "HTML"],
    "resources": [
        {
            "name": "Hourly Climate Normals - Temperature",
            "format": "CSV",
            "url": "https://www.ncei.noaa.gov/data/normals-hourly/archive/us-climate-normals_hourly-temperature.csv",
            "description": "Hourly temperature normals for U.S. weather stations"
        },
        {
            "name": "Documentation",
            "format": "PDF",
            "url": "https://www.ncei.noaa.gov/data/normals-hourly/doc/Normals_Hourly_Documentation.pdf",
            "description": "Technical documentation for hourly climate normals"
        }
    ],
    "resourceCount": 5,
    "created": "2021-05-15T14:30:00.000000",
    "modified": "2024-08-22T09:15:00.000000",
    "accessLevel": "public",
    "datagovUrl": "https://catalog.data.gov/dataset/u-s-hourly-climate-normals-1991-2020",
    "extractedAt": "2026-02-19T12:00:00.000Z"
}

Output fields

Field	Type	Description
`datasetId`	String	Unique CKAN identifier for the dataset
`title`	String	Dataset title as published on Data.gov
`description`	String	Dataset description (truncated to 1,000 characters)
`organization`	String	Organization slug (e.g., "noaa-gov")
`organizationTitle`	String	Full organization name (e.g., "National Oceanic and Atmospheric Administration, Department of Commerce")
`tags`	Array	List of topic tags assigned to the dataset
`formats`	Array	Deduplicated list of resource formats in uppercase (e.g., ["CSV", "JSON", "PDF"])
`resources`	Array	Individual resource files with name, format, url, and description
`resourceCount`	Integer	Total number of resources (files/endpoints) in the dataset
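`created`	String	ISO 8601 timestamp when the dataset was first published (as returned by CKAN, without a timezone suffix)
`modified`	String	ISO 8601 timestamp when the dataset was last updated (as returned by CKAN, without a timezone suffix)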
`accessLevel`	String	Access classification -- typically "public", "restricted public", or "non-public"
`datagovUrl`	String	Direct URL to the dataset page on catalog.data.gov
`extractedAt`	String	ISO timestamp when the actor extracted this record
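For typed downstream code, the table above can be mirrored as a `TypedDict`. This sketch simply transcribes the documented fields and types; it is not a schema published by the actor, and fields such as `accessLevel` may in practice be missing or null for some datasets. The `missing_fields` helper is hypothetical.

```python
from typing import TypedDict


class Resource(TypedDict):
    name: str
    format: str
    url: str
    description: str


class DatasetRecord(TypedDict):
    datasetId: str
    title: str
    description: str
    organization: str
    organizationTitle: str
    tags: list[str]
    formats: list[str]
    resources: list[Resource]
    resourceCount: int
    created: str
    modified: str
    accessLevel: str
    datagovUrl: str
    extractedAt: str


def missing_fields(item: dict) -> list[str]:
    """List the documented output fields that a dataset item does not contain."""
    return [key for key in DatasetRecord.__annotations__ if key not in item]
```

Running `missing_fields` over a run's items is a cheap way to notice if the output shape ever drifts from this table.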

Use cases

Academic research -- survey all government datasets on a topic like "climate change" or "opioid epidemic" to find primary data sources for papers and dissertations.
Journalism and FOIA work -- discover publicly available government datasets relevant to investigative stories before filing records requests.
Data science project sourcing -- find high-quality, structured government data in CSV or JSON format for machine learning training, analysis, or visualization projects.
Policy analysis -- locate datasets from specific agencies to evaluate the impact of federal programs, regulations, or spending initiatives.
Grant proposal research -- identify existing government datasets relevant to proposed research to demonstrate awareness of prior data and avoid duplication.
Open data monitoring -- schedule recurring runs to track new dataset publications from agencies like EPA, NOAA, or the Census Bureau and get notified when new data appears.
Data pipeline integration -- automatically discover and catalog government data sources, then feed resource URLs into ETL pipelines for regular data ingestion.
Civic technology -- find municipal and state government datasets for building public-facing applications, dashboards, and transparency tools.
Competitive intelligence -- monitor government contract, spending, and procurement datasets to track agency priorities and funding patterns.
Environmental compliance -- locate EPA monitoring data, emissions inventories, and facility-level environmental datasets for regulatory compliance work.
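For the data pipeline use case, a common first step is flattening each dataset's `resources` into one row per file. The sketch below keeps only formats you ask for and writes a CSV staging file; the function names and column choice are illustrative assumptions, not part of the actor.

```python
import csv
import io
from typing import Any, Iterable

COLUMNS = ["datasetId", "organization", "format", "url"]


def resource_rows(items: Iterable[dict[str, Any]], wanted_formats: set[str]) -> list[dict[str, str]]:
    """One row per resource whose (uppercased) format is in wanted_formats and has a URL."""
    rows = []
    for item in items:
        for resource in item.get("resources", []):
            fmt = (resource.get("format") or "").upper()
            if fmt in wanted_formats and resource.get("url"):
                rows.append({
                    "datasetId": item["datasetId"],
                    "organization": item.get("organization", ""),
                    "format": fmt,
                    "url": resource["url"],
                })
    return rows


def to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize rows as CSV text, e.g. for loading into a staging table."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```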

API & integration

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("1pdOsFEBvCm5RzMfM").call(run_input={
    "query": "greenhouse gas emissions",
    "organization": "epa-gov",
    "format": "CSV",
    "maxResults": 200
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{item['title']} ({item['resourceCount']} resources)")

JavaScript

import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const run = await client.actor("1pdOsFEBvCm5RzMfM").call({
    query: "greenhouse gas emissions",
    organization: "epa-gov",
    format: "CSV",
    maxResults: 200,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Found ${items.length} datasets`);

cURL

curl -X POST "https://api.apify.com/v2/acts/1pdOsFEBvCm5RzMfM/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "greenhouse gas emissions",
    "organization": "epa-gov",
    "format": "CSV",
    "maxResults": 200
  }'
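The cURL call above starts the run and returns the run object right away; the results land in the run's default dataset once it finishes. If you would rather receive the items in a single HTTP call, Apify's `run-sync-get-dataset-items` endpoint waits for the run and returns its dataset items, provided the run finishes within the synchronous-call timeout. A stdlib-only sketch, where `build_sync_request` and `fetch_items` are hypothetical helper names:

```python
import json
import urllib.parse
import urllib.request

ACTOR_ID = "1pdOsFEBvCm5RzMfM"


def build_sync_request(run_input: dict, token: str) -> urllib.request.Request:
    """Prepare a POST that runs the actor and returns its dataset items in the response body."""
    url = (
        f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items?"
        + urllib.parse.urlencode({"token": token})
    )
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_items(run_input: dict, token: str) -> list[dict]:
    """Send the request and decode the JSON array of dataset items."""
    with urllib.request.urlopen(build_sync_request(run_input, token), timeout=330) as response:
        return json.load(response)


# Example (needs network access and a real token):
# items = fetch_items({"query": "greenhouse gas emissions", "maxResults": 200}, "YOUR_API_TOKEN")
```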

Integrations

Connect Data.gov Dataset Search to your existing tools and workflows:

Google Sheets -- automatically export results to a spreadsheet after each run
Slack -- receive notifications when new datasets match your search criteria
Webhooks -- trigger custom workflows when the actor finishes
Zapier / Make -- connect to 5,000+ apps through Apify's integration platform
Amazon S3 -- save results directly to cloud storage for data lake ingestion
Google BigQuery -- load dataset metadata into BigQuery for SQL-based analysis

How it works

Parse input -- the actor reads your search parameters (query, organization, tags, format, sort order, max results).
Build CKAN query -- keywords, tags, and format filters are combined into a CKAN search query string. Organization filters are applied as fq (filter query) parameters.
Paginate API calls -- the actor requests up to 100 datasets per page from the Data.gov CKAN API at catalog.data.gov/api/3/action/package_search, continuing until it reaches your maxResults limit or exhausts matching results.
Rate limiting -- a 200ms delay is inserted between each API page request to respect the Data.gov API servers.
Transform results -- each raw CKAN package is transformed into a clean output object with normalized fields, deduplicated formats, extracted access levels, and constructed Data.gov URLs.
Push to dataset -- all transformed results are pushed to the Apify dataset for download in JSON, CSV, Excel, or other formats.
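The query-building and pagination steps above can be sketched as follows. `tags`, `res_format`, and `organization` are standard CKAN search fields, but exactly how the actor combines them is an assumption here. The HTTP call is passed in as a function so the loop can run against a fake in tests; `fetch_from_datagov` shows one possible real implementation against the `package_search` endpoint.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Any, Callable

PAGE_SIZE = 100            # rows per package_search call
PAGE_DELAY_SECONDS = 0.2   # 200ms pause between pages

FetchPage = Callable[[dict[str, Any]], dict[str, Any]]


def build_params(query: str = "", tags: str = "", fmt: str = "",
                 organization: str = "", sort: str = "views_recent desc") -> dict[str, Any]:
    """Combine keywords, tag, and format into `q`; apply the organization via `fq`."""
    parts = [query.strip()] if query.strip() else []
    if tags:
        parts.append(f'tags:"{tags}"')
    if fmt:
        parts.append(f'res_format:"{fmt}"')
    # "*:*" matches every dataset when no query or filters are given.
    params: dict[str, Any] = {"q": " ".join(parts) or "*:*", "sort": sort}
    if organization:
        params["fq"] = f"organization:{organization}"
    return params


def fetch_from_datagov(params: dict[str, Any]) -> dict[str, Any]:
    """GET one page from the Data.gov CKAN package_search endpoint."""
    url = "https://catalog.data.gov/api/3/action/package_search?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def search_all(fetch_page: FetchPage, params: dict[str, Any], max_results: int) -> list[dict[str, Any]]:
    """Request pages until max_results is reached or matching results run out."""
    results: list[dict[str, Any]] = []
    start = 0
    while len(results) < max_results:
        rows = min(PAGE_SIZE, max_results - len(results))
        page = fetch_page({**params, "rows": rows, "start": start})["result"]
        batch = page["results"]
        results.extend(batch)
        start += len(batch)
        if not batch or start >= page["count"]:
            break
        time.sleep(PAGE_DELAY_SECONDS)
    return results[:max_results]


# Example (needs network access):
# packages = search_all(fetch_from_datagov, build_params("PM2.5", organization="epa-gov"), 50)
```

Requesting only `max_results - len(results)` rows on the last page avoids fetching records that would be thrown away.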

                    Data.gov Dataset Search Pipeline
                    ================================

  Input Parameters          CKAN API Requests         Transform & Output
  ==================        =================         ==================

  [ query         ]         catalog.data.gov          [ datasetId       ]
  [ organization  ]  --->   /api/3/action/      --->  [ title           ]
  [ tags          ]         package_search            [ description     ]
  [ format        ]                                   [ organization    ]
  [ sortBy        ]         +-- Page 1 (100)          [ tags[]          ]
  [ maxResults    ]         +-- Page 2 (100)          [ formats[]       ]
                            +-- Page 3 (100)          [ resources[]     ]
                            +-- ...                   [ resourceCount   ]
                            |                         [ created         ]
                            200ms delay               [ modified        ]
                            between pages             [ accessLevel     ]
                                                      [ datagovUrl      ]
                                                      [ extractedAt     ]

                                                            |
                                                            v
                                                    Apify Dataset
                                                  (JSON/CSV/Excel)
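The transform stage in the diagram maps each raw CKAN package onto the output fields. The sketch below uses standard CKAN package keys (`id`, `name`, `notes`, `organization`, `tags`, `resources`, `metadata_created`, `metadata_modified`, `extras`); which keys the actor actually reads, and how it counts resources, are assumptions.

```python
from datetime import datetime, timezone
from typing import Any

DESCRIPTION_LIMIT = 1000  # characters, per the output field notes


def transform(package: dict[str, Any]) -> dict[str, Any]:
    """Flatten one raw CKAN package into the output record shape."""
    resources = [
        {
            "name": r.get("name") or "",
            "format": (r.get("format") or "").upper(),
            "url": r.get("url") or "",
            "description": r.get("description") or "",
        }
        for r in package.get("resources", [])
    ]
    organization = package.get("organization") or {}
    extras = {e.get("key"): e.get("value") for e in package.get("extras", [])}
    return {
        "datasetId": package["id"],
        "title": package.get("title", ""),
        "description": (package.get("notes") or "")[:DESCRIPTION_LIMIT],
        "organization": organization.get("name", ""),
        "organizationTitle": organization.get("title", ""),
        "tags": [t["name"] for t in package.get("tags", [])],
        # dict.fromkeys deduplicates while preserving first-seen order.
        "formats": list(dict.fromkeys(r["format"] for r in resources if r["format"])),
        "resources": resources,
        "resourceCount": len(resources),
        "created": package.get("metadata_created", ""),
        "modified": package.get("metadata_modified", ""),
        "accessLevel": extras.get("accessLevel"),
        "datagovUrl": f"https://catalog.data.gov/dataset/{package['name']}",
        "extractedAt": datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z"),
    }
```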

Performance & cost

Scenario	Max Results	Typical Duration	Memory	Estimated Cost
Quick search	10	5--8 seconds	256 MB	~$0.0003
Default search	50	10--15 seconds	256 MB	~$0.0005
Medium batch	200	20--40 seconds	256 MB	~$0.0015
Full extraction	500	60--120 seconds	256 MB	~$0.003

Free tier: Apify provides $5 of free platform credit monthly. At the compute estimates above, that is enough for roughly 10,000 default runs or 1,600 full extractions.
No external API costs: The Data.gov CKAN API is completely free with no rate limits enforced at the query level.
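To sanity-check a budget, the figures on this page can be combined directly. This sketch assumes the listed rate of $2.00 per 1,000 datasets is charged per dataset returned (so a default 50-dataset run would come to about $0.10) and treats the table's figures as per-run compute estimates; how the two are combined on your plan is not stated here. `dataset_charge` and `runs_covered` are hypothetical helpers, and `Decimal` keeps the division exact.

```python
from decimal import Decimal

PRICE_PER_1000 = Decimal("2.00")  # USD, from the listed "from $2.00 / 1,000" rate


def dataset_charge(datasets: int) -> Decimal:
    """Charge for a run at the listed per-dataset rate."""
    return PRICE_PER_1000 * datasets / 1000


def runs_covered(budget: Decimal, cost_per_run: Decimal) -> int:
    """Whole runs a budget pays for at a fixed per-run cost."""
    return int(budget // cost_per_run)
```

For example, `runs_covered(Decimal("5"), Decimal("0.0005"))` reproduces the 10,000-run figure from the compute estimate, while `runs_covered(Decimal("5"), dataset_charge(50))` shows what the same credit covers at the per-dataset rate.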

Limitations

500 dataset limit per run -- the actor caps results at 500 per execution. For broader coverage, run multiple searches with different filters or sort orders.
Metadata only -- the actor extracts dataset metadata and resource URLs, not the actual data files. Use the resource URLs to download the files in a separate step.
Description truncation -- dataset descriptions are truncated to 1,000 characters to keep output sizes manageable.
Single tag filter -- only one tag can be applied per run. If you need multi-tag filtering, use keyword search with tag names in the query field.
Organization slug format -- organization filters use CKAN slugs (e.g., "epa-gov"), not full agency names. Check Data.gov for the correct slug.
CKAN API availability -- the actor depends on the Data.gov CKAN API being online. Government API outages during maintenance windows or shutdowns may cause failures.
No geospatial search -- the actor searches by text, tags, and organization but does not support bounding box or geographic filtering.
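Working around the 500-dataset cap means running several searches (different organizations, tags, or sort orders) and merging the results, since overlapping searches will return some of the same datasets. A minimal sketch that deduplicates on `datasetId`; `merge_runs` is a hypothetical helper, and the commented usage relies on the same `apify_client` calls shown earlier.

```python
from typing import Any, Iterable


def merge_runs(*runs: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Combine items from several runs, keeping the first copy of each datasetId."""
    merged: dict[str, dict[str, Any]] = {}
    for items in runs:
        for item in items:
            merged.setdefault(item["datasetId"], item)
    return list(merged.values())


# Example (needs an Apify token):
# runs = [client.actor("ryanclinton/datagov-dataset-search").call(run_input={"query": "air quality", "sortBy": s, "maxResults": 500})
#         for s in ("views_recent desc", "metadata_modified desc")]
# combined = merge_runs(*(client.dataset(r["defaultDatasetId"]).iterate_items() for r in runs))
```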

Responsible use

Respect government infrastructure -- while the Data.gov API has no strict rate limits, avoid running excessive concurrent requests. The actor includes built-in 200ms delays between pages.
Attribution -- when using government datasets in publications or applications, credit the publishing agency and link back to the original dataset on Data.gov.
Data accuracy -- government datasets vary in quality, timeliness, and completeness. Always verify data freshness by checking the modified timestamp and reading the dataset documentation.
Terms of use -- most Data.gov datasets are in the public domain, but some may have specific license terms. Check the access level and any associated license before redistribution.
Scheduling frequency -- for monitoring new datasets, daily or weekly schedules are sufficient. The catalog does not update frequently enough to warrant hourly polling.
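Both the freshness check and scheduled monitoring come down to comparing the `modified` field against a date. The sketch assumes `modified` is a timezone-less ISO 8601 string like the one in the example output and that it represents UTC, so the comparison times are naive UTC datetimes as well; `is_stale` and `changed_since` are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import Any


def is_stale(item: dict[str, Any], now: datetime, max_age_days: int = 365) -> bool:
    """True when the dataset was last modified more than max_age_days before `now`."""
    modified = datetime.fromisoformat(item["modified"])
    return now - modified > timedelta(days=max_age_days)


def changed_since(items: list[dict[str, Any]], last_check: datetime) -> list[dict[str, Any]]:
    """Datasets modified after the previous scheduled run."""
    return [item for item in items if datetime.fromisoformat(item["modified"]) > last_check]
```

For a daily or weekly schedule, storing the time of the last successful run and passing it as `last_check` turns each run into a list of newly changed datasets.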

FAQ

Do I need a Data.gov API key to use this actor? No. The Data.gov CKAN API is completely free and open. No API key, registration, or authentication is required.

What types of datasets are available on Data.gov? Data.gov contains 300,000+ datasets from hundreds of federal agencies, state governments, and local authorities. Topics include environmental monitoring, public health, economic indicators, transportation, education, energy, agriculture, housing, criminal justice, and more. Formats range from CSV and JSON to APIs, shapefiles, KML, and PDFs.

Can I download the actual data files using this actor? The actor returns metadata including direct download URLs in the resources array. Use those URLs to download the actual data files. For automated downloading, feed the URLs into a script or another Apify actor.
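A minimal download script for one result might look like the sketch below. It saves only resources in the formats you choose, since some resource URLs point at landing pages or live APIs rather than files. The folder layout and the name `download_resources` are assumptions for illustration.

```python
import shutil
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Any


def download_resources(item: dict[str, Any], dest: Path, formats: set[str]) -> list[Path]:
    """Save each resource whose format is in `formats` under dest/<datasetId>/."""
    target_dir = dest / item["datasetId"]
    target_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, resource in enumerate(item.get("resources", [])):
        if (resource.get("format") or "").upper() not in formats:
            continue
        # Name the file after the last URL path segment, prefixed with its index to avoid clashes.
        name = Path(urllib.parse.urlparse(resource["url"]).path).name or "resource"
        path = target_dir / f"{index:03d}_{name}"
        with urllib.request.urlopen(resource["url"], timeout=60) as response, open(path, "wb") as out:
            shutil.copyfileobj(response, out)
        saved.append(path)
    return saved
```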

How often is the Data.gov catalog updated? The catalog is continuously updated as federal agencies publish new datasets and refresh existing ones. Sort by "Recently Updated" to find the latest additions and modifications.

What is the maximum number of results I can retrieve? 500 datasets per run. For broader coverage, run the actor multiple times with different search criteria, organizations, or sort orders.

How do I find the correct organization slug? Federal agency slugs typically follow the pattern agency-gov. Common examples: epa-gov, noaa-gov, nasa-gov, census-gov, usda-gov, dot-gov. You can also visit an agency's page on Data.gov and check the URL for the exact slug.
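If you collect organization page URLs, the slug can be pulled out programmatically. This sketch assumes organization pages use the standard CKAN path `https://catalog.data.gov/organization/<slug>`; `slug_from_url` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlparse


def slug_from_url(url: str) -> Optional[str]:
    """Return the path segment after /organization/ in a catalog URL, or None."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "organization":
        return parts[1]
    return None
```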

Can I search for datasets from state or local governments? Yes. Many state and local government organizations publish data on Data.gov. Include state or city names in your keyword search, or use the organization filter if you know the entity's CKAN slug.

What sort options are available? Four sort options: "Most Popular" (views_recent desc -- default), "Most Relevant" (score desc -- best for keyword searches), "Recently Updated" (metadata_modified desc), and "Name A-Z" (name asc).

Can I filter by multiple tags at once? The tag filter supports one tag per run. To search across multiple topics, include tag names as keywords in the query field (e.g., "health environment") or run the actor multiple times with different tag values.

Does this actor work during government shutdowns? The Data.gov API may experience downtime during federal government shutdowns or maintenance periods. If the API is unavailable, the actor will return an error. Re-run when service is restored.
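For scheduled pipelines, re-running can be automated with a retry wrapper. This only helps with short outages; a shutdown can last far longer than any reasonable retry window. `with_retries` is a hypothetical helper, and the action you pass in should raise on failure, for example by checking that the run's `status` is `"SUCCEEDED"` after `call()` returns.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T], attempts: int = 4, base_delay: float = 60.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `action`, waiting base_delay, then 2x, 4x, ... seconds between failed attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the wrapper testable without actually waiting.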

How do I schedule this actor to run automatically? In the Apify Console, click Schedule on the actor run page to set up daily, weekly, or custom cron-based schedules. You can also configure webhooks to notify external services when each run completes.

What does the accessLevel field mean? The accessLevel field indicates the dataset's openness classification from the CKAN extras metadata. Values are typically "public" (freely available), "restricted public" (available with conditions), or "non-public" (limited access). Most Data.gov datasets are public.

Actor	Description
FRED Economic Data Search	Search the Federal Reserve Economic Data database for U.S. economic time series. Pairs with Data.gov for comprehensive economic research.
World Bank Development Indicators	Search World Bank indicators for international economic and social data. Compare U.S. government data with global benchmarks.
BLS US Economic Data Search	Search Bureau of Labor Statistics for employment, inflation, and productivity data. Complements Data.gov labor and economic datasets.
USAspending Federal Spending	Search federal spending and contract data from USAspending.gov. Combine with Data.gov to correlate agency budgets with published datasets.
Congress Bill Search	Search congressional legislation and bill text. Cross-reference with Data.gov datasets to track data mandates in federal law.
Federal Register Search	Search the Federal Register for proposed and final rules. Discover regulatory data requirements that lead to new Data.gov publications.

Data Gov Catalog Scraper

fortuitous_pirate/data-gov-catalog-scraper

Fortuitous Pirate

Data.gov API - US Open Government Datasets

alizarin_refrigerator-owner/data-gov-api---us-open-government-datasets

Access the Data.gov catalog of 300,000+ US government datasets. Search datasets by topic, agency, format, and keywords. Discover open data from federal, state, and local governments

The Howlers

Data.gov.uk Scraper - Cheap 🌐📊🇬🇧

scrapestorm/data-gov-uk-scraper---cheap

🔎 Easily collect dataset listings from data.gov.uk Provide one or multiple search URLs and extract dataset information such as 📄 Dataset Title 🏢 Published By 🕒 Last Updated 📝 Description 🔗 Dataset URL & more Perfect for open data research, government data monitoring & dataset discovery 📊🚀

Storm_Scraper

Data.gov.uk Scraper

parseforge/data-gov-uk-scraper

Collect UK government open data effortlessly. Extract datasets, publishers, formats, topics, licenses, and download links from data.gov.uk — the official UK open data portal. Perfect for researchers, policy analysts, and developers building data catalogs.

ParseForge

Data.gov Dataset Catalog Crawler

jungle_synthesizer/datagov-dataset-crawler

Crawl 300K+ US government datasets from Data.gov. Extract titles, organizations, tags, formats, download URLs, API endpoints, temporal and spatial coverage, contacts, and resources. Filter by agency, format, category, and tags.

BowTiedRaccoon

USA Data.gov U.S. Government's Open Data Scrape

parseforge/data-gov-scraper

Stop wasting hours digging through thousands of government datasets. Our Data.gov scraper automatically gathers complete dataset details from the U.S. government's open data portal in minutes. Ideal for researchers, analysts, journalists, and teams needing reliable data without manual effort.

ParseForge

Grants Gov Scraper

fortuitous_pirate/grants-gov-scraper

Fortuitous Pirate

Clinicaltrials Gov Scraper

fortuitous_pirate/clinicaltrials-gov-scraper

Fortuitous Pirate

Regulations Gov Scraper

fortuitous_pirate/regulations-gov-scraper

Fortuitous Pirate

SAM.gov Contracts Scraper

magicfingers/sam-gov-scraper

Scrape government contract opportunities, entity registrations, and award data from SAM.gov. Search by keyword, NAICS code, set-aside type, agency, date range, and more. Uses the official SAM.gov public API.