USA Data.gov U.S. Government's Open Data Scrape

Pricing

Pay per event

Stop wasting hours digging through thousands of government datasets. Our Data.gov scraper automatically gathers complete dataset details from the U.S. government's open data portal in minutes. Ideal for researchers, analysts, journalists, and teams needing reliable data without manual effort.

Developer

ParseForge

Maintained by Community

🏛️ USA Data.gov U.S. Government's Open Data Scraper

🚀 Automate your government data research with our Data.gov scraper. It collects detailed dataset information from the U.S. government data catalog, including dataset metadata, organizations, publishers, topics, formats, and access information, along with the resource links listed in Data.gov's official catalog. It is built for researchers, data analysts, and organizations that need accurate, up-to-date government data without manual work.

Target Audience: Researchers, data analysts, government contractors, policy analysts, journalists, and organizations needing government data
Primary Use Cases: Government data research, policy analysis, market research, competitive intelligence, and data-driven decision-making

What Does Data.gov Scraper Do?

This tool collects comprehensive dataset information from Data.gov, the U.S. government's open data portal. It delivers:

  • Complete Dataset Metadata - Titles, descriptions, organization details, publisher information
  • Organization Information - Organization names, types, missions, contact details, and URLs
  • Publisher Data - Publisher names, URLs, and contact information
  • Topic Classifications - Topics, topic categories, and tags for easy categorization
  • Resource Formats - Available data formats (CSV, JSON, XML, PDF, etc.) with download links
  • Access & Use Information - Public status, licenses, and usage rights
  • Download Links - Direct links to all available resources and datasets
  • Metadata Details - Creation dates, update dates, metadata sources, and references
  • Contact Information - Dataset maintainer names and email addresses
  • And much more - Comprehensive government data intelligence in one scrape

Business Value: Access thousands of government datasets efficiently, track data updates automatically, and build comprehensive government data databases that save weeks of manual research and monitoring.

Input

To start Data.gov web scraping, simply fill in the input form. You can scrape Data.gov using two different methods (choose one):

Method 1: Direct URL Scraping 🔗

  • startUrl - Use a direct Data.gov catalog URL (e.g., https://catalog.data.gov/dataset?q=climate)
    • Required if search filters are not provided
    • Cannot be used together with search filters
  • maxItems - Set the maximum number of datasets to collect (up to 1,000,000). Free users: Required, max 50. Paid users: Optional, max 1,000,000. Leave empty for unlimited (paid users only). Default: 10

Method 2: Search Filters

  • searchQuery - Enter a search term for datasets (e.g., "climate", "healthcare", "transportation")
    • Required if startUrl is not provided
  • maxItems - Set the maximum number of datasets to collect (up to 1,000,000). Free users: Required, max 50. Paid users: Optional, max 1,000,000. Leave empty for unlimited (paid users only). Default: 10

Advanced Filtering Options:

  • topics - Filter by topic groups. Select one or more topics from the dropdown (e.g., "Climate", "Energy", "Local Government")
  • topicCategories - Filter by topic categories. Select one or more categories (e.g., "Arctic", "Water", "Transportation")
  • datasetType - Filter by dataset type (e.g., "Dataset", "Collection")
  • tags - Filter by tags. Enter multiple tag names (e.g., "earth science", "noaa", "oceans")
  • formats - Filter by resource format. Select one or more formats (e.g., "CSV", "JSON", "PDF", "XML")
  • organizationType - Filter by organization type (e.g., "Federal Government", "State", "City")
  • organization - Filter by specific organization. Select from the dropdown (e.g., "noaa-gov", "usgs-gov")
  • publisher - Filter by publisher. Select from the dropdown
  • bureau - Filter by bureau code. Select from the dropdown
  • location - Filter by geographic location (e.g., "California", "New York")
  • sort - Sort results by relevance, views, or date

⚠️ Important Input Rules:

  1. Choose One Method: You must use either direct URL scraping OR search filters, not both
  2. Required Fields:
    • Either startUrl OR searchQuery must be provided
    • Free users can only use the prefill values provided in the input form. To use custom input parameters, please upgrade to a paid plan.
  3. Mutual Exclusivity:
    • If using startUrl, you cannot use searchQuery or any search filters
    • If using searchQuery, you cannot use startUrl
  4. Filter Combinations: You can combine multiple filters (topics, formats, organization, etc.) for precise results
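
The rules above can be checked on your side before a run is started. The following is a minimal sketch, not the Actor's own validation logic: `validateInput`, `isSet`, and the `isPaidUser` flag are hypothetical names introduced for illustration, and treating `sort` as one of the search filters is an assumption.

```javascript
// Filter fields from "Advanced Filtering Options" (treating `sort` as a filter is an assumption).
const FILTER_FIELDS = [
    'topics', 'topicCategories', 'datasetType', 'tags', 'formats', 'organizationType',
    'organization', 'publisher', 'bureau', 'location', 'sort',
];

// True when a field carries a usable value (not undefined, null, '', or []).
function isSet(value) {
    if (value === undefined || value === null || value === '') return false;
    return !(Array.isArray(value) && value.length === 0);
}

// Hypothetical helper: returns a list of rule violations (empty when the input looks valid).
function validateInput(input, { isPaidUser = false } = {}) {
    const errors = [];
    const hasUrl = isSet(input.startUrl);
    const hasQuery = isSet(input.searchQuery);

    // Rule 2: either startUrl or searchQuery must be provided.
    if (!hasUrl && !hasQuery) errors.push('Provide either startUrl or searchQuery.');

    // Rules 1 and 3: the two methods are mutually exclusive.
    if (hasUrl && hasQuery) errors.push('startUrl and searchQuery cannot be used together.');
    if (hasUrl) {
        const usedFilters = FILTER_FIELDS.filter((field) => isSet(input[field]));
        if (usedFilters.length > 0) {
            errors.push(`startUrl cannot be combined with filters: ${usedFilters.join(', ')}.`);
        }
    }

    // maxItems: required and capped at 50 for free users, optional and capped at 1,000,000 for paid users.
    const limit = isPaidUser ? 1000000 : 50;
    if (!isSet(input.maxItems)) {
        if (!isPaidUser) errors.push('Free users must set maxItems (max 50).');
    } else if (!Number.isInteger(input.maxItems) || input.maxItems < 1) {
        errors.push('maxItems must be a positive integer.');
    } else if (input.maxItems > limit) {
        errors.push(`maxItems must not exceed ${limit}.`);
    }

    return errors;
}
```

Calling `validateInput({ startUrl: '...', formats: ['CSV'], maxItems: 10 })` returns one message naming the conflicting `formats` filter, while a search input with `maxItems: 50` returns an empty array.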

Here's what the input configuration looks like in JSON:

Example 1: Search with Filters

{
  "searchQuery": "climate",
  "topics": ["climate5434"],
  "formats": ["CSV", "JSON"],
  "organizationType": "Federal Government",
  "maxItems": 50
}

Example 2: Direct URL

{
  "startUrl": "https://catalog.data.gov/dataset?q=climate&groups=climate5434&res_format=CSV",
  "maxItems": 100
}

Example 3: Advanced Filtering

{
  "searchQuery": "transportation",
  "topics": ["local", "energy9485"],
  "topicCategories": ["Transportation", "Energy Infrastructure"],
  "tags": ["u.s. department of commerce", "air quality"],
  "formats": ["CSV", "JSON", "PDF"],
  "organizationType": "Federal Government",
  "organization": "noaa-gov",
  "maxItems": 200
}
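
If you prefer the direct URL method, a catalog URL like the one in the direct URL example can be assembled in code. This is a minimal sketch: `buildCatalogUrl` is a hypothetical helper, the parameter names `q`, `groups`, and `res_format` are taken from the example URL in this README, and repeating a parameter for multiple values is an assumption about how the catalog handles them.

```javascript
// Hypothetical helper: builds a Data.gov catalog search URL for use as startUrl.
function buildCatalogUrl({ searchQuery, topics = [], formats = [] } = {}) {
    const url = new URL('https://catalog.data.gov/dataset');
    if (searchQuery) url.searchParams.append('q', searchQuery);
    // One `groups` parameter per topic slug (assumed convention for multiple values).
    for (const topic of topics) url.searchParams.append('groups', topic);
    // One `res_format` parameter per resource format (assumed convention for multiple values).
    for (const format of formats) url.searchParams.append('res_format', format);
    return url.toString();
}

const startUrl = buildCatalogUrl({
    searchQuery: 'climate',
    topics: ['climate5434'],
    formats: ['CSV'],
});
console.log(startUrl);
// https://catalog.data.gov/dataset?q=climate&groups=climate5434&res_format=CSV
```

Remember that an input using `startUrl` must not also set `searchQuery` or any filter field.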

Pro Tips:

  1. 🎯 Use search filters for flexibility - Combine multiple filters to find exactly what you need
  2. 📊 Start broad, then narrow - Begin with a search query, then add filters to refine results
  3. 🔍 Use topic filters - Topics help categorize datasets by subject area
  4. 📁 Filter by format - Get only the data formats you can work with (CSV, JSON, etc.)
  5. 🏛️ Filter by organization - Focus on specific government agencies or departments
  6. Use direct URLs - If you've already found a specific catalog page, paste the URL directly

Output

After the Actor finishes its run, you'll get a dataset with the output. The length of the dataset depends on the number of results you've requested. You can download the results as Excel, HTML, XML, JSON, or CSV files.

Here's an example of scraped Data.gov data you'll get if you decide to scrape dataset information:

{
  "organizationImage": "https://raw.githubusercontent.com/GSA/logo/refs/heads/master/state_IA.png",
  "title": "Iowa School Performance Profiles",
  "datasetUrl": "https://catalog.data.gov/dataset/iowa-school-performance-profiles",
  "organizationName": "State of Iowa",
  "organizationUrl": "https://catalog.data.gov/organization/about/state-of-iowa",
  "organizationType": "State",
  "publisher": "data.iowa.gov",
  "publisherUrl": "https://catalog.data.gov/dataset?publisher=data.iowa.gov",
  "contact": {
    "name": "Bryan Bauer",
    "email": "no-reply@data.iowa.gov"
  },
  "organizationMission": "State of Iowa",
  "topics": null,
  "availableFormats": ["HTML"],
  "tags": ["school-report-cards", "student-performance"],
  "metadataUpdated": "September 1, 2023",
  "metadataCreated": "January 20, 2023",
  "description": "The Iowa School Performance Profiles is an online tool showing how public schools performed on required measures...",
  "downloadsAndResources": [
    {
      "format": "HTML",
      "url": "https://www.iaschoolperformance.gov/ECP/Home/Index",
      "description": null
    }
  ],
  "accessAndUseInfo": [
    {
      "type": "publicStatus",
      "label": "Public Status",
      "value": "Public: This dataset is intended for public access and use.",
      "url": "https://resources.data.gov/schemas/dcat-us/v1.1/#accessLevel"
    },
    {
      "type": "license",
      "label": "License",
      "value": "License: No license information was provided.",
      "url": null
    }
  ],
  "references": null,
  "metadataSource": [
    {
      "format": "Data.json",
      "heading": "Data.json Metadata",
      "downloadUrl": "https://catalog.data.gov/harvest/object/f8ba9a15-9137-4b9f-bbf8-fd59d89bc825",
      "harvestedFrom": "Iowa metadata"
    }
  ],
  "dates": [
    {
      "label": "Metadata Created Date",
      "value": "January 20, 2023"
    },
    {
      "label": "Metadata Updated Date",
      "value": "September 1, 2023"
    }
  ],
  "additionalMetadata": {
    "Resource Type": "Dataset",
    "Publisher": "data.iowa.gov",
    "Maintainer": "Bryan Bauer",
    "Identifier": "https://data.iowa.gov/api/views/qu5a-5eu4",
    "Data First Published": "2022-11-01",
    "Data Last Modified": "2023-08-30",
    "Category": "Primary & Secondary Ed",
    "Public Access Level": "public"
  },
  "scrapedTimestamp": "2025-12-05T15:37:25.646Z"
}
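
Once you have the items, a common next step is to pull out the resource links. The sketch below assumes every item has the shape of the sample above; `collectResourceLinks` is a hypothetical helper, and the guard against a missing or null `downloadsAndResources` array is a precaution, since the sample shows other fields such as `topics` set to null.

```javascript
// Hypothetical helper: flattens downloadsAndResources across items,
// optionally keeping only one format (compared case-insensitively).
function collectResourceLinks(items, format) {
    const wanted = format ? format.toUpperCase() : null;
    const links = [];
    for (const item of items) {
        // Fall back to an empty list when the field is null or missing.
        for (const resource of item.downloadsAndResources ?? []) {
            if (!resource.url) continue;
            if (wanted && String(resource.format ?? '').toUpperCase() !== wanted) continue;
            links.push({
                title: item.title,
                datasetUrl: item.datasetUrl,
                format: resource.format,
                url: resource.url,
            });
        }
    }
    return links;
}

// Usage with a trimmed copy of the sample item shown above.
const sampleItems = [{
    title: 'Iowa School Performance Profiles',
    datasetUrl: 'https://catalog.data.gov/dataset/iowa-school-performance-profiles',
    downloadsAndResources: [
        { format: 'HTML', url: 'https://www.iaschoolperformance.gov/ECP/Home/Index', description: null },
    ],
}];
console.log(collectResourceLinks(sampleItems, 'html'));
```

Passing no `format` returns every resource link, which is useful for building a download queue.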

What You Get:

  • 🏛️ Complete Organization Details - Organization names, types, missions, images, and URLs
  • 📊 Comprehensive Dataset Info - Titles, descriptions, metadata dates, and identifiers
  • 📁 Resource Links - Direct download links for all available formats (CSV, JSON, PDF, etc.)
  • 📞 Contact Information - Maintainer names and email addresses for support
  • 🏷️ Categorization - Topics, tags, and categories for easy organization
  • 📋 Access & Use Info - Public status, licenses, and usage rights
  • 🔗 References & Sources - Metadata sources, harvest information, and related links
  • 📅 Date Tracking - Creation dates, update dates, and modification timestamps
  • 📦 Additional Metadata - Complete technical metadata for advanced analysis
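
For date tracking, note that `metadataUpdated` in the sample output is a display string such as "September 1, 2023" rather than an ISO timestamp. The following sketch shows one way to compare two runs, assuming all dates use that English "Month D, YYYY" form; `parseCatalogDate` and `findUpdatedDatasets` are hypothetical helpers, not part of the Actor.

```javascript
const MONTHS = [
    'January', 'February', 'March', 'April', 'May', 'June',
    'July', 'August', 'September', 'October', 'November', 'December',
];

// Parses "September 1, 2023" into a UTC timestamp in milliseconds; returns null for other shapes.
function parseCatalogDate(text) {
    const match = /^([A-Za-z]+) (\d{1,2}), (\d{4})$/.exec(String(text ?? '').trim());
    if (!match) return null;
    const month = MONTHS.indexOf(match[1]);
    if (month === -1) return null;
    return Date.UTC(Number(match[3]), month, Number(match[2]));
}

// Returns the items in `current` that are new, or whose metadataUpdated is later than in `previous`.
// Items are matched by datasetUrl; unparseable dates are treated as unchanged.
function findUpdatedDatasets(previous, current) {
    const lastSeen = new Map(
        previous.map((item) => [item.datasetUrl, parseCatalogDate(item.metadataUpdated)]),
    );
    return current.filter((item) => {
        if (!lastSeen.has(item.datasetUrl)) return true;
        const before = lastSeen.get(item.datasetUrl);
        const now = parseCatalogDate(item.metadataUpdated);
        return before !== null && now !== null && now > before;
    });
}
```

Feeding the items of last week's run as `previous` and today's run as `current` yields only the datasets worth re-downloading.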

Download Options: CSV, Excel, or JSON formats for easy analysis in your business tools

Why Choose the Data.gov Scraper?

  • 🎯 Comprehensive Data Collection: Get all available dataset information in one scrape - metadata, resources, organizations, and more
  • 🔍 Advanced Filtering: Filter by topics, formats, organizations, publishers, and more for precise results
  • 📊 Multiple Input Methods: Use direct URLs or search filters - whichever works best for your workflow
  • 🏛️ Organization Intelligence: Get complete organization details including missions, contact info, and URLs
  • 📁 Format Filtering: Find datasets in the formats you need (CSV, JSON, PDF, XML, etc.)
  • 🔗 Direct Download Links: Access direct links to all available resources and datasets
  • 📋 Complete Metadata: Get creation dates, update dates, metadata sources, and technical details
  • 🚫 No Duplicates: Automatically skips datasets already in your collection
  • ⚡ User-Friendly: No coding needed, just input search terms or URLs and go
  • 🔄 Parallel Processing: Processes multiple datasets simultaneously for faster results

Time Savings: Save 10-20 hours per week compared to manual government data research
Cost Efficiency: Fraction of the cost of hiring a research assistant or using expensive data services

How to Use

  1. Sign Up: Create a free account with $5 credit (takes 2 minutes)
  2. Find the Scraper: Visit the Data.gov Scraper page
  3. Set Input:
    • Option A (Recommended): Enter a search query and select filters
    • Option B: Add your direct Data.gov catalog URL
    • Set max items (optional)
  4. Run It: Click "Start" and let it collect your data
  5. Download Data: Get your results in the "Dataset" tab as CSV, Excel, or JSON

Total Time: 3 minutes setup, 10-30 minutes for data collection
No Technical Skills Required: Everything is point and click

Business Use Cases

📊 Researchers & Academics:

  • Collect government datasets for research projects
  • Track updates to datasets over time
  • Build comprehensive government data databases
  • Analyze policy impacts using government data

🏛️ Government Contractors:

  • Monitor new government data releases
  • Track data updates from specific agencies
  • Identify data sources for proposals
  • Stay informed about government data initiatives

📰 Journalists & Media:

  • Find government data for investigative reporting
  • Track data releases from specific agencies
  • Monitor updates to important datasets
  • Build data-driven stories with government sources

💼 Policy Analysts:

  • Analyze policy data across multiple agencies
  • Track policy implementation through data
  • Compare datasets across different time periods
  • Build policy impact assessments

📈 Data Analysts:

  • Build comprehensive government data catalogs
  • Create automated data monitoring systems
  • Integrate government data into business intelligence tools
  • Support data-driven decision-making

🔬 Market Researchers:

  • Access government economic and market data
  • Track industry trends through government datasets
  • Analyze market conditions using official data
  • Support business planning with government statistics

Using Data.gov Scraper with the Apify API

For advanced users who want to automate this process, you can control the scraper programmatically with the Apify API. This allows you to schedule regular data collection and integrate with your existing business tools.

Example API Usage:

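// Node.js example
const { ApifyClient } = require('apify-client');

const client = new ApifyClient({
    token: 'YOUR_API_TOKEN',
});

async function main() {
    // Run with search filters and wait for the run to finish
    const searchRun = await client.actor('YOUR_ACTOR_ID').call({
        searchQuery: 'climate',
        topics: ['climate5434'],
        formats: ['CSV', 'JSON'],
        organizationType: 'Federal Government',
        maxItems: 50,
    });

    // Read the scraped items from the run's default dataset
    const { items } = await client.dataset(searchRun.defaultDatasetId).listItems();
    console.log(`Collected ${items.length} datasets`);

    // Run with direct URL (no searchQuery or filters allowed here)
    await client.actor('YOUR_ACTOR_ID').call({
        startUrl: 'https://catalog.data.gov/dataset?q=climate&groups=climate5434',
        maxItems: 100,
    });
}

main().catch(console.error);

  • Node.js: Install the apify-client NPM package
  • Python: Use the apify-client PyPI package
  • See the Apify API reference for full details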

Frequently Asked Questions

Q: How accurate is the data? A: We collect data directly from Data.gov's official website in real time, so the dataset information reflects what the catalog shows at the moment of the run.

Q: Can I filter by multiple topics or formats? A: Yes! You can select multiple topics, topic categories, tags, and formats to get datasets that match any of your selected criteria.

Q: What's the difference between using startUrl and search filters? A: startUrl lets you use a direct Data.gov catalog URL you've already found, while search filters let you build a search from scratch. Both methods work great - choose the one that fits your workflow.

Q: How do I find organization slugs for the organization filter? A: Visit Data.gov's organizations page to browse all available organizations. The organization slug is in the URL (e.g., noaa-gov from https://catalog.data.gov/organization/noaa-gov).
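
If you already have organization URLs, for example the `organizationUrl` field from a previous run, the slug can be extracted in code. This is a minimal sketch: `organizationSlugFromUrl` is a hypothetical helper, and it assumes the slug is always the last path segment, as in the URL above and in the `/organization/about/...` form shown in the sample output.

```javascript
// Hypothetical helper: returns the organization slug from a catalog URL, or null if none is found.
// Throws a TypeError when the argument is not a valid absolute URL.
function organizationSlugFromUrl(organizationUrl) {
    const { pathname } = new URL(organizationUrl);
    const segments = pathname.split('/').filter(Boolean);
    if (segments[0] !== 'organization' || segments.length < 2) return null;
    const slug = segments[segments.length - 1];
    // "/organization/about" alone carries no slug.
    return slug === 'about' ? null : slug;
}

console.log(organizationSlugFromUrl('https://catalog.data.gov/organization/noaa-gov'));
// noaa-gov
```

Whether every slug found this way is accepted by the organization dropdown is not confirmed here, so check the value against the input form before relying on it.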

Q: Can I schedule regular runs? A: Yes! Use the Apify API to schedule daily, weekly, or monthly runs automatically. Perfect for ongoing government data monitoring and research.

Q: What if I need help? A: Our support team is available 24/7. Contact us through the Apify platform.

Q: Is my data secure? A: Absolutely. All data is encrypted in transit and at rest. We never share your data with third parties.

Q: How many datasets can I scrape? A: Free users can scrape up to 50 datasets per run. Paid users can scrape up to 1,000,000 datasets or leave maxItems empty for unlimited scraping.

Integrate Data.gov Scraper with any app and automate your workflow

Last but not least, Data.gov Scraper can be connected with almost any cloud service or web app thanks to integrations on the Apify platform.

Alternatively, you can use webhooks to carry out an action whenever an event occurs, e.g. get a notification whenever Data.gov Scraper successfully finishes a run.

Looking for more data collection tools? Check out these related actors:

| Actor | Description | Link |
| --- | --- | --- |
| GSA eLibrary Scraper | Collects government publications and documents from GSA eLibrary | https://apify.com/parseforge/gsa-elibrary-scraper |
| FINRA BrokerCheck Scraper | Extracts financial broker and advisor information from FINRA | https://apify.com/parseforge/finra-brokercheck-scraper |
| FAA Aircraft Registry (N-Number) Scraper | Collects aircraft registration and ownership data from FAA | https://apify.com/parseforge/faa-aircraft-registry-scraper |
| Greatschools Scraper | Extracts school information and ratings from GreatSchools.org | https://apify.com/parseforge/greatschools-scraper |
| PR Newswire Scraper | Collects press releases and news content from PR Newswire | https://apify.com/parseforge/pr-newswire-scraper |

Pro Tip: 💡 Browse our complete collection of data collection actors to find the perfect tool for your business needs.

Need Help? Our support team is here to help you get the most out of this tool.


⚠️ Disclaimer: This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Data.gov, the U.S. General Services Administration (GSA), or any of its subsidiaries. All trademarks mentioned are the property of their respective owners.