# GovInfo Publications Scraper (`parseforge/govinfo-publications-scraper`) Actor

Reach into govinfo.gov for official US government publications spanning bills, the Congressional Record, the Federal Register, and committee reports. Each item includes the package ID, title, collection, Congress number, document class, publish date, and branch. Built for policy research.

- **URL**: https://apify.com/parseforge/govinfo-publications-scraper.md
- **Developed by:** [ParseForge](https://apify.com/parseforge) (community)
- **Categories:** Developer tools, AI, Other
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

from $19.00 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage, only a fixed price for specific events.
Since this Actor supports Apify Store discounts, the higher your subscription plan, the lower the price.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
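
For a rough budget, the listed price turns into a per-run estimate with one multiplication: results × $19.00 / 1,000. The helper below is a hypothetical sketch that assumes the per-result charge is the only billed event and uses the undiscounted list price; Store discounts and any other events the Actor charges will change the real figure.

```python
from decimal import Decimal, ROUND_HALF_UP

# Listed price from the Pricing section (assumption: no plan discount applied).
LIST_PRICE_PER_1000 = Decimal("19.00")


def estimate_result_cost(results: int,
                         price_per_1000: Decimal = LIST_PRICE_PER_1000) -> Decimal:
    """Hypothetical estimate in USD that counts only the per-result event."""
    if results < 0:
        raise ValueError("results must be non-negative")
    cost = Decimal(results) * price_per_1000 / Decimal(1000)
    # Round to whole cents for display.
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_result_cost(500))  # prints 9.50
```

Decimal arithmetic is used instead of floats so that cent amounts round predictably.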

## What's an Apify Actor?

Actors are software tools that run on the Apify platform and cover all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) and as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

![ParseForge Banner](https://github.com/ParseForge/apify-assets/blob/ad35ccc13ddd068b9d6cba33f323962e39aed5b2/banner.jpg?raw=true)

## 🏛️ GovInfo Publications Scraper

> 🚀 **Pull official US government publications in seconds. Bills, Federal Register, Congressional Record, committee reports, and the US Code, straight from the public govinfo.gov collections API.**

> 🕒 **Last updated** 2026-06-05 · **📊 11 fields** per record · 15 federal collections · Bills, hearings, reports, regulations · Updated daily by GPO

The GovInfo Publications Scraper turns the official Government Publishing Office (GPO) `api.govinfo.gov/collections` endpoint into a structured dataset of federal publications. Pick a collection (Bills, Federal Register, Congressional Record, US Code, etc.), set a `lastModified` date range, and the Actor walks the paginated response and flattens each package into one row.

GovInfo is the authoritative source for official US federal documents, maintained by the GPO and trusted by lawyers, journalists, and researchers.

| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| ⚖️ Legal researchers | Track new bills and statutes by Congress |
| 📰 Investigative journalists | Monitor the Federal Register for new rules |
| 🏢 Policy analysts | Build longitudinal datasets of congressional activity |
| 🤖 ML engineers | Train models on official legislative text |
| 🏛️ Civic tech builders | Power public dashboards of congressional output |

### 📋 What the GovInfo Publications Scraper does

- Calls `api.govinfo.gov/collections/{collection}/{startDate}/{endDate}` with your filters.
- Walks the paginated response across `pageSize` and `offset`.
- Flattens each package into one row with stable columns.
- Builds the canonical PDF URL for every package so you can pipe to a downloader.
- Pushes any upstream error as a single record rather than crashing the run.

> 💡 **Why it matters** GovInfo holds the authoritative copy of every US federal publication, but its API requires you to chunk by date range and walk offsets. This actor handles pagination so you can focus on the data.
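
The pagination loop can be sketched as a generator. This is not the Actor's source: it is a minimal sketch that assumes each page is a JSON object with a `packages` list and a `nextPage` URL that is empty on the last page, which is assumed to match GovInfo's collections responses at the time of writing. Rather than computing offsets by hand, it follows that `nextPage` link. `fetch_json` is a hypothetical hook you supply; it performs the HTTP GET and adds your API key.

```python
from typing import Callable, Iterator, Optional


def walk_packages(fetch_json: Callable[[str], dict], first_url: str,
                  max_items: Optional[int] = None) -> Iterator[dict]:
    """Yield packages across pages until nextPage runs out or max_items is hit."""
    url: Optional[str] = first_url
    yielded = 0
    # Stop before fetching another page once the item cap is reached.
    while url and (max_items is None or yielded < max_items):
        page = fetch_json(url)  # hypothetical hook: GET + API-key handling
        for package in page.get("packages", []):
            if max_items is not None and yielded >= max_items:
                return
            yield package
            yielded += 1
        url = page.get("nextPage")  # None or missing on the last page


# Offline check: two fake pages stand in for HTTP responses.
fake = {
    "page1": {"packages": [{"packageId": "A"}, {"packageId": "B"}], "nextPage": "page2"},
    "page2": {"packages": [{"packageId": "C"}], "nextPage": None},
}
print([p["packageId"] for p in walk_packages(fake.__getitem__, "page1")])  # ['A', 'B', 'C']
```

Checking the cap in the `while` condition avoids requesting a page whose items would all be discarded.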

### 🎬 Full Demo

_🚧 Coming soon._

### ⚙️ Input

<table>
<tr><th>Field</th><th>Type</th><th>Required</th><th>Description</th></tr>
<tr><td><code>collection</code></td><td>enum</td><td>No</td><td>GovInfo collection. Default <code>BILLS</code>.</td></tr>
<tr><td><code>startDate</code></td><td>string</td><td>No</td><td>ISO 8601 lower bound on <code>lastModified</code> (<code>YYYY-MM-DDTHH:MM:SSZ</code>). Default <code>2025-01-01T00:00:00Z</code>.</td></tr>
<tr><td><code>endDate</code></td><td>string</td><td>No</td><td>ISO 8601 upper bound on <code>lastModified</code>. Leave empty to fetch up to the current time.</td></tr>
<tr><td><code>congress</code></td><td>integer</td><td>No</td><td>Optional Congress number (e.g. 118).</td></tr>
<tr><td><code>docClass</code></td><td>string</td><td>No</td><td>Optional document class slug (e.g. <code>hr</code>, <code>s</code>).</td></tr>
<tr><td><code>maxItems</code></td><td>integer</td><td>No</td><td>Maximum number of records to return. Free plan: capped at 10. Paid plans: up to 1,000,000.</td></tr>
</table>

**Example 1, recent House bills:**
```json
{
  "collection": "BILLS",
  "startDate": "2025-01-01T00:00:00Z",
  "docClass": "hr",
  "maxItems": 50
}
````

**Example 2, latest Federal Register entries:**

```json
{
  "collection": "FR",
  "startDate": "2026-05-01T00:00:00Z",
  "maxItems": 100
}
```

> ⚠️ **Good to Know** The operator running this Actor needs a free api.data.gov API key, set as the `GOVINFO_API_KEY` environment variable. Get one in 30 seconds at api.data.gov/signup.
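
A first-page request against the collections endpoint can be assembled from the input fields plus that key. The helper below is a hypothetical sketch, not the Actor's code. It assumes GovInfo's layout of date path segments, optional `congress` and `docClass` query filters, an `api_key` query parameter, and an `offsetMark=*` cursor for the first page. This README describes the Actor as paging by `offset`, so check GovInfo's API documentation for the current paging parameter.

```python
import os
from typing import Optional
from urllib.parse import quote, urlencode

GOVINFO_BASE = "https://api.govinfo.gov"


def build_collections_url(collection: str = "BILLS",
                          start_date: str = "2025-01-01T00:00:00Z",
                          end_date: Optional[str] = None,
                          congress: Optional[int] = None,
                          doc_class: Optional[str] = None,
                          page_size: int = 100,
                          api_key: Optional[str] = None) -> str:
    """Hypothetical builder for the first page of /collections/{collection}/..."""
    # Keep ':' readable in the ISO 8601 path segments.
    path = f"/collections/{quote(collection)}/{quote(start_date, safe=':')}"
    if end_date:  # empty or None means "up to now", so the segment is omitted
        path += f"/{quote(end_date, safe=':')}"
    params = {"offsetMark": "*", "pageSize": page_size}  # assumed first-page cursor
    if congress is not None:
        params["congress"] = congress
    if doc_class:
        params["docClass"] = doc_class
    # Fall back to the GOVINFO_API_KEY environment variable named above.
    params["api_key"] = api_key if api_key is not None else os.environ.get("GOVINFO_API_KEY", "")
    return GOVINFO_BASE + path + "?" + urlencode(params, safe="*")


print(build_collections_url("BILLS", "2025-01-01T00:00:00Z", doc_class="hr", api_key="YOUR_KEY"))
```

The inputs map one to one: `collection`, `startDate`, and `endDate` become path segments, while `congress` and `docClass` become query filters.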

### 📊 Output

Each record is a flat object. `error` is always last.

| Field | Type | Description |
|---|---|---|
| 🆔 `packageId` | string | Unique GovInfo package identifier. |
| 📄 `title` | string | Publication title. |
| 📚 `collection` | string | Collection code (BILLS, FR, CREC, etc.). |
| 🏛️ `congress` | integer | Congress number when applicable. |
| 🗂️ `docClass` | string | Document class (hr, s, hres, sres, etc.). |
| 📅 `publishDate` | string | Date issued. |
| 🕒 `lastModified` | string | Last modified timestamp from GPO. |
| 📎 `pdfUrl` | string | Direct link to the official PDF on govinfo.gov. |
| 🔢 `granuleCount` | integer | Number of sub-documents inside the package. |
| 🏢 `branch` | string | Government branch when present. |
| 🕒 `scrapedAt` | string | When this row was fetched. |
| ❌ `error` | string | Set if the upstream response was an error. |
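
The mapping from a raw package to the row above can be sketched as follows. It is an illustration, not the Actor's code. It assumes GovInfo package objects carry `packageId`, `title`, `congress`, `docClass`, `dateIssued`, and `lastModified`, as collection listings do at the time of writing. It treats `granuleCount` and `branch` as optional keys that may be absent. It also assumes the PDF link follows the `https://www.govinfo.gov/content/pkg/{packageId}/pdf/{packageId}.pdf` pattern used by most collections. `error_row` is a hypothetical helper for the single error record.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Column order from the Output table; `error` stays last.
COLUMNS = ["packageId", "title", "collection", "congress", "docClass",
           "publishDate", "lastModified", "pdfUrl", "granuleCount",
           "branch", "scrapedAt", "error"]

# Assumed govinfo.gov PDF layout for a package.
PDF_TEMPLATE = "https://www.govinfo.gov/content/pkg/{pid}/pdf/{pid}.pdf"


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def flatten_package(pkg: Dict[str, Any], collection: str,
                    scraped_at: Optional[str] = None) -> Dict[str, Any]:
    """Turn one package object into a flat row with stable columns."""
    pid = pkg.get("packageId")
    congress = pkg.get("congress")
    return {
        "packageId": pid,
        "title": pkg.get("title"),
        "collection": collection,
        # The Congress number may arrive as a string such as "118".
        "congress": int(congress) if congress not in (None, "") else None,
        "docClass": pkg.get("docClass"),
        "publishDate": pkg.get("dateIssued"),
        "lastModified": pkg.get("lastModified"),
        "pdfUrl": PDF_TEMPLATE.format(pid=pid) if pid else None,
        "granuleCount": pkg.get("granuleCount"),  # optional, may be None
        "branch": pkg.get("branch"),  # optional, may be None
        "scrapedAt": scraped_at or _now(),
        "error": None,
    }


def error_row(message: str, scraped_at: Optional[str] = None) -> Dict[str, Any]:
    """Same columns, all empty except scrapedAt and error."""
    row = dict.fromkeys(COLUMNS)
    row["scrapedAt"] = scraped_at or _now()
    row["error"] = message
    return row
```

Building every row, including error rows, from the same column list keeps the dataset's columns stable across successful and failed fetches.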

### ✨ Why choose this Actor

| | Benefit |
|---|---|
| 🆓 | Official GPO data from a free public API; no HTML scraping needed. |
| 📚 | 15 federal collections covered out of the box. |
| 📎 | Auto-builds the direct PDF URL for every package. |
| 🛟 | Pagination handled, error responses surfaced cleanly. |
| 🔌 | Plain HTTP, no browsers, fast and resilient. |

### 📈 How it compares to alternatives

| Approach | Setup | Pagination | PDF URL builder |
|---|---|---|---|
| Roll your own fetch | 30+ min | manual | manual |
| Python govinfo client | Install + script | partial | partial |
| **This Actor** | 5 sec, no install | ✅ | ✅ |

### 🚀 How to use

1. Click **Try for free**.
2. Pick a `collection` and a `startDate`.
3. Click **Start**.
4. Open the dataset when the run finishes.

### 💼 Business use cases

**⚖️ Legal monitoring** Watch the Federal Register for new agency rules affecting your industry.

**📊 Policy research** Pull all bills introduced in a Congress and feed them into pandas for descriptive statistics.

**📰 Newsroom backbone** Reporters can verify the exact text of a public law in seconds.

**🤖 ML training** Build a corpus of official legislative text for legal LLMs.
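
For the policy-research case, each BILLS package is one *version* of a bill (introduced, engrossed, enrolled, and so on), so counting rows overcounts bills. The minimal sketch below assumes BILLS package IDs follow the `BILLS-{congress}{docClass}{number}{version}` layout (for example `BILLS-118hr1ih`) and collapses versions before counting. It uses only the standard library. The returned `Counter` is dict-like, so `pd.Series(...)` accepts it directly if you continue in pandas.

```python
import re
from collections import Counter
from typing import Iterable, Mapping

# Assumed BILLS package-ID layout: BILLS-{congress}{docClass}{number}{version}
BILL_ID = re.compile(r"^BILLS-(\d+)([a-z]+?)(\d+)([a-z]+)$")


def distinct_bills_per_class(rows: Iterable[Mapping], congress: int) -> Counter:
    """Count distinct bills (not bill versions) per docClass for one Congress."""
    bills = set()
    for row in rows:
        if row.get("error"):
            continue  # skip error records the Actor pushes on upstream failures
        match = BILL_ID.match(row.get("packageId") or "")
        if match and int(match.group(1)) == congress:
            # (docClass, bill number) identifies a bill across all its versions
            bills.add((match.group(2), int(match.group(3))))
    return Counter(doc_class for doc_class, _ in bills)


sample_rows = [
    {"packageId": "BILLS-118hr1ih"},  # H.R. 1, introduced version
    {"packageId": "BILLS-118hr1eh"},  # H.R. 1, engrossed version (same bill)
    {"packageId": "BILLS-118s5is"},
]
print(sorted(distinct_bills_per_class(sample_rows, 118).items()))  # [('hr', 1), ('s', 1)]
```

To count only the introduced text instead, keep rows whose version suffix (group 4) is `ih` or `is`.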

### 🔌 Automating GovInfo Publications Scraper

- **Make and Zapier** trigger runs on a schedule and push results to Airtable, Google Sheets, or Slack.
- **Cron** the native Apify scheduler runs the Actor on any cron schedule, such as every morning.
- **Webhooks** receive a POST the moment a run completes.
- **Warehouses** pipe results to BigQuery, Snowflake, or Postgres.
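
On the receiving end of a webhook, the main job is to pick the dataset ID out of the POST body. The sketch below assumes Apify's default webhook payload shape: an `eventType` string plus a `resource` object holding the run, including `defaultDatasetId`. The function name is hypothetical, and any web framework's request handler can call it.

```python
import json
from typing import Any, Mapping, Optional


def dataset_id_from_webhook(payload: Mapping[str, Any]) -> Optional[str]:
    """Return the run's default dataset ID for a succeeded run, else None."""
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None  # ignore failed, aborted, or timed-out runs here
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")


# Example: parse a raw request body as a web handler would receive it.
body = b'{"eventType": "ACTOR.RUN.SUCCEEDED", "resource": {"defaultDatasetId": "abc123"}}'
print(dataset_id_from_webhook(json.loads(body)))  # abc123
```

With the ID in hand, `ApifyClient(token).dataset(dataset_id).iterate_items()` streams the rows, as in the Python example in the API section.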

### 🌟 Beyond business use cases

**🎓 Education** Teach a civics class with real congressional data.

**🧪 Personal research** Track every bill your representative introduces.

**🤝 Non-profit and open data** Power public dashboards of legislative activity.

**🧰 Tinkering** Spin up a quick prototype on top of authoritative federal data.

### 🤖 Ask an AI assistant about this scraper

Paste this README into ChatGPT, Claude, or any assistant and ask it to map your workflow to the actor's inputs. The schema, examples, and field list above are everything an LLM needs.

### ❓ Frequently Asked Questions

**❓ Do I need an API key?** No. The operator running the Actor sets one server-side; end users do not need their own.

**❓ Which collections are supported?** All 15 public collections listed in the `collection` dropdown.

**❓ Is there a rate limit?** Yes. The data.gov tier allows 1,000 requests per hour per key.

**❓ Can I get full document text?** Yes. Follow the `pdfUrl` to the official PDF on govinfo.gov.

**❓ How fresh is the data?** GPO publishes new packages daily.

**❓ Can I schedule runs?** Yes. Use Apify's native scheduler.

**❓ Is this scraping or an API?** Pure API. The endpoint is official and public.

**❓ Will the schema change?** The core fields are stable; new collections are additive.
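
If you also call api.govinfo.gov directly with your own key, for example in a follow-up script, spacing requests evenly keeps you at or below the 1,000-per-hour quota mentioned above: 3,600 s / 1,000 = 3.6 s between calls. The class name is hypothetical. The clock and sleep functions are injectable so the logic can be tested offline.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Enforce a fixed gap between calls so the hourly rate stays under a cap."""

    def __init__(self, per_hour: int = 1000,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if per_hour <= 0:
            raise ValueError("per_hour must be positive")
        self.interval = 3600.0 / per_hour  # 3.6 s when per_hour is 1,000
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None  # earliest start of the next call

    def wait(self) -> float:
        """Block until the next request may start; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.interval
        return delay
```

Create one throttle per API key and call `throttle.wait()` immediately before each request. Fixed spacing is simpler than a token bucket, but it gives up short bursts.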

### 🔌 Integrate with any app

Apify ships native integrations with Make, Zapier, Slack, Discord, Google Drive, Google Sheets, Gmail, Airbyte, Keboola, Telegram, GitHub, and any REST endpoint or webhook.

### 🔗 Recommended Actors

| Actor | What it does |
|---|---|
| [ParseForge OurAirports Scraper](https://apify.com/parseforge/ourairports-scraper) | Global airport database. |
| [ParseForge Alpha Vantage Scraper](https://apify.com/parseforge) | Public market data. |
| [ParseForge NBA Stats Scraper](https://apify.com/parseforge/nba-stats-scraper) | Player and team stats. |
| [ParseForge CurseForge Mods Scraper](https://apify.com/parseforge/curseforge-mods-scraper) | Public mod metadata. |

> 💡 **Pro Tip** Browse the complete [ParseForge collection](https://apify.com/parseforge) for 900+ production-grade scrapers.

***

**Disclaimer** This Actor uses only publicly available data. ParseForge is not affiliated with, endorsed by, or sponsored by any of the third-party services referenced. Users are responsible for complying with the target site's terms of service and applicable law. [Create a free account with $5 credit](https://console.apify.com/sign-up?fpr=vmoqkp).

# Actor input Schema

## `maxItems` (type: `integer`):

Free users: Limited to 10 items (preview). Paid users: Optional, max 1,000,000

## `collection` (type: `string`):

GovInfo collection code (e.g. BILLS, CREC, FR, CRPT, USCODE, PLAW).

## `startDate` (type: `string`):

Earliest lastModified date to fetch. Format YYYY-MM-DDTHH:MM:SSZ.

## `endDate` (type: `string`):

Latest lastModified date to fetch. Leave empty to fetch up to the current time.

## `congress` (type: `integer`):

Optional filter by Congress (e.g. 118).

## `docClass` (type: `string`):

Optional document class slug (e.g. hr, s, hres, sres).

## Actor input object example

```json
{
  "maxItems": 10,
  "collection": "BILLS",
  "startDate": "2025-01-01T00:00:00Z"
}
```
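
Before starting a run, an input object can be checked client-side against the constraints in the input schema: the collection enum, the `YYYY-MM-DDTHH:MM:SSZ` date format, `maxItems` from 1 to 1,000,000, and `congress` from 1 to 200. This is a hypothetical helper, not part of the Actor or the Apify client. The free-plan cap of 10 items is applied at run time, so it is not checked here.

```python
from datetime import datetime
from typing import Any, Dict, List

# Enum and ranges copied from the input schema.
COLLECTIONS = {"BILLS", "CREC", "FR", "CRPT", "USCODE", "PLAW", "CHRG", "CFR",
               "ECONI", "GAOREPORTS", "CPRT", "GPO", "BUDGET", "PAI", "STATUTE"}
DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def _int_in_range(value: Any, low: int, high: int) -> bool:
    # bool is a subclass of int in Python, so rule it out explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and low <= value <= high


def validate_input(actor_input: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    problems: List[str] = []
    if actor_input.get("collection", "BILLS") not in COLLECTIONS:
        problems.append("collection: not a supported code")
    parsed = {}
    for key in ("startDate", "endDate"):
        value = actor_input.get(key)
        if not value:
            continue  # both dates are optional; an empty endDate means "now"
        try:
            parsed[key] = datetime.strptime(value, DATE_FORMAT)
        except (TypeError, ValueError):
            problems.append(f"{key}: expected YYYY-MM-DDTHH:MM:SSZ")
    if len(parsed) == 2 and parsed["endDate"] < parsed["startDate"]:
        problems.append("endDate: earlier than startDate")
    if actor_input.get("maxItems") is not None and not _int_in_range(actor_input["maxItems"], 1, 1_000_000):
        problems.append("maxItems: must be an integer from 1 to 1,000,000")
    if actor_input.get("congress") is not None and not _int_in_range(actor_input["congress"], 1, 200):
        problems.append("congress: must be an integer from 1 to 200")
    return problems


print(validate_input({"maxItems": 10, "collection": "BILLS",
                      "startDate": "2025-01-01T00:00:00Z"}))  # []
```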

# Actor output Schema

## `results` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
// Run as an ES module (e.g. a .mjs file): the code below uses top-level await.
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "maxItems": 10,
    "startDate": "2025-01-01T00:00:00Z"
};

// Run the Actor and wait for it to finish
const run = await client.actor("parseforge/govinfo-publications-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "maxItems": 10,
    "startDate": "2025-01-01T00:00:00Z",
}

# Run the Actor and wait for it to finish
run = client.actor("parseforge/govinfo-publications-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "maxItems": 10,
  "startDate": "2025-01-01T00:00:00Z"
}' |
apify call parseforge/govinfo-publications-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=parseforge/govinfo-publications-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "GovInfo Publications Scraper",
        "description": "Reach into govinfo.gov for official US government publications spanning bills, the Congressional Record, the Federal Register, and committee reports. Each item returns the package id, title, collection, congress number, document class, publish date, and branch. Built for policy research.",
        "version": "0.1",
        "x-build-id": "MMiZ8cJb3ZH7QPwYo"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/parseforge~govinfo-publications-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-parseforge-govinfo-publications-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/parseforge~govinfo-publications-scraper/runs": {
            "post": {
                "operationId": "runs-sync-parseforge-govinfo-publications-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/parseforge~govinfo-publications-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-parseforge-govinfo-publications-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "maxItems": {
                        "title": "Max Items",
                        "minimum": 1,
                        "maximum": 1000000,
                        "type": "integer",
                        "description": "Free users: Limited to 10 items (preview). Paid users: Optional, max 1,000,000"
                    },
                    "collection": {
                        "title": "Collection",
                        "enum": [
                            "BILLS",
                            "CREC",
                            "FR",
                            "CRPT",
                            "USCODE",
                            "PLAW",
                            "CHRG",
                            "CFR",
                            "ECONI",
                            "GAOREPORTS",
                            "CPRT",
                            "GPO",
                            "BUDGET",
                            "PAI",
                            "STATUTE"
                        ],
                        "type": "string",
                        "description": "GovInfo collection code (e.g. BILLS, CREC, FR, CRPT, USCODE, PLAW).",
                        "default": "BILLS"
                    },
                    "startDate": {
                        "title": "Start date (ISO 8601)",
                        "type": "string",
                        "description": "Earliest lastModified date to fetch. Format YYYY-MM-DDTHH:MM:SSZ."
                    },
                    "endDate": {
                        "title": "End date (ISO 8601)",
                        "type": "string",
                        "description": "Latest lastModified date to fetch. Leave empty to fetch up to the current time."
                    },
                    "congress": {
                        "title": "Congress number",
                        "minimum": 1,
                        "maximum": 200,
                        "type": "integer",
                        "description": "Optional filter by Congress (e.g. 118)."
                    },
                    "docClass": {
                        "title": "Document class",
                        "type": "string",
                        "description": "Optional document class slug (e.g. hr, s, hres, sres)."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
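
For projects without an Apify client library, the `run-sync-get-dataset-items` path in this spec can be called with the Python standard library alone. This minimal sketch follows the spec's own conventions: POST a JSON body matching `inputSchema`, pass the token as the `token` query parameter, and read the dataset items from the response body. The function names are hypothetical. Synchronous runs suit small `maxItems` values; longer runs are better started through the `/runs` path and polled.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_BASE = "https://api.apify.com/v2"
# The "/" in parseforge/govinfo-publications-scraper becomes "~" in API paths.
ACTOR_PATH = "/acts/parseforge~govinfo-publications-scraper"


def build_run_sync_request(token: str, actor_input: dict) -> Request:
    """Prepare POST .../run-sync-get-dataset-items with a JSON input body."""
    url = f"{API_BASE}{ACTOR_PATH}/run-sync-get-dataset-items?" + urlencode({"token": token})
    body = json.dumps(actor_input).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


def run_and_get_items(token: str, actor_input: dict, timeout: float = 600) -> list:
    """Run the Actor synchronously and return the dataset items as a list."""
    # Per-socket-operation timeout; the server holds the connection open while
    # the run executes, so keep it generous.
    with urlopen(build_run_sync_request(token, actor_input), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `run_and_get_items("<YOUR_API_TOKEN>", {"maxItems": 10, "startDate": "2025-01-01T00:00:00Z"})` returns the rows as a list of dicts with the fields from the Output table.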
