US Congressional Record Scraper
Pricing
Pay per event
Scrapes daily floor speeches and statements from the US Congressional Record via the official congress.gov API. Returns per-article records with full text, section, volume, and issue metadata from 1995 to present.
Developer
BowTiedRaccoon
Maintained by Community
US Congressional Record Scraper - Floor Speeches, Statements & Legislative Text
Extract daily floor speeches and statements from the US Congressional Record via the official Congress.gov API. Returns per-article records with section, volume, issue metadata, page numbers, and optional full plain text. Coverage spans from 1995 to the present across all CR sections: Daily Digest, Senate, House, and Extensions of Remarks.
What data does it extract?
Each record represents a single article from a daily Congressional Record issue:
| Field | Description |
|---|---|
| congress | Congress number (e.g. 119) |
| session | Session number within the Congress |
| volume | CR volume number |
| issue_number | Issue number within the volume |
| issue_date | Publication date (YYYY-MM-DD) |
| section | Section name: Daily Digest, Senate Section, House Section, Extensions of Remarks Section |
| article_title | Title of the article or speech |
| start_page | First page in the printed Record |
| end_page | Last page in the printed Record |
| article_text | Full plain-text body (populated when includeFullText: true). Capped at 50,000 characters. |
| pdf_url | URL to the PDF version of this article |
| source_url | Canonical URL on congress.gov |
| scraped_at | ISO-8601 scrape timestamp |
Text availability note: article_text is populated only for articles that have an associated Formatted Text URL in the API. For most pre-2000 records only PDF versions exist, so article_text is null even with includeFullText: true. Full-text coverage is reliable from approximately 2000 onward.
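Because article_text can be null, downstream code usually needs to separate records that carry text from records that only have a PDF link. The following is a minimal sketch, assuming records are loaded as plain dicts with the field names from the table above; the helper name split_by_text and the sample values are illustrative and not part of the scraper.

```python
from typing import Iterable, List, Tuple


def split_by_text(records: Iterable[dict]) -> Tuple[List[dict], List[dict]]:
    """Return (with_text, pdf_only) lists based on the article_text field."""
    with_text: List[dict] = []
    pdf_only: List[dict] = []
    for rec in records:
        # A missing key, None, or an empty string all count as "no text".
        if rec.get("article_text"):
            with_text.append(rec)
        else:
            pdf_only.append(rec)
    return with_text, pdf_only


# Placeholder records for illustration only (not real scraper output).
sample = [
    {
        "issue_date": "2024-03-05",
        "article_title": "Example speech A",
        "article_text": "Example body text.",
        "pdf_url": "https://example.invalid/a.pdf",
    },
    {
        "issue_date": "1996-02-01",
        "article_title": "Example speech B",
        "article_text": None,
        "pdf_url": "https://example.invalid/b.pdf",
    },
]

with_text, pdf_only = split_by_text(sample)
print(len(with_text), len(pdf_only))
```

Records in pdf_only still carry pdf_url, so they can be queued for separate PDF processing if the text is needed.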
What does the scraper do?
It calls the Congress.gov API to list CR issues by date or Congress number, then fetches each article's detail record. With includeFullText: true, it makes one additional HTTP request per article to retrieve the plain-text body. The default request delay is 400 ms between requests. That delay alone allows up to 9,000 requests per hour, so staying under the 5,000 req/hr limit on a free API key depends on response time adding at least 320 ms per request on average.
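The relationship between the request delay and the hourly limit can be checked with a little arithmetic. This sketch assumes requests are issued strictly one after another, which the description does not state explicitly; the function names are illustrative.

```python
MS_PER_HOUR = 3_600_000


def max_requests_per_hour(delay_ms: float, avg_latency_ms: float = 0.0) -> float:
    """Upper bound on sequential requests per hour for a fixed delay plus latency."""
    cycle_ms = delay_ms + avg_latency_ms
    return MS_PER_HOUR / cycle_ms


def min_cycle_ms(limit_per_hour: int) -> float:
    """Smallest average time per request that stays within an hourly limit."""
    return MS_PER_HOUR / limit_per_hour


print(max_requests_per_hour(400))       # 9000.0 (delay only, zero latency)
print(max_requests_per_hour(400, 320))  # 5000.0 (delay plus 320 ms latency)
print(min_cycle_ms(5000))               # 720.0 ms per request for 5,000 req/hr
```

With includeFullText: true the request count per article goes up by one, so the same hourly budget covers fewer articles.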
How to use it
| Field | Type | Default | Description |
|---|---|---|---|
| maxItems | integer | 15 | Maximum number of article records to return |
| dateFrom | string | none | Start date (YYYY-MM-DD) |
| dateTo | string | none | End date (YYYY-MM-DD) |
| congress | integer | none | Filter by Congress number (e.g. 119 for the 119th Congress, 2025-2027) |
| includeFullText | boolean | true | Fetch the full plain-text body of each article |
| apiKey | string | none | Your free api.congress.gov API key. If blank, a shared key is used (shared rate limit). |
Get a free API key at api.congress.gov/sign-up/. Keys are issued instantly with no review, and each key allows 5,000 req/hr.
Fetch recent articles (metadata only)
{"dateFrom": "2026-05-15", "dateTo": "2026-05-15", "includeFullText": false, "maxItems": 100}
Download full text from a specific Congress
{"congress": 119, "dateFrom": "2026-01-01", "dateTo": "2026-03-31", "includeFullText": true, "maxItems": 1000}
Historical backfill
{"dateFrom": "1995-01-04", "dateTo": "1995-12-31", "includeFullText": true, "maxItems": 5000}
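A long backfill can also be split into several smaller runs, which keeps each run's record count and request count easier to predict. The sketch below assumes one run per calendar month; the helper names, the per-run maxItems of 2500, and the choice to skip full text (pre-2000 text is mostly unavailable, per the note above) are illustrative choices, not scraper behavior.

```python
from datetime import date, timedelta
from typing import Iterator, List, Tuple


def month_chunks(start: date, end: date) -> Iterator[Tuple[date, date]]:
    """Yield (first_day, last_day) pairs covering start..end, one per calendar month."""
    cur = start
    while cur <= end:
        # First day of the month after the current chunk.
        if cur.month == 12:
            nxt = date(cur.year + 1, 1, 1)
        else:
            nxt = date(cur.year, cur.month + 1, 1)
        yield cur, min(nxt - timedelta(days=1), end)
        cur = nxt


def backfill_inputs(
    start: date,
    end: date,
    max_items: int = 2500,  # assumed headroom for one month of records
    include_full_text: bool = False,
) -> List[dict]:
    """Build one scraper input object per calendar month."""
    return [
        {
            "dateFrom": first.isoformat(),
            "dateTo": last.isoformat(),
            "includeFullText": include_full_text,
            "maxItems": max_items,
        }
        for first, last in month_chunks(start, end)
    ]


inputs = backfill_inputs(date(1995, 1, 4), date(1995, 12, 31))
print(len(inputs), inputs[0]["dateFrom"], inputs[0]["dateTo"])
```

Each dict in inputs uses only the input fields documented in the table above and can be submitted as a separate run.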
Use cases
- Legislative NLP training corpus: collect floor speeches with section labels and date metadata to train or fine-tune models on legislative language
- Lobbying analytics: track which bills are discussed on the House or Senate floor; sponsor party and subject are not output fields and have to be joined from another source
- Extensions of Remarks archive: build a searchable archive of Extensions of Remarks statements for specific policy areas
- Legislative monitoring: run daily with dateFrom set to the previous day to ingest each new CR issue as it publishes
- Journalism and data reporting: export historical floor speech text for topic modeling or legislator activity analysis
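For the daily monitoring case, the input can be generated from the current date. This is a minimal sketch, assuming both dateFrom and dateTo are set to the previous day so the run covers exactly one issue; the function name and the maxItems value of 200 are illustrative.

```python
from datetime import date, timedelta
from typing import Optional


def daily_run_input(today: Optional[date] = None, max_items: int = 200) -> dict:
    """Build an input object covering only the day before `today`."""
    previous_day = (today or date.today()) - timedelta(days=1)
    iso = previous_day.isoformat()
    return {
        "dateFrom": iso,
        "dateTo": iso,
        "includeFullText": True,
        "maxItems": max_items,  # assumed headroom over a typical daily issue
    }


print(daily_run_input(date(2026, 5, 16)))
```

The resulting dict can be supplied as the run input from whatever scheduler triggers the daily run.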
FAQ
Is the Congress.gov API free?
Yes. A free key allows 5,000 req/hr. Without a key, a shared demo key is used, which is limited to 30 req/hr and suitable for testing only.
Why is article_text null for some records?
Articles that exist only as PDFs in the Congress.gov system do not have a Formatted Text URL. This is most common for records before 2000. The PDF link is always included in pdf_url.
What is a typical run size?
Each daily CR issue contains roughly 100 articles across all four sections. A full month, at about 20 issue days, comes to approximately 2,000 records.
Output is available in JSON, CSV, and Excel via the Apify dataset export panel.