US Congressional Record Scraper

Pricing

Pay per event

Scrapes daily floor speeches and statements from the US Congressional Record via the official congress.gov API. Returns per-article records with full text, section, volume, and issue metadata from 1995 to present.

Developer

BowTiedRaccoon

Maintained by Community

8 days ago

Last modified

US Congressional Record Scraper - Floor Speeches, Statements & Legislative Text

Extract daily floor speeches and statements from the US Congressional Record via the official Congress.gov API. Returns per-article records with section, volume, issue metadata, page numbers, and optional full plain text. Coverage spans from 1995 to the present across all CR sections: Daily Digest, Senate, House, and Extensions of Remarks.

What data does it extract?

Each record represents a single article from a daily Congressional Record issue:

| Field | Description |
| --- | --- |
| congress | Congress number (e.g. 119) |
| session | Session number within the Congress |
| volume | CR volume number |
| issue_number | Issue number within the volume |
| issue_date | Publication date (YYYY-MM-DD) |
| section | Section name: Daily Digest, Senate Section, House Section, Extensions of Remarks Section |
| article_title | Title of the article or speech |
| start_page | First page in the printed Record |
| end_page | Last page in the printed Record |
| article_text | Full plain-text body (populated when includeFullText: true). Capped at 50,000 characters. |
| pdf_url | URL to the PDF version of this article |
| source_url | Canonical URL on congress.gov |
| scraped_at | ISO-8601 scrape timestamp |

Text availability note: article_text is populated only for articles that have an associated Formatted Text URL in the API. For most pre-2000 records, only PDF versions exist — article_text will be null even with includeFullText: true. Full text coverage is reliable from approximately 2000 onward.
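Because article_text can be null, downstream code usually needs to separate records with usable text from PDF-only records. The following is a minimal sketch of that split. The sample records are invented for illustration, `split_by_text` is a hypothetical helper name, and treating a body of exactly 50,000 characters as possibly truncated is an assumption based on the cap described above.

```python
from typing import Any

TEXT_CAP = 50_000  # character cap on article_text stated in the field table


def split_by_text(records: list[dict[str, Any]]):
    """Separate records that carry text from PDF-only records."""
    with_text, pdf_only = [], []
    for rec in records:
        text = rec.get("article_text")
        if isinstance(text, str) and text.strip():
            with_text.append(rec)
        else:
            pdf_only.append(rec)
    return with_text, pdf_only


def possibly_truncated(rec: dict[str, Any]) -> bool:
    """Assumption: a body that reaches the cap may have been cut off."""
    text = rec.get("article_text") or ""
    return len(text) >= TEXT_CAP


# Invented sample records, shaped like the field table above.
sample = [
    {"article_title": "MORNING BUSINESS", "issue_date": "2026-05-15",
     "article_text": "Mr. President, ...", "pdf_url": "https://example.invalid/a.pdf"},
    {"article_title": "PRAYER", "issue_date": "1995-01-04",
     "article_text": None, "pdf_url": "https://example.invalid/b.pdf"},
]

with_text, pdf_only = split_by_text(sample)
print(len(with_text), "with text;", len(pdf_only), "PDF only")
```

Records in the PDF-only group still carry pdf_url, so they can be routed to a separate PDF extraction step if needed.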

What does the scraper do?

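It calls the Congress.gov API to list CR issues by date or Congress number, then fetches each article's detail record. With includeFullText: true, it makes one additional HTTP request per article to retrieve the plain text body. Requests are spaced by a default delay of 400 ms. That spacing alone allows up to 9,000 requests per hour, so a long run stays within the 5,000 req/hr limit of a free API key only when response latency brings the average to 720 ms or more per request, or when the run needs fewer than 5,000 requests in total.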
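The relationship between request delay and the hourly limit is simple arithmetic, sketched below. The function names are hypothetical. The request count assumes one detail request per article plus one text request when full text is enabled, which is a reading of the description above; listing requests are not counted, and response latency is ignored, so the hourly figure is an upper bound.

```python
MS_PER_HOUR = 3_600_000


def max_requests_per_hour(delay_ms: float) -> float:
    """Upper bound on requests per hour for a fixed gap between requests."""
    return MS_PER_HOUR / delay_ms


def min_delay_ms(limit_per_hour: int) -> float:
    """Smallest average gap that keeps a sustained run under the limit."""
    return MS_PER_HOUR / limit_per_hour


def requests_for_run(n_articles: int, include_full_text: bool) -> int:
    """Assumed cost: 1 detail request per article, +1 when fetching text."""
    return n_articles * (2 if include_full_text else 1)


print(max_requests_per_hour(400))    # 9000.0
print(min_delay_ms(5000))            # 720.0
print(requests_for_run(1000, True))  # 2000
```

Under these assumptions, a 1,000-article run with full text needs about 2,000 requests, which fits inside a single hour's allowance at any delay.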

How to use it

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| maxItems | integer | 15 | Maximum number of article records to return |
| dateFrom | string | | Start date (YYYY-MM-DD) |
| dateTo | string | | End date (YYYY-MM-DD) |
| congress | integer | | Filter by Congress number (e.g. 119 for the 119th Congress, 2025-2027) |
| includeFullText | boolean | true | Fetch the full plain-text body of each article |
| apiKey | string | | Your free api.congress.gov API key. If blank, a shared key is used (shared rate limit). |

Get a free API key at api.congress.gov/sign-up/ — instant issuance, no review required. A production key allows 5,000 req/hr.
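When inputs are generated by a script rather than typed by hand, it can help to assemble and check them before starting a run. The sketch below uses the field names and defaults from the input table. `build_input` is a hypothetical helper, and its checks (well-formed dates, dateFrom not later than dateTo, maxItems of at least 1) are assumptions about sensible input, not the actor's own validation rules.

```python
import json
from datetime import date
from typing import Any, Optional


def build_input(
    date_from: Optional[str] = None,
    date_to: Optional[str] = None,
    congress: Optional[int] = None,
    include_full_text: bool = True,
    max_items: int = 15,
    api_key: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble an input object using the documented field names."""
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")
    run_input: dict[str, Any] = {
        "maxItems": max_items,
        "includeFullText": include_full_text,
    }
    # date.fromisoformat raises ValueError on malformed dates;
    # isoformat() re-emits the value as YYYY-MM-DD.
    start = date.fromisoformat(date_from) if date_from else None
    end = date.fromisoformat(date_to) if date_to else None
    if start and end and start > end:
        raise ValueError("dateFrom must not be later than dateTo")
    if start:
        run_input["dateFrom"] = start.isoformat()
    if end:
        run_input["dateTo"] = end.isoformat()
    if congress is not None:
        run_input["congress"] = congress
    if api_key:
        run_input["apiKey"] = api_key
    return run_input


print(json.dumps(
    build_input("2026-05-15", "2026-05-15", include_full_text=False, max_items=100),
    indent=2,
))
```

Optional fields that are not set are left out of the object, so the actor's own defaults apply.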

Fetch recent articles (metadata only)

{
  "dateFrom": "2026-05-15",
  "dateTo": "2026-05-15",
  "includeFullText": false,
  "maxItems": 100
}

Download full text from a specific Congress

{
  "congress": 119,
  "dateFrom": "2026-01-01",
  "dateTo": "2026-03-31",
  "includeFullText": true,
  "maxItems": 1000
}

Historical backfill

{
  "dateFrom": "1995-01-04",
  "dateTo": "1995-12-31",
  "includeFullText": true,
  "maxItems": 5000
}
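
A long backfill can also be split into one input per calendar month, which keeps each run smaller and easier to retry. This is an optional pattern and not something the actor requires. `month_ranges` is a hypothetical helper, and the per-month maxItems of 5000 is simply carried over from the example above.

```python
import calendar
import json
from datetime import date, timedelta


def month_ranges(start: date, end: date) -> list[tuple[date, date]]:
    """Split [start, end] into consecutive ranges that end on month boundaries."""
    ranges = []
    cursor = start
    while cursor <= end:
        last_day = calendar.monthrange(cursor.year, cursor.month)[1]
        month_end = min(date(cursor.year, cursor.month, last_day), end)
        ranges.append((cursor, month_end))
        cursor = month_end + timedelta(days=1)
    return ranges


inputs = [
    {
        "dateFrom": first.isoformat(),
        "dateTo": last.isoformat(),
        "includeFullText": True,
        "maxItems": 5000,
    }
    for first, last in month_ranges(date(1995, 1, 4), date(1995, 12, 31))
]

print(len(inputs))
print(json.dumps(inputs[0]))
```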

Use cases

  • Legislative NLP training corpus: collect floor speeches with section labels and date metadata to train or fine-tune models on legislative language
  • Lobbying analytics: track which bills are discussed on the House or Senate floor. Sponsor party and subject are not part of the output, so join them from another source to correlate
  • Extensions of Remarks archive: build a searchable archive of Extensions of Remarks statements for specific policy areas
  • Legislative monitoring: run daily with dateFrom and dateTo set to the previous day to ingest each new CR issue as it publishes
  • Journalism and data reporting: export historical floor speech text for topic modeling or legislator activity analysis
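
For the daily monitoring case, the input can be generated from the current date. The sketch below is one way to do that. `previous_day_input` is a hypothetical helper, and the maxItems value of 500 is an assumed headroom figure, not a recommendation from the actor.

```python
import json
from datetime import date, timedelta


def previous_day_input(today: date, include_full_text: bool = True,
                       max_items: int = 500) -> dict:
    """Build an input covering only the day before `today`."""
    yesterday = today - timedelta(days=1)
    return {
        "dateFrom": yesterday.isoformat(),
        "dateTo": yesterday.isoformat(),
        "includeFullText": include_full_text,
        "maxItems": max_items,  # assumed headroom for a single issue
    }


# In a scheduled job this would be previous_day_input(date.today()).
print(json.dumps(previous_day_input(date(2026, 5, 16)), indent=2))
```

Days without a published issue would presumably return no records, so an empty result is not necessarily an error.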

FAQ

Is the Congress.gov API free? Yes. A free key allows 5,000 req/hr. Without a key, a shared demo key is used, which is limited to 30 req/hr and suitable for testing only.

Why is article_text null for some records? Articles that exist only as PDFs in the Congress.gov system do not have a Formatted Text URL. This is most common for records before 2000. The PDF link is always included in pdf_url.

What is a typical run size? Each daily CR issue contains roughly 100 articles across all four sections. A full month of articles is approximately 2,000 records.

Output is available in JSON, CSV, and Excel via the Apify dataset export panel.
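
As a quick check on a JSON export, records can be counted per section with the standard library. The sketch below parses an inline string so that it runs on its own. The three records are invented and contain only a subset of the documented fields; a real export would be read from the downloaded file instead.

```python
import json
from collections import Counter

# Invented sample standing in for a downloaded JSON export.
export_json = """
[
  {"section": "Senate Section", "article_title": "MORNING BUSINESS", "issue_date": "2026-05-15"},
  {"section": "Senate Section", "article_title": "EXECUTIVE SESSION", "issue_date": "2026-05-15"},
  {"section": "House Section", "article_title": "PRAYER", "issue_date": "2026-05-15"}
]
"""

records = json.loads(export_json)
by_section = Counter(rec["section"] for rec in records)

for section, count in by_section.most_common():
    print(f"{section}: {count}")
```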