FOMC Meeting Transcripts & Minutes Scraper

Pricing

Pay per event

Scrapes the Federal Reserve FOMC historical archive (1936-present). Extracts transcripts, minutes, Tealbooks, Beige Books, and statements for every FOMC meeting. Optionally extracts PDF plain-text with participant lists and topic tags.


Developer

BowTiedRaccoon

Maintained by Community

Extracts FOMC meeting artifacts from the Federal Reserve's historical materials archive — the official public record of every Federal Open Market Committee meeting since 1936. Collects transcript PDFs, minutes, Tealbooks, Beige Books, policy statements, and press conference links for every FOMC meeting in the embargo-cleared corpus (currently 1936–2020). Optionally extracts full plain-text from PDFs with participant lists and topic tags.

What This Scraper Collects

  • Transcripts — verbatim PDFs of FOMC meeting proceedings (released under the 5-year embargo rule)
  • Minutes — official summary of each meeting, released approximately 3 weeks after the meeting
  • Tealbooks A & B — staff economic forecasts and analysis prepared before each meeting
  • Beige Book — regional economic conditions summary from all 12 Federal Reserve Districts
  • Agendas — formal meeting agenda PDFs
  • Policy Statements — press release HTML links for post-meeting rate decisions
  • Press Conferences — chair press conference page links (post-2011)

Each record includes: meeting date, meeting type (regular or conference call), artifact type, artifact URL, Fed Chair name at the time of the meeting, minutes release date, statement URL, press conference URL, embargo status, and scraped timestamp. With extractPdfText: true, each record also includes the extracted plain text, semicolon-separated participant names, and heuristic topic tags.

Features

  • Covers the full historical archive from 1936 to 2020 (85 years, 800+ artifacts)
  • Filter by year range with startYear / endYear — run only the years you care about
  • Filter by artifact type — transcripts only, minutes only, or any combination
  • Identifies Fed Chair by meeting date using a built-in tenure map (Volcker, Greenspan, Bernanke, Yellen, Powell)
  • Optional PDF text extraction — extracts participant list from the PRESENT section and heuristic topic tags (inflation, employment, interest rates, balance sheet, GDP, credit, financial stability, international)
  • Detects conference call meetings separately from regular scheduled meetings
  • Runs on 512 MB memory, no proxy required — federalreserve.gov is fully public
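
The chair lookup mentioned in the feature list can be pictured as a sorted table of term start dates searched by meeting date. The sketch below is an illustration only, not the actor's actual code: the names CHAIR_TENURES and chair_for are hypothetical, and the table covers only the five chairs named above, so earlier meetings return None.

```python
from bisect import bisect_right
from datetime import date
from typing import Optional

# Term start dates for the five chairs named in the feature list, ascending.
# Assumption: a meeting is attributed to whichever chair took office most
# recently on or before the meeting date (short vacancies are ignored).
CHAIR_TENURES = [
    (date(1979, 8, 6), "Paul A. Volcker"),
    (date(1987, 8, 11), "Alan Greenspan"),
    (date(2006, 2, 1), "Ben S. Bernanke"),
    (date(2014, 2, 3), "Janet L. Yellen"),
    (date(2018, 2, 5), "Jerome H. Powell"),
]
_STARTS = [start for start, _ in CHAIR_TENURES]


def chair_for(meeting_date: date) -> Optional[str]:
    """Return the chair in office on meeting_date, or None if not in the table."""
    idx = bisect_right(_STARTS, meeting_date) - 1
    return CHAIR_TENURES[idx][1] if idx >= 0 else None


if __name__ == "__main__":
    print(chair_for(date(2008, 10, 29)))  # a meeting during the 2008 crisis
```

A sorted list plus binary search keeps the lookup independent of how many tenures are in the table; a full 1936 onward map would only need more rows.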

Who Uses an FOMC Transcript Dataset?

  • Macroeconomic research desks — build time-series analysis of Fed language, voting patterns, and policy signals across chair eras
  • AI training shops — primary-source central-bank verbatim is high-value training data for finance-aware LLMs and monetary policy models
  • Academic researchers — automates what was previously a hand-download task for papers citing FOMC transcripts
  • Quantitative analysts — run NLP models over FOMC text to extract sentiment, policy stance, and forward guidance signals
  • Journalists and financial writers — search the full historical record for specific topics or speeches

How the Scraper Works

  1. Fetches the Historical Materials by Year index page to enumerate all available year pages.
  2. Filters to years within the startYear to endYear range and crawls each per-year page.
  3. Parses every meeting panel, classifying each link by artifact type.
  4. Emits one record per artifact link, enriched with meeting metadata and chair name.
  5. If extractPdfText: true, downloads each PDF and extracts plain text, participants, and topic tags before saving.
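
Steps 2 and 3 above can be sketched roughly as a year filter followed by a pattern match on each link. The filename patterns below are assumptions based on how files on federalreserve.gov are commonly named; the actor's real classification rules are not documented here, and the names PATTERNS, years_in_range, and classify_link are hypothetical.

```python
import re
from typing import Iterable, List, Optional

# Assumed URL patterns, checked in order. More specific names come first so a
# Tealbook file is not mistaken for a transcript.
PATTERNS = [
    ("tealbook_a", re.compile(r"tealbooka", re.I)),
    ("tealbook_b", re.compile(r"tealbookb", re.I)),
    ("beige_book", re.compile(r"beige", re.I)),
    ("agenda", re.compile(r"agenda", re.I)),
    ("minutes", re.compile(r"minutes", re.I)),
    ("press_conference", re.compile(r"presconf", re.I)),
    ("statement", re.compile(r"pressreleases", re.I)),
    ("transcript", re.compile(r"(meeting|confcall)\.pdf$", re.I)),
]


def years_in_range(years: Iterable[int], start_year: int, end_year: int) -> List[int]:
    """Keep only the years inside the inclusive startYear..endYear window."""
    return [y for y in years if start_year <= y <= end_year]


def classify_link(url: str) -> Optional[str]:
    """Return the artifact type for a link, or None if no pattern matches."""
    for artifact_type, pattern in PATTERNS:
        if pattern.search(url):
            return artifact_type
    return None


if __name__ == "__main__":
    print(years_in_range(range(1936, 2021), 2015, 2020))
    print(classify_link("https://www.federalreserve.gov/monetarypolicy/files/FOMC20150128meeting.pdf"))
```

Links that match no pattern return None, which lets a caller skip them instead of mislabeling them.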

Input

{
  "startYear": 2015,
  "endYear": 2020,
  "artifactTypes": ["transcript", "minutes"],
  "maxItems": 0,
  "extractPdfText": false
}

| Field | Type | Default | Description |
|---|---|---|---|
| startYear | Integer | 2015 | Earliest FOMC year to include (1936–2020). |
| endYear | Integer | 2020 | Latest FOMC year to include (1936–2020). |
| artifactTypes | Array | ["transcript", "minutes"] | Types to collect: transcript, minutes, tealbook_a, tealbook_b, beige_book, agenda, statement, press_conference. |
| maxItems | Integer | 0 | Maximum artifact records to return. 0 = unlimited. |
| extractPdfText | Boolean | false | Download each PDF and extract plain text. Significantly increases runtime. |
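
For callers who build the input programmatically, a small validator can catch out-of-range years or unknown artifact types before a run starts. This is a sketch under assumptions: build_input and run_actor are hypothetical helpers, the actor ID must be supplied by the caller (it is not stated in this document), and run_actor assumes the apify-client Python package is installed.

```python
from typing import Any, Dict, List, Optional, Sequence

ARTIFACT_TYPES = {
    "transcript", "minutes", "tealbook_a", "tealbook_b",
    "beige_book", "agenda", "statement", "press_conference",
}
MIN_YEAR, MAX_YEAR = 1936, 2020  # range stated in the input table


def build_input(
    start_year: int = 2015,
    end_year: int = 2020,
    artifact_types: Optional[Sequence[str]] = None,
    max_items: int = 0,
    extract_pdf_text: bool = False,
) -> Dict[str, Any]:
    """Validate the arguments and return an input object with the documented keys."""
    types = list(artifact_types) if artifact_types is not None else ["transcript", "minutes"]
    if not (MIN_YEAR <= start_year <= end_year <= MAX_YEAR):
        raise ValueError(f"years must satisfy {MIN_YEAR} <= startYear <= endYear <= {MAX_YEAR}")
    unknown = sorted(set(types) - ARTIFACT_TYPES)
    if unknown:
        raise ValueError(f"unknown artifact types: {unknown}")
    if max_items < 0:
        raise ValueError("maxItems must be 0 (unlimited) or a positive integer")
    return {
        "startYear": start_year,
        "endYear": end_year,
        "artifactTypes": types,
        "maxItems": max_items,
        "extractPdfText": extract_pdf_text,
    }


def run_actor(token: str, actor_id: str, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start a run and return its dataset items (requires the apify-client package)."""
    from apify_client import ApifyClient  # imported lazily so validation works without it

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Calling build_input() with no arguments reproduces the default input shown above.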

Collect Only Transcripts, 2010–2020

{
  "startYear": 2010,
  "endYear": 2020,
  "artifactTypes": ["transcript"]
}

Extract PDF Text for NLP Analysis

{
  "startYear": 2015,
  "endYear": 2020,
  "artifactTypes": ["transcript"],
  "extractPdfText": true,
  "maxItems": 20
}
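
To give a feel for what heuristic topic tags might look like, here is a minimal keyword-matching sketch. The keyword lists and tag spellings are invented for illustration; the actor's real heuristics and tag names are not documented on this page and may differ.

```python
import re
from typing import Dict, List

# Hypothetical keyword lists for the eight topics named in the feature list.
TOPIC_KEYWORDS: Dict[str, List[str]] = {
    "inflation": ["inflation", "pce", "cpi", "price stability"],
    "employment": ["employment", "unemployment", "labor market", "payrolls"],
    "interest_rates": ["interest rate", "interest rates", "federal funds rate"],
    "balance_sheet": ["balance sheet", "asset purchases", "reinvestment"],
    "gdp": ["gdp", "gross domestic product"],
    "credit": ["credit", "lending"],
    "financial_stability": ["financial stability", "systemic risk"],
    "international": ["international", "foreign", "exchange rate"],
}

# One case-insensitive, whole-word pattern per topic.
_TOPIC_PATTERNS = {
    topic: re.compile(r"\b(?:" + "|".join(re.escape(k) for k in words) + r")\b", re.I)
    for topic, words in TOPIC_KEYWORDS.items()
}


def tag_topics(text: str) -> str:
    """Return semicolon-separated tags for every topic with at least one keyword hit."""
    return ";".join(topic for topic, pat in _TOPIC_PATTERNS.items() if pat.search(text))


if __name__ == "__main__":
    print(tag_topics("Core PCE inflation rose while the unemployment rate fell."))
```

Whole-word matching avoids tagging "employment" when only "unemployment" appears, although both map to the same topic in this sketch.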

Output Schema

| Field | Description |
|---|---|
| meeting_date | Meeting date in YYYY-MM-DD (last day for multi-day meetings) |
| meeting_type | regular or conference_call |
| year | Meeting year as integer |
| artifact_type | transcript, minutes, tealbook_a, tealbook_b, beige_book, agenda, statement, or press_conference |
| artifact_url | Full URL to the PDF or HTML artifact |
| artifact_filename | Filename from the URL |
| artifact_text | Plain text from PDF (only when extractPdfText: true) |
| participants | Semicolon-separated participant names from the transcript PRESENT section |
| chair_name | Fed Chair at the time of the meeting |
| minutes_release_date | Date the minutes were publicly released |
| statement_url | Policy statement URL (post-2008 meetings) |
| press_conference_url | Chair press conference URL (post-2011) |
| canonical_url | Year-index source page URL |
| embargo_status | public for all artifacts in the archive |
| extracted_topics | Semicolon-separated topic tags from PDF text (when extractPdfText: true) |
| scraped_at | ISO 8601 timestamp |
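
Because participants and extracted_topics arrive as semicolon-separated strings, downstream code will usually want to split them into lists. The sketch below shows one way to do that; split_field and normalize are hypothetical helpers, and the sample record uses illustrative values rather than real scraper output.

```python
from datetime import date
from typing import Any, Dict, List, Optional


def split_field(value: Optional[str]) -> List[str]:
    """Split a semicolon-separated field into trimmed, non-empty items."""
    return [part.strip() for part in (value or "").split(";") if part.strip()]


def normalize(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record with a parsed date and list-valued fields."""
    out = dict(record)
    out["meeting_date"] = date.fromisoformat(record["meeting_date"])
    out["participants"] = split_field(record.get("participants"))
    out["extracted_topics"] = split_field(record.get("extracted_topics"))
    return out


if __name__ == "__main__":
    # Illustrative values only; field names follow the output schema above.
    sample = {
        "meeting_date": "2015-12-16",
        "meeting_type": "regular",
        "artifact_type": "transcript",
        "participants": "Janet L. Yellen; William C. Dudley",
        "extracted_topics": "inflation;employment",
    }
    print(normalize(sample))
```

Missing or empty fields become empty lists, so records scraped without extractPdfText pass through unchanged apart from the parsed date.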

Notes

  • The 5-year embargo means transcripts are only available for meetings that occurred at least 5 years ago. As of 2026, the archive covers through 2020.
  • Conference call meetings (emergency sessions) are labeled meeting_type: conference_call. They were common during the 2008 financial crisis.
  • PDF text extraction works well for transcripts from 1990 onwards (searchable PDFs). Pre-1990 transcripts may be image-only scans; the extractor returns an empty artifact_text for those rather than failing.
  • All data is public domain (U.S. federal government publication).
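
Since image-only scans come back with an empty artifact_text rather than an error, it can help to separate those records before running text analysis. A minimal sketch follows, assuming records are plain dictionaries shaped like the output schema; partition_by_text is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def partition_by_text(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Split records into (has usable text, empty or missing text)."""
    with_text: List[Record] = []
    without_text: List[Record] = []
    for rec in records:
        text = (rec.get("artifact_text") or "").strip()
        (with_text if text else without_text).append(rec)
    return with_text, without_text


if __name__ == "__main__":
    # Illustrative records, not real scraper output.
    records = [
        {"year": 2015, "artifact_text": "CHAIR YELLEN. Good morning."},
        {"year": 1978, "artifact_text": ""},
    ]
    usable, empty = partition_by_text(records)
    print(len(usable), "with text;", len(empty), "without")
```

The second list is a natural queue for a separate OCR pass, if the pre-1990 material matters for the analysis.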