FOMC Meeting Transcripts & Minutes Scraper
Pricing
Pay per event
Scrapes the Federal Reserve FOMC historical archive (1936-present). Extracts transcripts, minutes, Tealbooks, Beige Books, and statements for every FOMC meeting. Optionally extracts PDF plain-text with participant lists and topic tags.
Developer
BowTiedRaccoon
Extracts FOMC meeting artifacts from the Federal Reserve's historical materials archive — the official public record of every Federal Open Market Committee meeting since 1936. Collects transcript PDFs, minutes, Tealbooks, Beige Books, policy statements, and press conference links for every FOMC meeting in the embargo-cleared corpus (currently 1936–2020). Optionally extracts full plain-text from PDFs with participant lists and topic tags.
What This Scraper Collects
- Transcripts — verbatim PDFs of FOMC meeting proceedings (released under the 5-year embargo rule)
- Minutes — official summary of each meeting, released approximately 3 weeks after the meeting
- Tealbooks A & B — staff economic forecasts and analysis prepared before each meeting
- Beige Book — regional economic conditions summary from all 12 Federal Reserve Districts
- Agendas — formal meeting agenda PDFs
- Policy Statements — press release HTML links for post-meeting rate decisions
- Press Conferences — chair press conference page links (post-2011)
Each record includes: meeting date, meeting type (regular or conference call), artifact type, artifact URL, Fed Chair name at time of the meeting, minutes release date, statement URL, press conference URL, embargo status, and scraped timestamp. With extractPdfText: true, also includes plain text, semicolon-separated participant names, and heuristic topic tags.
Features
- Covers the full historical archive from 1936 to 2020 (85 years, 800+ artifacts)
- Filter by year range with startYear/endYear to run only the years you care about
- Filter by artifact type: transcripts only, minutes only, or any combination
- Identifies Fed Chair by meeting date using a built-in tenure map (Volcker, Greenspan, Bernanke, Yellen, Powell)
- Optional PDF text extraction — extracts participant list from the PRESENT section and heuristic topic tags (inflation, employment, interest rates, balance sheet, GDP, credit, financial stability, international)
- Detects conference call meetings separately from regular scheduled meetings
- Runs on 512 MB memory, no proxy required — federalreserve.gov is fully public
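A date-based tenure map like the one mentioned above can be sketched as a sorted list of start dates plus a binary search. This is only an illustration: the start dates come from the public record of Fed Chair tenures, while the name formatting, the `CHAIR_STARTS` table, and the `chair_for` helper are assumptions rather than the actor's actual internals.

```python
from bisect import bisect_right
from datetime import date
from typing import Optional

# Assumed table: (first day in office, chair name), sorted by date.
# The actor's real map and name formatting are not shown in this README.
CHAIR_STARTS = [
    (date(1979, 8, 6), "Paul A. Volcker"),
    (date(1987, 8, 11), "Alan Greenspan"),
    (date(2006, 2, 1), "Ben S. Bernanke"),
    (date(2014, 2, 3), "Janet L. Yellen"),
    (date(2018, 2, 5), "Jerome H. Powell"),
]


def chair_for(meeting_date: date) -> Optional[str]:
    """Return the chair whose tenure started most recently on or before the meeting."""
    starts = [start for start, _ in CHAIR_STARTS]
    idx = bisect_right(starts, meeting_date) - 1
    # Dates before the first entry fall outside this sketch's table.
    return CHAIR_STARTS[idx][1] if idx >= 0 else None


print(chair_for(date(2008, 10, 29)))
```

Meetings earlier than the first entry return `None` in this sketch; a complete map would need one entry per chair back to 1936.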
Who Uses an FOMC Transcript Dataset?
- Macroeconomic research desks — build time-series analysis of Fed language, voting patterns, and policy signals across chair eras
- AI training shops — primary-source central-bank verbatim is high-value training data for finance-aware LLMs and monetary policy models
- Academic researchers — automates what was previously a hand-download task for papers citing FOMC transcripts
- Quantitative analysts — run NLP models over FOMC text to extract sentiment, policy stance, and forward guidance signals
- Journalists and financial writers — search the full historical record for specific topics or speeches
How the Scraper Works
- Fetches the Historical Materials by Year index page to enumerate all available year pages.
- Filters to years within startYear–endYear and crawls each per-year page.
- Parses every meeting panel, classifying each link by artifact type.
- Emits one record per artifact link, enriched with meeting metadata and chair name.
- If extractPdfText: true, downloads each PDF and extracts plain text, participants, and topic tags before saving.
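The link classification step in the list above could look roughly like the following. It is a minimal sketch that matches on the visible link label; the label patterns and the `classify_link` helper are hypothetical, since the README does not say which signals (label text, filename, or URL) the actor actually uses.

```python
import re
from typing import Optional

# Hypothetical label patterns, checked in order; the first match wins.
# More specific labels come first so "Press Conference" is not mistaken
# for a transcript or statement.
_PATTERNS = [
    ("press_conference", re.compile(r"press\s+conference", re.I)),
    ("tealbook_a", re.compile(r"tealbook\s+a\b", re.I)),
    ("tealbook_b", re.compile(r"tealbook\s+b\b", re.I)),
    ("beige_book", re.compile(r"beige\s+book", re.I)),
    ("transcript", re.compile(r"transcript", re.I)),
    ("minutes", re.compile(r"minutes", re.I)),
    ("agenda", re.compile(r"agenda", re.I)),
    ("statement", re.compile(r"statement", re.I)),
]


def classify_link(label: str) -> Optional[str]:
    """Map a link label to one of the artifact_type values, or None if unknown."""
    for artifact_type, pattern in _PATTERNS:
        if pattern.search(label):
            return artifact_type
    return None


print(classify_link("Transcript (PDF)"))
```

Returning `None` for unrecognized labels lets a caller skip links that are not one of the eight documented artifact types.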
Input
{"startYear": 2015, "endYear": 2020, "artifactTypes": ["transcript", "minutes"], "maxItems": 0, "extractPdfText": false}
| Field | Type | Default | Description |
|---|---|---|---|
| startYear | Integer | 2015 | Earliest FOMC year to include (1936–2020). |
| endYear | Integer | 2020 | Latest FOMC year to include (1936–2020). |
| artifactTypes | Array | ["transcript", "minutes"] | Types to collect: transcript, minutes, tealbook_a, tealbook_b, beige_book, agenda, statement, press_conference. |
| maxItems | Integer | 0 | Maximum artifact records to return. 0 = unlimited. |
| extractPdfText | Boolean | false | Download each PDF and extract plain text. Significantly increases runtime. |
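Before starting a run, it can help to check an input object against the table above. The sketch below applies the documented defaults and ranges; the `normalize_input` helper is hypothetical, and the rule that startYear must not exceed endYear is an assumption, since the README does not say how the actor handles that case.

```python
ALLOWED_TYPES = {
    "transcript", "minutes", "tealbook_a", "tealbook_b",
    "beige_book", "agenda", "statement", "press_conference",
}

# Defaults copied from the input table.
DEFAULTS = {
    "startYear": 2015,
    "endYear": 2020,
    "artifactTypes": ["transcript", "minutes"],
    "maxItems": 0,
    "extractPdfText": False,
}


def normalize_input(raw: dict) -> dict:
    """Fill in documented defaults and reject values outside the documented ranges."""
    cfg = {**DEFAULTS, **raw}
    for key in ("startYear", "endYear"):
        if not 1936 <= cfg[key] <= 2020:
            raise ValueError(f"{key} must be between 1936 and 2020")
    # Assumption: an inverted range is treated as an error here.
    if cfg["startYear"] > cfg["endYear"]:
        raise ValueError("startYear must not be later than endYear")
    unknown = set(cfg["artifactTypes"]) - ALLOWED_TYPES
    if unknown:
        raise ValueError(f"unknown artifact types: {sorted(unknown)}")
    if cfg["maxItems"] < 0:
        raise ValueError("maxItems must be 0 (unlimited) or a positive integer")
    return cfg


print(normalize_input({"startYear": 2010, "artifactTypes": ["transcript"]}))
```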
Collect Only Transcripts, 2010–2020
{"startYear": 2010, "endYear": 2020, "artifactTypes": ["transcript"]}
Extract PDF Text for NLP Analysis
{"startYear": 2015, "endYear": 2020, "artifactTypes": ["transcript"], "extractPdfText": true, "maxItems": 20}
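With extractPdfText enabled, each record carries heuristic topic tags. To give a feel for what a keyword heuristic of that kind might look like, here is a small sketch. The keyword lists, the tag spellings, and the `tag_topics` helper are all assumptions for illustration; the actor's real rules are not documented in this README.

```python
import re

# Assumed keyword lists for the eight topics named in the Features section.
TOPIC_KEYWORDS = {
    "inflation": ["inflation", "price stability"],
    "employment": ["employment", "unemployment", "labor market"],
    "interest_rates": ["interest rate", "federal funds rate"],
    "balance_sheet": ["balance sheet", "asset purchases"],
    "gdp": ["gdp", "gross domestic product"],
    "credit": ["credit"],
    "financial_stability": ["financial stability"],
    "international": ["foreign", "exchange rate"],
}


def tag_topics(text: str) -> str:
    """Return semicolon-separated tags for every topic with a whole-word keyword hit."""
    lowered = text.lower()
    tags = []
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(re.search(r"\b" + re.escape(kw) + r"\b", lowered) for kw in keywords):
            tags.append(topic)
    return ";".join(tags)


print(tag_topics("The Committee discussed inflation and the labor market."))
```

Whole-word matching keeps short keywords from firing inside longer words, at the cost of missing plurals that are not listed explicitly.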
Output Schema
| Field | Description |
|---|---|
| meeting_date | Meeting date in YYYY-MM-DD (last day for multi-day meetings) |
| meeting_type | regular or conference_call |
| year | Meeting year as integer |
| artifact_type | transcript, minutes, tealbook_a, tealbook_b, beige_book, agenda, statement, or press_conference |
| artifact_url | Full URL to the PDF or HTML artifact |
| artifact_filename | Filename from the URL |
| artifact_text | Plain text from PDF (only when extractPdfText: true) |
| participants | Semicolon-separated participant names from the transcript PRESENT section |
| chair_name | Fed Chair at the time of the meeting |
| minutes_release_date | Date the minutes were publicly released |
| statement_url | Policy statement URL (post-2008 meetings) |
| press_conference_url | Chair press conference URL (post-2011) |
| canonical_url | Year-index source page URL |
| embargo_status | public for all artifacts in the archive |
| extracted_topics | Semicolon-separated topic tags from PDF text (when extractPdfText: true) |
| scraped_at | ISO 8601 timestamp |
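Because participants and extracted_topics arrive as semicolon-separated strings, downstream code will usually want to split them into lists. The sketch below does that and parses meeting_date; the `split_field` and `normalize_record` helpers are hypothetical, and the sample record uses made-up placeholder values rather than real scraper output.

```python
from datetime import date


def split_field(value):
    """Split a semicolon-separated field into a clean list; empty or missing gives []."""
    if not value:
        return []
    return [part.strip() for part in value.split(";") if part.strip()]


def normalize_record(record: dict) -> dict:
    """Return a copy with meeting_date as a date and list-valued text fields."""
    out = dict(record)
    out["meeting_date"] = date.fromisoformat(record["meeting_date"])
    out["participants"] = split_field(record.get("participants"))
    out["extracted_topics"] = split_field(record.get("extracted_topics"))
    return out


# Illustrative record with placeholder values, not real scraper output.
sample = {
    "meeting_date": "2015-01-28",
    "artifact_type": "transcript",
    "participants": "Participant One; Participant Two",
    "extracted_topics": "inflation;employment",
}
parsed = normalize_record(sample)
print(parsed["participants"], parsed["extracted_topics"])
```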
Notes
- The 5-year embargo means transcripts are only available for meetings that occurred at least 5 years ago. As of 2026, the archive covers through 2020.
- Conference call meetings (emergency sessions) are labeled meeting_type: conference_call. They were common during the 2008 financial crisis.
- PDF text extraction works well for transcripts from 1990 onwards (searchable PDFs). Pre-1990 transcripts may be image-only scans; the extractor returns an empty artifact_text for those rather than failing.
- All data is public domain (U.S. federal government publication).
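Since image-only scans come back with an empty artifact_text instead of an error, a consumer may want to separate usable records from those that would need OCR. A minimal sketch, assuming records are plain dicts shaped like the output schema (the `partition_by_text` helper is hypothetical):

```python
def partition_by_text(records):
    """Split records into (has extracted text, empty or missing text)."""
    with_text, needs_ocr = [], []
    for rec in records:
        text = (rec.get("artifact_text") or "").strip()
        (with_text if text else needs_ocr).append(rec)
    return with_text, needs_ocr


# Placeholder records for illustration only.
records = [
    {"meeting_date": "2015-01-28", "artifact_text": "Transcript body text"},
    {"meeting_date": "1985-02-13", "artifact_text": ""},
    {"meeting_date": "1984-03-27"},
]
usable, scans = partition_by_text(records)
print(len(usable), len(scans))
```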