Japan Kokkai Diet Proceedings Scraper - NDL Speech Records
Pricing
Pay per event
Extract speech records from Japan's National Diet Library (NDL) Kokkai API. Search 1M+ speeches across both chambers and all committees (1947–present) by keyword, speaker, committee, or date. Output includes full Japanese speech text, speaker party, Gregorian and wareki dates, and NDL citation URLs.
Developer
BowTiedRaccoon
Maintained by Community
Extract speech transcripts from the National Diet Library (NDL) Kokkai proceedings API: over 1 million speeches from 1947 to the present, covering both chambers and all committees of the Japanese parliament. Returns speaker name, reading (yomi), party affiliation, position, speech text, and meeting metadata. Under Article 40(1) of Japan's Copyright Act, speeches made in the Diet may be freely used, except when compiling the speeches of a single speaker.
What does the Kokkai Diet Proceedings Scraper do?
- Queries the NDL Kokkai proceedings API for speeches from the House of Representatives (衆議院), House of Councillors (参議院), or both chambers
- Filters by speaker name (partial match), committee name, session number, and date range
- Returns full speech text plus speaker yomi (reading), party group, and official position
- Returns PDF links, speech URLs, and meeting URLs for document-level access
- Exits with zero records and a logged error if no filter is set — the NDL API requires at least one search parameter
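The no-filter guard in the last bullet can be sketched in Python as follows. This is a minimal illustration, not the actor's source: `has_any_filter` and `records_or_empty` are hypothetical names, `fetch` stands in for whatever actually queries the NDL API, and treating `sessionNumber` 0 as "no filter" is an assumption based on its documented default (0 = all sessions).

```python
import logging
from typing import Any, Callable, Mapping

# String filters accepted by the actor's input (taken from the input table).
STRING_FILTERS = ("searchQuery", "speakerName", "nameOfMeeting",
                  "chamber", "dateFrom", "dateTo")


def has_any_filter(run_input: Mapping[str, Any]) -> bool:
    """Return True if at least one NDL search parameter is set."""
    if any(str(run_input.get(key) or "").strip() for key in STRING_FILTERS):
        return True
    # Assumption: sessionNumber 0 means "all sessions", so it does not narrow
    # the search; only a positive session number counts as a filter.
    return int(run_input.get("sessionNumber") or 0) > 0


def records_or_empty(
    run_input: Mapping[str, Any],
    fetch: Callable[[Mapping[str, Any]], list],
) -> list:
    """Log an error and return no records instead of querying unfiltered."""
    if not has_any_filter(run_input):
        logging.error(
            "No filter set: add searchQuery, speakerName, nameOfMeeting, "
            "chamber, sessionNumber, dateFrom, or dateTo."
        )
        return []
    return fetch(run_input)
```

Returning an empty list keeps the run alive and lets the caller decide how to report it, which matches the "zero records rather than crashing" behavior described here.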
What data does it extract?
| Field | Description |
|---|---|
| speech_id | Unique speech identifier |
| issue_id | Meeting issue identifier |
| session | Diet session number |
| chamber | 衆議院 (House of Representatives) or 参議院 (House of Councillors) |
| committee | Committee or meeting name |
| issue_number | Issue number within the session |
| meeting_date | Meeting date (ISO 8601) |
| meeting_date_wareki | Meeting date in the Japanese era-name (wareki) calendar |
| speech_order | Order of this speech within the meeting |
| speaker | Speaker name in Japanese |
| speaker_yomi | Speaker name reading (hiragana) |
| speaker_group | Political party or group affiliation |
| speaker_position | Official position or title of the speaker |
| speech_text | Full transcript text of the speech |
| speech_url | URL to the individual speech record |
| meeting_url | URL to the full meeting record |
| pdf_url | URL to the meeting PDF, where available |
| search_query | The search query used to retrieve this record |
| source_api_endpoint | NDL API endpoint used |
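To show how these fields could relate to the NDL API, here is a hedged sketch that turns one record from the NDL speech endpoint's JSON into a row with the fields above. It assumes the NDL response keys (`speechID`, `issueID`, `session`, `nameOfHouse`, `nameOfMeeting`, `issue`, `date`, `speechOrder`, `speaker`, `speakerYomi`, `speakerGroup`, `speakerPosition`, `speech`, `speechURL`, `meetingURL`, `pdfURL`) map one-to-one onto the actor's fields; the actor's real mapping and its wareki string format may differ. `to_wareki`, `FIELD_MAP`, and `normalize` are hypothetical names. The era table covers only Shōwa, Heisei, and Reiwa, which is enough for a 1947–present corpus, and writes the first year of an era as 元年.

```python
from datetime import date
from typing import Any, Mapping

# Era start dates, newest first; enough for records from 1947 onward.
ERAS = (
    (date(2019, 5, 1), "令和"),
    (date(1989, 1, 8), "平成"),
    (date(1926, 12, 25), "昭和"),
)


def to_wareki(iso_date: str) -> str:
    """Convert 'YYYY-MM-DD' to an era-name date such as 令和6年3月19日."""
    d = date.fromisoformat(iso_date)
    for start, era in ERAS:
        if d >= start:
            year = d.year - start.year + 1
            year_text = "元" if year == 1 else str(year)  # first year is 元年
            return f"{era}{year_text}年{d.month}月{d.day}日"
    raise ValueError(f"date precedes the Shōwa era: {iso_date}")


# Assumed correspondence: actor output field -> NDL JSON key.
FIELD_MAP = {
    "speech_id": "speechID",
    "issue_id": "issueID",
    "session": "session",
    "chamber": "nameOfHouse",
    "committee": "nameOfMeeting",
    "issue_number": "issue",
    "meeting_date": "date",
    "speech_order": "speechOrder",
    "speaker": "speaker",
    "speaker_yomi": "speakerYomi",
    "speaker_group": "speakerGroup",
    "speaker_position": "speakerPosition",
    "speech_text": "speech",
    "speech_url": "speechURL",
    "meeting_url": "meetingURL",
    "pdf_url": "pdfURL",
}


def normalize(record: Mapping[str, Any], search_query: str, endpoint: str) -> dict:
    """Build one output row from one NDL speechRecord entry."""
    row = {field: record.get(key) for field, key in FIELD_MAP.items()}
    meeting_date = row["meeting_date"]
    row["meeting_date_wareki"] = to_wareki(meeting_date) if meeting_date else None
    row["search_query"] = search_query
    row["source_api_endpoint"] = endpoint
    return row
```

Missing keys become `None` rather than raising, so records without a PDF simply carry an empty `pdf_url`.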
How to use it
At least one filter (searchQuery, speakerName, nameOfMeeting, chamber, sessionNumber, dateFrom, or dateTo) is required. If no filter is set, the NDL API returns an error — the actor will log the error and exit with zero records rather than crashing.
| Field | Type | Default | Description |
|---|---|---|---|
| searchQuery | string | — | Full-text search query across speech text (Japanese or romaji) |
| speakerName | string | — | Speaker name (partial match). Japanese characters recommended for accuracy. |
| nameOfMeeting | string | — | Committee or meeting name filter |
| chamber | string | — | 衆議院, 参議院, or 両院 (both chambers) |
| sessionNumber | integer | 0 | Diet session number. 0 = all sessions. |
| dateFrom | string | — | Start date in YYYY-MM-DD format |
| dateTo | string | — | End date in YYYY-MM-DD format |
| maxItems | integer | — | Maximum number of speeches to return |
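One plausible way these inputs could translate into NDL API requests, with `maxItems` capping pagination, is sketched below. It assumes the public endpoint `https://kokkai.ndl.go.jp/api/speech`, its `any`, `speaker`, `nameOfMeeting`, `nameOfHouse`, `sessionFrom`/`sessionTo`, `from`, `until`, `startRecord`, `maximumRecords`, and `recordPacking` parameters, and a `nextRecordPosition` field in the JSON response; this README does not describe the actor's actual request logic. `build_params`, `fetch_page_http`, and `paginate` are hypothetical names, and the page size of 100 is an assumed per-request limit.

```python
import json
import urllib.request
from typing import Any, Callable, Iterator, Mapping, Optional
from urllib.parse import urlencode

NDL_SPEECH_ENDPOINT = "https://kokkai.ndl.go.jp/api/speech"

# Assumed mapping from actor input fields to NDL speech API parameters.
PARAM_MAP = {
    "searchQuery": "any",
    "speakerName": "speaker",
    "nameOfMeeting": "nameOfMeeting",
    "chamber": "nameOfHouse",
    "dateFrom": "from",
    "dateTo": "until",
}


def build_params(run_input: Mapping[str, Any]) -> dict:
    """Translate actor input into NDL query parameters, skipping unset fields."""
    params = {ndl: run_input[key] for key, ndl in PARAM_MAP.items() if run_input.get(key)}
    session = int(run_input.get("sessionNumber") or 0)
    if session > 0:  # 0 = all sessions, so no session bounds are sent
        params["sessionFrom"] = session
        params["sessionTo"] = session
    params["recordPacking"] = "json"
    return params


def fetch_page_http(params: Mapping[str, Any]) -> dict:
    """Fetch one JSON page from the NDL speech endpoint."""
    url = f"{NDL_SPEECH_ENDPOINT}?{urlencode(params)}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def paginate(
    params: Mapping[str, Any],
    fetch_page: Callable[[Mapping[str, Any]], dict] = fetch_page_http,
    max_items: Optional[int] = None,
    page_size: int = 100,
) -> Iterator[dict]:
    """Yield speech records page by page until exhausted or max_items is hit."""
    start, yielded = 1, 0
    while True:
        size = page_size if max_items is None else min(page_size, max_items - yielded)
        if size <= 0:
            return
        page = fetch_page({**params, "startRecord": start, "maximumRecords": size})
        for record in page.get("speechRecord") or []:
            yield record
            yielded += 1
            if max_items is not None and yielded >= max_items:
                return
        next_position = page.get("nextRecordPosition")
        if not next_position:  # absent or null on the last page
            return
        start = next_position
```

For example, `list(paginate(build_params({"speakerName": "植田和男", "dateFrom": "2023-04-01"}), max_items=50))` would request a single page of at most 50 records. Passing the fetcher as a parameter keeps the paging logic testable without network access.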
Use cases
- Quantitative finance and monetary policy research: Track Bank of Japan (BOJ) governor testimony. Set speakerName to 植田和男 for current BOJ Governor Ueda's speeches, or 黒田東彦 for former Governor Kuroda. Filter by date range and committee to isolate parliamentary appearances around Monetary Policy Meetings (MPMs).
- Policy trend analysis: Set searchQuery to specific policy terms (fiscal policy, social security, defense) across sessions and chambers to track how legislative debate evolves over decades.
- Political science research: Analyze party-affiliation patterns in speech frequency and committee participation using the speaker_group and committee fields across the full 1947–present corpus.
- Journalism and transparency: Retrieve transcripts of committee hearings on specific legislation by meeting name and date range, and use meeting_url and pdf_url for source verification.
- Natural language processing: Build Japanese political speech corpora for NLP training from the 1M+ speech dataset. Speeches made in the Diet may be freely used under Article 40(1) of Japan's Copyright Act, except when compiling a single speaker's speeches.
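As an illustration of the political science use case, here is a short sketch that tallies speeches per speaker_group from exported dataset items (for example, the JSON export loaded with `json.load`). It assumes each item carries the speaker_group and committee fields from the output table. `speeches_per_group` is a hypothetical helper, and items with an empty group are counted under a placeholder label rather than dropped.

```python
from collections import Counter
from typing import Any, Iterable, Mapping, Optional


def speeches_per_group(
    items: Iterable[Mapping[str, Any]],
    committee: Optional[str] = None,
) -> Counter:
    """Count speeches per speaker_group, optionally within one committee.

    committee is matched as a substring of each item's committee field.
    Items with no speaker_group are counted under "(no group)".
    """
    counts: Counter = Counter()
    for item in items:
        if committee is not None and committee not in (item.get("committee") or ""):
            continue
        counts[item.get("speaker_group") or "(no group)"] += 1
    return counts
```

For instance, `speeches_per_group(items, committee="予算委員会").most_common(5)` would list the five groups with the most speeches in committees whose name contains 予算委員会.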
FAQ
How do I filter for a specific politician's speeches?
Use speakerName with the politician's name in Japanese characters. For BOJ Governor Ueda: speakerName: 植田和男. For former Governor Kuroda: speakerName: 黒田東彦. Partial name matches are supported — the NDL API searches within the speaker name field.
What happens if I run the actor without any filters?
The NDL API requires at least one search parameter and returns an error for unrestricted queries. The actor detects this, logs the error with guidance, and exits with zero records rather than crashing the run. Add at least one of: searchQuery, speakerName, nameOfMeeting, chamber, sessionNumber, dateFrom, or dateTo.
Is the speech text complete?
Yes. The NDL API returns the full verbatim transcript of each speech as recorded in the official Diet stenographic record. The speech_text field contains the complete text. Some historical sessions (especially early postwar) may have shorter records due to original transcription gaps in the source archive.
Results are available for export in JSON, CSV, and Excel formats from the Apify dataset tab.