Japan Kokkai Diet Proceedings Scraper - NDL Speech Records

Pricing

Pay per event

Extract speech records from Japan's National Diet Library (NDL) Kokkai API. Search 1M+ speeches across both chambers and all committees (1947–present) by keyword, speaker, committee, or date. Output includes full Japanese speech text, speaker party, Gregorian and wareki dates, and NDL citation URLs.

Developer

BowTiedRaccoon

Maintained by Community

Japan Diet (Kokkai) NDL Proceedings Scraper

Scrapes speech records from Japan's National Diet Library (NDL) Kokkai API. Returns per-speech records across both chambers and all committees from 1947 to the present — over 1 million speeches in the corpus, available at no cost from the NDL's official public API.

No auth. No proxy. Pure structured JSON from a government API that actually works.

Kokkai NDL Scraper Features

  • Extracts per-speech records with full Japanese text, speaker name, party affiliation, official position, and speaker name reading (よみ)
  • Covers 75+ years of Diet proceedings: House of Representatives (衆議院), House of Councillors (参議院), and joint sessions (両院) from 1947 through the current session
  • Full-text keyword search across the entire corpus — Japanese terms, romaji, or policy keywords
  • Filters by speaker name, committee/meeting name, chamber, session number, and date range
  • Returns Gregorian and wareki dates — because your downstream system wants 2026-04-23 and your Japanese colleagues want 令和8年4月23日
  • Stable NDL citation URLs for every speech and meeting record — suitable for academic references, regulatory citations, and RAG pipelines
  • Returns PDF URLs where available (NDL publishes PDFs with a lag; nulls are normal for recent sessions)
  • No proxy required — the NDL API is a public government service with no IP restrictions

What Can You Do With Kokkai Proceedings Data?

  • Quantitative finance researchers — track BOJ governor and MOF minister commentary on monetary policy, JGB supply, fiscal consolidation. The Diet record is the unfiltered version.
  • Policy researchers — build comparative parliamentary analysis datasets across sessions, parties, and committees
  • LLM training corpora — formal Japanese diarised speech is rare. This corpus is public domain under Article 13 of Japan's Copyright Law, multi-speaker, and consistently formatted
  • Western think tanks — Brookings, CSIS, and RAND Japan desks spend considerable time translating Diet proceedings. This delivers the raw record programmatically
  • Civic tech — political monitoring, party voting analysis, MP speech frequency dashboards

How It Works

  1. Configure your search. Set a keyword, speaker name, committee, chamber, session number, or date range. At least one filter is required — the NDL API doesn't do open-ended dumps.
  2. The scraper calls the NDL speech API with your filters and pages through results by record position: each request sets startRecord, and each response returns nextRecordPosition, which becomes the next request's startRecord. Page size is 100, the API maximum.
  3. Each speech record is normalized to the output schema: raw API field names are mapped to snake_case, dates are augmented with wareki equivalents, and speaker position and role are merged into a single speaker_position field.
  4. Results are returned as structured JSON to the Apify dataset.
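
The pagination in step 2 can be sketched roughly as follows. This is a minimal illustration, not the actor's actual code: `iter_speeches` and `fetch_page_http` are hypothetical names, and the response keys `speechRecord` and `nextRecordPosition` plus the `recordPacking=json` parameter are assumed from the NDL API's JSON response format.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, Iterator, Optional

ENDPOINT = "https://kokkai.ndl.go.jp/api/speech"
PAGE_SIZE = 100  # page size cited above as the API maximum


def fetch_page_http(params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical fetcher: one GET request, decoded as JSON."""
    query = urllib.parse.urlencode({**params, "recordPacking": "json"})
    with urllib.request.urlopen(f"{ENDPOINT}?{query}") as resp:
        return json.load(resp)


def iter_speeches(
    fetch_page: Callable[[Dict[str, Any]], Dict[str, Any]],
    filters: Dict[str, Any],
    max_items: int = 0,
) -> Iterator[Dict[str, Any]]:
    """Yield raw speech records page by page; max_items=0 means no limit."""
    start: Optional[int] = 1  # startRecord is 1-based
    yielded = 0
    while start is not None:
        params = {**filters, "startRecord": start, "maximumRecords": PAGE_SIZE}
        page = fetch_page(params)
        for record in page.get("speechRecord", []):
            yield record
            yielded += 1
            if max_items and yielded >= max_items:
                return
        # Assumed: the last page carries no next position, which ends the loop.
        start = page.get("nextRecordPosition")
```

Passing the fetcher in as an argument keeps the loop testable without network access; `iter_speeches(fetch_page_http, {"any": "金融政策"}, max_items=100)` would be the live equivalent.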

Kokkai NDL Scraper Input

{
  "searchQuery": "金融政策",
  "speakerName": "",
  "nameOfMeeting": "財務金融委員会",
  "chamber": "衆議院",
  "sessionNumber": 0,
  "dateFrom": "2023-01-01",
  "dateTo": "2024-12-31",
  "maxItems": 100
}

At least one filter must be set. A search that matches zero results returns zero records rather than an error.
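
The "at least one filter" rule can be checked before starting a run. The sketch below is an assumption about how such a check might look; `has_filter` and `FILTER_FIELDS` are hypothetical names, and the actor's own validation may differ.

```python
from typing import Any, Dict

# Input fields that narrow the search; maxItems only caps the output size.
FILTER_FIELDS = (
    "searchQuery",
    "speakerName",
    "nameOfMeeting",
    "chamber",
    "sessionNumber",
    "dateFrom",
    "dateTo",
)


def has_filter(run_input: Dict[str, Any]) -> bool:
    """Return True when at least one filter is non-empty and non-zero."""
    # Empty strings and sessionNumber 0 both mean "no filter" and are falsy.
    return any(run_input.get(field) for field in FILTER_FIELDS)
```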

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| searchQuery | string | "予算" | Full-text keyword search. Supports Japanese and romaji. |
| speakerName | string | "" | Filter by speaker name (partial match). E.g., "安倍" matches 安倍晋三. |
| nameOfMeeting | string | "" | Committee or meeting name filter. E.g., "予算委員会", "財務金融委員会", "本会議". |
| chamber | string | "" | Chamber filter: "衆議院", "参議院", "両院", or blank for all. |
| sessionNumber | integer | 0 | Diet session number (e.g., 213). 0 = all sessions. |
| dateFrom | string | "" | Start date filter (YYYY-MM-DD). Leave blank for earliest (1947-05-20). |
| dateTo | string | "" | End date filter (YYYY-MM-DD). Leave blank for most recent. |
| maxItems | integer | 10 | Maximum records to return. 0 = unlimited. |
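
For readers who want to see how these inputs could translate into a request, here is a hedged sketch. The NDL parameter names (`any`, `speaker`, `nameOfMeeting`, `nameOfHouse`, `sessionFrom`, `sessionTo`, `from`, `until`, `recordPacking`) are taken from the NDL search API; the mapping itself, and the function name `to_ndl_params`, are assumptions rather than the actor's confirmed behavior.

```python
from typing import Any, Dict

# Assumed mapping from actor input fields to NDL query parameters.
INPUT_TO_NDL = {
    "searchQuery": "any",
    "speakerName": "speaker",
    "nameOfMeeting": "nameOfMeeting",
    "chamber": "nameOfHouse",
    "dateFrom": "from",
    "dateTo": "until",
}


def to_ndl_params(run_input: Dict[str, Any]) -> Dict[str, Any]:
    """Build NDL query parameters, skipping blank or zero inputs."""
    params: Dict[str, Any] = {"recordPacking": "json"}
    for input_key, ndl_key in INPUT_TO_NDL.items():
        value = run_input.get(input_key)
        if value:
            params[ndl_key] = value
    session = run_input.get("sessionNumber", 0)
    if session:
        # A single session is expressed as a range with equal bounds.
        params["sessionFrom"] = session
        params["sessionTo"] = session
    return params
```

Blank strings and a session number of 0 are dropped rather than sent, which matches the "blank for all" and "0 = all sessions" defaults in the table.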

Kokkai NDL Scraper Output

{
  "speech_id": "121104376X01620230425_070",
  "issue_id": "121104376X01620230425",
  "session": 211,
  "chamber": "衆議院",
  "committee": "財務金融委員会",
  "issue_number": "第16号",
  "meeting_date": "2023-04-25",
  "meeting_date_wareki": "令和5年4月25日",
  "speech_order": 70,
  "speaker": "植田和男",
  "speaker_yomi": "うえだかずお",
  "speaker_group": "内閣提出",
  "speaker_position": "日本銀行総裁",
  "speech_text": "○植田日銀総裁 まず、現在の金融政策運営の考え方についてご説明します...",
  "speech_url": "https://kokkai.ndl.go.jp/txt/121104376X01620230425/70",
  "meeting_url": "https://kokkai.ndl.go.jp/txt/121104376X01620230425",
  "pdf_url": "https://kokkai.ndl.go.jp/pdfb/cm211046_20230425_00.pdf",
  "search_query": "金融政策",
  "source_api_endpoint": "https://kokkai.ndl.go.jp/api/speech"
}

| Field | Type | Description |
| --- | --- | --- |
| speech_id | string | Unique NDL speech identifier |
| issue_id | string | Meeting record identifier |
| session | integer | Diet session number (国会回次) |
| chamber | string | 衆議院, 参議院, or 両院 |
| committee | string | Committee or meeting name |
| issue_number | string | Issue label within the session (e.g., 第16号) |
| meeting_date | string | Meeting date in Gregorian YYYY-MM-DD |
| meeting_date_wareki | string | Meeting date in Japanese wareki (e.g., 令和5年4月25日) |
| speech_order | integer | Speaker turn number within the meeting |
| speaker | string | Speaker full name |
| speaker_yomi | string | Speaker name reading in hiragana |
| speaker_group | string | Speaker's party or parliamentary group |
| speaker_position | string | Official position or role (PM, minister, committee chair, etc.) |
| speech_text | string | Full speech text in Japanese |
| speech_url | string | Canonical NDL URL for this speech |
| meeting_url | string | Canonical NDL URL for the full meeting record |
| pdf_url | string | PDF URL for the meeting record (null for recent sessions pending publication) |
| search_query | string | The keyword that returned this record |
| source_api_endpoint | string | NDL API endpoint that produced this record |
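
The normalization from raw NDL fields to this schema might look something like the sketch below. The raw field names (`speechID`, `nameOfHouse`, `speakerRole`, and so on) are assumed from the NDL speech API response; `normalize` is a hypothetical name, the way position and role are joined is a guess, and `meeting_date_wareki` is left out to keep the example short.

```python
from typing import Any, Dict

ENDPOINT = "https://kokkai.ndl.go.jp/api/speech"


def normalize(raw: Dict[str, Any], search_query: str = "") -> Dict[str, Any]:
    """Map one raw NDL speech record to snake_case output fields."""
    # Assumption: position and role are joined with a space when both exist.
    parts = [raw.get("speakerPosition"), raw.get("speakerRole")]
    position = " ".join(p for p in parts if p) or None
    return {
        "speech_id": raw.get("speechID"),
        "issue_id": raw.get("issueID"),
        "session": raw.get("session"),
        "chamber": raw.get("nameOfHouse"),
        "committee": raw.get("nameOfMeeting"),
        "issue_number": raw.get("issue"),
        "meeting_date": raw.get("date"),
        "speech_order": raw.get("speechOrder"),
        "speaker": raw.get("speaker"),
        "speaker_yomi": raw.get("speakerYomi"),
        "speaker_group": raw.get("speakerGroup"),
        "speaker_position": position,
        "speech_text": raw.get("speech"),
        "speech_url": raw.get("speechURL"),
        "meeting_url": raw.get("meetingURL"),
        "pdf_url": raw.get("pdfURL"),  # stays None until NDL publishes the PDF
        "search_query": search_query,
        "source_api_endpoint": ENDPOINT,
    }
```

Downstream code should treat `pdf_url` and `speaker_position` as nullable, since neither is guaranteed for every record.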

🔍 FAQ

How do I scrape Japan Diet proceedings?

The Japan Kokkai NDL Proceedings Scraper calls the National Diet Library's official public API at kokkai.ndl.go.jp/api/speech. Set at least one filter — a keyword, speaker name, committee, or date range — and the scraper paginates through all matching records. No credentials or proxy are needed.

What does the Japan Kokkai Diet Scraper cost to run?

The scraper charges $0.10 per run start plus $0.001 per record. A keyword search returning 500 speeches costs roughly $0.60 ($0.10 + 500 × $0.001). Cost scales linearly with the number of records returned, so set maxItems to cap spend on broad searches; a maxItems of 0 removes the cap.
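
As a quick way to budget a run, the pricing above reduces to one line of arithmetic. The function name `estimate_cost` is hypothetical; the two fees are the ones quoted in this answer.

```python
RUN_START_FEE = 0.10    # USD charged once per run start
PER_RECORD_FEE = 0.001  # USD charged per returned record


def estimate_cost(records: int) -> float:
    """Estimated USD cost of one run that returns `records` speeches."""
    return round(RUN_START_FEE + PER_RECORD_FEE * records, 3)
```

For example, `estimate_cost(500)` gives 0.6, matching the figure above.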

Does the Japan Kokkai Diet Scraper need proxies?

The Japan Kokkai NDL Proceedings Scraper doesn't need proxies. The NDL API is a public government service with no IP restrictions or rate limiting beyond reasonable request spacing.

Can I filter by committee or speaker?

The scraper supports filtering by speaker name, committee name, chamber, session number, and date range — independently or in combination. You can pull every BOJ governor speech in the 財務金融委員会 since 2013 in a single run.

What language is the speech text in?

Speech text is in Japanese. The NDL API returns the verbatim Diet record text, including standard parliamentary conventions such as the ○ mark that opens each speaker turn (as in ○植田日銀総裁) and procedural text. Dates are returned in both Gregorian (YYYY-MM-DD) and wareki notation.
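
If you need to reproduce the wareki form yourself, for instance to check dates from another source, a conversion along these lines should work for the corpus's date range. This is a sketch under assumptions: `to_wareki` is a hypothetical helper, only the three eras since 1926 are covered, and writing the first year of an era as 元年 is a convention choice that the actor's output may or may not share.

```python
from datetime import date

# Era start dates, newest first. The corpus starts in 1947, so 昭和 suffices.
ERAS = [
    (date(2019, 5, 1), "令和"),
    (date(1989, 1, 8), "平成"),
    (date(1926, 12, 25), "昭和"),
]


def to_wareki(iso_date: str) -> str:
    """Convert a YYYY-MM-DD string to a wareki string."""
    d = date.fromisoformat(iso_date)
    for start, era in ERAS:
        if d >= start:
            year = d.year - start.year + 1
            year_label = "元" if year == 1 else str(year)
            return f"{era}{year_label}年{d.month}月{d.day}日"
    raise ValueError(f"{iso_date} predates the eras covered by this sketch")
```

With the sample record's date, `to_wareki("2023-04-25")` returns 令和5年4月25日.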

How current is the data?

The NDL updates the proceedings database as records are officially published. Major committee hearings from the current session are typically available within days to weeks. PDFs lag longer.


Need More Features?

Need a meeting-level endpoint, additional filters, or bulk export by session? File an issue or get in touch.

Why Use Japan Kokkai NDL Proceedings Scraper?

  • No competition — Zero other Apify actors cover the Japanese Diet. You won't find this data pre-packaged anywhere else for $0.001/record.
  • Public domain corpus — Article 13 of Japan's Copyright Law places government records in the public domain. No licensing headaches, no terms-of-service gray zones.
  • RAG-ready output — Each speech is a self-contained chunk with a stable citation URL, speaker attribution, and precise date. Feed it directly into a vector store or policy monitoring pipeline.