Cultural Heritage Online Archive Scraper
Pricing
Pay per event
Scrape heritage object records from Cultural Heritage Online (文化遺産オンライン), the Agency for Cultural Affairs' digital museum. Extracts titles, classifications, eras, genres, regions, holding institutions, and image URLs from Japan's national heritage archive.
Developer
BowTiedRaccoon
Maintained by Community
Scrape heritage object records from Cultural Heritage Online (文化遺産オンライン, online.bunka.go.jp) — the Agency for Cultural Affairs' digital museum of Japan's national heritage.
The actor extracts object-level records from the site's full 136,000+ item archive, searchable by keyword, classification, era, genre, and region. Each record includes title, kana reading, era, genre, region, holding institution, description, and a list of high-resolution image URLs — the imagery is the unique value of this source.
What you get
Each record includes:
| Field | Description |
|---|---|
| `heritage_id` | Unique item ID from `/heritages/detail/<id>` |
| `title` | Object title (名称) |
| `title_kana` | Phonetic reading (ふりがな) |
| `genre` | Category (絵画 / 彫刻 / 工芸品 / 書跡 / etc.) |
| `era` | Historical period in Japanese (江戸時代, 平安時代, etc.) |
| `era_normalized` | Normalised Latin slug (edo / heian / kamakura / etc.) |
| `region` | Prefecture or region (所在地域) |
| `holder` | Holding institution (所蔵館) |
| `material` | Material and technique (材質・技法) where listed |
| `dimensions` | Dimensions (法量) where listed |
| `description` | Object description (解説) |
| `image_urls` | Array of high-resolution image URLs |
| `detail_url` | Full URL of the detail page |
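For typed downstream code, the fields above could be modelled roughly as follows. This is a minimal sketch, not part of the actor: `HeritageRecord` and `heritage_id_from_url` are hypothetical names, and treating every field except `image_urls` as a string is an assumption the field table does not confirm.

```python
import re
from typing import List, Optional, TypedDict


class HeritageRecord(TypedDict, total=False):
    """Hypothetical shape of one dataset item, based on the field table.

    total=False because material and dimensions are only present
    "where listed"; the exact types are assumptions.
    """
    heritage_id: str
    title: str
    title_kana: str
    genre: str
    era: str
    era_normalized: str
    region: str
    holder: str
    material: str
    dimensions: str
    description: str
    image_urls: List[str]
    detail_url: str


# The field table says heritage_id is taken from /heritages/detail/<id>.
_DETAIL_RE = re.compile(r"/heritages/detail/([^/?#]+)")


def heritage_id_from_url(detail_url: str) -> Optional[str]:
    """Return the <id> segment of a detail URL, or None if it does not match."""
    match = _DETAIL_RE.search(detail_url)
    return match.group(1) if match else None
```

A helper like this is handy for checking that `heritage_id` and `detail_url` agree when deduplicating records across runs.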
Usage
Basic keyword search
{"keywords": "仏像","maxItems": 100}
The keywords value is sent as the keyword parameter of /heritages/search/result. Any Japanese text works: artist names, object names, classifications, institution names.
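To see what such a search request might look like, here is a minimal sketch of building the URL. The helper name `build_search_url` is hypothetical, and the query parameter name `keyword` is taken from the sentence above rather than verified against the site.

```python
from urllib.parse import urlencode

BASE_URL = "https://online.bunka.go.jp"
SEARCH_PATH = "/heritages/search/result"


def build_search_url(keywords: str = "") -> str:
    """Build a search-result URL; an empty keyword means no filter.

    Assumption: the site reads the search text from a query
    parameter called "keyword".
    """
    keywords = keywords.strip()
    if not keywords:
        return BASE_URL + SEARCH_PATH
    # urlencode percent-encodes Japanese text as UTF-8.
    return f"{BASE_URL}{SEARCH_PATH}?{urlencode({'keyword': keywords})}"
```

Percent-encoding is handled by `urlencode`, so Japanese keywords can be passed as plain strings.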
Scrape all items
Leave keywords empty to iterate the full archive listing (/heritages/search/result with no filter). The archive contains 136,000+ records; use maxItems to control run scope.
{"maxItems": 500}
Input schema
| Parameter | Type | Default | Description |
|---|---|---|---|
| `keywords` | string | — | Search keyword (e.g. 仏像, 絵画, 平安). Leave blank for all items. |
| `maxItems` | integer | 20 | Maximum number of records to scrape. |
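When the input is assembled programmatically, a small validation helper can catch mistakes before a run starts. The sketch below is illustrative only: `build_run_input` is a hypothetical name, and the rule that `maxItems` must be at least 1 is an assumption, not something the schema states.

```python
from typing import Any, Dict, Optional

DEFAULT_MAX_ITEMS = 20  # default from the input schema table


def build_run_input(
    keywords: Optional[str] = None,
    max_items: int = DEFAULT_MAX_ITEMS,
) -> Dict[str, Any]:
    """Return an input object with the two documented parameters."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    run_input: Dict[str, Any] = {"maxItems": max_items}
    # Omit keywords entirely when blank, which selects the full listing.
    if keywords and keywords.strip():
        run_input["keywords"] = keywords.strip()
    return run_input
```

Calling `build_run_input("仏像", 100)` yields the same object as the basic keyword search example, and calling it with no arguments yields a full-archive run capped at the default of 20 records.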
Notes
- Respects the site's `crawl-delay: 3` by capping concurrency at 3.
- No authentication and no Cloudflare: the government endpoint (bunka.go.jp) is fully open.
- Era normalization maps Japanese period names to lowercase Latin slugs for use in downstream pipelines.
- Images use the pattern `https://online.bunka.go.jp/heritage/<id>/_<N>/...`; no auth needed.
- This source is distinct from the kunishitei designation database (`kunishitei.bunka.go.jp`) and the NDL jpsearch (`jpsearch.go.jp`). It surfaces the object-level museum records with images, not the legal designation register or the bibliographic aggregator.
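The era normalization mentioned in the notes could work along these lines. This is a sketch under stated assumptions, not the actor's actual code: `ERA_SLUGS` and `normalize_era` are hypothetical names, the table holds only the three periods this README names, and substring matching is one possible strategy among several.

```python
from typing import Optional

# Only the periods named in this README; the actor's real table
# is not documented here and is presumably larger.
ERA_SLUGS = {
    "江戸": "edo",
    "平安": "heian",
    "鎌倉": "kamakura",
}


def normalize_era(era: str) -> Optional[str]:
    """Map a Japanese period string to a lowercase Latin slug.

    Returns the slug of the first table entry that appears in the
    text, so the 時代 suffix does not matter; returns None when no
    entry matches.
    """
    text = era.strip()
    for name, slug in ERA_SLUGS.items():
        if name in text:
            return slug
    return None
```

Returning `None` for unknown periods keeps unmapped values visible downstream instead of silently guessing a slug.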