Cultural Heritage Online Archive Scraper

Scrape heritage object records from Cultural Heritage Online (文化遺産オンライン), the Agency for Cultural Affairs' digital museum. Extracts titles, classifications, eras, genres, regions, holding institutions, and image URLs from Japan's national heritage archive.

Pricing

Pay per event

Developer

BowTiedRaccoon

Maintained by Community

Scrape heritage object records from Cultural Heritage Online (文化遺産オンライン, online.bunka.go.jp) — the Agency for Cultural Affairs' digital museum of Japan's national heritage.

The actor extracts object-level records from the site's archive of more than 136,000 items, which is searchable by keyword, classification, era, genre, and region. Each record includes the title, kana reading, era, genre, region, holding institution, description, and a list of high-resolution image URLs. The imagery is what sets this source apart.

What you get

Each record includes:

| Field | Description |
| --- | --- |
| heritage_id | Unique item ID from /heritages/detail/<id> |
| title | Object title (名称) |
| title_kana | Phonetic reading (ふりがな) |
| genre | Category (絵画 / 彫刻 / 工芸品 / 書跡 / etc.) |
| era | Historical period in Japanese (江戸時代, 平安時代, etc.) |
| era_normalized | Normalised Latin slug (edo / heian / kamakura / etc.) |
| region | Prefecture or region (所在地域) |
| holder | Holding institution (所蔵館) |
| material | Material and technique (材質・技法) where listed |
| dimensions | Dimensions (法量) where listed |
| description | Object description (解説) |
| image_urls | Array of high-resolution image URLs |
| detail_url | Full URL of the detail page |
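
For typed downstream code, the fields above can be modelled roughly as follows. This is a minimal sketch, assuming each dataset item is a JSON object keyed by the field names in the table and that optional fields may be missing or null. The class name `HeritageRecord` and the helper `from_item` are hypothetical and not part of the actor.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class HeritageRecord:
    """Hypothetical container mirroring the documented output fields."""

    heritage_id: str
    title: str
    detail_url: str
    title_kana: Optional[str] = None
    genre: Optional[str] = None
    era: Optional[str] = None
    era_normalized: Optional[str] = None
    region: Optional[str] = None
    holder: Optional[str] = None
    material: Optional[str] = None      # only present "where listed"
    dimensions: Optional[str] = None    # only present "where listed"
    description: Optional[str] = None
    image_urls: List[str] = field(default_factory=list)


def from_item(item: Dict[str, Any]) -> HeritageRecord:
    """Build a record from one dataset item, ignoring unknown or null keys."""
    known = HeritageRecord.__dataclass_fields__
    values = {k: v for k, v in item.items() if k in known and v is not None}
    # Assumption: IDs may arrive as numbers, so coerce them to strings.
    values["heritage_id"] = str(item["heritage_id"])
    return HeritageRecord(**values)
```

Unknown keys are dropped rather than rejected, so the sketch keeps working if the actor later adds fields.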

Usage

{
    "keywords": "仏像",
    "maxItems": 100
}

The actor submits keywords as the search term on /heritages/search/result. Any Japanese text works: artist names, object names, classifications, or institution names.
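
One way to start a run with this input from code is the Apify API client. The sketch below assumes the `apify-client` Python package, whose `actor(...).call(run_input=...)` and `dataset(...).iterate_items()` methods it relies on, and it assumes the run is returned as a dict. The actor ID is not stated in this README, so `ACTOR_ID` is a placeholder, and `run_scraper` is a hypothetical helper name.

```python
from typing import Any, Dict, List

# Placeholder: the actor's real ID is not given in this README.
ACTOR_ID = "<username>/<actor-name>"


def run_scraper(client: Any, keywords: str, max_items: int = 20) -> List[Dict[str, Any]]:
    """Start a run, wait for it to finish, and return its dataset items."""
    run_input = {"keywords": keywords, "maxItems": max_items}
    # call() blocks until the run finishes and returns the run description.
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (requires `pip install apify-client` and an API token):
#   from apify_client import ApifyClient
#   items = run_scraper(ApifyClient("<YOUR_API_TOKEN>"), "仏像", max_items=100)
```

Passing the client in as an argument keeps the helper independent of how the token is stored and makes it easy to exercise with a stub.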

Scrape all items

Leave keywords empty to iterate the full archive listing (/heritages/search/result with no filter). The archive contains 136,000+ records; use maxItems to control run scope.

{
    "maxItems": 500
}

Input schema

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| keywords | string | | Search keyword (e.g. 仏像, 絵画, 平安). Leave blank for all items. |
| maxItems | integer | 20 | Maximum number of records to scrape. |
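
The schema above implies a small amount of input handling. As an illustration only, and not the actor's actual code, the following sketch applies the documented default of 20 for `maxItems` and treats blank `keywords` as "all items". The function name `normalize_input` and the rejection of non-positive `maxItems` values are assumptions.

```python
from typing import Any, Dict, Optional, Tuple

DEFAULT_MAX_ITEMS = 20  # documented default for maxItems


def normalize_input(raw: Optional[Dict[str, Any]]) -> Tuple[Optional[str], int]:
    """Return (keywords or None, max_items) from a raw input object."""
    raw = raw or {}
    # Blank or whitespace-only keywords mean "no filter" (full archive listing).
    keywords = str(raw.get("keywords") or "").strip() or None
    max_items = raw.get("maxItems", DEFAULT_MAX_ITEMS)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    return keywords, max_items
```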

Notes

  • The site declares a crawl-delay of 3; the actor limits its load on the server by capping concurrency at 3.
  • No authentication and no Cloudflare protection: the government endpoint (bunka.go.jp) is fully open.
  • Era normalization maps Japanese period names to lowercase Latin slugs for use in downstream pipelines.
  • Image URLs follow the pattern https://online.bunka.go.jp/heritage/<id>/_<N>/... and need no authentication.
  • This source is distinct from the kunishitei designation database (kunishitei.bunka.go.jp) and the NDL jpsearch (jpsearch.go.jp). It surfaces the object-level museum records with images, not the legal designation register or the bibliographic aggregator.
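
The era normalization mentioned in the notes can be pictured as a lookup table. The sketch below is illustrative only: the actor's real mapping is not published in this README, so the table covers just the three periods named here (江戸 for edo, 平安 for heian, 鎌倉 for kamakura). Prefix matching and returning `None` for unknown eras are assumptions, and `normalize_era` is a hypothetical name.

```python
from typing import Optional

# Partial, illustrative table; extend it with further periods as needed.
ERA_SLUGS = {
    "江戸": "edo",
    "平安": "heian",
    "鎌倉": "kamakura",
}


def normalize_era(era: str) -> Optional[str]:
    """Map a Japanese period name to a lowercase Latin slug, or None."""
    text = era.strip()
    for name, slug in ERA_SLUGS.items():
        # Prefix match so that values such as "平安時代後期" still resolve.
        if text.startswith(name):
            return slug
    return None
```

A slug such as `edo` can then be used as a stable grouping key in a downstream pipeline, independent of how the period is written in the `era` field.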