IMSLP Public Domain Sheet Music Scraper

Scrape the full IMSLP public-domain score catalog — 230k+ works across 24k composers, with file URLs, copyright tags, and work metadata via the IMSLP worklist API and MediaWiki API.

Developer: BowTiedRaccoon (maintained by Community)

Last modified: 10 days ago

Walk the full IMSLP catalog and pull structured data on 230,000+ musical works, 24,000+ composers, and their associated score files, almost all of them tagged public domain by IMSLP.

IMSLP has two public APIs. This scraper uses both. The worklist API delivers the complete work index at 1,000 records per page. The MediaWiki API fills in per-work details: key, genre, instrumentation, composition year, and the file manifest with direct PDF download links. You can run fast (worklist-only, no detail calls) or complete (full enrichment). Both modes respect the site's request etiquette.
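The paging behaviour described above can be sketched as follows. This is a minimal illustration, not the scraper's actual code: `fetch_page` is a hypothetical callable standing in for the real HTTP request, and its return shape (a list of records plus a "more results" flag) is an assumption. Only the 1,000-record page size comes from the text.

```python
from typing import Callable, Dict, Iterator, List, Tuple

PAGE_SIZE = 1000  # page size stated above for the worklist API

# Hypothetical fetcher signature: takes a start offset and returns
# (records on that page, whether more results are available).
FetchPage = Callable[[int], Tuple[List[Dict], bool]]


def iter_worklist(fetch_page: FetchPage, max_items: int) -> Iterator[Dict]:
    """Yield up to max_items records, requesting one page at a time."""
    start = 0
    yielded = 0
    while yielded < max_items:
        records, more = fetch_page(start)
        for record in records:
            if yielded >= max_items:
                return
            yield record
            yielded += 1
        # Stop when the source reports no further pages or returns nothing.
        if not more or not records:
            return
        start += PAGE_SIZE


def fake_fetch(start: int) -> Tuple[List[Dict], bool]:
    """Stand-in for a real HTTP call: serves a made-up 2,500-record catalog."""
    total = 2500
    end = min(start + PAGE_SIZE, total)
    page = [{"id": i} for i in range(start, end)]
    return page, end < total


works = list(iter_worklist(fake_fetch, max_items=1200))
print(len(works), works[0]["id"], works[-1]["id"])  # 1200 0 1199
```

Injecting the fetcher keeps the paging logic testable offline; a real fetcher would also need to pause between requests to respect the site's request etiquette.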

What You Get

Each record covers one musical work.

| Field | Type | Description |
| --- | --- | --- |
| work_id | string | IMSLP/MediaWiki page ID |
| work_title | string | Work title as listed on IMSLP |
| composer | string | Composer full name |
| composer_slug | string | IMSLP category identifier |
| opus_catalogue | string | Op., BWV, K., or other catalogue number |
| genre | string | Piece style and genre (e.g. "Baroque — fugues") |
| instrumentation | string | Scored for (e.g. "piano", "2 violins, viola, cello") |
| key | string | Musical key |
| composition_year | string | Year or date of composition |
| first_publication | string | Year of first publication |
| score_files | string | JSON array of score PDFs with filename, description, file URL, copyright, editor |
| parts_files | string | JSON array of parts PDFs (same structure) |
| arrangements | string | JSON array of arrangement PDFs (same structure) |
| copyright_status | string | Copyright tag from IMSLP (almost always "Public Domain") |
| license | string | Specific license |
| imslp_url | string | Canonical IMSLP work page URL |
| scraped_at | string | ISO 8601 timestamp |

File arrays are JSON-encoded strings. Each entry has: filename, description, editor, copyright, file_url.
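Because the file fields arrive as strings rather than nested arrays, consumers need to decode them before use. A minimal sketch, assuming a record shaped like the table above; the sample values are made up, and treating an empty or missing field as an empty list is an assumption rather than documented behaviour.

```python
import json

# Hypothetical record: field names follow the table above, values are invented.
record = {
    "work_title": "Example Sonata",
    "score_files": json.dumps([
        {
            "filename": "IMSLP00000-example.pdf",
            "description": "Complete Score",
            "editor": "Example Editor",
            "copyright": "Public Domain",
            "file_url": "https://imslp.org/wiki/Special:ReverseLookup/IMSLP00000-example.pdf",
        }
    ]),
    "parts_files": "[]",
    "arrangements": "",
}

FILE_FIELDS = ("score_files", "parts_files", "arrangements")


def decode_files(rec):
    """Return a copy of rec with the file fields parsed into Python lists."""
    out = dict(rec)
    for field in FILE_FIELDS:
        # Assumption: a missing or empty field means "no files".
        raw = rec.get(field) or "[]"
        out[field] = json.loads(raw)
    return out


decoded = decode_files(record)
for f in decoded["score_files"]:
    print(f["filename"], f["copyright"])
```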

Input

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| maxItems | integer | 10 | Maximum works to return |
| includeFileDetails | boolean | true | Fetch MediaWiki API for file lists, key, genre, instrumentation. Disable for faster bulk exports; you get the catalog skeleton without per-work details. |
| composerFilter | string | | Optional composer name filter (e.g. "Bach, Johann Sebastian"). Leave blank for the full catalog. |
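As an illustration of how these parameters combine, the sketch below builds two input objects. `build_run_input` is a hypothetical helper, not part of the actor; only the three parameter names and their defaults come from the table above, and the `maxItems >= 1` check is an assumption.

```python
from typing import Any, Dict, Optional


def build_run_input(max_items: int = 10,
                    include_file_details: bool = True,
                    composer_filter: Optional[str] = None) -> Dict[str, Any]:
    """Assemble an input object using the parameter names from the table above."""
    if max_items < 1:
        # Assumption: the actor expects a positive item count.
        raise ValueError("max_items must be at least 1")
    run_input: Dict[str, Any] = {
        "maxItems": max_items,
        "includeFileDetails": include_file_details,
    }
    # Leave composerFilter out entirely to request the full catalog.
    if composer_filter:
        run_input["composerFilter"] = composer_filter
    return run_input


# Fast index run: no per-work detail calls.
fast_index = build_run_input(max_items=1000, include_file_details=False)

# Enriched run scoped to a single composer.
bach_only = build_run_input(max_items=50, composer_filter="Bach, Johann Sebastian")

print(fast_index)
print(bach_only)
```

Either dictionary could then be supplied as the run input when starting the actor, for example through the Apify console or an Apify API client. The actor's identifier is not stated on this page, so it is left out here.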

File Detail Mode

When includeFileDetails is enabled, the scraper makes one additional MediaWiki API call per work to parse the work's wikitext. This populates score_files, parts_files, arrangements, instrumentation, key, genre, composition_year, and first_publication. It also adds ~200ms per record to the run time. For full-catalog exports where you only need the work index, disable it.

Coverage

IMSLP's public-domain focus is by design. The library was built specifically to host scores whose copyright has expired or that have been dedicated to the public domain. The copyright_status field reflects IMSLP's own tagging, which comes out of the copyright review applied when a score is submitted.

Score file URLs point to imslp.org/wiki/Special:ReverseLookup/<filename>, which resolves to the PDF download. These are the same URLs end users click in the IMSLP UI.
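When saving downloads locally, it can help to recover the filename from a file_url. A minimal sketch that relies only on the URL pattern described above; the helper name and the sample URL are hypothetical, and the function assumes the filename is the final path segment after Special:ReverseLookup/.

```python
from urllib.parse import unquote, urlparse

# Path prefix taken from the URL pattern described above.
REVERSE_LOOKUP_PREFIX = "/wiki/Special:ReverseLookup/"


def filename_from_file_url(file_url: str) -> str:
    """Extract the filename portion of a Special:ReverseLookup URL."""
    path = urlparse(file_url).path
    if not path.startswith(REVERSE_LOOKUP_PREFIX):
        raise ValueError(f"not a ReverseLookup URL: {file_url}")
    # Undo any percent-encoding so the result is usable as a local filename.
    return unquote(path[len(REVERSE_LOOKUP_PREFIX):])


# Made-up URL for illustration only.
url = "https://imslp.org/wiki/Special:ReverseLookup/IMSLP00000-example.pdf"
print(filename_from_file_url(url))  # IMSLP00000-example.pdf
```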

Use Cases

  • Build a searchable public-domain score database
  • Feed OMR (optical music recognition) or generative music training pipelines
  • Music education platforms that need structured work metadata
  • Digital library catalogs with direct PDF access
  • Composer or instrumentation research at scale

Data Volume

The full catalog is approximately 230,000 works. Without a composer filter and with includeFileDetails enabled, a complete run takes several hours due to polite pacing between MediaWiki API calls. Use composerFilter to scope to a specific composer, or set includeFileDetails: false for a fast full-catalog index run.
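For a rough sense of scale, the detail-call overhead can be estimated from the approximate figures quoted in this document (about 200 ms per record, about 230,000 works). This is back-of-the-envelope arithmetic only: it assumes strictly sequential calls and ignores worklist paging, retries, and any concurrency the scraper may use.

```python
def estimate_detail_hours(works: int, per_record_ms: int = 200) -> float:
    """Hours of added run time from per-work detail calls, assuming sequential requests."""
    total_seconds = works * per_record_ms / 1000
    return total_seconds / 3600


full_catalog = estimate_detail_hours(230_000)
print(f"{full_catalog:.1f} hours")  # 12.8 hours

# A hypothetical composer-scoped run of 1,000 works for comparison.
scoped = estimate_detail_hours(1_000)
print(f"{scoped * 60:.1f} minutes")  # 3.3 minutes
```

On these assumptions a fully enriched run is an overnight job, while a composer-scoped run finishes in minutes, which is the trade-off composerFilter and includeFileDetails are meant to control.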


Built by OrbTop. Data sourced from IMSLP via its public APIs.