Library of Congress Scraper avatar

Library of Congress Scraper

Pricing

from $14.00 / 1,000 result items

Go to Apify Store
Library of Congress Scraper

Library of Congress Scraper

Export records from the US Library of Congress catalog of 170M+ items. Search books, audio, film, maps, manuscripts, newspapers, photos, sheet music, and web archives. Pull titles, contributors, dates, subjects, languages, image URLs, and direct catalog links.

Pricing

from $14.00 / 1,000 result items

Rating

0.0

(0)

Developer

ParseForge

ParseForge

Maintained by Community

Actor stats

0

Bookmarked

2

Total users

1

Monthly active users

23 days ago

Last modified

Categories

Share

ParseForge Banner

๐Ÿ›๏ธ Library of Congress Scraper

๐Ÿš€ Export the world's largest cultural archive in seconds. Search 170,000,000+ digitized items at the US Library of Congress across 11 format types, including books, audio, film, maps, manuscripts, newspapers, photos, sheet music, and web archives. No login, no manual harvesting.

๐Ÿ•’ Last updated: 2026-05-23 ยท ๐Ÿ“Š 18 fields per record ยท ๐Ÿ›๏ธ 170M+ items ยท ๐ŸŽž๏ธ 11 formats ยท ๐ŸŒ multilingual catalog

The Library of Congress Scraper queries the LOC digital catalog and returns 18 structured fields per record, including title, contributors, date, subjects, languages, format, mediums, rights, repository info, and direct links to resource files and image derivatives. The LOC has been digitizing its holdings since the 1990s and exposes the world's most comprehensive open cultural catalog.

The catalog spans books and printed material, audio recordings, films, maps, manuscripts, historical newspapers, photographs, sheet music, notated music, web archives, and curated collections. This Actor returns the data as CSV, Excel, JSON, or XML in under five minutes, with year-range, language, and collection filters applied server-side.

๐ŸŽฏ Target Audience๐Ÿ’ก Primary Use Cases
Historians, archivists, journalists, educators, documentary producers, genealogists, digital humanities researchers, museum curatorsSource primary documents, build classroom packs, enrich research databases, locate rights-cleared media, map historical newspapers, source public-domain images

๐Ÿ“‹ What the Library of Congress Scraper does

Five archival workflows in a single run:

  • ๐Ÿ“š Format-scoped search. Pick one of 11 LOC formats (books, audio, film, maps, manuscripts, newspapers, photos, sheet music, web archives, notated music, collections).
  • ๐Ÿ”Ž Keyword search. Free-text search across the chosen format.
  • ๐ŸŒ Language filter. Restrict to a single language slug (e.g. english, spanish, french, chinese, arabic).
  • ๐Ÿ“… Date range. Earliest and latest year inclusive, for time-bounded research.
  • ๐Ÿ—‚๏ธ Collection filter. Restrict to a curated LOC collection slug (e.g. wpa-life-histories, civil-war-maps).

Each record includes the LOC item ID, title, description, contributor list, date, subject tags, language list, format and medium, parent collection, repository, rights statement, every resource URL (manifests, audio, video, IIIF images), and a primary image thumbnail.

๐Ÿ’ก Why it matters: the LOC catalog is the foundational reference for American cultural and political history. Building your own harvester means navigating multiple catalog endpoints, parsing nested metadata, and chasing pagination across millions of records. This Actor turns the entire catalog into a download.


๐ŸŽฌ Full Demo

๐Ÿšง Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded archive dataset.


โš™๏ธ Input

InputTypeDefaultBehavior
maxItemsinteger10Records to return. Free plan caps at 10, paid plan at 1,000,000.
formatstring"books"One of 11 LOC format collections.
searchQuerystring"jazz"Free-text keyword search.
languagestring""Language slug (english, spanish, french, german, chinese, arabic, ...).
dateStart, dateEndintegernullEarliest and latest year, inclusive.
collectionstring""LOC collection slug, e.g. wpa-life-histories.

Example: 100 jazz-related sheet music items from 1920-1940.

{
"maxItems": 100,
"format": "sheet-music",
"searchQuery": "jazz",
"dateStart": 1920,
"dateEnd": 1940
}

Example: 200 Civil War era photographs.

{
"maxItems": 200,
"format": "photos",
"searchQuery": "civil war",
"dateStart": 1861,
"dateEnd": 1865
}

โš ๏ธ Good to Know: the LOC catalog spans materials from antiquity to the present. Rights statements vary by item, so always check the rights field before reuse. Many photos, maps, and historical newspapers are public domain, while audio and film may carry contributor or estate restrictions.


๐Ÿ“Š Output

Each record contains 18 fields. Download the dataset as CSV, Excel, JSON, or XML.

๐Ÿงพ Schema

FieldTypeExample
๐Ÿ–ผ๏ธ imageUrlstring | null"https://tile.loc.gov/image-services/iiif/.../full/pct:12.5/0/default.jpg"
๐Ÿ†” itemIdstring"http://www.loc.gov/item/2017660631/"
๐Ÿ“š titlestring"Take the 'A' train"
๐Ÿ“ descriptionarray["Sheet music with cover art..."]
๐Ÿ‘ฅ contributorsarray["Strayhorn, Billy", "Ellington, Duke"]
๐Ÿ“… datestring | null"1941"
๐Ÿท๏ธ subjectsarray["Jazz", "Big band music"]
๐ŸŒ languagesarray["english"]
๐ŸŽž๏ธ formatarray["sheet music"]
๐ŸŽจ mediumsarray["1 score (3 pages)"]
๐Ÿ“š collectionsarray["Music for the Nation"]
๐Ÿ—‚๏ธ partofarray[{"title":"American Song Sheets", "url":"..."}]
๐Ÿ›๏ธ repositoriesarray["Library of Congress, Music Division"]
โš–๏ธ rightsstring | null"Rights Advisory: Public domain"
๐Ÿ“ฆ resourceUrlsarray["https://www.loc.gov/resource/.../mp3"]
๐Ÿ–ผ๏ธ imageUrlsarray["https://tile.loc.gov/image-services/..."]
๐Ÿ”— urlstring"https://www.loc.gov/item/2017660631/"
๐Ÿ•’ scrapedAtISO 8601"2026-05-23T00:00:00.000Z"

๐Ÿ“ฆ Sample records


โœจ Why choose this Actor

Capability
๐Ÿ›๏ธ170M+ item catalog. Books, audio, film, maps, manuscripts, newspapers, photos, sheet music, web archives.
๐ŸŽฏMulti-dimensional filtering. Format, query, language, year range, and collection combine in a single run.
๐Ÿ–ผ๏ธDirect image URLs. Thumbnail plus full image derivatives via the IIIF tile service.
โš–๏ธRights metadata included. Every record carries its rights statement for clean reuse.
โšกFast. 10 items in under 5 seconds, 10,000 records in under 10 minutes.
๐Ÿ”Always fresh. Every run hits the live LOC catalog.
๐ŸšซNo authentication. Public catalog, no key required.

๐Ÿ“Š The Library of Congress is the world's largest open cultural archive. Structured access to it powers documentaries, classrooms, podcasts, and serious historical research.


๐Ÿ“ˆ How it compares to alternatives

ApproachCostCoverageRefreshFiltersSetup
โญ Library of Congress Scraper (this Actor)$5 free credit, then pay-per-use170M+ itemsLive per runformat, query, language, date, collectionโšก 2 min
Commercial archive aggregators$200+/monthCurated sliceQuarterlyLimited๐Ÿข Days
Custom OAI-PMH harvesterFree engineeringFullCron drivenHand builtโณ Weeks
One-off catalog browsingFreePer-search onlyLiveUI only๐Ÿ•’ Manual

Pick this Actor when you want clean, filterable rows ready for a database, with zero parser maintenance.


๐Ÿš€ How to use

  1. ๐Ÿ“ Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. ๐ŸŒ Open the Actor. Go to the Library of Congress Scraper page on the Apify Store.
  3. ๐ŸŽฏ Set input. Pick a format, add a keyword, set optional language, year range, or collection.
  4. ๐Ÿš€ Run it. Click Start and let the Actor collect catalog records.
  5. ๐Ÿ“ฅ Download. Grab your results from the Dataset tab as CSV, Excel, JSON, or XML.

โฑ๏ธ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.


๐Ÿ’ผ Business use cases

๐Ÿ“บ Documentary & Media Production

  • Locate rights-cleared archival imagery for films
  • Source historical newspaper clippings for B-roll graphics
  • Build music libraries from public-domain sheet music
  • Surface manuscript pages for on-screen quotation

๐ŸŽ“ Education & EdTech

  • Build classroom packs around primary sources
  • Generate AP-history flashcards with original documents
  • Curate language-arts excerpts from public-domain books
  • Stock virtual museum exhibits for K-12 platforms

๐Ÿ“ฐ Journalism & Fact-Checking

  • Verify historical claims against primary sources
  • Pull contemporary newspaper coverage for explainers
  • Source archival quotes for long-form features
  • Build searchable backgrounders on historic events

๐ŸŒณ Genealogy & Local History

  • Surface ancestor mentions in historic newspapers
  • Map regional photo collections by place
  • Compile family history with WPA life history excerpts
  • Build town-history sites from local archive items

๐Ÿ”Œ Automating Library of Congress Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • ๐ŸŸข Node.js. Install the apify-client NPM package.
  • ๐Ÿ Python. Use the apify-client PyPI package.
  • ๐Ÿ“š See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Weekly refreshes keep a downstream archive database in sync with new digitizations.


๐ŸŒŸ Beyond business use cases

Cultural archives power more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

๐ŸŽ“ Research and academia

  • Digital humanities theses and projects
  • Quantitative history with newspaper text mining
  • Archival reproducibility with cited dataset pulls
  • Cross-collection comparative studies

๐ŸŽจ Personal and creative

  • Public-domain artwork for design portfolios
  • Historical-fiction research bibles
  • Hobbyist genealogy and local-history projects
  • Creative-commons mood boards and Pinterest archives

๐Ÿค Non-profit and civic

  • Civic history exhibits for libraries and town halls
  • Investigative journalism on historic events
  • Free educational materials for underserved schools
  • Heritage projects for immigrant and minority communities

๐Ÿงช Experimentation

  • Train OCR and handwriting recognition models
  • Build LLM training corpora from public-domain text
  • Prototype IIIF-based viewer apps
  • Test image-classification models on historical media

๐Ÿค– Ask an AI assistant about this scraper

Open a ready-to-send prompt about this ParseForge actor in the AI of your choice:


โ“ Frequently Asked Questions

๐Ÿงฉ How does it work?

Pick one of 11 LOC formats, add a keyword and optional filters (language, year range, collection), then click Start. The Actor returns structured rows with titles, contributors, dates, subjects, rights, and direct image and resource links.

๐Ÿ“ How complete is the metadata?

LOC metadata is curated by professional catalogers and is among the most complete in the world. Some fields like rights or description can be empty for niche items, which reflects the source record rather than a scraper gap.

๐Ÿ” How often is the catalog refreshed?

The LOC adds and updates records continuously. Every Actor run hits the live catalog, so new digitizations and edits appear in your dataset right away.

๐ŸŽž๏ธ Which formats are supported?

Books and printed material, audio recordings, film and video, maps, manuscripts, newspapers (Chronicling America), photos and prints, sheet music, notated music, web archives, and curated collections.

๐Ÿ–ผ๏ธ Are image URLs returned?

Yes. Most visual items expose a primary thumbnail in imageUrl plus higher-resolution derivatives in imageUrls via the LOC IIIF tile service.

โฐ Can I schedule regular runs?

Yes. Use Apify Schedules to trigger this Actor on any cron interval (daily, weekly, monthly).

LOC catalog metadata is public. Item rights vary, so check the rights field before reusing media. Many photographs, historical newspapers, and pre-1929 materials are in the public domain in the United States.

๐Ÿ’ผ Can I use this data commercially?

Catalog metadata, yes. Individual media files depend on the rights statement returned per item. Always honor the rights field.

๐Ÿ’ณ Do I need a paid Apify plan to use this Actor?

No. The free Apify plan covers testing and small runs (10 records per run). A paid plan unlocks the higher cap, scheduling, and concurrency.

๐Ÿ” What happens if a run fails or gets interrupted?

Apify retries transient errors automatically. If a run still fails, inspect the log, fix the input, and restart. Partial datasets are preserved.

๐Ÿ†˜ What if I need help?

Our support team is here. Use the Apify platform messaging or the Tally form linked below.


๐Ÿ”Œ Integrate with any app

Library of Congress Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe archive data into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh records into your archive database or alert your editorial team in Slack.


๐Ÿ’ก Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.


๐Ÿ†˜ Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


โš ๏ธ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the US Library of Congress. All trademarks mentioned are the property of their respective owners. Only publicly available catalog data is collected. Honor each item's individual rights statement.