Library of Congress Scraper avatar

Library of Congress Scraper

Pricing

from $14.00 / 1,000 result items

Go to Apify Store
Library of Congress Scraper

Library of Congress Scraper

Export records from the US Library of Congress catalog of 170M+ items. Search books, audio, film, maps, manuscripts, newspapers, photos, sheet music, and web archives. Pull titles, contributors, dates, subjects, languages, image URLs, and direct catalog links.

Pricing

from $14.00 / 1,000 result items

Rating

0.0

(0)

Developer

ParseForge

ParseForge

Maintained by Community

Actor stats

0

Bookmarked

2

Total users

1

Monthly active users

a day ago

Last modified

Categories

Share

ParseForge Banner

πŸ›οΈ Library of Congress Scraper

πŸš€ Export the world's largest cultural archive in seconds. Search 170,000,000+ digitized items at the US Library of Congress across 11 format types, including books, audio, film, maps, manuscripts, newspapers, photos, sheet music, and web archives. No login, no manual harvesting.

πŸ•’ Last updated: 2026-05-23 Β· πŸ“Š 18 fields per record Β· πŸ›οΈ 170M+ items Β· 🎞️ 11 formats Β· 🌍 multilingual catalog

The Library of Congress Scraper queries the LOC digital catalog and returns 18 structured fields per record, including title, contributors, date, subjects, languages, format, mediums, rights, repository info, and direct links to resource files and image derivatives. The LOC has been digitizing its holdings since the 1990s and exposes the world's most comprehensive open cultural catalog.

The catalog spans books and printed material, audio recordings, films, maps, manuscripts, historical newspapers, photographs, sheet music, notated music, web archives, and curated collections. This Actor returns the data as CSV, Excel, JSON, or XML in under five minutes, with year-range, language, and collection filters applied server-side.

🎯 Target AudienceπŸ’‘ Primary Use Cases
Historians, archivists, journalists, educators, documentary producers, genealogists, digital humanities researchers, museum curatorsSource primary documents, build classroom packs, enrich research databases, locate rights-cleared media, map historical newspapers, source public-domain images

πŸ“‹ What the Library of Congress Scraper does

Five archival workflows in a single run:

  • πŸ“š Format-scoped search. Pick one of 11 LOC formats (books, audio, film, maps, manuscripts, newspapers, photos, sheet music, web archives, notated music, collections).
  • πŸ”Ž Keyword search. Free-text search across the chosen format.
  • 🌐 Language filter. Restrict to a single language slug (e.g. english, spanish, french, chinese, arabic).
  • πŸ“… Date range. Earliest and latest year inclusive, for time-bounded research.
  • πŸ—‚οΈ Collection filter. Restrict to a curated LOC collection slug (e.g. wpa-life-histories, civil-war-maps).

Each record includes the LOC item ID, title, description, contributor list, date, subject tags, language list, format and medium, parent collection, repository, rights statement, every resource URL (manifests, audio, video, IIIF images), and a primary image thumbnail.

πŸ’‘ Why it matters: the LOC catalog is the foundational reference for American cultural and political history. Building your own harvester means navigating multiple catalog endpoints, parsing nested metadata, and chasing pagination across millions of records. This Actor turns the entire catalog into a download.


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded archive dataset.


βš™οΈ Input

InputTypeDefaultBehavior
maxItemsinteger10Records to return. Free plan caps at 10, paid plan at 1,000,000.
formatstring"books"One of 11 LOC format collections.
searchQuerystring"jazz"Free-text keyword search.
languagestring""Language slug (english, spanish, french, german, chinese, arabic, ...).
dateStart, dateEndintegernullEarliest and latest year, inclusive.
collectionstring""LOC collection slug, e.g. wpa-life-histories.

Example: 100 jazz-related sheet music items from 1920-1940.

{
"maxItems": 100,
"format": "sheet-music",
"searchQuery": "jazz",
"dateStart": 1920,
"dateEnd": 1940
}

Example: 200 Civil War era photographs.

{
"maxItems": 200,
"format": "photos",
"searchQuery": "civil war",
"dateStart": 1861,
"dateEnd": 1865
}

⚠️ Good to Know: the LOC catalog spans materials from antiquity to the present. Rights statements vary by item, so always check the rights field before reuse. Many photos, maps, and historical newspapers are public domain, while audio and film may carry contributor or estate restrictions.


πŸ“Š Output

Each record contains 18 fields. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

FieldTypeExample
πŸ–ΌοΈ imageUrlstring | null"https://tile.loc.gov/image-services/iiif/.../full/pct:12.5/0/default.jpg"
πŸ†” itemIdstring"http://www.loc.gov/item/2017660631/"
πŸ“š titlestring"Take the 'A' train"
πŸ“ descriptionarray["Sheet music with cover art..."]
πŸ‘₯ contributorsarray["Strayhorn, Billy", "Ellington, Duke"]
πŸ“… datestring | null"1941"
🏷️ subjectsarray["Jazz", "Big band music"]
🌐 languagesarray["english"]
🎞️ formatarray["sheet music"]
🎨 mediumsarray["1 score (3 pages)"]
πŸ“š collectionsarray["Music for the Nation"]
πŸ—‚οΈ partofarray[{"title":"American Song Sheets", "url":"..."}]
πŸ›οΈ repositoriesarray["Library of Congress, Music Division"]
βš–οΈ rightsstring | null"Rights Advisory: Public domain"
πŸ“¦ resourceUrlsarray["https://www.loc.gov/resource/.../mp3"]
πŸ–ΌοΈ imageUrlsarray["https://tile.loc.gov/image-services/..."]
πŸ”— urlstring"https://www.loc.gov/item/2017660631/"
πŸ•’ scrapedAtISO 8601"2026-05-23T00:00:00.000Z"

πŸ“¦ Sample records


✨ Why choose this Actor

Capability
πŸ›οΈ170M+ item catalog. Books, audio, film, maps, manuscripts, newspapers, photos, sheet music, web archives.
🎯Multi-dimensional filtering. Format, query, language, year range, and collection combine in a single run.
πŸ–ΌοΈDirect image URLs. Thumbnail plus full image derivatives via the IIIF tile service.
βš–οΈRights metadata included. Every record carries its rights statement for clean reuse.
⚑Fast. 10 items in under 5 seconds, 10,000 records in under 10 minutes.
πŸ”Always fresh. Every run hits the live LOC catalog.
🚫No authentication. Public catalog, no key required.

πŸ“Š The Library of Congress is the world's largest open cultural archive. Structured access to it powers documentaries, classrooms, podcasts, and serious historical research.


πŸ“ˆ How it compares to alternatives

ApproachCostCoverageRefreshFiltersSetup
⭐ Library of Congress Scraper (this Actor)$5 free credit, then pay-per-use170M+ itemsLive per runformat, query, language, date, collection⚑ 2 min
Commercial archive aggregators$200+/monthCurated sliceQuarterlyLimited🐒 Days
Custom OAI-PMH harvesterFree engineeringFullCron drivenHand built⏳ Weeks
One-off catalog browsingFreePer-search onlyLiveUI onlyπŸ•’ Manual

Pick this Actor when you want clean, filterable rows ready for a database, with zero parser maintenance.


πŸš€ How to use

  1. πŸ“ Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the Library of Congress Scraper page on the Apify Store.
  3. 🎯 Set input. Pick a format, add a keyword, set optional language, year range, or collection.
  4. πŸš€ Run it. Click Start and let the Actor collect catalog records.
  5. πŸ“₯ Download. Grab your results from the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.


πŸ’Ό Business use cases

πŸ“Ί Documentary & Media Production

  • Locate rights-cleared archival imagery for films
  • Source historical newspaper clippings for B-roll graphics
  • Build music libraries from public-domain sheet music
  • Surface manuscript pages for on-screen quotation

πŸŽ“ Education & EdTech

  • Build classroom packs around primary sources
  • Generate AP-history flashcards with original documents
  • Curate language-arts excerpts from public-domain books
  • Stock virtual museum exhibits for K-12 platforms

πŸ“° Journalism & Fact-Checking

  • Verify historical claims against primary sources
  • Pull contemporary newspaper coverage for explainers
  • Source archival quotes for long-form features
  • Build searchable backgrounders on historic events

🌳 Genealogy & Local History

  • Surface ancestor mentions in historic newspapers
  • Map regional photo collections by place
  • Compile family history with WPA life history excerpts
  • Build town-history sites from local archive items

πŸ”Œ Automating Library of Congress Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • 🟒 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • πŸ“š See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Weekly refreshes keep a downstream archive database in sync with new digitizations.


🌟 Beyond business use cases

Cultural archives power more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

πŸŽ“ Research and academia

  • Digital humanities theses and projects
  • Quantitative history with newspaper text mining
  • Archival reproducibility with cited dataset pulls
  • Cross-collection comparative studies

🎨 Personal and creative

  • Public-domain artwork for design portfolios
  • Historical-fiction research bibles
  • Hobbyist genealogy and local-history projects
  • Creative-commons mood boards and Pinterest archives

🀝 Non-profit and civic

  • Civic history exhibits for libraries and town halls
  • Investigative journalism on historic events
  • Free educational materials for underserved schools
  • Heritage projects for immigrant and minority communities

πŸ§ͺ Experimentation

  • Train OCR and handwriting recognition models
  • Build LLM training corpora from public-domain text
  • Prototype IIIF-based viewer apps
  • Test image-classification models on historical media

πŸ€– Ask an AI assistant about this scraper

Open a ready-to-send prompt about this ParseForge actor in the AI of your choice:


❓ Frequently Asked Questions

🧩 How does it work?

Pick one of 11 LOC formats, add a keyword and optional filters (language, year range, collection), then click Start. The Actor returns structured rows with titles, contributors, dates, subjects, rights, and direct image and resource links.

πŸ“ How complete is the metadata?

LOC metadata is curated by professional catalogers and is among the most complete in the world. Some fields like rights or description can be empty for niche items, which reflects the source record rather than a scraper gap.

πŸ” How often is the catalog refreshed?

The LOC adds and updates records continuously. Every Actor run hits the live catalog, so new digitizations and edits appear in your dataset right away.

🎞️ Which formats are supported?

Books and printed material, audio recordings, film and video, maps, manuscripts, newspapers (Chronicling America), photos and prints, sheet music, notated music, web archives, and curated collections.

πŸ–ΌοΈ Are image URLs returned?

Yes. Most visual items expose a primary thumbnail in imageUrl plus higher-resolution derivatives in imageUrls via the LOC IIIF tile service.

⏰ Can I schedule regular runs?

Yes. Use Apify Schedules to trigger this Actor on any cron interval (daily, weekly, monthly).

LOC catalog metadata is public. Item rights vary, so check the rights field before reusing media. Many photographs, historical newspapers, and pre-1929 materials are in the public domain in the United States.

πŸ’Ό Can I use this data commercially?

Catalog metadata, yes. Individual media files depend on the rights statement returned per item. Always honor the rights field.

πŸ’³ Do I need a paid Apify plan to use this Actor?

No. The free Apify plan covers testing and small runs (10 records per run). A paid plan unlocks the higher cap, scheduling, and concurrency.

πŸ” What happens if a run fails or gets interrupted?

Apify retries transient errors automatically. If a run still fails, inspect the log, fix the input, and restart. Partial datasets are preserved.

πŸ†˜ What if I need help?

Our support team is here. Use the Apify platform messaging or the Tally form linked below.


πŸ”Œ Integrate with any app

Library of Congress Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe archive data into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh records into your archive database or alert your editorial team in Slack.


πŸ’‘ Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.


πŸ†˜ Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the US Library of Congress. All trademarks mentioned are the property of their respective owners. Only publicly available catalog data is collected. Honor each item's individual rights statement.