Library of Congress Scraper
Pricing
from $14.00 / 1,000 result items
Export records from the US Library of Congress catalog of 170M+ items. Search books, audio, film, maps, manuscripts, newspapers, photos, sheet music, and web archives. Pull titles, contributors, dates, subjects, languages, image URLs, and direct catalog links.
Developer
ParseForge
Maintained by Community

Library of Congress Scraper
Export the world's largest cultural archive in seconds. Search 170,000,000+ digitized items at the US Library of Congress across 11 format types, including books, audio, film, maps, manuscripts, newspapers, photos, sheet music, and web archives. No login, no manual harvesting.
Last updated: 2026-05-23 · 18 fields per record · 170M+ items · 11 formats · multilingual catalog
The Library of Congress Scraper queries the LOC digital catalog and returns 18 structured fields per record, including title, contributors, date, subjects, languages, format, mediums, rights, repository info, and direct links to resource files and image derivatives. The LOC has been digitizing its holdings since the 1990s and exposes the world's most comprehensive open cultural catalog.
The catalog spans books and printed material, audio recordings, films, maps, manuscripts, historical newspapers, photographs, sheet music, notated music, web archives, and curated collections. This Actor returns the data as CSV, Excel, JSON, or XML in under five minutes, with year-range, language, and collection filters applied server-side.
| Target Audience | Primary Use Cases |
|---|---|
| Historians, archivists, journalists, educators, documentary producers, genealogists, digital humanities researchers, museum curators | Source primary documents, build classroom packs, enrich research databases, locate rights-cleared media, map historical newspapers, source public-domain images |
What the Library of Congress Scraper does
Five archival workflows in a single run:
- Format-scoped search. Pick one of 11 LOC formats (books, audio, film, maps, manuscripts, newspapers, photos, sheet music, web archives, notated music, collections).
- Keyword search. Free-text search across the chosen format.
- Language filter. Restrict to a single language slug (e.g. english, spanish, french, chinese, arabic).
- Date range. Earliest and latest year inclusive, for time-bounded research.
- Collection filter. Restrict to a curated LOC collection slug (e.g. wpa-life-histories, civil-war-maps).
Each record includes the LOC item ID, title, description, contributor list, date, subject tags, language list, format and medium, parent collection, repository, rights statement, every resource URL (manifests, audio, video, IIIF images), and a primary image thumbnail.
Why it matters: the LOC catalog is the foundational reference for American cultural and political history. Building your own harvester means navigating multiple catalog endpoints, parsing nested metadata, and chasing pagination across millions of records. This Actor turns the entire catalog into a download.
Full Demo
Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded archive dataset.
Input
| Input | Type | Default | Behavior |
|---|---|---|---|
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
| format | string | "books" | One of 11 LOC format collections. |
| searchQuery | string | "jazz" | Free-text keyword search. |
| language | string | "" | Language slug (english, spanish, french, german, chinese, arabic, ...). |
| dateStart, dateEnd | integer | null | Earliest and latest year, inclusive. |
| collection | string | "" | LOC collection slug, e.g. wpa-life-histories. |
Example: 100 jazz-related sheet music items from 1920-1940.
{"maxItems": 100, "format": "sheet-music", "searchQuery": "jazz", "dateStart": 1920, "dateEnd": 1940}
Example: 200 Civil War era photographs.
{"maxItems": 200, "format": "photos", "searchQuery": "civil war", "dateStart": 1861, "dateEnd": 1865}
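The same inputs can be assembled in code before a run. The following is a minimal sketch rather than part of the Actor: `build_input` is a hypothetical helper, and its checks (a positive `maxItems`, `dateStart` no later than `dateEnd`) are assumptions about sensible input, not documented validation rules. Only the field names come from the input table.

```python
from typing import Any, Optional


def build_input(
    search_query: str,
    fmt: str = "books",
    max_items: int = 10,
    language: str = "",
    date_start: Optional[int] = None,
    date_end: Optional[int] = None,
    collection: str = "",
) -> dict[str, Any]:
    """Build an Actor input dict using the field names from the input table.

    Hypothetical helper: the validation here is an assumption, not the
    Actor's documented behavior.
    """
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")
    if date_start is not None and date_end is not None and date_start > date_end:
        raise ValueError("dateStart must not be later than dateEnd")

    run_input: dict[str, Any] = {
        "maxItems": max_items,
        "format": fmt,
        "searchQuery": search_query,
    }
    # Optional filters are left out of the input when they are unset.
    if language:
        run_input["language"] = language
    if date_start is not None:
        run_input["dateStart"] = date_start
    if date_end is not None:
        run_input["dateEnd"] = date_end
    if collection:
        run_input["collection"] = collection
    return run_input


# Reproduces the first example: 100 jazz sheet music items from 1920-1940.
jazz_input = build_input(
    "jazz", fmt="sheet-music", max_items=100, date_start=1920, date_end=1940
)
```

Building the input through one function keeps typos in field names out of scheduled or scripted runs.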
Good to Know: the LOC catalog spans materials from antiquity to the present. Rights statements vary by item, so always check the rights field before reuse. Many photos, maps, and historical newspapers are public domain, while audio and film may carry contributor or estate restrictions.
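One way to act on this advice is to sort exported records by their rights text before any reuse. The sketch below assumes records loaded from the JSON export as plain dicts with a `rights` key. Matching the phrase "public domain" is our own heuristic, not a feature of the Actor or the LOC, and it is not a legal determination. The sample records are placeholders.

```python
from typing import Any, Iterable


def looks_public_domain(record: dict[str, Any]) -> bool:
    """Heuristic: True when the rights text mentions "public domain".

    This can misfire (for example on "not in the public domain"), so treat
    the result as a first pass before human review.
    """
    rights = record.get("rights")
    return isinstance(rights, str) and "public domain" in rights.lower()


def split_by_rights(
    records: Iterable[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split records into (likely reusable, needs manual rights review)."""
    reusable: list[dict[str, Any]] = []
    needs_review: list[dict[str, Any]] = []
    for record in records:
        if looks_public_domain(record):
            reusable.append(record)
        else:
            needs_review.append(record)
    return reusable, needs_review


# Illustrative records only; titles and rights strings are placeholders.
sample = [
    {"title": "Record A", "rights": "Rights Advisory: Public domain"},
    {"title": "Record B", "rights": None},
    {"title": "Record C", "rights": "Rights status not evaluated"},
]
reusable, needs_review = split_by_rights(sample)
```

Records with an empty or missing rights value land in the review pile by design, since an absent statement says nothing about reuse.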
Output
Each record contains 18 fields. Download the dataset as CSV, Excel, JSON, or XML.
Schema
| Field | Type | Example |
|---|---|---|
| imageUrl | string or null | "https://tile.loc.gov/image-services/iiif/.../full/pct:12.5/0/default.jpg" |
| itemId | string | "http://www.loc.gov/item/2017660631/" |
| title | string | "Take the 'A' train" |
| description | array | ["Sheet music with cover art..."] |
| contributors | array | ["Strayhorn, Billy", "Ellington, Duke"] |
| date | string or null | "1941" |
| subjects | array | ["Jazz", "Big band music"] |
| languages | array | ["english"] |
| format | array | ["sheet music"] |
| mediums | array | ["1 score (3 pages)"] |
| collections | array | ["Music for the Nation"] |
| partof | array | [{"title":"American Song Sheets", "url":"..."}] |
| repositories | array | ["Library of Congress, Music Division"] |
| rights | string or null | "Rights Advisory: Public domain" |
| resourceUrls | array | ["https://www.loc.gov/resource/.../mp3"] |
| imageUrls | array | ["https://tile.loc.gov/image-services/..."] |
| url | string | "https://www.loc.gov/item/2017660631/" |
| scrapedAt | string (ISO 8601) | "2026-05-23T00:00:00.000Z" |
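For downstream code, the schema can be mirrored as a typed structure. The sketch below is an assumption-laden convenience, not something the Actor ships: `LocRecord` and `missing_fields` are hypothetical names, the field names and nullability follow the table above, and the element types inside each array are inferred from the example values.

```python
from typing import Any, Optional, TypedDict


class LocRecord(TypedDict):
    """The 18 output fields, typed after the schema table."""

    imageUrl: Optional[str]
    itemId: str
    title: str
    description: list[str]
    contributors: list[str]
    date: Optional[str]
    subjects: list[str]
    languages: list[str]
    format: list[str]
    mediums: list[str]
    collections: list[str]
    partof: list[dict[str, str]]
    repositories: list[str]
    rights: Optional[str]
    resourceUrls: list[str]
    imageUrls: list[str]
    url: str
    scrapedAt: str


# Field names in declaration order, taken from the class annotations.
EXPECTED_FIELDS = tuple(LocRecord.__annotations__)


def missing_fields(record: dict[str, Any]) -> list[str]:
    """Return the schema fields that are absent from a record."""
    return [name for name in EXPECTED_FIELDS if name not in record]
```

Calling `missing_fields` on the first few rows of an export is a quick way to notice when a dataset was produced with a different schema than expected.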
Sample records
Why choose this Actor
| Capability | Details |
|---|---|
| 170M+ item catalog | Books, audio, film, maps, manuscripts, newspapers, photos, sheet music, web archives. |
| Multi-dimensional filtering | Format, query, language, year range, and collection combine in a single run. |
| Direct image URLs | Thumbnail plus full image derivatives via the IIIF tile service. |
| Rights metadata included | Each record carries the rights statement from its source record, when one exists, for clean reuse. |
| Fast | 10 items in under 5 seconds, 10,000 records in under 10 minutes. |
| Always fresh | Every run hits the live LOC catalog. |
| No authentication | Public catalog, no key required. |
The Library of Congress is the world's largest open cultural archive. Structured access to it powers documentaries, classrooms, podcasts, and serious historical research.
How it compares to alternatives
| Approach | Cost | Coverage | Refresh | Filters | Setup |
|---|---|---|---|---|---|
| Library of Congress Scraper (this Actor) | $5 free credit, then pay-per-use | 170M+ items | Live per run | format, query, language, date, collection | 2 min |
| Commercial archive aggregators | $200+/month | Curated slice | Quarterly | Limited | Days |
| Custom OAI-PMH harvester | Free engineering | Full | Cron driven | Hand built | Weeks |
| One-off catalog browsing | Free | Per-search only | Live | UI only | Manual |
Pick this Actor when you want clean, filterable rows ready for a database, with zero parser maintenance.
How to use
- Sign up. Create a free account with $5 credit (takes 2 minutes).
- Open the Actor. Go to the Library of Congress Scraper page on the Apify Store.
- Set input. Pick a format, add a keyword, set optional language, year range, or collection.
- Run it. Click Start and let the Actor collect catalog records.
- Download. Grab your results from the Dataset tab as CSV, Excel, JSON, or XML.
Total time from signup to downloaded dataset: 3-5 minutes. No coding required.
Business use cases
Automating Library of Congress Scraper
Control the scraper programmatically for scheduled runs and pipeline integrations:
- Node.js. Install the apify-client NPM package.
- Python. Use the apify-client PyPI package.
- See the Apify API documentation for full details.
The Apify Schedules feature lets you trigger this Actor on any cron interval. Weekly refreshes keep a downstream archive database in sync with new digitizations.
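A programmatic run with the Python client might look like the sketch below. It relies on the `apify-client` package's `ActorClient.call` and `DatasetClient.iterate_items` methods. `fetch_records` is a hypothetical wrapper, and `ACTOR_ID` is a placeholder, since this page does not state the Actor's ID. The client is passed in as an argument so the function can be exercised without network access.

```python
from typing import Any, Iterator

# Placeholder: copy the real Actor ID from the Actor's page on the Apify Store.
ACTOR_ID = "<username>/<actor-name>"


def fetch_records(
    client: Any, actor_id: str, run_input: dict[str, Any]
) -> Iterator[dict[str, Any]]:
    """Run the Actor, wait for it to finish, and yield its dataset items.

    `client` is expected to be an `apify_client.ApifyClient` instance.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("The Actor run did not return run details")
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()


# Usage with the real client (needs `pip install apify-client` and an API token):
#
#   from apify_client import ApifyClient
#
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")
#   run_input = {"format": "photos", "searchQuery": "civil war", "maxItems": 200}
#   for record in fetch_records(client, ACTOR_ID, run_input):
#       print(record["title"])
```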
Beyond business use cases
Cultural archives power more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.
Frequently Asked Questions
How does it work?
Pick one of 11 LOC formats, add a keyword and optional filters (language, year range, collection), then click Start. The Actor returns structured rows with titles, contributors, dates, subjects, rights, and direct image and resource links.
How complete is the metadata?
LOC metadata is curated by professional catalogers and is among the most complete in the world. Some fields like rights or description can be empty for niche items, which reflects the source record rather than a scraper gap.
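To see how often fields are empty in a particular export, a small fill-rate report can help. This is a minimal sketch under the assumption that records are loaded as dicts and that `None`, empty strings, and empty arrays all count as "empty". `fill_rates` is a hypothetical helper, not part of the Actor.

```python
from collections import Counter
from typing import Any, Iterable

# Values treated as "no data" for the purposes of this report (an assumption).
_EMPTY_VALUES = (None, "", [], {})


def fill_rates(records: Iterable[dict[str, Any]]) -> dict[str, float]:
    """Return the fraction of records with a non-empty value, per field."""
    total = 0
    filled: Counter[str] = Counter()
    for record in records:
        total += 1
        for field, value in record.items():
            # Adding 0 still registers the field, so empty-only fields report 0.0.
            filled[field] += 0 if value in _EMPTY_VALUES else 1
    if total == 0:
        return {}
    return {field: count / total for field, count in filled.items()}
```

A field that is missing from some records entirely is counted as empty for those records, because the denominator is always the total number of records.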
How often is the catalog refreshed?
The LOC adds and updates records continuously. Every Actor run hits the live catalog, so new digitizations and edits appear in your dataset right away.
Which formats are supported?
Books and printed material, audio recordings, film and video, maps, manuscripts, newspapers (Chronicling America), photos and prints, sheet music, notated music, web archives, and curated collections.
Are image URLs returned?
Yes. Most visual items expose a primary thumbnail in imageUrl plus higher-resolution derivatives in imageUrls via the LOC IIIF tile service.
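IIIF Image API URLs end in `/{region}/{size}/{rotation}/{quality}.{format}`, which matches the `/full/pct:12.5/0/default.jpg` tail of the `imageUrl` example in the schema. Under that assumption, a different derivative size can be requested by swapping the size segment. The sketch below is illustrative: `iiif_with_size` is a hypothetical helper, `example-id` is a placeholder identifier, and which size values the LOC tile service accepts is not confirmed here.

```python
from urllib.parse import urlsplit, urlunsplit


def iiif_with_size(url: str, size: str) -> str:
    """Return `url` with its IIIF size segment replaced by `size`.

    Assumes the path ends in /{region}/{size}/{rotation}/{quality}.{format}.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Leading "" + identifier + region + size + rotation + quality.format
    if len(segments) < 6:
        raise ValueError(f"Not an IIIF image URL: {url}")
    segments[-3] = size  # [-4] region, [-3] size, [-2] rotation, [-1] quality.format
    return urlunsplit(parts._replace(path="/".join(segments)))


# "example-id" is a placeholder identifier, not a real LOC item.
thumbnail = "https://tile.loc.gov/image-services/iiif/example-id/full/pct:12.5/0/default.jpg"
half_size = iiif_with_size(thumbnail, "pct:50")
```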
Can I schedule regular runs?
Yes. Use Apify Schedules to trigger this Actor on any cron interval (daily, weekly, monthly).
Is this data legal to use?
LOC catalog metadata is public. Item rights vary, so check the rights field before reusing media. Many photographs, historical newspapers, and pre-1929 materials are in the public domain in the United States.
Can I use this data commercially?
Catalog metadata, yes. Individual media files depend on the rights statement returned per item. Always honor the rights field.
Do I need a paid Apify plan to use this Actor?
No. The free Apify plan covers testing and small runs (10 records per run). A paid plan unlocks the higher cap, scheduling, and concurrency.
What happens if a run fails or gets interrupted?
Apify retries transient errors automatically. If a run still fails, inspect the log, fix the input, and restart. Partial datasets are preserved.
What if I need help?
Our support team is here. Use the Apify platform messaging or the Tally form linked below.
Integrate with any app
Library of Congress Scraper connects to any cloud service via Apify integrations:
- Make - Automate multi-step workflows
- Zapier - Connect with 5,000+ apps
- Slack - Get run notifications in your channels
- Airbyte - Pipe archive data into your warehouse
- GitHub - Trigger runs from commits and releases
- Google Drive - Export datasets straight to Sheets
You can also use webhooks to trigger downstream actions when a run finishes. Push fresh records into your archive database or alert your editorial team in Slack.
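On the receiving side, a webhook handler mostly needs to pull the dataset ID out of the request body. The sketch below assumes Apify's default webhook payload, in which `eventType` names the event and `resource` holds the run object with its `defaultDatasetId`. A custom payload template would change this shape. `dataset_id_from_webhook` is a hypothetical helper.

```python
import json
from typing import Optional


def dataset_id_from_webhook(body: str) -> Optional[str]:
    """Return the run's dataset ID for a succeeded run, else None.

    Assumes the default Apify webhook payload shape.
    """
    payload = json.loads(body)
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")
```

The returned ID can then be used to fetch the new records and load them into an archive database or post a summary to Slack.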
Recommended Actors
- LibriVox Audiobooks Scraper - Public-domain audiobooks with reader credits
- Tatoeba Sentence Corpus Scraper - 12M+ multilingual example sentences
- Met Museum Scraper - Open-access artworks from The Met
- ArXiv Scraper - Academic preprints with metadata
- Figshare Scraper - Open research datasets and figures
Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.
Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.
Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the US Library of Congress. All trademarks mentioned are the property of their respective owners. Only publicly available catalog data is collected. Honor each item's individual rights statement.