SPDX Software Licenses Scraper

Pricing

from $10.00 / 1,000 result items

Pull the SPDX License List with standardized identifiers: license ID, full name, OSI approved flag, FSF libre flag, deprecated status, reference URL, text, and cross references. Export to JSON, CSV, or Excel for SBOM, open source compliance, license scanning, and supply chain audits.

Developer: ParseForge (Maintained by Community)

📜 SPDX Software Licenses Scraper

🚀 Export the entire SPDX open-source license catalog in seconds. Pull 729 software licenses and 72 license-applicable exceptions with full legal text, OSI / FSF status, deprecation flags, and cross-references. No API key, no scraping rate limits, no manual list maintenance.

🕒 Last updated: 2026-05-23 · 📊 13 fields per record · 📜 729 licenses + 72 exceptions · ⚖️ OSI / FSF flags · 🧾 Full standard license text

The SPDX Software Licenses Scraper exports the canonical SPDX License List, the open-source licensing standard adopted by GitHub, npm, Maven, PyPI, the Linux Foundation, and every major SBOM tool on the market. Each record contains the SPDX short identifier, the official license name, OSI approval status, FSF Free / Libre status, deprecation flag, the standard license template, and the full license body text.

The catalog covers MIT, Apache 2.0, GPL family, BSD family, MPL, EPL, Creative Commons, and every other recognized open-source license, plus license-applicable exceptions like Classpath, Bison, Autoconf, and GCC Runtime. Use it to bootstrap a license-compliance database, feed an SBOM generator, drive a legal review pipeline, or audit dependencies against a permitted-licenses allowlist.

| 🎯 Target Audience | 💡 Primary Use Cases |
| --- | --- |
| Open-source compliance teams, legal counsel, SBOM tool builders, security engineers, package registry maintainers, DevSecOps platforms | License allowlist / denylist enforcement, SBOM enrichment, GPL / AGPL detection, OSI vs proprietary classification, full-text license diff and audit |

📋 What the SPDX Software Licenses Scraper does

Three catalog modes in a single Actor:

  • 📜 Licenses mode. All 729 SPDX-recognized open-source licenses with full text.
  • ⚖️ Exceptions mode. All 72 license-applicable exceptions (Classpath, GCC Runtime, Autoconf, Bison, Font, etc.).
  • 🔎 Single license mode. Pull one license by SPDX short ID for spot lookups.

Optional filters: OSI Approved only, FSF Free / Libre only, Include deprecated, and Include full license text for body-text export.

Each record includes the SPDX short identifier, official name, OSI / FSF flags, deprecation status, reference URL, see-also cross-references, and the full standard license template plus license body.

💡 Why it matters: the SPDX License List is the single source of truth for open-source license identification across SBOMs, package registries, and compliance workflows. Building your own scraper means tracking version bumps of the canonical list, parsing two JSON layers (the index and the per-license detail files), and stitching detail bodies back to the index. This Actor skips all of that.


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded license dataset.


⚙️ Input

| Input | Type | Default | Behavior |
| --- | --- | --- | --- |
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
| mode | string | "licenses" | One of licenses, exceptions, or single. |
| licenseId | string | "" | SPDX short identifier. Only used when mode = single. 729 enum values available. |
| fsfLibre | boolean | false | Keep only licenses the FSF marks as Free / Libre. |
| osiApproved | boolean | false | Keep only OSI-approved licenses. |
| deprecated | boolean | false | Include licenses SPDX has marked deprecated. |
| includeText | boolean | true | Pull the full license body text. Adds one HTTP request per license. |

Example: 50 OSI-approved licenses with full text.

{
  "maxItems": 50,
  "mode": "licenses",
  "osiApproved": true,
  "includeText": true
}

Example: look up a single license by SPDX ID.

{
  "mode": "single",
  "licenseId": "Apache-2.0",
  "includeText": true
}

⚠️ Good to Know: SPDX is updated regularly by the SPDX legal team. Every run pulls the latest catalog from the canonical source, so the dataset reflects the current listVersion at run time.
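
If you build the input programmatically, a small helper can catch mistakes before a run starts. The sketch below uses the field names and defaults from the input table; the function name `build_run_input` and its client-side checks are assumptions for illustration, and the Actor's own validation may differ.

```python
from typing import Any

VALID_MODES = {"licenses", "exceptions", "single"}


def build_run_input(
    mode: str = "licenses",
    max_items: int = 10,
    license_id: str = "",
    osi_approved: bool = False,
    fsf_libre: bool = False,
    deprecated: bool = False,
    include_text: bool = True,
) -> dict[str, Any]:
    """Return an input dict keyed by the field names from the input table."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    if max_items < 1:
        raise ValueError("max_items must be a positive integer")
    if mode == "single" and not license_id:
        raise ValueError("license_id is required when mode is 'single'")

    run_input: dict[str, Any] = {
        "maxItems": max_items,
        "mode": mode,
        "fsfLibre": fsf_libre,
        "osiApproved": osi_approved,
        "deprecated": deprecated,
        "includeText": include_text,
    }
    # licenseId is only used in single mode, so it is left out otherwise.
    if mode == "single":
        run_input["licenseId"] = license_id
    return run_input
```

For example, `build_run_input(mode="single", license_id="Apache-2.0")` produces the same fields as the single-license example above, plus the documented defaults.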


📊 Output

Each record contains 13 fields. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

| Field | Type | Example |
| --- | --- | --- |
| 🆔 licenseId | string | "Apache-2.0" |
| 🏷️ name | string | "Apache License 2.0" |
| 📂 kind | string | "license" or "exception" |
| isOsiApproved | boolean or null | true |
| 🆓 isFsfLibre | boolean or null | true |
| isDeprecatedLicenseId | boolean or null | false |
| 🔗 referenceUrl | string or null | "https://spdx.org/licenses/Apache-2.0.html" |
| 📦 detailsUrl | string or null | "https://spdx.org/licenses/Apache-2.0.json" |
| 🔁 seeAlso | array or null | ["http://www.apache.org/licenses/LICENSE-2.0"] |
| 📜 standardLicenseTemplate | string or null | "<<beginOptional>>Apache License..." |
| 📄 licenseText | string or null | Full license body |
| 🏷️ listVersion | string or null | "3.27.0" |
| 🕒 scrapedAt | ISO 8601 | "2026-05-23T00:00:00.000Z" |
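
For typed pipelines, the schema can be mirrored as a Python `TypedDict`. This is a sketch derived from the field list above, not a type shipped with the Actor; the names `SpdxRecord` and `missing_fields` are hypothetical, and the example values are the ones shown in the schema.

```python
from typing import Optional, TypedDict


class SpdxRecord(TypedDict):
    licenseId: str
    name: str
    kind: str  # "license" or "exception"
    isOsiApproved: Optional[bool]
    isFsfLibre: Optional[bool]
    isDeprecatedLicenseId: Optional[bool]
    referenceUrl: Optional[str]
    detailsUrl: Optional[str]
    seeAlso: Optional[list[str]]
    standardLicenseTemplate: Optional[str]
    licenseText: Optional[str]
    listVersion: Optional[str]
    scrapedAt: str  # ISO 8601 timestamp


EXPECTED_FIELDS = frozenset(SpdxRecord.__annotations__)


def missing_fields(record: dict) -> set[str]:
    """Return the schema fields that are absent from a record."""
    return set(EXPECTED_FIELDS) - set(record)


# Example record assembled from the schema's example values (text fields left empty).
example: SpdxRecord = {
    "licenseId": "Apache-2.0",
    "name": "Apache License 2.0",
    "kind": "license",
    "isOsiApproved": True,
    "isFsfLibre": True,
    "isDeprecatedLicenseId": False,
    "referenceUrl": "https://spdx.org/licenses/Apache-2.0.html",
    "detailsUrl": "https://spdx.org/licenses/Apache-2.0.json",
    "seeAlso": ["http://www.apache.org/licenses/LICENSE-2.0"],
    "standardLicenseTemplate": None,
    "licenseText": None,
    "listVersion": "3.27.0",
    "scrapedAt": "2026-05-23T00:00:00.000Z",
}
```

Calling `missing_fields` on each downloaded item is a cheap way to notice if an export was trimmed or a column was dropped along the way.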


✨ Why choose this Actor

  • 📜 Complete coverage. All 729 SPDX licenses and 72 license-applicable exceptions in a single pull.
  • ⚖️ OSI + FSF flags. Filter by OSI Approved, FSF Libre, or deprecated status in one click.
  • 📄 Full license text. Body text plus the SPDX standard template for exact-match scanning.
  • 🔗 Cross-references. seeAlso URLs plug into your existing license-source tooling.
  • 🔁 Always fresh. Every run pulls the latest published SPDX list version.
  • Fast. 50 licenses with full text in under a minute.
  • 🚫 No authentication. Works with public open-source licensing data. No login, no key.

📊 The SPDX License List is cited by GitHub, npm, Maven, PyPI, RubyGems, the Linux Foundation, and almost every SBOM standard (CycloneDX, SPDX SBOMs, SWID).


📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
| --- | --- | --- | --- | --- | --- |
| ⭐ SPDX Software Licenses Scraper (this Actor) | $5 free credit, then pay-per-use | 729 + 72 | Live per run | OSI / FSF / deprecated / text | ⚡ 2 min |
| Hand-cloned SPDX repo | Free | Manual upkeep | Git pull required | None | 🐢 Hours |
| Commercial license scanners | $$$/seat | Bundled | Vendor cycle | Vendor-defined | ⏳ Days |
| Wikipedia license tables | Free | Partial, stale | Sporadic | None | 🕒 Variable |

Pick this Actor when you want the canonical SPDX dataset, downloadable as CSV / Excel / JSON / XML, with no pipeline maintenance.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the SPDX Software Licenses Scraper page on the Apify Store.
  3. 🎯 Set input. Pick a mode (licenses / exceptions / single), apply optional OSI / FSF filters, and set maxItems.
  4. 🚀 Run it. Click Start and let the Actor collect your data.
  5. 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.


💼 Business use cases

🛡️ Open-Source Compliance

  • Build a permitted-licenses allowlist
  • Detect GPL / AGPL contamination in dependencies
  • Enforce OSI-approved-only policy at CI time
  • Maintain a license catalog for audit trails
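
As a rough illustration of the allowlist use case, the sketch below builds an allowlist from downloaded records and flags dependencies outside it. The "OSI-approved and not deprecated" policy, the function names, and the sample data are assumptions for illustration; the records are hand-written in the output schema rather than taken from a real run. It compares plain SPDX IDs only and does not parse compound expressions such as `MIT OR Apache-2.0`.

```python
from typing import Iterable


def build_allowlist(records: Iterable[dict]) -> set[str]:
    """Collect the IDs of OSI-approved, non-deprecated licenses."""
    return {
        r["licenseId"]
        for r in records
        if r.get("kind") == "license"
        and r.get("isOsiApproved") is True
        and r.get("isDeprecatedLicenseId") is not True
    }


def find_violations(dependencies: dict[str, str], allowlist: set[str]) -> dict[str, str]:
    """Return {package: license_id} for dependencies outside the allowlist."""
    return {pkg: lic for pkg, lic in dependencies.items() if lic not in allowlist}


# Illustrative records shaped like the Actor's output (trimmed to the fields used).
records = [
    {"licenseId": "MIT", "kind": "license", "isOsiApproved": True, "isDeprecatedLicenseId": False},
    {"licenseId": "Apache-2.0", "kind": "license", "isOsiApproved": True, "isDeprecatedLicenseId": False},
    {"licenseId": "GPL-2.0", "kind": "license", "isOsiApproved": True, "isDeprecatedLicenseId": True},
    {"licenseId": "CC-BY-NC-4.0", "kind": "license", "isOsiApproved": False, "isDeprecatedLicenseId": False},
]

# Hypothetical dependency inventory: package name -> declared SPDX ID.
dependencies = {
    "web-framework": "MIT",
    "http-client": "Apache-2.0",
    "legacy-lib": "GPL-2.0",
    "art-pack": "CC-BY-NC-4.0",
}

allowlist = build_allowlist(records)
violations = find_violations(dependencies, allowlist)
```

In a CI job, a non-empty `violations` dict would typically fail the build and print the offending packages.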

🧾 SBOM Tooling

  • Enrich SBOM records with canonical SPDX IDs
  • Map vendor license strings to standard identifiers
  • Add license text snapshots to SPDX 2.3 / 3.x docs
  • Drive CycloneDX licenses[] arrays automatically

⚖️ Legal Review

  • Side-by-side license diffs for redlining
  • Look up license obligations by SPDX ID
  • Build internal counsel knowledge bases
  • Speed-run M&A open-source audits

🛠️ Package Registry / Platform

  • Show canonical license labels on package pages
  • Map legacy license strings to SPDX IDs
  • Power "compatible-with" license advisors
  • Surface deprecated identifier warnings

🔌 Automating SPDX Software Licenses Scraper

Control the scraper programmatically for scheduled refreshes and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Weekly or monthly refreshes keep your internal license database in sync with the canonical SPDX list.
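
A minimal sketch of a programmatic run with the Python `apify-client` package follows. The helper name `fetch_spdx_records` is hypothetical, and the Actor ID is not stated on this page, so it is passed in as a parameter; copy it from the Actor's page. The client is injected as an argument so the function can be exercised with a stub in tests.

```python
def fetch_spdx_records(client, actor_id: str, run_input: dict) -> list[dict]:
    """Run the Actor once and return the items from its default dataset.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    # Start the Actor and wait for the run to finish.
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    # Each finished run exposes the ID of its default dataset.
    return client.dataset(run["defaultDatasetId"]).list_items().items


# Usage (requires `pip install apify-client` and an Apify API token):
#
#   import os
#   from apify_client import ApifyClient
#
#   client = ApifyClient(os.environ["APIFY_TOKEN"])
#   records = fetch_spdx_records(
#       client,
#       "<actor-id-from-the-store-page>",  # placeholder, not a real ID
#       {"mode": "licenses", "osiApproved": True, "maxItems": 10},
#   )
```

The same function can be called from a scheduled job to refresh an internal license table after each run.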


🌟 Beyond business use cases

Open-source licensing data powers more than commercial workflows. The same structured records support research, education, civic transparency, and personal initiatives.

🎓 Research and academia

  • Empirical studies on license adoption trends
  • Reproducible OSS-policy research with cited dataset pulls
  • Coursework on open-source licensing for CS / law schools
  • Cross-license compatibility matrices for thesis work

🎨 Personal and creative

  • Personal dependency-audit dashboards
  • Indie tooling for license-check pre-commit hooks
  • Side projects that match dependency licenses to allowlists
  • Hobbyist OSS-compliance experiments

🤝 Non-profit and civic

  • Public-interest open-source compliance audits
  • Library and museum software inventory reviews
  • Civic-tech projects with transparent license trails
  • Open-data initiatives requiring open license filtering

🧪 Experimentation

  • Train classifiers that map free-text license blurbs to SPDX IDs
  • Prototype LLM agents that explain license obligations
  • Build "license diff" tools comparing versions side-by-side
  • Test SBOM ingestion pipelines on realistic license data


❓ Frequently Asked Questions

🧩 How does it work?

Pick a mode (licenses, exceptions, or single), apply optional OSI / FSF / deprecation filters, and click Start. The Actor pulls the canonical SPDX list, optionally fetches per-license detail bodies, and emits one structured record per license.
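
The merge step can be pictured roughly as follows. This is a sketch, not the Actor's actual code: it assumes the upstream index entries use keys such as `reference`, `detailsUrl`, and `seeAlso`, and that the detail files carry `licenseText` and `standardLicenseTemplate`. It covers license entries only, and the function name `to_record` is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def to_record(index_entry: dict, detail: Optional[dict], list_version: Optional[str]) -> dict:
    """Combine one index entry with its optional detail body into a flat record."""
    detail = detail or {}  # no detail body is fetched when full text is switched off
    scraped_at = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    return {
        "licenseId": index_entry["licenseId"],
        "name": index_entry["name"],
        "kind": "license",
        # Flags missing upstream become None instead of being guessed.
        "isOsiApproved": index_entry.get("isOsiApproved"),
        "isFsfLibre": index_entry.get("isFsfLibre"),
        "isDeprecatedLicenseId": index_entry.get("isDeprecatedLicenseId"),
        "referenceUrl": index_entry.get("reference"),
        "detailsUrl": index_entry.get("detailsUrl"),
        "seeAlso": index_entry.get("seeAlso"),
        "standardLicenseTemplate": detail.get("standardLicenseTemplate"),
        "licenseText": detail.get("licenseText"),
        "listVersion": list_version,
        # Use the "Z" suffix for UTC, matching the schema example.
        "scrapedAt": scraped_at.replace("+00:00", "Z"),
    }
```

Keeping absent flags as `None` matches the nullable types in the output schema, so downstream code can tell "not approved" apart from "not stated".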

📏 How accurate is the data?

This Actor mirrors the canonical SPDX License List exactly. SPDX IDs, names, OSI / FSF flags, and license text all come straight from the upstream catalog without modification.

🔁 How often is the dataset refreshed?

The SPDX legal team publishes new versions regularly. Every run of this Actor pulls the latest version, so your dataset always reflects the current listVersion.

⚖️ Does it include license-applicable exceptions?

Yes. Set Mode to exceptions to pull all 72 license-applicable exceptions (Classpath, GCC Runtime, Autoconf, Bison, Font, and more).

📄 Can I get the full license text?

Yes. The Include full license text toggle is on by default. Each record includes both the standard license template (used for canonical matching) and the human-readable body text.

⏰ Can I schedule regular runs?

Yes. Use Apify Schedules to refresh your internal SPDX cache on a weekly or monthly cron, so your compliance pipeline always reflects the current list version.

📜 What license covers the SPDX data?

SPDX publishes the License List under a permissive open license (CC0-1.0). You can use, redistribute, and embed the dataset in your own products without restriction.

💼 Can I use this data commercially?

Yes. The underlying SPDX License List is published under CC0-1.0, which permits commercial use. You are responsible for complying with the licenses you discover in your own dependency tree.

💳 Do I need a paid Apify plan to use this Actor?

No. The free Apify plan is enough for testing and small runs (10 records per run). A paid plan lifts the limit and gives you access to scheduling, higher concurrency, and larger datasets.

🔁 What happens if a run fails or gets interrupted?

Apify automatically retries transient errors. If a run still fails, you can inspect the log in the Runs tab, fix the input, and re-run. Partial datasets from failed runs are preserved so you never lose progress.

🆘 What if I need help?

Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.


🔌 Integrate with any app

SPDX Software Licenses Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe SPDX records into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push a fresh license catalog into your CI license-check, or alert your compliance team in Slack on every new SPDX version.
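
To alert only when the list version actually changes, a downstream handler can compare the `listVersion` of the fresh run with the last one it stored. The sketch below assumes dotted numeric versions such as "3.27.0"; the function names are hypothetical, and a non-numeric version string raises `ValueError`.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version such as "3.27.0" into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def is_newer(candidate: str, known: str) -> bool:
    """Return True when `candidate` is a later list version than `known`."""
    # Tuples compare numerically, so "3.27.0" ranks above "3.9.1",
    # which a plain string comparison would get wrong.
    return parse_version(candidate) > parse_version(known)
```

A webhook handler could call `is_newer(record["listVersion"], stored_version)` on the first record of a run and post to Slack only when it returns `True`.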


💡 Pro Tip: browse the complete ParseForge collection for more developer-data scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the SPDX project, the Linux Foundation, the OSI, or the FSF. All trademarks mentioned are the property of their respective owners. Only publicly available open license data is collected.