SPDX Software Licenses Scraper
Pricing
from $10.00 / 1,000 result items
Pull the SPDX License List with standardized identifiers: license ID, full name, OSI approved flag, FSF libre flag, deprecated status, reference URL, text, and cross references. Export to JSON, CSV, or Excel for SBOM, open source compliance, license scanning, and supply chain audits.
Developer
ParseForge
Maintained by Community
a day ago
Last modified

📜 SPDX Software Licenses Scraper
🚀 Export the entire SPDX open-source license catalog in seconds. Pull 729 software licenses and 72 license-applicable exceptions with full legal text, OSI / FSF status, deprecation flags, and cross-references. No API key, no scraping rate limits, no manual list maintenance.
🕒 Last updated: 2026-05-23 · 📊 13 fields per record · 📜 729 licenses + 72 exceptions · ⚖️ OSI / FSF flags · 🧾 Full standard license text
The SPDX Software Licenses Scraper exports the canonical SPDX License List, the open-source licensing standard used by GitHub, npm, Maven, PyPI, the Linux Foundation, and widely used SBOM tools. Each record contains the SPDX short identifier, the official license name, OSI approval status, FSF Free / Libre status, deprecation flag, the standard license template, and the full license body text.
The catalog covers MIT, Apache 2.0, GPL family, BSD family, MPL, EPL, Creative Commons, and every other recognized open-source license, plus license-applicable exceptions like Classpath, Bison, Autoconf, and GCC Runtime. Use it to bootstrap a license-compliance database, feed an SBOM generator, drive a legal review pipeline, or audit dependencies against a permitted-licenses allowlist.
| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| Open-source compliance teams, legal counsel, SBOM tool builders, security engineers, package registry maintainers, DevSecOps platforms | License allowlist / denylist enforcement, SBOM enrichment, GPL / AGPL detection, OSI vs proprietary classification, full-text license diff and audit |
📋 What the SPDX Software Licenses Scraper does
Three catalog modes in a single Actor:
- 📜 Licenses mode. All 729 SPDX-recognized open-source licenses with full text.
- ⚖️ Exceptions mode. All 72 license-applicable exceptions (Classpath, GCC Runtime, Autoconf, Bison, Font, etc.).
- 🔎 Single license mode. Pull one license by SPDX short ID for spot lookups.
Optional filters: OSI Approved only, FSF Free / Libre only, Include deprecated, and Include full license text for body-text export.
Each record includes the SPDX short identifier, official name, OSI / FSF flags, deprecation status, reference URL, see-also cross-references, and the full standard license template plus license body.
💡 Why it matters: the SPDX License List is the single source of truth for open-source license identification across SBOMs, package registries, and compliance workflows. Building your own scraper means tracking semantic-version bumps of the canonical list, parsing dual JSON layers, and stitching detail bodies back to the index. This Actor skips all of that.
🎬 Full Demo
🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded license dataset.
⚙️ Input
| Input | Type | Default | Behavior |
|---|---|---|---|
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
| mode | string | "licenses" | One of licenses, exceptions, or single. |
| licenseId | string | "" | SPDX short identifier. Only used when mode = single. 729 enum values available. |
| fsfLibre | boolean | false | Keep only licenses the FSF marks as Free / Libre. |
| osiApproved | boolean | false | Keep only OSI-approved licenses. |
| deprecated | boolean | false | Include licenses SPDX has marked deprecated. |
| includeText | boolean | true | Pull the full license body text. Adds one HTTP request per license. |
Example: 50 OSI-approved licenses with full text.
{"maxItems": 50, "mode": "licenses", "osiApproved": true, "includeText": true}
Example: lookup a single license by SPDX ID.
{"mode": "single", "licenseId": "Apache-2.0", "includeText": true}
⚠️ Good to Know: SPDX is updated regularly by the SPDX legal team. Every run pulls the latest catalog from the canonical source, so the dataset reflects the current listVersion at run time.
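The input table above can be mirrored in a small helper that assembles the input object before a run. This is a minimal sketch: the field names and defaults come from the table, while the function name `build_run_input` and the validation rules are assumptions for illustration, not the Actor's own validation logic.

```python
from typing import Any, Dict

VALID_MODES = {"licenses", "exceptions", "single"}


def build_run_input(
    mode: str = "licenses",
    license_id: str = "",
    max_items: int = 10,
    osi_approved: bool = False,
    fsf_libre: bool = False,
    deprecated: bool = False,
    include_text: bool = True,
) -> Dict[str, Any]:
    """Assemble an input object using the field names from the input table."""
    # These checks are local assumptions; the Actor may validate differently.
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    if mode == "single" and not license_id:
        raise ValueError("licenseId is required when mode is 'single'")
    if max_items < 1:
        raise ValueError("maxItems must be a positive integer")

    run_input: Dict[str, Any] = {
        "mode": mode,
        "maxItems": max_items,
        "osiApproved": osi_approved,
        "fsfLibre": fsf_libre,
        "deprecated": deprecated,
        "includeText": include_text,
    }
    # licenseId is only used in single mode, so it is left out otherwise.
    if mode == "single":
        run_input["licenseId"] = license_id
    return run_input
```

Calling `build_run_input(max_items=50, osi_approved=True)` yields the same fields as the first example above, plus the remaining defaults spelled out explicitly.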
📊 Output
Each record contains 13 fields. Download the dataset as CSV, Excel, JSON, or XML.
🧾 Schema
| Field | Type | Example |
|---|---|---|
| 🆔 licenseId | string | "Apache-2.0" |
| 🏷️ name | string | "Apache License 2.0" |
| 📂 kind | string | "license" or "exception" |
| ✅ isOsiApproved | boolean or null | true |
| 🆓 isFsfLibre | boolean or null | true |
| ⛔ isDeprecatedLicenseId | boolean or null | false |
| 🔗 referenceUrl | string or null | "https://spdx.org/licenses/Apache-2.0.html" |
| 📦 detailsUrl | string or null | "https://spdx.org/licenses/Apache-2.0.json" |
| 🔁 seeAlso | array or null | ["http://www.apache.org/licenses/LICENSE-2.0"] |
| 📜 standardLicenseTemplate | string or null | "<<beginOptional>>Apache License..." |
| 📄 licenseText | string or null | Full license body |
| 🏷️ listVersion | string or null | "3.27.0" |
| 🕒 scrapedAt | string (ISO 8601) | "2026-05-23T00:00:00.000Z" |
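For typed pipelines, the 13 fields above can be written down as a Python `TypedDict`, and an allowlist audit becomes a short loop over your dependencies. This is a sketch under assumptions: the `LicenseRecord` type is derived from the schema table, while the `audit` function, its reason strings, and the idea of passing dependencies as a package-to-license mapping are illustrative and not part of the Actor.

```python
from typing import Dict, List, Optional, Set, Tuple, TypedDict


class LicenseRecord(TypedDict):
    """One dataset item, following the schema table."""
    licenseId: str
    name: str
    kind: str  # "license" or "exception"
    isOsiApproved: Optional[bool]
    isFsfLibre: Optional[bool]
    isDeprecatedLicenseId: Optional[bool]
    referenceUrl: Optional[str]
    detailsUrl: Optional[str]
    seeAlso: Optional[List[str]]
    standardLicenseTemplate: Optional[str]
    licenseText: Optional[str]
    listVersion: Optional[str]
    scrapedAt: str  # ISO 8601 timestamp


def audit(
    dependencies: Dict[str, str],
    allowlist: Set[str],
    catalog: Dict[str, LicenseRecord],
) -> List[Tuple[str, str, str]]:
    """Return (package, licenseId, reason) for every dependency that fails a check.

    `dependencies` maps a package name to its declared SPDX identifier;
    `catalog` maps licenseId to the scraped record. Both shapes are assumptions.
    """
    findings: List[Tuple[str, str, str]] = []
    for package, license_id in dependencies.items():
        record = catalog.get(license_id)
        if record is None:
            findings.append((package, license_id, "not in catalog"))
        elif record["isDeprecatedLicenseId"]:
            findings.append((package, license_id, "deprecated SPDX identifier"))
        elif license_id not in allowlist:
            findings.append((package, license_id, "not on allowlist"))
    return findings
```

The nullable flags are kept as `Optional[bool]` so that a missing OSI or FSF status stays distinguishable from an explicit `false`.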
📦 Sample records
✨ Why choose this Actor
| | Capability |
|---|---|
| 📜 | Complete coverage. All 729 SPDX licenses and 72 license-applicable exceptions in a single pull. |
| ⚖️ | OSI + FSF flags. Filter by OSI Approved, FSF Libre, or deprecated status in one click. |
| 📄 | Full license text. Body text plus the SPDX standard template for exact-match scanning. |
| 🔗 | Cross-references. seeAlso URLs plug into your existing license-source tooling. |
| 🔁 | Always fresh. Every run pulls the latest published SPDX list version. |
| ⚡ | Fast. 50 licenses with full text in under a minute. |
| 🚫 | No authentication. Works with public open-source licensing data. No login, no key. |
📊 The SPDX License List is cited by GitHub, npm, Maven, PyPI, RubyGems, the Linux Foundation, and almost every SBOM standard (CycloneDX, SPDX SBOMs, SWID).
📈 How it compares to alternatives
| Approach | Cost | Coverage | Refresh | Filters | Setup |
|---|---|---|---|---|---|
| ⭐ SPDX Software Licenses Scraper (this Actor) | $5 free credit, then pay-per-use | 729 + 72 | Live per run | OSI / FSF / deprecated / text | ⚡ 2 min |
| Hand-cloned SPDX repo | Free | Manual upkeep | Git pull required | None | 🐢 Hours |
| Commercial license scanners | $$$/seat | Bundled | Vendor cycle | Vendor-defined | ⏳ Days |
| Wikipedia license tables | Free | Partial, stale | Sporadic | None | 🕒 Variable |
Pick this Actor when you want the canonical SPDX dataset, downloadable as CSV / Excel / JSON / XML, with no pipeline maintenance.
🚀 How to use
- 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
- 🌐 Open the Actor. Go to the SPDX Software Licenses Scraper page on the Apify Store.
- 🎯 Set input. Pick a mode (licenses / exceptions / single), apply optional OSI / FSF filters, and set maxItems.
- 🚀 Run it. Click Start and let the Actor collect your data.
- 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.
⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.
💼 Business use cases
🔌 Automating the SPDX Software Licenses Scraper
Control the scraper programmatically for scheduled refreshes and pipeline integrations:
- 🟢 Node.js. Install the apify-client NPM package.
- 🐍 Python. Use the apify-client PyPI package.
- 📚 See the Apify API documentation for full details.
The Apify Schedules feature lets you trigger this Actor on any cron interval. Weekly or monthly refreshes keep your internal license database in sync with the canonical SPDX list.
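A programmatic run with the Python apify-client package might look like the sketch below. The Actor ID shown is a hypothetical placeholder (copy the real one from the Actor's Store page), and the function names are illustrative. The client calls used (`ApifyClient`, `actor(...).call(run_input=...)`, `dataset(...).iterate_items()`) follow the apify-client documentation, with the run treated as a dictionary as in apify-client 1.x.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical Actor ID: replace with the ID shown on the Actor's Apify Store page.
ACTOR_ID = "parseforge/spdx-software-licenses-scraper"


def fetch_records(
    token: str, run_input: Dict[str, Any], actor_id: str = ACTOR_ID
) -> List[Dict[str, Any]]:
    """Start the Actor, wait for the run to finish, and return its dataset items."""
    # Imported lazily so the helper below works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def index_by_license_id(records: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Build a licenseId -> record lookup for an internal license database."""
    return {record["licenseId"]: record for record in records}
```

A scheduled job could call `fetch_records(token, {"mode": "licenses", "maxItems": 1000, "includeText": False})` and store the result of `index_by_license_id` as its cache. The token should come from an environment variable or secret store rather than source code.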
🌟 Beyond business use cases
Open-source licensing data powers more than commercial workflows. The same structured records support research, education, civic transparency, and personal initiatives.
🤖 Ask an AI assistant about this scraper
Open a ready-to-send prompt about this ParseForge actor in the AI of your choice:
- 💬 ChatGPT
- 🧠 Claude
- 🔍 Perplexity
- 🅒 Copilot
❓ Frequently Asked Questions
🧩 How does it work?
Pick a mode (licenses, exceptions, or single), apply optional OSI / FSF / deprecation filters, and click Start. The Actor pulls the canonical SPDX list, optionally fetches per-license detail bodies, and emits one structured record per license.
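The filter step can be pictured with a short sketch. It assumes the filters combine with AND, that deprecated entries are dropped unless the deprecated flag is set, and that maxItems is applied after filtering. These semantics are inferred from the input table, not confirmed from the Actor's source, and the function name `select_records` is illustrative. The same function is handy for re-filtering a cached full export locally.

```python
from typing import Any, Dict, Iterable, List


def select_records(
    entries: Iterable[Dict[str, Any]],
    *,
    osi_approved: bool = False,
    fsf_libre: bool = False,
    deprecated: bool = False,
    max_items: int = 10,
) -> List[Dict[str, Any]]:
    """Apply the OSI / FSF / deprecated filters, then cap the result at max_items."""
    selected: List[Dict[str, Any]] = []
    for entry in entries:
        if len(selected) >= max_items:
            break
        # A null flag counts as "not approved" here (assumption).
        if osi_approved and entry.get("isOsiApproved") is not True:
            continue
        if fsf_libre and entry.get("isFsfLibre") is not True:
            continue
        # Deprecated identifiers are skipped unless explicitly included.
        if not deprecated and entry.get("isDeprecatedLicenseId") is True:
            continue
        selected.append(entry)
    return selected
```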
📏 How accurate is the data?
This Actor mirrors the canonical SPDX License List exactly. SPDX IDs, names, OSI / FSF flags, and license text all come straight from the upstream catalog without modification.
🔁 How often is the dataset refreshed?
The SPDX legal team publishes new versions regularly. Every run of this Actor pulls the latest version, so your dataset always reflects the current listVersion.
⚖️ Does it include license-applicable exceptions?
Yes. Set mode to exceptions to pull all 72 license-applicable exceptions (Classpath, GCC Runtime, Autoconf, Bison, Font, and more).
📄 Can I get the full license text?
Yes. The Include full license text toggle is on by default. Each record includes both the standard license template (used for canonical matching) and the human-readable body text.
⏰ Can I schedule regular runs?
Yes. Use Apify Schedules to refresh your internal SPDX cache on a weekly or monthly cron, so your compliance pipeline always reflects the current list version.
⚖️ Is this data legal to use?
SPDX publishes the License List under a permissive open license (CC0-1.0). You can use, redistribute, and embed the dataset in your own products without restriction.
💼 Can I use this data commercially?
Yes. The underlying SPDX License List is published under CC0-1.0, which permits commercial use. You are responsible for complying with the licenses you discover in your own dependency tree.
💳 Do I need a paid Apify plan to use this Actor?
No. The free Apify plan is enough for testing and small runs (10 records per run). A paid plan lifts the limit and gives you access to scheduling, higher concurrency, and larger datasets.
🔁 What happens if a run fails or gets interrupted?
Apify automatically retries transient errors. If a run still fails, you can inspect the log in the Runs tab, fix the input, and re-run. Partial datasets from failed runs are preserved so you never lose progress.
🆘 What if I need help?
Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.
🔌 Integrate with any app
SPDX Software Licenses Scraper connects to any cloud service via Apify integrations:
- Make - Automate multi-step workflows
- Zapier - Connect with 5,000+ apps
- Slack - Get run notifications in your channels
- Airbyte - Pipe SPDX records into your warehouse
- GitHub - Trigger runs from commits and releases
- Google Drive - Export datasets straight to Sheets
You can also use webhooks to trigger downstream actions when a run finishes. Push a fresh license catalog into your CI license-check, or alert your compliance team in Slack on every new SPDX version.
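To alert only when SPDX publishes a new list, a webhook handler can compare the listVersion of the latest run against the one it stored previously. The sketch below assumes listVersion is a dotted numeric string such as "3.27.0" (the format shown in the schema example); the function names are illustrative.

```python
from typing import Optional, Tuple


def parse_list_version(version: str) -> Tuple[int, ...]:
    """Turn "3.27.0" into (3, 27, 0) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def is_newer(current: Optional[str], previous: Optional[str]) -> bool:
    """Return True when the current run reports a newer SPDX list version."""
    if current is None:
        # listVersion is nullable in the schema; without it there is nothing to compare.
        return False
    if previous is None:
        # First run: treat any known version as new.
        return True
    return parse_list_version(current) > parse_list_version(previous)
```

Comparing parsed tuples avoids the string-comparison trap where "3.9.0" would sort after "3.27.0".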
🔗 Recommended Actors
- 📦 PyPI Package Scraper - Python package metadata for SBOM enrichment
- 📦 npm Package Scraper - JavaScript registry data and dependency trees
- 🐙 GitHub Repos Scraper - Repository metadata including license info
- 🔐 CVE Vulnerabilities Scraper - Security vulnerability database for risk scoring
- 🌐 OSV Vulnerability Scraper - Open-source vulnerability database
💡 Pro Tip: browse the complete ParseForge collection for more developer-data scrapers.
🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.
⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the SPDX project, the Linux Foundation, the OSI, or the FSF. All trademarks mentioned are the property of their respective owners. Only publicly available open license data is collected.