W3C Standards Catalog Scraper

Pricing

from $13.00 / 1,000 result items

Scrape W3C standards catalog: title, status, type, date, editors, abstract, shortname, group, deliverer, errata, and specification URL. Covers Recommendations, Working Drafts, Notes, and Candidate Recommendations. Export web standards to JSON, CSV, or Excel for developer tooling.

Developer: ParseForge (maintained by Community)

📐 W3C Standards Catalog Scraper

🚀 Export the full W3C Web standards catalog in seconds. Pull 1,696 specifications including HTML, CSS, ARIA, WebSocket, Web Components, and every other open Web standard with maturity status, deliverers, and full version history.

🕒 Last updated: 2026-05-23 · 📊 15 fields per record · 📚 1,696 specifications · 🏛️ All W3C working groups · 🔖 9 maturity levels

The W3C Standards Catalog Scraper exports the official W3C specifications corpus, returning 15 fields per record, including shortname, title, maturity status, description, latest version URL, first version URL, working-group deliverer shortnames, and full version history when requested. The dataset is the authoritative catalog of Web standards published by the World Wide Web Consortium since 1994.

The catalog covers 1,696 specifications across HTML, CSS, the DOM, Web APIs, ARIA accessibility standards, WebSocket, Web Components, payment APIs, internationalization, security, privacy, and dozens of other working groups. A second mode enumerates W3C working groups and community groups themselves, returning the org chart of the open Web.

| 🎯 Target Audience | 💡 Primary Use Cases |
| --- | --- |
| Web developers, browser engineers, standards researchers, accessibility auditors, technical writers, conformance teams, framework authors | Conformance audits, "supported standards" dashboards, browser feature trackers, accessibility coverage, framework spec mapping, standards research |

📋 What the W3C Standards Catalog Scraper does

Four capabilities in a single Actor:

  • 📚 Full specifications catalog. Every W3C spec from Recommendation to Working Draft to Retired, with shortname, title, status, and links.
  • 🏛️ Working groups directory. Switch to mode: "groups" to enumerate the W3C organisational chart of working groups and community groups.
  • 🔖 Status and group filters. Narrow to one maturity level (such as Recommendation, Candidate Recommendation, Working Draft, Group Note, Retired, Superseded, Rescinded, or Proposed Recommendation) or to one working-group shortname (such as css, webapps, html, or aria).
  • 🗂️ Optional version history. Toggle includeVersions to pull the per-spec version list with one extra call per record.

Each record carries the canonical shortname, the human title, the maturity status, the editor's draft URL, the latest and first version URLs, the deliverers (working-group shortnames), and a stable API URL back to the W3C catalog.

💡 Why it matters: the Web is an open platform because standards are public, traceable, and versioned. Building a conformance, browser-tracker, or framework dashboard around them means parsing inconsistent HTML, scraping multiple pages, and stitching the org chart together by hand. This Actor gives you the structured catalog in one call.


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to filter by working group and export the catalog as JSON.


⚙️ Input

| Input | Type | Default | Behavior |
| --- | --- | --- | --- |
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
| mode | string | "specifications" | "specifications" for standards, "groups" for working groups. |
| status | string | "" | One of 9 maturity levels. Empty = any. |
| groupShortname | string | "" | Filter to one group shortname (e.g. css, html, aria). |
| includeVersions | boolean | false | When true, pulls per-spec version history. Adds ~1 extra call per record. |

Example: every CSS Working Group specification with version history.

{
  "maxItems": 200,
  "mode": "specifications",
  "groupShortname": "css",
  "includeVersions": true
}

Example: all current Recommendations across W3C.

{
  "maxItems": 500,
  "mode": "specifications",
  "status": "Recommendation"
}
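
The same inputs can be assembled in code before a run. This is a minimal sketch that assumes only the field names from the input table above; `build_input` is a hypothetical helper for illustration, not part of the Actor or of any Apify library.

```python
import json
from typing import Any

VALID_MODES = {"specifications", "groups"}


def build_input(
    max_items: int = 10,
    mode: str = "specifications",
    status: str = "",
    group_shortname: str = "",
    include_versions: bool = False,
) -> dict[str, Any]:
    """Build an input object using the field names from the input table."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if max_items < 1:
        raise ValueError("max_items must be a positive integer")

    run_input: dict[str, Any] = {"maxItems": max_items, "mode": mode}
    # Empty filters mean "any" in the input table, so they are simply omitted.
    if status:
        run_input["status"] = status
    if group_shortname:
        run_input["groupShortname"] = group_shortname
    if include_versions:
        run_input["includeVersions"] = True
    return run_input


# Reproduces the CSS Working Group example.
css_input = build_input(max_items=200, group_shortname="css", include_versions=True)
print(json.dumps(css_input, indent=2))
```

Validating `mode` locally catches a typo before a run is started; the status values are not validated here because the full list is not spelled out in the input table.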

⚠️ Good to Know: version history is fetched on demand. Pulling 1,000 specs with includeVersions: true doubles the call count and runtime. Leave it off for the catalog overview, turn it on for archival use cases.


📊 Output

Each record contains 15 fields. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

| Field | Type | Example |
| --- | --- | --- |
| 🆔 shortname | string or null | "css-color-4" |
| 📜 title | string or null | "CSS Color Module Level 4" |
| 🔖 status | string or null | "Candidate Recommendation" |
| 📝 description | string or null | "This module describes CSS color values..." |
| 🗂️ seriesShortname | string or null | "css-color" |
| 🔢 seriesVersion | string or null | "4" |
| ✏️ editorDraftUrl | string or null | "https://drafts.csswg.org/css-color/" |
| 🔗 shortlink | string or null | "https://www.w3.org/TR/css-color-4/" |
| 🆕 latestVersionUrl | string or null | "https://www.w3.org/TR/2024/CR-css-color-4-20240314/" |
| 🥇 firstVersionUrl | string or null | "https://api.w3.org/specifications/css-color-4/versions/1" |
| 🏛️ groupShortnames | string[] or null | ["css"] |
| 📚 versionsCount | number or null | 12 |
| 📑 versionHistory | string[] or null | array of version URLs |
| 🔌 apiUrl | string | "https://api.w3.org/specifications/css-color-4" |
| 🕒 scrapedAt | ISO 8601 | "2026-05-23T00:00:00.000Z" |
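
When consuming the dataset in code, it can help to map each item onto a typed record so nullable fields are handled in one place. The sketch below covers a subset of the schema fields and uses the example values from the table; the `SpecRecord` class and its `from_item` method are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SpecRecord:
    """A subset of the schema's fields; nullable fields default to None."""

    api_url: str
    shortname: Optional[str] = None
    title: Optional[str] = None
    status: Optional[str] = None
    latest_version_url: Optional[str] = None
    group_shortnames: list[str] = field(default_factory=list)
    versions_count: Optional[int] = None

    @classmethod
    def from_item(cls, item: dict[str, Any]) -> "SpecRecord":
        return cls(
            api_url=item["apiUrl"],  # the only field the schema lists as non-null
            shortname=item.get("shortname"),
            title=item.get("title"),
            status=item.get("status"),
            latest_version_url=item.get("latestVersionUrl"),
            # Normalise a null list to an empty one so callers can iterate safely.
            group_shortnames=item.get("groupShortnames") or [],
            versions_count=item.get("versionsCount"),
        )


# Item assembled from the example column of the schema table.
item = {
    "shortname": "css-color-4",
    "title": "CSS Color Module Level 4",
    "status": "Candidate Recommendation",
    "latestVersionUrl": "https://www.w3.org/TR/2024/CR-css-color-4-20240314/",
    "groupShortnames": ["css"],
    "versionsCount": 12,
    "apiUrl": "https://api.w3.org/specifications/css-color-4",
}
record = SpecRecord.from_item(item)
print(record.shortname, record.status, record.group_shortnames)
```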


✨ Why choose this Actor

  • 📚 Full catalog. 1,696 specifications across every W3C working group.
  • 🔖 Maturity filters. Slice by Recommendation, Candidate Recommendation, Working Draft, Group Note, Retired, Superseded, Rescinded, Proposed Recommendation.
  • 🏛️ Two modes. Specifications or working groups. Run both to map the open Web's org chart.
  • 📑 Version history. Optional per-spec version trail so you can build archival dashboards.
  • 🔌 Stable identifiers. Shortname plus apiUrl gives you durable joins back to the W3C source.
  • ⚡ Fast. 10 specifications in under 15 seconds.
  • 🚫 No authentication. Public W3C API. No login or token needed.

📊 The open Web runs on these specs. A clean, queryable copy of the catalog is the foundation of every conformance tracker, browser feature dashboard, and accessibility audit.


📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
| --- | --- | --- | --- | --- | --- |
| ⭐ W3C Standards Catalog Scraper (this Actor) | $5 free credit, then pay-per-use | 1,696 specs | Live per run | status, group, mode, versions | ⚡ 2 min |
| W3C TR/ index by hand | Free | All published | Manual | None | 🐢 Days to parse |
| MDN BCD data | Free | Browser-feature focused | Quarterly | Some | ⏳ Different shape |
| Static caniuse export | Free | Browser-support focused | Periodic | Some | 🕒 Different shape |

Pick this Actor when you need a structured catalog of W3C specifications themselves, not browser support data.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the W3C Standards Catalog Scraper page on the Apify Store.
  3. 🎯 Set input. Pick a mode (specifications or groups), optionally filter by status or group, and set maxItems.
  4. 🚀 Run it. Click Start and let the Actor collect your data.
  5. 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to a downloaded catalog: 3-5 minutes. No coding required.


💼 Business use cases

🧭 Browser & Framework Engineering

  • Track which specs your engine implements
  • Compare your framework's coverage against the corpus
  • Spec adoption dashboards by maturity level
  • Detect new Working Drafts as they appear

♿ Accessibility & Conformance

  • Audit ARIA spec coverage in your component library
  • WCAG and accessibility tooling source-of-truth refresh
  • Conformance dashboards for procurement teams
  • Internal "supported standards" pages

📚 Standards Research

  • Trace spec lineage with version history
  • Working-group org charts for academic citations
  • Standards-adoption timelines by group
  • Cross-reference deliverers and specs
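
Cross-referencing deliverers and specs amounts to inverting the `groupShortnames` field of each record. The sketch below is one way to do that; `specs_by_group` is a hypothetical helper, and the sample records are made up for illustration rather than taken from a real run.

```python
from collections import defaultdict
from typing import Any


def specs_by_group(records: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Map each working-group shortname to the spec shortnames it delivers."""
    index: dict[str, list[str]] = defaultdict(list)
    for record in records:
        shortname = record.get("shortname")
        if shortname is None:
            continue  # shortname is nullable in the schema
        # groupShortnames is nullable too; treat null as "no deliverers".
        for group in record.get("groupShortnames") or []:
            index[group].append(shortname)
    return {group: sorted(names) for group, names in sorted(index.items())}


# Illustrative records only, not real Actor output.
sample_records = [
    {"shortname": "spec-one", "groupShortnames": ["css"]},
    {"shortname": "spec-two", "groupShortnames": ["css", "aria"]},
    {"shortname": "spec-three", "groupShortnames": None},
]
group_index = specs_by_group(sample_records)
print(group_index)
```

A spec with several deliverers appears under each of them, which is usually what an org-chart view needs.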

📰 Technical Writing & DevRel

  • Auto-update docs links to latest spec versions
  • Generate "see also" links across related specs
  • Build internal style guides anchored to specs
  • Newsletter content on standards updates

🔌 Automating W3C Standards Catalog Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Weekly catalog refreshes are common for browser-feature trackers and accessibility tooling.
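
A programmatic run with the Python client might look like the sketch below. It assumes the `apify-client` package is installed and that `call()` returns a dictionary containing `defaultDatasetId`, which may differ between client versions. The environment variable names `APIFY_TOKEN` and `ACTOR_ID` are conventions chosen for this example, and the Actor's ID is deliberately read from the environment because it is not stated on this page.

```python
import os
from typing import Any


def run_and_collect(client: Any, actor_id: str, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Start the Actor, wait for it to finish, and return its dataset items."""
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def main() -> None:
    # Imported here so the helper above stays usable without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    items = run_and_collect(
        client,
        os.environ["ACTOR_ID"],  # copy the Actor ID from its Apify Store page
        {"maxItems": 10, "mode": "specifications", "status": "Recommendation"},
    )
    for item in items:
        print(item.get("shortname"), item.get("status"))


# Only attempts a real run when both variables are set.
if __name__ == "__main__" and os.environ.get("APIFY_TOKEN") and os.environ.get("ACTOR_ID"):
    main()
```

Passing the client into `run_and_collect` keeps the helper easy to exercise with a stub in tests, without network access or a token.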


🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Web-standards lineage studies for HCI and CS papers
  • Standards-process research with reproducible pulls
  • Coursework on open Web architecture
  • Open-source contribution dashboards

🎨 Personal and creative

  • Personal "Web platform features I love" sites
  • Reference cards and pocket guides for indie devs
  • Build a personal spec-tracker dashboard
  • Hobby projects mapping the open Web

🤝 Non-profit and civic

  • Accessibility advocacy with conformance reports
  • Open-Web preservation projects
  • Standards transparency dashboards
  • Educational outreach about Web governance

🧪 Experimentation

  • Train spec-classification models
  • Prototype agent pipelines that read W3C docs
  • Test "what changed since last quarter" workflows
  • Build embeddable spec lookup widgets


❓ Frequently Asked Questions

🧩 How does it work?

Pick a mode, optionally set a status or group filter, and click Start. The Actor walks the W3C catalog page by page and emits a clean structured record per specification or per working group.
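
The page-by-page walk can be pictured as a loop that requests successive pages until either the catalog ends or `maxItems` is reached. The sketch below is not the Actor's implementation: the page shape (`items` plus a total `pages` count) and the `fetch_page` callable are assumptions made for illustration, and the demo uses an in-memory stand-in instead of the W3C API.

```python
from typing import Any, Callable, Iterator

Page = dict[str, Any]


def iter_catalog(fetch_page: Callable[[int], Page], max_items: int) -> Iterator[dict[str, Any]]:
    """Yield items page by page, stopping at max_items or the last page."""
    emitted = 0
    page_number = 1
    while emitted < max_items:
        page = fetch_page(page_number)
        for item in page["items"]:
            if emitted >= max_items:
                return
            yield item
            emitted += 1
        if page_number >= page["pages"]:
            return  # no more pages to request
        page_number += 1


# In-memory stand-in for a paged catalog: 3 pages of 2 items each.
_FAKE_PAGES = {
    1: ["s1", "s2"],
    2: ["s3", "s4"],
    3: ["s5", "s6"],
}


def fake_fetch(page_number: int) -> Page:
    names = _FAKE_PAGES[page_number]
    return {"items": [{"shortname": n} for n in names], "pages": len(_FAKE_PAGES)}


first_five = [item["shortname"] for item in iter_catalog(fake_fetch, max_items=5)]
print(first_five)
```

Because the generator stops as soon as the cap is hit, a small `maxItems` never triggers requests for pages it does not need.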

📚 Is the dataset complete?

The W3C catalog reports 1,696 specifications at the time of writing. The Actor pages through the entire catalog when no filters are set and maxItems is high enough.

🔖 What maturity levels are supported?

Recommendation, Proposed Recommendation, Candidate Recommendation, Working Draft, Group Note, Retired, Superseded Recommendation, and Rescinded Recommendation. Filter to one or leave the field empty for the full catalog.

🏛️ Can I get only one working group's specs?

Yes. Set groupShortname to the group's shortname (for example css, html, aria, webapps). The Actor resolves the deliverers for each spec and filters server-side.

📑 Should I enable version history?

Only when you need it. Each version pull adds one extra call per record. For a catalog overview, leave it off. For an archival dataset, turn it on.

⏰ Can I schedule regular runs?

Yes. Use Apify Schedules to refresh the catalog weekly or monthly into a downstream dashboard.

⚖️ Can I reuse the catalog data?

Yes. The W3C catalog is published under terms that permit reuse. The specs themselves are open standards, freely available for reading and implementation.

💼 Can I use this commercially?

Yes. The Actor returns metadata about open Web standards. Commercial conformance dashboards, browser-feature trackers, and accessibility tooling are all valid use cases.

💳 Do I need a paid Apify plan?

No. The free Apify plan is enough for testing and small runs (10 records per run). A paid plan lifts the limit and gives you access to scheduling, higher concurrency, and larger catalog pulls.

🔁 What happens if a run fails partway through?

Apify retries transient errors automatically. Records already pushed to the dataset are preserved, so a re-run picks up cleanly with the same input.

🆘 What if I need help?

Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.


🔌 Integrate with any app

W3C Standards Catalog Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe spec data into your warehouse
  • GitHub - Trigger runs from repo commits
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to fire downstream actions when a run finishes. Push a fresh standards catalog into your conformance dashboard, or alert your team in Slack when a new Working Draft drops.
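
To alert on new or changed specs, a downstream step can compare the latest run's records with the previous run's. The sketch below keys both snapshots by `shortname` and reports additions, removals, and status changes; `diff_snapshots` is a hypothetical helper and the two snapshots are made-up illustrative data.

```python
from typing import Any


def diff_snapshots(
    previous: list[dict[str, Any]], current: list[dict[str, Any]]
) -> dict[str, Any]:
    """Compare two catalog snapshots keyed by shortname."""
    prev = {r["shortname"]: r.get("status") for r in previous if r.get("shortname")}
    curr = {r["shortname"]: r.get("status") for r in current if r.get("shortname")}
    return {
        "added": sorted(set(curr) - set(prev)),
        "removed": sorted(set(prev) - set(curr)),
        # Maps shortname to (old status, new status) for specs present in both runs.
        "status_changed": {
            name: (prev[name], curr[name])
            for name in sorted(set(prev) & set(curr))
            if prev[name] != curr[name]
        },
    }


# Illustrative snapshots only, not real Actor output.
last_week = [
    {"shortname": "spec-a", "status": "Working Draft"},
    {"shortname": "spec-b", "status": "Working Draft"},
]
this_week = [
    {"shortname": "spec-a", "status": "Candidate Recommendation"},
    {"shortname": "spec-c", "status": "Working Draft"},
]
changes = diff_snapshots(last_week, this_week)
print(changes)
```

Filtering `added` to records whose status is Working Draft would give the "new Working Draft" alert described above.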


💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the W3C or its member organisations. All trademarks mentioned are the property of their respective owners. Only publicly available W3C catalog data is collected.