Schema Markup Validator

Pricing

$0.05 / 1,000 validated pages

Validate schema markup on public pages. Extract JSON-LD, Microdata, RDFa, Open Graph, Twitter Cards, meta tags, schema.org types, issue counts, and rich-result readiness signals.

Developer

Maxime Dupré

Maintained by Community

🔎 Schema markup validator for structured data

Schema Markup Validator checks public web pages for structured data and returns one page audit per successfully fetched URL. Add page URLs such as https://schema.org/Article, choose whether to audit only the submitted URLs or follow same-site links, and get JSON-LD, Microdata, RDFa, schema.org types, Open Graph, Twitter Cards, meta tags, validation issues, and rich-result readiness signals in the dataset.

Use this structured data validator when you need to debug rich-result markup, compare pages during an SEO audit, check JSON-LD syntax, or collect schema evidence before a release. The Actor runs on public pages only: it does not need cookies, website credentials, or API keys for the audited site, and you do not need a separate account.

✅ What this Actor does

  • Accepts public page URLs in a batch.
  • Can check only the submitted URLs or follow same-site links for a small site audit.
  • Extracts JSON-LD blocks and checks JSON syntax, schema.org context, and schema types.
  • Extracts Microdata and RDFa items when the audit focus includes schema.org markup.
  • Extracts Open Graph, Twitter Card, canonical, and core meta tag data in the full audit.
  • Reports detected schema.org types, structured-data counts, issue counts, and issue details.
  • Adds transparent rich-result readiness signals for common types such as Article, Product, Recipe, FAQPage, HowTo, Event, Organization, and LocalBusiness.
  • Saves one dataset row per successfully fetched page audit.

This Actor is focused on schema markup validation and structured-data extraction. It is not a Lighthouse audit, page-speed checker, broken-link crawler, sitemap indexability audit, or full technical SEO scanner.
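
To make the JSON-LD step above concrete, here is a minimal Python sketch of that kind of check: collect every JSON-LD script block from fetched HTML, then test JSON syntax, the schema.org context, and the declared types. It is an illustration using only the standard library, not the Actor's actual implementation. The function name `audit_json_ld` and the issue codes `invalid-json` and `context-not-schema-org` are hypothetical; the result keys loosely follow the `jsonLd` entries shown in the output example.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buffer = None  # text chunks gathered while inside a JSON-LD script

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def audit_json_ld(html):
    """Return one summary dict per JSON-LD block (hypothetical helper)."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    results = []
    for index, raw in enumerate(collector.blocks, start=1):
        block = {"index": index, "valid": True, "context": None, "types": [], "issues": []}
        results.append(block)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            block["valid"] = False
            block["issues"].append(f"invalid-json: {exc.msg}")  # hypothetical code
            continue
        nodes = data if isinstance(data, list) else [data]
        for node in nodes:
            if not isinstance(node, dict):
                continue
            context = node.get("@context")
            if block["context"] is None:
                block["context"] = context
            # Simplification: only string contexts are recognised here.
            if not (isinstance(context, str) and "schema.org" in context):
                block["issues"].append("context-not-schema-org")  # hypothetical code
            declared = node.get("@type")
            if isinstance(declared, str):
                block["types"].append(declared)
            elif isinstance(declared, list):
                block["types"].extend(t for t in declared if isinstance(t, str))
    return results
```

The sketch ignores `@graph` containers and object-valued contexts, which a real validator would need to handle.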

📊 Data you get

Each dataset row is one successful page audit. Rows can include:

  • url, title, canonicalUrl, statusCode, contentType, and crawlDepth
  • schemaTypes found across JSON-LD, Microdata, and RDFa
  • markupSummary with counts for JSON-LD blocks, Microdata items, RDFa items, Open Graph properties, Twitter Card properties, and meta tags
  • validationStatus and issueCounts for quick filtering
  • issues with severity, code, message, format, schema type, property, and evidence
  • jsonLd parsed blocks with validity, context, detected types, source data, and block-level issues
  • microdata and rdfa extracted items with types, properties, and item issues
  • metadata with Open Graph, Twitter Card, and meta tag values
  • richResults with readiness level, reasons, candidate types, missing fields, and issue codes

You can export the dataset as JSON, CSV, Excel, XML, RSS, or HTML, or read the same rows through the Apify API, schedules, webhooks, and integrations.
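
If you work with the JSON export, a small script can turn the nested rows into a flat issue list for a spreadsheet. The sketch below assumes the rows have already been loaded as Python dictionaries with the fields listed above; `issues_to_csv` is a hypothetical helper name, and the column selection is only one reasonable choice.

```python
import csv
import io


def issues_to_csv(rows):
    """Return CSV text with one line per issue across all dataset rows."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["url", "validationStatus", "severity", "code",
                     "schemaType", "property", "message"])
    for row in rows:
        for issue in row.get("issues", []):
            writer.writerow([
                row.get("url"),
                row.get("validationStatus"),
                issue.get("severity"),
                issue.get("code"),
                issue.get("schemaType"),
                issue.get("property"),
                issue.get("message"),
            ])
    return out.getvalue()
```

Pages without issues produce no lines, so the result contains only the findings that need review.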

🚀 How to run it

  1. Add one or more public page URLs in Page URLs.
  2. Keep Audit focus on Full structured-data audit for the broadest output.
  3. Choose Submitted URLs only for exact page checks.
  4. Choose Follow same-site links when you want a bounded same-site schema audit.
  5. Set Maximum pages to control output size and cost.
  6. Run the Actor and open the dataset.

For a quick first run, keep the prefilled schema.org URLs. They are public pages with structured data, so you can inspect the output shape quickly before adding your own website.

⚙️ Input example

{
  "startUrls": [
    { "url": "https://schema.org/Article" },
    { "url": "https://schema.org/Product" }
  ],
  "auditFocus": "full",
  "crawlScope": "submittedUrls",
  "maxPages": 25
}

🎯 Audit focus

Use full when you want JSON-LD, Microdata, RDFa, Open Graph, Twitter Cards, canonical, and meta tags. Use schemaOrg when you only want schema.org markup surfaces. Use jsonLd for a focused JSON-LD validator run.

🧭 Crawl scope

Use submittedUrls when your input list already contains every page you want to check. Use sameSite when one submitted page should discover more pages on the same website. maxPages caps the total successful page audits saved by the run.
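
When you generate inputs from a script, it can help to check the option values before starting a run. The following Python sketch builds an input object in the shape of the input example and rejects values other than the ones named in the two sections above. The helper `build_input` is hypothetical, and its defaults simply mirror the input example; they are not confirmed Actor defaults.

```python
from urllib.parse import urlsplit

# Values named in the Audit focus and Crawl scope sections.
AUDIT_FOCUS = {"full", "schemaOrg", "jsonLd"}
CRAWL_SCOPE = {"submittedUrls", "sameSite"}


def build_input(urls, audit_focus="full", crawl_scope="submittedUrls", max_pages=25):
    """Build an Actor input dict, raising ValueError on unsupported values."""
    if audit_focus not in AUDIT_FOCUS:
        raise ValueError(f"auditFocus must be one of {sorted(AUDIT_FOCUS)}")
    if crawl_scope not in CRAWL_SCOPE:
        raise ValueError(f"crawlScope must be one of {sorted(CRAWL_SCOPE)}")
    if not isinstance(max_pages, int) or max_pages < 1:
        raise ValueError("maxPages must be a positive integer")
    if not urls:
        raise ValueError("at least one page URL is required")
    for url in urls:
        # Pages must be reachable over http or https.
        if urlsplit(url).scheme not in ("http", "https"):
            raise ValueError(f"not an http(s) URL: {url}")
    return {
        "startUrls": [{"url": url} for url in urls],
        "auditFocus": audit_focus,
        "crawlScope": crawl_scope,
        "maxPages": max_pages,
    }
```

For example, `build_input(["https://schema.org/Article"], crawl_scope="sameSite", max_pages=50)` describes a bounded same-site audit starting from one page.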

📦 Output example

{
  "url": "https://schema.org/Article",
  "title": "Article - Schema.org Type",
  "canonicalUrl": "https://schema.org/Article",
  "statusCode": 200,
  "contentType": "text/html",
  "crawlDepth": 0,
  "schemaTypes": ["Article", "WebPage"],
  "markupSummary": {
    "hasStructuredData": true,
    "jsonLdBlocks": 1,
    "microdataItems": 0,
    "rdfaItems": 0,
    "openGraphProperties": 0,
    "twitterCardProperties": 0,
    "metaTags": 2
  },
  "validationStatus": "warning",
  "issueCounts": {
    "errors": 0,
    "warnings": 2,
    "info": 0
  },
  "issues": [
    {
      "severity": "warning",
      "code": "missing-recommended-property",
      "message": "Article is missing the recommended image property.",
      "format": null,
      "schemaType": "Article",
      "property": "image",
      "evidence": null
    }
  ],
  "jsonLd": [
    {
      "index": 1,
      "valid": true,
      "context": "https://schema.org",
      "types": ["Article"],
      "data": {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": "Schema.org Article"
      },
      "issues": []
    }
  ],
  "microdata": [],
  "rdfa": [],
  "metadata": {
    "openGraph": {},
    "twitterCard": {},
    "metaTags": {
      "description": "Schema.org page description"
    }
  },
  "richResults": {
    "readiness": {
      "level": "needsFixes",
      "reasons": ["Article is missing recommended image."]
    },
    "candidates": [
      {
        "type": "Article",
        "eligible": true,
        "requiredMissing": [],
        "recommendedMissing": ["image"],
        "issueCodes": ["missing-recommended-property"]
      }
    ]
  }
}
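
A row shaped like the example above is easy to triage in code. This Python sketch reads the `issueCounts` and `richResults` fields to list missing fields per candidate type and to flag rows that likely need work before a release. Both function names are hypothetical, and the rule used in `needs_attention` (any error, or any missing required field) is an assumption made for illustration, not the Actor's own readiness logic.

```python
def missing_fields_by_type(row):
    """Map each rich-result candidate type to its missing fields."""
    summary = {}
    for candidate in row.get("richResults", {}).get("candidates", []):
        summary[candidate["type"]] = {
            "required": list(candidate.get("requiredMissing", [])),
            "recommended": list(candidate.get("recommendedMissing", [])),
        }
    return summary


def needs_attention(row):
    """True when the row reports errors or a candidate lacks a required field.

    Assumed rule for this sketch; adjust it to your own release criteria.
    """
    if row.get("issueCounts", {}).get("errors", 0) > 0:
        return True
    candidates = row.get("richResults", {}).get("candidates", [])
    return any(candidate.get("requiredMissing") for candidate in candidates)
```

Applied to the example row, `missing_fields_by_type` reports `image` as a missing recommended field for Article, and `needs_attention` returns `False` because nothing required is missing and there are no errors.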

💳 Pricing

This Actor uses pay-per-event pricing. Each successful page audit saved to the dataset triggers one page-validated event, and that event is what you are charged for. Pages that cannot be fetched or audited are logged, are not saved as dataset rows, and are not charged.
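
As a worked example of the arithmetic, the sketch below estimates the page-validated charge from the listed price of $0.05 per 1,000 validated pages. The function name `estimate_cost` is hypothetical, and the estimate covers only that event; it assumes the listed price applies unchanged and says nothing about any other charges on your account.

```python
from decimal import Decimal

# Listed price: $0.05 per 1,000 validated pages.
PRICE_PER_1000_PAGES = Decimal("0.05")


def estimate_cost(validated_pages):
    """Estimated charge in USD for a number of successfully audited pages."""
    if validated_pages < 0:
        raise ValueError("validated_pages must not be negative")
    return PRICE_PER_1000_PAGES * validated_pages / 1000
```

With `maxPages` set to 25, the upper bound is `estimate_cost(25)`, which is $0.00125; a 10,000-page audit comes to $0.50.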

⚠️ Limits and caveats

  • Pages must be public and reachable over http or https.
  • The Actor checks markup present in fetched HTML. Markup that only appears after private login flows or unsupported client-side states may not be visible.
  • Rich-result readiness is a deterministic markup check, not a Google Search Console verdict and not an AI-generated recommendation.
  • Same-site crawling is bounded by maxPages and follows same-origin links from submitted pages.
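
To show what "same-origin" usually means in practice, here is a Python sketch that keeps only the links sharing a scheme, host, and port with the page they were found on. This follows the common web definition of an origin and is an assumption about the Actor's behaviour, not a description of its code; `origin` and `same_origin_links` are hypothetical names.

```python
from urllib.parse import urldefrag, urljoin, urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return (scheme, host, port), filling in the default port when omitted."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin_links(page_url, links):
    """Resolve links against page_url and keep unique same-origin targets."""
    base = origin(page_url)
    kept = []
    for link in links:
        absolute, _fragment = urldefrag(urljoin(page_url, link))
        if absolute == page_url or absolute in kept:
            continue  # skip self-links and duplicates
        if origin(absolute) == base:
            kept.append(absolute)
    return kept
```

Under this definition, a link from an https page to the http version of the same host counts as a different origin and is not followed.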

❓ FAQ

🧪 Can this replace Google's Rich Results Test?

Not entirely. Use it for batch audits, exports, API workflows, and structured-data evidence, and treat Google's tools as the final authority for Google-specific display eligibility.

🧩 Does it validate only JSON-LD?

No. The full audit extracts JSON-LD, Microdata, RDFa, Open Graph, Twitter Cards, canonical, and meta tags. Choose jsonLd when you want a focused JSON-LD validator run.

🌐 Can I audit a whole website?

You can start from one or more pages and choose same-site crawling with a page limit. For very large websites, use smaller batches or curated URL lists so each run stays easy to review.

📝 Changelog

  • 0.1: Initial release.

🆘 Support

For issues, questions, or feature requests, file a ticket and I'll fix or implement it in less than 24h 🫡

🔗 Other actors

Made with ❤️ by Maxime Dupré