# Schema Markup Validator (`maximedupre/schema-markup-validator`) Actor

Validate schema markup on public pages. Extract JSON-LD, Microdata, RDFa, Open Graph, Twitter Cards, meta tags, schema.org types, issue counts, and rich-result readiness signals.

- **URL**: https://apify.com/maximedupre/schema-markup-validator.md
- **Developed by:** [Maxime Dupré](https://apify.com/maximedupre) (community)
- **Categories:** Developer tools, Marketing
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

$0.05 / 1,000 validated pages

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

### 🔎 Schema markup validator for structured data

Schema Markup Validator checks public web pages for structured data and returns one clean page audit per successful URL. Add pages such as [schema.org Article](https://schema.org/Article), choose whether to audit only submitted URLs or follow same-site links, and get JSON-LD, Microdata, RDFa, schema.org types, Open Graph, Twitter Cards, meta tags, validation issues, and rich-result readiness signals in the dataset.

Use this structured data validator when you need to debug rich-result markup, compare pages during an SEO audit, check JSON-LD syntax, or collect schema evidence before a release. The Actor runs on public pages and does not need cookies, credentials, or API keys for the audited sites, nor any extra account from you.

### ✅ What this Actor does

- Accepts public page URLs in a batch.
- Can check only the submitted URLs or follow same-site links for a small site audit.
- Extracts JSON-LD blocks and checks JSON syntax, schema.org context, and schema types.
- Extracts Microdata and RDFa items when the audit focus includes schema.org markup.
- Extracts Open Graph, Twitter Card, canonical, and core meta tag data in the full audit.
- Reports detected schema.org types, structured-data counts, issue counts, and issue details.
- Adds transparent rich-result readiness signals for common types such as Article, Product, Recipe, FAQPage, HowTo, Event, Organization, and LocalBusiness.
- Saves one dataset row per successfully fetched page audit.
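To make the JSON-LD checks in the list above more concrete, here is a minimal sketch using only Python's standard library. It is not the Actor's implementation: the function `check_json_ld_blocks` and the issue codes `invalid-json` and `missing-schema-context` are hypothetical. It collects every `<script type="application/ld+json">` block, tries to parse it as JSON, and reports the schema.org `@context` and `@type` values it finds, roughly mirroring the per-block fields of the `jsonLd` output.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        # attrs values can be None (e.g. <script async>), so default to "".
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script content can arrive in several chunks; concatenate them.
        if self._inside:
            self.blocks[-1] += data


def check_json_ld_blocks(html):
    """Parse each JSON-LD block and report validity, @context, @type values, and issues."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    results = []
    for index, raw in enumerate(collector.blocks, start=1):
        block = {"index": index, "valid": False, "context": None, "types": [], "issues": []}
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as err:
            block["issues"].append({"code": "invalid-json", "message": str(err)})
            results.append(block)
            continue
        block["valid"] = True
        # A block can hold one object or a list of objects (@graph is not handled here).
        for item in data if isinstance(data, list) else [data]:
            if not isinstance(item, dict):
                continue
            context = item.get("@context")
            if block["context"] is None and isinstance(context, str):
                block["context"] = context
            item_type = item.get("@type")
            if isinstance(item_type, str):
                block["types"].append(item_type)
            elif isinstance(item_type, list):
                block["types"].extend(t for t in item_type if isinstance(t, str))
        # Only string contexts are inspected in this sketch.
        if "schema.org" not in (block["context"] or ""):
            block["issues"].append(
                {"code": "missing-schema-context", "message": "No schema.org @context string found."}
            )
        results.append(block)
    return results
```

A broken block (for example, a truncated object) comes back with `valid` set to `False` and an `invalid-json` issue, while the other blocks on the page are still reported.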

This Actor is focused on schema markup validation and structured-data extraction. It is not a Lighthouse audit, page-speed checker, broken-link crawler, sitemap indexability audit, or full technical SEO scanner.

### 📊 Data you get

Each dataset row is one successful page audit. Rows can include:

- `url`, `title`, `canonicalUrl`, `statusCode`, `contentType`, and `crawlDepth`
- `schemaTypes` found across JSON-LD, Microdata, and RDFa
- `markupSummary` with counts for JSON-LD blocks, Microdata items, RDFa items, Open Graph properties, Twitter Card properties, and meta tags
- `validationStatus` and `issueCounts` for quick filtering
- `issues` with severity, code, message, format, schema type, property, and evidence
- `jsonLd` parsed blocks with validity, context, detected types, source data, and block-level issues
- `microdata` and `rdfa` extracted items with types, properties, and item issues
- `metadata` with Open Graph, Twitter Card, and meta tag values
- `richResults` with readiness level, reasons, candidate types, missing fields, and issue codes
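Because every row carries `validationStatus`, `issueCounts`, and `issues`, a batch of audits can be triaged with a few lines of code. The sketch below assumes rows shaped like the output example in this README (for instance, loaded from a JSON export of the dataset); the function name `summarize_audits` is hypothetical, and status values are grouped as they appear rather than assumed in advance.

```python
from collections import Counter


def summarize_audits(rows):
    """Group page URLs by validationStatus, tally issue codes, and list pages with errors."""
    urls_by_status = {}
    issue_codes = Counter()
    pages_with_errors = []
    for row in rows:
        status = row.get("validationStatus") or "unknown"
        urls_by_status.setdefault(status, []).append(row.get("url"))
        # issueCounts.errors is the quick filter for pages that need fixing first.
        if (row.get("issueCounts") or {}).get("errors", 0) > 0:
            pages_with_errors.append(row.get("url"))
        for issue in row.get("issues") or []:
            issue_codes[issue.get("code")] += 1
    return urls_by_status, issue_codes, pages_with_errors
```

Sorting `issue_codes.most_common()` then shows which markup problem is most widespread across the audited pages.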

You can export the dataset as JSON, CSV, Excel, XML, RSS, or HTML, or read the same rows through the Apify API, schedules, webhooks, and integrations.

### 🚀 How to run it

1. Add one or more public page URLs in **Page URLs**.
2. Keep **Audit focus** on **Full structured-data audit** for the broadest output.
3. Choose **Submitted URLs only** for exact page checks.
4. Choose **Follow same-site links** when you want a bounded same-site schema audit.
5. Set **Maximum pages** to control output size and cost.
6. Run the Actor and open the dataset.

For a quick first run, keep the prefilled schema.org URLs. They are public pages with structured data, so you can inspect the output shape quickly before adding your own website.

### ⚙️ Input example

```json
{
	"startUrls": [
		{ "url": "https://schema.org/Article" },
		{ "url": "https://schema.org/Product" }
	],
	"auditFocus": "full",
	"crawlScope": "submittedUrls",
	"maxPages": 25
}
````
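If you generate the input in code, it can help to reject invalid values before starting a run. This sketch builds the input object shown above and checks it against the constraints in the Actor's published input schema (`auditFocus` and `crawlScope` enums, `maxPages` between 1 and 10000) plus the `http`/`https` requirement from the limits section. The function name `build_input` is hypothetical, rejecting an empty URL list is an extra assumption, and the defaults mirror this example rather than the schema defaults (`sameSite`, 512).

```python
from urllib.parse import urlparse

AUDIT_FOCUS_VALUES = ("full", "schemaOrg", "jsonLd")
CRAWL_SCOPE_VALUES = ("submittedUrls", "sameSite")


def build_input(urls, audit_focus="full", crawl_scope="submittedUrls", max_pages=25):
    """Build an Actor input dict, raising ValueError for values the schema would not accept."""
    if not urls:
        # Assumption: the schema marks startUrls as required; an empty list is rejected here too.
        raise ValueError("startUrls must contain at least one URL")
    for url in urls:
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an http(s) URL: {url!r}")
    if audit_focus not in AUDIT_FOCUS_VALUES:
        raise ValueError(f"auditFocus must be one of {AUDIT_FOCUS_VALUES}")
    if crawl_scope not in CRAWL_SCOPE_VALUES:
        raise ValueError(f"crawlScope must be one of {CRAWL_SCOPE_VALUES}")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(max_pages, bool) or not isinstance(max_pages, int) or not 1 <= max_pages <= 10000:
        raise ValueError("maxPages must be an integer between 1 and 10000")
    return {
        "startUrls": [{"url": url} for url in urls],
        "auditFocus": audit_focus,
        "crawlScope": crawl_scope,
        "maxPages": max_pages,
    }
```

Calling `build_input(["https://schema.org/Article", "https://schema.org/Product"])` returns the same object as the JSON input example.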

#### 🎯 Audit focus

Use `full` when you want JSON-LD, Microdata, RDFa, Open Graph, Twitter Cards, canonical, and meta tags. Use `schemaOrg` when you only want schema.org markup surfaces. Use `jsonLd` for a focused JSON-LD validator run.
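One way to read the three focus values is as nested sets of markup surfaces. The sketch below encodes that reading of this section (with `schemaOrg` covering JSON-LD, Microdata, and RDFa) and picks the narrowest focus that still covers what you need. The surface labels and the helper `narrowest_focus` are illustrative assumptions, not field names or behavior guaranteed by the Actor.

```python
# Surfaces per auditFocus value, as described in this README. The labels are
# illustrative, not field names guaranteed by the Actor.
SURFACES_BY_FOCUS = {
    "jsonLd": {"jsonLd"},
    "schemaOrg": {"jsonLd", "microdata", "rdfa"},
    "full": {"jsonLd", "microdata", "rdfa", "openGraph", "twitterCard", "canonical", "metaTags"},
}
NARROWEST_FIRST = ("jsonLd", "schemaOrg", "full")


def narrowest_focus(needed):
    """Return the smallest auditFocus value whose surfaces include every needed surface."""
    needed = set(needed)
    for focus in NARROWEST_FIRST:
        if needed <= SURFACES_BY_FOCUS[focus]:
            return focus
    raise ValueError(f"unknown surfaces: {sorted(needed - SURFACES_BY_FOCUS['full'])}")
```

For example, needing Microdata alone points to `schemaOrg`, while needing Open Graph tags requires `full`.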

#### 🧭 Crawl scope

Use `submittedUrls` when your input list already contains every page you want to check. Use `sameSite` when one submitted page should discover more pages on the same website. `maxPages` caps the total successful page audits saved by the run.
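The difference between the two scopes can be pictured as a breadth-first crawl that either stops at the submitted URLs or also enqueues same-origin links, always capped by `maxPages`. The sketch below is an assumption about how such a bounded crawl could work, not the Actor's code: `plan_crawl` and the caller-supplied `get_links` function are hypothetical, and it caps visited pages, whereas the Actor caps successful page audits. The depth it records corresponds loosely to the `crawlDepth` output field.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def origin(url):
    """Scheme plus host[:port], used to keep the crawl on the same origin."""
    parts = urlparse(url)
    return parts.scheme, parts.netloc.lower()


def plan_crawl(start_urls, get_links, crawl_scope="sameSite", max_pages=512):
    """Breadth-first visit order as (url, depth) pairs, capped at max_pages.

    get_links(url) is a hypothetical caller-supplied function returning the hrefs on a page.
    """
    start = list(dict.fromkeys(start_urls))  # drop duplicate submitted URLs, keep order
    queue = deque((url, 0) for url in start)
    seen = set(start)
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append((url, depth))
        if crawl_scope != "sameSite":
            continue  # submittedUrls: never enqueue discovered links
        for href in get_links(url):
            link = urldefrag(urljoin(url, href)).url  # resolve relative links, drop #fragments
            if origin(link) == origin(url) and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

Breadth-first order means that when `maxPages` cuts the crawl short, the pages closest to your submitted URLs are the ones that get checked.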

### 📦 Output example

```json
{
	"url": "https://schema.org/Article",
	"title": "Article - Schema.org Type",
	"canonicalUrl": "https://schema.org/Article",
	"statusCode": 200,
	"contentType": "text/html",
	"crawlDepth": 0,
	"schemaTypes": ["Article", "WebPage"],
	"markupSummary": {
		"hasStructuredData": true,
		"jsonLdBlocks": 1,
		"microdataItems": 0,
		"rdfaItems": 0,
		"openGraphProperties": 0,
		"twitterCardProperties": 0,
		"metaTags": 2
	},
	"validationStatus": "warning",
	"issueCounts": {
		"errors": 0,
		"warnings": 2,
		"info": 0
	},
	"issues": [
		{
			"severity": "warning",
			"code": "missing-recommended-property",
			"message": "Article is missing the recommended image property.",
			"format": null,
			"schemaType": "Article",
			"property": "image",
			"evidence": null
		}
	],
	"jsonLd": [
		{
			"index": 1,
			"valid": true,
			"context": "https://schema.org",
			"types": ["Article"],
			"data": {
				"@context": "https://schema.org",
				"@type": "Article",
				"headline": "Schema.org Article"
			},
			"issues": []
		}
	],
	"microdata": [],
	"rdfa": [],
	"metadata": {
		"openGraph": {},
		"twitterCard": {},
		"metaTags": {
			"description": "Schema.org page description"
		}
	},
	"richResults": {
		"readiness": {
			"level": "needsFixes",
			"reasons": ["Article is missing recommended image."]
		},
		"candidates": [
			{
				"type": "Article",
				"eligible": true,
				"requiredMissing": [],
				"recommendedMissing": ["image"],
				"issueCodes": ["missing-recommended-property"]
			}
		]
	}
}
```
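The `richResults.candidates` entries in the output example above pair each detected type with its missing required and recommended properties. The sketch below shows one deterministic way to produce entries of that shape from parsed JSON-LD objects. The property lists in `RULES` are hypothetical and chosen only so the Article case reproduces the example; they are neither the Actor's rule set nor Google's documented requirements, and the `missing-required-property` code is likewise an assumption.

```python
# Hypothetical rules for illustration only. They are NOT the Actor's rule set
# and NOT Google's documented requirements.
RULES = {
    "Article": {"required": ["headline"], "recommended": ["image"]},
    "Product": {"required": ["name"], "recommended": ["image", "offers"]},
}


def is_missing(value):
    return value is None or value == "" or value == []


def evaluate_candidates(items):
    """Build richResults-style candidate entries for parsed JSON-LD objects."""
    candidates = []
    for item in items:
        types = item.get("@type")
        for schema_type in types if isinstance(types, list) else [types]:
            rule = RULES.get(schema_type) if isinstance(schema_type, str) else None
            if rule is None:
                continue  # no rule for this type, so it is not a candidate
            required_missing = [p for p in rule["required"] if is_missing(item.get(p))]
            recommended_missing = [p for p in rule["recommended"] if is_missing(item.get(p))]
            issue_codes = []
            if required_missing:
                issue_codes.append("missing-required-property")  # hypothetical code
            if recommended_missing:
                issue_codes.append("missing-recommended-property")
            candidates.append({
                "type": schema_type,
                # Missing recommended fields do not block eligibility, as in the example.
                "eligible": not required_missing,
                "requiredMissing": required_missing,
                "recommendedMissing": recommended_missing,
                "issueCodes": issue_codes,
            })
    return candidates
```

Feeding it the `data` object from the example's `jsonLd` block yields the same Article candidate: eligible, with `image` as the only missing recommended property.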

### 💳 Pricing

This Actor uses pay-per-event pricing. You are charged the `page-validated` event for each successful page audit saved to the dataset. Pages that cannot be fetched or audited are logged and skipped: they are not saved as dataset rows and do not trigger the `page-validated` event.
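Since only saved page audits are charged and `maxPages` caps the successful audits a run saves, the listed rate of $0.05 per 1,000 validated pages gives a simple upper bound on a run's `page-validated` charges. A minimal sketch, assuming that listed rate (check the Store page for the current price); `page_validated_cost` is a hypothetical helper, and `Decimal` avoids floating-point rounding in currency math.

```python
from decimal import Decimal

# Listed price at the time of writing: $0.05 per 1,000 validated pages.
PRICE_PER_1000_PAGES_USD = Decimal("0.05")


def page_validated_cost(pages):
    """USD charged for a given number of page-validated events."""
    return PRICE_PER_1000_PAGES_USD * pages / 1000
```

For example, a run with `maxPages` set to 512 can incur at most `page_validated_cost(512)`, which is $0.0256.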

### ⚠️ Limits and caveats

- Pages must be public and reachable over `http` or `https`.
- The Actor checks markup present in fetched HTML. Markup that only appears after private login flows or unsupported client-side states may not be visible.
- Rich-result readiness is a deterministic markup check, not a Google Search Console verdict and not an AI-generated recommendation.
- Same-site crawling is bounded by `maxPages` and follows same-origin links from submitted pages.

### ❓ FAQ

#### 🧪 Can this replace Google's Rich Results Test?

Use it for batch audits, exports, API workflows, and structured-data evidence. Treat Google's tools as the final authority for Google-specific display eligibility.

#### 🧩 Does it validate only JSON-LD?

No. The full audit extracts JSON-LD, Microdata, RDFa, Open Graph, Twitter Cards, canonical, and meta tags. Choose `jsonLd` when you want a focused JSON-LD validator run.

#### 🌐 Can I audit a whole website?

You can start from one or more pages and choose same-site crawling with a page limit. For very large websites, use smaller batches or curated URL lists so each run stays easy to review.
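For a curated URL list, one way to keep runs small is to split it into several `submittedUrls` inputs. This is a sketch under assumptions: `batch_inputs` is a hypothetical helper, and the batch size of 100 is an arbitrary example rather than a recommendation from the Actor. Each batch sets `maxPages` to its own URL count, so no batch can save more audits than it submitted.

```python
def batch_inputs(urls, batch_size=100, audit_focus="full"):
    """Split a curated URL list into several submittedUrls run inputs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    unique = list(dict.fromkeys(urls))  # drop duplicates, keep the original order
    batches = []
    for start in range(0, len(unique), batch_size):
        chunk = unique[start:start + batch_size]
        batches.append({
            "startUrls": [{"url": url} for url in chunk],
            "auditFocus": audit_focus,
            "crawlScope": "submittedUrls",
            "maxPages": len(chunk),
        })
    return batches
```

Each resulting dict can be passed as `run_input` to `client.actor("maximedupre/schema-markup-validator").call(...)`, as in the Python API example.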

### 📝 Changelog

- 0.1: Initial release.

### 🆘 Support

For issues, questions, or feature requests, [file a ticket](https://console.apify.com/actors/maximedupre~schema-markup-validator/issues) and I'll fix or implement it in less than 24h 🫡

### 🔗 Other actors

- [Sitemap Sniffer ↗](https://apify.com/maximedupre/sitemap-sniffer) - Find sitemap files and sitemap URL inventories before SEO audits.
- [Website URL Crawler ↗](https://apify.com/maximedupre/website-url-crawler) - Build URL inventories from public websites, rendered links, and sitemaps.
- [Font Detector ↗](https://apify.com/maximedupre/font-detector) - Audit website fonts, font files, and typography evidence from public pages.
- [Ahrefs Free Website Stats Scraper ↗](https://apify.com/maximedupre/ahrefs-free-website-stats-scraper) - Collect public Ahrefs website metrics for domain research.
- [SEMrush Free Website Stats Scraper ↗](https://apify.com/maximedupre/semrush-free-website-stats-scraper) - Collect public SEMrush overview metrics for SEO research.

**Made with ❤️ by Maxime Dupré**

# Actor input Schema

## `startUrls` (type: `array`):

Add public pages to audit for structured data and rich-result signals.

## `auditFocus` (type: `string`):

Choose how much structured data to include in each page audit.

## `crawlScope` (type: `string`):

Choose whether the run checks only the URLs you add or also follows links on the same site.

## `maxPages` (type: `integer`):

Caps the total pages checked across submitted and discovered pages.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://schema.org/Article"
    },
    {
      "url": "https://schema.org/Product"
    }
  ],
  "auditFocus": "full",
  "crawlScope": "sameSite",
  "maxPages": 512
}
```

# Actor output Schema

## `results` (type: `string`):

Open one dataset row per successfully validated public page.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://schema.org/Article"
        },
        {
            "url": "https://schema.org/Product"
        }
    ],
    "auditFocus": "full",
    "crawlScope": "sameSite",
    "maxPages": 512
};

// Run the Actor and wait for it to finish
const run = await client.actor("maximedupre/schema-markup-validator").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [
        { "url": "https://schema.org/Article" },
        { "url": "https://schema.org/Product" },
    ],
    "auditFocus": "full",
    "crawlScope": "sameSite",
    "maxPages": 512,
}

# Run the Actor and wait for it to finish
run = client.actor("maximedupre/schema-markup-validator").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
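If you want a compact per-page report instead of full rows, the items returned by `iterate_items()` in the Python example above can be flattened into a CSV summary with the standard library. The column selection and the helper name `to_summary_csv` are assumptions for illustration; the native dataset CSV export remains available for the complete data.

```python
import csv
import io

FIELDS = ["url", "validationStatus", "errors", "warnings", "info", "schemaTypes"]


def to_summary_csv(items):
    """Flatten audit rows into CSV text with one summary line per page."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for item in items:
        counts = item.get("issueCounts") or {}
        writer.writerow({
            "url": item.get("url"),
            "validationStatus": item.get("validationStatus"),
            "errors": counts.get("errors", 0),
            "warnings": counts.get("warnings", 0),
            "info": counts.get("info", 0),
            # Join types with ";" so the column stays a single CSV field.
            "schemaTypes": ";".join(item.get("schemaTypes") or []),
        })
    return buffer.getvalue()
```

For example, `to_summary_csv(client.dataset(run["defaultDatasetId"]).iterate_items())` returns the report as a string you can write to a file.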

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://schema.org/Article"
    },
    {
      "url": "https://schema.org/Product"
    }
  ],
  "auditFocus": "full",
  "crawlScope": "sameSite",
  "maxPages": 512
}' |
apify call maximedupre/schema-markup-validator --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=maximedupre/schema-markup-validator",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Schema Markup Validator",
        "description": "Validate schema markup on public pages. Extract JSON-LD, Microdata, RDFa, Open Graph, Twitter Cards, meta tags, schema.org types, issue counts, and rich-result readiness signals.",
        "version": "0.1",
        "x-build-id": "4qhbXJ03863AzG7KQ"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/maximedupre~schema-markup-validator/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-maximedupre-schema-markup-validator",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/maximedupre~schema-markup-validator/runs": {
            "post": {
                "operationId": "runs-sync-maximedupre-schema-markup-validator",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/maximedupre~schema-markup-validator/run-sync": {
            "post": {
                "operationId": "run-sync-maximedupre-schema-markup-validator",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls"
                ],
                "properties": {
                    "startUrls": {
                        "title": "Page URLs",
                        "type": "array",
                        "description": "Add public pages to audit for structured data and rich-result signals.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "auditFocus": {
                        "title": "Audit focus",
                        "enum": [
                            "full",
                            "schemaOrg",
                            "jsonLd"
                        ],
                        "type": "string",
                        "description": "Choose how much structured data to include in each page audit.",
                        "default": "full"
                    },
                    "crawlScope": {
                        "title": "Pages to check",
                        "enum": [
                            "submittedUrls",
                            "sameSite"
                        ],
                        "type": "string",
                        "description": "Choose whether the run checks only the URLs you add or also follows links on the same site.",
                        "default": "sameSite"
                    },
                    "maxPages": {
                        "title": "Maximum pages",
                        "minimum": 1,
                        "maximum": 10000,
                        "type": "integer",
                        "description": "Caps the total pages checked across submitted and discovered pages.",
                        "default": 512
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
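The `run-sync-get-dataset-items` path in the OpenAPI specification above can be called without any client library. This sketch uses only Python's standard library and passes the token as the `token` query parameter, as the specification describes; the helper names are hypothetical. Synchronous endpoints wait only a limited time for the run to finish (see the Apify API reference), so for large `maxPages` values the asynchronous `/runs` endpoint, or the client libraries, are a better fit.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "maximedupre~schema-markup-validator"


def build_run_sync_request(token, run_input):
    """Build a POST request for the run-sync-get-dataset-items endpoint."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items?{query}"
    body = json.dumps(run_input).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": "application/json"}
    )


def run_and_get_items(token, run_input, timeout=300):
    """Start the Actor, wait for it to finish, and return the dataset items as a list."""
    request = build_run_sync_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `run_and_get_items("<YOUR_API_TOKEN>", {"startUrls": [{"url": "https://schema.org/Article"}], "crawlScope": "submittedUrls"})` returns the same rows you would read from the run's dataset.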
