Structured Data Extractor

Pricing: Pay per event

Rating: 0.0 (0)

Developer: Stas Persiianenko (maintained by Community)

Actor stats

  • Bookmarked: 1
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 2 days ago

Extract JSON-LD, Microdata, and RDFa structured data from web pages for SEO auditing and Schema.org validation.

What does Structured Data Extractor do?

This actor extracts structured data markup from web pages. It parses all three major formats: JSON-LD (<script type="application/ld+json">), Microdata (itemscope/itemprop), and RDFa (typeof/property). For each page, it returns the full structured data objects, detected Schema.org types, and format counts. Use it to audit rich snippet eligibility, verify Schema.org implementation, or monitor structured data across your entire site.
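To illustrate what extracting the JSON-LD format involves, here is a minimal sketch using only the Python standard library. It is not the actor's own implementation, which is not shown on this page; the class name `JsonLdCollector`, the helper `extract_json_ld`, and the sample HTML are all hypothetical. Microdata and RDFa are left out because they require walking element attributes rather than reading script contents.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Hypothetical helper: collects parsed JSON-LD blocks from script tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # Only <script type="application/ld+json"> carries JSON-LD.
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so buffer them.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # This sketch silently skips malformed blocks.


def extract_json_ld(html):
    """Return a list of parsed JSON-LD payloads found in the HTML string."""
    parser = JsonLdCollector()
    parser.feed(html)
    parser.close()
    return parser.blocks


# Illustrative input only; not taken from a real page.
sample_html = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "headline": "Web scraping"}
</script>
<script>var unrelated = 1;</script>
</head><body></body></html>
"""

blocks = extract_json_ld(sample_html)
print([block.get("@type") for block in blocks])
```

Because the sketch only reads the HTML it is given, markup injected later by client-side JavaScript would not appear in its results either.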

Use cases

  • SEO specialists -- verify Schema.org markup implementation across hundreds of pages in a single run
  • Rich snippet auditors -- check that pages have the right structured data types for Google rich results (Product, Article, FAQ, etc.)
  • Competitive analysts -- see what structured data competitors use and identify markup opportunities you are missing
  • Migration testers -- ensure structured data survives CMS, domain, or URL migrations without data loss
  • Content monitoring teams -- track structured data changes across pages over time to catch regressions

Why use Structured Data Extractor?

  • All three formats -- extracts JSON-LD, Microdata, and RDFa in a single pass, so you never miss markup regardless of implementation
  • Full data objects -- returns the complete structured data payload, not just type names, so you can inspect every property
  • Batch processing -- analyze hundreds of URLs at once instead of checking pages one at a time in Google's testing tool
  • Structured JSON output -- each result includes format counts, detected Schema.org types, and boolean flags for easy filtering
  • API and integration ready -- trigger runs programmatically or connect to dashboards via Google Sheets, Zapier, and more
  • Pay-per-event pricing -- pay only for the runs you start and the pages you extract: $0.035 per run plus $0.001 per URL

Input parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| urls | string[] | Yes | -- | List of web page URLs to extract structured data from |

Example input

{
  "urls": [
    "https://www.google.com",
    "https://en.wikipedia.org/wiki/Web_scraping",
    "https://www.imdb.com/title/tt0111161/"
  ]
}

Output example

{
  "url": "https://en.wikipedia.org/wiki/Web_scraping",
  "title": "Web scraping - Wikipedia",
  "structuredDataCount": 2,
  "jsonLdCount": 1,
  "microdataCount": 1,
  "rdfaCount": 0,
  "schemaTypes": ["Article", "BreadcrumbList"],
  "structuredData": [
    {
      "type": "Article",
      "format": "json-ld",
      "data": { "@type": "Article", "name": "Web scraping", "headline": "Web scraping" }
    }
  ],
  "hasJsonLd": true,
  "hasMicrodata": true,
  "hasRdfa": false,
  "error": null,
  "extractedAt": "2026-03-01T12:00:00.000Z"
}

Output fields

| Field | Type | Description |
| --- | --- | --- |
| url | string | The analyzed page URL |
| title | string | The page title |
| structuredDataCount | number | Total number of structured data items found |
| jsonLdCount | number | Number of JSON-LD blocks found |
| microdataCount | number | Number of Microdata items found |
| rdfaCount | number | Number of RDFa items found |
| schemaTypes | string[] | List of detected Schema.org types |
| structuredData | array | Full structured data objects with type, format, and data |
| hasJsonLd | boolean | Whether the page contains any JSON-LD |
| hasMicrodata | boolean | Whether the page contains any Microdata |
| hasRdfa | boolean | Whether the page contains any RDFa |
| error | string | Error message if extraction failed, null otherwise |
| extractedAt | string | ISO timestamp of the extraction |
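The boolean flags and `schemaTypes` make it easy to roll results up across many pages. The following is a small sketch of one way to do that, assuming the items have already been downloaded as a list of dictionaries with the fields listed above. The function name `summarize` and the sample items are hypothetical and only illustrate the shape of the data.

```python
from collections import Counter


def summarize(items):
    """Aggregate format coverage and Schema.org type frequency across items."""
    summary = {
        "pages": len(items),
        "failed": sum(1 for item in items if item.get("error")),
        "with_json_ld": sum(1 for item in items if item.get("hasJsonLd")),
        "with_microdata": sum(1 for item in items if item.get("hasMicrodata")),
        "with_rdfa": sum(1 for item in items if item.get("hasRdfa")),
    }
    # Count how many pages declare each Schema.org type.
    type_counts = Counter(
        schema_type
        for item in items
        for schema_type in (item.get("schemaTypes") or [])
    )
    return summary, type_counts


# Hypothetical items shaped like the output fields above.
sample_items = [
    {"url": "https://example.com/a", "schemaTypes": ["Article", "BreadcrumbList"],
     "hasJsonLd": True, "hasMicrodata": True, "hasRdfa": False, "error": None},
    {"url": "https://example.com/b", "schemaTypes": ["Product"],
     "hasJsonLd": True, "hasMicrodata": False, "hasRdfa": False, "error": None},
    {"url": "https://example.com/c", "schemaTypes": [],
     "hasJsonLd": False, "hasMicrodata": False, "hasRdfa": False,
     "error": "Request timed out"},
]

summary, type_counts = summarize(sample_items)
print(summary)
print(type_counts.most_common())
```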

How much does it cost?

Structured Data Extractor uses Apify's pay-per-event pricing model. You only pay for what you use.

| Event | Price | Description |
| --- | --- | --- |
| Start | $0.035 | One-time per run |
| URL extracted | $0.001 | Per page extracted |

Example costs:

  • 10 pages: $0.035 + 10 x $0.001 = $0.045
  • 100 pages: $0.035 + 100 x $0.001 = $0.135
  • 1,000 pages: $0.035 + 1,000 x $0.001 = $1.035
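The same arithmetic can be written as a small helper for budgeting. This is a sketch that assumes the two prices in the table above are the only charges. The function name `estimate_cost` and the `runs` parameter are hypothetical and not part of the actor.

```python
from decimal import Decimal

# Prices taken from the pricing table above.
START_FEE = Decimal("0.035")    # charged once per run
PER_URL_FEE = Decimal("0.001")  # charged per page extracted


def estimate_cost(url_count, runs=1):
    """Estimate the total cost in USD for url_count pages spread over runs."""
    return START_FEE * runs + PER_URL_FEE * url_count


for pages in (10, 100, 1000):
    print(pages, estimate_cost(pages))
```

Since the start fee is charged per run, batching URLs into fewer runs lowers the total: under these assumptions, 1,000 pages cost $1.035 in one run but $1.35 when split across ten runs.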

Using the Apify API

You can start Structured Data Extractor programmatically from your own applications using the Apify API. The following examples show how to run the actor and retrieve results in both Node.js and Python.

Node.js

import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token.
const client = new ApifyClient({ token: 'YOUR_TOKEN' });
// Start the actor and wait for the run to finish.
const run = await client.actor('automation-lab/structured-data-extractor').call({
    urls: ['https://en.wikipedia.org/wiki/Web_scraping'],
});
// Fetch the results from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Python

from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient('YOUR_TOKEN')
# Start the actor and wait for the run to finish.
run = client.actor('automation-lab/structured-data-extractor').call(run_input={
    'urls': ['https://en.wikipedia.org/wiki/Web_scraping'],
})
# Fetch the results from the run's default dataset.
items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)

Integrations

Structured Data Extractor works with all major automation platforms available on Apify. Export results to Google Sheets to build a structured data audit dashboard across your site. Use Zapier or Make to trigger extraction runs whenever new pages are published. Send alerts to Slack when pages are missing expected Schema.org types. Pipe results into n8n workflows for custom validation logic, or set up webhooks to trigger downstream actions as soon as a run finishes. Chain it with JSON-LD Validator to first extract and then validate your structured data.

Tips and best practices

  • Focus on pages eligible for rich results -- prioritize product pages, articles, FAQ pages, and recipe pages where structured data directly impacts search appearance
  • Filter by schemaTypes to quickly find pages missing specific types like Product, Article, or BreadcrumbList
  • Use structuredDataCount: 0 to find pages with no markup -- these are your biggest opportunities for SEO improvement
  • Combine with JSON-LD Validator to first extract structured data with this actor, then validate the JSON-LD blocks for errors and warnings
  • Schedule regular runs to catch structured data regressions after site deployments or CMS updates
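The two filtering tips above can be sketched in a few lines of Python, assuming the dataset items have already been downloaded as a list of dictionaries. The function name `audit`, the `expected_types` argument, and the sample items are hypothetical. Which types a page should carry is a decision for your own site.

```python
def audit(items, expected_types):
    """Split pages into those with no markup and those missing expected types."""
    no_markup = []
    missing = {}
    for item in items:
        if item.get("error"):
            continue  # Failed extractions say nothing about the page's markup.
        if item.get("structuredDataCount", 0) == 0:
            no_markup.append(item["url"])
            continue
        absent = sorted(set(expected_types) - set(item.get("schemaTypes") or []))
        if absent:
            missing[item["url"]] = absent
    return no_markup, missing


# Hypothetical items shaped like the actor's output.
sample_items = [
    {"url": "https://example.com/product-1", "structuredDataCount": 2,
     "schemaTypes": ["Product", "BreadcrumbList"], "error": None},
    {"url": "https://example.com/product-2", "structuredDataCount": 1,
     "schemaTypes": ["BreadcrumbList"], "error": None},
    {"url": "https://example.com/about", "structuredDataCount": 0,
     "schemaTypes": [], "error": None},
]

no_markup, missing = audit(sample_items, ["Product", "BreadcrumbList"])
print("No markup:", no_markup)
print("Missing types:", missing)
```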

FAQ

What structured data formats does this actor support?
It extracts all three major formats: JSON-LD (script tags), Microdata (itemscope/itemprop attributes), and RDFa (typeof/property attributes).

Does it validate the structured data?
No. This actor extracts and reports what structured data exists on a page. For validation of JSON-LD syntax and required fields, use the JSON-LD Validator actor.

Can it extract structured data from JavaScript-rendered pages?
No. The actor uses plain HTTP requests and parses the initial HTML response. Structured data that is injected by client-side JavaScript after page load will not be captured.