Structured Data Extractor
Pricing: Pay per event
Developer: Stas Persiianenko
Extract JSON-LD, Microdata, and RDFa structured data from web pages for SEO auditing and Schema.org validation.
What does Structured Data Extractor do?
This actor extracts structured data markup from web pages. It parses all three major formats: JSON-LD (`<script type="application/ld+json">`), Microdata (`itemscope`/`itemprop`), and RDFa (`typeof`/`property`). For each page, it returns the full structured data objects, detected Schema.org types, and format counts. Use it to audit rich snippet eligibility, verify Schema.org implementation, or monitor structured data across your entire site.
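To make the three formats concrete, here is a minimal sketch of how such markup can be detected in static HTML with Python's standard-library `html.parser`. It is not the actor's implementation. The class name `StructuredDataSniffer`, the function `extract`, and the counting rules (one JSON-LD count per `<script>` block, one Microdata count per `itemscope` element including nested ones, one RDFa count per `typeof` attribute) are assumptions made for illustration, and the actor may count differently.

```python
import json
from html.parser import HTMLParser


class StructuredDataSniffer(HTMLParser):
    """Collects JSON-LD blocks, Microdata itemtypes, and RDFa typeof values."""

    def __init__(self) -> None:
        super().__init__()
        self.json_ld = []          # parsed JSON-LD payloads, one per <script> block
        self.microdata_types = []  # itemtype of every itemscope element (nested ones included)
        self.rdfa_types = []       # typeof value of every element that declares one
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        script_type = (attributes.get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []
        if "itemscope" in attributes:
            self.microdata_types.append(attributes.get("itemtype") or "")
        if attributes.get("typeof"):
            self.rdfa_types.append(attributes["typeof"])

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # this sketch silently skips malformed JSON-LD


def _json_ld_types(node, found):
    """Collects top-level @type strings from an object, an array, or an @graph list."""
    if isinstance(node, list):
        for child in node:
            _json_ld_types(child, found)
    elif isinstance(node, dict):
        declared = node.get("@type")
        if isinstance(declared, str):
            found.add(declared)
        elif isinstance(declared, list):
            found.update(t for t in declared if isinstance(t, str))
        _json_ld_types(node.get("@graph", []), found)


def extract(html):
    sniffer = StructuredDataSniffer()
    sniffer.feed(html)
    sniffer.close()
    types = set()
    for block in sniffer.json_ld:
        _json_ld_types(block, types)
    for value in sniffer.microdata_types + sniffer.rdfa_types:
        # Reduce "https://schema.org/Product" or "schema:Product" to "Product".
        for token in value.split():
            types.add(token.rstrip("/").rsplit("/", 1)[-1].rsplit(":", 1)[-1])
    return {
        "jsonLdCount": len(sniffer.json_ld),
        "microdataCount": len(sniffer.microdata_types),
        "rdfaCount": len(sniffer.rdfa_types),
        "schemaTypes": sorted(types),
    }
```

Like the actor, this sketch only sees the HTML it is given, so markup injected later by client-side JavaScript is invisible to it.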
Use cases
- SEO specialists -- verify Schema.org markup implementation across hundreds of pages in a single run
- Rich snippet auditors -- check that pages have the right structured data types for Google rich results (Product, Article, FAQ, etc.)
- Competitive analysts -- see what structured data competitors use and identify markup opportunities you are missing
- Migration testers -- ensure structured data survives CMS, domain, or URL migrations without data loss
- Content monitoring teams -- track structured data changes across pages over time to catch regressions
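For the migration and monitoring use cases, two runs can be compared page by page. The sketch below is a hypothetical helper, `diff_schema_types`, that works on lists of dataset items (dicts with the `url` and `schemaTypes` fields described under Output fields). The optional `url_map` argument, which maps an old URL to its post-migration URL, and the rule that a page missing from the newer run counts as losing all its types, are assumptions of this sketch.

```python
from __future__ import annotations


def diff_schema_types(before: list[dict], after: list[dict],
                      url_map: dict[str, str] | None = None) -> dict[str, dict[str, list[str]]]:
    """Reports Schema.org types lost or gained per page between two runs.

    Results are keyed by the URL in `before`. A page absent from `after`
    counts as having lost all of its types.
    """
    url_map = url_map or {}
    new_types = {item["url"]: set(item.get("schemaTypes") or []) for item in after}
    changes = {}
    for item in before:
        old = set(item.get("schemaTypes") or [])
        # Follow the migration mapping if one is given, otherwise match the same URL.
        new = new_types.get(url_map.get(item["url"], item["url"]), set())
        lost, gained = sorted(old - new), sorted(new - old)
        if lost or gained:
            changes[item["url"]] = {"lost": lost, "gained": gained}
    return changes
```

Comparing types rather than full payloads keeps the report short; a stricter check could also compare the `structuredData` objects themselves.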
Why use Structured Data Extractor?
- All three formats -- extracts JSON-LD, Microdata, and RDFa in a single pass, so you never miss markup regardless of implementation
- Full data objects -- returns the complete structured data payload, not just type names, so you can inspect every property
- Batch processing -- analyze hundreds of URLs at once instead of checking pages one at a time in Google's testing tool
- Structured JSON output -- each result includes format counts, detected Schema.org types, and boolean flags for easy filtering
- API and integration ready -- trigger runs programmatically or connect to dashboards via Google Sheets, Zapier, and more
- Pay-per-event pricing -- only pay for the pages you actually analyze: $0.001 per URL plus a one-time $0.035 start fee per run
Input parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| urls | string[] | Yes | -- | List of web page URLs to extract structured data from |
Example input
```json
{
  "urls": [
    "https://www.google.com",
    "https://en.wikipedia.org/wiki/Web_scraping",
    "https://www.imdb.com/title/tt0111161/"
  ]
}
```
Output example
```json
{
  "url": "https://en.wikipedia.org/wiki/Web_scraping",
  "title": "Web scraping - Wikipedia",
  "structuredDataCount": 2,
  "jsonLdCount": 1,
  "microdataCount": 1,
  "rdfaCount": 0,
  "schemaTypes": ["Article", "BreadcrumbList"],
  "structuredData": [
    {
      "type": "Article",
      "format": "json-ld",
      "data": { "@type": "Article", "name": "Web scraping", "headline": "Web scraping" }
    }
  ],
  "hasJsonLd": true,
  "hasMicrodata": true,
  "hasRdfa": false,
  "error": null,
  "extractedAt": "2026-03-01T12:00:00.000Z"
}
```
Output fields
| Field | Type | Description |
|---|---|---|
| url | string | The analyzed page URL |
| title | string | The page title |
| structuredDataCount | number | Total number of structured data items found |
| jsonLdCount | number | Number of JSON-LD blocks found |
| microdataCount | number | Number of Microdata items found |
| rdfaCount | number | Number of RDFa items found |
| schemaTypes | string[] | List of detected Schema.org types |
| structuredData | array | Full structured data objects with type, format, and data |
| hasJsonLd | boolean | Whether the page contains any JSON-LD |
| hasMicrodata | boolean | Whether the page contains any Microdata |
| hasRdfa | boolean | Whether the page contains any RDFa |
| error | string | Error message if extraction failed, null otherwise |
| extractedAt | string | ISO timestamp of the extraction |
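If you process results in Python, the table above maps naturally onto a `TypedDict`. The sketch below mirrors the documented fields and adds a hypothetical `summarize` helper that counts how many pages carry each format. Skipping the format flags on pages whose `error` is set is a choice of this sketch, since the documentation does not say what those flags contain when extraction fails.

```python
from __future__ import annotations

from typing import Any, Optional, TypedDict


class StructuredDataItem(TypedDict):
    """One dataset item, mirroring the Output fields table."""
    url: str
    title: str
    structuredDataCount: int
    jsonLdCount: int
    microdataCount: int
    rdfaCount: int
    schemaTypes: list[str]
    structuredData: list[dict[str, Any]]
    hasJsonLd: bool
    hasMicrodata: bool
    hasRdfa: bool
    error: Optional[str]
    extractedAt: str


def summarize(items: list[StructuredDataItem]) -> dict[str, int]:
    """Counts pages per format; pages with a non-null `error` are only counted as errors."""
    summary = {"pages": 0, "errors": 0, "withJsonLd": 0, "withMicrodata": 0, "withRdfa": 0}
    for item in items:
        summary["pages"] += 1
        if item["error"] is not None:
            summary["errors"] += 1
            continue
        summary["withJsonLd"] += int(item["hasJsonLd"])
        summary["withMicrodata"] += int(item["hasMicrodata"])
        summary["withRdfa"] += int(item["hasRdfa"])
    return summary
```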
How much does it cost?
Structured Data Extractor uses Apify's pay-per-event pricing model. You only pay for what you use.
| Event | Price | Description |
|---|---|---|
| Start | $0.035 | One-time per run |
| URL extracted | $0.001 | Per page extracted |
Example costs:
- 10 pages: $0.035 + 10 x $0.001 = $0.045
- 100 pages: $0.035 + 100 x $0.001 = $0.135
- 1,000 pages: $0.035 + 1,000 x $0.001 = $1.035
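The example costs follow the formula start fee × runs + $0.001 × pages. A small calculator sketch, assuming the two events in the table above are the only charges, uses `Decimal` so that cent fractions are not distorted by floating-point rounding:

```python
from decimal import Decimal

# Prices from the pricing table, in USD.
START_FEE = Decimal("0.035")  # charged once per run
PER_URL = Decimal("0.001")    # charged per page extracted


def estimate_cost(pages: int, runs: int = 1) -> Decimal:
    """Estimated cost of extracting `pages` URLs spread over `runs` runs."""
    if pages < 0 or runs < 1:
        raise ValueError("pages must be >= 0 and runs >= 1")
    return START_FEE * runs + PER_URL * pages


for n in (10, 100, 1000):
    print(n, estimate_cost(n))  # prints 10 0.045, then 100 0.135, then 1000 1.035
```

Because the start fee is charged per run, batching URLs matters: 1,000 pages in one run cost $1.035, while the same pages split across 10 runs cost `estimate_cost(1000, runs=10)`, or $1.35.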
Using the Apify API
You can start Structured Data Extractor programmatically from your own applications using the Apify API. The following examples show how to run the actor and retrieve results in both Node.js and Python.
Node.js
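```javascript
// Top-level await requires an ES module (.mjs file or "type": "module" in package.json).
import { ApifyClient } from 'apify-client';

// Replace YOUR_TOKEN with your Apify API token.
const client = new ApifyClient({ token: 'YOUR_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('automation-lab/structured-data-extractor').call({
    urls: ['https://en.wikipedia.org/wiki/Web_scraping'],
});

// Fetch the results from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```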
Python
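```python
from apify_client import ApifyClient

# Replace YOUR_TOKEN with your Apify API token.
client = ApifyClient('YOUR_TOKEN')

# Start the actor and wait for the run to finish.
run = client.actor('automation-lab/structured-data-extractor').call(run_input={
    'urls': ['https://en.wikipedia.org/wiki/Web_scraping'],
})

# Fetch the results from the run's default dataset.
items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)
```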
Integrations
Structured Data Extractor works with all major automation platforms available on Apify. Export results to Google Sheets to build a structured data audit dashboard across your site. Use Zapier or Make to trigger extraction runs whenever new pages are published. Send alerts to Slack when pages are missing expected Schema.org types. Pipe results into n8n workflows for custom validation logic, or set up webhooks to trigger downstream actions as soon as a run finishes. Chain it with JSON-LD Validator to first extract and then validate your structured data.
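For a spreadsheet audit, the nested output needs flattening first, because a list or object does not fit in one cell. A minimal sketch with Python's `csv` module that turns dataset items into CSV text that Google Sheets can import; the column selection, the `"; "` separator for `schemaTypes`, and the `items_to_csv` name are choices of this sketch, not the actor's export format.

```python
import csv
import io

# Flat columns chosen for a spreadsheet; the nested `structuredData` payload is dropped.
COLUMNS = ["url", "title", "structuredDataCount", "jsonLdCount",
           "microdataCount", "rdfaCount", "schemaTypes", "error"]


def items_to_csv(items: list) -> str:
    """Serializes dataset items to CSV text, one row per analyzed URL."""
    buffer = io.StringIO()
    # extrasaction="ignore" skips fields not listed in COLUMNS (e.g. structuredData).
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        row = dict(item)
        # A list does not fit in one cell, so join the detected types.
        row["schemaTypes"] = "; ".join(item.get("schemaTypes") or [])
        writer.writerow(row)  # a null error is written as an empty cell
    return buffer.getvalue()
```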
Tips and best practices
- Focus on pages eligible for rich results -- prioritize product pages, articles, FAQ pages, and recipe pages where structured data directly impacts search appearance
- Filter by `schemaTypes` to quickly find pages missing specific types like Product, Article, or BreadcrumbList
- Use `structuredDataCount: 0` to find pages with no markup -- these are your biggest opportunities for SEO improvement
- Combine with JSON-LD Validator to first extract structured data with this actor, then validate the JSON-LD blocks for errors and warnings
- Schedule regular runs to catch structured data regressions after site deployments or CMS updates
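The two filtering tips can be expressed as short list comprehensions over dataset items. In this sketch the helper names are hypothetical, type names are matched exactly as they appear in `schemaTypes` (bare names such as `Product`, as in the output example), and pages whose `error` is set are excluded because their counts do not reflect the page's real markup.

```python
def pages_missing_type(items: list, required_type: str) -> list:
    """URLs of successfully analyzed pages whose schemaTypes lack `required_type`."""
    return [
        item["url"]
        for item in items
        if item.get("error") is None and required_type not in (item.get("schemaTypes") or [])
    ]


def pages_without_markup(items: list) -> list:
    """URLs of successfully analyzed pages where no structured data was found."""
    return [
        item["url"]
        for item in items
        if item.get("error") is None and item.get("structuredDataCount", 0) == 0
    ]
```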
FAQ
What structured data formats does this actor support? It extracts all three major formats: JSON-LD (script tags), Microdata (itemscope/itemprop attributes), and RDFa (typeof/property attributes).
Does it validate the structured data? No. This actor extracts and reports what structured data exists on a page. For validation of JSON-LD syntax and required fields, use the JSON-LD Validator actor.
Can it extract structured data from JavaScript-rendered pages? No. The actor uses plain HTTP requests and parses the initial HTML response. Structured data that is injected by client-side JavaScript after page load will not be captured.