Structured Data Extractor

Pricing: Pay per event

Developer: Stas Persiianenko (Maintained by Community)

Actor stats: 1 bookmarked · 5 total users · 3 monthly active users · last modified 10 hours ago


Extract JSON-LD, Microdata, and RDFa structured data from web pages for SEO auditing and Schema.org validation.

What does Structured Data Extractor do?

This actor extracts structured data markup from web pages. It parses all three major formats: JSON-LD (`<script type="application/ld+json">`), Microdata (`itemscope`/`itemprop`), and RDFa (`typeof`/`property`). For each page, it returns the full structured data objects, detected Schema.org types, and format counts. Use it to audit rich snippet eligibility, verify Schema.org implementation, or monitor structured data across your entire site.
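The actor's internal parser is not published, but to illustrate what JSON-LD extraction involves, here is a minimal stdlib-only Python sketch: it walks the HTML, collects the contents of `<script type="application/ld+json">` blocks, and reads out the `@type` values. The `JsonLdExtractor` class name and the sample HTML are illustrative, not part of the actor.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collects parsed contents of <script type="application/ld+json"> blocks."""
    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            try:
                self.blocks.append(json.loads(data))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than failing the page

html = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "headline": "Web scraping"}
</script>
</head><body></body></html>"""

parser = JsonLdExtractor()
parser.feed(html)
types = [b.get("@type") for b in parser.blocks]
print(types)  # ['Article']
```

A production extractor also has to handle Microdata and RDFa attributes and deduplicate nested items, which is exactly the work this actor does for you.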

Use cases

  • SEO specialists -- verify Schema.org markup implementation across hundreds of pages in a single run
  • Rich snippet auditors -- check that pages have the right structured data types for Google rich results (Product, Article, FAQ, etc.)
  • Competitive analysts -- see what structured data competitors use and identify markup opportunities you are missing
  • Migration testers -- ensure structured data survives CMS, domain, or URL migrations without data loss
  • Content monitoring teams -- track structured data changes across pages over time to catch regressions
  • AI/ML engineers -- extract structured Schema.org data to build knowledge graphs, enrich RAG pipelines, or create training datasets with clean entity relationships

Why use Structured Data Extractor?

  • All three formats -- extracts JSON-LD, Microdata, and RDFa in a single pass, so you never miss markup regardless of implementation
  • Full data objects -- returns the complete structured data payload, not just type names, so you can inspect every property
  • Batch processing -- analyze hundreds of URLs at once instead of checking pages one at a time in Google's testing tool
  • AI-ready structured output -- each result includes format counts, detected Schema.org types, and boolean flags, ready for LLM training data or knowledge graph construction
  • API and integration ready -- trigger runs programmatically or connect to dashboards via Google Sheets, Zapier, and more
  • Pay-per-event pricing -- only pay for pages you actually analyze, starting at $0.001 per URL

Input parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `urls` | string[] | Yes | -- | List of web page URLs to extract structured data from |

Example input

```json
{
  "urls": [
    "https://www.google.com",
    "https://en.wikipedia.org/wiki/Web_scraping",
    "https://www.imdb.com/title/tt0111161/"
  ]
}
```

Output example

```json
{
  "url": "https://en.wikipedia.org/wiki/Web_scraping",
  "title": "Web scraping - Wikipedia",
  "structuredDataCount": 2,
  "jsonLdCount": 1,
  "microdataCount": 1,
  "rdfaCount": 0,
  "schemaTypes": ["Article", "BreadcrumbList"],
  "structuredData": [
    {
      "type": "Article",
      "format": "json-ld",
      "data": { "@type": "Article", "name": "Web scraping", "headline": "Web scraping" }
    }
  ],
  "hasJsonLd": true,
  "hasMicrodata": true,
  "hasRdfa": false,
  "error": null,
  "extractedAt": "2026-03-01T12:00:00.000Z"
}
```

Output fields

| Field | Type | Description |
| --- | --- | --- |
| `url` | string | The analyzed page URL |
| `title` | string | The page title |
| `structuredDataCount` | number | Total number of structured data items found |
| `jsonLdCount` | number | Number of JSON-LD blocks found |
| `microdataCount` | number | Number of Microdata items found |
| `rdfaCount` | number | Number of RDFa items found |
| `schemaTypes` | string[] | List of detected Schema.org types |
| `structuredData` | array | Full structured data objects with type, format, and data |
| `hasJsonLd` | boolean | Whether the page contains any JSON-LD |
| `hasMicrodata` | boolean | Whether the page contains any Microdata |
| `hasRdfa` | boolean | Whether the page contains any RDFa |
| `error` | string | Error message if extraction failed, null otherwise |
| `extractedAt` | string | ISO timestamp of the extraction |

How to extract structured data from web pages

  1. Go to Structured Data Extractor on Apify Store
  2. Enter one or more URLs in the `urls` field
  3. Click Start to run the extractor
  4. Wait for results -- each page is analyzed in seconds
  5. Review the output for JSON-LD, Microdata, and RDFa structured data found on each page
  6. Download results as JSON, CSV, or Excel, or connect via API

How much does it cost to extract structured data?

Structured Data Extractor uses Apify's pay-per-event pricing model. You only pay for what you use.

| Event | Price | Description |
| --- | --- | --- |
| Start | $0.035 | One-time per run |
| URL extracted | $0.001 | Per page extracted |

Example costs:

  • 10 pages: $0.035 + 10 x $0.001 = $0.045
  • 100 pages: $0.035 + 100 x $0.001 = $0.135
  • 1,000 pages: $0.035 + 1,000 x $0.001 = $1.035
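The cost formula above is just a flat start fee plus a per-URL fee, so you can estimate any run size with a one-line function (a sketch using the listed prices; `run_cost` is an illustrative helper, not part of any API):

```python
START_FEE = 0.035    # one-time fee per run, in USD
PER_URL_FEE = 0.001  # fee per page extracted, in USD

def run_cost(num_urls: int) -> float:
    """Estimated cost in USD for a single run over num_urls pages."""
    return START_FEE + num_urls * PER_URL_FEE

for n in (10, 100, 1000):
    print(f"{n} pages: ${run_cost(n):.3f}")
```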

Using the Apify API

You can start Structured Data Extractor programmatically from your own applications using the Apify API. The following examples show how to run the actor and retrieve results in both Node.js and Python.

Node.js

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_TOKEN' });

const run = await client.actor('automation-lab/structured-data-extractor').call({
    urls: ['https://en.wikipedia.org/wiki/Web_scraping'],
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```

Python

```python
from apify_client import ApifyClient

client = ApifyClient('YOUR_TOKEN')

run = client.actor('automation-lab/structured-data-extractor').call(run_input={
    'urls': ['https://en.wikipedia.org/wiki/Web_scraping'],
})

items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)
```

cURL

```shell
curl -X POST "https://api.apify.com/v2/acts/automation-lab~structured-data-extractor/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://en.wikipedia.org/wiki/Web_scraping"]
  }'
```

Use with Claude AI (MCP)

This actor is available as a tool in Claude AI through the Model Context Protocol (MCP). Add it to Claude Desktop, Cursor, Windsurf, or any MCP-compatible client.

Setup for Claude Code

```shell
claude mcp add --transport http apify "https://mcp.apify.com"
```

Setup for Claude Desktop, Cursor, or VS Code

Add this to your MCP config file:

```json
{
  "mcpServers": {
    "apify": {
      "url": "https://mcp.apify.com"
    }
  }
}
```

Example prompts

  • "Extract structured data from this product page: https://www.example.com/product/123"
  • "Get schema.org markup from these URLs and tell me which types they use"
  • "Check if these pages have JSON-LD structured data for rich snippets"

Learn more in the Apify MCP documentation.

Integrations

Structured Data Extractor works with all major automation platforms available on Apify. Export results to Google Sheets to build a structured data audit dashboard across your site. Use Zapier or Make to trigger extraction runs whenever new pages are published. Send alerts to Slack when pages are missing expected Schema.org types. Pipe results into n8n workflows for custom validation logic, or set up webhooks to trigger downstream actions as soon as a run finishes. Chain it with JSON-LD Validator to first extract and then validate your structured data.

Tips and best practices

  • Focus on pages eligible for rich results -- prioritize product pages, articles, FAQ pages, and recipe pages where structured data directly impacts search appearance
  • Filter by `schemaTypes` to quickly find pages missing specific types like Product, Article, or BreadcrumbList
  • Use `structuredDataCount: 0` to find pages with no markup -- these are your biggest opportunities for SEO improvement
  • Combine with JSON-LD Validator to first extract structured data with this actor, then validate the JSON-LD blocks for errors and warnings
  • Schedule regular runs to catch structured data regressions after site deployments or CMS updates
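The filtering tips above are plain list comprehensions over the dataset items. A sketch, using made-up sample items in the actor's output shape (in practice you would fetch `items` via the API as shown earlier):

```python
# Hypothetical sample of dataset items in the actor's output shape.
items = [
    {"url": "https://example.com/p/1", "schemaTypes": ["Product", "BreadcrumbList"], "structuredDataCount": 2},
    {"url": "https://example.com/p/2", "schemaTypes": [], "structuredDataCount": 0},
    {"url": "https://example.com/blog/a", "schemaTypes": ["Article"], "structuredDataCount": 1},
]

# Pages with no structured data at all: the biggest SEO opportunities.
no_markup = [i["url"] for i in items if i["structuredDataCount"] == 0]

# Product pages (here: URLs containing /p/) missing Product markup.
missing_product = [
    i["url"] for i in items
    if "/p/" in i["url"] and "Product" not in i["schemaTypes"]
]

print(no_markup)         # ['https://example.com/p/2']
print(missing_product)   # ['https://example.com/p/2']
```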

Legality

This tool analyzes publicly accessible web content. Automated analysis of public web resources is standard practice in SEO and web development. Always respect robots.txt directives and rate limits when analyzing third-party websites. For personal data processing, ensure compliance with applicable privacy regulations.

FAQ

What structured data formats does this actor support? It extracts all three major formats: JSON-LD (script tags), Microdata (itemscope/itemprop attributes), and RDFa (typeof/property attributes).

Does it validate the structured data? No. This actor extracts and reports what structured data exists on a page. For validation of JSON-LD syntax and required fields, use the JSON-LD Validator actor.

Can it extract structured data from JavaScript-rendered pages? No. The actor uses plain HTTP requests and parses the initial HTML response. Structured data that is injected by client-side JavaScript after page load will not be captured.

The actor returns structuredDataCount: 0 for a page I know has structured data. Why? The actor uses plain HTTP requests and parses the initial HTML. If the structured data is injected by client-side JavaScript after page load (common with React, Angular, or Vue apps), it will not be captured. Test by viewing the page source (Ctrl+U) rather than the browser's inspector to see what the actor receives.

Why does the actor find Microdata but not JSON-LD on a page? Some websites use Microdata (HTML attributes like itemscope and itemprop) instead of JSON-LD script tags. Both are valid formats for structured data. The actor extracts both, and the format field in each structuredData entry tells you which format was used.
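To make the Microdata-vs-JSON-LD distinction concrete: Microdata lives in plain HTML attributes, so detecting it means scanning tag attributes rather than script contents. A minimal stdlib sketch (the `MicrodataCounter` class and sample HTML are illustrative, not the actor's implementation):

```python
from html.parser import HTMLParser

class MicrodataCounter(HTMLParser):
    """Counts itemscope elements and collects their itemtype values."""
    def __init__(self):
        super().__init__()
        self.itemscopes = 0
        self.itemtypes = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # valueless attributes like itemscope map to None
        if "itemscope" in a:
            self.itemscopes += 1
            if "itemtype" in a:
                self.itemtypes.append(a["itemtype"])

html = ('<div itemscope itemtype="https://schema.org/Product">'
        '<span itemprop="name">Widget</span></div>')

p = MicrodataCounter()
p.feed(html)
print(p.itemscopes, p.itemtypes)  # 1 ['https://schema.org/Product']
```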
