JSON-LD & Schema.org Extractor

Pricing

from $1.00 / 1,000 URLs checked

Extract JSON-LD structured data from web pages to audit Schema.org implementations and rich snippets.

Developer

Andok

Maintained by Community

Extract JSON-LD structured data and Schema.org markup from any web page without writing a custom parser. Structured data powers rich search results, knowledge panels, and product carousels, yet validating it at scale is painful. Feed in a list of URLs and get every JSON-LD block parsed, checked for malformed JSON, and returned as clean JSON objects ready for SEO audits or data pipelines.

Features

  • Full JSON-LD extraction — parses every <script type="application/ld+json"> block on the page
  • Error reporting — catches and reports malformed JSON so you can fix broken markup immediately
  • Schema.org support — handles Product, Article, BreadcrumbList, LocalBusiness, Recipe, Organization, and all other types
  • Bulk processing — scan hundreds of URLs in a single run with configurable concurrency
  • Clean structured output — each JSON-LD object is returned as a parsed JavaScript object, not a raw string
  • Pay-per-event billing — you only pay for each URL checked, with automatic charge-limit enforcement
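
The extraction and error-reporting features above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the actor's actual implementation; the names `JsonLdCollector` and `extract_json_ld` are hypothetical, and counting malformed blocks in `jsonLdCount` is an assumption.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # raw text of each JSON-LD script block
        self._buffer = None   # text chunks while inside a JSON-LD script, else None

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            script_type = (dict(attrs).get("type") or "").strip().lower()
            if script_type == "application/ld+json":
                self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def extract_json_ld(html):
    """Parse every JSON-LD block in `html`; malformed blocks become error messages."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    parsed, errors = [], []
    for index, raw in enumerate(collector.blocks):
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError as exc:
            errors.append(f"Block {index}: {exc.msg} (line {exc.lineno}, column {exc.colno})")
    # Assumption: the count includes blocks that failed to parse.
    return {"jsonLdCount": len(collector.blocks), "jsonLdData": parsed, "parseErrors": errors}
```

Scripts with any other `type` attribute (or none) are ignored, so inline JavaScript never reaches the JSON parser.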

Input

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| urls | array | Yes | - | List of page URLs to scan for JSON-LD structured data blocks. |
| url | string | No | - | Single URL to scan (for backwards compatibility). Use urls for bulk processing. |
| timeoutSeconds | integer | No | 15 | Maximum seconds to wait for each page response before timing out. |
| concurrency | integer | No | 10 | Number of URLs to process in parallel. Increase for large batches, decrease if you hit rate limits. |
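
To make the interplay of `concurrency` and `timeoutSeconds` concrete, here is a rough sketch of bounded parallel processing with a per-URL timeout. It assumes an `asyncio`-style worker; `check_all` and the `check_one` callback are hypothetical names, and the actor's real scheduler may work differently.

```python
import asyncio


async def check_all(urls, check_one, concurrency=10, timeout_seconds=15):
    """Run check_one(url) for every URL, with at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(url):
        async with semaphore:
            try:
                # Give up on a single URL once the per-page timeout elapses.
                return await asyncio.wait_for(check_one(url), timeout=timeout_seconds)
            except asyncio.TimeoutError:
                return {"inputUrl": url, "error": f"Timed out after {timeout_seconds}s"}

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(guarded(url) for url in urls))
```

Lowering `concurrency` reduces the number of simultaneous requests, which is the usual remedy when a target site starts rate limiting.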

Input Example

{
  "urls": [
    "https://crawlee.dev",
    "https://www.bbc.com/news",
    "https://www.amazon.com/dp/B09V3KXJPB"
  ]
}

Output

Each URL produces one dataset item containing all parsed JSON-LD objects and any parse errors.

  • inputUrl (string) — the URL you submitted
  • finalUrl (string) — the URL after redirects
  • status (number) — HTTP status code
  • jsonLdCount (number) — number of JSON-LD blocks found
  • jsonLdData (array) — list of parsed JSON-LD objects (each preserving its original @type, @context, etc.)
  • parseErrors (array) — list of error messages for malformed JSON-LD blocks
  • error (string | null) — error message if the URL could not be fetched
  • checkedAt (string) — ISO 8601 timestamp of when the check was performed

Output Example

{
  "inputUrl": "https://www.bbc.com/news",
  "finalUrl": "https://www.bbc.com/news",
  "status": 200,
  "jsonLdCount": 2,
  "jsonLdData": [
    {
      "@context": "https://schema.org",
      "@type": "WebPage",
      "name": "BBC News",
      "url": "https://www.bbc.com/news"
    },
    {
      "@context": "https://schema.org",
      "@type": "Organization",
      "name": "BBC News",
      "logo": "https://www.bbc.com/news/logo.png"
    }
  ],
  "parseErrors": [],
  "error": null,
  "checkedAt": "2025-01-15T10:30:00.000Z"
}
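
Once items like the one above are downloaded from the dataset, a common first step is to tally which Schema.org types appear. The following is a small sketch under the assumption that the items are already loaded as Python dictionaries; `iter_nodes` and `count_types` are illustrative names, and descending into `@graph` arrays is an addition for pages that wrap their nodes that way.

```python
from collections import Counter


def iter_nodes(obj):
    """Yield every JSON-LD node in obj, descending into lists and @graph arrays."""
    if isinstance(obj, list):
        for item in obj:
            yield from iter_nodes(item)
    elif isinstance(obj, dict):
        yield obj
        yield from iter_nodes(obj.get("@graph", []))


def count_types(items):
    """Count @type values across the jsonLdData of all dataset items."""
    counts = Counter()
    for item in items:
        for node in iter_nodes(item.get("jsonLdData", [])):
            types = node.get("@type", [])
            if isinstance(types, str):
                types = [types]  # @type may be a single string or a list
            counts.update(types)
    return counts
```

Items with an empty `jsonLdData` array contribute nothing, so failed fetches do not distort the tally.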

Pricing

| Event | Cost |
| --- | --- |
| URL Checked | Pay-per-event (see actor pricing page) |

The actor stops automatically when the per-run charge limit is reached, so you never overspend.
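
As a back-of-the-envelope illustration of how a charge limit caps a run, the sketch below estimates how many URLs fit within a budget. The function name is hypothetical, and the default price of $1.00 per 1,000 URLs is taken from the listing's "from" price, so treat it as an assumption and check the pricing page for the actual rate.

```python
from decimal import Decimal


def urls_within_budget(url_count, max_charge_usd, price_per_1000_usd="1.00"):
    """Estimate how many of `url_count` URLs can be checked before the limit is hit."""
    price_per_url = Decimal(price_per_1000_usd) / 1000
    # int() truncates, so a partially affordable URL is not counted.
    affordable = int(Decimal(max_charge_usd) / price_per_url)
    return min(url_count, affordable)
```

Passing amounts as strings keeps the arithmetic exact, which avoids off-by-one results from binary floating point.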

Use Cases

  • SEO auditing — verify that Schema.org markup is present and correctly structured across your entire site
  • Rich snippet QA — check Product, Recipe, and Article schemas before deploying to production
  • Data extraction — pull structured pricing, ratings, and author info without building a custom scraper per site
  • Competitive analysis — compare which structured data types your competitors implement to identify gaps
  • Migration validation — confirm that JSON-LD blocks survived a site redesign or CMS migration intact
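
For the migration-validation case, one possible approach is to run the actor before and after the change and compare the types found per URL. This is only a sketch: `types_by_url` and `lost_types` are hypothetical helpers, they look at top-level `@type` values only, and they assume both runs used the same `inputUrl` values.

```python
def types_by_url(items):
    """Map each inputUrl to the set of top-level @type values found on it."""
    result = {}
    for item in items:
        found = set()
        for node in item.get("jsonLdData", []):
            if not isinstance(node, dict):
                continue
            node_type = node.get("@type", [])
            found.update([node_type] if isinstance(node_type, str) else node_type)
        result[item["inputUrl"]] = found
    return result


def lost_types(before_items, after_items):
    """Return {url: sorted types present before the migration but missing after}."""
    before, after = types_by_url(before_items), types_by_url(after_items)
    report = {}
    for url, old in before.items():
        missing = old - after.get(url, set())
        if missing:
            report[url] = sorted(missing)
    return report
```

An empty report means every type seen before the migration is still present on the same URL.
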

Related Actors

| Actor | What it adds |
| --- | --- |
| OpenGraph & Twitter Card Inspector | Checks OG and Twitter Card tags; combine with JSON-LD extraction for a complete metadata audit |
| Website Tech Stack Analyzer | Detects the CMS and framework, so you can see which platform generates the structured data |
| Sitemap URL Extractor | Extracts all URLs from a sitemap; feed the output into this actor to audit JSON-LD across an entire site |