JSON-LD & Schema.org Extractor
Pricing
from $1.00 / 1,000 URLs checked
Extract structured data (JSON-LD) from web pages to audit SEO schema implementations and rich snippets.
Developer
Andok
Extract JSON-LD structured data and Schema.org markup from any web page without writing a custom parser. Structured data powers rich search results, knowledge panels, and product carousels -- yet validating it at scale is painful. Feed in a list of URLs and get every JSON-LD block parsed, validated, and returned as clean JSON objects ready for SEO audits or data pipelines.
Features
- Full JSON-LD extraction — parses every `<script type="application/ld+json">` block on the page
- Error reporting — catches and reports malformed JSON so you can fix broken markup immediately
- Schema.org support — handles Product, Article, BreadcrumbList, LocalBusiness, Recipe, Organization, and all other types
- Bulk processing — scan hundreds of URLs in a single run with configurable concurrency
- Clean structured output — each JSON-LD object is returned as a parsed JavaScript object, not a raw string
- Pay-per-event billing — you only pay for each URL checked, with automatic charge-limit enforcement
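The extraction and error-reporting behaviour listed above can be illustrated with a minimal Python sketch. This is not the actor's own code: `JsonLdCollector` and `extract_json_ld` are hypothetical names, and the sketch assumes the page HTML has already been downloaded. It uses only the standard library.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_block = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            type_attr = dict(attrs).get("type") or ""
            if type_attr.strip().lower() == "application/ld+json":
                self._in_block = True
                self._chunks = []

    def handle_data(self, data):
        if self._in_block:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_block:
            self.blocks.append("".join(self._chunks))
            self._in_block = False


def extract_json_ld(html):
    """Return (parsed_objects, parse_errors) for one HTML document."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    parsed, errors = [], []
    for index, raw in enumerate(collector.blocks):
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError as exc:
            # Malformed blocks are reported rather than aborting the page.
            errors.append(
                f"block {index}: {exc.msg} (line {exc.lineno}, column {exc.colno})"
            )
    return parsed, errors
```

Keeping malformed blocks in a separate error list, instead of raising, mirrors the idea that one broken block should not hide the valid ones on the same page.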
Input
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `urls` | array | Yes | — | List of page URLs to scan for JSON-LD structured data blocks. |
| `url` | string | No | — | Single URL to scan (for backwards compatibility). Use `urls` for bulk processing. |
| `timeoutSeconds` | integer | No | 15 | Maximum seconds to wait for each page response before timing out. |
| `concurrency` | integer | No | 10 | Number of URLs to process in parallel. Increase for large batches, decrease if you hit rate limits. |
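If you generate the input programmatically, a small helper can assemble the fields from the table above. The sketch below is illustrative only: `build_input` is a hypothetical helper, and its checks (deduplication, requiring absolute http(s) URLs, positive integers) are assumptions about sensible input rather than rules documented for the actor.

```python
def build_input(urls, timeout_seconds=15, concurrency=10):
    """Build an input payload using the field names and defaults from the table."""
    cleaned = []
    seen = set()
    for url in urls:
        url = url.strip()
        if not url or url in seen:
            continue  # skip blanks and duplicates (assumed to be wasted checks)
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
        seen.add(url)
        cleaned.append(url)
    if not cleaned:
        raise ValueError("at least one URL is required")
    if timeout_seconds < 1 or concurrency < 1:
        raise ValueError("timeout_seconds and concurrency must be positive")
    return {
        "urls": cleaned,
        "timeoutSeconds": timeout_seconds,
        "concurrency": concurrency,
    }
```

Serializing the returned dictionary with `json.dumps` gives a payload in the same shape as the input example.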
Input Example
{"urls": ["https://crawlee.dev","https://www.bbc.com/news","https://www.amazon.com/dp/B09V3KXJPB"]}
Output
Each URL produces one dataset item containing all parsed JSON-LD objects and any parse errors.
- `inputUrl` (string) — the URL you submitted
- `finalUrl` (string) — the URL after redirects
- `status` (number) — HTTP status code
- `jsonLdCount` (number) — number of JSON-LD blocks found
- `jsonLdData` (array) — list of parsed JSON-LD objects (each preserving its original `@type`, `@context`, etc.)
- `parseErrors` (array) — list of error messages for malformed JSON-LD blocks
- `error` (string | null) — error message if the URL could not be fetched
- `checkedAt` (string) — ISO 8601 timestamp of when the check was performed
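As an example of working with these fields, the sketch below tallies which Schema.org types appear across a list of dataset items. It assumes the items have already been loaded as Python dictionaries; `iter_nodes` and `count_types` are hypothetical helper names. Descending into `@graph` and top-level arrays is a precaution for pages that nest their markup, not something the field list above guarantees.

```python
from collections import Counter


def iter_nodes(obj):
    """Yield each JSON-LD node, descending into arrays and @graph containers."""
    if isinstance(obj, list):
        for item in obj:
            yield from iter_nodes(item)
    elif isinstance(obj, dict):
        yield obj
        yield from iter_nodes(obj.get("@graph") or [])


def count_types(items):
    """Count @type values across the jsonLdData of all dataset items."""
    counts = Counter()
    for item in items:
        for node in iter_nodes(item.get("jsonLdData") or []):
            types = node.get("@type") or []
            if isinstance(types, str):
                types = [types]  # @type may be a single string or a list
            counts.update(types)
    return counts
```

Calling `count_types(items).most_common()` gives a quick overview of which markup types dominate a crawl.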
Output Example
{"inputUrl": "https://www.bbc.com/news","finalUrl": "https://www.bbc.com/news","status": 200,"jsonLdCount": 2,"jsonLdData": [{"@context": "https://schema.org","@type": "WebPage","name": "BBC News","url": "https://www.bbc.com/news"},{"@context": "https://schema.org","@type": "Organization","name": "BBC News","logo": "https://www.bbc.com/news/logo.png"}],"parseErrors": [],"error": null,"checkedAt": "2025-01-15T10:30:00.000Z"}
Pricing
| Event | Cost |
|---|---|
| URL Checked | Pay-per-event (see actor pricing page) |
The actor stops automatically when the per-run charge limit is reached, so you never overspend.
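For budgeting, the arithmetic is straightforward. The sketch below assumes the listed starting rate of $1.00 per 1,000 URLs checked; the actual rate may differ by plan, so treat `PRICE_PER_1000_URLS` as a placeholder to adjust. Both function names are hypothetical.

```python
from decimal import Decimal

# Assumption: the listed "from" rate; replace with the rate on your plan.
PRICE_PER_1000_URLS = Decimal("1.00")


def estimated_cost(url_count, price_per_1000=PRICE_PER_1000_URLS):
    """Estimated charge in USD for checking url_count URLs."""
    return (Decimal(url_count) * price_per_1000 / 1000).quantize(Decimal("0.001"))


def max_urls_within(limit_usd, price_per_1000=PRICE_PER_1000_URLS):
    """Largest number of URL checks that fits under a per-run charge limit."""
    return int(Decimal(str(limit_usd)) * 1000 // price_per_1000)
```

At the assumed rate, 2,500 URLs would cost about $2.50, and a $5 charge limit would cover roughly 5,000 URL checks.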
Use Cases
- SEO auditing — verify that Schema.org markup is present and correctly structured across your entire site
- Rich snippet QA — check Product, Recipe, and Article schemas before deploying to production
- Data extraction — pull structured pricing, ratings, and author info without building a custom scraper per site
- Competitive analysis — compare which structured data types your competitors implement to identify gaps
- Migration validation — confirm that JSON-LD blocks survived a site redesign or CMS migration intact
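The migration-validation use case can be sketched as a comparison of two runs over the same URLs, one before and one after the change. The helpers below are hypothetical and assume each run's dataset items are available as lists of dictionaries with the `inputUrl` and `jsonLdData` fields described in the Output section.

```python
def types_by_url(items):
    """Map each inputUrl to the set of @type values found in its JSON-LD."""
    result = {}
    for item in items:
        found = set()
        stack = list(item.get("jsonLdData") or [])
        while stack:
            node = stack.pop()
            if isinstance(node, list):
                stack.extend(node)
            elif isinstance(node, dict):
                types = node.get("@type") or []
                found.update([types] if isinstance(types, str) else types)
                # Nested @graph containers are inspected too.
                stack.append(node.get("@graph") or [])
        result[item["inputUrl"]] = found
    return result


def missing_after_migration(before_items, after_items):
    """Report, per URL, the types present before the migration but absent after."""
    before = types_by_url(before_items)
    after = types_by_url(after_items)
    report = {}
    for url, old_types in before.items():
        lost = old_types - after.get(url, set())
        if lost:
            report[url] = sorted(lost)
    return report
```

An empty report means every type seen before the migration is still present. It does not prove the properties inside each object are unchanged, so a deeper field-level diff may still be worthwhile for critical pages.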
Related Actors
| Actor | What it adds |
|---|---|
| OpenGraph & Twitter Card Inspector | Checks OG and Twitter Card tags — combine with JSON-LD extraction for a complete metadata audit |
| Website Tech Stack Analyzer | Detects the CMS and framework — understand which platform generates the structured data |
| Sitemap URL Extractor | Extracts all URLs from a sitemap — feed the output into this actor to audit JSON-LD across an entire site |