JSON-LD Schema & Meta Tag Extractor

Pricing: from $3.50 / 1,000 results

Bulk JSON-LD structured data scraper and meta tag extractor for any URL. Export Schema.org, OpenGraph and Twitter Cards to CSV/JSON. No site API or login needed.

Rating: 0.0 (0 reviews)

Developer: Logiover (maintained by Community)

Actor stats: 1 bookmarked, 35 total users, 8 monthly active users, last modified a day ago

Extract JSON-LD (Schema.org) structured data and SEO meta tags from any URL. This structured data scraper and meta tag checker pulls JSON-LD blocks, standard meta tags, OpenGraph and Twitter Card markup from a list of pages and returns one clean, structured row per URL — ready for technical SEO audits, schema validation, competitor research and AI datasets.

If you need a JSON-LD extractor, Schema.org scraper, OpenGraph scraper or Twitter Card validator that runs in bulk and exports clean JSON and CSV, this Actor is built for it. No login, no browser.


What you get

One row per URL, with every structured-data and metadata source on the page:

Field        Description
url          The scraped page URL
title        The HTML <title> of the page
description  The <meta name="description"> value
jsonLd       Array of all JSON-LD objects (<script type="application/ld+json">) found on the page
openGraph    OpenGraph tag object (og:title, og:type, og:image, og:url, og:site_name, …)
twitter      Twitter Card tag object (twitter:card, twitter:title, twitter:image, …)
scrapeDate   ISO 8601 timestamp of the scrape, for diffing and change tracking

Example output

{
  "url": "https://example.com/product/abc",
  "title": "ABC Product — Example",
  "description": "Buy ABC Product with fast shipping.",
  "jsonLd": [
    {
      "@context": "https://schema.org",
      "@type": "Product",
      "name": "ABC Product",
      "offers": { "@type": "Offer", "price": "49.99", "priceCurrency": "USD" }
    }
  ],
  "openGraph": {
    "og:title": "ABC Product — Example",
    "og:type": "product",
    "og:image": "https://example.com/images/abc.jpg",
    "og:url": "https://example.com/product/abc"
  },
  "twitter": {
    "twitter:card": "summary_large_image",
    "twitter:title": "ABC Product — Example"
  },
  "scrapeDate": "2026-06-05T12:00:00.000Z"
}

Export everything as JSON or CSV, or pull it straight from the Apify API into your reporting stack.
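To pull results programmatically, you can download a finished run's dataset using only the Python standard library. This is a minimal sketch. It assumes the Apify API v2 dataset-items endpoint and that you already have a dataset ID and an API token; both are placeholders here. It is an illustration, not code shipped with this Actor.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id: str, token: str, fmt: str = "json") -> str:
    """Build the dataset-items URL; fmt selects the export format, e.g. "json" or "csv"."""
    query = urllib.parse.urlencode({"format": fmt, "token": token})
    return f"{API_BASE}/datasets/{urllib.parse.quote(dataset_id, safe='')}/items?{query}"


def fetch_rows(dataset_id: str, token: str) -> list:
    """Download every row (one dict per scraped URL) as parsed JSON."""
    with urllib.request.urlopen(dataset_items_url(dataset_id, token), timeout=60) as resp:
        return json.load(resp)


# Usage (placeholders, not real IDs):
# rows = fetch_rows("YOUR_DATASET_ID", "YOUR_API_TOKEN")
# for row in rows:
#     print(row["url"], len(row["jsonLd"]))
```

Switching fmt to "csv" returns the same dataset as CSV text, which you would read with resp.read() instead of json.load.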


Use cases

  • Technical SEO audits — validate Schema.org coverage and consistency across pages, which underpins rich-result eligibility, and catch missing or malformed OpenGraph and Twitter metadata before it breaks social previews.
  • Schema validation & QA — spot malformed JSON-LD and missing required or recommended properties (Product without offers, Article without author, etc.) across a whole site.
  • Competitor analysis — reverse-engineer which schema types competitors deploy (Product, FAQPage, Recipe, Article, Organization, BreadcrumbList) and how they structure their data.
  • Social preview QA — detect missing og:image or wrong twitter:card that break link previews on LinkedIn, X, Slack and Facebook.
  • AI training & content datasets — build structured datasets that pair page metadata with Schema.org markup for LLM and RAG pipelines.
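For the schema and social-preview QA use cases, a few lines of post-processing over the exported JSON go a long way. The sketch below assumes rows shaped like the example output above, with openGraph and twitter keyed by the full tag name. It checks only the sample rules named above: og:image, twitter:card, Product offers and Article author. The function names are hypothetical, and the rule set is deliberately minimal rather than a full Schema.org validator.

```python
from typing import Any, Iterator


def iter_nodes(json_ld: list) -> Iterator[dict]:
    """Yield each JSON-LD object, expanding arrays and @graph containers."""
    for block in json_ld:
        for item in (block if isinstance(block, list) else [block]):
            if not isinstance(item, dict):
                continue
            graph = item.get("@graph")
            if isinstance(graph, list):
                yield from (g for g in graph if isinstance(g, dict))
            else:
                yield item


def node_types(node: dict) -> list:
    """Return @type as a list, whether it was a string or an array."""
    t = node.get("@type")
    if isinstance(t, list):
        return t
    return [t] if t else []


def audit_row(row: dict[str, Any]) -> list[str]:
    """Return human-readable issues for one output row (hypothetical, minimal rule set)."""
    issues = []
    og = row.get("openGraph") or {}
    tw = row.get("twitter") or {}
    if not og.get("og:image"):
        issues.append("missing og:image")
    if not tw.get("twitter:card"):
        issues.append("missing twitter:card")
    for node in iter_nodes(row.get("jsonLd") or []):
        types = node_types(node)
        if "Product" in types and "offers" not in node:
            issues.append("Product without offers")
        if "Article" in types and "author" not in node:
            issues.append("Article without author")
    return issues


# Usage over exported rows:
# for row in rows:
#     problems = audit_row(row)
#     if problems:
#         print(row["url"], problems)
```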

How to use

  1. Click Try for free / Start.
  2. Add the pages you want to audit under Target URLs.
  3. (Recommended) Leave Proxy Configuration on to avoid blocking on larger crawls.
  4. Click Save & Start.
  5. Export results as JSON or CSV, or fetch them via the Apify REST API.

Example input

{
  "startUrls": [
    { "url": "https://www.imdb.com/title/tt0111161/" },
    { "url": "https://www.allrecipes.com/recipe/158968/spinach-and-feta-turkey-burgers/" }
  ],
  "proxyConfiguration": { "useApifyProxy": true }
}
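The same input can also be built and submitted from code. This is a minimal sketch built on a few assumptions. It uses Apify's run-sync-get-dataset-items endpoint, which starts a run, waits for it and returns the dataset items. The actor ID is a placeholder in the API's username~actor-name form. build_input and run_sync_request are hypothetical helper names. Because the call blocks until the run finishes, it suits small batches; large crawls are better started as a regular run and read back from the dataset afterwards.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_input(urls: list, use_proxy: bool = True) -> dict:
    """Turn a plain URL list into the input shape above, dropping blanks and duplicates."""
    unique = list(dict.fromkeys(u.strip() for u in urls if u.strip()))
    return {
        "startUrls": [{"url": u} for u in unique],
        "proxyConfiguration": {"useApifyProxy": use_proxy},
    }


def run_sync_request(actor_id: str, token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST that runs the Actor and returns its dataset items."""
    path = f"/acts/{urllib.parse.quote(actor_id, safe='~')}/run-sync-get-dataset-items"
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        f"{API_BASE}{path}?{query}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage (ACTOR_ID is a placeholder such as "username~actor-name"):
# req = run_sync_request("ACTOR_ID", "YOUR_API_TOKEN", build_input(["https://example.com/"]))
# with urllib.request.urlopen(req, timeout=300) as resp:
#     rows = json.load(resp)
```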

How it works

For each URL the Actor fetches the served HTML and parses out four signal sources: every <script type="application/ld+json"> block (returned as parsed objects in jsonLd), the page <title> and meta description, all og:* OpenGraph tags, and all twitter:* Twitter Card tags. Results are normalized into one structured row per URL with a scrapeDate timestamp so you can diff runs over time. It's pure HTTP, so runs are fast and inexpensive.
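To make the parsing step concrete, here is a rough standard-library Python approximation of the four extractions described above. It is not this Actor's source code. Several choices are assumptions made for the sketch: the parser (html.parser), flattening array-valued JSON-LD blocks into one list, and silently skipping blocks that fail to parse.

```python
import json
import urllib.request
from datetime import datetime, timezone
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect <title>, meta description, og:*, twitter:* and JSON-LD from one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = None
        self.description = None
        self.open_graph: dict = {}
        self.twitter: dict = {}
        self.json_ld: list = []
        self._capture = None  # "title" or "ld" while inside those elements
        self._buf: list = []

    def handle_starttag(self, tag, attrs):
        a = {k: v or "" for k, v in attrs}
        if tag == "title" and self.title is None:
            self._capture, self._buf = "title", []
        elif tag == "script" and a.get("type", "").strip().lower() == "application/ld+json":
            self._capture, self._buf = "ld", []
        elif tag == "meta":
            # OpenGraph usually uses property=, Twitter Cards usually use name=.
            key = a.get("property") or a.get("name") or ""
            lowered = key.lower()
            content = a.get("content", "")
            if lowered == "description":
                self.description = content
            elif lowered.startswith("og:"):
                self.open_graph[key] = content
            elif lowered.startswith("twitter:"):
                self.twitter[key] = content

    def handle_data(self, data):
        if self._capture:
            self._buf.append(data)

    def handle_endtag(self, tag):
        text = "".join(self._buf)
        if tag == "title" and self._capture == "title":
            self.title = text.strip()
            self._capture = None
        elif tag == "script" and self._capture == "ld":
            self._capture = None
            try:
                parsed = json.loads(text)
            except json.JSONDecodeError:
                return  # assumption: malformed blocks are skipped in this sketch
            # A block may hold a single object or an array of objects.
            self.json_ld.extend(parsed if isinstance(parsed, list) else [parsed])


def extract(url: str, html: str) -> dict:
    """Parse served HTML into one row shaped like the example output."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": parser.title,
        "description": parser.description,
        "jsonLd": parser.json_ld,
        "openGraph": parser.open_graph,
        "twitter": parser.twitter,
        "scrapeDate": datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z"),
    }


def fetch_and_extract(url: str) -> dict:
    """Plain HTTP GET (no browser, so client-side injected JSON-LD is not seen)."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return extract(url, resp.read().decode(charset, errors="replace"))
```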


Pro tips

  • Audit schema coverage at scale — export JSON and aggregate the @type values across jsonLd to see exactly which Schema.org types your site covers (Organization, WebSite, Article, Product, FAQPage, BreadcrumbList, Recipe, LocalBusiness).
  • Compare templates — run the homepage, a category page, a product page and a blog post together to catch template-level structured-data bugs.
  • Track regressions — schedule daily or weekly runs and diff scrapeDate-stamped outputs to detect schema or OpenGraph regressions right after a deployment.
  • Start small — test a single URL first, then scale up to thousands with proxy enabled.
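The coverage and regression tips can be scripted over exported JSON. This sketch assumes each row matches the output schema above and handles the common JSON-LD layouts: it expands @graph containers and array-valued @type. type_coverage counts how many pages carry each type. diff_runs reports which watched fields changed per URL between two runs. Both function names are hypothetical.

```python
from collections import Counter


def iter_nodes(json_ld):
    """Yield each JSON-LD object, expanding arrays and @graph containers."""
    for block in json_ld:
        for item in (block if isinstance(block, list) else [block]):
            if not isinstance(item, dict):
                continue
            graph = item.get("@graph")
            if isinstance(graph, list):
                yield from (g for g in graph if isinstance(g, dict))
            else:
                yield item


def type_coverage(rows):
    """Count how many pages expose each Schema.org @type (each type counted once per page)."""
    counts = Counter()
    for row in rows:
        page_types = set()
        for node in iter_nodes(row.get("jsonLd") or []):
            t = node.get("@type")
            if isinstance(t, list):
                page_types.update(t)
            elif t:
                page_types.add(t)
        counts.update(page_types)
    return counts


# scrapeDate is left out on purpose: it changes on every run.
WATCHED = ("title", "description", "jsonLd", "openGraph", "twitter")


def diff_runs(old_rows, new_rows):
    """Map each URL present in both runs to the watched fields whose values changed."""
    previous = {row["url"]: row for row in old_rows}
    changes = {}
    for row in new_rows:
        before = previous.get(row["url"])
        if before is None:
            continue
        changed = [field for field in WATCHED if before.get(field) != row.get(field)]
        if changed:
            changes[row["url"]] = changed
    return changes
```

Counting each type once per page answers "how many pages have Product markup?" rather than "how many Product objects exist?". The second question is the wrong one when a template repeats the same block.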

FAQ

What is JSON-LD and why extract it?

JSON-LD is the format Google recommends for Schema.org structured data, and Google reads it to power rich results (stars, prices, FAQs, recipes). Extracting it lets you verify that every page exposes the correct, complete markup search engines need.

Why is my jsonLd field empty?

Either the page does not implement JSON-LD, the schema is injected client-side by JavaScript (this Actor reads server-rendered HTML), or the request was rate-limited. Enable proxy and test a single URL first.

Can it scrape OpenGraph and Twitter Cards too?

Yes. Every run returns the full openGraph and twitter tag objects alongside JSON-LD, so you can audit structured data and social-preview metadata in one pass.

How many URLs can I run at once?

You can pass thousands of URLs per run. Keep proxy enabled for large crawls to avoid blocking and rate limits.

What export formats are supported?

JSON, CSV, Excel and HTML table, plus programmatic access through the Apify REST API, as with every Apify Actor.

How do I extract JSON-LD structured data in bulk without an API?

Paste your list of URLs and run. The Actor reads the served HTML directly, so there is no site API, login or browser to set up — it returns parsed JSON-LD for every page in one pass.

How do I export Schema.org and meta tag data to CSV or JSON?

Every run writes a clean dataset you can download as CSV or JSON (or pull via the Apify REST API), with one row per URL containing JSON-LD, OpenGraph and Twitter Card fields.

Can I use this as a bulk OpenGraph and Twitter Card scraper?

Yes. It doubles as a bulk OpenGraph scraper and Twitter Card validator, returning the full openGraph and twitter tag objects alongside the JSON-LD for thousands of URLs per run.


Pairs well with

  • Website to Markdown & Text Crawler — clean page content for LLM and RAG pipelines.
  • Website SEO Audit Crawler — full on-page SEO audit for every page.
  • Broken Link Checker — find dead 404 links across a whole site.

📝 Changelog

2026-06-07

  • Docs: added coverage for bulk JSON-LD extraction without an API, exporting Schema.org and meta tag data to CSV/JSON, and using the Actor as a bulk OpenGraph/Twitter Card scraper.

2026-06-05

  • 🛡️ Reliability fix: results are no longer dropped by strict output validation — runs now complete cleanly even at high volume (thousands of results).
  • ⚡ Stability & performance hardening; fresh rebuild.

2026-06-04

  • Verified live & refreshed build — reliability/maintenance pass.