Structured Data Scraper (Schema.org)

Pricing

Pay per event

Developed by

Datavault

Maintained by Community

Fast, lightweight scraper that extracts structured data (JSON-LD & microdata) from HTML pages. Ideal for e-commerce and sites that embed schema.org markup without heavy client-side rendering.

Last modified

8 days ago

A fast scraper optimized for sites that embed schema.org structured data in their HTML and do not depend on heavy client-side rendering. It works especially well for e-commerce sites.

Speed comes first: the scraper stays lightweight because it parses static HTML instead of launching a browser. Pages that only emit their markup after client-side rendering are outside its scope and may need a headless browser (for example Playwright or Puppeteer).

What you get

  • Schema.org payloads collected from JSON-LD <script> tags and microdata attributes.
  • Final URL, status code, and page title for quick validation.
  • Dataset output suitable for feeding into validation tools or downstream pipelines.
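
To make the JSON-LD part concrete, here is a minimal Python sketch of how such blocks can be pulled out of static HTML with the standard library alone. It is an illustration under assumptions, not the actor's actual implementation: the names JsonLdExtractor and extract_json_ld are hypothetical, and malformed blocks are simply skipped.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            # Attribute values can be None, so fall back to an empty string.
            script_type = (dict(attrs).get("type") or "").strip().lower()
            if script_type == "application/ld+json":
                self._in_jsonld = True
                self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks; buffer them all.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            raw = "".join(self._buffer).strip()
            try:
                self.blocks.append(json.loads(raw))
            except json.JSONDecodeError:
                pass  # Assumption: malformed blocks are skipped silently.


def extract_json_ld(html):
    """Return a list with one parsed value per valid JSON-LD script tag."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Because nothing here executes JavaScript, markup that is injected client-side would not be seen, which matches the static-HTML trade-off described above.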

Input

Provide at least one URL via url (string, array, or Apify request object) or urls (array). Optional settings:

  • maxRequestsPerCrawl – stop the crawl after N requests (defaults to the number of provided URLs).
  • proxyConfiguration – standard Apify proxy configuration block.
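
The accepted input shapes can be read as a small normalization step. The Python sketch below shows one plausible interpretation; the function name normalize_input is hypothetical, and treating a missing or zero maxRequestsPerCrawl as "use the number of URLs" is an assumption rather than documented behavior.

```python
def normalize_input(actor_input):
    """Return (start_urls, max_requests) from a raw input dict."""
    urls = []
    for key in ("url", "urls"):
        value = actor_input.get(key)
        if value is None:
            continue
        # Accept either a single entry or a list of entries.
        entries = value if isinstance(value, list) else [value]
        for entry in entries:
            if isinstance(entry, str):
                urls.append(entry)
            elif isinstance(entry, dict) and isinstance(entry.get("url"), str):
                # Request-object form: {"url": "https://..."}
                urls.append(entry["url"])
    if not urls:
        raise ValueError("Provide at least one URL via 'url' or 'urls'.")
    # Assumption: an unset (or zero) limit falls back to the URL count.
    max_requests = actor_input.get("maxRequestsPerCrawl") or len(urls)
    return urls, max_requests
```

With this reading, {"url": "https://example.com/a"} yields a single start URL and a request limit of 1.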

Output

Each dataset item contains:

  • inputUrl, loadedUrl, statusCode, title, retrievedAt
  • schema.jsonLd – parsed JSON-LD blocks
  • schema.microdata – microdata trees normalised into nested objects
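
For downstream use, a consumer will often want every entity of a given schema.org type from an item. The following Python sketch walks schema.jsonLd recursively; it assumes that field is a list of parsed JSON-LD values (which may nest entities under keys such as @graph), and the helper name find_by_type is hypothetical.

```python
def find_by_type(item, wanted_type):
    """Collect every JSON-LD node in a dataset item whose @type matches."""
    found = []

    def visit(node):
        if isinstance(node, list):
            for child in node:
                visit(child)
        elif isinstance(node, dict):
            node_type = node.get("@type")
            # @type may be a single string or a list of strings.
            types = node_type if isinstance(node_type, list) else [node_type]
            if wanted_type in types:
                found.append(node)
            # Descend into nested values such as @graph or offers.
            for child in node.values():
                visit(child)

    visit(item.get("schema", {}).get("jsonLd", []))
    return found
```

Calling find_by_type(item, "Product") on an item returns the matching nodes in document order, including ones nested inside other entities.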

Sample INPUT.json

{
  "url": [
    {
      "url": "https://schema.dev/blog/schema-markup-builder-video-walkthroughs/"
    },
    {
      "url": "https://schema.dev/blog/schema-seo-boost-your-websites-visibility-with-structured-data/"
    },
    {
      "url": "https://schema.dev/blog/schema-tests-unleashing-the-full-potential-of-your-seo-strategy/"
    },
    {
      "url": "https://schema.dev/blog/understanding-product-schema-a-key-to-better-product-visibility-online/"
    },
    {
      "url": "https://schema.dev/blog/5-types-of-schema-markup-every-legal-service-should-use-for-seo/"
    }
  ]
}