Pricing

from $10.00 / 1,000 results

Json Ld Schema Extractor

Rating: 0.0 (0)
Developer: Donny
Last modified: 7 hours ago

JSON-LD Schema Markup Extractor

What it does

Extracts JSON-LD structured data markup from web pages. For each URL you provide, the actor fetches the HTML, finds every script tag with type "application/ld+json", parses its JSON content, and outputs the schema.org data it finds: the schema type, context, name, description, and the raw schema JSON. It is useful for SEO audits, competitive analysis, schema validation, and understanding how websites implement structured data markup for search engines.
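The extraction step described above can be sketched in a few lines. This is an illustrative Python version using only the standard library, not the actor's actual code (the actor itself runs on Node.js); the names JsonLdParser and extract_json_ld are hypothetical, and skipping malformed JSON silently is an assumption about the desired behavior.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            type_attr = dict(attrs).get("type") or ""
            self._in_json_ld = type_attr.strip().lower() == "application/ld+json"
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append("".join(self._buffer))
            self._in_json_ld = False


def extract_json_ld(html):
    """Return a list of parsed JSON-LD values found in an HTML string."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    schemas = []
    for block in parser.blocks:
        try:
            parsed = json.loads(block)
        except json.JSONDecodeError:
            continue  # assumption: malformed markup is skipped, not fatal
        # A single script tag may hold one object or a list of objects.
        schemas.extend(parsed if isinstance(parsed, list) else [parsed])
    return schemas


sample_html = (
    '<html><head><script type="application/ld+json">'
    '{"@context":"https://schema.org","@type":"Organization","name":"BBC"}'
    "</script><script>var x = 1;</script></head></html>"
)
print([schema["@type"] for schema in extract_json_ld(sample_html)])
```

Ordinary script tags are ignored because only the type attribute "application/ld+json" switches collection on.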

This Apify actor automates the collection of data from a public API or website, extracting structured information and saving it directly into an Apify dataset. It handles pagination automatically where applicable, supports configurable result limits, and includes robust error handling with timeouts on all HTTP requests. The actor is designed for reliability: it validates inputs, applies sensible defaults, and produces a fallback record when no results are found, so your downstream workflows never receive an empty dataset. Built on the Apify SDK with native Node.js 20 fetch for lightweight, fast execution without browser overhead.

Why use it

Manually collecting data from web APIs and websites is tedious and error-prone. This actor eliminates that burden by running in the cloud on the Apify platform, where it can be scheduled, integrated with webhooks, or chained with other actors. Whether you are conducting research, building a knowledge base, monitoring data sources, or feeding data into an analytics pipeline, this actor gives you structured, ready-to-use JSON output with zero browser overhead. It uses lightweight HTTP requests instead of a full browser, which makes it fast and cost-effective. Every request includes a 120-second timeout to prevent hanging, and all string fields are null-checked for data consistency.
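As a rough illustration of the lightweight, timeout-guarded request described above, the following Python sketch fetches a page with the standard library only. The function name fetch_html, the User-Agent string, and the choice to return None on failure are assumptions for illustration; only the 120-second timeout comes from the description.

```python
import urllib.request

REQUEST_TIMEOUT_SECONDS = 120  # the timeout value stated in the description


def fetch_html(url, timeout=REQUEST_TIMEOUT_SECONDS):
    """Return the response body as text, or None if the request fails."""
    try:
        request = urllib.request.Request(
            url, headers={"User-Agent": "json-ld-sketch/0.1"}  # hypothetical UA
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except (OSError, ValueError, LookupError):
        # OSError covers URLError, HTTPError and socket timeouts;
        # ValueError covers malformed URLs; LookupError covers unknown charsets.
        return None
```

Returning None instead of raising lets a caller continue with the remaining URLs when one page is unreachable.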

Input parameters

urls (array, required): List of webpage URLs to extract JSON-LD schema from. Default: ["https://www.wikipedia.org", "https://www.bbc.com"].

All inputs are validated at startup with sensible defaults applied when values are missing. The actor will log warnings for any misconfigured options and continue with safe defaults rather than failing outright.
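The validate-and-fall-back behavior described above might look like the following Python sketch. The function name normalize_input, the warning wording, and the rule that only http:// and https:// strings count as valid are assumptions; the default URL list is the one given for the urls parameter.

```python
DEFAULT_URLS = ["https://www.wikipedia.org", "https://www.bbc.com"]


def normalize_input(actor_input):
    """Return (urls, warnings), falling back to defaults instead of failing."""
    warnings = []
    if not isinstance(actor_input, dict):
        actor_input = {}
    raw_urls = actor_input.get("urls")
    if not isinstance(raw_urls, list) or not raw_urls:
        warnings.append("'urls' is missing or empty; using default URLs")
        return list(DEFAULT_URLS), warnings

    urls = []
    for value in raw_urls:
        # Assumption: only absolute http(s) URLs given as strings are accepted.
        if isinstance(value, str) and value.strip().startswith(("http://", "https://")):
            urls.append(value.strip())
        else:
            warnings.append(f"Skipping invalid URL entry: {value!r}")

    if not urls:
        warnings.append("No valid URLs remain; using default URLs")
        return list(DEFAULT_URLS), warnings
    return urls, warnings
```

For example, normalize_input({"urls": ["https://example.com", 42]}) keeps the first entry and records one warning for the second.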

Output data

Each item in the output dataset contains the following fields:

url: The URL of the page
schemaType: The @type value from the JSON-LD
schemaContext: The @context value from the JSON-LD
name: The name field if present in the schema
description: The description field if present in the schema
rawSchema: The full schema serialized as a JSON string, truncated to 1,000 characters

All string fields are null-checked; missing values are stored as null rather than undefined.
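To make the field mapping concrete, here is a minimal Python sketch that turns one parsed JSON-LD object into a record with the fields listed above. The helper names are hypothetical. Treating non-string values (for example a list-valued @type) as null, and the exact shape of the fallback record for pages with no JSON-LD, are assumptions rather than documented behavior.

```python
import json

RAW_SCHEMA_MAX_CHARS = 1000  # truncation length stated for rawSchema


def _string_or_none(value):
    """Null-check: keep non-empty strings, map everything else to None."""
    return value if isinstance(value, str) and value else None


def to_record(url, schema):
    """Map one parsed JSON-LD object to the documented output fields."""
    return {
        "url": url,
        "schemaType": _string_or_none(schema.get("@type")),
        "schemaContext": _string_or_none(schema.get("@context")),
        "name": _string_or_none(schema.get("name")),
        "description": _string_or_none(schema.get("description")),
        "rawSchema": json.dumps(schema, separators=(",", ":"))[:RAW_SCHEMA_MAX_CHARS],
    }


def build_records(url, schemas):
    """Return one record per schema, or a single fallback record if none exist."""
    records = [to_record(url, s) for s in schemas if isinstance(s, dict)]
    if not records:
        # Assumed fallback shape: same keys, all schema fields set to None.
        records.append({
            "url": url,
            "schemaType": None,
            "schemaContext": None,
            "name": None,
            "description": None,
            "rawSchema": None,
        })
    return records
```

Because rawSchema is cut at 1,000 characters, it may not be parseable as JSON on its own for larger schemas.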

Example output

{
    "url": "https://www.bbc.com",
    "schemaType": "Organization",
    "schemaContext": "https://schema.org",
    "name": "BBC",
    "description": "The BBC is the world's leading public service broadcaster.",
    "rawSchema": "{\"@context\":\"https://schema.org\",\"@type\":\"Organization\",\"name\":\"BBC\"...}"
}

Pricing

This actor is priced on a usage basis:

$0.01 per result returned in the dataset.
$0.005 per actor start (fixed platform fee).

For example, a single run that returns 500 results costs approximately $5.005 (500 × $0.01 plus one $0.005 start fee). Apify provides free monthly credits for new users, so you can try the actor at no charge. Actual costs depend on the number of results, API response times, and memory allocation. You can control costs by limiting the number of URLs per run, or by setting the maxResults parameter if your version of the actor exposes it (it is not listed under Input parameters above). For high-volume use cases, consider running the actor on a schedule during off-peak hours to optimize platform resource usage.
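The pricing arithmetic can be checked with a short calculation. This Python sketch uses the two listed prices; the function name estimate_cost and the runs parameter are illustrative, and the estimate ignores free credits and any other platform charges.

```python
from decimal import Decimal

PRICE_PER_RESULT = Decimal("0.01")   # $0.01 per result in the dataset
PRICE_PER_START = Decimal("0.005")   # $0.005 per actor start


def estimate_cost(results, runs=1):
    """Estimate the cost in USD for a number of results spread over some runs."""
    return PRICE_PER_RESULT * results + PRICE_PER_START * runs


print(estimate_cost(500))  # 5.005
```

Decimal is used instead of float so that sub-cent amounts such as the start fee are not distorted by binary rounding.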

More scrapers from brave_paradise

Visit the brave_paradise profile on Apify to see the full catalogue of scrapers built by brave_paradise.
