Pricing

from $10.00 / 1,000 pages extracted

Web Structured Data Extractor (Claude, JSON Schema)

Pass a URL + JSON schema (or natural-language goal). Claude reads the page and returns a strict JSON object matching your schema. Product / news / hotel / real-estate / job-board extraction. BYO Anthropic API key. $0.01 per page.

Developer

Hojun Lee

Last modified

11 days ago

Web Structured Data Extractor (Claude)

Input Parameters

Parameter	Type	Default	Description
`url`	string	`—`	URL of the page to extract from.
`schema`	object	`{}`	If provided, Claude will fill an object matching this schema. Use 'goa
`goal`	string	``	Natural-language description (e.g. 'price, in stock, top review'). Use
`model`	string	`claude-opus-4-7`	Claude model.
`anthropicApiKey`	string	`—`	Get one at console.anthropic.com.
`userAgent`	string	``	Custom UA for page fetch.

Why this exists

You want to scrape structured fields out of arbitrary web pages: price, SKU, rating, hours, contact info, reviews. A per-site CSS-selector scraper is brittle because it breaks whenever the site changes its markup. General-purpose LLM extraction is more robust, but you have to write the prompts and parse the output yourself.

This actor wraps the whole pipeline:

URL → clean Markdown via trafilatura
Markdown + your schema/goal → Claude
Claude returns strict JSON → we parse and validate
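
The steps above can be sketched roughly as follows. This is an illustration, not the actor's actual source: `build_prompt`, `parse_model_json`, and `extract` are hypothetical names, the prompt wording is invented, and the model call is passed in as a plain callable so the sketch runs without network access. In a real run the Markdown would come from trafilatura and `call_model` would wrap the Anthropic API.

```python
import json
from typing import Callable, Optional


def build_prompt(markdown: str, schema: Optional[dict] = None, goal: str = "") -> str:
    """Combine page text with a schema and/or goal (hypothetical prompt wording)."""
    parts = ["Extract data from the page below. Reply with one JSON object only."]
    if schema:
        parts.append("JSON Schema:\n" + json.dumps(schema, indent=2))
    if goal:
        parts.append("Goal: " + goal)
    parts.append("Page:\n" + markdown)
    return "\n\n".join(parts)


def parse_model_json(raw: str) -> dict:
    """Parse the reply, tolerating stray text around the JSON object."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(raw[start:end + 1])


def extract(markdown: str, call_model: Callable[[str], str],
            schema: Optional[dict] = None, goal: str = "") -> dict:
    """Run prompt -> model -> parse; call_model is an assumed stand-in for Claude."""
    raw = call_model(build_prompt(markdown, schema, goal))
    return {"extracted_data": parse_model_json(raw), "raw_output": raw}


# Usage with a stub model, so no API key is needed to try the flow.
def fake_model(prompt: str) -> str:
    return 'Here you go:\n{"name": "Nintendo Switch 2", "in_stock": true}'


result = extract("# Nintendo Switch 2\nIn stock", fake_model, goal="name, in stock")
print(result["extracted_data"])
```

Passing the model as a callable keeps the parsing logic testable on its own, which helps when tuning prompts.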

Same idea as Diffbot's Article API ($299/mo) or Browse AI's extraction ($99/mo), but with your own Claude API key and no monthly subscription.

What you get

{
  "url": "https://...",
  "model": "claude-opus-4-7",
  "goal": "Extract product info",
  "schema_used": true,
  "extracted_data": {
    "name": "Nintendo Switch 2",
    "price_usd": 499.99,
    "in_stock": true,
    "rating": 4.8,
    "reviews_count": 1247
  },
  "raw_output": "...",
  "input_chars": 5230,
  "usage": {"input_tokens": 1450, "output_tokens": 80}
}
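
A record like the one above could be consumed along these lines. This is a minimal sketch: `ExtractionResult` is a hypothetical helper, the field names are taken from the sample, and it assumes missing `usage` counts can be treated as zero.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ExtractionResult:
    url: str
    model: str
    extracted_data: dict[str, Any]
    input_tokens: int = 0
    output_tokens: int = 0

    @classmethod
    def from_item(cls, item: dict[str, Any]) -> "ExtractionResult":
        """Build a typed record from one output item shaped like the sample."""
        usage = item.get("usage") or {}
        return cls(
            url=item["url"],
            model=item["model"],
            extracted_data=item.get("extracted_data") or {},
            input_tokens=int(usage.get("input_tokens", 0)),
            output_tokens=int(usage.get("output_tokens", 0)),
        )

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


# The sample record from above, trimmed to the fields this sketch reads.
item = {
    "url": "https://...",
    "model": "claude-opus-4-7",
    "extracted_data": {"name": "Nintendo Switch 2", "price_usd": 499.99},
    "usage": {"input_tokens": 1450, "output_tokens": 80},
}
result = ExtractionResult.from_item(item)
print(result.extracted_data["price_usd"], result.total_tokens)  # prints: 499.99 1530
```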

Two ways to specify what to extract

Option 1: JSON Schema (recommended)

{
  "url": "https://www.amazon.com/dp/B07VPHN6CR",
  "schema": {
    "type": "object",
    "properties": {
      "name":         {"type": "string"},
      "price_usd":    {"type": "number"},
      "in_stock":     {"type": "boolean"},
      "rating":       {"type": "number"},
      "reviews_count":{"type": "integer"}
    }
  },
  "anthropicApiKey": "sk-ant-..."
}

Option 2: Natural language goal

{
  "url": "https://...",
  "goal": "Extract product name, USD price, in-stock status, average rating",
  "anthropicApiKey": "sk-ant-..."
}

You can combine both — schema sets the shape, goal sets emphasis.
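
As a sketch of the two options, a small helper can assemble the input object from whichever of `schema` and `goal` you have. `build_actor_input` is a hypothetical name. The field names come from the Input Parameters table. Rejecting an input that has neither a schema nor a goal is an assumption of this sketch, not documented actor behavior.

```python
from typing import Any, Optional


def build_actor_input(url: str, api_key: str, schema: Optional[dict] = None,
                      goal: str = "", model: Optional[str] = None) -> dict[str, Any]:
    """Assemble an input object with a schema, a goal, or both."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must start with http:// or https://")
    if not schema and not goal:
        # Assumption: treat a missing extraction target as a caller mistake.
        raise ValueError("provide a schema, a goal, or both")
    actor_input: dict[str, Any] = {"url": url, "anthropicApiKey": api_key}
    if schema:
        actor_input["schema"] = schema
    if goal:
        actor_input["goal"] = goal
    if model:
        actor_input["model"] = model  # omit to use the actor's default model
    return actor_input


# Schema sets the shape, goal sets the emphasis.
combined = build_actor_input(
    "https://example.com/product/1",
    "sk-ant-...",
    schema={"type": "object", "properties": {"price_usd": {"type": "number"}}},
    goal="Prefer the sale price if one is shown",
)
print(sorted(combined))
```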

Use cases

E-commerce competitor monitoring — Track price + availability across competitors
Real estate listing aggregation — Extract beds/baths/price/sqft from Zillow, Redfin, Realtor
Job board scraping — Title, company, salary, location, remote-flag from LinkedIn, Indeed
News article fact extraction — Get the same 5 fields from any news source
Hotel / travel research — Name, rating, price/night, amenities from any booking site

Pricing

Pay-Per-Event: $0.01 per page (Apify-side).

Anthropic tokens are charged separately by Anthropic. Typical per-page costs:

Page complexity	Input tokens	Anthropic cost	Total per page
Simple product page	~1,500	~$0.008	~$0.018
Long article	~4,000	~$0.020	~$0.030
Big e-commerce listing	~8,000	~$0.040	~$0.050

Use Haiku for batch / cheap extraction (~10x cheaper).
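
The arithmetic behind the table can be reproduced with a short estimate. The rate of $5 per million input tokens is inferred from the table rows ($0.020 for about 4,000 tokens) and is an assumption, not an official Anthropic price. Output tokens are ignored, as they appear to be in the table. `estimate_cost_per_page` is a hypothetical helper.

```python
APIFY_FEE_PER_PAGE = 0.01          # Pay-Per-Event price stated above
ASSUMED_USD_PER_MTOK_INPUT = 5.0   # inferred from the table, not an official price


def estimate_cost_per_page(input_tokens: int,
                           usd_per_mtok: float = ASSUMED_USD_PER_MTOK_INPUT) -> float:
    """Apify fee plus input-token cost for one page, in USD."""
    anthropic_cost = input_tokens * usd_per_mtok / 1_000_000
    return round(APIFY_FEE_PER_PAGE + anthropic_cost, 4)


for label, tokens in [("Simple product page", 1500),
                      ("Long article", 4000),
                      ("Big e-commerce listing", 8000)]:
    print(f"{label}: ~${estimate_cost_per_page(tokens):.4f} per page")
```

To compare models, pass a different `usd_per_mtok` taken from the current Anthropic price list.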

Setting your Anthropic API key

See Article Summarizer README for the full BYO API key guide. Short version:

Get key at console.anthropic.com
Paste in anthropicApiKey input (Apify saves it encrypted)
Or save as Apify Account-level Secret and reference as @MY_KEY

Tips for reliable extraction

Better prompts → better results. A goal string like "extract product name, USD price, in-stock boolean, rating 1-5" outperforms "extract product info".
Constrain types in the schema. "type": "number" is stricter than "type": ["string","number","null"].
Test with Haiku first. Haiku 4.5 is fast and 10x cheaper for prototyping. Switch to Opus 4.7 when you need accuracy.
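
To illustrate the tip about constraining types, here is a deliberately tiny checker that covers only the JSON Schema `type` keyword on flat objects. It is a simplified sketch, not a full validator, and `check_types` is a hypothetical name. For real validation, use a proper JSON Schema library.

```python
from typing import Any

_JSON_TYPES = {
    "string": str, "number": (int, float), "integer": int,
    "boolean": bool, "object": dict, "array": list, "null": type(None),
}


def _matches(value: Any, type_name: str) -> bool:
    # bool is a subclass of int in Python, so handle it before numeric checks.
    if isinstance(value, bool):
        return type_name == "boolean"
    return isinstance(value, _JSON_TYPES[type_name])


def check_types(data: dict, schema: dict) -> list[str]:
    """Return one message per property whose value violates its declared type."""
    errors = []
    for key, rule in schema.get("properties", {}).items():
        if key not in data:
            continue
        allowed = rule.get("type", [])
        if isinstance(allowed, str):
            allowed = [allowed]
        if allowed and not any(_matches(data[key], t) for t in allowed):
            errors.append(f"{key}: expected {' or '.join(allowed)}, "
                          f"got {type(data[key]).__name__}")
    return errors


strict = {"properties": {"price_usd": {"type": "number"}}}
loose = {"properties": {"price_usd": {"type": ["string", "number", "null"]}}}
scraped = {"price_usd": "$499.99"}  # price returned as text instead of a number

print(check_types(scraped, strict))  # ['price_usd: expected number, got str']
print(check_types(scraped, loose))   # []
```

The strict schema surfaces the unparsed price string, while the permissive one lets it through unnoticed.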

Related actors

Article Summarizer: a TL;DR summary instead of structured fields
Web Page → Markdown Converter: just the page body, no LLM
HTML Metadata Extractor: cheaper for OG / Twitter / JSON-LD metadata
JSON Schema Generator: bootstrap a schema from samples

Feedback

A short review helps engineers find it: Leave a review on Apify Store

Structured Data Extractor — URL to JSON

shelvick/structured-extractor

Extract structured data from a batch of URLs as schema-validated JSON. Send web pages and a JSON Schema; it scrapes each (stealth + residential proxy as needed), runs an LLM to convert the page to JSON matching your schema, and validates per URL. Omit schema for best-effort. Public pages only.

Scott Helvick

Resume / CV Parser (Claude → Structured JSON)

gochujang/resume-parser

Pass a PDF resume URL (or text). Returns structured JSON: name, email, phone, location, current title, skills, education, experience (with highlights), languages, links. Powered by Claude with strict schema. BYO Anthropic API key. $0.02 per resume.

Hojun Lee

Schema Markup Extractor - JSON-LD SEO Data

benthepythondev/schema-markup-extractor

Extract Schema.org JSON-LD structured data from web pages, including schema types, nodes, block counts and parse errors.

Ben

Website JSON-LD and Schema.org Extractor

automationagents/web-json-ld

Extract structured JSON-LD and Schema.org data from any web page. Pull products, articles, breadcrumbs, and rich results for SEO and data work.

Alex Jordan

AI Web Scraper — Any Site to JSON with GPT or Claude

flash_scraper/ai-universal-scraper

AI web scraper that turns any URL into clean, structured JSON. List the fields you want or describe them in plain English, bring your own OpenAI (GPT) or Anthropic (Claude) key, and the model reads the page like a human — no CSS selectors, no per-site code. Export JSON, CSV, or Excel.

Flash Scrape

5.0

Schema.org Markup Validator

scrappy_garden/schema-org-markup-validator

Validate Schema.org structured data for SEO. Parses JSON-LD, detects Microdata and RDFa, highlights schema types, and reports common issues like invalid JSON-LD, missing @type, non-schema.org @context, and missing key properties for popular schema types.

Bikram Adhikari

JSON-LD Extractor - Schema.org Structured Data & Rich Snippets

ninhothedev/json-ld-extractor

$0.5/1K 🔥 JSON-LD structured data extractor! Pull Schema.org markup — products, articles, recipes & events — from any URL. No API key. Export JSON, CSV, Excel or API in seconds. Perfect for SEO audits & RAG ⚡

ninhothedev

Schema Markup Generator — JSON-LD Structured Data

perryay/schema-markup-generator

Generate Google-approved JSON-LD structured data for 30+ Schema.org types. Supports SEO schema like Article, Product, FAQ, Event, LocalBusiness, and more.

Perry AY

Claude AI Web Automation

dtrungtin/claude-ai-web-automation

A real browser with Anthropic's Claude models to navigate any website and extract structured data — no CSS selectors or page-specific scraping code required.

Tin

Validate Dataset(s) with JSON Schema

jaroslavhejlek/validate-dataset-with-json-schema

This Actor validates items in one or more datasets against a provided JSON Schema. Use it if you are planning to add a dataset validation schema to your Actor and want to test it.