Web Structured Data Extractor (Claude, JSON Schema)

Pricing

Pay per usage

Pass a URL + JSON schema (or natural-language goal). Claude reads the page and returns a strict JSON object matching your schema. Product / news / hotel / real-estate / job-board extraction. BYO Anthropic API key. $0.01 per page.

Developer

Hojun Lee

Maintained by Community

Web Structured Data Extractor (Claude)

Why this exists

You want to pull structured fields out of arbitrary web pages: price, SKU, rating, hours, contact info, reviews. A per-site CSS-selector scraper is brittle because the selectors break whenever a site changes its markup. General-purpose LLM extraction is more robust, but you still have to write the prompt and parse the output yourself.

This actor wraps the whole pipeline:

  1. URL → clean Markdown via trafilatura
  2. Markdown + your schema/goal → Claude
  3. Claude returns strict JSON → we parse and validate
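The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actor's actual source: `build_prompt`, `run_pipeline`, the prompt wording, and the injected `fetch_markdown` / `call_model` callables are all hypothetical names, and the real fetch (trafilatura) and model call (Anthropic API) are replaced by offline stubs so the example runs without network access.

```python
import json
from typing import Callable, Optional


def build_prompt(markdown: str, schema: Optional[dict], goal: Optional[str]) -> str:
    """Combine page text with the schema and/or goal (hypothetical prompt format)."""
    parts = ["Return only a JSON object with the requested fields."]
    if goal:
        parts.append(f"Goal: {goal}")
    if schema:
        parts.append("JSON Schema:\n" + json.dumps(schema, indent=2))
    parts.append("Page content:\n" + markdown)
    return "\n\n".join(parts)


def run_pipeline(
    url: str,
    fetch_markdown: Callable[[str], str],  # step 1, e.g. backed by trafilatura
    call_model: Callable[[str], str],      # step 2, e.g. backed by the Anthropic API
    schema: Optional[dict] = None,
    goal: Optional[str] = None,
) -> dict:
    markdown = fetch_markdown(url)
    raw_output = call_model(build_prompt(markdown, schema, goal))
    # Step 3: parse the reply; a non-JSON reply raises json.JSONDecodeError.
    extracted = json.loads(raw_output)
    if not isinstance(extracted, dict):
        raise ValueError("model did not return a JSON object")
    return {
        "url": url,
        "schema_used": schema is not None,
        "extracted_data": extracted,
        "raw_output": raw_output,
        "input_chars": len(markdown),  # assumed to mean the Markdown length
    }


# Offline stubs standing in for the real fetcher and model.
result = run_pipeline(
    "https://example.com/item",
    fetch_markdown=lambda url: "# Widget\nPrice: $9.50",
    call_model=lambda prompt: '{"name": "Widget", "price_usd": 9.5}',
    goal="Extract product name and USD price",
)
```

Passing the fetcher and the model call in as functions keeps the parsing step testable on its own, which is where most extraction failures surface.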

Same idea as Diffbot's Article API ($299/mo) or Browse AI's extraction ($99/mo), but with your own Claude API key and no monthly subscription.


What you get

{
  "url": "https://...",
  "model": "claude-opus-4-7",
  "goal": "Extract product info",
  "schema_used": true,
  "extracted_data": {
    "name": "Nintendo Switch 2",
    "price_usd": 499.99,
    "in_stock": true,
    "rating": 4.8,
    "reviews_count": 1247
  },
  "raw_output": "...",
  "input_chars": 5230,
  "usage": {"input_tokens": 1450, "output_tokens": 80}
}
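On the consuming side, it can be worth re-checking `extracted_data` against the types you asked for before storing it. The sketch below is a hand-rolled check for flat schemas only. `check_types` is a hypothetical helper, not part of the actor, and it treats every listed property as required, which is stricter than JSON Schema's default. A full validator such as the `jsonschema` package would be the more complete choice.

```python
from typing import List

# Mapping from JSON Schema primitive type names to Python types.
JSON_TYPES = {"string": str, "number": (int, float), "integer": int, "boolean": bool}


def check_types(data: dict, schema: dict) -> List[str]:
    """Return human-readable problems; an empty list means the data fits."""
    problems = []
    for field, spec in schema.get("properties", {}).items():
        type_name = spec.get("type")
        expected = JSON_TYPES.get(type_name) if isinstance(type_name, str) else None
        if field not in data:
            problems.append(f"{field}: missing")
        elif expected is None:
            continue  # union or nested types are outside this sketch
        elif isinstance(data[field], bool) and type_name != "boolean":
            # bool is a subclass of int in Python, so reject it explicitly.
            problems.append(f"{field}: expected {type_name}, got boolean")
        elif not isinstance(data[field], expected):
            actual = type(data[field]).__name__
            problems.append(f"{field}: expected {type_name}, got {actual}")
    return problems


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price_usd": {"type": "number"},
        "in_stock": {"type": "boolean"},
        "rating": {"type": "number"},
        "reviews_count": {"type": "integer"},
    },
}
item = {
    "extracted_data": {
        "name": "Nintendo Switch 2",
        "price_usd": 499.99,
        "in_stock": True,
        "rating": 4.8,
        "reviews_count": 1247,
    }
}
problems = check_types(item["extracted_data"], schema)  # no problems for this sample
```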

Two ways to specify what to extract

Option 1: JSON Schema

{
  "url": "https://www.amazon.com/dp/B07VPHN6CR",
  "schema": {
    "type": "object",
    "properties": {
      "name": {"type": "string"},
      "price_usd": {"type": "number"},
      "in_stock": {"type": "boolean"},
      "rating": {"type": "number"},
      "reviews_count": {"type": "integer"}
    }
  },
  "anthropicApiKey": "sk-ant-..."
}

Option 2: Natural language goal

{
  "url": "https://...",
  "goal": "Extract product name, USD price, in-stock status, average rating",
  "anthropicApiKey": "sk-ant-..."
}

You can combine both: the schema sets the shape of the output, and the goal tells Claude what to emphasize.
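As a rough sketch, the input can be assembled and submitted from Python like this. `build_run_input` and `run_actor` are hypothetical helpers, and `actor_id` is a placeholder you would replace with this actor's ID from the Apify Store. The Apify calls assume the `apify-client` package and that the actor writes its results to the run's default dataset, which this README does not state. The rule that at least one of `schema` or `goal` must be present is also an assumption.

```python
from typing import Optional


def build_run_input(url: str, api_key: str,
                    schema: Optional[dict] = None,
                    goal: Optional[str] = None) -> dict:
    """Assemble the actor input using the field names shown above."""
    if schema is None and goal is None:
        # Assumption: the actor needs at least one of the two.
        raise ValueError("provide a schema, a goal, or both")
    run_input = {"url": url, "anthropicApiKey": api_key}
    if schema is not None:
        run_input["schema"] = schema
    if goal is not None:
        run_input["goal"] = goal
    return run_input


def run_actor(apify_token: str, actor_id: str, run_input: dict) -> list:
    """Start a run and collect its dataset items (needs `pip install apify-client`)."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(apify_token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Schema and goal combined: the schema fixes the shape, the goal adds emphasis.
combined = build_run_input(
    "https://www.amazon.com/dp/B07VPHN6CR",
    api_key="sk-ant-...",
    schema={"type": "object", "properties": {"price_usd": {"type": "number"}}},
    goal="Prefer the current sale price over the list price",
)
```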


Use cases

  1. E-commerce competitor monitoring — Track price + availability across competitors
  2. Real estate listing aggregation — Extract beds/baths/price/sqft from Zillow, Redfin, Realtor
  3. Job board scraping — Title, company, salary, location, remote-flag from LinkedIn, Indeed
  4. News article fact extraction — Get the same 5 fields from any news source
  5. Hotel / travel research — Name, rating, price/night, amenities from any booking site

Pricing

Pay-Per-Event: $0.01 per page (Apify-side).

Anthropic tokens charged separately. Typical:

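| Page complexity        | Input tokens | Anthropic cost | Total per page |
|------------------------|--------------|----------------|----------------|
| Simple product page    | ~1500        | ~$0.008        | $0.018         |
| Long article           | ~4000        | ~$0.020        | $0.030         |
| Big e-commerce listing | ~8000        | ~$0.040        | $0.050         |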

Use Haiku for batch / cheap extraction (~10x cheaper).
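The per-page figures above scale linearly, so a batch budget is simple arithmetic: the $0.01 Apify fee plus the Anthropic cost, multiplied by the page count. The sketch below uses only the numbers from this section. `estimate_total` is a hypothetical helper, and the Anthropic figure is the approximate per-page value listed above, not a quoted token price.

```python
APIFY_FEE_PER_PAGE = 0.01  # Pay-Per-Event price from this section


def estimate_total(pages: int, anthropic_cost_per_page: float) -> float:
    """Total USD for a batch: Apify per-page fee plus Anthropic token cost."""
    return round(pages * (APIFY_FEE_PER_PAGE + anthropic_cost_per_page), 2)


# 1,000 simple product pages at about $0.008 of Anthropic usage each.
simple_batch = estimate_total(1000, 0.008)   # 18.0
# 1,000 long articles at about $0.020 each.
article_batch = estimate_total(1000, 0.020)  # 30.0
```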


Setting your Anthropic API key

See Article Summarizer README for the full BYO API key guide. Short version:

  1. Get a key at console.anthropic.com
  2. Paste it into the anthropicApiKey input (Apify stores it encrypted)
  3. Or save it as an Apify account-level secret and reference it as @MY_KEY
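If you start runs from your own scripts, one way to keep the key out of source control is to read it from an environment variable. This is a minimal sketch: `load_api_key` is a hypothetical helper, the variable name `ANTHROPIC_API_KEY` is a common convention rather than something this actor requires, and the `sk-ant-` prefix check is based only on the key format shown in the examples above.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ,
                 var_name: str = "ANTHROPIC_API_KEY") -> str:
    """Read the Anthropic key from the environment and sanity-check its shape."""
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set")
    if not key.startswith("sk-ant-"):
        raise RuntimeError(f"{var_name} does not look like an Anthropic key")
    return key


# Demonstration with an explicit mapping instead of the real environment.
key = load_api_key({"ANTHROPIC_API_KEY": "sk-ant-example"})
run_input = {"url": "https://...", "goal": "Extract product info", "anthropicApiKey": key}
```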

Tips for reliable extraction

  • Better prompts → better results. A goal string like "extract product name, USD price, in-stock boolean, rating 1-5" outperforms "extract product info".
  • Constrain types in the schema. "type": "number" is stricter than "type": ["string","number","null"].
  • Test with Haiku first. Haiku 4.5 is fast and 10x cheaper for prototyping. Switch to Opus 4.7 when you need accuracy.


Feedback

A short review helps other engineers find this actor: Leave a review on Apify Store