Product Data Extractor (price, stock, rating)
Pricing
Pay per usage
Extract clean, normalized product data — name, price, currency, availability, brand, rating, SKU/GTIN, image — from public product pages via JSON-LD, microdata, and OpenGraph. HTML-only, fast, structured output.
Developer
Tommy G
Maintained by Community
Product Data Extractor (Apify Actor)
Give it public product page URLs, get back clean, normalized product data — name, price, currency, availability, in-stock, brand, rating, SKU/GTIN/MPN, image — pulled from JSON-LD, microdata, and OpenGraph. HTML-only (no headless browser) so it's fast and cheap. Ideal for price monitoring, competitor tracking, catalog enrichment, and feed building.
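As a rough illustration of the JSON-LD path described above, the sketch below scans raw HTML for `application/ld+json` script blocks and returns the first node typed `Product`. The function name `findJsonLdProduct` and the regex-based scan are assumptions made for this example; the actor's real logic in src/extract.js may work differently.

```javascript
// Hypothetical sketch, not the actor's src/extract.js.
// Returns the first JSON-LD node whose @type includes "Product", or null.
function findJsonLdProduct(html) {
  const scriptRe = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  for (const match of html.matchAll(scriptRe)) {
    let data;
    try {
      data = JSON.parse(match[1]);
    } catch {
      continue; // skip malformed JSON-LD blocks
    }
    // A block may hold one node, an array of nodes, or an @graph wrapper.
    const nodes = [data]
      .flat()
      .flatMap((n) => (n && Array.isArray(n["@graph"]) ? n["@graph"] : [n]));
    // @type may be a string or an array of strings.
    const product = nodes.find((n) => n && [n["@type"]].flat().includes("Product"));
    if (product) return product;
  }
  return null;
}
```

A page with no such block (for example a shop that renders its product data client-side) yields `null` here, which corresponds to the found:false case in the output.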
Why it's useful (and money-first)
Price/stock monitoring is one of the most-demanded scraping jobs. This actor turns messy
product markup (which comes in dozens of shapes — Offer vs AggregateOffer, price as string vs
number, 1.299,00 vs $1,299.00, availability URLs vs text) into one stable, tidy record.
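To make the normalization problem concrete, here is a minimal sketch of how price and availability values in those shapes could be collapsed into one form. The helpers `parsePrice` and `normalizeAvailability` are hypothetical names for this example, and the separator heuristic is an assumption, not the actor's confirmed behavior.

```javascript
// Hypothetical helpers, not the actor's actual code.
// Converts a price given as a number or a locale-formatted string into a Number.
function parsePrice(raw) {
  if (typeof raw === "number") return Number.isFinite(raw) ? raw : null;
  if (typeof raw !== "string") return null;

  // Keep only digits and separators: "$1,299.00" -> "1,299.00"
  const cleaned = raw.replace(/[^\d.,]/g, "");
  if (!/\d/.test(cleaned)) return null;

  const lastDot = cleaned.lastIndexOf(".");
  const lastComma = cleaned.lastIndexOf(",");
  let normalized;
  if (lastDot !== -1 && lastComma !== -1) {
    // Both present: whichever comes last is taken as the decimal separator.
    const decimal = lastDot > lastComma ? "." : ",";
    const thousands = decimal === "." ? "," : ".";
    normalized = cleaned.split(thousands).join("").replace(decimal, ".");
  } else if (lastComma !== -1) {
    // Only commas: exactly three trailing digits is treated as grouping ("1,299"),
    // anything else as a decimal comma ("19,99"). This is a heuristic.
    const tail = cleaned.length - lastComma - 1;
    normalized = tail === 3 ? cleaned.split(",").join("") : cleaned.replace(",", ".");
  } else {
    // Only dots: several dots are treated as grouping, a single dot as decimal.
    const dots = (cleaned.match(/\./g) || []).length;
    normalized = dots > 1 ? cleaned.split(".").join("") : cleaned;
  }
  const value = Number(normalized);
  return Number.isFinite(value) ? value : null;
}

// Maps schema.org availability URLs and loose text onto one token.
function normalizeAvailability(raw) {
  if (typeof raw !== "string") return null;
  // "https://schema.org/InStock" -> "instock"; "Out of stock" -> "outofstock"
  const token = raw.split("/").pop().replace(/[\s_-]/g, "").toLowerCase();
  const known = new Map([
    ["instock", "InStock"],
    ["outofstock", "OutOfStock"],
    ["preorder", "PreOrder"],
  ]);
  return known.get(token) ?? null;
}
```

Inputs such as `"1,299"` are genuinely ambiguous without knowing the page's locale, so a production extractor would likely need an extra signal (currency, `lang` attribute) beyond this heuristic.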
Input
{
  "startUrls": [{ "url": "https://scrapeme.live/shop/Bulbasaur/" }],
  "maxConcurrency": 5,
  "maxPages": 100
}
maxPages is capped at 200 and maxConcurrency at 20 (cost guard).
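One way such a cost guard could be applied is to clamp the input before any crawling starts. In the sketch below the caps (200 and 20) come from the text above; the function name `clampInput` and the fallback defaults (100 pages, 5 concurrent, borrowed from the sample input) are assumptions.

```javascript
// Hypothetical input normalizer. Caps are from the README; defaults are assumed.
const LIMITS = { maxPages: 200, maxConcurrency: 20 };

function clampInput(input = {}) {
  // Use the fallback for missing or non-positive values, then apply the cap.
  const clamp = (value, fallback, max) => {
    const n = Number.isInteger(value) && value > 0 ? value : fallback;
    return Math.min(n, max);
  };
  return {
    startUrls: Array.isArray(input.startUrls) ? input.startUrls : [],
    maxPages: clamp(input.maxPages, 100, LIMITS.maxPages),
    maxConcurrency: clamp(input.maxConcurrency, 5, LIMITS.maxConcurrency),
  };
}
```

With this sketch, a request for `maxPages: 5000` would be reduced to 200 without failing the run.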
Output — one STABLE record per URL (ok and error rows share the shape)
{
  "status": "ok",
  "requested_url": "https://shop.example.com/widget",
  "final_url": "https://shop.example.com/widget",
  "http_status": 200,
  "found": true,
  "source": "json-ld",
  "name": "Acme Widget",
  "brand": "Acme",
  "price": 19.99,
  "currency": "USD",
  "availability": "InStock",
  "in_stock": true,
  "rating_value": 4.5,
  "rating_count": 231,
  "sku": "AW-1",
  "gtin": "0123456789012",
  "mpn": null,
  "image": "https://cdn.example.com/w.jpg",
  "description": "...",
  "offers_count": 1,
  "extracted_at": "2026-05-29T..."
}
source is one of json-ld | microdata | opengraph | none. found:false means no product data
was present in the page markup (e.g. a blog or a JS-rendered shop). Failed fetches return the
same keys with status:"error" plus an error field.
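Because every row shares one shape, downstream code can branch on `status` and `found` alone. The following sketch shows one possible consumer that sorts dataset rows into three buckets; the function name `summarize` and the bucket names are hypothetical, while the field names follow the sample record.

```javascript
// Hypothetical consumer of the actor's output rows.
function summarize(records) {
  const summary = { priced: [], noMarkup: [], failed: [] };
  for (const r of records) {
    if (r.status === "error") {
      // Fetch failed: keep the URL and the reported error.
      summary.failed.push({ url: r.requested_url, error: r.error });
    } else if (!r.found) {
      // Page loaded but exposed no product markup.
      summary.noMarkup.push(r.requested_url);
    } else {
      summary.priced.push({
        url: r.final_url,
        price: r.price,
        currency: r.currency,
        in_stock: r.in_stock,
      });
    }
  }
  return summary;
}
```

For price monitoring, the `noMarkup` bucket is worth reviewing separately, since those URLs may need a browser-based scraper instead.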
Run locally / test
npm install
npm test   # unit tests on the pure extractor (node:test)
Publish to Apify (account-holder's step)
npm install -g apify-cli
apify login   # free Apify account
apify push    # from this directory
Keep it free initially; enable pricing later via the adult account-holder once it shows repeat organic usage and clears a margin gate.
Notes / safety
- SSRF-guarded (scheme + private/metadata IP block + redirect re-check), robots-respecting, rate-limited, cost-capped, all via the shared src/lib/actor_runner.js.
- Stores only derived product fields: no raw page bodies / PII.
- HTML-only: client-rendered shops that inject product JSON via JS will return found:false (no server-side markup to read).
Core logic in src/extract.js (pure, unit-tested).
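The scheme and private/metadata IP checks mentioned in the notes could look roughly like the sketch below. It is not the code in src/lib/actor_runner.js: the function names are hypothetical, and it only inspects the URL string. A full guard would also resolve hostnames through DNS, handle IPv6 ranges properly, and repeat the check on every redirect target.

```javascript
const net = require("node:net");

// Hypothetical helper: true for IPv4 literals in loopback, private,
// or link-local ranges (the last includes the 169.254.169.254 metadata endpoint).
function isBlockedIPv4(ip) {
  const [a, b] = ip.split(".").map(Number);
  return (
    a === 0 ||                            // 0.0.0.0/8 "this network"
    a === 10 ||                           // 10.0.0.0/8 private
    a === 127 ||                          // 127.0.0.0/8 loopback
    (a === 169 && b === 254) ||           // 169.254.0.0/16 link-local
    (a === 172 && b >= 16 && b <= 31) ||  // 172.16.0.0/12 private
    (a === 192 && b === 168)              // 192.168.0.0/16 private
  );
}

// Hypothetical pre-fetch check on a user-supplied URL.
function isUrlAllowed(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  const host = url.hostname;
  if (host === "localhost") return false;
  if (net.isIP(host) === 4) return !isBlockedIPv4(host);
  if (host.startsWith("[")) return false; // IPv6 literal: rejected outright in this sketch
  return true; // plain hostname: DNS resolution check omitted here
}
```

The same `isUrlAllowed` call would be applied to each redirect `Location` before following it, which is what the "redirect re-check" in the notes refers to.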