Product Data Extractor (price, stock, rating)

Pricing

Pay per usage

Extract clean, normalized product data — name, price, currency, availability, brand, rating, SKU/GTIN, image — from public product pages via JSON-LD, microdata, and OpenGraph. HTML-only, fast, structured output.

Developer

Tommy G

Maintained by Community

Product Data Extractor (Apify Actor)

Give it public product page URLs, get back clean, normalized product data — name, price, currency, availability, in-stock, brand, rating, SKU/GTIN/MPN, image — pulled from JSON-LD, microdata, and OpenGraph. HTML-only (no headless browser) so it's fast and cheap. Ideal for price monitoring, competitor tracking, catalog enrichment, and feed building.

Why it's useful (and money-first)

Price and stock monitoring is one of the most in-demand scraping jobs. This actor turns messy product markup, which comes in dozens of shapes (Offer vs AggregateOffer, price as string vs number, 1.299,00 vs $1,299.00, availability URLs vs text), into one stable, tidy record.
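To make the normalization problem concrete, here is a minimal sketch of how locale-formatted prices and availability values could be folded into one shape. The helper names parsePrice and normalizeAvailability are hypothetical and are not taken from src/extract.js; the separator heuristics are assumptions, and a lone dot is treated as a decimal point, so "1.299" is read as 1.299 rather than 1299.

```javascript
// Hypothetical helper: accept a number or a locale-formatted price string.
function parsePrice(raw) {
  if (typeof raw === "number") return Number.isFinite(raw) ? raw : null;
  if (typeof raw !== "string") return null;
  // Keep digits and separators only; this drops currency symbols and spaces.
  const s = raw.replace(/[^\d.,]/g, "");
  if (!/\d/.test(s)) return null;
  const lastDot = s.lastIndexOf(".");
  const lastComma = s.lastIndexOf(",");
  let normalized;
  if (lastDot !== -1 && lastComma !== -1) {
    // Both separators present: assume the later one is the decimal mark.
    const decimal = lastDot > lastComma ? "." : ",";
    const group = decimal === "." ? "," : ".";
    normalized = s.split(group).join("").replace(decimal, ".");
  } else if (lastComma !== -1) {
    // Commas only: "19,99" is decimal, "1,299" is grouping (assumption).
    normalized = /^\d+,\d{1,2}$/.test(s) ? s.replace(",", ".") : s.split(",").join("");
  } else {
    // Dots only: several dots are grouping, a single dot is decimal (assumption).
    normalized = s.split(".").length > 2 ? s.split(".").join("") : s;
  }
  const n = Number(normalized);
  return Number.isFinite(n) ? n : null;
}

// Hypothetical helper: map schema.org URLs and free text to one token.
const AVAILABILITY = new Map([
  ["instock", "InStock"],
  ["outofstock", "OutOfStock"],
  ["preorder", "PreOrder"],
]);

function normalizeAvailability(raw) {
  if (typeof raw !== "string") return null;
  // "https://schema.org/InStock" and "In stock" both reduce to "instock".
  const key = raw.split("/").pop().replace(/[^a-z]/gi, "").toLowerCase();
  return AVAILABILITY.get(key) ?? null;
}
```

With this sketch, parsePrice("1.299,00") and parsePrice("$1,299.00") both yield 1299, and normalizeAvailability maps both "https://schema.org/InStock" and "In stock" to "InStock".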

Input

{
  "startUrls": [{ "url": "https://scrapeme.live/shop/Bulbasaur/" }],
  "maxConcurrency": 5,
  "maxPages": 100
}

maxPages is capped at 200 and maxConcurrency at 20 as a cost guard.
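A cap like this can be sketched as a small input-clamping step. The function name clampInput is hypothetical, and the fallback values (100 pages, concurrency 5) are assumptions borrowed from the example input above, not confirmed defaults of the actor.

```javascript
// Caps stated in this README; the fallbacks below are assumed, not documented.
const LIMITS = { maxPages: 200, maxConcurrency: 20 };
const FALLBACKS = { maxPages: 100, maxConcurrency: 5 };

// Hypothetical helper: return a copy of the input with both limits enforced.
function clampInput(input = {}) {
  const clamp = (value, fallback, max) => {
    // Reject non-integers and non-positive values before applying the cap.
    const n = Number.isInteger(value) && value > 0 ? value : fallback;
    return Math.min(n, max);
  };
  return {
    startUrls: Array.isArray(input.startUrls) ? input.startUrls : [],
    maxPages: clamp(input.maxPages, FALLBACKS.maxPages, LIMITS.maxPages),
    maxConcurrency: clamp(
      input.maxConcurrency,
      FALLBACKS.maxConcurrency,
      LIMITS.maxConcurrency
    ),
  };
}
```

Under these assumptions, a request for 5000 pages at concurrency 50 is reduced to 200 pages at concurrency 20, while values already under the caps pass through unchanged.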

Output — one STABLE record per URL (ok and error rows share the shape)

{
  "status": "ok",
  "requested_url": "https://shop.example.com/widget",
  "final_url": "https://shop.example.com/widget",
  "http_status": 200,
  "found": true,
  "source": "json-ld",
  "name": "Acme Widget",
  "brand": "Acme",
  "price": 19.99,
  "currency": "USD",
  "availability": "InStock",
  "in_stock": true,
  "rating_value": 4.5,
  "rating_count": 231,
  "sku": "AW-1",
  "gtin": "0123456789012",
  "mpn": null,
  "image": "https://cdn.example.com/w.jpg",
  "description": "...",
  "offers_count": 1,
  "extracted_at": "2026-05-29T..."
}

source is one of json-ld, microdata, opengraph, or none. found:false means the page markup contained no product data (e.g. a blog post or a JS-rendered shop). Failed fetches return the same keys with status:"error" plus an error field.
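As an illustration of the json-ld source, the following sketch flattens an already-parsed JSON-LD document into a few of the output fields. It is not the code in src/extract.js: the function name productFromJsonLd is hypothetical, only a subset of fields is shown, and using lowPrice for an AggregateOffer is an assumption.

```javascript
// Hypothetical helper: pull a Product out of parsed JSON-LD.
function productFromJsonLd(doc) {
  // JSON-LD may be a single node, an array of nodes, or wrapped in @graph.
  const nodes = [].concat(doc?.["@graph"] ?? doc ?? []);
  // @type may be a string or an array of strings.
  const isType = (node, type) => [].concat(node?.["@type"] ?? []).includes(type);

  const product = nodes.find((node) => isType(node, "Product"));
  if (!product) return { found: false, source: "none" };

  // offers may be one object or an array; look at the first entry.
  const offers = [].concat(product.offers ?? []);
  const first = offers[0] ?? {};
  const aggregate = isType(first, "AggregateOffer");
  // Assumption: report the lowest price of an AggregateOffer.
  const rawPrice = aggregate ? first.lowPrice : first.price;

  return {
    found: true,
    source: "json-ld",
    name: product.name ?? null,
    price: rawPrice != null ? Number(rawPrice) : null,
    currency: first.priceCurrency ?? null,
    offers_count: aggregate
      ? Number(first.offerCount ?? offers.length)
      : offers.length,
  };
}
```

A page whose JSON-LD contains no Product node (a blog post, for example) comes back as found:false with source "none", matching the behaviour described above.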

Run locally / test

npm install
npm test # unit tests on the pure extractor (node:test)

Publish to Apify (account-holder's step)

npm install -g apify-cli
apify login # free Apify account
apify push # from this directory

Keep it free initially; enable pricing later via the adult account-holder once it shows repeat organic usage and clears a margin gate.

Notes / safety

  • SSRF-guarded (scheme + private/metadata IP block + redirect re-check), robots-respecting, rate-limited, cost-capped — all via the shared src/lib/actor_runner.js.
  • Stores only derived product fields — no raw page bodies / PII.
  • HTML-only: client-rendered shops that inject product JSON via JS will return found:false (no server-side markup to read).
  • Core logic lives in src/extract.js (pure, unit-tested).
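The SSRF guard mentioned in the first note can be pictured with a minimal sketch like the one below. It is not the code in src/lib/actor_runner.js: the function name isBlockedUrl is hypothetical, it only inspects literal hostnames (a real guard would also check the addresses a hostname resolves to), and it conservatively blocks every IPv6 literal.

```javascript
// Hypothetical helper: true when a URL should not be fetched.
function isBlockedUrl(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return true; // unparseable input is rejected
  }
  // Scheme check: only plain web URLs are allowed.
  if (url.protocol !== "http:" && url.protocol !== "https:") return true;

  const host = url.hostname;
  if (host === "localhost" || host.endsWith(".localhost")) return true;

  const m = host.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  // Not an IPv4 literal: block IPv6 literals, allow ordinary hostnames.
  if (!m) return host.startsWith("[");

  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 ||
    a === 10 || // 10.0.0.0/8 private range
    a === 127 || // loopback
    (a === 169 && b === 254) || // link-local, incl. 169.254.169.254 metadata
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12 private range
    (a === 192 && b === 168) // 192.168.0.0/16 private range
  );
}
```

For the redirect re-check, the same predicate would be applied to every redirect target before following it, so a public URL cannot bounce the fetcher to an internal address.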