Review & Rating Extractor (aggregate + individual)

Under maintenance

Pricing: Pay per usage

Extract the aggregate rating (value, count, best) AND individual reviews (author, rating, date, title, body) from public product, business, and article pages via JSON-LD Review and AggregateRating. HTML-only, fast, structured output with clean ok/error parity.

Rating: 0.0 (0)

Developer: Tommy G (Maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 3 days ago

Reviews & Ratings Extractor (Apify Actor)

Give it any public page URL and get back clean, normalized review and rating data: the item being reviewed, its aggregate rating (value/best/count), and the individual reviews on the page (author, rating, body, date). The data is pulled from schema.org Review, AggregateRating, and review-bearing types in JSON-LD and microdata. The Actor is HTML-only (no headless browser), so it is fast and cheap. Ideal for reputation monitoring, review datasets, and product/listing research.

What it extracts

For each page it returns one record (flat apart from the reviews array) with:

  • item_name, item_type (what the reviews are about, e.g. Product / LocalBusiness / Recipe)
  • aggregate_rating, aggregate_best, aggregate_count (the summary star rating)
  • reviews[] (individual reviews found on the page) and reviews_extracted (how many of them were extracted)

Plus control keys present on every row (ok and error alike, for clean buyer tables):

status, requested_url, final_url, http_status, redirected, found, complete, page_type, source, render_required, fields_found, error, extracted_at
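
To make the rating fields concrete, here is a minimal sketch of reading an aggregate rating out of JSON-LD in raw HTML. It is an illustration under stated assumptions, not the Actor's src/extract.js: the function names parseJsonLdBlocks, extractAggregate, and toNumber are hypothetical, only JSON-LD is handled (no microdata), and only the first node carrying an aggregateRating is used.

```javascript
// Hypothetical sketch; the real logic lives in src/extract.js and may differ.

// Convert a schema.org value (often a string such as "4.5") to a number or null.
function toNumber(value) {
  if (value === undefined || value === null || value === '') return null;
  const n = Number(value);
  return Number.isFinite(n) ? n : null;
}

// Collect every JSON-LD node found in <script type="application/ld+json"> blocks.
function parseJsonLdBlocks(html) {
  const pattern = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const nodes = [];
  for (const match of html.matchAll(pattern)) {
    try {
      const parsed = JSON.parse(match[1]);
      // A block may hold a single object, an array, or an @graph wrapper.
      const items = Array.isArray(parsed) ? parsed : [parsed];
      for (const item of items) {
        if (item && Array.isArray(item['@graph'])) nodes.push(...item['@graph']);
        else nodes.push(item);
      }
    } catch {
      // Malformed JSON-LD is skipped rather than failing the whole page.
    }
  }
  return nodes.filter((node) => node && typeof node === 'object');
}

// Return the rating fields for the first node that carries an aggregateRating.
function extractAggregate(html) {
  for (const node of parseJsonLdBlocks(html)) {
    const agg = node.aggregateRating;
    if (!agg || typeof agg !== 'object') continue;
    const type = Array.isArray(node['@type']) ? node['@type'][0] : node['@type'];
    return {
      item_name: node.name ?? null,
      item_type: type ?? null,
      aggregate_rating: toNumber(agg.ratingValue),
      aggregate_best: toNumber(agg.bestRating),
      aggregate_count: toNumber(agg.ratingCount ?? agg.reviewCount),
    };
  }
  return null; // would correspond to found:false for the aggregate part
}
```

A regular expression is enough here because the script blocks are read as opaque text and parsed with JSON.parse; a production extractor would more likely use an HTML parser.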

Input

{ "startUrls": [{ "url": "https://example.com/product/123" }], "maxConcurrency": 5, "maxPages": 100 }

maxPages is capped at 200 and maxConcurrency at 20 as a cost guard.
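
The caps could be enforced along these lines. This is a sketch only: normalizeInput, clampInt, LIMITS, and DEFAULTS are hypothetical names, the default values are simply borrowed from the example input above, and the Actor may treat out-of-range values differently (for example by rejecting them instead of clamping).

```javascript
// Hypothetical sketch of input clamping; names and defaults are assumptions.
const LIMITS = { maxPages: 200, maxConcurrency: 20 }; // caps stated in this README
const DEFAULTS = { maxPages: 100, maxConcurrency: 5 }; // assumed, taken from the example input

// Use the fallback for missing or invalid values, then apply the cap.
function clampInt(value, fallback, max) {
  const n = Number.isInteger(value) && value > 0 ? value : fallback;
  return Math.min(n, max);
}

function normalizeInput(input) {
  const raw = input ?? {};
  const startUrls = Array.isArray(raw.startUrls) ? raw.startUrls : [];
  const maxPages = clampInt(raw.maxPages, DEFAULTS.maxPages, LIMITS.maxPages);
  const maxConcurrency = clampInt(raw.maxConcurrency, DEFAULTS.maxConcurrency, LIMITS.maxConcurrency);
  // Keep only entries with a string url, and never process more than maxPages of them.
  const urls = startUrls
    .map((entry) => (entry ? entry.url : undefined))
    .filter((url) => typeof url === 'string')
    .slice(0, maxPages);
  return { urls, maxPages, maxConcurrency };
}
```

With this sketch, an input of maxPages: 1000 and maxConcurrency: 50 would be reduced to 200 and 20.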

Output — one STABLE record per URL (ok and error rows share the shape)

{
  "status": "ok",
  "requested_url": "https://example.com/product/123",
  "final_url": "https://example.com/product/123",
  "http_status": 200,
  "found": true,
  "complete": true,
  "page_type": "review",
  "source": "json-ld",
  "item_name": "Acme Widget",
  "item_type": "Product",
  "aggregate_rating": 4.5,
  "aggregate_best": 5,
  "aggregate_count": 231,
  "reviews": [
    { "author": "Sam", "rating": 5, "body": "Works great.", "date": "2026-04-10" },
    { "author": "Lee", "rating": 4, "body": "Good value.", "date": "2026-04-02" }
  ],
  "reviews_extracted": 2,
  "fields_found": ["item_name", "aggregate_rating", "aggregate_count", "reviews"],
  "extracted_at": "2026-05-29T..."
}

found:false means no review/rating markup was present in the fetched HTML (e.g. a page with no schema.org review data, or a review widget rendered by JavaScript). Failed fetches return the same keys with status:"error" and the failure reason in error.
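
One way to get ok/error parity is to build every row from a single template. The sketch below assumes that approach; emptyRecord, okRecord, and errorRecord are hypothetical helpers, and the placeholder values (null, false, empty arrays) are assumptions, since this README does not say how absent fields are represented.

```javascript
// Hypothetical sketch: every row starts from the same template so keys never differ.
function emptyRecord(requestedUrl) {
  return {
    status: 'error',
    requested_url: requestedUrl,
    final_url: null,
    http_status: null,
    redirected: false,
    found: false,
    complete: false,
    page_type: null,
    source: null,
    render_required: false,
    item_name: null,
    item_type: null,
    aggregate_rating: null,
    aggregate_best: null,
    aggregate_count: null,
    reviews: [],
    reviews_extracted: 0,
    fields_found: [],
    error: null,
    extracted_at: new Date().toISOString(),
  };
}

// Successful fetch: overlay the extracted fields on the template.
function okRecord(requestedUrl, fields) {
  return { ...emptyRecord(requestedUrl), ...fields, status: 'ok', error: null };
}

// Failed fetch: same keys, with the failure reason in error.
function errorRecord(requestedUrl, message) {
  return { ...emptyRecord(requestedUrl), status: 'error', error: message };
}
```

Because both helpers spread the same template, a consumer can load ok and error rows into one table without checking for missing columns.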

Use cases

  • Reputation monitoring — track aggregate rating and review counts for your listings over time.
  • Review datasets — collect individual reviews across many product/service pages for analysis.
  • Competitor / market research — compare rating distributions and review volume across pages.

Notes / safety

  • Reads only public schema.org review/rating markup — facts-only, no PII beyond what the page itself publicly publishes; no raw page bodies stored.
  • SSRF-guarded (scheme + private/metadata IP block + redirect re-check), robots-respecting, rate-limited, cost-capped — all via the shared src/lib/actor_runner.js.
  • HTML-only: client-rendered review widgets that inject JSON via JS return found:false (no server-side markup to read). Core logic in src/extract.js (pure, unit-tested).
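
As an illustration of the scheme and private/metadata IP checks mentioned above, here is a minimal sketch. It is not the code in src/lib/actor_runner.js: isUrlAllowed and isBlockedIpv4 are hypothetical names, only literal hosts are inspected (a real guard would also need to check the addresses a hostname resolves to), and all IPv6 literals are refused for simplicity. The redirect re-check would amount to calling the same function on each redirect target before following it.

```javascript
// Hypothetical sketch of a URL guard; the shared runner's actual rules may differ.

// True for IPv4 literals in loopback, private, or link-local ranges.
function isBlockedIpv4(host) {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) || // link-local, includes the 169.254.169.254 metadata address
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

function isUrlAllowed(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  // Scheme check: only plain web URLs are fetched.
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;
  const host = url.hostname.toLowerCase();
  if (host === 'localhost' || host.endsWith('.localhost')) return false;
  // Conservative choice for this sketch: refuse every IPv6 literal.
  if (host.startsWith('[')) return false;
  return !isBlockedIpv4(host);
}
```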

Run locally / test

npm install
npm test # unit tests on the pure extractor (node:test)

Publish to Apify (account-holder's step)

npm install -g apify-cli && apify login && apify push

Keep it free initially; enable pricing later via the adult account-holder once it shows repeat organic usage and clears a margin gate.