Pricing

from $8.00 / 1,000 successful matches

Product Matcher

Cross-shop product matcher: returns a confident match or nothing, never a wrong pair. Give it a sample product URL and target e-shop domains; it finds the same product on each, verified by EAN check-digit, spec agreement, AI variant judge and photo match, with a 0-1 confidence score.

Pricing

from $8.00 / 1,000 successful matches

Rating

0.0

(0)

Developer

Jan Hilgard

Actor stats

Bookmarked

Total users

Monthly active users

a month ago

Last modified

What a real run looks like

Sample: Apple iPhone 17 512 GB, Lavender. Target shop: one large electronics retailer. The shop sells many things that look almost right. Here is what the matcher does with the candidates it found:

✗  iPhone 17 256 GB Lavender          rejected — different capacity (256 vs 512 GB)
✗  iPhone 17 Pro 512 GB Lavender      rejected — different model (Pro ≠ base 17)
✗  iPhone 17 512 GB (refurbished)     rejected — opened-box / refurbished, not a new-retail listing
✗  iPhone 17 512 GB Orange            rejected — color + photo mismatch (triple camera)
✗  iPhone 17 512 GB (no image)        rejected — placeholder image, photo check inconclusive
✓  iPhone 17 512 GB Lavender          MATCH  confidence 0.97
      consensus: specs + AI variant judge + photo  (3 independent signals)
      reason: same model, capacity and color; product photo confirms the variant

Five rejections with reasons before one confident match — that's the product. Anything can return the sixth line. The value is the five lines above it, each with a reason you can read.

Why it is different

No silent errors. An uncertain candidate is returned as matched_url: null with a reason, never as a maybe-match. The bar to call something a match is deliberately high (precision-first). You can trust a true without re-checking it by hand.
Discovery + match, not just match. It does not need you to already have both catalogs. From the sample alone it finds the candidate on each target shop (web search → sitemap → the shop's own search box), then verifies it. A plain matcher that only compares two finished datasets cannot do the finding part.
Variant-aware. 512 GB vs 256 GB, base vs Pro, lavender vs orange, new vs refurbished — these are different products, and they are exactly where naive name/price matching breaks. The whole design exists to get the variant right.
Every decision is explainable. Confidence score, which signals fired, which layer decided, and a short human reason — per shop, in the output row.

How it works

Per shop, from input to decision:

Find candidates on the shop. The actor goes and finds the products that could be the match on the target shop itself — discovery, not pairing a dataset you already have.
Cheap pre-filter. Obviously-different products are dropped with cheap checks before any expensive verification runs, so effort — and error — is not spent on things that were never going to match.
Verify the survivors with independent signals. Each remaining candidate is checked by several independent signals (the signals below), not one test.
Consensus. The confidence score is how many of those independent signals agree, not one model's opinion (see the scoring below).
Decide or reject. Above the threshold it returns the match with a readable reason; below it, it returns null — never a guess.

Why this is accurate: no decision rests on a single signal, the deterministic EAN/GTIN check takes precedence wherever a code exists, and the visual check catches a wrong variant that text alone would miss. That is why it would rather return nothing than the wrong pair.

For each target domain it discovers candidate URLs, then verifies the best ones with independent signals and takes a consensus — the score is not one model's opinion, it is how many independent checks agree:

Signal	What it checks
`ean`	EAN / GTIN match, including check-digit validity
`specs`	structured spec agreement (brand, model, capacity, …)
`ai_judge`	an AI variant judge reads both listings: same product or not, and why
`photo`	independent product-photo comparison of the matched variant

confidence_score is 0–1 and reflects the consensus:

EAN match ........................ 0.99
3 independent signals ............ 0.97
2 independent signals ............ 0.92
1 signal ......................... 0.85
below threshold → url: null ...... "uncertain", returned but not a match

A candidate that no positive layer confirms is returned with matched_url: null. That is a feature, not a gap.

Mechanically: the actor is a thin client. The heavy work (headed browser, search, LLM judging, photo comparison) runs on the data.hilgard.cz service. The actor starts a resumable job (POST /api/match/jobs → { jobId }), streams its progress over SSE (GET /api/match/jobs/:id/stream?cursor=N), and writes results to the dataset. A match can run minutes to hours, so the job runs server-side independently of the connection: if the stream drops, the actor reconnects to the same job from the last event it saw (cursor) — no lost work, no duplicate logs. If the service has no job API, it falls back to the legacy synchronous POST /api/match stream.

Input

{
  "productPairs": [
    {
      "url": "https://www.alza.cz/apple-iphone-17-512gb",
      "domains": ["datart.cz", "mall.cz"]   // 1–20 target domains
    }
  ],
  "includeDebug": false,        // optional, keep the _debug block
  "requestTimeoutSecs": 3600,   // optional, total per-pair budget across reconnects
  "maxReconnects": 20,          // optional, give up after N stalled reconnects
  "reconnectBackoffSecs": 3     // optional, delay between reconnects
}

Each item in productPairs is one matching job. url must be an http(s) URL and domains must contain 1–20 non-empty domains. Pairs are processed sequentially. requestTimeoutSecs is the total budget for one pair across all reconnects; maxReconnects only bounds consecutive reconnects that make no progress, so a healthy long-running match is not cut short.

Output

One dataset row per target domain, with the sample product's profile inlined:

{
  "source_url": "https://www.alza.cz/apple-iphone-17-512gb",
  "source_domain": "alza.cz",
  "product_name": "Apple iPhone 17 512 GB",
  "brand": "Apple",
  "model": "iPhone 17 512GB",
  "ean": "1949483485734",
  "source_price": 27990,
  "currency": "CZK",
  "image_url": "https://...",

  "matched_domain": "datart.cz",
  "matched_url": "https://datart.cz/iphone-17-512gb",  // null when no confident match
  "confidence_score": 0.99,
  "method": "sitemap",          // sitemap | brave | google | null
  "reason": "EAN match",
  "decided_by": "ean",          // ean | ai_variant_judge | photo | null
  "signals": ["ean"],           // ean | specs | ai_judge | photo
  "matched_fields": [{ "field": "ean", "agree": true }],
  "price_candidate": 26490,     // observed price on the matched shop

  "scraped_at": "2026-06-18T10:00:00.000Z",
  "success": true               // false when matched_url is null
}

When includeDebug is true, each row also carries the service's _debug block. If a pair fails entirely, one placeholder row per requested domain is written with success: false and an error message.

Use cases

Pricing intelligence. Map your product to the same product on competitor shops, then compare prices on pairs you can trust. price_candidate is the observed price on the matched shop; the comparison is yours to make.
Competitor monitoring. Track whether — and where — a given product appears across a set of shops over time.
Catalog deduplication / matching. Link the same product across two catalogs, variant-correct, with a confidence you can threshold on.

It is built for correctness on the hard cases (variants, refurbished, near-misses), not for sheer volume. Pick it when getting the pair right matters more than getting a pair at any cost.

Pricing — two tiers

This actor uses Pay Per Event, with two events:

shop-checked — a small fee for each shop we check.
successful-match — the main fee, charged additionally when a product is confirmed on a shop (a success: true row with a matched_url, confidence above the threshold).

The shop-checked fee pays for discovery, not for a miss. Most matchers take two finished catalogs you already have and pair them. This one does not assume you have the other side — it goes and finds the product on each shop itself: web search, anti-bot fetch, vision, multi-signal verification (EAN/GTIN, specs, AI variant judge, photo). That search is the expensive part, and it runs in full whether or not the product turns out to be on the shop. So you pay a small fee per shop checked for that work, and the main fee only when a shop yields a confirmed match.

To be clear upfront: a shop where we find nothing still costs the shop-checked fee, because the discovery work was done there too. A confirmed match costs shop-checked + successful-match.

No flat fee. No per-SLA pricing. You are billed per event, for work actually done. The current price of each event is in the Apify Console pricing tab.

When NOT to use this

I would rather you not waste money, so:

You already have both full catalogs (your products + theirs), cleanly keyed. Then you do not need discovery — a plain dataset-to-dataset matcher is cheaper. This actor earns its price when you don't have the other side and need it found and verified.
You only have a handful of one-off products. Just paste them into the data.hilgard.cz UI and read the result. No actor run needed.
You want the cheapest possible "good enough" matching at huge volume. This one optimizes for being right on hard variants, not for being the cheapest. If a few wrong pairs in a thousand are acceptable to you, a lighter tool will cost less.

If the correctness story above is not what you need, one of these is the better call.

About

Built on the data.hilgard.cz matching engine — the same self-healing stack that does anti-bot fetching, model-by-meaning extraction, and independent verification, applied to cross-shop matching.

By Jan Hilgard (founder of Hosting90, acquired; core contributor to vllm-mlx). The precision-first stance is deliberate: I would rather ship a matcher you can trust on the hard cases than one that always answers.

Development

npm install
npm run build      # tsc → dist/
npm run start:dev  # tsx src/main.ts, reads .actor/INPUT.json

TikTok Shop Product Scraper

toolzerhub/tiktok-shop-product-scraper

Fetch a full TikTok Shop product detail record by product ID or product page URL and region, including product info, shop info, reviews summary, vouchers, category data, and shop-performance fields.

ToolzerHub

TikTok Shop Reviews Scraper

lurkapi/tiktok-shop-reviews-scraper

Scrape TikTok Shop product reviews by product URL, product ID, or shop URL. Returns review text, rating, date, author, photos, and product context per row.

LurkAPI

Tiktok Shop Product All Data Scraper [No Login Required] ✅

datamagnet/tiktok-shop-product-all-data-scraper-no-login-required

Fetches TikTok Shop product detail, reviews, sales trend, and seller data for a list of product IDs in a given region, and merges them into a single combined record per product.

Datamagnet

Shopee Shop Scraper

xtracto/shopee-shop-scraper

Extract product listings from a Shopee shop by username, shop ID, or shop URL. Optionally enrich each product with full detail data (variants, stock, images, attributes, and seller information) across multiple countries. Fast, lightweight, and no browser or account required

Farhan Febrian Nauval

139

TikTok Shop Product Intelligence API

zentrafoundry/tiktok-shop-product-intelligence-api

Extract public TikTok Shop product, seller, price, rating, sold-count, and category signals for product research.

Zentra

TikTok Shop Product Scraper

oneary/tiktok-shop-scraper

Scrape TikTok Shop product listings: titles, prices, ratings, seller info, images, sold counts. Search by keyword or scrape direct product URLs.

Luan M.

TikTok Shop Product Analytics

datamagnet/tiktok-shop-product-analytics

Get a clean, analytics-enriched list of products from a TikTok Shop seller. Easily review sales performance, recent trends, and key product metrics in a simple format.

Datamagnet

a

tan_asp/danish-grocery-scraper

jens

Entity Deduplication Matcher

junipr/entity-deduplication-matcher

Fuzzy-match and deduplicate company, product, location, or entity rows into canonical records.

junipr

TikTok Shop Product Reviews

vistics/tiktok-shop-product-reviews

Scrape TikTok Shop product reviews. Extract star ratings, review text, verified purchase status, reviewer info, and review images from any TikTok Shop product URL. Bulk scraping supported