E-commerce Product Matching Tool

Pricing

from $1.00 / 1,000 vector matching results

Match products across e-commerce datasets with E-Commerce Product Matching Tool. Use it with E-commerce Scraping Tool datasets to automatically find identical and similar products and power price monitoring or catalog comparison.

Developer: Tri⟁angle

Maintained by Apify

🛒 E-Commerce Product Matching Tool

Match and compare products across any two e-commerce datasets. Find identical, similar, and related products between your own catalog and any competitor - with optional AI validation for higher-confidence results.


🧠 What it does

The E-Commerce Product Matching Tool takes two product datasets and automatically finds which products in one match products in the other. It runs both datasets through a three-stage pipeline: converting products into numerical vectors, scoring every possible pair for similarity, and optionally using an AI model to validate the results and explain its reasoning.

It is designed to work with datasets collected by the E-Commerce Scraping Tool.

The tool is useful for anyone who needs to reconcile, compare, or deduplicate product information across two sources - without manually reviewing thousands of rows.


⚙️ How it works

The tool runs your two datasets through a three-stage process:

🔢 Stage 1 - Vectorization

Five fields from each product are extracted and converted into numerical vectors: title, brand, category, description, and specifications. These vectors are stored in a vector database, which makes it possible to compare thousands of products in seconds based on semantic meaning, not just exact text matches. This means products can be matched even when they use different wording or formatting across retailers.
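
To illustrate the idea, here is a minimal sketch that flattens the five fields into one text and turns it into a vector. It assumes a simple bag-of-words term-count vector as a stand-in: the tool's actual embedding model and vector database are not documented here, and the assumption that records carry `title`, `brand`, `category`, `description`, and `specifications` keys is ours.

```python
import re
from collections import Counter

# The five fields the tool is described as vectorizing (key names assumed).
FIELDS = ("title", "brand", "category", "description", "specifications")


def product_text(product: dict) -> str:
    """Join the five fields into one lowercase string, skipping missing ones."""
    parts = []
    for name in FIELDS:
        value = product.get(name)
        if isinstance(value, dict):
            # Assumption: specifications may arrive as a key/value mapping.
            value = " ".join(f"{k} {v}" for k, v in value.items())
        if value:
            parts.append(str(value))
    return " ".join(parts).lower()


def vectorize(product: dict) -> Counter:
    """Stand-in vectorizer: counts of alphanumeric tokens.

    A semantic embedding model would replace this; term counts only
    capture shared words, not meaning.
    """
    return Counter(re.findall(r"[a-z0-9]+", product_text(product)))
```

For example, `vectorize({"title": "Apple AirPods Pro (2nd Generation)", "brand": "Apple"})` counts `apple` twice because the brand repeats the first word of the title.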

📐 Stage 2 - Similarity matching

Every product in Dataset A is compared against every product in Dataset B and assigned a similarity score from 0 to 100. You can choose to include all evaluated pairs in the output, or filter to only the pairs that meet your similarity threshold.
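
A minimal sketch of the scoring step, assuming cosine similarity between sparse term-count vectors scaled to 0-100. The tool's real scoring function is not documented, so treat the formula as an illustration. It scores every A/B pair, then applies the threshold the way `vectorMatchesOnly` is described.

```python
import math
from collections import Counter


def cosine_score(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse vectors, scaled to 0-100 (assumed formula)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return 100.0 * dot / (norm_a * norm_b)


def score_pairs(vectors_a, vectors_b, threshold=70.0, matches_only=False):
    """Compare every vector in A with every vector in B.

    Returns (index_a, index_b, score, is_match) tuples. With matches_only=True,
    pairs below the threshold are dropped, mirroring vectorMatchesOnly.
    """
    results = []
    for i, va in enumerate(vectors_a):
        for j, vb in enumerate(vectors_b):
            score = cosine_score(va, vb)
            is_match = score >= threshold
            if is_match or not matches_only:
                results.append((i, j, round(score, 1), is_match))
    return results
```

The nested loop makes the cost of all-pairs output visible: the number of results grows with the product of the two dataset sizes.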

🤖 Stage 3 - AI validation (optional)

If you enable LLM matching, an AI model reviews each candidate pair and gives a final verdict: is this a genuine match? It also provides a reasoning explanation so you can understand why it made each decision. This stage runs only on the pairs that passed the similarity threshold, which keeps costs under control.
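
A sketch of how a candidate pair might be packaged for a model and how its answer could be checked, assuming the model is asked to reply with JSON containing the four LLM output fields documented below. The prompt wording and the `build_prompt` and `parse_verdict` helpers are hypothetical; no specific model or API is implied.

```python
import json

# Relationship values listed in the output field table.
ALLOWED_RELATIONSHIPS = {"same-product", "variant", "different-product"}


def build_prompt(product_a: dict, product_b: dict) -> str:
    """Hypothetical prompt asking for a JSON verdict on one candidate pair."""
    return (
        "Decide whether these two listings are the same product.\n"
        f"Product A: {json.dumps(product_a)}\n"
        f"Product B: {json.dumps(product_b)}\n"
        'Reply with JSON: {"llm_is_match": bool, "llm_reasoning": str, '
        '"llm_relationship": "same-product" | "variant" | "different-product", '
        '"llm_differences": [str]}'
    )


def parse_verdict(raw: str) -> dict:
    """Parse the model's JSON reply and reject malformed verdicts."""
    verdict = json.loads(raw)
    if not isinstance(verdict.get("llm_is_match"), bool):
        raise ValueError("llm_is_match must be a boolean")
    if verdict.get("llm_relationship") not in ALLOWED_RELATIONSHIPS:
        raise ValueError(f"unexpected relationship: {verdict.get('llm_relationship')!r}")
    # Fill optional fields so downstream code sees a consistent shape.
    verdict.setdefault("llm_reasoning", "")
    verdict.setdefault("llm_differences", [])
    return verdict
```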

Dataset A + Dataset B
        ↓
Vectorization
        ↓
Similarity scoring ←── threshold filter (optional)
        ↓
AI validation ←── enable with "Use LLM matching" (optional)
        ↓
Output

🚀 Before you start

You need two Apify datasets containing product data. The easiest way to collect them is with the E-Commerce Scraping Tool, which lets you scrape product listings from Amazon, Walmart, eBay, and hundreds of other retailers in a single run.

Once you have your datasets, copy their dataset IDs from Apify Console and paste them into the datasetIdA and datasetIdB input fields described below.


⚙️ Input

Required

| Parameter | Type | Description |
|---|---|---|
| datasetIdA | string | Dataset ID for your first product list (e.g. your own catalog) |
| datasetIdB | string | Dataset ID for your second product list (e.g. a competitor's catalog) |

Options

| Parameter | Type | Default | Description |
|---|---|---|---|
| useLlmMatching | boolean | false | Run AI validation on similarity candidates for higher-confidence results with reasoning explanations. |
| vectorMatchesOnly | boolean | false | Only include product pairs that meet the similarity threshold in the output. When disabled, all evaluated pairs are returned with their scores. |
| maxOutputItems | number | unlimited | Stop processing after this many output items. Use this to cap cost on large datasets. |

Advanced options

| Parameter | Type | Default | Description |
|---|---|---|---|
| vectorSimilarityThreshold | number (0-100) | 70 | Minimum similarity score for a pair to qualify as a match. Lower values return more results with more potential false positives; higher values return fewer, more precise results. |
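
A small helper that assembles a run input with the documented defaults and rejects out-of-range values before you start a run. The function name `build_input` is hypothetical, and so is the assumption that leaving `maxOutputItems` out of the input means "unlimited"; the parameter names and other defaults come from the tables above.

```python
def build_input(dataset_id_a: str, dataset_id_b: str, *, use_llm_matching=False,
                vector_matches_only=False, max_output_items=None,
                vector_similarity_threshold=70) -> dict:
    """Return an input dict using the documented parameter names and defaults."""
    if not dataset_id_a or not dataset_id_b:
        raise ValueError("both datasetIdA and datasetIdB are required")
    if not 0 <= vector_similarity_threshold <= 100:
        raise ValueError("vectorSimilarityThreshold must be between 0 and 100")
    run_input = {
        "datasetIdA": dataset_id_a,
        "datasetIdB": dataset_id_b,
        "useLlmMatching": use_llm_matching,
        "vectorMatchesOnly": vector_matches_only,
        "vectorSimilarityThreshold": vector_similarity_threshold,
    }
    # Assumption: omitting maxOutputItems keeps the default (unlimited).
    if max_output_items is not None:
        if max_output_items < 1:
            raise ValueError("maxOutputItems must be a positive number")
        run_input["maxOutputItems"] = max_output_items
    return run_input
```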

📦 Output

Each output item represents one evaluated product pair. The output always includes the similarity assessment from Stage 2. When LLM matching is enabled, it also includes the AI verdict and reasoning from Stage 3.

📦 Output fields - similarity matching

| Field | Type | Description |
|---|---|---|
| productA | object | Product data from Dataset A |
| productB | object | Product data from Dataset B |
| similarityScore | number | Similarity score from 0 to 100 |
| is_match | boolean | Whether the pair meets the similarity threshold |

Additional fields when LLM matching is enabled

| Field | Type | Description |
|---|---|---|
| llm_is_match | boolean | AI verdict: true if the model considers this a genuine product match |
| llm_reasoning | string | The AI model's explanation of its verdict |
| llm_relationship | string | The AI model's classification of the relationship between the two products. Possible values: "same-product", "variant", "different-product" |
| llm_differences | array | List of specific differences identified by the AI model between the two products. Empty array when products are identical or near-identical |
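
For downstream code it can help to normalize each output item into one typed record, treating the four LLM fields as optional because they appear only when LLM matching is enabled. A minimal sketch: the `MatchResult` class and its `confirmed` rule (prefer the AI verdict when present) are hypothetical choices; the field names come from the tables above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MatchResult:
    product_a: dict
    product_b: dict
    similarity_score: float
    is_match: bool
    # Present only when useLlmMatching is true.
    llm_is_match: Optional[bool] = None
    llm_reasoning: Optional[str] = None
    llm_relationship: Optional[str] = None
    llm_differences: list = field(default_factory=list)

    @classmethod
    def from_item(cls, item: dict) -> "MatchResult":
        """Build a record from one output item (a dict as stored in the dataset)."""
        return cls(
            product_a=item["productA"],
            product_b=item["productB"],
            similarity_score=float(item["similarityScore"]),
            is_match=bool(item["is_match"]),
            llm_is_match=item.get("llm_is_match"),
            llm_reasoning=item.get("llm_reasoning"),
            llm_relationship=item.get("llm_relationship"),
            llm_differences=item.get("llm_differences") or [],
        )

    @property
    def confirmed(self) -> bool:
        """Use the AI verdict when present, otherwise the threshold result."""
        return self.llm_is_match if self.llm_is_match is not None else self.is_match
```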

Example output - similarity matching only

{
  "productA": {
    "title": "Apple AirPods Pro (2nd Generation)",
    "price": 249,
    "brand": "Apple",
    "url": "https://www.amazon.com/..."
  },
  "productB": {
    "title": "Apple AirPods Pro 2nd Gen - USB-C",
    "price": 229,
    "brand": "Apple",
    "url": "https://www.walmart.com/..."
  },
  "similarityScore": 94,
  "is_match": true
}

Example output - with LLM matching enabled

{
  "productA": {
    "title": "Apple AirPods Pro (2nd Generation)",
    "price": 249,
    "brand": "Apple",
    "url": "https://www.amazon.com/..."
  },
  "productB": {
    "title": "Apple AirPods Pro 2nd Gen - USB-C",
    "price": 229,
    "brand": "Apple",
    "url": "https://www.walmart.com/..."
  },
  "similarityScore": 94,
  "is_match": true,
  "llm_is_match": true,
  "llm_reasoning": "Both products are the Apple AirPods Pro 2nd generation. The title variation reflects the USB-C connector variant, which is the same product sold under a slightly different listing title. Brand, model generation, and key features are identical.",
  "llm_relationship": "same-product",
  "llm_differences": []
}

💼 Use cases

🏷️ Competitive price monitoring

Scrape your own product catalog and a competitor's catalog using the E-Commerce Scraping Tool, then run both datasets through this tool to find where the same products are priced differently. Schedule it to run weekly for ongoing price intelligence.
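
Once the matching run has finished, the price comparison is a short post-processing step. This sketch assumes the output items have been downloaded as a list of dicts shaped like the example output above, with a numeric `price` on each product. It keeps confirmed matches (using `llm_is_match` when present) and sorts by the absolute price gap. `price_gaps` is a hypothetical helper.

```python
def price_gaps(items: list[dict]) -> list[dict]:
    """Return matched pairs with their price difference, largest gap first."""
    rows = []
    for item in items:
        # Prefer the AI verdict; fall back to the similarity threshold result.
        matched = item.get("llm_is_match", item.get("is_match", False))
        price_a = item["productA"].get("price")
        price_b = item["productB"].get("price")
        if not matched or price_a is None or price_b is None:
            continue
        rows.append({
            "titleA": item["productA"].get("title"),
            "titleB": item["productB"].get("title"),
            "priceA": price_a,
            "priceB": price_b,
            "difference": price_b - price_a,  # positive: B is more expensive
        })
    return sorted(rows, key=lambda row: abs(row["difference"]), reverse=True)
```

For the AirPods pair in the example output, the difference is 229 - 249 = -20, meaning the Dataset B listing is 20 cheaper.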

🗂️ Catalog deduplication

If you manage product feeds from multiple suppliers, run any two feeds through the tool to identify duplicate or near-duplicate listings before merging them into your master catalog.
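
A sketch of that merge step, assuming you ran with `vectorMatchesOnly: false` so that every pair comes back, and assuming each product can be identified by its `url`. It keeps everything from feed A and adds only those feed B products that never matched anything in A. `merge_feeds` is hypothetical.

```python
def merge_feeds(items: list[dict], key: str = "url") -> list[dict]:
    """Merge two feeds from all-pairs output, dropping B duplicates of A items.

    Assumes every A/B pair is present (vectorMatchesOnly: false) and that
    `key` uniquely identifies a product within each feed.
    """
    feed_a, feed_b, matched_b = {}, {}, set()
    for item in items:
        a, b = item["productA"], item["productB"]
        feed_a.setdefault(a[key], a)
        feed_b.setdefault(b[key], b)
        # Prefer the AI verdict when LLM matching was enabled.
        if item.get("llm_is_match", item["is_match"]):
            matched_b.add(b[key])
    extras = [product for k, product in feed_b.items() if k not in matched_b]
    return list(feed_a.values()) + extras
```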

🛍️ Marketplace comparison

Compare your Amazon listings against your Walmart listings to find products that exist in one place but not the other, or that have mismatched titles, prices, or descriptions across platforms.

🔄 Product feed alignment

Reconcile an internal product database against an external feed (a distributor, a retailer, or a data provider) to verify coverage and spot discrepancies.
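
To verify coverage, you can count how many products on each side found at least one match. A minimal sketch, again assuming all-pairs output (`vectorMatchesOnly: false`) and `url` as the product key; the `coverage` helper is hypothetical.

```python
def coverage(items: list[dict], key: str = "url") -> dict:
    """Per dataset: matched count, total count, and the IDs that found no match.

    Assumes every A/B pair is present in `items` (vectorMatchesOnly: false).
    """
    seen = {"A": set(), "B": set()}
    matched = {"A": set(), "B": set()}
    for item in items:
        a_id, b_id = item["productA"][key], item["productB"][key]
        seen["A"].add(a_id)
        seen["B"].add(b_id)
        if item.get("llm_is_match", item["is_match"]):
            matched["A"].add(a_id)
            matched["B"].add(b_id)
    return {
        side: {
            "matched": len(matched[side]),
            "total": len(seen[side]),
            "unmatched": sorted(seen[side] - matched[side]),
        }
        for side in ("A", "B")
    }
```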


💰 Pricing

The tool uses a pay-per-event pricing model - you are charged based on the number of product pairs processed, not for the run itself.

Controlling costs

  • Set maxOutputItems to cap the number of pairs processed in a single run. The tool stops as soon as the limit is reached, so your cost is fully bounded.
  • Use vectorMatchesOnly: true to filter early - only pairs that pass the similarity threshold proceed to output (and to LLM validation if enabled), which reduces cost on datasets with low match rates.
  • LLM matching adds cost per validated item. Disable it if the similarity score alone gives you sufficient signal for your use case.
  • Run a small test with a sample of each dataset to calibrate your similarity threshold before processing the full dataset.
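
A quick way to size a run before starting it. This sketch assumes the listed starting rate of $1.00 per 1,000 vector matching results applies to every output item, and that with `vectorMatchesOnly: false` the output contains one item per A/B pair. It leaves out LLM charges because their rate is not stated on this page. `estimate_cost` is hypothetical.

```python
def estimate_cost(size_a: int, size_b: int, max_output_items=None,
                  usd_per_1000_results: float = 1.00) -> tuple[int, float]:
    """Worst-case output items and vector-matching cost (USD) for one run.

    Assumes every A/B pair is output (vectorMatchesOnly: false) and that
    LLM validation, if enabled, is billed separately and not included here.
    """
    items = size_a * size_b
    if max_output_items is not None:
        items = min(items, max_output_items)
    return items, items / 1000 * usd_per_1000_results
```

Under these assumptions, two catalogs of 2,000 and 3,000 products yield 6,000,000 pairs (about $6,000 at the starting rate), while `maxOutputItems: 1000` bounds the same run at 1,000 items (about $1).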

🔗 API integration

JavaScript

import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token.
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Input parameters as described in the Input section.
const input = {
    datasetIdA: '<YOUR_FIRST_DATASET_ID>',
    datasetIdB: '<YOUR_SECOND_DATASET_ID>',
    useLlmMatching: true,
    vectorMatchesOnly: true,
    vectorSimilarityThreshold: 70,
    maxOutputItems: 1000,
};

// Start the Actor and wait for the run to finish (top-level await requires an ES module).
const run = await client.actor('tri_angle/e-commerce-product-matching-tool').call(input);
console.log(`Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);

Python

from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient('<YOUR_API_TOKEN>')

# Input parameters as described in the Input section.
run_input = {
    'datasetIdA': '<YOUR_FIRST_DATASET_ID>',
    'datasetIdB': '<YOUR_SECOND_DATASET_ID>',
    'useLlmMatching': True,
    'vectorMatchesOnly': True,
    'vectorSimilarityThreshold': 70,
    'maxOutputItems': 1000,
}

# Start the Actor and wait for the run to finish.
run = client.actor('tri_angle/e-commerce-product-matching-tool').call(run_input=run_input)
print('Check your data here: https://console.apify.com/storage/datasets/' + run['defaultDatasetId'])
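
To read the results in code rather than in Console, the output dataset can be paged through with the Python client's `dataset(...).iterate_items()`. A minimal sketch: `fetch_confirmed_matches` is a hypothetical helper, and treating `llm_is_match` as authoritative when present is our choice, not documented behavior of the tool.

```python
def fetch_confirmed_matches(client, dataset_id: str) -> list[dict]:
    """Read a finished run's output dataset and keep confirmed matches.

    `client` is an apify_client.ApifyClient; iterate_items() fetches the
    dataset page by page instead of in one request.
    """
    matches = []
    for item in client.dataset(dataset_id).iterate_items():
        # Assumption: prefer the AI verdict when LLM matching was enabled.
        if item.get("llm_is_match", item.get("is_match", False)):
            matches.append(item)
    return matches
```

Continuing the Python example, `fetch_confirmed_matches(client, run['defaultDatasetId'])` returns the confirmed pairs as plain dicts.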

CLI

echo '{
  "datasetIdA": "<YOUR_FIRST_DATASET_ID>",
  "datasetIdB": "<YOUR_SECOND_DATASET_ID>",
  "useLlmMatching": true,
  "vectorMatchesOnly": true,
  "vectorSimilarityThreshold": 70,
  "maxOutputItems": 1000
}' |
apify call tri_angle/e-commerce-product-matching-tool --input-file - --silent --output-dataset

🚀 Getting started

  1. Collect two product datasets - use the E-Commerce Scraping Tool or any Apify scraper that returns product data
  2. Find each dataset's ID in Apify Console under Storage > Datasets
  3. Open the E-Commerce Product Matching Tool and paste both dataset IDs into the input
  4. Choose your options: enable LLM matching for higher confidence, or keep it off for faster, lower-cost results
  5. Click Start and wait for results - the run time depends on dataset size and whether LLM matching is enabled
  6. Download your output as JSON, CSV, or Excel, or connect it to your data pipeline via the API

❓ FAQ

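Do I have to use the E-Commerce Scraping Tool to collect the data? No. Any Apify dataset that contains product data works as input. E-Commerce Scraping Tool output is natively compatible, but you can use data from any source as long as it's stored in an Apify dataset. Matching works best when each record includes the fields the tool vectorizes: title, brand, category, description, and specifications.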

How accurate is the matching? Similarity matching works well for products with consistent names, brands, or standard identifiers (like EAN or UPC). For products with ambiguous or highly variable descriptions, enable LLM matching. The AI model reads the full product context and provides a verdict with reasoning, which typically improves accuracy on these harder cases.

What similarity threshold should I use? The default of 70 is a good starting point for most cases. Lower it (e.g. 50-60) if you want more results and are willing to review some false positives. Raise it (e.g. 85-90) if you want only very high-confidence matches. Test with a small dataset sample first.
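
To pick a threshold from a test run, you can label a handful of output pairs by hand and see how precision and recall move as the threshold changes. A minimal sketch, assuming you have (similarityScore, is_real_match) pairs from a small run with `vectorMatchesOnly: false`; `sweep_thresholds` is hypothetical.

```python
def sweep_thresholds(labeled, thresholds=(50, 60, 70, 80, 90)):
    """Precision and recall of 'score >= threshold' against hand labels.

    `labeled` is a list of (similarity_score, is_real_match) tuples taken
    from a sample run. Returns (threshold, precision, recall) tuples.
    """
    total_real = sum(1 for _, real in labeled if real)
    report = []
    for t in thresholds:
        predicted = [real for score, real in labeled if score >= t]
        true_pos = sum(predicted)
        # Convention: with no predicted (or no real) matches, report 1.0.
        precision = true_pos / len(predicted) if predicted else 1.0
        recall = true_pos / total_real if total_real else 1.0
        report.append((t, round(precision, 2), round(recall, 2)))
    return report
```

Raising the threshold usually trades recall for precision; pick the lowest threshold whose precision you are comfortable reviewing.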

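Can I match more than two datasets at once? Not in a single run. Each run compares exactly two datasets, so to compare three datasets in every combination, run the tool three times: A vs. B, A vs. C, and B vs. C. If you only need some of those comparisons (for example, your own catalog A against competitors B and C), run just those pairs. Each run produces a separate output dataset.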
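
To cover every pair of datasets, you need one run per pair. A short sketch that generates one input per pair with `itertools.combinations`; `pairwise_inputs` is hypothetical and the dataset IDs are placeholders. Each resulting input could then be passed to `client.actor(...).call(run_input=...)` as in the Python example.

```python
from itertools import combinations


def pairwise_inputs(dataset_ids: list[str], **options) -> list[dict]:
    """One run input per unordered pair of datasets: n * (n - 1) / 2 runs."""
    return [
        {"datasetIdA": a, "datasetIdB": b, **options}
        for a, b in combinations(dataset_ids, 2)
    ]
```

With three datasets this yields three inputs (A vs. B, A vs. C, B vs. C); with four it yields six.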

How do I control costs on large datasets? Set maxOutputItems to a number that fits your budget. The tool stops processing as soon as that limit is reached, so your cost is fully bounded. You can also use vectorMatchesOnly: true to skip outputting low-similarity pairs, which reduces the number of items LLM matching needs to process.

Can I schedule this to run automatically? Yes. Use Apify's built-in scheduler to run the tool on a recurring basis - daily, weekly, or at any custom interval. Combine it with the E-Commerce Scraping Tool on a matching schedule to keep your match data up to date automatically.

What export formats are available? Output is available as JSON, CSV, Excel, XML, and HTML. You can also connect directly to the output dataset via the Apify API or integrate with tools like Google Sheets, Zapier, n8n, and others.