E-commerce Product Matching Tool

Under maintenance

Quickly find and rank matching products from two sources using intelligent similarity search. This actor works with pre-built product data to identify the best matches. Use it after uploading your dataset to the vector database with the Product Matching Vectorizer.

Pricing

Pay per event

Developer

Tri⟁angle

Maintained by Community

Last modified: 25 days ago

Product Vector Matcher - Apify Actor

Matches products between two vectorized indexes using FAISS similarity search. Loads pre-built indexes from Key-Value Stores and generates ranked matches based on cosine similarity.

Overview

This actor takes two FAISS indexes (created by the product-matching-vectorizer) and finds similar products between them. For each product in Index A, it searches for the most similar products in Index B and outputs ranked match results.

Key Features:

Fast FAISS-based similarity search
Configurable top-K results per product
Similarity threshold filtering
Streaming output with batch processing
Migration recovery with automatic checkpoint/resume
Real-time progress tracking with ETA
Detailed performance metrics

How It Works

1. Load Indexes: Loads FAISS indexes and metadata from two KV stores
2. Extract Vectors: Extracts vectors from Index A for searching
3. Match Products: For each product in A, finds top K similar products in B
4. Filter Results: Applies similarity threshold to identify matches
5. Save Output: Streams results to dataset in batches
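
The matching and filtering steps can be sketched in plain Python. This is a minimal illustration, not the actor's code: a brute-force inner-product search stands in for FAISS, the indexes are ordinary dicts, and the names l2_normalize and match_products are hypothetical. The row fields mirror the documented output, without the metadata fields.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so the inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def match_products(index_a, index_b, top_k=5, threshold=0.75, matches_only=False):
    """Brute-force stand-in for the FAISS search: rank Index B items for each Index A item."""
    b_items = [(b_id, l2_normalize(vec)) for b_id, vec in index_b.items()]
    rows = []
    for a_id, a_vec in index_a.items():
        query = l2_normalize(a_vec)
        scored = [
            (sum(q * v for q, v in zip(query, b_vec)), b_id)
            for b_id, b_vec in b_items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
        for rank, (similarity, b_id) in enumerate(scored[:top_k], start=1):
            is_match = similarity >= threshold
            if matches_only and not is_match:
                continue  # matchesOnly drops rows below the threshold
            rows.append({
                "product_a_id": a_id,
                "product_b_id": b_id,
                "similarity": round(similarity, 4),
                "rank": rank,
                "is_match": is_match,
            })
    return rows


# Toy 2-dimensional vectors (hypothetical data, not real embeddings).
index_a = {"a1": [1.0, 0.0], "a2": [0.0, 1.0]}
index_b = {"b1": [0.9, 0.1], "b2": [0.1, 0.9], "b3": [-1.0, 0.0]}

for row in match_products(index_a, index_b, top_k=2):
    print(row)
```

With top_k=2 each product in A yields two rows; only the rank-1 rows clear the 0.75 threshold in this toy data, so matches_only=True would keep one row per product.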

Input Parameters

Required Parameters

kvStoreIdA (string): ID of the first Key-Value Store containing the vectorized products
- Example: "MDhkhfJXV2O3Ir7GE"
- Must contain index.faiss and metadata.json created by product-matching-vectorizer
kvStoreIdB (string): ID of the second Key-Value Store to match against
- Example: "aBcDeFgHiJkLmNoP"
- Must contain index.faiss and metadata.json created by product-matching-vectorizer

Optional Parameters

topK (integer, default: 5): Number of top matches to return per product
- Range: 1-100
- Example: 5 returns the 5 most similar products from Index B for each product in A
similarityThreshold (integer, default: 75): Minimum similarity score (0-100)
- Converted to 0-1 scale internally (75 = 0.75)
- Results are marked as matches if similarity >= threshold
- Example: 75 means 75% similarity or higher
maxItems (integer, optional): Limit number of products to process from Index A
- Useful for testing or partial runs
- If not specified, processes all products
matchesOnly (boolean, default: false): Save only matches above threshold
- false: Save all results (including non-matches) with is_match flag
- true: Save only results where similarity >= threshold
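
The defaults and ranges listed above can be expressed as a small validation helper. This is a sketch under assumptions: normalize_input and the keys of the returned dict are hypothetical names, and the actor's own validation (normally driven by .actor/input_schema.json) may differ.

```python
def normalize_input(raw):
    """Apply the documented defaults and range checks to an actor input dict."""
    for key in ("kvStoreIdA", "kvStoreIdB"):
        if not isinstance(raw.get(key), str) or not raw[key]:
            raise ValueError(f"{key} is required and must be a non-empty string")

    top_k = raw.get("topK", 5)
    if not 1 <= top_k <= 100:
        raise ValueError("topK must be between 1 and 100")

    threshold = raw.get("similarityThreshold", 75)
    if not 0 <= threshold <= 100:
        raise ValueError("similarityThreshold must be between 0 and 100")

    return {
        "kv_store_id_a": raw["kvStoreIdA"],
        "kv_store_id_b": raw["kvStoreIdB"],
        "top_k": top_k,
        "threshold": threshold / 100.0,  # 75 becomes 0.75
        "max_items": raw.get("maxItems"),  # None means "process everything"
        "matches_only": bool(raw.get("matchesOnly", False)),
    }


config = normalize_input({"kvStoreIdA": "MDhkhfJXV2O3Ir7GE", "kvStoreIdB": "aBcDeFgHiJkLmNoP"})
print(config)
```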

Input Example

{
  "kvStoreIdA": "MDhkhfJXV2O3Ir7GE",
  "kvStoreIdB": "aBcDeFgHiJkLmNoP",
  "topK": 5,
  "similarityThreshold": 75,
  "maxItems": null,
  "matchesOnly": false
}

Output Format

Results are saved to a dataset with the following structure:

{
  "product_a_id": "prod_123",
  "product_a_metadata": {
    "title": "Canvas Tote Bag",
    "brand": "EcoBrand",
    "price": 2499
  },
  "product_b_id": "prod_456",
  "product_b_metadata": {
    "title": "Eco Canvas Tote",
    "brand": "GreenGoods",
    "price": 2599
  },
  "similarity": 0.8234,
  "rank": 1,
  "is_match": true
}

Fields:

product_a_id: Product ID from Index A
product_a_metadata: Metadata from Index A (structure depends on vectorizer config)
product_b_id: Matched product ID from Index B
product_b_metadata: Metadata from Index B
similarity: Cosine similarity score (0-1, higher = more similar)
rank: Rank of this match (1 = best match, 2 = second best, etc.)
is_match: Boolean flag (true if similarity >= threshold)

Output Characteristics:

Each product from Index A generates up to topK result rows
Results are sorted by rank (best matches first)
If matchesOnly=true, only rows with is_match=true are saved
Metadata structure depends on the metadataMapping used in the vectorizer
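
Because each product from Index A can produce several rows, a common follow-up is to keep only the best matching row per product. The helper below is a hypothetical post-processing sketch over rows shaped like the example above; it is not part of the actor, and the sample rows are made up.

```python
def best_match_per_product(rows):
    """Return {product_a_id: row} with the best-ranked row that has is_match set."""
    best = {}
    for row in rows:
        if not row["is_match"]:
            continue  # ignore rows below the similarity threshold
        current = best.get(row["product_a_id"])
        if current is None or row["rank"] < current["rank"]:
            best[row["product_a_id"]] = row
    return best


# Hypothetical rows in the documented output format (metadata omitted).
rows = [
    {"product_a_id": "prod_123", "product_b_id": "prod_456", "similarity": 0.8234, "rank": 1, "is_match": True},
    {"product_a_id": "prod_123", "product_b_id": "prod_789", "similarity": 0.7712, "rank": 2, "is_match": True},
    {"product_a_id": "prod_124", "product_b_id": "prod_456", "similarity": 0.5120, "rank": 1, "is_match": False},
]

for a_id, row in best_match_per_product(rows).items():
    print(a_id, "->", row["product_b_id"], row["similarity"])
```

Products with no row above the threshold, such as prod_124 here, are simply absent from the result.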

Migration Recovery

The actor automatically handles Apify server migrations:

State Persistence: Progress is saved on PERSIST_STATE events
Checkpoint Resume: On restart, skips already processed products
No Data Loss: All saved matches are preserved across migrations

State Management:

State is stored in the default Key-Value Store as product-vector-matcher-state
State is automatically cleared on successful completion
Resume is automatic and requires no manual intervention
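
The checkpoint/resume behavior can be illustrated with a simplified model. Everything here is an assumption made for illustration: a plain dict stands in for the default Key-Value Store, the state holds only a processed counter, the checkpoint is written after every product rather than on PERSIST_STATE events, and run_matching and fail_at are hypothetical names.

```python
STATE_KEY = "product-vector-matcher-state"  # key name taken from the section above


def run_matching(product_ids, kv_store, process, fail_at=None):
    """Process products in order, resuming from the checkpoint stored in kv_store."""
    state = kv_store.get(STATE_KEY) or {"processed": 0}
    start = state["processed"]  # skip products handled before the restart
    for position in range(start, len(product_ids)):
        if position == fail_at:
            raise RuntimeError("simulated migration")
        process(product_ids[position])
        kv_store[STATE_KEY] = {"processed": position + 1}  # checkpoint
    kv_store.pop(STATE_KEY, None)  # clear state on successful completion


kv_store = {}  # in-memory stand-in for the default Key-Value Store
handled = []
products = ["p1", "p2", "p3", "p4"]

try:
    run_matching(products, kv_store, handled.append, fail_at=2)
except RuntimeError:
    pass  # the run was interrupted after two products

run_matching(products, kv_store, handled.append)  # resumes at "p3"
print(handled)
```

Each product is handled exactly once across the two runs, and the state key is gone after the second run completes.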

Usage Examples

Example 1: Basic Matching

Match products between two catalogs with default settings:

{
  "kvStoreIdA": "MDhkhfJXV2O3Ir7GE",
  "kvStoreIdB": "aBcDeFgHiJkLmNoP"
}

This returns up to 5 results per product and saves all of them, each flagged with is_match against the default threshold of 75.

Example 2: High-Confidence Matches Only

Find only strong matches (80%+ similarity):

{
  "kvStoreIdA": "MDhkhfJXV2O3Ir7GE",
  "kvStoreIdB": "aBcDeFgHiJkLmNoP",
  "topK": 3,
  "similarityThreshold": 80,
  "matchesOnly": true
}

This searches for the top 3 candidates per product and saves only those with a similarity of 0.80 or higher, so some products may produce no rows.

Example 3: Testing with Limited Products

Test matching on a small subset:

{
  "kvStoreIdA": "MDhkhfJXV2O3Ir7GE",
  "kvStoreIdB": "aBcDeFgHiJkLmNoP",
  "topK": 5,
  "similarityThreshold": 75,
  "maxItems": 100
}

This processes only the first 100 products from Index A.

Example 4: Comprehensive Search

Find many potential matches per product:

{
  "kvStoreIdA": "MDhkhfJXV2O3Ir7GE",
  "kvStoreIdB": "aBcDeFgHiJkLmNoP",
  "topK": 20,
  "similarityThreshold": 60,
  "matchesOnly": false
}

This returns up to 20 results per product with a lower threshold (60%).

Understanding Similarity Scores

The actor uses cosine similarity between L2-normalized embeddings:

1.0: Perfect match (identical vectors)
0.9-1.0: Extremely similar (likely same product or very close variants)
0.8-0.9: Very similar (likely matching products with minor differences)
0.7-0.8: Similar (related products, same category/brand)
0.6-0.7: Somewhat similar (shared characteristics)
< 0.6: Not very similar

Note: Optimal similarity thresholds can vary significantly depending on your dataset characteristics, product categories, and data quality. It's recommended to analyze a sample of results to determine the appropriate threshold for your specific use case.
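
One way to analyze a sample, sketched below, is to count how many result rows would qualify as matches at several candidate thresholds. The function name, the candidate thresholds, and the sample scores are hypothetical; in practice the scores would come from the similarity field of a run with matchesOnly set to false.

```python
def matches_by_threshold(similarities, thresholds=(0.6, 0.7, 0.8, 0.9)):
    """Count how many similarity scores meet or exceed each candidate threshold."""
    return {t: sum(1 for s in similarities if s >= t) for t in thresholds}


# Hypothetical similarity scores taken from a sample of result rows.
sample = [0.95, 0.82, 0.78, 0.64, 0.41]

for threshold, count in matches_by_threshold(sample).items():
    print(f"threshold {threshold:.1f}: {count} of {len(sample)} rows would match")
```

Reviewing a few rows just above and just below each candidate value by hand helps confirm whether the cut-off separates true matches from near misses in your data.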

Performance Monitoring

The actor logs detailed performance metrics:

Timing Breakdown:

Index A load time
Index B load time
Vector extraction time
Total matching time
Average time per search

Throughput:

Products processed per second
Total runtime

Memory:

Initial and final memory usage
Memory delta

Output:

Number of batches saved
Total matches found
Matches above threshold
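
The throughput and ETA figures follow from simple arithmetic on the number of products processed and the elapsed time. The sketch below shows one way to compute them; progress_report and its return keys are hypothetical, and the actor's log format is not reproduced here.

```python
def progress_report(processed, total, elapsed_seconds):
    """Derive throughput, completion percentage, and remaining time from run counters."""
    rate = processed / elapsed_seconds if elapsed_seconds > 0 else 0.0
    remaining = total - processed
    eta = remaining / rate if rate > 0 else None  # unknown until something is processed
    percent = 100.0 * processed / total if total else 100.0
    return {
        "products_per_second": rate,
        "percent_done": percent,
        "eta_seconds": eta,
    }


# Example: 250 of 1000 products processed in 50 seconds.
print(progress_report(250, 1000, 50.0))
```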

Files

Core Actor Files

src/main.py - Main actor with matching logic and migration recovery
.actor/actor.json - Actor metadata
.actor/input_schema.json - Input parameter schema
.actor/output_schema.json - Output result schema
.actor/dataset_schema.json - Dataset structure schema
Dockerfile - Container definition
requirements.txt - Python dependencies

Deployment

Deploy to Apify:

$ apify push

Or connect via Git repository in the Apify Console.

Troubleshooting

Error: "index.faiss not found in KV store"

Cause: The specified KV store ID doesn't exist or doesn't contain a vectorized index.

Solution:

Verify the KV store ID is correct
Ensure the product-matching-vectorizer has completed successfully
Check that the KV store contains both index.faiss and metadata.json

Error: "metadata.json not found in KV store"

Cause: The KV store exists but is missing the metadata file.

Solution: Re-run the product-matching-vectorizer to rebuild the index.

Memory Issues

Symptom: Out of memory errors during vector extraction or matching.

Solution:

Use maxItems to limit a run to a subset of the products in Index A
Ensure adequate memory allocation in Actor settings
Consider using smaller indexes or upgrading Actor memory
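
A rough memory estimate can help when choosing the Actor memory setting. The sketch below assumes a flat (uncompressed) float32 index, where each vector occupies 4 bytes per dimension; the index type and embedding dimension produced by the vectorizer are not stated here, so the 384 dimensions in the example are purely hypothetical.

```python
BYTES_PER_FLOAT32 = 4


def estimate_index_bytes(num_vectors, dimension):
    """Raw vector storage for a flat float32 index (metadata and overhead excluded)."""
    return num_vectors * dimension * BYTES_PER_FLOAT32


def to_mib(num_bytes):
    """Convert bytes to mebibytes."""
    return num_bytes / (1024 * 1024)


# Hypothetical catalog: 100,000 products with 384-dimensional embeddings.
index_bytes = estimate_index_bytes(100_000, 384)
print(f"{to_mib(index_bytes):.1f} MiB per index")
```

Both indexes are loaded and the vectors from Index A are also extracted for searching, so peak usage is likely to be a multiple of the single-index figure.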

Related Actors

Product Matching Vectorizer: Creates FAISS indexes from product datasets (required before using this actor)

Workflow Integration

This actor is typically used as the second step in a matching pipeline:

1. Run product-matching-vectorizer on Dataset A → produces Index A (KV Store)
2. Run product-matching-vectorizer on Dataset B → produces Index B (KV Store)
3. Run product-vector-matcher with both KV store IDs → produces match results (Dataset)
