Pricing

Pay per usage

AI Product Matcher

Developed by

Matěj Sochor

Match products across multiple e-commerce websites. Use this AI product matching Actor whenever you need to find matching pairs of products from different online shops for dynamic pricing, competitor analysis or market research.

0.0 (0)

Total users

623

Monthly users

Runs succeeded

86%

Last modified

6 months ago

E-commerce

Open source

TypeError: the JSON object must be str, bytes or bytearray, not float

Closed

beneficial_penguin opened this issue

Hi, I tried the Actor with two e-shop datasets like this: [{ "id": "GZ2244", "name": "Sample 60", "price": "5.90", "short_description": "Découvrez le blahblah.", "long_description": "", "specification": [ { "key": "brand", "value": "" }, { "key": "Degré", "value": "54" }, { "key": "Provenance", "value": "Mexique" }, { "key": "Volume", "value": "6" } ] }, {... }]

I configured the input attribute mapping with: { "eshop1": { "id": "id", "name": "name", "price": "price", "short_description": "short_description", "long_description": "long_description", "specification": "specification", "code": "code" }, "eshop2": { "id": "id", "name": "name", "price": "price", "short_description": "short_description", "long_description": "long_description", "specification": "specification", "code": "code" } }

and I got this error:

2023-06-02T05:25:31.016Z Actor failed with an exception
2023-06-02T05:25:31.018Z multiprocessing.pool.RemoteTraceback:
2023-06-02T05:25:31.018Z """
2023-06-02T05:25:31.019Z Traceback (most recent call last):
2023-06-02T05:25:31.020Z File "/usr/local/lib/python3.9/multiprocessing/pool.py", line 125, in worker
2023-06-02T05:25:31.021Z result = (True, func(*args, **kwds))
2023-06-02T05:25:31.022Z File "/usr/local/lib/python3.9/multiproces... [trimmed]

Matěj Sochor (Equidem)

Hi Alex, I will check the issue you are having and fix it if possible tomorrow (Monday, 5.6.2023). Sorry for the long wait, the weekend got in the way :)

beneficial_penguin

Hi Matěj, thanks! I have investigated the issue. It seems that the datasets I use contain log fields, because they come from scrapes made with the standard Cheerio Scraper. I mean that the #error and #debug fields may cause the TypeError.

"#error": false, "#debug": { "requestId": "xxx", "url": "https://xxx", "loadedUrl": "https:xxx", "method": "GET", "retryCount": 0, "errorMessages": [], "statusCode": 200 }

To come to this conclusion, I modified the Cheerio Scraper Actor page function so that it returns JSON with exactly the same contents as the sample datasets of your Actor (meaning datasets GYVCj4hEeqnX3dJyu and OmzHV4VEByO4KohMF). The only thing that differs between the scraper's output dataset and the sample datasets is these log key-value pairs.

Maybe adding a "cleaner" to the Actor to remove hidden fields would be an easy fix?
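A minimal sketch of such a "cleaner", assuming hidden fields are exactly the top-level keys that start with "#" (as with the #error and #debug fields above). The helper name `strip_hidden_fields` is hypothetical and is not taken from the Actor's code:

```python
from typing import Any


def strip_hidden_fields(item: dict[str, Any], prefix: str = "#") -> dict[str, Any]:
    """Return a copy of `item` without top-level keys that start with `prefix`."""
    return {key: value for key, value in item.items() if not key.startswith(prefix)}


# Example item shaped like the scraper output described above.
scraped = {
    "id": "GZ2244",
    "name": "Sample 60",
    "#error": False,
    "#debug": {"requestId": "xxx", "statusCode": 200},
}

cleaned = strip_hidden_fields(scraped)
print(cleaned)  # {'id': 'GZ2244', 'name': 'Sample 60'}
```

The function builds a new dict instead of deleting keys in place, so the original scraped item stays available for debugging.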

Hope this helps :) I am so eager to try your Actor, you know.

Matěj Sochor (Equidem)

Hey Alex, the problem should be fixed now. Check on your side please and let me know if it still persists. It was indeed connected to the debug output, thanks for the help with debugging :)

beneficial_penguin

Hi, thanks. I tried it with the sample dataset on 2023-06-06 09:19 and it worked. But then I tried it again at 2023-06-06 13:29:15 with my dataset and got an error. I then tried again using exactly the same input and dataset as I used at 9:10, and it failed with "TypeError: '<' not supported between instances of 'str' and 'int'"

2023-06-06T11:29:41.272Z File "/usr/src/app/main.py", line 16, in main
2023-06-06T11:29:41.273Z if max_items_to_process < 1:
2023-06-06T11:29:41.274Z TypeError: '<' not supported between instances of 'str' and 'int'

Here is the input I used: { "dataset1_ids": [ "DNOqtHmubZf5KiHSc" ], "dataset2_ids": [ "6PGDkddVGhybuJpgv" ], "input_mapping": { "eshop1": { "id": "url", "name": "name", "price": "price", "short_description": "shortDescription", "long_description": "longDescription", "specification": "specification", "code": [ "sku" ] }, "eshop2": { "id": "url", "name": "name", "price": "price", "short_description": "shortDescription", "long_description": "longDescription", "specification": "specification", "code": [ "sku" ] } }, "output_mapping": { "eshop1": { "id_source": "url", "name_source": "name" }, "eshop2... [trimmed]
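The traceback above suggests that `max_items_to_process` reached the comparison as a string. A hedged sketch of one way to guard against that is to coerce the value to an integer before comparing. The helper `parse_max_items` and the treatment of a missing value as "no limit" are assumptions for illustration, not the Actor's actual fix:

```python
def parse_max_items(raw_value, default=None):
    """Coerce a limit that may arrive as a string into an int (hypothetical helper).

    Returns `default` when the value is missing or empty.
    """
    if raw_value is None or raw_value == "":
        return default
    value = int(raw_value)  # raises ValueError for non-numeric strings
    if value < 1:
        raise ValueError("max_items_to_process must be at least 1")
    return value


# Example: the limit arrives as a string, as the TypeError above implies.
actor_input = {"max_items_to_process": "100"}
max_items = parse_max_items(actor_input.get("max_items_to_process"))
print(max_items)  # 100
```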

beneficial_penguin

First run succeeded : https://console.apify.com/view/runs/gB5PHNWeTkfwSDnWk

beneficial_penguin

Same input but failed https://console.apify.com/view/runs/7W6ZGqn0247q9D0d0

Matěj Sochor (Equidem)

Hey Alex, this issue should now be fixed as well.

By the way, I noticed that you only put "sku" into the "code" input. In general, I would advise against that. Since SKUs are very often different for the same products in different online shops, it is better to use codes/ids that are more universal, such as the "productModel" in the sample datasets (and very often, these codes are indeed called "Product model" or "Product number" on real online shops as well). Since the current model takes the codes very seriously, putting codes that are always gonna be different such as SKUs there is counterproductive, as can be seen by the results you get from your input. I am currently developing a model that doesn't use the codes for cases when no good identifiers are available, it should be out in a few days. If you wish to be alerted, let me know and I will write to you here when it's finished.

Best regards, Matěj
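One rough way to check whether a field is worth using as a code is to measure how many of its values the two shops share. The sketch below is only an illustration of that idea: the function `code_overlap` and the sample items are made up, with "productModel" borrowed from the field name mentioned above:

```python
def code_overlap(items1, items2, field):
    """Share of distinct `field` values common to both shops (0.0 to 1.0)."""
    codes1 = {str(item[field]).strip().lower() for item in items1 if item.get(field)}
    codes2 = {str(item[field]).strip().lower() for item in items2 if item.get(field)}
    if not codes1 or not codes2:
        return 0.0
    # Normalize by the smaller set so a small shop is not penalized.
    return len(codes1 & codes2) / min(len(codes1), len(codes2))


# Made-up items: SKUs differ per shop, product models are shared.
shop1 = [
    {"sku": "A-1", "productModel": "XJ-200"},
    {"sku": "A-2", "productModel": "XJ-300"},
]
shop2 = [
    {"sku": "9981", "productModel": "xj-200"},
    {"sku": "9982", "productModel": "XJ-300"},
]

print(code_overlap(shop1, shop2, "sku"))           # 0.0
print(code_overlap(shop1, shop2, "productModel"))  # 1.0
```

An overlap near zero, as with the SKUs here, hints that the field identifies products only within one shop and is a poor choice for the "code" input.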

beneficial_penguin

Hi, thanks Matěj. The main problem I have with the datasets I work with is that they don't have an EAN, a GTIN, or any other matching code. I wanted to try your Actor and see how precise it can be in this context. But it seems that I can't make it work due to the format of my dataset. I tried the Actor with the sample datasets and it worked well. But when I change the keys to match my datasets, I get errors again. Would you mind taking a look at the task run: https://console.apify.com/view/runs/CyQSKirHPgU4UKN4L ?

Matěj Sochor (Equidem)

Hey Alex, the issue with your input was that sometimes there were items in specification that only contained key, but no value. Since there is no real reason for us not to accept such an input, I patched it so you should be able to run your input with no problem. Don't use the results to determine accuracy though, using "name" as a code will heavily impact it. I am finishing up on the no-codes model, so I will let you know as soon as possible when it's deployed.

Best, Matěj
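For readers who would rather tidy their data before a run, here is a small sketch that fills in missing values in specification entries. It assumes the list-of-key/value shape shown in the first post; the function `normalize_specification` is hypothetical and says nothing about how the patch itself was implemented:

```python
def normalize_specification(specification):
    """Return specification entries that all have a string `value`.

    Entries without a `key` are dropped; a missing or None `value` becomes "".
    """
    normalized = []
    for entry in specification or []:
        key = entry.get("key")
        if not key:
            continue
        value = entry.get("value")
        normalized.append({"key": key, "value": "" if value is None else str(value)})
    return normalized


# Example: the first entry has a key but no value.
spec = [{"key": "brand"}, {"key": "Volume", "value": "6"}, {"value": "orphan"}]
print(normalize_specification(spec))
# [{'key': 'brand', 'value': ''}, {'key': 'Volume', 'value': '6'}]
```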

beneficial_penguin

Thanks Matěj. The thing is that, for the products I need to track, the EAN code is consistent when it is available. So if I have the EAN code, I can use it as is to match products and I don't need AI :) As you may see, the brand and spec of a product can be enough to narrow down the list of candidates, so that the description and name can be used to finally close the gap. I will wait for your new version :)
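The approach described above (exact EAN match when available, otherwise narrow by brand and then compare names) could be sketched roughly as follows. This is not how the Actor works: `difflib.SequenceMatcher` is a simple stand-in for a real similarity model, and the `ean` field, the helper names, and the sample products are all assumptions for illustration:

```python
from difflib import SequenceMatcher


def spec_value(product, key):
    """Look up a specification value by key, lower-cased ("" if absent)."""
    for entry in product.get("specification", []):
        if entry.get("key") == key:
            return (entry.get("value") or "").strip().lower()
    return ""


def best_match(product, candidates):
    """Pick a candidate by exact EAN first, then by brand filter plus name similarity."""
    ean = product.get("ean")
    if ean:
        for candidate in candidates:
            if candidate.get("ean") == ean:
                return candidate
    brand = spec_value(product, "brand")
    # Without a brand there is nothing to narrow by, so keep every candidate.
    pool = [c for c in candidates if spec_value(c, "brand") == brand] if brand else list(candidates)
    if not pool:
        return None

    def name_similarity(candidate):
        return SequenceMatcher(None, product["name"].lower(), candidate["name"].lower()).ratio()

    return max(pool, key=name_similarity)


# Made-up products: no EAN on the query, so brand and name decide.
query = {"name": "Acme Tequila Blanco 70cl", "specification": [{"key": "brand", "value": "Acme"}]}
shop2 = [
    {"name": "Acme Tequila Blanco 70 cl", "specification": [{"key": "brand", "value": "acme"}]},
    {"name": "Acme Mezcal Joven 50 cl", "specification": [{"key": "brand", "value": "acme"}]},
    {"name": "Acme Tequila Blanco 70cl", "specification": [{"key": "brand", "value": "other"}]},
]
match = best_match(query, shop2)
print(match["name"])  # Acme Tequila Blanco 70 cl
```

The third candidate has an identical name but a different brand, so the brand filter excludes it before names are compared.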

Matěj Sochor (Equidem)

Hi Alex, sorry for the delay, vacations got in the way. The version requiring no codes has been uploaded now, so feel free to try it and see if it suits your needs.

Best, Matěj

beneficial_penguin

Hi! Just got the newsletter from Apify! Thanks, I will look into it ASAP.
