AI Product Matcher

equidem/ai-product-matcher
Match products across multiple e-commerce websites. Use this AI product matching Actor whenever you need to find matching pairs of products from different online shops for dynamic pricing, competitor analysis or market research.

TypeError: the JSON object must be str, bytes or bytearray, not float

Closed

beneficial_penguin opened this issue
2 years ago

Hi, I tried the Actor with two e-shop datasets like this: [{ "id": "GZ2244", "name": "Sample 60", "price": "5.90", "short_description": "Découvrez le blahblah.", "long_description": "", "specification": [ { "key": "brand", "value": "" }, { "key": "Degré", "value": "54" }, { "key": "Provenance", "value": "Mexique" }, { "key": "Volume", "value": "6" } ] }, {... }]

I configured the input attribute mapping as follows: { "eshop1": { "id": "id", "name": "name", "price": "price", "short_description": "short_description", "long_description": "long_description", "specification": "specification", "code": "code" }, "eshop2": { "id": "id", "name": "name", "price": "price", "short_description": "short_description", "long_description": "long_description", "specification": "specification", "code": "code" } }

And I got this error:

2023-06-02T05:25:31.016Z Actor failed with an exception
2023-06-02T05:25:31.018Z multiprocessing.pool.RemoteTraceback:
2023-06-02T05:25:31.018Z """
2023-06-02T05:25:31.019Z Traceback (most recent call last):
2023-06-02T05:25:31.020Z File "/usr/local/lib/python3.9/multiprocessing/pool.py", line 125, in worker
2023-06-02T05:25:31.021Z result = (True, func(*args, **kwds))
2023-06-02T05:25:31.022Z File "/usr/local/lib/python3.9/multiproces... [trimmed]

Equidem

Hi Alex, I will check the issue you are having and fix it if possible tomorrow (Monday, 5.6.2023). Sorry for the long wait, the weekend got in the way :)

beneficial_penguin

2 years ago

Hi Matěj, thanks! I have investigated the issue. It seems that the datasets I use contain log fields, because they come from scrapes made with the standard Cheerio Scraper. I mean that the #error and #debug fields may be causing the TypeError.


"#error": false, "#debug": { "requestId": "xxx", "url": "https://xxx", "loadedUrl": "https:xxx", "method": "GET", "retryCount": 0, "errorMessages": [], "statusCode": 200 }


To come to this conclusion, I modified the Cheerio Scraper Actor page function so that it returns a JSON with exactly the same contents as the sample datasets of your Actor (meaning datasets GYVCj4hEeqnX3dJyu and OmzHV4VEByO4KohMF). The only difference between the dataset output by the scraper and the sample datasets is these log key-value pairs.

Maybe adding a "cleaner" to the Actor to remove hidden fields would be an easy fix?
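The "cleaner" suggested here could look roughly like the sketch below. It is only an illustration, not the Actor's actual code: the function name strip_hidden_fields is hypothetical, and it assumes that hidden fields are exactly the top-level keys starting with "#", as in the #error and #debug example above.

```python
from typing import Any


def strip_hidden_fields(item: dict[str, Any], prefix: str = "#") -> dict[str, Any]:
    """Return a copy of a dataset item without top-level keys starting with `prefix`.

    Hypothetical helper: assumes hidden fields are marked only by a leading "#".
    """
    return {key: value for key, value in item.items() if not key.startswith(prefix)}


# Example item shaped like the scraper output described in this thread.
raw_item = {
    "id": "GZ2244",
    "name": "Sample 60",
    "price": "5.90",
    "#error": False,
    "#debug": {"requestId": "xxx", "statusCode": 200},
}

clean_item = strip_hidden_fields(raw_item)
print(sorted(clean_item))
```

Running every item through such a function before matching would leave only the product attributes named in the input mapping.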

Hope this helps :) I am so eager to try your Actor, you know.

Equidem

Hey Alex, the problem should be fixed now. Check on your side please and let me know if it still persists. It was indeed connected to the debug output, thanks for the help with debugging :)

beneficial_penguin

2 years ago

Hi, thanks. I tried it with the sample dataset on 2023-06-06 09:19 and it worked. But when I tried it again at 2023-06-06 13:29:15 with my dataset, I got an error. I then tried again using exactly the same input and dataset as I used at 9:10, and it failed with "TypeError: '<' not supported between instances of 'str' and 'int'"

2023-06-06T11:29:41.272Z File "/usr/src/app/main.py", line 16, in main
2023-06-06T11:29:41.273Z if max_items_to_process < 1:
2023-06-06T11:29:41.274Z TypeError: '<' not supported between instances of 'str' and 'int'
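The traceback suggests that max_items_to_process arrived as a string and was compared directly with an integer. A minimal sketch of one way to guard against this is shown below. The helper parse_max_items is hypothetical and is not the fix that was actually deployed; it simply assumes the value may come in as a string, an integer, or be missing.

```python
from typing import Optional, Union


def parse_max_items(raw_value: Union[str, int, None]) -> Optional[int]:
    """Coerce an input value to a positive int, or None when it is not set.

    Hypothetical helper: raises ValueError for non-numeric or non-positive values.
    """
    if raw_value is None or raw_value == "":
        return None
    value = int(raw_value)  # accepts 100 as well as "100"
    if value < 1:
        raise ValueError("max_items_to_process must be at least 1")
    return value


# Comparing the raw string with an int reproduces the reported TypeError.
try:
    "100" < 1
except TypeError as exc:
    print(exc)

# After coercion the comparison is between two ints and works as intended.
print(parse_max_items("100"))
```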

Here is the input I used: { "dataset1_ids": [ "DNOqtHmubZf5KiHSc" ], "dataset2_ids": [ "6PGDkddVGhybuJpgv" ], "input_mapping": { "eshop1": { "id": "url", "name": "name", "price": "price", "short_description": "shortDescription", "long_description": "longDescription", "specification": "specification", "code": [ "sku" ] }, "eshop2": { "id": "url", "name": "name", "price": "price", "short_description": "shortDescription", "long_description": "longDescription", "specification": "specification", "code": [ "sku" ] } }, "output_mapping": { "eshop1": { "id_source": "url", "name_source": "name" }, "eshop2... [trimmed]

Equidem

Hey Alex, this issue should now be fixed as well.

By the way, I noticed that you only put "sku" into the "code" input. In general, I would advise against that. Since SKUs are very often different for the same products in different online shops, it is better to use codes/ids that are more universal, such as the "productModel" in the sample datasets (and very often, these codes are indeed called "Product model" or "Product number" on real online shops as well). Since the current model takes the codes very seriously, putting codes that are always gonna be different such as SKUs there is counterproductive, as can be seen by the results you get from your input. I am currently developing a model that doesn't use the codes for cases when no good identifiers are available, it should be out in a few days. If you wish to be alerted, let me know and I will write to you here when it's finished.

Best regards, Matěj

beneficial_penguin

2 years ago

Hi, thanks Matěj. The main problem with the datasets I work with is that they have no EAN, GTIN, or any other matching code. I wanted to try your Actor and see how precise it can be in this context. But it seems that I can't make it work because of the format of my dataset. I tried the Actor with the sample datasets and it worked well. But when I change the keys to match my datasets, I get errors again. Would you mind taking a look at the task run: https://console.apify.com/view/runs/CyQSKirHPgU4UKN4L ?

Equidem

Hey Alex, the issue with your input was that some items in specification contained only a key but no value. Since there is no real reason for us not to accept such an input, I patched it, so you should be able to run your input with no problem. Don't use the results to determine accuracy though: using "name" as a code will heavily impact it. I am finishing up the no-codes model, so I will let you know as soon as possible when it's deployed.
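For anyone who wants to pre-clean a dataset on their own side, specification entries with a missing value can be normalized before the data is handed to the Actor. The sketch below is an assumption about how that could be done, not the patch described above; normalize_specification is a hypothetical helper that expects the list-of-key/value shape shown earlier in this thread.

```python
from typing import Any, Optional


def normalize_specification(spec: Optional[list[dict[str, Any]]]) -> list[dict[str, str]]:
    """Return specification entries that all have a "key" and a string "value".

    Hypothetical helper: entries without a key are dropped, and a missing
    or None value is replaced by an empty string.
    """
    normalized = []
    for entry in spec or []:
        key = entry.get("key")
        if not key:
            continue
        value = entry.get("value")
        normalized.append({"key": str(key), "value": "" if value is None else str(value)})
    return normalized


spec = [{"key": "brand"}, {"key": "Volume", "value": 6}, {"value": "orphan"}]
print(normalize_specification(spec))
```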

Best, Matěj

beneficial_penguin

2 years ago

Thanks Matěj. The thing is that, for the products I need to track, the EAN code is consistent when it is available. So if I have the EAN code, I can use it as is to match products and I don't need AI :) As you may see, the brand and specification of a product can be enough to narrow down the list of candidates, so that the description and name can then be used to close the gap. I will wait for your new version :)
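The idea of narrowing the candidates by brand and then comparing names can be sketched in a few lines. This is only a toy illustration of that idea and says nothing about how the Actor's model works: get_brand, best_match, and the 0.6 threshold are hypothetical, and the name comparison uses plain string similarity from Python's difflib rather than any AI model.

```python
from difflib import SequenceMatcher
from typing import Any, Optional

Product = dict[str, Any]


def get_brand(product: Product) -> str:
    """Read the "brand" entry from the specification list, lowercased."""
    for entry in product.get("specification", []):
        if entry.get("key") == "brand":
            return (entry.get("value") or "").strip().lower()
    return ""


def best_match(product: Product, candidates: list[Product], min_score: float = 0.6) -> Optional[Product]:
    """Keep candidates of the same brand, then pick the most similar name."""
    brand = get_brand(product)
    best, best_score = None, 0.0
    for candidate in candidates:
        if get_brand(candidate) != brand:
            continue  # step 1: narrow the list by brand
        score = SequenceMatcher(None, product["name"].lower(), candidate["name"].lower()).ratio()
        if score > best_score:  # step 2: close the gap with the name
            best, best_score = candidate, score
    return best if best_score >= min_score else None


product = {"name": "Sample 60", "specification": [{"key": "brand", "value": "Acme"}]}
candidates = [
    {"name": "Sample 60 cl", "specification": [{"key": "brand", "value": "acme"}]},
    {"name": "Sample 60", "specification": [{"key": "brand", "value": "Other"}]},
]
print(best_match(product, candidates))
```

Here the second candidate has an identical name but is filtered out by the brand step, which is the narrowing effect described above.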

Equidem

Hi Alex, sorry for the delay, vacations got in the way. The version requiring no codes has been uploaded now, so feel free to try it and see if it suits your needs.

Best, Matěj

beneficial_penguin

2 years ago

Hi! Just got the newsletter from Apify! Thanks, I will look into it ASAP.

Developer
Maintained by Apify

Actor Metrics

  • 26 monthly users

  • 10 stars

  • 85% runs succeeded

  • Created in Apr 2023

  • Modified 7 months ago