AI Product Matcher

equidem/ai-product-matcher

Match products across multiple e-commerce websites. Use this AI product matching Actor whenever you need to find matching pairs of products from different online shops for dynamic pricing, competitor analysis or market research.

User avatar

TypeError: the JSON object must be str, bytes or bytearray, not float

Closed

beneficial_penguin opened this issue
a year ago

Hi, I tried the Actor with two e-shop datasets like this:

[
  {
    "id": "GZ2244",
    "name": "Sample 60",
    "price": "5.90",
    "short_description": "Découvrez le blahblah.",
    "long_description": "",
    "specification": [
      { "key": "brand", "value": "" },
      { "key": "Degré", "value": "54" },
      { "key": "Provenance", "value": "Mexique" },
      { "key": "Volume", "value": "6" }
    ]
  },
  {... }
]

I configured the input attributes mapping with:

{
  "eshop1": {
    "id": "id",
    "name": "name",
    "price": "price",
    "short_description": "short_description",
    "long_description": "long_description",
    "specification": "specification",
    "code": "code"
  },
  "eshop2": {
    "id": "id",
    "name": "name",
    "price": "price",
    "short_description": "short_description",
    "long_description": "long_description",
    "specification": "specification",
    "code": "code"
  }
}

and I got this error:

2023-06-02T05:25:31.016Z Actor failed with an exception
2023-06-02T05:25:31.018Z multiprocessing.pool.RemoteTraceback:
2023-06-02T05:25:31.018Z """
2023-06-02T05:25:31.019Z Traceback (most recent call last):
2023-06-02T05:25:31.020Z   File "/usr/local/lib/python3.9/multiprocessing/pool.py", line 125, in worker
2023-06-02T05:25:31.021Z     result = (True, func(*args, **kwds))
2023-06-02T05:25:31.022Z   File "/usr/local/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
2023-06-02T05:25:31.023Z     return list(map(*args))
2023-06-02T05:25:31.024Z   File "/usr/src/app/actors/executor/product_mapping_engine/scripts/dataset_handler/preprocessing/texts/text_preprocessing.py", line 245, in multi_run_text_preprocessing_wrapper
2023-06-02T05:25:31.024Z     return preprocess_textual_data(*args)
2023-06-02T05:25:31.025Z   File "/usr/src/app/actors/executor/product_mapping_engine/scripts/dataset_handler/preprocessing/texts/text_preprocessing.py", line 199, in preprocess_textual_data
2023-06-02T05:25:31.026Z     dataset = parse_specifications_and_create_copies(dataset, 'specification')
2023-06-02T05:25:31.027Z   File "/usr/src/app/actors/executor/product_mapping_engine/scripts/dataset_handler/preprocessing/texts/text_preprocessing.py", line 355, in parse_specifications_and_create_copies
2023-06-02T05:25:31.028Z     dataset[specification_name] = parse_specifications(dataset[specification_name])
2023-06-02T05:25:31.029Z   File "/usr/src/app/actors/executor/product_mapping_engine/scripts/dataset_handler/preprocessing/texts/text_preprocessing.py", line 333, in parse_specifications
2023-06-02T05:25:31.029Z     product_specification = json.loads(product_specification)
2023-06-02T05:25:31.030Z   File "/usr/local/lib/python3.9/json/__init__.py", line 339, in loads
2023-06-02T05:25:31.031Z     raise TypeError(f'the JSON object must be str, bytes or bytearray, '
2023-06-02T05:25:31.036Z TypeError: the JSON object must be str, bytes or bytearray, not float
2023-06-02T05:25:31.037Z """

Can somebody help, please?
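For readers hitting the same message: the last frame of the traceback shows `json.loads` being handed a float instead of a string. One common way this happens (an assumption here, since the Actor's source is not shown in this thread) is a missing specification being represented as NaN once the items are loaded into a table. The sketch below uses a hypothetical helper, `parse_specification`, to show a defensive way of handling that case; it is not the Actor's own code.

```python
import json
import math


def parse_specification(raw):
    """Hypothetical helper: return a specification as a list of entries.

    Tolerates missing values instead of passing them to json.loads.
    """
    # A missing cell often arrives as None or float('nan'), not as a string.
    if raw is None or (isinstance(raw, float) and math.isnan(raw)):
        return []
    # Only text-like input is actually JSON-decoded.
    if isinstance(raw, (str, bytes, bytearray)):
        return json.loads(raw)
    # Already-decoded lists are passed through unchanged.
    return raw
```

Calling `json.loads(float("nan"))` directly raises the same `TypeError` as in the log, which is a quick way to confirm the diagnosis locally.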

User avatar

Hi Alex, I will check the issue you are having and fix it if possible tomorrow (Monday, 5.6.2023). Sorry for the long wait, the weekend got in the way :)

User avatar

beneficial_penguin

a year ago

Hi Matěj, thanks! I have investigated the issue. It seems that the datasets I use contain logs, because they come from scrapes made with the standard Cheerio Scraper. I mean that the #error and #debug fields may be what causes the TypeError.


"#error": false, "#debug": { "requestId": "xxx", "url": "https://xxx", "loadedUrl": "https:xxx", "method": "GET", "retryCount": 0, "errorMessages": [], "statusCode": 200 }


To come to this conclusion, I modified the Cheerio Scraper Actor page function so that it returns JSON with exactly the same contents as the sample datasets of your Actor (meaning datasets GYVCj4hEeqnX3dJyu and OmzHV4VEByO4KohMF). The only difference between the dataset output by the scraper and the sample datasets is these log key-value pairs.

Maybe adding a "cleaner" to the Actor to remove hidden fields would be an easy fix?
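A minimal sketch of such a cleaner, assuming that hidden fields are exactly the top-level keys starting with "#" (as with "#error" and "#debug" above). The function name `strip_hidden_fields` is made up for illustration and is not part of the Actor.

```python
def strip_hidden_fields(item):
    """Return a copy of a dataset item without top-level '#'-prefixed keys."""
    return {key: value for key, value in item.items() if not key.startswith("#")}


def clean_dataset(items):
    """Apply the cleaner to every item of a dataset (a list of dicts)."""
    return [strip_hidden_fields(item) for item in items]
```

Running the scraped items through `clean_dataset` before passing them to the matcher would leave only the product fields in place.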

Hope this helps :) I am so eager to try your Actor, you know.

User avatar

Hey Alex, the problem should be fixed now. Check on your side please and let me know if it still persists. It was indeed connected to the debug output, thanks for the help with debugging :)

User avatar

beneficial_penguin

a year ago

Hi, thanks. I tried it with the sample dataset on 2023-06-06 09:19 and it worked. But then I tried it again at 2023-06-06 13:29:15 with my dataset and got an error. I then tried again with exactly the same input and dataset as I used at 9:10, and it failed with "TypeError: '<' not supported between instances of 'str' and 'int'":

2023-06-06T11:29:41.272Z   File "/usr/src/app/main.py", line 16, in main
2023-06-06T11:29:41.273Z     if max_items_to_process < 1:
2023-06-06T11:29:41.274Z TypeError: '<' not supported between instances of 'str' and 'int'

Here is the input I used:

{
  "dataset1_ids": [ "DNOqtHmubZf5KiHSc" ],
  "dataset2_ids": [ "6PGDkddVGhybuJpgv" ],
  "input_mapping": {
    "eshop1": {
      "id": "url",
      "name": "name",
      "price": "price",
      "short_description": "shortDescription",
      "long_description": "longDescription",
      "specification": "specification",
      "code": [ "sku" ]
    },
    "eshop2": {
      "id": "url",
      "name": "name",
      "price": "price",
      "short_description": "shortDescription",
      "long_description": "longDescription",
      "specification": "specification",
      "code": [ "sku" ]
    }
  },
  "output_mapping": {
    "eshop1": { "id_source": "url", "name_source": "name" },
    "eshop2": { "id_target": "sku", "name_target": "name" }
  },
  "precision_recall": "precision"
}
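The second traceback shows `max_items_to_process` being compared with `1` while it is still a string. A hedged sketch of how such an input value could be coerced before the comparison follows; `coerce_max_items` is a hypothetical helper, and treating a missing value as "no limit" is an assumption, not documented Actor behavior.

```python
def coerce_max_items(value, default=None):
    """Convert an input value to a positive int, or return the default.

    Assumption: None or an empty string means the limit was not set.
    """
    if value is None or value == "":
        return default
    number = int(value)  # raises ValueError for non-numeric text
    if number < 1:
        raise ValueError("max_items_to_process must be at least 1")
    return number
```

With this in place, both `"50"` and `50` lead to the same integer comparison instead of a `TypeError`.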

User avatar

Hey Alex, this issue should now be fixed as well.

By the way, I noticed that you only put "sku" into the "code" input. In general, I would advise against that. Since SKUs are very often different for the same products in different online shops, it is better to use codes/ids that are more universal, such as the "productModel" in the sample datasets (and very often, these codes are indeed called "Product model" or "Product number" on real online shops as well). Since the current model takes the codes very seriously, putting codes that are always gonna be different such as SKUs there is counterproductive, as can be seen by the results you get from your input. I am currently developing a model that doesn't use the codes for cases when no good identifiers are available, it should be out in a few days. If you wish to be alerted, let me know and I will write to you here when it's finished.

Best regards, Matěj

User avatar

beneficial_penguin

a year ago

Hi, thanks Matěj. The main problem I have with the datasets I work with is that they have no EAN, GTIN or any other matching code. I wanted to try your Actor and see how precise it can be in this context. But it seems that I can't make it work because of the format of my dataset. I tried the Actor with the sample datasets and it worked well. But when I change the keys to match my datasets, I get errors again. Would you mind taking a look at the task run: https://console.apify.com/view/runs/CyQSKirHPgU4UKN4L ?

User avatar

Hey Alex, the issue with your input was that some items in the specification contained only a key but no value. Since there is no real reason for us not to accept such an input, I patched it, so you should be able to run your input with no problem. Don't use the results to determine accuracy though; using "name" as a code will heavily impact it. I am finishing up the no-codes model, so I will let you know as soon as it's deployed.

Best, Matěj
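For anyone preparing their own datasets, specification entries that have a key but no value can also be normalized on the input side. This is only a sketch under the assumption that the specification is a list of {"key", "value"} objects as in the example earlier in this thread; `normalize_specification` is a hypothetical name, not the patch that was applied to the Actor.

```python
def normalize_specification(entries):
    """Give every specification entry a string value; drop entries without a key."""
    normalized = []
    for entry in entries:
        key = entry.get("key")
        if not key:
            continue  # an entry without a key carries no usable information
        value = entry.get("value")
        # A missing value becomes an empty string instead of being absent.
        normalized.append({"key": key, "value": "" if value is None else str(value)})
    return normalized
```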

User avatar

beneficial_penguin

a year ago

Thanks Matěj. The thing is that, for the products I need to track, the EAN code is consistent when it is available. So if I have the EAN code, I can use it as is to match products and I don't need AI :) As you can see, the brand and specification of a product can be enough to narrow down the list of candidates, so that the description and name can then be used to close the gap. I will wait for your new version :)
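The two-step idea described above (narrow by brand and specification, then decide by name) can be sketched roughly as follows. This is not the Actor's model: plain string similarity from Python's `difflib` stands in for the real matching, and the blocking keys "brand" and "Volume" are just taken from the sample item earlier in the thread.

```python
from difflib import SequenceMatcher


def spec_value(product, key):
    """Return the lowercased value of a specification key, or '' if absent."""
    for entry in product.get("specification", []):
        if entry.get("key") == key:
            return (entry.get("value") or "").strip().lower()
    return ""


def best_match(product, candidates, block_keys=("brand", "Volume")):
    """Return (score, candidate) for the most similar candidate, or None."""
    # Step 1: keep only candidates that agree on every blocking attribute.
    block = [
        c for c in candidates
        if all(spec_value(c, k) == spec_value(product, k) for k in block_keys)
    ]
    # Step 2: rank the remaining candidates by name similarity (0.0 to 1.0).
    scored = [
        (SequenceMatcher(None, product["name"].lower(), c["name"].lower()).ratio(), c)
        for c in block
    ]
    return max(scored, key=lambda pair: pair[0]) if scored else None
```

In practice a threshold on the score would be needed to reject weak matches, and the description fields could be compared the same way.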

User avatar

Hi Alex, sorry for the delay, vacations got in the way. The version requiring no codes has been uploaded now, so feel free to try it and see if it suits your needs.

Best, Matěj

User avatar

beneficial_penguin

10 months ago

Hi! Just got the newsletter from Apify! Thanks, I will look into it ASAP.

Developer
Maintained by Apify
Actor metrics
  • 37 monthly users
  • 47.7% runs succeeded
  • 15.4 days response time
  • Created in Apr 2023
  • Modified 4 months ago