Azure dataset uploader

mvolfik/azure-dataset-uploader

The easiest way to upload datasets to Azure blob storage.

Azure dataset uploader supporting xlsx or html

Closed

confident_socket opened this issue
13 days ago

Hey, I love your dataset uploader, but for indexing in Azure I need to use either HTML or XLSX, so it would be amazing to get those supported, as then I wouldn't have to do the conversion myself.

mvolfik

Hi, I will add HTML support today or tomorrow. There are some issues with how the platform handles downloads of large datasets, so I switched to downloading in batches, which made every format other than JSON complicated. I also didn't expect many people to use this, since this Actor was made specifically for one customer. Looking at it now, though, some hacky HTML support shouldn't be too hard if there's demand for it.
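
To illustrate why batching is straightforward for JSON, here is a minimal sketch and not the Actor's actual code. It assumes a hypothetical `fetch_batch(offset, limit)` callable that returns a list of items (empty once the dataset is exhausted). Because a JSON array is just comma-separated items between brackets, batches can be emitted as they arrive without holding the whole dataset in memory.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator, List

Item = Dict[str, Any]


def iter_batches(
    fetch_batch: Callable[[int, int], List[Item]],  # hypothetical fetcher
    batch_size: int,
) -> Iterator[List[Item]]:
    """Yield batches until the fetcher returns an empty list."""
    offset = 0
    while True:
        items = fetch_batch(offset, batch_size)
        if not items:
            break
        yield items
        offset += len(items)


def json_array_chunks(batches: Iterable[List[Item]]) -> Iterator[str]:
    """Yield text chunks that together form one valid JSON array."""
    yield "["
    first = True
    for batch in batches:
        for item in batch:
            if not first:
                yield ","  # separator between items, also across batches
            yield json.dumps(item)
            first = False
    yield "]"
```

Each yielded chunk could be appended to the destination as it is produced. Formats with headers, footers, or container structure (HTML, XLSX) need extra handling at the batch boundaries, which is the complication mentioned above.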

confident_socket

13 days ago

Thank you. I’m just not experienced with Azure, so your integration makes it super easy. I’m not planning to upload large amounts of data.

mvolfik

v0.0.14, which supports HTML upload, was just released.

confident_socket

9 days ago

Thank you, Matēj. Would it actually be possible to add support for XLSX as well, if it's not too much work?

mvolfik

To be honest, adding support for large XLSX datasets would be quite a bit more work than HTML. XLSX is a zip archive under the hood, so we would have to unzip the XLSX file downloaded from the Apify platform, extract the XML file that contains the data, append rows to it, and then zip everything again and upload it to Azure.

I guess I could add support for XLSX without streaming, just copying the whole thing to Azure at once. Hopefully this week.
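
The unzip, edit, re-zip round trip described above can be sketched roughly as follows, using only the Python standard library. This is an illustration under assumptions, not the Actor's implementation: the worksheet path `xl/worksheets/sheet1.xml` and the helper name `append_row` are hypothetical, cells are written as inline strings, and the sketch does not update cell references or the sheet's dimension metadata. Real workbooks that use additional XML namespaces may need more careful serialization.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List

# SpreadsheetML main namespace used by worksheet XML parts.
NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
# Assumed location of the first worksheet inside the archive.
SHEET_PATH = "xl/worksheets/sheet1.xml"


def append_row(xlsx_bytes: bytes, values: List[str]) -> bytes:
    """Return a new XLSX archive with one extra row of inline-string cells."""
    ET.register_namespace("", NS)  # keep the default namespace on output
    out_buf = io.BytesIO()
    src = zipfile.ZipFile(io.BytesIO(xlsx_bytes))
    with src, zipfile.ZipFile(out_buf, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == SHEET_PATH:
                root = ET.fromstring(data)
                sheet_data = root.find(f"{{{NS}}}sheetData")
                if sheet_data is None:
                    raise ValueError("worksheet has no sheetData element")
                row = ET.SubElement(sheet_data, f"{{{NS}}}row")
                for value in values:
                    cell = ET.SubElement(row, f"{{{NS}}}c", {"t": "inlineStr"})
                    inline = ET.SubElement(cell, f"{{{NS}}}is")
                    ET.SubElement(inline, f"{{{NS}}}t").text = value
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            # Every other archive member is copied through unchanged.
            dst.writestr(info.filename, data)
    return out_buf.getvalue()
```

Note that the whole archive is read and rewritten for every append, which is why doing this per batch for a large dataset is much more work than appending to a JSON or HTML stream.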

mvolfik

Added simple XLSX support in version 0.0.15.

Developer
Maintained by Community
Actor metrics
  • 3 monthly users
  • 1 star
  • 99.8% runs succeeded
  • 18 hours response time
  • Created in Jun 2023
  • Modified 2 days ago