#106 datashard: Add gzip compression as default for parquet files

closed medium Created 2025-11-27 00:01 · Updated 2025-11-27 00:12

Description

The DataShard library (../datashard/, located at ~/develop/datashard/) should use gzip compression by default when writing parquet files. This will reduce storage costs and improve I/O performance for the task_logs and workflow_logs tables. Use the Python lz4 package (`pip install lz4`) for this; it must be one of the library's main dependencies. The implementation should be transparent: compression is handled internally during reads and writes, and users should not notice it. Backward compatibility is not needed, so the existing data can be deleted before testing. Snapshots and metadata will point to the new data and only time travel will break, which is acceptable since the old data can be deleted anyway. Also fix the tests in ~/develop/datashard/ to support this; they should need no compression-specific handling, because it all happens internally on read and write.
