https://github.com/creatingnull/squash-pickle
- Host: GitHub
- URL: https://github.com/creatingnull/squash-pickle
- Owner: CreatingNull
- License: mit
- Created: 2024-09-21T22:54:07.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-25T20:16:47.000Z (about 1 year ago)
- Last Synced: 2024-11-29T03:08:21.364Z (about 1 year ago)
- Language: Python
- Size: 16.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGES.rst
- License: LICENSE.md
README
# Squash Pickle
[License](https://github.com/CreatingNull/Squash-Pickle/blob/master/LICENSE.md)
[PyPI](https://pypi.org/project/squashpickle/)
[Release history](https://pypi.org/project/squashpickle/#history)
[pre-commit.ci](https://results.pre-commit.ci/latest/github/CreatingNull/Squash-Pickle/main)
[Tests](https://github.com/CreatingNull/squash-pickle/actions/workflows/run-tests.yaml)
Like a pickle, only smaller\*.
Tiny Python package that compresses your pickles using gzip.
Quacks like a pickle.
\* For small objects (< 100 bytes), gzip overhead can end up increasing the size.
Only squash your pickles when you are working with big objects.
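Here is a quick illustration of why (the exact byte counts vary with Python and pickle protocol versions, so treat the numbers as indicative only):

```python
import pickle

import squashpickle

# A tiny payload: the fixed gzip header/trailer outweighs any savings.
small = {"id": 1}
print(len(pickle.dumps(small)))        # a handful of bytes uncompressed
print(len(squashpickle.dumps(small)))  # typically larger once gzip framing is added
```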
______________________________________________________________________
## Getting Started
First install the package; it has no additional dependencies:
```shell
pip install squashpickle
```
Then simply replace your `pickle` calls with `squashpickle` ones.
`squashpickle` implements the `dump`, `dumps`, `load`, and `loads` functions.
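For example, an in-memory and a file round trip look just like the stdlib equivalents (the `data.pkl.gz` filename below is only a convention, not something the package requires):

```python
import squashpickle

data = {"name": "pickle", "values": list(range(1000))}

# In-memory round trip, mirroring pickle.dumps / pickle.loads.
blob = squashpickle.dumps(data)
assert squashpickle.loads(blob) == data

# File round trip, mirroring pickle.dump / pickle.load.
with open("data.pkl.gz", "wb") as fh:
    squashpickle.dump(data, fh)
with open("data.pkl.gz", "rb") as fh:
    assert squashpickle.load(fh) == data
```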
______________________________________________________________________
## Performance
The GZIP compression can have a **HUGE** impact on large objects.
Say you are pickling something like a polars / pandas dataframe; these pickles may end up being hundreds of MB.
With squashpickle you can get compression ratios exceeding 10x.
For example, take a large dataframe of Australian weather data.
Using pickle, this object serialises to `37794198` bytes (~37.8 MB).
Dumping the same dataframe with `squashpickle` results in `3370363` bytes (~3.4 MB), around 9% of the original size.
```python
import pickle

import polars as pl
import squashpickle

# Load the example weather dataset, treating "NA" strings as nulls.
df = pl.read_csv(r"C:\temp\weatherAUS.csv", null_values=["NA"])

# Compare serialised sizes: plain pickle vs gzip-squashed pickle.
print(len(pickle.dumps(df)), len(squashpickle.dumps(df)))
```
As with any compression, there is a performance cost to achieving the smaller files.
For objects \<1MB this is hardly noticeable, but for objects of hundreds of MB the delay can be significant.
Whether this is a worthwhile tradeoff will depend on your use case.
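If you want to quantify the tradeoff for your own data, a rough sketch (reusing the weather dataframe from above; absolute timings depend on your hardware and data) is:

```python
import pickle
import time

import polars as pl
import squashpickle

df = pl.read_csv(r"C:\temp\weatherAUS.csv", null_values=["NA"])

# Time and measure each serialiser on the same object.
for name, dumps in (("pickle", pickle.dumps), ("squashpickle", squashpickle.dumps)):
    start = time.perf_counter()
    blob = dumps(df)
    elapsed = time.perf_counter() - start
    print(f"{name}: {len(blob)} bytes in {elapsed:.3f} s")
```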