Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/messa/yalzma
Yet another LZMA Python wrapper - this time with support for LZMA_SYNC_FLUSH
https://github.com/messa/yalzma
compression lzma lzma2 python python3 xz
Last synced: 4 days ago
JSON representation
Yet another LZMA Python wrapper - this time with support for LZMA_SYNC_FLUSH
- Host: GitHub
- URL: https://github.com/messa/yalzma
- Owner: messa
- License: mit
- Created: 2018-12-15T11:12:10.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2018-12-16T15:58:47.000Z (about 6 years ago)
- Last Synced: 2024-12-27T13:12:19.895Z (7 days ago)
- Topics: compression, lzma, lzma2, python, python3, xz
- Language: Python
- Size: 22.5 KB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
Yet another LZMA Python wrapper
===============================This time with support for LZMA_SYNC_FLUSH :tada:
Works directly with liblzma.so via [ctypes](https://docs.python.org/3/library/ctypes.html).
No other dependencies.What is LZMA?
-------------LZMA is a compression algorithm - like gzip or bzip2.
The default configuration (of both Python [lzma module](https://docs.python.org/3/library/lzma.html) and this library)
is to use the LZMA2 filter and XZ container format.
So you can directly save the compressed data in a file with the `.xz` suffix and it will work with many other
programs, utilities and systems. For example `xzcat` or `xzgrep`.Wikipedia: [Lempel–Ziv–Markov chain algorithm](https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Markov_chain_algorithm)
Lot of software uses LZMA compression internally. For example many software package managers.
But why? Python standard library already contains lzma module…
--------------------------------------------------------------Yes, but it does not support the SYNC FLUSH operation.
There is [`LZMACompressor.flush()`](https://docs.python.org/3/library/lzma.html#lzma.LZMACompressor.flush)
but it does something different - it
[finishes the compression process](https://github.com/python/cpython/blob/0353b4eaaf451ad463ce7eb3074f6b62d332f401/Modules/_lzmamodule.c#L568)
and closes the compressor.
It is not possible to compress more data after `flush()`.For some of my use cases I need to use "sync flush". The constant `LZMA_SYNC_FLUSH`
[does not even appear in the CPython source code](https://github.com/python/cpython/search?q=LZMA_SYNC_FLUSH&unscoped_q=LZMA_SYNC_FLUSH).What is LZMA_SYNC_FLUSH?
------------------------From `lzma/base.h` (by Lasse Collin, public domain):
```c
LZMA_SYNC_FLUSH = 1,
/**<
* \brief Make all the input available at output
*
* Normally the encoder introduces some latency.
* LZMA_SYNC_FLUSH forces all the buffered data to be
* available at output without resetting the internal
* state of the encoder. This way it is possible to use
* compressed stream for example for communication over
* network.
*
* Only some filters support LZMA_SYNC_FLUSH. Trying to use
* LZMA_SYNC_FLUSH with filters that don't support it will
* make lzma_code() return LZMA_OPTIONS_ERROR. For example,
* LZMA1 doesn't support LZMA_SYNC_FLUSH but LZMA2 does.
*
* Using LZMA_SYNC_FLUSH very often can dramatically reduce
* the compression ratio. With some filters (for example,
* LZMA2), fine-tuning the compression options may help
* mitigate this problem significantly (for example,
* match finder with LZMA2).
*
* Decoders don't support LZMA_SYNC_FLUSH.
*/
```Installation
------------Tested on [Debian Linux](https://www.debian.org) and macOS.
You need to have liblzma installed - that means there should be a file
`liblzma.so` (for Linux) or `liblzma.dylib` (for macOS) somewhere in a library
directory (`/usr/lib` or similar). Usually it is already installed.Install yalzma from current Github master:
```sh
$ pip install git+https://github.com/messa/yalzma
```Install specific version:
```sh
$ pip install git+https://github.com/messa/[email protected]
```Or add this line to your `requirements.txt`:
```
git+https://github.com/messa/[email protected]#egg=yalzma==0.0.4
```Usage
-----```python
from yalzma import LZMAEncoder
import lzmatext = b'Hello, World!'
enc = LZMAEncoder()
xz_data = enc.run(text)
xz_data += enc.finish()
assert lzma.decompress(xz_data) == text
```Demonstration of the flush functionality:
```python
from io import BytesIOenc = LZMAEncoder()
xz_data = enc.run(b'first line\n')
xz_data += enc.sync_flush()
assert lzma.open(BytesIO(xz_data), mode='rb').readline() == b'first line\n'xz_data += enc.run(b'second line\n')
xz_data += enc.finish()
assert lzma.decompress(xz_data) == b'first line\nsecond line\n'
```