
https://github.com/messa/yalzma

Yet another LZMA Python wrapper - this time with support for LZMA_SYNC_FLUSH

compression lzma lzma2 python python3 xz



Host: GitHub
URL: https://github.com/messa/yalzma
Owner: messa
License: mit
Created: 2018-12-15T11:12:10.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2018-12-16T15:58:47.000Z (about 6 years ago)
Last Synced: 2024-12-27T13:12:19.895Z (7 days ago)
Topics: compression, lzma, lzma2, python, python3, xz
Language: Python
Size: 22.5 KB
Stars: 1
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


README

Yet another LZMA Python wrapper
===============================

This time with support for LZMA_SYNC_FLUSH :tada:

Works directly with liblzma.so via [ctypes](https://docs.python.org/3/library/ctypes.html).

No other dependencies.

What is LZMA?
-------------

LZMA is a compression algorithm - like gzip or bzip2.

The default configuration (of both the Python [lzma module](https://docs.python.org/3/library/lzma.html) and this library) is to use the LZMA2 filter and the XZ container format.

So you can save the compressed data directly in a file with the `.xz` suffix, and it will work with many other programs, utilities and systems, for example `xzcat` or `xzgrep`.
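These defaults can be checked with the standard library alone. The sketch below uses only Python's `lzma` module: it compresses some sample bytes with the default settings and with the same settings spelled out (`FORMAT_XZ` plus a `FILTER_LZMA2` filter at preset 6, the stdlib default preset), checks for the XZ magic bytes, and writes the result to a temporary file named `example.xz` (an arbitrary name chosen for illustration).

```python
import lzma
import os
import tempfile

data = b'Hello, World!\n' * 100

# Default settings: .xz container (FORMAT_XZ) with an LZMA2 filter at preset 6
xz_default = lzma.compress(data)

# The same configuration written out explicitly
xz_explicit = lzma.compress(
    data,
    format=lzma.FORMAT_XZ,
    filters=[{'id': lzma.FILTER_LZMA2, 'preset': 6}],
)

# Every .xz stream begins with these six magic bytes
XZ_MAGIC = b'\xfd7zXZ\x00'
print(xz_default[:6] == XZ_MAGIC, xz_explicit[:6] == XZ_MAGIC)

# Saved under an .xz name, the data is an ordinary .xz file
# (tools such as xzcat should be able to read it as well)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.xz')
    with open(path, 'wb') as f:
        f.write(xz_default)
    with lzma.open(path, 'rb') as f:
        roundtrip = f.read()
```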

Wikipedia: [Lempel–Ziv–Markov chain algorithm](https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Markov_chain_algorithm)

A lot of software uses LZMA compression internally, for example many software package managers.

But why? The Python standard library already contains the lzma module…
----------------------------------------------------------------------

Yes, but it does not support the SYNC FLUSH operation.

There is [`LZMACompressor.flush()`](https://docs.python.org/3/library/lzma.html#lzma.LZMACompressor.flush), but it does something different: it [finishes the compression process](https://github.com/python/cpython/blob/0353b4eaaf451ad463ce7eb3074f6b62d332f401/Modules/_lzmamodule.c#L568) and closes the compressor. It is not possible to compress more data after `flush()`.

For some of my use cases I need a "sync flush". The constant `LZMA_SYNC_FLUSH` [does not even appear in the CPython source code](https://github.com/python/cpython/search?q=LZMA_SYNC_FLUSH&unscoped_q=LZMA_SYNC_FLUSH).
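The behaviour of the standard library's `flush()` is easy to observe. In this small sketch (standard library only), the output of `flush()` is already a complete `.xz` stream, and any further `compress()` call on the same compressor raises `ValueError`.

```python
import lzma

comp = lzma.LZMACompressor()
xz_data = comp.compress(b'first line\n')
xz_data += comp.flush()  # finishes the .xz stream; this is not a sync flush

# What came out is a complete stream that decodes on its own
print(lzma.decompress(xz_data))

# The compressor is closed now: more input is rejected
try:
    comp.compress(b'second line\n')
    can_continue = True
except ValueError as exc:
    can_continue = False
    print('compress() after flush() raised ValueError:', exc)
```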

What is LZMA_SYNC_FLUSH?
------------------------

From `lzma/base.h` (by Lasse Collin, public domain):

```c
        LZMA_SYNC_FLUSH = 1,
                /**<
                 * \brief       Make all the input available at output
                 *
                 * Normally the encoder introduces some latency.
                 * LZMA_SYNC_FLUSH forces all the buffered data to be
                 * available at output without resetting the internal
                 * state of the encoder. This way it is possible to use
                 * compressed stream for example for communication over
                 * network.
                 *
                 * Only some filters support LZMA_SYNC_FLUSH. Trying to use
                 * LZMA_SYNC_FLUSH with filters that don't support it will
                 * make lzma_code() return LZMA_OPTIONS_ERROR. For example,
                 * LZMA1 doesn't support LZMA_SYNC_FLUSH but LZMA2 does.
                 *
                 * Using LZMA_SYNC_FLUSH very often can dramatically reduce
                 * the compression ratio. With some filters (for example,
                 * LZMA2), fine-tuning the compression options may help
                 * mitigate this problem significantly (for example,
                 * match finder with LZMA2).
                 *
                 * Decoders don't support LZMA_SYNC_FLUSH.
                 */
```
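To make the quoted semantics concrete, here is a rough sketch of driving liblzma's encoder through ctypes. It follows the same general approach this README describes, but it is written independently and is not yalzma's code. Assumptions: liblzma can be loaded via `ctypes.util.find_library('lzma')` or as `liblzma.so.5`; the `LzmaStream` structure mirrors `lzma_stream` from `lzma/base.h`; the enum values (`LZMA_OK = 0`, `LZMA_STREAM_END = 1`, `LZMA_RUN = 0`, `LZMA_SYNC_FLUSH = 1`, `LZMA_FINISH = 3`, `LZMA_CHECK_CRC64 = 4`) are taken from liblzma's headers; and `SyncFlushEncoder` is a hypothetical class whose method names merely echo the usage examples in this README.

```python
import ctypes
import ctypes.util
import lzma

# Assumption: liblzma is loadable under one of these names on this system
_lib = ctypes.CDLL(ctypes.util.find_library('lzma') or 'liblzma.so.5')

# C enum values from liblzma's lzma/base.h and lzma/check.h
LZMA_OK, LZMA_STREAM_END = 0, 1
LZMA_RUN, LZMA_SYNC_FLUSH, LZMA_FINISH = 0, 1, 3
LZMA_CHECK_CRC64 = 4


class LzmaStream(ctypes.Structure):
    """ctypes mirror of lzma_stream; reserved fields keep the layout intact."""
    _fields_ = [
        ('next_in', ctypes.c_void_p), ('avail_in', ctypes.c_size_t),
        ('total_in', ctypes.c_uint64),
        ('next_out', ctypes.c_void_p), ('avail_out', ctypes.c_size_t),
        ('total_out', ctypes.c_uint64),
        ('allocator', ctypes.c_void_p), ('internal', ctypes.c_void_p),
        ('reserved_ptr1', ctypes.c_void_p), ('reserved_ptr2', ctypes.c_void_p),
        ('reserved_ptr3', ctypes.c_void_p), ('reserved_ptr4', ctypes.c_void_p),
        ('reserved_int1', ctypes.c_uint64), ('reserved_int2', ctypes.c_uint64),
        ('reserved_int3', ctypes.c_size_t), ('reserved_int4', ctypes.c_size_t),
        ('reserved_enum1', ctypes.c_int), ('reserved_enum2', ctypes.c_int),
    ]


_lib.lzma_easy_encoder.argtypes = [ctypes.POINTER(LzmaStream), ctypes.c_uint32, ctypes.c_int]
_lib.lzma_easy_encoder.restype = ctypes.c_int
_lib.lzma_code.argtypes = [ctypes.POINTER(LzmaStream), ctypes.c_int]
_lib.lzma_code.restype = ctypes.c_int
_lib.lzma_end.argtypes = [ctypes.POINTER(LzmaStream)]
_lib.lzma_end.restype = None


class SyncFlushEncoder:
    """Hypothetical .xz (LZMA2) encoder that can sync-flush mid-stream."""

    def __init__(self, preset=6):
        self._strm = LzmaStream()  # zero-filled, equivalent to LZMA_STREAM_INIT
        ret = _lib.lzma_easy_encoder(ctypes.byref(self._strm), preset, LZMA_CHECK_CRC64)
        if ret != LZMA_OK:
            raise RuntimeError(f'lzma_easy_encoder() returned {ret}')

    def _code(self, data, action):
        in_buf = ctypes.create_string_buffer(data)  # keeps the input alive during the calls
        out_buf = ctypes.create_string_buffer(64 * 1024)
        self._strm.next_in = ctypes.addressof(in_buf)
        self._strm.avail_in = len(data)
        out = bytearray()
        while True:
            self._strm.next_out = ctypes.addressof(out_buf)
            self._strm.avail_out = len(out_buf)
            ret = _lib.lzma_code(ctypes.byref(self._strm), action)
            out += out_buf.raw[:len(out_buf) - self._strm.avail_out]
            if ret == LZMA_STREAM_END:  # the sync flush or finish has completed
                return bytes(out)
            if ret != LZMA_OK:
                raise RuntimeError(f'lzma_code() returned {ret}')
            if action == LZMA_RUN and self._strm.avail_in == 0 and self._strm.avail_out:
                return bytes(out)  # all input consumed; the rest stays buffered

    def run(self, data):
        return self._code(data, LZMA_RUN)

    def sync_flush(self):
        # Emit everything buffered so far without resetting the encoder state
        return self._code(b'', LZMA_SYNC_FLUSH)

    def finish(self):
        try:
            return self._code(b'', LZMA_FINISH)
        finally:
            _lib.lzma_end(ctypes.byref(self._strm))


enc = SyncFlushEncoder()
head = enc.run(b'first line\n') + enc.sync_flush()
dec = lzma.LZMADecompressor()
first = dec.decompress(head)  # decodable before finish() is called
tail = enc.run(b'second line\n') + enc.finish()
second = dec.decompress(tail)
print(first, second)
```

The point to notice is that the first line comes out of a plain `lzma.LZMADecompressor` before `finish()` is called; without the sync flush it would typically still be buffered inside the encoder. The 64 KiB output buffer size is arbitrary: `lzma_code()` is simply called again until it reports `LZMA_STREAM_END` (for flush and finish) or has consumed all input (for `LZMA_RUN`).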

Installation
------------

Tested on [Debian Linux](https://www.debian.org) and macOS.

You need to have liblzma installed, which means there should be a file `liblzma.so` (Linux) or `liblzma.dylib` (macOS) somewhere in a library directory (`/usr/lib` or similar). Usually it is already installed.
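If you are unsure whether liblzma is present, a short ctypes probe can check it. `load_liblzma()` is a hypothetical helper, not part of yalzma, and the fallback names `liblzma.so.5` and `liblzma.dylib` are assumptions about typical Linux and macOS installs; `lzma_version_string()` is a liblzma function that returns the library's version as a C string.

```python
import ctypes
import ctypes.util


def load_liblzma():
    """Try to load liblzma; return None if no candidate name works."""
    # find_library() searches the platform's usual library locations;
    # the two fallbacks are common Linux and macOS file names (assumption).
    for name in (ctypes.util.find_library('lzma'), 'liblzma.so.5', 'liblzma.dylib'):
        if name is None:
            continue
        try:
            return ctypes.CDLL(name)
        except OSError:
            pass
    return None


lib = load_liblzma()
if lib is None:
    print('liblzma not found')
else:
    lib.lzma_version_string.restype = ctypes.c_char_p
    print('liblzma', lib.lzma_version_string().decode())
```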

Install yalzma from the current GitHub master:

```sh
$ pip install git+https://github.com/messa/yalzma
```

Install specific version:

```sh
$ pip install git+https://github.com/messa/[email protected]
```

Or add this line to your `requirements.txt`:

```
git+https://github.com/messa/[email protected]#egg=yalzma==0.0.4
```

Usage
-----

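```python
import lzma

from yalzma import LZMAEncoder

text = b'Hello, World!'

enc = LZMAEncoder()
xz_data = enc.run(text)   # feed input; returns the compressed output produced so far
xz_data += enc.finish()   # end the .xz stream

# The result is a regular .xz stream, readable by the stdlib lzma module
assert lzma.decompress(xz_data) == text
```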

Demonstration of the flush functionality:

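```python
import lzma
from io import BytesIO

from yalzma import LZMAEncoder

enc = LZMAEncoder()
xz_data = enc.run(b'first line\n')
xz_data += enc.sync_flush()  # make all input so far available at the output

# The stream is not finished yet, but the first line can already be decoded
assert lzma.open(BytesIO(xz_data), mode='rb').readline() == b'first line\n'

# The same encoder keeps going after the sync flush
xz_data += enc.run(b'second line\n')
xz_data += enc.finish()

assert lzma.decompress(xz_data) == b'first line\nsecond line\n'
```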