https://github.com/biojulia/wahvectors.jl
Compress bit vectors using the Word Aligned Hybrid method.
- Host: GitHub
- URL: https://github.com/biojulia/wahvectors.jl
- Owner: BioJulia
- License: mit
- Created: 2016-09-30T12:13:31.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2019-02-27T15:39:19.000Z (over 7 years ago)
- Last Synced: 2025-03-20T21:26:35.359Z (about 1 year ago)
- Language: Julia
- Size: 542 KB
- Stars: 2
- Watchers: 9
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.md
- Code of conduct: CODE_OF_CONDUCT.md
README
# WAHVectors.jl
## Description
Compress bit vectors using the Word Aligned Hybrid method.
**This package is currently at a (pre-)alpha stage.**
Bit vectors can be compressed with standard run-length encoding (RLE).
However, if you are using bit vectors to represent some kinds of data, for example genotype data in biology,
bitwise logical operations require that the bits associated with variants be aligned.
This is difficult to ensure with RLE.
The Word Aligned Hybrid (WAH) encoding strategy represents run lengths in words rather than bits.
The differences between the RLE and WAH encoding strategies are explained by [R.M. Layer *et al.* (2016)](http://www.nature.com/nmeth/journal/v13/n1/full/nmeth.3654.html) as follows:
> RLE encodes stretches of identical values (‘runs’) as a new value in which the first bit indicates the run value and the remaining bits give the number of bits in the run.
> WAH is similar to RLE, except that it uses two different types of values.
> The ‘fill’ type encodes runs of identical values, and the ‘literal’ type encodes uncompressed binary.
> This hybrid approach addresses an inefficiency in RLE in which short runs map to larger encoded values.
> The first bit in a WAH value indicates whether it is fill (1) or literal (0).
> For a fill value, the second bit gives the run value and the remaining bits give the run length in words (not bits, like in RLE).
> For a literal value, the remaining bits directly encode the uncompressed input.
> As each WAH-encoded value represents some number of words, and as bitwise logical operations are performed between words, these operations can be performed directly on compressed values.
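
The layout described in the quote can be illustrated with a minimal sketch. It assumes 32-bit words, so each WAH value covers 31 input bits, and it pads a trailing partial group with zeros. The names `pack_group`, `wah_encode` and `wah_decode` are hypothetical and are not the WAHVectors.jl API; the package's actual word size and internals may differ.

```julia
# Illustrative sketch only: hypothetical names, not the WAHVectors.jl API.
# Assumes 32-bit words, so one WAH value covers 31 bits of input.
const PAYLOAD_BITS = 31
const FILL_FLAG    = 0x80000000  # first bit: 1 = fill, 0 = literal
const FILL_VALUE   = 0x40000000  # second bit of a fill: the run value
const LENGTH_MASK  = 0x3fffffff  # remaining 30 bits: run length in words
const ALL_ONES     = 0x7fffffff  # a 31-bit group in which every bit is set

# Pack 31 input bits beginning at `start` into one UInt32 (first bit ends up
# most significant). Positions past the end of the input are padded with zeros.
function pack_group(bits::AbstractVector{Bool}, start::Int)
    word = zero(UInt32)
    for i in 0:(PAYLOAD_BITS - 1)
        idx = start + i
        bit = idx <= length(bits) && bits[idx]
        word = (word << 1) | UInt32(bit)
    end
    return word
end

# Encode a bit vector as a vector of WAH values (fills and literals).
function wah_encode(bits::AbstractVector{Bool})
    words = UInt32[]
    for start in 1:PAYLOAD_BITS:length(bits)
        group = pack_group(bits, start)
        if group == 0 || group == ALL_ONES
            header = FILL_FLAG | (group == ALL_ONES ? FILL_VALUE : zero(UInt32))
            # Extend the previous fill when it has the same run value and room left.
            if !isempty(words) && (words[end] & ~LENGTH_MASK) == header &&
               (words[end] & LENGTH_MASK) < LENGTH_MASK
                words[end] += one(UInt32)
            else
                push!(words, header | one(UInt32))
            end
        else
            push!(words, group)  # literal: the first bit is already 0
        end
    end
    return words
end

# Decode WAH values back into a bit vector of the original length `n`.
function wah_decode(words::Vector{UInt32}, n::Int)
    bits = BitVector()
    for w in words
        if (w & FILL_FLAG) != 0
            nbits = PAYLOAD_BITS * Int(w & LENGTH_MASK)
            append!(bits, (w & FILL_VALUE) != 0 ? trues(nbits) : falses(nbits))
        else
            for i in (PAYLOAD_BITS - 1):-1:0
                push!(bits, ((w >> i) & 0x00000001) == 0x00000001)
            end
        end
    end
    return bits[1:n]
end

# Usage: 62 zeros, the bits 1 0 1, 28 zeros, then 31 ones (four 31-bit groups).
bits  = vcat(falses(62), BitVector([true, false, true]), falses(28), trues(31))
words = wah_encode(bits)
# words == UInt32[0x80000002, 0x50000000, 0xc0000001]
#          zero fill of 2 words, one literal, one-fill of 1 word
```

Because the sketch pads the final group with zeros, the original length has to be kept alongside the encoded words, which is why `wah_decode` takes `n` as an argument.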
## Badges
Get Help: [Gitter chat](https://gitter.im/BioJulia/WAHVectors.jl)
[Documentation (latest)](http://biojulia.github.io/WAHVectors.jl/latest/)
Code Quality: [Travis CI build status](https://travis-ci.org/BioJulia/WAHVectors.jl)
[AppVeyor build status](https://ci.appveyor.com/project/Ward9250/wahvectors-jl)
[Codecov coverage](http://codecov.io/github/BioJulia/WAHVectors.jl?branch=master)