Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/arq5x/grabix

a wee tool for random access into BGZF files.
https://github.com/arq5x/grabix

Last synced: 3 months ago
JSON representation

a wee tool for random access into BGZF files.

Awesome Lists containing this project

README

        

grabix - a wee tool for random access into BGZF files.
======================================================

``grabix`` leverages the fantastic BGZF library in ``samtools`` to provide random access into
text files that have been compressed with ``bgzip``. ``grabix`` creates it's own index (.gbi)
of the bgzipped file. Once indexed, one can extract arbitrary lines from the file with the ``grab`` command. Or choose random lines with the, well, ``random`` command.

There's a ton of room for improvement, but I needed something quickly in support of a side project.

Here's a brief example using the ``simrep.chr1.bed`` file provided in the repository.

# 1. compress the file with bgzip
bgzip simrep.chr1.bed

# 2. create a grabix index of the file.
# creates simrep.chr1.bed.gbi
grabix index simrep.chr1.bed.gz

# 3. now, extract the 100th line in the file.
grabix grab simrep.chr1.bed.gz 100
chr1 401285 401444 trf 218

# 4. extract the 100th through 110th lines in the file.
grabix grab simrep.chr1.bed.gz 100 110
chr1 401285 401444 trf 218
chr1 401573 401748 trf 280
chr1 404661 404707 trf 92
chr1 406202 406274 trf 76
chr1 406227 406286 trf 77
chr1 406776 406819 trf 68
chr1 409821 409866 trf 51
chr1 409865 409900 trf 52
chr1 421245 421285 trf 64
chr1 422395 422435 trf 80
chr1 422560 422588 trf 56

You can also use ``grabix`` to extract random lines from the file

# extract 10 randome lines from the file using reservoir sampling
grabix random simrep.chr1.bed.gz 10

Is a gzipped file bgzipped?

grabix check simrep.chr1.bed.gz