https://github.com/hammerlab/bai-indexer

Build an index for your BAM Index (BAI)
https://github.com/hammerlab/bai-indexer

Last synced: about 1 year ago
JSON representation

Build an index for your BAM Index (BAI)

Host: GitHub
URL: https://github.com/hammerlab/bai-indexer
Owner: hammerlab
License: apache-2.0
Created: 2014-10-24T17:17:36.000Z (over 11 years ago)
Default Branch: master
Last Pushed: 2015-04-14T17:35:57.000Z (about 11 years ago)
Last Synced: 2025-04-16T09:27:33.779Z (about 1 year ago)
Language: Python
Size: 207 KB
Stars: 17
Watchers: 8
Forks: 3
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

          [![Build Status](https://travis-ci.org/hammerlab/bai-indexer.svg?branch=master)](https://travis-ci.org/hammerlab/bai-indexer)

bai-indexer

===========

Build an index for your BAM Index (BAI).

Background

----------

[BAM][1] is a common file format for storing aligned reads from a gene

sequencing machine. These files can get enormous (100+ GB), so it's helpful to

have an index to support fast lookup.

[Samtools][2] defines a file format for a BAM index and provides a simple

command for generating one:

```

samtools index file.bam file.bam.bai

```

Unfortunately, these BAM Index (BAI) files can _also_ grow very large, often to

10 MB or more. When using a genome browser like [IGV][3] or [BioDalliance][4],

loading a large BAI file over a slow network is the unavoidable first step in

displaying alignment tracks.

bai-indexer solves this problem by building an index of your BAM Index. This is

a small JSON file which maps reference ID (i.e. chromosome number) to a byte

range within the BAI file. By loading the BAM index, a viewer can load only the

small subset of the BAM index that it actually needs.

Usage

-----

    pip install bai-indexer

    bai-indexer path/to/file.bam.bai > path/to/file.bam.bai.json

Format

------

The JSON index index looks like this:

```json

{

  "chunks": [

    [8, 716520],

    [716520, 1463832],

    [1463832, 2070072],

    ...

  ],

  "minBlockIndex": 1234

}

```

The first chunk (`[8, 716520]`) specifies the byte range in the BAI file which

describes the first ref (most likely `chr1` for a human genome). This is a

half-open `[start, stop)` interval.

The `minBlockIndex` field specifies the position of the first block in the BAM

file. Everything before this position is headers.

Development

-----------

After setting up a virtualenv, you can get going by running:

```bash

pip install -r requirements.txt

nosetests

```

[1]: https://github.com/samtools/hts-specs

[2]: http://www.htslib.org/

[3]: http://www.broadinstitute.org/igv/

[4]: http://www.biodalliance.org/

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/hammerlab/bai-indexer

Awesome Lists containing this project

README