Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/nvictus/pybbi
Python bindings to UCSC BigWig and BigBed library
https://github.com/nvictus/pybbi
bbi bigbed bigwig bioinformatics cython genomics kent numpy python
Last synced: 6 days ago
JSON representation
Python bindings to UCSC BigWig and BigBed library
- Host: GitHub
- URL: https://github.com/nvictus/pybbi
- Owner: nvictus
- License: mit
- Created: 2016-05-16T19:18:58.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2024-10-18T15:32:33.000Z (19 days ago)
- Last Synced: 2024-10-18T18:19:13.257Z (19 days ago)
- Topics: bbi, bigbed, bigwig, bioinformatics, cython, genomics, kent, numpy, python
- Language: C
- Homepage:
- Size: 34.4 MB
- Stars: 30
- Watchers: 8
- Forks: 3
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
Awesome Lists containing this project
README
# pybbi #
![Build Status](https://github.com/nvictus/pybbi/actions/workflows/ci.yml/badge.svg)
[![DOI](https://zenodo.org/badge/58960207.svg)](https://zenodo.org/doi/10.5281/zenodo.10382980)Python interface to Jim Kent's Big Binary Indexed file (BBI) \[[1](#ref1)\] library from the [UCSC Genome Browser source tree](https://github.com/ucscGenomeBrowser/kent) using Cython.
This provides read-level access to local and remote bigWig and bigBed files but no write capabilitites. The main feature is fast retrieval of range queries into numpy arrays.
## Installation ##
Wheels for `pybbi` are available on PyPI for Python 3.8, 3.9, 3.10, 3.11 on Linux (x86_64 and aarch64) and Mac OSX (x86_64/Intel). Apple Silicon (arm64) wheels will be made available once M1 runners are available in GitHub Actions.
```
$ pip install pybbi
```## API ##
The `bbi.open` function returns a `BBIFile` object.
```
bbi.open(path) -> BBIFile
````path` can be a local file path (bigWig or bigBed) or a URL. `BBIFile` objects are context managers and can be used in a `with` statement to clean up resources without calling `BBIFile.close()`.
```python
>>> with bbi.open('bigWigExample.bw') as f:
... x = f.fetch('chr21', 1000000, 2000000, bins=40)
```### Introspection
```
BBIFile.is_bigwig -> bool
BBIFile.is_bigbed -> bool
BBIFile.chromsizes -> OrderedDict
BBIFile.zooms -> list
BBIFile.info -> dict
BBIFile.schema -> dict
BBIFile.read_autosql() -> str
```Note: `BBIFile.schema['dtypes']` provides numpy data types for the fields in a bigWig or bigBed (matched from the autoSql definition).
### Interval output
The actual interval records in a bigWig or bigBed can be retrieved as a pandas dataframe or as an iterator over records as tuples. The pandas output is parsed according to the file's schema.
```
BBIFile.fetch_intervals(chrom, start, end) -> pandas.DataFrame
BBIFile.fetch_intervals(chrom, start, end, iterator=True) -> interval iterator
```Summary bin records at each zoom level are also accessible.
```
BBIFile.fetch_summaries(chrom, start, end, zoom) -> pandas.DataFrame
```### Array output
Retrieve quantitative signal as an array. The signal of a bigWig file is obtained from its "value" field. The signal of a bigBed file is obtained from the genomic coverage of its intervals.
For a single range query:
```
BBIFile.fetch(chrom, start, end, [bins [, missing [, oob, [, summary]]]]) -> 1D numpy array
```To produce a stacked heatmap from a list of (1) equal-length intervals or (2) arbitrary-length intervals with `bins` specified:
```
BBIFile.stackup(chroms, starts, ends, [bins [, missing [, oob, [, summary]]]]) -> 2D numpy array
```* **Summary** querying is supported by specifying the number of `bins` for coarsening. The `summary` statistic can be one of: 'mean', 'min', 'max', 'cov', 'std', 'or 'sum'. (default = 'mean'). Intervals need not have the same length, in which case the data from each interval will be interpolated to the same number of bins (e.g., gene bodies).
* **Missing** data can be filled with a custom fill value, `missing` (default = 0).
* **Out-of-bounds** ranges (i.e. `start` less than zero or `end` greater than the chromosome length) are permitted because of their utility e.g., for generating vertical heatmap stacks centered at specific genomic features. A separate custom fill value, `oob` can be provided for out-of-bounds positions (default = NaN).
### Function API
The original function-based API is still available:
```python
bbi.is_bbi(path: str) -> bool
bbi.is_bigwig(path: str) -> bool
bbi.is_bigbed(path:str) -> bool
bbi.chromsizes(path: str) -> OrderedDict
bbi.zooms(path: str) -> list
bbi.info(path: str) -> dict
bbi.fetch_intervals(path: str, chrom: str, start: int, end: int, iterator: bool) -> Union[Iterable, pd.DataFrame]
bbi.fetch(path: str, chrom: str, start: int, end: int, [bins: int [, missing: float [, oob: float, [, summary: str]]]]) -> np.array[1, 'float64']
bbi.stackup(path: str, chroms: np.array, starts: np.array, ends: np.array, [bins: int [, missing: float [, oob: float, [, summary: str]]]]) -> np.array[2, 'float64']
```See the docstrings for complete documentation.
## Related projects ##
- [libBigWig](https://github.com/dpryan79/libBigWig): Alternative C library for bigWig and bigBed files by Devon Ryan
- [pyBigWig](https://github.com/dpryan79/pyBigWig): Python bindings for `libBigWig` by the same author
- [bw-python](https://github.com/brentp/bw-python): Alternative Python wrapper to `libBigWig` by Brent Pederson
- [bx-python](https://github.com/bxlab/bx-python): Python bioinformatics library from James Taylor's group that includes tools for bbi files.This library provides bindings to the reference UCSC bbi library code. Check out [@dpryan79](https://github.com/dpryan79)'s [libBigWig](https://github.com/dpryan79/libBigWig) for an alternative and dedicated C library for big binary files. pyBigWig also provides numpy-based retrieval and bigBed support.
## References ##
[1]: http://bioinformatics.oxfordjournals.org/content/26/17/2204.full
## From source ##
If wheels for your platform or Python version aren't available or you want to develop, you'll need to install `pybbi` from source. The source distribution on PyPI ships with (slightly modified) kent utils source, which will compile before the extension module is built.
Requires
- Platform: Linux or Darwin (Windows Subsystem for Linux seems to work too)
- pthreads, zlib, libpng, openssl, make, pkg-config
- Python 3.6+
- `numpy` and `cython`For example, on a fresh Ubuntu instance, you'll need `build-essential`, `make`, `pkg-config`, `zlib1g-dev`, `libssl-dev`, `libpng16-dev`.
On a Centos/RedHat (rpm) system you'll need `gcc`, `make`, `pkg-config`, `zlib-devel`, `openssl-devel`, `libpng-devel`.
On a Mac, you'll need Xcode and to `brew install pkg-config openssl libpng`.
For development, clone the repo and install in editable mode:
```
$ git clone https://github.com/nvictus/pybbi.git
$ cd pybbi
$ pip install -e .
```You can use the `ARCH` environment variable to specify a target architecture or `ARCHFLAGS` on a Mac.
### Notes
Unfortunately, Kent's C source is not well-behaved library code, as it is littered with error calls that call `exit()`. `pybbi` will catch and pre-empt common input errors, but if somehow an internal error does get raised, it will terminate your interpreter instance.