https://github.com/open2c/bioframe
Genomic interval operations on Pandas DataFrames
bioinformatics dataframes genomic-intervals genomic-ranges genomics ngs-analysis numpy pandas python spatial-join
- Host: GitHub
- URL: https://github.com/open2c/bioframe
- Owner: open2c
- License: mit
- Created: 2016-10-03T19:09:54.000Z (over 8 years ago)
- Default Branch: main
- Last Pushed: 2025-03-24T16:29:30.000Z (28 days ago)
- Last Synced: 2025-03-31T02:14:31.081Z (22 days ago)
- Topics: bioinformatics, dataframes, genomic-intervals, genomic-ranges, genomics, ngs-analysis, numpy, pandas, python, spatial-join
- Language: Python
- Size: 3.1 MB
- Stars: 182
- Watchers: 10
- Forks: 36
- Open Issues: 34
Metadata Files:
- Readme: README.md
- Changelog: CHANGES.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Citation: CITATION.cff
# Bioframe: Operations on Genomic Interval Dataframes

Bioframe enables flexible and scalable operations on genomic interval dataframes in Python.
Bioframe is built directly on top of [Pandas](https://pandas.pydata.org/). Bioframe provides:
* A variety of genomic interval operations that work directly on dataframes.
* Operations for special classes of genomic intervals, including chromosome arms and fixed-size bins.
* Conveniences for working with diverse tabular genomic data formats and for loading genome assembly summary information.

For more information, read the [documentation](https://bioframe.readthedocs.io/en/latest/), including the [guide](https://bioframe.readthedocs.io/en/latest/guide-intervalops.html), and the [publication](https://doi.org/10.1093/bioinformatics/btae088).
Bioframe is an Affiliated Project of [NumFOCUS](https://www.numfocus.org).
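
Because bioframe works on ordinary pandas dataframes, an interval table can be built directly with pandas. The sketch below assumes bioframe's default `chrom`, `start`, `end` column names and BED-style 0-based, half-open coordinates; the `looks_like_bedframe` helper is hypothetical and only illustrates the basic shape such a dataframe is expected to have.

```python
import pandas as pd

# One row per genomic interval. 'chrom', 'start', 'end' are the default column
# names; coordinates are assumed 0-based and half-open, as in BED.
df = pd.DataFrame(
    {
        "chrom": ["chr1", "chr1", "chr2"],
        "start": [100, 250, 0],
        "end": [200, 300, 50],
        "name": ["a", "b", "c"],  # extra, non-interval columns are allowed here
    }
)


def looks_like_bedframe(frame: pd.DataFrame) -> bool:
    """Hypothetical helper (not part of bioframe): check that a dataframe has
    the three interval columns and that no interval ends before it starts."""
    if not {"chrom", "start", "end"}.issubset(frame.columns):
        return False
    return bool((frame["start"] <= frame["end"]).all())


print(looks_like_bedframe(df))  # True
```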
## Installation
Bioframe is available on [PyPI](https://pypi.org/project/bioframe/) and [bioconda](https://bioconda.github.io/recipes/bioframe/README.html):
```sh
pip install bioframe
```

## Contributing
Interested in contributing to bioframe? That's great! To get started, check out the [contributing guide](https://github.com/open2c/bioframe/blob/main/CONTRIBUTING.md). Discussions about the project roadmap take place on the [Open2C Slack](https://bit.ly/open2c-slack), where regular developer meetings are also scheduled. Anyone can join and participate!
## Interval operations
Key genomic interval operations in bioframe include:
- `overlap`: Find pairs of overlapping genomic intervals between two dataframes.
- `closest`: For every interval in a dataframe, find the closest intervals in a second dataframe.
- `cluster`: Group overlapping intervals in a dataframe into clusters.
- `complement`: Find genomic intervals that are not covered by any interval from a dataframe.

Bioframe additionally provides functions that are frequently used for genomic interval operations and can be expressed as combinations of these core operations and standard dataframe operations, including `coverage`, `expand`, `merge`, `select`, and `subtract`.
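
The `closest` and `complement` operations in the list above can be pictured with naive pandas sketches. These are illustrative only and are not bioframe's implementation: `naive_closest` and `naive_complement` are hypothetical helpers, coordinates are assumed half-open, the distance between two intervals is taken as the gap between them (0 if they overlap or are book-ended), chromosome lengths for the complement are supplied as a plain `{chrom: length}` dict, and `naive_closest` simply drops intervals whose chromosome has no intervals in the second dataframe.

```python
import pandas as pd


def naive_closest(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Sketch: for each interval in df1, keep the nearest df2 interval on the
    same chromosome. Overlapping or book-ended intervals get distance 0."""
    # Pair every df1 interval with every df2 interval on the same chromosome;
    # the 'index' column records which df1 row each pair came from.
    pairs = df1.reset_index().merge(df2, on="chrom", suffixes=("", "_"))
    gap = pairs[["start", "start_"]].max(axis=1) - pairs[["end", "end_"]].min(axis=1)
    pairs["distance"] = gap.clip(lower=0)
    # Keep the smallest distance per df1 interval (the first one wins on ties).
    nearest = pairs.loc[pairs.groupby("index")["distance"].idxmin()]
    return nearest.drop(columns="index").reset_index(drop=True)


def naive_complement(df: pd.DataFrame, chromsizes: dict) -> pd.DataFrame:
    """Sketch: return the parts of each chromosome [0, size) that no interval
    in df covers."""
    rows = []
    for chrom, size in chromsizes.items():
        sub = df[df["chrom"] == chrom].sort_values("start")
        covered_to = 0  # everything left of this is covered or already reported
        for start, end in zip(sub["start"], sub["end"]):
            if start > covered_to:
                rows.append((chrom, covered_to, int(start)))  # an uncovered gap
            covered_to = max(covered_to, int(end))
        if covered_to < size:
            rows.append((chrom, covered_to, size))  # tail of the chromosome
    return pd.DataFrame(rows, columns=["chrom", "start", "end"])


peaks = pd.DataFrame({"chrom": ["chr1", "chr1"], "start": [10, 70], "end": [20, 80]})
genes = pd.DataFrame({"chrom": ["chr1", "chr1"], "start": [25, 50], "end": [30, 65]})
closest = naive_closest(peaks, genes)
print(closest[["start", "end", "start_", "end_", "distance"]])

covered = pd.DataFrame(
    {"chrom": ["chr1", "chr1", "chr1"], "start": [10, 15, 50], "end": [20, 30, 60]}
)
gaps = naive_complement(covered, {"chr1": 100, "chr2": 40})
print(gaps)
```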
To `overlap` two dataframes, call:
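```python
import bioframe as bf

# df1, df2: dataframes with 'chrom', 'start', 'end' columns
bf.overlap(df1, df2)
```

For two input dataframes with intervals all on the same chromosome, `overlap` returns the pairs of intervals, one from each dataframe, that overlap.

*[Figure: the two input dataframes and the interval pairs returned by `overlap`.]*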
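
To make the semantics concrete, here is a naive pandas sketch of what an overlap query computes on small example dataframes. It is not bioframe's implementation: `naive_overlap` is a hypothetical helper that assumes half-open coordinates, compares every same-chromosome pair (fine for toy data, far too slow at genome scale), and keeps only overlapping pairs, whereas `bioframe.overlap` offers other join types through its `how` argument.

```python
import pandas as pd


def naive_overlap(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Sketch: return every pair of intervals, one from each dataframe, that
    share at least one base. Columns from df2 get a trailing underscore."""
    # Candidate pairs: all same-chromosome combinations (quadratic; toy inputs only).
    pairs = df1.merge(df2, on="chrom", suffixes=("", "_"))
    # Half-open [start, end) intervals overlap iff each starts before the other
    # ends, so book-ended pairs such as [8, 10) and [4, 8) do not count.
    hits = (pairs["start"] < pairs["end_"]) & (pairs["start_"] < pairs["end"])
    return pairs[hits].reset_index(drop=True)


df1 = pd.DataFrame({"chrom": ["chr1"] * 3, "start": [1, 3, 8], "end": [5, 8, 10]})
df2 = pd.DataFrame({"chrom": ["chr1"] * 2, "start": [4, 10], "end": [8, 11]})
result = naive_overlap(df1, df2)
print(result)
```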
To `merge` all overlapping intervals in a dataframe, call:
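```python
import bioframe as bf

# df1: a dataframe with 'chrom', 'start', 'end' columns
bf.merge(df1)
```

For an input dataframe with intervals all on the same chromosome, `merge` returns a new dataframe in which each group of overlapping intervals is combined into a single interval.

*[Figure: the input dataframe and the merged intervals returned by `merge`.]*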
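
Since `merge` can be expressed through `cluster` plus dataframe operations, here is a naive pandas sketch of that decomposition: a sort-and-sweep assigns cluster ids, then a `groupby` collapses each cluster into one interval. `naive_cluster`, `naive_merge`, and the `n_intervals` member count are hypothetical names for illustration, not bioframe's code. In this sketch, with `min_dist=0`, book-ended intervals such as [3, 8) and [8, 10) fall into the same cluster.

```python
import pandas as pd


def naive_cluster(df: pd.DataFrame, min_dist: int = 0) -> pd.DataFrame:
    """Sketch: add a 'cluster' column so that intervals on the same chromosome
    that overlap, or are at most min_dist apart, share a cluster id."""
    out = df.sort_values(["chrom", "start"]).copy()
    ids, current = [], -1
    prev_chrom, reach = None, None  # reach: rightmost end of the current cluster
    for chrom, start, end in zip(out["chrom"], out["start"], out["end"]):
        if chrom != prev_chrom or start > reach + min_dist:
            current += 1  # start a new cluster
            prev_chrom, reach = chrom, end
        else:
            reach = max(reach, end)  # extend the current cluster
        ids.append(current)
    out["cluster"] = ids  # positional assignment matches the sorted row order
    return out


def naive_merge(df: pd.DataFrame, min_dist: int = 0) -> pd.DataFrame:
    """Sketch: collapse each cluster into one interval and count its members."""
    return (
        naive_cluster(df, min_dist)
        .groupby("cluster")
        .agg(
            chrom=("chrom", "first"),
            start=("start", "min"),
            end=("end", "max"),
            n_intervals=("start", "size"),
        )
        .reset_index(drop=True)
    )


intervals = pd.DataFrame(
    {
        "chrom": ["chr1", "chr1", "chr1", "chr1", "chr2"],
        "start": [1, 3, 8, 12, 0],
        "end": [5, 8, 10, 14, 3],
    }
)
merged = naive_merge(intervals)
print(merged)
```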
See the [guide](https://bioframe.readthedocs.io/en/latest/guide-intervalops.html) for visualizations of other interval operations in bioframe.
## File I/O
Bioframe includes utilities for reading genomic file formats into dataframes and vice versa. One handy function is `read_table`, which mirrors pandas' `read_csv`/`read_table` but provides a [`schema`](https://github.com/open2c/bioframe/blob/main/bioframe/io/schemas.py) argument to populate column names for common tabular file formats.
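
Conceptually, a schema maps a format name to a list of column names. The pandas-only sketch below illustrates that idea with a hypothetical `SCHEMAS` dict and `read_table_with_schema` helper; it is not bioframe's code, and bioframe's own schemas (in the linked `schemas.py`) cover more formats, such as `jaspar`.

```python
import io

import pandas as pd

# Hypothetical schema registry for illustration only.
SCHEMAS = {
    "bed3": ["chrom", "start", "end"],
    "bed6": ["chrom", "start", "end", "name", "score", "strand"],
}


def read_table_with_schema(source, schema, **kwargs):
    """Sketch: read a headerless, tab-separated table and name its columns
    from a schema, passing other keyword arguments through to pandas."""
    return pd.read_csv(source, sep="\t", header=None, names=SCHEMAS[schema], **kwargs)


text = "chr1\t10\t20\tpeak1\t5\t+\nchr2\t0\t50\tpeak2\t7\t-\n"
peaks = read_table_with_schema(io.StringIO(text), schema="bed6")
print(peaks.columns.tolist())
```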
```python
import bioframe

# Read JASPAR CTCF motif calls (hg38); skiprows=1 skips the file's first line
jaspar_url = 'http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2022/hg38/MA0139.1.tsv.gz'
ctcf_motif_calls = bioframe.read_table(jaspar_url, schema='jaspar', skiprows=1)
```

## Tutorials
See this [Jupyter notebook](https://github.com/open2c/bioframe/tree/master/docs/tutorials/tutorial_assign_motifs_to_peaks.ipynb) for an example of how to assign transcription factor (TF) motifs to ChIP-seq peaks using bioframe.

## Citing
If you use ***bioframe*** in your work, please cite:
```bibtex
@article{bioframe_2024,
author = {Open2C and Abdennur, Nezar and Fudenberg, Geoffrey and Flyamer, Ilya M and Galitsyna, Aleksandra A and Goloborodko, Anton and Imakaev, Maxim and Venev, Sergey},
doi = {10.1093/bioinformatics/btae088},
journal = {Bioinformatics},
title = {{Bioframe: Operations on Genomic Intervals in Pandas Dataframes}},
year = {2024}
}
```