https://github.com/daler/pybedtools

Python wrapper -- and more -- for BEDTools (bioinformatics tools for "genome arithmetic")

Host: GitHub
URL: https://github.com/daler/pybedtools
Owner: daler
License: other
Created: 2010-05-14T21:09:33.000Z (about 16 years ago)
Default Branch: master
Last Pushed: 2025-03-16T14:28:10.000Z (over 1 year ago)
Last Synced: 2025-04-11T21:13:01.260Z (over 1 year ago)
Language: Python
Homepage: http://daler.github.io/pybedtools
Size: 29 MB
Stars: 318
Watchers: 15
Forks: 106
Open Issues: 18
Metadata Files:
- Readme: README.rst
- License: LICENSE.txt

Awesome Lists containing this project

Awesome-Bioinformatics - pyBedTools - Python wrapper for [bedtools](https://github.com/arq5x/bedtools). [ [paper-2011](https://pubmed.ncbi.nlm.nih.gov/21949271) | [web](http://daler.github.io/pybedtools) ] (Next Generation Sequencing / Python Modules)
awesome-bioinformatics - pyBedTools - Python wrapper for [bedtools](https://github.com/arq5x/bedtools). (Next Generation Sequencing / Python Modules)
awesome-python-fa - pybedtools - Python tools for analyzing genomic data, based on BEDTools. (Biology and Biotechnology / Working with Time and Calendars)

README

          
Overview
--------

.. image:: https://badge.fury.io/py/pybedtools.svg?style=flat
    :target: https://badge.fury.io/py/pybedtools

.. image:: https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg
    :target: https://bioconda.github.io

The BEDTools suite of programs is widely used for genomic interval
manipulation, or "genome algebra". `pybedtools` wraps and extends BEDTools
and offers feature-level manipulations from within Python.

See the full online documentation, including installation instructions, at
https://daler.github.io/pybedtools/.

The GitHub repo is at https://github.com/daler/pybedtools.

Why `pybedtools`?
-----------------

Here is an example that gets the names of genes less than 5 kb away from
intergenic SNPs:

.. code-block:: python

    from pybedtools import BedTool

    snps = BedTool('snps.bed.gz')  # [1]
    genes = BedTool('hg19.gff')    # [1]

    intergenic_snps = snps.subtract(genes)                       # [2]
    nearby = genes.closest(intergenic_snps, d=True, stream=True) # [2, 3]

    for gene in nearby:             # [4]
        if int(gene[-1]) < 5000:    # [4]
            print(gene.name)        # [4]

Useful features shown here include:

* `[1]` support for all BEDTools-supported formats (here, gzipped BED and GFF);
* `[2]` wrapping of all BEDTools programs and arguments (here, `subtract` and
  `closest`, passing the `-d` flag to `closest`);
* `[3]` streaming results (like Unix pipes, here specified by `stream=True`);
* `[4]` iterating over results while accessing feature data by index or by
  attribute (here, `[-1]` and `.name`).
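
To make the logic of the analysis concrete, the following is a minimal
pure-Python sketch of what the two steps compute: drop SNPs that overlap a
gene, then report genes whose nearest remaining SNP is closer than 5 kb. The
`Interval` type, the helper names, and the sample coordinates are hypothetical
and exist only for illustration. This is not how `pybedtools` or BEDTools
implement these operations, and BEDTools' own distance convention may differ
slightly from the simple gap used here.

.. code-block:: python

    from typing import Iterable, Iterator, NamedTuple


    class Interval(NamedTuple):
        """Hypothetical minimal feature with 0-based, half-open coordinates."""
        chrom: str
        start: int
        end: int
        name: str


    def overlaps(a: Interval, b: Interval) -> bool:
        """True if a and b share at least one base on the same chromosome."""
        return a.chrom == b.chrom and a.start < b.end and b.start < a.end


    def gap(a: Interval, b: Interval) -> int:
        """Bases separating a and b; 0 if they overlap or touch."""
        return max(0, a.start - b.end, b.start - a.end)


    def genes_near_snps(genes: Iterable[Interval], snps: Iterable[Interval],
                        max_dist: int = 5000) -> Iterator[str]:
        """Yield names of genes whose nearest intergenic SNP is < max_dist away."""
        genes = list(genes)
        # Keep only SNPs that overlap no gene (the "subtract" step).
        intergenic = [s for s in snps if not any(overlaps(s, g) for g in genes)]
        for gene in genes:
            # Distance to every intergenic SNP on the same chromosome
            # (the "closest" step, done by brute force).
            dists = [gap(gene, s) for s in intergenic if s.chrom == gene.chrom]
            if dists and min(dists) < max_dist:
                yield gene.name


    # Made-up sample data, not taken from any real annotation.
    example_genes = [
        Interval("chr1", 1000, 2000, "geneA"),
        Interval("chr1", 50000, 60000, "geneB"),
        Interval("chr2", 100, 200, "geneC"),
    ]
    example_snps = [
        Interval("chr1", 1500, 1501, "snp1"),  # inside geneA, so not intergenic
        Interval("chr1", 3000, 3001, "snp2"),  # 1000 bases past the end of geneA
        Interval("chr2", 9000, 9001, "snp3"),  # 8800 bases past the end of geneC
    ]

    print(list(genes_near_snps(example_genes, example_snps)))  # ['geneA']

The brute-force loops are quadratic and hold everything in memory. In the
`pybedtools` example this work is delegated to BEDTools, and `stream=True`
lets the results be consumed one feature at a time.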

In contrast, here is the same analysis using shell scripting. Note that this
requires knowledge of Perl, bash, and awk. The run time is identical to the
`pybedtools` version above:

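.. code-block:: bash

    snps=snps.bed.gz
    genes=hg19.gff
    intergenic_snps=/tmp/intergenic_snps

    # Count the columns in the SNP file; GFF always has 9.
    snp_fields=`zcat $snps | awk '(NR == 2){print NF; exit;}'`
    gene_fields=9
    # closestBed -d appends the distance after the fields of both files.
    distance_field=$(($gene_fields + $snp_fields + 1))

    # Keep only SNPs that do not overlap any gene.
    intersectBed -a $snps -b $genes -v > $intergenic_snps

    # Print the attributes column of genes within 5 kb, then extract the name.
    closestBed -a $genes -b $intergenic_snps -d \
    | awk '($'$distance_field' < 5000){print $9;}' \
    | perl -ne 'm/(?:ID|Name|gene_id)=(.*?);/ && print "$1\n"'

    rm $intergenic_snps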
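
The final Perl step pulls a gene identifier out of the GFF attributes column.
As a rough illustration of what that step does, here is a standard-library
Python sketch. The function name `gene_label` and the sample attribute
strings are hypothetical, and unlike the Perl one-liner this version does not
require a trailing semicolon after the value.

.. code-block:: python

    import re
    from typing import Optional

    # Match ID=..., Name=... or gene_id=... at the start of the string or
    # right after a semicolon, capturing the value up to the next semicolon.
    _ATTR_RE = re.compile(r"(?:^|;)\s*(ID|Name|gene_id)=([^;]*)")


    def gene_label(attributes: str) -> Optional[str]:
        """Return the first ID, Name or gene_id value, or None if absent."""
        match = _ATTR_RE.search(attributes)
        return match.group(2) if match else None


    # Made-up attribute strings for illustration only.
    print(gene_label("ID=gene1;Name=BRCA2;biotype=protein_coding"))  # gene1
    print(gene_label("biotype=protein_coding;Name=TP53"))            # TP53
    print(gene_label("biotype=protein_coding"))                      # None

In the `pybedtools` version none of this parsing is written by hand, since
the example simply reads `gene.name`.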

See the "Shell script comparison" page in the docs for more details on this
comparison, or keep reading the full documentation at
http://daler.github.io/pybedtools.
