https://github.com/owkin/PyDESeq2

A Python implementation of the DESeq2 pipeline for bulk RNA-seq DEA.

bioinformatics differential-expression python rna-seq transcriptomics

Host: GitHub
URL: https://github.com/owkin/PyDESeq2
Owner: owkin
License: mit
Created: 2022-11-22T17:14:11.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-10-29T10:49:43.000Z (6 months ago)
Last Synced: 2024-10-29T11:02:38.213Z (6 months ago)
Topics: bioinformatics, differential-expression, python, rna-seq, transcriptomics
Language: Python
Homepage: https://pydeseq2.readthedocs.io/en/latest/
Size: 1.32 MB
Stars: 584
Watchers: 11
Forks: 61
Open Issues: 44
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Codeowners: .github/CODEOWNERS

# PyDESeq2

[![pypi version](https://img.shields.io/pypi/v/pydeseq2)](https://pypi.org/project/pydeseq2)

[![pypiDownloads](https://static.pepy.tech/badge/pydeseq2)](https://pepy.tech/project/pydeseq2)

[![condaDownloads](https://img.shields.io/conda/dn/bioconda/pydeseq2?logo=Anaconda)](https://anaconda.org/bioconda/pydeseq2)

[![license](https://img.shields.io/pypi/l/pydeseq2)](LICENSE)

PyDESeq2 is a Python implementation of the [DESeq2](https://bioconductor.org/packages/release/bioc/html/DESeq2.html) method [1] for differential expression analysis (DEA) with bulk RNA-seq data, originally implemented in R.
It aims to facilitate DEA experiments for Python users.

As PyDESeq2 is a re-implementation of [DESeq2](https://bioconductor.org/packages/release/bioc/html/DESeq2.html) from scratch, its results and available features may differ from those of the R package.

Currently, available features broadly correspond to the default settings of DESeq2 (v1.34.0) for single-factor and multi-factor analysis (with categorical or continuous factors) using Wald tests.

We plan to implement more in the future.
If there is a feature you would particularly like to see implemented, feel free to open an issue.

## Table of Contents

- [PyDESeq2](#pydeseq2)

  - [Table of Contents](#table-of-contents)

  - [Installation](#installation)

    - [Requirements](#requirements)

  - [Getting started](#getting-started)

    - [Documentation](#documentation)

    - [Data](#data)

  - [Contributing](#contributing)

    - [1 - Download the repository](#1---download-the-repository)

    - [2 - Create a conda environment](#2---create-a-conda-environment)

  - [Development roadmap](#development-roadmap)

  - [Citing this work](#citing-this-work)

  - [References](#references)

  - [License](#license)

## Installation

### PyPI

`PyDESeq2` can be installed from PyPI using `pip`:

`pip install pydeseq2`

We recommend installing within a conda environment:

```
conda create -n pydeseq2
conda activate pydeseq2
conda install pip
pip install pydeseq2
```

### Bioconda

`PyDESeq2` can also be installed from Bioconda with `conda`:

`conda install -c bioconda pydeseq2`

If you're interested in contributing or want access to the development version, please see the [contributing](#contributing) section.

### Requirements

The list of package version requirements is available in `setup.py`.

For reference, the code is tested in a GitHub workflow (CI) with Python 3.9 to 3.11 and the following package versions:

```
- anndata 0.8.0
- numpy 1.23.0
- pandas 1.4.3
- scikit-learn 1.1.1
- scipy 1.11.0
```
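To compare a local environment against the versions listed above, a small standard-library script can help. This is only a sketch: the helper names `installed_versions` and `format_report` are made up for illustration, and the listed versions are treated as the CI reference set, not as strict minimum requirements.

```python
from importlib import metadata

# Versions listed in this README as the CI reference set.
CI_VERSIONS = {
    "anndata": "0.8.0",
    "numpy": "1.23.0",
    "pandas": "1.4.3",
    "scikit-learn": "1.1.1",
    "scipy": "1.11.0",
}


def installed_versions(packages):
    """Return {package: installed version, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def format_report(reference):
    """Build one human-readable line per package in `reference`."""
    lines = []
    for name, installed in installed_versions(reference).items():
        status = installed if installed is not None else "not installed"
        lines.append(f"{name}: installed {status}, CI reference {reference[name]}")
    return lines


if __name__ == "__main__":
    print("\n".join(format_report(CI_VERSIONS)))
```

Running the script prints one line per package, which can be pasted into an issue when reporting a suspected version incompatibility.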

Please don't hesitate to open an issue if you run into problems caused by deprecations in newer versions of these packages.

## Getting started

The [Getting Started](https://pydeseq2.readthedocs.io/en/latest/auto_examples/index.html) section of the documentation contains downloadable examples showing how to use PyDESeq2.
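As a rough orientation, a single-factor analysis on the synthetic dataset might look like the sketch below. It is not a substitute for the documentation examples. The function name `run_synthetic_example` is hypothetical, and the sketch assumes a release where the sample table is passed as `metadata`, where `design_factors` is still accepted, and where the synthetic data has a `condition` column with levels `A` and `B`. Argument names have changed between releases, so check the documentation for your installed version.

```python
def run_synthetic_example():
    """Run a single-factor Wald test on the synthetic example dataset."""
    # Imported inside the function so this file can be loaded even when
    # PyDESeq2 is not installed yet.
    from pydeseq2.dds import DeseqDataSet
    from pydeseq2.ds import DeseqStats
    from pydeseq2.utils import load_example_data

    # Synthetic read counts (samples x genes) and the matching sample metadata.
    counts = load_example_data(modality="raw_counts", dataset="synthetic", debug=False)
    metadata = load_example_data(modality="metadata", dataset="synthetic", debug=False)

    # Assumption: `design_factors` is accepted by the installed release;
    # newer releases may prefer a formula-style `design` argument instead.
    dds = DeseqDataSet(counts=counts, metadata=metadata, design_factors="condition")
    dds.deseq2()  # fit size factors, dispersions and log-fold changes

    # Compare level "B" against level "A" of the "condition" column.
    stats = DeseqStats(dds, contrast=["condition", "B", "A"])
    stats.summary()  # runs the Wald tests and fills `results_df`
    return stats.results_df
```

Calling `run_synthetic_example()` should return a pandas DataFrame with one row per gene, including columns such as `log2FoldChange` and `padj`.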

### Documentation

The documentation is hosted [here on ReadTheDocs](https://pydeseq2.readthedocs.io/en/latest/). 

If you want to have the latest version of the documentation, you can build it from source.

Please go to the dedicated [README.md](https://github.com/owkin/PyDESeq2/blob/main/docs/README.md) for information on how to do so.

### Data

The quick start examples use synthetic data, provided in this repo (see [datasets](https://github.com/owkin/PyDESeq2/blob/main/datasets/README.md)).

The experiments described in the [PyDESeq2 article](https://academic.oup.com/bioinformatics/article/39/9/btad547/7260507) rely on data from [The Cancer Genome Atlas](https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga), which may be obtained from this [portal](https://portal.gdc.cancer.gov/).

## Contributing

Please see the [Contributing](https://pydeseq2.readthedocs.io/en/latest/usage/contributing.html) section of the documentation to learn how you can contribute to PyDESeq2.

### 1 - Download the repository

`git clone https://github.com/owkin/PyDESeq2.git`

### 2 - Create a conda environment

Run `conda create -n pydeseq2 python=3.9` (or a higher Python version) to create the `pydeseq2` environment, then activate it with `conda activate pydeseq2`.

`cd` to the root of the repo and run `pip install -e ."[dev]"` to install in developer mode.

Then, run `pre-commit install`.

The `pre-commit` tool will automatically run [ruff](https://docs.astral.sh/ruff/), [black](https://black.readthedocs.io/en/stable/), and [mypy](https://mypy.readthedocs.io/en/stable/).

PyDESeq2 is a living project and any contributions are welcome! Feel free to open new PRs or issues.

## Development Roadmap

Here are some of the features and improvements we plan to implement in the future:

- [x] Integration to the [scverse](https://scverse.org/) ecosystem:

  * [x] Refactoring to use the [AnnData](https://anndata.readthedocs.io/) data structure

  * [x] Submitting a PR to be listed as an [scverse ecosystem](https://github.com/scverse/ecosystem-packages/) package

- [x] Variance-stabilizing transformation

- [ ] Improving multi-factor analysis:

  * [x] Allowing n-level factors

  * [x] Support for continuous covariates

  * [ ] Implementing interaction terms

## Citing this work

```
@article{muzellec2023pydeseq2,
  title={PyDESeq2: a python package for bulk RNA-seq differential expression analysis},
  author={Muzellec, Boris and Telenczuk, Maria and Cabeli, Vincent and Andreux, Mathieu},
  year={2023},
  doi={10.1093/bioinformatics/btad547},
  journal={Bioinformatics},
}
```

## References

[1] Love, M. I., Huber, W., & Anders, S. (2014). "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2." Genome Biology, 15(12), 1-21.

[2] Zhu, A., Ibrahim, J. G., & Love, M. I. (2019). "Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences." Bioinformatics, 35(12), 2084-2092.

## License

PyDESeq2 is released under an [MIT license](https://github.com/owkin/PyDESeq2/blob/main/LICENSE).
