Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/lucacappelletti94/crr_labels

Python package wrapping over FANTOM and Roadmap labels for cis regulatory regions.
https://github.com/lucacappelletti94/crr_labels

Last synced: 9 days ago
JSON representation

Python package wrapping over FANTOM and Roadmap labels for cis regulatory regions.

Awesome Lists containing this project

README

        

crr_labels
=========================================================================================
|travis| |sonar_quality| |sonar_maintainability| |codacy| |code_climate_maintainability| |pip| |downloads|

Python package wrapping over FANTOM and Roadmap labels for cis-regulatory regions.

How do I install this package?
----------------------------------------------
As usual, just download it using pip:

.. code:: shell

pip install crr_labels

Tests Coverage
----------------------------------------------
Since some software handling coverages sometimes get slightly different results, here's three of them:

|coveralls| |sonar_coverage| |code_climate_coverage|

Usage examples
-----------------------------------------------
Currently, we support `FANTOM CAGE data `_ and `Roadmap `_ but in the future an additional
cis-regulatory dataset based on open chromatin data will be added.

FANTOM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To retrieve the FANTOM promoters and enhancers you can proceed as follows:

.. code:: python

from crr_labels import fantom

enhancers, promoters = fantom(
cell_lines=["HelaS3", "GM12878"], # list of cell lines to be considered.
window_size=200, # window size to use for the various regions.
genome = "hg19", # considered genome version. Currently supported only "hg19".
center_enhancers = "peak", # how to center the enhancer window, either around "peak" or the "center" of the region.
enhancers_threshold = 0, # activation threshold for the enhancers.
promoters_threshold = 5, # activation threshold for the promoters.
drop_always_inactive_rows = True, # whether to drop the rows where no activation is detected for every row.
binarize = True, # whether to return the data binary-encoded, zero for inactive, one for active.
nrows = None # the number of rows to read, useful when testing pipelines for creating smaller datasets.
)

The library will download and parse the fantom project raw data and return two DataFrames for the required cell lines.
Consider reading the method docstring for more in-depth information about the method.

The main steps are the following:

- The raw files are retrieved from the fantom dataset from the link specified in the `fantom_data.json file `_
- The window for the enhancers and promoters are expanded or compressed to the given window size. In particular:

- The enhancers' window can either be centered on the region center with the "center" mode or around the "peak" with the "peak" mode.
- The promoters' window is upstream in the positive strand from the end of the promoter and downstream on the negative strand from the start of the promoter.
- When multiple experiments are present for a cell line, for instance for "HelaS3", an average of the activation peaks is executed.
- Optionally (and by default) the rows that are always inactive for the chosen cell lines are dropped. You can specify this behaviour using the parameter "drop_always_inactive_rows".

Roadmap
~~~~~~~~~~~~~~~~~~~~~~~~~~~
To retrieve the Roadmap promoters and enhancers you can proceed as follows:

.. code:: python

from crr_labels import roadmap

enhancers, promoters = roadmap(
cell_lines = ["HelaS3", "GM12878"], # List of cell lines to be considered.
window_size = 200, # Window size to use for the various regions.
genome = "hg19", # Considered genome version. Currently supported only "hg19".
states = 18, # Number of the states of the model to consider. Currently supported only "15" and "18".
enhancers_labels = ("7_Enh", "9_EnhA1", "10_EnhA2"), # Labels to encode as active enhancers.
promoters_labels = ("1_TssA",), # Labels to enode as active promoters.
nrows = None # the number of rows to read, useful when testing pipelines for creating smaller datasets.
)

Consider reading the method docstring for more in-depth information about the method.

Rendered datasets
----------------------------------
The following two datasets have labels for 7 common cell lines (GM12878, HelaS3, HepG2, K562, A549, H1, H9) and for various other that were not available in the other dataset.

FANTOM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The following datasets contain data for the cell lines GM12878, HelaS3, HepG2, K562, A549, H1, H9, JURKAT, MCF7, HEK293, Caco2, HL60 and PC3.

TODO: Render the datasets

Roadmap
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The following datasets contain data for the cell lines GM12878, HelaS3, HepG2, K562, A549, H1, H9, DND41, HUES48, HUES6, HUES64 and IMR90.

TODO: The updated processed labels will be added as soon as we decide on the new states to be used.

.. |travis| image:: https://travis-ci.org/LucaCappelletti94/crr_labels.png
:target: https://travis-ci.org/LucaCappelletti94/crr_labels
:alt: Travis CI build

.. |sonar_quality| image:: https://sonarcloud.io/api/project_badges/measure?project=LucaCappelletti94_crr_labels&metric=alert_status
:target: https://sonarcloud.io/dashboard/index/LucaCappelletti94_crr_labels
:alt: SonarCloud Quality

.. |sonar_maintainability| image:: https://sonarcloud.io/api/project_badges/measure?project=LucaCappelletti94_crr_labels&metric=sqale_rating
:target: https://sonarcloud.io/dashboard/index/LucaCappelletti94_crr_labels
:alt: SonarCloud Maintainability

.. |sonar_coverage| image:: https://sonarcloud.io/api/project_badges/measure?project=LucaCappelletti94_crr_labels&metric=coverage
:target: https://sonarcloud.io/dashboard/index/LucaCappelletti94_crr_labels
:alt: SonarCloud Coverage

.. |coveralls| image:: https://coveralls.io/repos/github/LucaCappelletti94/crr_labels/badge.svg?branch=master
:target: https://coveralls.io/github/LucaCappelletti94/crr_labels?branch=master
:alt: Coveralls Coverage

.. |pip| image:: https://badge.fury.io/py/crr-labels.svg
:target: https://badge.fury.io/py/crr-labels
:alt: Pypi project

.. |downloads| image:: https://pepy.tech/badge/crr-labels
:target: https://pepy.tech/badge/crr-labels
:alt: Pypi total project downloads

.. |codacy| image:: https://api.codacy.com/project/badge/Grade/c0a7e110045a4d25933c65fe2014a33c
:target: https://www.codacy.com/manual/LucaCappelletti94/crr_labels?utm_source=github.com&utm_medium=referral&utm_content=LucaCappelletti94/crr_labels&utm_campaign=Badge_Grade
:alt: Codacy Maintainability

.. |code_climate_maintainability| image:: https://api.codeclimate.com/v1/badges/7c18ec5176f2ebebef96/maintainability
:target: https://codeclimate.com/github/LucaCappelletti94/crr_labels/maintainability
:alt: Maintainability

.. |code_climate_coverage| image:: https://api.codeclimate.com/v1/badges/7c18ec5176f2ebebef96/test_coverage
:target: https://codeclimate.com/github/LucaCappelletti94/crr_labels/test_coverage
:alt: Code Climate Coverate