# Easy Copy Number Analysis (EaCoN) Pipeline

This pipeline has been adapted from https://github.com/gustaveroussy/EaCoN.
It leverages the `EaCoN` R package to conduct preprocessing, normalization,
segmentation and copy number estimation from supported microarray .CEL files.
The `EaCoN` package supports copy number estimation for CytoScan, OncoScan
and SNP6 arrays as well as WES data; however, the current pipeline only
implements support for microarray data.

## Snakemake

This pipeline uses the `snakemake` Python package for workflow management.
As a result, the pipeline and its dependencies are easily
installed from this repository, allowing quick setup, configuration and
deployment.

For more information on Snakemake, please see:
https://snakemake.readthedocs.io/en/stable/.

## Software Environment

Dependency management for this pipeline is handled via `conda` for Python
and `renv` for R. To get started with setup, you can install
miniconda3 using the instructions available here: https://docs.conda.io/en/latest/miniconda.html.

Alternatively, you can install R directly from CRAN
as described here: https://cran.r-project.org/.

## Setting Up Your Software Environment

The first step to deploying an analysis pipeline is to install the various
software packages it depends on. We have included the `env/eacon.yml` and
`renv.lock` files here to easily accomplish this.

All commands should be executed from the top level directory of this
repository unless otherwise indicated.

### Python and Snakemake

Start by installing Python, Snakemake and Singularity via `conda`. The
included `env/eacon.yml` file specifies all the requisite dependencies to use
`snakemake` for this pipeline, including R.

You can use `conda` to install all Python and OS system dependencies
using:

`conda env create --file env/eacon.yml`

This will take some time to run as it gathers and installs the correct
package versions. The environment it creates should be called `eacon`.

If it is not automatically activated after installation please run
`conda activate eacon` before proceeding to the next step.
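
Once the environment is active, a quick sanity check confirms the core tools
are available:

```
snakemake --version
R --version
```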

### R Dependencies

R dependencies are handled via `renv` and all rules in this
pipeline will use the local R package cache stored in the `renv` directory.

To install all R dependencies run:

`Rscript -e 'library(renv); renv::init();'`

If you wish to isolate the R dependencies from the R libraries in your conda
environment, you can use this command instead:

`Rscript -e 'library(renv); renv::isolate(); renv::init(bare=TRUE)'`

If initialization doesn't trigger dependency installation, you can do so manually using:

`Rscript -e 'renv::restore()'`
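
You can confirm that the project library and `renv.lock` are in sync at any
time with:

```
Rscript -e 'renv::status()'
```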

For more information on `renv` and how it can be used to manage dependencies in
your project, please see: https://rstudio.github.io/renv/articles/renv.html.

## Configuring the Pipeline

This pipeline assumes the following directory structure:

```
.
├── env
├── metadata
├── procdata
├── rawdata
├── renv
├── results
└── scripts
```

Please at minimum create the `rawdata` and `metadata` directories, as they are assumed to hold the raw microarray plate data (.CEL files) and the pairs file, respectively. For more information on the correct formatting of your pairs file, please see https://github.com/gustaveroussy/EaCoN.
The remaining directories will be created automatically as the pipeline runs.
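
For example, from the top level of the repository:

```
mkdir -p rawdata metadata
```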

### config.yaml

This file holds the pipeline configuration. Here you can specify the paths
and parameters for your current pipeline use case. Documentation is provided
in the `config.yaml` file on what each field should contain.

## Using the Pipeline

### Deployment

To run the pipeline end-to-end:
```
snakemake --cores <n> --use-conda
```
Where `<n>` is the number of cores to parallelize over.
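
To preview the jobs Snakemake would execute without running them, add the
`--dry-run` flag:

```
snakemake --cores 1 --use-conda --dry-run
```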

For details on deploying this pipeline via your local HPC cluster or in
the cloud, please consult the Snakemake documentation. Deployment to these
platforms requires minimal additional configuration.

### Individual Rules

The pipeline can also be run rule by rule using the rule names.
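
Snakemake's `--until` flag can also be used to run the pipeline only up to a
given rule, for example:

```
snakemake --cores 2 --use-conda --until segment_processed_data
```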

#### Batch Processing and Normalization

`snakemake --cores 2 --use-conda batch_process_rawdata`

#### Segmentation

`snakemake --cores 2 --use-conda segment_processed_data`

#### Copy Number Calling

`snakemake --cores 2 --use-conda estimate_copy_number`

#### Determine Optimal Value for Gamma

`snakemake --cores 2 --use-conda select_optimal_gamma`

#### Build Bioconductor RaggedExperiment Object

`snakemake --cores 2 --use-conda build_ragged_experiment`

#### Filter Samples Based on QC Criteria

`snakemake --cores 2 --use-conda sample_quality_control`