https://github.com/bhklab/eacon_cnv_pipeline
- Host: GitHub
- URL: https://github.com/bhklab/eacon_cnv_pipeline
- Owner: bhklab
- Created: 2021-10-28T21:59:10.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2022-09-10T19:48:35.000Z (over 2 years ago)
- Language: R
- Size: 129 KB
- Stars: 3
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Easy Copy Number Analysis (EaCoN) Pipeline
This pipeline has been adapted from https://github.com/gustaveroussy/EaCoN.
It leverages the `EaCoN` R package to conduct preprocessing, normalization,
segmentation and copy number estimation from Affymetrix microarray .CEL files.
The `EaCoN` package supports copy number estimation from CytoScan, OncoScan
and SNP6 arrays, as well as WES data; however, the current pipeline only
implements support for microarray data.

## Snakemake
This pipeline leverages the `snakemake` Python package for workflow management.
As a result, the pipeline and its dependencies are easily installed from this
repository, allowing quick setup, configuration and deployment.

For more information on Snakemake, please see:
https://snakemake.readthedocs.io/en/stable/.

## Software Environment
Dependency management for this pipeline is handled via `conda` for Python and
`renv` for R. To get started with setup, you can install miniconda3 using the
instructions available here: https://docs.conda.io/en/latest/miniconda.html.
Alternatively, you can install R directly from CRAN as described here:
https://cran.r-project.org/.

## Setting Up Your Software Environment
The first step to deploying an analysis pipeline is to install the various
software packages it depends on. We have included the `env/eacon.yml` and
`renv.lock` files here to easily accomplish this.

All commands should be executed from the top-level directory of this
repository unless otherwise indicated.

### Python and Snakemake
Python, Snakemake and Singularity are installed via `conda`. We have included
the `env/eacon.yml` file, which specifies all the requisite dependencies to
use `snakemake` for this pipeline, including R.

You can use `conda` to install all Python and OS-level dependencies with:

`conda env create --file env/eacon.yml`
This will take some time to run as it gathers and installs the correct
package versions. The environment it creates should be called `eacon`.

If it is not automatically activated after installation, please run
`conda activate eacon` before proceeding to the next step.
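For example, a typical first-time setup might look like this (a sketch; the
final command is just a quick check that `snakemake` is available in the new
environment):

```
conda env create --file env/eacon.yml
conda activate eacon
snakemake --version
```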
### R Dependencies
R dependencies are handled via `renv`, and all rules in this pipeline will
use the local R package cache stored in the `renv` directory.

To install all R dependencies, run:
`Rscript -e 'library(renv); renv::init();'`
If you wish to isolate the R dependencies from the R libraries in your Conda
environment, you can use this command instead:

`Rscript -e 'library(renv); renv::isolate(); renv::init(bare=TRUE)'`
If initialization doesn't trigger dependency installation, you can do so manually using:
`Rscript -e 'renv::restore()'`
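After restoring, you can verify that the project library matches the lockfile
using `renv`'s built-in status report:

`Rscript -e 'renv::status()'`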
For more information on `renv` and how it can be used to manage dependencies in
your project, please see: https://rstudio.github.io/renv/articles/renv.html.

## Configuring the Pipeline
This pipeline assumes the following directory structure:
```
.
├── env
├── metadata
├── procdata
├── rawdata
├── renv
├── results
└── scripts
```

Please at minimum create the `rawdata` and `metadata` directories, as they are
assumed to hold the raw microarray plate data (.CEL files) and the pairs file,
respectively. For more information on the correct formatting of your pairs
file, please see https://github.com/gustaveroussy/EaCoN.
The remaining directories will be created automatically as the pipeline runs;
a minimal setup is sketched below.
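For example, assuming you are in the top-level directory of the repository,
the two required directories can be created with:

```
mkdir -p rawdata metadata
```

Place your .CEL files in `rawdata` and your pairs file in `metadata` before
running the pipeline.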
### config.yaml

This file holds the pipeline configuration. Here you can specify the paths and
parameters for your current pipeline use case. Documentation on what each
field should contain is provided in the `config.yaml` file itself.
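As a rough illustration (the field names below are hypothetical; the
authoritative schema is documented in `config.yaml` itself), a configuration
might look something like:

```
# Hypothetical sketch: consult the comments in config.yaml for the real fields
rawdata: rawdata            # directory containing the raw .CEL files
metadata: metadata          # directory containing the pairs file
procdata: procdata          # intermediate processed data
results: results            # final pipeline outputs
array_type: OncoScan_CNV    # microarray platform to process
nthreads: 2                 # threads available to each rule
```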
## Using the Pipeline

### Deployment
To run the pipeline end-to-end:
```
snakemake --cores <n> --use-conda
```
Where `<n>` is the number of cores to parallelize over.
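Before launching a full run, you can preview the jobs Snakemake will schedule
without executing anything by using its dry-run flag:

`snakemake --cores <n> --use-conda --dry-run`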
For details on deploying this pipeline via your local HPC cluster or in the
cloud, please consult the Snakemake documentation. Deployment to these
platforms requires minimal additional configuration.

### Individual Rules
The pipeline can also be run rule by rule using the rule names.
#### Batch Processing and Normalization
`snakemake --cores 2 --use-conda batch_process_rawdata`
#### Segmentation
`snakemake --cores 2 --use-conda segment_processed_data`
#### Copy Number Calling
`snakemake --cores 2 --use-conda estimate_copy_number`
#### Determine Optimal Value for Gamma
`snakemake --cores 2 --use-conda select_optimal_gamma`
#### Build Bioconductor RaggedExperiment Object
`snakemake --cores 2 --use-conda build_ragged_experiment`
#### Filter Samples Based on QC Criteria
`snakemake --cores 2 --use-conda sample_quality_control`
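To confirm the rule names above, you can ask Snakemake to list all rules
defined in the Snakefile:

`snakemake --list`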