https://github.com/greenelab/pdx_exomeseq

Pipeline analysis for whole exome sequencing of pancreatic cancer PDX models

analysis cancer methodology mutation notebook pancreatic-cancer pdx pipeline sequencing wes

Host: GitHub
URL: https://github.com/greenelab/pdx_exomeseq
Owner: greenelab
License: mit
Created: 2016-08-22T12:26:39.000Z (almost 9 years ago)
Default Branch: master
Last Pushed: 2018-10-22T22:04:30.000Z (over 6 years ago)
Last Synced: 2025-01-13T00:42:06.716Z (5 months ago)
Topics: analysis, cancer, methodology, mutation, notebook, pancreatic-cancer, pdx, pipeline, sequencing, wes
Language: HTML
Homepage:
Size: 133 MB
Stars: 21
Watchers: 6
Forks: 14
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE.md

# Whole Exome Sequencing Pipeline for JAX FNA-PDX models of Pancreatic Cancer

**Gregory Way¹, Casey Greene¹, Yolanda Sanchez²**

1. University of Pennsylvania
2. Geisel School of Medicine at Dartmouth

## Summary

Patient-derived xenograft (PDX) models were derived from primary and metastatic tumors of patients admitted to Dartmouth-Hitchcock Medical Center (DHMC) with pancreatic adenocarcinoma (PAAD).
The PDX models and tumor samples underwent whole exome sequencing (WES) to determine how mutations from primary tissue and metastases propagate and evolve.
This repository outlines the WES and analysis pipelines.

This is a tumor-only analysis; there were no pooled or patient-matched normal samples available.
The following flowchart summarizes the WES pipeline.

![pdx wes flowchart](figures/pdx_wes_flowchart.png?raw=true)

**Figure 1A** describes the technical replicates and data types available across tumor and mouse passages.
**Figure 1B** outlines our whole exome sequencing pipeline. We first apply quality control processing to raw reads, then align and remove mouse reads, and finally call and annotate variants.

## WES Pipeline

See [`wes_pipeline.sh`](https://github.com/greenelab/pdx_exomeseq/blob/master/wes_pipeline.sh) for our current variant-calling pipeline for tumor-only WES.
This script was run step-by-step on the Dartmouth Discovery compute cluster.

### WES Compute Environment

All work was performed on the Dartmouth Discovery compute cluster using the conda environment specified in [`environment.yml`](https://github.com/greenelab/pdx_exomeseq/blob/master/environment.yml).

### Steps to Reproduce

This repository provides three major steps to get from raw sequencing reads to annotated variants.

#### 1. Set up a reproducible computational environment ([`setup_environment.sh`](https://github.com/greenelab/pdx_exomeseq/blob/master/setup_environment.sh), [`install.sh`](https://github.com/greenelab/pdx_exomeseq/blob/master/install.sh))

```bash
# Set up the conda environment (requires conda version 4.5 or greater)
bash setup_environment.sh

# NOTE: run `conda activate pdx-exomeseq` at the beginning of each session

# Install dependencies and initialize files
# This includes downloading reference genomes and generating several index files
bash install.sh
```
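
Because the setup script expects conda 4.5 or greater, it can help to check the installed version before running it. The following is a minimal sketch, not part of the repository: the helper names `version_ge` and `check_conda_version` are hypothetical, and the comparison assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```bash
# Succeed when version $1 is greater than or equal to version $2.
# ASSUMPTION: `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Succeed when conda is on PATH and reports at least the required version.
# `conda --version` prints a line such as "conda 4.5.11"; field 2 is the version.
check_conda_version() {
    local required="${1:-4.5}"
    local found
    command -v conda >/dev/null 2>&1 || { echo "conda not found on PATH" >&2; return 1; }
    found="$(conda --version 2>&1 | awk '{print $2}')"
    version_ge "$found" "$required" || {
        echo "conda $found is older than required $required" >&2
        return 1
    }
}

# Usage:
#   check_conda_version 4.5 && bash setup_environment.sh
```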

#### 2. Run data processing pipeline ([`wes_pipeline.sh`](https://github.com/greenelab/pdx_exomeseq/blob/master/wes_pipeline.sh))

```bash
# NOTE: the commands in the following script must be run sequentially.
# Each command submits several jobs per specified file, and these can take upwards
# of 12 hours per sample to run. Run one command at a time by commenting out all
# of the others.
bash wes_pipeline.sh
```

Also note that the configuration file `discovery_variables.yml` includes absolute paths to each tool or resource.
If the path to a tool or resource changes, updating this file alone is sufficient.
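
Since the configuration holds absolute paths, a stale entry is easiest to catch before any jobs are submitted. The sketch below lists configured paths that do not exist on the current machine. It is an illustration only: the helper name `list_missing_paths` is hypothetical, and it assumes a flat layout with one unquoted `key: /absolute/path` pair per line, which may not match the actual structure of `discovery_variables.yml`.

```bash
# Print each absolute path found in a flat `key: value` file that does not exist.
# ASSUMPTIONS: one `key: /absolute/path` pair per line, unquoted values, no spaces
# in paths. Lines starting with "#" and values that are not absolute paths are skipped.
list_missing_paths() {
    sed -n 's|^[^#:]*:[[:space:]]*\(/[^[:space:]#]*\).*$|\1|p' "$1" |
        while IFS= read -r path; do
            [ -e "$path" ] || printf '%s\n' "$path"
        done
}

# Usage:
#   missing="$(list_missing_paths discovery_variables.yml)"
#   [ -z "$missing" ] || { echo "Missing paths:"; echo "$missing"; }
```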

#### 3. Visualize and summarize results ([`analysis_pipeline.sh`](https://github.com/greenelab/pdx_exomeseq/blob/master/analysis_pipeline.sh))

We use Jupyter notebooks and R scripts to visualize and summarize results.
We describe the analysis in the next section.

## Analysis Pipeline

After obtaining the called variants, we perform a series of analyses and visualizations.
These analyses use a separate conda environment, specified in
[`analysis_environment.yml`](https://github.com/greenelab/pdx_exomeseq/blob/master/analysis_environment.yml).

### Computational Environment

Follow these steps to install and begin using this conda environment:

```bash
# Using conda version 4.5 or greater
conda env create --force --file analysis_environment.yml
conda activate pdx-exomeseq-analysis
```
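
The `--force` flag replaces any existing environment of the same name, so on a shared machine it may be preferable to create the environment only when it is absent. A minimal sketch follows, assuming the usual `conda env list` output in which header lines start with `#` and the first field of every other line is the environment name. The helper name `env_listed` is hypothetical.

```bash
# Read `conda env list` output on stdin and succeed if an environment named $1
# appears in it. Header lines beginning with "#" are ignored.
env_listed() {
    awk -v name="$1" '!/^#/ && $1 == name { found = 1 } END { exit !found }'
}

# Usage: create the analysis environment only if it is not already present.
#   conda env list | env_listed pdx-exomeseq-analysis ||
#       conda env create --file analysis_environment.yml
```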

### Reproduce Results

To reproduce the results of the analysis pipeline, run the following command.
Note that the variants must already be processed before running the pipeline.

```bash
bash analysis_pipeline.sh
```
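
Because the analysis expects processed variants, a quick preflight check can confirm that VCF files are present before the notebooks run. This is a sketch under stated assumptions: the helper names `count_vcfs` and `require_variants` are hypothetical, the directory argument is a placeholder for wherever the WES pipeline wrote its output, and the check only looks for `*.vcf` or `*.vcf.gz` files directly inside that directory.

```bash
# Count files named *.vcf or *.vcf.gz directly inside directory $1.
count_vcfs() {
    find "$1" -maxdepth 1 -type f \( -name '*.vcf' -o -name '*.vcf.gz' \) |
        wc -l | tr -d '[:space:]'
}

# Succeed only if directory $1 exists and contains at least one VCF file.
require_variants() {
    local dir="$1"
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    [ "$(count_vcfs "$dir")" -gt 0 ] || { echo "no VCF files in $dir" >&2; return 1; }
}

# Usage (the directory is a placeholder, not a path defined by the repository):
#   require_variants path/to/variant/output && bash analysis_pipeline.sh
```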

### Scripts

The following notebooks perform the analysis and obtain figures and results:

| Notebook | Description |
| :----- | :----- |
| [`1.read-depth-stats.ipynb`](https://github.com/greenelab/pdx_exomeseq/blob/master/1.read-depth-stats.ipynb) | Determine read depth against proportion of genome covered |
| [`2.disambiguate-reads.ipynb`](https://github.com/greenelab/pdx_exomeseq/blob/master/2.disambiguate-reads.ipynb) | Visualize the separation of mouse and human reads |
| [`3.filter-variants.ipynb`](https://github.com/greenelab/pdx_exomeseq/blob/master/3.filter-variants.ipynb) | Visualize variant filtration and process filtered VCFs |
| [`4.variant-allele-frequency.ipynb`](https://github.com/greenelab/pdx_exomeseq/blob/master/4.variant-allele-frequency.ipynb) | Visualize gnomAD by SIFT scores for replicates and filtered merged files |
| [`5.upset-plots.ipynb`](https://github.com/greenelab/pdx_exomeseq/blob/master/5.upset-plots.ipynb) | Generate UpSet plots to visualize variant overlaps across patient sets |
| [`6.generate-oncoprint-data.ipynb`](https://github.com/greenelab/pdx_exomeseq/blob/master/5.generate-oncoprint-data.ipynb) | Wrangle variant calls to generate data for input into oncoprint visualization |
| [`7.visualize-oncoprint.ipynb`](https://github.com/greenelab/pdx_exomeseq/blob/master/6.visualize-oncoprint.ipynb) | Visualize oncoprint diagrams and variant similarity matrices |
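
To illustrate the quantity summarized in the first notebook, the sketch below computes the fraction of positions covered at or above a depth threshold. It is not the notebook's implementation. It assumes tab-separated per-position depth records with three columns (chromosome, position, depth), the layout that `samtools depth` emits for a single input file, and the helper name `fraction_covered` is hypothetical.

```bash
# Read three-column depth records (chromosome, position, depth) on stdin and print
# the fraction of listed positions whose depth is at least the threshold in $1.
# Prints "NA" when no records are read.
fraction_covered() {
    awk -v min="$1" '
        { total++; if ($3 >= min) covered++ }
        END {
            if (total > 0) printf "%.3f\n", covered / total
            else print "NA"
        }
    '
}

# Usage (sample.bam is a placeholder; -a also reports zero-depth positions):
#   samtools depth -a sample.bam | fraction_covered 10
```

Evaluating the function at several thresholds gives the points of a depth-versus-coverage curve. Positions absent from the input are not counted, so the denominator is only meaningful when zero-depth positions are included.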
