https://github.com/nci-gdc/gdc-somatic-variant-calling-workflow
CWL for GDC somatic variant calling workflow
- Host: GitHub
- URL: https://github.com/nci-gdc/gdc-somatic-variant-calling-workflow
- Owner: NCI-GDC
- License: apache-2.0
- Created: 2017-09-26T15:08:18.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2023-11-08T21:12:05.000Z (about 1 year ago)
- Last Synced: 2024-04-14T12:44:39.759Z (9 months ago)
- Topics: bioinformatics, cwl, workflow
- Language: Common Workflow Language
- Homepage:
- Size: 154 KB
- Stars: 1
- Watchers: 10
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
README
# GDC DNA-Seq Somatic Variant Calling Workflow
![Version badge](https://img.shields.io/badge/MuSE-v1.0rc__submission__c039ffa-brightgreen.svg)
![Version badge](https://img.shields.io/badge/GATK3.6-nightly--2016--02--25--gf39d340-brightgreen.svg)
![Version badge](https://img.shields.io/badge/SomaticSniper-1.0.5.0-brightgreen.svg)
![Version badge](https://img.shields.io/badge/VarScan-v2.3.9-brightgreen.svg)
![Version badge](https://img.shields.io/badge/samtools-1.1-yellowgreen.svg)
![Version badge](https://img.shields.io/badge/Picard-2.18.4--SNAPSHOT-yellowgreen.svg)

GDC Documentation: https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/DNA_Seq_Variant_Calling_Pipeline/#somatic-variant-calling-workflow
## Submodules
This repo includes each caller's CWL repository as a git submodule. To clone it together with all submodules:
```
git clone https://github.com/NCI-GDC/gdc-somatic-variant-calling-workflow --recursive
```

* [GDC-MuSE-CWL](https://github.com/NCI-GDC/muse-cwl "GDC-MuSE-CWL")
* [GDC-GATK3-MuTect2-CWL](https://github.com/NCI-GDC/mutect2-cwl "GDC-GATK3-MuTect2-CWL")
* [GDC-SomaticSniper-CWL](https://github.com/NCI-GDC/somaticsniper-cwl "GDC-SomaticSniper-CWL")
* [GDC-VarScan2-CWL](https://github.com/NCI-GDC/varscan-cwl "GDC-VarScan2-CWL")
* [GDC-Samtools-mpileup-CWL](https://github.com/NCI-GDC/samtools-mpileup-cwl "GDC-Samtools-mpileup-CWL")
* [GDC-Variant-filtration-CWL](https://github.com/NCI-GDC/variant-filtration-cwl "GDC-Variant-filtration-CWL")

## CWL
Common Workflow Language: https://www.commonwl.org/

The CWL files have been tested with several `cwltool` versions. The most thoroughly tested one is:

* cwltool 1.0.20180306163216

## For external users
This repository has only been tested on GDC data and in GDC's own computing environment. Some of the reference files the workflow requires are available from [GDC reference files](https://gdc.cancer.gov/about-data/data-harmonization-and-generation/gdc-reference-files "GDC reference files"). For questions about GDC data, please contact the GDC Help Desk at [email protected].
### Individual caller CWL workflow
Each caller's standalone CWL workflow is in `workflows/subworkflows`:
* [MuSE workflow](workflows/subworkflows/muse_workflow.cwl "MuSE workflow")
* [GATK3-MuTect2 workflow](workflows/subworkflows/gatk3_mutect2_workflow.cwl "GATK3-MuTect2 workflow")
* [SomaticSniper workflow](workflows/subworkflows/somaticsniper_workflow.cwl "SomaticSniper workflow")
* [VarScan2 workflow](workflows/subworkflows/varscan2_workflow.cwl "VarScan2 workflow")

For more information on each caller, see the submodule repositories linked above.
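Each subworkflow can also be run on its own. Recent `cwltool` releases provide a `--make-template` option that prints a YAML template of a workflow's inputs. This gives a starting point for a subworkflow's job file. The option may not exist in older releases such as the version listed above, so treat its availability as an assumption. The Python sketch below maps each caller to the paths listed here and builds that command. The caller keys (`muse`, `mutect2`, ...) and the helper names are hypothetical.

```python
import subprocess

# Per-caller subworkflows as listed in this README (paths relative to the repo root).
# The dictionary keys are illustrative names, not identifiers used by the repo.
SUBWORKFLOWS = {
    "muse": "workflows/subworkflows/muse_workflow.cwl",
    "mutect2": "workflows/subworkflows/gatk3_mutect2_workflow.cwl",
    "somaticsniper": "workflows/subworkflows/somaticsniper_workflow.cwl",
    "varscan2": "workflows/subworkflows/varscan2_workflow.cwl",
}


def template_argv(caller):
    """Return the argv asking cwltool for an input template of one caller's subworkflow.

    Assumes a cwltool release that supports --make-template.
    """
    if caller not in SUBWORKFLOWS:
        raise ValueError(f"unknown caller {caller!r}; choose from {sorted(SUBWORKFLOWS)}")
    return ["cwltool", "--make-template", SUBWORKFLOWS[caller]]


def save_template(caller, dest):
    """Run cwltool and write the template it prints on stdout to dest."""
    result = subprocess.run(template_argv(caller), check=True,
                            capture_output=True, text=True)
    with open(dest, "w") as fh:
        fh.write(result.stdout)
```

After filling in the generated template, pass it to `cwltool` as the job file alongside the same subworkflow path.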
### GDC workflow entrypoint
The GDC workflow entrypoint, `workflows/gdc-somatic-variant-calling-workflow.cwl`, combines GATK3 indel realignment with all four callers. An example input JSON is provided in `example/main_workflow.example.input.json`.
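A job file like the example can also be generated programmatically. The Python sketch below does three things:

* builds a job object from the General inputs listed in this section;
* wraps `File`-typed inputs as CWL `{"class": "File", "path": ...}` objects;
* assembles a `cwltool` command for the entrypoint.

The helper names are hypothetical. The sketch also assumes that BAM and FASTA index files sit next to their primary files and are resolved through `secondaryFiles` declared in the workflow. Check the CWL input definitions before relying on that.

```python
import json

# Entrypoint and example job file named in this README (paths relative to the repo root).
MAIN_WORKFLOW = "workflows/gdc-somatic-variant-calling-workflow.cwl"
EXAMPLE_JOB = "example/main_workflow.example.input.json"

# Inputs typed `File` in the General inputs table; CWL expects them as File objects.
# Assumption: index files (.bai, .fai, ...) are picked up via secondaryFiles in the CWL.
FILE_INPUTS = {"tumor_bam", "normal_bam", "reference", "known_indel",
               "known_snp", "panel_of_normal", "cosmic"}

# Every General input except project_id, whose type `string?` marks it optional.
REQUIRED_INPUTS = FILE_INPUTS | {
    "muse_caller_id", "mutect2_caller_id", "somaticsniper_caller_id",
    "varscan2_caller_id", "experimental_strategy", "job_uuid",
    "java_opts", "threads", "usedecoy",
}


def build_job(**inputs):
    """Return a CWL job object, wrapping File-typed inputs as File objects."""
    missing = REQUIRED_INPUTS - set(inputs)
    if missing:
        raise ValueError(f"missing required inputs: {sorted(missing)}")
    return {
        name: {"class": "File", "path": value} if name in FILE_INPUTS else value
        for name, value in inputs.items()
    }


def write_job(job, path):
    """Serialize the job object as JSON, the same format as the example input file."""
    with open(path, "w") as fh:
        json.dump(job, fh, indent=2)


def cwltool_argv(job_path=EXAMPLE_JOB, outdir="out"):
    """Return argv for `cwltool [options] <workflow> <job file>`."""
    return ["cwltool", "--outdir", outdir, MAIN_WORKFLOW, job_path]
```

Passing the list returned by `cwltool_argv()` to `subprocess.run(..., check=True)` would run the entrypoint with the example job file. Keeping the command as an argv list rather than a shell string avoids quoting problems with file paths.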
General inputs
| Name | type | Description |
| ---- | ---- | ----------- |
| project_id | string? | Project ID. Used as a file name prefix. |
| muse_caller_id | string | MuSE caller ID. Used as a file name prefix. |
| mutect2_caller_id | string | MuTect2 caller ID. Used as a file name prefix. |
| somaticsniper_caller_id | string | SomaticSniper caller ID. Used as a file name prefix. |
| varscan2_caller_id | string | VarScan2 caller ID. Used as a file name prefix. |
| experimental_strategy | string | Experimental strategy, passed to `MuSE sump`. GDC default is WXS. |
| tumor_bam | File | Tumor BAM file with index. |
| normal_bam | File | Normal BAM file with index. |
| reference | File | Human genome reference. GDC default is GRCh38. |
| known_indel | File | Known INDEL reference file. |
| known_snp | File | dbSNP reference file. GDC default is dbSNP build 144. |
| panel_of_normal | File | Panel of normals reference file. |
| cosmic | File | COSMIC reference file. GDC default is COSMIC v75. |
| job_uuid | string | Job ID. Used as a prefix for all VCF outputs. |
| java_opts | string | Java `-Xmx` option for all Java commands. GDC default is 3G. |
| threads | int | Number of threads for Docker images that support internal multithreading. |
| usedecoy | boolean | If true, all decoy sequences in the reference FASTA index (faidx) are included. GDC default is false. |

Step inputs
| Step | Name | Description |
| ---- | ---- | ----------- |
| GATK3 coclean | `--gatk_*`, `--rtc_*`, `--ir_*` | GATK3 `RealignerTargetCreator` and `IndelRealigner` parameters |
| MuSE | None | No parameters for MuSE |
| GATK3 MuTect2 | `--cont`, `--duscb` | `-contamination` and `-dontUseSoftClippedBases` from the GATK3 `MuTect2` parameters |
| SomaticSniper | `--map_q`, `--base_q`, `--loh`, `--gor`, `--psc`, `--ppa`, `--pps`, `--theta`, `--nhap`, `--pd`, `--fout` | All SomaticSniper parameters |
| VarScan2 | `--min_coverage`, `--min_cov_normal`, `--min_cov_tumor`, `--min_var_freq`, `--min_freq_for_hom`, `--normal_purity`, `--tumor_purity`, `--vs_ps_value`, `--somatic_p_value`, `--strand_filter`, `--validation`, `--output_vcf`, `--min_tumor_freq`, `--max_normal_freq`, `--vps_p_value` | All parameters from the VarScan2 `somatic` and `processSomatic` functions |

## For GDC users
The entrypoint CWL workflow for GDC users is `gpas-somatic-mutation-calling-workflow.cwl`.