https://github.com/nci-gdc/variant-filtration-tool
Support tools for the GDC Variant Filtration Workflow
https://github.com/nci-gdc/variant-filtration-tool
bioinformatics docker workflow-tool
Last synced: 10 months ago
JSON representation
Support tools for the GDC Variant Filtration Workflow
- Host: GitHub
- URL: https://github.com/nci-gdc/variant-filtration-tool
- Owner: NCI-GDC
- License: apache-2.0
- Created: 2016-04-19T18:20:02.000Z (about 10 years ago)
- Default Branch: main
- Last Pushed: 2025-06-20T19:48:47.000Z (12 months ago)
- Last Synced: 2025-06-20T20:49:21.132Z (12 months ago)
- Topics: bioinformatics, docker, workflow-tool
- Language: Python
- Homepage:
- Size: 188 KB
- Stars: 1
- Watchers: 8
- Forks: 2
- Open Issues: 4
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
README
GDC Variant Filtration Tool
---
This repository contains the source code used in the VCF variant filtration
workflows within the GDC. A single CLI is generated with multiple subcommands.
## Requirements
* Python >= 3.6
* pysam
* defopt
## Subcommands
### `add-oxog-filters`
Adds 'oxog' filter tag to VCFs.
```
usage: gdc-filtration-tools add-oxog-filters [-h]
input_vcf input_dtoxog output_vcf
Adds 'oxog' filter tag to VCFs.
positional arguments:
input_vcf The full input VCF file to filter.
input_dtoxog The dtoxog VCF from dtoxog-maf-to-vcf used to annotate the full input VCF.
output_vcf The output filtered VCF file to create. BGzip and tabix-index created if ends with '.gz'.
optional arguments:
-h, --help show this help message and exit
```
### `create-dtoxog-maf`
Takes a SNP-only VCF file and converts it to the dToxoG MAF format
which includes the OXOQ value.
```
usage: gdc-filtration-tools create-dtoxog-maf [-h]
input_vcf output_file reference
oxog_file oxoq_score
Takes a SNP-only VCF file and converts it to the dToxoG MAF format
which includes the OXOQ value.
positional arguments:
input_vcf The input SNP-only VCF file to convert to dToxoG MAF.
output_file The output MAF file to create.
reference Faidx indexed reference fasta file.
oxog_file Metrics file output from GATK OxoGMetrics tool.
oxoq_score The oxoQ score.
optional arguments:
-h, --help show this help message and exit
```
### `create-oxog-intervals`
Takes a SNP-only VCF file and creates an interval list for
use by the Broad oxog metrics tool.
```
usage: gdc-filtration-tools create-oxog-intervals [-h] input_vcf output_file
Takes a SNP-only VCF file and creates an interval list for
use by the Broad oxog metrics tool.
positional arguments:
input_vcf The input SNP-only VCF file to extract intervals from.
output_file The output interval list to create.
optional arguments:
-h, --help show this help message and exit
```
### `dtoxog-maf-to-vcf`
Transforms dToxoG MAF to minimal VCF of only dtoxo failures.
```
usage: gdc-filtration-tools dtoxog-maf-to-vcf [-h]
input_maf reference_fa
output_vcf
Transforms dToxoG MAF to minimal VCF of only dtoxo failures.
positional arguments:
input_maf The annotated dtoxog MAF output file.
reference_fa Reference fasta used to make seqdict header.
output_vcf The output minimal VCF with only failed dtoxog records. BGzip and tabix-index created if ends with '.gz'.
optional arguments:
-h, --help show this help message and exit
```
### `extract-oxoq-from-sqlite`
Extract the OXOQ score for a particular context from the GDC
harmonization metrics SQLite file. The score is printed to
stdout.
```
usage: gdc-filtration-tools extract-oxoq-from-sqlite [-h] [-c CONTEXT]
[-t TABLE]
[-i INPUT_STATE]
db_file
Extract the OXOQ score for a particular context from the GDC
harmonization metrics SQLite file. The score is printed to
stdout.
positional arguments:
db_file Path to the SQLite db file.
optional arguments:
-h, --help show this help message and exit
-c CONTEXT, --context CONTEXT
The nucleotide context of interest.
(default: CCG)
-t TABLE, --table TABLE
The SQLite table name.
(default: picard_CollectOxoGMetrics)
-i INPUT_STATE, --input-state INPUT_STATE
The input state to select for in the input_state column.
(default: markduplicates_readgroups)
```
### `filter-contigs`
Filter out VCF records on chromosomes that are not present
in the contig lines of the VCF header.
```
usage: gdc-filtration-tools filter-contigs [-h] input_vcf output_vcf
Filter out VCF records on chromosomes that are not present
in the contig lines of the VCF header.
positional arguments:
input_vcf The input VCF file to filter.
output_vcf The output filtered VCF file to create. BGzip and tabix-index created if ends with '.gz'.
optional arguments:
-h, --help show this help message and exit
```
### `filter-nonstandard-variants`
Remove non-ACTG loci from a SNP-ONLY VCF. No validation that
the VCF is SNP-only is done.
```
usage: gdc-filtration-tools filter-nonstandard-variants [-h]
input_vcf output_vcf
Remove non-ACTG loci from a SNP-ONLY VCF. No validation that
the VCF is SNP-only is done.
positional arguments:
input_vcf The input SNP-only VCF file to filter.
output_vcf The output filtered VCF file to create. BGzip and tabix-index created if ends with '.gz'.
optional arguments:
-h, --help show this help message and exit
```
### `filter-somatic-score`
Filters SomaticSniper VCF files based on the Somatic Score.
```
usage: gdc-filtration-tools filter-somatic-score [-h] [-t TUMOR_SAMPLE_NAME]
[-d DROP_SOMATIC_SCORE]
[-m MIN_SOMATIC_SCORE]
input_vcf output_vcf
Filters SomaticSniper VCF files based on the Somatic Score.
positional arguments:
input_vcf The input VCF file to filter.
output_vcf The output filtered VCF file to create. BGzip and tabix-index created if ends with '.gz'.
optional arguments:
-h, --help show this help message and exit
-t TUMOR_SAMPLE_NAME, --tumor-sample-name TUMOR_SAMPLE_NAME
The name of the tumor sample in the VCF.
(default: TUMOR)
-d DROP_SOMATIC_SCORE, --drop-somatic-score DROP_SOMATIC_SCORE
If the somatic score is < this, remove it.
(default: 25)
-m MIN_SOMATIC_SCORE, --min-somatic-score MIN_SOMATIC_SCORE
If the somatic score is > drop_somatic_score and < this value, add ssc filter tag.
(default: 40)
```
### `format-gdc-vcf`
Adds VCF header metadata specific to the GDC.
```
usage: gdc-filtration-tools format-gdc-vcf [-h] [-r REFERENCE_NAME]
input_vcf output_vcf
patient_barcode case_id
tumor_barcode tumor_aliquot_uuid
tumor_bam_uuid normal_barcode
normal_aliquot_uuid normal_bam_uuid
Adds VCF header metadata specific to the GDC.
positional arguments:
input_vcf The input VCF file to format.
output_vcf The output formatted VCF file to create. BGzip and tabix-index created if ends with '.gz'.
patient_barcode The case submitter id.
case_id The case uuid.
tumor_barcode The tumor aliquot submitter id.
tumor_aliquot_uuid The tumor aliquot uuid.
tumor_bam_uuid The tumor bam uuid.
normal_barcode The normal aliquot submitter id.
normal_aliquot_uuid The normal aliquot uuid.
normal_bam_uuid The normal bam uuid.
optional arguments:
-h, --help show this help message and exit
-r REFERENCE_NAME, --reference-name REFERENCE_NAME
Reference name to use in header.
(default: GRCh38.d1.vd1.fa)
```
### `format-pindel-vcf`
Formats Pindel VCFs to work better with GDC downstream workflows.
```
usage: gdc-filtration-tools format-pindel-vcf [-h] input_vcf output_vcf
Formats Pindel VCFs to work better with GDC downstream workflows.
positional arguments:
input_vcf The input VCF file to filter.
output_vcf The output filtered VCF file to create. BGzip and tabix-index created if ends with '.gz'.
optional arguments:
-h, --help show this help message and exit
```
### `format-sanger-pindel-vcf`
Formats the Sanger Pindel VCF by filling in the `./.` genotypes to `0/0` for normal and
`0/1` for tumor.
```
usage: gdc_filtration_tools format-sanger-pindel-vcf [-h] input_vcf output_vcf
Formats Sanger Pindel VCFs to work better with GDC downstream workflows.
positional arguments:
input_vcf The input VCF file to format.
output_vcf The output formatted VCF file to create. BGzip and tabix-index created if ends with '.gz'.
optional arguments:
-h, --help show this help message and exit
```
### `position-filter-dkfz`
Removes VCF records where the POS-2 is less than 0 which
will cause an Exception to be thrown in DKFZBiasFilter. We
assume that the input VCF only contains SNPs, but no assertions
are made to validate this.
```
usage: gdc-filtration-tools position-filter-dkfz [-h] input_vcf output_vcf
Removes VCF records where the POS-2 is less than 0 which
will cause an Exception to be thrown in DKFZBiasFilter. We
assume that the input VCF only contains SNPs, but no assertions
are made to validate this.
positional arguments:
input_vcf The input VCF file to filter.
output_vcf The output filtered VCF file to create. BGzip and tabix-index created if ends with '.gz'.
optional arguments:
-h, --help show this help message and exit
```
## Docker Tools
**variant-filtration-tool**
Repository: https://quay.io/ncigdc/variant-filtration-tool
Pull by docker.osdc.io/ncigdc/variant-filtration-tool:build-80-ead70e7d