https://github.com/poisonalien/annovar2maf

Tiny python script to generate MAF files from output generated by standard annotation programs

## Introduction

This is a tiny Python script that generates MAF files from the output of standard annotation programs.
Currently, output from ANNOVAR ([table_annovar.pl](https://annovar.openbioinformatics.org/en/latest/user-guide/startup/)) and [bcftools csq](https://samtools.github.io/bcftools/howtos/csq-calling.html) can be converted to [MAF](https://docs.gdc.cancer.gov/Data/File_Formats/MAF_Format/#:~:text=Mutation%20Annotation%20Format%20(MAF)%20is,through%20the%20Somatic%20Aggregation%20Workflow.).

```
$ python annovar2maf.py -h
usage: annovar2maf [-h] [-t TSB] [-b BUILD] [-p {refGene,ensGene}] [-c] input

Convert annovar and bcftools-csq annotations to MAF

positional arguments:
  input                 Annovar anotations file [Ex: myanno.hg19_multianno.txt] or a csq formatted file.

optional arguments:
  -h, --help            show this help message and exit
  -t TSB, --tsb TSB     Sample name. Default parses from the file name
  -b BUILD, --build BUILD
                        Reference genome build [Default: hg38]
  -p {refGene,ensGene}, --protocol {refGene,ensGene}
                        Protocol used to generate annovar annotations [Default: refGene]
  -c, --csq             Input file is a bcftools csq formatted output
```

### annovar2maf

```
python annovar2maf.py -t foo -b GRCh37 tests/test_mutect.refseq.hg19_multianno.txt

# For annovar annotations generated with ensGene as a protocol
python annovar2maf.py -p ensGene -t foo -b GRCh37 tests/test_mutect.ens.hg19_multianno.txt
```
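
When several annotation files need converting, the command lines above can be assembled programmatically. The following is a minimal sketch, not part of the repository: `build_command` is a hypothetical helper, and the assumption that ANNOVAR outputs sit under `tests/` with a `.hg19_multianno.txt` suffix is taken only from the examples above. It uses only the flags listed in the help output and prints the commands rather than running them.

```python
import shlex
from pathlib import Path


def build_command(input_file, build="hg38", protocol="refGene", tsb=None, csq=False):
    """Assemble an annovar2maf.py command line from the documented flags.

    Hypothetical helper for illustration; defaults mirror the help text.
    """
    cmd = ["python", "annovar2maf.py", "-b", build]
    if csq:
        # -c marks the input as bcftools csq formatted output
        cmd.append("-c")
    else:
        # -p selects the annovar protocol (refGene or ensGene)
        cmd += ["-p", protocol]
    if tsb is not None:
        # -t overrides the sample name otherwise parsed from the file name
        cmd += ["-t", tsb]
    cmd.append(str(input_file))
    return cmd


if __name__ == "__main__":
    # Assumed layout: annovar outputs under tests/, as in the examples above.
    for path in sorted(Path("tests").glob("*.hg19_multianno.txt")):
        print(shlex.join(build_command(path, build="GRCh37")))
```

Each printed line can be pasted into a shell or passed to `subprocess.run`. Files annotated with the ensGene protocol would need `protocol="ensGene"`.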

### csq2maf

Similar to VEP, the `bcftools csq` command annotates variants with their predicted consequences. The program is lightweight and extremely [fast](https://samtools.github.io/bcftools/howtos/csq-calling.html).
Its output can be converted to TSV with the help of [split-vep](https://samtools.github.io/bcftools/howtos/plugin.split-vep.html) and then converted to MAF.

```
ref="Homo_sapiens.GRCh37.dna.primary_assembly.fa"

# Get the GFF files for your ref build
## GRCh38 with and without the chr prefix
#wget ftp://ftp.ensembl.org/pub/current_gff3/homo_sapiens/Homo_sapiens.GRCh38.110.chr.gff3.gz
#wget ftp://ftp.ensembl.org/pub/current_gff3/homo_sapiens/Homo_sapiens.GRCh38.110.gff3.gz

## GRCh37 with and without the chr prefix
#wget ftp://ftp.ensembl.org/pub/grch37/release-84/gff3/homo_sapiens/Homo_sapiens.GRCh37.82.chr.gff3.gz
wget ftp://ftp.ensembl.org/pub/grch37/release-84/gff3/homo_sapiens/Homo_sapiens.GRCh37.82.gff3.gz

## Step-1: The commands below left-normalize the VCF, split multi-allelic variants, and annotate the VCF with variant consequences, prioritizing the worst consequence.
bcftools norm -f ${ref} -m -both -Oz tests/test_mutect.vcf.gz | bcftools csq -c CSQ -f ${ref} -g Homo_sapiens.GRCh37.82.gff3.gz -p a | \
bcftools +split-vep /dev/stdin -Oz -o tests/test_mutect.csq.vcf.gz -c - -s worst

## Step-2: The command below converts the csq-annotated VCF to TSV
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%gene\t%transcript\t%Consequence\t%amino_acid_change\t%dna_change\n' tests/test_mutect.csq.vcf.gz > tests/test_mutect.csq.tsv

## Step-3: Now convert the TSV to MAF
python annovar2maf.py -c -t foo -b GRCh37 tests/test_mutect.csq.tsv
```
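
Before running Step-3, it can help to confirm that the TSV from Step-2 has the expected shape. The sketch below is an illustration only and makes no claim about how `annovar2maf.py` reads the file: `read_csq_tsv` and `CSQ_COLUMNS` are hypothetical names, and the column order is copied from the `bcftools query -f` format string above. The sample row in the demo is made up.

```python
import csv
import io

# Column order taken from the `bcftools query -f` format string in Step-2.
CSQ_COLUMNS = [
    "CHROM", "POS", "REF", "ALT", "gene", "transcript",
    "Consequence", "amino_acid_change", "dna_change",
]


def read_csq_tsv(handle):
    """Yield one dict per row; raise ValueError if a row has the wrong width."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_no, row in enumerate(reader, start=1):
        if not row:
            # Skip completely empty lines
            continue
        if len(row) != len(CSQ_COLUMNS):
            raise ValueError(
                f"line {line_no}: expected {len(CSQ_COLUMNS)} fields, got {len(row)}"
            )
        yield dict(zip(CSQ_COLUMNS, row))


if __name__ == "__main__":
    # Made-up row for illustration; real input would be tests/test_mutect.csq.tsv
    sample = io.StringIO("1\t12345\tA\tG\tGENE1\tTX1\tmissense\t10V>10I\t12345A>G\n")
    for record in read_csq_tsv(sample):
        print(record["gene"], record["Consequence"])
```

To check a real file, open it with `open("tests/test_mutect.csq.tsv", newline="")` and pass the handle to `read_csq_tsv`.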