https://github.com/poisonalien/annovar2maf
Tiny python script to generate MAF files from output generated by standard annotation programs
- Host: GitHub
- URL: https://github.com/poisonalien/annovar2maf
- Owner: PoisonAlien
- License: mit
- Created: 2023-07-20T13:24:55.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-07-20T13:31:07.000Z (almost 3 years ago)
- Last Synced: 2025-04-01T00:41:20.811Z (about 1 year ago)
- Language: Python
- Size: 74.2 KB
- Stars: 5
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## Introduction
This is a tiny Python script to generate MAF files from the output of standard annotation programs.
Currently, output from ANNOVAR's [table_annovar.pl](https://annovar.openbioinformatics.org/en/latest/user-guide/startup/) and from [bcftools csq](https://samtools.github.io/bcftools/howtos/csq-calling.html) can be converted to [MAF](https://docs.gdc.cancer.gov/Data/File_Formats/MAF_Format/#:~:text=Mutation%20Annotation%20Format%20(MAF)%20is,through%20the%20Somatic%20Aggregation%20Workflow.).
```
$ python annovar2maf.py -h
usage: annovar2maf [-h] [-t TSB] [-b BUILD] [-p {refGene,ensGene}] [-c] input

Convert annovar and bcftools-csq annotations to MAF

positional arguments:
  input                 Annovar anotations file [Ex: myanno.hg19_multianno.txt] or a csq formatted file.

optional arguments:
  -h, --help            show this help message and exit
  -t TSB, --tsb TSB     Sample name. Default parses from the file name
  -b BUILD, --build BUILD
                        Reference genome build [Default: hg38]
  -p {refGene,ensGene}, --protocol {refGene,ensGene}
                        Protocol used to generate annovar annotations [Default: refGene]
  -c, --csq             Input file is a bcftools csq formatted output
```
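The usage line above has the shape Python's `argparse` produces, so the interface can be sketched as follows. This is a minimal sketch, not the script's source: `build_parser` and `default_sample_name` are hypothetical names, and the rule that the default sample name is the file name up to its first `.` is an assumption. `TSB` presumably refers to the MAF `Tumor_Sample_Barcode` column.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented annovar2maf options."""
    parser = argparse.ArgumentParser(
        prog="annovar2maf",
        description="Convert annovar and bcftools-csq annotations to MAF",
    )
    parser.add_argument("input", help="Annovar annotations file or a csq formatted file.")
    parser.add_argument("-t", "--tsb", default=None,
                        help="Sample name. Default parses from the file name")
    parser.add_argument("-b", "--build", default="hg38",
                        help="Reference genome build [Default: hg38]")
    parser.add_argument("-p", "--protocol", choices=["refGene", "ensGene"], default="refGene",
                        help="Protocol used to generate annovar annotations [Default: refGene]")
    parser.add_argument("-c", "--csq", action="store_true",
                        help="Input file is a bcftools csq formatted output")
    return parser


def default_sample_name(path: str) -> str:
    # Assumption: the sample name is the file name up to its first '.'
    return os.path.basename(path).split(".")[0]


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["-b", "GRCh37", "tests/test_mutect.refseq.hg19_multianno.txt"])
    tsb = args.tsb or default_sample_name(args.input)
    print(tsb, args.build, args.protocol, args.csq)
    # prints: test_mutect GRCh37 refGene False
```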
### annovar2maf
```
python annovar2maf.py -t foo -b GRCh37 tests/test_mutect.refseq.hg19_multianno.txt
# For annovar annotations generated with ensGene as a protocol
python annovar2maf.py -p ensGene -t foo -b GRCh37 tests/test_mutect.ens.hg19_multianno.txt
```
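Converting ANNOVAR output largely means translating its `Func.<protocol>` and `ExonicFunc.<protocol>` columns (for example `Func.refGene` or `Func.ensGene`, which is why `-p` matters) into MAF `Variant_Classification` and `Variant_Type` values. The sketch below shows one plausible mapping. The dictionaries and helper names are hypothetical and not taken from `annovar2maf.py`, and classifying a multi-valued `Func` such as `exonic;splicing` by its first term is an assumption.

```python
# Hypothetical mapping of ANNOVAR terms to MAF Variant_Classification values.
# Not taken from annovar2maf.py; shown only to illustrate the conversion step.
EXONIC_FUNC_TO_MAF = {
    "frameshift deletion": "Frame_Shift_Del",
    "frameshift insertion": "Frame_Shift_Ins",
    "nonframeshift deletion": "In_Frame_Del",
    "nonframeshift insertion": "In_Frame_Ins",
    "nonsynonymous SNV": "Missense_Mutation",
    "stopgain": "Nonsense_Mutation",
    "stoploss": "Nonstop_Mutation",
    "synonymous SNV": "Silent",
}

FUNC_TO_MAF = {
    "splicing": "Splice_Site",
    "intronic": "Intron",
    "ncRNA_intronic": "Intron",
    "ncRNA_exonic": "RNA",
    "ncRNA_splicing": "RNA",
    "UTR5": "5'UTR",
    "UTR3": "3'UTR",
    "upstream": "5'Flank",
    "downstream": "3'Flank",
    "intergenic": "IGR",
}


def variant_classification(row: dict, protocol: str = "refGene") -> str:
    """Classify one table_annovar row; `protocol` selects the column suffix."""
    # Assumption: for a multi-valued Func (e.g. "exonic;splicing") use the first term
    func = row[f"Func.{protocol}"].split(";")[0]
    if func == "exonic":
        return EXONIC_FUNC_TO_MAF.get(row[f"ExonicFunc.{protocol}"], "Unknown")
    # "Unknown" is a placeholder for unmapped terms, not an official MAF value
    return FUNC_TO_MAF.get(func, "Unknown")


def variant_type(ref: str, alt: str) -> str:
    """MAF Variant_Type from ANNOVAR alleles, which use '-' for an empty allele."""
    if ref == "-":
        return "INS"
    if alt == "-":
        return "DEL"
    # Assumes an equal-length substitution otherwise
    return {1: "SNP", 2: "DNP", 3: "TNP"}.get(len(ref), "ONP")


if __name__ == "__main__":
    # A made-up row, trimmed to the columns used above
    row = {"Func.ensGene": "exonic", "ExonicFunc.ensGene": "stopgain", "Ref": "C", "Alt": "T"}
    print(variant_classification(row, "ensGene"), variant_type(row["Ref"], row["Alt"]))
    # prints: Nonsense_Mutation SNP
```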
### csq2maf
Similar to VEP, the `bcftools csq` command annotates variants with their predicted consequences. The program is lightweight and extremely [fast](https://samtools.github.io/bcftools/howtos/csq-calling.html).
Its annotations can be split into individual fields with [split-vep](https://samtools.github.io/bcftools/howtos/plugin.split-vep.html), exported to a TSV file, and then converted to MAF.
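The csq route needs a similar translation from `bcftools csq` consequence terms to MAF classes. A minimal sketch, assuming compound consequences are joined with `&` and using a severity order chosen for illustration (it is not split-vep's own `worst` ranking). The mapping table and `csq_to_maf` are hypothetical names.

```python
# Hypothetical bcftools csq consequence -> MAF Variant_Classification mapping.
# Dict order encodes an illustrative severity ranking (most severe first).
CSQ_BY_SEVERITY = {
    "stop_gained": "Nonsense_Mutation",
    "frameshift": None,  # Ins vs Del is decided from the allele lengths
    "stop_lost": "Nonstop_Mutation",
    "start_lost": "Translation_Start_Site",
    "splice_acceptor": "Splice_Site",
    "splice_donor": "Splice_Site",
    "inframe_insertion": "In_Frame_Ins",
    "inframe_deletion": "In_Frame_Del",
    "missense": "Missense_Mutation",
    "synonymous": "Silent",
    "5_prime_utr": "5'UTR",
    "3_prime_utr": "3'UTR",
    "intron": "Intron",
    "non_coding": "RNA",
}


def csq_to_maf(consequence: str, ref: str, alt: str):
    """Return a MAF class for a csq Consequence string, or None if unmapped."""
    # Assumption: compound consequences are joined with '&'
    terms = set(consequence.split("&"))
    for term, maf_class in CSQ_BY_SEVERITY.items():
        if term in terms:
            if term == "frameshift":
                return "Frame_Shift_Ins" if len(alt) > len(ref) else "Frame_Shift_Del"
            return maf_class
    return None


if __name__ == "__main__":
    print(csq_to_maf("missense", "C", "T"))                   # Missense_Mutation
    print(csq_to_maf("frameshift&stop_retained", "ACT", "A"))  # Frame_Shift_Del
```

Ranking terms first and only then mapping keeps the choice of a single class explicit when one record carries several consequences.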
```
ref="Homo_sapiens.GRCh37.dna.primary_assembly.fa"
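# Download the Ensembl GFF3 annotation matching your reference build.
# Ensembl names chromosomes without a "chr" prefix (1, 2, ..., X), so chromosome names in the FASTA, GFF3 and VCF should agree.
## GRCh38: reference chromosomes only (.chr.gff3.gz) or chromosomes plus scaffolds (.gff3.gz)
#wget ftp://ftp.ensembl.org/pub/current_gff3/homo_sapiens/Homo_sapiens.GRCh38.110.chr.gff3.gz
#wget ftp://ftp.ensembl.org/pub/current_gff3/homo_sapiens/Homo_sapiens.GRCh38.110.gff3.gz
## GRCh37: reference chromosomes only (.chr.gff3.gz) or chromosomes plus scaffolds (.gff3.gz)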
#wget ftp://ftp.ensembl.org/pub/grch37/release-84/gff3/homo_sapiens/Homo_sapiens.GRCh37.82.chr.gff3.gz
wget ftp://ftp.ensembl.org/pub/grch37/release-84/gff3/homo_sapiens/Homo_sapiens.GRCh37.82.gff3.gz
## Step-1: Left-normalize the VCF, split multi-allelic sites into biallelic records, annotate variant consequences, and keep only the worst consequence per variant.
bcftools norm -f ${ref} -m -both -Oz tests/test_mutect.vcf.gz | bcftools csq -c CSQ -f ${ref} -g Homo_sapiens.GRCh37.82.gff3.gz -p a | \
bcftools +split-vep /dev/stdin -Oz -o tests/test_mutect.csq.vcf.gz -c - -s worst
## Step-2: Export the csq-annotated fields from the VCF to a tab-separated file
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%gene\t%transcript\t%Consequence\t%amino_acid_change\t%dna_change\n' tests/test_mutect.csq.vcf.gz > tests/test_mutect.csq.tsv
## Step-3: Convert the TSV to MAF
python annovar2maf.py -c -t foo -b GRCh37 tests/test_mutect.csq.tsv
```
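The TSV written in Step-2 has no header, and its columns follow the `-f` format string. VCF and MAF represent indels differently (VCF keeps a shared anchor base, while MAF drops it and writes `-` for the empty allele), so converting each record involves a coordinate shift. This is a minimal sketch under those assumptions, not the script's implementation. `COLUMNS`, `vcf_to_maf_alleles` and `read_csq_tsv` are hypothetical names, and only a subset of MAF columns is filled in.

```python
import csv
import io

# Column order follows the `bcftools query -f` format string in Step-2 (no header line).
COLUMNS = ["CHROM", "POS", "REF", "ALT", "gene", "transcript",
           "Consequence", "amino_acid_change", "dna_change"]


def vcf_to_maf_alleles(pos: int, ref: str, alt: str):
    """Convert one biallelic VCF allele pair to MAF (start, end, ref, alt).

    MAF is 1-based, drops the VCF indel anchor base and writes '-' for an
    empty allele.
    """
    if len(ref) == len(alt):
        # SNV or MNV: same span, nothing to trim
        return pos, pos + len(ref) - 1, ref, alt
    # Trim shared leading bases (the anchor base for simple normalized indels)
    k = 0
    while k < min(len(ref), len(alt)) and ref[k] == alt[k]:
        k += 1
    ref_t, alt_t = ref[k:], alt[k:]
    if not alt_t:
        # Deletion: deleted bases start right after the trimmed prefix
        start = pos + k
        return start, start + len(ref_t) - 1, ref_t, "-"
    if not ref_t:
        # Insertion: MAF spans the two reference bases flanking the insertion
        return pos + k - 1, pos + k, "-", alt_t
    # Complex indel: report the trimmed alleles over the replaced span
    start = pos + k
    return start, start + len(ref_t) - 1, ref_t, alt_t


def read_csq_tsv(handle, sample: str):
    """Yield MAF-like dicts (a subset of MAF columns) from the Step-2 TSV."""
    for rec in csv.reader(handle, delimiter="\t"):
        row = dict(zip(COLUMNS, rec))
        start, end, ref, alt = vcf_to_maf_alleles(int(row["POS"]), row["REF"], row["ALT"])
        yield {
            "Hugo_Symbol": row["gene"],
            "Chromosome": row["CHROM"],
            "Start_Position": start,
            "End_Position": end,
            "Reference_Allele": ref,
            "Tumor_Seq_Allele2": alt,
            "Tumor_Sample_Barcode": sample,
        }


if __name__ == "__main__":
    # One made-up line in the Step-2 layout
    demo = io.StringIO("1\t1000\tACT\tA\tGENE1\tTX1\tframeshift\t.\t.\n")
    for maf_row in read_csq_tsv(demo, "foo"):
        print(maf_row)
```

For the made-up deletion `ACT` > `A` at position 1000, the sketch reports `Start_Position` 1001, `End_Position` 1002, `Reference_Allele` `CT` and `Tumor_Seq_Allele2` `-`.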