{"id":15016261,"url":"https://github.com/mskcc/vcf2maf","last_synced_at":"2025-05-15T12:03:42.197Z","repository":{"id":12040243,"uuid":"14625688","full_name":"mskcc/vcf2maf","owner":"mskcc","description":"Convert a VCF into a MAF, where each variant is annotated to only one of all possible gene isoforms","archived":false,"fork":false,"pushed_at":"2024-12-09T21:17:59.000Z","size":16655,"stargazers_count":386,"open_issues_count":92,"forks_count":219,"subscribers_count":77,"default_branch":"main","last_synced_at":"2025-05-12T21:49:40.511Z","etag":null,"topics":["isoforms","maf","perl","vcf","vep"],"latest_commit_sha":null,"homepage":null,"language":"Perl","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mskcc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2013-11-22T18:19:35.000Z","updated_at":"2025-04-15T01:09:17.000Z","dependencies_parsed_at":"2024-12-28T15:01:57.595Z","dependency_job_id":"23d7d804-397a-4b49-84f0-d3f45bfb0b40","html_url":"https://github.com/mskcc/vcf2maf","commit_stats":{"total_commits":339,"total_committers":19,"mean_commits":"17.842105263157894","dds":0.1327433628318584,"last_synced_commit":"f6d0c40cbe4578f4a4abb450b5da33e81900cc00"},"previous_names":[],"tags_count":34,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mskcc%2Fvcf2maf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mskcc%2Fvcf2maf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mskcc%2Fvcf2maf/releases","manifests_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mskcc%2Fvcf2maf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mskcc","download_url":"https://codeload.github.com/mskcc/vcf2maf/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254337612,"owners_count":22054253,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["isoforms","maf","perl","vcf","vep"],"created_at":"2024-09-24T19:48:37.450Z","updated_at":"2025-05-15T12:03:37.169Z","avatar_url":"https://github.com/mskcc.png","language":"Perl","funding_links":[],"categories":[],"sub_categories":[],"readme":"vcf\u003cimg src=\"https://i.giphy.com/R6X7GehJWQYms.gif\" width=\"28\"\u003emaf\n=======\n\nTo convert a [VCF](https://samtools.github.io/hts-specs//) into a [MAF](https://docs.gdc.cancer.gov/Data/File_Formats/MAF_Format), each variant must be mapped to only one of all possible gene transcripts/isoforms that it might affect. But even within a single isoform, a `Missense_Mutation` close enough to a `Splice_Site` can be labeled as either in MAF format, but not as both. **This selection of a single effect per variant is often subjective. And that's what this project attempts to standardize.** The `vcf2maf` and `maf2maf` scripts leave most of that responsibility to [Ensembl's VEP](http://ensembl.org/info/docs/tools/vep/index.html), but allow you to override their \"canonical\" isoforms, or use a custom ExAC VCF for annotation. 
Though the most useful feature is the **extensive support in parsing a wide range of crappy MAF-like or VCF-like formats** we've seen out in the wild.\n\nQuick start\n-----------\n\nFind the [latest release](https://github.com/mskcc/vcf2maf/releases), download it, and view the detailed usage manuals for `vcf2maf` and `maf2maf`:\n\n    export VCF2MAF_URL=`curl -sL https://api.github.com/repos/mskcc/vcf2maf/releases | grep -m1 tarball_url | cut -d\\\" -f4`\n    curl -L -o mskcc-vcf2maf.tar.gz $VCF2MAF_URL; tar -zxf mskcc-vcf2maf.tar.gz; cd mskcc-vcf2maf-*\n    perl vcf2maf.pl --man\n    perl maf2maf.pl --man\n\nIf you don't have VEP installed, then [follow this gist](https://gist.github.com/ckandoth/4bccadcacd58aad055ed369a78bf2e7c). Of the many annotators out there, VEP is preferred for its large team of active coders, and its CLIA-compliant [HGVS formats](http://www.hgvs.org/mutnomen/recs.html). After installing VEP, test out `vcf2maf` like this:\n\n    perl vcf2maf.pl --input-vcf tests/test.vcf --output-maf tests/test.vep.maf\n\nTo fill columns 16 and 17 of the output MAF with tumor/normal sample IDs, and to parse out genotypes and allele counts from matched genotype columns in the VCF, use options `--tumor-id` and `--normal-id`. Skip option `--normal-id` if you didn't have a matched normal:\n\n    perl vcf2maf.pl --input-vcf tests/test.vcf --output-maf tests/test.vep.maf --tumor-id WD1309 --normal-id NB1308\n\nVCFs from variant callers like [VarScan](http://varscan.sourceforge.net/somatic-calling.html#somatic-output) use hardcoded sample IDs TUMOR/NORMAL to name genotype columns. 
To have `vcf2maf` correctly locate the columns to parse genotypes, while still printing proper sample IDs in the output MAF, use options `--vcf-tumor-id` and `--vcf-normal-id`:\n\n    perl vcf2maf.pl --input-vcf tests/test_varscan.vcf --output-maf tests/test_varscan.vep.maf --tumor-id WD1309 --normal-id NB1308 --vcf-tumor-id TUMOR --vcf-normal-id NORMAL\n\nIf VEP is installed under `/opt/vep` and the VEP cache is under `/srv/vep`, there are options available to tell `vcf2maf` where to find them:\n\n    perl vcf2maf.pl --input-vcf tests/test.vcf --output-maf tests/test.vep.maf --vep-path /opt/vep --vep-data /srv/vep\n\nIf you want to skip running VEP and need a minimalist MAF-like file listing data from the input VCF only, then use the `--inhibit-vep` option. If your input VCF contains VEP annotation, then `vcf2maf` will try to extract it. But be warned that the accuracy of your resulting MAF depends on how VEP was operated upstream. In standard operation, `vcf2maf` runs VEP with very specific parameters to make sure everyone produces comparable MAFs. So, it is strongly recommended to avoid `--inhibit-vep` unless you know what you're doing.\n\nmaf2maf\n-------\n\nIf you have a MAF or a MAF-like file that you want to reannotate, then use `maf2maf`, which simply runs `maf2vcf` followed by `vcf2maf`:\n\n    perl maf2maf.pl --input-maf tests/test.maf --output-maf tests/test.vep.maf\n\nAfter tests on variant lists from many sources, `maf2vcf` and `maf2maf` are quite good at dealing with formatting errors or \"MAF-like\" files. They even support VCF-style alleles, as long as `Start_Position == POS`. But it's OK if the input format is imperfect. Any variants with a reference allele mismatch are kept aside in a separate file for debugging. 
The bare minimum columns that `maf2maf` expects as input are:\n\n    Chromosome\tStart_Position\tReference_Allele\tTumor_Seq_Allele2\tTumor_Sample_Barcode\n    1\t3599659\tC\tT\tTCGA-A1-A0SF-01\n    1\t6676836\tA\tAGC\tTCGA-A1-A0SF-01\n    1\t7886690\tG\tA\tTCGA-A1-A0SI-01\n\nSee `data/minimalist_test_maf.tsv` for a sampler. If `Tumor_Seq_Allele1` is also provided, it will be used to determine zygosity. Otherwise, `maf2maf` will try to determine zygosity from variant allele fractions, assuming that arguments `--tum-vad-col` and `--tum-depth-col` are set correctly to the names of the columns containing those read counts. Specifying the `Matched_Norm_Sample_Barcode` with its respective columns containing read counts is also strongly recommended. Columns containing normal allele read counts can be specified using arguments `--nrm-vad-col` and `--nrm-depth-col`.\n\nDocker\n------\n\nAssuming you have a recent version of docker, clone the main branch and build an image as follows:\n\n    git clone git@github.com:mskcc/vcf2maf.git\n    cd vcf2maf\n    docker build -t vcf2maf:main .\n    docker builder prune -f\n\nNow you can run the scripts in docker as follows:\n\n    docker run --rm vcf2maf:main perl vcf2maf.pl --help\n    docker run --rm vcf2maf:main perl maf2maf.pl --help\n\nTesting\n-------\n\nA small standalone test dataset was created by restricting VEP v112 cache/fasta to chr21 in GRCh38 and hosting that on a private server for download by CI services. 
We can manually fetch those as follows:\n\n    wget -P tests https://data.cyri.ac/Homo_sapiens.GRCh38.dna.chromosome.21.fa.gz\n    gzip -d tests/Homo_sapiens.GRCh38.dna.chromosome.21.fa.gz\n    wget -P tests https://data.cyri.ac/homo_sapiens_vep_112_GRCh38_chr21.tar.gz\n    tar -zxf tests/homo_sapiens_vep_112_GRCh38_chr21.tar.gz -C tests\n\nAnd the following scripts test the docker image on predefined inputs and compare outputs against expected outputs:\n\n    perl tests/vcf2maf.t\n    perl tests/vcf2vcf.t\n    perl tests/maf2vcf.t\n\nLicense\n-------\n\n    Apache-2.0 | Apache License, Version 2.0 | https://www.apache.org/licenses/LICENSE-2.0\n\nCitation\n--------\n\n    Cyriac Kandoth. mskcc/vcf2maf: vcf2maf v1.6. (2020). doi:10.5281/zenodo.593251\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmskcc%2Fvcf2maf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmskcc%2Fvcf2maf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmskcc%2Fvcf2maf/lists"}