{"id":13639250,"url":"https://github.com/brentp/vcfanno","last_synced_at":"2025-04-12T17:45:22.616Z","repository":{"id":31242188,"uuid":"34803627","full_name":"brentp/vcfanno","owner":"brentp","description":"annotate a VCF with other VCFs/BEDs/tabixed files ","archived":false,"fork":false,"pushed_at":"2023-11-23T09:13:28.000Z","size":38902,"stargazers_count":360,"open_issues_count":34,"forks_count":56,"subscribers_count":26,"default_branch":"master","last_synced_at":"2024-10-13T02:21:14.461Z","etag":null,"topics":["annotation","bioinformatics","genomics","vcf"],"latest_commit_sha":null,"homepage":"https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/brentp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2015-04-29T16:00:43.000Z","updated_at":"2024-10-02T00:49:18.000Z","dependencies_parsed_at":"2024-01-03T06:48:07.312Z","dependency_job_id":"db243dec-d9c7-4a0d-877b-ef04cfb7d743","html_url":"https://github.com/brentp/vcfanno","commit_stats":null,"previous_names":[],"tags_count":39,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brentp%2Fvcfanno","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brentp%2Fvcfanno/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brentp%2Fvcfanno/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brentp%2Fvcfanno/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/brentp","download_
url":"https://codeload.github.com/brentp/vcfanno/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248609541,"owners_count":21132915,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["annotation","bioinformatics","genomics","vcf"],"created_at":"2024-08-02T01:00:59.004Z","updated_at":"2025-04-12T17:45:22.592Z","avatar_url":"https://github.com/brentp.png","language":"Go","funding_links":[],"categories":["Variant Analysis and Manipulation","Next Generation Sequencing"],"sub_categories":["VCF File Utilities"],"readme":"vcfanno\n=======\n\u003c!--\nbuild:\n CGO_ENABLED=0 GOARCH=amd64 go build -o vcfanno_linux64 --ldflags '-extldflags \"-static\"' vcfanno.go\n GOOS=darwin GOARCH=amd64 CGO_ENABLED=0 go build -o vcfanno_osx --ldflags '-extldflags \"-static\"' vcfanno.go\n--\u003e\n\n\n[![Build Status](https://app.travis-ci.com/brentp/vcfanno.svg?branch=master)](https://app.travis-ci.com/brentp/vcfanno)\n[![Docs](https://img.shields.io/badge/docs-latest-blue.svg)](http://brentp.github.io/vcfanno/)\n\nIf you use vcfanno, please cite [the paper](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5)\n\nOverview\n========\n\nvcfanno allows you to quickly annotate your VCF with any number of INFO fields from any number of VCFs or BED files.\nIt uses a simple conf file to allow the user to specify the source annotation files and fields and how they will be\nadded to the info of the query VCF.\n\n+ For VCF, values are pulled by name from the INFO field with special-cases of *ID* and *FILTER* to pull from 
those VCF columns.\n+ For BED, values are pulled from (1-based) column number.\n+ For BAM, depth (`count`), \"mapq\" and \"seq\" are currently supported.\n\n`vcfanno` is written in [go](http://golang.org) and it supports custom user-scripts written in lua.\nIt can annotate more than 8,000 variants per second with 34 annotations from 9 files on a modest laptop and over 30K variants per second using 12 processes on a server.\n\nWe are actively developing `vcfanno` and appreciate feedback and bug reports.\n\n\u003cimg src=\"https://raw.githubusercontent.com/brentp/vcfanno/master/docs/img/vcfanno-overview-final.png\" width=\"676\" height=\"367\" /\u003e\n\nUsage\n=====\n\nAfter downloading the [binary for your system](https://github.com/brentp/vcfanno/releases/) (see section below) usage looks like:\n\n```Shell\n  ./vcfanno -lua example/custom.lua example/conf.toml example/query.vcf.gz\n```\n\nWhere conf.toml looks like:\n\n```\n[[annotation]]\nfile=\"ExAC.vcf\"\n# ID and FILTER are special fields that pull the ID and FILTER columns from the VCF\nfields = [\"AC_AFR\", \"AC_AMR\", \"AC_EAS\", \"ID\", \"FILTER\"]\nops=[\"self\", \"self\", \"min\", \"self\", \"self\"]\nnames=[\"exac_ac_afr\", \"exac_ac_amr\", \"exac_ac_eas\", \"exac_id\", \"exac_filter\"]\n\n[[annotation]]\nfile=\"fitcons.bed\"\ncolumns = [4, 4]\nnames=[\"fitcons_mean\", \"lua_sum\"]\n# note the 2nd op here is lua that has access to `vals`\nops=[\"mean\", \"lua:function sum(t) local sum = 0; for i=1,#t do sum = sum + t[i] end return sum / #t end\"]\n\n[[annotation]]\nfile=\"example/ex.bam\"\nnames=[\"ex_bam_depth\"]\nfields=[\"depth\", \"mapq\", \"seq\"]\nops=[\"count\", \"mean\", \"concat\"]\n```\n\nSo from `ExAC.vcf` we will pull the fields from the info field and apply the corresponding\n`operation` from the `ops` array. Users can add as many `[[annotation]]` blocks to the\nconf file as desired. 
Files can be local as above, or available via http/https.\n\nSee the additional usage section at the bottom for more.\n\n\nExample\n-------\n\nThe example directory contains the data and conf for a full example. To run, download\nthe [appropriate binary](https://github.com/brentp/vcfanno/releases/) for your system.\n\nThen, you can annotate with:\n\n```Shell\n./vcfanno -p 4 -lua example/custom.lua example/conf.toml example/query.vcf.gz \u003e annotated.vcf\n```\n\nAn example INFO field row before annotation (pos 98683):\n```\nAB=0.282443;ABP=56.8661;AC=11;AF=0.34375;AN=32;AO=45;CIGAR=1X;TYPE=snp\n```\n\nand after:\n```\nAB=0.2824;ABP=56.8661;AC=11;AF=0.3438;AN=32;AO=45;CIGAR=1X;TYPE=snp;AC_AFR=0;AC_AMR=0;AC_EAS=0;fitcons_mean=0.061;lua_sum=0.061\n```\n\nTypecasting values\n------------------\n\nBy default, using `ops` of `mean`,`max`,`sum`,`div2` or `min` will result in `type=Float`,\nusing `self` will get the type from the annotation VCF and other fields will have `type=String`.\nIt's possible to add field type info to the field name. To change the field type add `_int`\nor `_float` to the field name. This suffix will be parsed and removed, and your field\nwill be of the desired type. \n\nOperations\n==========\n\nIn most cases, we will have a single annotation entry for each entry (variant)\nin the query VCF, in which case the `self` op is the best choice. However, it is\npossible that there will be multiple annotations from a single annotation file--in\nthis case, the op determines how the many values are `reduced`. 
Valid operations are:\n\n + `lua:$lua` // see section below for more details\n + `self`     // pull directly from the annotation and handle multi-allelics\n + `concat`   // comma-delimited list of output\n + `count`    // count the number of overlaps\n + `div2`     // given two values a and b, return a / b\n + `first`   // take only the first value\n + `flag`   // presence/absence via VCF flag\n + `max`   // numbers only\n + `mean`   // numbers only\n + `min`   // numbers only\n + `sum`   // numbers only\n + `uniq`   // comma-delimited list of unique values\n + `by_alt`   // comma-delimited by alt (Number=A), pipe-delimited (|) for multiple annos for the same alt.\n\nThere are some operations that are only for `postannotation`:\n \n + `delete`   // remove fields from the query VCF's INFO\n + `setid`    // set the ID field of the query VCF with values from its INFO\n\nIn nearly all cases, **if you are annotating with a VCF, use `self`**\n\nNote that when the file is a BAM, the operation is determined by the field name ('seq', 'mapq', 'DP2', 'coverage' are supported).\n\nPostAnnotation\n==============\nOne of the most powerful features of `vcfanno` is the embedded scripting language, lua, combined with *postannotation*.\n`[[postannotation]]` blocks are applied after all the annotations have been applied. They are similar to `[[annotation]]` blocks, but their `fields`\nentry requests fields from the INFO of the query file (including the new fields added during annotation). For example,\nif we have AC and AN fields indicating the alternate count and the number of chromosomes, respectively, we could create\na new allele frequency field, *AF*, with this block:\n\n```\n[[postannotation]]\nfields=[\"AC\", \"AN\"]\nop=\"lua:AC / AN\"\nname=\"AF\"\ntype=\"Float\"\n```\n\nwhere `type` is one of the types accepted in VCF format, `name` is the name of the field that is created, `fields`\nindicates the fields (from the INFO) that will be available to the op, and `op` indicates the action to perform. 
This can be quite\npowerful. For an extensive example that demonstrates the utility of this type of approach, see\n[docs/examples/clinvar_exac.md](http://brentp.github.io/vcfanno/examples/clinvar_exac/).\n\nA user can set the ID field of the VCF in a `[[postannotation]]` block by using `name=\"ID\"`. For example:\n\n```\n[[postannotation]]\nname=\"ID\"\nfields=[\"other_field\", \"ID\"]\nop=\"lua:other_field .. ';' .. ID\"\ntype=\"String\"\n```\n\nwill take the value in `other_field`, concatenate it with the existing ID, and set the ID to that value.\n\nSee the `setid` function in `example/custom.lua` for a more robust method of doing this.\n\nAdditional Usage\n================\n\n-ends\n-----\n\nFor annotating large variants, such as CNVs or structural variants (SVs), it can be useful to\nannotate the *ends* of the variant in addition to the region itself. To do this, specify the `-ends`\nflag to `vcfanno`, e.g.:\n```Shell\nvcfanno -ends example/conf.toml example/query.vcf.gz\n```\nIn this case, the `names` field in the `conf` file contains \"fitcons\\_mean\". The output will contain\n`fitcons_mean` as before along with `left_fitcons_mean` and `right_fitcons_mean` for any variants\nthat are longer than 1 base. The *left* end will be for the single base at the lowest-numbered base of the variant\nand the *right* end will be for the single base at the highest-numbered base of the variant.\n\n-permissive-overlap\n-------------------\n\nBy default, when annotating with a VCF, in addition to the overlap requirement, the variants must share\nthe same position, the same reference allele and at least one alternate allele (this is only used for\nvariants, not for BED/BAM annotations). If this flag is specified, only overlap testing is used and shared\nREF/ALT are not required.\n\n-p\n--\n\nSet to the number of processes that `vcfanno` can use during annotation. `vcfanno` parallelizes well\nup to 15 or so cores.\n\n-lua\n----\n\nCustom ops (lua). 
For use when the built-in `ops` don't supply the needed reduction.\n\nWe embed the lua engine [gopher-lua](https://github.com/yuin/gopher-lua) so that it's\npossible to create a custom op if it is not provided. For example, if the user wants to sum the values, the op could be:\n\n    \"lua:function sum(t) local sum = 0; for i=1,#t do sum = sum + t[i] end return sum end\"\n\nwhere the last value (in this case sum) is returned as the annotation value. It is encouraged\nto instead define lua functions in a separate `.lua` file and point to it when calling\n`vcfanno` using the `-lua` flag. So, in an external file, \"some.lua\", instead put:\n\n```lua\nfunction sum(t)\n    local sum = 0\n    for i=1,#t do\n        sum = sum + t[i]\n    end\n    return sum\nend\n```\n\nAnd then the above custom op would be: \"lua:sum(vals)\". (Note that there's a sum op provided\nby `vcfanno` which will be faster.)\n\nThe variables `vals`, `chrom`, `start`, `stop`, `ref`, `alt` from the current\nvariant will all be available in the lua code. `alt` will be a table with length\nequal to the number of alternate alleles. Example usage could be:\n```\nop=\"lua:ref .. '/' .. 
alt[1]\"\n```\n\n\nSee [example/conf.toml](https://github.com/brentp/vcfanno/blob/master/example/conf.toml)\nand [example/custom.lua](https://github.com/brentp/vcfanno/blob/master/example/custom.lua)\nfor more examples.\n\nMailing List\n============\n[Mailing List](https://groups.google.com/forum/#!forum/vcfanno)[![Mailing List](http://www.google.com/images/icons/product/groups-32.png)](https://groups.google.com/forum/#!forum/vcfanno)\n\nInstallation\n============\n\nPlease download a static binary (executable) from [here](https://github.com/brentp/vcfanno/releases) and copy it into a directory on your `$PATH`.\nThere are no dependencies.\n\nIf you use [bioconda](https://bioconda.github.io/), you can install with: `conda install -c bioconda vcfanno`\n\n\nMulti-Allelics\n==============\n\nA multi-allelic variant is simply a site where multiple non-reference alleles are seen in the population. These will\nappear as e.g. `REF=\"A\", ALT=\"G,C\"`. As of version 0.2, `vcfanno` will handle these fully with op=\"self\" when the Number from\nthe VCF header is A (Number=A).\n\nFor example, this table lists the ALT columns of the query and annotation (assuming the REFs and positions match) along with the values from\nthe annotation, and shows how the query INFO will be filled:\n\n| query ALTS  | anno ALTS | anno vals from INFO  | result |\n| ------ | ---- | ---------- | ------- |\n| C,G    | C,G  | 22,23      | 22,23   |\n| C,G    | C,T  | 22,23      | 22,.    |\n| C,G    | T,G  | 22,23      | .,23    |\n| G,C    | C,G  | 22,23      | 23,22   |\n| C,G    | C    | YYY        | YYY,.   |\n| G,C,T  | C    | YYY        | .,YYY,. |\n| C,T    | G    | YYY        | .,.     |\n| T,C    | C,T  | AA,BB      | BB,AA   |\n\nNote the flipped values in the result column, and that values that are not present in the annotation are filled with '.' 
as a place-holder.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrentp%2Fvcfanno","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbrentp%2Fvcfanno","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrentp%2Fvcfanno/lists"}