{"id":13751422,"url":"https://github.com/vgteam/vg","last_synced_at":"2026-02-09T23:06:44.188Z","repository":{"id":21409794,"uuid":"24727800","full_name":"vgteam/vg","owner":"vgteam","description":"tools for working with genome variation graphs","archived":false,"fork":false,"pushed_at":"2026-02-06T03:54:32.000Z","size":171934,"stargazers_count":1295,"open_issues_count":891,"forks_count":214,"subscribers_count":41,"default_branch":"master","last_synced_at":"2026-02-06T05:18:53.469Z","etag":null,"topics":["dna","genome-graph","genomics","graph","variation-graph"],"latest_commit_sha":null,"homepage":"https://biostars.org/tag/vg/","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vgteam.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2014-10-02T16:54:27.000Z","updated_at":"2026-02-06T03:26:22.000Z","dependencies_parsed_at":"2024-03-25T16:30:08.913Z","dependency_job_id":"f7f9d4e4-c7d5-4a1a-ae37-9355163c249a","html_url":"https://github.com/vgteam/vg","commit_stats":{"total_commits":12762,"total_committers":81,"mean_commits":"157.55555555555554","dds":0.6672935276602414,"last_synced_commit":"8494a52db473416cbb41230d49f12ede8604cfd5"},"previous_names":["ekg/vg"],"tags_count":87,"template":false,"template_full_name":null,"purl":"pkg:github/vgteam/vg","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vgteam%2Fvg","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vgteam%2Fvg/tags
","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vgteam%2Fvg/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vgteam%2Fvg/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vgteam","download_url":"https://codeload.github.com/vgteam/vg/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vgteam%2Fvg/sbom","scorecard":{"id":919640,"data":{"date":"2025-08-11","repo":{"name":"github.com/vgteam/vg","commit":"c462d7fce601e8e926e584d2c574685691f6235c"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":4.4,"checks":[{"name":"Code-Review","score":1,"reason":"Found 1/8 approved changesets -- score normalized to 1","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Maintained","score":10,"reason":"30 commit(s) and 8 issue activity found in the last 90 days -- score normalized to 10","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action 
workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Warn: no topLevel permission defined: .github/workflows/testmac.yml:1","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"License","score":9,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Warn: project license file does not contain an FSF or OSI license."],"documentation":{"short":"Determines if the 
project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Signed-Releases","score":0,"reason":"Project has not signed or included provenance with any releases.","details":["Warn: release artifact v1.67.0 not signed: https://api.github.com/repos/vgteam/vg/releases/232324497","Warn: release artifact v1.66.0 not signed: https://api.github.com/repos/vgteam/vg/releases/222478958","Warn: release artifact v1.65.0 not signed: https://api.github.com/repos/vgteam/vg/releases/213785836","Warn: release artifact v1.64.1 not signed: https://api.github.com/repos/vgteam/vg/releases/207960248","Warn: release artifact v1.64.0 not signed: https://api.github.com/repos/vgteam/vg/releases/204822240","Warn: release artifact v1.67.0 does not have provenance: https://api.github.com/repos/vgteam/vg/releases/232324497","Warn: release artifact v1.66.0 does not have provenance: https://api.github.com/repos/vgteam/vg/releases/222478958","Warn: release artifact v1.65.0 does not have provenance: https://api.github.com/repos/vgteam/vg/releases/213785836","Warn: release artifact v1.64.1 does not have provenance: https://api.github.com/repos/vgteam/vg/releases/207960248","Warn: release 
artifact v1.64.0 does not have provenance: https://api.github.com/repos/vgteam/vg/releases/204822240"],"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 30 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Info: Possibly incomplete results: error parsing shell code: not a valid arithmetic operator: seq: test/t/05_vg_find.t:0","Info: Possibly incomplete results: error parsing shell code: not a valid arithmetic operator: seq: test/t/16_vg_msga.t:0","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/testmac.yml:22: update your workflow using https://app.stepsecurity.io/secureworkflow/vgteam/vg/testmac.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/testmac.yml:39: update your workflow using https://app.stepsecurity.io/secureworkflow/vgteam/vg/testmac.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/testmac.yml:42: update your workflow using https://app.stepsecurity.io/secureworkflow/vgteam/vg/testmac.yml/master?enable=pin","Warn: containerImage not pinned by hash: 
Dockerfile:6","Warn: containerImage not pinned by hash: Dockerfile:17","Warn: containerImage not pinned by hash: Dockerfile:50","Warn: containerImage not pinned by hash: Dockerfile:83","Warn: containerImage not pinned by hash: Dockerfile:105","Warn: containerImage not pinned by hash: Dockerfile.static:3: pin your Docker image by updating ubuntu:22.04 to ubuntu:22.04@sha256:1aa979d85661c488ce030ac292876cf6ed04535d3a237e49f61542d8e5de5ae0","Warn: downloadThenRun not pinned by hash: Dockerfile:88","Warn: npmCommand not pinned by hash: Dockerfile:88","Warn: npmCommand not pinned by hash: doc/test-docs.sh:10","Warn: pipCommand not pinned by hash: vgci/post-report:217","Warn: pipCommand not pinned by hash: vgci/vgci.sh:318","Warn: pipCommand not pinned by hash: vgci/vgci.sh:319","Warn: pipCommand not pinned by hash: vgci/vgci.sh:345","Warn: pipCommand not pinned by hash: vgci/vgci.sh:350","Warn: pipCommand not pinned by hash: vgci/vgci.sh:354","Warn: pipCommand not pinned by hash: vgci/vgci.sh:365","Warn: pipCommand not pinned by hash: vgci/vgci.sh:456","Warn: npmCommand not pinned by hash: .github/workflows/testmac.yml:64","Info:   0 out of   3 GitHub-owned GitHubAction dependencies pinned","Info:   0 out of   3 npmCommand dependencies pinned","Info:   0 out of   8 pipCommand dependencies pinned","Info:   0 out of   6 containerImage dependencies pinned","Info:   0 out of   1 downloadThenRun dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build 
process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}}]},"last_synced_at":"2025-08-25T00:30:42.406Z","repository_id":21409794,"created_at":"2025-08-25T00:30:42.406Z","updated_at":"2025-08-25T00:30:42.406Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29284863,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-09T21:57:15.303Z","status":"ssl_error","status_checked_at":"2026-02-09T21:57:11.537Z","response_time":56,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dna","genome-graph","genomics","graph","variation-graph"],"created_at":"2024-08-03T09:00:44.625Z","updated_at":"2026-02-09T23:06:44.181Z","avatar_url":"https://github.com/vgteam.png","language":"C++","funding_links":[],"categories":["C++","A list of software capable of analyzing mainly **eukaryotic** genomes for pangenomics.","Genomics Software","Ranked by starred repositories"],"sub_categories":["Articles and References"],"readme":"\u003c!-- !test program bash -eo pipefail --\u003e\n# vg\n\n[![Join the chat at https://gitter.im/vgteam/vg](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/vgteam/vg?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge\u0026utm_content=badge) [![Latest 
Release](https://img.shields.io/github/release/vgteam/vg.svg)](https://github.com/vgteam/vg/releases/latest) \n[![Doxygen API Documentation](https://img.shields.io/badge/doxygen-docs-firebrick.svg)](https://vgteam.github.io/vg/)\n[![vg man page](https://img.shields.io/badge/manpage-seagreen.svg)](https://github.com/vgteam/vg/wiki/vg-manpage)\n\n## variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods\n\n![Variation graph](https://raw.githubusercontent.com/vgteam/vg/master/doc/figures/vg_logo_small.png)\n\n_Variation graphs_ provide a succinct encoding of the sequences of many genomes. A variation graph (in particular as implemented in vg) is composed of:\n\n* _nodes_, which are labeled by sequences and ids\n* _edges_, which connect two nodes via either of their respective ends\n* _paths_, which describe genomes, sequence alignments, and annotations (such as gene models and transcripts) as walks through nodes connected by edges\n\nThis model is similar to sequence graphs that have been used in assembly and multiple sequence alignment.\n\nPaths provide coordinate systems relative to genomes encoded in the graph, allowing stable mappings to be produced even if the structure of the graph is changed.\nThe variation graph model makes this embedding explicit and essential.\nTools in vg maintain paths as immutable during transformations of the graph.\nThey use paths to project graph-relative data into reference-relative coordinate spaces.\nPaths provide stable coordinates for graphs built in different ways from the same input sequences.\n\n![example variation graph](https://raw.githubusercontent.com/vgteam/vg/master/doc/figures/smallgraph.png)\n\n## Citing VG\n\nPlease cite:\n\n* [The VG Paper](https://doi.org/10.1038/nbt.4227) when using `vg`\n* [The VG Giraffe Paper](https://doi.org/10.1126/science.abg8871) when using `vg giraffe`\n* [The Long Read Giraffe Paper](https://doi.org/10.1101/2025.09.29.678807) when using `vg 
giraffe`'s chaining modes (`hifi`, `r10`, `chaining-sr`)\n* [The VG Call Paper](https://doi.org/10.1186/s13059-020-1941-7) when SV genotyping with `vg call`\n* [The GBZ Paper](https://doi.org/10.1093/bioinformatics/btac656) when using GBZ\n* [The HPRC Paper](https://doi.org/10.1038/s41586-023-05896-x) when using `vg deconstruct`\n* [The Snarls Paper](https://doi.org/10.1089/cmb.2017.0251) when using `vg snarls`\n* [The Personalized Pangenome Paper](https://doi.org/10.1101/2023.12.13.571553) when using `vg haplotypes` and/or `vg giraffe --haplotype-name`\n\n## Support \n\nWe maintain a support forum on biostars: https://www.biostars.org/tag/vg/\n\n## Installation\n\n### Download Releases\n\nThe easiest way to get vg is to download one of our release builds for Linux. We have a 6-week release cadence, so our builds are never too far out of date.\n\n**[![Download Button](doc/figures/download-linux.png)](https://github.com/vgteam/vg/releases/latest)**  \n**[Download the latest vg release for Linux](https://github.com/vgteam/vg/releases/latest)**\n\n**For MacOS**, see [Building on MacOS](#building-on-macos).\n\n### Building on Linux\n\nIf you don't want to or can't use a pre-built release of vg, or if you want to become a vg developer, you can build it from source instead.\n\n#### Linux: Clone VG\n\nFirst, obtain the repo and its submodules:\n\n    git clone --recursive https://github.com/vgteam/vg.git\n    cd vg\n\n#### Linux: Install Dependencies\n    \nThen, install VG's dependencies. 
You'll need the Protobuf and Jansson development libraries installed, and to run the tests you will need:\n* `jq`, `bc`, `rs`, and `parallel`\n* `hexdump` and `column` from `bsdmainutils`\n* [`npm` for testing documentation examples](https://github.com/anko/txm).\n\nOn Ubuntu 22.04, you should be able to do:\n\n    make get-deps\n\nIf you get complaints that `sudo` is not found, install it:\n\n    apt update\n    apt install sudo\n\nIf you get a bunch of errors like `E: Unable to locate package build-essential`, make sure your package index files are up to date by running:\n\n    sudo apt update\n    \nOn other distros, or if you do not have root access, you will need to perform the equivalent of:\n\n    sudo apt-get install build-essential git cmake pkg-config libncurses-dev libbz2-dev  \\\n                         protobuf-compiler libprotoc-dev libprotobuf-dev libjansson-dev \\\n                         automake gettext autopoint libtool jq bsdmainutils bc rs parallel \\\n                         npm curl unzip redland-utils librdf-dev bison flex gawk lzma-dev \\\n                         liblzma-dev liblz4-dev libffi-dev libcairo-dev libboost-all-dev \\\n                         libzstd-dev pybind11-dev python3-pybind11 libssl-dev\n                         \nAt present, you will need GCC version 9 or greater, with support for C++17, to compile vg. (Check your version with `gcc --version`.) GCC up to 11.4.0 is supported.\n\nOther libraries may be required. Please report any build difficulties.\n\nNote that a 64-bit OS is required.\n\n#### Linux: Build\n\nWhen you are ready, build with `make`. You can use `make -j16` to run 16 build threads at a time, which greatly accelerates the process. If you have more CPU cores, you can use higher numbers.\n\nNote that vg can take anywhere from 10 minutes to more than an hour to compile depending on your machine and the number of threads used. 
\n\nYou can also produce a static binary with `make static`, assuming you have static versions of all the dependencies installed on your system.\n\n#### Linux: Run\n\nOnce vg is built, the binary will be at `bin/vg` inside the vg repository directory. You can run it with:\n\n```\n./bin/vg\n```\n\nYou can also add its directory to your `PATH` environment variable, so that you can invoke `vg` from any directory. To do that on Bash, use this command from the vg repository directory:\n\n```\necho 'export PATH=\"${PATH}:'\"$(pwd)\"'/bin\"' \u003e\u003e~/.bashrc\n```\n\nThen close your terminal and open a new one. Run `vg` to make sure it worked.\n\nIf it did not work, make sure that you have a `.bash_profile` file in your home directory that will run your `.bashrc`:\n```\nif [ -f ~/.bashrc ]; then\n   source ~/.bashrc\nfi\n```\n\n### Building on MacOS\n\n#### Mac: Clone VG\n\nThe first step is to clone the vg repository:\n\n    git clone --recursive https://github.com/vgteam/vg.git\n    cd vg\n\n#### Mac: Install Dependencies\n\nVG depends on a number of packages being installed on the system where it is being built. Dependencies can be installed using either [MacPorts](https://www.macports.org/install.php) or [Homebrew](http://brew.sh/).\n\n##### Using MacPorts\n\nYou can use MacPorts to install VG's dependencies:\n\n    sudo port install libtool protobuf3-cpp jansson jq cmake pkgconfig autoconf automake libtool coreutils samtools redland bison gperftools md5sha1sum rasqal gmake autogen cairo libomp boost zstd pybind11 openssl\n    \n\n##### Using Homebrew\n\nHomebrew provides another package management solution for OSX, and may be preferable to some users over MacPorts. 
VG ships a `Brewfile` describing its Homebrew dependencies, so from the root vg directory, you can install dependencies, and expose them to vg, like this:\n\n    # Install all the dependencies in the Brewfile\n    brew bundle\n    \n#### Mac: Build\n\nWith dependencies installed, VG can now be built:\n\n    make\n\nAs with Linux, you can add `-j16` or other numbers at the end to run multiple build tasks at once, if your computer can handle them.\n    \n**Note that static binaries cannot yet be built for Mac.**\n\nThe vg Mac build targets whatever the current version of Apple Clang is, and whatever version of Apple Clang is provided by our GitHub Actions Mac CI system. If your Clang is up to date and vg does not build for you, please open an issue.\n\n#### Mac: Run\n\nOnce vg is built, the binary will be at `bin/vg` inside the vg repository directory. You can run it with:\n\n```\n./bin/vg\n```\n\nYou can also add its directory to your `PATH` environment variable, so that you can invoke `vg` from any directory. To do that on the default `zsh` Mac shell, use this command from the vg repository directory:\n\n```\necho 'export PATH=\"${PATH}:'\"$(pwd)\"'/bin\"' \u003e\u003e~/.zshrc\n```\n\nThen close your terminal and open a new one. Run `vg` to make sure it worked.\n\n##### Migrate a VG installation from x86 to ARM\n\nThe Mac platform is moving to ARM, with Apple's M1, M1 Pro, M1 Max, and subsequent chip designs. The vg codebase supports ARM on Mac as well as on Linux. **The normal installation instructions work on a factory-fresh ARM Mac**.\n\nHowever, it is easy to run into problems when **migrating a working vg build environment** or **migrating MacPorts or Homebrew** from x86_64 to ARM. The ARM machine can successfully run x86_64 tools installed via MacPorts or Homebrew on the old machine, but vg can only build properly on ARM if you are using ARM versions of the build tools, like `make` and CMake.\n\nSo, after migrating to an ARM Mac using e.g. 
Apple's migration tools:\n\n1. Uninstall MacPorts and its packages, if they were migrated from the old machine. Only an ARM MacPorts install can be used to provide dependencies for vg on ARM.\n2. Uninstall Homebrew and its packages, if they were migrated. Similarly, only an ARM Homebrew install will work.\n3. Reinstall one of MacPorts or Homebrew. Make sure to use the M1 or ARM version.\n4. Use the package manager you installed to install system dependencies of vg, such as CMake, [as documented above](#install-dependencies).\n5. Clean vg with `make clean`. This *should* remove all build artefacts.\n6. Build vg again with `make`.\n\nIf you still experience build problems after this, delete the whole checkout and check out the code again; `make clean` is not under CI test and is not always up to date with the rest of the build system.\n\nWhether or not that helps, please then [open an issue](https://github.com/vgteam/vg/issues/new) so we can help fix the build or fix `make clean`.\n\n## Usage\n\n### Variation graph construction\n\n#### From VCF\n\n\u003e [!NOTE]\n\u003e See the `vg autoindex` examples below for how to use that tool in place of `vg construct` to build and index graphs in a single step.\n\nOne way to build a graph with `vg` is to `construct` it from variant calls using a reference FASTA file and VCF file. If you're working in vg's `test/` directory:\n\n\u003c!-- !test check Construct the small graph --\u003e\n```sh\nvg construct -r small/x.fa -v small/x.vcf.gz \u003ex.vg\n```\n\nNote that to build a graph, an index of the VCF file is required. The VCF index file can be generated using the `tabix` command provided by SAMtools (e.g. 
`tabix -p vcf x.vcf.gz` on the command line).\n\n#### From Assemblies\n\nYou can also build a graph (and indexes for mapping with vg) from a set of genome assemblies (FASTA), as opposed to variant calls as described above, using [Minigraph-Cactus](https://github.com/ComparativeGenomicsToolkit/cactus/blob/master/doc/pangenome.md).\n\n### Importing and exporting different graph formats\n\n`vg` supports [many formats](https://github.com/vgteam/vg/wiki/File-Formats), the three most important are:\n\n* `PackedGraph (.vg)` : This is `vg`'s native format. It supports edits of all kinds (to topology and paths), but can be inefficient at large scales, especially with many paths.\n* `GFA (.gfa)` : [GFA](https://github.com/GFA-spec/GFA-spec) is a standard text-based format and usually the best way to exchange graphs between `vg` and other pangenome tools. `vg` can also operate on (**uncompressed**) GFA files directly, by way of using a `PackedGraph` representation in memory (and therefore sharing that format's scaling concerns and edit-ability).\n* `GBZ (.gbz)` : [GBZ](https://github.com/jltsiren/gbwtgraph/blob/master/SERIALIZATION.md) is a highly-compressed format that uses much less space to store paths than the above formats, but at the cost of not allowing general edits to the graph.\n\nYou can query the format of any graph using `vg stats -F`.\n\n#### Importing\n\nIn general, you will build and index `vg` graphs using `vg autoindex` (from GFA or VCF) or Minigraph-Cactus (FASTAs). You can also import `GFA` files from other tools such as [ODGI](https://github.com/pangenome/odgi) and [PGGB](https://github.com/pangenome/pggb) using `vg convert -g`.\n\n#### Exporting\n\nYou can convert any graph to `GFA` using `vg convert -f`.  By default, `vg` uses [GFA v1.1](https://github.com/GFA-spec/GFA-spec/blob/master/GFA1.md#w-walk-line-since-v11) where paths are represented as W-lines. 
To use P-lines instead (GFA v1.0), use `vg convert -fW`.\n\n#### Path Types\n\nThe `GBZ` format makes a distinction between `REFERENCE` and `HAPLOTYPE` paths. `REFERENCE` paths can be used as coordinate systems but are more expensive to store. `HAPLOTYPE` paths are highly compressed but cannot be used for position lookups. In the [HPRC](https://github.com/human-pangenomics/hpp_pangenome_resources/) graphs for example, contigs from `GRCh38` and `CHM13(T2T)` are `REFERENCE` paths and all other samples are `HAPLOTYPE` paths.\n\nThe distinction between `REFERENCE` and `HAPLOTYPE` paths is carried over into the other formats such as `.vg` and `.gfa` to facilitate conversion and inter-operation. In `.gfa`, `REFERENCE` paths are P-Lines, or W-lines whose sample names are flagged in the header. W-lines whose names are not flagged in the header are `HAPLOTYPE` paths. In `.vg` they are denoted using a naming convention.  \n\nSee the [Path Metadata WIKI](https://github.com/vgteam/vg/wiki/Path-Metadata-Model) for more details.\n\n\u003e [!WARNING]\n\u003e `GBZ` is the only format that supports efficiently loading large numbers of `HAPLOTYPE` paths in `vg`.  You may run into issues trying to load whole-genome graphs with thousands of `HAPLOTYPE` paths from `.vg` or `.gfa` files.  `vg convert -H` can be used to drop `HAPLOTYPE` paths, allowing the graph to be more easily loaded in other formats. 
\n\n### Viewing\n\n\u003e [!NOTE]\n\u003e It is best to use the newer `vg convert` tool (described above) for GFA conversion\n\n`vg view` provides a way to convert the graph into various formats:\n\n\u003c!-- !test check Convert the small graph to different formats --\u003e\n```sh\n# GFA output\nvg view x.vg \u003ex.gfa\n\n# dot output suitable for graphviz\nvg view -d x.vg \u003ex.dot\n\n# And if you have a GAM file\ncp small/x-s1337-n1.gam x.gam\n\n# json version of binary alignments\nvg view -a x.gam \u003ex.json\n```\n\n### Mapping\n\nIf you have more than one sequence, or you are working on a large graph, you will want to map rather than merely aligning.\n\nThere are multiple read mappers in `vg`:\n\n* `vg giraffe` is designed to be fast for highly accurate short reads, against graphs with haplotype information. It also now has a chaining mode to use for long reads.\n* `vg map` is a general-purpose read mapper.\n* `vg mpmap` does \"multi-path\" mapping, to allow describing local alignment uncertainty. [This is useful for transcriptomics.](#Transcriptomic-analysis)\n\nThe graph alignment output format of these mappers (GAM/GAMP) may be [QC'ed by `vg filter --tsv-out`](https://github.com/vgteam/vg/wiki/Getting-alignment-statistics-with-vg-filter).\n\n#### Mapping with `vg giraffe`\n\nTo use `vg giraffe` to map reads, you will first need to prepare indexes. This is best done using `vg autoindex`. 
In order to get `vg autoindex` to use haplotype information from a VCF file, you can give it the VCF and the associated linear reference directly.\n\n\u003c!-- !test check Simulate and map back with surjection with Giraffe --\u003e\n```sh\n# construct the graph and indexes (paths below assume running from `vg/test` directory)\nvg autoindex --workflow giraffe -r small/x.fa -v small/x.vcf.gz -p x\n\n# simulate a bunch of 150bp reads from the graph, into a GAM file of reads aligned to a graph\nvg sim -n 1000 -l 150 -x x.giraffe.gbz -a \u003e x.sim.gam\n# now re-map these reads against the graph, and get BAM output in linear space\n# FASTQ input uses -f instead of -G.\nvg giraffe -Z x.giraffe.gbz -G x.sim.gam -o BAM \u003e aln.bam\n```\n\n[More information on using `vg giraffe` can be found on the `vg` wiki.](https://github.com/vgteam/vg/wiki/Mapping-short-reads-with-Giraffe)\n\n#### Mapping with `vg map`\n\nIf your graph is large, you will want to use `vg index` to store the graph and `vg map` to align reads. `vg map` implements a kmer based seed and extend alignment model that is similar to that used in aligners like novoalign or MOSAIK. First an on-disk index is built with `vg index` which includes the graph itself and kmers of a particular size. 
When mapping, any kmer size shorter than that used in the index can be employed, and by default the mapper will decrease the kmer size to increase sensitivity when alignment at a particular _k_ fails.\n\n\u003c!-- !test check Simulate and map back with surjection with map --\u003e\n```sh\n# construct the graph (paths below assume running from `vg/test` directory)\nvg construct -r small/x.fa -v small/x.vcf.gz \u003e x.vg\n\n# store the graph in the xg/gcsa index pair\nvg index -x x.xg -g x.gcsa -k 16 x.vg\n\n# align a read to the indexed version of the graph\n# note that the graph file is not opened, but x.vg.index is assumed\nvg map -s CTACTGACAGCAGAAGTTTGCTGTGAAGATTAAATTAGGTGATGCTTG -x x.xg -g x.gcsa \u003e read.gam\n\n# simulate a bunch of 150bp reads from the graph, one per line\nvg sim -n 1000 -l 150 -x x.xg \u003e x.sim.txt\n# now map these reads against the graph to get a GAM\nvg map -T x.sim.txt -x x.xg -g x.gcsa \u003e aln.gam\n\n# surject the alignments back into the reference space of sequence \"x\", yielding a BAM file\nvg surject -x x.xg -b aln.gam \u003e aln.bam\n\n# or alternatively, surject them to BAM in the call to map\nvg sim -n 1000 -l 150 -x x.xg \u003e x.sim.txt\nvg map -T x.sim.txt -x x.xg -g x.gcsa --surject-to bam \u003e aln.bam\n```\n\n### Augmentation\n\nVariation from alignments can be embedded back into the graph.  This process is called augmentation and can be used for *de novo* variant calling, for example (see below).\n\n\u003e [!WARNING]\n\u003e Using `vg augment` for variant calling remains very experimental. It is not at all recommended for structural variant calling, and even for small variants, you will often get much more accurate results (at least on human) by projecting your alignment to BAM and running a linear variant caller such as DeepVariant. \n\n\u003c!-- !test check Augment a graph --\u003e\n```sh\n# augment the graph with all variation from the GAM except that implied by soft clips, saving to aug.vg.  
aug.gam contains the same reads as aln.gam but mapped to aug.vg\nvg augment x.vg aln.gam -A aug.gam \u003e aug.vg\n\n# augment the graph with all variation from the GAM, saving each mapping as a path in the graph.\n# softclips of alignment paths are preserved (`-S`).\n# Note, this can be much less efficient than the above example if there are many alignments in the GAM\nvg augment x.vg aln.gam -i -S \u003e aug_with_paths.vg\n```\n\n### Variant Calling\n\n\u003e [!NOTE]\n\u003e More information can be found in the [wiki](https://github.com/vgteam/vg/wiki/SV-Genotyping-and-variant-calling).\n\n#### Calling variants using read support\n\nThe following examples show how to generate a VCF with vg using read support.  They depend on output from the Mapping and Augmentation examples above.  Small variants and SVs can be called using the same approach.  **Currently, it is more accurate for SVs**.  \n\nCall only variants that are present in the graph:\n\n\u003c!-- !test check Pack and call --\u003e\n```sh\n# Compute the read support from the GAM\n# -Q 5: ignore mapping and base quality \u003c 5\nvg pack -x x.xg -g aln.gam -Q 5  -o aln.pack\n\n# Generate a VCF from the support.  \nvg call x.xg -k aln.pack \u003e graph_calls.vcf\n```\n\nBy default, `vg call` omits `0/0` variants and tries to normalize alleles to make the VCF more compact.  Both these steps can make it difficult to compare the outputs from different samples as the VCFs will have different coordinates even though they were created using the same graph.  The `-a` option addresses this by calling every snarl using the same coordinates and including reference calls.  Outputs for different samples can be combined with `bcftools merge -m all`.   
\n\u003c!-- !test check Call from pack without normalizing --\u003e\n```sh\nvg call x.xg -k aln.pack -a \u003e snarl_genotypes.vcf\n```\n\nIn order to also consider *novel* variants from the reads, use the augmented graph and GAM (as created in the \"Augmentation\" example using `vg augment -A`):\n\n\u003e [!WARNING]\n\u003e Using `vg augment` for variant calling remains very experimental. It is not at all recommended for structural variant calling, and even for small variants, you will often get much more accurate results (at least on human) by projecting your alignment to BAM and running a linear variant caller such as DeepVariant. \n\n\u003c!-- !test check Call from augmentation --\u003e\n```sh\n# Index our augmented graph\nvg index aug.vg -x aug.xg\n\n# Compute the read support from the augmented GAM (ignoring quality \u003c 5, and 1st and last 5bp of each read)\nvg pack -x aug.xg -g aug.gam -Q 5 -s 5 -o aln_aug.pack\n\n# Generate a VCF from the support\nvg call aug.xg -k aln_aug.pack \u003e calls.vcf\n```\n\nA similar process can be used to *genotype* known variants from a VCF. 
To do this, the graph must be constructed from the VCF with `vg construct -a` (graphs from other sources such as `vg autoindex` and Minigraph-Cactus cannot be used):\n\n\u003c!-- !test check Genotype --\u003e\n```sh\n# Re-construct the same graph as before but with `-a`\nvg construct -r small/x.fa -v small/x.vcf.gz -a \u003e xa.vg\n\n# Index the graph with `-L` to preserve alt paths in the xg\nvg index xa.vg -x xa.xg -L\n\n# Compute the support (we could also reuse aln.pack from above)\nvg pack -x xa.xg -g aln.gam -o aln.pack\n\n# Genotype the VCF (use -v)\nvg call xa.xg -k aln.pack -v small/x.vcf.gz \u003e genotypes.vcf\n```\n\nPre-filtering the GAM before computing support can improve precision of SNP calling:\n\n\u003c!-- !test check Pre-filter GAM and call --\u003e\n```sh\n# filter secondary and ambiguous read mappings out of the GAM\nvg filter aln.gam -r 0.90 -fu -m 1 -q 15 -D 999 -x x.xg \u003e aln.filtered.gam\n\n# then compute the support from aln.filtered.gam instead of aln.gam in above etc.\nvg pack -x xa.xg -g aln.filtered.gam -o aln.pack\nvg call xa.xg -k aln.pack -v small/x.vcf.gz \u003e genotypes.vcf\n```\n\nFor larger graphs, it is recommended to compute snarls separately:\n\n\u003c!-- !test check Pre-compute snarls and call --\u003e\n```sh\nvg snarls x.xg \u003e x.snarls\n\n# load snarls from a file instead of computing on the fly\nvg call x.xg -k aln.pack -r x.snarls \u003e calls.vcf\n```\n\nNote: `vg augment`, `vg pack`, `vg call` and `vg snarls` can now all be run directly on any graph format (e.g. `.gbz`, `.gfa`, `.vg`, `.xg` (except `augment`) or anything output by `vg convert`).  Operating on `.vg` or `.gfa` uses the most memory and is not recommended for large graphs.  
The output of `vg pack` can only be read in conjunction with the same graph used to create it, so `vg pack -x x.vg -g aln.gam -o x.pack` then `vg call x.xg -k x.pack` will not work.\n\n#### Calling variants from paths in the graph\n\nInfer variants from alignments implied by paths in the graph.  This can be used, for example, to call SVs directly from a variation graph that was constructed from a multiple alignment of different assemblies:\n\n\u003c!-- !test check MSGA and deconstruct --\u003e\n```sh\n# create a graph from a multiple alignment of HLA haplotypes (from vg/test directory)\nvg msga -f GRCh38_alts/FASTA/HLA/V-352962.fa -t 1 -k 16 | vg mod -U 10 - | vg mod -c - \u003e hla.vg\n\n# index it\nvg index hla.vg -x hla.xg\n\n# generate a VCF using gi|568815592:29791752-29792749 as the reference contig.  The other paths will be considered as haploid samples\nvg deconstruct hla.xg -e -p \"gi|568815592:29791752-29792749\" \u003e hla_variants.vcf\n```\n\nHaplotype paths from `.gbz` or `.gbwt` index input can be considered using `-z` and `-g`, respectively.\n\nAs with `vg call`, it is best to compute snarls separately and pass them in with `-r` when working with large graphs.\n\n### Transcriptomic analysis\n\n`vg` has a number of tools to support transcriptomic analyses with spliced graphs (i.e. graphs that have annotated splice junctions added as edges into the graph). These edges can be added into an existing graph using `vg rna`. We can then perform splice-aware mapping to these graphs using `vg mpmap`. `vg` developers have also made a tool for haplotype-aware transcript quantification based on these tools in [`rpvg`](https://github.com/jonassibbesen/rpvg). The easiest way to start this pipeline is to use the `vg autoindex` subcommand to make indexes for `vg mpmap`. `vg autoindex` creates indexes for mapping from common interchange formats like FASTA, VCF, and GTF. 
\n\nMore information is available in the [wiki page on transcriptomics](https://github.com/vgteam/vg/wiki/Transcriptomic-analyses).\n\nWorking from the `test/` directory, the following example shows how to create a spliced pangenome graph and indexes using `vg autoindex` with 4 threads:\n\n\u003c!-- !test check Autoindex for transcriptomic analysis --\u003e\n```sh\n# Create spliced pangenome graph and indexes for vg mpmap\nvg autoindex --workflow mpmap -t 4 --prefix vg_rna --ref-fasta small/x.fa --vcf small/x.vcf.gz --tx-gff small/x.gtf\n```\n\nRNA-seq reads can be mapped to the spliced pangenome graph using `vg mpmap` with 4 threads:\n\n\u003c!-- !test check Mapping using mpmap for transcriptomic analysis --\u003e\n```sh\n# Map simulated RNA-seq reads using vg mpmap\nvg mpmap -n rna -t 4 -x vg_rna.spliced.xg -g vg_rna.spliced.gcsa -d vg_rna.spliced.dist -f small/x_rna_1.fq -f small/x_rna_2.fq \u003e mpmap.gamp\n```\n\nThis will produce alignments in the multipath format. For more information on the multipath alignment format and `vg mpmap`, see the [wiki page on mpmap](https://github.com/vgteam/vg/wiki/Multipath-alignments-and-vg-mpmap). Running the two commands on the small example data using 4 threads should take less than a minute on most machines.  
\n\n### Alignment\n\nIf you have a small graph, you can align a sequence to the whole graph, using a full-length partial order alignment:\n\n\u003c!-- !test check Align a string to a graph --\u003e\n```sh\nvg align -s CTACTGACAGCAGAAGTTTGCTGTGAAGATTAAATTAGGTGATGCTTG x.vg\n```\n\nNote that you don't have to store the graph on disk at all; you can simply pipe it into the local aligner:\n\n\u003c!-- !test check Align a string to a piped graph --\u003e\n```sh\nvg construct -r small/x.fa -v small/x.vcf.gz | vg align -s CTACTGACAGCAGAAGTTTGCTGTGAAGATTAAATTAGGTGATGCTTG -\n```\n\nMost commands allow the streaming of graphs into and out of `vg`.\n\n### Command line interface\n\nSee the [man-page](https://github.com/vgteam/vg/wiki/vg-manpage).\n\nA variety of commands are available:\n\n- *autoindex*: construct graphs and indexes for other tools from common interchange file formats\n- *construct*: graph construction\n- *index*: index features of a graph in a disk-backed key/value store\n- *map*: map reads to a graph\n- *giraffe*: fast, haplotype-based mapping of reads to a graph\n- *mpmap*: short read mapping and multipath alignment (optionally spliced)\n- *surject*: project graph alignments onto a linear reference\n- *augment*: add variation from aligned reads into a graph\n- *call*: call variants from an augmented graph\n- *rna*: construct splicing graphs and pantranscriptomes\n- *convert*: convert graph and alignment formats\n- *combine*: combine graphs\n- *chunk*: extract or break into subgraphs\n- *ids*: node ID manipulation\n- *sim*: simulate reads by walking paths in a graph\n- *prune*: prune graphs to restrict their path complexity\n- *snarls*: find bubble-like motifs in a graph\n- *mod*: various graph transformations\n- *filter*: filter reads out of an alignment\n- *deconstruct*: create a VCF from variation in a graph\n- *paths*: traverse paths in a graph\n- *stats*: metrics describing graph properties\n\n## Implementation notes\n\n`vg` is a collection of tools based 
on a common data model (the variation graph) that is described by a protobuf schema (vg.proto). Data objects defined in vg.proto may be serialized via a stream pattern defined in stream.hpp. It is not necessary to write code in vg in order to interface with the algorithms defined here. Rather, it is sometimes simpler to write an external algorithm that reads and writes the same data formats.\n\n## License\n\nMIT\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvgteam%2Fvg","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvgteam%2Fvg","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvgteam%2Fvg/lists"}