{"id":44371001,"url":"https://github.com/maize-genetics/phg_v2","last_synced_at":"2026-02-11T20:11:05.316Z","repository":{"id":196648940,"uuid":"696835682","full_name":"maize-genetics/phg_v2","owner":"maize-genetics","description":"Practical Haplotype Graph (PHG) version 2","archived":false,"fork":false,"pushed_at":"2026-02-04T20:28:54.000Z","size":133360,"stargazers_count":30,"open_issues_count":14,"forks_count":3,"subscribers_count":6,"default_branch":"main","last_synced_at":"2026-02-05T08:35:18.466Z","etag":null,"topics":["imputation","pangenome","pangenome-graph"],"latest_commit_sha":null,"homepage":"https://phg.maizegenetics.net/","language":"Kotlin","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/maize-genetics.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"docs/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"docs/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-09-26T14:18:35.000Z","updated_at":"2026-01-14T16:22:30.000Z","dependencies_parsed_at":null,"dependency_job_id":"6a2e7666-3764-431b-8dad-cd12798a66f0","html_url":"https://github.com/maize-genetics/phg_v2","commit_stats":null,"previous_names":["maize-genetics/phg_v2"],"tags_count":195,"template":false,"template_full_name":null,"purl":"pkg:github/maize-genetics/phg_v2","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maize-genetics%2Fphg_v2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maize-genetics%2Fphg_v2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/maize-genetics%2Fphg_v2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maize-genetics%2Fphg_v2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/maize-genetics","download_url":"https://codeload.github.com/maize-genetics/phg_v2/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maize-genetics%2Fphg_v2/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29343683,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-11T18:58:20.535Z","status":"ssl_error","status_checked_at":"2026-02-11T18:56:44.814Z","response_time":97,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["imputation","pangenome","pangenome-graph"],"created_at":"2026-02-11T20:11:04.536Z","updated_at":"2026-02-11T20:11:05.304Z","avatar_url":"https://github.com/maize-genetics.png","language":"Kotlin","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PHG version 2\n[![PHGv2 CI](https://github.com/maize-genetics/phg_v2/actions/workflows/phgv2_ci.yml/badge.svg)](https://github.com/maize-genetics/phg_v2/actions/workflows/phgv2_ci.yml) [![codecov](https://codecov.io/gh/maize-genetics/phg_v2/graph/badge.svg?token=4BVD2QXQ1A)](https://codecov.io/gh/maize-genetics/phg_v2) 
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n\nThe Practical Haplotype Graph (PHG) is a powerful tool for \nrepresenting pangenomes. The PHG is optimized for plant breeding \nand genetics, where genomic diversity can be high, phased haplotypes \nare common (e.g. inbred lines), and imputation with low density \nmarkers is essential for breeding efficiency. This complements \nother imputation tools (e.g. [BEAGLE](https://faculty.washington.edu/browning/beagle/beagle.html)) \ndesigned explicitly for handling samples from unphased species \ncharacterized by low genetic diversity and high-density genotyping.\n\nThe PHG is a trellis graph based representation of consecutive genic \nand intergenic regions (called reference ranges) which represent \ndiversity across and between samples. It can be used to:\n\n* Create custom genomes for alignment\n* Call rare alleles\n* Impute genotypes \n* Efficiently store genomic data from many samples (i.e. 
reference, \n  assemblies, and other lines)\n\nThe PHG also works well with community \nstandards including the Breeding API ([BrAPI](https://brapi.org)) and efficient \ntools for R such as [rPHG2](https://github.com/maize-genetics/rPHG2) for pangenome extraction and \n[rTASSEL](https://github.com/maize-genetics/rTASSEL) for connecting genotype to phenotype.\n\n## Table of contents\n* [Quick start](#quick-start)\n* [Referencing the PHG](#referencing-the-phg)\n* [Design and history](#design-and-history)\n* [Terminology](#terminology)\n* [Long-form documentation](#long-form-documentation)\n\n\n## Quick start\n\n### Installation\n\nUsing a Linux distribution, download the latest release\n[here](https://github.com/maize-genetics/phg_v2/releases/latest) or\nuse the command line:\n\n```shell\ncurl -s https://api.github.com/repos/maize-genetics/phg_v2/releases/latest \\\n| awk -F': ' '/browser_download_url/ \u0026\u0026 /\\.tar/ {gsub(/\"/, \"\", $(NF)); system(\"curl -LO \" $(NF))}'\n```\n\nUntar and add the wrapper script to your `PATH` variable. Detailed\ninformation about these steps can be found [here](https://phg.maizegenetics.net/installation/).\n### Build and load data\n\n_Long-form documentation for this section can be found \n[here](https://phg.maizegenetics.net/build_and_load/). Additional information about **QC \nmetrics** can be found [here](https://phg.maizegenetics.net/qc_metrics/)._\n\n\u003e [!NOTE]\n\u003e As of version `2.4.X`, the PHG utilizes a new version of AnchorWave (`1.2.3`).\n\u003e This changes how ASM coordinates are handled. 
\n\u003e If you are using old MAF files generated either from AnchorWave `1.2.2` or from PHGv2 version `2.3` or earlier, \n\u003e please use the `--legacy-maf-file` flag for the `create-maf-vcf` command.\n\u003e It is recommended that you remove your `phgv2-conda` Conda environment and rerun the `setup-environment` command.\n\u003e More information can be found [here](https://phg.maizegenetics.net/build_and_load/).\n\n```shell\n## Setup conda environment\n./phg setup-environment\n\n## Initialize TileDB DataSets\n./phg initdb --db-path /path/to/dbs\n\n## Preprocessing data\n./phg prepare-assemblies --keyfile /path/to/keyfile --output-dir data/updated_assemblies --threads numberThreadstoRun\n\n## Build VCF data\n./phg create-ranges --reference-file data/updated_assemblies/Ref.fa --gff my.gff --boundary gene --pad 500 --range-min-size 500 -o /path/to/bed/file.bed\n./phg align-assemblies --gff anchors.gff --reference-file data/updated_assemblies/Ref.fa --assembly-file-list assembliesList.txt --total-threads 20 --in-parallel 4 -o /path/for/generatedFiles\n./phg agc-compress --db-path /path/to/dbs --reference-file data/updated_assemblies/Ref.fa --fasta-list /my/assemblyFastaList.txt \n./phg create-ref-vcf --bed /my/bed/file.bed --reference-file data/updated_assemblies/Ref.fa --reference-url https://url-for-ref --reference-name B73 --db-path /path/to/tiled/dataset folder\n./phg create-maf-vcf --db-path /path/to/dbs --bed /my/bed/file.bed --reference-file data/updated_assemblies/Ref.fa --maf-dir /my/maf/files -o /path/to/vcfs\n\n## OPTIONAL: Convert GVCF to HVCF: use this instead of create-maf-vcf if you have GVCF files created by PHG, but do not have MAF or h.vcf files\n./phg gvcf2hvcf --bed /my/bin/file.bed --gvcf-dir /my/gvcf/dir --reference-file data/updated_assemblies/Ref.fa --db-path /path/to/dbs\n \n## Load data into DBs\n./phg load-vcf --vcf /my/vcf/dir --dbpath /path/to/dbs\n```\n\n### Imputation\n\n_Long-form documentation for this section can be found 
[here](https://phg.maizegenetics.net/imputation_ropebwt/)_\n\n\n```shell\n## Export\n./phg export-vcf --db-path /my/db/uri --dataset-type hvcf --sample-names LineA,LineB --output-dir /my/hvcf/dir\n\n## Index\n./phg rope-bwt-index --db-path /my/db/uri --hvcf-dir /my/hvcf/dir --output-dir /my/index/dir --index-file-prefix myindex\n\n## Map\n./phg map-reads --hvcf-dir /my/hvcf/dir --index /my/index/dir/myindex.fmd --key-file /my/path/keyfile --output-dir /my/mapping/dir\n\n## Find paths (impute)\n./phg find-paths --path-keyfile /my/path/keyfile --hvcf-dir /my/hvcf/dir --reference-genome /my/ref/genome --path-type haploid --output-dir /my/imputed/hvcfs\n\n## Load in DB\n./phg load-vcf --vcf /my/imputed/hvcfs --dbpath /my/db/uri\n```\n\n### Data retrieval\n\n\u003e [!NOTE]\n\u003e This section is currently in progress and command input may be\n\u003e subject to change. The following pseudocode is a possible\n\u003e representation of the retrieval workflow:\n\n```shell\n## Export from TileDB\n./phg export-vcf --db-path /my/db/uri --dataset-type hvcf --sample-names LineA,LineB --output-dir /my/output/dir\n```\n\n## Referencing the PHG\nTo reference the PHG, please use the following citation:\n\n\u003e Bradbury, P J and Casstevens, T and Jensen, S E and Johnson, L C and Miller, Z R and Monier, B and Romay, M C and Song, B and Buckler, E S (2022). **The Practical Haplotype Graph, a platform for storing and using pangenomes for imputation.** *Bioinformatics*. DOI: [10.1093/bioinformatics/btac410](https://doi.org/10.1093/bioinformatics/btac410)\n\nMore references to other PHG articles can be found [here](https://phg.maizegenetics.net/citations/).\n\n\n## Design and history\n\n[PHGv1](https://bitbucket.org/bucklerlab/practicalhaplotypegraph/wiki/Home) was [published in 2022](https://doi.org/10.1093/bioinformatics/btac410). It addressed many\nchallenges related to aligning diverse genomes, efficient storage,\nand imputation across a pangenome. 
However, it depended on a custom\nrelational database that necessitated unique formats, and database\nqueries did not scale effectively with a large number of taxa and\nrare alleles. Moreover, after developing PHGs for six species, we\nidentified significant opportunities to refine and streamline the\nplatform for curation.\n\nThe redesign leverages the performant TileDB-VCF database, which is\nwidely used in human genetics for extensive medical applications and\nis highly proficient for rapid querying and storage of rare variants.\nThe PHG is now backed by two TileDB-VCF databases: one for tracking\nhaplotypes across all samples (`.h.vcf`), and another for tracking\nvariants relative to either the reference genomes or the closest\nhaplotype (`.g.vcf`). Our implementation of haplotype encoding in VCF\nheavily relies on the VCF ALT haplotype specification defined in\n[v4.2](http://samtools.github.io/hts-specs/VCFv4.2.pdf).\n\nOther important things to note:\n* **High-quality phased genome assemblies** (or similar) are available to\n  initialize the PHG.\n* Ancestral haplotypes are aligned to the reference genome for the\n  identification of haplotypes.\n* All PHG tools rely on public file standards - FASTA, VCF, BCF, BED,\n  and MAF.\n* We rely on public tools like TileDB, Minimap2, GATK, AnchorWave,\n  BioKotlin, and HTSJDK.\n* Genotyping with low-density markers is now done using a memory- and\n  speed-efficient k-mer approach, followed by pathfinding (imputation)\n  with [hidden Markov model](https://en.wikipedia.org/wiki/Hidden_Markov_model) methods. \n* Rare allele discovery with short reads is based on the above path,\n  involving short read alignment to the inferred haplotype path\n  genome and the GATK haplotype caller.\n\n\n## Terminology\n\nWhen describing components used in the PHG, certain terms are used to \nefficiently communicate more complicated ideas. 
Some common terms you \nmay find are:\n\n| Term             | Definition                                                |\n|------------------|-----------------------------------------------------------|\n| haplotype        | The sequence of part of an individual chromosome.         |\n| path             | The phased set of haplotypes that represent a chromosome. |\n| reference genome | A genome used for initial alignment and base coordinates. |\n| reference range  | A segment of the reference genome.                        |\n\nMore commonly used terms can be found [here](https://phg.maizegenetics.net/terminology/).\n\n\n## Long-form documentation\n\n### PHG workflows\n1. [Installation](https://phg.maizegenetics.net/installation/)\n2. [Building and loading](https://phg.maizegenetics.net/build_and_load/)\n3. [Imputation](https://phg.maizegenetics.net/imputation_ropebwt/)\n4. [Resequencing](https://phg.maizegenetics.net/resequencing/)\n5. [Export data](https://phg.maizegenetics.net/export_data/)\n\n### Reference\n* [Convenience methods](https://phg.maizegenetics.net/convenience_commands/)\n* [haplotype region handling](https://phg.maizegenetics.net/hvcf_region_handling/)\n* [hVCF format specifications](https://phg.maizegenetics.net/hvcf_specifications/)\n* [Ktor specifications](https://phg.maizegenetics.net/ktor_specifications/)\n* [PHGv2 terminology](https://phg.maizegenetics.net/terminology/)\n* [PHGv2 architecture](docs/img/architecture/phg_v2_architecture_20240411.svg)\n* [QC metrics](https://phg.maizegenetics.net/qc_metrics/)\n* [SLURM Usage with `align-assemblies`](https://phg.maizegenetics.net/slurm_usage/)\n* [Terminology](https://phg.maizegenetics.net/terminology/)\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaize-genetics%2Fphg_v2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaize-genetics%2Fphg_v2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaize-genetics%2Fphg_v2/lists"}