{"id":16073285,"url":"https://github.com/aprilweilab/grgl","last_synced_at":"2025-03-17T17:30:26.333Z","repository":{"id":233015668,"uuid":"785687495","full_name":"aprilweilab/grgl","owner":"aprilweilab","description":"Genotype Representation Graph Library","archived":false,"fork":false,"pushed_at":"2025-03-14T15:56:50.000Z","size":954,"stargazers_count":27,"open_issues_count":10,"forks_count":2,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-14T16:41:30.996Z","etag":null,"topics":["c-plus-plus","comp-bio","popgen","population-genetics","python","statgen","statistical-genetics"],"latest_commit_sha":null,"homepage":"https://grgl.readthedocs.io/","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aprilweilab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-12T12:10:23.000Z","updated_at":"2025-02-26T14:19:23.000Z","dependencies_parsed_at":"2024-06-27T22:48:08.580Z","dependency_job_id":"fd192af9-c2f4-497f-a5c1-da6eae208d38","html_url":"https://github.com/aprilweilab/grgl","commit_stats":{"total_commits":33,"total_committers":3,"mean_commits":11.0,"dds":0.06060606060606055,"last_synced_commit":"253b167fff61f2e49a19e17fbb451570a8236019"},"previous_names":["aprilweilab/grgl"],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aprilweilab%2Fgrgl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aprilweilab%2Fgrgl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/aprilweilab%2Fgrgl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aprilweilab%2Fgrgl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aprilweilab","download_url":"https://codeload.github.com/aprilweilab/grgl/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243871888,"owners_count":20361379,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c-plus-plus","comp-bio","popgen","population-genetics","python","statgen","statistical-genetics"],"created_at":"2024-10-09T08:05:47.520Z","updated_at":"2025-03-17T17:30:26.320Z","avatar_url":"https://github.com/aprilweilab.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"![](https://github.com/aprilweilab/grgl/actions/workflows/cmake-multi-platform.yml/badge.svg)\n![](https://readthedocs.org/projects/grgl/badge/?version=latest)\n\n# Genotype Representation Graphs\n\nA Genotype Representation Graph (GRG) is a compact way to store reference-aligned genotype data for large\ngenetic datasets. These datasets are typically stored in tabular formats (VCF, BCF, BGEN, etc.) and then\ncompressed using off-the-shelf compression. In contrast, a GRG contains Mutation nodes (representing variants)\nand Sample nodes (representing haploid samples), where there is a path from a Mutation node to a Sample\nnode if-and-only-if that sample contains that mutation. 
These paths go through internal nodes that represent\ncommon ancestry between multiple samples, and this can result in significant compression **(30-50x smaller than\n.vcf.gz)**. Calculations on the whole dataset can be performed very quickly on a GRG using GRGL. See our paper\n[\"Enabling efficient analysis of biobank-scale data with genotype representation graphs\"](https://www.nature.com/articles/s43588-024-00739-9)\nfor more details.\n\nSince the publication of the paper, [version 2.0](https://github.com/aprilweilab/grgl/releases/tag/v2.0) has been released,\nwhich further reduced the GRG size (by about half) and significantly sped up graph load time (by about 20x).\n\n# Genotype Representation Graph Library (GRGL)\n\nGRGL can be used as a library in both C++ and Python. Support is currently limited to Linux and macOS.\nIt contains both an API [(see docs)](https://grgl.readthedocs.io/) and a [set of command-line tools](https://github.com/aprilweilab/grgl/blob/main/GettingStarted.md).\n\n## Installing from pip\n\nIf you just want to use the tools (e.g., constructing a GRG or converting a tree-sequence to a GRG) and the Python API, you can install via pip (from [PyPI](http://pypi.org/project/pygrgl/)).\n\n```\npip install pygrgl\n```\n\nThis uses prebuilt packages on most modern Linux systems and builds from source on macOS. Building from source requires CMake (at least v3.14), zlib development headers, and a Clang or GCC compiler that supports C++11.\n\n## Building (Python)\n\nThe Python installation installs the command line tools and Python libraries (the C++ executables are packaged as part of this). Make sure you clone with `git clone --recursive`!\n\nRequires Python 3.7 or newer to be installed (including development headers). 
It is recommended that you build/install in a virtual environment.\n```\npython3 -m venv /path/to/MyEnv\nsource /path/to/MyEnv/bin/activate\npython setup.py bdist_wheel               # Compiles C++, builds a wheel in the dist/ directory\npip install --force-reinstall dist/*.whl  # Install from wheel\n```\n\nBuild and installation should take at most a few minutes on a typical computer. For more details on build options, see [DEVELOPING.md](https://github.com/aprilweilab/grgl/blob/main/DEVELOPING.md).\n\n## Building (C++ only)\n\nThe C++ build is only necessary for folks who want to include GRGL as a library in their C++ project. Typically, you would pull our\nCMake build into your project via [add\\_subdirectory](https://cmake.org/cmake/help/latest/command/add_subdirectory.html), but you can also build\nstandalone as below. Make sure you clone with `git clone --recursive`!\n\nIf you only intend to use GRGL from C++, you can just build it via `CMake`:\n```\nmkdir build \u0026\u0026 cd build\ncmake .. -DCMAKE_BUILD_TYPE=Release\nmake -j4\n```\n\nThe example below installs the libraries to your system. It is recommended to install them to a custom location (prefix), since removing packages installed via `make install` is a pain otherwise. Example:\n```\nmkdir /path/to/grgl_installation/\nmkdir build \u0026\u0026 cd build\ncmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/path/to/grgl_installation/\nmake -j4\nmake install\n# There should now be bin/, lib/, etc., directories under /path/to/grgl_installation/\n```\n\n## Building (Docker)\n\nWe've included a Dockerfile if you want to use GRGL in a container.\n\nExample to build:\n```\ndocker build . 
-t grgl:latest\n```\n\nExample to run, constructing a GRG from an example VCF file:\n```\ndocker run -v $PWD:/working -it grgl:latest bash -c \"cd /working \u0026\u0026 grg construct /working/test/inputs/msprime.example.vcf\"\n```\n\n## Usage (Command line)\n\nThere is a command line tool that is mostly for file format conversion and performing common computations on the GRG. For more flexibility, use the Python or C++ APIs.\nAfter building and installing the Python version, run `grg --help` to see all the command options. Some examples are below.\n\nConvert a [tskit](https://tskit.dev/software/tskit.html) tree-sequence into a GRG. This creates `my_arg_data.grg` from `my_arg_data.trees`:\n```\ngrg convert /path/to/my_arg_data.trees my_arg_data.grg\n```\n\nLoad a GRG and emit some simple statistics about the GRG itself:\n```\ngrg process stats my_arg_data.grg\n```\n\nTo construct a GRG from a VCF file, use the `grg construct` command:\n```\ngrg construct --parts 20 -j 1 path/to/foo.vcf\n```\n\n**WARNING:** VCF access for GRG is not indexed and is generally very slow. For anything beyond toy datasets, it is recommended to convert\nVCF files to [IGD](https://github.com/aprilweilab/picovcf) first. You can use the `grg convert` tool (available as part of GRGL)\nor `igdtools` from [picovcf](https://github.com/aprilweilab/picovcf).\n\nTo convert a VCF(.gz) to an IGD and then build a GRG:\n```\ngrg convert path/to/foo.vcf foo.igd\ngrg construct --parts 20 -j 1 foo.igd\n```\n\nConstruction for small datasets (such as those included as tests in this repository) should be fast, taking a few minutes at most. 
Really large datasets (such as Biobank-scale) can take on the order of a day when using lots of threads (e.g., 70).\n\n## Usage (Python API)\n\nSee the provided [jupyter notebooks](https://github.com/aprilweilab/grgl/tree/main/jupyter) and [GettingStarted.md](https://github.com/aprilweilab/grgl/blob/main/GettingStarted.md) for more examples.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faprilweilab%2Fgrgl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faprilweilab%2Fgrgl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faprilweilab%2Fgrgl/lists"}