{"id":19459527,"url":"https://github.com/aprilweilab/pyigd","last_synced_at":"2025-04-25T07:31:57.432Z","repository":{"id":244997513,"uuid":"816955778","full_name":"aprilweilab/pyigd","owner":"aprilweilab","description":"Python-only parser for Indexable Genotype Data (IGD) format.","archived":false,"fork":false,"pushed_at":"2024-09-19T18:51:35.000Z","size":27,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-09-22T23:28:20.447Z","etag":null,"topics":["comp-bio","python","variant-calling"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aprilweilab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-18T18:10:24.000Z","updated_at":"2024-09-19T18:51:36.000Z","dependencies_parsed_at":"2024-06-18T23:28:11.588Z","dependency_job_id":"1120d753-f7da-4ca0-996b-c791f6a6ade1","html_url":"https://github.com/aprilweilab/pyigd","commit_stats":null,"previous_names":["aprilweilab/pyigd"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aprilweilab%2Fpyigd","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aprilweilab%2Fpyigd/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aprilweilab%2Fpyigd/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aprilweilab%2Fpyigd/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/a
prilweilab","download_url":"https://codeload.github.com/aprilweilab/pyigd/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223988523,"owners_count":17236921,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["comp-bio","python","variant-calling"],"created_at":"2024-11-10T17:32:56.963Z","updated_at":"2025-04-25T07:31:57.417Z","avatar_url":"https://github.com/aprilweilab.png","language":"Python","readme":"![Python build and test](https://github.com/aprilweilab/pyigd/actions/workflows/python-package.yml/badge.svg)\n\n# pyigd\n\nPyIGD is a Python-only parser for the [Indexable Genotype Data (IGD) format](https://github.com/aprilweilab/picovcf/blob/main/IGD.FORMAT.md). We have a short\n[preprint paper](https://www.biorxiv.org/content/10.1101/2025.02.05.636549v1.abstract) that describes the format and some of its advantages.\n\nFor a C++ library that supports creating and parsing IGD, see [picovcf](https://github.com/aprilweilab/picovcf) (which also supports VCF -\u003e IGD conversion).\n\n## Installation\n\nYou can install the latest release of PyIGD from PyPI, via `pip install pyigd`.\n\nFor development, you can clone the code and install it directly from the directory (this will automatically reflect any code changes you make):\n```\npip install -e pyigd/\n```\n\nor build and install via the wheel:\n```\ncd pyigd/ \u0026\u0026 python setup.py bdist_wheel\npip install --force-reinstall dist/*.whl\n```\n\n## Usage\n\nThe `pyigd.IGDReader` class reads IGD data from a buffer. 
See [the example script](https://github.com/aprilweilab/pyigd/blob/main/examples/igdread.py) that loads an IGD file, prints out some metadata, and then iterates over the genotype data for all variants. Generally the usage pattern is:\n```\nwith open(filename, \"rb\") as f:\n  igd_reader = pyigd.IGDReader(f)\n```\n\nThere is also the `pyigd.IGDWriter` class to construct IGD files. Related is `pyigd.IGDTransformer`, which is a way to create a copy of an IGD while modifying its contents. See the IGDTransformer [sample list example](https://github.com/aprilweilab/pyigd/blob/main/examples/xform.py) and [bitvector example](https://github.com/aprilweilab/pyigd/blob/main/examples/xform_bv.py).\n\nIGD can be highly performant for a few reasons:\n1. It stores sparse data sparsely. Low-frequency variants are stored as sample lists. Medium- and high-frequency variants are stored as bit vectors.\n2. It is indexable (you can jump directly to data for the `ith` variant). Since the index is stored in its own section of the file, scanning the index is extremely fast. So looking only at variants within a particular range of the genome is very fast (in this case you would use `pyigd.IGDFile.get_position_and_flags()` to find the first variant index within the range, and then use `pyigd.IGDFile.get_samples()` after that).\n3. The genotype data is stored in one of two very simple binary formats. 
This makes parsing fast, and the compact nature of the file makes reading from disk/memory fast as well.\n\n## How do I use IGD in my project?\n\n* Clone [picovcf](https://github.com/aprilweilab/picovcf) and follow the instructions in its [README](https://github.com/aprilweilab/picovcf/blob/main/README.md) to build the tools for that library.\n  * If you want to be able to convert `.vcf.gz` (compressed VCF) to IGD, make sure you build with `-DENABLE_VCF_GZ=ON`\n* One of the built tools will be `igdtools`, which can convert from VCF to IGD, among other things (such as filtering IGD files).\n* Do one of the following:\n  * If your project is C++, copy [picovcf.hpp](https://github.com/aprilweilab/picovcf/blob/main/picovcf.hpp) into your project, `#include` it somewhere, and then use it according to the [documentation](https://picovcf.readthedocs.io/en/latest/)\n  * If your project is Python, clone [pyigd](https://github.com/aprilweilab/pyigd/) and install it per the [README instructions](https://github.com/aprilweilab/pyigd/blob/main/README.md).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faprilweilab%2Fpyigd","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faprilweilab%2Fpyigd","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faprilweilab%2Fpyigd/lists"}