{"id":13755651,"url":"https://github.com/mcvickerlab/GenVarLoader","last_synced_at":"2025-05-10T02:32:53.444Z","repository":{"id":37833483,"uuid":"478368965","full_name":"mcvickerlab/GenVarLoader","owner":"mcvickerlab","description":"Dataloader for applying sequence models to personalized genomics","archived":false,"fork":false,"pushed_at":"2025-05-02T23:25:46.000Z","size":139034,"stargazers_count":25,"open_issues_count":14,"forks_count":4,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-05-03T00:20:04.873Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://genvarloader.readthedocs.io/en/latest/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mcvickerlab.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-04-06T02:07:58.000Z","updated_at":"2025-05-02T23:25:49.000Z","dependencies_parsed_at":"2023-10-15T23:14:04.559Z","dependency_job_id":"9ba67813-080e-40ee-a85a-8c4b1c4056b5","html_url":"https://github.com/mcvickerlab/GenVarLoader","commit_stats":null,"previous_names":["mcvickerlab/genvarloader","mcvickerlab/genome-loader"],"tags_count":91,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mcvickerlab%2FGenVarLoader","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mcvickerlab%2FGenVarLoader/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mcvickerlab%2FGenVarLoader/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/mcvickerlab%2FGenVarLoader/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mcvickerlab","download_url":"https://codeload.github.com/mcvickerlab/GenVarLoader/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253166473,"owners_count":21864469,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T11:00:17.832Z","updated_at":"2025-05-10T02:32:53.434Z","avatar_url":"https://github.com/mcvickerlab.png","language":"Python","readme":"\u003cimg src=docs/source/_static/gvl_logo.png width=\"200\"\u003e\n\n[![PyPI version](https://badge.fury.io/py/genvarloader.svg)](https://pypi.org/project/genvarloader/)\n[![Documentation Status](https://readthedocs.org/projects/genvarloader/badge/?version=latest)](https://genvarloader.readthedocs.io)\n[![Downloads](https://static.pepy.tech/badge/genvarloader)](https://pepy.tech/project/genvarloader)\n[![PyPI - Downloads](https://img.shields.io/pypi/dm/genvarloader)](https://img.shields.io/pypi/dm/genvarloader)\n[![GitHub stars](https://badgen.net/github/stars/mcvickerlab/GenVarLoader)](https://github.com/mcvickerlab/GenVarLoader)\n[![bioRxiv](https://img.shields.io/badge/bioRxiv-2025.01.15.633240-b31b1b.svg)](https://www.biorxiv.org/content/10.1101/2025.01.15.633240)\n\n## Features\n\nGenVarLoader provides a fast, memory-efficient data structure for training sequence models on genetic variation. For example, this can be used to train a DNA language model on human genetic variation (e.g. 
[Dalla-Torre et al.](https://www.biorxiv.org/content/10.1101/2023.01.11.523679)) or train sequence-to-function models with genetic variation (e.g. [Celaj et al.](https://www.biorxiv.org/content/10.1101/2023.09.20.558508v1), [Drusinsky et al.](https://www.biorxiv.org/content/10.1101/2024.07.27.605449v1), [He et al.](https://www.biorxiv.org/content/10.1101/2024.10.15.618510v1), and [Rastogi et al.](https://www.biorxiv.org/content/10.1101/2024.09.23.614632v1)).\n\n- Avoid writing any sequences to disk (can use \u003e2,000x less storage than writing personalized genomes with bcftools consensus)\n- Generate haplotypes up to 1,000 times faster than reading a FASTA file\n- Generate tracks up to 450 times faster than reading a BigWig\n- **Supports indels** and re-aligns tracks to haplotypes that have them\n- Extensible to new file formats: drop a feature request! Currently supports VCF, PGEN, and BigWig\n\nDocumentation is available [here](https://genvarloader.readthedocs.io/). See our [preprint](https://www.biorxiv.org/content/10.1101/2025.01.15.633240) for benchmarking and implementation details.\n\n## Installation\n\n```bash\npip install genvarloader\n```\n\nA PyTorch dependency is **not** included since it may require [special instructions](https://pytorch.org/get-started/locally/).\n\n## Contributing\n\n1. Clone the repo.\n2. Assuming you have [Pixi](https://pixi.sh/latest/), install the pre-commit hooks with `pixi run -e dev pre-commit`.\n3. Activate and use the appropriate Pixi environment for your needs. A decent catch-all is `dev`, but you might need a different environment if you are using a GPU.\n\nAll the tests are designed to use pytest and live under `tests/`. 
These tests ensure the code works as intended, so they must all pass before any features are merged into `main` and subsequently released.\n","funding_links":[],"categories":["Software packages"],"sub_categories":["Data wrangling"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmcvickerlab%2FGenVarLoader","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmcvickerlab%2FGenVarLoader","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmcvickerlab%2FGenVarLoader/lists"}