{"id":18610612,"url":"https://github.com/tiledb-inc/tiledb-vcf","last_synced_at":"2025-04-05T17:07:18.171Z","repository":{"id":39859291,"uuid":"206372614","full_name":"TileDB-Inc/TileDB-VCF","owner":"TileDB-Inc","description":"Efficient variant-call data storage and retrieval library using the TileDB storage library.","archived":false,"fork":false,"pushed_at":"2025-03-03T15:55:14.000Z","size":36358,"stargazers_count":93,"open_issues_count":13,"forks_count":16,"subscribers_count":14,"default_branch":"main","last_synced_at":"2025-03-29T16:07:29.494Z","etag":null,"topics":["bioinformatics","data-science","genomics","gwas","python","spark","tiledb","variant-calling","vcf"],"latest_commit_sha":null,"homepage":"https://tiledb-inc.github.io/TileDB-VCF/","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TileDB-Inc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-09-04T17:15:47.000Z","updated_at":"2025-03-27T12:43:54.000Z","dependencies_parsed_at":"2023-02-19T09:31:10.620Z","dependency_job_id":"42bb0914-15fa-43c0-936f-e730388dcb8b","html_url":"https://github.com/TileDB-Inc/TileDB-VCF","commit_stats":null,"previous_names":[],"tags_count":137,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TileDB-Inc%2FTileDB-VCF","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TileDB-Inc%2FTileDB-VCF/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TileDB-Inc%2FTileDB-VCF/releases","manifests_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TileDB-Inc%2FTileDB-VCF/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TileDB-Inc","download_url":"https://codeload.github.com/TileDB-Inc/TileDB-VCF/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247369952,"owners_count":20927928,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","data-science","genomics","gwas","python","spark","tiledb","variant-calling","vcf"],"created_at":"2024-11-07T03:11:16.146Z","updated_at":"2025-04-05T17:07:18.130Z","avatar_url":"https://github.com/TileDB-Inc.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ca href=\"https://tiledb.com\"\u003e\u003cimg src=\"https://github.com/TileDB-Inc/TileDB/raw/dev/doc/source/_static/tiledb-logo_color_no_margin_@4x.png\" alt=\"TileDB logo\" width=\"400\"\u003e\u003c/a\u003e\n\n[![Build 
Status](https://img.shields.io/azure-devops/build/tiledb-inc/836549eb-f74a-4986-a18f-7fbba6bbb5f0/8/main?label=Azure%20Pipelines\u0026logo=azure-pipelines\u0026style=flat-square)](https://dev.azure.com/TileDB-Inc/CI/_build/latest?definitionId=8\u0026branchName=main)\n[![Docker-CLI](https://img.shields.io/static/v1?label=Docker\u0026message=tiledbvcf-cli\u0026color=099cec\u0026logo=docker\u0026style=flat-square)](https://hub.docker.com/repository/docker/tiledb/tiledbvcf-cli)\n[![Docker-Py](https://img.shields.io/static/v1?label=Docker\u0026message=tiledbvcf-py\u0026color=099cec\u0026logo=docker\u0026style=flat-square)](https://hub.docker.com/repository/docker/tiledb/tiledbvcf-py)\n\n# TileDB-VCF\n\nA C++ library for efficient storage and retrieval of genomic variant-call data using [TileDB Embedded][tiledb].\n\n## Features\n\n- Easily ingest large amounts of variant-call data at scale\n- Supports ingesting single sample VCF and BCF files\n- New samples are added *incrementally*, avoiding computationally expensive merging operations\n- Allows for highly compressed storage using TileDB sparse arrays\n- Efficient, parallelized queries of variant data stored locally or remotely on S3\n- Export lossless VCF/BCF files or extract specific slices of a dataset\n\n## What's Included?\n\n- Command line interface (CLI)\n- APIs for C, C++, Python, and Java\n\n## Quick Start\n\nThe [documentation website][vcf] provides comprehensive usage examples but here are a few quick exercises to get you started.\n\nWe'll use a dataset that includes 20 synthetic samples, each one containing over 20 million variants. We host a publicly accessible version of this dataset on S3, so if you have TileDB-VCF installed and you'd like to follow along just swap out the `uri`'s below for `s3://tiledb-inc-demo-data/tiledbvcf-arrays/v4/vcf-samples-20`. 
And if you *don't* have TileDB-VCF installed yet, you can use our [Docker images](docker/README.md) to test things out.\n\n### CLI\n\nExport complete chr1 BCF files for a subset of samples:\n\n```sh\ntiledbvcf export \\\n  --uri vcf-samples-20 \\\n  --regions chr1:1-248956422 \\\n  --sample-names v2-usVwJUmo,v2-WpXCYApL\n```\n\nCreate a TSV file containing all variants within one or more regions of interest:\n\n```sh\ntiledbvcf export \\\n  --uri vcf-samples-20 \\\n  --sample-names v2-tJjMfKyL,v2-eBAdKwID \\\n  -Ot --tsv-fields \"CHR,POS,REF,S:GT\" \\\n  --regions \"chr7:144000320-144008793,chr11:56490349-56491395\"\n```\n\n### Python\n\nRunning the same query in Python:\n\n```py\nimport tiledbvcf\n\nds = tiledbvcf.Dataset(uri = \"vcf-samples-20\", mode=\"r\")\n\nds.read(\n    attrs = [\"sample_name\", \"pos_start\", \"fmt_GT\"],\n    regions = [\"chr7:144000320-144008793\", \"chr11:56490349-56491395\"],\n    samples = [\"v2-tJjMfKyL\", \"v2-eBAdKwID\"]\n)\n```\n\nreturns results as a pandas `DataFrame`:\n\n```\n     sample_name  pos_start    fmt_GT\n0    v2-nGEAqwFT  143999569  [-1, -1]\n1    v2-tJjMfKyL  144000262  [-1, -1]\n2    v2-tJjMfKyL  144000518  [-1, -1]\n3    v2-nGEAqwFT  144000339  [-1, -1]\n4    v2-nzLyDgYW  144000102  [-1, -1]\n..           ...        ...       ...\n566  v2-nGEAqwFT   56491395    [0, 0]\n567  v2-ijrKdkKh   56491373    [0, 0]\n568  v2-eBAdKwID   56491391    [0, 0]\n569  v2-tJjMfKyL   56491392  [-1, -1]\n570  v2-nzLyDgYW   56491365  [-1, -1]\n```\n\n## Want to Learn More?\n\n\n* [Blog \"Population Genomics is a Data Management Problem\"][blog]\n* [Check out the full documentation][vcf]\n  * [Why use TileDB-VCF?][docswhytile]\n  * [Data Model][docsdatamodel]\n  * [Installation][docsinstallation]\n  * [How To][docshowto]\n  * [Reference][docsreference]\n\n\n# Code of Conduct\n\nAll participants in TileDB spaces are expected to adhere to high standards of\nprofessionalism in all interactions. 
This repository is governed by the\nspecific standards and reporting procedures detailed in depth in the\n[TileDB core repository Code Of Conduct](\nhttps://github.com/TileDB-Inc/TileDB/blob/dev/CODE_OF_CONDUCT.md).\n\n\u003c!-- links --\u003e\n[tiledb]: https://github.com/TileDB-Inc/TileDB\n[vcf]: https://docs.tiledb.com/main/integrations-and-extensions/population-genomics\n[docswhytile]: https://docs.tiledb.com/main/integrations-and-extensions/genomics/population-genomics#why-use-tiledb-vcf\n[docsdatamodel]: https://docs.tiledb.com/main/integrations-and-extensions/population-genomics/data-model\n[docsinstallation]: https://docs.tiledb.com/main/integrations-and-extensions/genomics/population-genomics/installation\n[docshowto]: https://docs.tiledb.com/main/integrations-and-extensions/genomics/population-genomics/how-to\n[docsreference]: https://docs.tiledb.com/main/integrations-and-extensions/genomics/population-genomics/api-reference\n[blog]: https://tiledb.com/blog/population-genomics-is-a-data-management-problem\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftiledb-inc%2Ftiledb-vcf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftiledb-inc%2Ftiledb-vcf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftiledb-inc%2Ftiledb-vcf/lists"}