{"id":37071310,"url":"https://github.com/varda/varda2_preprocessing","last_synced_at":"2026-01-14T08:20:02.537Z","repository":{"id":57456004,"uuid":"243268535","full_name":"varda/varda2_preprocessing","owner":"varda","description":"Extract coverage information from gVCF and variants from VCF files and output in tabular format.","archived":false,"fork":false,"pushed_at":"2024-01-24T14:18:17.000Z","size":83,"stargazers_count":4,"open_issues_count":5,"forks_count":2,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-11-28T11:47:28.624Z","etag":null,"topics":["bed","coverage","gvcf","ngs"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/varda.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-02-26T13:25:37.000Z","updated_at":"2025-08-11T02:30:57.000Z","dependencies_parsed_at":"2022-09-02T08:32:04.626Z","dependency_job_id":null,"html_url":"https://github.com/varda/varda2_preprocessing","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/varda/varda2_preprocessing","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/varda%2Fvarda2_preprocessing","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/varda%2Fvarda2_preprocessing/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/varda%2Fvarda2_preprocessing/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/varda%2Fvarda2_preprocessing/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/varda","download_url
":"https://codeload.github.com/varda/varda2_preprocessing/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/varda%2Fvarda2_preprocessing/sbom","scorecard":{"id":916375,"data":{"date":"2025-08-11","repo":{"name":"github.com/varda/varda2_preprocessing","commit":"5fe1f218484595d52fd6b6c53fa4d332c518b73e"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3,"checks":[{"name":"Code-Review","score":0,"reason":"Found 0/30 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow 
detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"SAST","score":0,"reason":"no SAST tool detected","details":["Warn: no pull requests merged into dev branch"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations 
found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: MIT License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection 
settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}}]},"last_synced_at":"2025-08-24T21:09:07.195Z","repository_id":57456004,"created_at":"2025-08-24T21:09:07.195Z","updated_at":"2025-08-24T21:09:07.195Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28413779,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T08:16:59.381Z","status":"ssl_error","status_checked_at":"2026-01-14T08:13:45.490Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bed","coverage","gvcf","ngs"],"created_at":"2026-01-14T08:20:01.791Z","updated_at":"2026-01-14T08:20:02.536Z","avatar_url":"https://github.com/varda.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Varda2 preprocessing\n\nThe Varda2 database stores genomic variants and coverage information. To enable\nefficient and meaningful insertion into the database we have defined a set of\npreprocessing steps that all participating centers should follow. Note that\nthese steps are not cast in stone and hopefully will converge as a set of best\npractices between the centers.\n\nThe information about variants comes from VCF files and the information about\ncoverage comes from gVCF files. 
The following sections describe the process\nin more detail.\n\n## Workflow\n\nThe following figure depicts the process DAG generated from the Snakemake workflow at https://git.lumc.nl/klinische-genetica/capture-lumc/vcf-to-varda. The rules are described below. For all the details, see the workflow itself.\n\n![Process Flow](dag.png)\n\n## Variants\n\nTo extract variants from the VCF file in a way that Varda can process them, multiple steps are involved:\n\n- trim_alt_and_uncalled:\n  - `bcftools view --trim-alt-alleles --exclude-uncalled --output-file {output} {input}`\n- split_multi:\n  - `bcftools norm --multiallelics - --output {output} {input}`\n- exclude_alt_star:\n  - `bcftools view --exclude 'ALT==\"*\"' --output-file {output} {input}`\n\nThe first step is a pipeline of `bcftools` filtering and normalisation commands\nthat trims uncalled alt alleles, splits multi-allelic entries, and drops\nspanning-deletion (`*`) alleles, so that we end up with a single variant per line.\n\nThe second step is to take the filtered VCF file and convert it into a Varda\nvariant file.\n\n- vcf2varda:\n  - `vcf2variants \u003c {input} \u003e {output}`\n\nThe last step is only required if no proper RefSeq IDs are used, i.e. 
only `chr1` or even `1` instead of `NC_000001.10`.\n\n- var_cthreepo:\n  - `cthreepo --mapfile h37 --infile {input} --id_from uc --outfile {output} --id_to rs`\n\nThis outputs the following tab-separated format:\n`\u003cCHROM\u003e \u003cSTART\u003e \u003cEND\u003e \u003cPLOIDY\u003e \u003cPHASE SET\u003e \u003cINSERTED LENGTH\u003e \u003cINSERTED SEQUENCE\u003e`\n\ne.g.:\n```\nNC_000001.10    13656   13658   1       0       0       .\nNC_000001.10    13895   13896   1       0       1       A\nNC_000001.10    14164   14165   1       0       1       G\nNC_000001.10    14672   14673   1       0       1       C\nNC_000001.10    14698   14699   1       0       1       G\nNC_000001.10    14906   14907   1       0       1       G\n```\n\nN.B.:\n- `-1` in `PHASE SET` means homozygous, `0` means unphased\n- `.` in `INSERTED SEQUENCE` means no insertion (i.e. a deletion only)\n\n\n## Coverage\n\nTo extract the coverage from gVCF files, the following steps are required.\n\n- gvcf2coverage:\n  - `gvcf2coverage \u003c {input} \u003e {output}`\n- cov_cthreepo:\n  - `cthreepo --mapfile h37 --infile {input} --id_from uc --outfile {output} --id_to rs`\n\nThe second step is only required if no proper RefSeq IDs are used, i.e. only `chr1` or even `1` instead of `NC_000001.10`.\n\nThis outputs the following tab-separated format:\n\n`\u003cCHROM\u003e \u003cSTART\u003e \u003cEND\u003e \u003cPLOIDY\u003e`\n\ne.g.:\n```\nNC_000001.10    10033   10038   2\nNC_000001.10    10038   10043   2\nNC_000001.10    10043   10044   2\nNC_000001.10    10044   10048   2\nNC_000001.10    10048   10049   2\nNC_000001.10    10049   10050   2\nNC_000001.10    10050   10051   2\nNC_000001.10    10051   10054   2\n```\n\n\nN.B. By default, the tools merge the resulting entries with a merging\ndistance of 0. If merging is disabled, it is recommended to immediately pipe\nthe results of gvcf2coverage(.py) to `bedtools merge` to merge all the\nindividual adjacent entries. 
Note that bedtools would also merge entries\nwith different values in the ploidy column, which is why we opted to do the\nmerging in the gvcf2coverage tool itself.\n\nN.B. This repository contains two functionally similar implementations of a coverage\nextractor for gVCF files. The Python version is more readable and easier to modify, but the C version\nis roughly 12x faster.\n\n\n## Submitting to the Varda database\n\nAfter the variants and coverage files have been created per sample, they need to be submitted with the varda2-client as follows. The `varda2-client` expects the supplied access token to be present in the `VARDA_TOKEN` environment variable.\n\n```\nvarda2-client submit \\\n--disease-code {params.disease_code} \\\n--lab-sample-id {params.sample_id} \\\n--coverage {input.coverage} \\\n--variants-file {input.variants} \\\n\u003e {output}\n```\n\nThis results in a JSON file with the following format:\n```\n{\n\"\u003cLAB_SAMPLE_ID\u003e\": {\n    \"message\": \"Sample being inserted ...\",\n    \"sample\": \"\u003cVARDA_SAMPLE_ID\u003e\",\n    \"task\": \"\u003cTASK_SAMPLE_ID\u003e\"\n  }\n}\n```\n\n## Software\n\n- gvcf2coverage:\n  - repo: https://github.com/varda/varda2_preprocessing\n  - conda: https://anaconda.org/bioconda/gvcf2coverage\n  - container: https://quay.io/biocontainers/gvcf2coverage\n- pygvcf2coverage:\n  - repo: https://github.com/varda/varda2_preprocessing\n  - pypi: https://pypi.org/project/pygvcf2coverage/\n  - conda: https://anaconda.org/bioconda/pygvcf2coverage\n  - container: https://quay.io/biocontainers/pygvcf2coverage\n- vcf2variants:\n  - repo: https://github.com/varda/varda2_preprocessing\n  - pypi: https://pypi.org/project/vcf2variants/\n  - conda: https://anaconda.org/bioconda/vcf2variants\n  - container: https://quay.io/biocontainers/vcf2variants\n- cthreepo:\n  - repo: https://github.com/vkkodali/cthreepo\n  - pypi: https://pypi.org/project/cthreepo/\n  - conda: https://anaconda.org/bioconda/cthreepo\n  - 
container: https://quay.io/biocontainers/cthreepo\n- varda2-client:\n  - repo: https://github.com/varda/varda2-client\n  - pypi: https://pypi.org/project/varda2-client\n  - conda: https://anaconda.org/bioconda/varda2-client\n  - container: https://quay.io/biocontainers/varda2-client\n- bcftools:\n  - repo: https://github.com/samtools/bcftools\n  - conda: https://anaconda.org/bioconda/bcftools\n  - container: https://quay.io/biocontainers/bcftools\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvarda%2Fvarda2_preprocessing","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvarda%2Fvarda2_preprocessing","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvarda%2Fvarda2_preprocessing/lists"}