{"id":47497804,"url":"https://github.com/mcvickerlab/varca","last_synced_at":"2026-04-01T21:05:06.991Z","repository":{"id":37906297,"uuid":"197674437","full_name":"mcvickerlab/varCA","owner":"mcvickerlab","description":"Use an ensemble of variant callers to call variants from ATAC-seq data","archived":false,"fork":false,"pushed_at":"2025-05-14T22:25:25.000Z","size":366,"stargazers_count":23,"open_issues_count":20,"forks_count":6,"subscribers_count":2,"default_branch":"master","last_synced_at":"2026-03-28T00:36:36.803Z","etag":null,"topics":["atac-seq-data","machine-learning","random-forest","snakemake","variant-calling"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mcvickerlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-07-19T00:29:55.000Z","updated_at":"2025-05-14T22:25:28.000Z","dependencies_parsed_at":"2025-05-14T23:36:27.293Z","dependency_job_id":null,"html_url":"https://github.com/mcvickerlab/varCA","commit_stats":null,"previous_names":["mcvickerlab/varca","aryarm/varca"],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/mcvickerlab/varCA","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mcvickerlab%2FvarCA","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mcvickerlab%2FvarCA/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mcvickerlab%2FvarCA/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/repositories/mcvickerlab%2FvarCA/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mcvickerlab","download_url":"https://codeload.github.com/mcvickerlab/varCA/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mcvickerlab%2FvarCA/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31291984,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-01T13:12:26.723Z","status":"ssl_error","status_checked_at":"2026-04-01T13:12:25.102Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["atac-seq-data","machine-learning","random-forest","snakemake","variant-calling"],"created_at":"2026-03-27T03:28:59.920Z","updated_at":"2026-04-01T21:05:06.974Z","avatar_url":"https://github.com/mcvickerlab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Snakemake](https://img.shields.io/badge/snakemake-5.18.0-brightgreen.svg?style=flat-square)](https://snakemake.readthedocs.io/)\n\n# varCA\nA pipeline for running an ensemble of variant callers to predict variants from ATAC-seq reads.\n\nThe entire pipeline is made up of two smaller subworkflows. 
The `prepare` subworkflow calls each variant caller and prepares the resulting data for use by the `classify` subworkflow, which uses an ensemble classifier to predict the existence of variants at each site.\n\n\u003e [!NOTE]  \n\u003e VarCA does not output genotypes (GT fields) because of the possibility of inaccuracy in the presence of allele-specific open chromatin. Please refer to https://github.com/mcvickerlab/varCA/issues/43#issuecomment-1088028758\n\n### [Code Ocean](https://codeocean.com/capsule/6980349/tree/v1)\nUsing [our Code Ocean compute capsule](https://codeocean.com/capsule/6980349/tree/v1), you can execute [VarCA v0.2.1](https://github.com/mcvickerlab/varCA/releases/tag/v0.2.1) on example data without downloading or setting up the project. To interpret the output of VarCA, see the output sections of the [`prepare` subworkflow](rules#output) and the [`classify` subworkflow](rules#output-1) in the [rules README](rules/README.md).\n\n# download\nExecute the following command or download the [latest release](https://github.com/mcvickerlab/varCA/releases/latest) manually.\n```\ngit clone https://github.com/mcvickerlab/varCA.git\n```\nAlso consider downloading the [example data](https://github.com/mcvickerlab/varCA/releases/latest/download/data.tar.gz).\n```\ncd varCA\nwget -O- -q https://github.com/mcvickerlab/varCA/releases/latest/download/data.tar.gz | tar xvzf -\n```\n\n# setup\nThe pipeline is written as a Snakefile which can be executed via [Snakemake](https://snakemake.readthedocs.io). 
We recommend installing version 5.18.0:\n```\nconda create -n snakemake -c bioconda -c conda-forge --no-channel-priority 'snakemake==5.18.0'\n```\nWe highly recommend you install [Snakemake via conda](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html#installation-via-conda) like this so that you can use the `--use-conda` flag when calling `snakemake` to let it [automatically handle all dependencies](https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#integrated-package-management) of the pipeline. Otherwise, you must manually install the dependencies listed in the [env files](envs).\n\n# execution\n1. Activate snakemake via `conda`:\n    ```\n    conda activate snakemake\n    ```\n2. Execute the pipeline on the example data\n\n    Locally:\n    ```\n    ./run.bash \u0026\n    ```\n    __or__ on an SGE cluster:\n    ```\n    ./run.bash --sge-cluster \u0026\n    ```\n#### Output\nVarCA will place all of its output in a new directory (`out/`, by default). Log files describing the progress of the pipeline will also be created there: the `log` file contains a basic description of the progress of each step, while the `qlog` file is more detailed and will contain any errors or warnings. You can read more about the pipeline's output in the [rules README](rules/README.md).\n\n#### Executing the pipeline on your own data\nYou must modify [the config.yaml file](configs#configyaml) to specify paths to your data. The config file is currently configured to run the pipeline on the example data provided.\n\n#### Executing each portion of the pipeline separately\nThe pipeline is made up of [two subworkflows](rules). These are usually executed together automatically by the master pipeline, but they can also be executed on their own for more advanced usage. See the [rules README](rules/README.md) for execution instructions and a description of the outputs. 
You will need to execute the subworkflows separately [if you ever want to create your own trained models](rules#training-and-testing-varca).\n\n#### Reproducing our results\nWe provide the example data so that you may quickly (in ~1 hr, excluding dependency installation) verify that the pipeline can be executed on your machine. This process does not reproduce our results. Those with more time can follow [these steps](rules#testing-your-model--reproducing-our-results) to create all of the plots and tables in our paper.\n\n### If this is your first time using Snakemake\nWe recommend that you run `snakemake --help` to learn about Snakemake's options. For example, to check that the pipeline will be executed correctly before you run it, you can call Snakemake with the `-n -p -r` flags. This is also a good way to familiarize yourself with the steps of the pipeline and their inputs and outputs (the latter of which are inputs to the first rule in each workflow, i.e., the `all` rule).\n\nNote that Snakemake will not recreate output that it has already generated unless you request it. If a job fails or is interrupted, subsequent executions of Snakemake will just pick up where it left off. This can also apply to files that *you* create and provide in place of the files it would have generated.\n\nBy default, the pipeline will automatically delete some files it deems unnecessary (e.g., unsorted copies of a BAM). You can opt to keep these files instead by providing the `--notemp` flag to Snakemake when executing the pipeline.\n\n# files and directories\n\n### [Snakefile](Snakefile)\nA [Snakemake](https://snakemake.readthedocs.io/en/stable/) pipeline for calling variants from a set of ATAC-seq reads. This pipeline automatically executes two subworkflows:\n\n1. the [`prepare` subworkflow](rules/prepare.smk), which prepares the reads for classification and\n2. 
the [`classify` subworkflow](rules/classify.smk), which creates a VCF containing predicted variants\n\n### [rules/](rules)\nSnakemake rules for the `prepare` and `classify` subworkflows. You can either execute these subworkflows from the [master Snakefile](#snakefile) or individually as their own Snakefiles. See the [rules README](rules/README.md) for more information.\n\n### [configs/](configs)\nConfig files that define options and input for the pipeline and the `prepare` and `classify` subworkflows. If you want to predict variants from your own ATAC-seq data, you should start by filling out [the config file for the pipeline](/configs#configyaml).\n\n### [callers/](callers)\nScripts for executing each of the variant callers which are used by the `prepare` subworkflow. Small pipelines can be written for each caller by using a special naming convention. See the [caller README](callers/README.md) for more information.\n\n### [breakCA/](breakCA)\nScripts for calculating posterior probabilities for the existence of an insertion or deletion, which can be used as features for the classifier. These scripts are an adaptation from [@Arkosen](https://github.com/Arkosen)'s [BreakCA code](https://www.biorxiv.org/content/10.1101/605642v1.abstract).\n\n### [scripts/](scripts)\nVarious scripts used by the pipeline. See the [script README](scripts/README.md) for more information.\n\n### [run.bash](run.bash)\nAn example bash script for executing the pipeline using `snakemake` and `conda`. Any arguments to this script are passed directly to `snakemake`.\n\n# citation\nThere is an option to _\"Cite this repository\"_ on the right sidebar of [the repository homepage](https://github.com/mcvickerlab/varCA).\n\u003e Massarat, A. R., Sen, A., Jaureguy, J., Tyndale, S. T., Fu, Y., Erikson, G., \u0026 McVicker, G. (2021). Discovering single nucleotide variants and indels from bulk and single-cell ATAC-seq. Nucleic Acids Research, gkab621. 
https://doi.org/10.1093/nar/gkab621\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmcvickerlab%2Fvarca","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmcvickerlab%2Fvarca","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmcvickerlab%2Fvarca/lists"}