{"id":50193889,"url":"https://github.com/HKU-BAL/Clair3","last_synced_at":"2026-06-11T08:00:33.763Z","repository":{"id":41066614,"uuid":"352969947","full_name":"HKU-BAL/Clair3","owner":"HKU-BAL","description":"Clair3 - Symphonizing pileup and full-alignment for deep learning-based long-read variant calling","archived":false,"fork":false,"pushed_at":"2026-06-09T03:06:38.000Z","size":39952,"stargazers_count":369,"open_issues_count":16,"forks_count":38,"subscribers_count":12,"default_branch":"main","last_synced_at":"2026-06-09T05:08:32.115Z","etag":null,"topics":["computational-biology","deep-learning","genomics","illumina","long-reads","nanopore","ont-data","ont-models","pacbio","variant-calling","variant-detection"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HKU-BAL.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2021-03-30T11:02:58.000Z","updated_at":"2026-06-09T03:06:42.000Z","dependencies_parsed_at":"2024-04-19T05:29:44.819Z","dependency_job_id":"b3b75226-b486-4aeb-9569-1bd78313736f","html_url":"https://github.com/HKU-BAL/Clair3","commit_stats":null,"previous_names":[],"tags_count":33,"template":false,"template_full_name":null,"purl":"pkg:github/HKU-BAL/Clair3","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKU-BAL%2FClair3","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKU-BAL%2FClair3/tags","releases_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKU-BAL%2FClair3/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKU-BAL%2FClair3/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HKU-BAL","download_url":"https://codeload.github.com/HKU-BAL/Clair3/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKU-BAL%2FClair3/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34188272,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-11T02:00:06.485Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computational-biology","deep-learning","genomics","illumina","long-reads","nanopore","ont-data","ont-models","pacbio","variant-calling","variant-detection"],"created_at":"2026-05-25T16:00:41.941Z","updated_at":"2026-06-11T08:00:33.756Z","avatar_url":"https://github.com/HKU-BAL.png","language":"Python","funding_links":[],"categories":["Software packages"],"sub_categories":["Variant, SV calling, Phasing"],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003ca href=\"https://en.wiktionary.org/wiki/%E7%9C%BC\" target=\"_blank\"\u003e\n    \u003cimg src=\"docs/images/clair3_logo.png\" width=\"110\" height=\"90\" alt=\"Clair3\"\u003e\n  \u003c/a\u003e\n\n  \u003ch1\u003eClair3\u003c/h1\u003e\n\n  
\u003cp\u003e\u003cb\u003eSymphonizing pileup and full-alignment for deep-learning-based long-read variant calling\u003c/b\u003e\u003c/p\u003e\n\n  \u003cp\u003e\n    \u003ca href=\"https://opensource.org/licenses/BSD-3-Clause\"\u003e\u003cimg src=\"https://img.shields.io/badge/License-BSD%203--Clause-blue.svg\" alt=\"License\"\u003e\u003c/a\u003e\n    \u003ca href=\"http://bioconda.github.io/recipes/clair3/README.html\"\u003e\u003cimg src=\"https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat\" alt=\"install with bioconda\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://hub.docker.com/r/hkubal/clair3\"\u003e\u003cimg src=\"https://img.shields.io/badge/docker-hkubal%2Fclair3-blue.svg\" alt=\"Docker\"\u003e\u003c/a\u003e\n  \u003c/p\u003e\n\u003c/div\u003e\n\n---\n\n**Contact:** Ruibang Luo, Zhenxian Zheng, Xian Yu\n\n**Email:** rbluo@cs.hku.hk · zxzheng@cs.hku.hk · yuxian@connect.hku.hk\n\n---\n\n## Introduction\n\n**Clair3** is a germline small-variant caller for long-read sequencing. 
It combines two complementary models to balance speed and accuracy:\n\n- **Pileup calling** — fast, handles the majority of variant candidates from summarized alignment statistics.\n- **Full-alignment calling** — computationally intensive, resolves uncertain candidates from haplotype-resolved full alignments.\n\nClair3 is the third generation of the Clair family, succeeding [Clair](https://github.com/HKU-BAL/Clair) (2nd generation) and [Clairvoyante](https://github.com/aquaskyline/Clairvoyante) (1st generation).\n\n### Looking for a different variant caller?\n\n| Use case | Tool |\n| --- | --- |\n| Germline on **long-read RNA-seq** | [Clair3-RNA](https://github.com/HKU-BAL/Clair3-RNA) |\n| Somatic, **paired tumor/normal** | [ClairS](https://github.com/HKU-BAL/ClairS) |\n| Somatic, **tumor-only** | [ClairS-TO](https://github.com/HKU-BAL/ClairS-TO) |\n\n### Agent Skill\n\n[Clair-skills](https://github.com/HKU-BAL/Clair-skills) is a plug-in for agentic AI coding assistants (Claude Code, Cursor, Codex, …) that covers the entire Clair suite. It helps the agent pick the right tool and model, generate ready-to-run commands, and analyze results.\n\n---\n\n## Contents\n\n- [Latest Updates](#latest-updates)\n- [Installation](#installation) — [Docker](#option-1-docker) · [Singularity](#option-2-singularity) · [Bioconda](#option-3-bioconda) · [Step-by-step (Conda)](#option-4-step-by-step-conda)\n- [Pre-trained Models](#pre-trained-models)\n- [Quick Demo](#quick-demo)\n- [Usage](#usage)\n- [Advanced Topics](#advanced-topics) — [Dwelling time](#dwelling-time-feature) · [Amplicon data](#dealing-with-amplicon-data) · [Postprocessing](#postprocessing-scripts)\n- [Reference](#reference) — [Folder structure](#folder-structure-and-submodules) · [Training data](#training-data) · [VCF/GVCF formats](#vcfgvcf-output-formats) · [Model training guides](#model-training-guides)\n- [Citation](#citation)\n\n---\n\n## Latest Updates\n\n### v2.0.1 — *Apr 27, 2026*\n\n- Added the ONT `r1041_e82_400bps_sup_v520_with_mv` signal-aware (move-table) 
model for Dorado v5.2 SUP basecalled data ([#428](https://github.com/HKU-BAL/Clair3/issues/428)).\n- Added a pre-built **GPU Docker image** `hkubal/clair3:v2.0.1_gpu` (CUDA 12.1, PyTorch). See [GPU (NVIDIA CUDA on Linux)](#gpu-nvidia-cuda-on-linux) ([#433](https://github.com/HKU-BAL/Clair3/issues/433)).\n- Fixed `clair3_version` shown in VCF headers — previously stuck at `1.2.0` regardless of the installed version ([#432](https://github.com/HKU-BAL/Clair3/issues/432)).\n- Fixed `SortVcf` writing pileup `RefCall` entries instead of an empty VCF when no variants were found ([#436](https://github.com/HKU-BAL/Clair3/issues/436)).\n\n### v2.0.0 — *Feb 9, 2026* \u0026nbsp; **(Major release)**\n\nA preprint describing the performance of Clair3 v2 is available on [bioRxiv](https://www.biorxiv.org/content/10.64898/2026.02.13.705285v1).\n\n- **PyTorch migration.** The deep-learning backend moved from TensorFlow to PyTorch. **v1 TensorFlow models are _not_ compatible with v2** (including the TF models ONT provides via Rerio). Use the [Converted Rerio Clair3 Models (PyTorch)](https://www.bio8.cs.hku.hk/clair3/clair3_models_rerio_pytorch/), or convert your own with the [Model Migration Guide](docs/model_migration_guide.md). Pre-trained PyTorch models: [download here](https://www.bio8.cs.hku.hk/clair3/clair3_models_pytorch/).\n- **Signal-aware variant calling for ONT.** Pass `--enable_dwell_time` on BAMs with Dorado `mv` tags (produced by basecalling with Dorado `--emit-moves`). See [Dwelling Time Feature](docs/dwelling_time.md).\n- **New Python runner.** `run_clair3.sh` has been reimplemented in Python as `run_clair3.py`; both remain usable.\n- **Checkpoint format.** TF `.index`/`.data` → PyTorch `.pt`.\n\n### v1.2.0 — *Aug 1, 2025*\n\nNative GPU support on Linux and Apple Silicon. Clair3 on GPU runs **~5× faster than CPU**. 
See the [GPU Quick Start](docs/gpu_quick_start.md).\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"docs/images/clair3_gpu_benchmark.png\" width=\"400\" alt=\"Clair3 GPU benchmark\"\u003e\n\u003c/div\u003e\n\n### v1.1.2 — *Jul 10, 2025*\n\n- Boundary check for an insertion immediately followed by soft-clipping ([#394](https://github.com/HKU-BAL/Clair3/issues/394), @[dpryan79](https://github.com/dpryan79)).\n- Parallel-job exit-code checking; pipeline now exits immediately on any job failure ([#392](https://github.com/HKU-BAL/Clair3/issues/392), @[SamStudio8](https://github.com/SamStudio8)).\n\n### v1.1.1 — *May 19, 2025*\n\n- Fixed the malformed VCF header on AWS ([#380](https://github.com/HKU-BAL/Clair3/issues/380)).\n- Added an R10.4.1 model fine-tuned on 12 [bacterial genomes](https://elifesciences.org/reviewed-preprints/98300) ([notes](docs/fine-tuning_Clair3_with_12_bacteria_samples.pdf), @[wshropshire](https://github.com/wshropshire)).\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eEarlier versions\u003c/b\u003e (click to expand)\u003c/summary\u003e\n\n**v1.1.0 — Apr 8, 2025.** Removed `parallel` version checking ([#377](https://github.com/HKU-BAL/Clair3/issues/377)).\n\n**v1.0.11 — Mar 19, 2025.** Added `--enable_variant_calling_at_sequence_head_and_tail` to call variants in the first/last 16 bp of a sequence (use with caution — less reliable alignments and less context; [#257](https://github.com/HKU-BAL/Clair3/issues/257)). Added `--output_all_contigs_in_gvcf_header` ([#371](https://github.com/HKU-BAL/Clair3/issues/371)). Added postprocessing `AddPairEndAlleleDepth` (PEAD tag, Bin Guan, NEI). Fixed AF format in GVCF output ([#365](https://github.com/HKU-BAL/Clair3/issues/365)). Added a [split-into-haplotypes calling workflow](docs/split_haplotype_into_haploid_calling.md). `set -o pipefail` in `run_clair3.sh` ([#368](https://github.com/HKU-BAL/Clair3/issues/368)). 
Clarified parameter docs ([#369](https://github.com/HKU-BAL/Clair3/issues/369)).\n\n**v1.0.10 — Jul 28, 2024.** Fixed an out-of-range bug in non-human GVCF output ([#317](https://github.com/HKU-BAL/Clair3/issues/317)). Faster amplicon calling via `--chunk_num=-1` ([#306](https://github.com/HKU-BAL/Clair3/issues/306)). LongPhase bumped to 1.7.3 ([#321](https://github.com/HKU-BAL/Clair3/issues/321)).\n\n**v1.0.9 — May 15, 2024.** Fixed VCF header ([#305](https://github.com/HKU-BAL/Clair3/pull/305)); updated `DP` FORMAT description.\n\n**v1.0.8 — Apr 29, 2024.** Fixed occasional quality-score differences between VCF and GVCF output. LongPhase bumped to 1.7.\n\n**v1.0.7 — Apr 7, 2024.** Memory guards for the full-alignment C implementation ([#286](https://github.com/HKU-BAL/Clair3/pull/286)). Raised max mpileup coverage to 2^20 ([#292](https://github.com/HKU-BAL/Clair3/pull/292)). LongPhase bumped to 1.6.\n\n**v1.0.6 — Mar 15, 2024.** Stack-overflow fix at very high coverage ([#282](https://github.com/HKU-BAL/Clair3/issues/282)). Reference caching for CRAM ([#278](https://github.com/HKU-BAL/Clair3/pull/278)). Fixed RefCall outputs when the FA model calls no variant ([#271](https://github.com/HKU-BAL/Clair3/issues/271)). Fixed min-coverage filtering ([#262](https://github.com/HKU-BAL/Clair3/issues/262)). `--snp_min_af` / `--indel_min_af` default to 0.0 when `--vcf_fn` is set ([#261](https://github.com/HKU-BAL/Clair3/issues/261)).\n\n**v1.0.5 — Dec 20, 2023.** Fixed multi-allelic AF at very high coverage ([#241](https://github.com/HKU-BAL/Clair3/issues/241)). Added `--base_err` and `--gq_bin_size` to reduce excess `./.` in GVCF ([#220](https://github.com/HKU-BAL/Clair3/issues/220)).\n\n**v1.0.4 — Jul 11, 2023.** Command line and reference source now in VCF header. Fixed AF for 1/2 genotypes. 
Added AD tag.\n\n**v1.0.3 — Jun 20, 2023.** Colon `:` allowed in reference sequence names ([#203](https://github.com/HKU-BAL/Clair3/issues/203)).\n\n**v1.0.2 — May 22, 2023.** Added PacBio HiFi Revio model. Fixed halt on too few variant candidates ([#198](https://github.com/HKU-BAL/Clair3/issues/198)).\n\n**v1.0.1 — Apr 24, 2023.** WhatsHap bumped to 1.7 (~15% faster haplotagging, [#193](https://github.com/HKU-BAL/Clair3/issues/193)). Fixed PL when ALT is N ([#191](https://github.com/HKU-BAL/Clair3/issues/191)).\n\n**v1.0.0 — Mar 6, 2023.** Clair3 version in VCF header ([#141](https://github.com/HKU-BAL/Clair3/issues/141)). NumPy int fix ([#165](https://github.com/HKU-BAL/Clair3/issues/165)). IUPAC → N by default, keep with `--keep_iupac_bases` ([#153](https://github.com/HKU-BAL/Clair3/issues/153)). Added `--use_{longphase,whatshap}_for_intermediate_phasing` / `--use_{longphase,whatshap}_for_final_output_phasing` / `--use_whatshap_for_final_output_haplotagging` ([#164](https://github.com/HKU-BAL/Clair3/issues/164)). Fixed Docker shell under host user mode ([#175](https://github.com/HKU-BAL/Clair3/issues/175)).\n\n**v0.1-r12 — Aug 19, 2022.** CRAM input ([#117](https://github.com/HKU-BAL/Clair3/issues/117)). Python 3.9, TensorFlow 2.8, Samtools 1.15.1, WhatsHap 1.4. `DP` now shows raw coverage for pileup calls ([#128](https://github.com/HKU-BAL/Clair3/issues/128)). Illumina representation-unification fix ([#110](https://github.com/HKU-BAL/Clair3/issues/110)). LongPhase 1.3.\n\n**v0.1-r11 minor 2 — Apr 16, 2022.** Fixed missing non-variant GVCF positions at chunk boundaries. Reduced GVCF memory footprint ([#88](https://github.com/HKU-BAL/Clair3/issues/88)).\n\n**v0.1-r11 — Apr 4, 2022.** ~2.5× faster on ONT Q20 data with pileup and full-alignment feature generation in C. LongPhase as a phasing option (`--longphase_for_phasing`). `--min_coverage`, `--min_mq`, `--min_contig_size`. CSI index support ([#90](https://github.com/HKU-BAL/Clair3/issues/90)). 
See [Notes on r11](docs/v0.1_r11_speedup.md).\n\n**v0.1-r10 — Jan 13, 2022.** Added the Guppy5 model `r941_prom_sup_g5014` ([benchmarks](docs/guppy5_20220113.md)); applicable to `sup`, `hac`, `fast` reads. The older `r941_prom_sup_g506` was obsoleted. Added `--var_pct_phasing`.\n\n**v0.1-r9 — Dec 1, 2021.** `--enable_long_indel` for indel calls \u003e50 bp ([benchmarks](docs/indel_gt50_performance.md), [#64](https://github.com/HKU-BAL/Clair3/issues/64)).\n\n**v0.1-r8 — Nov 11, 2021.** `--enable_phasing` to emit WhatsHap-phased VCF ([#63](https://github.com/HKU-BAL/Clair3/issues/63)). Fixed unexpected program termination on success.\n\n**v0.1-r7 — Oct 18, 2021.** ONT `var_pct_full` raised 0.3 → 0.7 (+~0.2% indel F1). Fall-through to next-likely variant on low coverage ([#53](https://github.com/HKU-BAL/Clair3/pull/53)). Streamlined training. `mini_epochs` in `Train.py` ([#60](https://github.com/HKU-BAL/Clair3/pull/60)). GVCF intermediates now lz4-compressed (5× smaller). `--remove_intermediate_dir` ([#48](https://github.com/HKU-BAL/Clair3/issues/48)). ONT models renamed per [Medaka](https://github.com/nanoporetech/medaka/blob/master/medaka/options.py#L22) convention. Training-data leakage fixed ([#57](https://github.com/HKU-BAL/Clair3/issues/57)).\n\n**ONT-provided models — Sep 23, 2021.** ONT also provides chemistry-/basecaller-specific Clair3 models via [Rerio](https://github.com/nanoporetech/rerio).\n\n**v0.1-r6 — Sep 4, 2021.** Reduced `SortVcf` memory ([#45](https://github.com/HKU-BAL/Clair3/issues/45)). Lower `ulimit -n` requirement ([#47](https://github.com/HKU-BAL/Clair3/issues/47)). Clair3-Illumina in Bioconda ([#42](https://github.com/HKU-BAL/Clair3/issues/42)).\n\n**v0.1-r5 — Jul 19, 2021.** Training-data generator fix to avoid TensorFlow segfaults. Simplified Dockerfile. Fixed ALT output for reference calls. Fixed multi-allelic AF ([ACGT]Del). AD tag in GVCF. `--call_snp_only` ([#40](https://github.com/HKU-BAL/Clair3/issues/40)). 
Pileup/FA validity checks ([#32](https://github.com/HKU-BAL/Clair3/issues/32), [#38](https://github.com/HKU-BAL/Clair3/issues/38)).\n\n**v0.1-r4 — Jun 28, 2021.** Bioconda install. ONT Guppy2 model ([benchmarks](docs/guppy2.md) — must be used on Guppy2-or-earlier data). [Colab notebooks](colab). Fix on too few variant candidates ([#28](https://github.com/HKU-BAL/Clair3/issues/28)).\n\n**v0.1-r3 — Jun 9, 2021.** `ulimit -u` check with auto-retry on failed jobs ([#20](https://github.com/HKU-BAL/Clair3/issues/20), [#23](https://github.com/HKU-BAL/Clair3/issues/23), [#24](https://github.com/HKU-BAL/Clair3/issues/24)). ONT Guppy5 model ([benchmarks](docs/guppy5.md)).\n\n**v0.1-r2 — May 23, 2021.** BED out-of-range fix ([#12](https://github.com/HKU-BAL/Clair3/issues/12)). Both `.bam.bai` and `.bai` accepted ([#10](https://github.com/HKU-BAL/Clair3/issues/10)). Boundary and package version checks.\n\n**v0.1-r1 — May 18, 2021.** Relative paths in Conda ([#5](https://github.com/HKU-BAL/Clair3/issues/5)). `taskset` CPU-core visibility fix and Singularity image ([#6](https://github.com/HKU-BAL/Clair3/issues/6)).\n\n**v0.1 — May 17, 2021.** Initial release.\n\n\u003c/details\u003e\n\n---\n\n## Installation\n\n\u003e **Pick the right install method for your hardware:**\n\u003e - **CPU** → Docker (Option 1), Singularity (Option 2), or Bioconda (Option 3).\n\u003e - **NVIDIA GPU (Linux)** → Docker GPU (Option 1) or Singularity GPU (Option 2); fall back to Step-by-step (Option 4) if unsupported.\n\u003e - **Apple Silicon (M1/M2/M3/M4)** → Step-by-step (Option 4).\n\u003e\n\u003e See the [GPU Quick Start](docs/gpu_quick_start.md) for tuned settings.\n\n### Option 1. Docker\n\nPre-built image: [hkubal/clair3](https://hub.docker.com/r/hkubal/clair3).\n\n\u003e **Use absolute paths** for `INPUT_DIR` and `OUTPUT_DIR`.\n\n#### CPU\n\n```bash\nINPUT_DIR=\"[YOUR_INPUT_FOLDER]\"        # e.g. /home/user1/input  (absolute path)\nOUTPUT_DIR=\"[YOUR_OUTPUT_FOLDER]\"      # e.g. 
/home/user1/output (absolute path)\nTHREADS=\"[MAXIMUM_THREADS]\"            # e.g. 8\nMODEL_NAME=\"[YOUR_MODEL_NAME]\"         # e.g. r1041_e82_400bps_sup_v500\n\n# --platform accepts one of {ont,hifi,ilmn}\ndocker run -it \\\n  -v ${INPUT_DIR}:${INPUT_DIR} \\\n  -v ${OUTPUT_DIR}:${OUTPUT_DIR} \\\n  hkubal/clair3:v2.0.1 \\\n  /opt/bin/run_clair3.sh \\\n    --bam_fn=${INPUT_DIR}/input.bam \\\n    --ref_fn=${INPUT_DIR}/ref.fa \\\n    --threads=${THREADS} \\\n    --platform=ont \\\n    --model_path=/opt/models/${MODEL_NAME} \\\n    --output=${OUTPUT_DIR}\n```\n\n\u003e `python3 /opt/bin/run_clair3.py` can replace `/opt/bin/run_clair3.sh` in the command above.\n\n#### GPU (NVIDIA CUDA on Linux)\n\nImage: `hkubal/clair3:v2.0.1_gpu` (built on CUDA 12.1).\n\n**Requirements**\n\n- NVIDIA driver ≥ 530.30.02.\n- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) installed on the host.\n\n```bash\n# --platform accepts one of {ont,hifi,ilmn}\ndocker run -it --gpus all \\\n  -v ${INPUT_DIR}:${INPUT_DIR} \\\n  -v ${OUTPUT_DIR}:${OUTPUT_DIR} \\\n  hkubal/clair3:v2.0.1_gpu \\\n  /opt/bin/run_clair3.sh \\\n    --bam_fn=${INPUT_DIR}/input.bam \\\n    --ref_fn=${INPUT_DIR}/ref.fa \\\n    --threads=${THREADS} \\\n    --platform=ont \\\n    --model_path=/opt/models/${MODEL_NAME} \\\n    --output=${OUTPUT_DIR} \\\n    --use_gpu\n```\n\n**Notes**\n\n- Select specific GPUs with `--gpus '\"device=0,1\"'` (Docker) together with `--device=cuda:0,1` (Clair3).\n- If the image does not work on your setup (unsupported driver/CUDA, no NVIDIA Container Toolkit, Apple Silicon, etc.), fall back to [Step-by-step (Option 4)](#option-4-step-by-step-conda).\n\n### Option 2. 
Singularity\n\n\u003e **Use absolute paths** for `INPUT_DIR` and `OUTPUT_DIR`.\n\n#### CPU\n\n```bash\nconda config --add channels defaults\nconda create -n singularity-env -c conda-forge singularity -y\nconda activate singularity-env\n\nsingularity pull docker://hkubal/clair3:v2.0.1\n\n# --platform accepts one of {ont,hifi,ilmn}\nsingularity exec \\\n  -B ${INPUT_DIR},${OUTPUT_DIR} \\\n  clair3_v2.0.1.sif \\\n  /opt/bin/run_clair3.sh \\\n    --bam_fn=${INPUT_DIR}/input.bam \\\n    --ref_fn=${INPUT_DIR}/ref.fa \\\n    --threads=${THREADS} \\\n    --platform=ont \\\n    --model_path=/opt/models/${MODEL_NAME} \\\n    --output=${OUTPUT_DIR}\n```\n\n#### GPU (NVIDIA CUDA on Linux)\n\n**Requirements**\n\n- NVIDIA driver ≥ 530.30.02.\n- Singularity (or Apptainer) with `--nv` support.\n\n```bash\nsingularity pull docker://hkubal/clair3:v2.0.1_gpu\n\n# --platform accepts one of {ont,hifi,ilmn}\nsingularity exec --nv --cleanenv --env TMPDIR=/tmp \\\n  -B ${INPUT_DIR},${OUTPUT_DIR} \\\n  clair3_v2.0.1_gpu.sif \\\n  /opt/bin/run_clair3.sh \\\n    --bam_fn=${INPUT_DIR}/input.bam \\\n    --ref_fn=${INPUT_DIR}/ref.fa \\\n    --threads=${THREADS} \\\n    --platform=ont \\\n    --model_path=/opt/models/${MODEL_NAME} \\\n    --output=${OUTPUT_DIR} \\\n    --use_gpu\n```\n\n**Notes**\n\n- `--nv` injects the host NVIDIA driver and libraries into the container (the equivalent of Docker's `--gpus all`); no NVIDIA Container Toolkit is needed.\n- `--cleanenv --env TMPDIR=/tmp` prevents `parallel` from failing when the host `TMPDIR` points to a path not visible inside the container.\n- If the image does not work on your setup, fall back to [Step-by-step (Option 4)](#option-4-step-by-step-conda).\n\n### Option 3. Bioconda\n\nClair3 is available on [Bioconda](https://bioconda.github.io/recipes/clair3/README.html). The recipe bundles PyPy, samtools, parallel, whatshap, LongPhase, and the pre-trained models under `${CONDA_PREFIX}/bin/models/`. 
See [bioconda-recipes#64260](https://github.com/bioconda/bioconda-recipes/pull/64260) for the v2 (PyTorch) recipe.\n\n```bash\nmamba create -n clair3 -c conda-forge -c bioconda -y clair3\nmamba activate clair3\n\nMODEL_NAME=\"[YOUR_MODEL_NAME]\"         # e.g. r1041_e82_400bps_sup_v500\n\n# --platform accepts one of {ont,hifi,ilmn}\nrun_clair3.sh \\\n  --bam_fn=input.bam \\\n  --ref_fn=ref.fa \\\n  --threads=${THREADS} \\\n  --platform=ont \\\n  --model_path=${CONDA_PREFIX}/bin/models/${MODEL_NAME} \\\n  --output=${OUTPUT_DIR}\n```\n\n\u003e **Note.** The Bioconda package ships a CPU-only PyTorch build. For NVIDIA GPU or Apple Silicon, use [Step-by-step (Option 4)](#option-4-step-by-step-conda).\n\n### Option 4. Step-by-step (Conda)\n\nInstall Mamba or Conda from [miniforge](https://github.com/conda-forge/miniforge) (Mamba is much faster).\n\n**Step 1 — Create and activate the environment**\n\n```bash\nmamba create -n clair3_v2 -c conda-forge -c bioconda -y \\\n  python=3.11 samtools whatshap parallel \\\n  zstd xz zlib bzip2 automake make gcc gxx curl pigz\nmamba activate clair3_v2\npip install uv\n```\n\n**Step 2 — Install PyTorch**\n\nPick the right build for your system from the [PyTorch website](https://pytorch.org/get-started/locally/).\n\n```bash\n# Example: NVIDIA CUDA 13.0\nuv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu130\n\n# Or: CPU only\nuv pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu\n```\n\n**Step 3 — Clone Clair3**\n\n```bash\ncd ${HOME}\ngit clone https://github.com/HKU-BAL/Clair3.git\ncd Clair3\nexport CLAIR3_PATH=$(pwd)\n```\n\n**Step 4 — Install Python deps and build C sources**\n\n```bash\nuv pip install numpy h5py hdf5plugin numexpr tqdm cffi torchmetrics\nmake PREFIX=${CONDA_PREFIX}\n```\n\n\u003e `make` compiles samtools/htslib, LongPhase, and the Clair3 C shared library (`libclair3.so`) used for fast pileup and full-alignment tensor generation.\n\n**Step 5 — Install PyPy3.11** 
(speeds up preprocessing)\n\n```bash\nwget https://downloads.python.org/pypy/pypy3.11-v7.3.20-linux64.tar.bz2\ntar -xjf pypy3.11-v7.3.20-linux64.tar.bz2 \u0026\u0026 rm pypy3.11-v7.3.20-linux64.tar.bz2\n\nln -s $(pwd)/pypy3.11-v7.3.20-linux64/bin/pypy3 ${CONDA_PREFIX}/bin/pypy3\nln -s $(pwd)/pypy3.11-v7.3.20-linux64/bin/pypy3 ${CONDA_PREFIX}/bin/pypy\n\npypy3 -m ensurepip\npypy3 -m pip install mpmath==1.2.1\n```\n\n**Step 6 — (Optional) Download pre-trained models**\n\n```bash\ncd ${CLAIR3_PATH}\nmkdir -p models\nwget -r -np -nH --cut-dirs=2 -R \"index.html*\" -P ./models \\\n  https://www.bio8.cs.hku.hk/clair3/clair3_models_pytorch/\n```\n\nIndividual models can also be grabbed from [the model index](https://www.bio8.cs.hku.hk/clair3/clair3_models_pytorch/).\n\n**Step 7 — Run Clair3**\n\n```bash\nMODEL_NAME=r1041_e82_400bps_sup_v500\n${CLAIR3_PATH}/run_clair3.sh \\\n  --bam_fn=input.bam \\\n  --ref_fn=ref.fa \\\n  --threads=${THREADS} \\\n  --platform=ont \\\n  --model_path=${CLAIR3_PATH}/models/${MODEL_NAME} \\\n  --output=${OUTPUT_DIR}\n```\n\n\u003e `python3 ${CLAIR3_PATH}/run_clair3.py` accepts the same arguments and can be used interchangeably.\n\n---\n\n## Pre-trained Models\n\n\u003e **Important: v1 TensorFlow models are not compatible with Clair3 v2** (including the TF models ONT provides via Rerio). 
Convert your own with the [Model Migration Guide](docs/model_migration_guide.md), or use the pre-converted models below.\n\n**Download:**\n\n- HKU-provided: \u003chttps://www.bio8.cs.hku.hk/clair3/clair3_models_pytorch/\u003e\n- Converted ONT Rerio: \u003chttps://www.bio8.cs.hku.hk/clair3/clair3_models_rerio_pytorch/\u003e\n\n**Bundled locations:** `/opt/models/` (Docker) · `${CONDA_PREFIX}/bin/models/` (Bioconda).\n\n### HKU-provided models\n\nListed at \u003chttps://www.bio8.cs.hku.hk/clair3/clair3_models_pytorch/\u003e.\n\n| Model | Platform | `--platform` | Training samples / Notes | Bioconda | Docker |\n| --- | --- | :-: | --- | :-: | :-: |\n| **`r1041_e82_400bps_hac_v600_with_mv`** *(latest)* | ONT R10.4.1 E8.2 (5 kHz), HAC | `ont` | HG001,2,5 (chr20 excluded) — **signal-aware**, use `--enable_dwell_time` | | |\n| **`r1041_e82_400bps_hac_v520_with_mv`** *(latest)* | ONT R10.4.1 E8.2 (5 kHz), HAC | `ont` | HG001,2,5 (chr20 excluded) — **signal-aware**, use `--enable_dwell_time` | | ✓ |\n| **`r1041_e82_400bps_sup_v520_with_mv`** *(latest)* | ONT R10.4.1 E8.2 (5 kHz), SUP | `ont` | HG001,2,5 (chr20 excluded) — **signal-aware**, use `--enable_dwell_time` | | ✓ |\n| `r1041_e82_400bps_sup_v430_bacteria_finetuned` | ONT R10.4.1 | `ont` | Fine-tuned on 12 [bacterial genomes](https://elifesciences.org/reviewed-preprints/98300) | | ✓ |\n| `r941_prom_sup_g5014` | ONT R9.4.1, Guppy5 SUP | `ont` | HG002,4,5; also usable on HAC reads ([benchmarks](docs/guppy5_20220113.md)) | ✓ | ✓ |\n| `r941_prom_hac_g360+g422` | ONT R9.4.1, Guppy3/4 HAC | `ont` | HG001,2,4,5 | | |\n| `hifi_revio` | PacBio HiFi Revio | `hifi` | HG002,4 | ✓ | ✓ |\n| `hifi_sequel2` | PacBio HiFi Sequel II | `hifi` | HG001,2,4,5 | ✓ | ✓ |\n| `ilmn` | Illumina | `ilmn` | HG001,2,4,5 | ✓ | ✓ |\n\n\u003e **Recommendation for modern ONT R10.4.1 data:** when your BAM has Dorado `mv` tags, use the dwell-time model (`..._with_mv`) for the best accuracy; otherwise, use an ONT-trained model below.\n\n### ONT-provided 
models (bundled)\n\n\u003e ONT's models are fine-tuned to specific chemistries / basecallers and **typically outperform the HKU baselines** — we recommend using them for best results. Official PyTorch distributions from ONT are in progress; in the meantime, use the [converted Rerio models](#converted-rerio-models) below.\n\nThe following ONT-trained models are bundled with Clair3 Docker / Bioconda since v1.1.1:\n\n| Model | Chemistry | Dorado model | Bioconda | Docker |\n| --- | --- | --- | :-: | :-: |\n| `r1041_e82_400bps_sup_v500` | R10.4.1 E8.2 (5 kHz) | v5.0.0 SUP | ✓ | ✓ |\n| `r1041_e82_400bps_hac_v500` | R10.4.1 E8.2 (5 kHz) | v5.0.0 HAC | | ✓ |\n| `r1041_e82_400bps_sup_v410` | R10.4.1 E8.2 (4 kHz) | v4.1.0 SUP | ✓ | ✓ |\n| `r1041_e82_400bps_hac_v410` | R10.4.1 E8.2 (4 kHz) | v4.1.0 HAC | | ✓ |\n\n\u003e **ONT has released newer Dorado v5.2.0 models** (`r1041_e82_400bps_sup_v520` / `hac_v520`). They are not yet bundled in Docker / Bioconda — download them from the [Converted Rerio models](#converted-rerio-models) section below.\n\n### Converted Rerio models\n\nThe full ONT [Rerio](https://github.com/nanoporetech/rerio) catalog converted to PyTorch for Clair3 v2 is available at \u003chttps://www.bio8.cs.hku.hk/clair3/clair3_models_rerio_pytorch/\u003e. 
A selection of recent R10.4.1 E8.2 400 bps models is listed below.\n\n| Model | Chemistry | Dorado model |\n| --- | --- | --- |\n| `r1041_e82_400bps_hac_v600` *(latest)* | R10.4.1 E8.2 (5 kHz) | v6.0.0 HAC |\n| `r1041_e82_400bps_sup_v520` *(latest)* | R10.4.1 E8.2 (5 kHz) | v5.2.0 SUP |\n| `r1041_e82_400bps_hac_v520` | R10.4.1 E8.2 (5 kHz) | v5.2.0 HAC |\n| `r1041_e82_400bps_sup_v500` | R10.4.1 E8.2 (5 kHz) | v5.0.0 SUP |\n| `r1041_e82_400bps_hac_v500` | R10.4.1 E8.2 (5 kHz) | v5.0.0 HAC |\n| `r1041_e82_400bps_sup_v430` | R10.4.1 E8.2 (5 kHz) | v4.3.0 SUP |\n| `r1041_e82_400bps_hac_v430` | R10.4.1 E8.2 (5 kHz) | v4.3.0 HAC |\n| `r1041_e82_400bps_sup_v410` | R10.4.1 E8.2 (4 kHz) | v4.1.0 SUP |\n| `r1041_e82_400bps_hac_v410` | R10.4.1 E8.2 (4 kHz) | v4.1.0 HAC |\n\nFor other chemistries and basecaller versions (R10.4.1 E8.2 260 bps, R10.4 E8.1, earlier Guppy `g6xx` / `g5015`, v4.0.0 / v4.2.0), browse the full [model directory](https://www.bio8.cs.hku.hk/clair3/clair3_models_rerio_pytorch/) and pick the model matching your chemistry and basecaller (Dorado / Guppy) version.\n\n---\n\n## Quick Demo\n\n- **ONT with dwelling time** — [ONT Dwelling Time Quick Demo](docs/quick_demo/ont_mv_quick_demo.md)\n- **Oxford Nanopore (ONT)** — [ONT Quick Demo](docs/quick_demo/ont_quick_demo.md)\n- **PacBio HiFi** — [PacBio HiFi Quick Demo](docs/quick_demo/pacbio_hifi_quick_demo.md)\n- **Illumina NGS** — [Illumina Quick Demo](docs/quick_demo/illumina_quick_demo.md)\n\n---\n\n## Usage\n\n### General usage\n\n\u003e **Caution:** Use `=value` for all parameters, e.g. 
`--bed_fn=fn.bed` (not `--bed_fn fn.bed`).\n\n```bash\n# --platform accepts one of {ont,hifi,ilmn}\n./run_clair3.sh \\\n  --bam_fn=${BAM} \\\n  --ref_fn=${REF} \\\n  --threads=${THREADS} \\\n  --platform=ont \\\n  --model_path=${MODEL_PREFIX} \\\n  --output=${OUTPUT_DIR} \\\n  --include_all_ctgs               ## required for non-human species\n```\n\nOutputs:\n\n| File | Description |\n| --- | --- |\n| `${OUTPUT_DIR}/pileup.vcf.gz` | Pileup model calls |\n| `${OUTPUT_DIR}/full_alignment.vcf.gz` | Full-alignment model calls |\n| `${OUTPUT_DIR}/merge_output.vcf.gz` | **Final Clair3 output** |\n\nBy default, variants are called on `chr{1..22,X,Y}` and `{1..22,X,Y}`. Override with `--include_all_ctgs`, `--ctg_name`, or `--bed_fn`.\n\n\u003e `python3 run_clair3.py` is interchangeable with `./run_clair3.sh`.\n\n### Options\n\n**Required**\n\n```\n-b, --bam_fn=FILE         Indexed BAM input.\n-f, --ref_fn=FILE         Indexed FASTA reference.\n-m, --model_path=STR      Folder containing pileup.pt and full_alignment.pt.\n-t, --threads=INT         Max threads. Each chunk uses 4 threads; ceil(threads/4)*3 chunks run in parallel.\n-p, --platform=STR        {ont,hifi,ilmn}\n-o, --output=PATH         VCF/GVCF output directory.\n```\n\n**Common options**\n\n```\n    --bed_fn=FILE                     Call variants only in these BED regions.\n    --vcf_fn=FILE                     Candidate sites VCF; only call at these sites.\n    --ctg_name=STR                    Sequence(s) to process.\n    --sample_name=STR                 Sample name in the output VCF.\n    --qual=INT                        Variants with QUAL \u003e $qual are PASS, else LowQual.\n    --chunk_size=INT                  Chunk size for parallel processing. Default: 5000000.\n    --pileup_only                     Call variants with the pileup model only. Default: disable.\n    --print_ref_calls                 Include 0/0 calls in the VCF. Default: disable.\n    --include_all_ctgs                Call on all contigs. 
Default: disable (only chr{1..22,X,Y} and {1..22,X,Y} are called).\n    --gvcf                            Emit GVCF. Default: disable.\n    --remove_intermediate_dir         Drop intermediate files when no longer needed.\n```\n\n**GPU / signal-aware**\n\n```\n    --use_gpu                         Enable GPU-accelerated calling.\n    --device=STR                      GPU device(s), e.g. 'cuda:0' or 'cuda:0,1'. Default: all visible GPUs.\n    --enable_dwell_time               Signal-aware calling via Dorado mv tags (ONT only; C impl required).\n```\n\n**Phasing**\n\n```\n    --use_whatshap_for_intermediate_phasing      Default: enable.\n    --use_longphase_for_intermediate_phasing     Default: disable.\n    --use_whatshap_for_final_output_phasing      Default: disable.\n    --use_longphase_for_final_output_phasing     Default: disable.\n    --use_whatshap_for_final_output_haplotagging Default: disable.\n    --enable_phasing                             Alias of --use_whatshap_for_final_output_phasing (legacy).\n    --longphase_for_phasing                      Alias of --use_longphase_for_intermediate_phasing (legacy).\n```\n\n**External binaries**\n\n```\n    --samtools=STR     samtools \u003e= 1.10\n    --python=STR       python3 \u003e= 3.6\n    --pypy=STR         pypy3 \u003e= 3.6\n    --parallel=STR     parallel \u003e= 20191122\n    --whatshap=STR     whatshap \u003e= 1.0\n    --longphase=STR    longphase \u003e= 1.0\n```\n\n**Experimental / advanced**\n\n```\n    --snp_min_af=FLOAT        Min SNP AF. Default: ont/hifi/ilmn = 0.08.\n    --indel_min_af=FLOAT      Min indel AF. Default: ont=0.15, hifi/ilmn=0.08.\n    --var_pct_full=FLOAT      Pct of low-quality 0/1 and 1/1 pileup calls rerun in full-alignment. Default: 0.3.\n    --ref_pct_full=FLOAT      Pct of low-quality 0/0 pileup calls rerun in full-alignment. Default: 0.3 (ilmn/hifi), 0.1 (ont).\n    --var_pct_phasing=FLOAT   Pct of high-quality 0/1 pileup variants used for WhatsHap phasing.
Default: 0.8 (ont guppy5), 0.7 (others).\n    --pileup_model_prefix=STR Pileup model prefix. Default: pileup.\n    --fa_model_prefix=STR     Full-alignment model prefix. Default: full_alignment.\n    --min_mq=INT              Filter reads with MAPQ \u003c $min_mq. Default: 5.\n    --min_coverage=INT        Min coverage to call a variant. Default: 2.\n    --min_contig_size=INT     Skip contigs smaller than $min_contig_size. Default: 0.\n    --fast_mode               Skip candidates with AF \u003c= 0.15.\n    --haploid_precise         Haploid: only 1/1 is a variant.\n    --haploid_sensitive       Haploid: 0/1 and 1/1 are variants.\n    --no_phasing_for_fa       Skip WhatsHap phasing in full-alignment calling.\n    --call_snp_only           Skip indels.\n    --enable_long_indel       Call indels \u003e 50 bp.\n    --keep_iupac_bases        Keep IUPAC bases (default: convert to N).\n    --base_err=FLOAT          Estimated base error rate for GVCF. Default: 0.001.\n    --gq_bin_size=INT         GQ bin size for non-variant merging in GVCF. 
Default: 5.\n    --enable_variant_calling_at_sequence_head_and_tail\n                              Call in the first/last 16 bp of a sequence (amplicon-friendly).\n    --output_all_contigs_in_gvcf_header\n                              List all contigs in the GVCF header.\n    --disable_c_impl          Disable the C implementation for tensor creation. Default: disable (the C implementation is used).\n```\n\n### Examples\n\n#### Call variants on selected chromosomes\n\n```bash\nCONTIGS_LIST=\"[YOUR_CONTIGS_LIST]\"     # e.g. \"chr21\" or \"chr21,chr22\"\n\ndocker run -it \\\n  -v ${INPUT_DIR}:${INPUT_DIR} \\\n  -v ${OUTPUT_DIR}:${OUTPUT_DIR} \\\n  hkubal/clair3:v2.0.1 \\\n  /opt/bin/run_clair3.sh \\\n    --bam_fn=${INPUT_DIR}/input.bam \\\n    --ref_fn=${INPUT_DIR}/ref.fa \\\n    --threads=${THREADS} \\\n    --platform=ont \\\n    --model_path=/opt/models/${MODEL_NAME} \\\n    --output=${OUTPUT_DIR} \\\n    --ctg_name=${CONTIGS_LIST}\n```\n\n#### Call variants at known sites\n\n```bash\nKNOWN_VARIANTS_VCF=\"[YOUR_VCF_PATH]\"   # e.g. /home/user1/known_variants.vcf.gz\n\ndocker run -it \\\n  -v ${INPUT_DIR}:${INPUT_DIR} \\\n  -v ${OUTPUT_DIR}:${OUTPUT_DIR} \\\n  hkubal/clair3:v2.0.1 \\\n  /opt/bin/run_clair3.sh \\\n    --bam_fn=${INPUT_DIR}/input.bam \\\n    --ref_fn=${INPUT_DIR}/ref.fa \\\n    --threads=${THREADS} \\\n    --platform=ont \\\n    --model_path=/opt/models/${MODEL_NAME} \\\n    --output=${OUTPUT_DIR} \\\n    --vcf_fn=${KNOWN_VARIANTS_VCF}\n```\n\n#### Call variants in BED regions\n\n\u003e A BED file is recommended over point coordinates.\n\n```bash\n# Build a BED (0-based, \"ctg start end\") if needed\necho -e \"${CONTIGS}\\t${START_POS}\\t${END_POS}\" \u003e /home/user1/tmp.bed\n\nBED_FILE_PATH=\"[YOUR_BED_FILE]\"        # e.g.
/home/user1/tmp.bed\n\ndocker run -it \\\n  -v ${INPUT_DIR}:${INPUT_DIR} \\\n  -v ${OUTPUT_DIR}:${OUTPUT_DIR} \\\n  hkubal/clair3:v2.0.1 \\\n  /opt/bin/run_clair3.sh \\\n    --bam_fn=${INPUT_DIR}/input.bam \\\n    --ref_fn=${INPUT_DIR}/ref.fa \\\n    --threads=${THREADS} \\\n    --platform=ont \\\n    --model_path=/opt/models/${MODEL_NAME} \\\n    --output=${OUTPUT_DIR} \\\n    --bed_fn=${BED_FILE_PATH}\n```\n\n#### Call variants in non-diploid organisms (haploid)\n\n```bash\n# --no_phasing_for_fa: disable phasing in full-alignment calling\n# --include_all_ctgs:  call on all contigs\n# --haploid_precise:   or use --haploid_sensitive\ndocker run -it \\\n  -v ${INPUT_DIR}:${INPUT_DIR} \\\n  -v ${OUTPUT_DIR}:${OUTPUT_DIR} \\\n  hkubal/clair3:v2.0.1 \\\n  /opt/bin/run_clair3.sh \\\n    --bam_fn=${INPUT_DIR}/input.bam \\\n    --ref_fn=${INPUT_DIR}/ref.fa \\\n    --threads=${THREADS} \\\n    --platform=ont \\\n    --model_path=/opt/models/${MODEL_NAME} \\\n    --output=${OUTPUT_DIR} \\\n    --no_phasing_for_fa \\\n    --include_all_ctgs \\\n    --haploid_precise \\\n    --enable_variant_calling_at_sequence_head_and_tail\n```\n\n---\n\n## Advanced Topics\n\n### Dwelling Time Feature\n\nClair3 v2.0 introduces **signal-aware variant calling** for Oxford Nanopore data. Dwell time (signal duration per base) extracted from BAM `mv` tags is used as an additional input channel to the full-alignment model, improving accuracy.\n\n```bash\n./run_clair3.sh \\\n  --bam_fn=input.bam \\\n  --ref_fn=ref.fa \\\n  --threads=8 \\\n  --platform=ont \\\n  --model_path=${MODEL_PATH} \\\n  --output=${OUTPUT_DIR} \\\n  --enable_dwell_time\n```\n\n**Requirements**\n\n- BAM must contain `mv` (move-table) tags from Dorado with `--emit-moves`.\n- `--platform=ont`.\n- C implementation must be enabled (default; do **not** pass `--disable_c_impl`).\n\nSee [Dwelling Time Feature](docs/dwelling_time.md) (full guide incl.
training) and the [ONT Dwelling Time Quick Demo](docs/quick_demo/ont_mv_quick_demo.md).\n\n### Dealing with amplicon data\n\n- Use `--enable_variant_calling_at_sequence_head_and_tail`.\n- If coverage is excessively high: set `--var_pct_full=1` and `--ref_pct_full=1`.\n  - Human: also set `--var_pct_phasing=1`.\n  - Non-human: add `--no_phasing_for_fa`.\n- Context: discussions [#160](https://github.com/HKU-BAL/Clair3/issues/160#issuecomment-1396743261), [#240](https://github.com/HKU-BAL/Clair3/issues/240).\n\n### Postprocessing scripts\n\n#### `SwitchZygosityBasedOnSVCalls`\n\nGiven a Clair3 VCF and a Sniffles2 SV VCF, this module re-labels Clair3 SNPs from homozygous to heterozygous when both:\n\n1. AF ≤ 0.7, and\n2. the ±16 bp flanking region falls inside one or more SV deletions.\n\nTwo INFO tags are added: `SVBASEDHET` and `ORG_CLAIR3_SCORE` (original QUAL). The SNP's new QUAL is the highest QUAL among the overlapping deletions. Inspired by Philipp Rescheneder (ONT).\n\n```bash\npypy3 ${CLAIR3_PATH}/clair3.py SwitchZygosityBasedOnSVCalls \\\n  --bam_fn input.bam \\\n  --clair3_vcf_input clair3_input.vcf.gz \\\n  --sv_vcf_input sniffle2.vcf.gz \\\n  --vcf_output output.vcf \\\n  --threads 8\n```\n\n---\n\n## Reference\n\n### Folder Structure and Submodules\n\n\u003e All submodules accept `-h` / `--help`.\n\n**`clair3/`** — not pypy-compatible, run with Python.\n\n| Submodule | Description |\n| --- | --- |\n| `CallVariants` | Call variants from a trained model and candidate tensors. |\n| `CallVarBam` | Call variants from a trained model and a BAM. |\n| `Train` | Train a model with AdamW (PyTorch). DDP via `torchrun`. Initial LR `1e-3` with warm-up. Takes tensor binaries from `Tensor2Bin`. |\n\n**`preprocess/`** — pypy-compatible unless noted.\n\n| Submodule | Description |\n| --- | --- |\n| `CheckEnvs` | Validate inputs/environment; preprocess BED; `--chunk_size` sets per-job chunk size. |\n| `CreateTensorPileup` | Generate pileup tensors for training/calling.
|\n| `CreateTensorFullAlignment` | Generate phased full-alignment tensors. |\n| `GetTruth` | Extract variants from a truth VCF (reference FASTA required if ALT contains `*`). |\n| `MergeVcf` | Merge pileup and full-alignment VCF/GVCF. |\n| `RealignReads` | Local read realignment (Illumina). |\n| `SelectCandidates` | Select pileup candidates for full-alignment calling. |\n| `SelectHetSnp` | Select heterozygous-SNP candidates for WhatsHap phasing. |\n| `SelectQual` | Select a quality cutoff from pileup results; variants below it go to phasing + full-alignment. |\n| `SortVcf` | Sort a VCF file. |\n| `SplitExtendBed` | Split BED by contig; extend by 33 bp for variant calling. |\n| `UnifyRepresentation` | Representation unification between candidates and truth. |\n| `MergeBin` | Merge tensor binaries. |\n| `CreateTrainingTensor` | Create training tensor binaries (pileup or full-alignment). |\n| `Tensor2Bin` | Combine variant/non-variant tensors into a `blosc:lz4hc` binary (**not pypy-compatible**; ~10–15 GB training memory). |\n\n### Training Data\n\nUnless the table below notes otherwise, pileup and full-alignment models were trained on four GIAB samples (HG001, HG002, HG004, HG005), holding out HG003. For ONT, a second model was trained on HG001, HG002, HG003, and HG005, holding out HG004. Chromosome 20 was excluded from all training; only chr1–19, 21, and 22 were used.\n\n| Platform | Reference | Aligner | Training samples |\n| --- | --- | --- | --- |\n| ONT | GRCh38_no_alt | minimap2 | HG001,2,(3\\|4),5 |\n| PacBio HiFi Sequel II | GRCh38_no_alt | pbmm2 | HG001,2,4,5 |\n| PacBio HiFi Revio | GRCh38_no_alt | pbmm2 | HG002,4 |\n| Illumina | GRCh38 | BWA-MEM / NovoAlign | HG001,2,4,5 |\n\nFull details and download links: [Training Data](docs/training_data.md).\n\n### VCF/GVCF Output Formats\n\nClair3 uses **VCF 4.2**. Extra INFO tags distinguish call source:\n\n- `P` — called by the pileup model.\n- `F` — called by the full-alignment model.\n\nGVCF output is **GATK-compatible** and passes GATK `ValidateVariants`.
Clair3 uses `\u003cNON_REF\u003e` (same as GATK), not DeepVariant's `\u003c*\u003e`. Merge with GLNexus — a caller-specific config is [available for download](http://www.bio8.cs.hku.hk/clair3_trio/config/clair3.yml).\n\n### Model Training Guides\n\n- [Pileup model training](docs/pileup_training.md)\n- [Full-alignment model training](docs/full_alignment_training_r1.md)\n- [Representation unification](docs/representation_unification.md)\n- [Model migration (TensorFlow → PyTorch)](docs/model_migration_guide.md)\n\n### Visualization\n\n- [Model input visualization](docs/model_input_visualization.md)\n- [Representation unification visualization](docs/representation_unification_visualization.md)\n\n---\n\n## Citation\n\n| Paper | Venue | Topic |\n| --- | --- | --- |\n| Symphonizing pileup and full-alignment for deep learning-based long-read variant calling | [Nature Computational Science](https://rdcu.be/c1TPa) · [bioRxiv preprint](https://www.biorxiv.org/content/10.1101/2021.12.29.474431v2) | Original Clair3 |\n| Accelerated long-read variant calling with Clair3 for whole-genome sequencing | [Bioinformatics, 2026](https://doi.org/10.1093/bioinformatics/btag181) | GPU-accelerated Clair3 |\n| Leveraging ONT move table values for signal aware variant calling | [bioRxiv preprint, 2026](https://www.biorxiv.org/content/10.64898/2026.02.13.705285v1) | ONT `mv`-tag (move-table) signal-aware tuning |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FHKU-BAL%2FClair3","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FHKU-BAL%2FClair3","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FHKU-BAL%2FClair3/lists"}