https://github.com/vineetver/favor-cli
From raw variants to biological mechanisms in one tool
annotation association-testing bioinformatics genomics rare-variant rust staar whole-genome-sequencing
- Host: GitHub
- URL: https://github.com/vineetver/favor-cli
- Owner: vineetver
- License: gpl-3.0
- Created: 2026-04-05T23:34:21.000Z (2 months ago)
- Default Branch: master
- Last Pushed: 2026-04-27T21:03:33.000Z (about 1 month ago)
- Last Synced: 2026-04-27T23:08:08.033Z (about 1 month ago)
- Topics: annotation, association-testing, bioinformatics, genomics, rare-variant, rust, staar, whole-genome-sequencing
- Language: Rust
- Homepage: https://github.com/vineetver/favor-cli
- Size: 1.17 MB
- Stars: 0
- Watchers: 0
- Forks: 1
- Open Issues: 38
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Security: SECURITY.md
- Agents: AGENTS.md
README
# FAVOR CLI
Raw variants in. Rare-variant results out.
Annotate. Enrich. Analyze. Interpret.
Install · Quick Start · Commands · Roadmap · Citation
---
> **Pre-1.0.** Commands and interfaces may change between releases.
## Install
```bash
curl -fsSL https://raw.githubusercontent.com/vineetver/favor-cli/master/install.sh | sh
```
## Quick Start
```bash
# 1. configure: point at a data directory + choose annotation tier
favor setup --root /data/favor --tier base
# 2. pull annotation data (~200 GB base, ~508 GB full)
favor data pull
# 3. ingest and annotate variants
favor ingest variants.vcf.gz
favor annotate variants.ingested
# 4. run STAAR rare-variant association
favor staar --genotypes cohort.vcf.gz --phenotype pheno.tsv \
  --trait-name LDL --covariates age,sex,PC1,PC2 \
  --annotations variants.annotated
```
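Step 2 downloads roughly 200 GB (base) or 508 GB (full), so it can help to confirm that the data root's filesystem has room before pulling. The snippet below is a minimal sketch, not a `favor` feature: `has_free_gb` is a hypothetical helper built on POSIX `df`, and it treats the sizes quoted above as GiB, which is an assumption.

```bash
# Hypothetical pre-flight helper (not part of favor): succeed only if the
# filesystem holding DIR has at least NEEDED_GB GiB available.
has_free_gb() {
  local dir="$1" needed_gb="$2" avail_kb
  # df -P prints one portable-format line per filesystem; -k reports
  # 1024-byte blocks, and column 4 is the available space.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge $(( needed_gb * 1024 * 1024 )) ]
}

# Example: check the current directory against the base-tier size.
# In practice, pass the directory given to `favor setup --root`.
if has_free_gb . 200; then
  echo "enough space for the base tier"
else
  echo "less than 200 GiB free here"
fi
```

The check only looks at free space at the moment it runs; other jobs writing to the same filesystem can still fill it during a long download.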
## Commands
| Command | What it does |
|---------|-------------|
| `favor setup` | Configure data root, annotation tier, environment |
| `favor data pull` | Download annotation parquets and optional packs |
| `favor ingest` | Normalize VCF/TSV/CSV into canonical parquet variant sets |
| `favor annotate` | Join variants against FAVOR base or full annotations |
| `favor enrich` | Overlay tissue-specific eQTL, regulatory, enhancer-gene data |
| `favor staar` | STAAR rare-variant association testing |
| `favor meta-staar` | Cross-study meta-analysis from summary statistics |
| `favor schema` | Inspect annotation table columns and types |
| `favor manifest` | Show installed data and available commands |
Use `--format json` for machine-readable output. Use `--dry-run` before heavy computation.
## Data layout
FAVOR CLI uses two separate storage areas:
**Data root** (`--root` during setup) holds annotation parquets shared across projects:
```
/data/favor/
base/chromosome=*/sorted.parquet # base tier (~200 GB)
full/chromosome=*/sorted.parquet # full tier (~508 GB)
tissue/ # optional enrichment packs
reference/ # gene index, cCRE regions (40 MB, always installed)
rollups/ # gene-level summaries (49 MB, always installed)
variant_in_region/ # variant-region junction (155 GB, always installed)
variant_eqtl/ # GTEx eQTL (3 GB, optional)
region_ccre_tissue_signals/ # ENCODE regulatory (18 GB, optional)
...
```
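The `chromosome=*/sorted.parquet` pattern means each tier is split into one directory per chromosome, each holding a single parquet file. As a rough illustration of that layout, the sketch below lists the partitions found under a tier and flags any that lack `sorted.parquet`. `list_partitions` is a hypothetical helper, not a `favor` command, and it assumes nothing about how chromosome values are spelled.

```bash
# Hypothetical check (not a favor command): report, per chromosome partition
# of a tier, whether its sorted.parquet file is present.
list_partitions() {
  local root="$1" tier="$2" part chrom
  for part in "$root/$tier"/chromosome=*/; do
    [ -d "$part" ] || continue        # the glob matched nothing
    part="${part%/}"
    chrom="${part##*chromosome=}"     # text after "chromosome="
    if [ -f "$part/sorted.parquet" ]; then
      printf '%s\tok\n' "$chrom"
    else
      printf '%s\tmissing\n' "$chrom"
    fi
  done
}

# Prints nothing if the tier directory does not exist.
list_partitions /data/favor base
```

`favor manifest` remains the supported way to see what is installed; this only shows how the on-disk partitioning can be read by eye or by script.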
**Project store** (`.cohort/` in your working directory) holds per-project data:
```
my_study/
.cohort/
cohorts// # built by favor ingest or favor staar
manifest.json
samples.txt
chromosome=*/
sparse_g.bin # sparse genotype matrix (mmap'd)
variants.parquet # variant metadata + STAAR weights
membership.parquet # gene-variant assignments
cache/score_cache/ # reused across mask/MAF reruns
annotations/refs.toml # attached annotation databases
```
The store root is resolved in this order of precedence: `--store-path` flag > `FAVOR_STORE` environment variable > the nearest `.cohort/` found by walking up from the working directory > `/.cohort/`.
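To make the lookup order concrete, here is a minimal sketch of the first three steps in bash. It is an illustrative re-implementation, not `favor`'s own code: the function name `resolve_store_root` and its two positional arguments are hypothetical, and the final fallback is deliberately left out, so the function simply returns a non-zero status when nothing is found.

```bash
# Hypothetical re-implementation of the documented lookup order, for
# illustration only; favor's own logic may differ in detail.
# $1 = value passed to --store-path (empty if not given)
# $2 = directory to start the upward search from (defaults to $PWD)
resolve_store_root() {
  local flag_path="$1" dir
  # 1. An explicit --store-path wins.
  if [ -n "$flag_path" ]; then
    printf '%s\n' "$flag_path"; return 0
  fi
  # 2. Next, the FAVOR_STORE environment variable.
  if [ -n "${FAVOR_STORE:-}" ]; then
    printf '%s\n' "$FAVOR_STORE"; return 0
  fi
  # 3. Otherwise walk up from the start directory looking for .cohort/.
  dir=$(cd "${2:-$PWD}" 2>/dev/null && pwd) || return 1
  while :; do
    if [ -d "${dir%/}/.cohort" ]; then
      printf '%s\n' "${dir%/}/.cohort"; return 0
    fi
    [ "$dir" = "/" ] && break
    dir=$(dirname "$dir")
  done
  # 4. The final fallback is not modelled here; report "not found" instead.
  return 1
}
```

One practical consequence of this order: a `FAVOR_STORE` exported in a shell profile takes priority over any `.cohort/` in the current project, so unset it when switching between studies.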
See [Setup guide](docs/setup.md) for detailed configuration, pack selection, HPC tips, and working directory organization.
## Resource requirements
Tested on UK Biobank (UKB) exome chr22 (~200K samples, ~400K variants, ~17K rare) with 64 GB of RAM. Whole-genome runs have not yet been tested.
```text
samples RAM notes
─────── ────── ─────────────────────────────
10K 32 GB comfortable
200K 64 GB tested (UKB exome chr22)
```
Memory, threads, and temp directory are auto-detected from SLURM and cgroup. Override with:
```text
SLURM_MEM_PER_NODE memory pool
FAVOR_KINSHIP_MEM_GB kinship budget (default 16 GB)
TMPDIR scratch space
```
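As a sketch of how these overrides might be set in a job script or interactive shell, the example below exports two of them. All values are illustrative, the `favor_tmp` subdirectory name is hypothetical, and the exact units `favor` expects for `SLURM_MEM_PER_NODE` are an assumption based on SLURM's own convention of reporting it in megabytes.

```bash
# Illustrative values only; pick numbers that match your node.
export FAVOR_KINSHIP_MEM_GB=32            # raise the kinship budget above the 16 GB default
export TMPDIR="${TMPDIR:-/tmp}/favor_tmp" # hypothetical scratch subdirectory
mkdir -p "$TMPDIR"

# Under SLURM the scheduler normally sets SLURM_MEM_PER_NODE itself (in MB)
# from the job's --mem request, so set it by hand only outside a job.
# export SLURM_MEM_PER_NODE=65536         # 64 GB expressed in MB

echo "kinship budget: ${FAVOR_KINSHIP_MEM_GB} GB, scratch: ${TMPDIR}"
```

Exports like these affect only the current shell and its child processes, so they need to appear before the `favor` invocation in the same script.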
## Docs
- **[Setup guide](docs/setup.md)** - installation, configuration, data management, HPC best practices
- [Ingest](docs/ingest.md) - VCF ingest patterns, preflight, throughput
- [Genotype store](docs/storage.md) - sparse genotype store for rare-variant analysis
- [STAAR](docs/staar.md) - null model, score test, masks, outputs, meta-analysis
- [Validation](docs/validation.md) - statistical accuracy vs R reference
- [Statistical divergences](docs/statistical-divergences.md) - known differences from R STAAR/SKAT and why
- [Performance](docs/performance.md) - benchmarks and optimization roadmap
- [Agent reference](AGENTS.md) - machine interface for LLM agents
## Roadmap
| Milestone | Focus |
|-----------|-------|
| [v0.2.0 - STAAR hardening](https://github.com/vineetver/favor-cli/milestone/1) | GRM, score validation, multi-VCF input, performance profiling |
| [v0.3.0 - MetaSTAAR](https://github.com/vineetver/favor-cli/milestone/2) | cross-biobank meta-analysis, allele flip, conditional, effect sizes |
| [v0.4.0 - Interpret](https://github.com/vineetver/favor-cli/milestone/3) | variant interpretation, fine-mapping, colocalization, V2G, tiers |
| [v0.5.0 - memory and thread pool overhaul](https://github.com/vineetver/favor-cli/milestone/5) | one compute handle, bounded scratch, machine-visible resource control |
| [v0.6.0 - storage and query engine](https://github.com/vineetver/favor-cli/milestone/6) | store format, query paths, incremental ingest, cloud I/O, agent-friendly queries |
| [v1.0.0 - Production](https://github.com/vineetver/favor-cli/milestone/4) | orchestration, provenance, QC, full test suite |
## Citation
FAVOR CLI implements the [STAAR](https://github.com/xihaoli/STAARpipeline) framework and builds on the [FAVOR](https://favor.genohub.org) annotation database. If you use this tool, please cite:
> Li Z\*, Li X\*, Zhou H, et al. **A framework for detecting noncoding rare variant associations of large-scale whole-genome sequencing studies.** *Nature Methods*, 19(12), 1599-1611 (2022). [DOI: 10.1038/s41592-022-01640-x](https://doi.org/10.1038/s41592-022-01640-x)
> Li X\*, Li Z\*, Zhou H, et al. **Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale.** *Nature Genetics*, 52(9), 969-983 (2020). [DOI: 10.1038/s41588-020-0676-4](https://doi.org/10.1038/s41588-020-0676-4)
> Zhou H, Verma V, Li X, et al. **FAVOR 2.0: A reengineered functional annotation of variants online resource for interpreting genomic variation.** *Nucleic Acids Research*, 54(D1), D1405-D1414 (2026). [DOI: 10.1093/nar/gkaf1217](https://doi.org/10.1093/nar/gkaf1217)
> Zhou H, Arapoglou T, Li X, et al. **FAVOR: functional annotation of variants online resource and annotator for variation across the human genome.** *Nucleic Acids Research*, 51(D1), D1300-D1311 (2023). [DOI: 10.1093/nar/gkac966](https://doi.org/10.1093/nar/gkac966)
> Li TC, Zhou H, Verma V, et al. **FAVOR-GPT: a generative natural language interface to whole genome variant functional annotations.** *Bioinformatics Advances*, 4(1), vbae143 (2024). [DOI: 10.1093/bioadv/vbae143](https://doi.org/10.1093/bioadv/vbae143)
## License
GPL-3.0