{"id":46575187,"url":"https://github.com/msk-access/gbcms","last_synced_at":"2026-05-15T23:15:05.948Z","repository":{"id":326100750,"uuid":"1068141861","full_name":"msk-access/gbcms","owner":"msk-access","description":"A high-performance orientation-aware genotype counting system for genomic variants","archived":false,"fork":false,"pushed_at":"2026-05-10T21:31:57.000Z","size":519087,"stargazers_count":1,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-10T22:29:23.453Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://msk-access.github.io/gbcms/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/msk-access.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-01T23:21:21.000Z","updated_at":"2026-05-04T18:19:44.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/msk-access/gbcms","commit_stats":null,"previous_names":["msk-access/py-gbcms","msk-access/gbcms"],"tags_count":17,"template":false,"template_full_name":null,"purl":"pkg:github/msk-access/gbcms","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msk-access%2Fgbcms","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msk-access%2Fgbcms/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msk-access%2Fgbcms/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msk-access%2Fgbcms/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/msk-access","download_url":"https://codeload.github.com/msk-access/gbcms/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msk-access%2Fgbcms/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33083036,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-15T20:25:35.270Z","status":"ssl_error","status_checked_at":"2026-05-15T20:25:34.732Z","response_time":103,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-03-07T09:33:33.091Z","updated_at":"2026-05-15T23:15:05.942Z","avatar_url":"https://github.com/msk-access.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# gbcms\n\n**Complete orientation-aware counting system for genomic variants**\n\n[![Tests](https://github.com/msk-access/gbcms/workflows/Tests/badge.svg)](https://github.com/msk-access/gbcms/actions)\n[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)\n[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/msk-access/gbcms)\n\n## Features\n\n- 🚀 **High Performance**: Rust-powered core engine with multi-threading\n- 🧬 **Complete Variant Support**: SNP, MNP, insertion, deletion, and complex variants (DelIns, SNP+Indel)\n- 🧪 **WFA + PairHMM Phase 3**: Pangenomic fast-path WFA alignment with PairHMM fallback for complex multi-allelic classification\n- 📊 **Orientation-Aware**: Forward and reverse strand analysis with fragment counting\n- 📏 **mFSD (Mutant Fragment Size Distribution)**: Per-allele cfDNA fragment size profiling with KS test and log-likelihood ratio\n- 🔬 **Statistical Analysis**: Fisher's exact test for strand bias (read-level and fragment-level)\n- 📁 **Flexible I/O**: VCF and MAF input/output formats\n- 🎯 **Quality Filters**: 8 configurable read and quality filtering options with heuristic BAQ\n- 🧬 **RNA Mode**: Transcriptome-aware counting with strandedness, splice detection, and A-to-I editing\n- 🔗 **UMI Support**: Molecule-level deduplication with UMI-aware fragment grouping\n- 🔧 **Normalize Command**: Standalone variant normalization (left-align + REF validation) without counting\n\n## Installation\n\n**Quick install:**\n```bash\npip install gbcms\n```\n\n**From source (requires Rust):**\n```bash\ngit clone https://github.com/msk-access/gbcms.git\ncd gbcms\npip install .\n```\n\n**Docker:**\n```bash\ndocker pull ghcr.io/msk-access/gbcms:X.Y.Z  # Replace X.Y.Z with latest from PyPI\n```\n\n\u003e 💡 Find the latest version on [PyPI](https://pypi.org/project/gbcms/) or [GHCR](https://github.com/msk-access/gbcms/pkgs/container/gbcms).\n\n📖 **Full documentation:** https://msk-access.github.io/gbcms/\n\n---\n\n## Usage\n\n`gbcms` can be used in two ways:\n\n### 🔧 Option 1: Standalone CLI (1-10 samples)\n\n**Best for:** Quick analysis, local processing, direct control\n\n```bash\ngbcms dna \\\n    --variants variants.vcf \\\n    --bam sample1.bam \\\n    --fasta reference.fa \\\n    --output-dir results/\n```\n\n**Output:** `results/sample1.vcf`\n\n**Learn more:**\n- 📘 [CLI Quick Start](https://msk-access.github.io/gbcms/getting-started/quickstart/)\n- 📖 [CLI Reference — DNA](https://msk-access.github.io/gbcms/cli/dna/)\n- 📖 [CLI Reference — RNA](https://msk-access.github.io/gbcms/cli/rna/)\n- 📖 [CLI Reference — Normalize](https://msk-access.github.io/gbcms/cli/normalize/)\n\n---\n\n### 🔄 Option 2: Nextflow Workflow (10+ samples, HPC)\n\n**Best for:** Many samples, HPC clusters (SLURM), reproducible pipelines\n\n```bash\nnextflow run nextflow/main.nf \\\n    --input samplesheet.csv \\\n    --variants variants.vcf \\\n    --fasta reference.fa \\\n    --mode dna \\\n    -profile slurm\n```\n\n**Features:**\n- ✅ Automatic parallelization across samples\n- ✅ SLURM/HPC integration\n- ✅ Container support (Docker/Singularity)\n- ✅ Resume failed runs\n\n**Learn more:**\n- 🔄 [Nextflow Workflow Guide](https://msk-access.github.io/gbcms/nextflow/)\n- 📋 [Usage Patterns Comparison](https://msk-access.github.io/gbcms/getting-started/)\n\n---\n\n## Which Should I Use?\n\n| Scenario | Recommendation |\n|----------|----------------|\n| 1-10 samples, local machine | **CLI** |\n| 10+ samples, HPC cluster | **Nextflow** |\n| Quick ad-hoc analysis | **CLI** |\n| Production pipeline | **Nextflow** |\n| Need auto-parallelization | **Nextflow** |\n| Full manual control | **CLI** |\n\n---\n\n## Quick Examples\n\n### CLI: DNA Single Sample\n```bash\ngbcms dna \\\n    --variants variants.vcf \\\n    --bam tumor.bam \\\n    --fasta hg19.fa \\\n    --output-dir results/ \\\n    --threads 4\n```\n\n### CLI: RNA-seq\n```bash\ngbcms rna \\\n    --variants variants.vcf \\\n    --bam rna_sample:aligned.bam \\\n    --fasta hg19.fa \\\n    --rna-editing-db TABLE1_hg38.txt.gz \\\n    --output-dir results/\n```\n\n### CLI: Normalize Variants\n```bash\ngbcms normalize \\\n    --variants variants.vcf \\\n    --fasta hg19.fa \\\n    --output-dir results/\n```\n\n### CLI: Multiple Samples (Sequential)\n```bash\ngbcms dna \\\n    --variants variants.vcf \\\n    --bam-list samples.txt \\\n    --fasta hg19.fa \\\n    --output-dir results/\n```\n\n### Nextflow: Many Samples (Parallel)\n```bash\n# samplesheet.csv:\n# sample,bam,bai\n# tumor1,/path/to/tumor1.bam,\n# tumor2,/path/to/tumor2.bam,\n\nnextflow run nextflow/main.nf \\\n    --input samplesheet.csv \\\n    --variants variants.vcf \\\n    --fasta hg19.fa \\\n    --mode dna \\\n    --outdir results \\\n    -profile slurm\n```\n\n---\n\n## Documentation\n\n📚 **Full Documentation:** https://msk-access.github.io/gbcms/\n\n**Quick Links:**\n- [Installation](https://msk-access.github.io/gbcms/getting-started/installation/)\n- [CLI Quick Start](https://msk-access.github.io/gbcms/getting-started/quickstart/)\n- [Nextflow Workflow](https://msk-access.github.io/gbcms/nextflow/)\n- [CLI Reference — DNA](https://msk-access.github.io/gbcms/cli/dna/)\n- [CLI Reference — RNA](https://msk-access.github.io/gbcms/cli/rna/)\n- [CLI Reference — Normalize](https://msk-access.github.io/gbcms/cli/normalize/)\n- [Input Formats](https://msk-access.github.io/gbcms/reference/input-formats/)\n- [Output Formats](https://msk-access.github.io/gbcms/reference/output-formats/)\n- [Architecture](https://msk-access.github.io/gbcms/reference/architecture/)\n\n---\n\n## Contributing\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for development guidelines.\n\nTo contribute to documentation, see the [`gh-pages` branch](https://github.com/msk-access/gbcms/tree/gh-pages).\n\n---\n\n## Citation\n\nIf you use `gbcms` in your research, please cite:\n\n\u003e Shah, R. et al. (2026). *gbcms: A high-performance orientation-aware genotype counting system for genomic variants.* Available at: https://github.com/msk-access/gbcms\n\n**BibTeX:**\n```bibtex\n@software{pygbcms,\n  author       = {Shah, Ronak and contributors},\n  title        = {gbcms: A high-performance orientation-aware genotype counting system for genomic variants},\n  year         = {2026},\n  url          = {https://github.com/msk-access/gbcms},\n  note         = {GitHub repository}\n}\n```\n\n---\n\n## License\n\nAGPL-3.0 - see [LICENSE](LICENSE) for details.\n\n---\n\n## Support\n\n- 🐛 **Issues:** https://github.com/msk-access/gbcms/issues\n- 💬 **Discussions:** https://github.com/msk-access/gbcms/discussions\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmsk-access%2Fgbcms","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmsk-access%2Fgbcms","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmsk-access%2Fgbcms/lists"}