{"id":33051604,"url":"https://github.com/walaj/svaba","last_synced_at":"2026-01-21T22:43:18.890Z","repository":{"id":11076479,"uuid":"68258647","full_name":"walaj/svaba","owner":"walaj","description":"Structural variation and indel detection by local assembly","archived":false,"fork":false,"pushed_at":"2025-09-16T18:01:20.000Z","size":42733,"stargazers_count":249,"open_issues_count":75,"forks_count":48,"subscribers_count":16,"default_branch":"master","last_synced_at":"2025-12-06T08:32:15.041Z","etag":null,"topics":["assembled-contigs","c-plus-plus","indels","structural-variations","variants"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/walaj.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2016-09-15T01:52:17.000Z","updated_at":"2025-11-06T04:08:45.000Z","dependencies_parsed_at":"2024-03-25T15:05:17.864Z","dependency_job_id":"e0c02d70-7a55-404b-a7c2-f522c5c9e07c","html_url":"https://github.com/walaj/svaba","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/walaj/svaba","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/walaj%2Fsvaba","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/walaj%2Fsvaba/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/walaj%2Fsvaba/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/repositories/walaj%2Fsvaba/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/walaj","download_url":"https://codeload.github.com/walaj/svaba/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/walaj%2Fsvaba/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28645551,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-21T21:29:11.980Z","status":"ssl_error","status_checked_at":"2026-01-21T21:24:31.872Z","response_time":86,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["assembled-contigs","c-plus-plus","indels","structural-variations","variants"],"created_at":"2025-11-14T03:00:27.118Z","updated_at":"2026-01-21T22:43:18.875Z","avatar_url":"https://github.com/walaj.png","language":"C++","funding_links":[],"categories":["based"],"sub_categories":[],"readme":"[![Build Status](https://travis-ci.org/walaj/svaba.svg?branch=master)](https://travis-ci.org/walaj/svaba)\n\n## *SvABA* - Structural variation and indel analysis by assembly\n\n[This project was formerly \"Snowman\"]\n\n**License:** [GNU GPLv3][license] \n\nTable of contents\n=================\n\n  * [Installation](#gh-md-toc)\n  * [Description](#description)\n  * [Output file description](#output-file-description)\n  * [Filtering and 
refiltering](#filtering-and-refiltering)\n  * [Recipes and examples](#recipes-and-examples)\n    * [Whole genome somatic SV and indel detection](#whole-genome-somatic-sv-and-indel-detection)\n    * [Whole genome germline SV and indel detection](#whole-genome-germline-sv-and-indel-detection)\n    * [Targeted (exome) detection](#targeted-detection)\n    * [Targeted local assembly](#targeted-local-assembly)\n    * [Assemble all reads](#assemble-all-reads)\n    * [Runtime snapshot](#snapshot-of-where-svaba-run-is-currently-operating)\n    * [Debug a local assembly and produce assembly graph](#debug-a-local-assembly-and-produce-the-assembly-graph)\n    * [View all of the ASCII alignments](#view-all-of-the-ascii-alignments)\n    * [View a particular contig with read-to-contig alignments](#view-a-particular-contig-with-read-to-contig-alignments)\n    * [Make a function to sort and index the contigs](#make-a-function-to-sort-and-index-contigs)\n  * [Attributions](#attributions)\n\n\nInstallation\n------------\nWe recommend compiling with GCC-4.8 or greater. svaba now uses CMake instead of autotools.\nNote: svaba no longer bundles htslib. You must point to a system installation of htslib at compile time.\n```\ngit clone --recursive https://github.com/walaj/svaba\ncd svaba\nmkdir build\ncd build\n## replace the paths below with the paths on your own system\ncmake .. -DHTSLIB_DIR=/home/jaw34/software/htslib-1.16\nmake\n\n## QUICK START (eg run tumor / normal on Chr22, with 4 cores)\nbuild/svaba run -t tumor.bam -n normal.bam -k 22 -G ref.fa -a test_id -p 4\n\n## get help\nsvaba --help\nsvaba run --help\n```\n\nSvABA uses the [SeqLib][seqlib] API for BAM access, BWA-MEM alignments, interval trees and operations,\nand several other auxiliary operations.\n\nDescription\n-----------\n\nSvABA is a method for detecting structural variants in sequencing data using genome-wide local assembly. 
Under the hood, \nSvABA uses a custom implementation of [SGA](https://github.com/jts/sga) (String Graph Assembler) by Jared Simpson, and [BWA-MEM](https://github.com/lh3/bwa) by Heng Li. Contigs are assembled\nfor every 25kb window (with some small overlap) for every region in the genome. The default is to use only clipped, discordant, \nunmapped and indel reads, although this can be customized to any set of reads at the command line using [VariantBam][vbam] rules. \nThese contigs are then immediately aligned to the reference with BWA-MEM and parsed to identify variants. Sequencing reads are then\nrealigned to the contigs with BWA-MEM, and variants are scored by their read support.\n\nSvABA is currently configured to provide indel and rearrangement calls (and anything \"in between\"). It can jointly call any number of BAM/CRAM/SAM files,\nand has built-in support for case-control experiments (e.g. tumor/normal, or trios or quads). In case/control mode, \nany number of cases and controls (but at least one case) can be input, and \nSvABA will jointly assemble all sequences together. If both a case and control are present, variants are output separately in \"somatic\" and \"germline\" VCFs. \nIf only a single BAM is present (input with the ``-t`` flag), a single SV and a single indel\nVCF will be emitted.\n\nA BWA-MEM-indexed reference genome must also be supplied with ``-G``.\n\n\u003cimg src=\"https://github.com/walaj/svaba/blob/master/gitfig_schematic.png\"\nwidth=800/\u003e\n\nOutput file description\n-----------------------\n\n##### ``*.bps.txt.gz``\nRaw, unfiltered variants. This file is parsed at the end to produce the VCF files. With the bps.txt.gz,\none can define a new set of filtering criteria (depending on sensitivity/specificity needs) using ``svaba refilter``. \n\n##### ``*.contigs.bam``\nAll assembly contigs as aligned to the reference with BWA-MEM. Note that this is an unsorted file. To view in IGV,\nit must first be sorted and indexed (e.g. 
``samtools sort -m 8G id.contigs.bam -o id.sort.bam \u0026\u0026 samtools index id.sort.bam``)\n\n##### ``*.discordants.txt.gz``\nInformation on all clusters of discordant reads identified with 2+ reads. \n\n##### ``*.log``\nLog file giving run-time information, including CPU and wall time (and how it was partitioned among the tasks), number of \nreads retrieved and contigs assembled for each region.\n\n##### ``*.alignments.txt.gz``\nAn ASCII plot of variant-supporting contigs and the BWA-MEM alignment of reads to the contigs. This file is incredibly\nuseful for debugging and visually inspecting the exact information SvABA saw when it performed variant calling. This file\nis typically quite large. The recommended usage is to identify the contig name of your variant of interest first from the VCF file \n(SCTG=contig_name). Then do ``gunzip -c id.alignments.txt.gz | grep contig_name \u003e plot.txt``. It is highly recommended that you \nview in a text editor with line wrapping turned OFF, so as to not jumble the alignments.\n\n\u003cimg src=\"https://github.com/walaj/svaba/blob/master/gitfig_ascii.png\"\nwidth=800/\u003e\n\n##### ``*.vcf``\nVCF of rearrangements and indels parsed from bps.txt.gz and with a somatic_score == 1 (somatic) or 0 (germline) and quality == PASS. *NOTE* that \nthe cutoff for rearrangement vs indel is taken from BWA-MEM, whether it produces a single gapped-alignment \nor two separate alignments. This is an arbitrary cutoff, just as there is no clear consensus distinction between what \nconstitutes an \"indel\" and a \"structural variant\". The unfiltered VCF files include non-PASS variants. \n\nFiltering and Refiltering\n-----------------------\n\nSvABA performs a series of log-likelihood calculations for each variant. The purpose is to first classify a variant as real vs artifact, \nand then to determine if the variant is somatic or germline. 
These log-likelihoods are output in the VCF and bps.txt.gz file and described here:\n* ``LOD (LO)`` - Log of the odds that the variant is real vs artifact. For indels, the likelihood of an artifact read is proportional to the length of local repeats (repeat units up to 5 bases long)\n* ``LR`` - Log of the odds that the variant has allelic fraction (AF) of 0 or \u003e=0.5. This is used for somatic vs germline classification\n* ``SL`` - Scaled LOD. The LOD score is heuristically scaled as: (min(Mapping quality #1, Mapping quality #2) - 2 * NM) / 60 * LOD\n\nSvABA can refilter the bps.txt.gz file to produce new VCFs with different stringency cutoffs. To run, the following are required:\n* ``-b`` - a BAM from the original run, which is used just for its header\n* ``-i`` - input bps.txt.gz file\n\nRecipes and examples\n--------------------\n\n#### Whole genome somatic SV and indel detection \n```\nwget \"https://data.broadinstitute.org/snowman/dbsnp_indel.vcf\" ## get a DBSNP known indel file\nDBSNP=dbsnp_indel.vcf\nCORES=8 ## set any number of cores\nREF=/seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta\n## -a is any string you like, which gives the run a unique ID\nsvaba run -t $TUM_BAM -n $NORM_BAM -p $CORES -D $DBSNP -a somatic_run -G $REF\n```\n\n#### Whole genome germline SV and indel detection\n```\n## Set -I to not do mate-region lookup if mates are mapped to different chromosomes.\n##   This is appropriate for germline analysis, where we don't have a built-in control\n##   against mapping artifacts, and we don't want to get bogged down with mate-pair\n##   lookups.\n## Set -L to 6 which means that 6 or more mate reads must be clustered to \n##   trigger a mate lookup. 
This also reduces spurious lookups as above, and is more \n##   appropriate for the expected ALT counts found in a germline sample \n##   (as opposed to impure, subclonal events in cancer that may have few discordant reads).\nsvaba run -t $GERMLINE_BAM -p $CORES -L 6 -I -a germline_run -G $REF\n```\n\n#### Targeted detection\n```\n## eg targets.bed is a set of exome capture regions\nsvaba run -t $BAM -k targets.bed -a exome_cap -G $REF\n```\n\n#### Targeted local assembly\n```\n## -k can be a chromosome, a samtools/IGV style string \n##     (e.g. 1:1,000,000-2,000,000), or a BED file\nk=chr17:7,541,145-7,621,399\nsvaba run -t $TUM_BAM -n $NORM_BAM -p $CORES -k $k  -a TP53 -G $REF\n```\n\n#### Assemble all reads\n```\n## default behavior is to assemble only clipped/discordant/unmapped/gapped reads\n## This can be overridden with the -r all flag\nsvaba run -t $BAM -r all -G $REF\n```\n\n#### Snapshot of where svaba run is currently operating\n```\ntail somatic_run.log\n```\n\n#### Debug a local assembly and produce the assembly graph\n```\nk=chr17:7,541,145-7,621,399\nsvaba run -t $BAM -a local_test -k $k --write-asqg -G $REF\n\n## plot the graph\n$GIT/svaba/R/svaba-asqg.R\n```\n\n#### View all of the ASCII alignments \n```\n## Make a read-only and no-line-wrapping version of emacs.\n## Very useful for *.alignments.txt.gz files\nfunction ev { \n  emacs \"$1\" --eval '(setq buffer-read-only t)' -nw --eval '(setq truncate-lines t)';\n  }\nev somatic_run.alignments.txt.gz \n```\n\n#### View a particular contig with read to contig alignments\n```\ngunzip -c somatic_run.alignments.txt.gz | grep c_1_123456789_123476789 \u003e c_1_123456789_123476789.alignments.txt\nev c_1_123456789_123476789.alignments.txt\n```\n\n#### Make a function to sort and index contigs\n```\nfunction sai() {\n  if [[ -f \"$1.contigs.bam\" ]]; then\n     samtools sort -m 4G \"$1.contigs.bam\" -o \"$1.contigs.sort.bam\"\n     mv \"$1.contigs.sort.bam\" \"$1.contigs.bam\"\n     samtools index \"$1.contigs.bam\"\n  fi\n}\n## for example, for 
somatic_run.contigs.bam:\nsai somatic_run\n```\n\n\nAttributions\n============\n\nSvABA is developed and maintained by Jeremiah Wala (jwala@broadinstitute.org) --  Rameen Beroukhim lab -- Dana-Farber Cancer Institute, Boston, MA. \n\nThis project was developed in collaboration with the Cancer Genome Analysis team at the Broad Institute. Particular thanks to:\n* Cheng-Zhong Zhang - Asst Prof of Biomedical Informatics, Harvard Medical School (https://dbmi.hms.harvard.edu/person/faculty/cheng-zhong-zhang)\n* Marcin Imielinski - Asst Prof of Computational Genomics, Weill Cornell Medicine (http://www.nygenome.org/lab-groups-overview/imielinski-lab/)\n\nAdditional thanks to Jared Simpson for SGA, Heng Li for htslib and BWA, and to the other developers whose  \ncode contributed to [SeqLib](https://github.com/walaj/SeqLib).\n\n[vbam]: https://github.com/walaj/VariantBam\n\n[license]: https://github.com/walaj/svaba/blob/master/LICENSE\n\n[seqlib]: https://github.com/walaj/SeqLib\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwalaj%2Fsvaba","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwalaj%2Fsvaba","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwalaj%2Fsvaba/lists"}