{"id":35048676,"url":"https://github.com/dellytools/sansa","last_synced_at":"2025-12-27T09:04:36.270Z","repository":{"id":51939609,"uuid":"310290461","full_name":"dellytools/sansa","owner":"dellytools","description":"Structural variant VCF annotation, duplicate removal and comparison","archived":false,"fork":false,"pushed_at":"2025-02-26T10:13:28.000Z","size":208,"stargazers_count":29,"open_issues_count":4,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-02-26T10:31:05.147Z","etag":null,"topics":["delly","gene-annotation","structural-variation","sv-annotation","sv-merging","vcf-annotation","vcf-comparison","vcf-filtering"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dellytools.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-11-05T12:22:55.000Z","updated_at":"2025-02-26T10:13:31.000Z","dependencies_parsed_at":"2024-12-08T20:35:33.084Z","dependency_job_id":null,"html_url":"https://github.com/dellytools/sansa","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/dellytools/sansa","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dellytools%2Fsansa","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dellytools%2Fsansa/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dellytools%2Fsansa/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dellytools%2Fsansa
/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dellytools","download_url":"https://codeload.github.com/dellytools/sansa/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dellytools%2Fsansa/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28076552,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-27T02:00:05.897Z","response_time":58,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["delly","gene-annotation","structural-variation","sv-annotation","sv-merging","vcf-annotation","vcf-comparison","vcf-filtering"],"created_at":"2025-12-27T09:04:30.792Z","updated_at":"2025-12-27T09:04:36.257Z","avatar_url":"https://github.com/dellytools.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat-square)](http://bioconda.github.io/recipes/sansa/README.html)\n[![Anaconda-Server Badge](https://anaconda.org/bioconda/sansa/badges/downloads.svg)](https://anaconda.org/bioconda/sansa)\n[![C/C++ CI](https://github.com/dellytools/sansa/workflows/C/C++%20CI/badge.svg)](https://github.com/dellytools/sansa/actions)\n[![Docker 
CI](https://github.com/dellytools/sansa/workflows/Docker%20CI/badge.svg)](https://hub.docker.com/r/dellytools/sansa/)\n[![GitHub license](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://github.com/dellytools/sansa/blob/master/LICENSE)\n[![GitHub Releases](https://img.shields.io/github/release/dellytools/sansa.svg)](https://github.com/dellytools/sansa/releases)\n\n# Sansa\n\nStructural variant (SV) annotation.\n\n## Installation\n\nThe easiest way to get sansa is to download a statically linked binary from the [sansa github release page](https://github.com/dellytools/sansa/releases/) or using [bioconda](https://anaconda.org/bioconda/sansa).\n\n`conda install -c bioconda sansa`\n\nYou can also build sansa from source using a recursive clone and make.\n\n`git clone --recursive https://github.com/dellytools/sansa.git`\n\n`cd sansa/`\n\n`make all`\n\n## Usage\n\nSansa has several subcommands\n\n`sansa annotate` for [SV annotation](https://github.com/dellytools/sansa#sv-annotation)\n\n`sansa markdup` to [mark duplicate SV sites](https://github.com/dellytools/sansa#mark-duplicates) in a multi-sample VCF file\n\n`sansa compvcf` to [compare multi-sample VCF files](https://github.com/dellytools/sansa#compare-vcfs)\n\n## SV annotation\n\nDownload an annotation database. Examples are [gnomAD-SV](https://gnomad.broadinstitute.org/) or [1000 Genomes phase 3](https://www.internationalgenome.org/phase-3-structural-variant-dataset) and then run the annotation.\n\n`sansa annotate -d gnomad_v2.1_sv.sites.vcf.gz input.vcf.gz`\n\nThe method generates two output files: `anno.bcf` with annotation SVs augmented by a unique ID (INFO/ANNOID) and `query.tsv.gz` with query SVs matched to annotation IDs.\n\n[bcftools](https://github.com/samtools/bcftools) can be used to extract all INFO fields you want as annotation. For instance, let's annotate with the VCF ID and EUR_AF for the European allele frequency in gnomad-SV. 
Always include INFO/ANNOID as the first column.\n\n`bcftools query -H -f \"%INFO/ANNOID\\t%ID\\t%INFO/EUR_AF\\n\" anno.bcf | sed -e 's/^# //' \u003e anno.tsv`\n\nLast is a simple join of query SVs with matched database SVs based on the first column (ANNOID).\n\n`join anno.tsv \u003c(zcat query.tsv.gz | sort -k 1b,1) \u003e results.tsv`\n\n## SV annotation parameters\n\n[Sansa](https://github.com/dellytools/sansa) matches SVs based on the absolute difference in breakpoint locations (`-b`) and the size ratio (`-r`) of the smaller SV compared to the larger SV. By default, the SVs need to have their start and end breakpoint within 50bp and differ in size by less than 20% (`-r 0.8`).\n\n`sansa annotate -b 50 -r 0.8 -d gnomad_v2.1_sv.sites.vcf.gz input.vcf.gz`\n\nBy default, [sansa](https://github.com/dellytools/sansa) only reports the best matching SVs. You can change the matching strategy to `all` using `-s`.\n\n`sansa annotate -s all -d gnomad_v2.1_sv.sites.vcf.gz input.vcf.gz`\n\nYou can also include unmatched query SVs in the output using `-m`.\n\n`sansa annotate -m -d gnomad_v2.1_sv.sites.vcf.gz input.vcf.gz`\n\nBy default, SVs are only compared within the same SV type (DELs with DELs, INVs with INVs, and so on). For [delly](https://github.com/dellytools/delly) this comparison is INFO/CT aware. You can deactivate this SV type check using `-n`.\n\n`sansa annotate -n -d gnomad_v2.1_sv.sites.vcf.gz input.vcf.gz`\n\n## Feature/Gene annotation\n\nBased on a distance cutoff (`-t`) [sansa](https://github.com/dellytools/sansa) matches SVs to nearby genes. The gene annotation file can be in [gtf/gff2](https://en.wikipedia.org/wiki/General_feature_format) or [gff3](https://en.wikipedia.org/wiki/General_feature_format) format.\n\n`sansa annotate -g Homo_sapiens.GRCh37.87.gtf.gz input.vcf.gz`\n\n`sansa annotate -i Name -g Homo_sapiens.GRCh37.87.gff3.gz input.vcf.gz`\n\nThe output has 2 columns for genes near the SV start breakpoint and genes near the SV end breakpoint. 
For each gene, the output lists the gene name and, in parentheses, the distance (negative values: before SV breakpoint, 0: SV breakpoint within gene, positive values: after SV breakpoint) and the strand of the gene (+/-/*).\n\nYou can also use the Ensembl gene id or annotate exons instead of genes.\n\n`sansa annotate -i gene_id -g Homo_sapiens.GRCh37.87.gff3.gz input.vcf.gz`\n\n`sansa annotate -f exon -i exon_id -g Homo_sapiens.GRCh37.87.gff3.gz input.vcf.gz`\n\nGene and SV annotation can be run in a single command.\n\n`sansa annotate -g Homo_sapiens.GRCh37.87.gtf.gz -d gnomad_v2.1_sv.sites.vcf.gz input.vcf.gz`\n\n## Discovering gene fusion candidates\n\nUsing [delly](https://github.com/dellytools/delly) and the `INFO/CT` values one can identify gene fusion candidates. Here is the mapping from gene strand to CT values with classical cancer genomics examples (GRCh37 coordinates).\n\n| chr  | start     | chr2 | end       | svtype | ct   | startfeature  | endfeature    |\n|------|-----------|------|-----------|--------|------|---------------|---------------|\n| chrA | posStart  | chrA | posEnd    | INV    | 3to3 | geneA(0;+)    | geneB(0;-)    |\n| chrA | posStart  | chrA | posEnd    | INV    | 3to3 | geneC(0;-)    | geneD(0;+)    |\n| 10   | 89672219  | 10   | 90267336  | INV    | 3to3 | PTEN(0;+)     | RNLS(0;-)     |\n| chrA | posStart  | chrA | posEnd    | DEL    | 3to5 | geneA(0;+)    | geneB(0;+)    |\n| chrA | posStart  | chrA | posEnd    | DEL    | 3to5 | geneC(0;-)    | geneD(0;-)    |\n| 21   | 39887792  | 21   | 42869743  | DEL    | 3to5 | ERG(0;-)      | TMPRSS2(0;-)  |\n| chrA | posStart  | chrA | posEnd    | DUP    | 5to3 | geneA(0;+)    | geneB(0;+)    |\n| chrA | posStart  | chrA | posEnd    | DUP    | 5to3 | geneC(0;-)    | geneD(0;-)    |\n| 7    | 138547350 | 7    | 140491430 | DUP    | 5to3 | KIAA1549(0;-) | BRAF(0;-)     |\n| chrA | posStart  | chrA | posEnd    | INV    | 5to5 | geneA(0;+)    | geneB(0;-)    |\n| chrA | posStart  | chrA | posEnd    
| INV    | 5to5 | geneC(0;-)    | geneD(0;+)    |\n| 8    | 32139712  | 8    | 33359541  | INV    | 5to5 | NRG1(0;+)     | TTI2(0;-)     |\n| chrA | posA      | chrB | posB      | BND    | 3to3 | geneA(0;+)    | geneB(0;-)    |\n| chrA | posA      | chrB | posB      | BND    | 3to3 | geneC(0;-)    | geneD(0;+)    |\n| 14   | 68316364  | 5    | 58914908  | BND    | 3to3 | RAD51B(0;+)   | PDE4D(0;-)    |\n| chrA | posA      | chrB | posB      | BND    | 3to5 | geneA(0;+)    | geneB(0;+)    |\n| chrA | posA      | chrB | posB      | BND    | 3to5 | geneC(0;-)    | geneD(0;-)    |\n| 21   | 42867595  | 7    | 14027003  | BND    | 3to5 | TMPRSS2(0;-)  | ETV1(0;-)     |\n| chrA | posA      | chrB | posB      | BND    | 5to3 | geneA(0;+)    | geneB(0;+)    |\n| chrA | posA      | chrB | posB      | BND    | 5to3 | geneC(0;-)    | geneD(0;-)    |\n| 21   | 39826990  | 1    | 205637229 | BND    | 5to3 | ERG(0;-)      | SLC45A3(0;-)  |\n| chrA | posA      | chrB | posB      | BND    | 5to5 | geneA(0;+)    | geneB(0;-)    |\n| chrA | posA      | chrB | posB      | BND    | 5to5 | geneC(0;-)    | geneD(0;+)    |\n| 3    | 169190498 | 2    | 47689038  | BND    | 5to5 | MECOM(0;-)    | MSH2(0;+)     |\n\n## Mark duplicates\n\nFor larger studies that employ single-sample calling and then merge SVs across samples, a common problem is identifying duplicate SV sites that arise from imprecise SV breakpoints. `sansa markdup` identifies duplicate sites based on genomic proximity, genotype concordance and SV allele similarity. By default, duplicate SVs need to have SV breakpoints within 50bp (`-b 50`), a reciprocal overlap of 80% (`-s 0.8`), a maximum SV allele divergence of 10% (`-s 0.1`) and a minimum fraction of shared SV carriers of 25% (`-c 0.25`). The SV allele comparison requires [delly's](https://github.com/dellytools/delly) `INFO/CONSENSUS` field as the SV haplotype. 
\n\n`sansa markdup -o rmdup.bcf pop.delly.bcf`\n\n## Compare VCFs\n\nCompare an input VCF/BCF file to a ground truth (base) VCF/BCF file.\n\n`sansa compvcf -a base.bcf input.bcf`\n\nTo compare SV site lists that lack genotypes, you need to set the minimum allele count to zero (`-e 0`).\n\n`sansa compvcf -a base.bcf -e 0 input.bcf`\n\n## Citation\n\nTobias Rausch, Thomas Zichner, Andreas Schlattl, Adrian M. Stuetz, Vladimir Benes, Jan O. Korbel.      \nDELLY: structural variant discovery by integrated paired-end and split-read analysis.     \nBioinformatics. 2012 Sep 15;28(18):i333-i339.       \n[https://doi.org/10.1093/bioinformatics/bts378](https://doi.org/10.1093/bioinformatics/bts378)\n\n## License\n\nSansa is distributed under the BSD 3-Clause license. Consult the accompanying [LICENSE](https://github.com/dellytools/sansa/blob/master/LICENSE) file for more details.\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdellytools%2Fsansa","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdellytools%2Fsansa","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdellytools%2Fsansa/lists"}