{"id":26359459,"url":"https://github.com/karchinlab/svcfit","last_synced_at":"2025-03-16T15:59:51.220Z","repository":{"id":275055328,"uuid":"896124347","full_name":"KarchinLab/SVCFit","owner":"KarchinLab","description":null,"archived":false,"fork":false,"pushed_at":"2025-01-30T22:28:09.000Z","size":141,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-30T23:25:23.864Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KarchinLab.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-29T15:41:04.000Z","updated_at":"2025-01-30T22:28:12.000Z","dependencies_parsed_at":"2025-01-30T23:35:43.687Z","dependency_job_id":null,"html_url":"https://github.com/KarchinLab/SVCFit","commit_stats":null,"previous_names":["karchinlab/svcfit"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KarchinLab%2FSVCFit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KarchinLab%2FSVCFit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KarchinLab%2FSVCFit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KarchinLab%2FSVCFit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/KarchinLab","download_url":"https://codeload.github.com/KarchinLab/SVCFit/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"http
s://github.com","kind":"github","repositories_count":243893830,"owners_count":20364916,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-16T15:59:50.268Z","updated_at":"2025-03-16T15:59:51.214Z","avatar_url":"https://github.com/KarchinLab.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n\n# SVCFit\n\n\u003c!-- badges: start --\u003e\n\n\u003c!-- badges: end --\u003e\n\nSVCFit is a fast and scalable computational tool developed to estimate the structural variant cellular fraction (SVCF) of inversions, deletions and tandem duplications. SVCFit is designed to run in an R environment.\n\nAll open access data used in this research can be freely downloaded [here](https://doi.org/10.17632/bwzb6n3xbc.1).\n\nProtected data can be requested from the European Genome-phenome Archive (EGAD00001001343), and the script used to create the mixtures is available at \u003chttps://github.com/mcmero/SVclone_Rmarkdown/blob/master/make_insilico_mixtures.sh\u003e [1].\n\n## Installation\n\nTo install SVCFit from GitHub, you must have a GitHub account. If you don't have an account, first sign up at [GitHub](https://github.com/). 
You can then install SVCFit within an R environment.\n\n``` r\ninstall.packages(\"usethis\")\ninstall.packages(\"remotes\")\n\nusethis::use_git_config(user.name = \"Github_user_name\")\nusethis::create_github_token()\n\n# This will open a web page in your browser where you can sign in to GitHub.\n# Once signed in, you will be directed to a page to generate a new Personal Access Token (PAT).\n# Enter a descriptive note for your PAT. Use the default \"Scope Options\" if you're unsure about them.\n# Then click the \"Generate Token\" button and copy the generated PAT.\n# Make sure to store it securely in a text file or a password manager.\n\ncredentials::set_github_pat()\n\n# A pop-up screen will appear with a box to enter your PAT. Enter it there.\n\nremotes::install_github(\"KarchinLab/SVCFit\", build_vignettes = TRUE, dependencies = TRUE)\n```\n\n## Input your structural variants into SVCFit\n\nSVCFit is designed to take input from Variant Call Format (VCF) files. By default, it accepts the VCF format produced by the Manta package [2]. If you have VCF output from a structural variant caller other than Manta, please modify it to match this format:\n\n| CHROM | POS | ID | REF | ALT | QUAL | FILTER | INFO | FORMAT | normal | tumor |\n|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|\n| chr1 | 1000 | INV:6:0:1:0:0:0 | T | \u003cINV\u003e | . | PASS | END=1500;SVTYPE=INV;SVLEN=500 | PR:SR | 20,30:19,27 | 23,0:17,0 |\n| chr2 | 5000 | DEL:7:0:1:0:0:0 | G | \u003cDEL\u003e | . | PASS | END=5300;SVTYPE=DEL;SVLEN=300 | PR | 15,30 | 19,0 |\n\n## General workflow\n\n*SVCF()* is the main function in this package that wraps all functionality described below. 
All functions can also be run separately.\n\n``` r\nSVCF(vcf_path=\"~/path/to/file.vcf\", tumor_only=FALSE, length_threshold=0, overlap=TRUE, tolerance=6, window=100, multiple=FALSE, truth_path=NULL, mode=\"heritage\")\n```\n\n*SVCF()* requires two arguments and includes several optional parameters:\n\n### Required Arguments\n\n| Argument | Type | Default | Description |\n|------------------|------------------|------------------|------------------|\n| `vcf_path` | Character | `NULL` | Path to VCF files. |\n| `tumor_only` | Boolean | `FALSE` | Whether the VCF is created without a matched normal sample. |\n\n### Optional Arguments\n\n| Argument | Type | Default | Description |\n|------------------|------------------|------------------|------------------|\n| `length_threshold` | Integer | `0` | Structural variant length filter threshold. |\n| `overlap` | Boolean | `FALSE` | Whether structural variants should be filtered based on coordinate overlap. |\n| `tolerance` | Integer | `6` | Maximum distance between structural variants to be considered as a single variant. |\n| `window` | Integer | `1000` | Number of structural variants checked for overlap to form a single variant. |\n| `multiple` | Boolean | `FALSE` | Whether the sample has multiple clones (used in simulated data for assigning clones). |\n| `truth_path` | Character | `NULL` | Path to BED files storing true structural variant information with clonal assignment. Each BED file should be named like `\"c1.bed, c2.bed\"`, etc. Structural variants should be saved in separate BED files if they belong to different (sub)clones. |\n| `mode` | Character | `\"heritage\"` | Describes how true clonal information is saved:\u003cbr\u003e- **`\"heritage\"`**: BED files for child clones contain all ancestral structural variants of their parents.\u003cbr\u003e- **`\"separate\"`**: Child clones do not contain any ancestral structural variants. |\n\nThe steps executed by *SVCF()* are:\n\n### 1. 
Extract information from input VCF (*extract_info()*)\n\nThis step assigns column names and extracts key information for downstream calculation and filtering. Key information includes reads, structural variant length, and structural variant coordinates.\n\nThis function has 3 arguments:\n\n| Argument | Type | Default | Description |\n|------------------|------------------|------------------|------------------|\n| `vcf_path` | Character | `NULL` | Path to VCF files. |\n| `tumor_only` | Boolean | `FALSE` | Whether the VCF is created without a matched normal sample. |\n| `length_threshold` | Numeric | `0` | Structural variant length filter threshold. |\n\nFor example, the following command will generate an annotated VCF file with all structural variants with length\\\u003e50\n\n``` r\nvcf \u003c- extract_info(\"~/path/to/file.vcf\", tumor_only=TRUE, length_threshold=50)\n```\n\nThe output from *extract_info()* will be in annotated VCF format.\n\n### 2. Check overlapping structural variants (*check_overlap()*)\n\nThis step checks if structural variants in VCF files are close enough to be considered as a single structural variant. Skipping this step doesn't affect the workflow.\n\nThis function has 4 arguments:\n\n| Argument | Type | Default | Description |\n|------------------|------------------|------------------|------------------|\n| `dat` | DataFrame | N/A | A dataframe to be compared. This dataframe should be the output of `extract_info` (an annotated VCF file) and contains structural variant genome coordinates. |\n| `compare` | DataFrame | N/A | A dataframe used as a reference for comparison. This dataframe should be the output of `extract_info` (an annotated VCF file) and contains structural variant genome coordinates. |\n| `tolerance` | Integer | `6` | Threshold for the maximum distance between structural variants to be considered as a single variant. |\n| `window` | Integer | `1000` | Number of structural variants checked for overlap to form a single variant. 
|\n\nNote: When *dat* and *compare* are the same dataframe, this function merges overlapping structural variants and recomputes the reads supporting the merged structural variant as the average over the overlapping structural variants. When *compare* is the ground truth from a simulation, this function also removes false positive structural variants that were not included in the simulation.\n\n``` r\nchecked \u003c- check_overlap(dat, compare)\n```\n\n### 3. Calculate SVCF for structural variants (*calculate_svcf()*)\n\nThis step calculates the structural variant cellular fraction (SVCF) for all structural variants in the input VCF file.\n\n``` r\noutput \u003c- calculate_svcf(checked, tumor_only=FALSE)\n```\n\nThis function has 2 arguments:\n\n| Argument | Type | Default | Description |\n|------------------|------------------|------------------|------------------|\n| `dat` | DataFrame | N/A | Stores the structural variant information used to calculate SVCF. This dataframe should be the output of `extract_info` (an annotated VCF file). |\n| `tumor_only` | Boolean | `FALSE` | Whether the VCF is created without a matched normal sample. |\n\nThe output is an annotated VCF with additional fields for VAF, Rbar, r and SVCF. VAF = variant allele frequency; Rbar = average break interval count in a sample; r = inferred integer copy number of break intervals; SVCF = structural variant cellular fraction.\n\n### 4. Additional functions\n\n*attach_clone()* and *read_clone()* assign structural variants to tumor clones when the assignment is known.\n\n4.1 Read clonal assignment\n\n``` r\ntruth \u003c- read_clone(truth_path, mode=\"heritage\")\n```\n\nThis function has 2 arguments:\n\n| Argument | Type | Default | Description |\n|------------------|------------------|------------------|------------------|\n| `truth_path` | Character | N/A | Path to BED files storing true structural variant information with clonal assignment. 
Each BED file should be named like `\"c1.bed, c2.bed\"`, etc. Structural variants should be saved in separate BED files if they belong to different (sub)clones. |\n| `mode` | Character | `\"heritage\"` | Describes how true clonal information is saved:\u003cbr\u003e- **`\"heritage\"`**: BED files for child clones contain all ancestral structural variants of their parents.\u003cbr\u003e- **`\"separate\"`**: Child clones do not contain any ancestral structural variants. |\n\nThe file path should follow this structure:\n\n``` r\nroot/\n├── true_clone/\n│   ├── c1.bed/\n│   ├── c2.bed/\n│   ├── c3.bed/\n│   └── .../\n```\n\nTo determine which `mode` suits your data, refer to the illustration below. A parent node should always have a higher rank in its file name (i.e. c3.bed instead of c1.bed).\n\nAccording to the tree on the left, variants in the BED files for the two modes should look like:\n\n![](inst/extdata/tree.png){width=\"487\"}\n\n4.2 Attach clonal assignment to structural variants\n\n``` r\nattached \u003c- attach_clone(dat, truth, tolerance = 6)\n```\n\nThis function has 3 arguments:\n\n| Argument | Type | Default | Description |\n|------------------|------------------|------------------|------------------|\n| `dat` | DataFrame | N/A | Stores structural variants for clone assignment. |\n| `truth` | DataFrame | N/A | Stores the clone assignment for each structural variant designed in a simulation. |\n| `tolerance` | Integer | `6` | Sets the threshold for the maximum distance between structural variants to be considered as a single structural variant when assigning clones. |\n\n## Tutorial\n\n``` r\nlibrary(SVCFit)\nvignette(\"SVCFit_guide\", package = \"SVCFit\")\n```\n\n## Reference\n\n1.  Cmero, Marek, Yuan, Ke, Ong, Cheng Soon, Schröder, Jan, Corcoran, Niall M., Papenfuss, Tony, et al., \"Inferring Structural Variant Cancer Cell Fraction,\" Nature Communications, 11(1) (2020), 730.\n2.  Chen, X. et al. 
(2016) Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics, 32, 1220-1222. \u003cdoi:10.1093/bioinformatics/btv710\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkarchinlab%2Fsvcfit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkarchinlab%2Fsvcfit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkarchinlab%2Fsvcfit/lists"}