{"id":19749581,"url":"https://github.com/mskcc/facets2n","last_synced_at":"2026-03-08T17:34:21.309Z","repository":{"id":59443467,"uuid":"537196331","full_name":"mskcc/facets2n","owner":"mskcc","description":"Algorithm to implement Fraction and Allelic Copy number Estimate from Tumor/normal Sequencing using unmatched normal sample(s) for log ratio calculations","archived":false,"fork":false,"pushed_at":"2023-09-15T05:42:00.000Z","size":251156,"stargazers_count":11,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-11-12T02:29:29.186Z","etag":null,"topics":["allele-specific","copy-number-variation","ngs","ngs-analysis"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mskcc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null}},"created_at":"2022-09-15T20:29:44.000Z","updated_at":"2024-09-02T07:53:40.000Z","dependencies_parsed_at":"2023-01-18T10:45:14.667Z","dependency_job_id":null,"html_url":"https://github.com/mskcc/facets2n","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mskcc%2Ffacets2n","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mskcc%2Ffacets2n/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mskcc%2Ffacets2n/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mskcc%2Ffacets2n/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mskcc","download_url":"https://codeload.github
.com/mskcc/facets2n/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":233409166,"owners_count":18671975,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["allele-specific","copy-number-variation","ngs","ngs-analysis"],"created_at":"2024-11-12T02:27:19.994Z","updated_at":"2026-03-08T17:34:21.278Z","avatar_url":"https://github.com/mskcc.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# facets2n\n\u003c!-- badges: start --\u003e\n[![R-CMD-check](https://github.com/mskcc/facets2n/workflows/R-CMD-check/badge.svg)](https://github.com/mskcc/facets2n/actions)\n[![Codecov test coverage](https://codecov.io/gh/mskcc/facets2n/branch/master/graph/badge.svg)](https://app.codecov.io/gh/mskcc/facets2n?branch=master)\n[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)\n\u003c!-- badges: end --\u003e\n\nAlgorithm to implement Fraction and Allelic Copy Number Estimates from Tumor:normal Sequencing using unmatched normal sample(s) for log ratio calculations.\n\n## Table of Contents \n   * [Aims](#key-aims)\n   * [Implementation](#sparkles-implementation-sparkles)\n   * [Requirements](#requirements)\n   * [Installation](#installation)\n   \n   Vignettes\n   \n   * [Analysis of T/N pair using set of reference normals \u0026amp; sequencing batch control (PoolNormal)](#analysis-of-tn-pair-using-set-of-reference-normals--sequencing-batch-control-poolnormal)\n     * [(1) Generate Reference 
Files](#1-generate-reference-files)\n     * [(2) snp-pileup command to generate a counts file for input data](#2-snp-pileup-command-to-generate-a-counts-file-for-input-data)\n     * [(3) Analysis](#3-analysis)\n     * [(3) Plot](#3-plot)\n     * [(4) QC fit](#4-qc-fit)\n     \n   * [Analysis of post transplant tumor samples with donor data](#analysis-of-post-transplant-tumor-samples-with-donor-data)\n     * [(1) Generate counts file for donor sample(s)](#1-generate-counts-file-for-donor-samples)\n     * [(2) Analyze and plot](#2-analyze-and-plot)\n   \n   * [Inaccurate results without baseline donor sample](#inaccurate-results-without-baseline-donor-sample)\n \n## :key: Aims \n\n1. Increase sensitivity of the original FACETS algorithm by [Shen R, \u0026 Seshan V.E. (2016) *Nucleic Acids Res.*](https://pubmed.ncbi.nlm.nih.gov/27270079/) \n     - *Problem addressed*:\n          - FFPE tumor and matched normal tissues (e.g. blood) may result in noisy log ratio calculations due to sample-specific biases, namely GC and insert size distributions, in addition to batch effects, which decreases the sensitivity of joint segmentation.\n\n2. Enable allele-specific copy number analysis for tumor samples from patients following hematopoietic stem cell transplantation (HSCT)  \n     - *Problem addressed*: \n          - Current methods cannot calculate genome-wide allele-specific copy number from post-transplant (non-autologous) samples that are chimeric for host- and donor-derived DNA.\n\u003cbr/\u003e\u003cbr/\u003e\n\n## :sparkles: Implementation :sparkles: \n\nThe normal sample with the lowest computed noise relative to the tumor is selected for copy number log2 ratio (logR) calculations, while the matched normal is always used for the variant allele log odds ratio (logOR) calculation. 
\n\n:construction: (beta) infer sex from matched normal sample, and select an unmatched normal with same sex for chrX normalization.\n\n---\n## Requirements\n-  R (\u003e= 3.4.0), pctGCdata (\u003e= 0.3.0)\n- snp-pileup and HTSlib: see [Installation and usage](https://github.com/rptashkin/facets2n/blob/master/inst/extcode/README.txt)\n- BAM file from tumor sample\n- BAM from patient matched normal sample\n- BAM(s) from unmatched normal sample(s) *(optional)*\n\u003cbr/\u003e\u003cbr/\u003e\n---\n\n## Installation\n\n```\ndevtools::install_github(\"rptashkin/facets2n\")\n```\n\n---\n\n## Analysis of T/N pair using set of reference normals \u0026 sequencing batch control (PoolNormal)\n\n#### (1) Generate Reference Files\n\nReference files only need to be generated once for a given sequencing assay and conditions. \n\n*We suggest at least 5 male and 5 female reference normal BAMs, that were processed in the laboratory and with an analysis pipeline using the same parameters of the tumor sample you are analyzing. For hybridization capture sequencing, we have found that a pooled sample of non-neoplastic tissue (e.g. 
blood from healthy donors) from 10 individuals captured and sequenced together with the data being analyzed often outperforms the matched normal and individual reference normals for minimizing noise in log ratio plots.*\n\u003cbr/\u003e\u003cbr/\u003e\n##### (1a) Reference snp-pileup\n\n```\ninst/extcode/snp-pileup-wrapper.R \\\n  --output-prefix reference_normals  \\\n  --vcf-file dbsnp_137.hg19__RmDupsClean__plusPseudo50__DROP_SORT_NOCHR.vcf \\\n  --unmatched-normal-BAMS \"\u003csome/path_to_bam_directory\u003e/*-N*.bam\"\n```\n\n##### (1b) Reference loess normalization\n\n```\nlibrary(facets2n)\nMakeLoessObject(pileup = PreProcSnpPileup(filename = \"inst/extdata/reference_normals.snp_pileup.gz\", \n  is.Reference = TRUE), \n  write.loess = TRUE,\n  outfilepath = \"inst/extdata/reference_normals.loess.txt\", is.Reference = TRUE)\n  \n```\n\n\n#### (2) snp-pileup command to generate a counts file for input data\n*Note: Multiple BAM files can also be supplied to --unmatched-normal-BAMS as a quoted, space-separated string, as an alternative to providing reference normal files, at the cost of increased run time.*\n\n```\ninst/extcode/snp-pileup-wrapper.R \\\n  --snp-pileup-path \u003coptional, path to snp-pileup executable, defaults to snp-pileup in your PATH\u003e \\\n  --vcf-file \u003cpath to SNP VCF, e.g. dbSNP\u003e \\\n  --normal-bam Normal.bam \\\n  --tumor-bam Tumor.bam \\\n  --unmatched-normal-BAMS \"\u003csome/path/PoolNormal.bam\u003e\" \\\n  --output-prefix \u003cprefix for output file, e.g. 
Normal_Tumor_PoolNormal\u003e\n```\n\nThe above command was used to generate the following counts file for testing purposes:\n\n```\ninst/extdata/Normal_Tumor_PoolNormal.snp_pileup.gz\n```\n\n\n#### (3) Analysis \n\nParse and pre-process the input counts data (~5 min): \n*Including the argument refX=TRUE in the call to readSnpMatrix() forces selection of an individual reference sample for chrX normalization (preferred).*\n\n```\nreadu \u003c- readSnpMatrix(filename = \"inst/extdata/Normal_Tumor_PoolNormal.snp_pileup.gz\",\n  MandUnormal = TRUE,\n  ReferencePileupFile = \"inst/extdata/reference_normals.snp_pileup.gz\",\n  ReferenceLoessFile = \"inst/extdata/reference_normals.loess.txt\",\n  useMatchedX = FALSE,\n  refX=TRUE)\n```\n\n```\ndata \u003c- preProcSample(readu$rcmat, unmatched = FALSE,\n  ndepth = 50,het.thresh = 0.25, ndepthmax = 5000,\n  spanT = readu$spanT, spanA=readu$spanA, spanX = readu$spanX,\n  MandUnormal = TRUE)\n```\n\u003cbr/\u003e\u003cbr/\u003e\nPerform a first pass with a high cval to determine the logR corresponding to the diploid state (dipLogR):\n\n```\npass1 \u003c- procSample(data,min.nhet = 10, cval = 150)\n```\n```\ndlr \u003c- pass1$dipLogR\n```\n\nPerform a second pass with a lower, more sensitive cval:\n```\npass2 \u003c- procSample(data,min.nhet = 10, cval = 50, dipLogR = dlr)\nfit \u003c- emcncf(pass2, min.nhet = 10)\n```\n\n#### (3) Plot\n```\npng(filename = \"tests/analysis_with_reference_normals.png\",width = 4, height = 6, units = \"in\",res = 300)\nplotSample(x=pass2,emfit=fit, plot.type = \"both\")\ndev.off()\n```\n\u003cbr/\u003e\u003cbr/\u003e\n![test-1](/tests/pngs/analysis_with_reference_normals.png)\n\u003cbr/\u003e\u003cbr/\u003e\n#### (4) QC fit\nCheck the fit with a logRlogOR spider plot:\n\n```\npng(filename = \"tests/QC_Fit.png\",width = 6, height = 6, units = \"in\",res = 300)\nlogRlogORspider(cncf = fit$cncf)\ndev.off()\n\n```\n\u003cbr/\u003e\u003cbr/\u003e\n![qc-fit](/tests/pngs/QC_Fit.png)\n\n*The segment summaries are plotted as 
circles where the size of the circle increases with the number of loci in the segment. The expected values for various integer copy number states are drawn as curves for purity ranging from 0 to 0.95. For a good fit, the segment summaries should be close to one of the lines.*\n\n\u003cbr/\u003e\u003cbr/\u003e\nSide-by-side comparison of matched vs. unmatched normal for logR calculations:\n\n  Results using matched normal for copy number logR             |  Results using unmatched normal for copy number logR\n:--------------------------------------------------------------:|:-------------------------------------------------------------------:\n![matched normal cnlr](/tests/pngs/analysis_with_matched_normal.png) | ![unmatched normal cnlr](/tests/pngs/analysis_with_reference_normals.png)\n\n\n\n---\n## Analysis of post transplant tumor samples with donor data\n*Starting with v0.3.0, allele-specific copy number analysis of transplant cases is possible by including a separate counts matrix for the donor sample(s). This requires a baseline host sample as the matched normal (e.g. 
nails or other source of non-neoplastic cells).*\n\n#### (1) Generate counts file for donor sample(s) \n\n```\n#example\n\ninst/extcode/snp-pileup-wrapper.R \\\n  --output-prefix donor  \\\n  --vcf-file dbsnp_137.hg19__RmDupsClean__plusPseudo50__DROP_SORT_NOCHR.vcf \\\n  --unmatched-normal-BAMS \"\u003csome/path_to_bam_directory\u003e/donor_sample.bam\"\n  \n```\n\n#### (2) Analyze and plot\n\n\n```\nlibrary(facets2n)\n\n#read counts matrix for tumor and baseline host sample\n\nreadu = readSnpMatrix(filename = \"inst/extdata/transplant.snp_pileup.gz\",\n     MandUnormal = TRUE,\n     ReferencePileupFile = \"inst/extdata/reference_normals.snp_pileup.gz\",\n     ReferenceLoessFile = \"inst/extdata/reference_normals.loess.txt\",\n     useMatchedX = FALSE,\n     refX = TRUE)\n```\n\n```\n#read counts matrix for donor sample\n\nreadonor = readSnpMatrix(filename = \"inst/extdata/donor.snp_pileup.gz\", donorCounts = TRUE)\n```\n\n\n```\n# preprocess sample and limit hets to those that are het in both host and donor\n\ndata \u003c- preProcSample(readu$rcmat, \n     unmatched = F,\n     ndepth = 50,\n     het.thresh = 0.3, \n     ndepthmax = 5000,\n     spanT = readu$spanT, spanA=readu$spanA, spanX = readu$spanX,\n     MandUnormal = TRUE, \n     donorCounts = readonor)\n     \n```\n\n```\npass1 \u003c- procSample(data,min.nhet = 10, cval = 150)\ndlr \u003c- pass1$dipLogR\npass2 \u003c- procSample(data,min.nhet = 10, cval = 50, dipLogR = dlr)\nfit \u003c- emcncf(pass2, min.nhet = 10)\n```\n\n```\npng(filename = \"tests/host_donor_logOR.png\",width = 4, height = 6, units = \"in\",res = 500)\nplotSample(x=pass2,emfit=fit, plot.type = \"both\")\ndev.off()\n```\n#### (3) QC fit\n```\nlogRlogORspider(cncf = fit$cncf)\n```\n### Inaccurate results without baseline donor sample\n```\n#running without baseline donor sample produces inaccurate logOR. 
Allele-specific copy number is not possible.\ndata \u003c- preProcSample(readu$rcmat, \n          unmatched = FALSE,\n          ndepth = 50,\n          het.thresh = 0.3,\n          ndepthmax = 5000,\n          spanT = readu$spanT,\n          spanA=readu$spanA,\n          spanX = readu$spanX,\n          MandUnormal = TRUE)\n\npass1 \u003c- procSample(data,min.nhet = 10, cval = 150)\ndlr \u003c- pass1$dipLogR\npass2 \u003c- procSample(data,min.nhet = 10, cval = 50, dipLogR = dlr)\nfit \u003c- emcncf(pass2, min.nhet = 10)\n```\n\n```\npng(filename = \"tests/only_host_logOR.png\",width = 4, height = 6, units = \"in\",res = 500)\nplotSample(x=pass2,emfit=fit, plot.type = \"both\")\ndev.off()\n```\n\n  Results using baseline host and donor samples           |  Results using baseline host sample only\n:--------------------------------------------------------:|:--------------------------------------------------------:\n![host and donor](tests/pngs/host_donor_logOR.png)        | ![host only](/tests/pngs/only_host_logOR.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmskcc%2Ffacets2n","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmskcc%2Ffacets2n","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmskcc%2Ffacets2n/lists"}