{"id":28436001,"url":"https://github.com/immunogenomics/scent","last_synced_at":"2025-07-08T04:33:35.966Z","repository":{"id":100623681,"uuid":"541720639","full_name":"immunogenomics/SCENT","owner":"immunogenomics","description":"Single-Cell ENhancer Target gene mapping using multimodal data with ATAC + RNA","archived":false,"fork":false,"pushed_at":"2025-04-21T05:31:15.000Z","size":10138,"stargazers_count":74,"open_issues_count":1,"forks_count":10,"subscribers_count":13,"default_branch":"main","last_synced_at":"2025-06-05T21:09:42.903Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/immunogenomics.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-09-26T18:09:21.000Z","updated_at":"2025-04-03T08:45:38.000Z","dependencies_parsed_at":"2024-01-02T19:30:31.414Z","dependency_job_id":"5e368993-2b9f-445f-85d1-a3de0a5bcca0","html_url":"https://github.com/immunogenomics/SCENT","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/immunogenomics/SCENT","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/immunogenomics%2FSCENT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/immunogenomics%2FSCENT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/immunogenomics%2FSCENT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/immunogenomics%2FSCENT/manifests","owner_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/owners/immunogenomics","download_url":"https://codeload.github.com/immunogenomics/SCENT/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/immunogenomics%2FSCENT/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262361354,"owners_count":23299080,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-05T21:09:40.205Z","updated_at":"2025-06-28T01:32:24.470Z","avatar_url":"https://github.com/immunogenomics.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\n# SCENT\n\nSingle-Cell ENhancer Target gene mapping using multimodal data with ATAC + RNA\n\nThe manuscript is now published in *Nature Genetics*! (Sakaue et al. 
[\"**Tissue-specific enhancer-gene maps from multimodal single-cell data identify causal disease alleles**\"](https://www.nature.com/articles/s41588-024-01682-1))\n\n\n\n### Overview\n\nSCENT uses single-cell multimodal data (e.g., 10X Multiome RNA/ATAC) and links ATAC-seq peaks (putative enhancers) to their target genes by modeling the association between chromatin accessibility and gene expression across individual cells.\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"https://raw.githubusercontent.com/immunogenomics/SCENT/main/fig/cover_image2.png\" width=90%\u003e\n\u003c/div\u003e\n\n\n\nWe use Poisson regression to associate raw gene expression counts with binarized peak accessibility, and estimate the errors in the coefficients with a bootstrapping framework to control type I error.\n\n\n### Release notes\n\n- **v1.0.1**: Aug 2024, bug fix in the parallelization scripts in the `Parallelized Bash Script` folder\n- **v1.0.0**: Jan 2024, first official release\n\n### Installation of SCENT Package\n\nYou can install the development version of SCENT from [GitHub](https://github.com/) with:\n\n``` r\n# install.packages(\"devtools\")\ndevtools::install_github(\"immunogenomics/SCENT\")\n```\n\n\n### Requirements\n\nThe SCENT package will automatically install the R packages it depends on. The packages below will be added to your R library.\n\n- `methods`\n- `data.table`\n- `lme4`\n- `stringr`\n- `boot`\n- `MASS`\n- `Matrix`\n- `parallel`\n\nThe SCENT package also requires the command-line tool bedtools to build the list of gene-peak pair data frames used for parallelization.\n- `https://github.com/arq5x/bedtools2`\n\n\n### Example usage\n\nVignettes are posted in this GitHub repo to show two potential uses of the SCENT package.\n\n### 1.) Using SCENT interactively for testing small sets of gene-peak associations\n\nThe `SCENT_interactive.Rmd` vignette contains an example of using the SCENT package to generate results on small sets of gene-peak associations. 
\n\nIn summary, the main functionality is the SCENT object construction:\n\n```r\nlibrary(SCENT)\n\nSCENT_obj \u003c- CreateSCENTObj(rna = mrna, atac = atac, meta.data = meta,\n                            peak.info = gene_peak,\n                            covariates = c(\"log(nUMI)\",\"percent.mito\",\"sample\", \"batch\"), \n                            celltypes = \"celltype\")\n```\n\nThis is followed by the SCENT algorithm:\n\n```r\nSCENT_obj \u003c- SCENT_algorithm(object = SCENT_obj, celltype = \"Tcell\", ncores = 6, regr = 'poisson', bin = TRUE)\n```\nThe user specifies a `celltype` for association analysis (in this case `Tcell`, as found in the `meta.data` slot of the SCENT object), `ncores` for the number of cores used for parallelized bootstrapping, `regr` for the regression type (`poisson` for Poisson or `negbin` for Negative Binomial regression), and `bin` for whether to binarize ATAC counts (TRUE to binarize, FALSE otherwise).\n\nThe output of the SCENT algorithm is stored in the slot:\n```r\nSCENT_obj@SCENT.result\n```\nwhich can be saved as a text file for further downstream analysis.\n\n\nFurther information on the inputs and outputs of SCENT is detailed below:\n\n#### Arguments To `CreateSCENTObj`:\n\n| #    | Argument name (format)       | Descriptions                                                 |\n| ---- | ---------------------------- | ------------------------------------------------------------ |\n| 1    | rna (sparse matrix) | A gene-by-cell count matrix from multimodal RNA-seq data. This is a raw count matrix without any normalization. The row names should be the gene names used in the `peak.info` file. The column names are the cell names, which should be the same names used in the `cell` column of the dataframe specified for `meta.data`. Sparse matrix format is required. |\n| 2    | atac (sparse matrix) | A peak-by-cell count matrix from multimodal ATAC-seq data. This is a raw count matrix without any normalization. 
The row names should be the peak names used in the `peak.info` file. The column names are the cell names, which should be the same names used in `rna` and the `cell` column of the dataframe specified for `meta.data`. The matrix does not need to be binarized, as binarization is performed within the function. Sparse matrix format is required. |\n| 3    | meta.data (dataframe)     | A metadata data frame for cells (rows are cells, and **cell names should be in the column named \"cell\"**; see the example below). Additionally, this data frame should include the covariates to use in the model, for example % mitochondrial reads, log(nUMI), sample, and batch. Dataframe format is required. |\n| 4    | peak.info (dataframe) | A table with two columns indicating which gene-peak pairs you want to test in this chunk (see the example below); **genes should be in the 1st column and peaks in the 2nd column**. We highly recommend splitting gene-peak pairs into many chunks to increase computational efficiency (see Parallelized Jobs Info in Section 2). List(Dataframe) format, which is a list of multiple data frames for parallelization, is required. \\* |\n| 5    | covariates (a vector of character) | A character vector naming the covariates listed in the meta.data. For example, a set of covariates can be: %mitochondrial reads, log_nUMI, sample, and batch. Additionally, the user can specify transformations of the covariates, such as a log transformation of nUMI counts, for direct use in the Poisson GLM invoked by the SCENT algorithm. **We recommend that users include at least log(number_of_total_RNA_UMI_count_per_cell), since the base model is Poisson regression and the default model does not include an offset term.** |\n| 6    | celltypes (character)        | User-specified name of the cell type column in the meta.data file. This column should contain the names of the cell types you want to test in this association analysis. 
|\n\n\\* Extra Argument: The peak.info.list field can be left blank initially and a created List(Dataframe) can be constructed using the CreatePeakToGeneList function in the SCENT package. This function requires the user to specify a bed file that specifies ~500 kb windows of multiple gene loci to identify cis gene-peak pairs to test. The vignette, SCENT_parallelize.Rmd, will show steps to produce a SCENT object with a peak.info.list field that is used for parallelization in the SCENT_parallelization.R script.\n\n\n\n#### Example Formats: \nThe example format of  `peak.info` argument:\n\n```bash\n\u003e gene_peak \u003c- read.table(\"/path/to/your_gene_peak_text_file.txt\")\n\u003e head(gene_peak)\n\n    V1                      V2\n1 A1BG chr19-57849279-57850722\n2 A1BG chr19-57888160-57889279\n3 A1BG chr19-57915851-57917093\n4 A1BG chr19-57934422-57935603\n5 A1BG chr19-57946848-57948062\n```\n\nWe usually only select peaks of which the center falls within 500 kb from the target gene (*cis* analysis). 
Also, while `SCENT.R` includes a function to QC peaks and genes so that they are present in at least 5% of all cells, **it is more efficient to only include these QCed peaks and genes in `peak.info` to reduce the number of tests**.\n\n\nThe example format of the `meta.data` argument:\n\n```r\nmeta \u003c- readRDS(metafile)\nmeta$`log(nUMI)` \u003c- log(meta$nUMI)\nhead(meta)\n\n                                 cell nUMI percent.mito   sample   batch\nAAACAGCCAAGGAATC-1 AAACAGCCAAGGAATC-1 8380   0.01503428 sample_1 batch_a\nAAACAGCCAATCCCTT-1 AAACAGCCAATCCCTT-1 3771   0.02207505 sample_1 batch_a\nAAACAGCCAATGCGCT-1 AAACAGCCAATGCGCT-1 6876   0.01435579 sample_1 batch_a\nAAACAGCCACACTAAT-1 AAACAGCCACACTAAT-1 1733   0.03881841 sample_1 batch_a\nAAACAGCCACCAACCG-1 AAACAGCCACCAACCG-1 5415   0.01600768 sample_1 batch_a\nAAACAGCCAGGATAAC-1 AAACAGCCAGGATAAC-1 2759   0.02485340 sample_1 batch_a\n                   celltype  log(nUMI)\nAAACAGCCAAGGAATC-1    Tcell   9.033603\nAAACAGCCAATCCCTT-1    Tcell   8.235095\nAAACAGCCAATGCGCT-1    Tcell   8.835792\nAAACAGCCACACTAAT-1    Tcell   7.457609\nAAACAGCCACCAACCG-1    Tcell   8.596928\nAAACAGCCAGGATAAC-1    Tcell   7.922624\n```\n\n\n#### Output of SCENT (`SCENT.result` slot)\n\n```bash\n\u003e head(SCENT_obj@SCENT.result)\ngene\tpeak\tbeta\tse\tz\tp\tboot_basic_p\nA1BG\tchr19-57849279-57850722\t0.587060911718621\t0.227961010352348\t2.57526894977009\t0.0100162168431262\t0.0192\nA1BG\tchr19-57888160-57889279\t-0.0842330294127105\t0.232845263030106\t-0.361755392042566\t0.717534829528597\t0.688\nA1BG\tchr19-57915851-57917093\t-0.00971211792633636\t0.225020479431863\t-0.043161040056699\t0.965573161660521\t1\nA1BG\tchr19-57934422-57935603\t0.0136752444069743\t0.249810124611214\t0.0547425546833116\t0.956343566437322\t0.968\n```\n\nEach column is described below:\n\n| Column       | Descriptions                                                 |\n| ------------ | ------------------------------------------------------------ |\n| gene         | 
The gene in the tested gene-peak pair                        |\n| peak         | The peak in the tested gene-peak pair                        |\n| beta         | The regression coefficient from the primary Poisson regression |\n| se           | The standard error from the primary Poisson regression       |\n| z            | The Z score from the primary Poisson regression              |\n| p            | The raw p value from the primary Poisson regression          |\n| boot_basic_p | The bootstrap p value calculated from bootstrapping analyses |\n\n\n\n### 2.) Using SCENT with parallelized jobs.\n\n`SCENT_parallelization.R` is the example code necessary for running parallelized SCENT jobs.\nThis code needs a `SCENT_Object.rds` file that contains a list of gene-peak pairs. \nTo generate this object, please follow the SCENT_parallelize.Rmd vignette file.\n\nThe corresponding bash script `parallelizedSCENT.sh` contains a parallelization scheme that \ndepends on the user-defined number of gene-peak pair batches (for context, please refer to the\nSCENT_parallelize.Rmd vignette). The main part of the bash script contains the line:\n\n```bash\nRscript SCENT_parallelization.R $LSB_JOBINDEX ${num_cores} ${file_SCENT_obj} ${celltype} ${regr} ${bin} ${output_dir}\n```\n\nArguments in the bash file are user specified as follows:\n\n|#      | Argument Name | Descriptions |\n| ----  | ------------- | ------------ |\n|1    | LSB_JOBINDEX   | jobarray index specified by BSUB -J SCENT[1-100] |\n|2    | num_cores      | number of cores (ex. 6) used to parallelize the SCENT algorithm |\n|3    | file_SCENT_obj | SCENT object that contains atac_matrix, rna_matrix, metafile, peak_gene_list, etc., needed to run the SCENT algorithm |\n|4    | celltype       | User specified celltype (ex. \"Tcells\") to run the SCENT algorithm |\n|5    | regr           | User specified regression type (ex. 
\"poisson\") to run SCENT algorithm |\n|6    | bin            | User specified choice to binarize ATAC counts (ex. TRUE) |\n|7    | output_dir     | User specified directory to output the SCENT results to aggregate once completed |\n\n### Enhancer-gene links from the paper\n\nSCENT enhancer-gene linkages (FDR\u003c10%) from the 8 datasets that we described in the paper can be downloaded from the following dropbox link.\n\nhttps://www.dropbox.com/scl/fo/g20tfnwkcuhib4a6z1wp4/ABYaK5s8bwTLnzrJ0KoZn48?rlkey=j1s5365gso53r2v2dsdynnsr2\u0026st=5np1fq0a\u0026dl=0\n\n### Contact\n\nSaori Sakaue ssakaue@broadinstitute.org\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimmunogenomics%2Fscent","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fimmunogenomics%2Fscent","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimmunogenomics%2Fscent/lists"}