{"id":17892733,"url":"https://github.com/asmagen/spagefinder","last_synced_at":"2025-03-23T01:32:03.379Z","repository":{"id":67192880,"uuid":"163208790","full_name":"asmagen/SPAGEfinder","owner":"asmagen","description":"Computational approach to identify Survival associated Pairwise Gene Expression states.","archived":false,"fork":false,"pushed_at":"2023-07-16T20:50:22.000Z","size":41294,"stargazers_count":8,"open_issues_count":0,"forks_count":4,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-18T16:22:37.139Z","etag":null,"topics":["genetic","genetic-interactions","interaction","survival-analysis","tcga-data"],"latest_commit_sha":null,"homepage":"http://dx.doi.org/10.1016/j.celrep.2019.06.067","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/asmagen.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-12-26T18:55:34.000Z","updated_at":"2023-10-08T08:04:08.000Z","dependencies_parsed_at":"2023-10-20T18:01:46.903Z","dependency_job_id":null,"html_url":"https://github.com/asmagen/SPAGEfinder","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asmagen%2FSPAGEfinder","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asmagen%2FSPAGEfinder/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asmagen%2FSPAGEfinder/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asmagen%2FSPAGEfinder/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/o
wners/asmagen","download_url":"https://codeload.github.com/asmagen/SPAGEfinder/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245044042,"owners_count":20551876,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["genetic","genetic-interactions","interaction","survival-analysis","tcga-data"],"created_at":"2024-10-28T14:43:36.854Z","updated_at":"2025-03-23T01:32:01.517Z","avatar_url":"https://github.com/asmagen.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![DOI](https://zenodo.org/badge/163208790.svg)](https://zenodo.org/badge/latestdoi/163208790)\n\n# SPAGE-finder Manual (*[Magen](https://assafmagen.com) et al*)\n\nThis repository contains code and documentation for a multi-step computational pipeline to search for Survival associated Pairwise Gene Expression states finder (SPAGE-finder).\nPreviously, the repository was called EnGIne. The [paper](http://dx.doi.org/10.1016/j.celrep.2019.06.067) describing the project can be found at:\nhttp://dx.doi.org/10.1016/j.celrep.2019.06.067\n\nThe repository can be retrieved from GitHub via the command\n*git clone https://github.com/asmagen/SPAGEfinder*\n\nThis creates a new directory called SPAGEfinder where this README.md can be found. If one reads README.md as a text file, there will be asterisks around the variable names; do not copy the asterisks when copying a command to assign a value to a variable.\n\nSPAGEfinder has two subdirectories: \n1. data\n2. 
R  \n\n# Code files\nThere are seven code files:\n- aggregateLogRankGene.cpp\n- analyze.pairwise.significance.R  \n- calculate.base.cox.model.R  \n- calculate.candidates.cox.fdr.R  \n- main.script.functions.R  \n- main.script.R \n- merge.pancancer.results.R\n\nSix of these are in R and one in C++.\nThe C++ code is compiled from within R. Therefore, one need not compile it before running the pipeline.\n\n## Input format\nThe input consists of an mRNA expression matrix and patient clinical-demographic information, stored together in a single data object and its corresponding file. The TCGA dataset (filtered to Census Cancer Genes) is provided here for quick analysis (Example dataset available: 'data/data.mRNA.RData').\n\n## Output format\nThe final list of SPAGEs is generated as a matrix in which each row represents a SPAGE annotated by a quadruple *(x,y,bin,effect)*, where x and y are the two interacting genes, bin is a number indicating the bin annotation, and effect gives the significance level. The sign of the effect represents the direction of the interaction: a positive sign represents higher survival risk, while a negative sign represents lower survival risk.\n\n## Analysis setup\n\n### UNIX commands\nConnect to the remote server [Example: `ssh USER_NAME@SERVER_ADDRESS`]  \n```git clone https://github.com/asmagen/SPAGEfinder.git```  \n\nRequest an interactive execution session if required by your system [Example: `sinteractive`]\n\nInvoke R version 3.3.1 [Example (may be different across systems): `module purge; module add R/3.3.1; R`]  \nThe following commands are set and run in the R environment.  \n\n### Install R packages\nInstall the required R packages into the default location (no need to specify where to install; enter 'yes' to indicate installation to a personal library if asked).  
\n```install.packages(pkgs = c('Rcpp','RcppArmadillo','survival','rslurm','foreach','doMC','data.table','igraph','whisker'))```  \nSpecify a repository of your choice and verify that the installations succeeded by loading the packages (Example: library('Rcpp')).  \n`source(\"https://bioconductor.org/biocLite.R\"); biocLite(\"survcomp\")`  \nVerify successful installation.\n\n### Define relevant analysis paths (in R)\n```\nr.package.path = 'USER_SET_PACKAGE_PATH' # Define path for the downloaded pipeline scripts and data (Example: '/USER/SPAGEfinder')  \nresults.path = 'USER_SET_ANALYSIS_DIRECTORY_PATH' # Define path for analysis results set by user (Example: '/USER/analysis/TCGA_analysis')  \n```\n### Assign values to additional analysis and SLURM parameters\n\nThe suggested values shown below may be adjusted by the user as needed.  \n\n*p.val.quantile.threshold* = 0.8 # Log-Rank threshold (p value quantile)  \n\nThe next 7 parameters may need to be adjusted based on the specifications of your high-performance computing system. 
Run the following command to obtain the info about the available queues, memory, walltime and num.jobs (number of concurrent jobs) resources:  \n```\nsacctmgr show qos format=name,MaxJobs,MaxWall,MaxTRES\n\n      Name MaxJobs     MaxWall       MaxTRES \n---------- ------- ----------- ------------- \n    normal                                   \n   default      16    01:00:00        mem=4G \nthroughput     125    18:00:00       mem=36G \nhigh_thro+     300    08:00:00        mem=8G \n     large       5 11-00:00:00      mem=128G \n    xlarge       1 21-00:00:00      mem=512G \n      long      16  7-00:00:00       mem=12G \nworkstati+       4  7-00:00:00       mem=48G \n```\n\nBased on this information and the scope of the analysis (whole-genome or in this example case, only about 500 genes) you would define the following parameters:  \n```\nqueues   = 'throughput' # SLURM HPC queue  \nnum.jobs = 50 # Number of concurrent jobs  \nwalltime = '1:00:00' # Time limit per job  \nmemory   = '4GB' # Memory allocation per job  \n```\nAnd the following parameters specifically for the merge.pancancer.results function as it requires more memory than the usual:  \n```\nlarge.queues   = 'throughput' # SLURM HPC queue   \nlarge.walltime = '1:00:00' # Time limit per job  \nlarge.memory   = '36GB' # Memory allocation per job  \n```\nThe appropriate parameters for whole genome analysis (analysis of about 20k genes) are:  \n*num.jobs* = 120, *walltime* = '8:00:00', *memory* = '8GB', *large.queues* = 'large', *large.walltime* = '5:00:00', *large.memory* = '120GB'  \n\nNote that in some systems there is no need to specify the queues parameter. 
If the queues specification above results in an error, use *queues = NA* and *large.queues = NA* to let the system choose the appropriate queues by itself.\n\nContinue the analysis by executing the commands in 'R/main.script.R'  \n\n### Creating new datasets for analysis\nThe function preprocess.genomic.data(r.package.path) can be used to process and bin a 'dataset' object (located at 'data/dataset.RData') that contains the following fields:  \n- *mRNA* - RSEM normalized mRNA measurements (rows corresponding to genes and columns to samples)  \n- *scna* - copy-number variation measurements (rows corresponding to genes and columns to samples)  \n- *samples* - sample IDs as factors\n- *type* - cancer types as factors\n- *sex* - sex annotation as (two) factors\n- *race* - race annotation as factors\n- *time* - patient survival as number of days to death  \n- *status* - patient death/alive status as 0 or 1, respectively  \n\nThe annotation or format of *samples, type, sex and race* is not important as long as the variables are converted to factors.\n\n### Potential problems\n\n- The following error may occasionally come up in the *get_slurm_output* function:  \n\t```'slurm_load_jobs error: Socket timed out on send/recv operation'```  \n\tThis error does not indicate that the analysis failed, only that monitoring of the job execution failed. Simply rerun the *get_slurm_output* function.  \n- The following error may come up due to insufficient resource allocation:  \n\t'The following files are missing: ... Check failed jobs error outputs'  \n\n\n[Assaf Magen Ph.D.](https://assafmagen.com) and [Ask Mendel AI](https://askmendel.ai)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fasmagen%2Fspagefinder","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fasmagen%2Fspagefinder","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fasmagen%2Fspagefinder/lists"}