{"id":20921170,"url":"https://github.com/czheluo/ppisr","last_synced_at":"2026-04-21T22:02:13.181Z","repository":{"id":165921458,"uuid":"316910869","full_name":"czheluo/PPISR","owner":"czheluo","description":"Polygenic Prediction of Complex Traits with Iterative Screen Regression Models","archived":false,"fork":false,"pushed_at":"2021-04-05T15:05:52.000Z","size":12999,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-19T17:34:36.056Z","etag":null,"topics":["genomic-selection","gwas","isr","polygenic"],"latest_commit_sha":null,"homepage":"http://melworks.org/","language":"MATLAB","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/czheluo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-11-29T08:52:44.000Z","updated_at":"2021-04-05T15:05:54.000Z","dependencies_parsed_at":null,"dependency_job_id":"148448a9-d890-4119-8427-2575396eddf2","html_url":"https://github.com/czheluo/PPISR","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/czheluo%2FPPISR","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/czheluo%2FPPISR/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/czheluo%2FPPISR/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/czheluo%2FPPISR/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/czheluo","download_url":"https://c
odeload.github.com/czheluo/PPISR/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243318751,"owners_count":20272137,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["genomic-selection","gwas","isr","polygenic"],"created_at":"2024-11-18T19:42:37.135Z","updated_at":"2025-12-28T22:39:27.930Z","avatar_url":"https://github.com/czheluo.png","language":"MATLAB","funding_links":[],"categories":[],"sub_categories":[],"readme":"### Polygenic Prediction of Complex Traits with Iterative Screen Regression Models\n\n\n\u003cdiv align=\"center\"\u003e\u003ca href=\"\"\u003e\u003cimg src=\"Fig/PPISR.png\" width=\"80%\" alt=\"PPISR\"\u003e\u003c/a\u003e\u003c/div\u003e\n\n#### Comparison Results for Polygenic Prediction Models\n\n\u003cp align=\"center\"\u003e\n\u003ca href=\"/Fig/PVE5.png\"\u003e\n\u003cimg src=\"Fig/PVE5.png\" width=\"80%\"\u003e\n\u003c/a\u003e\n\u003c/p\u003e\n\n\n\u003e \u003cspan style=\"font-family: Times New Roman; font-size:1em;\"\u003e\u003cfont size=\"4\"\u003e \u003cdiv align=\"justify\"\u003e Schematic overview of the model-based iterative screening regression (ISR) for genomic selection (GS). First, the input dataset is a marker (SNP) matrix in which rows represent the genotypes of individuals in a population and columns represent markers, with alleles coded as 0, 1, and 2 (missing genotypes are replaced by the mean genotype or imputed with a more sophisticated algorithm). Second, we formulated a regression information criterion (RIC), used as the objective function, as the screening criterion for the regression model. 
This criterion was combined with the proposed iterative screening optimization procedure, which mainly consists of two steps: expansion screening and contraction selection. Third, the procedure was applied to multiple regression analysis, in which one of two models can be selected: a linear model or a binomial model (which includes epistatic effects). We use the ISR model for polygenic prediction and compare it with [DPR](http://xzlab.org/software.html), [BayesR](https://github.com/syntheke/bayesR), [BSLMM](http://xzlab.org/software.html), [rrBLUP](https://cran.r-project.org/web/packages/rrBLUP/index.html), and [Bayes A, B, C and LASSO](https://github.com/gdlc/BGLR-R).\n\u003c/div\u003e\u003c/font\u003e \u003c/span\u003e\n\n# Installation\n\u003e Only [MATLAB](https://www.mathworks.com/products/matlab.html) needs to be installed; the code runs in both Linux and Windows environments. If you are running on Linux, you also need to install [PLINK](https://www.cog-genomics.org/plink/) for data file conversion.\n# Data Preparation\n## Missing Genotypes\n\u003e \u003cspan style=\"font-family: Times New Roman; font-size:1em;\"\u003e\u003cfont size=\"4\"\u003e \u003cdiv align=\"justify\"\u003eThe ISR method relies on having complete or imputed genotype data at each SNP. That is, ISR requires the user to impute all missing genotypes before performing association testing. This imputation step is arguably preferable to simply dropping individuals with missing genotypes, since it can improve the power to detect associations. Therefore, we recommend imputing missing genotypes before fitting ISR (e.g., using [Beagle](https://faculty.washington.edu/browning/beagle/beagle.html)). 
Otherwise, missing genotypes at each SNP are replaced with the estimated mean or median genotype of that SNP in ISR.\u003c/div\u003e\u003c/font\u003e \u003c/span\u003e\n\n\n## Missing Phenotypes\n\u003e\u003cspan style=\"font-family: Times New Roman; font-size:1em;\"\u003e\u003cfont size=\"4\"\u003e \u003cdiv align=\"justify\"\u003eIndividuals with missing phenotypes will not be included in the ISR analysis. If all individuals have missing phenotypes, then no SNP and no individuals will be included in the analysis.\u003c/div\u003e\u003c/font\u003e \u003c/span\u003e\n\n\n## Input File Formats\n\u003e\u003cspan style=\"font-family: Times New Roman; font-size:1em;\"\u003e\u003cfont size=\"4\"\u003e \u003cdiv align=\"justify\"\u003e ISR requires input files containing genotypes, phenotypes, and (optionally) covariates. Genotype and phenotype files can be provided in several formats: PLINK [traw (recommended)](https://www.cog-genomics.org/plink/1.9/formats#traw), PLINK binary [bed](https://www.cog-genomics.org/plink/1.9/formats#bed), and [VCF](https://samtools.github.io/hts-specs/VCFv4.2.pdf). The bed and VCF formats are supported only on Linux, where [PLINK](https://www.cog-genomics.org/plink/2.0/) is used to recode them to the .traw format, which codes genotypes as 0/1/2.\u003c/div\u003e\u003c/font\u003e \u003c/span\u003e\n\n```matlab\n% convert PLINK .traw genotypes to MATLAB .mat format\n% (alternatively, the R.matlab R package can write the .mat format)\ntraw2mat('pop.fam','pop.traw','pop.mat',2279,355551,1,1)\n```\n\n### Phenotype File\n\u003e\u003cspan style=\"font-family: Times New Roman; font-size:1em;\"\u003e\u003cfont size=\"4\"\u003e \u003cdiv align=\"justify\"\u003e Reformat the phenotype file so that individuals appear in the same order as in the [.fam](https://www.cog-genomics.org/plink/1.9/formats#fam) file (PLINK sample information file). The file contains the six .fam columns listed below, with the first trait in column 6 and any additional traits in the following columns. Missing phenotype values are denoted as NA. 
The number of rows should equal the number of individuals in the genotype file. \u003c/div\u003e\u003c/font\u003e \u003c/span\u003e\n\n\u003cspan style=\"font-family: Times New Roman; font-size:1em;\"\u003e\u003cfont size=\"4\"\u003e \u003cdiv align=\"justify\"\u003e\n1. Family ID ('FID')\n2. Within-family ID ('IID'; cannot be '0')\n3. Within-family ID of father ('0' if father isn't in dataset)\n4. Within-family ID of mother ('0' if mother isn't in dataset)\n5. Sex code ('1' = male, '2' = female, '0' = unknown)\n6. Phenotype value of trait one ('1' = control, '2' = case, '-9'/'0'/non-numeric = missing data if case/control)\n7. Phenotype value of trait two\n8. ...\n\u003c/div\u003e\u003c/font\u003e \u003c/span\u003e\n\n# Running ISR\n\u003e Run ISR in a Linux environment. Before running, make sure you have defined a matlab alias such as \"alias matlab='/mnt/d/linux/MATLAB2016b/bin/matlab -nodesktop -nosplash -singleCompThread -logfile `date +%Y_%m_%d-%H_%M_%S`.log -r'\". A small GWAS example dataset can be run as follows:\n```matlab\n%% without covariates\n$ matlab \"PPISR_linux('phefile','../data/pop.fam','genofile','../data/pop.traw','sample',87,'nSNP',28228,'ntrait',1),exit;\"\n%% add five PCs as covariates\n$ matlab \"PPISR_linux('phefile','../data/pop.fam','genofile','../data/pop.traw','sample',87,'nSNP',28228,'ntrait',1,'ncov',5),exit;\"\n%% nonlinear model with epistasis\n$ matlab \"PPISR_linux('phefile','../data/pop.fam','genofile','../data/pop.traw','sample',87,'nSNP',28228,'ntrait',1,'ncov',5,'mdl',2),exit;\"\n%% load .mat data\n$ matlab \"PPISR_linux('matfile','demo.mat','sample',798,'nSNP',92641,'ncov',3),exit;\"\n%% Basic Usage\nOptions:\nphefile = string, phenotype file in any tab-delimited (\"\\t\") format (default = 'phe.fam')\ngenofile = string, genotype file in .traw format from PLINK (default = 'pop.traw')\noutfile = string, user-defined name of the MATLAB-format file that stores the converted genotypes (default = 'pop.traw.mat')\nsample = number, the number of individuals 
you want to analyze\nnSNP = number, the number of SNPs.\nntrait = number, the number of traits.\nchr = number, the number of chromosomes.\nopt_outresult = string, text file for the results of significant markers (default = 'ISR.opt.outresult.txt')\nall_outresult = string, text file for the results of all markers (default = 'ISR.outresult.txt')\nvcf = string, the VCF file name.\nbed = string, the bed file name.\nncov = number, the number of PCs used as covariates.\nIM = number, method for imputing missing genotypes: 1 (default) uses the mean, any other value uses the median.\nsgv = number, the significance level for the Bonferroni correction of the association test results.\nmdl = number, 1 for the linear model; 2 or 3 for the nonlinear model, i.e. Model II (without square terms, 2) or Model III (with square terms, 3).\n```\n\n## Output Files\n\u003e \u003cspan style=\"font-family: Times New Roman; font-size:1em;\"\u003e\u003cfont size=\"4\"\u003e \u003cdiv align=\"justify\"\u003e There will be two output files, both inside an output folder in the current directory. The prefix.log.txt file contains detailed information about the run parameters and computation time. In addition, prefix.log.txt contains the PVE estimate and its standard error in the null linear mixed model. \u003c/div\u003e\u003c/font\u003e \u003c/span\u003e\n\n\u003e \u003cspan style=\"font-family: Times New Roman; font-size:1em;\"\u003e\u003cfont size=\"4\"\u003e \u003cdiv align=\"justify\"\u003e The ISR.outresult.txt file contains the results for all markers. An example file with a few SNPs is shown below: \u003c/div\u003e\u003c/font\u003e \u003c/span\u003e\n\n| SNP | Chromosome | Position | Ft | P_F | Beta | SEbeta | R2 |\n|---|---|---|---|---|---|---|---|\n| chr1_3913855 | 1 | 3913855 | 0.046 | 8.3E-01 | 0.0111 | 0.0517 | 0.0001 |\n| chr1_3913895 | 1 | 3913895 | 1.737 | 1.9E-01 | 0.1237 | 0.0939 | 0.0034 |\n| chr1_4680660 | 1 | 4680660 | 0.161 | 6.9E-01 | -0.0324 | 0.0807 | 0.0003 |\n| chr1_4680664 | 1 | 4680664 | 0.572 | 4.5E-01 | -0.0841 | 0.1113 | 0.0011 |\n| chr1_4680724 | 1 | 4680724 | 0.807 | 3.7E-01 | -0.0697 | 0.0776 | 0.0016 |\n| chr1_4680752 | 1 | 4680752 | 1.333 | 2.5E-01 | -0.1330 | 0.1151 | 0.0026 |\n| chr1_4680899 | 1 | 4680899 | 3.587 | 6.2E-02 | -0.1385 | 0.0731 | 0.0069 |\n\n\u003e \u003cspan style=\"font-family: Times New Roman; font-size:1em;\"\u003e\u003cfont size=\"4\"\u003e \u003cdiv align=\"justify\"\u003e The eight columns are: SNP ID, chromosome, base-pair position on the chromosome, F value (Ft), p-value of the F test (P_F), beta estimate (Beta), standard error of beta (SEbeta), and the estimated PVE of each marker (R2). ISR.opt.outresult.txt contains only the markers that are significant after Bonferroni correction (at the 5% level by default).\u003c/div\u003e\u003c/font\u003e \u003c/span\u003e\n\n## Maize Dataset Results\n\n\u003cp align=\"center\"\u003e\n\u003ca href=\"/Fig/maize.png\"\u003e\n\u003cimg src=\"Fig/maize.png\" width=\"80%\"\u003e\n\u003c/a\u003e\n\u003c/p\u003e\n\n## Citation\nMeng Luo and Shiliang Gu (2020). [Polygenic Prediction of Complex Traits with Iterative Screen Regression Models](https://www.biorxiv.org/content/10.1101/2020.11.29.402180v1). bioRxiv.\n\n## Contact\n\nIf you are interested in our model or have any questions or comments, feel free to contact [me](mailto:czheluo@gmail.com).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fczheluo%2Fppisr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fczheluo%2Fppisr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fczheluo%2Fppisr/lists"}