{"id":18014330,"url":"https://github.com/darkeyes/mrreg","last_synced_at":"2025-04-04T15:13:05.757Z","repository":{"id":56935157,"uuid":"260826219","full_name":"DarkEyes/MRReg","owner":"DarkEyes","description":" The framework for finding multiresolution partitions that have homogeneous linear models from multiresolution dataset.","archived":false,"fork":false,"pushed_at":"2022-08-19T15:12:23.000Z","size":1908,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-04T15:13:02.642Z","etag":null,"topics":["data-science","minimum-description-length","multiresolution","r","regression-analysis"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DarkEyes.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-05-03T04:13:13.000Z","updated_at":"2022-08-19T15:12:26.000Z","dependencies_parsed_at":"2022-08-21T01:10:09.641Z","dependency_job_id":null,"html_url":"https://github.com/DarkEyes/MRReg","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DarkEyes%2FMRReg","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DarkEyes%2FMRReg/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DarkEyes%2FMRReg/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DarkEyes%2FMRReg/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DarkEyes","download_url":"https://codeload.github.com/DarkEyes/MR
Reg/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247198466,"owners_count":20900081,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","minimum-description-length","multiresolution","r","regression-analysis"],"created_at":"2024-10-30T04:07:47.774Z","updated_at":"2025-04-04T15:13:05.738Z","avatar_url":"https://github.com/DarkEyes.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"MRReg: MDL Multiresolution Linear Regression Framework\n==========================================================\n[![Travis CI build status](https://api.travis-ci.com/DarkEyes/MRReg.svg)](https://app.travis-ci.com/github/DarkEyes/MRReg/)\n[![minimal R version](https://img.shields.io/badge/R%3E%3D-3.5.0-6666ff.svg)](https://cran.r-project.org/)\n[![CRAN Status Badge](https://www.r-pkg.org/badges/version-last-release/MRReg)](https://cran.r-project.org/package=MRReg)\n[![Download](https://cranlogs.r-pkg.org/badges/grand-total/MRReg)](https://cran.r-project.org/package=MRReg)\n[![arXiv](https://img.shields.io/badge/cs.LG-arXiv%3A1907.05234-B31B1B.svg)](https://arxiv.org/abs/1907.05234)\n[![](https://img.shields.io/badge/doi-10.1145%2F3424670-yellow)](https://doi.org/10.1145/3424670)\n[![License](https://img.shields.io/badge/License-MIT-orange.svg)](https://spdx.org/licenses/MIT.html)\n\nIn this work, we provide the framework to analyze multiresolution partitions (e.g. country, provinces, subdistrict) where each individual data point belongs to only one partition in each layer (e.g. 
i belongs to subdistrict A, province P, and country Q).\n\nWe assume that a partition in a higher layer subsumes lower-layer partitions (e.g. a nation at the 1st layer subsumes all provinces at the 2nd layer).\n\nWe are given N individuals, each with a pair of real values (x,y) generated from an independent variable X and a dependent variable Y.\nEach individual i belongs to one partition per layer.\n\nOur goal is to find the partitions, at the highest possible layer, in which all individuals share the same linear model Y=f(X), where f is a linear function.\n\nThe framework uses the Minimum Description Length (MDL) principle to infer solutions.\n\nInstallation\n------------\n\nTo install the newest version from GitHub, run the following command in the R terminal.\n\n\n``` r\nremotes::install_github(\"DarkEyes/MRReg\")\n```\nThis requires the \"remotes\" package to be installed before installing MRReg.\n\nExample: Inferred optimal homogeneous partitions\n----------------------------------------------------------------------------------\n\nIn the first step, we generate a simulation dataset.\n\nAll simulation types have three layers, except type 4, which has four layers.\n\nIn the type-1 simulation, all individuals belong to the same homogeneous partition in the first layer.\n\nThe type-2 simulation has four homogeneous partitions in the second layer. Each partition has its own model.\n\nThe type-3 simulation has eight homogeneous partitions in the third layer. Each partition has its own model.\n\nThe type-4 simulation has one homogeneous partition in the second layer, four homogeneous partitions in the third layer, and eight homogeneous partitions in the fourth layer. 
Each partition has its own model.\n\nIn this example, we use type-4 simulation.\n\n\n```{r}\nlibrary(MRReg)\n\n# Generate simulation data type 4 by having 100 individuals per homogeneous partition.\nDataT\u003c-SimpleSimulation(100,type=4)\n\ngamma \u003c- 0.05 # Gamma parameter\n\nout\u003c-FindMaxHomoOptimalPartitions(DataT,gamma)\n\n```\nThen we plot the optimal homogeneous tree.\n\n```{r}\nplotOptimalClustersTree(out)\n```\n\n\u003cimg src=\"https://github.com/DarkEyes/MRReg/raw/master/man/FIG/Htree.png\" width=\"600\"\u003e\n\nThe red nodes are homogeneous partitions.\nAll children of a homogeneous partition node share the same linear model.\n\n\nLastly, we can print the result in text form.\n\n\n```{r}\nPrintOptimalClustersResult(out, selFeature = TRUE)\n```\nThe result is below.\n```{r}\n[1] \"========== List of Optimal Clusters ==========\"\n[1] \"Layer2,ClS-C1:clustInfoRecRatio=0.08,modelInfoRecRatio=0.72, eta(C)cv=1.00\"\n[1] \"Selected features\"\n[1] 2\n[1] \"Layer3,ClS-C11:clustInfoRecRatio=0.10,modelInfoRecRatio=0.63, eta(C)cv=1.00\"\n[1] \"Selected features\"\n[1] 2\n[1] \"Layer3,ClS-C12:clustInfoRecRatio=0.10,modelInfoRecRatio=0.70, eta(C)cv=1.00\"\n[1] \"Selected features\"\n[1] 3\n[1] \"Layer3,ClS-C13:clustInfoRecRatio=0.10,modelInfoRecRatio=0.68, eta(C)cv=1.00\"\n[1] \"Selected features\"\n[1] 4\n[1] \"Layer3,ClS-C14:clustInfoRecRatio=0.09,modelInfoRecRatio=0.61, eta(C)cv=1.00\"\n[1] \"Selected features\"\n[1] 5\n[1] \"Layer4,ClS-C21:clustInfoRecRatio=NA,modelInfoRecRatio=0.61, eta(C)cv=1.00\"\n[1] \"Selected features\"\n[1] 2\n[1] \"Layer4,ClS-C22:clustInfoRecRatio=NA,modelInfoRecRatio=0.58, eta(C)cv=1.00\"\n[1] \"Selected features\"\n[1] 3\n[1] \"Layer4,ClS-C23:clustInfoRecRatio=NA,modelInfoRecRatio=0.61, eta(C)cv=1.00\"\n[1] \"Selected features\"\n[1] 4\n[1] \"Layer4,ClS-C24:clustInfoRecRatio=NA,modelInfoRecRatio=0.46, eta(C)cv=1.00\"\n[1] \"Selected features\"\n[1] 5\n[1] \"Layer4,ClS-C25:clustInfoRecRatio=NA,modelInfoRecRatio=0.55, 
eta(C)cv=1.00\"\n[1] \"Selected features\"\n[1] 6\n[1] \"Layer4,ClS-C26:clustInfoRecRatio=NA,modelInfoRecRatio=0.60, eta(C)cv=1.00\"\n[1] \"Selected features\"\n[1] 7\n[1] \"Layer4,ClS-C27:clustInfoRecRatio=NA,modelInfoRecRatio=0.63, eta(C)cv=1.00\"\n[1] \"Selected features\"\n[1] 8\n[1] \"Layer4,ClS-C28:clustInfoRecRatio=NA,modelInfoRecRatio=0.61, eta(C)cv=1.00\"\n[1] \"Selected features\"\n[1] 9\n[1] \"min eta(C)cv:1.000000\"\n```\nNote on selected features: index 1 is reserved for the intercept, and d is a selected feature if Y[i] ~ X[i,d-1] in the linear model.\nNote that the clustInfoRecRatio values are always NA for last-layer partitions.\n\nExplanation: FindMaxHomoOptimalPartitions(DataT,gamma)\n----------------------------------------------------------------------------------\n\n- INPUT: DataT$X[i,j] is the value of the jth independent variable of the ith individual. \n- INPUT: DataT$Y[i] is the value of the dependent variable of the ith individual. \n- INPUT: DataT$clsLayer[i,k] is the cluster label of the ith individual in the kth cluster layer.\n\n- OUTPUT: out$Copt[p,1] = k means that the pth member of the maximal homogeneous partition is a cluster in the kth layer, and the name of that cluster in the kth layer is Copt[p,2].\n- OUTPUT: out$Copt[p,3] is the \"Model Information Reduction Ratio\" of the pth member of the maximal homogeneous partition: a positive value means the linear model is better than the null model.\n- OUTPUT: out$Copt[p,4] is $\\eta( {C} )_{\\text{cv}}$ of the pth member of the maximal homogeneous partition. The greater Copt[p,4], the more homogeneous the cluster.\n- OUTPUT: out$models[[k]][[j]] is the linear regression model of the jth cluster in the kth layer.\n- OUTPUT: out$models[[k]][[j]]$clustInfoRecRatio is the \"Cluster Information Reduction Ratio\" between the jth cluster in the kth layer and its children clusters in the (k+1)th layer: a positive value means the current cluster is better than its children clusters. 
Hence, we should keep this cluster as a member of the maximal homogeneous partition instead of its children. \n\nCitation\n----------------------------------------------------------------------------------\nChainarong Amornbunchornvej, Navaporn Surasvadi, Anon Plangprasopchok, and Suttipong Thajchayapong (2021). Identifying Linear Models in Multi-Resolution Population Data using Minimum Description Length Principle to Predict Household Income. ACM Transactions on Knowledge Discovery from Data (TKDD), 15(2), 1-30. https://doi.org/10.1145/3424670  \n\nContact\n----------------------------------------------------------------------------------\n- Developer: C. Amornbunchornvej\u003cdiv itemscope itemtype=\"https://schema.org/Person\"\u003e\u003ca itemprop=\"sameAs\" content=\"https://orcid.org/0000-0003-3131-0370\" href=\"https://orcid.org/0000-0003-3131-0370\" target=\"orcid.widget\" rel=\"noopener noreferrer\" style=\"vertical-align:top;\"\u003e\u003cimg src=\"https://orcid.org/sites/default/files/images/orcid_16x16.png\" style=\"width:1em;margin-right:.5em;\" alt=\"ORCID iD icon\"\u003ehttps://orcid.org/0000-0003-3131-0370\u003c/a\u003e\u003c/div\u003e\n- \u003ca href=\"https://www.nectec.or.th\"\u003eStrategic Analytics Networks with Machine Learning and AI (SAI)\u003c/a\u003e, \u003ca href=\"https://www.nectec.or.th/en/\"\u003eNECTEC\u003c/a\u003e, Thailand\n- Homepage: \u003ca href=\"https://sites.google.com/view/amornbunchornvej/home\"\u003eLink\u003c/a\u003e\n \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdarkeyes%2Fmrreg","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdarkeyes%2Fmrreg","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdarkeyes%2Fmrreg/lists"}