{"id":32204344,"url":"https://github.com/statcompute/yager","last_synced_at":"2025-10-22T04:54:15.633Z","repository":{"id":56935493,"uuid":"178591940","full_name":"statcompute/yager","owner":"statcompute","description":"General Regression Neural Networks","archived":false,"fork":false,"pushed_at":"2020-10-25T20:31:26.000Z","size":1076,"stargazers_count":11,"open_issues_count":0,"forks_count":4,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-10-22T04:53:40.415Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/statcompute.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-03-30T17:45:21.000Z","updated_at":"2024-07-27T18:51:30.000Z","dependencies_parsed_at":"2022-08-21T05:50:13.641Z","dependency_job_id":null,"html_url":"https://github.com/statcompute/yager","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/statcompute/yager","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/statcompute%2Fyager","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/statcompute%2Fyager/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/statcompute%2Fyager/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/statcompute%2Fyager/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/statcompute","download_url":"https://codeload.github.com/statcompute/yager/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/statcompute%2Fyager/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":280382997,"owners_count":26321423,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-22T02:00:06.515Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-10-22T04:54:13.106Z","updated_at":"2025-10-22T04:54:15.628Z","avatar_url":"https://github.com/statcompute.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg width=\"150\" height=\"100\" src=\"https://github.com/statcompute/yager/blob/master/code/08.jpg\"\u003e\n\u003c/p\u003e\n\n### \u003cp align=\"center\"\u003e Yet Another General Regression (YAG[e]R) \u003c/p\u003e\n### \u003cp align=\"center\"\u003e  Neural Network \u003c/p\u003e\n\n#### Introduction\n\nCompared with other types of neural networks, the [General Regression Neural Network (Specht, 1991)](https://pdfs.semanticscholar.org/45f4/3abc49a8a60e6b43ddbda5af9fc6c88d663d.pdf) is advantageous in several aspects.\n\n1. Being a universal approximator, GRNN has only one tuning parameter to control the overall generalization.\n2. The network structure of GRNN is surprisingly simple, with only one hidden layer and the number of neurons equal to the number of training samples.\n3. 
GRNN is always able to converge globally and won’t be trapped by local solutions.\n4. Training a GRNN takes a single pass, regardless of the sample size, and doesn’t require time-consuming iterations.\n5. Since any projected value of GRNN is the weighted average of training samples, predictions are bounded by the observed range.\n\nThe grnn package (https://cran.r-project.org/web/packages/grnn/index.html), which has not been updated since 2013, is the only implementation of GRNN on CRAN. It was designed elegantly with a parsimonious set of functions and leaves plenty of room for improvement. \n\nThe YAGeR project (https://github.com/statcompute/yager) is my attempt to provide an R implementation of GRNN, with several enhancements.\n\n1. While the training function **grnn.fit()** is very similar to learn() and smooth() in the grnn package, three functions were designed to provide GRNN projections. The **grnn.predone()** function generates one projected value based on an input vector. Both **grnn.predict()** and **grnn.parpred()** generate a vector of projected values based on an input matrix. The only difference is that **grnn.parpred()** runs in parallel and therefore can be 3 times faster than **grnn.predict()** on my 4-core workstation.\n2. Since tuning the only hyper-parameter is the key to GRNN training, this project provides two functions that search for the optimal parameter through n-fold cross-validation: **grnn.search_rsq()** for numeric outcomes and **grnn.search_auc()** for binary outcomes.\n3. In the **grnn.predone()** function, the default projection is based on the Euclidean distance, but there is also an option to calculate the GRNN projection based on the Manhattan distance for the sake of computational simplicity (Specht, 1991).\n\n#### Why Use GRNN?\n\nIn the banking industry, GRNN can be useful in several areas. 
First of all, it can be employed as a replacement for splines to approximate the term structure of interest rates. Secondly, like other neural networks, it can be used in Fraud Detection and Anti-Money Laundering given its flexibility. Lastly, in credit risk modeling, it can also be used to develop performance benchmarks and rapid prototypes for scorecards or Expected Loss models due to its simplicity.\n\n#### Package Dependencies\nR version 3.6, base, stats, parallel, MLmetrics, randtoolbox, lhs\n\n#### Installation\n\nDownload the [yager_0.1.1.tar.gz](https://github.com/statcompute/yager/blob/master/yager_0.1.1.tar.gz) file, save it in your working directory, and then install the package as below.\n\n```r\ninstall.packages(\"yager_0.1.1.tar.gz\", repos = NULL, type = \"source\")\n```\n\nAlternatively, you can simply install from CRAN.\n\n```r\ninstall.packages(\"yager\")\n```\n\n#### Functions\n\n```txt\nYAGeR \n  |\n  |-- 1D Random Number Generators\n  |     |-- gen_unifm(min = 0, max = 1, n, seed = 1)\n  |     |-- gen_sobol(min = 0, max = 1, n, seed = 1)   \n  |     `-- gen_latin(min = 0, max = 1, n, seed = 1) \n  |\n  |-- Training\n  |     `-- grnn.fit(x, y, w = rep(1, length(y)), sigma = 1) \n  |\n  |-- Prediction\n  |     |-- grnn.predone(net, x, type = 1) \n  |     |-- grnn.predict(net, x)  \n  |     `-- grnn.parpred(net, x)  \n  |\n  |-- Parameter Tuning\n  |     |-- grnn.search_rsq(net, sigmas, nfolds = 4, seed = 1) \n  |     |-- grnn.search_auc(net, sigmas, nfolds = 4, seed = 1)\n  |     `-- grnn.optmiz_auc(net, lower = 0, upper, nfolds = 4, seed = 1, method = 1)\n  |\n  |-- Variable Importance\n  |     |-- grnn.x_imp(net, i, class = F) \n  |     |-- grnn.imp(net, class = F)\n  |     |-- grnn.x_pfi(net, i, class = F, ntry = 1e3, seed = 1)\n  |     `-- grnn.pfi(net, class = F, ntry = 1e3, seed = 1)\n  |\n  `-- Variable Effect\n        |-- grnn.margin(net, i, plot = T) \n        `-- grnn.partial(net, i, plot = T) \n```\n\n#### Example\nIt has been 
mentioned previously that GRNN is well suited to developing performance benchmarks for a variety of risk models. People might wonder what the purpose of a performance benchmark is and why we would need one at all. Sometimes a model developer has to answer questions about how well the model will perform even before completing it. Likewise, a model validator may wonder whether the model being validated performs reasonably given the data used and the effort spent. As a result, a performance benchmark, which can be built with the same data sample but an alternative methodology, is called for to address these questions.\n\nWhile a performance benchmark can take various forms, including but not limited to business expectations, industry practices, or vendor products, a model-based approach should possess the following characteristics:  \n\n- Quick prototype with reasonable efforts\n- Comparable baseline with acceptable outcomes\n- Flexible framework without strict assumptions\n- Practical application to broad domains\n\nWith both empirical and conceptual advantages, GRNN is able to accommodate each of the above-mentioned requirements and thus can be considered an appropriate candidate for developing performance benchmarks for a wide variety of models. \n\nBelow is an example illustrating how to use GRNN to develop a benchmark model for the logistic regression shown in https://github.com/statcompute/MonotonicBinning#example. 
\n\n```r\ndf \u003c- readRDS(\"df.rds\") \nsource(\"mob.R\")\nlibrary(yager)\n\n# PRE-PROCESS THE DATA WITH MOB PACKAGE\nbin_out \u003c- batch_bin(df, 3)\nbin_out$BinSum[order(-bin_out$BinSum$iv), ]\n#            var nbin unique miss min  median     max      ks     iv\n#   bureau_score   34    315  315 443   692.5     848 35.2651 0.8357\n#   tot_rev_line   20   3617  477   0 10573.0  205395 26.8943 0.4442\n#  age_oldest_tr   25    460  216   1   137.0     588 20.3646 0.2714\n#      tot_derog    7     29  213   0     0.0      32 20.0442 0.2599\n#            ltv   17    145    1   0   100.0     176 16.8807 0.1911\n#       rev_util   12    101    0   0    30.0     100 16.9615 0.1635\n#         tot_tr   15     67  213   0    16.0      77 17.3002 0.1425\n#   tot_rev_debt    8   3880  477   0  3009.5   96260  8.8722 0.0847\n#     tot_rev_tr    4     21  636   0     3.0      24  9.0779 0.0789\n#     tot_income   17   1639    5   0  3400.0 8147167 10.3386 0.0775\n#    tot_open_tr    7     26 1416   0     5.0      26  6.8695 0.0282\n\n# PERFORM WOE TRANSFORMATIONS\ndf_woe \u003c- batch_woe(df, bin_out$BinLst)\n\n# PROCESS AND STANDARDIZE THE DATA WITH ZERO MEAN AND UNIT VARIANCE\nY \u003c- df$bad\nX \u003c- scale(df_woe$df[, -1])\nReduce(rbind, Map(function(c) data.frame(var = colnames(X)[c], mean = mean(X[, c]), variance = var(X[, c])), seq(dim(X)[2])))\n#                 var          mean variance\n#1      woe.tot_derog  2.234331e-16        1\n#2         woe.tot_tr -2.439238e-15        1\n#3  woe.age_oldest_tr -2.502177e-15        1\n#4    woe.tot_open_tr -2.088444e-16        1\n#5     woe.tot_rev_tr -4.930136e-15        1\n#6   woe.tot_rev_debt -2.174607e-16        1\n#7   woe.tot_rev_line -8.589630e-16        1\n#8       woe.rev_util -8.649849e-15        1\n#9   woe.bureau_score  1.439904e-15        1\n#10           woe.ltv  3.723332e-15        1\n#11    woe.tot_income  5.559240e-16        1\n\n# INITIATE A GRNN OBJECT \nnet1 \u003c- grnn.fit(x = X, y = Y)\n# 
CROSS-VALIDATION TO CHOOSE THE OPTIMAL SMOOTHING PARAMETER\nS \u003c- gen_sobol(min = 0.5, max = 1.5, n = 10, seed = 2019)\ncv \u003c- grnn.search_auc(net = net1, sigmas = S, nfolds = 5)\n# $test\n#       sigma       auc\n#1  1.4066449 0.7543912\n#2  0.6205723 0.7303415\n#3  1.0710133 0.7553075\n#4  0.6764866 0.7378430\n#5  1.1322939 0.7553664\n#6  0.8402438 0.7507192\n#7  1.3590402 0.7546164\n#8  1.3031974 0.7548670\n#9  0.7555905 0.7455457\n#10 1.2174429 0.7552097\n# $best\n#     sigma       auc\n#5 1.132294 0.7553664\n\n# REFIT A GRNN WITH THE OPTIMAL PARAMETER VALUE\nnet2 \u003c- grnn.fit(x = X, y = Y, sigma = cv$best$sigma)\nnet2.pred \u003c- grnn.parpred(net2, X)\n\n# BENCHMARK MODEL PERFORMANCE\nMLmetrics::KS_Stat(y_pred = net2.pred, y_true = df$bad)\n# 44.00242\nMLmetrics::AUC(y_pred = net2.pred, y_true = df$bad)\n# 0.7895033\n\n# LOGISTIC REGRESSION PERFORMANCE\nMLmetrics::KS_Stat(y_pred = fitted(mdl2), y_true = df$bad)\n# 42.61731\nMLmetrics::AUC(y_pred = fitted(mdl2), y_true = df$bad)\n# 0.7751298\n\n```\n\nThe function [grnn.margin()](https://github.com/statcompute/GRnnet/blob/master/code/grnn.margin.R) can also be employed to explore the marginal effect of each attribute in a GRNN. \n\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"800\" height=\"400\" src=\"https://github.com/statcompute/GRnnet/blob/master/code/grnn_margin.PNG\"\u003e\n\u003c/p\u003e\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstatcompute%2Fyager","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstatcompute%2Fyager","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstatcompute%2Fyager/lists"}