{"id":32206684,"url":"https://github.com/statcompute/mob","last_synced_at":"2026-02-21T13:01:45.803Z","repository":{"id":56934743,"uuid":"159185708","full_name":"statcompute/mob","owner":"statcompute","description":"Monotonic Optimal Binning in Consumer Credit Risk Scorecard Development","archived":false,"fork":false,"pushed_at":"2020-11-07T16:57:22.000Z","size":2026,"stargazers_count":59,"open_issues_count":0,"forks_count":22,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-12-09T19:27:32.256Z","etag":null,"topics":["credit-risk","scorecard"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/statcompute.png","metadata":{"files":{"readme":"README.MD","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-11-26T14:51:50.000Z","updated_at":"2025-11-19T20:47:01.000Z","dependencies_parsed_at":"2022-08-21T06:50:48.559Z","dependency_job_id":null,"html_url":"https://github.com/statcompute/mob","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/statcompute/mob","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/statcompute%2Fmob","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/statcompute%2Fmob/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/statcompute%2Fmob/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/statcompute%2Fmob/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/statcompute","download_url":"https://codeload.github.com/statcompute/mob/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/statcompute%2Fmob/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29681468,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-21T12:30:22.644Z","status":"ssl_error","status_checked_at":"2026-02-21T12:29:55.402Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["credit-risk","scorecard"],"created_at":"2025-10-22T05:34:07.029Z","updated_at":"2026-02-21T13:01:45.797Z","avatar_url":"https://github.com/statcompute.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg width=\"150\" height=\"100\" src=\"https://github.com/statcompute/MonotonicBinning/blob/master/data/mob.jpeg\"\u003e\n\u003c/p\u003e\n\n### \u003cp align=\"center\"\u003e Monotonic Optimal Binning (MOB) \u003c/p\u003e\n### \u003cp align=\"center\"\u003e  for Risk Scorecard Development \u003c/p\u003e\n\n#### Introduction\n\nThe **MOB** (Monotonic Optimal Binning) package is a collection of R functions that would generate the monotonic binning and perform the WoE (Weight of Evidence) transformation used in consumer credit scorecard developments. 
Being a piecewise constant transformation in the context of logistic regressions, the WoE has also been employed in other use cases, such as consumer credit loss estimation, prepayment, and even fraud detection models.  \n\nIn addition to monotonic binning and the WoE transformation, the Information Value and KS statistic of each independent variable are also calculated to evaluate variable predictiveness. \n\n\n#### Why Use Weight of Evidence\n\nI have been asked why I spend so much effort on writing SAS macros and R functions to do monotonic binning for the WoE transformation, given the availability of other cutting-edge data mining or deep learning algorithms that will automatically generate a prediction from whatever predictors are fed into the model. About 10 years ago, when I worked in the decision science team of Chase, I was once told that even an idiot knows how to put X on the right-hand side of an equal sign. Nonetheless, what really distinguishes a good modeler from the rest is the ability to handle challenging data issues, including missing values, outliers, linearity, and predictability, in a scalable way that can be rolled out to hundreds or even thousands of potential model drivers in the production environment. \n\nThe WoE transformation through monotonic binning provides a convenient way to address each of the aforementioned concerns. \n\n1. Because WoE is a piecewise transformation based on data discretization, all missing values fall into a standalone category, which is either kept by itself or combined with a neighboring bin that has a similar bad rate. As a result, no special treatment for missing values is necessary. \n\n2. After the monotonic binning of each variable, since the WoE value for each bin is a projection from the predictor into the response that is defined by the log ratio between event and non-event distributions, the raw values of the predictor no longer matter and therefore the issue related to outliers disappears. \n\n3. While many modelers would like to use log or power transformations to achieve a good linear relationship between the predictor and log odds of the response, which is heuristic at best with no guarantee of a good outcome, the WoE transformation is strictly linear with respect to log odds of the response, with a correlation of one. It is also worth mentioning that a numeric variable and its strictly monotone functions should converge to the same monotonic WoE transformation. \n\n4. Lastly, because the WoE is defined as the log ratio between event and non-event distributions, it is indicative of the separation between cases with Y = 0 and cases with Y = 1. As the weighted sum of WoE values, with the weight being the difference between event and non-event distributions, the IV (Information Value) is an important statistic commonly used to measure predictor importance.\n\n\n#### Package Dependencies\n\n```text\nR (\u003e= 3.3.3), stats, gbm, Rborist\n```\n\n#### Installation\n\nDownload the [mob_0.3.tar.gz](https://github.com/statcompute/mob/blob/master/mob_0.3.tar.gz) file and then run:\n\n```r\ninstall.packages(\"mob_0.3.tar.gz\", repos = NULL, type = \"source\")\n```\n\nAlternatively, you can install the package directly from CRAN by running:\n\n\n```r\ninstall.packages(\"mob\")\n```\n\n#### Functions\n```\nmob\n  |-- qtl_bin()   : The iterative discretization based on quantiles of X.\n  |-- bad_bin()   : The revised iterative discretization for records with Y = 1.\n  |-- iso_bin()   : The discretization algorithm driven by the isotonic regression between X and Y.\n  |-- rng_bin()   : The revised iterative discretization based on the equal-width range of X.\n  |-- kmn_bin()   : The discretization algorithm based on the k-means clustering of X.\n  |-- gbm_bin()   : The discretization algorithm based on the gradient boosting machine.\n  |-- arb_bin()   : The discretization algorithm based on the decision tree.\n  |-- cal_woe()   : Applies the WoE transformation to a numeric vector based on the binning outcome.\n  |-- batch_bin() : Discretizes vectors in a dataframe.\n  `-- batch_woe() : Applies the WoE transformation to vectors in the dataframe.\n```  \n  \n#### Example  \n\n```r\n# get data\ndata(hmeq, package = \"mob\")\n\n# discretize selected variables\nbin_result \u003c- mob::batch_bin(hmeq$BAD, hmeq[, c(\"DELINQ\", \"NINQ\", \"CLAGE\")])\n\n# binning summary\nbin_result$bin_sum\n#   var nbin freq bads miss     iv    ks\n#DELINQ    7 5960 1189  580 0.6815 30.93\n#  NINQ    7 5960 1189  510 0.1749 15.63\n# CLAGE   24 5960 1189  308 0.2697 21.43\n\n# binning outcome for DELINQ\nbin_result$bin_out$DELINQ\n#$cut\n#[1] 0 1 2 3 4\n#$tbl\n#  bin freq miss bads   rate     woe     iv    ks               rule\n#1   0  580  580   72 0.1241 -0.5644 0.0259  4.59         is.na($X$)\n#2   1 4179    0  583 0.1395 -0.4299 0.1132 30.93           $X$ \u003c= 0\n#3   2  654    0  222 0.3394  0.7237 0.0696 21.31 $X$ \u003e 0 \u0026 $X$ \u003c= 1\n#4   3  250    0  112 0.4480  1.1807 0.0771 14.79 $X$ \u003e 1 \u0026 $X$ \u003c= 2\n#5   4  129    0   71 0.5504  1.5917 0.0757 10.03 $X$ \u003e 2 \u0026 $X$ \u003c= 3\n#6   5   78    0   46 0.5897  1.7523 0.0560  6.83 $X$ \u003e 3 \u0026 $X$ \u003c= 4\n#7   6   90    0   83 0.9222  3.8624 0.2640  0.00            $X$ \u003e 4\n\n# apply WoE transformation\nwoe_out \u003c- mob::batch_woe(hmeq, bin_result$bin_out)\n\n# transformed variables\nhead(woe_out, 3)\n#   DELINQ    NINQ  CLAGE\n#1 -0.4299 -0.0626 0.3081\n#2  1.1807 -0.2954 0.3081\n#3 -0.4299 -0.0626 0.2780\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstatcompute%2Fmob","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstatcompute%2Fmob","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstatcompute%2Fmob/lists"}