{"id":34253735,"url":"https://github.com/statcompute/loss_mob","last_synced_at":"2026-03-13T01:31:12.546Z","repository":{"id":184868586,"uuid":"672366083","full_name":"statcompute/loss_mob","owner":"statcompute","description":null,"archived":false,"fork":false,"pushed_at":"2024-03-03T17:31:06.000Z","size":1416,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-12-19T22:56:45.112Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/statcompute.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-07-29T20:37:27.000Z","updated_at":"2023-07-29T20:37:27.000Z","dependencies_parsed_at":"2024-03-03T18:32:07.604Z","dependency_job_id":"5efe0dbe-18b6-4219-8c4c-e9227ec7f812","html_url":"https://github.com/statcompute/loss_mob","commit_stats":null,"previous_names":["statcompute/loss_mob"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/statcompute/loss_mob","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/statcompute%2Floss_mob","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/statcompute%2Floss_mob/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/statcompute%2Floss_mob/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/statcompute%2Floss_mob/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/statcompute","download_url":"https://codeload.github.com/statcomp
ute/loss_mob/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/statcompute%2Floss_mob/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30453545,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-12T21:31:01.033Z","status":"ssl_error","status_checked_at":"2026-03-12T21:30:43.161Z","response_time":114,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-12-16T11:54:01.259Z","updated_at":"2026-03-13T01:31:12.507Z","avatar_url":"https://github.com/statcompute.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"#### Introduction\n\nModeled on the py\\_mob package (https://pypi.org/project/py-mob) for binary outcomes, loss\\_mob is a collection of Python functions that generate monotonic binning and perform the variable transformation for loss or severity outcomes such that the Spearman correlation between the transformed $X$, i.e. $F(X_i)$, and $E(Y_i | X_i)$ is equal to 1. For loss models with a $Ln()$ link function, the transformation is derived as $F(x)_i = Ln \\frac{\\sum_i Y / \\sum_i Exposure}{\\sum Y / \\sum Exposure}$ in the training sample, where $Exposure$ is the number of cases and $i$ refers to the $i$-th bin grouped by $x$ values.  
\n\nShould you have any questions or suggestions about the package, please feel free to drop me a line. \n\n#### Core Functions\n\n```\nloss_mob\n  |-- qtl_bin()  : Iterative discretization based on quantiles of X.  \n  |-- los_bin()  : Revised iterative discretization for records with Y \u003e 0.\n  |-- iso_bin()  : Discretization driven by the isotonic regression. \n  |-- val_bin()  : Revised iterative discretization based on unique values of X.  \n  |-- rng_bin()  : Revised iterative discretization based on the equal-width range of X.  \n  |-- kmn_bin()  : Iterative discretization based on the k-means clustering of X.  \n  |-- gbm_bin()  : Discretization based on the gradient boosting machine (GBM).  \n  |-- cus_bin()  : Customized discretization based on pre-determined cut points.  \n  |-- view_bin() : Displays the binning outcome in a tabular form. \n  |-- cal_newx() : Applies the variable transformation to a numeric vector based on the binning outcome.\n  |-- chk_newx() : Verifies the transformation generated from the cal_newx() function.\n  |-- mi_score() : Calculates the Mutual Information (MI) score between X and Y.\n  |-- screen()   : Calculates Spearman and Distance Correlations between X and Y.\n  |-- bin_gini() : Calculates the gini-coefficient between X and Y based on the binning outcome.\n  |-- num_gini() : Calculates the gini-coefficient between raw values of X and Y.\n  |-- smape()    : Calculates the sMAPE value between Y and Yhat.\n  `-- get_mtpl() : Extracts French Motor Third-Party Liability Claims dataset from OpenML.\n```\n\n#### Example\n\n```python\nimport loss_mob as mob\n\n# LOAD THE DATASET\ndata = mob.get_mtpl()\n\ndata.keys()\n# dict_keys(['idpol', 'claimnb', 'exposure', 'area', 'vehpower', 'vehage', 'drivage', \n# 'bonusmalus', 'vehbrand', 'vehgas', 'density', 'region', 'claimamount', 'purepremium'])\n\nvar = ['vehpower', 'vehage', 'drivage', 'bonusmalus', 'density']\n\n# SCREEN EACH VARIABLE OF INTEREST\nrst = [{\"variable\": _, 
**mob.screen(data[_], data[\"purepremium\"])} for _ in var]\n\n# RANK VARIABLES BY DISTANCE CORRELATION\nfor _ in sorted(rst, key = lambda x: -abs(x[\"distance correlation\"])):\n  print(_)\n\n# {'variable': 'bonusmalus', 'total records': 678013, 'nonmissing records': 678013, 'missing percent': 0.0, 'unique value count': 115, 'coefficient of variation': 0.26165082, 'spearman correlation': 0.05716908, 'distance correlation': 0.0434537}\n# {'variable': 'drivage', 'total records': 678013, 'nonmissing records': 678013, 'missing percent': 0.0, 'unique value count': 83, 'coefficient of variation': 0.31071883, 'spearman correlation': -0.004906, 'distance correlation': 0.01428907}\n# {'variable': 'density', 'total records': 678013, 'nonmissing records': 678013, 'missing percent': 0.0, 'unique value count': 1607, 'coefficient of variation': 2.20854394, 'spearman correlation': 0.02022122, 'distance correlation': 0.01106909}\n# {'variable': 'vehage', 'total records': 678013, 'nonmissing records': 678013, 'missing percent': 0.0, 'unique value count': 78, 'coefficient of variation': 0.80437458, 'spearman correlation': 0.01952645, 'distance correlation': 0.01080137}\n# {'variable': 'vehpower', 'total records': 678013, 'nonmissing records': 678013, 'missing percent': 0.0, 'unique value count': 12, 'coefficient of variation': 0.31774149, 'spearman correlation': 0.00230745, 'distance correlation': 0.00356986}\n\n# GENERATE BINNING BASED ON GBM FOR EACH VARIABLE\nbout = dict((v, mob.gbm_bin(data[v], data[\"purepremium\"])) for v in var)\nmob.view_bin(bout[\"vehage\"])\n\n# |  bin  |   freq |   miss |           ysum |     yavg |        newx |         rule              |\n# |-------|--------|--------|----------------|----------|-------------|---------------------------|\n# |   1   | 356354 |      0 | 114686591.4672 | 321.8333 | -0.17468183 | $X$ \u003c= 6                  |\n# |   2   | 194371 |      0 |  69559830.5303 | 357.8714 | -0.06854178 | $X$ \u003e 6 and $X$ \u003c= 12     
|\n# |   3   | 127288 |      0 |  75609359.3214 | 594.0023 |  0.43816751 | $X$ \u003e 12                  |\n\n# VARIABLE TRANSFORMATION\ndout = mob.cal_newx(data['vehage'], bout[\"vehage\"])\nmob.head(dout)\n\n# {'x': 1, 'bin': 1, 'newx': -0.17468183}\n# {'x': 5, 'bin': 1, 'newx': -0.17468183}\n# {'x': 0, 'bin': 1, 'newx': -0.17468183}\n\n# VALIDATE THE TRANSFORMATION\nmob.chk_newx(dout)\n\n# |  bin  |        newx |   freq |    dist    |         xrng              |\n# |-------|-------------|--------|------------|---------------------------|\n# |   1   | -0.17468183 | 356354 |   52.5586% |                  0 \u003c==\u003e 6 |\n# |   2   | -0.06854178 | 194371 |   28.6677% |                 7 \u003c==\u003e 12 |\n# |   3   |  0.43816751 | 127288 |   18.7737% |               13 \u003c==\u003e 100 |\n```\n\n####  Authors\n\n[WenSui Liu](mailto:liuwensui@gmail.com) is a seasoned data scientist with 15 years of experience in the financial services industry. \n\n[Joyce Liu](mailto:joyce.jl.liu@gmail.com) is a college student majoring in Mathematics with a strong passion for data science.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstatcompute%2Floss_mob","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstatcompute%2Floss_mob","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstatcompute%2Floss_mob/lists"}