{"id":18520235,"url":"https://github.com/mlr-org/mcboost","last_synced_at":"2026-03-15T21:03:27.514Z","repository":{"id":44936152,"uuid":"325074668","full_name":"mlr-org/mcboost","owner":"mlr-org","description":"Multi-Calibration \u0026 Multi-Accuracy Boosting for R","archived":false,"fork":false,"pushed_at":"2024-04-09T08:59:07.000Z","size":867,"stargazers_count":28,"open_issues_count":6,"forks_count":4,"subscribers_count":5,"default_branch":"main","last_synced_at":"2024-04-15T12:26:30.727Z","etag":null,"topics":["bias-correction","bias-detection","classification","ethics","fairness","fairness-ai","fairness-ml","machine-learning","post-processing","responsible-ai"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mlr-org.png","metadata":{"funding":{"github":"mlr-org"},"files":{"readme":"README.md","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":"codemeta.json"}},"created_at":"2020-12-28T17:30:34.000Z","updated_at":"2024-04-15T09:21:52.000Z","dependencies_parsed_at":"2024-01-16T08:12:17.741Z","dependency_job_id":"8e12b93d-c082-4413-b8a2-b4ccf7cf5ff0","html_url":"https://github.com/mlr-org/mcboost","commit_stats":{"total_commits":293,"total_committers":9,"mean_commits":32.55555555555556,"dds":0.4812286689419796,"last_synced_commit":"8ac638b8cc15b1f08fde2bdd54d488fb01058811"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlr-org%2Fmcboost","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlr-or
g%2Fmcboost/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlr-org%2Fmcboost/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlr-org%2Fmcboost/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mlr-org","download_url":"https://codeload.github.com/mlr-org/mcboost/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248012619,"owners_count":21033230,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bias-correction","bias-detection","classification","ethics","fairness","fairness-ai","fairness-ml","machine-learning","post-processing","responsible-ai"],"created_at":"2024-11-06T17:19:11.665Z","updated_at":"2026-03-15T21:03:27.508Z","avatar_url":"https://github.com/mlr-org.png","language":"R","funding_links":["https://github.com/sponsors/mlr-org"],"categories":[],"sub_categories":[],"readme":"# mcboost\n\n\u003c!-- badges: start --\u003e\n[![tic](https://github.com/mlr-org/mcboost/workflows/tic/badge.svg?branch=main)](https://github.com/mlr-org/mcboost/actions)\n[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)\n[![CRAN Status](https://www.r-pkg.org/badges/version-ago/mcboost)](https://cran.r-project.org/package=mcboost)\n[![DOI](https://joss.theoj.org/papers/10.21105/joss.03453/status.svg)](https://doi.org/10.21105/joss.03453)\n[![License: GPL 
v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)\n[![Mattermost](https://img.shields.io/badge/chat-mattermost-orange.svg)](https://lmmisld-lmu-stats-slds.srv.mwn.de/mlr_invite/)\n\u003c!-- badges: end --\u003e\n\n## What does it do?\n\n**mcboost** implements Multi-Calibration Boosting ([Hebert-Johnson et al., 2018](https://proceedings.mlr.press/v80/hebert-johnson18a.html); [Kim et al., 2019](https://arxiv.org/pdf/1805.12317)) for the multi-calibration of a machine learning model's predictions. Multi-Calibration works best in scenarios where the underlying data \u0026 labels are unbiased but a bias is introduced within the algorithm's fitting procedure. This is often the case, e.g. when an algorithm fits a majority population while ignoring or under-fitting minority populations.\n\nFor more information and examples, see the package's [website](https://mlr-org.github.io/mcboost/).\n\nMore details on usage and the underlying procedures can be found in the package vignettes.\n\n## Installation\n\nThe current version can be installed from CRAN using:\n\n```r\ninstall.packages(\"mcboost\")\n```\n\nYou can install the development version of mcboost from **GitHub** with:\n\n```r\nremotes::install_github(\"mlr-org/mcboost\")\n```\n\n## Usage\n\nPost-processing with `mcboost` needs three components. We start with an initial prediction model (1) and an auditing algorithm (2) that may be customized by the user. The auditing algorithm then runs Multi-Calibration Boosting on a labeled auditing dataset (3). The resulting model can be used for obtaining multi-calibrated predictions.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/mlr-org/mcboost/raw/main/paper/MCBoost.png\" /\u003e\n\u003c/p\u003e\n\n## Example\n\nIn this simple example, our goal is to improve calibration\nfor an `initial predictor`, e.g. 
an ML algorithm trained on\nan initial task.\nInternally, `mcboost` often makes use of `mlr3` and learners that come with `mlr3learners`.\n\n\n```r\nlibrary(mcboost)\nlibrary(mlr3)\n```\n\nFirst we set up an example dataset.\n\n```r\n  # Example Data: Sonar Task\n  tsk = tsk(\"sonar\")\n  tid = sample(tsk$row_ids, 100) # 100 rows for training\n  train_data = tsk$data(cols = tsk$feature_names, rows = tid)\n  train_labels = tsk$data(cols = tsk$target_names, rows = tid)[[1]]\n```\n\nTo provide an example, we assume that we already have a learner `l`, which we train below.\nWe can now wrap this initial learner's predict function for use with `mcboost`, since `mcboost` expects the initial model to be specified as a `function` with `data` as input.\n\n```r\n  l = lrn(\"classif.rpart\")\n  l$train(tsk$clone()$filter(tid))\n\n  init_predictor = function(data) {\n    # Get response prediction from Learner\n    p = l$predict_newdata(data)$response\n    # One-hot encode and take first column\n    one_hot(p)\n  }\n```\n\nWe can now run Multi-Calibration Boosting by instantiating the object and calling the `multicalibrate` method.\nNote that we would typically run Multi-Calibration on a separate validation set!\nWe furthermore select the auditor model, a `SubpopAuditorFitter`,\nin our case a `Decision Tree`:\n\n```r\n  mc = MCBoost$new(\n    init_predictor = init_predictor,\n    auditor_fitter = \"TreeAuditorFitter\")\n  mc$multicalibrate(train_data, train_labels)\n```\n\nLastly, we predict on new data.\n\n```r\ntstid = setdiff(tsk$row_ids, tid) # held-out data\ntest_data = tsk$data(cols = tsk$feature_names, rows = tstid)\nmc$predict_probs(test_data)\n```\n\n### Multi-Calibration\n\nWhile `mcboost` implements Multi-Accuracy by default ([Kim et al., 2019](http://arxiv.org/pdf/1805.12317)),\nit can also multi-calibrate predictors ([Hebert-Johnson et al., 2018](http://proceedings.mlr.press/v80/hebert-johnson18a.html)).\nIn order to achieve this, we have to set the following 
hyperparameters:\n\n```r\n  mc = MCBoost$new(\n    init_predictor = init_predictor,\n    auditor_fitter = \"TreeAuditorFitter\",\n    num_buckets = 10,\n    multiplicative = FALSE\n  )\n```\n\n## MCBoost as a PipeOp\n\n`mcboost` can also be used within an `mlr3pipelines` graph to build a full end-to-end pipeline (in the form of a `GraphLearner`).\n\n```r\n  library(mcboost)\n  library(mlr3)\n  library(mlr3pipelines)\n  gr = ppl_mcboost(lrn(\"classif.rpart\"))\n  tsk = tsk(\"sonar\")\n  tid = sample(1:208, 108)\n  gr$train(tsk$clone()$filter(tid))\n  gr$predict(tsk$clone()$filter(setdiff(1:208, tid)))\n```\n\n\n\n## Further Examples\n\nThe `mcboost` vignettes [**Basics and Extensions**](https://mlr-org.github.io/mcboost/articles/mcboost_basics_extensions.html) and [**Health Survey Example**](https://mlr-org.github.io/mcboost/articles/mcboost_example.html) demonstrate a number of use cases for `mcboost`.\n\n\n## Contributing\n\nThis R package is licensed under the LGPL-3.\nIf you encounter problems using this software (lack of documentation, misleading or wrong documentation, unexpected behaviour, bugs, …) or just want to suggest features, please open an issue in the issue tracker.\nPull requests are welcome and will be included at the discretion of the maintainers.\n\nAs this project is developed with [mlr3's](https://github.com/mlr-org/mlr3/) style guide in mind, the following resources can be helpful\nto individuals wishing to contribute. Please consult the [wiki](https://github.com/mlr-org/mlr3/wiki/) for a [style guide](https://github.com/mlr-org/mlr3/wiki/Style-Guide), a [roxygen guide](https://github.com/mlr-org/mlr3/wiki/Roxygen-Guide) and a [pull request guide](https://github.com/mlr-org/mlr3/wiki/PR-Guidelines).\n\n### Code of Conduct\n\nPlease note that the mcboost project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). 
By contributing to this project, you agree to abide by its terms.\n\n## Citing mcboost\n\nIf you use `mcboost`, please cite our package as well as the two papers it is based on:\n\n```\n  @article{pfisterer2021,\n    author = {Pfisterer, Florian and Kern, Christoph and Dandl, Susanne and Sun, Matthew and\n    Kim, Michael P. and Bischl, Bernd},\n    title = {mcboost: Multi-Calibration Boosting for R},\n    journal = {Journal of Open Source Software},\n    doi = {10.21105/joss.03453},\n    url = {https://doi.org/10.21105/joss.03453},\n    year = {2021},\n    publisher = {The Open Journal},\n    volume = {6},\n    number = {64},\n    pages = {3453}\n  }\n  # Multi-Calibration\n  @inproceedings{hebert-johnson2018,\n    title = {Multicalibration: Calibration for the ({C}omputationally-Identifiable) Masses},\n    author = {Hebert-Johnson, Ursula and Kim, Michael P. and Reingold, Omer and Rothblum, Guy},\n    booktitle = {Proceedings of the 35th International Conference on Machine Learning},\n    pages = {1939--1948},\n    year = {2018},\n    editor = {Jennifer Dy and Andreas Krause},\n    volume = {80},\n    series = {Proceedings of Machine Learning Research},\n    address = {Stockholmsmässan, Stockholm Sweden},\n    publisher = {PMLR}\n  }\n  # Multi-Accuracy\n  @inproceedings{kim2019,\n    author = {Kim, Michael P. 
and Ghorbani, Amirata and Zou, James},\n    title = {Multiaccuracy: Black-Box Post-Processing for Fairness in Classification},\n    year = {2019},\n    isbn = {9781450363242},\n    publisher = {Association for Computing Machinery},\n    address = {New York, NY, USA},\n    url = {https://doi.org/10.1145/3306618.3314287},\n    doi = {10.1145/3306618.3314287},\n    booktitle = {Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society},\n    pages = {247--254},\n    location = {Honolulu, HI, USA},\n    series = {AIES '19}\n  }\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmlr-org%2Fmcboost","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmlr-org%2Fmcboost","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmlr-org%2Fmcboost/lists"}