{"id":22848362,"url":"https://github.com/feedzai/fairgbm","last_synced_at":"2025-04-05T21:07:42.437Z","repository":{"id":44927026,"uuid":"475211914","full_name":"feedzai/fairgbm","owner":"feedzai","description":"Train Gradient Boosting models that are both high-performance *and* Fair!","archived":false,"fork":false,"pushed_at":"2024-06-21T09:30:11.000Z","size":45069,"stargazers_count":103,"open_issues_count":8,"forks_count":5,"subscribers_count":12,"default_branch":"main-fairgbm","last_synced_at":"2025-03-29T19:03:45.330Z","etag":null,"topics":["fairness","fairness-ml","gbm","gradient-boosting","lightgbm","tabular-data"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2209.07850","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/feedzai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-03-28T23:29:04.000Z","updated_at":"2025-02-10T11:13:48.000Z","dependencies_parsed_at":"2024-12-27T06:06:39.887Z","dependency_job_id":"5dd41949-5428-4615-a894-e7ef2c76811b","html_url":"https://github.com/feedzai/fairgbm","commit_stats":{"total_commits":2553,"total_committers":228,"mean_commits":"11.197368421052632","dds":0.6878182530356444,"last_synced_commit":"0bf83fdea192151c4013eb65e4e30cd41f58132d"},"previous_names":[],"tags_count":35,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feedzai%2Ffairgbm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feedzai%2Ffairgbm/tags","releases_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/repositories/feedzai%2Ffairgbm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feedzai%2Ffairgbm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/feedzai","download_url":"https://codeload.github.com/feedzai/fairgbm/tar.gz/refs/heads/main-fairgbm","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247399877,"owners_count":20932876,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fairness","fairness-ml","gbm","gradient-boosting","lightgbm","tabular-data"],"created_at":"2024-12-13T04:11:31.790Z","updated_at":"2025-04-05T21:07:42.415Z","avatar_url":"https://github.com/feedzai.png","language":"C++","readme":"# FairGBM\n\n![PyPI version](https://badgen.net/pypi/v/fairgbm)\n![OSI license](https://badgen.net/pypi/license/fairgbm)\n[![Downloads](https://static.pepy.tech/badge/fairgbm)](https://pepy.tech/project/fairgbm)\n\u003c!-- ![Python compatibility](https://badgen.net/pypi/python/fairgbm) --\u003e\n\n\u003e **Note**\n\u003e FairGBM has been accepted at **ICLR 2023**. 
Link to paper [here](https://arxiv.org/pdf/2209.07850.pdf).\n\u003e \n\nFairGBM is an easy-to-use and lightweight fairness-aware ML algorithm with state-of-the-art performance on tabular datasets.\n\nFairGBM builds upon the popular [LightGBM](https://github.com/microsoft/LightGBM) algorithm and adds customizable \nconstraints for group-wise fairness (_e.g._, equal opportunity, predictive equality, equalized odds) and other global goals (_e.g._, \nspecific Recall or FPR prediction targets).\n\n\nTable of contents:\n\n- [FairGBM](#fairgbm)\n  - [Install](#install)\n    - [Docker image](#docker-image)\n  - [Getting started](#getting-started)\n    - [Parameter list](#parameter-list)\n    - [_fit(X, Y, constraint\\_group=S)_](#fitx-y-constraint_groups)\n  - [Features](#features)\n    - [Fairness constraints](#fairness-constraints)\n    - [Global constraints](#global-constraints)\n  - [Technical Details](#technical-details)\n  - [Contact](#contact)\n  - [How to cite FairGBM](#how-to-cite-fairgbm)\n\n\n## Install\n\nFairGBM can be installed from [PyPI](https://pypi.org/project/fairgbm/):\n\n```pip install fairgbm```\n\nOr directly from GitHub:\n\n```\ngit clone --recurse-submodules https://github.com/feedzai/fairgbm.git\npip install fairgbm/python-package/\n```\n\n\u003e **Note**\n\u003e Compatibility is only maintained with **Linux**.\n\u003e \n\u003e If you don't have access to a Linux machine, we advise using the free Google \n\u003e Colab service ([example Colab notebook here](https://colab.research.google.com/github/AndreFCruz/fairgbm-fork/blob/add-colab-example/examples/FairGBM-python-notebooks/FairGBM_example_for_equalized_odds_%5Bgoogle_colab%5D.ipynb)).\n\u003e\n\u003e We also provide a Docker image that can be useful on non-Linux platforms; run ```docker run -p 8888:8888 ndrcrz/fairgbm-miniconda``` to get a Jupyter notebook environment with `fairgbm` installed.\n\n\n\u003e **Note**\n\u003e Follow [this 
link](https://github.com/microsoft/LightGBM/tree/master/python-package) \n\u003e for more details on the Python package installation.\n\n\u003c!-- \u003e Install requires [CMake](https://cmake.org) and an up-to-date C++ compiler (gcc, clang, or mingw). --\u003e\n\n### Docker image\n\nWe provide a Docker image with Python and Miniconda installed, ready to run the example\nFairGBM Jupyter notebooks.\n\nYou can get a Jupyter notebook with `fairgbm` up and running on your local machine with:\n```\ndocker run -p 8888:8888 ndrcrz/fairgbm-miniconda\n```\n\nAlthough we recommend using the Python package directly on a local x86-64 (non-ARM) Linux machine,\nthis Docker image is an option for users on other platforms (it was tested on an M1 Mac).\n\nThe Dockerfile is available [here](examples/FairGBM-python-notebooks/Dockerfile).\n\n\n## Getting started\n\n\u003e **Recommended** Python notebook example [here](examples/FairGBM-python-notebooks/FairGBM_example_for_equalized_odds_[google_colab].ipynb) (Google Colab link [here](https://colab.research.google.com/github/AndreFCruz/fairgbm-fork/blob/add-colab-example/examples/FairGBM-python-notebooks/FairGBM_example_for_equalized_odds_%5Bgoogle_colab%5D.ipynb)).\n\nYou can get FairGBM up and running in just a few lines of Python code:\n\n```python\nfrom fairgbm import FairGBMClassifier\n\n# Instantiate\nfairgbm_clf = FairGBMClassifier(\n    constraint_type=\"FNR\",    # constraint on equal group-wise TPR (equal opportunity)\n    n_estimators=200,         # core parameters from vanilla LightGBM\n    random_state=42,          # ...\n)\n\n# Train using features (X), labels (Y), and sensitive attributes (S)\nfairgbm_clf.fit(X, Y, constraint_group=S)\n# NOTE: labels (Y) and sensitive attributes (S) must be in numeric format\n\n# Predict\nY_test_pred = fairgbm_clf.predict_proba(X_test)[:, -1]  # Compute continuous class probabilities (recommended)\n# Y_test_pred = fairgbm_clf.predict(X_test)             
# Or compute discrete class predictions\n```\n\n**For Python examples see the [_notebooks folder_](/examples/FairGBM-python-notebooks).**\n\nA more in-depth explanation and other usage examples (using the Python package or the compiled binary) can be found in the [**_examples folder_**](/examples).\n\n\u003e **Note** \n\u003e FairGBM is a research project, so its default hyperparameters (keyword arguments) \n\u003e are not expected to be as robust as the defaults in `sklearn` or \n\u003e `lightgbm` classifiers. \n\u003e We strongly **recommend tuning the \n\u003e `multiplier_learning_rate` hyperparameter**, as well as the remaining GBM \n\u003e hyperparameters (example [here](examples/FairGBM-python-notebooks/UCI-Adult-example-with-hyperparameter-tuning.ipynb)).\n\n\n### Parameter list\n\nThe following parameters can be used as keyword arguments for the `FairGBMClassifier` Python class.\n\n| _Name_ | _Description_ | _Default_ |\n|:------:|---------------|:---------:|\n| `constraint_type` | The type of fairness (group-wise equality) constraint to use (if any). | `FPR,FNR` |\n| `global_constraint_type` | The type of global equality constraint to use (if any). | _None_ |\n| `multiplier_learning_rate` | The learning rate for the gradient ascent step (w.r.t. Lagrange multipliers). | `0.1` |\n| `constraint_fpr_tolerance` | The slack when fulfilling _group-wise_ FPR constraints. | `0.01` |\n| `constraint_fnr_tolerance` | The slack when fulfilling _group-wise_ FNR constraints. | `0.01` |\n| `global_target_fpr` | Target rate for the _global_ FPR (inequality) constraint. | _None_ |\n| `global_target_fnr` | Target rate for the _global_ FNR (inequality) constraint. | _None_ |\n| `constraint_stepwise_proxy` | Differentiable proxy for the step-wise function in _group-wise_ constraints. | `cross_entropy` |\n| `objective_stepwise_proxy` | Differentiable proxy for the step-wise function in _global_ constraints. 
| `cross_entropy` |\n| `stepwise_proxy_margin` | Intercept value for the proxy function: value at `f(logodds=0.0)`. | `1.0` |\n| `score_threshold` | Score threshold used when assessing _group-wise_ FPR or FNR in training. | `0.5` |\n| `global_score_threshold` | Score threshold used when assessing _global_ FPR or FNR in training. | `0.5` |\n| `init_multipliers` | The initial value of the Lagrange multipliers. | `0` for each constraint |\n| ... | _Any core [`LGBMClassifier` parameter](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html#lightgbm-lgbmclassifier) can be used with FairGBM as well._ |  |\n\nPlease consult [this list](https://lightgbm.readthedocs.io/en/latest/Parameters.html#core-parameters) for a detailed\nview of all vanilla LightGBM parameters (_e.g._, `n_estimators`, `n_jobs`, ...).\n\n\u003e **Note** \n\u003e The `objective` is the only core LightGBM parameter that cannot be changed when using FairGBM, as you must use\n\u003e the constrained loss function `objective=\"constrained_cross_entropy\"`.\n\u003e Using a standard non-constrained objective will fall back to standard LightGBM.\n\n\n### _fit(X, Y, constraint_group=S)_\n\nIn addition to the usual `fit` arguments, features `X` and labels `Y`, FairGBM takes in the sensitive-attributes\ncolumn `S` for training.\n\n**Regarding the sensitive attributes column `S`:**\n- It should be in numeric format, with each protected group encoded as a different integer value, starting at `0`.\n- It is not restricted to binary sensitive attributes: you can use _two or more_ different groups encoded in the same column.\n- It is only required for training and **not** for computing predictions.\n\nHere is an example pre-processing of the sensitive attributes on the UCI Adult dataset:\n```python\nimport numpy as np\n\n# Given X, Y, S\nX, Y, S = load_dataset()\n\n# The sensitive attributes S must be in numeric format\nS = np.array([1 if val == \"Female\" else 0 for val in S])\n\n# The labels Y must 
be binary and in numeric format: {0, 1}\nY = np.array([1 if val == \"\u003e50K\" else 0 for val in Y])\n\n# And the features X may be numeric or categorical, but make sure categorical columns are in the correct format\nX: Union[pd.DataFrame, np.ndarray]      # any array-like can be used\n\n# Train FairGBM\nfairgbm_clf.fit(X, Y, constraint_group=S)\n```\n\n\n## Features\n\nFairGBM enables you to train a GBM model to **minimize a loss function** (_e.g._, cross-entropy) **subject to fairness\nconstraints** (_e.g._, equal opportunity).\n\nNamely, you can target equality of performance metrics (FPR, FNR, or both) across instances from _two or more_ different\nprotected groups (see [fairness constraints](#fairness-constraints) section).\nOptionally, you can simultaneously add global constraints on specific metrics (see [global constraints](#global-constraints) section).\n\n### Fairness constraints\n\nYou can use FairGBM to equalize the following metrics across _two or more_ protected groups:\n- Equalize FNR (equivalent to equalizing TPR or Recall)\n    - also known as _equal opportunity_ [(Hardt _et al._, 2016)](https://arxiv.org/abs/1610.02413)\n- Equalize FPR (equivalent to equalizing TNR or Specificity)\n    - also known as _predictive equality_ [(Corbett-Davies _et al._, 2017)](https://arxiv.org/abs/1701.08230)\n- Equalize both FNR and FPR simultaneously\n    - also known as _equalized odds_ [(Hardt _et al._, 2016)](https://arxiv.org/abs/1610.02413)\n\n\u003e **Example for _equality of opportunity_** in college admissions:\n\u003e your likelihood of getting admitted to a certain college (predicted positive) given that you're a qualified candidate\n\u003e (label positive) should be the same regardless of your race (sensitive attribute).\n\n\u003c!--\nTake the following hypothetical example:\n\nIf you're training an algorithm to predict mortgage defaults, a valuable fairness criterion may be equalizing FPR \namong people from different ethnicities.\nThis ensures that 
for two people that will successfully repay their loans, their likelihood of being wrongly denied\naccess to credit is the same regardless of ethnicity.\nThis is known as a _punitive_ setting, as a positive prediction (predicted to default) leads to a negative outcome\n(loan application denied).\n\nConversely, if you're training an ML model in an _assistive_ setting (_i.e._, a positive prediction leads to a \npositive outcome for the person), you may want to target equalizing FNR.\n--\u003e\n\n### Global constraints\n\nYou can also target specific FNR or FPR goals.\nFor example, in cases where high accuracy is trivially achieved (_e.g._, problems with high class imbalance),\nyou may want to maximize TPR with a constraint on FPR (_e.g._, \"maximize TPR with at most 5% FPR\").\nYou can set a constraint on global FPR ≤ 0.05 by using `global_target_fpr=0.05` and \n`global_constraint_type=\"FPR\"`.\n\nYou can simultaneously set constraints on group-wise metrics (fairness constraints) and constraints on global metrics.\n\u003c!-- TODO! [This notebook](/examples/FairGBM-python-notebooks) shows an example on a highly class-imbalanced dataset that makes use of both group-level and global constraints. --\u003e\n\n\n## Technical Details\n\nFairGBM is a framework that enables _constrained optimization_ of Gradient Boosting Machines (GBMs).\nThis way, we can train a GBM model to minimize some loss function (usually the _binary cross-entropy_) subject to a set\nof constraints that should be met in the training dataset (_e.g._, equality of opportunity).\n\nFairGBM applies the [method of Lagrange multipliers](https://en.wikipedia.org/wiki/Lagrange_multiplier), and uses \niterative, interleaved steps of gradient descent (in function space, by adding new trees to the GBM model) and \ngradient ascent (in the space of Lagrange multipliers, **Λ**).\n\nThe main obstacle to enforcing fairness constraints in training is that these constraints are often \n_non-differentiable_. 
To side-step this issue, we use a differentiable proxy of the step-wise function.\nThe following plot shows an example of _hinge-based_ and _cross-entropy-based_ proxies for the _false positive_ value\nof a _label negative_ instance.\n\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"https://user-images.githubusercontent.com/13498941/189664020-70ebbae4-6b93-4f38-af7d-f870381a8a22.png\" width=\"40%\" alt=\"example of proxy FPR function\" /\u003e\n\u003c/p\u003e\n\nFor a more in-depth explanation of FairGBM please consult [the paper](https://arxiv.org/pdf/2209.07850.pdf).\n\n[comment]: \u003c\u003e (### Important C++ source files **TODO**)\n\n\n[comment]: \u003c\u003e (## Results)\n[comment]: \u003c\u003e (%% TODO: results and run-time comparisons against fairlearn, TFCO, and others)\n\n\n## Contact\n\nFor commercial uses of FairGBM please contact \u003coss-licenses@feedzai.com\u003e.\n\n\n## How to cite FairGBM\n\n```\n@inproceedings{cruz2023fairgbm,\n  author = {Cruz, Andr{\\'{e}} F. and Bel{\\'{e}}m, Catarina and Jesus, S{\\'{e}}rgio and Bravo, Jo{\\~{a}}o and Saleiro, Pedro and Bizarro, Pedro},\n  title={Fair{GBM}: Gradient Boosting with Fairness Constraints},\n  booktitle={The Eleventh International Conference on Learning Representations},\n  year={2023},\n  url={https://openreview.net/forum?id=x-mXzBgCX3a}\n}\n```\n\nThe paper is publicly available at this [arXiv link](https://arxiv.org/abs/2209.07850).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffeedzai%2Ffairgbm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffeedzai%2Ffairgbm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffeedzai%2Ffairgbm/lists"}