
# FairGBM

![PyPI version](https://badgen.net/pypi/v/fairgbm)
![OSI license](https://badgen.net/pypi/license/fairgbm)
[![Downloads](https://static.pepy.tech/badge/fairgbm)](https://pepy.tech/project/fairgbm)

> **Note**
> FairGBM has been accepted at **ICLR 2023**. Link to paper [here](https://arxiv.org/pdf/2209.07850.pdf).
>

FairGBM is an easy-to-use and lightweight fairness-aware ML algorithm with state-of-the-art performance on tabular datasets.

FairGBM builds upon the popular [LightGBM](https://github.com/microsoft/LightGBM) algorithm and adds customizable
constraints for group-wise fairness (_e.g._, equal opportunity, predictive equality, equalized odds) and other global goals (_e.g._,
specific Recall or FPR prediction targets).

Table of contents:

- [FairGBM](#fairgbm)
  - [Install](#install)
    - [Docker image](#docker-image)
  - [Getting started](#getting-started)
    - [Parameter list](#parameter-list)
    - [_fit(X, Y, constraint\_group=S)_](#fitx-y-constraint_groups)
  - [Features](#features)
    - [Fairness constraints](#fairness-constraints)
    - [Global constraints](#global-constraints)
  - [Technical Details](#technical-details)
  - [Contact](#contact)
  - [How to cite FairGBM](#how-to-cite-fairgbm)

## Install

FairGBM can be installed from [PyPI](https://pypi.org/project/fairgbm/):

```
pip install fairgbm
```

Or directly from GitHub:

```
git clone --recurse-submodules https://github.com/feedzai/fairgbm.git
pip install fairgbm/python-package/
```

> **Note**
> Compatibility is only maintained with **Linux OS**.
>
> If you don't have access to a Linux machine, we advise using the free Google
> Colab service ([example Colab notebook here](https://colab.research.google.com/github/AndreFCruz/fairgbm-fork/blob/add-colab-example/examples/FairGBM-python-notebooks/FairGBM_example_for_equalized_odds_%5Bgoogle_colab%5D.ipynb)).
>
> We also provide a Docker image that can be useful on non-Linux platforms: run ```docker run -p 8888:8888 ndrcrz/fairgbm-miniconda``` for a Jupyter notebook environment with `fairgbm` installed.

> **Note**
> Follow [this link](https://github.com/microsoft/LightGBM/tree/master/python-package)
> for more details on installing the Python package.

### Docker image

We provide a Docker image with Python and Miniconda installed, ready to run the example
fairgbm Jupyter notebooks.

You can get a Jupyter notebook with fairgbm up and running on your local machine with:
```
docker run -p 8888:8888 ndrcrz/fairgbm-miniconda
```

Although we recommend using the Python package directly on a local x86-64 (non-ARM) Linux machine,
this Docker image is an option for users on other platforms (it was tested on an M1 Mac).

The Dockerfile is available [here](examples/FairGBM-python-notebooks/Dockerfile).

## Getting started

> **Recommended** Python notebook example [here](examples/FairGBM-python-notebooks/FairGBM_example_for_equalized_odds_[google_colab].ipynb) (Google Colab link [here](https://colab.research.google.com/github/AndreFCruz/fairgbm-fork/blob/add-colab-example/examples/FairGBM-python-notebooks/FairGBM_example_for_equalized_odds_%5Bgoogle_colab%5D.ipynb)).

You can get FairGBM up and running in just a few lines of Python code:

```python
from fairgbm import FairGBMClassifier

# Instantiate
fairgbm_clf = FairGBMClassifier(
    constraint_type="FNR",    # constraint on equal group-wise TPR (equal opportunity)
    n_estimators=200,         # core parameters from vanilla LightGBM
    random_state=42,          # ...
)

# Train using features (X), labels (Y), and sensitive attributes (S)
fairgbm_clf.fit(X, Y, constraint_group=S)
# NOTE: labels (Y) and sensitive attributes (S) must be in numeric format

# Predict
Y_test_pred = fairgbm_clf.predict_proba(X_test)[:, -1]  # Compute continuous class probabilities (recommended)
# Y_test_pred = fairgbm_clf.predict(X_test)             # Or compute discrete class predictions
```

**For Python examples see the [_notebooks folder_](/examples/FairGBM-python-notebooks).**

A more in-depth explanation and other usage examples (with python package or compiled binary) can be found in the [**_examples folder_**](/examples).

> **Note**
> FairGBM is a research project, so its default hyperparameters (key-word arguments)
> are not expected to be as robust as the defaults in `sklearn` or
> `lightgbm` classifiers.
> We earnestly **recommend running hyperparameter-tuning to tune the
> `multiplier_learning_rate` hyperparameter** as well as the remaining GBM
> hyperparameters (example [here](examples/FairGBM-python-notebooks/UCI-Adult-example-with-hyperparameter-tuning.ipynb)).
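As a minimal sketch of such a search, the following assumes `FairGBMClassifier` follows the scikit-learn estimator API (as its LightGBM parent class does) and reuses the `X`, `Y`, `S` arrays from the example above; the parameter ranges are illustrative placeholders, not recommended values.

```python
import numpy as np
from sklearn.model_selection import RandomizedSearchCV

from fairgbm import FairGBMClassifier

# Illustrative search space; these ranges are placeholders, not tuned recommendations.
param_distributions = {
    "multiplier_learning_rate": np.logspace(-2, 0, num=20),  # 0.01 ... 1.0
    "n_estimators": [100, 200, 500],
    "learning_rate": np.logspace(-2, -0.5, num=10),
}

search = RandomizedSearchCV(
    estimator=FairGBMClassifier(constraint_type="FNR", random_state=42),
    param_distributions=param_distributions,
    n_iter=20,
    scoring="roc_auc",
    random_state=42,
)

# Extra fit keyword arguments are forwarded to the estimator's fit method.
search.fit(X, Y, constraint_group=S)
print(search.best_params_)
```

Note that selecting purely on ROC AUC ignores fairness; in practice you would also evaluate fairness metrics when choosing among the searched candidates.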

### Parameter list

The following parameters can be used as key-word arguments for the `FairGBMClassifier` Python class.

| _Name_ | _Description_ | _Default_ |
|:------:|---------------|:---------:|
| `constraint_type` | The type of fairness (group-wise equality) constraint to use (if any). | `FPR,FNR` |
| `global_constraint_type` | The type of global equality constraint to use (if any). | _None_ |
| `multiplier_learning_rate` | The learning rate for the gradient ascent step (w.r.t. Lagrange multipliers). | `0.1` |
| `constraint_fpr_tolerance` | The slack when fulfilling _group-wise_ FPR constraints. | `0.01` |
| `constraint_fnr_tolerance` | The slack when fulfilling _group-wise_ FNR constraints. | `0.01` |
| `global_target_fpr` | Target rate for the _global_ FPR (inequality) constraint. | _None_ |
| `global_target_fnr` | Target rate for the _global_ FNR (inequality) constraint. | _None_ |
| `constraint_stepwise_proxy` | Differentiable proxy for the step-wise function in _group-wise_ constraints. | `cross_entropy` |
| `objective_stepwise_proxy` | Differentiable proxy for the step-wise function in _global_ constraints. | `cross_entropy` |
| `stepwise_proxy_margin` | Intercept value for the proxy function: value at `f(logodds=0.0)` | `1.0` |
| `score_threshold` | Score threshold used when assessing _group-wise_ FPR or FNR in training. | `0.5` |
| `global_score_threshold` | Score threshold used when assessing _global_ FPR or FNR in training. | `0.5` |
| `init_multipliers` | The initial value of the Lagrange multipliers. | `0` for each constraint |
| ... | _Any core [`LGBMClassifier` parameter](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html#lightgbm-lgbmclassifier) can be used with FairGBM as well._ | |

Please consult [this list](https://lightgbm.readthedocs.io/en/latest/Parameters.html#core-parameters) for a detailed
view of all vanilla LightGBM parameters (_e.g._, `n_estimators`, `n_jobs`, ...).

> **Note**
> The `objective` is the only core LightGBM parameter that cannot be changed when using FairGBM, as you must use
> the constrained loss function `objective="constrained_cross_entropy"`.
> Using a standard non-constrained objective will fall back to using standard LightGBM.
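For instance, a configuration mixing fairness-specific and vanilla LightGBM parameters might look as follows (the values shown are illustrative, not recommendations):

```python
from fairgbm import FairGBMClassifier

fairgbm_clf = FairGBMClassifier(
    # Fairness-specific parameters
    constraint_type="FPR,FNR",        # equalize both group-wise FPR and FNR (equalized odds)
    multiplier_learning_rate=0.1,     # step size of the gradient ascent on the Lagrange multipliers
    constraint_fpr_tolerance=0.01,    # slack allowed when fulfilling group-wise FPR constraints
    constraint_fnr_tolerance=0.01,    # slack allowed when fulfilling group-wise FNR constraints
    # Vanilla LightGBM parameters
    n_estimators=200,
    learning_rate=0.1,
    n_jobs=-1,
    random_state=42,
)
```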

### _fit(X, Y, constraint_group=S)_

In addition to the usual `fit` arguments, features `X` and labels `Y`, FairGBM takes the sensitive attributes
column `S` as input for training.

**Regarding the sensitive attributes column `S`:**
- It must be in numeric format, with each protected group taking a different integer value, starting at `0`.
- It is not restricted to binary sensitive attributes: you can encode _two or more_ groups in the same column.
- It is only required for training, **not** for computing predictions.

Here is an example pre-processing for the sensitive attributes on the UCI Adult dataset:
```python
from typing import Union

import numpy as np
import pandas as pd

# Given X, Y, S
X, Y, S = load_dataset()

# The sensitive attributes S must be in numeric format
S = np.array([1 if val == "Female" else 0 for val in S])

# The labels Y must be binary and in numeric format: {0, 1}
Y = np.array([1 if val == ">50K" else 0 for val in Y])

# The features X may be numeric or categorical, but make sure categorical columns are in the correct format
X: Union[pd.DataFrame, np.ndarray]  # any array-like can be used

# Train FairGBM
fairgbm_clf.fit(X, Y, constraint_group=S)
```
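Since `S` is not restricted to two groups, a categorical column with several protected groups can likewise be mapped to consecutive integers starting at `0`. A minimal sketch (the `race` values below are just an illustration):

```python
import pandas as pd

# Hypothetical multi-group sensitive attribute column
race = pd.Series(["White", "Black", "Asian", "White", "Other"])

# factorize assigns a distinct integer, starting at 0, to each distinct group
S, group_names = pd.factorize(race)
print(S)            # [0 1 2 0 3]
print(group_names)  # Index(['White', 'Black', 'Asian', 'Other'], dtype='object')
```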

## Features

FairGBM enables you to train a GBM model to **minimize a loss function** (_e.g._, cross-entropy) **subject to fairness
constraints** (_e.g._, equal opportunity).

Namely, you can target equality of performance metrics (FPR, FNR, or both) across instances from _two or more_ different
protected groups (see [fairness constraints](#fairness-constraints) section).
Optionally, you can simultaneously add global constraints on specific metrics (see [global constraints](#global-constraints) section).

### Fairness constraints

You can use FairGBM to equalize the following metrics across _two or more_ protected groups:
- Equalize FNR (equivalent to equalizing TPR or Recall)
  - also known as _equal opportunity_ [(Hardt _et al._, 2016)](https://arxiv.org/abs/1610.02413)
- Equalize FPR (equivalent to equalizing TNR or Specificity)
  - also known as _predictive equality_ [(Corbett-Davies _et al._, 2017)](https://arxiv.org/abs/1701.08230)
- Equalize both FNR and FPR simultaneously
  - also known as _equalized odds_ [(Hardt _et al._, 2016)](https://arxiv.org/abs/1610.02413)

> **Example for _equality of opportunity_** in college admissions:
> your likelihood of getting admitted to a certain college (predicted positive) given that you're a qualified candidate
> (label positive) should be the same regardless of your race (sensitive attribute).
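In the Python API, these options map onto the `constraint_type` parameter; a short sketch:

```python
from fairgbm import FairGBMClassifier

# Equal opportunity: equalize group-wise FNR (equivalently, TPR or Recall)
equal_opportunity_clf = FairGBMClassifier(constraint_type="FNR")

# Predictive equality: equalize group-wise FPR (equivalently, TNR or Specificity)
predictive_equality_clf = FairGBMClassifier(constraint_type="FPR")

# Equalized odds: equalize both FNR and FPR simultaneously
equalized_odds_clf = FairGBMClassifier(constraint_type="FPR,FNR")
```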

### Global constraints

You can also target specific FNR or FPR goals.
For example, in cases where high accuracy is trivially achieved (_e.g._, problems with high class imbalance),
you may want to maximize TPR with a constraint on FPR (_e.g._, "maximize TPR with at most 5% FPR").
You can set a constraint on global FPR ≤ 0.05 by using `global_target_fpr=0.05` and
`global_constraint_type="FPR"`.

You can simultaneously set constraints on group-wise metrics (fairness constraints) and constraints on global metrics.
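For example, the following sketch targets "maximize TPR with at most 5% FPR" while also equalizing group-wise FNR (the remaining values are illustrative):

```python
from fairgbm import FairGBMClassifier

clf = FairGBMClassifier(
    # Global constraint: keep the overall FPR at or below 5%
    global_constraint_type="FPR",
    global_target_fpr=0.05,
    # Fairness constraint: equalize group-wise FNR (equal opportunity)
    constraint_type="FNR",
    n_estimators=200,
    random_state=42,
)
clf.fit(X, Y, constraint_group=S)
```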

## Technical Details

FairGBM is a framework that enables _constrained optimization_ of Gradient Boosting Machines (GBMs).
This way, we can train a GBM model to minimize some loss function (usually the _binary cross-entropy_) subject to a set
of constraints that should be met in the training dataset (_e.g._, equality of opportunity).

FairGBM applies the [method of Lagrange multipliers](https://en.wikipedia.org/wiki/Lagrange_multiplier), and uses
iterative and interleaved steps of gradient descent (on the function space, by adding new trees to the GBM model) and
gradient ascent (on the space of Lagrange multipliers, **Λ**).
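Schematically (a sketch of the setup; see the paper for the precise formulation), training solves a min-max problem: minimize the predictive loss plus a multiplier-weighted sum of constraint violations,

```math
\min_{f} \; \max_{\mathbf{\Lambda} \geq 0} \;\; \mathcal{L}(f) + \sum_{i} \lambda_i \, c_i(f),
```

where $\mathcal{L}$ is the predictive loss and each $c_i(f)$ measures the violation of the $i$-th constraint by the current model $f$.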

The main obstacle to enforcing fairness constraints in training is that these constraints are often
_non-differentiable_. To sidestep this issue, we use a differentiable proxy of the step-wise function.
The following plot shows an example of _hinge-based_ and _cross-entropy-based_ proxies for the _false positive_ value
of a _label negative_ instance.


_Figure: example of proxy FPR functions (hinge-based and cross-entropy-based)._
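To make the idea concrete, here is a small sketch of what such proxies can look like for the false-positive indicator of a label-negative instance, as a function of its predicted log-odds. These simple forms are illustrative only; the exact proxy functions and margin handling used by FairGBM are controlled by `constraint_stepwise_proxy` and `stepwise_proxy_margin` and defined precisely in the paper.

```python
import numpy as np

def fp_step(logodds):
    # True false-positive indicator (non-differentiable step):
    # a label-negative instance is a false positive iff its log-odds >= 0.
    return (logodds >= 0).astype(float)

def fp_hinge_proxy(logodds, margin=1.0):
    # Hinge-based proxy: a linear ramp that upper-bounds the step function.
    return np.maximum(0.0, logodds + margin)

def fp_xent_proxy(logodds):
    # Cross-entropy-based proxy: the softplus function, a smooth surrogate.
    return np.log1p(np.exp(logodds))

z = np.linspace(-3.0, 3.0, num=7)
print(fp_step(z))
print(fp_hinge_proxy(z))
print(fp_xent_proxy(z))
```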

For a more in-depth explanation of FairGBM please consult [the paper](https://arxiv.org/pdf/2209.07850.pdf).

[comment]: <> (### Important C++ source files **TODO**)

[comment]: <> (## Results)
[comment]: <> (%% TODO: results and run-time comparisons against fairlearn, TFCO, and others)

## Contact

For commercial uses of FairGBM please contact .

## How to cite FairGBM

```
@inproceedings{cruz2023fairgbm,
  author={Cruz, Andr{\'{e}} F. and Bel{\'{e}}m, Catarina and Jesus, S{\'{e}}rgio and Bravo, Jo{\~{a}}o and Saleiro, Pedro and Bizarro, Pedro},
  title={Fair{GBM}: Gradient Boosting with Fairness Constraints},
  booktitle={The Eleventh International Conference on Learning Representations},
  year={2023},
  url={https://openreview.net/forum?id=x-mXzBgCX3a}
}
```

The paper is publicly available at this [arXiv link](https://arxiv.org/abs/2209.07850).