https://github.com/feedzai/fair-obnc

Supplementary Material and Code for the paper “Fair-OBNC: Correcting Label Noise for Fairer Datasets”.

# Fair-OBNC

## Description

This repository contains the code and instructions to reproduce the experiments and results presented in the paper *Fair-OBNC: Correcting Label Noise for Fairer Datasets*.

## Table of Contents

- [Replicating the conducted experiments](#replicating-the-conducted-experiments)
- [Running your own experiments](#running-your-own-experiments)
- [Generating and loading data](#generating-and-loading-data)
- [Generating config files](#generating-config-files)
- [Running experiments](#running-experiments)
- [Analyzing results](#analyzing-results)

## Replicating the conducted experiments

This section details how to replicate our experiments to obtain the results we present in the paper *Fair-OBNC: Correcting Label Noise for Fairer Datasets*.

The first step is to install the Aequitas Flow package:

```bash
pip install git+https://github.com/dssg/aequitas.git
```

Then, one can download the necessary data by running:

```python
# To store the necessary data
>>> from generate_data import generate_data
>>> generate_data({"BankAccountFraud": ["TypeII"]})
```

Finally, since this repository includes the configuration files we used in our experiments, the only step left is to run the `fairobnc_experiment.py` script:

```bash
# To run the experiments with the multiple injected noise scenarios
python -m fairobnc_experiment baf typeii noise_injection_experiment --noise_injection

# To run the experiments without noise injection
python -m fairobnc_experiment baf typeii noise_injection_experiment
```

## Running your own experiments

If you wish to test our method in additional scenarios, you can use our framework to run your own experiments.

### Generating and loading data

The `generate_data` function loads the desired datasets from Aequitas, generates their IID versions, and injects noise into the labels, storing the files needed by the `IIDDataset` and `NoisyDataset` classes.

```python
# To store the necessary data
>>> from generate_data import generate_data
>>> generate_data({"BankAccountFraud": ["TypeII"]})

# To load an IID dataset
>>> from datasets import IIDDataset
>>> iid_dataset = IIDDataset("BankAccountFraud", "TypeII")
>>> iid_dataset.load_data()
>>> iid_dataset.create_splits()

# To load a noisy dataset, where noise is applied only to instances of the
# negative class, flipping 5% of the instances in the negative sensitive group
# and 20% of those in the positive sensitive group
>>> from datasets import NoisyDataset
>>> noisy_dataset = NoisyDataset("BankAccountFraud", "TypeII", {0:0.05, 1:0.20}, [0])
>>> noisy_dataset.load_data()
>>> noisy_dataset.create_splits()
```
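
To make the noise parameters above more concrete, here is a minimal sketch of group-dependent label flipping. It is not the repository's implementation: the function `inject_group_label_noise` and its arguments are hypothetical, and it only assumes (from the comment above) that the dictionary maps each sensitive group to a flip rate and that the list names the classes eligible for flipping.

```python
import random


def inject_group_label_noise(labels, groups, flip_rates, noisy_classes, seed=0):
    """Flip binary labels at a group-dependent rate (hypothetical helper).

    flip_rates maps a sensitive-group value to the fraction of eligible
    instances to flip; only instances whose original label is in
    noisy_classes are eligible.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    for group, rate in flip_rates.items():
        # Eligibility is decided on the original (clean) labels.
        eligible = [
            i
            for i, (label, member) in enumerate(zip(labels, groups))
            if member == group and label in noisy_classes
        ]
        n_flip = round(rate * len(eligible))
        for i in rng.sample(eligible, n_flip):
            noisy[i] = 1 - noisy[i]
    return noisy


# Toy data: 1000 negative-class instances in each sensitive group.
labels = [0] * 2000
groups = [0] * 1000 + [1] * 1000
noisy = inject_group_label_noise(labels, groups, {0: 0.05, 1: 0.20}, [0])

flipped_g0 = sum(noisy[:1000])
flipped_g1 = sum(noisy[1000:])
print(flipped_g0, flipped_g1)
```

In this toy setup every instance starts in the negative class, so the number of positive labels in each group after injection equals the number of flipped instances in that group.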

### Generating config files

The `configs` folder is organized into two subfolders, following the Aequitas experiment logic:
- `methods` contains the config files for each of the preprocessing methods being analyzed.
- `datasets` contains the config files for each noisy version of the datasets used. These configs can be generated automatically by calling the `generate_dataset_configs` function:
```python
>>> from generate_configs import generate_dataset_configs
>>> generate_dataset_configs({"BankAccountFraud":["TypeII"]})
```

Each specific type of injected noise must be run as a separate experiment so that the same hyperparameters are sampled in each trial.
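
One way to picture this requirement: if every experiment starts its sampler from the same seed, trial *i* draws the same hyperparameters regardless of the noise scenario, so results are comparable across scenarios. The sketch below only illustrates that idea with Python's standard `random` module. The search space, the scenario names, and the `sample_hyperparameters` function are hypothetical and are not taken from this repository or from Aequitas.

```python
import random


def sample_hyperparameters(n_trials, seed=42):
    """Draw one hyperparameter set per trial from a seeded sampler."""
    rng = random.Random(seed)
    return [
        {
            # Hypothetical search space, for illustration only.
            "learning_rate": rng.uniform(0.01, 0.3),
            "num_leaves": rng.randint(8, 128),
        }
        for _ in range(n_trials)
    ]


# Each noise scenario is treated as its own experiment with a fresh sampler.
noise_scenarios = ["no_noise", "noise_0.05_0.20"]
trials_per_scenario = {
    name: sample_hyperparameters(n_trials=5) for name in noise_scenarios
}

# Trial i uses identical hyperparameters in both scenarios.
same = trials_per_scenario["no_noise"] == trials_per_scenario["noise_0.05_0.20"]
print(same)
```

If a single sampler were shared across scenarios instead, later scenarios would continue the random sequence and receive different hyperparameters in each trial.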

The experiment config files can be generated using the `generate_experiment_files` function:

```python
>>> from generate_configs import generate_experiment_files
>>> generate_experiment_files(
... methods = ["lightgbm", "OBNC", "Fair-OBNC", "PrevalenceSampling"],
... variants = {"BankAccountFraud":["TypeII"]},
... noise_injection = True,
... n_trials = 50,
... )
```

### Running experiments

After setting up the data and config files, run the experiments with the `fairobnc_experiment.py` script:

```bash
python -m fairobnc_experiment baf typeii noise_injection_experiment --noise_injection
```

## Analyzing results

The `result_analysis.py` file defines the functions used to analyze the results and generate the plot presented in the paper.