https://github.com/feedzai/fair-obnc
Supplementary Material and Code for the paper “Fair-OBNC: Correcting Label Noise for Fairer Datasets”.
- Host: GitHub
- URL: https://github.com/feedzai/fair-obnc
- Owner: feedzai
- License: other
- Created: 2024-08-14T10:56:21.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-08-26T08:54:46.000Z (almost 2 years ago)
- Last Synced: 2025-02-06T10:27:31.060Z (over 1 year ago)
- Language: Python
- Homepage:
- Size: 482 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Fair-OBNC
## Description
This repository contains the code and instructions to reproduce the experiments and results presented in the paper *Fair-OBNC: Correcting Label Noise for Fairer Datasets*.
## Table of Contents
- [Replicating the conducted experiments](#replicating-the-conducted-experiments)
- [Running your own experiments](#running-your-own-experiments)
  - [Generating and loading data](#generating-and-loading-data)
  - [Generating config files](#generating-config-files)
  - [Running experiments](#running-experiments)
- [Analyzing results](#analyzing-results)
## Replicating the conducted experiments
This section details how to replicate our experiments to obtain the results we present in the paper *Fair-OBNC: Correcting Label Noise for Fairer Datasets*.
The first step is to install the Aequitas Flow package:
```bash
pip install git+https://github.com/dssg/aequitas.git
```
Then, one can download the necessary data by running:
```python
# To store the necessary data
>>> from generate_data import generate_data
>>> generate_data({"BankAccountFraud": ["TypeII"]})
```
Finally, since this repository already includes the configuration files used in our experiments, the only remaining step is to run the `fairobnc_experiment.py` script:
```bash
# To run the experiments with the multiple injected noise scenarios
python -m fairobnc_experiment baf typeii noise_injection_experiment --noise_injection
# To run the experiments without noise injection
python -m fairobnc_experiment baf typeii noise_injection_experiment
```
## Running your own experiments
If you wish to test our method in additional scenarios, our framework can be used to run further experiments, as described below.
### Generating and loading data
The `generate_data` function loads the desired datasets from Aequitas, generates their IID versions, and injects noise into the labels, storing the files required by the `IIDDataset` and `NoisyDataset` classes.
```python
# To store the necessary data
>>> from generate_data import generate_data
>>> generate_data({"BankAccountFraud": ["TypeII"]})
# To load an IID dataset
>>> from datasets import IIDDataset
>>> iid_dataset = IIDDataset("BankAccountFraud", "TypeII")
>>> iid_dataset.load_data()
>>> iid_dataset.create_splits()
# To load a noisy dataset, where noise is being applied only on the instances from the negative class, flipping 5% of the instances belonging to the negative sensitive group and 20% of the ones from the positive group
>>> from datasets import NoisyDataset
>>> noisy_dataset = NoisyDataset("BankAccountFraud", "TypeII", {0:0.05, 1:0.20}, [0])
>>> noisy_dataset.load_data()
>>> noisy_dataset.create_splits()
```
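The noise settings passed to `NoisyDataset` above can be read as a group-dependent label-flipping process. The snippet below is a minimal, self-contained sketch of that idea, assuming that `{0: 0.05, 1: 0.20}` maps each sensitive-group value to the fraction of its eligible instances to flip and that `[0]` lists the classes whose labels may be flipped. `inject_group_label_noise` is a hypothetical helper written for illustration; it is not the code used by `generate_data`.

```python
import random


def inject_group_label_noise(y, groups, noise_rates, classes, seed=42):
    """Return a copy of binary labels `y` with group-dependent label flips.

    Hypothetical illustration, not the repository's implementation.
    y           -- sequence of 0/1 labels
    groups      -- sequence of sensitive-group values, aligned with y
    noise_rates -- dict: group value -> fraction of eligible instances to flip
    classes     -- only instances whose original label is in `classes` are eligible
    """
    rng = random.Random(seed)  # fixed seed so the noisy labels are reproducible
    y_noisy = list(y)
    for group, rate in noise_rates.items():
        # Eligible: belongs to this sensitive group AND to one of the noisy classes
        eligible = [i for i, (label, g) in enumerate(zip(y, groups))
                    if g == group and label in classes]
        n_flip = round(rate * len(eligible))
        # Sample distinct indices to flip within this group
        for i in rng.sample(eligible, n_flip):
            y_noisy[i] = 1 - y_noisy[i]  # flip 0 <-> 1
    return y_noisy


# Toy data: 500 negatives in each of two sensitive groups
labels = [0] * 1000
groups = [0] * 500 + [1] * 500
noisy = inject_group_label_noise(labels, groups, {0: 0.05, 1: 0.20}, [0])
print(sum(noisy[:500]), sum(noisy[500:]))  # 25 100
```

Because eligibility is computed from the original labels and each instance belongs to a single group, no label is flipped twice.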
### Generating config files
The `configs` folder is organized into two subfolders, following the Aequitas experiment logic:
- `methods` contains the config files for each of the preprocessing methods being analyzed.
- `datasets` contains the config files for each noisy version of the datasets used. These configs can be generated automatically by calling the `generate_dataset_configs` function:
```python
>>> from generate_configs import generate_dataset_configs
>>> generate_dataset_configs({"BankAccountFraud":["TypeII"]})
```
Each type of injected noise must be run as a separate experiment so that the same hyperparameters are sampled in each trial.
The experiment config files can be generated using the `generate_experiment_files` function:
```python
>>> from generate_configs import generate_experiment_files
>>> generate_experiment_files(
... methods = ["lightgbm", "OBNC", "Fair-OBNC", "PrevalenceSampling"],
... variants = {"BankAccountFraud":["TypeII"]},
... noise_injection = True,
... n_trials = 50,
)
```
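One plausible reading of the requirement that each type of injected noise be run as a separate experiment is that every experiment restarts its hyperparameter sampler from the same seed, so trial *k* receives the same configuration in every noise scenario and differences in results can be attributed to the noise rather than to the hyperparameters. The sketch below illustrates that idea with Python's standard `random` module; `sample_trials`, the search space, and the scenario names are hypothetical and do not reflect the actual sampling code in this repository or in Aequitas.

```python
import random


def sample_trials(search_space, n_trials, seed=0):
    """Draw `n_trials` configurations from a discrete search space (hypothetical helper)."""
    rng = random.Random(seed)  # same seed -> same sequence of configurations
    return [
        # Dict iteration order is insertion order, so draws happen in a fixed order
        {name: rng.choice(values) for name, values in search_space.items()}
        for _ in range(n_trials)
    ]


# Hypothetical search space, for illustration only
space = {"n_estimators": [100, 200, 500], "learning_rate": [0.01, 0.05, 0.1]}

# One experiment per noise scenario: each restarts the sampler from the same seed,
# so trial k uses identical hyperparameters in every scenario.
trials_by_scenario = {
    scenario: sample_trials(space, n_trials=50)
    for scenario in ["no_noise", "group0_5pct_group1_20pct"]
}
assert trials_by_scenario["no_noise"] == trials_by_scenario["group0_5pct_group1_20pct"]
```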
### Running experiments
After setting up all the data and config files, run the `fairobnc_experiment.py` script to launch the experiments:
```bash
python -m fairobnc_experiment baf typeii noise_injection_experiment --noise_injection
```
## Analyzing results
The `result_analysis.py` file contains the definition of the functions used to analyze the obtained results and generate the plot presented in the paper.
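The analysis functions themselves are not reproduced in this README. As an illustration of the kind of group-fairness computation such an analysis may involve, the sketch below computes a demographic parity ratio: the lowest per-group positive prediction rate divided by the highest, where 1.0 means parity. The choice of metric and the helper names `positive_rate` and `demographic_parity_ratio` are assumptions made for this example; refer to `result_analysis.py` for the metrics actually reported in the paper.

```python
def positive_rate(preds, groups, group):
    """Fraction of instances in `group` that received a positive (1) prediction."""
    group_preds = [p for p, g in zip(preds, groups) if g == group]
    return sum(group_preds) / len(group_preds)


def demographic_parity_ratio(preds, groups):
    """Ratio of the lowest to the highest per-group positive rate (1.0 = parity)."""
    rates = [positive_rate(preds, groups, g) for g in sorted(set(groups))]
    if max(rates) == 0:
        return 1.0  # no group receives positive predictions: treat as parity
    return min(rates) / max(rates)


# Toy predictions for two sensitive groups (0 and 1), for illustration only
preds = [1, 0, 0, 0, 1, 1, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
print(demographic_parity_ratio(preds, groups))  # 0.5
```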