https://github.com/feedzai/fair-obnc

Supplementary Material and Code for the paper “Fair-OBNC: Correcting Label Noise for Fairer Datasets”.

Host: GitHub
URL: https://github.com/feedzai/fair-obnc
Owner: feedzai
License: other
Created: 2024-08-14T10:56:21.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-08-26T08:54:46.000Z (almost 2 years ago)
Last Synced: 2025-02-06T10:27:31.060Z (over 1 year ago)
Language: Python
Homepage:
Size: 482 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Fair-OBNC

## Description

This repository contains the code and instructions to reproduce the experiments and results presented in the paper *Fair-OBNC: Correcting Label Noise for Fairer Datasets*.

## Table of Contents

- [Replicating the conducted experiments](#replicating-the-conducted-experiments)

- [Running your own experiments](#running-your-own-experiments)

    - [Generating and loading data](#generating-and-loading-data)

    - [Generating config files](#generating-config-files)

    - [Running experiments](#running-experiments)

- [Analyzing results](#analyzing-results)

## Replicating the conducted experiments

This section details how to replicate our experiments to obtain the results we present in the paper *Fair-OBNC: Correcting Label Noise for Fairer Datasets*.

The first step is to install the Aequitas Flow package:

```bash

pip install git+https://github.com/dssg/aequitas.git

```

Then, one can download the necessary data by running:

```python

# To store the necessary data

>>> from generate_data import generate_data

>>> generate_data({"BankAccountFraud": ["TypeII"]})

```

Finally, this repository includes the configuration files used in our experiments, so the only remaining step is to run the `fairobnc_experiment.py` script:

```bash

# To run the experiments with the multiple injected noise scenarios

python -m fairobnc_experiment baf typeii noise_injection_experiment --noise_injection

# To run the experiments without noise injection

python -m fairobnc_experiment baf typeii noise_injection_experiment

```

## Running your own experiments

If you wish to test our method in additional scenarios, you can use the same framework to set up and run your own experiments.

### Generating and loading data

The `generate_data` function loads the desired datasets from Aequitas, generates their IID versions, and injects noise into the labels. It stores the files needed by the `IIDDataset` and `NoisyDataset` classes.

```python

# To store the necessary data

>>> from generate_data import generate_data

>>> generate_data({"BankAccountFraud": ["TypeII"]})

# To load an IID dataset 

>>> from datasets import IIDDataset

>>> iid_dataset = IIDDataset("BankAccountFraud", "TypeII")

>>> iid_dataset.load_data()

>>> iid_dataset.create_splits()

# To load a noisy dataset where noise is applied only to instances of the negative class,

# flipping 5% of the instances in the negative sensitive group and 20% of those in the positive group

>>> from datasets import NoisyDataset

>>> noisy_dataset = NoisyDataset("BankAccountFraud", "TypeII", {0:0.05, 1:0.20}, [0])

>>> noisy_dataset.load_data()

>>> noisy_dataset.create_splits()

```
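To make the noise parameters above more concrete, here is a minimal sketch of group-dependent label flipping restricted to one class. It is an illustration only, not the repository's implementation: the function `inject_label_noise`, its arguments, and the toy data are hypothetical, and it assumes that the dictionary maps each sensitive group to a flip rate and that the list names the classes whose labels may be flipped.

```python
import numpy as np


def inject_label_noise(y, group, flip_rates, noisy_classes, seed=0):
    """Flip binary labels at group-specific rates, only for the listed classes.

    Hypothetical helper for illustration; not part of the Fair-OBNC code.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    group = np.asarray(group)
    y_noisy = y.copy()
    for g, rate in flip_rates.items():
        # Candidates: instances of group g whose clean label is in noisy_classes
        candidates = np.flatnonzero((group == g) & np.isin(y, noisy_classes))
        n_flip = int(round(rate * len(candidates)))
        flipped = rng.choice(candidates, size=n_flip, replace=False)
        y_noisy[flipped] = 1 - y_noisy[flipped]
    return y_noisy


# Toy data: two sensitive groups, each with 100 negatives and 20 positives
y = np.array([0] * 100 + [1] * 20 + [0] * 100 + [1] * 20)
group = np.array([0] * 120 + [1] * 120)

# Same parameters as the NoisyDataset call: 5% for group 0, 20% for group 1,
# with noise applied only to the negative class
y_noisy = inject_label_noise(y, group, {0: 0.05, 1: 0.20}, [0])
```

Under these assumptions, only negative labels change, and the share of flipped labels differs between the two sensitive groups, which is the kind of group-dependent noise the `NoisyDataset` parameters describe.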

### Generating config files

The `configs` folder is organized into two subfolders, following the Aequitas experiment logic:

- `methods` contains the config files for each of the preprocessing methods being analyzed

- `datasets` contains the config files for each noisy version of the datasets used. These configs can be generated automatically by calling the `generate_dataset_configs` function:

    ```python

    >>> from generate_configs import generate_dataset_configs

    >>> generate_dataset_configs({"BankAccountFraud":["TypeII"]})

    ```

Each specific type of injected noise must be run as a separate experiment so that the same hyperparameters are sampled in each trial.
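The reasoning behind this requirement can be sketched with a seeded sampler. The example assumes that each experiment draws its trial hyperparameters from a random number generator initialized with a fixed seed; the README does not state the actual mechanism. The function `sample_trials` and the hyperparameter names and ranges are hypothetical.

```python
import random


def sample_trials(n_trials, seed):
    """Draw n_trials hypothetical hyperparameter configurations from a seeded RNG."""
    rng = random.Random(seed)
    return [
        {
            "learning_rate": rng.uniform(0.01, 0.3),
            "n_estimators": rng.randint(50, 500),
        }
        for _ in range(n_trials)
    ]


# Two separate experiments started from the same seed see identical trials
run_a = sample_trials(n_trials=3, seed=42)
run_b = sample_trials(n_trials=3, seed=42)
assert run_a == run_b

# A single experiment covering two noise types would keep advancing the
# sampler, so the second noise type would get different hyperparameters
combined = sample_trials(n_trials=6, seed=42)
assert combined[:3] == run_a
assert combined[3:] != run_a
```

With one experiment per noise type, trial *i* uses the same configuration in every scenario, so differences in results can be attributed to the noise rather than to the sampled hyperparameters.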

The experiment config files can be generated using the `generate_experiment_files` function:

```python

>>> from generate_configs import generate_experiment_files

>>> generate_experiment_files(

...     methods = ["lightgbm", "OBNC", "Fair-OBNC", "PrevalenceSampling"],

...     variants = {"BankAccountFraud":["TypeII"]},

...     noise_injection = True,

...     n_trials = 50,

... )

```

### Running experiments

After setting up all the data and config files, run the experiments with the `fairobnc_experiment.py` script:

```bash

python -m fairobnc_experiment baf typeii noise_injection_experiment --noise_injection

```

## Analyzing results

The `result_analysis.py` file defines the functions used to analyze the results and generate the plot presented in the paper.
