{"id":22848352,"url":"https://github.com/feedzai/fair-obnc","last_synced_at":"2025-03-31T06:11:29.641Z","repository":{"id":254818965,"uuid":"842437878","full_name":"feedzai/fair-obnc","owner":"feedzai","description":"Supplementary Material and Code for the paper “Fair-OBNC: Correcting Label Noise for Fairer Datasets”.","archived":false,"fork":false,"pushed_at":"2024-08-26T08:54:46.000Z","size":494,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-02-06T10:27:31.060Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/feedzai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-14T10:56:21.000Z","updated_at":"2024-08-26T08:54:50.000Z","dependencies_parsed_at":"2024-08-26T11:17:16.298Z","dependency_job_id":null,"html_url":"https://github.com/feedzai/fair-obnc","commit_stats":null,"previous_names":["feedzai/fair-obnc"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feedzai%2Ffair-obnc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feedzai%2Ffair-obnc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feedzai%2Ffair-obnc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feedzai%2Ffair-obnc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/feedzai","download_url":"https://co
deload.github.com/feedzai/fair-obnc/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246423729,"owners_count":20774820,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-13T04:11:29.733Z","updated_at":"2025-03-31T06:11:29.624Z","avatar_url":"https://github.com/feedzai.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Fair-OBNC\n\n## Description\n\nThis repository contains the code and instructions to reproduce the experiments and results presented in the paper *Fair-OBNC: Correcting Label Noise for Fairer Datasets*.\n\n## Table of Contents\n\n- [Replicating the conducted experiments](#replicating-the-conducted-experiments)\n- [Running your own experiments](#running-your-own-experiments)\n    - [Generating data](#generating-and-loading-data)\n    - [Generating config files](#generating-config-files)\n    - [Running experiments](#running-experiments)\n- [Analyzing results](#analyzing-results)\n\n## Replicating the conducted experiments\n\nThis section details how to replicate our experiments to obtain the results we present in the paper *Fair-OBNC: Correcting Label Noise for Fairer Datasets*.\n\nThe first step is to install the Aequitas Flow package:\n\n```bash\npip install git+https://github.com/dssg/aequitas.git\n```\n\nThen, one can download the necessary data by running:\n\n```python\n# To store the necessary data\n\u003e\u003e\u003e from generate_data import generate_data\n\u003e\u003e\u003e generate_data({\"BankAccountFraud\": 
[\"TypeII\"]})\n```\n\nFinally, we include in this repository the configuration files we used in our experiments, so the only step left is to run the experiments with the `fairobnc_experiment.py` script:\n\n```bash\n# To run the experiments with the multiple injected noise scenarios\npython -m fairobnc_experiment baf typeii noise_injection_experiment --noise_injection\n\n# To run the experiments without noise injection\npython -m fairobnc_experiment baf typeii noise_injection_experiment\n```\n\n## Running your own experiments\n\nIf you wish to test our method in additional scenarios, the framework in this repository can be used to run them.\n\n### Generating and loading data\n\nThe `generate_data` function loads the desired datasets from Aequitas, generates IID versions of them, and injects noise into the labels, storing the files needed by the `IIDDataset` and `NoisyDataset` classes.\n\n```python\n# To store the necessary data\n\u003e\u003e\u003e from generate_data import generate_data\n\u003e\u003e\u003e generate_data({\"BankAccountFraud\": [\"TypeII\"]})\n\n# To load an IID dataset\n\u003e\u003e\u003e from datasets import IIDDataset\n\u003e\u003e\u003e iid_dataset = IIDDataset(\"BankAccountFraud\", \"TypeII\")\n\u003e\u003e\u003e iid_dataset.load_data()\n\u003e\u003e\u003e iid_dataset.create_splits()\n\n# To load a noisy dataset, where noise is applied only to instances of the negative class, flipping 5% of the instances belonging to the negative sensitive group and 20% of those from the positive group\n\u003e\u003e\u003e from datasets import NoisyDataset\n\u003e\u003e\u003e noisy_dataset = NoisyDataset(\"BankAccountFraud\", \"TypeII\", {0:0.05, 1:0.20}, [0])\n\u003e\u003e\u003e noisy_dataset.load_data()\n\u003e\u003e\u003e noisy_dataset.create_splits()\n```\n\n### Generating config files\n\nThe `configs` folder is organized into two subfolders, following the Aequitas experiment logic:\n- `methods` contains 
the config files for each of the preprocessing methods being analyzed\n- `datasets` contains the config files for each noisy version of the datasets used. These configs can be automatically generated by calling the `generate_dataset_configs` function:\n    ```python\n    \u003e\u003e\u003e from generate_configs import generate_dataset_configs\n    \u003e\u003e\u003e generate_dataset_configs({\"BankAccountFraud\":[\"TypeII\"]})\n    ```\n\nEach specific type of injected noise must be run as a separate experiment so that the same hyperparameters are sampled in each trial.\n\nThe experiment config files can be generated using the `generate_experiment_files` function:\n\n```python\n\u003e\u003e\u003e from generate_configs import generate_experiment_files\n\u003e\u003e\u003e generate_experiment_files(\n...     methods = [\"lightgbm\", \"OBNC\", \"Fair-OBNC\", \"PrevalenceSampling\"],\n...     variants = {\"BankAccountFraud\":[\"TypeII\"]},\n...     noise_injection = True,\n...     n_trials = 50,\n... )\n```\n\n### Running experiments\n\nAfter setting up all the data and config files, run the experiments with the `fairobnc_experiment.py` script:\n\n```bash\npython -m fairobnc_experiment baf typeii noise_injection_experiment --noise_injection\n```\n\n## Analyzing results\n\nThe `result_analysis.py` file defines the functions used to analyze the obtained results and generate the plot presented in the paper.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffeedzai%2Ffair-obnc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffeedzai%2Ffair-obnc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffeedzai%2Ffair-obnc/lists"}