{"id":22848423,"url":"https://github.com/feedzai/data-bias-fraud-study","last_synced_at":"2025-03-31T06:11:41.453Z","repository":{"id":150487640,"uuid":"596966307","full_name":"feedzai/data-bias-fraud-study","owner":"feedzai","description":null,"archived":false,"fork":false,"pushed_at":"2023-02-20T16:15:19.000Z","size":2110,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-02-06T10:28:01.059Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/feedzai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-02-03T10:24:10.000Z","updated_at":"2024-02-12T10:14:50.000Z","dependencies_parsed_at":"2023-07-28T23:31:16.626Z","dependency_job_id":null,"html_url":"https://github.com/feedzai/data-bias-fraud-study","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feedzai%2Fdata-bias-fraud-study","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feedzai%2Fdata-bias-fraud-study/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feedzai%2Fdata-bias-fraud-study/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feedzai%2Fdata-bias-fraud-study/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/feedzai","download_url":"https://codeload.github.com/feedzai/data
-bias-fraud-study/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246423729,"owners_count":20774820,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-13T04:12:09.112Z","updated_at":"2025-03-31T06:11:41.428Z","avatar_url":"https://github.com/feedzai.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# A Data-Centric Study on Unfairness in Fraud Detection\n\nThis is the repository for the KDD 2023 Applied Data Science Track submission _\"A Data-Centric Study on Unfairness in Fraud Detection\"_.\n\nThis repository contains:\n- Code and data to reproduce the plots shown in the results section of the paper.\n- Code to reproduce the paper's experiments on a realistic, publicly-available, state-of-the-art bank account fraud dataset suite.\n\n\n## Key Contributions\n\n- A formal taxonomy to characterize data bias between a protected attribute, other features, and the target variable.\n- Experimental results for a comprehensive suite of scenarios covering the fairness-accuracy trade-offs that ML models make under distinct types of data bias, pertinent to, but not restricted to, fraud detection.\n- Demonstrating how models can shape data bias, and consequently unfairness, in dynamic environments.\n- Showing how, by changing data bias settings, the picture of algorithmic fairness changes, and how comparisons among algorithms differ.\n- Raising awareness of the issue of variance in fairness measurements, underlining the importance of employing robust models 
and metrics.\n- Evaluation of the utility of simple unfairness mitigation methods under distinct data bias conditions.\n\n\n## Plot Reproducibility\n\n![Scenario 5 plot from the paper (fraudster adversarial behaviour).](paper_plot.png)\n\n- [paper_plots.ipynb](notebooks/paper_plots.ipynb) contains code to reproduce each plot in the results section of the paper.\n- The [results_data/](results_data/) folder contains the trained models' evaluation results for each experiment, which are used to create the plots.\n\n\n## Running Experiments on a Public Dataset\n\n- The notebook [baf_experiments.ipynb](notebooks/baf_reproduction.ipynb) contains code to reproduce the experiments of the paper on [Bank Account Fraud (BAF)](https://www.kaggle.com/datasets/sgpjesus/bank-account-fraud-dataset-neurips-2022), a publicly-available bank account fraud dataset suite (the most similar to the one we used).\n    - This suite contains 6 realistic fraud datasets (one base dataset and 5 variants). Each dataset has a type of data bias, so they can be used to reproduce some of the experiments conducted in our paper.\n    - For example, the following correspondence can be made between the suite's datasets and the data bias Scenarios we analyzed in the paper:\n        - Base dataset for the baseline\n        - Variant I for Scenario 1\n        - Variant II for Scenario 2\n        - Variant III for Scenario 3\n        - Variant V for Scenario 5\n\n    Reproducibility for Scenarios 4 and 6 is a work in progress.\n\nThis code uses the hyperparameter configurations from the paper's experiments (sampled from the grids in the `hyperparameter_spaces` folder).\n\nWe are unable to provide further information on the original data due to privacy concerns.\n\n\n## 
Citing\nWIP.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffeedzai%2Fdata-bias-fraud-study","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffeedzai%2Fdata-bias-fraud-study","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffeedzai%2Fdata-bias-fraud-study/lists"}