{"id":13704301,"url":"https://modeloriented.github.io/FairPAN/","last_synced_at":"2025-05-05T09:33:41.118Z","repository":{"id":84196341,"uuid":"394198426","full_name":"ModelOriented/FairPAN","owner":"ModelOriented","description":null,"archived":false,"fork":false,"pushed_at":"2021-10-04T16:54:51.000Z","size":2872,"stargazers_count":7,"open_issues_count":1,"forks_count":3,"subscribers_count":7,"default_branch":"master","last_synced_at":"2024-08-03T21:04:54.427Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://modeloriented.github.io/FairPAN/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ModelOriented.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2021-08-09T07:45:26.000Z","updated_at":"2024-04-17T19:56:04.000Z","dependencies_parsed_at":"2023-05-23T22:00:34.902Z","dependency_job_id":null,"html_url":"https://github.com/ModelOriented/FairPAN","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ModelOriented%2FFairPAN","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ModelOriented%2FFairPAN/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ModelOriented%2FFairPAN/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ModelOriented%2FFairPAN/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ModelOriented","download_url":"https://codeload.github.com/ModelOriented/FairPAN/tar.gz/refs/heads/master","host":{"name":"GitHub"
,"url":"https://github.com","kind":"github","repositories_count":224439884,"owners_count":17311542,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T21:01:07.173Z","updated_at":"2024-11-13T11:31:11.578Z","avatar_url":"https://github.com/ModelOriented.png","language":"R","funding_links":[],"categories":["Tools"],"sub_categories":["Fairness"],"readme":"# FairPAN - Fair Predictive Adversarial Network\n\u003c!-- badges: start --\u003e\n[![R-CMD-check](https://github.com/ModelOriented/FairPAN/workflows/R-CMD-check/badge.svg)](https://github.com/ModelOriented/FairPAN/actions)\n[![Codecov test coverage](https://codecov.io/gh/ModelOriented/FairPAN/branch/master/graph/badge.svg)](https://codecov.io/gh/ModelOriented/FairPAN?branch=master)\n\u003c!-- badges: end --\u003e\n\n## Overview\n\nHave you just created a model which is biased against some subgroup? Or have\nyou just tried to fight the bias, but the model's performance dropped significantly?\nUse `FairPAN` to create a neural network model that provides fair predictions while\nretaining strong performance! With `pretrain()` you can create or provide\nyour own neural networks and then use them in `fair_train()` to achieve fair\noutcomes. 
The R package FairPAN additionally lets you use many \n[DALEX](https://github.com/ModelOriented/DALEX) \nand [fairmodels](https://github.com/ModelOriented/fairmodels)\nfunctions such as `DALEX::model_performance()` or `fairmodels::fairness_check()`.\n\n*If you have problems with the training process, remember to use the `monitor` parameter and the `plot_monitor()` function to guide parameter adjustments.*\n\nCheck [FairPAN Website](https://modeloriented.github.io/FairPAN/)!\n\n## Theoretical introduction\n\n### Introduction to Fairness\n\nConsider an algorithm that has to predict whether giving credit to a person is risky or not. It learns from real data on past credit decisions, which were biased against women (a historical fact). In that case, the model learns this bias, which is not only carried by the simple sex variable but is also hidden inside other variables. Fairness methods enable us to detect such bias and offer several ways to fight it. To learn more, I recommend the article ['Fairmodels: A Flexible Tool For Bias Detection, Visualization, And Mitigation' by Jakub Wisniewski and Przemysław Biecek](https://arxiv.org/pdf/2104.00507.pdf).\n\n### Introduction to GANs\n\nGenerative Adversarial Networks are two neural networks that learn together. The generator has to generate new samples that are indistinguishable from the original data, and the adversary has to decide whether an observation is original or generated. The generator is punished whenever the adversary makes the correct prediction. Through this process the generator eventually learns to produce indistinguishable samples, and the adversary's accuracy drops to 50%, the point at which it cannot distinguish the two classes. The idea of GANs\nwas proposed in [Generative Adversarial Nets, Ian Goodfellow et al.](https://arxiv.org/pdf/1406.2661.pdf).\n\n### FairPAN\n\nFairPANs are a way to bring fairness into neural networks. 
We mimic GANs by substituting the generator with a classifier (predictor), while the adversary has to predict the sensitive value (such as sex or race) from the output of the predictor. This process eventually leads the classifier to make predictions from which the sensitive value cannot be distinguished. The idea comes from the blogs [Towards fairness in ML with adversarial networks, Stijn Tonk](https://godatadriven.com/blog/towards-fairness-in-ml-with-adversarial-networks/) and [Fairness in Machine Learning with PyTorch, Henk Griffoen](https://godatadriven.com/blog/fairness-in-machine-learning-with-pytorch/); however, our implementation in R offers slightly different solutions. The exact idea behind using GANs for fairness is described in [Achieving Fairness through Adversarial Learning: an Application to Recidivism Prediction, Christina Wadsworth, Francesca Vera, Chris Piech](https://stanford.edu/~cpiech/bio/papers/fairnessAdversary.pdf).\n\n\u003ccenter\u003e\n\u003cimg src=\"./man/images/architecture_PAN.png\" alt=\"drawing\"/\u003e\n\u003c/center\u003e\n\nThe diagram above represents the architecture of our model and is strongly inspired by the aforementioned blogs.\n\n### Custom Loss Function\n\nThe crucial part of this model is the metric we use to engage the two models in a zero-sum game. For the classifier, this is captured by the following objective function: \n\n\u003ccenter\u003e\n\u003cimg src=\"./man/images/equation.png\" alt=\"drawing\" height=\"40\"/\u003e\n\u003c/center\u003e\n\nSo the classifier learns to minimize its own prediction loss while maximizing that of the adversary (lambda is positive, and minimizing a negated loss is the same as maximizing it). The objective during the game is simpler for the adversary: predict sex based on the income level predictions of the classifier. 
This is captured in the following objective function:\n\n\u003ccenter\u003e\n\u003cimg src=\"./man/images/equation2.png\" alt=\"drawing\" height=\"40\"/\u003e\n\u003c/center\u003e\n\nThe adversary does not care about the prediction accuracy of the classifier. It is only concerned with minimizing its own prediction loss.\n\nFirst, we pretrain the classifier and the adversary. Then we begin the proper PAN training with both networks: we train the adversary, provide its loss to the classifier, and after that, we train the classifier. This procedure should lead to fair predictions from the FairPAN model.\n\n## Why?\n\nRegular mitigation techniques tend to worsen the classifier's performance\nconsiderably, for example by decreasing accuracy, whereas with FairPAN the drop in\nperformance is small. Moreover, our package is very \nflexible because it lets you provide your own neural networks or\ncreate them with our functions. The outcomes are also created with the usage of\n`DALEX` and `fairmodels`, so one can use their methods and visualizations. \nAdditionally, the workflow of the package is simple and clean thanks to \nhelper features available to the user, such as the `preprocess()` function.\n\n## Installation\n\nInstall the development version from GitHub:\n\n``` r\ndevtools::install_github(\"ModelOriented/FairPAN\", build_vignettes = TRUE)\n```\n\n## Workflow\n\nThe graph below shows what the workflow inside the package looks like. \nFirst, we have to provide data and use `preprocess()`, which creates all the sets\nneeded for this package to work. One can also skip that step; however, it is not\nadvisable to do so. Next, we have to create a `dataset_loader()`, which organises\nour data to be ready for torch usage. 
The next step is really flexible, because\nwe can choose whether to create our models with the package explicitly via\n`create_model()` and `pretrain_net()`, implicitly inside `pretrain()`, or\nto provide neural networks created on our own, which can be pretrained or not,\ndepending on our needs. This is powerful, because we can provide\nwell-known, pretrained classifiers. Later, we run the `fair_train()` process,\nwhose outcomes we can visualize by setting `monitor` to true and using \n`plot_monitor()`. Although we can finish the process at this point, we can also\nanalyse the outcomes a bit more with `explain_pan()` and use all `DALEX` functions\non the returned explainer. This explainer can also be passed to \n`fairmodels::fairness_check()` and other functions from that package.\n\n\u003ccenter\u003e\n\u003cimg src=\"./man/images/workflow_diagram.png\" alt=\"drawing\"/\u003e\n\u003c/center\u003e\n\n\n## Example\n\nAchieve fairness and preserve performance!\n\n``` r\nlibrary(fairpan)\n\nadult \u003c- fairmodels::adult\n\n# ------------------- step 1 - prepare data  ------------------------\n\ndata \u003c- preprocess( data = adult,\n                    target_name = \"salary\",\n                    sensitive_name = \"sex\",\n                    privileged = \"Male\",\n                    discriminated = \"Female\",\n                    drop_also = c(\"race\"),\n                    sample = 0.02,\n                    train_size = 0.6,\n                    test_size = 0.4,\n                    validation_size = 0,\n                    seed = 7\n)\n\ndev \u003c- \"cpu\"\n\ndsl \u003c- dataset_loader(train_x = data$train_x,\n                      train_y = data$train_y,\n                      test_x = data$test_x,\n                      test_y = data$test_y,\n                      batch_size = 5,\n                      dev = dev\n)\n\n# ------------ step 2 - create and pretrain models  -----------------\n\nmodels \u003c- pretrain(clf_model = 
NULL,\n                   adv_model = NULL,\n                   clf_optimizer = NULL,\n                   trained = FALSE,\n                   train_x = data$train_x,\n                   train_y = data$train_y,\n                   sensitive_train = data$sensitive_train,\n                   sensitive_test = data$sensitive_test,\n                   batch_size = 5,\n                   partition = 0.6,\n                   neurons_clf = c(32, 32, 32),\n                   neurons_adv = c(32, 32, 32),\n                   dimension_clf = 2,\n                   dimension_adv = 1,\n                   learning_rate_clf = 0.001,\n                   learning_rate_adv = 0.001,\n                   n_ep_preclf = 10,\n                   n_ep_preadv = 10,\n                   dsl = dsl,\n                   dev = dev,\n                   verbose = TRUE,\n                   monitor = TRUE\n)\n\n# --------------- step 3 - train for fairness  --------------------\n\nmonitor \u003c- fair_train( n_ep_pan = 17,\n                       dsl = dsl,\n                       clf_model = models$clf_model,\n                       adv_model = models$adv_model, \n                       clf_optimizer = models$clf_optimizer,\n                       adv_optimizer = models$adv_optimizer,\n                       dev = dev,\n                       sensitive_train = data$sensitive_train,\n                       sensitive_test = data$sensitive_test,  \n                       batch_size = 5,   \n                       learning_rate_adv = 0.001,  \n                       learning_rate_clf = 0.001, \n                       lambda = 130,\n                       verbose = TRUE,\n                       monitor = TRUE\n)\n\n# --------- step 4 - prepare outcomes and plot them  --------------\n\nplot_monitor(STP = monitor$STP,\n             adversary_acc = monitor$adversary_acc,\n             adversary_losses = monitor$adversary_losses,\n             classifier_acc = monitor$classifier_acc)\n\nexp_clf \u003c- 
explain_pan(y = data$test_y,\n                       model = models$clf_model,\n                       label = \"PAN\",\n                       data = data$data_test,\n                       data_scaled = data$data_scaled_test,\n                       batch_size = 5,\n                       dev = dev,\n                       verbose = TRUE\n)\n\nfobject \u003c- fairmodels::fairness_check(exp_clf,\n                            protected = data$protected_test,\n                            privileged = \"Male\",\n                            verbose = TRUE)\nplot(fobject)\n\n```\n\n## Fair training is flexible\n\nThe `pretrain()` function has optional parameters:\n\n* `clf_model`      nn_module describing the classifier's neural network architecture\n\n* `adv_model`      nn_module describing the adversary's neural network architecture\n\n* `clf_optimizer`  torch object providing the classifier optimizer from pretraining\n\n* `trained`        indicates whether `clf_model` is already trained\n\nwhich enable users to provide their own, possibly pretrained, neural network\nmodels.\n\nOn the other hand, you can use the FairPAN package from the very beginning, starting\nfrom data preprocessing with the `preprocess()` function, which provides every\ndataset that you will need for the provided features.\n\n## Proper evaluation\n\nAlthough there are many metrics that measure fairness, our method focuses\non optimizing the *Statistical Parity ratio*, which compares the rate of positive\npredictions, (TP+FP)/(TP+FP+TN+FN), between the privileged and discriminated\nsubgroups.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/modeloriented.github.io%2FFairPAN%2F","html_url":"https://awesome.ecosyste.ms/projects/modeloriented.github.io%2FFairPAN%2F","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/modeloriented.github.io%2FFairPAN%2F/lists"}