https://github.com/j2kun/fkl-sdm16
Code and experiments for "A confidence-based approach for balancing fairness and accuracy"
- Host: GitHub
- URL: https://github.com/j2kun/fkl-sdm16
- Owner: j2kun
- Created: 2016-01-19T21:15:32.000Z (almost 10 years ago)
- Default Branch: main
- Last Pushed: 2020-06-09T03:20:24.000Z (over 5 years ago)
- Last Synced: 2025-10-09T15:33:04.761Z (11 days ago)
- Topics: fairness, machine-learning, research-paper
- Language: Python
- Homepage: https://arxiv.org/abs/1601.05764
- Size: 818 KB
- Stars: 4
- Watchers: 2
- Forks: 2
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
README
# Code and experiments for "A confidence-based approach for balancing fairness and accuracy"
All experiments used in this paper were implemented in Python 3 with the following dependencies:

- numpy
- matplotlib
- scikit-learn

## One-click rerun of all experiments
To re-run all experiments used in the paper, run the following from the command line:

```
./run-all.sh
```

This will re-run all the experiments and output the data to plaintext files in
the `results/` subdirectory.

To generate all plots used in the paper, run the following from the command line:

```
python plot-all.py
```
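If you prefer to drive the rerun from Python rather than the shell, a minimal wrapper over the two commands above might look like the following. This is a sketch, not part of the repository; it assumes only what the README states, namely that `run-all.sh` and `plot-all.py` live in the repository root.

```python
import subprocess
import sys
from pathlib import Path

def rerun_and_plot(repo_root="."):
    """Re-run every experiment, then regenerate the paper's plots.

    Thin wrapper around `./run-all.sh` and `python plot-all.py`.
    """
    root = Path(repo_root)
    # Equivalent of `./run-all.sh` (writes plaintext data under results/).
    subprocess.run(["./run-all.sh"], cwd=root, check=True)
    # Equivalent of `python plot-all.py`; sys.executable pins the interpreter.
    subprocess.run([sys.executable, "plot-all.py"], cwd=root, check=True)
```

`check=True` makes a failing experiment abort the whole rerun instead of silently producing partial results.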
## Datasets
The datasets are given the following names:

- adult
- german
- singles

### Loading into Python
For each dataset there is a data loader module and a baseline (see the
Baselines section below). We will use `adult` as the prototype; unless
otherwise stated, all datasets operate the same way with `adult` replaced by
the dataset name. The raw data files are `adult.train` and `adult.test`. If
preprocessing occurred to split a dataset into training and testing subsets,
then the unprocessed data files are in the `preprocessing/` subdirectory,
along with Python scripts to perform the (randomized) preprocessing.
Additional preprocessing is performed to turn categorical features into
(possibly many) binary features.

To load a dataset, you can run the following commands from the base directory
of the project.

```
$ python
Python 3.3.3 (default, Dec 30 2013, 23:51:18)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
>>> from data import adult
>>> trainingData, testData = adult.load()
>>> adult.protectedIndex
1
>>> len(trainingData)
32561
>>> trainingData[0]
((39, 1, 0, 0, 0, 0, 0, 1, 0, 0, 13, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 2174, 0, 40, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0), -1)
```

## An example experiment
An example experiment: testing the logistic regression learner on the German dataset.
```
Python 3.6.3 (default, Oct 4 2017, 06:09:15)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)] on darwin
>>> from data import german
>>> train, test = german.load()
>>> from margin import *
>>> def lrLearner(train, protectedIndex, protectedValue):
... marginAnalyzer = lrSKLMarginAnalyzer(train, protectedIndex, protectedValue)
... shift = marginAnalyzer.optimalShift()
... print('best shift is: %r' % (shift,))
... return marginAnalyzer.conditionalShiftClassifier(shift)
...
>>> h = lrLearner(train, german.protectedIndex, german.protectedValue)
best shift is: -0.19250157835095894
>>> from errorfunctions import signedStatisticalParity, labelError, individualFairness
>>> labelError(test, h)
0.25825825825825827
```

Copy-pastable:
```
from data import german
from errorfunctions import signedStatisticalParity, labelError, individualFairness
from margin import *

train, test = german.load()
def lrLearner(train, protectedIndex, protectedValue):
marginAnalyzer = lrSKLMarginAnalyzer(train, protectedIndex, protectedValue)
shift = marginAnalyzer.optimalShift()
print('best shift is: %r' % (shift,))
return marginAnalyzer.conditionalShiftClassifier(shift)
h = lrLearner(train, german.protectedIndex, german.protectedValue)
labelError(test, h)
```

## Experiments
The experiments are organized by method, using the acronyms from the paper. So
random relabeling (RR) is in the `experiment-RR.py` file. Each experiment has a
`runAll()` function that runs all of the experiments for every dataset and
learner (SVM, logistic regression, and AdaBoost). Note that boosting and SVM
take ~5-30 minutes per run on large datasets, and each experiment averages over
10 runs.

## Plots
The main plots in the paper are produced by the MarginAnalyzer class in
`margin.py`. See the `MarginAnalyzer.plotMarginHistogram` and
`MarginAnalyzer.plotTradeoff` functions for details.
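The tradeoff these plots visualize can be illustrated with a small self-contained sketch. This is not the repository's implementation — the names `shifted_predictions` and `signed_parity` and the synthetic margins below are invented for illustration — but it mirrors the idea in the example experiment above: sweep a shift applied to the protected group's margins and keep the one that best balances statistical parity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
protected = rng.integers(0, 2, size=n)  # hypothetical binary protected attribute
# Synthetic signed margins (confidences), biased against the protected group:
margins = rng.normal(loc=0.4 - 0.8 * protected, scale=1.0, size=n)

def shifted_predictions(margins, protected, shift):
    """Predict by the sign of the margin, shifting the protected group's margins."""
    return np.where(margins + shift * (protected == 1) >= 0, 1, -1)

def signed_parity(preds, protected):
    """Difference in positive-classification rates between the two groups."""
    pos = preds == 1
    return pos[protected == 1].mean() - pos[protected == 0].mean()

# Sweep candidate shifts and keep the one minimizing |statistical parity|,
# loosely mimicking what optimalShift() does for a trained margin classifier.
shifts = np.linspace(-2.0, 2.0, 201)
best = min(shifts, key=lambda s: abs(signed_parity(
    shifted_predictions(margins, protected, s), protected)))
```

Evaluating `signed_parity` across the whole sweep (rather than only at `best`) gives exactly the kind of fairness-versus-shift curve that `plotTradeoff` renders for the real classifiers.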