https://github.com/j2kun/fkl-sdm16

Code and experiments for "A confidence-based approach for balancing fairness and accuracy"

Host: GitHub
URL: https://github.com/j2kun/fkl-sdm16
Owner: j2kun
Created: 2016-01-19T21:15:32.000Z (almost 10 years ago)
Default Branch: main
Last Pushed: 2020-06-09T03:20:24.000Z (over 5 years ago)
Last Synced: 2025-10-09T15:33:04.761Z (11 days ago)
Topics: fairness, machine-learning, research-paper
Language: Python
Homepage: https://arxiv.org/abs/1601.05764
Size: 818 KB
Stars: 4
Watchers: 2
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md

# Code and experiments for "A confidence-based approach for balancing fairness and accuracy"

All experiments used in this paper were implemented in Python 3 with the following
dependencies:

    numpy
    matplotlib
    scikit-learn
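Before re-running anything, it can help to confirm that the dependencies are importable. The following is a minimal sketch, not part of the repository: `missing_modules` is a hypothetical helper, and the only assumption about the packages is their usual import names (scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec

# Import names of the dependencies; scikit-learn is imported as "sklearn".
REQUIRED = ("numpy", "matplotlib", "sklearn")


def missing_modules(names=REQUIRED):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty result means all three packages can be imported by the interpreter that ran the check.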

## One-click rerun of all experiments

To re-run all experiments used in the paper, run the following from the command line:

    ./run-all.sh

This will re-run all the experiments and output the data to plaintext
files in the `results/` subdirectory.

To generate all plots used in the paper, run the following from the command line:

    python plot-all.py

## Datasets

The datasets are given the following names:

    adult
    german
    singles

### Loading into Python

For each dataset there is a data loader module and a baseline (see the
Baselines section below). We will use `adult` as the prototype, and unless
otherwise stated all datasets operate the same way with `adult` replaced by the
dataset name. The raw data files are `adult.train` and `adult.test`. If a
dataset was preprocessed to split it into training and testing subsets,
then the unprocessed data files are in the `preprocessing/` subdirectory along
with the Python scripts that perform the (randomized) preprocessing. Additional
preprocessing turns each categorical feature into (possibly many)
binary features.
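The conversion of a categorical feature into binary features can be illustrated as follows. This is a minimal sketch rather than the repository's preprocessing code: the helper `one_hot`, the category names, and the two-column row are all hypothetical.

```python
def one_hot(value, categories):
    """Return one 0/1 indicator per category, with a 1 where value matches."""
    return tuple(1 if value == category else 0 for category in categories)


# Hypothetical categories for a single categorical column.
WORKCLASSES = ("Private", "Self-employed", "Government")

# A made-up raw row: (age, workclass).
raw_row = (39, "Government")

# The numeric column is kept as-is; the categorical one becomes three binary columns.
encoded = (raw_row[0],) + one_hot(raw_row[1], WORKCLASSES)
print(encoded)  # (39, 0, 0, 1)
```

A column with many distinct categories expands into equally many binary columns, which is why a loaded feature vector can be long and mostly zeros.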

To load a dataset, you can run the following commands from the base directory
of the project:

    $ python
    Python 3.3.3 (default, Dec 30 2013, 23:51:18)
    [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
    >>> from data import adult
    >>> trainingData, testData = adult.load()
    >>> adult.protectedIndex
    1
    >>> len(trainingData)
    32561
    >>> trainingData[0]
    ((39, 1, 0, 0, 0, 0, 0, 1, 0, 0, 13, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 2174, 0, 40, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0), -1)
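As the session above shows, each example is a pair of a feature tuple and a label, and `protectedIndex` is a position in the feature tuple. A minimal sketch of working with data in that shape follows. The helper `summarize` and the toy data are hypothetical, and the sketch assumes labels are `1` or `-1` (the session shows only a `-1`).

```python
from collections import Counter


def summarize(data, protected_index):
    """Count examples by (protected attribute value, label)."""
    return Counter(
        (features[protected_index], label) for features, label in data
    )


# Toy data in the same (features, label) shape as the loaded datasets.
toy = [
    ((39, 1, 0), -1),
    ((50, 0, 1), 1),
    ((28, 1, 1), 1),
    ((44, 0, 0), -1),
]

counts = summarize(toy, protected_index=1)
for (protected_value, label), n in sorted(counts.items()):
    print("protected=%r label=%r count=%d" % (protected_value, label, n))
```

With the real data, the same call would be `summarize(trainingData, adult.protectedIndex)`.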

## An example experiment

The following example experiment tests the logistic regression learner on the German dataset.

```
Python 3.6.3 (default, Oct  4 2017, 06:09:15)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)] on darwin
>>> from data import german
>>> train, test = german.load()
>>> from margin import *
>>> def lrLearner(train, protectedIndex, protectedValue):
...    marginAnalyzer = lrSKLMarginAnalyzer(train, protectedIndex, protectedValue)
...    shift = marginAnalyzer.optimalShift()
...    print('best shift is: %r' % (shift,))
...    return marginAnalyzer.conditionalShiftClassifier(shift)
...
>>> h = lrLearner(train, german.protectedIndex, german.protectedValue)
best shift is: -0.19250157835095894
>>> from errorfunctions import signedStatisticalParity, labelError, individualFairness
>>> labelError(test, h)
0.25825825825825827
```

Copy-pastable:

```
from data import german
from errorfunctions import signedStatisticalParity, labelError, individualFairness
from margin import *

# Load the German dataset as (training, test) data.
train, test = german.load()

def lrLearner(train, protectedIndex, protectedValue):
    marginAnalyzer = lrSKLMarginAnalyzer(train, protectedIndex, protectedValue)
    shift = marginAnalyzer.optimalShift()
    print('best shift is: %r' % (shift,))
    return marginAnalyzer.conditionalShiftClassifier(shift)

h = lrLearner(train, german.protectedIndex, german.protectedValue)

# Print the result so it is also visible when this is run as a script.
print(labelError(test, h))
```
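The README does not spell out what `conditionalShiftClassifier`, `labelError`, or `signedStatisticalParity` compute. The toy sketch below shows one plausible reading and is not the repository's implementation. It assumes the shift moves the decision threshold on the margin only for examples whose protected attribute equals the protected value, that label error is the fraction of misclassified examples, and that signed statistical parity is the positive-prediction rate of the protected group minus that of everyone else. Every name and the toy data are hypothetical.

```python
def conditional_shift_classifier(margin, protected_index, protected_value, shift):
    """Predict +1 when the margin clears a threshold; the threshold is
    `shift` for the protected group and 0 for everyone else (assumed)."""
    def classify(x):
        threshold = shift if x[protected_index] == protected_value else 0.0
        return 1 if margin(x) >= threshold else -1
    return classify


def label_error(data, h):
    """Fraction of examples whose predicted label differs from the true one."""
    return sum(1 for x, y in data if h(x) != y) / len(data)


def signed_statistical_parity(data, protected_index, protected_value, h):
    """Positive rate of the protected group minus that of the rest (assumed sign)."""
    def positive_rate(points):
        return sum(1 for x in points if h(x) == 1) / len(points)
    protected = [x for x, _ in data if x[protected_index] == protected_value]
    others = [x for x, _ in data if x[protected_index] != protected_value]
    return positive_rate(protected) - positive_rate(others)


# Toy examples: features are (protected attribute, score); the margin is the score.
toy = [((1, -0.1), 1), ((1, -0.3), -1), ((0, 0.4), 1), ((0, -0.1), -1)]

def margin(x):
    return x[1]

unshifted = conditional_shift_classifier(margin, 0, 1, shift=0.0)
shifted = conditional_shift_classifier(margin, 0, 1, shift=-0.2)

for name, h in (("unshifted", unshifted), ("shifted", shifted)):
    print(name, label_error(toy, h), signed_statistical_parity(toy, 0, 1, h))
```

On this toy data, lowering the threshold for the protected group removes the gap in positive rates. Here it also happens to lower the label error, which need not hold in general.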

## Experiments

The experiments are organized by method, using the acronyms from the paper. So
random relabeling (RR) is in the `experiment-RR.py` file. Each experiment file has a
`runAll()` function that runs that experiment for every dataset and
learner (SVM, logistic regression, and AdaBoost). Note that boosting and SVM
take roughly 5 to 30 minutes per run on large datasets, and each experiment averages over
10 runs.
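Averaging over repeated runs can be sketched as below. This is an illustrative sketch only: `average_over_runs` and `noisy_error` are hypothetical, and `noisy_error` merely stands in for one randomized train-and-evaluate run.

```python
import random
from statistics import mean


def average_over_runs(run_once, num_runs=10):
    """Call run_once(seed) for seeds 0..num_runs-1 and return the mean result."""
    return mean(run_once(seed) for seed in range(num_runs))


def noisy_error(seed):
    """Hypothetical stand-in for one randomized run that reports an error rate."""
    rng = random.Random(seed)
    return 0.25 + rng.uniform(-0.02, 0.02)


avg = average_over_runs(noisy_error)
print("average error over 10 runs: %.4f" % avg)
```

Passing the seed into each run keeps the individual runs reproducible while still varying the randomness between them.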

## Plots

The main plots in the paper are produced by the `MarginAnalyzer` class in
`margin.py`. See the `MarginAnalyzer.plotMarginHistogram` and
`MarginAnalyzer.plotTradeoff` functions for details.
