Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sergioisidoro/aequalis
Discrimination free Naive Bayes - Replication of methods in the research paper Calders10
- Host: GitHub
- URL: https://github.com/sergioisidoro/aequalis
- Owner: sergioisidoro
- License: gpl-2.0
- Created: 2015-11-14T23:10:42.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2016-01-18T22:16:11.000Z (almost 9 years ago)
- Last Synced: 2024-10-05T20:41:13.174Z (3 months ago)
- Language: Python
- Homepage:
- Size: 1.86 MB
- Stars: 3
- Watchers: 2
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.MD
- License: LICENSE
Awesome Lists containing this project
README
## Discrimination-free Naive Bayes
This is a project for a seminar on fairness-aware machine learning (Autumn 2015).
It aims to implement some of the methods described in the paper Calders10 (see the research folder),
introducing ways to make a Naive Bayes classifier non-discriminatory.

## Code
Most of the code can be found in `bayes.py`.
Some auxiliary functions for the models can be found in `bayes_utils` (with some
credit to Jason Brownlee -
http://machinelearningmastery.com/naive-bayes-classifier-scratch-python/ -
as some of his code was used to reduce boilerplate).

# Binary Bayes Model
A simple Naive Bayes model; a minimal illustrative sketch follows.
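The README does not show the implementation itself, so here is a rough sketch of what a plain Naive Bayes classifier for categorical features and a binary label can look like. The class and method names are hypothetical and are not taken from `bayes.py`:

```python
import math
from collections import Counter, defaultdict

class SimpleNaiveBayes:
    """Toy Naive Bayes for categorical features and a binary (0/1) label."""

    def fit(self, rows, labels):
        self.total = len(labels)
        self.class_counts = Counter(labels)
        # feature_counts[label][feature_index][value] -> occurrence count
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        # feature_values[feature_index] -> set of values seen (for smoothing)
        self.feature_values = defaultdict(set)
        for row, y in zip(rows, labels):
            for i, value in enumerate(row):
                self.feature_counts[y][i][value] += 1
                self.feature_values[i].add(value)
        return self

    def predict(self, row):
        best_label, best_score = None, float("-inf")
        for y, class_count in self.class_counts.items():
            # log prior + log likelihoods with add-one (Laplace) smoothing
            score = math.log(class_count / self.total)
            for i, value in enumerate(row):
                count = self.feature_counts[y][i][value]
                score += math.log((count + 1) / (class_count + len(self.feature_values[i])))
            if score > best_score:
                best_label, best_score = y, score
        return best_label
```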
# Split Fair Bayes Model (2M Model)
This model splits the dataset into `n` subsets, one for each of the values
of the sensitive parameter that we do not want to discriminate against. By
creating a model for each of these subsets, it minimizes the discrimination
of the model; a minimal sketch follows.
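As an illustration only (the names below are hypothetical; the real logic lives in `bayes.py`), the splitting idea can be sketched as a thin wrapper around any classifier with a `fit`/`predict` interface, such as the toy one above:

```python
from collections import defaultdict

class SplitFairBayes:
    """Trains one sub-model per value of the sensitive attribute (the "2M" idea)."""

    def __init__(self, make_model):
        # make_model: factory returning an object with fit(rows, labels) / predict(row)
        self.make_model = make_model
        self.models = {}

    def fit(self, rows, labels, sensitive):
        # Partition the training data by sensitive-attribute value.
        grouped = defaultdict(lambda: ([], []))
        for row, y, s in zip(rows, labels, sensitive):
            grouped[s][0].append(row)
            grouped[s][1].append(y)
        # Train an independent model on each subset.
        for s, (sub_rows, sub_labels) in grouped.items():
            self.models[s] = self.make_model().fit(sub_rows, sub_labels)
        return self

    def predict(self, row, s):
        # Route the sample to the model trained on its own group.
        return self.models[s].predict(row)
```

With the toy classifier from the previous sketch this could be used as `SplitFairBayes(SimpleNaiveBayes).fit(rows, labels, sensitive)` followed by `model.predict(row, s)`; because each sub-model only ever sees one group, the sensitive attribute carries no information within that model.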
# Balanced Bayes Model (MODIFIED BAYES)
This model balances the dataset, making it fairer, by tweaking the likelihoods:
it changes the number of occurrences stored in the model. It does so, however,
without disturbing the overall probability of classifying a sample x as the
positive class (e.g. if we talk about loan attribution, we want to keep the
total number of loans the same as before); a rough sketch of the rebalancing
idea follows.
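The sketch below shows only the core count-moving idea under simplifying assumptions (two groups, a fixed step size, no re-training between steps); it is not the repo's actual procedure, and all names are made up:

```python
def discrimination(counts):
    """counts[s][c]: number of training samples with sensitive value s and class c."""
    def pos_rate(s):
        return counts[s][1] / (counts[s][0] + counts[s][1])
    return pos_rate("priv") - pos_rate("unpriv")

def rebalance(counts, step=1.0):
    """Shift positive counts between groups until the discrimination is gone.

    The total number of positives stays constant, mirroring the
    "keep the total number of loans the same" constraint described above.
    """
    while discrimination(counts) > 0:
        counts["unpriv"][1] += step   # more positives for the unprivileged group
        counts["unpriv"][0] -= step
        counts["priv"][1] -= step     # fewer positives for the privileged group
        counts["priv"][0] += step
    return counts

# Example: 4000 privileged samples at a 40% positive rate versus
# 4000 unprivileged samples at a 20% positive rate.
counts = {"priv": {0: 2400.0, 1: 1600.0}, "unpriv": {0: 3200.0, 1: 800.0}}
print(rebalance(counts))   # both groups end up at a 30% positive rate
```

Moving positive counts between the groups, rather than adding or removing them, is what keeps the total number of positive predictions unchanged while the gap between the groups shrinks.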
# RESULTS

```
NORMAL MODEL:
Accuracy: 78.4165591794
Discrimination score: 0.398512741018

2M MODEL:
Accuracy: 78.5332596278
Discrimination score: 0.165316892258

MODIFIED MODEL:
Accuracy: 75.9412812481
Discrimination score: -0.0101233590263
```