https://github.com/big-shahmir/bayesian_networks

This project implements a Naive Bayesian model and the Variable Elimination algorithm to predict salaries based on demographic attributes from the 1994 US Census, while exploring fairness in machine learning predictions across gender to address potential biases in decision-making.
https://github.com/big-shahmir/bayesian_networks

bayesian-network bayesian-statistics predictive-modeling python3 statistics

Last synced: 11 months ago
JSON representation

Host: GitHub
URL: https://github.com/big-shahmir/bayesian_networks
Owner: Big-ShahMir
License: mit
Created: 2024-12-23T19:27:39.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-12-23T20:53:19.000Z (over 1 year ago)
Last Synced: 2025-01-24T08:13:43.572Z (over 1 year ago)
Topics: bayesian-network, bayesian-statistics, predictive-modeling, python3, statistics
Language: Python
Homepage:
Size: 356 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# Naive Bayesian Salary Prediction and Fairness Evaluation

This project implements a **Naive Bayesian Model** and the **Variable Elimination algorithm** to predict salaries based on demographic attributes from the 1994 US Census dataset. Additionally, the project evaluates the fairness of the model's predictions, particularly with respect to gender, exploring concepts such as demographic parity and sufficiency.

## Table of Contents

- [Introduction](#introduction)
- [Problem Statement](#problem-statement)
- [Solution Overview](#solution-overview)
- [Key Features](#key-features)
- [Implementation Details](#implementation-details)
- [How to Run the Project](#how-to-run-the-project)
- [Dataset](#dataset)
- [Evaluation of Fairness](#evaluation-of-fairness)
- [Conclusion](#conclusion)

---

## Introduction

The purpose of this project is to build a machine learning model using Bayesian Networks to predict whether an individual earns more or less than $50K annually, based on demographic and professional attributes. The model uses a Naive Bayesian approach and incorporates fairness evaluations to assess and address potential biases in its predictions.

---

## Problem Statement

The 1994 US Census dataset provides various attributes about individuals, such as their education, marital status, and occupation. The problem is to:
1. Predict salaries based on these attributes using a Naive Bayesian model.
2. Evaluate the fairness of the predictions with respect to gender, ensuring that the model does not inadvertently introduce or perpetuate bias.

---

## Solution Overview

This project solves the problem by:
1. Constructing a **Naive Bayesian Network** with salary as the root node and other attributes as conditional nodes.
2. Implementing the **Variable Elimination algorithm** to perform probabilistic inference and make salary predictions.
3. Training the model on `adult-train.csv` and testing it on `adult-test.csv`.
4. Evaluating fairness using measures like demographic parity and sufficiency.

---

## Key Features

- **Naive Bayesian Model**: Models salary predictions based on conditional probabilities of demographic attributes.
- **Variable Elimination Algorithm**: Implements probabilistic inference for query variables and evidence.
- **Fairness Evaluation**: Assesses fairness using demographic parity, separation, and sufficiency metrics.
- **Exploratory Analysis**: Compares predictions across genders using core and extended evidence sets.

---

## Implementation Details

### Key Functions

The following functions are implemented in `naive_bayes_solution.py`:

1. **normalize**: Normalizes a Factor object without modifying the input.
2. **restrict**: Restricts a factor based on a variable and its value.
3. **sum_out**: Sums out a variable from a factor.
4. **multiply**: Multiplies a list of factors into a single factor.
5. **ve**: Performs variable elimination to compute probabilities for query variables.
6. **naive_bayes_model**: Constructs the Naive Bayesian Network using training data.
7. **explore**: Evaluates fairness metrics and answers exploratory questions.

---

## How to Run the Project

Follow these steps to run the project locally:

### 1. Clone the Repository
```bash
git clone https://github.com/your-username/naive-bayes-salary-prediction.git
cd naive-bayes-salary-prediction
```
### 2. Run the Training and Testing
```bash
python naive_bayes_solution.py
```

### 3. View the Results
Results should be printed into the terminal and answer the following questioned in order:
1. What percentage of the women in the test data set end up with a P(S=">=$50K"|E1) that is strictly greater than P(S=">=$50K"|E2)?
2. What percentage of the men in the test data set end up with a P(S=">=$50K"|E1) that is strictly greater than P(S=">=$50K"|E2)?
3. What percentage of the women in the test data set with P(S=">=$50K"|E1) > 0.5 actually have a salary over $50K?
4. What percentage of the men in the test data set with P(S=">=$50K"|E1) > 0.5 actually have a salary over $50K?
5. What percentage of the women in the test data set are assigned a P(Salary=">=$50K"|E1) > 0.5, overall?
6. What percentage of the men in the test data set are assigned a P(Salary=">=$50K"|E1) > 0.5, overall?

## Dataset

The project uses the 1994 US Census dataset, which is split into two files:

- **`adult-train.csv`**: This file contains training data, including demographic attributes and salary information.
- **`adult-test.csv`**: This file is used to evaluate the performance of the model and its fairness.

## Evaluation of Fairness

### Metrics for Fairness Assessment

1. **Demographic Parity**:
The probability of predicting a salary above $50K should be the same for both genders. For instance:
- ( P({Salary} > 50K {Gender} = {Male}) = P({Salary} > 50K {Gender} = {Female}) )

2. **Separation**:
The predictions should be conditionally independent of gender when evidence is provided. This ensures fairness when considering additional factors.

3. **Sufficiency**:
The predictions should be equally accurate for individuals of all genders, reflecting true salary levels accurately.

## Conclusion

This project highlights the potential of Bayesian Networks for predictive tasks while addressing fairness concerns in machine learning. By incorporating fairness metrics, the project emphasizes the importance of equitable AI systems in real-world applications.

Future improvements include:
1. Exploring advanced Bayesian Network structures.
2. Adding fairness-aware modifications to the model.
3. Testing on larger and more diverse datasets.

## Contributing
Contributions are welcome! Feel free to submit issues or pull requests to improve this solver.

## License
This project is licensed under the MIT License. See the LICENSE file for details.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/big-shahmir/bayesian_networks

Awesome Lists containing this project

README