{"id":24589107,"url":"https://github.com/big-shahmir/bayesian_networks","last_synced_at":"2025-08-22T22:15:32.523Z","repository":{"id":269474976,"uuid":"907523558","full_name":"Big-ShahMir/Bayesian_Networks","owner":"Big-ShahMir","description":"This project implements a Naive Bayesian model and the Variable Elimination algorithm to predict salaries based on demographic attributes from the 1994 US Census, while exploring fairness in machine learning predictions across gender to address potential biases in decision-making.","archived":false,"fork":false,"pushed_at":"2024-12-23T20:53:19.000Z","size":365,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-24T08:13:43.572Z","etag":null,"topics":["bayesian-network","bayesian-statistics","predictive-modeling","python3","statistics"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Big-ShahMir.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-23T19:27:39.000Z","updated_at":"2024-12-23T20:53:23.000Z","dependencies_parsed_at":"2024-12-23T20:35:56.138Z","dependency_job_id":"90d50cb3-bdde-41c6-ae33-d0aa5c58299d","html_url":"https://github.com/Big-ShahMir/Bayesian_Networks","commit_stats":null,"previous_names":["big-shahmir/bayesian_networks"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Big-ShahMir%2FBayesian_Networks","tags_url":"https://repos.ecosyste.ms/api/v1/h
osts/GitHub/repositories/Big-ShahMir%2FBayesian_Networks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Big-ShahMir%2FBayesian_Networks/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Big-ShahMir%2FBayesian_Networks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Big-ShahMir","download_url":"https://codeload.github.com/Big-ShahMir/Bayesian_Networks/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244117526,"owners_count":20400742,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bayesian-network","bayesian-statistics","predictive-modeling","python3","statistics"],"created_at":"2025-01-24T08:13:44.861Z","updated_at":"2025-03-17T22:00:12.207Z","avatar_url":"https://github.com/Big-ShahMir.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Naive Bayesian Salary Prediction and Fairness Evaluation\n\nThis project implements a **Naive Bayesian Model** and the **Variable Elimination algorithm** to predict salaries based on demographic attributes from the 1994 US Census dataset. 
Additionally, the project evaluates the fairness of the model's predictions, particularly with respect to gender, exploring concepts such as demographic parity and sufficiency.\n\n## Table of Contents\n\n- [Introduction](#introduction)\n- [Problem Statement](#problem-statement)\n- [Solution Overview](#solution-overview)\n- [Key Features](#key-features)\n- [Implementation Details](#implementation-details)\n- [How to Run the Project](#how-to-run-the-project)\n- [Dataset](#dataset)\n- [Evaluation of Fairness](#evaluation-of-fairness)\n- [Conclusion](#conclusion)\n\n---\n\n## Introduction\n\nThe purpose of this project is to build a machine learning model using Bayesian Networks to predict whether an individual earns more or less than $50K annually, based on demographic and professional attributes. The model uses a Naive Bayesian approach and incorporates fairness evaluations to assess and address potential biases in its predictions.\n\n---\n\n## Problem Statement\n\nThe 1994 US Census dataset provides various attributes about individuals, such as their education, marital status, and occupation. The problem is to:\n1. Predict salaries based on these attributes using a Naive Bayesian model.\n2. Evaluate the fairness of the predictions with respect to gender, ensuring that the model does not inadvertently introduce or perpetuate bias.\n\n---\n\n## Solution Overview\n\nThis project solves the problem by:\n1. Constructing a **Naive Bayesian Network** with salary as the root node and other attributes as conditional nodes.\n2. Implementing the **Variable Elimination algorithm** to perform probabilistic inference and make salary predictions.\n3. Training the model on `adult-train.csv` and testing it on `adult-test.csv`.\n4. 
Evaluating fairness using measures like demographic parity and sufficiency.\n\n---\n\n## Key Features\n\n- **Naive Bayesian Model**: Models salary predictions based on conditional probabilities of demographic attributes.\n- **Variable Elimination Algorithm**: Implements probabilistic inference for query variables and evidence.\n- **Fairness Evaluation**: Assesses fairness using demographic parity, separation, and sufficiency metrics.\n- **Exploratory Analysis**: Compares predictions across genders using core and extended evidence sets.\n\n---\n\n## Implementation Details\n\n### Key Functions\n\nThe following functions are implemented in `naive_bayes_solution.py`:\n\n1. **normalize**: Normalizes a Factor object without modifying the input.\n2. **restrict**: Restricts a factor based on a variable and its value.\n3. **sum_out**: Sums out a variable from a factor.\n4. **multiply**: Multiplies a list of factors into a single factor.\n5. **ve**: Performs variable elimination to compute probabilities for query variables.\n6. **naive_bayes_model**: Constructs the Naive Bayesian Network using training data.\n7. **explore**: Evaluates fairness metrics and answers exploratory questions.\n\n---\n\n## How to Run the Project\n\nFollow these steps to run the project locally:\n\n### 1. Clone the Repository\n```bash\ngit clone https://github.com/Big-ShahMir/Bayesian_Networks.git\ncd Bayesian_Networks\n```\n### 2. Run the Training and Testing\n```bash\npython naive_bayes_solution.py\n```\n\n### 3. View the Results\nResults are printed to the terminal and answer the following questions in order:\n1. What percentage of the women in the test data set end up with a P(S=\"\u003e=$50K\"|E1) that is strictly greater than P(S=\"\u003e=$50K\"|E2)?\n2. What percentage of the men in the test data set end up with a P(S=\"\u003e=$50K\"|E1) that is strictly greater than P(S=\"\u003e=$50K\"|E2)?\n3. 
What percentage of the women in the test data set with P(S=\"\u003e=$50K\"|E1) \u003e 0.5 actually have a salary over $50K?\n4. What percentage of the men in the test data set with P(S=\"\u003e=$50K\"|E1) \u003e 0.5 actually have a salary over $50K?\n5. What percentage of the women in the test data set are assigned a P(Salary=\"\u003e=$50K\"|E1) \u003e 0.5, overall?\n6. What percentage of the men in the test data set are assigned a P(Salary=\"\u003e=$50K\"|E1) \u003e 0.5, overall?\n\n## Dataset\n\nThe project uses the 1994 US Census dataset, which is split into two files:\n\n- **`adult-train.csv`**: This file contains training data, including demographic attributes and salary information.\n- **`adult-test.csv`**: This file is used to evaluate the performance of the model and its fairness.\n\n## Evaluation of Fairness\n\n### Metrics for Fairness Assessment\n\n1. **Demographic Parity**:  \n   The probability of predicting a salary above $50K should be the same for both genders. For instance:  \n   - P(Salary \u003e 50K | Gender = Male) = P(Salary \u003e 50K | Gender = Female)\n\n2. **Separation**:  \n   The predictions should be conditionally independent of gender when evidence is provided. This ensures fairness when considering additional factors.\n\n3. **Sufficiency**:  \n   The predictions should be equally accurate for individuals of all genders, reflecting true salary levels accurately.\n\n## Conclusion\n\nThis project highlights the potential of Bayesian Networks for predictive tasks while addressing fairness concerns in machine learning. By incorporating fairness metrics, the project emphasizes the importance of equitable AI systems in real-world applications.\n\nFuture improvements include:\n1. Exploring advanced Bayesian Network structures.\n2. Adding fairness-aware modifications to the model.\n3. Testing on larger and more diverse datasets.\n\n## Contributing\nContributions are welcome! 
Feel free to submit issues or pull requests to improve this project.\n\n## License\nThis project is licensed under the MIT License. See the LICENSE file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbig-shahmir%2Fbayesian_networks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbig-shahmir%2Fbayesian_networks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbig-shahmir%2Fbayesian_networks/lists"}