https://github.com/r-mahesh45/salary-prediction-using-naive-bayes
This project uses the Naive Bayes classification algorithm to predict an individual's salary category from features such as age, education, and occupation. It evaluates model accuracy on the training and test datasets; the model achieved 77% accuracy on both.
- Host: GitHub
- URL: https://github.com/r-mahesh45/salary-prediction-using-naive-bayes
- Owner: R-Mahesh45
- Created: 2024-03-07T10:16:40.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-04T14:13:28.000Z (9 months ago)
- Last Synced: 2025-01-30T07:16:10.780Z (8 months ago)
- Topics: extract-transform-load, machine-learning-algorithms, naive-bayes-algorithm, python3
- Language: Jupyter Notebook
- Homepage:
- Size: 2.91 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Salary Prediction using Naive Bayes
## Project Overview
This project applies a Naive Bayes classification model to predict an individual's salary category from features such as age, education, occupation, and marital status. The model is trained with scikit-learn's `MultinomialNB` classifier, and its accuracy is evaluated on both the training and test datasets.
## Dataset Description
The dataset contains the following features:
- **Age**: Age of the individual
- **Workclass**: Type of work (e.g., private, government)
- **Education**: Education level (e.g., Bachelor's, Master's)
- **Marital Status**: Marital status of the individual (e.g., married, divorced)
- **Occupation**: Job type (e.g., tech, healthcare)
- **Relationship**: Relationship status (e.g., husband, wife)
- **Race**: Race of the individual (e.g., White, Black)
- **Sex**: Gender of the individual (e.g., male, female)
- **Capital Gain**: Profit from investment
- **Capital Loss**: Loss from investment
- **Hours per Week**: Number of working hours per week
- **Native**: Native country
- **Salary**: Salary category, the target the model predicts (e.g., >50K, <=50K)
## Requirements
- Python 3.x
- Libraries:
  - `pandas`
  - `numpy`
  - `scikit-learn`
  - `matplotlib` (for visualization, if needed)
## Installation
```bash
pip install pandas numpy scikit-learn matplotlib
```
## Approach
1. **Data Preprocessing**:
   - Clean the data by handling missing values and encoding categorical variables.
   - Split the dataset into training and testing sets.
2. **Modeling**:
   - Use `MultinomialNB` from scikit-learn to train the Naive Bayes model.
3. **Evaluation**:
   - Evaluate model performance using accuracy score on training and test datasets.
## Results
- **Training accuracy score**: 0.77
- **Test accuracy score**: 0.77
## Conclusion
The Naive Bayes classifier reached an accuracy of 77% on both the training and test datasets. The matching scores suggest that the model generalizes to unseen data rather than overfitting the training set, and that it can predict an individual's salary category from the given features with reasonable accuracy.
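
The three steps in the Approach section (preprocessing, modeling, evaluation) could look roughly like the sketch below. It is not the notebook's actual code: the column names, the synthetic toy data, the choice to drop rows with missing values, the integer encoding, and the 70/30 split are all assumptions made so the example runs on its own. Only the use of `MultinomialNB` and accuracy on the training and test sets comes from this README.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for the real dataset (assumption).
rng = np.random.default_rng(0)
n = 400
occupation = rng.choice(["Tech-support", "Sales", "Exec-managerial"], size=n).astype(object)
occupation[::50] = None  # simulate a few missing values
df = pd.DataFrame({
    "age": rng.integers(18, 65, size=n),
    "education": rng.choice(["HS-grad", "Bachelors", "Masters"], size=n),
    "occupation": occupation,
    "hoursperweek": rng.integers(20, 60, size=n),
})
# Synthetic target, invented purely for illustration.
df["Salary"] = np.where(
    (df["education"] != "HS-grad") & (df["hoursperweek"] > 40), ">50K", "<=50K"
)

# 1. Preprocessing: drop rows with missing values (one possible strategy),
#    then encode categorical columns as non-negative integer codes,
#    since MultinomialNB rejects negative feature values.
categorical_cols = ["education", "occupation"]  # assumed column names
df = df.dropna().reset_index(drop=True)
X = df.drop(columns="Salary")
for col in categorical_cols:
    X[col] = X[col].astype("category").cat.codes
y = df["Salary"]

# Split into training and testing sets (70/30 is an assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# 2. Modeling: fit the multinomial Naive Bayes classifier.
model = MultinomialNB()
model.fit(X_train, y_train)

# 3. Evaluation: accuracy on both the training and the test set.
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"Training accuracy: {train_acc:.2f}")
print(f"Test accuracy: {test_acc:.2f}")
```

The scores printed here come from the synthetic data and are not comparable to the 0.77 reported above. Note that `MultinomialNB` treats the integer codes as counts; scikit-learn's `CategoricalNB` is an alternative for purely categorical features, though this project does not use it.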