https://github.com/bhaveshbhakta/personality-classification-using-ml
Personality Classification
data-visualization machine-learning machine-learning-algorithms personality-classification
- Host: GitHub
- URL: https://github.com/bhaveshbhakta/personality-classification-using-ml
- Owner: BhaveshBhakta
- Created: 2025-02-24T08:33:38.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-09-08T16:54:48.000Z (10 months ago)
- Last Synced: 2025-10-04T13:56:40.424Z (9 months ago)
- Topics: data-visualization, machine-learning, machine-learning-algorithms, personality-classification
- Language: Jupyter Notebook
- Homepage:
- Size: 2.27 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## Personality Classification
### Project Overview
This project classifies **personality types** from responses to a psychological test. The dataset contains responses to a 60-question test, each scored from -3 to 3, and the goal is to train a machine learning model that accurately predicts one of the 16 distinct personality types (MBTI types). This is a challenging multi-class classification task with applications in psychology, human resources, and self-assessment tools.
-----
### Technical Highlights
* **Dataset**: [Kaggle - 60k responses of 16 Personalities Test (MBTI)](https://www.kaggle.com/datasets/anshulmehtakaggl/60k-responses-of-16-personalities-test-mbt)
    * **Size**: 59,999 entries, 62 columns.
* **Key Features**:
    * 60 numerical features representing responses to a personality test.
* **Approach**:
    * **Data Cleaning**: The dataset was clean with no missing values or duplicates. The `Response Id` column was dropped as it is a unique identifier.
    * **Exploratory Data Analysis**: The code checks basic statistics, null values, duplicates, and unique values for all columns. The target variable `Personality` is well-balanced across all 16 classes.
    * **Label Encoding**: Applied to the target `Personality` column to convert it into a numerical format for multi-class classification.
    * **Multi-class Classification**: The target variable `Personality` has 16 distinct categories.
* **Models Used**:
    * Logistic Regression, Ridge Classifier, SVC, Random Forest, XGBoost, AdaBoost, Gradient Boosting, Bagging, Decision Tree.
* **Best Accuracy**:
    * **97.7%** with XGBoost Classifier.
    * **97.4%** with Random Forest Classifier.
    * **94.5%** with Gradient Boosting Classifier.
    * The very high accuracies for the ensemble models suggest that the test responses provide very strong discriminative power for personality classification.
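
The approach above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code. It runs on a small synthetic stand-in for the Kaggle CSV so that it is self-contained. The question column names (`Q1` to `Q60`), the toy labelling rule, the 80/20 split, and all hyperparameters are assumptions. Only the `Response Id` and `Personality` column names come from the description above. Three of the listed models are shown; XGBoost is left out only to keep the dependencies to scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real dataset (assumption): 60 answers in [-3, 3].
rng = np.random.default_rng(0)
n_rows = 1600
responses = rng.integers(-3, 4, size=(n_rows, 60))

# Toy labelling rule (assumption, NOT how the real labels were produced):
# each block of 15 questions votes for one letter of the four-letter type.
letters = np.array([["I", "E"], ["S", "N"], ["F", "T"], ["P", "J"]])
axis_sums = responses.reshape(n_rows, 4, 15).sum(axis=2)
labels = ["".join(letters[k, int(s >= 0)] for k, s in enumerate(row))
          for row in axis_sums]

df = pd.DataFrame(responses, columns=[f"Q{i + 1}" for i in range(60)])
df.insert(0, "Response Id", np.arange(n_rows))
df["Personality"] = labels

# Basic checks: missing values, duplicate rows, class balance.
print("missing:", int(df.isnull().sum().sum()))
print("duplicates:", int(df.duplicated().sum()))
print(df["Personality"].value_counts().head())

# Drop the unique identifier and label-encode the target.
df = df.drop(columns=["Response Id"])
encoder = LabelEncoder()
y = encoder.fit_transform(df["Personality"])
X = df.drop(columns=["Personality"])

# Stratified hold-out split so every class appears in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Fit each model and record its accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.3f}")
```

Accuracies printed by this sketch describe the synthetic data only and are not comparable to the figures reported above. To run it on the real data, the synthetic `df` would be replaced with the CSV loaded through `pd.read_csv`.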
-----
### Purpose and Applications
* **Automated Personality Assessment**: Enables quick and accurate classification of personality types from test responses.
* **Psychological Research**: Supports research in personality psychology and behavior analysis.
* **Human Resources**: Assists in team building, career guidance, and job-role matching.
* **Self-Improvement**: Provides a tool for individuals to better understand their own personality traits.
-----
### Installation
Clone the repository and extract the data from the zip file.
Install the necessary libraries:
```bash
pip install pandas numpy seaborn matplotlib scikit-learn xgboost
```
-----
### Collaboration
We welcome contributions to improve the project. You can help by:
* Performing comprehensive hyperparameter tuning and cross-validation for the top-performing models to ensure robustness.
* Investigating the impact of different preprocessing techniques.
* Adding explainability (e.g., SHAP or LIME) to understand which questions or groups of questions are the most critical for classifying a specific personality type.
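
As a possible starting point for the first and third items, the sketch below shows stratified cross-validation, a small grid search, and a ranking of questions by importance. It is an illustration under stated assumptions and not part of the repository. The data is synthetic, the toy labelling rule and the parameter grid are made up, and the random forest's built-in `feature_importances_` is used as a dependency-free stand-in for SHAP or LIME.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in data (assumption): 60 answers in [-3, 3], 16 classes.
rng = np.random.default_rng(0)
n_rows = 800
X = rng.integers(-3, 4, size=(n_rows, 60))

# Toy rule (assumption): the sign of each 15-question block sum sets one bit
# of the class id, giving class ids 0..15.
axis_sums = X.reshape(n_rows, 4, 15).sum(axis=2)
y = (axis_sums >= 0).astype(int) @ np.array([8, 4, 2, 1])

# Stratified folds keep the class proportions similar in every split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

# Cross-validated accuracy for a fixed baseline configuration.
baseline = RandomForestClassifier(n_estimators=50, random_state=42)
cv_scores = cross_val_score(baseline, X, y, cv=cv, scoring="accuracy")
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")

# Small illustrative grid; a real search would cover more values.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 10]}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=cv,
    scoring="accuracy",
)
search.fit(X, y)
print("best params:", search.best_params_)
print(f"best CV accuracy: {search.best_score_:.3f}")

# Impurity-based importances: a coarse, global view of which questions
# the tuned forest relies on most (zero-based column indices).
importances = search.best_estimator_.feature_importances_
top_questions = np.argsort(importances)[::-1][:5]
print("most influential question indices:", top_questions)
```

Impurity-based importances are global and can be biased, so they only approximate what SHAP or LIME would show for a specific personality type. The same `cv` object can be reused with other models from the list above to compare them on identical folds.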