https://github.com/bhaveshbhakta/personality-classification-using-ml

Personality Classification

data-visualization machine-learning machine-learning-algorithms personality-classification

Host: GitHub
URL: https://github.com/bhaveshbhakta/personality-classification-using-ml
Owner: BhaveshBhakta
Created: 2025-02-24T08:33:38.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-09-08T16:54:48.000Z (10 months ago)
Last Synced: 2025-10-04T13:56:40.424Z (9 months ago)
Topics: data-visualization, machine-learning, machine-learning-algorithms, personality-classification
Language: Jupyter Notebook
Homepage:
Size: 2.27 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

## Personality Classification

### Project Overview

This project aims to classify **personality types** based on responses to a psychological test. Using a dataset of responses to a 60-question test (with scores ranging from -3 to 3), the goal is to develop a machine learning model that can accurately predict one of the 16 distinct personality types (MBTI types). This is a challenging multi-class classification task with applications in psychology, human resources, and self-assessment tools.

-----

### Technical Highlights

* **Dataset**: [Kaggle - 60k responses of 16 Personalities Test (MBTI)](https://www.kaggle.com/datasets/anshulmehtakaggl/60k-responses-of-16-personalities-test-mbt)
* **Size**: 59,999 entries, 62 columns.
* **Key Features**:
    * 60 numerical features representing responses to a personality test.
* **Approach**:
    * **Data Cleaning**: The dataset was clean with no missing values or duplicates. The `Response Id` column was dropped as it is a unique identifier.
    * **Exploratory Data Analysis**: The code checks basic statistics, null values, duplicates, and unique values for all columns. The target variable `Personality` is well-balanced across all 16 classes.
    * **Label Encoding**: Applied to the target `Personality` column to convert it into a numerical format for multi-class classification.
    * **Multi-class Classification**: The target variable `Personality` has 16 distinct categories.
* **Models Used**:
    * Logistic Regression, Ridge Classifier, SVC, Random Forest, XGBoost, AdaBoost, Gradient Boosting, Bagging, Decision Tree.
* **Best Accuracy**:
    * **97.7%** with XGBoost Classifier.
    * **97.4%** with Random Forest Classifier.
    * **94.5%** with Gradient Boosting Classifier.
    * The very high accuracies for the ensemble models suggest that the test responses provide very strong discriminative power for personality classification.
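
The cleaning, encoding, and modelling steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code. It assumes only the `Response Id` and `Personality` column names given above; the question column names (`Q1` to `Q60`), the `make_synthetic_responses` helper, and its toy labelling rule are hypothetical stand-ins for the Kaggle CSV, and only a Random Forest is trained. The accuracy it prints is therefore not comparable to the figures reported above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


def make_synthetic_responses(n_rows=800, n_questions=60, seed=0):
    """Hypothetical stand-in for the Kaggle CSV: integer answers in [-3, 3]."""
    rng = np.random.default_rng(seed)
    answers = rng.integers(-3, 4, size=(n_rows, n_questions))
    df = pd.DataFrame(answers, columns=[f"Q{i + 1}" for i in range(n_questions)])
    # Toy labelling rule (an assumption): each MBTI letter is decided by the
    # sign of the summed answers in one block of 15 questions.
    letters = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]
    blocks = np.array_split(np.arange(n_questions), 4)
    parts = [
        np.where(answers[:, cols].sum(axis=1) >= 0, pos, neg)
        for (pos, neg), cols in zip(letters, blocks)
    ]
    df.insert(0, "Response Id", np.arange(n_rows))
    df["Personality"] = ["".join(p) for p in zip(*parts)]
    return df


df = make_synthetic_responses()

# Checks mirroring the EDA step: nulls, duplicates, class balance.
n_missing = int(df.isnull().sum().sum())
n_duplicates = int(df.duplicated().sum())
class_counts = df["Personality"].value_counts()

# Drop the identifier and separate the 60 features from the target.
X = df.drop(columns=["Response Id", "Personality"])

# Label-encode the 16 personality types into integers 0..15.
encoder = LabelEncoder()
y = encoder.fit_transform(df["Personality"])

# Stratify so every class appears in both the training and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"classes: {len(encoder.classes_)}, test accuracy: {accuracy:.3f}")
```

To run this against the real data, replace the call to `make_synthetic_responses()` with a `pd.read_csv(...)` call on the extracted file; `encoder.inverse_transform` maps predicted integers back to personality type strings.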

-----

### Purpose and Applications

* **Automated Personality Assessment**: Enables a quick and accurate classification of personality types from test responses.
* **Psychological Research**: Supports research in personality psychology and behavior analysis.
* **Human Resources**: Assists in team building, career guidance, and job-role matching.
* **Self-Improvement**: Provides a tool for individuals to better understand their own personality traits.

-----

### Installation

Clone the repository and extract the data from the zip file.

Install the necessary libraries:

```bash
pip install pandas numpy seaborn matplotlib scikit-learn xgboost
```
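
As a quick sanity check after installation, a short script along these lines can report any library that is still missing. It is a sketch using only the standard library; the `REQUIRED_MODULES` list and the `find_missing_modules` helper are illustrative names, not part of the repository.

```python
import importlib.util

# Import names for the packages in the pip command above. Note that the
# scikit-learn distribution is imported under the name "sklearn".
REQUIRED_MODULES = ["pandas", "numpy", "seaborn", "matplotlib", "sklearn", "xgboost"]


def find_missing_modules(module_names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


missing = find_missing_modules(REQUIRED_MODULES)
print("missing:", missing if missing else "none")
```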

-----

### Collaboration

We welcome contributions to improve the project. You can help by:

* Performing comprehensive hyperparameter tuning and cross-validation for the top-performing models to ensure robustness.
* Investigating the impact of different preprocessing techniques.
* Adding explainability (e.g., SHAP or LIME) to understand which questions or groups of questions are the most critical for classifying a specific personality type.
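
As a possible starting point for the first and last suggestions, the sketch below combines a small cross-validated grid search with a feature-importance pass. Everything here is an assumption for illustration: the data is synthetic (generated with scikit-learn's `make_classification` to match the 60-feature, 16-class shape), the parameter grid is deliberately tiny, and permutation importance is used as a simpler stand-in that needs no extra dependency. It is not equivalent to SHAP or LIME.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic stand-in (assumption): 60 features and 16 classes.
X, y = make_classification(
    n_samples=800, n_features=60, n_informative=8, n_redundant=0,
    n_classes=16, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stratified folds keep the class proportions similar in every split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# Illustrative grid only; a real search would cover more values and models.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 10]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=cv, scoring="accuracy"
)
search.fit(X_train, y_train)
print("best params:", search.best_params_)
print(f"mean CV accuracy: {search.best_score_:.3f}")

# Permutation importance: the drop in held-out accuracy when a single
# feature (here, one question) is shuffled.
result = permutation_importance(
    search.best_estimator_, X_test, y_test, n_repeats=3, random_state=0
)
top_features = np.argsort(result.importances_mean)[::-1][:5]
print("most influential feature indices:", top_features.tolist())
```

Permutation importance gives one global ranking of questions; per-class or per-prediction explanations, as the SHAP or LIME suggestion implies, would need those libraries.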
