An open API service indexing awesome lists of open source software.

https://github.com/shivamkumar818/mbti-personality-prediction-from-text-data

MBTI Personality Prediction from Text Data This project leverages machine learning to predict Myers-Briggs Type Indicator (MBTI) personality types based on textual data, specifically from social media posts.
https://github.com/shivamkumar818/mbti-personality-prediction-from-text-data

confusion-matrix correlation-matrix data-visualization dataset knn-classification linear-regression logistic-regression modeltraining navebayes numpy pandas python

Last synced: 2 months ago
JSON representation

MBTI Personality Prediction from Text Data This project leverages machine learning to predict Myers-Briggs Type Indicator (MBTI) personality types based on textual data, specifically from social media posts.

Awesome Lists containing this project

README

        

MBTI Personality Prediction from Text Data

This project leverages machine learning to predict Myers-Briggs Type Indicator (MBTI) personality types based on textual data, specifically from social media posts. The objective is to replace traditional, self-reported MBTI questionnaires with a data-driven approach that offers a less intrusive, objective, and scalable solution for personality prediction.

Key Features:

Problem Definition: Automate MBTI personality typing using text data to enhance various applications, including personalized recommendations, team dynamics, mental health support, and marketing strategies.

Data Processing: Loaded text data and preprocessed it by encoding MBTI types into binary features. Vectorization was achieved using CountVectorizer and TF-IDF.

Modeling Approach: Trained individual models for each MBTI dimension (IE, NS, TF, JP) using machine learning classifiers, including Naive Bayes, Logistic Regression, and K-Nearest Neighbors.

Hyperparameter Tuning: Implemented GridSearchCV for model optimization to improve classification accuracy.

Evaluation: Assessed performance using classification reports, confusion matrices, and ROC curves.

Libraries Used
Data Processing: pandas, numpy, nltk, re

Machine Learning: scikit-learn (Naive Bayes, Logistic Regression, KNN), TruncatedSVD for dimensionality reduction

Visualization: matplotlib, seaborn, plotly