https://github.com/shivamkumar818/mbti-personality-prediction-from-text-data
MBTI Personality Prediction from Text Data: This project leverages machine learning to predict Myers-Briggs Type Indicator (MBTI) personality types based on textual data, specifically from social media posts.
confusion-matrix correlation-matrix data-visualization dataset knn-classification linear-regression logistic-regression modeltraining navebayes numpy pandas python
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/shivamkumar818/mbti-personality-prediction-from-text-data
- Owner: shivamkumar818
- Created: 2024-11-09T15:54:45.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-11-09T16:14:09.000Z (6 months ago)
- Last Synced: 2025-01-26T15:29:23.304Z (4 months ago)
- Topics: confusion-matrix, correlation-matrix, data-visualization, dataset, knn-classification, linear-regression, logistic-regression, modeltraining, navebayes, numpy, pandas, python
- Language: Jupyter Notebook
- Homepage:
- Size: 25 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
MBTI Personality Prediction from Text Data
This project uses machine learning to predict Myers-Briggs Type Indicator (MBTI) personality types from textual data, specifically social media posts. The goal is to replace traditional self-reported MBTI questionnaires with a data-driven approach that is less intrusive, more objective, and more scalable.
Key Features:
Problem Definition: Automate MBTI personality typing using text data to enhance various applications, including personalized recommendations, team dynamics, mental health support, and marketing strategies.
Data Processing: Loaded text data and preprocessed it by encoding MBTI types into binary features. Vectorization was achieved using CountVectorizer and TF-IDF.
Modeling Approach: Trained individual models for each MBTI dimension (IE, NS, TF, JP) using machine learning classifiers, including Naive Bayes, Logistic Regression, and K-Nearest Neighbors.
Hyperparameter Tuning: Implemented GridSearchCV for model optimization to improve classification accuracy.
Evaluation: Assessed performance using classification reports, confusion matrices, and ROC curves.
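The steps listed above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the toy posts, the 0/1 letter assignment, the parameter grid, and the helper names (`encode_mbti`, `train_dimension_models`, `predict_type`, `report_dimension`) are all assumptions made for the example. It uses TF-IDF with Logistic Regression only; CountVectorizer, Naive Bayes, or KNN could be swapped into the same pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# (dimension name, letter encoded as 0, letter encoded as 1).
# Which letter maps to 1 is an arbitrary choice made for this sketch.
DIMENSIONS = [("IE", "I", "E"), ("NS", "N", "S"), ("TF", "T", "F"), ("JP", "J", "P")]

# Hypothetical toy data standing in for the real social media posts.
TOY_POSTS = [
    "I spent the weekend alone planning my long term research project",
    "Went to a huge party last night and danced with everyone there",
    "Quietly reading theory books and organizing my schedule again",
    "Spontaneous road trip with friends, loved every loud minute",
    "Met so many new people today and shared wild ideas with them",
    "Stayed home, checked the facts twice, and finished my task list",
    "Brainstorming crazy ideas with the whole group feels amazing",
    "Kept to my routine and worked through the details by myself",
]
TOY_TYPES = ["INTJ", "ESFP", "INTJ", "ESFP", "ENFP", "ISTJ", "ENFP", "ISTJ"]


def encode_mbti(mbti_type):
    """Turn a four-letter type such as 'INTJ' into four binary labels."""
    letters = mbti_type.strip().upper()
    if len(letters) != 4:
        raise ValueError(f"expected a four-letter MBTI type, got {mbti_type!r}")
    encoded = {}
    for letter, (name, zero, one) in zip(letters, DIMENSIONS):
        if letter not in (zero, one):
            raise ValueError(f"unexpected letter {letter!r} for dimension {name}")
        encoded[name] = int(letter == one)
    return encoded


def train_dimension_models(posts, types):
    """Fit one tuned binary classifier per MBTI dimension."""
    labels = [encode_mbti(t) for t in types]
    models = {}
    for name, _, _ in DIMENSIONS:
        y = [row[name] for row in labels]
        pipeline = Pipeline([
            ("tfidf", TfidfVectorizer()),
            ("clf", LogisticRegression(max_iter=1000)),
        ])
        # Small illustrative grid; cv=2 only because the toy data set is tiny.
        search = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0, 10.0]}, cv=2)
        search.fit(posts, y)
        models[name] = search.best_estimator_
    return models


def predict_type(models, post):
    """Combine the four binary predictions back into a four-letter type."""
    letters = []
    for name, zero, one in DIMENSIONS:
        label = models[name].predict([post])[0]
        letters.append(one if label == 1 else zero)
    return "".join(letters)


def report_dimension(models, posts, types, name):
    """Return a classification report for one dimension as text."""
    y_true = [encode_mbti(t)[name] for t in types]
    y_pred = models[name].predict(posts)
    return classification_report(y_true, y_pred, zero_division=0)


if __name__ == "__main__":
    fitted = train_dimension_models(TOY_POSTS, TOY_TYPES)
    print(predict_type(fitted, "Planning a quiet evening with a good book"))
    # Scored on the training posts here, so this says nothing about real accuracy.
    print(report_dimension(fitted, TOY_POSTS, TOY_TYPES, "IE"))
```

Splitting the 16-way problem into four binary ones keeps each classifier simple, and the four predictions are then joined back into a type string. On real data the report should be computed on a held-out test split rather than on the training posts.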
Libraries Used
Data Processing: pandas, numpy, nltk, re
Machine Learning: scikit-learn (Naive Bayes, Logistic Regression, KNN), TruncatedSVD for dimensionality reduction
Visualization: matplotlib, seaborn, plotly
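
As a rough illustration of how TruncatedSVD can fit into such a workflow, the sketch below reduces TF-IDF vectors to a few components before a KNN classifier. The sample posts, the labels, the component count, and the helper name `build_reduced_knn` are assumptions made for the example; the README does not say which classifier the project pairs with TruncatedSVD or how many components it keeps.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical sample posts with labels for a single dimension (0 = I, 1 = E).
SAMPLE_POSTS = [
    "I spent the weekend alone planning my long term research project",
    "Went to a huge party last night and danced with everyone there",
    "Quietly reading theory books and organizing my schedule again",
    "Spontaneous road trip with friends, loved every loud minute",
    "Met so many new people today and shared wild ideas with them",
    "Stayed home, checked the facts twice, and finished my task list",
    "Brainstorming crazy ideas with the whole group feels amazing",
    "Kept to my routine and worked through the details by myself",
]
SAMPLE_IE_LABELS = [0, 1, 0, 1, 1, 0, 1, 0]


def build_reduced_knn(n_components=2, n_neighbors=3):
    """TF-IDF -> TruncatedSVD -> KNN; both defaults are illustrative only."""
    return make_pipeline(
        TfidfVectorizer(),
        # TruncatedSVD accepts sparse input, so the TF-IDF matrix is never densified.
        TruncatedSVD(n_components=n_components, random_state=0),
        KNeighborsClassifier(n_neighbors=n_neighbors),
    )


if __name__ == "__main__":
    model = build_reduced_knn()
    model.fit(SAMPLE_POSTS, SAMPLE_IE_LABELS)
    # model[:-1] is the pipeline without the classifier, i.e. the reduced features.
    reduced = model[:-1].transform(SAMPLE_POSTS)
    print(reduced.shape)
    print(model.predict(["Dinner with a big group of friends tonight"]))
```

KNN is used here because distance-based methods tend to benefit from a lower-dimensional input. The SVD components can be negative, so this reduced representation would not work directly with scikit-learn's MultinomialNB, which expects non-negative features.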