https://github.com/snigdho8869/multiclass-text-classification
Natural Language Processing for Multiclass Classification: A repository containing NLP techniques for multiclass classification of text data.
- Host: GitHub
- URL: https://github.com/snigdho8869/multiclass-text-classification
- Owner: Snigdho8869
- Created: 2023-04-03T19:14:54.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2025-03-16T21:07:19.000Z (3 months ago)
- Last Synced: 2025-04-15T06:49:16.993Z (2 months ago)
- Topics: adaboost-classifier, bert-fine-tuning, cnn-model, deep-learning, ensemble-learning, flask, flask-application, gradient-boosting-classifier, gru, keras-tensorflow, logistic-regression, machine-learning, natural-language-processing, neural-network, nlp, random-forest-classifier, rnn-lstm, support-vector-machines, text-classification, xlnet-fine-tuning
- Language: Jupyter Notebook
- Homepage:
- Size: 11.3 MB
- Stars: 24
- Watchers: 2
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Multiclass Text Classification Project
## Project Overview
The goal of this project is to classify text data into predefined categories using a combination of traditional machine learning models and deep learning architectures. The project includes:
- A **Flask-based web application** for interactive text classification.
- **Preprocessing** of text data, including cleaning, tokenization, and lemmatization.
- Training and evaluation of multiple models, including:
  - Traditional ML models: Logistic Regression, SVM, Naive Bayes, Random Forest, Gradient Boosting, AdaBoost, and an Ensemble model.
  - Deep learning models: LSTM, GRU, CNN, and a hybrid LSTM+CNN model.
  - Fine-tuning of transformer-based models: BERT and XLNet using **ktrain**.
- Visualization of results, including confusion matrices, accuracy plots, and word clouds.

---
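The preprocessing step listed in the overview (cleaning, tokenization, lemmatization) could look roughly like the sketch below. It is not the repository's actual code: the function names `clean_text` and `preprocess` are hypothetical, the cleaning rules are assumptions, and lemmatization is left as a pluggable callable (by default a no-op) so the example runs with the standard library alone.

```python
import re
from typing import Callable, List


def clean_text(text: str) -> str:
    """Lowercase, drop URLs and non-letter characters, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)       # keep only letters and spaces
    return re.sub(r"\s+", " ", text).strip()    # collapse repeated whitespace


def preprocess(text: str, lemmatize: Callable[[str], str] = lambda w: w) -> List[str]:
    """Clean the text, split it into tokens, and lemmatize each token.

    `lemmatize` defaults to the identity function (an assumption made to keep
    this sketch dependency-free); a real lemmatizer can be passed in instead.
    """
    tokens = clean_text(text).split()
    return [lemmatize(token) for token in tokens]


print(preprocess("The markets rallied 3% on Tuesday: https://example.com/news"))
# → ['the', 'markets', 'rallied', 'on', 'tuesday']
```

If NLTK and its WordNet data are installed, `WordNetLemmatizer().lemmatize` can be passed as the `lemmatize` argument to get dictionary-based lemmas.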
# Requirements:
* Python
* Scikit-learn
* TensorFlow
* Keras
# Dataset:
The dataset used in this project is the bbc-text dataset, which consists of approximately 2,225 text documents.
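As a rough illustration of how such a dataset might be loaded and inspected, the sketch below reads labeled rows from a CSV and counts the documents per class. The column names `category` and `text`, the helper `load_labeled_texts`, and the inline sample rows are all assumptions made for illustration, not details taken from this repository.

```python
import csv
import io
from collections import Counter


def load_labeled_texts(handle):
    """Read (label, text) pairs from a CSV with 'category' and 'text' columns."""
    reader = csv.DictReader(handle)
    return [(row["category"], row["text"]) for row in reader]


# Tiny inline stand-in for the real file, invented purely for illustration.
sample = io.StringIO(
    "category,text\n"
    "sport,the team won the final\n"
    "tech,new phone released today\n"
    "sport,striker signs a new contract\n"
)

rows = load_labeled_texts(sample)
label_counts = Counter(label for label, _ in rows)
print(label_counts)
```

With a real file, the same function accepts a handle opened with `open(path, newline="", encoding="utf-8")`; checking `label_counts` before training is a quick way to spot class imbalance.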
# Results:
The results of each model on the bbc-text dataset are as follows:

| Model | Accuracy |
|----------|----------|
| Logistic Regression | 96.58% |
| Support Vector Machine | 96.94% |
| Multinomial Naive Bayes | 94.97% |
| Random Forest | 95.15% |
| Gradient Boosting | 94.25% |
| Ensemble Classifier | 97.12% |
| AdaBoost | 94.43% |
| LSTM 1-Layer | 99.22% |
| LSTM 2-Layers | 97.78% |
| GRU | 91.74% |
| CNN+LSTM | 98.73% |
| BERT | 99.60% |
| XLNet | 99.46% |

# Application Interface
![]()
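A Flask endpoint for interactive classification might be wired up along the lines of the sketch below. This is a minimal illustration, not the repository's application: the `/predict` route, the JSON field `text`, and the `predict_category` placeholder (a keyword rule standing in for a trained model) are all hypothetical.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_category(text: str) -> str:
    """Placeholder for a trained model; a real app would call the model here."""
    return "sport" if "match" in text.lower() else "unknown"


@app.route("/predict", methods=["POST"])
def predict():
    # Parse the JSON body; fall back to an empty dict if it is missing or invalid.
    payload = request.get_json(silent=True) or {}
    text = payload.get("text", "")
    if not text.strip():
        return jsonify(error="field 'text' is required"), 400
    return jsonify(category=predict_category(text))
```

Calling `app.run()` starts a local development server, and posting `{"text": "..."}` to `/predict` returns the predicted category as JSON. Loading the trained model once at startup, rather than inside the request handler, keeps each request fast.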