https://github.com/snigdho8869/multiclass-text-classification

Natural Language Processing for Multiclass Classification: A repository containing NLP techniques for multiclass classification of text data.

adaboost-classifier bert-fine-tuning cnn-model deep-learning ensemble-learning flask flask-application gradient-boosting-classifier gru keras-tensorflow logistic-regression machine-learning natural-language-processing neural-network nlp random-forest-classifier rnn-lstm support-vector-machines text-classification xlnet-fine-tuning


# Multiclass Text Classification Project

## Project Overview
The goal of this project is to classify text data into predefined categories using a combination of traditional machine learning models and deep learning architectures. The project includes:
- A **Flask-based web application** for interactive text classification.
- **Preprocessing** of text data, including cleaning, tokenization, and lemmatization.
- Training and evaluation of multiple models, including:
  - Traditional ML models: Logistic Regression, SVM, Naive Bayes, Random Forest, Gradient Boosting, AdaBoost, and an Ensemble model.
  - Deep learning models: LSTM, GRU, CNN, and a hybrid LSTM+CNN model.
  - Fine-tuning of transformer-based models: BERT and XLNet using **ktrain**.
- Visualization of results, including confusion matrices, accuracy plots, and word clouds.
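
The preprocessing steps listed above (cleaning, tokenization, lemmatization) can be sketched roughly as follows. This is a minimal standard-library illustration, not the repository's actual code: the function names and the `TOY_LEMMAS` table are hypothetical, and the lookup table only stands in for a real lemmatizer such as NLTK's `WordNetLemmatizer`.

```python
import re
from typing import List

# Hypothetical toy lemma table, for illustration only. A real pipeline
# would use a proper lemmatizer (for example NLTK's WordNetLemmatizer).
TOY_LEMMAS = {"shares": "share", "rose": "rise", "markets": "market"}


def clean(text: str) -> str:
    """Lowercase the text and replace anything that is not a letter or whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    # Collapse runs of whitespace left behind by the substitution.
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> List[str]:
    """Split cleaned text into whitespace-separated tokens."""
    return text.split()


def lemmatize(tokens: List[str]) -> List[str]:
    """Map each token to its lemma if the toy table knows it, else keep it."""
    return [TOY_LEMMAS.get(token, token) for token in tokens]


def preprocess(text: str) -> List[str]:
    """Run the three steps in order: clean, tokenize, lemmatize."""
    return lemmatize(tokenize(clean(text)))


if __name__ == "__main__":
    print(preprocess("Shares ROSE 3% as markets rallied!"))
    # ['share', 'rise', 'as', 'market', 'rallied']
```

Keeping each step as its own function makes it easy to swap the toy lemmatizer for a real one without touching the cleaning or tokenization logic.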

---

# Requirements:

* Python
* Scikit-learn
* TensorFlow
* Keras

# Dataset:

The dataset used in this project is the bbc-text dataset, which consists of approximately 2,225 text documents.

# Results:
The results of each model on the bbc-text dataset are as follows:

| Model | Accuracy |
|----------|----------|
| Logistic Regression | 96.58% |
| Support Vector Machine | 96.94% |
| Multinomial Naive Bayes | 94.97% |
| Random Forest | 95.15% |
| Gradient Boosting | 94.25% |
| Ensemble Classifier | 97.12% |
| AdaBoost | 94.43% |
| LSTM 1-Layer | 99.22% |
| LSTM 2-Layers | 97.78% |
| GRU | 91.74% |
| CNN+LSTM | 98.73% |
| BERT | 99.60% |
| XLNet | 99.46% |
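
The README does not show how these accuracy figures or the confusion matrices were computed. As a rough sketch of the underlying arithmetic, the standard-library example below computes both from lists of true and predicted labels. The helper names and the sample labels are made up for illustration and are not taken from the project's results.

```python
from collections import Counter
from typing import List, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_matrix(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]
) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


if __name__ == "__main__":
    # Made-up labels, for illustration only.
    labels = ["business", "sport", "tech"]
    y_true = ["business", "sport", "tech", "sport", "tech", "business"]
    y_pred = ["business", "sport", "tech", "tech", "tech", "business"]

    print(f"accuracy = {accuracy(y_true, y_pred):.2%}")  # accuracy = 83.33%
    for label, row in zip(labels, confusion_matrix(y_true, y_pred, labels)):
        print(f"{label:>8}: {row}")
```

In practice, scikit-learn's `accuracy_score` and `confusion_matrix` in `sklearn.metrics` provide the same calculations.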

# Application Interface

(Image: screenshot of the application interface)