https://github.com/amiriiw/text_classification

Welcome to the Text Classification Project! This project trains a model to classify texts by their emotional content and then uses it to sort new texts into the corresponding emotional categories.


Host: GitHub
URL: https://github.com/amiriiw/text_classification
Owner: amiriiw
License: mit
Created: 2024-08-13T05:03:12.000Z (11 months ago)
Default Branch: main
Last Pushed: 2024-10-28T07:34:39.000Z (8 months ago)
Last Synced: 2025-02-13T17:18:09.007Z (5 months ago)
Topics: keras, numpy, pandas, pickle, scikit-learn, tensorflow, text-classification
Language: Python
Homepage:
Size: 26.4 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# Sentiment Detection using LSTM and PostgreSQL

This project includes two main scripts for detecting sentiment (positive or negative) in text: `train.py` for training the model and `detect.py` for predicting the sentiment in new text inputs, then saving the results to a PostgreSQL database.

## Features
- **Sentiment Classification**: Classifies text into positive or negative sentiment using an LSTM model.
- **Database Integration**: Stores detected sentiments in a PostgreSQL database for record-keeping.
- **Tokenization**: Uses tokenizers to prepare text data for model input.

## Libraries Used
- [TensorFlow](https://www.tensorflow.org/) for deep learning model development and training.
- [Pandas](https://pandas.pydata.org/) for data loading and preprocessing.
- [scikit-learn](https://scikit-learn.org/) for data splitting into training and testing sets.
- [Psycopg2](https://www.psycopg.org/) for database interaction with PostgreSQL.
- [Numpy](https://numpy.org/) for numerical operations and data handling.

## Introduction

### File: `train.py`
This file is responsible for training an LSTM model to detect sentiment in text. Key classes and functions include:

- **Class ModelTrainer**: Manages tokenizer setup, data preparation, model building, and training.
  - `__init__(self, vocab_size, max_length, embedding_dim)`: Initializes model parameters.
  - `load_data(self, file_path, test_size)`: Loads data from a CSV file and splits it for training and testing.
  - `build_model(self)`: Builds and configures the LSTM model.
  - `train_model(self, train_data, train_labels, test_data, test_labels, epochs, batch_size)`: Trains the model with input data.
  - `save_model(self, file_path)`: Saves the trained model to the specified path.
  - `save_tokenizers(self, tokenizer_path, label_tokenizer_path)`: Saves the text and label tokenizers.
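
To make the roles of `vocab_size` and `max_length` concrete, here is a minimal pure-Python sketch of word-index tokenization and padding. It is an illustration only, not the code in `train.py` and not the Keras tokenizer itself; the function names `fit_word_index` and `texts_to_padded`, the `<OOV>` token, and the whitespace-based splitting are all assumptions made for this example.

```python
from collections import Counter

OOV_TOKEN = "<OOV>"  # hypothetical placeholder for words outside the vocabulary


def fit_word_index(texts, vocab_size):
    """Map the most frequent words to integer ids (0 = padding, 1 = OOV)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    word_index = {OOV_TOKEN: 1}
    # Keep at most vocab_size - 2 words so every id stays below vocab_size.
    for word, _ in counts.most_common(max(vocab_size - 2, 0)):
        word_index[word] = len(word_index) + 1
    return word_index


def texts_to_padded(texts, word_index, max_length):
    """Turn each text into exactly max_length ids, truncating or right-padding with 0."""
    oov_id = word_index[OOV_TOKEN]
    rows = []
    for text in texts:
        ids = [word_index.get(word, oov_id) for word in text.lower().split()]
        ids = ids[:max_length]
        rows.append(ids + [0] * (max_length - len(ids)))
    return rows


word_index = fit_word_index(["good good movie", "bad movie"], vocab_size=10)
padded = texts_to_padded(["good film"], word_index, max_length=4)
```

Every row has the same length, which is what an embedding layer followed by an LSTM expects, and unknown words such as `film` collapse to the single OOV id.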

### File: `detect.py`
This file is used for detecting sentiment in new text inputs and saving the results to a PostgreSQL database. Key classes and functions include:

- **Class EmotionClassifier**: Manages the model, tokenizer, database connection, and prediction functions.
  - `__init__(self, model_path, tokenizer_path, label_tokenizer_path, db_params)`: Initializes the model, tokenizers, and database connection.
  - `connect_db(self)`: Connects to a PostgreSQL database.
  - `create_table(self)`: Creates a table for storing sentiment results if it doesn’t already exist.
  - `predict_emotion(self, sentence)`: Predicts the sentiment of a sentence.
  - `classify_sentences(self, input_file)`: Classifies sentences from a file and saves them to the database.
  - `close_db(self)`: Closes the database connection.
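
The create-if-missing and insert pattern described above can be sketched as follows. Because Psycopg2 needs a running PostgreSQL server, this example substitutes the standard library's `sqlite3` module so it runs anywhere; it is not the code in `detect.py`. The table name `sentiments` and its columns are hypothetical. With Psycopg2 the placeholders would be `%s` rather than `?`, and the key column would typically be declared `SERIAL PRIMARY KEY`.

```python
import sqlite3


def create_table(conn):
    """Create the results table only if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sentiments ("
        "id INTEGER PRIMARY KEY, "
        "sentence TEXT NOT NULL, "
        "sentiment TEXT NOT NULL)"
    )
    conn.commit()


def save_result(conn, sentence, sentiment):
    """Insert one prediction using a parameterized query (no string formatting)."""
    conn.execute(
        "INSERT INTO sentiments (sentence, sentiment) VALUES (?, ?)",
        (sentence, sentiment),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a PostgreSQL connection
create_table(conn)
create_table(conn)  # calling it again is harmless thanks to IF NOT EXISTS
save_result(conn, "I loved it", "positive")
rows = conn.execute("SELECT sentence, sentiment FROM sentiments").fetchall()
conn.close()
```

Passing values as parameters instead of formatting them into the SQL string lets the driver handle quoting, which matters when the stored sentences come from arbitrary user text.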

## Usage

### Training the Model
1. Ensure there is a `dataset.csv` file with `text` and `label` columns in the project directory. Each label should be either `positive` or `negative`.
2. Run `train.py`:

```bash
python3 train.py
```

3. This trains the model and saves it, together with the tokenizers, to the current directory.
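
Before training, it can help to check that the dataset matches the format from step 1. The following is a small optional sketch using only the standard library; `validate_dataset` is a hypothetical helper and is not part of this repository.

```python
import csv
import io

ALLOWED_LABELS = {"positive", "negative"}


def validate_dataset(handle):
    """Return a list of problems found in a CSV with `text` and `label` columns."""
    reader = csv.DictReader(handle)
    missing = {"text", "label"} - set(reader.fieldnames or [])
    if missing:
        return [f"missing column(s): {', '.join(sorted(missing))}"]
    problems = []
    # Rows are counted from 1, not including the header line.
    for row_no, row in enumerate(reader, start=1):
        if not (row["text"] or "").strip():
            problems.append(f"row {row_no}: empty text")
        label = (row["label"] or "").strip()
        if label not in ALLOWED_LABELS:
            problems.append(f"row {row_no}: unexpected label {label!r}")
    return problems


sample = io.StringIO("text,label\nGreat film,positive\nMeh,neutral\n")
problems = validate_dataset(sample)
```

To check the real file, open it with `open("dataset.csv", newline="", encoding="utf-8")` and pass the handle to `validate_dataset`; an empty list means no problems were found.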

### Detecting Sentiment
1. Ensure PostgreSQL is set up and connection parameters are configured in `detect.py`.
2. Run `detect.py`:

```bash
python3 detect.py
```

3. This script will classify the sentences in `text.txt` and save the sentiment results to the database.
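
The file-driven loop in step 3 can be sketched roughly as below. This assumes `text.txt` holds one sentence per line, which the README does not state, and it accepts any callable as the predictor; `read_sentences` and `classify_file` are hypothetical names, not functions from `detect.py`.

```python
from pathlib import Path


def read_sentences(path):
    """Yield non-empty, stripped lines from a UTF-8 text file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            sentence = line.strip()
            if sentence:
                yield sentence


def classify_file(path, predict):
    """Apply `predict` (any callable returning a label) to each sentence."""
    return [(sentence, predict(sentence)) for sentence in read_sentences(path)]
```

In the real script, `predict` would be a method such as `EmotionClassifier.predict_emotion`, and each resulting pair would be written to the database rather than collected in a list.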

## Installation
1. Clone this repository:

```bash
git clone https://github.com/amiriiw/text_classification
cd text_classification
cd Text_classification
```

2. Install the required packages:

```bash
pip3 install -r requirements.txt
```

3. Ensure PostgreSQL is installed, and create a database for this project.

4. Download the dataset via this link: [Drive](https://drive.google.com/drive/folders/1wazzbRMNZFaLdFZCWYyLhKHUBzfc4adV?usp=sharing)

## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
