https://github.com/amiriiw/text_classification
Welcome to the Text Classification Project! This project trains a model to classify texts by their emotional content and then uses it to sort new texts into the corresponding emotional categories.
keras numpy pandas pickle scikit-learn tensorflow text-classification
- Host: GitHub
- URL: https://github.com/amiriiw/text_classification
- Owner: amiriiw
- License: mit
- Created: 2024-08-13T05:03:12.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-10-28T07:34:39.000Z (4 months ago)
- Last Synced: 2024-11-03T03:43:01.633Z (3 months ago)
- Topics: keras, numpy, pandas, pickle, scikit-learn, tensorflow, text-classification
- Language: Python
- Homepage:
- Size: 26.4 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Sentiment Detection using LSTM and PostgreSQL
This project includes two main scripts for detecting sentiment (positive or negative) in text: `train.py` for training the model and `detect.py` for predicting the sentiment in new text inputs, then saving the results to a PostgreSQL database.
## Features
- **Sentiment Classification**: Classifies text into positive or negative sentiment using an LSTM model.
- **Database Integration**: Stores detected sentiments in a PostgreSQL database for record-keeping.
- **Tokenization**: Uses tokenizers to prepare text data for model input.
## Libraries Used
- [TensorFlow](https://www.tensorflow.org/) for deep learning model development and training.
- [Pandas](https://pandas.pydata.org/) for data loading and preprocessing.
- [scikit-learn](https://scikit-learn.org/) for data splitting into training and testing sets.
- [Psycopg2](https://www.psycopg.org/) for database interaction with PostgreSQL.
- [NumPy](https://numpy.org/) for numerical operations and data handling.
## Introduction
### File: `train.py`
This file is responsible for training an LSTM model to detect sentiment in text. Key classes and functions include:
- **Class ModelTrainer**: Manages tokenizer setup, data preparation, model building, and training.
  - `__init__(self, vocab_size, max_length, embedding_dim)`: Initializes model parameters.
  - `load_data(self, file_path, test_size)`: Loads data from a CSV file and splits it for training and testing.
  - `build_model(self)`: Builds and configures the LSTM model.
  - `train_model(self, train_data, train_labels, test_data, test_labels, epochs, batch_size)`: Trains the model with input data.
  - `save_model(self, file_path)`: Saves the trained model to the specified path.
  - `save_tokenizers(self, tokenizer_path, label_tokenizer_path)`: Saves the text and label tokenizers.
### File: `detect.py`
This file is used for detecting sentiment in new text inputs and saving the results to a PostgreSQL database. Key classes and functions include:
- **Class EmotionClassifier**: Manages the model, tokenizer, database connection, and prediction functions.
  - `__init__(self, model_path, tokenizer_path, label_tokenizer_path, db_params)`: Initializes the model, tokenizers, and database connection.
  - `connect_db(self)`: Connects to a PostgreSQL database.
  - `create_table(self)`: Creates a table for storing sentiment results if it doesn't already exist.
  - `predict_emotion(self, sentence)`: Predicts the sentiment of a sentence.
  - `classify_sentences(self, input_file)`: Classifies sentences from a file and saves them to the database.
  - `close_db(self)`: Closes the database connection.
## Usage
### Training the Model
1. Ensure the project directory contains a `dataset.csv` file with `text` and `label` columns. Each label should be `positive` or `negative`.
2. Run `train.py`:
   ```bash
   python3 train.py
   ```
3. This will train the model and save the trained model and tokenizers to the current directory.
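Part of what training produces is a tokenizer that turns text into fixed-length integer sequences. As a rough illustration of that idea, here is a pure-Python sketch. It is not the tokenizer `train.py` uses: the helper names `fit_word_index` and `encode` are hypothetical, and reserving id 0 for padding and id 1 for unknown words is an assumption. `vocab_size` and `max_length` mirror the `ModelTrainer` parameters.

```python
from collections import Counter


def fit_word_index(texts, vocab_size):
    """Map the most frequent words to integer ids.

    Assumption: id 0 is reserved for padding and id 1 for
    out-of-vocabulary words, so at most vocab_size - 2 words are kept.
    """
    counts = Counter(word for text in texts for word in text.lower().split())
    most_common = counts.most_common(max(vocab_size - 2, 0))
    return {word: i for i, (word, _) in enumerate(most_common, start=2)}


def encode(text, word_index, max_length):
    """Turn one sentence into exactly max_length integer ids."""
    ids = [word_index.get(word, 1) for word in text.lower().split()]
    ids = ids[:max_length]                      # truncate long sentences
    return ids + [0] * (max_length - len(ids))  # pad short ones with 0


# Small in-memory example (made-up sentences, not the project's dataset).
word_index = fit_word_index(["good movie", "bad movie"], vocab_size=10)
print(encode("good film", word_index, max_length=4))
```

This sketch pads and truncates at the end of the sequence. Library tokenizers may behave differently, so the tokenizer saved by `train.py` should always be the one used at prediction time.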
### Detecting Sentiment
1. Ensure PostgreSQL is set up and connection parameters are configured in `detect.py`.
2. Run `detect.py`:
   ```bash
   python3 detect.py
   ```
3. This script will classify the sentences in `text.txt` and save the sentiment results to the database.
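The create-table-then-insert flow that `detect.py` is described as performing can be sketched as follows. To keep the example runnable without a database server, it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL. The table name `sentiments` and its columns are hypothetical, as is the `save_results` helper. With Psycopg2 the same pattern would go through a cursor and use `%s` placeholders instead of `?`.

```python
import sqlite3


def create_table(conn):
    """Create the results table only if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS sentiments (
            id INTEGER PRIMARY KEY,
            sentence TEXT NOT NULL,
            sentiment TEXT NOT NULL
        )
        """
    )


def save_results(conn, results):
    """Insert (sentence, sentiment) pairs in a single transaction."""
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT INTO sentiments (sentence, sentiment) VALUES (?, ?)",
            results,
        )


# In-memory demo with made-up predictions.
conn = sqlite3.connect(":memory:")
create_table(conn)
create_table(conn)  # calling it again is harmless
save_results(conn, [("I loved it", "positive"), ("Not for me", "negative")])
print(conn.execute("SELECT sentence, sentiment FROM sentiments").fetchall())
conn.close()
```

Passing values as query parameters rather than formatting them into the SQL string keeps arbitrary sentences from `text.txt` from being interpreted as SQL.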
## Installation
1. Clone this repository:
   ```bash
   git clone https://github.com/amiriiw/text_classification
   cd text_classification
   cd Text_classification
   ```
2. Install the required packages:
   ```bash
   pip3 install -r requirements.txt
   ```
3. Ensure PostgreSQL is installed, and create a database for this project.
4. Download the dataset via this link: [Drive](https://drive.google.com/drive/folders/1wazzbRMNZFaLdFZCWYyLhKHUBzfc4adV?usp=sharing)
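Once the dataset is downloaded, a quick sanity check can catch format problems before training. The sketch below assumes the layout described under "Training the Model": `text` and `label` columns, with labels `positive` or `negative`. `check_dataset` is a hypothetical helper and is not part of the repository.

```python
import csv
from collections import Counter

# Assumption: the two labels named in the training instructions.
ALLOWED_LABELS = frozenset({"positive", "negative"})


def check_dataset(lines, allowed_labels=ALLOWED_LABELS):
    """Validate CSV rows and return a Counter of label frequencies.

    `lines` is any iterable of CSV lines, such as an open file.
    """
    reader = csv.DictReader(lines)
    missing = {"text", "label"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing column(s): {sorted(missing)}")
    counts = Counter()
    # Row 1 is the header; numbers are approximate if a field spans lines.
    for row_number, row in enumerate(reader, start=2):
        label = (row["label"] or "").strip()
        if label not in allowed_labels:
            raise ValueError(f"row {row_number}: unexpected label {label!r}")
        if not (row["text"] or "").strip():
            raise ValueError(f"row {row_number}: empty text")
        counts[label] += 1
    return counts


# In-memory example with made-up rows.
sample = ["text,label", "I loved it,positive", "Not for me,negative"]
print(check_dataset(sample))
```

To check the real file, pass an open handle, for example `check_dataset(open("dataset.csv", newline="", encoding="utf-8"))`. The returned counts also show whether the two classes are roughly balanced.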
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.