https://github.com/shaheennabi/end-to-end-nlp-project-with-lstm
🎇 NLP Hate Speech Detection with LSTM 🎆 An LSTM-based model for hate speech detection 💬, tackling dataset imbalance through data combination, applying tokenization, stopword removal, and using Keras embedding layers for effective text classification. Continuously improving with each experiment! 🔥
circcleci docker hate-speech-detection keras tensorflow
Last synced: 12 days ago
- Host: GitHub
- URL: https://github.com/shaheennabi/end-to-end-nlp-project-with-lstm
- Owner: shaheennabi
- License: mit
- Created: 2024-10-15T07:42:54.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-11-15T08:35:47.000Z (3 months ago)
- Last Synced: 2025-01-21T09:11:55.070Z (12 days ago)
- Topics: circcleci, docker, hate-speech-detection, keras, tensorflow
- Language: Jupyter Notebook
- Homepage:
- Size: 5.6 MB
- Stars: 3
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# 🎆🎉 End-to-End-NLP-Project-with-RNN 🎉🎆
This project aims to classify hate speech using a deep learning model. The dataset for this project was sourced from Kaggle and underwent significant preprocessing to ensure a balanced and clean training set.
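As a rough illustration of the text cleaning this README lists among its preprocessing steps (lowercasing, tokenization, stopword removal), here is a minimal standard-library sketch. The function names, the tiny stopword set, and the left-padding scheme are assumptions made for illustration; the repository's own code may differ, for example by using a fuller stopword list and the Keras tokenizer.

```python
import re

# Tiny illustrative stopword set (an assumption; a real run would likely
# use a much fuller list).
STOPWORDS = {"a", "an", "and", "are", "is", "of", "the", "to", "you"}


def clean_text(text):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [tok for tok in tokens if tok not in STOPWORDS]


def build_vocab(docs):
    """Map each token to an integer id, reserving 0 for padding."""
    vocab = {}
    for doc in docs:
        for tok in clean_text(doc):
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab


def encode(text, vocab, max_len):
    """Turn text into a fixed-length id list, left-padded with zeros."""
    ids = [vocab[tok] for tok in clean_text(text) if tok in vocab]
    ids = ids[-max_len:]  # keep the last max_len tokens if too long
    return [0] * (max_len - len(ids)) + ids


if __name__ == "__main__":
    corpus = ["You are the worst", "Have a nice day"]
    vocab = build_vocab(corpus)
    print(clean_text(corpus[0]))
    print(encode("nice day", vocab, max_len=4))
```

Fixed-length integer sequences of this kind are what an embedding layer expects as input, which is why the padding step is included.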
### 🎇 Project Overview 🎇
In this project, I tackled the challenge of detecting hate speech using an LSTM-based model. The dataset contained inherent imbalance, so I combined two separate datasets into one during preprocessing to ensure more robust model performance.
### ✨ Key Features:
* **Model Architecture**: LSTM (Long Short-Term Memory) neural network for text classification.
* **Embedding Layer**: Used Keras' embedding layer to handle word representations.
* **Model Size**: The model consists of approximately 5,080,501 total parameters.
* **Data Preprocessing**:
  - Combined two datasets to address imbalance issues.
  - Applied text preprocessing steps such as tokenization, lowercasing, and removal of stopwords.
# 🎉 Project Tree Structure 🎉
``` bash
.
├── END-TO-END-NLP-PROJECT-WITH-LSTM
├── .circleci/
│   └── config.yml
├── artifact/
│   ├── 10_05_2024_03_23_14 (or timestamp)/
│   │   ├── DataIngestionArtifacts/
│   │   │   ├── dataset.zip
│   │   │   ├── imbalanced_data.csv
│   │   │   └── raw_data.csv
│   │   ├── DataValidationArtifacts/
│   │   │   └── status.txt
│   │   ├── DataTransformationArtifacts/
│   │   │   └── final.csv
│   │   ├── ModelTrainerArtifacts/
│   │   │   ├── model.h5
│   │   │   ├── x_test.csv
│   │   │   ├── x_train.csv
│   │   │   └── y_test.csv
│   │   └── ModelEvaluationArtifacts/
│   │       └── best_model/
│   │           └── model.h5
│   └── PredicModel/
│       └── model.h5
├── data/
│   └── dataset.zip
├── Hate or src/
│   ├── components/
│   │   ├── __pycache__/
│   │   ├── __init__.py
│   │   ├── data_ingestion.py
│   │   ├── data_validation.py
│   │   ├── data_transformation.py
│   │   ├── model_evaluation.py
│   │   ├── model_pusher.py
│   │   └── model_trainer.py
│   ├── configuration/
│   │   ├── __pycache__/
│   │   ├── __init__.py
│   │   └── s3_syncer.py
│   ├── constants/
│   │   ├── __pycache__/
│   │   └── __init__.py
│   ├── entity/
│   │   ├── __pycache__/
│   │   ├── __init__.py
│   │   ├── artifact_entity.py
│   │   └── config_entity.py
│   ├── exception/
│   │   ├── __pycache__/
│   │   └── __init__.py
│   ├── logger/
│   │   ├── __pycache__/
│   │   └── __init__.py
│   ├── ml/
│   │   ├── __init__.py
│   │   └── model.py
│   └── pipeline/
│       ├── __pycache__/
│       ├── __init__.py
│       ├── training_pipeline.py
│       └── prediction_pipeline.py
├── logs/
│   └── 10_05_2024_03_23_14.log/
│       └── 10_05_2024_03_23_14.log
├── Notebook/
│   └── Hate_speech_experiment.ipynb
├── s3_downloads/
│   └── dataset.zip
├── app.py
├── circleci_setup_template.sh
├── Dockerfile
├── README.md
├── requirements.txt
├── setup.py
└── template.py
```
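The `model.h5` files in the tree store the trained network, which the Key Features list describes as having approximately 5,080,501 parameters. As a sanity check, the sketch below counts parameters for an Embedding, LSTM, Dense stack using the standard formulas. The layer sizes (a 50,000-word vocabulary, 100-dimensional embeddings, 100 LSTM units, one output unit) are an assumption: they are one configuration that reproduces the quoted total, not values confirmed by the source.

```python
def embedding_params(vocab_size, embed_dim):
    """One embed_dim-sized vector per vocabulary entry."""
    return vocab_size * embed_dim


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, outputs):
    """A weight per input-output pair plus one bias per output."""
    return input_dim * outputs + outputs


# Hypothetical layer sizes, chosen only because they match the quoted total.
VOCAB_SIZE = 50_000
EMBED_DIM = 100
LSTM_UNITS = 100
OUTPUTS = 1

total = (
    embedding_params(VOCAB_SIZE, EMBED_DIM)
    + lstm_params(EMBED_DIM, LSTM_UNITS)
    + dense_params(LSTM_UNITS, OUTPUTS)
)
print(total)  # 5080501
```

Under these assumed sizes the embedding table accounts for 5,000,000 of the parameters, so the vocabulary size, rather than the LSTM itself, dominates the model's footprint.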
## How to run?
``` bash
conda create -n hate python=3.8 -y
```
``` bash
conda activate hate
```
``` bash
pip install -r requirements.txt
```
### Export the environment variables (Git Bash)
``` bash
export AWS_ACCESS_KEY_ID="your access key"
```
``` bash
export AWS_SECRET_ACCESS_KEY="your secret access key"
```
``` bash
export AWS_DEFAULT_REGION="your region, e.g. us-east-1"
```
# 🎆🎉 Workflow 🎉🎆
After creating the project template:
* 🔥 Update constants
* 🔥 Update Entity modules
* 🔥 Update respective component
* 🔥 Update pipeline
## 🎆 Deployment 🎆
1. Setting up CircleCI 🎉
2. Switching on self-hosted runner 💥
3. Creating Project 🎊
4. Configuring EC2 🚀
5. Writing `config.yml` 📜
6. Setting environment variables 🔒
## The training and prediction pipelines are working properly; the CircleCI CI/CD deployment will be updated in the future.