https://github.com/prince2004patel/sentiment-spam-detection
flask machine-learning natural-language-processing python sentiment-analysis streamlit
- Host: GitHub
- URL: https://github.com/prince2004patel/sentiment-spam-detection
- Owner: prince2004patel
- Created: 2025-01-08T15:52:04.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2025-01-23T15:10:57.000Z (over 1 year ago)
- Last Synced: 2025-03-01T22:28:12.259Z (over 1 year ago)
- Language: Jupyter Notebook
- Homepage: https://sentiment-spam-detection-by-prince.streamlit.app/
- Size: 37.3 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Sentiment Analysis and Spam Detection
## Live Demo
[sentiment-spam-detection-by-prince.streamlit.app](https://sentiment-spam-detection-by-prince.streamlit.app/)
## Best Practices (As Suggested by Krish Naik)
### 1. Preprocessing and Cleaning
- **Text Cleaning**: Remove stop words, punctuation, and special characters, and normalize letter case (for example, by lowercasing all text).
- **Tokenization**: Split the text into individual words (tokens).
- **Lemmatization**: Reduce words to their base or root form to improve uniformity.
- **Removing Duplicates**: Remove duplicate entries to avoid biases in model training.
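The cleaning steps above can be sketched with the standard library alone. This is a minimal illustration, not the repository's code: the `STOP_WORDS` set, the function names, and the sample messages are assumptions made for the example. Lemmatization is left out because it needs a dictionary-backed library (for example, NLTK's `WordNetLemmatizer`).

```python
import re

# Tiny illustrative stop-word list (an assumption; real projects usually
# rely on a fuller list such as the one shipped with NLTK).
STOP_WORDS = {"a", "an", "the", "is", "are", "this", "to", "and", "of", "it"}


def clean_and_tokenize(text):
    """Lowercase, strip non-letters, split into tokens, drop stop words."""
    text = text.lower()                    # normalize letter case
    text = re.sub(r"[^a-z\s]", " ", text)  # drop punctuation, digits, special characters
    tokens = text.split()                  # simple whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]


def drop_duplicates(texts):
    """Remove exact duplicate entries while preserving the original order."""
    return list(dict.fromkeys(texts))


# Hypothetical sample data for illustration only.
raw = ["This movie is GREAT!!!", "Win a FREE prize now", "This movie is GREAT!!!"]
unique = drop_duplicates(raw)
tokenized = [clean_and_tokenize(t) for t in unique]
```

Duplicates are removed before tokenization here so that repeated messages are cleaned only once.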
### 2. Train Test Split
- **Data Split**: Split the dataset into a training set (typically 80%) and a test set (20%) to evaluate the model's performance on unseen data.
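As a rough sketch of the 80/20 split, the snippet below shuffles the sample indices with a fixed seed and holds out the first 20% for testing. The helper name `train_test_split_simple` and the toy data are assumptions for illustration; libraries such as scikit-learn provide an equivalent `train_test_split` function.

```python
import random


def train_test_split_simple(samples, labels, test_ratio=0.2, seed=42):
    """Shuffle indices reproducibly, then hold out `test_ratio` of them for testing."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # fixed seed makes the split repeatable
    n_test = int(round(len(indices) * test_ratio))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [samples[i] for i in train_idx]
    X_test = [samples[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Hypothetical toy dataset: ten messages with alternating labels.
texts = [f"message {i}" for i in range(10)]
labels = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = train_test_split_simple(texts, labels)
```

Splitting before feature extraction keeps vocabulary and document-frequency statistics from being computed on the test data.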
### 3. Feature Extraction: BOW, TF-IDF, Word2Vec
- **Bag of Words (BOW)**: Represent each text as a vector of word counts over the vocabulary, ignoring word order.
- **TF-IDF**: Weigh words based on their importance using term frequency and inverse document frequency.
- **Word2Vec**: Generate word vectors that capture semantic relationships between words.
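To make the first two representations concrete, here is a small hand-rolled sketch of Bag of Words and TF-IDF on three already-tokenized documents. It uses the textbook formulas (term frequency as count divided by document length, IDF as `log(N / df)`); library implementations typically add smoothing and normalization, so their numbers will differ. The documents and function names are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical tokenized documents.
docs = [
    ["free", "prize", "free"],
    ["meeting", "at", "noon"],
    ["free", "meeting"],
]

# Fixed, sorted vocabulary so every vector has the same column order.
vocab = sorted({word for doc in docs for word in doc})


def bow_vector(doc):
    """Count how often each vocabulary word occurs in the document."""
    counts = Counter(doc)
    return [counts[w] for w in vocab]


def idf(word):
    """Inverse document frequency: rarer words get a larger weight."""
    df = sum(1 for doc in docs if word in doc)
    return math.log(len(docs) / df)


def tfidf_vector(doc):
    """Term frequency (relative count) multiplied by IDF for each vocabulary word."""
    counts = Counter(doc)
    return [(counts[w] / len(doc)) * idf(w) for w in vocab]


bow = [bow_vector(d) for d in docs]
tfidf = [tfidf_vector(d) for d in docs]
```

In the first document, "free" occurs twice but also appears in another document, while "prize" occurs once and nowhere else, so TF-IDF gives "prize" the higher weight even though its raw count is lower.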
### 4. Training ML Algorithms
- Start with simple and effective models like **Multinomial Naive Bayes (MNB)**, and explore other algorithms like **Support Vector Machines (SVM)**, **Logistic Regression**, or **Random Forest** for more complex tasks.
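For intuition about why Multinomial Naive Bayes is a natural starting point, the following is a compact from-scratch sketch with Laplace (add-one) smoothing. It is illustrative only and is not the repository's implementation; the class name `TinyMultinomialNB` and the training messages are assumptions. In practice a library class such as scikit-learn's `MultinomialNB` would normally be used.

```python
import math
from collections import Counter


class TinyMultinomialNB:
    """Multinomial Naive Bayes over token lists, with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for doc in docs for w in doc}
        # Per-class word counts gathered from the training documents.
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(class_counts[c] / len(labels)) for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        scores = {}
        for c in self.classes:
            denom = self.total_words[c] + len(self.vocab)  # add-one smoothing denominator
            score = self.log_prior[c]
            for w in doc:
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[c][w] + 1) / denom)
            scores[c] = score
        return max(scores, key=scores.get)


# Hypothetical training data for illustration only.
train_docs = [["free", "prize", "win"], ["win", "cash", "free"],
              ["meeting", "at", "noon"], ["lunch", "at", "noon"]]
train_labels = ["spam", "spam", "ham", "ham"]
model = TinyMultinomialNB().fit(train_docs, train_labels)
prediction = model.predict(["free", "cash"])
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.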
## Models Used
### Sentiment Analysis
1. **Bag of Words (BoW) with Multinomial Naive Bayes (MNB)**
   - **Accuracy**: 85%
   - This model uses the Bag of Words approach to convert text into feature vectors and then classifies sentiments using Multinomial Naive Bayes.
2. **TF-IDF (Term Frequency-Inverse Document Frequency)**
   - **Accuracy**: 70%
   - TF-IDF evaluates the importance of words based on their frequency and inverse document frequency, and uses a machine learning classifier to predict sentiment.
3. **Average Word2Vec**
   - **Accuracy**: 75%
   - This model uses Word2Vec to convert words into vectors and averages the vectors to classify the sentiment of the text.
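The averaging step in the Word2Vec approach can be sketched as follows. The three-dimensional `toy_vectors` table is a hypothetical stand-in for a trained Word2Vec model, which would supply real vectors with many more dimensions; the function name and the zero-vector fallback for unknown words are also assumptions.

```python
# Hypothetical 3-dimensional embeddings standing in for a trained Word2Vec model.
toy_vectors = {
    "good": [1.0, 0.0, 2.0],
    "movie": [0.0, 1.0, 0.0],
    "bad": [-1.0, 0.0, -2.0],
}
DIM = 3


def average_vector(tokens, vectors, dim):
    """Average the vectors of all known tokens; fall back to zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) groups the i-th component of every vector together.
    return [sum(component) / len(known) for component in zip(*known)]


doc_vector = average_vector(["good", "movie", "unknownword"], toy_vectors, DIM)
```

The result is one fixed-length vector per document, regardless of how many words it contains, which is what a downstream classifier needs as input.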
### Spam Detection
1. **Bag of Words (BoW) with Multinomial Naive Bayes (MNB)**
   - **Accuracy**: 98%
   - This model uses the Bag of Words approach to convert text into feature vectors and then classifies messages as Spam or Ham using Multinomial Naive Bayes.
2. **TF-IDF**
   - **Accuracy**: 97%
   - This model evaluates the importance of words and classifies messages based on term frequency and inverse document frequency.
3. **Word2Vec**
   - **Accuracy**: 94%
   - This model uses Word2Vec to generate word embeddings and classifies the messages.
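Assuming the figures above are plain classification accuracy (the share of test messages whose predicted label matches the true label), the metric can be computed as in this small sketch. The function name and the label lists are invented for illustration and are not results from this project.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels: four of the five predictions are correct.
y_true = ["spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "spam", "spam", "ham"]
score = accuracy(y_true, y_pred)
```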
## Setup Instructions
### 1. Clone the Repository
To clone the repository, use the following command:
```bash
git clone https://github.com/prince2004patel/sentiment-spam-detection.git
```
### 2. Install Dependencies
To install the required dependencies for this project, follow these steps:
1. Ensure you are in the project directory:
```bash
cd sentiment-spam-detection
```
2. Install the Python libraries required by both the Flask back end and the Streamlit front end:
```bash
pip install -r requirements.txt
```
### 3. Run the Application
1. If you are not already in the project directory, change into it:
```bash
cd sentiment-spam-detection
```
2. Start the Streamlit app:
```bash
streamlit run app.py
```