https://github.com/mianmharoon/sentimentanalysis_coreml_emotionclassifier

Emotion classification iOS app using CoreML and SwiftUI – demo for sentiment and emotion analysis, with the model converted from Scikit-learn using coremltools.
https://github.com/mianmharoon/sentimentanalysis_coreml_emotionclassifier
ai coreml coreml-models emotionclassification ios machinelearning nlp python3 scikit-learn sentimentanalysis swift swiftui
Last synced: 2 months ago
JSON representation
Emotion classification iOS app using CoreML and SwiftUI – demo for sentiment and emotion analysis, with the model converted from Scikit-learn using coremltools.
Host: GitHub
URL: https://github.com/mianmharoon/sentimentanalysis_coreml_emotionclassifier
Owner: MianMHaroon
License: mit
Created: 2025-10-28T13:22:50.000Z (8 months ago)
Default Branch: main
Last Pushed: 2025-10-28T15:16:43.000Z (8 months ago)
Last Synced: 2025-10-28T15:24:00.464Z (8 months ago)
Topics: ai, coreml, coreml-models, emotionclassification, ios, machinelearning, nlp, python3, scikit-learn, sentimentanalysis, swift, swiftui
Language: Swift
Homepage:
Size: 1.28 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project

README

          # SentimentAnalysis_CoreMLDemo_EmotionClassifier_iOS_SwiftUI

This is an **iOS demo application** for **emotion classification** using **CoreML** and **SwiftUI**.

The project follows **MVVM architecture** with **async/await** and **actor-based model management**.

The **CoreML model** was converted from a **Scikit-learn pipeline** using **coremltools Python package**, including:

* `DictVectorizer` for feature extraction

* `LinearSVC` for classification



 



---

## Dataset

The training dataset is from [Kaggle: Emotions Dataset for NLP](https://www.kaggle.com/datasets/praveengovi/emotions-dataset-for-nlp/data). But I have added the preprocessed and formatted CSV data in the ***converter*** folder for direct use in training.

### Steps:

1. Original text files converted to **CSV** with two columns:

   * `text` → user sentence or phrase

   * `emotion` → label (e.g., joy, sadness, anger, fear, love, surprise)

2. Python example to convert `.txt` → `.csv`:

```python

import csv

from pathlib import Path

# Input and output file paths

input_file = Path("val.txt")

output_file = Path("val.csv")

# Ensure input exists

if not input_file.exists():

    raise FileNotFoundError(f"❌ Input file not found: {input_file}")

# Open the input text file and create the output CSV file

with input_file.open("r", encoding="utf-8") as infile, output_file.open("w", newline="", encoding="utf-8") as outfile:

    writer = csv.writer(outfile)

    writer.writerow(["text", "emotion"])  # CSV header

    

    count = 0

    for line in infile:

        line = line.strip()

        if not line:

            continue

        if ";" in line:

            text, emotion = line.split(";", 1)

            writer.writerow([text.strip(), emotion.strip()])

            count += 1

        else:

            print(f"⚠️ Skipping malformed line: {line}")

    

print(f"✅ Conversion complete! {count} entries saved to '{output_file.name}'")

```

---

## Python Setup

1. Install **Python 3.9+**

2. Install dependencies:

```bash

pip install pandas numpy nltk scikit-learn coremltools

```

3. Download NLTK data:

Step 1: Open Python 3 in your terminal

```python

python3

```

Step 2: Run the following commands in the Python prompt (>>>)

```python

>>> import nltk

>>> nltk.download('punkt')

>>> nltk.download('stopwords')

```

Step 3: Exit Python

```python

>>> exit()

```

---

## Training the Model

Run:

```bash

python3 train_emotion_model.py

```

**Steps performed:**

* Load CSV data

* Preprocess text

* Extract bag-of-words features

* Train **LinearSVC** using GridSearchCV

* Save **CoreML model** as `EmotionClassifier.mlmodel`

---

# How the Model Learns (Step by Step Example)

This section explains how the emotion classification model learns from training data using a step-by-step example with multiple iterations.

---

## Example Training Dataset

| Text                         | Emotion  |

| ---------------------------- | -------- |

| "I am happy today"           | joy      |

| "I feel sad"                 | sadness  |

| "I am really angry"          | anger    |

| "I love my family"           | love     |

| "I am scared of the dark"    | fear     |

| "Wow, I didn’t expect that!" | surprise |

| "I am ecstatic and joyful!"  | joy      |

| "I am very upset and angry"  | anger    |

| "I adore my friends"         | love     |

| "The night makes me anxious" | fear     |

| "I am shocked by the news"   | surprise |

| "Feeling sad and lonely"     | sadness  |

---

## Step 1: Convert Text → Features (Bag-of-Words)

Take `"I am happy today"` → tokenize, remove stopwords:

```

['happy', 'today']

```

Feature dict (all words from dataset considered):

```

{'happy':1, 'today':1, 'sad':0, 'angry':0, 'love':0, 'scared':0, 'wow':0, 'expect':0, 'ecstatic':0, 'joyful':0, 'upset':0, 'family':0, 'friends':0, 'anxious':0, 'shocked':0, 'lonely':0}

```

---

## Step 2: Initialize Weights

Each emotion has a weight vector for all words (initialized to 0):

```

weights_joy = [happy=0, today=0, sad=0, angry=0, love=0, scared=0, wow=0, expect=0, ecstatic=0, joyful=0, upset=0, family=0, friends=0, anxious=0, shocked=0, lonely=0]

weights_sadness = same...

weights_anger = same...

weights_love = same...

weights_fear = same...

weights_surprise = same...

```

---

## Step 3: Training Iterations

### **Iteration 1**: "I am happy today" → joy

**Update weights:**

* Increase weights for the true emotion for the words present.

* Decrease weights for all other emotions for the words present.

```

weights_joy      = [happy=2, today=1, ...]

weights_sadness  = [happy=-1, today=-0.5, ...]

weights_anger    = [happy=-1, today=-0.5, ...]

weights_love     = [happy=-0.5, today=-0.2, ...]

weights_fear     = [happy=-0.5, today=-0.2, ...]

weights_surprise = [happy=-0.5, today=-0.2, ...]

```

### **Iteration 2**: "I feel sad" → sadness

```

weights_sadness  = [happy=-1, today=-0.5, sad=2, ...]

weights_joy      = [happy=2, today=1, sad=-1, ...]

weights_anger    = [happy=-1, today=-0.5, sad=-1, ...]

weights_love     = [happy=-0.5, today=-0.2, sad=-0.5, ...]

weights_fear     = [happy=-0.5, today=-0.2, sad=-0.5, ...]

weights_surprise = [happy=-0.5, today=-0.2, sad=-0.5, ...]

```

### **Iteration 3**: "I am really angry" → anger

```

weights_anger   = [angry=2, really=1, ...]

weights_joy     = decrease weights for angry, really

weights_sadness = decrease weights for angry, really

weights_love    = decrease weights for angry, really

weights_fear    = decrease weights for angry, really

weights_surprise= decrease weights for angry, really

```

### **Iteration 4**: "I love my family" → love

```

weights_love    = [love=2, family=1, ...]

```

### **Iteration 5**: "I am scared of the dark" → fear

```

weights_fear    = [scared=2, dark=1, ...]

```

### **Iteration 6**: "Wow, I didn’t expect that!" → surprise

```

weights_surprise = [wow=2, expect=1, ...]

```

### **Iteration 7**: "I am ecstatic and joyful!" → joy

```

weights_joy      = [happy=2, today=1, ecstatic=1, joyful=1, ...]

weights_other   = decrease weights for ecstatic, joyful

```

### **Iteration 8**: "I am very upset and angry" → anger

```

weights_anger    = [angry=3, upset=2, ...]

weights_other    = decrease weights for angry, upset

```

### **Iteration 9**: "I adore my friends" → love

```

weights_love     = [love=3, family=1, friends=1, ...]

```

### **Iteration 10**: "The night makes me anxious" → fear

```

weights_fear     = [scared=2, dark=1, anxious=1, ...]

```

### **Iteration 11**: "I am shocked by the news" → surprise

```

weights_surprise = [wow=2, expect=1, shocked=1, ...]

```

### **Iteration 12**: "Feeling sad and lonely" → sadness

```

weights_sadness  = [sad=3, lonely=1, ...]

```

## Final Weights Table (Word → Emotion)

| Word      | Joy | Sadness | Anger | Love | Fear | Surprise |

| --------- | --- | ------- | ----- | ---- | ---- | -------- |

| happy     | 2   | -1      | -1    | -0.5 | -0.5 | -0.5     |

| today     | 1   | -0.5    | -0.5  | -0.2 | -0.2 | -0.2     |

| sad       | -1  | 3       | -1    | -0.5 | -0.5 | -0.5     |

| angry     | -1  | -1      | 3     | -0.5 | -0.5 | -0.5     |

| love      | -0.5| -0.5    | -0.5  | 3    | -0.5 | -0.5     |

| scared    | -0.5| -0.5    | -0.5  | -0.5 | 2    | -0.5     |

| wow       | -0.5| -0.5    | -0.5  | -0.5 | -0.5 | 2        |

| expect    | -0.5| -0.5    | -0.5  | -0.5 | -0.5 | 1        |

| ecstatic  | 1   | -0.5    | -0.5  | -0.5 | -0.5 | -0.5     |

| joyful    | 1   | -0.5    | -0.5  | -0.5 | -0.5 | -0.5     |

| upset     | -0.5| -0.5    | 2     | -0.5 | -0.5 | -0.5     |

| family    | -0.5| -0.5    | -0.5  | 1    | -0.5 | -0.5     |

| friends   | -0.5| -0.5    | -0.5  | 1    | -0.5 | -0.5     |

| anxious   | -0.5| -0.5    | -0.5  | -0.5 | 1    | -0.5     |

| shocked   | -0.5| -0.5    | -0.5  | -0.5 | -0.5 | 1        |

| lonely    | -0.5| 1       | -0.5  | -0.5 | -0.5 | -0.5     |

> **Note:** Positive values indicate the strength of association between a word and an emotion. Negative values reduce association with other emotions. This table is used for predicting emotions of new sentences.

---

## Step 4: Prediction Formula

When a new sentence comes in, the model computes:

```

score_emotion = sum(word_count * weight_for_that_emotion)

```

### Example: Emotion Prediction for a Sentence

Sentence: "I feel so happy and excited today!"

Steps:

1. Tokenize

   ['i', 'feel', 'so', 'happy', 'and', 'excited', 'today']

2. Remove stopwords

   ['happy', 'excited', 'today']

3. Create feature dictionary

   {'happy': 1, 'excited': 1, 'today': 1}

4. Compute `score_emotion` for each class

| Emotion  | Score Calculation                         | Score |

| -------- | ----------------------------------------- | ----- |

| Joy      | 2(happy) + 1(excited) + 1(today)          | 4     |

| Sadness  | -1(happy) + -0.5(excited) + -0.5(today)   | -2    |

| Anger    | -1(happy) + -0.5(excited) + -0.5(today)   | -2    |

| Love     | -0.5(happy) + -0.5(excited) + -0.5(today) | -1.5  |

| Fear     | -0.5(happy) + -0.5(excited) + -0.5(today) | -1.5  |

| Surprise | -0.5(happy) + -0.5(excited) + -0.5(today) | -1.5  |

Predicted Emotion: Joy (highest score = 4)

---

# Evaluation Metrics Explained

This section provides detailed explanations of all the evaluation metrics used in the training report of the emotion classification model.

---

## 1. Precision

* **Definition:** Measures how many of the predicted instances for a class are actually correct.

* **Formula:**

```math

Precision = True Positives / (True Positives + False Positives)

```

* **Example:**

  * Model predicts 10 texts as `joy`

  * 8 of them are actually `joy`

  * Precision = 8 / 10 = 0.8

* **Interpretation:** High precision means fewer **false positives**.

---

## 2. Recall (Sensitivity or True Positive Rate)

* **Definition:** Measures how many of the actual instances of a class were correctly predicted.

* **Formula:**

```math

Recall = True Positives / (True Positives + False Negatives)

```

* **Example:**

  * There are 12 actual `joy` texts

  * Model correctly predicted 8 of them

  * Recall = 8 / 12 ≈ 0.667

* **Interpretation:** High recall means fewer **false negatives**.

---

## 3. F1-Score

* **Definition:** Harmonic mean of precision and recall, balancing both metrics.

* **Formula:**

```math

F1 = 2 * (Precision * Recall) / (Precision + Recall)

```

* **Example:**

  * Precision = 0.8, Recall = 0.667

  * F1 = 2 * (0.8 * 0.667) / (0.8 + 0.667) ≈ 0.727

* **Interpretation:** High F1-score → model is both accurate and complete for that class.

---

## 4. Support

* **Definition:** Number of actual instances of each class in the dataset.

* **Example:**

  * If there are 432 `anger` texts in validation set → support = 432

* **Interpretation:** Helps understand class distribution and performance per class.

---

## 5. Accuracy

* **Definition:** Overall ratio of correct predictions to total predictions.

* **Formula:**

```math

Accuracy = Total Correct Predictions / Total Predictions

```

* **Interpretation:** Overall measure of model correctness.

---

## 6. Macro Average (Macro Avg)

* **Definition:** Average of precision, recall, and F1-score across all classes equally, ignoring class imbalance.

* **Interpretation:** Treats all classes as equally important.

---

## 7. Weighted Average (Weighted Avg)

* **Definition:** Average of precision, recall, and F1-score weighted by number of instances (support) per class.

* **Interpretation:** Reflects performance considering class distribution.

---

## 8. Confusion Matrix

* **Definition:** Table showing the counts of actual vs predicted classes.

* **Structure:** Rows = actual classes, Columns = predicted classes

* **Example:**

| Actual \ Predicted | joy | sadness | anger | love | fear | surprise |

| ------------------ | --- | ------- | ----- | ---- | ---- | -------- |

| joy                | 8   | 3       | 1     | 0    | 0    | 0        |

* **Interpretation:** Helps identify which classes the model confuses. Each cell indicates how many instances of an actual class were predicted as a particular class.

---

# Actual Test Report Discussion

This section explains the training and validation results obtained from the emotion classification model.

---

## Training and Validation Accuracy

* **Training Accuracy:** 0.9817

  * Indicates that the model correctly predicted 98.17% of training examples.

  * High accuracy shows the model has learned the patterns in the training data well.

* **Validation Accuracy:** 0.8978

  * Indicates 89.78% correct predictions on unseen validation data.

  * Slightly lower than training accuracy, showing a good balance and not much overfitting.

---

## Classification Report

| Emotion  | Precision | Recall | F1-Score | Support |

| -------- | --------- | ------ | -------- | ------- |

| anger    | 0.889     | 0.873  | 0.881    | 432     |

| fear     | 0.869     | 0.889  | 0.879    | 387     |

| joy      | 0.901     | 0.927  | 0.914    | 1072    |

| love     | 0.804     | 0.785  | 0.795    | 261     |

| sadness  | 0.937     | 0.929  | 0.933    | 933     |

| surprise | 0.887     | 0.748  | 0.811    | 115     |

### Observations

* **Best performing classes:** `joy` and `sadness` with high precision, recall, and F1-score. The model easily recognizes these due to larger support.

* **Lower performing class:** `surprise` with lower recall (0.748), likely due to fewer examples and overlapping features with other classes.

* **Balanced classes:** `anger` and `fear` show balanced performance (precision ~0.87-0.89, recall ~0.87-0.89).

* **Love:** Moderate performance, possibly due to limited vocabulary and overlap with joy/love sentiments.

---

## Confusion Matrix

```

[[377   7  21   3  24   0]

 [ 11 344   9   2  14   7]

 [ 18   3 994  39  15   3]

 [  2   1  50 205   3   0]

 [ 16  19  24   6 867   1]

 [  0  22   5   0   2  86]]

```

Confusion Matrix with Emotion Labels

```

| Actual \ Predicted | joy | sadness | anger | love | fear | surprise |

| ------------------ | --- | ------- | ----- | ---- | ---- | -------- |

| joy                | 377 | 7       | 21    | 3    | 24   | 0        |

| sadness            | 11  | 344     | 9     | 2    | 14   | 7        |

| anger              | 18  | 3       | 994   | 39   | 15   | 3        |

| love               | 2   | 1       | 50    | 205  | 3    | 0        |

| fear               | 16  | 19      | 24    | 6    | 867  | 1        |

| surprise           | 0   | 22      | 5     | 0    | 2    | 86       |

```

### Interpretation

* **Rows** = Actual emotions

* **Columns** = Predicted emotions

* **Diagonal values** = Correct predictions

* **Off-diagonal values** = Misclassifications

### Example Insights

* 21 `joy` texts misclassified as `anger`

* 22 `surprise` texts misclassified as `sadness`

* Diagonal values indicate how well each class is predicted

---

## Key Takeaways

1. The model is **highly accurate** for common emotions like joy and sadness.

2. Performance drops for **rare or subtle emotions** like surprise and love.

3. Confusion matrix provides insight for **future improvements** such as adding more training examples for underrepresented classes.

4. Overall, the model generalizes well to unseen data with 89.78% validation accuracy.

---

## iOS App Usage

1. Open `EmotionClassifier.xcodeproj` in Xcode 14+

2. Run on simulator/device

3. Enter text → tap **Analyze** → shows predicted emotion & confidence

---

## Tech Stack

* Swift 5 / SwiftUI

* CoreML

* MVVM (async/await, actor)

* Python (Scikit-learn → CoreML)

---

## License

MIT License

---

## Author

* Muhammad Haroon

* Email: mianmharoon72@gmail.com

* LinkedIn: https://www.linkedin.com/in/mian-haroon

---

## References

* [CoreML Tools Documentation](https://coremltools.readme.io/)

* [Emotions Dataset on Kaggle](https://www.kaggle.com/datasets/praveengovi/emotions-dataset-for-nlp/data)

* [Scikit-learn Documentation](https://scikit-learn.org/)

* [SwiftUI Documentation](https://developer.apple.com/documentation/swiftui)
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/mianmharoon/sentimentanalysis_coreml_emotionclassifier

Awesome Lists containing this project

README