Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mlh-fellowship/0.1.2-sentiment-analysis-visualization
Machine Learning Web Application. Helps to visualize a character-by-character breakdown of how sentiment analysis classifies text
- Host: GitHub
- URL: https://github.com/mlh-fellowship/0.1.2-sentiment-analysis-visualization
- Owner: MLH-Fellowship
- License: mit
- Created: 2020-06-02T19:37:21.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2023-03-25T00:24:39.000Z (almost 2 years ago)
- Last Synced: 2024-05-01T15:34:02.633Z (9 months ago)
- Topics: bentoml, keras, keras-neural-networks, lstm, lstm-sentiment-analysis, machine-learning, machine-learning-algorithms, sentiment-analysis-visualization, visualizations
- Language: Python
- Homepage: https://mlh-fellowship.github.io/0.1.2-sentiment-analysis-visualization/
- Size: 43.2 MB
- Stars: 5
- Watchers: 3
- Forks: 5
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# Sentiment Analysis Visualization
[![Status](https://img.shields.io/badge/status-active-success.svg)]()
[![GitHub Issues](https://img.shields.io/github/issues/MLH-Fellowship/0.1.2-sentiment-analysis-visualization.svg)](https://github.com/MLH-Fellowship/0.1.2-sentiment-analysis-visualization/issues)
[![GitHub Pull Requests](https://img.shields.io/github/issues-pr/MLH-Fellowship/0.1.2-sentiment-analysis-visualization.svg)](https://github.com/MLH-Fellowship/0.1.2-sentiment-analysis-visualization/pulls)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-------
## Pod 0.1.2
A web app that helps to visualize a word-by-word breakdown of how sentiment analysis classifies text.

![Frontend View](https://user-images.githubusercontent.com/23178940/83907584-858fbf00-a71a-11ea-8476-7445c0e16ffe.png)
-------
## Major goals
- [x] Research and decide on a machine learning model/architecture
- [x] Pick out 2-3 datasets we can use to train
- [x] Build a training pipeline
- [x] Train and implement the model
- [x] Serve the model using BentoML as an API
- [x] Create a web app to take in input and visualize the output
-------
## Calling the API
Our service is hosted at https://sentiment-classifier-gy7t3p45oq-uc.a.run.app/
Predictions are served by making a `POST` request to `https://sentiment-classifier-gy7t3p45oq-uc.a.run.app/predict`.
```bash
# e.g.
curl -X POST "https://sentiment-classifier-gy7t3p45oq-uc.a.run.app/predict" \
-H "accept: */*" -H "Content-Type: application/json" \
-d "{\"text\":\"Some example text.\"}"
```
Make sure to set the content type to JSON and send a body in the format
```json
{
"text": "content"
}
```
If successful, you should get a `200 OK` status and a body along the lines of `[[0.8614905476570129], [0.7018478512763977], [0.617088258266449]]`, where each entry represents the sentiment of each word, from 0 (negative) to 1 (positive).
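For reference, the same request can be made from Python. This is a minimal sketch using only the standard library; the endpoint URL and request/response shapes are exactly those described above, and the helper names are our own:

```python
import json
import urllib.request

ENDPOINT = "https://sentiment-classifier-gy7t3p45oq-uc.a.run.app/predict"

def build_request(text):
    """Build a POST request with the JSON body the /predict endpoint expects."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "*/*"},
        method="POST",
    )

def predict(text):
    """Send the request and parse the per-word sentiment scores."""
    with urllib.request.urlopen(build_request(text)) as resp:
        return json.loads(resp.read().decode("utf-8"))

if __name__ == "__main__":
    # Each entry is the sentiment after each word, from 0 (negative) to 1 (positive).
    print(predict("Some example text."))
```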
-------
## Training a new model
Currently, we have only implemented a training pipeline for the IMDB dataset, but this is subject to change in the future. You can train a new classifier on the dataset by running
```bash
python train.py
```
This will replace the current model in `/model`: `model.json` stores the model architecture, `weights.h5` stores the trained weights, and `tokenizer.json` stores the word indices.
-------
## Packaging it with BentoML
BentoML helps us easily serve our Keras model through an API. You can package a new API by running
```bash
python bento_service_packager.py
> ...
> [0.07744759]
> [0.1166597 ]
> [0.18447165]
> [0.20329727]
> [0.24308157]
> [0.25030023]]
> _____
> saved model path: /Users/jzhao/bentoml/repository/SentimentClassifierService/20200604214004_F641D2
```
If you'd like to save the packaged API, just copy the contents into `/bento_deploy`:
```bash
cp -r /Users/jzhao/bentoml/repository/SentimentClassifierService/20200604214004_F641D2/* bento_deploy
# or whatever the autogenerated URI is
```
There are a few dependency nuances to be aware of before building the actual Docker image. To make sure the build doesn't error out, edit `bento_deploy/requirements.txt` to contain
```pip
tensorflow==2.1.0
sklearn
bentoml==0.7.8
```
Then, we can build and run the image as follows
```bash
docker build -t bento-classifier:latest .
docker run -p 5000:5000 bento-classifier:latest
```
Then, visit `localhost:5000` to see the BentoML server!
-------
## Simple deep LSTM architecture
```python
> model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding (Embedding) (None, 100, 64) 320000
_________________________________________________________________
lstm (LSTM) (None, 100, 64) 33024
_________________________________________________________________
dropout (Dropout) (None, 100, 64) 0
_________________________________________________________________
lstm_1 (LSTM) (None, 64) 33024
_________________________________________________________________
FC1 (Dense) (None, 256) 16640
_________________________________________________________________
dropout_1 (Dropout) (None, 256) 0
_________________________________________________________________
out_layer (Dense) (None, 1) 257
_________________________________________________________________
activation (Activation) (None, 1) 0
=================================================================
Total params: 402,945
Trainable params: 402,945
Non-trainable params: 0
_________________________________________________________________
```
-------
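The parameter counts in the summary above follow from the standard Keras formulas. As a quick sanity check in plain Python (the vocab size of 5,000 and embedding dimension of 64 are inferred from the layer shapes shown):

```python
def lstm_params(input_dim, units):
    # An LSTM has 4 gates, each with a kernel, a recurrent kernel, and a bias.
    return 4 * ((input_dim + units) * units + units)

def dense_params(input_dim, units):
    # Weight matrix plus one bias per unit.
    return input_dim * units + units

counts = {
    "embedding": 5000 * 64,          # 5k vocab, 64-dim embeddings -> 320,000
    "lstm": lstm_params(64, 64),     # 33,024
    "lstm_1": lstm_params(64, 64),   # 33,024
    "FC1": dense_params(64, 256),    # 16,640
    "out_layer": dense_params(256, 1),  # 257
}

# Dropout and Activation layers add no parameters.
assert sum(counts.values()) == 402_945  # matches "Total params" above
```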
## Data and training process
* 85% / 15% train-test split
* dataset is balanced (25k positive, 25k negative)
* RMSProp with a 1e-3 learning rate and early stopping with a patience of 2 epochs
* preprocessing
  * to lowercase
  * removed punctuation
  * removed `<br />` tags
  * tokenized with a vocab size of 5k
  * max sequence length of 100
* achieved 82.2% accuracy
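
The preprocessing steps above can be sketched roughly as follows. This is a simplified illustration in plain Python (the actual pipeline lives in `train.py` and uses the Keras tokenizer; the helper names here are hypothetical):

```python
import re
import string
from collections import Counter

MAX_VOCAB = 5_000  # vocab size from the list above
MAX_LEN = 100      # max sequence length

def clean_text(text):
    """Lowercase, strip <br /> tags, and remove punctuation."""
    text = text.lower()
    text = re.sub(r"<br\s*/?>", " ", text)
    return text.translate(str.maketrans("", "", string.punctuation)).strip()

def build_vocab(texts):
    """Map the MAX_VOCAB most frequent words to integer indices (1-based; 0 is reserved)."""
    counts = Counter(word for t in texts for word in clean_text(t).split())
    return {w: i + 1 for i, (w, _) in enumerate(counts.most_common(MAX_VOCAB))}

def encode(text, vocab):
    """Convert text to a fixed-length sequence of word indices, truncated or zero-padded to MAX_LEN."""
    ids = [vocab.get(w, 0) for w in clean_text(text).split()][:MAX_LEN]
    return ids + [0] * (MAX_LEN - len(ids))
```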