Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mlh-fellowship/0.1.2-sentiment-analysis-visualization
Machine Learning Web Application. Helps to visualize a character-by-character breakdown of how sentiment analysis classifies text
- Host: GitHub
- URL: https://github.com/mlh-fellowship/0.1.2-sentiment-analysis-visualization
- Owner: MLH-Fellowship
- License: mit
- Created: 2020-06-02T19:37:21.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2023-03-25T00:24:39.000Z (almost 2 years ago)
- Last Synced: 2024-05-01T15:34:02.633Z (9 months ago)
- Topics: bentoml, keras, keras-neural-networks, lstm, lstm-sentiment-analysis, machine-learning, machine-learning-algorithms, sentiment-analysis-visualization, visualizations
- Language: Python
- Homepage: https://mlh-fellowship.github.io/0.1.2-sentiment-analysis-visualization/
- Size: 43.2 MB
- Stars: 5
- Watchers: 3
- Forks: 5
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# Sentiment Analysis Visualization
[![Status](https://img.shields.io/badge/status-active-success.svg)]()
[![GitHub Issues](https://img.shields.io/github/issues/MLH-Fellowship/0.1.2-sentiment-analysis-visualization.svg)](https://github.com/MLH-Fellowship/0.1.2-sentiment-analysis-visualization/issues)
[![GitHub Pull Requests](https://img.shields.io/github/issues-pr/MLH-Fellowship/0.1.2-sentiment-analysis-visualization.svg)](https://github.com/MLH-Fellowship/0.1.2-sentiment-analysis-visualization/pulls)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-------
## Pod 0.1.2
A web app that helps to visualize a word-by-word breakdown of how sentiment analysis classifies text.

![Frontend View](https://user-images.githubusercontent.com/23178940/83907584-858fbf00-a71a-11ea-8476-7445c0e16ffe.png)
-------
## Major goals
- [x] Research and decide on a machine learning model/architecture
- [x] Pick out 2-3 datasets we can use to train
- [x] Build a training pipeline
- [x] Train and implement the model
- [x] Serve the model using BentoML as an API
- [x] Create a web app to take in input and visualize the output
-------
## Calling the API
Our service is hosted at https://sentiment-classifier-gy7t3p45oq-uc.a.run.app/
Predictions are served by making a `POST` request to `https://sentiment-classifier-gy7t3p45oq-uc.a.run.app/predict`.
```bash
# e.g.
curl -X POST "https://sentiment-classifier-gy7t3p45oq-uc.a.run.app/predict" \
-H "accept: */*" -H "Content-Type: application/json" \
-d "{\"text\":\"Some example text.\"}"
```
Make sure to set the content type to JSON and send a body in the format
```json
{
"text": "content"
}
```
If successful, you should get a `200 OK` status and a body along the lines of `[[0.8614905476570129], [0.7018478512763977], [0.617088258266449]]`, where each entry represents the sentiment of each word, from 0 (negative) to 1 (positive).
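For reference, the same request can be made from Python. This is a minimal sketch using only the standard library; the endpoint URL and request/response shapes are exactly those described above, and the helper names are our own:

```python
import json
import urllib.request

ENDPOINT = "https://sentiment-classifier-gy7t3p45oq-uc.a.run.app/predict"

def build_request(text):
    """Build a POST request with the JSON body the /predict endpoint expects."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "*/*"},
        method="POST",
    )

def predict(text):
    """Send the request and parse the per-word sentiment scores."""
    with urllib.request.urlopen(build_request(text)) as resp:
        return json.loads(resp.read().decode("utf-8"))

if __name__ == "__main__":
    # Each entry is the sentiment after each word, from 0 (negative) to 1 (positive).
    print(predict("Some example text."))
```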
-------
## Training a new model
Currently, we have only implemented a training pipeline for the IMDB dataset, but this is subject to change in the future. You can train a new classifier on the dataset by running
```bash
python train.py
```
This will replace the current model in `/model`: `model.json` stores the model architecture, `weights.h5` stores the trained weights, and `tokenizer.json` stores the word indices.
-------
## Packaging it with BentoML
BentoML helps us easily serve our Keras model through an API. You can package a new API by running
```bash
python bento_service_packager.py
> ...
> [0.07744759]
> [0.1166597 ]
> [0.18447165]
> [0.20329727]
> [0.24308157]
> [0.25030023]]
> _____
> saved model path: /Users/jzhao/bentoml/repository/SentimentClassifierService/20200604214004_F641D2
```
If you'd like to save the packaged API, just copy the contents into `/bento_deploy`:
```bash
cp -r /Users/jzhao/bentoml/repository/SentimentClassifierService/20200604214004_F641D2/* bento_deploy
# or whatever the autogenerated URI is
```
There are a few dependency nuances to be aware of before building the actual Docker image. To make sure the build doesn't error out, edit `bento_deploy/requirements.txt` to contain
```pip
tensorflow==2.1.0
sklearn
bentoml==0.7.8
```
Then, we can build and run the image as follows
```bash
docker build -t bento-classifier:latest .
docker run -p 5000:5000 bento-classifier:latest
```
Then, visit `localhost:5000` to see the BentoML server!
-------
## Simple deep LSTM architecture
```python
> model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding (Embedding) (None, 100, 64) 320000
_________________________________________________________________
lstm (LSTM) (None, 100, 64) 33024
_________________________________________________________________
dropout (Dropout) (None, 100, 64) 0
_________________________________________________________________
lstm_1 (LSTM) (None, 64) 33024
_________________________________________________________________
FC1 (Dense) (None, 256) 16640
_________________________________________________________________
dropout_1 (Dropout) (None, 256) 0
_________________________________________________________________
out_layer (Dense) (None, 1) 257
_________________________________________________________________
activation (Activation) (None, 1) 0
=================================================================
Total params: 402,945
Trainable params: 402,945
Non-trainable params: 0
_________________________________________________________________
```
-------
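The parameter counts in the summary above follow from the standard Keras formulas. As a quick sanity check in plain Python (the vocab size of 5,000 and embedding dimension of 64 are inferred from the layer shapes shown):

```python
def lstm_params(input_dim, units):
    # An LSTM has 4 gates, each with a kernel, a recurrent kernel, and a bias.
    return 4 * ((input_dim + units) * units + units)

def dense_params(input_dim, units):
    # Weight matrix plus one bias per unit.
    return input_dim * units + units

counts = {
    "embedding": 5000 * 64,          # 5k vocab, 64-dim embeddings -> 320,000
    "lstm": lstm_params(64, 64),     # 33,024
    "lstm_1": lstm_params(64, 64),   # 33,024
    "FC1": dense_params(64, 256),    # 16,640
    "out_layer": dense_params(256, 1),  # 257
}

# Dropout and Activation layers add no parameters.
assert sum(counts.values()) == 402_945  # matches "Total params" above
```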
## Data and training process
* 85% / 15% train-test split
* dataset is balanced (25k positive, 25k negative)
* RMSProp with a 1e-3 learning rate and early stopping with a patience of 2 epochs
* preprocessing
  * to lowercase
  * removed punctuation
  * removed `<br />` tags
  * tokenized with a vocab size of 5k
  * max sequence length of 100
* achieved 82.2% accuracy
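
The preprocessing steps above can be sketched roughly as follows. This is a simplified illustration in plain Python (the actual pipeline lives in `train.py` and uses the Keras tokenizer; the helper names here are hypothetical):

```python
import re
import string
from collections import Counter

MAX_VOCAB = 5_000  # vocab size from the list above
MAX_LEN = 100      # max sequence length

def clean_text(text):
    """Lowercase, strip <br /> tags, and remove punctuation."""
    text = text.lower()
    text = re.sub(r"<br\s*/?>", " ", text)
    return text.translate(str.maketrans("", "", string.punctuation)).strip()

def build_vocab(texts):
    """Map the MAX_VOCAB most frequent words to integer indices (1-based; 0 is reserved)."""
    counts = Counter(word for t in texts for word in clean_text(t).split())
    return {w: i + 1 for i, (w, _) in enumerate(counts.most_common(MAX_VOCAB))}

def encode(text, vocab):
    """Convert text to a fixed-length sequence of word indices, truncated or zero-padded to MAX_LEN."""
    ids = [vocab.get(w, 0) for w in clean_text(text).split()][:MAX_LEN]
    return ids + [0] * (MAX_LEN - len(ids))
```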