# reflect-nlp
[![CircleCI](https://circleci.com/gh/jackyzha0/reflect-nlp.svg?style=svg)](https://circleci.com/gh/jackyzha0/reflect-nlp) [![](http://godoc.org/github.com/jackyzha0/reflect-nlp/ingress?status.svg)](http://godoc.org/github.com/jackyzha0/reflect-nlp/ingress) [![Go Report Card](https://goreportcard.com/badge/github.com/jackyzha0/reflect-nlp)](https://goreportcard.com/report/github.com/jackyzha0/reflect-nlp)
The backend of reflect, which determines intent validity and collects stats.
The frontend Chrome extension lives in [the main repo](https://github.com/jackyzha0/reflect-chrome).

![K8s cluster](readme_sources/diagram.png)
Anything related to the ingress controller can be found in `/ingress`. All the NLP stuff can be found in `/nlp`.
### Local Docker build instructions
Note: There is no need to do this when deploying, as the Docker images will be rebuilt by `CircleCI`.
1. Build ingress proxy image: `docker build -t jzhao2k19/reflect-nlp-ingress:latest ingress`
2. Build NLP model image: `docker build -t jzhao2k19/reflect-nlp:latest nlp`
3. Push both Docker images to Docker Hub: `docker push jzhao2k19/reflect-nlp-ingress:latest` and `docker push jzhao2k19/reflect-nlp:latest`

### Running the K8s deployment locally
First, ensure you have [Docker Desktop](https://www.docker.com/products/docker-desktop) installed. There are a few other requirements that you should install as well.

1. VirtualBox
You can get this through homebrew by doing `brew install virtualbox`. VirtualBox allows us to run the VMs.
2. kubectl
Pronounced cube-control, kubectl is the command line interface for talking to K8s. Install it by doing `brew install kubectl`.
3. minikube
minikube allows you to run a K8s cluster right on your laptop! Install it by doing `brew install minikube`.

Finally, enable some addons for minikube which allow us to configure the horizontal pod autoscalers (HPAs).
`minikube addons enable heapster`
`minikube addons enable metrics-server`

To spin up the K8s cluster, start minikube, then use kubectl to apply our config.
`minikube start`
`kubectl apply -f k8s_local.yml`
To see if this succeeded, run `kubectl get pods`. It should give you something that looks like the following.

```bash
➜ kubectl get pods
NAME                       READY   STATUS    RESTARTS   AGE
ingress-54f9b89fc4-nxl47   2/2     Running   0          5m
nlp-5f77b4946-vs4l2        1/1     Running   0          5m
```

Note that `ingress` shows `2/2` because there is a CloudSQL sidecar container in the pod. Wait a few minutes for the LoadBalancer to be assigned an external IP address, then run `kubectl get svc` to list running services.
```bash
➜ kubectl get svc
NAME              TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)           AGE
ingress-service   LoadBalancer   10.106.149.250   <pending>     80:31274/TCP      7m
kubernetes        ClusterIP      10.96.0.1        <none>        443/TCP           7m
nlp-service       NodePort       10.103.174.142   <none>        30000:32610/TCP   7m
```

Note that on GKE, the `EXTERNAL-IP` of `ingress-service` would be configured for you. However, if you're developing locally on `minikube`, you can just access the service by doing `minikube service ingress-service`.
You can check which HPAs are running by doing `kubectl get hpa`.
```bash
➜ kubectl get hpa
NAME      REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
ingress   Deployment/ingress   0%/80%    1         3         1          9m
nlp       Deployment/nlp       32%/50%   1         10        1          9m
```

To test if everything is healthy, run `minikube service ingress-service`, which will open a new browser window. This page should give you a `404` as nothing is listening on `/`. However, if you visit `/healthcheck`, you should see a nice JSON like so:
```json
{
"proxyAlive": true,
"modelAlive": true
}
```
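As a quick extra check, here is a minimal sketch of probing this endpoint from Python (assumes the `requests` package is installed; `BASE` is a placeholder for whatever URL `minikube service ingress-service --url` prints for you):

```python
# Sketch only: probe the ingress healthcheck endpoint.
import requests

BASE = "http://192.168.99.100:31274"  # placeholder; use your minikube URL
resp = requests.get(BASE + "/healthcheck")
resp.raise_for_status()
print(resp.json())  # expect {'proxyAlive': True, 'modelAlive': True}
```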
### Exporting intents
Local logged intents can be exported as a CSV by hitting `/export`. This endpoint is rate-limited.
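For example, a sketch of grabbing the export from Python (same hypothetical `BASE` as in the healthcheck sketch above):

```python
# Sketch only: download the logged intents as a CSV.
import requests

BASE = "http://192.168.99.100:31274"  # placeholder; use your minikube URL
resp = requests.get(BASE + "/export")
resp.raise_for_status()
with open("intents.csv", "wb") as f:
    f.write(resp.content)  # the endpoint is rate-limited, so save the result
```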
### Running the NLP Model
This project depends on a handful of Python libraries. Install them by doing `pip install sklearn keras pandas numpy matplotlib`.

For local development, you can run the server by doing `python server.py`, which will start a local server on port 5000.
### Training the NLP Model
You can train a new version of the neural network on the `data/survey.csv` data by doing `python train.py`. This will begin training a basic 64-cell LSTM model (which is defined in `net.py`). You can configure the training parameters, which are constants at the top of `train.py`.
```python
TOKENIZER_VOCAB_SIZE = 500 # Vocabulary size of the tokenizer
SEQUENCE_MAX_LENGTH = 75 # Maximum sequence length, all seqs are padded to this
BATCH_SIZE = 128 # number of examples per batch
NUM_EPOCHS = 10 # number of epochs to train for (an epoch is one iteration of the entire dataset)
TRAIN_TEST_SPLIT = 0.1 # percentage of data to use for testing
VALIDATION_SPLIT = 0.1 # percentage of training data to use for validation
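
# Illustrative only (not verbatim from train.py): these constants plug into
# standard Keras calls roughly like so:
#   tokenizer = Tokenizer(num_words=TOKENIZER_VOCAB_SIZE)
#   X = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=SEQUENCE_MAX_LENGTH)
#   model.fit(X, y, batch_size=BATCH_SIZE, epochs=NUM_EPOCHS,
#             validation_split=VALIDATION_SPLIT)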
```

Trained models are stored in the `models` folder. Each model is under its own folder, whose structure looks as follows:
```
models
| - acc%%.%%           # where %%.%% represents accuracy on the test set
| | - details.yml      # stores training details
| | - model.json       # stores model architecture
| | - tokenizer.json   # stores tokenizer embeddings
| | - weights.h5       # stores weights for neural connections
| ...
```
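For reference, here is a minimal sketch of how one of these folders could be loaded back into Keras (`load_saved` is a hypothetical helper, not part of this repo; the server's `Model` class handles this internally):

```python
# Hypothetical loader, for illustration only.
from keras.models import model_from_json
from keras.preprocessing.text import tokenizer_from_json

def load_saved(folder):
    with open(folder + "/model.json") as f:
        model = model_from_json(f.read())          # architecture
    model.load_weights(folder + "/weights.h5")     # trained weights
    with open(folder + "/tokenizer.json") as f:
        tokenizer = tokenizer_from_json(f.read())  # tokenizer state
    return model, tokenizer

model, tokenizer = load_saved("models/acc85.95")
```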
### Converting models for use in `tensorflow.js`
Tensorflow.js requires a different format for saved models. We can convert these using the `tensorflowjs_converter` tool. Run `./convert_to_js.sh <model_name>` to convert said model into a tensorflow.js-usable format. You can find the output in `nlp/converted_models`.

### NLP Model CLI
You can also run the NLP model through the command line (given the model exists) by providing arguments to `serve_model.py`. Example usage is as follows:
```bash
# usage: serve_model.py -m <model> -t <threshold> -i <intent>
python serve_model.py -i "I need to make a marketing post"
# Predicting using model acc81.08 with threshold 0.50 on intent "I need to make a marketing post"
# Output -> True

python serve_model.py -i "I want to browse memes"
# Predicting using model acc81.08 with threshold 0.50 on intent "I want to browse memes"
# Output -> False
```

### Using different NLP models on the server
Currently, the server runs the `acc85.95` model by default. This is defined in `server.py` as follows:
```python
if __name__ == '__main__':
    logging.info("Starting server...")
    m = Model("acc85.95", threshold=0.5)
    app.run()
```

You may change the model name and threshold as you see fit.
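For example, to serve the `acc81.08` model from the CLI examples above with a stricter threshold:

```python
m = Model("acc81.08", threshold=0.65)  # 0.65 is an arbitrary example value
```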
### Data Usage
All data found in `data/survey.csv` was collected from [this survey](http://bit.ly/reflectdata), which our team sent out in January 2020. You may use this data to train your own models.