Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/t04glovern/gpt2-k8s-cloud-run
Containerising PyTorch models in a repeatable way. Deploy OpenAI's GPT-2 model and expose it over a Flask API. Finally deploy it to GCP repositories and publish it on a k8s cluster using Cloud Run.
cloud-run gcp gpt2 k8s k8s-cluster serverless
Last synced: 2 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/t04glovern/gpt2-k8s-cloud-run
- Owner: t04glovern
- License: mit
- Created: 2019-04-11T13:53:44.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2023-05-22T21:56:35.000Z (over 1 year ago)
- Last Synced: 2024-07-30T17:41:38.088Z (6 months ago)
- Topics: cloud-run, gcp, gpt2, k8s, k8s-cluster, serverless
- Language: Python
- Homepage: https://devopstar.com/2019/04/12/serverless-containers-with-google-cloud-run/
- Size: 1.33 MB
- Stars: 10
- Watchers: 2
- Forks: 1
- Open Issues: 2
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
Awesome Lists containing this project
README
# Cloud Run - GPT2 on k8s
Containerising PyTorch models in a repeatable way. Deploy OpenAI's GPT-2 model and expose it over a Flask API. Finally deploy it to GCP repositories and publish it on a k8s cluster using Cloud Run.
First, before anything else, download the model:
```bash
mkdir models
curl --output models/gpt2-pytorch_model.bin https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-pytorch_model.bin
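# Sanity check (not in the original README): the weights file should be roughly 500 MB
ls -lh models/gpt2-pytorch_model.bin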
```

---
## Local
---
### Local Python
```bash
python3 -m venv ./venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```

Then run the Python Flask server using the following:
```bash
cd deployment
python run_server.py
```

### Conda
```bash
conda env create -f environment.yml -n ml-flask
source activate ml-flask
```

Then run the Python Flask server using the following:
```bash
cd deployment
python run_server.py
```

### docker-compose
#### Setup
```bash
docker-compose up --build flask
```

Go to [http://localhost:5000](http://localhost:5000)
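If you prefer the command line over the browser, a plain request against the base URL is enough to confirm the container is serving (the actual inference route and payload are defined by the app in `deployment/run_server.py` and are not assumed here):

```bash
# docker-compose exposes the Flask app on port 5000
curl -i http://localhost:5000/
```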
#### Shutdown
```bash
docker-compose down -v
```

---
## GCP
---
First, build and push the container to GCR (make sure to update the variables at the top of the script first):
```bash
./container_push.sh
# #!/usr/bin/env bash
# PROJECT_ID=devopstar
#
# # Set gcloud project
# gcloud config set project $PROJECT_ID
#
# # Authenticate Docker
# gcloud auth configure-docker
#
# # Build Container
# docker build -t gpt-2-flask-api .
#
# # Tag Image for GCR
# docker tag gpt-2-flask-api:latest \
#     asia.gcr.io/$PROJECT_ID/gpt-2-flask-api:latest
#
# # Push to GCR
# docker push asia.gcr.io/$PROJECT_ID/gpt-2-flask-api:latest
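#
# # Not part of container_push.sh: confirm the image is now in GCR
# gcloud container images list --repository "asia.gcr.io/$PROJECT_ID"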
```

## Create the cluster
Making use of the following guide: [https://cloud.google.com/run/docs/quickstarts/prebuilt-deploy-gke](https://cloud.google.com/run/docs/quickstarts/prebuilt-deploy-gke)
Run `./gke.sh` after changing the project ID (and region if you want somewhere closer)
```bash
#!/usr/bin/env bash
PROJECT_ID=devopstar
REGION=australia-southeast1

gcloud beta container clusters create "$PROJECT_ID-gpt2-demo" \
--project "$PROJECT_ID" \
--zone "$REGION-a" \
--no-enable-basic-auth \
--cluster-version "1.12.6-gke.10" \
--machine-type "n1-standard-4" \
--image-type "COS" \
--disk-type "pd-standard" \
--disk-size "100" \
--metadata disable-legacy-endpoints=true \
--scopes "https://www.googleapis.com/auth/cloud-platform" \
--num-nodes "3" \
--enable-stackdriver-kubernetes \
--enable-ip-alias \
--network "projects/$PROJECT_ID/global/networks/default" \
--subnetwork "projects/$PROJECT_ID/regions/$REGION/subnetworks/default" \
--default-max-pods-per-node "110" \
--addons HorizontalPodAutoscaling,HttpLoadBalancing,Istio,CloudRun \
--istio-config auth=MTLS_PERMISSIVE \
--enable-autoupgrade \
--enable-autorepair
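# Not part of gke.sh: point kubectl at the new cluster if it isn't already
# (clusters create normally configures kubectl credentials for you)
# gcloud container clusters get-credentials "$PROJECT_ID-gpt2-demo" --zone "$REGION-a" --project "$PROJECT_ID"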
```

Get the Istio ingress gateway IP:
```bash
kubectl get svc istio-ingressgateway -n istio-system

export GATEWAY_IP=$(kubectl -n istio-system get service \
  istio-ingressgateway \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

echo $GATEWAY_IP
curl -v -H "Host: gpt-2-flask-api.default.example.com" http://$GATEWAY_IP
```

Test the endpoint using the [Virtual Hosts](https://chrome.google.com/webstore/detail/virtual-hosts/aiehidpclglccialeifedhajckcpedom?hl=en) Chrome extension.
Set the virtual host to `gpt-2-flask-api.default.example.com` and the IP to the one you received from the Istio gateway.
![Virtual Host](img/virtual-host-01.png)
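If you would rather not install a browser extension, the same host-header trick works from the shell via `/etc/hosts`; this is a suggested alternative, not something from the original README:

```bash
# Resolve the Cloud Run (GKE) hostname to the Istio gateway locally
echo "$GATEWAY_IP gpt-2-flask-api.default.example.com" | sudo tee -a /etc/hosts

# Then hit the service by name
curl -v http://gpt-2-flask-api.default.example.com/
```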
### Delete Cluster
```bash
gcloud container clusters delete devopstar-gpt2-demo
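# If gcloud has no default zone configured, pass the zone used in gke.sh explicitly:
# gcloud container clusters delete devopstar-gpt2-demo --zone australia-southeast1-a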
```

## Deploy to Cloud Run (GKE)
```bash
gcloud beta run deploy \
--image asia.gcr.io/devopstar/gpt-2-flask-api \
--cluster devopstar-gpt2-demo \
  --cluster-location australia-southeast1-a

# Service name: (gpt-2-flask-api):
# Deploying container to Cloud Run on GKE service [gpt-2-flask-api] in namespace [default] of cluster [devopstar-gpt2-demo]
# ⠧ Deploying new service... Configuration "gpt-2-flask-api" is waiting for a Revision to become ready.
# ⠧ Creating Revision...
# . Routing traffic...
```

## Deploy to Cloud Run (No GKE)
Make sure to first enable Cloud Run APIs: [https://console.developers.google.com/apis/api/run.googleapis.com](https://console.developers.google.com/apis/api/run.googleapis.com)
```bash
# Set Cloud Run region
gcloud config set run/region us-central1

# Run
gcloud beta run deploy \
--image asia.gcr.io/devopstar/gpt-2-flask-api \
  --memory 2Gi

# Service name: (gpt-2-flask-api):
# Deploying container to Cloud Run service [gpt-2-flask-api] in project [devopstar] region [us-central1]
# Allow unauthenticated invocations to new service [gpt-2-flask-api]?
# (y/N)? y
# ✓ Deploying new service... Done.
# ✓ Creating Revision...
# ✓ Routing traffic...
# Done.
# Service [gpt-2-flask-api] revision [gpt-2-flask-api-9eb49475-778f-4f11-8a5c-60d1ed3bd2ff] has been deployed and is serving traffic at https://gpt-2-flask-api-ulobqfivxa-uc.a.run.app
```

Navigate to the service URL: [https://gpt-2-flask-api-ulobqfivxa-uc.a.run.app/](https://gpt-2-flask-api-ulobqfivxa-uc.a.run.app/)
### Issues with Memory
Unfortunately, due to [memory limits](https://cloud.google.com/run/docs/configuring/memory-limits), it doesn't look like we can use Cloud Run for this purpose at the moment...
```bash
# 2019-04-11T13:08:14.652058Z 16%|█▌ | 80/512 [04:23<29:04, 4.04s/it]
# 2019-04-11T13:08:18.862046Z 16%|█▌ | 81/512 [04:27<29:07, 4.06s/it]
# 2019-04-11T13:08:22.155164Z 16%|█▌ | 82/512 [04:31<29:24, 4.10s/it]
# 2019-04-11T13:08:26.152569Z 16%|█▌ | 83/512 [04:34<27:35, 3.86s/it]
# 2019-04-11T13:08:30.952013Z 16%|█▋ | 84/512 [04:38<27:49, 3.90s/it]
# 2019-04-11T13:08:34.552013Z 17%|█▋ | 85/512 [04:43<29:40, 4.17s/it]
# 2019-04-11T13:08:39.355836Z 17%|█▋ | 86/512 [04:47<28:23, 4.00s/it]
# 2019-04-11T13:08:43.051604Z 17%|█▋ | 87/512 [04:52<30:02, 4.24s/it]
# 2019-04-11T13:08:46.922621Z POST 504 234 B 300 s Chrome 73 / upstream request timeout
```

However... it does appear that a [request timeout](https://cloud.google.com/run/docs/configuring/request-timeout) can be set using the following update command; it allows a maximum of 15 minutes.
```bash
# On Update
gcloud beta run services update gpt-2-flask-api \
  --timeout=15m

# On Creation
gcloud beta run deploy \
--image asia.gcr.io/devopstar/gpt-2-flask-api \
--memory 2Gi \
--timeout=15m
```

And... No dice
```bash
# 15 minute request only gets halfway
# 2019-04-12T12:59:52.451876Z 55%|█████▌ | 283/512 [14:23<06:42, 1.76s/it]
# 2019-04-12T12:59:57.242055Z 55%|█████▌ | 284/512 [14:26<08:26, 2.22s/it]
# 2019-04-12T13:00:02.343637Z 56%|█████▌ | 285/512 [14:31<11:19, 2.99s/it]
# 2019-04-12T13:00:06.045967Z 56%|█████▌ | 286/512 [14:36<13:39, 3.63s/it]
# 2019-04-12T13:00:09.955038Z 56%|█████▌ | 287/512 [14:40<13:40, 3.65s/it]
# 2019-04-12T13:00:14.548409Z 56%|█████▋ | 288/512 [14:44<13:54, 3.73s/it]
```

### Custom Domain
If you already have a custom domain verified, you can also attach a subdomain to the endpoint. Check/verify a domain you own to start with:
```bash
gcloud domains verify devopstar.com
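# Not in the original: list the domains this account has already verified
gcloud domains list-user-verified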
```

Map the service to a subdomain:
```bash
gcloud beta run domain-mappings create \
--service gpt-2-flask-api \
  --domain gpt2.devopstar.com

# Creating......done.
# Mapping successfully created. Waiting for certificate provisioning. You must configure your DNS records for certificate issuance to begin.
# RECORD TYPE CONTENTS
# CNAME ghs.googlehosted.com
```

In your DNS provider, add a CNAME entry for your service
![DNS](img/dns-01.png)
Access your endpoint on the custom domain [https://gpt2.devopstar.com](https://gpt2.devopstar.com)
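Before the custom domain starts serving, the CNAME has to propagate and Google has to issue the managed certificate. A quick way to check both from the shell (these commands are a suggestion, not part of the original README):

```bash
# DNS: should print ghs.googlehosted.com. once the record has propagated
dig +short gpt2.devopstar.com CNAME

# Mapping / certificate status
gcloud beta run domain-mappings describe --domain gpt2.devopstar.com
```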
---
## Bugs
---
- Can't set PORT variable in GUI
  - **Error**: Name cannot be one of reserved names (i.e. GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_REGION, HOME, K_CONFIGURATION, K_REVISION, K_SERVICE, PATH, PORT, PWD and TMPDIR)
  - Service needs to be set up on port 8080 because of this (see the sketch below)
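Because the reserved `PORT` variable cannot be overridden, the container has to listen on whatever Cloud Run injects (8080 by default). A minimal sketch of a launch line that respects this; the `--port` flag is hypothetical and `deployment/run_server.py` may instead hard-code its port:

```bash
# PORT is injected by Cloud Run; fall back to 8080 for local runs (hypothetical --port flag)
exec python deployment/run_server.py --port "${PORT:-8080}"
```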
---
## Attribution
---
- [WillKoehrsen/recurrent-neural-networks](https://github.com/WillKoehrsen/recurrent-neural-networks)
- [graykode/gpt-2-Pytorch](https://github.com/graykode/gpt-2-Pytorch)
- [Deploying a Keras Deep Learning Model as a Web Application in Python](https://morioh.com/p/bbbc75c00f96/deploying-a-keras-deep-learning-model-as-a-web-application-in-python)