https://github.com/tchaton/sagemaker-pytorch-boilerplate


Host: GitHub
URL: https://github.com/tchaton/sagemaker-pytorch-boilerplate
Owner: tchaton
License: mit
Created: 2020-08-15T20:58:45.000Z (almost 6 years ago)
Default Branch: master
Last Pushed: 2020-08-20T09:07:42.000Z (almost 6 years ago)
Last Synced: 2024-12-30T07:51:29.408Z (over 1 year ago)
Language: Jupyter Notebook
Size: 488 KB
Stars: 9
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# sagemaker-pytorch-boilerplate

Production ML, as a field, has matured. It’s increasingly common for companies to have at least one model in production. As more teams deploy models, the conversation around tooling has shifted from “What gets the job done?” to “What does it take to deploy a model at production scale?”

This project is a boilerplate codebase to `train / serve / publish` PyTorch models using AWS SageMaker.

We aim to simplify the MLOps workflow by providing a template for production-ready development, allowing ML engineers to focus solely on their models and datasets.

We rely on [Hydra](https://hydra.cc) to configure the application and on [PyTorch Lightning](https://pytorch-lightning.readthedocs.io/en/latest/), a lightweight PyTorch wrapper that lets ML researchers scale their experiments with less boilerplate.

# How to use this project

As a small demo, this project implements a 1-layer MLP on the iris dataset.
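To make the demo concrete, here is a minimal sketch of what a single-layer classifier on iris-shaped input computes: one linear layer followed by a softmax over the three classes. It is written in plain Python so it runs without dependencies. It is not the repository's model code, which is built on PyTorch Lightning, and the names `init_layer` and `forward` are hypothetical.

```python
import math
import random

NUM_FEATURES = 4  # iris: sepal length, sepal width, petal length, petal width
NUM_CLASSES = 3   # setosa, versicolor, virginica


def init_layer(num_in, num_out, seed=0):
    """Create small random weights and zero biases for one linear layer."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(num_in)] for _ in range(num_out)]
    biases = [0.0] * num_out
    return weights, biases


def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    shift = max(logits)
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, biases, features):
    """Apply the linear layer, then softmax, to one sample."""
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


weights, biases = init_layer(NUM_FEATURES, NUM_CLASSES)
probs = forward(weights, biases, [5.1, 3.5, 1.4, 0.2])  # one iris measurement
print(probs)  # three class probabilities from an untrained layer
```

In a PyTorch version, the linear layer would typically be a `torch.nn.Linear(4, 3)` and the softmax would be folded into the cross-entropy loss during training.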

```bash

sh build_local_env.sh 3.7.8 # Creates a local environment to ease local development

```

```bash

sh build_and_push.sh {IMAGE_NAME} {MODEL} {DATASET}

# Builds the image from the container folder and pushes it to AWS Elastic Container Registry (ECR)

```
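Once pushed, the image is referenced by its ECR URI, for example when creating a training job or an endpoint. The sketch below shows the standard URI layout for commercial AWS regions. The function name `ecr_image_uri`, the example account ID, the region, and the `latest` tag are assumptions for illustration; `build_and_push.sh` may tag the image differently.

```python
def ecr_image_uri(account_id, region, image_name, tag="latest"):
    """Build the URI of an image stored in a private ECR repository."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12-digit numbers")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{image_name}:{tag}"


# Hypothetical values; substitute your own account, region and image name.
uri = ecr_image_uri("123456789012", "eu-west-1", "iris-mlp")
print(uri)
```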

# Local development

## Training

Use this for quick development iterations.

```bash

source .venv/bin/activate

python src/train model={MODEL} dataset={DATASET}

```

Or run training inside the Docker image, to check that the image works correctly:

```bash

sh local_test/train_local.sh ${IMAGE_NAME} ${ARGS_1} ${ARGS_2} ${ARGS_3} ...

```

## Local Serving

Terminal 1

```bash 

# In:

sh build_and_push.sh {IMAGE_NAME} {MODEL} {DATASET}

cd local_test

sh serve_local.sh {IMAGE_NAME}

```

``` bash

Out:

Starting the inference server with 4 workers.

[2020-08-19 11:41:31 +0000] [9] [INFO] Starting gunicorn 20.0.4

[2020-08-19 11:41:31 +0000] [9] [INFO] Listening at: unix:/tmp/gunicorn.sock (9)

[2020-08-19 11:41:31 +0000] [9] [INFO] Using worker: gevent

[2020-08-19 11:41:31 +0000] [13] [INFO] Booting worker with pid: 13

[2020-08-19 11:41:31 +0000] [14] [INFO] Booting worker with pid: 14

[2020-08-19 11:41:31 +0000] [15] [INFO] Booting worker with pid: 15

[2020-08-19 11:41:31 +0000] [16] [INFO] Booting worker with pid: 16

```

Terminal 2

```bash 

# In:

cd local_test

sh predict.sh {SAMPLE_DATA} # Only 'text/csv' is currently supported

```
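For readers who prefer to call the local server from Python, the sketch below builds the kind of request a SageMaker inference container expects: a `POST` to `/invocations` with a `Content-Type` header, here `text/csv`. It assumes that `serve_local.sh` exposes the container on `localhost:8080`, which is the port SageMaker containers listen on, but the script itself was not inspected. The helper name `build_invocation_request` and the sample row are hypothetical.

```python
import urllib.request


def build_invocation_request(csv_payload, host="localhost", port=8080):
    """Prepare a text/csv POST to the container's /invocations route."""
    return urllib.request.Request(
        url=f"http://{host}:{port}/invocations",
        data=csv_payload.encode("utf-8"),
        headers={"Content-Type": "text/csv"},
        method="POST",
    )


# One iris-shaped row; the exact columns the model expects are an assumption.
request = build_invocation_request("5.1,3.5,1.4,0.2\n")

# To send it while the container from Terminal 1 is running:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The request is only constructed here, not sent, so the snippet runs even when no server is up.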

# Train on AWS

Run the `workflow.ipynb` notebook:

```

jupyter lab

```
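The notebook's contents are not reproduced here. As a rough idea of what launching a training job with a custom image involves, the sketch below assembles the parameters accepted by the SageMaker `CreateTrainingJob` API. The function name, the `training` channel name, the instance type, the volume size, the runtime limit, and all example values are assumptions, and the notebook may well use the higher-level SageMaker Python SDK instead.

```python
def build_training_job_request(job_name, image_uri, role_arn,
                               train_s3_uri, output_s3_uri,
                               instance_type="ml.m5.large"):
    """Assemble CreateTrainingJob parameters for a custom training image."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,   # the image pushed to ECR
            "TrainingInputMode": "File",  # data is copied to the instance first
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "training",    # assumed channel name
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


# Hypothetical values throughout.
job_request = build_training_job_request(
    job_name="iris-mlp-demo",
    image_uri="123456789012.dkr.ecr.eu-west-1.amazonaws.com/iris-mlp:latest",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    train_s3_uri="s3://example-bucket/iris/train",
    output_s3_uri="s3://example-bucket/iris/output",
)

# With boto3 installed and AWS credentials configured, this could be submitted as:
# import boto3
# boto3.client("sagemaker").create_training_job(**job_request)
```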

# CAREFUL: Work in progress
