https://github.com/nsakki55/aws-mlops-handson
This repository provides a comprehensive ML infrastructure for CTR prediction, focusing on AWS services and offering practical learning experience for MLOps.
https://github.com/nsakki55/aws-mlops-handson
aws ecs mlops
Last synced: about 1 year ago
JSON representation
This repository provides a comprehensive ML infrastructure for CTR prediction, focusing on AWS services and offering practical learning experience for MLOps.
- Host: GitHub
- URL: https://github.com/nsakki55/aws-mlops-handson
- Owner: nsakki55
- Created: 2023-05-13T05:00:01.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-07-27T13:14:53.000Z (almost 3 years ago)
- Last Synced: 2025-04-15T16:50:22.862Z (about 1 year ago)
- Topics: aws, ecs, mlops
- Language: Python
- Homepage:
- Size: 12.5 MB
- Stars: 62
- Watchers: 3
- Forks: 8
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# AWS MLOps Handson
This repository is designed to provide a comprehensive ML infrastructure for CTR (Click-Through Rate) prediction.
With a focus on AWS services, this repository offer practical learning experience for MLOps.
Slide[japanese]: https://speakerdeck.com/nsakki55/cyberagent-aishi-ye-ben-bu-mlopsyan-xiu-ying-yong-bian
## Key Features
### Python Development Environment
We guide you through setting up a Python development environment that ensures code quality and maintainability.
This environment is carefully configured to enable efficient development practices and facilitate collaboration.
### Train Pipeline
This repository includes the implementation of a training pipeline.
This pipeline covers the stages, including data preprocessing, model training, and evaluation.
### Prediction Server
This repository provides an implementation of a prediction server that serves predictions based on your trained CTR prediction model.
### AWS Deployment
To showcase industry-standard practices, this repository guide you in deploying the training pipeline and inference server on AWS.
## AWS Infra Architecture
AWS Infra Architecture made by this repository.
### ML Pipeline

### Predict Server

## Requirements
| Software | Install (Mac) |
|----------------------------|----------------------------|
| [pyenv](https://github.com/pyenv/pyenv#installation) | `brew install pyenv` |
| [Poetry](https://python-poetry.org/docs/#installation) | curl -sSL https://install.python-poetry.org | python3 - |
| [direnv](https://formulae.brew.sh/formula/direnv) | `brew install direnv` |
| [Terraform](https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli#install-terraform) | `brew install terraform` |
| [Docker](https://docs.docker.com/desktop/install/mac-install/) | install via dmg |
| [awscli](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-installjkkkkj.html) | `curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"` |
## Setup
### Install Python Dependencies
Use `pyenv` to install Python 3.9.0 environment
```bash
$ pyenv install 3.9.0
$ pyenv local 3.9.0
```
Use `poetry` to install library dependencies
```bash
$ poetry install
```
### Configure environment variable
Use `direnv` to configure environment variable
```bash
$ cp .env.example .env
$ direnv allow .
```
Set your environment variable setting
```bash
AWS_REGION=
AWS_ACCOUNT_ID=
AWS_PROFILE=
AWS_BUCKET=
AWS_ALB_DNS=
USER_NAME=
VERSION=2023-05-11
MODEL=sgd_classifier_ctr_model
```
### Create AWS Resources
move current directory to `infra`
```bash
$ cd infra
```
Use terraform to create aws resources.
Apply terraform
```bash
$ terraform init
$ terraform apply
```
### Prepare train data
unzip train data
```bash
$ unzip train_data.zip
```
upload train data to S3
```bash
$ aws s3 mv train_data s3://$AWS_BUCKET
```
## Code static analysis tool
| Tool | Usage |
|----------------------------|----------------------------|
| [isort](https://pycqa.github.io/isort/) | library import statement check |
| [black](https://pypi.org/project/black/) | format code style |
| [flake8](https://flake8.pycqa.org/en/latest/) | code quality check |
| [mypy](https://mypy.readthedocs.io/en/stable/) | static type checking |
| [pysen](https://github.com/pfnet/pysen) | manage static analysis tool |
## Usage
Build ML Pipeline
```bash
$ make build-ml
```
Run ML Pipeline
```bash
$ make run-ml
```
Build Predict API
```bash
$ make build-predictor
```
Run Predict API locally
```bash
$ make up
```
Shutdown Predict API locally
```bash
$ make down
```
Run formatter
```bash
$ make format
```
Run linter
```bash
$ make lint
```
Run pytest
```bash
$ make test
```