

## AWS-Kubeflow

`AWS-Kubeflow` is a guide to the installation and basic use of Kubeflow on AWS.

##### What is kubeflow?

Kubeflow is a Cloud Native platform for machine learning based on Google’s internal machine learning pipelines. [Quickly get running with your ML Workflow]()

> The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. Our goal is not to recreate other services, but to provide a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures. Anywhere you are running Kubernetes, you should be able to run Kubeflow.

## Architecture

#### Requirements for Kubeflow

- [eksctl]() : A simple CLI tool for creating clusters on EKS, Amazon's new managed Kubernetes service for EC2. It is written in Go and uses CloudFormation.
- [kubectl]() : The Kubernetes command-line tool.
- [aws-cli]() : AWS Command Line Interface.
- aws-iam-authenticator
- [ksonnet]() : A CLI-supported framework that streamlines writing and deployment of Kubernetes configurations to multiple clusters.
- [jq]() : jq is a lightweight and flexible command-line JSON processor.
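
Before starting the installation, it can help to confirm which of these tools are already on the `PATH`. The following is a minimal sketch and not part of the original guide: the function name `missing_tools` is hypothetical, and it assumes a shell where `command -v` and `local` are available (bash, for example).

```shell
# Print every tool from the argument list that is NOT found on PATH.
# `missing_tools` is a hypothetical helper name used only for illustration.
missing_tools() {
  local tool
  for tool in "$@"; do
    # `command -v` exits non-zero when the command cannot be resolved.
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Usage: list which of the requirements above still need to be installed.
#   missing_tools eksctl kubectl aws aws-iam-authenticator ks jq
```

An empty result means every listed tool is already installed; any printed name still needs one of the install steps below.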

## Install Kubeflow(v0.5.0)

Start with an Ubuntu 16.04 `EC2` instance to act as the `kubernetes controller`. **It should be at least c4.xlarge (7.5 GB memory) with 20 GB or more of storage. For testing, open all inbound TCP ports.**

I recommend an `EC2` instance over a Docker container because it is easier to tunnel to the dashboard.

Connect to your EC2.

1. Install requirements

$ sudo su

$ apt update && \
apt install python python-pip curl groff vim jq gzip git -y

# install kubectl
$ curl -o kubectl && \
chmod +x kubectl && \
mv kubectl /usr/bin/

# kubectl version check
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.5", GitCommit:"753b2dbc622f5cc417845f0ff8a77f539a4213ea", GitTreeState:"clean", BuildDate:"2018-12-06T01:33:57Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

# install aws-iam-authenticator
$ curl -o aws-iam-authenticator && \
chmod +x aws-iam-authenticator && \
mv aws-iam-authenticator /usr/bin/

# install awscli
$ pip install awscli --upgrade

# awscli version check
$ aws --version
aws-cli/1.16.169 Python/2.7.12 Linux/4.4.0-1083-aws botocore/1.12.159

# install eksctl
$ curl --silent --location "$(uname -s)_amd64.tar.gz" | tar xz -C /tmp && \
mv /tmp/eksctl /usr/local/bin

# eksctl version check
$ eksctl version
[ℹ] version.Info{BuiltAt:"", GitCommit:"", GitTag:"0.1.33"}

2. AWS IAM key environment variable registration
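
The commands for this step are not shown here. One common way to register an IAM key is through the standard AWS CLI environment variables `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_DEFAULT_REGION`. The sketch below assumes that approach; the placeholder values and the `require_env` helper are hypothetical, and the region is simply the one used elsewhere in this guide.

```shell
# Placeholders only: substitute the access key pair of your own IAM user.
export AWS_ACCESS_KEY_ID="<your-access-key-id>"
export AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"
export AWS_DEFAULT_REGION="ap-northeast-2"

# Hypothetical helper: return non-zero if any named variable is unset or empty.
require_env() {
  local name value status
  status=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      status=1
    fi
  done
  return "$status"
}

require_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION
```

Running `aws configure` interactively is an alternative that stores the same values in `~/.aws` instead of the environment.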


3. Elastic Kubernetes Clustering using `eksctl`

# create cluster
$ eksctl create cluster eks-cpu \
--node-type=c4.xlarge \
--timeout=40m \
--nodes=2 \
--region=ap-northeast-2

- Nodes should be c4.xlarge or larger.
- `--node-type`, `--region`, `--nodes` : select the node type, region, and number of nodes.
- Cluster creation takes a long time, so have a coffee. :coffee:
- `eksctl` sets up the availability zones and subnets, and creates a nodegroup of EC2 instances, an Auto Scaling Group, the Elastic Kubernetes Service (EKS) cluster, and so on.

4. When the `eks` cluster is ready, check the nodes with the following command:

$ kubectl get nodes "-o=custom-columns=NAME:.metadata.name,MEMORY:.status.allocatable.memory,CPU:.status.allocatable.cpu,GPU:.status.allocatable.nvidia\.com/gpu"
ip-192-168-12-60.ap-northeast-2.compute.internal 7548168Ki 4
ip-192-168-55-153.ap-northeast-2.compute.internal 7548172Ki 4

5. (Optional) If you use GPU instances

$ kubectl apply -f

6. Install ksonnet

# install ksonnet
$ wget && \
tar -xvf ks_0.13.1_linux_amd64.tar.gz && \
mv ks_0.13.1_linux_amd64/ks /usr/local/bin

# ksonnet version check
# ksonnet development has ended on GitHub; the latest version is 0.13.1
$ ks version
ksonnet version: 0.13.1
jsonnet version: v0.11.2
client-go version: kubernetes-1.10.4

#### Install Kubeflow

7. Run the following commands to download the latest [](

$ export KUBEFLOW_SRC=/tmp/kubeflow-aws
$ export KUBEFLOW_VERSION=v0.5-branch

$ mkdir -p ${KUBEFLOW_SRC} && cd ${KUBEFLOW_SRC}
$ curl | bash

$ curl -O && \
mv ${KUBEFLOW_SRC}/scripts/aws/

8. We should follow [Initial cluster setup for existing cluster]() document.

$ export KFAPP=kfapp
$ export AWS_REGION=ap-northeast-2
$ export AWS_CLUSTER_NAME=eks-cpu

# check your nodegroup role name
$ aws iam list-roles \
| jq -r ".Roles[] \
| select(.RoleName \
| startswith(\"eksctl-$AWS_CLUSTER_NAME\") and contains(\"NodeInstanceRole\")) \
| .RoleName"

$ export AWS_NODEGROUP_ROLE_NAMES=eksctl-eks-cpu-nodegroup-ng-11598-NodeInstanceRole-S6OPLB7TW3RR

9. Initialize the Kubeflow app.

$ ${KUBEFLOW_SRC}/scripts/ init ${KFAPP} --platform aws \
--awsClusterName ${AWS_CLUSTER_NAME} \
--awsRegion ${AWS_REGION} \
--awsNodegroupRoleNames ${AWS_NODEGROUP_ROLE_NAMES}

$ ls
deployment kfapp kubeflow scripts

10. Generate and apply the Kubernetes changes.

$ cd ${KFAPP}

# Generate the Kubernetes changes.
$ ${KUBEFLOW_SRC}/scripts/ generate k8s

# Deploy the generated Kubernetes changes.
$ ${KUBEFLOW_SRC}/scripts/ apply k8s

Kubeflow installation is finished! :heart_eyes:

Check the pods in the `kubeflow` namespace and wait until all of them are `Running`.

$ kubectl get pods -n kubeflow
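
Checking dozens of pods by eye is error-prone. As a rough sketch (not part of the original guide), the hypothetical helper `count_not_ready` below reads the output of `kubectl get pods --no-headers` on standard input and counts the pods whose STATUS column is neither `Running` nor `Completed`. It assumes the default column layout (NAME, READY, STATUS, RESTARTS, AGE) and does not inspect the READY column, so a pod that is `Running` but not yet ready is still counted as up.

```shell
# Hypothetical helper: count pods that are not yet Running or Completed.
count_not_ready() {
  # Column 3 is STATUS in the default `kubectl get pods` layout.
  awk 'NF >= 3 && $3 != "Running" && $3 != "Completed" { n++ } END { print n + 0 }'
}

# Usage (requires access to the cluster):
#   kubectl get pods -n kubeflow --no-headers | count_not_ready
```

When the printed count reaches `0`, every pod in the namespace has started.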

##### Tips. To delete Kubeflow, use

`${KUBEFLOW_SRC}/scripts/ delete k8s`

#### Good Tips. Reconnecting to EKS

If you need to reconnect to EKS (for example, after reopening an SSH terminal), follow these steps.

$ sudo su
$ cd /tmp

$ aws eks --region ap-northeast-2 update-kubeconfig --name eks-cpu

# check kubernetes cluster
$ kubectl get nodes

## Start Kubeflow DashBoard

$ kubectl port-forward -n kubeflow `kubectl get pods -n kubeflow --selector=service=ambassador -o jsonpath='{.items[0].metadata.name}'` 8080:80

# !! ssh tunneling using another terminal
$ ssh -i your_key.pem ubuntu@server-ip -L 8080:localhost:8080

Enter to [](
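
The `-L` option used above has the form `local_port:destination_host:destination_port`, where the destination is resolved from the EC2 instance's point of view. As an illustrative sketch (the helper name `tunnel_cmd` is hypothetical and not part of the original guide), the function below only prints the forwarding command for a given port, so the same pattern can be reused for any service forwarded on the EC2 instance. The key file and host are the same placeholders used in this guide.

```shell
# Hypothetical helper: print the ssh command that forwards a local port
# to the same port on the EC2 instance. It does not open the tunnel itself.
tunnel_cmd() {
  local key host port
  key="$1"
  host="$2"
  port="$3"
  printf 'ssh -i %s ubuntu@%s -L %s:localhost:%s\n' "$key" "$host" "$port" "$port"
}

# Print the command for the Kubeflow dashboard port used above.
tunnel_cmd your_key.pem server-ip 8080
```

Printing the command instead of running it keeps the sketch safe to try on any machine; copy the output into a second terminal once the key path and server address are filled in.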


#### Tips. (Optional) Start [Kubernetes Dashboard]()

# Deploy Kubernetes DashBoard
$ kubectl apply -f

# Deploy the heapster to monitor the container cluster and enable performance analysis of the cluster.
$ kubectl apply -f

# Deploy an influxdb backend to the cluster for the heapster
$ kubectl apply -f

# Create Heapster Cluster Role Bindings for Dashboards
$ kubectl apply -f

# Create eks-admin service account and cluster role binding
$ kubectl apply -f


# Get the eks-admin token for the Dashboard
$ kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep eks-admin | awk '{print $1}')

Use the token string to log in to the Kubernetes Dashboard.

# start Dashboard
$ kubectl proxy

# !! ssh tunneling using another terminal
$ ssh -i your_key.pem ubuntu@server-ip -L 8001:localhost:8001

Enter to .

## Run Example in [kubeflow/example]()

### [github_issue_summarization](

#### 1. [NoteBook](

You can use Kubeflow much like **Google Colaboratory**: machine learning engineers do not need to know the cloud infrastructure, they only need to use a Jupyter notebook.

1. Open the [NoteBooks]() tab.


2. New Server: name it `test`, then connect to the Jupyter Notebook.


Check with `kubectl get pods -n kubeflow`:

test-0 1/1 Running 0 15m

3. New > Terminal

$ git clone

# install pip package
$ pip install pandas sklearn ktext matplotlib annoy nltk pydot

$ wget && \
mv Training.ipynb examples/github_issue_summarization/notebooks

4. Run [Training.ipynb]()

#### 2. [Training the model using TFJob](


#### 3. [Distributed Training using estimator and TFJob]()


### [Pipeline-dashboard]()


1. Use [Sample] ML - TFX - Taxi Tip Prediction Model Trainer


2. Set the parameters and run.


#### TODO

I will add more examples after getting used to Kubeflow! 🔨🔨

## Don't forget to delete the EKS cluster after use!

$ eksctl delete cluster --name eks-cpu --region ap-northeast-2

## Author

- Tae Hwan Jung(Jeff Jung) @graykode
- Author Email : [[email protected]](mailto:[email protected])

## Reference

- [Install Kubeflow]()
- [Initial cluster setup for existing cluster]()
- [kubeflow/examples]()
- [alicek106 / aws-cli-preset]()