{"id":15650151,"url":"https://github.com/graykode/aws-kubeflow","last_synced_at":"2025-04-30T16:45:59.645Z","repository":{"id":110172802,"uuid":"189858489","full_name":"graykode/aws-kubeflow","owner":"graykode","description":"A guideline for basic use and installation of kubeflow in AWS.","archived":false,"fork":false,"pushed_at":"2019-06-02T23:31:34.000Z","size":496,"stargazers_count":38,"open_issues_count":0,"forks_count":10,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-03-30T17:51:15.775Z","etag":null,"topics":["aws","eks","kubeflow","kubernetes","ml-cloud"],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/graykode.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-06-02T14:46:49.000Z","updated_at":"2024-07-19T11:24:02.000Z","dependencies_parsed_at":"2023-09-01T10:46:38.519Z","dependency_job_id":null,"html_url":"https://github.com/graykode/aws-kubeflow","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graykode%2Faws-kubeflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graykode%2Faws-kubeflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graykode%2Faws-kubeflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graykode%2Faws-kubeflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/graykode",
"download_url":"https://codeload.github.com/graykode/aws-kubeflow/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251747603,"owners_count":21637404,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","eks","kubeflow","kubernetes","ml-cloud"],"created_at":"2024-10-03T12:33:36.707Z","updated_at":"2025-04-30T16:45:59.607Z","avatar_url":"https://github.com/graykode.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"## AWS-Kubeflow\n\n\u003cp align=\"center\"\u003e\u003cimg width=\"100\" src=\"https://camo.githubusercontent.com/bd4adfc06b0e349c47f2bae3319544a2e547c796/68747470733a2f2f7777772e6b756265666c6f772e6f72672f696d616765732f6c6f676f2e737667\" /\u003e  \u003cimg width=\"100\" src=\"https://upload.wikimedia.org/wikipedia/commons/thumb/9/93/Amazon_Web_Services_Logo.svg/300px-Amazon_Web_Services_Logo.svg.png\" /\u003e\u003c/p\u003e\n\n`AWS-Kubeflow` is a guideline for basic use and installation of kubeflow in AWS.\n\n##### What is kubeflow?\n\nKubeflow is a Cloud Native platform for machine learning based on Google’s internal machine learning pipelines. [Quickly get running with your ML Workflow](\u003chttps://www.kubeflow.org/docs/about/kubeflow/\u003e)\n\n\u003e The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. Our goal is not to recreate other services, but to provide a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures. 
Anywhere you are running Kubernetes, you should be able to run Kubeflow.\n\n\n\n## Architecture\n\u003cp align=\"center\"\u003e\u003cimg width=\"700\" src=\"images/architecture.jpg\" /\u003e\u003c/p\u003e\n\n#### Requirements for Kubeflow\n\n- [eksctl](https://github.com/weaveworks/eksctl) : A simple CLI tool for creating clusters on EKS - Amazon's new managed Kubernetes service for EC2. It is written in Go, and uses CloudFormation.\n- [kubectl](\u003chttps://github.com/kubernetes/kubectl\u003e) : The Kubernetes command-line tool.\n- [aws-cli]() : AWS Command Line Interface.\n- aws-iam-authenticator : A tool that uses AWS IAM credentials to authenticate to a Kubernetes cluster.\n- [ksonnet](\u003chttps://github.com/ksonnet/ksonnet\u003e) : A CLI-supported framework that streamlines writing and deployment of Kubernetes configurations to multiple clusters. \n- [jq](\u003chttps://stedolan.github.io/jq/download/\u003e) : jq is a lightweight and flexible command-line JSON processor.\n\n\n\n## Install Kubeflow(v0.5.0)\n\nStart with an Ubuntu 16.04 `EC2` instance as the `kubernetes controller`. **It should be \u003e= c4.xlarge (7.5GB memory, \u003e= 20GB storage), with all inbound TCP ports open for testing**.\n\nI recommend an `EC2` instance rather than a Docker container, because it is easier to set up tunneling to the Dashboard.\n\nConnect to your EC2.\n\n1. 
Install requirements\n\n```shell\n$ sudo su\n\n$ apt update \u0026\u0026 \\\n  apt install python python-pip curl groff vim jq gzip git -y\n  \n# install kubectl\n$ curl -o kubectl https://amazon-eks.s3-us-west-2.amazonaws.com/1.11.5/2018-12-06/bin/linux/amd64/kubectl \u0026\u0026 \\\n  chmod +x kubectl \u0026\u0026 \\\n  mv kubectl /usr/bin/\n\n# kubectl version check\n$ kubectl version\nClient Version: version.Info{Major:\"1\", Minor:\"11\", GitVersion:\"v1.11.5\", GitCommit:\"753b2dbc622f5cc417845f0ff8a77f539a4213ea\", GitTreeState:\"clean\", BuildDate:\"2018-12-06T01:33:57Z\", GoVersion:\"go1.10.3\", Compiler:\"gc\", Platform:\"linux/amd64\"}\n\n\n# install aws-iam-authenticator\n$ curl -o aws-iam-authenticator https://amazon-eks.s3-us-west-2.amazonaws.com/1.11.5/2018-12-06/bin/linux/amd64/aws-iam-authenticator \u0026\u0026 \\\n  chmod +x aws-iam-authenticator \u0026\u0026 \\\n  mv aws-iam-authenticator /usr/bin/\n  \n  \n# install awscli\n$ pip install awscli --upgrade\n\n# awscli version check\n$ aws --version\naws-cli/1.16.169 Python/2.7.12 Linux/4.4.0-1083-aws botocore/1.12.159\n\n\n# install eksctl\n$ curl --silent --location \"https://github.com/weaveworks/eksctl/releases/download/latest_release/eksctl_$(uname -s)_amd64.tar.gz\" | tar xz -C /tmp \u0026\u0026 \\\n  mv /tmp/eksctl /usr/local/bin\n  \n# eksctl version check\n$ eksctl version\n[ℹ]  version.Info{BuiltAt:\"\", GitCommit:\"\", GitTag:\"0.1.33\"}\n```\n\n\n\n2. AWS IAM key environment variable registration\n\n```shell\n$ export AWS_ACCESS_KEY_ID=\u003cKEY\u003e\n$ export AWS_SECRET_ACCESS_KEY=\u003cKEY\u003e\n```\n\n\n\n3. Elastic Kubernetes Clustering using `eksctl`\n\n```shell\n# create cluster\n$ eksctl create cluster eks-cpu \\\n--node-type=c4.xlarge \\\n--timeout=40m \\\n--nodes=2 \\\n--region=ap-northeast-2\n```\n\n- You should make node \u003e= c4.xlarge.\n- `--node-type`, `--region`, `--nodes` : select node-type, region, number of nodes.\n- It takes a lot of time to make, so drink coffee. 
:coffee:\n- `eksctl` sets up availability zones and subnets, and creates a nodegroup with EC2 instances, an Auto Scaling Group, the Elastic Kubernetes Service (EKS) cluster, etc.\n\n\n\n4. When the EKS cluster is ready, check the nodes using the following command:\n\n```shell\n$ kubectl get nodes \"-o=custom-columns=NAME:.metadata.name,MEMORY:.status.allocatable.memory,CPU:.status.allocatable.cpu,GPU:.status.allocatable.nvidia\\.com/gpu\"\nNAME                                                MEMORY      CPU       GPU\nip-192-168-12-60.ap-northeast-2.compute.internal    7548168Ki   4         \u003cnone\u003e\nip-192-168-55-153.ap-northeast-2.compute.internal   7548172Ki   4         \u003cnone\u003e\n```\n\n\n\n5. (Optional) If you use GPU instances\n\n```shell\n$ kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml\n```\n\n\n\n6. Install ksonnet\n\n```shell\n# install ksonnet\n$ wget https://github.com/ksonnet/ksonnet/releases/download/v0.13.1/ks_0.13.1_linux_amd64.tar.gz \u0026\u0026 \\\n   tar -xvf ks_0.13.1_linux_amd64.tar.gz \u0026\u0026 \\\n   mv ks_0.13.1_linux_amd64/ks /usr/local/bin\n\n# ksonnet version check\n# the ksonnet project has been discontinued on GitHub; the latest version is 0.13.1\n$ ks version\nksonnet version: 0.13.1\njsonnet version: v0.11.2\nclient-go version: kubernetes-1.10.4\n```\n\n\n\n#### Install Kubeflow\n\n7. Run the following commands to download the latest [kfctl.sh](http://kfctl.sh/)\n\n```shell\n$ export KUBEFLOW_SRC=/tmp/kubeflow-aws\n$ export KUBEFLOW_VERSION=v0.5-branch\n\n$ mkdir -p ${KUBEFLOW_SRC} \u0026\u0026 cd ${KUBEFLOW_SRC}\n$ curl https://raw.githubusercontent.com/graykode/aws-kubeflow/master/kubeflow.sh | bash\n\n$ curl -O https://raw.githubusercontent.com/graykode/aws-kubeflow/master/util.sh \u0026\u0026 \\\n   mv util.sh ${KUBEFLOW_SRC}/scripts/aws/util.sh\n```\n\n\n\n8. 
Follow the [Initial cluster setup for existing cluster](\u003chttps://www.kubeflow.org/docs/aws/deploy/existing-cluster/\u003e) document.\n\n```shell\n$ export KFAPP=kfapp\n$ export REGION=ap-northeast-2\n$ export AWS_CLUSTER_NAME=eks-cpu\n\n# check your nodegroup role name\n$ aws iam list-roles \\\n    | jq -r \".Roles[] \\\n    | select(.RoleName \\\n    | startswith(\\\"eksctl-$AWS_CLUSTER_NAME\\\") and contains(\\\"NodeInstanceRole\\\")) \\\n    .RoleName\"\n    \neksctl-eks-cpu-nodegroup-ng-11598-NodeInstanceRole-S6OPLB7TW3RR\n\n$ export AWS_NODEGROUP_ROLE_NAMES=eksctl-eks-cpu-nodegroup-ng-11598-NodeInstanceRole-S6OPLB7TW3RR\n```\n\n\n\n9. Run `kfctl.sh init`\n\n```shell\n$ cd ${KUBEFLOW_SRC}\n$ ${KUBEFLOW_SRC}/scripts/kfctl.sh init ${KFAPP} --platform aws \\\n--awsClusterName ${AWS_CLUSTER_NAME} \\\n--awsRegion ${REGION} \\\n--awsNodegroupRoleNames ${AWS_NODEGROUP_ROLE_NAMES}\n\n$ ls\ndeployment  kfapp  kubeflow  scripts\n```\n\n\n\n10. Generate and apply the Kubernetes changes.\n\n```shell\n$ cd ${KFAPP}\n\n# Generate the Kubernetes changes.\n$ ${KUBEFLOW_SRC}/scripts/kfctl.sh generate k8s\n\n# Deploy the generated Kubernetes changes.\n$ ${KUBEFLOW_SRC}/scripts/kfctl.sh apply k8s\n```\n\nKubeflow installation is finished! :heart_eyes:\n\n\n\nCheck the pods in the `kubeflow` namespace and wait until all of them are `Running`.\n\n```shell\n$ kubectl get pods -n kubeflow\n```\n\n\n\n##### Tips. Deleting Kubeflow using kfctl.sh\n\n`${KUBEFLOW_SRC}/scripts/kfctl.sh delete k8s`\n\n\n\n#### Good Tips. 
Reconnecting to EKS\n\nIf you need to reconnect to EKS (for example, after your SSH session is disconnected), follow these steps.\n\n```shell\n$ sudo su\n$ cd /tmp\n\n$ export AWS_ACCESS_KEY_ID=\u003cKEY\u003e\n$ export AWS_SECRET_ACCESS_KEY=\u003cKEY\u003e\n$ aws eks --region ap-northeast-2 update-kubeconfig --name eks-cpu\n\n# check kubernetes cluster\n$ kubectl get nodes\n```\n\n\n\n## Start Kubeflow DashBoard\n\n```shell\n$ kubectl port-forward -n kubeflow `kubectl get pods -n kubeflow --selector=service=ambassador -o jsonpath='{.items[0].metadata.name}'` 8080:80\n\n# !! SSH tunneling, run from another terminal\n$ ssh -i your_key.pem ubuntu@server-ip -L 8080:localhost:8080\n```\n\nOpen [http://127.0.0.1:8080](http://127.0.0.1:8080/).\n\n![](images/kubeflow-dashboard.jpg)\n\n\n\n#### Tips. (Optional) Start [Kubernetes Dashboard](\u003chttps://docs.aws.amazon.com/eks/latest/userguide/dashboard-tutorial.html\u003e)\n\n```shell\n# Deploy Kubernetes DashBoard\n$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v1.10.1/src/deploy/recommended/kubernetes-dashboard.yaml\n\n# Deploy Heapster to monitor the container cluster and enable performance analysis of the cluster.\n$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/heapster.yaml\n\n# Deploy an InfluxDB backend to the cluster for Heapster\n$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/influxdb.yaml\n\n# Create Heapster Cluster Role Bindings for Dashboards\n$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/rbac/heapster-rbac.yaml\n\n# Create eks-admin service account and cluster role binding\n$ kubectl apply -f https://raw.githubusercontent.com/graykode/aws-kubeflow/master/eks-admin-service-account.yaml\n\n```\n\n\n\n```shell\n# Get the eks-admin token for logging in to the Dashboard\n$ kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep eks-admin | awk 
'{print $1}')\n```\n\nPaste the token string to log in to the Kubernetes Dashboard.\n\n\n\n```shell\n# start Dashboard\n$ kubectl proxy\n\n# !! SSH tunneling, run from another terminal\n$ ssh -i your_key.pem ubuntu@server-ip -L 8001:localhost:8001\n```\n\nOpen \u003chttp://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#!/login\u003e.\n\n\n\n## Run Examples in [kubeflow/examples](\u003chttps://github.com/kubeflow/examples\u003e)\n\n### [github_issue_summarization](https://github.com/kubeflow/examples/tree/master/github_issue_summarization)\n\n\n\n#### 1. [NoteBook](https://github.com/kubeflow/examples/blob/master/github_issue_summarization/02_training_the_model.md)\n\nYou can use Kubeflow much like **Google Colaboratory**: machine learning engineers do not need to know the cloud infrastructure, they only need to use a Jupyter notebook.\n\n1. Open the [NoteBooks](\u003chttp://127.0.0.1:8080/_/notebooks\u003e) tab.\n\n![](images/kubeflow-notebooks.jpg)\n\n2. New Server: name it `test`, then connect to the Jupyter Notebook.\n\n![](images/kubeflow-notebooks-test.jpg)\n\nCheck with `kubectl get pods -n kubeflow`:\n\n```shell\nNAME                                                        READY     STATUS    RESTARTS   AGE\n..\ntest-0                                                      1/1       Running   0          15m\n..\n```\n\n3. New \u003e Terminal\n\n```shell\n$ git clone https://github.com/kubeflow/examples\n\n# install pip packages\n$ pip install pandas sklearn ktext matplotlib annoy nltk pydot\n\n$ wget https://raw.githubusercontent.com/graykode/aws-kubeflow/master/Training.ipynb \u0026\u0026 \\\n   mv Training.ipynb examples/github_issue_summarization/notebooks\n```\n\n4. Run [Training.ipynb](\u003chttp://127.0.0.1:8080/notebook/kubeflow/test/notebooks/examples/github_issue_summarization/notebooks/Training.ipynb#\u003e)\n\n\n\n#### 2. 
[Training the model using TFJob](https://github.com/kubeflow/examples/blob/master/github_issue_summarization/02_training_the_model_tfjob.md)\n\n- TODO\n\n#### 3. [Distributed Training using estimator and TFJob]() \n\n- TODO\n\n\n\n### [Pipeline-dashboard](\u003chttp://127.0.0.1:8080/_/pipeline-dashboard\u003e)\n\n![](images/pipeline-dashboard.jpg)\n\n\n\n1. Use [Sample] ML - TFX - Taxi Tip Prediction Model Trainer\n\n![](images/pipeline.jpg)\n\n\n\n2. Set the parameters and run.\n\n![](images/parameter-setting.jpg)\n\n\n\n#### TODO\n\nI will add more examples after getting used to Kubeflow! 🔨🔨\n\n\n\n## Don't forget to delete the EKS cluster after use!\n\n```shell\n$ eksctl delete cluster --name eks-cpu --region ap-northeast-2\n```\n\n\n\n## Author\n\n- Tae Hwan Jung(Jeff Jung) @graykode\n- Author Email : [nlkey2022@gmail.com](mailto:nlkey2022@gmail.com)\n\n\n\n## Reference\n\n- [Install Kubeflow](https://www.kubeflow.org/docs/aws/deploy/install-kubeflow/)\n- [Initial cluster setup for existing cluster](https://www.kubeflow.org/docs/aws/deploy/existing-cluster/)\n- [kubeflow/examples](\u003chttps://github.com/kubeflow/examples/tree/master/github_issue_summarization\u003e)\n- [alicek106 / aws-cli-preset](https://github.com/alicek106/aws-cli-preset)\n- https://swalloow.github.io/eks-kubeflow\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgraykode%2Faws-kubeflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgraykode%2Faws-kubeflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgraykode%2Faws-kubeflow/lists"}