{"id":19066171,"url":"https://github.com/epfml/kubernetes-setup","last_synced_at":"2025-06-13T07:39:13.944Z","repository":{"id":69549051,"uuid":"158382066","full_name":"epfml/kubernetes-setup","owner":"epfml","description":"MLO group setup for kubernetes cluster","archived":false,"fork":false,"pushed_at":"2020-10-14T08:38:40.000Z","size":21,"stargazers_count":12,"open_issues_count":0,"forks_count":10,"subscribers_count":19,"default_branch":"master","last_synced_at":"2025-01-02T14:27:21.381Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Dockerfile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/epfml.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-11-20T11:54:05.000Z","updated_at":"2023-10-20T12:08:34.000Z","dependencies_parsed_at":"2023-09-16T05:34:08.406Z","dependency_job_id":null,"html_url":"https://github.com/epfml/kubernetes-setup","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epfml%2Fkubernetes-setup","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epfml%2Fkubernetes-setup/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epfml%2Fkubernetes-setup/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epfml%2Fkubernetes-setup/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/epfml","download_url":"https://codeload.github.com/epfml/kubernetes-setup/tar.gz/refs/
heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240118419,"owners_count":19750491,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T00:55:01.034Z","updated_at":"2025-02-22T03:16:20.567Z","avatar_url":"https://github.com/epfml.png","language":"Dockerfile","readme":"Instructions for using the container cluster (Kubernetes, k8s)\n\nYou can refer to [this repository](https://github.com/EPFL-IC/caas) for more information; below are the steps for the most common setups.\n\n---\n\n## Table of Contents\n- [Requesting access](#requesting-access)\n- [Installing the Kubernetes client](#installing-the-kubernetes-client-on-your-personal-machine)\n- [Setting up Kubernetes](#setting-up-kubernetes)\n- [Using Kubernetes](#using-kubernetes)\n- [Note on Storage across icclusters](#note-on-storage-across-icclusters)\n- [Other resources and deployment templates](#other-resources-and-deployment-templates)\n\n## Requesting access\nUse [this form](https://support.epfl.ch/help?id=epfl_sc_cat_item\u0026sys_id=8cd2b9284f1b1b00fe35adee0310c769\u0026sysparm_category=7707db6d4fd94300fe35adee0310c708) to request access (use Accréditation=MLO).\n\nOnce your request has been approved, you will receive an email from the IC with a zip file named `\u003cyour-name\u003e.zip`.\nUnzip it and put the extracted files into the `.kube` folder of your home directory. 
Then rename `\u003cyour-name\u003e.config` to `config`.\n\n```bash\ncd ~\nmkdir -p .kube\n# Save \u003cyour-name\u003e.zip in the .kube folder, then extract it there\nunzip .kube/\u003cyour-name\u003e.zip -d .kube\nmv .kube/*.config .kube/config\n```\n\n## Installing the Kubernetes client on your personal machine\n\n### Ubuntu/Debian\n```bash\nsudo apt-get update \u0026\u0026 sudo apt-get install -y apt-transport-https\ncurl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -\necho \"deb https://apt.kubernetes.io/ kubernetes-xenial main\" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list\nsudo apt-get update\nsudo apt-get install -y kubectl\n```\n\nCheck that it is working by running:\n```bash\nkubectl get pods\n```\n\n### macOS\n```bash\nbrew install kubernetes-cli\n```\n\n### Others\nFollow the instructions in [the Kubernetes docs](https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-kubectl).\n\n## Setting up Kubernetes\n\nTo use a Kubernetes pod, you need to:\n - [Create a Dockerfile to describe the experimental environment](#creating-a-dockerfile)\n - [Build a Docker image from it](#building-a-docker-image)\n - [Push the Docker image to ic-registry.epfl.ch/mlo/](#pushing-the-docker-image)\n - [Create a Kubernetes pod config file](#creating-a-kubernetes-pod-config-file)\n\n### Creating a Dockerfile\n\nYou can build your own Dockerfile based on this [basic example](https://github.com/epfml/kubernetes-setup/blob/master/templates/pod-simple/Dockerfile).\\\nIn it, you can specify the software and Python packages you need.\n\nTo access data on the network storage `mlodata1` and `mloraw`, the main user in your Docker image must have the same user ID (UID) as your account on EPFL's systems.\\\nTo achieve this, put your Gaspar username after `NB_USER=` and your UID after `NB_UID=`.\\\nYou can get your UID by running the `id` command on an iccluster node.\n\nThe `FROM` line lets you choose the base image to start from. 
You can choose from images on [Docker Hub](https://hub.docker.com/) (or elsewhere).\n\n### Building a Docker image\n\nOnce you are happy with the Dockerfile, go to the directory containing it and run:\n```bash\ndocker build . -t \u003cyour-tag\u003e\n```\nReplace `\u003cyour-tag\u003e` with the name you want to give to this Docker image.\\\nIt is good practice to put your username first, for example `jaggi-base`.\n\n### Pushing the Docker image\nWhen you start a pod on the Kubernetes cluster, you tell it which Docker image to run.\\\nThe cluster looks for your image on EPFL's internal Docker registry [Harbor](https://ic-registry.epfl.ch/).\\\nYou therefore need to upload the image you created to this registry.\n\nGo to https://ic-registry.epfl.ch and log in with your Gaspar credentials.\\\nThere is already a project named `mlo`. Ask someone in the lab who already uses Kubernetes to add you as a member of the `mlo` project so that you can push your Docker image to it.\n\nNow take the following steps:\n\n#### Log in to Harbor on your personal machine\n\nLog in to the registry by running the following command\n\n```bash\ndocker login ic-registry.epfl.ch\n```\n\nand enter the credentials:\n\nUsername: 
`robot$mlo-image-publisher`\\\nPassword:\n```\neyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE1ODg5MjIwNjEsImlhdCI6MTU4NjMzMDA2MSwiaXNzIjoiaGFyYm9yLXRva2VuLWlzc3VlciIsImlkIjozLCJwaWQiOjYsImFjY2VzcyI6W3siUmVzb3VyY2UiOiIvcHJvamVjdC82L3JlcG9zaXRvcnkiLCJBY3Rpb24iOiJwdXNoIiwiRWZmZWN0IjoiIn1dfQ.RI9AhLg94Y1piWKZ4cR-UWOFw39fX3NnuoPM93Ux6T2BR6azMYNUDGSSD8-p17dX82UkaJ4jAYfa19b6e1VudT-YM21QWjbWjWnnnpLfe0wEpL8Y9ddP8kbgfWzhZKxt_RNe8Wl7QjTxQYjp-bdKw1CY1v8NGoltf3nILLYbr8g4RiODQsB-XBCVfCxFUZcQkhy39hq9ckPbEs3jh2HuN7s3IRQGfRAbkQ5DKo1wp967Zkf1LYopF6-W8hZyWq69XzsqixX6UaF8izaZVCGkqPqw1DlgKp6Lropwnb8GT9TjV_kUfX6A6Ju3yEgBtcFEOWCKDYeRlFgPuu1DW3Sy7dnHeDOUNvSeB8ANgY-QesSasQ8LSrFjIcsG9fZ8I_NBCx0CEQCenoRMQHpUqDFZoyFbMaq8zrEO9yshEhHggLoTTd6GHDByxqmWN15dfCZrHHGKmSwW34t5q_a6fsELuAtCmy8j-FvdbB3zQiJF8dj58DKDbIya4R8GdoFq0hOUopZsetUpHAhMwnJ3TRJrJVo7IXzzjT6i5q85qoOHEwPpr0UJHK05zGSXsjoKzTMG26togEnd6GlApuzWpEF21f0eYHib-pkJY1oCcQpobiFKrSwvcYVyjUMaxFMLm1le16Lpk83CEXgstSgSPx_lB1qwMK7zauqpFhrHI-Fc7mQ\n```\n\nYou only need to do this once.\n\n#### Actually pushing the Docker image\nTo push an image to a private registry (rather than the central Docker registry), you must tag it with the registry hostname.\\\nThen you can push it:\n```bash\ndocker tag \u003cyour-tag\u003e ic-registry.epfl.ch/mlo/\u003cyour-tag\u003e\ndocker push ic-registry.epfl.ch/mlo/\u003cyour-tag\u003e\n```\n\n### Creating a Kubernetes pod config file\nDownload and have a look at this simple [Kubernetes config file](https://github.com/epfml/kubernetes-setup/blob/master/templates/pod-simple/pod-gpu-mlodata.yaml).\nFill in all elements that are in \\\u003cbrackets\\\u003e.\\\n`\u003cyour-pod-name\u003e` does not need to be the same as `\u003cyour-docker-image-tag\u003e`, but again it is good practice to start the pod name with your name, for example `jaggi-pod`.\n\nIn this config file,\n - you can change `nvidia.com/gpu: 1` to request more or fewer GPUs\n - you can see at the end that mlodata1 is mounted. 
You can remove it or replace it with mloscratch.\n - you specify the command that runs when the pod starts. Here it sleeps for 60 seconds and then exits.\n\n#### Commands\n\n- To have a container run forever, you can use:\n  ```yaml\n  command: [sleep, infinity]\n  ```\n  and then [open a shell in the pod](#ssh-to-a-pod) and run your jobs from there.\n\n  **If you do this, make sure to [delete the pod](#deleting-a-pod) once you are done to free the resources!**\n\n- To run more complex or multiple commands, you can use:\n  ```yaml\n  command: [\"/bin/bash\", \"-c\"]\n  args: [\"command1; command2 \u0026\u0026 command3\"]\n  ```\n\n  For example:\n  ```yaml\n  command: [\"/bin/bash\", \"-c\"]\n  args: [\"cd /mlodata1/jaggi/ml \u0026\u0026 python automl.py\"]\n  ```\n\n  _The resources are freed automatically once the command has finished. The pod then gets the status `Completed` but is not deleted._\n\n\n## Using Kubernetes\n### Creating a pod\nGo to the directory containing your Kubernetes config file and run:\n```bash\nkubectl create -f \u003cyour-configfile-name\u003e.yaml\n```\n\n### Checking pod status\n```bash\nkubectl get pods  # get all pods\nkubectl get pods -l user=jaggi  # filter by label (defined in the config file)\nkubectl get pod jaggi-pod  # get by pod name\n```\n\n### SSH to a pod\n```bash\n# Opens an interactive shell in the running container (kubectl exec, not a real SSH session)\nkubectl exec -it jaggi-pod -- /bin/bash\n```\n\n### Deleting a pod\n```bash\nkubectl delete pod jaggi-pod\n```\n\n### Getting information on a pod\nUseful for debugging:\n```bash\nkubectl describe pod jaggi-pod\nkubectl get pod jaggi-pod -o yaml\nkubectl logs jaggi-pod\n```\n\n## Note on Storage across icclusters\n### Mounting `/mlo-container-scratch`\nFollow the instructions in `Kubernetes basics`, and add the following to your container spec:\n```yaml\nvolumeMounts:\n- mountPath: /scratch\n  name: mlo-scratch\n  subPath: YOUR_USERNAME\n```\n\nand the following to your pod spec:\n\n```yaml\nvolumes:\n- name: mlo-scratch\n  persistentVolumeClaim:\n    claimName: mlo-scratch\n```\n### Mounting `/mlodata1`\n```yaml\nspec:\n  
volumes:\n  - name: mlodata1\n    persistentVolumeClaim:\n      claimName: pv-mlodata1\n  containers:\n  - name: ubuntu\n    volumeMounts:\n    - mountPath: /mlodata1\n      name: mlodata1\n```\n\n## Other resources and deployment templates\nKubernetes templates:\n* [`job` mode](https://github.com/epfml/kubernetes-setup/tree/master/templates/pod-job).\n* [`standalone` mode](https://github.com/epfml/kubernetes-setup/tree/master/templates/pod-standalone).\n* [`cluster` mode](https://github.com/epfml/kubernetes-setup/tree/master/templates/pod-cluster).\n\nPersonalized Dockerfiles:\n- [A Dockerfile from Thijs](https://github.com/epfml/job-monitor/blob/master/docker/worker/Dockerfile)\n- [A Dockerfile from Tao](https://github.com/IamTao/beta-kubernetes/blob/master/images/base/Dockerfile)\n\nMore documentation is available in [Tao's GitHub repository](https://github.com/IamTao/beta-kubernetes).\n\n## Some Tips (deprecated)\n* By default, a Docker container runs as root, so the files you write to the shared storage are owned by root. To avoid this, change the default user in the Docker image (the simple [Dockerfile](https://github.com/epfml/kubernetes-setup/blob/master/templates/pod-simple/Dockerfile#L32-L45) already does this; see also this [example from Tao](https://github.com/IamTao/beta-kubernetes/blob/29515feb07e953bf602339a7548461aeeaa59de2/images/base/Dockerfile#L56-L72)).\n* To avoid the error `sudo: no tty present and no askpass program specified`, use `sudo -S \u003ccommand\u003e`, which reads the password from standard input.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fepfml%2Fkubernetes-setup","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fepfml%2Fkubernetes-setup","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fepfml%2Fkubernetes-setup/lists"}