{"id":18223798,"url":"https://github.com/kaito-project/kaito","last_synced_at":"2026-03-16T13:16:59.494Z","repository":{"id":207436651,"uuid":"689172674","full_name":"kaito-project/kaito","owner":"kaito-project","description":"Kubernetes AI Toolchain Operator","archived":false,"fork":false,"pushed_at":"2025-04-02T05:48:37.000Z","size":27399,"stargazers_count":560,"open_issues_count":49,"forks_count":75,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-04-02T06:16:07.094Z","etag":null,"topics":["ai","gpu","kubernetes","operator"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kaito-project.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"docs/contributing/readme.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":"SECURITY.md","support":"SUPPORT.md","governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-09T01:53:38.000Z","updated_at":"2025-04-02T03:39:33.000Z","dependencies_parsed_at":"2024-01-29T03:31:43.769Z","dependency_job_id":"9710605a-bfe3-4f2b-a870-04d1165b1d72","html_url":"https://github.com/kaito-project/kaito","commit_stats":null,"previous_names":["azure/kaito","azure/kdm","kaito-project/kaito"],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kaito-project%2Fkaito","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kaito-project%2Fkaito/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kaito-project%2Fkaito/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kaito-project%
2Fkaito/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kaito-project","download_url":"https://codeload.github.com/kaito-project/kaito/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246944377,"owners_count":20858773,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","gpu","kubernetes","operator"],"created_at":"2024-11-04T01:02:17.499Z","updated_at":"2026-03-16T13:16:59.489Z","avatar_url":"https://github.com/kaito-project.png","language":"Go","funding_links":[],"categories":["Go","kubernetes","Serving"],"sub_categories":["Frameworks/Servers for Serving"],"readme":"# Kubernetes AI Toolchain Operator (KAITO)\n\n![GitHub Release](https://img.shields.io/github/v/release/kaito-project/kaito)\n[![Go Report Card](https://goreportcard.com/badge/github.com/kaito-project/kaito)](https://goreportcard.com/report/github.com/kaito-project/kaito)\n![GitHub go.mod Go version](https://img.shields.io/github/go-mod/go-version/kaito-project/kaito)\n[![codecov](https://codecov.io/gh/kaito-project/kaito/graph/badge.svg?token=XAQLLPB2AR)](https://codecov.io/gh/kaito-project/kaito)\n[![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2Fkaito-project%2Fkaito.svg?type=shield)](https://app.fossa.com/projects/git%2Bgithub.com%2Fkaito-project%2Fkaito?ref=badge_shield)\n\n| ![notification](website/static/img/bell.svg) What is NEW!                                                                                                                                                                       
                         |\n| ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| ALL vLLM-supported models can now be run in KAITO. Check the latest [release](https://github.com/kaito-project/kaito/releases). |\n| Latest Release: Feb 26th, 2026. KAITO v0.9.0. |\n| First Release: Nov 15th, 2023. KAITO v0.1.0. |\n\nKAITO is an operator that automates the AI/ML model inference or tuning workload in a Kubernetes cluster.\nThe target models are popular open-source large models such as [phi-4](https://huggingface.co/microsoft/phi-4) and [llama](https://huggingface.co/meta-llama).\nKAITO has the following key differentiators compared to most mainstream model deployment methodologies built on top of virtual machine infrastructures:\n\n- Provide an OpenAI-compatible server to perform inference calls.\n- Provide preset configurations to avoid adjusting workload parameters based on GPU hardware.\n- Provide support for popular open-source inference runtimes: [vLLM](https://github.com/vllm-project/vllm) and [transformers](https://github.com/huggingface/transformers).\n- Auto-provision GPU nodes based on model requirements.\n- Autoscale the inference workload based on the service monitoring metrics.\n- Leverage local NVMe as the primary storage to store model weight files.\n- Support Gateway API Inference Extension.\n\nUsing KAITO, the workflow of onboarding large AI inference models in Kubernetes is greatly simplified.\n\n## Architecture\n\nKAITO follows the classic Kubernetes Custom Resource Definition (CRD)/controller design pattern. The user manages a `workspace` custom resource that describes the GPU requirements and the inference or tuning specification. 
The KAITO controllers automate the deployment by reconciling the `workspace` custom resource.\n\u003cdiv align=\"left\"\u003e\n  \u003cimg src=\"website/static/img/arch.png\" width=80% title=\"KAITO architecture\" alt=\"KAITO architecture\"\u003e\n\u003c/div\u003e\n\nThe above figure presents the KAITO architecture overview. Its major components are:\n\n- **Workspace controller**: It reconciles the `workspace` custom resource, creates `NodeClaim` (explained below) custom resources to trigger node auto-provisioning, and creates the inference or tuning workload (`deployment`, `statefulset` or `job`) based on the model preset configurations.\n- **Node provisioner controller**: This controller is named *gpu-provisioner* in the [gpu-provisioner Helm chart](https://github.com/Azure/gpu-provisioner/tree/main/charts/gpu-provisioner). It uses the `NodeClaim` CRD, which originated from [Karpenter](https://sigs.k8s.io/karpenter), to interact with the workspace controller. It integrates with Azure Resource Manager REST APIs to add new GPU nodes to the AKS or AKS Arc cluster.\n\u003e Note: The [*gpu-provisioner*](https://github.com/Azure/gpu-provisioner) is an open-source component. It can be replaced by other controllers if they support [Karpenter-core](https://sigs.k8s.io/karpenter) APIs.\n\n**NEW!** Starting with v0.5.0, KAITO includes a new operator, **RAGEngine**, which streamlines the management of a Retrieval Augmented Generation (RAG) service.\n\u003cdiv align=\"left\"\u003e\n  \u003cimg src=\"website/static/img/ragarch.png\" width=80% title=\"KAITO RAGEngine architecture\" alt=\"KAITO RAGEngine architecture\"\u003e\n\u003c/div\u003e\n\nAs illustrated in the above figure, the **RAGEngine controller** reconciles the `ragengine` custom resource and creates a `RAGService` deployment. 
The `RAGService` provides the following capabilities:\n  - **Orchestration**: uses the [LlamaIndex](https://github.com/run-llama/llama_index) orchestrator.\n  - **Embedding**: supports both local and remote embedding services for embedding queries and documents in the vector database.\n  - **Vector database**: supports a built-in [faiss](https://github.com/facebookresearch/faiss) in-memory vector database. Remote vector database support will be added soon.\n  - **Backend inference**: supports any OpenAI-compatible inference service.\n\nThe details of the service APIs can be found in this [document](https://kaito-project.github.io/kaito/docs/rag).\n\n\n## Installation\n\n- **Workspace**: Please check the installation guidance [here](https://kaito-project.github.io/kaito/docs/installation) for deployment using Helm and [here](./terraform/README.md) for deployment using Terraform.\n- **RAGEngine**: Please check the installation guidance [here](https://kaito-project.github.io/kaito/docs/rag).\n\n## Workspace quick start\n\nAfter installing KAITO, one can try the following commands to start a phi-3.5-mini-instruct inference service.\n\n```sh\n$ cat examples/inference/kaito_workspace_phi_3.5-instruct.yaml\napiVersion: kaito.sh/v1beta1\nkind: Workspace\nmetadata:\n  name: workspace-phi-3-5-mini\nresource:\n  instanceType: \"Standard_NC24ads_A100_v4\"\n  labelSelector:\n    matchLabels:\n      apps: phi-3-5\ninference:\n  preset:\n    name: phi-3.5-mini-instruct\n\n$ kubectl apply -f examples/inference/kaito_workspace_phi_3.5-instruct.yaml\n```\n\nThe workspace status can be tracked by running the following command. 
When the STATE column becomes `Ready`, the model has been deployed successfully.\n\n```sh\n$ kubectl get workspace workspace-phi-3-5-mini\nNAME                     INSTANCE                   RESOURCEREADY   INFERENCEREADY   JOBSTARTED   WORKSPACESUCCEEDED   TARGETNODECOUNT   STATE   AGE\nworkspace-phi-3-5-mini   Standard_NC24ads_A100_v4   True            True                          True                 1                 Ready   24m\n```\n\nNext, one can find the inference service's cluster IP and use a temporary `curl` pod to test the service endpoint in the cluster.\n\n```sh\n# find service endpoint\n$ kubectl get svc workspace-phi-3-5-mini\nNAME                     TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)            AGE\nworkspace-phi-3-5-mini   ClusterIP   \u003cCLUSTERIP\u003e  \u003cnone\u003e        80/TCP,29500/TCP   10m\n$ export CLUSTERIP=$(kubectl get svc workspace-phi-3-5-mini -o jsonpath=\"{.spec.clusterIPs[0]}\")\n\n# find available models\n$ kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -s  http://$CLUSTERIP/v1/models | jq\n{\n  \"object\": \"list\",\n  \"data\": [\n    {\n      \"id\": \"phi-3.5-mini-instruct\",\n      \"object\": \"model\",\n      \"created\": 1733370094,\n      \"owned_by\": \"vllm\",\n      \"root\": \"/workspace/vllm/weights\",\n      \"parent\": null,\n      \"max_model_len\": 16384\n    }\n  ]\n}\n\n# make an inference call using the model id (phi-3.5-mini-instruct) from previous step\n$ kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -X POST http://$CLUSTERIP/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"model\": \"phi-3.5-mini-instruct\",\n    \"messages\": [{\"role\": \"user\", \"content\": \"What is kubernetes?\"}],\n    \"max_tokens\": 50,\n    \"temperature\": 0\n  }'\n```\n\n## Usage\n\nThe detailed usage for KAITO supported models can be found in [**HERE**](https://kaito-project.github.io/kaito/docs/presets). 
If users want to deploy their own containerized models, they can provide the pod template in the `inference` field of the workspace custom resource (please see [API definitions](./api/v1alpha1/workspace_types.go) for details).\n\n\u003e Note: Currently, the controller does **NOT** handle automatic model upgrades. It only creates inference workloads based on the preset configurations if the workloads do not exist.\n\nThe number of supported models in KAITO is growing! Please check [this](https://kaito-project.github.io/kaito/docs/preset-onboarding) document to see how to add a new supported model. Refer to the [tuning document](https://kaito-project.github.io/kaito/docs/tuning), [inference document](https://kaito-project.github.io/kaito/docs/inference), [RAGEngine document](https://kaito-project.github.io/kaito/docs/rag), [LoRA adapters guide](https://kaito-project.github.io/kaito/docs/lora-adapters), and [FAQ](https://kaito-project.github.io/kaito/docs/faq) for more information.\n\n## Contributing\n\n[Read more](https://kaito-project.github.io/kaito/docs/contributing)\n\u003c!-- markdown-link-check-disable --\u003e\nThis project welcomes contributions and suggestions. The contributions require you to agree to a\nContributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us\nthe rights to use your contribution. For details, visit [CLAs for CNCF](https://github.com/cncf/cla?tab=readme-ov-file).\n\nWhen you submit a pull request, a CLA bot will automatically determine whether you need to provide\na CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions\nprovided by the bot. You will only need to do this once across all repos using our CLA.\n\nThis project has adopted the CLAs for CNCF. Please electronically sign the CLA via\nhttps://easycla.lfx.linuxfoundation.org. 
If you encounter issues, you can submit a ticket with the\nLinux Foundation ID group through the [Linux Foundation Support website](https://jira.linuxfoundation.org/plugins/servlet/desk/portal/4/create/143).\n\n## Get Involved!\n\n- Visit the [#KAITO channel in CNCF Slack](https://cloud-native.slack.com/archives/C09B4EWCZ5M) to discuss features in development and proposals.\n- We host a weekly community meeting for contributors on Tuesdays at 4:00pm PST. Please join here: [meeting link](https://zoom-lfx.platform.linuxfoundation.org/meeting/99948431028?password=05912bb9-53fb-4b22-a634-ab5f8261e94c).\n- Reference the weekly meeting notes in our [KAITO community calls doc](https://docs.google.com/document/d/1OEC-WUQ2wn0TDQPsU09shMoXn5cW3dSrdu-M43Q79dA/edit?usp=sharing)!\n\n## License\n\nSee [Apache License 2.0](LICENSE).\n\n[![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2Fkaito-project%2Fkaito.svg?type=large)](https://app.fossa.com/projects/git%2Bgithub.com%2Fkaito-project%2Fkaito?ref=badge_large)\n\n## Code of Conduct\n\nKAITO has adopted the [Cloud Native Computing Foundation Code of Conduct](https://github.com/cncf/foundation/blob/main/code-of-conduct.md). For more information see the [KAITO Code of Conduct](CODE_OF_CONDUCT.md).\n\n\u003c!-- markdown-link-check-enable --\u003e\n## Contact\n\n- Please send emails to \"KAITO devs\" \u003ckaito-dev@microsoft.com\u003e for any issues.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkaito-project%2Fkaito","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkaito-project%2Fkaito","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkaito-project%2Fkaito/lists"}