{"id":29138064,"url":"https://github.com/sgl-project/ome","last_synced_at":"2026-03-17T08:36:00.314Z","repository":{"id":301330214,"uuid":"986553289","full_name":"sgl-project/ome","owner":"sgl-project","description":"Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, TensorRT-LLM, and Triton","archived":false,"fork":false,"pushed_at":"2026-03-10T18:29:31.000Z","size":25355,"stargazers_count":390,"open_issues_count":79,"forks_count":62,"subscribers_count":13,"default_branch":"main","last_synced_at":"2026-03-11T00:15:21.920Z","etag":null,"topics":["deepseek","k8s","kimi-k2","llama","llm","llm-inference","model-as-a-service","model-serving","multi-node-kubernetes","oracle-cloud","pd-disaggregation","qwen","sglang","vllm"],"latest_commit_sha":null,"homepage":"http://docs.sglang.ai/ome/","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sgl-project.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":"NOTICE","maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-05-19T19:30:46.000Z","updated_at":"2026-03-10T22:22:03.000Z","dependencies_parsed_at":"2025-08-13T01:17:05.427Z","dependency_job_id":"59f0dd07-e5df-4ec7-9405-43147f9ef223","html_url":"https://github.com/sgl-project/ome","commit_stats":null,"previous_names":["sgl-project/ome"],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/sgl-project/ome","repository_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/sgl-project%2Fome","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sgl-project%2Fome/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sgl-project%2Fome/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sgl-project%2Fome/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sgl-project","download_url":"https://codeload.github.com/sgl-project/ome/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sgl-project%2Fome/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30619216,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-17T08:10:05.930Z","status":"ssl_error","status_checked_at":"2026-03-17T08:10:04.972Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deepseek","k8s","kimi-k2","llama","llm","llm-inference","model-as-a-service","model-serving","multi-node-kubernetes","oracle-cloud","pd-disaggregation","qwen","sglang","vllm"],"created_at":"2025-06-30T13:04:37.580Z","updated_at":"2026-03-17T08:36:00.309Z","avatar_url":"https://github.com/sgl-project.png","language":"Go","readme":"# OME (Open Model Engine) — Kubernetes Operator for LLM Serving\n\n[![Go 
Report](https://goreportcard.com/badge/github.com/sgl-project/ome)](https://goreportcard.com/report/github.com/sgl-project/ome)\n[![Latest Release](https://img.shields.io/github/v/release/sgl-project/ome?include_prereleases)](https://github.com/sgl-project/ome/releases/latest)\n[![API Reference](https://img.shields.io/badge/API-v1beta1-blue)](https://sgl-project.github.io/ome/docs/reference/ome.v1beta1/)\n[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)\n[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/sgl-project/ome)\n\n\u003cdiv align=\"center\"\u003e\n  \u003ca href=\"https://github.com/sgl-project/ome\"\u003e\n    \u003cimg src=\"site/assets/icons/logo-clear-background.png\" alt=\"OME Logo\" width=\"300\" height=\"auto\" style=\"max-width: 100%;\"\u003e\n  \u003c/a\u003e\n\u003c/div\u003e\n\n## What is OME?\nOME (Open Model Engine) is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs). It optimizes the deployment and operation of LLMs by automating model management, intelligent runtime selection, efficient resource utilization, and sophisticated deployment patterns.\n\nRead the [documentation](https://sgl-project.github.io/ome/docs/) to learn more about OME capabilities and features.\n\n## Features Overview\n\n- **Model Management:** Models are first-class citizen custom resources in OME. Sophisticated model parsing extracts architecture, parameter count, and capabilities directly from model files. Supports distributed storage with automated repair, double encryption, namespace scoping, and multiple formats (SafeTensors, PyTorch, TensorRT, ONNX). 
See the [supported models reference](config/models/SUPPORTED_MODELS.md) for a comprehensive list of pre-configured models including Llama, Qwen, DeepSeek, Gemma, Phi, and 80+ other model families.\n\n- **Intelligent Runtime Selection:** Automatic matching of models to optimal runtime configurations through weighted scoring based on architecture, format, quantization, parameter size, and framework compatibility.\n\n- **Optimized Deployments:** Supports multiple deployment patterns including prefill-decode disaggregation, multi-node inference, and traditional Kubernetes deployments with advanced scaling controls.\n\n- **Resource Optimization:** Specialized GPU bin-packing scheduling with dynamic re-optimization to maximize cluster efficiency while ensuring high availability.\n\n- **Runtime Integrations:** First-class support for [**SGLang**](https://github.com/sgl-project/sglang) - the most advanced inference engine with cache-aware load balancing, multi-node deployment, prefill-decode disaggregated serving, multi-LoRA adapter serving, and much more. Also supports [**vLLM**](https://github.com/vllm-project/vllm) for high-throughput inference and Triton for general model inference.\n\n- **Accelerator Management:** Hardware-aware scheduling through AcceleratorClass resources that define GPU capabilities, discovery patterns, and cost information. 
Enables intelligent accelerator selection with policies like BestFit, Cheapest, or MostCapable.\n\n- **Web Console:** Modern web interface for managing models, serving runtimes, and inference services with real-time updates and HuggingFace model search integration.\n\n- **Kubernetes Ecosystem Integration:** Deep integration with modern Kubernetes components including [Kueue](https://kueue.sigs.k8s.io/) for gang scheduling of multi-pod workloads, [LeaderWorkerSet](https://github.com/kubernetes-sigs/lws) for resilient multi-node deployments, [KEDA](https://keda.sh/) for advanced custom metrics-based autoscaling, [K8s Gateway API](https://gateway-api.sigs.k8s.io/) for sophisticated traffic routing, and [Gateway API Inference Extension](https://gateway-api-inference-extension.sigs.k8s.io/) for standardized inference endpoints.\n\n- **Automated Benchmarking:** Built-in performance evaluation through the BenchmarkJob custom resource, supporting configurable traffic patterns, concurrent load testing, and comprehensive result storage. 
Enables systematic performance comparison across models and service configurations.\n\n## Production Readiness Status\n\n- ✅ API version: v1beta1\n- ✅ Comprehensive [documentation](https://sgl-project.github.io/ome/docs/)\n- ✅ Unit and integration test coverage\n- ✅ Production deployments with large-scale LLM workloads\n- ✅ Monitoring via standard metrics and Kubernetes events\n- ✅ Security: RBAC-based access control and model encryption\n- ✅ High availability mode with redundant model storage\n\n## Installation\n\n**Requires Kubernetes 1.28 or newer**\n\n### Option 1: OCI Registry (Recommended)\n\nInstall OME directly from the OCI registry:\n\n```bash\n# Install OME CRDs\nhelm upgrade --install ome-crd oci://ghcr.io/moirai-internal/charts/ome-crd --namespace ome --create-namespace\n\n# Install OME resources\nhelm upgrade --install ome oci://ghcr.io/moirai-internal/charts/ome-resources --namespace ome\n```\n\n### Option 2: Helm Repository\n\nInstall using the traditional Helm repository:\n\n```bash\n# Add the OME Helm repository\nhelm repo add ome https://sgl-project.github.io/ome\nhelm repo update\n\n# Install OME CRDs first\nhelm upgrade --install ome-crd ome/ome-crd --namespace ome --create-namespace\n\n# Install OME resources\nhelm upgrade --install ome ome/ome-resources --namespace ome\n```\n\n### Option 3: Install from Source\n\nFor development or customization:\n\n```bash\n# Clone the repository\ngit clone https://github.com/sgl-project/ome.git\ncd ome\n\n# Install from local charts\nhelm install ome-crd charts/ome-crd --namespace ome --create-namespace\nhelm install ome charts/ome-resources --namespace ome\n```\n\nRead the [installation guide](https://sgl-project.github.io/ome/docs/installation/) for more options and advanced configurations.\n\nLearn more about:\n- OME [concepts](https://sgl-project.github.io/ome/docs/concepts/)\n- Common [tasks](https://sgl-project.github.io/ome/docs/tasks/)\n\n## Architecture\n\nOME uses a component-based architecture 
built on Kubernetes custom resources:\n\n- **BaseModel/ClusterBaseModel:** Defines model sources and metadata with automatic parsing of architecture, parameters, and capabilities\n- **FineTunedWeight:** Defines LoRA adapters and fine-tuned weights that extend base models\n- **ServingRuntime/ClusterServingRuntime:** Defines how models are served with runtime-specific configurations\n- **InferenceService:** Connects models to runtimes for deployment with support for prefill-decode disaggregation and multi-node inference\n- **AcceleratorClass:** Defines GPU hardware classes with capabilities, discovery patterns, and cost information for intelligent scheduling\n- **BenchmarkJob:** Measures model performance under different workloads with configurable traffic patterns\n\nOME's controller automatically:\n1. Downloads and parses models to understand their characteristics\n2. Selects the optimal runtime configuration for each model\n3. Matches models to appropriate accelerators based on requirements\n4. Generates Kubernetes resources for efficient deployment\n5. Continuously optimizes resource utilization across the cluster\n\n## Roadmap\n\nHigh-level overview of the main priorities:\n\n- Enhanced model parsing for additional model families and architectures\n- Support for model quantization and optimization workflows\n- Federation across multiple Kubernetes clusters\n\n## Community and Support\n\n- [GitHub Issues](https://github.com/sgl-project/ome/issues) for bug reports and feature requests\n- [Documentation](https://sgl-project.github.io/ome/docs/) for guides and reference\n\n## License\n\nOME is licensed under the [Apache License 2.0](LICENSE).","funding_links":[],"categories":["Inference"],"sub_categories":["Inference Platform"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsgl-project%2Fome","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsgl-project%2Fome","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsgl-project%2Fome/lists"}