https://github.com/kserve/kserve
Standardized Serverless ML Inference Platform on Kubernetes
- Host: GitHub
- URL: https://github.com/kserve/kserve
- Owner: kserve
- License: apache-2.0
- Created: 2019-03-27T21:14:14.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2025-05-05T01:41:21.000Z (12 months ago)
- Last Synced: 2025-05-06T17:13:54.861Z (12 months ago)
- Topics: artificial-intelligence, genai, hacktoberfest, istio, k8s, knative, kserve, kubeflow, kubernetes, llm-inference, machine-learning, mlops, model-interpretability, model-serving, pytorch, service-mesh, sklearn, tensorflow, xgboost
- Language: Python
- Homepage: https://kserve.github.io/website/
- Size: 423 MB
- Stars: 4,135
- Watchers: 68
- Forks: 1,169
- Open Issues: 490
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Security: SECURITY.md
- Roadmap: ROADMAP.md
Awesome Lists containing this project
- awesome-mlops - KServe
- awesome-llmops - KServe (Large Scale Deployment / ML Platforms)
- awesome-kubeflow - KServe
- StarryDivineSky - kserve/kserve
- awesome-production-machine-learning - KServe - KServe provides a Kubernetes Custom Resource Definition for serving predictive and generative ML. (Deployment and Serving)
- awesome-mlops - KFServing (KServe) - Kubernetes-based model serving with autoscaling and inference graph support. (Model Deployment)
- awesome-cloud-native - kserve - Standardized Serverless ML Inference Platform on Kubernetes. (AI & Machine Learning Platforms)
- awesome-data-analysis - KServe - Standardized serverless inference platform for deploying and serving machine learning models on Kubernetes. (🚀 MLOps / Tools)
- awesome-opensource-ai - KServe - Kubernetes-based model serving. (📋 Contents / 📊 8. MLOps / LLMOps & Production)
- Awesome-LLMOps - KServe (Inference / Inference Platform)
README
# KServe
[Go Reference](https://pkg.go.dev/github.com/kserve/kserve)
[Go CI](https://github.com/kserve/kserve/actions/workflows/go.yml)
[Go Report Card](https://goreportcard.com/report/github.com/kserve/kserve)
[OpenSSF Best Practices](https://bestpractices.coreinfrastructure.org/projects/6643)
[Releases](https://github.com/kserve/kserve/releases)
[License](https://github.com/kserve/kserve/blob/master/LICENSE)
[Community Support](https://github.com/kserve/community/blob/main/README.md#questions-and-issues)
[Gurubase](https://gurubase.io/g/kserve)
KServe is a standardized, distributed generative and predictive AI inference platform for scalable, multi-framework deployment on Kubernetes.
KServe is being [used by many organizations](https://kserve.github.io/website/docs/community/adopters) and is a [Cloud Native Computing Foundation (CNCF)](https://www.cncf.io/) incubating project.
For more details, visit the [KServe website](https://kserve.github.io/website/).

### Why KServe?
Single platform that unifies Generative and Predictive AI inference on Kubernetes. Simple enough for quick deployments, yet powerful enough to handle enterprise-scale AI workloads with advanced features.
### Features
**Generative AI**
* 🧮 **Optimized Backends**: Support for vLLM and llm-d backends for high-performance LLM serving
* 📌 **Standardization**: OpenAI-compatible inference protocol for seamless integration with LLMs
* 🚅 **GPU Acceleration**: High-performance serving with GPU support and optimized memory management for large models
* 💾 **Model Caching**: Intelligent model caching to reduce loading times and improve response latency for frequently used models
* 🗂️ **KV Cache Offloading**: Advanced memory management with KV cache offloading to CPU/disk for handling longer sequences efficiently
* 📈 **Autoscaling**: Request-based autoscaling capabilities optimized for generative workload patterns
* 🔧 **Hugging Face Ready**: Native support for Hugging Face models with streamlined deployment workflows
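
As a sketch of how the generative features above come together, the following InferenceService deploys a Hugging Face model on a GPU. The model name, model ID, and GPU count are illustrative; consult the KServe documentation for the exact fields supported by your version:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: huggingface-llm          # illustrative name
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface        # selects the Hugging Face serving runtime
      args:
        - --model_name=llm       # name exposed via the OpenAI-compatible API
        - --model_id=meta-llama/meta-llama-3-8b-instruct  # example model ID
      resources:
        limits:
          nvidia.com/gpu: "1"    # GPU acceleration for large models
```

Once applied with `kubectl apply -f`, the service exposes an OpenAI-compatible endpoint for chat completions.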
**Predictive AI**
* 🧮 **Multi-Framework**: Support for TensorFlow, PyTorch, scikit-learn, XGBoost, ONNX, and more
* 🔀 **Intelligent Routing**: Seamless request routing between predictor, transformer, and explainer components with automatic traffic management
* 🔄 **Advanced Deployments**: Canary rollouts, inference pipelines, and ensembles with InferenceGraph
* ⚡ **Autoscaling**: Request-based autoscaling with scale-to-zero for predictive workloads
* 🔍 **Model Explainability**: Built-in support for model explanations and feature attribution to understand prediction reasoning
* 📊 **Advanced Monitoring**: Enables payload logging, outlier detection, adversarial detection, and drift detection
* 💰 **Cost Efficient**: Scale-to-zero on expensive resources when not in use, reducing infrastructure costs
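
A minimal predictive InferenceService is a short manifest, as sketched below for a scikit-learn model; the storage URI points to the iris example model from the KServe docs and is illustrative:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris             # illustrative name
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn            # selects the scikit-learn serving runtime
      # example model location from the KServe documentation
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
```

Transformer and explainer components can be added alongside the predictor to enable the intelligent routing and explainability features described above.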
### Learn More
To learn more about KServe, how to use various supported features, and how to participate in the KServe community,
please follow the [KServe website documentation](https://kserve.github.io/website).
Additionally, we have compiled a list of [presentations and demos](https://kserve.github.io/website/docs/community/presentations) that dive deeper into various details.
### :hammer_and_wrench: Installation
#### Standalone Installation
- **[Standard Kubernetes Installation](https://kserve.github.io/website/docs/admin-guide/overview#raw-kubernetes-deployment)**: Compared to Serverless Installation, this is a more **lightweight** installation. However, this option does not support canary deployments or request-based autoscaling with scale-to-zero.
- **[Knative Installation](https://kserve.github.io/website/docs/admin-guide/overview#serverless-deployment)**: KServe by default installs Knative for **serverless deployment** of InferenceServices.
- **[ModelMesh Installation](https://kserve.github.io/website/docs/admin-guide/overview#modelmesh-deployment)**: You can optionally install ModelMesh to enable **high-scale**, **high-density** and **frequently-changing model serving** use cases.
- **[Quick Installation](https://kserve.github.io/website/docs/getting-started/quickstart-guide)**: Install KServe on your local machine.
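
For local experimentation, a quick install might look like the following. This assumes a running Kubernetes cluster (e.g. kind or minikube) with `kubectl` configured against it; the release tag is illustrative, so check the quickstart guide for the current version:

```shell
# Install KServe and its dependencies via the quick-install script
# (release tag is illustrative -- see the quickstart guide)
curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.15/hack/quick_install.sh" | bash

# Verify that the KServe controller is running
kubectl get pods -n kserve
```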
#### Kubeflow Installation
KServe is an important add-on component of Kubeflow; learn more from the [Kubeflow KServe documentation](https://www.kubeflow.org/docs/external-add-ons/kserve/kserve). Check out the following guides for running [on AWS](https://awslabs.github.io/kubeflow-manifests/main/docs/component-guides/kserve) or [on OpenShift Container Platform](https://github.com/kserve/kserve/blob/master/docs/OPENSHIFT_GUIDE.md).
### :flight_departure: [Create your first InferenceService](https://kserve.github.io/website/docs/getting-started/genai-first-isvc)
### :bulb: [Roadmap](./ROADMAP.md)
### :blue_book: [InferenceService API Reference](https://kserve.github.io/website/docs/reference/crd-api)
### :toolbox: [Developer Guide](https://kserve.github.io/website/docs/developer-guide)
### :writing_hand: [Contributor Guide](https://kserve.github.io/website/docs/developer-guide/contribution)
### :handshake: [Adopters](https://kserve.github.io/website/docs/community/adopters)
### Star History
[Star History Chart](https://www.star-history.com/#kserve/kserve&Date)
### Contributors
Thanks to all of our amazing contributors!