An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with model-serving

A curated list of projects in awesome lists tagged with model-serving.

https://github.com/vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

amd cuda deepseek gpt hpu inference inferentia llama llm llm-serving llmops mlops model-serving pytorch qwen rocm tpu trainium transformer xpu

Last synced: 29 Jan 2026

https://github.com/bentoml/bentoml

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

ai-inference deep-learning generative-ai inference-platform llm llm-inference llm-serving llmops machine-learning ml-engineering mlops model-inference-service model-serving multimodal python

Last synced: 12 May 2025

https://github.com/FedML-AI/FedML

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.

ai-agent deep-learning distributed-training edge-ai federated-learning inference-engine machine-learning mlops model-deployment model-serving on-device-training

Last synced: 04 Apr 2025

https://github.com/modeltc/lightllm

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

deep-learning gpt llama llm model-serving nlp openai-triton

Last synced: 13 May 2025

https://github.com/predibase/lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

fine-tuning gpt llama llm llm-inference llm-serving llmops lora model-serving pytorch transformers

Last synced: 12 May 2025
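
Serving many fine-tuned LLMs from one server is practical because a LoRA adapter is a small low-rank update on top of a shared, frozen base model, so only the adapters differ between variants. The sketch below illustrates that idea for a single linear layer using the standard LoRA update h = W x + (alpha / r) B A x, with plain Python lists. The class and method names (`MultiLoRALinear`, `add_adapter`) are hypothetical; this is not LoRAX's API or implementation.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class MultiLoRALinear:
    """Toy linear layer: one frozen base weight W shared by many LoRA adapters.

    Each adapter stores A (r x d_in), B (d_out x r) and alpha; the adapted
    output is W x + (alpha / r) * B (A x). Only the small A and B matrices
    differ per adapter, so W is stored once.
    """

    def __init__(self, base_weight: Matrix) -> None:
        self.base_weight = base_weight
        self.adapters: Dict[str, Tuple[Matrix, Matrix, float]] = {}

    def add_adapter(self, name: str, a: Matrix, b: Matrix, alpha: float) -> None:
        self.adapters[name] = (a, b, alpha)

    def forward(self, x: Vector, adapter: Optional[str] = None) -> Vector:
        out = matvec(self.base_weight, x)  # shared base path
        if adapter is None:
            return out
        a, b, alpha = self.adapters[adapter]
        scale = alpha / len(a)  # len(a) is the rank r
        delta = matvec(b, matvec(a, x))  # low-rank path B (A x)
        return [o + scale * d for o, d in zip(out, delta)]


# Usage: two requests share the same base weights but pick different adapters.
layer = MultiLoRALinear([[1.0, 0.0], [0.0, 1.0]])
layer.add_adapter("adapter-a", a=[[1.0, 1.0]], b=[[1.0], [0.0]], alpha=1.0)
print(layer.forward([2.0, 3.0]))               # [2.0, 3.0]  (base model only)
print(layer.forward([2.0, 3.0], "adapter-a"))  # [7.0, 3.0]  (base + adapter-a)
```

Because the adapter is selected per call, requests for different fine-tunes can hit the same loaded base weights; a production server would additionally batch those requests and keep adapters on the accelerator.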

https://github.com/HuaizhengZhang/Awesome-System-for-Machine-Learning

🚀 Awesome System for Machine Learning ⚑️ AI System Papers and Industry Practice. ⚑️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.

ai-infra genai large-language-models llmsys mlsys model-serving model-training

Last synced: 09 Apr 2025

https://github.com/beclab/Olares

Olares: An Open-Source Sovereign Cloud OS for Local AI

ai-agents ai-privacy edge-ai home-automation homelab homeserver kubernetes local-ai mcp model-serving nas self-hosted

Last synced: 02 May 2025

https://github.com/mlrun/mlrun

MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.

data-engineering data-science experiment-tracking kubernetes machine-learning mlops mlops-workflow model-serving python workflow

Last synced: 18 Feb 2026

https://github.com/zhihu/zhilight

A highly optimized LLM inference acceleration engine for Llama and its variants.

cuda deepseek-r1 gpt inference-engine llama llm llm-inference llm-serving model-serving pytorch

Last synced: 15 May 2025

https://github.com/alibaba/rtp-llm

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

gpt inference llama llm llm-serving llmops model-serving

Last synced: 14 Oct 2025

https://github.com/mosecorg/mosec

A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine.

cv deep-learning gpu hacktoberfest jax llm llm-serving machine-learning machine-learning-platform mlops model-serving mxnet nerual-network python pytorch rust tensorflow tts

Last synced: 14 May 2025
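
Dynamic batching, as named in mosec's description, generally means grouping requests that arrive close together so the model runs once per group instead of once per request. A minimal, generic sketch of the collection step, assuming requests arrive on a standard `queue.Queue` and using two illustrative knobs, `max_batch_size` and `max_wait_s`; the function names are hypothetical and this is not mosec's API.

```python
import queue
import time
from typing import Any, Callable, List


def collect_batch(q: "queue.Queue[Any]", max_batch_size: int, max_wait_s: float) -> List[Any]:
    """Gather up to max_batch_size requests, waiting at most max_wait_s after the first."""
    batch = [q.get()]  # block until at least one request arrives
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # wait budget spent: run a partial batch
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # no more requests arrived in time
    return batch


def serve_one_batch(
    q: "queue.Queue[Any]",
    model_fn: Callable[[List[Any]], List[Any]],
    max_batch_size: int,
    max_wait_s: float,
) -> List[Any]:
    """Run the model once on a collected batch; model_fn maps a list of inputs to a list of outputs."""
    return model_fn(collect_batch(q, max_batch_size, max_wait_s))


# Usage: five queued requests are served as a full batch of 3, then a partial batch of 2.
requests: "queue.Queue[int]" = queue.Queue()
for i in range(5):
    requests.put(i)
times_ten = lambda xs: [x * 10 for x in xs]
print(serve_one_batch(requests, times_ten, 3, 0.1))
print(serve_one_batch(requests, times_ten, 3, 0.1))
```

The two knobs trade latency for throughput: a larger `max_wait_s` fills batches more often under light load, at the cost of delaying the first request in each batch.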

https://github.com/bentoml/yatai

Model Deployment at Scale on Kubernetes 🦄️

bentoml k8s kubernetes machine-learning mlops model-deployment model-serving

Last synced: 16 May 2025

https://github.com/kitops-ml/kitops

An open source DevOps tool for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI artifact.

ai code datasets devops devops-tools gguf hacktoberfest kubernetes kubernetes-deployment ml mlops mlops-tools model-interpretability model-serving models opensource platform-engineering pytorch sklearn tensorflow

Last synced: 15 May 2025

https://github.com/vllm-project/vllm-ascend

Community-maintained hardware plugin for vLLM on Ascend

ascend inference llm llm-serving llmops mlops model-serving transformer vllm

Last synced: 27 Feb 2026

https://github.com/efeslab/Nanoflow

A throughput-oriented high-performance serving framework for LLMs

cuda inference llama2 llm llm-serving model-serving

Last synced: 21 Apr 2025

https://github.com/openvinotoolkit/model_server

A scalable inference server for models optimized with OpenVINO™

ai cloud dag deep-learning edge genai inference kubernetes machine-learning model-serving openvino serving

Last synced: 14 May 2025

https://github.com/eightBEC/fastapi-ml-skeleton

FastAPI skeleton app for serving machine learning models in production.

fastapi machine-learning model-serving python python3

Last synced: 15 Mar 2025

https://github.com/bentoml/BentoDiffusion

BentoDiffusion: A collection of diffusion models served with BentoML

ai diffusion-models fine-tuning kubernetes lora model-serving stable-diffusion

Last synced: 17 Aug 2025

https://github.com/ai-hypercomputer/jetstream

JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future; PRs welcome).

gemma gpt gpu inference jax large-language-models llama llama2 llm llm-inference llmops mlops model-serving pytorch tpu transformer

Last synced: 23 Oct 2025

https://github.com/lightbend/kafka-with-akka-streams-kafka-streams-tutorial

Code samples for the Lightbend tutorial on writing microservices with Akka Streams, Kafka Streams, and Kafka

akka kafka-streams model-serving

Last synced: 02 May 2025

https://github.com/FederatedAI/FATE-Serving

A scalable, high-performance serving system for federated learning models

federated-learning inference model-serving model-versioning monitor

Last synced: 16 Nov 2025

https://github.com/project-monai/monai-deploy-app-sdk

MONAI Deploy App SDK offers a framework and associated tools to design, develop and verify AI-driven applications in the healthcare imaging domain.

ai deep-learning deploy dicom healthcare image-processing machine-learning medical-imaging ml ml-infrastructure ml-platform mlops model-deployment model-serving monai pipeline python pytorch workflow

Last synced: 16 May 2025

https://github.com/aporia-ai/inferencedb

🚀 Stream inferences of real-time ML models in production to any data lake (Experimental)

kafka machine-learning mlops model-monitoring model-serving s3

Last synced: 30 Apr 2025

https://github.com/thu-pacman/chitu

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

deepseek gpu llm llm-serving model-serving pytorch

Last synced: 17 Mar 2025

https://github.com/alibaba/servegen

A framework for generating realistic LLM serving workloads

deepseek llm llm-serving model-serving qwen

Last synced: 14 Oct 2025
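
For context on what a serving workload contains, a common simple synthetic baseline in serving benchmarks is a trace of requests with Poisson arrivals (exponentially distributed gaps) and randomly drawn prompt and output lengths. The sketch below builds such a trace with the standard `random` module. It is not ServeGen's method, and the log-normal length parameters are illustrative placeholders, not values measured from any real service; `Request` and `poisson_workload` are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    arrival_s: float  # seconds since the start of the trace
    prompt_tokens: int
    output_tokens: int


def poisson_workload(rate_per_s: float, duration_s: float, seed: int = 0) -> List[Request]:
    """Synthetic trace: Poisson arrivals plus log-normal prompt/output lengths.

    The log-normal parameters are illustrative placeholders only.
    """
    rng = random.Random(seed)  # private RNG so traces are reproducible
    requests: List[Request] = []
    t = 0.0
    while True:
        t += rng.expovariate(rate_per_s)  # exponential gaps give a Poisson process
        if t >= duration_s:
            return requests
        requests.append(
            Request(
                arrival_s=t,
                prompt_tokens=1 + int(rng.lognormvariate(6.0, 1.0)),
                output_tokens=1 + int(rng.lognormvariate(5.0, 1.0)),
            )
        )


trace = poisson_workload(rate_per_s=4.0, duration_s=10.0, seed=42)
print(len(trace), trace[0])
```

A fixed-rate Poisson trace ignores burstiness and shifts in the request mix over time, which is the kind of realism a dedicated workload generator aims to capture.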

https://github.com/kspviswa/PyOMlx

A wannabe Ollama equivalent for Apple MLX models

chatbot mlx model-serving

Last synced: 10 Apr 2025

https://github.com/ai-hypercomputer/jetstream-pytorch

PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference

attention batching gemma inference llama llama2 llm llm-inference model-serving pytorch tpu

Last synced: 27 Oct 2025

https://github.com/messense/fasttext-serving

fastText model serving service

fasttext model-server model-serving nlp

Last synced: 04 Apr 2025

https://github.com/bentoml/clip-api-service

CLIP as a service - Embed images and sentences, object recognition, visual reasoning, image classification and reverse image search

ai-applications clip cloud-native mlops model-inference model-inference-service model-serving openai-clip

Last synced: 04 May 2025

https://github.com/bentoml/BentoOCR

Turn any OCR model into an online inference API endpoint 🚀 🌖

ai-applications model-deployment model-serving ocr ocr-python

Last synced: 04 May 2025

https://github.com/bentoml/transformers-nlp-service

Online Inference API for NLP Transformer models - summarization, text classification, sentiment analysis and more

llm llmops mlops model-deployment model-inference-service model-serving nlp nlp-machine-learning online-inference transformer

Last synced: 04 May 2025

https://github.com/galileo-galilei/kedro-mlflow-tutorial

A tutorial on how to use the kedro-mlflow plugin (https://github.com/Galileo-Galilei/kedro-mlflow) to synchronize training and inference and to serve a Kedro pipeline

kedro kedro-mlflow kedro-tutorial mlflow mlops model-serving

Last synced: 12 May 2025

https://github.com/lightbend/kubeflow-recommender

Kubeflow example of machine learning/model serving

kubeflow machine-learning model-serving

Last synced: 02 May 2025

https://github.com/ml-libs/mlserve

mlserve turns your Python models into a RESTful API and serves a web page with a form generated to match your input data.

machine-learning mlserve model-deployment model-serving scikit-learn

Last synced: 28 Jan 2026

https://github.com/yuruofeifei/mms

MXNet Model Serving

model-serving mxnet

Last synced: 17 Apr 2025

https://github.com/a2i2/surround

Surround is a framework for building AI driven microservices in Python, https://surround.readthedocs.io/en/latest/

data-science machine-learning model-serving pipeline-framework python

Last synced: 14 Jan 2026

https://github.com/animator/titus2

Titus 2: Portable Format for Analytics (PFA) implementation for Python 3.4+

analytics inference inference-engine ml-engine model-deployment model-evaluation model-serving pfa pfa-standard pmml python scoring scoring-engine titus

Last synced: 22 Aug 2025

https://github.com/bentoml/fraud-detection-model-serving

Online model serving with a fraud detection model trained with XGBoost on the IEEE-CIS dataset

ai-applications fraud-detection model-deployment model-serving

Last synced: 07 Aug 2025

https://github.com/h2oai/mlops-dai-runtimes

Production-ready templates for deploying Driverless AI (DAI) scorers. https://h2oai.github.io/dai-deployment-templates/

h2o h2oai machine-learning model-deployment model-server model-serving mojo

Last synced: 07 Apr 2025

https://github.com/ksm26/efficiently-serving-llms

Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low-Rank Adaptation (LoRA), and gain hands-on experience with Predibase’s LoRAX inference server.

batch-processing deep-learning-techniques inference-optimization large-scale-deployment machine-learning-operations model-acceleration model-inference-service model-serving optimization-techniques performance-enhancement scalability-strategies server-optimization serving-infrastructure text-generation

Last synced: 02 Aug 2025

https://github.com/galileo-galilei/kedro-serving

A Kedro plugin to serve Kedro pipelines as an API

fastapi kedro kedro-plugin mlops model-serving pipeline-serving serving

Last synced: 12 May 2025

https://github.com/bentoml/diffusers-examples

API serving for your diffusers models

bentoml diffusers model-deployment model-serving

Last synced: 22 Jul 2025

https://github.com/adrien-legros/rhods-mnist

Data science pipelines and model serving using Red Hat OpenShift Data Science

data-science model-serving openshift-ai pipelines redhat rhoai rhods

Last synced: 17 Jan 2026

https://github.com/peva3/smarterrouter

SmarterRouter: An intelligent LLM gateway and VRAM-aware router for Ollama, llama.cpp, and OpenAI. Features semantic caching, model profiling, and automatic failover for local AI labs.

ai-cache ai-gateway docker fastapi gpu-monitoring llm llm-proxy llm-router local-llm model-serving ollama ollama-api openai-proxy self-hosted self-hosted-ai semantic-cache

Last synced: 27 Feb 2026
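
SmarterRouter lists semantic caching among its features. In general, a semantic cache returns a stored response when a new prompt is close enough in embedding space to one seen before, rather than requiring an exact string match. A minimal sketch under stated assumptions: the embedding function is supplied by the caller, similarity is cosine, lookup is a linear scan, and `threshold` is an illustrative value. `SemanticCache` and the toy `letter_counts` embedding are hypothetical and do not reflect SmarterRouter's implementation.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Return a cached response when a new prompt is similar enough to an old one."""

    def __init__(self, embed: Callable[[str], List[float]], threshold: float = 0.9) -> None:
        self.embed = embed          # caller-supplied embedding function
        self.threshold = threshold  # minimum cosine similarity for a hit
        self.entries: List[Tuple[List[float], str]] = []

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((self.embed(prompt), response))

    def get(self, prompt: str) -> Optional[str]:
        query = self.embed(prompt)
        best_score, best_response = -1.0, None
        for vector, response in self.entries:  # linear scan over all cached prompts
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


def letter_counts(text: str) -> List[float]:
    """Toy stand-in for an embedding model: counts of each letter a-z."""
    lowered = text.lower()
    return [float(lowered.count(c)) for c in "abcdefghijklmnopqrstuvwxyz"]


cache = SemanticCache(letter_counts, threshold=0.95)
cache.put("What is the capital of France?", "Paris")
print(cache.get("what is the capital of france"))  # Paris (same letters, similarity ~1.0)
print(cache.get("How do I bake bread?"))           # None (similarity about 0.55)
```

Letter counts only keep the example self-contained; a practical cache would use a sentence-embedding model and a vector index, and the threshold controls the trade-off between hit rate and the risk of returning an answer to a different question.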

https://github.com/logicalclocks/machine-learning-api

Hopsworks Machine Learning Api 🚀 Model management with a model registry and model serving

model-registry model-serving

Last synced: 09 Apr 2025

https://github.com/unaidedelf8777/faster-outlines

A lazy, high-throughput, and blazing-fast structured text generation backend.

ai llama llm llm-serving llmops model-serving performance transformer

Last synced: 27 Jun 2025

https://github.com/rapidrabbit76/fastapi-deep-learning-model-micro-batching-serving

FastAPI PyTorch model serving with micro-batching

fastapi model-serving pytorch

Last synced: 24 Jun 2025

https://github.com/zerohertz/yolo-serving-cookbook

📸 YOLO Serving Cookbook based on Triton Inference Server 📸

docker docker-compose fastapi gradio k8s kubernetes mlops model-serving onnx pytorch triton-inference-server yolo yolov5

Last synced: 18 Mar 2025

https://github.com/saivarunk/krypton

Model Server for ML and DL Models built using FastAPI

deep-learning fastapi machine-learning model-serving rest-api

Last synced: 07 Sep 2025

https://github.com/algorithmiaio/algorithmia-modeldeployment-action

Algorithmia GitHub Action that runs Jupyter notebooks to create the ML model, uploads the model, and updates the algorithm on Algorithmia

algorithmia ci-cd githubaction-workflow githubactions machine-learning model-deployment model-serving

Last synced: 06 Oct 2025

https://github.com/prassanna-ravishankar/modalkit

A powerful Python framework for deploying ML models on Modal with production-ready features

mlops model-serving

Last synced: 05 Sep 2025

https://github.com/amine-akrout/smoker_detection

Smoker Detection deep learning model served via a web app using TensorFlow, TensorFlow Serving, Flask, and Docker Compose

deep-learning docker docker-compose flask inceptionv3 keras model-deployment model-serving tensorflow tesnorflow-serving transfer-learning

Last synced: 24 Mar 2025

https://github.com/mpolinowski/ray-serve-model

Using Ray Serve for ML Model Serving

consensus model-serving python ray

Last synced: 02 Aug 2025

https://github.com/algorithmiaio/githubactions-modeldeployment-demo-algorithmiaalgo

Demo ML repository that uses the Algorithmia Model Deployment GitHub Action to auto-deploy to an algorithm hosted on Algorithmia

algorithmia ci-cd githubaction-workflow githubactions jupyter-notebook machine-learning model-deployment model-serving xgboost

Last synced: 05 Apr 2025

https://github.com/algorithmiaio/githubactions-modeldeployment-template

Template ML repository to get started with the Algorithmia Model Deployment GitHub Action integration

algorithmia cicd github-actions inference machine-learning model-deployment model-serving

Last synced: 05 Apr 2025

https://github.com/md-emon-hasan/bentoml

BentoML is a high-performance model serving framework. It provides various scripts and configurations to help streamline the deployment process.

ai bentoml data-science ml-engineering mlops model-deployment model-serving

Last synced: 02 Mar 2025

https://github.com/jeongahyun/flask-server-main

λ†μž‘λ¬Ό 병해좩 탐지 μ›Ή μ„œλΉ„μŠ€

caffe flask model-serving object-detection pytorch tensorflow

Last synced: 05 Apr 2025

https://github.com/ronylpatil/mlflow-pipeline

An end-to-end MLflow pipeline, hosted on AWS.

mlflow-tracking model-registry model-serving

Last synced: 13 Apr 2025