Projects in Awesome Lists tagged with serving

https://github.com/ray-project/ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

data-science deep-learning deployment distributed hyperparameter-optimization hyperparameter-search large-language-models llm llm-inference llm-serving machine-learning optimization parallel python pytorch ray reinforcement-learning rllib serving tensorflow

Last synced: 19 Feb 2026

https://github.com/tensorflow/serving

A flexible, high-performance serving system for machine learning models

cpp deep-learning deep-neural-networks machine-learning ml neural-network python serving tensorflow

Last synced: 12 May 2025

https://github.com/volcano-sh/volcano

A Cloud Native Batch System (Project under CNCF)

ai batch-systems bigdata gene golang hpc kubernetes machine-learning serving training

Last synced: 31 Jan 2026

https://github.com/seldonio/seldon-core

An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models

aiops deployment kubernetes machine-learning machine-learning-operations mlops production-machine-learning serving

Last synced: 14 May 2025

https://github.com/SeldonIO/seldon-core

An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models

aiops deployment kubernetes machine-learning machine-learning-operations mlops production-machine-learning serving

Last synced: 27 Mar 2025

https://github.com/ahkarami/deep-learning-in-production

In this repository, I will share some useful notes and references about deploying deep learning-based models in production.

angularjs c-plus-plus caffe2 convert-pytorch-models deep-learning deep-neural-networks flask keras model-serving mxnet production python pytorch react rest-api serving serving-pytorch-models tensorflow-models tesnorflow tutorial

Last synced: 14 May 2025

https://github.com/pytorch/serve

Serve, optimize and scale PyTorch models in production

cpu deep-learning docker gpu kubernetes machine-learning metrics mlops optimization pytorch serving

Last synced: 13 May 2025

https://github.com/ahkarami/Deep-Learning-in-Production

In this repository, I will share some useful notes and references about deploying deep learning-based models in production.

angularjs c-plus-plus caffe2 convert-pytorch-models deep-learning deep-neural-networks flask keras model-serving mxnet production python pytorch react rest-api serving serving-pytorch-models tensorflow-models tesnorflow tutorial

Last synced: 14 Mar 2025

https://github.com/Lightning-AI/LitServe

The easiest way to deploy agents, MCP servers, models, RAG, pipelines and more. No MLOps. No YAML.

ai api artificial-intelligence deep-learning developer-tools fastapi rest-api serving web

Last synced: 23 Aug 2025

https://github.com/paddlepaddle/fastdeploy

⚡️An Easy-to-use and Fast Deep Learning Model Deployment Toolkit for ☁️Cloud 📱Mobile and 📹Edge. Including Image, Video, Text and Audio 20+ mainstream scenarios and 150+ SOTA models with end-to-end optimization, multi-platform and multi-framework support.

android graphcore intel jetson kunlun object-detection onnx onnxruntime openvino picodet rockchip serving stable-diffusion tensorrt uie yolov5 yolov8

Last synced: 23 Jan 2026

https://github.com/PaddlePaddle/FastDeploy

⚡️An Easy-to-use and Fast Deep Learning Model Deployment Toolkit for ☁️Cloud 📱Mobile and 📹Edge. Including Image, Video, Text and Audio 20+ mainstream scenarios and 150+ SOTA models with end-to-end optimization, multi-platform and multi-framework support.

android graphcore intel jetson kunlun object-detection onnx onnxruntime openvino picodet rockchip serving stable-diffusion tensorrt uie yolov5 yolov8

Last synced: 20 Mar 2025

https://github.com/georgia-tech-db/evadb

Database system for AI-powered apps

agent ai auto-gpt chatgpt data-analysis database eva gpt-4 gpt4all hacktoberfest huggingface labeling langchain llm object-detection serving video-analytics

Last synced: 14 May 2025

https://github.com/tobegit3hub/tensorflow_template_application

TensorFlow template application for deep learning

cnn csv deep-learning inference libsvm lstm machine-learning mlp serving tensorboard tensorflow tfrecords wide-and-deep

Last synced: 15 May 2025

https://github.com/ray-project/llm-applications

A comprehensive guide to building RAG-based LLM applications for production.

anyscale fine-tuning llama2 llms machine-learning openai ray serving

Last synced: 11 Apr 2025

https://github.com/Delta-ML/delta

DELTA is a deep learning based natural language and speech processing platform. LF AI & DATA Projects: https://lfaidata.foundation/projects/delta/

asr custom-ops deep-learning emotion-recognition front-end inference nlp nlu ops seq2seq sequence-to-sequence serving speaker-verification speech speech-recognition tensorflow tensorflow-lite tensorflow-serving text-classification text-generation

Last synced: 07 Apr 2025

https://github.com/ray-project/ray-llm

RayLLM - LLMs on Ray

distributed-systems large-language-models llm llm-inference llm-serving llmops ray serving transformers

Last synced: 19 Jul 2026

https://github.com/ray-project/aviary

RayLLM - LLMs on Ray

distributed-systems large-language-models llm llm-inference llm-serving llmops ray serving transformers

Last synced: 12 Jan 2026

https://github.com/paddlepaddle/serving

A flexible, high-performance carrier for machine learning models (PaddlePaddle serving deployment framework)

dag deep-learning docker gpu micro-service microservice-toolkit online-service paddle paddle-serving pipeline prediction predictor python rpc-service serving

Last synced: 15 May 2025

https://github.com/PaddlePaddle/Serving

A flexible, high-performance carrier for machine learning models (PaddlePaddle serving deployment framework)

dag deep-learning docker gpu micro-service microservice-toolkit online-service paddle paddle-serving pipeline prediction predictor python rpc-service serving

Last synced: 20 Mar 2025

https://github.com/tobegit3hub/simple_tensorflow_serving

Generic and easy-to-use serving service for machine learning models

client deep-learning http machine-learning savedmodel serving tensorflow tensorflow-models

Last synced: 25 Oct 2025

https://github.com/openvinotoolkit/model_server

A scalable inference server for models optimized with OpenVINO™

ai cloud dag deep-learning edge genai inference kubernetes machine-learning model-serving openvino serving

Last synced: 14 May 2025

https://github.com/underneathall/pinferencia

Python + Inference - Model Deployment library in Python. Simplest model inference server ever.

ai artificial-intelligence computer-vision data-science deep-learning huggingface inference inference-server machine-learning model-deployment model-serving modelserver nlp paddlepaddle predict python pytorch serving tensorflow transformers

Last synced: 08 Oct 2025

https://github.com/polyaxon/haupt

Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon

bokeh data-processing data-profiling data-science data-visualization deep-learning jupyter lineage machine-learning matplotlib mlops models plotly python pytorch serving tensorflow tracking ui visualization

Last synced: 14 May 2025

https://github.com/vectorch-ai/ScaleLLM

A high-performance inference system for large language models, designed for production environments.

cuda efficiency gpu inference llama llama3 llm llm-inference model performance production serving speculative transformer

Last synced: 09 May 2025

https://github.com/bodywork-ml/bodywork-core

ML pipeline orchestration and model deployments on Kubernetes.

batch cicd continuous-deployment data-science devops framework kubernetes machine-learning mlops orchestration pipeline python serving

Last synced: 19 Apr 2025

https://github.com/vectorch-ai/scalellm

A high-performance inference system for large language models, designed for production environments.

cuda efficiency gpu inference llama llama3 llm llm-inference model performance production serving speculative transformer

Last synced: 14 Apr 2025

https://github.com/Hydrospheredata/hydro-serving

MLOps Platform

machine-learning models pipelines realtime scikit-learn scoring serverless serving spark tensorflow

Last synced: 15 Mar 2025

https://github.com/hydrospheredata/hydro-serving

MLOps Platform

machine-learning models pipelines realtime scikit-learn scoring serverless serving spark tensorflow

Last synced: 16 May 2025

https://github.com/fasterdecoding/bitdelta

llm quantization serving

Last synced: 04 Apr 2025

https://github.com/deepjavalibrary/djl-serving

A universal scalable machine learning model deployment solution

deep-learning deployment djl inference pytorch serving

Last synced: 05 Apr 2025

https://github.com/torchpipe/torchpipe

Serving Inside Pytorch

deployment inference llm-serving pipeline-parallelism pytorch ray serve serving tensorrt torch2trt triton-inference-server

Last synced: 02 Mar 2026

https://github.com/netease-media/grps

Deep Learning Deployment Framework: Supports tf/torch/trt/trtllm/vllm and other NN frameworks. Supports dynamic batching and streaming modes. It is dual-language compatible with Python and C++, offering scalability, extensibility, and high performance. It helps users quickly deploy models and provide services through HTTP/RPC interfaces.

dynamic-batching serving tensorflow tensorrt tensorrt-llm torch triton-inference-server vllm

Last synced: 05 Apr 2025

https://github.com/NetEase-Media/grps

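[Deep learning model deployment framework] Supports tf/torch/trt/trtllm/vllm and other NN frameworks; supports dynamic batching and streaming modes; supports both Python and C++; restrictable, extensible, and high-performance. Helps users quickly deploy models online and serve them through HTTP/RPC interfaces.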

dynamic-batching serving tensorflow tensorrt tensorrt-llm torch triton-inference-server vllm

Last synced: 04 Nov 2025

https://github.com/krystianity/keras-serving

bring keras-models to production with tensorflow-serving and nodejs + docker :pizza:

cpp docker grpc keras network neuronal nodejs production python serving tensorflow

Last synced: 17 Mar 2025

https://github.com/clearml/clearml-serving

ClearML - Model-Serving Orchestration and Repository Solution

ai clearml deep-learning devops kubernetes machine-learning mlops model-serving serving serving-ml serving-pytorch-models tensorflow-serving triton triton-inference-server

Last synced: 17 Jun 2025

https://github.com/lightning-ai/litserve

Deploy AI models at scale. High-throughput serving engine for AI/ML models that uses the latest state-of-the-art model deployment techniques.

ai api serving

Last synced: 05 Apr 2025

https://github.com/notai-tech/fastdeploy

Deploy DL/ ML inference pipelines with minimal extra code.

deep-learning docker falcon gevent gunicorn http-server inference-server model-deployment model-serving python pytorch serving streaming-audio tensorflow-serving tf-serving torchserve triton triton-inference-server triton-server websocket

Last synced: 13 Apr 2025

https://github.com/balavenkatesh3322/model_deployment

A collection of model deployment libraries and techniques.

aws azure caffe data-science deep-learning keras machine-learning model model-deployment model-server model-serving mxnet neural-network pytorch serving serving-pytorch-models serving-recommendation serving-tensors tensorflow

Last synced: 22 Apr 2025

https://github.com/ai-hypercomputer/gpu-recipes

Recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud.

benchmarks distributed-training google-cloud-platform gpu serving

Last synced: 25 Jun 2025

https://github.com/angel-ml/serving

A standalone industrial serving system for Angel.

machine-learning serving serving-recommendation

Last synced: 30 Apr 2025

https://github.com/aws/sagemaker-sparkml-serving-container

This code is used to build & run a Docker container for performing predictions against a Spark ML Pipeline.

inference inference-pipeline machine-learning mleap mleap-serialized-spark pipeline sagemaker serving spark sparkml

Last synced: 20 Oct 2025

https://github.com/hydrospheredata/spark-ml-serving

Spark ML Lib serving library

inference scoring serving spark

Last synced: 07 Mar 2026

https://github.com/toshi0607/build-your-own-platform-with-knative

A workshop for building your own FaaS platform while learning the Knative components.

eventing gcr gke go golang knative knative-lambda-runtimes kubernetes pubsub serverless serving tekton tm watchdog

Last synced: 14 Jan 2026

https://github.com/friendliai/friendli-client

Friendli: the fastest serving engine for generative AI

ai generative-ai gpt gpt3 inference inference-engine inference-server llama2 llm llm-inference llm-ops llm-serving llmops llms mistral ml mlops serving stable-diffusion

Last synced: 05 Apr 2025

https://github.com/deep-diver/lora-deployment

LoRA fine-tuned Stable Diffusion Deployment

generative-ai huggingface-inference-endpoint serving stable-diffusion

Last synced: 05 May 2025

https://github.com/modzy/sdk-python

Python library for Modzy Machine Learning Operations (MLOps) Platform

ai-security api-client deployment docker drift-detection explainable-ai kuberenetes machine-learning machine-learning-operations microservices mlops model-deployment model-serving production-machine-learning python serving

Last synced: 29 Jun 2025

https://github.com/laactechnology/foxcross

AsyncIO serving for data science models

async data-science dataframe http machine-learning pandas python pytorch rest-api scikit-learn serving

Last synced: 14 Mar 2026

https://github.com/daekeun-ml/genai-ko-llm

This hands-on lab walks you through a step-by-step approach to efficiently serving and fine-tuning large-scale Korean models on AWS infrastructure.

fine-tuning genai korean-llm peft sagemaker serving

Last synced: 12 Oct 2025

https://github.com/jeongukjae/lightgbm-serving

A lightweight server for LightGBM

lightgbm ml-serving serving

Last synced: 13 Apr 2025

https://github.com/secretflow/serving

SecretFlow-Serving is a serving system for privacy-preserving machine learning models.

federated-learning machine-learning privacy-preserving secure-multiparty-computation serving serving-ml

Last synced: 01 Feb 2026

https://github.com/galileo-galilei/kedro-serving

A kedro-plugin to serve Kedro Pipelines as API

fastapi kedro kedro-plugin mlops model-serving pipeline-serving serving

Last synced: 12 May 2025

https://github.com/ovh/serving-runtime

Exposes a serialized machine learning model through an HTTP API.

hdf5 inference machine-learning onnx serving tensorflow

Last synced: 08 Apr 2025

https://github.com/feast-dev/feast-java-old

Feast Java Components

featurestore machine-learning metadata serving

Last synced: 12 Apr 2025

https://github.com/teachablehub/python-sdk

Python SDK for the TeachableHub's Machine-Learning Deployment Platform

deployment-automation machine-learning machine-learning-library mlops sdk-python serving teachable teachablehub

Last synced: 07 Jul 2025

https://github.com/yas-sim/openvino-model-server-wrapper

Python wrapper class for OpenVINO Model Server. Users can submit inference requests to OVMS with just a few lines of code.

ai area-intrusion-detection cloud deep-learning edge grpc grpc-client inference intel line-crossing-detection model-serving object-tracking openvino openvino-docker openvino-model-server python serving tensorflow-serving triton-inference-server

Last synced: 01 Aug 2025

https://github.com/hydrospheredata/hydro-serving-pytorch

Pytorch ONNX model serving runtime

python pytorch serving

Last synced: 09 Nov 2025

https://github.com/assassingq/scsyerp-web

Smart warehousing Vue frontend

java microservices-architecture serving

Last synced: 27 Jun 2025

https://github.com/zhangjun/tf_serving_client_brpc

TensorFlow Serving client using brpc

brpc client deep-learning serving tensorflow tensorflow-serving

Last synced: 09 Apr 2025

https://github.com/sbcd90/machine-learning-rest-server

This project implements a common REST server which can serve tensorflow-serving & xgboost models.

machine-learning proxygen rest serving tensorflow xgboost

Last synced: 13 Jun 2025

https://github.com/modzy/sdk-go

The Golang library for Modzy Machine Learning Operations (MLOps) Platform

ai-security api-client api-client-go docker drift-detection explainable-ai golang kubernetes machine-learning-operations microservices mlops model-serving production-machine-learning serving

Last synced: 25 Jan 2026

https://github.com/torchpipe/torchpipe.github.io

Docs for torchpipe: https://github.com/torchpipe/torchpipe

deployment inference pipeline-parallelism pytorch serving tensorrt

Last synced: 02 Mar 2026

https://github.com/tataganesh/tf-serving-cnn-example

Basic example of TensorFlow Serving

model-deployment serving tensorflow-serving

Last synced: 31 Jul 2025

https://github.com/hydrospheredata/hydro-serving-cli

CLI for the Hydrosphere.io project.

cli hydrosphere serving yaml

Last synced: 15 Apr 2025

https://github.com/vjgpt/tensorflow-series

Here you can find how to train TensorFlow ML models with various algorithms and deploy these models to production.

cnn-keras deployment devops production serving tensorflow tensorflow-models tensorflow-tutorials

Last synced: 29 Jan 2026

https://github.com/kozistr/catboost-server-rs

CatBoost server in Rust + gRPC

catboost grpc machine-learning rust server serving

Last synced: 16 May 2026

https://github.com/jaketae/image-classifier

Image classifier web application based on MobileNet, built using Flask, TensorFlow, and Matplotlib

flask image-classifier matplotlib serving tensorflow

Last synced: 27 Apr 2026

https://github.com/ahmadalsharef994/flask_mistral_mlops_example

Demonstrates how to deploy a Flask-based API for LLM inference using Mistral models, containerized with Docker for MLOps workflows.

docker flask inference llm mistral mlops nlp serving

Last synced: 12 Apr 2026

https://github.com/sauldoescode/transplacer

It reads files and holds them in memory for performant serving/access

assetcache cache http2 http2-push memory-database push serving static

Last synced: 11 Jan 2026

https://github.com/siri1404/llm-infrastructure

End-to-end platform for serving Large Language Models with streaming capabilities, privacy-preserving audit trails, drift monitoring, and compliance support for SEC, MiFID II, FINRA, and GDPR standards.

ai audit compliance inference kafka llm mlops python serving